This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-20, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2409.12194v1
Compressor summary: The paper shows gender bias in mock interview questions for Indian civil service candidates and in language model explanations, and introduces a new dataset for social science research.
http://arxiv.org/abs/2409.12193v1
Compressor summary: Vista3D is a framework that quickly generates 3D models from images using Gaussian Splatting, Signed Distance Functions, and disentangled representations.
http://arxiv.org/abs/2409.12191v1
Compressor summary: The Qwen2-VL series comprises advanced vision-language models that dynamically adapt to different image resolutions and fuse multimodal information, achieving competitive performance on various benchmarks.
http://arxiv.org/abs/2409.12189v1
Compressor summary: The paper introduces SAST, a scene-aware social transformer that forecasts long-term human motion in scenes with varying numbers of people and objects, using denoising diffusion models to generate realistic and diverse motion and outperforming other approaches on the Humans in Kitchens dataset.
http://arxiv.org/abs/2409.12186v1
Compressor summary: The report introduces Qwen2.5-Coder, a new code generation model that outperforms larger models and excels at various code-related tasks.
http://arxiv.org/abs/2409.12183v1
Compressor summary: Chain-of-thought prompting boosts math and logic reasoning in LLMs but helps little on other task types, is outperformed by symbolic solvers on symbolic execution, and can be applied selectively to save costs, pointing to a need for new paradigms that better leverage intermediate computation.
http://arxiv.org/abs/2409.12181v1
Compressor summary: The authors propose a controlled protocol to compare different methods for extending language models to handle long contexts, and find that perplexity is a good indicator, approximate attention underperforms, and exact fine-tuning works well within its range.
http://arxiv.org/abs/2409.12180v1
Compressor summary: The authors study how to improve language models' ability to express uncertainty, which can help users better judge their reliability in information-seeking and decision-making tasks.
http://arxiv.org/abs/2409.12172v1
Compressor summary: YORO is a text-to-SQL model that uses database knowledge during training and eliminates schema encoding, leading to shorter inputs and competitive performance on benchmarks.
http://arxiv.org/abs/2409.12162v1
Compressor summary: The text describes a deep learning method that predicts higher resolution future sky images to improve solar irradiance forecasting, especially for cloud movement near the horizon.
http://arxiv.org/abs/2409.12156v1
Compressor summary: The paper proposes a NeRF-based network for joint audio and expression guided talking face generation that learns disentangled representations using self-supervised and contrastive learning techniques.
http://arxiv.org/abs/2409.12154v1
Compressor summary: The paper proposes new types of explanations for classifier decisions that consider constraints between features, reducing redundant explanations and providing coverage information.
http://arxiv.org/abs/2409.12147v1
Compressor summary: MAgICoRe improves LLM reasoning by categorizing problem difficulty, incorporating external reward models for error localization, and using a multi-agent loop to refine solutions iteratively.
http://arxiv.org/abs/2409.12140v1
Compressor summary: MoRAG is a new text-based human motion generation method that uses improved motion retrieval and multi-part fusion to create diverse and realistic motions for various text descriptions.
http://arxiv.org/abs/2409.12136v1
Compressor summary: The paper introduces GRIN, a method to train large Mixture-of-Experts models more effectively by estimating sparse gradients for expert routing and configuring model parallelism.
http://arxiv.org/abs/2409.12135v1
Compressor summary: The paper shows that linear TD learning in reinforcement learning converges without requiring linearly independent features or other assumptions, using a new characterization of its mean ODE.
http://arxiv.org/abs/2409.12134v1
Compressor summary: The paper proposes a novel Vietnamese MDS framework that combines extractive and abstractive summarization methods using BERT embeddings and VBD-LLaMA2-7B-50b model, achieving better performance than existing approaches.
http://arxiv.org/abs/2409.12126v1
Compressor summary: The paper introduces a new benchmark to test language models' linguistic reasoning across low-resource languages, showing a performance gap between open and closed models.
http://arxiv.org/abs/2409.12122v1
Compressor summary: The report introduces large language models for math with self-improvement features that enhance their reasoning capabilities in Chinese and English.
http://arxiv.org/abs/2409.12116v1
Compressor summary: This paper argues for using stronger baseline models in healthcare ML evaluations to improve model transparency, address data silos, and simplify metrics, enabling better deployment of ML models in clinical settings.
http://arxiv.org/abs/2409.12112v1
Compressor summary: The Pareto Data Framework helps optimize efficiency in resource-constrained environments by selecting Minimum Viable Data for machine learning applications on IoT devices.
http://arxiv.org/abs/2409.12108v1
Compressor summary: SPRMamba is a novel framework for recognizing different phases of endoscopic submucosal dissection, a minimally invasive procedure, using Mamba-based long-term temporal modeling with a Scaled Residual TranMamba block and a Temporal Sample Strategy to improve precision and efficiency.
http://arxiv.org/abs/2409.12106v1
Compressor summary: Generative Psychometrics for Values (GPV) is a data-driven method based on large language models that measures human and AI values from text inputs, overcoming limitations of previous approaches.
http://arxiv.org/abs/2409.12100v1
Compressor summary: The text introduces a new framework for machine learning using higher-order symmetries and category theory to improve model robustness, generalization, and convergence.
http://arxiv.org/abs/2409.12099v1
Compressor summary: Brain-Streams uses textual and visual guidance from fMRI signals to generate more detailed and semantically plausible images.
http://arxiv.org/abs/2409.12097v1
Compressor summary: The paper introduces a new neural retriever architecture that uses multilingual language models to efficiently match job proposals with freelancers based on their skills.
http://arxiv.org/abs/2409.12089v1
Compressor summary: Element ordering greatly affects language model agents' performance in virtual environments, especially when deriving elements from pixels.
http://arxiv.org/abs/2409.12087v1
Compressor summary: The study uses machine learning and deep learning to predict kidney disease progression from insurance claims data, finding that a long short-term memory model outperforms existing methods and can be explained at the patient level.
http://arxiv.org/abs/2409.12076v1
Compressor summary: AdaPrune is a novel method for unsupervised domain adaptation that removes training examples to align the training distribution to the target data using maximum mean discrepancy, achieving better performance than related techniques on bioacoustic event detection.
http://arxiv.org/abs/2409.12060v1
Compressor summary: Paraphrasus is a benchmark for evaluating paraphrase detection models in a more nuanced way than previous methods, revealing trade-offs that were not visible before.
http://arxiv.org/abs/2409.12059v1
Compressor summary: The paper proposes a new language model architecture called TaS that uses a thinking layer to generate more reasonable responses based on prompt-response samples.
http://arxiv.org/abs/2409.12053v1
Compressor summary: EDSFs are neural network-representable set functions that extend Deep Submodular Functions, representing all monotone submodular functions and having better generalization error than DSFs.
http://arxiv.org/abs/2409.12046v1
Compressor summary: The study explores using large language models to automate the creation of clinical trial data summary tables and figures, showing their potential in this domain.
http://arxiv.org/abs/2409.12045v1
Compressor summary: The paper proposes a method that combines safe exploration with learnable constraints for reinforcement learning in real-world robots, improving both performance and safety.
http://arxiv.org/abs/2409.12042v1
Compressor summary: The authors introduce a new dataset of phone conversations to test how well automatic speech recognition systems perform in real-world, unstructured settings with disfluencies and diverse accents.
http://arxiv.org/abs/2409.12040v1
Compressor summary: The paper introduces SFDA-rPPG, a source-free domain adaptation benchmark for rPPG (remote photoplethysmography) measurement from facial video, which uses a TSTC-Net spatio-temporal consistency network and a novel FWD loss to align features and distributions across domains without access to source data, avoiding the privacy concerns of traditional methods.
http://arxiv.org/abs/2409.12038v1
Compressor summary: The paper introduces Hamiltonian Learning, a novel framework for online neural network learning using differential equations from optimal control theory, which offers flexibility, generalization, and integration without external solvers.
http://arxiv.org/abs/2409.12034v1
Compressor summary: This chapter reviews how combining multi-sensor remote sensing data and deep learning allows for better mapping of glaciers and detecting their changes, particularly for challenging cases like debris-covered and calving glaciers.
http://arxiv.org/abs/2409.12033v1
Compressor summary: The paper proposes a new architecture using the Mamba state-space model to handle graph data with higher-order interactions in simplicial complexes, avoiding message-passing mechanisms.
http://arxiv.org/abs/2409.12031v1
Compressor summary: PhysMamba uses Mamba, a state space model, to capture long-range physiological dependencies from facial videos for remote photoplethysmography measurement.
http://arxiv.org/abs/2409.12026v1
Compressor summary: The paper compares the performance of ViT models and CNN architectures for binary classification tasks in side-scan sonar (SSS) imagery, with ViTs showing better results but requiring more resources.
http://arxiv.org/abs/2409.12016v1
Compressor summary: The text describes a new system and algorithm that can forecast cloud movement and solar power generation more accurately by using high-resolution wide-angle images and spatio-temporal slices of data.
http://arxiv.org/abs/2409.12014v1
Compressor summary: BRDF-NeRF is a machine learning technique that estimates the RPV model to analyze anisotropic reflectance of complex Earth surfaces from satellite imagery.
http://arxiv.org/abs/2409.12011v1
Compressor summary: The paper proposes a mixture of soft prompt learning method with a routing module that selects the best prompts for each instance, and introduces semantically grouped text-level supervision to preserve initial knowledge and reduce overfitting.
http://arxiv.org/abs/2409.12010v1
Compressor summary: ChefFusion is a novel multimodal food computing foundation model that leverages large language models and pre-trained image encoder and decoder models to perform various food computing tasks, such as understanding, recognition, generation, and translation.
http://arxiv.org/abs/2409.12001v1
Compressor summary: The paper highlights the lack of data quality and standardization in offline multi-agent reinforcement learning research and proposes guidelines for dataset generation, dataset standardization, and analysis tools to improve the field.
http://arxiv.org/abs/2409.11995v1
Compressor summary: The paper studies how the loss landscape of neural networks changes with more samples, providing bounds and empirical evidence for convergence in image classification tasks.
http://arxiv.org/abs/2409.11985v1
Compressor summary: The paper proposes a new way to estimate uncertainty in soil property prediction by transforming regression into classification problems, using existing machine learning algorithms and showing its advantages over common pedometric models on German agricultural data.
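The regression-as-classification idea in the summary above is a generic trick, and a minimal sketch of it (not the paper's implementation; the data, binning scheme, and toy frequency "classifier" are all hypothetical) looks like this: discretize the continuous target into ordinal classes, train any classifier, and read predictive uncertainty off the class-probability distribution.

```python
# Generic sketch of regression-as-classification uncertainty estimation.
# All data and the toy per-bucket "classifier" are illustrative assumptions.
import random

random.seed(0)

# Hypothetical data: feature x, continuous target y = 2x + noise.
xs = [random.uniform(0, 10) for _ in range(500)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

# 1. Discretize the target into equal-width bins (classes).
n_bins = 10
lo, hi = min(ys), max(ys)
width = (hi - lo) / n_bins

def to_bin(y):
    return min(int((y - lo) / width), n_bins - 1)

# 2. A toy "classifier": empirical class frequencies per feature bucket
#    (a stand-in for any probabilistic classifier).
buckets = {}
for x, y in zip(xs, ys):
    buckets.setdefault(int(x), []).append(to_bin(y))

def predict_proba(x):
    labels = buckets[int(x)]  # assumes the bucket is non-empty
    return [labels.count(k) / len(labels) for k in range(n_bins)]

# 3. Point estimate = probability-weighted bin centre; uncertainty =
#    spread of the predictive class distribution.
centres = [lo + (k + 0.5) * width for k in range(n_bins)]
p = predict_proba(5.0)
mean = sum(pi * c for pi, c in zip(p, centres))
var = sum(pi * (c - mean) ** 2 for pi, c in zip(p, centres))
print(round(mean, 1), round(var ** 0.5, 2))
```

The payoff is that the full class distribution, not just a point prediction, is available at inference time, so any classifier that outputs probabilities doubles as an uncertainty estimator.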
http://arxiv.org/abs/2409.11983v1
Compressor summary: The paper proposes a new 3D/2D intraoperative registration method during neurosurgery using a Neural Radiance Field controlled by a multi-style hypernetwork, which outperforms current methods and meets clinical standards.
http://arxiv.org/abs/2409.11971v1
Compressor summary: Large language models can create material embeddings from literature, but require optimal context and comparisons for accurate property predictions.
http://arxiv.org/abs/2409.11968v1
Compressor summary: Synthetic data generated by large language models can effectively benchmark simpler NLP tasks but introduces biases for complex ones, and using multiple models reduces these biases.
http://arxiv.org/abs/2409.11960v1
Compressor summary: The paper introduces a new dataset for Chinese sign language recognition in diverse real-life scenarios and proposes a time-frequency network model that performs well under complex backgrounds.
http://arxiv.org/abs/2409.11953v1
Compressor summary: The paper proposes FE-TAP, a point tracker that combines image frames and events to achieve high frame rate and robust tracking in various challenging conditions.
http://arxiv.org/abs/2409.11951v1
Compressor summary: The paper presents a real-time method for creating realistic, dynamic human head avatars from multi-view imagery, using a hierarchical head representation trained end-to-end on video inputs that captures complex facial expressions and synthesizes novel views of challenging expressions and poses.
http://arxiv.org/abs/2409.11937v1
Compressor summary: The paper proposes DTAN, a method for digital orthodontic planning that decouples predicting tasks and feature modeling, improves hidden features learning, and uses a differentiable collision loss function for point cloud data.
http://arxiv.org/abs/2409.11933v1
Compressor summary: The paper proposes an RL-based heuristic method that learns from search process data to improve suboptimal solutions through small changes, using a Transformer encoding and a probability matrix to swap jobs, and outperforms other heuristics on a real production scheduling problem.
http://arxiv.org/abs/2409.11929v1
Compressor summary: The study uses machine learning to classify road accident outcomes and identify key factors affecting fatality risk in Dhaka, Bangladesh.
http://arxiv.org/abs/2409.11920v1
Compressor summary: The paper presents a method that uses GPT models and diffusion models to synthesize realistic 3D human motions for unseen action classes by decomposing complex actions into simpler movements and recombining them, and that can be integrated with any pre-trained diffusion model while outperforming the state of the art.
http://arxiv.org/abs/2409.11919v1
Compressor summary: The paper proposes LLM-wrapper, a black-box method that uses large language models to improve VLMs' zero-shot capabilities on the referring expression comprehension (REC) task.
http://arxiv.org/abs/2409.11917v1
Compressor summary: This tutorial explores the use of large language models (LLMs) in education, covering their impact on teaching, learning, and assessment across four applications: reading, writing, speaking, and intelligent tutoring systems.
http://arxiv.org/abs/2409.11904v1
Compressor summary: The paper proposes an efficient annotation framework using Rapidata's technology to collect human feedback and rank text-to-image models based on style, coherence, and alignment.
http://arxiv.org/abs/2409.11902v1
Compressor summary: The authors propose using pooling to compress activation maps in the backward pass of deep neural networks, reducing memory footprint and data movement, while maintaining prediction accuracy.
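As a hedged illustration of the general idea in the summary above (generic, not the authors' implementation): instead of keeping the full activation map in memory for the backward pass, one can save a 2x2 average-pooled copy and upsample it when the gradient is computed, trading a small approximation error for a 4x smaller saved tensor.

```python
# Minimal sketch of pooled-activation storage for the backward pass.
# The activation values and the nearest-neighbour upsampling choice
# are illustrative assumptions.

def avg_pool2x2(a):
    """Downsample an HxW matrix (lists of lists) by 2x2 averaging."""
    h, w = len(a), len(a[0])
    return [[(a[i][j] + a[i][j + 1] + a[i + 1][j] + a[i + 1][j + 1]) / 4
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def upsample2x2(p):
    """Nearest-neighbour upsample back to the original resolution."""
    return [[p[i // 2][j // 2] for j in range(2 * len(p[0]))]
            for i in range(2 * len(p))]

# Forward pass: compute the layer output as usual, but save only the
# pooled activation for later gradient computation.
act = [[1.0, 3.0, 5.0, 7.0],
       [1.0, 3.0, 5.0, 7.0],
       [2.0, 2.0, 6.0, 6.0],
       [2.0, 2.0, 6.0, 6.0]]
saved = avg_pool2x2(act)      # 2x2 instead of 4x4: 4x less memory

# Backward pass: reconstruct an approximate activation for the gradient.
approx = upsample2x2(saved)
print(approx[0])              # -> [2.0, 2.0, 6.0, 6.0]
```

The reconstruction is lossy, which is exactly the trade-off such methods argue is acceptable when prediction accuracy is preserved.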
http://arxiv.org/abs/2409.11901v1
Compressor summary: The paper proposes a personalized LLM that captures a user's habits and preferences through a user-specific embedding, improving personalization on language tasks without the high cost of fine-tuning and outperforming existing methods on the LaMP benchmark.
http://arxiv.org/abs/2409.11899v1
Compressor summary: The paper presents a new GNN model combining Self-Attention and Message Passing, a dynamic mesh pruning method, a self-supervised training technique based on BERT, and a large dataset for CFD using finite element methods.
http://arxiv.org/abs/2409.11887v1
Compressor summary: DocMamba is a new framework that uses the state space model to understand visually-rich documents more efficiently and effectively than existing transformer-based models.
http://arxiv.org/abs/2409.11884v1
Compressor summary: This paper reviews recent OOD detection methods from different problem scenarios, such as test-time adaptation and multi-modal data, and provides a new taxonomy for the field.
http://arxiv.org/abs/2409.11874v1
Compressor summary: The paper introduces an evaluation matrix for text and typography accuracy in AI-generated images that uses letter-by-letter matching and brevity adjustment to handle errors and redundancies, and analyzes how frequently and less frequently used words affect text generation quality.
http://arxiv.org/abs/2409.11869v1
Compressor summary: SpheriGait is a novel method that uses spherical projection and a special network block to improve gait recognition from 3D point clouds captured by LiDAR sensors.
http://arxiv.org/abs/2409.11862v1
Compressor summary: The paper introduces MQ-TCN, a location-based, adaptive, and cost-effective deep learning model that accurately forecasts electric vehicle charging demand across sites with diverse user types using limited data, outperforming previous models.
http://arxiv.org/abs/2409.11859v1
Compressor summary: The paper proposes a new upper bound for the spectral norm of the Jacobian matrix in CNNs, which is independent of the input image resolution and can be efficiently calculated during training, improving their performance.
http://arxiv.org/abs/2409.11856v1
Compressor summary: The paper introduces a pooling operator for graph neural networks that is simple, cheap, and data-preserving, improving performance on four benchmark datasets while reducing complexity and parameter counts compared to existing methods.
http://arxiv.org/abs/2409.11847v1
Compressor summary: W-PINNs are a wavelet-based neural network model that efficiently solves singularly perturbed differential equations by representing the solution in wavelet space and searching for it within this space during training.
http://arxiv.org/abs/2409.11844v1
Compressor summary: MEOW is a gradient descent-based unlearning method that uses inverted facts to selectively forget sensitive information from LLMs without compromising their utility or performance on other tasks.
http://arxiv.org/abs/2409.11843v1
Compressor summary: The Graph Neural Network-State Predictive Information Bottleneck framework uses atomic coordinates to learn low-dimensional representations and predict structural, thermodynamic, and kinetic information for slow processes in molecular dynamics simulations.
http://arxiv.org/abs/2409.11827v1
Compressor summary: ExtAbs is a new method that combines extractive and abstractive summarization in one model using a saliency mask to focus on important parts of the input.
http://arxiv.org/abs/2409.11820v1
Compressor summary: The paper proposes using Deep Reinforcement Learning to enhance Job Shop Scheduling Problem in the furniture industry by considering various factors and integrating with ERP and Manufacturing Execution Systems.
http://arxiv.org/abs/2409.11817v1
Compressor summary: EFCM framework combines unsupervised feature distillation with slide-level fine-tuning to improve accuracy and efficiency in medical image analysis using large models, overcoming challenges like memory and inference latency.
http://arxiv.org/abs/2409.11816v1
Compressor summary: This paper proposes a symmetry-based loss function for face verification that leverages facial symmetry to improve the reliability of face embeddings and outperforms existing methods.
http://arxiv.org/abs/2409.11813v1
Compressor summary: EventAug is a systematic augmentation scheme that enhances spatial-temporal diversity in event camera data, improving model robustness and performance across various tasks and backbones.
http://arxiv.org/abs/2409.11807v1
Compressor summary: This paper proposes an extension to Constraint Guided AutoEncoders for machine condition monitoring, which can handle both anomaly detection and condition indicator estimation with improved monotonic behavior.
http://arxiv.org/abs/2409.11802v1
Compressor summary: The paper proposes a new method using GANs to enhance latent fingerprints by preserving local and structural features, improving identification performance for forensic investigations.
http://arxiv.org/abs/2409.11798v1
Compressor summary: The paper tests the effectiveness of large language models as knowledge bases in the legal domain by evaluating them on a dataset of factual questions about case law and legislation using various methods and strategies.
http://arxiv.org/abs/2409.11786v1
Compressor summary: The paper proposes a method to train a light-weight face recognition model that can work well with low-resolution faces by transferring knowledge from high-resolution models using two-step distillation.
http://arxiv.org/abs/2409.11785v1
Compressor summary: Channel distillation is a novel framework that improves deep trackers by adaptively selecting informative feature channels for efficient object tracking with high accuracy, speed, and low memory requirements.
http://arxiv.org/abs/2409.11783v1
Compressor summary: The authors develop and evaluate a medical adaptation of a 7B language model that can operate on low computational resources, achieving comparable or better performance than larger medical models on question-answering tasks in Japanese and English.
http://arxiv.org/abs/2409.11770v1
Compressor summary: The paper proposes KANet, a method to improve few-shot class-incremental learning by using CLIP as a network pedestal and a Knowledge Adapter module that fuses data-specific knowledge into the general representation.
http://arxiv.org/abs/2409.11761v1
Compressor summary: The paper proposes a consistent estimator for distance between covariance matrices based on traces of functions applied separately to each matrix, and shows its asymptotic Gaussianity and superior performance over conventional methods.
http://arxiv.org/abs/2409.11756v1
Compressor summary: The text describes a new AI architecture that uses intrinsic motivation and classical planning to learn and plan without specific goals, integrating low-level and high-level representations in a self-directed exploration process.
http://arxiv.org/abs/2409.11750v1
Compressor summary: The paper presents a method inspired by human memory to improve image recall in artificial systems, achieving high accuracy with natural images but lower performance with textures.
http://arxiv.org/abs/2409.11749v1
Compressor summary: RockTrack is a 3D MOT method that improves multi-camera tracker versatility and performance by using a confidence-guided preprocessing module, a fusion association module, and a novel appearance similarity metric.
http://arxiv.org/abs/2409.11744v1
Compressor summary: The paper proposes a novel method using clustering algorithms and machine learning models to automatically analyze and predict ASD based on gaze patterns in children, achieving state-of-the-art results.
http://arxiv.org/abs/2409.11741v1
Compressor summary: HARP is a multi-agent reinforcement learning framework that integrates human assistance with agent regrouping to improve group-oriented task completion and allow non-experts to provide valuable guidance.
http://arxiv.org/abs/2409.11734v1
Compressor summary: GEO is a versatile image editing technique that combines text and image prompts with a novel geometric loss to achieve precise and diverse editing outcomes without training.
http://arxiv.org/abs/2409.11733v1
Compressor summary: The text introduces an evaluation framework for testing affective cognition in AI models, which shows they can understand emotions and their influence on humans as well as or better than humans themselves.
http://arxiv.org/abs/2409.11727v1
Compressor summary: The paper proposes a new duplex decoding approach for LLMs that improves real-time interactive feedback with minimal additional training.
http://arxiv.org/abs/2409.11726v1
Compressor summary: The paper introduces a probing dataset to assess large language models' error detection skills in role-playing and proposes a self-reflective reasoning method to improve them.
http://arxiv.org/abs/2409.11724v1
Compressor summary: TART is a framework that combines large language models with specialized tools to improve table reasoning tasks like TQA and TFV, using a new dataset and achieving competitive results.
http://arxiv.org/abs/2409.11718v1
Compressor summary: The text proposes a method for compressing videos with rich semantics by using various video feature models (VFMs) and a dynamic trajectory-based inter-frame compression scheme, resulting in improved efficiency and performance on several tasks and datasets.
http://arxiv.org/abs/2409.11704v1
Compressor summary: The paper investigates various format biases in reinforcement learning from human feedback and shows their impact on preference models, reward models, and alignment algorithms.
http://arxiv.org/abs/2409.11703v1
Compressor summary: The paper introduces a system that uses large language models for simplifying software interactions by classifying natural language inputs into API calls and generating sample datasets for evaluating LLMs in API management.
http://arxiv.org/abs/2409.11697v1
Compressor summary: The paper introduces Monomial Matrix Group Equivariant Neural Functional Networks (Monomial-NFN), which use scaling/sign-flipping symmetries to reduce trainable parameters and improve efficiency in neural networks.
http://arxiv.org/abs/2409.11692v1
Compressor summary: ORB-SfMLearner is a novel visual odometry method that uses ORB features for learning-based ego-motion estimation, cross-attention mechanism for explainability, and selective online adaptation for generalizability, achieving better accuracy and performance than previous methods.
http://arxiv.org/abs/2409.11689v1
Compressor summary: PoseDiffusion is a framework that uses a diffusion model with GUNet to generate diverse, correct, and aesthetically pleasing human pose skeletons based on natural language inputs.
http://arxiv.org/abs/2409.11686v1
Compressor summary: The study uses deep learning to improve opportunistic CT scans for diagnosing conditions like sarcopenia, hepatic steatosis, and ascites, which are often underdiagnosed and poorly documented.
http://arxiv.org/abs/2409.11684v1
Compressor summary: The paper proposes a probabilistic time series forecasting method that combines recurrent neural networks with diffusion models, addressing sequential models' difficulty with high-dimensional distributions and cross-feature dependencies as well as the scalability limits of generative diffusion and flow-based approaches.
http://arxiv.org/abs/2409.11682v1
Compressor summary: SRIF is a novel framework that uses diffusion-based image morphing, large vision models, and normalizing flow to register and interpolate shapes with semantic information.
http://arxiv.org/abs/2409.11677v1
Compressor summary: The paper introduces HDR, a large and diverse dataset for hierarchical MER, and proposes HDNet, a model that improves MER performance by handling formula details at different levels.
http://arxiv.org/abs/2409.11675v1
Compressor summary: The eXplainable Goal Recognition (XGR) model generates explanations for why and why not questions about an agent's goals, improving human understanding, trust, and decision-making in collaborating with AI agents.
http://arxiv.org/abs/2409.11673v1
Compressor summary: RUIE is a framework that uses in-context learning and retrieval to improve unified information extraction tasks, reducing computational costs and enabling generalization to new tasks.
http://arxiv.org/abs/2409.11671v1
Compressor summary: The paper presents an approach for anticipating the actions and policies of oblivious environments in concurrent stochastic games, using a finite information state machine that maps states to belief states about the environment's policy, kept consistent via a distance metric, which yields an MDP for computing optimal policies and shows strong experimental results on human activity data.
http://arxiv.org/abs/2409.11664v1
Compressor summary: AMD-MIL is a novel weakly supervised learning method for whole slide image classification that uses agent aggregation and mask denoise mechanisms to improve attention allocation, capture fine details, and enhance interpretability.
http://arxiv.org/abs/2409.11656v1
Compressor summary: The VL-Reader approach combines masked autoencoding with a novel reconstruction and decoder objective to achieve high accuracy in scene text recognition by effectively modeling visual and linguistic information.
http://arxiv.org/abs/2409.11653v1
Compressor summary: RDSS is an SSL framework that uses a modified Frank-Wolfe algorithm to select diverse and representative samples from unlabeled data for annotation, improving performance under low-budget settings.
http://arxiv.org/abs/2409.11652v1
Compressor summary: Relax DARTS is a method to improve eye movement recognition using automated network search and flexible input selection, achieving state-of-the-art results and adapting to other tasks.
http://arxiv.org/abs/2409.11650v1
Compressor summary: The paper reviews quantization techniques for large neural networks to reduce size, improve efficiency, and address environmental concerns without sacrificing accuracy.
http://arxiv.org/abs/2409.11642v1
Compressor summary: The paper proposes DAF-Net, a dual-branch feature decomposition fusion network with domain adaptive MK-MMD, which effectively aligns visible and infrared image features and improves their fusion quality and performance.
http://arxiv.org/abs/2409.11640v1
Compressor summary: The study compares three methods for filling in missing air pollution data to improve air quality management.
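The summary above names gap-filling only generically; as a hedged illustration (these are common baselines, not necessarily the study's three methods), two standard imputation strategies for a pollutant time series are forward-fill and linear interpolation:

```python
# Illustrative imputation baselines for a series with missing readings
# (None values). The pm25 data below is hypothetical.

def forward_fill(series):
    """Replace each gap with the last observed value."""
    out, last = [], None
    for v in series:
        last = v if v is not None else last
        out.append(last)
    return out

def linear_interp(series):
    """Linearly interpolate interior gaps (assumes gaps are bounded
    by observed values on both sides)."""
    out = series[:]
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1
            left, right = out[i - 1], out[j]
            for k in range(i, j):
                t = (k - i + 1) / (j - i + 1)
                out[k] = left + t * (right - left)
            i = j
        else:
            i += 1
    return out

pm25 = [10.0, None, None, None, 22.0]
print(forward_fill(pm25))   # -> [10.0, 10.0, 10.0, 10.0, 22.0]
print(linear_interp(pm25))  # -> [10.0, 13.0, 16.0, 19.0, 22.0]
```

Comparisons like the one the paper describes typically score such methods by masking known values and measuring reconstruction error.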
http://arxiv.org/abs/2409.11638v1
Compressor summary: The BanStereoSet dataset evaluates social biases in multilingual LLMs for the Bangla language, covering 9 categories of bias and aiming to guide the development of more equitable language technologies.
http://arxiv.org/abs/2409.11636v1
Compressor summary: The study examines how large language models' understanding of social norms changes based on the assigned persona and finds variations across personas and sociodemographic categories.
http://arxiv.org/abs/2409.11635v1
Compressor summary: PainDiffusion is a model that generates realistic facial expressions of pain in robots using diffusion methods, improving on previous autoregressive approaches and enabling better rehabilitation nurse training.
http://arxiv.org/abs/2409.11631v1
Compressor summary: The paper presents a model for pandemic mitigation using lockdowns and a planning method to solve it efficiently.
http://arxiv.org/abs/2409.11624v1
Compressor summary: The paper proposes MM-GCD, a multimodal category discovery method that aligns heterogeneous information across modalities using contrastive learning and distillation techniques.
http://arxiv.org/abs/2409.11618v1
Compressor summary: PieClam is a probabilistic graph model that represents graphs as overlapping generalized communities by embedding nodes into a code space with a learned prior and using a new decoder based on the Lorentz inner product.