arxiv compressed, 2024-09-20

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-20, generated by the compressor, my personal LLM-based project.


Gender Representation and Bias in Indian Civil Service Mock Interviews

http://arxiv.org/abs/2409.12194v1

Compressor summary: The paper shows gender bias in mock interview questions for Indian civil service candidates and in language model explanations, and introduces a new dataset for social science research.


Vista3D: Unravel the 3D Darkside of a Single Image

http://arxiv.org/abs/2409.12193v1

Compressor summary: Vista3D is a framework that quickly generates 3D models from images using Gaussian Splatting, Signed Distance Functions, and disentangled representations.


Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

http://arxiv.org/abs/2409.12191v1

Compressor summary: The Qwen2-VL Series is an advanced visual processing model that can dynamically adapt to different image resolutions and fuse multimodal information, achieving competitive performance in various benchmarks.


Massively Multi-Person 3D Human Motion Forecasting with Scene Context

http://arxiv.org/abs/2409.12189v1

Compressor summary: Key points:
- The paper proposes a scene-aware social transformer model (SAST) to forecast long-term human motion
- SAST can handle varying numbers of people and objects in a scene
- SAST uses denoising diffusion models to generate realistic and diverse motion
- SAST performs better than other approaches on the Humans in Kitchens dataset
Summary: The paper introduces SAST, a model that can forecast long-term human motion in scenes with different numbers of people and objects, using denoising diffusion models for realism and diversity.


Qwen2.5-Coder Technical Report

http://arxiv.org/abs/2409.12186v1

Compressor summary: The report introduces Qwen2.5-Coder, a new code generation model that outperforms larger models and excels at various code-related tasks.


To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

http://arxiv.org/abs/2409.12183v1

Compressor summary: Key points:
- Chain-of-thought (CoT) helps large language models reason on tasks involving math or logic, but not much on other types of tasks.
- CoT improves symbolic execution, but is outperformed by symbolic solvers.
- CoT can be applied selectively to save costs and improve performance.
- There is a need for new paradigms that better leverage intermediate computation.
Summary: CoT boosts math and logic reasoning in LLMs, but is not always needed and can be improved by symbolic solvers and new paradigms.


A Controlled Study on Long Context Extension and Generalization in LLMs

http://arxiv.org/abs/2409.12181v1

Compressor summary: The authors propose a controlled protocol to compare different methods for extending language models to handle long contexts, and find that perplexity is a good indicator, approximate attention underperforms, and exact fine-tuning works well within its range.


Finetuning Language Models to Emit Linguistic Expressions of Uncertainty

http://arxiv.org/abs/2409.12180v1

Compressor summary: The authors study how to improve language models' ability to express uncertainty, which can help users better judge their reliability in information-seeking and decision-making tasks.


You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL

http://arxiv.org/abs/2409.12172v1

Compressor summary: YORO is a text-to-SQL model that uses database knowledge during training and eliminates schema encoding, leading to shorter inputs and competitive performance on benchmarks.


Precise Forecasting of Sky Images Using Spatial Warping

http://arxiv.org/abs/2409.12162v1

Compressor summary: The text describes a deep learning method that predicts higher resolution future sky images to improve solar irradiance forecasting, especially for cloud movement near the horizon.


JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation

http://arxiv.org/abs/2409.12156v1

Compressor summary: The paper proposes a NeRF-based network for joint audio and expression guided talking face generation that learns disentangled representations using self-supervised and contrastive learning techniques.


Abductive explanations of classifiers under constraints: Complexity and properties

http://arxiv.org/abs/2409.12154v1

Compressor summary: The paper proposes new types of explanations for classifier decisions that consider constraints between features, reducing redundant explanations and providing coverage information.


MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning

http://arxiv.org/abs/2409.12147v1

Compressor summary: MAgICoRe improves LLM reasoning by categorizing problem difficulty, incorporating external reward models for error localization, and using a multi-agent loop to refine solutions iteratively.


MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion

http://arxiv.org/abs/2409.12140v1

Compressor summary: MoRAG is a new text-based human motion generation method that uses improved motion retrieval and multi-part fusion to create diverse and realistic motions for various text descriptions.


GRIN: GRadient-INformed MoE

http://arxiv.org/abs/2409.12136v1

Compressor summary: The paper introduces GRIN, a method to train large Mixture-of-Experts models more effectively by estimating sparse gradients for expert routing and configuring model parallelism.


Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

http://arxiv.org/abs/2409.12135v1

Compressor summary: The paper shows that linear TD learning in reinforcement learning converges without requiring linearly independent features or other assumptions, using a new characterization of its mean ODE.


BERT-VBD: Vietnamese Multi-Document Summarization Framework

http://arxiv.org/abs/2409.12134v1

Compressor summary: The paper proposes a novel Vietnamese multi-document summarization (MDS) framework that combines extractive and abstractive summarization methods using BERT embeddings and the VBD-LLaMA2-7B-50b model, achieving better performance than existing approaches.


Linguini: A benchmark for language-agnostic linguistic reasoning

http://arxiv.org/abs/2409.12126v1

Compressor summary: The paper introduces a new benchmark to test language models' linguistic reasoning across low-resource languages, showing a performance gap between open and closed models.


Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

http://arxiv.org/abs/2409.12122v1

Compressor summary: The report introduces large language models for math with self-improvement features that enhance their reasoning capabilities in Chinese and English.


Stronger Baseline Models -- A Key Requirement for Aligning Machine Learning Research with Clinical Utility

http://arxiv.org/abs/2409.12116v1

Compressor summary: This paper argues for using stronger baseline models in healthcare ML evaluations to improve model transparency, address data silos, and simplify metrics, enabling better deployment of ML models in clinical settings.


Pareto Data Framework: Steps Towards Resource-Efficient Decision Making Using Minimum Viable Data (MVD)

http://arxiv.org/abs/2409.12112v1

Compressor summary: The Pareto Data Framework helps optimize efficiency in resource-constrained environments by selecting Minimum Viable Data for machine learning applications on IoT devices.


SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba

http://arxiv.org/abs/2409.12108v1

Compressor summary: SPRMamba is a novel framework for recognizing different phases of endoscopic submucosal dissection, a minimally invasive procedure, using Mamba-based long-term temporal modeling with a Scaled Residual TranMamba block and a Temporal Sample Strategy to improve precision and efficiency.


Measuring Human and AI Values based on Generative Psychometrics with Large Language Models

http://arxiv.org/abs/2409.12106v1

Compressor summary: Generative Psychometrics for Values (GPV) is a data-driven method based on large language models that measures human and AI values from text inputs, overcoming limitations of previous approaches.


Symmetry-Enriched Learning: A Category-Theoretic Framework for Robust Machine Learning Models

http://arxiv.org/abs/2409.12100v1

Compressor summary: The text introduces a new framework for machine learning using higher-order symmetries and category theory to improve model robustness, generalization, and convergence.


Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance

http://arxiv.org/abs/2409.12099v1

Compressor summary: Brain-Streams uses textual and visual guidance from fMRI signals to generate more detailed and semantically plausible images.


Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval

http://arxiv.org/abs/2409.12097v1

Compressor summary: The paper introduces a new neural retriever architecture that uses multilingual language models to efficiently match job proposals with freelancers based on their skills.


The Impact of Element Ordering on LM Agent Performance

http://arxiv.org/abs/2409.12089v1

Compressor summary: Element ordering greatly affects language model agents' performance in virtual environments, especially when deriving elements from pixels.


Towards Interpretable End-Stage Renal Disease (ESRD) Prediction: Utilizing Administrative Claims Data with Explainable AI Techniques

http://arxiv.org/abs/2409.12087v1

Compressor summary: The study uses machine learning and deep learning to predict kidney disease progression from insurance claims data, finding that a long short-term memory model outperforms existing methods and can be explained at the patient level.


Unsupervised Domain Adaptation Via Data Pruning

http://arxiv.org/abs/2409.12076v1

Compressor summary: AdaPrune is a novel method for unsupervised domain adaptation that removes training examples to align the training distribution to the target data using maximum mean discrepancy, achieving better performance than related techniques on bioacoustic event detection.


PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models

http://arxiv.org/abs/2409.12060v1

Compressor summary: Paraphrasus is a benchmark for evaluating paraphrase detection models in a more nuanced way than previous methods, revealing trade-offs that were not visible before.


Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking

http://arxiv.org/abs/2409.12059v1

Compressor summary: The paper proposes a new language model architecture called TaS that uses a thinking layer to generate more reasonable responses based on prompt-response samples.


Extended Deep Submodular Functions

http://arxiv.org/abs/2409.12053v1

Compressor summary: EDSFs are neural network-representable set functions that extend Deep Submodular Functions, representing all monotone submodular functions and having better generalization error than DSFs.


Using Large Language Models to Generate Clinical Trial Tables and Figures

http://arxiv.org/abs/2409.12046v1

Compressor summary: The study explores using large language models to automate the creation of clinical trial data summary tables and figures, showing their potential in this domain.


Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning

http://arxiv.org/abs/2409.12045v1

Compressor summary: The paper proposes a method that combines safe exploration with learnable constraints for reinforcement learning in real-world robots, improving both performance and safety.


ASR Benchmarking: Need for a More Representative Conversational Dataset

http://arxiv.org/abs/2409.12042v1

Compressor summary: The authors introduce a new dataset of phone conversations to test how well automatic speech recognition systems perform in real-world, unstructured settings with disfluencies and diverse accents.


SFDA-rPPG: Source-Free Domain Adaptive Remote Physiological Measurement with Spatio-Temporal Consistency

http://arxiv.org/abs/2409.12040v1

Compressor summary: Key points:
- rPPG uses facial video to measure blood volume
- Traditional rPPG models need access to source and target domains, which may not be possible or privacy-friendly
- The paper proposes SFDA-rPPG, a benchmark that enables domain adaptation without source data
- The method uses TSTC-Net and FWD loss to enhance feature consistency and distribution alignment across domains
Summary: The paper introduces SFDA-rPPG, a source-free domain adaptation benchmark for rPPG measurement using facial video. It uses a novel FWD loss and a spatio-temporal consistency network to align features and distributions across domains effectively.


A Unified Framework for Neural Computation and Learning Over Time

http://arxiv.org/abs/2409.12038v1

Compressor summary: The paper introduces Hamiltonian Learning, a novel framework for online neural network learning using differential equations from optimal control theory, which offers flexibility, generalization, and integration without external solvers.


Multi-Sensor Deep Learning for Glacier Mapping

http://arxiv.org/abs/2409.12034v1

Compressor summary: This chapter reviews how combining multi-sensor remote sensing data and deep learning allows for better mapping of glaciers and detecting their changes, particularly for challenging cases like debris-covered and calving glaciers.


Topological Deep Learning with State-Space Models: A Mamba Approach for Simplicial Complexes

http://arxiv.org/abs/2409.12033v1

Compressor summary: The paper proposes a new architecture using the Mamba state-space model to handle graph data with higher-order interactions in simplicial complexes, avoiding message-passing mechanisms.


PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba

http://arxiv.org/abs/2409.12031v1

Compressor summary: PhysMamba uses Mamba, a state space model, to capture long-range physiological dependencies from facial videos for remote photoplethysmography measurement.


On Vision Transformers for Classification Tasks in Side-Scan Sonar Imagery

http://arxiv.org/abs/2409.12026v1

Compressor summary: The paper compares the performance of ViT models and CNN architectures for binary classification tasks in side-scan sonar (SSS) imagery, with ViTs showing better results but requiring more resources.


Computational Imaging for Long-Term Prediction of Solar Irradiance

http://arxiv.org/abs/2409.12016v1

Compressor summary: The text describes a new system and algorithm that can forecast cloud movement and solar power generation more accurately by using high-resolution wide-angle images and spatio-temporal slices of data.


BRDF-NeRF: Neural Radiance Fields with Optical Satellite Images and BRDF Modelling

http://arxiv.org/abs/2409.12014v1

Compressor summary: BRDF-NeRF is a machine learning technique that estimates the RPV model to analyze anisotropic reflectance of complex Earth surfaces from satellite imagery.


Mixture of Prompt Learning for Vision Language Models

http://arxiv.org/abs/2409.12011v1

Compressor summary: The paper proposes a mixture of soft prompt learning method with a routing module that selects the best prompts for each instance, and introduces semantically grouped text-level supervision to preserve initial knowledge and reduce overfitting.


ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation

http://arxiv.org/abs/2409.12010v1

Compressor summary: ChefFusion is a novel multimodal food computing foundation model that leverages large language models and pre-trained image encoder and decoder models to perform various food computing tasks, such as understanding, recognition, generation, and translation.


Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning

http://arxiv.org/abs/2409.12001v1

Compressor summary: The paper highlights the lack of data quality and standardization in offline multi-agent reinforcement learning research and proposes guidelines for dataset generation, dataset standardization, and analysis tools to improve the field.


Unraveling the Hessian: A Key to Smooth Convergence in Loss Function Landscapes

http://arxiv.org/abs/2409.11995v1

Compressor summary: The paper studies how the loss landscape of neural networks changes with more samples, providing bounds and empirical evidence for convergence in image classification tasks.


An Efficient Model-Agnostic Approach for Uncertainty Estimation in Data-Restricted Pedometric Applications

http://arxiv.org/abs/2409.11985v1

Compressor summary: The paper proposes a new way to estimate uncertainty in soil property prediction by transforming regression into classification problems, using existing machine learning algorithms and showing its advantages over common pedometric models on German agricultural data.


Intraoperative Registration by Cross-Modal Inverse Neural Rendering

http://arxiv.org/abs/2409.11983v1

Compressor summary: The paper proposes a new 3D/2D intraoperative registration method during neurosurgery using a Neural Radiance Field controlled by a multi-style hypernetwork, which outperforms current methods and meets clinical standards.


Sampling Latent Material-Property Information From LLM-Derived Embedding Representations

http://arxiv.org/abs/2409.11971v1

Compressor summary: Large language models can create material embeddings from literature, but require optimal context and comparisons for accurate property predictions.


Efficacy of Synthetic Data as a Benchmark

http://arxiv.org/abs/2409.11968v1

Compressor summary: Synthetic data generated by large language models can effectively benchmark simpler NLP tasks but introduces biases for complex ones, and using multiple models reduces these biases.


A Chinese Continuous Sign Language Dataset Based on Complex Environments

http://arxiv.org/abs/2409.11960v1

Compressor summary: The paper introduces a new dataset for Chinese sign language recognition in diverse real-life scenarios and proposes a time-frequency network model that performs well under complex backgrounds.


Tracking Any Point with Frame-Event Fusion Network at High Frame Rate

http://arxiv.org/abs/2409.11953v1

Compressor summary: The paper proposes FE-TAP, a point tracker that combines image frames and events to achieve high frame rate and robust tracking in various challenging conditions.


GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations

http://arxiv.org/abs/2409.11951v1

Compressor summary: Key points:
- The paper proposes a new method to generate realistic and dynamic human head avatars from multi-view imagery in real-time.
- The method uses a hierarchical representation of head models that captures complex facial expressions and head movements.
- The method is trained end-to-end with video inputs and can synthesize novel views of challenging facial expressions.
Summary: The paper presents a real-time method to create dynamic human head avatars from multi-view images using a hierarchical model that learns from video inputs and generalizes to new facial expressions and poses.


Differentiable Collision-Supervised Tooth Arrangement Network with a Decoupling Perspective

http://arxiv.org/abs/2409.11937v1

Compressor summary: The paper proposes DTAN, a method for digital orthodontic planning that decouples predicting tasks and feature modeling, improves hidden features learning, and uses a differentiable collision loss function for point cloud data.


Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling

http://arxiv.org/abs/2409.11933v1

Compressor summary: Key points:
- RL with heuristic methods solves optimization problems by learning from search process data
- RL agent improves suboptimal solution by applying small changes
- Approach uses Transformer encoding and probability matrix to swap jobs
- Outperforms other heuristics on real production scheduling problem
Summary: The paper proposes a novel RL-based heuristic method that uses Transformer encoding and job swapping to optimize production scheduling, showing better results than existing methods.


An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction

http://arxiv.org/abs/2409.11929v1

Compressor summary: The study uses machine learning to classify road accident outcomes and identify key factors affecting fatality risk in Dhaka, Bangladesh.


Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models

http://arxiv.org/abs/2409.11920v1

Compressor summary: Key points:
- Paper proposes a method to generate realistic 3D human motions for unseen action classes using GPT models and diffusion models
- Method decomposes complex actions into simpler movements and recombines them
- Method can be integrated with any pre-trained diffusion model and outperforms the state-of-the-art
Summary: The paper presents a method that uses GPTs and diffusion models to synthesize realistic 3D human motions for complex actions by breaking them down into simpler movements and combining them.


LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Foundation Models

http://arxiv.org/abs/2409.11919v1

Compressor summary: The paper proposes LLM-wrapper, a black-box method that uses large language models to improve VLMs' zero-shot capabilities on the referring expression comprehension (REC) task.


LLMs in Education: Novel Perspectives, Challenges, and Opportunities

http://arxiv.org/abs/2409.11917v1

Compressor summary: This tutorial explores the use of large language models (LLMs) in education, covering their impact on teaching, learning, and assessment across four applications: reading, writing, speaking, and intelligent tutoring systems.


Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation

http://arxiv.org/abs/2409.11904v1

Compressor summary: The paper proposes an efficient annotation framework using Rapidata's technology to collect human feedback and rank text-to-image models based on style, coherence, and alignment.


Less Memory Means smaller GPUs: Backpropagation with Compressed Activations

http://arxiv.org/abs/2409.11902v1

Compressor summary: The authors propose using pooling to compress activation maps in the backward pass of deep neural networks, reducing memory footprint and data movement, while maintaining prediction accuracy.


LLMs + Persona-Plug = Personalized LLMs

http://arxiv.org/abs/2409.11901v1

Compressor summary: Key points:
- Personalization is important for language tasks and applications
- Existing methods have drawbacks such as high cost or low fidelity
- Proposed model uses a user-specific embedding to improve personalization without fine-tuning
- Model shows better performance on LaMP benchmark
Summary: The paper proposes a novel personalized LLM that uses a user-specific embedding to capture the user's habits and preferences, improving language tasks without fine-tuning. The model outperforms existing methods on various tasks.


Multi-Grid Graph Neural Networks with Self-Attention for Computational Mechanics

http://arxiv.org/abs/2409.11899v1

Compressor summary: The paper presents a new GNN model combining Self-Attention and Message Passing, a dynamic mesh pruning method, a self-supervised training technique based on BERT, and a large dataset for CFD using finite element methods.


DocMamba: Efficient Document Pre-training with State Space Model

http://arxiv.org/abs/2409.11887v1

Compressor summary: DocMamba is a new framework that uses the state space model to understand visually-rich documents more efficiently and effectively than existing transformer-based models.


Recent Advances in OOD Detection: Problems and Approaches

http://arxiv.org/abs/2409.11884v1

Compressor summary: This paper reviews recent OOD detection methods from different problem scenarios, such as test-time adaptation and multi-modal data, and provides a new taxonomy for the field.


ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images

http://arxiv.org/abs/2409.11874v1

Compressor summary: Key points:
- The paper introduces a novel evaluation matrix for text and typography accuracy in AI-generated images
- It uses letter-by-letter matching and brevity adjustment to handle errors and redundancies
- It analyzes the impact of frequently and less frequently used words on text generation quality
- It aims to make AI-generated images more meaningful and democratize the graphic design industry
Summary: The paper presents a new way to measure how well AI can generate accurate and stylish text in images, using letter matching and word adjustment techniques.


SpheriGait: Enriching Spatial Representation via Spherical Projection for LiDAR-based Gait Recognition

http://arxiv.org/abs/2409.11869v1

Compressor summary: SpheriGait is a novel method that uses spherical projection and a special network block to improve gait recognition from 3D point clouds captured by LiDAR sensors.


Location based Probabilistic Load Forecasting of EV Charging Sites: Deep Transfer Learning with Multi-Quantile Temporal Convolutional Network

http://arxiv.org/abs/2409.11862v1

Compressor summary: Key points:
- The text is about a deep learning model (MQ-TCN) for EV charging demand forecasting
- The model is location-based, adaptive, and cost-effective
- It outperforms previous models on different charging sites with diverse user types
Summary: The text introduces a new deep learning model that can accurately predict the electric vehicle charging demand at different locations using limited data.


Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers

http://arxiv.org/abs/2409.11859v1

Compressor summary: The paper proposes a new upper bound for the spectral norm of the Jacobian matrix in CNNs, which is independent of the input image resolution and can be efficiently calculated during training, improving their performance.


Edge-Based Graph Component Pooling

http://arxiv.org/abs/2409.11856v1

Compressor summary: Key points:
- The paper proposes a new pooling operator for graph neural networks that is simple, cheap, and preserves data
- The proposed operator improves performance on four benchmark datasets and reduces complexity and parameters compared to existing methods
Summary: The paper introduces a novel pooling operator for graph neural networks that balances simplicity, efficiency, and data fidelity, and shows its advantages over existing methods on various datasets.


An efficient wavelet-based physics-informed neural networks for singularly perturbed problems

http://arxiv.org/abs/2409.11847v1

Compressor summary: W-PINNs are a wavelet-based neural network model that efficiently solves singularly perturbed differential equations by representing the solution in wavelet space and searching for it within this space during training.


MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts

http://arxiv.org/abs/2409.11844v1

Compressor summary: MEOW is a gradient descent-based unlearning method that uses inverted facts to selectively forget sensitive information from LLMs without compromising their utility or performance on other tasks.


Graph Neural Network-State Predictive Information Bottleneck (GNN-SPIB) approach for learning molecular thermodynamics and kinetics

http://arxiv.org/abs/2409.11843v1

Compressor summary: The Graph Neural Network-State Predictive Information Bottleneck framework uses atomic coordinates to learn low-dimensional representations and predict structural, thermodynamic, and kinetic information for slow processes in molecular dynamics simulations.


Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework

http://arxiv.org/abs/2409.11827v1

Compressor summary: ExtAbs is a new method that combines extractive and abstractive summarization in one model using a saliency mask to focus on important parts of the input.


Optimizing Job Shop Scheduling in the Furniture Industry: A Reinforcement Learning Approach Considering Machine Setup, Batch Variability, and Intralogistics

http://arxiv.org/abs/2409.11820v1

Compressor summary: The paper proposes using Deep Reinforcement Learning to improve job shop scheduling in the furniture industry by accounting for machine setup, batch variability, and intralogistics, and by integrating with ERP and Manufacturing Execution Systems.


EFCM: Efficient Fine-tuning on Compressed Models for deployment of large models in medical image analysis

http://arxiv.org/abs/2409.11817v1

Compressor summary: The EFCM framework combines unsupervised feature distillation with slide-level fine-tuning to improve accuracy and efficiency in medical image analysis using large models, overcoming challenges like memory and inference latency.


SymFace: Additional Facial Symmetry Loss for Deep Face Recognition

http://arxiv.org/abs/2409.11816v1

Compressor summary: This paper proposes a symmetry-based loss function for face verification that leverages facial symmetry to improve the reliability of face embeddings and outperforms existing methods.


EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning

http://arxiv.org/abs/2409.11813v1

Compressor summary: EventAug is a systematic augmentation scheme that enhances spatial-temporal diversity in event camera data, improving model robustness and performance across various tasks and backbones.


Constraint Guided AutoEncoders for Joint Optimization of Condition Indicator Estimation and Anomaly Detection in Machine Condition Monitoring

http://arxiv.org/abs/2409.11807v1

Compressor summary: This paper proposes an extension to Constraint Guided AutoEncoders for machine condition monitoring, which can handle both anomaly detection and condition indicator estimation with improved monotonic behavior.


Latent fingerprint enhancement for accurate minutiae detection

http://arxiv.org/abs/2409.11802v1

Compressor summary: The paper proposes a new method using GANs to enhance latent fingerprints by preserving local and structural features, improving identification performance for forensic investigations.


The Factuality of Large Language Models in the Legal Domain

http://arxiv.org/abs/2409.11798v1

Compressor summary: The paper tests the effectiveness of large language models as knowledge bases in the legal domain by evaluating them on a dataset of factual questions about case law and legislation using various methods and strategies.


Efficient Low-Resolution Face Recognition via Bridge Distillation

http://arxiv.org/abs/2409.11786v1

Compressor summary: The paper proposes a method to train a light-weight face recognition model that can work well with low-resolution faces by transferring knowledge from high-resolution models using two-step distillation.


Distilling Channels for Efficient Deep Tracking

http://arxiv.org/abs/2409.11785v1

Compressor summary: Channel distillation is a novel framework that improves deep trackers by adaptively selecting informative feature channels for efficient object tracking with high accuracy, speed, and low memory requirements.


Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources

http://arxiv.org/abs/2409.11783v1

Compressor summary: The authors develop and evaluate a medical adaptation of a 7B language model that can operate on low computational resources, achieving comparable or better performance than larger medical models on question-answering tasks in Japanese and English.


Knowledge Adaptation Network for Few-Shot Class-Incremental Learning

http://arxiv.org/abs/2409.11770v1

Compressor summary: The paper proposes KANet, a method to improve few-shot class-incremental learning by using CLIP as a network pedestal and a Knowledge Adapter module that fuses data-specific knowledge into the general representation.


Consistent Estimation of a Class of Distances Between Covariance Matrices

http://arxiv.org/abs/2409.11761v1

Compressor summary: The paper proposes a consistent estimator for distance between covariance matrices based on traces of functions applied separately to each matrix, and shows its asymptotic Gaussianity and superior performance over conventional methods.


Synthesizing Evolving Symbolic Representations for Autonomous Systems

http://arxiv.org/abs/2409.11756v1

Compressor summary: The text describes a new AI architecture that uses intrinsic motivation and classical planning to learn and plan without specific goals, integrating low-level and high-level representations in a self-directed exploration process.


Neural Encoding for Image Recall: Human-Like Memory

http://arxiv.org/abs/2409.11750v1

Compressor summary: The paper presents a method inspired by human memory to improve image recall in artificial systems, achieving high accuracy with natural images but lower performance with textures.


RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework

http://arxiv.org/abs/2409.11749v1

Compressor summary: RockTrack is a 3D MOT method that improves multi-camera tracker versatility and performance by using a confidence-guided preprocessing module, a fusion association module, and a novel appearance similarity metric.


Exploring Gaze Pattern in Autistic Children: Clustering, Visualization, and Prediction

http://arxiv.org/abs/2409.11744v1

Compressor summary: The paper proposes a novel method using clustering algorithms and machine learning models to automatically analyze gaze patterns in children and predict autism spectrum disorder (ASD), achieving state-of-the-art results.


HARP: Human-Assisted Regrouping with Permutation Invariant Critic for Multi-Agent Reinforcement Learning

http://arxiv.org/abs/2409.11741v1

Compressor summary: HARP is a multi-agent reinforcement learning framework that integrates human assistance with agent regrouping to improve group-oriented task completion and allow non-experts to provide valuable guidance.


InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models

http://arxiv.org/abs/2409.11734v1

Compressor summary: GEO is a versatile image editing technique that combines text and image prompts with a novel geometric loss to achieve precise and diverse editing outcomes without training.


Human-like Affective Cognition in Foundation Models

http://arxiv.org/abs/2409.11733v1

Compressor summary: The text introduces an evaluation framework for testing affective cognition in AI models, which shows they can understand emotions and their influence on humans as well as or better than humans themselves.


Enabling Real-Time Conversations with Minimal Training Costs

http://arxiv.org/abs/2409.11727v1

Compressor summary: The paper proposes a new duplex decoding approach for LLMs that improves real-time interactive feedback with minimal additional training.


Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing

http://arxiv.org/abs/2409.11726v1

Compressor summary: The paper introduces a probing dataset to assess large language models' error detection skills in role-playing and proposes a self-reflective reasoning method to improve them.


TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning

http://arxiv.org/abs/2409.11724v1

Compressor summary: TART is a framework that combines large language models with specialized tools to improve table reasoning tasks such as table question answering (TQA) and table-based fact verification (TFV), using a new dataset and achieving competitive results.


Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression

http://arxiv.org/abs/2409.11718v1

Compressor summary: The text proposes a method for compressing videos with rich semantics by using various visual foundation models (VFMs) and a dynamic trajectory-based inter-frame compression scheme, resulting in improved efficiency and performance on several tasks and datasets.


From Lists to Emojis: How Format Bias Affects Model Alignment

http://arxiv.org/abs/2409.11704v1

Compressor summary: The paper investigates various format biases in reinforcement learning from human feedback and shows their impact on preference models, reward models, and alignment algorithms.


Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation

http://arxiv.org/abs/2409.11703v1

Compressor summary: The paper introduces a system that uses large language models for simplifying software interactions by classifying natural language inputs into API calls and generating sample datasets for evaluating LLMs in API management.


Monomial Matrix Group Equivariant Neural Functional Networks

http://arxiv.org/abs/2409.11697v1

Compressor summary: The paper introduces Monomial Matrix Group Equivariant Neural Functional Networks (Monomial-NFN), which use scaling/sign-flipping symmetries to reduce trainable parameters and improve efficiency in neural networks.


ORB-SfMLearner: ORB-Guided Self-supervised Visual Odometry with Selective Online Adaptation

http://arxiv.org/abs/2409.11692v1

Compressor summary: ORB-SfMLearner is a novel visual odometry method that uses ORB features for learning-based ego-motion estimation, a cross-attention mechanism for explainability, and selective online adaptation for generalizability, achieving better accuracy than previous methods.


GUNet: A Graph Convolutional Network United Diffusion Model for Stable and Diversity Pose Generation

http://arxiv.org/abs/2409.11689v1

Compressor summary: PoseDiffusion is a framework that uses a diffusion model with GUNet to generate diverse, correct, and aesthetically pleasing human pose skeletons based on natural language inputs.


Detecting Underdiagnosed Medical Conditions with Deep Learning-Based Opportunistic CT Imaging

http://arxiv.org/abs/2409.11686v1

Compressor summary: The study uses deep learning to improve opportunistic CT scans for diagnosing conditions like sarcopenia, hepatic steatosis, and ascites, which are often underdiagnosed and poorly documented.


Recurrent Interpolants for Probabilistic Time Series Prediction

http://arxiv.org/abs/2409.11684v1

Compressor summary: Key points:
- Sequential models like RNNs or transformers are widely used for probabilistic time series forecasting but struggle with high-dimensional complex distributions and cross-feature dependencies
- Diffusion or flow-based models can address these issues by generative modeling, but they face scalability challenges
- The proposed approach combines the strengths of recurrent neural networks and diffusion models for efficient and high-quality probabilistic time series forecasting
Summary: The paper proposes a new method that combines RNNs and diffusion models to improve probabilistic time series forecasting, overcoming the limitations of both approaches in handling complex distributions and dependencies.


SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation

http://arxiv.org/abs/2409.11682v1

Compressor summary: SRIF is a novel framework that uses diffusion-based image morphing, large vision models, and normalizing flow to register and interpolate shapes with semantic information.


Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network

http://arxiv.org/abs/2409.11677v1

Compressor summary: The paper introduces HDR, a large and diverse dataset for hierarchical mathematical expression recognition (MER), and proposes HDNet, a model that improves MER performance by handling formula details at different levels.


Towards Explainable Goal Recognition Using Weight of Evidence (WoE): A Human-Centered Approach

http://arxiv.org/abs/2409.11675v1

Compressor summary: The eXplainable Goal Recognition (XGR) model generates explanations for "why" and "why not" questions about an agent's goals, improving human understanding, trust, and decision-making when collaborating with AI agents.


RUIE: Retrieval-based Unified Information Extraction using Large Language Model

http://arxiv.org/abs/2409.11673v1

Compressor summary: RUIE is a framework that uses in-context learning and retrieval to improve unified information extraction tasks, reducing computational costs and enabling generalization to new tasks.


Anticipating Oblivious Opponents in Stochastic Games

http://arxiv.org/abs/2409.11671v1

Compressor summary: Key points:
- The text presents an approach to anticipate actions and policies of oblivious environments in concurrent stochastic games
- The approach uses a finite information state machine that maps states to belief states about the environment's policy
- The approach ensures consistency of the belief states using a distance metric
- The approach yields an MDP for computing optimal policies and shows experimental results on human activity data
Summary: The text describes how to use a consistent finite information state machine to anticipate and optimize actions in concurrent stochastic games with oblivious environments, and demonstrates its effectiveness on real-world tasks.


Agent Aggregator with Mask Denoise Mechanism for Histopathology Whole Slide Image Analysis

http://arxiv.org/abs/2409.11664v1

Compressor summary: AMD-MIL is a novel weakly supervised learning method for whole slide image classification that uses agent aggregation and mask denoise mechanisms to improve attention allocation, capture fine details, and enhance interpretability.


VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer

http://arxiv.org/abs/2409.11656v1

Compressor summary: The VL-Reader approach combines masked autoencoding with a novel reconstruction and decoder objective to achieve high accuracy in scene text recognition by effectively modeling visual and linguistic information.


Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection

http://arxiv.org/abs/2409.11653v1

Compressor summary: RDSS is a sample selection method for semi-supervised learning (SSL) that uses a modified Frank-Wolfe algorithm to select representative and diverse samples from unlabeled data for annotation, improving performance under low-budget settings.


Relax DARTS: Relaxing the Constraints of Differentiable Architecture Search for Eye Movement Recognition

http://arxiv.org/abs/2409.11652v1

Compressor summary: Relax DARTS is a method to improve eye movement recognition using automated network search and flexible input selection, achieving state-of-the-art results and adapting to other tasks.


Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview

http://arxiv.org/abs/2409.11650v1

Compressor summary: The paper reviews quantization techniques for large neural networks to reduce size, improve efficiency, and address environmental concerns without sacrificing accuracy.


DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion

http://arxiv.org/abs/2409.11642v1

Compressor summary: The paper proposes DAF-Net, a dual-branch feature decomposition fusion network that uses multi-kernel maximum mean discrepancy (MK-MMD) for domain adaptation, aligning visible and infrared image features and improving fusion quality.


Enhancing PM2.5 Data Imputation and Prediction in Air Quality Monitoring Networks Using a KNN-SINDy Hybrid Model

http://arxiv.org/abs/2409.11640v1

Compressor summary: The study compares three methods for filling in missing air pollution data to improve air quality management.


BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla

http://arxiv.org/abs/2409.11638v1

Compressor summary: The BanStereoSet dataset evaluates social biases in multilingual LLMs for the Bangla language, covering 9 categories of bias and aiming to guide the development of more equitable language technologies.


"A Woman is More Culturally Knowledgeable than A Man?": The Effect of Personas on Cultural Norm Interpretation in LLMs

http://arxiv.org/abs/2409.11636v1

Compressor summary: The study examines how large language models' understanding of social norms changes based on the assigned persona and finds variations across personas and sociodemographic categories.


PainDiffusion: Can robot express pain?

http://arxiv.org/abs/2409.11635v1

Compressor summary: PainDiffusion is a model that generates realistic facial expressions of pain in robots using diffusion methods, improving on previous autoregressive approaches and enabling better rehabilitation nurse training.


A Metric Hybrid Planning Approach to Solving Pandemic Planning Problems with Simple SIR Models

http://arxiv.org/abs/2409.11631v1

Compressor summary: The paper presents a model for pandemic mitigation using lockdowns and a planning method to solve it efficiently.


Multimodal Generalized Category Discovery

http://arxiv.org/abs/2409.11624v1

Compressor summary: The paper proposes MM-GCD, a multimodal category discovery method that aligns heterogeneous information across modalities using contrastive learning and distillation techniques.


PieClam: A Universal Graph Autoencoder Based on Overlapping Inclusive and Exclusive Communities

http://arxiv.org/abs/2409.11618v1

Compressor summary: PieClam is a probabilistic graph model that represents graphs as overlapping generalized communities by embedding nodes into a code space with a learned prior and using a new decoder based on the Lorentz inner product.