arxiv compressed, 2024-09-16

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-16 generated by the compressor, my personal LLM-based project.


INN-PAR: Invertible Neural Network for PPG to ABP Reconstruction

http://arxiv.org/abs/2409.09021v1

Compressor summary: The text introduces an invertible neural network for non-invasive blood pressure monitoring using photoplethysmography, which improves accuracy by capturing high-frequency details and learning features across multiple scales.
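As a rough illustration of why invertible networks suit signal reconstruction, here is a minimal affine coupling layer in the RealNVP style. The split sizes, the tiny linear sub-networks `s()` and `t()`, and the 1-D signal shape are illustrative assumptions, not INN-PAR's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
W_s = rng.normal(size=(4, 4)) * 0.1  # weights of the scale sub-network
W_t = rng.normal(size=(4, 4)) * 0.1  # weights of the translation sub-network

def s(x):  # scale sub-network (a tiny linear map + tanh, for illustration)
    return np.tanh(x @ W_s)

def t(x):  # translation sub-network
    return x @ W_t

def forward(x):
    x1, x2 = x[:4], x[4:]
    y2 = x2 * np.exp(s(x1)) + t(x1)   # transform one half conditioned on the other
    return np.concatenate([x1, y2])    # x1 passes through unchanged

def inverse(y):
    y1, y2 = y[:4], y[4:]
    x2 = (y2 - t(y1)) * np.exp(-s(y1))  # exact inverse, no approximation needed
    return np.concatenate([y1, x2])

x = rng.normal(size=8)
assert np.allclose(inverse(forward(x)), x)  # invertibility holds by construction
```

Because the inverse is exact by construction, no information is lost between the two signal domains, which is the property such reconstruction models exploit.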


An Efficient and Streaming Audio Visual Active Speaker Detection System

http://arxiv.org/abs/2409.09018v1

Compressor summary: The paper proposes two methods to reduce latency and memory usage in real-time Active Speaker Detection systems, limiting future and past context frames, and shows they perform well compared to existing models.
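The "limit past and future context frames" idea can be sketched as a bounded sliding-window buffer: memory is capped by the past window, and latency by the number of future frames the model waits for. The class name and window sizes below are illustrative, not the paper's implementation.

```python
from collections import deque

class StreamingContext:
    """Bounded context for streaming frame-level detection (illustrative)."""

    def __init__(self, past=8, future=2):
        self.future = future
        # keep only past + current + future frames; older frames are dropped
        self.buffer = deque(maxlen=past + 1 + future)

    def push(self, frame):
        """Feed one frame; returns (center_frame, context) once enough
        future frames have arrived, else None."""
        self.buffer.append(frame)
        if len(self.buffer) >= self.future + 1:
            center = self.buffer[-(self.future + 1)]
            return center, list(self.buffer)
        return None

ctx = StreamingContext(past=3, future=1)
outputs = [ctx.push(t) for t in range(6)]
# the first decision is emitted only after one future frame is available
assert outputs[0] is None and outputs[1] == (0, [0, 1])
```

Shrinking `future` trades accuracy for lower output latency, and shrinking `past` bounds memory, which is the trade-off the paper studies.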


AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

http://arxiv.org/abs/2409.09013v1

Compressor summary: AI-LieDar is a framework to study how large language models navigate situations where being truthful conflicts with achieving goals, showing that current models are often untruthful and hard to steer towards truthfulness.


Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach

http://arxiv.org/abs/2409.09009v1

Compressor summary: The proposed method uses retrieved examples to improve rare word translation accuracy in direct ST models, with speech-to-speech retrieval being the most effective and robust approach.


SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity

http://arxiv.org/abs/2409.09007v1

Compressor summary: The paper evaluates the need for multi-layer attention in graph Transformers, proposes a simplified single-layer version (SGFormer) that scales well and requires less data, and shows its effectiveness on large graphs.
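The "approximation-free linear complexity" in the title rests on a standard trick for softmax-free attention: by associativity, `(Q K^T) V` equals `Q (K^T V)`, so the N x N attention matrix never needs to be built. The sketch below shows only this identity; shapes and the missing normalization are illustrative, not SGFormer's exact formulation.

```python
import numpy as np

N, d = 1000, 16  # N nodes, d-dimensional features
rng = np.random.default_rng(0)
Q = rng.normal(size=(N, d))
K = rng.normal(size=(N, d))
V = rng.normal(size=(N, d))

# Quadratic form: materializes the N x N attention matrix.
out_quadratic = (Q @ K.T) @ V          # O(N^2 d) time, O(N^2) memory

# Linear form: associativity avoids the N x N matrix entirely.
out_linear = Q @ (K.T @ V)             # O(N d^2) time, O(d^2) memory

assert np.allclose(out_quadratic, out_linear)  # identical up to float rounding
```

For large graphs where N is in the millions and d is small, the second form is the difference between feasible and infeasible.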


E2MoCase: A Dataset for Emotional, Event and Moral Observations in News Articles on High-impact Legal Cases

http://arxiv.org/abs/2409.09001v1

Compressor summary: E2MoCase is a dataset that helps study how emotions, morals, and events in legal stories affect media coverage and public opinion.


PINNfluence: Influence Functions for Physics-Informed Neural Networks

http://arxiv.org/abs/2409.08958v1

Compressor summary: The text discusses using influence functions to improve interpretability and validate physics-informed neural networks in fluid flow problems.


Pushing the boundaries of event subsampling in event-based video classification using CNNs

http://arxiv.org/abs/2409.08953v1

Compressor summary: CNNs can still classify event-camera video after heavy event subsampling, albeit with reduced accuracy, but their training becomes more sensitive to hyperparameters in highly subsampled scenarios.


A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis

http://arxiv.org/abs/2409.08947v1

Compressor summary: The method creates relightable radiance fields from single-illumination data by using 2D diffusion model priors to augment the data and optimize appearance features for multi-view consistency.


DELTA: Dual Consistency Delving with Topological Uncertainty for Active Graph Domain Adaptation

http://arxiv.org/abs/2409.08946v1

Compressor summary: DELTA is a novel approach for active graph domain adaptation that selects informative nodes and uses two subnetworks to explore topological semantics, improving performance on target graphs.


SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records

http://arxiv.org/abs/2409.08936v1

Compressor summary: The SynSUM benchmark is a synthetic dataset for research on clinical information extraction and reasoning with tabular background variables and text.


Optimization and Generalization Guarantees for Weight Normalization

http://arxiv.org/abs/2409.08935v1

Compressor summary: This paper provides the first theory for optimizing and generalizing deep neural networks with weight normalization, showing how it affects convergence and uniformity in training and testing.


Latent Space Score-based Diffusion Model for Probabilistic Multivariate Time Series Imputation

http://arxiv.org/abs/2409.08917v1

Compressor summary: The Latent Space Score-Based Diffusion Model (LSSDM) is an unsupervised learning approach for probabilistic multivariate time series imputation that projects observed values onto a low-dimensional latent space, reconstructs coarse missing data, and uses a conditional diffusion model to obtain precise imputed values with uncertainty analysis.


Affective Computing Has Changed: The Foundation Model Disruption

http://arxiv.org/abs/2409.08907v1

Compressor summary: This paper explores the potential and challenges of using foundation models for affective computing, which involves generating and analysing multimodal data related to human emotions.


Exploring Action-Centric Representations Through the Lens of Rate-Distortion Theory

http://arxiv.org/abs/2409.08892v1

Compressor summary: The text discusses how efficient coding and rate-distortion theory can be used to understand action-oriented efficient representations in organisms' perception.


Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark

http://arxiv.org/abs/2409.08887v1

Compressor summary: The text introduces VLT-MI, a novel benchmark for visual language tracking with multi-modal interaction, which improves cognitive alignment and robustness of trackers by enabling multiple rounds of text and object updates during tracking.


Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing

http://arxiv.org/abs/2409.08885v1

Compressor summary:
Key points:
- Object detection in remote sensing imagery is challenging due to small and barely visible objects across diverse terrains
- Multimodal learning can integrate features from different data modalities to improve detection accuracy
- Masked Image Modeling (MIM) can be used as a pre-training technique for object detection using self-supervised learning on unlabeled data
- Conventional MIM such as MAE lacks contextual information and fine-grained details
- The paper proposes a new interactive MIM method that can establish interactions between different tokens, which is beneficial for object detection in remote sensing

Summary: The paper introduces an interactive Masked Image Modeling method to improve object detection in remote sensing imagery by leveraging self-supervised learning on unlabeled data and establishing interactions between different tokens.


Detect Fake with Fake: Leveraging Synthetic Data-driven Representation for Synthetic Image Detection

http://arxiv.org/abs/2409.08884v1

Compressor summary: Synthetic data helps detect fake images by training vision transformers with synthetic representation learners like SynCLR, outperforming CLIP on unseen GAN models.


Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages

http://arxiv.org/abs/2409.08872v1

Compressor summary: The study shows that using a data selection scheme to augment limited target language data improves automatic speech recognition for two endangered languages, Amis and Seediq.


Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies

http://arxiv.org/abs/2409.08864v1

Compressor summary: The study explores how using images alongside text improves large language models' ability to understand graphs.


Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control

http://arxiv.org/abs/2409.08861v1

Compressor summary: This paper proposes a new method for reward fine-tuning of dynamical generative models using stochastic optimal control, which improves their quality and generalization.


InstantDrag: Improving Interactivity in Drag-based Image Editing

http://arxiv.org/abs/2409.08857v1

Compressor summary: InstantDrag is an optimization-free method that enables fast, photo-realistic drag-based image editing without masks or text prompts using two networks that learn motion dynamics from real-world video datasets.


Using The Concept Hierarchy for Household Action Recognition

http://arxiv.org/abs/2409.08853v1

Compressor summary:


Kinect Calibration and Data Optimization For Anthropometric Parameters

http://arxiv.org/abs/2409.08847v1

Compressor summary: The text discusses a new method for calibrating and optimizing Microsoft Kinect sensors, which are widely used in 3D vision systems for applications such as medical and biometric fields.


AIPO: Improving Training Objective for Iterative Preference Optimization

http://arxiv.org/abs/2409.08845v1

Compressor summary: Agreement-aware Iterative Preference Optimization (AIPO) addresses the length exploitation issue in iterative preference optimization with synthetic data, achieving state-of-the-art results on various language model benchmarks.


Direct-CP: Directed Collaborative Perception for Connected and Autonomous Vehicles via Proactive Attention

http://arxiv.org/abs/2409.08840v1

Compressor summary: Direct-CP is a system that uses RSUs to help autonomous vehicles signal their interests and focus on important areas, improving their local perception accuracy in collaborative 3D object detection tasks.


Can Kans (re)discover predictive models for Direct-Drive Laser Fusion?

http://arxiv.org/abs/2409.08832v1

Compressor summary: The paper introduces Kolmogorov-Arnold Networks as a new method for machine learning in laser fusion, which improves prediction accuracy and interpretability compared to other approaches.


AutoIRT: Calibrating Item Response Theory Models with Automated Machine Learning

http://arxiv.org/abs/2409.08823v1

Compressor summary: The paper proposes a multistage fitting procedure that improves scoring accuracy and calibration for computerized adaptive tests using out-of-the-box AutoML tools.


A RAG Approach for Generating Competency Questions in Ontology Engineering

http://arxiv.org/abs/2409.08820v1

Compressor summary: The paper proposes a method to automatically generate competency questions (CQs) for ontology development using large language models (LLMs) and scientific papers as input, and evaluates its performance on two domain engineering tasks.


Your Weak LLM is Secretly a Strong Teacher for Alignment

http://arxiv.org/abs/2409.08813v1

Compressor summary: This paper shows how using a less powerful language model can generate effective feedback for aligning AI systems with human values and intentions, making alignment more scalable and sustainable.


TabKANet: Tabular Data Modelling with Kolmogorov-Arnold Network and Transformer

http://arxiv.org/abs/2409.08806v1

Compressor summary: The study introduces TabKANet, a Transformer-based model that uses Kolmogorov-Arnold network to encode and merge numerical and categorical features for tabular data, achieving excellent results in six binary classification tasks.


Exploring SSL Discrete Tokens for Multilingual ASR

http://arxiv.org/abs/2409.08805v1

Compressor summary: This paper compares discrete tokens from self-supervised learning models for speech recognition in multiple languages and scenarios, showing improved performance and efficiency over Fbank features.


Task-Specific Data Preparation for Deep Learning to Reconstruct Structures of Interest from Severely Truncated CBCT Data

http://arxiv.org/abs/2409.08800v1

Compressor summary: The text introduces a method to extend the field-of-view of CBCT systems using deep learning and improve their clinical applications, especially for reconstructing rib structures.


Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

http://arxiv.org/abs/2409.08797v1

Compressor summary: The paper shows that using discrete speech features from self-supervised learning in ASR systems improves performance, especially for cross-utterance contexts.


Optimizing Ingredient Substitution Using Large Language Models to Enhance Phytochemical Content in Recipes

http://arxiv.org/abs/2409.08792v1

Compressor summary: The study shows how large language models can help create recipes with more phytochemicals, potentially improving health, but cautions that these benefits need clinical validation.


Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling

http://arxiv.org/abs/2409.08788v1

Compressor summary: ECG-ReGen is a retrieval-based method that uses self-supervised learning and large language models to generate comprehensive reports and answer questions from electrocardiograms, potentially improving patient care.


Contactless Fingerprint Recognition Using 3D Graph Matching

http://arxiv.org/abs/2409.08782v1

Compressor summary: The paper proposes a novel contactless fingerprint recognition algorithm that captures the 3D feature of contactless fingerprints and improves matching accuracy across multiple poses.


Sign Language Sense Disambiguation

http://arxiv.org/abs/2409.08780v1

Compressor summary: The text describes a project that explores how to improve German sign language translation, especially for ambiguous words, by using different body-part representations in transformer models and evaluating their impact on performance.


In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting

http://arxiv.org/abs/2409.08771v1

Compressor summary: The paper proposes a distributed algorithm for low-rank matrix factorization, using power initialization to improve convergence rates and reduce communication overhead.


Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent

http://arxiv.org/abs/2409.08770v1

Compressor summary: The paper analyzes four mini-batch SGD schedulers and shows that increasing batch size and learning rate can improve performance and minimize the full gradient norm of the empirical loss faster.
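A scheduler in the spirit of the ones analyzed, growing both batch size and learning rate geometrically across training phases, might look like the sketch below. The growth factors, phase length, and function name are made-up illustrative values, not the paper's schedules.

```python
def make_schedule(b0=32, lr0=0.01, batch_growth=2.0, lr_growth=1.5,
                  epochs_per_phase=10, num_phases=4):
    """Build a phase-wise schedule where batch size and learning rate
    both increase geometrically from phase to phase."""
    schedule = []
    b, lr = float(b0), lr0
    for _ in range(num_phases):
        schedule.append({"epochs": epochs_per_phase,
                         "batch_size": int(b), "lr": lr})
        b *= batch_growth   # larger batches reduce gradient noise later on
        lr *= lr_growth     # larger steps exploit the cleaner gradients
    return schedule

for phase in make_schedule():
    print(phase)
```

Increasing the batch size lowers gradient-estimate variance in later phases, which is what lets the learning rate grow at the same time without destabilizing training.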


SAUC: Sparsity-Aware Uncertainty Calibration for Spatiotemporal Prediction with Graph Neural Networks

http://arxiv.org/abs/2409.08766v1

Compressor summary: The paper proposes SAUC, a novel framework that calibrates uncertainty in both zero and non-zero values for spatiotemporal prediction using probabilistic Graph Neural Networks and quantile approaches.


Online Network Inference from Graph-Stationary Signals with Hidden Nodes

http://arxiv.org/abs/2409.08760v1

Compressor summary: The paper proposes a new method for estimating unknown graph connectivity from incomplete and streaming data using stationary signals and a convex optimization problem.


Uncertainty Estimation by Density Aware Evidential Deep Learning

http://arxiv.org/abs/2409.08754v1

Compressor summary: DAEDL is a novel method for improving uncertainty estimation in deep learning by integrating feature space density and using a new parameterization, achieving state-of-the-art results on various tasks.


A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization

http://arxiv.org/abs/2409.08752v1

Compressor summary: Juggler-MAB is a hybrid recommender system that combines meta-learning and Multi-Armed Bandits to balance multiple objectives for various stakeholders in online marketplaces.


Uncertainty and Generalizability in Foundation Models for Earth Observation

http://arxiv.org/abs/2409.08744v1

Compressor summary: The authors study how different Foundation Models and labeling strategies affect the performance and uncertainty of estimating vegetation coverage in various areas using Sentinel satellite images.


Adaptive Sampling for Continuous Group Equivariant Neural Networks

http://arxiv.org/abs/2409.08741v1

Compressor summary: The paper proposes an adaptive sampling method for steerable networks that adjusts to data symmetries, improving performance, equivariance, and computational efficiency.


Multi-intent Aware Contrastive Learning for Sequential Recommendation

http://arxiv.org/abs/2409.08733v1

Compressor summary: Sequence recommendation models should consider multiple user intents instead of just one to better capture real-world scenarios.


Bridging Dynamic Factor Models and Neural Controlled Differential Equations for Nowcasting GDP

http://arxiv.org/abs/2409.08732v1

Compressor summary: The authors propose NCDENow, a GDP nowcasting framework that combines neural controlled differential equations with dynamic factor models to handle irregular dynamics and improve prediction accuracy.


Quasimetric Value Functions with Dense Rewards

http://arxiv.org/abs/2409.08724v1

Compressor summary: The paper explores how goal-conditioned reinforcement learning can be improved with dense rewards by using a quasimetric structure in the optimal value function, leading to more efficient neural architectures and better sample complexity in robotics tasks.


Distilling Monolingual and Crosslingual Word-in-Context Representations

http://arxiv.org/abs/2409.08719v1

Compressor summary:
Key points:
- Propose a method to distil word meaning in context from masked language models
- No human-annotated corpora or parameter updates needed
- Use self-attention and an auto-encoder to combine hidden-layer outputs
- Performs well on monolingual and crosslingual tasks

Summary: The study presents a method that uses self-attention and an auto-encoder to extract word meaning in context from masked language models without human annotations or parameter updates, achieving competitive results on various semantic tasks.


Layerwise Change of Knowledge in Neural Networks

http://arxiv.org/abs/2409.08712v1

Compressor summary: The paper investigates how deep neural networks learn and forget features during forward propagation, and tracks the changes in their interactions and generalization capacity.


L3Cube-IndicQuest: A Benchmark Questing Answering Dataset for Evaluating Knowledge of LLMs in Indic Context

http://arxiv.org/abs/2409.08706v1

Compressor summary: The paper introduces L3Cube-IndicQuest, a question-answering benchmark dataset for evaluating regional knowledge in multilingual LLMs across 20 Indic languages and five domains.


Personalized Weight Loss Management through Wearable Devices and Artificial Intelligence

http://arxiv.org/abs/2409.08700v1

Compressor summary: The study uses wearable devices and AI to predict weight loss in overweight people by analyzing various data sources and achieves promising results with an 84.44% accuracy rate.


Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding

http://arxiv.org/abs/2409.08695v1

Compressor summary:
Key points:
- An innovative system combines computer vision and IoT for precise Tilapia feeding
- It uses real-time sensors to monitor water quality and fish size/count
- A mobile app enables remote monitoring and control
- The method could increase production up to 58 times compared to traditional farms

Summary: The system, which combines computer vision and IoT, monitors water quality and fish size with sensors and a mobile app, and feeds Tilapia optimally, potentially boosting production by 58 times.


Autoregressive Sequence Modeling for 3D Medical Image Representation

http://arxiv.org/abs/2409.08691v1

Compressor summary: The authors propose an autoregressive pre-training method for 3D medical image representations that leverages spatial, contrast, and semantic correlations to better understand and integrate contextual information.


Redesigning graph filter-based GNNs to relax the homophily assumption

http://arxiv.org/abs/2409.08676v1

Compressor summary: The paper proposes a new GNN architecture that can handle heterophilic data by reinterpreting graph filters and improving expressiveness, permutation equivariance, and performance.


AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius

http://arxiv.org/abs/2409.08669v1

Compressor summary: AdR-Gaussian accelerates 3D Gaussian splatting by moving culling to the preprocess stage, using adaptive radius and load balancing to reduce overhead and increase quality.


Test-time Training for Hyperspectral Image Super-resolution

http://arxiv.org/abs/2409.08667v1

Compressor summary: The paper proposes a novel self-training framework with a new network architecture and data augmentation method for hyperspectral image super-resolution, achieving significant improvements over existing methods.


Towards certifiable AI in aviation: landscape, challenges, and opportunities

http://arxiv.org/abs/2409.08666v1

Compressor summary: The paper provides a detailed overview of formal AI certification in avionics, discussing the challenges and importance of ensuring safety and reliability in AI systems.


Online Learning Of Expanding Graphs

http://arxiv.org/abs/2409.08660v1

Compressor summary: The paper presents a new online algorithm for learning expanding graphs from spatiotemporal signals that can handle graph growth and node dynamics.


Promoting Fairness in Link Prediction with Graph Enhancement

http://arxiv.org/abs/2409.08658v1

Compressor summary: FairLink is a method that learns a fairness-enhanced graph for link prediction, ensuring equal link probabilities between nodes from the same sensitive group and reducing bias in predictions.


Training Gradient Boosted Decision Trees on Tabular Data Containing Label Noise for Classification Tasks

http://arxiv.org/abs/2409.08647v1

Compressor summary: This paper studies how label noise affects gradient-boosted decision trees (GBDTs) for tabular data and proposes methods to improve their performance.


CPL: Critical Planning Step Learning Boosts LLM Generalization in Reasoning Tasks

http://arxiv.org/abs/2409.08642v1

Compressor summary: CPL uses Monte Carlo Tree Search to improve large language models' general reasoning capabilities by learning step-level planning preferences, while Step-APO enhances existing preference learning approaches for complex multi-step reasoning tasks.


Developing an Algorithm Selector for Green Configuration in Scheduling Problems

http://arxiv.org/abs/2409.08641v1

Compressor summary: The paper presents an intelligent algorithm selection tool for the Job Shop Scheduling Problem using machine learning, which optimizes energy efficiency and production metrics by recommending the best solver for each instance.


Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering

http://arxiv.org/abs/2409.08640v1

Compressor summary: Our method improves distributed learning by using Polyak Momentum to defend against Byzantine workers and achieve better convergence results.
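The general recipe behind Byzantine-robust learning with momentum can be sketched as: each worker sends a momentum-averaged gradient, and the server aggregates with a robust statistic instead of the plain mean. The coordinate-wise median used here is a simple stand-in; the paper's actual filtering and compression scheme differs.

```python
import numpy as np

def worker_momentum(prev_m, grad, beta=0.9):
    # Momentum smooths per-worker noise, making honest workers look alike
    # and Byzantine outliers easier to reject.
    return beta * prev_m + (1 - beta) * grad

def robust_aggregate(momenta):
    # Coordinate-wise median resists a minority of arbitrary outliers.
    return np.median(np.stack(momenta), axis=0)

rng = np.random.default_rng(0)
true_grad = np.ones(5)
honest = [worker_momentum(np.zeros(5), true_grad + 0.01 * rng.normal(size=5))
          for _ in range(7)]
byzantine = [np.full(5, 1e6)] * 3            # adversarial workers send garbage
agg = robust_aggregate(honest + byzantine)
# median lands near the honest momentum value (0.1 * true_grad here),
# ignoring the three attackers
assert np.all(np.abs(agg - 0.1 * true_grad) < 0.05)
```

With a plain mean, the three Byzantine vectors would drag the aggregate to around 3e5 per coordinate; the median is unaffected as long as honest workers form a majority.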


Utilizing Data Fingerprints for Privacy-Preserving Algorithm Selection in Time Series Classification: Performance and Uncertainty Estimation on Unseen Datasets

http://arxiv.org/abs/2409.08636v1

Compressor summary: The paper introduces a novel data fingerprint method that helps select AI algorithms for time series classification without needing access to all data points, improving algorithm selection accuracy.


Improving Analog Neural Network Robustness: A Noise-Agnostic Approach with Explainable Regularizations

http://arxiv.org/abs/2409.08633v1

Compressor summary: The authors propose an approach to improve noise resistance in analog neural networks by revealing and using the underlying mechanisms that reduce sensitivity to noise.


Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints

http://arxiv.org/abs/2409.08613v1

Compressor summary: Dust-GS is a new framework for scene synthesis that improves on 3D Gaussian Splatting by using an adaptive masking technique and working better with sparse input data.


Optimizing Item-based Marketing Promotion Efficiency in C2C Marketplace with Dynamic Sequential Coupon Allocation Framework

http://arxiv.org/abs/2409.08609v1

Compressor summary: DSCAF is a framework that optimizes coupons for e-commerce sellers by dynamically adjusting their allocation strategies across multiple promotions to maximize ROI and sell-through rate.


Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation

http://arxiv.org/abs/2409.08598v1

Compressor summary:
Key points:
- Existing FER methods use discrete labels, which are limited for emotional recognition
- Proposed a novel method that uses text embeddings to enhance facial expression representations
- Used an emotional-to-neutral transformation with a self-contrast objective
- Outperformed state-of-the-art FER methods on four datasets using different visual encoders

Summary: The paper proposes a new FER method that leverages text embeddings and an emotional-to-neutral transformation to improve facial expression recognition, achieving superior results on four datasets.


Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

http://arxiv.org/abs/2409.08596v1

Compressor summary: The paper explores how large language models can be used to transcribe speech in multi-talker situations using different instructions and speaker characteristics.


Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori

http://arxiv.org/abs/2409.08585v1

Compressor summary: The proposed WaveLUT method improves low-light video enhancement by using a wavelet-based lookup table, dynamic fusion strategy, and text-driven appearance reconstruction to achieve color coherence, accurate mapping, and low latency.


ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning

http://arxiv.org/abs/2409.08582v1

Compressor summary: ChangeChat is a bitemporal vision-language model that supports interactive RS change analysis using multimodal instruction tuning and the ChangeChat-87k dataset.


Molecular Graph Representation Learning via Structural Similarity Information

http://arxiv.org/abs/2409.08580v1

Compressor summary: The Molecular Structural Similarity Motif GNN (MSSM-GNN) is a novel method that leverages graph kernel algorithms to capture structural similarity between molecules and improve feature representation learning for property prediction.


HTR-VT: Handwritten Text Recognition with Vision Transformer

http://arxiv.org/abs/2409.08573v1

Compressor summary:
Key points:
- ViT for handwritten text recognition with limited data
- Data-efficient encoder + CNN + SAM optimizer
- Span mask technique as a regularizer
- Outperforms traditional models on small datasets and sets new benchmark on LAM dataset

Summary: The paper proposes a data-efficient ViT method for handwritten text recognition that uses an encoder, a CNN, and a SAM optimizer with span masking. It beats conventional models on small datasets and achieves the best result on the largest LAM dataset.


DiffFAS: Face Anti-Spoofing via Generative Diffusion Models

http://arxiv.org/abs/2409.08572v1

Compressor summary: The paper proposes DiffFAS, a framework to improve face anti-spoofing by addressing image quality and style shifts between domains and attack types, using diffusion-based generation of high-fidelity spoof faces.


Batch Ensemble for Variance Dependent Regret in Stochastic Bandits

http://arxiv.org/abs/2409.08570v1

Compressor summary: The paper proposes a simple batch ensemble scheme for online RL that achieves near-optimal regret in stochastic MAB with just one parameter, the number of batches.


Hybrid-TTA: Continual Test-time Adaptation via Dynamic Domain Shift Detection

http://arxiv.org/abs/2409.08566v1

Compressor summary: The paper proposes Hybrid-TTA, a method that adapts to domain shifts by combining Full-Tuning and Efficient-Tuning strategies with Dynamic Domain Shift Detection and Masked Image Modeling based Adaptation.


Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia

http://arxiv.org/abs/2409.08564v1

Compressor summary: IndoCareer is a diverse dataset for evaluating language models on vocational and professional exams in Indonesia, highlighting their challenges in local contexts like insurance and finance.


Second-order difference subspace

http://arxiv.org/abs/2409.08563v1

Compressor summary: The paper introduces the second-order difference subspace, which analyzes geometric differences between multiple subspaces in machine learning, and applies it to temporal shape analysis and biometric signal analysis.


CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting

http://arxiv.org/abs/2409.08562v1

Compressor summary: CSS is a new technique that uses crowd-sourced images to reconstruct challenging scenes with high quality and accuracy, overcoming limitations of traditional 3D methods.


Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding

http://arxiv.org/abs/2409.08561v1

Compressor summary:
Key points:
- Large language models can reason and solve problems using chain-of-thought (CoT) prompting, but it's slow and costly to generate the full CoT process.
- The proposed method compresses the CoT process through semantic alignment, using an auxiliary CoT model that learns to generate a compact representation of the thought process.
- The method achieves competitive or improved performance compared to the full CoT baseline, while providing significant speedup in decoding time.

Summary: The paper proposes a novel approach to compress the chain-of-thought (CoT) process in large language models using semantic alignment and an auxiliary model, improving efficiency and performance in various tasks.


Fair CoVariance Neural Networks

http://arxiv.org/abs/2409.08558v1

Compressor summary: Fair coVariance Neural Networks (FVNNs) use graph convolutions to process covariance matrices for both fair and accurate predictions in signal processing and machine learning applications.


DICS: Find Domain-Invariant and Class-Specific Features for Out-of-Distribution Generalization

http://arxiv.org/abs/2409.08557v1

Compressor summary: The paper proposes a DICS model to extract domain-invariant and class-specific features for deep neural networks, which improves their performance in out-of-distribution scenarios.


LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

http://arxiv.org/abs/2409.08554v1

Compressor summary: The paper evaluates large language models for grapheme-to-phoneme conversion and proposes methods to improve their performance without extra training or data, showing they can outperform traditional tools in Persian.


Causal GNNs: A GNN-Driven Instrumental Variable Approach for Causal Inference in Networks

http://arxiv.org/abs/2409.08544v1

Compressor summary: CgNN is a new method that uses network structure as instrumental variables to estimate causal effects in network data, while accounting for hidden confounders and node importance.


An Efficient Privacy-aware Split Learning Framework for Satellite Communications

http://arxiv.org/abs/2409.08538v1

Compressor summary: The text proposes DTIP, a novel framework that uses split learning and differential privacy to enhance satellite communication efficiency, accuracy, and privacy.


Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics

http://arxiv.org/abs/2409.08530v1

Compressor summary: MAT is a new hybrid model combining Mamba and Transformer techniques to improve long-short range time series forecasting by leveraging their respective strengths in capturing dependencies and patterns.


Eir: Thai Medical Large Language Models

http://arxiv.org/abs/2409.08523v1

Compressor summary: Eir Thai Medical LLM is a large language model that enhances medical tasks in Thai with high accuracy and clear answers for healthcare professionals and patients.


GroundingBooth: Grounding Text-to-Image Customization

http://arxiv.org/abs/2409.08520v1

Compressor summary: The paper presents GroundingBooth, a framework that generates personalized images with accurate layout alignment and identity preservation in text-to-image customization tasks, enabling the customization of multiple subjects at once.


Anytime Continual Learning for Open Vocabulary Classification

http://arxiv.org/abs/2409.08518v1

Compressor summary: The authors present a method for open vocabulary image classification that can learn from new data anytime, improve existing models, and reduce storage and computation using attention-weighted PCA compression.
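One plausible reading of "attention-weighted PCA compression" is PCA computed from a covariance in which each stored feature vector is weighted by an attention score, so salient samples dominate the retained components. A hypothetical sketch under that assumption (the paper's exact formulation may differ):

```python
import numpy as np

def attention_weighted_pca(X, attn, k):
    """Compress feature vectors X (n, d) to k dims using a covariance
    weighted by per-sample attention scores. Illustrative only."""
    w = attn / attn.sum()                    # normalize attention to weights
    mu = w @ X                               # attention-weighted mean
    Xc = X - mu
    cov = (Xc * w[:, None]).T @ Xc           # weighted covariance (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    comps = eigvecs[:, ::-1][:, :k]          # top-k principal directions
    return Xc @ comps, comps, mu

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))               # toy stored image features
attn = rng.uniform(size=100)                 # toy attention scores
Z, comps, mu = attention_weighted_pca(X, attn, k=4)
```

Storing `Z`, `comps`, and `mu` instead of `X` is what yields the storage and computation savings the summary mentions.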


Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

http://arxiv.org/abs/2409.08513v1

Compressor summary: Mamba-YOLO-World extends object detection beyond predefined categories by marrying YOLO-World with Mamba through a novel feature fusion mechanism that improves both speed and efficiency.


Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense

http://arxiv.org/abs/2409.08509v1

Compressor summary: VESPR is a defense mechanism that exploits supervised learning's vulnerability to poison attacks, enhancing self-supervised learning's performance on poisoned images and outperforming six previous defenses.


Identifying Human Indoor Daily Life Behavior employing Thermal Sensor Arrays (TSAs)

http://arxiv.org/abs/2409.08508v1

Compressor summary: The study used thermal sensor arrays to monitor daily activities in households, preserving privacy and accurately detecting sleep and daily life activities.


A BERT-Based Summarization approach for depression detection

http://arxiv.org/abs/2409.08483v1

Compressor summary: The text describes using machine learning and AI to detect depression indicators in transcripts of interviews with virtual agents, and proposes text summarization as a preprocessing technique to improve accuracy.


Risks When Sharing LoRA Fine-Tuned Diffusion Model Weights

http://arxiv.org/abs/2409.08482v1

Compressor summary: The paper explores privacy risks of fine-tuning diffusion models on personal images and shows that existing defenses fail to protect the data.


Integrating Neural Operators with Diffusion Models Improves Spectral Representation in Turbulence Modeling

http://arxiv.org/abs/2409.08477v1

Compressor summary: The authors propose a new method that combines neural operators with diffusion models to improve the surrogate modeling of turbulent flows by enhancing the resolution of turbulent structures and better capturing high-frequency flow dynamics.


RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision

http://arxiv.org/abs/2409.08475v1

Compressor summary: RT-DETRv3 improves real-time object detection by adding a CNN branch, self-attention perturbation, and a shared-weight decoder branch for dense positive supervision.


Rethinking Meta-Learning from a Learning Lens

http://arxiv.org/abs/2409.08474v1

Compressor summary: The paper proposes a new meta-learning method called Task Relation Learner (TRLearner) that uses task relations to calibrate optimization and reduce overfitting and underfitting issues in previous methods.


Explaining Datasets in Words: Statistical Models with Natural Language Parameters

http://arxiv.org/abs/2409.08466v1

Compressor summary: The authors propose a framework to fit statistical models with interpretable natural language predicates, which can be applied to various problems in textual and visual domains.


Inter Observer Variability Assessment through Ordered Weighted Belief Divergence Measure in MAGDM Application to the Ensemble Classifier Feature Fusion

http://arxiv.org/abs/2409.08450v1

Compressor summary: The study proposes an Evidential MAGDM method that handles uncertainty and conflict among experts by assessing inter-observational variability, generating belief degrees, and constructing weighted belief and plausibility measures.


Towards Unified Facial Action Unit Recognition Framework by Large Language Models

http://arxiv.org/abs/2409.08444v1

Compressor summary: AU-LLaVA is a new framework that uses a large language model to recognize facial expressions accurately and generate different formats of results for the same image.


CF-PRNet: Coarse-to-Fine Prototype Refining Network for Point Cloud Completion and Reconstruction

http://arxiv.org/abs/2409.08443v1

Compressor summary: CF-PRNet is a novel network that reconstructs 3D fruit shapes from partial views using coarse-to-fine prototype refining with scaling vectors, achieving high performance metrics and winning a shape completion and reconstruction challenge on sweet peppers.


When Context Leads but Parametric Memory Follows in Large Language Models

http://arxiv.org/abs/2409.08435v1

Compressor summary: This study examines how large language models use contextual and parametric knowledge to answer questions in consistent scenarios, finding a balance between the two and fewer hallucinations with more context.


Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information

http://arxiv.org/abs/2409.08434v1

Compressor summary: The paper proposes an algorithm for policy design in non-stationary MDPs that uses look-ahead predictions to achieve low regret, and shows its effectiveness in simulations.