This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-16, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2409.09021v1
Compressor summary: The text introduces an invertible neural network for non-invasive blood pressure monitoring using photoplethysmography, which improves accuracy by capturing high-frequency details and learning features across multiple scales.
http://arxiv.org/abs/2409.09018v1
Compressor summary: The paper proposes two methods to reduce latency and memory usage in real-time Active Speaker Detection systems, limiting future and past context frames, and shows they perform well compared to existing models.
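The idea of bounding how many past and future frames a model may attend to can be illustrated with a simple attention mask; the helper below is a generic sketch of that context-limiting idea (function name and window sizes are illustrative, not taken from the paper):

```python
import numpy as np

def limited_context_mask(n_frames, past, future):
    """Boolean mask where frame i may attend only to frames in
    [i - past, i + future] -- a generic sketch of limiting past and
    future context, not the paper's exact architecture."""
    idx = np.arange(n_frames)
    rel = idx[None, :] - idx[:, None]  # rel[i, j] = j - i
    return (rel >= -past) & (rel <= future)

mask = limited_context_mask(5, past=2, future=1)
```

Setting `future=0` makes the mask strictly causal, which is the zero-lookahead limit of the same idea.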
http://arxiv.org/abs/2409.09013v1
Compressor summary: AI-LieDar is a framework to study how large language models navigate situations where being truthful conflicts with achieving goals, showing that current models are often untruthful and hard to steer towards truthfulness.
http://arxiv.org/abs/2409.09009v1
Compressor summary: The proposed method uses retrieved examples to improve rare word translation accuracy in direct ST models, with speech-to-speech retrieval being the most effective and robust approach.
http://arxiv.org/abs/2409.09007v1
Compressor summary: The paper evaluates the need for multi-layer attention in graph Transformers, proposes a simplified single-layer version (SGFormer) that scales well and requires less data, and shows its effectiveness on large graphs.
http://arxiv.org/abs/2409.09001v1
Compressor summary: E2MoCase is a dataset that helps study how emotions, morals, and events in legal stories affect media coverage and public opinion.
http://arxiv.org/abs/2409.08958v1
Compressor summary: The text discusses using influence functions to improve interpretability and validate physics-informed neural networks in fluid flow problems.
http://arxiv.org/abs/2409.08953v1
Compressor summary: Event-camera data can still support image classification even after heavy subsampling of events, but training CNNs becomes more sensitive to hyperparameters in highly subsampled scenarios.
http://arxiv.org/abs/2409.08947v1
Compressor summary: The method creates relightable radiance fields from single-illumination data by using 2D diffusion model priors to augment the data and optimize appearance features for multi-view consistency.
http://arxiv.org/abs/2409.08946v1
Compressor summary: DELTA is a novel approach for active graph domain adaptation that selects informative nodes and uses two subnetworks to explore topological semantics, improving performance on target graphs.
http://arxiv.org/abs/2409.08936v1
Compressor summary: The SynSUM benchmark is a synthetic dataset for research on clinical information extraction and reasoning with tabular background variables and text.
http://arxiv.org/abs/2409.08935v1
Compressor summary: This paper provides the first theory for optimizing and generalizing deep neural networks with weight normalization, showing how it affects convergence and uniformity in training and testing.
http://arxiv.org/abs/2409.08917v1
Compressor summary: The Latent Space Score-Based Diffusion Model (LSSDM) is an unsupervised learning approach for probabilistic multivariate time series imputation that projects observed values onto a low-dimensional latent space, reconstructs coarse missing data, and uses a conditional diffusion model to obtain precise imputed values with uncertainty analysis.
http://arxiv.org/abs/2409.08907v1
Compressor summary: This paper explores the potential and challenges of using foundation models for affective computing, which involves generating and analysing multimodal data related to human emotions.
http://arxiv.org/abs/2409.08892v1
Compressor summary: The text discusses how efficient coding and rate-distortion theory can be used to understand action-oriented efficient representations in organisms' perception.
http://arxiv.org/abs/2409.08887v1
Compressor summary: The text introduces VLT-MI, a novel benchmark for visual language tracking with multi-modal interaction, which improves cognitive alignment and robustness of trackers by enabling multiple rounds of text and object updates during tracking.
http://arxiv.org/abs/2409.08885v1
Compressor summary:
Key points:
- Object detection in remote sensing imagery is challenging due to small and barely visible objects across diverse terrains
- Multimodal learning can integrate features from different data modalities to improve detection accuracy
- Masked Image Modeling (MIM) can be used as a pre-training technique for object detection, using self-supervised learning on unlabeled data
- Conventional MIM such as MAE lacks contextual information and fine-grained details
- The paper proposes a new interactive MIM method that establishes interactions between different tokens, which is beneficial for object detection in remote sensing
Summary: The paper introduces an interactive Masked Image Modeling method to improve object detection in remote sensing imagery by leveraging self-supervised learning on unlabeled data and establishing interactions between different tokens.
http://arxiv.org/abs/2409.08884v1
Compressor summary: Synthetic data helps detect fake images by training vision transformers with synthetic representation learners like SynCLR, outperforming CLIP on unseen GAN models.
http://arxiv.org/abs/2409.08872v1
Compressor summary: The study shows that using a data selection scheme to augment limited target language data improves automatic speech recognition for two endangered languages, Amis and Seediq.
http://arxiv.org/abs/2409.08864v1
Compressor summary: The study explores how using images alongside text improves large language models' ability to understand graphs.
http://arxiv.org/abs/2409.08861v1
Compressor summary: This paper proposes a new method for reward fine-tuning of dynamical generative models using stochastic optimal control, which improves their quality and generalization.
http://arxiv.org/abs/2409.08857v1
Compressor summary: InstantDrag is an optimization-free method that enables fast, photo-realistic drag-based image editing without masks or text prompts using two networks that learn motion dynamics from real-world video datasets.
http://arxiv.org/abs/2409.08853v1
Compressor summary:
http://arxiv.org/abs/2409.08847v1
Compressor summary: The text discusses a new method for calibrating and optimizing Microsoft Kinect sensors, which are widely used in 3D vision systems for applications such as medical and biometric fields.
http://arxiv.org/abs/2409.08845v1
Compressor summary: Agreement-aware Iterative Preference Optimization (AIPO) addresses the length exploitation issue in iterative preference optimization with synthetic data, achieving state-of-the-art results on various language model benchmarks.
http://arxiv.org/abs/2409.08840v1
Compressor summary: Direct-CP is a system that uses RSUs to help autonomous vehicles signal their interests and focus on important areas, improving their local perception accuracy in collaborative 3D object detection tasks.
http://arxiv.org/abs/2409.08832v1
Compressor summary: The paper introduces Kolmogorov-Arnold Networks as a new method for machine learning in laser fusion, which improves prediction accuracy and interpretability compared to other approaches.
http://arxiv.org/abs/2409.08823v1
Compressor summary: The paper proposes a multistage fitting procedure that improves scoring accuracy and calibration for computerized adaptive tests using out-of-the-box AutoML tools.
http://arxiv.org/abs/2409.08820v1
Compressor summary: The paper proposes a method to automatically generate competency questions (CQs) for ontology development using large language models (LLMs) and scientific papers as input, and evaluates its performance on two domain engineering tasks.
http://arxiv.org/abs/2409.08813v1
Compressor summary: This paper shows how using a less powerful language model can generate effective feedback for aligning AI systems with human values and intentions, making alignment more scalable and sustainable.
http://arxiv.org/abs/2409.08806v1
Compressor summary: The study introduces TabKANet, a Transformer-based model that uses Kolmogorov-Arnold network to encode and merge numerical and categorical features for tabular data, achieving excellent results in six binary classification tasks.
http://arxiv.org/abs/2409.08805v1
Compressor summary: This paper compares discrete tokens from self-supervised learning models for speech recognition in multiple languages and scenarios, showing improved performance and efficiency over Fbank features.
http://arxiv.org/abs/2409.08800v1
Compressor summary: The text introduces a method to extend the field-of-view of CBCT systems using deep learning and improve their clinical applications, especially for reconstructing rib structures.
http://arxiv.org/abs/2409.08797v1
Compressor summary: The paper shows that using discrete speech features from self-supervised learning in ASR systems improves performance, especially for cross-utterance contexts.
http://arxiv.org/abs/2409.08792v1
Compressor summary: The study shows how large language models can help create recipes with more phytochemicals, potentially improving health, but cautions that these benefits need clinical validation.
http://arxiv.org/abs/2409.08788v1
Compressor summary: ECG-ReGen is a retrieval-based method that uses self-supervised learning and large language models to generate comprehensive reports and answer questions from electrocardiograms, potentially improving patient care.
http://arxiv.org/abs/2409.08782v1
Compressor summary: The paper proposes a novel contactless fingerprint recognition algorithm that captures the 3D feature of contactless fingerprints and improves matching accuracy across multiple poses.
http://arxiv.org/abs/2409.08780v1
Compressor summary: The text describes a project that explores how to improve sign language translation of German Sign Language, especially for ambiguous words, by using different body-part representations in transformer models and evaluating their impact on performance.
http://arxiv.org/abs/2409.08771v1
Compressor summary: The paper proposes a distributed algorithm for low-rank matrix factorization, using power initialization to improve convergence rates and reduce communication overhead.
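"Power initialization" here refers to the standard subspace-iteration trick for seeding low-rank factors; below is a minimal single-machine sketch of that trick (function name and defaults are my own, and the paper's distributed setting and communication pattern are not modeled):

```python
import numpy as np

def power_init(M, rank, iters=3, seed=0):
    """A few subspace (power) iterations produce an orthonormal basis Q
    close to the top right-singular directions of M; U = M Q and V = Q
    then seed the factor matrices so that U V^T approximates M.
    Generic sketch only, not the paper's distributed algorithm."""
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((M.shape[1], rank))
    Q, _ = np.linalg.qr(Q)
    for _ in range(iters):
        Q, _ = np.linalg.qr(M.T @ (M @ Q))  # one power step, re-orthonormalized
    return M @ Q, Q  # left factor U, right factor V (orthonormal columns)
```

When `rank` matches the true rank of `M`, the seed factors already reconstruct `M` exactly; in general they give a warm start that speeds up subsequent alternating updates.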
http://arxiv.org/abs/2409.08770v1
Compressor summary: The paper analyzes four mini-batch SGD schedulers and shows that increasing batch size and learning rate can improve performance and minimize the full gradient norm of the empirical loss faster.
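A joint scheduler in this spirit grows batch size and learning rate together; the toy schedule below is purely illustrative (the doubling rule, defaults, and names are mine, not the paper's four schedulers):

```python
def joint_schedule(epoch, b0=32, lr0=0.1, growth=2.0, every=10):
    """Illustrative schedule that multiplies both the batch size and the
    learning rate by `growth` every `every` epochs -- a sketch of the
    'increase both' idea, with hypothetical hyperparameters."""
    k = epoch // every
    return int(b0 * growth**k), lr0 * growth**k
```

For example, `joint_schedule(0)` gives the initial pair and `joint_schedule(10)` the first doubled pair; a training loop would query it at the start of each epoch to resize batches and rescale the step size.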
http://arxiv.org/abs/2409.08766v1
Compressor summary: The paper proposes SAUC, a novel framework that calibrates uncertainty in both zero and non-zero values for spatiotemporal prediction using probabilistic Graph Neural Networks and quantile approaches.
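Quantile approaches to uncertainty calibration typically rest on the pinball (quantile) loss, so a minimal sketch of that standard loss may help for context (this is the generic loss only, not SAUC's zero/non-zero calibration framework):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss: penalizes under-prediction with weight q
    and over-prediction with weight 1 - q, so minimizing it over a
    constant predictor recovers the q-th sample quantile."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))
```

With `q=0.9`, under-predicting by one unit costs 0.9 while over-predicting by one unit costs only 0.1, which pushes the prediction toward the upper tail.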
http://arxiv.org/abs/2409.08760v1
Compressor summary: The paper proposes a new method for estimating unknown graph connectivity from incomplete and streaming data using stationary signals and a convex optimization problem.
http://arxiv.org/abs/2409.08754v1
Compressor summary: DAEDL is a novel method for improving uncertainty estimation in deep learning by integrating feature space density and using a new parameterization, achieving state-of-the-art results on various tasks.
http://arxiv.org/abs/2409.08752v1
Compressor summary: Juggler-MAB is a hybrid recommender system that combines meta-learning and Multi-Armed Bandits to balance multiple objectives for various stakeholders in online marketplaces.
http://arxiv.org/abs/2409.08744v1
Compressor summary: The authors study how different Foundation Models and labeling strategies affect the performance and uncertainty of estimating vegetation coverage in various areas using Sentinel satellite images.
http://arxiv.org/abs/2409.08741v1
Compressor summary: The paper proposes an adaptive sampling method for steerable networks that adjusts to data symmetries, improving performance, equivariance, and computational efficiency.
http://arxiv.org/abs/2409.08733v1
Compressor summary: Sequence recommendation models should consider multiple user intents instead of just one to better capture real-world scenarios.
http://arxiv.org/abs/2409.08732v1
Compressor summary: The authors propose NCDENow, a GDP nowcasting framework that combines neural controlled differential equations with dynamic factor models to handle irregular dynamics and improve prediction accuracy.
http://arxiv.org/abs/2409.08724v1
Compressor summary: The paper explores how goal-conditioned reinforcement learning can be improved with dense rewards by using a quasimetric structure in the optimal value function, leading to more efficient neural architectures and better sample complexity in robotics tasks.
http://arxiv.org/abs/2409.08719v1
Compressor summary:
Key points:
- Propose a method to distil word meaning in context from masked language models
- No human-annotated corpora or parameter updates needed
- Use self-attention and an auto-encoder to combine hidden-layer outputs
- Perform well on monolingual and cross-lingual tasks
Summary: The study presents a method that uses self-attention and an auto-encoder to extract word meaning in context from masked language models without human annotations or parameter updates, achieving competitive results on various semantic tasks.
http://arxiv.org/abs/2409.08712v1
Compressor summary: The paper investigates how deep neural networks learn and forget features during forward propagation, and tracks the changes in their interactions and generalization capacity.
http://arxiv.org/abs/2409.08706v1
Compressor summary: The paper introduces L3Cube-IndicQuest, a question-answering benchmark dataset for evaluating regional knowledge in multilingual LLMs across 20 Indic languages and five domains.
http://arxiv.org/abs/2409.08700v1
Compressor summary: The study uses wearable devices and AI to predict weight loss in overweight people by analyzing various data sources and achieves promising results with an 84.44% accuracy rate.
http://arxiv.org/abs/2409.08695v1
Compressor summary:
Key points:
- An innovative system combines computer vision and IoT for precise Tilapia feeding
- It uses real-time sensors to monitor water quality and fish size/count
- A mobile app enables remote monitoring and control
- The method could increase production up to 58 times compared to traditional farms
Summary: The system combines computer vision and IoT to monitor water quality and fish size with sensors and a mobile app, feeding Tilapia optimally and potentially boosting production by up to 58 times.
http://arxiv.org/abs/2409.08691v1
Compressor summary: The authors propose an autoregressive pre-training method for 3D medical image representations that leverages spatial, contrast, and semantic correlations to better understand and integrate contextual information.
http://arxiv.org/abs/2409.08676v1
Compressor summary: The paper proposes a new GNN architecture that can handle heterophilic data by reinterpreting graph filters and improving expressiveness, permutation equivariance, and performance.
http://arxiv.org/abs/2409.08669v1
Compressor summary: AdR-Gaussian accelerates 3D Gaussian splatting by moving culling to the preprocess stage, using adaptive radius and load balancing to reduce overhead and increase quality.
http://arxiv.org/abs/2409.08667v1
Compressor summary: The paper proposes a novel self-training framework with a new network architecture and data augmentation method for hyperspectral image super-resolution, achieving significant improvements over existing methods.
http://arxiv.org/abs/2409.08666v1
Compressor summary: The paper provides a detailed overview of formal AI certification in avionics, discussing the challenges and importance of ensuring safety and reliability in AI systems.
http://arxiv.org/abs/2409.08660v1
Compressor summary: The paper presents a new online algorithm for learning expanding graphs from spatiotemporal signals that can handle graph growth and node dynamics.
http://arxiv.org/abs/2409.08658v1
Compressor summary: FairLink is a method that learns a fairness-enhanced graph for link prediction, ensuring equal link probabilities between nodes from the same sensitive group and reducing bias in predictions.
http://arxiv.org/abs/2409.08647v1
Compressor summary: This paper studies how label noise affects gradient-boosted decision trees (GBDTs) for tabular data and proposes methods to improve their performance.
http://arxiv.org/abs/2409.08642v1
Compressor summary: CPL uses Monte Carlo Tree Search to improve large language models' general reasoning capabilities by learning step-level planning preferences, while Step-APO enhances existing preference learning approaches for complex multi-step reasoning tasks.
http://arxiv.org/abs/2409.08641v1
Compressor summary: The paper presents an intelligent algorithm selection tool for the Job Shop Scheduling Problem using machine learning, which optimizes energy efficiency and production metrics by recommending the best solver for each instance.
http://arxiv.org/abs/2409.08640v1
Compressor summary: Our method improves distributed learning by using Polyak Momentum to defend against Byzantine workers and achieve better convergence results.
http://arxiv.org/abs/2409.08636v1
Compressor summary: The paper introduces a novel data fingerprint method that helps select AI algorithms for time series classification without needing access to all data points, improving algorithm selection accuracy.
http://arxiv.org/abs/2409.08633v1
Compressor summary: The authors propose an approach to improve noise resistance in analog neural networks by revealing and using the underlying mechanisms that reduce sensitivity to noise.
http://arxiv.org/abs/2409.08613v1
Compressor summary: Dust-GS is a new framework for scene synthesis that improves on 3D Gaussian Splatting by using an adaptive masking technique and working better with sparse input data.
http://arxiv.org/abs/2409.08609v1
Compressor summary: DSCAF is a framework that optimizes coupons for e-commerce sellers by dynamically adjusting their allocation strategies across multiple promotions to maximize ROI and sell-through rate.
http://arxiv.org/abs/2409.08598v1
Compressor summary:
Key points:
- Existing FER methods use discrete labels, which are limited for emotional recognition
- Proposed a novel method that uses text embeddings to enhance facial expression representations
- Used an emotional-to-neutral transformation with a self-contrast objective
- Outperformed state-of-the-art FER methods on four datasets using different visual encoders
Summary: The paper proposes a new FER method that leverages text embeddings and an emotional-to-neutral transformation to improve facial expression recognition, achieving superior results on four datasets.
http://arxiv.org/abs/2409.08596v1
Compressor summary: The paper explores how large language models can be used to transcribe speech in multi-talker situations using different instructions and speaker characteristics.
http://arxiv.org/abs/2409.08585v1
Compressor summary: The proposed WaveLUT method improves low-light video enhancement by using a wavelet-based lookup table, dynamic fusion strategy, and text-driven appearance reconstruction to achieve color coherence, accurate mapping, and low latency.
http://arxiv.org/abs/2409.08582v1
Compressor summary: ChangeChat is a bitemporal vision-language model that supports interactive RS change analysis using multimodal instruction tuning and the ChangeChat-87k dataset.
http://arxiv.org/abs/2409.08580v1
Compressor summary: The Molecular Structural Similarity Motif GNN (MSSM-GNN) is a novel method that leverages graph kernel algorithms to capture structural similarity between molecules and improve feature representation learning for property prediction.
http://arxiv.org/abs/2409.08573v1
Compressor summary:
Key points:
- ViT for handwritten text recognition with limited data
- Data-efficient encoder + CNN + SAM optimizer
- Span-mask technique as a regularizer
- Outperforms traditional models on small datasets and sets a new benchmark on the LAM dataset
Summary: The paper proposes a data-efficient ViT method for handwritten text recognition that uses an encoder, a CNN, and the SAM optimizer with span masking. It beats conventional models on small datasets and sets a new benchmark on the LAM dataset.
http://arxiv.org/abs/2409.08572v1
Compressor summary: The paper proposes DiffFAS, a framework to improve face anti-spoofing by addressing image quality and style shifts between domains and attack types, using diffusion-based generation of high-fidelity spoof faces.
http://arxiv.org/abs/2409.08570v1
Compressor summary: The paper proposes a simple batch ensemble scheme for online RL that achieves near-optimal regret in stochastic MAB with just one parameter, the number of batches.
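For context, batch-limited bandit play means the policy is only updated between batches rather than after every pull; the sketch below is a plain batched UCB illustration of that constraint (names and details are mine, not the paper's near-optimal ensemble scheme):

```python
import numpy as np

def batched_ucb(means, n_batches, batch_size, seed=0):
    """Illustrative batched bandit loop: one arm is fixed per batch via a
    UCB score, and reward statistics are refreshed only between batches,
    so policy switches are bounded by the number of batches."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.ones(K)
    est = np.array([rng.normal(m) for m in means])  # one warm-up pull per arm
    total = K
    for _ in range(n_batches):
        ucb = est + np.sqrt(2 * np.log(total) / counts)
        arm = int(np.argmax(ucb))  # held fixed for the whole batch
        rewards = rng.normal(means[arm], size=batch_size)
        est[arm] = (est[arm] * counts[arm] + rewards.sum()) / (counts[arm] + batch_size)
        counts[arm] += batch_size
        total += batch_size
    return counts
```

With a clear gap between arm means, the pull counts concentrate on the better arm even though updates happen only between batches.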
http://arxiv.org/abs/2409.08566v1
Compressor summary: The paper proposes Hybrid-TTA, a method that adapts to domain shifts by combining Full-Tuning and Efficient-Tuning strategies with Dynamic Domain Shift Detection and Masked Image Modeling based Adaptation.
http://arxiv.org/abs/2409.08564v1
Compressor summary: IndoCareer is a diverse dataset for evaluating language models on vocational and professional exams in Indonesia, highlighting their challenges in local contexts like insurance and finance.
http://arxiv.org/abs/2409.08563v1
Compressor summary: The paper introduces the second-order difference subspace, which analyzes geometric differences between multiple subspaces in machine learning, and applies it to temporal shape analysis and biometric signal analysis.
http://arxiv.org/abs/2409.08562v1
Compressor summary: CSS is a new technique that uses crowd-sourced images to reconstruct challenging scenes with high quality and accuracy, overcoming limitations of traditional 3D methods.
http://arxiv.org/abs/2409.08561v1
Compressor summary:
Key points:
- Large language models can reason and solve problems using chain-of-thought (CoT) prompting, but generating the full CoT process is slow and costly.
- The proposed method compresses the CoT process through semantic alignment, using an auxiliary CoT model that learns to generate a compact representation of the thought process.
- The method achieves competitive or improved performance compared to the full CoT baseline, while providing a significant speedup in decoding time.
Summary: The paper proposes a novel approach to compress the chain-of-thought (CoT) process in large language models using semantic alignment and an auxiliary model, improving efficiency and performance across various tasks.
http://arxiv.org/abs/2409.08558v1
Compressor summary: Fair coVariance Neural Networks (FVNNs) use graph convolutions to process covariance matrices for both fair and accurate predictions in signal processing and machine learning applications.
http://arxiv.org/abs/2409.08557v1
Compressor summary: The paper proposes a DICS model to extract domain-invariant and class-specific features for deep neural networks, which improves their performance in out-of-distribution scenarios.
http://arxiv.org/abs/2409.08554v1
Compressor summary: The paper evaluates large language models for grapheme-to-phoneme conversion and proposes methods to improve their performance without extra training or data, showing they can outperform traditional tools in Persian.
http://arxiv.org/abs/2409.08544v1
Compressor summary: CgNN is a new method that uses network structure as instrumental variables to estimate causal effects in network data, while accounting for hidden confounders and node importance.
http://arxiv.org/abs/2409.08538v1
Compressor summary: The text proposes DTIP, a novel framework that uses split learning and differential privacy to enhance satellite communication efficiency, accuracy, and privacy.
http://arxiv.org/abs/2409.08530v1
Compressor summary: MAT is a new hybrid model combining Mamba and Transformer techniques to improve long-short range time series forecasting by leveraging their respective strengths in capturing dependencies and patterns.
http://arxiv.org/abs/2409.08523v1
Compressor summary: Eir Thai Medical LLM is a large language model that enhances medical tasks in Thai with high accuracy and clear answers for healthcare professionals and patients.
http://arxiv.org/abs/2409.08520v1
Compressor summary: The paper presents GroundingBooth, a framework that generates personalized images with accurate layout alignment and identity preservation in text-to-image customization tasks, enabling the customization of multiple subjects at once.
http://arxiv.org/abs/2409.08518v1
Compressor summary: The authors present a method for open vocabulary image classification that can learn from new data anytime, improve existing models, and reduce storage and computation using attention-weighted PCA compression.
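The storage-reduction step rests on PCA compression of stored feature vectors; the sketch below shows the plain-PCA core only (the paper's attention weighting is omitted, and all names are illustrative):

```python
import numpy as np

def pca_compress(feats, k):
    """Project feature vectors onto their top-k principal components.
    Returns (codes, basis, mean) so each d-dim feature is stored as a
    k-dim code plus a shared basis -- plain PCA, without the paper's
    attention weighting."""
    mean = feats.mean(axis=0)
    X = feats - mean
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k]        # (k, d) principal directions
    codes = X @ basis.T   # (n, k) compressed features
    return codes, basis, mean

def pca_decompress(codes, basis, mean):
    """Approximate reconstruction of the original features."""
    return codes @ basis + mean
```

When `k` matches the intrinsic dimensionality of the features, reconstruction is exact; smaller `k` trades accuracy for storage.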
http://arxiv.org/abs/2409.08513v1
Compressor summary: Mamba-YOLO-World improves object detection beyond predefined categories by introducing a novel feature fusion mechanism that combines speed and efficiency.
http://arxiv.org/abs/2409.08509v1
Compressor summary: VESPR is a defense mechanism that exploits supervised learning's vulnerability to poison attacks, enhancing self-supervised learning's performance on poisoned images and outperforming six previous defenses.
http://arxiv.org/abs/2409.08508v1
Compressor summary: The study used thermal sensor arrays to monitor daily activities in households, preserving privacy and accurately detecting sleep and daily life activities.
http://arxiv.org/abs/2409.08483v1
Compressor summary: The text describes using machine learning and AI to detect depression indicators from text data by analyzing interviews with virtual agents and proposes text summarization as a preprocessing technique to improve accuracy.
http://arxiv.org/abs/2409.08482v1
Compressor summary: The paper explores privacy risks of fine-tuning diffusion models on personal images and shows that existing defenses fail to protect the data.
http://arxiv.org/abs/2409.08477v1
Compressor summary: The authors propose a new method that combines neural operators with diffusion models to improve the surrogate modeling of turbulent flows by enhancing the resolution of turbulent structures and better capturing high-frequency flow dynamics.
http://arxiv.org/abs/2409.08475v1
Compressor summary: RT-DETRv3 improves real-time object detection by adding a CNN branch, self-attention perturbation, and a shared-weight decoder branch for dense positive supervision.
http://arxiv.org/abs/2409.08474v1
Compressor summary: The paper proposes a new meta-learning method called Task Relation Learner (TRLearner) that uses task relations to calibrate optimization and reduce overfitting and underfitting issues in previous methods.
http://arxiv.org/abs/2409.08466v1
Compressor summary: The authors propose a framework to fit statistical models with interpretable natural language predicates, which can be applied to various problems in textual and visual domains.
http://arxiv.org/abs/2409.08450v1
Compressor summary: The study proposes an Evidential MAGDM method that handles uncertainty and conflict among experts by assessing inter-observational variability, generating belief degrees, and constructing weighted belief and plausibility measures.
http://arxiv.org/abs/2409.08444v1
Compressor summary: AU-LLaVA is a new framework that uses a large language model to recognize facial expressions accurately and generate different formats of results for the same image.
http://arxiv.org/abs/2409.08443v1
Compressor summary:
Key points:
- The paper introduces CF-PRNet, a network that reconstructs 3D shapes of fruits from partial views for agricultural tasks.
- The network uses coarse-to-fine prototype refining with scaling vectors to complete point clouds accurately.
- It achieves high performance metrics and wins a challenge in shape completion and reconstruction of sweet peppers.
Summary: CF-PRNet is a novel network that reconstructs 3D fruit shapes from partial views using coarse-to-fine prototype refining with scaling vectors, and performs well in a shape completion and reconstruction challenge.
http://arxiv.org/abs/2409.08435v1
Compressor summary: This study examines how large language models use contextual and parametric knowledge to answer questions in consistent scenarios, finding a balance between the two and fewer hallucinations with more context.
http://arxiv.org/abs/2409.08434v1
Compressor summary: The paper proposes an algorithm for policy design in non-stationary MDPs that uses look-ahead predictions to achieve low regret, and shows its effectiveness in simulations.