arxiv compressed, 2024-07-22

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-22 generated by the compressor, my personal LLM-based project.


DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks

http://arxiv.org/abs/2407.14509v1

Compressor summary: The proposed method explains how an image classifier works by permuting interpretable concepts like captions across images and measuring performance changes.


Internal Consistency and Self-Feedback in Large Language Models: A Survey

http://arxiv.org/abs/2407.14507v1

Compressor summary: This paper introduces Internal Consistency and Self-Feedback frameworks to improve large language models' reasoning and reduce hallucinations by having the models evaluate and update themselves.


T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

http://arxiv.org/abs/2407.14505v1

Compressor summary: The paper introduces T2V-CompBench, a benchmark for evaluating the ability of text-to-video generation models to compose different objects, attributes, actions, and motions into videos, covering various aspects of compositionality and proposing evaluation metrics.


On Pre-training of Multimodal Language Models Customized for Chart Understanding

http://arxiv.org/abs/2407.14506v1

Compressor summary: The paper proposes a method to improve large language models' ability to understand and extract numeric values from scientific charts by incorporating data value alignment pre-training, random image replacement, and question-based fine-tuning.


Nonlinear Schrödinger Network

http://arxiv.org/abs/2407.14504v1

Compressor summary: The Nonlinear Schrödinger Network combines physics with AI by treating the Nonlinear Schrödinger Equation as a trainable model for learning complex patterns, offering interpretability, efficiency, and accuracy in time series tasks.


Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification

http://arxiv.org/abs/2407.14503v1

Compressor summary: RLHF rewards may have light- or heavy-tailed error, and while light-tailed errors are mitigated by less restrictive KL penalties, heavy-tailed errors can lead to catastrophic Goodhart even with KL regularization.


M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

http://arxiv.org/abs/2407.14502v1

Compressor summary: Multi-Motion Discrete Diffusion Models (M2D2M) is a novel method that generates human motions from text descriptions using discrete diffusion models, adapting transition probabilities based on motion proximity and using a two-phase sampling strategy for smooth and coherent results.


Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities

http://arxiv.org/abs/2407.14501v1

Compressor summary: The text describes a dataset of air quality measurements from 30 diverse indoor sites in India, which can help researchers understand and address the problem of indoor air pollution in developing countries.


Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

http://arxiv.org/abs/2407.14499v1

Compressor summary: The Discover-then-Name-CBM (DN-CBM) method uses sparse autoencoders to find, name, and classify concepts learned by deep neural networks, improving their interpretability.


Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation

http://arxiv.org/abs/2407.14498v1

Compressor summary: The paper proposes a YOLO-based framework for detecting multiple layout hotspots using PCA-augmented layout images, achieving high precision and recall rates with low false alarms.


Conformal Thresholded Intervals for Efficient Regression

http://arxiv.org/abs/2407.14495v1

Compressor summary: CTI is a new method for producing prediction sets with guaranteed coverage by estimating the conditional interquantile intervals and thresholding them based on their length.


InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques

http://arxiv.org/abs/2407.14494v1

Compressor summary: InterpBench is a tool for evaluating mechanistic interpretability methods using semi-synthetic transformers with known circuits trained by Strict IIT, which aligns internal computation with a high-level causal model.


PD-TPE: Parallel Decoder with Text-guided Position Encoding for 3D Visual Grounding

http://arxiv.org/abs/2407.14491v1

Compressor summary: PD-TPE is a model that uses a double-branch decoder to focus on relevant tokens in 3D visual grounding tasks, outperforming the state-of-the-art on two datasets.


Evaluating the Reliability of Self-Explanations in Large Language Models

http://arxiv.org/abs/2407.14487v1

Compressor summary: The paper studies how well language models can explain their own answers and suggests using counterfactual explanations as a better way to understand their reasoning.


ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

http://arxiv.org/abs/2407.14482v1

Compressor summary: ChatQA 2 is a model that combines long-context understanding and retrieval-augmented generation, outperforming GPT-4 on some tasks and achieving comparable results on others.


A review on vision-based motion estimation

http://arxiv.org/abs/2407.14478v1

Compressor summary: The paper reviews vision-based motion measurement methods, discusses their limitations, and proposes a new method using Gaussian kernels for improved accuracy and robustness.


Data-Centric Human Preference Optimization with Rationales

http://arxiv.org/abs/2407.14477v1

Compressor summary: The text proposes enriching preference datasets with machine-generated rationales to improve reinforcement learning from human feedback, and shows that this approach improves learning efficiency, data efficiency, and convergence speed, and reduces some biases.


Contrastive Learning with Counterfactual Explanations for Radiology Report Generation

http://arxiv.org/abs/2407.14474v1

Compressor summary: CoFE is a novel framework for radiology report generation that uses counterfactual explanations to learn non-spurious visual representations, leading to better report quality and performance.


Check-Eval: A Checklist-based Approach for Evaluating Text Quality

http://arxiv.org/abs/2407.14467v1

Compressor summary: Check-Eval is a novel evaluation framework that uses LLMs to assess text quality through a checklist-based approach, which improves on existing metrics and correlates better with human judgments.


AttentNet: Fully Convolutional 3D Attention for Lung Nodule Detection

http://arxiv.org/abs/2407.14464v1

Compressor summary: The paper proposes two new 3D attention blocks for efficient processing of large 3D medical images, and applies them to 3D lung nodule detection using pulmonary CT scans.


SurvReLU: Inherently Interpretable Survival Analysis via Deep ReLU Networks

http://arxiv.org/abs/2407.14463v1

Compressor summary: The paper proposes SurvReLU, a deep ReLU network that combines the interpretability of tree-based survival models with the performance of neural networks.


PolyFormer: Scalable Node-wise Filters via Polynomial Graph Transformer

http://arxiv.org/abs/2407.14459v1

Compressor summary: PolyFormer uses PolyAttn to learn node-wise filters efficiently and effectively for graph representation learning on large-scale graphs.


Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding

http://arxiv.org/abs/2407.14439v1

Compressor summary: The text proposes a method to compress document images for more efficient and adaptive document understanding in multimodal language models, by assessing token repetitiveness and selecting informative tokens based on their correlation with the [CLS] token.


Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

http://arxiv.org/abs/2407.14435v1

Compressor summary: JumpReLU SAEs are a new type of sparse autoencoder that achieves better reconstruction fidelity and interpretability than other methods by using a discontinuous activation function and straight-through estimators.


The Extrapolation Power of Implicit Models

http://arxiv.org/abs/2407.14430v1

Compressor summary: Implicit deep learning models outperform traditional ones in extrapolating unobserved data across various scenarios due to their adaptability and feedback incorporation.


HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

http://arxiv.org/abs/2407.14419v1

Compressor summary: HOTS3D is a new method that uses spherical optimal transport to align text and image features for better 3D shape generation based on CLIP embeddings.


System-1.x: Learning to Balance Fast and Slow Planning with Language Models

http://arxiv.org/abs/2407.14414v1

Compressor summary: The System-1.x Planner is a controllable planning framework that combines system 1 and system 2 modes of language models to generate hybrid plans for long-horizon problems based on difficulty.


DEAL: Disentangle and Localize Concept-level Explanations for VLMs

http://arxiv.org/abs/2407.14412v1

Compressor summary: DEAL improves VLMs' explanations for fine-grained concepts by making them more distinct and localized, reducing spurious correlations and boosting accuracy.


The Vision of Autonomic Computing: Can LLMs Make It a Reality?

http://arxiv.org/abs/2407.14402v1

Compressor summary: This paper explores using large language models to help computers manage themselves like biological organisms in changing environments, and shows promising results with a microservice demo project.


GLAudio Listens to the Sound of the Graph

http://arxiv.org/abs/2407.14387v1

Compressor summary: GLAudio is a novel architecture that learns from audio data using graph learning, information propagation, and processing in two separate steps.


Frontiers of Deep Learning: From Novel Application to Real-World Deployment

http://arxiv.org/abs/2407.14386v1

Compressor summary: The text summarizes two recent papers on applying deep learning to image processing and recommendation systems, discussing their techniques and future research directions.


Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions

http://arxiv.org/abs/2407.14381v1

Compressor summary: The paper studies how using class-balanced loss functions can improve Gradient Boosting Decision Trees' performance on imbalanced datasets and introduces a Python package for this purpose.


Open Artificial Knowledge

http://arxiv.org/abs/2407.14371v1

Compressor summary: The OAK dataset is a large-scale resource of over 500 million tokens generated by multiple state-of-the-art LLMs to provide diverse and high-quality text across various domains for training chat-based AI systems.


Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations

http://arxiv.org/abs/2407.14367v1

Compressor summary: The paper introduces a new dataset, FairFD, for studying racial bias in forgery detection models and proposes novel fairness metrics to evaluate them.


Interior Object Geometry via Fitted Frames

http://arxiv.org/abs/2407.14357v1

Compressor summary: The paper introduces a new representation method (evolutionary s-rep) for anatomic objects that captures locational correspondence and produces alignment-free geometric features using ellipsoid deformation and skeletal representation, and shows improved classification performance on hippocampi shape.


Vision-Based Power Line Cables and Pylons Detection for Low Flying Aircrafts

http://arxiv.org/abs/2407.14352v1

Compressor summary: Key points:
- Vision-based system to detect power lines and pylons for low-flying aircraft safety
- Deep learning approach with transfer learning and curvilinear structure delineation
- Single network for both detection tasks, tested on two datasets, integrated in onboard system
Summary: The authors develop a deep learning vision system that uses one network to detect power lines and pylons from far away, enhancing the safety of low-flying aircraft.


LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains

http://arxiv.org/abs/2407.14344v1

Compressor summary: The study explores whether GPT-4 can classify the political bias of news sources based on their URLs, and finds a high correlation with MBFC ratings but also some limitations.


Quantifying the value of positive transfer: An experimental case study

http://arxiv.org/abs/2407.14342v1

Compressor summary: The paper presents a method to measure how useful it is to share information from similar structures for maintaining and operating aircraft models.


Straightforward Layer-wise Pruning for More Efficient Visual Adaptation

http://arxiv.org/abs/2407.14330v1

Compressor summary: SLS is a layer-wise pruning method for transfer learning that uses clustering metrics to reduce model redundancy and storage overhead, improving model throughput and accuracy.


Modality-Order Matters! A Novel Hierarchical Feature Fusion Method for CoSAm: A Code-Switched Autism Corpus

http://arxiv.org/abs/2407.14328v1

Compressor summary: The study presents a novel method to improve early detection of autism spectrum disorder in children by analyzing code-switched speech using advanced audio processing techniques and a hierarchical feature fusion strategy.


Multimodal Misinformation Detection using Large Vision-Language Models

http://arxiv.org/abs/2407.14321v1

Compressor summary: The paper explores how large language models can help detect misinformation by incorporating evidence retrieval into the process, using both text and images as input for fact-checking.


Joint or Disjoint: Mixing Training Regimes for Early-Exit Models

http://arxiv.org/abs/2407.14320v1

Compressor summary: Early exits help reduce computation in deep neural networks by allowing them to stop earlier for high-confidence inputs; this paper proposes a new training approach and categorizes early-exit strategies for efficiency and performance analysis.


How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading

http://arxiv.org/abs/2407.14309v1

Compressor summary: The authors introduce GuidingQ, a dataset of in-text questions from textbooks and scientific articles, analyze its linguistic characteristics, and explore various methods to generate similar questions using language models, finding that generated questions are almost as effective as human-written ones in improving reading comprehension.


Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition

http://arxiv.org/abs/2407.14302v1

Compressor summary: The paper proposes Dynamic Adapter (Dyn-Adapter), an efficient visual recognition method that improves parameter-efficient transfer learning by disentangling features and reducing computation cost, while maintaining or enhancing recognition accuracy.


Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment

http://arxiv.org/abs/2407.14303v1

Compressor summary: Key points:
- The paper proposes Spatio-Temporal Monge Alignment (STMA), an Optimal Transport method for Domain Adaptation in machine learning applications on signals.
- STMA adapts the cross-PSD of multivariate signals by mapping them to the Wasserstein barycenter of source domains, without retraining the model with source data.
- The paper also studies two special cases of STMA, TMA and SMA, and provides non-asymptotic concentration bounds for the mappings estimation.
- The paper shows that STMA leads to significant performance gains on multivariate biosignals and image data, and is complementary to deep learning methods.
Summary: The paper introduces STMA, an Optimal Transport method for adapting signals across domains without retraining the model, and shows its effectiveness on biosignals and images.


Adaptive Frequency Enhancement Network for Single Image Deraining

http://arxiv.org/abs/2407.14292v1

Compressor summary: AFENet is a novel end-to-end network for single image deraining that adaptively enhances images across various frequencies, improving the visibility of images damaged by rain and reconstructing details accurately.


How to Blend Concepts in Diffusion Models

http://arxiv.org/abs/2407.14280v1

Compressor summary: The authors explore how to manipulate latent spaces for concept blending using diffusion models and find that it is possible but depends on the context.


OpenSU3D: Open World 3D Scene Understanding using Foundation Models

http://arxiv.org/abs/2407.14279v1

Compressor summary: The paper proposes a method to build scalable, instance-level 3D scene representations from 2D models, improving open world understanding of complex queries and outperforming existing methods.


Patch-based Intuitive Multimodal Prototypes Network (PIMPNet) for Alzheimer's Disease classification

http://arxiv.org/abs/2407.14277v1

Compressor summary: PIMPNet is a neural network that uses 3D brain images and age to classify Alzheimer's Disease from structural Magnetic Resonance Imaging.


Voices in a Crowd: Searching for Clusters of Unique Perspectives

http://arxiv.org/abs/2407.14259v1

Compressor summary: The text proposes a framework for training language models that captures minority perspectives by clustering similar opinions from annotators without using their metadata.


SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization

http://arxiv.org/abs/2407.14257v1

Compressor summary: SparseCraft is a fast and accurate method to reconstruct 3D shapes and appearance from few images using an implicit neural representation trained with ray marching and multi-view stereo cues.


An Attention-based Representation Distillation Baseline for Multi-Label Continual Learning

http://arxiv.org/abs/2407.14249v1

Compressor summary: The text discusses the challenges and solutions for continual learning in multi-label scenarios, proposing a new method called Selective Class Attention Distillation that outperforms existing methods.


Conditioning Chat-GPT for information retrieval: the Unipa-GPT case study

http://arxiv.org/abs/2407.14246v1

Compressor summary: Key points:
- Unipa-GPT is a chatbot for helping students choose degrees at University of Palermo
- It uses gpt-3.5-turbo, a Large Language Model
- It employs Retrieval Augmented Generation and fine-tuning techniques
- The paper compares RAG and fine-tuned systems, and discusses their performance
- It also compares with other Large Language Models and shows results from SHARPER night
Summary: The paper presents Unipa-GPT, a chatbot that helps students choose degrees using gpt-3.5-turbo and different techniques, and evaluates its performance and advantages over other models.


Dataset Distillation by Automatic Training Trajectories

http://arxiv.org/abs/2407.14245v1

Compressor summary: Automatic Training Trajectories (ATT) is a new approach that adapts trajectory length to avoid mismatches and improve generalization for dataset distillation.


Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection

http://arxiv.org/abs/2407.14231v1

Compressor summary: The paper evaluates test-time adaptation methods using surrogate-based hyperparameter selection and shows that some state-of-the-art methods perform worse in realistic scenarios, highlighting the need for more rigorous benchmarking and supervision.


ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading

http://arxiv.org/abs/2407.14230v1

Compressor summary: The text describes a new computer-aided glaucoma diagnosis framework, ETSCL, that uses contrastive learning and evidence theory to improve feature extraction and prediction with uncertainty estimation.


Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition

http://arxiv.org/abs/2407.14224v1

Compressor summary: Key points:
- The paper proposes a large-scale isolated ISL dataset and a novel SL recognition model based on skeleton graph structure
- The dataset covers 2,002 common words in ISL recorded by 20 deaf adult signers
- The model, HWGAT, captures distinctive motions by giving attention to different body parts induced by the human skeleton graph structure
- The model outperforms existing state-of-the-art skeleton-based models on various datasets
Summary: The paper introduces a new ISL dataset and a skeleton graph-based SL recognition model, HWGAT, that captures distinctive motions and achieves superior performance compared to existing models.


Domain Adaptation for Industrial Time-series Forecasting via Counterfactual Inference

http://arxiv.org/abs/2407.14214v1

Compressor summary: The CDA forecaster uses causality analysis and answer-based attention to improve industrial time-series forecasting with limited data, enabling better decision-making in production processes.


Enhanced Mortality Prediction in ICU Stroke Patients via Deep Learning

http://arxiv.org/abs/2407.14211v1

Compressor summary: Key points:
- The text is about predicting mortality risk of ischemic stroke patients in ICU using deep learning and machine learning models
- The data was obtained from MIMIC-IV database and preprocessed with SMOTE and feature selection
- The proposed model (XGB-DL) achieved higher specificity and AUROC than other baseline models
Summary: The authors developed a deep learning model (XGB-DL) that improved mortality prediction of ischemic stroke patients in ICU using data from MIMIC-IV database and advanced feature selection techniques.


Fair Overlap Number of Balls (Fair-ONB): A Data-Morphology-based Undersampling Method for Bias Reduction

http://arxiv.org/abs/2407.14210v1

Compressor summary: Fair-ONB is an undersampling method that uses data morphology to reduce discrimination in machine learning models by selecting areas for undersampling where different groups overlap.


Unlearning Concepts from Text-to-Video Diffusion Models

http://arxiv.org/abs/2407.14209v1

Compressor summary: The paper proposes a low-cost method to unlearn specific concepts from text-to-video diffusion models by transferring the unlearning capability of text encoders from text-to-image diffusion models, enabling the removal of copyrighted content and unsafe videos.


Memory-Efficient Pseudo-Labeling for Online Source-Free Universal Domain Adaptation using a Gaussian Mixture Model

http://arxiv.org/abs/2407.14208v1

Compressor summary: The paper proposes a memory-efficient method for adapting models to new classes and domains without access to the source data or large memory, using Gaussian mixture models and contrastive losses.


Longhorn: State Space Models are Amortized Online Learners

http://arxiv.org/abs/2407.14207v1

Compressor summary: The paper proposes a new deep state-space model architecture for sequence modeling based on online learning objectives, which achieves better performance than existing models.


Watermark Smoothing Attacks against Language Models

http://arxiv.org/abs/2407.14206v1

Compressor summary: Smoothing attacks can remove watermarks from large language models' text without affecting its quality, exposing a weakness in watermarking methods.


Bucketed Ranking-based Losses for Efficient Training of Object Detectors

http://arxiv.org/abs/2407.14204v1

Compressor summary: Bucketed Ranking-based Losses reduce the time complexity of ranking-based losses in object detection by grouping negative predictions into buckets, achieving similar accuracy with faster training.


Double-Shot 3D Shape Measurement with a Dual-Branch Network

http://arxiv.org/abs/2407.14198v1

Compressor summary: PDCNet is a neural network that combines convolutional and self-attention mechanisms to improve 3D measurement accuracy using structured light, reducing fringe order ambiguity and enhancing performance at object boundaries.


A Benchmark for Gaussian Splatting Compression and Quality Assessment Study

http://arxiv.org/abs/2407.14197v1

Compressor summary: The paper introduces Graph-based GS Compression (GGSC), a simple and effective method for compressing Gaussian Splatting (GS) data, and a corresponding dataset (GSQA) to analyze the distortion characteristics of different GS operations.


LeKUBE: A Legal Knowledge Update BEnchmark

http://arxiv.org/abs/2407.14192v1

Compressor summary: The text introduces LeKUBE, a benchmark to evaluate knowledge update methods for legal language models considering the specific challenges of the legal domain.


Normative Diffusion Autoencoders: Application to Amyotrophic Lateral Sclerosis

http://arxiv.org/abs/2407.14191v1

Compressor summary: The authors propose a new method called normative diffusion autoencoder that combines advantages of normative modeling and diffusion models to improve survival prediction in ALS using MRI data.


Achieving Well-Informed Decision-Making in Drug Discovery: A Comprehensive Calibration Study using Neural Network-Based Structure-Activity Models

http://arxiv.org/abs/2407.14185v1

Compressor summary: The study compares metrics for hyperparameter tuning in neural network models for drug discovery and proposes a Bayesian method to improve calibration and quantify uncertainty in predictions.


Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis

http://arxiv.org/abs/2407.14180v1

Compressor summary: The paper presents a method to analyze gender biases in French TV and radio news transcriptions using a large language model and a smaller specialized model.


EVLM: An Efficient Vision-Language Model for Visual Understanding

http://arxiv.org/abs/2407.14177v1

Compressor summary: The paper presents a new efficient multi-modal language model using cross-attention, hierarchical ViT features, and MoE to improve visual perception and reduce computational costs.


Forbes: Face Obfuscation Rendering via Backpropagation Refinement Scheme

http://arxiv.org/abs/2407.14170v1

Compressor summary: The paper proposes a new algorithm, Forbes, that obscures faces from humans while preserving identity and attributes for machines using multiple transformations with optimized parameters.


On Maximum Entropy Linear Feature Inversion

http://arxiv.org/abs/2407.14166v1

Compressor summary: The paper presents a novel method for solving dimension-reducing linear mappings using maximum entropy that unifies existing methods and applies to new constraints like [0, 1].


A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C

http://arxiv.org/abs/2407.14151v1

Compressor summary: The study compares three Deep Reinforcement Learning models in BreakOut Atari, testing their learning efficiency, strategy development, and adaptability.


Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion

http://arxiv.org/abs/2407.14143v1

Compressor summary: The paper proposes RAPF, a method to improve class-incremental learning by adjusting representations and fusing parameters during adaptation to new classes.


Visual Text Generation in the Wild

http://arxiv.org/abs/2407.14138v1

Compressor summary: SceneVTG is a visual text generator that uses a large language model to recommend reasonable text regions and contents, and a conditional diffusion model to produce high-quality text images in the wild.


I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction

http://arxiv.org/abs/2407.14133v1

Compressor summary: The paper introduces a new model called ZeroVLM that improves visual spatial reasoning in VLMs using 3D reconstruction and prompting mechanisms.


Comparing and Contrasting Deep Learning Weather Prediction Backbones on Navier-Stokes and Atmospheric Dynamics

http://arxiv.org/abs/2407.14129v1

Compressor summary: The text compares different deep learning weather prediction models and their performance on synthetic and real-world data, finding various tradeoffs and highlighting the need for further research.


Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation

http://arxiv.org/abs/2407.14126v1

Compressor summary: Mono-ViFI is a self-supervised monocular depth estimation framework that synthesizes virtual camera views, fuses multi-frame features, and uses image transformation and triplet loss for improved performance.


Seismic Fault SAM: Adapting SAM with Lightweight Modules and 2.5D Strategy for Fault Detection

http://arxiv.org/abs/2407.14121v1

Compressor summary: The paper introduces Seismic Fault SAM, a pre-trained model that uses adaptors, 2.5D input strategy, and prior-based data augmentation to improve seismic fault detection and interpretation.


The Cardinality of Identifying Code Sets for Soccer Ball Graph with Application to Remote Sensing

http://arxiv.org/abs/2407.14120v1

Compressor summary: The paper presents a mathematical model using soccer ball graph and identifying code sets to optimize satellite monitoring of the Earth's surface by finding the minimum number of satellites needed to uniquely identify regions.


Shape and Style GAN-based Multispectral Data Augmentation for Crop/Weed Segmentation in Precision Farming

http://arxiv.org/abs/2407.14119v1

Compressor summary: Key points:
- Paper presents a method for data augmentation using two GANs to create artificial images for precision farming
- Method replaces only patches with objects of interest, considering both foreground and background
- Experiments show effectiveness on public datasets
Summary: The paper proposes a GAN-based data augmentation method that enhances training data for precision farming by replacing object patches in original images, respecting foreground and background.


Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation

http://arxiv.org/abs/2407.14117v1

Compressor summary: The paper's method improves the low-shot capability of CLIP by refining visual content with prediction margins and adapting to global and local details.


Zero-Shot Underwater Gesture Recognition

http://arxiv.org/abs/2407.14103v1

Compressor summary: Key points:
- Hand gesture recognition helps human-machine interaction underwater
- CADDIAN is a new gesture-based language for divers
- Zero-shot underwater gesture recognition (ZSUGR) aims to recognize unseen gestures using few seen classes
- A two-stage framework with novel transformer and conditional GAN is proposed
- The method outperforms existing zero-shot techniques
Summary: The paper proposes a novel approach for recognizing unseen hand gestures underwater using few visual samples of seen gestures, by combining a transformer and a conditional generative adversarial network.


On the Robustness of Fully-Spiking Neural Networks in Open-World Scenarios using Forward-Only Learning Algorithms

http://arxiv.org/abs/2407.14097v1

Compressor summary: This work proposes a biologically plausible AI model using the Forward-Forward Algorithm (FFA) and develops an OoD detection algorithm (FF-SCP) that outperforms existing methods in terms of accuracy, energy efficiency, and explainability.


Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation

http://arxiv.org/abs/2407.14088v1

Compressor summary: The study investigates the impact of model size on D2T performance across five datasets and twelve LLMs, finding that larger models improve readability and informativeness but may sacrifice faithfulness and struggle with source-reference divergence.


Score Normalization for Demographic Fairness in Face Recognition

http://arxiv.org/abs/2407.14087v1

Compressor summary: Key points:
- State-of-the-art face recognition networks have different score distributions across demographics
- The paper proposes to integrate demographic information in score normalization methods to improve fairness
- Experiments show that the proposed techniques improve overall fairness without sacrificing performance
Summary: The paper introduces demographic-aware score normalization methods for face recognition networks to enhance fairness across different demographics, and demonstrates their effectiveness on two datasets.


Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking

http://arxiv.org/abs/2407.14086v1

Compressor summary: The paper proposes a new learning approach for multi-object tracking using cross-correlation to capture temporal information of objects, improving performance and reducing computational cost.


An Improved Method for Class-specific Keyword Extraction: A Case Study in the German Business Registry

http://arxiv.org/abs/2407.14085v1

Compressor summary: The paper presents a keyword extraction method that uses the KeyBERT library to identify class-specific keywords from seed keywords, achieving improved results in German business registry classification.


DisenSemi: Semi-supervised Graph Classification via Disentangled Representation Learning

http://arxiv.org/abs/2407.14081v1

Compressor summary: DisenSemi is a novel framework for semi-supervised graph classification that learns disentangled representations using a factor-wise graph encoder and an MI-based consistency regularization.


Stable-Hair: Real-World Hair Transfer via Diffusion Model

http://arxiv.org/abs/2407.14078v1

Compressor summary: The Stable-Hair framework is a novel diffusion-based approach that can transfer realistic and diverse hairstyles onto bald faces for virtual hair try-on.


Domain-Specific Pretraining of Language Models: A Comparative Study in the Medical Field

http://arxiv.org/abs/2407.14076v1

Compressor summary: The paper explores domain-specific and mixed-domain pretraining for creating specialized language models that can handle specific tasks with less sensitive data compared to general-purpose models like GPT-4 or Claude-3-opus.


Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective

http://arxiv.org/abs/2407.14069v1

Compressor summary: The paper proposes BOLD-DI, a method to improve video contrastive learning by capturing both static and dynamic features of videos in an integrated and decoupled way.


360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation

http://arxiv.org/abs/2407.14066v1

Compressor summary: The paper introduces a new benchmark dataset and method for interpolating distorted omnidirectional videos to reduce dizziness in VR experiences.


MSCT: Addressing Time-Varying Confounding with Marginal Structural Causal Transformer for Counterfactual Post-Crash Traffic Prediction

http://arxiv.org/abs/2407.14065v1

Compressor summary: The paper introduces MSCT, a causal deep learning model for predicting post-crash traffic conditions that considers time-varying confounding factors and treatment effects, and shows its superior performance over existing methods using synthetic and real data.


Refining Tuberculosis Detection in CXR Imaging: Addressing Bias in Deep Neural Networks via Interpretability

http://arxiv.org/abs/2407.14064v1

Compressor summary: The paper proposes using pre-training and a balanced optimization technique to improve tuberculosis classification from chest X-ray images, aligning models with human experts without sacrificing accuracy or generalization.


Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation

http://arxiv.org/abs/2407.14062v1

Compressor summary: The paper proposes a new method, DVQ-VAE, that encodes the hand into separate parts and uses a dual-stage decoding strategy to generate more realistic human grasps for computer graphics and robotics applications.


Regularizing Dynamic Radiance Fields with Kinematic Fields

http://arxiv.org/abs/2407.14059v1

Compressor summary: The paper introduces a new way to reconstruct dynamic radiance fields from monocular videos using kinematics and physics-driven regularizers, improving the capture of real-world motion patterns.


On the Causal Sufficiency and Necessity of Multi-Modal Representation Learning

http://arxiv.org/abs/2407.14058v1

Compressor summary: The paper proposes a method, $C^3$R, to learn causally complete representations for multi-modal learning that balances sufficiency and necessity in cause discovery.


LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

http://arxiv.org/abs/2407.14057v1

Compressor summary: LazyLLM is a method that selectively computes the KV cache for prompts to speed up generation without fine-tuning.


Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings

http://arxiv.org/abs/2407.14056v1

Compressor summary: Rasa is a multilingual TTS dataset that provides a practical way to improve expressiveness by increasing neutral data and adding some expressive data for Indian languages.


PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training

http://arxiv.org/abs/2407.14054v1

Compressor summary: PointRegGPT generates realistic synthetic data for 3D point cloud registration using depth maps, improving performance of existing algorithms and achieving state-of-the-art results.


Prompted Aspect Key Point Analysis for Quantitative Review Summarization

http://arxiv.org/abs/2407.14049v1

Compressor summary: Key points:
- PAKPA is a quantitative summarization method for reviews
- It uses aspect sentiment analysis and prompted in-context learning with LLMs
- It generates and quantifies key points grounded in aspects for business entities
- It does not require supervised training or annotated data
Summary: PAKPA is a novel review summarization method that uses aspect sentiment analysis and large language models to generate and measure key points for business entities without supervision.


OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking

http://arxiv.org/abs/2407.14047v1

Compressor summary: The paper introduces a new problem (OCMOT) that involves tracking objects of known and unknown categories in open corpora, builds a large benchmark dataset (OCTrackB) for it, and proposes a new metric to evaluate recognition performance.


ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?

http://arxiv.org/abs/2407.14044v1

Compressor summary: ECCO is a benchmark for evaluating program efficiency using natural language and history-based code generation and editing methods, which can help improve the reliability and performance of large language models in generating correct and efficient code.


Kinematics-based 3D Human-Object Interaction Reconstruction from Single View

http://arxiv.org/abs/2407.14043v1

Compressor summary: The paper proposes a kinematics-based method to accurately reconstruct 3D human-object interaction from single-view RGB images by using improved forward kinematics, a Multi-Layer Perceptron, and a Contact Region Recognition Network.


Not All Noises Are Created Equally: Diffusion Noise Selection and Optimization

http://arxiv.org/abs/2407.14041v1

Compressor summary: The paper proposes a method to select and optimize noises for diffusion models to improve their generation quality and human preference.


Generative Language Model for Catalyst Discovery

http://arxiv.org/abs/2407.14040v1

Compressor summary: The authors introduce CatGPT, a transformer-based language model that generates valid and accurate inorganic catalyst structures from a vast chemical space, and show its applicability in fine-tuning for specific catalytic reactions.


BERTer: The Efficient One

http://arxiv.org/abs/2407.14039v1

Compressor summary: The authors propose various techniques to enhance BERT's performance in different natural language understanding tasks, achieving state-of-the-art results and showing BERT's versatility.


Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance

http://arxiv.org/abs/2407.14032v1

Compressor summary: The paper introduces a novel method called Semantic-CC for generating more accurate and comprehensive change descriptions in bi-temporal remote sensing images using latent knowledge, multi-task interaction, and pixel-level semantics.


HeCiX: Integrating Knowledge Graphs and Large Language Models for Biomedical Research

http://arxiv.org/abs/2407.14030v1

Compressor summary: HeCiX-KG is a novel knowledge graph that fuses data from ClinicalTrials.gov and Hetionet to improve target validation and drug optimization in clinical research, and HeCiX integrates it with GPT-4 for enhanced usability.


PASS++: A Dual Bias Reduction Framework for Non-Exemplar Class-Incremental Learning

http://arxiv.org/abs/2407.14029v1

Compressor summary: The paper proposes a novel dual bias reduction framework for class-incremental learning that uses self-supervised transformation and prototype augmentation to address representation and classifier biases, enabling non-exemplar CIL without catastrophic forgetting.


Semi-supervised reference-based sketch extraction using a contrastive learning framework

http://arxiv.org/abs/2407.14026v1

Compressor summary: The paper introduces a new method to extract sketches with different styles from color images using unpaired data and achieves better results than existing methods.


TTA-OOD: Test-time Augmentation for Improving Out-of-Distribution Detection in Gastrointestinal Vision

http://arxiv.org/abs/2407.14024v1

Compressor summary: The paper proposes an out-of-distribution detection method that uses test-time augmentation to improve abnormality detection in gastrointestinal images.


Investigating the Indirect Object Identification circuit in Mamba

http://arxiv.org/abs/2407.14008v1

Compressor summary: The authors apply and adapt existing interpretability techniques to Mamba, a recurrent model with scaling similar to Transformers, and investigate its Indirect Object Identification circuit.


Multi-modal Relation Distillation for Unified 3D Representation Learning

http://arxiv.org/abs/2407.14007v1

Compressor summary: MRD is a pre-training method that uses large VLMs to learn better 3D shape representations from point clouds, images, and language descriptions, improving downstream tasks like classification and retrieval.


NeLLCom-X: A Comprehensive Neural-Agent Framework to Simulate Language Learning and Group Communication

http://arxiv.org/abs/2407.13999v1

Compressor summary: NeLLCom-X is a framework that simulates language emergence and evolution through realistic agent interactions and communication pressures, allowing for the study of various linguistic properties and dynamics.


RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering

http://arxiv.org/abs/2407.13998v1

Compressor summary: LFRQA is a new dataset for question answering that tests cross-domain generalization and evaluates large language models by comparing their generated answers to human-written long-form answers.


Enhancing Data-Limited Graph Neural Networks by Actively Distilling Knowledge from Large Language Models

http://arxiv.org/abs/2407.13989v1

Compressor summary: The paper proposes a novel approach that combines LLMs and GNNs to improve node classification accuracy with few labeled nodes, outperforming existing methods.


RealViformer: Investigating Attention for Real-World Video Super-Resolution

http://arxiv.org/abs/2407.13987v1

Compressor summary: The paper examines how artifacts affect attention mechanisms in video super-resolution, proposes a channel-attention-based framework called RealViformer that handles artifacts better than existing methods, and makes the code publicly available.


Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks

http://arxiv.org/abs/2407.13986v1

Compressor summary: The paper proposes Deep Feature Surgery, a method to improve multi-exit network accuracy and efficiency by feature partitioning and referencing during training.


Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance

http://arxiv.org/abs/2407.13982v1

Compressor summary: The study finds that ASR performance varies by dialect and recording quality, suggesting a need for further investigation to address potential fairness issues in speech recognition technology.


Truthfulness of Calibration Measures

http://arxiv.org/abs/2407.13979v1

Compressor summary: The study analyzes the truthfulness of calibration measures in sequential prediction and introduces a new measure called SSCE that ensures optimal truthful predictions.


Double Gradient Reversal Network for Single-Source Domain Generalization in Multi-mode Fault Diagnosis

http://arxiv.org/abs/2407.13978v1

Compressor summary: DGRN is a method for fault diagnosis in industrial systems that uses dual adversarial training and contrastive learning to generate domain-invariant features from single-mode data, improving accuracy on unseen modes.


PlacidDreamer: Advancing Harmony in Text-to-3D Generation

http://arxiv.org/abs/2407.13976v1

Compressor summary: PlacidDreamer is a text-to-3D framework that uses a multi-view diffusion model to harmonize initialization, multi-view generation, and text-conditioned generation, while solving the limitations of previous methods in score distillation.


Personalized Privacy Protection Mask Against Unauthorized Facial Recognition

http://arxiv.org/abs/2407.13975v1

Compressor summary: Chameleon is a system that generates user-centric privacy masks for facial images to protect them from unauthorized face recognition, keeping quality loss minimal while remaining robust.


Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference

http://arxiv.org/abs/2407.13974v1

Compressor summary: ADDP is a novel method that tackles continual learning for rPPG measurement using adapter finetuning, domain prototypes, feature augmentation, and inference simplification strategies.


Optimizing Agricultural Order Fulfillment Systems: A Hybrid Tree Search Approach

http://arxiv.org/abs/2407.13968v1

Compressor summary: The paper proposes an adaptive hybrid tree search algorithm for optimizing seed order fulfillment in centralized warehouses, considering seasonality, unpredictability, and deadlines.


The Group Robustness is in the Details: Revisiting Finetuning under Spurious Correlations

http://arxiv.org/abs/2407.13957v1

Compressor summary: Key points:
- Modern machine learning models can rely on spurious correlations and perform poorly on minority groups
- Class-balancing techniques may decrease worst-group accuracy (WGA) over time or depending on group structure
- Scaling pretrained models and appropriate class-balancing can improve WGA
- Spectral imbalance in features is a potential source of group disparities
Summary: The paper investigates how finetuned machine learning models behave on minority groups and identifies factors such as class-balancing, scaling, and spectral imbalance that affect worst-group accuracy.


Neural topology optimization: the good, the bad, and the ugly

http://arxiv.org/abs/2407.13954v1

Compressor summary: Neural topology optimization uses neural networks to reparameterize the decision space and shape the optimization landscape, influencing convergence and exploration depending on the network architecture.