arxiv compressed, 2024-09-05

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-05, generated by the compressor, my personal LLM-based project.


HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

http://arxiv.org/abs/2409.02919v1

Compressor summary: HiPrompt is a new method for generating high-resolution images using diffusion models, which improves the quality by providing both global and local guidance with hierarchical prompts and conditioning noise components on different prompt levels.


UC-NeRF: Uncertainty-aware Conditional Neural Radiance Fields from Endoscopic Sparse Views

http://arxiv.org/abs/2409.02917v1

Compressor summary: The paper proposes a new technique called uncertainty-aware conditional NeRF to improve visualization of surgical scenes by addressing challenges such as sparse views and photometric inconsistencies.


Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling

http://arxiv.org/abs/2409.02908v1

Compressor summary: This paper reveals a theoretical issue with masked diffusion models (MDMs) and proposes a faster sampler that addresses it, showing that MDMs are not superior to auto-regressive models (ARMs).
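
The sampling issue is numerical: the paper argues that low-precision Gumbel-based categorical sampling is biased toward the mode, acting like a lowered temperature. Below is a minimal numpy sketch of the Gumbel-max trick at two precisions, just to show what one would check; the distribution, sample count, and any visible gap are illustrative assumptions, not the paper's experiment.

```python
import numpy as np

def gumbel_max_sample(logits, n, rng, dtype):
    """Draw n categorical samples via the Gumbel-max trick at a given precision."""
    logits = logits.astype(dtype)
    u = rng.random((n, logits.size)).astype(dtype)
    g = -np.log(-np.log(u))  # Gumbel(0,1) noise; its tails get truncated at low precision
    return np.argmax(logits + g, axis=1)

rng = np.random.default_rng(0)
probs = np.array([0.97, 0.02, 0.01])  # a peaked categorical, as at confident denoising steps
logits = np.log(probs)

# compare empirical frequencies against the target distribution at two precisions
for dtype in (np.float32, np.float64):
    samples = gumbel_max_sample(logits, 1_000_000, rng, dtype)
    freq = np.bincount(samples, minlength=probs.size) / samples.size
    print(dtype.__name__, np.round(freq, 4))
```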


Topological Methods in Machine Learning: A Tutorial for Practitioners

http://arxiv.org/abs/2409.02901v1

Compressor summary: Topological machine learning (TML) uses algebraic topology techniques to analyze complex data structures and reveal insights hidden from traditional machine learning methods.
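
As a concrete taste of the topic, here is a self-contained numpy sketch of 0-dimensional persistent homology (connected-component lifetimes), computed Kruskal-style; a real tutorial workflow would use a library such as GUDHI or Ripser, and the toy point cloud is my own assumption.

```python
import numpy as np
from itertools import combinations

def persistence_0d(points):
    """0-dim persistence of a Vietoris-Rips filtration: every point is born at
    radius 0, and a component dies when an edge of the growing graph merges it."""
    n = len(points)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    # Kruskal: process edges by increasing length; each merge kills one component
    edges = sorted((np.linalg.norm(points[i] - points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)            # one (birth=0, death=d) pair per merge
    return [(0.0, d) for d in deaths]   # the final component never dies (infinite bar)

rng = np.random.default_rng(1)
# two well-separated clusters: expect one finite bar much longer than the rest
pts = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
bars = persistence_0d(pts)
print("longest finite bar:", max(d for _, d in bars))
```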


LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

http://arxiv.org/abs/2409.02897v1

Compressor summary: The authors propose a method to improve the trustworthiness of large language models by generating accurate answers with fine-grained sentence-level citations in long-context question answering (LQAC) tasks.


LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

http://arxiv.org/abs/2409.02889v1

Compressor summary: The paper introduces LongLLaVA, a hybrid MLLM that balances efficiency and effectiveness for multi-modal tasks, with improved performance and reduced costs compared to existing models.


Multi-stream deep learning framework to predict mild cognitive impairment with Rey Complex Figure Test

http://arxiv.org/abs/2409.02883v1

Compressor summary: The authors developed a multi-stream deep learning framework to improve the reliability and accuracy of detecting mild cognitive impairment using the Rey Complex Figure Test, which could be useful in clinical settings.


Benchmarking Spurious Bias in Few-Shot Image Classifiers

http://arxiv.org/abs/2409.02882v1

Compressor summary: The paper introduces FewSTAB, a benchmarking system to assess and improve the robustness of few-shot image classifiers against spurious bias using pre-trained vision-language models and existing test data.


Configurable Foundation Models: Building LLMs from a Modular Perspective

http://arxiv.org/abs/2409.02877v1

Compressor summary: This paper proposes configurable foundation models using functional modules called bricks, which can improve efficiency and scalability of large language models by allowing dynamic configuration based on instructions.


Look Into the LITE in Deep Learning for Time Series Classification

http://arxiv.org/abs/2409.02869v1

Compressor summary: The paper introduces LITE, a lightweight deep learning architecture for time series classification that is faster, consumes fewer resources, and performs well on multivariate time series data.


The Impact of Balancing Real and Synthetic Data on Accuracy and Fairness in Face Recognition

http://arxiv.org/abs/2409.02867v1

Compressor summary: The paper explores how using balanced authentic and synthetic data affects face recognition accuracy and fairness, finding that diffusion-based models improve accuracy while pre-trained generative methods have little impact on fairness.


Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure

http://arxiv.org/abs/2409.02866v1

Compressor summary: The Hybrid-Segmentor model uses self-attention to detect and segment different types of cracks in infrastructure with high accuracy and generalization capabilities.


Bioinformatics Retrieval Augmentation Data (BRAD) Digital Assistant

http://arxiv.org/abs/2409.02864v1

Compressor summary: BRAD is a prototype digital assistant that can handle various bioinformatics tasks, such as question-answering, software pipeline execution, and automation of workflows.


Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models

http://arxiv.org/abs/2409.02851v1

Compressor summary: Human-VDM is a novel method for generating high-quality 3D humans from a single RGB image using Video Diffusion Models and Gaussian Splatting, addressing inconsistent view issues in existing methods.


Oops, I Sampled it Again: Reinterpreting Confidence Intervals in Few-Shot Learning

http://arxiv.org/abs/2409.02850v1

Compressor summary: The paper shows that using sampling with replacement in few-shot learning leads to misleading confidence intervals and proposes methods to improve them.
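
The effect is easy to simulate: episodes resampled with replacement from one fixed test pool are not independent draws from the task distribution, so a naive interval concentrates around the pool's accuracy rather than the population's. A numpy toy with assumed numbers (pool size, episode count), not the paper's protocol:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.7        # population per-query accuracy of a hypothetical classifier
pool_size = 500     # fixed test set from which all episodes are drawn
episodes, queries, trials = 1000, 15, 300

covered = 0
for _ in range(trials):
    pool = rng.random(pool_size) < p_true        # fixed pool of per-sample outcomes
    # episodes sampled WITH replacement from the same pool (the standard protocol)
    idx = rng.integers(0, pool_size, size=(episodes, queries))
    acc = pool[idx].mean(axis=1)                 # per-episode accuracy
    m, se = acc.mean(), acc.std(ddof=1) / np.sqrt(episodes)
    # a naive 95% CI treats episodes as independent draws from the task distribution
    covered += abs(m - p_true) <= 1.96 * se
print(f"naive CI coverage of the true accuracy: {covered / trials:.2f} (nominal 0.95)")
```

With these settings the nominal 95% interval covers the true accuracy far less often than advertised, which is the pitfall the paper formalizes.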


MaDis-Stereo: Enhanced Stereo Matching via Distilled Masked Image Modeling

http://arxiv.org/abs/2409.02846v1

Compressor summary: The paper proposes MaDis-Stereo, a Transformer-based stereo matching model that uses Masked Image Modeling and knowledge distillation to improve performance on data-scarce stereo datasets such as ETH3D and KITTI 2015.


Historical German Text Normalization Using Type- and Token-Based Language Modeling

http://arxiv.org/abs/2409.02841v1

Compressor summary: The report presents a machine-learning normalization system for historical German texts based on Transformer language models, achieving state-of-the-art accuracy with limited training data.


R2GQA: Retriever-Reader-Generator Question Answering System to Support Students Understanding Legal Regulations in Higher Education

http://arxiv.org/abs/2409.02840v1

Compressor summary: The R2GQA system helps students understand legal regulations by retrieving, reading, and generating answers from documents using advanced techniques and a new Vietnamese dataset.


Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models

http://arxiv.org/abs/2409.02836v1

Compressor summary: The study analyzes different types of statements and emotions in cryptocurrency discussions using advanced natural language processing techniques and finds distinct patterns and interplay between predictive, hope, and regret sentiments across five cryptocurrencies.


CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models

http://arxiv.org/abs/2409.02834v1

Compressor summary: The paper introduces a Chinese multimodal math dataset to evaluate and enhance large multimodal models' mathematical reasoning skills, and proposes a new model (Math-LMM) that improves performance on various problem types.


ExpLLM: Towards Chain of Thought for Facial Expression Recognition

http://arxiv.org/abs/2409.02828v1

Compressor summary: The paper proposes a novel method called ExpLLM that uses large language models to generate a chain of thought for accurate facial expression recognition, outperforming current methods and even GPT-4o in recognizing micro-expressions.


Deep Learning Meets Satellite Images -- An Evaluation on Handcrafted and Learning-based Features for Multi-date Satellite Stereo Images

http://arxiv.org/abs/2409.02825v1

Compressor summary: The paper compares different feature extraction and matching methods for digital surface models generation from satellite images, showing that traditional methods can be competitive with deep learning ones.


MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

http://arxiv.org/abs/2409.02813v1

Compressor summary: The paper presents MMMU-Pro, a challenging benchmark for multimodal models that tests their ability to integrate visual and textual information by embedding questions within images, and shows that current models perform significantly worse on it.


Boosting Certificate Robustness for Time Series Classification with Efficient Self-Ensemble

http://arxiv.org/abs/2409.02802v1

Compressor summary: The paper proposes a self-ensemble method that reduces the variance of classification margins to boost the certified adversarial robustness of time series classifiers, improving on Randomized Smoothing's provable robustness radius while limiting computational overhead.
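
For context, Randomized Smoothing certifies an L2 radius from the smoothed classifier's top-class probability, R = sigma * Phi^{-1}(p_top). A minimal numpy/scipy sketch on a toy 1-D classifier; the paper's self-ensemble and time-series specifics are not reproduced, and a rigorous certificate would lower-bound p_top with a binomial confidence interval rather than the plug-in estimate used here.

```python
import numpy as np
from scipy.stats import norm

def base_classifier(x):
    """Toy binary classifier on 1-D inputs: predicts class 1 iff x > 0."""
    return (x > 0).astype(int)

def smoothed_predict(x, sigma, n, rng):
    """Majority vote under Gaussian noise, with certified L2 radius
    R = sigma * Phi^{-1}(p_top) from the empirical top-class probability."""
    votes = base_classifier(x + rng.normal(0.0, sigma, n))
    p_top = max(votes.mean(), 1.0 - votes.mean())
    pred = int(votes.mean() > 0.5)
    return pred, sigma * norm.ppf(p_top)

rng = np.random.default_rng(0)
pred, radius = smoothed_predict(x=0.8, sigma=0.5, n=10_000, rng=rng)
print(f"smoothed prediction: class {pred}, certified L2 radius ~ {radius:.3f}")
```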


Towards a Unified View of Preference Learning for Large Language Models: A Survey

http://arxiv.org/abs/2409.02795v1

Compressor summary: The survey organizes preference-alignment strategies for large language models into four components and provides examples that clarify their strengths and challenges.


UnLearning from Experience to Avoid Spurious Correlations

http://arxiv.org/abs/2409.02792v1

Compressor summary: ULE is a new approach that improves robustness of deep neural networks by training two models in parallel: a student model that learns spurious correlations and a teacher model that unlearns the student's mistakes.


Unifying Causal Representation Learning with the Invariance Principle

http://arxiv.org/abs/2409.02772v1

Compressor summary: The paper shows that many causal representation learning methods align representations to data symmetries, not necessarily causal ones, and proposes a unified method that can use different assumptions based on relevant invariances for applications like treatment effect estimation.


An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting

http://arxiv.org/abs/2409.02760v1

Compressor summary: The paper proposes an active learning approach for learning non-monotonic preferences in MCS problems using max-margin optimization and information amount measurement.


A Comparative Study of Pre-training and Self-training

http://arxiv.org/abs/2409.02751v1

Compressor summary: The study finds that pre-training followed by fine-tuning is the strongest semi-supervised learning approach for sentiment analysis and natural language inference tasks, with self-training offering no additional advantage on top of it.
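
For reference, the self-training baseline (iteratively pseudo-labeling unlabeled examples the model is confident about) fits in a few lines of scikit-learn; the synthetic data and confidence threshold below are illustrative choices, not the study's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# hide 90% of the training labels; -1 marks "unlabeled" for scikit-learn
rng = np.random.default_rng(0)
y_semi = y_train.copy()
y_semi[rng.random(len(y_semi)) < 0.9] = -1

# self-training: repeatedly fit, then pseudo-label high-confidence unlabeled points
clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
clf.fit(X_train, y_semi)
print("test accuracy:", round(clf.score(X_test, y_test), 3))
```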


Tractable Offline Learning of Regular Decision Processes

http://arxiv.org/abs/2409.02747v1

Compressor summary: The paper proposes two new techniques for offline RL in non-Markovian environments, improving on previous algorithms by using a formal language pseudometric and Count-Min-Sketch.


Task-Oriented Communication for Graph Data: A Graph Information Bottleneck Approach

http://arxiv.org/abs/2409.02728v1

Compressor summary: The paper introduces a method to create small, task-focused subgraphs from large graphs using GNNs and GIB principle, which reduces communication costs while preserving essential information.
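
Schematically, the idea is: score nodes, keep a top-k induced subgraph, and transmit only that. A numpy sketch with one GCN-style propagation standing in for the GNN; the actual method learns the selection with a graph information bottleneck objective rather than the untrained weights used here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 12, 8, 5
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                 # random undirected graph
X = rng.normal(size=(n, d))                    # node features

# one GCN-style propagation as a stand-in scorer: D^{-1/2} (A+I) D^{-1/2} X W
A_hat = A + np.eye(n)
d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
W = rng.normal(size=(d, 1))                    # untrained weights, illustration only
scores = (d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W).ravel()

# keep the top-k nodes and transmit only the induced subgraph
keep = np.sort(np.argsort(scores)[-k:])
A_sub, X_sub = A[np.ix_(keep, keep)], X[keep]
print("nodes kept:", keep.tolist(), "| edges kept:", int(A_sub.sum() / 2))
```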


Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?

http://arxiv.org/abs/2409.02727v1

Compressor summary: This study explores effective pooling and attention designs for LLM-based embedding models through large-scale experiments and proposes Multi-Layers Trainable Pooling, a new pooling method that improves performance on text similarity and retrieval tasks.
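
Two of the standard baselines in this design space, last-token pooling and mask-aware mean pooling, look like this in torch; the paper's Multi-Layers Trainable Pooling aggregates hidden states across layers with trained weights and is not reproduced in this sketch.

```python
import torch

def last_token_pool(hidden, mask):
    """Embedding = hidden state of each sequence's final non-padding token."""
    last = mask.sum(dim=1) - 1                  # index of last real token per sequence
    return hidden[torch.arange(hidden.size(0)), last]

def mean_pool(hidden, mask):
    """Mask-aware mean over token states, ignoring padding positions."""
    m = mask.unsqueeze(-1).float()
    return (hidden * m).sum(dim=1) / m.sum(dim=1).clamp(min=1)

hidden = torch.randn(4, 10, 32)                 # [batch, seq, dim] final-layer states
mask = (torch.arange(10) < torch.tensor([[10], [7], [5], [9]])).long()
print(last_token_pool(hidden, mask).shape, mean_pool(hidden, mask).shape)
```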


Pre-training data selection for biomedical domain adaptation using journal impact metrics

http://arxiv.org/abs/2409.02725v1

Compressor summary: The study investigates how using specific quality metrics for scientific papers to refine a pre-training dataset affects BERT's performance on biomedical language understanding tasks.


A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations

http://arxiv.org/abs/2409.02712v1

Compressor summary: This study proposes a data filtering approach using cross-lingual sentence representations to improve machine translation quality for low-resource English-Marathi language pairs.


Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL

http://arxiv.org/abs/2409.02711v1

Compressor summary: PostNL developed an AI system called SuperTracy to improve parcel tracking communication using generative AI technologies like Retrieval-Augmented Generation.


Few-shot Multi-Task Learning of Linear Invariant Features with Meta Subspace Pursuit

http://arxiv.org/abs/2409.02708v1

Compressor summary: Meta-SP is a new algorithm for multi-task linear models that learns an invariant low-rank subspace shared by different tasks and outperforms other methods.
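
The shared-structure assumption can be illustrated in a few lines: stack noisy per-task least-squares solutions and read the common subspace off an SVD. Meta-SP itself is an iterative pursuit algorithm with guarantees; this one-shot numpy sketch only shows the low-rank idea.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, tasks, n = 20, 3, 30, 100
U = np.linalg.qr(rng.normal(size=(d, r)))[0]   # ground-truth shared r-dim subspace

W_hat = []
for _ in range(tasks):
    w = U @ rng.normal(size=r)                 # each task's weights live in span(U)
    X = rng.normal(size=(n, d))
    y = X @ w + 0.1 * rng.normal(size=n)
    W_hat.append(np.linalg.lstsq(X, y, rcond=None)[0])  # per-task least squares

# left singular vectors of the stacked estimates recover the shared subspace
U_est = np.linalg.svd(np.array(W_hat).T, full_matrices=False)[0][:, :r]
err = np.linalg.norm(U_est @ U_est.T - U @ U.T)         # projector distance
print(f"subspace recovery error: {err:.3f}")
```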


Decision Transformer for Enhancing Neural Local Search on the Job Shop Scheduling Problem

http://arxiv.org/abs/2409.02697v1

Compressor summary: The paper improves a deep reinforcement learning agent for job shop scheduling by training a Decision Transformer on neural local search trajectories, achieving state-of-the-art results among machine-learning-enhanced search methods.


Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs

http://arxiv.org/abs/2409.02686v1

Compressor summary: The paper explores why large language models struggle with reasoning tasks, proposes a causal framework to understand their limitations, and introduces Deconfounded Causal Adaptation (DCA), a novel fine-tuning method that improves their performance with minimal parameters.


Rethinking HTG Evaluation: Bridging Generation and Recognition

http://arxiv.org/abs/2409.02683v1

Compressor summary: The paper introduces three new metrics for evaluating handwriting generation models that consider style, content, and diversity, and shows they are better than existing metrics like FID.


Neural Networks with LSTM and GRU in Modeling Active Fires in the Amazon

http://arxiv.org/abs/2409.02681v1

Compressor summary: The study uses a mixed RNN model with LSTM and GRU architectures to predict monthly fire spot counts in the Amazon using satellite data, showing improved accuracy and capturing seasonal patterns.
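
A minimal torch sketch of a mixed recurrent model in this spirit: parallel LSTM and GRU branches whose final hidden states are concatenated into a regression head for monthly counts. Input shapes and layer sizes are assumptions, not the study's architecture.

```python
import torch
import torch.nn as nn

class MixedRNN(nn.Module):
    """Parallel LSTM and GRU branches over a monthly series; their final
    hidden states are concatenated into a count-regression head."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                  # x: [batch, months, features]
        _, (h_lstm, _) = self.lstm(x)
        _, h_gru = self.gru(x)
        return self.head(torch.cat([h_lstm[-1], h_gru[-1]], dim=-1)).squeeze(-1)

model = MixedRNN(n_features=4)
x = torch.randn(8, 24, 4)                  # 8 regions, 24 months, 4 covariates
print(model(x).shape)                      # torch.Size([8])
```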


Independence Constrained Disentangled Representation Learning from Epistemological Perspective

http://arxiv.org/abs/2409.02672v1

Compressor summary: The paper proposes a novel method for disentangled representation learning that combines mutual information and independence constraints within a GAN framework, improving the quality of controllable generation and explainability.


Creating Domain-Specific Translation Memories for Machine Translation Fine-tuning: The TRENCARD Bilingual Cardiology Corpus

http://arxiv.org/abs/2409.02667v1

Compressor summary: The article presents a semi-automatic method to create domain-specific translation memories from Turkish cardiology journals for various language applications.


Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

http://arxiv.org/abs/2409.02664v1

Compressor summary: The paper proposes a novel method to improve deepfake detection by repurposing vision-language models and manipulating their input without tuning the model parameters.


PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation

http://arxiv.org/abs/2409.02657v1

Compressor summary: PoseTalk is a system that generates lip-synchronized talking head videos with free head poses using both audio and text inputs, addressing loss-imbalance issues and achieving better pose diversity and realness.


Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects

http://arxiv.org/abs/2409.02653v1

Compressor summary: The paper proposes Skip-and-Play, a depth-based pose control method for text-to-image generation that reduces shape dependency on depth maps while preserving the pose.


Learning-Based Error Detection System for Advanced Vehicle Instrument Cluster Rendering

http://arxiv.org/abs/2409.02647v1

Compressor summary: The text proposes a learning-based monitoring approach for automotive digital instrument clusters that detects and counters rendering errors by classifying telltale icons as intact or corrupted.


MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos

http://arxiv.org/abs/2409.02638v1

Compressor summary: MADiff is a method that predicts future hand waypoints in egocentric videos using diffusion models with motion-aware denoising and scene understanding, achieving real-time performance and reasonable results.


Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

http://arxiv.org/abs/2409.02634v1

Compressor summary: The paper introduces Loopy, an end-to-end audio-only conditioned video diffusion model that generates natural and high-quality human video without spatial motion templates.


Evaluating Environments Using Exploratory Agents

http://arxiv.org/abs/2409.02632v1

Compressor summary: The paper presents an exploratory agent that evaluates and optimizes procedurally generated game levels for player exploration based on motivations and a fitness function.


(Implicit) Ensembles of Ensembles: Epistemic Uncertainty Collapse in Large Models

http://arxiv.org/abs/2409.02628v1

Compressor summary: Epistemic uncertainty collapse occurs in deep learning models as complexity increases due to implicit ensembling, challenging the assumption of better uncertainty quantification with larger models.
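
The mechanism is simple enough to simulate: if each ensemble member is implicitly an average of k sub-models, disagreement across members (the usual epistemic signal) shrinks roughly as 1/sqrt(k), even though no member saw more data. A numpy toy illustration under that assumption, not the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
members = 10

for k in (1, 4, 16, 64):
    # each member's prediction is the mean of k noisy sub-model predictions
    preds = rng.normal(0.0, 1.0, (members, k)).mean(axis=1)
    # epistemic uncertainty proxy: spread of predictions across ensemble members
    print(f"k={k:3d}  member disagreement (std): {preds.std(ddof=1):.3f}")
```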


PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation

http://arxiv.org/abs/2409.02617v1

Compressor summary: The paper introduces a new dataset to evaluate how well large language models understand various types of visual data, using text prompts and questions related to images.


GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering

http://arxiv.org/abs/2409.02611v1

Compressor summary: The paper proposes a novel Graph-of-Thought guided compositional reasoning model called GoT-CQA for answering questions based on chart content, which performs well in complex reasoning tasks.


A Medical Multimodal Large Language Model for Pediatric Pneumonia

http://arxiv.org/abs/2409.02608v1

Compressor summary:


Hypothesizing Missing Causal Variables with LLMs

http://arxiv.org/abs/2409.02604v1

Compressor summary:


SurgTrack: CAD-Free 3D Tracking of Real-world Surgical Instruments

http://arxiv.org/abs/2409.02598v1

Compressor summary:


An Analysis of Linear Complexity Attention Substitutes with BEST-RQ

http://arxiv.org/abs/2409.02596v1

Compressor summary:


Multiview Random Vector Functional Link Network for Predicting DNA-Binding Proteins

http://arxiv.org/abs/2409.02588v1

Compressor summary:


BMI Prediction from Handwritten English Characters Using a Convolutional Neural Network

http://arxiv.org/abs/2409.02584v1

Compressor summary:


Solving Video Inverse Problems Using Image Diffusion Models

http://arxiv.org/abs/2409.02574v1

Compressor summary:


More is More: Addition Bias in Large Language Models

http://arxiv.org/abs/2409.02569v1

Compressor summary:


How Do You Perceive My Face? Recognizing Facial Expressions in Multi-Modal Context by Modeling Mental Representations

http://arxiv.org/abs/2409.02566v1

Compressor summary:


Interacting Multiple Model-based Joint Homography Matrix and Multiple Object State Estimation

http://arxiv.org/abs/2409.02562v1

Compressor summary:


Vision-Language Navigation with Continual Learning

http://arxiv.org/abs/2409.02561v1

Compressor summary:


Low-Resolution Object Recognition with Cross-Resolution Relational Contrastive Distillation

http://arxiv.org/abs/2409.02555v1

Compressor summary:


A Sequential Decision-Making Model for Perimeter Identification

http://arxiv.org/abs/2409.02549v1

Compressor summary:


Real-Time Dynamic Scale-Aware Fusion Detection Network: Take Road Damage Detection as an example

http://arxiv.org/abs/2409.02546v1

Compressor summary:


UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching

http://arxiv.org/abs/2409.02545v1

Compressor summary:


StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

http://arxiv.org/abs/2409.02543v1

Compressor summary:


Understanding eGFR Trajectories and Kidney Function Decline via Large Multimodal Models

http://arxiv.org/abs/2409.02530v1

Compressor summary:


Sample what you can't compress

http://arxiv.org/abs/2409.02529v1

Compressor summary:


Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments

http://arxiv.org/abs/2409.02522v1

Compressor summary:


Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts

http://arxiv.org/abs/2409.02519v1

Compressor summary:


Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

http://arxiv.org/abs/2409.02512v1

Compressor summary:


Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation

http://arxiv.org/abs/2409.02494v1

Compressor summary:


Reliable Deep Diffusion Tensor Estimation: Rethinking the Power of Data-Driven Optimization Routine

http://arxiv.org/abs/2409.02492v1

Compressor summary:


TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT

http://arxiv.org/abs/2409.02490v1

Compressor summary:


Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization

http://arxiv.org/abs/2409.02486v1

Compressor summary:


Volumetric Surfaces: Representing Fuzzy Geometries with Multiple Meshes

http://arxiv.org/abs/2409.02482v1

Compressor summary:


Word and Phrase Features in Graph Convolutional Network for Automatic Question Classification

http://arxiv.org/abs/2409.02481v1

Compressor summary:


DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels

http://arxiv.org/abs/2409.02465v1

Compressor summary:


What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations

http://arxiv.org/abs/2409.02449v1

Compressor summary:


Detecting Korean Food Using Image using Hierarchical Model

http://arxiv.org/abs/2409.02448v1

Compressor summary:


ForeCal: Random Forest-based Calibration for DNNs

http://arxiv.org/abs/2409.02446v1

Compressor summary:


Non-target Divergence Hypothesis: Toward Understanding Domain Gaps in Cross-Modal Knowledge Distillation

http://arxiv.org/abs/2409.02438v1

Compressor summary:


Adversarial Learning for Neural PDE Solvers with Sparse Data

http://arxiv.org/abs/2409.02431v1

Compressor summary:


Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis

http://arxiv.org/abs/2409.02429v1

Compressor summary:


Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning

http://arxiv.org/abs/2409.02428v1

Compressor summary:


Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering

http://arxiv.org/abs/2409.02426v1

Compressor summary:


Relative-Translation Invariant Wasserstein Distance

http://arxiv.org/abs/2409.02416v1

Compressor summary:


Abstractive Text Summarization: State of the Art, Challenges, and Improvements

http://arxiv.org/abs/2409.02413v1

Compressor summary:


Adaptive Class Emergence Training: Enhancing Neural Network Stability and Generalization through Progressive Target Evolution

http://arxiv.org/abs/2409.02410v1

Compressor summary:


Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation

http://arxiv.org/abs/2409.02404v1

Compressor summary:


Determination of language families using deep learning

http://arxiv.org/abs/2409.02393v1

Compressor summary:


Building Math Agents with Multi-Turn Iterative Preference Learning

http://arxiv.org/abs/2409.02392v1

Compressor summary:


Multi-modal Situated Reasoning in 3D Scenes

http://arxiv.org/abs/2409.02389v1

Compressor summary:


Large Language Models and Cognitive Science: A Comprehensive Review of Similarities, Differences, and Challenges

http://arxiv.org/abs/2409.02387v1

Compressor summary:


Unified Framework with Consistency across Modalities for Human Activity Recognition

http://arxiv.org/abs/2409.02385v1

Compressor summary:


STAB: Speech Tokenizer Assessment Benchmark

http://arxiv.org/abs/2409.02384v1

Compressor summary:


Coral Model Generation from Single Images for Virtual Reality Applications

http://arxiv.org/abs/2409.02376v1

Compressor summary:


How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review

http://arxiv.org/abs/2409.02375v1

Compressor summary:


Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing

http://arxiv.org/abs/2409.02374v1

Compressor summary:


Do Large Language Models Possess Sensitive to Sentiment?

http://arxiv.org/abs/2409.02370v1

Compressor summary:


Optimal Neural Network Approximation for High-Dimensional Continuous Functions

http://arxiv.org/abs/2409.02363v1

Compressor summary:


Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering

http://arxiv.org/abs/2409.02361v1

Compressor summary:


Understanding the Role of Functional Diversity in Weight-Ensembling with Ingredient Selection and Multidimensional Scaling

http://arxiv.org/abs/2409.02347v1

Compressor summary:


NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval

http://arxiv.org/abs/2409.02343v1

Compressor summary: