arxiv compressed, 2024-08-29

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-29, generated by the compressor, my personal LLM-based project.


Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

http://arxiv.org/abs/2408.15998v1

Compressor summary: This study explores the design space for multimodal language models using vision encoders, finding that simple concatenation of visual tokens and pre-alignment improves performance on complex tasks.
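
A minimal sketch of the general "concatenate visual tokens from multiple vision encoders" idea the summary mentions, assuming PyTorch; the module and dimensions below are illustrative stand-ins, not Eagle's actual architecture.

```python
import torch
import torch.nn as nn

class ConcatVisionFusion(nn.Module):
    """Illustrative sketch: fuse tokens from several vision encoders by
    channel concatenation, then project into the LLM's embedding space."""

    def __init__(self, encoder_dims, llm_dim):
        super().__init__()
        self.proj = nn.Linear(sum(encoder_dims), llm_dim)

    def forward(self, features):
        # features: list of tensors, each (batch, num_tokens, dim_i),
        # assumed spatially aligned so tokens can be concatenated channel-wise.
        fused = torch.cat(features, dim=-1)   # (batch, num_tokens, sum(dims))
        return self.proj(fused)               # (batch, num_tokens, llm_dim)

# Toy usage with two fake encoders producing 16 aligned tokens each.
f1 = torch.randn(2, 16, 768)
f2 = torch.randn(2, 16, 1024)
fusion = ConcatVisionFusion([768, 1024], llm_dim=4096)
print(fusion([f1, f2]).shape)  # torch.Size([2, 16, 4096])
```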


Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need

http://arxiv.org/abs/2408.15997v1

Compressor summary: Mixture of Universals (MoU) is a versatile model that combines short-term and long-term dependencies for enhanced time series forecasting performance at low computational cost.


Spatio-Temporal Context Prompting for Zero-Shot Action Detection

http://arxiv.org/abs/2408.15996v1

Compressor summary: The paper proposes a method that uses pretrained image-language models for spatio-temporal action detection, incorporating person-context interaction and context prompting to handle unseen actions and multi-action videos.


TEDRA: Text-based Editing of Dynamic and Photoreal Actors

http://arxiv.org/abs/2408.15995v1

Compressor summary: TEDRA is a method that allows text-based editing of realistic 3D avatars while maintaining their fidelity, dynamics, and pose control.


Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration

http://arxiv.org/abs/2408.15994v1

Compressor summary: Perceive-IR is an all-in-one image restorer that uses prompt learning and quality-aware strategies to achieve fine-grained quality control for different types and severities of image degradation.


ClimDetect: A Benchmark Dataset for Climate Change Detection and Attribution

http://arxiv.org/abs/2408.15993v1

Compressor summary: ClimDetect is a standardized dataset that helps improve detection and attribution of climate change signals using deep learning and vision transformers, enabling more consistent model evaluation and supporting climate science.


CoGen: Learning from Feedback with Coupled Comprehension and Generation

http://arxiv.org/abs/2408.15992v1

Compressor summary: The text describes a study that improves language comprehension and generation by tightly integrating them and learning from user interactions, resulting in a more human-like system with up to 26% better performance.


Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

http://arxiv.org/abs/2408.15991v1

Compressor summary: The authors propose a new method called Distribution Backtracking Distillation (DisBack) that improves the speed and quality of training student diffusion models by using the entire convergence trajectory of teacher models.


WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

http://arxiv.org/abs/2408.15978v1

Compressor summary: WebPilot is a multi-agent system that combines strategic exploration and complex decision-making using MCTS to improve LLM-based web agents' adaptability and performance in dynamic, uncertain tasks.
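
The summary mentions MCTS-driven exploration; below is a minimal, generic UCT selection step (the core of MCTS), assuming Python. It is not WebPilot's agent; the action names and statistics are made up.

```python
import math

# Minimal UCT (Upper Confidence bound for Trees) selection, the core of MCTS.
# `children` maps actions to (visit_count, total_reward) statistics
# accumulated by earlier simulations.

def uct_select(children, total_visits, c=1.4):
    """Pick the child action balancing exploitation and exploration."""
    def score(stats):
        visits, reward_sum = stats
        if visits == 0:
            return float("inf")  # always try unvisited actions first
        exploit = reward_sum / visits
        explore = c * math.sqrt(math.log(total_visits) / visits)
        return exploit + explore
    return max(children, key=lambda a: score(children[a]))

children = {"click_login": (10, 6.0), "fill_form": (3, 2.5), "scroll": (0, 0.0)}
print(uct_select(children, total_visits=13))  # picks the unvisited "scroll" first
```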


BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems

http://arxiv.org/abs/2408.15971v1

Compressor summary: BattleAgentBench is a benchmark to evaluate language models' collaboration abilities in single-agent, paired-agent, and multi-agent scenarios of varying difficulty levels.


More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

http://arxiv.org/abs/2408.15966v1

Compressor summary: The paper proposes a new task for large language models to understand 3D objects with minimal data, introducing GreenPLM, which uses additional text data to compensate for the lack of 3D data and achieve robust 3D understanding.


Efficient Slice Anomaly Detection Network for 3D Brain MRI Volume

http://arxiv.org/abs/2408.15958v1

Compressor summary: SimpleSliceNet uses a pre-trained 2D model to extract features from 3D brain MRI slices, improving anomaly detection accuracy and reducing computational cost.
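
A sketch of the general slice-wise pattern the summary describes, assuming PyTorch/torchvision: run each 2D slice of a 3D volume through a pretrained 2D backbone. The ResNet-18 backbone and tensor shapes are stand-ins, not the paper's setup.

```python
import torch
import torchvision.models as models

# Generic sketch: treat each axial slice of a 3D MRI volume as a 2D image and
# extract features with a 2D CNN. ResNet-18 is an arbitrary stand-in backbone.
backbone = models.resnet18(weights=None)          # weights=None to avoid a download here
backbone.fc = torch.nn.Identity()                 # keep 512-d features instead of class logits
backbone.eval()

volume = torch.randn(64, 256, 256)                # (slices, H, W): a fake single-channel MRI
slices = volume.unsqueeze(1).repeat(1, 3, 1, 1)   # fake 3 channels expected by the backbone

with torch.no_grad():
    feats = backbone(slices)                      # (64, 512): one feature vector per slice
print(feats.shape)
```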


Fall Detection for Smart Living using YOLOv5

http://arxiv.org/abs/2408.15955v1

Compressor summary: The paper presents a highly accurate fall detection system using YOLOv5mu that works well in different smart home settings and can be improved with more data and sensors.


Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games

http://arxiv.org/abs/2408.15950v1

Compressor summary: This paper explores using multimodal LLMs as low-level controllers in Atari games, evaluating their performance against traditional RL agents and human players.


Local Descriptors Weighted Adaptive Threshold Filtering For Few-Shot Learning

http://arxiv.org/abs/2408.15924v1

Compressor summary: The text proposes WATF, a strategy for local descriptor selection that adapts to image context and improves few-shot image classification by reducing background noise and focusing on category-related information.


DiffAge3D: Diffusion-based 3D-aware Face Aging

http://arxiv.org/abs/2408.15922v1

Compressor summary: DiffAge3D is a novel 3D face aging framework that performs faithful aging, identity preservation, and works in a 3D setting using a 3D GAN and CLIP model.


Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

http://arxiv.org/abs/2408.15915v1

Compressor summary: The authors propose a method to improve large language models' expertise in specific domains by using few human-annotated samples and a mixture-of-experts system that emphasizes diversity and problem-solving abilities.


CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

http://arxiv.org/abs/2408.15914v1

Compressor summary: CoRe is a method to improve text-to-image personalization by regularizing the context tokens around a new concept, enhancing its semantic understanding and integration with existing tokens.


MetaGFN: Exploring Distant Modes with Adapted Metadynamics for Continuous GFlowNets

http://arxiv.org/abs/2408.15905v1

Compressor summary: MetaGFN is a novel exploration algorithm for continuous generative models that uses Adapted Metadynamics to balance exploration and exploitation, resulting in faster convergence and better rewards.


LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments

http://arxiv.org/abs/2408.15903v1

Compressor summary: GMeLLo is a method that combines Knowledge Graphs and Large Language Models to efficiently update and reason about facts in multi-hop questions.


Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts

http://arxiv.org/abs/2408.15901v1

Compressor summary: Nexus is an enhanced MoE architecture that upcycles dense expert models for improved specialization and adaptability to new tasks, achieving significant gains in performance with limited data.


Airfoil Diffusion: Denoising Diffusion Model For Conditional Airfoil Generation

http://arxiv.org/abs/2408.15898v1

Compressor summary: The paper presents a new method to generate airfoils using a data-driven diffusion model that can produce realistic and innovative designs with desired aerodynamic properties.


A New Method for Cross-Lingual-based Semantic Role Labeling

http://arxiv.org/abs/2408.15896v1

Compressor summary: The paper proposes a deep learning model that improves semantic role labeling across multiple languages by using model transfer and limited data from English and Persian corpora, achieving better results than previous models.


Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models

http://arxiv.org/abs/2408.15895v1

Compressor summary: The study shows that Large Language Models (LLMs) have political biases similar to human coders, but unlike humans, LLMs are biased even when faced with statements from moderate parties.


The Role of Fibration Symmetries in Geometric Deep Learning

http://arxiv.org/abs/2408.15894v1

Compressor summary: Geometric Deep Learning (GDL) is improved by incorporating local symmetries in graphs, which enhances the performance and efficiency of Graph Neural Networks (GNNs).


Disentangled Diffusion Autoencoder for Harmonization of Multi-site Neuroimaging Data

http://arxiv.org/abs/2408.15890v1

Compressor summary: The disentangled diffusion autoencoder (DDAE) is a novel diffusion model that generates high-quality, harmonized 2D MR images by controlling specific aspects of an image and preserving biological variability.


LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

http://arxiv.org/abs/2408.15881v1

Compressor summary: LLaVA-MoD is a novel framework that efficiently trains small-scale multimodal language models by distilling knowledge from large-scale ones using a sparse Mixture of Experts architecture and a progressive knowledge transfer strategy.
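
As a generic building block behind such distillation (not LLaVA-MoD's full MoE recipe), a standard logit-level knowledge-distillation loss might look like this, assuming PyTorch:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard logit-level knowledge distillation: KL divergence between
    temperature-softened teacher and student distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradients keep a comparable magnitude across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

student = torch.randn(4, 32000)   # (batch, vocab) logits from the small model
teacher = torch.randn(4, 32000)   # logits from the large model
print(kd_loss(student, teacher).item())
```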


Persuasion Games using Large Language Models

http://arxiv.org/abs/2408.15879v1

Compressor summary: Large Language Models can enhance persuasive dialogue in diverse domains by collaborating with auxiliary agents that perform various tasks, counteracting user resistance, and adapting to different personality types.


Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers (Extended Version)

http://arxiv.org/abs/2408.15874v1

Compressor summary: Robust statistical scaling improves outlier probability transformation using robust estimators, addressing a limitation of common statistical scaling methods in outlier detection.
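
A minimal sketch of the idea, assuming NumPy/SciPy: replace mean and standard deviation with median and MAD when converting outlier scores to probabilities via a Gaussian CDF. The paper's exact choice of robust estimators may differ.

```python
import numpy as np
from scipy.stats import norm

def robust_gaussian_scaling(scores):
    """Map raw outlier scores to [0, 1] probabilities with a Gaussian CDF,
    using robust location/scale estimates (median and MAD) instead of the
    mean and standard deviation."""
    scores = np.asarray(scores, dtype=float)
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) * 1.4826  # consistency factor for normal data
    return norm.cdf((scores - med) / mad)

print(robust_gaussian_scaling([0.1, 0.2, 0.15, 0.18, 5.0]))  # the 5.0 maps close to 1
```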


microYOLO: Towards Single-Shot Object Detection on Microcontrollers

http://arxiv.org/abs/2408.15865v1

Compressor summary: microYOLO is a single-shot object detector that works on small microcontrollers, achieving fast speeds and low memory usage.


What is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector

http://arxiv.org/abs/2408.15857v1

Compressor summary: The study analyzes the YOLOv8 object detection model, its innovations, performance improvements, benchmark results, and developer-friendly features, showcasing it as a leading approach in object detection.


Network transferability of adversarial patches in real-time object detection

http://arxiv.org/abs/2408.15833v1

Compressor summary: This paper explores how adversarial patches can make objects invisible to object detectors and finds that patches optimized with larger models have better transferability across networks and datasets.


SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

http://arxiv.org/abs/2408.15829v1

Compressor summary: SITransformer is a new method for extreme multimodal summarization that uses cross-modal information to create accurate and concise summaries from various types of data.


Automatic Differential Diagnosis using Transformer-Based Multi-Label Sequence Classification

http://arxiv.org/abs/2408.15827v1

Compressor summary: The study develops a transformer-based multi-label classification system for differential diagnosis, trained on patient data from the DDXPlus dataset with data modification modules that improve robustness and generalization; the models achieve over 97% F1 score on the test set and show promising results on a custom evaluation set.


Mining Field Data for Tree Species Recognition at Scale

http://arxiv.org/abs/2408.15816v1

Compressor summary: The method automatically labels tree species in aerial images using pretrained models and public forest inventory data, requiring minimal human input and handling noisy data well.


Object Detection for Vehicle Dashcams using Transformers

http://arxiv.org/abs/2408.15809v1

Compressor summary: The paper proposes using transformers for object detection in dashcams, improving productivity and accuracy in the automotive industry.


Visual Prompt Engineering for Medical Vision Language Models in Radiology

http://arxiv.org/abs/2408.15802v1

Compressor summary: The paper explores how visual prompts, like arrows and circles, can improve VLMs' ability to classify lung nodule malignancy using BiomedCLIP.
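
A toy illustration of a visual prompt, assuming Pillow: mark the region of interest before passing the image to the VLM. The blank canvas and coordinates are placeholders, not data or code from the paper.

```python
from PIL import Image, ImageDraw

# Toy visual prompt: draw a red circle around a region of interest before
# handing the image to a vision-language model.
image = Image.new("RGB", (256, 256), "gray")   # stand-in for a chest CT slice
draw = ImageDraw.Draw(image)
x, y, r = 180, 140, 25                         # hypothetical nodule centre and radius
draw.ellipse([x - r, y - r, x + r, y + r], outline="red", width=3)
image.save("prompted_slice.png")               # the marked image is what the VLM classifies
```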


Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization

http://arxiv.org/abs/2408.15801v1

Compressor summary: EYEGLAXS is a framework that uses large language models to efficiently and accurately summarize long text documents by extracting relevant information, overcoming common issues with abstractive methods.


Language Adaptation on a Tight Academic Compute Budget: Tokenizer Swapping Works and Pure bfloat16 Is Enough

http://arxiv.org/abs/2408.15793v1

Compressor summary: The study finds that continued pretraining of LLMs on a tight academic budget may help Arabic adaptation but hurts German adaptation, and suggests that adjusting training precision and swapping tokenizers can improve efficiency.


Efficient LLM Scheduling by Learning to Rank

http://arxiv.org/abs/2408.15792v1

Compressor summary: The paper proposes a novel scheduler for large language models that uses ranking information to approximate the shortest-job-first schedule and improve performance in serving applications.
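
A schematic of the underlying scheduling idea, assuming Python: order pending requests by a score that tracks predicted generation length, approximating shortest-job-first. The predicted lengths below stand in for a learned ranker's outputs; this is not the paper's system.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Request:
    predicted_len: int                 # ranker's estimate of generation length
    prompt: str = field(compare=False)

queue = []
heapq.heappush(queue, Request(120, "Summarize this 10-page report ..."))
heapq.heappush(queue, Request(8,   "What is 2 + 2?"))
heapq.heappush(queue, Request(40,  "Translate this paragraph ..."))

while queue:
    req = heapq.heappop(queue)         # shortest predicted job is served first
    print(req.predicted_len, req.prompt[:30])
```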


Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions

http://arxiv.org/abs/2408.15787v1

Compressor summary: Researchers propose a framework in which two LLMs role-play counselor and client to simulate psychological counseling sessions aimed at assisting clients with mental health challenges, and they evaluate the resulting synthetic dialogues against human-generated conversations.


Implicit Regularization Paths of Weighted Neural Representations

http://arxiv.org/abs/2408.15784v1

Compressor summary: The paper investigates how different weightings of pretrained features affect the regularization of ridge estimators and proposes a cross-validation method for tuning these weights.


LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models

http://arxiv.org/abs/2408.15778v1

Compressor summary: LogicGame is a novel benchmark for evaluating large language models' comprehension and application of predefined rules in diverse scenarios with varying difficulty levels.


A Survey on Facial Expression Recognition of Static and Dynamic Emotions

http://arxiv.org/abs/2408.15777v1

Compressor summary: This paper surveys facial expression recognition methods, covering image-based and video-based approaches, and discussing challenges and future directions in both domains.


A Survey on Evaluation of Multimodal Large Language Models

http://arxiv.org/abs/2408.15769v1

Compressor summary: This paper reviews various methods to evaluate multimodal large language models (MLLMs) that integrate different sensory encoders with powerful language models, aiming to help researchers improve these models for achieving artificial general intelligence (AGI).


Harmonized Speculative Sampling

http://arxiv.org/abs/2408.15766v1

Compressor summary: HASS improves speculative sampling for LLaMA models by harmonizing training and decoding to increase acceptance rate and reduce inference overhead.
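
For context, a minimal NumPy sketch of the standard speculative-sampling acceptance rule that methods like HASS build on (accept a draft token with probability min(1, p/q), otherwise resample from the residual distribution); the harmonized training itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def accept_draft_token(token, p_target, q_draft):
    """Standard speculative-sampling acceptance test: accept the draft token
    with probability min(1, p/q); on rejection, resample from the residual
    distribution max(0, p - q), renormalized."""
    if rng.random() < min(1.0, p_target[token] / q_draft[token]):
        return token
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual)

p = np.array([0.7, 0.2, 0.1])   # target model distribution over a toy vocabulary
q = np.array([0.4, 0.5, 0.1])   # draft model distribution
print(accept_draft_token(1, p, q))
```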


A Neural Material Point Method for Particle-based Simulations

http://arxiv.org/abs/2408.15753v1

Compressor summary: NeuralMPM is a neural emulation framework that uses image-to-image neural networks to simulate particle-based physics, reducing training times and achieving accuracy comparable or superior to existing methods.


Adaptive Traffic Signal Control Using Reinforcement Learning

http://arxiv.org/abs/2408.15751v1

Compressor summary: The paper proposes using reinforcement learning to optimize traffic signals at intersections, reducing congestion and costs, and presents two RL algorithms that perform better than conventional systems in simulations.
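
A toy tabular Q-learning update for signal-phase selection, assuming Python; the states, actions, and rewards are abstractions for illustration and do not reproduce the paper's algorithms or traffic simulator.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
ACTIONS = ["NS_green", "EW_green"]
Q = defaultdict(float)

def choose_action(state):
    if random.random() < EPS:                      # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One illustrative step: reward is the negative number of queued vehicles.
state = ("long_NS_queue", "short_EW_queue")
action = choose_action(state)
update(state, action, reward=-7, next_state=("short_NS_queue", "short_EW_queue"))
print(dict(Q))
```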


Form and meaning co-determine the realization of tone in Taiwan Mandarin spontaneous speech: the case of Tone 3 sandhi

http://arxiv.org/abs/2408.15747v1

Compressor summary: The study examines how contextual factors affect the pitch contours of two-character words with different tone patterns in spontaneous Taiwan Mandarin conversations.


MambaPlace:Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms

http://arxiv.org/abs/2408.15740v1

Compressor summary: MambaPlace is a new framework that uses language and 3D point cloud information to improve robot localization accuracy by fusing complementary cross-modal features through novel attention mechanisms.


LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models

http://arxiv.org/abs/2408.15729v1

Compressor summary: LM-PUB-QUIZ is an open-source framework and leaderboard that uses the BEAR probe to evaluate relational knowledge in language models and help compare them.


Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks

http://arxiv.org/abs/2408.15721v1

Compressor summary: Text-to-image diffusion models are vulnerable to backdoor attacks, but can be protected by adding small textual perturbations without compromising image quality.


An Evaluation of Sindhi Word Embedding in Semantic Analogies and Downstream Tasks

http://arxiv.org/abs/2408.15720v1

Compressor summary:


Autoregressive model path dependence near Ising criticality

http://arxiv.org/abs/2408.15715v1

Compressor summary:


Pixels to Prose: Understanding the art of Image Captioning

http://arxiv.org/abs/2408.15714v1

Compressor summary:


Conan-embedding: General Text Embedding with More and Better Negative Samples

http://arxiv.org/abs/2408.15710v1

Compressor summary:


Synthetic Forehead-creases Biometric Generation for Reliable User Verification

http://arxiv.org/abs/2408.15693v1

Compressor summary:


TempoFormer: A Transformer for Temporally-aware Representations in Change Detection

http://arxiv.org/abs/2408.15689v1

Compressor summary:


Deep Learning Based Speckle Filtering for Polarimetric SAR Images. Application to Sentinel-1

http://arxiv.org/abs/2408.15678v1

Compressor summary:


Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers

http://arxiv.org/abs/2408.15667v1

Compressor summary:


StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements

http://arxiv.org/abs/2408.15666v1

Compressor summary:


Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

http://arxiv.org/abs/2408.15664v1

Compressor summary:


Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas

http://arxiv.org/abs/2408.15660v1

Compressor summary:


Realigned Softmax Warping for Deep Metric Learning

http://arxiv.org/abs/2408.15656v1

Compressor summary:


Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings

http://arxiv.org/abs/2408.15650v1

Compressor summary:


Hierarchical Blockmodelling for Knowledge Graphs

http://arxiv.org/abs/2408.15649v1

Compressor summary:


Leveraging Persistent Homology for Differential Diagnosis of Mild Cognitive Impairment

http://arxiv.org/abs/2408.15647v1

Compressor summary:


μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context

http://arxiv.org/abs/2408.15646v1

Compressor summary:


RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis

http://arxiv.org/abs/2408.15643v1

Compressor summary:


Can SAR improve RSVQA performance?

http://arxiv.org/abs/2408.15642v1

Compressor summary:


MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion

http://arxiv.org/abs/2408.15641v1

Compressor summary:


GANs Conditioning Methods: A Survey

http://arxiv.org/abs/2408.15640v1

Compressor summary:


Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection

http://arxiv.org/abs/2408.15637v1

Compressor summary:


Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail

http://arxiv.org/abs/2408.15626v1

Compressor summary:


CAPER: Enhancing Career Trajectory Prediction using Temporal Knowledge Graph and Ternary Relationship

http://arxiv.org/abs/2408.15620v1

Compressor summary:


Large-Scale Demand Prediction in Urban Rail using Multi-Graph Inductive Representation Learning

http://arxiv.org/abs/2408.15619v1

Compressor summary:


Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications

http://arxiv.org/abs/2408.15616v1

Compressor summary:


Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction

http://arxiv.org/abs/2408.15608v1

Compressor summary:


Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning

http://arxiv.org/abs/2408.15593v1

Compressor summary:


Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection

http://arxiv.org/abs/2408.15580v1

Compressor summary:


Temporal Attention for Cross-View Sequential Image Localization

http://arxiv.org/abs/2408.15569v1

Compressor summary:


TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning

http://arxiv.org/abs/2408.15566v1

Compressor summary:


SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

http://arxiv.org/abs/2408.15565v1

Compressor summary:


Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation

http://arxiv.org/abs/2408.15562v1

Compressor summary:


Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

http://arxiv.org/abs/2408.15556v1

Compressor summary:


A Novel Denoising Technique and Deep Learning Based Hybrid Wind Speed Forecasting Model for Variable Terrain Conditions

http://arxiv.org/abs/2408.15554v1

Compressor summary:


Trustworthy and Responsible AI for Human-Centric Autonomous Decision-Making Systems

http://arxiv.org/abs/2408.15550v1

Compressor summary:


WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

http://arxiv.org/abs/2408.15549v1

Compressor summary:


ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model

http://arxiv.org/abs/2408.15548v1

Compressor summary:


SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

http://arxiv.org/abs/2408.15545v1

Compressor summary:


An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication

http://arxiv.org/abs/2408.15543v1

Compressor summary:


Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

http://arxiv.org/abs/2408.15542v1

Compressor summary:


TrafficGamer: Reliable and Flexible Traffic Simulation for Safety-Critical Scenarios with Game-Theoretic Oracles

http://arxiv.org/abs/2408.15538v1

Compressor summary:


Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits

http://arxiv.org/abs/2408.15535v1

Compressor summary:


LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation

http://arxiv.org/abs/2408.15533v1

Compressor summary:


Ray-Distance Volume Rendering for Neural Scene Reconstruction

http://arxiv.org/abs/2408.15524v1

Compressor summary:


Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

http://arxiv.org/abs/2408.15518v1

Compressor summary:


Continual-learning-based framework for structural damage recognition

http://arxiv.org/abs/2408.15513v1

Compressor summary:


Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations

http://arxiv.org/abs/2408.15512v1

Compressor summary:


Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions

http://arxiv.org/abs/2408.15510v1

Compressor summary:


What Machine Learning Tells Us About the Mathematical Structure of Concepts

http://arxiv.org/abs/2408.15507v1

Compressor summary:


MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning

http://arxiv.org/abs/2408.15501v1

Compressor summary:


Deep Learning to Predict Late-Onset Breast Cancer Metastasis: the Single Hyperparameter Grid Search (SHGS) Strategy for Meta Tuning Concerning Deep Feed-forward Neural Network

http://arxiv.org/abs/2408.15498v1

Compressor summary:


ReMamba: Equip Mamba with Effective Long-Sequence Modeling

http://arxiv.org/abs/2408.15496v1

Compressor summary:


Remove Symmetries to Control Model Expressivity

http://arxiv.org/abs/2408.15495v1

Compressor summary:


Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

http://arxiv.org/abs/2408.15491v1

Compressor summary:


Legilimens: Practical and Unified Content Moderation for Large Language Model Services

http://arxiv.org/abs/2408.15488v1

Compressor summary:


NAS-BNN: Neural Architecture Search for Binary Neural Networks

http://arxiv.org/abs/2408.15484v1

Compressor summary:


Dynamic Reconstruction from Neuromorphic Data

http://arxiv.org/abs/2408.15465v1

Compressor summary:


Hand1000: Generating Realistic Hands from Text with Only 1,000 Images

http://arxiv.org/abs/2408.15461v1

Compressor summary:


PersonalizedUS: Interpretable Breast Cancer Risk Assessment with Local Coverage Uncertainty Quantification

http://arxiv.org/abs/2408.15458v1

Compressor summary:


Avoiding Generative Model Writer's Block With Embedding Nudging

http://arxiv.org/abs/2408.15450v1

Compressor summary: