arxiv compressed, 2024-07-01

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-01, generated by the compressor, my personal LLM-based project.


Odd-One-Out: Anomaly Detection by Comparing with Neighbors

http://arxiv.org/abs/2406.20099v1

Compressor summary: The paper proposes a new anomaly detection problem that focuses on identifying odd-looking objects in a scene using multiple views, introduces two benchmarks, and presents a novel method to detect them with 3D object-centric representations.


Backdoor Attack in Prompt-Based Continual Learning

http://arxiv.org/abs/2406.19753v1

Compressor summary: Prompt-based continual learning faces backdoor attacks that manipulate prompts to make models follow a desired target when triggered, and the paper proposes solutions for transferability, resiliency, and authenticity challenges.


MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

http://arxiv.org/abs/2406.19736v1

Compressor summary: MM-Instruct is a new visual instruction dataset that enhances the capabilities of large multimodal models for various tasks by generating diverse and high-quality data from conventional image captioning datasets.


Message du troisième type : irruption d'un tiers dans un dialogue en ligne

http://arxiv.org/abs/2406.19731v1

Compressor summary: The study analyzes Wikipedia talk pages in French, focusing on multiparty conversations and the role of a third participant's intervention in these discussions.


Le sens de la famille : analyse du vocabulaire de la parenté par les plongements de mots

http://arxiv.org/abs/2406.19729v1

Compressor summary: The study analyzes how the structure and meaning of family relationship terms in French are revealed by their distribution in various corpora.


EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans

http://arxiv.org/abs/2406.19726v1

Compressor summary: The study proposes EPOCH, a framework that uses the full perspective camera model to estimate 3D human joint positions from single 2D images with weak supervision, achieving state-of-the-art results.


Uncertainty Quantification in Large Language Models Through Convex Hull Analysis

http://arxiv.org/abs/2406.19712v1

Compressor summary: The study proposes a new geometric method to measure uncertainty in large language models using convex hull analysis of response embeddings based on prompt complexity, model, and temperature setting.


CHASE: A Causal Heterogeneous Graph based Framework for Root Cause Analysis in Multimodal Microservice Systems

http://arxiv.org/abs/2406.19711v1

Compressor summary: CHASE is a framework for detecting anomalies and finding root causes in microservice systems using multimodal data and causality-based hypergraphs, outperforming existing methods.


InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management

http://arxiv.org/abs/2406.19707v1

Compressor summary: InfiniGen is a framework that efficiently manages the key-value cache for long-text generation in large language models, improving performance and accuracy.


DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems

http://arxiv.org/abs/2406.19705v1

Compressor summary: DISCO is an efficient diffusion solver for combinatorial optimization problems that improves solution quality and speed by denoising solutions quickly and sampling from a more meaningful domain.


Vision Transformer with Key-select Routing Attention for Single Image Dehazing

http://arxiv.org/abs/2406.19703v1

Compressor summary: Ksformer is a new dehazing method that uses MKRA for selective key area extraction and LFPM for enhancing high-frequency features, achieving better results than previous methods.


Deep Fusion Model for Brain Tumor Classification Using Fine-Grained Gradient Preservation

http://arxiv.org/abs/2406.19690v1

Compressor summary: The research proposes a novel computer vision-based architecture that accurately and quickly classifies brain tumors, making it suitable for deployment in resource-limited areas.


MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

http://arxiv.org/abs/2406.19680v1

Compressor summary: MimicMotion is a framework for generating high-quality, controllable videos of any length by mimicking specific motion guidance with improved pose confidence and reduced image distortion.


Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey

http://arxiv.org/abs/2406.19675v1

Compressor summary: This paper surveys deep learning-based methods for estimating depth from single RGB images and videos, categorizing them by input/output modalities, network architectures, and learning methods, and discussing milestones, pipelines, datasets, and metrics.


Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

http://arxiv.org/abs/2406.19674v1

Compressor summary: Canary is a multilingual speech recognition and translation model that performs better than current models on four languages using much less data and advanced techniques.


Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics

http://arxiv.org/abs/2406.19672v1

Compressor summary: The paper proposes DOTCNet, a novel finger knuckle print recognition approach that captures both first and second order textures using learnable Gabor filters and an attention mechanism.


PopAlign: Population-Level Alignment for Fair Text-to-Image Generation

http://arxiv.org/abs/2406.19668v1

Compressor summary: PopAlign is a method to reduce biases in text-to-image models by optimizing for population-level preferences during training.


CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion

http://arxiv.org/abs/2406.19666v1

Compressor summary: The paper proposes a novel knowledge distillation framework for hyperspectral image fusion and super-resolution, using a Cross-Layer Residual Aggregation block and a Cross Self-Attention fusion module to enhance efficiency and quality.


PM-VIS+: High-Performance Video Instance Segmentation without Video Annotation

http://arxiv.org/abs/2406.19665v1

Compressor summary: The paper presents PM-VIS+, a method that achieves high video instance segmentation performance without manual video annotations, using image datasets and adapting supervision based on annotation types.


Finite basis Kolmogorov-Arnold networks: domain decomposition for data-driven and physics-informed problems

http://arxiv.org/abs/2406.19662v1

Compressor summary: The authors propose a domain decomposition method for training KANs in parallel, called FBKANs, which can handle multiscale problems and noisy data using physics-informed training.


LLMEasyQuant -- An Easy to Use Toolkit for LLM Quantization

http://arxiv.org/abs/2406.19657v1

Compressor summary: LLMEasyQuant is an easy-to-use, beginner-friendly package for LLM quantization that simplifies deployment and learning.


Basketball-SORT: An Association Method for Complex Multi-object Occlusion Problems in Basketball Multi-object Tracking

http://arxiv.org/abs/2406.19655v1

Compressor summary: The paper describes Basketball-SORT, a new method for tracking multiple objects in basketball videos that handles occlusions and complex motions better than existing methods.


ACES: Automatic Cohort Extraction System for Event-Stream Datasets

http://arxiv.org/abs/2406.19653v1

Compressor summary: ACES is a tool that simplifies defining and extracting cohorts for machine learning tasks using event-stream datasets, improving reproducibility in healthcare studies.


DECOR: Improving Coherence in L2 English Writing with a Novel Benchmark for Incoherence Detection, Reasoning, and Rewriting

http://arxiv.org/abs/2406.19650v1

Compressor summary: DECOR is a benchmark dataset to improve L2 English writing by detecting and rewriting incoherent sentences using expert annotations and fine-tuned models.


Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs

http://arxiv.org/abs/2406.19644v1

Compressor summary: LLM4PG uses large language models to generate preferences for reinforcement learning, enabling faster convergence and better performance in complex game tasks with diverse constraints.


Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation

http://arxiv.org/abs/2406.19643v1

Compressor summary: Key points:
- Argument writing is challenging for humans and machines
- Current language models lack coherence and diversity in output
- Proposed persona-based multi-agent framework inspired by human debate
- Framework enables fluid and nonlinear development of ideas
- Framework improves argument quality in essay writing
Summary: The authors propose a novel framework that simulates human debate to generate more diverse and persuasive arguments for essay writing, using multiple agents with different personas.


IDT: Dual-Task Adversarial Attacks for Privacy Protection

http://arxiv.org/abs/2406.19642v1

Compressor summary: IDT is a method that uses adversarial techniques to modify texts to protect privacy without losing utility, by identifying and changing sensitive tokens based on auxiliary models' predictions.


Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

http://arxiv.org/abs/2406.19640v1

Compressor summary: RMFNet is a network that uses Feature Fusion Modules and Feature Exchange Modules to improve super-resolution of event streams by separating positive and negative events, fusing contextual information, and exchanging information between branches.


Precision matters: Precision-aware ensemble for weakly supervised semantic segmentation

http://arxiv.org/abs/2406.19638v1

Compressor summary: ORANDNet is an ensemble method that improves semantic segmentation using CAMs from different classifiers and curriculum learning to increase precision and reduce noise.


Model Predictive Simulation Using Structured Graphical Models and Transformers

http://arxiv.org/abs/2406.19635v1

Compressor summary: The paper proposes a Model Predictive Simulation (MPS) method that uses transformers and probabilistic graphical models to improve safety and realism of trajectories for multiple interacting agents in a simulation.


PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation

http://arxiv.org/abs/2406.19632v1

Compressor summary: The PPTFormer is a novel network that enhances UAV image segmentation by creating pseudo perspectives without needing multi-perspective labeled datasets, achieving state-of-the-art results.


Optimal Video Compression using Pixel Shift Tracking

http://arxiv.org/abs/2406.19630v1

Compressor summary: The paper introduces Redundancy Removal using Shift (R²S), a video compression method that removes redundant pixels across various ML models, improving storage efficiency and adaptability.


Safety through feedback in Constrained RL

http://arxiv.org/abs/2406.19626v1

Compressor summary: The paper proposes an approach to learn a cost function for safe reinforcement learning from trajectory-level feedback, using a surrogate objective and novelty-based sampling to reduce the burden on human evaluators.


Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity

http://arxiv.org/abs/2406.19617v1

Compressor summary: This paper develops an algorithm that optimizes second-order smooth and strongly convex functions under noisy evaluations by combining bootstrapping and mirror-descent stages with a novel gradient estimator.


VarteX: Enhancing Weather Forecast through Distributed Variable Representation

http://arxiv.org/abs/2406.19615v1

Compressor summary: VarteX is a new framework for deep learning-based weather forecasting that efficiently handles multiple variables and outperforms conventional models with fewer parameters and resources.


A Survey on Data Quality Dimensions and Tools for Machine Learning

http://arxiv.org/abs/2406.19614v1

Compressor summary: This paper surveys 17 data quality evaluation and improvement tools for machine learning, discussing their strengths, limitations, and potential applications of large language models and generative AI in this field.


A Survey on Deep Clustering: From the Prior Perspective

http://arxiv.org/abs/2406.19602v1

Compressor summary: Key points:
- Deep clustering is a powerful method for analyzing complex data using neural networks and prior knowledge.
- The survey reviews different types of prior knowledge used in deep clustering methods and their evolution.
- The survey provides a benchmark on five datasets and analyzes the performance of methods with diverse priors.
Summary: The survey presents a comprehensive overview of deep clustering methods that use neural networks and prior knowledge to analyze complex data, categorizes them into six types of prior knowledge, and evaluates their performance on five datasets.


Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

http://arxiv.org/abs/2406.19598v1

Compressor summary: MoICE is a novel method that enhances large language models' context awareness by using routers to select the best RoPE angles for each attention head, improving performance and efficiency on various tasks.


SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs

http://arxiv.org/abs/2406.19593v1

Compressor summary: Synthetic data helps train large vision and language models in context-augmented generation systems by providing diverse and challenging multimodal questions.