This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-01, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2406.20099v1
Compressor summary: The paper proposes a new anomaly detection problem that focuses on identifying odd-looking objects in a scene using multiple views, introduces two benchmarks, and presents a novel method to detect them with 3D object-centric representations.
http://arxiv.org/abs/2406.19753v1
Compressor summary: Prompt-based continual learning faces backdoor attacks that manipulate prompts to make models follow a desired target when triggered, and the paper proposes solutions for transferability, resiliency, and authenticity challenges.
http://arxiv.org/abs/2406.19736v1
Compressor summary: MM-Instruct is a new visual instruction dataset that enhances the capabilities of large multimodal models for various tasks by generating diverse and high-quality data from conventional image captioning datasets.
http://arxiv.org/abs/2406.19731v1
Compressor summary: The study analyzes Wikipedia talk pages in French, focusing on multiparty conversations and the role of a third participant's intervention in these discussions.
http://arxiv.org/abs/2406.19729v1
Compressor summary: The study analyzes how the structure and meaning of family relationship terms in French are revealed by their distribution in various corpora.
http://arxiv.org/abs/2406.19726v1
Compressor summary: The study proposes EPOCH, a framework that uses the full perspective camera model to estimate 3D human joint positions from single 2D images with weak supervision, achieving state-of-the-art results.
http://arxiv.org/abs/2406.19712v1
Compressor summary: The study proposes a new geometric method to measure uncertainty in large language models using convex hull analysis of response embeddings based on prompt complexity, model, and temperature setting.
http://arxiv.org/abs/2406.19711v1
Compressor summary: CHASE is a framework for detecting anomalies and finding root causes in microservice systems using multimodal data and causality-based hypergraphs, outperforming existing methods.
http://arxiv.org/abs/2406.19707v1
Compressor summary: InfiniGen is a framework that efficiently manages the key-value cache for long-text generation in large language models, improving performance and accuracy.
http://arxiv.org/abs/2406.19705v1
Compressor summary: DISCO is an efficient diffusion solver for combinatorial optimization problems that improves solution quality and speed by denoising solutions quickly and sampling from a more meaningful domain.
http://arxiv.org/abs/2406.19703v1
Compressor summary: Ksformer is a new dehazing method that uses MKRA for selective key area extraction and LFPM for enhancing high-frequency features, achieving better results than previous methods.
http://arxiv.org/abs/2406.19690v1
Compressor summary: The research proposes a novel computer vision-based architecture that accurately and quickly classifies brain tumors, making it suitable for deployment in resource-limited areas.
http://arxiv.org/abs/2406.19680v1
Compressor summary: MimicMotion is a framework for generating high-quality, controllable videos of any length by mimicking specific motion guidance with improved pose confidence and reduced image distortion.
http://arxiv.org/abs/2406.19675v1
Compressor summary: This paper surveys deep learning-based methods for estimating depth from single RGB images and videos, categorizing them by input/output modalities, network architectures, and learning methods, and discussing milestones, pipelines, datasets, and metrics.
http://arxiv.org/abs/2406.19674v1
Compressor summary: Canary is a multilingual speech recognition and translation model that uses advanced techniques to outperform current models on four languages with much less data.
http://arxiv.org/abs/2406.19672v1
Compressor summary: The paper proposes DOTCNet, a novel finger knuckle print recognition approach that captures both first and second order textures using learnable Gabor filters and an attention mechanism.
http://arxiv.org/abs/2406.19668v1
Compressor summary: PopAlign is a method to reduce biases in text-to-image models by optimizing for population-level preferences during training.
http://arxiv.org/abs/2406.19666v1
Compressor summary: The paper proposes a novel knowledge distillation framework for hyperspectral image fusion and super-resolution, using a Cross-Layer Residual Aggregation block and a Cross Self-Attention fusion module to enhance efficiency and quality.
http://arxiv.org/abs/2406.19665v1
Compressor summary: The paper presents PM-VIS+, a method that achieves high video instance segmentation performance without manual video annotations, using image datasets and adapting supervision based on annotation types.
http://arxiv.org/abs/2406.19662v1
Compressor summary: The authors propose a domain decomposition method for training KANs in parallel, called FBKANs, which can handle multiscale problems and noisy data using physics-informed training.
http://arxiv.org/abs/2406.19657v1
Compressor summary: LLMEasyQuant is a beginner-friendly package for LLM quantization that simplifies deployment and learning.
http://arxiv.org/abs/2406.19655v1
Compressor summary: The paper describes Basketball-SORT, a new method for tracking multiple objects in basketball videos that handles occlusions and complex motions better than existing methods.
http://arxiv.org/abs/2406.19653v1
Compressor summary: ACES is a tool that simplifies defining and extracting cohorts for machine learning tasks using event-stream datasets, improving reproducibility in healthcare studies.
http://arxiv.org/abs/2406.19650v1
Compressor summary: DECOR is a benchmark dataset to improve L2 English writing by detecting and rewriting incoherent sentences using expert annotations and fine-tuned models.
http://arxiv.org/abs/2406.19644v1
Compressor summary: LLM4PG uses large language models to generate preferences for reinforcement learning, enabling faster convergence and better performance in complex game tasks with diverse constraints.
http://arxiv.org/abs/2406.19643v1
Compressor summary: To address the lack of coherence and diversity in current language models' argument writing, the authors propose a persona-based multi-agent framework that simulates human debate, enabling fluid and nonlinear development of ideas and producing more diverse and persuasive arguments for essay writing.
http://arxiv.org/abs/2406.19642v1
Compressor summary: IDT is a method that uses adversarial techniques to modify texts to protect privacy without losing utility, by identifying and changing sensitive tokens based on auxiliary models' predictions.
http://arxiv.org/abs/2406.19640v1
Compressor summary: RMFNet is a network that uses Feature Fusion Modules and Feature Exchange Modules to improve super-resolution of event streams by separating positive and negative events, fusing contextual information, and exchanging information between branches.
http://arxiv.org/abs/2406.19638v1
Compressor summary: ORANDNet is an ensemble method that improves semantic segmentation using CAMs from different classifiers and curriculum learning to increase precision and reduce noise.
http://arxiv.org/abs/2406.19635v1
Compressor summary: The paper proposes a Model Predictive Simulation (MPS) method that uses transformers and probabilistic graphical models to improve safety and realism of trajectories for multiple interacting agents in a simulation.
http://arxiv.org/abs/2406.19632v1
Compressor summary: The PPTFormer is a novel network that enhances UAV image segmentation by creating pseudo perspectives without needing multi-perspective labeled datasets, achieving state-of-the-art results.
http://arxiv.org/abs/2406.19630v1
Compressor summary: The paper introduces Redundancy Removal using Shift (R²S), a video compression method that removes redundant pixels across various ML models, improving storage efficiency and adaptability.
http://arxiv.org/abs/2406.19626v1
Compressor summary: The paper proposes an approach to learn a cost function for safe reinforcement learning from trajectory-level feedback, using a surrogate objective and novelty-based sampling to reduce the burden on human evaluators.
http://arxiv.org/abs/2406.19617v1
Compressor summary: This paper develops an algorithm that optimizes second-order smooth and strongly convex functions under noisy evaluations by combining bootstrapping and mirror-descent stages with a novel gradient estimator.
http://arxiv.org/abs/2406.19615v1
Compressor summary: VarteX is a new framework for deep learning-based weather forecasting that efficiently handles multiple variables and outperforms conventional models with fewer parameters and resources.
http://arxiv.org/abs/2406.19614v1
Compressor summary: This paper surveys 17 data quality evaluation and improvement tools for machine learning, discussing their strengths, limitations, and potential applications of large language models and generative AI in this field.
http://arxiv.org/abs/2406.19602v1
Compressor summary: The survey presents a comprehensive overview of deep clustering methods that use neural networks and prior knowledge to analyze complex data, categorizes them by six types of prior knowledge, traces their evolution, and evaluates their performance on five datasets.
http://arxiv.org/abs/2406.19598v1
Compressor summary: MoICE is a novel method that enhances large language models' context awareness by using routers to select the best RoPE angles for each attention head, improving performance and efficiency on various tasks.
http://arxiv.org/abs/2406.19593v1
Compressor summary: Synthetic data helps train large vision and language models in context-augmented generation systems by providing diverse and challenging multimodal questions.