This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-28, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.15241v1
Compressor summary: The paper introduces GenRec, a framework that learns spatial-temporal representations for video generation and recognition, and shows its effectiveness and robustness in various tasks.
http://arxiv.org/abs/2408.15240v1
Compressor summary: The authors propose a new way to improve reasoning performance of large language models by training verifiers jointly on generation and verification tasks, achieving better results than previous methods.
http://arxiv.org/abs/2408.15239v1
Compressor summary: The paper introduces a technique to generate videos with smooth motion between two input frames using a modified pretrained image-to-video diffusion model.
http://arxiv.org/abs/2408.15237v1
Compressor summary:
Key points:
- Linear RNN architectures like Mamba can compete with Transformers in language modeling
- Pretrained Transformers can be distilled into linear RNNs using attention weights
- The hybrid model outperforms some open-source models and achieves comparable performance to GPT-4 on chat benchmarks
- A hardware-aware speculative decoding algorithm accelerates the inference speed of Mamba and hybrid models
Summary: The paper shows how to distill pretrained Transformers into linear RNNs using attention weights, creating a hybrid model that performs well on language modeling tasks and is faster to decode with a hardware-aware algorithm.
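Speculative decoding, mentioned in the last key point, is worth unpacking: a cheap draft model proposes several tokens and the expensive target model verifies them with an accept/reject rule. The sketch below shows the standard speculative sampling scheme, not the paper's hardware-aware variant; the toy `draft_probs`/`target_probs` distributions are hypothetical stand-ins for real models.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 4  # toy vocabulary size

def draft_probs(ctx):
    """Cheap draft model: a hypothetical uniform distribution."""
    return np.full(V, 1.0 / V)

def target_probs(ctx):
    """Expensive target model: a hypothetical skewed distribution."""
    p = np.arange(1, V + 1, dtype=float)
    return p / p.sum()

def speculative_step(ctx, k=4):
    """Draft up to k tokens, then accept/reject so that accepted samples
    are distributed exactly as under the target model."""
    out = []
    for _ in range(k):
        q = draft_probs(ctx + out)
        x = int(rng.choice(V, p=q))
        p = target_probs(ctx + out)
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)                      # draft token accepted
        else:
            r = np.clip(p - q, 0.0, None)      # residual distribution
            out.append(int(rng.choice(V, p=r / r.sum())))
            break                              # stop after a rejection
    return out

print(speculative_step([]))
```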
http://arxiv.org/abs/2408.15232v1
Compressor summary: Co-STORM is an AI system that lets users explore unknown information through conversations with several LM agents that ask questions on the user's behalf and organize the results into a dynamic mind map and report.
http://arxiv.org/abs/2408.15213v1
Compressor summary: The paper presents a method to automatically classify populist language in political speeches using machine learning, achieving high accuracy rates across different contexts and data amounts.
http://arxiv.org/abs/2408.15204v1
Compressor summary: Confidence-Driv
http://arxiv.org/abs/2408.15185v1
Compressor summary: PoseWatch is a novel transformer-based architecture for detecting anomalous human behaviors in videos, using pose and motion information.
http://arxiv.org/abs/2408.15171v1
Compressor summary: The project proposes using Naive Bayes classification to estimate the factuality of summaries generated by large language models, addressing the problem of "hallucination."
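As a rough illustration of the approach in the preceding summary, here is a minimal Naive Bayes text classifier using scikit-learn; the tiny labeled examples are invented placeholders, and the project's actual features for factuality estimation are not shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: summaries labeled 1 (factual) or 0 (hallucinated).
texts = ["the study reports a 2% gain", "the model cures all diseases",
         "results match the baseline", "the authors invented the internet"]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a multinomial Naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["the paper reports modest gains"]))
```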
http://arxiv.org/abs/2408.15165v1
Compressor summary: The paper proposes a method to include long-range interactions in machine learning potentials for molecular simulations, improving predictions and reducing artifacts.
http://arxiv.org/abs/2408.15159v1
Compressor summary: This paper presents a new method for synthesizing facial expressions in sign language, which improves sign language production by integrating sentiment information and outperforms existing approaches on benchmark datasets.
http://arxiv.org/abs/2408.15158v1
Compressor summary: The paper studies a stochastic Multi-armed Bandit problem where the payoff depends on the delay, and provides optimal regret bounds for both cost and reward settings.
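For readers unfamiliar with the setting, the sketch below is a plain UCB1 loop on Bernoulli arms; the paper's delay-dependent payoff and its tailored regret analysis are not reproduced here, and the arm means are made up.

```python
import math
import random

def ucb1(arm_means, horizon=10000, seed=0):
    """Minimal UCB1 on Bernoulli arms: play each arm once, then pick the arm
    with the highest empirical mean plus exploration bonus."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n
    sums = [0.0] * n
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1  # initialization round
        else:
            arm = max(range(n), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

print(ucb1([0.2, 0.5, 0.8]))  # most pulls should go to the 0.8 arm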
http://arxiv.org/abs/2408.15143v1
Compressor summary:
Key points:
- Deep models excel at individual image restoration tasks but struggle with real-world challenges
- General image restoration (GIR) is a new problem that covers most individual tasks and aims to address generalization and complex degradations
- The paper defines GIR, introduces new datasets and an evaluation framework, and analyzes existing approaches
- The paper highlights the effectiveness of GIR and its practical difficulties, and suggests future directions for research
Summary: The paper proposes general image restoration (GIR), a unified problem that subsumes various individual image restoration tasks in real-world scenarios, and evaluates existing methods while identifying challenges and opportunities.
http://arxiv.org/abs/2408.15138v1
Compressor summary: The paper proposes a method to control positional correlations in sequence models on trees using encoder-only transformers that implement optimal Belief Propagation, which is shown by analyzing attention maps.
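Since the summary above hinges on Belief Propagation, here is exact sum-product BP on a tiny three-node chain (a tree), with made-up potentials; this shows what the transformers are claimed to implement, not the paper's model.

```python
import numpy as np

# Sum-product BP on a binary chain x0 - x1 - x2.
# psi[i] are unary potentials; phi is a shared pairwise potential.
psi = [np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.3, 0.7])]
phi = np.array([[0.9, 0.1],
                [0.1, 0.9]])       # favors equal neighboring states

# Leaf-to-root messages (root = x1), then the exact marginal at the root.
m0_to_1 = phi.T @ psi[0]           # message from x0 to x1
m2_to_1 = phi.T @ psi[2]           # message from x2 to x1
belief1 = psi[1] * m0_to_1 * m2_to_1
belief1 /= belief1.sum()           # exact marginal P(x1) on a tree
print(belief1)
```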
http://arxiv.org/abs/2408.15133v1
Compressor summary: The text discusses using causal inference methods and counterfactuals to generate natural language explanations for automated decision-making in Explainable AI, using a multi-step pipeline with LLMs.
http://arxiv.org/abs/2408.15128v1
Compressor summary: The text surveys tools and methods for evaluating the energy consumption of Machine Learning (ML), compares them, and provides a systematic literature review along with open-source repositories for further exploration.
http://arxiv.org/abs/2408.15121v1
Compressor summary: The study investigates how to select appropriate Explainable AI methods for medical devices in compliance with EU regulations, using a categorization of smart devices, an analysis of legal requirements, and a classification of XAI objectives.
http://arxiv.org/abs/2408.15119v1
Compressor summary: The paper presents an OCR model for Urdu text with transformer-based architecture and attention mechanisms, achieving high accuracy and handling diverse styles but facing challenges in certain conditions.
http://arxiv.org/abs/2408.15116v1
Compressor summary: The paper describes a mechanism by which future AI systems could develop alignment issues through destabilized priorities, and evaluates two risk factors for this mechanism using current large language models.
http://arxiv.org/abs/2408.15114v1
Compressor summary: The paper proposes a new method for learning Neural Signed Distance Functions (SDF) from sparse 3D point clouds by using adversarial samples to improve the representation of shapes.
http://arxiv.org/abs/2408.15113v1
Compressor summary: APC is a new system for anomaly detection in manufacturing images that uses fine-tuned feature extractors and memory banks to identify unusual features better than existing methods.
http://arxiv.org/abs/2408.15101v1
Compressor summary:
Key points:
- Multi-task dense scene understanding trains a model for multiple tasks and has many applications
- MTMamba++ is a new architecture with a Mamba-based decoder that handles long-range dependency and cross-task interaction
- It has two types of core blocks, STM and CTM, which use state-space models and feature/semantic perspectives to enhance information exchange
- Experiments show that MTMamba++ outperforms CNN-based and Transformer-based methods on various datasets
Summary: MTMamba++ is a novel architecture for multi-task dense scene understanding that uses a Mamba-based decoder, state-space models, and feature/semantic perspectives to improve performance over existing methods.
http://arxiv.org/abs/2408.15099v1
Compressor summary: The study finds that state-of-the-art Unsupervised Environment Design methods for reinforcement learning are not robust in a real-world robotics problem, and proposes a simple and intuitive approach based on training agents on levels with high learnability.
http://arxiv.org/abs/2408.15098v1
Compressor summary: The paper proposes a new model, CLIP-AGIQA, to assess the quality of AI-generated images using the visual and textual knowledge of CLIP, a powerful visual language model.
http://arxiv.org/abs/2408.15096v1
Compressor summary: The paper presents a new post-processing algorithm that reduces bias without sensitive attribute input and maintains minimal changes between biased and debiased predictions using a multiplicative factor on logit values.
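To make the mechanism concrete, the snippet below applies a single multiplicative factor to logits and maps back to probabilities; the factor `alpha` is a placeholder, as the paper's procedure for fitting it is not detailed in the summary.

```python
import numpy as np

def rescale_logits(p, alpha):
    """Multiply logits by alpha, then convert back to probabilities.

    alpha < 1 pulls predictions toward 0.5 (a minimal, attribute-free
    adjustment); alpha = 1 leaves them unchanged.
    """
    logit = np.log(p) - np.log1p(-p)
    return 1.0 / (1.0 + np.exp(-alpha * logit))

p = np.array([0.9, 0.2, 0.65])
print(rescale_logits(p, alpha=0.5))  # predictions move toward 0.5
```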
http://arxiv.org/abs/2408.15094v1
Compressor summary: The paper proposes constrained diffusion models that generate data with desired distributions and requirements, reducing the gap between original and generated data while obeying constraints.
http://arxiv.org/abs/2408.15091v1
Compressor summary: The paper proposes a relation-focused perspective to improve knowledge editing in transformer language models and reduce over-generalization.
http://arxiv.org/abs/2408.15079v1
Compressor summary:
Key points:
- Data processing pipeline to scale up and improve the quality of pretraining datasets
- BaichuanSEED: a 7B LLM baseline trained with the open-source data pipeline
- Comparable performance to commercial models on benchmarks
- Potential for further optimization on downstream tasks
Summary: The paper introduces an open-source data processing pipeline for large language models and shows that BaichuanSEED, a 7B LLM trained with it, performs well on various tasks.
http://arxiv.org/abs/2408.15077v1
Compressor summary: MMASD+ is a multimodal dataset and algorithm for autism diagnosis and action prediction that improves accuracy over single-modality approaches.
http://arxiv.org/abs/2408.15076v1
Compressor summary: MiWaves is an RL algorithm that delivers personalized intervention messages to reduce cannabis use among emerging adults.
http://arxiv.org/abs/2408.15073v1
Compressor summary: DAVOTS is an interactive visual analytics tool that helps users explore and understand explanations from deep neural networks for time series data using dense-pixel visualization, clustering, and ordering strategies.
http://arxiv.org/abs/2408.15069v1
Compressor summary: The paper proposes an efficient method to address geometric artifacts in Symmetric Multi-Linear Computed Tomography (SMLCT) by using the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) algorithm for image registration.
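GCC-PHAT itself is a standard signal-alignment tool; the 1-D version below (the paper applies the idea to image registration) whitens the cross-power spectrum so that only phase, i.e. the relative shift, contributes to the correlation peak.

```python
import numpy as np

def gcc_phat(sig, ref):
    """1-D GCC-PHAT: return the integer lag that best aligns `sig` to `ref`.
    The cross-power spectrum is normalized to unit magnitude (the phase
    transform), leaving only phase information in the correlation."""
    n = len(sig) + len(ref)
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-15
    cc = np.fft.irfft(R, n=n)
    shift = int(np.argmax(np.abs(cc)))
    return shift - n if shift > n // 2 else shift  # wrap negative lags

a = np.random.default_rng(0).standard_normal(256)
b = np.roll(a, 7)            # b is a delayed (circularly) by 7 samples
print(gcc_phat(b, a))        # expected lag: 7
```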
http://arxiv.org/abs/2408.15063v1
Compressor summary:
Key points:
- The paper proposes a novel framework to use pre-trained SAM for multi-modal SOD tasks
- The framework incorporates multi-modal saliency-specific knowledge into SAM using semantic feature fusion and an adapter
- The framework also uses semantic-geometric prompts to generate embeddings with various saliency cues
- Experiments show the effectiveness of the proposed framework on RGB-D and RGB-T SOD benchmarks
Summary: The paper presents a new framework that adapts pre-trained SAM for multi-modal SOD by fusing multi-modal semantic features, using an adapter to encode them, and generating embeddings with saliency cues. The framework outperforms existing methods on RGB-D and RGB-T datasets.
http://arxiv.org/abs/2408.15057v1
Compressor summary: mobDRF is an interpretable representation learning algorithm that uses transparent IF-THEN rules to improve existing models without sacrificing accuracy, helping to identify key risk factors for cognitive decline in healthcare.
http://arxiv.org/abs/2408.15055v1
Compressor summary:
Key points:
- HTE and CATE are important for personalized treatment recommendations
- Existing approaches are good at estimating HTE but not very interpretable
- Causal Rule Forest (CRF) is a new model that learns hidden patterns and converts them into interpretable rules
- CRF improves the performance and interpretability of other causal inference models for HTE and CATE estimation
Summary: The authors propose Causal Rule Forest, a novel model that learns hidden patterns from data and generates interpretable rules to improve the accuracy and interpretability of personalized treatment recommendations based on HTE and CATE.
http://arxiv.org/abs/2408.15050v1
Compressor summary: The paper proposes a new method called BoxTM that improves topic taxonomy discovery by using box embeddings to model semantic scopes and asymmetric distances for hierarchical relations among topics.
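The asymmetry that box embeddings provide is easy to see in a few lines; the sketch below scores how much one axis-aligned box sits inside another, with arbitrary example boxes (the paper's training objective is not shown).

```python
import numpy as np

def volume(lo, hi):
    """Volume of an axis-aligned box; zero if it is empty."""
    return float(np.prod(np.clip(hi - lo, 0.0, None)))

def containment(child, parent):
    """Asymmetric score vol(child ∩ parent) / vol(child):
    1.0 if child lies inside parent, small if it does not."""
    lo = np.maximum(child[0], parent[0])
    hi = np.minimum(child[1], parent[1])
    return volume(lo, hi) / volume(*child)

sports   = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))  # broad topic box
football = (np.array([1.0, 1.0]), np.array([2.0, 2.0]))  # narrower topic box
print(containment(football, sports))   # 1.0: football fits inside sports
print(containment(sports, football))   # 0.0625: not the other way around
```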
http://arxiv.org/abs/2408.15045v1
Compressor summary: DocLayLLM is a multi-modal extension of LLMs for text-rich document understanding, leveraging visual and positional tokens, and enhancing perception of OCR information with chain-of-thought techniques.
http://arxiv.org/abs/2408.15041v1
Compressor summary: The paper proposes a new technique using Graph Neural Networks and Deep Reinforcement Learning to select and schedule Earth observation satellite observations while respecting constraints and maximizing benefit.
http://arxiv.org/abs/2408.15040v1
Compressor summary: The paper gives an introduction to large language models and their applications in EU languages, describing different families of models and the data used to train them.
http://arxiv.org/abs/2408.15038v1
Compressor summary: The text introduces a new method (DNMMSI) for estimating occlusion boundaries in images, a synthetic benchmark (OB-FUTURE) for training, and a real benchmark (OB-LabName) for evaluation.
http://arxiv.org/abs/2408.15037v1
Compressor summary: EATQA is a framework that improves generative question answering by predicting all combinations of (Question, Evidence, Answer) triplets and learning the logical relations between them.
http://arxiv.org/abs/2408.15032v1
Compressor summary: Mamba2MIL is a novel framework for Multiple Instance Learning in Computational Pathology that improves feature fusion and sequence information utilization, achieving better performance than existing methods on various datasets.
http://arxiv.org/abs/2408.15026v1
Compressor summary: The text proposes a sequence-aware self-supervised pre-training method for cardiac ultrasound probe guidance that learns personalized 2D and 3D cardiac structural features, reducing navigation errors.
http://arxiv.org/abs/2408.15020v1
Compressor summary:
Key points:
- The paper proposes a hierarchical graph interaction network (HGINet) for camouflaged object detection
- HGINet uses region-aware token focusing attention, hierarchical graph interaction transformer, and confidence aggregated feature fusion modules
- HGINet outperforms existing methods on four datasets
Summary: The paper introduces HGINet, a network that detects camouflaged objects through hierarchical feature interaction using attention and fusion modules, achieving state-of-the-art results on four datasets.
http://arxiv.org/abs/2408.15011v1
Compressor summary: TPP is a simple framework that pre-trains the new parameters introduced during fine-tuning on a defined pretext task, improving performance in self-supervised learning.
http://arxiv.org/abs/2408.14998v1
Compressor summary: FastTextSpotter is a new framework that combines Swin Transformer and Transformer Encoder-Decoder with a faster self-attention unit to improve the accuracy and efficiency of text spotting in various environments, achieving state-of-the-art results for multilingual scene text.
http://arxiv.org/abs/2408.14991v1
Compressor summary: The paper surveys transformer techniques for speech processing and recognition tasks, covering background, models, data, features, architecture, decoding, evaluation, and toolkits, while discussing challenges and future directions.
http://arxiv.org/abs/2408.14976v1
Compressor summary: The paper proposes a novel method called Prior-free Balanced Replay (PBR) for long-tailed continual learning that uses uncertainty-guided reservoir sampling and two prior-free components to reduce forgetting without knowing the label distribution of the data stream.
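The reservoir-sampling component is the classic streaming primitive below (Algorithm R); PBR's uncertainty-guided acceptance rule would replace the uniform `j < k` test, but that rule is specific to the paper and is not reproduced here.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Algorithm R: maintain a uniform random sample of size k from a
    stream of unknown length, using O(k) memory."""
    rng = random.Random(seed)
    buf = []
    for i, x in enumerate(stream):
        if i < k:
            buf.append(x)          # fill the reservoir first
        else:
            j = rng.randint(0, i)  # item i kept with probability k/(i+1)
            if j < k:
                buf[j] = x
    return buf

print(reservoir_sample(range(1000), k=5))
```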
http://arxiv.org/abs/2408.14975v1
Compressor summary: MegActor-Σ is a new mixed-modal conditional diffusion transformer that enables flexible control of portrait animation using both audio and visual inputs, achieving better results than previous methods.
http://arxiv.org/abs/2408.14972v1
Compressor summary: The text introduces AgentMonitor, a framework that predicts multi-agent system performance before execution and enhances their security by detecting and correcting malicious agents in real time.
http://arxiv.org/abs/2408.14964v1
Compressor summary: The Multi-Modal Fusion framework combines Graph Neural Networks and Large Language Models to improve molecular property predictions by leveraging their complementary strengths in graph data and linguistic knowledge.
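A common way to combine such modalities is simple late fusion, sketched here with PyTorch; the embedding dimensions and the MLP head are arbitrary choices for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Toy late-fusion head: concatenate a graph embedding and a text
    embedding, then regress a molecular property."""
    def __init__(self, d_graph=64, d_text=768, d_hidden=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(d_graph + d_text, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, g_emb, t_emb):
        return self.head(torch.cat([g_emb, t_emb], dim=-1))

model = LateFusion()
pred = model(torch.randn(8, 64), torch.randn(8, 768))  # batch of 8 molecules
print(pred.shape)  # torch.Size([8, 1])
```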
http://arxiv.org/abs/2408.14962v1
Compressor summary:
http://arxiv.org/abs/2408.14961v1
Compressor summary:
http://arxiv.org/abs/2408.14960v1
Compressor summary:
http://arxiv.org/abs/2408.14950v1
Compressor summary:
http://arxiv.org/abs/2408.14941v1
Compressor summary:
http://arxiv.org/abs/2408.14935v1
Compressor summary:
http://arxiv.org/abs/2408.14930v1
Compressor summary:
http://arxiv.org/abs/2408.14916v1
Compressor summary:
http://arxiv.org/abs/2408.14909v1
Compressor summary:
http://arxiv.org/abs/2408.14906v1
Compressor summary:
http://arxiv.org/abs/2408.14899v1
Compressor summary:
http://arxiv.org/abs/2408.14895v1
Compressor summary:
http://arxiv.org/abs/2408.14892v1
Compressor summary:
http://arxiv.org/abs/2408.14874v1
Compressor summary:
http://arxiv.org/abs/2408.14871v1
Compressor summary:
http://arxiv.org/abs/2408.14868v1
Compressor summary:
http://arxiv.org/abs/2408.14866v1
Compressor summary:
http://arxiv.org/abs/2408.14864v1
Compressor summary:
http://arxiv.org/abs/2408.14860v1
Compressor summary:
http://arxiv.org/abs/2408.14855v1
Compressor summary:
http://arxiv.org/abs/2408.14853v1
Compressor summary:
http://arxiv.org/abs/2408.14849v1
Compressor summary:
http://arxiv.org/abs/2408.14846v1
Compressor summary:
http://arxiv.org/abs/2408.14845v1
Compressor summary:
http://arxiv.org/abs/2408.14843v1
Compressor summary:
http://arxiv.org/abs/2408.14842v1
Compressor summary:
http://arxiv.org/abs/2408.14841v1
Compressor summary:
http://arxiv.org/abs/2408.14840v1
Compressor summary:
http://arxiv.org/abs/2408.14837v1
Compressor summary:
http://arxiv.org/abs/2408.14829v1
Compressor summary:
http://arxiv.org/abs/2408.14826v1
Compressor summary:
http://arxiv.org/abs/2408.14825v1
Compressor summary:
http://arxiv.org/abs/2408.14823v1
Compressor summary:
http://arxiv.org/abs/2408.14821v1
Compressor summary:
http://arxiv.org/abs/2408.14819v1
Compressor summary:
http://arxiv.org/abs/2408.14817v1
Compressor summary:
http://arxiv.org/abs/2408.14812v1
Compressor summary:
http://arxiv.org/abs/2408.14811v1
Compressor summary:
http://arxiv.org/abs/2408.14809v1
Compressor summary:
http://arxiv.org/abs/2408.14806v1
Compressor summary:
http://arxiv.org/abs/2408.14805v1
Compressor summary:
http://arxiv.org/abs/2408.14802v1
Compressor summary:
http://arxiv.org/abs/2408.14791v1
Compressor summary:
http://arxiv.org/abs/2408.14788v1
Compressor summary:
http://arxiv.org/abs/2408.14785v1
Compressor summary:
http://arxiv.org/abs/2408.14780v1
Compressor summary:
http://arxiv.org/abs/2408.14774v1
Compressor summary:
http://arxiv.org/abs/2408.14772v1
Compressor summary:
http://arxiv.org/abs/2408.14770v1
Compressor summary:
http://arxiv.org/abs/2408.14765v1
Compressor summary:
http://arxiv.org/abs/2408.14764v1
Compressor summary:
http://arxiv.org/abs/2408.14763v1
Compressor summary:
http://arxiv.org/abs/2408.14762v1
Compressor summary:
http://arxiv.org/abs/2408.14757v1
Compressor summary:
http://arxiv.org/abs/2408.14756v1
Compressor summary:
http://arxiv.org/abs/2408.14750v1
Compressor summary:
http://arxiv.org/abs/2408.14744v1
Compressor summary:
http://arxiv.org/abs/2408.14743v1
Compressor summary:
http://arxiv.org/abs/2408.14738v1
Compressor summary:
http://arxiv.org/abs/2408.14734v1
Compressor summary:
http://arxiv.org/abs/2408.14732v1
Compressor summary:
http://arxiv.org/abs/2408.14724v1
Compressor summary:
http://arxiv.org/abs/2408.14723v1
Compressor summary:
http://arxiv.org/abs/2408.14721v1
Compressor summary: