arxiv compressed, 2024-06-24

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-24, generated by the Compressor, my personal LLM-based project.


A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick

http://arxiv.org/abs/2406.15352v1

Compressor summary: SMART is a mnemonic generator that learns from students' preferences and feedback to create effective mnemonics for learning new terms.


NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking

http://arxiv.org/abs/2406.15349v1

Compressor summary: NAVSIM is a non-reactive simulator that uses large datasets to benchmark vision-based driving policies in a middle ground between open-loop and closed-loop evaluation, enabling large-scale real-world benchmarking and a new competition at CVPR 2024.


GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians

http://arxiv.org/abs/2406.15341v1

Compressor summary: GenoTEX is a benchmark dataset for evaluating and developing LLM-based agents to automatically explore gene expression data for disease association identification.


Image Conductor: Precision Control for Interactive Video Synthesis

http://arxiv.org/abs/2406.15339v1

Compressor summary: Image Conductor is a method for generating precise and controllable camera transitions and object movements from a single image, using a carefully designed training strategy and a camera-free guidance technique.


Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs

http://arxiv.org/abs/2406.15335v1

Compressor summary: The study proposes a method to detect cheating using keystroke dynamics in online exams, achieving moderate to high accuracy depending on the scenario.
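The summary does not specify which keystroke features the study uses; a common choice in keystroke-dynamics work is dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). A minimal, hypothetical feature-extraction sketch:

```python
# Hypothetical sketch of standard keystroke-dynamics features (dwell and
# flight times); the paper's actual feature set is not described in the summary.

def keystroke_features(events):
    """events: list of (key, press_time, release_time) tuples in seconds."""
    dwell = [release - press for _, press, release in events]  # key hold durations
    flight = [events[i + 1][1] - events[i][2]                  # release-to-press gaps
              for i in range(len(events) - 1)]
    return {
        "mean_dwell": sum(dwell) / len(dwell),
        "mean_flight": sum(flight) / len(flight) if flight else 0.0,
    }

feats = keystroke_features([("h", 0.00, 0.08), ("i", 0.15, 0.22)])
```

Features like these would then feed a classifier that flags typing patterns inconsistent with a student's enrolled profile.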


Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

http://arxiv.org/abs/2406.15334v1

Compressor summary: Our method compresses multimodal examples into fewer tokens using implicit representations extracted from attention heads, enabling LMMs to perform better many-shot in-context learning.


GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

http://arxiv.org/abs/2406.15333v1

Compressor summary: GeoLRM is a novel approach that leverages geometric relationships between 3D structure and 2D images to predict high-quality assets with fewer Gaussians and less memory, outperforming existing models.


Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild

http://arxiv.org/abs/2406.15331v1

Compressor summary: The paper proposes a novel method for virtual try-on that uses a diffusion model with extended attention and no extra training, achieving better image quality and garment preservation.


Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance

http://arxiv.org/abs/2406.15330v1

Compressor summary: Gradient-Mask Tuning (GMT) is a method to improve large language models by selectively updating parameters based on gradient information, leading to better performance across various tasks and greater efficiency.
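The summary describes GMT as selectively updating parameters based on gradient information. One natural reading, shown here as an illustrative sketch rather than the paper's actual implementation, is to keep only the largest-magnitude gradient entries and zero out the rest before applying the update:

```python
# Illustrative sketch (not the paper's implementation): apply the update
# only to parameters whose gradient magnitude is in the top keep_ratio
# fraction, leaving the rest untouched.

def masked_update(params, grads, lr=0.1, keep_ratio=0.5):
    k = max(1, int(len(grads) * keep_ratio))
    threshold = sorted((abs(g) for g in grads), reverse=True)[k - 1]
    return [p - lr * g if abs(g) >= threshold else p
            for p, g in zip(params, grads)]

new = masked_update([1.0, 1.0, 1.0, 1.0], [0.5, -0.1, 0.02, -0.8])
```

With keep_ratio=0.5, only the two largest gradients (0.5 and -0.8) trigger updates; the other two parameters are left as-is.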


An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT

http://arxiv.org/abs/2406.15329v1

Compressor summary: The paper presents an end-to-end, segmentation-free, image-based sequence recognition model evaluated on the large KHATT handwritten Arabic text database, with applications across many fields.


Fine-grained Attention in Hierarchical Transformers for Tabular Time-series

http://arxiv.org/abs/2406.15327v1

Compressor summary: Fieldy is a fine-grained hierarchical model for tabular time-series data that uses both row-wise and column-wise attention to learn patterns at the field level, improving performance on regression and classification tasks.


Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks

http://arxiv.org/abs/2406.15325v1

Compressor summary: The paper introduces BICS, a benchmark to evaluate LLMs' ability to detect syntax bugs in large code, and reveals challenges and disparities among models.


Rethinking Remote Sensing Change Detection With A Mask View

http://arxiv.org/abs/2406.15320v1

Compressor summary: The paper proposes CDMask and CDMaskFormer, two models for remote sensing change detection that use mask view and adaptive change query to accurately identify changes in complex scenes.


LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

http://arxiv.org/abs/2406.15319v1

Compressor summary: LongRAG improves answer retrieval by increasing the unit size and using a long context language model for extraction, achieving state-of-the-art results on NQ and HotpotQA.
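The core idea in the summary is enlarging the retrieval unit so the long-context reader sees more context per retrieved item. A minimal sketch of this grouping step, where the grouping strategy and the token-counting heuristic are assumptions rather than the paper's exact recipe:

```python
# Minimal sketch of the "larger retrieval unit" idea: pack consecutive
# passages into long units up to a token budget, so each retrieved item
# carries far more context. Grouping strategy and token proxy are assumed.

def build_long_units(passages, max_tokens=4000):
    units, current, size = [], [], 0
    for text in passages:
        n = len(text.split())               # crude whitespace token proxy
        if current and size + n > max_tokens:
            units.append(" ".join(current))
            current, size = [], 0
        current.append(text)
        size += n
    if current:
        units.append(" ".join(current))
    return units

units = build_long_units(["a b c", "d e", "f"], max_tokens=4)
```

Retrieval then ranks these long units instead of individual passages, and the long-context LLM extracts the answer from the top units.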


Learning Spatio-Temporal Patterns of Polar Ice Layers With Physics-Informed Graph Neural Network

http://arxiv.org/abs/2406.15299v1

Compressor summary: The paper proposes a physics-informed hybrid graph neural network to learn and predict spatio-temporal patterns of polar ice layers from thickness data, using weather model measurements as physical node features.


NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing

http://arxiv.org/abs/2406.15294v1

Compressor summary: NLP-KG is a system that helps users explore NLP research literature using semantic search, survey papers, a hierarchy graph, and a chat interface.


Pessimistic asynchronous sampling in high-cost Bayesian optimization

http://arxiv.org/abs/2406.15291v1

Compressor summary: The paper proposes an asynchronous Bayesian optimization strategy that runs experiments in parallel to speed up data generation, using pessimistic predictions to guide sampling when each evaluation is costly.


The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data

http://arxiv.org/abs/2406.15284v1

Compressor summary: The authors create a large corpus of Modern Greek podcasts using weak supervision and show that it improves ASR performance in the language.


FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection

http://arxiv.org/abs/2406.15283v1

Compressor summary: The paper introduces a large lane-level freeway traffic dataset for anomaly detection, which could improve emergency response and clearance by reducing delays and errors in event identification and reporting.


Cross-Modality Safety Alignment

http://arxiv.org/abs/2406.15279v1

Compressor summary: The text introduces a new safety challenge for evaluating cross-modality interactions in AI systems that could lead to unsafe or unethical outputs.


Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model

http://arxiv.org/abs/2406.15275v1

Compressor summary: The paper explores how language models can improve their planning abilities by constructing a cognitive map of an environment, similar to human thinking.


You Only Acquire Sparse-channel (YOAS): A Unified Framework for Dense-channel EEG Generation

http://arxiv.org/abs/2406.15269v1

Compressor summary: The YOAS framework generates dense-channel EEG signals from sparse-channel data by optimizing cross-channel problems, overcoming challenges through four stages, and improving data discernibility.


Towards Robust Training Datasets for Machine Learning with Ontologies: A Case Study for Emergency Road Vehicle Detection

http://arxiv.org/abs/2406.15268v1

Compressor summary: The paper proposes using ontologies to check the completeness and quality of ML training data in safety-critical domains like autonomous driving, increasing trust in model decisions.


Evaluating Diversity in Automatic Poetry Generation

http://arxiv.org/abs/2406.15267v1

Compressor summary: The paper evaluates automatic poetry generation systems and finds they are underdiverse in rhyme, semantics, and length, but style-conditioning and character-level modeling can improve diversity.


Perception of Phonological Assimilation by Neural Speech Recognition Models

http://arxiv.org/abs/2406.15265v1

Compressor summary: The article investigates how Wav2Vec2, a neural speech recognition model, compensates for assimilated sounds like [m] in "clea[m] pan" during Automatic Speech Recognition, using linguistic context cues to infer the intended sounds.


Fingerprint Membership and Identity Inference Against Generative Adversarial Networks

http://arxiv.org/abs/2406.15253v1

Compressor summary: This paper evaluates the risks of using generative models for biometrics and proposes an attack method on fingerprint datasets created by a generative network.


MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

http://arxiv.org/abs/2406.15252v1

Compressor summary: The paper introduces VideoFeedback, a large-scale dataset for video quality assessment, and MantisScore, a new metric that correlates well with human judges, improving on existing metrics.


Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning

http://arxiv.org/abs/2406.15250v1

Compressor summary: The text discusses the theoretical aspects of reinforcement learning, focusing on non-linear function approximation using kernel-based prediction, and the need for better performance guarantees in this case.


Unsupervised Morphological Tree Tokenizer

http://arxiv.org/abs/2406.15245v1

Compressor summary: The paper introduces a deep learning approach to tokenize text with morphological structure guidance, which improves semantic information and outperforms existing methods on language modeling tasks.


Large Batch Analysis for Adagrad Under Anisotropic Smoothness

http://arxiv.org/abs/2406.15244v1

Compressor summary: The paper analyzes the convergence properties of Adagrad, an adaptive gradient algorithm, on smooth objective functions for large batch sizes, and shows its advantages over SGD in theory and practice.
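For reference, the standard diagonal Adagrad update that the analysis concerns scales each coordinate's step by its accumulated squared gradients, which is what lets it adapt to anisotropic smoothness:

```python
# The standard diagonal Adagrad update: each coordinate's effective step
# size shrinks with its own accumulated squared gradients, so steep and
# flat directions are normalized separately.

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g                            # accumulate squared gradient
        new_params.append(p - lr * g / (a ** 0.5 + eps))
        new_accum.append(a)
    return new_params, new_accum

# A large gradient (0.5) and a tiny one (0.01) produce the same-sized
# first step, illustrating the per-coordinate normalization.
params, accum = adagrad_step([1.0, 1.0], [0.5, 0.01], [0.0, 0.0])
```

On the first step both coordinates move by roughly lr, regardless of gradient scale; over time, coordinates with consistently large gradients take progressively smaller steps.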


Detecting Synthetic Lyrics with Few-Shot Inference

http://arxiv.org/abs/2406.15231v1

Compressor summary: The paper introduces a new dataset for detecting generated lyrics in music and evaluates various methods, showing that LLM2Vec outperforms previous approaches.


ExDAG: Exact learning of DAGs

http://arxiv.org/abs/2406.15229v1

Compressor summary: ExDAG is a new method for learning causal structures from data using mixed-integer quadratic programming that performs well in identifying DAGs with up to 50 vertices and outperforms existing solvers.


A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation

http://arxiv.org/abs/2406.15227v1

Compressor summary: The paper proposes a new LLM-based ranking method to evaluate and generate counter-narratives, which achieves high correlation with human judgments.


Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting

http://arxiv.org/abs/2406.15225v1

Compressor summary: The paper proposes a Deep Reinforcement Learning framework for planning UAV paths with good cellular network connectivity in urban scenarios.


Unsupervised Extraction of Dialogue Policies from Conversations

http://arxiv.org/abs/2406.15214v1

Compressor summary: The paper proposes a method to extract and generate dialogue policies from conversational data using large language models and graph-based techniques for improved task-oriented dialogue systems.


Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors

http://arxiv.org/abs/2406.15213v1

Compressor summary: This paper shows how to inject biases into text-conditional image generative models with a backdoor that activates when triggered by specific words, highlighting the challenges of detecting and preventing such attacks.


How Effective is GPT-4 Turbo in Generating School-Level Questions from Textbooks Based on Bloom's Revised Taxonomy?

http://arxiv.org/abs/2406.15211v1

Compressor summary: GPT-4 Turbo can generate educational questions that require higher-order thinking skills, but human evaluation shows its effectiveness varies across cognitive levels of Bloom's Revised Taxonomy.


Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms

http://arxiv.org/abs/2406.15198v1

Compressor summary: The study explores using advanced language models in robot-assisted therapy for ADHD, finding that ChatGPT-4 Turbo performs better for time-sensitive applications, while Claude-3 Opus prioritizes safe and engaging interactions.


Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

http://arxiv.org/abs/2406.15193v1

Compressor summary: The paper proposes a new method to align LLM responses with user preferences by decoupling exploration and exploitation and using an evolutionary approach, which performs better than existing methods on two benchmarks.


Causal Learning in Biomedical Applications

http://arxiv.org/abs/2406.15189v1

Compressor summary: The paper introduces a test for evaluating how well methods learn causality from time-series data, using the Krebs cycle and other metabolic models as examples.


UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis

http://arxiv.org/abs/2406.15187v1

Compressor summary: The paper introduces UDA, a benchmark suite for evaluating LLM- and RAG-based solutions on real-world unstructured documents with expert-annotated Q&A pairs, highlighting the importance of parsing and retrieval.


DiffExplainer: Unveiling Black Box Models Via Counterfactual Generation

http://arxiv.org/abs/2406.15182v1

Compressor summary: The paper proposes a method to generate counterfactual images that reveal influential features for AI model predictions, improving the reliability of medical image classification.


Hybrid Alignment Training for Large Language Models

http://arxiv.org/abs/2406.15178v1

Compressor summary: The paper proposes a Hybrid Alignment Training (Hbat) method for large language models that alternates between instruction-following and human-preference alignment, improving their performance on summarization and dialogue tasks.