This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-24, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2406.15352v1
Compressor summary: SMART is a mnemonic generator that learns from students' preferences and feedback to create effective mnemonics for learning new terms.
http://arxiv.org/abs/2406.15349v1
Compressor summary: NAVSIM is a non-reactive simulator that uses large real-world datasets to benchmark vision-based driving policies, occupying a middle ground between open-loop and closed-loop evaluation and underpinning a new competition at CVPR 2024.
http://arxiv.org/abs/2406.15341v1
Compressor summary: GenoTEX is a benchmark dataset for evaluating and developing LLM-based agents to automatically explore gene expression data for disease association identification.
http://arxiv.org/abs/2406.15339v1
Compressor summary: Image Conductor is a method for generating precise and controllable camera transitions and object movements from a single image, using a carefully designed training strategy and a camera-free guidance technique.
http://arxiv.org/abs/2406.15335v1
Compressor summary: The study proposes a method to detect cheating using keystroke dynamics in online exams, achieving moderate to high accuracy depending on the scenario.
http://arxiv.org/abs/2406.15334v1
Compressor summary: Our method compresses multimodal examples into fewer tokens using implicit representations extracted from attention heads, enabling LMMs to perform many-shot in-context learning more effectively.
http://arxiv.org/abs/2406.15333v1
Compressor summary: GeoLRM is a novel approach that leverages geometric relationships between 3D structures and 2D images to predict high-quality 3D assets with fewer Gaussians and less memory, outperforming existing models.
http://arxiv.org/abs/2406.15331v1
Compressor summary: The paper proposes a novel method for virtual try-on that uses a diffusion model with extended attention and no extra training, achieving better image quality and garment preservation.
http://arxiv.org/abs/2406.15330v1
Compressor summary: Gradient-Mask Tuning (GMT) is a method to improve large language models by selectively updating parameters based on gradient information, leading to better performance across various tasks and greater efficiency.
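A minimal sketch of the masking idea as I read the summary (not the paper's released code; the per-tensor top-k threshold below is a hypothetical stand-in for whatever gradient criterion GMT actually uses):

    import torch

    def gradient_mask_step(model, optimizer, loss, keep_ratio=0.2):
        # Backpropagate as usual, then zero out gradients whose magnitude
        # falls below a per-tensor cutoff, so only the most informative
        # parameters receive updates this step. (Hypothetical criterion.)
        optimizer.zero_grad()
        loss.backward()
        for p in model.parameters():
            if p.grad is None:
                continue
            g = p.grad.abs().flatten()
            k = max(1, int(keep_ratio * g.numel()))
            cutoff = torch.topk(g, k).values.min()
            p.grad.mul_((p.grad.abs() >= cutoff).float())
        optimizer.step()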
http://arxiv.org/abs/2406.15329v1
Compressor summary: The paper presents a segmentation-free, image-based sequence recognition model evaluated on a large handwritten Arabic text database, with applications across several fields.
http://arxiv.org/abs/2406.15327v1
Compressor summary: Fieldy is a fine-grained hierarchical model for tabular time-series data that uses both row-wise and column-wise attention to learn patterns at the field level, improving performance on regression and classification tasks.
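A bare-bones sketch of factorized row/column attention over a (batch, rows, fields, dim) tensor of per-field embeddings; this module is hypothetical and only illustrates the summary's "both row-wise and column-wise attention" idea, not Fieldy's actual architecture:

    import torch
    import torch.nn as nn

    class RowColAttention(nn.Module):
        # Attend across fields within each row, then across rows
        # within each field (column).
        def __init__(self, dim, heads=4):
            super().__init__()
            self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):                          # x: (B, R, F, D)
            B, R, F, D = x.shape
            h = x.reshape(B * R, F, D)                 # rows as field sequences
            h, _ = self.row_attn(h, h, h)
            h = h.reshape(B, R, F, D).transpose(1, 2).reshape(B * F, R, D)
            h, _ = self.col_attn(h, h, h)              # columns as row sequences
            return h.reshape(B, F, R, D).transpose(1, 2)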
http://arxiv.org/abs/2406.15325v1
Compressor summary: The paper introduces BICS, a benchmark to evaluate LLMs' ability to detect syntax bugs in large codebases, and reveals challenges and disparities among models.
http://arxiv.org/abs/2406.15320v1
Compressor summary: The paper proposes CDMask and CDMaskFormer, two models for remote sensing change detection that use a mask-based representation and adaptive change queries to accurately identify changes in complex scenes.
http://arxiv.org/abs/2406.15319v1
Compressor summary: LongRAG improves retrieval-augmented QA by enlarging the retrieval unit and using a long-context language model to extract answers, achieving state-of-the-art results on NQ and HotpotQA.
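The retrieval-unit idea in toy form (the helper below is hypothetical, not LongRAG's code): greedily merge adjacent passages into long units before retrieval, so a long-context reader extracts answers from a few large units instead of many small ones.

    def make_long_units(passages, max_tokens=4000, count_tokens=len):
        # Greedily pack adjacent passages into long retrieval units of
        # roughly max_tokens each; count_tokens=len is a crude
        # character-count placeholder for a real tokenizer.
        units, current, size = [], [], 0
        for p in passages:
            t = count_tokens(p)
            if current and size + t > max_tokens:
                units.append(" ".join(current))
                current, size = [], 0
            current.append(p)
            size += t
        if current:
            units.append(" ".join(current))
        return units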
http://arxiv.org/abs/2406.15299v1
Compressor summary: The paper proposes a physics-informed hybrid graph neural network to learn and predict spatio-temporal patterns of polar ice layers from thickness data, using weather model measurements as physical node features.
http://arxiv.org/abs/2406.15294v1
Compressor summary: NLP-KG is a system that helps users explore NLP research literature using semantic search, survey papers, a hierarchy graph, and a chat interface.
http://arxiv.org/abs/2406.15291v1
Compressor summary: Asynchronous Bayesian optimization runs experiments in parallel to speed up data generation, using pessimistic predictions to optimize complex systems.
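For intuition, one standard way to stay pessimistic about experiments that are still running is a "constant liar" style imputation; the paper's exact scheme may differ, and the callables here are placeholders:

    import numpy as np

    def propose_next(fit_surrogate, acquisition, X_done, y_done, X_pending, candidates):
        # Impute a pessimistic outcome (the worst value observed so far,
        # assuming minimization) for each pending experiment, refit the
        # surrogate, and pick the candidate maximizing the acquisition.
        lie = y_done.max()
        X = np.vstack([X_done, X_pending]) if len(X_pending) else X_done
        y = np.concatenate([y_done, np.full(len(X_pending), lie)])
        model = fit_surrogate(X, y)
        return candidates[np.argmax(acquisition(model, candidates))]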
http://arxiv.org/abs/2406.15284v1
Compressor summary: The authors create a large corpus of Modern Greek podcasts using weak supervision and show that it improves ASR performance in the language.
http://arxiv.org/abs/2406.15283v1
Compressor summary: The paper introduces a large lane-level freeway traffic dataset for anomaly detection, which could improve emergency response and clearance by reducing delays and errors in event identification and reporting.
http://arxiv.org/abs/2406.15279v1
Compressor summary: The text introduces a new safety challenge for evaluating cross-modality interactions in AI systems that could lead to unsafe or unethical outputs.
http://arxiv.org/abs/2406.15275v1
Compressor summary: The paper explores how language models can improve their planning abilities by constructing a cognitive map of an environment, similar to human thinking.
http://arxiv.org/abs/2406.15269v1
Compressor summary: The YOAS framework generates dense-channel EEG signals from sparse-channel data by optimizing cross-channel problems through a four-stage pipeline, improving data discernibility.
http://arxiv.org/abs/2406.15268v1
Compressor summary: The paper proposes using ontologies to check the completeness and quality of ML training data in safety-critical domains like autonomous driving, increasing trust in model decisions.
http://arxiv.org/abs/2406.15267v1
Compressor summary: The paper evaluates automatic poetry generation systems and finds they are underdiverse in rhyme, semantics, and length, but style-conditioning and character-level modeling can improve diversity.
http://arxiv.org/abs/2406.15265v1
Compressor summary: The article investigates how Wav2Vec2, a neural speech recognition model, compensates for assimilated sounds like [m] in "clea[m] pan" during Automatic Speech Recognition, using linguistic context cues to infer the intended sounds.
http://arxiv.org/abs/2406.15253v1
Compressor summary: This paper evaluates the risks of using generative models for biometrics and proposes an attack method on fingerprint datasets created by a generative network.
http://arxiv.org/abs/2406.15252v1
Compressor summary: The paper introduces VideoFeedback, a large-scale dataset for video quality assessment, and MantisScore, a new metric that correlates well with human judges, improving on existing metrics.
http://arxiv.org/abs/2406.15250v1
Compressor summary: The text discusses the theoretical aspects of reinforcement learning, focusing on non-linear function approximation using kernel-based prediction, and argues for better performance guarantees in this setting.
http://arxiv.org/abs/2406.15245v1
Compressor summary: The paper introduces a deep learning approach to tokenize text with morphological structure guidance, which improves semantic information and outperforms existing methods on language modeling tasks.
http://arxiv.org/abs/2406.15244v1
Compressor summary: The paper analyzes the convergence properties of Adagrad, an adaptive gradient algorithm, on smooth objective functions for large batch sizes, and shows its advantages over SGD in theory and practice.
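For reference, the standard Adagrad update the analysis concerns: each coordinate's effective step size shrinks with its accumulated squared gradients.

    import numpy as np

    def adagrad_step(theta, grad, accum, lr=0.1, eps=1e-8):
        # Accumulate squared gradients, then scale each coordinate's
        # step by the inverse square root of its running sum.
        accum += grad ** 2
        theta -= lr * grad / (np.sqrt(accum) + eps)
        return theta, accum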
http://arxiv.org/abs/2406.15231v1
Compressor summary: The paper introduces a new dataset for detecting generated lyrics in music and evaluates various methods, showing that LLM2Vec outperforms previous approaches.
http://arxiv.org/abs/2406.15229v1
Compressor summary: ExDAG is a new method for learning causal structures from data using mixed-integer quadratic programming that performs well in identifying DAGs with up to 50 vertices and outperforms existing solvers.
http://arxiv.org/abs/2406.15227v1
Compressor summary: The paper proposes a new method to evaluate and generate counter narratives using large language models, achieving high correlation with human judgments.
http://arxiv.org/abs/2406.15225v1
Compressor summary: The paper proposes a Deep Reinforcement Learning framework for planning UAV paths with good cellular network connectivity in urban scenarios.
http://arxiv.org/abs/2406.15214v1
Compressor summary: The paper proposes a method to extract and generate dialogue policies from conversational data using large language models and graph-based techniques for improved task-oriented dialogue systems.
http://arxiv.org/abs/2406.15213v1
Compressor summary: This paper shows how to inject biases into text-conditional image generative models with a backdoor that activates when triggered by specific words, highlighting the challenges of detecting and preventing such attacks.
http://arxiv.org/abs/2406.15211v1
Compressor summary: GPT-4 Turbo generates educational questions that require higher-order thinking skills, but human evaluation shows its effectiveness varies with the targeted cognitive level.
http://arxiv.org/abs/2406.15198v1
Compressor summary: The study explores using advanced language models in robot-assisted therapy for ADHD, finding that ChatGPT-4 Turbo performs better for time-sensitive applications, while Claude-3 Opus prioritizes safe and engaging interactions.
http://arxiv.org/abs/2406.15193v1
Compressor summary: The paper proposes a new method to align LLM responses with user preferences by decoupling exploration and exploitation and using an evolutionary approach, which performs better than existing methods on two benchmarks.
http://arxiv.org/abs/2406.15189v1
Compressor summary: The paper introduces a test for evaluating how well methods learn causality from time-series data, using the Krebs cycle and other metabolic models as examples.
http://arxiv.org/abs/2406.15187v1
Compressor summary: The paper introduces UDA, a benchmark suite for evaluating LLM- and RAG-based solutions on real-world unstructured documents with expert-annotated Q&A pairs, highlighting the importance of parsing and retrieval.
http://arxiv.org/abs/2406.15182v1
Compressor summary: The paper proposes a method to generate counterfactual images that reveal influential features for AI model predictions, improving the reliability of medical image classification.
http://arxiv.org/abs/2406.15178v1
Compressor summary: The paper proposes a Hybrid Alignment Training (Hbat) method for large language models that alternates between instruction-following and human-preference alignment, improving their performance on summarization and dialogue tasks.