arxiv compressed, 2024-06-23

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-23, generated by the compressor, my personal LLM-based project.


Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

http://arxiv.org/abs/2406.14563v1

Compressor summary: The paper proposes a two-step method to create safer and more aligned large language models by generating and incorporating synthetic data during merging.


Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

http://arxiv.org/abs/2406.14562v1

Compressor summary: Whiteboard-of-Thought prompting helps multimodal language models create and use images to solve visual reasoning tasks, improving their performance on four difficult natural language tasks.


How to Compute the Probability of a Word

http://arxiv.org/abs/2406.14561v1

Compressor summary: This paper presents correct methods for computing word probabilities in language models that use subwords, highlighting issues with beginning-of-word (BOW)-marking tokenizers and showing how these corrections impact linguistic studies.
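
As a minimal sketch of the computation at issue (not the paper's corrected method): under a subword tokenizer, a word's probability is the product of its subword probabilities given the preceding context, and BOW markers determine which token carries the leading space. The toy probabilities below are invented for illustration.

import math

# Hypothetical subword conditional probabilities P(token | context); a
# BOW-marking tokenizer attaches the leading space to " qu".
toy_probs = {
    ("The", " qu"): 0.20,
    ("The qu", "ick"): 0.60,
}

def word_logprob(context: str, subwords: list[str]) -> float:
    """Chain rule: sum log P(subword | context so far)."""
    logp = 0.0
    for sw in subwords:
        logp += math.log(toy_probs[(context, sw)])
        context += sw
    return logp

print(math.exp(word_logprob("The", [" qu", "ick"])))  # P(" quick" | "The") = 0.12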


A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

http://arxiv.org/abs/2406.14555v1

Compressor summary: The text reviews multimodal-guided image editing techniques using text-to-image diffusion models, presenting a unified framework and discussing various editing scenarios, challenges, and future directions.


xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics

http://arxiv.org/abs/2406.14553v1

Compressor summary: We compress xCOMET, a machine translation evaluation metric, using distillation, quantization, and pruning techniques to make it more efficient and accessible.


Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation

http://arxiv.org/abs/2406.14551v1

Compressor summary: SaSPA is a text-to-image diffusion method that generates diverse and accurate synthetic images for fine-grained visual classification tasks without using real images as guidance, outperforming existing methods in various settings.


GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

http://arxiv.org/abs/2406.14550v1

Compressor summary: GraphReader is a graph-based system that helps LLMs process long texts more effectively by using an agent to explore the text as a graph and generate answers based on gathered insights.


Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models

http://arxiv.org/abs/2406.14549v1

Compressor summary: The study investigates how machine learning models memorize and leak sensitive information during training, and proposes a diagnostic test to detect latent memorized sequences.


Consistency Models Made Easy

http://arxiv.org/abs/2406.14548v1

Compressor summary: ECT improves the training efficiency of consistency models by viewing them as a special case of diffusion models and tuning them to satisfy a differential consistency condition.


Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

http://arxiv.org/abs/2406.14546v1

Compressor summary: The text explores how large language models can infer hidden information from training data using inductive out-of-context reasoning, which could pose safety risks if they acquire dangerous knowledge.


Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems

http://arxiv.org/abs/2406.14545v1

Compressor summary: The research develops a zero-knowledge framework that probes text-to-SQL models to reconstruct underlying database schema elements without any prior knowledge of the database, exposing privacy and security risks.


Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

http://arxiv.org/abs/2406.14544v1

Compressor summary: Prism is a framework that separates perception and reasoning in vision language models to assess their strengths, improve performance, and reduce costs.


Are LLMs Naturally Good at Synthetic Tabular Data Generation?

http://arxiv.org/abs/2406.14541v1

Compressor summary: This paper shows that large language models struggle to generate realistic tables and proposes a method to make them better at it.


Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps

http://arxiv.org/abs/2406.14539v1

Compressor summary: iCD is a new method to improve text-to-image generation by encoding real images into its latent space and enabling precise image manipulation using dynamic guidance.


MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading

http://arxiv.org/abs/2406.14537v1

Compressor summary: The paper proposes MacroHFT, a novel reinforcement learning method for high-frequency trading that combines multiple sub-agents with different financial indicators and a hyper-agent to handle market fluctuations using memory and context.


RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

http://arxiv.org/abs/2406.14532v1

Compressor summary: This paper investigates how finetuning LLMs on synthetic data affects math reasoning efficiency and proposes using per-step negatives to address spurious correlations and improve performance.


A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data

http://arxiv.org/abs/2406.14529v1

Compressor summary: The paper compares KANs and MLPs on tabular datasets, finding that KANs have better accuracy and F1 scores but are computationally more expensive than MLPs.


DeciMamba: Exploring the Length Extrapolation Potential of Mamba

http://arxiv.org/abs/2406.14528v1

Compressor summary: DeciMamba extends Mamba's effective context length, enabling it to handle long-range NLP tasks more effectively.


Fantastic Copyrighted Beasts and How (Not) to Generate Them

http://arxiv.org/abs/2406.14526v1

Compressor summary: The study evaluates how image and video generation models can produce copyrighted characters, even without explicit prompts, and suggests combining existing and new mitigation strategies to reduce this issue.


PostMark: A Robust Blackbox Watermark for Large Language Models

http://arxiv.org/abs/2406.14517v1

Compressor summary: The paper proposes PostMark, a post-hoc watermarking technique for LLM-generated text that does not require logit access and is more robust to paraphrasing attacks than existing methods.


MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

http://arxiv.org/abs/2406.14515v1

Compressor summary: MMBench-Video is a benchmark for evaluating large vision-language models' video understanding skills using free-form questions and YouTube videos.


Solving a Stackelberg Game on Transportation Networks in a Dynamic Crime Scenario: A Mixed Approach on Multi-Layer Networks

http://arxiv.org/abs/2406.14514v1

Compressor summary: The paper proposes a layered graph approach to model dynamic crime scenarios with moving attackers and defenders, and compares it with a MILP approach in terms of computational time and solution quality.


Investigating Mysteries of CoT-Augmented Distillation

http://arxiv.org/abs/2406.14511v1

Compressor summary: The authors investigate how chain of thought rationales improve model distillation and find that placing them after labels and using only a few key tokens are crucial for achieving better performance.


V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data

http://arxiv.org/abs/2406.14510v1

Compressor summary: The paper proposes a weakly supervised diffusion model for consistent and identity-preserving removal of small attributes like glasses in videos, using synthetic imperfect data and strong video priors.


Evidence of a log scaling law for political persuasion with large language models

http://arxiv.org/abs/2406.14508v1

Compressor summary: The study shows that large language models can generate persuasive political messages, but their advantage over smaller models decreases significantly as they get bigger, and coherence is a key factor in their effectiveness.


On Newton's Method to Unlearn Neural Networks

http://arxiv.org/abs/2406.14507v1

Compressor summary: The paper proposes a cubic-regularized Newton's method for efficiently unlearning neural networks while mitigating catastrophic forgetting and preserving data ownership rights.


Translating Across Cultures: LLMs for Intralingual Cultural Adaptation

http://arxiv.org/abs/2406.14504v1

Compressor summary: The paper introduces a new task for evaluating how well large language models can adapt translations to different cultures, revealing their strengths and weaknesses in this area.


Overview of the CAIL 2023 Argument Mining Track

http://arxiv.org/abs/2406.14503v1

Compressor summary: The CAIL 2023 Argument Mining Track aims to identify and extract argument pairs from trial dialogs using summarized judgment documents and trial recordings, introducing a new dataset, CAIL2023-ArgMine, with annotated cases.


Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary

http://arxiv.org/abs/2406.14500v1

Compressor summary: This paper proposes a method to improve radiology report summarization by generating layperson summaries using non-expert communication techniques and few-shot learning, resulting in more accurate and accessible summaries.


LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

http://arxiv.org/abs/2406.14498v1

Compressor summary: The paper introduces SensorCaps, a dataset of IMU-derived activity narrations, OpenSQA, an instruction-following dataset, and LLaSA, a multimodal AI agent that can interpret and respond to questions about human activities and motions.


African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification

http://arxiv.org/abs/2406.14496v1

Compressor summary: FOCI is a multiple-choice benchmark for testing fine-grained object classification skills in LVLMs, which reveals CLIP models' superior performance over LVLMs.


rKAN: Rational Kolmogorov-Arnold Networks

http://arxiv.org/abs/2406.14495v1

Compressor summary: The paper introduces the rational KAN, a novel basis function for Kolmogorov-Arnold networks using Padé approximation and rational Jacobi functions, which are evaluated on deep learning and physics-informed tasks.


Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?

http://arxiv.org/abs/2406.14492v1

Compressor summary: The text discusses the problem of image caption hallucinations by large vision-language models and challenges the claim that adding grounding objectives reduces them, using a more realistic evaluation protocol.


Instruction Pre-Training: Language Models are Supervised Multitask Learners

http://arxiv.org/abs/2406.14491v1

Compressor summary: The paper introduces Instruction Pre-Training, a method that uses instruction-response pairs to enhance language models before fine-tuning them on specific tasks.


Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

http://arxiv.org/abs/2406.14485v1

Compressor summary: The workshop explored how XAI can enhance artistic expression and understanding in HCI, Interaction Design, AI, and digital arts fields.


Valid Error Bars for Neural Weather Models using Conformal Prediction

http://arxiv.org/abs/2406.14483v1

Compressor summary: The paper proposes a method to estimate uncertainty in neural weather forecasts, improving trust and usefulness of the predictions.
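
For context, a minimal split-conformal sketch of the general recipe (assuming exchangeable calibration data; the forecasts and targets below are synthetic placeholders, not the paper's weather models):

import numpy as np

rng = np.random.default_rng(0)
y_cal = rng.normal(size=1000)                        # held-out truths
pred_cal = y_cal + rng.normal(scale=0.5, size=1000)  # model forecasts

alpha = 0.1
scores = np.abs(y_cal - pred_cal)                    # nonconformity scores
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample quantile

new_pred = 0.3                                       # a new point forecast
print(f"{1 - alpha:.0%} interval: [{new_pred - q:.2f}, {new_pred + q:.2f}]")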


Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

http://arxiv.org/abs/2406.14482v1

Compressor summary: The paper introduces a large-scale benchmark dataset for multi-category visible-thermal tiny object detection (RGBT SOD) and proposes a robust evaluation measure called SAFit.


Revealing Vision-Language Integration in the Brain with Multimodal Networks

http://arxiv.org/abs/2406.14481v1

Compressor summary: The authors use deep neural networks to predict brain recordings from movies and identify sites where vision and language are integrated in the human brain.


On Layer-wise Representation Similarity: Application for Multi-Exit Models with a Single Classifier

http://arxiv.org/abs/2406.14479v1

Compressor summary: The paper analyzes similarity between representations of transformer models' hidden layers using cosine similarity, proposes an aligned training approach to enhance similarity, and shows its benefits for multi-exit models.
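
A minimal sketch of the similarity measurement described, with random placeholder activations standing in for two transformer layers' representations of the same inputs:

import numpy as np

rng = np.random.default_rng(0)
layer_a = rng.normal(size=(32, 768))                  # [n_tokens, hidden_dim] from layer i
layer_b = layer_a + 0.1 * rng.normal(size=(32, 768))  # from layer j

def mean_cosine(a: np.ndarray, b: np.ndarray) -> float:
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float((a * b).sum(axis=1).mean())

print(mean_cosine(layer_a, layer_b))   # close to 1.0 => highly similar layers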


Toward data-driven research: preliminary study to predict surface roughness in material extrusion using previously published data with Machine Learning

http://arxiv.org/abs/2406.14478v1

Compressor summary: This study proposes a machine learning model that predicts surface roughness in material extrusion processes using printing parameters, reducing the need for extensive experiments.


SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

http://arxiv.org/abs/2406.14477v1

Compressor summary: The SafeSora dataset helps align large vision models with human values in text-to-video generation tasks by providing preference annotations on helpfulness and harmlessness dimensions.


Learning telic-controllable state representations

http://arxiv.org/abs/2406.14476v1

Compressor summary: The text describes a novel approach to learning state representations in agents that couple descriptive and normative aspects via telic states, allowing for goal-directed behavior with minimal policy complexity.


Data-Centric AI in the Age of Large Language Models

http://arxiv.org/abs/2406.14473v1

Compressor summary: The paper proposes a data-centric viewpoint on large language models, highlighting the importance of data in various scenarios such as benchmarks, curation, attribution, and transfer, and suggesting new research directions to improve AI and LLM research.


Explicit and Implicit Large Language Model Personas Generate Opinions but Fail to Replicate Deeper Perceptions and Biases

http://arxiv.org/abs/2406.14462v1

Compressor summary: The paper explores using personas to make large language models more diverse and human-like in subjective social tasks, but finds they still struggle with implicit biases.


Healing Powers of BERT: How Task-Specific Fine-Tuning Recovers Corrupted Language Models

http://arxiv.org/abs/2406.14459v1

Compressor summary: This paper studies how BERT's performance degrades when some of its parameters are corrupted and then fine-tuned, finding that bottom-layer corruption is more harmful than top-layer corruption.


Centimeter Positioning Accuracy using AI/ML for 6G Applications

http://arxiv.org/abs/2406.14458v1

Compressor summary: The paper explores using AI/ML for highly accurate user positioning in 6G IIoT applications and reports promising results.


Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue

http://arxiv.org/abs/2406.14457v1

Compressor summary: The paper proposes a reinforcement learning method that balances understanding and generation tasks in task-oriented dialogue systems, improving performance on three datasets and few-shot ability.


Capturing Temporal Components for Time Series Classification

http://arxiv.org/abs/2406.14456v1

Compressor summary: The paper introduces a compositional representation learning approach for time series classification that uses an unsupervised method to segment sequential data into coherent components based on a change space, showing competitive performance on public benchmarks.


MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction

http://arxiv.org/abs/2406.14455v1

Compressor summary: MM-GTUNets is a new framework that uses graph transformers and reward learning to predict brain disorders from multiple data types, outperforming existing methods on large datasets.


APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

http://arxiv.org/abs/2406.14449v1

Compressor summary: APEER is a novel algorithm that generates refined prompts for relevance ranking with large language models, improving performance over manual prompts and showing better transferability across tasks and LLMs.


Maintenance Required: Updating and Extending Bootstrapped Human Activity Recognition Systems for Smart Homes

http://arxiv.org/abs/2406.14446v1

Compressor summary: The paper presents a method to improve human activity recognition in smart homes using contrastive learning and seed points from an initial bootstrapping phase.


Graph Representation Learning Strategies for Omics Data: A Case Study on Parkinson's Disease

http://arxiv.org/abs/2406.14442v1

Compressor summary: The study compares different graph neural network architectures for case-control classification using omics data from Parkinson's disease and control samples.


Video Generation with Learned Action Prior

http://arxiv.org/abs/2406.14436v1

Compressor summary: The text proposes three models for stochastic video generation that incorporate camera motion and actions into image reconstruction using multi-modal learning.


Towards Truthful Multilingual Large Language Models: Benchmarking and Alignment Strategies

http://arxiv.org/abs/2406.14434v1

Compressor summary: The paper introduces a benchmark for evaluating truthfulness in multilingual language models and proposes FaMSS, a method to optimize data allocation across languages and data types, which reduces representation disparity and improves multilingual capabilities.


Control when confidence is costly

http://arxiv.org/abs/2406.14427v1

Compressor summary: The paper proposes a framework for efficient control with inference costs, where agents trade off the cost of inference against task performance, with different task demands yielding different inference strategies.


SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages

http://arxiv.org/abs/2406.14425v1

Compressor summary: SynDARin is a method to create QA datasets for low-resource languages by using English paragraphs and generating synthetic questions and answers from them, which are then translated and validated.


FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

http://arxiv.org/abs/2406.14422v1

Compressor summary: FutureNet and Lane Occupancy Field (LOF) are proposed methods to improve motion prediction in autonomous driving by encoding future scenarios and jointly predicting lane occupancy of surrounding agents.


Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data

http://arxiv.org/abs/2406.14412v1

Compressor summary: The paper introduces two datasets for 3D canine pose estimation in different environments and analyzes various models' performance on them.


FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

http://arxiv.org/abs/2406.14408v1

Compressor summary: FVEL is a tool that combines formal verification with large language models to improve code verification efficiency and accuracy.


Predicting Probabilities of Error to Combine Quantization and Early Exiting: QuEE

http://arxiv.org/abs/2406.14404v1

Compressor summary: The paper proposes QuEE, a dynamic network that combines quantization and early exiting to reduce computational resources during inference in machine learning models.


Fair Streaming Feature Selection

http://arxiv.org/abs/2406.14401v1

Compressor summary: FairSFS is a new algorithm for streaming feature selection that ensures fairness by preventing sensitive data from affecting the model output.


WEATHER-5K: A Large-scale Global Station Weather Dataset Towards Comprehensive Time-series Forecasting Benchmark

http://arxiv.org/abs/2406.14399v1

Compressor summary: The WEATHER-5K dataset is a new, comprehensive, and publicly available resource for global station weather forecasting that improves accuracy by addressing limitations of existing datasets.


ATAC-Net: Zoomed view works better for Anomaly Detection

http://arxiv.org/abs/2406.14398v1

Compressor summary: ATAC-Net is a deep learning framework that uses a few known anomalous samples and attention-guided cropping to improve visual anomaly detection in quality control and manufacturing.


SEC-QA: A Systematic Evaluation Corpus for Financial QA

http://arxiv.org/abs/2406.14394v1

Compressor summary: SEC-QA is a framework for generating a continuously refreshed financial dataset of multi-document QA pairs using recent document collections.


Active Diffusion Subsampling

http://arxiv.org/abs/2406.14388v1

Compressor summary: ADS is a method to generate high-quality posterior distributions for partially observed signals by actively selecting measurements with maximum entropy using guided diffusion.


Computation-Efficient Semi-Supervised Learning for ECG-based Cardiovascular Diseases Detection

http://arxiv.org/abs/2406.14377v1

Compressor summary: FastECG is a computationally efficient semi-supervised learning method that adapts pre-trained models for robust detection of cardiovascular diseases using electrocardiography data with limited supervision.


Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory

http://arxiv.org/abs/2406.14373v1

Compressor summary: The text explores how large language models can simulate social dynamics and replicate forces that shape human societies using a simulated agent society based on Hobbes's Social Contract Theory.


Enhanced Bank Check Security: Introducing a Novel Dataset and Transformer-Based Approach for Detection and Verification

http://arxiv.org/abs/2406.14370v1

Compressor summary: The authors present a new dataset for bank check signature verification and propose an object detection network with a dilation module that improves detection and verification of genuine and forged signatures.


PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions

http://arxiv.org/abs/2406.14367v1

Compressor summary: PoseBench is a benchmark to evaluate the robustness of pose estimation models against real-world corruption, revealing vulnerabilities in current state-of-the-art methods and suggesting design improvements.


Mask the Unknown: Assessing Different Strategies to Handle Weak Annotations in the MICCAI2023 Mediastinal Lymph Node Quantification Challenge

http://arxiv.org/abs/2406.14365v1

Compressor summary: For the weakly supervised MICCAI2023 LNQ challenge on pathological lymph node segmentation in the mediastinum, the authors achieved third rank with a model combining pseudo labeling, the TotalSegmentator toolbox, and public TCIA datasets, finding that incorporating all visible lymph nodes improved segmentation performance while models trained only on enlarged lymph nodes failed to generalize to smaller ones.


Robustness Analysis of AI Models in Critical Energy Systems

http://arxiv.org/abs/2406.14361v1

Compressor summary: The paper shows that current AI models for power grid control fail when a line is disconnected, and suggests using graph theory to improve them.


Deblurring Neural Radiance Fields with Event-driven Bundle Adjustment

http://arxiv.org/abs/2406.14360v1

Compressor summary: The paper proposes a method, EBAD-NeRF, that uses event cameras to improve NeRF performance in scenes with motion blur by jointly optimizing camera poses and NeRF parameters.


Can you trust your explanations? A robustness test for feature attribution methods

http://arxiv.org/abs/2406.14349v1

Compressor summary: The text discusses the need for evaluating the stability and robustness of Explainable AI techniques, proposing a test for non-adversarial perturbations and an ensemble approach for analysing XAI methods in neural networks and tabular datasets.
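
A minimal sketch of a non-adversarial perturbation test in this spirit, using a toy linear model with gradient-times-input attributions (the paper's actual test and ensemble approach are more involved):

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=20)                     # toy linear model f(x) = w @ x
attribution = lambda x: w * x               # gradient * input attribution

x = rng.normal(size=20)
sims = []
for _ in range(100):
    x_pert = x + 0.01 * rng.normal(size=20)  # small non-adversarial noise
    a, b = attribution(x), attribution(x_pert)
    sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"mean attribution similarity under perturbation: {np.mean(sims):.4f}")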


iWISDM: Assessing instruction following in multimodal models at scale

http://arxiv.org/abs/2406.14343v1

Compressor summary: iWISDM is a new benchmark for evaluating multimodal models' ability to follow complex instructions in vision-language tasks, revealing a significant gap with human performance.


HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting?

http://arxiv.org/abs/2406.14341v1

Compressor summary: HoTPP is a new benchmark for evaluating event sequence prediction models over long horizons, addressing the limitations of existing metrics and challenging traditional autoregressive methods.


Exploring Spatial Representations in the Historical Lake District Texts with LLM-based Relation Extraction

http://arxiv.org/abs/2406.14336v1

Compressor summary: The text describes a method that uses a language model to extract spatial relations from historical narratives about the English Lake District, visualizing them as a network.


Self-supervised Interpretable Concept-based Models for Text Classification

http://arxiv.org/abs/2406.14335v1

Compressor summary: The paper proposes self-supervised Interpretable Concept Embedding Models (ICEMs) that can interpret and control Large-Language Models (LLMs) by predicting concept labels, offering meaningful explanations, and allowing human interventions.


Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization

http://arxiv.org/abs/2406.14329v1

Compressor summary: The paper introduces Adaptive Adversarial Cross-Entropy (AACE) loss for Sharpness-Aware Minimization (SAM), which improves model generalization by adjusting perturbation directions based on the model's convergence stage.
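
For background, a minimal sketch of the vanilla SAM update that AACE plugs into, on a toy quadratic objective; AACE would change the loss used to compute the perturbation direction as training converges:

import numpy as np

def loss(w): return 0.5 * np.sum(w ** 2)   # toy objective
def grad(w): return w                      # its exact gradient

w = np.array([2.0, -3.0])
rho, lr = 0.05, 0.1
for _ in range(100):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent step to nearby worst case
    g_sharp = grad(w + eps)                      # gradient at the perturbed point
    w = w - lr * g_sharp                         # descend with the SAM gradient
print(loss(w))   # near zero: converges on this convex toy problem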


Computing Within Limits: An Empirical Study of Energy Consumption in ML Training and Inference

http://arxiv.org/abs/2406.14328v1

Compressor summary: The paper investigates Green ML, examining various model architectures and hyperparameters to identify energy-efficient practices for sustainable ML operations.


medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs

http://arxiv.org/abs/2406.14326v1

Compressor summary: medIKAL is a framework that combines Large Language Models with knowledge graphs to enhance diagnostic capabilities in Electronic Medical Records by assigning weights, merging LLM results, and refining through path-based reranking and prompt templates.


Revealing the learning process in reinforcement learning agents through attention-oriented metrics

http://arxiv.org/abs/2406.14324v1

Compressor summary: The paper introduces attention-oriented metrics (ATOMs) to study how reinforcement learning agents learn to focus on different aspects of a game and how this affects their performance.


Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

http://arxiv.org/abs/2406.14322v1

Compressor summary: The text discusses the importance of ensuring uniform privacy protection for users when fine-tuning large language models on sensitive data using differential privacy techniques.


LiveMind: Low-latency Large Language Models with Simultaneous Inference

http://arxiv.org/abs/2406.14319v1

Compressor summary: The paper presents a low-latency inference framework for large language models that allows them to infer from incomplete prompts and reduces response time significantly while maintaining accuracy.


Identifying User Goals from UI Trajectories

http://arxiv.org/abs/2406.14314v1

Compressor summary: The paper introduces goal identification from observed UI trajectories, a task to infer user intentions from GUI interactions, along with a novel evaluation metric that measures paraphrasing of task descriptions within a specific UI environment; experiments with humans and models (GPT-4 and Gemini-1.5 Pro) on the Android-In-The-Wild and Mind2Web datasets show that Gemini outperforms GPT but both still underperform humans.


Robust Few-shot Transfer Learning for Knowledge Base Question Answering with Unanswerable Questions

http://arxiv.org/abs/2406.14313v1

Compressor summary: The paper introduces a new method, FUn-FuSIC, that improves few-shot transfer for knowledge base question answering (KBQA) by handling unanswerable questions using logical forms and feedback from a large language model.


Infusing clinical knowledge into tokenisers for language models

http://arxiv.org/abs/2406.14312v1

Compressor summary: K-Tokeniser improves clinical text processing by using semantic-based tokenisation and requires no pre-training, leading to better results in various tasks.


Cross-level Requirement Traceability: A Novel Approach Integrating Bag-of-Words and Word Embedding for Enhanced Similarity Functionality

http://arxiv.org/abs/2406.14310v1

Compressor summary: The paper proposes a novel automated method to link high-level business requirements with technical system requirements using advanced natural language processing techniques and shows significant efficiency improvements over existing methods.


AI in Space for Scientific Missions: Strategies for Minimizing Neural-Network Model Upload

http://arxiv.org/abs/2406.14297v1

Compressor summary: The paper discusses using AI on spacecraft for efficient data analysis and transmission, demonstrating it with a CNN model for NASA's MMS mission that is reduced in size and precision while maintaining accuracy.


Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective

http://arxiv.org/abs/2406.14288v1

Compressor summary: MAGI is a community-aware graph clustering framework that uses modularity maximization as a contrastive pretext task to avoid semantic drift and achieve scalability, outperforming state-of-the-art methods on multiple datasets.
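
For reference, the modularity objective MAGI optimizes as a contrastive pretext task can be computed directly; a toy sketch on a four-node graph (the graph and clustering are placeholders):

import numpy as np

# Modularity: Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * [c_i == c_j]
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # two disconnected pairs
c = np.array([0, 0, 1, 1])                  # cluster assignment

k = A.sum(axis=1)                           # node degrees
two_m = A.sum()                             # 2m (each edge counted twice)
same = (c[:, None] == c[None, :])           # delta(c_i, c_j)
Q = ((A - np.outer(k, k) / two_m) * same).sum() / two_m
print(Q)   # 0.5, the maximum for this toy graph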


VAIYAKARANA : A Benchmark for Automatic Grammar Correction in Bangla

http://arxiv.org/abs/2406.14284v1

Compressor summary: The paper proposes a method to generate grammatically incorrect Bangla sentences and creates a corpus, Vaiyakarana, which can help improve automatic grammar correction in the language.


Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

http://arxiv.org/abs/2406.14283v1

Compressor summary: Q* is a framework that guides large language models to make better decisions in reasoning tasks without fine-tuning or additional computational cost.


Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

http://arxiv.org/abs/2406.14282v1

Compressor summary: The paper proposes a new framework to improve large language models' performance in complex question-answering tasks by using planning data from knowledge graphs.


FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability

http://arxiv.org/abs/2406.14281v1

Compressor summary: FairX is an open-source tool for analyzing and training models on fairness, utility, and explainability of data using various metrics and synthetic data generation.


Augmenting Query and Passage for Retrieval-Augmented Generation using LLMs for Open-Domain Question Answering

http://arxiv.org/abs/2406.14277v1

Compressor summary: The paper proposes a method for open-domain question-answering that improves retrieval by breaking down questions into sub-questions and adding self-generated passages to guide answer extraction.


Step-Back Profiling: Distilling User History for Personalized Scientific Writing

http://arxiv.org/abs/2406.14275v1

Compressor summary: Step-Back Profiling is a method to personalize large language models for scientific writing by capturing user characteristics, and it outperforms baselines on various tasks.


Learning to Discover Knowledge: A Weakly-Supervised Partial Domain Adaptation Approach

http://arxiv.org/abs/2406.14274v1

Compressor summary: SP-TCL is a simple and effective approach for weakly-supervised partial domain adaptation, which uses self-paced learning to discover and transfer knowledge across noisy labeled source and unlabeled target domains.


The Impact of AI on Perceived Job Decency and Meaningfulness: A Case Study

http://arxiv.org/abs/2406.14273v1

Compressor summary: The paper explores how AI might affect job satisfaction and meaning in IT work by interviewing experts who envision humans remaining dominant and AI as a complement.


MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset

http://arxiv.org/abs/2406.14272v1

Compressor summary: The authors propose a new task, dataset, and model for generating realistic 3D talking heads from speech in different languages, improving lip-sync accuracy.


On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?

http://arxiv.org/abs/2406.14267v1

Compressor summary: The paper analyzes existing evaluation frameworks for multilingual NLP models, discusses their limitations, and explores using machine translation to evaluate MLMs across a wide range of languages, showing that current approaches may overestimate performance on low-resource languages.


Intelligent Interface: Enhancing Lecture Engagement with Didactic Activity Summaries

http://arxiv.org/abs/2406.14266v1

Compressor summary: The text describes a novel machine learning tool that helps academic educators improve their lectures by automatically analysing lecture videos and providing feedback on didactic features.


VeriFlow: Modeling Distributions for Neural Network Verification

http://arxiv.org/abs/2406.14265v1

Compressor summary: VeriFlow is an architecture that uses a flow-based density model to allow verifying neural networks' safety and reliability using SMT and abstract interpretation methods with fine-grained probabilistic control.


Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification

http://arxiv.org/abs/2406.14261v1

Compressor summary: The paper proposes a self-supervised method for unsupervised video person re-identification using tracklet partitioning, clustering, and pseudo label generation, achieving state-of-the-art results.


MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization

http://arxiv.org/abs/2406.14259v1

Compressor summary: MEAT is a method to improve model robustness in adversarial training by searching for the median of historical model weights, reducing weight anomalies and overfitting.
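
A minimal sketch of the median-of-checkpoints idea as summarized, with random weight drift standing in for adversarial-training updates:

from collections import deque
import numpy as np

rng = np.random.default_rng(0)
window = deque(maxlen=10)            # last k weight snapshots
w = np.zeros(5)

for step in range(50):
    w = w + 0.1 * rng.normal(size=5)  # stand-in for a training update
    window.append(w.copy())

w_median = np.median(np.stack(list(window)), axis=0)  # elementwise median ensemble
print(w_median)   # damps weight anomalies from individual steps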


DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation

http://arxiv.org/abs/2406.14255v1

Compressor summary: DuMapNet is an end-to-end system that creates standardized, vectorized maps of lanes in cities using a transformer-based network and contextual information from neighboring areas, reducing costs by 95%.


E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion

http://arxiv.org/abs/2406.14250v1

Compressor summary: The paper introduces E-ANT, a large Chinese GUI navigation dataset with human traces and screenshots, to help improve and evaluate multimodal language models for this task.


CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information

http://arxiv.org/abs/2406.14240v1

Compressor summary: CityNav is a new dataset for language-goal aerial navigation using real-world cities' point cloud data, which reveals the importance of human-driven strategies and 2D spatial maps for efficient city-scale navigation.


LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection

http://arxiv.org/abs/2406.14239v1

Compressor summary: The paper introduces LeYOLO, a new family of efficient YOLO-based object detection models built on optimizations such as efficient backbone scaling, a Fast Pyramidal Architecture Network (FPAN), and a Decoupled Network-in-Network (DNiN) detection head, achieving a competitive FLOP-to-accuracy ratio under various resource constraints.


Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

http://arxiv.org/abs/2406.14235v1

Compressor summary: The proposed method uses paired human-robot videos to adapt pre-trained models for robotic manipulation tasks, improving performance on different benchmarks.


Enhancing robustness of data-driven SHM models: adversarial training with circle loss

http://arxiv.org/abs/2406.14232v1

Compressor summary: The paper proposes an adversarial training method for structural health monitoring that improves model robustness by using circle loss to keep examples away from the decision boundary.


aeon: a Python toolkit for learning from time series

http://arxiv.org/abs/2406.14231v1

Compressor summary: aeon is a Python library that offers various machine learning tasks for time series data using efficient algorithms and a scikit-learn compatible API.


Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing

http://arxiv.org/abs/2406.14230v1

Compressor summary: GETA is a new approach for testing how well large language models follow ethical guidelines by dynamically generating test items matched to each model's abilities.


EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

http://arxiv.org/abs/2406.14228v1

Compressor summary: EvoAgent is a method to create diverse multi-agent systems from existing large language models by applying evolutionary algorithms, improving their problem-solving abilities.


Uncertainty and Self-Supervision in Single-View Depth

http://arxiv.org/abs/2406.14226v1

Compressor summary: Noting that single-view depth estimation is ill-posed and that unquantified uncertainty can be harmful in applications like autonomous driving and medical robotics, the authors propose a self-supervised teacher-student method that uses illumination as a self-supervisory signal and Bayesian deep networks to improve uncertainty-aware depth estimation from endoscopic images.


Evaluation of Deep Learning Semantic Segmentation for Land Cover Mapping on Multispectral, Hyperspectral and High Spatial Aerial Imagery

http://arxiv.org/abs/2406.14220v1

Compressor summary: This study used deep learning techniques to improve land cover mapping accuracy using different types of satellite images, and found that the LinkNet model performed well with multispectral images.


Proving Olympiad Algebraic Inequalities without Human Demonstrations

http://arxiv.org/abs/2406.14219v1

Compressor summary: The paper presents AIPS, an autonomous system that generates complex inequality theorems and solves high-level mathematical problems without human guidance or large datasets, outperforming existing methods.


REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability

http://arxiv.org/abs/2406.14214v1

Compressor summary: REVEAL-IT is a novel framework for explaining the learning process of an agent in complex environments using visualizations and a GNN-based explainer.


Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task

http://arxiv.org/abs/2406.14213v1

Compressor summary: The paper proposes adding symbolic working memory to Transformers for machine translation tasks, improving prediction quality by storing relevant keywords and handling text diversity.


SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots

http://arxiv.org/abs/2406.14208v1

Compressor summary: SeCoKD is a method that uses self-knowledge distillation to improve the In-Context Learning ability of large language models with fewer demonstrations and better performance on reasoning tasks.


LayerMatch: Do Pseudo-labels Benefit All Layers?

http://arxiv.org/abs/2406.14207v1

Compressor summary: LayerMatch is a layer-specific pseudo-label strategy that improves semi-supervised learning performance by mitigating the impact of noisy labels in the linear classification layer and accelerating clustering in the feature extraction layer.


Live Video Captioning

http://arxiv.org/abs/2406.14206v1

Compressor summary: The text introduces Live Video Captioning, a new approach for generating captions in real-time for video streams, and proposes a model using deformable transformers and temporal filtering to overcome its challenges.


Trusting Semantic Segmentation Networks

http://arxiv.org/abs/2406.14201v1

Compressor summary: The paper analyzes failure cases and predicts segmentation errors in computer vision tasks using uncertainty-based metrics like entropy.
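
A minimal sketch of the entropy-based uncertainty signal mentioned above: per-pixel Shannon entropy of the softmax output, where high entropy flags pixels whose predicted segmentation is more likely wrong (logits are random placeholders):

import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(19, 64, 64))            # [classes, H, W]
p = np.exp(logits - logits.max(axis=0, keepdims=True))
p /= p.sum(axis=0, keepdims=True)                 # per-pixel softmax

entropy = -(p * np.log(p + 1e-12)).sum(axis=0)    # [H, W] uncertainty map
print(entropy.mean(), entropy.max())              # max possible is log(19)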


On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

http://arxiv.org/abs/2406.14197v1

Compressor summary: The text explains how chain-of-thought reasoning enhances language models' performance by extending their computational power to a level similar to probabilistic Turing machines.


VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

http://arxiv.org/abs/2406.14194v1

Compressor summary: VLBiasBench is a comprehensive benchmark to evaluate social biases in large vision-language models using diverse images and questions.


Timo: Towards Better Temporal Reasoning for Language Models

http://arxiv.org/abs/2406.14192v1

Compressor summary: The paper proposes a universal framework for temporal reasoning with LLMs, studies 38 temporal reasoning tasks and finds that mathematical data helps but is not enough, introduces a self-critic temporal optimization method, and develops Timo, a model that achieves state-of-the-art temporal reasoning performance.


Temporal Knowledge Graph Question Answering: A Survey

http://arxiv.org/abs/2406.14191v1

Compressor summary: This paper surveys temporal knowledge graph question answering (TKGQA) methods, categorizes temporal questions, and suggests future research directions.


In Tree Structure Should Sentence Be Generated

http://arxiv.org/abs/2406.14189v1

Compressor summary: The paper introduces a new natural language generation method using tree-traversing order and compares it with diffusion models in graphic generation, while also presenting SenTree, a module for generating approximate binary trees.


Latent Functional Maps

http://arxiv.org/abs/2406.14183v1

Compressor summary: The text introduces a framework that uses spectral geometry principles to compare and align different neural model representations, improving interpretability and performance on various applications.


EvSegSNN: Neuromorphic Semantic Segmentation for Event Data

http://arxiv.org/abs/2406.14178v1

Compressor summary: The paper proposes EvSegSNN, a low-power semantic segmentation method using spiking neural networks and event cameras, achieving better performance than existing models with less parameters and computation.


SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation

http://arxiv.org/abs/2406.14177v1

Compressor summary: The paper presents SimulSeamless, a system for speech-to-text translation that combines two models and performs well in multiple languages at IWSLT 2024.


Ranking LLMs by compression

http://arxiv.org/abs/2406.14171v1

Compressor summary: The paper proposes ranking LLMs based on data compression, showing that compression length reflects the model's performance, and suggests using compression ratio as a metric for evaluating large language models.
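
A minimal sketch of the underlying metric: under arithmetic coding, a model's code length for a text is -log2 of the probability it assigns, so better models compress to fewer bits. The character-level unigram "models" below are toy stand-ins for LLMs:

import math
from collections import Counter

def code_length_bits(text: str, probs: dict[str, float]) -> float:
    return -sum(math.log2(probs.get(ch, 1e-6)) for ch in text)

text = "the quick brown fox jumps over the lazy dog"
counts = Counter(text)
fitted = {c: n / len(text) for c, n in counts.items()}   # "better model"
uniform = {c: 1 / len(counts) for c in counts}           # "worse model"

raw_bits = 8 * len(text)
for name, model in [("fitted", fitted), ("uniform", uniform)]:
    bits = code_length_bits(text, model)
    print(f"{name}: {bits:.0f} bits, compression ratio {bits / raw_bits:.2f}")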


Definition generation for lexical semantic change detection

http://arxiv.org/abs/2406.14167v1

Compressor summary: The authors propose a method that uses large language models to generate word definitions as senses, which helps detect semantic changes over time and provides interpretability.


A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning

http://arxiv.org/abs/2406.14164v1

Compressor summary: The paper proposes a new data-driven guided decoding method for generating diagnostic texts from medical images that incorporates existing tags of key conditions, and evaluates it on different systems and datasets.


Iterative Sizing Field Prediction for Adaptive Mesh Generation From Expert Demonstrations

http://arxiv.org/abs/2406.14161v1

Compressor summary: AMBER is an imitation learning approach that uses a graph neural network to predict the optimal mesh resolution for complex engineering systems based on expert examples.


Aligning Large Language Models with Diverse Political Viewpoints

http://arxiv.org/abs/2406.14155v1

Compressor summary: The paper proposes methods to reduce political biases in large language models and generate balanced overviews from diverse viewpoints.


Multi-modal Transfer Learning between Biological Foundation Models

http://arxiv.org/abs/2406.14150v1

Compressor summary: The authors propose a multi-modal model called IsoFormer that connects DNA, RNA, and proteins to predict differential transcript expression across human tissues.


Finding Safety Neurons in Large Language Models

http://arxiv.org/abs/2406.14144v1

Compressor summary: The paper investigates how to identify and analyze the neurons responsible for safety behaviors in large language models, which can help improve their alignment and reduce risks.


MACAROON: Training Vision-Language Models To Be Your Engaged Partners

http://arxiv.org/abs/2406.14137v1

Compressor summary: The text discusses the problem of LVLMs generating irrelevant or biased responses, and proposes a method called MACAROON to improve their ability to ask for clarifications and generate contrastive response pairs.


Enhancing Monotonic Modeling with Spatio-Temporal Adaptive Awareness in Diverse Marketing

http://arxiv.org/abs/2406.14132v1

Compressor summary: The CoMAN method helps Online Food Ordering Services allocate budgets more efficiently by predicting users' response to incentives and adapting to spatio-temporal preferences, leading to higher conversion rates and orders.


Detecting sexually explicit content in the context of the child sexual abuse materials (CSAM): end-to-end classifiers and region-based networks

http://arxiv.org/abs/2406.14131v1

Compressor summary: The study proposes methods for classifying sexually explicit content using different approaches, aiming to improve automated detection of child sexual abuse materials (CSAM) and reduce human reviewers' exposure to harmful images.


ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning

http://arxiv.org/abs/2406.14130v1

Compressor summary: The paper proposes ExVideo, a post-tuning method that enhances video synthesis models to generate longer videos with less training time and without compromising quality or generalization capabilities.


Towards Event-oriented Long Video Understanding

http://arxiv.org/abs/2406.14129v1

Compressor summary: Event-Bench is a benchmark for evaluating video event understanding in multimodal large language models, addressing the short-cut bias issue and providing a cost-effective method to enhance video MLLMs using merged video instructions.


Measuring Sample Importance in Data Pruning for Training LLMs from a Data Compression Perspective

http://arxiv.org/abs/2406.14124v1

Compressor summary: The authors propose data pruning based on sample importance measured by information content, which improves the performance of large language models when training with limited data.
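
A minimal sketch of pruning by sample importance as summarized, with a stub scorer standing in for per-sample information content (negative log-likelihood under a reference model):

import numpy as np

rng = np.random.default_rng(0)
samples = [f"sample_{i}" for i in range(1000)]
nll = rng.exponential(size=1000)       # stand-in for per-sample -log P(sample)

keep_frac = 0.5
order = np.argsort(-nll)                                   # most informative first
kept = [samples[i] for i in order[: int(keep_frac * len(samples))]]
print(len(kept), "samples kept, e.g.", kept[:3])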


EduQate: Generating Adaptive Curricula through RMABs in Education Settings

http://arxiv.org/abs/2406.14122v1

Compressor summary: EduQate is a method for personalized learning that uses a network to model interdependent content and Q-learning to optimize arm selection based on student progress.
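
For background, a minimal tabular Q-learning sketch of the arm-selection update the summary mentions; the states, arms, and rewards are random placeholders, not EduQate's actual setup:

import numpy as np

rng = np.random.default_rng(0)
n_states, n_arms = 5, 3
Q = np.zeros((n_states, n_arms))
alpha, gamma, epsilon = 0.1, 0.9, 0.3

s = 0
for _ in range(5000):
    # epsilon-greedy arm (content item) selection
    a = rng.integers(n_arms) if rng.random() < epsilon else int(Q[s].argmax())
    r = rng.random() < (0.2 + 0.2 * a)          # arm-dependent Bernoulli reward
    s_next = rng.integers(n_states)             # toy state transition
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q.argmax(axis=1))   # learned best arm per state (typically arm 2 here)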


Boosting Hyperspectral Image Classification with Gate-Shift-Fuse Mechanisms in a Novel CNN-Transformer Approach

http://arxiv.org/abs/2406.14120v1

Compressor summary: The paper introduces a new HSI classification model combining CNNs for local feature extraction and transformers for long-range context modelling, which outperforms existing methods on four datasets.


Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models

http://arxiv.org/abs/2406.14115v1

Compressor summary: The text reviews existing methods for data selection in fine-tuning Large Language Models, proposes a three-stage scheme, and compares them using a unified method to improve efficiency and feasibility.