arxiv compressed, 2024-06-19

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-19, generated by the compressor, my personal LLM-based project.


Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation

http://arxiv.org/abs/2406.12849v1

Compressor summary: Our new framework uses perspective depth estimation models to generate pseudo labels for 360-degree images, improving depth estimation accuracy in virtual reality and other applications.


ChangeViT: Unleashing Plain Vision Transformers for Change Detection

http://arxiv.org/abs/2406.12847v1

Compressor summary: The paper introduces ChangeViT, a framework that uses vision transformers to enhance change detection in remote sensing images by capturing both large-scale and fine-grained changes, achieving state-of-the-art performance on various datasets.


DrVideo: Document Retrieval Based Long Video Understanding

http://arxiv.org/abs/2406.12846v1

Compressor summary: DrVideo is a system that converts long videos into text documents to leverage large language models for understanding and answering questions about them, achieving state-of-the-art results on various benchmarks.


Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

http://arxiv.org/abs/2406.12845v1

Compressor summary: The paper proposes a new method for training human-interpretable reward models that use multiple objectives to guide large language models based on human preferences, achieving state-of-the-art results on RewardBench.
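
As a rough sketch of the interpretable multi-objective idea (a toy construction of my own: the objective names, softmax gating, and fixed logits are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def combined_reward(obj_rewards: np.ndarray, gate_logits: np.ndarray) -> float:
    """Mix per-objective reward scores (e.g., helpfulness, safety,
    verbosity) with non-negative weights that sum to 1, keeping each
    objective's contribution to the final reward interpretable.
    A mixture-of-experts setup would make gate_logits input-dependent;
    here they are supplied directly for illustration."""
    weights = np.exp(gate_logits) / np.exp(gate_logits).sum()  # softmax gate
    return float(weights @ obj_rewards)

# Three objectives scored for one response, plus gating logits:
print(combined_reward(np.array([0.9, 0.4, 0.2]), np.array([2.0, 0.5, -1.0])))
```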


Can Go AIs be adversarially robust?

http://arxiv.org/abs/2406.12843v1

Compressor summary: The paper investigates whether simple defenses can improve KataGo's performance against adversarial strategies in Go and finds that none of them can withstand adaptive attacks.


Demystifying Higher-Order Graph Neural Networks

http://arxiv.org/abs/2406.12841v1

Compressor summary: The text introduces a taxonomy and blueprint for higher-order graph neural networks (HOGNNs) to analyze, compare, and select the best model for a given scenario.


Evaluating the design space of diffusion-based generative models

http://arxiv.org/abs/2406.12839v1

Compressor summary: This article analyzes the generation process of diffusion models, providing insights on how to design training and sampling for effective generation, and comparing with existing methods.


LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging

http://arxiv.org/abs/2406.12837v1

Compressor summary: LayerMerge is a new method that reduces the number of layers in convolutional neural networks by pruning activation functions and convolution layers, achieving efficiency and performance gains without increasing kernel size.


Influence Maximization via Graph Neural Bandits

http://arxiv.org/abs/2406.12835v1

Compressor summary: The paper proposes a framework that uses neural bandit algorithms and graph convolutional networks to maximize the number of users influenced in multi-round diffusion campaigns with unknown network topology.


GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation

http://arxiv.org/abs/2406.12834v1

Compressor summary: The paper proposes a method called GroPrompt that uses Text-Aware Prompt Contrastive Learning to train video object segmentation models with weak supervision, achieving competitive results on standard benchmarks.


LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation

http://arxiv.org/abs/2406.12832v1

Compressor summary: LaMDA is a novel approach for fine-tuning large language models that reduces trainable parameters, GPU memory, and compute cost by using low-dimensional adaptation and two projection matrices.


VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

http://arxiv.org/abs/2406.12831v1

Compressor summary: VIA is a framework for adaptive video editing that ensures global and local consistency in spatiotemporal dimensions for minute-long videos.


What Are the Odds? Language Models Are Capable of Probabilistic Reasoning

http://arxiv.org/abs/2406.12830v1

Compressor summary: The paper evaluates the probabilistic reasoning capabilities of language models using various tasks and contexts, and releases a new dataset for this purpose.


From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries

http://arxiv.org/abs/2406.12824v1

Compressor summary: The paper examines how language models use external context to answer questions, finding that they rely heavily on context information and minimally on their own memory.


Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?

http://arxiv.org/abs/2406.12822v1

Compressor summary: This study examines how using native or translated data during tuning and evaluation affects multilingual language models' performance, revealing differences in high-performance scenarios and suggesting regularization helps for structured tasks.


Neural Approximate Mirror Maps for Constrained Diffusion Models

http://arxiv.org/abs/2406.12816v1

Compressor summary: The paper proposes neural approximate mirror maps (NAMMs) for imposing constraints on diffusion models, which improve their accuracy and reliability in generating valid synthetic data and solving constrained inverse problems.


Adversarial Attacks on Multimodal Agents

http://arxiv.org/abs/2406.12814v1

Compressor summary: The paper demonstrates new safety risks of multimodal agents using adversarial text strings to manipulate image-based VLMs, showing how different VLMs vary in their robustness and discussing potential defenses.


Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?

http://arxiv.org/abs/2406.12809v1

Compressor summary: The ConsisEval benchmark evaluates the inconsistency of large language models by comparing their answers to pairs of easy and hard questions, finding that GPT-4 is the most consistent but still has room for improvement.


Graph Neural Networks in Histopathology: Emerging Trends and Future Directions

http://arxiv.org/abs/2406.12808v1

Compressor summary: The text discusses the use of Graph Neural Networks (GNNs) as a promising alternative to Convolutional Neural Networks (CNNs) for analyzing histopathological images, and identifies four emerging trends in this field.


Probabilistic Temporal Prediction of Continuous Disease Trajectories and Treatment Effects Using Neural SDEs

http://arxiv.org/abs/2406.12807v1

Compressor summary: This paper proposes a novel neural network model that uses medical images and data to predict disease progression and treatment effects for multiple sclerosis, allowing for more personalized medicine and better understanding of patient subgroups.


AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation

http://arxiv.org/abs/2406.12805v1

Compressor summary: Our method learns adaptive inclusive tokens to shift attribute distribution and reduce biases in text-to-image generation without explicit attribute specification or prior knowledge.


Scalable Rule Lists Learning with Sampling

http://arxiv.org/abs/2406.12803v1

Compressor summary: The paper proposes a scalable algorithm that learns nearly optimal rule lists from large datasets using sampling and guarantees on the quality of the approximation.


The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough

http://arxiv.org/abs/2406.12795v1

Compressor summary: The paper explores how to maximize state entropy under partial observability by bounding the approximation of true state entropy using observation properties and regularizing observation entropy.


ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

http://arxiv.org/abs/2406.12793v1

Compressor summary: ChatGLM is a family of large language models that rival or outperform GPT-4 in various tasks, with GLM-4 All Tools being able to autonomously use different tools for complex tasks.


Generating Educational Materials with Different Levels of Readability using LLMs

http://arxiv.org/abs/2406.12787v1

Compressor summary: The study tests different AI models on generating educational materials with controlled readability levels and finds that LLaMA-2 70B performs best, but there are concerns about quality and accuracy.


In-Context Learning of Energy Functions

http://arxiv.org/abs/2406.12785v1

Compressor summary: The text introduces a new approach to machine learning called in-context learning of energy functions, which can handle more complex settings than traditional in-context learning methods.


UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions

http://arxiv.org/abs/2406.12784v1

Compressor summary: UBENCH is a benchmark for evaluating large language models' reliability, covering various abilities and saving computational resources compared to previous methods.


Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition

http://arxiv.org/abs/2406.12779v1

Compressor summary: The paper proposes CNLC, CNL, and CFM methods for data augmentation to address the scarcity of annotated resources for NNER tasks and improve performance on ACE2004 and ACE2005 benchmarks.


Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries

http://arxiv.org/abs/2406.12775v1

Compressor summary: The study analyzes how large language models perform multi-hop queries and proposes a "back-patching" method to improve their latent reasoning.


Towards Exact Gradient-based Training on Analog In-memory Computing

http://arxiv.org/abs/2406.12774v1

Compressor summary: The paper explores the challenges of training AI models on analog in-memory devices, proposing the Tiki-Taka algorithm as a better alternative to SGD that converges exactly and avoids asymptotic errors.


Formatics & dairy industry coalition: AI trends and present challenges

http://arxiv.org/abs/2406.12770v1

Compressor summary: AI can improve the dairy industry by enhancing production, minimizing manual tasks, and addressing challenges through novel technologies like Machine Learning.


Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video

http://arxiv.org/abs/2406.12769v1

Compressor summary: Latent intuitive physics is a framework that can simulate fluids from 3D videos without knowing their physical properties.


Unsupervised explainable activity prediction in competitive Nordic Walking from experimental data

http://arxiv.org/abs/2406.12762v1

Compressor summary: The paper proposes an online unsupervised clustering method using wearable IMUs for Human Activity Recognition (HAR) in sports, enabling explainable classification and detection of cheating in Nordic Walking.


MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning

http://arxiv.org/abs/2406.12757v1

Compressor summary: The authors introduce the Multi-Attribute Composition (MAC) dataset, which provides more realistic and diverse attribute annotations for compositional zero-shot learning tasks.


GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping

http://arxiv.org/abs/2406.12756v1

Compressor summary: The paper proposes a self-supervised learning approach for Mineral Prospectivity Mapping that pretrains a feature extractor with masked image modeling on large-scale unlabeled geospatial data, addressing label scarcity and overfitting in supervised deep learning; it improves feature robustness, interpretability, and predictions over existing methods on MVT and CD deposits in North America and Australia.


Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

http://arxiv.org/abs/2406.12754v1

Compressor summary: Chumor is a dataset of culturally nuanced Chinese jokes with explanations that challenge and outperform state-of-the-art language models.


OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

http://arxiv.org/abs/2406.12753v1

Compressor summary: OlympicArena is a benchmark for evaluating AI's cognitive reasoning abilities using complex problems from various scientific disciplines and modalities, revealing current limitations in advanced models like GPT-4o.


TSI-Bench: Benchmarking Time Series Imputation

http://arxiv.org/abs/2406.12747v1

Compressor summary: TSI-Bench is a benchmark suite for evaluating deep learning algorithms for time series imputation tasks, considering different missingness ratios and patterns.


Rationale-based Ensemble of Multiple QA Strategies for Zero-shot Knowledge-based VQA

http://arxiv.org/abs/2406.12746v1

Compressor summary: REACT is a K-VQA method that combines multiple question-answering tactics using decision contexts and rationales to generate and select answer candidates, achieving better performance than LLM-based baselines.


Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

http://arxiv.org/abs/2406.12742v1

Compressor summary: MIRB is a new benchmark to evaluate visual language models' ability to reason across multiple images, revealing gaps in current models and highlighting the need for further research.


Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages

http://arxiv.org/abs/2406.12739v1

Compressor summary: MT-LLMs combine machine translation encoders with large language models to improve natural language understanding for underrepresented languages.


Large Language Model as a Universal Clinical Multi-task Decoder

http://arxiv.org/abs/2406.12738v1

Compressor summary: The paper proposes a universal language model decoder for handling diverse clinical tasks with minimal task-specific adaptation, achieving comparable or better performance than existing methods.


Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph Reasoning

http://arxiv.org/abs/2406.12736v1

Compressor summary: The PrivacyGuard framework uses a structured scene graph and data augmentation to determine privacy classes for objects in images based on scene contexts.


Automatic generation of insights from workers' actions in industrial workflows with explainable Machine Learning

http://arxiv.org/abs/2406.12732v1

Compressor summary: The paper proposes an explainable machine learning solution that uses data from manufacturing processes and workers' performance to evaluate productivity, differentiate between expert and inexpert workers, and generate insights to improve industrial workflows.


Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

http://arxiv.org/abs/2406.12725v1

Compressor summary: The paper proposes a new method for sound law induction that prompts large language models with examples to generate Python programs mapping ancestor word forms to descendant ones, complementing the weaknesses of existing methods.


BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

http://arxiv.org/abs/2406.12723v1

Compressor summary: The paper introduces the BIOSCAN-5M Insect dataset with multi-modal data for over 5 million insect specimens and proposes three benchmark tasks to evaluate its impact on classification and clustering accuracy.


On the Robustness of Language Models for Tabular Question Answering

http://arxiv.org/abs/2406.12719v1

Compressor summary: The study examines how different factors affect large language models' ability to understand and answer questions based on tables, finding that instructions improve performance but there are still challenges with data quality and reliability.


AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

http://arxiv.org/abs/2406.12718v1

Compressor summary: The paper proposes AGLA, a method to reduce object hallucinations in large vision-language models by combining global and local image features for response generation and visual discrimination.


Self-Localized Collaborative Perception

http://arxiv.org/abs/2406.12712v1

Compressor summary: CoBEVGlue is a novel collaborative perception system that aligns agents using co-visible objects and achieves robust performance under localization errors and attacks.


Enhancing Spatio-temporal Quantile Forecasting with Curriculum Learning: Lessons Learned

http://arxiv.org/abs/2406.12709v1

Compressor summary: The paper proposes a new framework that combines three types of curriculum learning for training models on spatio-temporal data, improving their performance and addressing complex challenges.


AgentReview: Exploring Peer Review Dynamics with LLM Agents

http://arxiv.org/abs/2406.12708v1

Compressor summary: AgentReview is a simulation framework using a large language model that reveals how reviewer biases affect paper decisions in scientific publication.


Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

http://arxiv.org/abs/2406.12707v1

Compressor summary: PerceptiveAgent is a multi-modal dialogue system that uses LLMs to perceive acoustic information and generate empathetic responses, improving contextual understanding in Human-AI communication.


Jailbreak Paradox: The Achilles' Heel of LLMs

http://arxiv.org/abs/2406.12702v1

Compressor summary: The paper presents two paradoxes about detecting jailbreaks in foundation models, showing their impossibility and inconsistency, with examples and implications.


SUPER: Selfie Undistortion and Head Pose Editing with Identity Preservation

http://arxiv.org/abs/2406.12700v1

Compressor summary: SUPER is a method that eliminates distortions and adjusts head pose in close-up face crops using 3D GAN inversion and visibility-based blending.


Online-Adaptive Anomaly Detection for Defect Identification in Aircraft Assembly

http://arxiv.org/abs/2406.12698v1

Compressor summary: The paper proposes an online-adaptive anomaly detection framework using transfer learning that adapts to different environments and achieves high detection accuracy by computing Mahalanobis distance between a normality model and test image features.
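
The scoring step named in the summary is a classic construction, so here is a minimal sketch: fit a Gaussian normality model on features of defect-free images and score test features by Mahalanobis distance. The feature dimensionality, regularizer, and random stand-in features are my assumptions; the paper's transfer-learning and online-adaptation machinery is not shown.

```python
import numpy as np

def fit_normality_model(train_feats: np.ndarray):
    """Fit a Gaussian normality model (mean + inverse covariance) on
    features extracted from defect-free images; train_feats is (N, D)."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize before inverting
    return mu, np.linalg.inv(cov)

def anomaly_score(test_feat: np.ndarray, mu, cov_inv) -> float:
    """Mahalanobis distance of a test feature vector to the normality model;
    scores above a validation-calibrated threshold flag defects."""
    diff = test_feat - mu
    return float(np.sqrt(diff @ cov_inv @ diff))

rng = np.random.default_rng(0)
normal_feats = rng.normal(size=(500, 16))  # stand-in for backbone features
mu, cov_inv = fit_normality_model(normal_feats)
print(anomaly_score(rng.normal(size=16), mu, cov_inv))
```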


XXLTraffic: Expanding and Extremely Long Traffic Dataset for Ultra-Dynamic Forecasting Challenges

http://arxiv.org/abs/2406.12693v1

Compressor summary: XXLTraffic is a large public traffic dataset with long timespan and increasing sensor nodes, curated to support research in ultra-dynamic forecasting and address practical constraints.


MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

http://arxiv.org/abs/2406.12692v1

Compressor summary: MAGIC is a method that uses three agents to create and refine a self-correction guideline for text-to-SQL, improving the performance and interpretability of large language models in this task.


Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia

http://arxiv.org/abs/2406.12687v1

Compressor summary: The paper shows how contemporary language models can help with mental health research tasks like data collection, annotation, and instrument deployment by using a dataset of 644 participants with different diagnoses.


Spatial Sequence Attention Network for Schizophrenia Classification from Structural Brain MR Images

http://arxiv.org/abs/2406.12683v1

Compressor summary: The study introduces a deep learning method using Spatial Sequence Attention to classify individuals with Schizophrenia based on structural MRI features extracted from pre-trained DenseNet.


Measuring Psychological Depth in Language Models

http://arxiv.org/abs/2406.12680v1

Compressor summary: The Psychological Depth Scale (PDS) measures how well large language models (LLMs) can create stories that emotionally engage readers, and shows that GPT-4 stories are comparable to highly-rated human stories.


Vernacular? I Barely Know Her: Challenges with Style Control and Stereotyping

http://arxiv.org/abs/2406.12679v1

Compressor summary: The study evaluates five large language models on two style control tasks and finds inconsistencies, cultural insensitivity, and varying performance levels across models.


Estimating Knowledge in Large Language Models Without Generating a Single Token

http://arxiv.org/abs/2406.12673v1

Compressor summary: This paper proposes KEEN, a probe that evaluates the knowledge of large language models about entities before generating any text, by analyzing their internal computation.


GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models

http://arxiv.org/abs/2406.12671v1

Compressor summary: The paper reviews progress and challenges in monocular geometry estimation (recovering 3D shape from a single image), compares discriminative and generative models, shows that data quality matters more than data scale or model architecture for generalization, and proposes new benchmarks with diverse scenes and high-quality annotations to evaluate and analyze the models.


Stealth edits for provably fixing or attacking large language models

http://arxiv.org/abs/2406.12670v1

Compressor summary: The authors introduce new methods, theory, and a jet-pack block for editing large language models, and show how intrinsic dimensionality predicts editability and vulnerability to stealth attacks.


Disturbing Image Detection Using LMM-Elicited Emotion Embeddings

http://arxiv.org/abs/2406.12668v1

Compressor summary: The paper presents a method to detect disturbing images using large multimodal models and CLIP's text and image encoders, which achieves state-of-the-art results.


A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning

http://arxiv.org/abs/2406.12667v1

Compressor summary: Wagner proposed using Reinforcement Learning to test conjectures in graph theory by building graphs step-by-step and maximizing a score related to the conjecture.


CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis

http://arxiv.org/abs/2406.12665v1

Compressor summary: The text describes the creation of CollabStory, a dataset with LLM-generated collaborative stories, to explore and study multi-LLM scenarios for writing tasks.


Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?

http://arxiv.org/abs/2406.12663v1

Compressor summary: The paper proposes a new decoding strategy for image captioning that generates detailed captions with low object hallucination and introduces reliable evaluation metrics to assess this performance.


Online Anchor-based Training for Image Classification Tasks

http://arxiv.org/abs/2406.12662v1

Compressor summary: The paper introduces Online Anchor-based Training (OAT), a novel method for image classification that improves performance by training models to learn percentage changes of class labels with respect to batch centers, and evaluates it on four datasets.


SCORE: A 1D Reparameterization Technique to Break Bayesian Optimization's Curse of Dimensionality

http://arxiv.org/abs/2406.12661v1

Compressor summary: SCORE is a fast and scalable Bayesian optimization method that uses a 1D reparametrization trick to overcome computational challenges in high-dimensional landscapes.


Investigating the Role of Explainability and AI Literacy in User Compliance

http://arxiv.org/abs/2406.12660v1

Compressor summary: The study explores how Explainable AI (XAI) affects users' compliance with AI recommendations and the role of AI literacy and mental models in this process.


Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review

http://arxiv.org/abs/2406.12655v1

Compressor summary: The paper reviews existing methods to evaluate machine learning models that generate program code from natural language input, focusing on benchmarks and metrics.


Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

http://arxiv.org/abs/2406.12649v1

Compressor summary: The paper proposes five criteria for trustworthy explanations of vision transformers (ViTs), shows that existing methods fail to meet them, and introduces PACE, a variational Bayesian framework that provides faithful, stable, sparse, multi-level, and parsimonious explanations of ViT predictions, outperforming existing approaches.


Evaluating Transparency of Machine Generated Fact Checking Explanations

http://arxiv.org/abs/2406.12645v1

Compressor summary: The study compares human-curated and machine-selected evidence for generating fact-checking explanations using large language models, finding that machine-selected evidence can produce explanations of similar or higher quality.


Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models

http://arxiv.org/abs/2406.12644v1

Compressor summary: The text introduces the Hierarchical Prompting Taxonomy, a framework for assessing large language models' abilities on diverse tasks using different prompting strategies, and compares its effectiveness with existing evaluation methods.


DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

http://arxiv.org/abs/2406.12641v1

Compressor summary: The paper introduces DetectBench, a benchmark to test implicit evidence detection in long contexts, and proposes Detective Reasoning Prompt and Finetuning methods to improve LLMs' abilities in this task.


Research and Implementation of Data Enhancement Techniques for Graph Neural Networks

http://arxiv.org/abs/2406.12640v1

Compressor summary: The paper analyzes data augmentation techniques for graph neural networks, targeting deep learning applications where datasets are limited or costly to obtain.


Ask-before-Plan: Proactive Language Agents for Real-World Planning

http://arxiv.org/abs/2406.12639v1

Compressor summary: The authors propose a new task and benchmark dataset for language agents to predict clarification needs, invoke external tools, and generate plans based on user instructions, using a multi-agent framework called Clarification-Execution-Planning.


Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model

http://arxiv.org/abs/2406.12638v1

Compressor summary: Candle is a framework that improves CLIP's performance on downstream tasks with long-tailed and few-shot data by using compensating logit-adjusted loss, cross-modal attention, and virtual prototypes for new classes.


SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation

http://arxiv.org/abs/2406.12629v1

Compressor summary: SeTAR is a novel OOD detection method that uses low-rank approximation of weight matrices and achieves superior performance on ImageNet1K and Pascal-VOC benchmarks.
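
The generic operation underlying "selective low-rank approximation" is a truncated SVD of a weight matrix; a minimal sketch follows. Which matrices to modify and what rank to keep are the paper's selection problem and are simply assumed here.

```python
import numpy as np

def low_rank_approx(W: np.ndarray, rank: int) -> np.ndarray:
    """Best rank-`rank` approximation of a weight matrix via truncated SVD,
    discarding the directions with the smallest singular values."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]

W = np.random.default_rng(0).normal(size=(64, 64))
W_lr = low_rank_approx(W, rank=16)
print(np.linalg.matrix_rank(W_lr))  # -> 16
```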


Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

http://arxiv.org/abs/2406.12624v1

Compressor summary: The paper evaluates the performance of different large language models acting as judges and compares them with human annotations on TriviaQA, highlighting the importance of Cohen's kappa and the potential biases in this paradigm.


Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech

http://arxiv.org/abs/2406.12621v1

Compressor summary: The article compares two parsing methods for speech and shows that graph-based parsing performs best and is more efficient than a pipeline approach using ASR and syntactic parsers.


What makes two models think alike?

http://arxiv.org/abs/2406.12620v1

Compressor summary: The paper proposes MLEMs, a method that compares how different layers of language models represent linguistic information, providing transparent comparisons and potential extensions to other domains and neural systems.


From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP

http://arxiv.org/abs/2406.12618v1

Compressor summary: This paper analyzes the impact of interpretability and analysis (IA) research on NLP using citation data and survey responses, finding that IA is influential in shaping NLP progress and methodology, while also identifying areas for improvement.


Learning Diffusion at Lightspeed

http://arxiv.org/abs/2406.12616v1

Compressor summary: JKOnet* is a simple and efficient model for learning diffusion processes from data that recovers all components of the process and provides a closed-form optimal solution for some cases.


When Are Bias-Free ReLU Networks Like Linear Networks?

http://arxiv.org/abs/2406.12615v1

Compressor summary: The text explores the limitations of expressivity and learning dynamics of bias-free ReLU networks, which share some properties with linear networks due to symmetry conditions.


EUvsDisinfo: a Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles

http://arxiv.org/abs/2406.12614v1

Compressor summary: EUvsDisinfo is a large multilingual dataset of pro-Kremlin disinformation articles that helps track the evolution and spread of disinformation over time and across languages, as well as train models to distinguish between disinformation and trustworthy content.


Bridging Local Details and Global Context in Text-Attributed Graphs

http://arxiv.org/abs/2406.12608v1

Compressor summary: GraphBridge is a novel framework for text-attributed graphs that leverages contextual textual information to bridge local and global perspectives and improves efficiency with a graph-aware token reduction module.


Low-Redundant Optimization for Large Language Model Alignment

http://arxiv.org/abs/2406.12606v1

Compressor summary: The paper proposes a low-redundant alignment method called ALLO that improves large language models by focusing on optimizing the most related neurons with useful supervised signals.


Attack and Defense of Deep Learning Models in the Field of Web Attack Detection

http://arxiv.org/abs/2406.12605v1

Compressor summary: The paper explores backdoor attacks on deep-learning web attack detection models, proposing five attack methods that reach an attack success rate of 87%, along with corresponding defenses such as fine-tuning that can reduce the attack's effectiveness.


Generalization bounds for mixing processes via delayed online-to-PAC conversions

http://arxiv.org/abs/2406.12600v1

Compressor summary: The study analyzes how statistical learning algorithms generalize from non-i.i.d. data sampled from a stationary mixing process, and shows that online learning with delayed feedback can help achieve low generalization error.


Unmasking the Veil: An Investigation into Concept Ablation for Privacy and Copyright Protection in Images

http://arxiv.org/abs/2406.12592v1

Compressor summary: This paper explores concept ablation in pre-trained models, introduces a new variant called 'trademark ablation', and analyzes the model's limitations, resilience, and adaptability.


Discovering Minimal Reinforcement Learning Environments

http://arxiv.org/abs/2406.12589v1

Compressor summary: The paper explores how specialized training environments can improve reinforcement learning agents' performance, transferability, and speed of adaptation to new problems.


Restorer: Solving Multiple Image Restoration Tasks with One Set of Parameters

http://arxiv.org/abs/2406.12587v1

Compressor summary: The paper proposes Restorer, a Transformer network with U-Net architecture and all-axis attention mechanism, which achieves state-of-the-art or comparable performance in multiple image restoration tasks while being faster than existing methods.


Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling

http://arxiv.org/abs/2406.12585v1

Compressor summary: The paper introduces GaC, an ensembling method that treats token generation as classification and combines token-level probabilities from multiple LLMs instead of their full-text outputs, outperforming existing ensembling methods on various benchmarks, with further gains from ensembling only key tokens.
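
The core step is easy to sketch if the ensembled models share a vocabulary (an assumption on my part; handling mismatched vocabularies is part of what the paper has to solve):

```python
import numpy as np

def ensemble_next_token(token_probs: list[np.ndarray]) -> int:
    """Treat each generation step as a classification over the vocabulary:
    average the next-token distributions from several models (assumed to
    share one vocabulary) and emit the argmax."""
    avg = np.mean(np.stack(token_probs), axis=0)
    return int(np.argmax(avg))

# Toy 5-token vocabulary, two "models":
p1 = np.array([0.1, 0.6, 0.1, 0.1, 0.1])
p2 = np.array([0.1, 0.2, 0.5, 0.1, 0.1])
print(ensemble_next_token([p1, p2]))  # -> 1 on the averaged distribution
```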


Cephalometric Landmark Detection across Ages with Prototypical Network

http://arxiv.org/abs/2406.12577v1

Compressor summary: The paper introduces CeLDA, a method for detecting cephalometric landmarks in adults and adolescents using prototypical networks and prototype relation mining.


Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models

http://arxiv.org/abs/2406.12572v1

Compressor summary: Mathador-LM is a new benchmark that tests large language models' math skills using a game-like challenge, and it shows that current LLMs perform worse than 5th graders on it.


Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection

http://arxiv.org/abs/2406.12570v1

Compressor summary: The paper proposes a method to detect machine-generated text using DetectGPT classifiers and ensembling techniques, achieving high accuracy even when the generative and discriminative language models are different or unknown.
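
For context, DetectGPT's base score is a perturbation discrepancy: machine text tends to sit near a local maximum of the scoring model's log-likelihood, so light rewrites lower its log-prob more than they would for human text. A sketch with placeholder stand-ins (a real setup uses an LM for `log_p` and a mask-and-fill model such as T5 for `perturb`):

```python
import numpy as np

def detectgpt_score(log_p, text: str, perturb, n: int = 20) -> float:
    """Perturbation discrepancy: log-likelihood of the text minus the mean
    log-likelihood of n lightly rewritten variants. Higher values suggest
    machine-generated text; the paper ensembles such scores from several
    scoring models so the generator need not be known."""
    perturbed = [log_p(perturb(text)) for _ in range(n)]
    return log_p(text) - float(np.mean(perturbed))

# Toy stand-ins, purely to make the sketch runnable:
toy_log_p = lambda t: -float(len(set(t.split())))
toy_perturb = lambda t: t + " indeed"
print(detectgpt_score(toy_log_p, "sample text here", toy_perturb, n=5))  # 1.0
```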


MOYU: A Theoretical Study on Massive Over-activation Yielded Uplifts in LLMs

http://arxiv.org/abs/2406.12569v1

Compressor summary: The paper investigates MOYU, a property of large language models that affects inference speed and performance, and suggests ways to improve dynamic activation methods.


RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation

http://arxiv.org/abs/2406.12566v1

Compressor summary: RichRAG is a novel RAG framework that uses a sub-aspect explorer, a multi-faceted retriever, and a list-wise ranker to generate rich long-form answers for open-ended user queries.


A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo

http://arxiv.org/abs/2406.12563v1

Compressor summary: The paper presents a vision-based autonomous car racing agent that uses only local input features to outperform human drivers in time trial races.


Offline Imitation Learning with Model-based Reverse Augmentation

http://arxiv.org/abs/2406.12550v1

Compressor summary: The paper proposes a novel model-based framework, SRA, for offline imitation learning that uses reverse dynamic models and self-paced exploration to overcome covariate shift and achieve state-of-the-art performance.


MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts

http://arxiv.org/abs/2406.12549v1

Compressor summary: The authors introduce MultiSocial, a new multilingual dataset for testing machine-generated text detection in social media, and evaluate existing methods on it.


P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts

http://arxiv.org/abs/2406.12548v1

Compressor summary: The paper proposes a new method, called P-Tailor, for customizing large language models' personality traits based on the Big Five, using a mixture of specialized LoRA experts and a specialization loss function.


Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models

http://arxiv.org/abs/2406.12546v1

Compressor summary: TruthQuest is a benchmark for suppositional reasoning based on knights and knaves puzzles that tests large language models' ability to reason about truth and lies.


The Heterophilic Snowflake Hypothesis: Training and Empowering GNNs for Heterophilic Graphs

http://arxiv.org/abs/2406.12539v1

Compressor summary: The paper introduces a novel method for heterophilic graphs that assigns unique aggregation patterns to each node, called Heterophily Snowflake Hypothesis, and shows its effectiveness on various tasks and backbones.


Variational Distillation of Diffusion Policies into Mixture of Experts

http://arxiv.org/abs/2406.12538v1

Compressor summary: Variational Diffusion Distillation (VDD) is a new method that converts powerful but slow diffusion models into faster Mixtures of Experts (MoE) for behavior learning tasks.


ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

http://arxiv.org/abs/2406.12536v1

Compressor summary: The paper introduces ViDSOD-100, a new dataset for RGB-D video salient object detection, and ATF-Net, a baseline model that fuses appearance, spatio-temporal, and geometry information from different modalities through MEA and MDA attention modules, outperforming state-of-the-art methods across tasks and domains.


Unified Active Retrieval for Retrieval Augmented Generation

http://arxiv.org/abs/2406.12534v1

Compressor summary: UAR is a method for determining when to retrieve information in RAG systems that uses four criteria and standardized procedures to handle various types of user instructions efficiently and effectively.


TREE: Tree Regularization for Efficient Execution

http://arxiv.org/abs/2406.12531v1

Compressor summary: The paper proposes a method to reduce path lengths in decision trees during training to optimize their execution time on resource-constrained devices by favoring highly asymmetric distributions for split criteria, achieving up to 4x faster inference with minimal accuracy loss.
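
A useful diagnostic for what this regularization optimizes is the expected number of split comparisons per inference; a small sketch with scikit-learn (the metric and toy data are mine, not the paper's training-time regularizer):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def expected_path_length(clf: DecisionTreeClassifier, X: np.ndarray) -> float:
    """Average number of split comparisons a sample traverses at inference
    time -- the execution cost that path-length-aware training shrinks."""
    node_indicator = clf.decision_path(X)       # (samples x nodes) indicator
    nodes_visited = node_indicator.sum(axis=1)  # root..leaf count per sample
    return float(nodes_visited.mean() - 1)      # the leaf itself tests nothing

X = np.random.default_rng(0).normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
print(expected_path_length(clf, X))
```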


FuseGen: PLM Fusion for Data-generation based Zero-shot Learning

http://arxiv.org/abs/2406.12527v1

Compressor summary: FuseGen is a novel framework for zero-shot learning that improves synthetic dataset quality by using multiple PLMs and STMs for subset selection, re-weighting, and iterative data generation.


Improving the Evaluation and Actionability of Explanation Methods for Multivariate Time Series Classification

http://arxiv.org/abs/2406.12507v1

Compressor summary: The paper analyzes InterpretTime, a method to evaluate explanations in multivariate time series classification, pointing out its weaknesses, and proposing improvements. It also shows how the best attribution methods can improve channel selection in MTSC.


Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency

http://arxiv.org/abs/2406.12502v1

Compressor summary: Code-Optimise is a framework that improves both correctness and runtime of code language models by incorporating preference data from self-generated solutions.


Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning

http://arxiv.org/abs/2406.12499v1

Compressor summary: The study explores the use of inverse reinforcement learning (IRL) to train models for autonomous navigation of catheters and guidewires in endovascular surgery, improving safety and efficiency.


Reparameterizable Dual-Resolution Network for Real-time Semantic Segmentation

http://arxiv.org/abs/2406.12496v1

Compressor summary: The paper introduces a new real-time semantic segmentation model (RDRNet) that balances accuracy and speed by reparameterizing multi-path blocks during inference and improving feature representation with RPPM.


LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization

http://arxiv.org/abs/2406.12494v1

Compressor summary: LightPAL is a fast and effective passage retrieval method for summarizing multiple documents based on a large language model and random walk algorithm.


The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions

http://arxiv.org/abs/2406.12480v1

Compressor summary: The authors demonstrate how to use large language models (LLMs) to generate synthetic data for training stance detection agents, which improves their performance in online political discussions.


RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding

http://arxiv.org/abs/2406.12479v1

Compressor summary: The text describes a new paradigm for remote sensing image intelligence understanding using a multimodal instruction-following dataset and discusses its features and design.


Accelerating Depthwise Separable Convolutions on Ultra-Low-Power Devices

http://arxiv.org/abs/2406.12478v1

Compressor summary: The paper explores ways to improve depthwise separable convolutions in efficient neural networks by optimizing data layouts, achieving significant latency and memory reduction on a low-power device.


Adversarial Multi-dueling Bandits

http://arxiv.org/abs/2406.12475v1

Compressor summary: MiDEX is an algorithm for minimizing regret in multi-dueling bandits with adversarial preferences, achieving near-optimal performance.


Exploring Intra and Inter-language Consistency in Embeddings with ICA

http://arxiv.org/abs/2406.12474v1

Compressor summary: The text describes how Independent Component Analysis (ICA) can reveal universal semantic axes across languages, and presents a method to verify their consistency within and across languages using statistical tests.
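
The basic decomposition is easy to reproduce; a sketch with scikit-learn's FastICA (random Laplace vectors stand in for real word embeddings such as fastText, and the component count is an arbitrary choice of mine):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
embeddings = rng.laplace(size=(1000, 300))  # stand-in for (n_words, dim)

# Decompose embeddings into statistically independent components, the
# candidate semantic axes one would then try to match across languages.
ica = FastICA(n_components=50, random_state=0)
components = ica.fit_transform(embeddings)  # (n_words, 50)
print(components.shape)
```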


Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation

http://arxiv.org/abs/2406.12471v1

Compressor summary: DENI is a new strategy that reduces performance instability of fine-tuned language models by using ensembling, noise regularisation and model interpolation, while being computationally efficient.


Adaptive Token Biaser: Knowledge Editing via Biasing Key Entities

http://arxiv.org/abs/2406.12468v1

Compressor summary: ATBias is a new decoding technique that improves in-context editing (ICE) performance by biasing logits related to key entities, achieving significant improvements with low latency.
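
The mechanics of logit biasing are simple to illustrate; in this sketch the bias value and the choice of entity tokens are fixed assumptions of mine, whereas ATBias selects and scales them adaptively:

```python
import numpy as np

def bias_entity_logits(logits: np.ndarray, entity_token_ids: list[int],
                       bias: float = 2.0) -> np.ndarray:
    """Nudge the next-token distribution toward tokens tied to edited key
    entities, so the in-context edit outweighs stale parametric memory."""
    out = logits.copy()
    out[entity_token_ids] += bias
    return out

logits = np.zeros(10)
print(int(np.argmax(bias_entity_logits(logits, entity_token_ids=[7]))))  # -> 7
```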


LFMamba: Light Field Image Super-Resolution with State Space Model

http://arxiv.org/abs/2406.12463v1

Compressor summary: LFMamba is a new super-resolution method for 4D light fields using State Space Models, which effectively captures contextual and angular information while being efficient and easy to implement.


HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

http://arxiv.org/abs/2406.12459v1

Compressor summary: HumanSplat is a method that predicts 3D human properties from one image, using a multi-view diffusion model and a transformer with structure priors, to achieve photorealistic novel-view synthesis.


A Neural Column Generation Approach to the Vehicle Routing Problem with Two-Dimensional Loading and Last-In-First-Out Constraints

http://arxiv.org/abs/2406.12454v1

Compressor summary: The article proposes a novel exact algorithm for the 2L-CVRP that combines attention and recurrence mechanisms, improving the state-of-the-art solution by 29.79% on average and resolving an open instance.


Insect Identification in the Wild: The AMI Dataset

http://arxiv.org/abs/2406.12452v1

Compressor summary: The text discusses the crisis of insect decline, the need for better monitoring tools, and presents new machine learning benchmarks and datasets for insect recognition using computer vision.


Retrieval-Augmented Generation for Generative Artificial Intelligence in Medicine

http://arxiv.org/abs/2406.12449v1

Compressor summary: Retrieval-augmented generation (RAG) combines generative AI's strengths with external knowledge retrieval to enhance accuracy and improve medical applications.
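
A minimal sketch of the RAG loop itself (toy one-hot vectors stand in for a real embedding model, and the prompt wording is mine; in a medical deployment the corpus would be vetted literature or clinical guidelines and the prompt would go to an LLM):

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Rank documents by cosine similarity to the query; return the top-k."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]

def rag_prompt(question, evidence):
    """Prepend retrieved evidence so the generator answers from it."""
    context = "\n".join(f"- {e}" for e in evidence)
    return f"Answer using only this evidence:\n{context}\n\nQuestion: {question}"

docs = ["guideline A ...", "trial B ...", "review C ..."]
doc_vecs = np.eye(3)  # toy embeddings
print(rag_prompt("What does guideline A say?", retrieve(doc_vecs[0], doc_vecs, docs)))
```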


Abstraction-of-Thought Makes Language Models Better Reasoners

http://arxiv.org/abs/2406.12442v1

Compressor summary: The paper introduces Abstraction-of-Thought (AoT), a structured reasoning format that requires varying levels of abstraction, and AoT Collection, a finetuning dataset of AoT reasoning processes; language models finetuned on it outperform Chain-of-Thought (CoT) models on many reasoning tasks.


Cycle-Correspondence Loss: Learning Dense View-Invariant Visual Features from Unlabeled and Unordered RGB Images

http://arxiv.org/abs/2406.12441v1

Compressor summary: The paper introduces Cycle-Correspondence Loss (CCL) for learning view-invariant visual descriptors for robot manipulation tasks, enabling simple data collection and training on unpaired RGB camera views.


Deep self-supervised learning with visualisation for automatic gesture recognition

http://arxiv.org/abs/2406.12440v1

Compressor summary: The authors explore different deep learning methods for recognising hand gestures using 3D skeleton data and show that supervised learning is most accurate, while self-supervised learning improves accuracy in simulated settings.


A data-centric approach for assessing progress of Graph Neural Networks

http://arxiv.org/abs/2406.12439v1

Compressor summary: Noting that GNNs excel at multi-class but not multi-label node classification, the authors release three biological datasets and a multi-label graph generator, propose new notions of homophily and Cross-Class Neighborhood Similarity for the multi-label setting, and conduct a large-scale comparative study of eight methods across nine datasets.


PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers

http://arxiv.org/abs/2406.12430v1

Compressor summary: The paper introduces Decision QA, a new task for using LLMs to help with complex decision making, and proposes a new RAG technique called PlanRAG that performs better than the state-of-the-art method in two scenarios.


Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario

http://arxiv.org/abs/2406.12429v1

Compressor summary: This paper proposes a method to select homogeneous tools for tasks, considering both their performance and cost, and shows it outperforms existing methods.


PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems

http://arxiv.org/abs/2406.12428v1

Compressor summary: The study proposes a method to generate text and speech simultaneously for spoken dialogue systems, reducing response generation latency and maintaining content quality.


Deep Temporal Deaggregation: Large-Scale Spatio-Temporal Generative Models

http://arxiv.org/abs/2406.12423v1

Compressor summary: The paper introduces TDDPM, a transformer-based diffusion model that generates high-quality, representative synthetic time-series data, overcoming the memory, accuracy, and representativeness limits of existing generative models while addressing privacy and business-sensitivity concerns, and demonstrates it on mobility data across different scenarios and environments.


Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech

http://arxiv.org/abs/2406.12422v1

Compressor summary: The researchers developed a web service that combines deep learning and a morphological dictionary to improve Czech morphosyntactic analysis, achieving significant error reductions in lemmatization, POS tagging, and dependency parsing.


MMUTF: Multimodal Multimedia Event Argument Extraction with Unified Template Filling

http://arxiv.org/abs/2406.12420v1

Compressor summary: The paper proposes a unified template filling model that connects textual and visual modalities via textual prompts to improve Event Argument Extraction performance on the M2E2 benchmark.


AI-Assisted Human Evaluation of Machine Translation

http://arxiv.org/abs/2406.12419v1

Compressor summary: The ESA^AI protocol uses AI assistance to help annotators reduce the time and cost of evaluating machine translation quality, while providing more detailed annotations.


Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models

http://arxiv.org/abs/2406.12416v1

Compressor summary: This paper evaluates how well preference learning fine-tunes LLMs for factuality on out-of-domain datasets and proposes a new method, APEFT, that improves factuality awareness.


A Novel Algorithm for Community Detection in Networks using Rough Sets and Consensus Clustering

http://arxiv.org/abs/2406.12412v1

Compressor summary: The RC-CCD framework uses rough set theory and consensus clustering to accurately detect overlapping communities in complex networks, handling uncertainties and enhancing reliability.


LOOC: Localizing Organs using Occupancy Networks and Body Surface Depth Images

http://arxiv.org/abs/2406.12407v1

Compressor summary: The authors present a new method using occupancy networks to accurately locate 67 body structures from single depth images, accounting for individual anatomical diversity, and aiming to enhance medical imaging and diagnostics.


Fast Rates for Bandit PAC Multiclass Classification

http://arxiv.org/abs/2406.12406v1

Compressor summary: The paper proposes a new algorithm for learning from limited feedback that improves upon existing bounds and has a small price of bandit feedback in the agnostic setting.


Scan-to-BIM for As-built Roads: Automatic Road Digital Twinning from Semantically Labeled Point Cloud Data

http://arxiv.org/abs/2406.12404v1

Compressor summary: The text introduces a new scan-to-BIM framework for creating geometric digital twins of as-built roads from semantically labeled point cloud data, which can improve automation and accuracy.


PDSS: A Privacy-Preserving Framework for Step-by-Step Distillation of Large Language Models

http://arxiv.org/abs/2406.12403v1

Compressor summary: The paper proposes a framework that allows clients to train small language models on large language models' rationales without revealing sensitive information using two privacy strategies.


Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling

http://arxiv.org/abs/2406.12402v1

Compressor summary: This paper introduces explainable templates for common informal logical fallacies to explicate their implicit logic and evaluates their effectiveness in annotating and detecting fallacies.


A Cutting-Edge Deep Learning Method For Enhancing IoT Security

http://arxiv.org/abs/2406.12400v1

Compressor summary: The paper presents an innovative IoT intrusion detection system using deep learning that achieves high accuracy, low false alarms, and real-time processing, making it a promising solution for IoT cybersecurity.


QueerBench: Quantifying Discrimination in Language Models Toward Queer Identities

http://arxiv.org/abs/2406.12399v1

Compressor summary: QueerBench is a new framework that evaluates how large language models generate sentence completions that are potentially harmful towards LGBTQIA+ individuals, finding a significant difference in discriminatory behaviour.


Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models

http://arxiv.org/abs/2406.12397v1

Compressor summary: Synthetic data can improve LLMs' performance but may cause pattern overfitting and reduce instruction-following; using unlearning techniques can mitigate these issues.


SDNIA-YOLO: A Robust Object Detection Model for Extreme Weather Conditions

http://arxiv.org/abs/2406.12395v1

Compressor summary: The paper proposes SDNIA-YOLO, a method that improves object detection in extreme conditions by enhancing image quality and learning from synthesized images using neural style transfer.


EMO-KNOW: A Large Scale Dataset on Emotion and Emotion-cause

http://arxiv.org/abs/2406.12389v1

Compressor summary: The authors introduce a large and reliable dataset of emotion causes extracted from tweets, which can help develop emotion-aware systems.


IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

http://arxiv.org/abs/2406.12386v1

Compressor summary: IPEval is the first evaluation benchmark for Large Language Models in intellectual property tasks, covering creation, application, protection, and management aspects with multiple-choice questions and various evaluation methods.


VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding

http://arxiv.org/abs/2406.12384v1

Compressor summary: The text introduces VRSBench, a new benchmark for remote sensing image understanding that improves on existing datasets with more diverse tasks, detailed object information, and quality control.


From Instance Training to Instruction Learning: Task Adapters Generation from Instructions

http://arxiv.org/abs/2406.12382v1

Compressor summary: TAGI is a method to simulate human learning by generating task-specific models from instructions, improving cross-task generalization and reducing computational costs.


QOG: Question and Options Generation based on Language Model

http://arxiv.org/abs/2406.12381v1

Compressor summary: QOG models built by fine-tuning sequence-to-sequence LMs are efficient and stable, outperforming other methods and remaining competitive with Llama 3-8B.


GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory

http://arxiv.org/abs/2406.12375v1

Compressor summary: GW-MoE is a fine-tuning method that reduces uncertainty in Mixture-of-Experts models by broadcasting uncertain tokens across experts during inference, improving performance in various tasks and model sizes.


Problem-Solving in Language Model Networks

http://arxiv.org/abs/2406.12374v1

Compressor summary: This paper explores how network structures and agent interactions affect the reasoning and question-answering abilities of multi-agent systems, finding that random or scale-free networks with knowledgeable hubs improve performance.


WebCanvas: Benchmarking Web Agents in Online Environments

http://arxiv.org/abs/2406.12373v1

Compressor summary: WebCanvas is a framework for evaluating web agents that captures the dynamic nature of web interactions using a novel metric, dataset, and annotation tools.


A Comparative Study of Continuous Sign Language Recognition Techniques

http://arxiv.org/abs/2406.12369v1

Compressor summary: The study evaluates deep learning techniques for continuous sign language recognition across various datasets and languages, providing insights into their performance and robustness.


Mixing Natural and Synthetic Images for Robust Self-Supervised Representations

http://arxiv.org/abs/2406.12368v1

Compressor summary: DiffMix is a self-supervised learning framework that combines real and synthetic images to improve representation learning and reduce image augmentations.


Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines

http://arxiv.org/abs/2406.12367v1

Compressor summary: The paper presents a new method to train post-processing filters for machine vision tasks using competitive learning, achieving better results than independent training.


Structured Prediction in Online Learning

http://arxiv.org/abs/2406.12366v1

Compressor summary: The paper proposes a framework and algorithms for structured prediction in online learning, generalizing optimal supervised learning algorithms to non-i.i.d. and non-stationary data.


Does Context Help Mitigate Gender Bias in Neural Machine Translation?

http://arxiv.org/abs/2406.12364v1

Compressor summary: Context-aware neural machine translation models may improve translation accuracy, but they do not eliminate gender bias.


Certified ML Object Detection for Surveillance Missions

http://arxiv.org/abs/2406.12362v1

Compressor summary: This paper describes the development process of a machine-learning-based drone detection system that follows the soon-to-be-published ED 324/ARP 6983 recommendations for reliability.


UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models

http://arxiv.org/abs/2406.12360v1

Compressor summary: UrbanLLM is a large language model that helps solve complex urban problems by using AI models for different sub-tasks and generating comprehensive responses.


Memory Sequence Length of Data Sampling Impacts the Adaptation of Meta-Reinforcement Learning Agents

http://arxiv.org/abs/2406.12359v1

Compressor summary: The text discusses how different data sampling methods affect the ability of meta-RL agents to represent and adapt to unknown environments, particularly in continuous control tasks and sparse reward navigation tasks.


LiCAF: LiDAR-Camera Asymmetric Fusion for Gait Recognition

http://arxiv.org/abs/2406.12355v1

Compressor summary: LiCAF is a novel network that fuses LiDAR and camera data for gait recognition, using an asymmetric modeling strategy to enhance cross-modal information selection and temporal modeling, and achieving state-of-the-art results on the SUSTech1K dataset.


Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models

http://arxiv.org/abs/2406.12354v1

Compressor summary: This paper proposes an adaptive machine unlearning method for multilingual models that erases information across languages while preserving performance and improving security against low-resource language attacks.


Encoding Matching Criteria for Cross-domain Deformable Image Registration

http://arxiv.org/abs/2406.12350v1

Compressor summary: Our method uses a registration-oriented encoder that combines general and structural features to improve cross-domain deformable registration accuracy and adaptability.


Interpreting Bias in Large Language Models: A Feature-Based Approach

http://arxiv.org/abs/2406.12347v1

Compressor summary: The paper presents a novel feature-based approach to analyze and understand how social biases are propagated within large language models and proposes targeted debiasing methods.


PARAFAC2-based Coupled Matrix and Tensor Factorizations with Constraints

http://arxiv.org/abs/2406.12338v1

Compressor summary: The paper introduces a flexible algorithmic framework for fitting PARAFAC2-based CMTF models using Alternating Optimization and ADMM, which makes it possible to impose various constraints on all modes and linear couplings, with demonstrated benefits in accuracy and efficiency.


A Compass for Navigating the World of Sentence Embeddings for the Telecom Domain

http://arxiv.org/abs/2406.12336v1

Compressor summary: The study evaluates and compares different sentence embedding models for telecom domains, finding that domain adaptation improves both retrieval accuracy and isotropy of embeddings.


Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters

http://arxiv.org/abs/2406.12335v1

Compressor summary: VATP is a new method for reducing the memory cost of large language models by using both attention scores and value vector norms to prune tokens.
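
The summary's scoring rule is concrete enough to sketch: rank each cached token by its accumulated attention score times the norm of its value vector, and evict the lowest-ranked entries. A minimal numpy sketch, assuming an L1 norm and the shapes below:

```python
import numpy as np

def vatp_keep_indices(attn_scores, values, keep):
    """Score cached tokens as attention * L1-norm of their value vectors; keep the top-k."""
    importance = attn_scores * np.linalg.norm(values, ord=1, axis=-1)
    return np.sort(np.argsort(importance)[-keep:])  # indices of tokens to retain

rng = np.random.default_rng(0)
attn = rng.random(16)                    # accumulated attention each cached token receives
vals = rng.normal(size=(16, 64))         # cached value vectors: (seq_len, head_dim)
print(vatp_keep_indices(attn, vals, keep=8))
```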


What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering

http://arxiv.org/abs/2406.12334v1

Compressor summary: The paper introduces two metrics (sensitivity and consistency) to measure how well large language models handle rephrasings of the input for text classification tasks, aiming to improve their robustness and performance.
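
One hypothetical way to operationalize such metrics over a set of paraphrases of the same input (the paper's exact definitions may differ): consistency as pairwise agreement between predictions, sensitivity as the rate of deviation from the majority answer.

```python
from itertools import combinations

def consistency(predictions):
    """Fraction of paraphrase pairs on which the model gives the same label."""
    pairs = list(combinations(predictions, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

def sensitivity(predictions):
    """Fraction of rephrasings that flip the model away from its majority answer."""
    majority = max(set(predictions), key=predictions.count)
    return sum(p != majority for p in predictions) / len(predictions)

preds = ["positive", "positive", "negative", "positive"]  # one input, four rephrasings
print(consistency(preds), sensitivity(preds))             # 0.5 0.25
```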


Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding

http://arxiv.org/abs/2406.12331v1

Compressor summary: Our novel approach improves large language models' multi-hop reasoning by dynamically editing long contexts using interactive information retrieval.


SNAP: Unlearning Selective Knowledge in Large Language Models with Negative Instructions

http://arxiv.org/abs/2406.12329v1

Compressor summary: SNAP is a machine unlearning method for large language models that selectively forgets information without compromising their performance or user experience.


Automatic benchmarking of large multimodal models via iterative experiment programming

http://arxiv.org/abs/2406.12321v1

Compressor summary: APEx is a framework that uses a language model to automatically generate experiments and reports for evaluating large multimodal models.


PRePair: Pointwise Reasoning Enhance Pairwise Evaluating for Robust Instruction-Following Assessments

http://arxiv.org/abs/2406.12319v1

Compressor summary: The study compares pointwise and pairwise evaluation methods for natural language generation tasks, finding that pointwise is more robust but pairwise can identify shortcomings, leading to a hybrid method that combines both approaches.


Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning

http://arxiv.org/abs/2406.12316v1

Compressor summary: The paper proposes a modality-aware and instance-aware visual prompts network to improve visible-infrared pedestrian re-identification by using transformer architecture and customized prompts for each person.


PruningBench: A Comprehensive Benchmark of Structural Pruning

http://arxiv.org/abs/2406.12315v1

Compressor summary: PruningBench is a comprehensive benchmark for evaluating structural pruning methods on various models and tasks, providing an online platform and facilitating future research.


Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models

http://arxiv.org/abs/2406.12311v1

Compressor summary: BinaryMoS is a novel binarization technique for large language models that adaptively adjusts the values of binary weights based on each token's context, enhancing linguistic effectiveness without compromising compression efficiency.
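
A loose sketch of the "mixture of scales" idea: weights are binarized to signs as usual, but the per-channel scaling factor becomes a token-dependent blend of a few learned scale vectors chosen by a softmax gate. Everything here (shapes, the gating) is an illustrative guess at the mechanism, not the paper's code:

```python
import numpy as np

def binary_mos_linear(x, W, scale_experts, gate_w):
    """x: (tokens, d_in); W: (d_out, d_in) full-precision weights to binarize;
    scale_experts: (n_scales, d_out) learned per-channel scales;
    gate_w: (d_in, n_scales) router producing token-adaptive mixing weights."""
    W_bin = np.sign(W)                           # 1-bit weights
    g = x @ gate_w
    g = np.exp(g - g.max(-1, keepdims=True))
    g /= g.sum(-1, keepdims=True)                # softmax over scale "experts"
    alpha = g @ scale_experts                    # (tokens, d_out) per-token scales
    return (x @ W_bin.T) * alpha                 # binary matmul, adaptive rescale

rng = np.random.default_rng(0)
y = binary_mos_linear(rng.normal(size=(4, 16)),   # 4 tokens
                      rng.normal(size=(32, 16)),  # weight matrix
                      rng.normal(size=(4, 32)),   # 4 scale experts
                      rng.normal(size=(16, 4)))   # gate
print(y.shape)  # (4, 32)
```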


Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?

http://arxiv.org/abs/2406.12307v1

Compressor summary: This study investigates how well large language models can handle incomplete scenarios and tool unavailability in real-world environments, and suggests improvements for their reliability.


COT: A Generative Approach for Hate Speech Counter-Narratives via Contrastive Optimal Transport

http://arxiv.org/abs/2406.12304v1

Compressor summary: The paper proposes a new framework using contrastive optimal transport to generate counter-narratives against hate speech, addressing target interaction, diversification, and relevance issues.


Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment

http://arxiv.org/abs/2406.12303v1

Compressor summary: The paper proposes Immiscible Diffusion, a method to improve diffusion models by assigning target noise for each image before diffusing it, resulting in faster and more accurate training.
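
The noise-assignment step can be illustrated as a batch-wise linear assignment that pairs each training image with a nearby noise sample before the forward diffusion; the flattened images and squared-Euclidean cost below are assumptions of this sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_noise(images, noise):
    """Re-order a batch of noise so the total image-noise distance is minimal."""
    cost = ((images[:, None, :] - noise[None, :, :]) ** 2).sum(-1)
    _, cols = linear_sum_assignment(cost)
    return noise[cols]

rng = np.random.default_rng(0)
imgs = rng.normal(size=(8, 32 * 32))      # flattened image batch
eps = rng.normal(size=(8, 32 * 32))       # candidate Gaussian noise
matched = assign_noise(imgs, eps)
x_t = 0.9 * imgs + np.sqrt(1 - 0.9**2) * matched  # one toy forward-diffusion step
```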


Research on Dangerous Flight Weather Prediction based on Machine Learning

http://arxiv.org/abs/2406.12298v1

Compressor summary: The paper proposes using support vector machine with radial basis function kernel to predict hazardous flight weather conditions based on historical meteorological observations.
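
An RBF-kernel SVM on tabular meteorological features is a standard scikit-learn pattern; the synthetic features below (wind, visibility, temperature, pressure) are placeholders, not the paper's data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-in for historical observations: wind speed, visibility, temperature, pressure.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=500) > 0).astype(int)  # 1 = hazardous

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```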


Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System

http://arxiv.org/abs/2406.12297v1

Compressor summary: The paper proposes a faithful and parallel density peaks clustering method that works on non-Euclidean data and outperforms existing methods in accuracy.
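
Density-peaks clustering itself reduces to two per-point quantities read off the pairwise distance matrix: a local density rho and the distance delta to the nearest denser point; centers are points where both are large. The sketch below is the generic matrix formulation of that idea, not the paper's MPI-parallel implementation:

```python
import numpy as np

def density_peaks(X, dc=0.5, n_clusters=2):
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise distances
    rho = (D < dc).sum(1) - 1                   # local density: neighbors within dc
    order = np.argsort(-rho)                    # points from densest to sparsest
    delta = np.full(len(X), D.max())            # densest point keeps the max distance
    nearest_denser = np.full(len(X), -1)
    for i in range(1, len(order)):
        p, denser = order[i], order[:i]
        j = denser[np.argmin(D[p, denser])]     # nearest point of higher density
        delta[p], nearest_denser[p] = D[p, j], j
    # Centers maximize rho * delta; the densest point always qualifies.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(n_clusters)
    for p in order:                             # assign in decreasing-density order
        if labels[p] == -1:
            labels[p] = labels[nearest_denser[p]]
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
print(np.bincount(density_peaks(X)))            # roughly [50, 50]
```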


Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

http://arxiv.org/abs/2406.12295v1

Compressor summary: The paper presents FS-GEN, a framework that combines large and small language models for various applications, and analyzes different techniques within it to improve efficiency and collaboration.


Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

http://arxiv.org/abs/2406.12293v1

Compressor summary: ENCOFA is a framework that addresses mixed closed-set and open-set label noise in medical image classification by using contrastive learning and feature augmentation.


An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs

http://arxiv.org/abs/2406.12288v1

Compressor summary: The study investigates how neuron activation in large language models' feed-forward layers explains their arithmetic reasoning capabilities when prompted with Chain-of-Thought prompts, using Llama2 as an example.


VIRL: Volume-Informed Representation Learning towards Few-shot Manufacturability Estimation

http://arxiv.org/abs/2406.12286v1

Compressor summary: VIRL pre-trains a 3D geometric encoder to improve few-shot learning for manufacturability tasks using CAM simulations, and suggests LoRA and static normalization as effective strategies.


DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection

http://arxiv.org/abs/2406.12285v1

Compressor summary: The paper proposes DASSF, a dynamic-attention scale-sequence fusion algorithm for small target detection in aerial images, which improves YOLO's performance by enhancing its ability to perceive targets of different scales and types.


Demystifying the Recency Heuristic in Temporal-Difference Learning

http://arxiv.org/abs/2406.12284v1

Compressor summary: The paper analyzes how the recency heuristic in reinforcement learning, which favors recent rewards, helps with temporal credit assignment and provides proofs for its convergence, contraction rate, and variance properties.
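
The recency heuristic in question is what eligibility traces implement: credit for each TD error decays geometrically with how long ago a state was visited. A minimal tabular TD(lambda) update, as one concrete instance:

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular TD(lambda) with accumulating eligibility traces.

    A state visited k steps ago carries trace weight (gamma * lam) ** k,
    so recent states absorb most of each TD error -- the recency heuristic.
    """
    V = np.zeros(n_states)
    for episode in episodes:                      # episode: [(s, r, s_next), ...]
        z = np.zeros(n_states)                    # eligibility traces
        for s, r, s_next in episode:
            delta = r + gamma * V[s_next] - V[s]  # TD error
            z *= gamma * lam                      # decay old credit
            z[s] += 1.0                           # refresh credit for current state
            V += alpha * delta * z
    return V

episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 3)]
print(td_lambda([episode] * 100, n_states=4))
```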


SAGDFN: A Scalable Adaptive Graph Diffusion Forecasting Network for Multivariate Time Series Forecasting

http://arxiv.org/abs/2406.12282v1

Compressor summary: SAGDFN is a scalable graph neural network that captures complex spatial-temporal correlations for large-scale multivariate time series forecasting and outperforms existing methods on multiple real-world datasets.


What Matters in Learning Facts in Language Models? Multifaceted Knowledge Probing with Diverse Multi-Prompt Datasets

http://arxiv.org/abs/2406.12277v1

Compressor summary: The study introduces BELIEF, a framework to evaluate how well large language models understand factual knowledge using diverse prompts and a new dataset called MyriadLAMA.


CodeNav: Beyond tool-use to using real-world codebases with LLM agents

http://arxiv.org/abs/2406.12276v1

Compressor summary: CodeNav is an LLM agent that navigates unseen code repositories and solves user queries by iteratively generating solutions with execution feedback.


VoCo-LLaMA: Towards Vision Compression with Large Language Models

http://arxiv.org/abs/2406.12275v1

Compressor summary: VoCo-LLaMA compresses vision tokens using language models to improve multi-modal tasks' efficiency and understanding of temporal correlations in videos.


SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models

http://arxiv.org/abs/2406.12274v1

Compressor summary: SafeInfer is a method to generate safe and ethical responses from language models using demonstration examples and safety-optimized token selection.


Slot State Space Models

http://arxiv.org/abs/2406.12272v1

Compressor summary: SlotSSMs are a new framework for SSMs that model modular structures with independent state vectors and self-attention, improving performance in tasks involving multiple objects and long-range temporal dependencies.


Agriculture-Vision Challenge 2024 -- The Runner-Up Solution for Agricultural Pattern Recognition via Class Balancing and Model Ensemble

http://arxiv.org/abs/2406.12271v1

Compressor summary: This runner-up solution to the Agriculture-Vision Challenge uses semantic segmentation models to analyze satellite images for agricultural purposes, addressing class imbalance with data augmentation and adaptive weight schemes.


Unveiling Implicit Table Knowledge with Question-Then-Pinpoint Reasoner for Insightful Table Summarization

http://arxiv.org/abs/2406.12269v1

Compressor summary: The text proposes a novel table reasoning framework called Question-then-Pinpoint that helps generate high-quality table summaries by unveiling implicit knowledge and providing explainable guidance for the summarizer.


Towards a Client-Centered Assessment of LLM Therapists by Client Simulation

http://arxiv.org/abs/2406.12266v1

Compressor summary: The paper proposes ClientCAST, a method to assess large language models (LLMs) as therapists by simulating clients and having them complete questionnaires about their interactions with LLM therapists.


Defending Against Social Engineering Attacks in the Age of LLMs

http://arxiv.org/abs/2406.12263v1

Compressor summary: The study examines how Large Language Models can both facilitate and defend against chat-based social engineering attacks and proposes ConvoSentinel, a defense pipeline that improves detection by comparing messages to a database of similar conversations.


Investigating Data Usage for Inductive Conformal Predictors

http://arxiv.org/abs/2406.12262v1

Compressor summary: Inductive conformal predictors (ICPs) generate prediction sets with guaranteed confidence levels under exchangeability; this study explores how to divide data efficiently for their development, including the effect of overlap between training and calibration sets.
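
The split-conformal recipe being studied: fit a model on one part of the data, compute nonconformity scores on a disjoint calibration part, and threshold them at the target confidence level. A minimal classification sketch (ignoring the finite-sample quantile correction):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
idx = np.random.default_rng(0).permutation(len(y))
train, cal, test = idx[:90], idx[90:120], idx[120:]

model = LogisticRegression(max_iter=1000).fit(X[train], y[train])

# Nonconformity score: 1 - probability the model assigns to the true label.
cal_scores = 1 - model.predict_proba(X[cal])[np.arange(len(cal)), y[cal]]
q = np.quantile(cal_scores, 0.9)                  # threshold for ~90% coverage

# Prediction set: every label whose nonconformity falls under the threshold.
pred_sets = [np.where(1 - p <= q)[0] for p in model.predict_proba(X[test])]
print(pred_sets[:5])
```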


Self-Supervised Time-Series Anomaly Detection Using Learnable Data Augmentation

http://arxiv.org/abs/2406.12260v1

Compressor summary: The paper proposes a self-supervised anomaly detection technique for time-series data that uses learnable data augmentation and contrastive learning, improving performance on benchmark datasets and enabling diagnosis of root causes.


Advancing Cross-Domain Generalizability in Face Anti-Spoofing: Insights, Design, and Metrics

http://arxiv.org/abs/2406.12258v1

Compressor summary: The paper proposes a novel method to improve face anti-spoofing performance by enhancing zero-shot data domain generalization, measuring uncertainty, and ensembling backbones from a Bayesian perspective.


CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

http://arxiv.org/abs/2406.12257v1

Compressor summary: CleanGen is a lightweight defense against backdoor attacks in large language models that replaces suspicious tokens with non-compromised ones during decoding.


Symmetric Multi-Similarity Loss for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2024

http://arxiv.org/abs/2406.12256v1

Compressor summary: The report introduces a new loss function for video-text retrieval tasks that leverages correlation matrix information and improves performance on EPIC-KITCHENS-100 challenge.


A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

http://arxiv.org/abs/2406.12255v1

Compressor summary: This paper investigates why chain-of-thought methods improve large language models' reasoning performance, proposing a Read-and-Control approach for explaining and controlling these methods.


Language and Multimodal Models in Sports: A Survey of Datasets and Applications

http://arxiv.org/abs/2406.12252v1

Compressor summary: This paper reviews datasets and applications of Natural Language Processing and multimodal models in sports analytics after 2020, highlighting their benefits for various purposes and challenges for future development.


Mitigate Negative Transfer with Similarity Heuristic Lifelong Prompt Tuning

http://arxiv.org/abs/2406.12251v1

Compressor summary: SHLPT is a novel lifelong learning framework that uses a learnable similarity metric to partition tasks into subsets, enabling effective transfer regardless of their similarity or dissimilarity and reducing catastrophic forgetting with a parameter pool.


TroL: Traversal of Layers for Large Language and Vision Models

http://arxiv.org/abs/2406.12246v1

Compressor summary: TroL is an efficient large language and vision model family that uses layer traversal to reuse layers and achieve powerful performance without increasing physical model size.


GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

http://arxiv.org/abs/2406.12242v1

Compressor summary: The paper proposes GMP-AR, a framework that uses temporal hierarchy information and an adaptive reconciliation strategy to improve forecasting performance and maintain coherence for different temporal granularity tasks, such as sales prediction.


More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

http://arxiv.org/abs/2406.12241v1

Compressor summary: The paper proposes a framework that combines different approximate sampling methods with Feel-Good Thompson Sampling for exploration in Deep RL, achieving theoretical and empirical improvements.
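
A generic sketch of pairing Thompson sampling with an approximate sampler, here unadjusted Langevin dynamics on a toy linear bandit; the step sizes and Gaussian posterior model are assumptions, and the "Feel-Good" term is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms = 3, 5
theta_true = rng.normal(size=d)
arms = rng.normal(size=(n_arms, d))

A, b = np.eye(d), np.zeros(d)              # Gaussian-posterior sufficient statistics
for t in range(200):
    # Approximate posterior sample via a few unadjusted Langevin steps.
    theta = np.linalg.solve(A, b)          # warm-start at the posterior mean
    step = 0.5 / np.linalg.eigvalsh(A).max()
    for _ in range(50):
        theta += step * -(A @ theta - b) + np.sqrt(2 * step) * rng.normal(size=d)
    a = int(np.argmax(arms @ theta))       # Thompson sampling: act on the sample
    r = arms[a] @ theta_true + 0.1 * rng.normal()
    A += np.outer(arms[a], arms[a])        # posterior update (unit-noise model)
    b += r * arms[a]

print("best arm found:", int(np.argmax(arms @ np.linalg.solve(A, b))))
```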


PFID: Privacy First Inference Delegation Framework for LLMs

http://arxiv.org/abs/2406.12238v1

Compressor summary: PFID is a privacy-preserving framework for LLMs that hides user input by using model sharding, singular value decomposition, and compressed hidden states.


Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

http://arxiv.org/abs/2406.12235v1

Compressor summary: Holmes-VAD is a framework that uses precise temporal supervision and rich multimodal instructions to enable accurate and interpretable video anomaly detection, and introduces VAD-Instruct50k, a large-scale multimodal VAD instruction-tuning benchmark.


SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

http://arxiv.org/abs/2406.12233v1

Compressor summary: SyncVSR is an end-to-end learning framework for visual speech recognition that uses quantized audio and crossmodal supervision to handle visually similar lip gestures and achieve state-of-the-art results with much less data.


"You Gotta be a Doctor, Lin": An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations

http://arxiv.org/abs/2406.12232v1

Compressor summary: The study shows that GPT-3 and Llama models tend to favor candidates with White female-sounding names in hiring recommendations and to suggest salaries that vary with a candidate's race and gender, patterns that may not reflect real-world labor market trends.


MCSD: An Efficient Language Model with Diverse Fusion

http://arxiv.org/abs/2406.12230v1

Compressor summary: MCSD is an efficient language model that fuses diverse features with slope and decay sections, enabling fast inference and low resource consumption, making it suitable for edge devices and embodied intelligence.


Spatially Resolved Gene Expression Prediction from Histology via Multi-view Graph Contrastive Learning with HSIC-bottleneck Regularization

http://arxiv.org/abs/2406.12229v1

Compressor summary: The text proposes a method called ST-GCHB that uses graph contrastive learning and histopathological images to predict gene expression in spatial transcriptomics data, while accounting for spatial dependencies among spots.


Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector

http://arxiv.org/abs/2406.12227v1

Compressor summary: The paper explores how fine-tuning large language models causes them to lose general capabilities due to instruction following, proposes the Instruction Vector framework to capture these capabilities, and develops IV-guided training to mitigate forgetting.


The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge

http://arxiv.org/abs/2406.12225v1

Compressor summary: The text introduces a method that uses a multimodal language model to improve object detection by generating referential expressions for each category, which helps align the detected targets with the target concepts and enhance the fine-tuning of the vision-language model.


ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations

http://arxiv.org/abs/2406.12223v1

Compressor summary: The study shows that current language models struggle to detect offensive content when text is manipulated with homophonic substitutions and emoji transformations, especially in Chinese, suggesting a need for better techniques.


On-Policy Fine-grained Knowledge Feedback for Hallucination Mitigation

http://arxiv.org/abs/2406.12221v1

Compressor summary: RLFH is a fine-grained feedback method using online reinforcement learning to reduce hallucination in large language models.


Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking

http://arxiv.org/abs/2406.12220v1

Compressor summary: The paper proposes a new way to understand and improve Transformers and MLP-Mixers by integrating them with hierarchical associative memory, and finds that weight matrices in the vanilla MLP-Mixer naturally acquire symmetry-breaking effects for better performance.


PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge

http://arxiv.org/abs/2406.12219v1

Compressor summary: The paper introduces HP-ViT, a model that uses a ViT backbone and transformer head to accurately estimate hand poses from egocentric videos, achieving the 1st position in the EgoExo4D Hand Pose Challenge.


Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions

http://arxiv.org/abs/2406.12216v1

Compressor summary: This paper tests if large language models can accurately infer people's personalities from brief descriptions, finding that they can, but also have some limitations and biases.


LLM-Oracle Machines

http://arxiv.org/abs/2406.12213v1

Compressor summary: The text discusses extending oracle Turing machines with clustered large language models to improve natural language processing tasks and reliability.


PCIE_LAM Solution for Ego4D Looking At Me Challenge

http://arxiv.org/abs/2406.12211v1

Compressor summary: The report describes a winning solution for a challenge that involves detecting if people are looking at the camera in videos, using a combination of image and text features processed by neural networks.


Knowledge Fusion By Evolving Weights of Language Models

http://arxiv.org/abs/2406.12208v1

Compressor summary: The paper proposes a method called Evolver to integrate multiple language models into a unified model that performs well across various domains and generalizes well on out-of-domain data, without needing further training or additional data.


Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback

http://arxiv.org/abs/2406.12205v1

Compressor summary: RL-LOW is an algorithm that minimizes simple regret in offline RL with preference feedback and preserves privacy.


InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context

http://arxiv.org/abs/2406.12203v1

Compressor summary: InterIntent is a framework that assesses large language models' social intelligence by measuring their ability to understand and manage intentions in a game setting, revealing their weaknesses in inferring others' intentions and highlighting the importance of intention understanding for success.


Time Series Modeling for Heart Rate Prediction: From ARIMA to Transformers

http://arxiv.org/abs/2406.12199v1

Compressor summary: The study shows that advanced deep learning models can better predict heart rate time series for cardiovascular disease management than traditional models.
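
The classical end of such a comparison is a few lines of statsmodels; the ARIMA order (2, 1, 2) and the synthetic heart-rate trace below are placeholders:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
t = np.arange(300)
hr = 70 + 5 * np.sin(t / 20) + rng.normal(0, 1.5, size=300)  # synthetic heart-rate trace

model = ARIMA(hr[:280], order=(2, 1, 2)).fit()
forecast = model.forecast(steps=20)
mae = np.abs(forecast - hr[280:]).mean()
print(f"20-step MAE: {mae:.2f} bpm")
```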


Debate as Optimization: Adaptive Conformal Prediction and Diverse Retrieval for Event Extraction

http://arxiv.org/abs/2406.12197v1

Compressor summary: DAO is a system that improves event extraction using multi-agent debates without tuning, with two novel modules (DRAG and AdaCP) enhancing accuracy and reliability.


Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection

http://arxiv.org/abs/2406.12193v1

Compressor summary: The paper proposes an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address issues in high-dimensional multi-label data with missing labels, using a generalized regression model and integrating instance and label correlations.


Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models

http://arxiv.org/abs/2406.12182v1

Compressor summary: Aquila-Med is a bilingual medical LLM that uses continued pre-training, supervised fine-tuning, and reinforcement learning to improve performance in specialized professional fields like medicine, with its datasets and models released as open source.


The Wisdom of a Crowd of Brains: A Universal Brain Encoder

http://arxiv.org/abs/2406.12179v1

Compressor summary: The paper proposes a Universal Brain-Encoder that learns unique voxel-embeddings per brain-voxel, enabling cross-attention between images and brain activity, and improving encoding across subjects, datasets, and machines.


FCA-RAC: First Cycle Annotated Repetitive Action Counting

http://arxiv.org/abs/2406.12178v1

Compressor summary: The text introduces FCA-RAC, a framework that enhances action counting models by annotating videos with first action cycles, adjusting sampling strategies, using multi-temporal granularity convolution, and exploiting training knowledge augmentation to improve performance on seen and unseen actions.


Location-based Radiology Report-Guided Semi-supervised Learning for Prostate Cancer Detection

http://arxiv.org/abs/2406.12177v1

Compressor summary: The text describes a novel semisupervised learning method for prostate cancer detection on MRI using clinical information from radiology reports to guide the training and reduce the need for manual annotations.


MiSuRe is all you need to explain your image segmentation

http://arxiv.org/abs/2406.12173v1

Compressor summary: MiSuRe is an algorithm that creates minimal saliency maps for image segmentation models, highlighting only crucial regions and enabling post-hoc reliability analysis.


Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems

http://arxiv.org/abs/2406.12172v1

Compressor summary: SearchBench introduces a new benchmark for search problems that challenges LLMs' end-to-end text solving abilities and shows that in-context learning with A* algorithm implementations and multi-stage verification can significantly improve their performance.


BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM

http://arxiv.org/abs/2406.12168v1

Compressor summary: BPO is an online method for aligning large language models with human preferences, which improves performance across various tasks and datasets.


Statistical Uncertainty in Word Embeddings: GloVe-V

http://arxiv.org/abs/2406.12165v1

Compressor summary: The authors propose a method to estimate uncertainty in word embeddings (GloVe) and demonstrate its usefulness for hypothesis testing and bias analysis in various applications.


Discussion Graph Semantics of First-Order Logic with Equality for Reasoning about Discussion and Argumentation

http://arxiv.org/abs/2406.12163v1

Compressor summary: The paper proposes a new way to use logic for reasoning about general discussion and argumentation models using a top-down approach.


Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance

http://arxiv.org/abs/2406.12159v1

Compressor summary: Our study suggests that pre-training benefits of large language models come mainly from their latent space geometry, not linguistic knowledge, and this could help create better models with less data.


LLMs Are Prone to Fallacies in Causal Inference

http://arxiv.org/abs/2406.12158v1

Compressor summary: This study explores whether large language models can infer causal relations from different types of relational data in text and finds that they struggle with counterfactuals and are prone to the post hoc fallacy.