arxiv compressed, 2024-06-26

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-26, generated by the compressor, my personal LLM-based project.


Text-Animator: Controllable Visual Text Video Generation

http://arxiv.org/abs/2406.17777v1

Compressor summary: Text-Animator is a novel method for generating videos with accurate and coherent visual texts by controlling camera movement and refining text motions.


Fast and Uncertainty-Aware SVBRDF Recovery from Multi-View Capture using Frequency Domain Analysis

http://arxiv.org/abs/2406.17774v1

Compressor summary: The authors propose a fast and accurate method to estimate material properties of objects under uncontrolled lighting using signal-processing techniques, while also quantifying uncertainty for improved acquisition quality.


MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

http://arxiv.org/abs/2406.17770v1

Compressor summary: MG-LLaVA is a multi-modal large language model that enhances visual processing by using multi-granularity features and outperforms existing models on perception tasks.


BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning

http://arxiv.org/abs/2406.17764v1

Compressor summary: The authors introduce BMIKE-53, a benchmark for evaluating cross-lingual knowledge editing methods on 53 languages, and propose MIKE, a gradient-free method that shows promising results.


DiffusionPDE: Generative PDE-Solving Under Partial Observation

http://arxiv.org/abs/2406.17763v1

Compressor summary: The paper presents DiffusionPDE, a method that uses generative diffusion models to solve partial differential equations (PDEs) with missing information, achieving better results than existing methods.


Solving Hard Mizar Problems with Instantiation and Strategy Invention

http://arxiv.org/abs/2406.17762v1

Compressor summary: The authors use various ATP and AI methods to solve over 3000 previously unsolved Mizar problems, increasing the percentage of ATP-solved Mizar problems from 75% to above 80%.


CaLMQA: Exploring culturally specific long-form question answering across 23 languages

http://arxiv.org/abs/2406.17761v1

Compressor summary: CaLMQA is a diverse dataset of complex questions in 23 languages that reveals limitations of large language models in handling low-resource, culturally specific questions.


Interpreting Attention Layer Outputs with Sparse Autoencoders

http://arxiv.org/abs/2406.17759v1

Compressor summary: This paper trains sparse autoencoders on attention layer outputs in transformers to decompose them into interpretable features, discovering different feature families and roles, and using them to better understand and explain model behavior.


MotionBooth: Motion-Aware Customized Text-to-Video Generation

http://arxiv.org/abs/2406.17758v1

Compressor summary: MotionBooth is a framework that animates customized subjects with precise control over their shape, attributes, and motions using text-to-video models and training-free techniques.


Accelerating Clinical Evidence Synthesis with Large Language Models

http://arxiv.org/abs/2406.17755v1

Compressor summary: TrialMind is a generative AI pipeline for conducting medical systematic reviews, using large language models and human expert oversight, that outperforms traditional methods in literature search, screening, and data extraction.


Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language

http://arxiv.org/abs/2406.17753v1

Compressor summary: The authors study how well Large Language Models (LLMs) can produce persuasive text and create a new dataset, Persuasive-Pairs, to measure and compare LLMs' abilities across various domains.


A New Perspective on Shampoo's Preconditioner

http://arxiv.org/abs/2406.17748v1

Compressor summary: Shampoo is an optimization algorithm that approximates the Gauss-Newton component (or the gradient covariance matrix) with a Kronecker product, and the paper shows this approximation is close to the optimal Kronecker-product approximation.
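
To make the Kronecker-factored idea concrete, here is a minimal NumPy sketch of the Shampoo preconditioner as it is commonly presented (accumulate L = sum of G Gᵀ and R = sum of Gᵀ G, then precondition with L^{-1/4} G R^{-1/4}). This is an illustration under those standard definitions, not the paper's code:

```python
import numpy as np

def shampoo_precondition(G, L, R, eps=1e-6):
    """One Shampoo-style preconditioning step for a matrix-shaped gradient G.

    L and R accumulate left (row) and right (column) second-moment
    statistics; the update direction is L^{-1/4} @ G @ R^{-1/4}.
    """
    L = L + G @ G.T   # row statistics, shape (m, m)
    R = R + G.T @ G   # column statistics, shape (n, n)

    def inv_fourth_root(M):
        # M^{-1/4} via eigendecomposition of the symmetric PSD matrix M
        w, V = np.linalg.eigh(M + eps * np.eye(M.shape[0]))
        return V @ np.diag(w ** -0.25) @ V.T

    return inv_fourth_root(L) @ G @ inv_fourth_root(R), L, R
```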


Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

http://arxiv.org/abs/2406.17746v1

Compressor summary: The text proposes a taxonomy for memorization in language models, considering various factors that affect each type of memorization and using it to build a predictive model.


Following Length Constraints in Instructions

http://arxiv.org/abs/2406.17744v1

Compressor summary: Models trained to follow explicit length constraints in instructions outperform standard instruction-following models in evaluations that account for response length.


Point-SAM: Promptable 3D Segmentation Model for Point Clouds

http://arxiv.org/abs/2406.17741v1

Compressor summary: Point-SAM is a transformer-based 3D model that leverages 2D knowledge from SAM to segment point clouds with part-level and object-level annotations, achieving state-of-the-art performance on various benchmarks.


Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning

http://arxiv.org/abs/2406.17740v1

Compressor summary: The authors propose a new framework for fine-tuning large Transformer models using structured unrestricted-rank matrices, which offer more flexibility and parameter efficiency than existing methods like Adapters and LoRA.


Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model

http://arxiv.org/abs/2406.17739v1

Compressor summary: ATTEMPT is a novel method for updating taxonomies by inserting new concepts at the appropriate position using pre-trained language models.


LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users

http://arxiv.org/abs/2406.17737v1

Compressor summary: The study examines how the quality of responses from large language models varies depending on a user's English proficiency, education level, and country of origin, finding that these models are less reliable for users with lower proficiency or education, and those outside the US.


Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity

http://arxiv.org/abs/2406.17720v1

Compressor summary: Arboretum is a large dataset of diverse species images from iNaturalist with rich annotations for AI applications in biodiversity assessment and agriculture research.


When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning

http://arxiv.org/abs/2406.17718v1

Compressor summary: The paper examines how auxiliary tasks like observation reconstruction and latent self-prediction affect representation learning in reinforcement learning, and shows that latent self-prediction is more helpful as an auxiliary task than observation reconstruction when dealing with distractions and non-linear functions.


ViANLI: Adversarial Natural Language Inference for Vietnamese

http://arxiv.org/abs/2406.17716v1

Compressor summary: ViANLI is an adversarial dataset for Vietnamese natural language inference that challenges current machine learning models and, when used for training, improves their performance.


Compositional Models for Estimating Causal Effects

http://arxiv.org/abs/2406.17714v1

Compressor summary: The paper proposes a modular, compositional approach to estimate individual treatment effects in structured systems composed of multiple heterogeneous components, with benefits such as systematic generalization and improved overlap guarantees.


Data curation via joint example selection further accelerates multimodal learning

http://arxiv.org/abs/2406.17711v1

Compressor summary: JEST is an algorithm that selects batches of data jointly and improves training speed and efficiency in multimodal contrastive learning.
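
As a rough illustration of learnability-based selection (JEST scores and selects whole sub-batches jointly; the per-example scoring below is a simplification, and the exact scoring details are assumptions):

```python
import numpy as np

def select_learnable(losses_learner, losses_reference, keep_frac=0.25):
    """Keep the examples the current learner finds hard but a reference
    model finds easy -- a simple 'learnability' score. JEST applies this
    idea jointly over whole sub-batches rather than per example."""
    learnability = np.asarray(losses_learner) - np.asarray(losses_reference)
    k = max(1, int(keep_frac * len(learnability)))
    return np.argsort(-learnability)[:k]  # indices of the most learnable examples
```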


SurgeMOD: Translating image-space tissue motions into vision-based surgical forces

http://arxiv.org/abs/2406.17707v1

Compressor summary: The authors propose a new method to estimate forces in robotic surgery using video data and frequency domain analysis of organ motion.


HGTDP-DTA: Hybrid Graph-Transformer with Dynamic Prompt for Drug-Target Binding Affinity Prediction

http://arxiv.org/abs/2406.17697v1

Compressor summary: HGTDP-DTA is a novel method for predicting drug target binding affinity using dynamic prompts and a hybrid Graph-Transformer architecture that integrates structural, sequence, and contextual information.


From Distributional to Overton Pluralism: Investigating Large Language Model Alignment

http://arxiv.org/abs/2406.17692v1

Compressor summary: Alignment changes large language models' output distribution, but the effects are mostly superficial and can be replicated without fine-tuning.


Unified Auto-Encoding with Masked Diffusion

http://arxiv.org/abs/2406.17688v1

Compressor summary: UMD is a new auto-encoder that combines patch-based and noise-based image corruption techniques, leading to improved generative and representation learning performance.


VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation

http://arxiv.org/abs/2406.17681v1

Compressor summary: The authors propose a method to dynamically evaluate language models by variabilizing benchmarks and sampling new values from test cases, ensuring fresh evaluations and reducing data contamination.
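
A hypothetical example of what "variabilizing" a benchmark item could look like (the template and value ranges below are invented for illustration, not taken from VarBench): the concrete numbers in a test case become variables, and each evaluation samples a fresh instance whose ground truth is recomputed, so memorized answers stop helping.

```python
import random

# Hypothetical variabilized benchmark item.
TEMPLATE = "{name} buys {n} apples at ${p} each. How much does {name} spend?"

def sample_instance(seed=None):
    rng = random.Random(seed)
    name = rng.choice(["Ava", "Ben", "Chen"])
    n, p = rng.randint(2, 20), rng.randint(1, 9)
    question = TEMPLATE.format(name=name, n=n, p=p)
    answer = n * p  # ground truth recomputed from the sampled values
    return question, answer
```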


End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation

http://arxiv.org/abs/2406.17680v1

Compressor summary: UAD is a vision-based autonomous driving method that uses unsupervised learning to reduce annotation requirements and computation overhead while improving performance on nuScenes and CARLA.


Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation

http://arxiv.org/abs/2406.17679v1

Compressor summary: The LoGoCAF framework uses a two-branch semantic segmentation architecture with local-to-global encoder and MLP decoder to fuse hyperspectral and X-modality data for efficient, accurate, and generalizable classification.


Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models

http://arxiv.org/abs/2406.17675v1

Compressor summary: The paper presents a framework to study psychological attributes in large language models, creating a benchmark with six dimensions, and finds discrepancies between self-reported traits and real-world behaviors.


LaTable: Towards Large Tabular Models

http://arxiv.org/abs/2406.17673v1

Compressor summary: The paper introduces LaTable, a novel diffusion model for generating tabular data that works across different datasets and improves out-of-distribution performance, while exploring its limitations in zero-shot settings.


LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic

http://arxiv.org/abs/2406.17663v1

Compressor summary: LLM-ARC combines a large language model with an automated reasoning critic to improve logical reasoning and achieve state-of-the-art accuracy on the FOLIO benchmark.


Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients

http://arxiv.org/abs/2406.17660v1

Compressor summary: Grass is a novel sparse projection-based optimization method that reduces memory usage and computational costs for large language model training, enabling half-precision pretraining of a 13B-parameter model with a 2x throughput improvement.
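
A loose sketch of the general idea of projecting gradients into a structured sparse subspace so optimizer state stays small (the row-norm selection rule here is an assumption; Grass defines its own structured sparse projections):

```python
import numpy as np

def project_rows(G, k):
    """Keep only the k largest-norm rows of the gradient; optimizer state
    then lives in the compact k-row space."""
    rows = np.argsort(-np.linalg.norm(G, axis=1))[:k]
    return G[rows], rows

def expand_rows(update_small, rows, shape):
    """Scatter the compact update back to the full parameter shape."""
    U = np.zeros(shape)
    U[rows] = update_small
    return U
```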


DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

http://arxiv.org/abs/2406.17659v1

Compressor summary: DKPROMPT combines vision-language models with domain knowledge in PDDL to improve robot task planning in open worlds.


ELIZA Reinterpreted: The world's first chatbot was not intended as a chatbot at all

http://arxiv.org/abs/2406.17650v1

Compressor summary: ELIZA, the first chatbot created by Joseph Weizenbaum in the early 1960s, was actually meant for research on human-machine conversation, but its accidental release and misunderstanding led to its fame as a chatbot and loss of the original source for over 50 years.


Privacy Preserving Reinforcement Learning for Population Processes

http://arxiv.org/abs/2406.17649v1

Compressor summary: The paper proposes a meta algorithm to make any Reinforcement Learning (RL) algorithm privacy-preserving in the setting of population processes, such as controlling epidemics, and shows that it achieves reasonable trade-offs between privacy and utility.


Variationist: Exploring Multifaceted Variation and Bias in Written Language Data

http://arxiv.org/abs/2406.17647v1

Compressor summary: Variationist is a new tool that helps researchers explore and visualize language variation and bias across multiple variables and metrics.


Banishing LLM Hallucinations Requires Rethinking Generalization

http://arxiv.org/abs/2406.17642v1

Compressor summary: Large Language Models (LLMs) often generate false information, and this study explores why traditional methods fail to prevent it and proposes a new model called Lamini-1 that uses multiple memory experts to store facts and reduce hallucinations.


BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging

http://arxiv.org/abs/2406.17640v1

Compressor summary: BayTTA is a framework that optimizes test-time augmentation for computer vision tasks using Bayesian Model Averaging, improving accuracy and robustness on various medical image analysis and gene editing datasets.


Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP

http://arxiv.org/abs/2406.17639v1

Compressor summary: AlignCLIP is a method to improve cross-modal alignment in CLIP embeddings by sharing parameters and separating uni-modal embeddings, reducing the modality gap while maintaining performance on various tasks.


Aligning Diffusion Models with Noise-Conditioned Perception

http://arxiv.org/abs/2406.17636v1

Compressor summary: The proposed method improves text-to-image diffusion models by using a perceptual objective in the U-Net embedding space, leading to better human preference alignment and reduced computational cost.


Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels

http://arxiv.org/abs/2406.17633v1

Compressor summary: The study shows that using large language models to generate training labels can replace human annotations for text classification tasks in computational social science, leading to similar performance with faster and cheaper methods.


Video Inpainting Localization with Contrastive Learning

http://arxiv.org/abs/2406.17628v1

Compressor summary: The text proposes a method called ViLocal that uses contrastive learning to identify inpainted regions in videos and localize them using a 3D Uniformer encoder and a lightweight convolution decoder.


CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference

http://arxiv.org/abs/2406.17626v1

Compressor summary: The text discusses a study of large language models' safety under multi-turn dialogue coreference and reveals their vulnerability to attacks that spread harmful intent across coreferential turns.


Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models

http://arxiv.org/abs/2406.17624v1

Compressor summary: This paper reviews current research on personality in large language models, categorizing studies into self-assessment, exhibition, and recognition, and providing a comprehensive overview of findings, challenges, resources, and future directions.


Embedded event based object detection with spiking neural network

http://arxiv.org/abs/2406.17617v1

Compressor summary: The research introduces an embedded neuromorphic testbench using SPLEAT accelerator to train and deploy efficient SNNs for event-based object detection on low-power hardware.


MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization

http://arxiv.org/abs/2406.17614v1

Compressor summary: The paper proposes a regularization technique called MSRS for training speech recognition models from scratch, which reduces costs and improves performance.


Distributed Training of Large Graph Neural Networks with Variable Communication Rates

http://arxiv.org/abs/2406.17611v1

Compressor summary: The paper proposes a variable compression scheme for distributed GNN training that reduces communication volume without sacrificing accuracy, and shows its effectiveness in empirical results.


Test-Time Generative Augmentation for Medical Image Segmentation

http://arxiv.org/abs/2406.17608v1

Compressor summary: The paper proposes using a generative model to create multiple views of test images for medical image segmentation, improving performance and error estimation.


Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

http://arxiv.org/abs/2406.17601v1

Compressor summary: Director3D is a framework for generating realistic 3D scenes and camera trajectories from textual descriptions, using a combination of transformers, diffusion models, and refinement losses.


"Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

http://arxiv.org/abs/2406.17600v1

Compressor summary: The study suggests using expert labels and explanations with LLMs to approximate human label variation in NLI tasks, improving the scalability of annotations.


DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation

http://arxiv.org/abs/2406.17591v1

Compressor summary: DocParseNet combines deep learning and multi-modal learning to improve text and image recognition in scanned documents, achieving high accuracy and efficiency for real-world document processing applications.


LongIns: A Challenging Long-context Instruction-based Exam for LLMs

http://arxiv.org/abs/2406.17588v1

Compressor summary: LongIns is a new benchmark dataset that tests large language models' long-context and reasoning abilities in various settings, revealing their limitations in handling short context windows.


Learning Dynamic Bayesian Networks from Data: Foundations, First Principles and Numerical Comparisons

http://arxiv.org/abs/2406.17585v1

Compressor summary: The paper provides a guide on learning Dynamic Bayesian Networks from multiple trajectory samples, covering formalism, structure-weight interdependence, learning methods, and optimization functions, with comparisons of various algorithms.


Towards Compositional Interpretability for XAI

http://arxiv.org/abs/2406.17583v1

Compressor summary: The authors propose a categorical approach to define and compare AI models' interpretability using string diagrams, revealing common themes in XAI and demonstrating explainability benefits of compositionally-interpretable models.


Toward Universal Medical Image Registration via Sharpness-Aware Meta-Continual Learning

http://arxiv.org/abs/2406.17575v1

Compressor summary: The paper proposes a continual learning method for universal 3D medical image registration using meta-learning and sharpness-aware meta-continual learning, achieving better or comparable results to sequential or centralized multi-task training strategies on four datasets.


Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats

http://arxiv.org/abs/2406.17574v1

Compressor summary: The research introduces a new text-to-SQL dataset for IoT devices and shows that joint training on query generation and data inference improves performance.


FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts

http://arxiv.org/abs/2406.17566v1

Compressor summary: The authors create a French dataset for evaluating and improving toxicity detection in language models, as current efforts mainly focus on English.


Multi-property Steering of Large Language Models with Dynamic Activation Composition

http://arxiv.org/abs/2406.17563v1

Compressor summary: This paper evaluates activation steering methods for language models and proposes Dynamic Activation Composition, an approach to modulate steering intensity based on multiple properties during generation.
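
For intuition, a minimal PyTorch sketch of multi-property activation steering via a forward hook. Dynamic Activation Composition adapts the per-property intensities during generation; the coefficients below are fixed placeholders, and the layer path in the usage comment is an assumption:

```python
import torch

def make_steering_hook(vectors, alphas):
    """Forward hook that adds property-specific steering vectors to a
    layer's output; `vectors` maps property -> steering tensor and
    `alphas` maps property -> intensity."""
    def hook(module, inputs, output):
        h = output[0] if isinstance(output, tuple) else output
        for prop, v in vectors.items():
            h = h + alphas[prop] * v  # broadcast over batch and sequence
        return (h,) + output[1:] if isinstance(output, tuple) else h
    return hook

# Hypothetical usage on one transformer block:
# handle = model.transformer.h[layer].register_forward_hook(
#     make_steering_hook({"formality": v_formal}, {"formality": 4.0}))
```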


Minimal Interaction Edge Tuning: A New Paradigm for Visual Adaptation

http://arxiv.org/abs/2406.17559v1

Compressor summary: Minimal Interaction Edge Tuning (MIET) uses pretrained models on cloud servers as fixed feature extractors and fine-tunes small networks on edge devices, achieving strong adaptation with minimal information transfer.


The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

http://arxiv.org/abs/2406.17557v1

Compressor summary: FineWeb is a large pretraining dataset for language models that outperforms other open datasets and reveals insights into data curation strategies.


Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft

http://arxiv.org/abs/2406.17553v1

Compressor summary: The paper explores using large language models to predict actions in Minecraft collaborative building with few-shot prompts and analyzes performance gaps.


CDQuant: Accurate Post-training Weight Quantization of Large Pre-trained Models using Greedy Coordinate Descent

http://arxiv.org/abs/2406.17542v1

Compressor summary: CDQuant is a simple and scalable alternative to GPTQ that outperforms it in compressing large language models with minimal impact on performance.


Principal Component Clustering for Semantic Segmentation in Synthetic Data Generation

http://arxiv.org/abs/2406.17541v1

Compressor summary: The authors present a method to create synthetic images with segmented objects using a latent diffusion model, without extra segmentation models, and propose self-attention-based semantic condensation, non-prompt-influencing cross-attentions for mask classification, and a mask refinement step for the CVPR 2024 workshop challenge.


SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

http://arxiv.org/abs/2406.17538v1

Compressor summary: The paper proposes a novel network that uses motion magnification, channel attention, temporal modeling, and self-knowledge distillation to enhance micro-expression recognition performance.


SincVAE: a New Approach to Improve Anomaly Detection on EEG Data Using SincNet and Variational Autoencoder

http://arxiv.org/abs/2406.17537v1

Compressor summary: The text proposes a semi-supervised deep learning method called SincVAE for detecting epileptic seizures in EEG data, which can identify early and late stages of seizures.


Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark

http://arxiv.org/abs/2406.17535v1

Compressor summary: The text introduces a structured benchmark using INVALSI tests to evaluate Large Language Models in Italian, providing a reference point for researchers and assessing their performance against human results.


Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification

http://arxiv.org/abs/2406.17534v1

Compressor summary: The authors propose a framework that combines in-context learning with large language models to improve few-shot hierarchical text classification by using a retrieval database, label-aware representations, and a novel contrastive learning objective.


Can Large Language Models Understand DL-Lite Ontologies? An Empirical Study

http://arxiv.org/abs/2406.17532v1

Compressor summary: Large language models can understand some aspects of Description Logic ontologies, but struggle with others like transitivity and handling large amounts of data.


Point Tree Transformer for Point Cloud Registration

http://arxiv.org/abs/2406.17530v1

Compressor summary: The Point Tree Transformer (PTT) is a novel transformer-based approach for point cloud registration that efficiently extracts local and global features while maintaining linear computational complexity by constructing hierarchical feature trees and using a new Point Tree Attention mechanism.


LumberChunker: Long-Form Narrative Document Segmentation

http://arxiv.org/abs/2406.17526v1

Compressor summary: LumberChunker is a method that uses an LLM to dynamically segment documents for dense retrieval, and it outperforms other methods on the GutenQA benchmark.
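
An illustrative loop in the spirit of LLM-driven segmentation: show the model a window of consecutive paragraphs and ask where the content shifts. The `ask_llm` helper and the prompt wording are hypothetical, not LumberChunker's actual interface:

```python
def chunk_document(paragraphs, ask_llm, window=8):
    """Greedy segmentation: cut where the model says a new topic starts.
    `ask_llm` is a hypothetical callable returning an integer as text."""
    chunks, start = [], 0
    while start < len(paragraphs):
        window_paras = paragraphs[start:start + window]
        numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(window_paras))
        prompt = ("Reply with the index of the first paragraph that starts a "
                  f"new topic, or {len(window_paras)} if none does:\n{numbered}")
        cut = max(1, min(int(ask_llm(prompt)), len(window_paras)))
        chunks.append(" ".join(window_paras[:cut]))
        start += cut
    return chunks
```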


On the consistency of hyper-parameter selection in value-based deep reinforcement learning

http://arxiv.org/abs/2406.17523v1

Compressor summary: The paper studies how reliable hyper-parameter selection affects value-based deep reinforcement learning agents and introduces a new score to measure consistency.


Tell Me Where You Are: Multimodal LLMs Meet Place Recognition

http://arxiv.org/abs/2406.17520v1

Compressor summary: The authors propose a multimodal approach using vision-based retrieval and language-based reasoning for visual place recognition in robotics, without requiring supervised training.


Entropy-Based Decoding for Retrieval-Augmented Large Language Models

http://arxiv.org/abs/2406.17519v1

Compressor summary: The paper proposes a decoding method to improve retrieval-augmented LLMs by prioritizing relevant and reliable external knowledge, reducing distractibility issues.


Enhancing Explainability of Knowledge Learning Paths: Causal Knowledge Networks

http://arxiv.org/abs/2406.17518v1

Compressor summary: The text proposes a method to build causal knowledge networks for effective adaptive learning systems using Bayesian networks and recommendations based on human-centered explainable AI in education.


Preserving Node Distinctness in Graph Autoencoders via Similarity Distillation

http://arxiv.org/abs/2406.17517v1

Compressor summary: The authors propose a technique to improve graph autoencoders by transferring node similarity knowledge from raw graphs to reconstructed graphs using a KL constraint, enhancing their distinctiveness and performance.


Benchmarking Mental State Representations in Language Models

http://arxiv.org/abs/2406.17513v1

Compressor summary: The study examines how different language model characteristics affect their ability to represent mental states and reason about them, using probes and prompt variations.


WAVE: Weight Template for Adaptive Initialization of Variable-sized Models

http://arxiv.org/abs/2406.17503v1

Compressor summary: WAVE is a multitasking method that initializes variable-sized models with weight templates learned from pre-trained models, improving efficiency and performance.


MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation

http://arxiv.org/abs/2406.17484v1

Compressor summary: The paper proposes a two-stage fine-tuning pipeline for large language models to improve their performance on diverse medical tasks by encoding knowledge and filtering noise, as well as aligning the model with task-specific data.


TRIP: Trainable Region-of-Interest Prediction for Hardware-Efficient Neuromorphic Processing on Event-based Vision

http://arxiv.org/abs/2406.17483v1

Compressor summary: TRIP is a hardware-efficient hard attention framework for event-based vision processing on neuromorphic processors that produces low-resolution ROIs for efficient and accurate classification, achieving state-of-the-art accuracies and significant improvements in computation, latency, and energy.


Transformer-based Named Entity Recognition with Combined Data Representation

http://arxiv.org/abs/2406.17474v1

Compressor summary: The study explores transformer-based models for named entity recognition, finding that combining different data representation strategies improves performance across multiple languages and datasets.


UHD-IQA Benchmark Database: Pushing the Boundaries of Blind Photo Quality Assessment

http://arxiv.org/abs/2406.17472v1

Compressor summary: The authors present a new Image Quality Assessment dataset with 6073 high-resolution, aesthetic images, annotated by experts and enriched with metadata, to improve perceptual image quality evaluation research.


Cross-Modal Spherical Aggregation for Weakly Supervised Remote Sensing Shadow Removal

http://arxiv.org/abs/2406.17469v1

Compressor summary: The paper proposes S2-ShadowNet, a network that uses both visible and infrared images for shadow removal, by learning cross-domain mapping and exploiting a spherical feature space with similarity and orthogonality losses.


Early learning of the optimal constant solution in neural networks and humans

http://arxiv.org/abs/2406.17467v1

Compressor summary: The text describes how deep neural networks initially learn the optimal constant solution (OCS), which is a pattern in the target labels, before adapting to more complex functions during training. This OCS phase is observed not only in linear networks but also in nonlinear ones and human learners, suggesting it as a universal learning principle.


Enhancing Tool Retrieval with Iterative Feedback from Large Language Models

http://arxiv.org/abs/2406.17465v1

Compressor summary: The paper proposes a method to improve tool learning for large language models using iterative feedback between the tool usage model and the tool retriever model, addressing challenges such as complex user instructions and misalignment between models.


The Tree of Diffusion Life: Evolutionary Embeddings to Understand the Generation Process of Diffusion Models

http://arxiv.org/abs/2406.17462v1

Compressor summary: TDL is a method to visualize and understand the data evolution in diffusion models by embedding high-dimensional samples into a lower-dimensional space preserving their relations and evolutionary structure.


Investigating Self-Supervised Methods for Label-Efficient Learning

http://arxiv.org/abs/2406.17460v1

Compressor summary: The paper compares different self-supervised learning tasks for vision transformers and introduces a framework using masked image modelling and clustering that performs well on low-shot downstream tasks.


Continuous Urban Change Detection from Satellite Image Time Series with Temporal Feature Refinement and Multi-Task Integration

http://arxiv.org/abs/2406.17458v1

Compressor summary: The paper proposes a deep learning method using self-attention and Markov networks for continuous urban change detection from satellite image time series, achieving promising results.


Improving Grammatical Error Correction via Contextual Data Augmentation

http://arxiv.org/abs/2406.17456v1

Compressor summary: The paper proposes a contextual augmentation method for creating synthetic data in Grammatical Error Correction, which improves error distribution consistency and uses relabeling to reduce noisy labels.


Learning to Ask Informative Questions: Enhancing LLMs with Preference Optimization and Expected Information Gain

http://arxiv.org/abs/2406.17453v1

Compressor summary: The paper proposes a method to improve LLM-generated questions' informativeness for 20-question game dialogues using Direct Preference Optimization on question pairs.


Pseudo Labelling for Enhanced Masked Autoencoders

http://arxiv.org/abs/2406.17450v1

Compressor summary: The paper proposes an enhanced Masked Autoencoder model that uses token-level reconstruction and pseudo labeling with a decoupled teacher network to improve performance on various image tasks.


Using joint angles based on the international biomechanical standards for human action recognition and related tasks

http://arxiv.org/abs/2406.17443v1

Compressor summary: The paper introduces biomechanical notions to convert keypoint data into joint angles that are suitable for machine learning and interpretation by humans in various applications like sports and medicine.


Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model

http://arxiv.org/abs/2406.17442v1

Compressor summary: Mamba24/8D applies state space models to 3D point cloud semantic segmentation, offering linear complexity and strong global modeling capability.


Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes

http://arxiv.org/abs/2406.17438v1

Compressor summary: Implicit-Zoo is a large dataset for neural implicit functions that improves performance in computer vision tasks like image classification, semantic segmentation, and 3D pose regression.


Advancing Question Answering on Handwritten Documents: A State-of-the-Art Recognition-Based Model for HW-SQuAD

http://arxiv.org/abs/2406.17437v1

Compressor summary: The paper presents a novel recognition-based approach that uses transformer-based document retrieval and ensemble methods to improve question answering on handwritten documents, achieving state-of-the-art results on HW-SQuAD and BenthamQA, with code and trained models to be publicly released.


Mind the Graph When Balancing Data for Fairness or Robustness

http://arxiv.org/abs/2406.17433v1

Compressor summary: This paper studies how data balancing can affect fairness and robustness in machine learning, and emphasizes the need to consider the causal graph for effective mitigation strategies.


Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights

http://arxiv.org/abs/2406.17430v1

Compressor summary: The paper proposes a taxonomy of speech-specific risks, such as sarcasm, imitation, and biases, and evaluates LMMs' effectiveness in detecting them.


A Critical Analysis of the Theoretical Framework of the Extreme Learning Machine

http://arxiv.org/abs/2406.17427v1

Compressor summary: The paper argues that the Extreme Learning Machine (ELM) lacks rigorous mathematical justification, refutes its proofs, constructs a counterexample dataset, and offers alternative foundational statements.


Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

http://arxiv.org/abs/2406.17419v1

Compressor summary: Loong is a novel benchmark for evaluating large language models in realistic long-context scenarios through extended multi-document question answering with diverse tasks and context lengths.


SE-VGAE: Unsupervised Disentangled Representation Learning for Interpretable Architectural Layout Design Graph Generation

http://arxiv.org/abs/2406.17418v1

Compressor summary: The paper introduces a new framework, SE-VGAE, that uses unsupervised disentangled representation learning to generate and interpret architectural layout graphs from floor plan images.


Variable Layer-Wise Quantization: A Simple and Effective Approach to Quantize LLMs

http://arxiv.org/abs/2406.17415v1

Compressor summary: The authors propose a variable quantization approach for large language models, where different layers are quantized at varying bit levels based on their importance, resulting in minimal performance drop and compressed model size.
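
A minimal sketch of spending more bits on more important layers; the median split over importance scores is a stand-in assumption for the paper's importance criteria:

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def variable_layerwise_quantize(weights, importance, bits=(4, 8)):
    """Give layers above the median importance the higher bit-width."""
    lo, hi = bits
    cut = np.median(list(importance.values()))
    return {name: quantize_uniform(w, hi if importance[name] >= cut else lo)
            for name, w in weights.items()}
```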


Consensus Learning with Deep Sets for Essential Matrix Estimation

http://arxiv.org/abs/2406.17414v1

Compressor summary: Our method, based on Deep Sets, estimates the essential matrix by identifying outliers and modeling noise in point matches, outperforming complex networks.


Depth-Guided Semi-Supervised Instance Segmentation

http://arxiv.org/abs/2406.17413v1

Compressor summary: The Depth-Guided Semi-Supervised Instance Segmentation framework uses depth maps to generate precise contours for distinct instances, overcoming limitations of RGB information, and achieves better performance than previous methods.


Less can be more: representational vs. stereotypical gender bias in facial expression recognition

http://arxiv.org/abs/2406.17405v1

Compressor summary: The paper investigates how demographic biases, especially stereotypical ones, in facial expression recognition datasets affect machine learning models' predictions.


Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training

http://arxiv.org/abs/2406.17404v1

Compressor summary: The Make Some Noise (MSN) training framework improves parallel decoding and inference speed of large language models without sacrificing performance by introducing noise and using a tree-based retrieval-augmented decoding strategy.


GradCheck: Analyzing classifier guidance gradients for conditional diffusion sampling

http://arxiv.org/abs/2406.17399v1

Compressor summary: The study analyzes how to improve the quality of samples from a specific type of model using different techniques, focusing on the stability of gradients for better guidance.


SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing

http://arxiv.org/abs/2406.17396v1

Compressor summary: SyncNoise is a novel approach for consistent 3D scene editing using 2D diffusion models, achieving high-quality results with global and local consistency across multiple viewpoints.


Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance

http://arxiv.org/abs/2406.17385v1

Compressor summary: The study shows that LLMs give lower-quality or factually incorrect responses to non-native English speakers more frequently than to native speakers, and this difference persists across different regions.


Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods

http://arxiv.org/abs/2406.17382v1

Compressor summary: This paper tests seven human pose estimation methods on infant videos and finds ViTPose to be the best performer for understanding infant movement in natural settings.


Forget but Recall: Incremental Latent Rectification in Continual Learning

http://arxiv.org/abs/2406.17381v1

Compressor summary: ILR is a new continual learning approach that uses rectifier units to correct and update old task representations in deep neural networks, improving performance on several benchmarks.


A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens

http://arxiv.org/abs/2406.17378v1

Compressor summary: The text describes an interesting phenomenon in large language models where the text embeddings can be aligned with key tokens, and shows its potential applications in information retrieval and understanding fuzzy concepts.


A Three-Pronged Approach to Cross-Lingual Adaptation with Multilingual LLMs

http://arxiv.org/abs/2406.17377v1

Compressor summary: The text explores three cross-lingual methods to adapt an English-dominated LLM to Indic languages and finds that additional supervisory signals and continued pre-training in one low-resource language help improve performance.


An Empirical Study on the Characteristics of Bias upon Context Length Variation for Bangla

http://arxiv.org/abs/2406.17375v1

Compressor summary: The authors create a Bangla dataset for measuring gender bias in pretrained language models and show that context length affects bias metrics, calling for more nuanced analysis.


Generalizability of experimental studies

http://arxiv.org/abs/2406.17374v1

Compressor summary: The paper proposes a mathematical formalization of experimental studies in machine learning and develops a quantifiable measure of generalizability, which is applied to two benchmarks and can be used with a provided Python module.


Leveraging Synthetic Audio Data for End-to-End Low-Resource Speech Translation

http://arxiv.org/abs/2406.17363v1

Compressor summary: The paper presents Irish-to-English speech translation systems using Whisper with various data augmentation techniques to improve performance.


Stacked Confusion Reject Plots (SCORE)

http://arxiv.org/abs/2406.17346v1

Compressor summary: SCORE is a new tool for machine learning applications that helps users understand how uncertain their predictions are by using stacked confusion matrices and reject curves.


NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods

http://arxiv.org/abs/2406.17345v1

Compressor summary: The paper introduces NerfBaselines, a framework to simplify installation and evaluation of novel view synthesis methods, and provides a web platform for comparison.


Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

http://arxiv.org/abs/2406.17343v1

Compressor summary: The paper proposes Q-DiT, a technique to improve image synthesis quality and efficiency for diffusion transformers by fine-grained quantization and other techniques.


Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds

http://arxiv.org/abs/2406.17342v1

Compressor summary: Point-MAGE is a framework that leverages generative modeling and representation learning for point cloud data, achieving state-of-the-art performance in various tasks.


Generative Modelling of Structurally Constrained Graphs

http://arxiv.org/abs/2406.17341v1

Compressor summary: ConStruct is a novel framework that allows hard-constraining graph diffusion models to incorporate specific properties such as planarity or acyclicity, ensuring valid graphs for practical applications.


Dual-Space Knowledge Distillation for Large Language Models

http://arxiv.org/abs/2406.17328v1

Compressor summary: DSKD is a novel framework to compress large language models by unifying their output spaces and aligning their representations using cross-model attention.


The State-Action-Reward-State-Action Algorithm in Spatial Prisoner's Dilemma Game

http://arxiv.org/abs/2406.17326v1

Compressor summary: The study applies reinforcement learning (RL) to evolutionary game theory, using the SARSA algorithm to model how cooperative behavior emerges and changes among self-interested agents.


Delving into the Utilisation of ChatGPT in Scientific Publications in Astronomy

http://arxiv.org/abs/2406.17324v1

Compressor summary: The study examines the use of large language models in astronomy papers and finds a significant increase in words favored by ChatGPT, suggesting widespread adoption and need for ethical guidelines.


XAMI -- A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images

http://arxiv.org/abs/2406.17323v1

Compressor summary: The authors present a new dataset and a hybrid CNN-transformer model that detect and mask scattered light artefacts in astronomical images from the XMM-Newton space telescope using instance segmentation, helping advance artefact detection in astronomical observations.


ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data

http://arxiv.org/abs/2406.17322v1

Compressor summary: ALPBench is a tool for benchmarking and comparing active learning pipelines, which consists of 86 real-world datasets and supports various query strategies and learning algorithms.


DMF-Net: Image-Guided Point Cloud Completion with Dual-Channel Modality Fusion and Shape-Aware Upsampling Transformer

http://arxiv.org/abs/2406.17319v1

Compressor summary: The paper introduces a new dual-channel modality fusion network (DMF-Net) for completing partial point clouds using image guidance, which performs better than existing methods.


Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning

http://arxiv.org/abs/2406.17312v1

Compressor summary: The paper proposes a method to select which response pairs to annotate for iterative preference learning, considering uncertainty and distribution shifts, to achieve better performance with less annotation cost.


Zero-Shot Long-Form Video Understanding through Screenplay

http://arxiv.org/abs/2406.17309v1

Compressor summary: MM-Screenplayer is an advanced system that converts long videos into textual screenplays by organizing them into scenes and using a "Look Back" strategy to validate uncertain information, achieving high scores in a challenge.


Retrieval Augmented Instruction Tuning for Open NER with Large Language Models

http://arxiv.org/abs/2406.17305v1

Compressor summary: The paper explores using Retrieval Augmented Instruction Tuning (RA-IT) for information extraction in open named entity recognition tasks, showing its effectiveness across languages and data sizes.


Leveraging LLMs for Dialogue Quality Measurement

http://arxiv.org/abs/2406.17304v1

Compressor summary: The paper explores using large language models (LLMs) for evaluating dialogue quality, finding that larger models, algorithmic example selection, chain-of-thought reasoning, and fine-tuning improve performance.


CausalScore: An Automatic Reference-Free Metric for Assessing Response Relevance in Open-Domain Dialogue Systems

http://arxiv.org/abs/2406.17300v1

Compressor summary: CausalScore is a novel metric that measures the relevance of responses in open-domain dialogues by estimating the causal strength between dialogue histories and responses, outperforming existing metrics in aligning with human judgements.


Towards Efficient and Scalable Training of Differentially Private Deep Learning

http://arxiv.org/abs/2406.17298v1

Compressor summary: The paper investigates the high computational cost of differentially private deep learning training and compares methods to reduce it.


Towards Open-set Camera 3D Object Detection

http://arxiv.org/abs/2406.17297v1

Compressor summary: OS-Det3D is a two-stage framework that uses a novel 3D Object Discovery Network and Joint Objectness Selection module to improve camera 3D object detection for both known and unknown objects.


BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks

http://arxiv.org/abs/2406.17296v1

Compressor summary: BlockLLM reduces memory requirements for training large language models by selecting and updating a small subset of parameters without changing the model architecture or training procedure.
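
A hedged PyTorch sketch of block-coordinate updates, choosing the parameter tensors to update by gradient norm (an assumed criterion; BlockLLM's selection rule may differ):

```python
import torch

def blockwise_step(params, optimizer, keep_frac=0.1):
    """Apply the optimizer step only to the parameter tensors with the
    largest gradient norms this step; the rest are skipped (standard
    torch optimizers ignore params whose .grad is None)."""
    params = list(params)
    norms = [p.grad.norm().item() if p.grad is not None else 0.0 for p in params]
    k = max(1, int(keep_frac * len(params)))
    cutoff = sorted(norms, reverse=True)[k - 1]
    for p, n in zip(params, norms):
        if n < cutoff:
            p.grad = None  # frozen this step
    optimizer.step()
    optimizer.zero_grad()
```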


Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

http://arxiv.org/abs/2406.17294v1

Compressor summary: The authors create MathV360K, a diverse multimodal dataset for image instruction fine-tuning, and introduce Math-LLaVA, a model that improves multimodal mathematical reasoning with this dataset.


Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models

http://arxiv.org/abs/2406.17287v1

Compressor summary: The study shows that Large Language Models can predict personality traits from counseling dialogues using role-play simulations and questionnaires, outperforming previous methods.


A Recursive Encoding for Cuneiform Signs

http://arxiv.org/abs/2406.17283v1

Compressor summary: The paper introduces a recursive encoding system for cuneiform signs that simplifies sign lookup, allows for computer processing, and enables new methods of rendering and displaying signs and tablets.


BERT, Neural Information Retrieval, Boolean Retrieval, Negation Retrieval

http://arxiv.org/abs/2406.17282v1

Compressor summary: SetBERT is a compact BERT-based model that enhances retrieval for logic-structured (Boolean and negation) queries using an inversed-contrastive loss, outperforming BERT-base.


Distance Recomputator and Topology Reconstructor for Graph Neural Networks

http://arxiv.org/abs/2406.17281v1

Compressor summary: The paper presents two new methods to improve Graph Neural Networks (GNNs) by dynamically adjusting node distances and local graph structures for better representation and aggregation of complex and dynamic graphs.


OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

http://arxiv.org/abs/2406.17276v1

Compressor summary: OPT-Tree equips speculative decoding with an adaptive draft tree structure, generating multiple tokens per step and improving the inference efficiency of autoregressive language models.


Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?

http://arxiv.org/abs/2406.17274v1

Compressor summary: The paper introduces a comprehensive benchmark for evaluating uncertainty estimation in text summarization, incorporating various NLG metrics and human annotations.


A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR

http://arxiv.org/abs/2406.17272v1

Compressor summary: The paper proposes a solution to improve speech recognition by connecting speech encoders to large language models with fine-tuning schemes, modality alignment enhancement, and methods to reduce insertion errors.


DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

http://arxiv.org/abs/2406.17271v1

Compressor summary: The authors propose a method called DARG to generate dynamic and diverse evaluation data for Large Language Models, revealing their performance and bias patterns under different complexity levels.


Image-Guided Outdoor LiDAR Perception Quality Assessment for Autonomous Driving

http://arxiv.org/abs/2406.17265v1

Compressor summary: The paper introduces a new algorithm for assessing LiDAR point cloud quality in outdoor autonomous driving environments using both image data and ground truth annotations, improving detection performance.


Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows

http://arxiv.org/abs/2406.17263v1

Compressor summary: The paper proposes GMKI, an efficient derivative-free sampler for handling multi-modal distributions in Bayesian inference for large-scale inverse problems.


D2LLM: Decomposed and Distilled Large Language Models for Semantic Search

http://arxiv.org/abs/2406.17262v1

Compressor summary: D2LLM combines a bi-encoder and an interaction module to achieve both efficiency and nuanced understanding in semantic search, outperforming five baselines on three tasks.


TRAWL: Tensor Reduced and Approximated Weights for Large Language Models

http://arxiv.org/abs/2406.17261v1

Compressor summary: TRAWL optimizes large language models through tensor decomposition to improve performance without retraining or extra data, making AI systems more efficient and sustainable.
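
As the simplest instance of replacing weights with a reduced, approximated factorization, here is a truncated SVD of one weight matrix; TRAWL itself applies tensor decompositions, so this matrix special case is illustrative only:

```python
import numpy as np

def low_rank_weights(W, rank):
    """Replace W with its best rank-`rank` approximation (Eckart-Young),
    trading a small accuracy loss for far fewer effective parameters."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```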


Mitigating Hallucination in Fictional Character Role-Play

http://arxiv.org/abs/2406.17260v1

Compressor summary: The authors present a method to reduce hallucination in role-playing dialogues by adjusting the influence of large language models' world knowledge with a confidence threshold, and demonstrate its effectiveness on a new dataset.


Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

http://arxiv.org/abs/2406.17257v1

Compressor summary: The paper introduces a new method to improve TTS models for multiple languages by using parameter-efficient transfer learning (adapters and hypernetworks), requiring fewer parameters while achieving similar or better results than standard fine-tuning on large multilingual datasets.


Disentangled Motion Modeling for Video Frame Interpolation

http://arxiv.org/abs/2406.17256v1

Compressor summary: The paper proposes MoMo, a diffusion-based approach for video frame interpolation that enhances visual quality by focusing on intermediate motion modeling, using a novel diffusion U-Net architecture and achieving superior perceptual quality with reduced computational demands.


MPCODER: Multi-user Personalized Code Generator with Explicit and Implicit Style Representation Learning

http://arxiv.org/abs/2406.17255v1

Compressor summary: MPCoder uses explicit and implicit residual learning to generate personalized, multi-user code with a contrastive adapter and a novel evaluation metric.


Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation

http://arxiv.org/abs/2406.17254v1

Compressor summary: ScalpVision is an AI-driven system that uses innovative methods to segment hair and generate data for diagnosing scalp diseases and alopecia.


How Well Can Knowledge Edit Methods Edit Perplexing Knowledge?

http://arxiv.org/abs/2406.17253v1

Compressor summary: The study explores how different levels of "perplexingness" affect the ability to update large language models with new knowledge, and introduces a novel dataset to investigate this phenomenon.


TopoGCL: Topological Graph Contrastive Learning

http://arxiv.org/abs/2406.17251v1

Compressor summary: TopoGCL improves graph contrastive learning by incorporating latent topological (shape) properties of graphs at multiple resolutions, enhancing performance in unsupervised graph classification.


Unlocking Continual Learning Abilities in Language Models

http://arxiv.org/abs/2406.17245v1

Compressor summary: MIGU is a rehearsal-free, task-label-free method that improves language models' continual learning, which otherwise suffers from catastrophic forgetting, by updating only the parameters with large output magnitudes in linear layers; it is universal, effective, and integrates with existing continual learning approaches.


What Do the Circuits Mean? A Knowledge Edit View

http://arxiv.org/abs/2406.17241v1

Compressor summary: The authors propose a novel method to learn the meanings of circuit representations in GPT2-XL using knowledge editing and explore their properties, such as size, composition, and overlap with other datasets.


Expansive Synthesis: Generating Large-Scale Datasets from Minimal Samples

http://arxiv.org/abs/2406.17238v1

Compressor summary: The paper proposes an Expansive Synthesis model that generates large-scale, high-quality datasets from minimal samples by using expander graph mappings, feature interpolation, the Koopman operator, and optimal transport, achieving performance on par with classifiers trained on full-scale datasets.


LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing

http://arxiv.org/abs/2406.17236v1

Compressor summary: LIPE is a two-stage framework that learns a personalized identity prior for text-based non-rigid image editing, improving consistency and quality in editing results.


Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks

http://arxiv.org/abs/2406.17232v1

Compressor summary: The study explored how integrating human belief networks into large language models can improve their alignment with human opinions on related topics.


CogMG: Collaborative Augmentation Between Large Language Model and Knowledge Graph

http://arxiv.org/abs/2406.17231v1

Compressor summary: The CogMG framework uses knowledge graphs to improve question-answering by large language models, reducing hallucinations and increasing accuracy.


Large Language Models are Interpretable Learners

http://arxiv.org/abs/2406.17224v1

Compressor summary: The paper proposes using large language models and symbolic programs to create interpretable and expressive predictive models for classification tasks, and introduces IL-Bench, a collection of diverse tasks for evaluation.


Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction

http://arxiv.org/abs/2406.17219v1

Compressor summary: The paper proposes a new face anonymization method that distracts both intrinsic and extrinsic identity attention, allowing for flexible manipulation of appearance and geometry structure to protect privacy better than existing methods.


Machine Unlearning Fails to Remove Data Poisoning Attacks

http://arxiv.org/abs/2406.17216v1

Compressor summary: Existing machine unlearning methods fail to effectively remove the effects of data poisoning on deep learning models, highlighting the need for more rigorous evaluation metrics and provable guarantees.


Detecting Frames in News Headlines and Lead Images in U.S. Gun Violence Coverage

http://arxiv.org/abs/2406.17213v1

Compressor summary: The study combines article text and image features to identify news frames, finding that relevant images improve frame prediction and concreteness affects image usefulness.


Contrastive General Graph Matching with Adaptive Augmentation Sampling

http://arxiv.org/abs/2406.17199v1

Compressor summary: The text introduces a new method called Graph-centric Contrastive framework for Graph Matching (GCGM) that uses graph augmentations for contrastive learning without side information and outperforms existing self-supervised methods in pattern recognition tasks.


Geometric Median (GM) Matching for Robust Data Pruning

http://arxiv.org/abs/2406.17188v1

Compressor summary: The text proposes a new algorithm, Geometric Median Matching, for selecting informative subsets from large noisy datasets to train deep learning models efficiently and robustly.
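
For intuition, a sketch pairing Weiszfeld's algorithm for the geometric median with a greedy selection whose running mean tracks that median (a plain reading of "GM Matching"; the paper's exact selection rule may differ):

```python
import numpy as np

def geometric_median(X, iters=100, eps=1e-8):
    """Weiszfeld's algorithm: unlike the mean, the geometric median is
    robust to outliers and label noise in the embedding cloud X."""
    mu = X.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(X - mu, axis=1) + eps
        mu = (X / d[:, None]).sum(axis=0) / (1.0 / d).sum()
    return mu

def gm_match(X, k):
    """Greedily pick k points so the subset mean stays close to the
    geometric median of the full (possibly noisy) dataset."""
    target = geometric_median(X)
    chosen, total = [], np.zeros(X.shape[1])
    avail = list(range(len(X)))
    for t in range(1, k + 1):
        errs = np.linalg.norm((total + X[avail]) / t - target, axis=1)
        j = avail.pop(int(np.argmin(errs)))
        chosen.append(j)
        total += X[j]
    return chosen
```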