arxiv compressed, 2024-02-15

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-15, generated by the compressor, my personal LLM-based project.


AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability

http://arxiv.org/abs/2402.09404v1

Compressor summary: AQA-Bench is a benchmark to test large language models' ability to reason sequentially in algorithmic contexts, revealing insights on closed-source vs open-source models and other factors affecting performance.


Reinforcement Learning from Human Feedback with Active Queries

http://arxiv.org/abs/2402.09401v1

Compressor summary: The paper proposes a query-efficient reinforcement learning method for aligning large language models with human preferences by formalizing the problem as a contextual dueling bandit problem and designing an active-query-based proximal policy optimization algorithm.


Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

http://arxiv.org/abs/2402.09398v1

Compressor summary: The paper proposes LESS, a cache method for large language models that balances memory efficiency and information retention by integrating a constant-sized cache with eviction-based methods.


Long-form evaluation of model editing

http://arxiv.org/abs/2402.09394v1

Compressor summary: The paper introduces LEME, a protocol to evaluate model editing techniques for long-form text generation, which reveals different performance aspects than short-form metrics and identifies common failure modes in long-form settings.


LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset

http://arxiv.org/abs/2402.09391v1

Compressor summary: The paper introduces SMolInstruct, a large dataset for instruction tuning, and shows that LLMs fine-tuned on it (LlaSMol) achieve strong results on various chemistry tasks, outperforming GPT-4.


HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation

http://arxiv.org/abs/2402.09390v1

Compressor summary: The hierarchical graph of thoughts (HGOT) is a multi-layered graph approach that improves the factuality and quality of language models by enhancing the retrieval of relevant passages during in-context learning.


Entropy-regularized Point-based Value Iteration

http://arxiv.org/abs/2402.09388v1

Compressor summary: The paper proposes an entropy-regularized model-based planner for partially observable problems that improves robustness and objective inference by reducing policy commitment to a single action.
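A minimal sketch of the entropy-regularized ("soft") Bellman backup behind this idea. For brevity it assumes a small fully observable MDP rather than the point-based POMDP setting of the paper. The hard max over actions becomes a temperature-scaled log-sum-exp, so the resulting policy spreads probability over actions instead of committing to one. The names `P`, `R`, `tau`, and `soft_value_iteration` are illustrative.

```python
import math

def soft_value_iteration(P, R, gamma=0.9, tau=0.5, iters=500):
    """Entropy-regularized value iteration on a small finite MDP.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    immediate reward. tau is the entropy temperature; as tau -> 0 this
    approaches ordinary (hard-max) value iteration.
    """
    n = len(P)
    V = [0.0] * n
    Q = [[0.0] * len(P[s]) for s in range(n)]
    for _ in range(iters):
        # Standard Bellman expectation for each state-action pair.
        Q = [[R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
              for a in range(len(P[s]))] for s in range(n)]
        # Soft backup: tau * log(sum(exp(Q / tau))), computed stably.
        V = []
        for q in Q:
            m = max(q)
            V.append(m + tau * math.log(sum(math.exp((x - m) / tau) for x in q)))
    # Softmax policy pi(a|s) = exp((Q(s,a) - V(s)) / tau); each row sums to 1.
    policy = [[math.exp((x - V[s]) / tau) for x in Q[s]] for s in range(n)]
    return V, policy


# One state, two self-looping actions with rewards 1 and 0.
V, pi = soft_value_iteration(P=[[[(1.0, 0)], [(1.0, 0)]]], R=[[1.0, 0.0]])
print(pi[0])  # about [0.881, 0.119]: the better action is favoured, not chosen outright
```

Lowering `tau` pushes the policy toward the greedy choice; raising it makes the planner less committed to a single action.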


GraSSRep: Graph-Based Self-Supervised Learning for Repeat Detection in Metagenomic Assembly

http://arxiv.org/abs/2402.09381v1

Compressor summary: GraSSRep is a novel method that uses graph neural networks to classify DNA sequences as repetitive or non-repetitive, improving repeat detection accuracy in metagenomic data.


Loss Shaping Constraints for Long-Term Time Series Forecasting

http://arxiv.org/abs/2402.09373v1

Compressor summary: The paper proposes a constrained learning method for long-term time series forecasting that minimizes both the average error and the maximum error, using a primal-dual algorithm.
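To illustrate what a primal-dual algorithm for constrained learning does, here is a sketch of gradient descent-ascent on a Lagrangian for a one-dimensional toy problem. The objective, constraint, and step size are made up for illustration and are not the paper's forecasting losses. The dual variable `lam` grows while the constraint is violated and pushes the primal variable back into the feasible region.

```python
def primal_dual(steps=2000, lr=0.05):
    """Gradient descent-ascent on the Lagrangian L(x, lam) = f(x) + lam * g(x).

    Toy problem (illustrative only): minimize f(x) = (x - 3)**2
    subject to g(x) = x - 1 <= 0. The solution is x = 1 with multiplier lam = 4.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: descend on L with respect to x.
        x -= lr * (2.0 * (x - 3.0) + lam)
        # Dual step: ascend on L with respect to lam, projected onto lam >= 0.
        lam = max(0.0, lam + lr * (x - 1.0))
    return x, lam


x, lam = primal_dual()
print(round(x, 3), round(lam, 3))  # 1.0 4.0
```

In a forecasting setting, one would expect one such multiplier per constraint, with the model parameters taking the place of `x`.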


Transformers Can Achieve Length Generalization But Not Robustly

http://arxiv.org/abs/2402.09371v1

Compressor summary: The paper investigates how data format and position encoding affect length generalization in Transformers on the task of adding two integers, achieving 2.5x length extrapolation, but finds that this generalization is fragile and sensitive to random seeds and other factors.
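One data-format choice used in this line of work is to write numbers least-significant digit first. A left-to-right model then emits each output digit after it has seen the digits that determine its carry. The sketch below only formats training strings this way. The exact format and position encoding used in the paper are not reproduced, and the function name is illustrative.

```python
def format_addition(a: int, b: int, reverse: bool = True) -> str:
    """Render 'a+b=sum' as a training string, optionally with every number reversed."""
    def fmt(n: int) -> str:
        s = str(n)
        return s[::-1] if reverse else s
    return f"{fmt(a)}+{fmt(b)}={fmt(a + b)}"


print(format_addition(57, 68))                 # 75+86=521
print(format_addition(57, 68, reverse=False))  # 57+68=125
```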


Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking

http://arxiv.org/abs/2402.09369v1

Compressor summary: The paper introduces CultureAtlas, a dataset for acquiring multicultural knowledge from Wikipedia links to improve language models' cross-cultural communication and understanding.


Magic-Me: Identity-Specific Video Customized Diffusion

http://arxiv.org/abs/2402.09368v1

Compressor summary: VCD is a simple framework for generating videos with controlled subject identities using various modules to disentangle and enhance identity information.


Prediction of Activated Sludge Settling Characteristics from Microscopy Images with Deep Convolutional Neural Networks and Transfer Learning

http://arxiv.org/abs/2402.09367v1

Compressor summary: Key points:
- Microbial communities affect wastewater treatment processes and settling characteristics
- Computer vision-based approach using deep CNN models to assess sludge settling based on microscopy images
- Transfer learning, data augmentation, and various CNN architectures tested for performance
- Approach provides less labour-intensive, objective, and consistent assessments
Summary: The study presents a computer vision-based approach using deep CNN models to predict sludge settling characteristics in wastewater treatment plants based on microscopy images. The approach uses transfer learning, data augmentation, and different CNN architectures to overcome limitations of existing techniques and provide objective and consistent assessments.


Copyright Traps for Large Language Models

http://arxiv.org/abs/2402.09363v1

Compressor summary: The paper proposes using copyright traps made of fictitious entries in original content to detect the use of copyrighted materials in Large Language Models that do not naturally memorize.


HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference

http://arxiv.org/abs/2402.09360v1

Compressor summary: HiRE is a method that combines compression and efficient multi-device operations to speed up autoregressive decoding with sparse large language models on accelerators.
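A sketch of the generic two-stage pattern for approximate top-k with high recall. First, all items are ranked by a cheap approximate score. An oversized shortlist of `k_prime > k` items is kept, and only those are ranked by exact score. How HiRE compresses the model to obtain approximate scores is not reproduced. The score lists are made-up numbers, and the exact scores are passed in full only to keep the demo short; in practice they would be computed for the shortlist only.

```python
import heapq
from typing import List, Sequence

def two_stage_topk(approx: Sequence[float], exact: Sequence[float],
                   k: int, k_prime: int) -> List[int]:
    """Approximate top-k: shortlist k_prime >= k items by a cheap score,
    then pick the final k from the shortlist by exact score."""
    shortlist = heapq.nlargest(k_prime, range(len(approx)), key=lambda i: approx[i])
    return heapq.nlargest(k, shortlist, key=lambda i: exact[i])

def recall(found: List[int], true_topk: List[int]) -> float:
    return len(set(found) & set(true_topk)) / len(true_topk)


exact = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
approx = [0.85, 0.15, 0.3, 0.35, 0.75, 0.4]            # noisy stand-in for cheap scores
truth = two_stage_topk(exact, exact, k=3, k_prime=3)   # [0, 2, 4]
print(recall(two_stage_topk(approx, exact, 3, 3), truth))  # 0.666...: item 2 was missed
print(recall(two_stage_topk(approx, exact, 3, 5), truth))  # 1.0 with a larger shortlist
```

A larger `k_prime` trades a little extra exact computation for higher recall.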


Integrating ChatGPT into Secure Hospital Networks: A Case Study on Improving Radiology Report Analysis

http://arxiv.org/abs/2402.09358v1

Compressor summary: The study shows how to use a cloud-based AI similar to ChatGPT to analyze radiology reports in hospitals, keeping data private and improving accuracy, reliability, and interpretability.


DoRA: Weight-Decomposed Low-Rank Adaptation

http://arxiv.org/abs/2402.09353v1

Compressor summary: DoRA is a novel method that combines weight decomposition and LoRA to improve fine-tuning performance while avoiding extra inference costs.
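A minimal sketch of the weight decomposition as we understand it from the DoRA paper. The adapted weight is a per-column magnitude `m` times the column-normalized direction `W0 + B @ A`, where `B @ A` is an ordinary LoRA update. The column-wise norm convention, the shapes, and all names here are assumptions for illustration. Plain Python lists are used to keep the sketch dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def column_norms(W: Matrix) -> List[float]:
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(len(W[0]))]

def dora_weight(W0: Matrix, B: Matrix, A: Matrix, m: List[float]) -> Matrix:
    """Merged weight m * (W0 + B @ A) / ||W0 + B @ A||_c (column-wise norm).

    W0 (d x k) is frozen, B (d x r) and A (r x k) are the LoRA factors that
    adapt the direction, and m (length k) is a trainable magnitude per column.
    """
    BA = matmul(B, A)
    V = [[w + u for w, u in zip(rw, ru)] for rw, ru in zip(W0, BA)]
    norms = column_norms(V)
    return [[m[j] * V[i][j] / norms[j] for j in range(len(V[0]))] for i in range(len(V))]


W0 = [[3.0, 0.0], [4.0, 2.0]]
m = column_norms(W0)                 # initialise magnitudes from W0: [5.0, 2.0]
B, A = [[0.0], [0.0]], [[1.0, 1.0]]  # rank-1 LoRA factors, B starts at zero
print(dora_weight(W0, B, A, m))      # [[3.0, 0.0], [4.0, 2.0]]: equals W0 at initialisation
```

Because the result is an ordinary d x k matrix, it can be merged before deployment, which is consistent with the summary's point about avoiding extra inference costs.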


Developing a Framework for Auditing Large Language Models Using Human-in-the-Loop

http://arxiv.org/abs/2402.09346v1

Compressor summary: The paper proposes an automatic method to create probes to audit large language models using different versions of the same question and human verification, increasing transparency and scientific rigor.


Mitigating Reward Hacking via Information-Theoretic Reward Modeling

http://arxiv.org/abs/2402.09345v1

Compressor summary: The paper proposes InfoRM, an information theoretic-based framework for reward modeling in reinforcement learning from human feedback, which can detect and mitigate reward overoptimization by using a variational information bottleneck objective and a cluster deviation score.


Generating Diverse Translation with Perturbed kNN-MT

http://arxiv.org/abs/2402.09344v1

Compressor summary: The paper introduces methods to generate more diverse translations using perturbed k-nearest neighbor machine translation, addressing the overcorrection problem in previous methods.


AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach

http://arxiv.org/abs/2402.09334v1

Compressor summary: AuditLLM is a tool that evaluates the safety, consistency, and reliability of Large Language Models by probing them with multiple variations of a single question, helping to identify potential issues such as bias or hallucinations.


YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection

http://arxiv.org/abs/2402.09329v1

Compressor summary: The YOLOv8-AM model incorporating attention mechanisms improves fracture detection performance and achieves state-of-the-art results in computer-assisted diagnosis of wrist trauma.


Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

http://arxiv.org/abs/2402.09327v1

Compressor summary: This paper studies how memorizing training data affects learning performance in stochastic convex optimization and shows a tradeoff between accuracy and memorization measured by conditional mutual information.


Stability and Multigroup Fairness in Ranking with Uncertain Predictions

http://arxiv.org/abs/2402.09326v1

Compressor summary: The paper studies ranking functions that incorporate the uncertainty of classifier predictions, focusing on their stability and multigroup fairness properties.


PC-NeRF: Parent-Child Neural Radiance Fields Using Sparse LiDAR Frames in Autonomous Driving Environments

http://arxiv.org/abs/2402.09325v1

Compressor summary: PC-NeRF is a novel framework for 3D scene reconstruction and view synthesis using sparse LiDAR frames, which divides the scene into different levels of representation and leverages hierarchical spatial partitioning.


ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

http://arxiv.org/abs/2402.09320v1

Compressor summary: ICDPO is a novel approach that improves LLMs' ability to generate safe content by borrowing human preference alignment (HPA) capabilities from superior LLMs through in-context learning, without the need for fine-tuning.


Only My Model On My Data: A Privacy Preserving Approach Protecting one Model and Deceiving Unauthorized Black-Box Models

http://arxiv.org/abs/2402.09316v1

Compressor summary: The study presents a method to protect image data privacy by generating human-perceivable images that can be accurately classified by authorized models while confusing unauthorized ones.


Few-Shot Object Detection with Sparse Context Transformers

http://arxiv.org/abs/2402.09315v1

Compressor summary: The paper proposes a sparse context transformer (SCT) for few-shot object detection, which leverages source domain knowledge and learns sparse context from few target domain images to reduce class confusion and enhance detector performance.


Embracing the black box: Heading towards foundation models for causal discovery from time series data

http://arxiv.org/abs/2402.09305v1

Compressor summary: Causal Pretraining is a method to learn causal graphs from time series data in a supervised way, which can improve performance with more data and larger models.


Immediate generalisation in humans but a generalisation lag in deep neural networks - evidence for representational divergence?

http://arxiv.org/abs/2402.09303v1

Compressor summary: This study compares how humans and deep neural networks learn image classification and finds significant differences in their representational changes during the learning process.


Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning

http://arxiv.org/abs/2402.09290v1

Compressor summary: PSRL is a reinforcement learning framework that fuses supervised and unsupervised learning to improve policy interpretability and performance.


EcoVal: An Efficient Data Valuation Framework for Machine Learning

http://arxiv.org/abs/2402.09288v1

Compressor summary: The paper introduces EcoVal, a fast and practical framework to estimate the value of clusters of similar data points for machine learning models, using intrinsic and extrinsic values and a production function concept.


Nutrition Facts, Drug Facts, and Model Facts: Putting AI Ethics into Practice in Gun Violence Research

http://arxiv.org/abs/2402.09286v1

Compressor summary: The study proposes a Model Facts template to increase AI trust and transparency in firearm injury research, allowing users to assess the validity and biases of models without technical knowledge.


Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

http://arxiv.org/abs/2402.09283v1

Compressor summary: This paper surveys attacks, defenses, and evaluations in research on making large language models safe for conversational applications and preventing harmful responses.


Leveraging Large Language Models for Enhanced NLP Task Performance through Knowledge Distillation and Optimized Training Strategies

http://arxiv.org/abs/2402.09282v1

Compressor summary: This paper shows that distilling knowledge from GPT-4 with Chain-of-Thought prompting and combining it with human annotations improves BERT's NER performance while reducing costs.


Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification

http://arxiv.org/abs/2402.09281v1

Compressor summary: The paper proposes a novel method that combines covariance and Hessian matrices to improve binary classification by maximizing between-class distance and minimizing within-class variance in a two-dimensional space.


Hybrid Machine Learning techniques in the management of harmful algal blooms impact

http://arxiv.org/abs/2402.09271v1

Compressor summary: BAGNET is a hybrid machine learning model that accurately predicts the toxicity levels of mollusc meat due to harmful algal blooms, helping to control shellfish production areas effectively.


Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement

http://arxiv.org/abs/2402.09270v1

Compressor summary: Window-based event denoising is a method that improves interpretability and real-time processing for deep learning-based noise removal in complex scenes, using temporal and spatial analysis to filter out irrelevant events.


Transformers, parallel computation, and logarithmic depth

http://arxiv.org/abs/2402.09268v1

Compressor summary: Transformers can efficiently simulate, and be simulated by, communication rounds of massively parallel computation, which lets them solve basic computational tasks with logarithmic depth more efficiently than other neural sequence models.


Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

http://arxiv.org/abs/2402.09267v1

Compressor summary: The paper proposes a method to reduce factual errors in large language models by using their own self-evaluation abilities for training and fine-tuning.


Machine Learning in management of precautionary closures caused by lipophilic biotoxins

http://arxiv.org/abs/2402.09266v1

Compressor summary: The paper proposes a kNN-based predictive model to support decisions on precautionary closures of mussel farms caused by harmful algal blooms, achieving high accuracy and sensitivity.
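The kNN algorithm the summary names is simple enough to show in full. This sketch classifies a new observation by majority vote among its k nearest training points. The two features and the "open"/"close" labels are hypothetical, made-up values, not the variables or data used in the paper.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical, made-up features: (toxic phytoplankton count, water temperature), scaled to [0, 1].
train_X = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.9, 0.8), (0.8, 0.9), (1.0, 0.7)]
train_y = ["open", "open", "open", "close", "close", "close"]
print(knn_predict(train_X, train_y, (0.85, 0.75)))  # close
```

Features should be on comparable scales before using Euclidean distance, which is why the toy values above are already scaled.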


UR2M: Uncertainty and Resource-Aware Event Detection on Microcontrollers

http://arxiv.org/abs/2402.09264v1

Compressor summary: UR2M is a framework for accurate event detection and reliable uncertainty estimation on resource-constrained microcontrollers in wearable devices.


MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models

http://arxiv.org/abs/2402.09262v1

Compressor summary: MultiMedEval is an open-source toolkit for evaluating medical vision-language models on various tasks and datasets.


SyntaxShap: Syntax-aware Explainability Method for Text Generation

http://arxiv.org/abs/2402.09259v1

Compressor summary: SyntaxShap is a new method to explain text generation by considering syntactic dependencies in the data.
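SyntaxShap builds on Shapley values. As a point of reference, here is a sketch of plain exact Shapley values for a tiny cooperative game, computed by enumerating coalitions. It does not include the syntactic constraints that distinguish SyntaxShap. The value function `v` is a made-up toy, not a language model.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: each player's marginal contribution v(S + p) - v(S),
    averaged over coalitions S with weight |S|! (n - |S| - 1)! / n!."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                S = frozenset(S)
                total += weight * (v(S | {p}) - v(S))
        phi[p] = total
    return phi


def v(S):
    # Toy game: "a" and "b" each add 1; "a" and "c" together add a bonus of 2.
    return ("a" in S) + ("b" in S) + (2.0 if {"a", "c"} <= S else 0.0)

phi = shapley_values(["a", "b", "c"], v)
print({p: round(x, 6) for p, x in phi.items()})  # {'a': 2.0, 'b': 1.0, 'c': 1.0}
```

The joint bonus of "a" and "c" is split equally between them, and the values sum to `v` of the full set. Exact enumeration grows exponentially with the number of players, which is why practical explainers restrict or sample coalitions.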


TDViT: Temporal Dilated Video Transformer for Dense Video Tasks

http://arxiv.org/abs/2402.09257v1

Compressor summary: The Temporal Dilated Video Transformer (TDViT) is a model that efficiently extracts spatiotemporal representations for dense video tasks, such as object detection and instance segmentation, by using temporal dilated transformer blocks and hierarchical structures.


Exploring the Relationship: Transformative Adaptive Activation Functions in Comparison to Other Activation Functions

http://arxiv.org/abs/2402.09249v1

Compressor summary: The paper introduces a new adaptive activation function for neural networks that can transform and generalize existing activation functions, improving their performance and versatility.


Synthesizing Knowledge-enhanced Features for Real-world Zero-shot Food Detection

http://arxiv.org/abs/2402.09242v1

Compressor summary: Zero-Shot Food Detection (ZSFD) is tackled by a new framework, ZSFDet, which uses multi-source graphs to model the correlation between food categories and attributes and improves performance on FOWA and UECFOOD-256 datasets.


Efficient One-stage Video Object Detection by Exploiting Temporal Consistency

http://arxiv.org/abs/2402.09241v1

Compressor summary: The paper proposes a framework to improve one-stage video object detection by exploiting temporal consistency, reducing computational costs, and increasing efficiency.


Switch EMA: A Free Lunch for Better Flatness and Sharpness

http://arxiv.org/abs/2402.09240v1

Compressor summary: Switch EMA (SEMA) is a simple modification to the Exponential Moving Average regularization method in deep neural networks, which improves generalization and convergence without extra costs.
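A sketch of exponential moving averaging of weights combined with an epoch-end switch. The switch is our reading of SEMA: the averaged weights are copied back into the trained model at the end of each epoch. Treat that step, the toy quadratic loss, and all names here as illustrative assumptions rather than the paper's code.

```python
from typing import Dict

def ema_update(ema: Dict[str, float], params: Dict[str, float], decay: float) -> None:
    """In-place EMA: ema <- decay * ema + (1 - decay) * params."""
    for name, p in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * p

def train(epochs=5, steps_per_epoch=20, lr=0.1, decay=0.9):
    params = {"w": 0.0}
    ema = dict(params)
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            params["w"] -= lr * 2.0 * (params["w"] - 2.0)  # SGD step on the toy loss (w - 2)^2
            ema_update(ema, params, decay)
        params.update(ema)  # assumed "switch": live weights restart from the EMA weights
    return params, ema


params, ema = train()
print(params == ema, round(params["w"], 2))  # True 2.0
```

Without the `params.update(ema)` line, this reduces to ordinary weight EMA, where the averaged copy is only used for evaluation.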


Robust Training of Temporal GNNs using Nearest Neighbours based Hard Negatives

http://arxiv.org/abs/2402.09239v1

Compressor summary: The paper proposes a new method to train temporal graph neural networks (TGNNs) by using importance-based negative sampling instead of uniform random sampling, which improves their performance in future-link prediction tasks.
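To make the contrast with uniform sampling concrete, this sketch picks negatives as the candidate nodes nearest to a query embedding, excluding the true destination. It shows uniform sampling alongside for comparison. The embeddings, the use of cosine similarity, and the function names are assumptions. Which representations the paper searches over, and how it weights the sampled negatives, are not reproduced.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def hard_negatives(query, candidates, positive, k):
    """The k candidate nodes most similar to the query embedding,
    excluding the true destination node."""
    scored = [(cosine(query, emb), node) for node, emb in candidates.items() if node != positive]
    scored.sort(reverse=True)
    return [node for _, node in scored[:k]]

def uniform_negatives(candidates, positive, k, rng):
    """Baseline: k negatives drawn uniformly at random."""
    return rng.sample([n for n in candidates if n != positive], k)


candidates = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
print(hard_negatives([1.0, 0.0], candidates, positive="a", k=2))  # ['b', 'c']
print(uniform_negatives(candidates, "a", 2, random.Random(0)))
```

Node "b" looks almost like the true destination "a", so it is exactly the kind of confusing negative that uniform sampling would pick only by chance.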


Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

http://arxiv.org/abs/2402.09237v1

Compressor summary: The paper improves image retrieval for visual localization by generating synthetic variants of training images and using a tailored training approach.


Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

http://arxiv.org/abs/2402.09236v1

Compressor summary: The paper proposes a method to learn interpretable concepts from data using ideas from causal representation learning and foundation models, and demonstrates its effectiveness on synthetic and natural language data.


Multi-Hierarchical Surrogate Learning for Structural Dynamics of Automotive Crashworthiness Using Graph Convolutional Neural Networks

http://arxiv.org/abs/2402.09234v1

Compressor summary: Key points:
- Crash simulations are important but computationally expensive
- A multi-hierarchical framework for creating surrogate models at different resolutions is proposed
- Surrogates learn low-dimensional latent dynamics and transfer learning is used to pass information between levels
Summary: The authors propose a method to create surrogate models of crash simulations at different resolutions, using graph convolutional networks and transfer learning to capture macroscale and microscale features efficiently.


Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks

http://arxiv.org/abs/2402.09226v1

Compressor summary: The paper studies how two-homogeneous neural networks with small initializations converge in direction to KKT points of a correlation function using gradient flow dynamics and saddle-to-saddle dynamics.


Is my Data in your AI Model? Membership Inference Test with Application to Face Images

http://arxiv.org/abs/2402.09225v1

Compressor summary: The paper presents MINT, a method to determine if an AI model was trained with specific data, using two novel architectures based on MLP and CNNs, and evaluates them on face recognition tasks with six databases.


Spectral Filters, Dark Signals, and Attention Sinks

http://arxiv.org/abs/2402.09221v1

Compressor summary: The paper introduces a method to analyze and control attention mechanisms in transformer-based LLMs using spectral filters on intermediate representations.


Scaling the Authoring of AutoTutors with Large Language Models

http://arxiv.org/abs/2402.09216v1

Compressor summary: The paper explores using Large Language Models (LLMs) to create Intelligent Tutoring Systems with handcrafted pedagogical designs, and presents MWPTutor, a sample system that outperforms GPT-4 in human evaluation on math word problems.


DivaTrack: Diverse Bodies and Motions from Acceleration-Enhanced Three-Point Trackers

http://arxiv.org/abs/2402.09211v1

Compressor summary: DivaTrack is a deep learning framework that improves full-body pose estimation in digital reality by using linear accelerations from IMUs and blending predictions in two reference frames, outperforming existing methods on diverse body sizes and activities.


Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

http://arxiv.org/abs/2402.09205v1

Compressor summary: IN3 is a benchmark for inspecting users' implicit intentions in language model-driven agents, while Mistral-Interact is a powerful model that uses IN3 to improve user-agent interaction by refining vague tasks into actionable goals.


Domain-adaptive and Subgroup-specific Cascaded Temperature Regression for Out-of-distribution Calibration

http://arxiv.org/abs/2402.09204v1

Compressor summary: The paper proposes a new post-hoc calibration method for deep neural networks that adapts to different test sets by using data augmentation and simulating domain shifts.


Better-than-KL PAC-Bayes Bounds

http://arxiv.org/abs/2402.09201v1

Compressor summary: The paper proposes a novel better-than-KL divergence for PAC-Bayes concentration inequalities and shows it can achieve strictly tighter bounds for estimating the mean of random sequences.


Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling

http://arxiv.org/abs/2402.09199v1

Compressor summary: The paper proposes POGER, a method that improves AI-generated text (AIGT) detection in black-box settings by estimating word generation probabilities through multiple re-sampling.


Implementing local-explainability in Gradient Boosting Trees: Feature Contribution

http://arxiv.org/abs/2402.09197v1

Compressor summary: Key points:
- GBDT is a black-box model based on tree ensembles
- A feature contribution method for GBDT is developed using node residues
- The method makes it possible to calculate node decisions and explain GBDT's behavior
- The method is useful for ethical analysis of AI and compliance with GDPR
Summary: The paper proposes a feature contribution method for GBDT that uses node residues to calculate node decisions and explain the model's behavior, which can help with ethical and legal issues in AI.
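A sketch of the path-based idea behind per-feature contributions in a single regression tree. The input is followed from root to leaf, and each split's change in node value is credited to the feature it splits on. Per-node values stand in here for the node residues the summary mentions. Whether the paper's formulation matches this decomposition exactly is an assumption, and the tree is hand-built for illustration.

```python
def path_contributions(tree, x):
    """Walk one regression tree and credit each split's change in node value
    to the feature it splits on. Returns (root value, contributions, prediction)."""
    node = tree
    contrib = {}
    while "feature" in node:
        f = node["feature"]
        child = node["left"] if x[f] <= node["threshold"] else node["right"]
        contrib[f] = contrib.get(f, 0.0) + child["value"] - node["value"]
        node = child
    return tree["value"], contrib, node["value"]


# Hand-built tree; "value" is the mean training target (or residual) in that node.
tree = {"value": 10.0, "feature": "f0", "threshold": 5.0,
        "left": {"value": 6.0, "feature": "f1", "threshold": 1.0,
                 "left": {"value": 4.0}, "right": {"value": 8.0}},
        "right": {"value": 14.0}}
bias, contrib, pred = path_contributions(tree, {"f0": 3.0, "f1": 2.0})
print(bias, contrib, pred)  # 10.0 {'f0': -4.0, 'f1': 2.0} 8.0
```

The root value plus the contributions always equals the leaf prediction. Because a boosted ensemble's output is a sum of scaled tree outputs, per-tree contributions can be scaled by the learning rate and summed across trees.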


(Ir)rationality and Cognitive Biases in Large Language Models

http://arxiv.org/abs/2402.09193v1

Compressor summary: The paper evaluates language models' rational reasoning abilities using cognitive tasks and finds that they display irrationality similar to humans but with different biases and inconsistent responses.


Generalized Portrait Quality Assessment

http://arxiv.org/abs/2402.09178v1

Compressor summary: FHIQA is a learning-based method for assessing portrait quality in images that uses image semantics to improve precision and generalize to various scenes, as shown by experiments on the PIQ23 benchmark.


Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks

http://arxiv.org/abs/2402.09177v1

Compressor summary: The authors propose a new multi-round jailbreaking attack on large language models that uses preliminary question-answer pairs to steer the model's response towards revealing harmful information.


Nearly Optimal Regret for Decentralized Online Convex Optimization

http://arxiv.org/abs/2402.09173v1

Compressor summary: The paper proposes new decentralized online convex optimization algorithms with improved regret bounds by using an online accelerated gossip strategy and exploiting network topology spectral properties.
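Gossip-based decentralized methods share a basic building block: a mixing step in which each node replaces its value with a weighted average of its neighbours' values, using a doubly stochastic matrix. The sketch shows this plain, non-accelerated step on a made-up 3-node network. The paper's accelerated gossip strategy and its regret analysis are not reproduced.

```python
def gossip_step(values, W):
    """One round of gossip: node i takes the W[i]-weighted average of all values."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, values)) for row in W]


# Doubly stochastic mixing matrix for a fully connected 3-node network (rows and columns sum to 1).
W = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
values = [3.0, 0.0, 0.0]
for _ in range(50):
    values = gossip_step(values, W)
print([round(v, 6) for v in values])  # [1.0, 1.0, 1.0]: every node reaches the average
```

How fast disagreement shrinks depends on the second-largest eigenvalue of `W`, which is the kind of spectral property of the network topology the summary refers to.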


Evolving Restricted Boltzmann Machine-Kohonen Network for Online Clustering

http://arxiv.org/abs/2402.09167v1

Compressor summary: The paper introduces ERBM-KNet, a novel online clustering algorithm that combines an evolving restricted Boltzmann machine with a Kohonen network, achieving improved performance and handling streaming data efficiently.


Deinterleaving of Discrete Renewal Process Mixtures with Application to Electronic Support Measures

http://arxiv.org/abs/2402.09166v1

Compressor summary: Key points:
- New deinterleaving method for mixtures of discrete renewal Markov chains
- Method maximizes a penalized likelihood score
- Theoretical analysis proves the method's accuracy under mild conditions
- Experiments on synthetic data validate the theory
- Applied to pulse trains in RESM context and performs well
Summary: The paper presents a new deinterleaving method that uses a penalized likelihood score to separate symbols from discrete renewal Markov chains, and shows its theoretical and practical advantages.


Unifying Invariance and Spuriousity for Graph Out-of-Distribution via Probability of Necessity and Sufficiency

http://arxiv.org/abs/2402.09165v1

Compressor summary: The PNSIS framework uses probability theory to extract invariant subgraphs and improve generalization for graph Out-of-Distribution tasks, outperforming existing methods.


Less is More: Fewer Interpretable Region via Submodular Subset Selection

http://arxiv.org/abs/2402.09164v1

Compressor summary: The paper proposes a new image attribution method using submodular subset selection to improve model interpretability and accuracy for various samples, outperforming existing methods on three datasets.
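The summary relies on submodular subset selection. The standard way to approximately maximize a monotone submodular set function under a size budget is the greedy algorithm, which carries a (1 - 1/e) approximation guarantee. The sketch uses a toy coverage function over made-up image regions, not the paper's attribution score.

```python
def greedy_select(ground, f, budget):
    """Repeatedly add the element with the largest marginal gain f(S + e) - f(S)."""
    selected = []
    for _ in range(budget):
        candidates = [e for e in ground if e not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda e: f(selected + [e]) - f(selected))
        selected.append(best)
    return selected


# Toy coverage function: each hypothetical region covers a set of "evidence" units.
regions = {"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {4, 5, 6, 7}, "r4": {1, 2}}

def coverage(S):
    return len(set().union(*(regions[r] for r in S)))

print(greedy_select(list(regions), coverage, budget=2))  # ['r3', 'r1']
```

Coverage has diminishing returns: after "r3" is chosen, "r2" adds only one new unit, so greedy moves on to "r1".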


Role-Playing Simulation Games using ChatGPT

http://arxiv.org/abs/2402.09161v1

Compressor summary: The article shows how Large Language Models and ChatGPT can improve teaching quality and student engagement through role-playing simulations during the digital transformation of education.


Attacking Large Language Models with Projected Gradient Descent

http://arxiv.org/abs/2402.09154v1

Compressor summary: The authors' method, based on projected gradient descent, creates adversarial prompts against large language models faster and more efficiently while maintaining their effectiveness.


Improved Regret for Bandit Convex Optimization with Delayed Feedback

http://arxiv.org/abs/2402.09152v1

Compressor summary: The paper proposes a new algorithm for bandit convex optimization with delayed feedback that improves the regret bound for various scenarios, including strong convexity and unconstrained action sets.


Chinese MentalBERT: Domain-Adaptive Pre-training on Social Media for Chinese Mental Health Text Analysis

http://arxiv.org/abs/2402.09151v1

Compressor summary: The authors created a specialized pre-trained language model for psychological text analysis using a large dataset from Chinese social media platforms and integrated psychological lexicons into the training process.


Into the Unknown: Self-Learning Large Language Models

http://arxiv.org/abs/2402.09147v1

Compressor summary: The paper proposes a framework for large language models (LLMs) to learn unknown knowledge from their own hallucinations, using a score called Points in The Unknown (PiUs), and shows that finetuned or aligned 7B-Mistral models can self-learn effectively.


ResQuNNs:Towards Enabling Deep Learning in Quantum Convolution Neural Networks

http://arxiv.org/abs/2402.09146v1

Compressor summary: The paper proposes Residual Quanvolutional Neural Networks (ResQuNNs) that enable training of quanvolutional layers, improving the performance of quantum deep learning by addressing gradient-based optimization challenges.


When Representations Align: Universality in Representation Learning Dynamics

http://arxiv.org/abs/2402.09142v1

Compressor summary: The study proposes an effective theory of representation learning in deep neural networks that shows how different architectures learn similar representations when they are flexible enough and highlights behaviors that are conserved across various models.


Advancing NLP Models with Strategic Text Augmentation: A Comprehensive Study of Augmentation Methods and Curriculum Strategies

http://arxiv.org/abs/2402.09141v1

Compressor summary: The study evaluates text augmentation techniques and their effects on NLP tasks, proposing Modified Cyclical Curriculum Learning (MCCL) as a novel approach to improve performance.


DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

http://arxiv.org/abs/2402.09136v1

Compressor summary: DolphCoder is a diverse instruction model for code generation that learns from various instruction targets and self-evaluates its code quality, achieving superior performance on benchmarks.


Exploring the Adversarial Capabilities of Large Language Models

http://arxiv.org/abs/2402.09132v1

Compressor summary: The paper discusses the security risks of large language models, which can generate adversarial examples that fool hate speech detection systems and other safety measures.


Measuring Exploration in Reinforcement Learning via Optimal Transport in Policy Space

http://arxiv.org/abs/2402.09113v1

Compressor summary: The paper introduces a new measure called Exploration Index, which quantifies how much an RL algorithm explores compared to supervised learning, by measuring the distance travelled in the data distribution space using optimal transport metrics.
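As an illustration of measuring "distance travelled" in distribution space with optimal transport, this sketch sums Wasserstein-1 distances between consecutive 1-D data distributions. It uses the closed form for equal-size 1-D empirical samples: the mean absolute difference after sorting. The exact definition of the Exploration Index, including how it compares against supervised learning, is not reproduced, and the snapshots are made-up.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D empirical samples:
    the mean absolute difference after sorting both."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def distance_travelled(snapshots):
    """Sum of W1 distances between consecutive data distributions."""
    return sum(wasserstein_1d(p, q) for p, q in zip(snapshots, snapshots[1:]))


# Hypothetical 1-D state samples collected at three stages of training.
print(distance_travelled([[0.0, 0.0], [1.0, 1.0], [1.0, 3.0]]))  # 2.0
```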


Headset: Human emotion awareness under partial occlusions multimodal dataset

http://arxiv.org/abs/2402.09107v1

Compressor summary: The paper introduces a new multimodal database with diverse and ethically compliant volumetric data of people speaking and wearing HMDs, recorded using a VoCap studio and a Lytro Illum camera, to aid XR algorithm development and testing.


Towards Realistic Landmark-Guided Facial Video Inpainting Based on GANs

http://arxiv.org/abs/2402.09100v1

Compressor summary: The paper proposes a network for removing occlusions from facial videos using GANs and preserving emotional expression, which can be useful for various applications like video conferencing and virtual makeup.


Exploring Neuron Interactions and Emergence in LLMs: From the Multifractal Analysis Perspective

http://arxiv.org/abs/2402.09099v1

Compressor summary: The study explores how neuron interactions evolve during training in large language models, using self-organization and multifractal analysis to reveal emergent behavior and propose Neuron-based Multifractal Analysis (NeuroMFA) as a tool for quantitative analysis.


Three Decades of Activations: A Comprehensive Survey of 400 Activation Functions for Neural Networks

http://arxiv.org/abs/2402.09092v1

Compressor summary: The paper presents a large survey of 400 activation functions for neural networks, providing a comprehensive overview and systematization with links to original sources.


Polynomial Semantics of Tractable Probabilistic Circuits

http://arxiv.org/abs/2402.09085v1

Compressor summary: The paper proves that different probabilistic circuit models for binary distributions are equivalent and explores the challenges of extending them to categorical random variables.


Sobolev Training for Operator Learning

http://arxiv.org/abs/2402.09084v1

Compressor summary: Sobolev Training improves model performance by using derivative information on irregular meshes for operator learning.
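A sketch of what a Sobolev-style loss adds over plain MSE: a penalty on mismatched derivatives as well as mismatched values. Here the derivatives are estimated by forward differences on an irregular 1-D grid, and the weight `lam` is an assumption. The paper's operator-learning setup and its derivative estimation on meshes are not reproduced.

```python
def forward_diff(xs, ys):
    """Forward-difference slopes on a possibly irregular 1-D grid."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

def sobolev_loss(xs, pred, target, lam=1.0):
    """Mean squared error on values plus lam times mean squared error on slopes."""
    value_term = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(xs)
    dp, dt = forward_diff(xs, pred), forward_diff(xs, target)
    slope_term = sum((a - b) ** 2 for a, b in zip(dp, dt)) / len(dp)
    return value_term + lam * slope_term


xs = [0.0, 0.5, 2.0]                                         # irregular spacing
target = [0.0, 1.0, 4.0]
print(sobolev_loss(xs, [1.0, 2.0, 5.0], target))             # 1.0: offset only, slopes match
print(sobolev_loss(xs, [0.0, 2.0, 4.0], target) > 1.0 / 3)   # True: the slope term adds to the MSE
```

Dividing by the actual spacing `xs[i + 1] - xs[i]` is what keeps the slope estimates meaningful when the grid is irregular.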


Exploiting Estimation Bias in Deep Double Q-Learning for Actor-Critic Methods

http://arxiv.org/abs/2402.09078v1

Compressor summary: The paper presents two new RL algorithms (ExpD3 and BE-TD3) that address and exploit estimation biases to improve performance on continuous control tasks, outperforming existing methods like TD3 in some scenarios.
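The estimation bias at issue can be seen in a few lines. Taking the max of two unbiased noisy estimates overestimates the true value. TD3's clipped double-Q target takes the min instead and so leans toward underestimation. The sketch simulates both with Gaussian noise and shows the clipped target. The ExpD3 and BE-TD3 modifications themselves are not reproduced.

```python
import random

def bias_of_max_and_min(true_q=0.0, noise=1.0, trials=100_000, seed=0):
    """Average max and min of two unbiased noisy estimates of the same Q-value."""
    rng = random.Random(seed)
    total_max = total_min = 0.0
    for _ in range(trials):
        q1 = true_q + rng.gauss(0.0, noise)
        q2 = true_q + rng.gauss(0.0, noise)
        total_max += max(q1, q2)
        total_min += min(q1, q2)
    return total_max / trials, total_min / trials

def clipped_double_q_target(reward, done, gamma, q1_next, q2_next):
    """TD3-style target: bootstrap from the smaller of the two critics' estimates."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


print(bias_of_max_and_min())  # roughly (0.56, -0.56): max overestimates, min underestimates
print(clipped_double_q_target(1.0, False, 0.99, 10.0, 8.0))  # about 8.92
```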


Affine transformation estimation improves visual self-supervised learning

http://arxiv.org/abs/2402.09071v1

Compressor summary: The paper proposes adding a predictive module to self-supervised learning models that improves their performance and efficiency by constraining representations with an additional loss term based on an affine transformation.


Solid Waste Detection in Remote Sensing Images: A Survey

http://arxiv.org/abs/2402.09066v1

Compressor summary: Remote sensing using EO satellites can help detect and monitor illegal landfills, mitigating pollution and health hazards, by providing high-resolution data at low cost.


Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

http://arxiv.org/abs/2402.09063v1

Compressor summary: The paper proposes a new attack on open-source language models that operates in the embedding space, targeting their continuous input representations, and shows that it can elicit harmful behavior and extract information that unlearning was meant to delete.


I can't see it but I can Fine-tune it: On Encrypted Fine-tuning of Transformers using Fully Homomorphic Encryption

http://arxiv.org/abs/2402.09059v1

Compressor summary: BlindTuner is a system that enables privacy-preserving fine-tuning of transformer models on encrypted image data for image classification, achieving comparable accuracy and significant speed improvements over existing methods.


Is Epistemic Uncertainty Faithfully Represented by Evidential Deep Learning Methods?

http://arxiv.org/abs/2402.09056v1

Compressor summary: The paper examines the challenges of using evidential deep learning to quantify uncertainty in ML systems, offering new insights into optimizing second-order loss functions and interpreting epistemic uncertainty measures.
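
The sketch below makes "epistemic uncertainty measure" concrete using one widely used evidential formulation. In it, a network outputs non-negative evidence per class, and that evidence parameterizes a Dirichlet distribution. The uncertainty mass u = K / S is an assumption chosen for illustration. The paper analyzes such measures critically rather than endorsing this one.

```python
def dirichlet_uncertainty(evidence):
    """Map non-negative per-class evidence to expected class probabilities and an uncertainty mass."""
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    alpha = [e + 1.0 for e in evidence]   # Dirichlet concentration parameters
    strength = sum(alpha)                 # Dirichlet strength S
    probs = [a / strength for a in alpha] # mean of the Dirichlet
    uncertainty = len(alpha) / strength   # u = K / S, equal to 1 when there is no evidence
    return probs, uncertainty


print(dirichlet_uncertainty([0.0, 0.0, 0.0]))
print(dirichlet_uncertainty([8.0, 0.0, 0.0]))
```

With no evidence, the prediction is uniform and u is maximal. Accumulating evidence for one class shrinks u toward 0.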


Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection

http://arxiv.org/abs/2402.09055v1

Compressor summary: The paper proposes a new multi-modal humor detection model for short videos that uses comments and contrastive pre-training to align video and text in a shared semantic space, and outperforms existing methods on two datasets.


L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

http://arxiv.org/abs/2402.09052v1

Compressor summary: L3GO is a language agent that uses large language models to reason about 3D mesh generation of unconventional objects in simulation environments, outperforming standard GPT-4 and other models.


FGeo-DRL: Deductive Reasoning for Geometric Problems through Deep Reinforcement Learning

http://arxiv.org/abs/2402.09051v1

Compressor summary: The paper presents FGeo-DRL, a neural-symbolic system that uses reinforcement learning to perform human-like geometric deductive reasoning without human supervision.


End-to-End Training Induces Information Bottleneck through Layer-Role Differentiation: A Comparative Analysis with Layer-wise Training

http://arxiv.org/abs/2402.09050v1

Compressor summary: The paper finds that end-to-end (E2E) training outperforms layer-wise (non-E2E) training because it propagates input information efficiently and induces layer-role differentiation in intermediate representations.


FGeo-TP: A Language Model-Enhanced Solver for Geometry Problems

http://arxiv.org/abs/2402.09047v1

Compressor summary: This paper introduces FGeo-TP, a theorem predictor that uses Transformers to help solve geometry problems faster and with fewer timeouts.


Inference of Abstraction for a Unified Account of Reasoning and Learning

http://arxiv.org/abs/2402.09046v1

Compressor summary: The paper proposes a probabilistic inference theory that unifies reasoning and learning by modeling how data leads to symbolic knowledge through abstraction and selective ignorance.


Under manipulations, are some AI models harder to audit?

http://arxiv.org/abs/2402.09043v1

Compressor summary: The paper investigates how difficult it is to audit web platforms with large-capacity models that can fit any data, and shows that such platforms are hard to audit robustly.


Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?

http://arxiv.org/abs/2402.09036v1

Compressor summary: This paper proposes a framework called GTI-MM that uses text-to-image models to support multi-modal learning for visual recognition tasks, improving data efficiency and robustness when some visual modalities are missing.


Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data Constraints

http://arxiv.org/abs/2402.09034v1

Compressor summary: Squared Sigmoid TanH (SST) activation improves the performance of sequential neural networks on small and sparse datasets by amplifying differences between strong and weak activations over time.


SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

http://arxiv.org/abs/2402.09025v1

Compressor summary: SLEB prunes redundant transformer blocks from large language models to speed up inference without sacrificing linguistic capabilities.


Pyramid Attention Network for Medical Image Registration

http://arxiv.org/abs/2402.09016v1

Compressor summary: The paper proposes a pyramid attention network (PAN) for deformable medical image registration, which improves feature representation and motion pattern analysis using a dual-stream encoder and a local attention Transformer decoder.


Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications

http://arxiv.org/abs/2402.09015v1

Compressor summary: AgentEval is a framework that simplifies verifying the utility of LLM-powered applications by proposing criteria tailored to their unique purposes and assessing their performance against them.


Multi-Query Focused Disaster Summarization via Instruction-Based Prompting

http://arxiv.org/abs/2402.09008v1

Compressor summary: CrisisFACTS is a competition that challenges participants to develop systems for disaster summarization from web sources; this work focuses on fact extraction and QA-based prompting with the LLaMA-13b model.


Gradient Alignment with Prototype Feature for Fully Test-time Adaptation

http://arxiv.org/abs/2402.09004v1

Compressor summary: GAP is a regularizer for test-time adaptation (TTA) that uses gradient alignment and prototype features to mitigate the harm caused by misclassified pseudo-labels, improving performance on various datasets.


Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

http://arxiv.org/abs/2402.08998v1

Compressor summary: The paper proposes a new algorithm for the Stochastic Shortest Path problem with a linear mixture transition kernel that removes restrictive assumptions and achieves a nearly minimax optimal regret bound.


CLIP-MUSED: CLIP-Guided Multi-Subject Visual Neural Information Semantic Decoding

http://arxiv.org/abs/2402.08994v1

Compressor summary: The CLIP-MUSED method uses a Transformer-based feature extractor, learnable subject-specific tokens, and representational similarity analysis to decode visual neural information from multiple subjects, overcoming limitations of prior methods.
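
The representational similarity analysis (RSA) step can be illustrated with a minimal, generic sketch. First, a dissimilarity matrix over stimuli is built for each representation. Then the two matrices are correlated. Two choices here are assumptions: using 1 - Pearson r as the dissimilarity, and using Pearson rather than Spearman to compare the matrices. This is not CLIP-MUSED's implementation.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def rdm_upper(responses):
    """Upper triangle of a representational dissimilarity matrix; rows are stimuli."""
    return [1.0 - pearson(responses[i], responses[j])
            for i, j in combinations(range(len(responses)), 2)]


def rsa_score(responses_a, responses_b):
    """Second-order similarity: correlate the upper triangles of the two RDMs."""
    return pearson(rdm_upper(responses_a), rdm_upper(responses_b))


# Hypothetical toy data: 4 stimuli x 4 response dimensions (e.g. voxels or features).
brain = [[1, 2, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1], [1, 3, 2, 5]]
model = [[2 * x + 1 for x in row] for row in brain]  # per-row affine map keeps the RDM unchanged
print(rsa_score(brain, model))
```

RSA compares the geometry of representations rather than raw values. For this reason, a model whose features differ from brain responses only by a per-stimulus affine rescaling still scores about 1.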


MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature Selection

http://arxiv.org/abs/2402.08982v1

Compressor summary: MEL is a novel evolutionary computational approach that uses multi-task learning to improve feature selection in high-dimensional data and achieve better performance and efficiency than existing methods.


Research and application of Transformer based anomaly detection model: A literature review

http://arxiv.org/abs/2402.08975v1

Compressor summary: This review surveys how Transformers and their variants are used for anomaly detection, discusses the challenges, datasets, evaluation metrics, and future trends in this domain, and compiles over 100 references on Transformer-based anomaly detection.


GrounDial: Human-norm Grounded Safe Dialog Response Generation

http://arxiv.org/abs/2402.08968v1

Compressor summary: GrounDial is a safe conversational AI system that uses commonsense social rules instead of fine-tuning, improving safety and reducing costs.


Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays

http://arxiv.org/abs/2402.08966v1

Compressor summary: The paper introduces PLURAL, a vision-language model pretrained on both natural and longitudinal image-text pairs, which excels at difference visual question answering on longitudinal chest X-ray images.


Predicting User Experience on Laptops from Hardware Specifications

http://arxiv.org/abs/2402.08964v1

Compressor summary: The authors present a method to predict user experience on laptops from hardware specifications, using web browsing, video playback, and audio/video calls as indicators.


DUEL: Duplicate Elimination on Active Memory for Self-Supervised Class-Imbalanced Learning

http://arxiv.org/abs/2402.08963v1

Compressor summary: DUEL is a novel framework that uses active data filtering during self-supervised pre-training to address class imbalances cost-efficiently by enhancing distinctiveness information in an active memory.


HyCubE: Efficient Knowledge Hypergraph 3D Circular Convolutional Embedding

http://arxiv.org/abs/2402.08961v1

Compressor summary: HyCubE is a novel 3D circular convolutional neural network that improves knowledge hypergraph embedding performance and efficiency, achieving significant improvements over existing methods.


Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision

http://arxiv.org/abs/2402.08960v1

Compressor summary: The paper proposes a weakly-supervised open-vocabulary segmentation framework that uses independent image-mask and image-text pairs to predict masks and associate them with entities in CLIP embedding space, improving performance on challenging datasets.


Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

http://arxiv.org/abs/2402.08958v1

Compressor summary: The paper proposes aespa, a novel post-training quantization (PTQ) algorithm for Transformers that balances accuracy and efficiency by quantizing layer-wise while accounting for cross-layer dependency.
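
For context, the simplest layer-wise PTQ baseline quantizes each layer's weights independently with round-to-nearest (RTN). Below is a minimal sketch, assuming symmetric per-tensor scaling to signed integers. It deliberately does not model aespa's contribution, which is accounting for cross-layer dependency while quantizing layer-wise.

```python
def quantize_symmetric(weights, bits=8):
    """Round-to-nearest symmetric quantization of one layer's weights to signed integers."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp defensively; with this scale every |w / scale| is at most qmax anyway.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [v * scale for v in q]


layer = [1.0, -0.5, 0.25, 0.003]   # hypothetical weights of a single layer
q, scale = quantize_symmetric(layer, bits=8)
print(q, scale, dequantize(q, scale))
```

Each weight is reconstructed to within half a quantization step, but the error is chosen per layer without regard to how later layers react. That gap is what cross-layer-aware methods target.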


MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

http://arxiv.org/abs/2402.08957v1

Compressor summary: MUSTARD is a data generation framework that creates high-quality theorem and proof datasets for training large language models in mathematical reasoning tasks.


Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models

http://arxiv.org/abs/2402.08955v1

Compressor summary: The study shows that large language models struggle with analogies that are dissimilar to their training data, unlike humans who maintain high performance across different analogy problems.


Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input

http://arxiv.org/abs/2402.08948v1

Compressor summary: The paper investigates learning sparse polynomials with neural networks and gradient descent, and provides a basis-free generalization and a nearly sufficient condition for learnability.


Measuring Sharpness in Grokking

http://arxiv.org/abs/2402.08946v1

Compressor summary: This paper introduces a method to measure the sharpness of grokking, the phenomenon in which a network's validation performance improves abruptly long after its training performance has plateaued, and investigates that sharpness in two settings.


Evaluating DTW Measures via a Synthesis Framework for Time-Series Data

http://arxiv.org/abs/2402.08943v1

Compressor summary: The paper proposes a synthesis framework to model variations between time-series data sequences and evaluate different DTW measures for alignment and classification tasks.
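
The DTW measures compared in the paper are variants of the classic dynamic-programming alignment. Below is a minimal sketch of that baseline, assuming univariate sequences and an absolute-difference local cost. It does not show the paper's synthesis framework or the specific variants it evaluates.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b[j-1] repeats
                                     cost[i][j - 1],      # b advances, a[i-1] repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Warping lets one element align with several others. As a result, `[1, 2, 3]` and `[1, 2, 2, 3]` get a distance of 0 despite their different lengths.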


Premise Order Matters in Reasoning with Large Language Models

http://arxiv.org/abs/2402.08939v1

Compressor summary: LLMs struggle with reasoning tasks when the order of premises is changed, even though the underlying task remains unchanged.


Predictive Temporal Attention on Event-based Video Stream for Energy-efficient Situation Awareness

http://arxiv.org/abs/2402.08936v1

Compressor summary: The paper proposes a temporal attention mechanism for DVS cameras that reduces power consumption by only focusing on unpredictable visual events, filtering out noise and decreasing computational workload.


Depth-aware Volume Attention for Texture-less Stereo Matching

http://arxiv.org/abs/2402.08931v1

Compressor summary: The paper improves stereo matching and depth estimation on both textured and texture-less images with a volume refinement scheme that incorporates depth and attention, and also proposes a new evaluation metric.


Second Order Methods for Bandit Optimization and Control

http://arxiv.org/abs/2402.08929v1

Compressor summary: The paper proposes a simple and practical online decision-making algorithm for high-dimensional data that achieves optimal regret bounds for a large class of convex functions, including linear, quadratic, and generalized linear models.


MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

http://arxiv.org/abs/2402.08925v1

Compressor summary: The paper proposes a method to align language models with diverse human preferences using a mixture of preference distributions and a MaxMin alignment objective, achieving better performance and fairness than conventional methods.
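
The MaxMin objective can be illustrated on a toy finite choice. Given each candidate's reward under one reward model per preference group, it picks the candidate whose minimum group reward is largest. Below is a minimal sketch with made-up scores, contrasted with simply maximizing the average. In the paper, the objective is optimized over a policy during RLHF, and the groups come from a mixture of preference distributions; neither is modeled here.

```python
def maxmin_choice(scores):
    """Pick the candidate whose worst-off group reward is highest (MaxMin rule)."""
    return max(scores, key=lambda c: min(scores[c]))


def mean_choice(scores):
    """Contrast: pick the candidate with the highest reward averaged over groups."""
    return max(scores, key=lambda c: sum(scores[c]) / len(scores[c]))


# Hypothetical rewards of two candidate responses under three groups' reward models.
scores = {
    "A": [0.9, 0.9, 0.1],   # strong for two groups, poor for the third
    "B": [0.6, 0.6, 0.6],   # moderate for every group
}
print(mean_choice(scores), maxmin_choice(scores))
```

Averaging favors "A" because two groups like it, while MaxMin favors "B" because no group is left with a low reward.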


IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture

http://arxiv.org/abs/2402.08923v1

Compressor summary: The paper proposes a new method for predicting human poses using IMU data, which improves upon previous models by optimizing IMU placement and using a transformer-based model.


The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

http://arxiv.org/abs/2402.08922v1

Compressor summary: The paper introduces the Mirrored Influence Hypothesis, which suggests that evaluating the influence of training data on test predictions can be reformulated as an inverse problem, and proposes a new method for estimating this influence more efficiently.


Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding

http://arxiv.org/abs/2402.08919v1

Compressor summary: The paper proposes a method to measure conceptual similarity between images using captions and shows it correlates with human judgement and outperforms existing methods.


Graph Inference Acceleration by Learning MLPs on Graphs without Supervision

http://arxiv.org/abs/2402.08918v1

Compressor summary: The paper introduces SimMLP, a framework to learn MLPs on graphs without supervision, achieving better generalization and acceleration than existing methods.


Learning-based Bone Quality Classification Method for Spinal Metastasis

http://arxiv.org/abs/2402.08910v1

Compressor summary: The paper proposes a learning-based method for automatic bone quality classification in spinal metastasis using CT images, which improves performance with multi-task learning and self-paced learning.


Tackling Negative Transfer on Graphs

http://arxiv.org/abs/2402.08907v1

Compressor summary: The paper studies negative transfer in graph learning, finding that structural differences cause node embedding discrepancies, and proposes subgraph pooling methods to address this issue.


Weakly Supervised Segmentation of Vertebral Bodies with Iterative Slice-propagation

http://arxiv.org/abs/2402.08892v1

Compressor summary: WISS is a weakly supervised method that uses corner landmarks to segment vertebral bodies from CT images with high accuracy and reduced annotation costs.


Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation

http://arxiv.org/abs/2402.08882v1

Compressor summary: The paper presents a new neural network method for moving object proposals (MOP) that combines optical flow estimation with semantic segmentation, using an encoder-decoder architecture and the DAVIS dataset, and provides TensorFlow code run on an AWS EC2 instance.


DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling

http://arxiv.org/abs/2402.08876v1

Compressor summary: The paper proposes a hyperbolic scaling method for learning unsigned distance fields, which improves 3D reconstruction quality, training performance, and enables accurate computation of topological properties.


TikTokActions: A TikTok-Derived Video Dataset for Human Action Recognition

http://arxiv.org/abs/2402.08875v1

Compressor summary: The paper presents a new TikTok dataset, TikTokActions, for human action recognition and shows its effectiveness in improving computer vision models.


Tree-Based Hard Attention with Self-Motivation for Large Language Models

http://arxiv.org/abs/2402.08874v1

Compressor summary: TEAROOM is a novel framework that helps large language models understand hierarchical text structures and improve their performance in estimating task-specific properties.


Position Paper: Challenges and Opportunities in Topological Deep Learning

http://arxiv.org/abs/2402.08871v1

Compressor summary: Topological deep learning uses topological features to enhance deep learning models and has promising applications and theoretical foundations that require further investigation.


ScamSpot: Fighting Financial Fraud in Instagram Comments

http://arxiv.org/abs/2402.08869v1

Compressor summary: ScamSpot is a system that combats spam and fraud in Instagram comments using a browser extension, a BERT model, and a REST API, with data annotation, user feedback, and open-source availability.


Large Language Model with Graph Convolution for Recommendation

http://arxiv.org/abs/2402.08859v1

Compressor summary: The paper proposes a new method to use large language models for improving item descriptions in recommendations by leveraging graph information and avoiding hallucination problems.