arxiv compressed, 2024-02-26

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-26, generated by the compressor, my personal LLM-based project.


Seamless Human Motion Composition with Blended Positional Encodings

http://arxiv.org/abs/2402.15509v1

Compressor summary: The paper introduces FlowMDM, a diffusion-based model that generates long, continuous human motion sequences guided by textual descriptions, improving accuracy, realism, and smoothness in virtual reality, gaming, and robotics applications.


AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

http://arxiv.org/abs/2402.15506v1

Compressor summary: The paper introduces AgentOhana, a solution to standardize and unify agent trajectories from diverse environments for training LLM-powered autonomous agents, and xLAM-v0.1, a large action model tailored for AI agents.


Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts

http://arxiv.org/abs/2402.15505v1

Compressor summary: The paper proposes a method to improve weak-to-strong generalization by using multiple specialized teachers instead of one generalist teacher for co-supervising a strong student model.


Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

http://arxiv.org/abs/2402.15504v1

Compressor summary: This paper presents Gen4Gen, a dataset creation pipeline using generative models to combine multiple personalized concepts into complex images with text descriptions, and introduces MyCanvas, a new benchmark dataset for multi-concept personalization in text-to-image diffusion models. It also proposes comprehensive metrics (CP-CLIP and TI-CLIP) to evaluate the performance of such methods.


Mechanics-Informed Autoencoder Enables Automated Detection and Localization of Unforeseen Structural Damage

http://arxiv.org/abs/2402.15492v1

Compressor summary: The text proposes a novel passive and automated system that can detect and localize damage in structures using cheap sensors and a mechanics-informed autoencoder, which learns the undamaged state's response characteristics.


API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

http://arxiv.org/abs/2402.15491v1

Compressor summary: API-BLEND is a large corpus that helps train and test language models using tools and external APIs in real-world scenarios.


A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends

http://arxiv.org/abs/2402.15490v1

Compressor summary: The text discusses the different types of Convolutional Neural Networks (CNNs) for computer vision tasks, comparing their structures, characteristics, strengths, weaknesses, and applications, as well as exploring related research fields and platforms.


Retinotopic Mapping Enhances the Robustness of Convolutional Neural Networks

http://arxiv.org/abs/2402.15480v1

Compressor summary: This study shows how integrating retinotopic mapping into deep CNNs can improve image classification and localization, mimicking a key feature of human vision.


Transformers are Expressive, But Are They Expressive Enough for Regression?

http://arxiv.org/abs/2402.15478v1

Compressor summary: The paper investigates whether Transformers can reliably approximate continuous functions, finding that they struggle and depend on piecewise constant approximations with large gaps, raising questions about their universality as function approximators.


Debiasing Machine Learning Models by Using Weakly Supervised Learning

http://arxiv.org/abs/2402.15477v1

Compressor summary: The paper proposes a weakly supervised bias mitigation strategy for continuous sensitive variables, based on endogeneity from econometrics, that requires little expert input and works with model agnostic methods.


Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization

http://arxiv.org/abs/2402.15473v1

Compressor summary: The paper proposes a new reward modeling technique that uses domain knowledge to reduce the amount of human preference annotation needed in reinforcement learning from human feedback (RLHF) tasks, demonstrating its effectiveness in opinion summarization with a small dataset and releasing new datasets for future research.


FAIR: Filtering of Automatically Induced Rules

http://arxiv.org/abs/2402.15472v1

Compressor summary: The paper proposes a filtering algorithm that uses submodular objective functions to select high-quality rules for weak supervision from automatically induced rules, improving semi-supervised text classification performance.
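
Rule filtering of this kind typically builds on greedy maximization of a monotone submodular objective. A minimal sketch with a toy coverage objective (the rule format and objective are illustrative, not FAIR's actual algorithm):

```python
def coverage(selected_rules):
    # Toy submodular objective: number of examples covered by the rules.
    covered = set()
    for rule in selected_rules:
        covered |= rule["covers"]
    return len(covered)

def greedy_select(rules, objective, k):
    # Greedily add the rule with the largest marginal gain at each step.
    selected, remaining = [], list(rules)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda r: objective(selected + [r]) - objective(selected))
        selected.append(best)
        remaining.remove(best)
    return selected

rules = [
    {"name": "r1", "covers": {1, 2, 3}},
    {"name": "r2", "covers": {3, 4}},
    {"name": "r3", "covers": {2, 3}},
]
print([r["name"] for r in greedy_select(rules, coverage, k=2)])  # ['r1', 'r2']
```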


Benchmarking the Robustness of Panoptic Segmentation for Automated Driving

http://arxiv.org/abs/2402.15469v1

Compressor summary: This paper proposes a method to test how well panoptic segmentation models work in assisted and automated driving scenarios, by generating realistic noisy camera data and measuring their performance with different image quality metrics.


Repetition Improves Language Model Embeddings

http://arxiv.org/abs/2402.15449v1

Compressor summary: Echo embeddings improve text embedding extraction from autoregressive models by repeating the input and using information from later tokens.
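
A minimal sketch of the echo idea, assuming a generic Hugging Face autoregressive model (the model choice and pooling are illustrative, not necessarily the paper's exact recipe): feed the text twice so tokens in the second copy can attend to the full first copy despite causal masking, then pool over the second occurrence.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical model choice for illustration.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

def echo_embed(text: str) -> torch.Tensor:
    doubled = f"{text} {text}"  # repeat the input
    enc = tok(doubled, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    # Mean-pool over (approximately) the second occurrence only;
    # the midpoint split is a rough proxy for exact token alignment.
    n = hidden.shape[0]
    return hidden[n // 2:].mean(dim=0)

print(echo_embed("A photo of a cat sitting on a mat.").shape)  # torch.Size([768])
```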


Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature Review

http://arxiv.org/abs/2402.15448v1

Compressor summary: This paper reviews computer vision techniques for multimedia geolocation and their potential to help fight human trafficking by locating illegal content more quickly and accurately.


Can we forget how we learned? Doxastic redundancy in iterated belief revision

http://arxiv.org/abs/2402.15445v1

Compressor summary: The text discusses how iterated belief revision can make some information acquisition methods irrelevant, and explores the complexity of shortening sequences of lexicographic revisions.


Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales

http://arxiv.org/abs/2402.15430v1

Compressor summary: The paper proposes a hierarchical invariant representation framework for robust and interpretable vision systems that can handle various tasks and applications.


ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation

http://arxiv.org/abs/2402.15429v1

Compressor summary: ProTIP is a framework to evaluate the robustness of Text-to-Image models using probabilistic analysis and sequential testing.


A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

http://arxiv.org/abs/2402.15422v1

Compressor summary: The study uses large language models to generate patient summaries from doctors' notes, improves their performance by reducing hallucinations, and evaluates their faithfulness and quality using medical experts and quantitative metrics.


The Impact of LoRA on the Emergence of Clusters in Transformers

http://arxiv.org/abs/2402.15415v1

Compressor summary: The paper uses mathematical tools to study how attention and token values affect token cluster dynamics in Transformers, revealing similarities and differences depending on parameter variations.


Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?

http://arxiv.org/abs/2402.15414v1

Compressor summary: The paper explores how combining pre-trained LoRA modules improves generalization to unseen tasks, especially in low-shot settings, and evaluates two methods for composing them.
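
One simple composition method is weighted averaging of the modules' low-rank updates. A minimal sketch in plain tensors (shapes, ranks, and coefficients are illustrative; PEFT libraries offer equivalent merging utilities):

```python
import torch

def merge_lora(base_weight, loras, coeffs):
    # Each LoRA module contributes a low-rank update (alpha / r) * B @ A;
    # compose modules by adding a weighted sum of their updates.
    merged = base_weight.clone()
    for (A, B, alpha), c in zip(loras, coeffs):
        r = A.shape[0]
        merged += c * (alpha / r) * (B @ A)
    return merged

d, r = 768, 8
base = torch.randn(d, d)
lora_task1 = (torch.randn(r, d), torch.randn(d, r), 16.0)  # (A, B, alpha)
lora_task2 = (torch.randn(r, d), torch.randn(d, r), 16.0)
merged = merge_lora(base, [lora_task1, lora_task2], coeffs=[0.5, 0.5])
print(merged.shape)  # torch.Size([768, 768])
```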


G-RepsNet: A Fast and General Construction of Equivariant Networks for Arbitrary Matrix Groups

http://arxiv.org/abs/2402.15413v1

Compressor summary: Group equivariance is a useful inductive bias for deep learning tasks with group symmetries; G-RepsNet is a lightweight, expressive equivariant network that uses tensor polynomials to achieve equivariance for arbitrary matrix groups, performing well on diverse tasks including image classification and physics simulations.


Optimistic Information Directed Sampling

http://arxiv.org/abs/2402.15411v1

Compressor summary: The paper proposes a new algorithm for online learning in contextual bandits that combines Bayesian and worst-case analysis to achieve better regret guarantees without requiring Bayesian assumptions.


Conformalized-DeepONet: A Distribution-Free Framework for Uncertainty Quantification in Deep Operator Networks

http://arxiv.org/abs/2402.15406v1

Compressor summary: The paper presents a new method to quantify uncertainty in DeepONet regression using conformal prediction and shows its effectiveness on different examples.
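
The distribution-free guarantee comes from conformal prediction. A minimal sketch of the generic split-conformal recipe applied to any point predictor (the toy model below is a stand-in, not a DeepONet):

```python
import numpy as np

def split_conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    # Nonconformity scores: absolute residuals on a held-out calibration set.
    scores = np.abs(y_cal - predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile gives ~(1 - alpha) marginal coverage.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    preds = predict(X_test)
    return preds - q, preds + q

rng = np.random.default_rng(0)
predict = lambda X: 2.0 * X.ravel()                      # toy "trained model"
X_cal = rng.uniform(0, 1, (200, 1))
y_cal = 2.0 * X_cal.ravel() + rng.normal(0, 0.1, 200)
lo, hi = split_conformal_interval(predict, X_cal, y_cal, rng.uniform(0, 1, (3, 1)))
print(np.c_[lo, hi])  # per-point prediction intervals
```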


United We Pretrain, Divided We Fail! Representation Learning for Time Series by Pretraining on 75 Datasets at Once

http://arxiv.org/abs/2402.15404v1

Compressor summary: The paper introduces a new self-supervised method for learning representations from diverse time series datasets using contrastive pretraining and interpolation, which improves performance on classification tasks in low-data regimes.


Distributionally Robust Off-Dynamics Reinforcement Learning: Provable Efficiency with Linear Function Approximation

http://arxiv.org/abs/2402.15399v1

Compressor summary: The paper proposes DR-LSVI-UCB, an efficient algorithm for off-dynamics RL using online DRMDPs with linear function approximation, and shows its effectiveness in various scenarios.


TransFlower: An Explainable Transformer-Based Model with Flow-to-Flow Attention for Commuting Flow Prediction

http://arxiv.org/abs/2402.15398v1

Compressor summary: The research introduces TransFlower, an explainable deep learning model that predicts urban commuting patterns using attention mechanisms and a geospatial encoder to capture complex flows and interactions in city development.


NeuralThink: Algorithm Synthesis that Extrapolates in General Tasks

http://arxiv.org/abs/2402.15393v1

Compressor summary: NeuralThink is a new recurrent architecture that can extrapolate to both symmetrical and asymmetrical tasks with different input and output dimensionalities, outperforming previous Deep Thinking methods.


Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms

http://arxiv.org/abs/2402.15392v1

Compressor summary: The paper introduces a new approach for estimating the feasible reward set of an expert agent from offline data using two efficient algorithms that handle challenges of the offline setting.


Genie: Generative Interactive Environments

http://arxiv.org/abs/2402.15391v1

Compressor summary: Genie is a generative model that can create diverse virtual worlds based on text and images, and it can be controlled through actions without needing labeled data or specific domain knowledge.


Explorations of Self-Repair in Language Models

http://arxiv.org/abs/2402.15390v1

Compressor summary: This paper investigates self-repair, a phenomenon where large language models change their behavior to compensate for ablated components, and identifies two mechanisms behind it.


Outlier detection by ensembling uncertainty with negative objectness

http://arxiv.org/abs/2402.15374v1

Compressor summary: The paper proposes a novel method for detecting outliers in visual recognition by predicting an outlier class along with K groundtruth classes and using a new anomaly score that combines uncertainty and negative objectness.
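
One plausible way to read that score, sketched below: softmax over K+1 logits, adding the outlier-class posterior (negative objectness) to an uncertainty term over the K inlier classes (the paper's exact combination may differ):

```python
import torch

def anomaly_score(logits: torch.Tensor) -> torch.Tensor:
    # logits: (..., K + 1); the last class is the learned outlier class.
    probs = logits.softmax(dim=-1)
    negative_objectness = probs[..., -1]
    # A low max inlier probability signals high uncertainty.
    uncertainty = 1.0 - probs[..., :-1].max(dim=-1).values
    return negative_objectness + uncertainty

print(anomaly_score(torch.randn(4, 11)).shape)  # K = 10 -> torch.Size([4])
```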


Dual Encoder: Exploiting the Potential of Syntactic and Semantic for Aspect Sentiment Triplet Extraction

http://arxiv.org/abs/2402.15370v1

Compressor summary: The paper proposes a dual encoder model (D2E2S) that leverages syntactic and semantic information in the aspect sentiment triplet extraction task, achieving state-of-the-art results.


On normalization-equivariance properties of supervised and unsupervised denoising methods: a survey

http://arxiv.org/abs/2402.15352v1

Compressor summary: The paper surveys and compares various supervised and unsupervised learning methods for image denoising, emphasizing recent developments in supervised learning and pointing out the lack of normalization equivariance in most approaches.


AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

http://arxiv.org/abs/2402.15351v1

Compressor summary: AutoMMLab is an LLM-powered AutoML system that automates the entire end-to-end model production workflow for computer vision tasks using language instructions.


Information-Theoretic Safe Bayesian Optimization

http://arxiv.org/abs/2402.15347v1

Compressor summary: The paper proposes a new safe Bayesian optimization method that uses an information-theoretic exploration criterion to select informative and safe parameters, without needing explicit hyperparameters or domain discretization.


Fourier Basis Density Model

http://arxiv.org/abs/2402.15345v1

Compressor summary: The authors propose a new probability density model that can fit complex 1D densities and achieve better performance than a previous deep factorized model with similar computational cost, and apply it to a compression task.


NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data

http://arxiv.org/abs/2402.15343v1

Compressor summary: The paper introduces NuNER, a compact language model that uses large language models for named entity recognition and can be fine-tuned to solve downstream NER problems efficiently and effectively.


Ranking Entities along Conceptual Space Dimensions with LLMs: An Analysis of Fine-Tuning Strategies

http://arxiv.org/abs/2402.15337v1

Compressor summary: The paper explores how to learn conceptual spaces from large language models using rankings and perceptual features, and compares pointwise and pairwise ranking strategies.


Categorical Deep Learning: An Algebraic Theory of Architectures

http://arxiv.org/abs/2402.15332v1

Compressor summary: The authors propose using category theory to bridge the gap between specifying constraints for deep learning models and their implementations, recovering geometric deep learning constraints and encoding standard computer science concepts.


Towards Principled Task Grouping for Multi-Task Learning

http://arxiv.org/abs/2402.15328v1

Compressor summary: The paper proposes a new way to group tasks in multitask learning that is more theoretically sound, flexible, and effective across diverse domains.


Understanding Oversmoothing in Diffusion-Based GNNs From the Perspective of Operator Semigroup Theory

http://arxiv.org/abs/2402.15326v1

Compressor summary: The paper studies oversmoothing in diffusion-based GNNs using operator semigroup theory, proves a link to the ergodicity of the diffusion operator, and proposes an ergodicity-breaking term to mitigate it.


Optimal Transport on the Lie Group of Roto-translations

http://arxiv.org/abs/2402.15322v1

Compressor summary: The paper presents a method for optimal transport over Lie groups, particularly SE(2), with applications in image analysis and improved interpolation of orientation fields.


OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

http://arxiv.org/abs/2402.15321v1

Compressor summary: The report outlines a 3D scene understanding workshop with a challenge, dataset, evaluation, and winning methods.


GPTVQ: The Blessing of Dimensionality for LLM Quantization

http://arxiv.org/abs/2402.15319v1

Compressor summary: The GPTVQ method improves neural network quantization by increasing dimensionality, compressing codebooks, and using information from the Hessian, achieving state-of-the-art results on LLMs with efficient computation.
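
For intuition, the core of multi-dimensional (vector) quantization is grouping weights into short vectors and snapping each to a shared codebook entry. A minimal generic illustration with k-means (this is plain VQ, not GPTVQ, which additionally exploits Hessian information and compresses the codebooks):

```python
import numpy as np
from sklearn.cluster import KMeans

def vq_weights(W, dim=2, codebook_size=256):
    # Group scalar weights into `dim`-vectors and quantize each to the
    # nearest of `codebook_size` centroids: log2(K)/dim bits per weight.
    flat = W.reshape(-1, dim)
    km = KMeans(n_clusters=codebook_size, n_init=4, random_state=0).fit(flat)
    indices = km.predict(flat)
    dequant = km.cluster_centers_[indices].reshape(W.shape)
    return indices, km.cluster_centers_, dequant

W = np.random.randn(128, 128).astype(np.float32)
idx, codebook, W_hat = vq_weights(W)
print(float(np.mean((W - W_hat) ** 2)))  # quantization error
```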


On Minimal Depth in Neural Networks

http://arxiv.org/abs/2402.15315v1

Compressor summary: This study explores the expressivity of ReLU neural networks, investigating their minimal depth representation for sum and max operations, and the connection with a conjecture about representing continuous piecewise linear functions.


ArabianGPT: Native Arabic GPT-based Large Language Models

http://arxiv.org/abs/2402.15313v1

Compressor summary: ArabianGPT is a series of transformer-based models designed for Arabic that improve performance on tasks like sentiment analysis and summarization when fine-tuned.


Counterfactual Generation with Identifiability Guarantees

http://arxiv.org/abs/2402.15309v1

Compressor summary: The paper proposes a model (MATTE) for counterfactual generation that can handle domain-varying dependence between content and style latent variables, using relative sparsity of influences to identify the latent variables and achieving state-of-the-art performance in unsupervised style transfer.


Representing Online Handwriting for Recognition in Large Vision-Language Models

http://arxiv.org/abs/2402.15307v1

Compressor summary: The paper presents a novel tokenized representation of online handwriting that improves vision-language models' performance on handwriting recognition tasks without changing their architecture.


How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

http://arxiv.org/abs/2402.15302v1

Compressor summary: This study examines how large language models can be tricked into producing harmful or unethical content when asked to generate instructions, and introduces a dataset for testing this issue.


Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models

http://arxiv.org/abs/2402.15301v1

Compressor summary: The proposed method uses large language models and scientific literature to deduce causal relationships in general causal graph recovery tasks.


Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

http://arxiv.org/abs/2402.15300v1

Compressor summary: CLIP-Guided Decoding (CGD) is a training-free method to reduce object hallucination in large vision-language models by using CLIP similarity to guide the model's decoding process and improve visual grounding with images.
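
A minimal sketch of the guidance idea as post-hoc reranking, assuming candidate captions have already been sampled from the VLM (the paper applies CLIP guidance during decoding; model names here are illustrative):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_by_clip(image: Image.Image, candidates: list[str]) -> str:
    # Score each candidate caption by image-text similarity and keep the
    # best-grounded one, penalizing hallucinated content.
    inputs = proc(text=candidates, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = clip(**inputs).logits_per_image[0]  # (num_candidates,)
    return candidates[int(sims.argmax())]

img = Image.new("RGB", (224, 224), "gray")  # stand-in image
print(rank_by_clip(img, ["a cat on a mat", "a dog on a skateboard"]))
```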


Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling

http://arxiv.org/abs/2402.15297v1

Compressor summary: The paper presents a semi-supervised crowd-counting model that treats each pixel's density as a probability distribution rather than a single value, built from three components: a distribution matching loss, density tokens, and interleaving consistency learning. The model outperforms competitors on four datasets with different labeled ratios.


Linear Dynamics-embedded Neural Network for Long-Sequence Modeling

http://arxiv.org/abs/2402.15290v1

Compressor summary: LDNN is a new neural network model that combines continuous state space models with multi-input and multi-output to achieve efficient long-sequence modeling with reduced time complexity and improved flexibility.


Let's Rectify Step by Step: Improving Aspect-based Sentiment Analysis with Diffusion Models

http://arxiv.org/abs/2402.15289v1

Compressor summary: DiffusionABSA is a novel diffusion model that progressively extracts aspects and estimates their boundaries in sentiment analysis using a denoising neural network with syntax-aware temporal attention.


Spatiotemporal Observer Design for Predictive Learning of High-Dimensional Data

http://arxiv.org/abs/2402.15284v1

Compressor summary: This paper proposes Spatiotemporal Observer, a deep learning architecture that incorporates dynamical-system knowledge to provide theoretical guarantees and improved predictions of high-dimensional data dynamics.


When in Doubt, Think Slow: Iterative Reasoning with Latent Imagination

http://arxiv.org/abs/2402.15283v1

Compressor summary: The paper proposes an improvement for model-based reinforcement learning agents that fine-tunes their states using iterative inference at decision time, yielding better reconstruction accuracy and task performance, especially in partially observable environments and when less training precedes evaluation.


Classification Under Strategic Self-Selection

http://arxiv.org/abs/2402.15274v1

Compressor summary: The text discusses how users might strategically choose whether to participate in predicting outcomes based on a learned classifier, and proposes a framework for learning from such self-selected populations.


Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones

http://arxiv.org/abs/2402.15273v1

Compressor summary: This paper presents an optimization pipeline for visual pose estimation on nano-drones using neural architecture search and efficient software kernels.


EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

http://arxiv.org/abs/2402.15272v1

Compressor summary: The paper proposes a novel framework, EMIFF, for vehicle-infrastructure cooperative 3D object detection in autonomous driving, addressing pose errors and information loss by fusing multi-view images and compressing features for efficient communication.


Smoothed Graph Contrastive Learning via Seamless Proximity Integration

http://arxiv.org/abs/2402.15270v1

Compressor summary: Smoothed Graph Contrastive Learning (SGCL) is a new method for aligning node representations that uses proximity information and subgraph batches to improve performance on large-scale graphs.


MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models

http://arxiv.org/abs/2402.15268v1

Compressor summary: MemoryPrompt is a method that enhances language models with an auxiliary network that provides contextual information as soft prompts, improving performance on multiple fact updates and dialogue tasks without requiring finetuning or causing catastrophic forgetting.


Calibration of Deep Learning Classification Models in fNIRS

http://arxiv.org/abs/2402.15266v1

Compressor summary: The text discusses the importance of calibration for reliable deep learning-based brain activity classification using non-invasive fNIRS technology and provides three tips to improve it.


DEEM: Dynamic Experienced Expert Modeling for Stance Detection

http://arxiv.org/abs/2402.15264v1

Compressor summary: DEEM is a method that improves stance detection by using large language models to simulate generalizable and reliable experts in a semi-parametric way.


Dynamic Memory Based Adaptive Optimization

http://arxiv.org/abs/2402.15262v1

Compressor summary: The paper introduces RLLC, a method to improve the performance of optimizers with memory by dynamically adjusting their learning laws using linear combinations of memory units.


Optimal Transport for Structure Learning Under Missing Data

http://arxiv.org/abs/2402.15255v1

Compressor summary: The paper proposes a score-based algorithm using optimal transport to learn causal structure from missing data more effectively than existing methods.


Chitchat as Interference: Adding User Backstories to Task-Oriented Dialogues

http://arxiv.org/abs/2402.15248v1

Compressor summary: The authors use few-shot prompting with Llama-2-70B to enrich the MultiWOZ dataset with user backstories, creating realistic chitchat scenarios that challenge task-oriented dialogue systems and improve their resilience.


GS-EMA: Integrating Gradient Surgery Exponential Moving Average with Boundary-Aware Contrastive Learning for Enhanced Domain Generalization in Aneurysm Segmentation

http://arxiv.org/abs/2402.15239v1

Compressor summary: The paper proposes a new method to segment cerebral aneurysms in different medical images by learning domain-invariant features using gradient surgery exponential moving average and boundary-aware contrastive learning.


GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?

http://arxiv.org/abs/2402.15238v1

Compressor summary: GPT-HateCheck is a framework that uses large language models to generate diverse and realistic online hate detection test cases with high quality.