This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-26, generated by Compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2402.15509v1
Compressor summary: The paper introduces FlowMDM, a diffusion-based model that generates long, continuous human motion sequences guided by textual descriptions, improving accuracy, realism, and smoothness in virtual reality, gaming, and robotics applications.
http://arxiv.org/abs/2402.15506v1
Compressor summary: The paper introduces AgentOhana, a solution to standardize and unify agent trajectories from diverse environments for training LLM-powered autonomous agents, and xLAM-v0.1, a large action model tailored for AI agents.
http://arxiv.org/abs/2402.15505v1
Compressor summary: The paper proposes a method to improve weak-to-strong generalization by using multiple specialized teachers instead of one generalist teacher for co-supervising a strong student model.
http://arxiv.org/abs/2402.15504v1
Compressor summary: This paper presents Gen4Gen, a dataset creation pipeline using generative models to combine multiple personalized concepts into complex images with text descriptions, and introduces MyCanvas, a new benchmark dataset for multi-concept personalization in text-to-image diffusion models. It also proposes a comprehensive metric (CP-CLIP and TI-CLIP) to evaluate the performance of such methods.
http://arxiv.org/abs/2402.15492v1
Compressor summary: The text proposes a novel passive and automated system that can detect and localize damages in structures using cheap sensors and a mechanics-informed autoencoder, which learns the undamaged state's response characteristics.
http://arxiv.org/abs/2402.15491v1
Compressor summary: API-BLEND is a large corpus for training and testing language models on tool and external API use in real-world scenarios.
http://arxiv.org/abs/2402.15490v1
Compressor summary: The text discusses the different types of Convolutional Neural Networks (CNNs) for computer vision tasks, comparing their structures, characteristics, strengths, weaknesses, and applications, as well as exploring related research fields and platforms.
http://arxiv.org/abs/2402.15480v1
Compressor summary: This study shows how integrating retinotopic mapping into deep CNNs can improve image classification and localization, mimicking a key feature of human vision.
http://arxiv.org/abs/2402.15478v1
Compressor summary: The paper investigates whether Transformers can reliably approximate continuous functions, finding that they struggle and depend on piecewise constant approximations with large gaps, raising questions about their universality as function approximators.
http://arxiv.org/abs/2402.15477v1
Compressor summary: The paper proposes a weakly supervised bias mitigation strategy for continuous sensitive variables, based on endogeneity from econometrics, that requires little expert input and works with model agnostic methods.
http://arxiv.org/abs/2402.15473v1
Compressor summary: The paper proposes a new reward modeling technique that uses domain knowledge to reduce the amount of human preference annotation needed in reinforcement learning from human feedback (RLHF) tasks, demonstrating its effectiveness in opinion summarization with a small dataset and releasing new datasets for future research.
http://arxiv.org/abs/2402.15472v1
Compressor summary: The paper proposes a filtering algorithm that uses submodular objective functions to select high-quality rules for weak supervision from automatically induced rules, improving semi-supervised text classification performance.
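The core mechanism behind such filtering is greedy maximization of a submodular objective, where each added rule is scored by its marginal gain. The sketch below uses a simple coverage objective over example ids; this is a generic illustration of greedy submodular selection, not the paper's specific objective functions, which also account for rule quality.

```python
def greedy_select(rule_coverage, k):
    """Greedily pick up to k rules maximizing a coverage-style
    submodular objective (marginal gain = newly covered examples)."""
    selected, covered = [], set()
    candidates = set(rule_coverage)
    for _ in range(min(k, len(candidates))):
        # Pick the rule with the largest marginal coverage gain.
        best = max(candidates, key=lambda r: len(rule_coverage[r] - covered))
        selected.append(best)
        covered |= rule_coverage[best]
        candidates.remove(best)
    return selected, covered

# Hypothetical rules mapped to the example ids they label.
rules = {"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {1, 2}}
selected, covered = greedy_select(rules, k=2)
```

Because coverage is submodular, this greedy procedure enjoys the classical (1 - 1/e) approximation guarantee relative to the best size-k subset.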
http://arxiv.org/abs/2402.15469v1
Compressor summary: This paper proposes a method to test how well panoptic segmentation models work in assisted and automated driving scenarios, by generating realistic noisy camera data and measuring their performance with different image quality metrics.
http://arxiv.org/abs/2402.15449v1
Compressor summary: Echo embeddings improve text embedding extraction from autoregressive models by repeating the input and using information from later tokens.
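The echo trick works because in a causal model, tokens in a repeated second copy of the input can attend to the entire first copy. The toy sketch below substitutes a running-mean "causal encoder" for a real autoregressive model (an assumption for self-containedness) and pools only over the second occurrence of the sequence.

```python
import numpy as np

def causal_encode(token_vecs):
    """Toy causal encoder: each token's hidden state is the running mean
    of token vectors seen so far, mimicking left-to-right context."""
    csum = np.cumsum(token_vecs, axis=0)
    counts = np.arange(1, len(token_vecs) + 1)[:, None]
    return csum / counts

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))  # 6 tokens, 4-dim vectors

# Echo embedding: feed the sequence twice, then mean-pool only over the
# second copy, whose states condition on the full first pass.
echoed = np.vstack([tokens, tokens])
hidden = causal_encode(echoed)
embedding = hidden[len(tokens):].mean(axis=0)
```

Pooling over the first copy alone would average states that never saw later tokens; every state in the echoed second half has seen the whole input.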
http://arxiv.org/abs/2402.15448v1
Compressor summary: This paper reviews computer vision techniques for multimedia geolocation and their potential to help fight human trafficking by locating illegal content more quickly and accurately.
http://arxiv.org/abs/2402.15445v1
Compressor summary: The text discusses how iterated belief revision can make some information acquisition methods irrelevant, and explores the complexity of shortening sequences of lexicographic revisions.
http://arxiv.org/abs/2402.15430v1
Compressor summary: The paper proposes a hierarchical invariant representation framework for robust and interpretable vision systems that can handle various tasks and applications.
http://arxiv.org/abs/2402.15429v1
Compressor summary: ProTIP is a framework to evaluate the robustness of Text-to-Image models using probabilistic analysis and sequential testing.
http://arxiv.org/abs/2402.15422v1
Compressor summary: The study uses large language models to generate patient summaries from doctors' notes, improves their performance by reducing hallucinations, and evaluates their faithfulness and quality using medical experts and quantitative metrics.
http://arxiv.org/abs/2402.15415v1
Compressor summary: The paper uses mathematical tools to study how attention and token values affect token cluster dynamics in Transformers, revealing similarities and differences depending on parameter variations.
http://arxiv.org/abs/2402.15414v1
Compressor summary: The paper explores how combining pre-trained LoRA modules improves generalization to unseen tasks, especially in low-shot settings, and evaluates two methods for composing them.
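One common way to compose pre-trained LoRA modules is a weighted sum of their low-rank updates merged into the frozen base weight. The sketch below shows that linear-merging idea on random matrices; the shapes, weights, and two-module setup are illustrative assumptions, not the paper's evaluated methods.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (hypothetical)

W = rng.normal(size=(d, d))  # frozen base weight

# Two hypothetical pre-trained LoRA modules, each a low-rank pair
# (A: d x r down-projection, B: r x d up-projection in this layout).
loras = [(rng.normal(size=(d, r)), rng.normal(size=(r, d))) for _ in range(2)]

# Linear composition: W' = W + sum_i w_i * A_i @ B_i.
weights = [0.5, 0.5]
W_merged = W + sum(w * A @ B for w, (A, B) in zip(weights, loras))
```

The merged weight stays the same size as the base model, so composing modules adds no inference cost.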
http://arxiv.org/abs/2402.15413v1
Compressor summary: G-RepsNet is a lightweight, expressive equivariant network that leverages tensor polynomials to achieve group equivariance for arbitrary matrix groups, performing well on diverse tasks with group symmetries, including image classification and physics simulations.
http://arxiv.org/abs/2402.15411v1
Compressor summary: The paper proposes a new algorithm for online learning in contextual bandits that combines Bayesian and worst-case analysis to achieve better regret guarantees without requiring Bayesian assumptions.
http://arxiv.org/abs/2402.15406v1
Compressor summary: The paper presents a new method to quantify uncertainty in DeepONet regression using conformal prediction and shows its effectiveness on different examples.
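Split conformal prediction, the general recipe the paper builds on, turns any point predictor into one with calibrated intervals: score held-out residuals, take a finite-sample-corrected quantile, and widen predictions by it. The sketch below uses a deliberately imperfect toy predictor standing in for a trained DeepONet (the operator-learning setting is more involved).

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    return 2.0 * x  # toy point predictor (stand-in for a trained model)

# Held-out calibration data drawn from the true process.
x_cal = rng.uniform(0, 1, 500)
y_cal = 2.0 * x_cal + rng.normal(0, 0.1, 500)

# Nonconformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - model(x_cal))
alpha = 0.1  # target 90% coverage
n = len(scores)
# Finite-sample-corrected quantile.
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# Conformal prediction interval for a new input: [f(x) - q, f(x) + q].
x_new = 0.5
lo, hi = model(x_new) - q, model(x_new) + q
```

Under exchangeability of calibration and test points, such intervals cover the true value with probability at least 1 - alpha, regardless of the underlying model.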
http://arxiv.org/abs/2402.15404v1
Compressor summary: The paper introduces a new self-supervised method for learning representations from diverse time series datasets using contrastive pretraining and interpolation, which improves performance on classification tasks in low-data regimes.
http://arxiv.org/abs/2402.15399v1
Compressor summary: The paper proposes DR-LSVI-UCB, an efficient algorithm for off-dynamics RL using online DRMDPs with linear function approximation, and shows its effectiveness in various scenarios.
http://arxiv.org/abs/2402.15398v1
Compressor summary: The research introduces TransFlower, an explainable deep learning model that predicts urban commuting patterns using attention mechanisms and a geospatial encoder to capture complex flows and interactions in city development.
http://arxiv.org/abs/2402.15393v1
Compressor summary: NeuralThink is a new recurrent architecture that can extrapolate to both symmetrical and asymmetrical tasks with different input and output dimensionalities, outperforming previous Deep Thinking methods.
http://arxiv.org/abs/2402.15392v1
Compressor summary: The paper introduces a new approach for estimating the feasible reward set of an expert agent from offline data using two efficient algorithms that handle challenges of the offline setting.
http://arxiv.org/abs/2402.15391v1
Compressor summary: Genie is a generative model that can create diverse virtual worlds based on text and images, and it can be controlled through actions without needing labeled data or specific domain knowledge.
http://arxiv.org/abs/2402.15390v1
Compressor summary: This paper investigates self-repair, a phenomenon where large language models change their behavior to compensate for ablated components, and identifies two mechanisms behind it.
http://arxiv.org/abs/2402.15374v1
Compressor summary: The paper proposes a novel method for detecting outliers in visual recognition by predicting an outlier class along with K groundtruth classes and using a new anomaly score that combines uncertainty and negative objectness.
http://arxiv.org/abs/2402.15370v1
Compressor summary: The paper proposes a dual encoder model (D2E2S) that leverages syntactic and semantic information in aspect sentiment triple extraction task, achieving state-of-the-art results.
http://arxiv.org/abs/2402.15352v1
Compressor summary: The paper surveys and compares various supervised and unsupervised learning methods for image denoising, emphasizing recent developments in supervised learning and pointing out the lack of normalization equivariance in most approaches.
http://arxiv.org/abs/2402.15351v1
Compressor summary: AutoMMLab is an LLM-powered AutoML system that automates the entire end-to-end model production workflow for computer vision tasks using language instructions.
http://arxiv.org/abs/2402.15347v1
Compressor summary: The paper proposes a new safe Bayesian optimization method that uses an information-theoretic exploration criterion to select informative and safe parameters, without needing explicit hyperparameters or domain discretization.
http://arxiv.org/abs/2402.15345v1
Compressor summary: The authors propose a new probability density model that can fit complex 1D densities and achieve better performance than a previous deep factorized model with similar computational cost, and apply it to a compression task.
http://arxiv.org/abs/2402.15343v1
Compressor summary: The paper introduces NuNER, a compact language model that uses large language models for named entity recognition and can be fine-tuned to solve downstream NER problems efficiently and effectively.
http://arxiv.org/abs/2402.15337v1
Compressor summary: The paper explores how to learn conceptual spaces from large language models using rankings and perceptual features, and compares pointwise and pairwise ranking strategies.
http://arxiv.org/abs/2402.15332v1
Compressor summary: The authors propose using category theory to bridge the gap between specifying constraints for deep learning models and their implementations, recovering geometric deep learning constraints and encoding standard computer science concepts.
http://arxiv.org/abs/2402.15328v1
Compressor summary: The paper proposes a new way to group tasks in multitask learning that is more theoretically sound, flexible, and effective across diverse domains.
http://arxiv.org/abs/2402.15326v1
Compressor summary: The paper studies oversmoothing in diffusion-based GNNs using operator semigroup theory, proves a link to the ergodicity of the diffusion operator, and proposes an ergodicity-breaking term to mitigate it.
http://arxiv.org/abs/2402.15322v1
Compressor summary: The paper presents a method for optimal transport over Lie groups, particularly SE(2), with applications in image analysis and improved interpolation of orientation fields.
http://arxiv.org/abs/2402.15321v1
Compressor summary: The report outlines a 3D scene understanding workshop with a challenge, dataset, evaluation, and winning methods.
http://arxiv.org/abs/2402.15319v1
Compressor summary: The GPTVQ method improves neural network quantization by increasing dimensionality, compressing codebooks, and using information from the Hessian, achieving state-of-the-art results on LLMs with efficient computation.
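The "increasing dimensionality" idea refers to quantizing weights as multi-dimensional sub-vectors against a learned codebook rather than scalar-by-scalar. The minimal sketch below clusters 2-dim weight sub-vectors with a tiny k-means; it illustrates plain vector quantization only, not GPTVQ's Hessian-weighted updates or codebook compression.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))  # toy weight matrix

d = 2                    # quantization dimensionality (sub-vector length)
vecs = W.reshape(-1, d)  # 128 two-dim sub-vectors

# Tiny k-means to fit an 8-entry codebook over the sub-vectors.
codebook = vecs[rng.choice(len(vecs), 8, replace=False)].copy()
for _ in range(10):
    dist = ((vecs[:, None, :] - codebook[None]) ** 2).sum(-1)
    assign = dist.argmin(1)
    for c in range(len(codebook)):
        if (assign == c).any():
            codebook[c] = vecs[assign == c].mean(0)

# Replace each sub-vector with its nearest codebook entry.
W_q = codebook[assign].reshape(W.shape)
err = np.abs(W - W_q).mean()
```

Storage drops to one 3-bit index per sub-vector plus the small codebook, at the cost of the reconstruction error `err`.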
http://arxiv.org/abs/2402.15315v1
Compressor summary: This study explores the expressivity of ReLU neural networks, investigating their minimal depth representation for sum and max operations, and the connection with a conjecture about representing continuous piecewise linear functions.
http://arxiv.org/abs/2402.15313v1
Compressor summary: ArabianGPT is a series of transformer-based models designed for Arabic that improve performance on tasks like sentiment analysis and summarization when fine-tuned.
http://arxiv.org/abs/2402.15309v1
Compressor summary: The paper proposes a model (MATTE) for counterfactual generation that can handle domain-varying dependence between content and style latent variables, using relative sparsity of influences to identify the latent variables and achieving state-of-the-art performance in unsupervised style transfer.
http://arxiv.org/abs/2402.15307v1
Compressor summary: The paper presents a novel tokenized representation of online handwriting that improves vision-language models' performance on handwriting recognition tasks without changing their architecture.
http://arxiv.org/abs/2402.15302v1
Compressor summary: This study examines how large language models can be tricked into producing harmful or unethical content when asked to generate instructions, and introduces a dataset for testing this issue.
http://arxiv.org/abs/2402.15301v1
Compressor summary: The proposed method uses large language models and scientific literature to deduce causal relationships in general causal graph recovery tasks.
http://arxiv.org/abs/2402.15300v1
Compressor summary: CLIP-Guided Decoding (CGD) is a training-free method to reduce object hallucination in large vision-language models by using CLIP similarity to guide the model's decoding process and improve visual grounding with images.
http://arxiv.org/abs/2402.15297v1
Compressor summary: The paper presents a semi-supervised crowd-counting model that treats pixel-wise density as a probability distribution rather than a single value, combining a distribution matching loss, density tokens, and interleaving consistency learning to outperform competitors on four datasets with different labeled ratios.
http://arxiv.org/abs/2402.15290v1
Compressor summary: LDNN is a new neural network model that combines continuous state space models with multi-input and multi-output to achieve efficient long-sequence modeling with reduced time complexity and improved flexibility.
http://arxiv.org/abs/2402.15289v1
Compressor summary: DiffusionABSA is a novel diffusion model that progressively extracts aspects and estimates their boundaries in sentiment analysis using a denoising neural network with syntax-aware temporal attention.
http://arxiv.org/abs/2402.15284v1
Compressor summary: This paper proposes a deep learning architecture, Spatiotemporal Observer, that incorporates dynamical system knowledge for theoretical guarantees and improved predictions of high dimensional data dynamics.
http://arxiv.org/abs/2402.15283v1
Compressor summary: The paper proposes an improvement for model-based reinforcement learning agents that fine-tunes their state estimates via iterative inference at decision time, yielding better reconstruction accuracy and task performance, especially in partially observable environments and when less training precedes evaluation.
http://arxiv.org/abs/2402.15274v1
Compressor summary: The text discusses how users might strategically choose whether to participate in predicting outcomes based on a learned classifier, and proposes a framework for learning from such self-selected populations.
http://arxiv.org/abs/2402.15273v1
Compressor summary: This paper presents an optimization pipeline for visual pose estimation on mini drones using neural architecture search and efficient software kernels.
http://arxiv.org/abs/2402.15272v1
Compressor summary: The paper proposes a novel framework, EMIFF, for vehicle-infrastructure cooperative 3D object detection in autonomous driving, addressing pose errors and information loss by fusing multi-view images and compressing features for efficient communication.
http://arxiv.org/abs/2402.15270v1
Compressor summary: Smoothed Graph Contrastive Learning (SGCL) is a new method for aligning node representations that uses proximity information and subgraph batches to improve performance on large-scale graphs.
http://arxiv.org/abs/2402.15268v1
Compressor summary: MemoryPrompt is a method that enhances language models with an auxiliary network that provides contextual information as soft prompts, improving performance on multiple fact updates and dialogue tasks without requiring finetuning or causing catastrophic forgetting.
http://arxiv.org/abs/2402.15266v1
Compressor summary: The text discusses the importance of calibration for reliable deep learning-based brain activity classification using non-invasive fNIRS technology and provides three tips to improve it.
http://arxiv.org/abs/2402.15264v1
Compressor summary: DEEM is a method that improves stance detection by using large language models to simulate generalizable and reliable experts in a semi-parametric way.
http://arxiv.org/abs/2402.15262v1
Compressor summary: The paper introduces RLLC, a method to improve the performance of optimizers with memory by dynamically adjusting their learning laws using linear combinations of memory units.
http://arxiv.org/abs/2402.15255v1
Compressor summary: The paper proposes a score-based algorithm using optimal transport to learn causal structure from missing data more effectively than existing methods.
http://arxiv.org/abs/2402.15248v1
Compressor summary: The authors use few-shot prompting with Llama-2-70B to enrich the MultiWOZ dataset with user backstories, creating realistic chitchat scenarios that challenge task-oriented dialogue systems and improve their resilience.
http://arxiv.org/abs/2402.15239v1
Compressor summary: The paper proposes a new method to segment cerebral aneurysms in different medical images by learning domain-invariant features using gradient surgery exponential moving average and boundary-aware contrastive learning.
http://arxiv.org/abs/2402.15238v1
Compressor summary: GPT-HateCheck is a framework that uses large language models to generate diverse and realistic online hate detection test cases with high quality.