arxiv compressed, 2024-03-08

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-08 generated by the compressor, my personal LLM-based project.


Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

http://arxiv.org/abs/2403.04765v1

Compressor summary: The paper proposes a new method that improves the efficiency and accuracy of semi-dense matching across images, with potential applications in image retrieval and 3D reconstruction.


Minimizing the Thompson Sampling Regret-to-Sigma Ratio (TS-RSR): a provably efficient algorithm for batch Bayesian Optimization

http://arxiv.org/abs/2403.04764v1

Compressor summary: The paper proposes a new batch Bayesian Optimization method using Thompson Sampling that minimizes redundancy and has low regret, and shows superior performance on nonconvex test functions.
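The regret-to-sigma idea can be sketched as an acquisition score: draw a Thompson sample from the posterior, measure each point's sampled regret against the best sampled value, and divide by the posterior standard deviation so uncertain points are penalized less. This is a hypothetical illustration of the ratio, not the paper's exact rule; `ts_rsr_score` and its arguments are invented names for the sketch.

```python
import numpy as np

def ts_rsr_score(mu, sigma, rng):
    """Hedged sketch of a regret-to-sigma acquisition for batch BO
    (maximization): Thompson-sample each candidate, compute its regret
    against the best sampled value, and normalize by the posterior std.
    Candidates minimizing this ratio are selected for the batch."""
    sample = rng.normal(mu, sigma)          # one Thompson sample per candidate
    regret = sample.max() - mu              # sampled regret of each candidate
    return regret / np.maximum(sigma, 1e-9) # small sigma => regret counts fully

rng = np.random.default_rng(0)
scores = ts_rsr_score(np.array([0.0, 1.0]), np.array([0.1, 0.1]), rng)
```

Minimizing the ratio trades off exploitation (low regret) against exploration (high sigma shrinks the score), which is how the summary's "minimizes redundancy" reads for batch selection.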


BloomGML: Graph Machine Learning through the Lens of Bilevel Optimization

http://arxiv.org/abs/2403.04763v1

Compressor summary: The paper shows how different graph learning techniques can be viewed as special cases or simplifications of bilevel optimization and presents a flexible class of energy functions for GNN message-passing layers with residual error analysis, called BloomGML.


Lifelong Intelligence Beyond the Edge using Hyperdimensional Computing

http://arxiv.org/abs/2403.04759v1

Compressor summary: The paper introduces LifeHD, an on-device lifelong learning system for IoT devices using hyperdimensional computing, which improves unsupervised clustering accuracy and energy efficiency compared to existing methods.


That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation

http://arxiv.org/abs/2403.04755v1

Compressor summary: The paper presents a method to estimate 3D poses from LiDAR scans using minimal storage and clustering, while maintaining accurate localization with an object-matching network.


GNN-VPA: A Variance-Preserving Aggregation Strategy for Graph Neural Networks

http://arxiv.org/abs/2403.04747v1

Compressor summary: The paper proposes a new aggregation function for graph neural networks (GNNs) that preserves expressivity and improves learning dynamics, potentially leading to self-normalizing GNNs.
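One plausible reading of variance-preserving aggregation (an assumption for this sketch, not confirmed by the summary) is to sum neighbor messages and rescale by 1/sqrt(n): for i.i.d. zero-mean messages, plain sum inflates variance by n and mean deflates it by 1/n, while the sqrt scaling keeps it constant.

```python
import numpy as np

def vpa_aggregate(messages):
    """Variance-preserving aggregation sketch: sum the n neighbor
    messages, then divide by sqrt(n). For i.i.d. zero-mean messages
    the output variance matches the input variance."""
    n = len(messages)
    return np.sum(messages, axis=0) / np.sqrt(n)

rng = np.random.default_rng(0)
msgs = rng.normal(size=(16, 8))  # 16 neighbors, 8-dim features
out = vpa_aggregate(msgs)        # shape (8,), variance roughly preserved
```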


LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

http://arxiv.org/abs/2403.04746v1

Compressor summary: The paper proposes a biologically inspired method called simulated trial and error (STE) that improves the accuracy and reliability of tool use by large language models, outperforming existing methods like GPT-4.


SQ Lower Bounds for Non-Gaussian Component Analysis with Weaker Assumptions

http://arxiv.org/abs/2403.04744v1

Compressor summary: The paper investigates NGCA's complexity in the SQ model and shows that only the moment-matching condition is necessary for hardness, not the chi-squared condition.


I Can't Believe It's Not Scene Flow!

http://arxiv.org/abs/2403.04739v1

Compressor summary: The paper proposes a new evaluation protocol for scene flow methods that accounts for object size, speed, and class, and demonstrates its effectiveness with a simple but powerful baseline method called TrackFlow.


SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

http://arxiv.org/abs/2403.04735v1

Compressor summary: SnapNTell is a new benchmark for entity-centric visual question answering that challenges large language models to provide accurate and detailed responses about various entities across 22 categories.


How Far Are We from Intelligent Visual Deductive Reasoning?

http://arxiv.org/abs/2403.04732v1

Compressor summary: The text explores vision-based deductive reasoning in VLMs using Raven's Progressive Matrices and finds that current state-of-the-art models struggle with understanding complex visual patterns.


Masked Capsule Autoencoders

http://arxiv.org/abs/2403.04724v1

Compressor summary: MCAE is a new Capsule Network model that uses self-supervised pretraining with masked image modelling to improve performance on complex data tasks.


Rethinking of Encoder-based Warm-start Methods in Hyperparameter Optimization

http://arxiv.org/abs/2403.04720v1

Compressor summary: The research introduces a new encoder-based model for representing tabular datasets in meta-learning tasks, comparing it with Dataset2Vec and highlighting the importance of task-specific representations.


Common 7B Language Models Already Possess Strong Math Capabilities

http://arxiv.org/abs/2403.04706v1

Compressor summary: The paper demonstrates that a large language model can perform well on math benchmarks with proper pre-training and data scaling, but struggles to reliably generate correct answers without them.


ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes

http://arxiv.org/abs/2403.04701v1

Compressor summary: The paper proposes a method to generate diverse object-to-background changes using text-to-image, image-to-text, and image-to-segment models, and evaluates the robustness of vision-based models against these changes.


Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

http://arxiv.org/abs/2403.04700v1

Compressor summary: This paper explores the long-tail distribution issue in multiple object tracking data and proposes two data augmentation strategies to mitigate its effects on performance.


AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

http://arxiv.org/abs/2403.04697v1

Compressor summary: The paper proposes AUFormer, a new method for facial action unit detection using parameter-efficient transfer learning and a novel loss function, achieving state-of-the-art performance without extra data.


Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

http://arxiv.org/abs/2403.04696v1

Compressor summary: The paper proposes a method to detect hallucinations and fact-check claims in large language models using token-level uncertainty scores, which improve over baselines and are comparable to external fact-checking tools.
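A minimal example of a token-level uncertainty score is the predictive entropy of each token's output distribution; the paper's actual quantification methods may differ, and `token_entropies` is an illustrative name.

```python
import numpy as np

def token_entropies(logits):
    """Per-token predictive entropy from a (seq_len, vocab) logit
    matrix: softmax each row, then compute -sum(p * log p).
    High-entropy tokens are candidate hallucination spans."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)
```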


Analysis of Systems' Performance in Natural Language Processing Competitions

http://arxiv.org/abs/2403.04693v1

Compressor summary: The text describes a universal evaluation methodology for scientific and technological collaborative competitions, which can handle classification and regression problems, account for different difficulties, and provide more accurate performance comparisons.


PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

http://arxiv.org/abs/2403.04692v1

Compressor summary: PixArt-Σ is a more efficient text-to-image diffusion model that generates higher quality 4K images with better alignment to user prompts and uses less data and parameters than previous models.


Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level

http://arxiv.org/abs/2403.04690v1

Compressor summary: The paper proposes efficient kernels for neighborhood attention, a self-attention variant that limits token attention to nearby tokens, improving latency and reducing memory footprint.
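The restriction to nearby tokens can be shown with a naive 1D sketch: each query attends only to keys within a fixed radius, so cost is O(n * window) rather than O(n^2). This is an illustration of the attention pattern only; the paper's contribution is implementing it efficiently at the CUDA threadblock level.

```python
import numpy as np

def neighborhood_attention(q, k, v, radius=1):
    """Naive 1D neighborhood attention: token i attends to tokens
    in [i - radius, i + radius], clipped to the sequence bounds."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ v[lo:hi]   # softmax-weighted sum over the window
    return out
```

With `radius >= n` this reduces to ordinary full self-attention.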


Greater than the sum of its parts: The role of minority and majority status in collaborative problem-solving communication

http://arxiv.org/abs/2403.04671v1

Compressor summary: The text explores how sociocognitive linguistic patterns differ by race/ethnicity and gender in collaborative problem-solving tasks and discusses the implications of diversity on communication and collaboration.


End-to-end Conditional Robust Optimization

http://arxiv.org/abs/2403.04670v1

Compressor summary: Contextual Optimization (CO) combines machine learning and optimization to solve decision problems under uncertainty; the paper's end-to-end Conditional Robust Optimization (CRO) approach pairs uncertainty quantification with robust optimization, achieving high-quality conditional coverage via differentiable optimization methods.


The Social Impact of Generative AI: An Analysis on ChatGPT

http://arxiv.org/abs/2403.04667v1

Compressor summary: The paper examines the social impacts of ChatGPT and other generative AI models, considering both their benefits and risks for various sectors and proposing ways to promote ethical and human-centered AI development.


Telecom Language Models: Must They Be Large?

http://arxiv.org/abs/2403.04666v1

Compressor summary: This paper evaluates Phi-2, a small language model that can understand and answer questions about telecom standards with high accuracy by using a Retrieval-Augmented Generation approach.


Dynamic Cross Attention for Audio-Visual Person Verification

http://arxiv.org/abs/2403.04661v1

Compressor summary: The paper proposes a Dynamic Cross-Attention (DCA) model for audio-visual identity verification that adapts to strong or weak complementary relationships between audio and visual features, achieving state-of-the-art results on the VoxCeleb1 dataset.


Chain of Thought Explanation for Dialogue State Tracking

http://arxiv.org/abs/2403.04656v1

Compressor summary: The paper proposes a model called CoTE for dialogue state tracking that generates explanations to improve accuracy and reliability in slot value determination.


Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention

http://arxiv.org/abs/2403.04654v1

Compressor summary: The paper proposes a recursive cross-attentional model with BLSTMs for better audio-visual fusion in person verification, improving over unimodal systems.


Yi: Open Foundation Models by 01.AI

http://arxiv.org/abs/2403.04652v1

Compressor summary: The Yi model family is a series of language and multimodal models that achieve strong performance on various benchmarks due to their high-quality data and super-computing infrastructure.


Context-Based Multimodal Fusion

http://arxiv.org/abs/2403.04650v1

Compressor summary: The Context-Based Multimodal Fusion model combines modality fusion and data distribution alignment to solve complex multimodal tasks with reduced computational and training data requirements.


QAQ: Quality Adaptive Quantization for LLM KV Cache

http://arxiv.org/abs/2403.04643v1

Compressor summary: QAQ is a novel compression scheme that adapts to key and value caches in NLP models, allowing for more efficient deployment of LLMs with minimal performance loss.
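As a baseline for what KV-cache compression looks like, here is a plain uniform quantizer; QAQ's actual scheme adapts bit-width to keys vs. values and handles outliers, which this sketch does not attempt.

```python
import numpy as np

def quantize_cache(x, bits):
    """Uniform per-tensor quantization sketch for a KV-cache tensor:
    map values in [min, max] onto 2**bits - 1 integer levels."""
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale)
    return q.astype(np.uint8), lo, scale

def dequantize_cache(q, lo, scale):
    """Recover approximate float values; error is at most scale / 2."""
    return q * scale + lo
```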


Teaching Large Language Models to Reason with Reinforcement Learning

http://arxiv.org/abs/2403.04642v1

Compressor summary: The paper compares different algorithms that use reinforcement learning from human feedback to improve large language models' reasoning capabilities and finds Expert Iteration performs best with similar sample complexity to PPO.


CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

http://arxiv.org/abs/2403.04640v1

Compressor summary: The paper presents CAT, a method that enhances MLLMs for answering questions in complex audio-visual scenarios by aggregating clues, using a mixed multimodal dataset, and optimizing for non-ambiguity responses.


MaCmS: Magahi Code-mixed Dataset for Sentiment Analysis

http://arxiv.org/abs/2403.04639v1

Compressor summary: The paper introduces MaCMS, the first Magahi-Hindi-English code-mixed sentiment analysis dataset, and analyzes its structure, language preferences, and quality.


Entropy Aware Message Passing in Graph Neural Networks

http://arxiv.org/abs/2403.04636v1

Compressor summary: The paper proposes an entropy-aware message passing term for GNNs that prevents oversmoothing by preserving node diversity during aggregation.


Pix2Gif: Motion-Guided Diffusion for GIF Generation

http://arxiv.org/abs/2403.04634v1

Compressor summary: The paper introduces Pix2Gif, a model that generates GIFs from images using text and motion guidance, and uses a new warping module and perceptual loss to ensure quality and coherence.


Explaining Bayesian Optimization by Shapley Values Facilitates Human-AI Collaboration

http://arxiv.org/abs/2403.04629v1

Compressor summary: ShapleyBO is a framework that uses game-theoretic Shapley values to interpret and improve Bayesian optimization, enabling better exploration and personalization of wearable robotic devices by humans.


In-n-Out: Calibrating Graph Neural Networks for Link Prediction

http://arxiv.org/abs/2403.04605v1

Compressor summary: IN-N-OUT is a new method to improve the calibration of graph neural networks (GNNs) for predicting links by labeling edges with true/false labels based on GNN predictions, which leads to better embeddings and more accurate probabilities.


Contrastive Continual Learning with Importance Sampling and Prototype-Instance Relation Distillation

http://arxiv.org/abs/2403.04599v1

Compressor summary: The paper proposes CCLIS, a method that uses importance sampling to select replay buffers and prototype-instance relation distillation to maintain knowledge, which improves continual learning by reducing catastrophic forgetting.


Embodied Understanding of Driving Scenarios

http://arxiv.org/abs/2403.04593v1

Compressor summary: The paper introduces Embodied Language Model (ELM), a framework that enables autonomous agents to understand driving scenes with large spatial and temporal spans by incorporating space-aware pre-training and time-aware token selection.


Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace

http://arxiv.org/abs/2403.04588v1

Compressor summary: The paper proposes a 'Global Workspace', a brain-inspired multimodal representation that combines information across sensory modalities and transfers RL policies between them without extra training, improving agents' robustness and generalization.


Unbiased Estimator for Distorted Conics in Camera Calibration

http://arxiv.org/abs/2403.04583v1

Compressor summary: The paper proposes a new method for camera calibration using conic features based on moments that can overcome distortion limitations and improve accuracy.


Beyond Major Product Prediction: Reproducing Reaction Mechanisms with Machine Learning Models Trained on a Large-Scale Mechanistic Dataset

http://arxiv.org/abs/2403.04580v1

Compressor summary: The authors create a large dataset of organic reaction intermediates and train machine learning models to predict reaction pathways, impurities, and roles of catalysts and reagents.


Wiki-TabNER: Advancing Table Interpretation Through Named Entity Recognition

http://arxiv.org/abs/2403.04577v1

Compressor summary: The paper analyses a benchmark dataset for table interpretation tasks, finds it too simple, creates a new more realistic dataset, introduces a novel entity linking problem, and proposes a prompting framework to evaluate large language models on this task.


Machine learning and information theory concepts towards an AI Mathematician

http://arxiv.org/abs/2403.04571v1

Compressor summary: The text explores how deep learning lacks reasoning and uncertainty estimation skills compared to human mathematicians, and proposes using information theory to discover interesting conjectures in mathematics.


Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition

http://arxiv.org/abs/2403.04568v1

Compressor summary: The paper proposes a new algorithm for linear mixture MDPs that achieves better regret than previous methods by using a novel least square estimator and self-normalized concentration.


Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes

http://arxiv.org/abs/2403.04562v1

Compressor summary: The authors propose an event-based method for class-agnostic motion segmentation that works in complex large-scale outdoor environments and achieves state-of-the-art results on indoor and outdoor benchmarks.


Reducing self-supervised learning complexity improves weakly-supervised classification performance in computational pathology

http://arxiv.org/abs/2403.04558v1

Compressor summary: The authors explore self-supervised learning methods for breast cancer diagnosis using consumer-grade hardware and show that they can improve classification performance while reducing training time by 90%.


Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI

http://arxiv.org/abs/2403.04551v1

Compressor summary: The text introduces a fine-grained taxonomy of hardness types and a toolkit to benchmark Hardness Characterization Methods (HCMs) for ML model development.


Explainable Face Verification via Feature-Guided Gradient Backpropagation

http://arxiv.org/abs/2403.04549v1

Compressor summary: The paper proposes a new explanation approach for face recognition systems called FGGB, which generates precise saliency maps to interpret the system's decisions.


CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

http://arxiv.org/abs/2403.04547v1

Compressor summary: The study presents a novel algorithm (M4) to reduce biases in CLIP models by balancing data and analyzes its effects on different factors and metrics.


Improve Generalization Ability of Deep Wide Residual Network with A Suitable Scaling Factor

http://arxiv.org/abs/2403.04545v1

Compressor summary: The paper investigates how to choose a scaling factor ($\alpha$) for ResNets to avoid generalization issues and improve performance on various tasks.
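The role of the scaling factor can be shown with the standard scaled residual form, where α damps the residual branch; this is the generic construction, not a claim about the paper's specific analysis.

```python
import numpy as np

def scaled_residual_block(x, f, alpha):
    """Residual block with branch scaling: output = x + alpha * f(x).
    Small alpha keeps the skip path dominant, which is the knob the
    paper studies for generalization in deep wide ResNets."""
    return x + alpha * f(x)

x = np.array([1.0, 2.0])
y = scaled_residual_block(x, lambda t: 2.0 * t, alpha=0.5)  # -> [2.0, 4.0]
```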


Towards Automatic Composition of ASP Programs from Natural Language Specifications

http://arxiv.org/abs/2403.04541v1

Compressor summary: The paper presents a two-step system for generating ASP programs from natural language, using neural machine translation and CNL2ASP tool.


Hyperspectral unmixing for Raman spectroscopy via physics-constrained autoencoders

http://arxiv.org/abs/2403.04526v1

Compressor summary: Autoencoder neural networks improve unmixing accuracy, robustness, and efficiency in hyperspectral Raman spectroscopy for chemical composition analysis.


T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers

http://arxiv.org/abs/2403.04523v1

Compressor summary: T-TAME is a method to generate explanations for Vision Transformers and other neural networks in image classification tasks, achieving state-of-the-art performance with less computation than existing techniques.


Uncertainty-Aware Relational Graph Neural Network for Few-Shot Knowledge Graph Completion

http://arxiv.org/abs/2403.04521v1

Compressor summary: The paper proposes a novel framework, UFKGC, for uncertainty-aware few-shot knowledge graph completion that models uncertainty with Gaussian distributions and improves robustness to noise.


Uncovering the Deep Filter Bubble: Narrow Exposure in Short-Video Recommendation

http://arxiv.org/abs/2403.04511v1

Compressor summary: The study explores deep filter bubbles on short-video platforms, how they evolve over time, and what factors influence them, while suggesting ways to mitigate their negative effects.


Where does In-context Translation Happen in Large Language Models

http://arxiv.org/abs/2403.04510v1

Compressor summary: The study explores the point where large language models transition from learning in context to translating, and finds a "task recognition" layer where attention to context is no longer needed.


Finding Waldo: Towards Efficient Exploration of NeRF Scene Space

http://arxiv.org/abs/2403.04508v1

Compressor summary: The paper introduces a new concept of scene exploration with NeRFs, proposes three methods to efficiently discover inputs for novel view synthesis, and shows that the proposed EGPS method outperforms baselines.


NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems

http://arxiv.org/abs/2403.04507v1

Compressor summary: The text introduces a novel method for evaluating natural language preprocessing tools using a benchmarking system inspired by GLUE and applied to Polish.


Improving Matrix Completion by Exploiting Rating Ordinality in Graph Neural Networks

http://arxiv.org/abs/2403.04504v1

Compressor summary: ROGMC is a new method that uses cumulative preference propagation and interest regularization to incorporate rating ordinality in graph neural networks for matrix completion, leading to better recommendations.


What makes an image realistic?

http://arxiv.org/abs/2403.04493v1

Compressor summary: The text discusses the challenges of quantifying realism in generated data, proposes a new concept called universal critic, and argues that it is different from adversarial critics.


Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning

http://arxiv.org/abs/2403.04492v1

Compressor summary: The paper proposes two improvements for cross-domain few-shot classification: a parameter-efficient adaptation strategy and a variance-aware loss function, achieving better accuracy and efficiency than existing methods.


Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging

http://arxiv.org/abs/2403.04484v1

Compressor summary: The study compares the performance of ImageNet and RadImageNet in medical imaging classification, finding that both achieve similar results but ImageNet overfits more to confounders.


GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability

http://arxiv.org/abs/2403.04483v1

Compressor summary: The paper introduces GraphInstruct, a publicly released benchmark to test and improve LLMs' graph understanding, and two enhanced variants, GraphLM (instruction-tuned) and GraphLM+ (step mask training), that outperform other LLMs on 21 classical graph reasoning tasks.


On the Topology Awareness and Generalization Performance of Graph Neural Networks

http://arxiv.org/abs/2403.04482v1

Compressor summary: This paper characterizes the topology awareness of graph neural networks (GNNs) and studies its impact on generalization performance and fairness, showing that improving topology awareness can cause unfair generalization in some cases.


Do Large Language Model Understand Multi-Intent Spoken Language ?

http://arxiv.org/abs/2403.04481v1

Compressor summary: The study uses Large Language Models for understanding spoken language with multiple intentions, creating new datasets and metrics to evaluate their performance.


Hyperparameter Tuning MLPs for Probabilistic Time Series Forecasting

http://arxiv.org/abs/2403.04477v1

Compressor summary: The paper investigates how specific hyperparameters affect MLP performance in time series forecasting and introduces a large new dataset for this task.


TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

http://arxiv.org/abs/2403.04473v1

Compressor summary: TextMonkey is a text-centric large multimodal model that enhances performance and interpretability in tasks like DocVQA and scene text analysis with improvements on several benchmark datasets.


The Shutdown Problem: Three Theorems

http://arxiv.org/abs/2403.04471v1

Compressor summary: The shutdown problem asks how to create artificial agents that competently pursue goals yet shut down when needed, neither preventing nor causing the shutdown button to be pressed, with trade-offs that depend on the agent's patience.


A Survey of Graph Neural Networks in Real world: Imbalance, Noise, Privacy and OOD Challenges

http://arxiv.org/abs/2403.04468v1

Compressor summary: The text surveys existing Graph Neural Network (GNN) models that address challenges like data imbalance, noise, privacy, and out-of-distribution scenarios to improve their reliability and robustness in real-world applications.


Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset

http://arxiv.org/abs/2403.04460v1

Compressor summary: PEARL is a new conversational recommendation dataset with detailed persona and knowledge that improves recommendation quality and relevance.


Low-Resource Court Judgment Summarization for Common Law Systems

http://arxiv.org/abs/2403.04454v1

Compressor summary: The authors present CLSum, a dataset for summarizing multi-jurisdictional common law court judgments, and propose LLM-based methods for data augmentation, summary generation, and evaluation.


Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation

http://arxiv.org/abs/2403.04453v1

Compressor summary: Vlearn is an off-policy reinforcement learning algorithm that uses only a state-value function as the critic, making it efficient and robust in high-dimensional action spaces.


Feedback-Generation for Programming Exercises With GPT-4

http://arxiv.org/abs/2403.04449v1

Compressor summary: The paper evaluates GPT-4 Turbo's ability to provide feedback for student programming submissions, finding improvements over GPT-3.5 in correctness and structure but also noting some inconsistencies.


FRRI: a novel algorithm for fuzzy-rough rule induction

http://arxiv.org/abs/2403.04447v1

Compressor summary: The paper introduces a novel fuzzy rough rule induction algorithm (FRRI) that creates interpretable white box models by combining fuzzy and rough set theory, and shows its superior performance in accuracy and rule length compared to other methods.


Classist Tools: Social Class Correlates with Performance in NLP

http://arxiv.org/abs/2403.04445v1

Compressor summary: The text discusses how sociodemographic characteristics like socioeconomic status, ethnicity, and geography affect NLP performance and calls for their inclusion in future language technologies to avoid disadvantaging less-privileged groups.


Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser

http://arxiv.org/abs/2403.04444v1

Compressor summary: The paper proposes a new method (DDHPose) for 3D human pose estimation that disentangles the pose into length and direction, uses a hierarchical denoiser to model joint spatial and temporal information, and improves performance over previous methods.


FriendNet: Detection-Friendly Dehazing Network

http://arxiv.org/abs/2403.04443v1

Compressor summary: The paper proposes FriendNet, a novel architecture that combines image restoration and object detection to improve performance in adverse weather conditions for autonomous driving systems.


Cooperative Bayesian Optimization for Imperfect Agents

http://arxiv.org/abs/2403.04442v1

Compressor summary: The paper proposes a collaborative Bayesian optimization problem where two agents work together to optimize a black-box function with each controlling one variable, using strategic planning and a user model to find the global maximum efficiently.


StableDrag: Stable Dragging for Point-based Image Editing

http://arxiv.org/abs/2403.04437v1

Compressor summary: StableDrag is a new framework that improves point-based image editing by using a discriminative point tracking method and a confidence-based latent enhancement strategy to address the issues of inaccurate tracking and incomplete supervision.


Exploring the Influence of Dimensionality Reduction on Anomaly Detection Performance in Multivariate Time Series

http://arxiv.org/abs/2403.04429v1

Compressor summary: The paper evaluates how dimensionality reduction techniques improve unsupervised time series anomaly detection models' performance and efficiency across different datasets.


Promising and worth-to-try future directions for advancing state-of-the-art surrogates methods of agent-based models in social and health computational sciences

http://arxiv.org/abs/2403.04417v1

Compressor summary: The text discusses how to use surrogate models to reduce computational costs and increase efficiency for large-scale Agent-Based Models (ABMs) in Social Health Computational Sciences.


Exploring Continual Learning of Compositional Generalization in NLI

http://arxiv.org/abs/2403.04400v1

Compressor summary: The paper proposes a new challenge (C2Gen NLI) to test neural models' compositional inference abilities under continual learning, and analyzes how different algorithms and subtask ordering affect performance.


MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment

http://arxiv.org/abs/2403.04398v1

Compressor summary: The paper proposes a method called MAGR to reduce forgetting in continual assessment of diverse skills by aligning old and new features with quality scores.


Impacts of Color and Texture Distortions on Earth Observation Data in Deep Learning

http://arxiv.org/abs/2403.04385v1

Compressor summary: This paper studies how different visual characteristics of input Earth observation data affect land cover classification models' performance and finds that texture distortions have a greater impact than color distortions.


Acceleron: A Tool to Accelerate Research Ideation

http://arxiv.org/abs/2403.04382v1

Compressor summary: Acceleron is a tool that uses large language models to help researchers formulate novel research proposals and validate their motivation by identifying gaps in existing literature.


Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

http://arxiv.org/abs/2403.04381v1

Compressor summary: The paper proposes a method to adapt single-view hand pose estimation models to dual views without extra annotations, allowing them to work with different camera settings.


Video-Driven Animation of Neural Head Avatars

http://arxiv.org/abs/2403.04380v1

Compressor summary: The paper proposes a new method to animate realistic 3D head models using video input without requiring personal information, by using a neural network that translates expression features into animation parameters.


Computational Modelling of Plurality and Definiteness in Chinese Noun Phrases

http://arxiv.org/abs/2403.04376v1

Compressor summary: The paper studies how Chinese speakers omit noun markers based on context, builds a corpus with annotations, and tests various machine learning models to predict the missing markers' meanings.


From Graph to Word Bag: Introducing Domain Knowledge to Confusing Charge Prediction

http://arxiv.org/abs/2403.04369v1

Compressor summary: The FWGB approach uses a legal knowledge graph and attention mechanism to predict confusing criminal charges by focusing on constituent elements that distinguish them.


Learning to Remove Wrinkled Transparent Film with Polarized Prior

http://arxiv.org/abs/2403.04368v1

Compressor summary: The paper proposes a method to remove wrinkled transparent films from images using polarized cameras and neural networks, improving image quality and industrial recognition systems performance.


Enhancing Court View Generation with Knowledge Injection and Guidance

http://arxiv.org/abs/2403.04366v1

Compressor summary: The paper proposes a novel Knowledge Injection and Guidance (KIG) approach to improve court view generation using pretrained language models, achieving better results especially in handling responsive claims.


Multi-step Temporal Modeling for UAV Tracking

http://arxiv.org/abs/2403.04363v1

Compressor summary: MT-Track is a new efficient and streamlined framework for UAV tracking that uses temporal modeling in two steps: correlation map generation and refinement, to handle challenges like fast motion and small objects.


Spatiotemporal Pooling on Appropriate Topological Maps Represented as Two-Dimensional Images for EEG Classification

http://arxiv.org/abs/2403.04353v1

Compressor summary: The text proposes a new EEG-based motor imagery classification method using topological maps, spatial features, and spatiotemporal pooling to improve accuracy.


CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning

http://arxiv.org/abs/2403.04343v1

Compressor summary: The text proposes a novel algorithm for balancing tasks when tuning large multimodal models using visual instructions.


Explainable AI for Embedded Systems Design: A Case Study of Static Redundant NVM Memory Write Prediction

http://arxiv.org/abs/2403.04337v1

Compressor summary: The paper explores using eXplainable Artificial Intelligence (XAI) to identify redundant memory writes in embedded systems, which can improve performance and energy efficiency.


Measuring Meaning Composition in the Human Brain with Composition Scores from Large Language Models

http://arxiv.org/abs/2403.04325v1

Compressor summary: The Composition Score is a new model-based metric that quantifies the degree to which smaller language units combine into phrase- and sentence-level meanings, and it correlates with brain regions involved in different aspects of meaning composition.


Discriminative Probing and Tuning for Text-to-Image Generation

http://arxiv.org/abs/2403.04321v1

Compressor summary: The paper proposes a discriminative adapter for text-to-image generation that improves alignment between generated images and text prompts by enhancing the model's discriminative abilities.


Online Adaptation of Language Models with a Memory of Amortized Contexts

http://arxiv.org/abs/2403.04317v1

Compressor summary: The paper introduces MAC, an efficient and effective online adaptation framework for large language models that uses amortized feature extraction and memory-augmentation to store new information and answer questions without gradient updates.


Can Your Model Tell a Negation from an Implicature? Unravelling Challenges With Intent Encoders

http://arxiv.org/abs/2403.04314v1

Compressor summary: The text proposes a new evaluation toolkit for conversational systems that assesses semantic understanding by measuring negation and implicature, and suggests pre-training with augmented data to improve embedding models.


ALTO: An Efficient Network Orchestrator for Compound AI Systems

http://arxiv.org/abs/2403.04311v1

Compressor summary: ALTO is a network orchestrator that improves the efficiency and performance of compound AI systems, such as pipelines of language models, by streaming intermediate outputs between stages.
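The streaming idea behind ALTO can be sketched with plain Python generators (a simplified illustration, not the paper's system): a downstream stage starts consuming tokens before the upstream stage has finished producing them.

```python
def stage_generate(prompt):
    """First pipeline stage: yields tokens one at a time,
    standing in for an LLM's streaming output."""
    for token in prompt.upper().split():
        yield token

def stage_postprocess(tokens):
    """Second stage consumes tokens as they arrive, so work in the
    two stages overlaps instead of running strictly in sequence."""
    for t in tokens:
        yield t + "!"

# Chaining generators streams intermediate outputs between stages.
pipeline = stage_postprocess(stage_generate("hello streaming world"))
result = list(pipeline)
```

Because each stage is lazy, no stage has to wait for the full intermediate output of the previous one.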


AO-DETR: Anti-Overlapping DETR for X-Ray Prohibited Items Detection

http://arxiv.org/abs/2403.04309v1

Compressor summary: The paper proposes AO-DETR, a method to detect overlapping prohibited items in X-ray images using category-specific queries and edge localization, outperforming existing object detectors.


HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild

http://arxiv.org/abs/2403.04307v1

Compressor summary: HaluEval-Wild is a new benchmark for evaluating large language models' hallucinations in real-world user-LLM interactions by using challenging queries from existing datasets and categorizing them into five types.


Effectiveness Assessment of Recent Large Vision-Language Models

http://arxiv.org/abs/2403.04306v1

Compressor summary: This article evaluates the performance of large vision-language models (LVLMs) in specialized tasks like object detection and healthcare, as well as general tasks like reasoning and question answering, finding that they are not very effective in either domain.


LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking

http://arxiv.org/abs/2403.04303v1

Compressor summary: LORS reduces parameter usage in deep learning models by allowing stacked modules to share most of their parameters.
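The sharing scheme can be illustrated with a minimal NumPy sketch (hypothetical shapes and names, not the paper's code): every stacked layer uses one shared full-rank weight plus its own small low-rank residual, so per-layer parameters shrink dramatically.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank, n_layers = 64, 4, 6

# One full-rank weight shared by all stacked layers.
W_shared = rng.normal(size=(d, d))
# Each layer adds only a private low-rank residual A_i @ B_i.
residuals = [(rng.normal(size=(d, rank)), rng.normal(size=(rank, d)))
             for _ in range(n_layers)]

def layer_forward(x, i):
    A, B = residuals[i]
    W_i = W_shared + A @ B           # effective per-layer weight
    return np.maximum(x @ W_i, 0.0)  # ReLU, for illustration only

x = rng.normal(size=(2, d))
for i in range(n_layers):
    x = layer_forward(x, i)

# Shared matrix once plus small residuals per layer, versus
# n_layers independent full matrices.
shared_params = d * d + n_layers * 2 * d * rank
independent_params = n_layers * d * d
```

With these toy sizes the shared scheme uses 7,168 parameters against 24,576 for independent layers, while each layer still has a distinct effective weight.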


A$^{3}$lign-DFER: Pioneering Comprehensive Dynamic Affective Alignment for Dynamic Facial Expression Recognition with CLIP

http://arxiv.org/abs/2403.04294v1

Compressor summary: A$^{3}$lign-DFER is a method that aligns text and video in three aspects (affective, dynamic, and bidirectional) to improve CLIP's performance in recognizing facial expressions dynamically.


MKF-ADS: A Multi-Knowledge Fused Anomaly Detection System for Automotive

http://arxiv.org/abs/2403.04293v1

Compressor summary: The paper proposes MKF-ADS, an anomaly-based intrusion detection system for the CAN bus in intelligent transportation systems (ITSs), using spatial-temporal correlation with an attention mechanism and patch sparse-transformer modules to improve safety and security.


A challenge in A(G)I, cybernetics revived in the Ouroboros Model as one algorithm for all thinking

http://arxiv.org/abs/2403.04292v1

Compressor summary: The paper discusses AI challenges in image categorization and generation, proposes incorporating cybernetics and analog control processes for improved cognition and abstraction, and introduces the Ouroboros Model as a versatile algorithmic backbone for general cognition.


Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy

http://arxiv.org/abs/2403.04283v1

Compressor summary: Proxy-RLHF is a new method that lowers the computational cost of aligning large language models with human values by decoupling generation and alignment processes using a proxy model trained with reinforcement learning.


A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain

http://arxiv.org/abs/2403.04280v1

Compressor summary: The paper proposes a comprehensive benchmark for Arabic speech recognition in telephone conversations, considering dialectal diversity and call quality challenges.


Controllable Generation with Text-to-Image Diffusion Models: A Survey

http://arxiv.org/abs/2403.04279v1

Compressor summary: The text reviews controllable generation with text-to-image diffusion models, covering their theoretical foundations and practical advancements in various condition categories.


Active Generalized Category Discovery

http://arxiv.org/abs/2403.04272v1

Compressor summary: AGCD uses adaptive sampling to select valuable novel samples for labeling, and a stable label mapping algorithm to ensure consistent training across different stages, improving GCD performance in the low-labeling regime.


Competitive Facility Location under Random Utilities and Routing Constraints

http://arxiv.org/abs/2403.04264v1

Compressor summary: The paper studies a facility location problem with routing constraints in a competitive market, proposes new cuts for solving it, and develops exact and heuristic methods that outperform existing approaches.


Advancing Biomedical Text Mining with Community Challenges

http://arxiv.org/abs/2403.04261v1

Compressor summary: Biomedical text mining is a rapidly growing field that leverages advanced technology to analyze vast amounts of diverse text data from various sources, with community challenges promoting innovation and collaboration in Chinese biomedical research.


Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

http://arxiv.org/abs/2403.04258v1

Compressor summary: The paper proposes a test-time training method for zero-shot video object segmentation that predicts consistent depth maps and uses momentum-based weight initialization and looping-based training scheme to achieve better results.


Mastering Memory Tasks with World Models

http://arxiv.org/abs/2403.04253v1

Compressor summary: Recall to Imagine (R2I) integrates state space models into world models of reinforcement learning agents to improve long-term memory and credit assignment, achieving superhuman performance in complex memory tasks and faster convergence than DreamerV3.


UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities

http://arxiv.org/abs/2403.04247v1

Compressor summary: The paper introduces negative seed entities to improve Entity Set Expansion (ESE) for ultra-fine-grained semantic classes, proposes two frameworks to assess models in this task, and suggests three strategies to enhance model performance.


Regularized DeepIV with Model Selection

http://arxiv.org/abs/2403.04236v1

Compressor summary: The paper proposes Regularized DeepIV (RDIV), a minimax-oracle-free method that avoids limitations in instrumental variable estimation using machine learning and provides rigorous guarantees for the popular DeepIV method with Tikhonov regularization.


DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning

http://arxiv.org/abs/2403.04233v1

Compressor summary: DEEP-ICL is a new method that uses task definitions to improve in-context learning without relying on large language models.


Single-Image HDR Reconstruction Assisted Ghost Suppression and Detail Preservation Network for Multi-Exposure HDR Imaging

http://arxiv.org/abs/2403.04228v1

Compressor summary: The paper proposes a new method for reconstructing HDR images from multiple low dynamic range images in dynamic scenes, using single-frame HDR reconstruction and enhanced stop image techniques to preserve details and avoid ghosting artifacts.


3DTextureTransformer: Geometry Aware Texture Generation for Arbitrary Mesh Topology

http://arxiv.org/abs/2403.04225v1

Compressor summary: The 3DTextureTransformer is a novel framework that generates high-quality textures for arbitrary mesh topologies using a hybrid approach of geometric deep learning and StyleGAN-like architecture, achieving state-of-the-art performance in this domain.


Aligners: Decoupling LLMs and Alignment

http://arxiv.org/abs/2403.04224v1

Compressor summary: The paper proposes to train aligner models that can quickly and safely align any large language model for a given criterion using synthetic data.


Self-Evaluation of Large Language Model based on Glass-box Features

http://arxiv.org/abs/2403.04222v1

Compressor summary: The study explores how open-source Large Language Models can evaluate their own output using glass-box features, such as the softmax distribution, and shows that incorporating reference features improves quality evaluation.
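Softmax-based glass-box features of this kind can be sketched in a few lines (an illustrative simplification, not the paper's feature set): per-token confidence and entropy computed directly from the model's output distributions.

```python
import numpy as np

def glassbox_features(token_probs):
    """Confidence features from per-token softmax distributions.

    token_probs: array of shape (seq_len, vocab), rows summing to 1.
    Returns mean max-probability (higher = more confident) and
    mean entropy (lower = more confident).
    """
    p = np.asarray(token_probs)
    mean_max_prob = p.max(axis=1).mean()
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    return mean_max_prob, entropy.mean()

# A peaked distribution should read as more confident than a flat one.
peaked = np.full((5, 4), 0.05)
peaked[:, 0] = 0.85
flat = np.full((5, 4), 0.25)
conf_peaked = glassbox_features(peaked)
conf_flat = glassbox_features(flat)
```

Such features are "glass-box" because they need only the model's own output probabilities, not an external judge.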


Why Online Reinforcement Learning is Causal

http://arxiv.org/abs/2403.04221v1

Compressor summary: The paper explores how causal modelling can enhance reinforcement learning in online and offline settings, especially when learning from other agents' experiences.


Persona Extraction Through Semantic Similarity for Emotional Support Conversation Generation

http://arxiv.org/abs/2403.04212v1

Compressor summary: The text proposes a new framework called PESS that can automatically infer persona from dialogues, which helps improve emotional support in conversations with chatbots.


GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models

http://arxiv.org/abs/2403.04206v1

Compressor summary: The paper proposes a new algorithm for fast distributed training of deep learning models with better convergence and quality, and shows its effectiveness in theoretical and experimental studies.


On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models

http://arxiv.org/abs/2403.04204v1

Compressor summary: The paper surveys various methods and issues related to aligning AI models with human values, exploring historical context, mathematical essence, existing techniques, emerging topics, and future directions.


ACC-ViT : Atrous Convolution's Comeback in Vision Transformers

http://arxiv.org/abs/2403.04200v1

Compressor summary: The paper introduces Atrous Attention, a hybrid of regional and sparse attention, which adapts to local and global information while preserving hierarchical relations. It also presents ACC-ViT, a vision transformer backbone that achieves high accuracy on ImageNet-1K with fewer parameters than existing models.


CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images

http://arxiv.org/abs/2403.04198v1

Compressor summary: The paper proposes CN-RMA, a new method for detecting 3D indoor objects from multiple images, by combining 3D reconstruction and object detection networks to handle occlusion issues.


Large Language Models are In-Context Molecule Learners

http://arxiv.org/abs/2403.04197v1

Compressor summary: ICMA is a new method for adapting large language models to the molecule-caption translation task using context examples, improving their performance without extra pre-training or data.


Fill-and-Spill: Deep Reinforcement Learning Policy Gradient Methods for Reservoir Operation Decision and Control

http://arxiv.org/abs/2403.04195v1

Compressor summary: The text discusses applying deep reinforcement learning techniques to determine optimal reservoir operation policies, focusing on two methods that perform well for a case study of Folsom Reservoir in California.


SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising

http://arxiv.org/abs/2403.04194v1

Compressor summary: The paper explores using the Segment Anything Model (SAM) for video object segmentation by iteratively refining bounding box prompts to handle position, size, and occlusion variations.


Generative AI for Synthetic Data Generation: Methods, Challenges and the Future

http://arxiv.org/abs/2403.04190v1

Compressor summary: This paper explores how large language models can create realistic synthetic data for low-resource AI challenges, detailing methods, evaluations, and applications, while acknowledging limitations and suggesting future directions.


YYDS: Visible-Infrared Person Re-Identification with Coarse Descriptions

http://arxiv.org/abs/2403.04183v1

Compressor summary: The paper proposes Refer-VI-ReID, a method to match visible images using infrared samples and textual descriptions, with a new Y-Y-shape structure and a cross-modal re-ranking algorithm that improves performance on three datasets.


Metric-aware LLM inference

http://arxiv.org/abs/2403.04182v1

Compressor summary: Metric-aware LLM inference is a new method to optimize NLP task performance by adjusting inference strategies based on evaluation metrics.
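One common instance of metric-aware decoding is minimum-Bayes-risk selection over sampled candidates, sketched below (names and the toy metric are illustrative, not the paper's exact procedure): the output is chosen to maximize the expected task metric rather than raw likelihood.

```python
def mbr_select(candidates, metric):
    """Pick the candidate with the highest expected metric score
    against the other samples (minimum-Bayes-risk decoding)."""
    best, best_score = None, float("-inf")
    for cand in candidates:
        score = sum(metric(cand, other)
                    for other in candidates if other is not cand)
        if score > best_score:
            best, best_score = cand, score
    return best

def token_f1(a, b):
    """Toy word-overlap F1, standing in for the evaluation metric."""
    ta, tb = set(a.split()), set(b.split())
    overlap = len(ta & tb)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(ta), overlap / len(tb)
    return 2 * p * r / (p + r)

samples = ["the cat sat", "the cat sat down", "a dog ran"]
choice = mbr_select(samples, token_f1)
```

Swapping in a different `metric` changes which candidate wins, which is the sense in which inference becomes metric-aware.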


RATSF: Empowering Customer Service Volume Management through Retrieval-Augmented Time-Series Forecasting

http://arxiv.org/abs/2403.04180v1

Compressor summary: RATSF is a framework that uses RACA, a cross-attention module, to improve customer service volume forecasting by leveraging historical data effectively in non-stationary scenarios.


Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation

http://arxiv.org/abs/2403.04178v1

Compressor summary: The paper proposes a TTS system that incorporates stress into translations, addressing the challenge of language diversity and accessibility in India's education sector.


Image Coding for Machines with Edge Information Learning Using Segment Anything

http://arxiv.org/abs/2403.04173v1

Compressor summary: The paper introduces SA-ICM, a novel image compression technique that preserves only the edge information of object parts, which improves image recognition and video compression tasks while protecting privacy.


SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization

http://arxiv.org/abs/2403.04172v1

Compressor summary: SDPL is a part-based representation learning method for cross-view geo-localization that divides images into multiple parts to explore contextual information while maintaining global structure and handling position shifting and scale variations.


ProMISe: Promptable Medical Image Segmentation using SAM

http://arxiv.org/abs/2403.04164v1

Compressor summary: The paper proposes a method to improve medical image segmentation using adaptive prompts and pattern shifting without fine-tuning the large SAM model, achieving competitive results.


Noisy Spiking Actor Network for Exploration

http://arxiv.org/abs/2403.04162v1

Compressor summary: NoisySAN is a novel exploration strategy for deep RL using spiking neural networks that introduces and reduces noise to achieve better performance on various tasks.


SWAP-NAS: Sample-Wise Activation Patterns For Ultra-Fast NAS

http://arxiv.org/abs/2403.04161v1

Compressor summary: SWAP-Score is a novel training-free metric that measures network expressivity and has strong correlation with performance, outperforming existing metrics in Neural Architecture Search.
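A rough, training-free expressivity proxy in the spirit of sample-wise activation patterns can be sketched as follows (a hypothetical simplification, not the paper's exact SWAP-Score): count how many distinct ReLU firing patterns a batch of inputs induces.

```python
import numpy as np

rng = np.random.default_rng(1)

def swap_like_score(weights, x):
    """Count distinct ReLU activation patterns over a batch at
    initialization; more distinct patterns suggests a more
    expressive architecture."""
    h = x
    bits = []
    for W in weights:
        h = h @ W
        bits.append(h > 0)        # which units fire for each sample
        h = np.maximum(h, 0.0)
    patterns = np.concatenate(bits, axis=1)
    return len({row.tobytes() for row in patterns})

x = rng.normal(size=(32, 16))
small_net = [rng.normal(size=(16, 4))]
big_net = [rng.normal(size=(16, 64)), rng.normal(size=(64, 64))]
s_small = swap_like_score(small_net, x)
s_big = swap_like_score(big_net, x)
```

No training is needed: the score is read off random-weight forward passes, which is what makes this family of metrics fast enough for architecture search.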


DA-Net: A Disentangled and Adaptive Network for Multi-Source Cross-Lingual Transfer Learning

http://arxiv.org/abs/2403.04158v1

Compressor summary: DA-Net is a new method to transfer knowledge from multiple languages to another language by disentangling inputs and adapting class distributions, improving performance on three tasks and 38 languages.


Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process

http://arxiv.org/abs/2403.04154v1

Compressor summary: The paper presents a stable and efficient method to train deep neural networks with SDEs and policy gradient for generating high-reward samples, and demonstrates its effectiveness on structure-based drug design.


Dual-path Frequency Discriminators for Few-shot Anomaly Detection

http://arxiv.org/abs/2403.04151v1

Compressor summary: The paper presents a novel FSAD method, DFD, that leverages the frequency domain to detect and locate inconspicuous anomalies in industrial manufacturing using a dual-path feature discrimination module.


MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection

http://arxiv.org/abs/2403.04149v1

Compressor summary: The paper proposes MAP, a method to protect intellectual property of deep learning models by pruning target-related parameters without accessing unauthorized data.


Contrastive Augmented Graph2Graph Memory Interaction for Few Shot Continual Learning

http://arxiv.org/abs/2403.04140v1

Compressor summary: The text introduces a new method for FSCIL that uses G2G interaction to preserve local geometric structure and mitigate catastrophic forgetting in few-shot learning.


Unsupervised Learning of Harmonic Analysis Based on Neural HSMM with Code Quality Templates

http://arxiv.org/abs/2403.04135v1

Compressor summary: The paper proposes a method for unsupervised learning of harmonic analysis using a hidden semi-Markov model and chord quality templates, which can recognize the tonic without prior knowledge.


Towards learning-based planning:The nuPlan benchmark for real-world autonomous driving

http://arxiv.org/abs/2403.04133v1

Compressor summary: nuPlan is a new dataset and benchmark for testing machine learning-based planners in autonomous vehicles across diverse driving situations.


Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

http://arxiv.org/abs/2403.04132v1

Compressor summary: Chatbot Arena is an open platform that evaluates large language models based on human preferences using pairwise comparisons and crowdsourcing, providing a credible and widely cited leaderboard for LLMs.
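Turning pairwise human votes into a leaderboard can be sketched with the classic Elo update (the standard formula, not the platform's exact Bradley-Terry pipeline; model names and votes below are made up):

```python
def elo_update(r_a, r_b, winner, k=32):
    """One Elo update from a single pairwise vote."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if winner == "a" else 0.0
    r_a += k * (score_a - expected_a)
    r_b += k * ((1 - score_a) - (1 - expected_a))
    return r_a, r_b

ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = ["x", "x", "y", "x"]  # hypothetical crowd preferences
for w in votes:
    ratings["model_x"], ratings["model_y"] = elo_update(
        ratings["model_x"], ratings["model_y"],
        "a" if w == "x" else "b")
```

With a shared k-factor the update is zero-sum, so the total rating mass stays constant while relative positions reflect the accumulated votes.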


An Explainable AI Framework for Artificial Intelligence of Medical Things

http://arxiv.org/abs/2403.04130v1

Compressor summary: The text describes a custom Explainable Artificial Intelligence framework that uses multiple techniques to improve transparency and accuracy in healthcare systems, particularly in brain tumor detection.


Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models

http://arxiv.org/abs/2403.04125v1

Compressor summary: ComFe is an interpretable image classification approach that uses transformer-decoder and mixture modelling to identify and use consistent image components for accurate predictions.


Privacy-preserving Fine-tuning of Large Language Models through Flatness

http://arxiv.org/abs/2403.04124v1

Compressor summary: The paper proposes a framework to improve both privacy and generalization of large language models by controlling the flatness of their loss landscape.


Can Large Language Models Reason and Plan?

http://arxiv.org/abs/2403.04121v1

Compressor summary: LLMs do not seem to be able to correct their own mistakes like humans can.


A data-centric approach to class-specific bias in image data augmentation

http://arxiv.org/abs/2403.04120v1

Compressor summary: Data augmentation improves computer vision models but may introduce class-specific biases; this study examines these biases across various datasets and model types, suggesting a nuanced approach to model selection and a refined method for managing DA-induced biases.