arxiv compressed, 2024-02-17

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-17, generated by the compressor, my personal LLM-based project.


Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

http://arxiv.org/abs/2402.10211v1

Compressor summary: HiSS is a new technique for continuous sequential prediction using hierarchical state-space models that outperforms existing sequence models in real-world sensor datasets.
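
As a rough illustration of the hierarchy (not the paper's implementation), a high-rate model can summarize fixed-size chunks of the raw sensor stream while a low-rate model runs over the chunk summaries; the sketch below uses plain GRUs as stand-ins for the structured state-space layers, and all dimensions are assumptions:

```python
import torch.nn as nn

class HierarchicalSketch(nn.Module):
    """Two-level hierarchical sequence model (sketch of the HiSS idea)."""

    def __init__(self, in_dim, hidden, chunk):
        super().__init__()
        self.chunk = chunk
        self.low = nn.GRU(in_dim, hidden, batch_first=True)   # per-chunk model
        self.high = nn.GRU(hidden, hidden, batch_first=True)  # over summaries
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                         # x: (batch, time, in_dim)
        B, T, D = x.shape
        x = x[:, : T - T % self.chunk].reshape(B, -1, self.chunk, D)
        low_out, _ = self.low(x.flatten(0, 1))    # run the low level per chunk
        summaries = low_out[:, -1].reshape(B, -1, low_out.size(-1))
        high_out, _ = self.high(summaries)        # run over chunk summaries
        return self.head(high_out)                # one prediction per chunk
```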


Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

http://arxiv.org/abs/2402.10210v1

Compressor summary: SPIN-Diffusion is a novel technique that improves diffusion models by having them compete with previous versions in an iterative self-improvement process, outperforming conventional supervised fine-tuning and RL methods.


Recovering the Pre-Fine-Tuning Weights of Generative Models

http://arxiv.org/abs/2402.10208v1

Compressor summary: Spectral DeTuning is a novel method that can recover the pre-fine-tuning weights of generative models, exposing a new vulnerability in large-scale models like Stable Diffusion and Mistral.


Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

http://arxiv.org/abs/2402.10207v1

Compressor summary: RiC is a simple and adaptive method for aligning foundation models with human preferences using supervised fine-tuning and dynamic inference-time adjustment, which reduces alignment cost and complexity compared to reinforcement learning.
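
The core mechanism is conditioning supervised fine-tuning on reward values written into the context, so that desired reward levels can simply be stated at inference time. A minimal sketch of that formatting step; the tag syntax is my assumption, not the paper's:

```python
def ric_format(prompt: str, response: str, rewards: dict) -> str:
    """Prepend multi-objective reward scores to a training example (sketch).

    During training the scores come from reward models; at inference the
    user writes the scores they want and the model matches that style.
    """
    tags = " ".join(f"<{name}: {value:.1f}>" for name, value in rewards.items())
    return f"{tags} {prompt}\n{response}"

# e.g. ric_format("Summarize...", "The article...", {"helpful": 0.9, "harmless": 1.0})
```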


Ising on the Graph: Task-specific Graph Subsampling via the Ising Model

http://arxiv.org/abs/2402.10206v1

Compressor summary: The paper proposes a graph reduction method using an Ising model and a graph neural network that adapts to specific downstream tasks in an end-to-end fashion.


Bridging Associative Memory and Probabilistic Modeling

http://arxiv.org/abs/2402.10202v1

Compressor summary: The text explores connections and applications between associative memory and probabilistic modeling in artificial intelligence, such as adaptable energy functions, dynamic memory creation, capacity analysis, and clustering on the hypersphere.


Chain-of-Thought Reasoning Without Prompting

http://arxiv.org/abs/2402.10200v1

Compressor summary: The study shows that large language models can reason effectively without prompting by altering the decoding process, and this approach improves performance on reasoning benchmarks.


Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

http://arxiv.org/abs/2402.10198v1

Compressor summary: The authors study why transformers struggle with multivariate long-term forecasting and propose a lightweight model that improves performance over existing methods.


BitDelta: Your Fine-Tune May Only Be Worth One Bit

http://arxiv.org/abs/2402.10193v1

Compressor summary: The paper introduces BitDelta, a method that quantizes the additional information added during fine-tuning of large language models, enabling significant reductions in GPU memory requirements and improved multi-tenant serving and storage.
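
The underlying observation is that fine-tuning adds comparatively little information, so the weight delta compresses aggressively. A minimal sketch of the 1-bit delta, assuming a per-matrix scale initialized to the mean absolute delta (the paper further calibrates scales by distillation, omitted here):

```python
import torch

def binarize_delta(w_base: torch.Tensor, w_ft: torch.Tensor):
    """Compress a fine-tuned matrix into (sign of delta, scalar scale)."""
    delta = w_ft - w_base
    sign = torch.sign(delta)        # 1 bit per parameter
    scale = delta.abs().mean()      # one scalar per matrix
    return sign, scale

def reconstruct(w_base, sign, scale):
    # Approximate fine-tuned weights from the shared base + 1-bit delta,
    # so many fine-tunes can be served on top of a single base model.
    return w_base + scale * sign
```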


Multi-Excitation Projective Simulation with a Many-Body Physics Inspired Inductive Bias

http://arxiv.org/abs/2402.10192v1

Compressor summary: The text introduces mePS, a method to explain deep learning decisions by modeling them as random walks on hypergraphs with an inductive bias inspired by quantum physics that reduces complexity.


Uncertainty Decomposition and Quantification for In-Context Learning of Large Language Models

http://arxiv.org/abs/2402.10189v1

Compressor summary: The text discusses uncertainties in Large Language Models' (LLMs) in-context learning responses, proposing a method to quantify both aleatoric and epistemic uncertainties.


Self-consistent Validation for Machine Learning Electronic Structure

http://arxiv.org/abs/2402.10186v1

Compressor summary: Machine learning can help solve electronic structure problems efficiently, but needs a method to estimate its accuracy and integrate with self-consistent field methods for better performance and interpretability.


Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

http://arxiv.org/abs/2402.10184v1

Compressor summary: The paper proposes a tree-based information structure for reward modeling in reinforcement learning from human feedback (RLHF) to address the trilemma of diverse contexts, low labeling cost, and reliable alignment performance, and shows its superiority over chain-based methods on three NLP tasks.


TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation

http://arxiv.org/abs/2402.10178v1

Compressor summary: TDAG is a multi-agent framework that uses dynamic task decomposition and agent generation to improve the performance of LLM-based agents in complex real-world tasks by assigning subtasks to specific subagents, while ItineraryBench is a benchmark for evaluating these agents in travel planning scenarios.


Large Scale Constrained Clustering With Reinforcement Learning

http://arxiv.org/abs/2402.10177v1

Compressor summary: The paper proposes a reinforcement learning method to find disjoint clusters that efficiently allocate resources and minimize distances while respecting a threshold.


OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

http://arxiv.org/abs/2402.10176v1

Compressor summary: The authors construct OpenMathInstruct-1, a large math instruction tuning dataset for open-source LLMs, using synthetic data from the Mixtral model and show competitive results on GSM8K and MATH benchmarks.


Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence

http://arxiv.org/abs/2402.10175v1

Compressor summary: The paper proposes a new metric to measure discourse coherence in long-form text generation and shows it performs better than existing methods.


OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models

http://arxiv.org/abs/2402.10172v1

Compressor summary: OptiMUS is an LLM-based agent that can formulate and solve linear programming problems from natural language descriptions, outperforming existing methods on easy and hard datasets.


Data Engineering for Scaling Language Models to 128K Context

http://arxiv.org/abs/2402.10171v1

Compressor summary: The paper investigates how much and what kind of data is needed for continual pretraining to scale language models' context lengths to 128K, finding that domain balance and length upsampling are crucial factors.


Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients

http://arxiv.org/abs/2402.10153v1

Compressor summary: The paper presents a knowledge-infused LLM-based conversational health agent for diabetic patients, which integrates external sources and analytical tools to provide accurate and personalized dietary advice.


ControlLM: Crafting Diverse Personalities for Language Models

http://arxiv.org/abs/2402.10151v1

Compressor summary: ControlLM is a method to adjust the personality traits of language models in real time, enhancing their performance on various tasks.


$f$-MICL: Understanding and Generalizing InfoNCE-based Contrastive Learning

http://arxiv.org/abs/2402.10150v1

Compressor summary: The paper proposes a generalization of InfoNCE called f-Mutual Information in Contrastive Learning (f-MICL) using f-divergences, which can improve performance and similarity functions in self-supervised learning.
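
For reference, this family rests on the standard variational lower bound for f-divergences, which f-MICL instantiates for the mutual information between augmented views; the form below (with $f^*$ the convex conjugate of $f$ and $T$ a learned similarity/critic) is the generic bound, not the paper's exact parameterization:

```latex
I_f(X;Y) \;\ge\; \mathbb{E}_{p(x,y)}\!\left[ T(x,y) \right]
          \;-\; \mathbb{E}_{p(x)\,p(y)}\!\left[ f^{*}\!\big( T(x,y) \big) \right]
```

Choosing the KL generator for $f$ recovers an InfoNCE-style objective; other choices of $f$ give the generalized losses studied in the paper.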


Tracking Changing Probabilities via Dynamic Learners

http://arxiv.org/abs/2402.10142v1

Compressor summary: Key points:
- Predictor learns to predict discrete items from a stream with probabilistic multiclass prediction
- Predictor has limited space and needs to output probabilities for salient items only
- Stream is unbounded, non-stationary, and concept-based (from prediction games)
- Two techniques are proposed: queuing of count snapshots and sparse EMA with dynamic learning rates

Summary: The text proposes two efficient techniques for probabilistic multiclass prediction of salient items in a non-stationary stream of discrete concepts, motivated by prediction games.
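
A minimal sketch of the sparse-EMA technique, assuming a harmonic learning rate decaying toward a floor (the paper's exact dynamic-rate rule may differ): only salient items keep explicit probability mass, and tiny entries are pruned to bound space:

```python
from collections import defaultdict

class SparseEMA:
    """Track moving item probabilities over a non-stationary stream (sketch)."""

    def __init__(self, min_rate=0.01, prune_below=1e-3):
        self.p = defaultdict(float)   # item -> estimated probability
        self.n = 0
        self.min_rate = min_rate
        self.prune_below = prune_below

    def update(self, item):
        self.n += 1
        rate = max(1.0 / self.n, self.min_rate)  # fast early, bounded later
        for k in list(self.p):                   # decay everything slightly
            self.p[k] *= 1.0 - rate
            if self.p[k] < self.prune_below:     # keep the map sparse
                del self.p[k]
        self.p[item] += rate                     # boost the observed item

    def predict(self):
        return dict(self.p)                      # probabilities of salient items
```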


TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles

http://arxiv.org/abs/2402.10137v1

Compressor summary: The paper introduces TOAD, a novel dataset for task-oriented dialogs with realistic app context and system response styles, and evaluates its automatic generation pipeline.


Zero-Shot Reasoning: Personalized Content Generation Without the Cold Start Problem

http://arxiv.org/abs/2402.10133v1

Compressor summary: The paper proposes a novel approach to personalize game content using large language models, which generates better levels than traditional methods and reduces player quitting.


Is Continual Learning Ready for Real-world Challenges?

http://arxiv.org/abs/2402.10130v1

Compressor summary: This paper argues that current continual learning methods are not effective for real-world applications and proposes a new benchmark to test them more accurately.


GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

http://arxiv.org/abs/2402.10128v1

Compressor summary: GES, a new representation using Generalized Exponential Functions, improves efficiency and accuracy in 3D scene modeling compared to traditional Gaussian Splatting methods.


Generating Visual Stimuli from EEG Recordings using Transformer-encoder based EEG encoder and GAN

http://arxiv.org/abs/2402.10115v1

Compressor summary: The study aims to recreate images from EEG signals using a Transformer-encoder and a GAN network with perceptual loss for better image quality.


Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

http://arxiv.org/abs/2402.10110v1

Compressor summary: Selective Reflection-Tuning is a method that improves instruction tuning for large language models by combining teacher and student reflection and data selection.


Towards Reducing Diagnostic Errors with Interpretable Risk Prediction

http://arxiv.org/abs/2402.10109v1

Compressor summary: The proposed method uses LLMs to find evidence in patient EHR data and improve diagnostic accuracy and timeliness by providing individualized risk estimates and reducing errors due to incomplete differentials.


Quantized Embedding Vectors for Controllable Diffusion Language Models

http://arxiv.org/abs/2402.10107v1

Compressor summary: QE-CDLM is a novel diffusion language model approach that improves controllability, portability, and inference speed by quantizing the task-specific embedding space and employing adaptation fine-tuning.


GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving

http://arxiv.org/abs/2402.10104v1

Compressor summary: The GeoEval benchmark evaluates the ability of large and multi-modal language models to solve geometry math problems, revealing the WizardMath model as the best performer but showing the limitations of GPT-series models on rephrased problems.


Any-Shift Prompting for Generalization over Distributions

http://arxiv.org/abs/2402.10099v1

Compressor summary: Any-shift prompting is a method to improve image-language model generalization by using a hierarchical architecture and probabilistic inference to connect training and test distributions in the latent space.


Classification Diffusion Models

http://arxiv.org/abs/2402.10095v1

Compressor summary: Classification Diffusion Models combine density ratio estimation and denoising diffusion models for better learning data distributions and image generation.


MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations

http://arxiv.org/abs/2402.10093v1

Compressor summary: MIM-Refiner improves MIM models' representations by using contrastive learning on diverse intermediate layers, achieving state-of-the-art results in linear probing and low-shot classification.


Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4

http://arxiv.org/abs/2402.10083v1

Compressor summary: The study evaluates GPT-4's ability to align with human experts in assessing the quality of ophthalmology responses generated by fine-tuned large language models, finding good agreement and potential for streamlining clinical evaluation.


QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

http://arxiv.org/abs/2402.10076v1

Compressor summary: QUICK is a new method that speeds up the computation of quantized large language models on NVIDIA GPUs by avoiding shared memory issues.


GraphCBAL: Class-Balanced Active Learning for Graph Neural Networks via Reinforcement Learning

http://arxiv.org/abs/2402.10074v1

Compressor summary: GraphCBAL is a novel active learning framework for GNNs that acquires class-balanced and informative nodes for annotation to improve GNN performance in skewed class scenarios.


Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence

http://arxiv.org/abs/2402.10073v1

Compressor summary: The authors introduce a new dataset (EiBench) and method (MoEI) for enhancing emotional intelligence in large language models without sacrificing general intelligence.


NYCTALE: Neuro-Evidence Transformer for Adaptive and Personalized Lung Nodule Invasiveness Prediction

http://arxiv.org/abs/2402.10066v1

Compressor summary: The paper presents NYCTALE, a neuro-inspired Transformer architecture for lung cancer diagnosis in Personalized Medicine, which accumulates evidence from CT images and makes decisions when a threshold is reached.


How Much Does Each Datapoint Leak Your Privacy? Quantifying the Per-datum Membership Leakage

http://arxiv.org/abs/2402.10065v1

Compressor summary: The paper analyzes how well different methods protect data privacy in machine learning algorithms by measuring the risk of inferring whether a specific datum is present or not.


Balancing the Causal Effects in Class-Incremental Learning

http://arxiv.org/abs/2402.10063v1

Compressor summary: BaCE addresses catastrophic forgetting in class-incremental learning by balancing causal effects from new and old data for adapting to all classes.


X-maps: Direct Depth Lookup for Event-based Structured Light Systems

http://arxiv.org/abs/2402.10061v1

Compressor summary: The paper proposes a new method for fast and accurate depth estimation in Spatial Augmented Reality using event cameras and laser projectors, simplifying the process and enabling real-time interactivity.


Towards Safer Large Language Models through Machine Unlearning

http://arxiv.org/abs/2402.10058v1

Compressor summary: SKU is a novel unlearning framework for Large Language Models that eliminates harmful knowledge while maintaining performance on normal prompts.


Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination

http://arxiv.org/abs/2402.10052v1

Compressor summary: The text introduces deliberate imagination, a method for unlearning targeted text while maintaining LLMs' generation and understanding abilities, addressing privacy and data exposure issues.


SwissNYF: Tool Grounded LLM Agents for Black Box Setting

http://arxiv.org/abs/2402.10051v1

Compressor summary: The paper introduces TOPGUN, a method that uses program synthesis to plan tool usage in black-box settings, and SwissNYF, a suite for black-box planning and verification tasks, improving the capabilities of LLMs in complex API interactions.


How Flawed is ECE? An Analysis via Logit Smoothing

http://arxiv.org/abs/2402.10046v1

Compressor summary: The authors propose a new calibration metric, Logit-Smoothed ECE, which addresses some drawbacks of the existing expected calibration error (ECE) and evaluate it on pre-trained image classification models.
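
For context, ECE bins predictions by confidence and averages the per-bin gap between confidence and accuracy. The sketch below computes standard ECE and a smoothed variant that averages it over Gaussian perturbations of the logits; the noise model is my assumption about the paper's construction, not its definition:

```python
import numpy as np

def ece(confidences, correct, n_bins=15):
    """Standard binned expected calibration error."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total, err = len(confidences), 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            err += mask.sum() / total * gap
    return err

def logit_smoothed_ece(logits, labels, sigma=0.1, n_samples=32, seed=0):
    # Average ECE over Gaussian-perturbed logits (illustrative assumption).
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_samples):
        z = logits + rng.normal(0.0, sigma, size=logits.shape)
        probs = np.exp(z - z.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        vals.append(ece(probs.max(-1), (probs.argmax(-1) == labels).astype(float)))
    return float(np.mean(vals))
```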


Feature Accentuation: Revealing 'What' Features Respond to in Natural Images

http://arxiv.org/abs/2402.10039v1

Compressor summary: Feature accentuation is a new method for interpreting neural network features that provides both where and what information in arbitrary input images, using a combination of parameterization, augmentation, and regularization to create naturalistic visualizations.


RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models

http://arxiv.org/abs/2402.10038v1

Compressor summary: The paper proposes a new method called RS-DPO that combines rejection sampling and direct preference optimization to fine-tune large language models using human feedback more effectively and efficiently than existing methods.
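
The DPO half of the pipeline is the standard pairwise objective; a minimal sketch on already-selected (chosen, rejected) pairs, with the rejection-sampling stage that generates and scores candidates omitted:

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=0.1):
    """Direct Preference Optimization loss on response log-probabilities.

    Each argument is the summed token log-probability of a response under
    the policy (logp_*) or the frozen reference model (ref_logp_*).
    """
    chosen = logp_chosen - ref_logp_chosen        # policy/reference log-ratio
    rejected = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen - rejected)).mean()
```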


Diffusion Models Meet Contextual Bandits with Large Action Spaces

http://arxiv.org/abs/2402.10028v1

Compressor summary: The paper proposes diffusion Thompson sampling, a method that leverages diffusion models to efficiently explore correlated action spaces in contextual bandits.


Self-Augmented In-Context Learning for Unsupervised Word Translation

http://arxiv.org/abs/2402.10024v1

Compressor summary: SAIL is a method that improves unsupervised word translation for lower-resource languages by iteratively inducing high-confidence pairs from an LLM and using them for in-context learning.
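
A minimal sketch of the self-augmentation loop, where llm_translate is a hypothetical helper returning a candidate translation with a confidence score (how confidence is estimated, e.g. from sequence probability, is an assumption):

```python
def sail(source_words, llm_translate, rounds=3, threshold=0.9):
    """Iteratively bootstrap in-context examples for word translation."""
    examples = []                            # round 1 is zero-shot
    for _ in range(rounds):
        confident_pairs = []
        for word in source_words:
            translation, conf = llm_translate(word, examples)
            if conf >= threshold:            # keep only high-confidence pairs
                confident_pairs.append((word, translation))
        examples = confident_pairs           # reuse as demonstrations next round
    return dict(examples)
```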


SAWEC: Sensing-Assisted Wireless Edge Computing

http://arxiv.org/abs/2402.10021v1

Compressor summary: The paper proposes a novel approach called SAWEC that uses wireless sensing to reduce the amount of video data transmitted and processed in mobile VR systems, improving latency and performance.


Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length

http://arxiv.org/abs/2402.10013v1

Compressor summary: Neural networks struggle to learn formal languages perfectly, but using a Minimum Description Length objective helps achieve optimal generalization.


Clifford Group Equivariant Simplicial Message Passing Networks

http://arxiv.org/abs/2402.10011v1

Compressor summary: The paper introduces a new method for processing geometric data using Clifford group-equivariant layers and simplicial message passing, which improves performance on various geometric tasks.


MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding

http://arxiv.org/abs/2402.10002v1

Compressor summary: MM-Point is a novel self-supervised point cloud representation learning method that leverages multi-view 2D information for better 3D object understanding and achieves state-of-the-art performance in various downstream tasks.


Privacy Attacks in Decentralized Learning

http://arxiv.org/abs/2402.10001v1

Compressor summary: The text describes an attack on decentralized gradient descent that can reconstruct private data of users by exploiting a vulnerability in the gossip averaging protocol.


LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild

http://arxiv.org/abs/2402.09997v1

Compressor summary: LoraRetriever is a framework that adaptively retrieves and composes multiple LoRAs for fine-tuning large language models based on input prompts.
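
One plausible reading of retrieve-then-compose, sketched entirely under assumptions (the embedding source, softmax mixing, and delta averaging are mine; the paper also studies other composition strategies):

```python
import numpy as np

def retrieve_and_compose(prompt_emb, lora_embs, lora_deltas, top_k=3):
    """Rank LoRAs by similarity to the input, then mix the top matches."""
    sims = lora_embs @ prompt_emb / (
        np.linalg.norm(lora_embs, axis=1) * np.linalg.norm(prompt_emb) + 1e-9)
    top = np.argsort(sims)[-top_k:]                   # most relevant adapters
    weights = np.exp(sims[top]) / np.exp(sims[top]).sum()
    # Weighted average of the selected low-rank weight updates.
    return sum(w * lora_deltas[i] for w, i in zip(weights, top))
```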


Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts

http://arxiv.org/abs/2402.09992v1

Compressor summary: The paper introduces a novel risk-sensitive deep reinforcement learning algorithm for contextual multi-stage stochastic combinatorial optimization and shows its effectiveness in handling distribution shifts.


LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition

http://arxiv.org/abs/2402.09989v1

Compressor summary: RiVEG is a framework that uses large language models to connect multimodal tasks, improving named entity recognition and grounding in social media images.


Symmetry-Breaking Augmentations for Ad Hoc Teamwork

http://arxiv.org/abs/2402.09984v1

Compressor summary: The paper introduces symmetry-breaking augmentations (SBA) to improve AI agents' ability to adapt to new teammates with unknown or diverse strategies in collaborative settings.


Data Augmentation and Transfer Learning Approaches Applied to Facial Expressions Recognition

http://arxiv.org/abs/2402.09982v1

Compressor summary: The paper proposes a new data augmentation method for facial expression recognition using geometrical transformations and GANs, leading to high accuracy with pretrained convolutional neural networks.


Fast Vocabulary Transfer for Language Model Compression

http://arxiv.org/abs/2402.09977v1

Compressor summary: The text proposes making language models smaller and faster by transferring the tokenizer vocabulary to the target domain, which shortens tokenized sequences and speeds up computation without hurting performance too much.


Accelerating Parallel Sampling of Diffusion Models

http://arxiv.org/abs/2402.09970v1

Compressor summary: ParaTAA is a novel algorithm that accelerates image generation by parallelizing and optimizing the autoregressive process of diffusion models.
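
The underlying idea is to treat the whole denoising trajectory as one fixed-point system, so every step can be refined in each parallel sweep instead of strictly one after another. A sketch without the paper's triangular Anderson acceleration; step_fn is a hypothetical one-step denoiser:

```python
def parallel_fixed_point_sample(step_fn, x_T, num_steps, sweeps=10):
    """Jacobi-style refinement of a full diffusion trajectory (sketch)."""
    traj = [x_T.clone() for _ in range(num_steps + 1)]  # traj[num_steps] = x_T
    for _ in range(sweeps):
        new = list(traj)
        # Every update reads only the previous sweep's states, so all
        # step_fn calls in a sweep are independent and can be batched.
        for t in range(num_steps, 0, -1):
            new[t - 1] = step_fn(traj[t], t)
        traj = new
    return traj[0]  # converges to the sequential sampler's output
```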


Case Study: Testing Model Capabilities in Some Reasoning Tasks

http://arxiv.org/abs/2402.09967v1

Compressor summary: The text discusses the need to improve large language models' reasoning skills and explainability for more complex tasks.


Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation

http://arxiv.org/abs/2402.09966v1

Compressor summary: The text describes a new model that can generate images from text for multiple concepts, using cross-attention guidance to separate and connect the visual representation of each concept to the text prompt.


Why are Sensitive Functions Hard for Transformers?

http://arxiv.org/abs/2402.09963v1

Compressor summary: The paper explains how the input-space sensitivity of transformers affects their learnability and generalization, providing a theoretical framework to understand their biases and limitations.


ViGEO: an Assessment of Vision GNNs in Earth Observation

http://arxiv.org/abs/2402.09962v1

Compressor summary: The text discusses using deep neural networks, including a recent Vision GNN architecture called ViG, for land cover classification from satellite images, achieving state-of-the-art results.


Enhancing Courier Scheduling in Crowdsourced Last-Mile Delivery through Dynamic Shift Extensions: A Deep Reinforcement Learning Approach

http://arxiv.org/abs/2402.09961v1

Compressor summary: The paper proposes a model using deep Q-networks to dynamically adjust offline schedules for committed couriers in crowdsourced delivery platforms, showing benefits in platform profit and reduced lost orders.


On Designing Features for Condition Monitoring of Rotating Machines

http://arxiv.org/abs/2402.09957v1

Compressor summary: The article introduces a new algorithm, based on histogram theory, for designing input features for fault recognition in rotating machines from one-dimensional raw sensor data; the features work with different classifiers and have been validated on three real datasets.


Crafting a Good Prompt or Providing Exemplary Dialogues? A Study of In-Context Learning for Persona-based Dialogue Generation

http://arxiv.org/abs/2402.09954v1

Compressor summary: The study investigates how large language models use in-context learning to generate persona-based human-like Chinese dialogues and identifies three key findings about prompt instructions, retrieval strategies, and corrupted demos.


Multi-Word Tokenization for Sequence Compression

http://arxiv.org/abs/2402.09949v1

Compressor summary: MWT is a tokenizer that represents common multi-word expressions as single tokens, improving efficiency and performance of large language models.
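
A minimal sketch of the idea, restricted to word bigrams with a frequency-based selection rule (my simplification): mine common multi-word expressions and fuse each into a single token before subword tokenization, shortening sequences:

```python
from collections import Counter

def build_multiword_vocab(corpus, top_k=1000):
    """Select frequent word bigrams to treat as single tokens."""
    counts = Counter()
    for text in corpus:
        words = text.split()
        counts.update(zip(words, words[1:]))
    return {pair for pair, _ in counts.most_common(top_k)}

def merge_multiwords(text, mwt_vocab):
    # Greedily fuse known bigrams into one underscore-joined token.
    words, out, i = text.split(), [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in mwt_vocab:
            out.append(words[i] + "_" + words[i + 1])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out
```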


Explaining Probabilistic Models with Distributional Values

http://arxiv.org/abs/2402.09947v1

Compressor summary: The paper proposes a new method to explain probabilistic machine learning models by extending cooperative game theory and introducing distributional values, which track changes in the model output.


Generative AI in the Construction Industry: A State-of-the-art Analysis

http://arxiv.org/abs/2402.09939v1

Compressor summary: This paper reviews generative AI's potential in addressing construction challenges and presents a framework for developing custom solutions using contract documents as an example.


Paying Attention to Deflections: Mining Pragmatic Nuances for Whataboutism Detection in Online Discourse

http://arxiv.org/abs/2402.09934v1

Compressor summary: The paper introduces new datasets to study whataboutism, propaganda, and the tu quoque fallacy in NLP, and proposes a novel method using attention weights for negative sample mining to improve detection accuracy.


A Dataset of Open-Domain Question Answering with Multiple-Span Answers

http://arxiv.org/abs/2402.09923v1

Compressor summary: CLEAN is a new Chinese MSQA dataset with diverse questions and descriptive answers that can challenge existing models.


Road Graph Generator: Mapping roads at construction sites from GPS data

http://arxiv.org/abs/2402.09919v1

Compressor summary: The text describes a method to infer road networks from GPS data in construction sites using a graph-based approach.


BUSTER: a "BUSiness Transaction Entity Recognition" dataset

http://arxiv.org/abs/2402.09916v1

Compressor summary: BUSTER is a dataset for financial transaction entity recognition with manually and automatically annotated documents.


Enhancing Large Language Models with Pseudo- and Multisource- Knowledge Graphs for Open-ended Question Answering

http://arxiv.org/abs/2402.09911v1

Compressor summary: Key points:
- The paper proposes a framework to enhance LLMs using KG for open-ended question-answering
- The framework combines Pseudo-Graph Generation and Atomic Knowledge Verification
- The approach improves ROUGE-L and accuracy scores compared to the baseline
- The framework shows generalizability across different KG sources

Summary: The paper introduces a KG-based framework that enhances LLMs for open-ended questions, using Pseudo-Graph Generation and Atomic Knowledge Verification, and demonstrates its effectiveness and generalizability.


DE-COP: Detecting Copyrighted Content in Language Models Training Data

http://arxiv.org/abs/2402.09910v1

Compressor summary: DE-COP is a method to detect if copyrighted content was used in training language models by asking them multiple-choice questions based on excerpts from books and their paraphrases.


Generative Representational Instruction Tuning

http://arxiv.org/abs/2402.09906v1

Compressor summary: GRIT is a new language model that excels at both generating and embedding text, unifying the two tasks without sacrificing performance.


Revisiting Recurrent Reinforcement Learning with Memory Monoids

http://arxiv.org/abs/2402.09900v1

Compressor summary: The text introduces memory monoids as a novel framework to improve recurrent models in reinforcement learning by proposing a better batching method.


COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions

http://arxiv.org/abs/2402.09897v1

Compressor summary: Key points:
- The study aims to develop a web app for automatically classifying COVID-19 discussions on social media
- The study labels and analyzes tweets using various feature extraction and machine learning methods
- The study achieves high F1 scores with CNN and Linear SVC algorithms
- The study provides a valuable resource for public health challenges and pandemics

Summary: The study presents a web app that classifies COVID-19 tweets on social media using different machine learning techniques, reaching high accuracy with deep and traditional algorithms, and offering a useful tool for health issues during pandemics.


Predictors from causal features do not generalize better to new domains

http://arxiv.org/abs/2402.09891v1

Compressor summary: The study finds that machine learning models trained on causal features do not generalize better across domains compared to models using all available features.


Lester: rotoscope animation through video object segmentation and tracking

http://arxiv.org/abs/2402.09883v1

Compressor summary: Lester is a novel method that converts videos into retro-style 2D animations by segmenting and tracking objects using SAM and DeAOT, simplifying contours with Douglas-Peucker, and optionally adding facial traits, pixelation and shadows.


Explaining Kernel Clustering via Decision Trees

http://arxiv.org/abs/2402.09881v1

Compressor summary: This paper explores interpretable kernel clustering using decision trees to approximate kernel k-means, a nonlinear extension of the classic k-means algorithm, while maintaining interpretability and good approximation results.


Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence

http://arxiv.org/abs/2402.09880v1

Compressor summary: The study critically assessed 23 state-of-the-art Large Language Model benchmarks, revealing significant limitations and emphasizing the need for standardized methodologies, ethical guidelines, and a paradigm shift in evaluation.


On Computing Plans with Uniform Action Costs

http://arxiv.org/abs/2402.09877v1

Compressor summary: The paper proposes uniformity metrics for automated planning to create stable and predictable plans, and shows their effectiveness in various benchmarks.


Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community

http://arxiv.org/abs/2402.09872v1

Compressor summary: This paper introduces Social Reward, a framework that uses implicit feedback from social network users to assess and improve AI-generated images for text-to-image models.


Beyond Kalman Filters: Deep Learning-Based Filters for Improved Object Tracking

http://arxiv.org/abs/2402.09865v1

Compressor summary: The paper proposes two data-driven filtering methods for object tracking that improve accuracy and reduce domain-specific design choices compared to traditional Kalman filters, especially for non-linear motion patterns.


Recommendations for Baselines and Benchmarking Approximate Gaussian Processes

http://arxiv.org/abs/2402.09849v1

Compressor summary: The text discusses hyperparameter selection in Gaussian processes, the challenges of evaluating GP approximations, and proposes recommendations for comparing methods based on user expectations.


Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

http://arxiv.org/abs/2402.09844v1

Compressor summary: The paper introduces JAT, a transformer-based model that can handle various RL tasks and multimodal data types, aiming to achieve a more general, cross-domain AI model design.


LAPDoc: Layout-Aware Prompting for Documents

http://arxiv.org/abs/2402.09841v1

Compressor summary: The paper explores how enriching text-based language models with layout information can improve their performance in document understanding tasks and compares it to multi-modal document transformers.


Performative Reinforcement Learning in Gradually Shifting Environments

http://arxiv.org/abs/2402.09838v1

Compressor summary: The paper proposes a framework to model environments that change due to deployed policies in reinforcement learning and introduces a new algorithm, MDRR, that combines samples from multiple deployments for faster convergence.


Beyond Imitation: Generating Human Mobility from Context-aware Reasoning with Large Language Models

http://arxiv.org/abs/2402.09836v1

Compressor summary: Key points:
- MobiGeaR is a novel framework that generates mobility behaviour as a commonsense reasoning problem
- It uses a context-aware chain-of-thoughts prompting technique and a mechanistic gravity model to align LLMs with mobility data
- It outperforms previous methods in sample efficiency, semantic-awareness, and downstream application performance

Summary: MobiGeaR is a new method that uses large language models to generate coherent and realistic mobility data by reasoning about context and intentions.


All in One and One for All: A Simple yet Effective Method towards Cross-domain Graph Pretraining

http://arxiv.org/abs/2402.09834v1

Compressor summary: GCOPE is a novel method that enhances few-shot learning in graph datasets by unifying diverse graphs during pretraining and transferring meaningful knowledge to target tasks.


Utilizing GANs for Fraud Detection: Model Training with Synthetic Transaction Data

http://arxiv.org/abs/2402.09830v1

Compressor summary: This paper shows how Generative Adversarial Networks can be used for fraud detection by modeling complex data distributions and preventing bot-generated transactions.


Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment

http://arxiv.org/abs/2402.09816v1

Compressor summary: The authors propose a method to improve CLIP's performance on remote sensing and medical imagery tasks through fine-tuning and cross-modal alignment of the RS modality encoder.


DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

http://arxiv.org/abs/2402.09812v1

Compressor summary: DreamMatcher is a novel method for text-to-image personalization that uses semantic matching and masking to align user-provided references with target prompts while preserving the diversity of pre-trained models.


TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming

http://arxiv.org/abs/2402.09811v1

Compressor summary: TEXTRON is a Data Programming-based approach that improves multilingual text detection using weak supervision and an ensemble of CV and DL techniques, especially for low-resource or handwritten languages like Indian scripts.


Knowledge of Pretrained Language Models on Surface Information of Tokens

http://arxiv.org/abs/2402.09808v1

Compressor summary: Pretrained language models can capture some surface information of tokens, but struggle with others and have trouble using the knowledge effectively.


EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models

http://arxiv.org/abs/2402.09801v1

Compressor summary: The paper proposes a fine-grained unlearning framework to eliminate object hallucination in multimodal language models without paired data or expensive computation.


Examining Pathological Bias in a Generative Adversarial Network Discriminator: A Case Study on a StyleGAN3 Model

http://arxiv.org/abs/2402.09786v1

Compressor summary: The StyleGAN3 model generates realistic faces, but its discriminator scores images in a pathologically biased way across gender, race, and other categories.


MC-DBN: A Deep Belief Network-Based Model for Modality Completion

http://arxiv.org/abs/2402.09782v1

Compressor summary: The MC-DBN model improves stock market and heart rate forecasting by bridging semantic gaps in multi-modal data using implicit features.


A Comprehensive Review on Computer Vision Analysis of Aerial Data

http://arxiv.org/abs/2402.09781v1

Compressor summary: This paper reviews computer vision tasks in aerial data analysis, comparing hyper parameters, discussing libraries and datasets, exploring applications, and addressing challenges for future research.


TinyCL: An Efficient Hardware Architecture for Continual Learning on Autonomous Systems

http://arxiv.org/abs/2402.09780v1

Compressor summary: TinyCL is a hardware accelerator for Continuous Learning on resource-constrained autonomous systems that performs both forward and backward propagation and achieves up to 58x speedup compared to an Nvidia GPU.


NutePrune: Efficient Progressive Pruning with Numerous Teachers for Large Language Models

http://arxiv.org/abs/2402.09773v1

Compressor summary: NutePrune is an efficient progressive pruning method that uses one intact model with multiple masks to compress Large Language Models while maintaining high performance on various tasks.


Representation Learning Using a Single Forward Pass

http://arxiv.org/abs/2402.09769v1

Compressor summary: SPELA is a neuroscience-inspired algorithm for Edge AI devices that uses local Hebbian learning and embedded vectors without backpropagation, achieving high performance and few-shot learning on various datasets.


Reinforcement Learning for Solving Stochastic Vehicle Routing Problem with Time Windows

http://arxiv.org/abs/2402.09765v1

Compressor summary: The paper presents a reinforcement learning method to optimize the vehicle routing problem under uncertainty and time constraints, achieving better results than an ant-colony optimization algorithm.


Aligning Crowd Feedback via Distributional Preference Reward Modeling

http://arxiv.org/abs/2402.09764v1

Compressor summary: DPRM is a framework that uses a beta distribution to adapt to human preferences and fine-tunes an LLM policy to generate preferred responses.


Grounding Language Model with Chunking-Free In-Context Retrieval

http://arxiv.org/abs/2402.09760v1

Compressor summary: CFIC is a new retrieval approach for RAG systems that avoids document chunking and uses hidden states to retrieve relevant evidence text for user queries efficiently and accurately.


Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish

http://arxiv.org/abs/2402.09759v1

Compressor summary: The study fine-tunes an English language model on a large Polish dataset, creating Curie-7B-v1, which generates high-quality Polish text and performs well on nine KLEJ challenges.


Model Compression and Efficient Inference for Large Language Models: A Survey

http://arxiv.org/abs/2402.09748v1

Compressor summary: The paper explores compression and efficient inference methods for large language models, considering their characteristics and taxonomy, and introduces some frameworks for deploying them on resource-constrained devices.


AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis

http://arxiv.org/abs/2402.09742v1

Compressor summary: AI Hospital is a framework that leverages Large Language Models for real-time interactive diagnosis and collaboration among medical agents, enhancing diagnostic accuracy.


QuRating: Selecting High-Quality Data for Training Language Models

http://arxiv.org/abs/2402.09739v1

Compressor summary: QuRating is a method for selecting high-quality pre-training data for language models based on four abstract qualities, improving their perplexity and in-context learning performance.


Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection

http://arxiv.org/abs/2402.09738v1

Compressor summary: The paper proposes a context-aware attention framework for multimodal hateful content detection in both English and non-English languages, improving performance over existing methods.


DFORM: Diffeomorphic vector field alignment for assessing dynamics across learned models

http://arxiv.org/abs/2402.09735v1

Compressor summary: The paper introduces DFORM, a framework for comparing the dynamics of different RNNs by learning a nonlinear coordinate transformation between their trajectories, which measures their orbital similarity and functional equivalence.


Agents Need Not Know Their Purpose

http://arxiv.org/abs/2402.09734v1

Compressor summary: Oblivious agents are designed to maximize human values by inferring designers' intentions and behaving according to an effective utility function that combines known and hidden sub-functions, improving alignment as their intelligence increases.


Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States

http://arxiv.org/abs/2402.09733v1

Compressor summary: The research examines if and how large language models are aware of their own hallucinations, using an experimental framework and model interpretation techniques to understand and reduce hallucination.


POBEVM: Real-time Video Matting via Progressively Optimize the Target Body and Edge

http://arxiv.org/abs/2402.09731v1

Compressor summary: The paper proposes a new CNN-based method for video matting that separates the target body and edge estimation and improves edge accuracy with a novel loss function.


DOF: Accelerating High-order Differential Operators with Forward Propagation

http://arxiv.org/abs/2402.09730v1

Compressor summary: The paper proposes DOF, an efficient framework for calculating second-order differential operators in PDEs using deep learning, which improves efficiency and memory consumption over existing methods.


A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

http://arxiv.org/abs/2402.09727v1

Compressor summary: ReadAgent is a system that uses LLMs to read long documents interactively, store relevant content in gist memories, and improve performance on reading comprehension tasks.


Improving Non-autoregressive Machine Translation with Error Exposure and Consistency Regularization

http://arxiv.org/abs/2402.09725v1

Compressor summary: The paper proposes EECR, a training approach for CMLM to address data distribution discrepancy between training and inference in mask-predict frameworks for natural language processing tasks.


Region Feature Descriptor Adapted to High Affine Transformations

http://arxiv.org/abs/2402.09724v1

Compressor summary: The paper introduces a new region feature descriptor that simulates affine transformations using classification, improving feature matching accuracy under high affine transformations.


Visually Dehallucinative Instruction Generation: Know What You Don't Know

http://arxiv.org/abs/2402.09717v1

Compressor summary: The study introduces a new concept of "I Know" hallucination in image question-answering and proposes a benchmark, instructions database, and methods to reduce it.


Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

http://arxiv.org/abs/2402.09712v1

Compressor summary: This paper shows how diffusion models with cross-attention can help learn disentangled representations from images without complex designs or additional regularization.


Node Duplication Improves Cold-start Link Prediction

http://arxiv.org/abs/2402.09711v1

Compressor summary: This paper introduces NodeDup, an augmentation technique for graph neural networks that improves their link prediction performance on low-degree nodes by duplicating them and creating links with the duplicates.
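
The augmentation itself is simple to sketch: duplicate each low-degree node and add an edge between the node and its copy, giving cold nodes extra supervision during link-prediction training (feature copying and the paper's other variants are omitted):

```python
import torch

def node_dup(edge_index, num_nodes, degree_threshold=2):
    """Duplicate cold nodes and link each one to its duplicate (sketch)."""
    deg = torch.bincount(edge_index[0], minlength=num_nodes)
    cold = (deg < degree_threshold).nonzero(as_tuple=True)[0]
    dup_ids = torch.arange(len(cold)) + num_nodes     # ids of the new copies
    fwd = torch.stack([cold, dup_ids])                # node -> duplicate
    bwd = torch.stack([dup_ids, cold])                # keep edges symmetric
    edge_index = torch.cat([edge_index, fwd, bwd], dim=1)
    return edge_index, num_nodes + len(cold)
```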


Sparse and Faithful Explanations Without Sparse Models

http://arxiv.org/abs/2402.09702v1

Compressor summary: The Sparse Explanation Value (SEV) measures how few features are needed to explain a decision in machine learning models, and the paper proposes algorithms to reduce SEV for more interpretable explanations.


An Analysis of Language Frequency and Error Correction for Esperanto

http://arxiv.org/abs/2402.09696v1

Compressor summary: The authors create a new dataset for Esperanto grammar error correction and show that GPT-4 performs better than GPT-3.5 in detecting errors in this low-resource language.


Reward Poisoning Attack Against Offline Reinforcement Learning

http://arxiv.org/abs/2402.09695v1

Compressor summary: The paper proposes a black-box reward poisoning attack on general offline reinforcement learning with deep neural networks, which makes low-performing policies look high-performing and vice versa.


Seed Optimization with Frozen Generator for Superior Zero-shot Low-light Enhancement

http://arxiv.org/abs/2402.09694v1

Compressor summary: The paper proposes a low-light image enhancement method that uses pre-trained generators and a novel optimization strategy to produce high-quality images without training on low-light datasets.


HyperMagNet: A Magnetic Laplacian based Hypergraph Neural Network

http://arxiv.org/abs/2402.09676v1

Compressor summary: HyperMagNet is a hypergraph neural network that uses a non-reversible Markov chain to represent hypergraphs and a complex Hermitian Laplacian matrix for node classification tasks.
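
For context, the magnetic Laplacian this builds on is usually defined, for a directed graph with adjacency matrix $A$ and charge parameter $q \ge 0$, via a symmetrized adjacency and a direction-encoding phase (how the paper lifts this to hypergraphs through its Markov chain is in the paper itself):

```latex
A_s = \tfrac{1}{2}\,(A + A^{\top}), \qquad
\Theta^{(q)}_{uv} = 2\pi q \,(A_{uv} - A_{vu}), \qquad
L^{(q)} = D_s - A_s \odot \exp\!\big(i\,\Theta^{(q)}\big)
```

where $D_s$ is the degree matrix of $A_s$ and $\odot$ is the elementwise product.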


PAL: Proxy-Guided Black-Box Attack on Large Language Models

http://arxiv.org/abs/2402.09674v1

Compressor summary: The paper introduces PAL, a black-box attack on large language models that achieves high success rates, and other related techniques for testing and improving LLM safety.


Exploiting Alpha Transparency In Language And Vision-Based AI Systems

http://arxiv.org/abs/2402.09671v1

Compressor summary: The text describes a new attack method that uses the alpha layer of PNG images to fool AI vision systems across various domains, requiring retraining and architectural changes for mitigation.


How to Train Data-Efficient LLMs

http://arxiv.org/abs/2402.09668v1

Compressor summary: The paper explores efficient ways to pre-train large language models by using techniques that balance model quality and resource/data usage, such as Ask-LLM (quality assessment) and Density (diverse sampling).


EntailE: Introducing Textual Entailment in Commonsense Knowledge Graph Completion

http://arxiv.org/abs/2402.09666v1

Compressor summary: The paper proposes a method called EntailE to improve commonsense knowledge graph construction by using textual entailment to find implicit relations between nodes with similar plausibility, densifying the graph and enriching node representations.


Hand Shape and Gesture Recognition using Multiscale Template Matching, Background Subtraction and Binary Image Analysis

http://arxiv.org/abs/2402.09663v1

Compressor summary: The paper proposes a simple and effective method to classify hand shapes using multiscale template matching and background subtraction.


User Modeling and User Profiling: A Comprehensive Survey

http://arxiv.org/abs/2402.09660v1

Compressor summary: This paper surveys user modeling and profiling techniques in AI systems, highlighting their evolution, current trends, challenges, and applications in various domains.


The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse

http://arxiv.org/abs/2402.09656v1

Compressor summary: This paper studies how editing Large Language Models (LLMs) can cause performance degradation or even model collapse, and proposes using perplexity as a faster alternative to benchmarking LLMs after each edit. It also introduces a new dataset for future research on this topic.
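
Using perplexity as the post-edit health check is cheap because it is just the exponentiated mean token negative log-likelihood; a minimal reference implementation:

```python
import torch
import torch.nn.functional as F

def perplexity(logits, labels):
    """exp(mean token NLL) for logits (B, T, V) and labels (B, T)."""
    nll = F.cross_entropy(logits.flatten(0, 1), labels.flatten())
    return nll.exp().item()
```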


GPT-4's assessment of its performance in a USMLE-based case study

http://arxiv.org/abs/2402.09654v1

Compressor summary: The study explores how feedback affects GPT-4's confidence in answering USMLE questions for healthcare applications.


Foul prediction with estimated poses from soccer broadcast video

http://arxiv.org/abs/2402.09650v1

Compressor summary: Key points:
- Computer vision advances can track and estimate poses of sports players, but predicting soccer fouls is challenging
- Research introduces deep learning approach to anticipate soccer fouls using video data, bounding box positions, image details, and pose information
- Model combines CNNs and RNNs to merge information from four modalities
- Experimental results show that all components of the model are useful for foul prediction

Summary: The research presents a novel deep learning method that uses video data and pose information to predict soccer fouls, integrating CNNs and RNNs to combine multiple modalities effectively.


Answer is All You Need: Instruction-following Text Embedding via Answering the Question

http://arxiv.org/abs/2402.09642v1

Compressor summary: The paper presents InBedder, a text embedder that uses abstractive question answering to capture user-specified characteristics of texts and follows instructions better than previous approaches, with high interpretability.


Multi-Fidelity Methods for Optimization: A Survey

http://arxiv.org/abs/2402.09638v1

Compressor summary: This survey explores multi-fidelity optimization (MFO), a cost-effective strategy for black-box optimization using hierarchical fidelity models, across various domains and highlights its challenges and prospects.


VisIRNet: Deep Image Alignment for UAV-taken Visible and Infrared Image Pairs

http://arxiv.org/abs/2402.09635v1

Compressor summary: The paper presents a deep learning method for aligning UAV images without using Lucas-Kanade based techniques, achieving superior performance by predicting image corners or homography matrices with a two-branch CNN.


MiMiC: Minimally Modified Counterfactuals in the Representation Space

http://arxiv.org/abs/2402.09631v1

Compressor summary: The paper proposes new methods for improving language models by making their representations less biased and less toxic, using techniques based on the Earth Mover's problem and Gaussian assumptions.