arxiv compressed, 2024-02-28

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-28, generated by the compressor, my personal LLM-based project.


ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

http://arxiv.org/abs/2402.17766v1

Compressor summary: ShapeLLM is a 3D LLM that understands objects from multiple views and interacts with them using language.


The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

http://arxiv.org/abs/2402.17764v1

Compressor summary: The text introduces BitNet b1.58, a 1-bit Large Language Model that achieves similar performance to full-precision models while being more cost-effective and enabling new hardware optimizations.
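
The core trick behind the 1.58-bit claim is constraining every weight to {-1, 0, +1} (log2 3 ≈ 1.58 bits per parameter). Below is a minimal sketch of the absmean ternary quantizer described in the report, paired with the usual straight-through estimator for training; details of the actual implementation may differ:

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Absmean ternary quantization to {-1, 0, +1} (sketch of the scheme
    described in the BitNet b1.58 report)."""
    scale = w.abs().mean().clamp(min=eps)   # per-tensor absmean scale
    w_q = (w / scale).round().clamp(-1, 1)  # ternary weights
    return w_q, scale                       # forward pass uses w_q * scale

# Straight-through estimator: quantized values in the forward pass,
# gradients flow to the full-precision latent weights.
w = torch.randn(4, 4, requires_grad=True)
w_q, s = ternary_quantize(w)
w_ste = w + (w_q * s - w).detach()
```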


Massive Activations in Large Language Models

http://arxiv.org/abs/2402.17762v1

Compressor summary: The paper studies how large language models have a few extremely large activation values that affect their behavior and introduce biases.


Towards Optimal Learning of Language Models

http://arxiv.org/abs/2402.17759v1

Compressor summary: The paper proposes a theory for optimal learning of language models by maximizing data compression ratio and shows that it leads to faster learning and improved performance in experiments.


ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living

http://arxiv.org/abs/2402.17758v1

Compressor summary: ADL4D is a large 4D HOI dataset with multiple subjects, objects, and actions for learning hand interactions in daily activities.


Robustly Learning Single-Index Models via Alignment Sharpness

http://arxiv.org/abs/2402.17756v1

Compressor summary: The paper gives an efficient algorithm for agnostically learning Single-Index Models under the $L_2^2$ loss, achieving a constant-factor approximation to the optimal loss for a broad family of distributions and link functions (the first efficient approximate learner for Gaussian data with nontrivial link functions), and introduces a new local error bound notion called alignment sharpness.
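
For context, the standard agnostic single-index setup the summary refers to can be written as follows (notation mine, not the paper's):

```latex
% Labels depend on x only through one direction w, via an unknown link
% \sigma; in the agnostic model no such pair need fit the data exactly.
\[
  \mathrm{err}(h) \;=\; \mathbb{E}_{(x,y)\sim D}\big[(h(x)-y)^2\big],
  \qquad
  \mathrm{OPT} \;=\; \min_{\sigma,\;\|w\|_2 = 1}
      \mathrm{err}\big(\sigma(\langle w, x\rangle)\big).
\]
% The paper's guarantee: an efficiently computable \hat{h} with
% \mathrm{err}(\hat{h}) \le C \cdot \mathrm{OPT} for an absolute constant C.
```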


Evaluating Very Long-Term Conversational Memory of LLM Agents

http://arxiv.org/abs/2402.17753v1

Compressor summary: The text introduces a machine-human pipeline for generating very long-term open-domain dialogues, evaluates the performance of large language models on various tasks, and presents a dataset (LoCoMo) with 35 sessions of conversations.


When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning

http://arxiv.org/abs/2402.17747v1

Compressor summary: Partial observability of human evaluators in reinforcement learning can lead to deception and overjustification, which the authors address by studying ambiguity in the learned return function.


Analyzing Regional Organization of the Human Hippocampus in 3D-PLI Using Contrastive Learning and Geometric Unfolding

http://arxiv.org/abs/2402.17744v1

Compressor summary: The authors propose a novel method to analyze 3D polarized light images of the human brain's hippocampus using unfolding methods and self-supervised contrastive learning, which can identify classical subfield boundaries.


reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use

http://arxiv.org/abs/2402.17739v1

Compressor summary: reBandit is an online RL algorithm that uses random effects and Bayesian priors to deliver personalized mobile health interventions for reducing cannabis use among emerging adults.


Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

http://arxiv.org/abs/2402.17733v1

Compressor summary: The authors propose a method to adapt large language models for multiple tasks in translation workflows by combining pretraining on multilingual data with finetuning on task-specific instructions, achieving competitive results compared to general-purpose models and releasing new datasets and evaluation tools.


Markovletics: Methods and A Novel Application for Learning Continuous-Time Markov Chain Mixtures

http://arxiv.org/abs/2402.17730v1

Compressor summary: The text introduces a new method for learning mixtures of continuous-time Markov chains (CTMCs) from sequential data, explores its impact on learnability, and applies it to analyze user preferences on social media and NBA team tactics.


Towards Fairness-Aware Adversarial Learning

http://arxiv.org/abs/2402.17729v1

Compressor summary: The paper proposes a new adversarial training method, FAAL, that ensures both robustness and fairness of models across different categories by finding the worst distribution among them.


VRP-SAM: SAM with Visual Reference Prompt

http://arxiv.org/abs/2402.17726v1

Compressor summary: The paper introduces a novel Visual Reference Prompt encoder that helps the Segment Anything Model use annotated reference images as prompts for object segmentation, achieving state-of-the-art performance with minimal parameters and generalizing well to unseen objects and domains.


Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

http://arxiv.org/abs/2402.17723v1

Compressor summary: The paper presents a framework for joint video-audio generation that bridges pre-trained single-modality models with a multimodal latent aligner built on ImageBind, and demonstrates superior performance on a range of vision and audio tasks.


The SMART approach to instance-optimal online learning

http://arxiv.org/abs/2402.17720v1

Compressor summary: SMART is an online learning algorithm that adapts to data and achieves near-optimal regret by switching between follow-the-leader and worst-case policies.
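
The one-line description hides the mechanism; here is a sketch of my reading of the single-switch idea, with Hedge standing in as the worst-case algorithm (the paper's actual switching rule, threshold, and guarantees are more careful than this):

```python
import numpy as np

def ftl_with_fallback(loss_matrix: np.ndarray, threshold: float, seed: int = 0):
    """Single-switch scheme in the spirit of SMART (illustrative sketch):
    play follow-the-leader while its realized regret stays small, then
    switch irrevocably to a worst-case no-regret algorithm (Hedge)."""
    rng = np.random.default_rng(seed)
    T, K = loss_matrix.shape
    cum = np.zeros(K)                  # cumulative loss of each expert
    w = np.ones(K)                     # Hedge weights, maintained throughout
    eta = np.sqrt(np.log(K) / T)
    total, switched = 0.0, False
    for t in range(T):
        if switched:
            a = rng.choice(K, p=w / w.sum())  # Hedge: sample from weights
        else:
            a = int(np.argmin(cum))           # follow-the-leader
        total += loss_matrix[t, a]
        cum += loss_matrix[t]
        w *= np.exp(-eta * loss_matrix[t])
        if not switched and total - cum.min() > threshold:
            switched = True                   # FTL regret too high
    return total, switched

total, switched = ftl_with_fallback(np.random.rand(500, 10), threshold=30.0)
```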


Towards a Digital Twin Framework in Additive Manufacturing: Machine Learning and Bayesian Optimization for Time Series Process Optimization

http://arxiv.org/abs/2402.17718v1

Compressor summary: The text proposes a digital twin framework that uses machine learning and optimization techniques to predict and control heat accumulation in laser-based additive manufacturing, improving material properties and part quality.


AmbigNLG: Addressing Task Ambiguity in Instruction for NLG

http://arxiv.org/abs/2402.17717v1

Compressor summary: AmbigNLG is a new task that tackles task ambiguity in instructions for natural language generation tasks by creating a dataset and taxonomy to improve instruction clarity and LLM performance.


Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers

http://arxiv.org/abs/2402.17710v1

Compressor summary: The paper proposes ProxConnect++, a principled method for neural network binarization with automatic theoretical guarantees and enhanced performance in image classification tasks.


Case-Based or Rule-Based: How Do Transformers Do the Math?

http://arxiv.org/abs/2402.17709v1

Compressor summary: The text discusses the difference between rule-based and case-based reasoning in language models, and proposes a method (RFFT) to improve math problem-solving by teaching transformers to follow explicit rules.


Adaptive quantization with mixed-precision based on low-cost proxy

http://arxiv.org/abs/2402.17706v1

Compressor summary: The paper introduces LCPAQ, a fast and effective method to quantize neural networks for low-resource hardware by using mixed-precision, adaptive quantization, and low-cost proxy search.


RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

http://arxiv.org/abs/2402.17700v1

Compressor summary: RAVEL is a dataset for comparing interpretability methods in neural networks and introduces MDAS, a new method that finds distributed representations satisfying multiple causal criteria.


Gradient-based Discrete Sampling with Automatic Cyclical Scheduling

http://arxiv.org/abs/2402.17699v1

Compressor summary: The paper proposes a cyclical scheduling method to efficiently and accurately sample multimodal discrete distributions in deep models, overcoming the limitations of gradient-based discrete sampling.
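
The paper's schedule is tuned automatically, but the underlying idea mirrors the cyclical step sizes used in cyclical SGMCMC: large steps early in each cycle to jump between modes, small steps late to refine within one. A sketch (hyperparameters here are illustrative):

```python
import math

def cyclical_step_size(t: int, total_steps: int, n_cycles: int,
                       max_step: float, min_step: float = 0.0) -> float:
    """Cosine cyclical schedule (as in cyclical SGMCMC; the paper's
    automatic variant tunes the schedule rather than fixing it)."""
    cycle_len = math.ceil(total_steps / n_cycles)
    pos = (t % cycle_len) / cycle_len   # position within the current cycle
    return min_step + 0.5 * (max_step - min_step) * (1 + math.cos(math.pi * pos))

# Large early-cycle steps encourage jumps between modes of the discrete
# distribution; small late-cycle steps refine samples within a mode.
sizes = [cyclical_step_size(t, 1000, 5, 0.5) for t in range(1000)]
```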


Autonomous Vehicles: Evolution of Artificial Intelligence and Learning Algorithms

http://arxiv.org/abs/2402.17690v1

Compressor summary: The paper explores how AI and learning algorithms have evolved and are used in autonomous vehicles, influencing their decision-making, development life cycle, ethical considerations, and performance improvement.


QoS prediction in radio vehicular environments via prior user information

http://arxiv.org/abs/2402.17689v1

Compressor summary: The paper evaluates machine learning tree-ensemble methods for predicting wireless communication quality in the automotive industry, using data from a cellular test network and radio environment characteristics to improve accuracy and support longer prediction horizons.


NextLevelBERT: Investigating Masked Language Modeling with Higher-Level Representations for Long Documents

http://arxiv.org/abs/2402.17682v1

Compressor summary: NextLevelBERT uses higher-level semantic text embeddings to improve language models' performance on long-document tasks without sacrificing much detail.


MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning

http://arxiv.org/abs/2402.17680v1

Compressor summary: The paper presents a method to mitigate catastrophic forgetting in class-incremental multimodal video captioning, using fine-grained sensitivity selection and two-stage knowledge distillation to retain old-task information, and introduces a metric for the forgetting rate, evaluated on the MSR-VTT dataset.


CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention

http://arxiv.org/abs/2402.17678v1

Compressor summary: CAD-SIGNet is a model that can recover the design history and provide multiple plausible design choices for CAD models given 3D scans of physical objects.


SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification

http://arxiv.org/abs/2402.17672v1

Compressor summary: The paper introduces a new deep learning method for PolSAR image classification that fuses three branches and outperforms existing approaches on several datasets.


Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models

http://arxiv.org/abs/2402.17671v1

Compressor summary: This text discusses recent progress in improving the reliability and trustworthiness of foundation models within in-context learning frameworks, addressing issues such as toxicity, hallucination, disparity, adversarial vulnerability, and inconsistency.


Multi-Agent Deep Reinforcement Learning for Distributed Satellite Routing

http://arxiv.org/abs/2402.17666v1

Compressor summary: The paper presents a multi-agent deep learning method for satellite routing in low Earth orbit constellations, using a global neural network to learn optimal paths and local ones for fast on-board routing.


Bayesian Differentiable Physics for Cloth Digitalization

http://arxiv.org/abs/2402.17664v1

Compressor summary: The paper proposes a new method for digitalizing cloth using data from strict measuring protocols, a new dataset, and a Bayesian differentiable cloth model.


TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations

http://arxiv.org/abs/2402.17660v1

Compressor summary: The paper presents advancements in TorchMD-Net, a neural network-based molecular simulation software with improved efficiency, modular design, and physical priors integration.


Confidence-Aware Multi-Field Model Calibration

http://arxiv.org/abs/2402.17655v1

Compressor summary: The paper proposes a confidence-aware multi-field calibration method for ad ranking that adjusts the calibration intensity based on sample statistics and uses multiple feature fields to mitigate data sparsity.


Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data

http://arxiv.org/abs/2402.17653v1

Compressor summary: The text describes a segmentation network that can detect errors caused by different test domains without extra annotation, using uncurated data and a novel benchmark based on the SAX Dataset.


Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs

http://arxiv.org/abs/2402.17649v1

Compressor summary: Large language models exhibit left-leaning political worldviews that vary by policy domain and become more reliable with increasing model size.


Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

http://arxiv.org/abs/2402.17644v1

Compressor summary: The QRData benchmark evaluates Large Language Models' ability to reason with quantitative data from real-world sources, finding that current models struggle with causal reasoning and using both data and code simultaneously.


Variational Learning is Effective for Large Deep Networks

http://arxiv.org/abs/2402.17641v1

Compressor summary: IVON is an effective alternative to Adam for large neural networks, offering comparable computational cost and better predictive uncertainty, and it can improve various tasks such as fine-tuning, model merging, generalization error prediction, and sensitivity estimation.


From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions

http://arxiv.org/abs/2402.17633v1

Compressor summary: The paper introduces YTSeg, a benchmark for spoken content segmentation, and MiniSeg, a model that performs hierarchical segmentation and smart chaptering.


Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks

http://arxiv.org/abs/2402.17630v1

Compressor summary: InFusE is a novel approach that uses variable premise size and simplifies summary sentences for diverse summarization tasks, outperforming existing methods based on off-the-shelf NLI models.


CustomSketching: Sketch Concept Extraction for Sketch-based Image Synthesis and Editing

http://arxiv.org/abs/2402.17624v1

Compressor summary: The paper proposes CustomSketching, a framework that extracts novel sketch concepts from sketch-image pairs to enable fine-grained image synthesis and editing for large text-to-image models.


Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling

http://arxiv.org/abs/2402.17622v1

Compressor summary: The paper presents a semantic segmentation network that uses foundation models and Masked Image Modeling to produce high-quality uncertainty estimates for safety-critical applications, and shows its effectiveness on the SAX Segmentation benchmark.


Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation

http://arxiv.org/abs/2402.17614v1

Compressor summary: The authors propose a new approach for cross-domain few-shot segmentation that uses task-adaption and consistency across augmented views without training or using a main segmentation network, achieving state-of-the-art performance.


Neural Automated Writing Evaluation with Corrective Feedback

http://arxiv.org/abs/2402.17613v1

Compressor summary: The paper proposes an integrated system for assessing writing and correcting errors using NLP and machine learning to help second language learners improve their proficiency efficiently and cost-effectively.


A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images

http://arxiv.org/abs/2402.17611v1

Compressor summary: This paper evaluates various pretraining methods for detecting defects in solar cells using electroluminescence images and finds that supervised, semi-supervised, and self-supervised pretraining schemes yield similar performance, while some are better for underrepresented classes.


Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It)

http://arxiv.org/abs/2402.17608v1

Compressor summary: The paper examines how fine-tuning pre-trained Encoder-Decoder models like T5 on linguistic tasks improves their ability to predict sentence complexity in Italian and English.


Learning Topological Representations with Bidirectional Graph Attention Network for Solving Job Shop Scheduling Problem

http://arxiv.org/abs/2402.17606v1

Compressor summary: This paper introduces TBGAT, a novel GNN architecture for job shop scheduling, which embeds disjunctive graphs in a forward and backward view and uses topological sorts as features to capture the graph topology.


Advancing sleep detection by modelling weak label sets: A novel weakly supervised learning approach

http://arxiv.org/abs/2402.17601v1

Compressor summary: The study presents a novel method for sleep detection using weakly supervised learning, which outperforms conventional methods and improves calibration accuracy.


Implicit Regularization via Spectral Neural Networks and Non-linear Matrix Sensing

http://arxiv.org/abs/2402.17595v1

Compressor summary: This paper explores implicit regularization in non-linear neural networks for matrix sensing problems, proposes Spectral Neural Networks (SNN) architecture, and shows its effectiveness with theoretical analysis and experiments.


PLReMix: Combating Noisy Labels with Pseudo-Label Relaxed Contrastive Representation Learning

http://arxiv.org/abs/2402.17589v1

Compressor summary: The paper proposes an end-to-end contrastive learning framework for learning with noisy labels that improves performance by using a Pseudo-Label Relaxed loss and a two-dimensional Gaussian Mixture Model.


Hyperdimensional computing: a fast, robust and interpretable paradigm for biological data

http://arxiv.org/abs/2402.17572v1

Compressor summary: Hyperdimensional computing is an efficient and interpretable alternative to deep learning for bioinformatics, using high-dimensional vectors to represent biological concepts and simple operators to manipulate them.
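
For readers new to the paradigm, the three core HDC operators (random hypervectors, binding, bundling) fit in a few lines; the gene/state encoding below is a made-up illustration, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000

def random_hv():            # random bipolar hypervector; any two are
    return rng.choice([-1, 1], size=DIM)   # near-orthogonal in high dim

def bind(a, b):             # elementwise multiply: associates two concepts
    return a * b

def bundle(*hvs):           # elementwise sign of sum: superposes concepts
    return np.sign(np.sum(hvs, axis=0))

def similarity(a, b):       # normalized dot product (cosine for bipolar)
    return float(a @ b) / DIM

# Toy record (names are illustrative): two gene/state associations.
gene_a, gene_b, up, down = (random_hv() for _ in range(4))
record = bundle(bind(gene_a, up), bind(gene_b, down))
# Unbinding with gene_a recovers something close to "up", not "down".
print(similarity(bind(record, gene_a), up))    # clearly positive (~0.5)
print(similarity(bind(record, gene_a), down))  # near 0
```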


Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization

http://arxiv.org/abs/2402.17574v1

Compressor summary: Agent-Pro is a large language model agent that uses policy-level reflection and optimization to learn from interactive experiences and improve its behavior in complex dynamic scenarios like games.


Sparse Variational Contaminated Noise Gaussian Process Regression for Forecasting Geomagnetic Perturbations

http://arxiv.org/abs/2402.17570v1

Compressor summary: The paper introduces a new method for Gaussian Processes using contaminated normal noise to handle heteroscedastic variance and outliers, and shows its advantages over neural networks in predicting geomagnetic ground perturbations.


Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers

http://arxiv.org/abs/2402.17564v1

Compressor summary: The paper proposes a new method called GPO that uses gradient-based optimization techniques to improve the performance of language models as prompt optimizers.


Structure-Guided Adversarial Training of Diffusion Models

http://arxiv.org/abs/2402.17563v1

Compressor summary: SADM is a new diffusion model that learns structural relationships among samples in the data distribution using adversarial training, improving generative performance on various tasks.


An Empirical Study of the Generalization Ability of Lidar 3D Object Detectors to Unseen Domains

http://arxiv.org/abs/2402.17562v1

Compressor summary: The text discusses the factors that influence the robustness of 3D object detectors, focusing on architecture, voxel encoding, data augmentations, and anchor strategies, and how they can be improved for different domains such as sensor type, weather, and location.


PHNet: Patch-based Normalization for Portrait Harmonization

http://arxiv.org/abs/2402.17561v1

Compressor summary: The paper proposes a patch-based harmonization network for composite images, which improves local visual coherence and achieves state-of-the-art results on iHarmony4 and a new human portrait dataset.


Scribble Hides Class: Promoting Scribble-Based Weakly-Supervised Semantic Segmentation with Its Class Label

http://arxiv.org/abs/2402.17555v1

Compressor summary: The paper proposes a new method for semantic segmentation using scribble annotations and pseudo-labels, which considers both local and global cues, corrects feature representations, and reduces uncertainty with a distance entropy loss.


Evaluation of Predictive Reliability to Foster Trust in Artificial Intelligence. A case study in Multiple Sclerosis

http://arxiv.org/abs/2402.17554v1

Compressor summary: The authors propose a method to assess the reliability of Machine Learning predictions in critical contexts like medicine, using Autoencoders and proxy models, and demonstrate its effectiveness on a Multiple Sclerosis prediction model.


OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

http://arxiv.org/abs/2402.17553v1

Compressor summary: The paper introduces OmniACT, a dataset for testing virtual agents' ability to generate executable programs from screen images and natural language tasks, with GPT-4 as the current best baseline but still far from human proficiency.


COCOA: CBT-based Conversational Counseling Agent using Memory Specialized in Cognitive Distortions and Dynamic Prompt

http://arxiv.org/abs/2402.17546v1

Compressor summary: The paper presents CoCoA, a psychological counseling agent that uses CBT techniques, memory management, and dynamic prompting to address cognitive distortions in client statements.


Retrieval is Accurate Generation

http://arxiv.org/abs/2402.17532v1

Compressor summary: The paper proposes a novel method to generate text by selecting context-aware phrases from supporting documents, overcoming challenges in training oracles, and showing improved performance on knowledge-intensive tasks and open-ended text generation.


Predict the Next Word:

http://arxiv.org/abs/2402.17527v1

Compressor summary: The authors use word-level exact matching to test how well language models capture human variability in next-word prediction, finding that the models are poorly calibrated to human uncertainty and that ECE fails to reflect this.
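
Since the summary hinges on ECE being unreliable here, it helps to see what ECE actually measures; a standard implementation (not the paper's code):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Standard ECE: bin predictions by confidence and average the gap
    between mean confidence and empirical accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# ECE compares confidence to top-1 accuracy only, which is why it can look
# fine even when the model's full distribution mismatches human variability.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))
```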


Diffusion Model-Based Image Editing: A Survey

http://arxiv.org/abs/2402.17525v1

Compressor summary: The text provides a comprehensive survey of diffusion models for image editing, covering various methods and tasks, and introduces a benchmark and metric for evaluating them.


AVS-Net: Point Sampling with Adaptive Voxel Size for 3D Point Cloud Analysis

http://arxiv.org/abs/2402.17521v1

Compressor summary: The paper introduces an advanced sampler that efficiently downsamples point clouds using adaptive voxel sizes and a network compatible with arbitrary voxel sizes, achieving high accuracy on ShapeNetPart and ScanNet benchmarks.


Label-Noise Robust Diffusion Models

http://arxiv.org/abs/2402.17517v1

Compressor summary: The paper proposes TDSM, a method to train conditional diffusion models with noisy labels by incorporating transition probabilities and improving the quality of generated samples.


QUCE: The Minimisation and Quantification of Path-Based Uncertainty for Generative Counterfactual Explanations

http://arxiv.org/abs/2402.17516v1

Compressor summary: QUCE is a method that improves interpretability of DNNs by minimizing path uncertainty and generating certain counterfactual examples.


Robust Unsupervised Crowd Counting and Localization with Adaptive Resolution SAM

http://arxiv.org/abs/2402.17514v1

Compressor summary: The paper proposes an adaptive resolution SEEM method with robust localization, point pseudo-labels, and a loss function to improve crowd counting performance without extensive training data.


Latent Attention for Linear Time Transformers

http://arxiv.org/abs/2402.17512v1

Compressor summary: The "Latte Transformer" model improves the time complexity of the standard attention mechanism by using latent vectors, enabling efficient language generation for large context windows.


Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning

http://arxiv.org/abs/2402.17510v1

Compressor summary: The paper injects synthetic shortcuts into image-caption data to show that the contrastive losses used to train vision-language models are insufficient for learning task-optimal representations, and proposes two ways to reduce shortcut learning: latent target decoding and implicit feature modification.


Interactive Multi-Head Self-Attention with Linear Complexity

http://arxiv.org/abs/2402.17507v1

Compressor summary: The paper proposes an efficient method to enhance information flow in multi-head self-attention by decomposing the attention operation into query- and key-less components, which reduces computational complexity while maintaining performance.


Intensive Care as One Big Sequence Modeling Problem

http://arxiv.org/abs/2402.17501v1

Compressor summary: The paper introduces Healthcare as Sequence Modeling, a paradigm for reinforcement learning in healthcare that uses event streams to represent interactions between patients and providers, and presents MIMIC-SEQ, a new benchmark dataset based on clinical records from MIMIC-IV.


REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering

http://arxiv.org/abs/2402.17497v1

Compressor summary: REAR is a new approach for question answering that helps large language models assess the relevance of retrieved documents to improve their performance.


Prescribing Large Language Models for Perioperative Care: What's The Right Dose for Pre-trained Models?

http://arxiv.org/abs/2402.17493v1

Compressor summary: Large language models can predict postoperative risks using clinical texts with various training strategies, improving performance compared to traditional word embeddings and offering opportunities for personalized perioperative care.


Bit Rate Matching Algorithm Optimization in JPEG-AI Verification Model

http://arxiv.org/abs/2402.17487v1

Compressor summary: Neural network-based image compression outperforms classical methods and has led to JPEG-AI standardization; the paper gradually optimizes the bit rate matching algorithm of the current verification model, improving its speed and performance.


MGE: A Training-Free and Efficient Model Generation and Enhancement Scheme

http://arxiv.org/abs/2402.17486v1

Compressor summary: The paper presents a method to efficiently generate and enhance deep learning models without training, achieving comparable or better performance and few-shot learning capabilities, as well as possible adversarial defense.


EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

http://arxiv.org/abs/2402.17485v1

Compressor summary: EMO is a new framework for generating talking head videos that captures audio cues and facial movements more accurately and expressively than previous methods.


AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

http://arxiv.org/abs/2402.17483v1

Compressor summary: The paper proposes AlignMiF, a method to align LiDAR and camera data within a multimodal implicit field, improving novel view synthesis performance.


Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles

http://arxiv.org/abs/2402.17478v1

Compressor summary: The paper introduces ArPro, a large propaganda detection dataset for multiple languages, and evaluates GPT-4's performance on fine-grained propaganda detection tasks.


Fraud Detection with Binding Global and Local Relational Interaction

http://arxiv.org/abs/2402.17472v1

Compressor summary: RAGFormer is a novel framework that combines local and global features for fraud detection using GNN and Transformer networks, outperforming previous methods on heterogeneous graphs.


Bit Distribution Study and Implementation of Spatial Quality Map in the JPEG-AI Standardization

http://arxiv.org/abs/2402.17470v1

Compressor summary: The text discusses neural network-based image compression codecs and proposes a method to improve the JPEG-AI verification model's bit distribution by adopting VVC intra's adaptable bit distribution structure.


Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing

http://arxiv.org/abs/2402.17464v1

Compressor summary: The paper proposes a part-whole hierarchy message passing network for efficient 3D part assembly, using super-parts to provide hints about part poses and enable interpretability.


Training-Free Long-Context Scaling of Large Language Models

http://arxiv.org/abs/2402.17463v1

Compressor summary: Dual Chunk Attention improves LLMs' ability to handle long context sequences without finetuning, enabling better performance on practical tasks and as an open-source alternative to proprietary models.


Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning

http://arxiv.org/abs/2402.17457v1

Compressor summary: The study finds that learning rate transfer in neural networks can be explained by the consistency of sharpness across model sizes, which is related to feature learning dynamics.


DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

http://arxiv.org/abs/2402.17453v1

Compressor summary: DS-Agent is a novel framework that combines large language models and case-based reasoning to automate data science tasks, improving task success rates and reducing costs.


Deep Learning Based Named Entity Recognition Models for Recipes

http://arxiv.org/abs/2402.17447v1

Compressor summary: The paper compares named entity recognition (NER) models for identifying recipe ingredients in unstructured text, using three datasets with different levels of annotation, and finds spaCy-transformer to be the best model, with high macro-F1 scores.


Principled Architecture-aware Scaling of Hyperparameters

http://arxiv.org/abs/2402.17440v1

Compressor summary: The paper proposes a method to optimize hyperparameters based on the network architecture, which improves generalization and affects auto ML comparisons.


Exploiting Emotion-Semantic Correlations for Empathetic Response Generation

http://arxiv.org/abs/2402.17437v1

Compressor summary: The text introduces a new model (ESCM) for generating empathetic responses that considers both emotions and semantics as dynamic variables in dialogue.


Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder

http://arxiv.org/abs/2402.17433v1

Compressor summary: The paper proposes a new model (CET-MAE) and a framework (E2T-PTR) that improve EEG-based language decoding for brain-computer interfaces by integrating self-supervised learning and pre-trained modules, achieving state-of-the-art results.


The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns

http://arxiv.org/abs/2402.17431v1

Compressor summary: KANDY is a benchmarking framework that generates diverse learning and reasoning tasks inspired by Kandinsky patterns to test AI models' performance in continual and semi-supervised learning with symbol compositionality, challenging both neural and symbolic approaches.


Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

http://arxiv.org/abs/2402.17430v1

Compressor summary: MapQR is a method for constructing online vectorized maps in autonomous driving that enhances query capabilities using scatter-and-gather queries and exploits prior information to improve accuracy and efficiency.


VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction

http://arxiv.org/abs/2402.17427v1

Compressor summary: VastGaussian is a new method that improves 3D Gaussian Splatting for large scene reconstruction by using progressive partitioning, airspace-aware visibility, parallel optimization, and decoupled appearance modeling, achieving fast and high-quality results.


ViTaL: An Advanced Framework for Automated Plant Disease Identification in Leaf Images Using Vision Transformers and Linear Projection For Feature Reduction

http://arxiv.org/abs/2402.17424v1

Compressor summary: The paper presents a robust framework that uses Vision Transformers for feature extraction to identify plant diseases from leaf images, evaluates different linear projections for feature reduction, reports a top model with 0.054 Hamming loss, and proposes a low-cost Raspberry Pi-based hardware design for scanning leaves.


Reinforced In-Context Black-Box Optimization

http://arxiv.org/abs/2402.17423v1

Compressor summary: RIBBO learns a black-box optimization algorithm from data using expressive sequence models and regret-to-go tokens to generate query points that meet user-desired regret.


PANDAS: Prototype-based Novel Class Discovery and Detection

http://arxiv.org/abs/2402.17420v1

Compressor summary: PANDAS is a simple method to detect new classes and adapt an object detector to spot them using prototypes for old and new classes.


CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification

http://arxiv.org/abs/2402.17417v1

Compressor summary: CARZero is a novel approach for radiology zero-shot classification that uses cross-attention mechanisms and large language models to align image and text features and improve performance on chest radiograph diagnostic sets.


Neural Video Compression with Feature Modulation

http://arxiv.org/abs/2402.17414v1

Compressor summary: The paper proposes a conditional coding-based neural video codec that solves two critical problems and achieves superior performance over previous methods.


DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model

http://arxiv.org/abs/2402.17412v1

Compressor summary: DiffuseKronA is a novel Kronecker product-based adaptation module for text-to-image generation that reduces parameters by up to 99.95% and improves image quality while being more interpretable and less sensitive to hyperparameters.
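
The parameter savings come from factoring the weight update as a Kronecker product; a sketch of the idea (factor shapes, initialization, and placement are my assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    """Kronecker-product weight update, the core idea behind adapters such
    as DiffuseKronA (illustrative sketch). Two small factors A (a1 x a2)
    and B parameterize an (out_f x in_f) update with far fewer parameters
    than a dense delta; zero-init of A makes the initial update a no-op."""
    def __init__(self, base: nn.Linear, a1: int, a2: int):
        super().__init__()
        out_f, in_f = base.weight.shape
        assert out_f % a1 == 0 and in_f % a2 == 0
        self.base = base
        self.A = nn.Parameter(torch.zeros(a1, a2))
        self.B = nn.Parameter(torch.randn(out_f // a1, in_f // a2) * 0.01)

    def forward(self, x):
        delta = torch.kron(self.A, self.B)   # (out_f, in_f) update
        return self.base(x) + x @ delta.T

layer = KroneckerAdapter(nn.Linear(64, 64), a1=8, a2=8)
print(layer.A.numel() + layer.B.numel())  # 128 adapter params vs 4096 dense
```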


Consistency Matters: Explore LLMs Consistency From a Black-Box Perspective

http://arxiv.org/abs/2402.17411v1

Compressor summary: The text describes a dataset and baselines for LLM consistency research, using LightGBM as the base model to evaluate and improve NLP models.


A novel image space formalism of Fourier domain interpolation neural networks for noise propagation analysis

http://arxiv.org/abs/2402.17410v1

Compressor summary: The text proposes a method to analyze noise propagation in multi-layer CNNs for MRI reconstructions using an image space formalism, which allows accurate noise resilience characterization.


LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning

http://arxiv.org/abs/2402.17406v1

Compressor summary: LSPT is a novel approach to visual representation learning that leverages long-term gated prompts and patch tokens for improved performance on downstream tasks.


Sora Generates Videos with Stunning Geometrical Consistency

http://arxiv.org/abs/2402.17403v1

Compressor summary: The paper introduces a new benchmark for evaluating video generation models like Sora by assessing their ability to satisfy geometric constraints in 3D reconstruction, indicating their adherence to real-world physics principles.


Investigating Continual Pretraining in Large Language Models: Insights and Implications

http://arxiv.org/abs/2402.17400v1

Compressor summary: The paper explores continual domain-adaptive pretraining in large language models, introduces a new benchmark to measure adaptability, and reveals insights about model size, domain progression, and knowledge transfer.


Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies

http://arxiv.org/abs/2402.17396v1

Compressor summary: The paper benchmarks GPT-4 on three algorithmic tasks with controllable difficulty, showing that while large language models still lack systematic generalization, advanced prompting techniques boost GPT-4's performance on all tasks.


Spot the bot: Coarse-Grained Partition of Semantic Paths for Bots and Humans

http://arxiv.org/abs/2402.17392v1

Compressor summary: The paper compares human-written and bot-generated texts' semantic structures using n-grams and finds differences across four languages.


Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates

http://arxiv.org/abs/2402.17390v1

Compressor summary: The text discusses how machine learning models may experience negative flips when updated, affecting both accuracy and robustness to adversarial examples, and proposes a technique called robustness-congruent adversarial training to address this issue.


FairBelief - Assessing Harmful Beliefs in Language Models

http://arxiv.org/abs/2402.17389v1

Compressor summary: FairBelief is a method to analyze and assess the biases in language models that may negatively affect minorities and underrepresented groups.


Determinants of LLM-assisted Decision-Making

http://arxiv.org/abs/2402.17385v1

Compressor summary: The study explores the factors that influence decision-making with LLM support, presents a dependency framework that systematizes their possible interactions, and identifies trust, mental models, and information processing as key aspects for improving decision quality in human-AI collaboration.


KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark

http://arxiv.org/abs/2402.17377v1

Compressor summary: KoDialogBench is a benchmark for evaluating language models' conversational skills in Korean, revealing room for improvement and highlighting effective training techniques.


Accelerating Diffusion Sampling with Optimized Time Steps

http://arxiv.org/abs/2402.17376v1

Compressor summary: The proposed framework optimizes time steps for numerical ODE solvers in diffusion probabilistic models (DPMs) to improve image synthesis efficiency and quality by minimizing the distance between the ground-truth solution and the approximate solution.


Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching

http://arxiv.org/abs/2402.17372v1

Compressor summary: This paper proposes a new technique for matching point clouds using graph Laplacian eigenmaps and a new operator called Coupled Laplacian, which improves accuracy on object anomaly localization and bone side estimation tasks.


A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry

http://arxiv.org/abs/2402.17371v1

Compressor summary: The paper introduces a dataset of ancient Hebrew poetry with labeled metaphors for studying figurative language in the Humanities.


An Efficient MLP-based Point-guided Segmentation Network for Ore Images with Ambiguous Boundary

http://arxiv.org/abs/2402.17370v1

Compressor summary: The paper presents a fast and accurate method for segmenting ore images using a lightweight MLP framework with a feature pyramid network and a novel loss function.


Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

http://arxiv.org/abs/2402.17364v1

Compressor summary: The paper introduces DynTet, a hybrid representation that combines neural networks and tetrahedra grids to generate realistic and animatable head avatars with improved fidelity, lip synchronization, and real-time performance.


CAPT: Category-level Articulation Estimation from a Single Point Cloud Using Transformer

http://arxiv.org/abs/2402.17360v1

Compressor summary: The paper introduces CAPT, a transformer-based method for accurately and robustly estimating joint parameters and states of articulated objects from a single point cloud, with improved performance by using a motion loss approach and a double voting strategy.


SoFA: Shielded On-the-fly Alignment via Priority Rule Following

http://arxiv.org/abs/2402.17358v1

Compressor summary: The paper proposes a new alignment method for LLMs using priority rules and presents PriorityDistill, an approach to distill these rules from LLM simulations for robust rule integration.


RECOST: External Knowledge Guided Data-efficient Instruction Tuning

http://arxiv.org/abs/2402.17355v1

Compressor summary: RECOST is a framework that uses external knowledge to improve data-efficient instruction tuning by evaluating and selecting high-quality samples synthesized by LLMs.


ICP-Flow: LiDAR Scene Flow Estimation with ICP

http://arxiv.org/abs/2402.17351v1

Compressor summary: ICP-Flow is a learning-free method that uses histogram-based initialization and Iterative Closest Point algorithm to estimate rigid transformation between LiDAR scans for autonomous driving, achieving high performance and real-time inference.
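
ICP-Flow's contribution is the initialization and association scheme; the core it builds on is classic point-to-point ICP with the Kabsch/SVD rigid-transform solution, sketched here:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_rigid(src, dst, iters: int = 20):
    """Classic point-to-point ICP (sketch; ICP-Flow adds histogram-based
    initialization and per-cluster association on top of this core loop).
    Returns rotation R and translation t aligning src toward dst."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    for _ in range(iters):
        moved = src @ R.T + t
        _, idx = tree.query(moved)             # nearest-neighbor matches
        p, q = src, dst[idx]
        p_c, q_c = p - p.mean(0), q - q.mean(0)
        U, _, Vt = np.linalg.svd(p_c.T @ q_c)  # Kabsch solution
        R = (U @ Vt).T
        if np.linalg.det(R) < 0:               # guard against reflections
            Vt[-1] *= -1
            R = (U @ Vt).T
        t = q.mean(0) - p.mean(0) @ R.T
    return R, t

pts = np.random.default_rng(0).random((200, 3))
R, t = icp_rigid(pts, pts + np.array([0.05, 0.0, 0.0]))
print(np.round(t, 3))  # ~ [0.05, 0, 0]
```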


LocalGCL: Local-aware Contrastive Learning for Graphs

http://arxiv.org/abs/2402.17345v1

Compressor summary: LocalGCL (Local-aware Graph Contrastive Learning) is a self-supervised framework that counters contrastive learning's overemphasis on global patterns by capturing local graph information with masking-based modeling, outperforming existing methods as a graph representation learner.


Enhanced Bayesian Optimization via Preferential Modeling of Abstract Properties

http://arxiv.org/abs/2402.17343v1

Compressor summary: The paper proposes a human-AI collaboration method to improve Bayesian optimization by incorporating expert preferences for unmeasured properties in the surrogate modeling, handling biases, and discussing convergence behavior.


SocialCVAE: Predicting Pedestrian Trajectory via Interaction Conditioned Latents

http://arxiv.org/abs/2402.17339v1

Compressor summary: The paper proposes SocialCVAE, a model that predicts pedestrian trajectories using behavioral uncertainty and socially explainable interaction energy map to improve accuracy.


Unsupervised multiple choices question answering via universal corpus

http://arxiv.org/abs/2402.17333v1

Compressor summary: This paper presents a framework that generates synthetic multiple-choice question answering data without manual annotation, using named entities and knowledge graphs.


Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond

http://arxiv.org/abs/2402.17327v1

Compressor summary: The paper proposes a data selection method based on $k$-means clustering and sensitivity sampling that selects a small subset of data whose loss approximates the average loss of the whole dataset, and shows its effectiveness for fine-tuning foundation models and for linear regression.
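
The general recipe behind clustering-based sensitivity sampling is short enough to sketch; the paper's estimator and weighting are more refined than this illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def sensitivity_sample(X, k: int, m: int, seed: int = 0):
    """Clustering-based sensitivity sampling (generic sketch). Points far
    from their cluster center contribute more to the loss, so they are
    sampled with higher probability; importance weights keep the
    subset-loss estimate unbiased."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    cost = ((X - km.cluster_centers_[km.labels_]) ** 2).sum(axis=1)
    prob = 0.5 * cost / cost.sum() + 0.5 / len(X)  # mix with uniform
    idx = rng.choice(len(X), size=m, replace=False, p=prob)
    weights = 1.0 / (len(X) * prob[idx])           # importance weights
    return idx, weights

idx, w = sensitivity_sample(np.random.randn(1000, 16), k=10, m=100)
```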


SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

http://arxiv.org/abs/2402.17323v1

Compressor summary: The paper introduces SDDGR, a novel generative replay method for class incremental object detection that uses diffusion models and knowledge distillation to mitigate catastrophic forgetting and achieve state-of-the-art results.


A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track

http://arxiv.org/abs/2402.17319v1

Compressor summary: The report introduces UniNet, a vanilla framework for multi-task visual perception that combines DETR3D, Mask2Former, and BinsFormer with InternImage-L backbone, achieving a 49.6 overall score in the VCL Challenge.


Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation

http://arxiv.org/abs/2402.17316v1

Compressor summary: CEMA is a paradigm that allows online adaptation of edge models using forward propagation, limited data transmission, and distilled normalization layers.


SKT5SciSumm - A Hybrid Generative Approach for Multi-Document Scientific Summarization

http://arxiv.org/abs/2402.17311v1

Compressor summary: The paper introduces SKT5SciSumm, a hybrid framework for summarizing long scientific texts using Sentence-Transformer and T5 models, achieving state-of-the-art performance on Multi-XScience dataset.


Method of Tracking and Analysis of Fluorescent-Labeled Cells Using Automatic Thresholding and Labeling

http://arxiv.org/abs/2402.17310v1

Compressor summary: The paper proposes a new method for analyzing cell images in drug screening that tracks cells and quantifies cytoplasm-to-nuclei signal ratio using automatic thresholding and labeling algorithms.


Probing Multimodal Large Language Models for Global and Local Semantic Representation

http://arxiv.org/abs/2402.17304v1

Compressor summary: This study investigates if multimodal large language models can capture both global and local image information, finding that they excel at local object detection but struggle with global semantic understanding.


Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese

http://arxiv.org/abs/2402.17302v1

Compressor summary: The study explores using large language models to create commonsense question answering datasets for Indonesian and Sundanese, finding GPT-4 Turbo generates good questions in Indonesian but not Sundanese, and LLMs perform better on their own datasets.


ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks

http://arxiv.org/abs/2402.17298v1

Compressor summary: The study proposes a novel method called ArcSin to improve zero-shot cross-modal transfer for visual tasks by adaptively injecting noise into textual elements, expanding the domain generalization potential and preserving content integrity.


Learning Exposure Correction in Dynamic Scenes

http://arxiv.org/abs/2402.17296v1

Compressor summary: The authors construct a real-world paired video dataset covering underexposed and overexposed scenes and propose a Retinex-based Video Exposure Correction Network (VECNet) that outperforms existing methods on this less-explored task.


DivAvatar: Diverse 3D Avatar Generation with a Single Prompt

http://arxiv.org/abs/2402.17292v1

Compressor summary: DivAvatar generates diverse 3D avatars from text using a finetuned 3D generative model and novel techniques for appearance and geometry quality.


An Interpretable Evaluation of Entropy-based Novelty of Generative Models

http://arxiv.org/abs/2402.17287v1

Compressor summary: The paper proposes a spectral method, Kernel-based Entropic Novelty (KEN) score, to measure the novelty of multi-modal generative models compared to a reference dataset.


Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network

http://arxiv.org/abs/2402.17285v1

Compressor summary: The DMGASR method combines a Group-Autoencoder with a diffusion model to improve hyperspectral image super-resolution, overcoming challenges such as convergence issues and long inference time.


One-Shot Structure-Aware Stylized Image Synthesis

http://arxiv.org/abs/2402.17275v1

Compressor summary: OSASIS is a novel one-shot image stylization method that preserves structure and semantics, and outperforms existing methods in various experimental settings.


Multi-Agent, Human-Agent and Beyond: A Survey on Cooperation in Social Dilemmas

http://arxiv.org/abs/2402.17270v1

Compressor summary: This survey summarizes AI advancements in understanding and enhancing cooperation in social dilemmas, covering multi-agent cooperation, human-agent cooperation, and using AI to improve human cooperation.


Explicit Interaction for Fusion-Based Place Recognition

http://arxiv.org/abs/2402.17264v1

Compressor summary: The paper introduces EINet, a novel fusion-based network that explicitly interacts with LiDAR and camera modalities for place recognition in GPS-denied scenarios, and proposes a new benchmark based on the nuScenes dataset.


Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning

http://arxiv.org/abs/2402.17263v1

Compressor summary: MELoRA is a method that uses mini-ensembles of low-rank adapters to fine-tune large language models with fewer parameters and better performance than LoRA on various NLP tasks.
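
MELoRA's core idea, as I read it, is replacing one full-width low-rank update with n block-diagonal mini-adapters, which cuts trainable parameters for a given effective rank; a sketch (module placement and scaling are assumptions):

```python
import torch
import torch.nn as nn

class MiniEnsembleLoRA(nn.Module):
    """Sketch of a mini-ensemble of low-rank adapters: split the update
    into n independent block-diagonal LoRA blocks instead of one
    full-width adapter (illustrative reading of MELoRA)."""
    def __init__(self, base: nn.Linear, n: int, r: int, alpha: float = 1.0):
        super().__init__()
        out_f, in_f = base.weight.shape
        assert in_f % n == 0 and out_f % n == 0
        self.base, self.n, self.scale = base, n, alpha / r
        self.A = nn.Parameter(torch.randn(n, in_f // n, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(n, r, out_f // n))  # zero init

    def forward(self, x):
        # Split features into n groups and adapt each group independently;
        # this realizes a block-diagonal low-rank weight update.
        xs = x.view(*x.shape[:-1], self.n, -1)            # (..., n, in_f/n)
        delta = torch.einsum("...ni,nir,nro->...no", xs, self.A, self.B)
        return self.base(x) + self.scale * delta.reshape(*x.shape[:-1], -1)

layer = MiniEnsembleLoRA(nn.Linear(64, 64), n=4, r=2)
print(layer(torch.randn(3, 64)).shape)  # torch.Size([3, 64])
```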


Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue

http://arxiv.org/abs/2402.17262v1

Compressor summary: This paper shows that large language models can be tricked into generating harmful information through multi-turn dialogue, revealing safety issues in these models.


RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

http://arxiv.org/abs/2402.17257v1

Compressor summary: RIME is a robust preference-based reinforcement learning algorithm that uses human preferences as rewards, improves learning from noisy data, and incorporates a warm start to bridge the performance gap during transition.


Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent Detection

http://arxiv.org/abs/2402.17256v1

Compressor summary: The paper evaluates large language models' performance on out-of-domain intent detection in task-oriented dialogue systems, finding they perform well with little data but still lag behind fine-tuned models.


Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning

http://arxiv.org/abs/2402.17251v1

Compressor summary: CDS-CZSL is a novel framework that improves attribute recognition by considering object diversity and context, achieving state-of-the-art results in both closed and open world scenarios.