arxiv compressed, 2024-02-21

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-21, generated by the compressor, my personal LLM-based project.


How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey

http://arxiv.org/abs/2402.13255v1

Compressor summary: This paper provides a comprehensive overview of the evolution of Simultaneous Localization and Mapping (SLAM) research, focusing on how recent radiance field representations such as NeRFs and 3D Gaussian Splatting are reshaping the field.


CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

http://arxiv.org/abs/2402.13254v1

Compressor summary: CounterCurate is a framework that improves multimodal models' physical and semantic compositional reasoning over images and text by fine-tuning on counterfactual examples.


BiMediX: Bilingual Medical Mixture of Experts LLM

http://arxiv.org/abs/2402.13253v1

Compressor summary: BiMediX is a bilingual medical LLM that interacts in English and Arabic, using a translation pipeline and a large bilingual dataset to outperform existing models on various tasks.


Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields

http://arxiv.org/abs/2402.13252v1

Compressor summary: The paper presents an algorithm that jointly refines camera pose and scene geometry from 2D images using decomposed low-rank tensors for efficient 3D convolution approximation, convolutional Gaussian filters in a coarse-to-fine training schedule, and techniques that improve the robustness and stability of the joint optimization.


Video ReCap: Recursive Captioning of Hour-Long Videos

http://arxiv.org/abs/2402.13250v1

Compressor summary: Video ReCap is a recursive video captioning model that can handle videos of different lengths and output captions at multiple hierarchy levels.


TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

http://arxiv.org/abs/2402.13249v1

Compressor summary: The study finds that large language models (LLMs) generate factually inconsistent summaries in dialogue domains, and existing LLMs are not effective at evaluating factual consistency.


VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

http://arxiv.org/abs/2402.13243v1

Compressor summary: VADv2 is an end-to-end driving model that uses probabilistic planning to handle uncertainty and achieve state-of-the-art performance with only camera sensors.


SMORE: Similarity-based Hyperdimensional Domain Adaptation for Multi-Sensor Time Series Classification

http://arxiv.org/abs/2402.13233v1

Compressor summary: SMORE is a resource-efficient domain adaptation algorithm for multi-sensor time series classification that leverages hyperdimensional computing to handle distribution shift, improving accuracy while reducing training and inference time compared to state-of-the-art DNN-based algorithms.


A Touch, Vision, and Language Dataset for Multimodal Alignment

http://arxiv.org/abs/2402.13232v1

Compressor summary: The paper introduces a new dataset of vision-touch pairs with language labels, and uses it to train a model that improves touch-vision-language alignment in text generation and classification tasks.


Investigating Cultural Alignment of Large Language Models

http://arxiv.org/abs/2402.13231v1

Compressor summary: Our study shows that large language models have stronger cultural alignment when prompted with dominant languages or a mixture of languages from a specific culture, but misalignment can occur for underrepresented personas and sensitive topics.


Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

http://arxiv.org/abs/2402.13228v1

Compressor summary: DPOP improves DPO by avoiding a reduction in the likelihood of preferred examples and achieves state-of-the-art performance on various tasks and datasets, including those with high edit distances.
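
The failure mode DPOP targets can be made concrete with a short sketch. Below is a hedged PyTorch reading of the DPO-Positive objective, assuming summed per-response log-probabilities are precomputed; the hyperparameter values are illustrative, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def dpop_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l,
                  beta=0.1, lam=50.0):
        # pi_*  : log-probs of preferred (w) / dispreferred (l) responses
        #         under the policy being trained.
        # ref_* : the same quantities under the frozen reference model.
        # Standard DPO margin between preferred and dispreferred responses.
        margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
        # DPOP penalty: active only when the policy assigns the preferred
        # response lower likelihood than the reference model does.
        penalty = torch.clamp(ref_logp_w - pi_logp_w, min=0.0)
        return -F.logsigmoid(beta * (margin - lam * penalty)).mean()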


AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning

http://arxiv.org/abs/2402.13225v1

Compressor summary: AgentMD is a language agent that can curate and apply clinical calculators to improve healthcare analytics and patient care by overcoming usability challenges, poor dissemination, and restricted functionality of existing tools.


RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian

http://arxiv.org/abs/2402.13222v1

Compressor summary: RoCode is a Romanian coding dataset that evaluates and fine-tunes language models for non-English languages.


Analyzing Operator States and the Impact of AI-Enhanced Decision Support in Control Rooms: A Human-in-the-Loop Specialized Reinforcement Learning Framework for Intervention Strategies

http://arxiv.org/abs/2402.13219v1

Compressor summary: The paper evaluates an AI-based decision support system for complex industrial processes, which reduces operator workload, improves situational awareness, and adapts interventions to both system and human performance.


How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

http://arxiv.org/abs/2402.13220v1

Compressor summary: MAD-Bench is a new benchmark that tests how well multimodal large language models handle deceptive prompts, revealing significant performance gaps between models and suggesting the need for improvement.


VideoPrism: A Foundational Visual Encoder for Video Understanding

http://arxiv.org/abs/2402.13217v1

Compressor summary: VideoPrism is a versatile video encoder that uses pretraining on large video-caption and noisy text datasets to excel at various video understanding tasks.


Softmax Probabilities (Mostly) Predict Large Language Model Correctness on Multiple-Choice Q&A

http://arxiv.org/abs/2402.13213v1

Compressor summary: The study investigates whether overconfidence in large language models can be reduced by using their maximum softmax probabilities to selectively abstain from multiple-choice answers, finding evidence that this improves performance.
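
As a sketch, the selective-answering rule studied here amounts to a threshold on the maximum softmax probability over the answer options; the threshold value below is a hypothetical choice, not one from the study.

    import numpy as np

    def answer_or_abstain(option_logits, threshold=0.75):
        # Softmax over the answer options only (numerically stabilized).
        probs = np.exp(option_logits - option_logits.max())
        probs /= probs.sum()
        # Answer with the argmax option only when the model is confident
        # enough; otherwise abstain (return None).
        return int(probs.argmax()) if probs.max() >= threshold else None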


Soft Self-Consistency Improves Language Model Agents

http://arxiv.org/abs/2402.13212v1

Compressor summary: Soft Self-Consistency is a method to improve large language models by selecting the best answer from multiple solutions using a continuous score based on model likelihoods, resulting in better performance and efficiency on interactive tasks.
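
A minimal sketch of the idea, under my reading of the summary: replace majority voting over sampled solutions with a continuous likelihood-based score and return the highest-scoring answer. The mean token log-probability used here is one plausible choice of score, not necessarily the paper's exact one.

    def soft_self_consistency(candidates):
        # candidates: list of (answer, token_logprobs) pairs sampled
        # from the model.
        def score(token_logprobs):
            # Continuous score: mean token log-probability of the solution.
            return sum(token_logprobs) / len(token_logprobs)
        best_answer, _ = max(candidates, key=lambda c: score(c[1]))
        return best_answer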


Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

http://arxiv.org/abs/2402.13211v1

Compressor summary: This paper analyzes large language models' challenges in providing emotional support through conversation and explores ways to improve their effectiveness.


Bayesian Reward Models for LLM Alignment

http://arxiv.org/abs/2402.13210v1

Compressor summary: The authors propose using a Bayesian reward model to reduce errors in LLM responses caused by overoptimizing rewards and to improve the quality of non-toxic and helpful responses.


How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena

http://arxiv.org/abs/2402.13208v1

Compressor summary: The authors propose ConfHyena, a Conformer model that replaces standard self-attention with Hyena operators for speech processing, achieving faster training with minimal impact on performance.


Practical Kernel Tests of Conditional Independence

http://arxiv.org/abs/2402.13196v1

Compressor summary: The paper presents a method for testing conditional independence that balances false positive rate and power using kernel ridge regression and bias control techniques.


Question Calibration and Multi-Hop Modeling for Temporal Question Answering

http://arxiv.org/abs/2402.13188v1

Compressor summary: The paper proposes a novel approach for temporal knowledge graph question answering that calibrates question representations and models multi-hop relationships using a graph neural network, achieving better performance and interpretability than previous models.


Testing Calibration in Subquadratic Time

http://arxiv.org/abs/2402.13187v1

Compressor summary: The paper studies how to test if a binary prediction model is well-calibrated using property testing techniques and efficient algorithms, and explores different aspects of calibration measurement.


UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing

http://arxiv.org/abs/2402.13185v1

Compressor summary: UniEdit is a text-guided framework for video editing that supports both motion and appearance editing by leveraging temporal and spatial self-attention layers.


What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

http://arxiv.org/abs/2402.13184v1

Compressor summary: The study introduces CosmoAgent, a framework using LLMs to simulate interactions between human and alien civilizations, considering risks and diversity in cosmologies, ethics, and worldviews, for peaceful coexistence and conflict resolution.


Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness

http://arxiv.org/abs/2402.13182v1

Compressor summary: The text proposes an algorithm for distributed kernel bandits that minimizes regret by allowing agents to share information and use uniform exploration and shared randomness.


Benchmarking Retrieval-Augmented Generation for Medicine

http://arxiv.org/abs/2402.13178v1

Compressor summary: MIRAGE is a benchmark to evaluate retrieval-augmented generation (RAG) systems for medical question answering (QA), showing that different corpora and retrievers improve LLMs' performance by up to 18%.


AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies

http://arxiv.org/abs/2402.13152v1

Compressor summary: AnnoTheia is a semi-automatic annotation toolkit that helps researchers create audio-visual speech technologies for low-resource languages by detecting when someone speaks and transcribing their words.


Defending Jailbreak Prompts via In-Context Adversarial Game

http://arxiv.org/abs/2402.13148v1

Compressor summary: ICAG is an adversarial game that improves LLM defenses against jailbreak attacks without fine-tuning and shows transferability to other models.


SubIQ: Inverse Soft-Q Learning for Offline Imitation with Suboptimal Demonstrations

http://arxiv.org/abs/2402.13147v1

Compressor summary: The paper proposes a new method for offline imitation learning that uses inverse soft-Q learning to align the learned rewards with expert demonstrations and avoid over-fitting.


OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog

http://arxiv.org/abs/2402.13146v1

Compressor summary: OLViT is a novel video dialog model that uses multi-modal attention to track objects and language co-references, improving performance on response classification and generation tasks.


CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation

http://arxiv.org/abs/2402.13145v1

Compressor summary: The paper presents a large annotated Chinese Metaphor Corpus and proposes a novel approach to metaphor generation that emphasizes grounds, resulting in more realistic and creative metaphors.


Neural Network Diffusion

http://arxiv.org/abs/2402.13144v1

Compressor summary: The authors show how to generate high-performing neural network parameters using an autoencoder and a latent diffusion model, achieving comparable or improved performance over trained networks.


The Hidden Space of Transformer Language Adapters

http://arxiv.org/abs/2402.13137v1

Compressor summary: The study examines how transformer language adapters work, revealing that adaptation occurs gradually across layers and in specific layers for target languages.


exploreCOSMOS: Interactive Exploration of Conditional Statistical Shape Models in the Web-Browser

http://arxiv.org/abs/2402.13131v1

Compressor summary: The paper presents a new tool that allows non-experts to explore and manipulate statistical shape models of faces in a browser, using partial observations.


Are ELECTRA's Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity

http://arxiv.org/abs/2402.13130v1

Compressor summary: TMFT improves ELECTRA's sentence embeddings for semantic textual similarity tasks, making them comparable to BERT in efficiency and performance.


TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning

http://arxiv.org/abs/2402.13125v1

Compressor summary: TreeEval is a benchmark-free method for evaluating large language models by using a high-performance LLM to ask questions under a topic with a tree planning strategy, avoiding data leakage and ensuring evaluation completeness and efficiency.


Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model

http://arxiv.org/abs/2402.13122v1

Compressor summary: CoRTe is a method that trains semantic segmentation models on unlabelled datasets using black-box source model predictions and pseudo-labelling, achieving good results in synthetic-to-real settings.


A Survey on Knowledge Distillation of Large Language Models

http://arxiv.org/abs/2402.13116v1

Compressor summary: The survey examines how knowledge distillation, often enhanced by data augmentation, can transfer advanced capabilities from proprietary large language models to open-source ones, organizing the field around three pillars (algorithm, skill, and verticalization) and offering guidance and future directions for researchers and practitioners.


BuffGraph: Enhancing Class-Imbalanced Node Classification via Buffer Nodes

http://arxiv.org/abs/2402.13114v1

Compressor summary: BuffGraph improves minor class representation in class-imbalanced graph data by inserting buffer nodes that modulate the impact of majority classes on node classification.


When Only Time Will Tell: Interpreting How Transformers Process Local Ambiguities Through the Lens of Restart-Incrementality

http://arxiv.org/abs/2402.13113v1

Compressor summary: The text explores how restart-incremental Transformers handle ambiguity in sentences and shows the benefits of bidirectional encoders and dependency parsing for revision.


CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models

http://arxiv.org/abs/2402.13109v1

Compressor summary: CIF-Bench is a new benchmark to test the generalization ability of large language models in Chinese, revealing their limitations in handling complex reasoning and cultural nuances.


On the Stability of Gradient Descent for Large Learning Rate

http://arxiv.org/abs/2402.13108v1

Compressor summary: The paper investigates why linear neural networks trained with quadratic loss show Edge of Stability behavior and how the learning rate affects convergence.


Multivariate Functional Linear Discriminant Analysis for the Classification of Short Time Series with Missing Data

http://arxiv.org/abs/2402.13103v1

Compressor summary: MUDRA is a new method that extends FLDA to multivariate and incomplete time series data using an efficient algorithm, and shows better results on an articulatory word recognition prediction task.


A Microstructure-based Graph Neural Network for Accelerating Multiscale Simulations

http://arxiv.org/abs/2402.13101v1

Compressor summary: The paper proposes a hybrid surrogate model that combines graph neural networks and microscopic constitutive models to simulate the mechanical response of advanced materials with concurrent multiscale models, improving accuracy and computational efficiency.


ELAD: Explanation-Guided Large Language Models Active Distillation

http://arxiv.org/abs/2402.13098v1

Compressor summary: The paper proposes ELAD, a framework that uses active learning and explanation-guided sample selection to improve Large Language Models' knowledge distillation efficiency and performance.


Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities

http://arxiv.org/abs/2402.13094v1

Compressor summary: The study compared different methods of measuring how well automatically and manually simplified texts are understood by persons with and without intellectual disabilities.


Event-level Knowledge Editing

http://arxiv.org/abs/2402.13093v1

Compressor summary: Event-level knowledge editing updates large language models by adding new events, improving efficiency and completeness over factual triplet-level editing.


Towards an empirical understanding of MoE design choices

http://arxiv.org/abs/2402.13089v1

Compressor summary: This paper examines how different design choices in Mixture of Experts models affect validation performance and finds that sequence-level routing leads to topic-specific expert specialization, while token-level routing results in syntactic specialization.


Slot-VLM: SlowFast Slots for Video-Language Modeling

http://arxiv.org/abs/2402.13088v1

Compressor summary: Slot-VLM is a framework that generates semantically decomposed video tokens to facilitate language model inference for video question-answering.


How Does Selection Leak Privacy: Revisiting Private Selection and Improved Results for Hyper-parameter Tuning

http://arxiv.org/abs/2402.13087v1

Compressor summary: The paper investigates the privacy implications of hyper-parameter tuning and proposes an improved solution with a tighter privacy bound.


IT Intrusion Detection Using Statistical Learning and Testbed Measurements

http://arxiv.org/abs/2402.13081v1

Compressor summary: The paper proposes using statistical learning methods to detect attacks in IT infrastructure based on continuous measurements, and compares HMM and LSTM for prediction accuracy and resource requirements.


Mechanistic Neural Networks for Scientific Machine Learning

http://arxiv.org/abs/2402.13077v1

Compressor summary: Mechanistic Neural Networks use a new block to learn governing equations and dynamics from data, improving interpretability and efficiency for scientific machine learning tasks.


Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

http://arxiv.org/abs/2402.13064v1

Compressor summary: GLAN is a method that uses a pre-curated human knowledge taxonomy to generate diverse and task-agnostic instructions for Large Language Models.


Toward Fairness via Maximum Mean Discrepancy Regularization on Logits Space

http://arxiv.org/abs/2402.13061v1

Compressor summary: Logits-MMD is a new framework for machine learning that improves fairness by using Maximum Mean Discrepancy on output logits, outperforming previous methods on facial and animal recognition datasets.
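
For intuition, a generic (biased) RBF-kernel MMD penalty between the logits of two demographic groups can be sketched as below; the group split and weighting are hypothetical, not the paper's exact formulation.

    import torch

    def rbf_mmd2(x, y, sigma=1.0):
        # Simple biased estimator of squared MMD with an RBF kernel.
        def k(a, b):
            d2 = torch.cdist(a, b) ** 2
            return torch.exp(-d2 / (2 * sigma ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

    # Hypothetical training penalty pushing the two groups' logit
    # distributions toward each other:
    # loss = task_loss + lam * rbf_mmd2(logits[group == 0], logits[group == 1])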


Random Graph Set and Evidence Pattern Reasoning Model

http://arxiv.org/abs/2402.13058v1

Compressor summary: Evidence Pattern Reasoning Model (EPRM) is a new evidential decision making model that can better fit different tasks by setting preferences and using Random Graph Set to model complex relationships, improving aircraft velocity ranking in an experiment.


Identifying Semantic Induction Heads to Understand In-Context Learning

http://arxiv.org/abs/2402.13055v1

Compressor summary: The authors analyze how attention heads in large language models encode syntactic and knowledge graph relations, and find a link between these semantic induction heads and the in-context learning ability of LLMs.


Stable Knowledge Editing in Large Language Models

http://arxiv.org/abs/2402.13048v1

Compressor summary: StableKE is a novel method that enhances language models with diverse and contextual knowledge descriptions to improve their editing performance and stability without oversimplifying the model's interconnected knowledge structure.


Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries

http://arxiv.org/abs/2402.13043v1

Compressor summary: The text proposes a method to improve few-shot dialogue state tracking using conversation summaries and a lightweight encoder for query embeddings, which is more scalable than previous approaches.


Text-Guided Molecule Generation with Diffusion Language Model

http://arxiv.org/abs/2402.13040v1

Compressor summary: TGM-DLM is a novel method that uses diffusion models to generate molecules from text descriptions more effectively than autoregressive methods, without requiring extra data resources.


Align Your Intents: Offline Imitation Learning via Optimal Transport

http://arxiv.org/abs/2402.13037v1

Compressor summary: AILOT is an offline imitation learning method that uses optimal transport to learn from expert trajectories without explicit rewards or action labels.


SiLLM: Large Language Models for Simultaneous Machine Translation

http://arxiv.org/abs/2402.13036v1

Compressor summary: SiLLM is a new method for simultaneous machine translation that uses separate agents to handle policy-decision and translation, leveraging the capabilities of large language models.


Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models

http://arxiv.org/abs/2402.13035v1

Compressor summary: The paper proposes a new prompt for training language models to improve their self-correction abilities in mathematical reasoning without relying on external feedback.


Enhancing Real-World Complex Network Representations with Hyperedge Augmentation

http://arxiv.org/abs/2402.13033v1

Compressor summary: Hyperedge Augmentation is a new method to improve Graph Neural Networks by creating virtual hyperedges from raw data and extracting features from them, addressing the limitations of existing graph augmentation techniques.


Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables

http://arxiv.org/abs/2402.13028v1

Compressor summary: The paper proposes a novel word-level heterogeneous graph model for fact checking that leverages both unstructured and structured evidence data to reason about claim veracity.


CFEVER: A Chinese Fact Extraction and VERification Dataset

http://arxiv.org/abs/2402.13025v1

Compressor summary: CFEVER is a Chinese dataset for Fact Extraction and VERification with 30,012 labeled claims from Chinese Wikipedia.


SoMeLVLM: A Large Vision Language Model for Social Media Processing

http://arxiv.org/abs/2402.13022v1

Compressor summary: The paper introduces a new AI model, SoMeLVLM, which can handle various social media tasks by understanding and generating realistic behavior using five key capabilities.


Improving Neural-based Classification with Logical Background Knowledge

http://arxiv.org/abs/2402.13019v1

Compressor summary: The paper proposes a new neurosymbolic technique for supervised multi-label classification, called semantic conditioning at inference, which improves accuracy and resource efficiency while preserving semantic consistency.


Understanding the effects of language-specific class imbalance in multilingual fine-tuning

http://arxiv.org/abs/2402.13016v1

Compressor summary: Imbalanced label distribution across languages in multilingual classification datasets negatively affects transformer-based LLMs, but language-specific class weighting can help mitigate these issues.


Code Needs Comments: Enhancing Code LLMs with Comment Augmentation

http://arxiv.org/abs/2402.13013v1

Compressor summary: The study improves code-focused LLMs' performance by introducing a new method to generate comments for existing code and filtering poorly correlated data, resulting in better performance on programming skill benchmarks.


Improve Cross-Architecture Generalization on Dataset Distillation

http://arxiv.org/abs/2402.13007v1

Compressor summary: The proposed "model pool" method improves data distillation by selecting diverse models based on probabilities and applying knowledge distillation to test results, enhancing generalizability and performance.


Investigating the Impact of Model Instability on Explanations and Uncertainty

http://arxiv.org/abs/2402.13006v1

Compressor summary: The study investigates how noise affects explanations from AI models and finds that high uncertainty doesn't always mean low plausibility, and some methods are more robust to perturbation than others.


Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition

http://arxiv.org/abs/2402.13004v1

Compressor summary: The study compares conventional DNN-HMM and state-of-the-art CTC/Attention decoders for Visual Speech Recognition, showing that the former outperforms the latter in data-scarce scenarios with less training time and fewer parameters.


Phonotactic Complexity across Dialects

http://arxiv.org/abs/2402.12998v1

Compressor summary: The study finds a tradeoff between word length and phonotactic complexity in Dutch and Min dialects, suggesting that complex languages simplify in another dimension.


TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification

http://arxiv.org/abs/2402.12991v1

Compressor summary: The paper proposes a method called TRAP to identify whether a third-party app uses a specific large language model through its chat function by using adversarial suffixes.


Towards Robust Graph Incremental Learning on Evolving Graphs

http://arxiv.org/abs/2402.12987v1

Compressor summary: The paper proposes a regularization technique for mitigating catastrophic forgetting in incremental graph-related tasks with structural shifts.


Can GNN be Good Adapter for LLMs?

http://arxiv.org/abs/2402.12984v1

Compressor summary: The paper proposes GraphAdapter, a framework that uses a graph neural network and large language models to efficiently model text-attributed graphs for various applications.


The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis

http://arxiv.org/abs/2402.12976v1

Compressor summary: The authors study multilingual in-context learning, finding that the effectiveness of demonstrations varies greatly depending on models, tasks, and languages, and suggesting that their importance might be overestimated.


Visual Style Prompting with Swapping Self-Attention

http://arxiv.org/abs/2402.12974v1

Compressor summary: The authors propose a novel method for text-to-image generation that maintains specific style elements without fine-tuning, achieving better results than existing approaches.


GlórIA - A Generative and Open Large Language Model for Portuguese

http://arxiv.org/abs/2402.12969v1

Compressor summary: GlórIA is a large European Portuguese language model pre-trained on a diverse corpus that excels in language modeling and generates high-quality text.


MapTrack: Tracking in the Map

http://arxiv.org/abs/2402.12968v1

Compressor summary: MapTrack is a robust multi-object tracker that uses probability maps, prediction maps, and covariance adaptive Kalman filters to handle occlusions and crowds in real time.


Conditional Logical Message Passing Transformer for Complex Query Answering

http://arxiv.org/abs/2402.12954v1

Compressor summary: CLMPT is a novel neural model for complex query answering over knowledge graphs, which considers node types in message passing and uses self-attention to model logical dependencies.


GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick

http://arxiv.org/abs/2402.12948v1

Compressor summary: The Logits-Addition watermark, a new type of GumbelMax-trick-based watermark, enhances generation diversity in large language models and outperforms other decoding-based watermarking methods.
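
For context, the generic GumbelMax-trick watermark (of which Logits-Addition is a variant) can be sketched as follows; the hashing scheme and context window here are hypothetical choices, not the paper's.

    import hashlib
    import numpy as np

    def watermarked_next_token(logits, context_ids, key=b"secret"):
        # Seed pseudo-random Gumbel noise from the recent context and a
        # private key, so a detector holding the key can recompute it.
        seed_material = key + str(context_ids[-4:]).encode("utf8")
        seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
        rng = np.random.default_rng(seed)
        gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
        # Gumbel-max trick: argmax(logits + Gumbel noise) is a sample
        # from softmax(logits), so generation quality is preserved.
        return int(np.argmax(logits + gumbel))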


Cell Graph Transformer for Nuclei Classification

http://arxiv.org/abs/2402.12946v1

Compressor summary: The paper presents a new method for classifying nuclei in histopathology images using a cell graph transformer that learns node and edge features and improves classification performance.


Normalized Orthography for Tunisian Arabic

http://arxiv.org/abs/2402.12940v1

Compressor summary: The text introduces NOTA, a modified version of CODA* guidelines for writing Tunisian Arabic using the Arabic script, aiming to accurately represent its unique features and improve language resource development.


Discovering Behavioral Modes in Deep Reinforcement Learning Policies Using Trajectory Clustering in Latent Space

http://arxiv.org/abs/2402.12939v1

Compressor summary: The paper proposes a method to analyze and improve DRL policies by using dimensionality reduction and trajectory clustering on neural network latent spaces.


UniCell: Universal Cell Nucleus Classification via Prompt Learning

http://arxiv.org/abs/2402.12938v1

Compressor summary: The paper presents UniCell, a framework for multi-class cell nucleus recognition that uses prompt learning to leverage shared knowledge across datasets, improving histopathological diagnosis.


GRAPHGINI: Fostering Individual and Group Fairness in Graph Neural Networks

http://arxiv.org/abs/2402.12937v1

Compressor summary: GRAPHGINI incorporates the Gini coefficient as a differentiable fairness measure in graph neural networks, enforcing both individual and group fairness through learnable attention scores and a maximum Nash social welfare constraint while maintaining high prediction accuracy, and it outperforms state-of-the-art methods on real-world datasets.


Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence

http://arxiv.org/abs/2402.12930v1

Compressor summary: Syflow is an approach to find and describe exceptional sub-populations using normalizing flows and a novel neural layer for interpretable results.


CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection

http://arxiv.org/abs/2402.12927v1

Compressor summary: The paper explores using pre-trained vision-language models with adaptation methods for detecting deepfakes, finding that retaining the textual component of CLIP improves performance and reduces data requirements.


Advancements in Point Cloud-Based 3D Defect Detection and Classification for Industrial Systems: A Comprehensive Survey

http://arxiv.org/abs/2402.12923v1

Compressor summary: The paper reviews recent deep learning methods for condition monitoring using 3D point clouds, focusing on defect shape classification and segmentation in industrial applications.


Right on Time: Revising Time Series Models by Constraining their Explanations

http://arxiv.org/abs/2402.12921v1

Compressor summary: RioT is a method to help deep time series models avoid misleading results by interacting with model explanations in both time and frequency domains, guiding them away from confounding factors.


Data Pipeline Training: Integrating AutoML to Optimize the Data Flow of Machine Learning Models

http://arxiv.org/abs/2402.12916v1

Compressor summary: The paper explores how to optimize data flow through automated machine learning methods by integrating AutoML with Data Pipeline, aiming for better results in machine learning tasks and adapting to the ever-changing data landscape.


Large Language Model-based Human-Agent Collaboration for Complex Task Solving

http://arxiv.org/abs/2402.12914v1

Compressor summary: The paper presents ReHAC, a method that uses reinforcement learning to enable effective collaboration between humans and large language models for complex task-solving, with limited human intervention.


OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination Detection with Weakly Supervised Data

http://arxiv.org/abs/2402.12913v1

Compressor summary: The paper presents a system for detecting hallucination in LLMs for text-generation tasks without labeled data, using prompt engineering and few-shot learning, and achieving competitive results with smaller LLMs.


RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models

http://arxiv.org/abs/2402.12908v1

Compressor summary: The authors introduce RealCompo, a text-to-image generation framework that combines text-to-image and layout-to-image models to create more realistic and compositional images, without requiring extra training.


Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects

http://arxiv.org/abs/2402.12907v1

Compressor summary: The text discusses the importance of considering societal aspects in AI alignment and proposes a new problem (ICSAP) to explore how game theory can help bridge the gap between technical and social components.


Mind the Exit Pupil Gap: Revisiting the Intrinsics of a Standard Plenoptic Camera

http://arxiv.org/abs/2402.12891v1

Compressor summary: The paper discusses how the main lens exit pupil affects depth reconstruction and post-shot refocusing in plenoptic cameras, and provides analysis and validation of this effect using simulations and experiments.


More Discriminative Sentence Embeddings via Semantic Graph Smoothing

http://arxiv.org/abs/2402.12890v1

Compressor summary: The paper proposes a method to improve unsupervised sentence representations using semantic graph smoothing, leading to better text clustering and classification results on eight benchmarks.
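
The summary does not pin down the exact smoothing operator, so here is a generic sketch of the idea: build a k-nearest-neighbour similarity graph over sentence embeddings and repeatedly average each embedding with its neighbours (all parameter values are hypothetical).

    import numpy as np

    def smooth_embeddings(X, k=5, alpha=0.5, iters=2):
        # Cosine-similarity kNN graph, computed once on the inputs.
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        sims = Xn @ Xn.T
        np.fill_diagonal(sims, -np.inf)          # exclude self-loops
        nbrs = np.argsort(sims, axis=1)[:, -k:]  # top-k neighbour indices
        # Propagate: mix each embedding with the mean of its neighbours.
        for _ in range(iters):
            X = (1 - alpha) * X + alpha * X[nbrs].mean(axis=1)
        return X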


The practice of qualitative parameterisation in the development of Bayesian networks

http://arxiv.org/abs/2402.12887v1

Compressor summary: The text describes a common step called qualitative parameterisation that occurs after developing the structure of a Bayesian network to illustrate its intended behaviour before doing a more rigorous parameterisation.


GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models

http://arxiv.org/abs/2402.12881v1

Compressor summary: The study examines how well pre-trained language and vision models understand object affordances and creates a new dataset for this purpose.


Autism Detection in Speech - A Survey

http://arxiv.org/abs/2402.12880v1

Compressor summary: The text discusses various linguistic, prosodic, and acoustic cues for detecting autism in voice, speech, and language across biomedical, psychological, and NLP domains, highlighting gaps in research on female patients and transformer-based methods.


Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

http://arxiv.org/abs/2402.12875v1

Compressor summary: CoT helps large language models perform serial computations, increasing their ability to solve complex tasks like arithmetic and symbolic reasoning.


Skill or Luck? Return Decomposition via Advantage Functions

http://arxiv.org/abs/2402.12874v1

Compressor summary: The paper proposes Off-policy DAE, a method that learns from off-policy data by decomposing return into skill and luck components, without using importance sampling or truncation, and shows its advantages over previous methods in sample-efficient reinforcement learning.


Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

http://arxiv.org/abs/2402.12869v1

Compressor summary: The paper investigates how different table-to-text methods affect the performance of question answering systems when using hybrid domain data, and presents empirical findings and reasons behind the success of some methods.


Fast Rates in Online Convex Optimization by Exploiting the Curvature of Feasible Sets

http://arxiv.org/abs/2402.12868v1

Compressor summary: The paper introduces a new online optimization algorithm that exploits the curvature of both feasible sets and loss functions to achieve fast regret bounds in various environments.


Backward Lens: Projecting Language Model Gradients into the Vocabulary Space

http://arxiv.org/abs/2402.12865v1

Compressor summary: The authors propose a method to visualize how Transformer-based Language Models learn and recall information by projecting their gradients into words.


Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation

http://arxiv.org/abs/2402.12862v1

Compressor summary: The paper proposes using evidential deep learning to handle ambiguous emotions in emotion classification by representing emotions as distributions and quantifying uncertainty.


Bounding Reconstruction Attack Success of Adversaries Without Data Priors

http://arxiv.org/abs/2402.12861v1

Compressor summary: The paper provides formal bounds on reconstructing sensitive data from machine learning models trained with differential privacy under realistic settings and supports them with empirical results.


Differentiable Mapper For Topological Optimization Of Data Representation

http://arxiv.org/abs/2402.12854v1

Compressor summary: The paper presents a new method to optimize a key parameter in topological data analysis, called filter, for better visualizing and analyzing data structures.


MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models

http://arxiv.org/abs/2402.12851v1

Compressor summary: MoELoRA is a novel PEFT method that uses contrastive learning to improve the adaptability of LLMs in various reasoning tasks with fewer parameters and less training time.


Instruction-tuned Language Models are Better Knowledge Learners

http://arxiv.org/abs/2402.12847v1

Compressor summary: The paper proposes pre-instruction-tuning (PIT), a method that improves large language model-based assistants' ability to learn new facts by tuning them on questions before training on documents, instead of after.


ConVQG: Contrastive Visual Question Generation with Multimodal Guidance

http://arxiv.org/abs/2402.12846v1

Compressor summary: Contrastive Visual Question Generation (ConVQG) is a method that uses both image and text information to generate focused and relevant questions, outperforming state-of-the-art methods in VQG.


MORE-3S:Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces

http://arxiv.org/abs/2402.12845v1

Compressor summary: The paper proposes a new method for offline reinforcement learning that uses multimodal language models to integrate image-based states and text-based actions, leading to better performance and long-term strategy.


ICON: Improving Inter-Report Consistency of Radiology Report Generation via Lesion-aware Mix-up Augmentation

http://arxiv.org/abs/2402.12844v1

Compressor summary: The paper proposes ICON, a method to improve inter-report consistency in radiology report generation by aligning lesion attributes using lesion-aware mix-up augmentation.


SolarPanel Segmentation :Self-Supervised Learning for Imperfect Datasets

http://arxiv.org/abs/2402.12843v1

Compressor summary: The paper explores self-supervised learning to improve solar panel segmentation from images, reducing the need for manual annotations and enhancing model generalization.


PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

http://arxiv.org/abs/2402.12842v1

Compressor summary: PromptKD is a novel method for compressing generative language models using prompt tuning and student guidance, achieving state-of-the-art results with minimal parameters.


ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic

http://arxiv.org/abs/2402.12840v1

Compressor summary: ArabicMMLU is a new benchmark for evaluating Arabic language understanding in multiple tasks, showing that current models still have significant room for improvement.


PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs

http://arxiv.org/abs/2402.12835v1

Compressor summary: PANDA is a method to improve the domain-specific performance of large language models without fine-tuning, using insights from expert models' response preferences.


Identifying Factual Inconsistency in Summaries: Towards Effective Utilization of Large Language Model

http://arxiv.org/abs/2402.12821v1

Compressor summary: The paper explores how to use large language models to detect factual inconsistencies in summaries without training and how to distill smaller, high-performance models for this task.


Fine-Tuning, Prompting, In-Context Learning and Instruction-Tuning: How Many Labelled Samples Do We Need?

http://arxiv.org/abs/2402.12819v1

Compressor summary: The paper studies how many labelled examples specialised models need to outperform general language models on NLP tasks, finding that they often require only a few samples (100 to 1000), depending on task complexity and variance.


On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

http://arxiv.org/abs/2402.12817v1

Compressor summary: The proposed method investigates the effects of randomness factors on learning with limited labelled data by controlling interactions and measuring individual factor's impact.


Scalable Decentralized Algorithms for Online Personalized Mean Estimation

http://arxiv.org/abs/2402.12812v1

Compressor summary: The study proposes two algorithms for collaborative mean estimation that allow agents to communicate with a limited number of peers, addressing scalability issues and offering optimal performance under certain conditions.


PIP-Net: Pedestrian Intention Prediction in the Wild

http://arxiv.org/abs/2402.12810v1

Compressor summary: The paper introduces PIP-Net, a framework that predicts pedestrian crossing intentions for autonomous vehicles using kinematic data and spatial features from multiple cameras, achieving state-of-the-art performance and presenting the Urban-PIP dataset.


Learning Generalization and Regularization of Nonhomogeneous Temporal Poisson Processes

http://arxiv.org/abs/2402.12808v1

Compressor summary: The paper proposes a regularized learning framework for estimating nonhomogeneous Poisson processes from limited data using adaptive, data-driven binning methods.


SymBa: Symbolic Backward Chaining for Multi-step Natural Language Reasoning

http://arxiv.org/abs/2402.12806v1

Compressor summary: SymBa is a novel solver-LLM integration that improves backward chaining performance, proof faithfulness, and efficiency in diverse multi-step reasoning benchmarks.


Few shot clinical entity recognition in three languages: Masked language models outperform LLM prompting

http://arxiv.org/abs/2402.12801v1

Compressor summary: The paper evaluates few-shot clinical entity recognition in English, French, and Spanish, finding that prompt-based large language models perform well outside the clinical domain but are outperformed within it by lighter supervised taggers based on masked language models, suggesting that LLMs are better suited to helping create annotated data than to production use.


Radar-Based Recognition of Static Hand Gestures in American Sign Language

http://arxiv.org/abs/2402.12800v1

Compressor summary: The text discusses a new approach to recognizing hand gestures using radar sensors and synthetic data, which could improve virtual reality and human-computer interaction applications.


OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow

http://arxiv.org/abs/2402.12792v1

Compressor summary: The paper proposes a novel 3D scene representation method using differentiable volumetric rendering, 2D labels, temporal rendering, and occupancy flow to achieve state-of-the-art performance in semantic occupancy estimation.


From Movements to Metrics: Evaluating Explainable AI Methods in Skeleton-Based Human Activity Recognition

http://arxiv.org/abs/2402.12790v1

Compressor summary: The paper evaluates XAI metrics on CAM and Grad-CAM for skeleton-based HAR and finds stability is more reliable than faithfulness, while CAM and Grad-CAM provide similar explanations.


Fair Classifiers Without Fair Training: An Influence-Guided Data Sampling Approach

http://arxiv.org/abs/2402.12789v1

Compressor summary: The paper explores how fair classification can be achieved without using sensitive attributes in the training data by shifting the data distribution and sampling influential data.


RhythmFormer: Extracting rPPG Signals Based on Hierarchical Temporal Periodic Transformer

http://arxiv.org/abs/2402.12788v1

Compressor summary: RhythmFormer uses a hierarchical temporal periodic transformer to leverage the quasi-periodic nature of remote photoplethysmography for non-contact physiological signal detection, achieving state-of-the-art performance with fewer parameters and reduced computational complexity.


Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations

http://arxiv.org/abs/2402.12786v1

Compressor summary: The paper introduces StyleTalk, a dataset for teaching LLMs to understand and respond to different speaking styles in spoken dialogue, and proposes the Spoken-LLM framework that uses it to improve performance over text-only baselines and prior speech LLMs methods.


Two-stage Rainfall-Forecasting Diffusion Model

http://arxiv.org/abs/2402.12779v1

Compressor summary: The paper proposes the Two-stage Rainfall-Forecasting Diffusion Model (TRDM), which first captures robust temporal information under low-resolution conditions and then reconstructs low-resolution images into high-resolution ones, addressing spatial-temporal imbalance and achieving state-of-the-art long-term rainfall forecasts on two datasets, with code released on GitHub.


Acknowledgment of Emotional States: Generating Validating Responses for Empathetic Dialogue

http://arxiv.org/abs/2402.12770v1

Compressor summary: The study introduces a framework for empathetic AI dialogue that uses validation techniques, detects emotional states, and generates validating responses, outperforming previous models on textual and speech-based datasets.


When and How: Learning Identifiable Latent States for Nonstationary Time Series Forecasting

http://arxiv.org/abs/2402.12767v1

Compressor summary: The paper proposes the IDEA model that learns identifiable latent states to detect and disentangle distribution shifts in time series data and performs better than existing methods.


GOOD: Towards Domain Generalized Orientated Object Detection

http://arxiv.org/abs/2402.12765v1

Compressor summary: The paper introduces domain generalized oriented object detection and proposes GOOD, a detector that uses CLIP-based style hallucination together with two components (RAC and SEC) to learn stable content and orientation representations, achieving state-of-the-art performance across multiple cross-domain settings.


BronchoTrack: Airway Lumen Tracking for Branch-Level Bronchoscopic Localization

http://arxiv.org/abs/2402.12763v1

Compressor summary: BronchoTrack is a fast and accurate framework for real-time bronchoscope localization that works across different patients and airway generations.


Static vs. Dynamic Databases for Indoor Localization based on Wi-Fi Fingerprinting: A Discussion from a Data Perspective

http://arxiv.org/abs/2402.12756v1

Compressor summary: The paper shows that time-varying electromagnetic interference causes Wi-Fi fingerprint databases to drift, so indoor localization error with a static database grows over time, and demonstrates with Gaussian process regression that a dynamic database of updated RSSI measurements reduces this error.


Fingerprint Presentation Attack Detector Using Global-Local Model

http://arxiv.org/abs/2402.12754v1

Compressor summary: The paper proposes a new PAD method that uses global and local features to detect fingerprint spoofing attacks, improving detection performance over existing methods.


Model Composition for Multimodal Large Language Models

http://arxiv.org/abs/2402.12750v1

Compressor summary: The paper proposes a new approach for creating versatile multimodal large language models by composing existing ones and introduces a benchmark to test their performance on various tasks.


Me LLaMA: Foundation Large Language Models for Medical Applications

http://arxiv.org/abs/2402.12749v1

Compressor summary: Me LLaMA is an open-source medical LLM family built by continual pre-training and instruction tuning of LLaMA2 on large biomedical and clinical data, outperforming existing open-source medical LLMs and commercial models such as ChatGPT and GPT-4 on various medical tasks and datasets.


MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion

http://arxiv.org/abs/2402.12741v1

Compressor summary: MuLan is a training-free text-to-image model that uses a large language model and a vision-language model to generate multi-object images with spatial relationships and attribute bindings by progressive planning and feedback control.


Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play Dialogues

http://arxiv.org/abs/2402.12738v1

Compressor summary: This study evaluates GPT-4's performance in mental health counseling dialogues and finds it comparable to human counselors.


Guarantee Regions for Local Explanations

http://arxiv.org/abs/2402.12737v1

Compressor summary: The paper proposes an algorithm to find reliable and accurate local explanations for predictive models by identifying intervals where input features are trustworthy.


CST: Calibration Side-Tuning for Parameter and Memory Efficient Transfer Learning

http://arxiv.org/abs/2402.12736v1

Compressor summary: The paper introduces Calibration side tuning, a lightweight fine-tuning strategy for object detection networks that adapts transformer techniques to ResNet, improving performance while maintaining efficient resource use.


UMBCLU at SemEval-2024 Task 1A and 1C: Semantic Textual Relatedness with and without machine translation

http://arxiv.org/abs/2402.12730v1

Compressor summary: The paper presents a SemEval-2024 Task 1 system for semantic textual relatedness between African and Asian languages, exploring supervised and cross-lingual training with large language models and developing TranSem for subtask A and FineSem for subtask C, with mixed results across the two subtasks.


Scalable and reliable deep transfer learning for intelligent fault detection via multi-scale neural processes embedded with knowledge

http://arxiv.org/abs/2402.12729v1

Compressor summary: The paper introduces a new deep transfer learning method for intelligent fault detection that uses neural processes, graph convolution networks, and multi-scale uncertainty analysis to improve performance on scarce fault data.


Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering

http://arxiv.org/abs/2402.12728v1

Compressor summary: MAIL is a novel method for knowledge-based visual question answering that leverages multimodal knowledge and LLMs to enhance image understanding and knowledge reasoning in complex scenarios.


Diffusion Posterior Sampling is Computationally Intractable

http://arxiv.org/abs/2402.12727v1

Compressor summary: The paper shows that diffusion posterior sampling, i.e., sampling conditioned on measurements, is computationally intractable in the worst case, even when unconditional sampling is fast and efficient.


Structural Knowledge Informed Continual Multivariate Time Series Forecasting

http://arxiv.org/abs/2402.12722v1

Compressor summary: SKI-CL is a framework for continual multivariate time series forecasting that uses structural knowledge to guide the model and memory replay to preserve data from each regime, overcoming catastrophic forgetting and outperforming existing methods on synthetic and real-world datasets.


PAC-FNO: Parallel-Structured All-Component Fourier Neural Operators for Recognizing Low-Quality Images

http://arxiv.org/abs/2402.12721v1

Compressor summary: The paper introduces a novel neural network model, PAC-FNO, that can handle image recognition tasks across different resolutions and natural variations using the frequency domain approach, improving performance significantly on seven benchmarks.


Spurious Correlations in Machine Learning: A Survey

http://arxiv.org/abs/2402.12715v1

Compressor summary: This survey reviews spurious correlations in machine learning models, their impact on generalization and robustness, and existing methods, datasets, benchmarks, and metrics to address them.


Equivariant Pretrained Transformer for Unified Geometric Learning on Multi-Domain 3D Molecules

http://arxiv.org/abs/2402.12714v1

Compressor summary: The Equivariant Pretrained Transformer (EPT) is a novel framework that harmonizes geometric learning of small molecules and proteins by using a block-enhanced representation, E(3) equivariance, and joint pretraining on both domains.


Are Large Language Models Rational Investors?

http://arxiv.org/abs/2402.12713v1

Compressor summary: The study introduces Financial Bias Indicators to evaluate the financial rationality of Large Language Models in financial analysis, finding varying degrees of irrationality influenced by design and training factors.


MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

http://arxiv.org/abs/2402.12712v1

Compressor summary: The paper proposes MVDiffusion++, a neural network that reconstructs 3D objects from a single image or sparse views, using self-attention and view dropout to achieve better performance and scalability.


Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee

http://arxiv.org/abs/2402.12711v1

Compressor summary: The paper introduces uniform last-iterate (ULI) as a stronger performance measure for bandit algorithms that considers both cumulative and instantaneous performance, and shows its near-optimality and achievability in some settings.


Learning Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition

http://arxiv.org/abs/2402.12706v1

Compressor summary: DITeD is a method for few-shot action recognition that leverages temporal invariance in dynamic systems to transfer knowledge between domains using generative pre-training and adaptation stages.


From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges

http://arxiv.org/abs/2402.12702v1

Compressor summary: The paper discusses the potential and challenges of using generative AI for design in resource-constrained settings and suggests innovative approaches to make it efficient and accessible.


Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling

http://arxiv.org/abs/2402.12694v1

Compressor summary: Key points: - The paper introduces Leddam, a method for multivariate time series forecasting that uses learnable decomposition and dual attention module. - Leddam captures dynamic trend information and inter-series dependencies and intra-series variations simultaneously. - Leddam outperforms state-of-the-art methods on eight open-source datasets, and can be plugged into other methods for a performance boost. Summary: Leddam is a novel method for multivariate time series forecasting that leverages learnable decomposition and dual attention module to capture trend and dependency information, achieving significant improvements over existing methods and being compatible with them.
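
As an illustration of the learnable-decomposition idea, the sketch below replaces the fixed moving-average kernel common in decomposition forecasters with a trainable depthwise 1D convolution; the class and parameter names are hypothetical, not Leddam's actual implementation.

```python
# Illustrative sketch of a learnable decomposition: the fixed moving-average
# kernel used by earlier decomposition forecasters becomes a trainable
# depthwise 1D convolution. Class and parameter names are hypothetical.
import torch.nn as nn

class LearnableDecomposition(nn.Module):
    def __init__(self, channels, kernel_size=25):
        super().__init__()
        self.trend = nn.Conv1d(channels, channels, kernel_size,
                               padding=kernel_size // 2, groups=channels,
                               bias=False)

    def forward(self, x):          # x: (batch, channels, time)
        trend = self.trend(x)      # learned smooth/trend component
        return trend, x - trend    # plus the residual (seasonal) component
```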


FormulaQA: A Question Answering Dataset for Formula-Based Numerical Reasoning

http://arxiv.org/abs/2402.12692v1

Compressor summary: FormulaQA is a new dataset of junior high school physics questions for testing LLMs' ability to apply formulas in numerical reasoning, evaluated with a range of LLM-based methods.


Tree-Planted Transformers: Large Language Models with Implicit Syntactic Supervision

http://arxiv.org/abs/2402.12691v1

Compressor summary: The paper proposes tree-planting, a method that gives Transformer LMs implicit syntactic supervision without generating syntactic structures explicitly, improving performance on the SyntaxGym benchmark.


Simpson's Paradox and the Accuracy-Fluency Tradeoff in Translation

http://arxiv.org/abs/2402.12690v1

Compressor summary: The paper investigates a theoretical puzzle about the relationship between accuracy and fluency in translation, showing that they are correlated at the corpus level but trade off at the segment level.
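
The Simpson's-paradox pattern can be reproduced with a toy simulation: within each corpus accuracy and fluency trade off, yet pooling a low-quality and a high-quality corpus yields a positive correlation. The numbers below are synthetic and only illustrate the statistical effect, not the paper's data.

```python
# Synthetic illustration of the paradox (toy numbers, not the paper's data):
# accuracy and fluency trade off within each corpus, yet pooling a low- and
# a high-quality corpus flips the correlation to positive.
import numpy as np

rng = np.random.default_rng(0)
acc, flu = [], []
for base in (0.3, 0.7):                    # low- vs high-quality corpus
    a = base + rng.uniform(0, 0.2, 200)
    f = 2 * base + 0.2 - a + rng.normal(0, 0.02, 200)  # per-segment tradeoff
    acc.append(a)
    flu.append(f)

print(np.corrcoef(acc[0], flu[0])[0, 1])   # strongly negative within a corpus
print(np.corrcoef(np.concatenate(acc),
                  np.concatenate(flu))[0, 1])  # positive when pooled
```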


Learning on manifolds without manifold learning

http://arxiv.org/abs/2402.12687v1

Compressor summary: Key points: - The paper proposes a one-shot method for function approximation from randomly sampled data without knowing the manifold structure - The method uses spherical polynomials on an ambient hypersphere - The method achieves optimal rates of approximation for rough functions Summary: The paper presents a novel function approximation method using spherical polynomials on a hypersphere that works directly from random data without manifold information and has optimal performance for rough functions.


XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques

http://arxiv.org/abs/2402.12685v1

Compressor summary: This paper presents XRL-Bench, a unified benchmark for evaluating explainable reinforcement learning methods that use state importance to explain agent actions, and introduces TabularSHAP, a novel XRL method for tabular and image data.


TorchCP: A Library for Conformal Prediction based on PyTorch

http://arxiv.org/abs/2402.12683v1

Compressor summary: TorchCP is an open-source Python toolbox for efficient conformal prediction on PyTorch-based deep learning models.
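
For readers unfamiliar with the technique, the sketch below implements plain split conformal prediction for classification in PyTorch; it shows the kind of procedure such a toolbox automates and deliberately does not use TorchCP's actual API.

```python
# Plain split conformal prediction for classification in PyTorch.
# A generic sketch of the technique TorchCP packages; NOT TorchCP's API.
import torch

def calibrate(logits_cal, labels_cal, alpha=0.1):
    """Compute the score threshold on a held-out calibration set."""
    probs = logits_cal.softmax(dim=-1)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - probs.gather(1, labels_cal.unsqueeze(1)).squeeze(1)
    n = scores.numel()
    # Finite-sample-corrected (1 - alpha) quantile.
    return torch.quantile(scores, min(1.0, (n + 1) * (1 - alpha) / n)).item()

def predict_sets(logits_test, qhat):
    """Prediction set = all classes whose nonconformity score is <= qhat;
    sets then cover the true label with probability about 1 - alpha."""
    scores = 1.0 - logits_test.softmax(dim=-1)
    return [(row <= qhat).nonzero().flatten().tolist() for row in scores]
```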


Object-level Geometric Structure Preserving for Natural Image Stitching

http://arxiv.org/abs/2402.12677v1

Compressor summary: The paper proposes a novel image stitching method based on global similarity prior and triangular meshes that preserves object structures and outperforms existing methods.


Advancing Monocular Video-Based Gait Analysis Using Motion Imitation with Physics-Based Simulation

http://arxiv.org/abs/2402.12676v1

Compressor summary: The authors propose a method that uses reinforcement learning to control a physics simulation of human movement, enabling accurate gait analysis from smartphone videos without producing physically implausible results.


Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach

http://arxiv.org/abs/2402.12675v1

Compressor summary: This paper compares object-centric deep learning models to a ResNet-50 baseline in learning visual relations from images using tasks derived from comparative cognition literature and finds that while object-centric models outperform the baseline in simpler tasks, they still struggle in more difficult ones.


Discriminant Distance-Aware Representation on Deterministic Uncertainty Quantification Methods

http://arxiv.org/abs/2402.12664v1

Compressor summary: The text introduces DDAR, a new method for estimating deterministic uncertainty in deep learning models by using prototypes in latent representations to analyze input features and overcome feature collapse.
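
A minimal sketch of the general distance-aware idea, assuming a latent feature z and learned class prototypes (names and details hypothetical, not DDAR's exact formulation):

```python
# Hypothetical sketch of distance-aware deterministic uncertainty: one
# forward pass produces a latent feature, and its distance to the nearest
# learned class prototype serves as the uncertainty estimate.
import torch

def prototype_uncertainty(z, prototypes):
    """z: (batch, d) latent features; prototypes: (num_classes, d)."""
    d = torch.cdist(z, prototypes)      # (batch, num_classes) distances
    min_d, pred = d.min(dim=1)
    return pred, min_d                  # predicted class and its uncertainty
```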


SoftQE: Learned Representations of Queries Expanded by LLMs

http://arxiv.org/abs/2402.12663v1

Compressor summary: SoftQE uses LLMs to enhance query encoders for dense retrieval without additional latency or cost.
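
One plausible reading of the approach is distillation: train the query encoder so its embedding of the raw query approaches the embedding of the LLM-expanded query, so no LLM call is needed at retrieval time. The loss below is a hypothetical sketch under that assumption, not SoftQE's published objective.

```python
# Hypothetical distillation-style objective (assumed, not SoftQE's actual
# loss): pull the raw-query embedding toward the LLM-expanded-query
# embedding, alongside a standard in-batch contrastive retrieval loss.
import torch
import torch.nn.functional as F

def softqe_style_loss(q_emb, expanded_emb, doc_emb, alpha=0.5):
    """q_emb, expanded_emb, doc_emb: (N, d); row i of doc_emb is the
    positive document for query i, other rows act as in-batch negatives."""
    distill = F.mse_loss(q_emb, expanded_emb)
    scores = q_emb @ doc_emb.T
    labels = torch.arange(len(scores), device=scores.device)
    retrieval = F.cross_entropy(scores, labels)
    return alpha * distill + (1 - alpha) * retrieval
```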


The FinBen: An Holistic Financial Benchmark for Large Language Models

http://arxiv.org/abs/2402.12659v1

Compressor summary: FinBen is a benchmark for evaluating large language models' financial skills, revealing their strengths and limitations in various tasks.


HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts

http://arxiv.org/abs/2402.12656v1

Compressor summary: HyperMoE is a novel framework for language models that uses hypernetworks to balance sparsity and expert knowledge by transferring unselected experts' information to specific modules.
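
As background, a hypernetwork is a small network that generates the weights of another module from a conditioning input; the generic sketch below (names hypothetical) shows the mechanism HyperMoE builds on, not its actual architecture.

```python
# Background sketch of a hypernetwork: a small net that *generates* the
# weights of another layer from a conditioning embedding (here, imagine a
# summary of unselected experts). Generic mechanism only, not HyperMoE's.
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    def __init__(self, cond_dim, in_dim, out_dim):
        super().__init__()
        self.gen = nn.Linear(cond_dim, in_dim * out_dim)
        self.in_dim, self.out_dim = in_dim, out_dim

    def forward(self, x, cond):
        # cond: (cond_dim,) conditioning vector -> weights of a linear layer
        W = self.gen(cond).view(self.out_dim, self.in_dim)
        return x @ W.T
```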


OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

http://arxiv.org/abs/2402.12654v1

Compressor summary: The paper proposes a fast, robust, and multilingual encoder-only speech model based on CTC that outperforms encoder-decoder models for speech processing tasks such as ASR, ST, and LID.


Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation

http://arxiv.org/abs/2402.12649v1

Compressor summary: The study finds no correspondence between decontextualized "trick tests" and realistic evaluations of gender-occupation bias in LLMs, suggesting that current benchmarks may not adequately assess real-world harm.


DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation

http://arxiv.org/abs/2402.12647v1

Compressor summary: The paper proposes a probabilistic model using diffusion to estimate object shapes and correspondences from synthetic data, improving pose estimation performance and generalization.


Training Artificial Neural Networks by Coordinate Search Algorithm

http://arxiv.org/abs/2402.12646v1

Compressor summary: Key points: - The text introduces a gradient-free optimization algorithm for training neural networks (Coordinate Search) that can handle non-differentiable activation functions and multi-loss problems. - The algorithm bundles weights instead of optimizing each variable, which speeds up convergence and reduces dimension. - The proposed method sometimes outperforms gradient-based methods, especially with limited labeled data. Summary: The text proposes a gradient-free optimization algorithm for neural networks that can deal with non-differentiable functions and multiple losses, and shows that it can improve performance over gradient-based methods in some scenarios, such as low data availability.
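
A generic coordinate-search sketch under the summary's assumptions: bundles of weights are perturbed together, and a change is kept only if it lowers the loss, so the loss never needs gradients. Function and parameter names are illustrative, not the paper's.

```python
# Hypothetical sketch of gradient-free coordinate search: perturb a small
# bundle of weights at a time and keep the change only if the loss drops.
# The bundling scheme here is illustrative, not the paper's exact method.
import numpy as np

def coordinate_search(loss_fn, theta, step=0.1, n_iters=1000, bundle=8, seed=0):
    rng = np.random.default_rng(seed)
    best = loss_fn(theta)
    for _ in range(n_iters):
        idx = rng.choice(theta.size, size=bundle, replace=False)  # one bundle
        for delta in (step, -step):
            trial = theta.copy()
            trial[idx] += delta
            val = loss_fn(trial)        # loss may be non-differentiable
            if val < best:              # greedy accept
                theta, best = trial, val
                break
    return theta, best

# e.g. coordinate_search(lambda t: np.abs(t).sum(), np.ones(64))
```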


Neuromorphic Synergy for Video Binarization

http://arxiv.org/abs/2402.12644v1

Compressor summary: The paper proposes a method to binarize blurry images of bimodal objects by fusing event-based and image-based inference, producing sharp binary videos at a high frame rate.


YOLO-Ant: A Lightweight Detector via Depthwise Separable Convolutional and Large Kernel Design for Antenna Interference Source Detection

http://arxiv.org/abs/2402.12641v1

Compressor summary: The article presents a new antenna interference source detection model called YOLO-Ant, which combines a lightweight CNN and transformer structure to effectively detect small objects with complex backgrounds.
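
For reference, a depthwise separable convolution with a large kernel factorizes spatial and channel mixing, which is what keeps such detectors lightweight; the block below is a generic PyTorch sketch, not YOLO-Ant's exact module.

```python
# Generic sketch of the ingredient named in the title: a depthwise separable
# convolution with a large kernel. Per-channel spatial mixing stays cheap
# even for big kernels; a 1x1 pointwise conv then mixes channels.
import torch.nn as nn

class DWLargeKernelBlock(nn.Module):
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```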


StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing

http://arxiv.org/abs/2402.12636v1

Compressor summary: StyleDubber is a new method for movie dubbing that models style at the phoneme level rather than the frame level, generating speech from a reference audio track that matches the video in both timing and emotion.


A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective

http://arxiv.org/abs/2402.12627v1

Compressor summary: The text discusses the data change problem in AI models due to dynamic data, grouping domain shift and concept drift into one issue and reviewing state-of-the-art methods to tackle it.


Indiscriminate Data Poisoning Attacks on Pre-trained Feature Extractors

http://arxiv.org/abs/2402.12626v1

Compressor summary: The paper explores data poisoning attacks on self-supervised learning models and proposes two types of attacks with different stages.


Compact NSGA-II for Multi-objective Feature Selection

http://arxiv.org/abs/2402.12625v1

Compressor summary: The paper proposes a compact binary optimization algorithm for feature selection in machine learning, which improves classification accuracy and reduces memory requirements by using probability vectors instead of two populations.
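
The compact representation replaces the usual population with a probability vector over feature bits; below is the classic single-objective compact-GA update (illustrative only, since the paper's method is multi-objective and NSGA-II-based).

```python
# The "compact" trick: replace the population with a probability vector p,
# where p[i] is the probability that feature i is selected. Classic single-
# objective compact-GA update shown; the paper extends the idea to NSGA-II's
# multi-objective setting.
import numpy as np

def compact_step(p, loss_fn, rng, lr=0.05):
    a = (rng.random(p.size) < p).astype(int)   # sample two candidate masks
    b = (rng.random(p.size) < p).astype(int)
    winner, loser = (a, b) if loss_fn(a) <= loss_fn(b) else (b, a)
    # Nudge probabilities toward the winning mask; entries where the masks
    # agree are untouched because winner - loser is zero there.
    p = p + lr * (winner - loser)
    return np.clip(p, 0.02, 0.98)

# Usage: iterate from p = np.full(n_features, 0.5) with a loss such as
# validation error plus a penalty on the number of selected features.
```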


Efficient Parameter Mining and Freezing for Continual Object Detection

http://arxiv.org/abs/2402.12624v1

Compressor summary: The text discusses how to improve continual object detection by mining the network layers most important to previous tasks and freezing them to mitigate forgetting.


Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

http://arxiv.org/abs/2402.12621v1

Compressor summary: The text proposes Reflect-RL, a system that uses online reinforcement learning to fine-tune language models for multi-round interactive tasks with a reflection model and data generation techniques.


Multi-objective Binary Coordinate Search for Feature Selection

http://arxiv.org/abs/2402.12616v1

Compressor summary: The paper proposes a new multi-objective coordinate search algorithm (MOCS) for large-scale feature selection that generates distinct subsets of features by flipping variables on the Pareto front, outperforming NSGA-II in speed and efficiency.


Analysis of Using Sigmoid Loss for Contrastive Learning

http://arxiv.org/abs/2402.12613v1

Compressor summary: The paper analyzes how using the sigmoid loss instead of the InfoNCE loss in contrastive learning affects the geometric structure of learned embeddings and proposes a framework to parameterize various embedding structures by a single variable.
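
For comparison with InfoNCE, the pairwise sigmoid loss treats every image-text pair as an independent binary classification problem; a minimal sketch follows (temperature and bias fixed here, though in practice, as popularized by SigLIP, they are learnable scalars).

```python
# Minimal sketch of the pairwise sigmoid contrastive loss analyzed in the
# paper; t and b are fixed constants here for illustration.
import torch
import torch.nn.functional as F

def sigmoid_contrastive_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """img_emb, txt_emb: (N, d) L2-normalized embeddings of matched pairs."""
    logits = img_emb @ txt_emb.T * t + b            # (N, N) pair similarities
    labels = 2 * torch.eye(len(logits), device=logits.device) - 1  # +1 diag, -1 off-diag
    # Each of the N*N pairs is an independent binary classification problem,
    # unlike InfoNCE's row-wise softmax over negatives.
    return -F.logsigmoid(labels * logits).sum() / len(logits)
```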


Patient-Centric Knowledge Graphs: A Survey of Current Methods, Challenges, and Applications

http://arxiv.org/abs/2402.12608v1

Compressor summary: Patient-Centric Knowledge Graphs (PCKGs) are a new approach in healthcare that combines different types of patient data to give doctors a better understanding of the patient's health and help them provide more personalized care.