arxiv compressed, 2024-03-07

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-07 generated by the compressor, my personal LLM-based project.


Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

http://arxiv.org/abs/2403.03950v1

Compressor summary: The paper explores using categorical cross-entropy for training value functions in deep reinforcement learning, improving performance and scalability in various domains.
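As a concrete illustration of the classification view of value learning, one common recipe (used in prior distributional-RL work; the paper's exact target construction may differ) discretizes the scalar return into fixed bins, builds a "two-hot" target distribution whose mean equals the return, and trains the value head with cross-entropy. A minimal NumPy sketch:

```python
import numpy as np

def two_hot(value, bin_centers):
    """Encode a scalar return as a 'two-hot' distribution over fixed bins.

    Probability mass is split between the two bins bracketing `value`,
    proportionally to proximity, so the distribution's mean equals `value`.
    """
    value = float(np.clip(value, bin_centers[0], bin_centers[-1]))
    probs = np.zeros(len(bin_centers))
    upper = int(np.searchsorted(bin_centers, value))
    if upper == 0:                      # value sits exactly on the lowest bin
        probs[0] = 1.0
        return probs
    lower = upper - 1
    lo, hi = bin_centers[lower], bin_centers[upper]
    w_upper = (value - lo) / (hi - lo)
    probs[lower] = 1.0 - w_upper
    probs[upper] = w_upper
    return probs

def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution and a logit vector."""
    logits = logits - logits.max()      # numerically stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-(target_probs * log_probs).sum())
```

Training the value network then means regressing onto these distributions with `cross_entropy` instead of minimizing squared error on the scalar return.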


The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models

http://arxiv.org/abs/2403.03942v1

Compressor summary: The study reveals that different subnetworks within a single language model can achieve similar performance in-domain but generalize differently, and this is related to the use of attention heads that compute shallow and non-generalizing features versus higher-level ones.


GUIDE: Guidance-based Incremental Learning with Diffusion Models

http://arxiv.org/abs/2403.03938v1

Compressor summary: GUIDE is a new method for continual learning that uses diffusion models and classifier guidance to generate rehearsal examples of forgotten information, reducing catastrophic forgetting and outperforming previous approaches.


Extreme Precipitation Nowcasting using Transformer-based Generative Models

http://arxiv.org/abs/2403.03929v1

Compressor summary: The paper proposes a new method for short-term precipitation prediction using Transformer models and shows its effectiveness in capturing extreme weather events.


Did Translation Models Get More Robust Without Anyone Even Noticing?

http://arxiv.org/abs/2403.03923v1

Compressor summary: The paper investigates how neural machine translation and large language models handle noisy inputs and shows they are more robust than previous models.


Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts

http://arxiv.org/abs/2403.03920v1

Compressor summary: The paper examines how AI and ML methods can enhance instructional quality by analyzing educational content, teacher discourse, and student responses using Elmore's Instructional Core Framework.


A Measure for Transparent Comparison of Linguistic Diversity in Multilingual NLP Data Sets

http://arxiv.org/abs/2403.03909v1

Compressor summary: The paper proposes a method to measure the linguistic diversity of multilingual NLP datasets by comparing them to a reference language sample using features extracted from typological databases and automatic text-based measures.


DART: Implicit Doppler Tomography for Radar Novel View Synthesis

http://arxiv.org/abs/2403.03896v1

Compressor summary: DART is a neural method that uses radar physics to create realistic radar images for simulation and design.


IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators

http://arxiv.org/abs/2403.03894v1

Compressor summary: The authors introduce SLTrans, a large dataset with source code files and compiler intermediate representations, to improve Code-LMs' multilingual capabilities for code generation tasks.


From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models

http://arxiv.org/abs/2403.03893v1

Compressor summary: The paper proposes a method to reduce toxicity across multiple languages using various techniques, and evaluates its effectiveness on nine languages with different resources.


FaaF: Facts as a Function for the evaluation of RAG systems

http://arxiv.org/abs/2403.03888v1

Compressor summary: Facts as a Function (FaaF) is a new method for evaluating RAG systems that uses function calling abilities of LMs to improve fact verification efficiency and reliability.
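The core idea, packing all facts to verify into a single structured function call rather than prompting about them one at a time, can be sketched as below. This is a hypothetical schema illustrating the general function-calling pattern, not the paper's exact format; `verify_facts` is an invented name:

```python
def facts_as_function_schema(facts):
    """Build a function-calling schema that asks an LM to verify every fact
    in one call, returning one boolean per fact.

    Sketch of the 'facts as a function' idea under assumed naming; the
    paper's actual schema may differ.
    """
    return {
        "name": "verify_facts",
        "description": "Mark each fact true or false given the retrieved context.",
        "parameters": {
            "type": "object",
            "properties": {
                f"fact_{i}": {"type": "boolean", "description": fact}
                for i, fact in enumerate(facts)
            },
            "required": [f"fact_{i}" for i in range(len(facts))],
        },
    }
```

The schema would be passed to an LM's function/tool-calling interface alongside the retrieved context, and the returned arguments read off as per-fact verdicts.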


SaulLM-7B: A pioneering Large Language Model for Law

http://arxiv.org/abs/2403.03883v1

Compressor summary: SaulLM-7B, a 7 billion parameter language model, excels at legal text comprehension and generation after being trained on a large English legal corpus.


Self and Mixed Supervision to Improve Training Labels for Multi-Class Medical Image Segmentation

http://arxiv.org/abs/2403.03882v1

Compressor summary: The paper proposes a dual-branch network and transfer learning method to automatically improve weak training labels for multi-class medical image segmentation, achieving significant improvements in accuracy on abdominal CT scans.


Latent Dataset Distillation with Diffusion Models

http://arxiv.org/abs/2403.03881v1

Compressor summary: LD3M is a new method that uses diffusion in latent space to generate high-quality synthetic images from small datasets for machine learning models.


Graph neural network outputs are almost surely asymptotically constant

http://arxiv.org/abs/2403.03880v1

Compressor summary: The predictions of GNNs for probabilistic classification tasks become constant as the graph size increases, limiting their expressive power.


Redefining cystoscopy with AI: bladder cancer diagnosis using an efficient hybrid CNN-transformer model

http://arxiv.org/abs/2403.03879v1

Compressor summary: The paper proposes a deep learning approach using CNNs and a transformer to detect and segment bladder cancer in cystoscopic images, improving accuracy and efficiency for diagnosis.


Impoverished Language Technology: The Lack of (Social) Class in NLP

http://arxiv.org/abs/2403.03874v1

Compressor summary: The authors survey NLP literature and find little attention to socio-economic class, which they propose to include in future language technologies.


Learning to Decode Collaboratively with Multiple Language Models

http://arxiv.org/abs/2403.03870v1

Compressor summary: The paper proposes a method to teach large language models to collaborate by interleaving their generations and optimizing the marginal likelihood, leading to better performance on various tasks and interesting collaboration patterns.


On the Origins of Linear Representations in Large Language Models

http://arxiv.org/abs/2403.03867v1

Compressor summary: This paper investigates how large language models learn linear representations of semantic concepts and shows that a simple latent variable model can explain this phenomenon.


KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions

http://arxiv.org/abs/2403.03866v1

Compressor summary: The paper introduces KIWI, a dataset for evaluating large language models' ability to follow instructions and provide writing assistance in the scientific domain, finding that current models struggle with incorporating new information and judging their own performance.


Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning

http://arxiv.org/abs/2403.03864v1

Compressor summary: The paper presents AlgoPuzzleVQA, a new dataset for multimodal puzzle-solving tasks that require visual understanding, language comprehension, and complex algorithmic reasoning, and shows that large language models struggle with these tasks.


X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification

http://arxiv.org/abs/2403.03863v1

Compressor summary: X-shot learning is a new challenge that involves adapting to different levels of label occurrences in real-world settings, and BinBin is a versatile system that outperforms previous methods on this task.


Designing Informative Metrics for Few-Shot Example Selection

http://arxiv.org/abs/2403.03861v1

Compressor summary: The paper proposes a method to select complex examples for few-shot sequence tagging tasks, improving the performance of pretrained language models significantly.


Emojinize: Enriching Any Text with Emoji Translations

http://arxiv.org/abs/2403.03857v1

Compressor summary: Emojinize is a method that uses large language models to translate text phrases into sequences of one or more emoji, increasing guessability and enabling various applications.


Public-data Assisted Private Stochastic Optimization: Power and Limitations

http://arxiv.org/abs/2403.03856v1

Compressor summary: The paper studies how well private algorithms can work using public data, focusing on optimization problems and showing the limits and optimal strategies for different settings.


ECAP: Extensive Cut-and-Paste Augmentation for Unsupervised Domain Adaptive Semantic Segmentation

http://arxiv.org/abs/2403.03854v1

Compressor summary: The paper proposes a method to improve unsupervised domain adaptation for semantic segmentation by using confident pseudo-labels from target data as data augmentation.
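The gist, cut-and-paste augmentation gated by pseudo-label confidence, can be sketched for a single image pair as below. This is a simplified illustration; ECAP's actual pipeline (e.g. how it samples pasted regions across the dataset) is more involved:

```python
import numpy as np

def confident_cut_and_paste(src_img, src_label, tgt_img, tgt_pseudo, tgt_conf,
                            paste_class, conf_thresh=0.95):
    """Paste pixels of `paste_class` from the target-domain image into the
    source image, but only where pseudo-label confidence exceeds the
    threshold; the pasted pseudo-labels become training labels.

    Simplified sketch of confidence-gated cut-and-paste augmentation.
    """
    mask = (tgt_pseudo == paste_class) & (tgt_conf > conf_thresh)
    out_img = src_img.copy()
    out_label = src_label.copy()
    out_img[mask] = tgt_img[mask]       # copy confident target pixels
    out_label[mask] = tgt_pseudo[mask]  # and their pseudo-labels
    return out_img, out_label
```

The threshold keeps noisy pseudo-labels out of the mixed training images, which is what makes the augmentation safe to use on unlabeled target data.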


ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

http://arxiv.org/abs/2403.03853v1

Compressor summary: The study introduces Block Influence, a metric to measure layer significance in large language models, and proposes ShortGPT, a method that removes redundant layers based on their scores, achieving better results than previous methods.
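One natural way to score layer redundancy in this spirit is one minus the cosine similarity between a block's input and output hidden states, so near-identity layers score near zero and become pruning candidates. A hedged sketch (the paper's exact Block Influence definition may differ):

```python
import numpy as np

def block_influence(hidden_in, hidden_out, eps=1e-8):
    """Redundancy score for one transformer block: 1 minus the mean cosine
    similarity between its input and output hidden states, each of shape
    [tokens, dim]. A score near 0 means the block barely transforms its
    input, suggesting it can be removed.
    """
    num = (hidden_in * hidden_out).sum(axis=-1)
    den = (np.linalg.norm(hidden_in, axis=-1)
           * np.linalg.norm(hidden_out, axis=-1) + eps)
    return float(1.0 - (num / den).mean())
```

Applied per layer over a calibration set, the layers with the lowest scores would be dropped first.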


Accelerating Convergence of Score-Based Diffusion Models, Provably

http://arxiv.org/abs/2403.03852v1

Compressor summary: The paper proposes novel training-free algorithms to accelerate diffusion generative models' samplers, achieving faster convergence rates than existing methods.


On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder

http://arxiv.org/abs/2403.03846v1

Compressor summary: The paper explores using distillation to defend against poisoned encoders in SSL by transferring benign knowledge from a teacher net to a student net, achieving significant reduction in attack success rate with minimal accuracy loss.


Feature Selection as Deep Sequential Generative Learning

http://arxiv.org/abs/2403.03838v1

Compressor summary: The text proposes a new method for feature selection that uses a deep variational transformer model to generate feature selection decision sequences based on learned utility scores.


Cobweb: An Incremental and Hierarchical Model of Human-Like Category Learning

http://arxiv.org/abs/2403.03835v1

Compressor summary: Cobweb is a human-like categorization system that builds hierarchical structures using utility, captures psychological effects, and can exhibit both exemplar and prototype learning.


Your device may know you better than you know yourself -- continuous authentication on novel dataset using machine learning

http://arxiv.org/abs/2403.03832v1

Compressor summary: The study investigates continuous authentication using gesture data from Minecraft players and machine learning models, with the most accurate model achieving 90% accuracy.


From Clicks to Security: Investigating Continuous Authentication via Mouse Dynamics

http://arxiv.org/abs/2403.03828v1

Compressor summary: The paper explores how mouse movement patterns can be used as a reliable and efficient method for continuous user authentication in gaming scenarios, using various machine learning models.


Temporal Enhanced Floating Car Observers

http://arxiv.org/abs/2403.03825v1

Compressor summary: The text describes how sensor-equipped vehicles can collect traffic data efficiently by emulating detection in a microscopic traffic simulation and using deep learning to recover hidden vehicles, improving traffic management.


A Modular Approach for Multimodal Summarization of TV Shows

http://arxiv.org/abs/2403.03823v1

Compressor summary: The paper proposes a modular approach for summarizing TV shows using scene detection, reordering, visual-to-text conversion, dialogue summarization, and fusion, and introduces PREFS, a new metric that evaluates summary quality based on fact recall and precision.


Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ

http://arxiv.org/abs/2403.03814v1

Compressor summary: MultiQ is a benchmark to evaluate multilingual question answering by open LLMs beyond their intended languages.


ProbSAINT: Probabilistic Tabular Regression for Used Car Pricing

http://arxiv.org/abs/2403.03812v1

Compressor summary: ProbSAINT is a machine learning model for used car pricing that can quantify uncertainties and adapt to different expected offer durations, providing accurate and fair transactions.


KG-TREAT: Pre-training for Treatment Effect Estimation by Synergizing Patient Data with Knowledge Graphs

http://arxiv.org/abs/2403.03791v1

Compressor summary: KG-TREAT is a novel framework that uses biomedical knowledge graphs to enhance the estimation of treatment effects on patient outcomes, showing improved performance over existing methods.


Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery

http://arxiv.org/abs/2403.03790v1

Compressor summary: The article proposes a novel unified visual-language model called Popeye for multi-source ship detection from remote sensing imagery using various methods, knowledge adaption, and pixel-level segmentation.


PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion

http://arxiv.org/abs/2403.03788v1

Compressor summary: The paper introduces a benchmark (PPTC-R) to evaluate how well Large Language Models (LLMs) can complete PowerPoint tasks in different real-world situations, and finds that GPT-4 performs best but all LLMs struggle with multiple challenges.


ENOT: Expectile Regularization for Fast and Accurate Training of Neural Optimal Transport

http://arxiv.org/abs/2403.03777v1

Compressor summary: The paper introduces Expectile-Regularised Neural Optimal Transport (ENOT), a new method for estimating optimal transportation plans that improves both accuracy and efficiency compared to previous approaches.


Verified Training for Counterfactual Explanation Robustness under Data Shift

http://arxiv.org/abs/2403.03773v1

Compressor summary: VeriTraCER generates counterfactual explanations that are robust to small model updates, providing users with reliable guidance on how to change their inputs to achieve desired predictions.


AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs

http://arxiv.org/abs/2403.03772v1

Compressor summary: The paper presents a method to efficiently parallelize existing causal discovery methods, enabling their application on large-scale datasets by significantly speeding up the process.


DeepCRE: Revolutionizing Drug R&D with Cutting-Edge Computational Models

http://arxiv.org/abs/2403.03768v1

Compressor summary: DeepCRE is a novel computational model that significantly improves cross-drug response evaluation and can help discover new therapeutics.


German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset

http://arxiv.org/abs/2403.03750v1

Compressor summary: The paper introduces a new dataset for detecting hallucinations in German summarization tasks using large language models, which can help improve their faithfulness to the source text.


Towards Safe and Aligned Large Language Models for Medicine

http://arxiv.org/abs/2403.03744v1

Compressor summary: This paper evaluates the safety and alignment of medical large language models (LLMs) using a dataset of harmful questions and suggests fine-tuning as a mitigation strategy to reduce potential harms.


SUPClust: Active Learning at the Boundaries

http://arxiv.org/abs/2403.03741v1

Compressor summary: SUPClust is an active learning method that focuses on finding points at the decision boundary between classes to improve model performance, especially in imbalanced datasets.


Self-supervised Photographic Image Layout Representation Learning

http://arxiv.org/abs/2403.03740v1

Compressor summary: Our method learns to represent photographic image layouts using heterogeneous graphs and autoencoders, outperforming existing approaches with a new dataset, LODB.


A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network

http://arxiv.org/abs/2403.03739v1

Compressor summary: The paper proposes A&B BNN, which reduces binary neural network's multiplication operations by replacing them with bit operations and achieves competitive results on image classification tasks.


Probabilistic Topic Modelling with Transformer Representations

http://arxiv.org/abs/2403.03737v1

Compressor summary: The TNTM model combines transformer embeddings and probabilistic modelling to achieve high-quality topic representation with fast inference and good diversity.


Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer

http://arxiv.org/abs/2403.03736v1

Compressor summary: The paper introduces a novel image generation-compression paradigm that uses vector-quantized image models and a multi-stage transformer to improve perceptual quality, especially in ultra-low bitrate scenarios.


Learning 3D object-centric representation through prediction

http://arxiv.org/abs/2403.03730v1

Compressor summary: The paper presents a new network architecture that learns to segment, localize, and perceive depth of objects from images and self-motion, mimicking human infants' abilities.


Bridging Diversity and Uncertainty in Active learning with Self-Supervised Pre-Training

http://arxiv.org/abs/2403.03728v1

Compressor summary: The study proposes a heuristic called TCM that combines diversity-based and uncertainty-based sampling strategies for active learning, improving performance across different data levels.
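A generic way to combine the two sampling families, not necessarily TCM's exact procedure, is to greedily pick points scoring high on both predictive entropy (uncertainty) and distance to the already-selected set (diversity):

```python
import numpy as np

def select_batch(embeddings, probs, k, alpha=0.5):
    """Greedily pick k unlabeled points by a combined score: predictive
    entropy plus minimum distance to the points selected so far.

    A generic diversity-plus-uncertainty sketch; `alpha` trades off the
    two terms and is an assumed knob, not from the paper.
    """
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    selected = [int(entropy.argmax())]          # seed with most uncertain point
    while len(selected) < k:
        d = np.min(
            np.linalg.norm(embeddings[:, None] - embeddings[selected][None], axis=-1),
            axis=1)                             # distance to nearest selected point
        score = alpha * entropy + (1 - alpha) * d
        score[selected] = -np.inf               # never re-pick a point
        selected.append(int(score.argmax()))
    return selected
```

With self-supervised pre-training, `embeddings` would come from the pretrained encoder, which is what makes the diversity term meaningful before any labels exist.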


Diffusion on language model embeddings for protein sequence generation

http://arxiv.org/abs/2403.03726v1

Compressor summary: DiMA is a model that generates diverse and accurate protein sequences using continuous diffusion on embeddings derived from the ESM-2 protein language model, outperforming existing methods in quality and diversity metrics.


CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection

http://arxiv.org/abs/2403.03721v1

Compressor summary: CMDA is a novel unsupervised domain adaptation method that uses visual semantic cues from camera images to improve generalization of 3D object detection models across different domains.


Multimodal Transformer for Comics Text-Cloze

http://arxiv.org/abs/2403.03719v1

Compressor summary: The paper presents a Multimodal-LLM architecture for Text-cloze, a challenging task in comics, which improves performance by using a Domain-Adapted ResNet-50 visual encoder and new OCR annotations.


MeaCap: Memory-Augmented Zero-shot Image Captioning

http://arxiv.org/abs/2403.03715v1

Compressor summary: The paper proposes a novel memory-augmented method for zero-shot image captioning that generates concept-centered captions with fewer hallucinations and more world-knowledge, outperforming existing methods.


Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision

http://arxiv.org/abs/2403.03707v1

Compressor summary: The paper proposes a framework that learns pixel-level alignment between images and text to improve semantic segmentation without dense annotations.


Causal Prototype-inspired Contrast Adaptation for Unsupervised Domain Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery

http://arxiv.org/abs/2403.03704v1

Compressor summary: CPCA is a novel method for semantic segmentation of remote sensing imagery across domains, which disentangles causal features from bias features and adapts to invariant causal factors using intervention techniques.


Towards Controllable Time Series Generation

http://arxiv.org/abs/2403.03698v1

Compressor summary: The paper introduces CTS, a framework that generates controllable time series by decoupling the mapping process from VAE training and evaluates its effectiveness on three real-world datasets.


MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition

http://arxiv.org/abs/2403.03691v1

Compressor summary: MolNexTR is a deep learning model that converts molecular images into graph structures and SMILES strings by fusing ConvNext and Vision-Transformer, using advanced algorithms for data augmentation and post-processing to handle diverse image styles.


Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese

http://arxiv.org/abs/2403.03690v1

Compressor summary: The authors use GPT-4 to create instruction data and evaluation benchmarks for large language models in Japanese with minimal human annotation, achieving better results than previous methods.


General2Specialized LLMs Translation for E-commerce

http://arxiv.org/abs/2403.03689v1

Compressor summary: The paper proposes a two-step fine-tuning method to improve Neural Machine Translation for domains with special writing formulas like e-commerce, using domain-specific resources and self-contrastive semantic enhancement.


Adversarial Infrared Geometry: Using Geometry to Perform Adversarial Attack against Infrared Pedestrian Detectors

http://arxiv.org/abs/2403.03674v1

Compressor summary: The study proposes AdvIG, a novel infrared physical attack that uses geometric shapes and optimizes their parameters to execute efficient black-box query attacks with high success rates.


Learning Adversarial MDPs with Stochastic Hard Constraints

http://arxiv.org/abs/2403.03672v1

Compressor summary: The paper proposes two novel algorithms for online learning in CMDPs with adversarial losses and hard constraints, achieving sublinear regret while satisfying constraints in non-stationary environments, with applications to real-world problems such as autonomous driving, online advertising, and recommender systems.


Portraying the Need for Temporal Data in Flood Detection via Sentinel-1

http://arxiv.org/abs/2403.03671v1

Compressor summary: The authors propose a new method to detect floods in remote sensing data using temporal anomaly detection and show promising results.


CDC: A Simple Framework for Complex Data Clustering

http://arxiv.org/abs/2403.03670v1

Compressor summary: The paper proposes a simple framework for clustering complex data with linear complexity by using graph filtering and similarity-preserving regularization.


Provable Filter for Real-world Graph Clustering

http://arxiv.org/abs/2403.03666v1

Compressor summary: The paper proposes a novel method for graph clustering that handles both homophilic and heterophilic graphs, outperforming existing methods in experiments.


Harnessing Meta-Learning for Improving Full-Frame Video Stabilization

http://arxiv.org/abs/2403.03662v1

Compressor summary: The paper proposes a test-time adaptation method that improves pixel-level synthesis solutions for video stabilization by adapting models to individual input videos, leading to significant stability and quality gains with only one adaptation step.


Robust Graph Structure Learning under Heterophily

http://arxiv.org/abs/2403.03659v1

Compressor summary: The paper proposes a new method to learn robust graph structures from noisy and sparse heterophilic data using a high-pass filter, an adaptive norm, and a regularizer.


K-Link: Knowledge-Link Graph from LLMs for Enhanced Representation Learning in Multivariate Time-Series Data

http://arxiv.org/abs/2403.03645v1

Compressor summary: K-Link uses large language models to create a knowledge-link graph that improves graph construction from multivariate time-series data, enhancing graph neural network performance on various tasks.


A Survey on Applications of Reinforcement Learning in Spatial Resource Allocation

http://arxiv.org/abs/2403.03643v1

Compressor summary: This paper reviews recent reinforcement learning methods for spatial resource allocation problems, discussing their advantages, challenges, and open questions.


Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People

http://arxiv.org/abs/2403.03640v1

Compressor summary: The authors develop multilingual medical AI models that can provide tailored healthcare services in various languages, using the ApolloCorpora dataset and achieving state-of-the-art performance.


SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models

http://arxiv.org/abs/2403.03636v1

Compressor summary: SheetAgent is a novel autonomous agent that uses a large language model to perform long-horizon and multi-category spreadsheet manipulation tasks with reasoning, achieving improved precision and table reasoning abilities.


Tackling Missing Values in Probabilistic Wind Power Forecasting: A Generative Approach

http://arxiv.org/abs/2403.03631v1

Compressor summary: The paper proposes an efficient probabilistic forecasting method for wind power data with missing values by using a generative model that estimates joint distributions without preprocessing.


GPTopic: Dynamic and Interactive Topic Representations

http://arxiv.org/abs/2403.03628v1

Compressor summary: GPTopic is a software package that uses large language models to create interactive, dynamic topic representations for text corpora, making topic modeling more accessible and comprehensive.


Multimodal Large Language Models to Support Real-World Fact-Checking

http://arxiv.org/abs/2403.03627v1

Compressor summary: The authors propose a method to assess how well multimodal language models can support fact-checking and find that GPT-4V performs better than existing models, while also identifying their limitations and biases.


GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

http://arxiv.org/abs/2403.03608v1

Compressor summary: GSNeRF is a method to generate images and semantic maps from unseen scenes by combining multi-view inputs, semantic features, and geometry information.


The Geometric Structure of Topic Models

http://arxiv.org/abs/2403.03607v1

Compressor summary: The paper proposes a new method to analyze and visualize topic models, which allows for higher-dimensional conceptual relationships between topics.


A Privacy-Preserving Framework with Multi-Modal Data for Cross-Domain Recommendation

http://arxiv.org/abs/2403.03600v1

Compressor summary: The paper proposes a privacy-preserving framework using multi-modal data to improve cross-domain recommendation accuracy by disentangling domain-common and domain-specific features.


Learning Invariant Representations of Graph Neural Networks via Cluster Generalization

http://arxiv.org/abs/2403.03599v1

Compressor summary: The paper proposes CIT, a mechanism that improves GNNs' generalization by transferring cluster information and preserving node diversity when the test graph structure differs from the training one.


Assessing the Aesthetic Evaluation Capabilities of GPT-4 with Vision: Insights from Group and Individual Assessments

http://arxiv.org/abs/2403.03594v1

Compressor summary: This study shows how GPT-4 with Vision can predict human aesthetic evaluations of images better than other models and suggests creating an AI system based on scientific knowledge of beauty perception.


RouteExplainer: An Explanation Framework for Vehicle Routing Problem

http://arxiv.org/abs/2403.03585v1

Compressor summary: RouteExplainer is a framework that provides post-hoc explanations for vehicle routing problems by classifying edges based on their intentions and using large language models to generate explanation texts.


Design of an Open-Source Architecture for Neural Machine Translation

http://arxiv.org/abs/2403.03582v1

Compressor summary: adaptNMT is an open-source tool for easy development and deployment of neural machine translation models with features like subword segmentation, intuitive UI, and eco-friendly evaluation.


Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing

http://arxiv.org/abs/2403.03581v1

Compressor summary: The study used various AI models to analyze tweets and showed their potential in improving ASD diagnosis with high accuracy.


gaHealth: An English-Irish Bilingual Corpus of Health Data

http://arxiv.org/abs/2403.03575v1

Compressor summary: The gaHealth corpus, a bilingual dataset for English-to-Irish health-domain translation, improved BLEU scores by 40% compared to the best-performing models from the LoResMT2021 Shared Task, and comes with linguistic guidelines for creating similar low-resource datasets.


On Transfer in Classification: How Well do Subsets of Classes Generalize?

http://arxiv.org/abs/2403.03569v1

Compressor summary: The paper proposes a theoretical framework for analyzing class transferability in machine learning using a partially ordered set of subsets of classes and explores its practical applications in few-shot learning.


Efficient Algorithms for Empirical Group Distributional Robust Optimization and Beyond

http://arxiv.org/abs/2403.03562v1

Compressor summary: The paper proposes a new algorithm for group distributionally robust optimization with better performance and convergence guarantees than existing methods.


HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

http://arxiv.org/abs/2403.03561v1

Compressor summary: HMD-Poser is a novel approach for real-time human motion tracking using scalable sparse observations from a VR headset and body-worn inertial measurement units (IMUs), achieving state-of-the-art accuracy and performance.


Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

http://arxiv.org/abs/2403.03558v1

Compressor summary: The paper introduces a new method for evaluating language models' reliability in answering math word problems using an unanswerable question dataset and shows that training with human feedback improves their performance.


Emotional Manipulation Through Prompt Engineering Amplifies Disinformation Generation in AI Large Language Models

http://arxiv.org/abs/2403.03550v1

Compressor summary: The study shows how OpenAI's LLMs can create fake news and respond to emotions, and suggests that they should be used responsibly to prevent misinformation.


Prompt Mining for Language-based Human Mobility Forecasting

http://arxiv.org/abs/2403.03544v1

Compressor summary: The paper proposes a new framework for designing diverse and effective prompts to improve language-based forecasting of human mobility patterns using large language models.


DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

http://arxiv.org/abs/2403.03542v1

Compressor summary: The paper introduces a new pre-training method and a scalable model architecture for neural operators in partial differential equations, achieving state-of-the-art results on various benchmarks and tasks.


Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications

http://arxiv.org/abs/2403.03535v1

Compressor summary: This paper introduces Task Attribute Distance (TAD), a model-agnostic metric to quantify the relationship between training and novel tasks in few-shot learning, and shows its effectiveness in applications like data augmentation and test-time intervention.


Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

http://arxiv.org/abs/2403.03532v1

Compressor summary: EYOC is an unsupervised method for registering distant point clouds in driving scenarios without global pose labels, achieving comparable performance to supervised methods and better generalization.


BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation

http://arxiv.org/abs/2403.03521v1

Compressor summary: The paper introduces a bidirectional semantic-based evaluation method for neural machine translation that uses BabelNet to measure the sense distance between source and output sentences.


IB-Net: Initial Branch Network for Variable Decision in Boolean Satisfiability

http://arxiv.org/abs/2403.03517v1

Compressor summary: IB-Net is a framework that uses graph neural networks to help solve Boolean Satisfiability problems, making Electronic Design Automation faster and more efficient.


Unsupervised Multilingual Dense Retrieval via Generative Pseudo Labeling

http://arxiv.org/abs/2403.03516v1

Compressor summary: UMR is an unsupervised method to train multilingual dense retrievers using sequence likelihood estimation of multilingual language models, achieving better performance than supervised baselines.


CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models

http://arxiv.org/abs/2403.03514v1

Compressor summary: CLongEval is a comprehensive Chinese benchmark for evaluating long-context language models with sufficient data volume, broad applicability, and high quality.


Dcl-Net: Dual Contrastive Learning Network for Semi-Supervised Multi-Organ Segmentation

http://arxiv.org/abs/2403.03512v1

Compressor summary: The paper proposes a two-stage network that uses global and local contrastive learning to improve multi-organ segmentation with semi-supervised learning, considering relations among images and categories.


Probing the Robustness of Time-series Forecasting Models with CounterfacTS

http://arxiv.org/abs/2403.03508v1

Compressor summary: CounterfacTS is a tool that helps visualize and create counterfactuals for time-series forecasting models, enabling users to explore how changes in the data affect their performance and robustness.


GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

http://arxiv.org/abs/2403.03507v1

Compressor summary: GaLore is a training strategy for large language models that reduces memory usage while maintaining efficiency and performance in pre-training and fine-tuning stages.
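The core idea, taking the optimizer step in a low-rank subspace of the gradient, can be sketched in a toy form. This is an illustrative rank-1, pure-Python sketch (the function names `top_right_singular_vector` and `galore_style_step` are invented here), not the authors' implementation, which projects per-layer gradients with periodically refreshed SVD bases and keeps full optimizer state in the compressed space:

```python
import math

def top_right_singular_vector(G, iters=50):
    """Power iteration on G^T G to approximate the dominant
    right-singular vector of the gradient matrix G."""
    n = len(G[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(row, v)) for row in G]                # w = G v
        v = [sum(G[i][j] * w[i] for i in range(len(G))) for j in range(n)]   # v = G^T w
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    return v

def galore_style_step(W, G, lr=0.1):
    """One rank-1 gradient-projection step: compress G onto its dominant
    singular direction v, then step with the rank-1 reconstruction."""
    v = top_right_singular_vector(G)
    Gv = [sum(g * x for g, x in zip(row, v)) for row in G]  # projected gradient, shape (m,)
    # Rank-1 reconstruction (Gv outer v) approximates G; apply the update.
    return [[W[i][j] - lr * Gv[i] * v[j] for j in range(len(v))]
            for i in range(len(W))]
```

Because the optimizer state lives in the r-dimensional projected space rather than the full parameter space, the memory saving grows with the rank gap between gradients and weights.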


Towards Detecting AI-Generated Text within Human-AI Collaborative Hybrid Texts

http://arxiv.org/abs/2403.03506v1

Compressor summary: The study examines detecting AI-generated sentences in realistic human-AI collaboration texts and suggests using the CoAuthor dataset and considering segment length for better detection.


A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation

http://arxiv.org/abs/2403.03496v1

Compressor summary: The paper introduces multi-source Wizard of Wikipedia (Ms.WoW), a benchmark for evaluating dialogue systems that can select and use knowledge from multiple sources, and a challenge to test their ability to adapt to new sources.


VastTrack: Vast Category Visual Object Tracking

http://arxiv.org/abs/2403.03493v1

Compressor summary: The paper introduces VastTrack, a large-scale benchmark for visual tracking with diverse object categories, more videos, and rich linguistic annotations.


NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging

http://arxiv.org/abs/2403.03485v1

Compressor summary: NoiseCollage is a new text-to-image diffusion model that improves layout conditions by independently estimating and cropping noises for each object, outperforming several existing models and integrating with ControlNet to enhance edge, sketch, and pose skeleton control.


A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation

http://arxiv.org/abs/2403.03483v1

Compressor summary: The paper proposes Teacher-Free Graph Self-Distillation (TGS), a method that improves the performance of MLPs on graph-related tasks without using GNNs or teachers, achieving fast inference and outperforming existing methods.


Magic Markup: Maintaining Document-External Markup with an LLM

http://arxiv.org/abs/2403.03481v1

Compressor summary: The paper presents a system that uses language models to automatically update annotations in changing text documents, enabling new applications in program writing and debugging.


Continual Segmentation with Disentangled Objectness Learning and Class Recognition

http://arxiv.org/abs/2403.03477v1

Compressor summary: CoMasTRe is a two-stage segmentation method that combines objectness learning and classification, using distillation to improve performance and prevent forgetting on PASCAL VOC and ADE20K.


Inverse-Free Fast Natural Gradient Descent Method for Deep Learning

http://arxiv.org/abs/2403.03473v1

Compressor summary: The paper introduces FNGD, a fast natural gradient descent method that reduces computational complexity and achieves speedup in image classification and machine translation tasks.


Boosting Meta-Training with Base Class Information for Few-Shot Learning

http://arxiv.org/abs/2403.03472v1

Compressor summary: The paper proposes an end-to-end training method for few-shot learning that combines cross-entropy loss with meta-learning, significantly improving performance.


Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator

http://arxiv.org/abs/2403.03468v1

Compressor summary: The paper introduces a new real-time multi-task network for autonomous driving that handles 3D object detection, semantic segmentation, and dense depth estimation using a task-adaptive attention generator to prevent negative transfer.


Self-Attention Empowered Graph Convolutional Network for Structure Learning and Node Embedding

http://arxiv.org/abs/2403.03465v1

Compressor summary: The paper introduces GCN-SA, a graph neural network that uses self-attention to capture long-range dependencies and perform better representation learning on heterophilous graphs.


FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion

http://arxiv.org/abs/2403.03463v1

Compressor summary: The authors propose a dataset automata that uses diffusion models and CLIP filtering to generate high-quality, paired wildfire images with controlled flame position and size and varying backgrounds for small/rare object detection tasks.


A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video

http://arxiv.org/abs/2403.03461v1

Compressor summary: The paper introduces YoutubeFish-35, a new large-scale dataset for indiscernible object counting, and proposes TransVidCount, a method that combines density and regression branches to perform well on it.


Slot Abstractors: Toward Scalable Abstract Visual Reasoning

http://arxiv.org/abs/2403.03458v1

Compressor summary: Slot Abstractors is a scalable method for abstract visual reasoning with multi-object inputs and multiple relations, achieving state-of-the-art performance in four tasks.


DLP-GAN: Learning to Draw Modern Chinese Landscape Photos with Generative Adversarial Network

http://arxiv.org/abs/2403.03456v1

Compressor summary: The paper presents DLP-GAN, a novel framework that translates ancient Chinese landscape paintings into modern photos and sketches using asymmetric cycle mapping, a dense-fusion module, and a dual-consistency loss to balance realism and abstraction, outperforming existing methods.


Learning Constrained Optimization with Deep Augmented Lagrangian Methods

http://arxiv.org/abs/2403.03454v1

Compressor summary: The paper proposes a new method for learning to optimize constrained problems using dual solution estimates and improves convergence by incorporating augmented Lagrangian techniques.


D4C glove-train: solving the RPM and Bongard-logo problem by distributing and Circumscribing concepts

http://arxiv.org/abs/2403.03452v1

Compressor summary: The paper proposes novel methods for abstract reasoning tasks like Raven's Progressive Matrices and Bongard-Logo problems by redefining concept boundaries and improving distribution estimation, leading to state-of-the-art performance.


Kernel Correlation-Dissimilarity for Multiple Kernel k-Means Clustering

http://arxiv.org/abs/2403.03448v1

Compressor summary: The paper introduces a new method to improve kernel k-means clustering by integrating both kernel correlation and dissimilarity, leading to better performance and more objective information extraction.


HDRFlow: Real-Time HDR Video Reconstruction with Large Motions

http://arxiv.org/abs/2403.03447v1

Compressor summary: HDRFlow is a robust and efficient flow estimator that uses an HDR-domain alignment loss, an efficient flow network with a multi-size large kernel, and synthetic data to reconstruct high dynamic range video from alternating exposure image sequences in real-time.


Uncertainty Quantification for DeepONets with Ensemble Kalman Inversion

http://arxiv.org/abs/2403.03444v1

Compressor summary: The paper proposes a novel inference approach using Ensemble Kalman Inversion (EKI) to efficiently and informatively estimate uncertainty in DeepONet predictions, especially for limited and noisy data.


VLSP 2023 -- LTER: A Summary of the Challenge on Legal Textual Entailment Recognition

http://arxiv.org/abs/2403.03435v1

Compressor summary: The paper presents the first research on AI for the Vietnamese language in the legal domain, highlighting key linguistic challenges.


Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models

http://arxiv.org/abs/2403.03432v1

Compressor summary: The Mixture-of-LoRAs (MoA) architecture is a novel tuning method that enhances multi-task learning with large language models by combining multiple LoRA modules using an explicit routing strategy and domain labels, enabling quick adaptation to new domains.


Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing

http://arxiv.org/abs/2403.03431v1

Compressor summary: This paper analyzes how cross and self-attention maps in Stable Diffusion affect image editing and proposes a simpler, more efficient tuning-free method based on the findings.


Sculpting Molecules in 3D: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

http://arxiv.org/abs/2403.03425v1

Compressor summary: 3DToMolo is an innovative deep learning method that generates novel molecules with specified symmetries and properties by harmonizing diverse modalities and aligning them seamlessly.


LEAD: Learning Decomposition for Source-free Universal Domain Adaptation

http://arxiv.org/abs/2403.03421v1

Compressor summary: LEAD is a novel method for Universal Domain Adaptation without source data that uses feature decomposition and instance-level decision boundaries to identify target-private data.


Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization

http://arxiv.org/abs/2403.03419v1

Compressor summary: D$^2$O is a novel alignment method for LLMs that uses only negative samples to reduce harmfulness while preserving helpfulness, achieving superior results in generating safe and informative responses.


Leveraging The Finite States of Emotion Processing to Study Late-Life Mental Health

http://arxiv.org/abs/2403.03414v1

Compressor summary: The text describes vcHMM, a method that uses Hidden Markov Models and Finite State Automata to study the system-level dynamics of emotion processing from questionnaire and fMRI data, providing insight into how behavior and neural activity relate to depression.
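The HMM machinery behind this kind of state-sequence analysis can be illustrated with a minimal Viterbi decoder. The states, observations, and probabilities below are invented for illustration and are unrelated to the paper's actual model:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most-likely hidden-state sequence for a discrete HMM
    (standard Viterbi dynamic program)."""
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            best[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

Run on a toy two-state model (e.g. "low"/"high" arousal emitting "pos"/"neg" observations), the decoder recovers the state sequence that best explains the observations.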


Advancing Out-of-Distribution Detection through Data Purification and Dynamic Activation Function Design

http://arxiv.org/abs/2403.03412v1

Compressor summary: OOD-R is a curated dataset with enhanced noise reduction, and ActFun is a method to fine-tune model responses for better OOD detection and uncertainty estimation in neural networks.


Prediction Of Cryptocurrency Prices Using LSTM, SVM And Polynomial Regression

http://arxiv.org/abs/2403.03410v1

Compressor summary: The text compares three algorithms for predicting cryptocurrency prices and finds that the Support Vector Machine with a linear kernel has the smallest error.
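Of the three models compared, polynomial regression is the simplest to sketch. A minimal degree-d least-squares fit via the normal equations (pure Python with invented data, not the paper's dataset or code) looks like this:

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations
    (X^T X) c = X^T y, solved by Gaussian elimination."""
    n = degree + 1
    X = [[x ** j for j in range(n)] for x in xs]
    A = [[sum(X[k][i] * X[k][j] for k in range(len(xs))) for j in range(n)]
         for i in range(n)]
    b = [sum(X[k][i] * ys[k] for k in range(len(xs))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j]
                                for j in range(i + 1, n))) / A[i][i]
    return coeffs  # [c0, c1, ..., c_degree]
```

In practice price series would be fit with a library (e.g. scikit-learn for the SVM baseline), but the closed-form normal-equations solve above is the same underlying computation for the polynomial model.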


Scene Depth Estimation from Traditional Oriental Landscape Paintings

http://arxiv.org/abs/2403.03408v1

Compressor summary: The authors propose a novel framework for estimating scene depth from oriental landscape painting images, enabling 3D sculpture creation and improving accessibility for visually impaired people.


An EnKF-LSTM Assimilation Algorithm for Crop Growth Model

http://arxiv.org/abs/2403.03406v1

Compressor summary: The paper proposes an EnKF-LSTM data assimilation method for crop growth prediction that combines ensemble Kalman filter and LSTM neural network, improving accuracy by incorporating real-time data.
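The EnKF half of such a hybrid can be sketched in a few lines. This is a generic scalar ensemble Kalman analysis step with a perturbed-observation scheme and a direct observation operator; the function name and numbers are illustrative, not the paper's crop-model implementation:

```python
import random

def enkf_update(ensemble, y_obs, obs_noise_std, rng):
    """Scalar ensemble Kalman filter analysis step: nudge each ensemble
    member toward a perturbed observation, weighted by the Kalman gain."""
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    gain = var / (var + obs_noise_std ** 2)  # Kalman gain for a direct observation
    return [x + gain * (y_obs + rng.gauss(0.0, obs_noise_std) - x)
            for x in ensemble]
```

With an accurate observation (small `obs_noise_std`) the gain approaches 1 and the analysis ensemble collapses toward the measurement; in the paper's scheme the forecast step that this corrects would come from the LSTM-augmented crop model.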


Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation

http://arxiv.org/abs/2403.03405v1

Compressor summary: CausalVLN is a framework that uses causal learning to train robust navigators with unbiased feature representations, improving generalization across different environments.


BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving

http://arxiv.org/abs/2403.03401v1

Compressor summary: BAIT is a framework to compare learning methods in Interactive Theorem Proving, showing that Structure Aware Transformers perform well and leading to a novel end-to-end system.


Contrastive Learning of Person-independent Representations for Facial Action Unit Detection

http://arxiv.org/abs/2403.03400v1

Compressor summary: The paper proposes a method to learn facial action unit (AU) representations from unlabelled videos using contrastive learning, which improves AU detection performance and reduces data scarcity.
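The contrastive objective typically used in this setting is InfoNCE: pull an anchor toward its positive and away from negatives. A minimal pure-Python sketch (generic InfoNCE with cosine similarity, not the paper's exact loss) is:

```python
import math

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: negative log-softmax of the anchor-positive similarity
    against similarities to negatives (cosine similarity, temperature-scaled)."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    logits = [cos(anchor, positive) / temperature] + \
             [cos(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # log-sum-exp with max-shift for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```

The loss is near zero when the anchor matches its positive and large when a negative is more similar, which is what drives person-independent AU features apart from identity features.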


Japanese-English Sentence Translation Exercises Dataset for Automatic Grading

http://arxiv.org/abs/2403.03396v1

Compressor summary: The paper introduces a new task of grading sentence translation exercises in L2 language learning and creates a dataset for it, showing that existing models struggle to classify responses accurately.


Performance Evaluation of Semi-supervised Learning Frameworks for Multi-Class Weed Detection

http://arxiv.org/abs/2403.03390v1

Compressor summary: The paper proposes a semi-supervised learning framework for weed detection using object detection frameworks, which achieves high accuracy with less labeled data and promotes sustainable agriculture.


Adaptive Discovering and Merging for Incremental Novel Class Discovery

http://arxiv.org/abs/2403.03382v1

Compressor summary: ADM is a new paradigm for lifelong learning that adaptively discovers and merges novel classes without losing established knowledge, using Triple Comparison and Probability Regularization for category assignment and Adaptive Model Merging for knowledge integration.