arxiv compressed, 2024-02-02

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-02, generated by the compressor, my personal LLM-based project.


AToM: Amortized Text-to-Mesh using 2D Diffusion

http://arxiv.org/abs/2402.00867v1

Compressor summary: AToM is a fast text-to-mesh framework that generates high-quality 3D models from multiple text prompts simultaneously and generalizes well to unseen inputs.


We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline

http://arxiv.org/abs/2402.00868v1

Compressor summary: This paper compares Image-DAS and Video-DAS methods for semantic segmentation, finding that Image-DAS outperforms Video-DAS and that naively combining the two approaches yields no further improvement.


Towards Optimal Feature-Shaping Methods for Out-of-Distribution Detection

http://arxiv.org/abs/2402.00865v1

Compressor summary: Feature-shaping methods improve out-of-distribution detection by adjusting deep learning model features, and can be optimized using ID data only.
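
To make "feature shaping" concrete, here is a minimal Python sketch, not the paper's method: percentile clipping of penultimate-layer features (in the spirit of clipping-style shapers) followed by an energy OOD score. All names, shapes, and thresholds are illustrative assumptions.

```python
import numpy as np

def shape_features(feats, pct=90):
    # Clip activations at a percentile threshold estimated from ID data
    # only (a clipping-style shaper; the paper optimizes the shaping).
    thresh = np.percentile(feats, pct)
    return np.minimum(feats, thresh)

def energy_score(feats, W, b):
    # Energy OOD score: -logsumexp of the logits (lower = more ID-like).
    logits = feats @ W + b
    m = logits.max(axis=1, keepdims=True)
    return -(m.squeeze(1) + np.log(np.exp(logits - m).sum(axis=1)))

# Toy usage with random stand-ins for real features and classifier weights.
rng = np.random.default_rng(0)
feats = rng.gamma(2.0, 1.0, size=(100, 512))
W, b = rng.normal(size=(512, 10)), np.zeros(10)
scores = energy_score(shape_features(feats), W, b)
```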


ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields

http://arxiv.org/abs/2402.00864v1

Compressor summary: ViCA-NeRF is a method for 3D editing with text instructions that ensures multi-view consistency using depth information and latent code alignment.


Geometry Transfer for Stylizing Radiance Fields

http://arxiv.org/abs/2402.00863v1

Compressor summary: Geometry Transfer is a new method for 3D style transfer that uses depth maps to extract a style guide and apply geometric deformation to radiance fields, resulting in more expressive and accurate stylizations.


Evaluating Large Language Models for Generalization and Robustness via Data Compression

http://arxiv.org/abs/2402.00861v1

Compressor summary: The authors propose a lossless data compression evaluation method for large language models that tests their generalization and robustness using data from different sources split by training cutoff dates.
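
The compression view is easy to state concretely: a language model's negative log-likelihood in bits is the code length of an arithmetic-coded lossless compressor. A minimal sketch of that metric, assuming per-token natural-log probabilities from any LM scoring API (the paper's exact protocol may differ):

```python
import math

def bits_per_byte(token_logprobs, text):
    # Code length in bits under the model, normalized by the text's size
    # in bytes; lower means the model "compresses" this source better.
    nll_bits = -sum(token_logprobs) / math.log(2)
    return nll_bits / len(text.encode("utf-8"))

# Toy usage: these per-token log-probs would come from an LM.
print(bits_per_byte([-2.1, -0.3, -1.7, -0.9], "toy text"))
```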


Can Large Language Models Understand Context?

http://arxiv.org/abs/2402.00858v1

Compressor summary: This paper introduces a context understanding benchmark for Large Language Models (LLMs) and evaluates their performance in different scenarios, such as in-context learning and model quantization.


Early Time Classification with Accumulated Accuracy Gap Control

http://arxiv.org/abs/2402.00857v1

Compressor summary: The paper introduces a statistical framework to control the accuracy gap between full and early-time classification by using a data-driven stopping rule.
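
A toy sketch of the flavor of such a rule (the paper's actual procedure controls the accumulated accuracy gap with statistical guarantees; this illustrative version just picks a confidence threshold on a calibration set):

```python
import numpy as np

def calibrate_threshold(conf, early_correct, full_correct, alpha=0.01):
    # Smallest confidence threshold t such that halting whenever
    # confidence >= t keeps the empirical accuracy gap within alpha.
    for t in np.sort(np.unique(conf)):
        halt = conf >= t
        gap = full_correct[halt].mean() - early_correct[halt].mean()
        if gap <= alpha:
            return t
    return np.inf  # never halt early if no threshold qualifies

rng = np.random.default_rng(0)
conf = rng.random(1000)
full = (rng.random(1000) < 0.9).astype(float)  # full-model correctness (toy)
early = full * (rng.random(1000) < conf)       # early head worse at low conf
print(calibrate_threshold(conf, early, full))
```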


Towards Efficient and Exact Optimization of Language Model Alignment

http://arxiv.org/abs/2402.00856v1

Compressor summary: The paper proposes a new method for optimizing language models based on human preferences that avoids the drawbacks of previous methods and shows its effectiveness in experiments.


SymbolicAI: A framework for logic-based approaches combining generative models and solvers

http://arxiv.org/abs/2402.00854v1

Compressor summary: SymbolicAI is a framework that combines generative AI with symbolic reasoning using logic, enabling seamless integration of models, task execution, data manipulation, and evaluation of computational graphs.


LTAU-FF: Loss Trajectory Analysis for Uncertainty in Atomistic Force Fields

http://arxiv.org/abs/2402.00853v1

Compressor summary: LTAU is a fast and accurate uncertainty quantification method for deep learning models that uses cumulative distribution functions of per-sample errors together with a latent-space distance search.


Data Augmentation Scheme for Raman Spectra with Highly Correlated Annotations

http://arxiv.org/abs/2402.00851v1

Compressor summary: The authors propose a data augmentation technique for convolutional neural networks that predict cell densities and substrate and product concentrations in complex biological processes from Raman spectra, improving model performance and robustness.


Score-based Causal Representation Learning: Linear and General Transformations

http://arxiv.org/abs/2402.00849v1

Compressor summary: The paper proposes a score-based class of algorithms for causal representation learning under nonparametric latent causal models with unknown transformations, ensuring identifiability and achievability through stochastic hard or soft interventions.


BootsTAP: Bootstrapped Training for Tracking-Any-Point

http://arxiv.org/abs/2402.00847v1

Compressor summary: The authors improve a tracking-any-point model by using large amounts of unlabeled real-world data and achieve state-of-the-art results.


Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?

http://arxiv.org/abs/2402.00841v1

Compressor summary: The paper compares the performance of smaller, fine-tuned language models with larger, zero-shot ones on meeting summarization tasks and finds that FLAN-T5, a 780M parameter model, is a cost-efficient alternative for real-world deployment.


OLMo: Accelerating the Science of Language Models

http://arxiv.org/abs/2402.00838v1

Compressor summary: OLMo is an open language model with full transparency and access to its training data, architecture, and development, aiming to enable scientific study and innovation in NLP research.


ALISON: Fast and Effective Stylometric Authorship Obfuscation

http://arxiv.org/abs/2402.00835v1

Compressor summary: ALISON is a fast and effective authorship obfuscation method that uses unique stylometric features to protect privacy while preserving text semantics.


Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

http://arxiv.org/abs/2402.00827v1

Compressor summary: Emo-Avatar is a method to create high-quality, dynamic portrait videos with minimal customization and data requirements using deferred neural rendering and a two-stage pipeline.


SLIM: Skill Learning with Multiple Critics

http://arxiv.org/abs/2402.00823v1

Compressor summary: SLIM is a multi-critic actor-critic approach that improves latent-variable skill discovery for robotic manipulation by combining multiple reward functions and achieving better performance than existing methods in tabletop manipulation tasks.


Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments

http://arxiv.org/abs/2402.00816v1

Compressor summary: The paper extends a shielding technique called approximate model-based shielding (AMBS) to handle continuous state and action spaces, and introduces new penalties for improved stability.


Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI

http://arxiv.org/abs/2402.00809v1

Compressor summary: This paper argues that Bayesian deep learning can enhance deep learning capabilities in various settings and explores promising research directions for its future development.


Distilling Conditional Diffusion Models for Offline Reinforcement Learning through Trajectory Stitching

http://arxiv.org/abs/2402.00807v1

Compressor summary: The authors propose a method to reduce the size of deep generative models for offline reinforcement learning by using data augmentation and knowledge distillation, achieving similar or better results than existing methods on several benchmarks.


Signal Quality Auditing for Time-series Data

http://arxiv.org/abs/2402.00803v1

Compressor summary: The authors present an open-source software toolkit for assessing and improving the quality of time-series data used in AI applications like Predictive Maintenance, to prevent incorrect decisions due to hardware or software failures.


Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents

http://arxiv.org/abs/2402.00798v1

Compressor summary: The paper proposes a Formal-LLM framework that integrates natural language and formal language to create controllable AI agents for complex tasks, improving planning performance and validity.


LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law

http://arxiv.org/abs/2402.00795v1

Compressor summary: The paper investigates how large language models can predict dynamical systems' behavior without fine-tuning or prompt engineering, finding that their accuracy increases with context window length.


ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models

http://arxiv.org/abs/2402.00794v1

Compressor summary: ReAGent is a model-agnostic feature attribution method for generative language models that updates token importance in a recursive manner, providing more faithful importance distributions than existing methods.


Distinguishing the Indistinguishable: Human Expertise in Algorithmic Prediction

http://arxiv.org/abs/2402.00793v1

Compressor summary: The text introduces a framework for using human expertise to help algorithms make better predictions, especially on specific instances where human judgment is superior.


Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces

http://arxiv.org/abs/2402.00789v1

Compressor summary: Graph-Mamba is a new method that combines a state space model with node selection strategies to improve long-range context modeling in graph networks, achieving better performance and efficiency than existing methods.


CroissantLLM: A Truly Bilingual French-English Language Model

http://arxiv.org/abs/2402.00786v1

Compressor summary: CroissantLLM is a bilingual language model that trains on equal amounts of English and French data and performs well on various French tasks.


Dense Reward for Free in Reinforcement Learning from Human Feedback

http://arxiv.org/abs/2402.00782v1

Compressor summary: The authors propose a method to improve reinforcement learning for language models by using attention weights from the reward model to redistribute the reward, making the signal more informative and easier to optimize.
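
As a minimal illustration of the idea (not the authors' exact scheme), a scalar sequence reward can be spread over tokens in proportion to normalized attention weights taken from the reward model:

```python
import numpy as np

def redistribute_reward(seq_reward, attn_weights):
    # Per-token rewards proportional to attention mass; they sum back to
    # the original scalar reward, so the objective is preserved in total.
    w = np.asarray(attn_weights, dtype=float)
    return seq_reward * w / w.sum()

print(redistribute_reward(1.0, [0.1, 0.5, 0.2, 0.2]))
```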


AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning

http://arxiv.org/abs/2402.00769v1

Compressor summary: The paper proposes AnimateLCM, a fast and high-quality video generation model that decouples consistency learning for images and motions, and adapts existing adapters for various functions.


360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming

http://arxiv.org/abs/2402.00763v1

Compressor summary: 360-GS is a novel technique for panoramic rendering that addresses challenges in 3D Gaussian splatting by projecting Gaussians onto the tangent plane of the unit sphere and using layout priors to guide optimization.


Control-Theoretic Techniques for Online Adaptation of Deep Neural Networks in Dynamical Systems

http://arxiv.org/abs/2402.00761v1

Compressor summary: The authors propose control theory methods to update deep neural networks online, providing stability and transfer learning guarantees for applications such as controls.


Building Expressive and Tractable Probabilistic Generative Models: A Review

http://arxiv.org/abs/2402.00759v1

Compressor summary: The text surveys the progress and methods in tractable probabilistic generative modeling using Probabilistic Circuits (PCs), describing their trade-offs, design principles, algorithmic extensions, and challenges for deep and hybrid PCs.


GS++: Error Analyzing and Optimal Gaussian Splatting

http://arxiv.org/abs/2402.00752v1

Compressor summary: This paper proposes and validates an optimal projection strategy for 3D Gaussian Splatting that reduces artifacts and improves photo-realistic rendering quality.


Unlearnable Algorithms for In-context Learning

http://arxiv.org/abs/2402.00751v1

Compressor summary: The paper proposes ERASE, an efficient algorithm for exact unlearning of task adaptation data using few-shot examples with in-context learning for large language models.


Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model

http://arxiv.org/abs/2402.00746v1

Compressor summary: Health-LLM is a framework that combines large-scale feature extraction with medical-knowledge trade-off scoring to improve intelligent healthcare, integrating health reports, adjusting feature weights based on expertise, and enhancing language models with semi-automated feature engineering.


Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement

http://arxiv.org/abs/2402.00745v1

Compressor summary: The paper proposes Logic-Explainer, a hybrid neuro-symbolic framework that improves ethical NLI explanations by integrating LLMs with an external backward-chaining solver to refine, verify, and support their reasoning.


Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data

http://arxiv.org/abs/2402.00743v1

Compressor summary: The paper explores how transformers can learn from unstructured data in linear regression tasks, identifying components that facilitate in-context learning.


Transforming and Combining Rewards for Aligning Large Language Models

http://arxiv.org/abs/2402.00742v1

Compressor summary: The paper explores how to optimize language models based on human preferences using reward models, and proposes a transformed reward model that improves performance and allows combining multiple properties.


DRSM: efficient neural 4d decomposition for dynamic reconstruction in stationary monocular cameras

http://arxiv.org/abs/2402.00740v1

Compressor summary: The method uses neural rendering to decompose 4D scenes from stationary monocular cameras into static and dynamic features, overcoming challenges such as occlusion and limited 3D cues, and achieves higher-fidelity results than existing single-view dynamic scene representations.


FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game

http://arxiv.org/abs/2402.00738v1

Compressor summary: The paper introduces FM3Q, a reinforcement learning method for two-team zero-sum Markov games (2t0sMGs) that uses the individual-global-minimax principle to keep minimax and greedy behaviors coherent, factorizes the joint minimax Q function into individual ones and solves them iteratively, and is shown to converge and outperform alternatives both theoretically and empirically.


MobilityDL: A Review of Deep Learning From Trajectory Data

http://arxiv.org/abs/2402.00732v1

Compressor summary: The paper reviews deep learning methods for trajectory data, focusing on eight mobility use cases and analyzing their performance along the mobility data continuum.


Dropout-Based Rashomon Set Exploration for Efficient Predictive Multiplicity Estimation

http://arxiv.org/abs/2402.00728v1

Compressor summary: The text discusses the problem of multiple competing models in classification tasks that can lead to unfairness and presents a novel framework using dropout techniques to measure and mitigate this issue.


Automatic Segmentation of the Spinal Cord Nerve Rootlets

http://arxiv.org/abs/2402.00724v1

Compressor summary: The study presents an automatic method for segmenting spinal nerve rootlets from T2-weighted MRI scans using a 3D convolutional neural network, achieving good performance and low variability across different MRI vendors, sites, and sessions.


Improving Semantic Control in Discrete Latent Spaces with Transformer Quantized Variational Autoencoders

http://arxiv.org/abs/2402.00723v1

Compressor summary: T5VQVAE is a novel model that combines VQVAEs with T5 to improve semantic control and generation in NLP tasks.


Intent Assurance using LLMs guided by Intent Drift

http://arxiv.org/abs/2402.00715v1

Compressor summary: The paper proposes an assurance framework for intent-based networking that uses AI policies from large language models to detect and fix intent drift.


ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction

http://arxiv.org/abs/2402.00712v1

Compressor summary: ChaosBench is a new physics-based benchmark to evaluate subseasonal-to-seasonal climate prediction models, which shows existing methods struggle with this challenging task.


Explaining Text Classifiers with Counterfactual Representations

http://arxiv.org/abs/2402.00711v1

Compressor summary: The paper proposes a method to generate counterfactuals for text classification by intervening in text representations, which overcomes the limitations of using plausible real-world events for texts.


Non-Exchangeable Conformal Language Generation with Nearest Neighbors

http://arxiv.org/abs/2402.00707v1

Compressor summary: The paper proposes a new method for generating text with statistical guarantees using non-exchangeable conformal prediction and nearest neighbors.
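
The core object in non-exchangeable conformal prediction is a weighted quantile of calibration nonconformity scores, with weights reflecting how relevant each calibration point is (here, think: similarity of retrieved nearest neighbors). A rough sketch under that reading:

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, alpha=0.1):
    # Weighted (1 - alpha)-quantile of calibration scores; the +1 in the
    # normalization accounts for the test point's own weight.
    order = np.argsort(scores)
    s = np.asarray(scores, float)[order]
    w = np.asarray(weights, float)[order]
    cum = np.cumsum(w / (w.sum() + 1.0))
    idx = np.searchsorted(cum, 1.0 - alpha)
    # Falls back to the largest score when coverage cannot be certified
    # (the full method would return an infinite threshold there).
    return s[min(idx, len(s) - 1)]
```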


Combining the Strengths of Dutch Survey and Register Data in a Data Challenge to Predict Fertility (PreFer)

http://arxiv.org/abs/2402.00705v1

Compressor summary: The paper introduces two datasets, one with longitudinal survey data and the other with register data, to study and compare the predictability of fertility outcomes in the Netherlands, and announces a data challenge called PreFer starting in Spring 2024.


In-Bed Pose Estimation: A Review

http://arxiv.org/abs/2402.00700v1

Compressor summary: The text discusses human pose estimation in-bed monitoring applications, comparing unimodal and multimodal methods, and reviewing existing datasets and approaches to highlight limitations, challenges, and future directions.


Approximating Optimal Morphing Attacks using Template Inversion

http://arxiv.org/abs/2402.00695v1

Compressor summary: The paper proposes a new method to generate realistic face morphing attacks using face recognition models' embeddings and shows its effectiveness against various face recognition systems.


A Framework for Building Point Cloud Cleaning, Plane Detection and Semantic Segmentation

http://arxiv.org/abs/2402.00692v1

Compressor summary: The paper presents a framework for point cloud cleaning, plane detection, and semantic segmentation that combines adaptive thresholding, RANSAC, and PointNet-based deep learning, improving the accuracy and efficiency of building modeling.


Exploring Homogeneous and Heterogeneous Consistent Label Associations for Unsupervised Visible-Infrared Person ReID

http://arxiv.org/abs/2402.00672v1

Compressor summary: The paper proposes a new method for person re-identification across different modalities (visible and infrared) that uses a Modality-Unified Label Transfer module and an Online Cross-memory Label Refinement module to improve cross-modality label associations and representation learning.


Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning

http://arxiv.org/abs/2402.00667v1

Compressor summary: The paper explores two phases of superalignment under the W2SG framework to enhance weak supervision and ensure consistent AI behavior with human values and intentions.


Modeling Freight Mode Choice Using Machine Learning Classifiers: A Comparative Study Using the Commodity Flow Survey (CFS) Data

http://arxiv.org/abs/2402.00659v1

Compressor summary: The study compares machine learning classifiers and finds that Random Forest is the best at predicting freight mode choice based on shipment characteristics.


Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

http://arxiv.org/abs/2402.00658v1

Compressor summary: This paper proposes a method to improve LLMs' reasoning by learning from ranked trajectories and synthesized process rewards, achieving better results on logical reasoning benchmarks.


Improving the accuracy of freight mode choice models: A case study using the 2017 CFS PUF data set and ensemble learning techniques

http://arxiv.org/abs/2402.00654v1

Compressor summary: The study uses 2017 commodity flow data to build a high-performance freight mode choice model that improves accuracy by constructing local models, extracting geographical features, and applying ensemble learning methods.


Random Forest-Based Prediction of Stroke Outcome

http://arxiv.org/abs/2402.00638v1

Compressor summary: The authors use machine learning to create a predictive model for the long-term outcomes of stroke patients based on various factors.


Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle Perception in Bird's-Eye-View

http://arxiv.org/abs/2402.00637v1

Compressor summary: The text proposes a novel multimodal fusion model that combines fisheye cameras and ultrasonic sensors for efficient obstacle perception in autonomous driving, especially under challenging conditions like low-light or glare.


Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases

http://arxiv.org/abs/2402.00632v1

Compressor summary: This paper shows how direct speech-to-text translation systems can use prosody (tone of speech) to better translate Korean to English, outperforming traditional cascade systems.


CapHuman: Capture Your Moments in Parallel Universes

http://arxiv.org/abs/2402.00627v1

Compressor summary: The CapHuman framework generates realistic human portraits from a single reference photo using identity preservation, 3D facial prior, and text-to-image diffusion models.


Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

http://arxiv.org/abs/2402.00626v1

Compressor summary: Typographic attacks threaten large vision-language models (LVLMs), and the paper introduces a new benchmark and an effective self-generated attack method using GPT-4V.


Actor Identification in Discourse: A Challenge for LLMs?

http://arxiv.org/abs/2402.00620v1

Compressor summary: The text discusses challenges and methods for identifying political actors who make claims in public debates, highlighting the limitations of large language models and suggesting a hybrid approach.


Deep Clustering Using the Soft Silhouette Score: Towards Compact and Well-Separated Clusters

http://arxiv.org/abs/2402.00608v1

Compressor summary: The paper introduces the soft silhouette score, a probabilistic approach that improves deep clustering by encouraging compact and well-separated clusters, and presents an autoencoder-based architecture for optimizing it.
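
A rough sketch of what a probabilistic silhouette can look like, with cluster distances weighted by soft assignments (my reading of the idea, not the paper's exact formulation):

```python
import numpy as np

def soft_silhouette(X, P):
    # X: (n, d) points; P: (n, k) soft cluster assignment probabilities.
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise dists
    d_to_cluster = D @ P / P.sum(axis=0, keepdims=True)   # expected dists
    own = P.argmax(axis=1)
    a = d_to_cluster[np.arange(len(X)), own]              # own-cluster dist
    d_other = d_to_cluster.copy()
    d_other[np.arange(len(X)), own] = np.inf
    b = d_other.min(axis=1)                               # nearest other
    return np.mean((b - a) / np.maximum(a, b))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, .1, (20, 2)), rng.normal(3, .1, (20, 2))])
P = np.vstack([np.tile([.9, .1], (20, 1)), np.tile([.1, .9], (20, 1))])
print(soft_silhouette(X, P))  # high for well-separated clusters
```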


Are Synthetic Time-series Data Really not as Good as Real Data?

http://arxiv.org/abs/2402.00607v1

Compressor summary: InfoBoost is a framework that synthesizes cross-domain time-series data with representation learning, addressing quality, bias, and generalization problems by enabling model training without real data and universal feature extraction that overcomes interference and sampling-window limitations.


Dynamic Texture Transfer using PatchMatch and Transformers

http://arxiv.org/abs/2402.00606v1

Compressor summary: The paper proposes a model that combines PatchMatch and Transformers to transfer dynamic textures from one video to another by synthesizing the start frame, predicting patches, and merging them smoothly.


Uncertainty-Aware Partial-Label Learning

http://arxiv.org/abs/2402.00592v1

Compressor summary: The article introduces a new partial-label-learning algorithm based on Dempster-Shafer theory that produces well-calibrated uncertainty estimates and performs competitively in real-world applications.


Sandra -- A Neuro-Symbolic Reasoner Based On Descriptions And Situations

http://arxiv.org/abs/2402.00591v1

Compressor summary: Sandra is a neuro-symbolic reasoner that combines vectorial representations with deductive reasoning using the Description and Situation ontology design pattern, achieving better performance and interpretability than baselines without increasing complexity.


Diffusion-based Light Field Synthesis

http://arxiv.org/abs/2402.00575v1

Compressor summary: LFdiff is a diffusion-based framework that synthesizes light fields from single RGB images using disparity estimation, position-aware warping, and disentanglement-based noise estimation.


A Single Graph Convolution Is All You Need: Efficient Grayscale Image Classification

http://arxiv.org/abs/2402.00564v1

Compressor summary: The paper presents a fast and accurate grayscale image classification approach that pairs lightweight MLPs with a single graph convolutional layer to reduce problem complexity, together with an optimized FPGA accelerator that achieves low latency and competitive or leading performance on benchmark datasets.


A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

http://arxiv.org/abs/2402.00559v1

Compressor summary: Reveal is a new dataset for evaluating automatic verifiers of complex reasoning steps in open-domain question answering tasks.


Masked Conditional Diffusion Model for Enhancing Deepfake Detection

http://arxiv.org/abs/2402.00541v1

Compressor summary: The paper proposes a new data augmentation technique, Masked Conditional Diffusion Model (MCDM), to generate diverse and realistic forged faces that enhance deepfake detection models' robustness and generalizability.


A Manifold Representation of the Key in Vision Transformers

http://arxiv.org/abs/2402.00534v1

Compressor summary: The paper proposes disentangling and manifold-structured keys in vision transformers, which improves their accuracy on various tasks.


Preconditioning for Physics-Informed Neural Networks

http://arxiv.org/abs/2402.00531v1

Compressor summary: The paper proposes the condition number, a measure of sensitivity and stability, as a metric to diagnose and mitigate the convergence and prediction issues of physics-informed neural networks (PINNs), and shows that a preconditioning algorithm improves it and reduces errors across 18 PDE problems.


Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

http://arxiv.org/abs/2402.00530v1

Compressor summary: Superfiltering uses a smaller model to select data for finetuning a larger model, improving instruction tuning efficiency and performance.
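
A minimal sketch of the weak-to-strong filtering loop, assuming a perplexity-ratio style difficulty score computed with a small scorer model (helper names are hypothetical; the paper's exact score may differ):

```python
import math

def difficulty_score(lp_answer_given_inst, lp_answer):
    # How much harder the answer is with the instruction than without,
    # from per-token natural-log probabilities under a *small* model.
    ppl_cond = math.exp(-sum(lp_answer_given_inst) / len(lp_answer_given_inst))
    ppl_uncond = math.exp(-sum(lp_answer) / len(lp_answer))
    return ppl_cond / ppl_uncond

def superfilter(examples, scores, keep_frac=0.1):
    # Keep the hardest fraction of examples for finetuning the large model.
    ranked = sorted(zip(scores, examples), key=lambda p: -p[0])
    return [ex for _, ex in ranked[: max(1, int(len(ranked) * keep_frac))]]
```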


Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling

http://arxiv.org/abs/2402.00522v1

Compressor summary: The paper studies how different parts of the Transformer model affect its ability to handle long and complex sequences, and identifies key parameters that influence its performance.


EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models

http://arxiv.org/abs/2402.00518v1

Compressor summary: EE-Tuning is a method to improve large language models by adding tuned early-exit layers that require less resources and data, and it's released as open-source code.


EXMOS: Explanatory Model Steering Through Multifaceted Explanations and Data Configurations

http://arxiv.org/abs/2402.00491v1

Compressor summary: The text discusses how different types of explanations in interactive machine-learning systems help healthcare experts improve models by configuring data, and suggests that a hybrid fusion of both global model-centric and data-centric explanations is the most effective approach.


A Personalized Framework for Consumer and Producer Group Fairness Optimization in Recommender Systems

http://arxiv.org/abs/2402.00485v1

Compressor summary: The paper proposes CP-FairRank, a re-ranking algorithm for recommender systems that jointly considers fairness on the consumer and producer sides, taking group segmentation, the underlying recommendation model, and the domain into account so it can adapt to various settings.


Bias Mitigating Few-Shot Class-Incremental Learning

http://arxiv.org/abs/2402.00481v1

Compressor summary: The paper proposes a novel method to address the accuracy imbalance issue in few-shot class-incremental learning by stimulating mapping ability, using dual-feature classification, and self-optimizing classifiers.


SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models

http://arxiv.org/abs/2402.00474v1

Compressor summary: SA-MDKIF is a framework that enhances general-purpose language models with medical knowledge through instruction tuning and skill adaptation, improving performance on various medical tasks.


RadDQN: a Deep Q Learning-based Architecture for Finding Time-efficient Minimum Radiation Exposure Pathway

http://arxiv.org/abs/2402.00468v1

Compressor summary: The article introduces RadDQN, a deep Q-learning based architecture that optimizes radiation exposure for autonomous UAVs using a radiation-aware reward function and exploration strategies.


Instruction Makes a Difference

http://arxiv.org/abs/2402.00453v1

Compressor summary: The iDocVQA dataset and LLaDoc model improve instruction-following in document analysis tasks, but still fall far short of human performance.


CPT: Competence-progressive Training Strategy for Few-shot Node Classification

http://arxiv.org/abs/2402.00450v1

Compressor summary: CPT is a novel curriculum learning method that adapts to task difficulty and improves few-shot node classification by GNNs.


Dual-Student Knowledge Distillation Networks for Unsupervised Anomaly Detection

http://arxiv.org/abs/2402.00448v1

Compressor summary: The paper proposes a dual-student knowledge distillation (DSKD) architecture for unsupervised anomaly detection, which uses two inverted student networks to improve normal data consistency and anomaly representation.


A Survey of Data-Efficient Graph Learning

http://arxiv.org/abs/2402.00447v1

Compressor summary: The paper surveys data-efficient graph learning approaches that address the challenge of limited labeled data in graph neural networks.


Improving Dialog Safety using Socially Aware Contrastive Learning

http://arxiv.org/abs/2402.00446v1

Compressor summary: The paper proposes a dual-step fine-tuning process to teach conversational AI systems to produce safe and prosocial content in both adversarial and casual contexts using n-pair contrastive loss and datasets like MIC and ProsocialDialog.


Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

http://arxiv.org/abs/2402.00433v1

Compressor summary: The paper proposes a dynamic method to merge Transformer models for concurrent tasks using a weight-ensembling mixture of experts module, which adapts to each instance and reduces parameter interference.
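
A stripped-down sketch of weight-ensembling on task vectors (finetuned-minus-base weight deltas), with per-input mixing coefficients from a router; all names are illustrative:

```python
import numpy as np

def merged_weights(base, task_vectors, router_logits):
    # Softmax the router's logits into mixing coefficients, then add the
    # resulting mixture of task vectors to the shared base weights.
    c = np.exp(router_logits - router_logits.max())
    c /= c.sum()
    return base + sum(ci * tv for ci, tv in zip(c, task_vectors))

base = np.zeros(4)
tvs = [np.ones(4), -np.ones(4)]
print(merged_weights(base, tvs, np.array([2.0, 0.0])))  # leans to task 1
```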


Lightweight Pixel Difference Networks for Efficient Visual Representation Learning

http://arxiv.org/abs/2402.00422v1

Compressor summary: The paper proposes new types of convolutions, Pixel Difference Convolution and Binary PDC, that improve accuracy and efficiency in lightweight Deep Neural Networks for visual tasks like edge detection and object recognition.
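
A naive NumPy sketch of the central pixel-difference idea: the kernel is applied to differences between each neighbor and the window's center pixel rather than to raw intensities (illustrative only; the paper's PDC variants and binarization are more involved):

```python
import numpy as np

def central_pdc(img, kernel):
    # Convolve pixel differences (neighbor minus window center) with the
    # kernel, instead of convolving raw intensities.
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            win = padded[y : y + kh, x : x + kw]
            out[y, x] = (kernel * (win - win[ph, pw])).sum()
    return out

img = np.arange(25.0).reshape(5, 5)
sobel_like = np.array([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
print(central_pdc(img, sobel_like))
```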


From PARIS to LE-PARIS: Toward Patent Response Automation with Recommender Systems and Collaborative Large Language Models

http://arxiv.org/abs/2402.00421v1

Compressor summary: The text introduces two AI systems, PARIS and LE-PARIS, designed to help patent attorneys respond more efficiently and effectively to Office Actions, and validates their effectiveness through multiple studies.


Prompt-Time Symbolic Knowledge Capture with Large Language Models

http://arxiv.org/abs/2402.00414v1

Compressor summary: The paper explores how to use large language models to generate knowledge graphs from text prompts, using three methods and a synthetic dataset.


InfMAE: A Foundation Model in Infrared Modality

http://arxiv.org/abs/2402.00407v1

Compressor summary: This paper introduces InfMAE, a foundation model for infrared images, with a new dataset (Inf30), an information-aware masking strategy, and a multi-scale encoder and decoder to improve performance in downstream tasks.


Investigating Bias Representations in Llama 2 Chat via Activation Steering

http://arxiv.org/abs/2402.00402v1

Compressor summary: The paper investigates and tries to reduce gender, race, and religion biases in the Llama 2 7B Chat model using activation steering.


Multi-scale Traffic Pattern Bank for Cross-city Few-shot Traffic Forecasting

http://arxiv.org/abs/2402.00397v1

Compressor summary: MTPB is a framework for cross-city few-shot traffic forecasting, which is challenging because many cities lack data; it leverages similarities across diverse cities through pre-training, clustering, and meta-knowledge aggregation, and outperforms existing methods.


Efficient Exploration for LLMs

http://arxiv.org/abs/2402.00396v1

Compressor summary: Efficient exploration helps improve large language models using human feedback, and double Thompson sampling with epistemic neural networks performs best.
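
A toy sketch of double Thompson sampling, with a plain ensemble standing in for an epistemic neural network: two reward models are sampled, each nominates its favorite response, and the pair is sent out for preference feedback (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def double_thompson_pick(candidates, ensemble):
    # Sample two reward models from the ensemble; each picks its favorite
    # candidate, and the resulting pair is queried for preference feedback.
    i, j = rng.choice(len(ensemble), size=2, replace=True)
    return max(candidates, key=ensemble[i]), max(candidates, key=ensemble[j])

# Toy usage: reward "models" are plain scoring functions here.
ensemble = [lambda x, w=w: w * len(x) for w in rng.normal(size=8)]
print(double_thompson_pick(["a", "bb", "ccc"], ensemble))
```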


Cumulative Distribution Function based General Temporal Point Processes

http://arxiv.org/abs/2402.00388v1

Compressor summary: The CuFun model is a novel deep temporal point process model that uses a cumulative distribution function and a monotonic neural network to capture complex behavioral patterns in event sequences, improving prediction accuracy and adaptability.
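
A tiny sketch of how a monotonic network can parameterize a CDF over the next inter-event time: softplus-constrained weights keep the output nondecreasing in t (my illustration of the general construction, not CuFun's architecture):

```python
import numpy as np

def monotone_cdf(t, W1, b1, w2, b2):
    # Nonnegative (softplus) weights on t and on the hidden layer keep
    # F(t) nondecreasing; the sigmoid squashes it into [0, 1]. Note this
    # sketch does not force F(0) = 0 or F(inf) = 1.
    softplus = lambda z: np.logaddexp(0.0, z)
    h = np.tanh(softplus(W1) * t + b1)
    z = softplus(w2) @ h + b2
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1, w2 = rng.normal(size=8), rng.normal(size=8), rng.normal(size=8)
print([round(monotone_cdf(t, W1, b1, w2, -1.0), 3) for t in (0.1, 1.0, 10.0)])
```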


Computational Morphology and Lexicography Modeling of Modern Standard Arabic Nominals

http://arxiv.org/abs/2402.00385v1

Compressor summary: The paper presents a new model for Arabic nominals that handles their complex morphology and irregular paradigms, and shows improved performance over existing tools.


What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection

http://arxiv.org/abs/2402.00371v1

Compressor summary: This paper explores how large language models can be used to improve and evade social media bot detection, showing their potential for both applications but also risks.


Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration

http://arxiv.org/abs/2402.00367v1

Compressor summary: The paper proposes two novel approaches to identify and address knowledge gaps in large language models using model collaboration, improving the accuracy of abstaining from answering questions when unsure.


Safety of Multimodal Large Language Models on Images and Text

http://arxiv.org/abs/2402.00357v1

Compressor summary: This paper surveys current work on evaluating, attacking, and defending the safety of Multimodal Large Language Models (MLLMs) in image and text domains, highlighting open challenges.


Adaptive Primal-Dual Method for Safe Reinforcement Learning

http://arxiv.org/abs/2402.00355v1

Compressor summary: Adaptive primal-dual methods optimize policy in safe reinforcement learning by adjusting learning rates based on Lagrangian multipliers, improving convergence and stability.
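
The primal-dual mechanics are simple to sketch: projected dual ascent on the Lagrange multiplier, raising it when the cost constraint is violated. The adaptive step size below (shrinking as the multiplier grows) is a stand-in for the paper's rule, not its exact schedule:

```python
def adaptive_dual_update(lmbda, cost_estimate, budget, base_lr=0.01):
    # Dual ascent with projection onto lambda >= 0; the step size decays
    # with the multiplier (an illustrative adaptive rule).
    lr = base_lr / (1.0 + lmbda)
    return max(0.0, lmbda + lr * (cost_estimate - budget))

lmbda = 0.0
for cost in (1.5, 1.2, 0.9, 0.8):  # toy per-iteration cost estimates
    lmbda = adaptive_dual_update(lmbda, cost, budget=1.0)
    print(round(lmbda, 4))
```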


High-Quality Medical Image Generation from Free-hand Sketch

http://arxiv.org/abs/2402.00353v1

Compressor summary: The paper introduces Sketch2MedI, a model that can generate realistic medical images from free-hand sketches by encoding them into StyleGAN's latent space, outperforming other models in this task.


Machine Unlearning for Image-to-Image Generative Models

http://arxiv.org/abs/2402.00351v1

Compressor summary: This paper introduces a framework and algorithm for machine unlearning of image-to-image generative models, ensuring data privacy while maintaining performance.


ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update

http://arxiv.org/abs/2402.00348v1

Compressor summary: The paper proposes O-DICE, a new method for offline RL and IL that uses orthogonal-gradient updates to improve the performance of DICE-based methods by resolving gradient conflicts and imposing state-action-level constraints in a corrected way.


Diverse Explanations from Data-driven and Domain-driven Perspectives for Machine Learning Models

http://arxiv.org/abs/2402.00347v1

Compressor summary: This paper highlights the challenges of explaining machine learning models in scientific domains and proposes a method to find accurate models with consistent explanations that meet stakeholders' needs and reinforce physical laws.


IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with Fine-Grained Bias Indicators

http://arxiv.org/abs/2402.00345v1

Compressor summary: The study introduces IndiVec, a general media bias detection framework that uses large language models and vector databases to adapt and excel in detecting biases across diverse datasets.


Recasting Regional Lighting for Shadow Removal

http://arxiv.org/abs/2402.00341v1

Compressor summary: The method removes shadows by estimating local lighting and restoring textures conditioned on the corrected illumination, achieving better results than previous methods.


Comparing Spectral Bias and Robustness For Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features

http://arxiv.org/abs/2402.00332v1

Compressor summary: The paper compares two training algorithms for two-layer neural networks and shows that ARFF has less spectral bias and similar robustness to SGD.


PirateNets: Physics-informed Deep Learning with Residual Adaptive Networks

http://arxiv.org/abs/2402.00326v1

Compressor summary: PirateNets is a novel deep learning framework for physics-informed neural networks that uses an adaptive residual connection to enable efficient and stable training of deeper models, achieving state-of-the-art results.


A Consistent Lebesgue Measure for Multi-label Learning

http://arxiv.org/abs/2402.00324v1

Compressor summary: The paper proposes a new multi-label learning method that can handle non-differentiable loss functions and proves its consistency, achieving state-of-the-art results without complex features.


Bias in Opinion Summarisation from Pre-training to Adaptation: A Case Study in Political Bias

http://arxiv.org/abs/2402.00322v1

Compressor summary: The study investigates bias in abstractive opinion summarisation models and suggests that fine-tuning with diverse topics reduces bias.


SmartCooper: Vehicular Collaborative Perception with Adaptive Fusion and Judger Mechanism

http://arxiv.org/abs/2402.00321v1

Compressor summary: SmartCooper is an adaptive framework for collaborative perception in autonomous vehicles that optimizes communication, compression, and filters detrimental data to improve road safety and efficiency.


SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling

http://arxiv.org/abs/2402.00319v1

Compressor summary: The paper presents SCO-VIST, a framework that generates coherent and engaging stories from image sequences using object relations and social interaction knowledge, outperforming existing methods in multiple metrics.


Online Distribution Learning with Local Private Constraints

http://arxiv.org/abs/2402.00315v1

Compressor summary: The paper studies online estimation of distribution-valued functions with unbounded label sets under local differential privacy and shows a different growth rate of KL-risk compared to bounded label sets.


Control in Stochastic Environment with Delays: A Model-based Reinforcement Learning Approach

http://arxiv.org/abs/2402.00313v1

Compressor summary: The paper presents a new reinforcement learning method using stochastic planning that incorporates risk preference and works well for control problems with delayed feedback, including Atari games.


An Accurate and Low-Parameter Machine Learning Architecture for Next Location Prediction

http://arxiv.org/abs/2402.00306v1

Compressor summary: The paper proposes an energy-efficient machine learning architecture that predicts users' next locations with high accuracy, low parameters, small size, and fast training time.


Self-supervised learning of video representations from a child's perspective

http://arxiv.org/abs/2402.00300v1

Compressor summary: The study shows that self-supervised video models can effectively learn action concepts and object representations from children's egocentric visual experience, suggesting that temporal aspects of a child's internal model of the world may be learned using generic learning algorithms.


Comparative Evaluation of Traditional and Deep Learning-Based Segmentation Methods for Spoil Pile Delineation Using UAV Images

http://arxiv.org/abs/2402.00295v1

Compressor summary: This paper evaluates various segmentation methods for image-based analysis of spoil piles in mining using remotely acquired data and finds that a morphology-based deep learning approach outperforms other techniques.


FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation

http://arxiv.org/abs/2402.00293v1

Compressor summary: FineBio is a new dataset of videos showing people performing biological experiments with detailed annotations for activity understanding and hand-object interaction recognition.


Multimodal Embodied Interactive Agent for Cafe Scene

http://arxiv.org/abs/2402.00290v1

Compressor summary: MEIA is a multimodal agent that can translate natural language tasks into actions using a memory module to integrate visual-language information and large models.


Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues

http://arxiv.org/abs/2402.00281v1

Compressor summary: This paper proposes a method to train deep facial expression recognition models with spatial action units cues, making them more interpretable without sacrificing accuracy or requiring extra annotations.


A Crucial Parameter for Rank-Frequency Relation in Natural Languages

http://arxiv.org/abs/2402.00271v1

Compressor summary: The text proposes a more accurate model for word frequency in natural languages, showing that the parameter γ measures vocabulary growth resistance, and introduces a method to estimate it using a "zeroth word".


Does DetectGPT Fully Utilize Perturbation? Selective Perturbation on Model-Based Contrastive Learning Detector would be Better

http://arxiv.org/abs/2402.00263v1

Compressor summary: Addressing concerns about abuse of large language models, the paper proposes a new detector that improves on DetectGPT, a zero-shot detector based on random perturbation and logit regression, by using a selective perturbation strategy and multi-pair contrastive learning to capture implicit patterns and reduce noise, outperforming the state of the art by 1.20% on average across four datasets.


Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective

http://arxiv.org/abs/2402.00262v1

Compressor summary: The text discusses integrating Large Language Models into Agent-based Modeling to enhance anthropomorphism in complex systems simulations, but highlights the need for explainability and causal analysis in social sciences.


Understanding Neural Network Systems for Image Analysis using Vector Spaces and Inverse Maps

http://arxiv.org/abs/2402.00261v1

Compressor summary: The paper presents Linear Algebra techniques to model neural network layers, visualize weight spaces and kernels, and find inputs for invertible networks.


Multi-group Learning for Hierarchical Groups

http://arxiv.org/abs/2402.00258v1

Compressor summary: The text describes a multi-group learning model for hierarchical data and presents an algorithm that produces interpretable decision trees with good generalization.


Vertical Symbolic Regression via Deep Policy Gradient

http://arxiv.org/abs/2402.00254v1

Compressor summary: VSR-DPG combines vertical symbolic regression with deep policy gradient to discover ground-truth equations involving multiple input variables more effectively than previous methods.


A Survey on Hallucination in Large Vision-Language Models

http://arxiv.org/abs/2402.00253v1

Compressor summary: The paper surveys the challenges, evaluation, causes, and mitigation of hallucination in Large Vision-Language Models (LVLMs).


Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning

http://arxiv.org/abs/2402.00251v1

Compressor summary: The paper presents a method for uncertainty estimation in LLMs and a decision-making agent design that allows efficient use of black-box proprietary LLMs in AI agent development.


LRDif: Diffusion Models for Under-Display Camera Emotion Recognition

http://arxiv.org/abs/2402.00250v1

Compressor summary: LRDif is a novel framework that uses diffusion models and transformers to recognize facial expressions from under-display camera images, overcoming their image degradation challenges.