arxiv compressed, 2024-03-04

This page contains one-sentence summaries of the cs.AI/ML/CV/CL papers announced on 2024-03-04, generated by the compressor, my personal LLM-based project.


DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

http://arxiv.org/abs/2402.19481v1

Compressor summary: DistriFusion is a method that speeds up image synthesis with diffusion models by using parallelism, asynchronous communication, and context reuse across multiple GPUs without sacrificing quality.


Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

http://arxiv.org/abs/2402.19479v1

Compressor summary: The authors propose an automatic approach to create a large video dataset with high-quality captions using multimodal inputs and cross-modality teacher models, and show its effectiveness on three downstream tasks.


Learning a Generalized Physical Face Model From Data

http://arxiv.org/abs/2402.19477v1

Compressor summary: The paper presents a simulation-free method for learning a generalized physical face model from 3D face data, enabling easy fitting to any identity and realistic physics-based facial animation with minimal artist input and network training.


The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

http://arxiv.org/abs/2402.19474v1

Compressor summary: The All-Seeing Project V2 introduces a new model and dataset for understanding object relations in images, improving relation comprehension in multi-modal large language models.


Retrieval-Augmented Generation for AI-Generated Content: A Survey

http://arxiv.org/abs/2402.19473v1

Compressor summary: The paper reviews retrieval-augmented generation (RAG), a technique that enhances artificial intelligence generated content (AIGC) by integrating information retrieval to improve accuracy and robustness, and surveys its applications, benchmarks, and future research directions.


Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress

http://arxiv.org/abs/2402.19472v1

Compressor summary: Lifelong benchmarks mitigate overfitting in machine learning by continually expanding test sets and using the Sort & Search framework to evaluate models efficiently.


Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling

http://arxiv.org/abs/2402.19471v1

Compressor summary: The study uses a game-based task to investigate how people ask questions with limited resources and compares language models' abilities to generate informative questions that mirror human performance.


Teaching Large Language Models an Unseen Language on the Fly

http://arxiv.org/abs/2402.19167v1

Compressor summary: We propose DiPMT++, a framework that teaches large language models to translate new languages using only a dictionary and 5K parallel sentences, improving translation quality for Zhuang and enabling human-assisted translation for unseen languages.


MemoNav: Working Memory Model for Visual Navigation

http://arxiv.org/abs/2402.19161v1

Compressor summary: MemoNav is a memory model for image-goal navigation that uses short-term, long-term, and working memory to efficiently explore unfamiliar environments and navigate to goals indicated by images.


Effective Message Hiding with Order-Preserving Mechanisms

http://arxiv.org/abs/2402.19160v1

Compressor summary: StegaFormer is a new method for hiding secret messages in images using MLPs that preserves the order of message bits and fuses them with image features, achieving better recovery accuracy, message capacity, and imperceptibility.


Trajectory Consistency Distillation

http://arxiv.org/abs/2402.19159v1

Compressor summary: The paper proposes Trajectory Consistency Distillation (TCD) to improve the Latent Consistency Model (LCM) for text-to-image synthesis by addressing errors in three areas and using strategic stochastic sampling.


Beyond Language Models: Byte Models are Digital World Simulators

http://arxiv.org/abs/2402.19155v1

Compressor summary: bGPT is a powerful model that uses next byte prediction to simulate and diagnose various aspects of the digital world, achieving high accuracy in tasks such as converting music notation and executing CPU operations.


Typographic Attacks in Large Multimodal Models Can be Alleviated by More Informative Prompts

http://arxiv.org/abs/2402.19150v1

Compressor summary: Typographic attacks are a security vulnerability for large multimodal models (LMMs), but these models can partially distinguish visual content from typographic text in images, and more informative texts or prompts improve their performance.


A SAM-guided Two-stream Lightweight Model for Anomaly Detection

http://arxiv.org/abs/2402.19145v1

Compressor summary: The paper presents a lightweight and efficient anomaly detection model that uses Segment Anything (SAM) to localize unseen anomalies and diverse patterns, achieving high performance on various datasets.


Weakly Supervised Monocular 3D Detection with a Single-View Image

http://arxiv.org/abs/2402.19144v1

Compressor summary: SKD-WM3D is a weakly supervised monocular 3D detection framework that uses self-knowledge distillation to achieve precise 3D object localization from a single image without any 3D annotations or extra training data.


ProtoP-OD: Explainable Object Detection with Prototypical Parts

http://arxiv.org/abs/2402.19142v1

Compressor summary: The paper introduces prototypical parts, a way to make detection transformers more interpretable by constructing local features that align with object classes and allowing visual inspection of the model's perception.


Evaluating Webcam-based Gaze Data as an Alternative for Human Rationale Annotations

http://arxiv.org/abs/2402.19133v1

Compressor summary: The paper explores using eye-tracking recordings as an alternative to manual annotations for evaluating explainability methods in NLP tasks, and compares different language models and languages.


BigGait: Learning Gait Representation You Want by Large Vision Models

http://arxiv.org/abs/2402.19122v1

Compressor summary: BigGait is a novel framework that uses large vision models to learn implicit gait features in an unsupervised way, improving gait recognition performance and reducing annotation costs.


VIXEN: Visual Text Comparison Network for Image Difference Captioning

http://arxiv.org/abs/2402.19119v1

Compressor summary: VIXEN summarizes, in text, the visual differences between a pair of images to help detect manipulation.


Continuous Sign Language Recognition Based on Motor attention mechanism and frame-level Self-distillation

http://arxiv.org/abs/2402.19118v1

Compressor summary: The paper proposes a novel motor attention mechanism to capture dynamic changes in sign language expression and applies self-distillation to improve feature extraction for continuous sign language recognition (CSLR), achieving state-of-the-art results on three datasets.


How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding

http://arxiv.org/abs/2402.19116v1

Compressor summary: This paper proposes a method called Implicit-Enhanced Causal Inference to improve weakly-supervised phrase grounding by modeling implicit relations and using intervention and counterfactual techniques, leading to better performance than existing models and multimodal LLMs.


DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

http://arxiv.org/abs/2402.19108v1

Compressor summary: DeepEraser is a recurrent deep network that erases text from images using iterative refinements and custom mask generation, achieving strong results on several benchmarks.


Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models

http://arxiv.org/abs/2402.19103v1

Compressor summary: The paper analyzes how false premises cause language models to generate incorrect text and proposes FAITH, a method to limit attention heads and reduce hallucinations.


FlatNAS: optimizing Flatness in Neural Architecture Search for Out-of-Distribution Robustness

http://arxiv.org/abs/2402.19102v1

Compressor summary: FlatNAS is a novel NAS method that optimizes NN performance, OOD robustness, and parameter count using only in-distribution data.


TEncDM: Understanding the Properties of Diffusion Model in the Space of Language Model Encodings

http://arxiv.org/abs/2402.19097v1

Compressor summary: The authors propose a new text diffusion model (TEncDM) that uses language model encodings and a Transformer decoder for better text generation and reduces the number of denoising steps.


Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection

http://arxiv.org/abs/2402.19091v1

Compressor summary: Key points:
- Synthetic image generation poses risks for online information integrity and safety
- Existing SID methods focus on high-level visual semantics but need fine-grained details
- The method uses intermediate CLIP layers and a learnable vector space to improve SID performance
- The method outperforms state-of-the-art by 10.6% with minimal training time
Summary: The authors propose a novel synthetic image detection method that leverages intermediate CLIP layers and a forgery-aware vector space, achieving significant improvement over existing methods with fast training.


Best Arm Identification with Resource Constraints

http://arxiv.org/abs/2402.19090v1

Compressor summary: The text studies a problem where an agent needs to find the best arm under limited resources and proposes a novel algorithm that converges quickly, with different rates depending on resource uncertainty.


Survey in Characterization of Semantic Change

http://arxiv.org/abs/2402.19088v1

Compressor summary: The text discusses how language evolution affects computational linguistics algorithms, and surveys existing methods to characterize semantic changes in words.


Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

http://arxiv.org/abs/2402.19085v1

Compressor summary: The paper proposes controllable preference optimization (CPO) to align AI models with human preferences on multiple objectives, reducing trade-offs and improving performance.


VideoMAC: Video Masked Autoencoders Meet ConvNets

http://arxiv.org/abs/2402.19082v1

Compressor summary: VideoMAC combines video masked autoencoders with ConvNets to improve visual representation learning for videos, outperforming ViT-based approaches on downstream tasks.


Smooth Tchebycheff Scalarization for Multi-Objective Optimization

http://arxiv.org/abs/2402.19078v1

Compressor summary: The paper proposes a new smooth Tchebycheff scalarization approach for gradient-based multi-objective optimization that has good theoretical properties and lower computational complexity than existing methods.


Pointing out the Shortcomings of Relation Extraction Models with Semantically Motivated Adversarials

http://arxiv.org/abs/2402.19076v1

Compressor summary: The text discusses how large language models struggle with relation extraction tasks when entity mentions are replaced or modified, suggesting they rely on shortcut features rather than semantic understanding.


TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables

http://arxiv.org/abs/2402.19072v1

Compressor summary: TimeXer is a novel framework that leverages external information to enhance the forecasting of endogenous variables using the Transformer architecture with self-attention and cross-attention mechanisms.


VEnvision3D: A Synthetic Perception Dataset for 3D Multi-Task Model Research

http://arxiv.org/abs/2402.19059v1

Compressor summary: VEnvision3D is a large synthetic 3D perception dataset for multi-task learning, aiming to facilitate the development of unified foundation models in computer vision research.


Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study

http://arxiv.org/abs/2402.19052v1

Compressor summary: The study evaluates the performance of large language models in summarizing various counseling components using the MentalCLOUDS dataset, finding that Mistral performs best overall, though with room for improvement.


Theoretical Foundations of Deep Selective State-Space Models

http://arxiv.org/abs/2402.19047v1

Compressor summary: Key points:
- Structured state-space models (SSMs) like S4 are effective for sequential data modeling
- Deep SSMs with multiplicative interactions can outperform attention-based transformers in accuracy and efficiency
- The paper provides theoretical grounding using Rough Path Theory to explain the success of selective state-space models like Mamba
Summary: The paper explains how deep SSMs with selectivity mechanisms, which project hidden states from input signatures, can surpass attention-based transformers in sequential data modeling and provides a theoretical framework using Rough Path Theory.


Atmospheric Turbulence Removal with Video Sequence Deep Visual Priors

http://arxiv.org/abs/2402.19041v1

Compressor summary: Key points:
- Atmospheric turbulence distorts visual imagery
- Model-based methods have artefacts, deep learning methods need diverse datasets
- Self-supervised learning method uses accelerated DIP with temporal information
- Method improves visual quality of raw or pre-processed sequences
Summary: The paper proposes a self-supervised learning method that uses accelerated DIP and temporal information to improve the visual quality of sequences affected by atmospheric turbulence distortions.


Progressive Contrastive Learning with Multi-Prototype for Unsupervised Visible-Infrared Person Re-identification

http://arxiv.org/abs/2402.19026v1

Compressor summary: The paper proposes a new method for matching people across infrared and visible images without annotations, using progressive contrastive learning with multiple prototypes to address the modality disparity and retain natural feature variety.


Combination of Weak Learners eXplanations to Improve Random Forest eXplicability Robustness

http://arxiv.org/abs/2402.19025v1

Compressor summary: The paper proposes a method to improve the consistency of explanations for predictions made by combining multiple weak learners using discriminative averaging, and shows its effectiveness on SHAP and Random Forest ensembles.


Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

http://arxiv.org/abs/2402.19014v1

Compressor summary: The paper proposes a contrastive learning framework, DoCo, to improve visual representation in text-rich scenarios for large visual-language models, enhancing their performance in visual document understanding tasks.


Generating, Reconstructing, and Representing Discrete and Continuous Data: Generalized Diffusion with Learnable Encoding-Decoding

http://arxiv.org/abs/2402.19009v1

Compressor summary: DiLED is a generalized diffusion model with a learnable encoder-decoder that improves performance and applies broadly across different data types.


DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

http://arxiv.org/abs/2402.19007v1

Compressor summary: The DOZE dataset provides a more realistic challenge for autonomous agents to navigate in dynamic environments with diverse objects and obstacles.


RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

http://arxiv.org/abs/2402.19004v1

Compressor summary: RSAM-Seg is a modified SAM model that improves image segmentation for remote sensing tasks and can help identify missing data in ground truth.


GoalNet: Goal Areas Oriented Pedestrian Trajectory Prediction

http://arxiv.org/abs/2402.19002v1

Compressor summary: GoalNet predicts pedestrian goals and trajectories using scene context and the observed trajectory, outperforming previous methods by a large margin.


Analysis of the Two-Step Heterogeneous Transfer Learning for Laryngeal Blood Vessel Classification: Issue and Improvement

http://arxiv.org/abs/2402.19001v1

Compressor summary: The study explores how using intermediate domains affects two-step transfer learning for classifying medical images and proposes a step-wise fine-tuning method to improve performance.


COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection

http://arxiv.org/abs/2402.18998v1

Compressor summary: The paper proposes a novel method for few-shot anomaly detection that uses pre-trained models, contrastive training, and cross-instance pairs to learn suitable representations for the task.


Negative-Binomial Randomized Gamma Markov Processes for Heterogeneous Overdispersed Count Time Series

http://arxiv.org/abs/2402.18995v1

Compressor summary: The paper introduces a new dynamical system that improves count time series modeling by capturing overdispersed behaviors, learning latent structure, and enabling fast inference and prediction.


Theoretically Achieving Continuous Representation of Oriented Bounding Boxes

http://arxiv.org/abs/2402.18975v1

Compressor summary: This paper introduces Continuous OBB, a novel representation method that solves the discontinuity issue in Oriented Object Detection, and shows its effectiveness on the DOTA dataset using Faster-RCNN as a baseline model.


Graph Generation via Spectral Diffusion

http://arxiv.org/abs/2402.18974v1

Compressor summary: GRASP is a fast and accurate graph generative model that uses spectral decomposition, denoising, and node features to create realistic graphs.


OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

http://arxiv.org/abs/2402.18969v1

Compressor summary: The paper introduces OHTA, a method that creates realistic and personalized hand avatars from one image using data-driven hand priors, and shows its applications in various scenarios.


Towards Out-of-Distribution Detection for breast cancer classification in Point-of-Care Ultrasound Imaging

http://arxiv.org/abs/2402.18960v1

Compressor summary: The study compares three methods for detecting unreliable assessments in medical image analysis using deep learning and finds that the ensemble method performs best.


Boosting Semi-Supervised Object Detection in Remote Sensing Images With Active Teaching

http://arxiv.org/abs/2402.18958v1

Compressor summary: The authors propose a novel active learning method (SSOD-AT) that uses an RoI comparison module to generate high-confidence pseudo-labels for object detection in remote sensing images and improves performance over state-of-the-art methods.


WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts

http://arxiv.org/abs/2402.18956v1

Compressor summary: The paper proposes WWW, a framework that explains neural network decisions by discovering concepts, creating localized concept maps and heatmaps, and predicting uncertainty.


Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition

http://arxiv.org/abs/2402.18951v1

Compressor summary: Key points:
- Open-world video recognition is hard due to environment variations
- Foundation models have rich knowledge but need proper application
- PCA pipeline transfers external multimodal knowledge to improve recognition
- PCA has three stages: Percept, Chat, and Adapt
- PCA achieves state-of-the-art results on three benchmarks
Summary: PCA is a generic pipeline that uses foundation models' knowledge to enhance open-world video recognition, by transferring visual and textual information in three stages.


PopALM: Popularity-Aligned Language Models for Social Media Trendy Response Prediction

http://arxiv.org/abs/2402.18950v1

Compressor summary: The paper proposes a method to predict popular user replies on social media using reinforcement learning and curriculum learning.


Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models

http://arxiv.org/abs/2402.18946v1

Compressor summary: The paper proposes an adaptive online learning framework with a sparse GP model and a safety filter based on HOCBFs to ensure safe control in non-stationary environments.


Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

http://arxiv.org/abs/2402.18933v1

Compressor summary: The paper proposes a new method to learn effective structural image representations for multimodality image registration using Deep Neighbourhood Self-similarity and anatomy-aware contrastive learning, improving discrimination and accuracy compared to existing methods.


Navigating Beyond Dropout: An Intriguing Solution Towards Generalizable Image Super Resolution

http://arxiv.org/abs/2402.18929v1

Compressor summary: Our paper investigates the effects of Dropout and proposes a new training strategy for Blind Super-Resolution that preserves fine details better than Dropout.


Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering

http://arxiv.org/abs/2402.18927v1

Compressor summary: The paper presents a real-time video analysis system using edge computing that adapts to network conditions and object detection with reinforcement learning methods.


PCDepth: Pattern-based Complementary Learning for Monocular Depth Estimation by Best of Both Worlds

http://arxiv.org/abs/2402.18925v1

Compressor summary: Key points:
- Event cameras record scene dynamics with high temporal resolution and low-level illumination
- Existing methods fuse intensity and event data at pixel level, ignoring high-level patterns
- PCDepth discretizes the scene into high-level patterns and integrates them across modalities
- PCDepth achieves more accurate monocular depth estimation than existing methods, especially in nighttime scenarios
Summary: PCDepth is a novel approach that leverages high-level patterns from event cameras and intensity images for better monocular depth estimation, outperforming state-of-the-art methods in low-light conditions.


Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition

http://arxiv.org/abs/2402.18923v1

Compressor summary: The paper proposes a large-scale speech recognition model with an inappropriate pause prediction layer to detect and assess inappropriate pauses in dysarthric speech, which affects stroke patients' speech intelligibility.


A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection

http://arxiv.org/abs/2402.18922v1

Compressor summary: The authors propose a simple and versatile network (SENet) for camouflaged object detection and salient object detection, using a vision Transformer encoder-decoder structure, a local information capture module, and a dynamic weighted loss function.


Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation

http://arxiv.org/abs/2402.18920v1

Compressor summary: The paper presents a unified framework that predicts point-wise correspondences and shape interpolation between 3D shapes using a combination of deep functional maps and classical surface deformation models, achieving better performance than previous methods.


Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation

http://arxiv.org/abs/2402.18919v1

Compressor summary: DaC improves image classification robustness by identifying causal components, intervening on images, and retraining models to address the correlation shift caused by the compositional nature of images.


SNE-RoadSegV2: Advancing Heterogeneous Feature Fusion and Fallibility Awareness for Freespace Detection

http://arxiv.org/abs/2402.18918v1

Compressor summary: This paper proposes a novel heterogeneous feature fusion network for freespace detection with improved accuracy and efficiency, addressing limitations in previous techniques.


Stop Relying on No-Choice and Do not Repeat the Moves: Optimal, Efficient and Practical Algorithms for Assortment Optimization

http://arxiv.org/abs/2402.18917v1

Compressor summary: The paper proposes efficient algorithms for assortment optimization with user choices modeled by the Plackett-Luce model, and provides a novel concentration guarantee using Pairwise Rank-Breaking to minimize regret.


AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging

http://arxiv.org/abs/2402.18913v1

Compressor summary: The paper introduces AdaMergeX, a cross-lingual transfer method that uses adaptive adapter merging to improve performance on target tasks by decoupling task ability and language ability.


DIGIC: Domain Generalizable Imitation Learning by Causal Discovery

http://arxiv.org/abs/2402.18910v1

Compressor summary: The paper proposes DIGIC, a framework that uses causal discovery to identify causal features for domain generalizable imitation learning with single-domain data, outperforming cross-domain variation-based methods.


Updating Language Models with Unstructured Facts: Towards Practical Knowledge Editing

http://arxiv.org/abs/2402.18909v1

Compressor summary: Unstructured Knowledge Editing (UKE) is a new benchmark that evaluates language models' ability to update their knowledge using unstructured texts, which is more practical than current methods based on structured facts.


On the Convergence of Differentially-Private Fine-tuning: To Linearly Probe or to Fully Fine-tune?

http://arxiv.org/abs/2402.18905v1

Compressor summary: The paper analyzes the training dynamics of differentially private (DP) linear probing and full fine-tuning, explores sequential fine-tuning, provides theoretical insights into DP fine-tuning convergence, and establishes a utility curve for privacy budget allocation.


Aligning Knowledge Graph with Visual Perception for Object-goal Navigation

http://arxiv.org/abs/2402.18892v1

Compressor summary: AKGVP is a method that improves object-goal navigation by aligning a knowledge graph with visual perception using continuous modeling and natural language pre-training.


BP-DeepONet: A new method for cuffless blood pressure estimation using the physics-informed DeepONet

http://arxiv.org/abs/2402.18886v1

Compressor summary: The study proposes a novel framework that uses a physics-informed DeepONet approach to predict continuous, accurate arterial blood pressure waveforms without invasive methods.


Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features

http://arxiv.org/abs/2402.18884v1

Compressor summary: This paper studies how over-parameterized deep neural networks behave under supervised contrastive loss and reveals their structural patterns using an analytical approach.


Dose Prediction Driven Radiotherapy Parameters Regression via Intra- and Inter-Relation Modeling

http://arxiv.org/abs/2402.18879v1

Compressor summary: The paper proposes a two-stage framework using a CNN and a transformer to predict dose maps and radiotherapy parameters, incorporating intra-relation and inter-relation models for accurate parameter regression.


Principal Component Analysis as a Sanity Check for Bayesian Phylolinguistic Reconstruction

http://arxiv.org/abs/2402.18877v1

Compressor summary: The paper proposes a simple method to check if the tree model assumption is valid for reconstructing language evolution, by projecting the tree onto a space generated by principal component analysis.


Loss-aware Curriculum Learning for Heterogeneous Graph Neural Networks

http://arxiv.org/abs/2402.18875v1

Compressor summary: The paper proposes a loss-aware training schedule (LTS) that improves the performance and robustness of heterogeneous graph neural networks by progressively incorporating data with varying quality.


Reducing Hallucinations in Entity Abstract Summarization with Facts-Template Decomposition

http://arxiv.org/abs/2402.18873v1

Compressor summary: SlotSum is an explainable framework for entity abstract summarization that decomposes the summary into facts and template, enabling error detection and rectification with external knowledge.


Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming

http://arxiv.org/abs/2402.18866v1

Compressor summary: The paper introduces Dr. Strategy, a new MBRL agent with a novel dreaming strategy that uses latent landmarks to improve sample efficiency and performance in complex navigation tasks.


Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning

http://arxiv.org/abs/2402.18865v1

Compressor summary: The paper investigates the trade-off between plasticity and stability in large language models' continual learning, proposing a method called I-LoRA that improves performance on domain-specific tasks.


Probabilistic Lipschitzness and the Stable Rank for Comparing Explanation Models

http://arxiv.org/abs/2402.18863v1

Compressor summary: This paper investigates how different explainability models affect the quality of post hoc explanations for neural networks, using probabilistic Lipschitzness and stable rank as metrics to compare their robustness.


Taking Second-life Batteries from Exhausted to Empowered using Experiments, Data Analysis, and Health Estimation

http://arxiv.org/abs/2402.18859v1

Compressor summary: The study proposes health monitoring algorithms for reusing retired electric vehicle batteries in grid energy storage, achieving promising results with a machine learning-based model and an adaptive online algorithm.


Rethinking Multi-domain Generalization with A General Learning Objective

http://arxiv.org/abs/2402.18853v1

Compressor summary: The paper proposes a new learning objective for multi-domain generalization (mDG) that uses Y-mapping to relax constraints and improve performance in various tasks.


Applications of 0-1 Neural Networks in Prescription and Prediction

http://arxiv.org/abs/2402.18851v1

Compressor summary: Prescriptive networks (PNNs) are shallow neural networks that use mixed integer programming to optimize personalized healthcare policies, offering greater interpretability than deep neural networks and better performance in a case study of postpartum hypertension treatment.


Enhancing Steganographic Text Extraction: Evaluating the Impact of NLP Models on Accuracy and Semantic Coherence

http://arxiv.org/abs/2402.18849v1

Compressor summary: This study presents an LSB-NLP hybrid framework that combines image steganography and NLP to improve the accuracy and robustness of extracting hidden text, especially Chinese characters.


SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

http://arxiv.org/abs/2402.18848v1

Compressor summary: The paper proposes a co-designed method for realistic human portrait relighting using a physics-guided architecture and a self-supervised pre-training strategy.


Multi-Fidelity Residual Neural Processes for Scalable Surrogate Modeling

http://arxiv.org/abs/2402.18846v1

Compressor summary: MFRNP is a novel framework for multi-fidelity surrogate modeling that improves accuracy by optimizing lower-fidelity decoders for information sharing and modeling the residual between aggregated outputs and ground truth.


Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey

http://arxiv.org/abs/2402.18844v1

Compressor summary: This paper reviews deep learning methods for 3D human pose estimation and mesh recovery, covering single-person and multi-person approaches, explicit and implicit models, and comparing results on several datasets.


ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

http://arxiv.org/abs/2402.18842v1

Compressor summary: ViewFusion is a new algorithm that enhances diffusion models for better multi-view consistency in image generation without needing training or fine-tuning.


Extended Flow Matching: a Method of Conditional Generation with Generalized Continuity Equation

http://arxiv.org/abs/2402.18839v1

Compressor summary: The paper develops a new method for conditional generation using Flow Matching, which improves over existing guidance-based methods by ensuring continuous matching of matrix fields instead of vector fields.


When does word order matter and when doesn't it?

http://arxiv.org/abs/2402.18838v1

Compressor summary: The paper suggests that language models are insensitive to word order in NLU tasks because linguistic redundancy provides overlapping information, and this insensitivity varies across tasks.


A Model-Based Approach for Improving Reinforcement Learning Efficiency Leveraging Expert Observations

http://arxiv.org/abs/2402.18836v1

Compressor summary: The paper proposes a method to use expert observations with deep reinforcement learning for better sample efficiency and performance on continuous control tasks.


Utilizing Local Hierarchy with Adversarial Training for Hierarchical Text Classification

http://arxiv.org/abs/2402.18825v1

Compressor summary: The paper proposes a new framework (HiAdv) that uses a local hierarchy to improve text classification, especially for complex taxonomic structures and rare classes.


Batch size invariant Adam

http://arxiv.org/abs/2402.18824v1

Compressor summary: The paper proposes a modified version of Adam that works well with distributed training and does not require strong assumptions about gradient variance.


Debiased Novel Category Discovering and Localization

http://arxiv.org/abs/2402.18821v1

Compressor summary: The paper proposes an object detection model that can discover and localize novel classes without bias towards seen objects, using Debiased Region Mining and semi-supervised contrastive learning.


Dual Operating Modes of In-Context Learning

http://arxiv.org/abs/2402.18819v1

Compressor summary: The paper introduces a probabilistic model to explain the dual operating modes of in-context learning (ICL) and analyzes its behavior, offering insights into ICL's risk dynamics and performance with biased labels.


Gradient Alignment for Cross-Domain Face Anti-Spoofing

http://arxiv.org/abs/2402.18817v1

Compressor summary: The paper introduces GAC-FAS, a novel learning objective for face anti-spoofing that ensures convergence to an optimal flat minimum without additional modules and achieves state-of-the-art performance on cross-domain datasets.


How do Large Language Models Handle Multilingualism?

http://arxiv.org/abs/2402.18815v1

Compressor summary: The authors study how large language models process multilingual inputs, propose a framework for it, and develop a method to detect language-specific neurons in LLMs.


BFRFormer: Transformer-based generator for Real-World Blind Face Restoration

http://arxiv.org/abs/2402.18811v1

Compressor summary: The paper proposes BFRFormer, a Transformer-based method for blind face restoration that recovers more identity-preserving details, using a wavelet discriminator and an aggregated attention module to address limitations of convolutional neural networks.


On the Decision-Making Abilities in Role-Playing using Large Language Models

http://arxiv.org/abs/2402.18807v1

Compressor summary: This paper evaluates how well large language models can make decisions in role-playing tasks based on different personality types and provides metrics to improve their decision-making abilities.


To Pool or Not To Pool: Analyzing the Regularizing Effects of Group-Fair Training on Shared Models

http://arxiv.org/abs/2402.18803v1

Compressor summary: The paper derives generalization error bounds for fair machine learning that leverage the majority group's larger sample size to reduce performance disparities between groups.


BlockEcho: Retaining Long-Range Dependencies for Imputing Block-Wise Missing Data

http://arxiv.org/abs/2402.18800v1

Compressor summary: BlockEcho is a novel matrix completion method that uses Matrix Factorization within Generative Adversarial Networks to improve imputation of block-wise missing data, especially at higher missing rates.


Enhancing the "Immunity" of Mixture-of-Experts Networks for Adversarial Defense

http://arxiv.org/abs/2402.18787v1

Compressor summary: The Immunity method enhances a modified Mixture-of-Experts architecture with Random Switch Gates and MI/Position Stability-based losses to improve adversarial robustness of DNNs.


OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition

http://arxiv.org/abs/2402.18786v1

Compressor summary: The paper proposes a new imaging system that anonymizes facial images for depression recognition while preserving disease-related features and achieving state-of-the-art privacy protection performance.


Brain-inspired and Self-based Artificial Intelligence

http://arxiv.org/abs/2402.18784v1

Compressor summary: This paper proposes a Brain-inspired and Self-based Artificial Intelligence paradigm that emphasizes the role of self in shaping human-level AI models and robotic applications, aiming for real Artificial General Intelligence.


A Quantitative Evaluation of Score Distillation Sampling Based Text-to-3D

http://arxiv.org/abs/2402.18780v1

Compressor summary: The paper describes a new method for evaluating and improving 3D content generation from text prompts using objective metrics and a novel baseline model that reduces artifacts.


NARUTO: Neural Active Reconstruction from Uncertain Target Observations

http://arxiv.org/abs/2402.18771v1

Compressor summary: NARUTO is a neural system that uses uncertainty learning to create high-quality environment maps and efficiently explore them for active reconstruction tasks.


Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*

http://arxiv.org/abs/2402.18766v1

Compressor summary: The paper introduces Gervásio PT*, a new open-source decoder model that sets a new state of the art for neural decoding of Portuguese with instruction data sets.


Disentangling the Causes of Plasticity Loss in Neural Networks

http://arxiv.org/abs/2402.18762v1

Compressor summary: The paper investigates how to maintain neural network trainability by combining multiple mechanisms of plasticity loss mitigation, such as layer normalization and weight decay, in various settings.