This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-01. The summaries were generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.19481v1
Compressor summary: DistriFusion is a method that speeds up image synthesis with diffusion models by using parallelism, asynchronous communication, and context reuse across multiple GPUs without sacrificing quality.
http://arxiv.org/abs/2402.19479v1
Compressor summary: The authors propose an automatic approach to create a large video dataset with high-quality captions using multimodal inputs and cross-modality teacher models, and show its effectiveness on three downstream tasks.
http://arxiv.org/abs/2402.19477v1
Compressor summary: The paper presents a simulation-free method for learning a generalized physical face model from 3D face data, enabling easy fitting to any identity and realistic physics-based facial animation with minimal artist input and network training.
http://arxiv.org/abs/2402.19474v1
Compressor summary: The All-Seeing Project V2 introduces a new model and dataset for understanding object relations in images, improving relation comprehension in multi-modal large language models.
http://arxiv.org/abs/2402.19473v1
Compressor summary: The paper reviews retrieval-augmented generation (RAG), a technique that enhances artificial intelligence generated content (AIGC) by integrating information retrieval to improve accuracy and robustness, and surveys its applications, benchmarks, and future research directions.
http://arxiv.org/abs/2402.19472v1
Compressor summary: Lifelong benchmarks mitigate overfitting in machine learning by continually expanding test sets and using the Sort & Search framework for efficient model evaluation.
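A minimal sketch of the evaluation idea behind such a framework, under the idealized assumption that test samples can be sorted from easiest to hardest and that a model's success is roughly monotone in difficulty; the function names and the monotonicity assumption are mine, not necessarily the paper's exact algorithm:

```python
def estimate_ability(sorted_samples, model_solves):
    """Binary-search for the hardest sample the model still solves,
    so accuracy on an ever-growing test set can be estimated in
    O(log n) model evaluations instead of O(n)."""
    lo, hi = 0, len(sorted_samples)
    while lo < hi:
        mid = (lo + hi) // 2
        if model_solves(sorted_samples[mid]):  # one model evaluation
            lo = mid + 1                       # model handles this difficulty
        else:
            hi = mid
    return lo  # estimated count of solved samples (the easiest `lo` of them)

# Toy usage: a "model" that solves any sample with difficulty below 0.7.
samples = sorted([0.1, 0.3, 0.5, 0.65, 0.8, 0.9])
print(estimate_ability(samples, lambda d: d < 0.7))  # -> 4
```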
http://arxiv.org/abs/2402.19471v1
Compressor summary: The study uses a game-based task to investigate how people ask questions with limited resources and compares language models' abilities to generate informative questions that mirror human performance.
http://arxiv.org/abs/2402.19167v1
Compressor summary: We propose DiPMT++, a framework that teaches large language models to translate new languages using only a dictionary and 5K parallel sentences, improving translation quality for Zhuang and enabling human-assisted translation for unseen languages.
http://arxiv.org/abs/2402.19161v1
Compressor summary: MemoNav is a memory model for image-goal navigation that uses short-term, long-term, and working memory to efficiently explore unfamiliar environments and navigate to goals indicated by images.
http://arxiv.org/abs/2402.19160v1
Compressor summary: StegaFormer is a new method for hiding secret messages in images using MLPs that preserves the order of message bits and fuses them with image features, achieving better recovery accuracy, message capacity, and imperceptibility.
http://arxiv.org/abs/2402.19159v1
Compressor summary: The paper proposes Trajectory Consistency Distillation (TCD) to improve the Latent Consistency Model (LCM) for text-to-image synthesis by addressing errors in three areas and using strategic stochastic sampling.
http://arxiv.org/abs/2402.19155v1
Compressor summary: bGPT is a powerful model that uses next byte prediction to simulate and diagnose various aspects of the digital world, achieving high accuracy in tasks such as converting music notation and executing CPU operations.
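For concreteness, next byte prediction is just language modeling with a 256-symbol vocabulary. A minimal PyTorch sketch of that objective follows; the tiny model and all names here are illustrative assumptions, not bGPT's actual architecture:

```python
import torch
import torch.nn as nn

VOCAB = 256  # one token per possible byte value

class TinyByteLM(nn.Module):
    """Toy causal transformer over raw bytes (illustrative, not bGPT)."""
    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, byte_ids):  # (batch, seq) of ints in [0, 255]
        seq = byte_ids.size(1)
        causal = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
        h = self.backbone(self.embed(byte_ids), mask=causal)
        return self.head(h)  # (batch, seq, 256) next-byte logits

stream = torch.randint(0, VOCAB, (8, 65))  # stand-in byte stream
logits = TinyByteLM()(stream[:, :-1])      # predict byte t+1 from bytes <= t
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB),
                                   stream[:, 1:].reshape(-1))
loss.backward()
```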
http://arxiv.org/abs/2402.19150v1
Compressor summary: Typographic attacks are a security vulnerability for LMMs, but LMMs can partially distinguish visual content from typographic text in images, and using more informative texts or prompts can improve their performance.
http://arxiv.org/abs/2402.19145v1
Compressor summary: The paper presents a lightweight and efficient anomaly detection model that uses Segment Anything (SAM) to localize unseen anomalies and diverse patterns, achieving high performance on various datasets.
http://arxiv.org/abs/2402.19144v1
Compressor summary: SKD-WM3D is a weakly supervised monocular 3D detection framework that uses self-knowledge distillation to achieve precise 3D object localization from a single image without any 3D annotations or extra training data.
http://arxiv.org/abs/2402.19142v1
Compressor summary: The paper introduces prototypical parts, a way to make detection transformers more interpretable by constructing local features that align with object classes and allowing visual inspection of the model's perception.
http://arxiv.org/abs/2402.19133v1
Compressor summary: The paper explores using eye-tracking recordings as an alternative to manual annotations for evaluating explainability methods in NLP tasks, and compares different language models and languages.
http://arxiv.org/abs/2402.19122v1
Compressor summary: BigGait is a novel framework that uses large vision models to learn implicit gait features in an unsupervised way, improving gait recognition performance and reducing annotation costs.
http://arxiv.org/abs/2402.19119v1
Compressor summary: VIXEN generates textual summaries of the differences between a pair of images to help detect manipulation.
http://arxiv.org/abs/2402.19118v1
Compressor summary: The paper proposes a novel motor attention mechanism to capture dynamic changes in sign language expression and applies self-distillation to improve feature extraction for continuous sign language recognition (CSLR), achieving state-of-the-art results on three datasets.
http://arxiv.org/abs/2402.19116v1
Compressor summary: This paper proposes a method called Implicit-Enhanced Causal Inference to improve weakly-supervised phrase grounding by modeling implicit relations and using intervention and counterfactual techniques, leading to better performance than existing models and multimodal LLMs.
http://arxiv.org/abs/2402.19108v1
Compressor summary: DeepEraser is a recurrent deep network that erases text from images using iterative refinements and custom mask generation, achieving strong results on several benchmarks.
http://arxiv.org/abs/2402.19103v1
Compressor summary: The paper analyzes how false premises cause language models to generate incorrect text and proposes FAITH, a method to limit attention heads and reduce hallucinations.
http://arxiv.org/abs/2402.19102v1
Compressor summary: FlatNAS is a novel NAS method that optimizes NN performance, OOD robustness, and parameter count using only in-distribution data.
http://arxiv.org/abs/2402.19097v1
Compressor summary: The authors propose a new text diffusion model (TEncDM) that uses language model encodings and a Transformer decoder for better text generation and reduces the number of denoising steps.
http://arxiv.org/abs/2402.19091v1
Compressor summary:
Key points:
- Synthetic image generation poses risks for online information integrity and safety
- Existing SID methods focus on high-level visual semantics but need fine-grained details
- The method uses intermediate CLIP layers and a learnable vector space to improve SID performance
- The method outperforms state-of-the-art by 10.6% with minimal training time
Summary: The authors propose a novel synthetic image detection method that leverages intermediate CLIP layers and a forgery-aware vector space, achieving significant improvement over existing methods with fast training.
http://arxiv.org/abs/2402.19090v1
Compressor summary: The text studies a problem where an agent needs to find the best arm under limited resources and proposes a novel algorithm that converges quickly, with different rates depending on resource uncertainty.
http://arxiv.org/abs/2402.19088v1
Compressor summary: The text discusses how language evolution affects computational linguistics algorithms, and surveys existing methods to characterize semantic changes in words.
http://arxiv.org/abs/2402.19085v1
Compressor summary: The paper proposes controllable preference optimization (CPO) to align AI models with human preferences on multiple objectives, reducing trade-offs and improving performance.
http://arxiv.org/abs/2402.19082v1
Compressor summary: VideoMAC combines video masked autoencoders with ConvNets to improve visual representation learning for videos, outperforming ViT-based approaches on downstream tasks.
http://arxiv.org/abs/2402.19078v1
Compressor summary: The paper proposes a new smooth Tchebycheff scalarization approach for gradient-based multi-objective optimization that has good theoretical properties and lower computational complexity than existing methods.
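For context, the classical Tchebycheff scalarization is a non-smooth max over weighted objectives, and log-sum-exp is the standard way to smooth such a max; treating that as this paper's exact formulation is an assumption on my part:

```latex
% Classical (non-smooth) Tchebycheff scalarization with weights \lambda_i > 0
% and ideal point z^*:
g_{\mathrm{tch}}(x) = \max_{1 \le i \le m} \lambda_i \bigl( f_i(x) - z_i^* \bigr)

% Log-sum-exp smoothing with parameter \mu > 0; g_\mu \to g_{\mathrm{tch}}
% as \mu \to 0, and g_\mu is smooth, so gradient methods apply directly:
g_{\mu}(x) = \mu \log \sum_{i=1}^{m}
             \exp\!\left( \frac{\lambda_i \bigl( f_i(x) - z_i^* \bigr)}{\mu} \right)
```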
http://arxiv.org/abs/2402.19076v1
Compressor summary: The text discusses how large language models struggle with relation extraction tasks when entity mentions are replaced or modified, suggesting they rely on shortcut features rather than semantic understanding.
http://arxiv.org/abs/2402.19072v1
Compressor summary: TimeXer is a novel framework that leverages external information to enhance the forecasting of endogenous variables using the Transformer architecture with self-attention and cross-attention mechanisms.
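As a rough illustration of the summarized design (endogenous tokens self-attend, then cross-attend to exogenous variable embeddings), here is a hedged PyTorch sketch; the shapes, dimensions, and the final head are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

d = 64
endo = torch.randn(8, 96, d)  # (batch, endogenous series tokens, dim)
exo  = torch.randn(8, 5, d)   # (batch, exogenous variable embeddings, dim)

self_attn  = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
cross_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

h, _ = self_attn(endo, endo, endo)  # self-attention within the endogenous series
h, _ = cross_attn(h, exo, exo)      # queries: endogenous; keys/values: exogenous
forecast = nn.Linear(d, 1)(h)       # illustrative per-token prediction head
```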
http://arxiv.org/abs/2402.19059v1
Compressor summary: VEnvision3D is a large synthetic 3D perception dataset for multi-task learning, aiming to facilitate the development of unified foundation models in computer vision research.
http://arxiv.org/abs/2402.19052v1
Compressor summary: The study evaluates the performance of large language models in summarizing various counseling components using MentalCLOUDS dataset, finding that Mistral performs best overall with room for improvement.
http://arxiv.org/abs/2402.19047v1
Compressor summary:
Key points:
- Structured state-space models (SSMs) like S4 are effective for sequential data modeling
- Deep SSMs with multiplicative interactions can outperform attention-based transformers in accuracy and efficiency
- The paper provides theoretical grounding using Rough Path Theory to explain the success of selective state-space models like Mamba
Summary: The paper explains how deep SSMs with selectivity mechanisms, which project hidden states from input signatures, can surpass attention-based transformers in sequential data modeling and provides a theoretical framework using Rough Path Theory.
http://arxiv.org/abs/2402.19041v1
Compressor summary:
Key points:
- Atmospheric turbulence distorts visual imagery
- Model-based methods have artefacts; deep learning methods need diverse datasets
- Self-supervised learning method uses accelerated DIP with temporal information
- Method improves visual quality of raw or pre-processed sequences
Summary: The paper proposes a self-supervised learning method that uses accelerated DIP and temporal information to improve the visual quality of sequences affected by atmospheric turbulence distortions.
http://arxiv.org/abs/2402.19026v1
Compressor summary: The paper proposes a new method for matching people in infrared and visible images without annotations, using progressive contrastive learning with multi-prototype to address disparity and retain natural feature variety.
http://arxiv.org/abs/2402.19025v1
Compressor summary: The paper proposes a method to improve the consistency of explanations for predictions made by combining multiple weak learners using discriminative averaging, and shows its effectiveness on SHAP and Random Forest ensembles.
http://arxiv.org/abs/2402.19014v1
Compressor summary: The paper proposes a contrastive learning framework, DoCo, to improve visual representation in text-rich scenarios for large visual-language models, enhancing their performance in visual document understanding tasks.
http://arxiv.org/abs/2402.19009v1
Compressor summary: DiLED is a generalized diffusion model with a learnable encoder-decoder that enhances performance and broadens applicability across different data types.
http://arxiv.org/abs/2402.19007v1
Compressor summary: The DOZE dataset provides a more realistic challenge for autonomous agents to navigate in dynamic environments with diverse objects and obstacles.
http://arxiv.org/abs/2402.19004v1
Compressor summary: RSAM-Seg is a modified SAM model that improves image segmentation for remote sensing tasks and can help identify missing data in ground truth.
http://arxiv.org/abs/2402.19002v1
Compressor summary: GoalNet predicts pedestrian goals and trajectories using scene context and observed trajectory, outperforming previous methods by a large margin.
http://arxiv.org/abs/2402.19001v1
Compressor summary: The study explores how using intermediate domains affects two-step transfer learning for classifying medical images and proposes a step-wise fine-tuning method to improve performance.
http://arxiv.org/abs/2402.18998v1
Compressor summary: The paper proposes a novel method for few-shot anomaly detection that uses pre-trained models, contrastive training, and cross-instance pairs to learn suitable representations for the task.
http://arxiv.org/abs/2402.18995v1
Compressor summary: The paper introduces a new dynamical system that improves count time series modeling by capturing overdispersed behaviors, learning latent structure, and enabling fast inference and prediction.
http://arxiv.org/abs/2402.18975v1
Compressor summary: This paper introduces Continuous OBB, a novel representation method that solves the discontinuity issue in Oriented Object Detection, and shows its effectiveness on the DOTA dataset using Faster-RCNN as a baseline model.
http://arxiv.org/abs/2402.18974v1
Compressor summary: GRASP is a fast and accurate graph generative model that uses spectral decomposition, denoising, and node features to create realistic graphs.
http://arxiv.org/abs/2402.18969v1
Compressor summary: The paper introduces OHTA, a method that creates realistic and personalized hand avatars from one image using data-driven hand priors, and shows its applications in various scenarios.
http://arxiv.org/abs/2402.18960v1
Compressor summary: The study compares three methods for detecting unreliable assessments in medical image analysis using deep learning and finds that the ensemble method performs best.
http://arxiv.org/abs/2402.18958v1
Compressor summary: The authors propose a novel active learning method (SSOD-AT) that uses an RoI comparison module to generate high-confidence pseudo-labels for object detection in remote sensing images and improves performance over state-of-the-art methods.
http://arxiv.org/abs/2402.18956v1
Compressor summary: The paper proposes WWW, a framework that explains neural network decisions by discovering concepts, creating localized concept maps and heatmaps, and predicting uncertainty.
http://arxiv.org/abs/2402.18951v1
Compressor summary:
Key points:
- Open-world video recognition is hard due to environment variations
- Foundation models have rich knowledge but need proper application
- PCA pipeline transfers external multimodal knowledge to improve recognition
- PCA has three stages: Percept, Chat, and Adapt
- PCA achieves state-of-the-art results on three benchmarks
Summary: PCA is a generic pipeline that uses foundation models' knowledge to enhance open-world video recognition, by transferring visual and textual information in three stages.
http://arxiv.org/abs/2402.18950v1
Compressor summary: The paper proposes a method to predict popular user replies on social media using reinforcement learning and curriculum learning.
http://arxiv.org/abs/2402.18946v1
Compressor summary: The paper proposes an adaptive online learning framework with a sparse GP model and a safety filter based on high-order control barrier functions (HOCBFs) to ensure safe control in non-stationary environments.
http://arxiv.org/abs/2402.18933v1
Compressor summary: The paper proposes a new method to learn effective structural image representations for multimodality image registration using Deep Neighbourhood Self-similarity and anatomy-aware contrastive learning, improving discrimination and accuracy compared to existing methods.
http://arxiv.org/abs/2402.18929v1
Compressor summary: Our paper investigates the effects of Dropout and proposes a new training strategy for Blind Super-Resolution that preserves fine details better than Dropout.
http://arxiv.org/abs/2402.18927v1
Compressor summary: The paper presents a real-time video analysis system using edge computing that adapts to network conditions and object detection with reinforcement learning methods.
http://arxiv.org/abs/2402.18925v1
Compressor summary:
Key points:
- Event cameras record scene dynamics with high temporal resolution and low-level illumination
- Existing methods fuse intensity and event data at pixel level, ignoring high-level patterns
- PCDepth discretizes the scene into high-level patterns and integrates them across modalities
- PCDepth achieves more accurate monocular depth estimation than existing methods, especially in nighttime scenarios
Summary: PCDepth is a novel approach that leverages high-level patterns from event cameras and intensity images for better monocular depth estimation, outperforming state-of-the-art methods in low-light conditions.
http://arxiv.org/abs/2402.18923v1
Compressor summary: The paper proposes a large-scale speech recognition model with an inappropriate pause prediction layer to detect and assess inappropriate pauses in dysarthric speech, which affects stroke patients' speech intelligibility.
http://arxiv.org/abs/2402.18922v1
Compressor summary: The authors propose a simple and versatile network (SENet) for camouflaged object detection and salient object detection, using a vision Transformer encoder-decoder structure, a local information capture module, and a dynamic weighted loss function.
http://arxiv.org/abs/2402.18920v1
Compressor summary: The paper presents a unified framework that predicts point-wise correspondences and shape interpolation between 3D shapes using a combination of deep functional maps and classical surface deformation models, achieving better performance than previous methods.
http://arxiv.org/abs/2402.18919v1
Compressor summary: DaC improves image classification robustness by identifying causal components, intervening on images, and retraining models to address correlation shift caused by compositional nature of images.
http://arxiv.org/abs/2402.18918v1
Compressor summary: This paper proposes a novel heterogeneous feature fusion network for freespace detection with improved accuracy and efficiency, addressing limitations in previous techniques.
http://arxiv.org/abs/2402.18917v1
Compressor summary: The paper proposes efficient algorithms for assortment optimization with user choices modeled by the Plackett-Luce model, and provides a novel concentration guarantee using Pairwise Rank-Breaking to minimize regret.
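For reference, under the Plackett-Luce model each item j in an offered assortment S has a positive weight w_j, and a ranked choice is generated by repeatedly sampling without replacement in proportion to the remaining weights. The notation below is the standard form of that likelihood; how the paper handles details such as a no-purchase option is not specified in the summary:

```latex
% Plackett-Luce probability of the ranking j_1 \succ j_2 \succ \cdots \succ j_K
% over an offered assortment S, with item weights w_j > 0:
P\bigl(j_1, \dots, j_K \mid S\bigr)
  = \prod_{t=1}^{K}
    \frac{w_{j_t}}{\sum_{k \in S \setminus \{j_1, \dots, j_{t-1}\}} w_k}
```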
http://arxiv.org/abs/2402.18913v1
Compressor summary: The paper introduces AdaMergeX, a cross-lingual transfer method that uses adaptive adapter merging to improve performance on target tasks by decoupling task ability and language ability.
http://arxiv.org/abs/2402.18910v1
Compressor summary: The paper proposes DIGIC, a framework that uses causal discovery to identify causal features for domain generalizable imitation learning with single-domain data, outperforming cross-domain variation-based methods.
http://arxiv.org/abs/2402.18909v1
Compressor summary: Unstructured Knowledge Editing (UKE) is a new benchmark that evaluates language models' ability to update their knowledge using unstructured texts, which is more practical than current methods based on structured facts.
http://arxiv.org/abs/2402.18905v1
Compressor summary: The paper analyzes the training dynamics of differentially private (DP) linear probing and full fine-tuning, explores sequential fine-tuning, provides theoretical insights into DP fine-tuning convergence, and establishes a utility curve for privacy budget allocation.
http://arxiv.org/abs/2402.18892v1
Compressor summary: AKGVP is a method that improves object-goal navigation by aligning knowledge graph with visual perception using continuous modeling and natural language pre-training.
http://arxiv.org/abs/2402.18886v1
Compressor summary: The study proposes a novel framework using a physics-informed DeepONet approach to predict continuous and accurate arterial blood pressure waveforms without invasive methods.
http://arxiv.org/abs/2402.18884v1
Compressor summary: This paper studies how over-parameterized deep neural networks behave under supervised contrastive loss and reveals their structural patterns using an analytical approach.
http://arxiv.org/abs/2402.18879v1
Compressor summary: The paper proposes a two-stage framework using CNN and transformer to predict dose maps and radiotherapy parameters, incorporating intra-relation and inter-relation models for accurate parameter regression.
http://arxiv.org/abs/2402.18877v1
Compressor summary: The paper proposes a simple method to check if the tree model assumption is valid for reconstructing language evolution, by projecting the tree onto a space generated by principal component analysis.
http://arxiv.org/abs/2402.18875v1
Compressor summary: The paper proposes a loss-aware training schedule (LTS) that improves the performance and robustness of heterogeneous graph neural networks by progressively incorporating data with varying quality.
http://arxiv.org/abs/2402.18873v1
Compressor summary: SlotSum is an explainable framework for entity abstract summarization that decomposes the summary into facts and template, enabling error detection and rectification with external knowledge.
http://arxiv.org/abs/2402.18866v1
Compressor summary: The paper introduces Dr. Strategy, a new model-based reinforcement learning (MBRL) agent with a novel dreaming strategy that uses latent landmarks to improve sample efficiency and performance in complex navigation tasks.
http://arxiv.org/abs/2402.18865v1
Compressor summary: The paper investigates the trade-off between plasticity and stability in large language models' continual learning, proposing a method called I-LoRA that improves performance on domain-specific tasks.
http://arxiv.org/abs/2402.18863v1
Compressor summary: This paper investigates how different explainability models affect the quality of post hoc explanations for neural networks, using probabilistic Lipschitzness and stable rank as metrics to compare their robustness.
http://arxiv.org/abs/2402.18859v1
Compressor summary: The study proposes health monitoring algorithms for reusing retired electric vehicle batteries in grid energy storage, achieving promising results with a machine learning-based model and an adaptive online algorithm.
http://arxiv.org/abs/2402.18853v1
Compressor summary: The paper proposes a new learning objective for multi-domain generalization (mDG) that uses Y-mapping to relax constraints and improve performance in various tasks.
http://arxiv.org/abs/2402.18851v1
Compressor summary: Prescriptive neural networks (PNNs) are shallow neural networks that use mixed integer programming to optimize personalized healthcare policies, offering greater interpretability than deep neural networks and better performance in a case study of postpartum hypertension treatment.
http://arxiv.org/abs/2402.18849v1
Compressor summary: This study presents an LSB-NLP hybrid framework that combines image steganography and NLP to improve the accuracy and robustness of extracting hidden text, especially Chinese characters.
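The LSB half of such a pipeline hides message bits in the least significant bit of each pixel. A minimal sketch of embedding and extraction follows; the helper names and shapes are illustrative, not the paper's implementation (and the NLP error-correction stage is omitted):

```python
import numpy as np

def embed_lsb(pixels: np.ndarray, bits: list) -> np.ndarray:
    """Overwrite the lowest bit of the first len(bits) pixels with the message."""
    flat = pixels.flatten().astype(np.uint8)  # flatten() copies; cover stays intact
    for i, b in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | b        # clear the LSB, then set it to b
    return flat.reshape(pixels.shape)

def extract_lsb(pixels: np.ndarray, n_bits: int) -> list:
    return [int(p & 1) for p in pixels.flatten()[:n_bits]]

message = "hi"
bits = [int(b) for byte in message.encode("utf-8") for b in f"{byte:08b}"]
cover = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
stego = embed_lsb(cover, bits)
assert extract_lsb(stego, len(bits)) == bits  # round-trips the hidden bits
```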
http://arxiv.org/abs/2402.18848v1
Compressor summary: The paper proposes a co-designed method for realistic human portrait relighting using a physics-guided architecture and a self-supervised pre-training strategy.
http://arxiv.org/abs/2402.18846v1
Compressor summary: MFRNP is a novel framework for multi-fidelity surrogate modeling that improves accuracy by optimizing lower-fidelity decoders for information sharing and modeling the residual between aggregated outputs and the ground truth.
http://arxiv.org/abs/2402.18844v1
Compressor summary: This paper reviews deep learning methods for 3D human pose estimation and mesh recovery, covering single-person and multi-person approaches, explicit and implicit models, and comparing results on several datasets.
http://arxiv.org/abs/2402.18842v1
Compressor summary: ViewFusion is a new algorithm that enhances diffusion models for better multi-view consistency in image generation without needing training or fine-tuning.
http://arxiv.org/abs/2402.18839v1
Compressor summary: The paper develops a new method for conditional generation using Flow Matching, which improves over existing guidance-based methods by ensuring continuous matching of matrix fields instead of vector fields.
http://arxiv.org/abs/2402.18838v1
Compressor summary: The paper suggests that language models are insensitive to word order in NLU tasks because linguistic redundancy provides overlapping information, and this insensitivity varies across tasks.
http://arxiv.org/abs/2402.18836v1
Compressor summary: The paper proposes a method to use expert observations with deep reinforcement learning for better sample efficiency and performance on continuous control tasks.
http://arxiv.org/abs/2402.18825v1
Compressor summary: The paper proposes a new framework (HiAdv) that uses a local hierarchy to improve text classification, especially for complex taxonomic structures and rare classes.
http://arxiv.org/abs/2402.18824v1
Compressor summary: The paper proposes a modified version of Adam that works well with distributed training and does not require strong assumptions about gradient variance.
http://arxiv.org/abs/2402.18821v1
Compressor summary: The paper proposes an object detection model that can discover and localize novel classes without bias towards seen objects, using Debiased Region Mining and semi-supervised contrastive learning.
http://arxiv.org/abs/2402.18819v1
Compressor summary: The paper introduces a probabilistic model to explain the dual operating modes of in-context learning (ICL) and analyzes its behavior, offering insights into ICL's risk dynamics and performance with biased labels.
http://arxiv.org/abs/2402.18817v1
Compressor summary: The paper introduces GAC-FAS, a novel learning objective for face anti-spoofing that ensures convergence to an optimal flat minimum without additional modules and achieves state-of-the-art performance on cross-domain datasets.
http://arxiv.org/abs/2402.18815v1
Compressor summary: The authors study how large language models process multilingual inputs, propose a framework for it, and develop a method to detect language-specific neurons in LLMs.
http://arxiv.org/abs/2402.18811v1
Compressor summary: The paper proposes BFRFormer, a Transformer-based method for blind face restoration that recovers more identity-preserving details, using a wavelet discriminator and an aggregated attention module to address the limitations of convolutional neural networks.
http://arxiv.org/abs/2402.18807v1
Compressor summary: This paper evaluates how well large language models can make decisions in role-playing tasks based on different personality types and provides metrics to improve their decision-making abilities.
http://arxiv.org/abs/2402.18803v1
Compressor summary: The paper proposes generalization error bounds for fair machine learning that leverage the majority group's larger sample size to reduce performance disparities between groups.
http://arxiv.org/abs/2402.18800v1
Compressor summary: BlockEcho is a novel matrix completion method that uses Matrix Factorization within Generative Adversarial Networks to improve imputation of block-wise missing data, especially at higher missingness rates.
http://arxiv.org/abs/2402.18787v1
Compressor summary: The Immunity method enhances a modified Mixture-of-Experts architecture with Random Switch Gates and MI/Position Stability-based losses to improve adversarial robustness of DNNs.
http://arxiv.org/abs/2402.18786v1
Compressor summary: The paper proposes a new imaging system that anonymizes facial images for depression recognition while preserving disease-related features and achieving state-of-the-art privacy protection performance.
http://arxiv.org/abs/2402.18784v1
Compressor summary: This paper proposes a Brain-inspired and Self-based Artificial Intelligence paradigm that emphasizes the role of self in shaping human-level AI models and robotic applications, aiming for real Artificial General Intelligence.
http://arxiv.org/abs/2402.18780v1
Compressor summary: The text describes a new method for evaluating and improving 3D content generation from text prompts using objective metrics and a novel baseline model that reduces artifacts.
http://arxiv.org/abs/2402.18771v1
Compressor summary: NARUTO is a neural system that uses uncertainty learning to create high-quality environment maps and efficiently explore them for active reconstruction tasks.
http://arxiv.org/abs/2402.18766v1
Compressor summary: The paper introduces Gervásio PT*, a new open-source decoder model that sets a new state of the art for neural decoding of Portuguese with instruction data sets.
http://arxiv.org/abs/2402.18762v1
Compressor summary: The paper investigates how to maintain neural network trainability by combining multiple mechanisms of plasticity loss mitigation, such as layer normalization and weight decay, in various settings.