This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-08, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.04699v1
Compressor summary: The paper presents a novel method using transformers with local and global attention for efficient and accurate 3D reconstruction from novel views.
http://arxiv.org/abs/2407.04697v1
Compressor summary: The paper introduces VCoME, a framework for generating coherent and visually appealing verbal videos with multimodal editing effects, which is 85 times more efficient than professional editors.
http://arxiv.org/abs/2407.04694v1
Compressor summary: The Situational Awareness Dataset (SAD) is a benchmark that tests the self-knowledge and situational awareness of large language models (LLMs), which are important for their capacity to plan and act autonomously, but also raise novel risks.
http://arxiv.org/abs/2407.04693v1
Compressor summary: The paper proposes an iterative self-training framework that scales up hallucination annotation and improves the accuracy of the annotator, outperforming GPT-4 and achieving state-of-the-art results in hallucination detection.
http://arxiv.org/abs/2407.04690v1
Compressor summary: Interpretability research using counterfactual theories faces issues with capturing multiple causes and transitive dependencies in neural networks, which affects the accuracy of causal graph extraction and interpretation.
http://arxiv.org/abs/2407.04688v1
Compressor summary: The paper presents a new method to collect video data on lane-specific weaving patterns, which can help improve traffic management systems.
http://arxiv.org/abs/2407.04681v1
Compressor summary: The paper proposes a new visual prompt approach to integrate fine-grained external knowledge into multimodal large language models, improving their ability to understand detailed or localized visual elements.
http://arxiv.org/abs/2407.04678v1
Compressor summary: The paper introduces XQSV, a deep learning architecture for Xiangqi that can change its structure and mimic human players with high accuracy and indistinguishability.
http://arxiv.org/abs/2407.04676v1
Compressor summary: The study found strong associations between plantar thermography clusters and diabetic foot ulceration risk factors, but the clusters themselves were not predictive of those risk factors.
http://arxiv.org/abs/2407.04663v1
Compressor summary: The paper introduces an unsupervised deep learning method for tracking myocardial motion from low-resolution echocardiography images, which improves accuracy and speed over existing methods.
http://arxiv.org/abs/2407.04651v1
Compressor summary: The authors propose a few-shot fine-tuning strategy for adapting Segment Anything (SAM) to medical image segmentation tasks, which reduces user interaction and improves efficiency by using embeddings from annotated slices as prompts.
http://arxiv.org/abs/2407.04638v1
Compressor summary: The authors propose a semi-supervised segmentation method that uses unlabeled images and a few labeled ones to train a model, improving hip bone segmentation in CT scans with less data.
http://arxiv.org/abs/2407.04629v1
Compressor summary: This paper proposes EDF, a novel framework that improves open NER LLMs' performance in clinical named entity recognition by decomposing the task into sub-entity retrievals and filtering incorrect entities.
http://arxiv.org/abs/2407.04622v1
Compressor summary: The paper explores how different oversight protocols involving AI and human judges perform across various tasks with information asymmetry and finds that debate outperforms consultancy and is comparable to direct question-answering depending on the task type.
http://arxiv.org/abs/2407.04621v1
Compressor summary: OneRestore is a transformer-based framework that adapts to different image degradation scenarios by merging scene descriptors with image features using cross-attention, achieving superior restoration results.
http://arxiv.org/abs/2407.04620v1
Compressor summary: The authors propose Test-Time Training layers that use a hidden state as a machine learning model and update it during self-supervised training, achieving linear complexity with expressive power comparable to Transformer and RNNs in sequence modeling.
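The core idea — a hidden state that is itself a small model, updated by a self-supervised step on each incoming token — can be illustrated with a minimal sketch. Everything here (the linear inner model, the reconstruction loss, the single gradient step per token) is an assumption chosen for illustration, not the paper's exact update rule:

```python
import numpy as np

def ttt_layer(tokens, lr=0.1):
    """Conceptual sketch of a test-time-training layer: the hidden state W
    is the weight matrix of an inner linear model, updated by one gradient
    step per token on a self-supervised reconstruction loss (assumed here)."""
    d = tokens.shape[1]
    W = np.zeros((d, d))  # hidden state = parameters of the inner model
    outputs = []
    for x in tokens:
        outputs.append(W @ x)  # emit output using the current hidden state
        # gradient of 0.5 * ||W x - x||^2 with respect to W
        grad = (W @ x - x)[:, None] @ x[None, :]
        W -= lr * grad  # the "training" in test-time training
    return np.stack(outputs)

rng = np.random.default_rng(0)
seq = rng.normal(size=(16, 8))  # toy sequence: 16 tokens of dimension 8
out = ttt_layer(seq)
```

Because the per-token update is a constant-cost gradient step, the layer scales linearly in sequence length, which is the complexity claim in the summary.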
http://arxiv.org/abs/2407.04619v1
Compressor summary: The paper proposes CountGD, a model that can count objects in images using text or visual exemplars or both, and shows its superior performance on multiple counting benchmarks.
http://arxiv.org/abs/2407.04617v1
Compressor summary: The rPINN method is a faster and more effective alternative to HMC for uncertainty quantification in inverse PDE problems with noisy data.
http://arxiv.org/abs/2407.04616v1
Compressor summary: Isomorphic Pruning is a simple approach that effectively prunes heterogeneous sub-structures in advanced vision models, improving accuracy and reducing computation.
http://arxiv.org/abs/2407.04615v1
Compressor summary: The paper proposes an efficient way to improve language models with task-specific rewards for real world applications like detoxification and sentiment control.
http://arxiv.org/abs/2407.04604v1
Compressor summary: PartCraft is a generative visual AI system that allows users to select and combine parts of objects for fine-grained, faithful, and plausible results.
http://arxiv.org/abs/2407.04603v1
Compressor summary: AWT is a novel adaptation framework that enhances vision-language models for various visual tasks by augmenting inputs, dynamically weighting them, and mining semantic correlations in the vision-language space.
http://arxiv.org/abs/2407.04600v1
Compressor summary: Self-distillation improves performance, especially with multiple steps, and can significantly reduce excess risk in linear regression.
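Multi-step self-distillation in linear regression is easy to sketch: fit a ridge model, replace the labels with its predictions, and refit. The ridge form and the three rounds below are illustrative assumptions; each refit on teacher predictions progressively shrinks the fitted predictions, which is the regularization effect the summary refers to:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: (X'X + lam I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
n, d = 50, 10
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.5 * rng.normal(size=n)

lam = 1.0
targets = y
for step in range(3):  # multi-step self-distillation
    w = ridge_fit(X, targets, lam)
    targets = X @ w    # next round trains on the teacher's predictions
```

Each round multiplies the targets by the ridge hat matrix, whose eigenvalues are below one, so repeated distillation acts like progressively stronger regularization.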
http://arxiv.org/abs/2407.04597v1
Compressor summary: FADeR improves unsupervised anomaly detection using minimal changes and fewer layers in a neural network, reducing false alarms and increasing efficiency for edge computing.
http://arxiv.org/abs/2407.04593v1
Compressor summary: The study explores how English speakers learn exceptions to the passive voice rule using neural network language models, and finds that verb frequency in the passive affects its passivizability more than semantics.
http://arxiv.org/abs/2407.04592v1
Compressor summary: This paper shows how to recognize emotions from smell-related artworks using style transfer and hyperparameter tuning, and suggests ways to improve it further.
http://arxiv.org/abs/2407.04591v1
Compressor summary: The paper proposes and analyzes three variants of the online proximal point method for solving time-varying convex-concave games, achieving near-optimal duality gap and dynamic Nash equilibrium regret bounds in benign environments.
http://arxiv.org/abs/2407.04590v1
Compressor summary: The researchers developed a system using object detection and neural networks to detect and verify the proper use of personal protective equipment in various industries, creating a dataset and achieving promising results.
http://arxiv.org/abs/2407.04587v1
Compressor summary: The paper introduces modal-aware interactive enhancement (MIE), a novel multimodal learning method that uses sharpness aware minimization and gradient modification to balance learning speeds, improve generalization, and reduce modality forgetting.
http://arxiv.org/abs/2407.04581v1
Compressor summary: The paper discusses how integrating Large Language Models into Integrated Satellite, Aerial, and Terrestrial Networks can improve connectivity and performance by enhancing data processing, network management, and advanced algorithms.
http://arxiv.org/abs/2407.04579v1
Compressor summary: GOALPlace is a new learning-based method that improves placement congestion by controlling cell density, achieving superior or comparable results to commercial tools.
http://arxiv.org/abs/2407.04560v1
Compressor summary: The authors are developing a system that uses deep learning to detect facial expressions and display matching emojis in real-time for various applications such as education and entertainment.
http://arxiv.org/abs/2407.04559v1
Compressor summary: The paper proposes a method to measure visual storytelling quality based on human likeness and evaluates several models, finding LLaVA as the best performer but with room for improvement.
http://arxiv.org/abs/2407.04549v1
Compressor summary: The paper investigates how iterative self-refinement in language models can lead to reward hacking, where the generated output optimizes the evaluator's ratings instead of actual user preference.
http://arxiv.org/abs/2407.04545v1
Compressor summary: The paper proposes a novel method for generating facial expressions using low-dimensional linear spaces based on dynamic 3D Gaussians, enabling real-time rendering and control.
http://arxiv.org/abs/2407.04543v1
Compressor summary: The paper proposes intermediate pre-training for Transformers to improve their inductive biases for syntactic transformations in seq2seq tasks, leading to better few-shot learning and structural generalization.
http://arxiv.org/abs/2407.04541v1
Compressor summary: The authors present PoPreRo, a dataset for predicting the popularity of Romanian Reddit posts, and show that it is a challenging task even for state-of-the-art models.
http://arxiv.org/abs/2407.04538v1
Compressor summary: The paper shows that pre-trained transformer-based vision models enable more flexible object part detection and improve fine-grained classification tasks, challenging restrictive assumptions on part properties.
http://arxiv.org/abs/2407.04534v1
Compressor summary: This study proposes a new way to categorize out-of-distribution samples (inside and outside) and analyzes their impact on machine learning models' performance.
http://arxiv.org/abs/2407.04533v1
Compressor summary: SSL models pretrained on speech improve SLU and ASR for low-resource Tunisian Arabic dialect, especially when fine-tuned with limited data.
http://arxiv.org/abs/2407.04528v1
Compressor summary: The paper compares different fine-tuning techniques for large language models, showing that RETRO models excel in zero-shot settings while GPT models have higher potential with parameter efficiency, and 8B models strike an optimal balance between cost and performance.
http://arxiv.org/abs/2407.04522v1
Compressor summary: This review discusses the potential of graph neural networks (GNNs) and reinforcement learning (RL) for improving power grid control, addressing different use cases and challenges in transmission and distribution grids.
http://arxiv.org/abs/2407.04519v1
Compressor summary: The paper proposes JFS, a method to assess the success of segmentation refinement using few-shot segmentation, which evaluates the quality of refined masks compared to coarse masks.
http://arxiv.org/abs/2407.04516v1
Compressor summary: The paper proposes a novel graph neural network approach that optimizes mesh adaptivity in finite element methods by minimizing the solution error, improving accuracy and efficiency compared to classical and previous machine learning methods.
http://arxiv.org/abs/2407.04513v1
Compressor summary: The paper proposes training methods for vision transformers that allow them to adapt to different layer execution orders, random merging, and layer pruning at test time, but with some reduction in accuracy.
http://arxiv.org/abs/2407.04505v1
Compressor summary: The paper evaluates deep learning architectures for hyperspectral image segmentation and shows that combining spectral and spatial information improves results, while also releasing a new dataset for the task.
http://arxiv.org/abs/2407.04504v1
Compressor summary: The paper introduces SA4D, a framework for segmenting objects in 4D digital scenes using 4D Gaussians and temporal identity features, with applications like removal, recoloring, composition, and rendering of masks.
http://arxiv.org/abs/2407.04493v1
Compressor summary: PROUD is a deep generative model that optimizes multiple properties simultaneously, preserving sample quality and achieving Pareto optimality in image and protein generation tasks.
http://arxiv.org/abs/2407.04491v1
Compressor summary: The authors introduce RealMLP, an improved MLP, and better default parameters for GBDTs and RealMLP, which offer a good time-accuracy tradeoff and can achieve excellent results on tabular data without hyperparameter tuning.
http://arxiv.org/abs/2407.04490v1
Compressor summary: The paper presents HFUT-VUT, a system for recognizing micro-gestures in videos that ranked 2nd in a challenge.

http://arxiv.org/abs/2407.04489v1
Compressor summary: The paper proposes a new framework for few-shot learning that uses dual contexts and unbalanced optimal transport theory to improve feature representation, alignment, and image augmentation, achieving better results than existing methods.
http://arxiv.org/abs/2407.04485v1
Compressor summary: The authors propose a method using graph attention networks and contrastive learning to detect hallucinations in large language models and improve their trustworthiness.
http://arxiv.org/abs/2407.04484v1
Compressor summary: This paper investigates how different infrared processing methods affect pedestrian detection in low-visibility urban scenarios and recommends using a specific shutterless pipeline with tonemapping for autonomous driving.
http://arxiv.org/abs/2407.04481v1
Compressor summary: The paper proposes using Petri nets to integrate RL models in real-world domains, improving trustworthiness by enabling verification of model properties and enforcing constraints.
http://arxiv.org/abs/2407.04480v1
Compressor summary: LoCo is a method that compensates gradients before compression, ensuring efficient synchronization and maintaining training quality for large-scale models.
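The general idea of compensating gradients before compression can be sketched as error feedback: add the residual left over from the previous round before compressing, and carry the new residual forward. The top-k compressor and this exact update are assumptions for illustration, not LoCo's algorithm:

```python
import numpy as np

def topk_compress(g, k):
    """Keep only the k largest-magnitude entries of g (a common compressor)."""
    idx = np.argsort(np.abs(g))[-k:]
    out = np.zeros_like(g)
    out[idx] = g[idx]
    return out

def compensated_step(grad, error, k):
    """Error-feedback sketch: compensate the gradient with the residual
    from the previous round, compress, and store the new residual."""
    corrected = grad + error
    compressed = topk_compress(corrected, k)
    new_error = corrected - compressed  # what was lost, reinjected next round
    return compressed, new_error

g = np.array([0.5, -2.0, 0.1, 3.0])
err = np.zeros_like(g)
sent, err = compensated_step(g, err, k=2)
```

Because the dropped mass is reinjected on the next round, no gradient information is permanently lost, which is what preserves training quality under aggressive compression.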
http://arxiv.org/abs/2407.04476v1
Compressor summary: The paper compares patch-based and full-model inputs for point cloud upsampling and finds that patch-based methods perform better.
http://arxiv.org/abs/2407.04467v1
Compressor summary: This study examines how large language models perform in strategic games and finds they have systematic biases that affect their decision-making, especially when game settings or prompts change.
http://arxiv.org/abs/2407.04466v1
Compressor summary: The authors present CIViC Evidence, a sequence classification problem in medical NLP, and compare different language models and GPT-4's few-shot performance on it.
http://arxiv.org/abs/2407.04461v1
Compressor summary: The paper proposes a novel texture synthesis method for 3D shapes using a 2D-3D collaborative denoising framework, which addresses the modal gap and improves the consistency of textures by aligning variances and refining inpainting.
http://arxiv.org/abs/2407.04459v1
Compressor summary: The paper compares general and special-purpose language models on Urdu tasks, finding that special-purpose models perform better and GPT-4-Turbo is more aligned with human evaluation than Llama-3-8b-Instruct.
http://arxiv.org/abs/2407.04458v1
Compressor summary: DMRNet is a novel method for robust multimodal learning that samples embeddings from probabilistic distributions instead of fixed points, allowing it to capture modality-specific information and perform better than existing methods.
http://arxiv.org/abs/2407.04451v1
Compressor summary: HPL uses hindsight information to model human preferences for offline RL, resulting in more robust and advantageous rewards.
http://arxiv.org/abs/2407.04449v1
Compressor summary: The paper proposes using Electronic Health Record (EHR) data to improve self-supervised learning for chest X-ray images by incorporating it into a Masked Siamese Network.
http://arxiv.org/abs/2407.04444v1
Compressor summary: TokenVerse is a single Transducer-based model for conversational intelligence that handles multiple tasks by integrating task-specific tokens during training and improving ASR and task performance over the cascaded pipeline approach.
http://arxiv.org/abs/2407.04440v1
Compressor summary: The paper presents a wavelet-based neural network model that efficiently captures spatio-temporal correlations in traffic flow data and outperforms existing methods on three real-world datasets.
http://arxiv.org/abs/2407.04434v1
Compressor summary: The paper proposes a catalogue of gender-exclusive terms and their neutral alternatives, which is used to fine-tune LLMs and reduce gender stereotyping.
http://arxiv.org/abs/2407.04419v1
Compressor summary: The paper studies the complexity of finding symmetry breaking predicates (SBPs) for solving constraint programming problems like SAT, and shows that certifying graph non-isomorphism is a natural barrier for efficient SBP computation.
http://arxiv.org/abs/2407.04407v1
Compressor summary: The paper proposes a new conformal prediction method for classification tasks that uses rank-based scores to capture uncertainty and provides better coverage than existing methods.
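The split-conformal recipe the paper builds on can be sketched in a few lines: calibrate a quantile of nonconformity scores on held-out data, then return every class whose score falls under that quantile. The score used here (one minus the model's probability for the true class) is a generic stand-in, not the paper's rank-based score:

```python
import numpy as np

def conformal_sets(cal_scores, test_probs, alpha=0.1):
    """Split conformal prediction sketch with a generic nonconformity score
    (1 - softmax probability); the paper's rank-based score differs."""
    n = len(cal_scores)
    # finite-sample-corrected quantile of the calibration scores
    q = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n,
                    method="higher")
    # include every class whose score is within the calibrated threshold
    return [np.where(1 - p <= q)[0] for p in test_probs]

rng = np.random.default_rng(0)
cal_scores = rng.uniform(0, 0.5, size=100)       # stand-in calibration scores
test_probs = rng.dirichlet(np.ones(5), size=3)   # stand-in softmax outputs
sets = conformal_sets(cal_scores, test_probs)
```

The coverage guarantee comes entirely from the calibrated quantile; the choice of score only changes how tight and adaptive the resulting sets are, which is where a rank-based score can improve on a raw-probability one.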
http://arxiv.org/abs/2407.04406v1
Compressor summary: The text proposes an optimization problem and algorithm for finding the best mapping between two Hilbert spaces using density matrix measurements and quantum channels, which generalizes unitary learning and allows studying probabilistic mixtures and superpositions.