This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-22, generated by Compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2403.13808v1
Compressor summary: Increasing data diversity in self-supervised learning improves performance only when the distribution distance to downstream data is small.
http://arxiv.org/abs/2403.13807v1
Compressor summary: EMCID is a two-stage method that edits massive concepts in text-to-image diffusion models, addressing risks such as outdated content and copyright infringement, while offering scalability for real-world applications.
http://arxiv.org/abs/2403.13805v1
Compressor summary: CLIP and MLLMs have complementary strengths for vision-language recognition tasks; RAR combines them to improve accuracy on fine-grained, few-shot, and zero-shot recognition.
http://arxiv.org/abs/2403.13806v1
Compressor summary: RadSplat is a lightweight real-time rendering method that leverages radiance fields for scene representation and optimization, prunes points for compactness, and applies test-time filtering for speed, achieving state-of-the-art results on complex scenes.
http://arxiv.org/abs/2403.13804v1
Compressor summary: SynGround is a framework that enhances vision-and-language models by combining data-driven learning, knowledge transfer, and mask-attention consistency to improve grounding capabilities and performance on pointing games.
http://arxiv.org/abs/2403.13803v1
Compressor summary: The authors propose a box stability score (BoS) that reflects the stability of bounding boxes in object detection, which correlates with detector accuracy and can be used to assess detectors without test ground truths.
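The stability idea can be sketched roughly: compare boxes predicted on the original input with boxes from a perturbed forward pass and average their overlap. A hypothetical illustration only — the helper names and the matching-by-index assumption are mine, not the paper's:

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def box_stability_score(boxes_orig, boxes_pert):
    # Sketch: mean IoU between boxes from the original forward pass and a
    # perturbed one (e.g. with dropout enabled); higher means more stable.
    return sum(iou(a, b) for a, b in zip(boxes_orig, boxes_pert)) / len(boxes_orig)
```

A perfectly stable detector would score 1.0; the paper's point is that such a score tracks accuracy without needing test ground truth.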
http://arxiv.org/abs/2403.13802v1
Compressor summary: The study proposes Zigzag Mamba, a method that improves speed and memory utilization for visual data generation by addressing spatial continuity issues in the State-Space Model Mamba.
http://arxiv.org/abs/2403.13800v1
Compressor summary: The paper presents a method to generate videos by rewinding time from a single image using neuromorphic event cameras and diffusion models, showing promising results for capturing missed moments in computer vision and photography.
http://arxiv.org/abs/2403.13799v1
Compressor summary: The paper proposes reverse training for large language models to improve their ability to handle reverse relations, such as "B is a feature of A", by doubling the available tokens and training in both forward and reverse directions.
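The core data trick can be sketched in a few lines — a hypothetical simplification of reverse training (real implementations reverse at the word, entity, or sentence level and add special markers):

```python
def reverse_training_examples(tokens):
    # Emit each training sequence in both directions, doubling the
    # available tokens so the model also learns "B is a feature of A"
    # style reverse relations.
    return [list(tokens), list(reversed(tokens))]
```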
http://arxiv.org/abs/2403.13798v1
Compressor summary: The text introduces a neuro-symbolic approach for action quality assessment using computer vision that is more transparent, unbiased, and informative than existing neural models, and applies it to diving.
http://arxiv.org/abs/2403.13797v1
Compressor summary: SWAB is a method that uses optimal transport to transfer statistics between open-source and target datasets, improving zero-shot image classification by selecting the best Pre-Trained VLM from the VLM Zoo based on text data only.
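As generic background on the transport step (standard entropic optimal transport, not SWAB's own code), the Sinkhorn iteration that produces a transport plan between two distributions can be sketched as:

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, iters=200):
    # Entropic optimal transport via Sinkhorn iterations.
    # cost: (n, m) cost matrix; a, b: source/target marginals summing to 1.
    K = np.exp(-cost / reg)
    u = np.ones_like(a, dtype=float)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    # Transport plan whose row/column sums approximate a and b.
    return u[:, None] * K * v[None, :]
```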
http://arxiv.org/abs/2403.13793v1
Compressor summary: The authors evaluate Gemini 1.0 AI models on four potential dangerous capabilities and find no strong evidence of risk, but highlight early warning signs.
http://arxiv.org/abs/2403.13788v1
Compressor summary: The paper proposes a generative method for monocular depth estimation that uses image diffusion as a prior and flow matching to efficiently map from input images to depth maps, achieving state-of-the-art results with low computational cost.
http://arxiv.org/abs/2403.13787v1
Compressor summary: RewardBench is a benchmark dataset and codebase for evaluating reward models used in language model alignment, revealing their strengths and weaknesses on chat, reasoning, and safety tasks.
http://arxiv.org/abs/2403.13786v1
Compressor summary: The Chain-of-Interaction (CoI) prompting method helps large language models understand patient behaviors during motivational interviewing by considering the coding scheme, therapist question strategies, and dyadic interactions between patients and therapists.
http://arxiv.org/abs/2403.13785v1
Compressor summary: The paper presents an extension of Fault Trees to model and analyze Predictive Maintenance problems in modern systems.
http://arxiv.org/abs/2403.13784v1
Compressor summary: The Model Openness Framework (MOF) is a system to rate and promote openness in generative AI models, addressing concerns about transparency, reproducibility, bias, and safety.
http://arxiv.org/abs/2403.13781v1
Compressor summary: This paper proposes a sparse implementation of Graph-Informed layers for Graph Neural Networks that reduces memory usage and improves computational efficiency, enabling deeper and more scalable models on large graphs.
http://arxiv.org/abs/2403.13780v1
Compressor summary: InfoSumm is a novel framework that distills a powerful summarizer using an information-theoretic objective without relying on large-scale language models or human references.
http://arxiv.org/abs/2403.13771v1
Compressor summary: The paper introduces Describe-and-Dissect (DnD), a method that uses multimodal deep learning to generate natural language descriptions of hidden neurons in vision networks without training data or predefined concepts, and shows its superiority over prior work.
http://arxiv.org/abs/2403.13765v1
Compressor summary: The paper investigates the theoretical and empirical aspects of learning latent state representations from video data for decision-making tasks, finding that temporal contrastive learning and forward modeling perform well in settings with only iid noise but struggle with exogenous noise.
http://arxiv.org/abs/2403.13763v1
Compressor summary: The paper proposes a new format for optical music recognition (OMR) models, called Linearized MusicXML, which allows them to read input images and produce a linear sequence of tokens compatible with industry standards, and creates a benchmark dataset based on the OpenScore Lieder corpus.
http://arxiv.org/abs/2403.13761v1
Compressor summary: HierCode is a lightweight codebook that uses a multi-hot encoding strategy to represent Chinese characters hierarchically and facilitate zero-shot recognition of out-of-vocabulary characters, achieving state-of-the-art performance with fast inference speed.
http://arxiv.org/abs/2403.13756v1
Compressor summary: The authors propose a model that uses a pre-trained vision language model to improve the understanding of patient gait videos by learning from text, video, and numerical data, achieving better performance than previous methods.
http://arxiv.org/abs/2403.13754v1
Compressor summary: The study examines how different word-segmentation strategies affect agreement with Spanish plural nouns, finding that a method based on word structure performs similarly to the alternatives and is not necessary for good performance.
http://arxiv.org/abs/2403.13749v1
Compressor summary: The paper introduces a new graph isomorphism test hierarchy, $r$-loopy Weisfeiler-Leman (r-WL), that can count homomorphisms of cactus graphs and a corresponding GNN framework, r-MPNN, which performs well on various datasets.
http://arxiv.org/abs/2403.13747v1
Compressor summary: The authors propose HHNet, a method that uses High-Resolution Networks (HRNets) to learn high-resolution features for efficient image retrieval, outperforming existing methods on various datasets.
http://arxiv.org/abs/2403.13745v1
Compressor summary: MOTIA is a diffusion-based method that adapts to input videos and uses learned patterns for effective video outpainting, achieving superior results with minimal tuning.
http://arxiv.org/abs/2403.13740v1
Compressor summary: Prob-PSENN is a probabilistic reformulation of Deep Neural Networks that offers transparent, flexible, and uncertain prediction explanations using probability distributions over prototypes.
http://arxiv.org/abs/2403.13737v1
Compressor summary: The paper introduces EthioLLM, multilingual large language models for five Ethiopian languages and English, and Ethiobenchmark, a new benchmark dataset for various NLP tasks, to improve the state of low-resource language NLP in Ethiopia.
http://arxiv.org/abs/2403.13728v1
Compressor summary: The authors propose a probabilistic graphical model to optimize neural network parameters and weight multipliers jointly, using a hypervolume-based likelihood that promotes descent of each loss term, resulting in a multiplier-free method that saves computational resources and outperforms other methods.
http://arxiv.org/abs/2403.13724v1
Compressor summary: The paper presents a generative modeling framework for probabilistic forecasting of dynamical systems based on stochastic interpolants, which construct a generative model between base and target distributions via fictitious non-physical stochastic dynamics, with a drift coefficient learned efficiently by square-loss regression, handling complex, high-dimensional problems such as fluid dynamics and video prediction.
http://arxiv.org/abs/2403.13705v1
Compressor summary: The text discusses two types of search algorithms (depth-first and best-first) for minimax games and challenges the prevailing opinion that best-first algorithms are more efficient but less practical than depth-first algorithms.
http://arxiv.org/abs/2403.13703v1
Compressor summary: The paper proposes a lightweight YOLOv5 technique for detecting transmission line objects on mobile devices, improving efficiency and accuracy by integrating C3Ghost and FasterNet modules and using the wIoUv3 loss function.
http://arxiv.org/abs/2403.13684v1
Compressor summary: SPTNet adapts both model and data representation for generalized category discovery, using a novel spatial prompt tuning method that focuses on object parts and achieves state-of-the-art performance with minimal additional parameters.
http://arxiv.org/abs/2403.13683v1
Compressor summary: The Deep Voxel Matching Network (DVMNet) is a method that computes the relative pose of an object between two images in 3D space using voxels and a least-squares problem, achieving more accurate results and lower computational cost than existing methods.
http://arxiv.org/abs/2403.13681v1
Compressor summary: The paper introduces PARAMANU-AYN, a legal language model based on Indian laws and constitution that can perform various legal tasks without much data or fine-tuning.
http://arxiv.org/abs/2403.13679v1
Compressor summary: RoleInteract is a new benchmark to evaluate social intelligence in AI conversational agents that mimic diverse characters and human behaviors.
http://arxiv.org/abs/2403.13678v1
Compressor summary: The paper proposes a novel audio-visual approach to improve facial action unit detection by using advanced features, adaptive fusion, and context-aware modeling.
http://arxiv.org/abs/2403.13677v1
Compressor summary: The proposed Retina Vision Transformer (RetinaViT) model incorporates low spatial frequency components in the input to improve visual scene formation and achieve better performance than the original ViT on ImageNet-1K dataset.
http://arxiv.org/abs/2403.13672v1
Compressor summary: The paper presents a machine-learning approach to optimizing parameters in meshfree simulation software, improving its usability and performance across applications.
http://arxiv.org/abs/2403.13667v1
Compressor summary: The paper introduces DCM, a new dataset that combines camera movement, dance, and music, and proposes DanceCamera3D, a transformer-based model for synthesizing dance camera movements.
http://arxiv.org/abs/2403.13666v1
Compressor summary: Text-only Language Models can learn spatial relations from verbalized locations, outperforming Vision-and-Language Models on a verbalized VSR dataset.
http://arxiv.org/abs/2403.13663v1
Compressor summary: T-Pixel2Mesh is a new method to reconstruct 3D shapes from images using Transformers that improves details and works better with real-world data.
http://arxiv.org/abs/2403.13660v1
Compressor summary: The text introduces a new segmentation model for polyp detection in medical images that uses Prompt-Mamba, which combines Vision-Mamba and prompt technologies to achieve high accuracy and generalization across different datasets.
http://arxiv.org/abs/2403.13658v1
Compressor summary: The paper proposes a novel multimodal variational autoencoder (CardioVAE) that integrates chest X-ray and electrocardiogram data to predict cardiac hemodynamic instability, improving performance and interpretability over single-modality methods.
http://arxiv.org/abs/2403.13653v1
Compressor summary: The paper proposes a method to learn user embeddings for personalized saliency prediction using natural images and eye tracking data, improving performance and generalization.
http://arxiv.org/abs/2403.13652v1
Compressor summary: ZoDi is a zero-shot domain adaptation method that uses diffusion models to synthesize target-like images and train a model with both source and synthesized images, improving image segmentation performance without target images.
http://arxiv.org/abs/2403.13647v1
Compressor summary: The proposed method predicts potential keypoints for arbitrary objects using learnable embeddings, and refines them with support keypoints and a slacked regression loss.
http://arxiv.org/abs/2403.13642v1
Compressor summary: The paper proposes a new model, H-vmunet, for medical image segmentation that improves the 2D-selective-scan (SS2D) module by adding higher-order interactions and a Local-SS2D module to enhance local feature learning.
http://arxiv.org/abs/2403.13638v1
Compressor summary: The paper explores using machine translation to create synthetic data for pre-training language models in low-resource languages, showing improved performance on NLU and NLG tasks with efficient filtering and extended pretraining.
http://arxiv.org/abs/2403.13625v1
Compressor summary: The study describes how gamified learning and training methods can help law enforcement and other professionals better understand and combat cyber-crime and terrorism financing.
http://arxiv.org/abs/2403.13612v1
Compressor summary: The study evaluates the effectiveness of various methods for generating private synthetic biomedical data, focusing on the Mann-Whitney U test's ability to maintain validity and power when applied to such data.
http://arxiv.org/abs/2403.13600v1
Compressor summary: The paper introduces VL-Mamba, a more efficient multimodal language model based on state space models that can handle long sequences with fast inference and linear scaling.
http://arxiv.org/abs/2403.13592v1
Compressor summary: The authors study the political leanings and knowledge of Llama Chat, a large language model, when fine-tuned on EU politics and suggest it could be used for research purposes.
http://arxiv.org/abs/2403.13590v1
Compressor summary: The paper proposes efficient methods to transfer knowledge from debiased large language models to smaller, more reliable ones, improving performance while reducing parameters and computational cost.
http://arxiv.org/abs/2403.13589v1
Compressor summary: The image diffusion model that integrates gated self-attention into the U-Net shows a bias towards spatial cues over text cues, but this can be improved by changing the network architecture without fine-tuning.
http://arxiv.org/abs/2403.13578v1
Compressor summary: The paper proposes two new bandit methods to improve natural language generation by jointly optimizing multiple text qualities in counselor reflection generation.
http://arxiv.org/abs/2403.13570v1
Compressor summary: The paper presents a new learning approach for creating realistic 4D head avatars using pseudo multi-view videos and a vision transformer backbone, outperforming previous methods in various aspects.
http://arxiv.org/abs/2403.13560v1
Compressor summary: The article introduces eRST, a new discourse analysis framework that improves on RST by including more relation types, implicit and explicit signals, and provides tools and a large corpus to support it.
http://arxiv.org/abs/2403.13556v1
Compressor summary: The paper proposes a method to improve LiDAR-based 3D object detection in urban environments by using open-vocabulary learning and multi-sensor data with pre-trained vision-language models, achieving better novel object recall and reducing bias towards camera-proximal objects.
http://arxiv.org/abs/2403.13551v1
Compressor summary: Ground-A-Score is a model-agnostic image editing method that incorporates grounding during score distillation to accurately reflect complex text prompts and preserve object integrity in edited images.
http://arxiv.org/abs/2403.13548v1
Compressor summary: Our method prunes channels based on their sensitivities to latent vector perturbations, improving sample diversity and FID scores in compressed StyleGAN models without extra training cost.
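The sensitivity criterion can be sketched generically — this is my own toy illustration of perturbation-based channel scoring, not the paper's implementation:

```python
import numpy as np

def channel_sensitivities(generate, z, eps=1e-2, trials=4, seed=0):
    # Estimate per-channel sensitivity as the mean change in each output
    # channel's activation when the latent vector z is perturbed.
    # `generate` maps a latent of shape (d,) to activations of shape (C, ...).
    rng = np.random.default_rng(seed)
    base = generate(z)
    reduce_axes = tuple(range(1, base.ndim))
    diffs = []
    for _ in range(trials):
        dz = eps * rng.standard_normal(z.shape)
        diffs.append(np.abs(generate(z + dz) - base).mean(axis=reduce_axes))
    return np.mean(diffs, axis=0)  # one score per channel; prune the lowest
```

Channels whose activations barely move under latent perturbations contribute little to sample diversity and are candidates for pruning.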
http://arxiv.org/abs/2403.13547v1
Compressor summary: The study shows that using features from large language models can improve or match the accuracy of predicting traffic incident severity using conventional machine learning algorithms.
http://arxiv.org/abs/2403.13545v1
Compressor summary: The paper proposes a deep learning method for predicting fire risk in an area based on images representing daily snapshots and various features.
http://arxiv.org/abs/2403.13537v1
Compressor summary: ORCA is a cross-modal fine-tuning technique that performs well on 1D tasks but not 2D tasks, and model fine-tuning is more important than embedder training for its success.
http://arxiv.org/abs/2403.13535v1
Compressor summary: IDAdapter is a new method that creates personalized and diverse avatars from a single face image, using textual and visual inputs and preserving identity details.
http://arxiv.org/abs/2403.13524v1
Compressor summary: The paper introduces a triplane autoencoder with a 3D-aware cross-attention mechanism and a diffusion model for efficient compression and high-speed generation of 3D models from images, achieving superior performance compared to existing methods.
http://arxiv.org/abs/2403.13523v1
Compressor summary: The paper presents a novel approach using characteristic vector representations, which capture intrinsic properties of the data distribution, to detect and filter clean-label poisoned datapoints in transfer-learning settings, outperforming existing defenses in experiments.
http://arxiv.org/abs/2403.13522v1
Compressor summary: REAL is a novel method for exemplar-free class-incremental learning that enhances representation by combining supervised and self-supervised learning, distillation, and analytic learning.
http://arxiv.org/abs/2403.13518v1
Compressor summary: The paper targets generating motion sequences from fine-grained textual descriptions, using the FineHumanML3D dataset and the FineMotionDiffuse model, with fine-grained descriptions constructed via GPT-3.5-turbo.
http://arxiv.org/abs/2403.13514v1
Compressor summary: The study examines political biases in Czech neural language models and finds no systematic alignment with values, but rather superficial imitation of training data patterns.
http://arxiv.org/abs/2403.13513v1
Compressor summary: The paper introduces Counterfactual Inception, a method to reduce hallucination in large multimodal models by injecting counterfactual thoughts using misaligned keywords, and Dual-modality Verification Process for selecting optimal keywords.
http://arxiv.org/abs/2403.13512v1
Compressor summary: SDD improves logit knowledge distillation by separating global logits into local ones and transferring unambiguous, fine-grained knowledge to the student.
http://arxiv.org/abs/2403.13507v1
Compressor summary: The paper introduces a new adversarial attack on video-based language models that can trick them into generating wrong or nonsensical answers by adding subtle perturbations to videos.
http://arxiv.org/abs/2403.13501v1
Compressor summary: The paper introduces VSTAR, a method that improves the generation of longer, more dynamic videos from text using generative temporal nursing with video synopsis prompting and temporal attention regularization.
http://arxiv.org/abs/2403.13499v1
Compressor summary: The text describes how large language models can be used for various computer vision tasks by connecting them to perceptual backbones, and presents an experimental evaluation of different interfacing mechanisms that improves performance and reduces training time.
http://arxiv.org/abs/2403.13485v1
Compressor summary: The paper proposes an Entropy-based Watermark Detection (EWD) for large language models that considers token entropy during detection and improves performance in low-entropy scenarios.
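The entropy-weighting idea can be illustrated with a toy scoring function — a hypothetical sketch of the general mechanism, not EWD's actual detector:

```python
import math

def token_entropy(probs):
    # Shannon entropy of the model's next-token distribution at one step.
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_weighted_score(tokens, green_set, entropies):
    # Weight each token's green-list membership by its entropy, so
    # near-deterministic (low-entropy) tokens barely influence detection.
    weighted_green = sum(h for t, h in zip(tokens, entropies) if t in green_set)
    total = sum(entropies)
    return weighted_green / total if total else 0.0
```

Down-weighting low-entropy tokens is what lets detection stay reliable on highly predictable text such as code.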
http://arxiv.org/abs/2403.13480v1
Compressor summary: The paper proposes a new framework for cross-modal retrieval that uses optimal transport to align semantics, correct noisy labels, and narrow the heterogeneous gap between modalities.
http://arxiv.org/abs/2403.13479v1
Compressor summary: The paper introduces a new method for training deepfake detectors that can recognize various techniques by injecting crafted frequency patterns into pristine images, improving their generalization capabilities.
http://arxiv.org/abs/2403.13470v1
Compressor summary: The paper presents a diffusion model that completes 3D LiDAR scenes from a single scan by operating directly on the points rather than on range images, with a regularization loss that stabilizes noise prediction during denoising, achieving more detailed and complete scenes than existing methods.
http://arxiv.org/abs/2403.13469v1
Compressor summary: The paper proposes a novel method to create a synthetic medical image dataset from the original one while preserving useful information, improving training stability, diversity, and performance using progressive trajectory matching and dynamic overlap mitigation.
http://arxiv.org/abs/2403.13466v1
Compressor summary: The paper proposes an AI-assisted skin care recommendation system integrated into an XR platform that uses a CNN model to analyse a user's skin type from a facial image and questionnaire data, recommending personalised products with high accuracy in classifying skin issues and providing immersive, engaging experiences.
http://arxiv.org/abs/2403.13447v1
Compressor summary: HyperLLaVA improves multimodal task performance by dynamically adjusting visual and language experts using HyperNetworks, unlike static tuning in LLaVA.
http://arxiv.org/abs/2403.13444v1
Compressor summary: The paper proposes a novel method for generating medical reports from X-ray images without paired data, using cycle-consistent mapping functions and report auto-encoding, leading to improved results in chest X-ray report generation.
http://arxiv.org/abs/2403.13443v1
Compressor summary: Fast-Poly is a fast and effective 3D multi-object tracking method that addresses object rotational anisotropy, enhances local computation densification, and leverages parallelization for improved accuracy and speed on large-scale datasets.
http://arxiv.org/abs/2403.13441v1
Compressor summary: The paper studies formal verification problems for neural networks using symbolic specifications and provides a theoretical framework to analyze their complexities in a semi-linear setting.
http://arxiv.org/abs/2403.13434v1
Compressor summary: The study presents a new method for accurate 6D pose estimation in Augmented Reality from uncontrolled RGB images, improving 3D object overlaying and applications in manufacturing and robotics.
http://arxiv.org/abs/2403.13433v1
Compressor summary: The Agent Group Chat simulation explores how language influences human collective behavior by creating diverse narrative scenarios and measuring the disorder of agents' interactions.
http://arxiv.org/abs/2403.13430v1
Compressor summary: This study proposes Multi-Task Pretraining for foundation models in Remote Sensing, which improves downstream tasks by addressing task discrepancy and achieving competitive performance with fewer parameters.
http://arxiv.org/abs/2403.13417v1
Compressor summary: The D-Persona framework aims to achieve both diversified and personalized results in multi-rater medical image segmentation by training a Probabilistic U-Net model and using attention-based projection heads.
http://arxiv.org/abs/2403.13412v1
Compressor summary: The paper proposes a cell tracking method for C. elegans that handles large migrations due to head movement, inconsistent detections, and low-contrast images by using non-rigid alignment and pairwise detection.
http://arxiv.org/abs/2403.13408v1
Compressor summary: The paper proposes a new video generation method called Sector-Shaped Diffusion Model (S2DM) that maintains consistency and continuity across video frames by using sector-shaped diffusion regions and optical flow as temporal conditions.
http://arxiv.org/abs/2403.13405v1
Compressor summary: The paper proposes a novel network for 3D hand pose estimation using ordinal regression, which improves accuracy by reducing noise and outliers in large-scale regression offset values.
http://arxiv.org/abs/2403.13395v1
Compressor summary: The paper proposes UMF, a multi-modal model that improves SLAM performance in challenging environments by cross-attention between vision and LiDAR features and re-ranking based on local feature matching.
http://arxiv.org/abs/2403.13392v1
Compressor summary: The paper proposes a robust image segmentation model that uses intensity inhomogeneity and binary level set to handle noise and improve performance.
http://arxiv.org/abs/2403.13378v1
Compressor summary: The paper proposes a novel diffusion model for semantic image synthesis that uses segmentation masks and style reference images, and improves the generation quality with refinement, color-transfer, and model ensembles techniques.
http://arxiv.org/abs/2403.13376v1
Compressor summary: The paper presents methods for finding similar patterns in microscopy images of heterogeneous organoids using models, algorithms, and clustering techniques.
http://arxiv.org/abs/2403.13375v1
Compressor summary: FOMC is a novel FSOD method for remote sensing images that uses oriented bounding boxes and supervised contrastive learning to improve detection performance for arbitrary-oriented objects with limited annotations.
http://arxiv.org/abs/2403.13372v1
Compressor summary: LlamaFactory is a framework that helps users fine-tune large language models efficiently and effectively using a web UI without coding.
http://arxiv.org/abs/2403.13370v1
Compressor summary: The paper introduces a new problem in multi-class learning, called Learning from the Majority Label, and proposes a counting network to solve it effectively.
http://arxiv.org/abs/2403.13369v1
Compressor summary: The authors evaluate domain-adapted and prompted lightweight models for medical information extraction from German doctor's letters, achieving high accuracy with minimal training data and ensuring interpretability of predictions using Shapley values.
http://arxiv.org/abs/2403.13368v1
Compressor summary: The paper discusses using computational language models in brain research, evaluates their performance, and emphasizes the importance of diverse data and strict experiments.
http://arxiv.org/abs/2403.13352v1
Compressor summary: AGFSync improves image generation by using VLM to assess and provide feedback to T2I diffusion models in a closed AI-driven loop, leading to better quality images and performance on benchmarks.
http://arxiv.org/abs/2403.13351v1
Compressor summary: Orthogonal Capsule Network (OrthCaps) reduces redundancy and improves routing performance in CapsNet using pruning, sparse attention routing, and orthogonal weight matrices, achieving competitive results with significantly fewer parameters.
http://arxiv.org/abs/2403.13349v1
Compressor summary: HGAD is a novel anomaly detection method that uses hierarchical Gaussian mixture modeling to overcome the limitations of existing normalizing flow-based approaches, achieving better performance on real-world datasets.
http://arxiv.org/abs/2403.13347v1
Compressor summary: The authors propose vid-TLDR, a lightweight video Transformer that merges background tokens and focuses on salient regions using attention maps and token dropping, to improve efficiency without sacrificing performance.
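The merging step can be sketched generically — a hypothetical illustration of saliency-based token reduction (assuming one background token and a user-chosen `keep` budget, neither of which is specified by the paper):

```python
import numpy as np

def merge_background_tokens(tokens, saliency, keep=4):
    # Keep the `keep` most salient tokens (by attention-derived scores)
    # and collapse the remaining background tokens into one averaged token.
    # Assumes keep < len(tokens).
    order = np.argsort(saliency)[::-1]
    kept = tokens[order[:keep]]
    background = tokens[order[keep:]].mean(axis=0, keepdims=True)
    return np.concatenate([kept, background], axis=0)
```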
http://arxiv.org/abs/2403.13343v1
Compressor summary: TiBiX is a method that uses temporal information to generate reports and images for chest X-rays, improving the quality of medical information.
http://arxiv.org/abs/2403.13341v1
Compressor summary: The paper proposes a hierarchical merging approach to improve medical imaging performance and robustness by aggregating models based on hyperparameter configurations, leading to better results than model soups on in-domain and out-of-distribution tasks.
http://arxiv.org/abs/2403.13338v1
Compressor summary: The paper proposes Brain-SubGNN, a graph representation network that mines and enhances critical subgraphs based on T1-MRI to improve understanding and diagnosis of early-stage dementia.
http://arxiv.org/abs/2403.13337v1
Compressor summary: The paper proposes a method to decompose illumination, reflectance, and noise from input views under low-light conditions, enabling better synthesis of novel views with improved visual quality.
http://arxiv.org/abs/2403.13335v1
Compressor summary: The study shows that combining multiple transformer-based models with adaptive ensemble algorithms can significantly improve the accuracy of detecting fake text generated by large language models on different types of data.
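One simple form of adaptive ensembling is accuracy-weighted probability averaging — a sketch of the general idea, not the study's specific algorithm:

```python
import numpy as np

def adaptive_ensemble_predict(probs_per_model, val_accuracies):
    # Weight each detector's class probabilities by its normalized
    # validation accuracy, then take the argmax of the combined scores.
    w = np.asarray(val_accuracies, dtype=float)
    w = w / w.sum()
    combined = sum(wi * np.asarray(p) for wi, p in zip(w, probs_per_model))
    return combined.argmax(axis=-1)
```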
http://arxiv.org/abs/2403.13334v1
Compressor summary: The study aims to create a lightweight language model, Hyacinth6B, that performs well without high computational costs by using the LoRA method for efficient finetuning.
http://arxiv.org/abs/2403.13331v1
Compressor summary: The paper introduces an autoregressive method for motion prediction in autonomous driving using GPT-style next token prediction, factorized attention modules, and position encoding styles to capture spatial-temporal relations.
http://arxiv.org/abs/2403.13330v1
Compressor summary: SGENet is an efficient framework for scene text recognition that combines super-resolution and semantic guidance branches to improve accuracy while keeping low computational costs.
http://arxiv.org/abs/2403.13327v1
Compressor summary: Our method adapts to handheld video camera motion and improves scene reconstruction using detailed image formation modeling and differentiable rendering with velocities from visual-inertial odometry.
http://arxiv.org/abs/2403.13324v1
Compressor summary: ODPC uses language models to generate peer classes for OOD detection, improving reliability and security of machine learning models.
http://arxiv.org/abs/2403.13322v1
Compressor summary: The authors present a comprehensive benchmark for evaluating the adversarial robustness of compressed datasets created by various distillation methods, using different attacks and datasets.
http://arxiv.org/abs/2403.13319v1
Compressor summary: The paper introduces a hypernetwork framework that fuses medical imaging with tabular EHR data by conditioning image processing on EHR values and measurements, outperforming single-modality models and existing fusion methods on brain MRI tasks.
http://arxiv.org/abs/2403.13315v1
Compressor summary: The paper introduces PuzzleVQA, a dataset to test large multimodal models on abstract patterns, and finds that they struggle with visual perception and inductive reasoning, suggesting limitations in emulating human cognition.
http://arxiv.org/abs/2403.13313v1
Compressor summary: Polaris is a safety-focused system of large language models that can engage in real-time voice conversations with patients, performing better than previous models and human nurses on medical safety and bedside manner.
http://arxiv.org/abs/2403.13312v1
Compressor summary: The authors use Lean, a theorem proving framework, to improve LLMs' logical reasoning skills by formalizing problems into theorems and solving them with Lean's symbolic solver and library of proofs.
http://arxiv.org/abs/2403.13311v1
Compressor summary: The Multi-Robot Connected Fermat Spiral (MCFS) algorithm helps multiple robots navigate around obstacles for efficient area coverage and smooth paths, improving performance in complex environments.
http://arxiv.org/abs/2403.13307v1
Compressor summary: LaserHuman is a new dataset for generating realistic human motions from natural language descriptions in various 3D environments, with a novel multi-conditional diffusion model that outperforms existing methods.
http://arxiv.org/abs/2403.13304v1
Compressor summary: DetDiffusion is a novel method that combines generative and perceptive models to create high-quality synthetic images for object detection tasks using segmentation and perception-aware attributes.
http://arxiv.org/abs/2403.13298v1
Compressor summary: RoPE enhances Vision Transformer performance on image tasks by improving resolution and precision, while being computationally efficient.
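The rotary position embedding (RoPE) that the summary above refers to can be sketched in a few lines: each consecutive pair of channels is rotated by a position- and frequency-dependent angle. This follows the common RoPE formulation with the usual `base=10000` default, not necessarily the paper's 2D vision variant.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Channel pairs (2i, 2i+1) at position p are rotated by angle
    p * base**(-i / (dim/2)), so relative positions are encoded as
    relative rotations.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "dim must be even so channels pair up"
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)            # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Since each pair is only rotated, token norms are preserved, which is part of why RoPE adds essentially no computational or numerical cost.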
http://arxiv.org/abs/2403.13293v1
Compressor summary: AutoBuild is a scheme that learns to assign importance scores to neural architecture modules using latent embeddings, enabling high-quality network construction without costly search.
http://arxiv.org/abs/2403.13289v1
Compressor summary: Text-to-3D shape generation is a rapidly advancing field with many challenges, and this report surveys the existing methods and suggests future research directions.
http://arxiv.org/abs/2403.13282v1
Compressor summary: AdaViPro is a method that learns to add and remove prompts in different regions of an image to fine-tune pre-trained models efficiently.
http://arxiv.org/abs/2403.13269v1
Compressor summary: AFLoRA is a new fine-tuning method that uses low-rank matrices to adapt pre-trained models with fewer parameters, less computation, and better performance on the GLUE benchmark.
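The low-rank adaptation idea behind AFLoRA can be sketched as a standard LoRA-style layer: a frozen weight matrix plus a trainable low-rank update. The shapes, zero-initialization of `B`, and `alpha` scaling below follow the generic LoRA recipe and are assumptions, not AFLoRA's specific adaptive-freezing mechanism.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """y = x @ (W + (alpha/r) * A @ B), with W frozen and only the
    low-rank factors A (d_in x r) and B (r x d_out) trainable."""
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A) @ B

d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_in, d_out))      # frozen pre-trained weight
A = rng.standard_normal((d_in, r)) * 0.01   # small random init
B = np.zeros((r, d_out))                    # zero init: adapter starts as a no-op
```

With rank 8 the adapter trains `A.size + B.size = 8192` parameters instead of the full `512 * 512 = 262144`, which is where the parameter and compute savings come from.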
http://arxiv.org/abs/2403.13268v1
Compressor summary: Unifews is a novel method that jointly sparsifies edges and weights of graph neural networks, reducing computational cost without sacrificing accuracy.
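Joint edge-and-weight sparsification as described above can be illustrated with simple magnitude pruning applied to both the adjacency matrix and a layer's weights; the keep-fraction thresholding here is a generic sketch, not Unifews' actual pruning criterion.

```python
import numpy as np

def joint_sparsify(adj, W, edge_keep=0.5, weight_keep=0.5):
    """Zero out all but the largest-magnitude fraction of graph edges
    (entries of adj) and of layer weights (entries of W)."""
    def prune(M, keep):
        k = max(1, int(keep * M.size))               # number of entries to keep
        thresh = np.sort(np.abs(M), axis=None)[-k]   # magnitude cutoff
        return np.where(np.abs(M) >= thresh, M, 0.0)
    return prune(adj, edge_keep), prune(W, weight_keep)
```

Pruning both structures at once shrinks the dominant sparse-dense products in GNN message passing, which is the source of the claimed compute savings.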
http://arxiv.org/abs/2403.13263v1
Compressor summary: The paper introduces SC-Tune, a novel fine-tuning method for Large Vision Language Models that improves their self-consistency and object-level comprehension abilities.
http://arxiv.org/abs/2403.13261v1
Compressor summary: The text describes a new method for autonomous driving systems to predict motion using unlabeled LiDAR point clouds and coarse pseudo motion labels, outperforming existing self-supervised methods.
http://arxiv.org/abs/2403.13258v1
Compressor summary: The paper introduces SAMCT, a modified segment anything model for medical imaging that improves performance by adding a U-shaped CNN image encoder, cross-branch interaction, and task-indicator prompt encoder to address the lack of medical knowledge and insufficient feature extraction.
http://arxiv.org/abs/2403.13257v1
Compressor summary: MergeKit is an open-source library that helps merge pre-trained language models to improve performance and versatility without additional training.
http://arxiv.org/abs/2403.13253v1
Compressor summary: The paper explores using grammatical structure information extracted by a statistical natural language parser to detect authorship of texts, and tests the method on The Federalist Papers and Sanditon.
http://arxiv.org/abs/2403.13250v1
Compressor summary: CensorChat is a dialogue monitoring dataset that uses knowledge distillation of large language models to annotate and develop text classifiers for detecting pornographic content in human-machine interaction dialogues.
http://arxiv.org/abs/2403.13249v1
Compressor summary: This paper introduces a unified framework for Continual Learning methods, reveals their common mathematical structures, and proposes refresh learning, an innovative technique inspired by neuroscience to enhance CL performance.
http://arxiv.org/abs/2403.13248v1
Compressor summary: Mora is a new multi-agent framework that mimics Sora's generalist video generation capabilities using several advanced visual AI agents for various tasks.
http://arxiv.org/abs/2403.13246v1
Compressor summary: The authors develop a home EV charging prediction method using historical smart meter data, which can help with load scheduling and energy management, and achieve high accuracy.
http://arxiv.org/abs/2403.13244v1
Compressor summary: The text introduces TSMMG, a large language model that generates molecules based on natural language descriptions and performs well across various tasks and styles.
http://arxiv.org/abs/2403.13241v1
Compressor summary: The paper proposes a method to separate clean and noisy data memorization in deep networks using additive parameter decomposition, improving generalization and reducing overfitting.
http://arxiv.org/abs/2403.13240v1
Compressor summary: The paper proposes a summarize-and-translate pipeline for cross-lingual summarization, which leverages existing resources and shows competitive performance with few-shot fine-tuning.
http://arxiv.org/abs/2403.13238v1
Compressor summary: The paper presents a new framework to generate coherent 4D sequences of 3D shapes with dynamic evolution of shape and color using diffusion models and latent mapping.
http://arxiv.org/abs/2403.13233v1
Compressor summary: The paper presents a third-place solution for the BetterMixture challenge, which uses Ke-Data-Juicer to optimize and filter data for large language models.
http://arxiv.org/abs/2403.13219v1
Compressor summary: The paper proposes a method to optimize complex designs using diffusion models, which generate near-optimal solutions based on noisy rewards or human preferences, preserving latent structures and achieving sub-optimality error bounds.
http://arxiv.org/abs/2403.13218v1
Compressor summary: The paper introduces a new variant of the resonator network that uses self-attention-based update rules for faster and better associative memory tasks like pattern recognition, scene decomposition, and object reasoning.
http://arxiv.org/abs/2403.13213v1
Compressor summary: The paper investigates the effectiveness of safety measures in large language models and shows that safety responses can still encode harmful assumptions, leading to trade-offs between helpfulness and safety.