This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-18, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2403.10521v1
Compressor summary: P-MapNet uses map priors from both SDMap and HDMap to improve online map generation for autonomous vehicles in regions without high-definition maps.
http://arxiv.org/abs/2403.10520v1
Compressor summary: The controllable blind image decomposition network allows users to choose which degradations to remove or keep in an image, using a minimal amount of computational resources.
http://arxiv.org/abs/2403.10519v1
Compressor summary: The paper explores applying data augmentations to frozen features in few-shot image classification tasks, finding that simple pointwise FroFA improves performance consistently across different settings.
http://arxiv.org/abs/2403.10518v1
Compressor summary: Lodge is a network that generates long dance routines based on music using two diffusion stages and foot refinement, achieving physical realism and expressive motions.
http://arxiv.org/abs/2403.10517v1
Compressor summary: VideoAgent is a system that uses an agent-based approach to reason interactively and plan for long-form video understanding, outperforming existing methods on two benchmarks.
http://arxiv.org/abs/2403.10516v1
Compressor summary: FeatUp is a framework to restore spatial resolution in deep features for computer vision tasks such as segmentation and depth prediction.
http://arxiv.org/abs/2403.10511v1
Compressor summary: The paper introduces a novel framework that jointly predicts the gaze target and social gaze label for multiple people in a scene using a temporal, transformer-based architecture and a new dataset.
http://arxiv.org/abs/2403.10502v1
Compressor summary: The text proposes a new quantitative framework for belief change based on knowledge measures that minimizes surprise and satisfies AGM postulates, introducing information measures for contraction, expansion, and revision operations.
http://arxiv.org/abs/2403.10476v1
Compressor summary: The authors propose a method to improve vision transformers' robustness by adding nullspace noise to the input during fine-tuning.
http://arxiv.org/abs/2403.10459v1
Compressor summary: Double descent is when increasing model complexity past the interpolation point lowers test error due to inductive biases selecting smooth empirical risk minimizers.
http://arxiv.org/abs/2403.10452v1
Compressor summary: The text introduces a novel method to infer simple geometric shapes (cuboids) from complex real-world scenes using depth maps and neural networks, without requiring manual annotations.
http://arxiv.org/abs/2403.10446v1
Compressor summary: The authors propose a system that uses Retrieval Augmented Generation to improve the accuracy of Large Language Models on domain-specific and time-sensitive queries, and find that fine-tuning on small, skewed datasets has limitations.
http://arxiv.org/abs/2403.10444v1
Compressor summary: The paper proposes a block-level verification algorithm for speculative decoding that improves wall-clock speedup by considering a wider range of draft verification algorithms and accepting more tokens in expectation.
http://arxiv.org/abs/2403.10434v1
Compressor summary: Spotter+GPT combines a sign spotter with a large language model to generate spoken sentences from sign language videos.
http://arxiv.org/abs/2403.10427v1
Compressor summary: The paper proposes a method to improve 3D Gaussian Splatting for learning 3D scenes from unstructured in-the-wild photos by modeling appearance and handling occluders.
http://arxiv.org/abs/2403.10425v1
Compressor summary: The paper proposes NeuFlow, an efficient learning-based optical flow method that achieves high accuracy and speedup compared to state-of-the-art methods, enabling real-time computer vision tasks on edge computing platforms like drones.
http://arxiv.org/abs/2403.10424v1
Compressor summary: The authors propose an evaluation framework for synthetic tabular data quality based on a single objective that the synthetic data should match the observed data distribution, and show that structured synthesizers perform better than others.
http://arxiv.org/abs/2403.10416v1
Compressor summary: The paper proposes efficient and optimal robust estimators for Gaussian sparse estimation tasks, such as mean estimation, PCA, and linear regression, using a new multidimensional filtering method.
http://arxiv.org/abs/2403.10415v1
Compressor summary: The text discusses the need for explainable AI, especially for neural networks, and presents a taxonomy of gradient based explanation methods along with challenges and evaluations.
http://arxiv.org/abs/2403.10413v1
Compressor summary: The paper proposes a method to efficiently integrate multi-head self-attention into high resolution representation CNNs for image segmentation using architecture search, achieving better efficiency and effectiveness than previous methods.
http://arxiv.org/abs/2403.10404v1
Compressor summary: The study develops models to use Measure While Drilling data for accurate and automated rock mass quality classification, supporting decision making in tunnel engineering.
http://arxiv.org/abs/2403.10403v1
Compressor summary: The paper proposes an EBM-based method for OOD detection that outperforms KNN on CIFAR-10/CIFAR-100 benchmarks.
http://arxiv.org/abs/2403.10395v1
Compressor summary: Isotropic3D is an image-to-3D generation pipeline that uses a CLIP embedding and a text-to-3D diffusion model fine-tuned with Explicit Multi-view Attention to generate consistent, symmetrical, and less distorted 3D models.
http://arxiv.org/abs/2403.10391v1
Compressor summary: CDMAD is a new SSL algorithm that addresses class imbalance and distribution mismatch by refining biased pseudo-labels and ensuring a neutral classifier.
http://arxiv.org/abs/2403.10390v1
Compressor summary: The paper proposes a new method for evaluating perceptual distances using statistical modeling of binomial distributions fitted to human judgments in two-alternative forced choice experiments, improving on previous binary decision approaches.
http://arxiv.org/abs/2403.10381v1
Compressor summary: The paper introduces a method to find and edit numeric property representations in language models, showing how changing these representations can alter the model's output.
http://arxiv.org/abs/2403.10379v1
Compressor summary: The paper presents an anytime variant of the Estimation-To-Decisions algorithm that optimizes exploration-exploitation trade-off online for sequential decision-making in structured bandits and reinforcement learning.
http://arxiv.org/abs/2403.10378v1
Compressor summary: EXAMS-V is a new multilingual exam benchmark that tests vision language models' abilities to reason over text and images from diverse disciplines and regions.
http://arxiv.org/abs/2403.10376v1
Compressor summary: PASTA is a novel framework for HDR deghosting that uses hierarchical representation and feature disentanglement to achieve both effectiveness and efficiency, with a significant 3x speedup compared to current methods.
http://arxiv.org/abs/2403.10373v1
Compressor summary: The paper proposes a framework to use XAI techniques to improve pre-trained DL classifiers without retraining them, using either auto-encoder or encoder-decoder based learning strategies.
http://arxiv.org/abs/2403.10371v1
Compressor summary: This paper proposes a technique called ENAMLE that reduces energy consumption and improves performance in IoT-based machine learning systems facing data incompleteness due to missing sensor readings.
http://arxiv.org/abs/2403.10369v1
Compressor summary: The Open Stamped Parts Dataset (OSPD) contains real and synthetic images of metal sheets with annotations for defect detection, which can help improve automotive manufacturing and computer vision.
http://arxiv.org/abs/2403.10367v1
Compressor summary: The paper evaluates MediaPipe Holistic and OpenFace for landmark tracking of sign languages and finds that both methods need further improvement for linguistic analysis of eyebrow movements.
http://arxiv.org/abs/2403.10357v1
Compressor summary: ANIM is a novel method that accurately reconstructs 3D human shapes from single-view RGB-D images by incorporating depth observations and leveraging multi-resolution features to overcome depth ambiguities.
http://arxiv.org/abs/2403.10353v1
Compressor summary: The paper presents SimPB, a single model that simultaneously detects 2D and 3D objects from multiple cameras using a hybrid decoder and dynamic query allocation modules.
http://arxiv.org/abs/2403.10351v1
Compressor summary: TriSum is a framework that distills large language models' text summarization abilities into a compact, local model, improving performance and interpretability on various benchmarks.
http://arxiv.org/abs/2403.10349v1
Compressor summary: Key points:
- Paper proposes ParaPoint, an unsupervised neural network for parameterizing point clouds with UV coordinates
- Uses sub-networks with specific functionalities and bi-directional cycle mapping framework
- Introduces effective loss functions and differential geometric constraints
- First attempt to achieve global mappings and free boundaries with neural point cloud parameterization
Summary: The paper presents ParaPoint, a novel neural network that maps point clouds to UV coordinates with adaptive boundaries, using sub-networks, cycle mapping, and geometric constraints.
http://arxiv.org/abs/2403.10348v1
Compressor summary: The paper investigates the difficulty of denoising tasks in diffusion-based generative models across different timesteps and proposes an easy-to-hard learning scheme to improve performance and convergence.
http://arxiv.org/abs/2403.10344v1
Compressor summary: SCILLA is a new method for reconstructing large outdoor scenes from images using two implicit fields, one for density and another for distance to the surface, with a novel volume-rendering strategy that works without geometric priors.
http://arxiv.org/abs/2403.10340v1
Compressor summary: Thermal-NeRF is a novel method that uses infrared cameras to reconstruct 3D scenes with high detail under poor lighting conditions, outperforming existing methods.
http://arxiv.org/abs/2403.10339v1
Compressor summary: HedGe is a new GNN model that generates low-variance edges to improve anomaly detection and robustness in graphs.
http://arxiv.org/abs/2403.10338v1
Compressor summary: The text describes an experiment comparing an LSTM and a transformer model's ability to learn and generalize grammatical gender in French, similar to human learning, and finds that both models show a masculine gender bias.
http://arxiv.org/abs/2403.10336v1
Compressor summary: The paper proposes Continuous Scaling Attention (CSAttn), a new attention mechanism for image restoration without using feed-forward networks, and demonstrates its effectiveness on various tasks.
http://arxiv.org/abs/2403.10335v1
Compressor summary: Key points:
- NECA is an approach to learn versatile human representation from videos
- It predicts disentangled neural fields for geometry, albedo, shadow, and lighting
- It enables realistic rendering and editing of human avatars
Summary: NECA learns human avatars from videos and allows customization of their appearance and illumination.
http://arxiv.org/abs/2403.10330v1
Compressor summary: The paper explores the differences between adversarial examples and counterfactual explanations, and argues for obtaining non-adversarial algorithmic recourse in high-stakes situations.
http://arxiv.org/abs/2403.10326v1
Compressor summary: This paper proposes a method to generate cloze distractors using pre-trained language models, making assessments of learner ability more effective.
http://arxiv.org/abs/2403.10318v1
Compressor summary: ATLAS is an efficient anytime Neural Architecture Search approach for tabular data that uses a two-phase filtering-and-refinement optimization scheme and reduces search time by up to 82.75x.
http://arxiv.org/abs/2403.10304v1
Compressor summary: KIF is a framework that integrates different knowledge bases using Wikidata as a common language and provides a unified view and querying options.
http://arxiv.org/abs/2403.10301v1
Compressor summary: Uni-SMART is a multimodal model that improves the analysis of scientific literature by understanding its various elements, such as molecular structure, tables, and charts, and outperforms existing text-focused models in several domains.
http://arxiv.org/abs/2403.10299v1
Compressor summary: The paper presents MSGW-FLM, a new resource allocation model for emergency relief operations that uses IoT and spatio-temporal data analytics to optimize decision-making in complex disaster scenarios.
http://arxiv.org/abs/2403.10298v1
Compressor summary: The paper proposes a novel network (CSQA-Net) for fine-grained visual categorization that uses cross-attention and semantic quality evaluation to improve feature representations and discriminability.
http://arxiv.org/abs/2403.10297v1
Compressor summary: The paper proposes a method to improve keypoint scene coordinate regression (KSCR) by synthesizing novel keypoint descriptors using Neural Radiance Fields, enhancing localization accuracy and generalization in data-scarce environments.
http://arxiv.org/abs/2403.10293v1
Compressor summary: The MaiBaam corpus is the first multi-dialect Bavarian treebank with UD annotation, covering various text genres and illustrating morphosyntactic differences between Bavarian and German.
http://arxiv.org/abs/2403.10291v1
Compressor summary: Key points:
- The text proposes a CNN-based framework to predict myocardial disease substrates from regional strain patterns
- The method converts clinical standard bullseye representation to a multi-channel 2D image for image classification
- The method achieves high accuracy in detecting and localizing myocardial scar on simulated data
Summary: The text presents a CNN-based framework that uses regional strain patterns as input to predict and locate myocardial disease substrates, such as scar, from clinical standard data.
http://arxiv.org/abs/2403.10287v1
Compressor summary: VISE is a training-free method that converts few-shot image classification and segmentation into VQA using VLMs and off-the-shelf vision models, achieving state-of-the-art results.
http://arxiv.org/abs/2403.10283v1
Compressor summary: The paper proposes a hierarchical Visual Place Recognition pipeline that uses training-free and data-efficient local feature encoding, an attention module, and hyperdimensional computing to achieve better performance, speed, and storage efficiency than existing methods.
http://arxiv.org/abs/2403.10281v1
Compressor summary: Pre-CoFactv3 is a framework that uses In-Context Learning, Fine-tuned LLMs, and FakeNet to improve fact verification accuracy and won first place in the AAAI-24 Factify 3.0 Workshop.
http://arxiv.org/abs/2403.10275v1
Compressor summary: The paper proposes a way to measure the quality of explanations from large language models and shows that simpler models have clearer explanations than transformers.
http://arxiv.org/abs/2403.10261v1
Compressor summary: This paper proposes a simple and effective method called Thumbnail Layout (TALL) for deepfake video detection that transforms clips into pre-defined layouts, and enhances it with a graph reasoning block and semantic consistency loss to improve performance.
http://arxiv.org/abs/2403.10259v1
Compressor summary: The study compares various machine learning algorithms to determine the best one for predicting and analyzing machine performance in maintenance applications.
http://arxiv.org/abs/2403.10258v1
Compressor summary: The authors show that English-centric large language models may not perform well on culture-related tasks when prompted in English, and suggest developing stronger multilingual models instead.
http://arxiv.org/abs/2403.10255v1
Compressor summary: The paper proposes a new method for super-resolution and image generation at arbitrary scales using a latent diffusion model, an auto-encoder, and an implicit neural decoder, which improves quality, diversity, and consistency while reducing memory and inference time.
http://arxiv.org/abs/2403.10254v1
Compressor summary: The paper introduces a new framework called EDITOR that uses tokenized features from different modalities and adaptive selection to improve multi-modal object re-identification robustness and discrimination.
http://arxiv.org/abs/2403.10253v1
Compressor summary: The paper proposes a novel framework that combines continual learning and granular-ball computing for feature selection in data preprocessing, addressing unknown classes and knowledge transfer challenges.
http://arxiv.org/abs/2403.10252v1
Compressor summary: The study proposes a novel method for partially supervised multi-task dense prediction by aligning region-wise Gaussian distributions to capture cross-task relationships, achieving state-of-the-art results on two benchmarks.
http://arxiv.org/abs/2403.10249v1
Compressor summary: This paper reviews LM-based agents for games and their challenges, and suggests future research directions.
http://arxiv.org/abs/2403.10245v1
Compressor summary: The paper presents CoLeCLIP, a novel method for open-domain continual learning of vision-language models that uses task prompts and a cross-domain class vocabulary, improving performance on 11 domain datasets.
http://arxiv.org/abs/2403.10242v1
Compressor summary: The paper presents FDGaussian, a novel framework for single-image 3D reconstruction that uses orthogonal plane decomposition and epipolar attention to generate consistent multi-view images and high-quality 3D objects.
http://arxiv.org/abs/2403.10239v1
Compressor summary: The paper presents a new method to analyze news articles and social networks to understand how regional factors influence FDI in African companies.
http://arxiv.org/abs/2403.10237v1
Compressor summary: The study aims to improve topic detection in Persian language by adapting existing methods and comparing their performance on social network texts using a new evaluation criterion.
http://arxiv.org/abs/2403.10236v1
Compressor summary: The paper proposes a method for generating density maps for object counting using various prompt types, improving accuracy with fixed-point inference and contrastive training.
http://arxiv.org/abs/2403.10231v1
Compressor summary: One-shot-subgraph link prediction efficiently predicts links in knowledge graphs by extracting a query-dependent subgraph and using Personalized PageRank to identify potential answers and evidence.
http://arxiv.org/abs/2403.10228v1
Compressor summary: HawkEye is a new video-text language model that can answer questions about long videos by using temporal information, thanks to a large-scale dataset and two new training objectives.
http://arxiv.org/abs/2403.10220v1
Compressor summary: AERO is a novel framework for anomaly detection in astronomical observations that uses a Transformer encoder-decoder and a window-wise graph neural network to distinguish normal patterns from concurrent noise, reducing false alarms.
http://arxiv.org/abs/2403.10216v1
Compressor summary: The study proposes using optical flow maps as an additional input to the nnU-Net framework for improved surgical instrument segmentation in laparoscopy by leveraging the temporal information of moving objects.
http://arxiv.org/abs/2403.10214v1
Compressor summary: ECAN is a novel model for aspect-category-based sentiment analysis that uses coherence modeling and hierarchical disentanglement to identify and extract multiple aspect categories and their sentiments from reviews.
http://arxiv.org/abs/2403.10211v1
Compressor summary: BlindDiff is a diffusion model-based method for blind image super-resolution that integrates MAP optimization, modulated conditional transformer, and kernel-aware gradient to handle complex unknown degradations and achieve state-of-the-art performance.
http://arxiv.org/abs/2403.10205v1
Compressor summary: The paper proposes functionality extraction from Git README files, a challenging text generation task, and introduces FuncRead dataset and models that outperform existing large language models for this task.
http://arxiv.org/abs/2403.10191v1
Compressor summary: GenerateU is a framework for generative open-ended object detection that can detect and name objects without predefined categories, using Deformable DETR and a language model.
http://arxiv.org/abs/2403.10190v1
Compressor summary: The text discusses annotator label uncertainty and its negative effects on model performance, and proposes a new method to generate multiple labels for training models without requiring massive annotations.
http://arxiv.org/abs/2403.10185v1
Compressor summary: The DeFaBel corpus explores the relation between deception and factuality based on personal belief, and shows that people are more confident in arguments aligned with their beliefs.
http://arxiv.org/abs/2403.10184v1
Compressor summary: Lifted inference speeds up causal inference in relational domains by using symmetries and representative objects.
http://arxiv.org/abs/2403.10182v1
Compressor summary: This study compares different neural network ensembles for uncertainty estimation in operations research, finding the batch ensemble to be cost-effective and competitive while reducing training and test time and memory usage.
http://arxiv.org/abs/2403.10179v1
Compressor summary: The SMCD method combines semantic and motion cues in a diffusion model to improve text-to-video generation by enhancing control over video outputs.
http://arxiv.org/abs/2403.10175v1
Compressor summary: Importance weighting is a simple and useful technique in statistics and machine learning that adjusts the objective function or probability distribution based on the importance of instances, with many applications such as handling distribution shift.
http://arxiv.org/abs/2403.10173v1
Compressor summary: The authors propose a novel hybrid attention-based spiking neural network (SNN) and artificial neural network (ANN) architecture for object detection using event cameras, achieving ANN-like performance with reduced parameters and low latency on neuromorphic hardware.
http://arxiv.org/abs/2403.10171v1
Compressor summary: AUTONODE is an AI system that uses neuro-graphical techniques and learning from experience to autonomously navigate and execute complex tasks on web interfaces without predefined scripts or manual intervention.
http://arxiv.org/abs/2403.10170v1
Compressor summary: The paper introduces computer UI understanding, a challenging task, and presents a dataset of videos showing user actions on desktop interfaces, along with a framework that uses synthetic samples and contrastive learning for fine-grained UI classification.
http://arxiv.org/abs/2403.10168v1
Compressor summary: The paper proposes a general uncertainty framework for neural networks that links uncertainty estimation to explainable AI, reduces misclassifications with human expert input, and improves trustworthiness under distribution shifts.
http://arxiv.org/abs/2403.10167v1
Compressor summary: The paper introduces a new algorithm, DEFT, which efficiently finds exchangeable factors in a factor graph, enabling faster probabilistic inference with lifted models.
http://arxiv.org/abs/2403.10166v1
Compressor summary: SemanticHuman-HD is a novel method that achieves semantic disentangled human image synthesis at high resolution using 3D-aware super-resolution and reduced computational cost.
http://arxiv.org/abs/2403.10164v1
Compressor summary: CoReEcho is a novel training framework for echocardiogram analysis that improves performance, explainability, and generalization compared to existing deep learning models.
http://arxiv.org/abs/2403.10160v1
Compressor summary: The paper proposes a framework that combines offline and virtual preferences for learning reward functions in preference-based reinforcement learning, improving generalizability and guidance for agents.
http://arxiv.org/abs/2403.10158v1
Compressor summary: The paper presents a novel framework called funGCN that combines Functional Data Analysis and Graph Convolutional Networks to handle multi-task and multi-modal learning in digital health and longitudinal studies, ensuring interpretability and performance.
http://arxiv.org/abs/2403.10153v1
Compressor summary: eCLIP is an improved CLIP model that uses radiologist eye-gaze heatmaps as expert annotations to enhance multi-modal medical imaging analysis, particularly addressing data scarcity and modality gap issues.
http://arxiv.org/abs/2403.10147v1
Compressor summary: GGRt is a novel framework that enables fast and efficient generalizable novel view synthesis without real camera poses using 3D Gaussian Splatting and joint learning.
http://arxiv.org/abs/2403.10145v1
Compressor summary: Key points:
- Roadside perception is important for autonomous driving and traffic management
- Existing approaches have limitations in sensing range and blind spots
- RCooper aims to achieve area-coverage roadside perception for restricted traffic areas
- The paper releases the first large-scale RCooper dataset with annotations
Summary: The paper introduces RCooper, a new approach for area-coverage roadside perception in autonomous driving and traffic management, and releases the first large-scale annotated dataset for it.
http://arxiv.org/abs/2403.10144v1
Compressor summary: The paper proposes a methodology to evaluate and improve the safety and reliability of deep neural networks in natural language processing by addressing technical challenges and introducing new metrics for verification pipelines.
http://arxiv.org/abs/2403.10133v1
Compressor summary: The paper proposes a zero-shot image editing method, E4C, which enhances editability and alignment with CLIP guidance using a dual-branch feature-sharing pipeline.
http://arxiv.org/abs/2403.10131v1
Compressor summary: The paper introduces RAFT, a method that trains large language models to answer questions by ignoring distractor documents and citing the relevant sequence from the documents that do help answer the question.
http://arxiv.org/abs/2403.10127v1
Compressor summary: TransLandSeg is a transfer learning approach for landslide detection that improves efficiency and accuracy by adapting the powerful segmentation capability of SAM with only 1.3% of its parameters.
http://arxiv.org/abs/2403.10124v1
Compressor summary: DISCN is a network that uses eye movement analysis with salient attention and serial attention modules to detect Alzheimer's disease from visual stimuli.
http://arxiv.org/abs/2403.10123v1
Compressor summary: CLDSSMs are a novel approach that combines deep state-space models with continual learning methods to adapt to evolving tasks without forgetting previous ones.
http://arxiv.org/abs/2403.10119v1
Compressor summary: The proposed method improves NeRF performance by estimating camera poses and velocities from unordered rolling shutter images without sequential data constraints.
http://arxiv.org/abs/2403.10112v1
Compressor summary: The paper proposes NeuroEvolution-based methods for centralized and decentralized active hypothesis testing with eavesdroppers, showing improved performance in anomaly detection over wireless sensor networks.
http://arxiv.org/abs/2403.10110v1
Compressor summary: The text proposes a meta-learning algorithm for improving complex query answering by learning meta-operators, which are more generalizable than the existing approaches.
http://arxiv.org/abs/2403.10107v1
Compressor summary: The paper proposes a framework that uses multiple large language models to improve video-based human-object interaction detection and enhance the decision-making of robots and autonomous systems.
http://arxiv.org/abs/2403.10104v1
Compressor summary: CSDNet is a lightweight network that uses spatial information prescreening and implicit coherence navigation to integrate two less-coherent modalities for salient object detection in robotic perception, outperforming methods using RGB-T or RGB-D modalities with faster runtime and fewer FLOPs.
http://arxiv.org/abs/2403.10103v1
Compressor summary: DyBluRF is a dynamic NeRF method that synthesizes sharp novel views from motion-blurred monocular videos by capturing camera and object trajectories and using cross-time rendering.
http://arxiv.org/abs/2403.10099v1
Compressor summary: KP-RED is a method to match object scans with CAD models using sparse keypoints, embedding space, and neural cage deformation.
http://arxiv.org/abs/2403.10098v1
Compressor summary: The paper proposes a Diffusion-Information-Diffusion framework to correct face degradation across diverse scenarios using AdaIN and manifold information bottleneck, achieving high generalization and quality.
http://arxiv.org/abs/2403.10097v1
Compressor summary: AdaRand is a simple method for improving fine-tuning of deep neural networks without auxiliary source information by adapting feature vector distributions using class conditional Gaussian distributions.
http://arxiv.org/abs/2403.10094v1
Compressor summary: RangeLDM is a novel approach that uses latent diffusion models to rapidly generate high-quality range-view LiDAR point clouds for autonomous driving with accurate projection, compression, and 3D structural fidelity preservation.
http://arxiv.org/abs/2403.10088v1
Compressor summary: CoARL is a new framework that generates effective counterspeech for online hate speech by modeling social biases and using multi-instruction tuning and reinforcement learning.
http://arxiv.org/abs/2403.10087v1
Compressor summary: The study presents an improved SE-InceptionV3 model that detects monkeypox with 96.71% accuracy using SENet and L2 regularization, outperforming conventional and deep learning methods.
http://arxiv.org/abs/2403.10085v1
Compressor summary: Key points:
- A novel framework for point cloud registration with broad applicability
- Spherical voxel feature representation to handle different densities and distributions
- Hierarchical correspondence filtering to remove outliers and mismatches
- High performance in both homologous and cross-source registration scenarios
- Code available at https://github.com/GuiyuZhao/VRHCF
Summary: The authors present a versatile point cloud registration framework that uses spherical voxels and hierarchical correspondence filtering to achieve high performance in both homologous and cross-source registration scenarios, with code available online.
http://arxiv.org/abs/2403.10082v1
Compressor summary: Key points:
- Proposed method uses text descriptions from LLMs to guide feature learning for one-shot action recognition
- Text descriptions are used in a global-local-global way to focus on informative joints and form global representation
- Dual-branch architecture allows inference without text input and reduces cost compared to base encoder
- Outperforms existing methods and can enhance other skeleton encoders
Summary: The paper proposes a method that uses text descriptions from LLMs to guide feature learning for one-shot action recognition, achieving better performance and efficiency than existing methods.
http://arxiv.org/abs/2403.10081v1
Compressor summary: DRAGIN is a new framework that improves text generation by dynamically deciding when and what to retrieve based on the real-time information needs of large language models.
http://arxiv.org/abs/2403.10079v1
Compressor summary: The paper proposes an unsupervised object-centric prediction model that learns visual dynamics between objects for future predictions, outperforming existing methods in visual quality and physical reliability.
http://arxiv.org/abs/2403.10075v1
Compressor summary: This paper reviews various techniques for creating synthetic data to augment computer vision tasks when real data is scarce or unavailable, covering approaches based on 3D graphics, neural style transfer, differential neural rendering, GANs, and VAEs.
http://arxiv.org/abs/2403.10071v1
Compressor summary: VQCT is a new framework that uses language model-pretrained codebooks and part-of-speech knowledge to improve image synthesis with vector-quantized image modeling.
http://arxiv.org/abs/2403.10069v1
Compressor summary: The paper proposes a new active finetuning framework that selects samples for annotation based on diversity and uncertainty using pseudo-class centers, denoising, and iterative boundary sample selection in high-dimensional feature space.
http://arxiv.org/abs/2403.10068v1
Compressor summary: The paper proposes a new framework for multi-agent perception that improves collaboration, preserves individual view information, and reduces communication volume.
http://arxiv.org/abs/2403.10066v1
Compressor summary: CoPA is a novel contrastive pre-training framework for point cloud quality assessment that learns quality-aware representations from unlabeled data using mixed images with multiple distortions, improving generalization and performance over existing methods.
http://arxiv.org/abs/2403.10065v1
Compressor summary: The Triple GNNs network improves dialogue sentiment analysis by using graph convolution and attention to capture syntactic dependencies within utterances and inter-utterance interactions.
http://arxiv.org/abs/2403.10063v1
Compressor summary: The paper presents new projection-free algorithms for continuous DR-submodular optimization with various scenarios and constraints, achieving sub-linear regret bounds in both non-monotone and monotone settings.
http://arxiv.org/abs/2403.10061v1
Compressor summary: The paper proposes a self-supervised pre-training framework using masked autoencoders to improve no-reference point cloud quality assessment without labeled data.
http://arxiv.org/abs/2403.10058v1
Compressor summary: RID-Twin is a new method that uses advanced generative models to de-identify faces in videos by separating identity from motion, while addressing various challenges in the field.
http://arxiv.org/abs/2403.10056v1
Compressor summary: The paper proposes a novel continual instruction tuning method for large language models using Key-part Information Gain (KPIG) to alleviate catastrophic forgetting and improve task-aware information capture.
http://arxiv.org/abs/2403.10054v1
Compressor summary: The article presents an industrial automation method using digital image processing and optimization techniques for generating optimal routes in a warehouse area.
http://arxiv.org/abs/2403.10053v1
Compressor summary: The authors propose Group-Mix SAM, a lightweight version of MobileSAM, which can be deployed in practical assembly line scenarios due to its reduced size and computational cost, while maintaining similar performance.
http://arxiv.org/abs/2403.10052v1
Compressor summary: The paper proposes a masked autoencoder (MAE) for trajectory prediction that learns from actor-specific token memory and adapts to distribution shifts, improving accuracy and efficiency over existing methods.
http://arxiv.org/abs/2403.10050v1
Compressor summary: Texture-GS is a novel approach to disentangle appearance and geometry in 3D Gaussian splatting, enabling high-fidelity appearance editing and real-time rendering.
http://arxiv.org/abs/2403.10047v1
Compressor summary: The proposed scene text spotter uses a pre-trained language model and text block detection to recognize texts in images without precise detection, achieving better performance on complex scenarios.
http://arxiv.org/abs/2403.10044v1
Compressor summary: SphereDiffusion is a novel framework for generating high-quality, controllable spherical panoramic images by addressing unique challenges such as spherical distortion and geometry characteristics using text encoding, deformable techniques, and improved data diversity.
http://arxiv.org/abs/2403.10039v1
Compressor summary: The authors propose a method to improve unsupervised surgical instrument segmentation by enhancing optical flow quality and reducing manual annotations.
http://arxiv.org/abs/2403.10037v1
Compressor summary: The text proposes two models that condense and reason with external knowledge to improve visual question answering, achieving state-of-the-art results.
http://arxiv.org/abs/2403.10036v1
Compressor summary: SparseFusion is a novel framework that uses sparse 3D features to enable efficient long-range 3D object detection, outperforming dense detectors and achieving state-of-the-art results on several tasks.
http://arxiv.org/abs/2403.10030v1
Compressor summary: MCTF is a method to improve the efficiency and accuracy of Vision Transformers by fusing tokens based on multiple criteria and using one-step-ahead attention.
http://arxiv.org/abs/2403.10022v1
Compressor summary: The paper proposes a method to maintain compatibility with old models and reduce computational complexity in lifelong person re-identification by using cross-model compatibility loss and knowledge consolidation.
http://arxiv.org/abs/2403.10020v1
Compressor summary: Watermark collision is a problem for detecting text copyright in large language models, as two watermarks can interfere with each other's detection.
http://arxiv.org/abs/2403.10015v1
Compressor summary: The paper proposes a method for classifying point sets with spatial deformations using Linear Optimal Transport, which simplifies the problem and achieves competitive results.
http://arxiv.org/abs/2403.10012v1
Compressor summary: The paper introduces a novel Domain Adaptive CAC (DACAC) approach that uses unpaired real-world data to improve the performance of Computational Aberration Correction in real-world applications, by proposing a Quantized Domain-Mixing Representation (QDMR) framework.
http://arxiv.org/abs/2403.10004v1
Compressor summary: Text-grounded Object Generation (TOG) is a new image editing scenario where text descriptions guide the creation of objects in real images, and ST-LDM is a framework that uses Swin-Transformer to improve spatial perception and attention in latent diffusion models.
http://arxiv.org/abs/2403.10001v1
Compressor summary: The paper proposes VFMSeg, a novel pipeline that uses visual foundation models to improve cross-modal unsupervised domain adaptation for 3D point cloud segmentation by generating more accurate labels and enhancing neural networks with semantically augmented data.
http://arxiv.org/abs/2403.09998v1
Compressor summary: The paper introduces a binary point cloud Transformer model that compresses neural network weights and activations for point cloud processing, and proposes a binarization mechanism called dynamic-static hybridization to address performance degradation.
http://arxiv.org/abs/2403.09997v1
Compressor summary: The text discusses how natural language processing and machine learning can help identify hereditary health risks from electronic health records for precision health applications.
http://arxiv.org/abs/2403.09996v1
Compressor summary: The paper presents MEDPNet, a high-precision adaptive point cloud registration method for complex die castings, and introduces DieCastCloud, a dataset tailored for this task.
http://arxiv.org/abs/2403.09993v1
Compressor summary: The study presents a novel deep learning-based rain generator that models the physical mechanism of rain generation, simulates the expected rain, adapts to diverse rainy images, and improves deraining and downstream tasks with more controllable and diverse samples.
http://arxiv.org/abs/2403.09981v1
Compressor summary: The paper introduces MVControl, a neural network architecture for controllable text-to-3D generation, using input condition images and camera poses to guide optimization-based 3D creation with efficient multi-stage 3D Gaussians representation.
http://arxiv.org/abs/2403.09977v1
Compressor summary: EfficientVMamba is a novel lightweight model that combines state space models and efficient skip sampling to achieve competitive performance in various vision tasks with reduced computational complexity.
http://arxiv.org/abs/2403.09976v1
Compressor summary: The paper proposes a method to distinguish between task-irrelevant visual distractors using Implicit Action Generator (IAG) and implicit action-informed world models, which improves performance on various visual control tasks with both heterogeneous and homogeneous distractors.
http://arxiv.org/abs/2403.09975v1
Compressor summary: The authors propose a new method, NoiseEraSAR, that improves skeleton-based action recognition by reducing label noise, achieving new state-of-the-art results.
http://arxiv.org/abs/2403.09974v1
Compressor summary: The paper proposes TES, a method that uses CLIP to generate pseudo text embeddings for unlabelled samples and a dual-branch framework to enhance visual and semantic information in the generalized category discovery (GCD) task, achieving state-of-the-art results.
http://arxiv.org/abs/2403.09973v1
Compressor summary: Key points:
- Custom mobile multi-camera system for large-space dense light field capture
- Aim to contribute to 3D scene reconstruction algorithms and immersive VR/AR experiences
- Used 40 GoPro 10 cameras, captured images of 5k resolution, at least 1000 photos per scene
- Included elements such as sky, reflections, lights and shadows
- Validated dataset on three popular algorithms and integrated into Unity engine for VR realism
Summary: The authors present a custom mobile system that captures high-quality and dense light field images of large outdoor scenes, which can enhance 3D scene reconstruction and immersive VR/AR experiences.
http://arxiv.org/abs/2403.09972v1
Compressor summary: The paper proposes a two-step framework to better estimate the confidence of multiple answers generated by large language models, which can improve their trustworthiness.
http://arxiv.org/abs/2403.09969v1
Compressor summary: The paper proposes a temporal convolutional network (TCN) model that fuses AIS, pilotage booking, and meteorological data to predict vessel arrival time with high accuracy and low error.
http://arxiv.org/abs/2403.09963v1
Compressor summary: This paper investigates prompt bias in pre-trained language models, shows its negative impact on benchmark accuracy, and proposes a representation-based approach to mitigate it during inference time.
http://arxiv.org/abs/2403.09962v1
Compressor summary: The paper introduces a new model that uses vision transformers to improve machine abstract reasoning abilities on the Raven dataset, which mimics human reasoning tests.
http://arxiv.org/abs/2403.09953v1
Compressor summary: This paper introduces LeBeD, a metric to evaluate how well trained graph neural networks (GNNs) can generalize to real-world graphs with distribution shifts, by measuring learning behavior discrepancies in node prediction and structure reconstruction.
http://arxiv.org/abs/2403.09948v1
Compressor summary: RadCLIP is a new AI model that uses language-image pre-training to improve medical image analysis by understanding radiological data better than existing models.
http://arxiv.org/abs/2403.09947v1
Compressor summary: The authors propose a deep learning model that enhances local features for knee osteoarthritis grade classification using the Swin Transformer, improving accuracy and robustness in medical imaging diagnostics.
http://arxiv.org/abs/2403.09939v1
Compressor summary: This study examines how quantizing neural networks affects their perceptual fields, particularly class activation maps (CAMs), across six different CNN architectures, shedding light on the alignment between CAMs and human visual saliency maps and revealing the sensitivities of different models to quantization.
http://arxiv.org/abs/2403.09930v1
Compressor summary: QDAC is an advanced deep reinforcement learning algorithm that learns diverse and high-performing skills for adapting to complex situations.