This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-12, generated by Compressor, my personal LLM-based project.
http://arxiv.org/abs/2403.06978v1
Compressor summary: APT is a computationally efficient method that injects prompts into attention mechanisms for video action recognition tasks, reducing FLOPs and latency while improving performance.
http://arxiv.org/abs/2403.06977v1
Compressor summary: VideoMamba is a novel video understanding model that adapts Mamba to videos, overcoming limitations of existing models and achieving scalability, efficiency, and robustness in various tasks.
http://arxiv.org/abs/2403.06976v1
Compressor summary: BrushNet is a new model that improves image inpainting by dividing the masked image features and noisy latent into separate branches, leading to better results than existing models.
http://arxiv.org/abs/2403.06974v1
Compressor summary: The paper introduces a framework to enhance offline 3D scene perception models with online capabilities using adapters that leverage temporal information and memory.
http://arxiv.org/abs/2403.06971v1
Compressor summary: The paper proposes a game-based method for learning dimensionality-reducing representations using prior knowledge of future prediction tasks.
http://arxiv.org/abs/2403.06970v1
Compressor summary: The paper proposes a fast and accurate "flipped pipeline" approach to syntactic parsing, which is important for information extraction in resource-scarce, morphologically rich languages (MRLs) where existing systems are slow and complex, using Hebrew NLP tasks as a test case.
http://arxiv.org/abs/2403.06966v1
Compressor summary: Di-SkilL is an RL method for learning diverse skills using Mixture of Experts and maximum entropy optimization, with energy-based models for handling hard discontinuities and multi-modality.
http://arxiv.org/abs/2403.06965v1
Compressor summary: The text discusses how Construction Grammar can help explain meaning in language constructions, tests Large Language Models on understanding a specific construction (caused-motion), and proposes a novel pipeline for collecting annotated linguistic data using NLP tools.
http://arxiv.org/abs/2403.06963v1
Compressor summary: The paper discusses how autoregressive inference and teacher-forced training are different phases of next-token prediction, and argues that teacher-forcing can fail to learn an accurate predictor in certain tasks.
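The distinction the paper draws can be made concrete with a toy sketch (my own illustration, not the paper's analysis): a hypothetical next-token predictor sees ground-truth prefixes during teacher-forced training but must consume its own outputs at inference time.

```python
# Toy illustration of the two "phases" of next-token prediction.
# `toy_model` is a stand-in predictor, not any real model.

def toy_model(prefix):
    # Hypothetical predictor: echoes the last token, or 0 for an empty prefix.
    return prefix[-1] if prefix else 0

def teacher_forced_loss(model, sequence):
    """Training: the model always conditions on the ground-truth prefix."""
    errors = 0
    for t in range(1, len(sequence)):
        pred = model(sequence[:t])          # ground-truth prefix
        errors += int(pred != sequence[t])
    return errors

def autoregressive_rollout(model, start, length):
    """Inference: the model conditions on its own previous outputs."""
    seq = [start]
    for _ in range(length - 1):
        seq.append(model(seq))              # model-generated prefix
    return seq
```

Because the echo model never sees its own mistakes during teacher-forced training, a low training loss gives no guarantee about the quality of an autoregressive rollout, which is the gap the paper argues can make teacher forcing fail on certain tasks.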
http://arxiv.org/abs/2403.06961v1
Compressor summary: The authors propose a new self-attention mechanism for medical image diagnosis that provides better visual insights and enhances trust in AI decisions.
http://arxiv.org/abs/2403.06953v1
Compressor summary: The paper evaluates object-centric deep learning methods for improving surgical scene understanding across different medical centers and proposes a new approach (LG-DG) that significantly outperforms existing methods.
http://arxiv.org/abs/2403.06952v1
Compressor summary: SELMA improves text-to-image models by using automatically generated data sets to teach different skills and then merging specialized models for faithful image generation.
http://arxiv.org/abs/2403.06951v1
Compressor summary: DEADiff improves text controllability in text-to-image models by decoupling style and semantics using Q-Formers and non-reconstructive learning.
http://arxiv.org/abs/2403.06947v1
Compressor summary: The paper presents a new framework for remote photoplethysmography that uses both explicit and implicit prior knowledge to improve performance across different domains and noise sources.
http://arxiv.org/abs/2403.06946v1
Compressor summary: Our UniMoS framework disentangles CLIP's features and trains them separately, improving unsupervised domain adaptation performance for vision-language models.
http://arxiv.org/abs/2403.06936v1
Compressor summary: The paper introduces a new task called CFKGR that links knowledge graph completion and counterfactual reasoning, and proposes COULDD, a method for adapting knowledge graph embeddings to detect plausible changes in hypothetical scenarios while retaining original facts.
http://arxiv.org/abs/2403.06935v1
Compressor summary: The text discusses how humans use various expressions for describing objects in images and explores whether large language models can mimic this feature, finding mixed results.
http://arxiv.org/abs/2403.06932v1
Compressor summary: ERA-CoT is a new method that helps large language models understand context and reason about multiple entities using Chain-of-Thoughts, leading to improved performance on various natural language processing tasks.
http://arxiv.org/abs/2403.06925v1
Compressor summary: This paper explores how transformers have a lower sensitivity bias than other neural network architectures, which leads to better robustness and simplicity across different data modalities.
http://arxiv.org/abs/2403.06914v1
Compressor summary: MEND is a method to improve the efficiency and effectiveness of in-context learning by distilling demonstrations without retraining, achieving better performance than vanilla ICL and other distillation models.
http://arxiv.org/abs/2403.06912v1
Compressor summary: DNGaussian is a framework that uses depth regularization to improve real-time novel view synthesis from sparse input views with low training costs and fast inference speed.
http://arxiv.org/abs/2403.06910v1
Compressor summary: The paper introduces a unified definition of responsible AI, focusing on ethical, explainable, and privacy-preserving AI methods, to help guide future regulation and development.
http://arxiv.org/abs/2403.06908v1
Compressor summary: FreGS is a technique that improves 3D Gaussian splatting by regulating image frequencies during Gaussian densification, resulting in better real-time novel view synthesis quality.
http://arxiv.org/abs/2403.06906v1
Compressor summary: DeCCaF is a new L2D framework that learns human error probabilities with less data and minimizes error costs under workload constraints in cost-sensitive scenarios.
http://arxiv.org/abs/2403.06903v1
Compressor summary: The paper investigates benign overfitting in two-layer leaky ReLU networks trained with hinge loss on binary classification tasks and shows that high SNR leads to benign overfitting, while low SNR leads to harmful overfitting, both due to approximate margin maximization.
http://arxiv.org/abs/2403.06902v1
Compressor summary: The paper proposes a novel data-driven adaptive Chirp-Z Transform estimator for remote heart rate estimation from rPPG signals, achieving outstanding performance across diverse datasets.
http://arxiv.org/abs/2403.06895v1
Compressor summary: The research improves GRIT, a relation recognition model, by introducing new features, creating two versions with different sizes, and applying quantization techniques for efficient deployment on mobile devices.
http://arxiv.org/abs/2403.06892v1
Compressor summary: The paper introduces OmDet-Turbo, a fast and accurate transformer-based model for open-vocabulary object detection with an Efficient Fusion Head module.
http://arxiv.org/abs/2403.06884v1
Compressor summary: The study introduces TrafficDojo, a holistic traffic simulation framework for evaluating vision-based traffic signal control methods that reduce congestion and emissions by using end-to-end learning and optimization of traffic signals.
http://arxiv.org/abs/2403.06880v1
Compressor summary: The text discusses how reward transitions in reinforcement learning tasks, inspired by toddlers' learning from sparse feedback to dense rewards, affect sample efficiency, success rates, and generalization.
http://arxiv.org/abs/2403.06874v1
Compressor summary: The paper presents a supervised framework that combines multiple out-of-distribution (OOD) measures into a single COOD measure for anomaly and novel class detection in species recognition, evaluated on three large-scale biodiversity datasets, where it outperforms individual OOD measures by a large margin while also accounting for ID images misclassified on the original task.
http://arxiv.org/abs/2403.06872v1
Compressor summary: MESc is a deep-learning framework that uses multi-stage encoder-based supervised clustering to predict legal judgments from large, non-uniform, and unstructured documents, outperforming previous methods by 2 points.
http://arxiv.org/abs/2403.06871v1
Compressor summary: This paper presents a theoretical framework to understand how unsupervised pre-training affects the generalization of fine-tuned models, and proposes a novel regularization method for better performance.
http://arxiv.org/abs/2403.06870v1
Compressor summary: The paper proposes a continual learning method that uses CLIP to select prompts and transfer their semantics to ViT layers via a residual mechanism, avoiding the catastrophic forgetting caused by changing prompt keys and outperforming state-of-the-art approaches, including on datasets with a domain gap.
http://arxiv.org/abs/2403.06869v1
Compressor summary: This paper analyzes and mitigates label noise in large-scale pre-training datasets to improve generalization and reduce risks in foundation models.
http://arxiv.org/abs/2403.06866v1
Compressor summary: The paper presents a new non-parametric method for evaluating image quality and aesthetics that outperforms existing approaches, requires no additional engineering, and agrees well with human judgments.
http://arxiv.org/abs/2403.06862v1
Compressor summary: SimXR is a method that uses information from VR/AR headsets to control a humanoid avatar's movement in simulation, combining headset poses and image analysis.
http://arxiv.org/abs/2403.06860v1
Compressor summary: The study develops a deep learning model for predicting locust breeding grounds from spatio-temporal input features, showing that multi-spectral earth observation images alone suffice for prediction without additional environmental or climatic data, outperforming existing baselines and potentially enhancing early warning systems and control measures.
http://arxiv.org/abs/2403.06857v1
Compressor summary: The study develops a Caregiving Language Model (CaLM) using small language models and a caregiving knowledge base, finding it performs better than GPT-3.5 in supporting family caregivers of individuals with Alzheimer's Disease Related Dementias.
http://arxiv.org/abs/2403.06854v1
Compressor summary: The paper analyses how sensitive inverse reinforcement learning (IRL) is to misspecification of behavioural models and provides conditions for when IRL is robust or not.
http://arxiv.org/abs/2403.06846v1
Compressor summary: DiaLoc is a dialog-based localization framework that uses multimodal data and iterative refinement to achieve state-of-the-art results in embodied dialog-based localization tasks, both in single-shot and multi-shot settings.
http://arxiv.org/abs/2403.06845v1
Compressor summary: DriveDreamer-2 is a system that uses a large language model to generate customized driving videos with high quality and coherence to enhance the training of driving perception methods.
http://arxiv.org/abs/2403.06843v1
Compressor summary: The paper proposes a machine learning approach to identify risk factors for infant resuscitation at birth and aims to develop a mobile app for healthcare personnel to use in the delivery room.
http://arxiv.org/abs/2403.06840v1
Compressor summary: RA-ISF is a framework that improves large language models' problem-solving by iteratively decomposing tasks and integrating external knowledge, outperforming existing methods when applied to GPT-3.5 and Llama2.
http://arxiv.org/abs/2403.06837v1
Compressor summary: The authors propose a new method called stochastic cortical self-reconstruction (SCSR) that creates subject-specific healthy reference ranges for assessing cortical atrophy in neurodegenerative diseases using MRI data and various machine learning models.
http://arxiv.org/abs/2403.06835v1
Compressor summary: The proposed model generates detailed and accurate synthetic medical images by aligning descriptive text prompts with image features using fine-grained alignment techniques.
http://arxiv.org/abs/2403.06833v1
Compressor summary: The text introduces a formal measure to evaluate the instruction-data separation in LLMs, a new dataset (SEP) to estimate it, and shows that current LLMs lack this separation.
http://arxiv.org/abs/2403.06832v1
Compressor summary: The paper proposes SNAG, a Transformer-based method that learns robust multi-modal entity features in knowledge graphs using modality-level noise masking and specific training objectives for two tasks: MKGC and MMEA.
http://arxiv.org/abs/2403.06831v1
Compressor summary: The HDRTransDC network combines TDCAM and DWFB to generate high-quality HDR images by eliminating ghosting artifacts and fusion distortions in multi-exposure LDR images.
http://arxiv.org/abs/2403.06829v1
Compressor summary: The paper introduces a method to improve regression by discretizing continuous variables, creating value thresholds, training classifiers, and concatenating outputs into an enriched vector.
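The general enrichment idea can be sketched as follows (a toy version with assumed details, not the paper's pipeline: thresholds are taken at evenly spaced quantiles and the per-threshold classifier is a trivial decision stump).

```python
# Toy sketch: discretize a continuous target, train one binary
# classifier per threshold, and concatenate their outputs into an
# enriched feature vector for a downstream regressor.

def quantile_thresholds(ys, k):
    """k value thresholds taken at evenly spaced quantiles of y."""
    s = sorted(ys)
    return [s[int(len(s) * (i + 1) / (k + 1))] for i in range(k)]

def fit_stump(xs, labels):
    """Trivial binary classifier: one cut on x minimizing training error."""
    best_cut, best_err = None, float("inf")
    for cut in xs:
        err = sum(int((x >= cut) != lab) for x, lab in zip(xs, labels))
        if err < best_err:
            best_cut, best_err = cut, err
    return lambda x, c=best_cut: int(x >= c)

def enriched_features(xs, ys, k=3):
    """One classifier per threshold; concatenate their outputs per sample."""
    taus = quantile_thresholds(ys, k)
    clfs = [fit_stump(xs, [int(y > t) for y in ys]) for t in taus]
    return [[clf(x) for clf in clfs] for x in xs]
```

Each sample thus gains a k-dimensional binary vector encoding which value thresholds its target is predicted to exceed, which the paper's approach feeds to the regressor alongside or in place of the raw inputs.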
http://arxiv.org/abs/2403.06826v1
Compressor summary: The paper introduces ICEE, an efficient algorithm for online policy learning in offline RL that balances exploration and exploitation within a Transformer model without Bayesian inference.
http://arxiv.org/abs/2403.06814v1
Compressor summary: The text describes a novel contextual multi-armed bandit approach for adaptive deep brain stimulation to treat Parkinson's disease, which improves efficiency and reduces side effects compared to traditional methods.
http://arxiv.org/abs/2403.06813v1
Compressor summary: LeOCLR is a new framework for contrastive learning of visual representations that improves representation learning by ensuring shared regions between positive pairs are semantically correct, outperforming baseline models on different datasets.
http://arxiv.org/abs/2403.06812v1
Compressor summary: The paper proposes algorithms for online learning with individual fairness that use monotone aggregation functions to collect feedback from multiple auditors, achieving better regret and fairness violations bounds than previous methods.
http://arxiv.org/abs/2403.06807v1
Compressor summary: Multistep Consistency Models combine consistency and diffusion models to balance sampling speed and quality, achieving impressive results on image generation tasks.
http://arxiv.org/abs/2403.06806v1
Compressor summary: The paper analyzes the convergence rate of policy gradient for infinite horizon average reward Markov decision processes and shows that it converges at a sublinear rate with finite-time performance guarantees.
http://arxiv.org/abs/2403.06804v1
Compressor summary: Shape Non-rigid Kinematics (SNK) is a novel method for matching non-rigid shapes that doesn't require training or ground truth data, using an encoder-decoder architecture and an unsupervised functional map.
http://arxiv.org/abs/2403.06803v1
Compressor summary: The authors propose a data-independent operator (DIO) to detect fake images generated by various models, achieving state-of-the-art performance without requiring training or large models.
http://arxiv.org/abs/2403.06800v1
Compressor summary: MambaMIL uses a sequence model to improve feature extraction and overfitting in computational pathology, outperforming existing Multiple Instance Learning approaches.
http://arxiv.org/abs/2403.06797v1
Compressor summary: The paper proposes using deep learning to generate informative samples from sparse data for training autonomous systems securely.
http://arxiv.org/abs/2403.06793v1
Compressor summary: The paper proposes a lightweight module called PTG-RM that uses pre-trained models to improve image restoration tasks such as enhancing low-light images, removing rain, blur, and noise from images.
http://arxiv.org/abs/2403.06786v1
Compressor summary: The paper proposes interpretable metrics to predict how well augmentation policies work for sim-to-real object detection tasks and introduces GeneticAugment, a method that uses these metrics to automatically design augmentation policies.
http://arxiv.org/abs/2403.06775v1
Compressor summary: The paper proposes a method to improve text-to-image generation by modeling subjects as derived classes that inherit both public and private attributes from their categories, leading to more realistic and imaginative attribute-related generations.
http://arxiv.org/abs/2403.06771v1
Compressor summary: The paper proposes a novel framework for characterizing group dynamics in temporal data using archetypal events defined by facets, enabling richer and more reliable analyses of complex group relationships.
http://arxiv.org/abs/2403.06769v1
Compressor summary: The paper proposes TRIP, a dialogue agent that can tailor its strategic planning for diverse users and perform well on non-collaborative dialogue tasks.
http://arxiv.org/abs/2403.06768v1
Compressor summary: XB-MAML is a meta-learning method that learns expandable basis parameters to form an effective initialization for diverse unseen tasks by adaptively expanding them based on discrepancy with fine-tuned parameters.
http://arxiv.org/abs/2403.06765v1
Compressor summary: The paper introduces ConspEmoLLM, an open-source natural language processing model that detects and analyzes conspiracy theories by incorporating affective features such as sentiment and emotions.
http://arxiv.org/abs/2403.06764v1
Compressor summary: The study identifies attention inefficiency in large vision-language models and introduces FastV, a method to optimize efficiency and performance in image and video understanding tasks.
http://arxiv.org/abs/2403.06759v1
Compressor summary: The paper proposes mL1-ACE, a novel loss function to improve medical image segmentation by reducing overconfidence and calibration errors, while maintaining high Dice scores.
http://arxiv.org/abs/2403.06758v1
Compressor summary: The paper proposes EarthLoc, a novel model that uses image retrieval to efficiently and accurately localize astronaut photography for scientific research and disaster response.
http://arxiv.org/abs/2403.06757v1
Compressor summary: The paper proposes a new method for training multiple models to predict uncertain outcomes in dynamical systems, such as weather, by encouraging them to disagree with each other.
http://arxiv.org/abs/2403.06754v1
Compressor summary: ALaRM is a framework that models hierarchical rewards to improve the alignment of large language models with human preferences in complex text generation tasks.
http://arxiv.org/abs/2403.06745v1
Compressor summary: The paper introduces a novel supervised fine-tuning mechanism for multilingual neural machine translation that improves performance and reduces off-target issues by automatically constructing constrained templates with trigger tokens.
http://arxiv.org/abs/2403.06741v1
Compressor summary: DistDiff is a data expansion framework that uses hierarchical prototypes to generate distribution-consistent samples, improving performance of deep models without additional training.
http://arxiv.org/abs/2403.06738v1
Compressor summary: V3D uses pre-trained video diffusion models to create high-quality 3D objects from single images with geometrical consistency and fast generation speed.
http://arxiv.org/abs/2403.06735v1
Compressor summary: The paper presents a method to improve text-generative models for image captions using Supervised Learning, Reinforcement Learning with Human Feedback, and a novel loss function based on the Flickr8k dataset.
http://arxiv.org/abs/2403.06734v1
Compressor summary: CognitiveEMS is a wearable system that uses speech recognition, graph-based attention, and action recognition to assist EMS responders in real-time during emergencies.
http://arxiv.org/abs/2403.06728v1
Compressor summary: The paper proposes LM-RRG, a novel radiology report generation method that combines large models with clinical quality reinforcement learning to produce accurate and comprehensive chest X-ray reports.
http://arxiv.org/abs/2403.06726v1
Compressor summary: The paper proposes ProCo, a probabilistic contrastive learning algorithm that estimates class distributions using mixture of von Mises-Fisher distributions and samples contrastive pairs accordingly to handle imbalanced data.
http://arxiv.org/abs/2403.06702v1
Compressor summary: E3-FaceNet is a network that generates and manipulates 3D face models from text instructions with high efficiency and quality, using a direct mapping and novel enhancements.
http://arxiv.org/abs/2403.06687v1
Compressor summary: The HL-HGAT is a novel graph neural network that learns from $k$-simplices using Hodge-Laplacian convolutional filters, simplicial projection, and simplicial attention pooling for various applications.
http://arxiv.org/abs/2403.06683v1
Compressor summary: The paper explores using deep learning models trained on natural images to infer depth in endoscopic videos, and improves their performance by adding temporal consistency self-supervision.
http://arxiv.org/abs/2403.06682v1
Compressor summary: The paper presents a novel model that uses multimodal deep learning to restore ancient texts, particularly ideographs, by combining context understanding with visual information from damaged artefacts.
http://arxiv.org/abs/2403.06681v1
Compressor summary: PLL-OOD is a novel method for learning from ambiguously labelled data that incorporates Out-of-Distribution detection to enhance model adaptability and accuracy in open-world settings.
http://arxiv.org/abs/2403.06679v1
Compressor summary: The paper proposes a method called mutual correlation distillation (MCD) to improve audio-visual question answering by enhancing soft associations, aligning cross-modal features, and decoupling audio-visual dependencies, leading to better performance on two datasets.
http://arxiv.org/abs/2403.06677v1
Compressor summary: The study proposes R-LSVRG and R-PAGE methods for stochastic optimization on Riemannian manifolds, which simplify proofs, hyperparameter selection, and have sharp convergence guarantees, and applies them to non-convex distributed settings with communication compression.
http://arxiv.org/abs/2403.06676v1
Compressor summary: Large kernel CNNs perform well in downstream tasks like weakly supervised object localization, with feature map improvement being the main factor for their success, and they are robust to CAM problems.
http://arxiv.org/abs/2403.06674v1
Compressor summary: The text describes an application that uses computer vision to detect car damages and align pre-trip and post-trip images for insurance purposes, using a Mask R-CNN model and a self-supervised SimCLR alignment approach.
http://arxiv.org/abs/2403.06670v1
Compressor summary: CEAT is a new architecture that enables models to learn new tasks without forgetting old ones while protecting privacy by extending and absorbing layers and using prototype contrastive loss and pseudo-features.
http://arxiv.org/abs/2403.06668v1
Compressor summary: PeerAiD uses a peer network to defend against adversarial examples targeting a student network, improving its robustness in security-critical domains.
http://arxiv.org/abs/2403.06658v1
Compressor summary: The paper proposes a biometric recognition framework that uses synthetic data and local registration to address data demands, domain generalization, and interpretability issues.
http://arxiv.org/abs/2403.06644v1
Compressor summary: The authors investigate how Large Language Models (LLMs) handle tabular data, finding that they may be contaminated or memorize the data, which can affect their performance on downstream tasks.
http://arxiv.org/abs/2403.06643v1
Compressor summary: The study proposes two new features for occupancy detection based on CO2 concentration spatial distribution, improving accuracy and quantity estimation in naturally ventilated buildings without or with ventilation information.
http://arxiv.org/abs/2403.06631v1
Compressor summary: The paper explores finetuning object detection models for few-shot learning, evaluates energy demands in industrial settings, and proposes an Efficiency Factor metric to measure the trade-off between performance and efficiency.
http://arxiv.org/abs/2403.06616v1
Compressor summary: The text describes a method for improving the accuracy and efficiency of localizing driving actions in videos using density-guided label smoothing and post-processing techniques, achieving competitive results in a naturalistic driving action recognition challenge.
http://arxiv.org/abs/2403.06611v1
Compressor summary: The MedKP framework improves large language models' performance in medical dialogue generation by incorporating external knowledge from a medical knowledge graph and internal clinical pathway encoding via entities and actions.
http://arxiv.org/abs/2403.06609v1
Compressor summary: In-Context Padding (ICP) is a novel framework that enhances large language models' clinical reasoning by guiding them with critical medical knowledge elements called knowledge seeds.
http://arxiv.org/abs/2403.06606v1
Compressor summary: Our method edits and augments images to remove bias in facial attribute classification without needing extra labels.
http://arxiv.org/abs/2403.06601v1
Compressor summary: The paper presents methods for cross-domain and cross-dimension transfer learning for image-to-graph transformers, including an edge sampling loss, a domain adaptation framework, and a projection function, addressing the large training data requirements of direct image-to-graph transformation and improving object detection and relationship prediction on benchmarks such as retinal and whole-brain vessel graph extraction.
http://arxiv.org/abs/2403.06600v1
Compressor summary: The paper proposes BEV2PR, a VPR framework that uses bird's-eye view segmentation features and a single monocular camera to generate composite descriptors with visual cues and spatial awareness, improving performance over existing camera-based methods.
http://arxiv.org/abs/2403.06592v1
Compressor summary: The paper proposes a new method for detecting fake videos by analyzing how the style latent vectors change over time in generated videos, which reveals abnormal patterns that indicate manipulation.
http://arxiv.org/abs/2403.06591v1
Compressor summary: The text introduces a new test (SESI) to evaluate the social intelligence of large language models (LLMs), which shows that they have room for improvement and that social intelligence is distinct from academic intelligence.
http://arxiv.org/abs/2403.06586v1
Compressor summary: ContextGPT uses prompt engineering to retrieve common-sense knowledge from Large Language Models for context-aware Human Activity Recognition, requiring less human effort and expertise than ontologies.
http://arxiv.org/abs/2403.06577v1
Compressor summary: The study adapts a transformer-based fusion architecture to improve temporal localization and classification accuracy in driver-assistance systems using video action recognition and 2D human-pose estimation.
http://arxiv.org/abs/2403.06576v1
Compressor summary: The paper introduces FFAD, a new metric for evaluating quality of synthetic time series data using the Fourier transform and an auto-encoder.
http://arxiv.org/abs/2403.06574v1
Compressor summary: AC-EVAL is an innovative benchmark to evaluate Large Language Models' understanding of ancient Chinese across three levels of difficulty and 13 tasks, aiming to improve their performance in education and research.
http://arxiv.org/abs/2403.06571v1
Compressor summary: The text introduces $L_1$-Coverage, a new exploration objective in reinforcement learning that balances intrinsic complexity control, efficient planning, and efficient exploration for high-dimensional domains.
http://arxiv.org/abs/2403.06570v1
Compressor summary: The paper presents a new pipeline for optimizing speaker assignment in real-life meeting transcription using VAD, SD, and SA-ASR, and demonstrates improvements by fine-tuning the SA-ASR model and extracting speaker embedding templates from SD output.
http://arxiv.org/abs/2403.06569v1
Compressor summary: The authors use deep learning to adapt models trained on able-bodied data to predict joint motion for amputee patients, potentially improving assistive technologies.
http://arxiv.org/abs/2403.06568v1
Compressor summary: This paper proposes a new way to compare MaxSAT solvers using Empirical Cumulative Distribution Functions, which can help optimize their parameters and show differences in their performance across different time budgets.
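The core of an ECDF-based comparison is simple to state (a minimal sketch of the general technique, not the paper's exact protocol): for each time budget, record the fraction of benchmark instances each solver finishes within that budget.

```python
# Minimal ECDF comparison of two solvers' per-instance runtimes.
# A timeout can be encoded as float("inf").

def ecdf(runtimes, budget):
    """Fraction of instances solved within the given time budget."""
    return sum(t <= budget for t in runtimes) / len(runtimes)

def compare(solver_a, solver_b, budgets):
    """ECDF profiles of two solvers over a grid of time budgets."""
    return [(b, ecdf(solver_a, b), ecdf(solver_b, b)) for b in budgets]
```

Unlike a single aggregate score, the resulting curves can cross, revealing that one solver dominates under tight budgets while the other wins when given more time, which is exactly the budget-dependent difference the paper highlights.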
http://arxiv.org/abs/2403.06567v1
Compressor summary: The authors propose using vision foundation models as feature extractors for content-based medical image retrieval and show that weakly-supervised models achieve competitive performance without fine-tuning.
http://arxiv.org/abs/2403.06563v1
Compressor summary: This report provides a detailed analysis of scaling law principles for large language models, deriving precise formulas to predict various attributes such as test loss, training steps, and batch size for models up to 33 billion parameters.
http://arxiv.org/abs/2403.06560v1
Compressor summary: This paper presents a method to compute Sliced-Wasserstein distance on certain manifolds and applies it to various problems with non-Euclidean data.
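For reference, the Euclidean baseline that the paper generalises to manifolds can be sketched with random projections (my own Monte-Carlo sketch, not the paper's manifold construction): project both point clouds onto random directions and average the closed-form 1D Wasserstein distances.

```python
import math
import random

def sliced_wasserstein(X, Y, n_proj=100, seed=0):
    """Monte-Carlo sliced-Wasserstein-1 distance between two equal-size
    point clouds in R^d: average, over random unit directions, the 1D
    Wasserstein distance between the projected (sorted) samples."""
    rng = random.Random(seed)
    d = len(X[0])
    total = 0.0
    for _ in range(n_proj):
        v = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        v = [c / norm for c in v]
        # 1D Wasserstein between equal-size samples: match sorted projections.
        px = sorted(sum(a * b for a, b in zip(x, v)) for x in X)
        py = sorted(sum(a * b for a, b in zip(y, v)) for y in Y)
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_proj
```

The slicing trick reduces an expensive d-dimensional optimal transport problem to many cheap sorts; the paper's contribution is making the projection step well-defined on certain non-Euclidean manifolds.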
http://arxiv.org/abs/2403.06546v1
Compressor summary: OMH is a novel approach for unsupervised semantic segmentation that uses structured sparsity and optimal transport to learn a hierarchy among parallel clusters, improving performance over existing methods.
http://arxiv.org/abs/2403.06537v1
Compressor summary: The authors demonstrate how an open-source language model can be adapted to provide unethical answers about criminal activities using a new dataset, EVE, highlighting the need for caution with open technologies.
http://arxiv.org/abs/2403.06536v1
Compressor summary: The Multi-Scale Implicit Transformer (MSIT) uses a novel approach to improve the performance of arbitrary-scale super-resolution by exploiting multi-scale characteristics and enhancing latent codes with self-attention and re-interaction modules.
http://arxiv.org/abs/2403.06535v1
Compressor summary: DeLAMA is a new algorithm that helps multiple agents collaborate efficiently without a central server by learning graph structures, using memory to store knowledge, and applying optimization and neural networks.
http://arxiv.org/abs/2403.06534v1
Compressor summary: The authors create a large-scale, diverse SAR object detection dataset (SARDet-100K) and propose a novel pretraining framework (MSFA) to improve the performance of SAR object detection models.
http://arxiv.org/abs/2403.06529v1
Compressor summary: The authors propose a domain-independent pre-training framework for RGB-D face recognition that uses depth models from 3D Morphable Models and an Adaptive Confidence Weighting mechanism to fuse RGB and depth information, achieving state-of-the-art performance.
http://arxiv.org/abs/2403.06524v1
Compressor summary: The paper presents a deep reinforcement learning framework for autonomous trucks to make tactical decisions in ACC and lane change maneuvers, and explores various methods to optimize performance with a multi-objective reward function based on TCOP.
http://arxiv.org/abs/2403.06520v1
Compressor summary: The paper proposes a commonsense-based approach for news captioning that distinguishes similar entities and enriches their descriptions with relevant information.
http://arxiv.org/abs/2403.06517v1
Compressor summary: The paper proposes ActGen, an active learning approach for image generation, which uses real images as guides and generates challenging samples to improve image classification accuracy efficiently.
http://arxiv.org/abs/2403.06516v1
Compressor summary: The paper proposes a reinforcement learning framework that uses comparative feedback to generate realistic chest X-rays from diagnostic reports.
http://arxiv.org/abs/2403.06514v1
Compressor summary: The paper proposes a method to generate more descriptive and accurate explanations for model predictions using semantic graphs, outperforming previous models in both quantitative and qualitative evaluations.
http://arxiv.org/abs/2403.06510v1
Compressor summary: The text introduces a new annotation method (SkA) and a learning framework for airway segmentation that requires less labeling while improving consistency and accuracy.
http://arxiv.org/abs/2403.06505v1
Compressor summary: Vosh is a hybrid NeRF representation that combines voxels and mesh for fast, high-quality view synthesis with an adjustable quality-speed balance.
http://arxiv.org/abs/2403.06501v1
Compressor summary: The paper introduces SeSame, a new way to represent 3D object detection data that combines semantic and geometric features, improving accuracy in autonomous driving.
http://arxiv.org/abs/2403.06497v1
Compressor summary: QuantTune is a method that fine-tunes transformer-based models for better post-training linear quantization and reduces accuracy drops caused by precision loss due to outliers.
http://arxiv.org/abs/2403.06495v1
Compressor summary: The paper proposes a novel method, InCTRL, that uses few-shot normal images as prompts to train a generalist anomaly detection model on diverse datasets without extra training data.
http://arxiv.org/abs/2403.06489v1
Compressor summary: The paper proposes GNUM, a graph neural network-based framework with two uplift estimators to learn from social graphs for uplift estimation in randomized experiments or observational data.
http://arxiv.org/abs/2403.06488v1
Compressor summary: QPENet uses query features to create custom foreground and background prototypes for few-shot segmentation, improving performance on PASCAL-$5^i$ and COCO-$20^i$ datasets.
http://arxiv.org/abs/2403.06487v1
Compressor summary: The paper explores using a voice activity projection model to predict turn-taking in multilingual dialogue and shows that a multilingual model outperforms monolingual ones, while also analyzing the role of pitch and audio encoders.
http://arxiv.org/abs/2403.06483v1
Compressor summary: This paper proposes a negation method for random permutation sets theory, studies its convergence, and shows its effects on uncertainty and dissimilarity using numerical examples.
http://arxiv.org/abs/2403.06479v1
Compressor summary: The paper introduces Ada-Tracker, a method for tracking soft tissues in computer-assisted surgeries using optical flow to capture deformations and adaptively correct the template.
http://arxiv.org/abs/2403.06471v1
Compressor summary: This study presents DPANet, a few-shot segmentation method for accurately segmenting the heart and left atrial enlargement on canine chest radiographs, setting a new benchmark in veterinary AI research.
http://arxiv.org/abs/2403.06470v1
Compressor summary: The paper presents a 3D-aware image generation and editing model that takes multiple conditional inputs and disentangles shape and appearance in GANs' latent space, enabling diverse image generation, text-based attribute editing, and style transfer with a reference image.
http://arxiv.org/abs/2403.06467v1
Compressor summary: Point Mamba is a state space model (SSM)-based point cloud processing backbone whose octree-based ordering strategy preserves the causal dependency and spatial proximity of points, achieving state-of-the-art accuracy with linear complexity on two benchmark datasets and outperforming transformer-based methods.
http://arxiv.org/abs/2403.06466v1
Compressor summary: The paper proposes a Reinforcement Learning-based Multi-line bus Scheduling Approach (RL-MSA) that handles uncertain events like traffic congestion and considers the interests of both the bus company and passengers.
http://arxiv.org/abs/2403.06462v1
Compressor summary: The paper proposes a new semi-supervised semantic segmentation method called Density-Descending Feature Perturbation (DDFP) that improves feature density estimation and exploration for better segmentation performance.
http://arxiv.org/abs/2403.06461v1
Compressor summary: Latte is a multi-modal test-time adaptation method for 3D segmentation that uses reliable cross-modal spatial-temporal correspondences and temporal local prediction consistency to adapt models to unlabeled target domains.
http://arxiv.org/abs/2403.06458v1
Compressor summary: The article presents a system using a neural network (LSTM) to estimate wort density for beer production from cheaper sensor data like pressure or temperature.
http://arxiv.org/abs/2403.06457v1
Compressor summary: The paper proposes a graph neural network approach that combines data-driven and traditional graph-matching methods, improving performance and reducing computational complexity.
http://arxiv.org/abs/2403.06453v1
Compressor summary: FontCLIP is a model that combines vision-language understanding with typography knowledge to find fonts across languages and attributes, even for unseen data.
http://arxiv.org/abs/2403.06452v1
Compressor summary: Text2QR is a method that uses stable-diffusion models to create visually appealing QR codes while maintaining scanning robustness.
http://arxiv.org/abs/2403.06448v1
Compressor summary: The paper introduces MIND, an unsupervised framework for real-time hallucination detection in large language models, and HELM, a new benchmark for evaluating such detection.
http://arxiv.org/abs/2403.06444v1
Compressor summary: The paper introduces Latent Semantic Consensus (LSC), a method to fit geometric models to noisy data by preserving latent semantic spaces, and shows its effectiveness and efficiency in computer vision tasks.
http://arxiv.org/abs/2403.06443v1
Compressor summary: Event-Based Temporal Mapping Photography (EvTemMap) converts events from a stationary event camera into dense intensity images using temporal mapping neural networks, achieving high dynamic range and fine-grained details in static scenes.
http://arxiv.org/abs/2403.06433v1
Compressor summary: The paper proposes Fine-Grained Pillar Feature Encoding (FG-PFE), which uses Spatio-Temporal Virtual grids to capture LiDAR point distributions within pillar structures and improve 3D object detection for autonomous vehicles.
http://arxiv.org/abs/2403.06432v1
Compressor summary: ST-JEMA is a self-supervised generative method that learns high-level semantic representations of dynamic functional connectivity from fMRI data by reconstructing dynamic graphs, improving phenotype and diagnosis prediction over previous methods while handling scarce labels and missing-data scenarios.
http://arxiv.org/abs/2403.06430v1
Compressor summary: The paper proposes a new backdoor attack on face restoration models using subtle frequency domain triggers, making the attacks imperceptible but still effective.
http://arxiv.org/abs/2403.06425v1
Compressor summary: The paper proposes a smooth and interpretable way to model how Graph Neural Networks (GNN) predict distributions on evolving graphs using differential geometry.
http://arxiv.org/abs/2403.06421v1
Compressor summary: The text discusses the need for better performance evaluation of talking head generation techniques using psychophysical experiments and human validation.
http://arxiv.org/abs/2403.06417v1
Compressor summary: Sparsification-based pruning improves model compression by enhancing the expressivity of kept weights and maintaining the magnitude of dropped weights, achieving superior performance without fine-tuning under aggressive pruning scenarios.
http://arxiv.org/abs/2403.06414v1
Compressor summary: EvoKD is a method that uses active learning and feedback to generate diverse and challenging data for distilling knowledge from large language models to small ones, improving their performance on various NLP tasks.
http://arxiv.org/abs/2403.06412v1
Compressor summary: CLIcK is a new benchmark dataset for testing Korean language models' cultural and linguistic knowledge using questions from official exams and textbooks.
http://arxiv.org/abs/2403.06410v1
Compressor summary: The paper proposes LMPM, a pre-trained AI model that uses external memory and entity abstraction to generate entailment trees with logical consistency and improved credibility.
http://arxiv.org/abs/2403.06408v1
Compressor summary: Analyzing large language models through the lens of perturbation shows how artificial perturbations affect their performance, suggesting that non-uniform quantization can improve efficiency without sacrificing performance.
http://arxiv.org/abs/2403.06407v1
Compressor summary: This paper explores how to efficiently fine-tune large language models for specific tasks in the medical domain by comparing different methods and optimizing training costs.
http://arxiv.org/abs/2403.06406v1
Compressor summary: The paper proposes a new method to compare no-reference image quality assessment (NR-IQA) models using analysis-by-synthesis framework and psychophysical testing, which provides better insights than conventional correlation-based metrics.
http://arxiv.org/abs/2403.06403v1
Compressor summary: PointSeg is a training-free method that uses off-the-shelf vision foundation models to segment 3D scenes accurately by constructing 3D point-box prompt pairs and applying iterative post-refinement and merging algorithms.
http://arxiv.org/abs/2403.06402v1
Compressor summary: The paper proposes adaptive in-context learning, where the number of examples used to control a generative model varies based on the similarity between the input and training data instances, improving text classification performance.
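The core idea of varying the number of in-context examples per input can be sketched as follows. This is an illustrative toy version, not the paper's algorithm: examples are embedded as vectors (here assumed precomputed), ranked by cosine similarity to the query, and only sufficiently similar ones are kept, clipped to a budget; the names and the threshold rule are hypothetical.

```python
import numpy as np

def select_examples(query_vec, pool_vecs, k_min=1, k_max=8, threshold=0.7):
    """Adaptively pick in-context examples for one query: keep pool
    examples whose cosine similarity to the query exceeds `threshold`,
    with the count clipped to [k_min, k_max]. Returns pool indices,
    most similar first."""
    q = query_vec / np.linalg.norm(query_vec)
    P = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    sims = P @ q
    order = np.argsort(-sims)                  # most similar first
    n_above = int((sims >= threshold).sum())   # similarity decides k
    k = int(np.clip(n_above, k_min, k_max))
    return order[:k]
```

The point of adapting k is that dissimilar demonstrations can hurt: a query far from the training pool gets few (or only the minimum) examples rather than a fixed-size, possibly misleading prompt.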
http://arxiv.org/abs/2403.06401v1
Compressor summary: This paper introduces InterPCSeg, a framework that allows users to improve point cloud semantic segmentation by providing corrective clicks, without needing offline re-training or specialized networks.
http://arxiv.org/abs/2403.06400v1
Compressor summary: The paper introduces a method to improve image generation from text by dividing the task into simpler subtasks and using layout information to guide the process.
http://arxiv.org/abs/2403.06399v1
Compressor summary: The authors create a large IGT corpus and use a pretrained multilingual model to generate IGT for various languages, improving performance on unsegmented text and small corpora.
http://arxiv.org/abs/2403.06398v1
Compressor summary: The text discusses how the width of neural networks affects their ability to avoid catastrophic forgetting when learning new tasks sequentially, and presents a theoretical framework to analyze this relationship.
http://arxiv.org/abs/2403.06397v1
Compressor summary: DeepSafeMPC is a novel method that uses centralized deep learning to predict environmental dynamics and apply Model Predictive Control to ensure safety in multi-agent reinforcement learning.
http://arxiv.org/abs/2403.06394v1
Compressor summary: The authors explore how a diffusion model called Dreambooth can synthesize views of novel objects without 3D priors, and introduce a method to transfer view knowledge from one object to another using low rank adapters.
http://arxiv.org/abs/2403.06392v1
Compressor summary: The paper explores how sharpness of learned minima affects out-of-distribution generalization and provides a tighter bound by considering robustness.
http://arxiv.org/abs/2403.06382v1
Compressor summary: The paper presents Fennec, a framework for model selection in transfer learning that uses a large vision model to infer a new task's representation in a transfer-related subspace, where distances represent transferability, and archi2vec to encode models' structures.
http://arxiv.org/abs/2403.06381v1
Compressor summary: The paper proposes attention regulation, a method to improve semantic fidelity in text-to-image synthesis by adjusting cross-attention layers during inference time without additional training or fine-tuning.
http://arxiv.org/abs/2403.06378v1
Compressor summary: The paper proposes StabStitch, a method that simultaneously performs video stitching and stabilization using unsupervised learning to reduce warping shakes and improve visual experience.
http://arxiv.org/abs/2403.06375v1
Compressor summary: The paper proposes FlowVQTalker, a method to create realistic talking faces with emotion-aware textures and lip synchronization using normalizing flows and vector quantization models.
http://arxiv.org/abs/2403.06367v1
Compressor summary: FEATAUG is a new feature augmentation framework that automatically generates predicate-aware SQL queries for one-to-many relationship tables, outperforming Featuretools and other baselines in effectiveness.
http://arxiv.org/abs/2403.06366v1
Compressor summary: This paper analyzes two types of soft Q-learning algorithms using switching system models and derives novel finite-time error bounds, contributing to a better understanding of soft Q-learning.
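As context for the analysis, a single tabular soft Q-learning step looks like the textbook sketch below (the paper's contribution is the switching-system analysis, not this update itself): the target replaces the hard max over next-state values with the "soft" log-sum-exp value at temperature tau.

```python
import numpy as np

def soft_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, tau=1.0):
    """One tabular soft Q-learning step on table Q of shape
    (n_states, n_actions), updated in place.
    Target uses the soft value V(s') = tau * log sum_a' exp(Q(s',a')/tau)
    instead of max_a' Q(s', a')."""
    soft_v = tau * np.log(np.sum(np.exp(Q[s_next] / tau)))
    target = r + gamma * soft_v
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

As tau goes to 0 the soft value collapses to the hard max and standard Q-learning is recovered; larger tau encourages higher-entropy (more exploratory) policies.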
http://arxiv.org/abs/2403.06365v1
Compressor summary: Style2Talker is a new method to generate expressive talking head videos from audio by using text-controlled emotion and picture-controlled art styles, which outperforms existing methods in lip sync and style control.
http://arxiv.org/abs/2403.06363v1
Compressor summary: SAAS is a novel method for generating natural-looking talking head videos with diverse head motions and styles by using a multi-task VQ-VAE and a residual architecture.
http://arxiv.org/abs/2403.06361v1
Compressor summary: Shallow subject-specific adapters help decode cross-subject fMRI data into unified representations, improving neural representation learning and reconstruction for both high-level and low-level perceptions.
http://arxiv.org/abs/2403.06360v1
Compressor summary: The text discusses challenges in understanding noun compounds' meanings in NLP, proposes new relations for Romanian compounds, and tests them with humans and neural networks, finding that agreement tracks with frequency but no existing relation fully captures the meanings.
http://arxiv.org/abs/2403.06356v1
Compressor summary: The paper presents a framework of four modules for tuning, fusing, and ensuring consistency that produces high-quality long videos with consistent background and foreground, free of jitter and noise, outperforming existing approaches in video quality.
http://arxiv.org/abs/2403.06355v1
Compressor summary: The paper presents a new method for multi-modal semantic understanding that aligns image and text features using CLIP-guided contrastive learning, achieving better results on sarcasm detection and sentiment analysis tasks.
http://arxiv.org/abs/2403.06354v1
Compressor summary: The authors train an Amharic language model on limited data using data augmentation, translation models, and multimodal learning, and release their methods and resources.
http://arxiv.org/abs/2403.06352v1
Compressor summary: The paper presents a lightweight CNN architecture, L-Mobilenet, that adapts well to embedded systems and achieves better performance than existing models with fewer parameters and less delay.
http://arxiv.org/abs/2403.06351v1
Compressor summary: The paper presents a method called Exo2Ego that converts third-person videos to first-person views, using a two-stage approach, and introduces a benchmark for evaluating this task.
http://arxiv.org/abs/2403.06350v1
Compressor summary: The text introduces a suite of resources for developing Indic language LLMs, covering 22 languages, using curated, valuable, and synthetic data from various sources, and addressing toxicity alignment.
http://arxiv.org/abs/2403.06349v1
Compressor summary: The paper proposes a novel method to combine histological images and genetic data for computer-aided diagnosis of brain tumor grades, using a Multi-modal Outer Arithmetic Block (MOAB) that applies arithmetic operations to latent representations of different modalities.