This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-06, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2403.03221v1
Compressor summary: The paper proposes a method that combines the strengths of correspondence-based and direct pose prediction methods to achieve both precision and robustness in 6DoF camera pose estimation.
http://arxiv.org/abs/2403.03219v1
Compressor summary: This study proposes a new algorithm for the linear contextual bandit problem that improves regret bounds and relaxes suboptimality gap assumptions, based on Follow-The-Regularized-Leader with Tsallis entropy.
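As background on this algorithm family (a generic textbook formulation, not necessarily the paper's exact variant), FTRL with Tsallis-entropy regularization selects the arm distribution

    p_t = \arg\min_{p \in \Delta_K} \Big\{ \eta \Big\langle p, \textstyle\sum_{s<t} \hat{\ell}_s \Big\rangle - H_\alpha(p) \Big\},
    \qquad H_\alpha(p) = \frac{1}{1-\alpha} \Big( \sum_{i=1}^{K} p_i^\alpha - 1 \Big),

where the \hat{\ell}_s are importance-weighted loss estimates and \alpha \in (0,1) interpolates toward the Shannon entropy as \alpha \to 1.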
http://arxiv.org/abs/2403.03218v1
Compressor summary: The WMDP benchmark evaluates hazardous knowledge in large language models and serves as a basis for developing methods to unlearn such knowledge.
http://arxiv.org/abs/2403.03217v1
Compressor summary: The authors propose a 3D patient body modeling method that uses multi-modal keypoint detection and self-supervised mesh regression, achieving superior patient positioning across different scenarios without the need for customization, extensive data, or costly annotations.
http://arxiv.org/abs/2403.03206v1
Compressor summary: The authors propose an improved noise sampling technique for rectified flow models that focuses on perceptually relevant scales, a novel transformer-based architecture for text-to-image synthesis, and show its superior performance in various metrics and human evaluations.
http://arxiv.org/abs/2403.03203v1
Compressor summary: The paper introduces CLEVR-POC, a benchmark for reasoning-intensive visual question answering under constraints, where answers about hidden objects in partially observable scenes must be inferred from background knowledge and visual cues; neuro-symbolic models outperform pre-trained vision-language models on it.
http://arxiv.org/abs/2403.03194v1
Compressor summary: MAGID is a framework for generating diverse and high-quality images to augment text-only dialogues using a diffusion model and a feedback loop.
http://arxiv.org/abs/2403.03190v1
Compressor summary: The paper proposes the Triple-CFN approach and its variants, Meta Triple-CFN and Re-space layer, to improve artificial intelligence's performance on abstract reasoning problems such as Bongard-Logo and RPM.
http://arxiv.org/abs/2403.03188v1
Compressor summary: The study introduces an AI Assistant powered by GPT-4 to help communicate flood risks effectively, integrate real-time warnings with maps and data, and provide actionable advice for decision-makers and the public.
http://arxiv.org/abs/2403.03187v1
Compressor summary: The paper argues that retrieval-augmented language models can overcome the limitations of parametric ones by incorporating large-scale datastores, but current implementations are not yet optimal and need improvement.
http://arxiv.org/abs/2403.03186v1
Compressor summary: The text introduces Cradle, a foundation agent that can learn any computer task from screen images and audio inputs, and demonstrates its capabilities in the game Red Dead Redemption II.
http://arxiv.org/abs/2403.03185v1
Compressor summary: The paper proposes using Occupancy Measure (OM) divergence instead of Action Distribution (AD) divergence to prevent reward hacking in reinforcement learning by regularizing towards a safe policy.
http://arxiv.org/abs/2403.03183v1
Compressor summary: The text discusses how Transformers can approximate higher-order optimization methods and perform tasks like logistic regression using linear attention and ReLU layers.
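To make "higher-order optimization methods" concrete, here is a minimal numpy sketch of one Newton step for L2-regularized logistic regression — the kind of second-order update in question, shown purely as illustration rather than the paper's in-context construction:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def newton_step(X, y, w, lam=1e-3):
        """One Newton step for L2-regularized logistic regression."""
        p = sigmoid(X @ w)                       # predicted probabilities
        g = X.T @ (p - y) + lam * w              # gradient
        S = p * (1.0 - p)                        # per-sample curvature weights
        H = X.T @ (X * S[:, None]) + lam * np.eye(X.shape[1])  # Hessian
        return w - np.linalg.solve(H, g)         # second-order update

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = (X @ rng.normal(size=5) > 0).astype(float)
    w = np.zeros(5)
    for _ in range(5):
        w = newton_step(X, y, w)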
http://arxiv.org/abs/2403.03181v1
Compressor summary: VQ-BeT is a model that improves behavior generation by tokenizing continuous actions with vector quantization, achieving better performance and faster inference than existing models in various environments.
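For intuition, a minimal numpy sketch of the core tokenization operation — snapping continuous actions to the nearest entries of a codebook; VQ-BeT's full tokenizer and the behavior transformer itself are not shown:

    import numpy as np

    def quantize(actions, codebook):
        """Map each continuous action to its nearest codebook entry."""
        # actions: (N, D), codebook: (K, D)
        d2 = ((actions[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
        tokens = d2.argmin(axis=1)          # discrete action tokens
        return tokens, codebook[tokens]     # indices and their reconstructions

    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(16, 3))     # 16 codes for 3-D actions
    tokens, recon = quantize(rng.normal(size=(5, 3)), codebook)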
http://arxiv.org/abs/2403.03176v1
Compressor summary: The paper proposes a unified definition for multiple high-quality planning problems, enabling efficient certification using task transformations.
http://arxiv.org/abs/2403.03173v1
Compressor summary: The study introduces PMoC and Pose-Transformer, which improve AI's abstract reasoning and cognitive pattern recognition by modeling probability and learning positional information from image data.
http://arxiv.org/abs/2403.03172v1
Compressor summary: The paper proposes a model-based framework, MAGI, that uses an Imagined common goal to coordinate multiple agents in cooperative reinforcement learning tasks, improving sample efficiency and performance.
http://arxiv.org/abs/2403.03167v1
Compressor summary: PARADISE is a Q&A task that tests large language models' abductive reasoning and implicit knowledge inference in practical procedural text, revealing limitations of current models and providing valuable insights for future research.
http://arxiv.org/abs/2403.03163v1
Compressor summary: The authors benchmark the Design2Code task, in which multimodal LLMs convert screenshots of real-world webpages into code implementations, and find that GPT-4V performs best.
http://arxiv.org/abs/2403.03161v1
Compressor summary: PalmProbNet is a new probabilistic approach using transfer learning to automatically detect palm trees in dense tropical rainforests using UAV-derived orthomosaic imagery, achieving high accuracy and visualizing palm distribution with probability heatmaps.
http://arxiv.org/abs/2403.03150v1
Compressor summary: The paper proposes a deep learned compression model, HQARF, that uses learned vector quantization to compress RF signals for AI processing, reducing bandwidth and latency costs.
http://arxiv.org/abs/2403.03145v1
Compressor summary: The paper proposes a novel semi-supervised learning framework for audio-visual source localization that uses two teachers to generate high-quality pseudo-labels and outperforms current methods by a large margin.
http://arxiv.org/abs/2403.03141v1
Compressor summary: The Language Guided Exploration framework uses a language model to help reinforcement learning agents explore and learn better in complex text environments, improving performance compared to baselines.
http://arxiv.org/abs/2403.03134v1
Compressor summary: The paper proposes a simple linear model using segmentation models to measure image complexity, showing that it is well explained by the number of segments and classes in an image across diverse datasets.
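As an illustration of such a two-variable linear model, a numpy least-squares fit of complexity against segment and class counts (the feature and target values below are invented; the paper's segmentation models and complexity scores are not reproduced):

    import numpy as np

    # Hypothetical per-image features and human-rated complexity targets.
    n_segments = np.array([12, 40, 7, 55, 23], dtype=float)
    n_classes  = np.array([3, 8, 2, 10, 5], dtype=float)
    complexity = np.array([0.3, 0.7, 0.2, 0.9, 0.5])

    # Fit: complexity ~ a * n_segments + b * n_classes + c
    A = np.stack([n_segments, n_classes, np.ones_like(n_segments)], axis=1)
    (a, b, c), *_ = np.linalg.lstsq(A, complexity, rcond=None)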
http://arxiv.org/abs/2403.03129v1
Compressor summary: CoGenesis is a collaborative generation framework that integrates large and small language models on cloud and local devices to address privacy concerns while maintaining efficient command execution.
http://arxiv.org/abs/2403.03122v1
Compressor summary: The authors propose Neural Riemannian Distance Fields (NRDFs), a data-driven method for modeling the space of plausible articulations, which improves pose generation, estimation, and inverse kinematics tasks across humans, hands, and animals.
http://arxiv.org/abs/2403.03121v1
Compressor summary: The study investigates gendered emotion attribution in large language models and finds that they consistently exhibit emotions influenced by gender stereotypes.
http://arxiv.org/abs/2403.03120v1
Compressor summary: The paper proposes a technique to improve video segmentation performance without additional labeling by using optical flow to account for motion in the moving average calculation.
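A toy sketch of the idea under strong simplifications (one uniform integer flow per frame stands in for a dense optical-flow field): warp the running average along the motion before blending in the new frame's predictions:

    import numpy as np

    def flow_aware_ema(prev_avg, new_logits, flow, alpha=0.8):
        """Motion-compensated exponential moving average of per-pixel logits."""
        dy, dx = flow  # toy global flow; real methods warp with a dense field
        warped = np.roll(np.roll(prev_avg, dy, axis=0), dx, axis=1)
        return alpha * warped + (1 - alpha) * new_logits

    rng = np.random.default_rng(0)
    avg = rng.normal(size=(4, 4))
    for _ in range(3):
        avg = flow_aware_ema(avg, rng.normal(size=(4, 4)), flow=(1, 0))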
http://arxiv.org/abs/2403.03111v1
Compressor summary: The paper proposes a framework to improve real-time LiDAR odometry and mapping for self-driving cars by using semantic information and rejecting outliers in the matching process.
http://arxiv.org/abs/2403.03103v1
Compressor summary: In the infinite-width limit, deep ensembles (groups of neural networks with different weights) become equivariant when trained with data augmentation, responding consistently to transformations of their inputs; this holds for any architecture and regardless of input location.
http://arxiv.org/abs/2403.03102v1
Compressor summary: In-Dialogue Learning (IDL) is a fine-tuning framework that improves persona-based personalized dialogue generation without pre-defined profiles by using large language models and dialogue history, achieving significant improvements in BLEU and ROUGE scores as well as in human evaluations.
http://arxiv.org/abs/2403.03101v1
Compressor summary: KnowAgent improves language agents' planning capabilities by incorporating explicit action knowledge, reducing planning hallucination and achieving better performance on complex tasks.
http://arxiv.org/abs/2403.03095v1
Compressor summary: XPL is a novel semi-supervised AVSL method that uses cross-refinement and two components (soft pseudo-labels and curriculum data selection) to avoid bias, improve stability, and achieve state-of-the-art performance.
http://arxiv.org/abs/2403.03082v1
Compressor summary: The paper proposes a two-level framework for continual learning that separates stability and plasticity mechanisms using generative adversarial meta-models to maintain past knowledge.
http://arxiv.org/abs/2403.03077v1
Compressor summary: The MiKASA Transformer is a novel model for 3D visual grounding that improves accuracy, spatial understanding, and explainability, achieving the highest overall performance in two challenges.
http://arxiv.org/abs/2403.03075v1
Compressor summary: The paper presents new methods for detecting and selecting visually-grounded text tokens in multimodal machine translation systems, using different techniques and improving performance.
http://arxiv.org/abs/2403.03069v1
Compressor summary: The paper proposes two methods to improve variational autoencoder estimation from incomplete data by addressing the increased complexity of the latent variable distribution.
http://arxiv.org/abs/2403.03063v1
Compressor summary: CrackNex is a framework that uses Retinex Theory to learn illumination-invariant representations for crack segmentation in low-light conditions, and utilizes few-shot segmentation to overcome the data efficiency issue.
http://arxiv.org/abs/2403.03045v1
Compressor summary: We propose a method to improve multimodal machine translation models by combining a text-only MT model with vision-text adapter layers and pre-training and fine-tuning on specific datasets.
http://arxiv.org/abs/2403.03037v1
Compressor summary: The authors present EgoPack, a unified approach to video understanding that shares temporal modelling of human actions across tasks and builds a collection of task perspectives that carry over to novel downstream tasks and skills, demonstrating effectiveness and efficiency on four benchmarks.
http://arxiv.org/abs/2403.03031v1
Compressor summary: The authors propose a cooperative framework that improves tool learning in large language models by modularizing the workflow and enabling agents to adapt based on feedback.
http://arxiv.org/abs/2403.03029v1
Compressor summary: The authors propose a framework called SocraticReframe that uses question-answer pairs to help reframe negative thoughts into positive ones, improving the performance of language models on this task.
http://arxiv.org/abs/2403.03028v1
Compressor summary: The study introduces a method to improve the explainability of LLMs by varying individual words in prompts to measure their impact on model outputs and various text scores, addressing concerns about their transparency, reliability, and ethical use.
http://arxiv.org/abs/2403.03020v1
Compressor summary: The paper explores how sequence models with different properties affect meta-RL performance and proposes a new method called SplAgger that combines them.
http://arxiv.org/abs/2403.03018v1
Compressor summary: The paper presents an ensemble learning method for designing CRISPR sgRNAs that combines multiple machine learning models to better predict sgRNA efficacy and off-target effects; it outperforms existing methods on a benchmark dataset and could lead to safer, more effective clinical use of CRISPR.
http://arxiv.org/abs/2403.03017v1
Compressor summary: The text introduces OPEx, a framework to analyze the impact of various components on embodied instruction following tasks using large language models, and shows that combining multi-agent dialogue and LLMs improves performance.
http://arxiv.org/abs/2403.03014v1
Compressor summary: The paper suggests that multimodal machine translation (MMT) models should be evaluated using three different test sets to capture their use of visual information and complex sentence translation abilities, which are not measured by the current Multi30k testing set.
http://arxiv.org/abs/2403.03008v1
Compressor summary: The paper proposes using knowledge graphs to improve the quality and accuracy of explanations generated by large language models for personalized education.
http://arxiv.org/abs/2403.03003v1
Compressor summary: Mixture-of-Resolution Adaptation (MRA) improves multimodal large language models' visual recognition by combining low- and high-resolution features from images.
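A schematic PyTorch sketch of the generic pattern — fusing features from a low-resolution and a high-resolution view of the same image — with made-up single-layer encoders; MRA's actual adapter design differs:

    import torch
    import torch.nn.functional as F

    def mix_resolution_features(image, encoder_lo, encoder_hi):
        """Fuse features from low- and high-resolution views of one image."""
        lo = F.interpolate(image, scale_factor=0.25, mode="bilinear")
        f_lo = encoder_lo(lo)                                # coarse, cheap features
        f_hi = encoder_hi(image)                             # fine-grained features
        f_hi = F.adaptive_avg_pool2d(f_hi, f_lo.shape[-2:])  # match spatial size
        return f_lo + f_hi                                   # simple additive fusion

    enc_lo = torch.nn.Conv2d(3, 8, 3, padding=1)
    enc_hi = torch.nn.Conv2d(3, 8, 3, padding=1)
    fused = mix_resolution_features(torch.randn(1, 3, 224, 224), enc_lo, enc_hi)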
http://arxiv.org/abs/2403.02993v1
Compressor summary: The paper explores how focusing on local optima instead of global optimization can improve prompt optimization for large language models, and proposes a new algorithm called ZOPO that uses a Gaussian process to efficiently search for local optima.
http://arxiv.org/abs/2403.02991v1
Compressor summary: MADTP is a framework that aligns different modalities and dynamically adjusts compression ratios to reduce VLT computation costs while maintaining performance.
http://arxiv.org/abs/2403.02990v1
Compressor summary: This paper surveys how large language models impact data augmentation techniques, discussing challenges and opportunities in natural language processing and other domains.
http://arxiv.org/abs/2403.02985v1
Compressor summary: Evolution Transformer is a meta-optimization system that improves evolutionary algorithms by learning from data and using causal Transformers to update search distributions.
http://arxiv.org/abs/2403.02981v1
Compressor summary: The paper proposes a Doubly Abductive Counterfactual inference framework (DAC) to improve text-based image editing by balancing editability and fidelity, supporting various user intents.
http://arxiv.org/abs/2403.02975v1
Compressor summary: The paper proposes MCP-SM, a framework for multilingual semantic matching in natural language processing.
http://arxiv.org/abs/2403.02969v1
Compressor summary: The proposed AnyRef model generates pixel-wise object perceptions and natural language descriptions from various multi-modality references for more flexible visual-language interactions.
http://arxiv.org/abs/2403.02966v1
Compressor summary: EFSum is a method that creates evidence-focused summaries of facts for knowledge graphs to enhance question answering with large language models.
http://arxiv.org/abs/2403.02965v1
Compressor summary: The paper shows that ChatGPT can perform well on face recognition, gender detection, and age estimation tasks, despite its safeguards against handling sensitive personal information.
http://arxiv.org/abs/2403.02962v1
Compressor summary: The paper introduces a new dataset for testing large language models' ability to edit irregular tables and evaluates their performance on it.
http://arxiv.org/abs/2403.02959v1
Compressor summary: SimuCourt is a judicial benchmark, built on real-world data and a large legal knowledge base, for evaluating agents' judicial decision-making; the authors' agent approach outperforms existing methods.
http://arxiv.org/abs/2403.02957v1
Compressor summary: The paper proves that a specific diffusion probabilistic model denoising method converges to the best possible estimator, and shows how it can be used for noise removal and generation.
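The "best possible estimator" under mean-squared error is the posterior mean; as general background (not the paper's derivation), Tweedie's formula gives it in closed form for Gaussian noise:

    \hat{x}(y) = \mathbb{E}[x \mid y] = y + \sigma^2 \nabla_y \log p(y),
    \qquad y = x + \sigma \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I),

which is why learning the score \nabla_y \log p(y) suffices for both noise removal and generation.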
http://arxiv.org/abs/2403.02951v1
Compressor summary: Noting the lack of consensus on prompt templates and design frameworks for LLM-based Text-to-SQL, the paper introduces a new dataset and five evaluation tasks that assess LLMs across sub-tasks, addressing the gaps in existing benchmarks, and proposes optimal in-context learning solutions.
http://arxiv.org/abs/2403.02950v1
Compressor summary: Venom is a new attack method to enhance existing backdoors and make them harder to detect and remove by model reconstruction-based defenses.
http://arxiv.org/abs/2403.02946v1
Compressor summary: This paper proposes a faster way to test the reliability of systolic array-based DNN accelerators using a new fault injection method.
http://arxiv.org/abs/2403.02945v1
Compressor summary: The paper tests if common patient subgroups in ICUs exist by comparing two different datasets but finds limited similarities, suggesting that standardized restructuring may not work and customization might be better.
http://arxiv.org/abs/2403.02944v1
Compressor summary: Text-guided image compression algorithm achieves high perceptual and pixel-wise fidelity by using text-adaptive encoding and training with joint image-text loss, outperforming all baselines in terms of LPIPS.
http://arxiv.org/abs/2403.02938v1
Compressor summary: The system adjusts playback speed based on phonemes and uses a speech recognizer's score to ensure intelligibility, making content comprehension more time-efficient.
http://arxiv.org/abs/2403.02936v1
Compressor summary: The paper introduces a new fault-tolerant approximate multiplier designed for ASIC-based deep learning chips.
http://arxiv.org/abs/2403.02933v1
Compressor summary: The paper proposes a fuzzy extension of Datalog with existential rules to perform logical reasoning with uncertain neural and symbolic data from various sources.
http://arxiv.org/abs/2403.02932v1
Compressor summary: The paper proposes RulePrompt, a method for text classification that uses logical expressions to characterize category meanings and improves performance by iteratively updating these rules and pseudo labels.
http://arxiv.org/abs/2403.02930v1
Compressor summary: The authors thoroughly investigated and replicated the BASS summarization system, finding performance discrepancies and stressing the importance of clear communication in replication studies.
http://arxiv.org/abs/2403.02922v1
Compressor summary: The text describes a new machine learning method that improves the accuracy of retrieving forest variables from satellite data and corrects biases in radiative transfer models.
http://arxiv.org/abs/2403.02920v1
Compressor summary: TaylorShift is a new method that improves the efficiency of attention mechanisms for long sequences in Transformers without sacrificing performance or token-to-token interactions.
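As background on the general trick of Taylor-approximating attention (a generic first-order linearization for illustration — not necessarily TaylorShift's actual scheme): replacing exp(q·k) with 1 + q·k lets the attention numerator and denominator be accumulated in time linear in sequence length:

    import numpy as np

    def taylor_linear_attention(Q, K, V):
        """Attention with exp(q.k) ~= 1 + q.k, O(N) after precomputing summaries."""
        kv = K.T @ V                      # (D, Dv) summary, shared by all queries
        k_sum = K.sum(axis=0)             # (D,)
        num = V.sum(axis=0) + Q @ kv      # sum_j (1 + q.k_j) v_j for each query
        den = K.shape[0] + Q @ k_sum      # sum_j (1 + q.k_j)
        return num / den[:, None]

    rng = np.random.default_rng(0)
    # Small-norm Q, K keep the 1 + q.k weights positive in this toy example.
    Q, K = 0.1 * rng.normal(size=(8, 4)), 0.1 * rng.normal(size=(8, 4))
    out = taylor_linear_attention(Q, K, rng.normal(size=(8, 4)))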
http://arxiv.org/abs/2403.02919v1
Compressor summary: The paper presents CycleDM, a machine learning method that converts machine-printed characters to handwritten ones and vice versa, by combining CycleGAN with a diffusion model.
http://arxiv.org/abs/2403.02914v1
Compressor summary: The paper proposes a novel approach called DynST for optimizing sensor deployment in earth science systems by dynamically filtering important sensors based on spatio-temporal data.
http://arxiv.org/abs/2403.02910v1
Compressor summary: The paper proposes a jailbreak attack on vision language models that can bypass their safety barriers and perform harmful actions using poisoned image-text data pairs.
http://arxiv.org/abs/2403.02909v1
Compressor summary: The paper presents a novel method for predicting gaze direction in low-light conditions using temporal event encoding and a dedicated neural network, achieving high accuracy.
http://arxiv.org/abs/2403.02902v1
Compressor summary: The study uses information flow analysis to show how word- and text-level classifications can mutually improve each other in text classification tasks and applies this knowledge to prompt learning for better predictions.
http://arxiv.org/abs/2403.02901v1
Compressor summary: This paper surveys automatic text summarization (ATS) methods using natural language processing and large language models, focusing on practical real-world implementations.
http://arxiv.org/abs/2403.02899v1
Compressor summary: DAMP is a technique that uses mutual prompting to align visual and textual embeddings for better unsupervised domain adaptation, improving performance on three benchmarks.
http://arxiv.org/abs/2403.02893v1
Compressor summary: The paper proposes a model that can detect causal relations between events in texts across different languages using heterogeneous graphs and contrastive transfer learning.
http://arxiv.org/abs/2403.02892v1
Compressor summary: The paper proposes a framework for long-term person re-identification that uses global and local information, including body part features, to handle clothes-changing scenarios and achieves better performance than existing methods.
http://arxiv.org/abs/2403.02889v1
Compressor summary: The paper proposes a new method to detect hallucinations in large language models, which can help improve their reliability and real-world adoption.
http://arxiv.org/abs/2403.02887v1
Compressor summary: The paper proposes conditional diffusion models as promising decoders for generative image compression, balancing rate and visual quality through adjustable tradeoffs.
http://arxiv.org/abs/2403.02886v1
Compressor summary: The paper reveals that many confidence estimation methods are harmful for detecting misclassification errors, and proposes a new method that enlarges the confidence gap and improves failure prediction performance.
http://arxiv.org/abs/2403.02884v1
Compressor summary: MathScale is a method to create high-quality math reasoning data using large language models, which improves their mathematical problem-solving skills.
http://arxiv.org/abs/2403.02879v1
Compressor summary: Zero-LED is a novel diffusion model that enhances low-light images without paired training data, using bidirectional constraints and frequency-domain based appearance reconstruction.
http://arxiv.org/abs/2403.02877v1
Compressor summary: The paper proposes an active learning method for autonomous driving that efficiently annotates data based on route planning criteria and achieves similar performance to state-of-the-art methods with much less data.
http://arxiv.org/abs/2403.02875v1
Compressor summary: The authors propose a new pretraining method using hard negative text examples and introduce a challenging dataset (InpaintCOCO) to improve fine-grained concept understanding in multimodal models.
http://arxiv.org/abs/2403.02873v1
Compressor summary: The paper presents a method for analyzing probabilistic algorithms with light-tailed randomization by simplifying them to use bounded random variables, which works for various distributions like exponential and sub-Gaussian, and gives examples of its application.
http://arxiv.org/abs/2403.02870v1
Compressor summary: The text discusses how side-channel attacks can reveal vital information about deep learning models, enabling model extraction attacks even without prior knowledge of the target model.
http://arxiv.org/abs/2403.02839v1
Compressor summary: This study compares different judge models' evaluations of LLMs, finding that fine-tuned judge models outperform GPT-4 on in-domain tasks but lag behind it in generalization and fairness.
http://arxiv.org/abs/2403.02833v1
Compressor summary: The paper presents SOFIM, a stochastic optimization method that uses the regularized Fisher information matrix to approximate the Hessian and achieve faster convergence in machine learning model training than existing methods.
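A rough numpy sketch of the general flavor — preconditioning a gradient step with a damped (regularized) empirical Fisher estimate; SOFIM's specific stochastic scheme is not reproduced:

    import numpy as np

    def fisher_preconditioned_step(grads, w, lr=0.1, damping=1e-2):
        """Gradient step preconditioned by a regularized empirical Fisher.

        grads: (N, D) per-sample gradients for the current mini-batch.
        """
        F = grads.T @ grads / grads.shape[0]        # empirical Fisher
        F_reg = F + damping * np.eye(F.shape[0])    # Tikhonov regularization
        return w - lr * np.linalg.solve(F_reg, grads.mean(axis=0))

    rng = np.random.default_rng(0)
    w = fisher_preconditioned_step(rng.normal(size=(32, 5)), np.zeros(5))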
http://arxiv.org/abs/2403.02827v1
Compressor summary: The paper proposes a method to improve image-to-video fidelity by adding noise and rectifying it in a tuning-free way.
http://arxiv.org/abs/2403.02821v1
Compressor summary: The text proposes using hydropower plants as ecosystem protectors by adjusting discharges with a neural network and integrating it into management software, potentially increasing electricity production and mitigating climate change impacts.
http://arxiv.org/abs/2403.02820v1
Compressor summary: The paper proposes a new neural network-based method for reconstructing 3D images of wood logs from 2D slices, improving the identification of biological features.
http://arxiv.org/abs/2403.02818v1
Compressor summary: The paper proposes a sparsely-annotated framework for 3D object detection that requires only one 3D bounding box per scene; its SS3D++ method improves training by generating confident fully-annotated scenes from sparse seeds, achieving competitive or better performance than state-of-the-art methods at significantly lower annotation cost.
http://arxiv.org/abs/2403.02814v1
Compressor summary: InjectTST is a method to improve multivariate time series forecasting by selectively injecting global information into individual channels of a channel-independent Transformer model, enhancing its robustness and performance.
http://arxiv.org/abs/2403.02810v1
Compressor summary: The paper introduces a new operator learning algorithm, DGGO, that can learn parametric PDEs and generalize on arbitrary discretization schemes for discrete mechanics problems using dynamic Gaussian graph kernels and Fourier neural operators.
http://arxiv.org/abs/2403.02799v1
Compressor summary: The paper introduces DPPA, a dual-stage method for merging complex fine-tuned models that combines dynamic pruning with partition amplification to enhance performance while preserving fewer parameters.
http://arxiv.org/abs/2403.02795v1
Compressor summary: The paper proposes using Language Models as educational experts to assess the impact of instructions on learning outcomes: GPT-3.5 replicates well-established educational findings such as the Expertise Reversal Effect and the Variability Effect, and an instruction-optimization approach that uses one LM as a reward function for another generates math word problem worksheets that human teachers judge to align significantly with their preferences.
http://arxiv.org/abs/2403.02786v1
Compressor summary: This study uses graph neural networks to predict fatty liver disease from health checkup data, while providing personalized explanations for better clinical interpretability.
http://arxiv.org/abs/2403.02784v1
Compressor summary: The paper presents a hybrid training strategy and dual-domain image fusion for semantic segmentation of remote sensing images using unsupervised domain adaptation, improving performance through pseudo-label region-specific weighting.
http://arxiv.org/abs/2403.02783v1
Compressor summary: This paper investigates the phase transition in the Quadratic Assignment Problem (QAP), a challenging problem in combinatorial optimization, by introducing a new submodular-based design and analyzing its correlation with solving effort.
http://arxiv.org/abs/2403.02782v1
Compressor summary: The paper introduces KEPP, a system that uses probabilistic procedural knowledge graphs to help agents create strategic plans for real-life tasks based on visual observations and instructions.
http://arxiv.org/abs/2403.02781v1
Compressor summary: The paper proposes an unsupervised domain prompt distillation framework to transfer knowledge from a larger CLIP model to a smaller one using unlabeled images and pre-stored text features.
http://arxiv.org/abs/2403.02780v1
Compressor summary: This paper presents a rigorous theoretical foundation for Non-Readily Identifiable Data Collaboration (NRI-DC) framework to improve the quality and diversity of training datasets while protecting user privacy, and shows that the proposed approach enhances model performance without compromising communication efficiency or privacy protections.
http://arxiv.org/abs/2403.02777v1
Compressor summary: The paper presents a zero-shot learning strategy for autonomous endovascular navigation using reinforcement learning that can generalize to unseen vascular anatomies without retraining and achieve high success rates.
http://arxiv.org/abs/2403.02775v1
Compressor summary: EasyQuant is a training-free, data-independent method for reducing the size of large language models without sacrificing performance or generalization.
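For orientation, a minimal round-to-nearest per-channel weight quantizer — the baseline operation in this space; EasyQuant's own algorithm is not reproduced:

    import numpy as np

    def quantize_per_channel(W, n_bits=4):
        """Symmetric round-to-nearest quantization, one scale per output row."""
        qmax = 2 ** (n_bits - 1) - 1
        scale = np.abs(W).max(axis=1, keepdims=True) / qmax   # per-row scale
        q = np.clip(np.round(W / scale), -qmax - 1, qmax)     # integer weights
        return q.astype(np.int8), scale

    rng = np.random.default_rng(0)
    q, scale = quantize_per_channel(rng.normal(size=(8, 16)).astype(np.float32))
    W_dequant = q * scale   # approximate reconstruction of the original weights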
http://arxiv.org/abs/2403.02772v1
Compressor summary: The paper introduces a novel AI-driven virtual rehabilitation framework that uses supervised contrastive learning with hard and soft negative samples to train a single model applicable to all exercise types, improving generalizability and reducing complexity in rehabilitation exercise assessment.
http://arxiv.org/abs/2403.02769v1
Compressor summary: The paper proposes an unsupervised method to detect humans in complex scenes by transferring knowledge from synthetic data to real data using novel modules for representation and feature alignment.
http://arxiv.org/abs/2403.02767v1
Compressor summary: DeconfuseTrack is a new multi-object tracker that uses Decomposed Data Association and Occlusion-aware Non-Maximum Suppression to reduce confusion when tracking multiple objects across video frames.
http://arxiv.org/abs/2403.02765v1
Compressor summary: The text describes a new machine learning model called G4-attention that uses neural networks and attention layers to accurately predict four-stranded nucleic acid structures involved in various biological roles.
http://arxiv.org/abs/2403.02760v1
Compressor summary: The text discusses how large language models can improve recommendation systems by understanding users' interests and capturing textual information better than deep neural networks.
http://arxiv.org/abs/2403.02757v1
Compressor summary: The paper proposes a novel learning framework called In-memory Learning, where agents improve themselves by refining notes in their memory using natural language.
http://arxiv.org/abs/2403.02756v1
Compressor summary: REGA is a novel approach to adapt Large Language Models for multiple domains without compromising their general capabilities or causing confusion between domains.
http://arxiv.org/abs/2403.02753v1
Compressor summary: This paper presents a novel method for learning features of multi-person activities without manual annotations, using person attributes and location guidance to disentangle the complex features and achieve state-of-the-art performance on two datasets.
http://arxiv.org/abs/2403.02746v1
Compressor summary: Paraformer is a framework that uses a parallel CNN-Transformer feature extractor to map large-scale high-resolution land-cover from low-resolution historical data with weak supervision.
http://arxiv.org/abs/2403.02745v1
Compressor summary: The paper proposes a method to robustly recalibrate values in preference datasets for large language models, handling incomplete and corrupted data.
http://arxiv.org/abs/2403.02742v1
Compressor summary: Hypnos is a Chinese Anesthesia model that improves data quality, fine-tunes with general medicine data, and introduces a standardized benchmark for evaluating medical large language models in anesthesiology.
http://arxiv.org/abs/2403.02738v1
Compressor summary: The paper proposes a novel causal prompting method for large language models to mitigate biases using front-door adjustment, without accessing the model's parameters or logits.
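For reference, the front-door adjustment invoked here is the standard causal-inference identity: with a mediator M that transmits all of X's effect on Y and is unaffected by the latent confounder,

    P(y \mid \mathrm{do}(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m) \, P(x').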
http://arxiv.org/abs/2403.02737v1
Compressor summary: The text introduces Neural FDEs, a new deep learning architecture that adapts Fractional Differential Equations for modelling non-local and memory-dependent systems, potentially outperforming Neural ODEs in such tasks.
http://arxiv.org/abs/2403.02736v1
Compressor summary: The paper presents new cluster-based sampling methods for detecting rare objects in high-resolution images without labeled data or spatial prior, and applies them to identify bomas in the Serengeti Mara region.
http://arxiv.org/abs/2403.02730v1
Compressor summary: The paper proposes a two-stage training method for Neural ODEs that solves constrained optimization problems without hyperparameters, improving model performance and explainability.
http://arxiv.org/abs/2403.02727v1
Compressor summary: The paper shows that large language models can recognize human activities from raw sensor data using appropriate prompts and outperform traditional and deep learning baselines.
http://arxiv.org/abs/2403.02723v1
Compressor summary: The paper proposes a new type of attack on Graph Neural Networks that adaptively finds the minimum perturbation needed to fool each node, and shows its effectiveness on different datasets.
http://arxiv.org/abs/2403.02719v1
Compressor summary: The paper proposes a multi-scale subgraph contrastive learning method for graphs that captures fine-grained semantic information by generating global and local views based on subgraph sampling.
http://arxiv.org/abs/2403.02718v1
Compressor summary: The paper proposes a new framework for incrementally learning relations from data streams that balances preserving old information and acquiring new information, improving performance over existing methods.
http://arxiv.org/abs/2403.02715v1
Compressor summary: The paper presents a Vietnamese language model fine-tuning method and an evaluation framework to improve large language models' effectiveness in processing Vietnamese.
http://arxiv.org/abs/2403.02714v1
Compressor summary: The text introduces a new task (Adaptive Domain Generalization) and dataset (DomainVerse) for vision-language models to adapt to various realistic domains without fine-tuning, and proposes two tuning-free methods (Domain CLIP and Domain++ CLIP).
http://arxiv.org/abs/2403.02713v1
Compressor summary: The paper proposes CoAT, a chain-of-action-thought method that improves smartphone GUI agents by reasoning about semantic information and actions in natural language tasks.
http://arxiv.org/abs/2403.02712v1
Compressor summary: Breeze-7B is an open-source language model that improves Chinese comprehension and chatbot skills, outperforming similar models.
http://arxiv.org/abs/2403.02707v1
Compressor summary: The paper introduces a method to improve medical VQA models by applying gradient-guided parameter perturbations during pre-training and fine-tuning.
http://arxiv.org/abs/2403.02698v1
Compressor summary: Causal Walk is a novel method for debiasing multi-hop fact verification using causal inference and front-door adjustment, which can handle complicated bias patterns hidden in multiple hops of evidence.
http://arxiv.org/abs/2403.02695v1
Compressor summary: The paper proposes an optimization scheme to improve performance on different groups or domains in distribution shifts, using Controllable Prompt Tuning (CPT) to reduce computational cost.
http://arxiv.org/abs/2403.02691v1
Compressor summary: InjecAgent is a benchmark to assess and mitigate indirect prompt injection attacks on tool-integrated large language models, finding them vulnerable to such attacks.
http://arxiv.org/abs/2403.02690v1
Compressor summary: The paper proposes a new method called RENT for learning with noisy labels by resampling based on the noise transition matrix, and shows its superior performance over existing methods.
http://arxiv.org/abs/2403.02689v1
Compressor summary: DCFM is a novel video semantic segmentation approach that leverages feature sharing to address challenges like redundant computation and feature propagation reliability.
http://arxiv.org/abs/2403.02683v1
Compressor summary: The L2D framework adapts to new experts at test-time using meta-learning and attention mechanism for safe and robust autonomous systems.
http://arxiv.org/abs/2403.02682v1
Compressor summary: Time Weaver is a new model that uses various types of contextual metadata to generate more realistic time series for applications like electricity demand forecasting, outperforming existing methods by up to 27%.
http://arxiv.org/abs/2403.02681v1
Compressor summary: The paper proposes a new optimization method for deep neural networks that combines first-order and second-order techniques, improving both accuracy and generalization.
http://arxiv.org/abs/2403.02677v1
Compressor summary: The proposed framework uses fine-tuned multimodal language models to filter image-text data more effectively than existing methods, leading to improved performance on popular models and tasks.
http://arxiv.org/abs/2403.02674v1
Compressor summary: SEEDA is a new dataset for grammatical error correction meta-evaluation that improves correlation by using consistent granularity and considering modern systems.
http://arxiv.org/abs/2403.02649v1
Compressor summary: TiF learner uses diffusion models and low-rank adapters to extract nuanced class attributes from few-shot images for classification.
http://arxiv.org/abs/2403.02648v1
Compressor summary: KATE is a novel optimization algorithm based on AdaGrad that is scale-invariant for Generalized Linear Models, has a convergence rate similar to AdaGrad and Adam, outperforms AdaGrad, and matches or surpasses Adam across various tasks.
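For reference, the AdaGrad update that KATE builds on, in textbook form (KATE's scale-invariant modification is not reproduced here):

    import numpy as np

    def adagrad_step(w, g, G, lr=0.1, eps=1e-8):
        """One AdaGrad step: per-coordinate rates from accumulated squared gradients."""
        G = G + g ** 2
        return w - lr * g / (np.sqrt(G) + eps), G

    w, G = np.zeros(3), np.zeros(3)
    for g in np.random.default_rng(0).normal(size=(10, 3)):
        w, G = adagrad_step(w, g, G)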
http://arxiv.org/abs/2403.02647v1
Compressor summary: The paper presents FinReport, a system that helps ordinary investors collect, analyze, and generate reports on stock earnings using financial news and a multi-factor model.
http://arxiv.org/abs/2403.02640v1
Compressor summary: The paper introduces HoloVIC, a large-scale dataset for vehicle-infrastructure cooperation (VIC) research in autonomous driving that improves roadside perception using multi-sensor holographic systems at different intersections, with four tasks and benchmarks for research purposes.
http://arxiv.org/abs/2403.02639v1
Compressor summary: The paper introduces false-positive sampling, a technique that improves 3D object detection models by retraining them using point clouds predicted as false positives, and shows its effectiveness on KITTI and Waymo Open datasets.
http://arxiv.org/abs/2403.02637v1
Compressor summary: The paper proposes a method called Brain-inspired Streaming Dual-level Perturbations (BSDP) that uses old samples as perturbations to help deep learning models learn new categories without forgetting the old ones, inspired by how humans learn.
http://arxiv.org/abs/2403.02628v1
Compressor summary: The paper proposes a novel Interactive Continual Learning framework that leverages collaborative interactions among different-sized language models, enabling them to better emulate advanced life forms' continual learning abilities.
http://arxiv.org/abs/2403.02626v1
Compressor summary: The proposed framework replaces manual image labeling with natural language interactions to quickly and effectively train classifiers for subjective or nuanced concepts, reducing the need for crowd-sourced annotations and outperforming existing methods.
http://arxiv.org/abs/2403.02624v1
Compressor summary: The paper presents a Pareto-Efficient algorithm for optimal treatment selection by balancing short-term and long-term effects, using Pareto-Optimal Estimation and Policy Learning methods.
http://arxiv.org/abs/2403.02622v1
Compressor summary: The paper reviews world models, a promising approach for autonomous driving systems to predict future scenarios and compensate for information gaps, and discusses their theoretical foundations, practical applications, and ongoing research challenges.
http://arxiv.org/abs/2403.02616v1
Compressor summary: MAD-Transformer is a method to detect and diagnose abnormal behaviors in industrial cyber-physical systems using fine-grained adaptive anomaly diagnosis, capturing temporal, spatial, and series dependencies among multivariate time series.
http://arxiv.org/abs/2403.02615v1
Compressor summary: The MCR benchmark tests LLMs' ability to reason about different types of composition relations in English and five other languages, evaluating their robustness and adaptability.
http://arxiv.org/abs/2403.02611v1
Compressor summary: The paper proposes a unified framework using multi-pyramid transformer and extended frequency contrastive regularization to address defocus blur in microscope imaging, achieving state-of-the-art performance.
http://arxiv.org/abs/2403.02610v1
Compressor summary: The paper describes the second ChatGPT4PCG competition at IEEE Conference on Games, which improves on the first edition by introducing a new evaluation metric, allowing Python submissions, and making other changes to foster prompt engineering for procedural content generation.
http://arxiv.org/abs/2403.02608v1
Compressor summary: The paper introduces DNNLasso, a faster and more accurate method for estimating a sparse Kronecker-sum structure of precision matrices in matrix-variate Gaussian graphical models.
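The Kronecker-sum structure referred to is the standard one in matrix-variate Gaussian graphical models: with row precision \Theta \in \mathbb{R}^{p \times p} and column precision \Psi \in \mathbb{R}^{q \times q}, the precision matrix of the vectorized data is (up to vectorization convention)

    \Omega = \Theta \oplus \Psi = \Theta \otimes I_q + I_p \otimes \Psi,

so sparsity in \Theta and \Psi jointly encodes the row and column conditional-independence graphs.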
http://arxiv.org/abs/2403.02600v1
Compressor summary: The paper proposes TESTAM, a deep learning model that separately captures recurring and non-recurring traffic patterns using a mixture-of-experts approach with three experts for temporal, spatio-temporal, and dynamic dependency modeling.
http://arxiv.org/abs/2403.02598v1
Compressor summary: The paper proposes a Category theory-based approach to handle shifts/imbalances in covariates for overparameterized models, simplifying data analysis and unifying different problem settings.
http://arxiv.org/abs/2403.02586v1
Compressor summary: The authors propose a new approach to improve zero-shot event detection by creating a diverse dataset of event definitions and fine-tuning a LLaMA-2-7B model, achieving better results than GPT-3.5 on open benchmarks.
http://arxiv.org/abs/2403.02581v1
Compressor summary: VEglue is a new testing method for visual entailment systems that uses object alignment and joint erasing to detect issues and improve model performance.
http://arxiv.org/abs/2403.02580v1
Compressor summary: Inverting CLIP models generates images related to the given target prompts, which can reveal insights into their abilities and biases, but may also produce NSFW content.
http://arxiv.org/abs/2403.02573v1
Compressor summary: The paper proposes a learning-augmented online algorithm for resource-constrained wireless communication that balances transmission and staleness costs while ensuring worst-case performance guarantees and good average performance.
http://arxiv.org/abs/2403.02571v1
Compressor summary: DPAdapter is a novel technique that boosts the performance of differentially private machine learning models by improving parameter robustness.
http://arxiv.org/abs/2403.02567v1
Compressor summary: The paper presents xSTREAM, a multilingual dataset for structured reasoning and explanation tasks that reveals LLM performance gaps between English and other languages, and proposes methods that improve multilingual reasoning by incorporating code, using machine translation and step-by-step prompts.
http://arxiv.org/abs/2403.02563v1
Compressor summary: The paper reviews 101 recent papers on sign language AI, revealing significant biases and lack of input from Deaf stakeholders, and calls for more ethical development and inclusion of Deaf perspectives in the field.
http://arxiv.org/abs/2403.02561v1
Compressor summary: SHERT is a novel pipeline for reconstructing semantic human meshes with textures and high-precision details, overcoming challenges faced by current methods in industrial applications.
http://arxiv.org/abs/2403.02558v1
Compressor summary: The text discusses the need for updated guidelines in using and evaluating generative models in clinical AI research, building on the MI-CLAIM checklist.