This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-13, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2403.07874v1
Compressor summary: The authors propose a method to translate images into words using a large language model without fine-tuning, enabling image comprehension and denoising tasks.
http://arxiv.org/abs/2403.07872v1
Compressor summary: The paper proposes an RWQ-Elo rating system for assessing large language models based on a two-player competitive format using real-world questions and demonstrates its advantages over MCQA.
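For background, pairwise rating schemes like this typically build on the classic Elo update. A minimal Python sketch follows; the K-factor k=32 is an illustrative default, not a value from the paper:

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """One Elo update after a pairwise comparison.

    score_a is 1.0 if player/model A wins, 0.5 for a draw, 0.0 for a loss.
    """
    # Expected score for A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    # Move both ratings toward the observed outcome.
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new
```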
http://arxiv.org/abs/2403.07865v1
Compressor summary: The paper introduces CodeAttack, a framework that tests the safety generalization of LLMs by transforming natural language inputs into code inputs, revealing common vulnerabilities in all studied models.
http://arxiv.org/abs/2403.07860v1
Compressor summary: The paper introduces LaVi-Bridge, a pipeline that integrates diverse language and generative vision models for text-to-image generation, improving alignment and quality.
http://arxiv.org/abs/2403.07857v1
Compressor summary: The text discusses how model-induced distribution shifts can cause performance and fairness issues in machine learning models, but also proposes a framework called algorithmic reparation to address these problems and promote equity.
http://arxiv.org/abs/2403.07856v1
Compressor summary: The study uses Quantum Support Vector Machine (QSVM) to improve prostate cancer detection, achieving comparable accuracy and increased sensitivity over classical SVM.
http://arxiv.org/abs/2403.07854v1
Compressor summary: Key points:
- Data pruning can reduce model size and training time but may compromise accuracy
- Knowledge distillation (KD) integrates soft predictions from a teacher network pre-trained on full data to guide the pruned student network
- KD improves pruned models across datasets, pruning methods, and pruning fractions
- There is a trade-off between the pruning factor and the optimal knowledge distillation weight
- Smaller teachers may outperform larger ones for lower pruning fractions
Summary: The paper shows that using knowledge distillation with data pruning can improve accuracy and suggests optimal parameters for different pruning regimes.
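For background, a minimal sketch of the standard knowledge distillation objective that such teacher-student setups build on; this is the generic Hinton-style loss, not the paper's exact recipe, and the temperature T and weight alpha are illustrative hyperparameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Weighted sum of a soft (teacher) term and a hard (label) term."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```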
http://arxiv.org/abs/2403.07851v1
Compressor summary: O-FSCIL is a lightweight, memory-efficient method for incrementally learning new classes from few examples on resource-constrained devices.
http://arxiv.org/abs/2403.07849v1
Compressor summary: EEGL is an iterative algorithm that improves GNNs for node classification by using explanations from subgraph mining to obtain application-dependent features.
http://arxiv.org/abs/2403.07843v1
Compressor summary: Udaan, India's largest B2B ecommerce platform, uses an ensemble of machine learning models to predict and optimize customer order patterns, resulting in a significant increase in order rates.
http://arxiv.org/abs/2403.07842v1
Compressor summary: DP-TLDM is a novel tabular synthesizer that combines an autoencoder with a latent diffusion model to generate high-quality, privacy-preserving synthetic data.
http://arxiv.org/abs/2403.07839v1
Compressor summary: Key points:
- The paper proposes a new pruning framework (MoPE-CLIP) for vision-language pre-trained models
- The MoPE metric assesses module importance by performance decline on cross-modal tasks
- MoPE-CLIP reduces pre-training costs and achieves competitive task-specific performance
Summary: The paper introduces a new pruning method (MoPE-CLIP) that uses a novel metric to measure the importance of modules in vision-language pre-trained models, leading to reduced pre-training costs and high task-specific performance.
http://arxiv.org/abs/2403.07825v1
Compressor summary: The paper proposes GORA, a new method to evaluate and measure the ripple effect in large language models, and SORA, a technique to mitigate this effect by selectively re-editing the model.
http://arxiv.org/abs/2403.07818v1
Compressor summary: The paper proposes a new label dropout technique to improve the robustness of deep learning models for echocardiography segmentation when trained with multiple diverse partially-labelled datasets.
http://arxiv.org/abs/2403.07816v1
Compressor summary: Branch-Train-MiX (BTX) is a method for training large language models with multiple specialized skills by branching a seed model, training experts in parallel, combining their feedforward parameters, and learning token-level routing.
http://arxiv.org/abs/2403.07815v1
Compressor summary: Chronos is a framework for pretraining probabilistic time series models that use transformer-based language models and tokenized time series data, achieving high zero-shot performance on various forecasting tasks.
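To illustrate the tokenization idea, here is a simplified sketch of mean scaling plus uniform binning; it captures the spirit of quantizing a real-valued series into a fixed vocabulary but is not Chronos's exact scheme, and n_bins and clip are illustrative values:

```python
import numpy as np

def tokenize_series(values, n_bins=1024, clip=15.0):
    """Map a real-valued series to integer tokens via mean scaling + binning."""
    scale = np.abs(values).mean() + 1e-8         # mean-absolute scaling
    scaled = np.clip(values / scale, -clip, clip)
    bins = np.linspace(-clip, clip, n_bins - 1)  # uniform bin edges
    tokens = np.digitize(scaled, bins)           # ints in [0, n_bins - 1]
    return tokens, scale                         # keep scale for decoding
```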
http://arxiv.org/abs/2403.07809v1
Compressor summary: pyvene is a Python library that allows customizable interventions on different PyTorch modules for various AI applications, including interpretability and robustness.
http://arxiv.org/abs/2403.07807v1
Compressor summary: StyleGaussian is a fast 3D style transfer method that uses 3D Gaussian Splatting, VGG features, and a K-nearest-neighbor-based 3D CNN to achieve high quality stylization with real-time rendering and multi-view consistency.
http://arxiv.org/abs/2403.07805v1
Compressor summary: Key points:
- LMs are effective in NLP tasks, especially knowledge-intensive ones
- The paper investigates whether LMs access their memory sequentially or randomly
- LMs can access memory sequentially but struggle with random access
- Recitation and permutation techniques improve random access
- Improved random access helps question answering
Summary: The paper explores how LMs access their memory in different scenarios, and proposes recitation and permutation to enhance their random access, which benefits question answering.
http://arxiv.org/abs/2403.07798v1
Compressor summary: The text proposes a Fourier method (FTF) to improve unsupervised domain adaptation by fusing low-level information from both domains and aligning multiple sources of data, achieving better generalization and performance on four benchmark datasets.
http://arxiv.org/abs/2403.07797v1
Compressor summary: Jam-pgm is a new technique that allows graphical model-based synthetic data generation to use public data, improving quality and outperforming other methods, even with biased public data distributions.
http://arxiv.org/abs/2403.07794v1
Compressor summary: The paper proposes sequential instruction tuning to improve large language models' ability to follow multiple instructions in complex tasks, and analyzes its effects on various factors.
http://arxiv.org/abs/2403.07773v1
Compressor summary: SemCity generates realistic outdoor scenes using a 3D diffusion model with triplane representation and manipulation, improving performance on tasks like inpainting and semantic completion.
http://arxiv.org/abs/2403.07769v1
Compressor summary: The article discusses using large language models and multi-agent systems theory to create artificial agents that can simulate human interactions and support various organizational processes, overcoming some limitations of traditional approaches.
http://arxiv.org/abs/2403.07764v1
Compressor summary: Stable-Makeup is a novel diffusion-based method that transfers realistic and detailed makeup to user-provided faces, preserving content and structure, and showing strong robustness and generalizability for various tasks.
http://arxiv.org/abs/2403.07750v1
Compressor summary: Our method synthesizes image-text pairs using LLMs and image generation models, improving VLM training efficiency and performance on image captioning tasks.
http://arxiv.org/abs/2403.07747v1
Compressor summary: FineMath is a new benchmark dataset for evaluating Chinese LLMs' mathematical reasoning skills on diverse elementary school math problems with different difficulty levels.
http://arxiv.org/abs/2403.07746v1
Compressor summary: HyDRa is a novel camera-radar fusion architecture that improves depth prediction and 3D perception for autonomous driving in diverse conditions, achieving state-of-the-art results on nuScenes and Occ3D benchmarks.
http://arxiv.org/abs/2403.07741v1
Compressor summary: The paper proposes a method to quantify uncertainty in multi-stage 6D object pose estimation using deep ensembles and evaluates it on SurfEmb, a top-performing approach.
http://arxiv.org/abs/2403.07733v1
Compressor summary: DSEG-LIME is a new method to improve image analysis by creating more accurate and consistent explanations of complex machine learning models using data-driven segmentation.
http://arxiv.org/abs/2403.07726v1
Compressor summary: The paper describes SHROOM, a shared task on detecting inaccurate NLG outputs across three tasks, and reports trends and results from the 42 participating teams.
http://arxiv.org/abs/2403.07724v1
Compressor summary: The paper presents a framework to analyze how data restrictions affect the accuracy and fairness of Bayesian classifiers under various scenarios and fairness definitions.
http://arxiv.org/abs/2403.07723v1
Compressor summary: Shuffling gradient methods, such as Random Reshuffle and Shuffle Once, have good empirical performance but lacked theoretical guarantees until now; researchers prove last-iterate convergence rates without strong convexity.
http://arxiv.org/abs/2403.07720v1
Compressor summary: Key points:
- The paper introduces Large Multi-modal Models (LMMs) that combine text and image features for various vision tasks
- The paper proposes visual words, which map visual features to text vocabulary, providing supervision information
- The paper experiments with 5 VQA tasks and shows the effectiveness of the proposed approach
Summary: The paper presents a novel LMM that uses visual words to supervise image modelling and achieves state-of-the-art results on 5 VQA tasks.
http://arxiv.org/abs/2403.07719v1
Compressor summary: The authors propose a novel dynamic graph representation algorithm for histopathological whole slide images classification that captures both instance relationships and spatial interactions using a knowledge-aware attention mechanism.
http://arxiv.org/abs/2403.07718v1
Compressor summary: The study measures large language models' abilities to interact with enterprise software using WorkArena benchmark and BrowserGym environment, finding promise but significant gaps and disparities.
http://arxiv.org/abs/2403.07714v1
Compressor summary: StableToolBench is a benchmark for testing LLMs with external tools that uses a virtual API server, a caching system, and an automatic evaluator to ensure stability and fairness.
http://arxiv.org/abs/2403.07711v1
Compressor summary: Key points:
- Diffusion models for video generation use attention layers but have memory limitations
- State-space models (SSMs) are proposed as alternatives with linear memory consumption
- SSMs achieve competitive results on UCF101 and MineRL Navigate datasets
Summary: The paper proposes using state-space models for video generation instead of attention layers, which saves memory and maintains performance.
http://arxiv.org/abs/2403.07708v1
Compressor summary: The authors propose a penalty term called contrastive rewards to make reward models more effective in reinforcement learning from human feedback, which improves robustness, calibration, and performance.
http://arxiv.org/abs/2403.07706v1
Compressor summary: FBI is a fast and simple XAI method for point cloud data that enables better understanding of the network properties, online feedback, and improved classification explainability.
http://arxiv.org/abs/2403.07705v1
Compressor summary: The paper proposes a method to fine-tune stereo matching networks without losing their robustness to unseen domains by using pseudo labels and a dynamic framework.
http://arxiv.org/abs/2403.07704v1
Compressor summary: Symmetric Q-learning improves deep reinforcement learning by creating a Gaussian error distribution from skewed noise, increasing sample efficiency on continuous control tasks in MuJoCo.
http://arxiv.org/abs/2403.07700v1
Compressor summary: VoteCut is a new method that uses multiple self-supervised models to discover objects without labels and improves image segmentation with CuVLER, a zero-shot model trained on pseudo-labels.
http://arxiv.org/abs/2403.07693v1
Compressor summary: The paper proposes a data augmentation framework for opinion summarization that uses both large and small language models to generate synthetic negative reviews and balance the sentiment distribution of the dataset.
http://arxiv.org/abs/2403.07692v1
Compressor summary: The paper proposes Masked AutoDecoder (MAD), a multi-task vision generalist that uses bi-directional attention and masked sequence modeling to unify different vision tasks in parallel, achieving better performance and efficiency than autoregressive models.
http://arxiv.org/abs/2403.07691v1
Compressor summary: The paper introduces ORPO, a reference model-free algorithm that improves language models by fine-tuning them with odds ratios, achieving better performance than state-of-the-art models.
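To make the odds-ratio idea concrete, here is a hedged sketch of an ORPO-style objective based on my reading of the method, not a definitive implementation; avg_logp_chosen and avg_logp_rejected stand for mean token log-probabilities of the preferred and rejected responses, and lam is an illustrative weight:

```python
import torch
import torch.nn.functional as F

def odds_ratio_loss(avg_logp_chosen, avg_logp_rejected, nll_chosen, lam=0.1):
    """SFT loss plus a log-odds-ratio penalty favoring the chosen response."""
    # log odds(y|x) = log p - log(1 - p), computed stably from log p.
    log_odds = (avg_logp_chosen - torch.log1p(-torch.exp(avg_logp_chosen))) \
             - (avg_logp_rejected - torch.log1p(-torch.exp(avg_logp_rejected)))
    # Encourage the chosen response to have much higher odds than the rejected one.
    ratio_term = -F.logsigmoid(log_odds)
    return nll_chosen + lam * ratio_term.mean()
```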
http://arxiv.org/abs/2403.07688v1
Compressor summary: The paper reassesses dying neurons in deep neural networks, showing that they can be useful for structured pruning and compression, and introduces Demon Pruning, a simple and effective method to control them.
http://arxiv.org/abs/2403.07687v1
Compressor summary: Key points:
- Current foundation models have imbalanced geographical and economic representation in their training data
- More data from underrepresented countries is needed to improve model performance and reduce annotation costs
- The paper proposes methods to identify the data to be annotated based on visual distinctiveness and similarity
- The resulting lists of countries and topics are available online
Summary: The paper presents methods that balance model performance and annotation costs by selecting data to annotate from underrepresented countries, based on visual distinctiveness and similarity to the current training data of foundation models.
http://arxiv.org/abs/2403.07684v1
Compressor summary: The paper proposes a method for removing adverse weather conditions from videos using test-time adaptation and diffusion-based network, which improves generalization to unseen weather conditions.
http://arxiv.org/abs/2403.07678v1
Compressor summary: MoralBERT models capture moral nuances in text using annotated data from Twitter, Reddit, and Facebook, improving prediction accuracy compared to traditional methods.
http://arxiv.org/abs/2403.07669v1
Compressor summary: This chapter reviews machine learning methods for soccer match outcome prediction, highlighting current best-performing models, gaps in comparison of deep learning and Random Forest, and potential improvements in features and interpretability.
http://arxiv.org/abs/2403.07657v1
Compressor summary: Bayesian Neural Field (BayesNF) is a statistical model combining a deep neural network and Bayesian inference for spatiotemporal data analysis, outperforming existing methods on large-scale climate and public health datasets.
http://arxiv.org/abs/2403.07652v1
Compressor summary: The paper proposes a dynamic expert selection method for Mixture of Experts models that adjusts the number of activated experts based on input difficulty, improving efficiency and performance.
http://arxiv.org/abs/2403.07636v1
Compressor summary: The paper proposes a novel VLP framework that dissects disease descriptions into aspects, aligns images with them, and improves detection of known and unknown diseases using a dual-head Transformer.
http://arxiv.org/abs/2403.07632v1
Compressor summary: The paper introduces CardioGenAI, a machine learning framework that can re-engineer drugs to reduce their potential to cause heart problems by interfering with a specific ion channel, while maintaining their effectiveness.
http://arxiv.org/abs/2403.07630v1
Compressor summary: This paper proposes a method called CPAL to improve semantic segmentation by using context-aware prototypes that capture diverse object features and reduce knowledge bias between instances and contexts.
http://arxiv.org/abs/2403.07622v1
Compressor summary: The study proposes a novel latent mapping network based on variational auto-encoder (VAE) to enhance compressed dark images while preserving texture details and avoiding compression artifacts amplification.
http://arxiv.org/abs/2403.07621v1
Compressor summary: The paper proposes using deep learning to classify locations in smart museums and aquariums with smartphone images, achieving high precision and showing good feasibility for indoor tourism attractions.
http://arxiv.org/abs/2403.07611v1
Compressor summary: The paper introduces new machine unlearning algorithms that selectively erase knowledge from trained models while preserving performance and avoiding post fine-tuning.
http://arxiv.org/abs/2403.07605v1
Compressor summary: NegOpt is a novel method that optimizes negative prompt generation for text-to-image models using supervised fine-tuning and reinforcement learning, improving image quality by 25% on average.
http://arxiv.org/abs/2403.07603v1
Compressor summary: The paper proposes a new probabilistic method for partial multi-label learning (PML) that improves performance over existing methods, especially in noisy environments.
http://arxiv.org/abs/2403.07601v1
Compressor summary: The paper proposes a novel approach called LCFD that discovers causal relationships between latent variables and model decisions for unified source-free domain adaptation, achieving state-of-the-art results.
http://arxiv.org/abs/2403.07598v1
Compressor summary: Mondrian is a system that improves object detection on high-resolution videos by selectively processing relevant pixels and combining them efficiently on accelerators like GPUs, achieving higher accuracy and throughput than existing methods.
http://arxiv.org/abs/2403.07593v1
Compressor summary: MinkUNeXt is an efficient 3D place-recognition architecture based on 3D sparse convolutions that surpasses current methods without resorting to more complex mechanisms such as Transformers or attention layers.
http://arxiv.org/abs/2403.07592v1
Compressor summary: TRIPLEX is a deep learning framework that predicts spatial gene expression from images, improving on current models and aiding in cancer diagnosis and treatment.
http://arxiv.org/abs/2403.07591v1
Compressor summary: The text proposes a new algorithm (RoBoT) that combines and optimizes existing training-free metrics using Bayesian optimization to improve the search performance of neural network design, especially for diverse tasks.
http://arxiv.org/abs/2403.07589v1
Compressor summary: Key points:
- Large-kernel ConvNets have appealing performance but face challenges from the quadratic complexity of convolution and proliferating parameters
- The paper proposes peripheral convolution, inspired by human vision, which reduces the parameter count and complexity of convolution
- The paper also introduces PeLK, a large-kernel network that outperforms modern vision Transformers and ConvNet architectures on various tasks
Summary: The paper presents peripheral convolution, a novel CNN method inspired by human peripheral vision, and PeLK, a large-kernel network that achieves superior performance on vision tasks with extremely large kernels.
http://arxiv.org/abs/2403.07588v1
Compressor summary: The paper studies how image reconstruction attacks on machine learning models depend on real-world image priors and suggests using diffusion models to assess privacy risks under differential privacy.
http://arxiv.org/abs/2403.07587v1
Compressor summary: The authors propose a novel system that allows users to define and automate their data terms of use policies for decentralized web applications like Solid, ensuring better control over their data and improving privacy and usability.
http://arxiv.org/abs/2403.07581v1
Compressor summary: The paper proposes a method to detect personality traits in social media posts using a large language model, text augmentations, and contrastive learning, achieving better results than existing methods.
http://arxiv.org/abs/2403.07578v1
Compressor summary: The paper proposes a self-supervised learning model for assessing children's paintings aesthetics, using a novel dataset with labeled attributes and outperforming other methods.
http://arxiv.org/abs/2403.07576v1
Compressor summary: Fine-grained Prompt Tuning (FPT) is a novel method for medical image classification that reduces memory consumption by using a lightweight side network and fine-grained prompts to access pre-trained knowledge from large-scale models.
http://arxiv.org/abs/2403.07570v1
Compressor summary: The paper presents an active contour model with a hybrid signed pressure function that combines global and local information to improve image segmentation in complex environments.
http://arxiv.org/abs/2403.07567v1
Compressor summary: The paper presents T2X, a new data-to-text dataset for isiXhosa, introduces the SSPG model for agglutinative languages, and evaluates various methods for generating text from data.
http://arxiv.org/abs/2403.07566v1
Compressor summary: This paper proposes a novel DRL algorithm for blood glucose control that uses multi-step learning and Prioritized Experience Replay, achieving better time-in-range results than benchmark methods.
http://arxiv.org/abs/2403.07560v1
Compressor summary: AMMNet is a novel framework for semantic scene completion that uses cross-modal modulation and adversarial training to improve feature learning and generalization from single-view RGB-D images.
http://arxiv.org/abs/2403.07557v1
Compressor summary: The study compares GPT-3.5 and GPT-4 for detecting inconsistencies in summaries and proposes SIFiD, a method to identify key sentences for inconsistency detection using LLMs.
http://arxiv.org/abs/2403.07556v1
Compressor summary: TACS is a method to help large language models generate better text by filtering out untruthful information from the input context.
http://arxiv.org/abs/2403.07548v1
Compressor summary: The text proposes two realistic scenarios for learning embodied agents and introduces the Confidence-Aware Moving Average (CAMA) method to update logits without task boundary information.
http://arxiv.org/abs/2403.07547v1
Compressor summary: SMURF is a novel method that uses Neural-ODEs to model continuous camera motion and volumetric representation for synthesizing high-fidelity views with robustness to motion blur.
http://arxiv.org/abs/2403.07544v1
Compressor summary: The MAMMOTH toolkit is a framework for training modular multilingual machine translation systems efficiently across clusters of GPUs and is publicly available online.
http://arxiv.org/abs/2403.07542v1
Compressor summary: The text surveys how transformer models, originally successful in natural language processing, are being adapted for autonomous driving tasks, such as object detection and scene recognition, owing to their advantages in processing dynamic visual scenes.
http://arxiv.org/abs/2403.07536v1
Compressor summary: LaB-GATr is a transformer neural network that can effectively learn from large-scale medical 3D models by using geometric tokenisation, sequence compression and interpolation, achieving state-of-the-art results in cardiovascular hemodynamics modelling and neurodevelopmental phenotype prediction.
http://arxiv.org/abs/2403.07535v1
Compressor summary: The paper proposes a new fused depth estimation system that adaptively integrates multi-view and single-view results for robustness against noisy camera poses and challenging conditions.
http://arxiv.org/abs/2403.07532v1
Compressor summary: The paper presents a method for autonomous systems to identify and classify novel objects in real-world images without extra training data, enabling better decision-making in tasks like planning or mapping.
http://arxiv.org/abs/2403.07518v1
Compressor summary: The paper proposes Pseudo-OCR, an open-vocabulary text recognition framework that uses character detection and image inpainting to generate pseudo OOV training data from real images and a quality-aware margin loss to train with both real and pseudo data.
http://arxiv.org/abs/2403.07516v1
Compressor summary: The paper proposes a new method to generate realistic RGBD samples using Diffusion4D, which improves deep learning models' performance on monocular depth estimation tasks.
http://arxiv.org/abs/2403.07514v1
Compressor summary: CUDGNet is a novel model that uses contrastive learning and domain generation to improve single domain generalization and provide uncertainty estimation.
http://arxiv.org/abs/2403.07513v1
Compressor summary: The paper proposes two methods for analyzing temporal patterns in medical data using deep learning, improving prognosis and diagnosis of conditions like AMD and cardiac output.
http://arxiv.org/abs/2403.07510v1
Compressor summary: The paper proposes a novel "relevance score" for heuristic planning that identifies actions or facts important for most but not all plans to achieve a goal, and shows its improved performance compared to the standard landmark-based approach on problems without clear landmarks.
http://arxiv.org/abs/2403.07508v1
Compressor summary: MoAI is a new large language and vision model that uses auxiliary computer vision information for better real-world scene understanding in zero-shot tasks.
http://arxiv.org/abs/2403.07503v1
Compressor summary: The paper proposes a mathematical expression for constrained optimal fuel consumption in hybrid electric vehicles using constrained reinforcement learning and compares two mainstream approaches, finding that Lagrangian-based methods achieve lower fuel consumption with more oscillations than variational policy optimization.
http://arxiv.org/abs/2403.07501v1
Compressor summary: Dev-Assist is an IntelliJ IDEA plugin that uses multi-label machine learning to detect security-relevant methods in code and automatically configure static analysis tools with better performance than related approaches.
http://arxiv.org/abs/2403.07500v1
Compressor summary: The paper improves personalization and stylization in text-to-image generation by applying LoRA fine-tuning to individual blocks of a diffusion model.
http://arxiv.org/abs/2403.07487v1
Compressor summary: Motion Mamba is a novel approach that combines Hierarchical Temporal Mamba and Bidirectional Spatial Mamba blocks to efficiently generate high-quality human motion sequences using state space models.
http://arxiv.org/abs/2403.07486v1
Compressor summary: XpertAI is a framework that helps explain regression models by breaking them into range-specific sub-strategies and allowing precise queries as linear combinations of those sub-strategies.
http://arxiv.org/abs/2403.07483v1
Compressor summary: The paper proposes a non-invasive diabetes diagnosis method using a neural network with batch normalization, data re-sampling and balancing, which improves accuracy compared to traditional machine learning methods.
http://arxiv.org/abs/2403.07472v1
Compressor summary: The study shows that using deep learning models with a balanced presence-only loss function can better model rare species in citizen science datasets, helping with biodiversity conservation.
http://arxiv.org/abs/2403.07469v1
Compressor summary: Key points:
- 3D dense captioning is a vision-language bridging task that generates detailed descriptions for 3D scenes
- The paper reviews existing methods, provides a standard pipeline, introduces a taxonomy, and proposes future directions
- The aim is to facilitate research and applications in multimedia and related domains
Summary: The paper surveys 3D dense captioning, a task that creates accurate descriptions of 3D scenes, and presents a comprehensive review of methods, tools, challenges, and opportunities.
http://arxiv.org/abs/2403.07460v1
Compressor summary: The paper compares various prediction models for time-to-event analysis, evaluates their performance on three datasets, and explores how ensemble methods can improve accuracy and robustness.
http://arxiv.org/abs/2403.07456v1
Compressor summary: The authors propose a unified mathematical framework for multi-view autoencoders and extend the multi-view-AE Python library to provide consistent implementations and improve performance for modelling multi-modal data.
http://arxiv.org/abs/2403.07442v1
Compressor summary: The paper proposes a method for domain adaptation under distribution shift using proximal causal learning and proxy variables, and shows its effectiveness in two settings: Concept Bottleneck and Multi-domain.
http://arxiv.org/abs/2403.07440v1
Compressor summary: MTLoRA is a new matrix transformation-based method for efficient fine-tuning of LPLMs that mimics brain function geometry to improve performance on NLU, NLG, and other downstream tasks.
http://arxiv.org/abs/2403.07437v1
Compressor summary: The paper proposes a geometric feature method for object pose estimation that works without category information, achieving similar results to category-based methods.
http://arxiv.org/abs/2403.07436v1
Compressor summary: The paper proposes a new method for event-based moving object detection that uses joint spatio-temporal reasoning and improves accuracy by 13%.
http://arxiv.org/abs/2403.07432v1
Compressor summary: The text proposes a novel framework that uses event data as a bridge between RGB and LiDAR sensors to fuse cross-modal knowledge for scene flow, improving visual and motion features.
http://arxiv.org/abs/2403.07420v1
Compressor summary: DragAnything is a user-friendly motion control method for any object in video generation that uses entity representation and trajectory-based interaction.
http://arxiv.org/abs/2403.07413v1
Compressor summary: The paper proposes online learning algorithms tailored for specific tasks like caching and scheduling, improving performance and robustness by integrating machine learning prediction into the algorithm design.
http://arxiv.org/abs/2403.07408v1
Compressor summary: The paper proposes a novel nighttime image dehazing method using severe augmentation during training, which improves robustness to real-world degradations and achieves state-of-the-art performance.
http://arxiv.org/abs/2403.07407v1
Compressor summary: In-context learning allows GPT-4V, a large vision language model, to perform well on three cancer histopathology tasks with minimal data and no task-specific fine-tuning.
http://arxiv.org/abs/2403.07406v1
Compressor summary: FeTrIL++ is an improved framework for class-incremental learning that balances accuracy for new and past classes using oversampling techniques and dynamic optimization strategies.
http://arxiv.org/abs/2403.07404v1
Compressor summary: The study explores how early-exit networks can be adapted for continual learning, improving efficiency and performance in class-incremental settings with a simple method called Task-wise Logits Correction (TLC).
http://arxiv.org/abs/2403.07403v1
Compressor summary: The paper introduces two new benchmarks for food recognition from daily-life scenarios and a baseline method to improve transferability of existing methods.
http://arxiv.org/abs/2403.07398v1
Compressor summary: COM2 is a new dataset that helps language models improve their ability to reason about complex events by generating questions from a commonsense knowledge graph.
http://arxiv.org/abs/2403.07392v1
Compressor summary: ViT-CoMer is a plain, pre-training-free, and feature-enhanced ViT backbone that improves dense prediction tasks by injecting spatial pyramid multi-receptive field convolutional features and proposing a simple CNN-Transformer bidirectional fusion interaction module.
http://arxiv.org/abs/2403.07389v1
Compressor summary: The paper proposes a new method for translating chromogenic immunohistochemistry images to fluorescence images using a novel training design and an auxiliary unpaired image domain, which improves segmentation performance compared to existing methods.
http://arxiv.org/abs/2403.07384v1
Compressor summary: The SmallToLarge (S2L) method improves data efficiency in supervised fine-tuning for specialized domains by selecting data based on training trajectories from small models, achieving better results with less data and a smaller reference model.
http://arxiv.org/abs/2403.07380v1
Compressor summary: The Gabformer combines CNNs and Transformers with Gabor filters to improve image deraining by enhancing local texture features and robustness to noise.
http://arxiv.org/abs/2403.07379v1
Compressor summary: The paper proposes a new way to understand neural networks by analyzing their optimization trajectories and reveals how different optimization choices affect their behavior.
http://arxiv.org/abs/2403.07378v1
Compressor summary: SVD-LLM is a new LLM compression method that improves over existing SVD-based methods by using truncation-aware data whitening and layer-wise model parameter update to reduce compression loss and maintain accuracy.
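For context, SVD-based compression starts from a truncated factorization of each weight matrix; the sketch below shows only that baseline step, without the paper's truncation-aware whitening or layer-wise parameter updates, and the rank is an illustrative choice:

```python
import torch

def truncated_svd_factorize(weight, rank):
    """Replace an (out, in) weight matrix with two low-rank factors.

    A linear layer y = W x becomes y = A (B x), with A: (out, rank)
    and B: (rank, in), cutting parameters when rank is small enough.
    """
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # fold the top singular values into A
    B = Vh[:rank, :]
    return A, B
```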
http://arxiv.org/abs/2403.07376v1
Compressor summary: This paper proposes NavCoT, a novel strategy to train large language models for vision-and-language navigation tasks by improving their navigational reasoning and interpretability using a chain-of-thought approach.
http://arxiv.org/abs/2403.07372v1
Compressor summary: The ECFusion method addresses extrinsic and inherent cross-modal conflicts in 3D object detection by aligning spatial distributions and preserving objectness clues, leading to improved performance on the nuScenes dataset.
http://arxiv.org/abs/2403.07371v1
Compressor summary: The study presents a new diffusion-based method for virtual try-on that preserves clothing texture, retains user identity, and is significantly faster than existing approaches.
http://arxiv.org/abs/2403.07369v1
Compressor summary: TextGCD is a multi-modality framework that uses visual-language models to generate descriptive texts for images and leverages textual-visual disparities for novel visual category discovery, achieving superior performance over existing methods.
http://arxiv.org/abs/2403.07366v1
Compressor summary: The text introduces a new test-time adaptation method called DeYO that uses a novel confidence metric based on object shape to mitigate error accumulation in online updates.
http://arxiv.org/abs/2403.07363v1
Compressor summary: The paper introduces intuitionistic fuzzy random forest, a new classification method that combines random forest and fuzzy logic, and shows its superior performance compared to existing algorithms.
http://arxiv.org/abs/2403.07362v1
Compressor summary: The text introduces a new evaluative approach for machine unlearning that identifies the worst-case data subset to erase, using bi-level optimization and experiments on various datasets and models.
http://arxiv.org/abs/2403.07359v1
Compressor summary: The Few-point Shape Completion (FSC) model uses a dual-branch feature extractor and a two-stage revision network to recover 3D shapes from very sparse point clouds, outperforming previous methods and generalizing well to different objects.
http://arxiv.org/abs/2403.07356v1
Compressor summary: The paper proposes a method to generate synthetic data based on text descriptions and use it for pre-training models, which improves their performance in continual learning tasks.
http://arxiv.org/abs/2403.07354v1
Compressor summary: The text introduces BID, an unsupervised framework that partitions motion sequences into meaningful pre-action segments, improving action localization and understanding performance.
http://arxiv.org/abs/2403.07353v1
Compressor summary: GraphRevoker is a novel framework that improves the unlearning process of GNNs by preserving graph properties and aggregating sub-models effectively.
http://arxiv.org/abs/2403.07350v1
Compressor summary: The paper introduces KEBench, a new benchmark for knowledge editing in large vision-language models, with an extended metric (Portability) and improved image data quality to better evaluate model performance.
http://arxiv.org/abs/2403.07347v1
Compressor summary: FD4MM is a new approach to reveal subtle motions in videos by separating and enhancing low-frequency motion fields and high-frequency details using sparse filters and contrastive regularization, achieving better performance and efficiency than existing methods.
http://arxiv.org/abs/2403.07346v1
Compressor summary: EvRGBHand is a novel approach for 3D hand mesh reconstruction that combines an event camera and an RGB camera to overcome each other's limitations and improve performance in challenging scenarios.
http://arxiv.org/abs/2403.07342v1
Compressor summary: ASTE is a subtask of sentiment analysis that extracts structured sentiment triplets from text, and the proposed method uses a novel tagging scheme and contrastive learning to improve performance over existing approaches and LLMs.
http://arxiv.org/abs/2403.07339v1
Compressor summary: The authors propose a method (IM-Unpack) to represent heavy hitters in GEMM matrices with low bit-width integers by unpacking them into multiple smaller matrices, achieving efficiency gains and parity with floating point calculations.
http://arxiv.org/abs/2403.07332v1
Compressor summary: LMa-UNet is a novel medical image segmentation method that leverages large windows and a hierarchical Mamba block to achieve efficient long-range dependency modeling with linear complexity.
http://arxiv.org/abs/2403.07329v1
Compressor summary: The paper proposes UDIM, a domain generalization method that minimizes loss inconsistency between source and perturbed domains by perturbing instances from the source dataset and combining it with SAM optimization.
http://arxiv.org/abs/2403.07326v1
Compressor summary: The paper introduces Gray code in event-based structured light systems, enabling fast and accurate depth estimation with high-speed projection and spatio-temporal encoding.
http://arxiv.org/abs/2403.07321v1
Compressor summary: Key points:
- The paper introduces GRiD, a dataset for detecting ChatGPT-generated text
- The dataset contains Reddit context-prompt pairs with both human and ChatGPT responses
- GpTen is a new semi-supervised tensor-based detection method that performs well on the dataset
Summary: The paper presents GRiD, a novel dataset to detect ChatGPT-generated text from Reddit contexts, and proposes GpTen, a semi-supervised tensor-based detection method.
http://arxiv.org/abs/2403.07319v1
Compressor summary: The proposed method improves image restoration by efficiently shifting between high-quality and low-quality images using a Markov chain and a flexible noise schedule, achieving superior or comparable results to current methods with fewer sampling steps.
http://arxiv.org/abs/2403.07311v1
Compressor summary: The paper presents a new method to predict multiple links in knowledge graphs using natural language processing techniques, such as chain-of-thought prompting and in-context learning, which improve the performance and generalization of large language models.
http://arxiv.org/abs/2403.07309v1
Compressor summary: The paper proposes POSNEGDM, a reinforcement learning framework that uses positive and negative demonstrations and individual patient characteristics to guide sepsis treatment, achieving higher survival rates than existing methods.
http://arxiv.org/abs/2403.07308v1
Compressor summary: The paper proposes a holistic approach to learn barrier functions for system safety with finite-step termination guarantees, by first learning an NN basis function and then fine-tuning it with convexity and counterexamples from verification failure.
http://arxiv.org/abs/2403.07304v1
Compressor summary: Lumen is a large multimodal model that enhances perception capabilities by decoupling learning into task-agnostic and task-specific stages, improving performance on various visual tasks.
http://arxiv.org/abs/2403.07301v1
Compressor summary: LLaMS generates high-quality, multimodal stories from image streams using commonsense knowledge, textual reasoning, and story illustration.
http://arxiv.org/abs/2403.07300v1
Compressor summary: The LLaTA framework aligns language models and time series data to improve multivariate forecasting by leveraging both static and dynamic knowledge from large language models.
http://arxiv.org/abs/2403.07294v1
Compressor summary: Graph Data Condensation via Self-expressive Graph Structure Reconstruction (GCSR) is a novel framework that condenses large-scale graphs by incorporating the original graph structure and reconstructing an interpretable synthetic graph.