This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-27 generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2403.16707v1
Compressor summary: The paper proposes a new technique for domain incremental learning with deep neural networks when only one sample from a new domain is available, by addressing issues in batch normalization layers.
http://arxiv.org/abs/2403.16702v1
Compressor summary: This paper introduces ProCQA, a mixed-modal QA dataset from StackOverflow, and proposes a modality-agnostic contrastive pre-training method to improve code language models' text and code alignment for code question answering.
http://arxiv.org/abs/2403.16697v1
Compressor summary: Dynamic PromptStyler (DPStyler) is a model that improves source-free domain generalization by updating styles, removing style variations, and ensembling models.
http://arxiv.org/abs/2403.16685v1
Compressor summary: ToXCL is a framework that detects and explains implicit toxic speech in online posts, improving on previous models by generating targeted demographic groups and using knowledge distillation for better detection.
http://arxiv.org/abs/2403.16680v1
Compressor summary: The paper presents a new method for continuous convolutions in fluid simulations using separable basis functions, particularly Fourier-based networks, which improve accuracy and stability over existing methods.
http://arxiv.org/abs/2403.16677v1
Compressor summary: FOOL is a novel feature compression method for nanosatellites that reduces transfer costs by processing raw satellite images onboard, leveraging inter-tile dependencies and context, while maintaining high prediction performance and perceptual quality.
http://arxiv.org/abs/2403.16668v1
Compressor summary: This paper uses computational methods to study how prevalent bragging is on Twitter in the U.S., how it changes, and how demographics influence it, and identifies different bragging topics related to user characteristics.
http://arxiv.org/abs/2403.16667v1
Compressor summary: This paper explores using deep reinforcement learning to optimize investment portfolios while considering environmental, social, and governance (ESG) goals.
http://arxiv.org/abs/2403.16662v1
Compressor summary: The text proposes a method using a Large Language Model to retrieve and summarize evidence for explainable fact-checking systems, and introduces RU22Fact, a multilingual dataset on the Russia-Ukraine conflict in 2022 with claims, evidence, and explanations.
http://arxiv.org/abs/2403.16656v1
Compressor summary: GraphAug is a framework that improves recommendation systems by generating denoised self-supervised signals and adapting contrastive view generation using graph information bottleneck regularization.
http://arxiv.org/abs/2403.16655v1
Compressor summary: The project analyzes and corrects text errors using advanced neural network-based language models, BART and MarianMT.
http://arxiv.org/abs/2403.16654v1
Compressor summary: The paper proposes a new SVM loss function called $\ell_s$ that improves generalization by considering the degree of penalty for correctly classified samples within the margin, and presents a fast optimization algorithm ($\ell_s$-ADMM) to handle it.
http://arxiv.org/abs/2403.16649v1
Compressor summary: The paper proposes a simple contrastive learning framework to align large language models with human preferences using a novel rescoring strategy and pairwise contrastive loss.
http://arxiv.org/abs/2403.16646v1
Compressor summary: S2VNet is a universal framework that unifies automatic and interactive medical image segmentation using slice-to-volume propagation, achieving faster inference and lower memory consumption than existing 3D solutions.
http://arxiv.org/abs/2403.16638v1
Compressor summary: The text proposes an effective AI-generated video detection scheme using a two-branch spatio-temporal CNN with ResNet sub-detectors, and presents a large-scale dataset for benchmarking and evaluation.
http://arxiv.org/abs/2403.16635v1
Compressor summary: The paper introduces a new message unit, point cluster, for enhanced vehicle-to-everything perception, overcoming issues with existing methods using bandwidth and dense representations.
http://arxiv.org/abs/2403.16630v1
Compressor summary: The paper evaluates patent-specific pretrained embedding models and Sentence Transformers for text-based patent similarity, using patent interferences as ground-truth, and proposes Patent SBERT-adapt-ub, which outperforms current methods.
http://arxiv.org/abs/2403.16627v1
Compressor summary: Key points:
- Diffusion models are powerful but have drawbacks like complex architecture and high latency
- The paper introduces a dual approach to reduce model latency: miniaturization and fewer sampling steps
- The method uses knowledge distillation, feature matching, and score distillation
- Two new models, SDXS-512 and SDXS-1024, are faster than previous ones on a single GPU
Summary: The paper proposes a dual approach to speed up diffusion models by miniaturizing the architecture and reducing sampling steps, using knowledge distillation and other techniques. The new models are significantly faster than previous ones on a single GPU.
http://arxiv.org/abs/2403.16614v1
Compressor summary: The authors propose multi-lingual sentence encoders for crisis-related social media texts in over 50 languages, improving semantic similarity and contextual understanding.
http://arxiv.org/abs/2403.16612v1
Compressor summary: The paper proposes a method for calibrating neural network-based seasonal temperature forecasts, improving their reliability and sharpness.
http://arxiv.org/abs/2403.16609v1
Compressor summary: The text discusses the importance of conversational grounding, which helps build trustworthy dialog systems by ensuring shared understanding between parties, and introduces a framework for annotating dialog corpora with grounding acts and units to improve this capability.
http://arxiv.org/abs/2403.16607v1
Compressor summary: Style Filter is a method that improves transfer learning performance for industrial data by selectively filtering source domain data without using labels or prior knowledge.
http://arxiv.org/abs/2403.16605v1
Compressor summary: This paper proposes a method to generate high-quality semantic segmentation masks for satellite images using generative image diffusion, which improves performance in earth observation tasks.
http://arxiv.org/abs/2403.16592v1
Compressor summary: Key points:
- The paper presents methods to detect machine-generated text across various domains and languages
- It analyzes different approaches, such as statistical, neural, and pre-trained models
- It reports accuracy results on two subtasks
- It discusses challenges and factors for future research
Summary: The paper proposes and evaluates methods to identify machine-generated text using various approaches and languages, achieving high accuracy and highlighting future directions.
http://arxiv.org/abs/2403.16591v1
Compressor summary: This paper compares local differential privacy (LDP) and Bayesian privacy methods for measuring privacy in machine learning and proposes a framework to better understand their trade-offs and effectiveness.
http://arxiv.org/abs/2403.16584v1
Compressor summary: Large language models can partially remove unwanted information from text, but struggle to completely eliminate sentiment without losing semantic content.
http://arxiv.org/abs/2403.16582v1
Compressor summary: The paper investigates how to choose the best encoder and fusion strategy for multi-view learning in crop classification using various temporal data sources, suggesting a framework for researchers.
http://arxiv.org/abs/2403.16578v1
Compressor summary: SegICL is a novel approach that uses in-context learning for text-guided image segmentation, enabling adaptation to new tasks without training or fine-tuning the model.
http://arxiv.org/abs/2403.16571v1
Compressor summary: The study introduces NSINA, a large Sinhala news corpus and NLP tasks, to address challenges in adapting LLMs to low-resource languages like Sinhala.
http://arxiv.org/abs/2403.16558v1
Compressor summary: Elysium is a model that uses a large dataset of video frames with object boxes and descriptions to perform object tracking and expression generation tasks in videos, overcoming challenges related to pretraining and computational cost.
http://arxiv.org/abs/2403.16554v1
Compressor summary: The paper introduces Poincaré Explanation, a method for explaining deep learning models in NLP using hyperbolic spaces, which capture syntax and semantics better than Euclidean spaces.
http://arxiv.org/abs/2403.16543v1
Compressor summary: The paper proposes a novel method for few-shot relation classification using multiple sentence representations and contrastive learning to extract complementary information.
http://arxiv.org/abs/2403.16539v1
Compressor summary: DOrA is a new 3D visual grounding framework that uses Large Language Models to suggest an order of anchor objects, improving the accuracy of locating target objects in scenes described by natural language.
http://arxiv.org/abs/2403.16536v1
Compressor summary: The paper introduces VMRNN cells, a new recurrent unit that combines Vision Mamba blocks and LSTM for spatiotemporal forecasting tasks.
http://arxiv.org/abs/2403.16530v1
Compressor summary: Intermediate fusion improves text-to-image alignment and efficiency in diffusion models by fusing conditioning text in a specially designed space instead of using early fusion in pretrained image features.
http://arxiv.org/abs/2403.16528v1
Compressor summary: Vision-language models are not open-set models despite being trained on large datasets because their finite query sets introduce closed-set assumptions, leading to low precision and recall in open-set conditions.
http://arxiv.org/abs/2403.16527v1
Compressor summary: Autonomous systems use modular sub-components or foundation models for decision-making, but need better ways to detect and mitigate hallucinations that can lead to poor performance in out-of-distribution scenarios.
http://arxiv.org/abs/2403.16526v1
Compressor summary: The study presents a pyramid network with ModeTv2 operator that optimizes deformable image registration in medical imaging, improving accuracy, efficiency, and generalizability while being fast and interpretable.
http://arxiv.org/abs/2403.16524v1
Compressor summary: This paper explores using large language models to create socially aware agents that can discover, reason, and make decisions based on norms, and discusses challenges and opportunities for collaboration across fields.
http://arxiv.org/abs/2403.16520v1
Compressor summary: The paper introduces CMViM, a new method to learn efficient and unified representations from multi-modal 3D medical images for AD diagnosis, using masked Vim autoencoder and contrastive learning.
http://arxiv.org/abs/2403.16516v1
Compressor summary: ViTLP is a pre-training method for visual document understanding that generates text and layout sequences from document images and performs well on various downstream tasks.
http://arxiv.org/abs/2403.16513v1
Compressor summary: Key points:
- Generative models can create realistic images, but have inconsistent artifacts
- Proposed method uses natural traces to distinguish real from fake images
- Method outperforms baselines and shows high accuracy on a real-world platform
Summary: The paper introduces a method that uses natural traces learned from real images to detect artifacts in generative models' fake images, achieving better performance than previous methods.
http://arxiv.org/abs/2403.16512v1
Compressor summary: The study explores in-context learning (ICL) for low-resource languages, introduces query alignment to improve performance, and provides insights into its effectiveness and challenges.
http://arxiv.org/abs/2403.16510v1
Compressor summary: The Make-Your-Anchor system generates anchor-style videos from one-minute clips with precise torso and hand movements, using a structure-guided diffusion model and a face enhancement module.
http://arxiv.org/abs/2403.16509v1
Compressor summary: The document outlines the datasets and challenges for the 2024 Human Understanding AI Paper Challenge, which focuses on developing AI technologies for understanding human daily life.
http://arxiv.org/abs/2403.16508v1
Compressor summary: WL-GOOSE uses graph representations and WL features to learn heuristics faster and better than existing deep learning models, achieving competitive performance in several domains.
http://arxiv.org/abs/2403.16504v1
Compressor summary: LARA is a language model that improves multi-turn intent classification in chatbots across six languages by combining fine-tuning with adaptive retrieval techniques.
http://arxiv.org/abs/2403.16502v1
Compressor summary: The text provides a comprehensive review of medical image registration methods from traditional and deep learning-based directions, focusing on recent advances in retinal image registration and its challenges.
http://arxiv.org/abs/2403.16501v1
Compressor summary: The paper proposes a framework for assisting human decision making in high-stakes tasks using interpretable and task-specific textual guidance from AI models instead of taking control away from the expert.
http://arxiv.org/abs/2403.16499v1
Compressor summary: The text proposes two self-supervised learning tasks using anatomy-oriented imaging planes to improve pretraining for medical image analysis.
http://arxiv.org/abs/2403.16497v1
Compressor summary: PathoTune is a framework that adapts foundation models to pathology-specific tasks using multi-modal prompt tuning, improving performance over single-modality approaches and even outperforming specialized pathological models.
http://arxiv.org/abs/2403.16495v1
Compressor summary: Key points:
- LSTTN is a novel framework for long- and short-term traffic flow prediction using STGNNs
- LSTTN leverages masked subseries Transformer to learn compressed and contextual temporal representations from long historical series
- LSTTN extracts long-term trend, periodic features and short-term features by different convolution layers and fuses them for prediction
- LSTTN outperforms baseline models on four real-world datasets in 60-minute-ahead forecasting
Summary: LSTTN is a new traffic flow prediction model that uses STGNNs to learn from long historical series and extract long- and short-term features by various convolution layers, achieving significant improvement over baselines.
http://arxiv.org/abs/2403.16494v1
Compressor summary: CT-Bound is a fast and accurate method for estimating image boundaries using a hybrid neural network that combines Convolution and Transformer layers, enabling real-time video processing.
http://arxiv.org/abs/2403.16483v1
Compressor summary: The paper proposes WHLL, a method to create a large-scale geoparsing corpus from Wikipedia articles using hyperlinks to annotate coordinates for multiple location expressions.
http://arxiv.org/abs/2403.16482v1
Compressor summary: DMLL is a novel labeling setting for multi-label classification that reduces annotation cost by assigning determined labels to training instances, and this paper proposes a risk-consistent estimator and a similarity-based prompt learning method to learn from these determined labels.
http://arxiv.org/abs/2403.16481v1
Compressor summary: The paper proposes a novel method to synthesize new views in real-time on various scenes with rich view-dependent appearances, using meshes and a neural environment map.
http://arxiv.org/abs/2403.16469v1
Compressor summary: The paper introduces a new weakly supervised learning setting called Reduced Label that reduces labeling costs and preserves supervised information for long-tailed data classes, achieving better performance than existing methods.
http://arxiv.org/abs/2403.16463v1
Compressor summary: SuperCD is a method to improve few-shot named entity recognition by actively learning from superposition concepts and instances.
http://arxiv.org/abs/2403.16459v1
Compressor summary: The paper analyzes approximation and learning abilities of CNNs using new bounds on their weights and size, and shows that they achieve minimax optimal convergence rates for various problems like regression and classification.
http://arxiv.org/abs/2403.16451v1
Compressor summary: DeepMachining is a deep learning system for online prediction of lathe machine errors using manufacturing data and pre-trained models.
http://arxiv.org/abs/2403.16450v1
Compressor summary: The paper proposes a camera-aware label refinement framework to improve unsupervised person re-identification by reducing feature distribution discrepancies across different cameras.
http://arxiv.org/abs/2403.16447v1
Compressor summary: The study explores how attention scores in BERT models vary by lexical category during fine-tuning for different tasks and finds that content words get more attention in semantic tasks, while function words get more attention in syntactic tasks.
http://arxiv.org/abs/2403.16446v1
Compressor summary: The text proposes an automatic evaluation method for large language models (LLMs) in medical diagnosis using a multi-agent framework, standardized patients, and a Retrieval-Augmented Evaluation (RAE).
http://arxiv.org/abs/2403.16444v1
Compressor summary: Key points:
- Instruction Tuning is essential for large language models to perform well on specific tasks
- Publicly available instruction datasets in English are widely used, but not in Korean
- KIT-19 is a new instruction dataset for Korean NLP tasks, composed of 19 existing open-source datasets
- KIT-19 helps train a Korean Pretrained LLM that significantly outperforms existing ones
Summary: KIT-19 is a novel instruction dataset for Korean NLP that enables training a superior Korean Pretrained LLM.
http://arxiv.org/abs/2403.16443v1
Compressor summary: The paper proposes CodeS, a framework that generates code repositories from natural language requirements using LLMs, and evaluates its performance using both automated and manual methods.
http://arxiv.org/abs/2403.16442v1
Compressor summary: The paper proposes EX2, a method to analyze visual language models and find out what features they use to represent concepts, revealing the importance of non-visual attributes and spurious descriptions in VLM representations.
http://arxiv.org/abs/2403.16440v1
Compressor summary: The paper proposes RCBEVDet, a method for fusing radar and camera data to improve 3D object detection in autonomous driving.
http://arxiv.org/abs/2403.16435v1
Compressor summary: The paper presents InstUPR, a passage reranking method using instruction-tuned LLMs without extra fine-tuning, which performs better than unsupervised baselines and an instruction-tuned reranker.
http://arxiv.org/abs/2403.16432v1
Compressor summary: Prompt-based learning on PLMs can be adversarially attacked with natural UATs generated by the $\textit{LinkPrompt}$ algorithm, affecting the performance of downstream NLP tasks.
http://arxiv.org/abs/2403.16431v1
Compressor summary: The paper introduces DOCTR, a novel method for point scene understanding that leverages object-centric representation and Transformer decoder to optimize queries involving object relationships, achieving state-of-the-art performance on ScanNet dataset.
http://arxiv.org/abs/2403.16428v1
Compressor summary: The HANDS23 challenge aims to improve 3D hand-object reconstruction from egocentric views by addressing challenges like occlusion, viewpoint bias, distortion, and motion blur.
http://arxiv.org/abs/2403.16427v1
Compressor summary: Key points:
- The paper proposes a new method (Re2LLM) for session-based recommendation using large language models
- Re2LLM guides LLMs to learn from their own errors and a knowledge base of hints
- Re2LLM achieves better recommendations than existing methods
Summary: The paper introduces Re2LLM, a method that enhances session-based recommendation with large language models by making them reflect on their mistakes and use hints from a knowledge base.
http://arxiv.org/abs/2403.16424v1
Compressor summary: The study explores using ChatGPT to generate Library of Congress Subject Headings (LCSH) for electronic theses and dissertations (ETDs), finding that while LLMs can help with cataloging backlog, human catalogers are still needed for validity.
http://arxiv.org/abs/2403.16422v1
Compressor summary: The paper proposes a new framework for improving Text-to-Image generation models that can handle lengthy and complex visual text, demonstrating significant improvements on two benchmarks.
http://arxiv.org/abs/2403.16418v1
Compressor summary: This paper proposes IMLIB, an interpretable machine learning model based on MaxSAT that generates balanced and accurate classification rules.
http://arxiv.org/abs/2403.16416v1
Compressor summary: The text discusses the limitations and challenges of using large language models for user simulators in conversational recommender systems, proposing a new method called SimpleUserSim to improve recommendations.
http://arxiv.org/abs/2403.16412v1
Compressor summary: TANet is a novel method for finding point-wise correspondences between deformable and unusual shapes in unsupervised point cloud shape correspondence.
http://arxiv.org/abs/2403.16410v1
Compressor summary: Spike-NeRF is a novel method for 3D reconstruction and viewpoint synthesis of high-speed scenes using continuous spike streams from moving spike cameras.
http://arxiv.org/abs/2403.16407v1
Compressor summary: This paper surveys recent advancements and paradigms in generating long-duration videos, discussing network design, conditioning techniques, datasets, evaluation metrics, and future directions.
http://arxiv.org/abs/2403.16400v1
Compressor summary: ASDF is an approach for in-situ AR visualization in assembly scenarios that combines object detection, 6D pose estimation, and assembly state detection to provide guidance and reduce errors.
http://arxiv.org/abs/2403.16396v1
Compressor summary: Definition bias affects information extraction models, and a multi-stage framework is proposed to measure and mitigate it.
http://arxiv.org/abs/2403.16395v1
Compressor summary: The paper proposes a new network (MAPNet) with improved matchers for feature comparison in classification-regression tasks, which leads to better tracking performance on several benchmarks.
http://arxiv.org/abs/2403.16394v1
Compressor summary: The paper proposes metrics to measure linguistic and visual skew in text-to-image generation datasets, showing that balanced phenomenological coverage improves generalization without increasing data size.
http://arxiv.org/abs/2403.16393v1
Compressor summary: The paper proposes CLED, a method to detect errors in LLMs' outputs by extracting linguistic features and using a concurrent classifier, without accessing the model's internal nodes.
http://arxiv.org/abs/2403.16387v1
Compressor summary: Text-IF is a novel approach for image fusion that uses text guidance to address degradations and interactive needs, achieving better performance and flexibility than existing methods.
http://arxiv.org/abs/2403.16386v1
Compressor summary: Dia-LLaMA is a framework that adapts LLaMA2-7B for CT report generation by incorporating diagnostic information as guidance prompts, using a pre-trained ViT3D and disease prototype memory bank, and introducing disease-aware attention.
http://arxiv.org/abs/2403.16385v1
Compressor summary: The paper presents a method to improve chart VQA models by leveraging LLMs as an automatic data annotator that generates question-answer annotations for chart images using a step-by-step generation procedure, achieving state-of-the-art accuracy on complex reasoning questions.
http://arxiv.org/abs/2403.16379v1
Compressor summary: The text describes the development of an efficient algorithm, FlashEval, for selecting a representative subset of data to evaluate text-to-image generative models quickly.
http://arxiv.org/abs/2403.16377v1
Compressor summary: The paper proposes a neural process-based approach that adapts to real-time condition monitoring signals, enables on-the-spot predictions, and incorporates qualitative information from individual units.
http://arxiv.org/abs/2403.16376v1
Compressor summary: Elite360D is a novel framework that uses ERP image and ICOSAP point set to estimate depth with better local-with-global representation and lower computational cost than previous methods.
http://arxiv.org/abs/2403.16374v1
Compressor summary: The paper proposes a progressive interaction network for autonomous driving that uses graph convolutions to better learn map constraints and social interactions, and a weight allocation mechanism for multi-modal training.
http://arxiv.org/abs/2403.16370v1
Compressor summary: GoodSAM framework uses a teacher assistant and Segment Anything Model to transfer knowledge for panoramic semantic segmentation without labeled data.
http://arxiv.org/abs/2403.16369v1
Compressor summary: Key points:
- The text introduces action-bisimulation encoding, a method to capture multi-step controllability in reinforcement learning with high-dimensional observations.
- Action-bisimulation is inspired by bisimulation invariance pseudometric and extends single-step controllability with recursive invariance constraint.
- The text shows that action-bisimulation pretraining improves sample efficiency and provides theoretical and qualitative analysis of its performance.
Summary: The text presents action-bisimulation encoding, a novel method for reinforcement learning agents to learn multi-step controllability from high-dimensional observations using a recursive invariance constraint, which enhances sample efficiency and control relevance.
http://arxiv.org/abs/2403.16368v1
Compressor summary: The proposed framework fuses and distills semantic priors from a segment anything model to improve image restoration performance without sacrificing inference efficiency.
http://arxiv.org/abs/2403.16365v1
Compressor summary: The text describes a new technique, Guided Diffusion Poisoning (GDP), that creates base samples for crafting more potent poisons and backdoors in neural networks trained on web-scraped data.
http://arxiv.org/abs/2403.16358v1
Compressor summary: ChebMixer is a novel graph neural network architecture that uses Chebyshev polynomials for efficient and multiscale node representation learning, improving performance on various graph mining tasks.
http://arxiv.org/abs/2403.16345v1
Compressor summary: The paper proposes two strategies for identifying query facets without using a search engine, which can improve performance in applications with private documents and constantly updated search engines.
http://arxiv.org/abs/2403.16338v1
Compressor summary: This paper analyzes how lossy video compression affects fisheye camera images used in autonomous driving systems and proposes a new metric and method to improve compression efficiency.
http://arxiv.org/abs/2403.16334v1
Compressor summary: GLIDER is a novel framework for graph-structured data that tackles the challenges of out-of-distribution generalization by diversifying variations across domains and minimizing representation space discrepancy, leading to improved performance in predicting semantic labels.