This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-07, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2403.03950v1
Compressor summary: The paper explores using categorical cross-entropy for training value functions in deep reinforcement learning, improving performance and scalability in various domains.
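As a rough illustration of the classification-for-regression idea this summary describes, a scalar value target can be encoded as a "two-hot" distribution over fixed bins and trained with cross-entropy instead of MSE. This is a minimal sketch based only on the one-line summary; the paper's exact encoding and loss may differ.

```python
import numpy as np

def two_hot(value, bins):
    """Encode a scalar target as a two-hot distribution over fixed bins,
    splitting probability mass between the two nearest bin centers so the
    encoding preserves the expected value. (Illustrative sketch.)"""
    value = float(np.clip(value, bins[0], bins[-1]))
    k = int(np.searchsorted(bins, value))
    p = np.zeros(len(bins))
    if k == 0:
        p[0] = 1.0
    else:
        lo, hi = bins[k - 1], bins[k]
        w = (value - lo) / (hi - lo)
        p[k - 1] = 1.0 - w
        p[k] = w
    return p

def cross_entropy(logits, target_probs):
    """Cross-entropy between a categorical value head (logits) and the
    two-hot target -- the loss that would replace MSE regression."""
    logits = logits - logits.max()
    log_q = logits - np.log(np.exp(logits).sum())
    return float(-(target_probs * log_q).sum())
```

A value network would output one logit per bin and minimize `cross_entropy(logits, two_hot(target_return, bins))`; the predicted value is recovered as the expectation of the softmax over bin centers.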
http://arxiv.org/abs/2403.03942v1
Compressor summary: The study reveals that different subnetworks within a single language model can achieve similar performance in-domain but generalize differently, and this is related to the use of attention heads that compute shallow and non-generalizing features versus higher-level ones.
http://arxiv.org/abs/2403.03938v1
Compressor summary: GUIDE is a new method for continual learning that uses diffusion models and classifier guidance to generate rehearsal examples for forgotten information, reducing catastrophic forgetting and outperforming previous approaches.
http://arxiv.org/abs/2403.03929v1
Compressor summary: The paper proposes a new method for short-term precipitation prediction using Transformer models and shows its effectiveness in capturing extreme weather events.
http://arxiv.org/abs/2403.03923v1
Compressor summary: The paper investigates how neural machine translation and large language models handle noisy inputs and shows they are more robust than previous models.
http://arxiv.org/abs/2403.03920v1
Compressor summary: The paper examines how AI and ML methods can enhance instructional quality by analyzing educational content, teacher discourse, and student responses using Elmore's Instructional Core Framework.
http://arxiv.org/abs/2403.03909v1
Compressor summary: The paper proposes a method to measure the linguistic diversity of multilingual NLP datasets by comparing them to a reference language sample using features extracted from typological databases and automatic text-based measures.
http://arxiv.org/abs/2403.03896v1
Compressor summary: DART is a neural method that uses radar physics to create realistic radar images for simulation and design.
http://arxiv.org/abs/2403.03894v1
Compressor summary: The authors introduce SLTrans, a large dataset with source code files and compiler intermediate representations, to improve Code-LMs' multilingual capabilities for code generation tasks.
http://arxiv.org/abs/2403.03893v1
Compressor summary: The paper proposes a method to reduce toxicity across multiple languages using various techniques, and evaluates its effectiveness on nine languages with different resources.
http://arxiv.org/abs/2403.03888v1
Compressor summary: Facts as a Function (FaaF) is a new method for evaluating RAG systems that uses function calling abilities of LMs to improve fact verification efficiency and reliability.
http://arxiv.org/abs/2403.03883v1
Compressor summary: SaulLM-7B, a 7 billion parameter language model, excels at legal text comprehension and generation after being trained on a large English legal corpus.
http://arxiv.org/abs/2403.03882v1
Compressor summary: The paper proposes a dual-branch network and transfer learning method to automatically improve weak training labels for multi-class medical image segmentation, achieving significant improvements in accuracy on abdominal CT scans.
http://arxiv.org/abs/2403.03881v1
Compressor summary: LD3M is a new method that uses diffusion in latent space to generate high-quality synthetic images from small datasets for machine learning models.
http://arxiv.org/abs/2403.03880v1
Compressor summary: The predictions of GNNs for probabilistic classification tasks become constant as the graph size increases, limiting their expressive power.
http://arxiv.org/abs/2403.03879v1
Compressor summary: The paper proposes a deep learning approach using CNNs and a transformer to detect and segment bladder cancer in cystoscopic images, improving accuracy and efficiency for diagnosis.
http://arxiv.org/abs/2403.03874v1
Compressor summary: The authors survey NLP literature and find little attention to socio-economic class, which they propose to include in future language technologies.
http://arxiv.org/abs/2403.03870v1
Compressor summary: The paper proposes a method to teach large language models to collaborate by interleaving their generations and optimizing the marginal likelihood, leading to better performance on various tasks and interesting collaboration patterns.
http://arxiv.org/abs/2403.03867v1
Compressor summary: This paper investigates how large language models learn linear representations of semantic concepts and shows that a simple latent variable model can explain this phenomenon.
http://arxiv.org/abs/2403.03866v1
Compressor summary: The paper introduces KIWI, a dataset for evaluating large language models' ability to follow instructions and provide writing assistance in the scientific domain, finding that current models struggle with incorporating new information and judging their own performance.
http://arxiv.org/abs/2403.03864v1
Compressor summary: The paper presents AlgoPuzzleVQA, a new dataset for multimodal puzzle-solving tasks that require visual understanding, language comprehension, and complex algorithmic reasoning, and shows that large language models struggle with these tasks.
http://arxiv.org/abs/2403.03863v1
Compressor summary: X-shot learning is a new challenge that involves adapting to different levels of label occurrences in real-world settings, and BinBin is a versatile system that outperforms previous methods on this task.
http://arxiv.org/abs/2403.03861v1
Compressor summary: The paper proposes a method to select complex examples for few-shot sequence tagging tasks, improving the performance of pretrained language models significantly.
http://arxiv.org/abs/2403.03857v1
Compressor summary: Emojinize is a method that uses large language models to translate text phrases into sequences of one or more emoji, increasing guessability and enabling various applications.
http://arxiv.org/abs/2403.03856v1
Compressor summary: The paper studies how well private algorithms can work using public data, focusing on optimization problems and showing the limits and optimal strategies for different settings.
http://arxiv.org/abs/2403.03854v1
Compressor summary: The paper proposes a method to improve unsupervised domain adaptation for semantic segmentation by using confident pseudo-labels from target data as data augmentation.
http://arxiv.org/abs/2403.03853v1
Compressor summary: The study introduces Block Influence, a metric to measure layer significance in large language models, and proposes ShortGPT, a method that removes redundant layers based on their scores, achieving better results than previous methods.
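One plausible formalization of a layer-significance score like Block Influence is to measure how much a transformer block changes its input: blocks whose output is nearly identical to their input contribute little and are pruning candidates. This sketch is inferred from the one-line summary, not taken from the paper, so the exact definition may differ.

```python
import numpy as np

def block_influence(h_in, h_out):
    """Score a transformer block by how much it transforms its hidden
    states: 1 minus the mean per-token cosine similarity between the
    block's input and output. Near-zero scores mark redundant layers.
    (Illustrative sketch of a Block-Influence-style metric.)"""
    num = (h_in * h_out).sum(axis=-1)
    denom = np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1)
    return float(1.0 - (num / denom).mean())
```

A ShortGPT-style pruner would compute this score for every layer over a calibration set and drop the lowest-scoring layers.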
http://arxiv.org/abs/2403.03852v1
Compressor summary: The paper proposes novel training-free algorithms to accelerate diffusion generative models' samplers, achieving faster convergence rates than existing methods.
http://arxiv.org/abs/2403.03846v1
Compressor summary: The paper explores using distillation to defend against poisoned encoders in SSL by transferring benign knowledge from a teacher net to a student net, achieving significant reduction in attack success rate with minimal accuracy loss.
http://arxiv.org/abs/2403.03838v1
Compressor summary: The text proposes a new method for feature selection that uses a deep variational transformer model to generate feature selection decision sequences based on learned utility scores.
http://arxiv.org/abs/2403.03835v1
Compressor summary: Cobweb is a human-like categorization system that builds hierarchical structures using utility, captures psychological effects, and can exhibit both exemplar and prototype learning.
http://arxiv.org/abs/2403.03832v1
Compressor summary: The study investigates continuous authentication using gesture data from Minecraft players and machine learning models, with the most accurate model achieving 90% accuracy.
http://arxiv.org/abs/2403.03828v1
Compressor summary: The paper explores how mouse movement patterns can be used as a reliable and efficient method for continuous user authentication in gaming scenarios, using various machine learning models.
http://arxiv.org/abs/2403.03825v1
Compressor summary: The text describes how sensor-equipped vehicles can collect traffic data efficiently by emulating detection in a microscopic traffic simulation and using deep learning to recover hidden vehicles, improving traffic management.
http://arxiv.org/abs/2403.03823v1
Compressor summary: The paper proposes a modular approach for summarizing TV shows using scene detection, reordering, visual-to-text conversion, dialogue summarization, and fusion, and introduces PREFS, a new metric that evaluates summary quality based on fact recall and precision.
http://arxiv.org/abs/2403.03814v1
Compressor summary: MultiQ is a benchmark to evaluate multilingual question answering by open LLMs beyond their intended languages.
http://arxiv.org/abs/2403.03812v1
Compressor summary: ProbSAINT is a machine learning model for used car pricing that can quantify uncertainties and adapt to different expected offer durations, providing accurate and fair transactions.
http://arxiv.org/abs/2403.03791v1
Compressor summary: KG-TREAT is a novel framework that uses biomedical knowledge graphs to enhance the estimation of treatment effects on patient outcomes, showing improved performance over existing methods.
http://arxiv.org/abs/2403.03790v1
Compressor summary: The article proposes a novel unified visual-language model called Popeye for multi-source ship detection from remote sensing imagery using various methods, knowledge adaption, and pixel-level segmentation.
http://arxiv.org/abs/2403.03788v1
Compressor summary: The paper introduces a benchmark (PPTC-R) to evaluate how well Large Language Models (LLMs) can complete PowerPoint tasks in different real-world situations, and finds that GPT-4 performs best but all LLMs struggle with multiple challenges.
http://arxiv.org/abs/2403.03777v1
Compressor summary: The paper introduces Expectile-Regularised Neural Optimal Transport (ENOT), a new method for estimating optimal transportation plans that improves both accuracy and efficiency compared to previous approaches.
http://arxiv.org/abs/2403.03773v1
Compressor summary: VeriTraCER generates counterfactual explanations that are robust to small model updates, providing users with reliable guidance on how to change their inputs to achieve desired predictions.
http://arxiv.org/abs/2403.03772v1
Compressor summary: The paper presents a method to efficiently parallelize existing causal discovery methods, enabling their application on large-scale datasets by significantly speeding up the process.
http://arxiv.org/abs/2403.03768v1
Compressor summary: DeepCRE is a novel computational model that significantly improves cross-drug response evaluation and can help discover new therapeutics.
http://arxiv.org/abs/2403.03750v1
Compressor summary: The paper introduces a new dataset for detecting hallucinations in German summarization tasks using large language models, which can help improve their faithfulness to the source text.
http://arxiv.org/abs/2403.03744v1
Compressor summary: This paper evaluates the safety and alignment of medical large language models (LLMs) using a dataset of harmful questions and suggests fine-tuning as a mitigation strategy to reduce potential harms.
http://arxiv.org/abs/2403.03741v1
Compressor summary: SUPClust is an active learning method that focuses on finding points at the decision boundary between classes to improve model performance, especially in imbalanced datasets.
http://arxiv.org/abs/2403.03740v1
Compressor summary: Our method learns to represent photographic image layouts using heterogeneous graphs and autoencoders, outperforming existing approaches with a new dataset, LODB.
http://arxiv.org/abs/2403.03739v1
Compressor summary: The paper proposes A&B BNN, which reduces binary neural network's multiplication operations by replacing them with bit operations and achieves competitive results on image classification tasks.
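The bit-level substitution binary networks rely on can be shown with a dot product of two {-1,+1} vectors packed as bitmasks, computed via XNOR and popcount instead of multiplications. This is a generic illustration of the technique, not the A&B BNN paper's specific operations.

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors, each packed as an
    integer bitmask (bit set = +1), using XNOR + popcount in place of
    multiplications. dot = agreements - disagreements. (Illustrative of
    the multiplication-free arithmetic in binary neural networks.)"""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # bit set where signs agree
    matches = bin(xnor).count("1")              # popcount
    return 2 * matches - n
```

For example, `a = [+1, -1, +1]` packs to `0b101` (bit i holds element i), and `b = [-1, +1, +1]` packs to `0b110`; their dot product is -1.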
http://arxiv.org/abs/2403.03737v1
Compressor summary: The TNTM model combines transformer embeddings and probabilistic modelling to achieve high-quality topic representation with fast inference and good diversity.
http://arxiv.org/abs/2403.03736v1
Compressor summary: The paper introduces a novel image generation-compression paradigm that uses vector-quantized image models and a multi-stage transformer to improve perceptual quality, especially in ultra-low bitrate scenarios.
http://arxiv.org/abs/2403.03730v1
Compressor summary: The paper presents a new network architecture that learns to segment, localize, and perceive depth of objects from images and self-motion, mimicking human infants' abilities.
http://arxiv.org/abs/2403.03728v1
Compressor summary: The study proposes a heuristic called TCM that combines diversity-based and uncertainty-based sampling strategies for active learning, improving performance across different data levels.
http://arxiv.org/abs/2403.03726v1
Compressor summary: DiMA is a model that generates diverse and accurate protein sequences using continuous diffusion on embeddings derived from the ESM-2 protein language model, outperforming existing methods in quality and diversity metrics.
http://arxiv.org/abs/2403.03721v1
Compressor summary: CMDA is a novel unsupervised domain adaptation method that uses visual semantic cues from camera images to improve generalization of 3D object detection models across different domains.
http://arxiv.org/abs/2403.03719v1
Compressor summary: The paper presents a Multimodal-LLM architecture for Text-cloze, a challenging task in comics, which improves performance by using a Domain-Adapted ResNet-50 visual encoder and new OCR annotations.
http://arxiv.org/abs/2403.03715v1
Compressor summary: The paper proposes a novel memory-augmented method for zero-shot image captioning that generates concept-centered captions with fewer hallucinations and more world-knowledge, outperforming existing methods.
http://arxiv.org/abs/2403.03707v1
Compressor summary: The paper proposes a framework that learns pixel-level alignment between images and text to improve semantic segmentation without dense annotations.
http://arxiv.org/abs/2403.03704v1
Compressor summary: CPCA is a novel method for semantic segmentation of remote sensing imagery across domains, which disentangles causal features from bias features and adapts to invariant causal factors using intervention techniques.
http://arxiv.org/abs/2403.03698v1
Compressor summary: The paper introduces CTS, a framework that generates controllable time series by decoupling the mapping process from VAE training and evaluates its effectiveness on three real-world datasets.
http://arxiv.org/abs/2403.03691v1
Compressor summary: MolNexTR is a deep learning model that converts molecular images into graph structures and SMILES strings by fusing ConvNext and Vision-Transformer, using advanced algorithms for data augmentation and post-processing to handle diverse image styles.
http://arxiv.org/abs/2403.03690v1
Compressor summary: The authors use a GPT-4-based self-instruct method, with English instructions translated and edited into Japanese as demonstrations, to create Japanese instruction data and a reference-free GPT-4 evaluation benchmark with minimal human annotation, and their models outperform existing ones.
http://arxiv.org/abs/2403.03689v1
Compressor summary: The paper proposes a two-step fine-tuning method to improve Neural Machine Translation for domains with special writing formulas like e-commerce, using domain-specific resources and self-contrastive semantic enhancement.
http://arxiv.org/abs/2403.03674v1
Compressor summary: The study proposes AdvIG, a novel infrared physical attack that uses geometric shapes and optimizes their parameters to execute efficient black-box query attacks with high success rates.
http://arxiv.org/abs/2403.03672v1
Compressor summary: The paper designs two algorithms for online learning in CMDPs with adversarial losses and hard constraints, the first work to consider both together: one achieves sublinear regret and constraint violation in general CMDPs, and the other, given a known constraint-satisfying policy, satisfies the constraints with high probability while keeping sublinear regret, handling non-stationary environments relevant to autonomous driving, online advertising, and recommender systems.
http://arxiv.org/abs/2403.03671v1
Compressor summary: The authors propose a new method to detect floods in remote sensing data using temporal anomaly detection and show promising results.
http://arxiv.org/abs/2403.03670v1
Compressor summary: The paper proposes a simple framework for clustering complex data with linear complexity by using graph filtering and similarity-preserving regularization.
http://arxiv.org/abs/2403.03666v1
Compressor summary: The paper proposes a novel method for graph clustering that handles both homophilic and heterophilic graphs, outperforming existing methods in experiments.
http://arxiv.org/abs/2403.03662v1
Compressor summary: The paper proposes a test-time adaptation method that improves pixel-level synthesis solutions for video stabilization by adapting models to individual input videos, leading to significant stability and quality gains with only one adaptation step.
http://arxiv.org/abs/2403.03659v1
Compressor summary: The paper proposes a robust graph structure learning method for noisy, sparse heterophilic data, where most existing methods assume homophily, using a high-pass filter, an adaptive norm, and a regularizer to refine the graph structure.
http://arxiv.org/abs/2403.03645v1
Compressor summary: K-Link uses large language models to create a knowledge-link graph that improves graph construction from multivariate time-series data, enhancing graph neural network performance on various tasks.
http://arxiv.org/abs/2403.03643v1
Compressor summary: This paper reviews recent reinforcement learning methods for spatial resource allocation problems, discussing their advantages, challenges, and open questions.
http://arxiv.org/abs/2403.03640v1
Compressor summary: The authors develop multilingual medical AI models that can provide tailored healthcare services in various languages, using the ApolloCorpora dataset and achieving state-of-the-art performance.
http://arxiv.org/abs/2403.03636v1
Compressor summary: SheetAgent is a novel autonomous agent that uses a large language model to perform long-horizon and multi-category spreadsheet manipulation tasks with reasoning, achieving improved precision and table reasoning abilities.
http://arxiv.org/abs/2403.03631v1
Compressor summary: The paper proposes an efficient probabilistic forecasting method for wind power data with missing values by using a generative model that estimates joint distributions without preprocessing.
http://arxiv.org/abs/2403.03628v1
Compressor summary: GPTopic is a software package that uses large language models to create interactive, dynamic topic representations for text corpora, making topic modeling more accessible and comprehensive.
http://arxiv.org/abs/2403.03627v1
Compressor summary: The authors propose a method to assess how well multimodal language models can support fact-checking and find that GPT-4V performs better than existing models, while also identifying their limitations and biases.
http://arxiv.org/abs/2403.03608v1
Compressor summary: GSNeRF is a method to generate images and semantic maps from unseen scenes by combining multi-view inputs, semantic features, and geometry information.
http://arxiv.org/abs/2403.03607v1
Compressor summary: The paper proposes a new method to analyze and visualize topic models, which allows for higher-dimensional conceptual relationships between topics.
http://arxiv.org/abs/2403.03600v1
Compressor summary: The paper proposes a privacy-preserving framework using multi-modal data to improve cross-domain recommendation accuracy by disentangling domain-common and domain-specific features.
http://arxiv.org/abs/2403.03599v1
Compressor summary: The paper proposes CIT, a mechanism that improves GNNs' generalization by transferring cluster information and preserving node diversity when the test graph structure differs from the training one.
http://arxiv.org/abs/2403.03594v1
Compressor summary: This study shows how GPT-4 with Vision can predict human aesthetic evaluations of images better than other models and suggests creating an AI system based on scientific knowledge of beauty perception.
http://arxiv.org/abs/2403.03585v1
Compressor summary: RouteExplainer is a framework that provides post-hoc explanations for vehicle routing problems by classifying edges based on their intentions and using large language models to generate explanation texts.
http://arxiv.org/abs/2403.03582v1
Compressor summary: adaptNMT is an open-source tool for easy development and deployment of neural machine translation models with features like subword segmentation, intuitive UI, and eco-friendly evaluation.
http://arxiv.org/abs/2403.03581v1
Compressor summary: The study used various AI models to analyze tweets and showed their potential in improving ASD diagnosis with high accuracy.
http://arxiv.org/abs/2403.03575v1
Compressor summary: The gaHealth corpus, a bilingual dataset for English to Irish health translation, improved BLEU scores by 40% compared to the best performing models from the LoResMT2021 Shared Task, and provides linguistic guidelines for creating similar low-resource data sets.
http://arxiv.org/abs/2403.03569v1
Compressor summary: The paper proposes a theoretical framework for analyzing class transferability in machine learning using a partially ordered set of subsets of classes and explores its practical applications in few-shot learning.
http://arxiv.org/abs/2403.03562v1
Compressor summary: The paper proposes a new algorithm for group distributionally robust optimization with better performance and convergence guarantees than existing methods.
http://arxiv.org/abs/2403.03561v1
Compressor summary: HMD-Poser is a novel approach for real-time human motion tracking using scalable sparse observations from a VR headset and body-worn inertial measurement units (IMUs), achieving state-of-the-art accuracy and performance.
http://arxiv.org/abs/2403.03558v1
Compressor summary: The paper introduces a new method for evaluating language models' reliability in answering math word problems using an unanswerable question dataset and shows that training with human feedback improves their performance.
http://arxiv.org/abs/2403.03550v1
Compressor summary: The study shows how OpenAI's LLMs can create fake news and respond to emotions, and suggests that they should be used responsibly to prevent misinformation.
http://arxiv.org/abs/2403.03544v1
Compressor summary: The paper proposes a new framework for designing diverse and effective prompts to improve language-based forecasting of human mobility patterns using large language models.
http://arxiv.org/abs/2403.03542v1
Compressor summary: The paper introduces a new pre-training method and a scalable model architecture for neural operators in partial differential equations, achieving state-of-the-art results on various benchmarks and tasks.
http://arxiv.org/abs/2403.03535v1
Compressor summary: This paper introduces Task Attribute Distance (TAD), a model-agnostic metric to quantify the relationship between training and novel tasks in few-shot learning, and shows its effectiveness in applications like data augmentation and test-time intervention.
http://arxiv.org/abs/2403.03532v1
Compressor summary: EYOC is an unsupervised method for registering distant point clouds in driving scenarios without global pose labels, achieving comparable performance to supervised methods and better generalization.
http://arxiv.org/abs/2403.03521v1
Compressor summary: The paper introduces a bidirectional semantic-based evaluation method for neural machine translation that uses BabelNet to measure the sense distance between source and output sentences.
http://arxiv.org/abs/2403.03517v1
Compressor summary: IB-Net is a framework that uses graph neural networks to help solve Boolean Satisfiability problems, making Electronic Design Automation faster and more efficient.
http://arxiv.org/abs/2403.03516v1
Compressor summary: UMR is an unsupervised method to train multilingual dense retrievers using sequence likelihood estimation of multilingual language models, achieving better performance than supervised baselines.
http://arxiv.org/abs/2403.03514v1
Compressor summary: CLongEval is a comprehensive Chinese benchmark for evaluating long-context language models with sufficient data volume, broad applicability, and high quality.
http://arxiv.org/abs/2403.03512v1
Compressor summary: The paper proposes a two-stage network that uses global and local contrastive learning to improve multi-organ segmentation with semi-supervised learning, considering relations among images and categories.
http://arxiv.org/abs/2403.03508v1
Compressor summary: CounterfacTS is a tool that helps visualize and create counterfactuals for time-series forecasting models, enabling users to explore how changes in the data affect their performance and robustness.
http://arxiv.org/abs/2403.03507v1
Compressor summary: GaLore is a training strategy for large language models that reduces memory usage while maintaining efficiency and performance in pre-training and fine-tuning stages.
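The memory saving behind a method like GaLore comes from keeping optimizer state for a low-rank projection of the gradient rather than the full gradient matrix. The sketch below shows one step of that idea under plain SGD; the actual GaLore algorithm differs in details such as projector reuse across steps and Adam statistics.

```python
import numpy as np

def low_rank_grad_step(weight, grad, rank, lr=0.01):
    """One illustrative gradient step in the spirit of gradient low-rank
    projection: project the full gradient onto its top-`rank` left singular
    subspace (so optimizer state only needs the small projected matrix),
    then project the update back to full size. (Sketch only.)"""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]                # (m, r) projector
    grad_lowrank = p.T @ grad      # (r, n): what optimizer state would see
    return weight - lr * (p @ grad_lowrank)
```

With `rank` equal to the gradient's full rank this reduces to a plain SGD step; smaller ranks trade update fidelity for optimizer-state memory.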
http://arxiv.org/abs/2403.03506v1
Compressor summary: The study examines detecting AI-generated sentences in realistic human-AI collaboration texts and suggests using the CoAuthor dataset and considering segment length for better detection.
http://arxiv.org/abs/2403.03496v1
Compressor summary: The paper introduces multi-source Wizard of Wikipedia (Ms.WoW), a benchmark for evaluating dialogue systems that can select and use knowledge from multiple sources, and a challenge to test their ability to adapt to new sources.
http://arxiv.org/abs/2403.03493v1
Compressor summary: The paper introduces VastTrack, a large-scale benchmark for visual tracking with diverse object categories, more videos, and rich linguistic annotations.
http://arxiv.org/abs/2403.03485v1
Compressor summary: NoiseCollage is a new text-to-image diffusion model that improves layout conditions by independently estimating and cropping noises for each object, outperforming several existing models and integrating with ControlNet to enhance edge, sketch, and pose skeleton control.
http://arxiv.org/abs/2403.03483v1
Compressor summary: The paper proposes Teacher-Free Graph Self-Distillation (TGS), a method that improves the performance of MLPs on graph-related tasks without using GNNs or teachers, achieving fast inference and outperforming existing methods.
http://arxiv.org/abs/2403.03481v1
Compressor summary: The paper presents a system that uses language models to automatically update annotations in changing text documents, enabling new applications in program writing and debugging.
http://arxiv.org/abs/2403.03477v1
Compressor summary: CoMasTRe is a two-stage segmentation method that combines objectness learning and classification, using distillation to improve performance and prevent forgetting on PASCAL VOC and ADE20K.
http://arxiv.org/abs/2403.03473v1
Compressor summary: The paper introduces FNGD, a fast natural gradient descent method that reduces computational complexity and achieves speedup in image classification and machine translation tasks.
http://arxiv.org/abs/2403.03472v1
Compressor summary: Our proposed end-to-end training method for few-shot learning combines cross entropy loss with meta-learning and improves performance significantly.
http://arxiv.org/abs/2403.03468v1
Compressor summary: The paper introduces a new real-time multi-task network for autonomous driving that handles 3D object detection, semantic segmentation, and dense depth estimation using a task-adaptive attention generator to prevent negative transfer.
http://arxiv.org/abs/2403.03465v1
Compressor summary: The paper introduces GCN-SA, a graph neural network that uses self-attention to capture long-range dependencies and perform better representation learning on heterophilous graphs.
http://arxiv.org/abs/2403.03463v1
Compressor summary: The authors propose a dataset automata that uses diffusion models to generate paired wildfire images with controlled flame position and size, varies backgrounds through the text prompt and input image, and applies CLIP filtering to discard low-quality images while preserving domain shift, targeting small/rare object detection.
http://arxiv.org/abs/2403.03461v1
Compressor summary: The paper introduces YoutubeFish-35, a new large-scale dataset for indiscernible object counting, and proposes TransVidCount, a method that combines density and regression branches to perform well on it.
http://arxiv.org/abs/2403.03458v1
Compressor summary: Slot Abstractors is a scalable method for abstract visual reasoning with multi-object inputs and multiple relations, achieving state-of-the-art performance in four tasks.
http://arxiv.org/abs/2403.03456v1
Compressor summary: The paper presents DLP-GAN, a framework that translates ancient Chinese landscape paintings into modern photos and sketches using asymmetric cycle mapping, a dense-fusion module, and a dual-consistency loss to balance realism and abstraction, outperforming existing methods in experiments.
http://arxiv.org/abs/2403.03454v1
Compressor summary: The paper proposes a new method for learning to optimize constrained problems using dual solution estimates and improves convergence by incorporating augmented Lagrangian techniques.
http://arxiv.org/abs/2403.03452v1
Compressor summary: The paper proposes novel methods for abstract reasoning tasks like Raven's Progressive Matrices and Bongard-Logo problems by redefining concept boundaries and improving distribution estimation, leading to state-of-the-art performance.
http://arxiv.org/abs/2403.03448v1
Compressor summary: The paper introduces a new method to improve kernel k-means clustering by integrating both kernel correlation and dissimilarity, leading to better performance and more objective information extraction.
http://arxiv.org/abs/2403.03447v1
Compressor summary: HDRFlow is a robust and efficient flow estimator that uses an HDR-domain alignment loss, an efficient flow network with a multi-size large kernel, and synthetic data to reconstruct high dynamic range video from alternating exposure image sequences in real-time.
http://arxiv.org/abs/2403.03444v1
Compressor summary: The paper proposes a novel inference approach using Ensemble Kalman Inversion (EKI) to efficiently and informatively estimate uncertainty in DeepONet predictions, especially for limited and noisy data.
http://arxiv.org/abs/2403.03435v1
Compressor summary: The paper presents the first research on AI for the Vietnamese language in the legal domain, highlighting key linguistic challenges.
http://arxiv.org/abs/2403.03432v1
Compressor summary: The Mixture-of-LoRAs (MoA) architecture is a novel tuning method that enhances multi-task learning with large language models by combining multiple LoRA modules using an explicit routing strategy and domain labels, enabling quick adaptation to new domains.
http://arxiv.org/abs/2403.03431v1
Compressor summary: This paper analyzes how cross and self-attention maps in Stable Diffusion affect image editing and proposes a simpler, more efficient tuning-free method based on the findings.
http://arxiv.org/abs/2403.03425v1
Compressor summary: 3DToMolo is an innovative deep learning method that generates novel molecules with specified symmetries and properties by harmonizing diverse modalities and aligning them seamlessly.
http://arxiv.org/abs/2403.03421v1
Compressor summary: LEAD is a novel method for Universal Domain Adaptation without source data that uses feature decomposition and instance-level decision boundaries to identify target-private data.
http://arxiv.org/abs/2403.03419v1
Compressor summary: D$^2$O is a novel alignment method for LLMs that uses only negative samples to reduce harmfulness while preserving helpfulness, achieving superior results in generating safe and informative responses.
http://arxiv.org/abs/2403.03414v1
Compressor summary: The text describes vcHMM, a method that uses Hidden Markov Models and finite state automata to model the system-level dynamics of mental health from questionnaire and fMRI data, providing insights into how behavior and neural activity relate to depression.
http://arxiv.org/abs/2403.03412v1
Compressor summary: OOD-R is a curated dataset with enhanced noise reduction, and ActFun is a method to fine-tune model responses for better OOD detection and uncertainty estimation in neural networks.
http://arxiv.org/abs/2403.03410v1
Compressor summary: The text compares three algorithms for predicting cryptocurrency prices and finds that the Support Vector Machine with a linear kernel has the smallest error.
http://arxiv.org/abs/2403.03408v1
Compressor summary: The authors propose a novel framework for estimating scene depth from oriental landscape painting images, enabling 3D sculpture creation and improving accessibility for visually impaired people.
http://arxiv.org/abs/2403.03406v1
Compressor summary: The paper proposes an EnKF-LSTM data assimilation method for crop growth prediction that combines ensemble Kalman filter and LSTM neural network, improving accuracy by incorporating real-time data.
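The ensemble Kalman filter half of this combination is compact enough to sketch. Below is a single scalar analysis step with perturbed observations; it is a generic EnKF update, not the paper's EnKF-LSTM coupling, and the prior, observation, and noise values are made up:

```python
import numpy as np

def enkf_update(ensemble, y_obs, obs_noise_var, rng):
    """One EnKF analysis step for a directly observed scalar state
    (observation operator H = identity), with perturbed observations."""
    n = ensemble.shape[0]
    p = ensemble.var(ddof=1)                  # ensemble forecast variance
    gain = p / (p + obs_noise_var)            # Kalman gain
    perturbed = y_obs + rng.normal(0.0, np.sqrt(obs_noise_var), size=n)
    return ensemble + gain * (perturbed - ensemble)

rng = np.random.default_rng(0)
prior = rng.normal(10.0, 2.0, size=500)       # forecast ensemble
y = 14.0                                      # incoming real-time measurement
posterior = enkf_update(prior, y, obs_noise_var=1.0, rng=rng)
```

The update pulls the ensemble mean toward the observation and shrinks its spread, which is how assimilating real-time data corrects the model's drift.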
http://arxiv.org/abs/2403.03405v1
Compressor summary: CausalVLN is a framework that uses causal learning to train robust navigators with unbiased feature representations, improving generalization across different environments.
http://arxiv.org/abs/2403.03401v1
Compressor summary: BAIT is a framework to compare learning methods in Interactive Theorem Proving, showing that Structure Aware Transformers perform well and leading to a novel end-to-end system.
http://arxiv.org/abs/2403.03400v1
Compressor summary: The paper proposes a method to learn facial action unit (AU) representations from unlabelled videos using contrastive learning, which improves AU detection performance and reduces data scarcity.
http://arxiv.org/abs/2403.03396v1
Compressor summary: The paper introduces a new task of grading sentence translation exercises in L2 language learning and creates a dataset for it, showing that existing models struggle to classify responses accurately.
http://arxiv.org/abs/2403.03390v1
Compressor summary: The paper proposes a semi-supervised learning framework for weed detection built on object detection models, which achieves high accuracy with less labeled data and promotes sustainable agriculture.
http://arxiv.org/abs/2403.03382v1
Compressor summary: ADM is a new paradigm for lifelong learning that adaptively discovers and merges novel classes without losing established knowledge, using Triple Comparison and Probability Regularization for category assignment and Adaptive Model Merging for knowledge integration.