This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-08, generated by the Compressor, my personal LLM-based project.
http://arxiv.org/abs/2403.04765v1
Compressor summary: The paper proposes a new method that improves the efficiency and accuracy of semi-dense matching across images, with potential applications in image retrieval and 3D reconstruction.
http://arxiv.org/abs/2403.04764v1
Compressor summary: The paper proposes a new batch Bayesian Optimization method using Thompson Sampling that minimizes redundancy and has low regret, and shows superior performance on nonconvex test functions.
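For context, plain batch Thompson Sampling (without the paper's redundancy-minimizing refinement) works by drawing several samples from a Gaussian-process posterior and maximizing each sample independently. A minimal sketch, assuming a 1D problem, an RBF kernel, and a fixed candidate grid (all illustrative choices, not the paper's setup):

```python
import numpy as np

def rbf_kernel(a, b, ls=0.3):
    # Squared-exponential kernel on 1D inputs
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def batch_thompson(x_obs, y_obs, candidates, batch_size=3, noise=1e-6, seed=0):
    """Pick a batch of query points by drawing `batch_size` posterior
    samples from a GP fit to (x_obs, y_obs) and maximizing each sample."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(candidates, x_obs)
    Kss = rbf_kernel(candidates, candidates)
    K_inv = np.linalg.inv(K)
    mu = Ks @ K_inv @ y_obs
    cov = Kss - Ks @ K_inv @ Ks.T + noise * np.eye(len(candidates))
    samples = rng.multivariate_normal(mu, cov, size=batch_size,
                                      check_valid="ignore")
    return candidates[samples.argmax(axis=1)]

x_obs = np.array([0.1, 0.4, 0.9])
y_obs = np.sin(6 * x_obs)
cand = np.linspace(0, 1, 101)
batch = batch_thompson(x_obs, y_obs, cand, batch_size=3)
print(batch.shape)  # (3,)
```

Note that naive sampling like this can return near-duplicate batch members; reducing exactly that redundancy is the paper's contribution.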
http://arxiv.org/abs/2403.04763v1
Compressor summary: The paper shows how different graph learning techniques can be viewed as special cases or simplifications of bilevel optimization and presents a flexible class of energy functions for GNN message-passing layers with residual error analysis, called BloomGML.
http://arxiv.org/abs/2403.04759v1
Compressor summary: The paper introduces LifeHD, an on-device lifelong learning system for IoT devices using hyperdimensional computing, which improves unsupervised clustering accuracy and energy efficiency compared to existing methods.
http://arxiv.org/abs/2403.04755v1
Compressor summary: The paper presents a method to estimate 3D poses from LiDAR scans using minimal storage and clustering, while maintaining accurate localization with an object-matching network.
http://arxiv.org/abs/2403.04747v1
Compressor summary: The paper proposes a new aggregation function for graph neural networks (GNNs) that preserves expressivity and improves learning dynamics, potentially leading to self-normalizing GNNs.
http://arxiv.org/abs/2403.04746v1
Compressor summary: The paper proposes a biologically inspired method called simulated trial and error (STE) that improves the accuracy and reliability of tool use by large language models, allowing them to outperform stronger models such as GPT-4.
http://arxiv.org/abs/2403.04744v1
Compressor summary: The paper investigates the complexity of non-Gaussian component analysis (NGCA) in the SQ model and shows that only the moment-matching condition is necessary for hardness, not the chi-squared condition.
http://arxiv.org/abs/2403.04739v1
Compressor summary: The paper proposes a new evaluation protocol for scene flow methods that accounts for object size, speed, and class, and demonstrates its effectiveness with a simple but powerful baseline method called TrackFlow.
http://arxiv.org/abs/2403.04735v1
Compressor summary: SnapNTell is a new benchmark for entity-centric visual question answering that challenges large language models to provide accurate and detailed responses about various entities across 22 categories.
http://arxiv.org/abs/2403.04732v1
Compressor summary: The text explores vision-based deductive reasoning in VLMs using Raven's Progressive Matrices and finds that current state-of-the-art models struggle with understanding complex visual patterns.
http://arxiv.org/abs/2403.04724v1
Compressor summary: MCAE is a new Capsule Network model that uses self-supervised pretraining with masked image modelling to improve performance on complex data tasks.
http://arxiv.org/abs/2403.04720v1
Compressor summary: The research introduces a new encoder-based model for representing tabular datasets in meta-learning tasks, comparing it with Dataset2Vec and highlighting the importance of task-specific representations.
http://arxiv.org/abs/2403.04706v1
Compressor summary: The paper demonstrates that a large language model can perform well on math benchmarks with proper pre-training and data scaling, but struggles to reliably generate correct answers without them.
http://arxiv.org/abs/2403.04701v1
Compressor summary: The paper proposes a method to generate diverse object-to-background changes using text-to-image, image-to-text, and image-to-segment models, and evaluates the robustness of vision-based models against these changes.
http://arxiv.org/abs/2403.04700v1
Compressor summary: This paper explores the long-tail distribution issue in multiple object tracking data and proposes two data augmentation strategies to mitigate its effects on performance.
http://arxiv.org/abs/2403.04697v1
Compressor summary: The paper proposes AUFormer, a new method for facial action unit detection using parameter-efficient transfer learning and a novel loss function, achieving state-of-the-art performance without extra data.
http://arxiv.org/abs/2403.04696v1
Compressor summary: The paper proposes a method to detect hallucinations and fact-check claims in large language models using token-level uncertainty scores, which improve over baselines and are comparable to external fact-checking tools.
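As a simple baseline for this idea (not the paper's claim-conditioned method), a token-level uncertainty score can be the Shannon entropy of each token's predictive distribution; high-entropy tokens flag content the model is unsure about:

```python
import numpy as np

def token_entropies(logits):
    """Shannon entropy of each token's predictive distribution, computed
    from raw logits via a numerically stable softmax. High-entropy tokens
    are candidates for unreliable (possibly hallucinated) content."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

confident = np.array([[10.0, 0.0, 0.0]])   # peaked distribution
uncertain = np.array([[1.0, 1.0, 1.0]])    # uniform distribution
print(token_entropies(confident)[0] < token_entropies(uncertain)[0])  # True
```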
http://arxiv.org/abs/2403.04693v1
Compressor summary: The text describes a universal evaluation methodology for scientific and technological collaborative competitions, which can handle classification and regression problems, account for different difficulties, and provide more accurate performance comparisons.
http://arxiv.org/abs/2403.04692v1
Compressor summary: PixArt-Σ is a more efficient text-to-image diffusion model that generates higher quality 4K images with better alignment to user prompts and uses less data and parameters than previous models.
http://arxiv.org/abs/2403.04690v1
Compressor summary: The paper proposes efficient kernels for neighborhood attention, a self-attention variant that limits token attention to nearby tokens, improving latency and reducing memory footprint.
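The attention pattern itself is easy to illustrate, even though the paper's contribution is the optimized kernels. A naive sketch (illustrative only, nothing like the paper's fused implementation) where each token attends to a fixed window of neighbors:

```python
import numpy as np

def neighborhood_attention(q, k, v, window=3):
    """Self-attention where token i attends only to tokens within
    `window` positions on either side (a sliding local neighborhood).

    q, k, v: (seq_len, dim) arrays. Returns (seq_len, dim).
    """
    seq_len, dim = q.shape
    out = np.zeros_like(v)
    for i in range(seq_len):
        lo = max(0, i - window)
        hi = min(seq_len, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(dim)      # local scores
        weights = np.exp(scores - scores.max())        # stable softmax
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]                    # local mix of values
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
y = neighborhood_attention(x, x, x, window=2)
print(y.shape)  # (10, 4)
```

Because each token touches at most `2*window+1` keys instead of all `seq_len`, both compute and memory scale linearly in sequence length.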
http://arxiv.org/abs/2403.04671v1
Compressor summary: The text explores how sociocognitive linguistic patterns differ by race/ethnicity and gender in collaborative problem-solving tasks and discusses the implications of diversity on communication and collaboration.
http://arxiv.org/abs/2403.04670v1
Compressor summary: Contextual Optimization (CO) combines machine learning and optimization to solve decision problems under uncertainty; the proposed Conditional Robust Optimization (CRO) approach enhances safety and reliability by pairing uncertainty quantification with robust optimization, achieving high-quality conditional coverage via differentiable optimization methods.
http://arxiv.org/abs/2403.04667v1
Compressor summary: The paper examines the social impacts of ChatGPT and other generative AI models, considering both their benefits and risks for various sectors and proposing ways to promote ethical and human-centered AI development.
http://arxiv.org/abs/2403.04666v1
Compressor summary: This paper evaluates Phi-2, a small language model that can understand and answer questions about telecom standards with high accuracy by using a Retrieval-Augmented Generation approach.
http://arxiv.org/abs/2403.04661v1
Compressor summary: The paper proposes a Dynamic Cross-Attention (DCA) model for audio-visual identity verification that adapts to strong or weak complementary relationships between audio and visual features, achieving state-of-the-art results on the VoxCeleb1 dataset.
http://arxiv.org/abs/2403.04656v1
Compressor summary: The paper proposes a model called CoTE for dialogue state tracking that generates explanations to improve accuracy and reliability in slot value determination.
http://arxiv.org/abs/2403.04654v1
Compressor summary: The paper proposes a recursive cross-attentional model with BLSTMs for better audio-visual fusion in person verification, improving over unimodal systems.
http://arxiv.org/abs/2403.04652v1
Compressor summary: The Yi model family is a series of language and multimodal models that achieve strong performance on various benchmarks thanks to high-quality data and super-computing infrastructure.
http://arxiv.org/abs/2403.04650v1
Compressor summary: The Context-Based Multimodal Fusion model combines modality fusion and data distribution alignment to solve complex multimodal tasks with reduced computational and training data requirements.
http://arxiv.org/abs/2403.04643v1
Compressor summary: QAQ is a novel compression scheme that adapts to key and value caches in NLP models, allowing for more efficient deployment of LLMs with minimal performance loss.
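To ground the idea, the simplest KV-cache compression baseline (not QAQ's quality-adaptive scheme) is uniform per-row quantization, where each cache row gets its own scale and zero point so outlier rows don't inflate everyone else's error:

```python
import numpy as np

def quantize_per_row(x, bits=4):
    """Uniform per-row quantization: each row gets its own scale and
    zero point, stored alongside the low-bit codes."""
    levels = 2 ** bits - 1
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    # Reconstruct an approximation of the original floats
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(1)
kv = rng.normal(size=(8, 64)).astype(np.float32)   # toy KV cache slice
q, s, z = quantize_per_row(kv, bits=4)
err = np.abs(dequantize(q, s, z) - kv).max()
print(q.dtype, err)
```

At 4 bits this shrinks the cache roughly 8x versus float32 (plus two scalars per row), at the cost of bounded reconstruction error; adapting the bit allocation to keys versus values is where schemes like QAQ improve on this baseline.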
http://arxiv.org/abs/2403.04642v1
Compressor summary: The paper compares different algorithms that use reinforcement learning from human feedback to improve large language models' reasoning capabilities and finds Expert Iteration performs best with similar sample complexity to PPO.
http://arxiv.org/abs/2403.04640v1
Compressor summary: The paper presents CAT, a method that enhances MLLMs for answering questions in complex audio-visual scenarios by aggregating clues, using a mixed multimodal dataset, and optimizing for non-ambiguity responses.
http://arxiv.org/abs/2403.04639v1
Compressor summary: The paper introduces MaCMS, the first Magahi-Hindi-English code-mixed sentiment analysis dataset, and analyzes its structure, language preferences, and quality.
http://arxiv.org/abs/2403.04636v1
Compressor summary: The paper proposes an entropy-aware message passing term for GNNs that prevents oversmoothing by preserving node diversity during aggregation.
http://arxiv.org/abs/2403.04634v1
Compressor summary: The paper introduces Pix2Gif, a model that generates GIFs from images using text and motion guidance, and uses a new warping module and perceptual loss to ensure quality and coherence.
http://arxiv.org/abs/2403.04629v1
Compressor summary: ShapleyBO is a framework that uses game-theoretic Shapley values to interpret and improve Bayesian optimization, enabling better exploration and personalization of wearable robotic devices by humans.
http://arxiv.org/abs/2403.04605v1
Compressor summary: IN-N-OUT is a new method to improve the calibration of graph neural networks (GNNs) for predicting links by labeling edges with true/false labels based on GNN predictions, which leads to better embeddings and more accurate probabilities.
http://arxiv.org/abs/2403.04599v1
Compressor summary: The paper proposes CCLIS, a method that uses importance sampling to select replay buffers and prototype-instance relation distillation to maintain knowledge, which improves continual learning by reducing catastrophic forgetting.
http://arxiv.org/abs/2403.04593v1
Compressor summary: The paper introduces Embodied Language Model (ELM), a framework that enables autonomous agents to understand driving scenes with large spatial and temporal spans by incorporating space-aware pre-training and time-aware token selection.
http://arxiv.org/abs/2403.04588v1
Compressor summary: Key points:
- Humans can create mental representations from multiple senses and generalize information across domains.
- Robotics and RL agents struggle to exploit sensor redundancy and complementarity for robustness and generalization.
- A brain-inspired multimodal representation called 'Global Workspace' combines information across modalities and transfers policies between them without extra training.
Summary: The paper proposes a 'Global Workspace', a multimodal representation inspired by the human brain, that can improve RL agents' robustness and generalization by combining and transferring information across different sensors.
http://arxiv.org/abs/2403.04583v1
Compressor summary: The paper proposes a new method for camera calibration using conic features based on moments that can overcome distortion limitations and improve accuracy.
http://arxiv.org/abs/2403.04580v1
Compressor summary: The authors create a large dataset of organic reaction intermediates and train machine learning models to predict reaction pathways, impurities, and roles of catalysts and reagents.
http://arxiv.org/abs/2403.04577v1
Compressor summary: The paper analyses a benchmark dataset for table interpretation tasks, finds it too simple, creates a new more realistic dataset, introduces a novel entity linking problem, and proposes a prompting framework to evaluate large language models on this task.
http://arxiv.org/abs/2403.04571v1
Compressor summary: The text explores how deep learning lacks reasoning and uncertainty estimation skills compared to human mathematicians, and proposes using information theory to discover interesting conjectures in mathematics.
http://arxiv.org/abs/2403.04568v1
Compressor summary: The paper proposes a new algorithm for linear mixture MDPs that achieves better regret than previous methods by using a novel least square estimator and self-normalized concentration.
http://arxiv.org/abs/2403.04562v1
Compressor summary: The authors propose an event-based method for class-agnostic motion segmentation that works in complex large-scale outdoor environments and achieves state-of-the-art results on indoor and outdoor benchmarks.
http://arxiv.org/abs/2403.04558v1
Compressor summary: The authors explore self-supervised learning methods for breast cancer diagnosis using consumer-grade hardware and show that they can improve classification performance while reducing training time by 90%.
http://arxiv.org/abs/2403.04551v1
Compressor summary: The text introduces a fine-grained taxonomy of hardness types and a toolkit to benchmark Hardness Characterization Methods (HCMs) for ML model development.
http://arxiv.org/abs/2403.04549v1
Compressor summary: The paper proposes a new explanation approach for face recognition systems called FGGB, which generates precise saliency maps to interpret the system's decisions.
http://arxiv.org/abs/2403.04547v1
Compressor summary: The study presents a novel algorithm (M4) to reduce biases in CLIP models by balancing data and analyzes its effects on different factors and metrics.
http://arxiv.org/abs/2403.04545v1
Compressor summary: The paper investigates how to choose a scaling factor ($\alpha$) for ResNets to avoid generalization issues and improve performance on various tasks.
http://arxiv.org/abs/2403.04541v1
Compressor summary: The paper presents a two-step system for generating ASP programs from natural language, using neural machine translation and the CNL2ASP tool.
http://arxiv.org/abs/2403.04526v1
Compressor summary: Autoencoder neural networks improve unmixing accuracy, robustness, and efficiency in hyperspectral Raman spectroscopy for chemical composition analysis.
http://arxiv.org/abs/2403.04523v1
Compressor summary: T-TAME is a method to generate explanations for Vision Transformers and other neural networks in image classification tasks, achieving state-of-the-art performance with less computation than existing techniques.
http://arxiv.org/abs/2403.04521v1
Compressor summary: The paper proposes a novel framework, UFKGC, for uncertainty-aware few-shot knowledge graph completion that models uncertainty using Gaussian distributions and improves robustness to noise.
http://arxiv.org/abs/2403.04511v1
Compressor summary: The study explores deep filter bubbles on short-video platforms, how they evolve over time, and what factors influence them, while suggesting ways to mitigate their negative effects.
http://arxiv.org/abs/2403.04510v1
Compressor summary: The study explores the point where large language models transition from learning in context to translating, and finds a "task recognition" layer where attention to context is no longer needed.
http://arxiv.org/abs/2403.04508v1
Compressor summary: The paper introduces a new concept of scene exploration with NeRFs, proposes three methods to efficiently discover inputs for novel view synthesis, and shows that the proposed EGPS method outperforms baselines.
http://arxiv.org/abs/2403.04507v1
Compressor summary: The text introduces a novel method for evaluating natural language preprocessing tools using a benchmarking system inspired by GLUE and applied to Polish.
http://arxiv.org/abs/2403.04504v1
Compressor summary: ROGMC is a new method that uses cumulative preference propagation and interest regularization to incorporate rating ordinality in graph neural networks for matrix completion, leading to better recommendations.
http://arxiv.org/abs/2403.04493v1
Compressor summary: The text discusses the challenges of quantifying realism in generated data, proposes a new concept called universal critic, and argues that it is different from adversarial critics.
http://arxiv.org/abs/2403.04492v1
Compressor summary: The paper proposes two improvements for cross-domain few-shot classification: a parameter-efficient adaptation strategy and a variance-aware loss function, achieving better accuracy and efficiency than existing methods.
http://arxiv.org/abs/2403.04484v1
Compressor summary: The study compares the performance of ImageNet and RadImageNet in medical imaging classification, finding that both achieve similar results but ImageNet overfits more to confounders.
http://arxiv.org/abs/2403.04483v1
Compressor summary: Key points:
- The paper proposes a benchmark called GraphInstruct to evaluate and enhance the graph understanding abilities of large language models (LLMs).
- The paper constructs two LLM variants, GraphLM and GraphLM+, by instruction tuning and step mask training respectively.
- GraphLM and GraphLM+ outperform other LLMs on 21 classical graph reasoning tasks.
- The code for generating GraphInstruct is released publicly.
Summary: The paper introduces GraphInstruct, a benchmark to test and improve LLMs' graph understanding skills, and two enhanced LLM variants that excel on various graph reasoning tasks.
http://arxiv.org/abs/2403.04482v1
Compressor summary: This paper characterizes the topology awareness of graph neural networks (GNNs) and studies its impact on generalization performance and fairness, showing that improving topology awareness can cause unfair generalization in some cases.
http://arxiv.org/abs/2403.04481v1
Compressor summary: The study uses Large Language Models for understanding spoken language with multiple intentions, creating new datasets and metrics to evaluate their performance.
http://arxiv.org/abs/2403.04477v1
Compressor summary: The paper investigates how specific hyperparameters affect MLP performance in time series forecasting and introduces a large new dataset for this task.
http://arxiv.org/abs/2403.04473v1
Compressor summary: TextMonkey is a text-centric large multimodal model that enhances performance and interpretability in tasks like DocVQA and scene text analysis with improvements on several benchmark datasets.
http://arxiv.org/abs/2403.04471v1
Compressor summary: The shutdown problem asks how to create artificial agents that competently pursue goals yet still shut down when needed, without trying to prevent or cause the pressing of the shutdown button; the trade-offs involved depend on the agent's patience.
http://arxiv.org/abs/2403.04468v1
Compressor summary: The text surveys existing Graph Neural Network (GNN) models that address challenges like data imbalance, noise, privacy, and out-of-distribution scenarios to improve their reliability and robustness in real-world applications.
http://arxiv.org/abs/2403.04460v1
Compressor summary: PEARL is a new conversational recommendation dataset with detailed persona and knowledge that improves recommendation quality and relevance.
http://arxiv.org/abs/2403.04454v1
Compressor summary: The authors present CLSum, a dataset for summarizing multi-jurisdictional common law court judgments, and propose LLM-based methods for data augmentation, summary generation, and evaluation.
http://arxiv.org/abs/2403.04453v1
Compressor summary: Vlearn is an off-policy reinforcement learning algorithm that uses only a state-value function as the critic, making it efficient and robust in high-dimensional action spaces.
http://arxiv.org/abs/2403.04449v1
Compressor summary: The paper evaluates GPT-4 Turbo's ability to provide feedback for student programming submissions, finding improvements over GPT-3.5 in correctness and structure but also noting some inconsistencies.
http://arxiv.org/abs/2403.04447v1
Compressor summary: The paper introduces a novel fuzzy rough rule induction algorithm (FRRI) that creates interpretable white box models by combining fuzzy and rough set theory, and shows its superior performance in accuracy and rule length compared to other methods.
http://arxiv.org/abs/2403.04445v1
Compressor summary: The text discusses how sociodemographic characteristics like socioeconomic status, ethnicity, and geography affect NLP performance and calls for their inclusion in future language technologies to avoid disadvantaging less-privileged groups.
http://arxiv.org/abs/2403.04444v1
Compressor summary: The paper proposes a new method (DDHPose) for 3D human pose estimation that disentangles the pose into length and direction, uses a hierarchical denoiser to model joint spatial and temporal information, and improves performance over previous methods.
http://arxiv.org/abs/2403.04443v1
Compressor summary: The paper proposes FriendNet, a novel architecture that combines image restoration and object detection to improve performance in adverse weather conditions for autonomous driving systems.
http://arxiv.org/abs/2403.04442v1
Compressor summary: The paper proposes a collaborative Bayesian optimization problem where two agents work together to optimize a black-box function with each controlling one variable, using strategic planning and a user model to find the global maximum efficiently.
http://arxiv.org/abs/2403.04437v1
Compressor summary: StableDrag is a new framework that improves point-based image editing by using a discriminative point tracking method and a confidence-based latent enhancement strategy to address the issues of inaccurate tracking and incomplete supervision.
http://arxiv.org/abs/2403.04429v1
Compressor summary: The paper evaluates how dimensionality reduction techniques improve unsupervised time series anomaly detection models' performance and efficiency across different datasets.
http://arxiv.org/abs/2403.04417v1
Compressor summary: The text discusses how to use surrogate models to reduce computational costs and increase efficiency for large-scale Agent-Based Models (ABMs) in Social Health Computational Sciences.
http://arxiv.org/abs/2403.04400v1
Compressor summary: The paper proposes a new challenge (C2Gen NLI) to test neural models' compositional inference abilities under continual learning, and analyzes how different algorithms and subtask ordering affect performance.
http://arxiv.org/abs/2403.04398v1
Compressor summary: The paper proposes a method called MAGR to reduce forgetting in continual assessment of diverse skills by aligning old and new features with quality scores.
http://arxiv.org/abs/2403.04385v1
Compressor summary: This paper studies how different visual characteristics of input Earth observation data affect land cover classification models' performance and finds that texture distortions have a greater impact than color distortions.
http://arxiv.org/abs/2403.04382v1
Compressor summary: Acceleron is a tool that uses large language models to help researchers formulate novel research proposals and validate their motivation by identifying gaps in existing literature.
http://arxiv.org/abs/2403.04381v1
Compressor summary: The paper proposes a method to adapt single-view hand pose estimation models to dual views without extra annotations, allowing them to work with different camera settings.
http://arxiv.org/abs/2403.04380v1
Compressor summary: The paper proposes a new method to animate realistic 3D head models using video input without requiring personal information, by using a neural network that translates expression features into animation parameters.
http://arxiv.org/abs/2403.04376v1
Compressor summary: The paper studies how Chinese speakers omit noun markers based on context, builds a corpus with annotations, and tests various machine learning models to predict the missing markers' meanings.
http://arxiv.org/abs/2403.04369v1
Compressor summary: The FWGB approach uses a legal knowledge graph and attention mechanism to predict confusing criminal charges by focusing on constituent elements that distinguish them.
http://arxiv.org/abs/2403.04368v1
Compressor summary: The paper proposes a method to remove wrinkled transparent films from images using polarized cameras and neural networks, improving image quality and industrial recognition systems performance.
http://arxiv.org/abs/2403.04366v1
Compressor summary: The paper proposes a novel Knowledge Injection and Guidance (KIG) approach to improve court view generation using pretrained language models, achieving better results especially in handling responsive claims.
http://arxiv.org/abs/2403.04363v1
Compressor summary: MT-Track is a new efficient and streamlined framework for UAV tracking that uses temporal modeling in two steps: correlation map generation and refinement, to handle challenges like fast motion and small objects.
http://arxiv.org/abs/2403.04353v1
Compressor summary: The text proposes a new EEG-based motor imagery classification method using topological maps, spatial features, and spatiotemporal pooling to improve accuracy.
http://arxiv.org/abs/2403.04343v1
Compressor summary: The text proposes a novel algorithm for balancing tasks when tuning large multimodal models using visual instructions.
http://arxiv.org/abs/2403.04337v1
Compressor summary: The paper explores using eXplainable Artificial Intelligence (XAI) to identify redundant memory writes in embedded systems, which can improve performance and energy efficiency.
http://arxiv.org/abs/2403.04325v1
Compressor summary: The Composition Score is a new model-based metric that measures the degree to which smaller language units combine to form the meanings of phrases and sentences, and it correlates with brain regions involved in different aspects of this process.
http://arxiv.org/abs/2403.04321v1
Compressor summary: The paper proposes a discriminative adapter for text-to-image generation that improves alignment between generated images and text prompts by enhancing the model's discriminative abilities.
http://arxiv.org/abs/2403.04317v1
Compressor summary: The paper introduces MAC, an efficient and effective online adaptation framework for large language models that uses amortized feature extraction and memory-augmentation to store new information and answer questions without gradient updates.
http://arxiv.org/abs/2403.04314v1
Compressor summary: The text proposes a new evaluation toolkit for conversational systems that assesses semantic understanding by measuring negation and implicature, and suggests pre-training with augmented data to improve embedding models.
http://arxiv.org/abs/2403.04311v1
Compressor summary: ALTO is a network orchestrator that improves the efficiency and performance of compound AI systems, such as pipelines of language models, by streaming intermediate outputs between stages.
http://arxiv.org/abs/2403.04309v1
Compressor summary: The paper proposes AO-DETR, a method to detect overlapping prohibited items in X-ray images using category-specific queries and edge localization, outperforming existing object detectors.
http://arxiv.org/abs/2403.04307v1
Compressor summary: HaluEval-Wild is a new benchmark for evaluating large language models' hallucinations in real-world user-LLM interactions by using challenging queries from existing datasets and categorizing them into five types.
http://arxiv.org/abs/2403.04306v1
Compressor summary: This article evaluates the performance of large vision-language models (LVLMs) in specialized tasks like object detection and healthcare, as well as general tasks like reasoning and question answering, finding that they are not very effective in either domain.
http://arxiv.org/abs/2403.04303v1
Compressor summary: LORS reduces parameter usage in deep learning models by allowing stacked modules to share most of their parameters.
http://arxiv.org/abs/2403.04294v1
Compressor summary: A³lign-DFER is a method that aligns text and video in three aspects (affective, dynamic, and bidirectional) to improve CLIP's performance in recognizing facial expressions dynamically.
http://arxiv.org/abs/2403.04293v1
Compressor summary: The paper proposes MKF-IDS, an anomaly-based intrusion detection system for CAN bus in ITSs, using spatial-temporal correlation with attention mechanism and patch sparse-transformer modules to improve safety and security.
http://arxiv.org/abs/2403.04292v1
Compressor summary: The paper discusses AI challenges in image categorization and generation, proposes incorporating cybernetics and analog control processes for improved cognition and abstraction, and introduces the Ouroboros Model as a versatile algorithmic backbone for general cognition.
http://arxiv.org/abs/2403.04283v1
Compressor summary: Proxy-RLHF is a new method that lowers the computational cost of aligning large language models with human values by decoupling generation and alignment processes using a proxy model trained with reinforcement learning.
http://arxiv.org/abs/2403.04280v1
Compressor summary: The paper proposes a comprehensive benchmark for Arabic speech recognition in telephone conversations, considering dialectal diversity and call quality challenges.
http://arxiv.org/abs/2403.04279v1
Compressor summary: The text reviews controllable generation with text-to-image diffusion models, covering their theoretical foundations and practical advancements in various condition categories.
http://arxiv.org/abs/2403.04272v1
Compressor summary: AGCD uses adaptive sampling to select valuable novel samples for labeling and a stable label mapping algorithm to ensure consistent training across different stages, improving generalized category discovery (GCD) performance in the low-labeling regime.
http://arxiv.org/abs/2403.04264v1
Compressor summary: The paper studies a facility location problem with routing constraints in a competitive market, proposes new cuts for solving it, and develops exact and heuristic methods that outperform existing approaches.
http://arxiv.org/abs/2403.04261v1
Compressor summary: Biomedical text mining is a rapidly growing field that leverages advanced technology to analyze vast amounts of diverse text data from various sources, with community challenges promoting innovation and collaboration in Chinese biomedical research.
http://arxiv.org/abs/2403.04258v1
Compressor summary: The paper proposes a test-time training method for zero-shot video object segmentation that predicts consistent depth maps and uses momentum-based weight initialization and looping-based training scheme to achieve better results.
http://arxiv.org/abs/2403.04253v1
Compressor summary: Recall to Imagine (R2I) integrates state space models into world models of reinforcement learning agents to improve long-term memory and credit assignment, achieving superhuman performance in complex memory tasks and faster convergence than DreamerV3.
http://arxiv.org/abs/2403.04247v1
Compressor summary: The paper introduces negative seed entities to improve Entity Set Expansion (ESE) for ultra-fine-grained semantic classes, proposes two frameworks to assess models in this task, and suggests three strategies to enhance model performance.
http://arxiv.org/abs/2403.04236v1
Compressor summary: The paper proposes Regularized DeepIV (RDIV), a minimax-oracle-free method that avoids limitations in instrumental variable estimation using machine learning and provides rigorous guarantees for the popular DeepIV method with Tikhonov regularization.
http://arxiv.org/abs/2403.04233v1
Compressor summary: DEEP-ICL is a new method that uses task definitions to improve in-context learning without relying on large language models.
http://arxiv.org/abs/2403.04228v1
Compressor summary: The paper proposes a new method for reconstructing HDR images from multiple low dynamic range images in dynamic scenes, using single-frame HDR reconstruction and enhanced stop image techniques to preserve details and avoid ghosting artifacts.
http://arxiv.org/abs/2403.04225v1
Compressor summary: The 3DTextureTransformer is a novel framework that generates high-quality textures for arbitrary mesh topologies using a hybrid approach of geometric deep learning and StyleGAN-like architecture, achieving state-of-the-art performance in this domain.
http://arxiv.org/abs/2403.04224v1
Compressor summary: The paper proposes to train aligner models that can quickly and safely align any large language model for a given criterion using synthetic data.
http://arxiv.org/abs/2403.04222v1
Compressor summary: The study explores how open-source Large Language Models can evaluate their own output using glass-box features, such as softmax distribution, and incorporating reference features to improve quality evaluation.
http://arxiv.org/abs/2403.04221v1
Compressor summary: The paper explores how causal modelling can enhance reinforcement learning in online and offline settings, especially when learning from other agents' experiences.
http://arxiv.org/abs/2403.04212v1
Compressor summary: The text proposes a new framework called PESS that can automatically infer persona from dialogues, which helps improve emotional support in conversations with chatbots.
http://arxiv.org/abs/2403.04206v1
Compressor summary: The paper proposes a new algorithm for fast distributed training of deep learning models with better convergence and quality, and shows its effectiveness in theoretical and experimental studies.
http://arxiv.org/abs/2403.04204v1
Compressor summary: The paper surveys various methods and issues related to aligning AI models with human preferences and values, exploring historical context, mathematical essence, existing techniques, emerging topics, and future directions.
http://arxiv.org/abs/2403.04200v1
Compressor summary: The paper introduces Atrous Attention, a hybrid of regional and sparse attention, which adapts to local and global information while preserving hierarchical relations. It also presents ACC-ViT, a vision transformer backbone that achieves high accuracy on ImageNet-1K with fewer parameters than existing models.
http://arxiv.org/abs/2403.04198v1
Compressor summary: The paper proposes CN-RMA, a new method for detecting 3D indoor objects from multiple images, by combining 3D reconstruction and object detection networks to handle occlusion issues.
http://arxiv.org/abs/2403.04197v1
Compressor summary: ICMA is a new method for adapting large language models to the molecule-caption translation task using context examples, improving their performance without extra pre-training or data.
http://arxiv.org/abs/2403.04195v1
Compressor summary: The text discusses applying deep reinforcement learning techniques to determine optimal reservoir operation policies, focusing on two methods that perform well for a case study of Folsom Reservoir in California.
http://arxiv.org/abs/2403.04194v1
Compressor summary: The paper explores using the Segment Anything Model (SAM) for video object segmentation by iteratively refining bounding box prompts to handle position, size, and occlusion variations.
http://arxiv.org/abs/2403.04190v1
Compressor summary: This paper explores how large language models can create realistic synthetic data for low-resource AI challenges, detailing methods, evaluations, and applications, while acknowledging limitations and suggesting future directions.
http://arxiv.org/abs/2403.04183v1
Compressor summary: The paper proposes Refer-VI-ReID, a method to match visible images using infrared samples and textual descriptions, with a new Y-Y-shape structure and a cross-modal re-ranking algorithm that improves performance on three datasets.
http://arxiv.org/abs/2403.04182v1
Compressor summary: Metric-aware LLM inference is a new method to optimize NLP task performance by adjusting inference strategies based on evaluation metrics.
http://arxiv.org/abs/2403.04180v1
Compressor summary: RATSF is a framework that uses RACA, a cross-attention module, to improve customer service volume forecasting by leveraging historical data effectively in non-stationary scenarios.
http://arxiv.org/abs/2403.04178v1
Compressor summary: The paper proposes a TTS system that incorporates stress into translations, addressing the challenge of language diversity and accessibility in India's education sector.
http://arxiv.org/abs/2403.04173v1
Compressor summary: The paper introduces SA-ICM, a novel image compression technique based on a LIC model trained with edge information created by Segment Anything; by preserving only the edge information of object parts, it improves image recognition and video compression tasks while protecting privacy.
http://arxiv.org/abs/2403.04172v1
Compressor summary: SDPL is a part-based representation learning method for cross-view geo-localization that divides images into multiple parts to explore contextual information while maintaining global structure and handling position shifting and scale variations.
http://arxiv.org/abs/2403.04164v1
Compressor summary: The paper proposes a method to improve medical image segmentation using adaptive prompts and pattern shifting without fine-tuning the large SAM model, achieving competitive results.
http://arxiv.org/abs/2403.04162v1
Compressor summary: NoisySAN is a novel exploration strategy for deep RL using spiking neural networks that introduces and reduces noise to achieve better performance on various tasks.
http://arxiv.org/abs/2403.04161v1
Compressor summary: SWAP-Score is a novel training-free metric that measures network expressivity and has strong correlation with performance, outperforming existing metrics in Neural Architecture Search.
http://arxiv.org/abs/2403.04158v1
Compressor summary: DA-Net is a new method to transfer knowledge from multiple languages to another language by disentangling inputs and adapting class distributions, improving performance on three tasks and 38 languages.
http://arxiv.org/abs/2403.04154v1
Compressor summary: The paper presents a stable and efficient method to train deep neural networks with SDEs and policy gradient for generating high-reward samples, constraining the SDE to be consistent with its perturbation process, and demonstrates its effectiveness on structure-based drug design.
http://arxiv.org/abs/2403.04151v1
Compressor summary: The paper presents DFD, a novel FSAD method that leverages the frequency domain to detect and locate inconspicuous anomalies in industrial manufacturing at the image and feature levels using a dual-path feature discrimination module, outperforming state-of-the-art methods on benchmarks.
http://arxiv.org/abs/2403.04149v1
Compressor summary: The paper proposes MAP, a method to protect intellectual property of deep learning models by pruning target-related parameters without accessing unauthorized data.
http://arxiv.org/abs/2403.04140v1
Compressor summary: The text introduces a new method for few-shot class-incremental learning (FSCIL) that uses G2G interaction to preserve local geometric structure and mitigate catastrophic forgetting.
http://arxiv.org/abs/2403.04135v1
Compressor summary: The paper proposes a method for unsupervised learning of harmonic analysis using a hidden semi-Markov model and chord quality templates, which can recognize tonic without prior knowledge.
http://arxiv.org/abs/2403.04133v1
Compressor summary: nuPlan is a new dataset and benchmark for testing machine learning-based planners in autonomous vehicles across diverse driving situations.
http://arxiv.org/abs/2403.04132v1
Compressor summary: Chatbot Arena is an open platform that evaluates large language models based on human preferences using pairwise comparisons and crowdsourcing, providing a credible and widely cited leaderboard for LLMs.
http://arxiv.org/abs/2403.04130v1
Compressor summary: The text describes a custom Explainable Artificial Intelligence framework that uses multiple techniques to improve transparency and accuracy in healthcare systems, particularly in brain tumor detection.
http://arxiv.org/abs/2403.04125v1
Compressor summary: ComFe is an interpretable image classification approach that uses transformer-decoder and mixture modelling to identify and use consistent image components for accurate predictions.
http://arxiv.org/abs/2403.04124v1
Compressor summary: The paper proposes a framework to improve both privacy and generalization of large language models by controlling the flatness of their loss landscape.
http://arxiv.org/abs/2403.04121v1
Compressor summary: LLMs do not seem to be able to correct their own mistakes like humans can.
http://arxiv.org/abs/2403.04120v1
Compressor summary: Data augmentation improves computer vision models but may introduce class-specific biases; this study examines these biases across various datasets and model types, suggesting a nuanced approach to model selection and a refined method for managing DA-induced biases.