This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-21, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.13255v1
Compressor summary: This paper provides a comprehensive overview of the evolution of Simultaneous Localization and Mapping (SLAM) research, focusing on recent advancements in radiance fields for autonomous exploration.
http://arxiv.org/abs/2402.13254v1
Compressor summary: CounterCurate is a framework that improves multimodal models' ability to reason with images and text by addressing gaps in physically grounded reasoning and semantic counterfactual fine-tuning.
http://arxiv.org/abs/2402.13253v1
Compressor summary: BiMediX is a bilingual medical LLM that interacts in English and Arabic, using a translation pipeline and a large bilingual dataset to outperform existing models on various tasks.
http://arxiv.org/abs/2402.13252v1
Compressor summary:
Key points:
- Algorithm for joint refinement of camera pose and scene geometry using 2D images
- Pilot study on 1D signal and comparison to 3D scenarios
- Convolutional Gaussian filters for coarse-to-fine training schedule
- Decomposed low-rank tensor for efficient 3D convolution approximation
- Techniques for improved robustness and stability of joint optimization
Summary: The paper presents an algorithm that refines camera pose and scene geometry from 2D images using decomposed low-rank tensors, convolutional Gaussian filters, and techniques to improve optimization.
http://arxiv.org/abs/2402.13250v1
Compressor summary: Video ReCap is a recursive video captioning model that can handle videos of different lengths and output captions at multiple hierarchy levels.
http://arxiv.org/abs/2402.13249v1
Compressor summary: The study finds that large language models (LLMs) generate factually inconsistent summaries in dialogue domains, and existing LLMs are not effective at evaluating factual consistency.
http://arxiv.org/abs/2402.13243v1
Compressor summary: VADv2 is an end-to-end driving model that uses probabilistic planning to handle uncertainty and achieve state-of-the-art performance with only camera sensors.
http://arxiv.org/abs/2402.13233v1
Compressor summary:
Key points:
- IoT applications use ML to analyze time series data from interconnected sensors
- Distribution shift can degrade model performance
- SMORE is a novel DA algorithm for multi-sensor time series classification that uses hyperdimensional computing
- SMORE improves accuracy and reduces training and inference time compared to SOTA DNN-based algorithms
Summary: SMORE is a resource-efficient domain adaptation algorithm that leverages hyperdimensional computing to improve multi-sensor time series classification performance and reduce time complexity.
http://arxiv.org/abs/2402.13232v1
Compressor summary: The paper introduces a new dataset of vision-touch pairs with language labels, and uses it to train a model that improves touch-vision-language alignment in text generation and classification tasks.
http://arxiv.org/abs/2402.13231v1
Compressor summary: Our study shows that large language models have stronger cultural alignment when prompted with dominant languages or a mixture of languages from a specific culture, but misalignment can occur for underrepresented personas and sensitive topics.
http://arxiv.org/abs/2402.13228v1
Compressor summary: DPOP improves DPO by preventing the likelihood of preferred examples from decreasing during training, and achieves state-of-the-art performance on various tasks and datasets, including high-edit-distance ones.
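For intuition, here is a minimal sketch of a DPO-style loss with an added positive penalty of the kind the summary describes — my reading, not the authors' code; beta and the penalty weight lam are assumed hyperparameters:

```python
import torch
import torch.nn.functional as F

def dpop_loss(pi_logps_w, pi_logps_l, ref_logps_w, ref_logps_l,
              beta=0.1, lam=50.0):
    """DPO loss plus a positive penalty that discourages the policy from
    assigning the preferred completion less log-likelihood than the
    reference model. Inputs are summed sequence log-probs, shape (batch,)."""
    margin = (pi_logps_w - ref_logps_w) - (pi_logps_l - ref_logps_l)
    dpo = -F.logsigmoid(beta * margin)
    # Zero while the policy keeps the preferred completion at least as
    # likely as under the reference; positive otherwise.
    penalty = torch.clamp(ref_logps_w - pi_logps_w, min=0.0)
    return (dpo + lam * penalty).mean()
```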
http://arxiv.org/abs/2402.13225v1
Compressor summary: AgentMD is a language agent that can curate and apply clinical calculators to improve healthcare analytics and patient care by overcoming usability challenges, poor dissemination, and restricted functionality of existing tools.
http://arxiv.org/abs/2402.13222v1
Compressor summary: RoCode is a Romanian coding dataset that evaluates and fine-tunes language models for non-English languages.
http://arxiv.org/abs/2402.13219v1
Compressor summary: The paper evaluates an AI-based decision support system for complex industrial processes, which reduces operator workload, improves situational awareness, and adapts interventions to both system and human performance.
http://arxiv.org/abs/2402.13220v1
Compressor summary: MAD-Bench is a new benchmark that tests how well large language models handle deceptive information and finds significant performance gaps between them, suggesting the need for improvement.
http://arxiv.org/abs/2402.13217v1
Compressor summary: VideoPrism is a versatile video encoder that uses pretraining on large video-caption and noisy text datasets to excel at various video understanding tasks.
http://arxiv.org/abs/2402.13213v1
Compressor summary: The study investigates whether overconfidence in large language models can be reduced by using their maximum softmax probabilities to selectively abstain from multiple-choice answers, finding evidence that this improves performance.
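The abstention rule itself is simple; a minimal sketch of confidence-thresholded answering (my own illustration with an assumed threshold, not the paper's code):

```python
import torch

def answer_or_abstain(choice_logits: torch.Tensor, threshold: float = 0.5):
    """choice_logits: (num_choices,) scores a model assigns to each option.
    Returns the chosen index, or None to abstain when the maximum
    softmax probability falls below the confidence threshold."""
    probs = torch.softmax(choice_logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    return None if conf.item() < threshold else idx.item()
```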
http://arxiv.org/abs/2402.13212v1
Compressor summary: Soft Self-Consistency is a method to improve large language models by selecting the best answer from multiple solutions using a continuous score based on model likelihoods, resulting in better performance and efficiency on interactive tasks.
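As a rough sketch of the idea (not the authors' implementation), replacing majority voting with a continuous likelihood-based score might look like:

```python
import math
from collections import defaultdict

def soft_self_consistency(samples):
    """samples: list of (answer, token_logprobs) pairs from k sampled solutions.
    Each answer is scored by the exponentiated mean token log-probability of
    the solutions that produced it; the best-scoring answer wins."""
    scores = defaultdict(float)
    for answer, logprobs in samples:
        scores[answer] += math.exp(sum(logprobs) / len(logprobs))
    return max(scores, key=scores.get)
```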
http://arxiv.org/abs/2402.13211v1
Compressor summary: This paper analyzes large language models' challenges in providing emotional support through conversation and explores ways to improve their effectiveness.
http://arxiv.org/abs/2402.13210v1
Compressor summary: The authors propose using a Bayesian reward model to reduce errors in LLM responses caused by overoptimizing rewards and to improve the quality of non-toxic and helpful responses.
http://arxiv.org/abs/2402.13208v1
Compressor summary: The authors propose ConfHyena, a Conformer model that replaces standard self-attention with the Hyena operator for speech processing, achieving faster training times with minimal impact on performance.
http://arxiv.org/abs/2402.13196v1
Compressor summary: The paper presents a method for testing conditional independence that balances false positive rate and power using kernel ridge regression and bias control techniques.
http://arxiv.org/abs/2402.13188v1
Compressor summary: The paper proposes a novel approach for temporal knowledge graph question answering that calibrates question representations and models multi-hop relationships using a graph neural network, achieving better performance and interpretability than previous models.
http://arxiv.org/abs/2402.13187v1
Compressor summary: The paper studies how to test if a binary prediction model is well-calibrated using property testing techniques and efficient algorithms, and explores different aspects of calibration measurement.
http://arxiv.org/abs/2402.13185v1
Compressor summary: UniEdit is a text-guided framework for video editing that supports both motion and appearance editing by leveraging temporal and spatial self-attention layers.
http://arxiv.org/abs/2402.13184v1
Compressor summary: The study introduces CosmoAgent, a framework using LLMs to simulate interactions between human and alien civilizations, considering risks and diversity in cosmologies, ethics, and worldviews, for peaceful coexistence and conflict resolution.
http://arxiv.org/abs/2402.13182v1
Compressor summary: The text proposes an algorithm for distributed kernel bandits that minimizes regret by allowing agents to share information and use uniform exploration and shared randomness.
http://arxiv.org/abs/2402.13178v1
Compressor summary: MIRAGE is a benchmark to evaluate retrieval-augmented generation (RAG) systems for medical question answering (QA), showing that different corpora and retrievers improve LLMs' performance by up to 18%.
http://arxiv.org/abs/2402.13152v1
Compressor summary: AnnoTheia is a semi-automatic annotation toolkit that helps researchers create audio-visual speech technologies for low-resource languages by detecting when someone speaks and transcribing their words.
http://arxiv.org/abs/2402.13148v1
Compressor summary: ICAG is an adversarial game that improves LLM defenses against jailbreak attacks without fine-tuning and shows transferability to other models.
http://arxiv.org/abs/2402.13147v1
Compressor summary: The paper proposes a new method for offline imitation learning that uses inverse soft-Q learning to align the learned rewards with expert demonstrations and avoid over-fitting.
http://arxiv.org/abs/2402.13146v1
Compressor summary: OLViT is a novel video dialog model that uses multi-modal attention to track objects and language co-references, improving performance on response classification and generation tasks.
http://arxiv.org/abs/2402.13145v1
Compressor summary: The paper presents a large annotated Chinese Metaphor Corpus and proposes a novel approach to metaphor generation that emphasizes grounds, resulting in more realistic and creative metaphors.
http://arxiv.org/abs/2402.13144v1
Compressor summary: The authors show how to generate high-performing neural network parameters using an autoencoder and a latent diffusion model, achieving comparable or improved performance over trained networks.
http://arxiv.org/abs/2402.13137v1
Compressor summary: The study examines how transformer language adapters work, revealing that adaptation occurs gradually across layers and in specific layers for target languages.
http://arxiv.org/abs/2402.13131v1
Compressor summary: The paper presents a new tool that allows non-experts to explore and manipulate statistical shape models of faces in a browser, using partial observations.
http://arxiv.org/abs/2402.13130v1
Compressor summary: TMFT improves ELECTRA's sentence embeddings for semantic textual similarity tasks, making them comparable to BERT in efficiency and performance.
http://arxiv.org/abs/2402.13125v1
Compressor summary: TreeEval is a benchmark-free method for evaluating large language models by using a high-performance LLM to ask questions under a topic with a tree planning strategy, avoiding data leakage and ensuring evaluation completeness and efficiency.
http://arxiv.org/abs/2402.13122v1
Compressor summary: CoRTe is a method that trains semantic segmentation models on unlabelled datasets using black-box source model predictions and pseudo-labelling, achieving good results in synthetic-to-real settings.
http://arxiv.org/abs/2402.13116v1
Compressor summary:
Key points:
- The survey explores knowledge distillation (KD) techniques for transferring advanced capabilities from proprietary LLMs to open-source ones
- The survey organizes KD around three pillars: algorithm, skill, and verticalization
- The survey highlights the role of data augmentation (DA) in enhancing KD performance
- The survey provides guidance for researchers and practitioners and suggests future directions
Summary: The survey examines how knowledge distillation with data augmentation can transfer sophisticated functionalities from proprietary to open-source large language models, covering algorithmic, skill-based, and vertical aspects.
http://arxiv.org/abs/2402.13114v1
Compressor summary: BuffGraph improves minor class representation in class-imbalanced graph data by inserting buffer nodes that modulate the impact of majority classes on node classification.
http://arxiv.org/abs/2402.13113v1
Compressor summary: The text explores how restart-incremental Transformers handle ambiguity in sentences and shows the benefits of bidirectional encoders and dependency parsing for revision.
http://arxiv.org/abs/2402.13109v1
Compressor summary: CIF-Bench is a new benchmark to test the generalization ability of large language models in Chinese, revealing their limitations in handling complex reasoning and cultural nuances.
http://arxiv.org/abs/2402.13108v1
Compressor summary: The paper investigates why linear neural networks trained with quadratic loss show Edge of Stability behavior and how the learning rate affects convergence.
http://arxiv.org/abs/2402.13103v1
Compressor summary: MUDRA is a new method that improves FLDA by handling multivariate and incomplete data using an efficient algorithm and shows better results in predicting articulatory word recognition.
http://arxiv.org/abs/2402.13101v1
Compressor summary: The paper proposes a hybrid surrogate model that combines graph neural networks and microscopic constitutive models to simulate the mechanical response of advanced materials with concurrent multiscale models, improving accuracy and computational efficiency.
http://arxiv.org/abs/2402.13098v1
Compressor summary: The paper proposes ELAD, a framework that uses active learning and explanation-guided sample selection to improve Large Language Models' knowledge distillation efficiency and performance.
http://arxiv.org/abs/2402.13094v1
Compressor summary: The study compared different ways of measuring how well texts are understood by people with and without intellectual disabilities after being simplified either automatically or manually.
http://arxiv.org/abs/2402.13093v1
Compressor summary: Event-level knowledge editing updates large language models by adding new events, improving efficiency and completeness over factual triplet-level editing.
http://arxiv.org/abs/2402.13089v1
Compressor summary: This paper examines how different design choices in Mixture of Experts models affect validation performance and finds that sequence-level routing leads to topic-specific expert specialization, while token-level routing results in syntactic specialization.
http://arxiv.org/abs/2402.13088v1
Compressor summary: Slot-VLM is a framework that generates semantically decomposed video tokens to facilitate language model inference for video question-answering.
http://arxiv.org/abs/2402.13087v1
Compressor summary: The paper investigates the privacy implications of hyper-parameter tuning and proposes an improved solution with a tighter privacy bound.
http://arxiv.org/abs/2402.13081v1
Compressor summary: The paper proposes using statistical learning methods to detect attacks in IT infrastructure based on continuous measurements, and compares HMM and LSTM for prediction accuracy and resource requirements.
http://arxiv.org/abs/2402.13077v1
Compressor summary: Mechanistic Neural Networks use a new block to learn governing equations and dynamics from data, improving interpretability and efficiency for scientific machine learning tasks.
http://arxiv.org/abs/2402.13064v1
Compressor summary: GLAN is a method that uses a pre-curated human knowledge taxonomy to generate diverse and task-agnostic instructions for Large Language Models.
http://arxiv.org/abs/2402.13061v1
Compressor summary: Logits-MMD is a new framework for machine learning that improves fairness by using Maximum Mean Discrepancy on output logits, outperforming previous methods on facial and animal recognition datasets.
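MMD on output logits can act as a differentiable fairness regularizer; a generic RBF-kernel sketch (my illustration with an assumed bandwidth and penalty weight, not the paper's exact formulation):

```python
import torch

def rbf_kernel(x, y, sigma=1.0):
    return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma**2))

def mmd2(logits_a, logits_b, sigma=1.0):
    """Squared MMD between the logits of two groups, shapes (n, C) and (m, C)."""
    return (rbf_kernel(logits_a, logits_a, sigma).mean()
            + rbf_kernel(logits_b, logits_b, sigma).mean()
            - 2 * rbf_kernel(logits_a, logits_b, sigma).mean())

# Used as a regularizer: total_loss = task_loss + weight * mmd2(logits_a, logits_b)
```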
http://arxiv.org/abs/2402.13058v1
Compressor summary: Evidence Pattern Reasoning Model (EPRM) is a new evidential decision making model that can better fit different tasks by setting preferences and using Random Graph Set to model complex relationships, improving aircraft velocity ranking in an experiment.
http://arxiv.org/abs/2402.13055v1
Compressor summary: The authors analyze how attention heads in large language models encode syntactic and knowledge graph relations, and find a link between these semantic induction heads and the in-context learning ability of LLMs.
http://arxiv.org/abs/2402.13048v1
Compressor summary: StableKE is a novel method that enhances language models with diverse and contextual knowledge descriptions to improve their editing performance and stability without oversimplifying the model's interconnected knowledge structure.
http://arxiv.org/abs/2402.13043v1
Compressor summary: The text proposes a method to improve few-shot dialogue state tracking using conversation summaries and a lightweight encoder for query embeddings, which is more scalable than previous approaches.
http://arxiv.org/abs/2402.13040v1
Compressor summary: TGM-DLM is a novel method that uses diffusion models to generate molecules from text descriptions more effectively than autoregressive methods, without requiring extra data resources.
http://arxiv.org/abs/2402.13037v1
Compressor summary: AILOT is an offline reinforcement learning method that uses optimal transport to learn from expert trajectories without explicit rewards or action labels.
http://arxiv.org/abs/2402.13036v1
Compressor summary: SiLLM is a new method for simultaneous machine translation that uses separate agents to handle policy-decision and translation, leveraging the capabilities of large language models.
http://arxiv.org/abs/2402.13035v1
Compressor summary: The paper proposes a new prompt for training language models to improve their self-correction abilities in mathematical reasoning without relying on external feedback.
http://arxiv.org/abs/2402.13033v1
Compressor summary: Hyperedge Augmentation is a new method to improve Graph Neural Networks by creating virtual hyperedges from raw data and extracting features from them, addressing the limitations of existing graph augmentation techniques.
http://arxiv.org/abs/2402.13028v1
Compressor summary: The paper proposes a novel word-level heterogeneous graph model for fact checking that leverages both unstructured and structured evidence data to reason about claim veracity.
http://arxiv.org/abs/2402.13025v1
Compressor summary: CFEVER is a Chinese dataset for Fact Extraction and VERification with 30,012 labeled claims from Chinese Wikipedia.
http://arxiv.org/abs/2402.13022v1
Compressor summary: The paper introduces a new AI model, SoMeLVLM, which can handle various social media tasks by understanding and generating realistic behavior using five key capabilities.
http://arxiv.org/abs/2402.13019v1
Compressor summary: The paper proposes a new neurosymbolic technique for supervised multi-label classification, called semantic conditioning at inference, which improves accuracy and resource efficiency while preserving semantic consistency.
http://arxiv.org/abs/2402.13016v1
Compressor summary: Imbalanced label distribution across languages in multilingual classification datasets negatively affects transformer-based LLMs, but language-specific class weighing can help mitigate these issues.
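One plausible form of language-specific class weighting (a sketch under my own assumptions, not the paper's recipe) computes inverse-frequency weights separately per language for use in a weighted cross-entropy:

```python
import numpy as np
from collections import Counter

def per_language_class_weights(labels, langs, num_classes):
    """Inverse-frequency class weights computed separately per language.
    Returns {lang: array of shape (num_classes,)} for a weighted cross-entropy."""
    weights = {}
    for lang in set(langs):
        counts = Counter(y for y, l in zip(labels, langs) if l == lang)
        freq = np.array([counts.get(c, 0) + 1 for c in range(num_classes)],
                        dtype=float)  # +1 smoothing for classes unseen in a language
        weights[lang] = freq.sum() / (num_classes * freq)
    return weights
```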
http://arxiv.org/abs/2402.13013v1
Compressor summary: The study improves code-focused LLMs' performance by introducing a new method to generate comments for existing code and filtering poorly correlated data, resulting in better performance on programming skill benchmarks.
http://arxiv.org/abs/2402.13007v1
Compressor summary: The proposed "model pool" method improves data distillation by selecting diverse models based on probabilities and applying knowledge distillation to test results, enhancing generalizability and performance.
http://arxiv.org/abs/2402.13006v1
Compressor summary: The study investigates how noise affects explanations from AI models and finds that high uncertainty doesn't always mean low plausibility, and some methods are more robust to perturbation than others.
http://arxiv.org/abs/2402.13004v1
Compressor summary: The study compares conventional DNN-HMM and state-of-the-art CTC/Attention decoders for Visual Speech Recognition, showing that the former outperforms the latter in data-scarce scenarios with less training time and fewer parameters.
http://arxiv.org/abs/2402.12998v1
Compressor summary: The study finds a tradeoff between word length and phonotactic complexity in Dutch and Min dialects, suggesting that languages complex in one dimension compensate by simplifying in another.
http://arxiv.org/abs/2402.12991v1
Compressor summary: The paper proposes a method called TRAP to identify whether a third-party app uses a specific large language model through its chat function by using adversarial suffixes.
http://arxiv.org/abs/2402.12987v1
Compressor summary: The paper proposes a regularization technique for mitigating catastrophic forgetting in incremental graph-related tasks with structural shifts.
http://arxiv.org/abs/2402.12984v1
Compressor summary: The paper proposes GraphAdapter, a framework that uses a graph neural network and large language models to efficiently model text-attributed graphs for various applications.
http://arxiv.org/abs/2402.12976v1
Compressor summary: The authors study multilingual in-context learning, finding that the effectiveness of demonstrations varies greatly depending on models, tasks, and languages, and suggesting that their importance might be overestimated.
http://arxiv.org/abs/2402.12974v1
Compressor summary: The authors propose a novel method for text-to-image generation that maintains specific style elements without fine-tuning, achieving better results than existing approaches.
http://arxiv.org/abs/2402.12969v1
Compressor summary: GlórIA is a large European Portuguese language model pre-trained on a diverse corpus that excels in language modeling and generates high-quality text.
http://arxiv.org/abs/2402.12968v1
Compressor summary: MapTrack is a robust multi-object tracker that uses probability maps, prediction maps, and covariance adaptive Kalman filters to handle occlusions and crowds in real time.
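For reference, the vanilla Kalman predict/update cycle that trackers like this build on (baseline filter only; the paper's covariance-adaptive variant is not reproduced here):

```python
import numpy as np

def kalman_predict(x, P, F, Q):
    """Predict step: propagate state x and covariance P through motion model F."""
    return F @ x, F @ P @ F.T + Q

def kalman_update(x, P, z, H, R):
    """Update step: correct the prediction with a detection z."""
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```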
http://arxiv.org/abs/2402.12954v1
Compressor summary: CLMPT is a novel neural model for complex query answering over knowledge graphs, which considers node types in message passing and uses self-attention to model logical dependencies.
http://arxiv.org/abs/2402.12948v1
Compressor summary: The Logits-Addition watermark, a new type of GumbelMax-trick-based watermark, enhances generation diversity in large language models and outperforms other decoding-based watermarking methods.
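For background, a generic GumbelMax-trick watermarking step (my sketch of the classic scheme, not the paper's Logits-Addition variant): Gumbel noise derived from a secret key and the recent context makes each token choice reproducible by a detector holding the key, while argmax(logits + Gumbel) still samples from the model's distribution.

```python
import hashlib
import numpy as np

def watermarked_sample(logits, context_ids, key=b"secret"):
    """One decoding step of a GumbelMax-trick watermark: the Gumbel noise is
    derived pseudorandomly from the key and recent context, so a detector
    holding the key can recompute it and test for the watermark."""
    seed = hashlib.sha256(key + str(context_ids[-4:]).encode()).digest()
    rng = np.random.default_rng(int.from_bytes(seed[:8], "little"))
    gumbel = -np.log(-np.log(rng.random(len(logits))))
    return int(np.argmax(logits + gumbel))
```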
http://arxiv.org/abs/2402.12946v1
Compressor summary: The paper presents a new method for classifying nuclei in histopathology images using a cell graph transformer that learns node and edge features and improves classification performance.
http://arxiv.org/abs/2402.12940v1
Compressor summary: The text introduces NOTA, a modified version of CODA* guidelines for writing Tunisian Arabic using the Arabic script, aiming to accurately represent its unique features and improve language resource development.
http://arxiv.org/abs/2402.12939v1
Compressor summary: The paper proposes a method to analyze and improve DRL policies by using dimensionality reduction and trajectory clustering on neural network latent spaces.
http://arxiv.org/abs/2402.12938v1
Compressor summary: The paper presents UniCell, a framework for multi-class cell nucleus recognition that uses prompt learning to leverage shared knowledge across datasets, improving histopathological diagnosis.
http://arxiv.org/abs/2402.12937v1
Compressor summary:
Key points:
- The paper introduces GRAPHGINI, a method for incorporating the Gini coefficient as a measure of fairness in graph neural networks (GNNs).
- GRAPHGINI works with both individual and group fairness goals in a single system, while maintaining high prediction accuracy.
- GRAPHGINI uses learnable attention scores and a maximum Nash social welfare constraint to enforce fairness constraints.
- GRAPHGINI outperforms state-of-the-art methods on real-world datasets in terms of individual fairness, utility, and group equality.
Summary: GRAPHGINI is a novel method for ensuring fairness in GNNs using the Gini coefficient as a differentiable approximation of fairness constraints. It balances individual and group fairness goals with high prediction accuracy and surpasses existing methods on real datasets.
http://arxiv.org/abs/2402.12930v1
Compressor summary: Syflow is an approach to find and describe exceptional sub-populations using normalizing flows and a novel neural layer for interpretable results.
http://arxiv.org/abs/2402.12927v1
Compressor summary: The paper explores using pre-trained vision-language models with adaptation methods for detecting deepfakes, finding that retaining the textual component of CLIP improves performance and reduces data requirements.
http://arxiv.org/abs/2402.12923v1
Compressor summary: The paper reviews recent deep learning methods for condition monitoring using 3D point clouds, focusing on defect shape classification and segmentation in industrial applications.
http://arxiv.org/abs/2402.12921v1
Compressor summary: RioT is a method to help deep time series models avoid misleading results by interacting with model explanations in both time and frequency domains, guiding them away from confounding factors.
http://arxiv.org/abs/2402.12916v1
Compressor summary: The paper explores how to optimize data flow through automated machine learning methods by integrating AutoML with Data Pipeline, aiming for better results in machine learning tasks and adapting to the ever-changing data landscape.
http://arxiv.org/abs/2402.12914v1
Compressor summary: The paper presents ReHAC, a method that uses reinforcement learning to enable effective collaboration between humans and large language models for complex task-solving, with limited human intervention.
http://arxiv.org/abs/2402.12913v1
Compressor summary: The paper presents a system for detecting hallucination in LLMs for text-generation tasks without labeled data, using prompt engineering and few-shot learning, and achieving competitive results with smaller LLMs.
http://arxiv.org/abs/2402.12908v1
Compressor summary: The authors introduce RealCompo, a text-to-image generation framework that combines text-to-image and layout-to-image models to create more realistic and compositional images, without requiring extra training.
http://arxiv.org/abs/2402.12907v1
Compressor summary: The text discusses the importance of considering societal aspects in AI alignment and proposes a new problem (ICSAP) to explore how game theory can help bridge the gap between technical and social components.
http://arxiv.org/abs/2402.12891v1
Compressor summary: The paper discusses how the main lens exit pupil affects depth reconstruction and post-shot refocusing in plenoptic cameras, and provides analysis and validation of this effect using simulations and experiments.
http://arxiv.org/abs/2402.12890v1
Compressor summary: The paper proposes a method to improve unsupervised sentence representations using semantic graph smoothing, leading to better text clustering and classification results on eight benchmarks.
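Graph smoothing of embeddings is typically a short propagation step once a kNN similarity graph is built; a generic sketch (an assumed form of the operator, not necessarily the paper's):

```python
import numpy as np

def smooth_embeddings(X, k=10, alpha=0.5, steps=2):
    """X: (n, d) sentence embeddings. Build a kNN cosine-similarity graph,
    row-normalize it, and propagate: X <- (1 - alpha) * X + alpha * A @ X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)                   # exclude self-loops
    idx = np.argpartition(-S, k, axis=1)[:, :k]    # k nearest neighbors per row
    A = np.zeros_like(S)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, idx.ravel()] = np.maximum(S[rows, idx.ravel()], 0)
    A /= A.sum(axis=1, keepdims=True) + 1e-12
    for _ in range(steps):
        X = (1 - alpha) * X + alpha * (A @ X)
    return X
```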
http://arxiv.org/abs/2402.12887v1
Compressor summary: The text describes a common step, qualitative parameterisation, performed after developing the structure of a Bayesian network to illustrate its intended behaviour before a more rigorous parameterisation is carried out.
http://arxiv.org/abs/2402.12881v1
Compressor summary: The study examines how well pre-trained language and vision models understand object affordances and creates a new dataset for this purpose.
http://arxiv.org/abs/2402.12880v1
Compressor summary: The text discusses various linguistic, prosodic, and acoustic cues for detecting autism in voice, speech, and language across biomedical, psychological, and NLP domains, highlighting gaps in research on female patients and transformer-based methods.
http://arxiv.org/abs/2402.12875v1
Compressor summary: CoT helps large language models perform serial computations, increasing their ability to solve complex tasks like arithmetic and symbolic reasoning.
http://arxiv.org/abs/2402.12874v1
Compressor summary: The paper proposes Off-policy DAE, a method that learns from off-policy data by decomposing return into skill and luck components, without using importance sampling or truncation, and shows its advantages over previous methods in sample-efficient reinforcement learning.
http://arxiv.org/abs/2402.12869v1
Compressor summary: The paper investigates how different table-to-text methods affect the performance of question answering systems when using hybrid domain data, and presents empirical findings and reasons behind the success of some methods.
http://arxiv.org/abs/2402.12868v1
Compressor summary: The paper introduces a new online optimization algorithm that exploits the curvature of both feasible sets and loss functions to achieve fast regret bounds in various environments.
http://arxiv.org/abs/2402.12865v1
Compressor summary: The authors propose a method to visualize how Transformer-based Language Models learn and recall information by projecting their gradients into words.
http://arxiv.org/abs/2402.12862v1
Compressor summary: The paper proposes using evidential deep learning to handle ambiguous emotions in emotion classification by representing emotions as distributions and quantifying uncertainty.
http://arxiv.org/abs/2402.12861v1
Compressor summary: The paper provides formal bounds on reconstructing sensitive data from machine learning models trained with differential privacy under realistic settings and supports them with empirical results.
http://arxiv.org/abs/2402.12854v1
Compressor summary: The paper presents a new method to optimize a key parameter in topological data analysis, called filter, for better visualizing and analyzing data structures.
http://arxiv.org/abs/2402.12851v1
Compressor summary: MoELoRA is a novel PEFT method that uses contrastive learning to improve the adaptability of LLMs in various reasoning tasks with fewer parameters and less training time.
http://arxiv.org/abs/2402.12847v1
Compressor summary: The paper proposes pre-instruction-tuning (PIT), a method that improves large language model-based assistants' ability to learn new facts by tuning them on questions before training on documents, instead of after.
http://arxiv.org/abs/2402.12846v1
Compressor summary: Contrastive Visual Question Generation (ConVQG) is a method that uses both image and text information to generate focused and relevant questions, outperforming state-of-the-art methods in VQG.
http://arxiv.org/abs/2402.12845v1
Compressor summary: The paper proposes a new method for offline reinforcement learning that uses multimodal language models to integrate image-based states and text-based actions, leading to better performance and long-term strategy.
http://arxiv.org/abs/2402.12844v1
Compressor summary: The paper proposes ICON, a method to improve inter-report consistency in radiology report generation by aligning lesion attributes using lesion-aware mix-up augmentation.
http://arxiv.org/abs/2402.12843v1
Compressor summary: The paper explores self-supervised learning to improve solar panel segmentation from images, reducing the need for manual annotations and enhancing model generalization.
http://arxiv.org/abs/2402.12842v1
Compressor summary: PromptKD is a novel method for compressing generative language models using prompt tuning and student guidance, achieving state-of-the-art results with minimal parameters.
http://arxiv.org/abs/2402.12840v1
Compressor summary: ArabicMMLU is a new benchmark for evaluating Arabic language understanding in multiple tasks, showing that current models still have significant room for improvement.
http://arxiv.org/abs/2402.12835v1
Compressor summary: PANDA is a method to improve the domain-specific performance of large language models without fine-tuning, using insights from expert models' response preferences.
http://arxiv.org/abs/2402.12821v1
Compressor summary: The paper explores how to use large language models to detect factual inconsistencies in summaries without training and how to distill smaller, high-performance models for this task.
http://arxiv.org/abs/2402.12819v1
Compressor summary: The paper studies how many labelled examples specialised models need to outperform general language models on NLP tasks, finding that they often require only a few samples (100-1000) depending on task complexity and variance.
http://arxiv.org/abs/2402.12817v1
Compressor summary: The paper proposes a method for investigating the effects of randomness factors on learning with limited labelled data by controlling their interactions and measuring each factor's individual impact.
http://arxiv.org/abs/2402.12812v1
Compressor summary: The study proposes two algorithms for collaborative mean estimation that allow agents to communicate with a limited number of peers, addressing scalability issues and offering optimal performance under certain conditions.
http://arxiv.org/abs/2402.12810v1
Compressor summary: The paper introduces PIP-Net, a framework that predicts pedestrian crossing intentions for autonomous vehicles using kinematic data and spatial features from multiple cameras, achieving state-of-the-art performance and presenting the Urban-PIP dataset.
http://arxiv.org/abs/2402.12808v1
Compressor summary: The paper proposes a regularized learning framework for estimating nonhomogeneous Poisson processes from limited data using adaptive, data-driven binning methods.
http://arxiv.org/abs/2402.12806v1
Compressor summary: SymBa is a novel solver-LLM integration that improves backward chaining performance, proof faithfulness, and efficiency in diverse multi-step reasoning benchmarks.
http://arxiv.org/abs/2402.12801v1
Compressor summary:
Key points:
- Large Language Models are widely used for natural language processing tasks, especially in low-resource settings
- The paper evaluates their performance for few-shot clinical entity recognition in English, French, and Spanish
- Prompt-based models perform well outside the clinical domain, but lighter supervised taggers built on masked language models perform better in the clinical domain
- Large Language Models are not ready for production use, but could help create annotated data
Summary: The paper assesses how Large Language Models perform few-shot clinical entity recognition in different languages and finds that they work better for creating annotated data than for recognizing entities.
http://arxiv.org/abs/2402.12800v1
Compressor summary: The text discusses a new approach to recognizing hand gestures using radar sensors and synthetic data, which could improve virtual reality and human-computer interaction applications.
http://arxiv.org/abs/2402.12792v1
Compressor summary: The paper proposes a novel 3D scene representation method using differentiable volumetric rendering, 2D labels, temporal rendering, and occupancy flow to achieve state-of-the-art performance in semantic occupancy estimation.
http://arxiv.org/abs/2402.12790v1
Compressor summary: The paper evaluates XAI metrics on CAM and Grad-CAM for skeleton-based HAR and finds stability is more reliable than faithfulness, while CAM and Grad-CAM provide similar explanations.
http://arxiv.org/abs/2402.12789v1
Compressor summary: The paper explores how fair classification can be achieved without using sensitive attributes in the training data by shifting the data distribution and sampling influential data.
http://arxiv.org/abs/2402.12788v1
Compressor summary: RhythmFormer uses a hierarchical temporal periodic transformer to leverage the quasi-periodic nature of remote photoplethysmography for non-contact physiological signal detection, achieving state-of-the-art performance with fewer parameters and reduced computational complexity.
http://arxiv.org/abs/2402.12786v1
Compressor summary: The paper introduces StyleTalk, a dataset for teaching LLMs to understand and respond to different speaking styles in spoken dialogue, and proposes the Spoken-LLM framework that uses it to improve performance over text-only baselines and prior speech LLM methods.
http://arxiv.org/abs/2402.12779v1
Compressor summary:
Key points:
- The paper proposes a Two-stage Rainfall-Forecasting Diffusion Model (TRDM) to improve long-term rainfall forecasts and address spatial-temporal imbalance.
- TRDM consists of two stages: capturing robust temporal information under low-resolution conditions in the first stage, and reconstructing low-resolution images into high-resolution images in the second stage.
- The paper achieves state-of-the-art results on two datasets and releases the code on GitHub.
Summary: The paper introduces a novel two-stage model for long-term rainfall prediction that combines temporal and spatial information in low-resolution and high-resolution stages, and outperforms existing methods on two datasets.
http://arxiv.org/abs/2402.12770v1
Compressor summary: The study introduces a framework for empathetic AI dialogue that uses validation techniques, detects emotional states, and generates validating responses, outperforming previous models on textual and speech-based datasets.
http://arxiv.org/abs/2402.12767v1
Compressor summary: The paper proposes the IDEA model that learns identifiable latent states to detect and disentangle distribution shifts in time series data and performs better than existing methods.
http://arxiv.org/abs/2402.12765v1
Compressor summary:
Key points:
- The paper introduces domain generalized oriented object detection, a task that requires generalization across different domains.
- The paper proposes GOOD, a detector that uses style hallucination by CLIP and two components (RAC and SEC) to learn stable content and orientation representations.
- The paper shows that GOOD achieves state-of-the-art performance on multiple cross-domain settings.
Summary: The paper presents GOOD, a detector that learns to generalize across different domains by using CLIP's style hallucination and two components (RAC and SEC) that stabilize content and orientation representations, achieving state-of-the-art results on multiple settings.
http://arxiv.org/abs/2402.12763v1
Compressor summary: BronchoTrack is a fast and accurate framework for real-time bronchoscope localization that works across different patients and airway generations.
http://arxiv.org/abs/2402.12756v1
Compressor summary:
Key points:
- Wi-Fi fingerprinting is a popular approach to indoor localization, but its performance depends on accurate and up-to-date fingerprint databases
- Time-varying electromagnetic interference in indoor environments can cause significant changes in the characteristics of fingerprint databases over time
- The authors construct a dynamic database based on RSSI measurements and compare it with a static database for indoor localization using Gaussian process regression
- The results show that the localization error increases over time with a static database, while a dynamic database improves localization performance
Summary: The paper shows how time-varying Wi-Fi fingerprints affect indoor localization and proposes dynamic databases that update RSSI measurements over time, reducing the localization error compared to static databases.
http://arxiv.org/abs/2402.12754v1
Compressor summary: The paper proposes a new PAD method that uses global and local features to detect fingerprint spoofing attacks, improving detection performance over existing methods.
http://arxiv.org/abs/2402.12750v1
Compressor summary: The paper proposes a new approach for creating versatile multimodal large language models by composing existing ones and introduces a benchmark to test their performance on various tasks.
http://arxiv.org/abs/2402.12749v1
Compressor summary:
Key points:
- Me LLaMA is a medical LLM family based on LLaMA2 with continual pre-training and instruction tuning on large medical data
- It outperforms existing open-source medical LLMs and commercial giants like ChatGPT and GPT-4 on various tasks and datasets
- It is an open-source foundational LLM for the medical domain using biomedical and clinical data
Summary: Me LLaMA is a new medical large language model that leverages continual pre-training and instruction tuning of LLaMA2 on large medical data, achieving superior performance on various medical tasks compared to other medical LLMs and commercial AI models.
http://arxiv.org/abs/2402.12741v1
Compressor summary: MuLan is a training-free text-to-image model that uses a large language model and a vision-language model to generate multi-object images with spatial relationships and attribute bindings by progressive planning and feedback control.
http://arxiv.org/abs/2402.12738v1
Compressor summary: This study evaluates GPT-4's performance in mental health counseling dialogues and finds it comparable to human counselors.
http://arxiv.org/abs/2402.12737v1
Compressor summary: The paper proposes an algorithm to find reliable and accurate local explanations for predictive models by identifying intervals where input features are trustworthy.
http://arxiv.org/abs/2402.12736v1
Compressor summary: The paper introduces Calibration side tuning, a lightweight fine-tuning strategy for object detection networks that adapts transformer techniques to ResNet, improving performance while maintaining efficient resource use.
http://arxiv.org/abs/2402.12730v1
Compressor summary:
Key points:
- System developed for SemEval-2024 Task 1 on semantic textual relatedness (STR) between African and Asian languages
- Explored supervised and cross-lingual training using large language models (LLMs)
- Developed TranSem for subtask A and FineSem for subtask C, achieving mixed results
Summary: The paper presents a system that uses LLMs to train models for STR between African and Asian languages, with varied performance on two subtasks.
http://arxiv.org/abs/2402.12729v1
Compressor summary: The paper introduces a new deep transfer learning method for intelligent fault detection that uses neural processes, graph convolution networks, and multi-scale uncertainty analysis to improve performance on scarce fault data.
http://arxiv.org/abs/2402.12728v1
Compressor summary: MAIL is a novel method for knowledge-based visual question answering that leverages multimodal knowledge and LLMs to enhance image understanding and knowledge reasoning in complex scenarios.
http://arxiv.org/abs/2402.12727v1
Compressor summary: The paper shows that posterior sampling from measurements can be computationally intractable even when unconditional sampling is fast and efficient.
http://arxiv.org/abs/2402.12722v1
Compressor summary:
Key points:
- The paper proposes a novel framework (SKI-CL) for multivariate time series forecasting under different regimes
- SKI-CL leverages structural knowledge to guide the model and uses memory replay to preserve the data from each regime
- SKI-CL outperforms existing methods on synthetic and real datasets
Summary: SKI-CL is a new framework for continual multivariate time series forecasting that uses structural knowledge and memory replay to overcome catastrophic forgetting and achieve better performance.
http://arxiv.org/abs/2402.12721v1
Compressor summary: The paper introduces a novel neural network model, PAC-FNO, that can handle image recognition tasks across different resolutions and natural variations using the frequency domain approach, improving performance significantly on seven benchmarks.
http://arxiv.org/abs/2402.12715v1
Compressor summary: This survey reviews spurious correlations in machine learning models, their impact on generalization and robustness, and existing methods, datasets, benchmarks, and metrics to address them.
http://arxiv.org/abs/2402.12714v1
Compressor summary: The Equivariant Pretrained Transformer (EPT) is a novel framework that harmonizes geometric learning of small molecules and proteins by using a block-enhanced representation, E(3) equivariance, and joint pretraining on both domains.
http://arxiv.org/abs/2402.12713v1
Compressor summary: The study introduces Financial Bias Indicators to evaluate the financial rationality of Large Language Models in financial analysis, finding varying degrees of irrationality influenced by design and training factors.
http://arxiv.org/abs/2402.12712v1
Compressor summary: The paper proposes MVDiffusion++, a neural network that reconstructs 3D objects from few or no images, using self-attention and view dropout to achieve better performance and scalability.
http://arxiv.org/abs/2402.12711v1
Compressor summary: The paper introduces uniform last-iterate (ULI) as a stronger performance measure for bandit algorithms that considers both cumulative and instantaneous performance, and shows its near-optimality and achievability in some settings.
http://arxiv.org/abs/2402.12706v1
Compressor summary: DITeD is a method for few-shot action recognition that leverages temporal invariance in dynamic systems to transfer knowledge between domains using generative pre-training and adaptation stages.
http://arxiv.org/abs/2402.12702v1
Compressor summary: The paper discusses the potential and challenges of using generative AI for design in resource-constrained settings and suggests innovative approaches to make it efficient and accessible.
http://arxiv.org/abs/2402.12694v1
Compressor summary:
Key points:
- The paper introduces Leddam, a method for multivariate time series forecasting that uses a learnable decomposition and a dual attention module.
- Leddam captures dynamic trend information, inter-series dependencies, and intra-series variations simultaneously.
- Leddam outperforms state-of-the-art methods on eight open-source datasets and can be plugged into other methods for a performance boost.
Summary: Leddam is a novel method for multivariate time series forecasting that leverages a learnable decomposition and dual attention module to capture trend and dependency information, achieving significant improvements over existing methods while remaining compatible with them.
http://arxiv.org/abs/2402.12692v1
Compressor summary: FormulaQA is a new dataset for testing AI's ability to apply formulas in numerical reasoning problems using junior high school physics questions and various LLM evaluation methods.
http://arxiv.org/abs/2402.12691v1
Compressor summary: The paper proposes tree-planting, a method to integrate syntactic supervision into Transformer LMs without explicit syntax, improving performance on SyntaxGym benchmark.
http://arxiv.org/abs/2402.12690v1
Compressor summary: The paper investigates a theoretical puzzle about the relationship between accuracy and fluency in translation, showing that they are correlated at the corpus level but trade off at the segment level.
http://arxiv.org/abs/2402.12687v1
Compressor summary:
Key points:
- The paper proposes a one-shot method for function approximation from randomly sampled data without knowing the manifold structure
- The method uses spherical polynomials on an ambient hypersphere
- The method achieves optimal rates of approximation for rough functions
Summary: The paper presents a novel function approximation method using spherical polynomials on a hypersphere that works directly from random data without manifold information and has optimal performance for rough functions.
http://arxiv.org/abs/2402.12685v1
Compressor summary: This paper presents XRL-Bench, a unified benchmark for evaluating explainable reinforcement learning methods that use state importance to explain agent actions, and introduces TabularSHAP, a novel XRL method for tabular and image data.
http://arxiv.org/abs/2402.12683v1
Compressor summary: TorchCP is an open-source Python toolbox for efficient conformal prediction on PyTorch-based deep learning models.
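Not TorchCP's actual API (which I won't guess at), but the split conformal procedure such toolboxes implement can be sketched generically:

```python
import numpy as np

def split_conformal(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for classification.
    cal_probs: (n, C) softmax outputs on a held-out calibration set.
    Returns a boolean (m, C) matrix of prediction sets with roughly
    (1 - alpha) marginal coverage."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]   # nonconformity scores
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return 1.0 - test_probs <= q
```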
http://arxiv.org/abs/2402.12677v1
Compressor summary: The paper proposes a novel image stitching method based on global similarity prior and triangular meshes that preserves object structures and outperforms existing methods.
http://arxiv.org/abs/2402.12676v1
Compressor summary: The authors propose a method that uses reinforcement learning to control a physics simulation of human movement, enabling accurate gait analysis from smartphone videos without producing physically implausible results.
http://arxiv.org/abs/2402.12675v1
Compressor summary: This paper compares object-centric deep learning models to a ResNet-50 baseline in learning visual relations from images using tasks derived from comparative cognition literature and finds that while object-centric models outperform the baseline in simpler tasks, they still struggle in more difficult ones.
http://arxiv.org/abs/2402.12664v1
Compressor summary: The text introduces DDAR, a new method for estimating deterministic uncertainty in deep learning models by using prototypes in latent representations to analyze input features and overcome feature collapse.
http://arxiv.org/abs/2402.12663v1
Compressor summary: SoftQE uses LLMs to enhance query encoders for dense retrieval without additional latency or cost.
http://arxiv.org/abs/2402.12659v1
Compressor summary: FinBen is a benchmark for evaluating large language models' financial skills, revealing their strengths and limitations in various tasks.
http://arxiv.org/abs/2402.12656v1
Compressor summary: HyperMoE is a novel framework for language models that uses hypernetworks to balance sparsity and expert knowledge by transferring unselected experts' information to specific modules.
http://arxiv.org/abs/2402.12654v1
Compressor summary: The paper proposes a fast, robust, and multilingual encoder-only speech model based on CTC that outperforms encoder-decoder models for speech processing tasks such as ASR, ST, and LID.
http://arxiv.org/abs/2402.12649v1
Compressor summary: The study finds no correspondence between decontextualized "trick tests" and realistic evaluations of gender-occupation bias in LLMs, suggesting that current benchmarks may not adequately assess real-world harm.
http://arxiv.org/abs/2402.12647v1
Compressor summary: The paper proposes a probabilistic model using diffusion to estimate object shapes and correspondences from synthetic data, improving pose estimation performance and generalization.
http://arxiv.org/abs/2402.12646v1
Compressor summary:
Key points:
- The text introduces Coordinate Search, a gradient-free optimization algorithm for training neural networks that can handle non-differentiable activation functions and multi-loss problems.
- The algorithm bundles weights instead of optimizing each variable individually, which speeds up convergence and reduces dimensionality.
- The proposed method sometimes outperforms gradient-based methods, especially with limited labeled data.
Summary: The text proposes a gradient-free optimization algorithm for neural networks that can deal with non-differentiable functions and multiple losses, and shows that it can improve performance over gradient-based methods in some scenarios, such as low data availability.
http://arxiv.org/abs/2402.12644v1
Compressor summary: The paper proposes a method to binarize blurry images of bimodal objects using event-based and image-based inference, resulting in sharp binary videos with high frame rate.
http://arxiv.org/abs/2402.12641v1
Compressor summary: The article presents a new antenna interference source detection model called YOLO-Ant, which combines a lightweight CNN and transformer structure to effectively detect small objects with complex backgrounds.
http://arxiv.org/abs/2402.12636v1
Compressor summary: StyleDubber is a new method for movie dubbing that uses phonemes instead of frames to generate speech that matches both time and emotion with the video, based on a reference audio track.
http://arxiv.org/abs/2402.12627v1
Compressor summary: The text discusses the data change problem in AI models due to dynamic data, grouping domain shift and concept drift into one issue and reviewing state-of-the-art methods to tackle it.
http://arxiv.org/abs/2402.12626v1
Compressor summary: The paper explores data poisoning attacks on self-supervised learning models and proposes two types of attacks with different stages.
http://arxiv.org/abs/2402.12625v1
Compressor summary: The paper proposes a compact binary optimization algorithm for feature selection in machine learning, which improves classification accuracy and reduces memory requirements by using probability vectors instead of two populations.
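Compact algorithms replace a population with a single probability vector; a generic compact-GA-style sketch for feature selection (my illustration; `evaluate` is an assumed scoring function returning, e.g., validation accuracy):

```python
import numpy as np

def compact_feature_selection(evaluate, n_features, iters=500, pop_size=50):
    """Compact binary optimization: a probability vector stands in for a
    population. Sample two candidate masks, evaluate them, and nudge the
    probabilities toward the winner where the masks disagree.
    evaluate(mask) -> float scores a boolean feature mask (higher is better)."""
    p = np.full(n_features, 0.5)
    step = 1.0 / pop_size
    for _ in range(iters):
        a = np.random.rand(n_features) < p
        b = np.random.rand(n_features) < p
        winner, loser = (a, b) if evaluate(a) >= evaluate(b) else (b, a)
        diff = winner != loser
        p[diff] += step * np.where(winner[diff], 1.0, -1.0)
        p = np.clip(p, 1e-3, 1 - 1e-3)
    return p > 0.5
```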
http://arxiv.org/abs/2402.12624v1
Compressor summary: The text discusses how to improve object detection in self-learning systems by focusing on important layers in the network.
http://arxiv.org/abs/2402.12621v1
Compressor summary: The text proposes Reflect-RL, a system that uses online reinforcement learning to fine-tune language models for multi-round interactive tasks with a reflection model and data generation techniques.
http://arxiv.org/abs/2402.12616v1
Compressor summary: The paper proposes a new multi-objective coordinate search algorithm (MOCS) for large-scale feature selection that generates distinct subsets of features by flipping variables on the Pareto front, outperforming NSGA-II in speed and efficiency.
http://arxiv.org/abs/2402.12613v1
Compressor summary: The paper analyzes how using the sigmoid loss instead of the InfoNCE loss in contrastive learning affects the geometric structure of learned embeddings and proposes a framework to parameterize various embedding structures by a single variable.
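For reference, generic forms of the two objectives being compared (CLIP-style InfoNCE vs. a SigLIP-style pairwise sigmoid loss; the notation is mine, not the paper's):

```python
import torch
import torch.nn.functional as F

def info_nce(za, zb, tau=0.07):
    """Symmetric InfoNCE over paired, L2-normalized embeddings (n, d)."""
    logits = za @ zb.t() / tau
    targets = torch.arange(len(za), device=za.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def sigmoid_loss(za, zb, scale=10.0, bias=-10.0):
    """Pairwise sigmoid loss: each pair is an independent binary problem,
    positive on the diagonal, negative elsewhere."""
    logits = za @ zb.t() * scale + bias
    labels = 2 * torch.eye(len(za), device=za.device) - 1
    return -F.logsigmoid(labels * logits).mean()
```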
http://arxiv.org/abs/2402.12608v1
Compressor summary: Patient-Centric Knowledge Graphs (PCKGs) are a new approach in healthcare that combines different types of patient data to give doctors a better understanding of the patient's health and help them provide more personalized care.