This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-19, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2406.12849v1
Compressor summary: Our new framework uses perspective depth estimation models to generate pseudo labels for 360-degree images, improving depth estimation accuracy in virtual reality and other applications.
http://arxiv.org/abs/2406.12847v1
Compressor summary: The paper introduces ChangeViT, a framework that uses vision transformers to enhance change detection in remote sensing images by capturing both large-scale and fine-grained changes, achieving state-of-the-art performance on various datasets.
http://arxiv.org/abs/2406.12846v1
Compressor summary: DrVideo is a system that converts long videos into text documents to leverage large language models for understanding and answering questions about them, achieving state-of-the-art results on various benchmarks.
http://arxiv.org/abs/2406.12845v1
Compressor summary: The paper proposes a new method for training human-interpretable reward models that use multiple objectives to guide large language models based on human preferences, achieving state-of-the-art results on RewardBench.
http://arxiv.org/abs/2406.12843v1
Compressor summary: The paper investigates whether simple defenses can improve KataGo's performance against adversarial strategies in Go and finds that none of them can withstand adaptive attacks.
http://arxiv.org/abs/2406.12841v1
Compressor summary: The text introduces a taxonomy and blueprint for higher-order graph neural networks (HOGNNs) to analyze, compare, and select the best model for a given scenario.
http://arxiv.org/abs/2406.12839v1
Compressor summary: This article analyzes the generation process of diffusion models, providing insights on how to design training and sampling for effective generation, and comparing with existing methods.
http://arxiv.org/abs/2406.12837v1
Compressor summary: LayerMerge is a new method that reduces the number of layers in convolutional neural networks by pruning activation functions and convolution layers, achieving efficiency and performance gains without increasing kernel size.
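As background for why removing an activation enables layer merging: two convolutions with no nonlinearity between them collapse into a single equivalent convolution. A minimal sketch for the 1x1 case, illustrating the general principle rather than the paper's selection algorithm:

```python
import torch
import torch.nn as nn

# Two consecutive 1x1 convolutions with the activation between them removed.
conv1 = nn.Conv2d(8, 16, kernel_size=1)
conv2 = nn.Conv2d(16, 4, kernel_size=1)

# y = W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), so one conv suffices.
merged = nn.Conv2d(8, 4, kernel_size=1)
with torch.no_grad():
    w1 = conv1.weight.squeeze(-1).squeeze(-1)  # (16, 8)
    w2 = conv2.weight.squeeze(-1).squeeze(-1)  # (4, 16)
    merged.weight.copy_((w2 @ w1)[..., None, None])
    merged.bias.copy_(w2 @ conv1.bias + conv2.bias)

x = torch.randn(1, 8, 32, 32)
assert torch.allclose(conv2(conv1(x)), merged(x), atol=1e-5)
```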
http://arxiv.org/abs/2406.12835v1
Compressor summary: The paper proposes a framework that uses neural bandit algorithms and graph convolutional networks to maximize the number of users influenced in multi-round diffusion campaigns with unknown network topology.
http://arxiv.org/abs/2406.12834v1
Compressor summary: The paper proposes a method called GroPrompt that uses Text-Aware Prompt Contrastive Learning to train video object segmentation models with weak supervision, achieving competitive results on standard benchmarks.
http://arxiv.org/abs/2406.12832v1
Compressor summary: LaMDA is a novel approach for fine-tuning large language models that reduces trainable parameters, GPU memory, and compute cost by using low-dimensional adaptation and two projection matrices.
http://arxiv.org/abs/2406.12831v1
Compressor summary: VIA is a framework for adaptive video editing that ensures global and local consistency in spatiotemporal dimensions for minute-long videos.
http://arxiv.org/abs/2406.12830v1
Compressor summary: The paper evaluates the probabilistic reasoning capabilities of language models using various tasks and contexts, and releases a new dataset for this purpose.
http://arxiv.org/abs/2406.12824v1
Compressor summary: The paper examines how language models use external context to answer questions, finding that they rely heavily on context information and minimally on their own memory.
http://arxiv.org/abs/2406.12822v1
Compressor summary: This study examines how using native or translated data during tuning and evaluation affects multilingual language models' performance, revealing differences in high-performance scenarios and suggesting regularization helps for structured tasks.
http://arxiv.org/abs/2406.12816v1
Compressor summary: The paper proposes neural approximate mirror maps (NAMMs) for imposing constraints on diffusion models, which improve their accuracy and reliability in generating valid synthetic data and solving constrained inverse problems.
http://arxiv.org/abs/2406.12814v1
Compressor summary: The paper demonstrates new safety risks of multimodal agents using adversarial text strings to manipulate image-based VLMs, showing how different VLMs vary in their robustness and discussing potential defenses.
http://arxiv.org/abs/2406.12809v1
Compressor summary: The ConsisEval benchmark evaluates the inconsistency of large language models by comparing their answers to pairs of easy and hard questions, finding that GPT-4 is the most consistent but still has room for improvement.
http://arxiv.org/abs/2406.12808v1
Compressor summary: The text discusses the use of Graph Neural Networks (GNNs) as a promising alternative to Convolutional Neural Networks (CNNs) for analyzing histopathological images, and identifies four emerging trends in this field.
http://arxiv.org/abs/2406.12807v1
Compressor summary: This paper proposes a novel neural network model that uses medical images and data to predict disease progression and treatment effects for multiple sclerosis, allowing for more personalized medicine and better understanding of patient subgroups.
http://arxiv.org/abs/2406.12805v1
Compressor summary: Our method learns adaptive inclusive tokens to shift attribute distribution and reduce biases in text-to-image generation without explicit attribute specification or prior knowledge.
http://arxiv.org/abs/2406.12803v1
Compressor summary: The paper proposes a scalable algorithm that learns nearly optimal rule lists from large datasets using sampling and guarantees on the quality of the approximation.
http://arxiv.org/abs/2406.12795v1
Compressor summary: The paper explores how to maximize state entropy under partial observability by bounding the approximation of true state entropy using observation properties and regularizing observation entropy.
http://arxiv.org/abs/2406.12793v1
Compressor summary: ChatGLM is a family of large language models that rival or outperform GPT-4 in various tasks, with GLM-4 All Tools being able to autonomously use different tools for complex tasks.
http://arxiv.org/abs/2406.12787v1
Compressor summary: The study tests different AI models on generating educational materials with controlled readability levels and finds that LLaMA-2 70B performs best, but there are concerns about quality and accuracy.
http://arxiv.org/abs/2406.12785v1
Compressor summary: The text introduces a new approach to machine learning called in-context learning of energy functions, which can handle more complex settings than traditional in-context learning methods.
http://arxiv.org/abs/2406.12784v1
Compressor summary: UBENCH is a benchmark for evaluating large language models' reliability, covering various abilities and saving computational resources compared to previous methods.
http://arxiv.org/abs/2406.12779v1
Compressor summary: The paper proposes CNLC, CNL, and CFM methods for data augmentation to address the scarcity of annotated resources for nested NER (NNER) tasks and improve performance on ACE2004 and ACE2005 benchmarks.
http://arxiv.org/abs/2406.12775v1
Compressor summary: The study analyzes how large language models perform multi-hop queries and proposes a "back-patching" method to improve their latent reasoning.
http://arxiv.org/abs/2406.12774v1
Compressor summary: The paper explores the challenges of training AI models on analog devices, proposing Tiki-Taka algorithm as a better alternative to SGD for converging exactly and avoiding asymptotic errors.
http://arxiv.org/abs/2406.12770v1
Compressor summary: AI can improve the dairy industry by enhancing production, minimizing manual tasks, and addressing challenges through novel technologies like Machine Learning.
http://arxiv.org/abs/2406.12769v1
Compressor summary: Latent intuitive physics is a framework that can simulate fluids from 3D videos without knowing their physical properties.
http://arxiv.org/abs/2406.12762v1
Compressor summary: The paper proposes an online unsupervised clustering method using wearable IMUs for Human Activity Recognition (HAR) in sports, enabling explainable classification and detection of cheating in Nordic Walking.
http://arxiv.org/abs/2406.12757v1
Compressor summary: The authors introduce the Multi-Attribute Composition (MAC) dataset, which provides more realistic and diverse attribute annotations for compositional zero-shot learning tasks.
http://arxiv.org/abs/2406.12756v1
Compressor summary: Key points:
- Machine Learning for Mineral Prospectivity Mapping is challenging and requires large-scale geospatial data and few labeled observations
- Deep Learning methods may overfit due to scarce labels, but self-supervised learning using unlabeled data can improve robustness and interpretability
- The approach uses a masked image modeling framework to pretrain a neural network for feature extraction and downstream tasks
- The method is evaluated on MVT and CD deposits in North America and Australia and shows improved predictions and explainability
Summary: The paper proposes a self-supervised learning approach for Mineral Prospectivity Mapping using unlabeled geospatial data and a masked image modeling framework. The method improves feature robustness and interpretability, and outperforms existing methods on MVT and CD deposits.
http://arxiv.org/abs/2406.12754v1
Compressor summary: Chumor is a dataset of culturally nuanced Chinese jokes with explanations that challenge and outperform state-of-the-art language models.
http://arxiv.org/abs/2406.12753v1
Compressor summary: OlympicArena is a benchmark for evaluating AI's cognitive reasoning abilities using complex problems from various scientific disciplines and modalities, revealing current limitations in advanced models like GPT-4o.
http://arxiv.org/abs/2406.12747v1
Compressor summary: TSI-Bench is a benchmark suite for evaluating deep learning algorithms for time series imputation tasks, considering different missingness ratios and patterns.
http://arxiv.org/abs/2406.12746v1
Compressor summary: REACT is a K-VQA method that combines multiple question-answering tactics using decision contexts and rationales to generate and select answer candidates, achieving better performance than LLM-based baselines.
http://arxiv.org/abs/2406.12742v1
Compressor summary: MIRB is a new benchmark to evaluate visual language models' ability to reason across multiple images, revealing gaps in current models and highlighting the need for further research.
http://arxiv.org/abs/2406.12739v1
Compressor summary: MT-LLMs combine machine translation encoders with large language models to improve natural language understanding for underrepresented languages.
http://arxiv.org/abs/2406.12738v1
Compressor summary: The paper proposes a universal language model decoder for handling diverse clinical tasks with minimal task-specific adaptation, achieving comparable or better performance than existing methods.
http://arxiv.org/abs/2406.12736v1
Compressor summary: The PrivacyGuard framework uses a structured scene graph and data augmentation to determine privacy classes for objects in images based on scene contexts.
http://arxiv.org/abs/2406.12732v1
Compressor summary: The paper proposes an explainable machine learning solution that uses data from manufacturing processes and workers' performance to evaluate productivity, differentiate between expert and inexpert workers, and generate insights to improve industrial workflows.
http://arxiv.org/abs/2406.12725v1
Compressor summary: The paper proposes a new method to induce sound laws between ancestor and descendant languages using large language models and Python programs trained on examples, which complements existing methods' weaknesses.
http://arxiv.org/abs/2406.12723v1
Compressor summary: The paper introduces the BIOSCAN-5M Insect dataset with multi-modal data for over 5 million insect specimens and proposes three benchmark tasks to evaluate its impact on classification and clustering accuracy.
http://arxiv.org/abs/2406.12719v1
Compressor summary: The study examines how different factors affect large language models' ability to understand and answer questions based on tables, finding that instructions improve performance but there are still challenges with data quality and reliability.
http://arxiv.org/abs/2406.12718v1
Compressor summary: The paper proposes AGLA, a method to reduce object hallucinations in large vision-language models by combining global and local image features for response generation and visual discrimination.
http://arxiv.org/abs/2406.12712v1
Compressor summary: CoBEVGlue is a novel collaborative perception system that aligns agents using co-visible objects and achieves robust performance under localization errors and attacks.
http://arxiv.org/abs/2406.12709v1
Compressor summary: The paper proposes a new framework that combines three types of curriculum learning for training models on spatio-temporal data, improving their performance and addressing complex challenges.
http://arxiv.org/abs/2406.12708v1
Compressor summary: AgentReview is a simulation framework using a large language model that reveals how reviewer biases affect paper decisions in scientific publication.
http://arxiv.org/abs/2406.12707v1
Compressor summary: PerceptiveAgent is a multi-modal dialogue system that uses LLMs to perceive acoustic information and generate empathetic responses, improving contextual understanding in Human-AI communication.
http://arxiv.org/abs/2406.12702v1
Compressor summary: The paper presents two paradoxes about detecting jailbreaks in foundation models, showing their impossibility and inconsistency, with examples and implications.
http://arxiv.org/abs/2406.12700v1
Compressor summary: SUPER is a method that eliminates distortions and adjusts head pose in close-up face crops using 3D GAN inversion and visibility-based blending.
http://arxiv.org/abs/2406.12698v1
Compressor summary: The paper proposes an online-adaptive anomaly detection framework using transfer learning that adapts to different environments and achieves high detection accuracy by computing Mahalanobis distance between a normality model and test image features.
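The Mahalanobis-distance test it relies on is standard: fit a mean and covariance on features of normal data, then score test features by their distance from that distribution. A minimal NumPy sketch (the feature extractor and dimensions are assumptions):

```python
import numpy as np

def fit_normality_model(normal_feats):
    """normal_feats: (N, D) features extracted from anomaly-free images."""
    mu = normal_feats.mean(axis=0)
    cov = np.cov(normal_feats, rowvar=False)
    # Regularize so the covariance stays invertible with few samples.
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return mu, cov_inv

def mahalanobis_score(feat, mu, cov_inv):
    d = feat - mu
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(0)
mu, cov_inv = fit_normality_model(rng.normal(size=(500, 16)))
print(mahalanobis_score(rng.normal(size=16), mu, cov_inv))        # low: in-distribution
print(mahalanobis_score(rng.normal(size=16) + 5.0, mu, cov_inv))  # high: flagged anomalous
```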
http://arxiv.org/abs/2406.12693v1
Compressor summary: XXLTraffic is a large public traffic dataset with long timespan and increasing sensor nodes, curated to support research in ultra-dynamic forecasting and address practical constraints.
http://arxiv.org/abs/2406.12692v1
Compressor summary: MAGIC is a method that uses three agents to create and refine a self-correction guideline for text-to-SQL, improving the performance and interpretability of large language models in this task.
http://arxiv.org/abs/2406.12687v1
Compressor summary: The paper shows how contemporary language models can help with mental health research tasks like data collection, annotation, and instrument deployment by using a dataset of 644 participants with different diagnoses.
http://arxiv.org/abs/2406.12683v1
Compressor summary: The study introduces a deep learning method using Spatial Sequence Attention to classify individuals with Schizophrenia based on structural MRI features extracted from pre-trained DenseNet.
http://arxiv.org/abs/2406.12680v1
Compressor summary: The Psychological Depth Scale (PDS) measures how well large language models (LLMs) can create stories that emotionally engage readers, and shows that GPT-4 stories are comparable to highly-rated human stories.
http://arxiv.org/abs/2406.12679v1
Compressor summary: The study evaluates five large language models on two style control tasks and finds inconsistencies, cultural insensitivity, and varying performance levels across models.
http://arxiv.org/abs/2406.12673v1
Compressor summary: This paper proposes KEEN, a probe that evaluates the knowledge of large language models about entities before generating any text, by analyzing their internal computation.
http://arxiv.org/abs/2406.12671v1
Compressor summary: Key points:
- The text discusses the challenges and advances in monocular geometry estimation methods, which estimate 3D shapes from single images.
- It compares discriminative and generative models, and shows that data quality is more important than data scale or model architecture for achieving good generalization.
- It proposes new benchmarks with diverse scenes and high-quality annotations to evaluate and analyze the models.
Summary: The text summarizes recent progress and challenges in monocular geometry estimation, a task that estimates 3D shapes from images. It shows that data quality matters more than other factors for model performance, and introduces new benchmarks to test and improve the models.
http://arxiv.org/abs/2406.12670v1
Compressor summary: The authors introduce new methods, theory, and a jet-pack block for editing large language models, and show how intrinsic dimensionality predicts editability and vulnerability to stealth attacks.
http://arxiv.org/abs/2406.12668v1
Compressor summary: The paper presents a method to detect disturbing images using large multimodal models and CLIP's text and image encoders, which achieves state-of-the-art results.
http://arxiv.org/abs/2406.12667v1
Compressor summary: Wagner proposed using Reinforcement Learning to test conjectures in graph theory by building graphs step-by-step and maximizing a score related to the conjecture.
http://arxiv.org/abs/2406.12665v1
Compressor summary: The text describes the creation of CollabStory, a dataset with LLM-generated collaborative stories, to explore and study multi-LLM scenarios for writing tasks.
http://arxiv.org/abs/2406.12663v1
Compressor summary: The paper proposes a new decoding strategy for image captioning that generates detailed captions with low object hallucination and introduces reliable evaluation metrics to assess this performance.
http://arxiv.org/abs/2406.12662v1
Compressor summary: The paper introduces Online Anchor-based Training (OAT), a novel method for image classification that improves performance by training models to learn percentage changes of class labels with respect to batch centers, and evaluates it on four datasets.
http://arxiv.org/abs/2406.12661v1
Compressor summary: SCORE is a fast and scalable Bayesian optimization method that uses a 1D reparametrization trick to overcome computational challenges in high-dimensional landscapes.
http://arxiv.org/abs/2406.12660v1
Compressor summary: The study explores how Explainable AI (XAI) affects users' compliance with AI recommendations and the role of AI literacy and mental models in this process.
http://arxiv.org/abs/2406.12655v1
Compressor summary: The paper reviews existing methods to evaluate machine learning models that generate program code from natural language input, focusing on benchmarks and metrics.
http://arxiv.org/abs/2406.12649v1
Compressor summary: Key points:
- ViTs are powerful vision models but lack trustworthy explanations
- The paper proposes five criteria for explaining ViTs and shows the limitations of existing methods
- The paper introduces PACE, a variational Bayesian framework that provides faithful, stable, sparse, multi-level, and parsimonious explanations for ViT predictions
Summary: The paper presents PACE, a new method to explain ViTs using variational Bayesian techniques, which meets five desiderata for trustworthy explanations and outperforms existing approaches.
http://arxiv.org/abs/2406.12645v1
Compressor summary: The study compares human-curated and machine-selected evidence for generating fact-checking explanations using large language models, finding that machine-selected evidence can generate similar or higher quality explanations.
http://arxiv.org/abs/2406.12644v1
Compressor summary: The text introduces the Hierarchical Prompting Taxonomy, a framework for assessing large language models' abilities on diverse tasks using different prompting strategies, and compares its effectiveness with existing evaluation methods.
http://arxiv.org/abs/2406.12641v1
Compressor summary: The paper introduces DetectBench, a benchmark to test implicit evidence detection in long contexts, and proposes Detective Reasoning Prompt and Finetuning methods to improve LLMs' abilities in this task.
http://arxiv.org/abs/2406.12640v1
Compressor summary: The paper analyzes the data enhancement technology of graph neural networks for deep learning applications with limited or costly data sets.
http://arxiv.org/abs/2406.12639v1
Compressor summary: The authors propose a new task and benchmark dataset for language agents to predict clarification needs, invoke external tools, and generate plans based on user instructions, using a multi-agent framework called Clarification-Execution-Planning.
http://arxiv.org/abs/2406.12638v1
Compressor summary: Candle is a framework that improves CLIP's performance on downstream tasks with long-tailed and few-shot data by using compensating logit-adjusted loss, cross-modal attention, and virtual prototypes for new classes.
http://arxiv.org/abs/2406.12629v1
Compressor summary: SeTAR is a novel OOD detection method that uses low-rank approximation of weight matrices and achieves superior performance on ImageNet1K and Pascal-VOC benchmarks.
http://arxiv.org/abs/2406.12624v1
Compressor summary: The paper evaluates the performance of different large language models acting as judges and compares them with human annotations on TriviaQA, highlighting the importance of Cohen's kappa and the potential biases in this paradigm.
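Cohen's kappa, the agreement statistic the paper emphasizes, corrects raw judge-human agreement for the agreement expected by chance. A quick illustration with scikit-learn (the labels are made up):

```python
from sklearn.metrics import cohen_kappa_score

human = ["correct", "wrong", "correct", "correct", "wrong", "correct"]
judge = ["correct", "correct", "correct", "correct", "wrong", "wrong"]

# Raw agreement here is 4/6; kappa discounts the part expected by chance.
print(cohen_kappa_score(human, judge))
```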
http://arxiv.org/abs/2406.12621v1
Compressor summary: The article compares two parsing methods for speech and shows that graph-based parsing performs best and is more efficient than a pipeline approach using ASR and syntactic parsers.
http://arxiv.org/abs/2406.12620v1
Compressor summary: The paper proposes MLEMs, a method that compares how different layers of language models represent linguistic information, providing transparent comparisons and potential extensions to other domains and neural systems.
http://arxiv.org/abs/2406.12618v1
Compressor summary: This paper analyzes the impact of interpretability and analysis (IA) research on NLP using citation data and survey responses, finding that IA is influential in shaping NLP progress and methodology, while also identifying areas for improvement.
http://arxiv.org/abs/2406.12616v1
Compressor summary: JKOnet* is a simple and efficient model for learning diffusion processes from data that recovers all components of the process and provides a closed-form optimal solution for some cases.
http://arxiv.org/abs/2406.12615v1
Compressor summary: The text explores the limitations of expressivity and learning dynamics of bias-free ReLU networks, which share some properties with linear networks due to symmetry conditions.
http://arxiv.org/abs/2406.12614v1
Compressor summary: EUvsDisinfo is a large multilingual dataset of pro-Kremlin disinformation articles that helps track the evolution and spread of disinformation over time and across languages, as well as train models to distinguish between disinformation and trustworthy content.
http://arxiv.org/abs/2406.12608v1
Compressor summary: GraphBridge is a novel framework for text-attributed graphs that leverages contextual textual information to bridge local and global perspectives and improves efficiency with a graph-aware token reduction module.
http://arxiv.org/abs/2406.12606v1
Compressor summary: The paper proposes a low-redundant alignment method called ALLO that improves large language models by focusing on optimizing the most related neurons with useful supervised signals.
http://arxiv.org/abs/2406.12605v1
Compressor summary: The paper explores backdoor attacks on web attack detection models and proposes five methods and defenses, achieving an 87% success rate that can be reduced by fine-tuning.
http://arxiv.org/abs/2406.12600v1
Compressor summary: The study analyzes how statistical learning algorithms generalize from non-i.i.d. data sampled from a stationary mixing process, and shows that online learning with delayed feedback can help achieve low generalization error.
http://arxiv.org/abs/2406.12592v1
Compressor summary: This paper explores concept ablation in pre-trained models, introduces a new variant called 'trademark ablation', and analyzes the model's limitations, resilience, and adaptability.
http://arxiv.org/abs/2406.12589v1
Compressor summary: The paper explores how specialized training environments can improve reinforcement learning agents' performance, transferability, and speed of adaptation to new problems.
http://arxiv.org/abs/2406.12587v1
Compressor summary: The paper proposes Restorer, a Transformer network with U-Net architecture and all-axis attention mechanism, which achieves state-of-the-art or comparable performance in multiple image restoration tasks while being faster than existing methods.
http://arxiv.org/abs/2406.12585v1
Compressor summary: Key points:
- Ensembling multiple models can improve classification accuracy
- Existing methods mostly use full-text outputs of LLMs without exploiting token-level probabilities
- The paper proposes GaC, which uses token-level probabilities from LLMs for ensembling
- GaC outperforms existing methods on various benchmarks and can be further improved by ensembling only key tokens
Summary: The paper introduces GaC, a novel method that exploits token-level probabilities from LLMs to ensemble multiple models and achieve better classification performance on various benchmarks.
http://arxiv.org/abs/2406.12577v1
Compressor summary: The paper introduces CeLDA, a method for detecting cephalometric landmarks in adults and adolescents using prototypical networks and prototype relation mining.
http://arxiv.org/abs/2406.12572v1
Compressor summary: Mathador-LM is a new benchmark that tests large language models' math skills using a game-like challenge, and it shows that current LLMs perform worse than 5th graders on it.
http://arxiv.org/abs/2406.12570v1
Compressor summary: The paper proposes a method to detect machine-generated text using DetectGPT classifiers and ensembling techniques, achieving high accuracy even when the generative and discriminative language models are different or unknown.
http://arxiv.org/abs/2406.12569v1
Compressor summary: The paper investigates MOYU, a property of large language models that affects inference speed and performance, and suggests ways to improve dynamic activation methods.
http://arxiv.org/abs/2406.12566v1
Compressor summary: RichRAG is a novel RAG framework that uses a sub-aspect explorer, a multi-faceted retriever, and a list-wise ranker to generate rich long-form answers for open-ended user queries.
http://arxiv.org/abs/2406.12563v1
Compressor summary: The paper presents a vision-based autonomous car racing agent that uses only local input features to outperform human drivers in time trial races.
http://arxiv.org/abs/2406.12550v1
Compressor summary: The paper proposes a novel model-based framework, SRA, for offline imitation learning that uses reverse dynamic models and self-paced exploration to overcome covariate shift and achieve state-of-the-art performance.
http://arxiv.org/abs/2406.12549v1
Compressor summary: The authors introduce MultiSocial, a new multilingual dataset for testing machine-generated text detection in social media, and evaluate existing methods on it.
http://arxiv.org/abs/2406.12548v1
Compressor summary: The paper proposes a new method, called P-tailor, for personalizing large language models based on the Big Five Personality Traits, using a mixture of experts approach and a specialization loss function.
http://arxiv.org/abs/2406.12546v1
Compressor summary: TruthQuest is a benchmark for suppositional reasoning based on knights and knaves puzzles that tests large language models' ability to reason about truth and lies.
http://arxiv.org/abs/2406.12539v1
Compressor summary: The paper introduces a novel method for heterophilic graphs that assigns unique aggregation patterns to each node, called Heterophily Snowflake Hypothesis, and shows its effectiveness on various tasks and backbones.
http://arxiv.org/abs/2406.12538v1
Compressor summary: Variational Diffusion Distillation (VDD) is a new method that converts powerful but slow diffusion models into faster Mixtures of Experts (MoE) for behavior learning tasks.
http://arxiv.org/abs/2406.12536v1
Compressor summary: Key points:
- The paper introduces a new dataset (ViDSOD-100) and a new model (ATF-Net) for RGB-D video salient object detection
- The model fuses appearance, spatio-temporal, and geometry information from different modalities using MEA and MDA modules
- The model outperforms existing methods on various tasks and domains
Summary: The paper presents a new dataset and a novel model for detecting salient objects in RGB-D videos, which integrates multiple modalities with attention and surpasses state-of-the-art techniques.
http://arxiv.org/abs/2406.12534v1
Compressor summary: UAR is a method for determining when to retrieve information in RAG systems that uses four criteria and standardized procedures to handle various types of user instructions efficiently and effectively.
http://arxiv.org/abs/2406.12531v1
Compressor summary: The paper proposes a method to reduce path lengths in decision trees during training to optimize their execution time on resource-constrained devices by favoring highly asymmetric distributions for split criteria, achieving up to 4x faster inference with minimal accuracy loss.
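The cost model behind this is simple: inference time scales with the expected root-to-leaf path length, so an asymmetric split that lets most samples exit early can beat a balanced one at equal accuracy. A toy calculation with hypothetical leaf probabilities:

```python
# Expected inference cost = sum over leaves of P(reaching leaf) * depth(leaf).
def expected_depth(leaves):
    """leaves: list of (probability of reaching the leaf, depth) pairs."""
    return sum(p * d for p, d in leaves)

# Balanced tree, 4 leaves at depth 2: every sample pays 2 comparisons.
print(expected_depth([(0.25, 2)] * 4))                              # 2.0
# Skewed tree: 70% of samples exit after a single comparison.
print(expected_depth([(0.7, 1), (0.2, 2), (0.05, 3), (0.05, 3)]))   # 1.4
```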
http://arxiv.org/abs/2406.12527v1
Compressor summary: FuseGen is a novel framework for zero-shot learning that improves synthetic dataset quality by using multiple PLMs and STMs for subset selection, re-weighting, and iterative data generation.
http://arxiv.org/abs/2406.12507v1
Compressor summary: The paper analyzes InterpretTime, a method to evaluate explanations in multivariate time series classification, pointing out its weaknesses, and proposing improvements. It also shows how the best attribution methods can improve channel selection in MTSC.
http://arxiv.org/abs/2406.12502v1
Compressor summary: Code-Optimise is a framework that improves both correctness and runtime of code language models by incorporating preference data from self-generated solutions.
http://arxiv.org/abs/2406.12499v1
Compressor summary: The study explores the use of inverse reinforcement learning (IRL) to train models for autonomous navigation of catheters and guidewires in endovascular surgery, improving safety and efficiency.
http://arxiv.org/abs/2406.12496v1
Compressor summary: The paper introduces a new real-time semantic segmentation model (RDRNet) that balances accuracy and speed by reparameterizing multi-path blocks during inference and improving feature representation with RPPM.
http://arxiv.org/abs/2406.12494v1
Compressor summary: LightPAL is a fast and effective passage retrieval method for summarizing multiple documents based on a large language model and random walk algorithm.
http://arxiv.org/abs/2406.12480v1
Compressor summary: The authors demonstrate how to use large language models (LLMs) to generate synthetic data for training stance detection agents, which improves their performance in online political discussions.
http://arxiv.org/abs/2406.12479v1
Compressor summary: The text describes a new paradigm for intelligent remote sensing image understanding using a multimodal instruction-following dataset and discusses its features and design.
http://arxiv.org/abs/2406.12478v1
Compressor summary: The paper explores ways to improve depthwise separable convolutions in efficient neural networks by optimizing data layouts, achieving significant latency and memory reduction on a low-power device.
http://arxiv.org/abs/2406.12475v1
Compressor summary: MiDEX is an algorithm for minimizing regret in multi-dueling bandits with adversarial preferences, achieving near-optimal performance.
http://arxiv.org/abs/2406.12474v1
Compressor summary: The text describes how Independent Component Analysis (ICA) can reveal universal semantic axes across languages, and presents a method to verify their consistency within and across languages using statistical tests.
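ICA here plays the same role as in signal separation: it rotates embedding dimensions into statistically independent components, which the paper interprets as candidate semantic axes. A minimal sketch with scikit-learn, assuming you already have an embedding matrix (a random stand-in is used below):

```python
import numpy as np
from sklearn.decomposition import FastICA

# One row per word; a random stand-in for a real embedding matrix.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 300))

ica = FastICA(n_components=50, random_state=0, max_iter=500)
components = ica.fit_transform(embeddings)  # (1000, 50) word loadings per axis

# Inspect one candidate "semantic axis": the words loading most strongly on it.
axis = 0
top_words = np.argsort(-np.abs(components[:, axis]))[:10]
print(top_words)  # indices into the vocabulary
```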
http://arxiv.org/abs/2406.12471v1
Compressor summary: DENI is a new strategy that reduces performance instability of fine-tuned language models by using ensembling, noise regularisation and model interpolation, while being computationally efficient.
http://arxiv.org/abs/2406.12468v1
Compressor summary: ATBias is a new decoding technique that improves in-context editing (ICE) performance by biasing logits related to key entities, achieving significant improvements with low latency.
http://arxiv.org/abs/2406.12463v1
Compressor summary: LFMamba is a new super-resolution method for 4D light fields using State Space Models, which effectively captures contextual and angular information while being efficient and easy to implement.
http://arxiv.org/abs/2406.12459v1
Compressor summary: HumanSplat is a method that predicts 3D human properties from one image, using a multi-view diffusion model and a transformer with structure priors, to achieve photorealistic novel-view synthesis.
http://arxiv.org/abs/2406.12454v1
Compressor summary: The article proposes a novel exact algorithm for the 2L-CVRP that combines attention and recurrence mechanisms, improving the state-of-the-art solution by 29.79% on average and resolving an open instance.
http://arxiv.org/abs/2406.12452v1
Compressor summary: The text discusses the crisis of insect decline, the need for better monitoring tools, and presents new machine learning benchmarks and datasets for insect recognition using computer vision.
http://arxiv.org/abs/2406.12449v1
Compressor summary: Retrieval-augmented generation (RAG) combines generative AI's strengths with external knowledge retrieval to enhance accuracy and improve medical applications.
http://arxiv.org/abs/2406.12442v1
Compressor summary: Key points:
- The paper introduces Abstraction-of-Thought (AoT), a structured reasoning format that requires varying levels of abstraction
- The paper presents AoT Collection, a finetuning dataset for language models with AoT reasoning processes
- The paper shows that AoT models outperform Chain-of-Thought (CoT) models on many reasoning tasks
Summary: The paper proposes AoT, a new way of reasoning with abstraction, and AoT Collection, a dataset to train language models with it. It demonstrates that AoT models are better at reasoning than CoT models.
http://arxiv.org/abs/2406.12441v1
Compressor summary: The paper introduces Cycle-Correspondence Loss (CCL) for learning view-invariant visual descriptors for robot manipulation tasks, enabling simple data collection and training on unpaired RGB camera views.
http://arxiv.org/abs/2406.12440v1
Compressor summary: The authors explore different deep learning methods for recognising hand gestures using 3D skeleton data and show that supervised learning is most accurate, while self-supervised learning improves accuracy in simulated settings.
http://arxiv.org/abs/2406.12439v1
Compressor summary: Key points:
- GNNs excel at multi-class node classification, but less so at multi-label classification
- The authors collected and released three biological datasets and a multi-label graph generator
- They proposed new notions of homophily and Cross-Class Neighborhood Similarity for multi-label classification
- They conducted a large-scale comparative study with eight methods across nine datasets
Summary: The paper introduces new concepts and methods for multi-label node classification using GNNs, and releases three biological datasets and a graph generator.
http://arxiv.org/abs/2406.12430v1
Compressor summary: The paper introduces Decision QA, a new task for using LLMs to help with complex decision making, and proposes a new RAG technique called PlanRAG that performs better than the state-of-the-art method in two scenarios.
http://arxiv.org/abs/2406.12429v1
Compressor summary: This paper proposes a method to select homogeneous tools for tasks, considering both their performance and cost, and shows it outperforms existing methods.
http://arxiv.org/abs/2406.12428v1
Compressor summary: The study proposes a method to generate text and speech simultaneously for spoken dialogue systems, reducing response generation latency and maintaining content quality.
http://arxiv.org/abs/2406.12423v1
Compressor summary: Key points:
- Time-series data has privacy and business sensitivity issues
- Generative time-series models can create synthetic data for societal benefits
- Existing models are limited by memory, accuracy, and representativeness
- The paper proposes a transformer-based diffusion model (TDDPM) that outperforms state-of-the-art
- The model can also generate mobility data for different scenarios and environments
Summary: The paper introduces TDDPM, a transformer-based diffusion model that generates high-quality and representative time-series data, especially for mobility applications, while addressing privacy and scalability challenges.
http://arxiv.org/abs/2406.12422v1
Compressor summary: The researchers developed a web service that combines deep learning and a morphological dictionary to improve Czech morphosyntactic analysis, achieving significant error reductions in lemmatization, POS tagging, and dependency parsing.
http://arxiv.org/abs/2406.12420v1
Compressor summary: The paper proposes a unified template filling model that connects textual and visual modalities via textual prompts to improve Event Argument Extraction performance on the M2E2 benchmark.
http://arxiv.org/abs/2406.12419v1
Compressor summary: The ESA^AI protocol uses AI assistance to help annotators reduce time and cost in evaluating machine translation quality, while providing more detailed annotations.
http://arxiv.org/abs/2406.12416v1
Compressor summary: This paper evaluates how well preference learning fine-tunes LLMs for factuality on out-of-domain datasets and proposes a new method, APEFT, that improves factuality awareness.
http://arxiv.org/abs/2406.12412v1
Compressor summary: The RC-CCD framework uses rough set theory and consensus clustering to accurately detect overlapping communities in complex networks, handling uncertainties and enhancing reliability.
http://arxiv.org/abs/2406.12407v1
Compressor summary: The authors present a new method using occupancy networks to accurately locate 67 body structures from single depth images, accounting for individual anatomical diversity, and aiming to enhance medical imaging and diagnostics.
http://arxiv.org/abs/2406.12406v1
Compressor summary: The paper proposes a new algorithm for learning from limited feedback that improves upon existing bounds and has a small price of bandit feedback in the agnostic setting.
http://arxiv.org/abs/2406.12404v1
Compressor summary: The text introduces a new scan-to-BIM framework for creating geometric digital twins of as-built roads from semantically labeled point cloud data, which can improve automation and accuracy.
http://arxiv.org/abs/2406.12403v1
Compressor summary: The paper proposes a framework that allows clients to train small language models on large language models' rationales without revealing sensitive information using two privacy strategies.
http://arxiv.org/abs/2406.12402v1
Compressor summary: This paper introduces explainable templates for common informal logical fallacies to explicate their implicit logic and evaluates their effectiveness in annotating and detecting fallacies.
http://arxiv.org/abs/2406.12400v1
Compressor summary: The paper presents an innovative IoT intrusion detection system using deep learning that achieves high accuracy, low false alarms, and real-time processing, making it a promising solution for IoT cybersecurity.
http://arxiv.org/abs/2406.12399v1
Compressor summary: QueerBench is a new framework that evaluates how large language models generate sentence completions that are potentially harmful towards LGBTQIA+ individuals, finding a significant difference in discriminatory behaviour.
http://arxiv.org/abs/2406.12397v1
Compressor summary: Synthetic data can improve LLMs' performance but may cause pattern overfitting and reduce instruction-following; using unlearning techniques can mitigate these issues.
http://arxiv.org/abs/2406.12395v1
Compressor summary: The paper proposes SDNIA-YOLO, a method that improves object detection in extreme conditions by enhancing image quality and learning from synthesized images using neural style transfer.
http://arxiv.org/abs/2406.12389v1
Compressor summary: The authors introduce a large and reliable dataset of emotion causes extracted from tweets, which can help develop emotion-aware systems.
http://arxiv.org/abs/2406.12386v1
Compressor summary: IPEval is the first evaluation benchmark for Large Language Models in intellectual property tasks, covering creation, application, protection, and management aspects with multiple-choice questions and various evaluation methods.
http://arxiv.org/abs/2406.12384v1
Compressor summary: The text introduces VRSBench, a new benchmark for remote sensing image understanding that improves on existing datasets with more diverse tasks, detailed object information, and quality control.
http://arxiv.org/abs/2406.12382v1
Compressor summary: TAGI is a method to simulate human learning by generating task-specific models from instructions, improving cross-task generalization and reducing computational costs.
http://arxiv.org/abs/2406.12381v1
Compressor summary: QOG models built by fine-tuning sequence-to-sequence LMs are efficient and stable, outperforming other methods and being competitive with Llama 3-8B.
http://arxiv.org/abs/2406.12375v1
Compressor summary: GW-MoE is a fine-tuning method that reduces uncertainty in Mixture-of-Experts models by broadcasting uncertain tokens across experts during inference, improving performance in various tasks and model sizes.
http://arxiv.org/abs/2406.12374v1
Compressor summary: This paper explores how network structures and agent interactions affect the reasoning and question-answering abilities of multi-agent systems, finding that random or scale-free networks with knowledgeable hubs improve performance.
http://arxiv.org/abs/2406.12373v1
Compressor summary: WebCanvas is a framework for evaluating web agents that captures the dynamic nature of web interactions using a novel metric, dataset, and annotation tools.
http://arxiv.org/abs/2406.12369v1
Compressor summary: The study evaluates deep learning techniques for continuous sign language recognition across various datasets and languages, providing insights into their performance and robustness.
http://arxiv.org/abs/2406.12368v1
Compressor summary: DiffMix is a self-supervised learning framework that combines real and synthetic images to improve representation learning and reduce image augmentations.
http://arxiv.org/abs/2406.12367v1
Compressor summary: The paper presents a new method to train post-processing filters for machine vision tasks using competitive learning, achieving better results than independent training.
http://arxiv.org/abs/2406.12366v1
Compressor summary: The paper proposes a framework and algorithms for structured prediction in online learning, generalizing optimal supervised learning algorithms to non-i.i.d. and non-stationary data.
http://arxiv.org/abs/2406.12364v1
Compressor summary: Context-aware Neural Machine Translation models may improve accuracy but not eliminate gender bias.
http://arxiv.org/abs/2406.12362v1
Compressor summary: This paper describes a machine learning-based drone detection system development process that follows the soon-to-be-published ED 324/ARP 6983 recommendations for reliability.
http://arxiv.org/abs/2406.12360v1
Compressor summary: UrbanLLM is a large language model that helps solve complex urban problems by using AI models for different sub-tasks and generating comprehensive responses.
http://arxiv.org/abs/2406.12359v1
Compressor summary: The text discusses how different data sampling methods affect the ability of meta-RL agents to represent and adapt to unknown environments, particularly in continuous control tasks and sparse reward navigation tasks.
http://arxiv.org/abs/2406.12355v1
Compressor summary: Key points:
- Gait recognition identifies individuals by walking patterns
- LiDAR-camera fusion is a promising method for gait recognition
- LiCAF is a novel network that uses asymmetric modeling strategy
- LiCAF improves cross-modal information selection and temporal modeling
- LiCAF achieves state-of-the-art performance on SUSTech1K dataset
Summary: LiCAF is a new network that fuses LiDAR and camera data for gait recognition, using an asymmetric strategy to enhance cross-modal features and temporal modeling, and achieving the best results on a large dataset.
http://arxiv.org/abs/2406.12354v1
Compressor summary: This paper proposes an adaptive machine unlearning method for multilingual models that erases information across languages while preserving performance and improving security against low-resource language attacks.
http://arxiv.org/abs/2406.12350v1
Compressor summary: Our method uses a registration-oriented encoder that combines general and structural features to improve cross-domain deformable registration accuracy and adaptability.
http://arxiv.org/abs/2406.12347v1
Compressor summary: The paper presents a novel feature-based approach to analyze and understand how social biases are propagated within large language models and proposes targeted debiasing methods.
http://arxiv.org/abs/2406.12338v1
Compressor summary: The paper introduces a flexible algorithmic framework for fitting PARAFAC2-based CMTF models using Alternating Optimization and ADMM, which allows to impose various constraints on all modes and linear couplings and shows its benefits in accuracy and efficiency.
http://arxiv.org/abs/2406.12336v1
Compressor summary: The study evaluates and compares different sentence embedding models for telecom domains, finding that domain adaptation improves both retrieval accuracy and isotropy of embeddings.
http://arxiv.org/abs/2406.12335v1
Compressor summary: VATP is a new method for reducing the memory cost of large language models by using both attention scores and value vector norms to prune tokens.
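The scoring intuition is that a cached token's contribution to the attention output is its attention weight times its value vector, so importance should reflect both. A hedged sketch of such a score (not the paper's exact formula):

```python
import torch

def token_importance(attn, values):
    """attn: (heads, q_len, kv_len) attention weights;
    values: (heads, kv_len, head_dim) cached value vectors."""
    # Attention mass each cached token receives, averaged over heads.
    attn_score = attn.mean(dim=0).sum(dim=0)      # (kv_len,)
    value_norm = values.norm(dim=-1).mean(dim=0)  # (kv_len,)
    return attn_score * value_norm                # weight attention by value magnitude

attn = torch.softmax(torch.randn(8, 4, 128), dim=-1)
values = torch.randn(8, 128, 64)
scores = token_importance(attn, values)
keep = scores.topk(32).indices  # prune the KV cache down to the top 32 tokens
```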
http://arxiv.org/abs/2406.12334v1
Compressor summary: The paper introduces two metrics (sensitivity and consistency) to measure how well large language models handle rephrasings of the input for text classification tasks, aiming to improve their robustness and performance.
http://arxiv.org/abs/2406.12331v1
Compressor summary: Our novel approach improves large language models' multi-hop reasoning by dynamically editing long contexts using interactive information retrieval.
http://arxiv.org/abs/2406.12329v1
Compressor summary: SNAP is a machine unlearning method for large language models that selectively forgets information without compromising their performance or user experience.
http://arxiv.org/abs/2406.12321v1
Compressor summary: APEx is a framework that uses a language model to automatically generate experiments and reports for evaluating large multimodal models.
http://arxiv.org/abs/2406.12319v1
Compressor summary: The study compares pointwise and pairwise evaluation methods for natural language generation tasks, finding that pointwise is more robust but pairwise can identify shortcomings, leading to a hybrid method that combines both approaches.
http://arxiv.org/abs/2406.12316v1
Compressor summary: The paper proposes a modality-aware and instance-aware visual prompts network to improve visible-infrared pedestrian re-identification by using transformer architecture and customized prompts for each person.
http://arxiv.org/abs/2406.12315v1
Compressor summary: PruningBench is a comprehensive benchmark for evaluating structural pruning methods on various models and tasks, providing an online platform and facilitating future research.
http://arxiv.org/abs/2406.12311v1
Compressor summary: BinaryMoS is a novel binarization technique for large language models that adaptively adjusts the values of binary weights based on each token's context, enhancing linguistic effectiveness without compromising compression efficiency.
http://arxiv.org/abs/2406.12307v1
Compressor summary: This study investigates how well large language models can handle incomplete scenarios and tool unavailability in real-world environments, and suggests improvements for their reliability.
http://arxiv.org/abs/2406.12304v1
Compressor summary: The paper proposes a new framework using contrastive optimal transport to generate counter-narratives against hate speech, addressing target interaction, diversification, and relevance issues.
http://arxiv.org/abs/2406.12303v1
Compressor summary: The paper proposes Immiscible Diffusion, a method to improve diffusion models by assigning target noise for each image before diffusing it, resulting in faster and more accurate training.
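The assignment step can be viewed as a batch-level matching problem: pair each image with the noise sample closest to it so that image-noise mappings don't mix. A minimal sketch using a linear assignment solver (the exact matching criterion is an assumption):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
images = rng.normal(size=(16, 4 * 32 * 32))  # flattened image batch
noise = rng.normal(size=(16, 4 * 32 * 32))   # Gaussian noise batch

# Cost: squared L2 distance between every image/noise pair.
cost = ((images[:, None, :] - noise[None, :, :]) ** 2).sum(axis=-1)
rows, cols = linear_sum_assignment(cost)
assigned_noise = noise[cols]  # image i is paired with its matched noise

# Each image is then diffused toward its assigned noise as usual.
```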
http://arxiv.org/abs/2406.12298v1
Compressor summary: The paper proposes using support vector machine with radial basis function kernel to predict hazardous flight weather conditions based on historical meteorological observations.
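The model itself is standard: an RBF-kernel SVM trained on historical weather features. A minimal scikit-learn sketch with made-up features and labels:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features: [wind speed, visibility, pressure, humidity]; 1 = hazardous.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 1.0).astype(int)  # stand-in label rule

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale", C=1.0))
model.fit(X, y)
print(model.predict(rng.normal(size=(3, 4))))
```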
http://arxiv.org/abs/2406.12297v1
Compressor summary: The paper proposes a faithful and parallel density peaks clustering method that works on non-Euclidean data and outperforms existing methods in accuracy.
http://arxiv.org/abs/2406.12295v1
Compressor summary: The paper presents FS-GEN, a framework that combines large and small language models for various applications, and analyzes different techniques within it to improve efficiency and collaboration.
http://arxiv.org/abs/2406.12293v1
Compressor summary: ENCOFA is a framework that addresses mixed closed-set and open-set label noise in medical image classification by using contrastive learning and feature augmentation.
http://arxiv.org/abs/2406.12288v1
Compressor summary: The study investigates how neuron activation in large language models' feed-forward layers explains their arithmetic reasoning capabilities when prompted with Chain-of-Thought prompts, using Llama2 as an example.
http://arxiv.org/abs/2406.12286v1
Compressor summary: VIRL pre-trains a 3D geometric encoder to improve few-shot learning for manufacturability tasks using CAM simulations, and suggests LoRA and static normalization as effective strategies.
http://arxiv.org/abs/2406.12285v1
Compressor summary: The paper proposes DASSF, a dynamic-attention scale-sequence fusion algorithm for small target detection in aerial images, which improves YOLO's performance by enhancing its ability to perceive targets of different scales and types.
http://arxiv.org/abs/2406.12284v1
Compressor summary: The paper analyzes how the recency heuristic in reinforcement learning, which favors recent rewards, helps with temporal credit assignment and provides proofs for its convergence, contraction rate, and variance properties.
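The most common instantiation of the recency heuristic is the exponentially decaying eligibility trace of TD(λ): states visited more recently receive exponentially more credit for the current TD error. A minimal tabular sketch (the environment loop is omitted):

```python
import numpy as np

n_states, alpha, gamma, lam = 5, 0.1, 0.99, 0.9
V = np.zeros(n_states)
trace = np.zeros(n_states)

def td_lambda_step(s, r, s_next, done):
    """One accumulating-trace TD(lambda) update."""
    global trace
    trace *= gamma * lam  # older visits get exponentially less credit
    trace[s] += 1.0       # the most recent state gets full credit
    td_error = r + (0.0 if done else gamma * V[s_next]) - V[s]
    V[:] = V + alpha * td_error * trace
```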
http://arxiv.org/abs/2406.12282v1
Compressor summary: SAGDFN is a scalable graph neural network that captures complex spatial-temporal correlations for large-scale multivariate time series forecasting and outperforms existing methods on multiple real-world datasets.
http://arxiv.org/abs/2406.12277v1
Compressor summary: The study introduces BELIEF, a framework to evaluate how well large language models understand factual knowledge using diverse prompts and a new dataset called MyriadLAMA.
http://arxiv.org/abs/2406.12276v1
Compressor summary: CodeNav is an LLM agent that navigates unseen code repositories and solves user queries by iteratively generating solutions with execution feedback.
http://arxiv.org/abs/2406.12275v1
Compressor summary: VoCo-LLaMA compresses vision tokens using language models to improve multi-modal tasks' efficiency and understanding of temporal correlations in videos.
http://arxiv.org/abs/2406.12274v1
Compressor summary: SafeInfer is a method to generate safe and ethical responses from language models using demonstration examples and safety-optimized token selection.
http://arxiv.org/abs/2406.12272v1
Compressor summary: SlotSSMs are a new framework for SSMs that model modular structures with independent state vectors and self-attention, improving performance in tasks involving multiple objects and long-range temporal dependencies.
http://arxiv.org/abs/2406.12271v1
Compressor summary: The Agriculture-Vision Challenge uses semantic segmentation models to analyze satellite images for agricultural purposes and addresses class imbalance issues by using data augmentation and adaptive weight schemes.
http://arxiv.org/abs/2406.12269v1
Compressor summary: The text proposes a novel table reasoning framework called Question-then-Pinpoint that helps generate high-quality table summaries by unveiling implicit knowledge and providing explainable guidance for the summarizer.
http://arxiv.org/abs/2406.12266v1
Compressor summary: The paper proposes ClientCAST, a method to assess large language models (LLMs) as therapists by simulating clients and having them complete questionnaires about their interactions with LLM therapists.
http://arxiv.org/abs/2406.12263v1
Compressor summary: The study examines how Large Language Models can both facilitate and defend against chat-based social engineering attacks and proposes ConvoSentinel, a defense pipeline that improves detection by comparing messages to a database of similar conversations.
http://arxiv.org/abs/2406.12262v1
Compressor summary: Inductive conformal predictors (ICPs) generate prediction sets with confidence levels based on exchangeability, and this study explores efficient data division for their development, considering overlap between training and calibration sets.
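The train/calibration split it studies is the defining feature of inductive conformal prediction: fit the model on one part, compute nonconformity scores on a held-out calibration part, and use their quantile as the prediction-set threshold. A minimal sketch for classification:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

# Nonconformity: one minus the probability assigned to the true class.
cal_probs = model.predict_proba(X_cal)
scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

alpha = 0.1  # target 90% coverage
q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

# Prediction set: all classes whose nonconformity falls below the threshold.
test_probs = model.predict_proba(X[:1])  # first sample as a stand-in test point
prediction_set = np.where(1.0 - test_probs[0] <= q)[0]
print(prediction_set)
```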
http://arxiv.org/abs/2406.12260v1
Compressor summary: The paper proposes a self-supervised anomaly detection technique for time-series data that uses learnable data augmentation and contrastive learning, improving performance on benchmark datasets and enabling diagnosis of root causes.
http://arxiv.org/abs/2406.12258v1
Compressor summary: The paper proposes a novel method to improve face anti-spoofing performance by enhancing zero-shot data domain generalization, measuring uncertainty, and ensembling backbones from a Bayesian perspective.
http://arxiv.org/abs/2406.12257v1
Compressor summary: CleanGen is a lightweight defense against backdoor attacks in large language models that replaces suspicious tokens with non-compromised ones during decoding.
http://arxiv.org/abs/2406.12256v1
Compressor summary: The report introduces a new loss function for video-text retrieval tasks that leverages correlation matrix information and improves performance on EPIC-KITCHENS-100 challenge.
http://arxiv.org/abs/2406.12255v1
Compressor summary: This paper investigates why chain-of-thought methods improve large language models' reasoning performance, proposing a Read-and-Control approach for explaining and controlling these methods.
http://arxiv.org/abs/2406.12252v1
Compressor summary: This paper reviews datasets and applications of Natural Language Processing and multimodal models in sports analytics after 2020, highlighting their benefits for various purposes and challenges for future development.
http://arxiv.org/abs/2406.12251v1
Compressor summary: SHLPT is a novel lifelong learning framework that uses a learnable similarity metric to partition tasks into subsets, enabling effective transfer regardless of their similarity or dissimilarity and reducing catastrophic forgetting with a parameter pool.
http://arxiv.org/abs/2406.12246v1
Compressor summary: TroL is an efficient large language and vision model family that uses layer traversal to reuse layers and achieve powerful performance without increasing physical model size.
http://arxiv.org/abs/2406.12242v1
Compressor summary: The paper proposes GMP-AR, a framework that uses temporal hierarchy information and an adaptive reconciliation strategy to improve forecasting performance and maintain coherence for different temporal granularity tasks, such as sales prediction.
http://arxiv.org/abs/2406.12241v1
Compressor summary: The paper proposes a framework that combines different approximate sampling methods with Feel-Good Thompson Sampling for exploration in Deep RL, achieving theoretical and empirical improvements.
http://arxiv.org/abs/2406.12238v1
Compressor summary: PFID is a privacy-preserving framework for LLMs that hides user input by using model sharding, singular value decomposition, and compressed hidden states.
http://arxiv.org/abs/2406.12235v1
Compressor summary: Holmes-VAD is a framework that uses precise temporal supervision and rich multimodal instructions to enable accurate and interpretable video anomaly detection, and introduces VAD-Instruct50k, a large-scale multimodal VAD instruction-tuning benchmark.
http://arxiv.org/abs/2406.12233v1
Compressor summary: SyncVSR is an end-to-end learning framework for visual speech recognition that uses quantized audio and crossmodal supervision to handle visually similar lip gestures and achieve state-of-the-art results with much less data.
http://arxiv.org/abs/2406.12232v1
Compressor summary: The study shows that GPT-3 and Llama models tend to favor candidates with White, female-sounding names in hiring and to suggest salaries that vary with candidates' race and gender, patterns that may not reflect real-world labor-market trends.
http://arxiv.org/abs/2406.12230v1
Compressor summary: MCSD is an efficient language model that fuses diverse features with slope and decay sections, enabling fast inference and low resource consumption, making it suitable for edge devices and embodied intelligence.
http://arxiv.org/abs/2406.12229v1
Compressor summary: The text proposes a method called ST-GCHB that uses graph contrastive learning and histopathological images to predict gene expression in spatial transcriptomics data, while accounting for spatial dependencies among spots.
http://arxiv.org/abs/2406.12227v1
Compressor summary: The paper explores how fine-tuning large language models causes them to lose general capabilities due to instruction following, proposes the Instruction Vector framework to capture these capabilities, and develops IV-guided training to mitigate forgetting.
http://arxiv.org/abs/2406.12225v1
Compressor summary: The text introduces a method that uses a multimodal language model to improve object detection by generating referential expressions for each category, which helps align the detected targets with the target concepts and enhance the fine-tuning of the vision-language model.
http://arxiv.org/abs/2406.12223v1
Compressor summary: The study shows that current language models struggle to detect offensive content when text is manipulated with homophonic substitutions and emoji transformations, especially in Chinese, suggesting a need for better techniques.
http://arxiv.org/abs/2406.12221v1
Compressor summary: RLFH is a fine-grained feedback method using online reinforcement learning to reduce hallucination in large language models.
http://arxiv.org/abs/2406.12220v1
Compressor summary: The paper proposes a new way to understand and improve Transformers and MLP-Mixers by integrating them with hierarchical associative memory, and finds that weight matrices in the vanilla MLP-Mixer naturally acquire symmetry-breaking effects for better performance.
http://arxiv.org/abs/2406.12219v1
Compressor summary: The paper introduces HP-ViT, a model that uses a ViT backbone and transformer head to accurately estimate hand poses from egocentric videos, achieving first place in the EgoExo4D Hand Pose Challenge.
http://arxiv.org/abs/2406.12216v1
Compressor summary: This paper tests if large language models can accurately infer people's personalities from brief descriptions, finding that they can, but also have some limitations and biases.
http://arxiv.org/abs/2406.12213v1
Compressor summary: The text discusses extending oracle Turing machines with clustered large language models to improve natural language processing tasks and reliability.
http://arxiv.org/abs/2406.12211v1
Compressor summary: The report describes a winning solution for a challenge that involves detecting if people are looking at the camera in videos, using a combination of image and text features processed by neural networks.
http://arxiv.org/abs/2406.12208v1
Compressor summary: The paper proposes a method called Evolver to integrate multiple language models into a unified model that performs well across various domains and generalizes well on out-of-domain data, without needing further training or additional data.
http://arxiv.org/abs/2406.12205v1
Compressor summary: RL-LOW is an algorithm that minimizes simple regret in offline RL with preference feedback and preserves privacy.
http://arxiv.org/abs/2406.12203v1
Compressor summary: InterIntent is a framework that assesses large language models' social intelligence by measuring their ability to understand and manage intentions in a game setting, revealing their weaknesses in inferring others' intentions and highlighting the importance of intention understanding for success.
http://arxiv.org/abs/2406.12199v1
Compressor summary: The study shows that advanced deep learning models can better predict heart rate time series for cardiovascular disease management than traditional models.
http://arxiv.org/abs/2406.12197v1
Compressor summary: DAO is a system that improves event extraction using multi-agent debates without tuning, with two novel modules (DRAG and AdaCP) enhancing accuracy and reliability.
http://arxiv.org/abs/2406.12193v1
Compressor summary: The paper proposes an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address issues in high-dimensional multi-label data with missing labels, using a generalized regression model and integrating instance and label correlations.
http://arxiv.org/abs/2406.12182v1
Compressor summary: Aquila-Med is a bilingual medical LLM that uses continued pre-training, supervised fine-tuning, and reinforcement learning to improve performance in specific professional fields like medicine, with open-source datasets and models available.
http://arxiv.org/abs/2406.12179v1
Compressor summary: The paper proposes a Universal Brain-Encoder that learns a unique embedding per brain voxel, enabling cross-attention between image features and voxel embeddings, and improving encoding across subjects, datasets, and machines.
http://arxiv.org/abs/2406.12178v1
Compressor summary: The text introduces FCA-RAC, a framework that enhances action counting models by annotating videos with first action cycles, adjusting sampling strategies, using multi-temporal granularity convolution, and exploiting training knowledge augmentation to improve performance on seen and unseen actions.
http://arxiv.org/abs/2406.12177v1
Compressor summary: The text describes a novel semi-supervised learning method for prostate cancer detection on MRI that uses clinical information from radiology reports to guide training and reduce the need for manual annotations.
http://arxiv.org/abs/2406.12173v1
Compressor summary: MiSuRe is an algorithm that creates minimal saliency maps for image segmentation models, highlighting only crucial regions and enabling post-hoc reliability analysis.
http://arxiv.org/abs/2406.12172v1
Compressor summary: SearchBench introduces a new benchmark for search problems that challenges LLMs' end-to-end text solving abilities and shows that in-context learning with A* algorithm implementations and multi-stage verification can significantly improve their performance.
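Since the benchmark's strongest setup prompts models with A* implementations, a compact generic A* is sketched below for reference; the grid example and Manhattan heuristic are illustrative assumptions, not SearchBench's actual problems.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Generic A*: `neighbors(n)` yields (next_state, step_cost);
    `heuristic(n)` must never overestimate the remaining cost."""
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:          # first goal pop is optimal if h is admissible
            return path, g
        for nxt, cost in neighbors(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + heuristic(nxt), ng, nxt, path + [nxt]))
    return None, float("inf")

# Example: shortest path on a 5x5 4-connected grid with Manhattan heuristic.
def grid_neighbors(p):
    x, y = p
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < 5 and 0 <= ny < 5:
            yield (nx, ny), 1

path, cost = a_star((0, 0), (4, 4),
                    grid_neighbors,
                    lambda p: abs(p[0] - 4) + abs(p[1] - 4))
print(path, cost)
```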
http://arxiv.org/abs/2406.12168v1
Compressor summary: BPO is an online method for aligning large language models with human preferences, which improves performance across various tasks and datasets.
http://arxiv.org/abs/2406.12165v1
Compressor summary: The authors propose a method to estimate uncertainty in word embeddings (GloVe) and demonstrate its usefulness for hypothesis testing and bias analysis in various applications.
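One generic way to attach uncertainty to embeddings, sketched below, is to bootstrap the corpus and measure how much a similarity of interest varies across resamples; the toy corpus and the PPMI rows standing in for GloVe vectors are assumptions, not the authors' estimator.

```python
import numpy as np
from itertools import combinations

corpus = [
    "the doctor treated the patient".split(),
    "the nurse helped the doctor".split(),
    "the engineer fixed the machine".split(),
    "the patient thanked the nurse".split(),
] * 25  # repeat so bootstrap resamples are non-trivial

vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}

def ppmi_vectors(sents):
    """Rows of a PPMI co-occurrence matrix as crude word vectors."""
    C = np.zeros((len(vocab), len(vocab)))
    for s in sents:
        for a, b in combinations(s, 2):
            C[idx[a], idx[b]] += 1
            C[idx[b], idx[a]] += 1
    p = C / C.sum()
    pw = p.sum(1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p / (pw @ pw.T))
    return np.maximum(pmi, 0)

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

rng = np.random.default_rng(0)
sims = []
for _ in range(200):  # bootstrap resamples of the corpus
    sample = [corpus[i] for i in rng.integers(0, len(corpus), len(corpus))]
    V = ppmi_vectors(sample)
    sims.append(cos(V[idx["doctor"]], V[idx["nurse"]]))

# Spread across resamples gives an uncertainty estimate for the similarity,
# which could then feed a hypothesis test or bias analysis.
print(f"doctor~nurse similarity: {np.mean(sims):.3f} +/- {np.std(sims):.3f}")
```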
http://arxiv.org/abs/2406.12163v1
Compressor summary: The paper proposes a new way to use logic for reasoning about general discussion and argumentation models using a top-down approach.
http://arxiv.org/abs/2406.12159v1
Compressor summary: Our study suggests that pre-training benefits of large language models come mainly from their latent space geometry, not linguistic knowledge, and this could help create better models with less data.
http://arxiv.org/abs/2406.12158v1
Compressor summary: This study explores whether large language models can infer causal relations from different types of relational data in text and finds that they struggle with counterfactuals and are prone to the post hoc fallacy.