This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-23, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.14818v1
Compressor summary: The study presents Palo, a large multilingual multimodal model that enables visual reasoning in 10 major languages, using a semi-automated translation approach and showing improvements over baselines.
http://arxiv.org/abs/2402.14817v1
Compressor summary: The paper proposes a novel way of estimating camera poses using rays instead of global parametrizations, which improves accuracy and generalizes better to different scenarios.
http://arxiv.org/abs/2402.14811v1
Compressor summary: Fine-tuning language models on generalized tasks improves their performance on entity tracking by enhancing their ability to handle positional information.
http://arxiv.org/abs/2402.14812v1
Compressor summary: WeakSAM is a novel method that uses pre-learned world knowledge from SAM to improve weakly supervised object detection (WSOD) and instance segmentation (WSIS), with adaptive pseudo ground truth generation, RoI drop regularization, and category awareness.
http://arxiv.org/abs/2402.14810v1
Compressor summary: The paper proposes a novel denoising method, GeneOH Diffusion, that refines incorrect hand trajectories in noisy hand-object interaction sequences using a contact-centric representation and a domain-generalizable denoising scheme.
http://arxiv.org/abs/2402.14809v1
Compressor summary: CriticBench is a benchmark to evaluate how well large language models can critique, refine, and improve their own reasoning across various tasks and domains.
http://arxiv.org/abs/2402.14808v1
Compressor summary: The paper proposes RelayAttention, an efficient attention algorithm that reduces redundant memory accesses for long system prompts in large language models without compromising generation quality.
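To make the shared-prompt idea concrete: when every request in a batch shares the same system prompt, attention over that segment can be computed against a single copy of its keys/values and merged exactly with per-request attention using log-sum-exp weights. A minimal PyTorch sketch under illustrative assumptions (single-token decoding, simplified shapes; not the paper's kernel):

```python
import torch

def two_segment_attention(q, k_shared, v_shared, k_own, v_own, scale):
    """Attend over a system-prompt segment whose K/V are stored once for the
    whole batch, plus a per-request segment, then merge the two partial
    results exactly. Illustrative shapes: q (B, H, 1, D);
    k_shared/v_shared (H, P, D); k_own/v_own (B, H, S, D)."""
    # Scores against the shared prefix broadcast over the batch, so its K/V
    # are read from memory once instead of once per request.
    s_shared = torch.einsum("bhqd,hpd->bhqp", q, k_shared) * scale
    s_own = torch.einsum("bhqd,bhsd->bhqs", q, k_own) * scale

    out_shared = torch.einsum("bhqp,hpd->bhqd",
                              torch.softmax(s_shared, dim=-1), v_shared)
    out_own = torch.einsum("bhqs,bhsd->bhqd",
                           torch.softmax(s_own, dim=-1), v_own)

    # Exact recombination of the two per-segment softmaxes.
    lse_shared = torch.logsumexp(s_shared, dim=-1, keepdim=True)
    lse_own = torch.logsumexp(s_own, dim=-1, keepdim=True)
    w = torch.sigmoid(lse_shared - lse_own)  # = e^a / (e^a + e^b)
    return w * out_shared + (1.0 - w) * out_own
```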
http://arxiv.org/abs/2402.14806v1
Compressor summary: The authors propose a deep learning method to reduce computational requirements and maintain skill in air quality forecasting, especially during extreme events.
http://arxiv.org/abs/2402.14805v1
Compressor summary: The paper proposes an external evaluation method to measure LLM personalities using situational questions and shows that LLMs can have different personalities depending on the scenario, unlike humans.
http://arxiv.org/abs/2402.14804v1
Compressor summary: MATH-Vision is a new dataset with diverse mathematical problems from real competitions that tests the limits of Large Multimodal Models' reasoning skills in visual contexts.
http://arxiv.org/abs/2402.14802v1
Compressor summary: GRAFF-LP improves link prediction performance on heterophilic graphs by adopting physics biases in the message-passing mechanism of Graph Neural Networks.
http://arxiv.org/abs/2402.14800v1
Compressor summary: This paper introduces new techniques to make large language models with MoE more efficient and faster without sacrificing performance.
http://arxiv.org/abs/2402.14798v1
Compressor summary: The paper proposes a new protocol for annotating decompositional entailment datasets, which leads to improved textual inference performance and consistency in LLM-based systems.
http://arxiv.org/abs/2402.14797v1
Compressor summary: Snap Video is a video-first generation model that leverages EDM and transformer architectures to overcome the quality and scalability limits of repurposed image models, achieving state-of-the-art results and producing high-quality, temporally consistent videos that users prefer over existing methods.
http://arxiv.org/abs/2402.14792v1
Compressor summary: QNeRF is a method for improving geometric consistency in multi-view image editing by using a neural radiance field trained on query features and injecting them back into the self-attention layers during generation.
http://arxiv.org/abs/2402.14789v1
Compressor summary: SMA is a domain-agnostic method for self-supervised learning that uses attention to learn masks for sampling, achieving state-of-the-art results on three benchmarks in different domains.
http://arxiv.org/abs/2402.14781v1
Compressor summary: The paper proposes a new framework for Bayesian causal inference that combines order-based MCMC structure learning, gradient-based graph learning, and Gaussian processes to exactly marginalize over causal models and outperforms existing methods on various benchmarks.
http://arxiv.org/abs/2402.14780v1
Compressor summary: The paper presents Customize-A-Video, a method for one-shot motion customization from reference videos using low-rank adaptation and appearance absorbers, applicable to various video tasks.
http://arxiv.org/abs/2402.14778v1
Compressor summary: This paper studies how well pretrained language models can follow instructions in different languages after being trained only on English data, and suggests ways to improve their performance.
http://arxiv.org/abs/2402.14776v1
Compressor summary: The paper introduces 2DMSE, a sentence embedding model that adjusts embedding size and Transformer layers for flexibility and efficiency in downstream tasks.
http://arxiv.org/abs/2402.14767v1
Compressor summary: DualFocus is a framework that integrates macro and micro perspectives in multi-modal large language models to enhance vision-language task performance: the model first views the image globally, then zooms in on relevant sub-regions for detailed analysis before answering.
http://arxiv.org/abs/2402.14762v1
Compressor summary: MT-Bench-101 is a benchmark for evaluating large language models' dialogue skills, revealing their strengths and weaknesses across 13 tasks and showing that current methods do not significantly improve their performance.
http://arxiv.org/abs/2402.14760v1
Compressor summary: The paper proposes a meta-learning approach to optimize a reward model for out-of-distribution preference learning with large language models.
http://arxiv.org/abs/2402.14759v1
Compressor summary: The paper explores how key statistical learning concepts apply when train and test data come from the same uncertain probability distribution.
http://arxiv.org/abs/2402.14757v1
Compressor summary: The paper presents a deep reinforcement learning-based control approach for Unmanned Aerial Vehicles to survey concrete bridge decks under traffic, detecting cracks using edge detection techniques or a Convolutional Neural Network.
http://arxiv.org/abs/2402.14753v1
Compressor summary: The paper shows that prompting and prefix-tuning can make smaller pretrained models act like any sequence-to-sequence function, with attention heads being especially powerful for this purpose.
http://arxiv.org/abs/2402.14746v1
Compressor summary: We study efficient large language models (LLMs) and show that natural training corpus size and model parameter count scale with each other, and that increasing the corpus size can reveal new skills in LLMs.
http://arxiv.org/abs/2402.14744v1
Compressor summary: The paper proposes a new way to use large language models in agent frameworks for creating personalized and realistic urban mobility scenarios, by aligning them with rich activity data and developing strategies to generate reliable activities.
http://arxiv.org/abs/2402.14743v1
Compressor summary: The study presents a method to create an Ottoman Turkish dependency treebank using a BERT-based model and manual corrections, which will enable automated analysis of historical documents.
http://arxiv.org/abs/2402.14740v1
Compressor summary: The text discusses the use of Reinforcement Learning from Human Feedback (RLHF) for AI alignment in large language models and proposes simpler optimization methods over PPO for better performance.
http://arxiv.org/abs/2402.14735v1
Compressor summary: In-context learning reveals how transformers learn latent causal structure using self-attention and gradient descent, enabling them to encode induction heads for Markov chains.
http://arxiv.org/abs/2402.14730v1
Compressor summary: CS-CNNs are $\mathrm{E}(p, q)$-equivariant CNNs that process multivector fields on pseudo-Euclidean spaces, achieving better results than baseline methods in fluid dynamics and relativistic electrodynamics tasks.
http://arxiv.org/abs/2402.14726v1
Compressor summary: The paper proposes methods to combine expert rules with neural networks for concept-based learning, ensuring that output probabilities follow the rules and offering a way to integrate inductive and deductive learning.
http://arxiv.org/abs/2402.14720v1
Compressor summary: The authors propose a Transformer-based model for Continuous Sign Language Recognition that eliminates the need for handcrafted features and improves accuracy in detecting sign boundaries.
http://arxiv.org/abs/2402.14714v1
Compressor summary: The report introduces EEVE-Korean-v1.0, a Korean adaptation of large language models that improves non-English text understanding with efficient and effective vocabulary expansion.
http://arxiv.org/abs/2402.14710v1
Compressor summary: IEPile is a large bilingual IE instruction corpus that improves LLMs' performance in Information Extraction, especially zero-shot generalization.
http://arxiv.org/abs/2402.14708v1
Compressor summary: The paper proposes a novel causal temporal graph neural network (CaT-GNN) for credit card fraud detection that leverages causal invariant learning, temporal attention, and causal mixup to enhance robustness and interpretability.
http://arxiv.org/abs/2402.14707v1
Compressor summary: The paper proposes a two-stage image synthesis framework to create realistic synthetic cytopathological images for augmenting cervical abnormality screening using Stable Diffusion and parameter-efficient fine-tuning methods.
http://arxiv.org/abs/2402.14704v1
Compressor summary: The paper introduces a new lexical simplification method that uses an adversarial editing system and knowledge distillation from large language models to simplify texts without needing annotated data or parallel corpora.
http://arxiv.org/abs/2402.14703v1
Compressor summary: The paper proposes novel coverage assumptions for off-policy evaluation in partially observable environments, leading to polynomial bounds and new algorithms.
http://arxiv.org/abs/2402.14702v1
Compressor summary: The paper introduces InfFeed, a method that uses influence functions to improve deep neural models' performance and reduce the need for manual annotation in datasets by identifying influential instances.
http://arxiv.org/abs/2402.14701v1
Compressor summary: COMPASS is a novel framework that uses advanced language models to infer the therapeutic working alliance from psychotherapy session transcripts, improving understanding of therapeutic interactions and feedback for therapists.
http://arxiv.org/abs/2402.14700v1
Compressor summary: This paper investigates how large language models achieve cross-lingual alignment and identifies a core linguistic competence region that is essential for their performance across 30 languages.
http://arxiv.org/abs/2402.14698v1
Compressor summary: The text describes a study that classifies urban dust pollution sources at earthwork-related sites from collected data and develops a system to help control the pollution.
http://arxiv.org/abs/2402.14695v1
Compressor summary: Quasi-conformal interactive segmentation (QIS) is a model that uses user clicks to guide the segmentation of degraded images, improving accuracy by adjusting a template mask with an orientation-preserving mapping.
http://arxiv.org/abs/2402.14690v1
Compressor summary: The text introduces a new evaluation framework (UFO) for assessing the accuracy of large language models across various text generation tasks, using different fact sources like human-written evidence and search engine results.
http://arxiv.org/abs/2402.14688v1
Compressor summary: Q-probing is a method to adapt pre-trained language models for specific tasks using reweighted sampling based on task-specific reward functions, which can improve performance in various domains and data regimes.
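A minimal sketch of the reweighting step, assuming `probe` is a lightweight scorer over final hidden states and `beta` is an inverse-temperature; both names and the selection rule are illustrative, not the paper's exact procedure:

```python
import torch

def probe_reweighted_choice(candidates, hidden_states, probe, beta=1.0):
    """candidates: list of k sampled completions;
    hidden_states: (k, d) final hidden state per completion;
    probe: callable mapping (k, d) states to (k,) predicted rewards."""
    scores = probe(hidden_states)                   # (k,) predicted rewards
    probs = torch.softmax(beta * scores, dim=0)     # reweighted sampling
    idx = torch.multinomial(probs, num_samples=1).item()
    return candidates[idx]
```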
http://arxiv.org/abs/2402.14683v1
Compressor summary: The authors propose a tool called VHTest that generates diverse visual hallucination instances for multi-modal large language models and use it to create a benchmark dataset.
http://arxiv.org/abs/2402.14679v1
Compressor summary: The study examines how well Large Language Models can show human-like personality traits through their responses to questionnaires and their actual behavior, using established benchmarks to understand any differences.
http://arxiv.org/abs/2402.14672v1
Compressor summary: The paper explores how tools can help large language models process complex environments like databases and knowledge bases, improving their performance significantly.
http://arxiv.org/abs/2402.14665v1
Compressor summary: The study proposes a new loss function and training method to improve face recognition systems' resistance to face morphing attacks.
http://arxiv.org/abs/2402.14664v1
Compressor summary: sDM is a Bayesian method that uses structured priors to capture action correlations for more efficient off-policy evaluation and learning in interactive systems.
http://arxiv.org/abs/2402.14660v1
Compressor summary: ConceptMath is a benchmark that evaluates how well large language models can reason about different math concepts and helps improve them.
http://arxiv.org/abs/2402.14654v1
Compressor summary: Multi-HMR is a single-shot model that recovers 3D human mesh from an RGB image using Vision Transformer features and a novel Human Prediction Head module, achieving state-of-the-art results with CUFFS dataset and various backbone sizes.
http://arxiv.org/abs/2402.14652v1
Compressor summary: The paper introduces NeuScraper, a neural network-based web scraper that improves text extraction quality for language model pretraining.
http://arxiv.org/abs/2402.14650v1
Compressor summary: The paper proposes GaussianPro, a novel method to improve neural rendering using progressive propagation and patch matching techniques for better initialization of 3D Gaussians.
http://arxiv.org/abs/2402.14648v1
Compressor summary: AR-AT improves adversarial robustness and accuracy by addressing gradient conflict and mixture distribution issues in representation-based invariance regularization.
http://arxiv.org/abs/2402.14646v1
Compressor summary: CoLoRA is a fast and accurate method to predict solutions of partial differential equations using pre-trained neural networks that adapt low-rank weights in time.
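A minimal sketch of the "low-rank weights adapted in time" idea: a frozen pre-trained linear layer plus a rank-limited correction scaled by a time-dependent coefficient. The class name, scalar `alpha_t`, and initialization are illustrative assumptions, not the paper's parametrization:

```python
import torch
import torch.nn as nn

class TimeLowRankLinear(nn.Module):
    """Linear layer = frozen pre-trained base + a low-rank correction whose
    strength is a time-dependent coefficient alpha(t)."""
    def __init__(self, d_in: int, d_out: int, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():        # pre-trained part stays fixed
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(d_out, rank) / rank ** 0.5)
        self.B = nn.Parameter(torch.randn(rank, d_in) / d_in ** 0.5)

    def forward(self, x: torch.Tensor, alpha_t: torch.Tensor) -> torch.Tensor:
        delta = alpha_t * (self.A @ self.B)     # rank-limited weight update
        return self.base(x) + x @ delta.t()
```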
http://arxiv.org/abs/2402.14645v1
Compressor summary: The text discusses the average-case hardness of sparse linear regression and presents a reduction from lattice problems to SLR, showing that finding good solutions for SLR is challenging even when the design matrix is well-conditioned.
http://arxiv.org/abs/2402.14642v1
Compressor summary: The text proposes a new method using distributed radiance fields for efficient video compression and real-time digital twin updates in the metaverse for autonomous vehicles.
http://arxiv.org/abs/2402.14621v1
Compressor summary: The "latrend" R package is a framework for comparing and implementing longitudinal clustering methods, allowing researchers to easily analyze trends in data over time.
http://arxiv.org/abs/2402.14616v1
Compressor summary: The paper evaluates how well subword-based word embeddings represent out-of-vocabulary words and their semantic similarity to in-vocabulary words.
http://arxiv.org/abs/2402.14614v1
Compressor summary: The paper proposes Rényi efficiency as a way to evaluate tokenizers, but shows that it has limitations and cannot capture all aspects of a good tokenization scheme.
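For reference, the Rényi entropy of order $\alpha$ for a unigram token distribution $p$ is $H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_i p_i^{\alpha}$, which recovers Shannon entropy in the limit $\alpha \to 1$; the efficiency score evaluated in the paper is built on this quantity, with the exact normalization as defined there.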
http://arxiv.org/abs/2402.14611v1
Compressor summary: This paper explores contrastive learning for medical image analysis, proposes local feature learning and feature decorrelation to overcome dimensional collapse, and shows improved performance in medical segmentation.
http://arxiv.org/abs/2402.14591v1
Compressor summary: The paper presents Fast Fruit Detector (FFD), a fast and resource-efficient object detection algorithm for autonomous aerial harvesting, along with a new dataset and data generation method for small-sized fruit instances.
http://arxiv.org/abs/2402.14586v1
Compressor summary: FrameNeRF uses a regularization model to generate dense views from sparse inputs, enabling fast high-fidelity NeRF models to perform well in few-shot novel view synthesis tasks.
http://arxiv.org/abs/2402.14585v1
Compressor summary: The paper proposes CBA, a new algorithm for prediction with expert advice under bandit feedback, which exploits the option of abstention and achieves better reward bounds than Exp4, especially for specialists.
http://arxiv.org/abs/2402.14577v1
Compressor summary: The authors present a method called iterative distribution alignment to address social bias in text-to-image models, which simplifies and improves upon previous approaches.
http://arxiv.org/abs/2402.14568v1
Compressor summary: The paper proposes LLM-DA, a novel technique for augmenting few-shot NER data using large language models to improve performance and maintain semantic integrity.
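A minimal sketch of LLM-based augmentation for NER under the constraint that entity mentions survive the rewrite (so the original span labels stay valid); the prompt wording and the `call_llm` placeholder are assumptions, not the paper's recipe:

```python
def augment_ner_example(call_llm, sentence: str,
                        entities: list[tuple[str, str]]) -> str:
    """Ask an LLM to paraphrase a labeled sentence while keeping every
    entity mention verbatim, preserving label validity."""
    tagged = ", ".join(f'"{text}" ({label})' for text, label in entities)
    return call_llm(
        "Rewrite the sentence below with different wording and context, but "
        f"keep these entity mentions exactly as written: {tagged}.\n\n"
        f"Sentence: {sentence}"
    )
```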
http://arxiv.org/abs/2402.14566v1
Compressor summary: t-SimCNE is a self-supervised learning method that produces semantically meaningful 2D visualizations of medical images using contrastive learning with data augmentations, which can help in data exploration and annotation.
http://arxiv.org/abs/2402.14558v1
Compressor summary: The paper explores challenges and opportunities in using large language models for various industrial applications by surveying practitioners, analyzing 68 papers, and answering four research questions.
http://arxiv.org/abs/2402.14551v1
Compressor summary: CLCE combines contrastive learning and cross-entropy loss to improve image model performance, especially in few-shot and transfer learning scenarios, while reducing dependency on large batch sizes.
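A minimal sketch of blending cross-entropy with a supervised contrastive term over normalized embeddings; `alpha`, `tau`, and the exact contrastive form are illustrative assumptions rather than the paper's formulation:

```python
import torch
import torch.nn.functional as F

def clce_style_loss(logits, embeddings, labels, alpha=0.5, tau=0.1):
    """Weighted sum of cross-entropy and a supervised contrastive term
    computed over L2-normalized embeddings within the batch."""
    ce = F.cross_entropy(logits, labels)

    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / tau                         # (B, B) scaled cosine sims
    sim.fill_diagonal_(float("-inf"))             # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-class pairs
    pos.fill_diagonal_(False)
    has_pos = pos.any(dim=1)                      # anchors with >=1 positive
    pos_log = torch.where(pos, log_prob, torch.zeros_like(log_prob))
    con = -(pos_log.sum(dim=1) / pos.sum(dim=1).clamp(min=1))[has_pos].mean()

    return alpha * ce + (1.0 - alpha) * con
```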
http://arxiv.org/abs/2402.14547v1
Compressor summary: OmniPred is a framework that trains language models to accurately predict numeric outcomes for diverse real-world experiments using only textual representations of parameters and values.
http://arxiv.org/abs/2402.14545v1
Compressor summary: The paper explores how overly detailed training data causes large multimodal models to generate content beyond visual perception limits and proposes two methods to reduce this issue.
http://arxiv.org/abs/2402.14536v1
Compressor summary: The paper presents a causal model for cross-domain sentiment analysis that disentangles domain-specific from domain-invariant features via backdoor adjustment, achieving robustness to unknown target domains and outperforming existing methods on various datasets.
http://arxiv.org/abs/2402.14533v1
Compressor summary: The study analyzes the linguistic styles of three large language models (LLMs) and finds significant variations that can help identify their origin with high accuracy.
http://arxiv.org/abs/2402.14532v1
Compressor summary: The paper proposes a method to embed heteroscedastic uncertainty in Bayesian Neural Networks without adding learnable parameters and improves performance with sampling-free variational inference.
http://arxiv.org/abs/2402.14531v1
Compressor summary: Politeness levels in prompts affect large language models' performance differently across English, Chinese, and Japanese tasks, with the best level varying by language and cultural context.
http://arxiv.org/abs/2402.14528v1
Compressor summary: ACE is a new RL algorithm that uses causality-aware entropy to explore important actions and dormancy-guided reset to balance exploration and exploitation, achieving better performance on 29 diverse continuous control tasks.
http://arxiv.org/abs/2402.14526v1
Compressor summary: ClusterClip Sampling is a data sampling strategy for training large language models that balances the text distribution by clustering and clipping repetitive samples to improve performance.
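A minimal sketch of the cluster-then-clip idea: weight examples so every cluster contributes equal sampling mass, then cap the upsampling so repetitive samples are clipped. The weighting scheme is illustrative, not the paper's exact strategy:

```python
import numpy as np

def clusterclip_weights(cluster_ids, clip=4.0):
    """Per-example sampling weights that equalize total mass across clusters,
    capped so examples from tiny clusters are not repeated excessively."""
    ids = np.asarray(cluster_ids)
    _, inverse, counts = np.unique(ids, return_inverse=True,
                                   return_counts=True)
    w = 1.0 / counts[inverse]             # each cluster gets equal total mass
    w = np.minimum(w, clip * w.mean())    # clip the per-example repetition
    return w / w.sum()
```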
http://arxiv.org/abs/2402.14523v1
Compressor summary: The paper proposes Daisy-TTS, an emotional text-to-speech system that uses a prosody encoder to simulate a wide range of emotions based on the structural model of emotions.
http://arxiv.org/abs/2402.14522v1
Compressor summary: FUTE is a framework that unifies task embeddings from different language models, including prompt-based LLMs, enabling comparison and analysis of similarities across models.
http://arxiv.org/abs/2402.14521v1
Compressor summary: The paper describes the construction and validation of a new annotated dataset for natural language processing tasks in Malaysian English, which improved named entity recognition performance.
http://arxiv.org/abs/2402.14505v1
Compressor summary: The paper proposes a method to adapt pre-trained models for visual place recognition using hybrid adaptation and mutual nearest neighbor local feature loss, achieving better performance with less data and time than existing methods.
http://arxiv.org/abs/2402.14499v1
Compressor summary: The paper evaluates how well multiple-choice questions reflect a large language model's actual responses when interacting with users, finding significant misalignment between them.
http://arxiv.org/abs/2402.14494v1
Compressor summary: Noise-BERT is a framework that uses pre-training tasks and fine-tuning techniques to improve slot filling performance under input perturbations.
http://arxiv.org/abs/2402.14492v1
Compressor summary: INSTRAUG is a method that automatically expands instruction-following datasets for multimodal tasks, improving the performance of large language models.
http://arxiv.org/abs/2402.14490v1
Compressor summary: Equilibrium K-means (EKM) is a novel algorithm that improves clustering results for imbalanced data by alternating between two steps, reducing centroid crowding while remaining competitive on balanced data.
http://arxiv.org/abs/2402.14488v1
Compressor summary: The study presents a new generator to produce accurate information despite changes in contextual knowledge and finds that all models tend to generate previous answers as hallucinations.
http://arxiv.org/abs/2402.14484v1
Compressor summary: This study evaluates ChatGPT's causal text mining capabilities across various datasets, finding that while it performs well as a starting point, previous models and its own advanced versions outperform it in some aspects.
http://arxiv.org/abs/2402.14482v1
Compressor summary: SpanSeq is a database partition method that helps deep learning models in computational biology generalize better by avoiding data leakage between sets.
http://arxiv.org/abs/2402.14481v1
Compressor summary: AutoCD is a system that automatically finds and explains causes in various data types using causal methods.
http://arxiv.org/abs/2402.14475v1
Compressor summary: The paper proposes new methods for learning unknown SDEs from low-resolution and variable time-step data using robust density approximations.
http://arxiv.org/abs/2402.14474v1
Compressor summary: Large language models can enhance interpretable machine learning models like Generalized Additive Models, enabling better dataset analysis and interaction with domain experts.
http://arxiv.org/abs/2402.14469v1
Compressor summary: For each detected anomaly, the method generates several modified versions of the image showing what would have to change to make it normal, helping users understand why the detector flagged it as abnormal.
http://arxiv.org/abs/2402.14464v1
Compressor summary: NeRF-Det++ improves NeRF-Det by addressing semantic ambiguity, inappropriate sampling, and insufficient depth supervision in 3D detection using segmentation maps, perspective-aware sampling, and ordinal residual depth supervision.
http://arxiv.org/abs/2402.14461v1
Compressor summary: The paper proposes a new single-stage transformer framework (S^2Former-OR) that uses multi-view 2D scenes and 3D point clouds to generate semantic scene graphs for surgical procedures in the operating room, improving efficiency and accuracy.
http://arxiv.org/abs/2402.14460v1
Compressor summary: This paper explores how to derive different formulations of active inference from a single expected free energy definition and examines the conditions under which these formulations can be recovered in two settings.
http://arxiv.org/abs/2402.14458v1
Compressor summary: This paper presents an automatic method to generate arguments in multiple languages, a large corpus of argument schemes, and models to identify them.
http://arxiv.org/abs/2402.14457v1
Compressor summary: The paper introduces a new way to label different parts of Terms-and-Conditions contracts and tests its effectiveness using few-shot learning with LLMs.
http://arxiv.org/abs/2402.14456v1
Compressor summary: VLPose is a framework that uses language models to improve human pose estimation in both natural and artificial scenarios, enhancing adaptability and robustness.
http://arxiv.org/abs/2402.14454v1
Compressor summary: CCPA is a framework that uses clothing and pose transfer to improve long-term person re-identification by capturing body shape information with a relation graph attention network.
http://arxiv.org/abs/2402.14453v1
Compressor summary: Large language models can automatically adjust text difficulty for individual learners in conversation, even outperforming humans in some cases.
http://arxiv.org/abs/2402.14433v1
Compressor summary: The paper explores how to control language models using different types of concepts beyond truthfulness, developing a new metric and showing challenges in eliciting and guiding with some novel concepts.
http://arxiv.org/abs/2402.14428v1
Compressor summary: The paper introduces KoCoSa, a new Korean dialogue sarcasm detection dataset, and proposes an efficient pipeline to build it, which outperforms strong baselines like GPT-3.5.
http://arxiv.org/abs/2402.14427v1
Compressor summary: The Text-to-Pressure framework generates ground pressure sequences from textual descriptions of human activities using deep learning, enabling efficient human activity recognition models with less costly data acquisition.
http://arxiv.org/abs/2402.14424v1
Compressor summary: The study combines a large language model with causal knowledge graphs to generate novel hypotheses in psychology, outperforming both human experts and the LLM alone.
http://arxiv.org/abs/2402.14418v1
Compressor summary: The paper proposes a benchmark to evaluate vision-language models with uncertainty quantification and finds that uncertainty levels vary with accuracy and language model components.
http://arxiv.org/abs/2402.14415v1
Compressor summary: The paper introduces TaylorGrid, a novel method for efficient and high-quality 3D geometry representation using direct Taylor expansion optimization on grids.
http://arxiv.org/abs/2402.14411v1
Compressor summary: J-UniMorph is a Japanese Morphology dataset that covers more verb forms and linguistic nuances than the existing Wiktionary Edition.
http://arxiv.org/abs/2402.14409v1
Compressor summary: The paper investigates how retrieval-augmented language models (RALMs) resolve knowledge conflicts, presents an evaluation framework for assessing such conflicts, finds that RALMs exhibit biases and preferences that affect their conflict resolution, and proposes a method called CD2 to improve their confidence calibration.
http://arxiv.org/abs/2402.14408v1
Compressor summary: This paper proposes a method to use BERT models for low-resource languages by matching their vocabulary with high-resource ones and shows its effectiveness on Silesian and Kashubian languages.
http://arxiv.org/abs/2402.14407v1
Compressor summary: The paper proposes a method to train robots using human videos without action labels and limited robot data, by combining generative pre-training and policy fine-tuning with discrete diffusion models.
http://arxiv.org/abs/2402.14404v1
Compressor summary: The study explores how re-purposing the reverse dictionary task can help understand large language models' conceptual inference and reasoning abilities, which are linked to their general reasoning performance across multiple benchmarks.
http://arxiv.org/abs/2402.14402v1
Compressor summary: The paper proposes a transferable safe learning method that uses source knowledge to explore multiple disjoint safe regions and learn tasks faster with less data consumption.
http://arxiv.org/abs/2402.14401v1
Compressor summary: The paper proposes a diffusion model-based method for No-Reference Image Quality Assessment (NR-IQA) that leverages both high-level and low-level features to improve performance on seven public datasets.
http://arxiv.org/abs/2402.14400v1
Compressor summary: The paper proposes a data-driven method to predict infants' neurodevelopmental maturation using 3D video recordings and graph convolutional networks, outperforming traditional machine learning approaches.
http://arxiv.org/abs/2402.14398v1
Compressor summary: The paper proposes a novel dual-stream framework for image attribute editing that gradually injects lost details into the reconstruction and editing process, improving detail preservation and editability.
http://arxiv.org/abs/2402.14395v1
Compressor summary: The authors propose a semantic image synthesis method that generates realistic images from semantic masks using a pre-trained unconditional generator and a feature rearranger, with proxy masks created by clustering feature maps and a semantic mapper that produces them from different input conditions, enabling applications such as sketch-to-photo.
http://arxiv.org/abs/2402.14393v1
Compressor summary: The Graph Parsing Network (GPN) adaptively learns personalized pooling structures for graphs, achieving better performance in graph classification tasks and preserving node information.
http://arxiv.org/abs/2402.14392v1
Compressor summary: The paper proposes a novel tracking paradigm that uses a relevance attention mechanism and a global representation memory to adaptively select relevant historical information for different search regions in videos, improving tracking performance and reducing redundancy.
http://arxiv.org/abs/2402.14391v1
Compressor summary: Microenvironment-Aware Protein Embedding for PPI prediction (MPAE-PPI) is a novel method that encodes protein microenvironments into discrete codes, using a large codebook and a pre-training strategy called Masked Codebook Modeling (MCM), to predict protein-protein interactions efficiently and effectively.
http://arxiv.org/abs/2402.14389v1
Compressor summary: The paper presents a hybrid ensemble model that combines multiple ML algorithms with optimized weights and uses the IHT technique with LR to address data imbalance, achieving high accuracy in credit card fraud detection on a public dataset and outperforming existing works.
http://arxiv.org/abs/2402.14385v1
Compressor summary: The paper proposes an innovative approach for accurate short-term national wind power forecasting using deep learning and weather predictions.
http://arxiv.org/abs/2402.14384v1
Compressor summary: The paper proposes a fast and accurate 1D DCGAN method for detecting anomalies in energy consumption data using soft-DTW and parallel computation.
http://arxiv.org/abs/2402.14382v1
Compressor summary: Chain-of-History (CoH) reasoning improves LLM-based TKG forecasting by leveraging high-order historical information and enhancing graph-based models' performance.
http://arxiv.org/abs/2402.14379v1
Compressor summary: The paper discusses transformer-based language models for Serbian, comparing ten vectorization models on four NLP tasks and exploring factors that influence their performance.
http://arxiv.org/abs/2402.14373v1
Compressor summary: The paper proposes SLCoLM, a framework that combines pre-trained language models and large language models to improve relational extraction for long-tailed data.
http://arxiv.org/abs/2402.14371v1
Compressor summary: The text introduces HR-APR, a framework that estimates uncertainty in camera pose regression from monocular images without relying on specific APR architectures, and uses it to improve performance.
http://arxiv.org/abs/2402.14367v1
Compressor summary: SPMiner is a novel neural approach for finding frequent subgraphs in large networks using graph neural networks, order embedding space, and an efficient search strategy.
http://arxiv.org/abs/2402.14361v1
Compressor summary: OpenTab is an open-domain table reasoning framework that uses a retriever to fetch relevant tables and generates SQL programs to parse them, achieving high accuracy in inferencing tasks.
http://arxiv.org/abs/2402.14359v1
Compressor summary: The paper proposes a facet-aware metric and dataset to evaluate scientific summarization by large language models, highlighting their limitations and introducing a more logical approach.
http://arxiv.org/abs/2402.14355v1
Compressor summary: The paper explores how stories are better than rules for retrieving and using commonsense in large language models, and shows that improving stories with self-supervised fine-tuning enhances their effectiveness.
http://arxiv.org/abs/2402.14354v1
Compressor summary: The paper proposes GAM-Depth, a method for indoor depth estimation that uses gradient-aware masks and semantic constraints to handle textureless areas and object boundaries better than existing methods.
http://arxiv.org/abs/2402.14346v1
Compressor summary: DepL is a framework that ensures high-quality and efficient learning by making smart decisions on data, models, and resources, while considering both average and distribution of learning quality.
http://arxiv.org/abs/2402.14345v1
Compressor summary: The paper proposes an accelerated Visual SLAM method that combines GMS and RANSAC to remove mismatched features and rank high-confidence matches for faster performance while maintaining accuracy.
http://arxiv.org/abs/2402.14340v1
Compressor summary: TIE-KD is a new framework that improves monocular depth estimation by streamlining knowledge transfer from complex teacher models to compact student networks using explainable feature maps.
http://arxiv.org/abs/2402.14337v1
Compressor summary: The authors propose a method for improving language models' reasoning skills by adapting to imperfect rationales, which can handle uncertainty and perform better in challenging scenarios.
http://arxiv.org/abs/2402.14335v1
Compressor summary: HyperFast is a meta-trained hypernetwork for instant classification of tabular data: it generates a task-specific neural network tailored to an unseen dataset in a single forward pass, without training, matching or outperforming other methods while being much faster and more adaptable.
http://arxiv.org/abs/2402.14334v1
Compressor summary: The text discusses the need for better understanding of users' intentions and preferences in information retrieval tasks, and introduces a new benchmark (INSTRUCTIR) to evaluate how well retrievers can follow user-aligned instructions.
http://arxiv.org/abs/2402.14332v1
Compressor summary: The paper proposes a method to efficiently select the best clustering algorithm from a set of candidates by subsampling the data and evaluating accuracy on smaller instances, with theoretical and empirical results supporting this approach.
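A minimal sketch of the subsample-then-evaluate idea using scikit-learn; the candidate set, subsample size, and the silhouette criterion are illustrative stand-ins for the paper's accuracy measure:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

def select_by_subsampling(X, candidates, n_sub=500, n_trials=5, seed=0):
    """Score each candidate clustering algorithm on small random subsamples
    and return the name of the best-scoring one on average."""
    rng = np.random.default_rng(seed)
    scores = {name: [] for name in candidates}
    for _ in range(n_trials):
        idx = rng.choice(len(X), size=min(n_sub, len(X)), replace=False)
        Xs = X[idx]
        for name, make in candidates.items():
            labels = make().fit_predict(Xs)
            scores[name].append(silhouette_score(Xs, labels))
    return max(scores, key=lambda name: np.mean(scores[name]))

# Hypothetical candidate set, k fixed at 5 for illustration:
candidates = {
    "kmeans": lambda: KMeans(n_clusters=5, n_init=10),
    "agglomerative": lambda: AgglomerativeClustering(n_clusters=5),
}
```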
http://arxiv.org/abs/2402.14328v1
Compressor summary: The paper investigates why large language models struggle with compositional reasoning tasks and proposes CREME, a method that edits MHSA modules to improve their reasoning abilities.
http://arxiv.org/abs/2402.14327v1
Compressor summary: The paper proposes a subobject-level image tokenization method for vision language tasks, which improves efficiency over patch-based methods.
http://arxiv.org/abs/2402.14320v1
Compressor summary: Triad is a novel KBQA framework that leverages an LLM-based agent with three roles to achieve superior performance on various tasks compared to existing systems.
http://arxiv.org/abs/2402.14318v1
Compressor summary: The article evaluates various rerankers for retrieval-augmented generation in Polish and shows that effective optimization and large training data lead to compact, generalizable, and state-of-the-art models.
http://arxiv.org/abs/2402.14316v1
Compressor summary: Place-Anything is a system that lets you insert any object into any video using just a picture or text description of the target element.
http://arxiv.org/abs/2402.14314v1
Compressor summary: The paper presents a new typographic text generation system that combines ControlNet and Blended Latent Diffusion to generate and manipulate text on designs with specified font styles, colors, and effects.
http://arxiv.org/abs/2402.14313v1
Compressor summary: The paper proposes pairwise and set-wise machine-learning models to automate kerning, a task of adjusting horizontal spaces between letter pairs in fonts, and shows that the set-wise model is more efficient and accurate.
http://arxiv.org/abs/2402.14311v1
Compressor summary: The paper proposes three methods to create new font styles by blending two reference fonts using diffusion models, and evaluates their effectiveness in generating both expected and surprising styles.
http://arxiv.org/abs/2402.14310v1
Compressor summary: Hint-before-Solving Prompting (HSP) improves the accuracy of reasoning tasks for large language models by guiding them to generate hints and intermediate steps.
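A minimal sketch of the two-stage flow; `call_llm` is a placeholder for any chat-completion client, and the prompt wording is an assumption rather than the paper's template:

```python
def hint_before_solving(call_llm, problem: str) -> str:
    """Two calls: first elicit a hint (key concepts, no answer),
    then solve the problem conditioned on that hint."""
    hint = call_llm(
        "Give a brief hint (key concepts and a first step, but no final "
        f"answer) for solving this problem:\n{problem}"
    )
    return call_llm(
        f"Problem:\n{problem}\n\nHint:\n{hint}\n\n"
        "Using the hint, reason step by step and state the final answer."
    )
```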
http://arxiv.org/abs/2402.14309v1
Compressor summary: YOLO-TLA is an improved object detection model that uses a larger scale feature map for small objects, a compact backbone network, and global attention to achieve higher accuracy in detecting small objects with fewer parameters and lower computational demand.
http://arxiv.org/abs/2402.14300v1
Compressor summary: SimICL is a novel visual in-context learning method that achieves high segmentation performance on wrist ultrasound images with limited annotations, potentially reducing human expert labeling time and improving AI-assisted image analysis.
http://arxiv.org/abs/2402.14298v1
Compressor summary: This paper proposes a new method for detecting public opinion from tweets with texts and images, using target information to learn features from both modalities.
http://arxiv.org/abs/2402.14296v1
Compressor summary: The paper proposes MB-Cal, a gated calibration network that uses counterfactual augmented data to mitigate spurious sentiment-stance correlations and preference biases in large language models' stance reasoning, achieving state-of-the-art results in stance detection.
http://arxiv.org/abs/2402.14294v1
Compressor summary: The paper develops a theory for learning with structured correlation using graphs, hypergraphs, and relational languages, and shows how high-arity PAC learnability depends on combinatorial dimension and uniform convergence.
http://arxiv.org/abs/2402.14293v1
Compressor summary: The study explores how Large Language Models can help in educational scenarios by using concept graphs and answering questions, and introduces a new benchmark called TutorQA.
http://arxiv.org/abs/2402.14290v1
Compressor summary: CEV-LM is a new language model that can adjust the pace, length, and complexity of text generation by controlling three metrics (speed, volume, and circuitousness).
http://arxiv.org/abs/2402.14289v1
Compressor summary: The TinyLLaVA framework explores how to design and analyze small-scale LMMs by experimenting with different components and training methods, achieving comparable performance to larger models.
http://arxiv.org/abs/2402.14281v1
Compressor summary: The LAVN dataset provides RGB observations and human point-click pairs for supervised learning of map building and exploration in real and virtual environments, with landmarks to simplify graph construction.