arxiv compressed, 2024-02-23

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-23, generated by the compressor, my personal LLM-based project.


PALO: A Polyglot Large Multimodal Model for 5B People

http://arxiv.org/abs/2402.14818v1

Compressor summary: The study presents Palo, a large multilingual multimodal model that enables visual reasoning in 10 major languages, using a semi-automated translation approach and showing improvements over baselines.


Cameras as Rays: Pose Estimation via Ray Diffusion

http://arxiv.org/abs/2402.14817v1

Compressor summary: The paper proposes a novel way of estimating camera poses using rays instead of global parametrizations, which improves accuracy and generalizes better to different scenarios.
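
To make the ray parametrization concrete, here is a minimal numpy sketch (not the paper's code) that converts a standard pinhole camera into a bundle of Plücker rays; the intrinsics K and world-to-camera pose (R, t) are assumed inputs:

```python
# A minimal sketch of representing a camera as a bundle of Pluecker rays,
# the kind of per-pixel parametrization used in place of global extrinsics.
import numpy as np

def camera_to_rays(K, R, t, pixels):
    """Map pixel coordinates (N, 2) to Pluecker rays (N, 6)."""
    o = -R.T @ t                                  # camera center in world frame
    pix_h = np.concatenate([pixels, np.ones((len(pixels), 1))], axis=1)
    d = (R.T @ np.linalg.inv(K) @ pix_h.T).T      # back-project to world directions
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    m = np.cross(np.broadcast_to(o, d.shape), d)  # moment encodes the ray's position
    return np.concatenate([d, m], axis=1)

# Example: rays through two pixels of a toy camera at the world origin.
K = np.array([[100.0, 0, 64], [0, 100.0, 64], [0, 0, 1]])
rays = camera_to_rays(K, np.eye(3), np.zeros(3), np.array([[32.0, 32.0], [96.0, 96.0]]))
print(rays.shape)  # (2, 6)
```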


Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

http://arxiv.org/abs/2402.14811v1

Compressor summary: Fine-tuning language models on generalized tasks improves their performance on entity tracking by enhancing their ability to handle positional information.


WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition

http://arxiv.org/abs/2402.14812v1

Compressor summary: WeakSAM is a novel method that uses pre-learned world knowledge from SAM to significantly improve weakly supervised object detection (WSOD) and instance segmentation (WSIS) through adaptive pseudo ground truth generation, RoI drop regularization, and category awareness.


GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion

http://arxiv.org/abs/2402.14810v1

Compressor summary: The paper proposes a novel denoising method, GeneOH Diffusion, that refines incorrect hand trajectories in noisy hand-object interaction sequences using a contact-centric representation and a domain-generalizable denoising scheme.


CriticBench: Benchmarking LLMs for Critique-Correct Reasoning

http://arxiv.org/abs/2402.14809v1

Compressor summary: CriticBench is a benchmark to evaluate how well large language models can critique, refine, and improve their own reasoning across various tasks and domains.


RelayAttention for Efficient Large Language Model Serving with Long System Prompts

http://arxiv.org/abs/2402.14808v1

Compressor summary: The paper proposes RelayAttention, an efficient attention algorithm that reduces redundant memory accesses for long system prompts in large language models without compromising generation quality.
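
The saving comes from the fact that attention over a concatenated KV cache can be computed segment by segment and merged exactly, so the shared system-prompt segment only needs to be read from memory once per batch. A minimal numpy sketch of that merge identity (assumed notation, not the paper's implementation):

```python
import numpy as np

def attn_segment(q, K, V):
    """Attention output over one KV segment, plus its log-sum-exp weight."""
    s = q @ K.T / np.sqrt(q.shape[-1])
    lse = np.log(np.exp(s).sum())
    return np.exp(s - lse) @ V, lse

def merged_attention(q, K_sys, V_sys, K_ctx, V_ctx):
    out_sys, lse_sys = attn_segment(q, K_sys, V_sys)   # shared system prompt
    out_ctx, lse_ctx = attn_segment(q, K_ctx, V_ctx)   # per-request context
    w_sys = 1.0 / (1.0 + np.exp(lse_ctx - lse_sys))    # softmax over the two LSEs
    return w_sys * out_sys + (1.0 - w_sys) * out_ctx

# The merge is exact: it matches attention over the concatenated KV cache.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
K, V = rng.normal(size=(7, 8)), rng.normal(size=(7, 4))
full, _ = attn_segment(q, K, V)
assert np.allclose(full, merged_attention(q, K[:3], V[:3], K[3:], V[3:]))
```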


Difference Learning for Air Quality Forecasting Transport Emulation

http://arxiv.org/abs/2402.14806v1

Compressor summary: The authors propose a deep learning method to reduce computational requirements and maintain skill in air quality forecasting, especially during extreme events.


Identifying Multiple Personalities in Large Language Models with External Evaluation

http://arxiv.org/abs/2402.14805v1

Compressor summary: The paper proposes an external evaluation method to measure LLM personalities using situational questions and shows that LLMs can have different personalities depending on the scenario, unlike humans.


Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset

http://arxiv.org/abs/2402.14804v1

Compressor summary: MATH-Vision is a new dataset with diverse mathematical problems from real competitions that tests the limits of Large Multimodal Models' reasoning skills in visual contexts.


Link Prediction under Heterophily: A Physics-Inspired Graph Neural Network Approach

http://arxiv.org/abs/2402.14802v1

Compressor summary: GRAFF-LP improves link prediction performance on heterophilic graphs by adopting physics biases in the message-passing mechanism of Graph Neural Networks.


Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

http://arxiv.org/abs/2402.14800v1

Compressor summary: This paper introduces new techniques to make large language models with MoE more efficient and faster without sacrificing performance.


Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic

http://arxiv.org/abs/2402.14798v1

Compressor summary: The paper proposes a new protocol for annotating decompositional entailment datasets, which leads to improved textual inference performance and consistency in LLM-based systems.


Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

http://arxiv.org/abs/2402.14797v1

Compressor summary: Snap Video is a video-first generation model that leverages EDM and transformer architectures to overcome the quality and scalability limits of repurposed image models, producing state-of-the-art, temporally consistent videos that users prefer over existing methods.


Consolidating Attention Features for Multi-view Image Editing

http://arxiv.org/abs/2402.14792v1

Compressor summary: QNeRF is a method for improving geometric consistency in multi-view image editing by using a neural radiance field trained on query features and injecting them back into the self-attention layers during generation.


Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning

http://arxiv.org/abs/2402.14789v1

Compressor summary: SMA is a domain-agnostic method for self-supervised learning that uses attention to learn masks for sampling, achieving state-of-the-art results on three benchmarks in different domains.


Rao-Blackwellising Bayesian Causal Inference

http://arxiv.org/abs/2402.14781v1

Compressor summary: The paper proposes a new framework for Bayesian causal inference that combines order-based MCMC structure learning, gradient-based graph learning, and Gaussian processes to exactly marginalize over causal models and outperforms existing methods on various benchmarks.


Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

http://arxiv.org/abs/2402.14780v1

Compressor summary: The paper presents Customize-A-Video, a method for one-shot motion customization from reference videos using low-rank adaptation and appearance absorbers, applicable to various video tasks.


Zero-shot cross-lingual transfer in instruction tuning of large language model

http://arxiv.org/abs/2402.14778v1

Compressor summary: This paper studies how well pretrained language models can follow instructions in different languages after being trained only on English data, and suggests ways to improve their performance.


2D Matryoshka Sentence Embeddings

http://arxiv.org/abs/2402.14776v1

Compressor summary: The paper introduces 2DMSE, a sentence embedding model that adjusts embedding size and Transformer layers for flexibility and efficiency in downstream tasks.
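
A minimal sketch of the idea, under the assumption that a "2D" Matryoshka embedding is read off by choosing both a Transformer layer and an embedding prefix length, then re-normalizing (placeholder activations stand in for a real encoder):

```python
import torch

def matryoshka_2d(hidden_states, layer: int, dim: int):
    """hidden_states: list of (batch, seq, width) tensors, one per layer."""
    emb = hidden_states[layer].mean(dim=1)       # mean-pool tokens at that layer
    emb = emb[:, :dim]                           # keep only the first `dim` dims
    return torch.nn.functional.normalize(emb, dim=-1)

# Example with dummy activations from a 12-layer, 768-wide encoder.
states = [torch.randn(4, 16, 768) for _ in range(13)]   # embeddings + 12 layers
small = matryoshka_2d(states, layer=6, dim=128)          # cheap: half depth, 128 dims
full  = matryoshka_2d(states, layer=12, dim=768)         # full-size embedding
print(small.shape, full.shape)  # torch.Size([4, 128]) torch.Size([4, 768])
```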


DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

http://arxiv.org/abs/2402.14767v1

Compressor summary: DualFocus is a framework that integrates macro and micro perspectives in multi-modal large language models to enhance vision-language task performance by first viewing the image globally, then zooming in on relevant sub-regions for detailed analysis before answering questions.


MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

http://arxiv.org/abs/2402.14762v1

Compressor summary: MT-Bench-101 is a benchmark for evaluating large language models' dialogue skills, revealing their strengths and weaknesses across 13 tasks and showing that current methods do not significantly improve their performance.


Generalizing Reward Modeling for Out-of-Distribution Preference Learning

http://arxiv.org/abs/2402.14760v1

Compressor summary: The paper proposes a meta-learning approach to optimize a reward model for out-of-distribution preference learning with large language models.


Generalising realisability in statistical learning theory under epistemic uncertainty

http://arxiv.org/abs/2402.14759v1

Compressor summary: The paper explores how key statistical learning concepts apply when train and test data come from the same uncertain probability distribution.


SHM-Traffic: DRL and Transfer learning based UAV Control for Structural Health Monitoring of Bridges with Traffic

http://arxiv.org/abs/2402.14757v1

Compressor summary: The paper presents an approach using deep reinforcement learning-based control for Unmanned Aerial Vehicles to conduct concrete bridge deck surveys with traffic, detecting cracks using edge detection techniques or a Convolutional Neural Network.


Prompting a Pretrained Transformer Can Be a Universal Approximator

http://arxiv.org/abs/2402.14753v1

Compressor summary: The paper shows that prompting and prefix-tuning can make smaller pretrained models act like any sequence-to-sequence function, with attention heads being especially powerful for this purpose.


Scaling Efficient LLMs

http://arxiv.org/abs/2402.14746v1

Compressor summary: The paper studies efficient large language models (LLMs), showing that model parameters and natural training corpus size scale together and that increasing corpus size can reveal new LLM skills.


Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation

http://arxiv.org/abs/2402.14744v1

Compressor summary: The paper proposes a new way to use large language models in agent frameworks for creating personalized and realistic urban mobility scenarios, by aligning them with rich activity data and developing strategies to generate reliable activities.


Dependency Annotation of Ottoman Turkish with Multilingual BERT

http://arxiv.org/abs/2402.14743v1

Compressor summary: The study presents a method to create an Ottoman Turkish dependency treebank using a BERT-based model and manual corrections, which will enable automated analysis of historical documents.


Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

http://arxiv.org/abs/2402.14740v1

Compressor summary: The text discusses the use of Reinforcement Learning from Human Feedback (RLHF) for AI alignment in large language models and proposes simpler optimization methods over PPO for better performance.


How Transformers Learn Causal Structure with Gradient Descent

http://arxiv.org/abs/2402.14735v1

Compressor summary: The paper analyzes how transformers learn latent causal structure via gradient descent on self-attention, showing that they encode induction heads that enable in-context learning on Markov chains.


Clifford-Steerable Convolutional Neural Networks

http://arxiv.org/abs/2402.14730v1

Compressor summary: CS-CNNs are $\mathrm{E}(p, q)$-equivariant CNNs that process multivector fields on pseudo-Euclidean spaces, achieving better results than baseline methods in fluid dynamics and relativistic electrodynamics tasks.


Incorporating Expert Rules into Neural Networks in the Framework of Concept-Based Learning

http://arxiv.org/abs/2402.14726v1

Compressor summary: The paper proposes methods to combine expert rules with neural networks for concept-based learning, ensuring that output probabilities follow the rules and offering a way to integrate inductive and deductive learning.


A Transformer Model for Boundary Detection in Continuous Sign Language

http://arxiv.org/abs/2402.14720v1

Compressor summary: The authors propose a Transformer-based model for Continuous Sign Language Recognition that eliminates the need for handcrafted features and improves accuracy in detecting sign boundaries.


Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models

http://arxiv.org/abs/2402.14714v1

Compressor summary: The report introduces EEVE-Korean-v1.0, a Korean adaptation of large language models that improves non-English text understanding with efficient and effective vocabulary expansion.


IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus

http://arxiv.org/abs/2402.14710v1

Compressor summary: IEPile is a large bilingual IE instruction corpus that improves LLMs' performance in Information Extraction, especially zero-shot generalization.


CaT-GNN: Enhancing Credit Card Fraud Detection via Causal Temporal Graph Neural Networks

http://arxiv.org/abs/2402.14708v1

Compressor summary: The paper proposes a novel causal temporal graph neural network (CaT-GNN) for credit card fraud detection that leverages causal invariant learning, temporal attention, and causal mixup to enhance robustness and interpretability.


Two-stage Cytopathological Image Synthesis for Augmenting Cervical Abnormality Screening

http://arxiv.org/abs/2402.14707v1

Compressor summary: The paper proposes a two-stage image synthesis framework to create realistic synthetic cytopathological images for augmenting cervical abnormality screening using Stable Diffusion and parameter-efficient fine-tuning methods.


An LLM-Enhanced Adversarial Editing System for Lexical Simplification

http://arxiv.org/abs/2402.14704v1

Compressor summary: The paper introduces a new lexical simplification method that uses an adversarial editing system and knowledge distillation from large language models to simplify texts without needing annotated data or parallel corpora.


On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

http://arxiv.org/abs/2402.14703v1

Compressor summary: The paper proposes novel coverage assumptions for off-policy evaluation in partially observable environments, leading to polynomial bounds and new algorithms.


InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks

http://arxiv.org/abs/2402.14702v1

Compressor summary: The paper introduces InfFeed, a method that uses influence functions to improve deep neural models' performance and reduce the need for manual annotation in datasets by identifying influential instances.


COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling

http://arxiv.org/abs/2402.14701v1

Compressor summary: COMPASS is a novel framework that uses advanced language models to infer the therapeutic working alliance from psychotherapy session transcripts, improving understanding of therapeutic interactions and feedback for therapists.


Unveiling Linguistic Regions in Large Language Models

http://arxiv.org/abs/2402.14700v1

Compressor summary: This paper investigates how large language models achieve cross-lingual alignment and identifies a core linguistic competence region that is essential for their performance across 30 languages.


Big data analytics to classify earthwork-related locations: A Chengdu study

http://arxiv.org/abs/2402.14698v1

Compressor summary: The study uses big data analytics to classify urban dust pollution sources at earthwork-related locations in Chengdu and develops a system to help control them.


QIS : Interactive Segmentation via Quasi-Conformal Mappings

http://arxiv.org/abs/2402.14695v1

Compressor summary: Quasi-conformal interactive segmentation (QIS) is a model that uses user clicks to guide the segmentation of degraded images, improving accuracy by adjusting a template mask with an orientation-preserving mapping.


UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models

http://arxiv.org/abs/2402.14690v1

Compressor summary: The text introduces a new evaluation framework (UFO) for assessing the accuracy of large language models across various text generation tasks, using different fact sources like human-written evidence and search engine results.


Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

http://arxiv.org/abs/2402.14688v1

Compressor summary: Q-probing is a method to adapt pre-trained language models for specific tasks using reweighted sampling based on task-specific reward functions, which can improve performance in various domains and data regimes.
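
A minimal sketch of reweighted sampling with a linear probe, the general recipe Q-probing follows; the `generate` and `embed` helpers and the probe weights `w` are hypothetical stand-ins for a frozen language model and a trained probe head:

```python
import numpy as np

def q_probe_sample(prompt, generate, embed, w, k=8, beta=1.0, rng=None):
    """Draw k candidates from a frozen LM, then sample one by probe value."""
    rng = rng or np.random.default_rng()
    candidates = [generate(prompt) for _ in range(k)]       # k draws from the LM
    values = np.array([embed(c) @ w for c in candidates])   # linear probe scores
    p = np.exp(beta * (values - values.max()))
    p /= p.sum()                                            # softmax over probe values
    return candidates[rng.choice(k, p=p)]
```

Only the small probe head is task-specific; the base model stays untouched, which is what makes the approach lightweight.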


Visual Hallucinations of Multi-modal Large Language Models

http://arxiv.org/abs/2402.14683v1

Compressor summary: The authors propose a tool called VHTest that generates diverse visual hallucination instances for multi-modal large language models and use it to create a benchmark dataset.


Is Cognition and Action Consistent or Not: Investigating Large Language Model's Personality

http://arxiv.org/abs/2402.14679v1

Compressor summary: The study examines how well Large Language Models can show human-like personality traits through their responses to questionnaires and their actual behavior, using established benchmarks to understand any differences.


Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

http://arxiv.org/abs/2402.14672v1

Compressor summary: The paper explores how tools can help large language models process complex environments like databases and knowledge bases, improving their performance significantly.


Quadruplet Loss For Improving the Robustness to Face Morphing Attacks

http://arxiv.org/abs/2402.14665v1

Compressor summary: The study proposes a new loss function and training method to improve face recognition systems' resistance to face morphing attacks.


Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

http://arxiv.org/abs/2402.14664v1

Compressor summary: sDM is a Bayesian method that uses structured priors to capture action correlations for more efficient off-policy evaluation and learning in interactive systems.


ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

http://arxiv.org/abs/2402.14660v1

Compressor summary: ConceptMath is a benchmark that evaluates how well large language models can reason about different math concepts and helps improve them.


Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

http://arxiv.org/abs/2402.14654v1

Compressor summary: Multi-HMR is a single-shot model that recovers whole-body 3D human meshes for multiple people from an RGB image using Vision Transformer features and a novel Human Prediction Head module, achieving state-of-the-art results with the new CUFFS dataset across various backbone sizes.


Cleaner Pretraining Corpus Curation with Neural Web Scraping

http://arxiv.org/abs/2402.14652v1

Compressor summary: The paper introduces NeuScraper, a neural network-based web scraper that improves text extraction quality for language model pretraining.


GaussianPro: 3D Gaussian Splatting with Progressive Propagation

http://arxiv.org/abs/2402.14650v1

Compressor summary: The paper proposes GaussianPro, a novel method to improve neural rendering using progressive propagation and patch matching techniques for better initialization of 3D Gaussians.


Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off

http://arxiv.org/abs/2402.14648v1

Compressor summary: AR-AT improves adversarial robustness and accuracy by addressing gradient conflict and mixture distribution issues in representation-based invariance regularization.


CoLoRA: Continuous low-rank adaptation for reduced implicit neural modeling of parameterized partial differential equations

http://arxiv.org/abs/2402.14646v1

Compressor summary: CoLoRA is a fast and accurate method to predict solutions of partial differential equations using pre-trained neural networks that adapt low-rank weights in time.
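
A minimal sketch (architecture details are assumptions, not the paper's) of a linear layer whose pre-trained weight receives a low-rank correction with time-dependent coefficients, the kind of continuous adaptation the summary describes:

```python
import torch
import torch.nn as nn

class CoLoRALinear(nn.Module):
    """Pre-trained weight W0 plus a rank-r update whose strength varies with time t."""
    def __init__(self, in_dim, out_dim, rank=4):
        super().__init__()
        self.W0 = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.02)
        self.B = nn.Parameter(torch.randn(out_dim, rank) * 0.02)
        # small network mapping time t to the rank-r coefficients alpha(t)
        self.alpha = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, rank))

    def forward(self, x, t):
        coeff = self.alpha(t.view(1, 1)).squeeze(0)           # (rank,)
        W = self.W0 + self.B @ torch.diag(coeff) @ self.A     # time-dependent weight
        return x @ W.T

layer = CoLoRALinear(8, 8)
y = layer(torch.randn(5, 8), torch.tensor(0.3))
print(y.shape)  # torch.Size([5, 8])
```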


Sparse Linear Regression and Lattice Problems

http://arxiv.org/abs/2402.14645v1

Compressor summary: The text discusses the average-case hardness of sparse linear regression and presents a reduction from lattice problems to SLR, showing that finding good solutions for SLR is challenging even when the design matrix is well-conditioned.
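
For reference, the standard sparse linear regression formulation the hardness results concern (the usual notation, not necessarily the paper's):

```latex
% Given design matrix X and responses y, find a k-sparse coefficient vector:
\min_{\theta \in \mathbb{R}^d} \; \lVert y - X\theta \rVert_2^2
\quad \text{subject to} \quad \lVert \theta \rVert_0 \le k
```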


Distributed Radiance Fields for Edge Video Compression and Metaverse Integration in Autonomous Driving

http://arxiv.org/abs/2402.14642v1

Compressor summary: The text proposes a new method using distributed radiance fields for efficient video compression and real-time digital twin updates in the metaverse for autonomous vehicles.


latrend: A Framework for Clustering Longitudinal Data

http://arxiv.org/abs/2402.14621v1

Compressor summary: The "latrend" R package is a framework for comparing and implementing longitudinal clustering methods, allowing researchers to easily analyze trends in data over time.


The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations

http://arxiv.org/abs/2402.14616v1

Compressor summary: The paper evaluates how well subword-based word embeddings represent out-of-vocabulary words and their semantic similarity to in-vocabulary words.


Two Counterexamples to "Tokenization and the Noiseless Channel"

http://arxiv.org/abs/2402.14614v1

Compressor summary: The paper examines Rényi efficiency, a proposed metric for evaluating tokenizers, and presents counterexamples showing that it cannot capture all aspects of a good tokenization scheme.
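
For context, the Rényi efficiency being critiqued is, up to notation, the Rényi entropy of the tokenizer's unigram distribution normalized by the log vocabulary size:

```latex
% p is the unigram distribution over the vocabulary V induced by the tokenizer
H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_{w \in V} p(w)^\alpha,
\qquad
\mathrm{Eff}_\alpha(p) = \frac{H_\alpha(p)}{\log \lvert V \rvert}
```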


Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation

http://arxiv.org/abs/2402.14611v1

Compressor summary: This paper explores contrastive learning for medical image analysis, proposes local feature learning and feature decorrelation to overcome dimensional collapse, and shows improved performance in medical segmentation.


High-Speed Detector For Low-Powered Devices In Aerial Grasping

http://arxiv.org/abs/2402.14591v1

Compressor summary: The paper presents Fast Fruit Detector (FFD), a fast and resource-efficient object detection algorithm for autonomous aerial harvesting, along with a new dataset and data generation method for small-sized fruit instances.


FrameNeRF: A Simple and Efficient Framework for Few-shot Novel View Synthesis

http://arxiv.org/abs/2402.14586v1

Compressor summary: FrameNeRF uses a regularization model to generate dense views from sparse inputs, enabling fast high-fidelity NeRF models to perform well in few-shot novel view synthesis tasks.


Bandits with Abstention under Expert Advice

http://arxiv.org/abs/2402.14585v1

Compressor summary: The paper proposes CBA, a new algorithm for prediction with expert advice under bandit feedback, which exploits the option of abstention and achieves better reward bounds than Exp4, especially for specialists.


Debiasing Text-to-Image Diffusion Models

http://arxiv.org/abs/2402.14577v1

Compressor summary: The authors present a method called iterative distribution alignment to address social bias in text-to-image models, which simplifies and improves upon previous approaches.


LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition

http://arxiv.org/abs/2402.14568v1

Compressor summary: The paper proposes LLM-DA, a novel technique for augmenting few-shot NER data using large language models to improve performance and maintain semantic integrity.


Self-supervised Visualisation of Medical Image Datasets

http://arxiv.org/abs/2402.14566v1

Compressor summary: t-SimCNE is a self-supervised learning method that produces semantically meaningful 2D visualizations of medical images using contrastive learning with data augmentations, which can help in data exploration and annotation.


LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey

http://arxiv.org/abs/2402.14558v1

Compressor summary: The paper explores challenges and opportunities in using large language models for various industrial applications by surveying practitioners, analyzing 68 papers, and answering four research questions.


CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion

http://arxiv.org/abs/2402.14551v1

Compressor summary: CLCE combines contrastive learning and cross-entropy loss to improve image model performance, especially in few-shot and transfer learning scenarios, while reducing dependency on large batch sizes.
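
A minimal PyTorch sketch of the general recipe (the weighting and temperature are assumed hyperparameters, not the paper's values): a convex combination of cross-entropy and a supervised contrastive term over normalized features:

```python
import torch
import torch.nn.functional as F

def clce_loss(logits, features, labels, lam=0.5, tau=0.1):
    ce = F.cross_entropy(logits, labels)
    z = F.normalize(features, dim=-1)
    sim = z @ z.T / tau
    sim.fill_diagonal_(float("-inf"))              # exclude self-pairs
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos.fill_diagonal_(False)
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # average log-probability of positives, for anchors that have at least one
    has_pos = pos.any(dim=1)
    supcon = -(log_prob[has_pos] * pos[has_pos]).sum(1) / pos[has_pos].sum(1)
    return lam * ce + (1 - lam) * supcon.mean()

logits, feats = torch.randn(8, 4), torch.randn(8, 32)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = clce_loss(logits, feats, labels)
```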


OmniPred: Language Models as Universal Regressors

http://arxiv.org/abs/2402.14547v1

Compressor summary: OmniPred is a framework that trains language models to accurately predict numeric outcomes for diverse real-world experiments using only textual representations of parameters and values.


Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective

http://arxiv.org/abs/2402.14545v1

Compressor summary: The paper explores how overly detailed training data causes large multimodal models to generate content beyond visual perception limits and proposes two methods to reduce this issue.


Domain Generalization via Causal Adjustment for Cross-Domain Sentiment Analysis

http://arxiv.org/abs/2402.14536v1

Compressor summary: The paper presents a causal model for cross-domain sentiment analysis that disentangles domain-specific and domain-invariant features using backdoor adjustment, outperforming existing methods and remaining robust to unknown target domains.


Whose LLM is it Anyway? Linguistic Comparison and LLM Attribution for GPT-3.5, GPT-4 and Bard

http://arxiv.org/abs/2402.14533v1

Compressor summary: The study analyzes the linguistic styles of three large language models (LLMs) and finds significant variations that can help identify their origin with high accuracy.


A Framework for Variational Inference of Lightweight Bayesian Neural Networks with Heteroscedastic Uncertainties

http://arxiv.org/abs/2402.14532v1

Compressor summary: The paper proposes a method to embed heteroscedastic uncertainty in Bayesian Neural Networks without adding learnable parameters and improves performance with sampling-free variational inference.


Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance

http://arxiv.org/abs/2402.14531v1

Compressor summary: Politeness levels in prompts affect large language models' performance differently across English, Chinese, and Japanese tasks, with the best level varying by language and cultural context.


ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization

http://arxiv.org/abs/2402.14528v1

Compressor summary: ACE is a new RL algorithm that uses causality-aware entropy to explore important actions and dormancy-guided reset to balance exploration and exploitation, achieving better performance on 29 diverse continuous control tasks.


Balanced Data Sampling for Language Model Training with Clustering

http://arxiv.org/abs/2402.14526v1

Compressor summary: ClusterClip Sampling is a data sampling strategy for training large language models that balances the text distribution by clustering and clipping repetitive samples to improve performance.
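
A minimal sketch of cluster-balanced sampling with a repetition cap, in the spirit of ClusterClip; the clustering backend and the thresholds are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def clusterclip_order(doc_embeddings, n_clusters=4, max_repeats=2, n_draws=100, seed=0):
    """Return a training order that balances clusters and clips repeated samples."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(doc_embeddings)
    buckets = {c: list(np.where(labels == c)[0]) for c in range(n_clusters)}
    counts = np.zeros(len(doc_embeddings), dtype=int)
    order = []
    for _ in range(n_draws):
        c = int(rng.integers(n_clusters))          # clusters sampled uniformly
        pool = [i for i in buckets[c] if counts[i] < max_repeats]  # clip repeats
        if not pool:
            continue
        i = int(rng.choice(pool))
        counts[i] += 1
        order.append(i)
    return order

order = clusterclip_order(np.random.default_rng(1).normal(size=(200, 16)))
```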


Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition

http://arxiv.org/abs/2402.14523v1

Compressor summary: The paper proposes Daisy-TTS, an emotional text-to-speech system that uses a prosody encoder to simulate a wide range of emotions based on the structural model of emotions.


Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond

http://arxiv.org/abs/2402.14522v1

Compressor summary: FUTE is a framework that unifies task embeddings from different language models, including prompt-based LLMs, enabling comparison and analysis of similarities across models.


Malaysian English News Decoded: A Linguistic Resource for Named Entity and Relation Extraction

http://arxiv.org/abs/2402.14521v1

Compressor summary: The paper describes the construction and validation of a new annotated dataset for natural language processing tasks in Malaysian English, which improved named entity recognition performance.


Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

http://arxiv.org/abs/2402.14505v1

Compressor summary: The paper proposes a method to adapt pre-trained models for visual place recognition using hybrid adaptation and mutual nearest neighbor local feature loss, achieving better performance with less data and time than existing methods.


"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

http://arxiv.org/abs/2402.14499v1

Compressor summary: The paper evaluates how well multiple-choice questions reflect a large language model's actual responses when interacting with users, finding significant misalignment between them.


Noise-BERT: A Unified Perturbation-Robust Framework with Noise Alignment Pre-training for Noisy Slot Filling Task

http://arxiv.org/abs/2402.14494v1

Compressor summary: Noise-BERT is a framework that uses pre-training tasks and fine-tuning techniques to improve slot filling performance under input perturbations.


INSTRAUG: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning

http://arxiv.org/abs/2402.14492v1

Compressor summary: INSTRAUG is a method that automatically expands instruction-following datasets for multimodal tasks, improving the performance of large language models.


Imbalanced Data Clustering using Equilibrium K-Means

http://arxiv.org/abs/2402.14490v1

Compressor summary: Equilibrium K-means (EKM) is a novel algorithm that improves clustering results for imbalanced data by alternating between two steps, reducing centroid crowding and having competitive performance on balanced data.


Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer

http://arxiv.org/abs/2402.14488v1

Compressor summary: The study presents a new generator to produce accurate information despite changes in contextual knowledge and finds that all models tend to generate previous answers as hallucinations.


Is ChatGPT the Future of Causal Text Mining? A Comprehensive Evaluation and Analysis

http://arxiv.org/abs/2402.14484v1

Compressor summary: This study evaluates ChatGPT's causal text mining capabilities across various datasets, finding that while it performs well as a starting point, previous models and its own advanced versions outperform it in some aspects.


SpanSeq: Similarity-based sequence data splitting method for improved development and assessment of deep learning projects

http://arxiv.org/abs/2402.14482v1

Compressor summary: SpanSeq is a database partition method that helps deep learning models in computational biology generalize better by avoiding data leakage between sets.


Towards Automated Causal Discovery: a case study on 5G telecommunication data

http://arxiv.org/abs/2402.14481v1

Compressor summary: AutoCD is a system that automatically finds and explains causes in various data types using causal methods.


DynGMA: a robust approach for learning stochastic differential equations from data

http://arxiv.org/abs/2402.14475v1

Compressor summary: The paper proposes new methods for learning unknown SDEs from low-resolution and variable time-step data using robust density approximations.


Data Science with LLMs and Interpretable Models

http://arxiv.org/abs/2402.14474v1

Compressor summary: Large language models can enhance interpretable machine learning models like Generalized Additive Models, enabling better dataset analysis and interaction with domain experts.


Reimagining Anomalies: What If Anomalies Were Normal?

http://arxiv.org/abs/2402.14469v1

Compressor summary: The method creates different modified images for each anomaly, showing what changed to make it normal, helping users understand why the detector flagged it as abnormal.


NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection

http://arxiv.org/abs/2402.14464v1

Compressor summary: NeRF-Det++ improves NeRF-Det by addressing semantic ambiguity, inappropriate sampling, and insufficient depth supervision in 3D detection using segmentation maps, perspective-aware sampling, and ordinal residual depth supervision.


S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR

http://arxiv.org/abs/2402.14461v1

Compressor summary: The paper proposes a new single-stage transformer framework (S^2Former-OR) that uses multi-view 2D scenes and 3D point clouds to generate semantic scene graphs for surgical procedures in the operating room, improving efficiency and accuracy.


Reframing the Expected Free Energy: Four Formulations and a Unification

http://arxiv.org/abs/2402.14460v1

Compressor summary: This paper explores how to derive different formulations of active inference from a single expected free energy definition and examines the conditions under which these formulations can be recovered in two settings.


NLAS-multi: A Multilingual Corpus of Automatically Generated Natural Language Argumentation Schemes

http://arxiv.org/abs/2402.14458v1

Compressor summary: This paper presents an automatic method to generate arguments in multiple languages, a large corpus of argument schemes, and models to identify them.


Annotation and Classification of Relevant Clauses in Terms-and-Conditions Contracts

http://arxiv.org/abs/2402.14457v1

Compressor summary: The paper introduces a new way to label different parts of Terms-and-Conditions contracts and tests its effectiveness using few-shot learning with LLMs.


VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning

http://arxiv.org/abs/2402.14456v1

Compressor summary: VLPose is a framework that uses language models to improve human pose estimation in both natural and artificial scenarios, enhancing adaptability and robustness.


CCPA: Long-term Person Re-Identification via Contrastive Clothing and Pose Augmentation

http://arxiv.org/abs/2402.14454v1

Compressor summary: CCPA is a framework that uses clothing and pose transfer to improve long-term person re-identification by capturing body shape information with a relation graph attention network.


Do LLMs Implicitly Determine the Suitable Text Difficulty for Users?

http://arxiv.org/abs/2402.14453v1

Compressor summary: Large language models can automatically adjust text difficulty for individual learners in conversation, even outperforming humans in some cases.


A Language Model's Guide Through Latent Space

http://arxiv.org/abs/2402.14433v1

Compressor summary: The paper explores how to control language models using different types of concepts beyond truthfulness, developing a new metric and showing challenges in eliciting and guiding with some novel concepts.


KoCoSa: Korean Context-aware Sarcasm Detection Dataset

http://arxiv.org/abs/2402.14428v1

Compressor summary: The paper introduces KoCoSa, a new Korean dialogue sarcasm detection dataset, and proposes an efficient pipeline to build it, which outperforms strong baselines like GPT-3.5.


Text me the data: Generating Ground Pressure Sequence from Textual Descriptions for HAR

http://arxiv.org/abs/2402.14427v1

Compressor summary: The Text-to-Pressure framework generates ground pressure sequences from textual descriptions of human activities using deep learning, enabling efficient human activity recognition models with less costly data acquisition.


Automating Psychological Hypothesis Generation with AI: Large Language Models Meet Causal Graph

http://arxiv.org/abs/2402.14424v1

Compressor summary: The study combines a large language model with causal knowledge graphs to generate novel hypotheses in psychology, outperforming both human experts and the LLM alone.


Uncertainty-Aware Evaluation for Vision-Language Models

http://arxiv.org/abs/2402.14418v1

Compressor summary: The paper proposes a benchmark to evaluate vision-language models with uncertainty quantification and finds that uncertainty levels vary with accuracy and language model components.


TaylorGrid: Towards Fast and High-Quality Implicit Field Learning via Direct Taylor-based Grid Optimization

http://arxiv.org/abs/2402.14415v1

Compressor summary: The paper introduces TaylorGrid, a novel method for efficient and high-quality 3D geometry representation using direct Taylor expansion optimization on grids.


J-UniMorph: Japanese Morphological Annotation through the Universal Feature Schema

http://arxiv.org/abs/2402.14411v1

Compressor summary: J-UniMorph is a Japanese Morphology dataset that covers more verb forms and linguistic nuances than the existing Wiktionary Edition.


Tug-of-War Between Knowledge: Exploring and Resolving Knowledge Conflicts in Retrieval-Augmented Language Models

http://arxiv.org/abs/2402.14409v1

Compressor summary: The paper investigates how retrieval-augmented language models (RALMs) resolve knowledge conflicts, presenting an evaluation framework for assessing such conflicts, showing that RALMs exhibit biases and preferences that affect their resolution, and introducing a method called CD2 to improve their confidence calibration.


Transferring BERT Capabilities from High-Resource to Low-Resource Languages Using Vocabulary Matching

http://arxiv.org/abs/2402.14408v1

Compressor summary: This paper proposes a method to use BERT models for low-resource languages by matching their vocabulary with high-resource ones and shows its effectiveness on Silesian and Kashubian languages.


Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning

http://arxiv.org/abs/2402.14407v1

Compressor summary: The paper proposes a method to train robots using human videos without action labels and limited robot data, by combining generative pre-training and policy fine-tuning with discrete diffusion models.


On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe

http://arxiv.org/abs/2402.14404v1

Compressor summary: The study explores how re-purposing the reverse dictionary task can help understand large language models' conceptual inference and reasoning abilities, which are linked to their general reasoning performance across multiple benchmarks.


Global Safe Sequential Learning via Efficient Knowledge Transfer

http://arxiv.org/abs/2402.14402v1

Compressor summary: The paper proposes a transferable safe learning method that uses source knowledge to explore multiple disjoint safe regions and learn tasks faster with less data consumption.


Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment

http://arxiv.org/abs/2402.14401v1

Compressor summary: The paper proposes a diffusion model-based method for No-Reference Image Quality Assessment (NR-IQA) that leverages both high-level and low-level features to improve performance on seven public datasets.


Modeling 3D Infant Kinetics Using Adaptive Graph Convolutional Networks

http://arxiv.org/abs/2402.14400v1

Compressor summary: The paper proposes a data-driven method to predict infants' neurodevelopmental maturation using 3D video recordings and graph convolutional networks, outperforming traditional machine learning approaches.


Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

http://arxiv.org/abs/2402.14398v1

Compressor summary: The paper proposes a novel dual-stream framework for image attribute editing that gradually injects lost details into the reconstruction and editing process, improving detail preservation and editability.


Semantic Image Synthesis with Unconditional Generator

http://arxiv.org/abs/2402.14395v1

Compressor summary: The authors propose a semantic image synthesis method that generates realistic images from semantic masks using a pre-trained unconditional generator, a feature rearranger, and a semantic mapper that produces proxy masks from different input conditions, supporting applications such as sketch-to-photo.


Graph Parsing Networks

http://arxiv.org/abs/2402.14393v1

Compressor summary: The Graph Parsing Network (GPN) adaptively learns personalized pooling structures for graphs, achieving better performance in graph classification tasks and preserving node information.


Reading Relevant Feature from Global Representation Memory for Visual Object Tracking

http://arxiv.org/abs/2402.14392v1

Compressor summary: The paper proposes a novel tracking paradigm that uses a relevance attention mechanism and a global representation memory to adaptively select relevant historical information for different search regions in videos, improving tracking performance and reducing redundancy.


MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding

http://arxiv.org/abs/2402.14391v1

Compressor summary: Microenvironment-Aware Protein Embedding for PPI prediction (MAPE-PPI) is a novel method that encodes protein microenvironments into discrete codes using a large codebook and a Masked Codebook Modeling (MCM) pre-training strategy, predicting protein-protein interactions efficiently and effectively.


Securing Transactions: A Hybrid Dependable Ensemble Machine Learning Model using IHT-LR and Grid Search

http://arxiv.org/abs/2402.14389v1

Compressor summary: The paper presents a hybrid dependable ensemble model that intelligently combines multiple ML algorithms with optimized weights and uses IHT resampling with logistic regression (LR) to address data imbalance, achieving high accuracy in credit card fraud detection on a public dataset and surpassing existing methods.


WindDragon: Enhancing wind power forecasting with Automated Deep Learning

http://arxiv.org/abs/2402.14385v1

Compressor summary: The paper proposes an innovative approach for accurate short-term national wind power forecasting using deep learning and weather predictions.


Generative Adversarial Network with Soft-Dynamic Time Warping and Parallel Reconstruction for Energy Time Series Anomaly Detection

http://arxiv.org/abs/2402.14384v1

Compressor summary: The paper proposes a fast and accurate 1D DCGAN method for detecting anomalies in energy consumption data using soft-DTW and parallel computation.


Enhancing Temporal Knowledge Graph Forecasting with Large Language Models via Chain-of-History Reasoning

http://arxiv.org/abs/2402.14382v1

Compressor summary: Chain-of-History (CoH) reasoning improves LLM-based TKG forecasting by leveraging high-order historical information and enhancing graph-based models' performance.


Novi jezički modeli za srpski jezik (New Language Models for the Serbian Language)

http://arxiv.org/abs/2402.14379v1

Compressor summary: The paper discusses transformer-based language models for Serbian, comparing ten vectorization models on four NLP tasks and exploring factors that influence their performance.


Small Language Model Is a Good Guide for Large Language Model in Chinese Entity Relation Extraction

http://arxiv.org/abs/2402.14373v1

Compressor summary: The paper proposes SLCoLM, a framework that combines pre-trained language models and large language models to improve relational extraction for long-tailed data.


HR-APR: APR-agnostic Framework with Uncertainty Estimation and Hierarchical Refinement for Camera Relocalisation

http://arxiv.org/abs/2402.14371v1

Compressor summary: The text introduces HR-APR, a framework that estimates uncertainty in camera pose regression from monocular images without relying on specific APR architectures, and uses it to improve performance.


Representation Learning for Frequent Subgraph Mining

http://arxiv.org/abs/2402.14367v1

Compressor summary: SPMiner is a novel neural approach for finding frequent subgraphs in large networks using graph neural networks, order embedding space, and an efficient search strategy.
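
A minimal sketch of the order-embedding test this family of methods relies on: a pattern is predicted to be a subgraph when its embedding is dominated coordinate-wise by the target's (the embeddings here are made up, standing in for GNN outputs):

```python
import numpy as np

def subgraph_violation(e_query, e_target):
    """Penalty that is zero iff the query embedding is dominated coordinate-wise."""
    return float((np.maximum(0.0, e_query - e_target) ** 2).sum())

def predicts_subgraph(e_query, e_target, tol=1e-6):
    return subgraph_violation(e_query, e_target) <= tol

e_small = np.array([0.2, 0.1, 0.4])   # embedding of a candidate pattern
e_big   = np.array([0.5, 0.3, 0.9])   # embedding of a target neighborhood
print(predicts_subgraph(e_small, e_big))  # True: pattern predicted as subgraph
```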


OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

http://arxiv.org/abs/2402.14361v1

Compressor summary: OpenTab is an open-domain table reasoning framework that uses a retriever to fetch relevant tables and generates SQL programs to parse them, achieving high accuracy in inferencing tasks.


Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark

http://arxiv.org/abs/2402.14359v1

Compressor summary: The paper proposes a facet-aware metric and dataset to evaluate scientific summarization by large language models, highlighting their limitations and introducing a more logical approach.


Rule or Story, Which is a Better Commonsense Expression for Talking with Large Language Models?

http://arxiv.org/abs/2402.14355v1

Compressor summary: The paper explores how stories are better than rules for retrieving and using commonsense in large language models, and shows that improving stories with self-supervised fine-tuning enhances their effectiveness.


GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints

http://arxiv.org/abs/2402.14354v1

Compressor summary: The paper proposes GAM-Depth, a method for indoor depth estimation that uses gradient-aware masks and semantic constraints to handle textureless areas and object boundaries better than existing methods.


Dependable Distributed Training of Compressed Machine Learning Models

http://arxiv.org/abs/2402.14346v1

Compressor summary: DepL is a framework that ensures high-quality and efficient learning by making smart decisions on data, models, and resources, while considering both average and distribution of learning quality.


An Error-Matching Exclusion Method for Accelerating Visual SLAM

http://arxiv.org/abs/2402.14345v1

Compressor summary: The paper proposes an accelerated Visual SLAM method that combines GMS and RANSAC to remove mismatched features and rank high-confidence matches for faster performance while maintaining accuracy.


TIE-KD: Teacher-Independent and Explainable Knowledge Distillation for Monocular Depth Estimation

http://arxiv.org/abs/2402.14340v1

Compressor summary: TIE-KD is a new framework that improves monocular depth estimation by streamlining knowledge transfer from complex teacher models to compact student networks using explainable feature maps.


AURA: Natural Language Reasoning for Aleatoric Uncertainty in Rationales

http://arxiv.org/abs/2402.14337v1

Compressor summary: The authors propose a method for improving language models' reasoning skills by adapting to imperfect rationales, which can handle uncertainty and perform better in challenging scenarios.


HyperFast: Instant Classification for Tabular Data

http://arxiv.org/abs/2402.14335v1

Compressor summary: HyperFast is a meta-trained hypernetwork that classifies tabular data in a single forward pass by generating a task-specific neural network for an unseen dataset without training, outperforming or matching other methods while being much faster and more adaptable.


INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models

http://arxiv.org/abs/2402.14334v1

Compressor summary: The text discusses the need for better understanding of users' intentions and preferences in information retrieval tasks, and introduces a new benchmark (INSTRUCTIR) to evaluate how well retrievers can follow user-aligned instructions.


From Large to Small Datasets: Size Generalization for Clustering Algorithm Selection

http://arxiv.org/abs/2402.14332v1

Compressor summary: The paper proposes a method to efficiently select the best clustering algorithm from a set of candidates by subsampling the data and evaluating accuracy on smaller instances, with theoretical and empirical results supporting this approach.
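
A minimal sketch of the subsample-then-select idea; using silhouette score as the accuracy proxy is an assumption for illustration, not the paper's protocol:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

def select_algorithm(X, candidates, frac=0.1, seed=0):
    """Score candidate clusterers on a small subsample; pick the winner."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=max(10, int(frac * len(X))), replace=False)
    sub = X[idx]
    scores = {name: silhouette_score(sub, algo.fit_predict(sub))
              for name, algo in candidates.items()}
    return max(scores, key=scores.get), scores

X = np.random.default_rng(1).normal(size=(2000, 8))
best, scores = select_algorithm(X, {
    "kmeans": KMeans(n_clusters=3, n_init=10),
    "agglo": AgglomerativeClustering(n_clusters=3),
})
print(best, scores)
```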


Understanding and Patching Compositional Reasoning in LLMs

http://arxiv.org/abs/2402.14328v1

Compressor summary: The paper investigates why large language models struggle with compositional reasoning tasks and proposes CREME, a method that edits MHSA modules to improve their reasoning abilities.


Subobject-level Image Tokenization

http://arxiv.org/abs/2402.14327v1

Compressor summary: The paper proposes a subobject-level image tokenization method for vision language tasks, which improves efficiency over patch-based methods.


Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering

http://arxiv.org/abs/2402.14320v1

Compressor summary: Triad is a novel KBQA framework that leverages an LLM-based agent with three roles to achieve superior performance on various tasks compared to existing systems.


Assessing generalization capability of text ranking models in Polish

http://arxiv.org/abs/2402.14318v1

Compressor summary: The article evaluates various rerankers for retrieval-augmented generation in Polish and shows that effective optimization and large training data lead to compact, generalizable, and state-of-the-art models.


Place Anything into Any Video

http://arxiv.org/abs/2402.14316v1

Compressor summary: Place-Anything is a system that lets you insert any object into any video using just a picture or text description of the target element.


Typographic Text Generation with Off-the-Shelf Diffusion Model

http://arxiv.org/abs/2402.14314v1

Compressor summary: The paper presents a new typographic text generation system that combines ControlNet and Blended Latent Diffusion to generate and manipulate text on designs with specified font styles, colors, and effects.


Learning to Kern -- Set-wise Estimation of Optimal Letter Space

http://arxiv.org/abs/2402.14313v1

Compressor summary: The paper proposes pairwise and set-wise machine-learning models to automate kerning, a task of adjusting horizontal spaces between letter pairs in fonts, and shows that the set-wise model is more efficient and accurate.


Font Style Interpolation with Diffusion Models

http://arxiv.org/abs/2402.14311v1

Compressor summary: The paper proposes three methods to create new font styles by blending two reference fonts using diffusion models, and evaluates their effectiveness in generating both expected and surprising styles.


Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge

http://arxiv.org/abs/2402.14310v1

Compressor summary: Hint-before-Solving Prompting (HSP) improves the accuracy of reasoning tasks for large language models by guiding them to generate hints and intermediate steps.
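
A minimal sketch of what such a prompt might look like; the exact template wording is an assumption, not the paper's:

```python
# Hypothetical HSP-style template: elicit a hint first, then the solution.
HSP_TEMPLATE = """\
Question: {question}

First, give a hint: state the key concepts or facts needed to solve the problem.
Hint:

Then solve the problem step by step, using the hint.
Solution:"""

print(HSP_TEMPLATE.format(question="If 3x + 5 = 20, what is x?"))
```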


YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5

http://arxiv.org/abs/2402.14309v1

Compressor summary: YOLO-TLA is an improved object detection model that uses a larger scale feature map for small objects, a compact backbone network, and global attention to achieve higher accuracy in detecting small objects with fewer parameters and lower computational demand.


A Simple Framework Uniting Visual In-context Learning with Masked Image Modeling to Improve Ultrasound Segmentation

http://arxiv.org/abs/2402.14300v1

Compressor summary: SimICL is a novel visual in-context learning method that achieves high segmentation performance on wrist ultrasound images with limited annotations, potentially reducing human expert time for labeling and improving AI assisted image analysis.


Multi-modal Stance Detection: New Datasets and Model

http://arxiv.org/abs/2402.14298v1

Compressor summary: This paper introduces new datasets and a model for multi-modal stance detection on tweets containing both text and images, using target information to learn features from the two modalities.


Mitigating Biases of Large Language Models in Stance Detection with Calibration

http://arxiv.org/abs/2402.14296v1

Compressor summary: The paper introduces MB-Cal, a gated calibration network that, combined with counterfactual augmented data, mitigates the spurious sentiment-stance correlations and topic and individual preferences that bias large language models' stance reasoning, achieving state-of-the-art stance detection results.


High-arity PAC learning via exchangeability

http://arxiv.org/abs/2402.14294v1

Compressor summary: The paper develops a theory for learning with structured correlation using graphs, hypergraphs, and relational languages, and shows how high-arity PAC learnability depends on combinatorial dimension and uniform convergence.


Leveraging Large Language Models for Concept Graph Recovery and Question Answering in NLP Education

http://arxiv.org/abs/2402.14293v1

Compressor summary: The study explores how Large Language Models can help in educational scenarios by using concept graphs and answering questions, and introduces a new benchmark called TutorQA.


CEV-LM: Controlled Edit Vector Language Model for Shaping Natural Language Generations

http://arxiv.org/abs/2402.14290v1

Compressor summary: CEV-LM is a new language model that can adjust the pace, length, and complexity of text generation by controlling three metrics (speed, volume, and circuitousness).


TinyLLaVA: A Framework of Small-scale Large Multimodal Models

http://arxiv.org/abs/2402.14289v1

Compressor summary: The TinyLLaVA framework explores how to design and analyze small-scale LMMs by experimenting with different components and training methods, achieving comparable performance to larger models.


A Landmark-Aware Visual Navigation Dataset

http://arxiv.org/abs/2402.14281v1

Compressor summary: The LAVN dataset provides RGB observations and human point-click pairs for supervised learning of map building and exploration in real and virtual environments, with landmarks to simplify graph construction.