This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-25, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2406.16864v1
Compressor summary: StableNormal is a method that uses diffusion priors to estimate surface normals from images and videos with high quality and robustness, while avoiding stochastic inference and ensembling steps.
http://arxiv.org/abs/2406.16866v1
Compressor summary: The authors question the validity of existing REC benchmarks due to high labeling error rates and introduce Ref-L4, a more comprehensive benchmark for evaluating modern REC models.
http://arxiv.org/abs/2406.16863v1
Compressor summary: The paper proposes a tuning-free method for controlling the motion of videos generated by diffusion models, using modified noise sampling and attention mechanisms.
http://arxiv.org/abs/2406.16860v1
Compressor summary: Cambrian-1 is a family of vision-centric multimodal language models that explore various visual representations, introduce a new benchmark (CV-Bench), and propose a spatially-aware connector (SVA) to improve sensory grounding.
http://arxiv.org/abs/2406.16858v1
Compressor summary: EAGLE-2 is a fast and context-aware method to improve speculative sampling for inference with Large Language Models.
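For context, a minimal sketch of the draft-and-verify speculative-sampling step that methods like EAGLE-2 build on (this shows the generic acceptance rule, not EAGLE-2's context-aware tree drafting; `p_target` and `p_draft` are assumed next-token distributions from the large and the draft model):

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, p_draft):
    """One draft-and-verify step of speculative sampling: the cheap draft
    model proposes a token, and the target model accepts it or resamples
    from the residual so the output is still distributed as p_target."""
    x = rng.choice(len(p_draft), p=p_draft)                # draft proposal
    if rng.random() < min(1.0, p_target[x] / p_draft[x]):
        return x                                           # accepted
    residual = np.maximum(p_target - p_draft, 0.0)         # rejected: resample
    return rng.choice(len(residual), p=residual / residual.sum())
```

Because the acceptance test preserves the target distribution exactly, the speed-up comes for free in quality; EAGLE-2's contribution is making the drafts themselves cheaper and more context-aware.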
http://arxiv.org/abs/2406.16855v1
Compressor summary: DreamBench++ is an automated, human-aligned benchmark for evaluating personalized image generation using advanced GPT models and a diverse dataset.
http://arxiv.org/abs/2406.16853v1
Compressor summary: The paper introduces GeoMFormer, a Transformer-based molecular model that learns both invariant and equivariant features for accurate calculations and simulations of molecular systems.
http://arxiv.org/abs/2406.16852v1
Compressor summary: The paper proposes a method to extend the context length of language models for understanding long videos without additional training, and introduces a new benchmark and model (LongVA) that achieve state-of-the-art results on video tasks.
http://arxiv.org/abs/2406.16851v1
Compressor summary: LoCoVQA evaluates how well vision language models can handle long visual contexts and ignore irrelevant information, finding that they struggle with this task compared to text-based language models.
http://arxiv.org/abs/2406.16850v1
Compressor summary: The authors propose a novel pipeline for synthesizing noisy data to evaluate the robustness of SLAM models against various perturbations and introduce the Noisy-Replica benchmark.
http://arxiv.org/abs/2406.16846v1
Compressor summary: D3M is a debiasing technique that removes specific training examples causing model failures on minority groups without needing extra data or annotations.
http://arxiv.org/abs/2406.16845v1
Compressor summary: RaTEScore is a new metric for measuring the quality of AI-generated medical reports that focuses on crucial entities such as diagnoses and anatomy, using a trained NER model to decompose reports into these entities.
http://arxiv.org/abs/2406.16842v1
Compressor summary: The paper introduces FactRel, a new annotation scheme for studying factual relationships in news articles, and shows how it can be used to analyze media discourse and improve GPT-4's performance.
http://arxiv.org/abs/2406.16838v1
Compressor summary: The survey explores how increasing compute during inference can improve large language models' generation performance in different ways.
http://arxiv.org/abs/2406.16833v1
Compressor summary: The text describes a study that uses large language models to automate the annotation process for identifying user stances and dogmatism in long conversation threads, creating a dataset (USDC) for training small language models on these tasks.
http://arxiv.org/abs/2406.16829v1
Compressor summary: The paper proposes an algorithm that overcomes the sampling bias introduced by tokenization schemes such as maximum prefix matching, simulating token-free behavior from a tokenized language model without finetuning or additional data.
http://arxiv.org/abs/2406.16821v1
Compressor summary: BADGER is a novel method that uses neural networks to improve binding affinity in structure-based drug design by guiding the diffusion process with gradients of an energy function.
http://arxiv.org/abs/2406.16817v1
Compressor summary: The paper evaluates GPT-4V, a large visual language model, for autonomous driving in mining environments, where it excels at visual scene understanding, navigation, and strategic decision-making but struggles with identifying specific vehicle types and managing vehicle interactions.
http://arxiv.org/abs/2406.16815v1
Compressor summary: ClotheDreamer is a 3D garment synthesis method that generates wearable, production-ready garments from text prompts using Disentangled Clothe Gaussian Splatting and bidirectional Score Distillation Sampling.
http://arxiv.org/abs/2406.16810v1
Compressor summary: PISTOL is a pipeline for creating datasets to evaluate structural unlearning methods for large language models, revealing challenges and impact of model choice on unlearning performance.
http://arxiv.org/abs/2406.16807v1
Compressor summary: This paper explores the trade-offs between fine-grained and coarse-grained human feedback for text-to-image generation, highlighting the challenges and benefits of each type.
http://arxiv.org/abs/2406.16802v1
Compressor summary: The research note derives new lower and upper bounds for the worst-case regret in bandits with expert advice under restricted and standard feedback models.
http://arxiv.org/abs/2406.16801v1
Compressor summary: RES-Q distills 100 real GitHub edits into natural-language instructions to evaluate Large Language Models' repository-editing abilities.
http://arxiv.org/abs/2406.16797v1
Compressor summary: LoTA is a sparse adaptation method that improves multi-task adaptation of large language models by identifying and optimizing subnetworks, avoiding catastrophic forgetting, and enabling model merging.
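A minimal sketch of the lottery-ticket-style masking idea behind methods like LoTA, assuming the subnetwork is chosen by the magnitude of the fine-tuning task vector (the paper's exact mask-selection and calibration procedure is not reproduced; `lottery_mask` and `masked_update` are illustrative names):

```python
import numpy as np

def lottery_mask(theta_base, theta_ft, sparsity=0.9):
    """Keep the (1 - sparsity) fraction of weights that fine-tuning moved
    the most; later updates are confined to this sparse subnetwork."""
    delta = np.abs(theta_ft - theta_base)
    k = int(delta.size * sparsity)
    thresh = np.partition(delta.ravel(), k)[k]       # magnitude cutoff
    return (delta >= thresh).astype(theta_base.dtype)

def masked_update(theta, grad, mask, lr=1e-4):
    """One gradient step restricted to the chosen subnetwork, leaving the
    rest of the model untouched (which limits catastrophic forgetting)."""
    return theta - lr * grad * mask
```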
http://arxiv.org/abs/2406.16793v1
Compressor summary: Adam-mini is a new optimizer that uses average learning rates within parameter blocks to reduce memory and achieve similar or better performance than AdamW on various language models.
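A minimal sketch of the block-wise learning-rate idea: instead of Adam's per-element second moment, keep a single scalar per parameter block. Here one block equals one tensor, a simplifying assumption; Adam-mini's actual partition follows the model's Hessian structure.

```python
import numpy as np

class BlockwiseAdam:
    """Adam variant with ONE second-moment scalar per parameter block,
    in the spirit of Adam-mini (block structure simplified here)."""

    def __init__(self, params, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        self.params, self.lr, self.b1, self.b2, self.eps = params, lr, b1, b2, eps
        self.m = [np.zeros_like(p) for p in params]  # per-element first moment
        self.v = [0.0 for _ in params]               # a single scalar v per block
        self.t = 0

    def step(self, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            # one learning-rate scale per block: track the MEAN squared gradient
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * float(np.mean(g * g))
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            self.params[i] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

Replacing a full tensor of second moments with one scalar per block is where the memory saving comes from.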
http://arxiv.org/abs/2406.16791v1
Compressor summary: The author describes a community effort to co-design cheaper, faster, and more energy-efficient software and hardware for AI and ML using CM, a framework that modularizes, automates, and virtualizes building, running, profiling, and optimizing complex applications across models, datasets, software, and hardware, together with virtualized MLOps, MLPerf benchmarks, and reproducible optimization tournaments; CM and CM4MLOps were donated to MLCommons to connect academia and industry and make AI accessible to everyone by assembling it automatically from the components best suited to a user's needs and constraints.
http://arxiv.org/abs/2406.16784v1
Compressor summary: Transformers revolutionized natural language processing and have been applied to various computer vision tasks, including Multi-Object Tracking, where they show promise but are not yet the best method.
http://arxiv.org/abs/2406.16783v1
Compressor summary: The paper introduces M2Lingual, a large multilingual IFT dataset that covers various languages and tasks, and shows its effectiveness in aligning LLMs on diverse languages and tasks.
http://arxiv.org/abs/2406.16782v1
Compressor summary: The paper proposes an ICRL method that can estimate constraints from expert demonstrations with a specified confidence level and helps users decide when to collect more data.
http://arxiv.org/abs/2406.16779v1
Compressor summary: The ordering of inputs and an emphasis on context improve reading comprehension performance, especially for questions that require non-parametric knowledge.
http://arxiv.org/abs/2406.16778v1
Compressor summary: Edge Pruning is a fast and effective method for finding sparse computational subgraphs (circuits) in language models, enabling efficient interpretability and revealing insights into model behavior.
http://arxiv.org/abs/2406.16777v1
Compressor summary: The paper presents a speech translation system that incorporates a large language model to refine ASR and MT outputs, achieving significant improvements in word error rate and COMET score.
http://arxiv.org/abs/2406.16776v1
Compressor summary: The paper proposes a novel self-training network, InsTeacher3D, that leverages instance consistency regularization to improve semi-supervised 3D instance segmentation without relying on semantic pseudo labels.
http://arxiv.org/abs/2406.16772v1
Compressor summary: The report compares Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o on the OlympicArena benchmark, which measures performance across many disciplines, finding that Claude-3.5-Sonnet is highly competitive overall, that open-source models lag behind proprietary ones, and that we remain far from superintelligence.
http://arxiv.org/abs/2406.16768v1
Compressor summary: WARP is a novel alignment strategy for large language models that uses weight averaging to balance KL regularization and reward optimization, leading to better performance and quality than existing methods.
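A rough sketch of the weight-averaging step, assuming plain linear merging (WARP itself combines several averaging stages, including interpolation back toward the initialization as the KL anchor; `eta` is an assumed mixing weight):

```python
import numpy as np

def warp_style_merge(theta_init, tuned_runs, eta=0.5):
    """Average several reward-tuned checkpoints, then interpolate back
    toward the initialization; eta trades reward against KL drift.
    A minimal linear sketch of WARP's weight-averaging idea."""
    theta_avg = np.mean(tuned_runs, axis=0)            # merge independent RL runs
    return (1 - eta) * theta_init + eta * theta_avg    # anchor toward init
```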
http://arxiv.org/abs/2406.16767v1
Compressor summary: The study compares emotional and descriptive aspects of human and machine storytelling using GPT-3.5 and finds significant differences along six dimensions, as well as similar biases based on narrative point-of-view and gender.
http://arxiv.org/abs/2406.16758v1
Compressor summary: The paper explores a method to speed up large language models' inference time by using draft models trained with speculative decoding in multilingual settings.
http://arxiv.org/abs/2406.16756v1
Compressor summary: The paper explores how machine learning models can affect human data, proposes a framework for stable and fair predictions, and presents new fairness interventions that address stability issues.
http://arxiv.org/abs/2406.16754v1
Compressor summary: The text proposes an ML-based framework for dynamic active sampling in MRI, which could enable point-of-care disease diagnosis with reduced field strength and acquisition time.
http://arxiv.org/abs/2406.16751v1
Compressor summary: The paper presents an Arabic zero-shot text-to-speech system that adapts existing data, uses dialect identification models, and fine-tunes an open-source XTTS model to generate high-quality speech for 31 unseen speakers and multiple dialects.
http://arxiv.org/abs/2406.16749v1
Compressor summary: The paper proposes a method to fit low-rank recurrent neural networks (RNNs) to noisy neural data using variational sequential Monte Carlo, achieving lower dimensional latent dynamics and efficient fixed point identification.
http://arxiv.org/abs/2406.16748v1
Compressor summary: OCALM is a method that uses language models to create interpretable rewards for reinforcement learning agents based on natural language task descriptions.
http://arxiv.org/abs/2406.16747v1
Compressor summary: SPARSEK Attention is a new sparse attention mechanism for Transformers that improves speed and memory efficiency while maintaining performance on language modeling and downstream tasks.
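A minimal dense-math sketch of top-k sparse attention, with a hard top-k standing in for SPARSEK's differentiable scoring-and-selection operator (a real implementation gathers only the selected keys, which is where the speed and memory savings come from):

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Each query attends only to its k highest-scoring keys; everything
    else is masked out before the softmax. Ties at the cutoff may admit
    a few extra keys -- acceptable for a sketch."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                   # (n_q, n_k)
    thresh = np.partition(scores, -k, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))   # stable softmax
    return (w / w.sum(axis=-1, keepdims=True)) @ V
```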
http://arxiv.org/abs/2406.16746v1
Compressor summary: The Foundation Model Development Cheatsheet is a collection of tools and resources for developing responsible AI models across text, vision, and speech modalities.
http://arxiv.org/abs/2406.16745v1
Compressor summary: MAXMINLCB is a new algorithm that optimizes nonlinear target functions with infinite domains by using pairwise comparisons and human feedback, outperforming existing methods and providing rate-optimal regret guarantees.
http://arxiv.org/abs/2406.16743v1
Compressor summary: The paper proposes Adversarial Contrastive Decoding, a method to improve language model safety by generating opposite prompts for contrastive decoding, reducing harmful responses with minimal data and computation.
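A minimal sketch of the contrastive-decoding step such a method builds on, run with a safety-promoting prompt and its adversarially "opposite" prompt (the combination rule and `alpha` are assumptions here, not the paper's exact formulation):

```python
import numpy as np

def contrastive_decode_step(logits_safe, logits_opposite, alpha=0.5):
    """One decoding step that amplifies the safe prompt's preferences and
    suppresses the adversarial prompt's; alpha is an assumed mixing weight."""
    combined = (1 + alpha) * logits_safe - alpha * logits_opposite
    probs = np.exp(combined - combined.max())
    probs /= probs.sum()
    return int(np.argmax(probs))   # greedy pick; sampling also works
```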
http://arxiv.org/abs/2406.16740v1
Compressor summary: The paper introduces LP-FNO, a neural operator that can map lower-dimensional boundary data to solution functions in the entire domain, which can be applied to various problems with spatially varying boundary conditions.
http://arxiv.org/abs/2406.16738v1
Compressor summary: The text discusses the challenges of ensuring fairness in large language model-based classifiers, especially prompt-based ones, and presents some remediation techniques.
http://arxiv.org/abs/2406.16732v1
Compressor summary: CLIMATELI is a dataset for linking climate-change entities to Wikipedia, enabling evaluation of existing entity-linking systems and the development of automated filtering methods for this important topic.
http://arxiv.org/abs/2406.16728v1
Compressor summary: CausalMMM is a new method that discovers causal structures from data to improve online advertising budget allocation and GMV predictions, addressing challenges like causal heterogeneity and marketing response patterns.
http://arxiv.org/abs/2406.16722v1
Compressor summary: Mamba is a promising alternative to Transformers in natural language processing due to its structured state space model and various enhancements, leading to hybrid models that combine their strengths.
http://arxiv.org/abs/2406.16715v1
Compressor summary: GC-Bench is a framework that evaluates graph condensation methods and provides insights into their performance, characteristics, and potential applications.
http://arxiv.org/abs/2406.16714v1
Compressor summary: The paper introduces AutoDetect, a framework that uses three LLM-powered agents to systematically identify weaknesses in large language models across various tasks, leading to model improvements and enhanced performance.
http://arxiv.org/abs/2406.16710v1
Compressor summary: The paper proposes Portrait3D, a framework that uses identity information to generate high-quality 3D heads from single portrait images.
http://arxiv.org/abs/2406.16708v1
Compressor summary: CausalFormer is a transformer-based model that uses multi-kernel causal convolution and regression relevance propagation to discover causal relations in time series data, outperforming existing methods.
http://arxiv.org/abs/2406.16707v1
Compressor summary: The paper introduces a novel method for hierarchical reinforcement learning using Gaussian Processes to represent subgoals probabilistically, enabling adaptive memory and policy learning, and showing improved performance in various environments.
http://arxiv.org/abs/2406.16698v1
Compressor summary: The paper proposes a framework for learning interpretable fair representations using prior knowledge, which improves data utility and leads to more accurate and fair predictions.
http://arxiv.org/abs/2406.16697v1
Compressor summary: The paper analyzes the performance of two methods for escaping local minima or plateaus in greedy search algorithms, and identifies a threshold (crossover point) that determines which method is faster.
http://arxiv.org/abs/2406.16695v1
Compressor summary: GSD improves text-to-3D generation by incorporating 3D consistency and geometry awareness into the score distillation sampling process using 3D consistent noise, gradient warping, and gradient consistency loss.
http://arxiv.org/abs/2406.16694v1
Compressor summary: TRAIT is a framework that improves large language models' performance on specialized domains by selecting in-domain data from general corpora and generating task-oriented synthetic passages for continual pre-training.
http://arxiv.org/abs/2406.16690v1
Compressor summary: The paper investigates the scalability of three efficient linear architectures for large language models and shows they perform similarly to transformers with better linguistic skills and knowledge preservation.
http://arxiv.org/abs/2406.16689v1
Compressor summary: The text discusses how neural networks learn task-dependent features depending on their nonlinearity and other factors, and investigates the nature of the resulting coding schemes.
http://arxiv.org/abs/2406.16687v1
Compressor summary: The paper explores using untrained message passing layers in graph neural networks for link prediction, which can be competitive or better than fully trained MPNNs, especially with high-dimensional features.
http://arxiv.org/abs/2406.16683v1
Compressor summary: RSD is a method to improve diversity and quality in visual generation using repulsion of particles based on their similarity.
http://arxiv.org/abs/2406.16678v1
Compressor summary: The paper presents a new model called Segment any Text (SaT) that can segment text into sentences robustly, adaptably, and efficiently in diverse domains and languages, especially for poorly formatted text.
http://arxiv.org/abs/2406.16674v1
Compressor summary: The text discusses the importance of detecting lesser-known rhetorical figures computationally to better understand texts, and presents an overview of their linguistic and computational aspects, as well as challenges in this domain.
http://arxiv.org/abs/2406.16672v1
Compressor summary: CAVE is a model that generates structured and verifiable AV explanations using a Llama-3-8B, achieving good quality explanations and task accuracy on difficult datasets.
http://arxiv.org/abs/2406.16666v1
Compressor summary: This paper presents SSCN, a randomized coordinate second-order method for minimizing non-convex continuous functions in high-dimensional machine learning, with theoretical convergence guarantees and improved experimental performance.
http://arxiv.org/abs/2406.16659v1
Compressor summary: Data-driven models use digital technology, sensor networks, and computing hardware to create measurement systems that can interpret data and generate predictions in complex real-world contexts where expert understanding is limited.
http://arxiv.org/abs/2406.16655v1
Compressor summary: The study shows that large language models can transfer knowledge-free reasoning across multiple languages well, but knowledge retrieval transfers poorly and may involve different mechanisms for each language.
http://arxiv.org/abs/2406.16641v1
Compressor summary: The paper proposes a new multi-modal prompt learning method for blind AI generated image quality assessment, which uses vision-language consistency knowledge to guide the optimization of the prompts and outperforms existing models.
http://arxiv.org/abs/2406.16638v1
Compressor summary: This paper shows that feature fusion improves human activity recognition accuracy by combining spatial and temporal features using deep learning models like CNNs and Transformers on various datasets.
http://arxiv.org/abs/2406.16635v1
Compressor summary: ShadowLLM predicts attention head and neuron importance in large language models, enabling better sparsity patterns for speed and accuracy improvements.
http://arxiv.org/abs/2406.16633v1
Compressor summary: MLAAN is a new model that improves local learning methods for better performance and memory efficiency compared to end-to-end training.
http://arxiv.org/abs/2406.16626v1
Compressor summary: The paper discusses how surrogate models, like decision trees, can be used to hide discriminatory behavior in algorithmic decision-making systems and calls for more research on their effectiveness in achieving explainability.
http://arxiv.org/abs/2406.16623v1
Compressor summary: The method learns object pose and part-segmentation from two observations of articulated objects using implicit models and a decoupled optimization procedure.
http://arxiv.org/abs/2406.16620v1
Compressor summary: OmAgent is an advanced AI system that can efficiently store, retrieve, and understand detailed video content, using autonomous reasoning and tool-calling capabilities for complex tasks.
http://arxiv.org/abs/2406.16619v1
Compressor summary: The study introduces a random convolution feature expansion method that improves and stabilizes dynamic functional connectivity analysis compared to the widely used sliding-window method.
http://arxiv.org/abs/2406.16615v1
Compressor summary: The paper presents a novel approach to class incremental learning using Winning Subnetworks and three training strategies, achieving the first rank in the CLVision Challenge.
http://arxiv.org/abs/2406.16611v1
Compressor summary: This paper surveys and evaluates pre-trained language models for medical tasks, showing their potential to perform well even with limited resources.
http://arxiv.org/abs/2406.16608v1
Compressor summary: The paper studies generalized label shift, a new assumption for dealing with changing environments in learning scenarios, and proposes a kernel embedding-based correction algorithm to improve generalization and knowledge transfer.
http://arxiv.org/abs/2406.16606v1
Compressor summary: Fair cake-cutting studies how to divide resources fairly among participants, and its connection to supervised multi-label classification can help achieve optimal fair decisions in machine learning problems, sometimes at the cost of seemingly unfair cherry-picking.
http://arxiv.org/abs/2406.16605v1
Compressor summary: The authors investigate how well language models understand causal graphs and create CLEAR, a benchmark to measure their performance across different complexity levels.
http://arxiv.org/abs/2406.16601v1
Compressor summary: The text describes a new method for generating realistic videos of people by transferring motion from reference videos, using perceptual and memory modules to improve image quality and temporal consistency.
http://arxiv.org/abs/2406.16593v1
Compressor summary: The paper proposes a mathematical innovation model with AI integration to measure the recyclability of electronic waste components and improve their recovery and sorting processes.
http://arxiv.org/abs/2406.16592v1
Compressor summary: The authors propose a method to balance demographic attributes in generated face datasets for fairer and more transparent face recognition and verification.
http://arxiv.org/abs/2406.16567v1
Compressor summary: The paper proposes a knowledge-driven progressive thought prompting method to generate multi-turn psychology-related dialogues using large language models, aiming to improve performance in the low-resource psychology domain.
http://arxiv.org/abs/2406.16564v1
Compressor summary: The paper presents a novel method for assessing traversability using point clouds, which combines PointNet with an encoder-decoder structure and spatio-temporal attention, achieving better performance than existing methods.
http://arxiv.org/abs/2406.16563v1
Compressor summary: Transformer models' sentence embeddings contain layers of linguistic information that can be separated and analyzed, such as chunks and their properties.
http://arxiv.org/abs/2406.16562v1
Compressor summary: EvalAlign is a new evaluation metric for text-to-image generative models that uses multimodal large language models to accurately reflect the performance of these models, based on image faithfulness and text-image alignment, and shows superior stability and human preference alignment than existing metrics.
http://arxiv.org/abs/2406.16557v1
Compressor summary: The paper introduces tilted k-means (TKM), a novel algorithm that achieves individual fairness in clustering by using a new objective function and a fairness metric, and proves its convergence under mild conditions.
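A sketch of what an exponentially tilted k-means update could look like, assuming TKM follows the tilted-ERM recipe of up-weighting points far from their center (the paper's exact objective, fairness metric, and convergence conditions are not reproduced here):

```python
import numpy as np

def tilted_kmeans(X, k, t=1.0, iters=50, seed=0):
    """k-means with an exponentially tilted objective: points far from
    their center get up-weighted (t > 0), pulling centers toward
    poorly-served points for individually fairer clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)  # (n, k) distances
        assign = d2.argmin(1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts) == 0:
                continue
            dj = ((pts - centers[j]) ** 2).sum(-1)
            w = np.exp(t * (dj - dj.max()))                  # tilted weights
            centers[j] = (w[:, None] * pts).sum(0) / w.sum() # weighted mean
    return centers, assign
```

Setting t = 0 recovers uniform weights and ordinary k-means, which makes the tilt a tunable fairness knob.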
http://arxiv.org/abs/2406.16555v1
Compressor summary: The paper studies the complexity of comparing and embedding planning instances, and proposes an algorithm to find isomorphisms when possible, while also improving the efficiency of a SAT solver with constraint propagation.
http://arxiv.org/abs/2406.16554v1
Compressor summary: The paper proposes building MoE models from existing dense LLMs through expert construction and continual pre-training, mitigating MoE's data hunger and training instability; the resulting LLaMA-MoE models outperform dense models with similar activation parameters.
http://arxiv.org/abs/2406.16552v1
Compressor summary: HYPA-DBGNN is a novel approach that models temporal patterns in dynamic graphs by combining anomaly detection in time series data on graphs with a higher-order De Bruijn graph neural network.
http://arxiv.org/abs/2406.16544v1
Compressor summary: The paper presents a learned video codec for random access that performs better than or comparably to existing methods on common test conditions, using less data and better metrics.
http://arxiv.org/abs/2406.16540v1
Compressor summary: The paper proposes DAMP, a training method that improves DNNs' robustness to various corruptions by applying random multiplicative weight perturbations, and shows its effectiveness on image classification tasks.
http://arxiv.org/abs/2406.16537v1
Compressor summary: Character-Adapter is a framework for generating images with high-fidelity consistency of characters by using prompt-guided segmentation and dynamic region-level adapters.
http://arxiv.org/abs/2406.16536v1
Compressor summary: The paper introduces C-LLM, a method that uses character-level tokenization to improve Chinese spell checking by addressing the limitations of large language models on character-level constraints.
http://arxiv.org/abs/2406.16535v1
Compressor summary: Hidden Calibration improves In-Context Learning by using nearest centroid classification on hidden states instead of token probabilities, achieving better decision boundaries and 20% performance gain.
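A minimal sketch of nearest-centroid classification over hidden states, assuming you can extract the last hidden state for each calibration example and for the query:

```python
import numpy as np

def hidden_centroid_classify(train_hidden, train_labels, query_hidden):
    """Classify by nearest class centroid in hidden-state space, rather
    than by comparing label-token probabilities.

    train_hidden: (n, d) last hidden states of calibration examples
    train_labels: (n,) integer class labels
    query_hidden: (d,) hidden state of the query prompt
    """
    classes = np.unique(train_labels)
    centroids = np.stack([train_hidden[train_labels == c].mean(axis=0)
                          for c in classes])
    dists = np.linalg.norm(centroids - query_hidden, axis=1)
    return classes[np.argmin(dists)]
```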
http://arxiv.org/abs/2406.16531v1
Compressor summary: This paper introduces the GIM dataset, which has large scale, rich content, and diverse manipulation, to evaluate image manipulation detection and location methods using a novel framework called GIMFormer that outperforms existing approaches.
http://arxiv.org/abs/2406.16529v1
Compressor summary: The paper proposes a graph-based model for cross-document relation extraction that incorporates non-bridge entities and debiasing strategies, achieving state-of-the-art results.
http://arxiv.org/abs/2406.16528v1
Compressor summary: The study evaluates large language models' ability to reason about cardinal directions using two datasets and finds that none of them can consistently perform well on the more complex dataset.
http://arxiv.org/abs/2406.16527v1
Compressor summary: The paper explores using machine learning techniques to help with systematic review processes by automating tasks such as categorizing publications, extracting key information, connecting evidence to existing datasets, and identifying subgroups of articles, with promising results for improving efficiency and analysis in reviews.
http://arxiv.org/abs/2406.16524v1
Compressor summary: The paper explores how knowledge distillation can improve smaller models' performance in multilingual NLP tasks and proposes a method to enhance initialization by copying the teacher model's weights to the student model.
http://arxiv.org/abs/2406.16521v1
Compressor summary: The CASTIC dataset contains sentences for computationally studying how positive and negative feedback can enhance self-motivation across various fields.
http://arxiv.org/abs/2406.16518v1
Compressor summary: VMamba improves crack segmentation accuracy and efficiency on various surfaces compared to CNNs and Transformers.
http://arxiv.org/abs/2406.16513v1
Compressor summary: The paper introduces new multi-modal transformer-based architectures for crop mapping from satellite images that significantly outperform existing methods.
http://arxiv.org/abs/2406.16508v1
Compressor summary: The paper explores how subword vocabulary size affects large language model performance and proposes a method for adapting vocabulary during continual training.
http://arxiv.org/abs/2406.16502v1
Compressor summary: LOGCAN++ is a semantic segmentation model for remote sensing images that uses global and local class awareness to handle complex backgrounds, scale and orientation variations, and large intra-class variance.
http://arxiv.org/abs/2406.16495v1
Compressor summary: The paper proposes a new language model architecture that combines Mamba and Transformer, using a position information injection method and a biomimetic Observer-Thinker-Conceiver-Expresser (OTCE) design to achieve better performance in language modeling tasks.
http://arxiv.org/abs/2406.16490v1
Compressor summary: The study compares different methods for classifying legal data using large language models and finds that the zero-shot method performs well, achieving an F1 score of 64%.
http://arxiv.org/abs/2406.16489v1
Compressor summary: The study uses NLP techniques and machine learning models to find ways to spot DeepFake tweets and improve the quality of online information.
http://arxiv.org/abs/2406.16486v1
Compressor summary: The text proposes a new framework for collecting preference data in reinforcement learning from human feedback, which improves the quality of language model responses and reduces human labor.
http://arxiv.org/abs/2406.16481v1
Compressor summary: The paper introduces new quaternion activation functions that leverage the properties of the quaternion space and show improved performance on image classification tasks.
http://arxiv.org/abs/2406.16478v1
Compressor summary: The study analyzes multimodal interaction in motivational interviewing conversations, categorizes patients into different types, and develops a virtual agent that adapts its social and empathic behaviors based on the patient's behavior.
http://arxiv.org/abs/2406.16477v1
Compressor summary: This paper proposes using degradation-aligned language prompts to improve image super-resolution and address challenges such as semantic loss, visual artifacts, and visual hallucinations.
http://arxiv.org/abs/2406.16476v1
Compressor summary: ResMaster is a method that uses a low-res reference image to guide high-res image generation, improving quality and detail while maintaining coherence.
http://arxiv.org/abs/2406.16473v1
Compressor summary: SCIU is a two-stage framework that removes uncertainty from DFER datasets by pruning low-quality samples and correcting mislabeled data, improving performance.
http://arxiv.org/abs/2406.16469v1
Compressor summary: The paper proposes a semi-automated pipeline to create culturally inclusive vision-language models benchmarks using human-VLM collaboration, demonstrating its effectiveness with an example dataset for Korean culture.
http://arxiv.org/abs/2406.16468v1
Compressor summary: The paper analyzes how cosine similarity between points affects their magnitude and convergence in self-supervised learning, and proposes a simple initialization method called cut-initialization to improve performance.
http://arxiv.org/abs/2406.16464v1
Compressor summary: InterCLIP-MEP is a framework for detecting sarcasm in text-image combinations on social media, using a refined version of CLIP and a Memory-Enhanced Predictor to overcome biases and achieve state-of-the-art performance.
http://arxiv.org/abs/2406.16459v1
Compressor summary: The paper proposes a method for super-resolution that uses uncertainty-based representation of image degradation to enable self-supervised learning and improve performance.
http://arxiv.org/abs/2406.16456v1
Compressor summary: AUTOPRIV is a meta-learning method that automates privacy-preserving data de-identification for machine learning tasks, reducing computational complexity and energy consumption.
http://arxiv.org/abs/2406.16455v1
Compressor summary: The authors propose a method for identifying unsafe medical product recommendations made by AI models and test it on a large language model.
http://arxiv.org/abs/2406.16450v1
Compressor summary: This study explores linear layer approximations in transformer-based large language models to reduce parameters, computational costs, and improve training dynamics, achieving better perplexity and throughput performance.
http://arxiv.org/abs/2406.16449v1
Compressor summary: R-Bench is a new benchmark for evaluating vision relationship hallucinations in LVLMs, which highlights three types of co-occurrences leading to hallucinations and shows LVLMs' limitations in visual reasoning.
http://arxiv.org/abs/2406.16442v1
Compressor summary: EmoBench evaluates MLLMs' emotional understanding abilities using images, videos, and text, while EmoLLM is a novel model that improves their performance significantly.
http://arxiv.org/abs/2406.16441v1
Compressor summary: The authors propose UniCode, an intermediate representation of algorithm steps using programming language conventions, to improve LLMs for code generation tasks, and demonstrate its effectiveness with UniCoder-Instruct dataset.
http://arxiv.org/abs/2406.16439v1
Compressor summary: CTTA is a technique for adapting models to changing domains, but it suffers from low-quality pseudo-labels and random parameter restoration; CTAOD addresses these issues with object-level contrastive learning, a dynamic threshold strategy, and data-driven stochastic restoration.
http://arxiv.org/abs/2406.16437v1
Compressor summary: The paper analyzes how the Mixture-of-Experts model can mitigate catastrophic forgetting in continual learning by diversifying and specializing tasks among multiple experts, and provides theoretical results and empirical evidence.
http://arxiv.org/abs/2406.16434v1
Compressor summary: The paper proposes multi-threshold deep metric learning for facial expression recognition, which improves performance by using different thresholds to create distinct expression feature representations.
http://arxiv.org/abs/2406.16427v1
Compressor summary: DoNuSeg is a framework that uses class activation maps to generate adaptive and accurate pseudo masks for nuclei segmentation with point supervision, reducing the annotation burden.
http://arxiv.org/abs/2406.16426v1
Compressor summary: The paper uses clustering and machine learning to analyze and predict failures in power grids optimized by Deep Reinforcement Learning agents.
http://arxiv.org/abs/2406.16424v1
Compressor summary: MEMENTO is an RL approach that uses memory to improve neural solvers' adaptation at inference time for combinatorial optimization problems, enhancing performance under a given budget.
http://arxiv.org/abs/2406.16422v1
Compressor summary: The paper proposes a Frequency-Aware Prompting method with mutual attention to enhance Cross-Domain Few-Shot classification by selecting different frequency cues and learning generalizable inductive bias.
http://arxiv.org/abs/2406.16416v1
Compressor summary: The paper proposes a new method for editing multilingual knowledge in large language models by leveraging shared neurons that represent the same factual knowledge across languages.
http://arxiv.org/abs/2406.16384v1
Compressor summary: Horyon is an open-vocabulary VLM-based architecture that uses textual prompts to identify unseen objects and estimate their relative pose between two scenes, achieving state-of-the-art performance on four datasets.
http://arxiv.org/abs/2406.16382v1
Compressor summary: The text explores the ability of large language models (LLMs) to make sequential decisions using UNO Arena, a card game-based environment, and proposes TUTRI player, which improves LLM performance by reflecting on actions and strategy.
http://arxiv.org/abs/2406.16377v1
Compressor summary: The paper shows how three adaptation tools can be used interchangeably to improve large language models and suggests a framework with six transformation directions for practical applications.
http://arxiv.org/abs/2406.16374v1
Compressor summary: The paper proposes a method to integrate knowledge from graphs into language models using hierarchical reinforcement learning, which improves their performance on natural language understanding tasks.
http://arxiv.org/abs/2406.16372v1
Compressor summary: This paper presents a novel mechanism for enhancing cross-lingual natural language understanding by augmenting training data with context-aware semantic knowledge from multiple languages.
http://arxiv.org/abs/2406.16360v1
Compressor summary: MIRReS is a novel inverse rendering framework that reconstructs geometry, material, and lighting from multi-view images using explicit geometry, multi-bounce path tracing, Monte Carlo integration, and reservoir sampling.
http://arxiv.org/abs/2406.16357v1
Compressor summary: GASSIP is a novel method that searches for optimal lightweight Graph Neural Networks (GNNs) by jointly optimizing graph data sparsification and architecture pruning, achieving high performance on node classification tasks with fewer parameters and a sparser graph.
http://arxiv.org/abs/2406.16356v1
Compressor summary: The paper proposes an automatic evaluation method for assessing how well large language models follow instructions in generating story endings using machine reading comprehension.
http://arxiv.org/abs/2406.16355v1
Compressor summary: The paper presents a method for compact model parameter extraction using derivative-free optimization, addressing critical issues in device modeling with a carefully chosen loss function, train-test split, and applying it to two semiconductor devices.
http://arxiv.org/abs/2406.16351v1
Compressor summary: METRIK is a framework that optimizes planned missing designs for clinical trials using transformers and input masking, reducing data collection costs and improving imputation performance.
http://arxiv.org/abs/2406.16349v1
Compressor summary: The authors develop a method to automatically annotate tabular data using large language models and demonstrate its usefulness in various tasks, such as SQL translation and tabular classification.
http://arxiv.org/abs/2406.16346v1
Compressor summary: The text proposes fine-tuning large language and visual models using LORA with domain-specific instructional datasets to generate more precise results, demonstrating improvement on the YouCook2 dataset.
http://arxiv.org/abs/2406.16342v1
Compressor summary: ADVSCORE is a metric that measures how well an adversarial dataset fools models and not humans, and it helps create high-quality adversarial datasets like ADVQA for question answering.
http://arxiv.org/abs/2406.16341v1
Compressor summary: EHRCon is a new dataset and task for verifying data consistency between structured and unstructured EHR elements, while CheckEHR is a framework using large language models to check this consistency.
http://arxiv.org/abs/2406.16338v1
Compressor summary: VideoHallucer is a benchmark for detecting and categorizing hallucinations in large video-language models, revealing issues with current models and informing the development of self-PEP framework.
http://arxiv.org/abs/2406.16333v1
Compressor summary: The paper proposes a diffusion-based framework to improve Text-to-Image generation by predicting object locations and generating consistent images based on textual descriptions.
http://arxiv.org/abs/2406.16330v1
Compressor summary: MKA is a novel compression technique that uses manifold learning and NPIB to merge similar layers in large language models, achieving substantial compression ratios while preserving performance.
http://arxiv.org/abs/2406.16321v1
Compressor summary: MM-GRAPH is a new benchmark for multi-modal graph learning that uses text and visual information to better evaluate graph neural networks.
http://arxiv.org/abs/2406.16320v1
Compressor summary: NOTICE is a tool to understand how vision-language models make decisions by corrupting and evaluating images and text in a meaningful way.
http://arxiv.org/abs/2406.16319v1
Compressor summary: The paper proposes a new method for measuring vowel overlap by jointly modelling acoustic dimensions and simulating distributions from the model, which improves results over existing methods and allows for uncertainty computation.
http://arxiv.org/abs/2406.16316v1
Compressor summary: The paper explores if aligning Japanese language models with mostly English resources affects their moral alignment with Japanese culture and finds mixed results.
http://arxiv.org/abs/2406.16308v1
Compressor summary: The paper shows that large language models can detect tabular anomalies without extra training, and proposes a method to fine-tune them for better performance.
http://arxiv.org/abs/2406.16307v1
Compressor summary: The paper proposes a method using Criss-Cross Attention and residual dense block to improve text detection in artistic-style text, and introduces a new Movie-Poster dataset for this task.
http://arxiv.org/abs/2406.16306v1
Compressor summary: CARDS is a technique that generates high-reward and high-likelihood text efficiently by iteratively creating small semantic segments based on predictive uncertainty.
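A rough sketch of segment-level rejection sampling; `generate_segment` and `reward` are hypothetical callables standing in for the language model and the reward model, and the fixed threshold replaces CARDS's uncertainty-based segmentation and dynamic acceptance rule:

```python
def rejection_sample_segments(generate_segment, reward, prefix,
                              threshold=0.0, max_tries=8, n_segments=10):
    """Grow the response one short segment at a time, resampling any
    segment whose reward-model score falls below the threshold."""
    text = prefix
    for _ in range(n_segments):
        candidate = generate_segment(text)
        for _ in range(max_tries - 1):
            if reward(text + candidate) >= threshold:
                break
            candidate = generate_segment(text)   # resample low-reward segment
        text += candidate
    return text
```

Resampling short segments is much cheaper than regenerating whole responses, which is where the claimed efficiency comes from.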
http://arxiv.org/abs/2406.16301v1
Compressor summary: The paper introduces BiSSV, a bimodal video summarization task that uses a large dataset (BIDS) and a unified framework (UBiSS) to generate both visual and textual summaries of videos.
http://arxiv.org/abs/2406.16300v1
Compressor summary: This paper proposes a model to understand the linear mode connectivity phenomenon in neural networks by analyzing the topological features of the loss landscape.
http://arxiv.org/abs/2406.16299v1
Compressor summary: The paper proposes Learnable Singular Value Increment (LSI), a method that improves quantization accuracy by making weight singular values learnable and allowing them to compensate each other based on activations, achieving state-of-the-art results in various settings.
http://arxiv.org/abs/2406.16297v1
Compressor summary: The paper proposes a novel model called PriorFormer for blind video quality assessment of user-generated content videos, which improves adaptability and representation capability using content and distortion priors.
http://arxiv.org/abs/2406.16295v1
Compressor summary: The paper proposes a Discrete Equivariant Graph Neural Network (DEGNN) that captures discrete symmetries in physical dynamics modeling and outperforms existing methods in various applications.
http://arxiv.org/abs/2406.16294v1
Compressor summary: LangSuitE is a testbed for evaluating language models' abilities as embodied agents in dynamic environments, using a novel chain-of-thought schema called EmMem to summarize embodied states.
http://arxiv.org/abs/2406.16293v1
Compressor summary: The paper proposes MLPAC, an RL-based framework that combines exploration and exploitation abilities for partially annotated multi-label tasks like document-level relation extraction.
http://arxiv.org/abs/2406.16289v1
Compressor summary: The paper presents a crowd-sourced framework that uses data from production vehicles to reconstruct large-scale scenes with NeRF, combining modules for data selection, 3D reconstruction, appearance embedding, depth supervision, and occlusion completion to generate high-quality 3D street views and guide drivers with synthesized video.
http://arxiv.org/abs/2406.16288v1
Compressor summary: PlagBench is a dataset for testing plagiarism detection in large language models and shows that some LLMs can outperform current commercial checkers.
http://arxiv.org/abs/2406.16282v1
Compressor summary: This paper proposes a theory and methods to reduce memory overhead in fine-tuning large models, using new activation functions and layer normalization techniques.
http://arxiv.org/abs/2406.16279v1
Compressor summary: SegNet4D is a novel, fast, and accurate method for real-time 4D LiDAR semantic segmentation in autonomous vehicles using a projection-based approach and instance-aware segmentation.
http://arxiv.org/abs/2406.16275v1
Compressor summary: The paper proposes FAILOpt, an attack to deceive AI text detectors by exploiting prompt-specific shortcuts, and uses it to enhance the robustness of the detector.
http://arxiv.org/abs/2406.16273v1
Compressor summary: YouDream is a text-to-image diffusion model that generates high-quality, anatomically controllable 3D animals using a 2D pose prior and a multi-agent LLM to adapt poses.
http://arxiv.org/abs/2406.16272v1
Compressor summary: Patcher is an automated repair approach for text-to-image models that fixes semantic inconsistencies by enhancing features of neglected objects in the prompt.
http://arxiv.org/abs/2406.16271v1
Compressor summary: GBMSeg is a training-free AI framework that automatically segments the glomerular basement membrane in TEM images using one labeled reference, achieving superior performance and robustness.
http://arxiv.org/abs/2406.16264v1
Compressor summary: NoCha is a dataset to test long-context LLMs' ability to retrieve, synthesize, and reason over book-length inputs, which requires global reasoning and proves challenging for current models.
http://arxiv.org/abs/2406.16260v1
Compressor summary: Video-Infinity is a distributed inference pipeline that uses clip parallelism and dual-scope attention to enable fast generation of long videos across multiple GPUs.
http://arxiv.org/abs/2406.16257v1
Compressor summary: S3T is a framework for exact machine unlearning that minimizes retraining costs and service disruptions by sequentially training model layers with disjoint data slices and handling multiple deletion requests simultaneously.
http://arxiv.org/abs/2406.16255v1
Compressor summary: The paper proposes a reward-free reinforcement learning algorithm, GFA-RFE, which uses uncertainty-aware exploration and weighted learning to improve sample efficiency in heterogeneous environments.
http://arxiv.org/abs/2406.16254v1
Compressor summary: This study explores how large language models represent and regulate uncertainty through entropy neurons and token frequency neurons, which affect the final normalization scale and the logit distribution, respectively.
http://arxiv.org/abs/2406.16253v1
Compressor summary: This study explores how large language models can assist researchers in reviewing and identifying issues in NLP papers using a new dataset called ReviewCritique.
http://arxiv.org/abs/2406.16252v1
Compressor summary: The paper introduces a graph-augmented LLM framework that enhances the personalization and clarity of health insights by capturing inter and intra-patient relationships and using dynamic feature importance scores.
http://arxiv.org/abs/2406.16249v1
Compressor summary: The paper proposes an improved bound for value-prediction error in reinforcement learning and hierarchical abstraction, addressing model misspecification and compounding probability errors.