arxiv compressed, 2024-06-25

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-25, generated by the compressor, my personal LLM-based project.


StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

http://arxiv.org/abs/2406.16864v1

Compressor summary: StableNormal is a method that uses diffusion priors to estimate surface normals from images and videos with high quality and robustness, while avoiding stochastic inference and ensembling steps.


Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models

http://arxiv.org/abs/2406.16866v1

Compressor summary: The authors question the validity of existing REC benchmarks due to high labeling error rates and introduce Ref-L4, a more comprehensive benchmark for evaluating modern REC models.


FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models

http://arxiv.org/abs/2406.16863v1

Compressor summary: The paper proposes a tuning-free method for controlling the motion of videos generated by diffusion models, using modified noise sampling and attention mechanisms.


Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

http://arxiv.org/abs/2406.16860v1

Compressor summary: Cambrian-1 is a family of vision-centric multimodal language models that explore various visual representations, introduce a new benchmark (CV-Bench), and propose a spatially-aware connector (SVA) to improve sensory grounding.


EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees

http://arxiv.org/abs/2406.16858v1

Compressor summary: EAGLE-2 is a fast and context-aware method to improve speculative sampling for inference with Large Language Models.
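
For readers unfamiliar with the base technique: in speculative sampling, a cheap draft model proposes several tokens and the target model verifies them in a single forward pass. A minimal greedy-acceptance sketch of that general idea (not EAGLE-2's dynamic draft trees); `draft_next` and `target_all` are hypothetical callables:

    # Illustrative greedy speculative decoding. Assumptions:
    # draft_next(ctx) returns one proposed next token;
    # target_all(ctx) returns the target model's predicted next token
    # at every position of ctx (one forward pass).
    def speculative_step(prefix, draft_next, target_all, k=4):
        ctx = list(prefix)
        proposal = []
        for _ in range(k):                  # cheap draft proposes k tokens
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        preds = target_all(ctx)             # one expensive verification pass
        accepted = []
        for i, tok in enumerate(proposal):  # keep the longest agreeing prefix
            if preds[len(prefix) + i - 1] == tok:
                accepted.append(tok)
            else:
                accepted.append(preds[len(prefix) + i - 1])  # target's fix-up token
                break
        return accepted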


DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

http://arxiv.org/abs/2406.16855v1

Compressor summary: DreamBench++ is an automated, human-aligned benchmark for evaluating personalized image generation using advanced GPT models and a diverse dataset.


GeoMFormer: A General Architecture for Geometric Molecular Representation Learning

http://arxiv.org/abs/2406.16853v1

Compressor summary: The paper introduces GeoMFormer, a Transformer-based molecular model that learns both invariant and equivariant features for accurate calculations and simulations of molecular systems.


Long Context Transfer from Language to Vision

http://arxiv.org/abs/2406.16852v1

Compressor summary: The paper proposes a method to extend the context length of language models for understanding long videos without additional training, and introduces a new benchmark and model (LongVA) that achieve state-of-the-art results on video tasks.


Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

http://arxiv.org/abs/2406.16851v1

Compressor summary: LoCoVQA evaluates how well vision language models can handle long visual contexts and ignore irrelevant information, finding that they struggle with this task compared to text-based language models.


From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking

http://arxiv.org/abs/2406.16850v1

Compressor summary: The authors propose a novel pipeline for synthesizing noisy data to evaluate the robustness of SLAM models against various perturbations and introduce the Noisy-Replica benchmark.


Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection

http://arxiv.org/abs/2406.16846v1

Compressor summary: D3M is a debiasing technique that removes specific training examples causing model failures on minority groups without needing extra data or annotations.


RaTEScore: A Metric for Radiology Report Generation

http://arxiv.org/abs/2406.16845v1

Compressor summary: RaTEScore is a new metric for measuring the quality of AI-generated medical reports by analyzing crucial entities like diagnoses and anatomy, using a trained NER model that breaks down report components into these entities.


Exploring Factual Entailment with NLI: A News Media Study

http://arxiv.org/abs/2406.16842v1

Compressor summary: The paper introduces FactRel, a new annotation scheme for studying factual relationships in news articles, and shows how it can be used to analyze media discourse and improve GPT-4's performance.


From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

http://arxiv.org/abs/2406.16838v1

Compressor summary: The survey explores how increasing compute during inference can improve large language models' generation performance in different ways.


USDC: A Dataset of User Stance and Dogmatism in Long Conversations

http://arxiv.org/abs/2406.16833v1

Compressor summary: The text describes a study that uses large language models to automate the annotation process for identifying user stances and dogmatism in long conversation threads, creating a dataset (USDC) for training small language models on these tasks.


Understanding and Mitigating Tokenization Bias in Language Models

http://arxiv.org/abs/2406.16829v1

Compressor summary: Tokenization schemes such as maximum prefix matching introduce sampling bias into language models' predictions; the paper proposes an algorithm that simulates token-free behavior from a tokenized model without finetuning or additional data.


General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design

http://arxiv.org/abs/2406.16821v1

Compressor summary: BADGER is a novel method that uses neural networks to improve binding affinity in structure-based drug design by guiding the diffusion process with gradients of an energy function.


GPT-4V Explorations: Mining Autonomous Driving

http://arxiv.org/abs/2406.16817v1

Compressor summary: The paper evaluates GPT-4V, a large visual language model, for autonomous driving in mining environments, where it excels at visual scene understanding, navigation, and strategic decision-making but struggles with identifying specific vehicle types and managing interactions.


ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians

http://arxiv.org/abs/2406.16815v1

Compressor summary: ClotheDreamer is a 3D garment synthesis method that generates wearable, production-ready garments from text prompts using Disentangled Clothe Gaussian Splatting and bidirectional Score Distillation Sampling.


PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs

http://arxiv.org/abs/2406.16810v1

Compressor summary: PISTOL is a pipeline for creating datasets to evaluate structural unlearning methods for large language models, revealing challenges and impact of model choice on unlearning performance.


Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

http://arxiv.org/abs/2406.16807v1

Compressor summary: This paper explores the trade-offs between fine-grained and coarse-grained human feedback for text-to-image generation, highlighting the challenges and benefits of each type.


Improved Regret Bounds for Bandits with Expert Advice

http://arxiv.org/abs/2406.16802v1

Compressor summary: The research note derives new lower and upper bounds for the worst-case regret in bandits with expert advice under restricted and standard feedback models.


RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale

http://arxiv.org/abs/2406.16801v1

Compressor summary: RES-Q casts 100 real GitHub edits as natural language instructions to evaluate large language models' repository editing abilities.


Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

http://arxiv.org/abs/2406.16797v1

Compressor summary: LoTA is a sparse adaptation method that improves multi-task adaptation of large language models by identifying and optimizing subnetworks, avoiding catastrophic forgetting, and enabling model merging.


Adam-mini: Use Fewer Learning Rates To Gain More

http://arxiv.org/abs/2406.16793v1

Compressor summary: Adam-mini is a new optimizer that uses average learning rates within parameter blocks to reduce memory and achieve similar or better performance than AdamW on various language models.
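
A minimal sketch of the stated idea: one shared second-moment scalar per parameter block instead of Adam's per-coordinate one. Block granularity (here, one block per named parameter) and hyperparameters are illustrative, not the paper's exact recipe:

    import numpy as np

    # One Adam-like step where each parameter block shares a single
    # second-moment scalar (the mean of its squared gradients).
    def adam_mini_step(params, grads, m, v, t, lr=1e-3,
                       beta1=0.9, beta2=0.999, eps=1e-8):
        for name, g in grads.items():
            m[name] = beta1 * m[name] + (1 - beta1) * g        # per-coordinate momentum
            v[name] = beta2 * v[name] + (1 - beta2) * float(np.mean(g * g))  # one scalar per block
            m_hat = m[name] / (1 - beta1 ** t)
            v_hat = v[name] / (1 - beta2 ** t)
            params[name] -= lr * m_hat / (np.sqrt(v_hat) + eps)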


Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments

http://arxiv.org/abs/2406.16791v1

Compressor summary: The author describes a community effort to co-design cheaper, faster, and more energy-efficient AI/ML software and hardware using Collective Mind (CM), a framework that modularizes, automates, and virtualizes building, running, profiling, and optimizing complex applications across different models, datasets, software, and hardware; CM and CM4MLOps were donated to MLCommons to connect academia and industry and make AI accessible to everyone by producing it automatically from the most suitable components for a user's needs and constraints.


The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers

http://arxiv.org/abs/2406.16784v1

Compressor summary: Transformers revolutionized natural language processing and have been applied to various computer vision tasks, including Multi-Object Tracking, where they show promise but are not yet the best method.


M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models

http://arxiv.org/abs/2406.16783v1

Compressor summary: The paper introduces M2Lingual, a large multilingual IFT dataset that covers various languages and tasks, and shows its effectiveness in aligning LLMs on diverse languages and tasks.


Confidence Aware Inverse Constrained Reinforcement Learning

http://arxiv.org/abs/2406.16782v1

Compressor summary: The paper proposes an ICRL method that can estimate constraints from expert demonstrations with a specified confidence level and helps users decide when to collect more data.


It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension

http://arxiv.org/abs/2406.16779v1

Compressor summary: The order of inputs and emphasizing context improves reading comprehension model performance, especially for questions that require non-parametric knowledge.


Finding Transformer Circuits with Edge Pruning

http://arxiv.org/abs/2406.16778v1

Compressor summary: Edge Pruning is a fast and effective method for finding sparse computational subgraphs (circuits) in language models, enabling efficient interpretability and revealing insights into model behavior.


Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024

http://arxiv.org/abs/2406.16777v1

Compressor summary: The paper presents a speech translation system that incorporates a large language model to refine ASR and MT outputs, achieving significant improvements in word error rate and COMET score.


Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation

http://arxiv.org/abs/2406.16776v1

Compressor summary: The paper proposes a novel self-training network, InsTeacher3D, that leverages instance consistency regularization to improve semi-supervised 3D instance segmentation without relying on semantic pseudo labels.


OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

http://arxiv.org/abs/2406.16772v1

Compressor summary: The report compares three AI models (Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o) on the OlympicArena benchmark, which measures performance across various disciplines; Claude-3.5-Sonnet is highly competitive overall, open-source models lag behind proprietary ones, and the results suggest we are still far from achieving superintelligence.


WARP: On the Benefits of Weight Averaged Rewarded Policies

http://arxiv.org/abs/2406.16768v1

Compressor summary: WARP is a novel alignment strategy for large language models that uses weight averaging to balance KL regularization and reward optimization, leading to better performance and quality than existing methods.
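
The basic operation the summary mentions, averaging the weights of several fine-tuned policies, is simple to sketch; WARP's full procedure interleaves this with RLHF stages, and the uniform coefficients here are an assumption:

    # Average several checkpoints' weights (state dicts of tensors/arrays).
    def average_weights(state_dicts, coeffs=None):
        n = len(state_dicts)
        coeffs = coeffs or [1.0 / n] * n      # uniform average by default
        return {key: sum(c * sd[key] for c, sd in zip(coeffs, state_dicts))
                for key in state_dicts[0]}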


The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories

http://arxiv.org/abs/2406.16767v1

Compressor summary: The study compares emotional and descriptive aspects of human and machine storytelling using GPT-3.5 and finds significant differences along six dimensions, as well as similar biases based on narrative point-of-view and gender.


Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

http://arxiv.org/abs/2406.16758v1

Compressor summary: The paper explores a method to speed up large language models' inference time by using draft models trained with speculative decoding in multilingual settings.


Addressing Polarization and Unfairness in Performative Prediction

http://arxiv.org/abs/2406.16756v1

Compressor summary: The paper explores how machine learning models can affect human data, proposes a framework for stable and fair predictions, and presents new fairness interventions that address stability issues.


The MRI Scanner as a Diagnostic: Image-less Active Sampling

http://arxiv.org/abs/2406.16754v1

Compressor summary: The text proposes an ML-based framework for dynamic active sampling in MRI, which could enable point-of-care disease diagnosis with reduced field strength and acquisition time.


Towards Zero-Shot Text-To-Speech for Arabic Dialects

http://arxiv.org/abs/2406.16751v1

Compressor summary: The paper presents an Arabic zero-shot text-to-speech system that adapts existing data, uses dialect identification models, and fine-tunes an open-source XTTS model to generate high-quality speech for 31 unseen speakers and multiple dialects.


Inferring stochastic low-rank recurrent neural networks from neural data

http://arxiv.org/abs/2406.16749v1

Compressor summary: The paper proposes a method to fit low-rank recurrent neural networks (RNNs) to noisy neural data using variational sequential Monte Carlo, achieving lower dimensional latent dynamics and efficient fixed point identification.


OCALM: Object-Centric Assessment with Language Models

http://arxiv.org/abs/2406.16748v1

Compressor summary: OCALM is a method that uses language models to create interpretable rewards for reinforcement learning agents based on natural language task descriptions.


Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers

http://arxiv.org/abs/2406.16747v1

Compressor summary: SPARSEK Attention is a new sparse attention mechanism for Transformers that improves speed and memory efficiency while maintaining performance on language modeling and downstream tasks.
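
As a rough illustration of the family SPARSEK belongs to, here is per-query top-k sparse attention in NumPy; the paper's actual selection operator is learned and differentiable, so the hard argsort below is a simplification:

    import numpy as np

    # Attend to only the k highest-scoring keys for a single query q.
    def topk_sparse_attention(q, K, V, k=8):
        scores = K @ q                      # (T,) attention logits
        keep = np.argsort(scores)[-k:]      # indices of the k best keys
        w = np.exp(scores[keep] - scores[keep].max())
        w /= w.sum()                        # softmax over the kept keys only
        return w @ V[keep]                  # weighted value readout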


The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

http://arxiv.org/abs/2406.16746v1

Compressor summary: The Foundation Model Development Cheatsheet is a collection of tools and resources for developing responsible AI models across text, vision, and speech modalities.


Bandits with Preference Feedback: A Stackelberg Game Perspective

http://arxiv.org/abs/2406.16745v1

Compressor summary: MAXMINLCB is a new algorithm that optimizes nonlinear target functions with infinite domains by using pairwise comparisons and human feedback, outperforming existing methods and providing rate-optimal regret guarantees.


Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization

http://arxiv.org/abs/2406.16743v1

Compressor summary: The paper proposes Adversarial Contrastive Decoding, a method to improve language model safety by generating opposite prompts for contrastive decoding, reducing harmful responses with minimal data and computation.
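
A minimal sketch of the basic operation described: decode against both a safety-promoting prompt and an adversarially "opposite" prompt, then contrast their logits. `model_logits`, both prompts, and the contrast weight `alpha` are illustrative assumptions:

    import numpy as np

    # Pick the next token by pushing the safe-prompt distribution away
    # from the opposite-prompt (unsafe) distribution.
    def contrastive_next_token(model_logits, safe_prompt, opposite_prompt,
                               ctx, alpha=0.5):
        safe = model_logits(safe_prompt + ctx)       # favors safe continuations
        bad = model_logits(opposite_prompt + ctx)    # models unsafe behavior
        contrast = (1 + alpha) * safe - alpha * bad
        return int(np.argmax(contrast))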


Learning the boundary-to-domain mapping using Lifting Product Fourier Neural Operators for partial differential equations

http://arxiv.org/abs/2406.16740v1

Compressor summary: The paper introduces LP-FNO, a neural operator that can map lower-dimensional boundary data to solution functions in the entire domain, which can be applied to various problems with spatially varying boundary conditions.


Inducing Group Fairness in LLM-Based Decisions

http://arxiv.org/abs/2406.16738v1

Compressor summary: The text discusses the challenges of ensuring fairness in large language model-based classifiers, especially prompt-based ones, and presents some remediation techniques.


CLIMATELI: Evaluating Entity Linking on Climate Change Data

http://arxiv.org/abs/2406.16732v1

Compressor summary: CLIMATELI is a dataset for linking climate change entities to Wikipedia, which can help evaluate existing entity linking systems and propose automated filtering methods for this important topic.


CausalMMM: Learning Causal Structure for Marketing Mix Modeling

http://arxiv.org/abs/2406.16728v1

Compressor summary: CausalMMM is a new method that discovers causal structures from data to improve online advertising budget allocation and GMV predictions, addressing challenges like causal heterogeneity and marketing response patterns.


Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba

http://arxiv.org/abs/2406.16722v1

Compressor summary: Mamba is a promising alternative to Transformers in natural language processing due to its structured state space model and various enhancements, leading to hybrid models that combine their strengths.


GC-Bench: A Benchmark Framework for Graph Condensation with New Insights

http://arxiv.org/abs/2406.16715v1

Compressor summary: GC-Bench is a framework that evaluates graph condensation methods and provides insights into their performance, characteristics, and potential applications.


AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models

http://arxiv.org/abs/2406.16714v1

Compressor summary: The paper introduces AutoDetect, a framework that uses three LLM-powered agents to systematically identify weaknesses in large language models across various tasks, leading to model improvements and enhanced performance.


Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image

http://arxiv.org/abs/2406.16710v1

Compressor summary: The paper proposes Portrait3D, a framework that uses identity information to generate high-quality 3D heads from single portrait images.


CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

http://arxiv.org/abs/2406.16708v1

Compressor summary: CausalFormer is a transformer-based model that uses multi-kernel causal convolution and regression relevance propagation to discover causal relations in time series data, outperforming existing methods.


Probabilistic Subgoal Representations for Hierarchical Reinforcement learning

http://arxiv.org/abs/2406.16707v1

Compressor summary: The paper introduces a novel method for hierarchical reinforcement learning using Gaussian Processes to represent subgoals probabilistically, enabling adaptive memory and policy learning, and showing improved performance in various environments.


Learning Interpretable Fair Representations

http://arxiv.org/abs/2406.16698v1

Compressor summary: The paper proposes a framework for learning interpretable fair representations using prior knowledge, which improves data utility and leads to more accurate and fair predictions.


Expected Runtime Comparisons Between Breadth-First Search and Constant-Depth Restarting Random Walks

http://arxiv.org/abs/2406.16697v1

Compressor summary: The paper analyzes the performance of two methods for escaping local minima or plateaus in greedy search algorithms, and identifies a threshold (crossover point) that determines which method is faster.


Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

http://arxiv.org/abs/2406.16695v1

Compressor summary: GSD improves text-to-3D generation by incorporating 3D consistency and geometry awareness into the score distillation sampling process using 3D consistent noise, gradient warping, and gradient consistency loss.


Task Oriented In-Domain Data Augmentation

http://arxiv.org/abs/2406.16694v1

Compressor summary: TRAIT is a framework that improves large language models' performance on specialized domains by selecting in-domain data from general corpora and generating task-oriented synthetic passages for continual pre-training.


Scaling Laws for Linear Complexity Language Models

http://arxiv.org/abs/2406.16690v1

Compressor summary: The paper investigates the scalability of three efficient linear architectures for large language models and shows they perform similarly to transformers with better linguistic skills and knowledge preservation.


Coding schemes in neural networks learning classification tasks

http://arxiv.org/abs/2406.16689v1

Compressor summary: The text discusses how neural networks learn task-dependent features depending on their nonlinearity and other factors, and investigates the nature of the resulting coding schemes.


Link Prediction with Untrained Message Passing Layers

http://arxiv.org/abs/2406.16687v1

Compressor summary: The paper explores using untrained message passing layers in graph neural networks for link prediction, which can be competitive or better than fully trained MPNNs, especially with high-dimensional features.
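
A minimal sketch of the idea: propagate node features through message passing with fixed random weights, then score candidate links by inner products. Layer count, nonlinearity, and the scoring rule are illustrative choices:

    import numpy as np

    # A is the (optionally normalized) adjacency matrix, X the node features.
    def untrained_mpnn_scores(A, X, layers=2, seed=0):
        rng = np.random.default_rng(seed)
        H = X
        for _ in range(layers):
            W = rng.standard_normal((H.shape[1], H.shape[1])) / np.sqrt(H.shape[1])
            H = np.tanh(A @ H @ W)          # aggregate neighbors, random transform
        return H @ H.T                      # (N, N) matrix of link scores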


Repulsive Score Distillation for Diverse Sampling of Diffusion Models

http://arxiv.org/abs/2406.16683v1

Compressor summary: RSD is a method to improve diversity and quality in visual generation using repulsion of particles based on their similarity.


Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation

http://arxiv.org/abs/2406.16678v1

Compressor summary: The paper presents a new model called Segment any Text (SaT) that can segment text into sentences robustly, adaptably, and efficiently in diverse domains and languages, especially for poorly formatted text.


Computational Approaches to the Detection of Lesser-Known Rhetorical Figures: A Systematic Survey and Research Challenges

http://arxiv.org/abs/2406.16674v1

Compressor summary: The text discusses the importance of detecting lesser-known rhetorical figures computationally to better understand texts, and presents an overview of their linguistic and computational aspects, as well as challenges in this domain.


CAVE: Controllable Authorship Verification Explanations

http://arxiv.org/abs/2406.16672v1

Compressor summary: CAVE is a model that generates structured and verifiable authorship verification (AV) explanations using a Llama-3-8B model, achieving good explanation quality and task accuracy on difficult datasets.


Cubic regularized subspace Newton for non-convex optimization

http://arxiv.org/abs/2406.16666v1

Compressor summary: This paper presents SSCN, a randomized coordinate second-order method for minimizing non-convex continuous functions in high-dimensional machine learning, with theoretical convergence guarantees and improved experimental performance.


Data-driven Modeling in Metrology -- A Short Introduction, Current Developments and Future Perspectives

http://arxiv.org/abs/2406.16659v1

Compressor summary: Data-driven models use digital technology, sensor networks, and computing hardware to create measurement systems that can interpret data and generate predictions in complex real-world contexts where expert understanding is limited.


Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

http://arxiv.org/abs/2406.16655v1

Compressor summary: The study shows that large language models can transfer knowledge-free reasoning across multiple languages well, but knowledge retrieval transfers poorly and may involve different mechanisms for each language.


Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment

http://arxiv.org/abs/2406.16641v1

Compressor summary: The paper proposes a new multi-modal prompt learning method for blind AI generated image quality assessment, which uses vision-language consistency knowledge to guide the optimization of the prompts and outperforms existing models.


Feature Fusion for Human Activity Recognition using Parameter-Optimized Multi-Stage Graph Convolutional Network and Transformer Models

http://arxiv.org/abs/2406.16638v1

Compressor summary: This paper shows that feature fusion improves human activity recognition accuracy by combining spatial and temporal features using deep learning models like CNNs and Transformers on various datasets.


ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

http://arxiv.org/abs/2406.16635v1

Compressor summary: ShadowLLM predicts attention head and neuron importance in large language models, enabling better sparsity patterns for speed and accuracy improvements.


MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network

http://arxiv.org/abs/2406.16633v1

Compressor summary: MLAAN is a new model that improves local learning methods for better performance and memory efficiency compared to end-to-end training.


Hacking a surrogate model approach to XAI

http://arxiv.org/abs/2406.16626v1

Compressor summary: The paper discusses how surrogate models, like decision trees, can be used to hide discriminatory behavior in algorithmic decision-making systems and calls for more research on their effectiveness in achieving explainability.


Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis

http://arxiv.org/abs/2406.16623v1

Compressor summary: The method learns object pose and part-segmentation from two observations of articulated objects using implicit models and a decoupled optimization procedure.


OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

http://arxiv.org/abs/2406.16620v1

Compressor summary: OmAgent is an advanced AI system that can efficiently store, retrieve, and understand detailed video content, using autonomous reasoning and tool-calling capabilities for complex tasks.


No More Sliding-Windows: Dynamic Functional Connectivity Based On Random Convolutions Without Learning

http://arxiv.org/abs/2406.16619v1

Compressor summary: The study introduces a random convolution feature expansion method that improves and stabilizes dynamic functional connectivity analysis compared to the widely used sliding-window method.


The Championship-Winning Solution for the 5th CLVISION Challenge 2024

http://arxiv.org/abs/2406.16615v1

Compressor summary: The paper presents a novel approach to class incremental learning using Winning Subnetworks and three training strategies, achieving the first rank in the CLVision Challenge.


Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings

http://arxiv.org/abs/2406.16611v1

Compressor summary: This paper surveys and evaluates pre-trained language models for medical tasks, showing their potential to perform well even with limited resources.


When Invariant Representation Learning Meets Label Shift: Insufficiency and Theoretical Insights

http://arxiv.org/abs/2406.16608v1

Compressor summary: The paper studies generalized label shift, a new assumption for dealing with changing environments in learning scenarios, and proposes a kernel embedding-based correction algorithm to improve generalization and knowledge transfer.


Cherry on the Cake: Fairness is NOT an Optimization Problem

http://arxiv.org/abs/2406.16606v1

Compressor summary: Fair cake-cutting studies how to divide resources fairly among participants, and its connection to supervised multi-label classification can help achieve optimal fair decisions in machine learning problems, sometimes at the cost of seemingly unfair cherry-picking.


CLEAR: Can Language Models Really Understand Causal Graphs?

http://arxiv.org/abs/2406.16605v1

Compressor summary: The authors investigate how well language models understand causal graphs and create CLEAR, a benchmark to measure their performance across different complexity levels.


Do As I Do: Pose Guided Human Motion Copy

http://arxiv.org/abs/2406.16601v1

Compressor summary: The text describes a new method for generating realistic fake videos of people copying motion from other videos, using perceptual and memory modules to improve image quality and temporal consistency.


Measuring the Recyclability of Electronic Components to Assist Automatic Disassembly and Sorting Waste Printed Circuit Boards

http://arxiv.org/abs/2406.16593v1

Compressor summary: The paper proposes a mathematical innovation model with AI integration to measure the recyclability of electronic waste components and improve their recovery and sorting processes.


Toward Fairer Face Recognition Datasets

http://arxiv.org/abs/2406.16592v1

Compressor summary: The authors propose a method to balance demographic attributes in generated face datasets for fairer and more transparent face recognition and verification.


Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting

http://arxiv.org/abs/2406.16567v1

Compressor summary: The paper proposes a knowledge-driven progressive thought prompting method to generate multi-turn psychology-related dialogues using large language models, aiming to improve performance in the low-resource psychology domain.


FASTC: A Fast Attentional Framework for Semantic Traversability Classification Using Point Cloud

http://arxiv.org/abs/2406.16564v1

Compressor summary: The paper presents a novel method for assessing traversability using point clouds, which combines PointNet with an encoder-decoder structure and spatio-temporal attention, achieving better performance than existing methods.


Are there identifiable structural parts in the sentence embedding whole?

http://arxiv.org/abs/2406.16563v1

Compressor summary: Transformer models' sentence embeddings contain layers of linguistic information that can be separated and analyzed, such as chunks and their properties.


EvalAlign: Evaluating Text-to-Image Models through Precision Alignment of Multimodal Large Models with Supervised Fine-Tuning to Human Annotations

http://arxiv.org/abs/2406.16562v1

Compressor summary: EvalAlign is a new evaluation metric for text-to-image generative models that uses multimodal large language models fine-tuned on human annotations to assess image faithfulness and text-image alignment, showing better stability and closer alignment with human preferences than existing metrics.


Efficient k-means with Individual Fairness via Exponential Tilting

http://arxiv.org/abs/2406.16557v1

Compressor summary: The paper introduces tilted k-means (TKM), a novel algorithm that achieves individual fairness in clustering by using a new objective function and a fairness metric, and proves its convergence under mild conditions.
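
A rough sketch of how exponential tilting can enter Lloyd-style updates: points farther from their centroid receive exponentially larger weight, pulling centroids toward poorly served individuals. This is an illustrative reading of the idea, not the paper's exact objective:

    import numpy as np

    def tilted_kmeans(X, k, t=1.0, iters=50, seed=0):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(seed)
        C = X[rng.choice(len(X), k, replace=False)].copy()
        for _ in range(iters):
            d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # (N, k) squared distances
            a = d.argmin(1)                                     # cluster assignments
            for j in range(k):
                pts, dj = X[a == j], d[a == j, j]
                if len(pts) == 0:
                    continue
                w = np.exp(t * (dj - dj.max()))     # tilted weights (stabilized)
                C[j] = (w[:, None] * pts).sum(0) / w.sum()
        return C, a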


Homomorphisms and Embeddings of STRIPS Planning Models

http://arxiv.org/abs/2406.16555v1

Compressor summary: The paper studies the complexity of comparing and embedding planning instances, and proposes an algorithm to find isomorphisms when possible, while also improving the efficiency of a SAT solver with constraint propagation.


LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

http://arxiv.org/abs/2406.16554v1

Compressor summary: Mixture-of-Experts (MoE) scales up LLMs but is data-hungry and unstable to train; the paper instead builds MoE models from existing dense LLMs via expert construction and continual pre-training, and the resulting LLaMA-MoE models outperform dense models with similar activation parameters.


Inference of Sequential Patterns for Neural Message Passing in Temporal Graphs

http://arxiv.org/abs/2406.16552v1

Compressor summary: HYPA-DBGNN is a novel approach that models temporal patterns in dynamic graphs by combining anomaly detection in time series data on graphs with a higher-order De Bruijn graph neural network.


Hierarchical B-frame Video Coding for Long Group of Pictures

http://arxiv.org/abs/2406.16544v1

Compressor summary: The paper presents a learned video codec for random access that performs better than or comparably to existing methods on common test conditions, using less data and better metrics.


Improving robustness to corruptions with multiplicative weight perturbations

http://arxiv.org/abs/2406.16540v1

Compressor summary: The paper proposes DAMP, a training method that improves DNNs' robustness to various corruptions by applying random multiplicative weight perturbations, and shows its effectiveness on image classification tasks.
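
A minimal PyTorch-style sketch of one training step with multiplicative weight perturbations: scale the weights by random noise, backpropagate through the perturbed model, then restore the clean weights before the optimizer update. The noise scale and Gaussian choice are assumptions:

    import torch

    def damp_step(model, loss_fn, batch, opt, sigma=0.1):
        noises = []
        with torch.no_grad():
            for p in model.parameters():
                z = 1 + sigma * torch.randn_like(p)   # multiplicative noise
                p.mul_(z)
                noises.append(z)
        loss = loss_fn(model, batch)                  # forward on perturbed weights
        loss.backward()
        with torch.no_grad():                         # undo the perturbation
            for p, z in zip(model.parameters(), noises):
                p.div_(z)
        opt.step()
        opt.zero_grad()
        return loss.item()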


Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization

http://arxiv.org/abs/2406.16537v1

Compressor summary: Character-Adapter is a framework for generating images with high-fidelity consistency of characters by using prompt-guided segmentation and dynamic region-level adapters.


C-LLM: Learn to Check Chinese Spelling Errors Character by Character

http://arxiv.org/abs/2406.16536v1

Compressor summary: The paper introduces C-LLM, a method that uses character-level tokenization to improve Chinese spell checking by addressing the limitations of large language models on character-level constraints.


Token-based Decision Criteria Are Suboptimal in In-context Learning

http://arxiv.org/abs/2406.16535v1

Compressor summary: Hidden Calibration improves In-Context Learning by using nearest centroid classification on hidden states instead of token probabilities, achieving better decision boundaries and 20% performance gain.
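
The stated mechanism is easy to sketch: build one centroid per class from the hidden states of a few labeled examples, then classify a query by its nearest centroid rather than by label-token probabilities. The centroid construction details here are illustrative:

    import numpy as np

    # hidden: list of last-layer hidden-state vectors for labeled examples.
    def hidden_calibration(hidden, labels, query_hidden):
        classes = sorted(set(labels))
        centroids = {c: np.mean([h for h, y in zip(hidden, labels) if y == c], axis=0)
                     for c in classes}
        return min(classes,
                   key=lambda c: np.linalg.norm(query_hidden - centroids[c]))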


GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

http://arxiv.org/abs/2406.16531v1

Compressor summary: This paper introduces the GIM dataset, which has large scale, rich content, and diverse manipulation, to evaluate image manipulation detection and location methods using a novel framework called GIMFormer that outperforms existing approaches.


Towards Better Graph-based Cross-document Relation Extraction via Non-bridge Entity Enhancement and Prediction Debiasing

http://arxiv.org/abs/2406.16529v1

Compressor summary: The paper proposes a graph-based model for cross-document relation extraction that incorporates non-bridge entities and debiasing strategies, achieving state-of-the-art results.


Evaluating the Ability of Large Language Models to Reason about Cardinal Directions

http://arxiv.org/abs/2406.16528v1

Compressor summary: The study evaluates large language models' ability to reason about cardinal directions using two datasets and finds that none of them can consistently perform well on the more complex dataset.


SyROCCo: Enhancing Systematic Reviews using Machine Learning

http://arxiv.org/abs/2406.16527v1

Compressor summary: The paper explores using machine learning techniques to help with systematic review processes by automating tasks such as categorizing publications, extracting key information, connecting evidence to existing datasets, and identifying subgroups of articles, with promising results for improving efficiency and analysis in reviews.


The Privileged Students: On the Value of Initialization in Multilingual Knowledge Distillation

http://arxiv.org/abs/2406.16524v1

Compressor summary: The paper explores how knowledge distillation can improve smaller models' performance in multilingual NLP tasks and proposes a method to enhance initialization by copying the teacher model's weights to the student model.


Carrot and Stick: Inducing Self-Motivation with Positive & Negative Feedback

http://arxiv.org/abs/2406.16521v1

Compressor summary: The CASTIC dataset contains sentences that help understand how positive and negative feedback can enhance self-motivation in various fields using computational methods.


Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces

http://arxiv.org/abs/2406.16518v1

Compressor summary: VMamba improves crack segmentation accuracy and efficiency on various surfaces compared to CNNs and Transformers.


Multi-Modal Vision Transformers for Crop Mapping from Satellite Image Time Series

http://arxiv.org/abs/2406.16513v1

Compressor summary: The paper introduces new multi-modal transformer-based architectures for crop mapping from satellite images that significantly outperform existing methods.


Large Vocabulary Size Improves Large Language Models

http://arxiv.org/abs/2406.16508v1

Compressor summary: The paper explores how subword vocabulary size affects large language model performance and proposes a method for adapting vocabulary during continual training.


LOGCAN++: Local-global class-aware network for semantic segmentation of remote sensing images

http://arxiv.org/abs/2406.16502v1

Compressor summary: LOGCAN++ is a semantic segmentation model for remote sensing images that uses global and local class awareness to handle complex backgrounds, scale and orientation variations, and large intra-class variance.


OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser

http://arxiv.org/abs/2406.16495v1

Compressor summary: The paper proposes a new language model architecture that combines Mamba and Transformer, using a position information injection method and a biomimetic Observer-Thinker-Conceiver-Expresser (OTCE) design to achieve better performance in language modeling tasks.


eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure

http://arxiv.org/abs/2406.16490v1

Compressor summary: The study compares different methods for classifying legal data using large language models and finds that the zero-shot method performs well, achieving an F1 score of 64%.


Deepfake tweets automatic detection

http://arxiv.org/abs/2406.16489v1

Compressor summary: The study uses NLP techniques and machine learning models to find ways to spot DeepFake tweets and improve the quality of online information.


Towards Comprehensive Preference Data Collection for Reward Modeling

http://arxiv.org/abs/2406.16486v1

Compressor summary: The text proposes a new framework for collecting preference data in reinforcement learning from human feedback, which improves the quality of language model responses and reduces human labor.


Improving Quaternion Neural Networks with Quaternionic Activation Functions

http://arxiv.org/abs/2406.16481v1

Compressor summary: The paper introduces new quaternion activation functions that leverage the properties of the quaternion space and show improved performance on image classification tasks.


EMMI -- Empathic Multimodal Motivational Interviews Dataset: Analyses and Annotations

http://arxiv.org/abs/2406.16478v1

Compressor summary: The study analyzes multimodal interaction in motivational interviewing conversations, categorizes patients into different types, and develops a virtual agent that adapts its social and empathic behaviors based on the patient's behavior.


DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution

http://arxiv.org/abs/2406.16477v1

Compressor summary: This paper proposes using degradation-aligned language prompts to improve image super-resolution and address challenges such as semantic loss, visual artifacts, and visual hallucinations.


ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

http://arxiv.org/abs/2406.16476v1

Compressor summary: ResMaster is a method that uses a low-res reference image to guide high-res image generation, improving quality and detail while maintaining coherence.


Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

http://arxiv.org/abs/2406.16473v1

Compressor summary: SCIU is a two-stage framework that removes uncertainty from DFER datasets by pruning low-quality samples and correcting mislabeled data, improving performance.


Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration

http://arxiv.org/abs/2406.16469v1

Compressor summary: The paper proposes a semi-automated pipeline to create culturally inclusive vision-language models benchmarks using human-VLM collaboration, demonstrating its effectiveness with an example dataset for Korean culture.


The Hidden Pitfalls of the Cosine Similarity Loss

http://arxiv.org/abs/2406.16468v1

Compressor summary: The paper analyzes how cosine similarity between points affects their magnitude and convergence in self-supervised learning, and proposes a simple initialization method called cut-initialization to improve performance.


InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection

http://arxiv.org/abs/2406.16464v1

Compressor summary: InterCLIP-MEP is a framework for detecting sarcasm in text-image combinations on social media, using a refined version of CLIP and a Memory-Enhanced Predictor to overcome biases and achieve state-of-the-art performance.


Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

http://arxiv.org/abs/2406.16459v1

Compressor summary: The paper proposes a method for super-resolution that uses uncertainty-based representation of image degradation to enable self-supervised learning and improve performance.


Automated Privacy-Preserving Techniques via Meta-Learning

http://arxiv.org/abs/2406.16456v1

Compressor summary: AUTOPRIV is a meta-learning method that automates privacy-preserving data de-identification for machine learning tasks, reducing computational complexity and energy consumption.


Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models

http://arxiv.org/abs/2406.16455v1

Compressor summary: The authors propose an approach for detecting unsafe medical product recommendations from generative AI models and test it on a large language model.


Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers

http://arxiv.org/abs/2406.16450v1

Compressor summary: This study explores linear layer approximations in transformer-based large language models to reduce parameters, computational costs, and improve training dynamics, achieving better perplexity and throughput performance.


Evaluating and Analyzing Relationship Hallucinations in LVLMs

http://arxiv.org/abs/2406.16449v1

Compressor summary: R-Bench is a new benchmark for evaluating vision relationship hallucinations in LVLMs, which highlights three types of co-occurrences leading to hallucinations and shows LVLMs' limitations in visual reasoning.


EmoLLM: Multimodal Emotional Understanding Meets Large Language Models

http://arxiv.org/abs/2406.16442v1

Compressor summary: EmoBench evaluates MLLMs' emotional understanding abilities using images, videos, and text, while EmoLLM is a novel model that improves their performance significantly.


UniCoder: Scaling Code Large Language Model via Universal Code

http://arxiv.org/abs/2406.16441v1

Compressor summary: The authors propose UniCode, an intermediate representation of algorithm steps using programming language conventions, to improve LLMs on code generation tasks, and demonstrate its effectiveness with the UniCoder-Instruct dataset.


Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments

http://arxiv.org/abs/2406.16439v1

Compressor summary: CTTA is a technique for adapting models to continually changing domains, but it faces challenges from low-quality pseudo-labels and random parameter restoration; CTAOD addresses these issues with object-level contrastive learning, a dynamic threshold strategy, and data-driven stochastic restoration.


Theory on Mixture-of-Experts in Continual Learning

http://arxiv.org/abs/2406.16437v1

Compressor summary: The paper analyzes how the Mixture-of-Experts model can mitigate catastrophic forgetting in continual learning by diversifying and specializing tasks among multiple experts, and provides theoretical results and empirical evidence.


Multi-threshold Deep Metric Learning for Facial Expression Recognition

http://arxiv.org/abs/2406.16434v1

Compressor summary: The paper proposes multi-threshold deep metric learning for facial expression recognition, which improves performance by using different thresholds to create distinct expression feature representations.


Dynamic Pseudo Label Optimization in Point-Supervised Nuclei Segmentation

http://arxiv.org/abs/2406.16427v1

Compressor summary: DoNuSeg is a framework that uses class activation maps to generate adaptive and accurate pseudo masks for nuclei segmentation with point supervision, reducing the annotation burden.


Fault Detection for agents on power grid topology optimization: A Comprehensive analysis

http://arxiv.org/abs/2406.16426v1

Compressor summary: The paper uses clustering and machine learning to analyze and predict failures in power grids optimized by Deep Reinforcement Learning agents.


Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization

http://arxiv.org/abs/2406.16424v1

Compressor summary: MEMENTO is an RL approach that uses memory to improve neural solvers' adaptation at inference time for combinatorial optimization problems, enhancing performance under a given budget.


Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting

http://arxiv.org/abs/2406.16422v1

Compressor summary: The paper proposes a Frequency-Aware Prompting method with mutual attention to enhance Cross-Domain Few-Shot classification by selecting different frequency cues and learning generalizable inductive bias.


Multilingual Knowledge Editing with Language-Agnostic Factual Neurons

http://arxiv.org/abs/2406.16416v1

Compressor summary: The paper proposes a new method for editing multilingual knowledge in large language models by leveraging shared neurons that represent the same factual knowledge across languages.


High-resolution open-vocabulary object 6D pose estimation

http://arxiv.org/abs/2406.16384v1

Compressor summary: Horyon is an open-vocabulary VLM-based architecture that uses textual prompts to identify unseen objects and estimate their relative pose between two scenes, achieving state-of-the-art performance on four datasets.


UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models

http://arxiv.org/abs/2406.16382v1

Compressor summary: The text explores the ability of large language models (LLMs) to make sequential decisions using UNO Arena, a card game-based environment, and proposes TUTRI player, which improves LLM performance by reflecting on actions and strategy.


On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

http://arxiv.org/abs/2406.16377v1

Compressor summary: The paper shows how three adaptation tools can be used interchangeably to improve large language models and suggests a framework with six transformation directions for practical applications.


KEHRL: Learning Knowledge-Enhanced Language Representations with Hierarchical Reinforcement Learning

http://arxiv.org/abs/2406.16374v1

Compressor summary: The paper proposes a method to integrate knowledge from graphs into language models using hierarchical reinforcement learning, which improves their performance on natural language understanding tasks.


UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding

http://arxiv.org/abs/2406.16372v1

Compressor summary: This paper presents a novel mechanism for enhancing cross-lingual natural language understanding by augmenting training data with context-aware semantic knowledge from multiple languages.


MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling

http://arxiv.org/abs/2406.16360v1

Compressor summary: MIRReS is a novel inverse rendering framework that reconstructs geometry, material, and lighting from multi-view images using explicit geometry, multi-bounce path tracing, Monte Carlo integration, and reservoir sampling.


Towards Lightweight Graph Neural Network Search with Curriculum Graph Sparsification

http://arxiv.org/abs/2406.16357v1

Compressor summary: GASSIP is a novel method that searches for optimal lightweight Graph Neural Networks (GNNs) by jointly optimizing graph data sparsification and architecture pruning, achieving high performance on node classification tasks with fewer parameters and a sparser graph.


Evaluation of Instruction-Following Ability for Large Language Models on Story-Ending Generation

http://arxiv.org/abs/2406.16356v1

Compressor summary: The paper proposes an automatic evaluation method for assessing how well large language models follow instructions in generating story endings using machine reading comprehension.


Compact Model Parameter Extraction via Derivative-Free Optimization

http://arxiv.org/abs/2406.16355v1

Compressor summary: The paper presents a method for compact model parameter extraction using derivative-free optimization, addressing critical issues in device modeling with a carefully chosen loss function and train-test split, and applies it to two semiconductor devices.


METRIK: Measurement-Efficient Randomized Controlled Trials using Transformers with Input Masking

http://arxiv.org/abs/2406.16351v1

Compressor summary: METRIK is a framework that optimizes planned missing designs for clinical trials using transformers and input masking, reducing data collection costs and improving imputation performance.


AnnotatedTables: A Large Tabular Dataset with Language Model Annotations

http://arxiv.org/abs/2406.16349v1

Compressor summary: The authors develop a method to automatically annotate tabular data using large language models and demonstrate its usefulness in various tasks, such as SQL translation and tabular classification.


Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks

http://arxiv.org/abs/2406.16346v1

Compressor summary: The text proposes fine-tuning large language and visual models using LORA with domain-specific instructional datasets to generate more precise results, demonstrating improvement on the YouCook2 dataset.


ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks

http://arxiv.org/abs/2406.16342v1

Compressor summary: ADVSCORE is a metric that measures how well an adversarial dataset fools models and not humans, and it helps create high-quality adversarial datasets like ADVQA for question answering.


EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

http://arxiv.org/abs/2406.16341v1

Compressor summary: EHRCon is a new dataset and task for verifying data consistency between structured and unstructured EHR elements, while CheckEHR is a framework using large language models to check this consistency.


VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

http://arxiv.org/abs/2406.16338v1

Compressor summary: VideoHallucer is a benchmark for detecting and categorizing hallucinations in large video-language models, revealing issues with current models and informing the development of the self-PEP framework.


Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models

http://arxiv.org/abs/2406.16333v1

Compressor summary: The paper proposes a diffusion-based framework to improve Text-to-Image generation by predicting object locations and generating consistent images based on textual descriptions.


Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging

http://arxiv.org/abs/2406.16330v1

Compressor summary: MKA is a novel compression technique that uses manifold learning and NPIB to merge similar layers in large language models, achieving substantial compression ratios while preserving performance.


Multimodal Graph Benchmark

http://arxiv.org/abs/2406.16321v1

Compressor summary: MM-GRAPH is a new benchmark for multi-modal graph learning that uses text and visual information to better evaluate graph neural networks.


What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Noise-free Text-Image Corruption and Evaluation

http://arxiv.org/abs/2406.16320v1

Compressor summary: NOTICE is a pipeline for understanding how vision-language models make decisions by applying semantically meaningful, noise-free corruptions to images and text and evaluating the models' responses.


Modelled Multivariate Overlap: A method for measuring vowel merger

http://arxiv.org/abs/2406.16319v1

Compressor summary: The paper proposes a new method for measuring vowel overlap by jointly modelling acoustic dimensions and simulating distributions from the model, which improves results over existing methods and allows for uncertainty computation.


Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?

http://arxiv.org/abs/2406.16316v1

Compressor summary: The paper explores if aligning Japanese language models with mostly English resources affects their moral alignment with Japanese culture and finds mixed results.


Anomaly Detection of Tabular Data Using LLMs

http://arxiv.org/abs/2406.16308v1

Compressor summary: The paper shows that large language models can detect tabular anomalies without extra training, and proposes a method to fine-tune them for better performance.


Artistic-style text detector and a new Movie-Poster dataset

http://arxiv.org/abs/2406.16307v1

Compressor summary: The paper proposes a method using Criss-Cross Attention and residual dense block to improve text detection in artistic-style text, and introduces a new Movie-Poster dataset for this task.


Cascade Reward Sampling for Efficient Decoding-Time Alignment

http://arxiv.org/abs/2406.16306v1

Compressor summary: CARDS is a technique that generates high-reward and high-likelihood text efficiently by iteratively creating small semantic segments based on predictive uncertainty.
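
A rough sketch of segment-level rejection sampling in the spirit of the summary: propose a small segment, keep it only if a reward model scores the extended text above a threshold. `sample_segment`, `reward`, and the fixed threshold are hypothetical; CARDS additionally uses predictive uncertainty to place segment boundaries:

    def cascade_reward_sample(prompt, sample_segment, reward,
                              threshold=0.0, max_segments=20, max_tries=8):
        text = prompt
        for _ in range(max_segments):
            chosen = None
            for _ in range(max_tries):                # resample until accepted
                candidate = text + sample_segment(text)
                chosen = chosen or candidate          # fallback: first draft
                if reward(candidate) >= threshold:
                    chosen = candidate
                    break
            text = chosen
        return text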


UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

http://arxiv.org/abs/2406.16301v1

Compressor summary: The paper introduces BiSSV, a bimodal video summarization task that uses a large dataset (BIDS) and a unified framework (UBiSS) to generate both visual and textual summaries of videos.


Landscaping Linear Mode Connectivity

http://arxiv.org/abs/2406.16300v1

Compressor summary: This paper proposes a model to understand the linear mode connectivity phenomenon in neural networks by analyzing the topological features of the loss landscape.


Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other

http://arxiv.org/abs/2406.16299v1

Compressor summary: The paper proposes Learnable Singular Value Increment (LSI), a method that improves quantization accuracy by making weight singular values learnable and allowing them to compensate each other based on activations, achieving state-of-the-art results in various settings.
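
A minimal PyTorch sketch of the stated parametrization: freeze the SVD factors of a quantized weight and learn only an additive increment on its singular values. The module structure is an illustrative reading, not the paper's implementation:

    import torch

    class LSILinear(torch.nn.Module):
        def __init__(self, W_quant):
            super().__init__()
            U, S, Vh = torch.linalg.svd(W_quant, full_matrices=False)
            self.register_buffer("U", U)      # frozen left factor
            self.register_buffer("S", S)      # frozen singular values
            self.register_buffer("Vh", Vh)    # frozen right factor
            self.delta = torch.nn.Parameter(torch.zeros_like(S))  # learnable increment

        def forward(self, x):
            W = self.U @ torch.diag(self.S + self.delta) @ self.Vh
            return x @ W.T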


Priorformer: A UGC-VQA Method with content and distortion priors

http://arxiv.org/abs/2406.16297v1

Compressor summary: The paper proposes a novel model called PriorFormer for blind video quality assessment of user-generated content videos, which improves adaptability and representation capability using content and distortion priors.


Relaxing Continuous Constraints of Equivariant Graph Neural Networks for Physical Dynamics Learning

http://arxiv.org/abs/2406.16295v1

Compressor summary: The paper proposes a Discrete Equivariant Graph Neural Network (DEGNN) that captures discrete symmetries in physical dynamics modeling and outperforms existing methods in various applications.


LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments

http://arxiv.org/abs/2406.16294v1

Compressor summary: LangSuitE is a testbed for evaluating language models' abilities as embodied agents in dynamic environments, using a novel chain-of-thought schema called EmMem to summarize embodied states.


Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels

http://arxiv.org/abs/2406.16293v1

Compressor summary: The paper proposes MLPAC, an RL-based framework that combines exploration and exploitation abilities for partially annotated multi-label tasks like document-level relation extraction.


Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

http://arxiv.org/abs/2406.16289v1

Compressor summary: The paper presents a crowd-sourced framework that uses data from production vehicles to reconstruct large-scale street scenes with NeRF, combining modules for data selection, 3D reconstruction, appearance embedding, depth supervision, and occlusion completion to generate high-quality 3D street views and guide the driver with a synthesized video.


PlagBench: Exploring the Duality of Large Language Models in Plagiarism Generation and Detection

http://arxiv.org/abs/2406.16288v1

Compressor summary: PlagBench is a dataset for testing plagiarism detection in large language models and shows that some LLMs can outperform current commercial checkers.


Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation

http://arxiv.org/abs/2406.16282v1

Compressor summary: This paper proposes a theory and methods to reduce memory overhead in fine-tuning large models, using new activation functions and layer normalization techniques.


SegNet4D: Effective and Efficient 4D LiDAR Semantic Segmentation in Autonomous Driving Environments

http://arxiv.org/abs/2406.16279v1

Compressor summary: SegNet4D is a novel, fast, and accurate method for real-time 4D LiDAR semantic segmentation in autonomous vehicles using a projection-based approach and instance-aware segmentation.


Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection

http://arxiv.org/abs/2406.16275v1

Compressor summary: The paper proposes FAILOpt, an attack to deceive AI text detectors by exploiting prompt-specific shortcuts, and uses it to enhance the robustness of the detector.


YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals

http://arxiv.org/abs/2406.16273v1

Compressor summary: YouDream is a text-to-image diffusion model that generates high-quality, anatomically controllable 3D animals using a 2D pose prior and a multi-agent LLM to adapt poses.


Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement

http://arxiv.org/abs/2406.16272v1

Compressor summary: Patcher is an automated repair approach for text-to-image models that fixes semantic inconsistencies by enhancing features of neglected objects in the prompt.


Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation

http://arxiv.org/abs/2406.16271v1

Compressor summary: GBMSeg is a training-free AI framework that automatically segments the glomerular basement membrane in TEM images using one labeled reference, achieving superior performance and robustness.


One Thousand and One Pairs: A "novel" challenge for long-context language models

http://arxiv.org/abs/2406.16264v1

Compressor summary: NoCha is a dataset to test long-context LLMs' ability to retrieve, synthesize, and reason over book-length inputs, which requires global reasoning and proves challenging for current models.


Video-Infinity: Distributed Long Video Generation

http://arxiv.org/abs/2406.16260v1

Compressor summary: Video-Infinity is a distributed inference pipeline that uses clip parallelism and dual-scope attention to enable fast generation of long videos across multiple GPUs.


Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

http://arxiv.org/abs/2406.16257v1

Compressor summary: S3T is a framework for exact machine unlearning that minimizes retraining costs and service disruptions by sequentially training model layers with disjoint data slices and handling multiple deletion requests simultaneously.


Uncertainty-Aware Reward-Free Exploration with General Function Approximation

http://arxiv.org/abs/2406.16255v1

Compressor summary: The paper proposes a reward-free reinforcement learning algorithm, GFA-RFE, which uses uncertainty-aware exploration and weighted learning to improve sample efficiency in heterogeneous environments.


Confidence Regulation Neurons in Language Models

http://arxiv.org/abs/2406.16254v1

Compressor summary: This study explores how large language models represent and regulate uncertainty using entropy and token frequency neurons, which affect the normalization scale and logit distribution, respectively.


LLMs assist NLP Researchers: Critique Paper (Meta-)Reviewing

http://arxiv.org/abs/2406.16253v1

Compressor summary: This study explores how large language models can assist researchers in reviewing and identifying issues in NLP papers using a new dataset called ReviewCritique.


Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis

http://arxiv.org/abs/2406.16252v1

Compressor summary: The paper introduces a graph-augmented LLM framework that enhances the personalization and clarity of health insights by capturing inter and intra-patient relationships and using dynamic feature importance scores.


An Optimal Tightness Bound for the Simulation Lemma

http://arxiv.org/abs/2406.16249v1

Compressor summary: The paper proposes an improved bound for value-prediction error in reinforcement learning and hierarchical abstraction, addressing model misspecification and compounding probability errors.
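
For context, one common statement of the classical simulation lemma that such analyses tighten: if two MDPs' rewards differ by at most \epsilon_R everywhere and their transition kernels by at most \epsilon_P in total variation, then for any policy \pi and discount \gamma,

    \left| V^{\pi}_{M}(s) - V^{\pi}_{\widehat{M}}(s) \right|
        \le \frac{\epsilon_R}{1-\gamma}
          + \frac{\gamma \, \epsilon_P \, R_{\max}}{(1-\gamma)^2}.

The (1-\gamma)^{-2} term is where compounding transition errors enter, which is exactly the looseness a tighter bound targets.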