arxiv compressed, 2024-09-17

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-17, generated by the compressor, my personal LLM-based project.


RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

http://arxiv.org/abs/2409.10516v1

Compressor summary: RetrievalAttention speeds up attention computation by using approximate nearest neighbor search and reducing data access to exploit sparsity, achieving sub-linear time complexity and lower GPU memory requirements.
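The idea of attending only to retrieved keys can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and `retrieve_top_k` uses an exact search as a stand-in for an approximate nearest neighbor index, so this sketch is not itself sub-linear.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(q, keys, values, indices):
    # Scaled dot-product attention restricted to the given key indices.
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, keys[i]) / scale for i in indices])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, indices))
            for d in range(dim)]

def retrieve_top_k(q, keys, k):
    # ASSUMPTION: exact top-k by inner product, standing in for an
    # approximate nearest neighbor search over the keys.
    order = sorted(range(len(keys)), key=lambda i: dot(q, keys[i]), reverse=True)
    return order[:k]

def retrieval_attention(q, keys, values, k):
    # Attend only to the k keys most similar to the query.
    return attend(q, keys, values, retrieve_top_k(q, keys, k))
```

With `k` equal to the number of keys this reduces to ordinary full attention; smaller `k` trades accuracy for touching fewer key/value vectors.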


DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction

http://arxiv.org/abs/2409.10504v1

Compressor summary: The DIctionary Label Attention module disentangles dense embeddings into sparse ones, making medical code predictions more accurate and interpretable by uncovering thousands of learned medical concepts.


Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles

http://arxiv.org/abs/2409.10502v1

Compressor summary: The paper shows that Transformers trained on logical solution steps can learn to solve Sudoku and Zebra puzzles, and that a reasoning engine is implicitly encoded in their weights.


Partial Distribution Matching via Partial Wasserstein Adversarial Networks

http://arxiv.org/abs/2409.10499v1

Compressor summary: The paper proposes a partial distribution matching method using a Wasserstein adversarial network and shows its effectiveness in point set registration and domain adaptation tasks.


Flash STU: Fast Spectral Transform Units

http://arxiv.org/abs/2409.10489v1

Compressor summary: The paper presents an efficient PyTorch implementation of the Spectral Transform Unit (STU) that outperforms the Transformer and state space models on sequence prediction tasks such as language, robotics, and simulated systems.


Do Pre-trained Vision-Language Models Encode Object States?

http://arxiv.org/abs/2409.10488v1

Compressor summary: The paper examines if vision-language models can recognize physical states of objects over time and suggests improvements for better performance.


Schrodinger's Memory: Large Language Models

http://arxiv.org/abs/2409.10482v1

Compressor summary:


SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

http://arxiv.org/abs/2409.10476v1

Compressor summary: The paper proposes improving DDIM inversion for image editing by disentangling the guidance scale and using a theoretically derived scale (0.5), leading to better performance and efficiency.


MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion

http://arxiv.org/abs/2409.10473v1

Compressor summary: Masked Conditional Diffusion (MacDiff) uses diffusion models and random masking to learn effective representations for human skeleton understanding, achieving state-of-the-art performance on benchmarks and improving fine-tuning in scarce labeled data scenarios.


Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons

http://arxiv.org/abs/2409.10463v1

Compressor summary: This paper compares MLPs and KANs for modeling complex relationships with a focus on low-data regimes, introducing an effective technique to design MLPs with individualized activation functions that achieve higher predictive accuracy.


Signed Graph Autoencoder for Explainable and Polarization-Aware Network Embeddings

http://arxiv.org/abs/2409.10452v1

Compressor summary: SGAAE is an explainable graph generative model that extracts node representations from signed networks based on polarization and archetypes, and shows high performance in signed link prediction tasks.


Deep-Wide Learning Assistance for Insect Pest Classification

http://arxiv.org/abs/2409.10445v1

Compressor summary: Key points:
- DeWi is a novel learning assistance for insect pest classification that uses a one-stage and alternating training strategy.
- It improves several Convolutional Neural Networks in discrimination and generalization by optimizing a triplet margin loss and data augmentation.
- It achieves the highest performances on two insect pest classification benchmarks.
Summary: DeWi is a new method for classifying insect pests that uses a combined training strategy to improve both discrimination and generalization of Convolutional Neural Networks, resulting in the best performance on two datasets.


Structure-preserving learning for multi-symplectic PDEs

http://arxiv.org/abs/2409.10432v1

Compressor summary: The paper proposes a machine learning method to infer energy-preserving reduced-order models from PDEs using data without requiring fully discrete operators or intrusive knowledge of the PDEs.


A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration

http://arxiv.org/abs/2409.10403v1

Compressor summary: The paper presents a method to improve disease diagnosis by using structured knowledge from external sources and enhancing language models' performance and interpretability on three datasets.


Revising the Structure of Recurrent Neural Networks to Eliminate Numerical Derivatives in Forming Physics Informed Loss Terms with Respect to Time

http://arxiv.org/abs/2409.10388v1

Compressor summary: MI-RNN is a modified RNN that predicts over time intervals and can solve unsteady PDEs more accurately without numerical derivatives.


Mamba-ST: State Space Model for Efficient Style Transfer

http://arxiv.org/abs/2409.10385v1

Compressor summary: Mamba-ST is an efficient State-Space Model that performs style transfer by simulating cross-attention layers without extra modules, improving quality and reducing computational burden compared to transformers and diffusion models.


Instigating Cooperation among LLM Agents Using Adaptive Information Modulation

http://arxiv.org/abs/2409.10372v1

Compressor summary: The paper presents a new method that uses large language models and reinforcement learning to create more cooperative team behaviors in simulations.


Uncovering the Mechanism of Hepatotoxicity of PFAS Targeting L-FABP Using GCN and Computational Modeling

http://arxiv.org/abs/2409.10370v1

Compressor summary: This study develops a novel approach using graph convolutional networks (GCNs) and molecular descriptors to predict the toxicity of per- and polyfluoroalkyl substances (PFAS), which are persistent environmental pollutants with known health concerns.


Robust image representations with counterfactual contrastive learning

http://arxiv.org/abs/2409.10365v1

Compressor summary: Counterfactual contrastive learning creates positive pairs for medical imaging that capture relevant domain variations, improving generalisation and performance on both in-distribution and out-of-distribution data.


2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation?

http://arxiv.org/abs/2409.10357v1

Compressor summary: This paper explores how using 2D or 3D joint coordinates as training data affects the quality of speech-to-gesture deep generative models and compares the results with human gestures.


Taming Diffusion Models for Image Restoration: A Review

http://arxiv.org/abs/2409.10353v1

Compressor summary: This paper reviews diffusion models' applications in image restoration tasks, discussing their techniques, challenges, and future directions.


Detecting Sexism in German Online Newspaper Comments with Open-Source Text Embeddings (Team GDA, GermEval2024 Shared Task 1: GerMS-Detect, Subtasks 1 and 2, Closed Track)

http://arxiv.org/abs/2409.10341v1

Compressor summary: The authors propose a method to detect sexism and misogyny in German online comments using text embeddings, achieving competitive results in a challenge and showing potential for scalability.


Hyperedge Modeling in Hypergraph Neural Networks by using Densest Overlapping Subgraphs

http://arxiv.org/abs/2409.10340v1

Compressor summary: The paper introduces hypergraphs, which extend graphs by allowing multiple nodes to be connected by a single hyperedge, and proposes a novel algorithm (DOSAGE) for finding densest overlapping subgraphs in hypergraphs that improves node classification performance.


The 20 questions game to distinguish large language models

http://arxiv.org/abs/2409.10338v1

Compressor summary: The paper proposes a method to distinguish between large language models using a small number of binary questions, which could be useful for detecting model leaks.


InfoDisent: Explainability of Image Classification Models by Information Disentanglement

http://arxiv.org/abs/2409.10329v1

Compressor summary: InfoDisent is a hybrid model that combines post-hoc and intrinsic methods to better understand and interpret the decisions made by pre-trained image classification networks.


Baking Relightable NeRF for Real-time Direct/Indirect Illumination Rendering

http://arxiv.org/abs/2409.10327v1

Compressor summary: The paper proposes a method to perform real-time relighting using a CNN renderer for direct illumination and a hash grid-based renderer for indirect illumination, trained with distillation from a pre-trained teacher model.


PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

http://arxiv.org/abs/2409.10141v1

Compressor summary: PSHuman uses cross-scale diffusion and parametric models to reconstruct detailed and photorealistic 3D human meshes from monocular RGB images, addressing challenges like self-occlusions and clothing topology.


StruEdit: Structured Outputs Enable the Fast and Accurate Knowledge Editing for Large Language Models

http://arxiv.org/abs/2409.10132v1

Compressor summary: StruEdit is a method to update large language models' answers with current knowledge by editing structured reasoning triplets, improving accuracy and speed.


Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection

http://arxiv.org/abs/2409.10111v1

Compressor summary: This study evaluates whether instance incremental or batch incremental learning is better for real-world fraud detection problems with delayed labels, finding that batch incremental models perform similarly or better in terms of predictive performance and interpretability.


A Comparative Study of Open Source Computer Vision Models for Application on Small Data: The Case of CFRP Tape Laying

http://arxiv.org/abs/2409.10104v1

Compressor summary: The study explores how Transfer Learning can help train AI models in small data contexts for quality control of CFRP tape laying in aerospace manufacturing using optical sensors.


Robust Reinforcement Learning with Dynamic Distortion Risk Measures

http://arxiv.org/abs/2409.10096v1

Compressor summary: The paper proposes a framework for robust risk-aware reinforcement learning using dynamic distortion risk measures, neural networks, and actor-critic algorithms to handle uncertainty and environmental dynamics.


DDoS: Diffusion Distribution Similarity for Out-of-Distribution Detection

http://arxiv.org/abs/2409.10094v1

Compressor summary: The paper proposes a diffusion-based OoD detection framework that uses a novel similarity metric in feature and probability spaces to measure distribution disparities between original and generated images, achieving better performance than existing methods.


MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior

http://arxiv.org/abs/2409.10090v1

Compressor summary: MotionCom is a novel image composition method that uses a large vision language model and video diffusion prior for automatic integration of objects into new scenes with realistic motion and interaction.


A Riemannian Approach to Ground Metric Learning for Optimal Transport

http://arxiv.org/abs/2409.10085v1

Compressor summary: This paper proposes learning a latent ground metric for optimal transport distances in machine learning and signal processing applications using Riemannian geometry.


DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion

http://arxiv.org/abs/2409.10080v1

Compressor summary: The paper proposes a new framework called DAE-Fuse that uses a two-phase autoencoder to generate sharp and natural fused images from different imaging modalities.


LLM-DER: A Named Entity Recognition Method Based on Large Language Models for Chinese Coal Chemical Domain

http://arxiv.org/abs/2409.10077v1

Compressor summary: The paper proposes LLM-DER, a framework that uses large language models to enrich entity information and evaluate plausibility for complex domain-specific entity recognition in Chinese.


Steinmetz Neural Networks for Complex-Valued Data

http://arxiv.org/abs/2409.10075v1

Compressor summary: Steinmetz Neural Networks use real-valued subnetworks with coupled outputs to process complex-valued data and Analytic Neural Networks enforce analytic signal representations for better generalization error bounds and performance.


Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks

http://arxiv.org/abs/2409.10070v1

Compressor summary: The paper proposes using semantic information from goal-oriented human-human dialogues to improve summarization and introduces a new dataset version for research on this topic.


Enhancing Anomaly Detection via Generating Diversified and Hard-to-distinguish Synthetic Anomalies

http://arxiv.org/abs/2409.10069v1

Compressor summary: The paper introduces a domain-agnostic method for unsupervised anomaly detection using conditional perturbators and a discriminator that generates diverse and hard-to-distinguish synthetic anomalies.


Spatiotemporal Covariance Neural Networks

http://arxiv.org/abs/2409.10068v1

Compressor summary: The STVNN model uses joint spatiotemporal convolutions to process multivariate time series, offering greater stability than traditional methods like PCA in dynamic data settings.


MindGuard: Towards Accessible and Stigma-free Mental Health First Aid via Edge LLM

http://arxiv.org/abs/2409.10064v1

Compressor summary: MindGuard is a mobile mental healthcare system using an LLM to provide personalized screening and intervention conversations, addressing the low treatment rate due to stigma and improving accessibility in mental healthcare.


Householder Pseudo-Rotation: A Novel Approach to Activation Editing in LLMs with Direction-Magnitude Perspective

http://arxiv.org/abs/2409.10053v1

Compressor summary: The paper proposes a new activation editing method for large language models that preserves activation magnitudes and improves performance on safety benchmarks by rotating activations instead of adding steering vectors.
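The magnitude-preservation point can be illustrated with a plain Householder reflection, which changes a vector's direction but never its norm, whereas adding a steering vector generally changes both. This is a small sketch of that general property only; the function names are hypothetical and it is not the paper's editing procedure.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def householder_reflect(x, v):
    # Apply H = I - 2 v v^T / (v^T v) to x: reflect x across the
    # hyperplane orthogonal to v. H is orthogonal, so |Hx| == |x|.
    coef = 2.0 * dot(v, x) / dot(v, v)
    return [xi - coef * vi for xi, vi in zip(x, v)]

def add_steering(x, s):
    # Additive steering: shifts the activation and usually changes its norm.
    return [xi + si for xi, si in zip(x, s)]

x = [3.0, 4.0]                             # toy "activation" with norm 5
reflected = householder_reflect(x, [1.0, 0.0])
steered = add_steering(x, [1.0, 0.0])
print(norm(x), norm(reflected), norm(steered))
```

The reflected vector keeps norm 5 while the steered one does not, which is the contrast the summary draws between rotation-style edits and additive steering.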


Global Lightning-Ignited Wildfires Prediction and Climate Change Projections based on Explainable Machine Learning Models

http://arxiv.org/abs/2409.10046v1

Compressor summary: This study develops machine learning models to characterize and predict lightning-ignited wildfires globally, showing that climate change increases their risk and highlighting the importance of tailored models for different types of fires.


Learning Latent Wireless Dynamics from Channel State Information

http://arxiv.org/abs/2409.10045v1

Compressor summary: The paper introduces a new machine learning technique to model and predict wireless channel dynamics using compressed representations of channel state information.


Benchmarking Large Language Model Uncertainty for Prompt Optimization

http://arxiv.org/abs/2409.10044v1

Compressor summary: The paper introduces a benchmark dataset to evaluate different types of uncertainty in LLMs, highlighting the need for improved metrics that better guide prompt optimization.


DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments

http://arxiv.org/abs/2409.10041v1

Compressor summary: DENSER uses wavelets to improve 3D Gaussian splatting for dynamic urban scene reconstruction, outperforming existing methods on the KITTI dataset.


On the Diagram of Thought

http://arxiv.org/abs/2409.10038v1

Compressor summary: Diagram of Thought (DoT) is a framework for modeling iterative reasoning in large language models using directed acyclic graphs, enhancing logical consistency and soundness while improving reasoning capabilities.


AttnMod: Attention-Based New Art Styles

http://arxiv.org/abs/2409.10028v1

Compressor summary: AttnMod modifies cross attention in diffusion models to create new art styles that are not achievable with standard prompts.


LithoHoD: A Litho Simulator-Powered Framework for IC Layout Hotspot Detection

http://arxiv.org/abs/2409.10021v1

Compressor summary: Key points:
- Hotspot detection techniques for VLSI fabrication need to generalize well to real-world scenarios
- The proposed framework integrates a lithography simulator and an object detection network with cross-attention blocks
- The framework outperforms previous methods on real-world data
Summary: The paper presents a novel hotspot detection framework for VLSI fabrication that combines a lithography simulator and an object detection network, enabling better generalization to real-world scenarios.


AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing

http://arxiv.org/abs/2409.10016v1

Compressor summary: The paper introduces AceParse, a comprehensive dataset for parsing diverse structured texts in academic literature, and presents AceParser, a multimodal model that outperforms the previous state-of-the-art in this task.


HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making

http://arxiv.org/abs/2409.10011v1

Compressor summary: HALO is a framework that detects and mitigates hallucinations in medical question-answering systems by using multiple queries, retrieving context from external sources, and scoring relevance to improve the accuracy of large language models.


SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL

http://arxiv.org/abs/2409.10007v1

Compressor summary: SelECT-SQL is a new in-context learning approach that combines chain-of-thought prompting, self-correction, and ensemble methods to improve Text-to-SQL conversion accuracy using large language models like GPT-3.5-Turbo, achieving state-of-the-art results on challenging benchmarks.


SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning

http://arxiv.org/abs/2409.09990v1

Compressor summary: SHIRE is a framework that uses human intuition encoded in PGMs to improve sample efficiency and explainability of Deep RL policies in robotic tasks.


Comprehensive Study on Sentiment Analysis: From Rule-based to modern LLM based system

http://arxiv.org/abs/2409.09989v1

Compressor summary: This paper reviews the evolution of sentiment analysis in NLP, its challenges, and future trends.


Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate

http://arxiv.org/abs/2409.09984v1

Compressor summary: The paper analyzes the effect of increasing batch sizes or decaying learning rates on the GSAM algorithm's ability to find flat local minima in deep neural networks.


From Bytes to Bites: Using Country Specific Machine Learning Models to Predict Famine

http://arxiv.org/abs/2409.09980v1

Compressor summary: The study shows that using machine learning, especially Random Forests, can help predict household nutrition in countries facing hunger crises by analyzing various factors, but better data is needed for more accurate results.


2S-ODIS: Two-Stage Omni-Directional Image Synthesis by Geometric Distortion Correction

http://arxiv.org/abs/2409.09969v1

Compressor summary: The paper proposes a novel omni-directional image synthesis method that uses a pre-trained VQGAN model and reduces training time by using two stages: global coarse image creation and local refinement.


An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning

http://arxiv.org/abs/2409.09958v1

Compressor summary: The paper proposes a method for multi-objective reinforcement learning that adapts to preferences and safety constraints from demonstrations without explicit input.


Deep Graph Anomaly Detection: A Survey and New Perspectives

http://arxiv.org/abs/2409.09957v1

Compressor summary: This paper reviews deep learning methods for graph anomaly detection (GAD), discussing challenges, methodologies, and datasets, and provides a taxonomy of 13 fine-grained categories.


Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection

http://arxiv.org/abs/2409.09953v1

Compressor summary: UAAN is a novel network that detects out-of-distribution actions in videos using both appearance and motion features, outperforming existing methods.


Optimal ablation for interpretability

http://arxiv.org/abs/2409.09951v1

Compressor summary: Optimal ablation is a new method for measuring the importance of model components in machine learning models, which has advantages over existing methods and can improve interpretability tasks.


Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations

http://arxiv.org/abs/2409.09947v1

Compressor summary: The authors propose a new way to evaluate machine-generated legal analysis by identifying gaps between human and machine outputs and create a detector with an annotated dataset to measure these gaps.


Tracking the spatial dynamics of the synthetic opioid crisis in the USA, 2013-2020 using human mobility-based graph neural network

http://arxiv.org/abs/2409.09945v1

Compressor summary: This study analyzes how synthetic opioids and heroin spread in the U.S. from 2013 to 2020 using a graph convolutional neural network model that accounts for spatial connections between counties.


Fault Analysis And Predictive Maintenance Of Induction Motor Using Machine Learning

http://arxiv.org/abs/2409.09944v1

Compressor summary: Key points:
- Paper presents machine learning model for induction motor fault detection and classification using three-phase voltages and currents as inputs
- Aims to protect vital electrical components and prevent abnormal event progression through early detection and diagnosis
- Uses fast forward artificial neural network model to detect common electrical faults
- Interfaces the model with a real motor to test its performance
Summary: The paper proposes a machine learning model that uses voltages and currents of induction motors to detect and classify common electrical faults, with the goal of protecting vital electrical components.


Generalizability of Graph Neural Network Force Fields for Predicting Solid-State Properties

http://arxiv.org/abs/2409.09931v1

Compressor summary: This study shows that a machine learning model can accurately predict properties of solid materials, even when trained only on some aspects and tested on unseen configurations.


Mining of Switching Sparse Networks for Missing Value Imputation in Multivariate Time Series

http://arxiv.org/abs/2409.09930v1

Compressor summary: MissNet is a method to accurately impute missing values in multivariate time series data by exploiting temporal dependency and inter-correlation using adaptive networks.


Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges

http://arxiv.org/abs/2409.09927v1

Compressor summary: The authors evaluate five data contamination detection methods on four state-of-the-art LLMs using eight challenging datasets, finding significant limitations and inconsistencies in current approaches.


Multi-Step Embed to Control: A Novel Deep Learning-based Approach for Surrogate Modelling in Reservoir Simulation

http://arxiv.org/abs/2409.09920v1

Compressor summary: This paper proposes a deep learning-based surrogate model that uses multiple forward transitions in latent space to improve long-term predictions for two-phase reservoir simulations.


SFR-RAG: Towards Contextually Faithful LLMs

http://arxiv.org/abs/2409.09916v1

Compressor summary: SFR-RAG is a small language model that uses external context to generate accurate and relevant answers, outperforming larger models like GPT-4o.


Forearm Ultrasound based Gesture Recognition on Edge

http://arxiv.org/abs/2409.09915v1

Compressor summary: The paper presents a method to recognize hand gestures using forearm ultrasound and deep neural networks on low-resource devices like Raspberry Pi with high accuracy and low latency.


Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors

http://arxiv.org/abs/2409.09905v1

Compressor summary: The authors propose a novel method that uses large language models to reveal latent personality dimensions without relying on explicit questionnaires and show that their approach can predict Big Five traits more accurately than previous methods.