arxiv compressed, 2024-09-03

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-03, generated by the compressor, my personal LLM-based project.


Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding

http://arxiv.org/abs/2408.17443v1

Compressor summary: BREASE is a novel model that simulates human cognition for long-form video understanding by combining episodic memory with semantic knowledge, achieving state-of-the-art performance on multiple benchmarks.


SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists

http://arxiv.org/abs/2408.17437v1

Compressor summary: SYNTHEVAL uses large language models to create diverse test types for evaluating NLP models' performance and weaknesses.


DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model

http://arxiv.org/abs/2408.17433v1

Compressor summary: The DARES method improves robotic-assisted surgery depth estimation by adapting Depth Anything Models with Vector Low-Rank Adaptation and a reprojection loss, achieving better performance than existing techniques.


CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models

http://arxiv.org/abs/2408.17428v1

Compressor summary: This paper introduces CLOCR-C, a method that uses transformer-based language models to correct OCR errors in historical print media archives, improving downstream NLP tasks and showing the importance of socio-cultural context.


CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion

http://arxiv.org/abs/2408.17424v1

Compressor summary: CinePreGen is a visual previsualization system that uses engine-powered diffusion to enable dynamic control over camera placement and storyboarding, improving video production quality and reducing development challenges.


Open-vocabulary Temporal Action Localization using VLMs

http://arxiv.org/abs/2408.17422v1

Compressor summary: The paper proposes a learning-free method to find actions in videos using vision-language models and iterative visual prompting.


Exploring the Effect of Explanation Content and Format on User Comprehension and Trust

http://arxiv.org/abs/2408.17401v1

Compressor summary: The paper investigates how users understand and trust different methods of explaining AI-based cancer risk assessments, finding that simpler text explanations are preferred over complex charts and game-theoretic approaches.


How Knowledge Distillation Mitigates the Synthetic Gap in Fair Face Recognition

http://arxiv.org/abs/2408.17399v1

Compressor summary: The paper proposes a Knowledge Distillation strategy that uses pretrained models on real data to train smaller models on synthetic or mixed data, improving face recognition accuracy and reducing bias.


Fairness-Aware Estimation of Graphical Models

http://arxiv.org/abs/2408.17396v1

Compressor summary: The paper proposes a method to reduce bias in graphical models related to sensitive attributes by using a multi-objective optimization problem.


Continual learning with the neural tangent ensemble

http://arxiv.org/abs/2408.17394v1

Compressor summary: The text proposes interpreting a neural network as an ensemble of fixed or adaptive experts, which can help prevent forgetting in continual learning.


LASSO-MOGAT: A Multi-Omics Graph Attention Framework for Cancer Classification

http://arxiv.org/abs/2408.17384v1

Compressor summary: LASSO-MOGAT is a novel deep learning framework that uses gene expression, microRNA, and DNA methylation data to classify 31 types of cancer by integrating multi-omics data and protein-protein interaction networks.


MoRe Fine-Tuning with 10x Fewer Parameters

http://arxiv.org/abs/2408.17383v1

Compressor summary: MoRe is a simple framework that uses the Monarch matrix class to search for optimal adapter architectures for fine-tuning pretrained models, outperforming existing techniques in terms of parameter efficiency and performance.


Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control

http://arxiv.org/abs/2408.17380v1

Compressor summary: The paper introduces a knowledge-informed model-based residual reinforcement learning framework that integrates traffic expert knowledge into a virtual environment model to enhance learning efficiency and improve CAV trajectory control in mixed traffic flow.


NDP: Next Distribution Prediction as a More Broad Target

http://arxiv.org/abs/2408.17377v1

Compressor summary: The authors critique large language models trained on next-token prediction (NTP) due to its limitations and propose Next Distribution Prediction (NDP), which uses $n$-gram distributions instead of one-hot targets, resulting in significant improvements across various tasks.


Exploring the Impact of Environmental Pollutants on Multiple Sclerosis Progression

http://arxiv.org/abs/2408.17376v1

Compressor summary: The study used data from a project to examine how environmental factors affect relapse frequency in Multiple Sclerosis patients, finding that certain variables like air pollution and weather conditions play a role.


Leveraging Graph Neural Networks to Forecast Electricity Consumption

http://arxiv.org/abs/2408.17366v1

Compressor summary: The text proposes a new method for accurate electricity demand forecasting using graph-based models that consider the spatial distribution and interconnectedness of consumers in a decentralized network structure.


Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain

http://arxiv.org/abs/2408.17362v1

Compressor summary: The paper compares three language models' performance in climate change classification tasks and evaluates their calibration of confidence scores.


C-RADAR: A Centralized Deep Learning System for Intrusion Detection in Software Defined Networks

http://arxiv.org/abs/2408.17356v1

Compressor summary: The research proposes using deep learning for intrusion detection in software defined networks, showing better accuracy and efficiency than traditional methods.


Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

http://arxiv.org/abs/2408.17354v1

Compressor summary: The study presents a model-unlearning poisoning technique that increases data leakage during fine-tuning and warns against using unverified pre-trained models.


Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method

http://arxiv.org/abs/2408.17339v1

Compressor summary: The paper proposes a 4-D light field imaging framework to improve underwater image quality and depth estimation, and introduces a new dataset for this task.


Evaluating Reliability in Medical DNNs: A Critical Analysis of Feature and Confidence-Based OOD Detection

http://arxiv.org/abs/2408.17337v1

Compressor summary: The paper compares confidence- and feature-based methods for out-of-distribution (OOD) detection in medical image analysis using two new benchmarks with artefacts, finding that feature-based methods are more effective but that both have limitations, and suggests combining them.


Impact of ChatGPT on the writing style of condensed matter physicists

http://arxiv.org/abs/2408.17325v1

Compressor summary: The text studies how ChatGPT's release affected condensed matter paper abstracts on arXiv, finding improved English quality for non-native speakers and changes in word usage.


Modularity in Transformers: Investigating Neuron Separability & Specialization

http://arxiv.org/abs/2408.17324v1

Compressor summary: The paper explores how neurons within transformer models specialize for different tasks, finding task-specific clusters and suggesting an inherent structure that training refines.


Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering

http://arxiv.org/abs/2408.17322v1

Compressor summary: The authors explore different ways to interpret neuron activations in transformer models and evaluate their effectiveness using various ablation methods, finding that each method has its own advantages and disadvantages depending on the model and regime.


Bridging Domain Knowledge and Process Discovery Using Large Language Models

http://arxiv.org/abs/2408.17316v1

Compressor summary: The paper proposes using Large Language Models to incorporate domain knowledge into automated process discovery, resulting in more robust models and practical benefits, demonstrated through a case study.


Fair Best Arm Identification with Fixed Confidence

http://arxiv.org/abs/2408.17313v1

Compressor summary: F-BAI is a framework that identifies the best arm under fairness constraints and shows how fairness impacts sample complexity using an instance-specific lower bound, with F-TaS as an efficient algorithm for this setting.


Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation

http://arxiv.org/abs/2408.17308v1

Compressor summary: The paper proposes a novel method to recover lost lexical diversity in machine translations of literature by reranking translation candidates with a classifier that distinguishes between original and translated text.


Stationary Policies are Optimal in Risk-averse Total-reward MDPs with EVaR

http://arxiv.org/abs/2408.17286v1

Compressor summary: The paper shows that risk-averse objectives in total-reward MDPs with EVaR can be optimized by a stationary policy, which is simple and interpretable, and computes such policies with various optimization techniques.


DCUDF2: Improving Efficiency and Accuracy in Extracting Zero Level Sets from Unsigned Distance Fields

http://arxiv.org/abs/2408.17284v1

Compressor summary: DCUDF2 is a new method to accurately extract surfaces from complex models using self-adaptive weights and topology correction, improving on previous state-of-the-art methods.


Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts

http://arxiv.org/abs/2408.17280v1

Compressor summary: The text introduces a toolkit that helps create cost-effective Mixture-of-Domain-Experts (MOE) models or adapters, with tips and resources for using it.


The Transferability of Downsampling Sparse Graph Convolutional Networks

http://arxiv.org/abs/2408.17274v1

Compressor summary: The paper introduces a method to reduce the size of large sparse graphs by preserving their topological structure while maintaining sparsity levels.


UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

http://arxiv.org/abs/2408.17267v1

Compressor summary: UrBench is a comprehensive benchmark for evaluating Large Multimodal Models in complex multi-view urban scenarios, revealing their limitations and inconsistencies.


Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach

http://arxiv.org/abs/2408.17258v1

Compressor summary: The authors propose a machine learning model that predicts city-wide delivery demand by using message-passing neural networks and geospatial knowledge from large language models, achieving better performance than existing methods on real-world datasets.


Self-supervised learning for crystal property prediction via denoising

http://arxiv.org/abs/2408.17255v1

Compressor summary: CDSSL is a new self-supervised learning method for predicting crystalline material properties by recovering valid structures from perturbed ones.


VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters

http://arxiv.org/abs/2408.17253v1

Compressor summary: The paper proposes a novel method to improve time series forecasting by leveraging visual models pre-trained on natural images, which outperforms existing approaches with minimal fine-tuning.


Categorical data clustering: 25 years beyond K-modes

http://arxiv.org/abs/2408.17244v1

Compressor summary: This paper reviews the past 25 years of categorical data clustering methods, comparing different algorithms and their applications across various fields.


A methodological framework for Resilience as a Service (RaaS) in multimodal urban transportation networks

http://arxiv.org/abs/2408.17233v1

Compressor summary: The study proposes an optimization model for managing public transport disruptions using resilience as a service strategies, considering various transportation options and factors to allocate resources effectively and minimize costs and adverse effects on stakeholders.


OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping

http://arxiv.org/abs/2408.17223v1

Compressor summary: OG-Mapping uses sparse octrees and structured 3D Gaussians for efficient and robust online dense mapping, addressing redundancy, depth noise sensitivity, storage challenges, and recovery issues.


Geometry of Lightning Self-Attention: Identifiability and Dimension

http://arxiv.org/abs/2408.17221v1

Compressor summary: The paper studies the geometry and identifiability of function spaces defined by polynomial self-attention networks using algebraic geometry tools.


Covariance-corrected Whitening Alleviates Network Degeneration on Imbalanced Classification

http://arxiv.org/abs/2408.17197v1

Compressor summary: The paper proposes Whitening-Net, a framework to improve image classification with class imbalance by normalizing and decorrelating batch samples using ZCA whitening and two covariance-corrected modules.


Reasoning with maximal consistent signatures

http://arxiv.org/abs/2408.17190v1

Compressor summary: The paper studies a method for dealing with inconsistent information by focusing on maximal consistent subsets that allow forgetting to restore consistency, and explores its implications for non-monotonic reasoning, computational complexity, and related concepts.


Hybrid Classification-Regression Adaptive Loss for Dense Object Detection

http://arxiv.org/abs/2408.17182v1

Compressor summary: The paper proposes a new method, HCRAL, for object detection that improves performance by considering inconsistencies across tasks and focusing on difficult samples using a hybrid loss function and additional sample selection strategy.


Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study

http://arxiv.org/abs/2408.17181v1

Compressor summary: The study compares natural language models, finding that BERT with class imbalance mitigation performs best at extracting relevant and contextualized clinical events from electronic health records text.


Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis

http://arxiv.org/abs/2408.17180v1

Compressor summary: The paper proposes two advanced measures to quantify balance in competitive games, using win value estimations and vector quantization, and validates them in popular online games.


SafeTail: Efficient Tail Latency Optimization in Edge Service Scheduling via Computational Redundancy Management

http://arxiv.org/abs/2408.17171v1

Compressor summary: SafeTail is a framework that uses deep learning to selectively replicate services across multiple edge servers to meet latency targets while minimizing resource usage in uncertain edge computing environments.


Efficient Testable Learning of General Halfspaces with Adversarial Label Noise

http://arxiv.org/abs/2408.17165v1

Compressor summary: The paper presents a polynomial time tester-learner for robustly learning non-homogeneous halfspaces with adversarial label noise, using a new method to reduce the problem to nearly homogeneous halfspaces.


The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information

http://arxiv.org/abs/2408.17163v1

Compressor summary: The paper proposes new sparse recovery algorithms based on the Optimal Brain Surgeon framework that improve accuracy-vs-sparsity in deep neural networks, with theoretical guarantees and practical performance.


Deep Feature Embedding for Tabular Data

http://arxiv.org/abs/2408.17162v1

Compressor summary: The paper introduces a new framework that uses deep neural networks to create feature embeddings for tabular data with numerical and categorical features, improving their representation and capturing complex relationships.


Self-supervised Anomaly Detection Pretraining Enhances Long-tail ECG Diagnosis

http://arxiv.org/abs/2408.17154v1

Compressor summary: The study introduces a novel anomaly detection method for ECG diagnosis that significantly improves accuracy, especially for rare cardiac anomalies, and enhances clinical efficiency in emergency care settings.


Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning

http://arxiv.org/abs/2408.17150v1

Compressor summary: Multi-View Multi-Path Reasoning (MVP) reduces hallucinations in large vision-language models by enhancing their information perception and considering the certainty of answer tokens.


GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring

http://arxiv.org/abs/2408.17149v1

Compressor summary: Keypoints are important features in computer vision, but their quality scores are not easy to compare; a new framework refines and scores keypoints based on how well they survive viewpoint changes and on their localization accuracy.


The Many Faces of Optimal Weak-to-Strong Learning

http://arxiv.org/abs/2408.17148v1

Compressor summary: A new Boosting algorithm combines multiple classifiers with optimal sample complexity and fast runtime using a simple majority vote.


RenDetNet: Weakly-supervised Shadow Detection with Shadow Caster Verification

http://arxiv.org/abs/2408.17143v1

Compressor summary: This paper introduces RenDetNet, a learning-based shadow detection model that verifies shadows are real by re-rendering the scene and uses self-supervised signals for training.


Flow Matching for Optimal Reaction Coordinates of Biomolecular System

http://arxiv.org/abs/2408.17139v1

Compressor summary: Flow Matching for Reaction Coordinates (FMRC) is a deep learning algorithm that efficiently identifies optimal reaction coordinates in biomolecular reversible dynamics using conditional probability and generative models.


Temporal and Interactive Modeling for Efficient Human-Human Motion Generation

http://arxiv.org/abs/2408.17135v1

Compressor summary: TIM is a new method for generating human-human motion sequences that uses RWKV, Causal Interactive Injection, Role-Evolving Mixing, and Localized Pattern Amplification to model temporal and interactive properties of motion while being efficient and effective.


VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

http://arxiv.org/abs/2408.17131v1

Compressor summary: VQ4DiT is a fast post-training vector quantization method for Diffusion Transformer models that reduces their memory usage and achieves state-of-the-art performance in image generation quality.


Controllable Edge-Type-Specific Interpretation in Multi-Relational Graph Neural Networks for Drug Response Prediction

http://arxiv.org/abs/2408.17129v1

Compressor summary: CETExplainer is a novel post-hoc interpretability algorithm for predicting cancer drug responses that provides biologically meaningful explanations using edge-type-specific weighting and mutual information between subgraphs and predictions.


Efficient Estimation of Unique Components in Independent Component Analysis by Matrix Representation

http://arxiv.org/abs/2408.17118v1

Compressor summary: The paper presents a faster method to estimate the unique global optimum of independent component analysis (ICA) using matrix representation and fewer calculations, which was previously achieved through time-consuming random initializations.