arxiv compressed, 2024-10-01

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-10-01, generated by the compressor, my personal LLM-based project.


PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

http://arxiv.org/abs/2409.18964v1

Compressor summary: PhysGen is a system that generates realistic videos from images by combining physical simulation with data-driven video generation, enabling physics-based interaction and control.


Exploring Token Pruning in Vision State Space Models

http://arxiv.org/abs/2409.18962v1

Compressor summary: The paper proposes a novel token pruning method for state space model (SSM)-based vision models, which improves efficiency without sacrificing performance on various tasks.


$O(d/T)$ Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

http://arxiv.org/abs/2409.18959v1

Compressor summary: The paper develops fast convergence theory for score-based diffusion models under minimal assumptions, showing that the generated distribution converges to the target distribution with error scaling as $O(d/T)$ in the data dimension $d$ and the number of steps $T$.


LML: Language Model Learning a Dataset for Data-Augmented Prediction

http://arxiv.org/abs/2409.18957v1

Compressor summary: This paper presents a new method using Large Language Models for classification tasks, called "Language Model Learning with Data-Augmented Prediction", which simplifies data preparation and enables context-aware decisions, achieving high accuracy.


UniCal: Unified Neural Sensor Calibration

http://arxiv.org/abs/2409.18953v1

Compressor summary: UniCal is a framework for calibrating self-driving vehicles' sensors using a differentiable scene representation and outdoor data without specific fiducials, reducing costs and enabling efficient calibration at scale.


Spectral Wavelet Dropout: Regularization in the Wavelet Domain

http://arxiv.org/abs/2409.18951v1

Compressor summary: Spectral Wavelet Dropout (SWD) is a new regularization method for convolutional neural networks that improves generalization by randomly dropping frequency bands in the wavelet decomposition of feature maps, with lower computational complexity than existing methods.
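The idea of dropping frequency bands in a wavelet decomposition can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a single-level 2D Haar transform on one even-sized feature map, and the names `haar2d`, `ihaar2d`, `wavelet_dropout` and the drop probability `p` are hypothetical.

```python
import random


def haar2d(x):
    """Single-level 2D Haar transform of an even-sized 2D list -> (LL, LH, HL, HH)."""
    h, w = len(x) // 2, len(x[0]) // 2
    ll, lh, hl, hh = ([[0.0] * w for _ in range(h)] for _ in range(4))
    for i in range(h):
        for j in range(w):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # low-pass (approximation) band
            lh[i][j] = (a - b + c - d) / 2  # difference between columns
            hl[i][j] = (a + b - c - d) / 2  # difference between rows
            hh[i][j] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, u, v, t = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + u + v + t) / 2
            x[2 * i][2 * j + 1] = (s - u + v - t) / 2
            x[2 * i + 1][2 * j] = (s + u - v - t) / 2
            x[2 * i + 1][2 * j + 1] = (s - u - v + t) / 2
    return x


def wavelet_dropout(x, p, rng=random):
    """Zero each detail band independently with probability p (assumed scheme)."""
    ll, lh, hl, hh = haar2d(x)
    kept = []
    for band in (lh, hl, hh):
        if rng.random() >= p:
            kept.append(band)
        else:
            kept.append([[0.0] * len(band[0]) for _ in band])
    return ihaar2d(ll, *kept)
```

With `p = 0` the feature map is reconstructed unchanged; with `p = 1` all detail bands are removed and each 2x2 block collapses to its mean.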


Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

http://arxiv.org/abs/2409.18943v1

Compressor summary: The paper introduces the Target Length Generation (TLG) task, two metrics (PM and FM) for evaluating it, and Ruler, a method that uses Meta Length Tokens to help large language models generate responses of specified lengths.


From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding

http://arxiv.org/abs/2409.18938v1

Compressor summary: The paper surveys how MultiModal Large Language Models (MM-LLMs) handle visual understanding tasks, especially long videos, which have more complex spatiotemporal details, dynamic events, and long-term dependencies than static images or short videos; it reviews model design and training methods for long video understanding and compares existing MM-LLMs on benchmarks.


ReviveDiff: A Universal Diffusion Model for Restoring Images in Adverse Weather Conditions

http://arxiv.org/abs/2409.18932v1

Compressor summary: ReviveDiff is a universal network architecture that enhances and restores image quality in various challenging conditions using diffusion models.


AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow

http://arxiv.org/abs/2409.18924v1

Compressor summary: AIPatient is an advanced simulated patient system that uses a knowledge graph from electronic health records and six LLM-powered agents to generate realistic and accurate medical scenarios for education and research.


SurfaceAI: Automated creation of cohesive road surface quality datasets based on open street-level imagery

http://arxiv.org/abs/2409.18922v1

Compressor summary: SurfaceAI is a system that uses street-level images to generate detailed datasets on road surface types and qualities for infrastructure modeling and analysis.


Soft Measures for Extracting Causal Collective Intelligence

http://arxiv.org/abs/2409.18911v1

Compressor summary: The paper proposes an approach using large language models to automatically extract causal mental models from text and evaluates its performance and limitations.


Best Arm Identification with Minimal Regret

http://arxiv.org/abs/2409.18909v1

Compressor summary: The paper introduces a new problem (BAI with minimal regret) that combines minimizing regret and identifying the best arm with confidence, studies its properties, and proposes an algorithm (Double KL-UCB) that performs well in this setting.
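For context, the index used by the standard KL-UCB algorithm (not the paper's Double KL-UCB variant, whose details are not given here) can be sketched as below. This assumes Bernoulli rewards and the plain `log(t)` exploration budget; the function names are illustrative.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, iters=50):
    """Largest q >= mean such that pulls * kl(mean, q) <= log(t), found by bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still plausible, move the lower end up
        else:
            hi = mid  # mid is too optimistic, move the upper end down
    return lo
```

An arm with few pulls gets a wider optimistic index than an arm with the same empirical mean and many pulls, which is what drives exploration.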


Improving Visual Object Tracking through Visual Prompting

http://arxiv.org/abs/2409.18901v1

Compressor summary: PiVOT is a new visual object tracking method that uses CLIP to generate and refine visual prompts, enabling better discrimination of targets from distractors.


Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors

http://arxiv.org/abs/2409.18899v1

Compressor summary: The paper proposes a new unsupervised method for enhancing low-light images using diffusion priors, lookup tables, and curve parameters to adjust the dynamic range and suppress noise efficiently.


Detecting Dataset Abuse in Fine-Tuning Stable Diffusion Models for Text-to-Image Synthesis

http://arxiv.org/abs/2409.18897v1

Compressor summary: The paper presents a dataset watermarking framework for text-to-image synthesis that protects datasets from unauthorized usage, supports large-scale dataset authorization through two key strategies and multiple watermarking schemes, and traces data leaks with minimal modification to the dataset and high detection accuracy.


S2O: Static to Openable Enhancement for Articulated 3D Objects

http://arxiv.org/abs/2409.18896v1

Compressor summary: The paper introduces a new task (S2O) and framework for converting static 3D objects into interactive ones with opening features, and evaluates existing methods on a new dataset.


Multi-Source Hard and Soft Information Fusion Approach for Accurate Cryptocurrency Price Movement Prediction

http://arxiv.org/abs/2409.18895v1

Compressor summary: The study introduces HSIF, a novel approach that fuses hard and soft data using AI to enhance accuracy in predicting cryptocurrency price movements, achieving 96.8% accuracy on a Bitcoin dataset.


HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models

http://arxiv.org/abs/2409.18893v1

Compressor summary: The paper proposes a reinforcement learning-based method for merging pretrained models with different architectures, achieving better performance and adaptability across various tasks.


IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation

http://arxiv.org/abs/2409.18892v1

Compressor summary: The authors propose a data synthesis framework based on Item Discrimination theory that generates challenging and discriminative prompts for evaluating Large Language Models, showing improved performance compared to previous works.


HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting

http://arxiv.org/abs/2409.18885v1

Compressor summary: This study introduces a new dataset for high-resolution extreme weather events and evaluates deep learning models and NWP systems, highlighting the need for improved accuracy in forecasting such events.


Explainable Artifacts for Synthetic Western Blot Source Attribution

http://arxiv.org/abs/2409.18881v1

Compressor summary: This study investigates explainable artifacts in synthetic scientific images generated by advanced AI models, aiming to help detect and attribute fraudulent articles.


Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models

http://arxiv.org/abs/2409.18878v1

Compressor summary: The study compared four BERT-based models for identifying multiple suicidal events from psychiatric notes, finding that RoBERTa with a single multi-label classifier performed best.


UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception

http://arxiv.org/abs/2409.18877v1

Compressor summary: UniEmoX is a pretraining framework that combines psychological theories with modern techniques to analyze emotions in diverse visual scenarios, and it introduces the Emo8 dataset for this purpose.


CemiFace: Center-based Semi-hard Synthetic Face Generation for Face Recognition

http://arxiv.org/abs/2409.18876v1

Compressor summary: The paper proposes CemiFace, a diffusion-based approach to generate face images with varying degrees of similarity to the subject's identity center, improving face recognition model training and performance.


CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting

http://arxiv.org/abs/2409.18874v1

Compressor summary: The paper introduces a dataset with network traffic data from a large ISP to evaluate and improve forecast-based anomaly detection methods in computer networks.
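As a rough illustration of what forecast-based anomaly detection means, the sketch below forecasts each point as the mean of a sliding window and flags points whose residual exceeds `k` standard deviations of that window. The forecaster, the threshold rule, and the names `detect_anomalies`, `window`, and `k` are assumptions for illustration, not methods taken from the paper.

```python
from statistics import mean, pstdev


def detect_anomalies(series, window=5, k=3.0):
    """Return indices whose value deviates from a moving-average forecast by > k sigma."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        forecast = mean(history)          # naive forecast: mean of recent values
        spread = pstdev(history) or 1e-9  # avoid a zero threshold on flat history
        if abs(series[i] - forecast) > k * spread:
            anomalies.append(i)
    return anomalies


traffic = [10, 11, 9, 10, 11, 10, 50, 10, 11, 9]
print(detect_anomalies(traffic))  # [6]
```

A real evaluation would swap the moving average for a stronger forecaster; the residual-thresholding step stays the same.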


Emu3: Next-Token Prediction is All You Need

http://arxiv.org/abs/2409.18869v1

Compressor summary: Emu3 is a new multimodal model that uses next-token prediction to excel in various tasks, outperforming existing models without the need for diffusion or compositional architectures.


Individuation in Neural Models with and without Visual Grounding

http://arxiv.org/abs/2409.18868v1

Compressor summary:


MCUBench: A Benchmark of Tiny Object Detectors on MCUs

http://arxiv.org/abs/2409.18866v1

Compressor summary: MCUBench is a benchmark for YOLO-based object detection models on MCUs, measuring precision, latency, RAM, and Flash usage across various models and input resolutions.


LW2G: Learning Whether to Grow for Prompt-based Continual Learning

http://arxiv.org/abs/2409.18860v1

Compressor summary: The paper proposes a new method for continual learning using pre-trained models that adapts the prompt set based on task similarities and uses a metric to measure the hindrance of growing the set, improving performance.


Challenges of Generating Structurally Diverse Graphs

http://arxiv.org/abs/2409.18859v1

Compressor summary: The paper explores generating structurally diverse graphs for testing and research purposes and proposes several algorithms to improve diversity over random graph generators.


Mitigating Selection Bias with Node Pruning and Auxiliary Options

http://arxiv.org/abs/2409.18857v1

Compressor summary: The paper proposes Bias Node Pruning and Auxiliary Option Injection, two techniques to reduce selection bias in large language models' responses to multiple-choice questions, and introduces Choice Kullback-Leibler Divergence as a new metric for evaluating selection bias.
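One plausible way to compute a KL-based selection-bias score is sketched below: compare the distribution of ground-truth answer labels with the distribution of labels the model actually picks. This is an assumed formulation for illustration; the paper's exact definition of Choice Kullback-Leibler Divergence may differ, and the names `choice_distribution` and `choice_kl` are hypothetical.

```python
import math
from collections import Counter


def choice_distribution(labels, options):
    """Empirical frequency of each option label."""
    counts = Counter(labels)
    return [counts[o] / len(labels) for o in options]


def choice_kl(true_labels, predicted_labels, options="ABCD", eps=1e-9):
    """KL(true label distribution || predicted label distribution), an assumed metric."""
    p = choice_distribution(true_labels, options)
    q = choice_distribution(predicted_labels, options)
    # Terms with p_i = 0 contribute nothing; q_i is floored to avoid division by zero.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
```

A model that favors option "A" regardless of content yields a predicted distribution skewed toward "A" and therefore a score above zero, while matching distributions score zero.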


Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes

http://arxiv.org/abs/2409.18852v1

Compressor summary: The paper proposes a space-time 2D Gaussian Splatting method to improve surface reconstruction accuracy in complex dynamic scenes with occlusions, and shows better results than existing methods.


Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization

http://arxiv.org/abs/2409.18850v1

Compressor summary: Double Sparse Factorization (DSF) is a new method that sparsifies neural network weight matrices by factorizing them into two sparse matrices, achieving state-of-the-art results in pruning and fine-tuning models.


MinerU: An Open-Source Solution for Precise Document Content Extraction

http://arxiv.org/abs/2409.18839v1

Compressor summary: MinerU is an open-source tool for accurate and consistent content extraction from diverse documents using PDF-Extract-Kit models and custom preprocessing and postprocessing rules.


Classification and regression of trajectories rendered as images via 2D Convolutional Neural Networks

http://arxiv.org/abs/2409.18832v1

Compressor summary: The paper discusses using computer vision methods like CNNs for trajectory classification, regression, and forecasting from images of rendered trajectories, and investigates how various parameters affect the performance of these methods.
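Rendering a trajectory as an image can be sketched as a simple rasterization step that produces the input a 2D CNN would consume. This is a minimal sketch under assumptions: it plots only the sampled points (no line segments), uses a binary image, and the name `render_trajectory` and the `size` parameter are illustrative rather than taken from the paper.

```python
def render_trajectory(points, size=8):
    """Rasterize (x, y) points into a size x size binary image (list of rows)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero for vertical paths
    span_y = (max(ys) - min_y) or 1.0  # avoid division by zero for horizontal paths
    image = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(size - 1, int((x - min_x) / span_x * size))
        row = min(size - 1, int((y - min_y) / span_y * size))
        image[size - 1 - row][col] = 1  # flip rows so y grows upward
    return image


for line in render_trajectory([(0, 0), (1, 1), (2, 2), (3, 3)], size=4):
    print(line)
```

Resolution (`size`) is one of the rendering parameters whose effect on model performance such a study would vary.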


ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning

http://arxiv.org/abs/2409.18827v1

Compressor summary: ARLBench is a benchmark for hyperparameter optimization in reinforcement learning that enables efficient evaluation and comparison of different HPO approaches across various algorithm and environment combinations.


YOLOv8-ResCBAM: YOLOv8 Based on An Effective Attention Module for Pediatric Wrist Fracture Detection

http://arxiv.org/abs/2409.18826v1

Compressor summary: The paper introduces YOLOv8-ResCBAM, a neural network model that improves fracture detection by incorporating the Convolutional Block Attention Module into the original YOLOv8 architecture, achieving state-of-the-art performance on the GRAZPEDWRI-DX dataset.