This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-10-01, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2409.18964v1
Compressor summary: PhysGen is a system that generates realistic videos from images by combining physical simulation with data-driven video generation, enabling physics-based interaction and control.
http://arxiv.org/abs/2409.18962v1
Compressor summary: The paper proposes a novel token pruning method for state space model (SSM)-based vision models, which improves efficiency without sacrificing performance on various tasks.
http://arxiv.org/abs/2409.18959v1
Compressor summary: The paper develops a fast convergence theory for score-based diffusion models under minimal assumptions, showing that the generated distribution converges to the target distribution with a bound that depends on the data dimensionality and the number of steps.
http://arxiv.org/abs/2409.18957v1
Compressor summary: This paper presents "Language Model Learning with Data-Augmented Prediction", a new method that uses Large Language Models for classification tasks, simplifying data preparation and enabling context-aware decisions while achieving high accuracy.
http://arxiv.org/abs/2409.18953v1
Compressor summary: UniCal is a framework for calibrating self-driving vehicles' sensors using a differentiable scene representation and outdoor data without specific fiducials, reducing costs and enabling efficient calibration at scale.
http://arxiv.org/abs/2409.18951v1
Compressor summary: Spectral Wavelet Dropout (SWD) is a new regularization method for convolutional neural networks that improves generalization by randomly dropping frequency bands in the wavelet decomposition of feature maps, with lower computational complexity than existing methods.
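To make the idea in the summary above concrete, here is a minimal NumPy sketch of dropping frequency bands in a wavelet decomposition of a feature map. It is an illustration under stated assumptions, not the paper's implementation: it assumes a single-level 2D Haar transform, drops only the three detail bands (keeping the low-frequency approximation), and applies no rescaling. The function names `haar_dwt2`, `haar_idwt2`, and `wavelet_band_dropout` are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform over the last two axes (even H and W)."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency approximation band
    lh = (a - b + c - d) / 2  # detail band: differences between columns
    hl = (a + b - c - d) / 2  # detail band: differences between rows
    hh = (a - b - c + d) / 2  # detail band: diagonal differences
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    h, w = ll.shape[-2] * 2, ll.shape[-1] * 2
    out = np.empty(ll.shape[:-2] + (h, w), dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out

def wavelet_band_dropout(x, drop_prob, rng):
    """Zero each detail band independently with probability drop_prob (assumed scheme)."""
    ll, lh, hl, hh = haar_dwt2(x)
    kept = []
    for band in (lh, hl, hh):
        keep = rng.random() >= drop_prob
        kept.append(band if keep else np.zeros_like(band))
    return haar_idwt2(ll, *kept)

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 3, 8, 8))  # (batch, channels, height, width)
regularized = wavelet_band_dropout(features, drop_prob=0.5, rng=rng)
```

With `drop_prob=0.0` the feature map is reconstructed unchanged; with `drop_prob=1.0` every 2x2 block collapses to its mean, since only the approximation band survives.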
http://arxiv.org/abs/2409.18943v1
Compressor summary: The paper introduces the Target Length Generation (TLG) task, two metrics (PM and FM) for measuring how well responses meet a specified length, and Ruler, a method that uses Meta Length Tokens to help large language models generate responses of specified lengths.
http://arxiv.org/abs/2409.18938v1
Compressor summary:
Key points:
- The paper surveys advancements of MM-LLMs in visual understanding tasks, focusing on long video understanding
- Long videos have more complex spatiotemporal details, dynamic events, and long-term dependencies than static images or short videos
- The paper reviews model design and training methods for long video understanding and compares existing MM-LLMs on benchmarks
Summary: The paper surveys how MultiModal Large Language Models (MM-LLMs) handle visual understanding tasks, especially long videos that have more challenging spatiotemporal details and dynamics than static images or short videos. It reviews model design and training methods for long video understanding and compares MM-LLMs on benchmarks.
http://arxiv.org/abs/2409.18932v1
Compressor summary: ReviveDiff is a universal network architecture that enhances and restores image quality in various challenging conditions using diffusion models.
http://arxiv.org/abs/2409.18924v1
Compressor summary: AIPatient is an advanced simulated patient system that uses a knowledge graph from electronic health records and six LLM-powered agents to generate realistic and accurate medical scenarios for education and research.
http://arxiv.org/abs/2409.18922v1
Compressor summary: SurfaceAI is a system that uses street-level images to generate detailed datasets on road surface types and qualities for infrastructure modeling and analysis.
http://arxiv.org/abs/2409.18911v1
Compressor summary: The paper proposes an approach using large language models to automatically extract causal mental models from text and evaluates its performance and limitations.
http://arxiv.org/abs/2409.18909v1
Compressor summary: The paper introduces a new problem (BAI with minimal regret) that combines minimizing regret and identifying the best arm with confidence, studies its properties, and proposes an algorithm (Double KL-UCB) that performs well in this setting.
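For readers unfamiliar with the KL-UCB family that the algorithm name above refers to, here is a minimal sketch of the standard KL-UCB index for Bernoulli rewards. It is not the paper's Double KL-UCB algorithm; it only illustrates the classic single index, assuming an exploration budget of log(t) per pull and a bisection search. The function names and the `tol` parameter are hypothetical.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, tol=1e-6):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still a plausible mean, so move the lower end up
        else:
            hi = mid  # mid is too optimistic, so move the upper end down
    return lo

# An arm with few pulls gets a more optimistic index than a well-explored one.
rarely_pulled = kl_ucb_index(mean=0.5, pulls=10, t=100)
often_pulled = kl_ucb_index(mean=0.5, pulls=1000, t=100)
```

A bandit policy built on this index would pull the arm with the highest value at each round; the index shrinks toward the empirical mean as an arm accumulates pulls.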
http://arxiv.org/abs/2409.18901v1
Compressor summary: PiVOT is a new visual object tracking method that uses CLIP to generate and refine visual prompts, enabling better discrimination of targets from distractors.
http://arxiv.org/abs/2409.18899v1
Compressor summary: The paper proposes a new unsupervised method for enhancing low-light images using diffusion priors, lookup tables, and curve parameters to adjust the dynamic range and suppress noise efficiently.
http://arxiv.org/abs/2409.18897v1
Compressor summary:
Key points:
- The paper proposes a framework to protect text-to-image synthesis datasets from unauthorized usage and data leaks.
- The framework uses two key strategies and multiple watermarking schemes for effective large-scale dataset authorization.
- The framework has minimal impact on the dataset, high detection accuracy, and ability to trace data leaks.
Summary: The paper presents a dataset watermarking framework for text-to-image synthesis that protects datasets from abuse, authorizes large-scale usage, and traces data leaks with minimal modification and high accuracy.
http://arxiv.org/abs/2409.18896v1
Compressor summary: The paper introduces a new task (S2O) and framework for converting static 3D objects into interactive ones with opening features, and evaluates existing methods on a new dataset.
http://arxiv.org/abs/2409.18895v1
Compressor summary: The study introduces HSIF, a novel approach that fuses hard and soft data using AI to enhance accuracy in predicting cryptocurrency price movements, achieving 96.8% accuracy on a Bitcoin dataset.
http://arxiv.org/abs/2409.18893v1
Compressor summary: The paper proposes a reinforcement learning-based method for merging pretrained models with different architectures, achieving better performance and adaptability across various tasks.
http://arxiv.org/abs/2409.18892v1
Compressor summary: The authors propose a data synthesis framework based on Item Discrimination theory that generates challenging and discriminative prompts for evaluating Large Language Models, showing improved performance compared to previous works.
http://arxiv.org/abs/2409.18885v1
Compressor summary: This study introduces a new dataset for high-resolution extreme weather events and evaluates deep learning models and NWP systems, highlighting the need for improved accuracy in forecasting such events.
http://arxiv.org/abs/2409.18881v1
Compressor summary: This study investigates explainable artifacts in synthetic scientific images generated by advanced AI models, aiming to help detect and attribute fraudulent articles.
http://arxiv.org/abs/2409.18878v1
Compressor summary: The study compared four BERT-based models for identifying multiple suicidal events from psychiatric notes, finding that RoBERTa with a single multi-label classifier performed best.
http://arxiv.org/abs/2409.18877v1
Compressor summary: UniEmoX is a pretraining framework that combines psychological theories with modern techniques to analyze emotions in diverse visual scenarios, and it introduces the Emo8 dataset for this purpose.
http://arxiv.org/abs/2409.18876v1
Compressor summary: The paper proposes CemiFace, a diffusion-based approach to generate face images with varying degrees of similarity to the subject's identity center, improving face recognition model training and performance.
http://arxiv.org/abs/2409.18874v1
Compressor summary: The paper introduces a dataset with network traffic data from a large ISP to evaluate and improve forecast-based anomaly detection methods in computer networks.
http://arxiv.org/abs/2409.18869v1
Compressor summary: Emu3 is a new multimodal model that uses next-token prediction to excel in various tasks, outperforming existing models without the need for diffusion or compositional architectures.
http://arxiv.org/abs/2409.18868v1
Compressor summary:
http://arxiv.org/abs/2409.18866v1
Compressor summary: MCUBench is a benchmark for YOLO-based object detection models on MCUs, measuring precision, latency, RAM, and Flash usage across various models and input resolutions.
http://arxiv.org/abs/2409.18860v1
Compressor summary: The paper proposes a new method for continual learning with pre-trained models that adapts the prompt set based on task similarities and uses a metric to measure the hindrance of growing the set, improving performance.
http://arxiv.org/abs/2409.18859v1
Compressor summary: The paper explores generating structurally diverse graphs for testing and research purposes and proposes several algorithms to improve diversity over random graph generators.
http://arxiv.org/abs/2409.18857v1
Compressor summary: The paper proposes Bias Node Pruning and Auxiliary Option Injection, two techniques to reduce selection bias in large language models' responses to multiple-choice questions, and introduces Choice Kullback-Leibler Divergence as a new metric for evaluating selection bias.
http://arxiv.org/abs/2409.18852v1
Compressor summary: The paper proposes a space-time 2D Gaussian Splatting method to improve surface reconstruction accuracy in complex dynamic scenes with occlusions, and shows better results than existing methods.
http://arxiv.org/abs/2409.18850v1
Compressor summary: Double Sparse Factorization (DSF) is a new method that sparsifies neural network weight matrices by factorizing them into two sparse matrices, achieving state-of-the-art results in pruning and fine-tuning models.
http://arxiv.org/abs/2409.18839v1
Compressor summary: MinerU is an open-source tool for accurate and consistent content extraction from diverse documents using PDF-Extract-Kit models and custom preprocessing and postprocessing rules.
http://arxiv.org/abs/2409.18832v1
Compressor summary: The paper discusses using computer vision methods like CNNs for trajectory classification, regression, and forecasting from images of rendered trajectories, and investigates how various parameters affect the performance of these methods.
http://arxiv.org/abs/2409.18827v1
Compressor summary: ARLBench is a benchmark for hyperparameter optimization in reinforcement learning that enables efficient evaluation and comparison of different HPO approaches across various algorithm and environment combinations.
http://arxiv.org/abs/2409.18826v1
Compressor summary: The paper introduces YOLOv8-ResCBAM, a neural network model that improves fracture detection by incorporating the Convolutional Block Attention Module into the original YOLOv8 architecture, achieving state-of-the-art performance on the GRAZPEDWRI-DX dataset.