arxiv compressed, 2024-08-06

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-06, generated by the compressor, my personal LLM-based project.


Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

http://arxiv.org/abs/2408.02672v1

Compressor summary: Implicit Neural Representations can compress videos efficiently while preserving semantic meaning, enabling downstream applications like retrieval and chat.


Self-Taught Evaluators

http://arxiv.org/abs/2408.02666v1

Compressor summary: The paper proposes a self-improving evaluator that uses synthetic data to train an LLM without human annotations, achieving performance comparable to top reward models.


On Using Quasirandom Sequences in Machine Learning for Model Weight Initialization

http://arxiv.org/abs/2408.02654v1

Compressor summary: Using quasirandom number generators (QRNGs) for neural network weight initialization improves model performance and training speed compared to pseudorandom number generators (PRNGs).
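
As a concrete sketch of the idea, one could draw initial weights from a scrambled Sobol sequence instead of a PRNG; the layer sizes, He-style scaling, and `scipy.stats.qmc` usage below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: weight initialization from a quasirandom (Sobol) sequence.
# Assumes scipy is available; sizes and scaling are illustrative.
import numpy as np
from scipy.stats import qmc, norm

def sobol_normal_init(fan_in, fan_out, seed=0):
    """Draw fan_in*fan_out quasirandom points in [0, 1) and map them to
    N(0, 2/fan_in) (He-style scale) via the inverse normal CDF."""
    sampler = qmc.Sobol(d=1, scramble=True, seed=seed)
    u = sampler.random(fan_in * fan_out).ravel()
    u = np.clip(u, 1e-6, 1 - 1e-6)           # avoid infinities at 0 and 1
    w = norm.ppf(u) * np.sqrt(2.0 / fan_in)  # inverse-CDF transform + scale
    return w.reshape(fan_in, fan_out)

W = sobol_normal_init(256, 128)
print(W.shape)  # (256, 128)
```

The low-discrepancy points cover the unit interval more evenly than pseudorandom draws, which is the property the paper attributes the gains to.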


Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?

http://arxiv.org/abs/2408.02651v1

Compressor summary: This paper proposes a reinforcement learning method to optimize adversarial triggers for jailbreaking large language models, enhancing their effectiveness and transferability.


SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models

http://arxiv.org/abs/2408.02632v1

Compressor summary: SEAS is a novel optimization framework that enhances the security of large language models by leveraging data generated by the model itself and refining both red team and target models through three iterative stages.


VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

http://arxiv.org/abs/2408.02629v1

Compressor summary: VidGen-1M is a high-quality text-to-video training dataset created by curating videos and captions using a coarse-to-fine strategy.


DRFormer: Multi-Scale Transformer Utilizing Diverse Receptive Fields for Long Time-Series Forecasting

http://arxiv.org/abs/2408.02279v1

Compressor summary: The paper proposes a dynamic tokenizer and multi-scale Transformer model for long-term time series forecasting, addressing challenges in capturing diverse characteristics and features across different temporal scales.


Geometric Algebra Meets Large Language Models: Instruction-Based Transformations of Separate Meshes in 3D, Interactive and Controllable Scenes

http://arxiv.org/abs/2408.02275v1

Compressor summary: The paper presents shenlong, a system that uses Large Language Models and Conformal Geometric Algebra to enable precise 3D scene editing with natural language instructions, outperforming traditional methods in accuracy and latency.


COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark

http://arxiv.org/abs/2408.02272v1

Compressor summary: The paper presents COM Kitchens, a dataset for procedural video understanding built from overhead-view videos of food preparation captured with smartphones fitted with wide-angle lenses, and introduces two new tasks on it: Online Recipe Retrieval and Dense Video Captioning on unedited overhead-view videos.


StyEmp: Stylizing Empathetic Response Generation via Multi-Grained Prefix Encoder and Personality Reinforcement

http://arxiv.org/abs/2408.02271v1

Compressor summary: StyEmp is a system that generates empathetic responses with consistent personality using prefix mechanisms and contrastive learning.


One-Shot Collaborative Data Distillation

http://arxiv.org/abs/2408.02266v1

Compressor summary: The authors propose CollabDM, a collaborative data distillation technique that creates high-quality synthetic datasets from large machine learning training datasets in distributed environments with minimal communication cost and outperforms existing methods on skewed data and attack detection in 5G networks.


Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts

http://arxiv.org/abs/2408.02265v1

Compressor summary: The OpenCBM model allows users to add or remove concepts from a bottleneck framework, making it more interpretable and achieving better classification results than previous models.


VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking

http://arxiv.org/abs/2408.02263v1

Compressor summary: VoxelTrack is a novel 3D object tracking framework that uses voxelization to effectively capture and model 3D spatial information for accurate position prediction.


To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction

http://arxiv.org/abs/2408.02257v1

Compressor summary: The paper investigates how to automatically predict text spans from legal problem descriptions that indicate a legal area, using a corpus of laypeople's texts annotated by lawyers, and shows that majority-voted spans perform better than disaggregated ones.
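
For illustration, token-level majority voting over annotators' spans can be sketched as follows; the function name, the (start, end) span encoding, and the strict-majority threshold are assumptions made for the example, not the paper's exact aggregation rule.

```python
# Sketch of token-level majority voting over annotated spans.
def majority_vote_spans(annotations, n_tokens):
    """annotations: list of per-annotator span lists [(start, end), ...],
    end exclusive. Returns spans covered by a strict majority of annotators."""
    votes = [0] * n_tokens
    for spans in annotations:
        for start, end in spans:
            for i in range(start, end):
                votes[i] += 1
    threshold = len(annotations) / 2
    keep = [v > threshold for v in votes]
    # Merge consecutive kept tokens back into (start, end) spans.
    merged, i = [], 0
    while i < n_tokens:
        if keep[i]:
            j = i
            while j < n_tokens and keep[j]:
                j += 1
            merged.append((i, j))
            i = j
        else:
            i += 1
    return merged

print(majority_vote_spans([[(0, 4)], [(1, 5)], [(0, 5)]], 6))  # [(0, 5)]
```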


Advancing Post-OCR Correction: A Comparative Study of Synthetic Data

http://arxiv.org/abs/2408.02253v1

Compressor summary: The paper studies how synthetic data helps improve post-OCR models' performance across various languages, especially low-resource ones, by testing different data aspects and introducing a glyph similarity algorithm.
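
As a toy illustration of generating synthetic OCR-like errors from visually similar characters, one might substitute entries from a confusion table; the table, rate, and function below are invented for the example and do not reproduce the paper's glyph similarity algorithm.

```python
# Sketch: injecting OCR-like noise via a hand-written glyph-confusion table.
import random

CONFUSIONS = {"l": "1", "0": "O", "rn": "m", "c": "e"}  # illustrative pairs

def corrupt(text, rate=0.3, seed=42):
    """Replace at most one occurrence of each confusable glyph, each with
    probability `rate`; a fixed seed keeps the corruption reproducible."""
    rng = random.Random(seed)
    out = text
    for src, tgt in CONFUSIONS.items():
        if src in out and rng.random() < rate:
            out = out.replace(src, tgt, 1)
    return out

print(corrupt("modern l0gic"))
```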


ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems

http://arxiv.org/abs/2408.02248v1

Compressor summary: ReDel is a toolkit for creating recursive multi-agent systems with flexible delegation and tool-use, which can improve performance on agentic tasks and be easily visualized and debugged.


Contrastive Learning and Abstract Concepts: The Case of Natural Numbers

http://arxiv.org/abs/2408.02247v1

Compressor summary: The authors apply contrastive learning to train a neural network to estimate discrete quantities, showing its advantages over supervised learning in certain generalization scenarios.
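
A minimal InfoNCE-style objective of the kind such contrastive setups build on can be sketched in NumPy; the embedding sizes, temperature, and random data are illustrative, and the authors' actual pairing scheme for discrete quantities is not reproduced here.

```python
# Sketch of an InfoNCE-style contrastive loss in NumPy.
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """anchors, positives: (n, d) L2-normalized embeddings; row i of each is
    a positive pair, and all other rows serve as negatives."""
    logits = anchors @ positives.T / temperature       # (n, n) similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # cross-entropy on diagonal

n, d = 8, 4
rng = np.random.default_rng(0)
z = rng.normal(size=(n, d))
z /= np.linalg.norm(z, axis=1, keepdims=True)
print(info_nce(z, z))  # loss is lowest when each anchor matches itself
```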


Evaluating Vision-Language Models for Zero-Shot Detection, Classification, and Association of Motorcycles, Passengers, and Helmets

http://arxiv.org/abs/2408.02244v1

Compressor summary: The study uses a vision-language model to detect and classify helmet usage in motorcycle videos, achieving good results for safety and traffic enforcement.


Methods to improve run time of hydrologic models: opportunities and challenges in the machine learning era

http://arxiv.org/abs/2408.02242v1

Compressor summary: This paper discusses the use of machine learning in hydrologic modeling, its advantages over physics-based models, and the challenges and opportunities for improving simulation speed and addressing future research needs.


BOTS-LM: Training Large Language Models for Setswana

http://arxiv.org/abs/2408.02239v1

Compressor summary: BOTS-LM is a bilingual language model for Setswana and English that performs well on translation and reasoning tasks while being computationally efficient.


Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings

http://arxiv.org/abs/2408.02237v1

Compressor summary: This study evaluates LLMs' performance in sentiment and hate speech tasks in low-resource South Asian languages, finding English outperforms other languages and NLI is the strongest task for GPT-4.


A Multi-Source Heterogeneous Knowledge Injected Prompt Learning Method for Legal Charge Prediction

http://arxiv.org/abs/2408.02233v1

Compressor summary: The paper proposes a prompt learning framework for legal charge prediction that integrates multi-source external knowledge from a legal knowledge base, a conversational LLM, and related legal articles, achieving state-of-the-art results on CAIL-2018 with low data dependency and better interpretability than existing methods.


REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models

http://arxiv.org/abs/2408.02231v1

Compressor summary: REVISION improves spatial fidelity in vision-language models by generating accurate images from text, and evaluates spatial reasoning with the RevQA benchmark.


ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation

http://arxiv.org/abs/2408.02226v1

Compressor summary: ProCreate enhances the diversity and creativity of diffusion-based image generative models by moving generated images away from reference images, and demonstrates its effectiveness on a new few-shot creative generation dataset called FSCG-8.


Large Language Model Aided QoS Prediction for Service Recommendation

http://arxiv.org/abs/2408.02223v1

Compressor summary: The paper introduces a new model (llmQoS) that uses large language models to extract information from natural language sentences describing web users and services, and predicts their quality of service.


Cross-modulated Attention Transformer for RGBT Tracking

http://arxiv.org/abs/2408.02222v1

Compressor summary: CAFormer is a novel attention model for RGBT tracking that performs self-correlation, inter-modality interaction, and search-template correlation in a unified way to improve robustness and efficiency.


Climate-Driven Doubling of Maize Loss Probability in U.S. Crop Insurance: Spatiotemporal Prediction and Possible Policy Responses

http://arxiv.org/abs/2408.02217v1

Compressor summary: The paper uses machine learning to predict increased frequency and severity of crop losses due to climate change, suggesting changes in crop insurance policies to support growers' adaptation to the changing environment.


More Than Positive and Negative: Communicating Fine Granularity in Medical Diagnosis

http://arxiv.org/abs/2408.02214v1

Compressor summary: This paper proposes a fine-grained benchmark for chest X-ray analysis and presents a simple but effective method to improve AI diagnostic systems by using coarse labels in training.


ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning

http://arxiv.org/abs/2408.02210v1

Compressor summary: ExoViP is a "plug-and-play" method to improve visual-language programming by correcting errors in planning and execution with introspective verification.


Source-Free Domain-Invariant Performance Prediction

http://arxiv.org/abs/2408.02209v1

Compressor summary: The paper presents a new method for estimating model performance without using source data, based on uncertainty and calibration with a generative model, and shows its superiority over existing approaches.


Evaluating the Performance of Large Language Models for SDG Mapping (Technical Report)

http://arxiv.org/abs/2408.02201v1

Compressor summary: The study compares open-source language models' performance on the SDG mapping task, finding no significant differences among four of them and room for improvement in LLaMA 2 and Gemma.


Synergistic Learning with Multi-Task DeepONet for Efficient PDE Problem Solving

http://arxiv.org/abs/2408.02198v1

Compressor summary: The text introduces a multi-task deep operator network (MT-DeepONet) for solving partial differential equations (PDEs) with different functional forms and geometries, improving generalization and reducing training cost.
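
A minimal single-task DeepONet forward pass, the branch/trunk structure MT-DeepONet builds on, can be sketched in NumPy; all sizes and the random weights below are illustrative, not the paper's architecture.

```python
# Sketch: DeepONet forward pass with a branch net (input function) and a
# trunk net (query coordinates), combined by an inner product.
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Random weights for a tiny MLP, one (W, b) pair per layer."""
    return [(rng.normal(size=(a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """Alternating linear layers and tanh activations (linear final layer)."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.tanh(x)
    return x

m, p = 32, 16                      # input-function samples, latent width
branch = make_mlp([m, 64, p])      # encodes the sampled input function u
trunk  = make_mlp([1, 64, p])      # encodes query coordinates y

u = np.sin(np.linspace(0, np.pi, m))[None, :]   # one sampled input function
y = np.linspace(0, 1, 10)[:, None]              # 10 query points

# Operator output G(u)(y) = <branch(u), trunk(y)> at each query point.
G = mlp(y, trunk) @ mlp(u, branch).T
print(G.shape)  # (10, 1)
```

The multi-task variant in the paper shares this backbone across PDE families; training the random weights above is out of scope for the sketch.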


CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs

http://arxiv.org/abs/2408.02193v1

Compressor summary: CodeACT framework improves open-source large language models' performance and efficiency in code-related tasks by selectively using high-quality data and reducing computational resources.


Unsupervised Domain Adaption Harnessing Vision-Language Pre-training

http://arxiv.org/abs/2408.02192v1

Compressor summary: The paper proposes a novel method called CMKD that uses VLP models to guide UDA tasks, and introduces RST to reduce storage overhead for model deployment, achieving state-of-the-art performance on standard benchmarks.


Dense Feature Interaction Network for Image Inpainting Localization

http://arxiv.org/abs/2408.02191v1

Compressor summary: DeFI-Net is a novel method for detecting image inpainting that uses a feature pyramid architecture and adaptive feature refinement to improve accuracy and edge localization.


AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines

http://arxiv.org/abs/2408.02181v1

Compressor summary: AssemAI is an interpretable image-based anomaly detection system for smart manufacturing pipelines, using a custom object detection model and a tailored image dataset.