This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-06, generated by the compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2408.02672v1
Compressor summary: Implicit Neural Networks can compress videos efficiently while preserving semantic meaning, enabling various downstream applications like retrieval and chat.
http://arxiv.org/abs/2408.02666v1
Compressor summary: The paper proposes a self-improving evaluator that uses synthetic data to train an LLM without human annotations, achieving performance comparable to top reward models.
http://arxiv.org/abs/2408.02654v1
Compressor summary: Using quasirandom number generators (QRNGs) for neural network weight initialization improves model performance and training speed compared to pseudorandom number generators (PRNGs).
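To make the idea concrete, here is a minimal sketch (not the paper's code) of quasirandom weight initialization: a van der Corput low-discrepancy sequence fills a weight matrix, with values mapped to a Xavier-style uniform range. The function names and the choice of base are illustrative assumptions.

```python
import math

def van_der_corput(index: int, base: int = 2) -> float:
    """Return the `index`-th element of the van der Corput
    low-discrepancy sequence in the given base (in [0, 1))."""
    f, result = 1.0, 0.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def qrng_uniform_init(fan_in: int, fan_out: int):
    """Fill a fan_in x fan_out weight matrix with quasirandom points
    mapped to the Xavier-uniform range [-bound, bound]."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    weights = []
    for i in range(fan_in):
        row = []
        for j in range(fan_out):
            # Start the sequence at index 1 so every value is in (0, 1).
            u = van_der_corput(i * fan_out + j + 1)
            row.append((2.0 * u - 1.0) * bound)  # rescale to [-bound, bound]
        weights.append(row)
    return weights

W = qrng_uniform_init(4, 3)
```

In practice one would use a multidimensional sequence (e.g. Sobol, as in `scipy.stats.qmc`) rather than a flattened 1-D sequence, but the rescaling step is the same.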
http://arxiv.org/abs/2408.02651v1
Compressor summary: This paper proposes a reinforcement learning method to optimize adversarial triggers for jailbreaking large language models, enhancing their effectiveness and transferability.
http://arxiv.org/abs/2408.02632v1
Compressor summary: SEAS is a novel optimization framework that enhances the security of large language models by leveraging data generated by the model itself and refining both red team and target models through three iterative stages.
http://arxiv.org/abs/2408.02629v1
Compressor summary: VidGen-1M is a high-quality text-to-video training dataset created by curating videos and captions using a coarse-to-fine strategy.
http://arxiv.org/abs/2408.02279v1
Compressor summary: The paper proposes a dynamic tokenizer and multi-scale Transformer model for long-term time series forecasting, addressing challenges in capturing diverse characteristics and features across different temporal scales.
http://arxiv.org/abs/2408.02275v1
Compressor summary: The paper presents shenlong, a system that uses Large Language Models and Conformal Geometric Algebra to enable precise 3D scene editing with natural language instructions, outperforming traditional methods in accuracy and latency.
http://arxiv.org/abs/2408.02272v1
Compressor summary: The paper presents COM Kitchens, a novel dataset for procedural video understanding built from overhead-view videos of food preparation captured with wide-angle smartphone lenses, and introduces two new tasks based on it: Online Recipe Retrieval and Dense Video Captioning on unedited overhead-view videos.
http://arxiv.org/abs/2408.02271v1
Compressor summary: StyEmp is a system that generates empathetic responses with consistent personality using prefix mechanisms and contrastive learning.
http://arxiv.org/abs/2408.02266v1
Compressor summary: The authors propose CollabDM, a collaborative data distillation technique that creates high-quality synthetic datasets from large machine learning training datasets in distributed environments with minimal communication cost, outperforming existing methods on skewed data and on attack detection in 5G networks.
http://arxiv.org/abs/2408.02265v1
Compressor summary: The OpenCBM model allows users to add or remove concepts from a bottleneck framework, making it more interpretable and achieving better classification results than previous models.
http://arxiv.org/abs/2408.02263v1
Compressor summary: VoxelTrack is a novel 3D object tracking framework that uses voxelization to effectively capture and model 3D spatial information for accurate position prediction.
http://arxiv.org/abs/2408.02257v1
Compressor summary: The paper investigates how to automatically predict text spans from legal problem descriptions that indicate a legal area, using a corpus of laypeople's texts annotated by lawyers, and shows that majority-voted spans perform better than disaggregated ones.
http://arxiv.org/abs/2408.02253v1
Compressor summary: The paper studies how synthetic data helps improve post-OCR models' performance across various languages, especially low-resource ones, by testing different data aspects and introducing a glyph similarity algorithm.
http://arxiv.org/abs/2408.02248v1
Compressor summary: ReDel is a toolkit for creating recursive multi-agent systems with flexible delegation and tool-use, which can improve performance on agentic tasks and be easily visualized and debugged.
http://arxiv.org/abs/2408.02247v1
Compressor summary: The authors apply contrastive learning to train a neural network to estimate discrete quantities, showing its advantages over supervised learning in certain generalization scenarios.
http://arxiv.org/abs/2408.02244v1
Compressor summary: The study uses a vision-language model to detect and classify helmet usage in motorcycle videos, achieving good results for safety and traffic enforcement.
http://arxiv.org/abs/2408.02242v1
Compressor summary: This paper discusses the use of machine learning in hydrologic modeling, its advantages over physics-based models, and the challenges and opportunities for improving simulation speed and addressing future research needs.
http://arxiv.org/abs/2408.02239v1
Compressor summary: BOTS-LM is a bilingual language model for Setswana and English that performs well on translation and reasoning tasks while being computationally efficient.
http://arxiv.org/abs/2408.02237v1
Compressor summary: This study evaluates LLMs' performance on sentiment and hate speech tasks in low-resource South Asian languages, finding that English outperforms the other languages and that NLI is the strongest task for GPT-4.
http://arxiv.org/abs/2408.02233v1
Compressor summary: The paper proposes a prompt learning framework for legal charge prediction that integrates multi-source external knowledge from a legal knowledge base, a conversational LLM, and related legal articles, achieving state-of-the-art results on CAIL-2018 with low data dependency and improved interpretability over existing neural methods.
http://arxiv.org/abs/2408.02231v1
Compressor summary: REVISION improves spatial fidelity in vision-language models by generating accurate images from text, and evaluates spatial reasoning with the RevQA benchmark.
http://arxiv.org/abs/2408.02226v1
Compressor summary: ProCreate enhances the diversity and creativity of diffusion-based image generative models by moving generated images away from reference images, and demonstrates its effectiveness on a new few-shot creative generation dataset called FSCG-8.
http://arxiv.org/abs/2408.02223v1
Compressor summary: The paper introduces a new model (llmQoS) that uses large language models to extract information from natural language sentences describing web users and services, and predicts their quality of service.
http://arxiv.org/abs/2408.02222v1
Compressor summary: CAFormer is a novel attention model for RGBT tracking that performs self-correlation, inter-modality interaction, and search-template correlation in a unified way to improve robustness and efficiency.
http://arxiv.org/abs/2408.02217v1
Compressor summary: The paper uses machine learning to predict increased frequency and severity of crop losses due to climate change, suggesting changes in crop insurance policies to support growers' adaptation to the changing environment.
http://arxiv.org/abs/2408.02214v1
Compressor summary: This paper proposes a fine-grained benchmark for chest X-ray analysis and presents a simple but effective method to improve AI diagnostic systems by using coarse labels in training.
http://arxiv.org/abs/2408.02210v1
Compressor summary: ExoViP is a "plug-and-play" method to improve visual-language programming by correcting errors in planning and execution with introspective verification.
http://arxiv.org/abs/2408.02209v1
Compressor summary: The paper presents a new method for estimating model performance without using source data, based on uncertainty and calibration with a generative model, and shows its superiority over existing approaches.
http://arxiv.org/abs/2408.02201v1
Compressor summary: The study compares open-source language models' performance on the SDG mapping task, finding no significant differences among four of them and room for improvement in LLaMA 2 and Gemma.
http://arxiv.org/abs/2408.02198v1
Compressor summary: The text introduces a multi-task deep operator network (MT-DeepONet) for solving partial differential equations (PDEs) with different functional forms and geometries, improving generalization and reducing training cost.
http://arxiv.org/abs/2408.02193v1
Compressor summary: CodeACT framework improves open-source large language models' performance and efficiency in code-related tasks by selectively using high-quality data and reducing computational resources.
http://arxiv.org/abs/2408.02192v1
Compressor summary: The paper proposes a novel method called CMKD that uses VLP models to guide UDA tasks, and introduces RST to reduce storage overhead for model deployment, achieving state-of-the-art performance on standard benchmarks.
http://arxiv.org/abs/2408.02191v1
Compressor summary: DeFI-Net is a novel method for detecting image inpainting that uses a feature pyramid architecture and adaptive feature refinement to improve accuracy and edge localization.
http://arxiv.org/abs/2408.02181v1
Compressor summary: AssemAI is an interpretable image-based anomaly detection system for smart manufacturing pipelines, using a custom object detection model and a tailored image dataset.