arxiv compressed, 2024-08-09

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-09 generated by the compressor, my personal LLM-based project.


LiDAR-Event Stereo Fusion with Hallucinations

http://arxiv.org/abs/2408.04633v1

Compressor summary: The authors propose a method to fuse event-based stereo with LiDAR depth hints to overcome limitations in event correspondence and hallucinate events for motion detection.


Arctic-TILT. Business Document Understanding at Sub-Billion Scale

http://arxiv.org/abs/2408.04632v1

Compressor summary: Arctic-TILT is a small and efficient model that can answer questions on PDFs and scans with high accuracy, low costs, and fast inference.


Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

http://arxiv.org/abs/2408.04631v1

Compressor summary: Puppet-Master is an interactive video generative model that uses motion trajectories to create realistic part-level motion videos, trained on Objaverse-Animation-HQ and outperforming existing methods.


LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP

http://arxiv.org/abs/2408.04628v1

Compressor summary: The paper presents LogogramNLP, a benchmark for NLP analysis of ancient logographic languages using direct processing of visual representations, which can outperform textual representations in some tasks.


Transformer Explainer: Interactive Learning of Text-Generative Models

http://arxiv.org/abs/2408.04619v1

Compressor summary: Transformer Explainer is a web-based, interactive visualization tool that helps non-experts learn about Transformers through GPT-2 by running a live instance in the user's browser and allowing them to experiment with inputs.


Better Alignment with Instruction Back-and-Forth Translation

http://arxiv.org/abs/2408.04614v1

Compressor summary: The new method, instruction back-and-forth translation, creates high-quality synthetic data for aligning large language models using web documents and improved response rewriting.


Enhanced Prototypical Part Network (EPPNet) For Explainable Image Classification Via Prototypes

http://arxiv.org/abs/2408.04606v1

Compressor summary: The EPPNet is an image classification DNN that finds human-understandable prototypes to explain its decisions and achieves high accuracy.


Fall Detection for Industrial Setups Using YOLOv8 Variants

http://arxiv.org/abs/2408.04605v1

Compressor summary: The paper develops an industrial fall detection system using different YOLOv8 models, finding the YOLOv8m model to be a good balance between efficiency and accuracy.


Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning

http://arxiv.org/abs/2408.04604v1

Compressor summary: The Group3AD network detects anomalies in high-resolution point clouds by mapping different groups to clusters, aligning them within clusters, and selecting group centers based on geometric information.


Improving Network Interpretability via Explanation Consistency Evaluation

http://arxiv.org/abs/2408.04600v1

Compressor summary: The paper presents a simple and effective framework that, without extra supervision, improves both the interpretability and performance of neural networks by using a novel explanation consistency metric to adaptively reweight training samples based on the similarity of their visual explanations, outperforming existing methods on several benchmarks.


Code-switching in text and speech reveals information-theoretic audience design

http://arxiv.org/abs/2408.04596v1

Compressor summary: The study examines the reasons for switching between languages (code-switching) by analyzing Chinese-English online texts and speech, finding that speakers use a second language to signal the need for more attention from listeners.


Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

http://arxiv.org/abs/2408.04594v1

Compressor summary: The Img-Diff dataset improves fine-grained image recognition in multimodal language models by providing pairs of similar images with contrastive learning and object replacement challenges, leading to better performance on various tasks.


HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts

http://arxiv.org/abs/2408.04591v1

Compressor summary: This paper introduces a new method for generalized category discovery that handles different domains and outperforms existing models.


Learn To Learn More Precisely

http://arxiv.org/abs/2408.04590v1

Compressor summary: The paper proposes Meta Self-Distillation (MSD), a meta-learning framework that improves generalization by learning precise target knowledge from data and reducing noisy knowledge effects.


Unveiling the Power of Sparse Neural Networks for Feature Selection

http://arxiv.org/abs/2408.04583v1

Compressor summary: The paper presents a systematic analysis of feature selection with sparse neural networks (SNNs) trained via dynamic sparse training (DST) and proposes a novel metric for measuring feature importance, showing that SNNs achieve significant memory and computation savings while preserving or improving feature quality compared to dense networks.


SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals

http://arxiv.org/abs/2408.04575v1

Compressor summary: SCENE is a new method for generating explanations for AI models in natural language processing by using large language models to create contextually appropriate counterfactuals without fine-tuning.


Mathematical Programming For Adaptive Experiments

http://arxiv.org/abs/2408.04570v1

Compressor summary: The paper proposes a mathematical programming view of adaptive experimentation that can handle various practical issues and challenges in real-world settings, offering better solutions than bespoke algorithms.


Activation thresholds and expressiveness of polynomial neural networks

http://arxiv.org/abs/2408.04569v1

Compressor summary: Polynomial neural networks are powerful machine learning frameworks that map weights to polynomials and have a measure of expressivity called neurovariety dimension; this study explores when they achieve maximum expressivity and proves their effectiveness for equi-width architectures.


Learning Fine-Grained Grounded Citations for Attributed Large Language Models

http://arxiv.org/abs/2408.04568v1

Compressor summary: FRONT is a training framework that improves citation quality and verifiability for large language models by generating fine-grained grounded citations.


Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches

http://arxiv.org/abs/2408.04567v1

Compressor summary: The paper presents a deep-learning method that generates interactive 3D game scenes from casual user sketches, guiding a pre-trained 2D denoising diffusion model with image understanding so that the output is playable in game engines such as Unity or Unreal.


Conversational Prompt Engineering

http://arxiv.org/abs/2408.04560v1

Compressor summary: CPE is a tool that helps users create personalized prompts for their tasks by interacting with them and using their data and feedback to generate and refine the prompt.


Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models

http://arxiv.org/abs/2408.04556v1

Compressor summary: BA-LoRA is a novel parameter-efficient fine-tuning method for large language models that addresses bias propagation from pre-training data by using three regularization terms to improve consistency, diversity, and generalization.


Molyé: A Corpus-based Approach to Language Contact in Colonial France

http://arxiv.org/abs/2408.04554v1

Compressor summary: The Molyé corpus, a new open resource, combines stereotypical representations of European language variation with early French-based Creole languages to study their genetic relationship.


MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification

http://arxiv.org/abs/2408.04540v1

Compressor summary: The paper presents a method for detecting propaganda techniques in Arabic text using a pre-trained AraBERT model and a two-phase fine-tuning approach that achieves competitive results.


How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

http://arxiv.org/abs/2408.04532v1

Compressor summary: This study explores how multi-head transformers perform in-context learning for sparse linear regression, discovering that they preprocess data in the first layer and optimize it in subsequent layers, outperforming naive algorithms.


AExGym: Benchmarks and Environments for Adaptive Experimentation

http://arxiv.org/abs/2408.04531v1

Compressor summary: The authors present a benchmark for adaptive experiments using real-world datasets, identify practical challenges, and release an open source library to facilitate methodological development.


Reasoning about Study Regulations in Answer Set Programming

http://arxiv.org/abs/2408.04528v1

Compressor summary: The authors develop a system to automate reasoning with and about study regulations at the University of Potsdam using formal methods and Answer Set Programming.


Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation

http://arxiv.org/abs/2408.04523v1

Compressor summary: The paper proposes a method to estimate tree canopy height using monocular depth estimation models, which is efficient, cost-effective, and environmentally friendly.


Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models

http://arxiv.org/abs/2408.04522v1

Compressor summary: The paper investigates the safety vulnerabilities of large language models in Italian when exposed to many-shot jailbreaking, a technique that makes them behave unsafely by providing unsafe demonstrations.


Knowledge-Aided Semantic Communication Leveraging Probabilistic Graphical Modeling

http://arxiv.org/abs/2408.04499v1

Compressor summary: The paper presents a new communication method that uses probabilistic graphical models to encode and compress semantic information, achieving better transmission efficiency and image quality.


Model-Based Transfer Learning for Contextual Reinforcement Learning

http://arxiv.org/abs/2408.04498v1

Compressor summary: This paper proposes Model-Based Transfer Learning (MBTL), a method to select optimal tasks for training in deep reinforcement learning that explicitly models the performance loss due to task similarity and uses Bayesian optimization to estimate training performance, leading to improved results in contextual RL problems.


NFDI4Health workflow and service for synthetic data generation, assessment and risk management

http://arxiv.org/abs/2408.04478v1

Compressor summary: The paper presents AI tools and a web-based tool for generating and assessing synthetic health data to protect patient privacy while enabling scientific advancements.


Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate

http://arxiv.org/abs/2408.04472v1

Compressor summary: The paper introduces Agent4Debate, a multi-agent framework using LLMs that enhances their debate skills by collaborating through different stages and matches human performance on Chinese debates.


Crowd Intelligence for Early Misinformation Prediction on Social Media

http://arxiv.org/abs/2408.04463v1

Compressor summary: CROWDSHIELD is a crowd intelligence-based method that uses deep Q-learning and transformer-based encoders to predict misinformation on social media by analyzing user reactions, stances, and claims in conversation threads.


Random Walk Diffusion for Efficient Large-Scale Graph Generation

http://arxiv.org/abs/2408.04461v1

Compressor summary: ARROW-Diff is a novel method for generating large-scale graphs with similar data distribution using random walk sampling and graph pruning.
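The summary's core ingredient, random walk sampling over a graph, can be sketched in a few lines (a toy illustration, not ARROW-Diff's actual sampler; the adjacency-list format and the `random_walks` helper are assumptions):

```python
import random

def random_walks(adj, num_walks, walk_length, seed=0):
    """Sample random walks from a graph given as {node: [neighbors]}."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(num_walks):
        walk = [rng.choice(nodes)]          # uniform start node
        for _ in range(walk_length - 1):
            neighbors = adj[walk[-1]]
            if not neighbors:               # dead end: stop the walk early
                break
            walk.append(rng.choice(neighbors))
        walks.append(walk)
    return walks

# Toy 4-node cycle graph
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walks = random_walks(adj, num_walks=3, walk_length=5)
```

Each sampled walk only visits adjacent nodes, which is what lets walk statistics stand in for local graph structure during generation.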


An experimental comparative study of backpropagation and alternatives for training binary neural networks for image classification

http://arxiv.org/abs/2408.04460v1

Compressor summary: This paper explores binary neural networks as a way to reduce model size, increase speed, and deploy powerful models on edge devices, and tests alternative training methods to overcome the challenges of backpropagation-based gradient descent.


RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents

http://arxiv.org/abs/2408.04449v1

Compressor summary: RiskAwareBench is a framework to evaluate and improve physical risk awareness in language model-based robots, as most existing models fail to avoid potential harm in real-world scenarios.


Deep Learning for identifying systolic complexes in SCG traces: a cross-dataset analysis

http://arxiv.org/abs/2408.04439v1

Compressor summary: The authors evaluate deep learning models for detecting systolic complexes in seismocardiograms from different datasets and real-world scenarios, highlighting the need for personalization and multi-channel analysis.


AcrosticSleuth: Probabilistic Identification and Ranking of Acrostics in Multilingual Corpora

http://arxiv.org/abs/2408.04427v1

Compressor summary: AcrosticSleuth is a tool that automatically identifies hidden messages in texts, such as initial letters forming words or phrases, using statistical analysis and a new dataset.
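The basic idea, reading off line-initial letters and scanning them for known words, can be sketched as follows (a brute-force toy, not AcrosticSleuth's probabilistic ranking; `find_acrostics` and the tiny vocabulary are illustrative assumptions):

```python
def find_acrostics(lines, vocabulary, min_len=3):
    """Scan the string of line-initial letters for words in the vocabulary."""
    initials = "".join(line[0].lower() for line in lines if line)
    hits = []
    for start in range(len(initials)):
        for end in range(start + min_len, len(initials) + 1):
            candidate = initials[start:end]
            if candidate in vocabulary:
                hits.append((start, candidate))
    return hits

poem = [
    "Winds whisper through the pines",
    "Over hills the shadows creep",
    "Rivers carry light away",
    "Dusk settles, soft and deep",
]
hits = find_acrostics(poem, {"word"})  # the line initials spell "word"
```

The paper's contribution is ranking such candidates probabilistically rather than by exact dictionary lookup, which this sketch omits.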


A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery

http://arxiv.org/abs/2408.04426v1

Compressor summary: This paper reviews NeRF and Gaussian splatting-based methods for reconstructing surgical scenes in robotic minimally invasive surgery, evaluating their performance on two datasets.


Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning

http://arxiv.org/abs/2408.04414v1

Compressor summary: The study proposes an in-context learning method to improve retrieval-augmented language models' reasoning and robustness in handling unanswerable queries and conflicting information in open-domain question answering.


Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers

http://arxiv.org/abs/2408.04413v1

Compressor summary: The paper presents Deeploy, a DNN compiler that generates C code for efficient edge deployment of Small Language Models on a multicore RISC-V MCU with an NPU, achieving low energy and high throughput.


Clutter Classification Using Deep Learning in Multiple Stages

http://arxiv.org/abs/2408.04407v1

Compressor summary: This paper applies deep learning to satellite images to automatically identify environmental clutter types for improved wireless communication propagation predictions.


Probabilistic energy forecasting through quantile regression in reproducing kernel Hilbert spaces

http://arxiv.org/abs/2408.04405v1

Compressor summary: The study proposes a kernel quantile regression method for probabilistic energy demand forecasting that is reliable, sharp, and competitive with existing methods, supporting sustainable energy development.
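Quantile regression, kernelized or not, minimizes the pinball loss; a minimal sketch of that loss (the standard textbook definition, not the paper's RKHS estimator):

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau)
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

# At tau = 0.9, under-predicting the quantile is penalized 9x more
low = pinball_loss([10.0], [8.0], 0.9)    # forecast below the observation
high = pinball_loss([10.0], [12.0], 0.9)  # forecast above the observation
```

Minimizing this loss over a function class (a reproducing kernel Hilbert space in the paper) yields conditional quantile estimates, which is what makes the forecasts probabilistic.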


Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset

http://arxiv.org/abs/2408.04403v1

Compressor summary: The paper studies how well large language models can do syllogistic reasoning, a form of human reasoning, and finds that they have similar biases and errors as humans, but struggle with some types of problems.


DIVE: Subgraph Disagreement for Graph Out-of-Distribution Generalization

http://arxiv.org/abs/2408.04400v1

Compressor summary: The paper proposes DIVE, a method to improve out-of-distribution generalization in graph machine learning by training multiple models on all label-predictive subgraphs and encouraging divergence between them.


Evaluating the Impact of Pulse Oximetry Bias in Machine Learning under Counterfactual Thinking

http://arxiv.org/abs/2408.04396v1

Compressor summary: The study shows that algorithmic bias in medical devices, such as pulse oximeters, can lead to worse outcomes and false negatives in machine learning models for healthcare.


Automated Educational Question Generation at Different Bloom's Skill Levels using Large Language Models: Strategies and Evaluation

http://arxiv.org/abs/2408.04394v1

Compressor summary: The study examines how large language models can generate diverse and high-quality educational questions for online education across different cognitive levels, using advanced prompting techniques and expert evaluations.


Open-domain Implicit Format Control for Large Language Model Generation

http://arxiv.org/abs/2408.04392v1

Compressor summary: The paper introduces a new framework for controlling the format of outputs from large language models using one-shot QA pairs and develops a method to collect and fine-tune datasets for open-domain format control.


Non-maximizing policies that fulfill multi-criterion aspirations in expectation

http://arxiv.org/abs/2408.04385v1

Compressor summary: The paper proposes an algorithm for sequential decision making in stochastic environments with multiple evaluation metrics and aspiration sets, which ensures desired outcomes while avoiding extreme or nonsensical actions.


Overview of the NLPCC 2024 Shared Task on Chinese Metaphor Generation

http://arxiv.org/abs/2408.04378v1

Compressor summary: The paper describes a shared task on Chinese metaphor generation at a conference, with two subtasks involving creating and identifying metaphors using machine learning techniques.


Anomaly Prediction: A Novel Approach with Explicit Delay and Horizon

http://arxiv.org/abs/2408.04377v1

Compressor summary: The paper proposes a novel method for predicting anomalies in time series data that incorporates temporal information, introduces a new dataset to evaluate it, and shows its effectiveness in providing timely and accurate predictions.


Deep Reinforcement Learning for the Design of Metamaterial Mechanisms with Functional Compliance Control

http://arxiv.org/abs/2408.04376v1

Compressor summary: The study uses deep reinforcement learning to efficiently design cell-based compliant mechanisms that outperform human-designed ones in applications like a door-latch and a soft gripper.


Analyzing Consumer Reviews for Understanding Drivers of Hotels Ratings: An Indian Perspective

http://arxiv.org/abs/2408.04369v1

Compressor summary: The study analyzes consumer reviews of Indian hotels to identify important aspects and sentiments that affect their ratings using web scraping, text analysis, and machine learning.


MultiViPerFrOG: A Globally Optimized Multi-Viewpoint Perception Framework for Camera Motion and Tissue Deformation

http://arxiv.org/abs/2408.04367v1

Compressor summary: The paper presents a robust and efficient method for reconstructing the 3D shape of a deformable surgical environment from a moving depth camera, using multi-viewpoint global optimization and kinematic priors to estimate camera motion and tissue deformation from noisy input while processing hundreds of points quickly.


Detecting Car Speed using Object Detection and Depth Estimation: A Deep Learning Framework

http://arxiv.org/abs/2408.04360v1

Compressor summary: The project uses deep learning with handheld devices such as mobile phones or wearable cameras to estimate vehicle speed, aiming to help curb over-speeding, a major factor in road accidents.


AggSS: An Aggregated Self-Supervised Approach for Class-Incremental Learning

http://arxiv.org/abs/2408.04347v1

Compressor summary: The paper explores how image rotations can improve feature learning for class-incremental learning by using a strategy called Aggregated Self-Supervision, which enhances performance on various datasets.


Self-Supervised Contrastive Graph Clustering Network via Structural Information Fusion

http://arxiv.org/abs/2408.04339v1

Compressor summary: CGCN is a novel deep graph clustering method that uses contrastive learning and structural information to improve the reliability of pre-training for various real-world applications.


KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination

http://arxiv.org/abs/2408.04336v1

Compressor summary: The paper proposes KnowPC, a method to learn interpretable programs for cooperative AI agents using reinforcement learning and domain-specific language, addressing the challenges of zero-shot coordination and generalization.


Enhancing Journalism with AI: A Study of Contextualized Image Captioning for News Articles using LLMs and LMMs

http://arxiv.org/abs/2408.04331v1

Compressor summary: The study evaluates how large multimodal models can improve news captions by using different context sources and compares their performance with two-stage pipelines, finding that smaller open-source models perform better than GPT-based ones and that controlling context amount enhances results.


Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP

http://arxiv.org/abs/2408.04303v1

Compressor summary: The study presents trans-tokenization, a novel cross-lingual vocabulary transfer strategy that adapts a high-resource monolingual LLM to an unseen target language using semantically similar token embeddings and translation resources, enabling efficient language adaptation and improving performance on various downstream tasks across languages.


Respiratory Subtraction for Pulmonary Microwave Ablation Evaluation

http://arxiv.org/abs/2408.04299v1

Compressor summary: The paper proposes a method called respiratory subtraction to evaluate microwave ablation surgery for lung tumors using pre- and post-operative images, and a quantitative analysis metric to measure its performance.


Dual-branch PolSAR Image Classification Based on GraphMAE and Local Feature Extraction

http://arxiv.org/abs/2408.04294v1

Compressor summary: The paper presents a generative self-supervised dual-branch model for PolSAR image classification with limited labels, which combines superpixel and pixel features and achieves promising results on the Flevoland benchmark dataset.


Are Social Sentiments Inherent in LLMs? An Empirical Study on Extraction of Inter-demographic Sentiments

http://arxiv.org/abs/2408.04293v1

Compressor summary: The study investigates how well large language models capture and express sentiments between different social groups based on nationality, religion, and race/ethnicity, comparing their responses to social surveys.


EMTeC: A Corpus of Eye Movements on Machine-Generated Texts

http://arxiv.org/abs/2408.04289v1

Compressor summary: EMTeC is a corpus of eye movement data from native English speakers reading machine-generated texts with various characteristics, which can be used for various research purposes related to human reading and language models.


LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

http://arxiv.org/abs/2408.04284v1

Compressor summary: LLM-DetectAIve is a system that classifies texts into four categories to identify the authorship and degree of LLM intervention in text creation, helping to maintain integrity in education and academia.


LaDiMo: Layer-wise Distillation Inspired MoEfier

http://arxiv.org/abs/2408.04278v1

Compressor summary: The text describes LaDiMo, an algorithm that converts non-MoE language models to sparse MoE models with minimal additional training cost, improving efficiency and reducing environmental impacts.


Stability Analysis of Equivariant Convolutional Representations Through The Lens of Equivariant Multi-layered CKNs

http://arxiv.org/abs/2408.04277v1

Compressor summary: The paper explores how group equivariant convolutional kernel networks (CKNs) help understand and improve the geometry of equivariant CNNs for stable representation learning under perturbations.


Early Risk Assessment Model for ICA Timing Strategy in Unstable Angina Patients Using Multi-Modal Machine Learning

http://arxiv.org/abs/2408.04276v1

Compressor summary: The study uses machine learning to improve early risk assessment for unstable angina patients, potentially helping doctors balance the risks of invasive coronary arteriography.


Analysis of Argument Structure Constructions in the Large Language Model BERT

http://arxiv.org/abs/2408.04270v1

Compressor summary: This study analyzes how BERT represents different types of Argument Structure Constructions (ASCs) across its 12 layers and compares it with LSTMs, finding that BERT's layered processing differs from LSTMs and reflects human language understanding.


Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based Methods

http://arxiv.org/abs/2408.04268v1

Compressor summary: The study compares NeRF, Gaussian-based methods, and SLAM systems for 3D scene reconstruction, finding that NeRF is good at view synthesis but slower, while newer SLAM methods are more robust and handle complex environments better.


Unveiling Hidden Visual Information: A Reconstruction Attack Against Adversarial Visual Information Hiding

http://arxiv.org/abs/2408.04261v1

Compressor summary: The paper explores the security risks of image encryption using adversarial examples and proposes a new attack method that improves the quality of reconstructed images.


EfficientRAG: Efficient Retriever for Multi-Hop Question Answering

http://arxiv.org/abs/2408.04259v1

Compressor summary: EfficientRAG is an efficient method for multi-hop question answering that iteratively generates queries without relying on large language models.
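The iterative query-extension loop behind multi-hop retrieval can be sketched with a toy keyword retriever (purely illustrative: EfficientRAG uses learned components, and `keyword_retrieve`, `multi_hop`, and the `bridge` callback are assumed names):

```python
def keyword_retrieve(query, corpus, exclude):
    """Toy retriever: pick the unseen doc sharing the most words with the query."""
    q = set(query.lower().split())
    best, best_score = None, 0
    for doc in corpus:
        if doc in exclude:
            continue
        score = len(q & set(doc.lower().split()))
        if score > best_score:
            best, best_score = doc, score
    return best

def multi_hop(question, corpus, bridge, max_hops=3):
    """Retrieve iteratively, extending the query with a bridge term each hop."""
    query, evidence = question, []
    for _ in range(max_hops):
        doc = keyword_retrieve(query, corpus, evidence)
        if doc is None:
            break
        evidence.append(doc)
        query = question + " " + bridge(doc)  # next-hop query from new evidence
    return evidence

corpus = ["Alice works at Acme", "Acme is headquartered in Paris"]
# Bridge heuristic: reuse the last word of the retrieved doc as the new query term
hops = multi_hop("Alice employer city", corpus, bridge=lambda d: d.split()[-1])
```

The first hop only matches the document mentioning Alice; the bridge term it contributes ("Acme") is what makes the second document retrievable on the next hop.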


UHNet: An Ultra-Lightweight and High-Speed Edge Detection Network

http://arxiv.org/abs/2408.04258v1

Compressor summary: UHNet is an ultra-lightweight edge detection model for medical image processing with minimal parameters, fast computation speed, low pre-training costs, and good performance on various datasets.


Generating Fine-Grained Causality in Climate Time Series Data for Forecasting and Anomaly Detection

http://arxiv.org/abs/2408.04254v1

Compressor summary: The paper presents a fine-grained causal model and a deep generative model to discover causal relations among spatial-temporal variables for accurate time series analysis, such as climate forecasting and extreme weather alerts.


Cooperative Multi-Agent Deep Reinforcement Learning in Content Ranking Optimization

http://arxiv.org/abs/2408.04251v1

Compressor summary: The paper proposes a reinforcement learning method to optimize e-commerce search results across all positions, outperforming existing CRO models.


InstantStyleGaussian: Efficient Art Style Transfer with 3D Gaussian Splatting

http://arxiv.org/abs/2408.04249v1

Compressor summary: InstantStyleGaussian is a fast and effective 3D scene style transfer method using diffusion models and iterative dataset updates.


Explicating the Implicit: Argument Detection Beyond Sentence Boundaries

http://arxiv.org/abs/2408.04246v1

Compressor summary: The paper reformulates argument detection as textual entailment across sentence boundaries, encoding a candidate relation as a simple proposition and testing it for entailment against the passage; without direct supervision, the method can explicate pragmatically understood relations and proves effective on a document-level benchmark.


Scalable Transformer for High Dimensional Multivariate Time Series Forecasting

http://arxiv.org/abs/2408.04245v1

Compressor summary: STHD is a new model that improves channel-dependent forecasting for high-dimensional MTS data by addressing noise, training strategies, and 2-D inputs.


MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning

http://arxiv.org/abs/2408.04243v1

Compressor summary: Mu-MAE is a novel approach for multimodal human activity recognition that uses self-supervised pretraining and cross-attention fusion to achieve high accuracy in one-shot learning without external data.


The Ungrounded Alignment Problem

http://arxiv.org/abs/2408.04242v1

Compressor summary: The paper explores the Ungrounded Alignment Problem, where an unsupervised learner maps unknown font images to class labels using only letter bigram frequencies, demonstrating a way to encode specific behaviors in modality-agnostic models.


Learning to Rewrite: Generalized LLM-Generated Text Detection

http://arxiv.org/abs/2408.04237v1

Compressor summary: The paper proposes a method to train an LLM to detect its own-generated content using minimal edits, improving detection performance across different domains.
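The underlying intuition, that a model rewrites its own output with fewer edits than it makes to human text, can be sketched with a plain similarity heuristic (a toy stand-in using difflib, not the paper's trained detector; the threshold value is an assumption):

```python
from difflib import SequenceMatcher

def rewrite_similarity(original, rewritten):
    """Character-level similarity ratio in [0, 1] between a text and its rewrite."""
    return SequenceMatcher(None, original, rewritten).ratio()

def looks_machine_generated(original, rewritten, threshold=0.9):
    """Heuristic signal: a near-identical rewrite suggests LLM-authored input."""
    return rewrite_similarity(original, rewritten) >= threshold
```

In this framing, `rewritten` would come from asking the LLM to rewrite the candidate text; the paper instead fine-tunes the model so this gap generalizes across domains.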


Cluster-Wide Task Slowdown Detection in Cloud System

http://arxiv.org/abs/2408.04236v1

Compressor summary: SORN is a new anomaly detection method for cloud computing clusters that uses skimming attention and neural optimal transport to capture compound periodicity and distinguish slowdowns from other fluctuations, achieving better performance than existing methods.


Enhanced Traffic Flow Prediction with Multi-Segment Fusion Tensor Graph Convolutional Networks

http://arxiv.org/abs/2408.04232v1

Compressor summary: The study introduces the multi-segment fusion tensor graph convolutional network (MS-FTGCN), which predicts traffic flow more accurately by capturing complex spatial-temporal dependencies and multiple temporal properties and fusing them with an attention mechanism, outperforming state-of-the-art models on two datasets.


Probabilistic Circuits for Cumulative Distribution Functions

http://arxiv.org/abs/2408.04229v1

Compressor summary: The paper explores how probabilistic circuits can represent both probability mass functions and cumulative distribution functions, and shows that these representations are equivalent for binary and finite discrete variables, as well as continuous variables with smooth and decomposable PDFs and CDFs.
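For finite discrete variables, the PMF/CDF equivalence mentioned above comes down to prefix sums and first differences, which a short sketch makes concrete (illustrative only, not the circuit construction itself):

```python
from itertools import accumulate

def pmf_to_cdf(pmf):
    """Prefix sums of a discrete PMF give its CDF."""
    return list(accumulate(pmf))

def cdf_to_pmf(cdf):
    """First differences of a discrete CDF recover the PMF."""
    return [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])]

pmf = [0.1, 0.4, 0.3, 0.2]   # P(X = 0..3)
cdf = pmf_to_cdf(pmf)        # P(X <= 0..3)
```

Because each transform inverts the other, a circuit computing one representation implicitly determines the other, which is the equivalence the paper formalizes.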


Evaluating Language Model Math Reasoning via Grounding in Educational Curricula

http://arxiv.org/abs/2408.04226v1

Compressor summary: The paper introduces datasets and methods to evaluate language models' mathematical abilities and finds that they have difficulties with tagging, verifying, and generating math problems based on standards.


VideoQA in the Era of LLMs: An Empirical Study

http://arxiv.org/abs/2408.04223v1

Compressor summary: This paper studies how large language models perform in video question answering, revealing their strengths and weaknesses, such as struggles with temporal content and lack of robustness and interpretability.


Connective Viewpoints of Signal-to-Noise Diffusion Models

http://arxiv.org/abs/2408.04221v1

Compressor summary: This study provides a comprehensive analysis of noise schedulers in Signal-to-Noise diffusion models, connecting them to SNR and information theory, and developing a generalized backward equation for better inference.
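To make the scheduler-SNR connection concrete, the sketch below computes the signal-to-noise ratio of the standard cosine schedule (Nichol & Dhariwal's form, used here as an assumed example rather than the paper's own scheduler) and checks that SNR falls monotonically as noise is added:

```python
import math

def cosine_alpha_bar(t, T, s=0.008):
    # Cosine schedule: fraction of signal variance retained at step t.
    return math.cos(((t / T + s) / (1 + s)) * math.pi / 2) ** 2

def snr(t, T):
    # For a variance-preserving process, SNR(t) = alpha_bar / (1 - alpha_bar).
    ab = cosine_alpha_bar(t, T)
    return ab / (1 - ab)

T = 1000
snrs = [snr(t, T) for t in range(1, T)]
# SNR decreases monotonically from nearly clean signal to nearly pure noise.
```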


Diffusion Guided Language Modeling

http://arxiv.org/abs/2408.04220v1

Compressor summary: The paper proposes a text generation method that combines the fluency of an auto-regressive language model with the flexibility of a guided diffusion model to produce texts with controlled attributes such as sentiment, outperforming previous guidance methods while requiring only one classifier per attribute.


Simplifying Translations for Children: Iterative Simplification Considering Age of Acquisition with LLMs

http://arxiv.org/abs/2408.04217v1

Compressor summary: The study proposes a method that simplifies translations for children by using large language models to iteratively replace complex words with simpler ones based on their age of acquisition.
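The core loop can be sketched without an LLM: repeatedly replace any word whose age-of-acquisition rating exceeds the target reader's age with a simpler synonym. The ratings and synonym table below are hypothetical placeholders; the paper relies on published AoA norms and LLM-generated substitutions.

```python
# Hypothetical AoA ratings (years); real work uses published norms.
AOA = {"utilize": 12.0, "use": 4.0, "commence": 13.0, "start": 4.5,
       "purchase": 9.0, "buy": 4.0}
# Hypothetical simpler-synonym table standing in for LLM suggestions.
SIMPLER = {"utilize": "use", "commence": "start", "purchase": "buy"}

def simplify(sentence, target_age=8.0, max_rounds=3):
    # Iteratively replace words acquired later than the target age,
    # stopping once a pass makes no further changes.
    words = sentence.split()
    for _ in range(max_rounds):
        changed = False
        for i, w in enumerate(words):
            if AOA.get(w, 0.0) > target_age and w in SIMPLER:
                words[i] = SIMPLER[w]
                changed = True
        if not changed:
            break
    return " ".join(words)
```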


Attention Mechanism and Context Modeling System for Text Mining Machine Translation

http://arxiv.org/abs/2408.04216v1

Compressor summary: The paper proposes a new architecture that combines the Transformer with K-means clustering to improve machine translation by better handling contextual ambiguity and preserving local structure.
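The clustering half of that combination is standard K-means; a minimal, deterministic sketch on toy "context vectors" (the Transformer half and the paper's integration are omitted):

```python
def kmeans(points, k, iters=20):
    # Deterministic initialization: spread initial centers across the list.
    centers = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest center by squared distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Recompute each center as the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(dim) / len(cl) for dim in zip(*cl))
    return centers, clusters

# Two well-separated groups of toy context vectors.
pts = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
       (5.0, 5.1), (5.1, 5.0), (4.95, 5.05)]
centers, clusters = kmeans(pts, 2)
```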


MMREC: LLM Based Multi-Modal Recommender System

http://arxiv.org/abs/2408.04211v1

Compressor summary: The paper proposes a novel framework that uses large language models and deep learning techniques to enhance recommender systems by leveraging natural language and image data in a unified latent space, improving accuracy and relevance of recommendations.


DC Algorithm for Estimation of Sparse Gaussian Graphical Models

http://arxiv.org/abs/2408.04206v1

Compressor summary: The authors propose a new method for sparse estimation of Gaussian graphical models using the $\ell_0$ norm, which improves accuracy and edge selection compared to existing methods.


MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

http://arxiv.org/abs/2408.04203v1

Compressor summary: The paper introduces Multimodal Role-Playing Agents (MRPAs) that can simulate human multimodal perception, and presents MMRole, a framework with data and evaluation methods for developing and testing MRPAs.


Uncertainty-Aware Crime Prediction With Spatial Temporal Multivariate Graph Neural Networks

http://arxiv.org/abs/2408.04193v1

Compressor summary: STMGNN-ZINB is a new crime forecasting method that handles the sparsity and non-Gaussian nature of crime data by combining diffusion, convolution, and zero-inflated negative binomial models, improving prediction accuracy and confidence-interval precision on real-world datasets.


Listwise Reward Estimation for Offline Preference-based Reinforcement Learning

http://arxiv.org/abs/2408.04190v1

Compressor summary: LiRE is a novel PbRL method that uses second-order preference information from human feedback to learn reward models more effectively, especially with limited feedback budgets and noisy data.


Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation

http://arxiv.org/abs/2408.04187v1

Compressor summary: The MedGraphRAG framework enhances LLM capabilities for generating evidence-based medical responses using a hierarchical graph structure and improves safety and reliability when handling private medical data.


pyBregMan: A Python library for Bregman Manifolds

http://arxiv.org/abs/2408.04175v1

Compressor summary: pyBregMan is a library that implements operations on Bregman manifolds, which are related to dually flat spaces in information geometry, and provides algorithms for applications in various fields.


wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech

http://arxiv.org/abs/2408.04174v1

Compressor summary: wav2graph is a framework that learns knowledge graphs from speech data using transcriptions and named entity databases, and applies graph neural networks for node classification and link prediction tasks.


MultiColor: Image Colorization by Learning from Multiple Color Spaces

http://arxiv.org/abs/2408.04172v1

Compressor summary: The paper proposes MultiColor, a new method for image colorization that leverages multiple color spaces and transformer decoders to produce high-quality results.


Rotation center identification based on geometric relationships for rotary motion deblurring

http://arxiv.org/abs/2408.04171v1

Compressor summary: The paper proposes a geometric-based method for identifying the rotation center in rotary motion blurred images, which improves the performance of non-blind rotary motion deblurring methods.


M2EF-NNs: Multimodal Multi-instance Evidence Fusion Neural Networks for Cancer Survival Prediction

http://arxiv.org/abs/2408.04170v1

Compressor summary: The study proposes a neural network model, M2EF-NNs, that fuses multimodal data using a Vision Transformer and Dempster-Shafer theory to improve cancer survival prediction.


Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions

http://arxiv.org/abs/2408.04168v1

Compressor summary: The paper proposes a novel agentic workflow that combines perception, reflection, and planning to improve the long-range city navigation ability of large language models without instructions.


mbrs: A Library for Minimum Bayes Risk Decoding

http://arxiv.org/abs/2408.04167v1

Compressor summary: MBR decoding is a text generation technique that selects high-quality outputs based on utility functions, and mbrs is an open-source library for MBR decoding with various metrics and algorithms.
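The idea behind MBR decoding fits in a few lines: sample several candidates, treat each as a pseudo-reference for the others, and return the candidate with the highest expected utility. The toy token-F1 utility below is an assumption for illustration; mbrs supports proper metrics such as BLEU or COMET.

```python
def utility(hyp, ref):
    # Toy utility: token-set F1 overlap between hypothesis and reference.
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    overlap = len(h & r)
    p, q = overlap / len(h), overlap / len(r)
    return 2 * p * q / (p + q) if p + q else 0.0

def mbr_decode(candidates):
    # Select the candidate with the highest expected utility, using the
    # other samples as pseudo-references.
    return max(candidates,
               key=lambda h: sum(utility(h, r) for r in candidates if r is not h))

cands = ["the cat sat on the mat",
         "a cat sat on a mat",
         "the dog ran away"]
best = mbr_decode(cands)
```

The outlier "the dog ran away" scores poorly against the other samples, so a consensus-like candidate wins.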


Semantics or spelling? Probing contextual word embeddings with orthographic noise

http://arxiv.org/abs/2408.04162v1

Compressor summary: Using minimal orthographic noise, the study finds that contextual word embeddings from popular language models are sensitive to input data modifications and may not accurately capture semantic information.
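The probing mechanics (perturb a word minimally, re-embed, compare similarity) can be shown with a toy surface-form embedding; this stand-in is my assumption, whereas the paper probes contextual embeddings from pretrained language models.

```python
from math import sqrt

def char_bigram_embed(word):
    # Toy surface-form embedding: character-bigram counts with boundary markers.
    padded = f"#{word}#"
    vec = {}
    for a, b in zip(padded, padded[1:]):
        vec[a + b] = vec.get(a + b, 0) + 1
    return vec

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def swap_noise(word, i=1):
    # Minimal orthographic noise: transpose two adjacent characters.
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

clean = "language"
noised = swap_noise(clean)  # "lnaguage"
sim = cosine(char_bigram_embed(clean), char_bigram_embed(noised))
```

A similarity well below 1.0 for a one-character transposition is the kind of sensitivity the probe is designed to measure.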


The Data Addition Dilemma

http://arxiv.org/abs/2408.04154v1

Compressor summary: The paper identifies the Data Addition Dilemma, where adding training data from different sources can hurt model performance, fairness, and subgroup accuracy due to distribution shift, and proposes heuristics to guide data scaling decisions.
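One shift-aware heuristic (an illustrative proxy I am assuming, not the paper's exact procedure) is to compare candidate data sources against the core training data by a divergence over token histograms and prefer the closer source:

```python
import math
from collections import Counter

def kl_divergence(p_counts, q_counts, smooth=1e-6):
    # KL(P || Q) over the union vocabulary, with additive smoothing so
    # unseen tokens do not produce infinities.
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smooth * len(vocab)
    q_total = sum(q_counts.values()) + smooth * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts.get(w, 0) + smooth) / p_total
        q = (q_counts.get(w, 0) + smooth) / q_total
        kl += p * math.log(p / q)
    return kl

core = Counter("the patient was admitted with chest pain".split())
similar = Counter("the patient was discharged after chest pain".split())
different = Counter("stock prices rallied after the earnings call".split())

# Prefer adding the candidate source closer to the core distribution.
assert kl_divergence(core, similar) < kl_divergence(core, different)
```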


Decorrelating Structure via Adapters Makes Ensemble Learning Practical for Semi-supervised Learning

http://arxiv.org/abs/2408.04150v1

Compressor summary: The paper proposes a new ensemble learning method for computer vision tasks that uses adapters to decorrelate multiple prediction heads, improving reliability and robustness without increasing training time or complexity.


ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model

http://arxiv.org/abs/2408.04145v1

Compressor summary: ComKD-CLIP is a novel approach that uses image feature alignment and educational attention to distill knowledge from large CLIP models into smaller ones, achieving comparable performance with fewer parameters.


Integrated Dynamic Phenological Feature for Remote Sensing Image Land Cover Change Detection

http://arxiv.org/abs/2408.04144v1

Compressor summary: InPhea is a remote sensing image change detection model that uses phenological features to differentiate actual changes from complex scenes and filter out pseudo-changes.


UNLEARN Efficient Removal of Knowledge in Large Language Models

http://arxiv.org/abs/2408.04140v1

Compressor summary: UNLEARN and LEARN are novel methods to selectively forget or add knowledge in large language models without retraining them, improving performance on targeted tasks.


Enhancing Healthcare through Large Language Models: A Study on Medical Question Answering

http://arxiv.org/abs/2408.04138v1

Compressor summary: This paper compares different large language models trained on a medical dataset and finds that Sentence-t5 combined with Mistral 7B performs best in providing accurate medical information.