arxiv compressed, 2024-07-15

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-15, generated by the compressor, my personal LLM-based project.


StyleSplat: 3D Object Style Transfer with Gaussian Splatting

http://arxiv.org/abs/2407.09473v1

Compressor summary: StyleSplat is a fast method for applying diverse artistic styles to 3D objects in scenes using 3D Gaussian splatting and feature matching, achieving localized stylization and photorealistic results.


Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures

http://arxiv.org/abs/2407.09468v1

Compressor summary: The text discusses the need for a broader mathematical perspective in modern machine learning to handle non-Euclidean data with intricate geometric, topological, and algebraic structures.


FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3

http://arxiv.org/abs/2407.09467v1

Compressor summary: FairyLandAI is an AI-driven storytelling model that creates personalized fairytales for children, integrating text and image generation to enhance the storytelling experience and impart moral values.


Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

http://arxiv.org/abs/2407.09453v1

Compressor summary: The text describes a method called weight block sparsity that reduces the size and computational cost of deep neural networks by zeroing some parameters in pre-trained models, leading to faster inference speeds and memory efficiency.
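
As a rough illustration of the general idea (not the paper's specific training or compilation pipeline), block sparsity amounts to ranking fixed-size blocks of a weight matrix by magnitude and zeroing the weakest fraction. A minimal NumPy sketch with illustrative names:

```python
import numpy as np

def block_sparsify(W, block=(4, 4), sparsity=0.5):
    """Zero out the lowest-magnitude blocks of a weight matrix.
    Blocks are ranked by L2 norm; the bottom `sparsity` fraction is pruned."""
    rows, cols = W.shape
    bh, bw = block
    assert rows % bh == 0 and cols % bw == 0
    # Reshape so axis 0/2 index blocks and axis 1/3 index positions within a block
    blocks = W.reshape(rows // bh, bh, cols // bw, bw)
    norms = np.linalg.norm(blocks, axis=(1, 3))
    k = int(norms.size * sparsity)
    if k:
        thresh = np.sort(norms.ravel())[k - 1]
        mask = norms > thresh           # keep only blocks above the cutoff
        blocks = blocks * mask[:, None, :, None]
    return blocks.reshape(rows, cols)
```

Because whole blocks are zeroed rather than scattered weights, the resulting pattern maps cleanly onto vector and tile units in hardware accelerators.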


Human-like Episodic Memory for Infinite Context LLMs

http://arxiv.org/abs/2407.09450v1

Compressor summary: EM-LLM integrates human episodic memory into large language models, enabling them to handle infinite context lengths and outperforming state-of-the-art models in various tasks.


ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts

http://arxiv.org/abs/2407.09447v1

Compressor summary: The paper proposes a reinforcement learning approach to find prompts that can trigger toxic outputs from language models while maintaining intelligibility and realism.


MUSCLE: A Model Update Strategy for Compatible LLM Evolution

http://arxiv.org/abs/2407.09435v1

Compressor summary: The paper proposes compatibility metrics and a training strategy to minimize inconsistencies when updating large language models, aiming to improve user experience and satisfaction.


A Perspective on Foundation Models for the Electric Power Grid

http://arxiv.org/abs/2407.09434v1

Compressor summary: The paper proposes using foundation models, advanced deep learning techniques, to enhance the management of complex and uncertain aspects of electric power grids in the context of climate change and energy transition.


Rethinking temporal self-similarity for repetitive action counting

http://arxiv.org/abs/2407.09431v1

Compressor summary: The paper proposes a new framework to count repetitive actions in videos by learning embeddings and predicting action start probabilities, instead of using a temporal self-similarity matrix as an intermediate representation.


Open (Clinical) LLMs are Sensitive to Instruction Phrasings

http://arxiv.org/abs/2407.09429v1

Compressor summary: The study examines how well instruction-tuned LLMs perform on clinical NLP tasks when given different natural language instructions, finding that domain-specific models can be more brittle than general ones and that phrasing differences affect fairness.


Mitigating Entity-Level Hallucination in Large Language Models

http://arxiv.org/abs/2407.09417v1

Compressor summary: The paper proposes DRAD, a method to detect and correct hallucinations in LLMs by adapting the retrieval process based on real-time detection and using external knowledge.


A Benchmark Environment for Offline Reinforcement Learning in Racing Games

http://arxiv.org/abs/2407.09415v1

Compressor summary: OfflineMania is a new environment for offline reinforcement learning research that simulates a single-agent racing game and provides datasets to test different algorithms.


SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

http://arxiv.org/abs/2407.09413v1

Compressor summary: SPIQA is a large-scale QA dataset that tests multimodal models' ability to understand figures and tables in computer science research articles using multiple images and a Chain-of-Thought evaluation strategy.


Open-Canopy: A Country-Scale Benchmark for Canopy Height Estimation at Very High Resolution

http://arxiv.org/abs/2407.09392v1

Compressor summary: Open-Canopy is an open-access benchmark for estimating very high resolution canopy height and detecting change using satellite and aerial data in France.


GAVEL: Generating Games Via Evolution and Language Models

http://arxiv.org/abs/2407.09388v1

Compressor summary: The work presents a novel approach to generating new and interesting board games by using large language models and evolutionary computation to mutate and recombine games expressed in Ludii, a language that encodes the rules of many existing games; the generated games cover unseen regions of the rules space, and some can be played online through the Ludii portal.


Radiance Fields from Photons

http://arxiv.org/abs/2407.09386v1

Compressor summary: Quanta radiance fields use single-photon cameras to train neural networks at the photon level, enabling high-quality view synthesis under challenging conditions like motion, low light, and dynamic range.


The Effectiveness of Curvature-Based Rewiring and the Role of Hyperparameters in GNNs Revisited

http://arxiv.org/abs/2407.09381v1

Compressor summary: Curvature-based rewiring may not improve message passing efficiency in real-world graphs because it does not always target oversquashed edges, and performance gains are due to hyperparameter sweeps.


FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background

http://arxiv.org/abs/2407.09379v1

Compressor summary: The paper proposes FANet, a network that uses AFE blocks to incorporate semantic information for better semantic segmentation in complex scenes with cluttered backgrounds and translucent objects.


Graph Neural Network Causal Explanation via Neural Causal Models

http://arxiv.org/abs/2407.09378v1

Compressor summary: The paper presents a causal graph neural network explainer that identifies important subgraphs by training neural causal models on the input graph.


HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context

http://arxiv.org/abs/2407.09375v1

Compressor summary: This paper investigates how State Space Models can learn from past observations and make predictions without retraining, by using a new weight construction method that approximates input signal derivatives.


Towards Personalised Patient Risk Prediction Using Temporal Hospital Data Trajectories

http://arxiv.org/abs/2407.09373v1

Compressor summary: The text proposes a method to group ICU patients based on their observation trajectories and develop personalized risk predictions, improving clinical decision making.


ConRebSeg: A Segmentation Dataset for Reinforced Concrete Construction

http://arxiv.org/abs/2407.09372v1

Compressor summary: The text introduces a new image dataset for autonomous robots in construction, analyzes segmentation model performance on it, and suggests that more data and consistent labels are needed.


Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding

http://arxiv.org/abs/2407.09370v1

Compressor summary: Sinusoidal positional encoding (SPE) is a new method that adapts to different tasks without hyperparameter tuning, improving performance in various tasks like 3D view synthesis, Text-to-Speech generation, and 1D regression.
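
The paper's SPE adapts its encoding to the task; as background, the fixed-frequency sinusoidal encoding such methods build on maps each coordinate to sines and cosines at geometrically spaced frequencies, letting networks fit high-frequency functions. A minimal sketch (names are illustrative, not from the paper):

```python
import numpy as np

def fourier_features(x, num_bands=6):
    """Standard sinusoidal positional encoding of scalar coordinates:
    maps x to [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_bands-1."""
    x = np.asarray(x, dtype=float)[..., None]
    freqs = 2.0 ** np.arange(num_bands) * np.pi   # geometric frequency ladder
    ang = x * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
```

The number of bands (and hence the highest frequency) is exactly the kind of hyperparameter the proposed SPE aims to avoid hand-tuning.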


Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

http://arxiv.org/abs/2407.09367v1

Compressor summary: The paper introduces a new approach to adapt pre-trained models to unsupervised domain shifts by using uncertainty-aware data buffering, graph-based class relation preservation, and pseudo-target replay.


Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text

http://arxiv.org/abs/2407.09364v1

Compressor summary: WhosAI is a new framework that can detect and reveal whether text was written by humans or AI, using contrastive learning to learn semantic similarity representations from multiple generators.


A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization

http://arxiv.org/abs/2407.09359v1

Compressor summary: GLASS is a novel anomaly synthesis strategy that enhances unsupervised anomaly detection by combining global and local synthesis methods, achieving state-of-the-art results in weak defect detection and industrial applications.


Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees

http://arxiv.org/abs/2407.09357v1

Compressor summary: STGG+ is a Transformer-based method for generating molecules with desired properties by incorporating random masking, property prediction loss, and other improvements, achieving state-of-the-art results on various tasks.


Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems

http://arxiv.org/abs/2407.09352v1

Compressor summary: The paper proposes an implicit method for solving Electromagnetic Inverse Scattering Problems, which improves the accuracy of non-invasively determining the internal relative permittivity of a scatterer using electromagnetic fields.


Pre-training Point Cloud Compact Model with Partial-aware Reconstruction

http://arxiv.org/abs/2407.09344v1

Compressor summary: Point-CPR is a compact point cloud model that uses partial-aware prediction and a local aggregation encoder to improve 3D representation and reduce model size compared to existing Masked Point Modeling methods.


Guidelines for Augmentation Selection in Contrastive Learning for Time Series Classification

http://arxiv.org/abs/2407.09336v1

Compressor summary: Noting that augmentation choice in contrastive learning for time series significantly affects performance yet is usually decided empirically or by grid search, the paper proposes a principled framework that recommends augmentations based on dataset characteristics such as trend and seasonality, and shows on synthetic and real-world datasets that it outperforms baselines.


Sina at FigNews 2024: Multilingual Datasets Annotated with Bias and Propaganda

http://arxiv.org/abs/2407.09327v1

Compressor summary: The article presents a multilingual bias and propaganda annotated corpus from Facebook posts about the Israeli War on Gaza, created for a shared task and used to evaluate performance of detection techniques.


Scalability of Bayesian Network Structure Elicitation with Large Language Models: a Novel Methodology and Comparative Analysis

http://arxiv.org/abs/2407.09311v1

Compressor summary: The authors present a new method for finding Bayesian Network structures using multiple LLMs and majority voting, compare it to an alternative method, and discuss its scalability and applicability.


ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

http://arxiv.org/abs/2407.09303v1

Compressor summary: ProDepth is a novel framework that uses a probabilistic approach to address inconsistencies caused by dynamic objects in multi-frame monocular depth estimation, improving performance on various datasets.


PID: Physics-Informed Diffusion Model for Infrared Image Generation

http://arxiv.org/abs/2407.09299v1

Compressor summary: The Physics-Informed Diffusion model translates RGB images to infrared images while adhering to physical laws, achieving better results than existing methods.


Transformer Layers as Painters

http://arxiv.org/abs/2407.09298v1

Compressor summary: The study investigates how transformer layers work and how to use them more efficiently for different problems.


Learning Distances from Data with Normalizing Flows and Score Matching

http://arxiv.org/abs/2407.09297v1

Compressor summary: The paper proposes a new method to estimate density-based distances in high-dimensional data using normalizing flows and a dimension-adapted Fermat distance.


SS-SfP: Neural Inverse Rendering for Self Supervised Shape from (Mixed) Polarization

http://arxiv.org/abs/2407.09294v1

Compressor summary: The text presents a novel method to estimate 3D shapes and refractive index from single-view polarization images using a modified polarization reflection model, reflectance cues, and an inverse rendering-based deep learning framework.


WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation

http://arxiv.org/abs/2407.09288v1

Compressor summary: The paper presents a new dataset for winter sports equipment segmentation and tests interactive segmentation models with online adaptation methods to improve efficiency and accuracy.


Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments

http://arxiv.org/abs/2407.09287v1

Compressor summary: The study proposes a hierarchical framework that combines language understanding and reinforcement learning to enable AI agents to execute complex language instructions in virtual environments.


MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results

http://arxiv.org/abs/2407.09285v1

Compressor summary: The MetaFood Workshop and its challenge aim to improve 3D food reconstruction for nutrition monitoring using a visible checkerboard as a size reference, with 16 teams submitting results on varying difficulty levels.


DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection

http://arxiv.org/abs/2407.09283v1

Compressor summary: DAHRS improves multilingual SRL projection accuracy by addressing spurious role labels caused by naturally occurring divergences and using linguistically-informed alignment remediation followed by FCFA projection.


Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning

http://arxiv.org/abs/2407.09281v1

Compressor summary: The paper compares large language models and a cognitive instance-based learning model in predicting human behavior in sequential decision-making tasks, finding that LLMs are better at incorporating feedback while the cognitive IBL model captures loss aversion bias.


H2O-Danube3 Technical Report

http://arxiv.org/abs/2407.09276v1

Compressor summary: H2O-Danube3 is a small language model pre-trained on Web data, with high performance and portability for various tasks and devices.


Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX

http://arxiv.org/abs/2407.09274v1

Compressor summary: HelixProtX is a system that uses a large multimodal model to transform any input protein modality into any desired protein modality, enabling better understanding and generation of protein data and outperforming existing models in various tasks.


iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning

http://arxiv.org/abs/2407.09271v1

Compressor summary: Our method improves continual learning for vision tasks by using incremental neural mesh models, latent space initialization, and positional regularization, achieving better performance in both in-domain and out-of-distribution scenarios.


Context Embeddings for Efficient Answer Generation in RAG

http://arxiv.org/abs/2407.09252v1

Compressor summary: COCOM compresses long contexts for Retrieval-Augmented Generation, speeding up decoding time and improving answer quality.


Semantic UV mapping to improve texture inpainting for indoor scenes

http://arxiv.org/abs/2407.09248v1

Compressor summary: The authors propose a method that uses semantic information to improve UV mapping and 3D reconstruction of indoor scenes after clutter removal.


Constrained Intrinsic Motivation for Reinforcement Learning

http://arxiv.org/abs/2407.09247v1

Compressor summary: Constrained Intrinsic Motivation (CIM) improves unsupervised skill discovery and exploration with intrinsic motivation in reinforcement learning tasks by addressing challenges like static skills, limited state coverage, and suboptimality.


The Sociolinguistic Foundations of Language Modeling

http://arxiv.org/abs/2407.09241v1

Compressor summary: The paper proposes a sociolinguistic approach to language modeling, considering how different language varieties affect various challenges and emphasizing the importance of accurate representation in training data.


Modelling the Human Intuition to Complete the Missing Information in Images for Convolutional Neural Networks

http://arxiv.org/abs/2407.09236v1

Compressor summary: The study proposes a visual intuition model based on Gestalt theory to improve CNN performance by completing missing information in images, and tests it on the MNIST dataset.


Surgical Text-to-Image Generation

http://arxiv.org/abs/2407.09230v1

Compressor summary: The authors propose Surgical Imagen, a text-to-image generative model that can create realistic surgical images from textual descriptions, addressing challenges like high annotation costs and ethical constraints in acquiring surgical data for research and development.


Evaluating AI Evaluation: Perils and Prospects

http://arxiv.org/abs/2407.09221v1

Compressor summary: The paper argues for a reform in AI evaluation methods using cognitive sciences and identifies challenges and promising research directions.


A Fair Ranking and New Model for Panoptic Scene Graph Generation

http://arxiv.org/abs/2407.09216v1

Compressor summary: The paper corrects an error in panoptic scene graph generation evaluations, shows that two-stage models are competitive to one-stage models, and introduces a new two-stage model (DSFormer) that outperforms existing models.


HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation

http://arxiv.org/abs/2407.09215v1

Compressor summary: HUP-3D is a large, diverse, and realistic synthetic dataset for estimating hand-ultrasound probe pose in obstetric ultrasound using markerless 3D joint pose estimation with potential applications in medical education and guidance.


Generating SROI^{-} Ontologies via Knowledge Graph Query Embedding Learning

http://arxiv.org/abs/2407.09212v1

Compressor summary: AConE is a novel query embedding method that explains knowledge in SROI^{-} description logic and outperforms previous models with fewer parameters.


Pronunciation Assessment with Multi-modal Large Language Models

http://arxiv.org/abs/2407.09209v1

Compressor summary: The paper proposes a LLM-based scoring system for automated language learning assessment that uses speech encoder and modality adapter layer to generate accuracy and fluency scores.


A Chatbot for Asylum-Seeking Migrants in Europe

http://arxiv.org/abs/2407.09197v1

Compressor summary: ACME is a chatbot that helps migrants in Europe find the best protection option using computational argumentation.


Salt & Pepper Heatmaps: Diffusion-informed Landmark Detection Strategy

http://arxiv.org/abs/2407.09192v1

Compressor summary: The text describes a method for accurate anatomical landmark detection using diffusion models that generate probability regions as heatmaps, achieving high quality results in medical image processing.


From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation

http://arxiv.org/abs/2407.09191v1

Compressor summary: CAFE is a model-agnostic learning strategy for panoptic scene graph generation that incorporates shape-aware features in an easy-to-hard manner and outperforms existing methods.


Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings

http://arxiv.org/abs/2407.09187v1

Compressor summary: The study proposes a method to detect depression in Bangla social media posts using advanced natural language processing and deep learning techniques, achieving better results than existing approaches.


Variational Inference via Smoothed Particle Hydrodynamics

http://arxiv.org/abs/2407.09186v1

Compressor summary: SPH-ParVI is a new variational inference method that uses smoothed particle hydrodynamics to simulate fluid flow and sample probabilistic models efficiently.


Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers

http://arxiv.org/abs/2407.09184v1

Compressor summary: The study introduces a new dataset (SIKO) for Korean language models to improve their handling of incomplete syntax in natural language processing.


Exploring the Effectiveness of Methods for Persona Extraction

http://arxiv.org/abs/2407.09181v1

Compressor summary: The paper studies how to extract dialogue-participant persona information in Russian, evaluating various models on the Multi-Session Chat dataset with the F-score metric and finding that all models have low recall while larger models improve extraction.


DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training

http://arxiv.org/abs/2407.09174v1

Compressor summary: DART is an automated pipeline for object detection in industrial applications that uses image generation, open-vocabulary annotation, and multimodal review to streamline the workflow and improve performance.


Conformal Inductive Graph Neural Networks

http://arxiv.org/abs/2407.09173v1

Compressor summary: The paper extends conformal prediction to inductive node classification on graphs, preserving coverage and statistical efficiency.


Machine Apophenia: The Kaleidoscopic Generation of Architectural Images

http://arxiv.org/abs/2407.09172v1

Compressor summary: The study applies generative AI to create unique architectural designs using neural networks trained on human data, producing coherent images with captions shared online.


SE(3)-bi-equivariant Transformers for Point Cloud Assembly

http://arxiv.org/abs/2407.09167v1

Compressor summary: SE(3)-bi-equivariant transformer (BITR) is a method to align non-overlapped point clouds by exploiting SE(3)-bi-equivariance prior, which ensures robustness against rigid perturbations and initial positions.


Robust Yet Efficient Conformal Prediction Sets

http://arxiv.org/abs/2407.09165v1

Compressor summary: Conformal prediction can produce robust prediction sets against adversarial examples, by bounding the change in conformity scores with efficient algorithms.
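
For context, the standard split conformal procedure that robust variants build on computes a quantile of conformity scores on held-out calibration data and then includes every class below that threshold. A minimal sketch of that textbook recipe (not the paper's robust algorithm; names are illustrative):

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for classification.
    Conformity score: 1 - model probability of the true class."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile of the calibration scores
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Prediction set: every class whose score falls at or below the threshold
    return [np.where(1.0 - row <= q)[0] for row in test_probs]
```

Under exchangeability, the returned sets cover the true label with probability at least 1 - alpha; adversarial perturbations break that exchangeability, which is the gap the paper targets.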


Exploring State Space and Reasoning by Elimination in Tsetlin Machine

http://arxiv.org/abs/2407.09162v1

Compressor summary: The paper proposes a method to improve the Tsetlin Machine's word embedding by incorporating feature negations and optimizing its parameters, achieving high accuracy in pattern classification tasks.


Weakly-supervised Autism Severity Assessment in Long Videos

http://arxiv.org/abs/2407.09159v1

Compressor summary: The paper presents a video-based method to detect and categorize severity of autism using spatio-temporal features and a shallow TCN-MLP network, which can aid clinicians in autism spectrum analysis.


The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs

http://arxiv.org/abs/2407.09152v1

Compressor summary: The study examined four large language models' abilities to generate and detect hallucinations, using ensemble voting for detection, in the CLEF ELOQUENT HalluciGen shared task.


Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off

http://arxiv.org/abs/2407.09150v1

Compressor summary: The paper shows that semantic segmentation models are even more vulnerable to small adversarial perturbations than image classification models, which can be fooled by altering small parts of images; through new attacks and detailed analysis the authors reveal the extent of the problem, finding that size bias and the diversity of attacks make robustness evaluation challenging.


Accuracy is Not All You Need

http://arxiv.org/abs/2407.09141v1

Compressor summary: The paper studies how compression techniques affect LLMs' quality and suggests using KL-Divergence and flips as better evaluation metrics than accuracy.
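
Both quantities are standard and easy to compute from a base model's and a compressed model's outputs; a minimal sketch with illustrative function names (not code from the paper):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions,
    e.g. the base and compressed models' next-token distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def flip_rate(preds_base, preds_compressed):
    """Fraction of examples whose predicted label 'flips' after compression,
    which can be large even when aggregate accuracy barely moves."""
    a = np.asarray(preds_base)
    b = np.asarray(preds_compressed)
    return float(np.mean(a != b))
```

The point of such metrics is that two models can have identical accuracy while disagreeing on many individual examples, which accuracy alone cannot reveal.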


Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors

http://arxiv.org/abs/2407.09136v1

Compressor summary: The authors propose a method to improve dialog tutoring models by detecting student errors in math reasoning problems and generating targeted feedback based on the errors.


Robustness of Explainable Artificial Intelligence in Industrial Process Modelling

http://arxiv.org/abs/2407.09127v1

Compressor summary: The paper evaluates XAI methods using an EAF model and a novel scoring method based on ground truth simulations and sensitivity analysis, showing how well they explain the data-generating process.


Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network

http://arxiv.org/abs/2407.09124v1

Compressor summary: The text proposes a photonic-based algorithm for multi-agent decision-making that balances exploration and exploitation without explicit information sharing, using chaotic lasers and decentralized coupling adjustment.


Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

http://arxiv.org/abs/2407.09121v1

Compressor summary: This study proposes a new method, DeRTa, to improve the safety of Large Language Models by training them to recognize and refuse harmful content at any position in a response.


URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering

http://arxiv.org/abs/2407.09120v1

Compressor summary: URRL-IMVC is a novel method for incomplete multi-view clustering that leverages multi-view information and neighboring samples to generate robust embeddings without relying on cross-view contrastive learning or missing view recovery.


Layer-Wise Relevance Propagation with Conservation Property for ResNet

http://arxiv.org/abs/2407.09115v1

Compressor summary: The paper presents a method to explain ResNet neural networks by extending Layer-wise Relevance Propagation with Relevance Splitting to handle skip connections, and shows its effectiveness on ImageNet and Caltech-UCSD Birds datasets.


Inference Optimization of Foundation Models on AI Accelerators

http://arxiv.org/abs/2407.09111v1

Compressor summary: The tutorial discusses optimization techniques for fast and efficient inference using AI accelerators with Transformer-based foundation models in various applications.


Enhancing Training Efficiency Using Packing with Flash Attention

http://arxiv.org/abs/2407.09105v1

Compressor summary: Packing and Flash Attention with proper masking improve LLM training efficiency and accuracy by combining multiple examples up to the maximum sequence length without wasting GPU resources or affecting attention computation.
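
The packing idea itself is generic: group examples so that each packed sequence stays within the maximum length, then mask attention so examples cannot attend across boundaries. A minimal first-fit-decreasing sketch of the grouping step (illustrative, not the paper's implementation):

```python
def pack_examples(lengths, max_seq_len):
    """Greedily pack example lengths into bins of at most max_seq_len tokens
    (first-fit decreasing). Returns a list of bins of example indices."""
    bins = []       # each bin: list of example indices packed together
    remaining = []  # free tokens left in the corresponding bin
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    for i in order:
        for b, free in enumerate(remaining):
            if lengths[i] <= free:      # first bin with room wins
                bins[b].append(i)
                remaining[b] -= lengths[i]
                break
        else:                           # no bin fits: open a new one
            bins.append([i])
            remaining.append(max_seq_len - lengths[i])
    return bins
```

In training, each packed sequence would then carry per-example position ids and a block-diagonal attention mask so the attention computation stays correct.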


DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents

http://arxiv.org/abs/2407.09103v1

Compressor summary: DANIEL is a fast, end-to-end architecture for handwritten document understanding that integrates language modeling, layout analysis, text recognition, and named entity recognition across multiple languages, layouts, and tasks.


STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs

http://arxiv.org/abs/2407.09096v1

Compressor summary: The paper introduces STD-LLM, a model that leverages LLMs for both spatial-temporal forecasting and imputation, using tokenizers, virtual nodes, node embeddings, and hypergraph learning to understand spatial-temporal data, and achieves strong results on various datasets and in few-shot/zero-shot settings.


On Exact Bit-level Reversible Transformers Without Changing Architectures

http://arxiv.org/abs/2407.09093v1

Compressor summary: The paper proposes bit-level reversible transformers by treating each block as an Euler integration approximation and using bidirectional integration approximation, which improves accuracy and data-throughput in training.


On the Role of Discrete Tokenization in Visual Representation Learning

http://arxiv.org/abs/2407.09087v1

Compressor summary: The paper explores the impact of discrete tokens in masked image modeling, proposes a new metric to measure their effectiveness, and introduces ClusterMIM, a method that outperforms existing approaches on various datasets and models.


Open Vocabulary Multi-Label Video Classification

http://arxiv.org/abs/2407.09073v1

Compressor summary: The authors propose a method to adapt a pre-trained vision-language model (VLM) for open vocabulary multilabel video classification by using large language models (LLMs) to provide semantic guidance and integrating temporal modeling into the VLM's encoder.


New Desiderata for Direct Preference Optimization

http://arxiv.org/abs/2407.09072v1

Compressor summary: The text discusses the challenges of using direct preference optimization (DPO) methods for fine-tuning language models based on human feedback, and proposes a new loss function to address these issues.


Spectral Self-supervised Feature Selection

http://arxiv.org/abs/2407.09061v1

Compressor summary: The paper presents a self-supervised graph-based method for unsupervised feature selection in high-dimensional data that derives pseudo-labels from graph Laplacian eigenvectors, is robust to outliers and complex substructures, and proves effective on real-world datasets, especially biological ones.


Domain-adaptive Video Deblurring via Test-time Blurring

http://arxiv.org/abs/2407.09059v1

Compressor summary: The paper proposes a domain adaptation scheme that uses a blurring model to generate training pairs for fine-tuning a video deblurring model in unseen domains, improving performance on real-world videos.


PersonificationNet: Making customized subject act like a person

http://arxiv.org/abs/2407.09057v1

Compressor summary: PersonificationNet is a model that can make a cartoon or toy act like a person by copying their pose and appearance from few images.


From MIDI to Rich Tablatures: an Automatic Generative System incorporating Lead Guitarists' Fingering and Stylistic choices

http://arxiv.org/abs/2407.09052v1

Compressor summary: The system generates tablatures for lead electric guitar from MIDI melodies by solving a multi-attribute optimization problem for the best fingering, incorporating common clichés, biomechanical feasibility, articulations, and expressive techniques, and converting the output to MusicXML for easy use.


KUNPENG: An Embodied Large Model for Intelligent Maritime

http://arxiv.org/abs/2407.09048v1

Compressor summary: KUNPENG is an embodied large model for intelligent maritime, a pillar of smart ocean construction, that perceives heterogeneous data, makes decisions, and optimizes power so vessels navigate safely and efficiently in complex, dynamic maritime environments.


Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation

http://arxiv.org/abs/2407.09047v1

Compressor summary: The paper proposes a method for incremental semantic segmentation that balances learning from old and new classes using prototype-guided techniques and weight-guided consolidation.


Molecule Language Model with Augmented Pairs and Expertise Transfer

http://arxiv.org/abs/2407.09043v1

Compressor summary: AMOLE is a model that enhances the understanding of molecules and their texts by preserving structural similarity and transferring expertise.


Overcoming Catastrophic Forgetting in Tabular Data Classification: A Pseudorehearsal-based approach

http://arxiv.org/abs/2407.09039v1

Compressor summary: TRIL3 is a new method for continuous learning in tabular data classification that uses synthetic data to prevent forgetting and achieves superior performance with minimal synthetic data.


Textual Query-Driven Mask Transformer for Domain Generalized Segmentation

http://arxiv.org/abs/2407.09033v1

Compressor summary: The paper proposes a text-based semantic segmentation method that leverages vision-language models and transformers to improve generalization across domains, achieving state-of-the-art results on GTA5→Cityscapes.


HPC: Hierarchical Progressive Coding Framework for Volumetric Video

http://arxiv.org/abs/2407.09026v1

Compressor summary: The paper proposes HPC, a framework for compressing volumetric videos based on NeRF that enables variable bitrate and quality using a single model, reducing temporal redundancy and optimizing compression with multi-rate-distortion loss function.


SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

http://arxiv.org/abs/2407.09025v1

Compressor summary: SpreadsheetLLM introduces an efficient encoding framework for large language models to effectively understand and reason with spreadsheets using SheetCompressor and Chain of Spreadsheet.


Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control

http://arxiv.org/abs/2407.09024v1

Compressor summary: The paper proposes a two-stage optimization method for offline Reinforcement Learning using language model alignment, introduces Efficient Diffusion Alignment (EDA) for continuous control problems, and shows its superior performance on the D4RL benchmark.


3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection

http://arxiv.org/abs/2407.09020v1

Compressor summary: This paper proposes a multimodal and multi-teacher approach to improve mental health classification using social media data, which integrates diverse features like text and sound, and distributes learning across specialized teachers.


AI-Driven Guided Response for Security Operation Centers with Microsoft Copilot for Security

http://arxiv.org/abs/2407.09017v1

Compressor summary: Copilot Guided Response (CGR) is an ML system that helps security analysts investigate, triage, and remediate security incidents by providing historical context, determining the nature of the incident, and suggesting containment actions.


CompAct: Compressing Retrieved Documents Actively for Question Answering

http://arxiv.org/abs/2407.09014v1

Compressor summary: CompAct is a framework that compresses extensive documents for language models to improve multi-hop question-answering without losing key information.


Procedural Content Generation via Generative Artificial Intelligence

http://arxiv.org/abs/2407.09013v1

Compressor summary: This paper surveys how generative AI is used for creating various types of content in PCG and discusses the challenges of limited domain-specific training data.


TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

http://arxiv.org/abs/2407.09012v1

Compressor summary: TCAN is a pose-driven human image animation method that uses ControlNet and LoRA to create realistic videos that are robust to erroneous poses and temporally consistent, while also allowing for a more static background.


One Stone, Four Birds: A Comprehensive Solution for QA System Using Supervised Contrastive Learning

http://arxiv.org/abs/2407.09011v1

Compressor summary: The paper proposes a method to improve question answering systems by using supervised contrastive learning, which enhances robustness and efficiency in handling various user inputs.
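Supervised contrastive learning, the training signal the summary refers to, pulls together embeddings that share a label and pushes apart the rest. A generic SupCon-style loss can be sketched as below; this is an illustration of the loss family, not the paper's exact formulation:

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss over a batch of embeddings: for each anchor,
    same-label examples are positives, all others are negatives."""
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    logits = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    not_self = ~np.eye(n, dtype=bool)
    denom = (exp * not_self).sum(axis=1)
    total, count = 0.0, 0
    for i in range(n):
        pos = (labels == labels[i]) & not_self[i]
        if not pos.any():
            continue  # anchors without positives contribute nothing
        log_prob = logits[i, pos] - np.log(denom[i])
        total += -log_prob.mean()
        count += 1
    return total / max(count, 1)
```

Batches whose same-label examples already cluster together incur a lower loss than batches where labels are mixed across clusters.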


Benchmarking Language Model Creativity: A Case Study on Code Generation

http://arxiv.org/abs/2407.09007v1

Compressor summary: The paper introduces a framework to measure the creativity of large language models using two characteristics: convergent and divergent thinking, and applies it to Codeforces problems.


Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset

http://arxiv.org/abs/2407.09005v1

Compressor summary: The paper proposes a new AI model for recognizing objects in maritime scenes, introduces a benchmark dataset, and develops a performance evaluation method to standardize and improve autonomous navigation systems.


Enhancing Few-Shot Stock Trend Prediction with Large Language Models

http://arxiv.org/abs/2407.09003v1

Compressor summary: The authors propose a 'denoising-then-voting' method that uses large language models to predict stock trends from individual news items rather than merged news, overcoming noise and input-length limits and achieving performance comparable to supervised methods.


Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs

http://arxiv.org/abs/2407.08995v1

Compressor summary: The study proposes a method for LLMs to generate their own role-play prompts through fine-tuning, improving their performance in various domains and automating complex prompting strategies.


Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

http://arxiv.org/abs/2407.08994v1

Compressor summary: The paper introduces GAD, a network with CPT and DKFF modules that improves point cloud classification and segmentation by using global attention and dual-domain feature learning to address the input-embedding and neighbor-aggregation weaknesses of point-based models; experiments show its superior performance on various tasks.


Task-driven single-image super-resolution reconstruction of document scans

http://arxiv.org/abs/2407.08993v1

Compressor summary: The paper explores using deep learning for super-resolution to improve optical character recognition from document scans, introducing a multi-task loss function to address the ill-posedness of the problem.


Emotion Talk: Emotional Support via Audio Messages for Psychological Assistance

http://arxiv.org/abs/2407.08992v1

Compressor summary: Emotion Talk is a system that provides emotional support through audio messages in Portuguese, analyzing and responding to users' emotions outside therapy sessions.


Robustness of LLMs to Perturbations in Text

http://arxiv.org/abs/2407.08989v1

Compressor summary: This study evaluates large language models' robustness against morphological variations in text and finds that they perform better than previous pre-trained models on real-world benchmarks like GEC and LSC.


Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models

http://arxiv.org/abs/2407.08978v1

Compressor summary: The text describes a new dataset of Chinese-English literature with complex discourse structures, proposes chapter-to-chapter translation as a pragmatic context-aware translation task, and explores the performance of machine translation models and large language models on this setting.


Integrating White and Black Box Techniques for Interpretable Machine Learning

http://arxiv.org/abs/2407.08973v1

Compressor summary: The paper proposes an ensembling method that uses transparent and opaque models for easy and hard inputs respectively to balance interpretability and performance.


Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness

http://arxiv.org/abs/2407.08972v1

Compressor summary: The paper evaluates the robustness of large kernel convolutional neural networks (CNNs) and compares their performance to transformers (ViTs) on six benchmark datasets, revealing novel insights into their properties.


Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization

http://arxiv.org/abs/2407.08971v1

Compressor summary: FuSTAL improves pseudo label quality for action localization in untrimmed videos using cross-video contrastive learning, prior-based filtering, and EMA-based distillation, achieving state-of-the-art results.


SlideGCD: Slide-based Graph Collaborative Training with Knowledge Distillation for Whole Slide Image Classification

http://arxiv.org/abs/2407.08968v1

Compressor summary: SlideGCD is a WSI analysis pipeline that uses slide-based graph construction and graph learning to explore inter-correlations between slides for cancer diagnostics, improving the performance of existing multi-instance learning methods on TCGA datasets.


Empowering Few-Shot Relation Extraction with The Integration of Traditional RE Methods and Large Language Models

http://arxiv.org/abs/2407.08967v1

Compressor summary: The text proposes a Dual-System Augmented Relation Extractor (DSARE) that combines traditional RE models with LLMs to address their limitations in few-shot relation extraction tasks.


LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models

http://arxiv.org/abs/2407.08966v1

Compressor summary: LAPT is a novel approach for OOD detection that automatically generates prompts with class names and negative labels, improving reliability and reducing manual intervention.


Lite-SAM Is Actually What You Need for Segment Everything

http://arxiv.org/abs/2407.08965v1

Compressor summary: Lite-SAM is an efficient end-to-end solution for SegEvery with a streamlined CNN-Transformer hybrid encoder, an automated prompt proposal network, and a mask decoder that achieves state-of-the-art performance while reducing computational costs.


Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control

http://arxiv.org/abs/2407.08964v1

Compressor summary: CA-RL improves traffic efficiency and safety by using communication-aware Reinforcement Learning to optimize CACC in Connected and Autonomous Vehicles.


Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification

http://arxiv.org/abs/2407.08959v1

Compressor summary: The paper proposes HierICRF, a method that improves few-shot hierarchical text classification by adapting unstructured semantic spaces in pre-trained language models to the downstream domain hierarchy using iterative language modeling and hierarchical consistency correction.


Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection

http://arxiv.org/abs/2407.08952v1

Compressor summary: DAFND is a model that improves fake news detection by enhancing large language models with a Detection, Investigation, Judge, and Determination module that utilizes both inside and outside information.


Exploring Richer and More Accurate Information via Frequency Selection for Image Restoration

http://arxiv.org/abs/2407.08950v1

Compressor summary: MSFSNet is a network that improves image restoration by selectively recovering information using spatial and frequency domain knowledge, dynamic filter selection, and skip feature fusion.


One-Shot Pose-Driving Face Animation Platform

http://arxiv.org/abs/2407.08949v1

Compressor summary: The authors propose a method to generate dynamic and expressive talking head videos from a single face image, improving the existing Image2Video model with a Face Locator and Motion Frame mechanism, and provide a demo platform for easy use.


Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort

http://arxiv.org/abs/2407.08947v1

Compressor summary: The paper presents a framework that leverages foundation models to build interpretable concept-based models (CBMs) with minimal human effort, reducing biases and vulnerability to spurious correlations, and demonstrates its effectiveness on multiple datasets.


Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training

http://arxiv.org/abs/2407.08946v1

Compressor summary: The paper proposes a new self-supervised training objective for diffusion models to improve denoising in regions outside the standard training distribution, leading to better sampling quality and performance.


Bora: Biomedical Generalist Video Generation Model

http://arxiv.org/abs/2407.08944v1

Compressor summary: Bora is a new spatio-temporal diffusion probabilistic model that can generate realistic and diverse biomedical videos from text prompts, potentially improving medical education, decision-making, and training.


Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation

http://arxiv.org/abs/2407.08940v1

Compressor summary: The paper evaluates large language models' ability to generate novel hypotheses from biomedical literature, using various settings and metrics, and finds that they can produce valid and diverse hypotheses with some limitations.


LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

http://arxiv.org/abs/2407.08939v1

Compressor summary: The paper introduces LightenDiffusion, a diffusion-based unsupervised framework that enhances low-light images using Retinex theory and physically explainable diffusion models, improving visual quality over existing methods.


Self-Evolving GPT: A Lifelong Autonomous Experiential Learner

http://arxiv.org/abs/2407.08937v1

Compressor summary: The text describes a framework that enables large language models to learn from experience and improve their performance on various tasks, demonstrating the feasibility of using LLMs to mimic human experiential learning.


Compositional Structures in Neural Embedding and Interaction Decompositions

http://arxiv.org/abs/2407.08934v1

Compressor summary: The text explores how vector embeddings in neural networks relate to conditional independence constraints and helps explain structural patterns in data representations.


Machine Learning in High Volume Media Manufacturing

http://arxiv.org/abs/2407.08933v1

Compressor summary: The authors develop a novel program that combines rule-based and machine learning approaches to identify and adapt to errors or failures in high-volume manufacturing environments.


Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Vehicle Decision-Making in Dynamic Environment

http://arxiv.org/abs/2407.08932v1

Compressor summary: The paper proposes a simple framework for autonomous vehicles to make decisions in urban environments by incorporating the significance of surrounding vehicles and contextual information using deep attention, reinforcement learning, and an encoder.


Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

http://arxiv.org/abs/2407.08931v1

Compressor summary: The paper introduces GLIS, a lidar-based open-vocabulary detection (OVD) method that combines object-level and scene-level features from lidar point clouds with LLM inference, refined by RPLG and BAOL; experiments on two datasets show its effectiveness.


Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures?

http://arxiv.org/abs/2407.08922v1

Compressor summary: This paper introduces a benchmark and a novel evaluation metric (c-score) to assess whether large language models understand the underlying physicochemical mechanisms of gold nanoparticle synthesis, suggesting their potential for advancing scientific discovery.


Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective

http://arxiv.org/abs/2407.08918v1

Compressor summary: This paper proposes a new framework using complex networks to analyze and improve knowledge transfer in evolutionary many-task optimization (EMaTO).


Transforming Movie Recommendations with Advanced Machine Learning: A Study of NMF, SVD, and K-Means Clustering

http://arxiv.org/abs/2407.08916v1

Compressor summary: The study creates a movie recommendation system using machine learning methods like NMF, SVD, and K-Means clustering to improve user experience.
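The matrix-factorization half of such a system can be sketched in a few lines: reconstruct a low-rank approximation of the user-item rating matrix via truncated SVD and read predictions off it. This is a minimal illustration of the technique (missing-rating handling, NMF, and K-Means clustering omitted), not the study's implementation:

```python
import numpy as np

def svd_predict(ratings, k=2):
    """Rank-k SVD reconstruction of a user-item rating matrix;
    the reconstructed entries serve as predicted ratings."""
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

If users' tastes are well described by k latent factors, the rank-k reconstruction recovers the rating matrix closely, and its entries for unseen user-item pairs act as recommendations.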


PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization

http://arxiv.org/abs/2407.08910v1

Compressor summary: PAIL is a novel method that uses imitation learning and adversarial training to optimize industrial operations for carbon neutrality without predefined rewards.


KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting

http://arxiv.org/abs/2407.08909v1

Compressor summary: KGpose is a novel framework that uses keypoint graphs to estimate 6D poses of multiple objects from RGB and point cloud features, achieving competitive results for robotic applications.


Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval

http://arxiv.org/abs/2407.08908v1

Compressor summary: CHAIR is a human-in-the-loop concept bottleneck model for image retrieval that lets humans correct intermediate concepts to improve embedding generation, outperforming similar models even without intervention and benefiting further from human guidance.


AirSketch: Generative Motion to Sketch

http://arxiv.org/abs/2407.08906v1

Compressor summary: AirSketch is a method for generating sketches from hand motions without needing expensive hardware or markers, using controllable image diffusion models.


Application of Artificial Intelligence in Supporting Healthcare Professionals and Caregivers in Treatment of Autistic Children

http://arxiv.org/abs/2407.08902v1

Compressor summary: The paper presents an AI algorithm that can accurately detect Autism Spectrum Disorder by analyzing facial and bodily expressions of children during daily activities.


IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents

http://arxiv.org/abs/2407.08898v1

Compressor summary: The paper introduces a toolkit for creating interactive AI agents that understand natural language and execute grounded instructions in a Minecraft-like environment, as well as an evaluation platform with human annotators to assess their performance.