arxiv compressed, 2024-08-16

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-16 generated by the compressor, my personal LLM-based project.


Can Large Language Models Understand Symbolic Graphics Programs?

http://arxiv.org/abs/2408.08313v1

Compressor summary: The authors create a benchmark to test large language models' understanding of symbolic graphics programs, which generate visual content, and find that existing models struggle with this task but finetuning improves their performance.


ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws

http://arxiv.org/abs/2408.08310v1

Compressor summary: ScalingFilter is a new method to evaluate text quality for pre-training large language models by comparing perplexities of two models on the same data, without relying on a reference dataset, and it improves zero-shot performance while preserving semantic diversity.
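
As a rough illustration of the idea (not the paper's exact recipe): score each document by how much a larger model's perplexity improves on a smaller model's, and keep high-scoring documents. The model pair and the threshold below are my assumptions.

    # Hedged sketch of perplexity-gap data filtering; gpt2/gpt2-medium and
    # the keep-threshold are illustrative stand-ins, not the paper's setup.
    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(model, tok, text):
        enc = tok(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            loss = model(**enc, labels=enc["input_ids"]).loss
        return math.exp(loss.item())

    tok = AutoTokenizer.from_pretrained("gpt2")
    small = AutoModelForCausalLM.from_pretrained("gpt2")          # fewer params
    large = AutoModelForCausalLM.from_pretrained("gpt2-medium")   # more params

    def quality_score(text):
        # Scaling laws suggest a larger model fits "learnable" text better,
        # so the perplexity ratio between the two models proxies quality.
        return perplexity(small, tok, text) / perplexity(large, tok, text)

    docs = ["The mitochondrion is the powerhouse of the cell.", "asdf qwer zxcv"]
    kept = [d for d in docs if quality_score(d) > 1.1]   # threshold is illustrative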


Understanding the Local Geometry of Generative Model Manifolds

http://arxiv.org/abs/2408.08307v1

Compressor summary: The paper investigates how local geometric properties of generative models' manifolds relate to generation quality and proposes using reward models trained on these properties to control generation outcomes.


Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors

http://arxiv.org/abs/2408.08302v1

Compressor summary: The paper evaluates large language models' abilities to solve transportation engineering problems using the new benchmark dataset TransportBench, revealing each model's strengths and limitations.


SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training

http://arxiv.org/abs/2408.08295v1

Compressor summary: SLCA++ is a framework that improves continual learning with pre-training by slowing down the learning rate and aligning classification layers, achieving state-of-the-art results on image classification benchmarks.


The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community

http://arxiv.org/abs/2408.08291v1

Compressor summary: ShareLM is a collection of human-model conversations, contributed via a browser-extension plugin, that aims to benefit open-source language models.



Absence of Closed-Form Descriptions for Gradient Flow in Two-Layer Narrow Networks

http://arxiv.org/abs/2408.08286v1

Compressor summary: The paper shows that the training dynamics of two-layer neural networks are non-integrable, meaning they are complex and hard to predict, which implies the need for numerical methods in optimizing neural networks.


BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

http://arxiv.org/abs/2408.08274v1

Compressor summary: BAM is a method for improving large language models by using experts' attention parameters as well as their feed-forward networks to initialize Mixture of Experts (MoE) layers, achieving better performance than existing methods.
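
To make the "attention parameters as well as feed-forward networks" part concrete, here is a minimal sketch of upcycling dense blocks into an MoE layer, copying both attention and FFN weights per expert; the toy block and module names are my assumptions, not BAM's code.

    import copy
    import torch.nn as nn

    class DenseBlock(nn.Module):          # toy stand-in transformer block
        def __init__(self, d=64):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(),
                                     nn.Linear(4 * d, d))

    def upcycle(specialists, d=64):
        # Each seed model contributes its attention AND its FFN as one expert;
        # only the router starts from scratch.
        return nn.ModuleDict({
            "attn_experts": nn.ModuleList(copy.deepcopy(m.attn) for m in specialists),
            "ffn_experts": nn.ModuleList(copy.deepcopy(m.ffn) for m in specialists),
            "router": nn.Linear(d, len(specialists)),
        })

    moe = upcycle([DenseBlock(), DenseBlock()])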


HeightLane: BEV Heightmap guided 3D Lane Detection

http://arxiv.org/abs/2408.08270v1

Compressor summary: HeightLane is a novel method for 3D lane detection from monocular images that uses a height map to improve spatial accuracy and recognition.


mhGPT: A Lightweight Generative Pre-Trained Transformer for Mental Health Text Analysis

http://arxiv.org/abs/2408.08261v1

Compressor summary: mhGPT is a small but powerful AI model that helps with mental health tasks using social media and PubMed data, and it works well even on low-end devices.


GSVD-NMF: Recovering Missing Features in Non-negative Matrix Factorization

http://arxiv.org/abs/2408.08260v1

Compressor summary: GSVD-NMF is a method to improve non-negative matrix factorization by recovering missing components using generalized singular value decomposition.


Snuffy: Efficient Whole Slide Image Classifier

http://arxiv.org/abs/2408.08258v1

Compressor summary: Snuffy is a sparse-transformer-based multiple instance learning (MIL) pooling method for whole slide image (WSI) classification in digital pathology that requires limited pre-training and achieves superior performance on two datasets.


Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding

http://arxiv.org/abs/2408.08252v1

Compressor summary: Our method improves naturalness while optimizing downstream rewards in diffusion models without differentiable proxies or fine-tuning, using soft value functions and iterative sampling.


Computer Vision Model Compression Techniques for Embedded Systems: A Survey

http://arxiv.org/abs/2408.08250v1

Compressor summary: The paper reviews model compression techniques for computer vision tasks to enable the use of large deep neural networks in embedded systems, discusses different approaches and their performance variations on various devices, and provides code and case studies.


Conformalized Answer Set Prediction for Knowledge Graph Embedding

http://arxiv.org/abs/2408.08248v1

Compressor summary: The paper proposes using conformal prediction to generate answer sets with probabilistic guarantees for link prediction tasks in knowledge graph embeddings, addressing the issue of distinguishing plausible from implausible answers.
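
A minimal sketch of the split-conformal construction behind such answer sets, assuming we already have KGE plausibility scores (the scores below are random stand-ins):

    import numpy as np

    rng = np.random.default_rng(0)
    # Calibration: score the *true* tail entity of each held-out triple.
    cal_scores = rng.normal(2.0, 1.0, size=1000)    # stand-in KGE scores
    alpha = 0.1                                      # target miscoverage
    # Nonconformity = negative score; take the adjusted (1 - alpha) quantile.
    q = np.quantile(-cal_scores, np.ceil((1000 + 1) * (1 - alpha)) / 1000)

    def answer_set(candidate_scores):
        # Keep every entity plausible enough to preserve ~90% coverage
        # of true answers on average.
        return np.where(-candidate_scores <= q)[0]

    print(answer_set(rng.normal(1.5, 1.0, size=50)))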


Explaining an Agent's Future Beliefs through Temporally Decomposing Future Reward Estimators

http://arxiv.org/abs/2408.08230v1

Compressor summary: Temporal Reward Decomposition (TRD) is a method to predict and explain the future rewards of reinforcement learning agents, such as their expected reward timing, value, confidence, and input feature importance.
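
A hedged sketch of the core idea: predict a vector of expected rewards per future timestep instead of a single scalar, recovering Q as their discounted sum. The network shape and horizon are illustrative assumptions.

    import torch
    import torch.nn as nn

    class TRDHead(nn.Module):
        def __init__(self, obs_dim, n_actions, horizon, gamma=0.99):
            super().__init__()
            self.n_actions, self.horizon, self.gamma = n_actions, horizon, gamma
            self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_actions * horizon))

        def forward(self, obs):
            # (batch, n_actions, horizon): expected reward at each future step,
            # which is what makes timing/value explanations possible.
            return self.net(obs).view(-1, self.n_actions, self.horizon)

        def q_values(self, obs):
            w = self.gamma ** torch.arange(self.horizon, dtype=torch.float32)
            return (self.forward(obs) * w).sum(dim=-1)   # usual scalar Q

    head = TRDHead(obs_dim=8, n_actions=4, horizon=10)
    print(head.q_values(torch.randn(2, 8)).shape)        # torch.Size([2, 4])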


Predictive Multiplicity of Knowledge Graph Embeddings in Link Prediction

http://arxiv.org/abs/2408.08226v1

Compressor summary: The paper studies predictive multiplicity in link prediction for knowledge graphs and proposes using voting methods to reduce conflicting predictions.


Enhancing Sharpness-Aware Minimization by Learning Perturbation Radius

http://arxiv.org/abs/2408.08222v1

Compressor summary: The paper proposes a bilevel optimization framework called LETS to learn the perturbation radius for sharpness-aware minimization algorithms, which improves model generalization by searching for flat minima in the loss landscape.
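
For context, here is the vanilla SAM step whose radius rho LETS turns into a learned quantity; the bilevel update of rho on a validation loss is only indicated in a comment, since implementing it faithfully is beyond a sketch.

    import torch

    def sam_step(model, loss_fn, x, y, opt, rho):
        loss_fn(model(x), y).backward()               # gradient at current weights
        params = [p for p in model.parameters() if p.grad is not None]
        with torch.no_grad():
            norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
            eps = [rho * p.grad / (norm + 1e-12) for p in params]
            for p, e in zip(params, eps):
                p.add_(e)                             # climb to local worst case
        opt.zero_grad()
        loss_fn(model(x), y).backward()               # gradient at perturbed point
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)                             # restore original weights
        opt.step()
        opt.zero_grad()
        # LETS would additionally differentiate a validation loss w.r.t. rho
        # (bilevel step) to adapt the radius during training.

    model = torch.nn.Linear(10, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    sam_step(model, torch.nn.functional.cross_entropy,
             torch.randn(16, 10), torch.randint(0, 2, (16,)), opt, rho=0.05)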


RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science

http://arxiv.org/abs/2408.08217v1

Compressor summary: This study proposes a systems design approach that uses large language models as imperfect data annotators to train and deploy edge classifiers, improving classification performance in computational social science applications.


The Dawn of KAN in Image-to-Image (I2I) Translation: Integrating Kolmogorov-Arnold Networks with GANs for Unpaired I2I Translation

http://arxiv.org/abs/2408.08216v1

Compressor summary: The study proposes using Kolmogorov-Arnold Network (KAN) instead of Multi-layer Perceptron (MLP) to improve image-to-image translation in generative AI.


Moving Healthcare AI-Support Systems for Visually Detectable Diseases onto Constrained Devices

http://arxiv.org/abs/2408.08215v1

Compressor summary: TinyML is explored as a solution for healthcare support in low connectivity areas by using low-spec devices to classify skin diseases from images without internet access.


Covert Bias: The Severity of Social Views' Unalignment Towards Implicit and Explicit Opinion

http://arxiv.org/abs/2408.08212v1

Compressor summary: The study investigates how implicit language affects bias amplification in large language models and finds that biased models generate more cautious responses when aligned with conflicting opinions but are less reliable on socially nuanced topics.


Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models

http://arxiv.org/abs/2408.08210v1

Compressor summary: The paper proposes a framework to assess how well large language models can reason using probability of necessity and sufficiency concepts, and demonstrates it with math examples.


WaterSplatting: Fast Underwater 3D Scene Reconstruction Using Gaussian Splatting

http://arxiv.org/abs/2408.08206v1

Compressor summary: The proposed method fuses volumetric rendering with 3D Gaussian Splatting to effectively reconstruct underwater scenes, outperforming state-of-the-art NeRF-based methods in quality and efficiency.


Heavy Labels Out! Dataset Distillation with Label Space Lightening

http://arxiv.org/abs/2408.08201v1

Compressor summary: HeLlO is a framework that generates synthetic labels from images, reducing the storage and data requirements for dataset distillation while maintaining performance.


Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

http://arxiv.org/abs/2408.08192v1

Compressor summary: The paper proposes a novel online learning method called SemiSGD for large-population multi-agent systems, which represents the value function and the population distribution with a single set of parameters, and shows its theoretical advantages over traditional fixed-point iteration methods.


Beyond Full Label: Single-Point Prompt for Infrared Small Target Label Generation

http://arxiv.org/abs/2408.08191v1

Compressor summary: The paper proposes a learning-based method to generate infrared small target labels using a single-point annotation paradigm that improves detection and reduces false alarms.


FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

http://arxiv.org/abs/2408.08189v1

Compressor summary: FancyVideo is a text-to-video generator that improves temporal coherence in video synthesis using a Cross-frame Textual Guidance Module with three components.


Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion

http://arxiv.org/abs/2408.08184v1

Compressor summary: The paper proposes a method to measure a text-to-image model's originality based on the number of tokens needed to reconstruct an image, inspired by legal definitions of originality.


Machine learning empowered Modulation detection for OFDM-based signals

http://arxiv.org/abs/2408.08179v1

Compressor summary: The text proposes a blind modulation detection method for OFDM-based technologies using a ResNet network that can handle realistic environmental imperfections without prior knowledge of the transmitted signal.


Towards flexible perception with visual memory

http://arxiv.org/abs/2408.08172v1

Compressor summary: The authors introduce a flexible and interpretable image classification method that stores and searches image embeddings in a database rather than relying solely on a neural network, allowing data to be added, removed (unlearned), and intervened on, and argue this could improve how knowledge is represented in deep vision models.
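
A toy sketch of such a retrieval-based classifier, with the three advantages visible in code: add() for flexibility, remove() for unlearning, and the inspectable neighbor votes for interpretability. The embedding function is assumed to exist upstream.

    import numpy as np

    class VisualMemory:
        def __init__(self, dim):
            self.vecs = np.empty((0, dim), dtype=np.float32)
            self.labels = []

        def add(self, vec, label):            # flexibility: grow anytime
            self.vecs = np.vstack([self.vecs, vec[None]])
            self.labels.append(label)

        def remove(self, index):              # unlearning: drop one sample
            self.vecs = np.delete(self.vecs, index, axis=0)
            del self.labels[index]

        def classify(self, query, k=5):
            sims = self.vecs @ query / (np.linalg.norm(self.vecs, axis=1)
                                        * np.linalg.norm(query) + 1e-8)
            top = np.argsort(-sims)[:k]       # interpretability: inspect these
            votes = [self.labels[i] for i in top]
            return max(set(votes), key=votes.count)

    mem = VisualMemory(dim=4)
    mem.add(np.ones(4, dtype=np.float32), "cat")
    mem.add(-np.ones(4, dtype=np.float32), "dog")
    print(mem.classify(np.ones(4, dtype=np.float32), k=1))   # "cat"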


DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

http://arxiv.org/abs/2408.08152v1

Compressor summary: DeepSeek-Prover-V1.5 is an open-source language model that improves theorem proving in Lean 4 by optimizing training, inference, and exploration strategies, achieving state-of-the-art results on miniF2F and ProofNet benchmarks.


Winning Snake: Design Choices in Multi-Shot ASP

http://arxiv.org/abs/2408.08150v1

Compressor summary: Answer set programming techniques are applied to solve the arcade game Snake, with five implementations compared and visualized using clingraph.


Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks

http://arxiv.org/abs/2408.08149v1

Compressor summary: VaT is an unsupervised learning method that bridges restoration and high-level vision networks without retraining them, enhancing image quality and performance in degraded environments.


KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning

http://arxiv.org/abs/2408.08146v1

Compressor summary: KOALA improves speculative decoding by optimizing the draft head with multi-layer architecture and adversarial learning, achieving significant latency reduction.
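
For background, this is the generic speculative-decoding loop that a draft head plugs into; KOALA changes how the draft head is built and trained, not this accept/reject rule. The two toy distributions stand in for real models.

    import numpy as np

    rng = np.random.default_rng(0)
    V = 8  # toy vocabulary

    def draft_dist(ctx):  return np.full(V, 1.0 / V)
    def target_dist(ctx): return np.linspace(1, 2, V) / np.linspace(1, 2, V).sum()

    def speculative_step(ctx, k=4):
        out = []
        for _ in range(k):
            q, p = draft_dist(ctx), target_dist(ctx)
            t = rng.choice(V, p=q)                       # draft proposes
            if rng.random() < min(1.0, p[t] / q[t]):     # target verifies
                out.append(t); ctx = ctx + [t]
            else:
                resid = np.maximum(p - q, 0.0); resid /= resid.sum()
                out.append(rng.choice(V, p=resid))       # corrective resample
                break
        return out

    print(speculative_step([], k=4))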


Model-based Workflow for the Automated Generation of PDDL Descriptions

http://arxiv.org/abs/2408.08145v1

Compressor summary: The paper proposes an automatic way to generate PDDL descriptions for planning from integrated system and product models using model-based systems engineering (MBSE).


MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU

http://arxiv.org/abs/2408.08144v1

Compressor summary: The paper introduces MIDAS, a novel approach using multi-level knowledge distillation to improve multi-turn Natural Language Understanding (NLU) in conversations.


Impact of Comprehensive Data Preprocessing on Predictive Modelling of COVID-19 Mortality

http://arxiv.org/abs/2408.08142v1

Compressor summary: The study evaluates a custom data preprocessing pipeline that improves the accuracy of machine learning models predicting COVID-19 mortality using OWID data.


Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability

http://arxiv.org/abs/2408.08137v1

Compressor summary: The text discusses the limitations of AOPC for evaluating feature attribution faithfulness in deep neural networks, proposing Normalized AOPC (NAOPC) as a more reliable and interpretable alternative.
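
A small sketch of AOPC and a per-example min-max normalization in its spirit; the toy model, the masking scheme, and the use of random orderings as bounds are illustrative assumptions.

    import numpy as np

    def aopc(predict, x, order, mask_value=0.0):
        base, drops, xm = predict(x), [], x.copy()
        for i in order:                     # remove features, most relevant first
            xm[i] = mask_value
            drops.append(base - predict(xm))
        return float(np.mean(drops))

    def naopc(predict, x, attribution_order, candidate_orders):
        scores = [aopc(predict, x, o) for o in candidate_orders]
        lo, hi = min(scores), max(scores)   # per-example bounds
        a = aopc(predict, x, attribution_order)
        return (a - lo) / (hi - lo + 1e-8)  # comparable across examples

    predict = lambda v: float(1.0 / (1.0 + np.exp(-v.sum())))   # toy model
    x = np.array([2.0, -1.0, 0.5, 0.1])
    orders = [np.random.permutation(4) for _ in range(20)]
    print(naopc(predict, x, np.argsort(-np.abs(x)), orders))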


CorrAdaptor: Adaptive Local Context Learning for Correspondence Pruning

http://arxiv.org/abs/2408.08134v1

Compressor summary: CorrAdaptor is a novel architecture for pixel-level correspondence pruning in computer vision and robotics that learns local contexts with two branches, an explicit KNN-based one and an implicit one based on a learnable matrix, plus a motion injection module for robustness and adaptability, outperforming previous methods on various tasks.


EXPLAIN, AGREE, LEARN: Scaling Learning for Neural Probabilistic Logic

http://arxiv.org/abs/2408.08133v1

Compressor summary: The paper proposes EXAL, a sampling-based learning method for neural probabilistic logic systems (which combine neural networks with probabilistic logic) that explains, agrees, and learns from explanations; its approximation error vanishes with more and more diverse samples, and it scales to larger problems while outperforming previous methods.


Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification

http://arxiv.org/abs/2408.08125v1

Compressor summary: CPRFL is a novel approach for long-tailed multi-label image classification that leverages semantic correlations between categories and refines prompts with contextual visual information to improve recognition performance.


The Unreasonable Effectiveness of Solving Inverse Problems with Neural Networks

http://arxiv.org/abs/2408.08119v1

Compressor summary: Neural networks trained on inverse problems can achieve higher accuracy than classical optimizers even on their training data, challenging the assumption that faster inference sacrifices solution quality.


Hearing Your Blood Sugar: Non-Invasive Glucose Measurement Through Simple Vocal Signals, Transforming any Speech into a Sensor with Machine Learning

http://arxiv.org/abs/2408.08109v1

Compressor summary: The study proposes using voice analysis as a non-invasive way to monitor blood glucose levels in people with diabetes, potentially improving their quality of life.


Unsupervised Part Discovery via Dual Representation Alignment

http://arxiv.org/abs/2408.08108v1

Compressor summary: The paper proposes a novel method called PartFormer to learn unsupervised part-specific attention from paired images using geometric and semantic constraints, improving downstream tasks and part discovery performance.


Adaptation of uncertainty-penalized Bayesian information criterion for parametric partial differential equation discovery

http://arxiv.org/abs/2408.08106v1

Compressor summary: The paper introduces an extension of the uncertainty-penalized Bayesian information criterion (UBIC) for efficiently discovering parametric partial differential equations (PDEs) in noisy situations, using data transformation based on power spectral densities and confidence intervals.


Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

http://arxiv.org/abs/2408.08105v1

Compressor summary: The paper introduces MuCR, a new benchmark to test VLLMs' ability to infer cause-and-effect relationships from visual cues using image synthesis and tailored metrics.


When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding

http://arxiv.org/abs/2408.08093v1

Compressor summary: The paper introduces a new cross-modality approach to video compression using multimodal language models that separates videos into spatial content and motion components, optimizing quality for specific decoding requirements.


OC3D: Weakly Supervised Outdoor 3D Object Detection with Only Coarse Click Annotation

http://arxiv.org/abs/2408.08092v1

Compressor summary: OC3D is a weakly supervised method for LiDAR-based 3D object detection that uses coarse clicks and achieves state-of-the-art performance with minimal annotation cost.


HAIR: Hypernetworks-based All-in-One Image Restoration

http://arxiv.org/abs/2408.08091v1

Compressor summary: HAIR is a plug-and-play method that generates parameters for image restoration models based on the contents of input images, improving their performance on various tasks.


AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents

http://arxiv.org/abs/2408.08089v1

Compressor summary: The paper introduces AgentCourt, a simulation system that uses large language models to train lawyer agents in legal skills through adversarial evolutionary processes and courtroom simulations.


ColorMamba: Towards High-quality NIR-to-RGB Spectral Translation with Mamba

http://arxiv.org/abs/2408.08087v1

Compressor summary: ColorMamba is a novel model that uses Mamba, improved padding tokens, local convolutional enhancement, agent attention, and HSV color guidance to achieve better spectral translation in the visible spectrum.


Single-image coherent reconstruction of objects and humans

http://arxiv.org/abs/2408.08086v1

Compressor summary: The paper presents a method to reconstruct 3D scenes with interacting objects and people from a single image, reducing mesh collisions and improving performance.


An Efficient Replay for Class-Incremental Learning with Pre-trained Models

http://arxiv.org/abs/2408.08084v1

Compressor summary: The paper proposes a new method to overcome catastrophic forgetting in class-incremental learning using pre-trained models, replay, and simple gradient constraints.


Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining

http://arxiv.org/abs/2408.08078v1

Compressor summary: The paper proposes a novel framework, CTMA, that incorporates motion cues and spatial features for remote sensing change detection using bi-temporal images.


Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players

http://arxiv.org/abs/2408.08075v1

Compressor summary: The paper studies how to improve the efficiency of learning Nash equilibria in Markov Potential Games using policy gradient methods.


Extracting Sentence Embeddings from Pretrained Transformer Models

http://arxiv.org/abs/2408.08073v1

Compressor summary: The paper explores various methods to improve sentence embeddings for natural language processing tasks, achieving significant improvements for static token-based models and comparable results for BERT-derived representations.
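
One of the simplest baselines in this space, mean-pooling a pretrained transformer's token representations into a sentence vector; the model choice here is an assumption for illustration.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def sentence_embedding(text):
        enc = tok(text, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state   # (1, seq_len, 768)
        mask = enc["attention_mask"].unsqueeze(-1)    # ignore padding tokens
        return (hidden * mask).sum(1) / mask.sum(1)   # (1, 768)

    emb = sentence_embedding("Extracting sentence embeddings is easy.")
    print(emb.shape)   # torch.Size([1, 768])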


I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

http://arxiv.org/abs/2408.08072v1

Compressor summary: I-SHEEP is a self-alignment method for large language models that continuously improves their performance without human intervention.


Universality of Real Minimal Complexity Reservoir

http://arxiv.org/abs/2408.08071v1

Compressor summary: Reservoir Computing models can approximate dynamic filters with fading memory, and Simple Cycle Reservoirs, a specialized class with constrained architecture, are universally applicable and suitable for low-complexity hardware implementations.


MambaMIM: Pre-training Mamba with State Space Token-interpolation

http://arxiv.org/abs/2408.08070v1

Compressor summary: MambaMIM is a generative self-supervised learning method for selective state space models that improves long-range dependency representation and can be used for pre-training medical image tasks.


RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

http://arxiv.org/abs/2408.08067v1

Compressor summary: RAGChecker is a framework to evaluate Retrieval-Augmented Generation (RAG) systems by measuring their retrieval and generation modules, which reveals patterns and trade-offs in RAG design choices and can help improve them.
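
To give a flavor of "fine-grained": metrics of this kind score extracted claims rather than whole responses. The set-membership check below is a placeholder for the entailment model such a framework would actually use.

    def claim_metrics(response_claims, gold_claims):
        # Claim-level precision/recall; real systems replace exact matching
        # with an entailment checker over extracted claims.
        correct = response_claims & gold_claims
        return {"precision": len(correct) / max(len(response_claims), 1),
                "recall": len(correct) / max(len(gold_claims), 1)}

    print(claim_metrics({"a", "b", "c"}, {"a", "b", "d"}))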


Maximally Permissive Reward Machines

http://arxiv.org/abs/2408.08059v1

Compressor summary: The paper proposes a new method to generate reward machines from multiple plans, which leads to higher rewards for learning agents compared to methods based on a single plan.


Navigating Data Scarcity using Foundation Models: A Benchmark of Few-Shot and Zero-Shot Learning Approaches in Medical Imaging

http://arxiv.org/abs/2408.08058v1

Compressor summary: The study compares different foundation models' performance in few-shot and zero-shot learning for medical image analysis tasks, finding that BiomedCLIP works best with very small training sets while CLIP models perform better with slightly more samples.


DATTA: Towards Diversity Adaptive Test-Time Adaptation in Dynamic Wild World

http://arxiv.org/abs/2408.08056v1

Compressor summary: DATTA is a new method for test-time adaptation that adapts batch normalization and fine-tuning strategies based on the diversity score of input data, improving accuracy and Quality of Experience.


COTODE: COntinuous Trajectory neural Ordinary Differential Equations for modelling event sequences

http://arxiv.org/abs/2408.08055v1

Compressor summary: The paper proposes a new model for event sequences that considers actor dynamics governed by a Gaussian Process and incorporates uncertainty estimation and negative feedback for improved performance.


Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework

http://arxiv.org/abs/2408.08054v1

Compressor summary: Text2BIM is a framework that uses LLMs and multi-agent reasoning to generate high-quality 3D building models from natural language instructions, enhancing the design process in the architecture, engineering, and construction (AEC) industry.


CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection

http://arxiv.org/abs/2408.08050v1

Compressor summary: CamoTeacher is a novel semi-supervised camouflaged object detection method that uses dual-rotation consistency learning to reduce pseudo-label noise and achieve state-of-the-art results.


An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation

http://arxiv.org/abs/2408.08047v1

Compressor summary: The paper proposes a continuous control framework for sequential recommendation using reinforcement learning that improves efficiency and long-term user engagement.


The Clever Hans Effect in Unsupervised Learning

http://arxiv.org/abs/2408.08041v1

Compressor summary: Unsupervised learning models may produce accurate predictions that rest on hidden biases, leading to Clever Hans effects; using Explainable AI techniques, the authors find such effects to be widespread and suggest ways to improve model robustness.


An Advanced Deep Learning Based Three-Stream Hybrid Model for Dynamic Hand Gesture Recognition

http://arxiv.org/abs/2408.08035v1

Compressor summary: The text proposes a novel three-stream hybrid model that combines pixel and skeleton-based features to recognize hand gestures, addressing challenges like dataset limitations and varying lighting conditions.