arxiv compressed, 2024-08-20

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-20, generated by the compressor, my personal LLM-based project.


KAN 2.0: Kolmogorov-Arnold Networks Meet Science

http://arxiv.org/abs/2408.10205v1

Compressor summary: The text proposes a framework that combines connectionist AI (Kolmogorov-Arnold Networks) with science for discovering features, structures, and formulas in physical laws.


Criticality Leveraged Adversarial Training (CLAT) for Boosted Performance via Parameter Efficiency

http://arxiv.org/abs/2408.10204v1

Compressor summary: CLAT is an approach that improves both clean accuracy and adversarial robustness of neural networks by fine-tuning only critical layers, reducing parameters, and adapting to changes in layer criticality.


SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

http://arxiv.org/abs/2408.10202v1

Compressor summary: The paper proposes a new method called SANER to reduce societal bias in CLIP without losing attribute information or using attribute annotations during debiasing.


MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

http://arxiv.org/abs/2408.10198v1

Compressor summary: MeshFormer is a sparse-view 3D reconstruction model that uses transformers, 3D convolutions, input normal maps, and SDF supervision to efficiently train and generate high-quality textured meshes with geometric details.


Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models

http://arxiv.org/abs/2408.10189v1

Compressor summary: MOHAWK distills pretrained Transformers into subquadratic architectures such as state space models (SSMs) by matching their mixing matrices and hidden units at several levels of granularity, enabling Phi-Mamba, a Mamba-2 variant, to outperform past non-Transformer models with less than 1% of the training data they typically use.


LongVILA: Scaling Long-Context Visual Language Models for Long Videos

http://arxiv.org/abs/2408.10188v1

Compressor summary: LongVILA is a system that enables efficient long-context training and inference for vision-language models, improving performance on tasks like long video captioning.


Assessment of Spectral based Solutions for the Detection of Floating Marine Debris

http://arxiv.org/abs/2408.10187v1

Compressor summary: The text discusses using remote sensing data and Machine Learning algorithms to detect floating plastic debris in the ocean, and introduces the Marine Debris Archive as a standard dataset for evaluating these methods.


NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction

http://arxiv.org/abs/2408.10178v1

Compressor summary: NeuRodin is a new neural framework that improves surface reconstruction in volume rendering by addressing challenges in SDF-based methods and retaining flexibility of density-based methods.


Fairness Under Cover: Evaluating the Impact of Occlusions on Demographic Bias in Facial Recognition

http://arxiv.org/abs/2408.10175v1

Compressor summary: The study shows that occlusions worsen face recognition system fairness by increasing bias against some demographic groups, especially Africans.


SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models

http://arxiv.org/abs/2408.10174v1

Compressor summary: The study proposes a novel zero-shot model fusion method called SMILE that addresses parameter interference issues and improves performance without extra training data or parameters.


NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices

http://arxiv.org/abs/2408.10161v1

Compressor summary: The paper presents NeuFlow v2, an efficient optical flow method that balances high accuracy with reduced computational costs, achieving 10x-70x speedup and running at over 20 FPS on a Jetson Orin Nano.


LoopSplat: Loop Closure by Registering 3D Gaussian Splats

http://arxiv.org/abs/2408.10154v1

Compressor summary: LoopSplat improves 3D scene mapping accuracy by triggering loop closure online and aligning submaps via 3D Gaussian Splats registration.


Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video

http://arxiv.org/abs/2408.10153v1

Compressor summary: The paper proposes a method to translate unrealistic synthetic data to realistic clinical data for training monocular depth estimation in colonoscopy videos, improving its generalization to the clinical domain.


Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models

http://arxiv.org/abs/2408.10151v1

Compressor summary: The MLNeedle test evaluates LLMs' ability to retrieve relevant information from multilingual contexts and reveals their limitations in handling long contexts across languages.


In-Context Learning with Representations: Contextual Generalization of Trained Transformers

http://arxiv.org/abs/2408.10147v1

Compressor summary: This paper studies how transformer models can learn new tasks from few examples during inference, using non-linear regression to show they can acquire contextual knowledge for generalization.


Multi-Scale Representation Learning for Image Restoration with State-Space Model

http://arxiv.org/abs/2408.10145v1

Compressor summary: MS-Mamba is a novel image restoration method that uses a multi-scale state-space model and adaptive gradient block to enhance detail and contrast, achieving state-of-the-art results with low computational complexity.


Instruction Finetuning for Leaderboard Generation from Empirical AI Research

http://arxiv.org/abs/2408.10141v1

Compressor summary: The study shows how to use large language models to automatically generate leaderboard data from AI research articles, improving the speed and accuracy of information extraction.


$R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement

http://arxiv.org/abs/2408.10135v1

Compressor summary: The text introduces a new algorithm that improves NeRF's ability to generate and optimize meshes from multi-view images by refining SDF and appearance representation, and adaptively incorporating additional images for training.


Perceptual Depth Quality Assessment of Stereoscopic Omnidirectional Images

http://arxiv.org/abs/2408.10134v1

Compressor summary: The paper proposes a new model, DQI, to measure depth quality in stereoscopic omnidirectional images, which improves upon existing methods and considers human visual system characteristics.


Rhyme-aware Chinese lyric generator based on GPT

http://arxiv.org/abs/2408.10130v1

Compressor summary: The paper proposes a method to improve lyric generation by integrating rhyme information into a pre-trained language model.


Learning Brave Assumption-Based Argumentation Frameworks via ASP

http://arxiv.org/abs/2408.10126v1

Compressor summary: The paper proposes a new algorithm for learning assumption-based argumentation from background knowledge and examples using transformation rules and answer set programming.


Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models

http://arxiv.org/abs/2408.10124v1

Compressor summary: MolGraph-LarDo integrates large language models and domain-specific small models for accurate molecular property prediction, using a two-stage prompt strategy and multi-modal alignment.


Geometry Informed Tokenization of Molecules for Language Model Generation

http://arxiv.org/abs/2408.10120v1

Compressor summary: The paper proposes Geo2Seq, a method to convert 3D molecular geometries into discrete sequences, enabling better generation of molecules using language models.


Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data

http://arxiv.org/abs/2408.10119v1

Compressor summary: Factorized-Dreamer is a framework that generates high-quality videos from limited, low-quality data using text and image embeddings together with an adapter, cross attention, a T5 encoder, PredictNet, and a noise schedule that ensures quality and stability across various tasks.


GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization

http://arxiv.org/abs/2408.10115v1

Compressor summary: GLIMMER is an unsupervised multi-document summarization approach that uses sentence graphs and semantic clusters to generate fluent and informative summaries, outperforming existing methods and pre-trained models.


Enhancing Reinforcement Learning Through Guided Search

http://arxiv.org/abs/2408.10113v1

Compressor summary: The paper proposes using Monte Carlo Tree Search (MCTS) as a guide for Reinforcement Learning (RL) agents to improve performance, especially in off-policy settings, and shows significant gains on the Atari 100k benchmark.


PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities

http://arxiv.org/abs/2408.10111v1

Compressor summary: PLUTUS is a large-scale, open-source, transformer-based model that uses contrastive learning and attention mechanisms to capture complex patterns in financial time series data.


Perturb-and-Compare Approach for Detecting Out-of-Distribution Samples in Constrained Access Environments

http://arxiv.org/abs/2408.10107v1

Compressor summary: MixDiff is a framework for detecting out-of-distribution inputs for machine learning models without accessing their parameters or activations, by applying input-level perturbations and comparing the model outputs of two similar samples.
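
A minimal Python sketch of the perturb-and-compare idea, assuming only black-box access to model outputs (model_fn and the perturbation are illustrative, not MixDiff's exact scoring):

    import numpy as np

    rng = np.random.default_rng(0)

    def perturb(x, scale=0.1):
        # Simple input-level perturbation (additive noise); the framework's
        # actual perturbations may differ.
        return x + scale * rng.standard_normal(x.shape)

    def output_shift(model_fn, x, n=8):
        # Average movement of the black-box model's outputs under perturbation.
        base = model_fn(x)
        return float(np.mean([np.abs(model_fn(perturb(x)) - base).mean()
                              for _ in range(n)]))

    def ood_score(model_fn, x, similar_in_dist):
        # OOD inputs tend to move more under perturbation than comparable
        # in-distribution samples do.
        ref = np.mean([output_shift(model_fn, s) for s in similar_in_dist])
        return output_shift(model_fn, x) - ref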


ARMADA: Attribute-Based Multimodal Data Augmentation

http://arxiv.org/abs/2408.10086v1

Compressor summary: ARMADA is a novel multimodal data augmentation method that generates semantically consistent image-text pairs by manipulating visual attributes of entities using knowledge bases and large language models.


MASALA: Model-Agnostic Surrogate Explanations by Locality Adaptation

http://arxiv.org/abs/2408.10085v1

Compressor summary: MASALA is a new XAI method that automatically determines the appropriate local region for explaining each instance, unlike existing methods that require a user-defined locality size.


TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization

http://arxiv.org/abs/2408.10084v1

Compressor summary: TANGO is a density-based clustering algorithm that uses global typicality to improve local dependencies, achieving better peak selection and sub-cluster characterization than mode-seeking methods.


Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

http://arxiv.org/abs/2408.10075v1

Compressor summary: Multimodal RLHF methods use a latent variable formulation to infer user-specific preferences and learn reward models and policies tailored to each individual, improving alignment with diverse populations.


Modelling the Distribution of Human Motion for Sign Language Assessment

http://arxiv.org/abs/2408.10073v1

Compressor summary: The paper presents a new Sign Language Assessment tool that models natural human motion and provides useful feedback for language learners.


FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant

http://arxiv.org/abs/2408.10072v1

Compressor summary: The paper introduces a new face forgery analysis task with descriptions and reasoning, and proposes an assistive system based on a multimodal language model and a decision system that provides user-friendly and explainable results.


Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm

http://arxiv.org/abs/2408.10055v1

Compressor summary: The text examines the theoretical foundations of deep reinforcement learning with a focus on exploration methods, and proposes a novel Bayesian actor-critic algorithm evaluated empirically on standard benchmarks.


Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory

http://arxiv.org/abs/2408.10053v1

Compressor summary: The paper proposes a comprehensive checklist for contextual integrity-based privacy research that covers social identities, private attributes, and existing regulations using large language models.


Exploiting Fine-Grained Prototype Distribution for Boosting Unsupervised Class Incremental Learning

http://arxiv.org/abs/2408.10046v1

Compressor summary: The paper proposes a method for unsupervised class incremental learning that uses fine-grained prototypes, granularity alignment, and an overlap-minimizing strategy to discover unknown novel classes while preserving historical knowledge.


Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation

http://arxiv.org/abs/2408.10041v1

Compressor summary: Implicit Gaussian Splatting (IGS) is an efficient and compact model for photo-realistic novel view synthesis that integrates explicit point clouds with implicit feature embeddings using a multi-level tri-plane architecture and progressive training scheme, achieving high rendering quality while consuming only a few MBs.


The Practimum-Optimum Algorithm for Manufacturing Scheduling: A Paradigm Shift Leading to Breakthroughs in Scale and Performance

http://arxiv.org/abs/2408.10040v1

Compressor summary: The P-O algorithm creates virtual human expert agents to generate many valid schedules and uses reinforced machine learning to improve them, achieving breakthrough performance in automatic manufacturing scheduling.


MSDiagnosis: An EMR-based Dataset for Clinical Multi-Step Diagnosis

http://arxiv.org/abs/2408.10039v1

Compressor summary: The paper introduces a multi-step clinical diagnostic dataset (MSDiagnosis) and a framework that combines forward and backward inference with reflection and refinement to improve the performance of language models in complex medical diagnosis tasks.


Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

http://arxiv.org/abs/2408.10015v1

Compressor summary: The paper presents a new method for finding optimal deterministic policies for continuous-state and -action constrained MDPs using a primal-dual policy gradient approach, which is applied to robot navigation and fluid control problems.


CLIPCleaner: Cleaning Noisy Labels with CLIP

http://arxiv.org/abs/2408.10012v1

Compressor summary: The paper proposes CLIPCleaner, a method that uses a vision-language model to select clean samples from noisy labels, outperforming existing methods on benchmark datasets.


PinnDE: Physics-Informed Neural Networks for Solving Differential Equations

http://arxiv.org/abs/2408.10011v1

Compressor summary: PinnDE is an open-source python library for solving differential equations using physics-informed neural networks (PINNs) and deep operator networks (DeepONets).
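
The library's own API is not shown here, but the PINN idea it packages can be sketched in plain PyTorch, solving u'(t) = -u(t) with u(0) = 1 (exact solution exp(-t)):

    import torch

    # Minimal PINN sketch in plain PyTorch -- illustrative, not the PinnDE API.
    net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                              torch.nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(2000):
        t = torch.rand(128, 1, requires_grad=True)   # collocation points in [0, 1]
        u = net(t)
        du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
        residual = du + u                            # enforce u' + u = 0
        ic = net(torch.zeros(1, 1)) - 1.0            # enforce u(0) = 1
        loss = (residual ** 2).mean() + (ic ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()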


P3P: Pseudo-3D Pre-training for Scaling 3D Masked Autoencoders

http://arxiv.org/abs/2408.10007v1

Compressor summary: The paper presents a self-supervised pre-training framework for 3D perception that combines real 3D data with pseudo-3D data derived from images, overcoming data-scaling and efficiency challenges with an efficient token embedding and a 2D reconstruction target, and achieving state-of-the-art results.


Unlocking the Power of LSTM for Long Term Time Series Forecasting

http://arxiv.org/abs/2408.10006v1

Compressor summary: P-sLSTM is a modified sLSTM algorithm that improves time series forecasting by incorporating patching and channel independence, achieving state-of-the-art results with theoretical justifications.
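
The patching and channel-independence preprocessing mentioned above can be sketched as follows (illustrative, not the paper's exact code):

    import torch

    def patch_channels_independently(x, patch_len=16, stride=8):
        # x: (batch, length, channels). Treat each channel as an independent
        # univariate series (channel independence), then cut it into
        # overlapping patches that serve as sLSTM inputs.
        b, l, c = x.shape
        series = x.permute(0, 2, 1).reshape(b * c, l)
        return series.unfold(dimension=1, size=patch_len, step=stride)

    tokens = patch_channels_independently(torch.randn(8, 96, 7))
    print(tokens.shape)  # torch.Size([56, 11, 16])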


Towards a Knowledge Graph for Models and Algorithms in Applied Mathematics

http://arxiv.org/abs/2408.10003v1

Compressor summary: The text describes how a living knowledge graph was created by merging and extending ontologies to represent mathematical models and algorithms semantically and enrich them with metadata, including subject-specific properties.


The Fairness-Quality Trade-off in Clustering

http://arxiv.org/abs/2408.10002v1

Compressor summary: The paper introduces algorithms to find trade-offs between quality and fairness in clustering problems by exploring the complete Pareto front.
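
For intuition, reading off the non-dominated (quality, fairness) trade-offs from a set of candidate clusterings is a Pareto-front computation; a brute-force sketch (the paper's algorithms trace the complete front far more efficiently):

    def pareto_front(points):
        # Each point is (clustering cost, unfairness); lower is better on both.
        # Keep only the points not dominated by some other point.
        return sorted(p for p in points
                      if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                                 for q in points))

    # Hypothetical candidates: (2.5, 0.2) is dominated by (2.0, 0.1).
    print(pareto_front([(1.0, 0.9), (1.2, 0.4), (2.0, 0.1), (2.5, 0.2)]))
    # [(1.0, 0.9), (1.2, 0.4), (2.0, 0.1)]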


Uniting contrastive and generative learning for event sequences models

http://arxiv.org/abs/2408.09995v1

Compressor summary: The study combines two self-supervised methods to create balanced representations of transactional data, improving performance in tasks like sequence classification and next-event prediction in banking applications.


Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype

http://arxiv.org/abs/2408.09984v1

Compressor summary: The paper proposes a method to improve open-domain continual learning in vision-language models by using category-aware prototypes as Task-ID discriminators and domain prior prompts.


Preference-Optimized Pareto Set Learning for Blackbox Optimization

http://arxiv.org/abs/2408.09976v1

Compressor summary: The paper proposes an efficient method to approximate the whole Pareto set in multi-objective optimization problems using bilevel optimization and differentiable cross-entropy methods, which improves upon previous naive approaches.


The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective

http://arxiv.org/abs/2408.09974v1

Compressor summary: AdaZero is an end-to-end adaptive framework for reinforcement learning that balances exploration and exploitation based on entropy, achieving significant improvements in various environments.


Unsupervised Machine Learning Hybrid Approach Integrating Linear Programming in Loss Function: A Robust Optimization Technique

http://arxiv.org/abs/2408.09967v1

Compressor summary: This paper proposes a new method that combines linear programming and an unsupervised machine learning model to solve complex optimization problems with constraints while preserving interpretability and adapting to different scenarios.


Mask in the Mirror: Implicit Sparsification

http://arxiv.org/abs/2408.09966v1

Compressor summary: The paper proposes a method to control the sparsity of deep neural networks using continuous sparsification and shows improved performance in high-sparsity scenarios.


AdaResNet: Enhancing Residual Networks with Dynamic Weight Adjustment for Improved Feature Integration

http://arxiv.org/abs/2408.09958v1

Compressor summary: AdaResNet automatically adjusts the ratio between input pixels and transformed data in skip connections, improving the performance of deep neural networks.


Contextual Importance and Utility in Python: New Functionality and Insights with the py-ciu Package

http://arxiv.org/abs/2408.09957v1

Compressor summary: The paper presents py-ciu, a Python tool for generating explanations from machine learning models using the CIU method, which offers novel features compared to existing methods.


Principle Driven Parameterized Fiber Model based on GPT-PINN Neural Network

http://arxiv.org/abs/2408.09951v1

Compressor summary: The paper proposes a new AI-based fiber model for Beyond 5G communications that reduces re-training time and increases efficiency by using linear combinations of pre-trained models.


C$^2$RL: Content and Context Representation Learning for Gloss-free Sign Language Translation and Retrieval

http://arxiv.org/abs/2408.09949v1

Compressor summary: C$^2$RL is a novel pretraining paradigm for gloss-free Sign Language Representation Learning that emphasizes Implicit Content Learning and Explicit Context Learning to improve performance in tasks like Sign Language Translation and Sign Language Retrieval.


Caption-Driven Explorations: Aligning Image and Text Embeddings through Human-Inspired Foveated Vision

http://arxiv.org/abs/2408.09948v1

Compressor summary: The paper introduces a dataset and method to study and predict human attention during image captioning tasks using CLIP models and NeVA algorithms, improving existing models.


Fiber Transmission Model with Parameterized Inputs based on GPT-PINN Neural Network

http://arxiv.org/abs/2408.09947v1

Compressor summary: The paper proposes a new fiber transmission model that can handle different bit rates without retraining and uses universal solutions based on the novelty principle.


Microscopic Analysis on LLM players via Social Deduction Game

http://arxiv.org/abs/2408.09946v1

Compressor summary: The text describes an approach to evaluate autonomous game players for social deduction games using a variant of SpyFall called SpyGame, introducing new metrics and qualitative analysis methods to assess their skills in intent identification and camouflage.


Benchmarking LLMs for Translating Classical Chinese Poetry: Evaluating Adequacy, Fluency, and Elegance

http://arxiv.org/abs/2408.09945v1

Compressor summary: The authors introduce a benchmark for translating classical Chinese poetry into English and propose RAT, a retrieval-augmented machine translation method that improves translation quality using knowledge about classical poetry and an automatic evaluation metric based on GPT-4.


ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer

http://arxiv.org/abs/2408.09940v1

Compressor summary: The ML-CrAIST model uses spatial and channel self-attention and a cross-attention block to effectively utilize multi-scale image details and improve single-image super-resolution performance, outperforming state-of-the-art methods.


"Image, Tell me your story!" Predicting the original meta-context of visual misinformation

http://arxiv.org/abs/2408.09939v1

Compressor summary: The researchers propose a new automated method to provide context for images, which can help detect misinformation and support fact-checking efforts.


Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise

http://arxiv.org/abs/2408.09929v1

Compressor summary: The paper explores how to learn beneficial noise for contrastive learning using Positive-incentive Noise (Pi-Noise) and proposes a framework to generate such noise as data augmentations.


Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement

http://arxiv.org/abs/2408.09920v1

Compressor summary: The paper proposes a human visual attention estimation strategy to improve existing image quality assessment models by measuring the statistical dependency between degraded and reference images.


Expressive Power of Temporal Message Passing

http://arxiv.org/abs/2408.09918v1

Compressor summary: The paper analyzes how two types of temporal message passing mechanisms in graph neural networks differ in their expressive power and performance on color-persistent temporal graphs.


Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

http://arxiv.org/abs/2408.09916v1

Compressor summary: VisEdit is a novel model editor for vision-language models that edits intermediate visual representations in relevant regions to correct knowledge, based on attribution analysis showing the importance of these representations for token predictions.


Active Learning for Identifying Disaster-Related Tweets: A Comparison with Keyword Filtering and Generic Fine-Tuning

http://arxiv.org/abs/2408.09914v1

Compressor summary: The study investigates the potential of Active Learning for identifying disaster-related posts on social media, finding that it outperforms other methods with minimal labelling effort.


$p$SVM: Soft-margin SVMs with $p$-norm Hinge Loss

http://arxiv.org/abs/2408.09908v1

Compressor summary: The paper explores properties and performance of soft-margin SVMs with $p$-norm hinge loss, called $p$SVMs, and proposes a generalized version of the SMO algorithm to train them.
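
The $p$-norm hinge loss simply raises the standard hinge to the $p$-th power, so $p = 1$ recovers the usual soft-margin SVM; a quick numerical sketch:

    import numpy as np

    def p_hinge(margins, p=2.0):
        # p-norm hinge loss max(0, 1 - m)^p on margins m_i = y_i * (w.x_i + b).
        return np.maximum(0.0, 1.0 - margins) ** p

    m = np.array([-0.5, 0.2, 1.5])
    print(p_hinge(m, p=1.0))  # [1.5  0.8  0. ]   (standard hinge)
    print(p_hinge(m, p=2.0))  # [2.25 0.64 0.  ]  (squared hinge)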


LCE: A Framework for Explainability of DNNs for Ultrasound Image Based on Concept Discovery

http://arxiv.org/abs/2408.09899v1

Compressor summary: The Lesion Concept Explainer (LCE) framework combines attribution and concept-based methods to explain the decisions of Deep Neural Networks for medical images, especially ultrasound images, using a fine-tuned Segment Anything Model (SAM).


Instruction-Based Molecular Graph Generation with Unified Text-Graph Diffusion Model

http://arxiv.org/abs/2408.09896v1

Compressor summary: The UTGDiff model generates molecules from textual instructions using a unified text-graph transformer derived from language models, achieving better performance than sequence-based methods with fewer parameters.


Performance Law of Large Language Models

http://arxiv.org/abs/2408.09895v1

Compressor summary: The authors propose a "Performance Law" equation that predicts the MMLU score of large language models based on their architecture and training data size, helping in selecting architectures and allocating computational resources efficiently.


Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates

http://arxiv.org/abs/2408.09891v1

Compressor summary: The paper explores optimal rates for differential privacy optimization with heavy-tailed gradients using a simple clipping method and an iterative updating method, improving on existing methods and matching the minimax lower bound.
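
The clipping recipe is in the spirit of standard DP-SGD aggregation, sketched below (the paper's estimator and its noise calibration for heavy tails may differ):

    import numpy as np

    rng = np.random.default_rng(0)

    def clipped_noisy_mean(grads, clip=1.0, noise_mult=1.0):
        # grads: (n, d) per-sample gradients. Clipping each row to norm <= clip
        # bounds the sensitivity even when gradients are heavy-tailed, so the
        # Gaussian noise can be calibrated to the clip norm.
        n, d = grads.shape
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        clipped = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        return clipped.mean(axis=0) + rng.normal(0.0, noise_mult * clip / n, d)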


GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits

http://arxiv.org/abs/2408.09882v1

Compressor summary: GINO-Q is a new algorithm that efficiently learns optimal policies for restless multi-armed bandits without requiring indexability, outperforming existing methods.


Uncertainty Quantification of Pre-Trained and Fine-Tuned Surrogate Models using Conformal Prediction

http://arxiv.org/abs/2408.09881v1

Compressor summary: The text proposes a method to estimate reliable uncertainty for spatio-temporal surrogate models using conformal prediction with minimal computational cost and broad applicability.


New spectral imaging biomarkers for sepsis and mortality in intensive care

http://arxiv.org/abs/2408.09873v1

Compressor summary: Hyperspectral imaging can predict sepsis and mortality rates by monitoring microcirculatory changes in the palm and fingers, improving diagnosis and treatment management.


Docling Technical Report

http://arxiv.org/abs/2408.09869v1

Compressor summary: Docling is an open-source package that converts PDF documents using AI models for layout analysis and table structure recognition, running efficiently on common hardware.


MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation

http://arxiv.org/abs/2408.09865v1

Compressor summary: The paper proposes a model called MAPLE that generates fine-grained explanations for recommending items to users, using aspect categories as input and achieving better performance than existing review-generation models.


ShortCircuit: AlphaZero-Driven Circuit Design

http://arxiv.org/abs/2408.09858v1

Compressor summary: ShortCircuit is a transformer-based architecture that uses supervised and reinforcement learning to generate efficient Boolean circuits from truth tables, outperforming existing tools.


TaSL: Continual Dialog State Tracking via Task Skill Localization and Consolidation

http://arxiv.org/abs/2408.09857v1

Compressor summary: TaSL is a framework that improves Continual Dialogue State Tracking by using group-wise techniques and skill consolidation to balance knowledge preservation and adaptation.


TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition

http://arxiv.org/abs/2408.09856v1

Compressor summary: TeamLoRA is a novel PEFT method that combines collaboration and competition among task-specific LoRA modules to enhance multi-task learning efficiency and performance.


Self-Directed Turing Test for Large Language Models

http://arxiv.org/abs/2408.09853v1

Compressor summary: The Self-Directed Turing Test is a new way to evaluate Large Language Models' human-like behaviour in natural language conversations by allowing more dynamic exchanges and reducing human involvement.


Importance Weighting Can Help Large Language Models Self-Improve

http://arxiv.org/abs/2408.09849v1

Compressor summary: The paper introduces DS weight, a new metric that filters correct but highly shifted samples out of the self-generated data used for LLM self-improvement, enhancing reasoning ability and rivaling methods that rely on pre-trained reward models.


Continual Dialogue State Tracking via Reason-of-Select Distillation

http://arxiv.org/abs/2408.09846v1

Compressor summary: The paper introduces Reason-of-Select distillation, a method that enhances dialogue systems with meta-reasoning and domain bootstrapping to improve continual learning and mitigate forgetting.


Demystifying Reinforcement Learning in Production Scheduling via Explainable AI

http://arxiv.org/abs/2408.09841v1

Compressor summary: The paper applies xAI frameworks to explain the reasoning behind scheduling decisions of a DRL agent, but finds that current methods lack falsifiability, consistent terminology, and causal interpretations; they propose a hypotheses-based workflow to address these issues.


Machine Learning with Physics Knowledge for Prediction: A Survey

http://arxiv.org/abs/2408.09840v1

Compressor summary: The survey explores various methods and models for combining machine learning with physics knowledge to improve prediction and forecast using partial differential equations, considering both architectural and data-driven approaches and their industrial applications.


Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion

http://arxiv.org/abs/2408.09838v1

Compressor summary: CDE is a new algorithm that uses curriculum learning and Q-function subspaces to improve learning efficiency and adaptability in complex multi-agent domains like train scheduling.


Minor DPO reject penalty to increase training robustness

http://arxiv.org/abs/2408.09834v1

Compressor summary: The text describes a method called Direct Preference Optimization (DPO) for fine-tuning language models based on human preferences without reinforcement learning, and proposes MinorDPO as an improvement to address some limitations.


TDNetGen: Empowering Complex Network Resilience Prediction with Generative Augmentation of Topology and Dynamics

http://arxiv.org/abs/2408.09825v1

Compressor summary: The paper proposes TDNetGen, a novel framework that uses generative data augmentation to predict network resilience without needing labeled data or prior knowledge of network dynamics.


SurgicaL-CD: Generating Surgical Images via Unpaired Image Translation with Latent Consistency Diffusion Models

http://arxiv.org/abs/2408.09822v1

Compressor summary: SurgicaL-CD is a new method to create realistic surgical images for training machine learning models using diffusion and consistency distillation, improving quality and utility over previous methods.


Symplectic Neural Networks Based on Dynamical Systems

http://arxiv.org/abs/2408.09821v1

Compressor summary: SympNets are a new type of neural network that can approximate symplectic maps and solve Hamiltonian systems more accurately and efficiently than existing methods.
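
For reference, a map $\Phi:\mathbb{R}^{2n}\to\mathbb{R}^{2n}$ is symplectic precisely when its Jacobian satisfies

    $$ (D\Phi)^\top J \, (D\Phi) = J, \qquad J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}, $$

which is the structural property SympNets are built to preserve exactly.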


CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

http://arxiv.org/abs/2408.09819v1

Compressor summary: The paper presents CMoralEval, a large and diverse benchmark for evaluating the morality of Chinese LLMs, built from a TV program discussing moral norms and newspaper articles on moral anomalies, with a morality taxonomy grounded in traditional Chinese culture and contemporary norms and an AI-assisted platform for instance generation and annotation; experiments show it is challenging for current models.


Liquid Fourier Latent Dynamics Networks for fast GPU-based numerical simulations in computational cardiology

http://arxiv.org/abs/2408.09818v1

Compressor summary: Liquid Fourier LDNets (LFLDNets) are an extension of Latent Dynamics Networks for creating surrogate models of complex differential equations, improving performance, accuracy, and efficiency in computational cardiology applications.


A Population-to-individual Tuning Framework for Adapting Pretrained LM to On-device User Intent Prediction

http://arxiv.org/abs/2408.09815v1

Compressor summary: PITuning is a framework that uses pre-trained language models to improve user intent prediction on smartphones by adapting to diverse event sequences and addressing long-tailed preferences.


World Models Increase Autonomy in Reinforcement Learning

http://arxiv.org/abs/2408.09807v1

Compressor summary: MoReFree is a model-based agent for reset-free reinforcement learning that adapts exploration and policy learning to prioritize task-relevant states, outperforming privileged baselines with less supervision and data.


Latent Diffusion for Guided Document Table Generation

http://arxiv.org/abs/2408.09800v1

Compressor summary: This paper proposes a novel method to generate realistic and annotated images of complex table structures using latent diffusion models, which improves the performance of object detection models like YOLOv5.


Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting

http://arxiv.org/abs/2408.09798v1

Compressor summary: Text-centric adversarial training improves robustness of multimodal models by converting diverse inputs into unified textual representation despite noise, order changes, and missing modalities.


AutoML-guided Fusion of Entity and LLM-based representations

http://arxiv.org/abs/2408.09794v1

Compressor summary: This paper shows how incorporating knowledge base information into language models improves text classification and enables faster, efficient classifiers using matrix factorization.


Unsupervised Composable Representations for Audio

http://arxiv.org/abs/2408.09792v1

Compressor summary: The paper presents a compositional representation learning framework for music data based on generative models, enabling unsupervised audio source separation, generation, and variation generation with comparable or superior quality to other methods at lower computational cost.


Structure-enhanced Contrastive Learning for Graph Clustering

http://arxiv.org/abs/2408.09790v1

Compressor summary: SECL is a novel contrastive learning method for graph clustering that leverages network structures and outperforms existing methods.


Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

http://arxiv.org/abs/2408.09787v1

Compressor summary: The text describes an autonomous animation-making agent called Anim-Director that uses large multimodal models and generative AI tools to create coherent and context-rich animations from concise narratives or simple instructions.


Cross-composition Feature Disentanglement for Compositional Zero-shot Learning

http://arxiv.org/abs/2408.09786v1

Compressor summary: The paper proposes a method to learn disentangled visual features across different compositions using a compositional graph and CLIP with adapters, improving Compositional Zero-shot Learning performance.


GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making

http://arxiv.org/abs/2408.09785v1

Compressor summary: GoNoGo is a Large Language Model agent system that automates software deployment decisions in the automotive industry, reducing costs and delays while meeting functional and industrial constraints.


Summarizing long regulatory documents with a multi-step pipeline

http://arxiv.org/abs/2408.09777v1

Compressor summary: The paper proposes a two-step approach to summarize long regulatory texts and shows that its effectiveness depends on the encoder-decoder model used, highlighting challenges in evaluating generated texts.


Faster Adaptive Decentralized Learning Algorithms

http://arxiv.org/abs/2408.09775v1

Compressor summary: The paper proposes new adaptive decentralized algorithms for distributed machine learning tasks and proves their near-optimal sample complexity.


Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence?

http://arxiv.org/abs/2408.09773v1

Compressor summary: The paper analyzes how large language models perceive their knowledge boundaries through probabilistic and verbalized confidence, finding that probabilistic perception is more accurate but both are affected by question frequency and natural language expression challenges.


MalLight: Influence-Aware Coordinated Traffic Signal Control for Traffic Signal Malfunctions

http://arxiv.org/abs/2408.09768v1

Compressor summary: The paper proposes MalLight, a novel traffic signal control framework that uses reinforcement learning to optimize the functioning of surrounding signals and reduce congestion and collisions caused by malfunctioning signals.


Baby Bear: Seeking a Just Right Rating Scale for Scalar Annotations

http://arxiv.org/abs/2408.09765v1

Compressor summary: The paper proposes IBWS, an iterative method for robustly ranking elements using crowd-sourced data, and evaluates cheaper direct assessment methods that can scale to large datasets.


Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning

http://arxiv.org/abs/2408.09757v1

Compressor summary: This study shows how changing demonstrations in in-context learning can improve the fairness of large language models without losing accuracy and proposes a new technique to curate diverse data samples for better performance and fairness.


A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method

http://arxiv.org/abs/2408.09752v1

Compressor summary: The paper introduces IrisGeneral, a comprehensive dataset for iris anti-spoofing evaluation, and proposes Masked-MoE, a novel method to improve generalization across devices and racial groups using multiple sub-neural networks.


Enhanced Cascade Prostate Cancer Classifier in mp-MRI Utilizing Recall Feedback Adaptive Loss and Prior Knowledge-Based Feature Extraction

http://arxiv.org/abs/2408.09746v1

Compressor summary: The authors propose a solution for automated prostate cancer grading in mpMRI that incorporates prior knowledge, addresses data imbalance, and maintains interpretability using feature extraction, adaptive feedback loss, and an enhanced cascade classifier.


RealCustom++: Representing Images as Real-Word for Real-Time Customization

http://arxiv.org/abs/2408.09744v1

Compressor summary: RealCustom++ is a new method for text-to-image customization that uses real words instead of pseudo-words to improve both subject similarity and text controllability in generated images.


R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation

http://arxiv.org/abs/2408.09743v1

Compressor summary: The paper proposes a novel context-guided efficient X-ray medical report generation framework using Mamba as the vision backbone and context retrieval from the training set to enhance feature representation and generate high-quality reports.


Paired Completion: Flexible Quantification of Issue-framing at Scale with LLMs

http://arxiv.org/abs/2408.09742v1

Compressor summary: The paper develops and evaluates paired completion, a novel method for detecting and quantifying issue framing in textual discourse using next-token log probabilities from generative large language models, which shows promising results in scalability, accuracy and low bias.


TraDiffusion: Trajectory-Based Training-Free Image Generation

http://arxiv.org/abs/2408.09739v1

Compressor summary: TraDiffusion is a training-free method for controlling image generation with mouse movements that can manipulate various aspects of the image while following a specified trajectory.


Mutually-Aware Feature Learning for Few-Shot Object Counting

http://arxiv.org/abs/2408.09734v1

Compressor summary: MAFEA is a novel framework for few-shot object counting that encodes query and exemplar features mutually aware of each other, reducing target confusion and achieving state-of-the-art performance on two benchmarks.


sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting

http://arxiv.org/abs/2408.09723v1

Compressor summary: The paper proposes sTransformer, a new Transformer-based model with STCN and Sequence-guided Mask Attention to improve long-term time-series forecasting by capturing sequential and temporal information.


Towards Few-Shot Learning in the Open World: A Review and Beyond

http://arxiv.org/abs/2408.09722v1

Compressor summary: This paper reviews recent progress in adapting few-shot learning methods for open-world settings, where data is uncertain, incomplete, and dynamic, and discusses the challenges, strengths, and weaknesses of three types of open-world few-shot learning approaches.


Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework

http://arxiv.org/abs/2408.09720v1

Compressor summary: The paper introduces MSP60K, a large-scale cross-domain pedestrian attribute recognition dataset, and proposes LLM-PAR, a framework that combines vision transformers with language models for better performance.


SEMDR: A Semantic-Aware Dual Encoder Model for Legal Judgment Prediction with Legal Clue Tracing

http://arxiv.org/abs/2408.09717v1

Compressor summary: The paper proposes a novel Semantic-Aware Dual Encoder Model (SEMDR) that uses a legal clue tracing mechanism to conduct fine-grained semantic reasoning between criminal facts and instruments for accurate Legal Judgment Prediction (LJP).


HYDEN: Hyperbolic Density Representations for Medical Images and Reports

http://arxiv.org/abs/2408.09715v1

Compressor summary: HYDEN uses hyperbolic space to learn image-text representations that handle semantic uncertainty in the medical domain, outperforming baselines on zero-shot tasks.
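
Hyperbolic image-text scoring typically rests on the Poincaré-ball geodesic distance; a sketch of that building block (HYDEN's density-based formulation goes further):

    import numpy as np

    def poincare_distance(u, v, eps=1e-9):
        # Geodesic distance between two points inside the unit Poincare ball:
        # d(u, v) = arccosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))).
        duv = np.dot(u - v, u - v)
        denom = max((1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v)), eps)
        return float(np.arccosh(1.0 + 2.0 * duv / denom))

    print(poincare_distance(np.array([0.1, 0.0]), np.array([0.0, 0.5])))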


Dataset Distillation for Histopathology Image Classification

http://arxiv.org/abs/2408.09709v1

Compressor summary: The paper introduces Histo-DD, a dataset distillation algorithm for histopathology image analysis that copes with the high colour heterogeneity of such images and generates more informative synthetic samples than previous methods.


MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model

http://arxiv.org/abs/2408.09706v1

Compressor summary: MePT is a novel method that uses diverse visual prompts to improve VLMs' generalization ability for various downstream tasks.


Community-Centric Graph Unlearning

http://arxiv.org/abs/2408.09705v1

Compressor summary: The paper introduces Community-centric Graph Eraser (CGE), a graph unlearning framework that maps community subgraphs to nodes, efficiently eliminating specific data from graph neural networks while reducing the data and parameters involved.


Partial-Multivariate Model for Forecasting

http://arxiv.org/abs/2408.09703v1

Compressor summary: PMformer is a new Transformer-based model that captures partial relationships among some time-series features and achieves better forecasting results than existing univariate or complete-multivariate models, while also being efficient and robust to missing data.


Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

http://arxiv.org/abs/2408.09702v1

Compressor summary: The text describes a method to realistically insert virtual objects into real-world images by using a diffusion model to guide an inverse rendering process that recovers scene lighting and other parameters, improving the appearance of the composited object.


Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer

http://arxiv.org/abs/2408.09701v1

Compressor summary: The paper proposes a zero-shot cross-lingual approach using a neural projection technique to improve code generation for non-English prompts, addressing biases and limitations of current Large Language Models.


LightWeather: Harnessing Absolute Positional Encoding to Efficient and Scalable Global Weather Forecasting

http://arxiv.org/abs/2408.09695v1

Compressor summary: The paper proposes LightWeather, a lightweight and effective Transformer-based model for global weather forecasting, using absolute positional encoding to capture spatial-temporal correlations without attention mechanisms.
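
The summary does not say which absolute encoding LightWeather adopts; the classic sinusoidal form such models build on, applied to a scalar coordinate like station index or hour-of-year, looks like:

    import torch

    def absolute_pe(positions, dim=64):
        # Sinusoidal absolute positional encoding (Vaswani et al.) for a
        # 1-D coordinate, e.g. station index, latitude bin, or timestamp.
        i = torch.arange(dim // 2, dtype=torch.float32)
        freqs = 10000.0 ** (-2.0 * i / dim)
        angles = positions.float()[:, None] * freqs[None, :]
        return torch.cat([angles.sin(), angles.cos()], dim=-1)  # (n, dim)

    print(absolute_pe(torch.arange(5)).shape)  # torch.Size([5, 64])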


Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts

http://arxiv.org/abs/2408.09688v1

Compressor summary: The authors propose a task to convert spoken language transcripts into written text with improved readability and evaluate the performance of large language models on this task using a new benchmark dataset.


Simulating Field Experiments with Large Language Models

http://arxiv.org/abs/2408.09682v1

Compressor summary: The paper explores using large language models to simulate field experiments and proposes two prompting strategies, observer and participant modes, which show promising results in certain scenarios but also identify limitations.


MambaLoc: Efficient Camera Localisation via State Space Model

http://arxiv.org/abs/2408.09680v1

Compressor summary: MambaLoc is a new visual localization model that applies the selective state space model (SSM) to improve training efficiency and robustness in sparse data environments, while its Global Information Selector (GIS) brings SSM's computational efficiency to Non-local Neural Networks for global feature extraction.


Image-based Freeform Handwriting Authentication with Energy-oriented Self-Supervised Learning

http://arxiv.org/abs/2408.09676v1

Compressor summary: SherlockNet is a self-supervised learning framework for handwriting authentication that handles noisy data, high-dimensional features, and lack of supervision by using energy-oriented contrastive learning and personalized fine-tuning.


Implicit Grid Convolution for Multi-Scale Image Super-Resolution

http://arxiv.org/abs/2408.09674v1

Compressor summary: The paper proposes a new framework for super-resolution using neural networks that trains multiple scales with one model and introduces a novel upsampler called Implicit Grid Convolution (IGConv) that improves performance.


BLADE: Benchmarking Language Model Agents for Data-Driven Science

http://arxiv.org/abs/2408.09667v1

Compressor summary: BLADE is a benchmark to evaluate agents' abilities in data-driven scientific discovery by comparing their analyses with expert-verified ground truth.


SG-GS: Photo-realistic Animatable Human Avatars with Semantically-Guided Gaussian Splatting

http://arxiv.org/abs/2408.09665v1

Compressor summary: SG-GS reconstructs photo-realistic human avatars from monocular videos using semantics-embedded 3D Gaussians with skeleton and cloth-dynamics deformation, aided by the SHA tool for efficient body-part semantic labeling and by semantic projection and regularization strategies, achieving state-of-the-art performance.


CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning

http://arxiv.org/abs/2408.09663v1

Compressor summary: CHASE is a method for creating realistic human avatars with supervision from intrinsic 3D consistency and 3D geometry contrastive learning, achieving better performance than current state-of-the-art methods in both full and sparse input settings.


A Comparison of Large Language Model and Human Performance on Random Number Generation Tasks

http://arxiv.org/abs/2408.09656v1

Compressor summary: This study tests if ChatGPT-3.5, an AI language model, shows similar biases to humans when generating random numbers, finding that it avoids repetition better than humans.
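
One simple way to quantify the repetition-avoidance behaviour described above (not necessarily the study's exact metric):

    def adjacent_repeat_rate(seq):
        # Fraction of adjacent pairs that repeat; a uniform random source over
        # ten digits gives about 0.1, and repetition avoidance shows up as a
        # rate below that.
        pairs = list(zip(seq, seq[1:]))
        return sum(a == b for a, b in pairs) / len(pairs)

    print(adjacent_repeat_rate([3, 7, 7, 1, 9, 2, 2, 2, 5]))  # 0.375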


Contextual Bandits for Unbounded Context Distributions

http://arxiv.org/abs/2408.09655v1

Compressor summary: The paper proposes two nearest neighbor methods for nonparametric contextual bandits with unbounded contexts and analyzes their regret bounds under different conditions.
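
A bare-bones version of the nearest-neighbor idea (ignoring the paper's specific handling of unbounded context distributions):

    import numpy as np

    def knn_arm_value(history, context, arm, k=5):
        # history: list of (context_vector, arm, reward) triples. Estimate the
        # arm's reward at `context` by averaging the rewards of its k nearest
        # observed contexts; untried arms get +inf to force exploration.
        obs = sorted((np.linalg.norm(c - context), r)
                     for c, a, r in history if a == arm)
        return np.mean([r for _, r in obs[:k]]) if obs else np.inf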


ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement

http://arxiv.org/abs/2408.09650v1

Compressor summary: ExpoMamba is a fast and efficient model that enhances low-light images by combining frequency components with a modified U-Net, outperforming traditional models in speed and quality.


C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection

http://arxiv.org/abs/2408.09647v1

Compressor summary: The study analyzes how CLIP detects deepfakes by recognizing similar concepts and introduces C2P-CLIP, an improved method that enhances detection performance using category-related concepts.


Acquiring Bidirectionality via Large and Small Language Models

http://arxiv.org/abs/2408.09640v1

Compressor summary: The authors propose a new method that adds a small backward LM to a unidirectional LLM so its token representations acquire bidirectionality, improving performance on named entity recognition and other tasks.
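
A rough sketch of the idea, assuming Hugging-Face-style models that share a tokenizer (the paper's exact fusion may differ): run the small LM over the reversed sequence and concatenate its states with the forward LM's.

    import torch

    def bidirectional_states(fwd_lm, bwd_lm, ids):
        # ids: (batch, seq_len) token ids. h_fwd sees only left context; the
        # small backward LM, fed the reversed sequence, supplies right context.
        h_fwd = fwd_lm(ids).last_hidden_state
        h_bwd = bwd_lm(ids.flip(dims=[1])).last_hidden_state.flip(dims=[1])
        return torch.cat([h_fwd, h_bwd], dim=-1)  # (batch, seq_len, d_f + d_b)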


How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments

http://arxiv.org/abs/2408.09639v1

Compressor summary: The study compares different methods to measure the grammatical knowledge of large language models and suggests using a variety of methods for comprehensive evaluation.