This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-21, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.11055v1
Compressor summary: The paper proposes a framework for interpretable image enhancement using learnable filters with expressive parameters and guiding prompts, achieving better results than predefined methods.
http://arxiv.org/abs/2408.11052v1
Compressor summary: The paper introduces JaxGCRL, a high-performance codebase for self-supervised goal-conditioned reinforcement learning, which enables faster training and experimentation on diverse environments.
http://arxiv.org/abs/2408.11051v1
Compressor summary: FLAME is a novel Multimodal LLM-based agent that efficiently handles urban VLN tasks by adapting to multiple observations using a three-phase tuning technique and synthesizing augmented datasets.
http://arxiv.org/abs/2408.11049v1
Compressor summary: MagicDec is a technique that improves low-latency, high-throughput inference for large language models by using speculative decoding and intelligent drafting strategies.
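Speculative decoding, the core primitive MagicDec builds on, is easy to sketch. Below is a minimal greedy-acceptance loop; `draft_next` and `target_forward` are hypothetical stand-ins for a cheap draft-model call and a single verifying forward pass of the target model, not MagicDec's actual API or drafting strategy.

```python
def speculative_decode(prompt, draft_next, target_forward, k=4, max_new=64):
    """Minimal greedy speculative decoding sketch (not MagicDec itself)."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1. Draft k candidate tokens autoregressively with the cheap model.
        draft, ctx = [], tokens[:]
        for _ in range(k):
            t = draft_next(ctx)            # greedy next token from the draft model
            draft.append(t)
            ctx.append(t)
        # 2. Verify all k candidates with one target-model forward pass:
        # returns the target's greedy token at each of the k+1 positions.
        verified = target_forward(tokens, draft)
        # 3. Accept the longest agreeing prefix, then take one extra
        # target token "for free".
        n_accept = 0
        while n_accept < k and draft[n_accept] == verified[n_accept]:
            n_accept += 1
        tokens += draft[:n_accept] + [verified[n_accept]]
    return tokens
```

The speedup comes from step 2: verifying k drafted tokens costs one large-model pass instead of k.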
http://arxiv.org/abs/2408.11046v1
Compressor summary: The paper explores privacy risks of pre-trained language models, finding that membership leakage can occur even with black-box outputs, which is a greater risk than previously thought.
http://arxiv.org/abs/2408.11042v1
Compressor summary: GraphFSA is a framework for machine learning that learns finite state automata to run on each node of a given graph, enabling representation of complex graph algorithms with strong generalization and extrapolation abilities.
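The core mechanic is simple to illustrate: each node carries a discrete state and updates it from a permutation-invariant view of its neighbours' states. A toy sketch, assuming a hand-written transition table where GraphFSA would learn one:

```python
def fsa_step(states, adj, transition):
    """One synchronous automaton step over a graph (toy sketch).
    states: dict node -> int state; adj: dict node -> list of neighbours;
    transition: dict (state, neighbour_signature) -> new state."""
    new_states = {}
    for v, nbrs in adj.items():
        # Aggregate neighbour states into a permutation-invariant signature:
        # a sorted tuple of (state, count) pairs.
        counts = {}
        for u in nbrs:
            counts[states[u]] = counts.get(states[u], 0) + 1
        signature = tuple(sorted(counts.items()))
        new_states[v] = transition.get((states[v], signature), states[v])
    return new_states
```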
http://arxiv.org/abs/2408.11039v1
Compressor summary: Transfusion trains a multi-modal transformer using language modeling and diffusion over mixed text and image data, achieving good results on uni- and cross-modal benchmarks.
http://arxiv.org/abs/2408.11032v1
Compressor summary: This study tests four deep neural networks for their ability to accurately model CO2 distribution in the atmosphere, showing promising results, with SwinTransformer achieving near-perfect emulation skill.
http://arxiv.org/abs/2408.11030v1
Compressor summary: This paper introduces GOV-3D, a challenging open vocabulary task that requires understanding 3D scenes with fine-grained attributes, and presents OpenScan, a new benchmark for it.
http://arxiv.org/abs/2408.11029v1
Compressor summary: The text describes a scaling law that relates cross-entropy loss curves of neural language models to learning rate annealing, which can accurately predict loss dynamics during training with low computational cost.
http://arxiv.org/abs/2408.11021v1
Compressor summary: The Athena framework teaches language models to be safe and trustworthy by using past examples of safe and unsafe actions and guiding them with feedback on risky behavior.
http://arxiv.org/abs/2408.11006v1
Compressor summary: The paper presents targeted attacks on LLM-based Code Completion Tools, revealing significant vulnerabilities in their security frameworks and handling of code.
http://arxiv.org/abs/2408.11001v1
Compressor summary: MegaFusion is a novel diffusion-based text-to-image generation approach that efficiently produces high-resolution images with improved semantic accuracy and less object replication, using a coarse-to-fine strategy and adapting to different resolutions.
http://arxiv.org/abs/2408.11000v1
Compressor summary: SenPa-MAE is a transformer model that learns to encode sensor parameters from multispectral images, enabling cross-sensor learning for Earth observation.
http://arxiv.org/abs/2408.10997v1
Compressor summary: The text presents an accent conversion system that separates and combines segmental, voice, and prosodic features to improve the comprehensibility and social perception of non-native speech.
http://arxiv.org/abs/2408.10995v1
Compressor summary: The study proposes CTP-LLM, a large language model that predicts phase transitions in clinical trials using trial protocol texts, achieving 67% accuracy overall and 75% in predicting final approval from Phase III.
http://arxiv.org/abs/2408.10993v1
Compressor summary: The paper proposes a reference-free demorphing technique that can recover the original face images from morphs by decomposing them into identity-preserving features.
http://arxiv.org/abs/2408.10979v1
Compressor summary: The text proposes creating a new writing system combining the advantages of Chinese and English, such as reducing vocabulary, allowing quick word recognition, and facilitating learning of complex concepts.
http://arxiv.org/abs/2408.10970v1
Compressor summary: The paper proposes a hybrid AI algorithm that combines recurrent switching linear dynamical systems (rSLDS) with active inference to learn discrete abstractions for planning and control, achieving fast system identification and non-trivial planning on the Continuous Mountain Car task.
http://arxiv.org/abs/2408.10962v1
Compressor summary: The paper surveys three decades of research on NLP tools and resources for the Greek language, including Ancient Greek and dialects, to help researchers and students working with Greek in these fields.
http://arxiv.org/abs/2408.10955v1
Compressor summary: The study used a CNN with ensemble transfer learning and a multichannel attention network to improve Bengali handwritten character recognition, achieving 92% accuracy on raw data and 98% on preprocessed data.
http://arxiv.org/abs/2408.10951v1
Compressor summary: WaveMask and WaveMix are novel data augmentation techniques for time series forecasting that use the discrete wavelet transform to preserve temporal dependencies while adjusting frequency elements.
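A rough sketch of the WaveMask idea using PyWavelets: decompose a series with the discrete wavelet transform, randomly zero some detail coefficients while keeping the approximation band intact, and reconstruct. The exact masking scheme below is my assumption, not the paper's recipe.

```python
import numpy as np
import pywt  # PyWavelets

def wavemask(x, wavelet="db4", level=3, p=0.2, rng=None):
    """WaveMask-style augmentation sketch for a 1-D series (assumed scheme)."""
    rng = rng or np.random.default_rng()
    coeffs = pywt.wavedec(x, wavelet, level=level)
    out = [coeffs[0]]                       # keep approximation coefficients
    for detail in coeffs[1:]:
        mask = rng.random(detail.shape) >= p  # drop a fraction p of details
        out.append(detail * mask)
    return pywt.waverec(out, wavelet)[: len(x)]
```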
http://arxiv.org/abs/2408.10948v1
Compressor summary: GAIM is a black-box adversarial attack on GNNs that uses node feature perturbations and optimizes them with a surrogate model to maximize influence on target nodes, supporting both untargeted and label-oriented attacks.
http://arxiv.org/abs/2408.10947v1
Compressor summary: The text discusses evaluating large language models' questioning abilities as teachers using a benchmark based on Anderson and Krathwohl's taxonomy and four metrics, showing their potential in teaching different subjects.
http://arxiv.org/abs/2408.10946v1
Compressor summary: The chapter explores how large language models can enhance recommendation systems by using natural language interactions and general reasoning to personalize recommendations based on diverse user preferences.
http://arxiv.org/abs/2408.10945v1
Compressor summary: HiRED is a token-dropping scheme that improves efficiency in high-resolution VLMs by selectively dropping excessive visual tokens based on attention scores.
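The mechanism is straightforward to sketch: rank visual tokens by an attention score and keep only the top fraction under a fixed token budget. The scoring and budgeting details below are assumptions, not HiRED's exact procedure.

```python
import numpy as np

def drop_visual_tokens(tokens, attn_scores, budget_ratio=0.25):
    """Attention-guided token dropping sketch (assumed details).
    tokens: (N, D) array of visual tokens; attn_scores: (N,) scores."""
    n_keep = max(1, int(len(tokens) * budget_ratio))
    keep = np.argsort(attn_scores)[-n_keep:]  # indices of top-scoring tokens
    keep.sort()                                # preserve original token order
    return tokens[keep], keep
```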
http://arxiv.org/abs/2408.10943v1
Compressor summary: SysBench is a benchmark that evaluates how well large language models follow system messages, which are crucial for customizing the models to specific scenarios.
http://arxiv.org/abs/2408.10942v1
Compressor summary: Key points:
- Machine learning models need distributed settings due to size limitations
- Distributed inference tasks face reliability challenges from noisy channels and low-precision devices
- The problem is studied for ensemble regression methods (bagging and gradient boosting)
- Optimization methods are developed for aggregation coefficients based on noise parameters
- Effectiveness is shown on synthetic and real datasets

Summary: The paper proposes optimization methods for distributed ensemble regression with noisy channels, and evaluates their effectiveness on various datasets.
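For intuition, one classical instance of noise-aware aggregation is inverse-variance weighting, which minimizes the variance of an unbiased weighted average; the paper's optimization of aggregation coefficients may differ. A minimal sketch:

```python
import numpy as np

def noise_aware_aggregate(preds, noise_vars):
    """Combine ensemble members' predictions, each received over a channel
    with known additive-noise variance, using inverse-variance weights.
    preds: (M, N) noisy predictions from M members on N samples.
    noise_vars: (M,) channel noise variances."""
    w = 1.0 / np.asarray(noise_vars)
    w = w / w.sum()                 # normalized variance-minimizing weights
    return w @ np.asarray(preds)    # (N,) aggregated predictions
```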
http://arxiv.org/abs/2408.10940v1
Compressor summary: The paper studies how different data augmentation methods affect object detection performance and energy efficiency, and proposes using the Efficiency Factor to balance these trade-offs.
http://arxiv.org/abs/2408.10939v1
Compressor summary: The paper proposes new methods for conformal prediction of sum or average labels over specific sets, with valid coverage guarantees and improved performance in some applications.
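As background, plain split conformal prediction for a single label looks like the sketch below; the paper's contribution is extending such coverage guarantees to sums and averages over sets.

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred, alpha=0.1):
    """Split conformal interval for one test prediction.
    residuals_cal: |y - y_hat| on a held-out calibration set."""
    n = len(residuals_cal)
    # Finite-sample-corrected quantile level for (1 - alpha) coverage.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals_cal, q_level, method="higher")
    return y_pred - q, y_pred + q
```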
http://arxiv.org/abs/2408.10935v1
Compressor summary: The paper proposes a novel Point-to-Gaussian model that leverages point clouds and image features to generate high-quality 3D assets from images more efficiently than existing methods.
http://arxiv.org/abs/2408.10934v1
Compressor summary: SDI-Net is a new method for enhancing low-light stereo images by fully exploiting the correlation between binocular views using an attention mechanism.
http://arxiv.org/abs/2408.10932v1
Compressor summary: The text reviews recent advancements in reinforcement learning (RL) applications within finance, highlighting its potential to enhance traditional solutions using machine learning techniques.
http://arxiv.org/abs/2408.10923v1
Compressor summary: The text describes a new Language-Based-Classifier (LBC) that leverages pre-trained knowledge from Large Language Models to outperform traditional machine learning models on Out-of-Variable tasks involving tabular data.
http://arxiv.org/abs/2408.10921v1
Compressor summary: The paper introduces MTFinEval, a new benchmark to measure the basic economic knowledge and generalization ability of LLMs, which can help select suitable LLMs for production scenarios.
http://arxiv.org/abs/2408.10920v1
Compressor summary: The paper shows that gated RNNs do not always learn linear representations of input tokens, but rather their order of magnitude, challenging the strong Linear Representation Hypothesis.
http://arxiv.org/abs/2408.10919v1
Compressor summary: CrossFi is a siamese network that uses an attention mechanism and a weight-net to improve Wi-Fi sensing across various scenarios, achieving state-of-the-art results in gesture recognition and other tasks.
http://arxiv.org/abs/2408.10918v1
Compressor summary: CheckWhy is a dataset for verifying causal relations in claims using logical reasoning and evidence, which is challenging for current language models.
http://arxiv.org/abs/2408.10914v1
Compressor summary: The study shows that using code data in pre-training improves general language understanding and reasoning skills for large language models beyond coding tasks.
http://arxiv.org/abs/2408.10903v1
Compressor summary: The BEYOND DIALOGUE framework improves role-playing by aligning dialogue with profile traits using beyond dialogue tasks, reasoning outcomes, and fine-grained alignment, achieving better performance than existing methods.
http://arxiv.org/abs/2408.10902v1
Compressor summary: Soda-Eval is a new dataset to evaluate chatbots based on Soda, revealing issues with coherence and commonsense knowledge, and showing that fine-tuning LLMs improves performance.
http://arxiv.org/abs/2408.10895v1
Compressor summary: The paper proposes a mathematical model to study how rating aggregation rules and review selection mechanisms can correct assessment errors caused by herding effects in online product ratings, and tests the model on Amazon and TripAdvisor datasets.
http://arxiv.org/abs/2408.10889v1
Compressor summary: The paper introduces a new problem of learning action costs from unlabeled input plans for optimal planning and presents an algorithm, $LACFIP^k$, that solves it with theoretical and empirical evidence.
http://arxiv.org/abs/2408.10885v1
Compressor summary: The study proposes a method to detect and explain low-quality images in unsupervised ways using hierarchical variational autoencoders.
http://arxiv.org/abs/2408.10883v1
Compressor summary: The text proposes a new method for detecting fake news using dynamic analysis and adaptive discriminators, which are more flexible than existing methods.
http://arxiv.org/abs/2408.10878v1
Compressor summary: The paper proposes a DBHP framework that uses neural networks with Set Transformers to predict missing trajectory data for multiple agents, considering their physical constraints and dynamics, and achieves better results than existing methods in real-world scenarios.
http://arxiv.org/abs/2408.10872v1
Compressor summary: The paper proposes an approach using Vision Language Models to assess road safety from crowdsourced images, overcoming limitations of traditional CNNs and providing a scalable, cost-effective, and automated solution for low-resource settings.
http://arxiv.org/abs/2408.10865v1
Compressor summary: The paper proposes new algorithms for a multi-player bandit problem with stochastic arrivals and decentralized learning, using explore then commit framework.
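The textbook explore-then-commit template that the paper's decentralized multi-player variants build on:

```python
import numpy as np

def explore_then_commit(pull, n_arms, m, horizon):
    """Stochastic-bandit explore-then-commit sketch.
    pull(arm) returns a stochastic reward; each arm is explored m times,
    then the empirically best arm is played for the remaining rounds."""
    means = np.zeros(n_arms)
    for arm in range(n_arms):
        means[arm] = np.mean([pull(arm) for _ in range(m)])
    best = int(np.argmax(means))
    commit_rewards = [pull(best) for _ in range(horizon - n_arms * m)]
    return best, np.sum(commit_rewards)
```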
http://arxiv.org/abs/2408.10862v1
Compressor summary: The paper evaluates a new method for private feature selection in high-dimensional datasets and shows it performs better than existing methods.
http://arxiv.org/abs/2408.10858v1
Compressor summary: The authors propose a multi-task reinforcement learning framework that uses a centralized reward agent to distill knowledge from various tasks and improve learning efficiency.
http://arxiv.org/abs/2408.10848v1
Compressor summary: The paper proposes a method to generate natural attack prompts for Text-to-Image models that can produce inappropriate images by substituting unsafe words with similar but safe phrases.
http://arxiv.org/abs/2408.10844v1
Compressor summary: The paper investigates how to align object detections with human preference for box size and proposes an asymmetric bounding box regression loss to encourage large boxes over small ones.
http://arxiv.org/abs/2408.10839v1
Compressor summary: The paper introduces a benchmark to compare seven algorithms for mathematical reasoning using LLMs across five datasets and four large models, exploring efficiency and performance trade-offs and practical applications.
http://arxiv.org/abs/2408.10838v1
Compressor summary: The paper presents a neural network that approximates parameter-to-solution maps for high-dimensional partial differential equations using adaptively refined finite element meshes and a reliable error estimator, achieving accuracy and complexity similar to low-rank tensor regression methods.
http://arxiv.org/abs/2408.10823v1
Compressor summary: AI image compression can affect biometric recognition, especially for iris and fabric/tattoo images, while fingerprint recognition remains robust.
http://arxiv.org/abs/2408.10822v1
Compressor summary: STGormer is a new neural network model for traffic forecasting that effectively uses spatial and temporal information in traffic data to improve prediction accuracy.
http://arxiv.org/abs/2408.10821v1
Compressor summary: The paper introduces GLAKES-Additional, a database with biennial data on over 150,000 lakes worldwide, which helps monitor lake dynamics and attribute area changes to climate and land use factors using advanced machine learning models.
http://arxiv.org/abs/2408.10819v1
Compressor summary: GS-KGC is a novel generative completion framework for knowledge graphs that uses question-answering, subgraph extraction, and contextual information to discover new triples and facts beyond existing KGs.
http://arxiv.org/abs/2408.10818v1
Compressor summary: The paper shows how randomization can improve transformer models in various tasks by learning random parameters that enhance robustness and performance.
http://arxiv.org/abs/2408.10811v1
Compressor summary: The study explores if non-English-centric LLMs have dual latent languages and how they handle cultural conflicts between them when generating text in different languages.
http://arxiv.org/abs/2408.10808v1
Compressor summary: The paper presents methods to improve small language models' performance in telecom question answering, achieving leading scores of 81.9% (Phi-2) and 57.3% (Falcon-7B).
http://arxiv.org/abs/2408.10802v1
Compressor summary: Inverse Deep Learning Ray Tracing (iDLR) is a novel method for predicting heliostat surfaces in Concentrating Solar Power plants using target images, improving safety and efficiency by accounting for non-ideal surface conditions.
http://arxiv.org/abs/2408.10798v1
Compressor summary: The paper proposes AutoAugOOD, a probabilistic auto-negative pair generation method that leverages contrastive learning to create a universal novelty detector that adapts well to different distribution shifts and settings.
http://arxiv.org/abs/2408.10795v1
Compressor summary: UAT2E is a method that tests the robustness of explainable AI models by inserting malicious triggers into inputs without changing their predictions, revealing their vulnerability in providing trustworthy explanations.
http://arxiv.org/abs/2408.10794v1
Compressor summary: Key points:
- Advanced automotive systems use CPS for driver assistance but have limitations in occluded scenarios
- V2I and V2V communication can share data about surrounding objects to improve safety
- LLMs can enhance voice assistants and communicate what other vehicles see
- GPT-4V and GPT-4o are powerful LLMs that can understand traffic situations and spot participants

Summary: The text discusses how CPS, V2I/V2V communication, and LLMs can improve automotive systems by overcoming occlusions, sharing data, and enhancing voice assistants. It also evaluates GPT-4V and GPT-4o as LLMs that can understand traffic situations and spot participants.
http://arxiv.org/abs/2408.10789v1
Compressor summary: The paper proposes a part-aware 3D reconstruction method using a hybrid representation of superquadrics and 2D Gaussians to parse objects or scenes into semantic parts from multi-view image inputs, achieving accurate geometry reconstruction and high-quality rendering.
http://arxiv.org/abs/2408.10788v1
Compressor summary: The paper examines UK university AI courses' alignment with industry demands and finds they are strong in programming and machine learning but weak in data science and maths.
http://arxiv.org/abs/2408.10787v1
Compressor summary: LightMDETR is a lightweight version of MDETR that uses a Deep Fusion Encoder to represent image and text modalities for improved object detection and classification with less computational cost.
http://arxiv.org/abs/2408.10777v1
Compressor summary: The paper proposes a method for detecting camouflaged objects using only one point supervision and unsupervised contrastive learning to address the challenges of ambiguous boundaries and feature instability.
http://arxiv.org/abs/2408.10775v1
Compressor summary: Key points:
- Machine vision enhances automation, quality control, and operational efficiency in industrial settings using visual data
- Generative AI is a promising approach to improve pattern recognition in machine vision
- A literature review based on the PRISMA guidelines was conducted to analyze over 1,200 papers on generative AI in industrial machine vision
- The main use of generative AI in machine vision is data augmentation for tasks such as classification and object detection
- The paper identifies application challenges and data requirements for successful generative AI implementation in machine vision

Summary: The paper reviews recent advances and applications of generative AI in industrial machine vision, focusing on data augmentation for tasks like classification and object detection, and highlighting the challenges and data needs for its successful use.
http://arxiv.org/abs/2408.10774v1
Compressor summary: Flexora improves fine-tuning of large language models by flexibly selecting important layers for different downstream tasks using hyperparameter optimization.
http://arxiv.org/abs/2408.10764v1
Compressor summary: Otter inserts extra parameters into transformer models to predict calibration signals for safer, more reliable language generation while saving time and space.
http://arxiv.org/abs/2408.10760v1
Compressor summary: SAM-COD is a unified framework that supports various weakly-supervised labels for camouflaged object detection, using a prompt adapter, response filter, semantic matcher, and prompt-adaptive knowledge distillation to improve performance.
http://arxiv.org/abs/2408.10755v1
Compressor summary: The text discusses a technique to generate fair data using knowledge distillation, which improves fairness, quality, and utility of synthetic data compared to existing methods.
http://arxiv.org/abs/2408.10739v1
Compressor summary: TrackNeRF improves NeRFs by enforcing global 3D consistency among feature tracks for more accurate reconstruction with sparse and noisy views.
http://arxiv.org/abs/2408.10729v1
Compressor summary: The text discusses how large language models can improve scientific tasks, but their high resource requirements call for methods to make them more cost-effective, either by reducing model size or enhancing data quality.
http://arxiv.org/abs/2408.10724v1
Compressor summary: The authors introduce a benchmark dataset for neural news detection in four languages, using outputs from various multilingual generators and classifiers, to study the interpretability and robustness of machine-generated text detectors.
http://arxiv.org/abs/2408.10722v1
Compressor summary: The paper proposes MEGen, an editing-based generative backdoor for NLP tasks, which can insert a trigger into an LLM to output dangerous information when activated.
http://arxiv.org/abs/2408.10720v1
Compressor summary: The paper proposes a new machine learning method (MLP-Mixer) to improve modeling of stiff chemical reactions in fluid dynamics, comparing it with traditional methods on a benchmark dataset.
http://arxiv.org/abs/2408.10717v1
Compressor summary: Key points:
- New surrogate modeling framework for CO2 storage operations history matching using deep learning
- Inexpensive flow-only simulations combined with coupled runs
- Effective rock compressibility, residual U-Net architectures for saturation, pressure and surface displacement predictions
- Median relative error less than 4% for all variables
- Surrogate models incorporated into hierarchical Markov chain Monte Carlo history matching workflow
- High prior uncertainty and data types affect history matching results

Summary: The authors present a new deep learning framework for history matching of CO2 storage operations using surrogate models that predict saturation, pressure and surface displacement from flow-only simulations and coupled runs. They use effective rock compressibility and residual U-Net architectures to achieve low error rates and incorporate uncertainty in the geomodels and data types.
http://arxiv.org/abs/2408.10715v1
Compressor summary: The study shows that fine-tuning LLaMA models for radiation oncology using the QLoRA algorithm improves the quality of physician letters, which are valuable in clinical practice but time-consuming to generate manually.
http://arxiv.org/abs/2408.10713v1
Compressor summary: MoMo is a new offline reinforcement learning algorithm that uses anti-exploration techniques to improve performance and handle out-of-distribution states without large ensembles of models.
http://arxiv.org/abs/2408.10711v1
Compressor summary: This paper investigates how large language models are influenced by order bias in similarity judgments, similar to humans, and discusses the implications for AI applications that rely on human values and expectations.
http://arxiv.org/abs/2408.10710v1
Compressor summary: Key points:
- The paper proposes a novel framework for multiple weld seams extraction using RGB images and 3D point clouds
- The method uses region growth to extract the fine edges of weld seams within the region of interest
- The extraction is accelerated by a pre-trained deep learning model
- The method is tested on various workpieces with linear and curved weld seams and shows potential for industrial applications

Summary: The paper presents a new method that uses RGB images and 3D point clouds to extract multiple weld seams accurately and efficiently, using region growth and a pre-trained deep learning model.
http://arxiv.org/abs/2408.10709v1
Compressor summary: The paper presents a technique to learn rules from state transitions using neural networks that are invariant to variable permutation and naming, improving generalization and scalability.
http://arxiv.org/abs/2408.10703v1
Compressor summary: LLM-Morph uses pre-trained Large Language Models to align deep features from different modal medical images for coarse-to-fine Multimodal Deformable Image Registration.
http://arxiv.org/abs/2408.10700v1
Compressor summary: AnyGraph is a unified graph foundation model that leverages Graph Mixture-of-Experts to handle structure and feature heterogeneity, enabling fast adaptation and favorable scaling in various graph tasks.
http://arxiv.org/abs/2408.10694v1
Compressor summary: The paper proposes MsMemoryGAN, a model that filters adversarial perturbations from vein images by reconstructing them using memory modules and a learnable metric.
http://arxiv.org/abs/2408.10692v1
Compressor summary: The text proposes a method to measure uncertainty in language model generations by learning the conditional dependency between generation steps using a regression model.
http://arxiv.org/abs/2408.10691v1
Compressor summary: Key points:
- Large language models are versatile but require fine-tuning and resources for deployment
- Traditional fine-tuning methods need more GPU memory than mainstream hardware can offer
- Memory-efficient methods are needed to reduce energy consumption, costs, and environmental impact
- The paper reviews memory-efficient fine-tuning methods and model compression techniques for deploying LLMs over the network edge

Summary: The paper surveys memory-efficient methods and model compression techniques to enable sustainable deployment of large language models over the network edge.
http://arxiv.org/abs/2408.10689v1
Compressor summary: The Genesis project develops robot scientists that use advanced AI and hardware to rapidly improve biological models in systems biology.
http://arxiv.org/abs/2408.10683v1
Compressor summary: This paper introduces rejection conditions for argumentation frameworks, which allow flexible rejection of arguments based on logic programs and have high expressiveness and complexity.
http://arxiv.org/abs/2408.10682v1
Compressor summary: The paper proposes Dynamic Unlearning Attack to proactively assess vulnerabilities of unlearned models and Latent Adversarial Unlearning to enhance their robustness against adversarial attacks.
http://arxiv.org/abs/2408.10681v1
Compressor summary: HMoE is a new model that uses experts with different sizes and capacities to handle variable input data, leading to better performance and efficiency compared to traditional homogeneous MoE models.
http://arxiv.org/abs/2408.10680v1
Compressor summary: The study explores strategies to improve multilingual speech models on new languages without original data while preserving performance on the original languages, using LoRA-based methods and a learnable rank coefficient.
http://arxiv.org/abs/2408.10679v1
Compressor summary: DemMamba is an alignment-free video demoireing network that uses Mamba to model moiré patterns' spatial and temporal relationships, improving performance and visual quality.
http://arxiv.org/abs/2408.10676v1
Compressor summary: The paper proposes RNA, a method that improves OOD detection and classification by using representation norm as a new dimension and training in a way that creates a noticeable difference between ID and OOD data.
http://arxiv.org/abs/2408.10672v1
Compressor summary: The paper presents NeurELA, an end-to-end framework that profiles landscape features for MetaBBO algorithms, enhancing their performance and autonomy.
http://arxiv.org/abs/2408.10670v1
Compressor summary: The text proposes a new technique combining thermal stereography and deep learning to measure waves noncontactly and accurately, overcoming challenges posed by water's optical properties.
http://arxiv.org/abs/2408.10663v1
Compressor summary: REInstruct is a method to automatically create instruction data for language models using unlabeled texts and rewriting techniques, achieving high performance without reliance on proprietary LLMs or human annotation.
http://arxiv.org/abs/2408.10653v1
Compressor summary: The paper proposes a deep unfolding network (DUN) that integrates color priors and inter-stage feature transformation to improve underwater image enhancement (UIE), offering more accurate, reliable, and explainable results than existing methods.
http://arxiv.org/abs/2408.10649v1
Compressor summary: The study shows that FINN, a novel hybrid model combining physics and machine learning, can accurately reconstruct underwater topography from wave dynamics data, outperforming conventional ML and physics-aware ML models.
http://arxiv.org/abs/2408.10646v1
Compressor summary: The study explores how well large language models can represent and share factual knowledge across different languages, finding that script similarity affects representation sharing and improving accuracy.
http://arxiv.org/abs/2408.10642v1
Compressor summary: The text introduces a paradigm for aligning large language models to human preference using supervised fine tuning and reinforcement learning from human feedback, and proposes a new training metric and loss function for supervised fine tuning.
http://arxiv.org/abs/2408.10641v1
Compressor summary: This paper reviews image-based human-object interaction detection methods, datasets, and challenges, as well as recent advancements in zero-shot, weakly supervised learning, and large-scale language models.
http://arxiv.org/abs/2408.10635v1
Compressor summary: Strategist is a new method that uses LLMs to improve skills for multi-agent games through self-play simulations and reflection, achieving better performance than existing methods in action planning and dialogue generation.
http://arxiv.org/abs/2408.10633v1
Compressor summary: Key points:
- Interactive method for generating counterfactual explanations for univariate time series classification tasks
- Uses 2D projections and decision boundary maps to improve interpretability
- Allows users to manipulate projected data points and explore hypothetical scenarios
- Validated on the ECG5000 dataset, shows significant improvements in interpretability and user understanding

Summary: The paper presents an interactive method that uses 2D projections and decision boundary maps to help users generate counterfactual explanations for univariate time series classification tasks, improving interpretability and user understanding.
http://arxiv.org/abs/2408.10631v1
Compressor summary: LLM-Barber is a one-shot pruning framework that optimizes large language models by rebuilding the sparsity mask without retraining or weight reconstruction, achieving state-of-the-art results in perplexity and zero-shot performance.
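A hedged sketch of one-shot mask rebuilding: score weights by a saliency measure and zero the least salient, with no retraining or weight reconstruction. Plain magnitude is used as the saliency here; LLM-Barber's actual criterion differs.

```python
import numpy as np

def rebuild_sparsity_mask(weight, sparsity=0.5):
    """One-shot pruning sketch: zero the weights with the smallest
    magnitude (a stand-in saliency, not LLM-Barber's criterion)."""
    flat = np.abs(weight).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k]   # k-th smallest magnitude
    mask = np.abs(weight) >= threshold
    return weight * mask, mask
```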
http://arxiv.org/abs/2408.10628v1
Compressor summary: The paper introduces Sequence Dreaming, a method to improve interpretability of neural networks for time series data by generating sequences that reflect the critical features identified by the model.
http://arxiv.org/abs/2408.10624v1
Compressor summary: WRIM-Net is a network that mines modality-invariant information for visible-infrared person re-identification by using multi-dimension interactive information mining and auxiliary-information-based contrastive learning.
http://arxiv.org/abs/2408.10623v1
Compressor summary: TextMastero is a multilingual scene text editing method using latent diffusion models with glyph conditioning and latent guidance modules to improve accuracy and style similarity.
http://arxiv.org/abs/2408.10619v1
Compressor summary: The Diffusion Based Change Detector combines diffusion models and SSIM to create accurate and interpretable change maps for remote sensing data.
http://arxiv.org/abs/2408.10616v1
Compressor summary: The Aesthetics Toolbox is an open-access, easy-to-use web tool that allows users to calculate various quantitative image properties related to visual aesthetics research using standardized and consistent methods.
http://arxiv.org/abs/2408.10615v1
Compressor summary: The paper introduces a new method to help large language models better handle irrelevant information in math problems, improving their reasoning skills.
http://arxiv.org/abs/2408.10614v1
Compressor summary: The paper proposes a novel FER pipeline that uses CLIP features and learned sigmoid masks to extract expression features, improving zero-shot generalization on unseen test sets.
http://arxiv.org/abs/2408.10610v1
Compressor summary: The authors study how well ARMA models can approximate stationary random variables using Hardy space functions and find some limitations and possibilities.
http://arxiv.org/abs/2408.10609v1
Compressor summary: PerturBench is a framework that helps predict cell perturbation effects, standardize benchmarking, and improve model evaluation in single-cell research.
http://arxiv.org/abs/2408.10608v1
Compressor summary: Bayesian-Theory based Bias Removal (BTBR) is a framework that identifies and removes implicit biases from large language models trained on biased data.
http://arxiv.org/abs/2408.10605v1
Compressor summary: MUSES is a novel AI system that generates 3D-controllable images from user queries using a progressive workflow with three key components and outperforms existing methods on newly constructed benchmarks.
http://arxiv.org/abs/2408.10604v1
Compressor summary: MuNfQuAD is a multilingual QA dataset with non-factoid questions from BBC news articles that can be answered using silver answers and has potential for low-resource languages.
http://arxiv.org/abs/2408.10600v1
Compressor summary: The paper proposes a new method using self-supervised learning and triplet networks to diagnose breast tumors from unlabeled ultrasound videos, achieving high performance with very few labeled samples.
http://arxiv.org/abs/2408.10593v1
Compressor summary: SpaMo is a novel Sign Language Translation framework that leverages spatial and motion features from visual encoders and large language models for better translation without relying on domain-specific fine-tuning.
http://arxiv.org/abs/2408.10592v1
Compressor summary: The paper proposes a hologram reasoning scheme to solve Algebra Problems with Geometry Diagrams (APGDs) using a graph model pool and deep reinforcement learning, achieving high accuracy and interpretability.
http://arxiv.org/abs/2408.10588v1
Compressor summary: DEGAS is a method that uses 3D Gaussian Splatting and a conditional variational autoencoder to create realistic full-body avatars with facial expressions driven by audio and body motion.
http://arxiv.org/abs/2408.10575v1
Compressor summary: MUSE is a TVR method that uses multi-scale representations to better match videos with natural language queries, using a feature pyramid and an efficient Mamba structure.
http://arxiv.org/abs/2408.10573v1
Compressor summary: The paper introduces a method to improve the quality of answers generated by large language models by rewriting user questions, without needing human annotations.
http://arxiv.org/abs/2408.10571v1
Compressor summary: The paper proposes a Prompt-Agnostic Adversarial Perturbation (PAP) method that protects personal images and artistic styles from customized text-to-image generation models by generating perturbations based on the prompt distribution.
http://arxiv.org/abs/2408.10566v1
Compressor summary: SparseGrow is a novel approach to overcome growth-induced forgetting (GIFt) in continual learning by using data-driven sparse layer expansion and on-data initialization to balance adaptability and knowledge retention.
http://arxiv.org/abs/2408.10557v1
Compressor summary: The paper proposes Other HuBERT, a modified speech modeling method that separates content and other information, and shows its effectiveness in encoding and learning the latter with robust data augmentation.
http://arxiv.org/abs/2408.10556v1
Compressor summary: Hokoff is a new dataset and framework for offline RL and MARL research, based on a complex real-world MOBA game, that reveals the limitations of existing methods.
http://arxiv.org/abs/2408.10555v1
Compressor summary: TOGCL is a framework that uses graph attention networks and Transformer encoders to predict QoS in service-oriented architecture by modeling user-service interactions and long-term dependencies.
http://arxiv.org/abs/2408.10549v1
Compressor summary: The article proposes using AI to improve IVR systems in call centers, reducing operator workload and improving customer service, especially for non-English languages like Kazakh.
http://arxiv.org/abs/2408.10548v1
Compressor summary: Key points:
- Tabular data is a common and challenging data type that requires advanced modeling techniques
- Transformer architectures inspired by natural language processing have revolutionized tabular data analysis
- The paper reviews different aspects of language modeling for tabular data, including data structures, datasets, models, and challenges

Summary: The paper provides a comprehensive survey of language modeling techniques for tabular data, covering various aspects such as data types, models, and tasks. It highlights the impact of transformer architectures on tabular data analysis and identifies future research directions.
http://arxiv.org/abs/2408.10543v1
Compressor summary: The paper introduces Diff-PCC, a diffusion-based point cloud compression method that uses dual-space latent representation and diffusion generator to produce high-quality reconstructions with state-of-the-art compression performance.
http://arxiv.org/abs/2408.10539v1
Compressor summary: The paper proposes a method to perform deep image matting using rough annotations like trimaps, and a directional distance consistency loss to infer alpha values at transition areas, achieving comparable or better results than fine-label-supervised methods.
http://arxiv.org/abs/2408.10538v1
Compressor summary: The text describes an AI-assisted monitoring system for laparoscopic liver resection that uses a novel dataset, PmLR50, to improve workflow recognition and detect ischemic injury caused by the Pringle maneuver.
http://arxiv.org/abs/2408.10533v1
Compressor summary: FAGStyle is a text-guided image style transfer method that uses sliding window crop, feature augmentation, and self-correlation consistency to achieve high-quality stylization while preserving content.
http://arxiv.org/abs/2408.10532v1
Compressor summary: The paper presents a computer vision-based food detection and nutrition analysis system that offers personalized meal recommendations through mobile and web applications.
http://arxiv.org/abs/2408.10528v1
Compressor summary: The paper proposes MoMatterXAI, a novel algorithm that generates alterfactual explanations for text classification tasks to ensure AI models' decisions are not biased against specific attributes.
http://arxiv.org/abs/2408.10527v1
Compressor summary: Key points:
- EdgeNAT is a one-stage transformer-based edge detector with DiNAT as the encoder
- It captures global and local features efficiently
- It uses a novel SCAF-MLA decoder to enhance feature representation
- It achieves state-of-the-art performance on RGB and depth images

Summary: EdgeNAT is a new edge detector that uses transformers with a DiNAT encoder and SCAF-MLA decoder to extract object boundaries and meaningful edges, outperforming existing methods on multiple datasets.
http://arxiv.org/abs/2408.10524v1
Compressor summary: The XCB module improves the recognition of uncommon phrases in bilingual settings by enhancing the dominant language model with auxiliary language biasing and a specific loss, without increasing inference overhead.
http://arxiv.org/abs/2408.10518v1
Compressor summary: The authors introduce a new Bangla sign language dataset and a hybrid CNN model, achieving 97.92% accuracy in recognizing Bangla signs.
http://arxiv.org/abs/2408.10517v1
Compressor summary: The paper introduces Decision MetaMamba, a model that enhances offline reinforcement learning by combining patterns from short sequences and using a State Space Model to selectively combine information from distant sequences.
http://arxiv.org/abs/2408.10516v1
Compressor summary: The study proposes a data augmentation framework using LLM and PLM to create personalized dialogue data for SDSs engaging with diverse user groups, enhancing their performance.
http://arxiv.org/abs/2408.10512v1
Compressor summary: The paper proposes a particle-filter-based estimator for human agent's execution error that can handle arbitrary shapes and changes over time, improving AI decision-making assistance.
http://arxiv.org/abs/2408.10511v1
Compressor summary: scCLG, a novel method for scRNA-seq data clustering, uses ChebAE and selective training to handle challenges such as boundary nodes and high-quality node variation, achieving superior performance compared to existing approaches.
http://arxiv.org/abs/2408.10504v1
Compressor summary: The paper introduces Query-dependent Prompt Optimization (QPO), a method that uses reinforcement learning to fine-tune a small language model to generate optimal prompts for large language models, improving their performance on various tasks without frequent interactions.
http://arxiv.org/abs/2408.10503v1
Compressor summary: The paper explores using vision transformers to classify hand images based on their unique features, explains their internal workings, and introduces adaptive distillation methods to transfer knowledge across domains without forgetting.
http://arxiv.org/abs/2408.10497v1
Compressor summary: The authors propose a new method to compress prompts for generative LLM using cross-attention and information bottleneck theory, achieving better performance and lower latency.
http://arxiv.org/abs/2408.10496v1
Compressor summary: TextileNet8 is a novel 3D point cloud dataset for textile pilling assessment; it is used to train the PointGPT+NN model, which outperforms other models in accuracy.
http://arxiv.org/abs/2408.10493v1
Compressor summary: This paper proposes a new top-down approach for spectral clustering that improves efficiency and adaptability by using local structures, density-based splitting, and a novel similarity measure for micro-clusters.
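For context, the classical building block such a top-down method recurses on is spectral bisection via the Fiedler vector; the paper's density-based splitting and micro-cluster similarity measure are more elaborate. A minimal sketch:

```python
import numpy as np

def fiedler_split(W):
    """Classical spectral bisection of a graph given a symmetric
    affinity matrix W: split nodes by the sign of the eigenvector of
    the Laplacian's second-smallest eigenvalue (the Fiedler vector)."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                 # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)     # eigenpairs in ascending order
    fiedler = vecs[:, 1]
    return fiedler >= 0                # boolean split into two clusters
```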
http://arxiv.org/abs/2408.10491v1
Compressor summary: The paper introduces $\alpha$-sig, a method that improves the convex relaxation of sigmoid activation functions for formal verification by using tunable hyperplanes to create element-wise tight bounds.
http://arxiv.org/abs/2408.10492v1
Compressor summary: The paper presents an ILA system that uses a knowledge graph and real-time voice analysis to help instructors improve their lectures' effectiveness and student learning.
http://arxiv.org/abs/2408.10490v1
Compressor summary: The paper explores how planning can help instruction-tuned LLMs reduce hallucinations in long-form text generation by guiding retrieval of relevant facts.
http://arxiv.org/abs/2408.10488v1
Compressor summary: This paper introduces Event-CSL, a high-resolution event stream sign language dataset, and a novel baseline method using Mamba model for improved AI-assisted sign language translation in various conditions.
http://arxiv.org/abs/2408.10487v1
Compressor summary: The paper proposes a new event camera-based visual tracking framework that uses Mamba networks for feature extraction, interaction, and dynamic template update to improve accuracy and efficiency.
http://arxiv.org/abs/2408.10483v1
Compressor summary: The paper introduces PRformer, a model that combines Pyramid RNN embeddings with Transformer encoder to improve temporal sequence representation and performance on time series prediction tasks.
http://arxiv.org/abs/2408.10479v1
Compressor summary: The paper proposes a one-stage reinforcement learning method for order-dispatching in ride-hailing services, improving matching efficiency and user experience.
http://arxiv.org/abs/2408.10473v1
Compressor summary: SDS is a pruning framework that enhances the performance of pre-trained language models by optimizing weight distribution through two rounds of sparse pruning.
http://arxiv.org/abs/2408.10468v1
Compressor summary: Key points:
- Large Language Models can leak sensitive information from training data
- Influence Functions can trace privacy leakage back to the data, but have limitations
- Heuristically Adjusted IF (HAIF) improves tracing accuracy by reducing weights of tokens with large gradient norms
- HAIF is tested on two datasets and outperforms existing methods

Summary: The paper proposes HAIF, which improves influence functions' ability to trace a language model's privacy leakage back to its training data by reducing the influence weights of tokens with large gradient norms. HAIF outperforms existing Influence Functions across various datasets and scenarios.
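A loose sketch of the kind of adjustment HAIF is described as making: down-weight tokens with large gradient norms when aggregating per-token gradients into a sample-level influence representation. The exact reweighting is an assumption.

```python
import numpy as np

def gradient_norm_adjusted_aggregate(token_grads):
    """Aggregate per-token gradients with weights inversely proportional
    to their norms (assumed reweighting, not HAIF's exact formula).
    token_grads: (T, D) per-token gradient vectors for one sample."""
    norms = np.linalg.norm(token_grads, axis=1)
    weights = 1.0 / (norms + 1e-8)          # large-gradient tokens count less
    weighted = (weights[:, None] * token_grads).sum(axis=0)
    return weighted / weights.sum()
```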
http://arxiv.org/abs/2408.10467v1
Compressor summary: The text proposes a method that combines an energy-based model prior with Markov Chain Monte Carlo inference for better multimodal generation across different modalities.
http://arxiv.org/abs/2408.10458v1
Compressor summary: The paper proposes a new framework that improves transfer learning in solving PDEs by combining fusion frame theory with POD-DeepONet, leading to better performance across different problems.