arxiv compressed, 2024-08-21

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-21 generated by the compressor, my personal LLM-based project.


Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement

http://arxiv.org/abs/2408.11055v1

Compressor summary: The paper proposes a framework for interpretable image enhancement using learnable filters with expressive parameters and guiding prompts, achieving better results than predefined methods.


Accelerating Goal-Conditioned RL Algorithms and Research

http://arxiv.org/abs/2408.11052v1

Compressor summary: The paper introduces JaxGCRL, a high-performance codebase for self-supervised goal-conditioned reinforcement learning, which enables faster training and experimentation on diverse environments.


FLAME: Learning to Navigate with Multimodal LLM in Urban Environments

http://arxiv.org/abs/2408.11051v1

Compressor summary: FLAME is a novel Multimodal LLM-based agent that efficiently handles urban VLN tasks by adapting to multiple observations using a three-phase tuning technique and synthesizing augmented datasets.


MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

http://arxiv.org/abs/2408.11049v1

Compressor summary: MagicDec is a technique that improves low-latency, high-throughput inference for large language models by using speculative decoding and intelligent drafting strategies.
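As background for readers unfamiliar with speculative decoding (the generic technique, not MagicDec's specific drafting strategy), here is a toy greedy sketch: a cheap draft model proposes a few tokens, and the target model accepts the longest prefix matching its own greedy choice. The `target`/`draft` callables are hypothetical stand-ins for real models.

```python
def speculative_decode(target, draft, prefix, n_draft=4, max_new=8):
    """Toy greedy speculative decoding: the draft model proposes n_draft
    tokens; the target accepts the longest prefix that matches its own
    greedy choice and supplies one corrected token at the first
    disagreement. The output always equals the target's own greedy
    decode; the speedup in real systems comes from verifying all draft
    positions in a single parallel forward pass (sequential here)."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        ctx = list(out)
        proposal = []
        for _ in range(n_draft):           # draft drafts autoregressively
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        accepted = 0                       # target verifies each position
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) == tok:
                accepted += 1
            else:
                break
        out.extend(proposal[:accepted])
        if accepted < len(proposal):       # patch in target's own token
            out.append(target(out))
    return out[len(prefix):len(prefix) + max_new]
```

Even a draft model that is frequently wrong cannot change the output, only the speed, which is what makes the technique attractive for long-context serving.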


Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders

http://arxiv.org/abs/2408.11046v1

Compressor summary: The paper explores privacy risks of pre-trained language models, finding that membership leakage can occur even with black-box outputs, which is a greater risk than previously thought.


GraphFSA: A Finite State Automaton Framework for Algorithmic Learning on Graphs

http://arxiv.org/abs/2408.11042v1

Compressor summary: GraphFSA is a framework for machine learning that learns finite state automata to run on each node of a given graph, enabling representation of complex graph algorithms with strong generalization and extrapolation abilities.


Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

http://arxiv.org/abs/2408.11039v1

Compressor summary: Transfusion trains a multi-modal transformer using language modeling and diffusion over mixed text and image data, achieving good results on uni- and cross-modal benchmarks.


Atmospheric Transport Modeling of CO$_2$ with Neural Networks

http://arxiv.org/abs/2408.11032v1

Compressor summary: This study tests four deep neural networks for their ability to accurately model CO2 distribution in the atmosphere, showing promising results with SwinTransformer achieving near-perfect emulation skill.


OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding

http://arxiv.org/abs/2408.11030v1

Compressor summary: This paper introduces GOV-3D, a challenging open vocabulary task that requires understanding 3D scenes with fine-grained attributes, and presents OpenScan, a new benchmark for it.


Scaling Law with Learning Rate Annealing

http://arxiv.org/abs/2408.11029v1

Compressor summary: The text describes a scaling law that relates cross-entropy loss curves of neural language models to learning rate annealing, which can accurately predict loss dynamics during training with low computational cost.


Athena: Safe Autonomous Agents with Verbal Contrastive Learning

http://arxiv.org/abs/2408.11021v1

Compressor summary: The Athena framework teaches language models to be safe and trustworthy by using past examples of safe and unsafe actions and guiding them with feedback on risky behavior.


While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output?

http://arxiv.org/abs/2408.11006v1

Compressor summary: The paper presents targeted attacks on LLM-based Code Completion Tools, revealing significant vulnerabilities in their security frameworks and handling of code.


MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

http://arxiv.org/abs/2408.11001v1

Compressor summary: MegaFusion is a novel diffusion-based text-to-image generation approach that efficiently produces high-resolution images with improved semantic accuracy and less object replication, using a coarse-to-fine strategy and adapting to different resolutions.


SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining

http://arxiv.org/abs/2408.11000v1

Compressor summary: SenPa-MAE is a transformer model that learns to encode sensor parameters from multispectral images, enabling cross-sensor learning for Earth observation.


Disentangling segmental and prosodic factors to non-native speech comprehensibility

http://arxiv.org/abs/2408.10997v1

Compressor summary: The text presents an accent conversion system that separates and combines segmental, voice, and prosodic features to improve the comprehensibility and social perception of non-native speech.


CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models

http://arxiv.org/abs/2408.10995v1

Compressor summary: The study proposes CTP-LLM, a large language model that predicts phase transitions in clinical trials using trial protocol texts, achieving 67% accuracy overall and 75% in predicting final approval from Phase III.


Facial Demorphing via Identity Preserving Image Decomposition

http://arxiv.org/abs/2408.10993v1

Compressor summary: The paper proposes a reference-free demorphing technique that can recover the original face images from morphs by decomposing them into identity-preserving features.


The fusion of phonography and ideographic characters into virtual Chinese characters -- Based on Chinese and English

http://arxiv.org/abs/2408.10979v1

Compressor summary: The text proposes creating a new writing system combining the advantages of Chinese and English, such as reducing vocabulary, allowing quick word recognition, and facilitating learning of complex concepts.


Hybrid Recurrent Models Support Emergent Descriptions for Hierarchical Planning and Control

http://arxiv.org/abs/2408.10970v1

Compressor summary: The paper proposes a hybrid AI algorithm that combines recurrent switching linear dynamical systems (rSLDS) with active inference to learn discrete abstractions for planning and control, achieving fast system identification and non-trivial planning on the Continuous Mountain Car task.


NLP for The Greek Language: A Longer Survey

http://arxiv.org/abs/2408.10962v1

Compressor summary: The paper surveys three decades of research on NLP tools and resources for the Greek language, including Ancient Greek and dialects, to help researchers and students working with Greek in these fields.


Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter

http://arxiv.org/abs/2408.10955v1

Compressor summary: The study used a CNN with ensemble transfer learning and a multichannel attention network to improve Bengali handwritten character recognition, achieving 92% accuracy on raw data and 98% on preprocessed data.


Wave-Mask/Mix: Exploring Wavelet-Based Augmentations for Time Series Forecasting

http://arxiv.org/abs/2408.10951v1

Compressor summary: WaveMask and WaveMix are novel data augmentation techniques for time series forecasting that use the discrete wavelet transform to preserve temporal dependencies while adjusting frequency elements.
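To make the idea concrete, here is a minimal NumPy sketch of wavelet-based augmentation using a single-level Haar transform: masking (or swapping) detail coefficients perturbs high-frequency content while the low-frequency approximation, and hence the series' trend, is untouched. The function names and the single-level Haar choice are my own simplifications, not the paper's implementation.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT of an even-length 1-D series."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (low frequency)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail (high frequency)
    return a, d

def haar_idwt(a, d):
    """Inverse single-level Haar DWT."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def wave_mask(x, p=0.2, rng=None):
    """Zero a random fraction p of detail coefficients, keeping the
    approximation (trend) intact."""
    rng = rng or np.random.default_rng(0)
    a, d = haar_dwt(x)
    d = d * (rng.random(len(d)) >= p)
    return haar_idwt(a, d)

def wave_mix(x, y):
    """Swap the detail coefficients of two series, mixing their
    high-frequency content while each keeps its own trend."""
    ax, dx = haar_dwt(x)
    ay, dy = haar_dwt(y)
    return haar_idwt(ax, dy), haar_idwt(ay, dx)
```

Because masking touches only the detail band, the augmented series keeps the original's sum and pairwise local means, which is the "preserve temporal dependencies" property the summary refers to.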


GAIM: Attacking Graph Neural Networks via Adversarial Influence Maximization

http://arxiv.org/abs/2408.10948v1

Compressor summary: GAIM is a black-box adversarial attack on GNNs that uses node feature perturbations and optimizes them with a surrogate model to maximize influence on target nodes, supporting both untargeted and label-oriented attacks.


Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models

http://arxiv.org/abs/2408.10947v1

Compressor summary: The text discusses evaluating large language models' questioning abilities as teachers using a benchmark based on Anderson and Krathwohl's taxonomy and four metrics, showing their potential in teaching different subjects.


Large Language Model Driven Recommendation

http://arxiv.org/abs/2408.10946v1

Compressor summary: The chapter explores how large language models can enhance recommendation systems by using natural language interactions and general reasoning to personalize recommendations based on diverse user preferences.


HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments

http://arxiv.org/abs/2408.10945v1

Compressor summary: HiRED is a token-dropping scheme that improves efficiency in high-resolution VLMs by selectively dropping excessive visual tokens based on attention scores.
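The core operation, keeping only the highest-attention visual tokens under a fixed budget, can be sketched in a few lines. This is the generic idea only; HiRED's actual allocation of the budget across image partitions is more involved.

```python
import numpy as np

def drop_tokens(tokens, attn_scores, budget):
    """Keep the `budget` visual tokens with the highest attention
    scores, returned in their original order so positional structure
    survives. tokens: (n, d) array; attn_scores: (n,) array."""
    keep = np.sort(np.argsort(attn_scores)[-budget:])
    return tokens[keep], keep
```

Dropping tokens before the LLM stage shrinks the sequence length, which is where the inference savings in resource-constrained settings come from.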


SysBench: Can Large Language Models Follow System Messages?

http://arxiv.org/abs/2408.10943v1

Compressor summary: SysBench is a benchmark that evaluates how well large language models follow system messages, which are crucial for customizing the models to specific scenarios.


Robust Regression with Ensembles Communicating over Noisy Channels

http://arxiv.org/abs/2408.10942v1

Compressor summary: Key points:
- Machine learning models need distributed settings due to size limitations
- Distributed inference tasks face reliability challenges from noisy channels and low-precision devices
- The problem is studied for ensemble regression methods (bagging and gradient boosting)
- Optimization methods are developed for aggregation coefficients based on noise parameters
- Effectiveness is shown on synthetic and real datasets
Summary: The paper proposes optimization methods for distributed ensemble regression with noisy channels, and evaluates their effectiveness on various datasets.


A Closer Look at Data Augmentation Strategies for Finetuning-Based Low/Few-Shot Object Detection

http://arxiv.org/abs/2408.10940v1

Compressor summary: The paper studies how different data augmentation methods affect object detection performance and energy efficiency, and proposes using the Efficiency Factor to balance these trade-offs.


Conformalized Interval Arithmetic with Symmetric Calibration

http://arxiv.org/abs/2408.10939v1

Compressor summary: The paper proposes new methods for conformal prediction of sum or average labels over specific sets, with valid coverage guarantees and improved performance in some applications.
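For context, the standard split-conformal building block the paper extends looks like this: the interval half-width is an order statistic of absolute calibration residuals. The set-level methods in the paper tighten the naive approach of simply summing per-prediction widths; this sketch shows only the single-label baseline.

```python
import numpy as np

def split_conformal_halfwidth(cal_residuals, alpha=0.1):
    """Half-width q of a split-conformal interval [y_hat - q, y_hat + q]:
    the ceil((n+1)(1-alpha))-th smallest absolute calibration residual.
    Under exchangeability this covers the truth with prob. >= 1 - alpha."""
    n = len(cal_residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(np.abs(cal_residuals))[min(k, n) - 1]
```

For a sum over m predictions, adding m such half-widths is valid but conservative, which is presumably the slack the paper's symmetric calibration targets.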


Large Point-to-Gaussian Model for Image-to-3D Generation

http://arxiv.org/abs/2408.10935v1

Compressor summary: The paper proposes a novel Point-to-Gaussian model that leverages point clouds and image features to generate high-quality 3D assets from images more efficiently than existing methods.


SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement

http://arxiv.org/abs/2408.10934v1

Compressor summary: SDI-Net is a new method for enhancing low-light stereo images by fully exploiting the correlation between binocular views using an attention mechanism.


The Evolution of Reinforcement Learning in Quantitative Finance

http://arxiv.org/abs/2408.10932v1

Compressor summary: The text reviews recent advancements in reinforcement learning (RL) applications within finance, highlighting its potential to enhance traditional solutions using machine learning techniques.


LBC: Language-Based-Classifier for Out-Of-Variable Generalization

http://arxiv.org/abs/2408.10923v1

Compressor summary: The text describes a new Language-Based-Classifier (LBC) that leverages pre-trained knowledge from Large Language Models to outperform traditional machine learning models on Out-of-Variable tasks involving tabular data.


MTFinEval: A Multi-domain Chinese Financial Benchmark with Eurypalynous questions

http://arxiv.org/abs/2408.10921v1

Compressor summary: The paper introduces MTFinEval, a new benchmark to measure the basic economic knowledge and generalization ability of LLMs, which can help select suitable LLMs for production scenarios.


Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

http://arxiv.org/abs/2408.10920v1

Compressor summary: The paper shows that gated RNNs do not always learn linear representations of input tokens, but rather their order of magnitude, challenging the strong Linear Representation Hypothesis.


CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network

http://arxiv.org/abs/2408.10919v1

Compressor summary: CrossFi is a siamese network that uses an attention mechanism and a weight-net to improve Wi-Fi sensing across various scenarios, achieving state-of-the-art results in gesture recognition and other tasks.


CHECKWHY: Causal Fact Verification via Argument Structure

http://arxiv.org/abs/2408.10918v1

Compressor summary: CheckWhy is a dataset for verifying causal relations in claims using logical reasoning and evidence, which is challenging for current language models.


To Code, or Not To Code? Exploring Impact of Code in Pre-training

http://arxiv.org/abs/2408.10914v1

Compressor summary: The study shows that using code data in pre-training improves general language understanding and reasoning skills for large language models beyond coding tasks.


BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model

http://arxiv.org/abs/2408.10903v1

Compressor summary: The BEYOND DIALOGUE framework improves role-playing by aligning dialogue with profile traits using beyond dialogue tasks, reasoning outcomes, and fine-grained alignment, achieving better performance than existing methods.


Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs

http://arxiv.org/abs/2408.10902v1

Compressor summary: Soda-Eval is a new dataset to evaluate chatbots based on Soda, revealing issues with coherence and commonsense knowledge, and showing that fine-tuning LLMs improves performance.


Analytical and Empirical Study of Herding Effects in Recommendation Systems

http://arxiv.org/abs/2408.10895v1

Compressor summary: The paper proposes a mathematical model to study how rating aggregation rules and review selection mechanisms can correct assessment errors caused by herding effects in online product ratings, and tests the model on Amazon and TripAdvisor datasets.


On Learning Action Costs from Input Plans

http://arxiv.org/abs/2408.10889v1

Compressor summary: The paper introduces a new problem of learning action costs from unlabeled input plans for optimal planning and presents an algorithm, LACFIP^k, that solves it with theoretical and empirical evidence.


Low-Quality Image Detection by Hierarchical VAE

http://arxiv.org/abs/2408.10885v1

Compressor summary: The study proposes a method to detect and explain low-quality images in unsupervised ways using hierarchical variational autoencoders.


DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection

http://arxiv.org/abs/2408.10883v1

Compressor summary: The text proposes a new method for detecting fake news using dynamic analysis and adaptive discriminators, which are more flexible than existing methods.


DBHP: Trajectory Imputation in Multi-Agent Sports Using Derivative-Based Hybrid Prediction

http://arxiv.org/abs/2408.10878v1

Compressor summary: The paper proposes a DBHP framework that uses neural networks with Set Transformers to predict missing trajectory data for multiple agents, considering their physical constraints and dynamics, and achieves better results than existing methods in real-world scenarios.


V-RoAst: A New Dataset for Visual Road Assessment

http://arxiv.org/abs/2408.10872v1

Compressor summary: The paper proposes an approach using Vision Language Models to assess road safety from crowdsourced images, overcoming limitations of traditional CNNs and providing a scalable, cost-effective, and automated solution for low-resource settings.


Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities

http://arxiv.org/abs/2408.10865v1

Compressor summary: The paper proposes new algorithms for a multi-player bandit problem with stochastic arrivals and decentralized learning, using explore then commit framework.
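For readers who don't know the explore-then-commit (ETC) framework the summary mentions, here is the single-agent baseline: pull every arm a fixed number of times, then commit to the empirically best one. The paper's multi-player, sharable-capacity setting adds decentralized coordination on top of this; none of that is shown here.

```python
import random

def explore_then_commit(pull, n_arms, m, horizon, seed=0):
    """Generic ETC: pull each arm m times, then commit to the arm with
    the best empirical mean for the remaining horizon - n_arms*m rounds.
    `pull(arm, rng)` is a hypothetical reward oracle."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for arm in range(n_arms):              # exploration phase
        rewards = [pull(arm, rng) for _ in range(m)]
        total += sum(rewards)
        means.append(sum(rewards) / m)
    best = max(range(n_arms), key=means.__getitem__)
    for _ in range(horizon - n_arms * m):  # commit phase
        total += pull(best, rng)
    return best, total
```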


Feature Selection from Differentially Private Correlations

http://arxiv.org/abs/2408.10862v1

Compressor summary: The paper evaluates a new method for private feature selection in high-dimensional datasets and shows it performs better than existing methods.
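One simple instantiation of the idea, selecting features by correlation with the label after adding Laplace noise, can be sketched as follows. The noise scale (sensitivity assumed ~1/n for bounded data) and the privacy accounting are my simplifications, not the paper's calibrated mechanism.

```python
import numpy as np

def dp_topk_features(X, y, k, epsilon, rng=None):
    """Pick k features by |correlation with y|, privatized by adding
    Laplace noise to each correlation before ranking."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    Xc = X - X.mean(axis=0)                # center features and label
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    noisy = corr + rng.laplace(scale=1.0 / (n * epsilon), size=d)
    return np.sort(np.argsort(noisy)[-k:])
```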


Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning

http://arxiv.org/abs/2408.10858v1

Compressor summary: The authors propose a multi-task reinforcement learning framework that uses a centralized reward agent to distill knowledge from various tasks and improve learning efficiency.


Perception-guided Jailbreak against Text-to-Image Models

http://arxiv.org/abs/2408.10848v1

Compressor summary: The paper proposes a method to generate natural attack prompts for Text-to-Image models that can produce inappropriate images by substituting unsafe words with similar but safe phrases.


Aligning Object Detector Bounding Boxes with Human Preference

http://arxiv.org/abs/2408.10844v1

Compressor summary: The paper investigates how to align object detections with human preference for box size and proposes an asymmetric bounding box regression loss to encourage large boxes over small ones.
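An asymmetric regression loss of this kind can be illustrated in one dimension: undershooting a target box dimension costs more than the same overshoot, which pushes predictions toward slightly larger boxes. This is illustrative of the idea only, not the paper's exact formulation.

```python
def asymmetric_size_loss(pred_size, target_size, under_w=2.0, over_w=1.0):
    """Toy asymmetric 1-D loss on a box dimension: a prediction below
    the target is penalized under_w/over_w times more than the same
    amount above it."""
    diff = pred_size - target_size
    return under_w * max(0.0, -diff) + over_w * max(0.0, diff)
```

With `under_w=2.0`, predicting 8 against a target of 10 costs twice as much as predicting 12, so a detector trained with it errs on the side of larger boxes.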


Benchmarking Large Language Models for Math Reasoning Tasks

http://arxiv.org/abs/2408.10839v1

Compressor summary: The paper introduces a benchmark to compare seven algorithms for mathematical reasoning using LLMs across five datasets and four large models, exploring efficiency and performance trade-offs and practical applications.


Multilevel CNNs for Parametric PDEs based on Adaptive Finite Elements

http://arxiv.org/abs/2408.10838v1

Compressor summary: The paper presents a neural network that approximates parameter-to-solution maps for high-dimensional partial differential equations using adaptively refined finite element meshes and a reliable error estimator, achieving accuracy and complexity similar to low-rank tensor regression methods.


Trustworthy Compression? Impact of AI-based Codecs on Biometrics for Law Enforcement

http://arxiv.org/abs/2408.10823v1

Compressor summary: AI image compression can affect biometric recognition, especially for iris and fabric/tattoo images, while fingerprint recognition remains robust.


Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting

http://arxiv.org/abs/2408.10822v1

Compressor summary: STGormer is a new neural network model for traffic forecasting that effectively uses spatial and temporal information in traffic data to improve prediction accuracy.


Constructing a High Temporal Resolution Global Lakes Dataset via Swin-Unet with Applications to Area Prediction

http://arxiv.org/abs/2408.10821v1

Compressor summary: The paper introduces GLAKES-Additional, a database with biennial data on over 150,000 lakes worldwide, which helps monitor lake dynamics and attribute area changes to climate and land use factors using advanced machine learning models.


Exploiting Large Language Models Capabilities for Question Answer-Driven Knowledge Graph Completion Across Static and Temporal Domains

http://arxiv.org/abs/2408.10819v1

Compressor summary: GS-KGC is a novel generative completion framework for knowledge graphs that uses question-answering, subgraph extraction, and contextual information to discover new triples and facts beyond existing KGs.


Learning Randomized Algorithms with Transformers

http://arxiv.org/abs/2408.10818v1

Compressor summary: The paper shows how randomization can improve transformer models in various tasks by learning random parameters that enhance robustness and performance.


Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in?

http://arxiv.org/abs/2408.10811v1

Compressor summary: The study explores if non-English-centric LLMs have dual latent languages and how they handle cultural conflicts between them when generating text in different languages.


ColBERT Retrieval and Ensemble Response Scoring for Language Model Question Answering

http://arxiv.org/abs/2408.10808v1

Compressor summary: The paper presents methods to improve small language models' performance in telecom question answering, achieving leading marks of 81.9% (Phi-2) and 57.3% (Falcon-7B).


Inverse Deep Learning Ray Tracing for Heliostat Surface Prediction

http://arxiv.org/abs/2408.10802v1

Compressor summary: Inverse Deep Learning Ray Tracing (iDLR) is a novel method for predicting heliostat surfaces in Concentrating Solar Power plants using target images, improving safety and efficiency by accounting for non-ideal surface conditions.


Universal Novelty Detection Through Adaptive Contrastive Learning

http://arxiv.org/abs/2408.10798v1

Compressor summary: The paper proposes AutoAugOOD, a probabilistic auto-negative pair generation method that leverages contrastive learning to create a universal novelty detector that adapts well to different distribution shifts and settings.


Adversarial Attack for Explanation Robustness of Rationalization Models

http://arxiv.org/abs/2408.10795v1

Compressor summary: UAT2E is a method that tests the robustness of explainable AI models by inserting malicious triggers into inputs without changing their predictions, revealing their vulnerability in providing trustworthy explanations.


Tapping in a Remote Vehicle's onboard LLM to Complement the Ego Vehicle's Field-of-View

http://arxiv.org/abs/2408.10794v1

Compressor summary: Key points:
- Advanced automotive systems use CPS for driver assistance but have limitations in occluded scenarios
- V2I and V2V communication can share data about surrounding objects to improve safety
- LLMs can enhance voice assistants and communicate what other vehicles see
- GPT-4V and GPT-4o are powerful LLMs that can understand traffic situations and spot participants
Summary: The text discusses how CPS, V2I/V2V communication, and LLMs can improve automotive systems by overcoming occlusions, sharing data, and enhancing voice assistants. It also evaluates GPT-4V and GPT-4o as LLMs that can understand traffic situations and spot participants.


Learning Part-aware 3D Representations by Fusing 2D Gaussians and Superquadrics

http://arxiv.org/abs/2408.10789v1

Compressor summary: The paper proposes a part-aware 3D reconstruction method using a hybrid representation of superquadrics and 2D Gaussians to parse objects or scenes into semantic parts from multi-view image inputs, achieving accurate geometry reconstruction and high-quality rendering.


Understanding the Skills Gap between Higher Education and Industry in the UK in Artificial Intelligence Sector

http://arxiv.org/abs/2408.10788v1

Compressor summary: The paper examines UK university AI courses' alignment with industry demands and finds they are strong in programming and machine learning but weak in data science and maths.


LightMDETR: A Lightweight Approach for Low-Cost Open-Vocabulary Object Detection Training

http://arxiv.org/abs/2408.10787v1

Compressor summary: LightMDETR is a lightweight version of MDETR that uses a Deep Fusion Encoder to represent image and text modalities for improved object detection and classification with less computational cost.


Just a Hint: Point-Supervised Camouflaged Object Detection

http://arxiv.org/abs/2408.10777v1

Compressor summary: The paper proposes a method for detecting camouflaged objects using only one point supervision and unsupervised contrastive learning to address the challenges of ambiguous boundaries and feature instability.


Generative AI in Industrial Machine Vision -- A Review

http://arxiv.org/abs/2408.10775v1

Compressor summary: Key points:
- Machine vision enhances automation, quality control, and operational efficiency in industrial settings using visual data
- Generative AI is a promising approach to improve pattern recognition in machine vision
- A literature review based on the PRISMA guidelines was conducted to analyze over 1,200 papers on generative AI in industrial machine vision
- The main use of generative AI in machine vision is data augmentation for tasks such as classification and object detection
- The paper identifies application challenges and data requirements for successful generative AI implementation in machine vision
Summary: The paper reviews recent advances and applications of generative AI in industrial machine vision, focusing on data augmentation for tasks like classification and object detection, and highlighting the challenges and data needs for its successful use.


Flexora: Flexible Low Rank Adaptation for Large Language Models

http://arxiv.org/abs/2408.10774v1

Compressor summary: Flexora improves fine-tuning of large language models by flexibly selecting important layers for different downstream tasks using hyperparameter optimization.


Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

http://arxiv.org/abs/2408.10764v1

Compressor summary: Otter inserts extra parameters into transformer models to predict calibration signals for safer, more reliable language generation while saving time and space.


SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection

http://arxiv.org/abs/2408.10760v1

Compressor summary: SAM-COD is a unified framework that supports various weakly-supervised labels for camouflaged object detection, using a prompt adapter, response filter, semantic matcher, and prompt-adaptive knowledge distillation to improve performance.


Generating Synthetic Fair Syntax-agnostic Data by Learning and Distilling Fair Representation

http://arxiv.org/abs/2408.10755v1

Compressor summary: The text discusses a technique to generate fair data using knowledge distillation, which improves fairness, quality, and utility of synthetic data compared to existing methods.


TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks

http://arxiv.org/abs/2408.10739v1

Compressor summary: TrackNeRF improves NeRFs by enforcing global 3D consistency among feature tracks for more accurate reconstruction with sparse and noisy views.


Towards Efficient Large Language Models for Scientific Text: A Review

http://arxiv.org/abs/2408.10729v1

Compressor summary: The text discusses how large language models can improve scientific tasks, but their high resource requirements call for methods to make them more cost-effective, either by reducing model size or enhancing data quality.


Crafting Tomorrow's Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian

http://arxiv.org/abs/2408.10724v1

Compressor summary: The authors introduce a benchmark dataset for neural news detection in four languages, using outputs from various multilingual generators and classifiers, to study the interpretability and robustness of machine-generated text detectors.


MEGen: Generative Backdoor in Large Language Models via Model Editing

http://arxiv.org/abs/2408.10722v1

Compressor summary: The paper proposes MEGen, an editing-based generative backdoor for NLP tasks, which can insert a trigger into an LLM to output dangerous information when activated.


Towards Foundation Models for the Industrial Forecasting of Chemical Kinetics

http://arxiv.org/abs/2408.10720v1

Compressor summary: The paper proposes a new machine learning method (MLP-Mixer) to improve modeling of stiff chemical reactions in fluid dynamics, comparing it with traditional methods on a benchmark dataset.


Accelerated training of deep learning surrogate models for surface displacement and flow, with application to MCMC-based history matching of CO2 storage operations

http://arxiv.org/abs/2408.10717v1

Compressor summary: Key points:
- New surrogate modeling framework for CO2 storage operations history matching using deep learning
- Inexpensive flow-only simulations combined with coupled runs
- Effective rock compressibility, residual U-Net architectures for saturation, pressure and surface displacement predictions
- Median relative error less than 4% for all variables
- Surrogate models incorporated into hierarchical Markov chain Monte Carlo history matching workflow
- High prior uncertainty and data types affect history matching results
Summary: The authors present a new deep learning framework for history matching of CO2 storage operations using surrogate models that predict saturation, pressure and surface displacement from flow-only simulations and coupled runs. They use effective rock compressibility and residual U-Net architectures to achieve low error rates and incorporate uncertainty in the geomodels and data types.


Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology

http://arxiv.org/abs/2408.10715v1

Compressor summary: The study shows that fine-tuning LLaMA models for radiation oncology using QLoRA algorithm improves the quality of physician letters, which are valuable in clinical practice but time-consuming to generate manually.


Offline Model-Based Reinforcement Learning with Anti-Exploration

http://arxiv.org/abs/2408.10713v1

Compressor summary: MoMo is a new offline reinforcement learning algorithm that uses anti-exploration techniques to improve performance and handle out-of-distribution states without large ensembles of models.


Investigating Context Effects in Similarity Judgements in Large Language Models

http://arxiv.org/abs/2408.10711v1

Compressor summary: This paper investigates how large language models are influenced by order bias in similarity judgments, similar to humans, and discusses the implications for AI applications that rely on human values and expectations.


Coarse-to-Fine Detection of Multiple Seams for Robotic Welding

http://arxiv.org/abs/2408.10710v1

Compressor summary: Key points:
- The paper proposes a novel framework for multiple weld seam extraction using RGB images and 3D point clouds
- The method uses region growth to extract the fine edges of the weld seams within the region of interest
- The method is accelerated by a pre-trained deep learning model
- The method is tested on various workpieces with linear and curved weld seams and shows potential for industrial applications
Summary: The paper presents a new method that uses RGB images and 3D point clouds to extract multiple weld seams accurately and efficiently, using region growth and a pre-trained deep learning model.


Variable Assignment Invariant Neural Networks for Learning Logic Programs

http://arxiv.org/abs/2408.10709v1

Compressor summary: The paper presents a technique to learn rules from state transitions using neural networks that are invariant to variable permutation and naming, improving generalization and scalability.


Large Language Models for Multimodal Deformable Image Registration

http://arxiv.org/abs/2408.10703v1

Compressor summary: LLM-Morph uses pre-trained Large Language Models to align deep features from different modal medical images for coarse-to-fine Multimodal Deformable Image Registration.


AnyGraph: Graph Foundation Model in the Wild

http://arxiv.org/abs/2408.10700v1

Compressor summary: AnyGraph is a unified graph foundation model that leverages Graph Mixture-of-Experts to handle structure and feature heterogeneity, enabling fast adaptation and favorable scaling in various graph tasks.


MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification

http://arxiv.org/abs/2408.10694v1

Compressor summary: The paper proposes MsMemoryGAN, a model that filters adversarial perturbations from vein images by reconstructing them using memory modules and a learnable metric.


Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models

http://arxiv.org/abs/2408.10692v1

Compressor summary: The text proposes a method to measure uncertainty in language model generations by learning the conditional dependency between generation steps using a regression model.


Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

http://arxiv.org/abs/2408.10691v1

Compressor summary: Noting that traditional fine-tuning of large language models demands more GPU memory than mainstream hardware offers, the paper surveys memory-efficient fine-tuning methods and model compression techniques that enable sustainable, lower-cost deployment of LLMs over the network edge.


Genesis: Towards the Automation of Systems Biology Research

http://arxiv.org/abs/2408.10689v1

Compressor summary: The Genesis project develops robot scientists that use advanced AI and hardware to rapidly improve biological models in systems biology.


Rejection in Abstract Argumentation: Harder Than Acceptance?

http://arxiv.org/abs/2408.10683v1

Compressor summary: This paper introduces rejection conditions for argumentation frameworks, which allow flexible rejection of arguments based on logic programs and have high expressiveness and complexity.


Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models

http://arxiv.org/abs/2408.10682v1

Compressor summary: The paper proposes Dynamic Unlearning Attack to proactively assess vulnerabilities of unlearned models and Latent Adversarial Unlearning to enhance their robustness against adversarial attacks.


HMoE: Heterogeneous Mixture of Experts for Language Modeling

http://arxiv.org/abs/2408.10681v1

Compressor summary: HMoE is a new model that uses experts with different sizes and capacities to handle variable input data, leading to better performance and efficiency compared to traditional homogeneous MoE models.


Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper

http://arxiv.org/abs/2408.10680v1

Compressor summary: The study explores strategies to improve multilingual speech models on new languages without original data while preserving performance on the original languages, using LoRA-based methods and a learnable rank coefficient.
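For readers unfamiliar with the mechanism, LoRA adapts a frozen weight matrix by adding a trainable low-rank update; the numpy sketch below shows only the generic forward pass, not the paper's Whisper setup, and the fixed `alpha` stands in for the learnable rank coefficient the study proposes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4                          # hidden size and LoRA rank (toy values)
W = rng.normal(size=(d, d))           # frozen pre-trained weight
A = rng.normal(size=(r, d)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                  # trainable up-projection, zero-init so the update starts at 0
alpha = 1.0                           # fixed scale here; a learnable rank coefficient would make this trainable

def lora_forward(x):
    # Frozen path plus low-rank update: equivalent to x @ (W + alpha * B @ A).T
    return x @ W.T + alpha * (x @ A.T) @ B.T

x = rng.normal(size=(2, d))
# With B zero-initialized, the adapted model reproduces the frozen model exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

Only A, B (and in the paper's case the rank coefficient) are trained, which is why such methods can adapt to new languages without touching the original weights.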


DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba

http://arxiv.org/abs/2408.10679v1

Compressor summary: DemMamba is an alignment-free video demoireing network that uses Mamba to model moire patterns' spatial and temporal relationships, improving performance and visual quality.


Representation Norm Amplification for Out-of-Distribution Detection in Long-Tail Learning

http://arxiv.org/abs/2408.10676v1

Compressor summary: The paper proposes RNA, a method that improves OOD detection and classification by using representation norm as a new dimension and training in a way that creates a noticeable difference between ID and OOD data.
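As a rough illustration of the scoring side only (the training objective that separates the norms is the paper's contribution), one can rank samples by representation norm; the assumption below, mine rather than quoted from the paper, is that training leaves ID features with larger norms:

```python
import numpy as np

def norm_ood_score(features):
    # Higher score = more OOD; a small representation norm is taken as the OOD signal.
    return -np.linalg.norm(features, axis=1)

rng = np.random.default_rng(0)
id_feats = rng.normal(size=(100, 32)) * 3.0   # pretend training amplified ID norms
ood_feats = rng.normal(size=(100, 32))        # OOD features keep smaller norms
scores = norm_ood_score(np.vstack([id_feats, ood_feats]))
# ID samples should receive lower (less OOD-like) scores than OOD samples.
assert scores[:100].mean() < scores[100:].mean()
```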


Neural Exploratory Landscape Analysis

http://arxiv.org/abs/2408.10672v1

Compressor summary: The paper presents NeurELA, an end-to-end framework that profiles landscape features for MetaBBO algorithms, enhancing their performance and autonomy.


A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning

http://arxiv.org/abs/2408.10670v1

Compressor summary: The text proposes a new technique combining thermal stereography and deep learning to measure waves noncontactly and accurately, overcoming challenges posed by water's optical properties.


REInstruct: Building Instruction Data from Unlabeled Corpus

http://arxiv.org/abs/2408.10663v1

Compressor summary: REInstruct is a method to automatically create instruction data for language models using unlabeled texts and rewriting techniques, achieving high performance without reliance on proprietary LLMs or human annotation.


UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement

http://arxiv.org/abs/2408.10653v1

Compressor summary: The paper proposes a deep unfolding network (DUN) that integrates color priors and inter-stage feature transformation to improve underwater image enhancement (UIE), offering more accurate, reliable, and explainable results than existing methods.


Inferring Underwater Topography with FINN

http://arxiv.org/abs/2408.10649v1

Compressor summary: The study shows that FINN, a novel hybrid model combining physics and machine learning, can accurately reconstruct underwater topography from wave dynamics data, outperforming conventional ML and physics-aware ML models.


Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs

http://arxiv.org/abs/2408.10646v1

Compressor summary: The study explores how well large language models can represent and share factual knowledge across different languages, finding that script similarity affects representation sharing and improving accuracy.


Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation

http://arxiv.org/abs/2408.10642v1

Compressor summary: The text reviews the paradigm of aligning large language models to human preferences via supervised fine-tuning and reinforcement learning from human feedback, then proposes a new training metric and a Minor SFT loss for the supervised stage that increase performance while reducing model deviation.


A Review of Human-Object Interaction Detection

http://arxiv.org/abs/2408.10641v1

Compressor summary: This paper reviews image-based human-object interaction detection methods, datasets, and challenges, as well as recent advancements in zero-shot, weakly supervised learning, and large-scale language models.


Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

http://arxiv.org/abs/2408.10635v1

Compressor summary: Strategist is a new method that uses LLMs to improve skills for multi-agent games through self-play simulations and reflection, achieving better performance than existing methods in action planning and dialogue generation.


Interactive Counterfactual Generation for Univariate Time Series

http://arxiv.org/abs/2408.10633v1

Compressor summary: The paper presents an interactive method that uses 2D projections and decision boundary maps to help users generate counterfactual explanations for univariate time series classification, allowing them to manipulate projected data points and explore hypothetical scenarios; validation on the ECG5000 dataset shows significant improvements in interpretability and user understanding.


LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models

http://arxiv.org/abs/2408.10631v1

Compressor summary: LLM-Barber is a one-shot pruning framework that optimizes large language models by rebuilding the sparsity mask without retraining or weight reconstruction, achieving state-of-the-art results in perplexity and zero-shot performance.
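For context, the simplest one-shot baseline that frameworks like this improve on is plain magnitude pruning, sketched below; LLM-Barber's block-aware mask rebuilding goes beyond this, and the function name is mine:

```python
import numpy as np

def one_shot_magnitude_mask(W, sparsity):
    """Binary mask zeroing the `sparsity` fraction of smallest-magnitude weights."""
    k = int(W.size * sparsity)
    thresh = np.sort(np.abs(W), axis=None)[k]
    return (np.abs(W) >= thresh).astype(W.dtype)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
mask = one_shot_magnitude_mask(W, 0.5)
W_pruned = W * mask          # one-shot: no retraining, no weight reconstruction
assert (W_pruned == 0).mean() >= 0.5
```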


Finding the DeepDream for Time Series: Activation Maximization for Univariate Time Series

http://arxiv.org/abs/2408.10628v1

Compressor summary: The paper introduces Sequence Dreaming, a method to improve interpretability of neural networks for time series data by generating sequences that reflect the critical features identified by the model.


WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification

http://arxiv.org/abs/2408.10624v1

Compressor summary: WRIM-Net is a network that mines modality-invariant information for visible-infrared person re-identification by using multi-dimension interactive information mining and auxiliary-information-based contrastive learning.


TextMastero: Mastering High-Quality Scene Text Editing in Diverse Languages and Styles

http://arxiv.org/abs/2408.10623v1

Compressor summary: TextMastero is a multilingual scene text editing method using latent diffusion models with glyph conditioning and latent guidance modules to improve accuracy and style similarity.


Novel Change Detection Framework in Remote Sensing Imagery Using Diffusion Models and Structural Similarity Index (SSIM)

http://arxiv.org/abs/2408.10619v1

Compressor summary: The Diffusion Based Change Detector combines diffusion models and SSIM to create accurate and interpretable change maps for remote sensing data.
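The detector itself is not reproduced here, but the SSIM half of the idea is standard: low structural similarity between co-registered patches flags change. A minimal patch-wise sketch (function names and patch size are my choices, not the paper's):

```python
import numpy as np

def ssim_patch(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM between two equally sized patches (standard constants for 8-bit range)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def change_map(before, after, patch=8):
    """1 - SSIM per non-overlapping patch: high values flag likely change."""
    h, w = before.shape
    out = np.zeros((h // patch, w // patch))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            sl = np.s_[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            out[i, j] = 1.0 - ssim_patch(before[sl].astype(float), after[sl].astype(float))
    return out

rng = np.random.default_rng(0)
before = rng.integers(0, 256, (32, 32)).astype(float)
after = before.copy()
after[:8, :8] = 255 - after[:8, :8]      # synthetic change in the top-left patch
cmap = change_map(before, after)
assert cmap[0, 0] == cmap.max()          # the altered patch stands out
```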


A toolbox for calculating objective image properties in aesthetics research

http://arxiv.org/abs/2408.10616v1

Compressor summary: The Aesthetics Toolbox is an open-access, easy-to-use web tool that allows users to calculate various quantitative image properties related to visual aesthetics research using standardized and consistent methods.


Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information

http://arxiv.org/abs/2408.10615v1

Compressor summary: The paper introduces a new method to help large language models better handle irrelevant information in math problems, improving their reasoning skills.


Generalizable Facial Expression Recognition

http://arxiv.org/abs/2408.10614v1

Compressor summary: The paper proposes a novel FER pipeline that uses CLIP features and learned sigmoid masks to extract expression features, improving zero-shot generalization on unseen test sets.


On the Approximability of Stationary Processes using the ARMA Model

http://arxiv.org/abs/2408.10610v1

Compressor summary: The authors study how well ARMA models can approximate stationary random variables using Hardy space functions and find some limitations and possibilities.
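As background, fitting the AR side of an ARMA approximation to a stationary series is classically done with the Yule-Walker equations; the sketch below is a generic illustration (function names mine), not the paper's Hardy-space analysis:

```python
import numpy as np

def yule_walker(x, p):
    """Fit AR(p) coefficients to a zero-mean stationary series via Yule-Walker."""
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])   # autocovariances
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz system
    return np.linalg.solve(R, r[1:])

rng = np.random.default_rng(0)
phi_true = 0.7
e = rng.normal(size=5000)
x = np.zeros(5000)
for t in range(1, 5000):                 # simulate AR(1): x_t = 0.7 x_{t-1} + e_t
    x[t] = phi_true * x[t - 1] + e[t]
phi_hat = yule_walker(x, 1)[0]
assert abs(phi_hat - phi_true) < 0.05    # recovered coefficient close to the truth
```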


PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis

http://arxiv.org/abs/2408.10609v1

Compressor summary: PerturBench is a framework that helps predict cell perturbation effects, standardize benchmarking, and improve model evaluation in single-cell research.


Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory

http://arxiv.org/abs/2408.10608v1

Compressor summary: Bayesian-Theory based Bias Removal (BTBR) is a framework that identifies and removes implicit biases from large language models trained on biased data.


MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

http://arxiv.org/abs/2408.10605v1

Compressor summary: MUSES is a novel AI system that generates 3D-controllable images from user queries using a progressive workflow with three key components and outperforms existing methods on newly constructed benchmarks.


Multilingual Non-Factoid Question Answering with Silver Answers

http://arxiv.org/abs/2408.10604v1

Compressor summary: MuNfQuAD is a multilingual QA dataset of non-factoid questions from BBC news articles paired with silver answers, with potential for low-resource languages.


Breast tumor classification based on self-supervised contrastive learning from ultrasound videos

http://arxiv.org/abs/2408.10600v1

Compressor summary: The paper proposes a new method using self-supervised learning and triplet networks to diagnose breast tumors from unlabeled ultrasound videos, achieving high performance with very few labeled samples.


An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs

http://arxiv.org/abs/2408.10593v1

Compressor summary: SpaMo is a novel Sign Language Translation framework that leverages spatial and motion features from visual encoders and large language models for better translation without relying on domain-specific fine-tuning.


Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams

http://arxiv.org/abs/2408.10592v1

Compressor summary: The paper proposes a hologram reasoning scheme to solve Algebra Problems with Geometry Diagrams (APGDs) using a graph model pool and deep reinforcement learning, achieving high accuracy and interpretability.


DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

http://arxiv.org/abs/2408.10588v1

Compressor summary: DEGAS is a method that uses 3D Gaussian Splatting and a conditional variational autoencoder to create realistic full-body avatars with facial expressions driven by audio and body motion.


MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval

http://arxiv.org/abs/2408.10575v1

Compressor summary: MUSE is a TVR method that uses multi-scale representations to better match videos with natural language queries, using a feature pyramid and an efficient Mamba structure.


Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter

http://arxiv.org/abs/2408.10573v1

Compressor summary: The paper introduces a method to improve the quality of answers generated by large language models by rewriting user questions, without needing human annotations.


Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models

http://arxiv.org/abs/2408.10571v1

Compressor summary: The paper proposes a Prompt-Agnostic Adversarial Perturbation (PAP) method that protects personal images and artistic styles from customized text-to-image generation models by generating perturbations based on the prompt distribution.


SparseGrow: Addressing Growth-Induced Forgetting in Task-Agnostic Continual Learning

http://arxiv.org/abs/2408.10566v1

Compressor summary: SparseGrow is a novel approach to overcome growth-induced forgetting (GIFt) in continual learning by using data-driven sparse layer expansion and on-data initialization to balance adaptability and knowledge retention.


Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation

http://arxiv.org/abs/2408.10557v1

Compressor summary: The paper proposes Other HuBERT, a modified speech representation learning method that uses separate learnable parameters to disentangle content from other (non-content) information, and shows that robust data augmentation is needed to encode the latter effectively.


Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

http://arxiv.org/abs/2408.10556v1

Compressor summary: Hokoff is a new dataset and framework for offline RL and MARL research, based on a complex real-world MOBA game, that reveals the limitations of existing methods.


Target-Prompt Online Graph Collaborative Learning for Temporal QoS Prediction

http://arxiv.org/abs/2408.10555v1

Compressor summary: TOGCL is a framework that uses graph attention networks and Transformer encoders to predict QoS in service-oriented architecture by modeling user-service interactions and long-term dependencies.


AI-Based IVR

http://arxiv.org/abs/2408.10549v1

Compressor summary: The article proposes using AI to improve IVR systems in call centers, reducing operator workload and improving customer service, especially for non-English languages like Kazakh.


Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution

http://arxiv.org/abs/2408.10548v1

Compressor summary: The paper provides a comprehensive survey of language modeling techniques for tabular data, covering data structures, datasets, models, and tasks; it highlights how transformer architectures inspired by natural language processing have revolutionized tabular data analysis and identifies future research directions.


Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds

http://arxiv.org/abs/2408.10543v1

Compressor summary: The paper introduces Diff-PCC, a diffusion-based point cloud compression method that uses dual-space latent representation and diffusion generator to produce high-quality reconstructions with state-of-the-art compression performance.


Training Matting Models without Alpha Labels

http://arxiv.org/abs/2408.10539v1

Compressor summary: The paper proposes a method to perform deep image matting using rough annotations like trimaps, and a directional distance consistency loss to infer alpha values at transition areas, achieving comparable or better results than fine-label-supervised methods.


Surgical Workflow Recognition and Blocking Effectiveness Detection in Laparoscopic Liver Resections with Pringle Maneuver

http://arxiv.org/abs/2408.10538v1

Compressor summary: The text describes an AI-assisted monitoring system for laparoscopic liver resection that uses a novel dataset, PmLR50, to improve workflow recognition and detect ischemic injury caused by the Pringle maneuver.


FAGStyle: Feature Augmentation on Geodesic Surface for Zero-shot Text-guided Diffusion Image Style Transfer

http://arxiv.org/abs/2408.10533v1

Compressor summary: FAGStyle is a text-guided image style transfer method that uses sliding window crop, feature augmentation, and self-correlation consistency to achieve high-quality stylization while preserving content.


NutrifyAI: An AI-Powered System for Real-Time Food Detection, Nutritional Analysis, and Personalized Meal Recommendations

http://arxiv.org/abs/2408.10532v1

Compressor summary: The paper presents a computer vision-based food detection and nutrition analysis system that offers personalized meal recommendations through mobile and web applications.


NoMatterXAI: Generating "No Matter What" Alterfactual Examples for Explaining Black-Box Text Classification Models

http://arxiv.org/abs/2408.10528v1

Compressor summary: The paper proposes NoMatterXAI, a novel algorithm that generates alterfactual ("no matter what") explanations for text classification tasks to ensure AI models' decisions are not biased against specific attributes.


EdgeNAT: Transformer for Efficient Edge Detection

http://arxiv.org/abs/2408.10527v1

Compressor summary: EdgeNAT is a one-stage transformer-based edge detector that pairs a DiNAT encoder, which captures global and local features efficiently, with a novel SCAF-MLA decoder that enhances feature representation, achieving state-of-the-art performance on RGB and depth images across multiple datasets.


XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition

http://arxiv.org/abs/2408.10524v1

Compressor summary: The XCB module improves the recognition of uncommon phrases in bilingual settings by enhancing the dominant language model with auxiliary language biasing and a specific loss, without increasing inference overhead.


BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition

http://arxiv.org/abs/2408.10518v1

Compressor summary: The authors introduce a new Bangla sign language dataset and a hybrid CNN model, achieving 97.92% accuracy in recognizing Bangla signs.


Integrating Multi-Modal Input Token Mixer Into Mamba-Based Decision Models: Decision MetaMamba

http://arxiv.org/abs/2408.10517v1

Compressor summary: The paper introduces Decision MetaMamba, a model that enhances offline reinforcement learning by combining patterns from short sequences and using a State Space Model to selectively combine information from distant sequences.


Data Augmentation Integrating Dialogue Flow and Style to Adapt Spoken Dialogue Systems to Low-Resource User Groups

http://arxiv.org/abs/2408.10516v1

Compressor summary: The study proposes a data augmentation framework using LLM and PLM to create personalized dialogue data for SDSs engaging with diverse user groups, enhancing their performance.


Approximate Estimation of High-dimension Execution Skill for Dynamic Agents in Continuous Domains

http://arxiv.org/abs/2408.10512v1

Compressor summary: The paper proposes a particle-filter-based estimator for human agent's execution error that can handle arbitrary shapes and changes over time, improving AI decision-making assistance.
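A toy version of the idea, under my own simplifying assumptions (1-D actions, Gaussian execution noise with unknown scale, all names mine), treats particles as hypotheses about the noise level and reweights them by the likelihood of the observed outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)
true_sigma = 0.8
intended = rng.uniform(-5, 5, size=200)                   # actions the agent meant to take
observed = intended + rng.normal(0, true_sigma, 200)      # outcomes blurred by execution noise

particles = rng.uniform(0.1, 3.0, size=2000)              # hypotheses about the noise scale sigma
for a, y in zip(intended, observed):
    w = np.exp(-0.5 * ((y - a) / particles) ** 2) / particles   # Gaussian likelihood of the outcome
    w /= w.sum()
    particles = rng.choice(particles, size=particles.size, p=w)  # resample toward plausible sigmas
    particles = np.abs(particles + rng.normal(0, 0.02, particles.size))  # jitter keeps diversity

sigma_hat = particles.mean()
assert abs(sigma_hat - true_sigma) < 0.3
```

Because the jitter step lets particles drift, the same filter can track an error distribution that changes over time, which is the motivation the summary points to.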


Single-cell Curriculum Learning-based Deep Graph Embedding Clustering

http://arxiv.org/abs/2408.10511v1

Compressor summary: scCLG, a novel method for scRNA-seq data clustering, uses ChebAE and selective training to handle challenges such as boundary nodes and high-quality node variation, achieving superior performance compared to existing approaches.


QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

http://arxiv.org/abs/2408.10504v1

Compressor summary: The paper introduces Query-dependent Prompt Optimization (QPO), a method that uses reinforcement learning to fine-tune a small language model to generate optimal prompts for large language models, improving their performance on various tasks without frequent interactions.


Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers

http://arxiv.org/abs/2408.10503v1

Compressor summary: The paper explores using vision transformers to classify hand images based on their unique features, explains their internal workings, and introduces adaptive distillation methods to transfer knowledge across domains without forgetting.


QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention

http://arxiv.org/abs/2408.10497v1

Compressor summary: The authors propose a new method to compress prompts for generative LLMs using cross-attention and information bottleneck theory, achieving better performance and lower latency.


GPT-based Textile Pilling Classification Using 3D Point Cloud Data

http://arxiv.org/abs/2408.10496v1

Compressor summary: The authors introduce TextileNet8, a novel 3D point cloud dataset for textile pilling assessment, and show that a PointGPT+NN model trained on it achieves high accuracy, outperforming other models.


Clustering by Mining Density Distributions and Splitting Manifold Structure

http://arxiv.org/abs/2408.10493v1

Compressor summary: This paper proposes a new top-down approach for spectral clustering that improves efficiency and adaptability by using local structures, density-based splitting, and a novel similarity measure for micro-clusters.


Achieving the Tightest Relaxation of Sigmoids for Formal Verification

http://arxiv.org/abs/2408.10491v1

Compressor summary: The paper introduces $\alpha$-sig, a method that improves the convex relaxation of sigmoid activation functions for formal verification by using tunable hyperplanes to create element-wise tight bounds.
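The geometric intuition is easy to sketch: on a region where sigmoid is convex (x <= 0), the chord is a sound upper bound and any tangent is a sound lower bound; $\alpha$-sig's contribution is making the tangent point tunable, whereas the sketch below (names mine) fixes it at the midpoint:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relax_sigmoid_convex(l, u):
    """Linear bounds on sigmoid over [l, u] with u <= 0, where sigmoid is convex."""
    assert u <= 0
    # Upper bound: chord through (l, sig(l)) and (u, sig(u))
    k_up = (sigmoid(u) - sigmoid(l)) / (u - l)
    b_up = sigmoid(l) - k_up * l
    # Lower bound: tangent at the midpoint m; slope is sig'(m) = sig(m)(1 - sig(m))
    m = 0.5 * (l + u)
    k_lo = sigmoid(m) * (1 - sigmoid(m))
    b_lo = sigmoid(m) - k_lo * m
    return (k_lo, b_lo), (k_up, b_up)

l, u = -4.0, -1.0
(k_lo, b_lo), (k_up, b_up) = relax_sigmoid_convex(l, u)
xs = np.linspace(l, u, 201)
# Soundness check: the linear functions bracket sigmoid over the whole interval.
assert np.all(k_lo * xs + b_lo <= sigmoid(xs) + 1e-9)
assert np.all(k_up * xs + b_up >= sigmoid(xs) - 1e-9)
```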


Is the Lecture Engaging for Learning? Lecture Voice Sentiment Analysis for Knowledge Graph-Supported Intelligent Lecturing Assistant (ILA) System

http://arxiv.org/abs/2408.10492v1

Compressor summary: The paper presents an ILA system that uses a knowledge graph and real-time voice analysis to help instructors improve their lectures' effectiveness and student learning.


Analysis of Plan-based Retrieval for Grounded Text Generation

http://arxiv.org/abs/2408.10490v1

Compressor summary: The paper explores how planning can help instruction-tuned LLMs reduce hallucinations in long-form text generation by guiding retrieval of relevant facts.


Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm

http://arxiv.org/abs/2408.10488v1

Compressor summary: This paper introduces Event-CSL, a high-resolution event stream sign language dataset, and a novel baseline method using Mamba model for improved AI-assisted sign language translation in various conditions.


MambaEVT: Event Stream based Visual Object Tracking using State Space Model

http://arxiv.org/abs/2408.10487v1

Compressor summary: The paper proposes a new event camera-based visual tracking framework that uses Mamba networks for feature extraction, interaction, and dynamic template update to improve accuracy and efficiency.


PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting

http://arxiv.org/abs/2408.10483v1

Compressor summary: The paper introduces PRformer, a model that combines Pyramid RNN embeddings with Transformer encoder to improve temporal sequence representation and performance on time series prediction tasks.


An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing

http://arxiv.org/abs/2408.10479v1

Compressor summary: The paper proposes a one-stage reinforcement learning method for order-dispatching in ride-hailing services, improving matching efficiency and user experience.


Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism

http://arxiv.org/abs/2408.10473v1

Compressor summary: SDS is a pruning framework that enhances the performance of pre-trained language models by optimizing weight distribution through two rounds of sparse pruning.


Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions

http://arxiv.org/abs/2408.10468v1

Compressor summary: The paper proposes Heuristically Adjusted Influence Functions (HAIF), which trace a large language model's privacy leakage back to its training data more accurately by reducing the weights of tokens with large gradient norms; tested on two datasets, HAIF outperforms existing influence functions across various scenarios.


Learning Multimodal Latent Space with EBM Prior and MCMC Inference

http://arxiv.org/abs/2408.10467v1

Compressor summary: The text proposes a method that combines an energy-based model prior with Markov Chain Monte Carlo inference for better multimodal generation across different modalities.


Transfer Operator Learning with Fusion Frame

http://arxiv.org/abs/2408.10458v1

Compressor summary: The paper proposes a new framework that improves transfer learning in solving PDEs by combining fusion frame theory with POD-DeepONet, leading to better performance across different problems.