arxiv compressed, 2024-01-29

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-29 generated by the compressor, my personal LLM-based project.


EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

http://arxiv.org/abs/2401.15077v1

Compressor summary: EAGLE is a fast and lossless framework for accelerating Large Language Models using speculative sampling at the second-top-layer feature level.
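
EAGLE's feature-level drafting can't be reconstructed from a one-line summary, but the accept/reject rule of generic speculative sampling is standard background. A minimal sketch with toy next-token distributions (all values assumed, no real models):

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft):
    """One draft-and-verify step of generic speculative sampling.

    p_target, q_draft: next-token probability vectors over the same vocabulary.
    Returns a token index distributed exactly according to p_target.
    """
    x = rng.choice(len(q_draft), p=q_draft)          # draft model proposes a token
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x                                     # target model accepts the draft
    residual = np.maximum(p_target - q_draft, 0.0)   # otherwise resample from the residual
    return rng.choice(len(p_target), p=residual / residual.sum())

# Toy distributions standing in for real model outputs (illustration only).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(speculative_step(p, q))
```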


Annotated Hands for Generative Models

http://arxiv.org/abs/2401.15075v1

Compressor summary: The authors propose a new way to train generative models, such as GANs and diffusion models, to create more realistic hand images by adding extra information about hands in the training data.


From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

http://arxiv.org/abs/2401.15071v1

Compressor summary: The paper studies how well large language models can handle different types of input (text, code, image, video) and their trustworthiness, generalizability, and causal reasoning abilities using qualitative analysis on various models including GPT-4 and Gemini.


Pairing Orthographically Variant Literary Words to Standard Equivalents Using Neural Edit Distance Models

http://arxiv.org/abs/2401.15068v1

Compressor summary: The paper introduces a new corpus of orthographically variant words from 19th-century U.S. literature paired with standard equivalents, trains neural edit distance models on it, and compares their performance with models trained on L2 English learners' errors, using different negative sample generation strategies.
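
For background only: the classic dynamic-programming edit distance that neural edit distance models generalize by learning the per-operation costs. This is not the paper's model, just the standard Levenshtein recurrence:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

# e.g. scoring an orthographic variant against a candidate standard form
print(edit_distance("colour'd", "colored"))
```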


Expert with Clustering: Hierarchical Online Preference Learning Framework

http://arxiv.org/abs/2401.15062v1

Compressor summary: EWC is a hierarchical contextual bandit framework that leverages low-dimensional latent space to accelerate user preference learning and minimize regret in mobility recommendation systems.


Fully Independent Communication in Multi-Agent Reinforcement Learning

http://arxiv.org/abs/2401.15059v1

Compressor summary: The paper explores how independent agents can communicate in multi-agent reinforcement learning without parameter sharing, proposes a new learning scheme for this setting, and studies the impact of network capacities on communication efficiency.


Deep learning-based approach for tomato classification in complex scenes

http://arxiv.org/abs/2401.15055v1

Compressor summary: The text proposes an AI-based computer vision system that can detect and classify different stages of ripening in tomato plants to optimize harvesting time.


LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents

http://arxiv.org/abs/2401.15050v1

Compressor summary: LongFin is a multimodal document AI model that can handle long financial documents and outperforms existing models on the new LongForms dataset.


Unrecognizable Yet Identifiable: Image Distortion with Preserved Embeddings

http://arxiv.org/abs/2401.15048v1

Compressor summary: The paper introduces an image distortion technique that keeps biometric facial images unrecognizable to human eyes but identifiable by neural networks for privacy-aware biometric authentication systems.


Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement Learning

http://arxiv.org/abs/2401.15043v1

Compressor summary: The authors introduce SimpleDC, a corpus for cancer education text simplification research, and explore various LLM-based methods, finding that RLHF with a novel reward function improves performance across metrics and adapts out-of-domain models to targeted domains.


PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models

http://arxiv.org/abs/2401.15042v1

Compressor summary: The study introduces a framework called ProxyQA to evaluate the quality of long-form text generation by LLMs using human-curated meta-questions and proxy-questions with annotated answers.


On the generalization capacity of neural networks during generic multimodal reasoning

http://arxiv.org/abs/2401.15030v1

Compressor summary: The study evaluates various neural network architectures' ability to handle different types of multimodal generalization tasks and introduces gCOG, a new benchmark for multimodal reasoning research.


Learning Neural Radiance Fields of Forest Structure for Scalable and Fine Monitoring

http://arxiv.org/abs/2401.15029v1

Compressor summary: Neural radiance fields improve forest monitoring by capturing fine 3D structures and integrating multiple remote sensing modalities.


SliceGPT: Compress Large Language Models by Deleting Rows and Columns

http://arxiv.org/abs/2401.15024v1

Compressor summary: SliceGPT is a new sparsification technique that reduces the embedding dimension of large language models, enabling faster inference with fewer GPUs and less memory.
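
SliceGPT's actual procedure relies on computational invariance of transformer blocks and is not reproduced here; the sketch below only illustrates the general idea of shrinking a hidden dimension by rotating activations and deleting rows, using a toy PCA-style projection on assumed random data:

```python
import numpy as np

def slice_linear(W, X_calib, keep):
    """Shrink a linear layer's output dimension from d to `keep` by projecting
    onto the top principal directions of its calibration activations.

    W: (d, d_in) weight matrix; X_calib: (n, d_in) calibration inputs.
    Returns a (keep, d_in) sliced weight and the (d, keep) projection used.
    """
    acts = X_calib @ W.T                                   # (n, d) activations
    _, _, Vt = np.linalg.svd(acts - acts.mean(0), full_matrices=False)
    Q = Vt[:keep].T                                        # (d, keep) top directions
    return Q.T @ W, Q                                      # rows "deleted" in the rotated basis

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(100, 16))
W_small, Q = slice_linear(W, X, keep=4)
print(W_small.shape)                                       # (4, 16)
```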


Airavata: Introducing Hindi Instruction-tuned LLM

http://arxiv.org/abs/2401.15006v1

Compressor summary: Airavata is a new Hindi-tuned LLM that improves OpenHathi's performance for assistive tasks, and comes with a dataset and evaluation framework to support further research on Indic languages.


BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

http://arxiv.org/abs/2401.15002v1

Compressor summary: BackdoorBench is a comprehensive benchmark for backdoor learning that provides an integrated implementation, comprehensive evaluation, and abundant analysis of state-of-the-art algorithms, helping researchers investigate, develop, and explore this field.


Graph-based Active Learning for Entity Cluster Repair

http://arxiv.org/abs/2401.14992v1

Compressor summary: The study proposes a new cluster repair method using graph metrics and active learning to handle duplicate-containing data sources effectively.


Mapping-to-Parameter Nonlinear Functional Regression with Novel B-spline Free Knot Placement Algorithm

http://arxiv.org/abs/2401.14989v1

Compressor summary: The Mapping-to-Parameter function model is a novel approach to nonlinear functional regression that uses B-spline basis functions and a new knot placement algorithm to map complex functions from infinite-dimensional space to finite-dimensional parameter space, outperforming existing methods in various applications.


Masked Pre-trained Model Enables Universal Zero-shot Denoiser

http://arxiv.org/abs/2401.14966v1

Compressor summary: The paper proposes a new image denoising method called MPI that uses a masking strategy to pre-train a model on natural images and then iteratively fills in the masked parts for efficient denoising of single noisy images.


Learning Universal Predictors

http://arxiv.org/abs/2401.14953v1

Compressor summary: This paper explores using Solomonoff Induction, a powerful universal predictor, in neural networks by leveraging meta-learning with Universal Turing Machines data to push the limits of problem solving.


DAM: Diffusion Activation Maximization for 3D Global Explanations

http://arxiv.org/abs/2401.14938v1

Compressor summary: The paper proposes DAM, a point cloud explainability method that uses a novel model and generates high-quality explanations with an adapted path gradient integration method, improving on existing methods in various aspects.


SSDOnt: an Ontology for representing Single-Subject Design Studies

http://arxiv.org/abs/2401.14933v1

Compressor summary: SSDOnt is an ontology for describing and annotating single-subject design studies, enabling complex questions and searches about them.


Do LLMs Dream of Ontologies?

http://arxiv.org/abs/2401.14931v1

Compressor summary: This paper studies how well large language models (LLMs) remember concepts from known ontologies, finding that their memorization depends on the popularity of these concepts online.


Reinforcement Learning Interventions on Boundedly Rational Human Agents in Frictionful Tasks

http://arxiv.org/abs/2401.14923v1

Compressor summary: The paper introduces Behavior Model Reinforcement Learning (BMRL), an AI framework that helps individuals achieve their goals by personalizing and interpreting interventions on their decision-making processes.


PARSAC: Accelerating Robust Multi-Model Fitting with Parallel Sample Consensus

http://arxiv.org/abs/2401.14919v1

Compressor summary:
Key points:
- Method for robust estimation of geometric models from noisy data in real-time
- Neural network segments input data into clusters representing potential model instances
- Determines model parameters for each instance separately using sample and inlier weights
- Trained via task-specific loss functions and new synthetic datasets
Summary: The paper proposes a fast and accurate method that uses a neural network to segment noisy data into clusters of geometric models and estimates their parameters with a RANSAC-like technique.


MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer

http://arxiv.org/abs/2401.14895v1

Compressor summary: SQ-b and OPT-m are techniques that improve post-training quantization of vision transformers, achieving significant accuracy improvements on various bit-width settings.


A structured regression approach for evaluating model performance across intersectional subgroups

http://arxiv.org/abs/2401.14893v1

Compressor summary: The paper proposes a structured regression method for AI fairness evaluation across intersectional subgroups, improving accuracy and providing confidence intervals and insights into harm factors.


Cross-Space Adaptive Filter: Integrating Graph Topology and Node Attributes for Alleviating the Over-smoothing Problem

http://arxiv.org/abs/2401.14876v1

Compressor summary: The paper proposes a cross-space adaptive filter (CSF) for Graph Convolutional Networks that combines topology and node attributes to address the over-smoothing problem and improve node classification performance.
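
The CSF filter itself is not reproducible from the summary; as background on the over-smoothing problem it targets, here is a minimal sketch of a standard GCN propagation step (Kipf-Welling form) on an assumed toy graph:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One standard GCN propagation step: symmetrically normalized adjacency
    (with self-loops) times features times weights."""
    A_hat = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W

# Stacking many such smoothing steps drives node features toward similar
# values, which is the over-smoothing behaviour the paper addresses.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H, W = np.eye(3), np.eye(3)
for _ in range(10):
    H = gcn_layer(A, H, W)
print(np.round(H, 3))
```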


F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods

http://arxiv.org/abs/2401.14869v1

Compressor summary: F-Eval is a bilingual benchmark to evaluate large language models based on expression, commonsense, and logic, using tasks that better assess their fundamental abilities than previous methods.


Implicit Neural Representation for Physics-driven Actuated Soft Bodies

http://arxiv.org/abs/2401.14861v1

Compressor summary: The paper presents a new method for controlling active soft bodies using neural networks and a physics-based simulation, which can accurately reproduce facial expressions and is easy to use.


Memory-Inspired Temporal Prompt Interaction for Text-Image Classification

http://arxiv.org/abs/2401.14856v1

Compressor summary: Our proposed method, Memory-Inspired Temporal Prompt Interaction (MITP), uses a two-stage human memory strategy to efficiently align vision and language modalities in large-scale multimodal models, reducing computational cost.


Extracting Process-Aware Decision Models from Object-Centric Process Data

http://arxiv.org/abs/2401.14847v1

Compressor summary: The paper introduces IODDA, a novel algorithm that discovers how decisions are made and structured in complex business processes using object-centric process logs.


Understanding Domain Generalization: A Noise Robustness Perspective

http://arxiv.org/abs/2401.14846v1

Compressor summary: The text discusses how machine learning algorithms for domain generalization may perform better than classic empirical risk minimization in dealing with label noise, but this advantage does not always translate to real-world benchmarks.


Adaptive Point Transformer

http://arxiv.org/abs/2401.14845v1

Compressor summary: AdaPT is a point cloud transformer model that adapts its token selection and budget at inference time, enabling efficient processing of large point clouds without sacrificing accuracy.


GuardML: Efficient Privacy-Preserving Machine Learning Services Through Hybrid Homomorphic Encryption

http://arxiv.org/abs/2401.14840v1

Compressor summary: The text introduces hybrid homomorphic encryption as a solution to protect privacy in machine learning, especially for classifying heart disease using encrypted ECG data.


Multi-modality action recognition based on dual feature shift in vehicle cabin monitoring

http://arxiv.org/abs/2401.14838v1

Compressor summary: The paper proposes a new method called DFS for recognizing driver actions using multiple camera modalities in car cabins, which integrates complementary features across modalities and shares feature extraction stages.


Text Image Inpainting via Global Structure-Guided Diffusion Models

http://arxiv.org/abs/2401.14832v1

Compressor summary: The paper introduces two new text inpainting datasets, one for scene text and one for handwritten text, and a novel neural framework called GSDM that uses global structure to restore corrupted texts with improved accuracy and quality.


TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

http://arxiv.org/abs/2401.14828v1

Compressor summary: TIP-Editor is a 3D scene editor that uses text, images, and bounding boxes to accurately edit scenes while maintaining their background.


ChemDFM: Dialogue Foundation Model for Chemistry

http://arxiv.org/abs/2401.14818v1

Compressor summary: The text introduces ChemDFM, a large language model for chemistry that can understand chemical knowledge and languages better than general-domain models.


On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks

http://arxiv.org/abs/2401.14811v1

Compressor summary: This paper examines scalar, Markovian rewards in RL and shows they cannot express many instances of multi-objective, risk-sensitive, and modal RL tasks.


PL-FSCIL: Harnessing the Power of Prompts for Few-Shot Class-Incremental Learning

http://arxiv.org/abs/2401.14807v1

Compressor summary: PL-FSCIL uses visual prompts with a pre-trained Vision Transformer to enable deep neural networks to learn new tasks incrementally from few labeled samples without forgetting previous tasks, mimicking human learning patterns.


Deep Variational Privacy Funnel: General Modeling with Applications in Face Recognition

http://arxiv.org/abs/2401.14792v1

Compressor summary: The study proposes a method for protecting privacy during machine learning using the Privacy Funnel model, which works well for various face recognition tasks.


Study of the gOMP Algorithm for Recovery of Compressed Sensed Hyperspectral Images

http://arxiv.org/abs/2401.14786v1

Compressor summary:
Key points:
- Hyperspectral Imaging (HSI) has many applications but is challenging to transmit due to the large number of spectral bands
- Compressive Sensing reduces HSI images by randomly subsampling spectral bands and reconstructing them with recovery algorithms
- This work studies a data sparsification pre-processing stage prior to compression to ensure pixel sparsity
- The gOMP algorithm reconstructs HSI images with high accuracy and fast convergence when pixels are highly sparsified, but reduces image quality compared to the originals
Summary: The text proposes a method to compress and reconstruct hyperspectral images using data sparsification and the gOMP algorithm, which improves accuracy and speed but degrades image quality.
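
The paper's full pipeline can't be reproduced from the summary, but generalized OMP itself is a standard recovery algorithm; a minimal sketch on an assumed toy sensing problem:

```python
import numpy as np

def gomp(A, y, n_nonzero, n_select=2, tol=1e-8):
    """Generic generalized OMP (gOMP): greedily pick `n_select` atoms per
    iteration by correlation with the residual, then re-fit by least squares.

    A: (m, n) sensing matrix, y: (m,) measurements. Returns a sparse estimate."""
    residual, support = y.copy(), []
    x_s = np.zeros(0)
    while len(support) < n_nonzero and np.linalg.norm(residual) > tol:
        corr = np.abs(A.T @ residual)
        corr[support] = 0                                  # don't re-pick chosen atoms
        support.extend(np.argsort(corr)[-n_select:])
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x

# Toy example: recover a 3-sparse signal from 20 random measurements.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))
x_true = np.zeros(50); x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
print(np.round(gomp(A, A @ x_true, n_nonzero=6), 2)[[3, 17, 42]])
```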


SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras

http://arxiv.org/abs/2401.14785v1

Compressor summary: The text describes a method for estimating human body poses from downwards-facing cameras on head-mounted devices using probabilistic joint rotations and a synthetic egocentric dataset, achieving state-of-the-art results with reduced parameters and faster speed.


Large Language Model Adaptation for Financial Sentiment Analysis

http://arxiv.org/abs/2401.14777v1

Compressor summary: The paper studies adaptation methods for large language models (LLMs) in financial sentiment analysis, showing that smaller LLMs can perform similarly to larger ones while being more efficient.


Spatial Transcriptomics Analysis of Zero-shot Gene Expression Prediction

http://arxiv.org/abs/2401.14772v1

Compressor summary: SGN is a new framework that can predict gene expression in tissue slides without training on specific gene types, by using functionality and phenotype information from a language model.


A Comparative Study of Compressive Sensing Algorithms for Hyperspectral Imaging Reconstruction

http://arxiv.org/abs/2401.14762v1

Compressor summary: The text compares different algorithms for compressing and recovering hyperspectral images, showing that the greedy gOMP algorithm performs best in terms of accuracy and speed.


Off-Policy Primal-Dual Safe Reinforcement Learning

http://arxiv.org/abs/2401.14758v1

Compressor summary: The paper proposes conservative policy optimization and local policy convexification to improve safety constraints in primal-dual safe RL methods by addressing the uncertainty in cost estimation.


VJT: A Video Transformer on Joint Tasks of Deblurring, Low-light Enhancement and Denoising

http://arxiv.org/abs/2401.14754v1

Compressor summary: The paper presents a novel end-to-end video transformer method for simultaneously deblurring, enhancing low-light, and denoising videos using a multi-tier architecture and a new dataset.


Synthetic Multimodal Dataset for Empowering Safety and Well-being in Home Environments

http://arxiv.org/abs/2401.14743v1

Compressor summary: The paper introduces a new multimodal dataset of daily activities combining video simulations and knowledge graphs for hazard detection in home environments.


Personality Perception in Human Videos Altered by Motion Transfer Networks

http://arxiv.org/abs/2401.14733v1

Compressor summary: The text discusses how movement and appearance in digital characters affect the perceived personality traits of videos altered by motion transfer networks.


Residual Quantization with Implicit Neural Codebooks

http://arxiv.org/abs/2401.14732v1

Compressor summary: QINCo is a neural method for vector quantization that uses specialized codebooks per vector, predicted by a neural network, to improve data compression and search accuracy.
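
QINCo's neural, per-vector codebooks are the novelty and are not reproduced here; as plain background, this is ordinary multi-stage residual quantization with fixed (randomly assumed) codebooks:

```python
import numpy as np

def rq_encode(x, codebooks):
    """Encode x with fixed multi-stage residual quantization.

    codebooks: list of (K, d) arrays. Each stage quantizes the residual left by
    the previous stages; QINCo instead predicts each stage's codebook with a
    neural network conditioned on the partial reconstruction."""
    residual, codes = x.copy(), []
    for C in codebooks:
        idx = np.argmin(np.linalg.norm(C - residual, axis=1))  # nearest codeword
        codes.append(idx)
        residual = residual - C[idx]
    return codes

def rq_decode(codes, codebooks):
    return sum(C[i] for i, C in zip(codes, codebooks))

rng = np.random.default_rng(0)
cbs = [rng.normal(size=(256, 8)) for _ in range(4)]   # 4 stages, 256 codewords each
x = rng.normal(size=8)
codes = rq_encode(x, cbs)
print(codes, np.linalg.norm(x - rq_decode(codes, cbs)))
```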


Sketch and Refine: Towards Fast and Accurate Lane Detection

http://arxiv.org/abs/2401.14729v1

Compressor summary: The paper presents SRLane, a lane detection method that combines keypoint-based and proposal-based approaches with a "Sketch-and-Refine" paradigm, achieving fast performance and good accuracy.


3D Reconstruction and New View Synthesis of Indoor Environments based on a Dual Neural Radiance Field

http://arxiv.org/abs/2401.14726v1

Compressor summary: Du-NeRF is a new method that combines two neural fields to achieve high-quality geometry reconstruction and view rendering for indoor environments, improving both novel view synthesis and 3D reconstruction.


pLitterStreet: Street Level Plastic Litter Detection and Mapping

http://arxiv.org/abs/2401.14719v1

Compressor summary: The paper presents a method to map street-level plastic litter using deep learning and vehicle-mounted cameras, creating an open-source dataset and showing its effectiveness with four object detection algorithms.


A Survey on Video Prediction: From Deterministic to Generative Approaches

http://arxiv.org/abs/2401.14718v1

Compressor summary: The paper surveys video prediction methods in computer vision, highlights challenges and trends, and introduces a new taxonomy based on stochasticity.


Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

http://arxiv.org/abs/2401.14717v1

Compressor summary: The authors propose a method that combines neural acoustic modeling with large language modeling to predict turn-taking and backchanneling locations in spoken dialogue, improving human-AI conversation quality.


Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement

http://arxiv.org/abs/2401.14707v1

Compressor summary: The paper proposes a disentanglement-based approach to improve adversarial robustness of deep neural networks by separating and aligning latent features in the pre-trained and fine-tuned models.


FairSample: Training Fair and Accurate Graph Convolutional Neural Networks Efficiently

http://arxiv.org/abs/2401.14702v1

Compressor summary: The paper proposes FairSample, a framework that mitigates demographic parity biases in GCNs by injecting edges, using reinforcement learning for neighbor sampling, and applying regularization.


Under the Surface: Tracking the Artifactuality of LLM-Generated Data

http://arxiv.org/abs/2401.14698v1

Compressor summary: This paper explores the use of large language models to generate artificial data, highlighting their limitations in capturing human nuances and emphasizing ethical concerns in their application.


Asymptotic Midpoint Mixup for Margin Balancing and Moderate Broadening

http://arxiv.org/abs/2401.14696v1

Compressor summary: The paper proposes a new feature augmentation method called asymptotic midpoint mixup that improves representation learning by addressing both inter-class and intra-class collapse problems in transfer learning tasks.
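
The asymptotic-midpoint schedule is the paper's contribution and is not reproduced here; the sketch below shows plain feature/label mixup on assumed toy data, with a comment on where a midpoint-style variant would differ:

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam):
    """Interpolate a pair of features and one-hot labels with coefficient lam.
    Standard mixup draws lam ~ Beta(alpha, alpha); a midpoint-style variant
    would push lam toward 0.5 (assumed reading of the summary, not the paper's
    exact schedule)."""
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
x_a, x_b = rng.normal(size=16), rng.normal(size=16)
y_a, y_b = np.eye(3)[0], np.eye(3)[1]
lam = rng.beta(0.2, 0.2)                 # standard mixup coefficient
x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, lam)
print(round(float(lam), 3), y_mix)
```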


Continuously Evolving Graph Neural Controlled Differential Equations for Traffic Forecasting

http://arxiv.org/abs/2401.14695v1

Compressor summary: The paper proposes a novel method (CEGNCDE) that captures both continuous temporal and spatial dependencies in traffic forecasting using a continuously evolving graph generator and a graph neural controlled differential equations framework.


TA-RNN: an Attention-based Time-aware Recurrent Neural Network Architecture for Electronic Health Records

http://arxiv.org/abs/2401.14694v1

Compressor summary: This paper proposes two interpretable deep learning models for predicting clinical outcomes in electronic health records (EHR) with irregular time intervals, and shows their superior performance on datasets for Alzheimer's disease and mortality prediction.


Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support

http://arxiv.org/abs/2401.14688v1

Compressor summary: Taiyi-Diffusion-XL is a bilingual Chinese and English text-to-image model that improves image generation and retrieval using CLIP and Stable-Diffusion-XL with vocabulary expansion, position encoding, and large vision-language pre-training.


SSR: SAM is a Strong Regularizer for domain adaptive semantic segmentation

http://arxiv.org/abs/2401.14686v1

Compressor summary: SSR improves image encoder robustness and semantic segmentation performance using SAM as a regularizer during training, while maintaining efficiency.


MasonTigers@LT-EDI-2024: An Ensemble Approach towards Detecting Homophobia and Transphobia in Social Media Comments

http://arxiv.org/abs/2401.14681v1

Compressor summary: The paper presents methods to detect homophobia and transphobia across ten languages using monolingual transformers and ensemble models, achieving top results for eight languages.


MaLLaM -- Malaysia Large Language Model

http://arxiv.org/abs/2401.14680v1

Compressor summary: The authors trained MaLLaM, a large language model for the Malay language, with different parameter sizes and showed its effectiveness in understanding and generating natural language specific to Malaysia.


From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution

http://arxiv.org/abs/2401.14661v1

Compressor summary: The paper proposes a new lightweight model that combines super-resolution and YOLOv5 architecture to improve object detection in aerial images with small and densely clustered objects, achieving better performance than existing methods.


Scientific Large Language Models: A Survey on Biological & Chemical Domains

http://arxiv.org/abs/2401.14656v1

Compressor summary: Scientific LLMs are a new subclass of large language models that facilitate scientific discovery by enhancing natural language comprehension and extending across various scientific disciplines, with a focus on biological and chemical domains.


A Korean Legal Judgment Prediction Dataset for Insurance Disputes

http://arxiv.org/abs/2401.14654v1

Compressor summary: The paper presents a dataset for predicting legal outcomes in insurance disputes in Korean and shows that Sentence Transformer Fine-tuning can achieve similar performance to existing models with limited data.


Omnipredictors for Regression and the Approximate Rank of Convex Functions

http://arxiv.org/abs/2401.14645v1

Compressor summary: The paper introduces and studies sufficient statistics for learning omnipredictors that minimize expected loss for various loss functions in supervised learning, especially focusing on the regression setting with continuous labels.


Super Efficient Neural Network for Compression Artifacts Reduction and Super Resolution

http://arxiv.org/abs/2401.14641v1

Compressor summary: The proposed CNN algorithm improves video quality by reducing artifacts and increasing resolution when streaming with low internet speed.


Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs

http://arxiv.org/abs/2401.14640v1

Compressor summary: The paper introduces a new benchmark (CAQA) to evaluate the quality of citations generated by language models for question-answer pairs, using fine-grained categories and knowledge graphs.


T-Rex: Text-assisted Retrosynthesis Prediction

http://arxiv.org/abs/2401.14637v1

Compressor summary: T-Rex is a text-assisted retrosynthesis prediction approach that uses pre-trained language models like ChatGPT to generate descriptions and rank candidate reactants, improving the accuracy of synthesizing target molecules.


Efficient Constraint Generation for Stochastic Shortest Path Problems

http://arxiv.org/abs/2401.14636v1

Compressor summary: The paper proposes a new technique for solving Stochastic Shortest Path Problems that reduces unnecessary computation by ignoring sub-optimal actions and improves the efficiency of the iLAO* algorithm.


An Empirical Investigation of Domain Adaptation Ability for Chinese Spelling Check Models

http://arxiv.org/abs/2401.14630v1

Compressor summary: The paper evaluates the domain adaptation ability of different Chinese Spelling Check (CSC) models using three new datasets from financial, medical, and legal domains and tests ChatGPT's performance as well.


Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning

http://arxiv.org/abs/2401.14626v1

Compressor summary: This paper proposes Lifelong Scene Graph Generation (LSGG), a novel framework that enables scene graph generation models to continuously learn new relationships without forgetting previous knowledge, using a limited number of exemplars and in-context learning techniques.


Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

http://arxiv.org/abs/2401.14625v1

Compressor summary: The paper proposes an Error Explainable Benchmark dataset to evaluate automatic speech recognition models based on both speech- and text-level aspects, improving user satisfaction and understanding system weaknesses.


Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora

http://arxiv.org/abs/2401.14624v1

Compressor summary:
Key points:
- Large language models have potential but lack domain-specific data and resources
- The proposed method uses a large language model to bootstrap seed information and retrieve related data from public corpora
- The method creates a high-quality dataset called Knowledge Pile covering four major domains
- The dataset improves the performance of large language models in reasoning tests
- The dataset and code are open-sourced for academic sharing
Summary: The paper introduces an efficient data collection method that uses a large language model to create a high-quality dataset called Knowledge Pile, which enhances the reasoning ability of large language models and is open-sourced for academic use.


Resilient Practical Test-Time Adaptation: Soft Batch Normalization Alignment and Entropy-driven Memory Bank

http://arxiv.org/abs/2401.14619v1

Compressor summary: ResiTTA is a test-time adaptation method that improves model performance by using resilient batch normalization and an entropy-driven memory bank to handle domain shifts and non-i.i.d. test samples.


Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse

http://arxiv.org/abs/2401.14616v1

Compressor summary: Alternative Speech is a new approach that offers practical solutions to hate speech by correcting speakers and promoting social change, while working alongside counter-narratives.


Physically Informed Synchronic-adaptive Learning for Industrial Systems Modeling in Heterogeneous Media with Unavailable Time-varying Interface

http://arxiv.org/abs/2401.14609v1

Compressor summary: The paper proposes a data-physics-hybrid method called PISAL to solve PDEs for complex industrial systems with heterogeneous media and unknown parameters or time-varying interfaces, using a synchronic-adaptive learning strategy.


Ricci flow-guided autoencoders in learning time-dependent dynamics

http://arxiv.org/abs/2401.14591v1

Compressor summary: The text describes a new method to learn nonlinear dynamics from PDEs using an autoencoder with an evolving manifold latent space based on Ricci flow.


Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias

http://arxiv.org/abs/2401.14589v1

Compressor summary: The study used a GPT-4 Turbo-based multi-agent system to simulate clinical decision-making conversations and improve diagnosis accuracy by mitigating cognitive biases.


CNA-TTA: Clean and Noisy Region Aware Feature Learning within Clusters for Online-Offline Test-Time Adaptation

http://arxiv.org/abs/2401.14587v1

Compressor summary: The proposed CNA-TTA method addresses domain shift by selectively training a model with clean and noisy regions in target data clusters using cluster structure and mixup inputs.


Diffusion Stochastic Optimization for Min-Max Problems

http://arxiv.org/abs/2401.14585v1

Compressor summary: The paper introduces DSS-OG, a new optimization method that improves upon the conventional stochastic gradient method for nonconvex problems and distributed scenarios, with a complexity comparable to its counterpart.


Design Your Own Universe: A Physics-Informed Agnostic Method for Enhancing Graph Neural Networks

http://arxiv.org/abs/2401.14580v1

Compressor summary:
Key points:
- The paper proposes a model-agnostic framework to enhance Physics-informed Graph Neural Networks (PGINNs) for learning through graph-structured data
- The framework introduces additional nodes and rewires connections with both positive and negative weights, guided by node labeling information
- The framework improves GNNs' performance on over-smoothing, over-squashing, and heterophily adaptation issues
Summary: The paper presents a method to improve PGINNs by adding nodes and rewiring connections based on node labels, addressing common GNN challenges.


Recognizing Multiple Ingredients in Food Images Using a Single-Ingredient Classification Model

http://arxiv.org/abs/2401.14579v1

Compressor summary: The study presents an advanced method for recognizing ingredients segmented from food images using CNNs and novel algorithms, with a focus on multi-ingredient recognition.


GOAt: Explaining Graph Neural Networks via Graph Output Attribution

http://arxiv.org/abs/2401.14578v1

Compressor summary: Graph Output Attribution (GOAt) is a novel method to explain Graph Neural Networks (GNNs) by attributing graph outputs to input features, resulting in faithful, discriminative, and stable explanations.