arxiv compressed, 2024-02-14

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-14, generated by the compressor, my personal LLM-based summarization project.


IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

http://arxiv.org/abs/2402.08682v1

Compressor summary: IM-3D is a text-to-3D model that uses video generators to improve multi-view generation and produces high-quality 3D outputs efficiently with reduced artifacts.


Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance

http://arxiv.org/abs/2402.08680v1

Compressor summary: MARINE is a training-free and API-free framework that reduces object hallucinations in large vision-language models by enriching visual context and using classifier-free guidance.
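
The summary names classifier-free guidance, a standard technique; as a generic illustration (not MARINE's actual implementation), guidance extrapolates from an unconditional prediction toward a conditional one:

```python
def classifier_free_guidance(cond_logits, uncond_logits, guidance_scale):
    """Generic classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one. A guidance_scale > 1 strengthens
    the conditioning signal (here, the enriched visual context)."""
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]

# Toy example: three token logits with and without visual conditioning.
cond = [2.0, 0.5, -1.0]
uncond = [1.0, 1.0, 1.0]
print(classifier_free_guidance(cond, uncond, 1.5))  # [2.5, 0.25, -2.0]
```

The scale hyperparameter and logit-space formulation here are illustrative assumptions; the paper applies the idea to vision-language decoding.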


COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability

http://arxiv.org/abs/2402.08679v1

Compressor summary: The paper presents COLD-Attack, a framework built on COLD, a text generation algorithm, for controllable attack generation on large language models (LLMs); it can enforce control requirements such as fluency, stealthiness, sentiment, and coherence, and works well across different LLMs and attack scenarios.


Graph Mamba: Towards Learning on Graphs with State Space Models

http://arxiv.org/abs/2402.08678v1

Compressor summary: Graph Mamba Networks (GMNs) are a new class of graph neural networks based on selective state space models that overcome the limitations of message-passing and graph transformers while achieving excellent performance with reduced computational cost.


A Convergence Analysis of Approximate Message Passing with Non-Separable Functions and Applications to Multi-Class Classification

http://arxiv.org/abs/2402.08676v1

Compressor summary: The text analyzes the convergence of approximate message passing dynamics for non-separable multivariate nonlinearities and its application to a convex optimization problem in multi-class classification.


Model Assessment and Selection under Temporal Distribution Shift

http://arxiv.org/abs/2402.08672v1

Compressor summary: The text proposes a method for evaluating and selecting models in changing environments using rolling windows, generalization error estimation, and tournaments.
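
As a hedged sketch of the rolling-window idea (the function names and weighting are assumptions, not the paper's method): under temporal shift, recent held-out errors are a better proxy for future error than the full history, so model selection can score each candidate on a trailing window.

```python
def rolling_window_scores(errors, window):
    """Average held-out error over the most recent `window` observations,
    one score per candidate model."""
    return {name: sum(e[-window:]) / window for name, e in errors.items()}

def select_model(errors, window):
    """Pick the model with the lowest recent-window error."""
    scores = rolling_window_scores(errors, window)
    return min(scores, key=scores.get)

# Hypothetical per-timestep errors: "b" was bad early but is good recently.
errors = {"a": [0.1, 0.1, 0.4, 0.5], "b": [0.5, 0.4, 0.2, 0.1]}
print(select_model(errors, window=2))  # "b" wins on the recent window
```

Note that scoring on the full history (window=4) would instead pick "a", which is exactly the failure mode a shift-aware procedure avoids.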


Are Semi-Dense Detector-Free Methods Good at Matching Local Features?

http://arxiv.org/abs/2402.08671v1

Compressor summary: The paper proposes a new image matching method (SAM) that performs well in estimating poses, compares it with semi-dense detector-free (SDF) methods, and shows that correspondences in textured regions are crucial for accurate pose estimation.


Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models

http://arxiv.org/abs/2402.08670v1

Compressor summary: The proposed Rec-GPT4V (VST) scheme uses large vision-language models for multimodal recommendations by leveraging user history, generating item image summaries, and querying user preferences over candidate items.


Target Score Matching

http://arxiv.org/abs/2402.08667v1

Compressor summary: The paper proposes a new method for training Denoising Diffusion Models that improves score estimation at low noise levels by using knowledge of the target score.


Improving Generalization in Semantic Parsing by Increasing Natural Language Variation

http://arxiv.org/abs/2402.08666v1

Compressor summary: The authors use large language models to create more diverse and realistic text-to-SQL questions and show that this improves the performance and generalization of parsers.


PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

http://arxiv.org/abs/2402.08657v1

Compressor summary: The paper proposes PIN, a learnable spatial prompt that unlocks object localization in caption-based vision-language models without supervised training or new output heads.


Learning Continuous 3D Words for Text-to-Image Generation

http://arxiv.org/abs/2402.08654v1

Compressor summary: The paper introduces Continuous 3D Words, input tokens that allow users to fine-tune various abstract attributes like illumination and shape in text-to-image models.


SAGMAN: Stability Analysis of Graph Neural Networks on the Manifolds

http://arxiv.org/abs/2402.08653v1

Compressor summary: SAGMAN is a spectral framework that analyzes the stability of graph neural networks (GNNs) by examining distance distortions between input and output manifolds using spectral graph embedding and probabilistic graphical models.


Inference of Abstraction for a Unified Account of Symbolic Reasoning from Data

http://arxiv.org/abs/2402.08646v1

Compressor summary: The paper proposes a probabilistic framework for symbolic reasoning that incorporates neuroscience findings and formal logic concepts to advance machine intelligence.


Peeking Behind the Curtains of Residual Learning

http://arxiv.org/abs/2402.08645v1

Compressor summary: The paper investigates the "dissipating inputs" phenomenon in plain neural nets, which causes convergence failure, and proposes a new hypothesis and architecture for deep learning without residual connections.


Tandem Transformers for Inference Efficient LLMs

http://arxiv.org/abs/2402.08644v1

Compressor summary: Tandem transformers combine a small autoregressive model and a large model in block mode to enhance prediction accuracy and speed up inference in language models.


Learned Image Compression with Text Quality Enhancement

http://arxiv.org/abs/2402.08643v1

Compressor summary: The paper proposes a new loss function to improve the quality of reconstructed text in learned image compression, showing significant enhancements in experiments.


SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages

http://arxiv.org/abs/2402.08638v1

Compressor summary: The paper introduces SemRel, a new dataset of semantic relatedness annotations across 14 languages from Africa and Asia, to study the broader phenomenon of semantic relatedness and its implications for NLP tasks and Large Language Models.


BdSLW60: A Word-Level Bangla Sign Language Dataset

http://arxiv.org/abs/2402.08635v1

Compressor summary: The paper presents a new Bangla Sign Language dataset with 60 sign words, annotated by professionals and tested with machine learning models for word-level recognition.


Knowledge Editing on Black-box Large Language Models

http://arxiv.org/abs/2402.08631v1

Compressor summary: The paper proposes a multi-perspective evaluation framework for black-box knowledge editing of large language models and introduces postEdit, a novel method that improves style retention and privacy protection.


NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs

http://arxiv.org/abs/2402.08622v1

Compressor summary: The paper proposes a method to transfer the appearance of a Neural Radiance Field (NeRF) onto a different 3D geometry using semantic image analogies and correspondence transfer, achieving multi-view consistent results that users prefer over traditional methods.


A Generalized Approach to Online Convex Optimization

http://arxiv.org/abs/2402.08621v1

Compressor summary: The paper analyzes online convex optimization algorithms in different settings and presents general meta-algorithms to convert them between feedback types with comparable regret bounds.


Mixtures of Experts Unlock Parameter Scaling for Deep RL

http://arxiv.org/abs/2402.08609v1

Compressor summary: The paper shows that using Soft MoE modules in value-based networks improves the scalability and performance of reinforcement learning models.
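
For intuition, a generic soft mixture-of-experts (not the paper's exact Soft MoE module, whose slot mechanism differs) blends every expert's output with softmax gate weights instead of routing hard to one expert:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def soft_moe(x, experts, gate_scores):
    """Generic soft mixture-of-experts: every expert processes the input and
    the outputs are blended with softmax gate weights (no hard routing)."""
    weights = softmax(gate_scores)
    return sum(w * expert(x) for w, expert in zip(weights, experts))

# Two toy experts with equal gate scores: the output is their average.
experts = [lambda x: 2 * x, lambda x: -x]
print(soft_moe(3.0, experts, [0.0, 0.0]))  # 0.5*6 + 0.5*(-3) = 1.5
```

Because every expert is always active, parameter count scales with the number of experts while the blend stays fully differentiable, which is what makes this family attractive for scaling value-based RL networks.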


Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing

http://arxiv.org/abs/2402.08601v1

Compressor summary: The paper proposes a training-free approach for non-rigid image editing with Stable Diffusion that improves identity preservation without compromising editability by optimizing text, performing latent inversion, and using timestep-aware text injection sampling.


Homomorphism Counts for Graph Neural Networks: All About That Basis

http://arxiv.org/abs/2402.08595v1

Compressor summary: Graph neural networks struggle with counting patterns like cycles in graphs, but a new approach using homomorphism counts of all structures in the target pattern can improve their expressive power without increasing complexity.


Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning

http://arxiv.org/abs/2402.08594v1

Compressor summary: The text proposes a Bayesian method for improving prompt tuning by considering the correlation among source tasks and outperforms existing methods on various NLP benchmarks.


Graph Feature Preprocessor: Real-time Extraction of Subgraph-based Features from Transaction Graphs

http://arxiv.org/abs/2402.08593v1

Compressor summary: Graph Feature Preprocessor is a software library that detects money laundering and fraud patterns in financial transaction graphs; operating in a streaming manner and exploiting multicore parallelism to mine subgraph patterns efficiently, it enriches transactions with features that improve the accuracy of gradient-boosting-based machine learning models, outperforming standard graph neural networks in accuracy, throughput, and latency on synthetic AML and real Ethereum phishing datasets.


Faster Repeated Evasion Attacks in Tree Ensembles

http://arxiv.org/abs/2402.08586v1

Compressor summary: The paper proposes a method to find adversarial examples for tree ensembles faster by identifying the consistent features that are perturbed.


Mixture of Link Predictors

http://arxiv.org/abs/2402.08583v1

Compressor summary: Link-MoE is a new graph machine learning model that adapts to different node pairs and improves link prediction accuracy by using various GNNs as experts.


FESS Loss: Feature-Enhanced Spatial Segmentation Loss for Optimizing Medical Image Analysis

http://arxiv.org/abs/2402.08582v1

Compressor summary: FESS Loss combines contrastive learning and Dice loss for accurate and refined segmentation of medical images, especially in low-data situations.
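
The soft Dice loss is a standard component; the sketch below shows it plus a hypothetical weighted combination with a contrastive term (the `alpha` trade-off weight and function names are assumptions, not the paper's exact FESS formulation):

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flattened probability maps (standard formulation):
    1 - (2 * intersection + eps) / (|pred| + |target| + eps)."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)

def fess_style_loss(pred, target, contrastive_term, alpha=0.5):
    """Hypothetical blend of a spatial (Dice) objective with a feature-level
    contrastive objective, as the summary describes."""
    return alpha * dice_loss(pred, target) + (1 - alpha) * contrastive_term

# Perfect overlap gives ~0 Dice loss; disjoint masks give ~1.
print(dice_loss([1.0, 0.0], [1, 0]), dice_loss([0.0, 1.0], [1, 0]))
```

The contrastive term would normally come from embedding distances between same-class and different-class pixels; it is passed in as a scalar here to keep the sketch self-contained.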


Improving Factual Error Correction for Abstractive Summarization via Data Distillation and Conditional-generation Cloze

http://arxiv.org/abs/2402.08581v1

Compressor summary: The paper introduces FactCloze, a model for correcting factual errors in summaries, and SummDSC, a more faithful summary dataset generated by data distillation.


Two Tales of Single-Phase Contrastive Hebbian Learning

http://arxiv.org/abs/2402.08573v1

Compressor summary: The text discusses a local learning algorithm called "dual propagation" that improves learning efficiency and stability, but has limitations in biological plausibility and nudging symmetry.


Glass Segmentation with Multi Scales and Primary Prediction Guiding

http://arxiv.org/abs/2402.08571v1

Compressor summary: The paper presents MGNet, a network for locating and segmenting glass-like objects, which are hard to detect due to their transparency and vague boundaries; it improves spatial relationship extraction and semantic mining through fine-rescaling and merging, and introduces a novel uncertainty-aware loss function for high-confidence segmentation maps.


Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

http://arxiv.org/abs/2402.08567v1

Compressor summary: Infectious jailbreak exploits large language models in multi-agent environments by spreading harmful behaviors exponentially fast through adversarial images.


Artificial Intelligence for Literature Reviews: Opportunities and Challenges

http://arxiv.org/abs/2402.08565v1

Compressor summary: The text reviews AI applications in Systematic Literature Reviews, focusing on screening and extraction phases, and evaluates 21 leading tools using a framework of traditional and AI features.


Denoising Diffusion Restoration Tackles Forward and Inverse Problems for the Laplace Operator

http://arxiv.org/abs/2402.08563v1

Compressor summary: This paper proposes a new method for solving inverse problems in PDEs using denoising diffusion restoration models that improve estimation by exploiting eigenvalues and eigenfunctions of the Laplacian operator.


Higher Layers Need More LoRA Experts

http://arxiv.org/abs/2402.08562v1

Compressor summary: The paper introduces MoE-LoRA with Layer-wise Expert Allocation (MoLA), a novel method to improve the efficiency of LoRA by dynamically allocating experts in different layers, achieving better performance on various NLP and commonsense QA tasks.


Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases

http://arxiv.org/abs/2402.08552v1

Compressor summary: The paper proposes TDPO-R, an algorithm that combines temporal inductive bias and critic active neuron reset to reduce reward overoptimization in diffusion model alignment.


Generative VS non-Generative Models in Engineering Shape Optimization

http://arxiv.org/abs/2402.08540v1

Compressor summary: This paper compares generative and non-generative models for design space construction, finding that non-generative models can be more cost-effective and produce high-quality designs with fewer invalid options.


Intelligent Diagnosis of Alzheimer's Disease Based on Machine Learning

http://arxiv.org/abs/2402.08539v1

Compressor summary: The study uses innovative preprocessing strategies on ADNI dataset to detect Alzheimer's disease early using machine learning models with high accuracy.


A Distributional Analogue to the Successor Representation

http://arxiv.org/abs/2402.08530v1

Compressor summary: The paper introduces a new method for reinforcement learning that separates transition structure and reward, using a distributional successor measure that describes the consequences of behavior, and can learn from data using generative model techniques.


Approximately Piecewise E(3) Equivariant Point Networks

http://arxiv.org/abs/2402.08529v1

Compressor summary: APEN is a framework that improves point cloud neural networks by approximating part-based symmetry using functions with finer equivariance, leading to better generalization for classification and segmentation tasks.


Concept-1K: A Novel Benchmark for Instance Incremental Learning

http://arxiv.org/abs/2402.08526v1

Compressor summary: The text proposes a new scenario (instance-incremental learning) and dataset (Concept-1K) to study catastrophic forgetting in large neural networks, showing that existing methods are insufficient to address this issue.


Fairness Auditing with Multi-Agent Collaboration

http://arxiv.org/abs/2402.08522v1

Compressor summary: The paper explores how agents can audit platforms for fairness using different collaboration and sampling strategies, finding that uncoordinated collaboration generally improves audit accuracy.


Counterfactual Influence in Markov Decision Processes

http://arxiv.org/abs/2402.08514v1

Compressor summary: This paper proposes a method to generate counterfactual paths for Markov Decision Processes that remain influenced by the observed path, addressing an overlooked issue in existing approaches.


Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the Unknown

http://arxiv.org/abs/2402.08511v1

Compressor summary: AmEx-MCTS improves Monte-Carlo tree search by separating value updates, visit count updates, and the selected path, allowing exclusion of already explored regions for better search performance in complex problems.


P-Mamba: Marrying Perona Malik Diffusion with Mamba for Efficient Pediatric Echocardiographic Left Ventricular Segmentation

http://arxiv.org/abs/2402.08506v1

Compressor summary: P-Mamba is a novel method for efficient and accurate segmentation of the left ventricle in pediatric echocardiography using two encoder branches to improve noise suppression and global dependencies.


Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea

http://arxiv.org/abs/2402.08502v1

Compressor summary: The paper proposes a safe reinforcement learning method for autonomous vessels that follows temporal logic constraints of COLREGS to avoid collisions.


Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style

http://arxiv.org/abs/2402.08498v1

Compressor summary: The Counterfire corpus is a novel dataset of enriched counterarguments to Reddit posts generated by various language models, showcasing strong paraphrasing with evidence and high style integration; human-written arguments, however, show greater richness and diversity.


A Systematic Review of Data-to-Text NLG

http://arxiv.org/abs/2402.08496v1

Compressor summary: The text summarizes a systematic review of data-to-text generation research, highlighting research gaps, future directions, challenges, and various aspects of the field.


Sparsity via Sparse Group $k$-max Regularization

http://arxiv.org/abs/2402.08493v1

Compressor summary: The paper proposes a new regularization method for linear inverse problems with sparsity constraints that enhances group sparsity and approximates the $l_0$ norm more closely than existing methods.


The Application of ChatGPT in Responding to Questions Related to the Boston Bowel Preparation Scale

http://arxiv.org/abs/2402.08492v1

Compressor summary: ChatGPT has lower accuracy and consistency than experienced endoscopists in assessing colonoscopy images using the Boston Bowel Preparation Scale, but shows promise for future fine-tuning.


Deep Reinforcement Learning for Controlled Traversing of the Attractor Landscape of Boolean Models in the Context of Cellular Reprogramming

http://arxiv.org/abs/2402.08491v1

Compressor summary: The study proposes a deep reinforcement learning framework to identify efficient cellular reprogramming strategies using artificial neural networks.


Revealing Decurve Flows for Generalized Graph Propagation

http://arxiv.org/abs/2402.08480v1

Compressor summary: The study introduces generalized propagation for weighted and directed graphs, proposes GPNNs for better attention mechanisms, and extends Ollivier-Ricci Curvature to CURC for analyzing propagation patterns in graph neural networks.


Plausible Extractive Rationalization through Semi-Supervised Entailment Signal

http://arxiv.org/abs/2402.08479v1

Compressor summary: This paper proposes a semi-supervised method to improve interpretability of black box models using entailment alignment between explanations and answers.


Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models

http://arxiv.org/abs/2402.08473v1

Compressor summary: The paper explores the embedding space of vision-language models using a new optimization method and finds that they can overgeneralize, failing systematic evaluations despite high zero-shot performance.


Large Language Models for the Automated Analysis of Optimization Algorithms

http://arxiv.org/abs/2402.08472v1

Compressor summary: The paper proposes using large language models like GPT-4 in STNWeb, a web tool for visualizing optimization algorithms, to create reports and plots that make the tool more accessible and useful for researchers.


Parallel-friendly Spatio-Temporal Graph Learning for Photovoltaic Degradation Analysis at Scale

http://arxiv.org/abs/2402.08470v1

Compressor summary: ST-GTrend is a spatio-temporal graph neural network method for estimating the long-term performance loss rate of PV inverters from time series data; using spatio-temporal coherence, graph attention, and a parallel algorithm to separate aging and fluctuation terms, it outperforms existing methods on three large-scale datasets and speeds up trend analysis by 7.92 times.


Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale

http://arxiv.org/abs/2402.08467v1

Compressor summary: The study shows how ChatGPT can create convincing fake news about the war in Ukraine that people and tools can't easily spot.


Taking Training Seriously: Human Guidance and Management-Based Regulation of Artificial Intelligence

http://arxiv.org/abs/2402.08466v1

Compressor summary: The text discusses the need for human oversight in AI training under a management-based regulatory paradigm, which can improve AI performance, fairness, and explainability.


Subgraphormer: Unifying Subgraph GNNs and Graph Transformers via Graph Products

http://arxiv.org/abs/2402.08450v1

Compressor summary: The paper introduces Subgraphormer, an architecture that combines Subgraph GNNs and Graph Transformers, improving performance on various datasets by leveraging attention and positional encodings in product graphs.


Latent space configuration for improved generalization in supervised autoencoder neural networks

http://arxiv.org/abs/2402.08441v1

Compressor summary: The paper proposes two methods to control the latent space configuration of autoencoders, enabling more stable and interpretable training and allowing similarity estimation in supervised autoencoders without decoders or classifiers.


JeFaPaTo -- A joint toolbox for blinking analysis and facial features extraction

http://arxiv.org/abs/2402.08439v1

Compressor summary: The Jena Facial Palsy Toolbox is a user-friendly tool that simplifies advanced computer vision analysis of subtle facial movements like blinking for medical professionals without programming skills.


Camera Calibration through Geometric Constraints from Rotation and Projection Matrices

http://arxiv.org/abs/2402.08437v1

Compressor summary: The paper proposes a novel loss function based on geometric constraints for camera calibration, using a multitask learning framework that combines neural networks and mathematical properties, and introduces a new dataset with realistic conditions.


Leveraging Self-Supervised Instance Contrastive Learning for Radar Object Detection

http://arxiv.org/abs/2402.08427v1

Compressor summary: RiCL is a self-supervised learning framework that uses contrastive learning to pre-train radar object detectors, enabling them to learn with fewer data and achieve better performance in adverse conditions.


Vehicle Behavior Prediction by Episodic-Memory Implanted NDT

http://arxiv.org/abs/2402.08423v1

Compressor summary: The text proposes eMem-NDT, a method to predict vehicle behavior in autonomous driving using an interpretable neural decision tree that clusters and aligns historical vehicle behavior features.


Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning for Digital Twins

http://arxiv.org/abs/2402.08421v1

Compressor summary: The text proposes an offline multi-agent reinforcement learning scheme for digital twin-based wireless networks that uses distributional RL and conservative Q-learning to handle uncertainty and jointly trains policies in a centralized manner.


Transferring Ultrahigh-Field Representations for Intensity-Guided Brain Segmentation of Low-Field Magnetic Resonance Imaging

http://arxiv.org/abs/2402.08409v1

Compressor summary: The study presents a deep-learning framework that fuses low-field MRI features with 7T-like features to improve brain image segmentation in a 7T-absent environment, achieving superior results and adaptability.


Transition Constrained Bayesian Optimization via Markov Decision Processes

http://arxiv.org/abs/2402.08406v1

Compressor summary: Bayesian optimization with Markov Decision Processes enables better optimization of black-box functions with constraints and transitions by using reinforcement learning to plan ahead.


A Novel Approach to Regularising 1NN classifier for Improved Generalization

http://arxiv.org/abs/2402.08405v1

Compressor summary: The paper introduces Watershed Classifiers, a novel non-parametric approach to regularize 1NN classifiers using a greedy method, leading to arbitrary boundary learning and good generalization on dense datasets.


LLMs and the Human Condition

http://arxiv.org/abs/2402.08403v1

Compressor summary: The paper proposes a model of human decision-making integrating three theories and applies it to conversational AI, aiming to understand ChatGPT's intelligence and its implications for our world.


LOSS-GAT: Label Propagation and One-Class Semi-Supervised Graph Attention Network for Fake News Detection

http://arxiv.org/abs/2402.08401v1

Compressor summary: The paper proposes LOSS-GAT, a graph-based semi-supervised and one-class approach for fake news detection that uses only a small set of labeled data and improves performance over baseline models.


Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing

http://arxiv.org/abs/2402.08400v1

Compressor summary: The paper proposes an adaptive hierarchical certification method for image semantic segmentation that relaxes the certification level within a multi-level hierarchy to provide more meaningful information and lower abstain rate.


A Neural-network Enhanced Video Coding Framework beyond ECM

http://arxiv.org/abs/2402.08397v1

Compressor summary: A hybrid video compression framework with deep learning-based techniques improves the performance of the Enhanced Compression Model, achieving significant BD-rate savings.


Large Language Models as Minecraft Agents

http://arxiv.org/abs/2402.08392v1

Compressor summary: This paper studies how Large Language Models can be used as Minecraft agents, explores their strengths and weaknesses in builder and architect roles, and proposes a platform for online interaction and comparison with previous works.


Selective Learning: Towards Robust Calibration with Dynamic Regularization

http://arxiv.org/abs/2402.08384v1

Compressor summary: Dynamic Regularization (DReg) is a method to improve deep learning models' confidence and performance by fitting labels for in-distribution samples and applying regularization to outliers, resulting in a more reliable and calibrated model.


Uncertainty Quantification for Forward and Inverse Problems of PDEs via Latent Global Evolution

http://arxiv.org/abs/2402.08383v1

Compressor summary: LE-PDE-UQ is a method that uses latent vectors to integrate uncertainty quantification into deep learning-based surrogate models for PDEs, outperforming strong baselines in accuracy and long-term predictions.


Punctuation Restoration Improves Structure Understanding without Supervision

http://arxiv.org/abs/2402.08382v1

Compressor summary: Punctuation restoration improves structure understanding in natural language and enhances structure-aware representations for various linguistic tasks.


Time-Series Classification for Dynamic Strategies in Multi-Step Forecasting

http://arxiv.org/abs/2402.08373v1

Compressor summary: Dynamic Strategies (DyStrat) is a novel method for multi-step forecasting that adapts to different datasets and outperforms fixed strategies.


Helping university students to choose elective courses by using a hybrid multi-criteria recommendation system with genetic optimization

http://arxiv.org/abs/2402.08371v1

Compressor summary: The paper proposes a hybrid recommendation system combining Collaborative and Content-based filtering with a Genetic Algorithm to suggest suitable courses for students based on multiple criteria, using real data from University of Cordoba's Computer Science Degree.


One-shot Imitation in a Non-Stationary Environment via Multi-Modal Skill

http://arxiv.org/abs/2402.08369v1

Compressor summary: The text proposes a skill-based imitation learning framework that uses a vision-language model to learn skills from videos and adapts to environmental changes for one-shot imitation of complex tasks.


RBF-PINN: Non-Fourier Positional Embedding in Physics-Informed Neural Networks

http://arxiv.org/abs/2402.08367v1

Compressor summary: The paper proposes using Radial Basis Function instead of Fourier-based feature mapping for Physics-Informed Neural Networks, showing improved performance in various problems.
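
An RBF feature mapping is a standard construction; as a minimal 1D sketch (the grid of centers and the bandwidth are illustrative assumptions), each input coordinate is replaced by its similarity to a set of centers instead of by sin/cos Fourier features:

```python
import math

def rbf_features(x, centers, sigma):
    """Radial Basis Function feature mapping for a scalar coordinate x:
    phi_i(x) = exp(-(x - c_i)^2 / (2 * sigma^2)).
    A non-Fourier alternative to sinusoidal positional embeddings."""
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]

# Map a coordinate onto features centered on a grid over [0, 1].
centers = [i / 4 for i in range(5)]
print(rbf_features(0.5, centers, sigma=0.25))
```

The feature at the center nearest x peaks at 1 and decays smoothly with distance, giving the PINN a localized, tunable-bandwidth input representation.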


NeuRes: Learning Proofs of Propositional Satisfiability

http://arxiv.org/abs/2402.08365v1

Compressor summary: NeuRes is a neuro-symbolic SAT solver that can prove unsatisfiability and uses a novel architecture combining Graph Neural Networks and Pointer Networks to select node pairs for resolution proofs.


Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks

http://arxiv.org/abs/2402.08360v1

Compressor summary: The paper introduces VQA-IN, a method to train multimodal language models for domain-specific visual tasks using smaller versions of large language models.


Learning to Produce Semi-dense Correspondences for Visual Localization

http://arxiv.org/abs/2402.08359v1

Compressor summary: The study presents a novel localization method using semi-dense 2D-3D matching points to improve camera pose estimation accuracy, especially in noisy or sparse scenarios.


Visually Dehallucinative Instruction Generation

http://arxiv.org/abs/2402.08348v1

Compressor summary: The paper introduces CAP2QA, a method to reduce visual hallucination in synthetic image-text data for question-answering tasks by using image-aligned instructive QA dataset.


Conditional Information Gain Trellis

http://arxiv.org/abs/2402.08345v1

Compressor summary: Conditional Information Gain Trellis (CIGT) is a method for executing parts of a deep convolutional neural network using routing mechanisms based on differentiable information gain-based cost functions, which reduces computational burden and improves classification accuracy with fewer parameters.


Eliciting Big Five Personality Traits in Large Language Models: A Textual Analysis with Classifier-Driven Approach

http://arxiv.org/abs/2402.08341v1

Compressor summary: This study examines how different input prompts affect the personality traits of large language models (LLMs), revealing that more parameters result in a broader range of traits and fine-tuning influences their behavior.


Scribble-based fast weak-supervision and interactive corrections for segmenting whole slide images

http://arxiv.org/abs/2402.08333v1

Compressor summary: The paper presents a segmentation method for histopathology images that requires minimal user input and helps bridge the gap between pathologists and machines in clinical settings.


PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

http://arxiv.org/abs/2402.08327v1

Compressor summary: The paper introduces M2KR, a framework for training and evaluating multi-modal retrievers in KB-VQA tasks, and presents PreFLMR, a pre-trained model that achieves state-of-the-art results.


Uncertainty Quantification via Stable Distribution Propagation

http://arxiv.org/abs/2402.08324v1

Compressor summary: The authors present a new method for neural networks to handle uncertain inputs by approximating non-linearities with local linearization, which allows them to predict output uncertainties and outperform other methods like moment matching.
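
The local-linearization idea the summary describes can be illustrated with the classic first-order (delta-method) rule for a Gaussian input, a special case of the stable distributions the paper treats; this sketch is generic, not the paper's implementation:

```python
import math

def propagate_gaussian(mu, var, f, df):
    """First-order propagation of N(mu, var) through a nonlinearity f,
    linearized at the mean: output mean ~ f(mu),
    output variance ~ f'(mu)^2 * var."""
    return f(mu), df(mu) ** 2 * var

# Propagate through tanh, a common network nonlinearity.
mu_out, var_out = propagate_gaussian(
    0.0, 0.04, math.tanh, lambda x: 1 - math.tanh(x) ** 2
)
print(mu_out, var_out)  # tanh'(0) = 1, so the variance passes through unchanged
```

Applied layer by layer, this yields an output uncertainty estimate without sampling, which is the behavior the paper compares against moment matching.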


Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring

http://arxiv.org/abs/2402.08321v1

Compressor summary: The paper proposes a hybrid regularizer for exploration by optimization in online decision-making problems, improving regret bounds in both stochastic and adversarial environments.


The Paradox of Motion: Evidence for Spurious Correlations in Skeleton-based Gait Recognition Models

http://arxiv.org/abs/2402.08320v1

Compressor summary: Skeleton-based gait recognition models rely more on anthropometric information than on motion patterns, suggesting that existing benchmarks contain spurious correlations.


Explicit References to Social Values in Fairy Tales: A Comparison between Three European Cultures

http://arxiv.org/abs/2402.08318v1

Compressor summary: The study uses word embeddings to analyze how fairy tales from Portugal, Italy, and Germany differ in their references to values, revealing a possible shared cultural understanding across European societies.


CrossGaze: A Strong Method for 3D Gaze Estimation in the Wild

http://arxiv.org/abs/2402.08316v1

Compressor summary: CrossGaze is a novel gaze estimation method that uses existing computer vision models and attention modules to predict where people are looking without specialized architectures, achieving competitive results on the Gaze360 benchmark.


Approximating Families of Sharp Solutions to Fisher's Equation with Physics-Informed Neural Networks

http://arxiv.org/abs/2402.08313v1

Compressor summary: The paper uses physics-informed neural networks to solve Fisher's equation for traveling waves under large reaction rates, improving the method with a residual weighting scheme and an input-based network architecture.


One-to-many Reconstruction of 3D Geometry of cultural Artifacts using a synthetically trained Generative Model

http://arxiv.org/abs/2402.08310v1

Compressor summary: The approach generates detailed 3D representations from historic sketches and can be guided by text inputs, helping experts recreate lost artifacts.


Prompted Contextual Vectors for Spear-Phishing Detection

http://arxiv.org/abs/2402.08309v1

Compressor summary: The authors propose a novel document-vectorization method that uses an ensemble of LLMs to detect spear-phishing emails by analyzing their persuasion principles, achieving a 91% F1 score with limited training data.


ChatCell: Facilitating Single-Cell Analysis with Natural Language

http://arxiv.org/abs/2402.08303v1

Compressor summary: ChatCell is a natural language-based tool that leverages large language models to facilitate single-cell analysis and improve the accessibility of this pivotal field.


An Order-Complexity Aesthetic Assessment Model for Aesthetic-aware Music Recommendation

http://arxiv.org/abs/2402.08300v1

Compressor summary: The text proposes using Birkhoff's aesthetic measure to objectively evaluate AI-generated music and improve its quality and recommendations.
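Birkhoff's measure itself is a one-line formula; how the paper instantiates order O and complexity C for music is its contribution and is not reproduced here. A minimal sketch of the classical definition:

```python
def birkhoff_measure(order: float, complexity: float) -> float:
    """Birkhoff's aesthetic measure M = O / C: aesthetic value as the
    ratio of an object's order to its complexity."""
    if complexity <= 0:
        raise ValueError("complexity must be positive")
    return order / complexity

# A piece with more order per unit of complexity scores higher
m_high = birkhoff_measure(order=8.0, complexity=4.0)
m_low = birkhoff_measure(order=6.0, complexity=6.0)
```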


Time to Stop and Think: What kind of research do we want to do?

http://arxiv.org/abs/2402.08298v1

Compressor summary: The paper discusses the importance of rigorous and convincing experimentation in artificial intelligence research, especially metaheuristic optimization, and encourages critical assessment and reflection at both individual and community levels.


Multi-Level GNN Preconditioner for Solving Large Scale Problems

http://arxiv.org/abs/2402.08296v1

Compressor summary: The paper introduces a GNN-based preconditioner within a multi-level Domain Decomposition framework to enhance the efficiency and scalability of numerical simulations using GPU computations.


Learning semantic image quality for fetal ultrasound from noisy ranking annotation

http://arxiv.org/abs/2402.08294v1

Compressor summary: The paper proposes a robust model for ranking fetal ultrasound images based on semantic image quality and uncertainty, and compares it to existing methods.


The Effect of Data Poisoning on Counterfactual Explanations

http://arxiv.org/abs/2402.08290v1

Compressor summary: This paper examines how data poisoning can undermine the effectiveness of counterfactual explanations in analyzing and improving black-box systems.


A Logical Approach to Criminal Case Investigation

http://arxiv.org/abs/2402.08284v1

Compressor summary: XAI techniques can help forensic experts solve complex cases by explaining the reasons behind their conclusions and applying logical approaches to crime scene investigations.


Pix2Code: Learning to Compose Neural Visual Concepts as Programs

http://arxiv.org/abs/2402.08280v1

Compressor summary: Pix2Code is a framework that uses program synthesis and both symbolic and neural representations to learn abstract concepts from images in an unsupervised way, enabling generalizable and interpretable visual relational reasoning.


Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering

http://arxiv.org/abs/2402.08277v1

Compressor summary: This paper proposes a method to improve the reliability of large language models for answering questions based on evidence by fine-tuning them with synthetic data and quality filters.


Geometry-induced Implicit Regularization in Deep ReLU Neural Networks

http://arxiv.org/abs/2402.08269v1

Compressor summary: The study investigates how activation patterns in neural networks affect their optimization, finding that a lower batch functional dimension is favored, which may help reduce overfitting.


World Model on Million-Length Video And Language With RingAttention

http://arxiv.org/abs/2402.08268v1

Compressor summary: Key points:
- Current language models struggle with complex tasks that involve temporal information from videos
- The paper introduces a large dataset of diverse videos and books, the RingAttention technique, and other methods to train on long video and language sequences
- The paper achieves new benchmarks in retrieval tasks and video understanding, and releases open-source 7B-parameter models for multimodal training

Summary: The paper presents a novel approach to train large neural networks on long video and language sequences using a curated dataset, RingAttention, and other techniques, and demonstrates improved performance on complex tasks.


Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss

http://arxiv.org/abs/2402.08267v1

Compressor summary: The paper proposes a new training method for image coding for machines that improves both recognition and compression performance by adding auxiliary loss to the encoder.


A Dense Reward View on Aligning Text-to-Image Diffusion with Preference

http://arxiv.org/abs/2402.08265v1

Compressor summary: The paper proposes a method to improve text-to-image diffusion models by aligning them with user preferences using a fine-grained reward perspective that considers the initial steps of the generation process and introduces temporal discounting.


A Survey of Table Reasoning with Large Language Models

http://arxiv.org/abs/2402.08259v1

Compressor summary: The paper surveys table reasoning with large language models (LLMs), analyzing techniques, advantages, and future research directions.


Distal Interference: Exploring the Limits of Model-Based Continual Learning

http://arxiv.org/abs/2402.08255v1

Compressor summary: The study introduces a new ANN architecture, ABEL-Spline, that addresses distal interference and catastrophic forgetting in continual learning by ensuring uniformly trainable models without exponential growth in size.


Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles

http://arxiv.org/abs/2402.08251v1

Compressor summary: Key points:
- Neural network model for recognizing small objects in thermal images from drones
- Model consists of a backbone, neck, and prediction head using YOLOv5, BI-FPN, a transformer encoder, sliding windows, and an attention mechanism
- High accuracy and real-time speed on a public dataset and an embedded computer

Summary: The authors propose a neural network model that uses YOLOv5 and transformers to detect small objects in thermal images from drones with high accuracy and real-time speed.


A survey of recent methods for addressing AI fairness and bias in biomedicine

http://arxiv.org/abs/2402.08250v1

Compressor summary: Key points:
- AI systems can revolutionize clinical practices but may also perpetuate social inequities or biases
- The authors reviewed 55 articles on different debiasing methods for biomedical NLP and CV
- They discussed the strengths and weaknesses of each method and suggested other potential methods to address bias and improve fairness

Summary: The authors surveyed recent publications on debiasing methods for AI systems in biomedicine, discussing their pros and cons, and recommending further approaches to ensure accurate and reliable applications.


SepRep-Net: Multi-source Free Domain Adaptation via Model Separation And Reparameterization

http://arxiv.org/abs/2402.08249v1

Compressor summary: SepRep-Net is a framework for adapting multiple models to a new domain without source data, using separate pathways that are optimized together and reparameterized for inference.


APALU: A Trainable, Adaptive Activation Function for Deep Learning Networks

http://arxiv.org/abs/2402.08244v1

Compressor summary: APALU is a novel trainable activation function that improves learning performance in various deep-learning tasks by maintaining stability, efficiency, and adaptability.


Towards Equitable Agile Research and Development of AI and Robotics

http://arxiv.org/abs/2402.08242v1

Compressor summary: Key points:
- ML/AI methods can amplify biases and prejudices in robots and AI applications
- A culture of modularity prevents accountability for harms caused by AI
- The authors propose a framework to build organizational equity capabilities and detect/address fairness issues in AI development projects
- They adapt an Agile process based on Scrum as a primary example

Summary: The authors suggest a framework to integrate equity practices into AI development projects, using Scrum as an example, to prevent and mitigate biases and harms caused by ML/AI systems.


BERT4FCA: A Method for Bipartite Link Prediction using Formal Concept Analysis and BERT

http://arxiv.org/abs/2402.08236v1

Compressor summary: BERT4FCA is a novel method for predicting links in bipartite networks using formal concept analysis and BERT, which improves performance over previous FCA-based methods and other classical methods.


Causal Discovery under Off-Target Interventions

http://arxiv.org/abs/2402.08229v1

Compressor summary: The paper proposes a stochastic intervention model for causal graph discovery that minimizes the number of interventions needed, and presents approximation algorithms and experimental results.


Investigating Out-of-Distribution Generalization of GNNs: An Architecture Perspective

http://arxiv.org/abs/2402.08228v1

Compressor summary: This paper investigates how different GNN architectures affect Out-of-Distribution generalization on graphs and proposes a new model that leverages robust properties of self-attention and decoupling.


Privacy-Preserving Language Model Inference with Instance Obfuscation

http://arxiv.org/abs/2402.08227v1

Compressor summary: The text discusses a new method called Instance-Obfuscated Inference (IOI) that protects decision privacy in natural language understanding tasks using pre-trained models, while maintaining seamless operation and low overhead.


Improving Black-box Robustness with In-Context Rewriting

http://arxiv.org/abs/2402.08225v1

Compressor summary: LLM-TTA improves OOD robustness of NLP models by using LLM-generated augmentations without retraining or labeling, enhancing performance in sentiment, toxicity, and news classification tasks.


BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models

http://arxiv.org/abs/2402.08219v1

Compressor summary: BBox-Adapter is a novel method to adapt black-box LLMs for specific tasks using ranking-based NCE loss and online adaptation mechanism, improving performance and reducing costs.


Transformer Mechanisms Mimic Frontostriatal Gating Operations When Trained on Human Working Memory Tasks

http://arxiv.org/abs/2402.08211v1

Compressor summary: The study investigates how Transformer models achieve complex tasks by analyzing their self-attention mechanism, which resembles gating mechanisms in the human brain.


Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits

http://arxiv.org/abs/2402.08209v1

Compressor summary: The paper proposes an iterative method to quickly identify harmful data instances for data cleansing by using a thresholding bandit algorithm, with theoretical and empirical evidence of its effectiveness.
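The paper's exact algorithm is not given in the summary; below is a hedged sketch of the thresholding-bandit idea it describes: treat each training instance as an arm, draw noisy estimates of its (Shapley-style) marginal contribution, and stop sampling an arm once a Hoeffding-style confidence interval separates its mean from the harmfulness threshold. The function names and the specific confidence radius are illustrative, not the paper's.

```python
import math

def threshold_bandit(samplers, tau, eps, delta, max_pulls=10000):
    """Classify each instance (arm) as harmful (value < tau) or not.
    samplers[i]() returns one noisy marginal-contribution sample in [-1, 1].
    Arms are pulled only until their confidence interval clears tau."""
    n = len(samplers)
    means, counts = [0.0] * n, [0] * n
    undecided, harmful = set(range(n)), set()
    pulls = 0
    while undecided and pulls < max_pulls:
        i = min(undecided, key=lambda j: counts[j])  # pull least-sampled arm
        x = samplers[i]()
        counts[i] += 1
        means[i] += (x - means[i]) / counts[i]       # running mean
        pulls += 1
        # Hoeffding-style radius for values bounded in [-1, 1]
        r = math.sqrt(2.0 * math.log(2 * n * counts[i] ** 2 / delta) / counts[i])
        if means[i] + r < tau:                       # confidently below tau
            harmful.add(i)
            undecided.discard(i)
        elif means[i] - r > tau + eps:               # confidently above
            undecided.discard(i)
    return harmful

# Two arms: one with negative contribution, one with positive
flagged = threshold_bandit([lambda: -0.5, lambda: 0.5],
                           tau=0.0, eps=0.0, delta=0.1)
```

The savings come from adaptivity: arms far from the threshold are resolved after few pulls, so most of the sampling budget goes to borderline instances.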


Inherent Diverse Redundant Safety Mechanisms for AI-based Software Elements in Automotive Applications

http://arxiv.org/abs/2402.08208v1

Compressor summary: The paper discusses the challenges of using AI algorithms in autonomous driving systems, focusing on generalization issues and overconfidence risks, and proposes methods to improve their safety and reliability.


Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

http://arxiv.org/abs/2402.08207v1

Compressor summary: Key points:
- Road network extraction is important for high-definition maps and localization
- Existing methods struggle to merge Euclidean and non-Euclidean data domains
- The paper proposes a unified representation (RoadNet Sequence) and a non-autoregressive sequence-to-sequence model
- The approach outperforms existing alternatives on the nuScenes dataset

Summary: The paper introduces a novel way to represent road network data and a non-autoregressive model that efficiently and accurately generates high-definition maps from Euclidean and non-Euclidean data.


Confronting Discrimination in Classification: Smote Based on Marginalized Minorities in the Kernel Space for Imbalanced Data

http://arxiv.org/abs/2402.08202v1

Compressor summary: The text discusses a novel approach to improve fraud detection by adaptively oversampling critical samples in the kernel space based on their distance to the decision boundary and surrounding sample density.
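The paper's boundary- and density-aware weighting in kernel space is not reproduced here; as a baseline for reference, the classic SMOTE interpolation step that such methods build on can be sketched like this (names are illustrative):

```python
import random

def smote_sample(x, neighbor, rng=random):
    """Classic SMOTE step: synthesize a new minority-class sample at a
    random point on the segment between a minority point and a minority
    neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

random.seed(0)
synthetic = smote_sample([0.0, 1.0], [2.0, 3.0])
```

The proposed method replaces the uniform choice of minority points with an adaptive one, oversampling more near the decision boundary and in sparse regions.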


Fine-Tuning Text-To-Image Diffusion Models for Class-Wise Spurious Feature Generation

http://arxiv.org/abs/2402.08200v1

Compressor summary: The paper proposes a method to generate spurious features using large-scale text-to-image diffusion models and a new similarity loss, which can create consistent and visually similar spurious images across different classifiers.


Optimized Information Flow for Transformer Tracking

http://arxiv.org/abs/2402.08195v1

Compressor summary: OIFTrack is a one-stream Transformer tracker that optimizes the information flow between target-template and search tokens to improve discriminative capability, achieving outstanding performance on challenging benchmarks.


Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional Systems

http://arxiv.org/abs/2402.08193v1

Compressor summary: GEnBP is a fusion of Ensemble Kalman filter and Gaussian belief propagation methods that efficiently infers high-dimensional models by handling complex dependence structures and distributed computing with low-rank local messages.


Learning time-dependent PDE via graph neural networks and deep operator network for robust accuracy on irregular grids

http://arxiv.org/abs/2402.08187v1

Compressor summary: GraphDeepONet is a new model that uses graph neural networks to improve deep learning-based prediction of solutions for partial differential equations, handling irregular grids and enabling time extrapolation.


Advancing Data-driven Weather Forecasting: Time-Sliding Data Augmentation of ERA5

http://arxiv.org/abs/2402.08185v1

Compressor summary: The paper presents a novel strategy that uses low-resolution data for global weather prediction and shows its effectiveness, efficiency, and potential in climate change studies.


Enabling Multi-Agent Transfer Reinforcement Learning via Scenario Independent Representation

http://arxiv.org/abs/2402.08184v1

Compressor summary: The study introduces a framework for transfer learning in MARL using unified state spaces and evaluates it on StarCraft scenarios, showing improved performance compared to learning from scratch or without curriculum.


Pixel Sentence Representation Learning

http://arxiv.org/abs/2402.08183v1

Compressor summary: The authors propose an unsupervised visual sentence representation learning framework that uses visually-grounded text perturbations and achieves comparable performance to existing methods in semantic textual similarity, with cross-lingual transferability.


Variational Continual Test-Time Adaptation

http://arxiv.org/abs/2402.08182v1

Compressor summary: VCoTTA is a variational Bayesian method that reduces error propagation in Continual Test-Time Adaptation by measuring uncertainties and updating the student model with priors from both source and teacher models.


Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss

http://arxiv.org/abs/2402.08180v1

Compressor summary: The paper presents a general framework for online structured prediction with surrogate losses, improving regret bounds for multiclass classification and extending it to other problems using randomized decoding.
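For context, the Fenchel--Young loss generated by a convex regularizer $\Omega$ (the family this framework is built on) has the standard form

```latex
L_{\Omega}(\theta; y) \;=\; \Omega^{*}(\theta) + \Omega(y) - \langle \theta, y \rangle,
\qquad
\Omega^{*}(\theta) \;=\; \sup_{\mu \in \mathrm{dom}\,\Omega} \; \langle \theta, \mu \rangle - \Omega(\mu),
```

where $\Omega^{*}$ is the convex conjugate of $\Omega$. Taking $\Omega$ to be the negative Shannon entropy restricted to the probability simplex recovers the logistic (softmax cross-entropy) loss, the special case highlighted in the title.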


LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents

http://arxiv.org/abs/2402.08178v1

Compressor summary: The paper proposes a benchmark system to compare language-oriented task planners for home-service embodied agents, testing them on different datasets and simulators.


Hierarchical Position Embedding of Graphs with Landmarks and Clustering for Link Prediction

http://arxiv.org/abs/2402.08174v1

Compressor summary: The paper proposes a method called HPLC that uses landmarks, or high-degree nodes, to represent positional information in graphs and improve link prediction performance.
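HPLC's actual embedding combines landmarks with clustering; as a minimal illustration of the landmark idea alone, each node can be featurized by its BFS distance to the highest-degree nodes (all names below are illustrative):

```python
from collections import deque

def landmark_features(adj, num_landmarks):
    """Positional features for link prediction: describe each node by its
    BFS distance to the num_landmarks highest-degree nodes (landmarks).
    adj maps each node to a list of its neighbors."""
    landmarks = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:num_landmarks]
    feats = {v: [] for v in adj}
    for lm in landmarks:
        dist = {lm: 0}
        queue = deque([lm])
        while queue:                      # BFS from the landmark
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for v in adj:
            feats[v].append(dist.get(v, -1))  # -1 if unreachable
    return feats
```

Two nodes' feature vectors then encode their relative positions in the graph, which a link predictor can compare cheaply.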


LLaGA: Large Language and Graph Assistant

http://arxiv.org/abs/2402.08170v1

Compressor summary: The LLaGA model combines Large Language Models and Graph Neural Networks to analyze graph-structured data effectively, adapting graphs into sequences and token embeddings with a versatile projector.


Group Decision-Making among Privacy-Aware Agents

http://arxiv.org/abs/2402.08156v1

Compressor summary: The paper proposes a method to enable efficient social learning while preserving individual privacy using differential privacy and log-linear rules for information exchange.
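The log-linear aggregation rules are the paper's contribution and are not sketched here; the differential-privacy side of such schemes typically relies on the standard Laplace mechanism, shown below as a generic illustration (not the paper's protocol):

```python
import math
import random

def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Epsilon-differential privacy via the Laplace mechanism: release
    value plus Laplace(0, sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5                 # uniform in [-0.5, 0.5)
    sgn = 1.0 if u >= 0 else -1.0
    noise = -scale * sgn * math.log(1.0 - 2.0 * abs(u))  # inverse-CDF sample
    return value + noise

# Noisy releases of a private statistic; the noise averages out over many draws
random.seed(0)
releases = [laplace_mechanism(10.0, sensitivity=1.0, epsilon=1.0) for _ in range(5000)]
```

Smaller epsilon means more noise per release and stronger privacy, which is exactly the tension with efficient social learning that the paper studies.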


CMA-R: Causal Mediation Analysis for Explaining Rumour Detection

http://arxiv.org/abs/2402.08155v1

Compressor summary: The authors propose a method (CMA-R) to analyze how neural models detect rumors on Twitter by identifying the causal effects of tweets and words, and show that it agrees with human judgments and improves interpretability.


Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings

http://arxiv.org/abs/2402.08145v1

Compressor summary: The paper presents a new method for sequential decision-making systems to adapt to non-stationary stochastic environments using relational representations, exploration, and probabilistic model learning.


H2O-SDF: Two-phase Learning for 3D Indoor Reconstruction using Object Surface Fields

http://arxiv.org/abs/2402.08138v1

Compressor summary: H2O-SDF uses a new learning method with Object Surface Field to reconstruct 3D indoor scenes with accurate room layouts and detailed object surfaces, overcoming previous limitations.


Randomized Algorithms for Symmetric Nonnegative Matrix Factorization

http://arxiv.org/abs/2402.08134v1

Compressor summary: Key points:
- SymNMF approximates a symmetric matrix with a product of two nonnegative matrices
- Two randomized algorithms for SymNMF are developed: one using matrix sketching and one using leverage score sampling
- Both methods are applied to graph clustering tasks on large data sets and achieve speed-ups while preserving quality

Summary: The paper proposes two fast and scalable algorithms for SymNMF, a technique that factors symmetric matrices, and demonstrates their effectiveness on graph clustering problems.