arxiv compressed, 2024-04-17

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-04-17, generated by the compressor, my personal LLM-based project.


Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback

http://arxiv.org/abs/2404.10776v1

Compressor summary: The paper proposes a robust algorithm for learning from human preference feedback in generative models, even when some of the feedback is adversarially manipulated.


COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

http://arxiv.org/abs/2404.10775v1

Compressor summary: The paper proposes a method for multi-agent cooperation using a compositional world model that generates videos from partial observations and enables online planning.


MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

http://arxiv.org/abs/2404.10774v1

Compressor summary: The authors propose a method to train smaller models with low cost that can fact-check LLM outputs by using synthetic data generated with GPT-4.


Gaussian Opacity Fields: Efficient and Compact Surface Reconstruction in Unbounded Scenes

http://arxiv.org/abs/2404.10772v1

Compressor summary: The paper introduces Gaussian Opacity Fields (GOF), a new method for efficient and high-quality surface reconstruction from 3D Gaussians, using ray-tracing-based volume rendering and marching tetrahedra.


TENG: Time-Evolving Natural Gradient for Solving PDEs with Deep Neural Net

http://arxiv.org/abs/2404.10771v1

Compressor summary: The paper presents Time-Evolving Natural Gradient (TENG), a method that uses neural networks to solve partial differential equations (PDEs) with high accuracy by optimizing variational principles and time integration.


RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting

http://arxiv.org/abs/2404.10765v1

Compressor summary: The paper proposes RefFusion, a 3D inpainting method that uses a reference image to enable high-quality synthesis and control over the reconstructed scene, achieving state-of-the-art results for various tasks.


LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

http://arxiv.org/abs/2404.10763v1

Compressor summary: The paper proposes a new diffusion model, LaDiC, for image captioning that leverages a latent space for captions and improves performance without pre-training or extra modules.


TorchSurv: A Lightweight Package for Deep Survival Analysis

http://arxiv.org/abs/2404.10761v1

Compressor summary: TorchSurv is a lightweight Python package that helps create custom deep survival models with PyTorch, especially for complex high-dimensional data.


Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark

http://arxiv.org/abs/2404.10760v1

Compressor summary: This work introduces a large-scale COCO-AD dataset for anomaly detection, new evaluation metrics, and an effective InvAD framework for reconstruction-based methods.


Laplace-HDC: Understanding the geometry of binary hyperdimensional computing

http://arxiv.org/abs/2404.10759v1

Compressor summary: The paper explores binary hyperdimensional computing, introduces a new encoding method called Laplace-HDC that improves accuracy, and discusses its limitations and potential solutions for image processing.


Watch Your Step: Optimal Retrieval for Continual Learning at Scale

http://arxiv.org/abs/2404.10758v1

Compressor summary: This paper proposes a framework for evaluating and improving selective retrieval strategies in continual learning using replay buffers.


Settling Constant Regrets in Linear Markov Decision Processes

http://arxiv.org/abs/2404.10745v1

Compressor summary: The paper proposes a new RL algorithm, Cert-LSVI-UCB, that achieves constant regret guarantees for linear MDPs with misspecified transition kernels and rewards, and provides novel analysis techniques.


N-Agent Ad Hoc Teamwork

http://arxiv.org/abs/2404.10740v1

Compressor summary: The paper introduces N-agent ad hoc teamwork, a new multi-agent reinforcement learning problem, and proposes POAM, an algorithm that learns representations of teammate behaviors for cooperative task adaptation.


Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration

http://arxiv.org/abs/2404.10733v1

Compressor summary: BLR-HAC is a method that combines offline data and online logistic regression to initialize and update policies for human-agent collaboration tasks.
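The general pattern of warm-starting an online logistic regression from offline data can be sketched generically; everything below (data, learning rates, weight names) is a synthetic illustration, not BLR-HAC's actual policy parameterization:

```python
import numpy as np

# Hedged sketch of offline initialization + online logistic-regression updates.
rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

true_w = np.array([2.0, -1.0])  # hidden "ground-truth" preference weights

# "Offline" phase: fit a rough initial weight vector with batch gradient steps.
X_off = rng.standard_normal((200, 2))
y_off = (sigmoid(X_off @ true_w) > rng.random(200)).astype(float)
w = np.zeros(2)
for _ in range(50):
    w += 0.1 * X_off.T @ (y_off - sigmoid(X_off @ w)) / len(y_off)

# "Online" phase: single-sample SGD updates as new interactions arrive.
for _ in range(2000):
    x = rng.standard_normal(2)
    y = float(sigmoid(x @ true_w) > rng.random())
    w += 0.05 * (y - sigmoid(x @ w)) * x  # standard online logistic update

print(w)  # drifts toward true_w = [2, -1]
```

The appeal of a linear model here is exactly this cheap per-observation update: no replay buffer or retraining pass is needed to adapt online.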


What is Meant by AGI? On the Definition of Artificial General Intelligence

http://arxiv.org/abs/2404.10731v1

Compressor summary: The paper seeks a common definition of AGI, proposing to define it as the ability to adapt to open environments under limited resources by following certain principles of intelligence.


Insight Gained from Migrating a Machine Learning Model to Intelligence Processing Units

http://arxiv.org/abs/2404.10730v1

Compressor summary: The paper explores IPUs as accelerators for ML in materials science and battery research, using a CNN model for predicting effective conductivity with comparable performance to GPUs.


Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

http://arxiv.org/abs/2404.10719v1

Compressor summary: The paper compares reward-based (PPO) and reward-free (DPO) methods for aligning large language models with human preferences using theoretical, empirical, and benchmarking studies.


GazeHTA: End-to-end Gaze Target Detection with Head-Target Association

http://arxiv.org/abs/2404.10718v1

Compressor summary: The proposed GazeHTA method detects multiple head-target associations in a scene using a pre-trained diffusion model, enhanced head features, and a connection map for improved gaze target detection.


Mixed Prototype Consistency Learning for Semi-supervised Medical Image Segmentation

http://arxiv.org/abs/2404.10717v1

Compressor summary: MPCL is a semi-supervised medical image segmentation method that uses mixed prototypes to improve class embeddings and achieve better performance than existing methods.


MOWA: Multiple-in-One Image Warping Model

http://arxiv.org/abs/2404.10716v1

Compressor summary: MOWA is a novel image warping model that handles multiple warping tasks within a single model and generalizes well to different scenarios.


A Plausibility Study of Using Augmented Reality in the Ventriculoperitoneal Shunt Operations

http://arxiv.org/abs/2404.10713v1

Compressor summary: This paper explores augmented reality techniques for medical surgeries, focusing on a new approach to ventriculoperitoneal shunt operations using 3D models and the Microsoft HoloLens 2.


Dual Modalities of Text: Visual and Textual Generative Pre-training

http://arxiv.org/abs/2404.10710v1

Compressor summary: The paper introduces a pre-training framework that combines visual and textual data to improve pixel-based language models.


Question Difficulty Ranking for Multiple-Choice Reading Comprehension

http://arxiv.org/abs/2404.10704v1

Compressor summary: The text explores automated methods for ranking multiple-choice reading-comprehension questions by difficulty in English learning tests, comparing task-transfer and zero-shot approaches and finding zero-shot comparative assessment to be the most effective.


ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation

http://arxiv.org/abs/2404.10699v1

Compressor summary: ECLAIR is a large outdoor LiDAR dataset with 11 object categories for research in point cloud semantic segmentation.


Integrating knowledge bases to improve coreference and bridging resolution for the chemical domain

http://arxiv.org/abs/2404.10696v1

Compressor summary: Our method improves coreference and bridging resolution in chemical patents by using external knowledge in a multi-task learning model.


MathWriting: A Dataset For Handwritten Mathematical Expression Recognition

http://arxiv.org/abs/2404.10690v1

Compressor summary: MathWriting is the largest dataset of handwritten mathematical expressions, containing 630k samples that can be used for offline recognition and benchmarking.


Network architecture search of X-ray based scientific applications

http://arxiv.org/abs/2404.10689v1

Compressor summary: The authors propose an automated method to optimize neural network models for X-ray and electron microscopy using hyperparameter and architecture search, achieving improved performance and reduced resource usage.


Efficient Conditional Diffusion Model with Probability Flow Sampling for Image Super-resolution

http://arxiv.org/abs/2404.10688v1

Compressor summary: The paper proposes Efficient Conditional Diffusion Model with Probability Flow Sampling, a fast and high-quality image super-resolution method that uses a continuous-time conditional diffusion model and a hybrid parametrization for the denoiser network.
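Probability-flow sampling in general replaces the stochastic reverse diffusion with a deterministic ODE driven by the score. A toy sketch with an analytic Gaussian score follows; the VE schedule, `s0`, and `sigma_max` are illustrative stand-ins, not the paper's conditional super-resolution model, where the score would come from a trained denoiser network:

```python
import numpy as np

# Integrate the probability-flow ODE  dx/dt = -0.5 * d(sigma^2)/dt * score(x, t)
# backward from t=1 to t=0 with Euler steps, for Gaussian data N(0, s0^2).
rng = np.random.default_rng(1)
s0, sigma_max, steps = 1.0, 10.0, 1000

def sigma2(t):            # noise schedule sigma(t)^2 = (sigma_max * t)^2
    return (sigma_max * t) ** 2

def score(x, t):          # analytic score of p_t = N(0, s0^2 + sigma(t)^2)
    return -x / (s0 ** 2 + sigma2(t))

x = rng.standard_normal(50_000) * np.sqrt(s0 ** 2 + sigma2(1.0))  # x ~ p_1
dt = 1.0 / steps
for i in range(steps, 0, -1):
    t = i * dt
    dsigma2_dt = 2 * sigma_max ** 2 * t
    x = x - dt * (-0.5 * dsigma2_dt * score(x, t))  # Euler step backward in t

print(x.std())  # should end close to s0 = 1.0, the data std
```

Because the ODE is deterministic, each noise draw maps to a fixed sample, which is what makes probability-flow sampling fast and reproducible compared with ancestral diffusion sampling.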


Generating Human Interaction Motions in Scenes with Text Control

http://arxiv.org/abs/2404.10685v1

Compressor summary: TeSMo is a method that generates realistic and diverse human-object interactions in different scenes using denoising diffusion models and detailed scene information.


Driver Fatigue Prediction using Randomly Activated Neural Networks for Smart Ridesharing Platforms

http://arxiv.org/abs/2404.10684v1

Compressor summary: The paper proposes a new heuristic (DDS) and model (stochastic neural network with random activations) to predict how ridesharing drivers make decisions as they experience fatigue and cognitive decline during their shifts, outperforming existing methods in simulations and real data.


Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning

http://arxiv.org/abs/2404.10683v1

Compressor summary: The paper proposes a new reinforcement learning method, CAOSD, to optimize portfolios with allocation constraints, such as investing in green technologies while limiting fossil energy sector exposure, and shows it outperforms existing methods on real-world data.


StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization

http://arxiv.org/abs/2404.10681v1

Compressor summary: StyleCity is a system that stylizes large-scale urban scenes from vision and text references, generating harmonious backgrounds and enhancing semantic consistency for virtual production prototyping.


VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

http://arxiv.org/abs/2404.10667v1

Compressor summary: VASA is a framework that generates realistic talking faces from images and audio, with high quality and fast performance, enabling engaging avatar interactions.


Assessing The Impact of CNN Auto Encoder-Based Image Denoising on Image Classification Tasks

http://arxiv.org/abs/2404.10664v1

Compressor summary: The study proposes a novel method for defect detection in noisy images using deep learning models and denoising techniques, achieving significant improvements in accuracy compared to previous methods.


Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay

http://arxiv.org/abs/2404.10662v1

Compressor summary: The paper proposes a dual generative replay framework for continual offline reinforcement learning that retains previous knowledge and mitigates forgetting by replaying high-fidelity samples of past tasks.


ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images

http://arxiv.org/abs/2404.10652v1

Compressor summary: The authors introduce ViTextVQA, a Vietnamese visual question answering dataset focused on understanding text in images, and improve model performance by studying the order in which OCR tokens are processed.


Efficient Parking Search using Shared Fleet Data

http://arxiv.org/abs/2404.10646v1

Compressor summary: The paper explores how sharing parking spot availability data within a vehicle fleet can help drivers find free spots faster in smart cities.


Continuous Control Reinforcement Learning: Distributed Distributional DrQ Algorithms

http://arxiv.org/abs/2404.10645v1

Compressor summary: Distributed Distributional DrQ is a model-free RL algorithm that uses distributional value functions and distributed actor policies to improve performance in continuous control tasks.


Self-playing Adversarial Language Game Enhances LLM Reasoning

http://arxiv.org/abs/2404.10642v1

Compressor summary: The authors study how self-play in an adversarial language game called SPAG can improve large language models' reasoning ability on various benchmarks.


Contextrast: Contextual Contrastive Learning for Semantic Segmentation

http://arxiv.org/abs/2404.10633v1

Compressor summary: Contextrast is a semantic segmentation method that uses contrastive learning to capture local/global contexts and their relationships, improving performance on various datasets.


HLAT: High-quality Large Language Model Pre-trained on AWS Trainium

http://arxiv.org/abs/2404.10630v1

Compressor summary: The paper presents HLAT, a large language model pre-trained on AWS Trainium using Neuron Distributed Training Library (NDTL), achieving comparable performance to baseline models trained on GPUs and TPUs.


Exploring selective image matching methods for zero-shot and few-sample unsupervised domain adaptation of urban canopy prediction

http://arxiv.org/abs/2404.10626v1

Compressor summary: We propose and test simple methods to adapt a trained UNet for canopy cover and height estimation across different geographic settings using remotely sensed data, achieving better results than baselines and existing approaches.


Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks

http://arxiv.org/abs/2404.10625v1

Compressor summary: The paper proposes a method that combines NeRF-based 3D GANs with 3D Gaussian Splatting, achieving high rendering quality while enabling real-time rendering and editing.


PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

http://arxiv.org/abs/2404.10620v1

Compressor summary: PyTorchGeoNodes is a module for 3D object reconstruction from images using shape programs, allowing for semantic reasoning and optimization.


Private Attribute Inference from Images with Vision-Language Models

http://arxiv.org/abs/2404.10618v1

Compressor summary: The text discusses the privacy risks posed by large language and multimodal vision-language models that can accurately infer personal attributes from benign images posted online.


Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

http://arxiv.org/abs/2404.10603v1

Compressor summary: CorrespondentDream is an effective method that uses cross-view correspondences from the diffusion U-Net as an additional 3D prior for NeRF models, improving their geometric fidelity and common-sense coherence.


Intra-operative tumour margin evaluation in breast-conserving surgery with deep learning

http://arxiv.org/abs/2404.10600v1

Compressor summary: The study proposes and evaluates an intra-operative tumour margin evaluation scheme that combines specimen mammography, deep learning, and image thresholding to reduce the risk of local recurrence after breast-conserving surgery.


Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

http://arxiv.org/abs/2404.10595v1

Compressor summary: CODA-LM is a vision-language benchmark for self-driving that evaluates large language models' abilities in interpretable autonomous driving scenarios, especially challenging road corner cases.


Do Counterfactual Examples Complicate Adversarial Training?

http://arxiv.org/abs/2404.10588v1

Compressor summary: The study uses diffusion models to analyze how robust classifiers handle semantically altered data and finds that they struggle with low-norm counterfactual examples, suggesting a link between non-robustness and semantic features.


ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig

http://arxiv.org/abs/2404.10584v1

Compressor summary: The paper introduces a new dataset, ReWiTe, that uses a hardware setup with two cellphones to capture authentic wide-angle and telephoto images for training deep learning methods in dual camera system fusion tasks, improving performance over existing synthetic datasets.


The application of Augmented Reality (AR) in Remote Work and Education

http://arxiv.org/abs/2404.10579v1

Compressor summary: This paper explores how Augmented Reality (AR) technology can improve remote work and online education by analyzing its features, advantages, challenges, scientific basis, technical support, performance, influencing factors, and future trends.


EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

http://arxiv.org/abs/2404.10575v1

Compressor summary: EMC^2 is an efficient Markov Chain Monte Carlo method for generating negative samples in contrastive learning, which achieves low computation and memory cost, global convergence, and competitive performance.


Uncertainty-guided Open-Set Source-Free Unsupervised Domain Adaptation with Target-private Class Segregation

http://arxiv.org/abs/2404.10574v1

Compressor summary: This paper proposes a novel approach for source-free open-set domain adaptation that improves target-private sample segregation and robustness using clustering, uncertainty-based selection, and a new contrastive loss.


AAVDiff: Experimental Validation of Enhanced Viability and Diversity in Recombinant Adeno-Associated Virus (AAV) Capsids through Diffusion Generation

http://arxiv.org/abs/2404.10573v1

Compressor summary: The study presents an end-to-end diffusion model that generates rAAV capsid sequences with enhanced viability and diversity, outperforming traditional methods even in the absence of AAV9 capsid data and improving the specificity and transduction efficiency of gene therapy.


Label merge-and-split: A graph-colouring approach for memory-efficient brain parcellation

http://arxiv.org/abs/2404.10572v1

Compressor summary: Label merge-and-split reduces the number of labels for whole brain parcellation, improves accuracy and efficiency, and can be used in other semantic segmentation tasks.


CMU-Flownet: Exploring Point Cloud Scene Flow Estimation in Occluded Scenario

http://arxiv.org/abs/2404.10571v1

Compressor summary: The paper proposes CMU-Flownet, a model that handles occlusions in LiDAR data by using a Correlation Matrix to estimate point similarity and an Occlusion-aware Cost Volume mechanism for better flow estimation.


HiGraphDTI: Hierarchical Graph Representation Learning for Drug-Target Interaction Prediction

http://arxiv.org/abs/2404.10561v1

Compressor summary: The paper proposes a novel deep learning method (HiGraphDTI) for predicting drug-target interactions that leverages hierarchical graph representations to capture chemical information from atoms, motifs, and molecules, as well as an attentional feature fusion module and a hierarchical attention mechanism for interpreting interaction mechanisms.


Construction of Domain-specified Japanese Large Language Model for Finance through Continual Pre-training

http://arxiv.org/abs/2404.10555v1

Compressor summary: The study developed a Japanese financial-specific large language model by continually pre-training it with custom datasets, improving its performance and quality of outputs compared to the original model.


Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning

http://arxiv.org/abs/2404.10552v1

Compressor summary: The text warns that large language models without ethical alignment can execute malicious instructions, posing significant risks and demanding improved security measures.


The Evolution of Learning: Assessing the Transformative Impact of Generative AI on Higher Education

http://arxiv.org/abs/2404.10551v1

Compressor summary: The paper investigates how generative AI models like ChatGPT influence higher education, exploring benefits, drawbacks, and transformative changes through a survey and scenario analysis of students' perspectives and attitudes.


Analytical Approximation of the ELBO Gradient in the Context of the Clutter Problem

http://arxiv.org/abs/2404.10550v1

Compressor summary: The paper presents a method to approximate the gradient of the Evidence Lower Bound in Bayesian networks with clutter problems using the reparameterization trick and local likelihood factor approximations, which is faster and more accurate than classical methods.
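The reparameterization trick the summary mentions is a standard technique: write z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1), so the gradient of E_q[f(z)] can be estimated pathwise. A minimal, paper-agnostic sketch (the toy integrand f and the Gaussian q are illustrative, not the paper's clutter-problem setup):

```python
import numpy as np

# Pathwise (reparameterized) gradient estimate of E_q[f(z)] for q = N(mu, sigma^2):
#   d/dmu    E[f(z)] = E_eps[f'(mu + sigma*eps)]
#   d/dsigma E[f(z)] = E_eps[f'(mu + sigma*eps) * eps]
rng = np.random.default_rng(0)

def f(z):        # stand-in integrand, e.g. a log-joint term up to a constant
    return -0.5 * (z - 2.0) ** 2

def f_prime(z):
    return -(z - 2.0)

mu, sigma = 0.5, 1.0
eps = rng.standard_normal(100_000)
z = mu + sigma * eps

grad_mu = f_prime(z).mean()             # pathwise gradient w.r.t. mu
grad_sigma = (f_prime(z) * eps).mean()  # pathwise gradient w.r.t. sigma
print(grad_mu, grad_sigma)  # close to the exact values 1.5 and -1.0
```

For this quadratic f the exact gradients are -(mu - 2) = 1.5 and -sigma = -1.0, so the Monte Carlo estimates can be checked directly; the paper's contribution is replacing such sampling with an analytical approximation.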


A/B testing under Interference with Partial Network Information

http://arxiv.org/abs/2404.10547v1

Compressor summary: The paper introduces UNITE, a new method to estimate global average treatment effects in A/B tests with social connections by using only information about neighbors without knowing the exact network structure.


SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

http://arxiv.org/abs/2404.10527v1

Compressor summary: SPVLoc is a global indoor localization method that uses a convolutional network to match a query image with a panoramic semantic layout of the environment and estimate its 6D camera pose.


MobileNetV4 - Universal Models for the Mobile Ecosystem

http://arxiv.org/abs/2404.10518v1

Compressor summary: MobileNetV4 is an efficient and versatile architecture for mobile devices that uses new blocks and search techniques to achieve high accuracy on various accelerators.


CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity

http://arxiv.org/abs/2404.10513v1

Compressor summary: The text introduces a new method that uses Chain-of-Thought reasoning to generate more accurate attributions in QA systems.


Four-hour thunderstorm nowcasting using deep diffusion models of satellite

http://arxiv.org/abs/2404.10512v1

Compressor summary: The text describes a new AI-based system (DDMS) for accurate and efficient convection nowcasting using diffusion processes and geostationary satellite data, which improves forecast lead time, coverage, and resolution compared to existing methods.


White Men Lead, Black Women Help: Uncovering Gender, Racial, and Intersectional Bias in Language Agency

http://arxiv.org/abs/2404.10508v1

Compressor summary: The study finds significant social biases in human and AI-generated texts based on gender, race, and intersectional identities, with minority groups experiencing lower levels of agency.


Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset

http://arxiv.org/abs/2404.10505v1

Compressor summary: The paper introduces RLKWiC, a new dataset of real-world knowledge work data that can help study and improve knowledge workers' productivity.


A Sentiment Analysis of Medical Text Based on Deep Learning

http://arxiv.org/abs/2404.10503v1

Compressor summary: The paper explores how different deep learning networks affect sentiment analysis of medical texts using pre-trained models like BERT, and finds that CNN models perform best on smaller datasets.


Self-Supervised Visual Preference Alignment

http://arxiv.org/abs/2404.10501v1

Compressor summary: The paper proposes an unsupervised method for improving vision-language models by generating and aligning responses with augmented image inputs, achieving high scores on complex reasoning tasks without GPT-4 supervision or human involvement.


When Emotional Stimuli meet Prompt Designing: An Auto-Prompt Graphical Paradigm

http://arxiv.org/abs/2404.10500v1

Compressor summary: This paper introduces an Auto-Prompt Graphical Paradigm (APGP) that combines both stimulating and framework prompts to enhance problem-solving capabilities of large language models across multiple domains using automated approaches.


Robust Noisy Label Learning via Two-Stream Sample Distillation

http://arxiv.org/abs/2404.10499v1

Compressor summary: TSSD is a sample selection framework for noisy label learning that uses PSD to generate reliable samples and MSP to mine semi-hard samples from uncertain data, improving network robustness.


LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System

http://arxiv.org/abs/2404.10498v1

Compressor summary: LAECIPS is a new edge-cloud framework for vision tasks that achieves high accuracy and low latency, adapting to dynamic IoT data streams with plug-and-play models and a hard-input mining strategy.


Teaching Chinese Sign Language with Feedback in Mixed Reality

http://arxiv.org/abs/2404.10490v1

Compressor summary: The study proposes a new sign language teaching model using real-time vision, mixed reality, and improved hand-posture reconstruction to provide an immersive and effective learning experience.


AbsGS: Recovering Fine Details for 3D Gaussian Splatting

http://arxiv.org/abs/2404.10484v1

Compressor summary: The paper analyzes the cause of over-reconstruction issue in 3D Gaussian Splatting technique and proposes a novel homodirectional view-space positional gradient criterion to split large Gaussians and recover fine details for better rendering quality.


Would You Trust an AI Doctor? Building Reliable Medical Predictions with Kernel Dropout Uncertainty

http://arxiv.org/abs/2404.10483v1

Compressor summary: The paper presents a new AI model that improves reliability on small medical datasets, addressing trust issues in healthcare AI.


BayesJudge: Bayesian Kernel Language Modelling with Confidence Uncertainty in Legal Judgment Prediction

http://arxiv.org/abs/2404.10481v1

Compressor summary: BayesJudge is a novel Bayesian approach using deep learning and Gaussian Processes to improve prediction confidence and accuracy for legal tasks.


Efficient optimal dispersed Haar-like filters for face detection

http://arxiv.org/abs/2404.10476v1

Compressor summary: The paper presents a novel method to efficiently detect faces using optimally configured dispersed Haar-like filters that balance between-class and within-class variance.


Conversations as a Source for Teaching Scientific Concepts at Different Education Levels

http://arxiv.org/abs/2404.10475v1

Compressor summary: The paper introduces a new data set from video transcripts to train language models for engaging and effective conversational teaching of scientific concepts across different audiences.


Toward a Realistic Benchmark for Out-of-Distribution Detection

http://arxiv.org/abs/2404.10474v1

Compressor summary: The paper proposes a new benchmark for evaluating out-of-distribution detection in deep neural networks, using ImageNet and Places365 datasets and varying the criteria for in-distribution classes.


DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion

http://arxiv.org/abs/2404.10464v1

Compressor summary: DeStein is a novel method that reduces toxic outputs from language models by altering their activation space representations using self-induced steering pairs and arithmetic operations, achieving better performance than previous methods with lower resource and time cost.


Advancing Long-Term Multi-Energy Load Forecasting with Patchformer: A Patch and Transformer-Based Approach

http://arxiv.org/abs/2404.10458v1

Compressor summary: Patchformer is a novel Transformer-based model that improves long-term multi-energy load forecasting by segmenting data into patches and capturing local and global dependencies.


Revealing data leakage in protein interaction benchmarks

http://arxiv.org/abs/2404.10457v1

Compressor summary: The text argues that machine learning for protein-protein interactions needs better evaluation strategies, data preparation, and structural similarity-based data splits to avoid overoptimistic evaluations and unfair benchmarking.


A Computer Vision-Based Quality Assessment Technique for the automatic control of consumables for analytical laboratories

http://arxiv.org/abs/2404.10454v1

Compressor summary: The paper proposes an AI-based automatic monitoring system for detecting anticoagulant substances in test tubes, which is competitive with existing models and could improve efficiency and sustainability in the production of plastic consumables.


Graph Neural Networks for Protein-Protein Interactions - A Short Survey

http://arxiv.org/abs/2404.10450v1

Compressor summary: This paper reviews different graph-based methods for predicting protein-protein interactions, discussing their applications and classifying them into two groups based on model structures.


SparseDM: Toward Sparse Efficient Diffusion Models

http://arxiv.org/abs/2404.10445v1

Compressor summary: The paper proposes a method that uses sparse masks and progressive sparsity training to improve the deployment efficiency of diffusion models on mobile devices, controlling the trade-off between FID and MACs and cutting MACs by 50% while keeping FID low.
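Magnitude-based binary masking is the generic idea behind weight sparsity; a minimal illustration of a 50% mask (not SparseDM's actual progressive training procedure, and the layer shape is arbitrary):

```python
import numpy as np

# Keep the largest-magnitude 50% of a dense layer's weights and zero the rest,
# which roughly halves the layer's multiply-accumulate operations (MACs)
# on hardware that can skip zeros.
rng = np.random.default_rng(3)
W = rng.standard_normal((64, 64))           # stand-in dense weight matrix

threshold = np.quantile(np.abs(W), 0.5)     # magnitude cutoff for 50% sparsity
mask = (np.abs(W) >= threshold).astype(W.dtype)
W_sparse = W * mask

print(mask.mean())  # fraction of weights kept, ~0.5
```

Progressive sparsity training, as the summary describes it, would tighten such a mask gradually during fine-tuning rather than pruning once, so the model can adapt to the lost weights.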


AGHINT: Attribute-Guided Representation Learning on Heterogeneous Information Networks with Transformer

http://arxiv.org/abs/2404.10443v1

Compressor summary: AGHINT is a new model for representing heterogeneous information networks, which improves node classification by considering attribute disparities and incorporating higher-order similar neighbor features.


1st Place Solution for ICCV 2023 OmniObject3D Challenge: Sparse-View Reconstruction

http://arxiv.org/abs/2404.10441v1

Compressor summary: The report describes the winning method for reconstructing 3D objects from few images using Pixel-NeRF, depth supervision, and positional encoding.


Language Proficiency and F0 Entrainment: A Study of L2 English Imitation in Italian, French, and Slovak Speakers

http://arxiv.org/abs/2404.10440v1

Compressor summary: The study examines how well second language learners of English can imitate native speakers' pitch variations in a reading task and finds that proficiency affects entrainment differently at individual and group levels.


The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement

http://arxiv.org/abs/2404.10438v1

Compressor summary: The paper presents a simple and effective method for pose refinement using pre-trained features, a particle filter, and a renderable scene representation.


Tree Bandits for Generative Bayes

http://arxiv.org/abs/2404.10436v1

Compressor summary: The paper proposes a self-aware framework that learns from past trials and errors to accelerate ABC rejection sampling in generative models with intractable likelihoods.


Explainable concept mappings of MRI: Revealing the mechanisms underlying deep learning-based brain disease classification

http://arxiv.org/abs/2404.10433v1

Compressor summary: The study investigates how deep neural networks learn to classify Alzheimer's patients from normal controls using quantitative maps, and finds that they focus on brain regions near the basal ganglia.


MEEL: Multi-Modal Event Evolution Learning

http://arxiv.org/abs/2404.10429v1

Compressor summary: MEEL is a new method to improve machines' ability to understand event relations across different data types by generating evolving graphs and using them for instruction tuning and guiding discrimination.


AudioProtoPNet: An interpretable deep learning model for bird sound classification

http://arxiv.org/abs/2404.10420v1

Compressor summary: The study adapts a deep learning model that can accurately classify bird species from acoustic signals and provide interpretable explanations of its decisions using prototypical patterns.


Disentangling Instructive Information from Ranked Multiple Candidates for Multi-Document Scientific Summarization

http://arxiv.org/abs/2404.10416v1

Compressor summary: This paper introduces summary candidates into Multi-Document Scientific Summarization (MDSS) to guide the decoding process, improve global information handling, and generate better summaries using a specialized pairwise comparison method and Conditional Variational Autoencoder.


Camera clustering for scalable stream-based active distillation

http://arxiv.org/abs/2404.10411v1

Compressor summary: The paper proposes a framework to create efficient video object detection models using self-training and knowledge distillation, and shows that clustering cameras improves accuracy of distilled models.


Adversarial Identity Injection for Semantic Face Image Synthesis

http://arxiv.org/abs/2404.10408v1

Compressor summary: The paper proposes an SIS architecture with cross-attention to generate realistic and identity-preserving faces using semantic, style, and identity features.


Comprehensive Survey of Model Compression and Speed up for Vision Transformers

http://arxiv.org/abs/2404.10407v1

Compressor summary: The study compares four techniques to optimize Vision Transformers (ViT) for resource-constrained environments, improving their performance and efficiency.


Integration of Self-Supervised BYOL in Semi-Supervised Medical Image Recognition

http://arxiv.org/abs/2404.10405v1

Compressor summary: The paper proposes an enhanced medical image recognition method by combining self-supervised and semi-supervised learning techniques, which improves accuracy when labeled data is limited.
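BYOL's core objective is standard and simple; the sketch below shows the published BYOL regression loss in numpy as general background, not the paper's training code.

```python
import numpy as np

def byol_loss(online_pred, target_proj):
    """BYOL regression objective: MSE between L2-normalized online
    predictions and target projections, equal to 2 - 2 * cosine similarity."""
    p = online_pred / np.linalg.norm(online_pred, axis=1, keepdims=True)
    z = target_proj / np.linalg.norm(target_proj, axis=1, keepdims=True)
    return float(2.0 - 2.0 * (p * z).sum(axis=1).mean())

# Identical views give zero loss; orthogonal views give the maximum of 2.
same = np.array([[1.0, 0.0], [0.0, 2.0]])
print(byol_loss(same, same))  # → 0.0 (up to float rounding)
```

In the semi-supervised setting the summary describes, this self-supervised signal can be computed on unlabeled medical images alongside a supervised loss on the labeled subset.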


Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

http://arxiv.org/abs/2404.10394v1

Compressor summary: Portrait3D is a novel text-to-3D-portrait generation framework that overcomes geometry issues by using a joint geometry-appearance prior and a pyramid tri-grid 3D representation.


Offline Trajectory Generalization for Offline Reinforcement Learning

http://arxiv.org/abs/2404.10393v1

Compressor summary: OTTO is a method to improve offline reinforcement learning by using World Transformers to predict dynamics and reward, and generating high-rewarded data simulations from offline data.


CNN-based explanation ensembling for dataset, representation and explanations evaluation

http://arxiv.org/abs/2404.10387v1

Compressor summary: The authors explore how combining different explanations from deep learning models can reveal more reliable patterns and improve evaluation of the model's behavior.


Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering

http://arxiv.org/abs/2404.10384v1

Compressor summary: The paper presents a pipeline that uses knowledge graphs to select reasoning paths for large language models, with a subgraph retrieval method based on CoT and PageRank, reducing the number of LLM calls while matching the performance of previous SOTA models on domain question answering.


Learning to Score Sign Language with Two-stage Method

http://arxiv.org/abs/2404.10383v1

Compressor summary: The paper proposes a two-stage method for sign language performance evaluation using pose reconstruction and smoothing methods, providing effective feedback and consistent results with professional assessments.


Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

http://arxiv.org/abs/2404.10378v1

Compressor summary: The paper overviews the 2nd edition of a face recognition challenge that explores using synthetic data to address privacy, bias, and performance issues in face recognition.


Know Yourself Better: Diverse Discriminative Feature Learning Improves Open Set Recognition

http://arxiv.org/abs/2404.10370v1

Compressor summary: The paper analyzes open set recognition methods and proposes a new approach that improves performance by leveraging feature diversity.


A Survey on Data-Driven Fault Diagnostic Techniques for Marine Diesel Engines

http://arxiv.org/abs/2404.10363v1

Compressor summary: This paper discusses the significance of fault diagnosis in marine diesel engines for maritime safety, efficiency, and reliability, focusing on subsystems, common issues, and data-driven methods.


Improving Bracket Image Restoration and Enhancement with Flow-guided Alignment and Enhanced Feature Aggregation

http://arxiv.org/abs/2404.10358v1

Compressor summary: The paper proposes a novel framework called IREANet, which uses optical flow and residual blocks to align and aggregate features from multiple low dynamic range images to restore high quality high dynamic range images.


Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models

http://arxiv.org/abs/2404.10357v1

Compressor summary: CoKnow is a framework that improves Prompt Learning for Vision-Language Models by using Multi-Knowledge Representation to enhance context optimization and achieve better performance on various downstream tasks.


Generating Counterfactual Trajectories with Latent Diffusion Models for Concept Discovery

http://arxiv.org/abs/2404.10356v1

Compressor summary: The study proposes a novel framework called CDCT to discover decision-relevant concepts from opaque deep learning models, which could improve trust and advance medical research.


Rethinking the Graph Polynomial Filter via Positive and Negative Coupling Analysis

http://arxiv.org/abs/2404.10353v1

Compressor summary: The paper introduces a novel basis for spectral GNNs, derived from Positive and Negative Coupling Analysis (PNCA) of the message propagation process, that incorporates graph information and decouples positive and negative activation; the resulting GSCNet matches or beats existing GNNs with less computational time.


Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards

http://arxiv.org/abs/2404.10346v1

Compressor summary: Self-Explore is a method that helps language models improve their reasoning skills by exploring the first mistake in a rationale and using it as feedback for further improvement, achieving significant gains compared to supervised fine-tuning on GSM8K and MATH datasets.


The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

http://arxiv.org/abs/2404.10343v1

Compressor summary: The paper reviews the NTIRE 2024 challenge on efficient single-image super-resolution, focusing on optimization aspects and outcomes, with four sub-tracks to evaluate runtime, FLOPs, parameters, and overall performance.


Referring Flexible Image Restoration

http://arxiv.org/abs/2404.10342v1

Compressor summary: The paper introduces Referring Flexible Image Restoration (RFIR), a task where models must remove specific degradation types from images with multiple degradations based on human commands, and contributes a synthetic dataset and a transformer-based model (TransRFIR) with two novel attention modules (MHASA and MHACA) that achieves state-of-the-art performance; code is released at https://github.com/GuanRunwei/FIR-CP.


Asset management, condition monitoring and Digital Twins: damage detection and virtual inspection on a reinforced concrete bridge

http://arxiv.org/abs/2404.10341v1

Compressor summary: The text describes a bridge incident in Norway where a structural defect was detected by Internet of Things sensors and Digital Twin technology, highlighting the benefits of online monitoring and condition-based maintenance for infrastructure management.


Intriguing Properties of Positional Encoding in Time Series Forecasting

http://arxiv.org/abs/2404.10337v1

Compressor summary: The paper proposes two new positional encodings for transformer-based time series forecasting methods and evaluates their performance in a dual-branch framework.
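For context, the fixed sinusoidal encoding from the original Transformer, which such proposals typically modify or replace, takes only a few lines; this is a generic sketch, not a reproduction of the paper's two proposed encodings.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encoding from the original Transformer.
    Each even dimension gets a sine, each odd dimension the matching cosine."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(96, 64)  # e.g. a 96-step look-back window
```

Because the encoding is input-independent, it injects position but no series-specific information, which is one reason time-series work explores alternatives.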


Efficiently Adversarial Examples Generation for Visual-Language Models under Targeted Transfer Scenarios using Diffusion Models

http://arxiv.org/abs/2404.10335v1

Compressor summary: AdvDiffVLM generates natural and effective adversarial examples for large visual-language models using diffusion models and GradCAM-guided Mask method, improving speed and robustness compared to existing methods.


Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning

http://arxiv.org/abs/2404.10332v1

Compressor summary: The paper proposes DFTG, a framework that generates targeted instruction data for different large vision-language models to address their specific hallucination issues and improve their performance on cross-modal tasks.


Towards Complex Ontology Alignment using Large Language Models

http://arxiv.org/abs/2404.10329v1

Compressor summary: The paper explores using large language models to automate the complex process of aligning ontologies in the Semantic Web, which is currently done manually by experts.


Graph neural network-based surrogate modelling for real-time hydraulic prediction of urban drainage networks

http://arxiv.org/abs/2404.10324v1

Compressor summary: The text proposes a graph neural network (GNN)-based surrogate model for predicting hydraulic states in urban drainage networks, which incorporates physical constraints and outperforms a fully-connected neural network (NN) model in accuracy and cost-effectiveness.


Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation

http://arxiv.org/abs/2404.10322v1

Compressor summary: Our method adapts a small adapter to rectify diverse target domain styles to the source domain for better few-shot semantic segmentation, using local-global style perturbations and cyclic domain alignment.


CARE to Compare: A real-world dataset for anomaly detection in wind turbine data

http://arxiv.org/abs/2404.10320v1

Compressor summary: The authors present a new dataset for wind turbine anomaly detection with detailed fault information, and propose a CARE scoring method to evaluate anomaly detection models.


Application of Deep Learning Methods to Processing of Noisy Medical Video Data

http://arxiv.org/abs/2404.10319v1

Compressor summary: The authors propose methods to improve cell counting in moving streams by adapting training and decision making processes.


SRGS: Super-Resolution 3D Gaussian Splatting

http://arxiv.org/abs/2404.10318v1

Compressor summary: The paper proposes SRGS, a method that improves 3D Gaussian Splatting for high-resolution novel view synthesis by densifying and learning texture features from low-resolution inputs using sub-pixel constraints and a pre-trained 2D super-resolution model.


LLMs4OM: Matching Ontologies with Large Language Models

http://arxiv.org/abs/2404.10317v1

Compressor summary: The text introduces LLMs4OM, a novel approach that uses large language models for ontology matching tasks and shows they can outperform traditional methods in data integration.


Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience

http://arxiv.org/abs/2404.10315v1

Compressor summary: The paper proposes LePe, a method to improve confidence expression in large language models by learning from past experience, addressing key problems and designing a complete pipeline for data preparation and sampling.


Awareness of uncertainty in classification using a multivariate model and multi-views

http://arxiv.org/abs/2404.10314v1

Compressor summary: The paper proposes an uncertainty-aware classification model that estimates the uncertainty of its own predictions and uses it for data augmentation and multimodal optimization.


OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

http://arxiv.org/abs/2404.10312v1

Compressor summary: The OmniSSR method uses Stable Diffusion and tangent projection to achieve high-resolution omnidirectional image super-resolution with fidelity and realness, without training or fine-tuning.


Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs

http://arxiv.org/abs/2404.10308v1

Compressor summary: HOMER is a new method that divides long inputs into smaller chunks, processes them collectively, and merges them using a hierarchical strategy to overcome the context limit of large language models without requiring training or expensive modifications.
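The chunk-then-merge idea can be sketched abstractly: process segments in pairs and halve their number each round until one remains. This is a generic bottom-up merge skeleton under my own assumptions about the control flow, not HOMER's actual algorithm.

```python
def hierarchical_merge(chunks, merge):
    """Bottom-up pairwise merging: each round combines adjacent segments,
    so n chunks are reduced to one in about log2(n) rounds."""
    while len(chunks) > 1:
        chunks = [
            merge(chunks[i], chunks[i + 1]) if i + 1 < len(chunks) else chunks[i]
            for i in range(0, len(chunks), 2)
        ]
    return chunks[0]

# With string concatenation as the merge step, order is preserved:
print(hierarchical_merge(["a", "b", "c", "d"], lambda x, y: x + y))  # → "abcd"
```

In a long-context setting, `merge` would be the expensive step (e.g. a forward pass that fuses two chunks' representations), and the hierarchy keeps each such step within the model's context limit.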


Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain

http://arxiv.org/abs/2404.10307v1

Compressor summary: The paper presents a method that uses SegGPT and learnable prompts for few-shot segmentation, addressing catastrophic forgetting, object sizes, and discontinuities, with image similarity search for inference.


Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

http://arxiv.org/abs/2404.10306v1

Compressor summary: CoFiTune is a framework that balances speciality and versatility in aligned large language models by updating specific modules and using soft-masking, achieving better performance across diverse tasks.


TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content

http://arxiv.org/abs/2404.10305v1

Compressor summary: The research proposes an end-to-end deep learning pipeline for recognizing tables in document images, improving accuracy and efficiency over existing methods.


Clustering and Data Augmentation to Improve Accuracy of Sleep Assessment and Sleep Individuality Analysis

http://arxiv.org/abs/2404.10299v1

Compressor summary: This study developed a machine learning model that uses sleep sounds to accurately assess sleep quality and identifies the specific sound events and timing that affect individual sleep satisfaction.


Future Language Modeling from Temporal Document History

http://arxiv.org/abs/2404.10297v1

Compressor summary: The text introduces future language modeling, a task of predicting future texts based on their temporal history, which can be useful for various human activities and improve upon existing non-temporal language models.


Engineering software 2.0 by interpolating neural networks: unifying training, solving, and calibration

http://arxiv.org/abs/2404.10296v1

Compressor summary: The interpolating neural network (INN) is a new AI approach that uses interpolation points in physical space to improve software programming, reducing parameters, increasing accuracy, and addressing data challenges.


From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search

http://arxiv.org/abs/2404.10292v1

Compressor summary: The paper proposes a Filtering-WoRA method to efficiently train person search models using synthetic data with minimal but effective data samples and fine-tuning, improving retrieval performance and reducing training time.


Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning

http://arxiv.org/abs/2404.10282v1

Compressor summary: Tripod is a neural network autoencoder with three complementary inductive biases that improve disentangled representation learning, achieving state-of-the-art results on image benchmarks.


EucliDreamer: Fast and High-Quality Texturing for 3D Models with Depth-Conditioned Stable Diffusion

http://arxiv.org/abs/2404.10279v1

Compressor summary: EucliDreamer is a method that generates realistic and diverse textures for 3D models based on text prompts using a depth-conditioned Stable Diffusion model, achieving superior quality and faster convergence than existing methods.


OptiGrad: A Fair and more Efficient Price Elasticity Optimization via a Gradient Based Learning

http://arxiv.org/abs/2404.10275v1

Compressor summary: The paper proposes a new gradient-based method for optimizing profit margins in insurance markets that directly integrates fairness criteria into pricing, addressing challenges faced by traditional methods.


Sparse Attention Regression Network Based Soil Fertility Prediction With Ummaso

http://arxiv.org/abs/2404.10274v1

Compressor summary: The study presents a new method that combines UMAP and LASSO to improve predictions of soil fertility from imbalanced datasets, achieving high accuracy and interpretability.


Plug-and-Play Acceleration of Occupancy Grid-based NeRF Rendering using VDB Grid and Hierarchical Ray Traversal

http://arxiv.org/abs/2404.10272v1

Compressor summary: The text introduces two techniques to improve the efficiency of ray-tracing in Occupancy Grid for Neural Radiance Field by using VDB grids and hierarchical digital differential analyzer.


Social Choice for AI Alignment: Dealing with Diverse Human Feedback

http://arxiv.org/abs/2404.10271v1

Compressor summary: The paper explores how social choice theory can help address ethical and safety challenges in fine-tuning foundation models like GPT-4 based on human preferences and principles.


Modeling Low-Resource Health Coaching Dialogues via Neuro-Symbolic Goal Summarization and Text-Units-Text Generation

http://arxiv.org/abs/2404.10268v1

Compressor summary: The paper presents neuro-symbolic goal summarizer and dialogue generation models for health coaching that assist patients in setting and achieving physical activity goals, improve over previous methods, and introduce a new dataset and metric for evaluating patient responses.


OneActor: Consistent Character Generation via Cluster-Conditioned Guidance

http://arxiv.org/abs/2404.10267v1

Compressor summary: OneActor is a novel method that generates consistent and high-quality images from text using cluster-conditioned diffusion models, without relying on external data or expensive tuning.


PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network

http://arxiv.org/abs/2404.10263v1

Compressor summary: PreGSU is a generalized pre-trained scene understanding model for autonomous driving that learns universal interactions using graph attention networks and self-supervised tasks.


Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy

http://arxiv.org/abs/2404.10259v1

Compressor summary: The authors propose a method using large language models to automatically discover arguments related to specific themes in social media discussions, reducing the need for manual coding and human intervention.


Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology

http://arxiv.org/abs/2404.10242v1

Compressor summary: This paper compares weakly supervised classifiers and self-supervised masked autoencoders (MAEs) for featurizing microscopy images, showing that MAEs perform better and introducing a new channel-agnostic MAE architecture that generalizes well across different data.


Vision-and-Language Navigation via Causal Learning

http://arxiv.org/abs/2404.10241v1

Compressor summary: The paper introduces GOAT, a VLN model that uses causal inference and feature pooling to reduce dataset bias and improve performance on multiple VLN tasks.


MoE-TinyMed: Mixture of Experts for Tiny Medical Large Vision-Language Models

http://arxiv.org/abs/2404.10237v1

Compressor summary: MoE-TinyMed is a low-parameter model for medical visual question answering that performs better than existing models with fewer resources.


Compressible and Searchable: AI-native Multi-Modal Retrieval System with Learned Image Compression

http://arxiv.org/abs/2404.10234v1

Compressor summary: The paper proposes a framework that combines AI-native multi-modal search with neural image compression, improving storage and retrieval efficiency for large multimedia datasets.


Generative Text Steganography with Large Language Model

http://arxiv.org/abs/2404.10229v1

Compressor summary: LLM-Stega is a black-box generative text steganography method that uses large language models' user interfaces to securely communicate secret messages with rich semantics.


Two-Stage Stance Labeling: User-Hashtag Heuristics with Graph Neural Networks

http://arxiv.org/abs/2404.10228v1

Compressor summary: The paper presents a two-stage stance labeling method that combines user-hashtag heuristics with graph neural networks over user-hashtag and user-user graphs, using label propagation and semi-supervised learning, and outperforms zero-shot LLM stance labeling on climate change and gun control tweets.
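Label propagation, one ingredient named in the summary, has a standard scikit-learn form; the toy sketch below runs it on synthetic two-cluster features standing in for user/hashtag embeddings, with `-1` marking unlabeled nodes. The graph construction and data here are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(0)

# Two well-separated stance clusters; only a few seed labels per stance.
X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
y = np.full(60, -1)
y[:3], y[30:33] = 0, 1

# Propagate the seed labels over an RBF similarity graph.
model = LabelPropagation(kernel="rbf", gamma=1.0)
model.fit(X, y)
print(model.transduction_[:5])  # inferred labels for the first few nodes
```

In the paper's setting, the seed labels would come from the first-stage user-hashtag heuristics rather than being hand-picked.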


MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints

http://arxiv.org/abs/2404.10227v1

Compressor summary: The paper introduces a new model for realistic hand motion analysis that combines a musculoskeletal system with a parametric hand model, and a pose refinement framework that uses a neural network to improve the estimated hand pose.


Find The Gap: Knowledge Base Reasoning For Visual Question Answering

http://arxiv.org/abs/2404.10226v1

Compressor summary: The text analyzes how to improve visual question answering by using large language models and external knowledge bases, focusing on the impact of explicit supervised retrieval and multi-hop reasoning.


GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling

http://arxiv.org/abs/2404.10213v1

Compressor summary: The text proposes a new gait recognition network, GaitPoint+, that uses both silhouette and skeleton features for more robust recognition, with a lightweight and fast key point learning module and a recycling max-pooling method to improve accuracy.


Anomaly Correction of Business Processes Using Transformer Autoencoder

http://arxiv.org/abs/2404.10211v1

Compressor summary: The paper proposes a Transformer autoencoder-based method for detecting and correcting business process anomalies without setting thresholds, outperforming previous methods in accuracy and efficiency.