arxiv compressed, 2024-01-08

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-08, generated by the compressor, my personal LLM-based project.


Denoising Vision Transformers

Jiawei Yang,Katie Z Luo,Jiefeng Li,Kilian Q Weinberger,Yonglong Tian,Yue Wang

http://arxiv.org/abs/2401.02957v1

Compressor summary: Key points:
- Vision Transformers (ViTs) have grid-like artifacts in feature maps due to positional embeddings
- The paper proposes a denoising method that splits ViT outputs into three components and removes the artifacts
- The method does not require re-training or changing existing ViT architectures
- The method improves performance on semantic and geometric tasks across multiple datasets
Summary: The paper introduces Denoising Vision Transformers (DVT), a method that splits and denoises ViT outputs to eliminate grid-like artifacts and boost performance in downstream tasks without re-training.


Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

Haobo Yuan,Xiangtai Li,Chong Zhou,Yining Li,Kai Chen,Chen Change Loy

http://arxiv.org/abs/2401.02955v1

Compressor summary: The paper introduces Open-Vocabulary SAM, a unified model that combines CLIP and SAM for interactive segmentation and recognition across diverse domains using knowledge transfer modules.


DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

DeepSeek-AI,Xiao Bi,Deli Chen,Guanting Chen,Shanhuang Chen,Damai Dai,Chengqi Deng,Honghui Ding,Kai Dong,Qiushi Du,Zhe Fu,Huazuo Gao,Kaige Gao,Wenjun Gao,Ruiqi Ge,Kang Guan,Daya Guo,Jianzhong Guo,Guangbo Hao,Zhewen Hao,Ying He,Wenjie Hu,Panpan Huang,Erhang Li,Guowei Li,Jiashi Li,Yao Li,Y. K. Li,Wenfeng Liang,Fangyun Lin,A. X. Liu,Bo Liu,Wen Liu,Xiaodong Liu,Xin Liu,Yiyuan Liu,Haoyu Lu,Shanghao Lu,Fuli Luo,Shirong Ma,Xiaotao Nie,Tian Pei,Yishi Piao,Junjie Qiu,Hui Qu,Tongzheng Ren,Zehui Ren,Chong Ruan,Zhangli Sha,Zhihong Shao,Junxiao Song,Xuecheng Su,Jingxiang Sun,Yaofeng Sun,Minghui Tang,Bingxuan Wang,Peiyi Wang,Shiyu Wang,Yaohui Wang,Yongji Wang,Tong Wu,Y. Wu,Xin Xie,Zhenda Xie,Ziwei Xie,Yiliang Xiong,Hanwei Xu,R. X. Xu,Yanhong Xu,Dejian Yang,Yuxiang You,Shuiping Yu,Xingkai Yu,B. Zhang,Haowei Zhang,Lecong Zhang,Liyue Zhang,Mingchuan Zhang,Minghua Zhang,Wentao Zhang,Yichao Zhang,Chenggang Zhao,Yao Zhao,Shangyan Zhou,Shunfeng Zhou,Qihao Zhu,Yuheng Zou

http://arxiv.org/abs/2401.02954v1

Compressor summary: The paper introduces DeepSeek LLM, a scalable and open-source language model that outperforms LLaMA-2 and GPT-3.5 in various domains.


Graph2Tac: Learning Hierarchical Representations of Math Concepts in Theorem proving

Jason Rute,Miroslav Olšák,Lasse Blaauwbroek,Fidel Ivan Schaposnik Massolo,Jelle Piepenbrock,Vasily Pestun

http://arxiv.org/abs/2401.02949v1

Compressor summary: The paper introduces Graph2Tac, a graph neural network that learns from Coq projects and their dependencies, to help AI agents prove new theorems in mathematics.


Locally Adaptive Neural 3D Morphable Models

Michail Tarasiou,Rolandos Alexandros Potamias,Eimear O'Sullivan,Stylianos Ploumpis,Stefanos Zafeiriou

http://arxiv.org/abs/2401.02937v1

Compressor summary: The Locally Adaptive Morphable Model (LAMM) is an Auto-Encoder framework that learns to generate and manipulate 3D meshes with local control, achieving state-of-the-art performance in disentangled geometry manipulation and reconstruction.


SPFormer: Enhancing Vision Transformer with Superpixel Representation

Jieru Mei,Liang-Chieh Chen,Alan Yuille,Cihang Xie

http://arxiv.org/abs/2401.02931v1

Compressor summary: SPFormer is a Vision Transformer that uses superpixels to adaptively partition images into semantically coherent regions, achieving superior performance and explainability compared to traditional methods.


Dagma-DCE: Interpretable, Non-Parametric Differentiable Causal Discovery

Daniel Waxman,Kurt Butler,Petar M. Djuric

http://arxiv.org/abs/2401.02930v1

Compressor summary: Dagma-DCE is a new, interpretable, model-agnostic scheme for causal discovery that uses an interpretable measure of causal strength and outperforms existing methods in simulated datasets.


Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

Kevin Everson,Yile Gu,Huck Yang,Prashanth Gurunath Shivakumar,Guan-Ting Lin,Jari Kolehmainen,Ivan Bulyko,Ankur Gandhe,Shalini Ghosh,Wael Hamza,Hung-yi Lee,Ariya Rastrow,Andreas Stolcke

http://arxiv.org/abs/2401.02921v1

Compressor summary: The paper proposes a method that uses lattice output from ASR systems to improve SLU tasks by incorporating word confusion networks, enhancing LLMs' resilience to noisy speech transcripts and robustness to varying ASR performance conditions.


Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction

Yuxin Yang,Pengfei Zhu,Mengshi Qi,Huadong Ma

http://arxiv.org/abs/2401.02916v1

Compressor summary: Key points:
- Human trajectory forecasting is challenging due to uncertainty in human actions
- A novel memory-based method, Motion Pattern Priors Memory Network, is introduced
- The method constructs a memory bank of motion patterns and uses an addressing mechanism to retrieve matched patterns for prediction
- The approach achieves state-of-the-art trajectory prediction accuracy
Summary: The paper presents a memory-based method that retrieves motion patterns from a memory bank to predict human trajectories with high accuracy.


A unified uncertainty-aware exploration: Combining epistemic and aleatory uncertainty

Parvin Malekzadeh,Ming Hou,Konstantinos N. Plataniotis

http://arxiv.org/abs/2401.02914v1

Compressor summary: The paper proposes an algorithm that combines aleatory and epistemic uncertainty estimation for better risk-sensitive exploration in reinforcement learning.


Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task

Gabriel Lino Garcia,Pedro Henrique Paiola,Luis Henrique Morelli,Giovani Candido,Arnaldo Cândido Júnior,Danilo Samuel Jodas,Luis C. S. Afonso,Ivan Rizzo Guilherme,Bruno Elias Penteado,João Paulo Papa

http://arxiv.org/abs/2401.02909v1

Compressor summary: This paper introduces Bode, a fine-tuned LLaMA 2-based model for Portuguese NLP tasks, which performs better than existing LLMs and is freely available.


H2G2-Net: A Hierarchical Heterogeneous Graph Generative Network Framework for Discovery of Multi-Modal Physiological Responses

Haidong Gu,Nathan Gaw,Yinan Wang,Chancellor Johnstone,Christine Beauchene,Sophia Yuditskaya,Hrishikesh Rao,Chun-An Chou

http://arxiv.org/abs/2401.02905v1

Compressor summary: The paper proposes a new network, H2G2-Net, that can automatically learn from hierarchical and multi-modal physiological data to predict human cognitive states without prior knowledge or graph structure.


Class-wise Generalization Error: an Information-Theoretic Analysis

Firas Laakom,Yuheng Bu,Moncef Gabbouj

http://arxiv.org/abs/2401.02904v1

Compressor summary: The paper proposes new information-theoretic bounds for measuring how well a model generalizes for each individual class, which can capture class-specific variations and are easier to estimate than existing bounds.
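As background (the paper's class-wise bounds themselves are not reproduced here), the classical input-output mutual information bound of Xu and Raginsky (2017), which this line of work refines on a per-class basis, reads roughly as follows:

    % Standard information-theoretic generalization bound (Xu & Raginsky, 2017),
    % assuming the loss is sigma-subgaussian; W are the learned weights and S is
    % the n-sample training set. The paper studies class-wise analogues of this.
    \left|\,\mathbb{E}\!\left[L_{\mu}(W) - L_{S}(W)\right]\right|
      \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)}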


Reversing the Irreversible: A Survey on Inverse Biometrics

Marta Gomez-Barrero,Javier Galbally

http://arxiv.org/abs/2401.02861v1

Compressor summary: The text discusses the security risks of biometric recognition due to inverse biometrics, which allows reconstructing synthetic samples from unprotected templates, and reviews methods to assess, evaluate, and mitigate these threats.


Framework for Variable-lag Motif Following Relation Inference In Time Series using Matrix Profile analysis

Naaek Chinpattanakarn,Chainarong Amornbunchornvej

http://arxiv.org/abs/2401.02860v1

Compressor summary: The text describes a method to find and analyze patterns of following behavior between two time series, such as human movements or stock market fluctuations, using the Matrix Profile Method.


Generative Large Language Models are autonomous practitioners of evidence-based medicine

Akhil Vaid,Joshua Lampert,Juhee Lee,Ashwin Sawant,Donald Apakama,Ankit Sakhuja,Ali Soroush,Denise Lee,Isotta Landi,Nicole Bussola,Ismail Nabeel,Robbie Freeman,Patricia Kovatch,Brendan Carr,Benjamin Glicksberg,Edgar Argulian,Stamatios Lerakis,Monica Kraft,Alexander Charney,Girish Nadkarni

http://arxiv.org/abs/2401.02851v1

Compressor summary: This study shows that large language models can assist in evidence-based medicine by making clinical decisions, ordering tests, and following guidelines, but they still have limitations in handling complex cases.


Generating Non-Stationary Textures using Self-Rectification

Yang Zhou,Rongjun Xiao,Dani Lischinski,Daniel Cohen-Or,Hui Huang

http://arxiv.org/abs/2401.02847v1

Compressor summary: The paper presents a new method for creating seamless non-stationary textures by refining user-edited reference images with a diffusion network and self-attention.


Multi-Stage Contrastive Regression for Action Quality Assessment

Qi An,Mengshi Qi,Huadong Ma

http://arxiv.org/abs/2401.02841v1

Compressor summary: MCoRe is a novel framework for video-based action quality assessment that segments videos into stages and uses stage-wise contrastive learning to improve performance.


CrisisViT: A Robust Vision Transformer for Crisis Image Classification

Zijun Long,Richard McCreadie,Muhammad Imran

http://arxiv.org/abs/2401.02838v1

Compressor summary: The paper introduces CrisisViT, a transformer-based model for automatic image classification of crisis situations using social media images and shows its superior performance over previous methods.


Two-stage Progressive Residual Dense Attention Network for Image Denoising

Wencong Wu,An Ge,Guannan Lv,Yuelong Xia,Yungang Zhang,Wen Xiong

http://arxiv.org/abs/2401.02831v1

Compressor summary: The paper introduces a new network called TSP-RDANet that divides image denoising into two stages and uses different attention mechanisms to learn important features and suppress irrelevant ones, achieving better performance than existing methods.


CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras

Yabin Zhu,Xiao Wang,Chenglong Li,Bo Jiang,Lin Zhu,Zhixiang Huang,Yonghong Tian,Jin Tang

http://arxiv.org/abs/2401.02826v1

Compressor summary: Key points:
- The paper proposes a new object tracking task using unaligned neuromorphic and visible cameras
- It introduces a dataset (CRSOT) with high-definition RGB-Event video pairs collected with a specially built data acquisition system
- It develops a novel tracking framework that fuses RGB and Event features using ViT, uncertainty perception, and modality fusion modules
- The tracker achieves robust tracking without strict alignment between modalities
Summary: The paper presents a new object tracking task with unaligned neuromorphic and visible cameras, a large dataset (CRSOT) collected with a custom system, and a novel framework that fuses RGB and Event features for robust tracking without alignment.


DocGraphLM: Documental Graph Language Model for Information Extraction

Dongsheng Wang,Zhiqiang Ma,Armineh Nourbakhsh,Kang Gu,Sameena Shah

http://arxiv.org/abs/2401.02823v1

Compressor summary: DocGraphLM is a new framework that uses pre-trained language models and graph semantics to improve information extraction and question answering over visually rich documents.


Physics-Informed Neural Networks for High-Frequency and Multi-Scale Problems using Transfer Learning

Abdul Hannan Mustajab,Hao Lyu,Zarghaam Rizvi,Frank Wuttke

http://arxiv.org/abs/2401.02810v1

Compressor summary: Transfer learning improves the robustness and convergence of physics-informed neural networks (PINN) for high-frequency and multi-scale problems by starting from low-frequency problems and gradually increasing complexity.
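As a rough illustration of that curriculum idea (not the authors' code), the sketch below trains a tiny PINN on the 1D oscillator u'' = -ω²u with u(0)=0, u'(0)=ω, and warm-starts each higher-frequency problem from the weights learned at the previous one; the ODE, network size, collocation sampling, and ω schedule are all assumptions made for the example.

    # Minimal sketch of transfer learning for a PINN (illustrative only).
    # Solve u''(x) = -w^2 u(x), u(0) = 0, u'(0) = w (exact solution: sin(w x)),
    # reusing the network weights as the starting point for higher frequencies.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                        nn.Linear(64, 64), nn.Tanh(),
                        nn.Linear(64, 1))

    def pinn_loss(net, w, n_points=256):
        x = torch.rand(n_points, 1) * 2 * torch.pi          # collocation points
        x.requires_grad_(True)
        u = net(x)
        du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
        d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
        residual = d2u + w**2 * u                            # ODE residual
        x0 = torch.zeros(1, 1, requires_grad=True)
        u0 = net(x0)
        du0 = torch.autograd.grad(u0, x0, torch.ones_like(u0), create_graph=True)[0]
        bc = u0**2 + (du0 - w)**2                            # initial conditions
        return residual.pow(2).mean() + bc.mean()

    # Curriculum over frequencies: the optimizer is reset, but the network
    # keeps the weights learned at the previous (easier) frequency.
    for w in [1.0, 2.0, 4.0, 8.0]:
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        for step in range(2000):
            opt.zero_grad()
            loss = pinn_loss(net, w)
            loss.backward()
            opt.step()
        print(f"omega={w}: final loss {loss.item():.3e}")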


Diffbody: Diffusion-based Pose and Shape Editing of Human Images

Yuta Okuyama,Yuki Endo,Yoshihiro Kanamori

http://arxiv.org/abs/2401.02804v1

Compressor summary: The paper proposes a one-shot approach to edit human poses and body shapes in images while preserving identity and realism, using 3D modeling, diffusion-based refinement, and text embedding fine-tuning.


PeFoMed: Parameter Efficient Fine-tuning on Multimodal Large Language Models for Medical Visual Question Answering

Jinlong He,Pengfei Li,Gang Liu,Zixu Zhao,Shenjun Zhong

http://arxiv.org/abs/2401.02797v1

Compressor summary: The paper introduces a parameter efficient framework for fine-tuning multimodal large language models to improve medical visual question answering performance, achieving high accuracy and outperforming GPT-4v.


Weakly Semi-supervised Tool Detection in Minimally Invasive Surgery Videos

Ryo Fujii,Ryo Hachiuma,Hideo Saito

http://arxiv.org/abs/2401.02791v1

Compressor summary: Our method improves surgical tool detection using image-level labels by leveraging co-occurrence between tool pairs, reducing annotation burden and enhancing performance.


From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

Na Liu,Liangyu Chen,Xiaoyu Tian,Wei Zou,Kaijiang Chen,Ming Cui

http://arxiv.org/abs/2401.02777v1

Compressor summary: The paper presents RAISE, a new architecture that integrates large language models into conversational agents using a dual-component memory system, improving their controllability and adaptability in complex dialogues, as shown by its performance in a real estate sales context.


Tackling Electrode Shift In Gesture Recognition with HD-EMG Electrode Subsets

Joao Pereira,Dimitrios Chalatsis,Balint Hodossy,Dario Farina

http://arxiv.org/abs/2401.02773v1

Compressor summary: The study proposes a method to improve the performance of sEMG pattern recognition algorithms by training on different combinations of channels and augmenting with data from various electrode locations, making them more robust to electrode shifts and reducing dimensionality.


Powerformer: A Section-adaptive Transformer for Power Flow Adjustment

Kaixuan Chen,Wei Luo,Shunyu Liu,Yaoquan Wei,Yihe Zhou,Yunpeng Qing,Quan Zhang,Jie Song,Mingli Song

http://arxiv.org/abs/2401.02771v1

Compressor summary: Powerformer is a novel transformer architecture that learns robust power system state representations by using a section-adaptive attention mechanism and customized strategies, achieving better power dispatch for different transmission sections.


Fus-MAE: A cross-attention-based data fusion approach for Masked Autoencoders in remote sensing

Hugo Chan-To-Hing,Bharadwaj Veeravalli

http://arxiv.org/abs/2401.02764v1

Compressor summary: Fus-MAE is a novel self-supervised framework that uses cross-attention in masked autoencoders to fuse SAR and optical data without complex data augmentations.


Systematic review of image segmentation using complex networks

Amin Rezaei,Fatemeh Asadi

http://arxiv.org/abs/2401.02758v1

Compressor summary: The review discusses various image segmentation methods using complex networks, highlighting their importance in analyzing complex images and describing different algorithms and hybrid approaches.


Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding

Yuu Jinnai,Kaito Ariu

http://arxiv.org/abs/2401.02749v1

Compressor summary: AMBR is a fast and accurate method to approximate MBR decoding without hyperparameter tuning, using the CSH algorithm.
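For context (the summary only names AMBR and CSH, which are not reproduced here), plain sampling-based MBR decoding simply picks the candidate with the highest expected utility against the other samples; a minimal sketch with a generic utility function:

    # Minimal sketch of vanilla sampling-based MBR decoding (background only,
    # not the paper's approximation). `utility` could be BLEU, chrF, BERTScore, etc.
    from typing import Callable, Sequence

    def mbr_decode(candidates: Sequence[str],
                   utility: Callable[[str, str], float]) -> str:
        """Return the candidate with the highest average utility against every
        sampled pseudo-reference; this costs O(n^2) utility calls, which is the
        expense that faster MBR variants try to cut down."""
        best, best_score = None, float("-inf")
        for hyp in candidates:
            score = sum(utility(hyp, ref) for ref in candidates) / len(candidates)
            if score > best_score:
                best, best_score = hyp, score
        return best

    # Toy usage with a word-overlap utility (placeholder for a real metric).
    def overlap(hyp: str, ref: str) -> float:
        h, r = set(hyp.split()), set(ref.split())
        return len(h & r) / max(len(h | r), 1)

    samples = ["the cat sat on the mat", "a cat sat on a mat", "the dog barked"]
    print(mbr_decode(samples, overlap))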


Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues

David Gimeno-Gómez,Ana-Maria Bucur,Adrian Cosma,Carlos-David Martínez-Hinarejos,Paolo Rosso

http://arxiv.org/abs/2401.02746v1

Compressor summary: Key points:
- The paper proposes a model to detect depression from user-generated video content using multiple modalities (audio, face emotion, etc.)
- The model performs better than previous methods on three benchmark datasets
- The code is publicly available on GitHub
Summary: The paper presents a multi-modal temporal model that can effectively identify depression cues from real-world videos and provides the code online.


MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning

Alfirsa Damasyifa Fauzulhaq,Wahyu Parwitayasa,Joseph Ananda Sugihdharma,M. Fadli Ridhani,Novanto Yudistira

http://arxiv.org/abs/2401.02744v1

Compressor summary: The text describes a method to visualize neuron behavior in deep neural networks using an improved encoder-decoder model with multiple attention mechanisms, achieving better results on long sequence neuron captioning.


Diffusion Variational Inference: Diffusion Models as Expressive Variational Posteriors

Top Piriyakulkij,Yingheng Wang,Volodymyr Kuleshov

http://arxiv.org/abs/2401.02739v1

Compressor summary: The paper introduces DDVI, an inference method for latent variable models that uses diffusion models as variational posteriors and auxiliary latents to perform denoising in latent space.


On the numerical reliability of nonsmooth autodiff: a MaxPool case study

Ryan Boustany

http://arxiv.org/abs/2401.02736v1

Compressor summary: The paper investigates how different aspects of neural networks, such as MaxPool operation and numerical precision, affect the reliability of automatic differentiation and its impact on performance.


Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

Haoyuan Wu,Haisheng Zheng,Bei Yu

http://arxiv.org/abs/2401.02731v1

Compressor summary: PESC is a novel method that transforms dense language models into sparse ones using MoE layers with adapters, improving generalization across multiple tasks without increasing parameters much.
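The core idea in that summary, turning a dense FFN into a mixture of experts where only small adapters and the router are trained, can be sketched roughly as below; the expert count, adapter width, and top-1 routing are illustrative assumptions, not the paper's configuration.

    # Rough sketch of a dense-to-MoE conversion with per-expert adapters
    # (illustrative; not the PESC implementation).
    import copy
    import torch
    import torch.nn as nn

    class AdapterExpert(nn.Module):
        def __init__(self, shared_ffn: nn.Module, d_model: int, d_adapter: int = 16):
            super().__init__()
            self.ffn = shared_ffn                      # frozen copy of the dense FFN
            for p in self.ffn.parameters():
                p.requires_grad = False
            self.adapter = nn.Sequential(              # small trainable bottleneck
                nn.Linear(d_model, d_adapter), nn.GELU(), nn.Linear(d_adapter, d_model))

        def forward(self, x):
            return self.ffn(x) + self.adapter(x)       # residual adapter on top of FFN

    class SparseMoE(nn.Module):
        def __init__(self, dense_ffn: nn.Module, d_model: int, n_experts: int = 4):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                AdapterExpert(copy.deepcopy(dense_ffn), d_model) for _ in range(n_experts))

        def forward(self, x):                          # x: (tokens, d_model)
            expert_idx = self.router(x).argmax(dim=-1) # top-1 routing per token
            out = torch.zeros_like(x)
            for i, expert in enumerate(self.experts):
                mask = expert_idx == i
                if mask.any():
                    out[mask] = expert(x[mask])
            return out

    d_model = 32
    dense_ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
    moe = SparseMoE(dense_ffn, d_model)
    tokens = torch.randn(10, d_model)
    print(moe(tokens).shape)                           # torch.Size([10, 32])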


Enhancing targeted transferability via feature space fine-tuning

Hui Zeng,Biwei Chen,Anjie Peng

http://arxiv.org/abs/2401.02727v1

Compressor summary: Key points:
- Adversarial examples (AEs) can protect privacy and inspire robust neural networks, but transferring them across unknown models is hard.
- The paper proposes fine-tuning AEs in feature space to improve targeted transferability.
- A few iterations of fine-tuning can outperform existing attacks and are cheaper than resource-intensive methods.
Summary: The paper introduces a simple and effective method to fine-tune adversarial examples in the feature space, improving their ability to fool unknown models with minimal cost and effort.


Une ontologie pour les systèmes multi-agents ambiants dans les villes intelligentes (An Ontology for Ambient Multi-Agent Systems in Smart Cities)

Nathan Aky,Denis Payet,Sylvain Giroux,Rémy Courdier

http://arxiv.org/abs/2401.02726v1

Compressor summary: The article proposes an OWL-formatted ontology for structuring connected devices in smart cities, enabling personalized services through autonomous agents.


A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE

Ikumi Okubo,Keisuke Sugiura,Hiroki Matsutani

http://arxiv.org/abs/2401.02721v1

Compressor summary: The paper proposes a lightweight Transformer model using Neural ODEs, quantizes it for edge computing on FPGAs, and achieves significant speedup and energy efficiency.


Learning Image Demoireing from Unpaired Real Data

Yunshan Zhong,Yuyao Zhou,Yuxin Zhang,Fei Chao,Rongrong Ji

http://arxiv.org/abs/2401.02719v1

Compressor summary: The paper proposes a method to learn from unpaired moire images and clean images, synthesizing pseudo moire images and adaptively denoising them for demoireing model training.


Complementary Information Mutual Learning for Multimodality Medical Image Segmentation

Chuyun Shen,Wenhao Li,Haoqing Chen,Xiaoling Wang,Fengping Zhu,Yuxin Li,Xiangfeng Wang,Bo Jin

http://arxiv.org/abs/2401.02717v1

Compressor summary: CIML is a framework for multimodal learning in tumor segmentation that models and addresses inter-modal redundancy issues by decomposing the task into subtasks and using message passing and cross-modal attention to extract complementary information.


Graph-level Protein Representation Learning by Structure Knowledge Refinement

Ge Wang,Zelin Zang,Jiangbin Zheng,Jun Xia,Stan Z. Li

http://arxiv.org/abs/2401.02713v1

Compressor summary: This paper presents a new unsupervised learning method for graphs called Structure Knowledge Refinement (SKR), which improves graph feature extraction and classification by handling false negative pairs and adapting augmentation strategies.


German Text Embedding Clustering Benchmark

Silvan Wehrli,Bert Arnrich,Christopher Irrgang

http://arxiv.org/abs/2401.02709v1

Compressor summary: The authors present a benchmark for evaluating German text embeddings in clustering tasks, analyze various models' performance, and explore continued pre-training for improved results on short texts.


TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis

Liwen Zhang,Lianzhen Zhong,Fan Yang,Di Dong,Hui Hui,Jie Tian

http://arxiv.org/abs/2401.02708v1

Compressor summary: The paper proposes a new loss function, TripleSurv, for survival analysis that considers sample pair differences and time intervals to improve prediction accuracy and robustness.


XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model

Zhitao Wang,Wei Wang,Zirao Li,Long Wang,Can Yi,Xinjie Xu,Luyang Cao,Hanjing Su,Shouzhi Chen,Jun Zhou

http://arxiv.org/abs/2401.02705v1

Compressor summary: The paper proposes XUAT-Copilot, an LLM-based multi-agent system that automates test script generation for WeChat Pay's UAT process, achieving high accuracy and saving labor costs.


Verifying Relational Explanations: A Probabilistic Approach

Abisha Thapa Magar,Anup Shakya,Somdeb Sarkhel,Deepak Venugopal

http://arxiv.org/abs/2401.02703v1

Compressor summary: The paper proposes a method to estimate uncertainty in explanations generated by GNNExplainer using counterfactual examples and a factor graph model.


VoxelNextFusion: A Simple, Unified and Effective Voxel Fusion Framework for Multi-Modal 3D Object Detection

Ziying Song,Guoxin Zhang,Jun Xie,Lin Liu,Caiyan Jia,Shaoqing Xu,Zhepeng Wang

http://arxiv.org/abs/2401.02702v1

Compressor summary: VoxelNextFusion is a framework that enhances 3D object detection by fusing point clouds and images using self-attention and feature importance module, achieving better performance than existing methods.


PAHD: Perception-Action based Human Decision Making using Explainable Graph Neural Networks on SAR Images

Sasindu Wijeratne,Bingyi Zhang,Rajgopal Kannan,Viktor Prasanna,Carl Busart

http://arxiv.org/abs/2401.02687v1

Compressor summary: The paper proposes a Graph Neural Network (GNN) framework for identifying and classifying ground objects in Synthetic Aperture Radar (SAR) images, providing detailed information to help commanding officers make decisions.


Geometric-Facilitated Denoising Diffusion Model for 3D Molecule Generation

Can Xu,Haosen Wang,Weigang Wang,Pengfei Zheng,Hongyang Chen

http://arxiv.org/abs/2401.02683v1

Compressor summary: GFMDiff is a novel diffusion-based molecule generation method that uses a Dual-Track Transformer Network and Geometric-Facilitated Loss to capture complex multi-body interatomic relationships and predict bond formation.


Homophily-Related: Adaptive Hybrid Graph Filter for Multi-View Graph Clustering

Zichen Wen,Yawen Ling,Yazhou Ren,Tianyi Wu,Jianpeng Chen,Xiaorong Pu,Zhifeng Hao,Lifang He

http://arxiv.org/abs/2401.02682v1

Compressor summary: The paper proposes AHGFC, a method to cluster heterophilous graphs by adaptively filtering graph signals based on their homophily degree and learning distinguishable node embeddings.


Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss

Yatharth Gupta,Vishnu V. Jaddipal,Harish Prabhala,Sayak Paul,Patrick Von Platen

http://arxiv.org/abs/2401.02677v1

Compressor summary: The authors introduce two smaller versions of the open source text-to-image model SDXL by removing some components and using layer-level losses to reduce size and latency while maintaining quality.


Zero-shot Microclimate Prediction with Deep Learning

Iman Deznabi,Peeyush Kumar,Madalina Fiterau

http://arxiv.org/abs/2401.02665v1

Compressor summary: The proposed zero-shot learning approach can accurately forecast microclimate variables at new, unmonitored locations using knowledge from similar places.


A backdoor attack against link prediction tasks with graph neural networks

Jiazhu Dai,Haoyu Sun

http://arxiv.org/abs/2401.02663v1

Compressor summary: The paper proposes a backdoor attack on graph neural networks (GNNs) for link prediction tasks, where trigger nodes can cause misclassification of unlinked node pairs.


Nurse-in-the-Loop Artificial Intelligence for Precision Management of Type 2 Diabetes in a Clinical Trial Utilizing Transfer-Learned Predictive Digital Twin

Syed Hasib Akhter Faruqui,Adel Alaeddini,Yan Du,Shiyu Li,Kumar Sharma,Jing Wang

http://arxiv.org/abs/2401.02661v1

Compressor summary: The study used AI to create personalized feedback for Type 2 diabetes patients, resulting in improved physical activity and diet.


GTA: Guided Transfer of Spatial Attention from Object-Centric Representations

SeokHyun Seo,Jinwoo Hong,JungWoo Chae,Kyungyul Kim,Sangheum Hwang

http://arxiv.org/abs/2401.02656v1

Compressor summary: The paper proposes a regularization method called Guided Transfer of spatial Attention (GTA) for ViT models to prevent overfitting and improve performance, especially when the dataset size is small.


A Deep Q-Learning based Smart Scheduling of EVs for Demand Response in Smart Grids

Viorica Rozina Chifu,Tudor Cioara,Cristina Bianca Pop,Horia Rusu,Ionut Anghel

http://arxiv.org/abs/2401.02653v1

Compressor summary: The paper proposes a model-free solution using Deep Q-Learning to schedule EV charging and discharging in microgrids based on a target energy profile provided by the distribution system operator, showing promising results.
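To make the Deep Q-Learning ingredients concrete, here is a toy sketch of the replay-buffer/Bellman-update machinery such a scheduler relies on; the state features, action set, and reward are hypothetical placeholders, not the paper's formulation.

    # Toy Deep Q-Learning update step (hypothetical EV-scheduling state/actions).
    import random
    from collections import deque
    import torch
    import torch.nn as nn

    STATE_DIM = 4   # e.g. hour, EV state of charge, price, target-profile gap (assumed)
    N_ACTIONS = 3   # e.g. charge / idle / discharge (assumed)

    q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    target_net.load_state_dict(q_net.state_dict())
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    replay = deque(maxlen=10_000)   # stores (state, action, reward, next_state, done)
    gamma = 0.99

    def td_update(batch_size: int = 64):
        """One Bellman update on a minibatch sampled from the replay buffer."""
        if len(replay) < batch_size:
            return
        batch = random.sample(replay, batch_size)
        s, a, r, s2, done = map(torch.stack, zip(*batch))
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * target_net(s2).max(dim=1).values * (1 - done)
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Fill the buffer with random transitions just to exercise the update step.
    for _ in range(200):
        replay.append((torch.randn(STATE_DIM),
                       torch.tensor(random.randrange(N_ACTIONS)),
                       torch.randn(()),
                       torch.randn(STATE_DIM),
                       torch.tensor(float(random.random() < 0.05))))

    td_update()
    target_net.load_state_dict(q_net.state_dict())   # periodic target-network sync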


Adaptive Discounting of Training Time Attacks

Ridhima Bector,Abhay Aradhya,Chai Quek,Zinovi Rabinovich

http://arxiv.org/abs/2401.02652v1

Compressor summary: The paper proposes gammaDDPG, a method that enables constructive training-time attacks on reinforcement learning agents by dynamically adjusting the attack policy based on the victim's behavior.


Benchmarking PathCLIP for Pathology Image Analysis

Sunyi Zheng,Xiaonan Cui,Yuxuan Sun,Jingxiong Li,Honglin Li,Yunlong Zhang,Pingyi Chen,Xueping Jing,Zhaoxiang Ye,Lin Yang

http://arxiv.org/abs/2401.02651v1

Compressor summary: PathCLIP is a promising model for pathology image analysis, but its robustness under certain image corruptions varies and requires careful consideration when used in clinical settings.


Improving sample efficiency of high dimensional Bayesian optimization with MCMC

Zeji Yi,Yunyue Wei,Chu Xin Cheng,Kaibo He,Yanan Sui

http://arxiv.org/abs/2401.02650v1

Compressor summary: The paper introduces a new Markov Chain Monte Carlo method for efficient Gaussian process sequential optimization in high-dimensional spaces, with theoretical guarantees of convergence and experimental improvements over existing methods.


Enhancing 3D-Air Signature by Pen Tip Tail Trajectory Awareness: Dataset and Featuring by Novel Spatio-temporal CNN

Saurabh Atreya,Maheswar Bora,Aritra Mukherjee,Abhijit Das

http://arxiv.org/abs/2401.02649v1

Compressor summary: The authors present a new pen tool, a stereo camera, and a 2D CNN to analyze air signatures from signers and detect skilled forgeries.


Recent Advancement in 3D Biometrics using Monocular Camera

Aritra Mukherjee,Abhijit Das

http://arxiv.org/abs/2401.02646v1

Compressor summary: This paper reviews recent advances in 3D face recognition using a single camera and compares it to traditional biometrics, highlighting its advantages and difficulties.


Simple Hierarchical Planning with Diffusion

Chang Chen,Fei Deng,Kenji Kawaguchi,Caglar Gulcehre,Sungjin Ahn

http://arxiv.org/abs/2401.02644v1

Compressor summary: The Hierarchical Diffuser combines hierarchical and diffusion-based planning to improve the performance and efficiency of long-horizon tasks in reinforcement learning.


Training and Serving System of Foundation Models: A Comprehensive Survey

Jiahang Zhou,Yanyu Chen,Zicong Hong,Wuhui Chen,Yue Yu,Tao Zhang,Hui Wang,Chuanfu Zhang,Zibin Zheng

http://arxiv.org/abs/2401.02643v1

Compressor summary: This paper reviews efficient training and serving strategies for foundation models, which are the mainstream trend of artificial general intelligence, and discusses their challenges and future directions.


AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification

Huy Nguyen,Kien Nguyen,Sridha Sridharan,Clinton Fookes

http://arxiv.org/abs/2401.02634v1

Compressor summary: The paper introduces AG-ReID.v2, a dataset for person re-identification in aerial and ground scenarios, along with an explainable attention network that outperforms existing methods.


Model-Agnostic Interpretation Framework in Machine Learning: A Comparative Study in NBA Sports

Shun Liu

http://arxiv.org/abs/2401.02630v1

Compressor summary: Our framework combines modular data processing and interpretability techniques to create transparent deep learning models without sacrificing performance.


Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human

Song Bai,Jie Li

http://arxiv.org/abs/2401.02620v1

Compressor summary: The text describes the rapid growth of 3D generation in AI, covering object creation, realistic human models, and motion generation, driven by advances in diffusion, control, rendering, and language models.


FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF

Hao Zhang,Yu-Wing Tai,Chi-Keung Tang

http://arxiv.org/abs/2401.02616v1

Compressor summary: The paper introduces a novel face video editing method using GAN-NeRF that ensures multi-view consistency, temporal coherence, and restores 3D geometry, outperforming existing approaches.


Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment

Yongxu Liu,Yinghui Quan,Guoyao Xiao,Aobo Li,Jinjian Wu

http://arxiv.org/abs/2401.02614v1

Compressor summary: The paper proposes SAMA, a data sampling method that preserves both local and global content in images and videos, enabling single-branch models to achieve competitive performance without extra complexity.


MOODv2: Masked Image Modeling for Out-of-Distribution Detection

Jingyao Li,Pengguang Chen,Shaozuo Yu,Shu Liu,Jiaya Jia

http://arxiv.org/abs/2401.02611v1

Compressor summary: The study shows that reconstruction-based pretraining improves out-of-distribution detection by enhancing feature representations and making simple score functions perform as well as complex ones.


DHGCN: Dynamic Hop Graph Convolution Network for Self-supervised Point Cloud Learning

Jincen Jiang,Lizhi Zhao,Xuequan Lu,Wei Hu,Imran Razzak,Meili Wang

http://arxiv.org/abs/2401.02610v1

Compressor summary: The paper proposes a novel graph convolution network for point clouds that learns the relationships between voxelized parts based on hop distances and achieves state-of-the-art results.


Partition-based Nonrigid Registration for 3D Face Model

Yuping Ye,Zhan Song,Juan Zhao

http://arxiv.org/abs/2401.02607v1

Compressor summary: The paper introduces a surface registration method for 3D morphable models that uses landmarks to partition, scale, and smooth the template model, improving performance and robustness compared to traditional methods.


Exploiting Polarized Material Cues for Robust Car Detection

Wen Dong,Haiyang Mei,Ziqi Wei,Ao Jin,Sen Qiu,Qiang Zhang,Xin Yang

http://arxiv.org/abs/2401.02606v1

Compressor summary: The text introduces a new car detection method using trichromatic linear polarization as an additional feature to improve performance in challenging conditions.


Neural Causal Abstractions

Kevin Xia,Elias Bareinboim

http://arxiv.org/abs/2401.02602v1

Compressor summary: Key points:
- The paper develops a new family of causal abstractions by clustering variables and their domains
- The abstractions are learnable with Neural Causal Models and can solve various causal inference tasks
- The paper integrates the abstractions with representation learning for more flexibility and applies them to image data
Summary: The paper proposes a novel way of creating and learning causal abstractions from variables and their domains, which enables deep learning tools to perform causal inference tasks on high-dimensional image data.


Object-oriented backdoor attack against image captioning

Meiling Li,Nan Zhong,Xinpeng Zhang,Zhenxing Qian,Sheng Li

http://arxiv.org/abs/2401.02600v1

Compressor summary: The paper explores how to create a backdoor attack on image captioning models by poisoning training data, which causes the model to generate irrelevant captions for specific images without affecting its performance on benign ones.


Unsupervised hard Negative Augmentation for contrastive learning

Yuxuan Shu,Vasileios Lampos

http://arxiv.org/abs/2401.02594v1

Compressor summary: UNA generates synthetic negative instances for text similarity tasks using term frequency-inverse document frequency scores to replace important terms in sentences.
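A crude approximation of that idea (clearly not the authors' exact procedure): score terms with scikit-learn's TF-IDF and corrupt the highest-scoring terms of a sentence to manufacture a hard negative; the filler token and number of replacements are assumptions made for the example.

    # Rough sketch of TF-IDF-guided hard negative generation (illustrative only;
    # UNA's exact replacement strategy is not reproduced here).
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = [
        "the model improves sentence similarity benchmarks",
        "contrastive learning benefits from hard negatives",
        "tf idf highlights informative terms in a sentence",
    ]

    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(corpus)
    vocab = np.array(vec.get_feature_names_out())

    def hard_negative(sentence_idx: int, n_replace: int = 2, filler: str = "[MASK]") -> str:
        """Replace the n highest-TF-IDF terms of a corpus sentence with a filler
        token, yielding a lexically similar but semantically damaged negative."""
        row = tfidf[sentence_idx].toarray().ravel()
        top_terms = set(vocab[np.argsort(row)[::-1][:n_replace]])
        tokens = corpus[sentence_idx].split()
        return " ".join(filler if t.lower() in top_terms else t for t in tokens)

    print(hard_negative(0))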


Synthetic Information towards Maximum Posterior Ratio for deep learning on Imbalanced Data

Hung Nguyen,Morris Chang

http://arxiv.org/abs/2401.02591v1

Compressor summary: The study presents a method for balancing imbalanced data in deep learning by generating synthetic samples that fit well with the class distribution and maintain data topology.


Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting

Van Minh Nguyen,Emma Sandidge,Trupti Mahendrakar,Ryan T. White

http://arxiv.org/abs/2401.02588v1

Compressor summary: The authors propose a fast 3D modeling technique for satellites that can run on space hardware and support autonomy in rendezvous and proximity operations.


CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs

Daoan Zhang,Junming Yang,Hanjia Lyu,Zijian Jin,Yuan Yao,Mingkai Chen,Jiebo Luo

http://arxiv.org/abs/2401.02582v1

Compressor summary: The study examines how well Large Multimodal Models can perceive fine-grained visual details from multiple images and proposes a Contrastive Chain-of-Thought approach to improve their performance.