This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-08, generated by Compressor, my personal LLM-based project.
Jiawei Yang,Katie Z Luo,Jiefeng Li,Kilian Q Weinberger,Yonglong Tian,Yue Wang
http://arxiv.org/abs/2401.02957v1
Compressor summary: Key points:
- Vision Transformers (ViTs) have grid-like artifacts in feature maps due to positional embeddings
- The paper proposes a denoising method that splits ViT outputs into three components and removes the artifacts
- The method does not require re-training or changing existing ViT architectures
- The method improves performance on semantic and geometric tasks across multiple datasets
Summary: The paper introduces Denoising Vision Transformers (DVT), a method that splits and denoises ViT outputs to eliminate grid-like artifacts and boost performance in downstream tasks without re-training.
Haobo Yuan,Xiangtai Li,Chong Zhou,Yining Li,Kai Chen,Chen Change Loy
http://arxiv.org/abs/2401.02955v1
Compressor summary: The paper introduces Open-Vocabulary SAM, a unified model that combines CLIP and SAM for interactive segmentation and recognition across diverse domains using knowledge transfer modules.
DeepSeek-AI,Xiao Bi,Deli Chen,Guanting Chen,Shanhuang Chen,Damai Dai,Chengqi Deng,Honghui Ding,Kai Dong,Qiushi Du,Zhe Fu,Huazuo Gao,Kaige Gao,Wenjun Gao,Ruiqi Ge,Kang Guan,Daya Guo,Jianzhong Guo,Guangbo Hao,Zhewen Hao,Ying He,Wenjie Hu,Panpan Huang,Erhang Li,Guowei Li,Jiashi Li,Yao Li,Y. K. Li,Wenfeng Liang,Fangyun Lin,A. X. Liu,Bo Liu,Wen Liu,Xiaodong Liu,Xin Liu,Yiyuan Liu,Haoyu Lu,Shanghao Lu,Fuli Luo,Shirong Ma,Xiaotao Nie,Tian Pei,Yishi Piao,Junjie Qiu,Hui Qu,Tongzheng Ren,Zehui Ren,Chong Ruan,Zhangli Sha,Zhihong Shao,Junxiao Song,Xuecheng Su,Jingxiang Sun,Yaofeng Sun,Minghui Tang,Bingxuan Wang,Peiyi Wang,Shiyu Wang,Yaohui Wang,Yongji Wang,Tong Wu,Y. Wu,Xin Xie,Zhenda Xie,Ziwei Xie,Yiliang Xiong,Hanwei Xu,R. X. Xu,Yanhong Xu,Dejian Yang,Yuxiang You,Shuiping Yu,Xingkai Yu,B. Zhang,Haowei Zhang,Lecong Zhang,Liyue Zhang,Mingchuan Zhang,Minghua Zhang,Wentao Zhang,Yichao Zhang,Chenggang Zhao,Yao Zhao,Shangyan Zhou,Shunfeng Zhou,Qihao Zhu,Yuheng Zou
http://arxiv.org/abs/2401.02954v1
Compressor summary: The paper introduces DeepSeek LLM, a scalable and open-source language model that outperforms LLaMA-2 and GPT-3.5 in various domains.
Jason Rute,Miroslav Olšák,Lasse Blaauwbroek,Fidel Ivan Schaposnik Massolo,Jelle Piepenbrock,Vasily Pestun
http://arxiv.org/abs/2401.02949v1
Compressor summary: The paper introduces Graph2Tac, a graph neural network that learns from Coq projects and their dependencies, to help AI agents prove new theorems in mathematics.
Michail Tarasiou,Rolandos Alexandros Potamias,Eimear O'Sullivan,Stylianos Ploumpis,Stefanos Zafeiriou
http://arxiv.org/abs/2401.02937v1
Compressor summary: The Locally Adaptive Morphable Model (LAMM) is an Auto-Encoder framework that learns to generate and manipulate 3D meshes with local control, achieving state-of-the-art performance in disentangling geometry manipulation and reconstruction.
Jieru Mei,Liang-Chieh Chen,Alan Yuille,Cihang Xie
http://arxiv.org/abs/2401.02931v1
Compressor summary: SPFormer is a Vision Transformer that uses superpixels to adaptively partition images into semantically coherent regions, achieving superior performance and explainability compared to traditional methods.
Daniel Waxman,Kurt Butler,Petar M. Djuric
http://arxiv.org/abs/2401.02930v1
Compressor summary: Dagma-DCE is a new, model-agnostic scheme for causal discovery that uses an interpretable measure of causal strength and outperforms existing methods on simulated datasets.
Kevin Everson,Yile Gu,Huck Yang,Prashanth Gurunath Shivakumar,Guan-Ting Lin,Jari Kolehmainen,Ivan Bulyko,Ankur Gandhe,Shalini Ghosh,Wael Hamza,Hung-yi Lee,Ariya Rastrow,Andreas Stolcke
http://arxiv.org/abs/2401.02921v1
Compressor summary: The paper proposes a method that uses lattice output from ASR systems to improve SLU tasks by incorporating word confusion networks, enhancing LLM's resilience to noisy speech transcripts and robustness to varying ASR performance conditions.
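As context for the summary above: a word confusion network compresses an ASR lattice into consecutive slots of alternative words with posterior probabilities. The sketch below is a generic illustration with made-up words and probabilities, not the paper's method:

```python
from itertools import product

# A toy word confusion network: one slot per time step, each slot holding
# alternative words with posterior probabilities (illustrative values).
wcn = [
    [("play", 0.6), ("pray", 0.4)],
    [("some", 0.9), ("sum", 0.1)],
    [("music", 0.7), ("musing", 0.3)],
]

def best_path(network):
    """1-best hypothesis: the top-probability word in every slot."""
    return " ".join(max(slot, key=lambda wp: wp[1])[0] for slot in network)

def expand(network, top_n=3):
    """Enumerate full hypotheses by joint probability, most likely first."""
    hyps = []
    for combo in product(*network):
        prob = 1.0
        for _, p in combo:
            prob *= p
        hyps.append((" ".join(w for w, _ in combo), prob))
    return sorted(hyps, key=lambda hp: -hp[1])[:top_n]

one_best = best_path(wcn)          # the plain ASR transcript
alternatives = expand(wcn)         # what the LLM sees beyond the 1-best
```

Feeding the alternatives (not just the 1-best) is what gives the downstream model resilience to transcription errors.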
Yuxin Yang,Pengfei Zhu,Mengshi Qi,Huadong Ma
http://arxiv.org/abs/2401.02916v1
Compressor summary: Key points:
- Human trajectory forecasting is challenging due to uncertainty in human actions
- A novel memory-based method, Motion Pattern Priors Memory Network, is introduced
- The method constructs a memory bank of motion patterns and uses an addressing mechanism to retrieve matched patterns for prediction
- The approach achieves state-of-the-art trajectory prediction accuracy
Summary: The paper presents a memory-based method that retrieves motion patterns from a memory bank to predict human trajectories with high accuracy.
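Memory-bank retrieval with an addressing mechanism, as described above, can be illustrated generically. This is a hedged sketch, not the paper's implementation; the bank contents, cosine-similarity addressing, and softmax blending are illustrative assumptions:

```python
import numpy as np

def retrieve_patterns(memory_bank, query, k=2):
    """Return a blend of the k motion patterns most similar to the query.

    memory_bank: (N, D) array of stored motion-pattern embeddings.
    query: (D,) embedding of the observed trajectory.
    """
    # Cosine similarity between the query and every stored pattern.
    sims = memory_bank @ query / (
        np.linalg.norm(memory_bank, axis=1) * np.linalg.norm(query) + 1e-8
    )
    top = np.argsort(-sims)[:k]          # indices of the best matches
    weights = np.exp(sims[top])          # soft addressing weights
    weights /= weights.sum()
    # Blend the retrieved patterns into a single prediction prior.
    return weights @ memory_bank[top]

bank = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
prior = retrieve_patterns(bank, np.array([0.9, 0.1]), k=2)
```

The retrieved prior then conditions the trajectory predictor instead of forecasting from scratch.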
Parvin Malekzadeh,Ming Hou,Konstantinos N. Plataniotis
http://arxiv.org/abs/2401.02914v1
Compressor summary: The paper proposes an algorithm that combines aleatory and epistemic uncertainty estimation for better risk-sensitive exploration in reinforcement learning.
Gabriel Lino Garcia,Pedro Henrique Paiola,Luis Henrique Morelli,Giovani Candido,Arnaldo Cândido Júnior,Danilo Samuel Jodas,Luis C. S. Afonso,Ivan Rizzo Guilherme,Bruno Elias Penteado,João Paulo Papa
http://arxiv.org/abs/2401.02909v1
Compressor summary: This paper introduces Bode, a fine-tuned LLaMA 2-based model for Portuguese NLP tasks, which performs better than existing LLMs and is freely available.
Haidong Gu,Nathan Gaw,Yinan Wang,Chancellor Johnstone,Christine Beauchene,Sophia Yuditskaya,Hrishikesh Rao,Chun-An Chou
http://arxiv.org/abs/2401.02905v1
Compressor summary: The paper proposes a new network, H2G2-Net, that can automatically learn from hierarchical and multi-modal physiological data to predict human cognitive states without prior knowledge or graph structure.
Firas Laakom,Yuheng Bu,Moncef Gabbouj
http://arxiv.org/abs/2401.02904v1
Compressor summary: The paper proposes new information-theoretic bounds for measuring how well a model generalizes for each individual class, which can capture class-specific variations and are easier to estimate than existing bounds.
Marta Gomez-Barrero,Javier Galbally
http://arxiv.org/abs/2401.02861v1
Compressor summary: The text discusses the security risks of biometric recognition due to inverse biometrics, which allows reconstructing synthetic samples from unprotected templates, and reviews methods to assess, evaluate, and mitigate these threats.
Naaek Chinpattanakarn,Chainarong Amornbunchornvej
http://arxiv.org/abs/2401.02860v1
Compressor summary: The text describes a method to find and analyze patterns of following behavior between two time series, such as human movements or stock market fluctuations, using the Matrix Profile Method.
Akhil Vaid,Joshua Lampert,Juhee Lee,Ashwin Sawant,Donald Apakama,Ankit Sakhuja,Ali Soroush,Denise Lee,Isotta Landi,Nicole Bussola,Ismail Nabeel,Robbie Freeman,Patricia Kovatch,Brendan Carr,Benjamin Glicksberg,Edgar Argulian,Stamatios Lerakis,Monica Kraft,Alexander Charney,Girish Nadkarni
http://arxiv.org/abs/2401.02851v1
Compressor summary: This study shows that large language models can assist in evidence-based medicine by making clinical decisions, ordering tests, and following guidelines, but they still have limitations in handling complex cases.
Yang Zhou,Rongjun Xiao,Dani Lischinski,Daniel Cohen-Or,Hui Huang
http://arxiv.org/abs/2401.02847v1
Compressor summary: The paper presents a new method for creating seamless non-stationary textures by refining user-edited reference images with a diffusion network and self-attention.
Qi An,Mengshi Qi,Huadong Ma
http://arxiv.org/abs/2401.02841v1
Compressor summary: MCoRe is a novel framework for video-based action quality assessment that segments videos into stages and uses stage-wise contrastive learning to improve performance.
Zijun Long,Richard McCreadie,Muhammad Imran
http://arxiv.org/abs/2401.02838v1
Compressor summary: The paper introduces CrisisViT, a transformer-based model for automatic image classification of crisis situations using social media images and shows its superior performance over previous methods.
Wencong Wu,An Ge,Guannan Lv,Yuelong Xia,Yungang Zhang,Wen Xiong
http://arxiv.org/abs/2401.02831v1
Compressor summary: The paper introduces a new network called TSP-RDANet that divides image denoising into two stages and uses different attention mechanisms to learn important features and suppress irrelevant ones, achieving better performance than existing methods.
Yabin Zhu,Xiao Wang,Chenglong Li,Bo Jiang,Lin Zhu,Zhixiang Huang,Yonghong Tian,Jin Tang
http://arxiv.org/abs/2401.02826v1
Compressor summary: Key points:
- The paper proposes a new object tracking task using unaligned neuromorphic and visible cameras
- It introduces a dataset (CRSOT) with high-definition RGB-Event video pairs collected with a specially built data acquisition system
- It develops a novel tracking framework that fuses RGB and Event features using ViT, uncertainty perception, and modality fusion modules
- The tracker achieves robust tracking without strict alignment between modalities
Summary: The paper presents a new object tracking task with unaligned neuromorphic and visible cameras, a large dataset (CRSOT) collected with a custom system, and a novel framework that fuses RGB and Event features for robust tracking without alignment.
Dongsheng Wang,Zhiqiang Ma,Armineh Nourbakhsh,Kang Gu,Sameena Shah
http://arxiv.org/abs/2401.02823v1
Compressor summary: DocGraphLM is a new framework that uses pre-trained language models and graph semantics to improve information extraction and question answering over visually rich documents.
Abdul Hannan Mustajab,Hao Lyu,Zarghaam Rizvi,Frank Wuttke
http://arxiv.org/abs/2401.02810v1
Compressor summary: Transfer learning improves the robustness and convergence of physics-informed neural networks (PINN) for high-frequency and multi-scale problems by starting from low-frequency problems and gradually increasing complexity.
Yuta Okuyama,Yuki Endo,Yoshihiro Kanamori
http://arxiv.org/abs/2401.02804v1
Compressor summary: The paper proposes a one-shot approach to edit human poses and body shapes in images while preserving identity and realism, using 3D modeling, diffusion-based refinement, and text embedding fine-tuning.
Jinlong He,Pengfei Li,Gang Liu,Zixu Zhao,Shenjun Zhong
http://arxiv.org/abs/2401.02797v1
Compressor summary: The paper introduces a parameter efficient framework for fine-tuning multimodal large language models to improve medical visual question answering performance, achieving high accuracy and outperforming GPT-4v.
Ryo Fujii,Ryo Hachiuma,Hideo Saito
http://arxiv.org/abs/2401.02791v1
Compressor summary: Our method improves surgical tool detection using image-level labels by leveraging co-occurrence between tool pairs, reducing annotation burden and enhancing performance.
Na Liu,Liangyu Chen,Xiaoyu Tian,Wei Zou,Kaijiang Chen,Ming Cui
http://arxiv.org/abs/2401.02777v1
Compressor summary: The paper presents RAISE, a new architecture that integrates large language models into conversational agents using a dual-component memory system, improving their controllability and adaptability in complex dialogues, as shown by its performance in a real estate sales context.
Joao Pereira,Dimitrios Chalatsis,Balint Hodossy,Dario Farina
http://arxiv.org/abs/2401.02773v1
Compressor summary: The study proposes a method to improve the performance of sEMG pattern recognition algorithms by training on different combinations of channels and augmenting with data from various electrode locations, making them more robust to electrode shifts and reducing dimensionality.
Kaixuan Chen,Wei Luo,Shunyu Liu,Yaoquan Wei,Yihe Zhou,Yunpeng Qing,Quan Zhang,Jie Song,Mingli Song
http://arxiv.org/abs/2401.02771v1
Compressor summary: Powerformer is a novel transformer architecture that learns robust power system state representations by using a section-adaptive attention mechanism and customized strategies, achieving better power dispatch for different transmission sections.
Hugo Chan-To-Hing,Bharadwaj Veeravalli
http://arxiv.org/abs/2401.02764v1
Compressor summary: Fus-MAE is a novel self-supervised framework that uses cross-attention in masked autoencoders to fuse SAR and optical data without complex data augmentations.
Amin Rezaei,Fatemeh Asadi
http://arxiv.org/abs/2401.02758v1
Compressor summary: The review discusses various image segmentation methods using complex networks, highlighting their importance in analyzing complex images and describing different algorithms and hybrid approaches.
Yuu Jinnai,Kaito Ariu
http://arxiv.org/abs/2401.02749v1
Compressor summary: AMBR is a fast and accurate method to approximate MBR decoding without hyperparameter tuning, using the CSH algorithm.
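For context, vanilla sampling-based MBR decoding (which AMBR approximates) picks the candidate with the highest expected utility against all other candidates. The sketch below shows the exact O(n²) version with a toy unigram-F1 utility; AMBR's CSH-based approximation and any real utility metric (BLEU, COMET, etc.) are not shown:

```python
from collections import Counter

def mbr_decode(candidates, utility):
    """Pick the candidate with the highest expected utility over all samples.

    Exact sampling-based MBR needs n^2 utility calls; AMBR's contribution
    (not shown here) is approximating this with far fewer evaluations.
    """
    def expected_utility(h):
        return sum(utility(h, r) for r in candidates) / len(candidates)
    return max(candidates, key=expected_utility)

def unigram_f1(hyp, ref):
    """Toy utility: F1 over word unigrams (a stand-in for BLEU/COMET)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

samples = ["the cat sat", "the cat sat down", "the dog ran down"]
best = mbr_decode(samples, unigram_f1)
```

Note MBR picks the hypothesis closest to the *consensus* of the samples, which is not necessarily the most probable one.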
David Gimeno-Gómez,Ana-Maria Bucur,Adrian Cosma,Carlos-David Martínez-Hinarejos,Paolo Rosso
http://arxiv.org/abs/2401.02746v1
Compressor summary: Key points:
- The paper proposes a model to detect depression from user-generated video content using multiple modalities (audio, face emotion, etc.)
- The model performs better than previous methods on three benchmark datasets
- The code is publicly available on GitHub
Summary: The paper presents a multi-modal temporal model that can effectively identify depression cues from real-world videos and provides the code online.
Alfirsa Damasyifa Fauzulhaq,Wahyu Parwitayasa,Joseph Ananda Sugihdharma,M. Fadli Ridhani,Novanto Yudistira
http://arxiv.org/abs/2401.02744v1
Compressor summary: The text describes a method to visualize neuron behavior in deep neural networks using an improved encoder-decoder model with multiple attention mechanisms, achieving better results on long sequence neuron captioning.
Top Piriyakulkij,Yingheng Wang,Volodymyr Kuleshov
http://arxiv.org/abs/2401.02739v1
Compressor summary: The paper introduces DDVI, an inference method for latent variable models that uses diffusion models as variational posteriors and auxiliary latents to perform denoising in latent space.
Ryan Boustany
http://arxiv.org/abs/2401.02736v1
Compressor summary: The paper investigates how different aspects of neural networks, such as MaxPool operation and numerical precision, affect the reliability of automatic differentiation and its impact on performance.
Haoyuan Wu,Haisheng Zheng,Bei Yu
http://arxiv.org/abs/2401.02731v1
Compressor summary: PESC is a novel method that transforms dense language models into sparse ones using MoE layers with adapters, improving generalization across multiple tasks without increasing parameters much.
Hui Zeng,Biwei Chen,Anjie Peng
http://arxiv.org/abs/2401.02727v1
Compressor summary: Key points:
- Adversarial examples (AEs) can protect privacy and inspire robust neural networks, but transferring them across unknown models is hard.
- The paper proposes fine-tuning AEs in feature space to improve targeted transferability.
- A few iterations of fine-tuning can outperform existing attacks and be cheaper than resource-intensive methods.
Summary: The paper introduces a simple and effective method to fine-tune adversarial examples in the feature space, improving their ability to fool unknown models with minimal cost and effort.
Nathan Aky,Denis Payet,Sylvain Giroux,Rémy Courdier
http://arxiv.org/abs/2401.02726v1
Compressor summary: The article proposes an OWL-formatted ontology for structuring connected devices in smart cities, enabling personalized services through autonomous agents.
Ikumi Okubo,Keisuke Sugiura,Hiroki Matsutani
http://arxiv.org/abs/2401.02721v1
Compressor summary: The paper proposes a lightweight Transformer model using Neural ODEs, quantizes it for edge computing on FPGAs, and achieves significant speedup and energy efficiency.
Yunshan Zhong,Yuyao Zhou,Yuxin Zhang,Fei Chao,Rongrong Ji
http://arxiv.org/abs/2401.02719v1
Compressor summary: The paper proposes a method to learn from unpaired moiré images and clean images, synthesizing pseudo moiré images and adaptively denoising them for demoiréing model training.
Chuyun Shen,Wenhao Li,Haoqing Chen,Xiaoling Wang,Fengping Zhu,Yuxin Li,Xiangfeng Wang,Bo Jin
http://arxiv.org/abs/2401.02717v1
Compressor summary: CIML is a framework for multimodal learning in tumor segmentation that models and addresses inter-modal redundancy issues by decomposing the task into subtasks and using message passing and cross-modal attention to extract complementary information.
Ge Wang,Zelin Zang,Jiangbin Zheng,Jun Xia,Stan Z. Li
http://arxiv.org/abs/2401.02713v1
Compressor summary: This paper presents a new unsupervised learning method for graphs called Structure Knowledge Refinement (SKR), which improves graph feature extraction and classification by handling false negative pairs and adapting augmentation strategies.
Silvan Wehrli,Bert Arnrich,Christopher Irrgang
http://arxiv.org/abs/2401.02709v1
Compressor summary: The authors present a benchmark for evaluating German text embeddings in clustering tasks, analyze various models' performance, and explore continued pre-training for improved results on short texts.
Liwen Zhang,Lianzhen Zhong,Fan Yang,Di Dong,Hui Hui,Jie Tian
http://arxiv.org/abs/2401.02708v1
Compressor summary: The paper proposes a new loss function, TripleSurv, for survival analysis that considers sample pair differences and time intervals to improve prediction accuracy and robustness.
Zhitao Wang,Wei Wang,Zirao Li,Long Wang,Can Yi,Xinjie Xu,Luyang Cao,Hanjing Su,Shouzhi Chen,Jun Zhou
http://arxiv.org/abs/2401.02705v1
Compressor summary: The paper proposes XUAT-Copilot, an LLM-based multi-agent system that automates test script generation for WeChat Pay's UAT process, achieving high accuracy and saving labor costs.
Abisha Thapa Magar,Anup Shakya,Somdeb Sarkhel,Deepak Venugopal
http://arxiv.org/abs/2401.02703v1
Compressor summary: The paper proposes a method to estimate uncertainty in explanations generated by GNNExplainer using counterfactual examples and a factor graph model.
Ziying Song,Guoxin Zhang,Jun Xie,Lin Liu,Caiyan Jia,Shaoqing Xu,Zhepeng Wang
http://arxiv.org/abs/2401.02702v1
Compressor summary: VoxelNextFusion is a framework that enhances 3D object detection by fusing point clouds and images using self-attention and a feature importance module, achieving better performance than existing methods.
Sasindu Wijeratne,Bingyi Zhang,Rajgopal Kannan,Viktor Prasanna,Carl Busart
http://arxiv.org/abs/2401.02687v1
Compressor summary: The paper proposes a Graph Neural Network (GNN) framework for identifying and classifying ground objects in Synthetic Aperture Radar (SAR) images, providing detailed information to help commanding officers make decisions.
Can Xu,Haosen Wang,Weigang Wang,Pengfei Zheng,Hongyang Chen
http://arxiv.org/abs/2401.02683v1
Compressor summary: GFMDiff is a novel diffusion-based molecule generation method that uses a Dual-Track Transformer Network and Geometric-Facilitated Loss to capture complex multi-body interatomic relationships and predict bond formation.
Zichen Wen,Yawen Ling,Yazhou Ren,Tianyi Wu,Jianpeng Chen,Xiaorong Pu,Zhifeng Hao,Lifang He
http://arxiv.org/abs/2401.02682v1
Compressor summary: The paper proposes AHGFC, a method to cluster heterophilous graphs by adaptively filtering graph signals based on their homophily degree and learning distinguishable node embeddings.
Yatharth Gupta,Vishnu V. Jaddipal,Harish Prabhala,Sayak Paul,Patrick Von Platen
http://arxiv.org/abs/2401.02677v1
Compressor summary: The authors introduce two smaller versions of the open source text-to-image model SDXL by removing some components and using layer-level losses to reduce size and latency while maintaining quality.
Iman Deznabi,Peeyush Kumar,Madalina Fiterau
http://arxiv.org/abs/2401.02665v1
Compressor summary: The proposed zero-shot learning approach can accurately forecast microclimate variables at new, unmonitored locations using knowledge from similar places.
Jiazhu Dai,Haoyu Sun
http://arxiv.org/abs/2401.02663v1
Compressor summary: The paper proposes a backdoor attack on graph neural networks (GNNs) for link prediction tasks, where trigger nodes can cause misclassification of unlinked node pairs.
Syed Hasib Akhter Faruqui,Adel Alaeddini,Yan Du,Shiyu Li,Kumar Sharma,Jing Wang
http://arxiv.org/abs/2401.02661v1
Compressor summary: The study used AI to create personalized feedback for Type 2 diabetes patients, resulting in improved physical activity and diet.
SeokHyun Seo,Jinwoo Hong,JungWoo Chae,Kyungyul Kim,Sangheum Hwang
http://arxiv.org/abs/2401.02656v1
Compressor summary: The paper proposes a regularization method called Guided Transfer of spatial Attention (GTA) for ViT models to prevent overfitting and improve performance, especially when the dataset size is small.
Viorica Rozina Chifu,Tudor Cioara,Cristina Bianca Pop,Horia Rusu,Ionut Anghel
http://arxiv.org/abs/2401.02653v1
Compressor summary: The paper proposes a model-free solution using Deep Q-Learning to schedule EV charging and discharging in microgrids based on a target energy profile provided by the distribution system operator, showing promising results.
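The model-free idea above (learn charge/idle actions that track an operator-supplied target profile, with no grid model) can be shown with tabular Q-learning on a toy problem. This is a deliberately simplified sketch, not the paper's Deep Q-Learning setup; the 4-slot horizon, target profile, and reward are illustrative assumptions:

```python
import random
import numpy as np

random.seed(0)

# Toy setting: in each of 4 time slots the agent charges (1) or idles (0);
# the reward penalises deviation from the operator's target profile.
TARGET = [1, 0, 1, 1]            # illustrative target charging profile
ACTIONS = [0, 1]
Q = np.zeros((len(TARGET), len(ACTIONS)))
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(500):
    for t in range(len(TARGET)):   # state = time slot
        # Epsilon-greedy action selection.
        a = random.choice(ACTIONS) if random.random() < eps else int(Q[t].argmax())
        reward = -abs(a - TARGET[t])              # 0 if on-profile, -1 otherwise
        future = Q[t + 1].max() if t + 1 < len(TARGET) else 0.0
        # Standard Q-learning update.
        Q[t, a] += alpha * (reward + gamma * future - Q[t, a])

policy = [int(Q[t].argmax()) for t in range(len(TARGET))]
```

A deep variant replaces the table with a network so the state can include battery levels, prices, and the remaining profile.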
Ridhima Bector,Abhay Aradhya,Chai Quek,Zinovi Rabinovich
http://arxiv.org/abs/2401.02652v1
Compressor summary: The paper proposes gammaDDPG, a method that enables constructive training-time attacks on reinforcement learning agents by dynamically adjusting the attack policy based on the victim's behavior.
Sunyi Zheng,Xiaonan Cui,Yuxuan Sun,Jingxiong Li,Honglin Li,Yunlong Zhang,Pingyi Chen,Xueping Jing,Zhaoxiang Ye,Lin Yang
http://arxiv.org/abs/2401.02651v1
Compressor summary: PathCLIP is a promising model for pathology image analysis, but its robustness under certain image corruptions varies and requires careful consideration when used in clinical settings.
Zeji Yi,Yunyue Wei,Chu Xin Cheng,Kaibo He,Yanan Sui
http://arxiv.org/abs/2401.02650v1
Compressor summary: The paper introduces a new Markov Chain Monte Carlo method for efficient Gaussian process sequential optimization in high-dimensional spaces, with theoretical guarantees of convergence and experimental improvements over existing methods.
Saurabh Atreya,Maheswar Bora,Aritra Mukherjee,Abhijit Das
http://arxiv.org/abs/2401.02649v1
Compressor summary: The authors present a new pen tool, a stereo camera, and a 2D CNN to analyze air signatures from signers and detect skilled forgeries.
Aritra Mukherjee,Abhijit Das
http://arxiv.org/abs/2401.02646v1
Compressor summary: This paper reviews recent advances in 3D face recognition using a single camera and compares it to traditional biometrics, highlighting its advantages and difficulties.
Chang Chen,Fei Deng,Kenji Kawaguchi,Caglar Gulcehre,Sungjin Ahn
http://arxiv.org/abs/2401.02644v1
Compressor summary: The Hierarchical Diffuser combines hierarchical and diffusion-based planning to improve the performance and efficiency of long-horizon tasks in reinforcement learning.
Jiahang Zhou,Yanyu Chen,Zicong Hong,Wuhui Chen,Yue Yu,Tao Zhang,Hui Wang,Chuanfu Zhang,Zibin Zheng
http://arxiv.org/abs/2401.02643v1
Compressor summary: This paper reviews efficient training and serving strategies for foundation models, which are the mainstream trend of artificial general intelligence, and discusses their challenges and future directions.
Huy Nguyen,Kien Nguyen,Sridha Sridharan,Clinton Fookes
http://arxiv.org/abs/2401.02634v1
Compressor summary: The paper introduces AG-ReID.v2, a dataset for person re-identification in aerial and ground scenarios, along with an explainable attention network that outperforms existing methods.
Shun Liu
http://arxiv.org/abs/2401.02630v1
Compressor summary: Our framework combines modular data processing and interpretability techniques to create transparent deep learning models without sacrificing performance.
Song Bai,Jie Li
http://arxiv.org/abs/2401.02620v1
Compressor summary: The text describes the rapid growth of 3D generation in AI, covering object creation, realistic human models, and motion generation, driven by advances in diffusion, control, rendering, and language models.
Hao Zhang,Yu-Wing Tai,Chi-Keung Tang
http://arxiv.org/abs/2401.02616v1
Compressor summary: The paper introduces a novel face video editing method using GAN-NeRF that ensures multi-view consistency, temporal coherence, and restores 3D geometry, outperforming existing approaches.
Yongxu Liu,Yinghui Quan,Guoyao Xiao,Aobo Li,Jinjian Wu
http://arxiv.org/abs/2401.02614v1
Compressor summary: The paper proposes SAMA, a data sampling method that preserves both local and global content in images and videos, enabling single-branch models to achieve competitive performance without extra complexity.
Jingyao Li,Pengguang Chen,Shaozuo Yu,Shu Liu,Jiaya Jia
http://arxiv.org/abs/2401.02611v1
Compressor summary: The study shows that reconstruction-based pretraining improves out-of-distribution detection by enhancing feature representations and making simple score functions perform as well as complex ones.
Jincen Jiang,Lizhi Zhao,Xuequan Lu,Wei Hu,Imran Razzak,Meili Wang
http://arxiv.org/abs/2401.02610v1
Compressor summary: The paper proposes a novel graph convolution network for point clouds that learns the relationships between voxelized parts based on hop distances and achieves state-of-the-art results.
Yuping Ye,Zhan Song,Juan Zhao
http://arxiv.org/abs/2401.02607v1
Compressor summary: The paper introduces a surface registration method for 3D morphable models that uses landmarks to partition, scale, and smooth the template model, improving performance and robustness compared to traditional methods.
Wen Dong,Haiyang Mei,Ziqi Wei,Ao Jin,Sen Qiu,Qiang Zhang,Xin Yang
http://arxiv.org/abs/2401.02606v1
Compressor summary: The text introduces a new car detection method using trichromatic linear polarization as an additional feature to improve performance in challenging conditions.
Kevin Xia,Elias Bareinboim
http://arxiv.org/abs/2401.02602v1
Compressor summary: Key points:
- The paper develops a new family of causal abstractions by clustering variables and their domains
- The abstractions are learnable with Neural Causal Models and can solve various causal inference tasks
- The paper integrates the abstractions with representation learning for more flexibility and applies them to image data
Summary: The paper proposes a novel way of creating and learning causal abstractions from variables and their domains, which enables deep learning tools to perform causal inference tasks on high-dimensional image data.
Meiling Li,Nan Zhong,Xinpeng Zhang,Zhenxing Qian,Sheng Li
http://arxiv.org/abs/2401.02600v1
Compressor summary: The paper explores how to create a backdoor attack on image captioning models by poisoning training data, which causes the model to generate irrelevant captions for specific images without affecting its performance on benign ones.
Yuxuan Shu,Vasileios Lampos
http://arxiv.org/abs/2401.02594v1
Compressor summary: UNA generates synthetic negative instances for text similarity tasks using term frequency-inverse document frequency scores to replace important terms in sentences.
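The TF-IDF-driven replacement idea above can be sketched generically. This is a hedged illustration, not UNA itself: it uses binary term frequency for simplicity, replaces only the single highest-scoring word, and the [MASK] replacement token and toy corpus are assumptions:

```python
import math

def tfidf_scores(sentence, corpus):
    """Score each word in `sentence` by smoothed IDF (binary tf = 1)."""
    words = sentence.lower().split()
    n_docs = len(corpus)
    scores = {}
    for w in set(words):
        df = sum(1 for doc in corpus if w in doc.lower().split())
        scores[w] = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
    return scores

def make_negative(sentence, corpus, replacement="[MASK]"):
    """Build a hard negative by replacing the most informative word.

    Words common across the corpus ('the', 'on') score low, so the
    replacement lands on a content-bearing term and flips the meaning
    while keeping most of the surface form intact.
    """
    scores = tfidf_scores(sentence, corpus)
    target = max(scores, key=scores.get)
    return " ".join(replacement if w.lower() == target else w
                    for w in sentence.split())

corpus = ["the cat sat on the mat", "the dog sat on the rug",
          "a cat chased the dog"]
neg = make_negative("the cat sat on the mat", corpus)
```

Such negatives are "hard" because they stay lexically close to the anchor sentence, which is what makes them useful for contrastive similarity training.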
Hung Nguyen,Morris Chang
http://arxiv.org/abs/2401.02591v1
Compressor summary: The study presents a method for balancing imbalanced data in deep learning by generating synthetic samples that fit well with the class distribution and maintain data topology.
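As a baseline for the idea above, the classic way to synthesize minority-class samples is SMOTE-style interpolation between nearest neighbours. This sketch shows that baseline only; the paper's method additionally constrains generation to fit the class distribution and preserve data topology:

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like(minority, n_new, k=2):
    """Generate synthetic minority samples by interpolating neighbours.

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    new = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # Distances to every minority point; skip index 0 (x itself).
        d = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(d)[1 : k + 1]
        x_nb = minority[rng.choice(neighbours)]
        lam = rng.random()
        new.append(x + lam * (x_nb - x))   # point on the segment x -> x_nb
    return np.array(new)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
synthetic = smote_like(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real minority samples, it cannot leave the minority class's convex hull; topology-aware methods relax or refine exactly this behaviour.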
Van Minh Nguyen,Emma Sandidge,Trupti Mahendrakar,Ryan T. White
http://arxiv.org/abs/2401.02588v1
Compressor summary: The authors propose a fast 3D modeling technique for satellites that can run on space hardware and support autonomy in rendezvous and proximity operations.
Daoan Zhang,Junming Yang,Hanjia Lyu,Zijian Jin,Yuan Yao,Mingkai Chen,Jiebo Luo
http://arxiv.org/abs/2401.02582v1
Compressor summary: The study examines how well Large Multimodal Models can perceive fine-grained visual details from multiple images and proposes a Contrastive Chain-of-Thought approach to improve their performance.