This page contains one-sentence summaries of cs.AI/ML/CV papers announced on 2023-11-28, generated by the compressor, my personal LLM-based project.
Ivan Lopes,Fabio Pizzati,Raoul de Charette
http://arxiv.org/abs/2311.17060v1
Compressor summary: The paper presents a method to create physically-based rendering materials from real images using a diffusion model, texture generation, SVBRDF decomposition, and unsupervised domain adaptation.
Xian Liu,Xiaohang Zhan,Jiaxiang Tang,Ying Shan,Gang Zeng,Dahua Lin,Xihui Liu,Ziwei Liu
http://arxiv.org/abs/2311.17061v1
Compressor summary: HumanGaussian is a new method to generate realistic 3D human models from text prompts using adaptive Gaussian Splatting and Structure-Aware Score Distillation Sampling.
Jingkang Yang,Wenxuan Peng,Xiangtai Li,Zujin Guo,Liangyu Chen,Bo Li,Zheng Ma,Kaiyang Zhou,Wayne Zhang,Chen Change Loy,Ziwei Liu
http://arxiv.org/abs/2311.17058v1
Compressor summary: PVSG is a new problem that aims to generate pixel-level segmented scene graphs from videos, overcoming the limitations of VidSGG that uses bounding boxes.
Anindita Ghosh,Rishabh Dabral,Vladislav Golyanik,Christian Theobalt,Philipp Slusallek
http://arxiv.org/abs/2311.17057v1
Compressor summary: ReMoS is a model that generates realistic reactive motions for two people interacting, such as dancing or fighting, given the motion of one person, and can handle complex scenarios with full-body and hand interactions.
Zhaoying Pan,Daniel Geng,Andrew Owens
http://arxiv.org/abs/2311.17056v1
Compressor summary: The paper presents a self-supervised method to magnify subtle motions in video, using a loss function that estimates the optical flow and penalizes deviations from the desired magnification factor; the method can be adapted at test time and applied to selected objects without needing synthetic datasets.
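The loss described in this summary can be sketched generically: the flow between the input frame and the magnified output should equal the small observed motion scaled by the magnification factor. The function name, the L1 form, and the flow shapes below are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def magnification_loss(flow_io, flow_small, alpha=4.0):
    """L1 penalty: the flow between input and magnified output (flow_io)
    should equal the small observed motion scaled by alpha.
    flow_* are (H, W, 2) fields from any flow estimator (assumed)."""
    return float(np.mean(np.abs(flow_io - alpha * flow_small)))

# A perfectly magnified flow field (4x the small motion) has zero loss.
loss = magnification_loss(np.full((2, 2, 2), 4.0), np.full((2, 2, 2), 1.0))
```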
Congyue Deng,Jiawei Yang,Leonidas Guibas,Yue Wang
http://arxiv.org/abs/2311.16504v1
Compressor summary: The paper proposes a simple modification to the NeRF rendering equation to improve view-dependent effects and achieve better rendering quality.
Sagar Vaze,Andrea Vedaldi,Andrew Zisserman
http://arxiv.org/abs/2311.17055v1
Compressor summary: The paper introduces Clevr-4, a synthetic dataset for Generalized Category Discovery (GCD) that requires models to extrapolate a taxonomy from labels, and presents a method that outperforms existing approaches on it.
Zhengming Yu,Zhiyang Dou,Xiaoxiao Long,Cheng Lin,Zekun Li,Yuan Liu,Norman Müller,Taku Komura,Marc Habermann,Christian Theobalt,Xin Li,Wenping Wang
http://arxiv.org/abs/2311.17050v1
Compressor summary: The paper introduces Surf-D, a method for generating high-quality 3D shapes using Diffusion models and Unsigned Distance Field representation, which handles arbitrary topologies and performs well in various shape generation tasks.
Pavan Kumar Anasosalu Vasu,Hadi Pouransari,Fartash Faghri,Raviteja Vemulapalli,Oncel Tuzel
http://arxiv.org/abs/2311.17049v1
Compressor summary: The paper introduces MobileCLIP, a family of efficient image-text models optimized for runtime performance and a novel multi-modal reinforced training approach that improves accuracy and learning efficiency.
Zeyu Han,Fangrui Zhu,Qianru Lao,Huaizu Jiang
http://arxiv.org/abs/2311.17048v1
Compressor summary: Zero-shot referring expression comprehension involves locating objects in an image based on textual prompts, and the authors propose a method using foundation models, triplets, and fine-tuning VLA models for improved performance.
Yanwei Li,Chengyao Wang,Jiaya Jia
http://arxiv.org/abs/2311.17043v1
Compressor summary: The LLaMA-VID method generates tokens for VLMs to process long videos by using context and content tokens, reducing computational burdens and improving performance on video and image benchmarks.
Axel Sauer,Dominik Lorenz,Andreas Blattmann,Robin Rombach
http://arxiv.org/abs/2311.17042v1
Compressor summary: ADD trains large-scale image diffusion models to generate high-quality images in just 1-4 steps using score distillation and adversarial losses.
Keunwoo Peter Yu,Zheyuan Zhang,Fengyuan Hu,Joyce Chai
http://arxiv.org/abs/2311.17041v1
Compressor summary: The paper proposes a new method to train vision-language models for egocentric videos using in-context learning with minimal data, achieving better performance and adaptability than existing methods.
Milad Nasr,Nicholas Carlini,Jonathan Hayase,Matthew Jagielski,A. Feder Cooper,Daphne Ippolito,Christopher A. Choquette-Choo,Eric Wallace,Florian Tramèr,Katherine Lee
http://arxiv.org/abs/2311.17035v1
Compressor summary: The paper demonstrates how adversaries can efficiently extract training data from various machine learning models, including closed models like ChatGPT, by developing a new attack technique that exploits model misbehavior.
Junyi Zhang,Charles Herrmann,Junhwa Hur,Eric Chen,Varun Jampani,Deqing Sun,Ming-Hsuan Yang
http://arxiv.org/abs/2311.17034v1
Compressor summary: The paper proposes geometry-aware solutions to improve semantic correspondence in vision models, creating a new benchmark dataset and achieving state-of-the-art results.
Aleksandar Makelov,Georg Lange,Neel Nanda
http://arxiv.org/abs/2311.17030v1
Compressor summary: The paper explores how subspace interventions in models can lead to misleading interpretability, but also provides examples of successful cases and suggests more evidence is needed for faithfulness.
G. Cascavilla,G. Catolino,M. Conti,D. Mellios,D. A. Tamburri
http://arxiv.org/abs/2311.17026v1
Compressor summary: The paper explores using Siamese neural networks for recognizing illegal activities on the Dark Web from images, showing promising results with 90.9% accuracy on a small dataset.
Niladri Shekhar Dutt,Sanjeev Muralikrishnan,Niloy J. Mitra
http://arxiv.org/abs/2311.17024v1
Compressor summary: Diff3F is a feature descriptor for untextured shapes that uses conditional image synthesis from depth and normal maps to create robust, semantic features for correspondence across shape families.
Danah Yatim,Rafail Fridman,Omer Bar Tal,Yoni Kasten,Tali Dekel
http://arxiv.org/abs/2311.17009v1
Compressor summary: The paper proposes a text-driven motion transfer method that can synthesize videos with different objects and motions, using a pre-trained model and a new space-time feature loss.
Théo Bourdais,Pau Batlle,Xianjin Yang,Ricardo Baptista,Nicolas Rouquette,Houman Owhadi
http://arxiv.org/abs/2311.17007v1
Compressor summary: The paper introduces a GP framework for discovering and completing computational hypergraphs from partial observations, using a kernel generalization of Row Echelon Form reduction.
Brett Barkley,Amy Zhang,David Fridovich-Keil
http://arxiv.org/abs/2311.17008v1
Compressor summary: The paper explores using time reversal symmetry in Markov decision processes to increase sample efficiency in reinforcement learning, but notes that it may not work well in all environments.
Helena Calatrava,Ricardo Augusto Borsoi,Tales Imbiriba,Pau Closas
http://arxiv.org/abs/2311.17006v1
Compressor summary: The paper proposes IW-DKF, which uses importance sampling to improve state inference and parameter learning in deep Kalman filters for sequential models, showing better generative modeling performance and state estimation accuracy.
Kunchang Li,Yali Wang,Yinan He,Yizhuo Li,Yi Wang,Yi Liu,Zun Wang,Jilan Xu,Guo Chen,Ping Luo,Limin Wang,Yu Qiao
http://arxiv.org/abs/2311.17005v1
Compressor summary: MVBench is a benchmark for evaluating multi-modal language models' comprehension of dynamic videos, covering 20 challenging tasks that require temporal understanding.
Yutong Feng,Biao Gong,Di Chen,Yujun Shen,Yu Liu,Jingren Zhou
http://arxiv.org/abs/2311.17002v1
Compressor summary: The authors propose Ranni, a system that improves text-to-image diffusion models by using a semantic panel as a middleware to better follow complex instructions and enable more convenient interaction for users.
Marco Bagatella,Georg Martius
http://arxiv.org/abs/2311.16996v1
Compressor summary: The text discusses using exploration techniques in deep reinforcement learning to extract goal-conditioned behavior without additional environment interaction, and proposes a method to combine model-based planning with graph-based value aggregation to improve zero-shot goal-reaching performance.
Hailin Chen,Fangkai Jiao,Xingxuan Li,Chengwei Qin,Mathieu Ravaut,Ruochen Zhao,Caiming Xiong,Shafiq Joty
http://arxiv.org/abs/2311.16989v1
Compressor summary: ChatGPT's release in 2022 sparked a surge in interest and development of large language models (LLMs), leading to rapid progress in both open-source and closed-source LLMs, with significant implications for research and business.
Christos-Nikolaos Zacharopoulos,Théo Desbordes,Mathias Sablé-Meyer
http://arxiv.org/abs/2311.16978v1
Compressor summary: The paragraph discusses how the distance between an attractor noun and a verb affects grammatical judgments and reaction times, with neural networks performing differently from humans.
Peidong Jia,Chenxuan Li,Zeyu Liu,Yichao Shen,Xingru Chen,Yuhui Yuan,Yinglin Zheng,Dong Chen,Ji Li,Xiaodong Xie,Shanghang Zhang,Baining Guo
http://arxiv.org/abs/2311.16974v1
Compressor summary: COLE is a hierarchical framework that uses specialized models to generate and edit high-quality graphic designs based on user input, enhancing reliability and streamlining the process.
Aman Yadav,Abhishek Vichare
http://arxiv.org/abs/2311.16965v1
Compressor summary: The paper shows that using pre-trained BERT models for sentiment analysis on IMDb movie reviews improves accuracy, but cautions against overfitting or lack of generalization without further analysis.
Jingbo Zhang,Xiaoyu Li,Qi Zhang,Yanpei Cao,Ying Shan,Jing Liao
http://arxiv.org/abs/2311.16961v1
Compressor summary: The paper proposes a method called HumanRef for creating realistic 3D human models from one image that preserves texture details and maintains consistency in different views using reference-guided score distillation sampling and region-aware attention.
Kai Cheng,Xiaoxiao Long,Wei Yin,Jin Wang,Zhiqiang Wu,Yuexin Ma,Kaixuan Wang,Xiaozhi Chen,Xuejin Chen
http://arxiv.org/abs/2311.16945v1
Compressor summary: The paper introduces UC-NeRF, a method to improve NeRF's performance in under-calibrated multi-camera systems by addressing color inconsistency and pose calibration issues through layer-based correction, virtual warping, and spatiotemporal constraint refinement.
Luisa H. B. Liboni,Roberto C. Budzinski,Alexandra N. Busch,Sindy Löwe,Thomas A. Keller,Max Welling,Lyle E. Muller
http://arxiv.org/abs/2311.16943v1
Compressor summary: The text describes a recurrent neural network that uses complex numbers to perform image segmentation by dividing an image into groups based on structural characteristics, with a simple algorithm that generalizes across different input types.
Vaidehi Patil,Adyasha Maharana,Mohit Bansal
http://arxiv.org/abs/2311.16941v1
Compressor summary: The paper proposes a novel debiasing method for multimodal models using causally-motivated information minimization to learn confounder representations and improve OOD performance without sacrificing in-distribution performance.
James A. D. Gardner,Evgenii Kashin,Bernhard Egger,William A. P. Smith
http://arxiv.org/abs/2311.16937v1
Compressor summary: The authors propose a method to infer albedo, geometry, illumination, and sky visibility from unconstrained images using neural networks, achieving state-of-the-art results on a benchmark dataset.
Yuwei Guo,Ceyuan Yang,Anyi Rao,Maneesh Agrawala,Dahua Lin,Bo Dai
http://arxiv.org/abs/2311.16933v1
Compressor summary: SparseCtrl is a method for controlling video generation with minimal input signals, such as sketches or depth maps, improving flexibility and practicality for various applications.
Lanyun Zhu,Tianrun Chen,Deyi Ji,Jieping Ye,Jun Liu
http://arxiv.org/abs/2311.16926v1
Compressor summary: The paper introduces LLaFS, a method that uses large language models to improve few-shot segmentation by incorporating prior knowledge and providing multi-modal guidance.
Marzieh Gheisari,Auguste Genovesio
http://arxiv.org/abs/2311.16923v1
Compressor summary: The paper proposes a new method to improve super-resolution by constraining the search in the latent space of StyleGAN and expanding the image prior around the optimal code, achieving realistic and high-quality results.
Sicong Leng,Hang Zhang,Guanzheng Chen,Xin Li,Shijian Lu,Chunyan Miao,Lidong Bing
http://arxiv.org/abs/2311.16922v1
Compressor summary: The Visual Contrastive Decoding method reduces object hallucinations in large vision-language models by contrasting output distributions from original and distorted visual inputs without additional training or external tools.
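The contrastive step this summary describes can be sketched generically: amplify next-token logits computed from the original image relative to those computed from a distorted copy. The single `alpha` weighting below is an assumption for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def contrastive_decode(logits_original, logits_distorted, alpha=1.0):
    """Contrast next-token logits from the original image against those
    from a distorted copy, amplifying visually grounded tokens."""
    # (1 + alpha) * original - alpha * distorted, applied in logit space
    contrasted = (1 + alpha) * logits_original - alpha * logits_distorted
    # softmax over the adjusted logits gives the sampling distribution
    exp = np.exp(contrasted - contrasted.max())
    return exp / exp.sum()

# Token 1 gains probability under the distorted image (a hallucination
# cue), so contrasting suppresses it relative to token 0.
probs = contrastive_decode(np.array([2.0, 1.0, 0.5]),
                           np.array([2.0, 1.5, 0.2]))
```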
Lingteng Qiu,Guanying Chen,Xiaodong Gu,Qi Zuo,Mutian Xu,Yushuang Wu,Weihao Yuan,Zilong Dong,Liefeng Bo,Xiaoguang Han
http://arxiv.org/abs/2311.16918v1
Compressor summary: The paper proposes a Normal-Depth diffusion model for 3D generation using the LAION dataset and introduces an albedo diffusion model to handle mixed illumination effects, achieving state-of-the-art results.
Jiaxin Lu,Hao Kang,Haoxiang Li,Bo Liu,Yiding Yang,Qixing Huang,Gang Hua
http://arxiv.org/abs/2311.16917v1
Compressor summary: The UGG model generates diverse and successful grasping postures for objects by using a diffusion-based approach that incorporates object, hand, and contact information.
Peirong Liu,Oula Puonti,Xiaoling Hu,Daniel C. Alexander,Juan Eugenio Iglesias
http://arxiv.org/abs/2311.16914v1
Compressor summary: Brain-ID is a robust feature representation learning strategy for brain imaging that works well on various tasks and datasets, even with limited training data.
Der-Hau Lee
http://arxiv.org/abs/2311.16900v1
Compressor summary: The soft-CILQR algorithm improves the stability and smoothness of autonomous vehicle steering by using slack variables to soften constraints in the optimization process.
Gustavo Sutter Carvalho,Moacir Antonelli Ponti
http://arxiv.org/abs/2311.16894v1
Compressor summary: The paper proposes a new way to measure how well generative models capture all aspects of the data using dendrograms and shows it performs as well as existing methods.
Daniel Barley,Holger Fröning
http://arxiv.org/abs/2311.16883v1
Compressor summary: This paper proposes a method to reduce memory usage in DNNs during training by pruning activations using structured pruning and block sparsity, achieving up to 32% memory reduction without sacrificing accuracy on image classification tasks.
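A minimal sketch of the block-sparse activation pruning this summary describes: group an activation tensor into fixed-size blocks, score blocks by magnitude, and zero the low-scoring ones, which a sparse storage format then need not keep. Block size, the scoring rule, and the keep ratio are assumptions for illustration.

```python
import numpy as np

def prune_activation_blocks(act, block=4, keep_ratio=0.5):
    """Zero all but the highest-magnitude fixed-size blocks of an
    activation tensor; a sparse format then stores only kept blocks."""
    flat = act.reshape(-1, block)              # group values into blocks
    scores = np.abs(flat).sum(axis=1)          # magnitude score per block
    k = max(1, int(round(len(scores) * keep_ratio)))
    keep = np.argsort(scores)[-k:]             # indices of top-k blocks
    mask = np.zeros_like(flat)
    mask[keep] = 1.0
    return (flat * mask).reshape(act.shape)

act = np.arange(16, dtype=float).reshape(4, 4)
sparse = prune_activation_blocks(act)          # keeps the two largest rows
```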
Bowen Li,Yongxin Yang,Steven McDonagh,Shifeng Zhang,Petru-Daniel Tudosiu,Sarah Parisot
http://arxiv.org/abs/2311.16882v1
Compressor summary: The paper introduces an image editing method that allows various types of instructions and balances local modifications with global consistency using two loss functions.
Thu Nguyen,Pål Halvorsen,Michael A. Riegler
http://arxiv.org/abs/2311.16877v1
Compressor summary: The authors propose a method that stacks the label with the input for imputation, improving the performance of classification models with missing data.
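The label-stacking idea can be illustrated with a class-conditional mean imputer: conditioning on the stacked label lets each missing value be filled from same-class statistics. The mean imputer is a stand-in assumption; the paper's actual imputation model is not specified in this summary.

```python
import numpy as np

def impute_with_label(X, y):
    """Fill NaNs in X using column means computed within each class,
    i.e. the stacked label conditions the imputation."""
    X = X.astype(float).copy()
    for cls in np.unique(y):
        rows = y == cls
        sub = X[rows]                    # copy of this class's rows
        means = np.nanmean(sub, axis=0)  # per-column class means
        r, c = np.where(np.isnan(sub))
        sub[r, c] = means[c]             # fill with the same-class mean
        X[rows] = sub
    return X

X_imp = impute_with_label(
    np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0], [5.0, 8.0]]),
    np.array([0, 0, 1, 1]))
```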
Oliver Urs Lenz,Henri Bollaert,Chris Cornelis
http://arxiv.org/abs/2311.16872v1
Compressor summary: The paper evaluates different methods for nearest neighbor classification using fuzzy logic and kernel functions, finding that some perform best with Boscovich distance and others with Yager negation.
Ebtesam Almazrouei,Hamza Alobeidli,Abdulaziz Alshamsi,Alessandro Cappelli,Ruxandra Cojocaru,Daniel Hesslow,Julien Launay,Quentin Malartic,Daniele Mazzotta,Badreddine Noune,Baptiste Pannier,Guilherme Penedo
http://arxiv.org/abs/2311.16867v1
Compressor summary: The Falcon series consists of causal decoder-only models of different sizes, pretrained on a large web dataset, that achieve high performance while being cost-effective.
Noëmi Aepli,Chantal Amrhein,Florian Schottmann,Rico Sennrich
http://arxiv.org/abs/2311.16865v1
Compressor summary: The paper evaluates how well evaluation metrics work for Swiss German dialects and proposes improvements to make them more robust.
Alexandra Sasha Luccioni,Yacine Jernite,Emma Strubell
http://arxiv.org/abs/2311.16863v1
Compressor summary: This paper compares the environmental cost of different types of machine learning systems, finding that multi-purpose generative models are much more energy-intensive than task-specific ones.
Noah Ford,Victor J. Leon,Honest Merman,Jeffrey Gilbert,Alexander New
http://arxiv.org/abs/2311.16860v1
Compressor summary: The paragraph discusses using SciML with Neural Basis Functions to improve predictions for high Mach fluid flows over irregular geometries when data is limited.
Wenzhong Yan,Juntao Wang,Feng Yin,Abdelhak M. Zoubir
http://arxiv.org/abs/2311.16856v1
Compressor summary: The paper proposes a novel GNN-based method for network localization that is stable, accurate, and robust to NLOS propagations, and introduces two attentional graph neural networks (AGNNs) that improve accuracy and flexibility by learning optimal hyperparameters.
Yufeng Zheng,Xueting Li,Koki Nagano,Sifei Liu,Otmar Hilliges,Shalini De Mello
http://arxiv.org/abs/2311.16854v1
Compressor summary: The authors propose Dream-in-4D, a novel two-stage method that leverages diffusion guidance to generate high-quality static and dynamic 3D scenes from text prompts, achieving significant improvements in quality and controllability compared to existing approaches.
Chen Zhao,Weiling Cai,Chenyu Dong,Chengwei Hu
http://arxiv.org/abs/2311.16845v1
Compressor summary: The paper introduces WF-Diff, a novel framework for enhancing underwater images using frequency domain information and diffusion models, which improves the visual quality of underwater images.
Lijun Sheng,Zhengbo Wang,Jian Liang
http://arxiv.org/abs/2311.16843v1
Compressor summary: The paper introduces a new domain adaptation benchmark called GeoNet with three challenges and presents a two-stage source-free method using Swin Transformer to achieve high performance in all challenges.
Julie Hunter,Jérôme Louradour,Virgile Rennard,Ismaïl Harrando,Guokan Shang,Jean-Pierre Lorré
http://arxiv.org/abs/2311.16840v1
Compressor summary: The Claire French Dialogue Dataset (CFDD) is a large corpus of diverse French texts released for developing multilingual language models, and this paper describes its composition, categories, and format.
Zhiyuan Zhao,Bin Wang,Linke Ouyang,Xiaoyi Dong,Jiaqi Wang,Conghui He
http://arxiv.org/abs/2311.16839v1
Compressor summary: The paper proposes HA-DPO, a novel approach to address the hallucination problem in multimodal language models by training them to prefer accurate responses over hallucinating ones, leading to improved accuracy and generalization.
Kunpeng Wang,Chenglong Li,Zhengzheng Tu,Bin Luo
http://arxiv.org/abs/2311.16835v1
Compressor summary: UniSOD is a framework that combines single-modal and multi-modal salient object detection using modality-aware prompts with task-specific hints, achieving consistent performance improvement on various datasets.
Qiqi Su,Christos Kloukinas,Artur d'Avila Garcez
http://arxiv.org/abs/2311.16834v1
Compressor summary: The paper presents an interpretable modular neural network model for multivariate time series prediction that combines a recurrent neural network with an attention-based feature selection component and achieves high performance comparable to non-interpretable methods.
Bernd Prach,Fabio Brau,Giorgio Buttazzo,Christoph H. Lampert
http://arxiv.org/abs/2311.16833v1
Compressor summary: This paper compares different methods for creating 1-Lipschitz neural networks, which are more robust against input perturbations, and provides guidelines to choose the best method depending on available resources.
Jinfeng Zhou,Zhuang Chen,Dazhen Wan,Bosi Wen,Yi Song,Jifan Yu,Yongkang Huang,Libiao Peng,Jiaming Yang,Xiyao Xiao,Sahand Sabour,Xiaohan Zhang,Wenjing Hou,Yijia Zhang,Yuxiao Dong,Jie Tang,Minlie Huang
http://arxiv.org/abs/2311.16832v1
Compressor summary: CharacterGLM is a series of large language models that can generate character-based dialogues for conversational AI systems, with better performance than most existing models in terms of consistency, human-likeness, and engagement.
Boris Meinardus,Mariusz Trzeciakiewicz,Tim Herzig,Monika Kwiatkowski,Simon Matern,Olaf Hellwich
http://arxiv.org/abs/2311.16829v1
Compressor summary: Decomposer is a semi-supervised model that uses 3D Swin-Transformers and 3D U-Nets to separate distorted images into their original components and the applied augmentations like shadows, lighting, and occlusions.
Xiaojing Zhong,Xinyi Huang,Zhonghua Wu,Guosheng Lin,Qingyao Wu
http://arxiv.org/abs/2311.16828v1
Compressor summary: SARA is a novel method for makeup transfer that can handle large spatial misalignments, preserve the source images' identities, and achieve part-specific and shade-controllable results using three modules: spatial alignment, region-adaptive normalization, and makeup fusion.
Martin Briesch,Dominik Sobania,Franz Rothlauf
http://arxiv.org/abs/2311.16822v1
Compressor summary: The study examines how large language models generate and consume content in a self-consuming loop, finding that this process improves quality and diversity at first but then decreases diversity over time.
Xiaojing Zhong,Yukun Su,Zhonghua Wu,Guosheng Lin,Qingyao Wu
http://arxiv.org/abs/2311.16818v1
Compressor summary: DI-Net is a new method for 3D virtual try-on that uses two modules to reconstruct a human mesh with accurate pose and texture preservation.
Yuqing Wen,Yucheng Zhao,Yingfei Liu,Fan Jia,Yanhui Wang,Chong Luo,Chi Zhang,Tiancai Wang,Xiaoyan Sun,Xiangyu Zhang
http://arxiv.org/abs/2311.16813v1
Compressor summary: Panacea is a method to create diverse, annotated videos for autonomous driving research that ensures consistency and controllability in driving scenarios.
Yaoquan Wei,Shunyu Liu,Jie Song,Tongya Zheng,Kaixuan Chen,Yong Wang,Mingli Song
http://arxiv.org/abs/2311.16807v1
Compressor summary: The proposed A7 framework uses state feature similarity, proxy models, and behavior cloning to efficiently advise agents in DRL without relying on specific agents or expert teachers.
Hongru Wang,Lingzhi Wang,Yiming Du,Liang Chen,Jingyan Zhou,Yufei Wang,Kam-Fai Wong
http://arxiv.org/abs/2311.16789v1
Compressor summary: The survey describes the four stages of dialogue system evolution, highlighting their dependence on language model advancements and discussing current challenges and future directions for LLM-based systems.
Vilém Zouhar,Věra Kloudová,Martin Popel,Ondřej Bojar
http://arxiv.org/abs/2311.16787v1
Compressor summary: The article introduces a method to create more reliable human reference translations for machine translation evaluation by raising the bar of human translation quality.
Christel Chappuis,Eliot Walt,Vincent Mendez,Sylvain Lobry,Bertrand Le Saux,Devis Tuia
http://arxiv.org/abs/2311.16782v1
Compressor summary: The text discusses language biases in remote sensing visual question answering (RSVQA), their impact on model performance and robustness, and the need for less-biased datasets and more informative evaluation metrics.
M. Ibsen,C. Rathgeb,S. Marcel,C. Busch
http://arxiv.org/abs/2311.16773v1
Compressor summary: The authors propose a new method for detecting synthetic face images using Cross Modal Focal Loss, which performs better than existing methods in cross-model experiments.
Anuj Srivastava,Karm Patel,Pradeep Shenoy,Devarajan Sridharan
http://arxiv.org/abs/2311.16766v1
Compressor summary: The paper discusses the problem of selective classification during automated diagnosis with domain-shifted medical images and how current approaches fail to handle uncertainty in such cases, leading to poor performance and a need for human intervention.
Amos Calamida,Farhad Nooralahzadeh,Morteza Rohanian,Koji Fujimoto,Mizuho Nishio,Michael Krauthammer
http://arxiv.org/abs/2311.16764v1
Compressor summary: The authors propose a new evaluation metric for machine-generated radiology reports using COMET architecture, train models, and show that their metric correlates well with existing metrics and human judgment.
Senkang Hu,Zhengru Fang,Xianhao Chen,Yuguang Fang,Sam Kwong
http://arxiv.org/abs/2311.16754v1
Compressor summary: The paragraph discusses a framework for improving collaborative perception in autonomous vehicles by addressing domain shifts and data heterogeneity using Amplitude Augmentation and meta-consistency training, as well as an intra-system domain alignment mechanism during inference.
Seungwoo Yoo,Kunho Kim,Vladimir G. Kim,Minhyuk Sung
http://arxiv.org/abs/2311.16739v1
Compressor summary: APAP is a mesh deformation technique that uses 2D diffusion models and user input to create realistic and plausible edits of 2D and 3D meshes.
Rui Wang,Xiao-Jun Wu,Hui Li,Josef Kittler
http://arxiv.org/abs/2311.16738v1
Compressor summary: The paper proposes an SPD manifold self-attention mechanism (SMSA) for learning features in scientific areas, which uses geometric operations such as the Riemannian metric and Riemannian optimization to capture dependencies of data on a curved Riemannian manifold.
Jiajun Huang,Hongchuan Yu
http://arxiv.org/abs/2311.16737v1
Compressor summary: Point'n Move is an interactive scene manipulation method that uses Gaussian Splatting Radiance Field for real-time editing and inpainting of selected objects in scenes.
Huajian Huang,Longwei Li,Hui Cheng,Sai-Kit Yeung
http://arxiv.org/abs/2311.16728v1
Compressor summary: Photo-SLAM is a novel SLAM framework that uses explicit geometric features and implicit photometric features for efficient online photorealistic mapping on portable devices.
Yijun Yang,Tianyi Zhou,Kanxue Li,Dapeng Tao,Lusong Li,Li Shen,Xiaodong He,Jing Jiang,Yuhui Shi
http://arxiv.org/abs/2311.16714v1
Compressor summary: The paper presents EMMA, an embodied multi-modal agent that adapts quickly to a visual world by distilling knowledge from a large language model in a parallel text world.
Manuel Brack,Felix Friedrich,Katharina Kornmeier,Linoy Tsaban,Patrick Schramowski,Kristian Kersting,Apolinário Passos
http://arxiv.org/abs/2311.16711v1
Compressor summary: LEDITS++ is a text-based image editing technique that is efficient, versatile, and precise, supporting multiple edits without fine-tuning or optimization.
Mohammad Reza Karimi,Ya-Ping Hsieh,Andreas Krause
http://arxiv.org/abs/2311.16706v1
Compressor summary: The text discusses entropy-regularized optimal transport problems in machine learning, introducing a continuous-time version of the Sinkhorn algorithm that can handle noise and bias and connects to other related dynamics.
Haocheng Yuan,Jing Xu,Hao Pan,Adrien Bousseau,Niloy Mitra,Changjian Li
http://arxiv.org/abs/2311.16703v1
Compressor summary: The authors propose a method to semantically comment on CAD programs by parsing them, generating shapes and images, and using visual-semantic analysis to assign labels to code blocks.
Vandan Gorade,Sparsh Mittal,Debesh Jha,Ulas Bagci
http://arxiv.org/abs/2311.16700v1
Compressor summary: HLFD is a novel method that improves knowledge distillation for medical imaging tasks by strategically transferring knowledge from middle to earlier layers and vice versa, leading to better focus on tumor-specific features and improved performance.
Jixiao Zhang,Yongkang Li,Ruotong Zou,Jingyuan Zhang,Zipei Fan,Xuan Song
http://arxiv.org/abs/2311.16683v1
Compressor summary: The paper proposes a new model, HKGNN, for POI recommendation in LBSN that considers hyper-relations, structural information, and side information to address data sparsity and improve recommendations.
Jiawei Wang,Changjian Li
http://arxiv.org/abs/2311.16682v1
Compressor summary: ContextSeg is a novel method that uses an autoencoder with dense distance fields and a Transformer with group-based labeling to achieve state-of-the-art semantic segmentation of strokes in computer vision.
Maximilian Dreyer,Reduan Achtibat,Wojciech Samek,Sebastian Lapuschkin
http://arxiv.org/abs/2311.16681v1
Compressor summary: The authors propose a novel XAI framework that uses prototypes to convey both local and global decision-making strategies of DNNs, enabling better understanding, model validation, and detection of outlier behavior.
Dan Ma,Jun Xu,Zongyu Wang,Xuezhi Cao,Yunsen Xian
http://arxiv.org/abs/2311.16678v1
Compressor summary: The paper introduces a new task (EASQE) for aspect-based sentiment analysis that decomposes aspect terms into entities and aspects, and proposes a baseline method (Trigger-Opinion) that outperforms existing approaches.
Gioele Cadamuro,Marco Gruppo
http://arxiv.org/abs/2311.16675v1
Compressor summary: The authors propose a siamese neural network to solve a semantic textual similarity problem with highly specific information, using a threshold to distinguish similar from dissimilar pairs and combining distributional and distance-based features to score predictions.
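The siamese similarity-with-threshold decision in this summary can be sketched as follows: embed both texts with a shared-weight encoder, score the pair, and threshold. The stand-in embeddings, the cosine score, and the 0.5 threshold are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify_pair(emb_a, emb_b, threshold=0.5):
    """Siamese-style decision: both texts pass through the same
    (shared-weight) encoder; the pair is 'similar' above the threshold."""
    return cosine(emb_a, emb_b) >= threshold

# Stand-in embeddings; a real system would produce these with the
# shared encoder.
similar = classify_pair(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
dissimilar = classify_pair(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```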
Raby Hamadi
http://arxiv.org/abs/2311.16673v1
Compressor summary: The paper surveys the latest advancements in transformers and their impact on computer vision and large language models, comparing different models and datasets, and suggesting future directions for research.
Jesus Zarzar,Bernard Ghanem
http://arxiv.org/abs/2311.16671v1
Compressor summary: The novel approach digitizes real-world objects by estimating their geometry, material properties, and lighting from posed images with fixed lighting using Neural Radiance Fields and image-based lighting.
Xiyuan Wang,Muhan Zhang
http://arxiv.org/abs/2311.16670v1
Compressor summary: PyTorch Geometric High Order (PyGHO) is a library that simplifies the implementation of high-order graph neural networks and improves their performance on real-world tasks.
Laura Fink,Darius Rückert,Linus Franke,Joachim Keinert,Marc Stamminger
http://arxiv.org/abs/2311.16668v1
Compressor summary: LiveNVS is a system that enables real-time, high-quality neural novel view synthesis for live RGB-D input using projected neural features and a generalizable neural network.
Zhuoyuan Wang,Jiacong Mi,Shan Lu,Jieyue He
http://arxiv.org/abs/2311.16666v1
Compressor summary: MolIG is a novel framework that uses both image and graph structures to predict drug molecule properties, outperforming existing models.
Zhuopeng Li,Chenming Wu,Liangjun Zhang,Jianke Zhu
http://arxiv.org/abs/2311.16664v1
Compressor summary: The paper proposes a novel method called Density-Guided Neural Rendering (DGNR) that learns a density space from scenes to guide the construction of a point-based renderer for large-scale driving scenes, eliminating the need for geometric priors and achieving photorealistic and efficient rendering.
Yu Chen,Gim Hee Lee
http://arxiv.org/abs/2311.16657v1
Compressor summary: SCALAR-NeRF is a framework that uses an encoder-decoder architecture and a coarse-to-fine strategy to reconstruct large-scale scenes efficiently and effectively, outperforming existing NeRF methods.
Theo Gruner,Boris Belousov,Fabio Muratore,Daniel Palenicek,Jan Peters
http://arxiv.org/abs/2311.16656v1
Compressor summary: Pseudo-Likelihood Inference (PLI) is a new method that improves Simulation-Based Inference by combining neural approximation with likelihood kernel, making it better at handling challenging Bayesian system identification tasks on higher dimensions and dynamic systems.
Aida Brankovic,Wenjie Huang,David Cook,Sankalp Khanna,Konstanty Bialkowski
http://arxiv.org/abs/2311.16654v1
Compressor summary: The study evaluates how well explainable AI methods align with expert medical knowledge in clinical decision support algorithms for EMR data, identifying discrepancies and suggesting ways to improve trustworthiness.
Zhantao Chen,Cong Wang,Mingye Gao,Chun Hong Yoon,Jana B. Thayer,Joshua J. Turner
http://arxiv.org/abs/2311.16652v1
Compressor summary: The authors present a machine learning approach that improves Single Particle Imaging (SPI) with X-ray Free Electron Lasers (XFELs) by estimating particle orientations and reciprocal space intensities from diffraction images only.
Jiahuan Yan,Haojun Gao,Zhang Kai,Weize Liu,Danny Chen,Jian Wu,Jintai Chen
http://arxiv.org/abs/2311.16650v1
Compressor summary: The paper proposes Text2Tree, a novel algorithm that uses internal label hierarchy in training deep learning models for medical text classification, improving performance on imbalanced and scarce data.
Ming-Yu Chung,Sheng-Yen Chou,Chia-Mu Yu,Pin-Yu Chen,Sy-Yen Kuo,Tsung-Yi Ho
http://arxiv.org/abs/2311.16646v1
Compressor summary: The study presents new trigger pattern generation methods for dataset distillation, which enable effective and hard-to-detect backdoor attacks.
Gaël Le Mens,Aina Gallego
http://arxiv.org/abs/2311.16639v1
Compressor summary: The text describes how GPT-4 can estimate the positions of political texts on various dimensions accurately, quickly, and cheaply, comparing its performance with other methods.
Jian Yu,Yi Yu,Feipeng Da
http://arxiv.org/abs/2311.16637v1
Compressor summary: The paper presents a new method for stitching large parallax images using epipolar geometry, which reduces alignment artifacts and maintains projectivity.
Sitong Su,Litao Guo,Lianli Gao,Hengtao Shen,Jingkuan Song
http://arxiv.org/abs/2311.16635v1
Compressor summary: MotionZero is a method for generating videos from text prompts without using motion information, exploiting the implied motion priors in the prompts to accurately and independently control the motion of different objects.
Takuma Nakamura,Yuki Saito,Ryosuke Goto
http://arxiv.org/abs/2311.16630v1
Compressor summary: The paper introduces a new framework for outfit completion using deep neural networks and a conditional set transformation architecture that improves accuracy and scalability.
Clara Stoddart,Lauren Shrack,Richard Sserunjogi,Usman Abdul-Ganiy,Engineer Bainomugisha,Deo Okure,Ruth Misener,Jose Pablo Folch,Ruby Sedgwick
http://arxiv.org/abs/2311.16625v1
Compressor summary: The paper explores using Gaussian Processes to nowcast and forecast air pollution in Kampala, Uganda, where sensor coverage is limited.
Itamar Zimerman,Lior Wolf
http://arxiv.org/abs/2311.16620v1
Compressor summary: The authors propose modifications to the transformer architecture inspired by long-range layers, improving its performance on the Long Range Arena benchmark while maintaining simplicity and minimal additional computation.
Jiepan Li,Fangxiao Lu,Nan Xue,Zhuohong Li,Hongyan Zhang,Wei He
http://arxiv.org/abs/2311.16618v1
Compressor summary: The paper introduces a new method called OWinCA that enhances low-level features for detecting camouflaged objects using cross-level attention and an overlapped window partition strategy, achieving better results than existing methods.
Stefan Schrod,Fabian Sinz,Michael Altenbuchinger
http://arxiv.org/abs/2311.16616v1
Compressor summary: ADBCR is a machine learning method for counterfactual reasoning that uses potential outcome estimates to remove spurious causal relations and performs well on benchmark datasets, especially when using unlabeled validation data.
Jintang Li,Jiawang Dan,Ruofan Wu,Jing Zhou,Sheng Tian,Yunfei Liu,Baokun Wang,Changhua Meng,Weiqiang Wang,Yuchang Zhu,Liang Chen,Zibin Zheng
http://arxiv.org/abs/2311.16605v1
Compressor summary: LasTGL is an industrial framework that integrates implementations of common temporal graph learning algorithms to facilitate research and application in this emerging field.
Daeun Lee,Minhyeok Heo,Jiwon Kim
http://arxiv.org/abs/2311.16589v1
Compressor summary: The paper proposes a new framework for improving lane detection algorithms by using HD maps and generative models to increase data diversity without expanding the data volume.
Rui Yang,Qingcheng Zeng,Keen You,Yujie Qiao,Lucas Huang,Chia-Chun Hsieh,Benjamin Rosand,Jeremy Goldwasser,Amisha D Dave,Tiarnan D. L. Keenan,Emily Y Chew,Dragomir Radev,Zhiyong Lu,Hua Xu,Qingyu Chen,Irene Li
http://arxiv.org/abs/2311.16588v1
Compressor summary: The MedGen NLP toolkit offers easy-to-use generative and basic NLP functions for biomedical researchers and healthcare professionals, with fine-tuned domain models and public availability.
Zicheng Wang,Zhen Zhao,Erjian Guo,Luping Zhou
http://arxiv.org/abs/2311.16580v1
Compressor summary: The authors propose a class-balanced sampling strategy and a noisy feature-aided clean label disentangling framework to address the noisy label issue in medical image segmentation, achieving state-of-the-art performance.
Xinhong Chen,Zongxi Li,Yaowei Wang,Haoran Xie,Jianping Wang,Qing Li
http://arxiv.org/abs/2311.16579v1
Compressor summary: The paper introduces a new task that identifies valid causal relationships between emotions and causes in texts, taking into account specific context clauses, and proposes a multi-task framework to handle this task.
AprilPyone MaungMaung,Isao Echizen,Hitoshi Kiya
http://arxiv.org/abs/2311.16577v1
Compressor summary: The paper introduces key-based defense model proliferation using pre-trained models and efficient fine-tuning techniques for on-device image classification, improving accuracy by more than 10%.
Yang Zhao,Yanwu Xu,Zhisheng Xiao,Tingbo Hou
http://arxiv.org/abs/2311.16567v1
Compressor summary: The paper introduces MobileDiffusion, an efficient text-to-image diffusion model with reduced size and fast inference speed, achieved through architecture optimization and sampling techniques.
Peng Chen,Xiaobao Wei,Ming Lu,Yitong Zhu,Naiming Yao,Xingyu Xiao,Hui Chen
http://arxiv.org/abs/2311.16565v1
Compressor summary: The proposed DiffusionTalker method uses contrastive learning and knowledge distillation to personalize and speed up 3D facial animation based on speech input, overcoming limitations of existing diffusion-based approaches.
Xingyu Zhao,Yuexuan An,Lei Qi,Xin Geng
http://arxiv.org/abs/2311.16556v1
Compressor summary: SLDL is a novel multi-label classification method that uses continuous distributions in a low-dimensional latent space to model asymmetric label correlations, reducing computational complexity and achieving competitive performance.
Ling Fu,Zijie Wu,Yingying Zhu,Yuliang Liu,Xiang Bai
http://arxiv.org/abs/2311.16555v1
Compressor summary: DiffText is a new method that uses a diffusion model to create realistic synthetic text images with fewer spelling errors and better background integration, improving scene text detection performance.
Shutong Zhang,Yi-Ling Qiao,Guanglei Zhu,Eric Heiden,Dylan Turpin,Jingzhou Liu,Ming Lin,Miles Macklin,Animesh Garg
http://arxiv.org/abs/2311.16552v1
Compressor summary: HandyPriors is a unified and general pipeline for pose estimation in human-object interaction scenes using differentiable physics and rendering priors, with two alternatives for hand and object pose estimation that achieve comparable or superior results and can be used for robotic manipulation and perception tasks.
Owen Howell,Haoen Huang,David Rosen
http://arxiv.org/abs/2311.16544v1
Compressor summary: The authors propose a convex spectral relaxation method for estimating unknown orientations in robotics and computer vision, which has advantages over prior methods and provides performance guarantees under specific noise assumptions.
Siyu Xing,Jie Cao,Huaibo Huang,Xiao-Yu Zhang,Ran He
http://arxiv.org/abs/2311.16507v1
Compressor summary: StraightFM is a novel flow matching method that straightens trajectories using diffusion models and real data, resulting in higher quality images with fewer sampling steps.
Yi Zheng,Chongyang Ma,Kanle Shi,Haibin Huang
http://arxiv.org/abs/2311.16542v1
Compressor summary: The OKR-Agent framework enhances Large Language Models' task-solving abilities by using self-collaboration, self-correction, and hierarchical agents to improve domain knowledge, reasoning, and execution structure.
Ray Zirui Zhang,Ivan Ezhov,Michal Balcerak,Andy Zhu,Benedikt Wiestler,Bjoern Menze,John Lowengrub
http://arxiv.org/abs/2311.16536v1
Compressor summary: This paper proposes a method that uses Physics-Informed Neural Networks to estimate patient-specific parameters of a model of Glioblastoma growth from a single MRI scan, which could help in designing personalized radiotherapy treatment plans.
Xiangguo Sun,Jiawen Zhang,Xixi Wu,Hong Cheng,Yun Xiong,Jia Li
http://arxiv.org/abs/2311.16534v1
Compressor summary: The paper surveys the emerging domain of graph prompts in AGI, proposing a unified framework for graph prompt learning, categorizing over 100 works in the field, presenting ProG (a Python library and website) to support research in graph prompting, and discussing current challenges and a roadmap for future directions.
Runzhi Tian,Yongyi Mao
http://arxiv.org/abs/2311.16526v1
Compressor summary: The paper investigates robust overfitting in adversarial training, relating it to perturbation-induced distributions, and proposes a new upper bound on generalization error based on local dispersion.
Sihwa Park,Seongjun Kim,In-Seok Song,Seung Jun Baek
http://arxiv.org/abs/2311.16524v1
Compressor summary: Occudent is a novel framework that uses neural implicit functions to reconstruct 3D teeth shapes from panoramic radiographs, outperforming existing methods.
Hao Pei,Si Lin,Chuanfu Li,Che Wang,Haoming Chen,Sizhe Li
http://arxiv.org/abs/2311.16522v1
Compressor summary: The paper describes a new graph neural network method that detects and analyzes faults in power grids with high accuracy and yields insightful results.
Zhihao Kong,Amirhossein Mollaali,Christian Moya,Na Lu,Guang Lin
http://arxiv.org/abs/2311.16519v1
Compressor summary: B-LSTM-MIONet is a redesigned framework that combines MIONet, LSTM, and Bayesian methods to learn neural operators from time-dependent data, handling variable-length real-time data and providing uncertainty quantification for complex systems modeling.
Kazuki Yamauchi,Yusuke Ijima,Yuki Saito
http://arxiv.org/abs/2311.16509v1
Compressor summary: StyleCap is a method to generate natural language descriptions of speaking styles in speech using neural networks, paired data, and large language models.
Zizhao Hu,Shaochong Jia,Mohammad Rostami
http://arxiv.org/abs/2311.16488v1
Compressor summary: The paper introduces PS-U-Net, an efficient multimodal diffusion model that preserves modality-specific details and a new multimodal sampling method for conditional generation of text and image data with higher quality.
Zixiang Zhou,Yu Wan,Baoyuan Wang
http://arxiv.org/abs/2311.16471v1
Compressor summary: The paper presents a scalable method to generate multimodal and multi-part human motion using codebooks and pre-trained models.
Zixiang Zhou,Yu Wan,Baoyuan Wang
http://arxiv.org/abs/2311.16468v1
Compressor summary: AvatarGPT is an all-in-one framework that uses a large language model to perform various motion-related tasks, such as understanding, planning, and generating human motions, by treating each task as an instruction fine-tuned on the shared model.
Jingye Chen,Yupan Huang,Tengchao Lv,Lei Cui,Qifeng Chen,Furu Wei
http://arxiv.org/abs/2311.16465v1
Compressor summary: TextDiffuser-2 is a method that uses a large language model to improve the flexibility, automation, and style diversity of text rendering in diffusion models.
Yicheng Xiao,Zhuoyan Luo,Yong Liu,Yue Ma,Hengwei Bian,Yatai Ji,Yujiu Yang,Xiu Li
http://arxiv.org/abs/2311.16464v1
Compressor summary: UVCOM is a framework that effectively combines Video Moment Retrieval and Highlight Detection tasks by integrating multi-granularity, intra and inter-modality, and multi-aspect contrastive learning.
Jie Li,Zhixin Li,Zhi Liu,Pengyuan Zhou,Richang Hong,Qiyue Li,Han Hu
http://arxiv.org/abs/2311.16462v1
Compressor summary: The paper proposes a novel method for improving viewport prediction in volumetric video streaming using saliency detection, trajectory information, and a new sampling technique.
Gourav Datta,Zeyu Liu,Anni Li,Peter A. Beerel
http://arxiv.org/abs/2311.16456v1
Compressor summary: The paper proposes a new training framework that adapts the number of time steps for each module in vision transformers, resulting in energy-efficient spiking neural networks for image recognition tasks.
Harsha Nori,Yin Tat Lee,Sheng Zhang,Dean Carignan,Richard Edgar,Nicolo Fusi,Nicholas King,Jonathan Larson,Yuanzhi Li,Weishung Liu,Renqian Luo,Scott Mayer McKinney,Robert Osazuwa Ness,Hoifung Poon,Tao Qin,Naoto Usuyama,Chris White,Eric Horvitz
http://arxiv.org/abs/2311.16452v1
Compressor summary: The authors explore prompt engineering techniques with GPT-4 to unlock its specialist capabilities in various domains, achieving state-of-the-art results on medical benchmarks and outperforming specialist models.
Huanxin Chen,Pengshuai Yin,Huichou Huang,Qingyao Wu,Ruirui Liu,Xiatian Zhu
http://arxiv.org/abs/2311.16450v1
Compressor summary: The Typhoon Intensity Transformer (Tint) uses self-attention mechanisms to capture local and global contextual relations in satellite images, improving typhoon intensity prediction accuracy.
Hanyuan Wang,Majid Mirmehdi,Dima Damen,Toby Perrett
http://arxiv.org/abs/2311.16446v1
Compressor summary: The paper proposes a novel method to improve one-stage action detection by fusing visual and audio modalities, using multi-scale cross-attention and a centricity score that estimates the proximity of timesteps to the action centre, achieving state-of-the-art performance on EPIC-Kitchens-100.
Yichao Cai,Yuhang Liu,Zhen Zhang,Javen Qinfeng Shi
http://arxiv.org/abs/2311.16445v1
Compressor summary: The study proposes a method to improve vision-language models' resilience against perturbations by modifying text data's style while preserving its content, without retraining the image encoder on adversarial examples.
Takehiko Ohkawa,Takuma Yagi,Taichi Nishimura,Ryosuke Furuta,Atsushi Hashimoto,Yoshitaka Ushiku,Yoichi Sato
http://arxiv.org/abs/2311.16444v1
Compressor summary: The paper introduces a novel benchmark for transferring knowledge from exocentric web videos to dense video captioning of egocentric videos using adversarial training, addressing the challenge of dynamic view changes between these two domains.
Jinhao Li,Shiyao Li,Jiaming Xu,Shan Huang,Yaoxiu Lian,Jun Liu,Yu Wang,Guohao Dai
http://arxiv.org/abs/2311.16442v1
Compressor summary: The paper proposes techniques to improve the accuracy and efficiency of large language models by adjusting the bit width of quantization and optimizing dequantization operations on GPUs.
Yuanze Lin,Yi-Wen Chen,Yi-Hsuan Tsai,Lu Jiang,Ming-Hsuan Yang
http://arxiv.org/abs/2311.16432v1
Compressor summary: The paper presents a text-to-image editing method that uses bounding boxes to find edit regions based on textual prompts, achieving high fidelity and realism with complex prompts.
Yutong He,Naoki Murata,Chieh-Hsin Lai,Yuhta Takida,Toshimitsu Uesaka,Dongjun Kim,Wei-Hsiang Liao,Yuki Mitsufuji,J. Zico Kolter,Ruslan Salakhutdinov,Stefano Ermon
http://arxiv.org/abs/2311.16424v1
Compressor summary: MPGD is a training-free method for conditional image generation that uses diffusion models, neural networks, and pretrained autoencoders, achieving significant speed-ups while maintaining high sample quality.
Yuhang Wang,Yanxu Zhu,Chao Kong,Shuyu Wei,Xiaoyuan Yi,Xing Xie,Jitao Sang
http://arxiv.org/abs/2311.16421v1
Compressor summary: The paper introduces CDEval, a new benchmark to evaluate cultural dimensions of Large Language Models (LLMs), emphasizing the importance of cultural considerations in their development and applications.
YiFan Zhang,Xue Wang,Tian Zhou,Kun Yuan,Zhang Zhang,Liang Wang,Rong Jin,Tieniu Tan
http://arxiv.org/abs/2311.16420v1
Compressor summary: The Non-Parametric Test Time Adaptation framework for Out-Of-Distribution Detection (NPTTA) is a method that adapts to changing data distributions during testing and uses detected out-of-distribution samples to improve reliability, achieving better performance than existing methods.
Gonzalo Uribarri,Simon Ekman von Huth,Josefine Waldthaler,Per Svenningsson,Erik Fransén
http://arxiv.org/abs/2311.16381v1
Compressor summary: The authors investigate using deep learning algorithms to classify Parkinson's disease from eye-tracking data during saccade experiments, achieving high accuracy with raw fixation interval inputs.