This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-06, generated by the compressor, my personal LLM-based project.
Rundi Wu,Ben Mildenhall,Philipp Henzler,Keunhong Park,Ruiqi Gao,Daniel Watson,Pratul P. Srinivasan,Dor Verbin,Jonathan T. Barron,Ben Poole,Aleksander Holynski
http://arxiv.org/abs/2312.02981v1
Compressor summary: ReconFusion is a method that uses a diffusion prior to reconstruct 3D scenes from few input images with realistic geometry and texture.
Zhangyang Qi,Ye Fang,Zeyi Sun,Xiaoyang Wu,Tong Wu,Jiaqi Wang,Dahua Lin,Hengshuang Zhao
http://arxiv.org/abs/2312.02980v1
Compressor summary: GPT4Point is a new model that improves 3D object understanding and generation using point-text features and Pyramid-XL, a large dataset annotation engine.
Lisa Dunlap,Yuhui Zhang,Xiaohan Wang,Ruiqi Zhong,Trevor Darrell,Jacob Steinhardt,Joseph E. Gonzalez,Serena Yeung-Levy
http://arxiv.org/abs/2312.02974v1
Compressor summary: The text describes a method called Set Difference Captioning that automatically generates descriptions of differences between two sets of images using a two-stage approach and a dataset called VisDiffBench.
Shoukang Hu,Ziwei Liu
http://arxiv.org/abs/2312.02973v1
Compressor summary: GauHuman is a fast 3D human model that uses Gaussian Splatting for training and rendering, achieving state-of-the-art performance without compromising quality.
Prafull Sharma,Varun Jampani,Yuanzhen Li,Xuhui Jia,Dmitry Lagun,Fredo Durand,William T. Freeman,Mark Matthews
http://arxiv.org/abs/2312.02970v1
Compressor summary: The proposed method uses text-to-image models and a synthetic dataset with controlled material properties to edit object attributes like roughness, metallic, albedo, and transparency in real images while preserving other features.
Xinyu Zhang,Sebastian Hofstätter,Patrick Lewis,Raphael Tang,Jimmy Lin
http://arxiv.org/abs/2312.02969v1
Compressor summary: The authors propose a new listwise reranker that does not depend on GPT models and outperforms existing ones in passage retrieval experiments, highlighting the need for better listwise ranking data.
Boheng Zhao,Rana Hanocka,Raymond A. Yeh
http://arxiv.org/abs/2312.02967v1
Compressor summary: The paper proposes a new method to generate ambigrams using a large-scale vision and language diffusion model that improves legibility and accuracy.
Cheng-Ju Ho,Chen-Hsuan Tai,Yen-Yu Lin,Ming-Hsuan Yang,Yi-Hsuan Tsai
http://arxiv.org/abs/2312.02966v1
Compressor summary: The paper proposes Diffusion-SS3D, a method that uses a diffusion model to improve pseudo-label generation and semi-supervised object detection in 3D scenes by incorporating noise and denoising it.
Zhangyang Xiong,Chenghong Li,Kenkun Liu,Hongjie Liao,Jianqiao Hu,Junyi Zhu,Shuliang Ning,Lingteng Qiu,Chongjie Wang,Shijie Wang,Shuguang Cui,Xiaoguang Han
http://arxiv.org/abs/2312.02963v1
Compressor summary: MVHumanNet is a large-scale 3D human dataset that enables progress in various visual tasks, addressing the lack of high-quality human data for 3D vision research.
Akshat Jindal,Shreya Singh,Soham Gadgil
http://arxiv.org/abs/2312.02957v1
Compressor summary: The paper explores and proposes solutions for the geographical biases in image classification models using two datasets and various mitigation methods.
Hao Zhang,Hongyang Li,Feng Li,Tianhe Ren,Xueyan Zou,Shilong Liu,Shijia Huang,Jianfeng Gao,Lei Zhang,Chunyuan Li,Jianwei Yang
http://arxiv.org/abs/2312.02949v1
Compressor summary: The authors propose a new dataset for grounded visual chat (GVC), a benchmark called Grounding-Bench, and a model that combines segmentation and language models to improve GVC capabilities.
Yao Teng,Enze Xie,Yue Wu,Haoyu Han,Zhenguo Li,Xihui Liu
http://arxiv.org/abs/2312.02936v1
Compressor summary: The paper presents Drag-A-Video, a diffusion-based method for interactive point-based video manipulation that allows users to modify the contents of videos by dragging points and masks across frames.
Jiachen Lu,Ze Huang,Jiahui Zhang,Zeyu Yang,Li Zhang
http://arxiv.org/abs/2312.02934v1
Compressor summary: WoVoGen is a system that generates high-quality and diverse street-view videos for autonomous driving datasets by using a 4D world volume and sensor interconnectivity.
Lukas Wolf,Klemen Kotar,Greta Tuckute,Eghbal Hosseini,Tamar Regev,Ethan Wilcox,Alex Warstadt
http://arxiv.org/abs/2312.02931v1
Compressor summary: The paper introduces Whisbert, a multimodal language model that combines text and audio, but finds that it does not improve over the text-only version in terms of optimization and performance.
Xi Chen,Zhiheng Liu,Mengting Chen,Yutong Feng,Yu Liu,Yujun Shen,Hengshuang Zhao
http://arxiv.org/abs/2312.02928v1
Compressor summary: LivePhoto is a system that uses text to animate images with temporal motions and allows users to adjust the intensity of the motion.
Qizhe Zhang,Bocheng Zou,Ruichuan An,Jiaming Liu,Shanghang Zhang
http://arxiv.org/abs/2312.02923v1
Compressor summary: MoSA is a new Adapter Tuning method that splits adapters into modules, stochastically activates them for sparse training, and merges them after tuning to achieve better performance than standard methods without extra overhead.
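The split-activate-merge pattern this summary describes can be illustrated with a generic sketch (purely illustrative, not MoSA's actual implementation; `stochastic_adapter_step` and the plain-list weight representation are assumptions): a random subset of adapter modules is updated at each step, and all modules are averaged into one adapter after tuning.

```python
import random

def stochastic_adapter_step(adapter_modules, grads, lr=0.1,
                            active_fraction=0.5, rng=None):
    """Sparse training step: update only a random subset of adapter modules."""
    rng = rng or random.Random()
    k = max(1, int(len(adapter_modules) * active_fraction))
    active = rng.sample(range(len(adapter_modules)), k)
    for i in active:
        adapter_modules[i] = [w - lr * g
                              for w, g in zip(adapter_modules[i], grads[i])]
    return active  # indices that were trained this step

def merge_adapters(adapter_modules):
    """After tuning, merge the modules into a single adapter by averaging."""
    n = len(adapter_modules)
    return [sum(ws) / n for ws in zip(*adapter_modules)]
```

Because the merged adapter has the same shape as a standard adapter, inference incurs no extra overhead relative to ordinary adapter tuning.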
Hsin-Ping Huang,Yu-Chuan Su,Deqing Sun,Lu Jiang,Xuhui Jia,Yukun Zhu,Ming-Hsuan Yang
http://arxiv.org/abs/2312.02919v1
Compressor summary: FACTOR is a text-to-video generation method that allows detailed control of objects' appearances, context, location, and category by injecting control signals into the existing model using joint encoder and adaptive cross-attention layers.
Yuang Ai,Huaibo Huang,Xiaoqiang Zhou,Jiexiang Wang,Ran He
http://arxiv.org/abs/2312.02918v1
Compressor summary: MPerceiver is a multimodal prompt learning approach that leverages Stable Diffusion priors to improve adaptiveness, generalizability, and fidelity for all-in-one image restoration, outperforming state-of-the-art methods in multiple tasks.
Jacopo Bonato,Francesco Pelosin,Luigi Sabetta,Alessandro Nicolosi
http://arxiv.org/abs/2312.02916v1
Compressor summary: MIND is a method that improves replay-free learning in dynamic data streams, achieving state-of-the-art results on several benchmarks with significant accuracy gains.
Arun Reddy,William Paul,Corban Rivera,Ketul Shah,Celso M. de Melo,Rama Chellappa
http://arxiv.org/abs/2312.02914v1
Compressor summary: UNITE is a method that adapts a video student model to a new domain using an image teacher model with self-supervised pre-training and self-training.
Zahra Abbasiantaeb,Yifei Yuan,Evangelos Kanoulas,Mohammad Aliannejadi
http://arxiv.org/abs/2312.02913v1
Compressor summary: The proposed framework simulates human-like conversations for question-answering systems using GPT-4 as both student and teacher, evaluating their performance and comparing them to human-generated conversations.
Helisa Dhamo,Yinyu Nie,Arthur Moreau,Jifei Song,Richard Shaw,Yiren Zhou,Eduardo Pérez-Pellitero
http://arxiv.org/abs/2312.02902v1
Compressor summary: HeadGaS is a hybrid model that uses 3D Gaussian Splats and learnable latent features for fast and high-quality 3D head reconstruction and animation.
Cristiano Mesquita Garcia,Ramon Simoes Abilio,Alessandro Lameiras Koerich,Alceu de Souza Britto Jr.,Jean Paul Barddal
http://arxiv.org/abs/2312.02901v1
Compressor summary: The paragraph discusses how researchers are working on discovering patterns in textual data from social media and other sources, and the challenges they face due to concept drift and outdated datasets and models.
Rizhao Cai,Zirui Song,Dayan Guan,Zhenhao Chen,Xing Luo,Chenyu Yi,Alex Kot
http://arxiv.org/abs/2312.02896v1
Compressor summary: The paper introduces BenchLMM, a benchmark to evaluate how well large multimodal models (LMMs) can handle different image styles, and suggests a method to improve their performance by having them predict the style first.
Dongkeun Kim,Youngkil Song,Minsu Cho,Suha Kwak
http://arxiv.org/abs/2312.02878v1
Compressor summary: The authors introduce a new large-scale dataset (Café) for group activity detection (GAD) in videos, along with a new model that handles unknown groups and members better than previous approaches.
Yang Ai,Xi Yang
http://arxiv.org/abs/2312.02877v1
Compressor summary: The paper proposes a dynamic, iterative point cloud registration method that uses deep global sampling and local registration to remove noisy points, improving efficiency by over 40% on two datasets.
Lukas Schulze Balhorn,Marc Caballero,Artur M. Schweidtmann
http://arxiv.org/abs/2312.02873v1
Compressor summary: The authors propose using generative AI and large language models to automatically correct errors in process flow diagrams and instrumentation diagrams, which could improve safety, efficiency, and cost savings in the process engineering domain.
Angie Nataly Melo,Carlota Salinas,Miguel Angel Sotelo
http://arxiv.org/abs/2312.02872v1
Compressor summary: The research proposes an explainable and interpretable pedestrian crossing prediction method using deep learning and fuzzy logic.
Danyal Rehman,John H. Lienhard
http://arxiv.org/abs/2312.02871v1
Compressor summary: The authors propose a machine learning approach to model ion transport in nanoporous membranes, using attention-enhanced neural differential equations that incorporate electroneutrality biases and outperform conventional PDE-based methods.
Gaëtan Frusque,Ismail Nejjar,Majid Nabavi,Olga Fink
http://arxiv.org/abs/2312.02867v1
Compressor summary: The authors propose a semi-supervised method to construct a Health Index (HI) for system health evaluation using run-to-failure datasets and deep learning, addressing interpretability and sensitivity issues, and applying it to monitor wear states of thermal spray coatings.
Alexandra Zytek,Wei-En Wang,Sofia Koukoura,Kalyan Veeramachaneni
http://arxiv.org/abs/2312.02859v1
Compressor summary: The paragraph discusses three key lessons learned from deploying usable machine learning in real-world domains and how they can be applied to wind turbine monitoring for decision-making in renewable energy.
Julien Boussard,Chandni Nagda,Julia Kaltenborn,Charlotte Emilie Elektra Lange,Philippe Brouillard,Yaniv Gurwicz,Peer Nowack,David Rolnick
http://arxiv.org/abs/2312.02858v1
Compressor summary: The authors explore how causal representation learning, specifically CDSD, can improve the efficiency and interpretability of climate model emulators for simulating future climate change scenarios.
Tom Savage,Ehecatl Antonio del Rio Chanona
http://arxiv.org/abs/2312.02852v1
Compressor summary: The authors propose a method to integrate human expertise into Bayesian optimization by allowing experts to choose between multiple optimal solutions with high utility and low redundancy at each iteration.
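The "multiple optimal solutions with high utility and low redundancy" idea in this summary can be sketched generically (an illustrative sketch only, not the paper's method; `diverse_top_candidates`, the 1-D points, and the distance threshold are assumptions): rank candidates by an acquisition/utility value, then greedily keep only mutually distant ones for the expert to choose from.

```python
def diverse_top_candidates(points, utility, k=3, min_dist=1.0):
    """Greedily select up to k high-utility candidates that are pairwise
    at least min_dist apart; a human expert then picks one of them."""
    ranked = sorted(points, key=utility, reverse=True)
    chosen = []
    for p in ranked:
        if all(abs(p - q) >= min_dist for q in chosen):
            chosen.append(p)
        if len(chosen) == k:
            break
    return chosen
```

The redundancy filter is what makes the expert's choice meaningful: without it, the top-k candidates would typically cluster around the same acquisition peak.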
Lalit Pandey,Samantha M. W. Wood,Justin N. Wood
http://arxiv.org/abs/2312.02843v1
Compressor summary: The study shows that vision transformers (ViTs) can learn view invariant object recognition tasks like newborn chicks when trained on similar impoverished visual environments, challenging the assumption that ViTs are more data hungry than biological systems.
Nicolas Menet,Michael Hersche,Geethan Karunaratne,Luca Benini,Abu Sebastian,Abbas Rahimi
http://arxiv.org/abs/2312.02829v1
Compressor summary: The paragraph describes MIMONets, which are neural networks that can process multiple inputs simultaneously using variable binding mechanisms and superposition, achieving speedup and accuracy trade-offs in various architectures like CNN and Transformer.
Florent Forest,Olga Fink
http://arxiv.org/abs/2312.02826v1
Compressor summary: The paper proposes a new unsupervised domain adaptation method called Calibrated Adaptive Teacher (CAT) to improve intelligent fault diagnosis using deep learning, which calibrates the teacher network's predictions during self-training.
Zhu Yuke,Ruan Yumeng,Yang Lei,Guo Sheng
http://arxiv.org/abs/2312.02821v1
Compressor summary: The paper proposes RotaTR, an extension of DETR that uses Rotation Sensitive deformable attention to improve detection of dense and rotated objects in scenes.
Xinyu Ma,Xuebo Liu,Min Zhang
http://arxiv.org/abs/2312.02820v1
Compressor summary: The authors propose a new method using the Fisher information matrix to cluster languages into pseudo families, which improves multilingual translation model performance and language similarity measurements.
Donggeun Yoon,Minseok Seo,Doyi Kim,Yeji Choi,Donghyeon Cho
http://arxiv.org/abs/2312.02819v1
Compressor summary: The paper introduces a new model, DGDM, that combines deterministic and probabilistic methods for accurate and probabilistic weather forecasting and evaluates it on various datasets.
Fengyuan Shi,Jiaxi Gu,Hang Xu,Songcen Xu,Wei Zhang,Limin Wang
http://arxiv.org/abs/2312.02813v1
Compressor summary: The authors propose BIVDiff, a training-free framework for general-purpose video synthesis that combines image diffusion models with text-to-video foundation diffusion models to address challenges in downstream video tasks.
Céline Comte,Matthieu Jonckheere,Jaron Sanders,Albert Senen-Cerda
http://arxiv.org/abs/2312.02804v1
Compressor summary: The paper introduces score-aware gradient estimators (SAGEs) for policy-gradient methods in large state and action space Markov decision processes (MDPs), which can estimate the policy gradient without value-function estimation and have better convergence properties, especially for product-form stationary distributions.
Vera Pavlova
http://arxiv.org/abs/2312.02803v1
Compressor summary: The authors present a novel approach to Qur'anic information retrieval using neural methods, data augmentation, and domain-specific language models in both English and Arabic, achieving state-of-the-art results.
Miriam Rateike,Celia Cintas,John Wamburu,Tanya Akumu,Skyler Speakman
http://arxiv.org/abs/2312.02798v1
Compressor summary: The authors propose an auditing method to detect anomalies in large language models' internal states and identify the nodes responsible for encoding hallucinations.
Bowen Jin,Gang Liu,Chi Han,Meng Jiang,Heng Ji,Jiawei Han
http://arxiv.org/abs/2312.02783v1
Compressor summary: The paragraph discusses the applications and techniques of large language models on graph data, and provides a systematic review of scenarios and methods for using them in various contexts.
Tianshun Han,Shengnan Gui,Yiqing Huang,Baihui Li,Lijian Liu,Benjia Zhou,Ning Jiang,Quan Lu,Ruicong Zhi,Yanyan Liang,Du Zhang,Jun Wan
http://arxiv.org/abs/2312.02781v1
Compressor summary: PMMTalk is a novel framework that uses pseudo multi-modal features to improve 3D facial animation by incorporating visual and textual cues from speech, requiring only an additional reference image for more accurate results.
Xu Shi,Chuanchen Luo,Junran Peng,Hongwen Zhang,Yunlian Sun
http://arxiv.org/abs/2312.02772v1
Compressor summary: The paper introduces FG-MDM, a framework that uses a large language model to parse vague textual annotations into fine-grained descriptions of human motions and generates fine-grained and stylized motions with a transformer-based diffusion model.
Chenguang Zhao,Huan Yu
http://arxiv.org/abs/2312.02770v1
Compressor summary: The paper proposes a data-enhanced nonlocal traffic model using a physics-informed neural network to learn the look-ahead dynamics and improve traffic wave prediction.
Rui Huang,Binbin Jiang,Qingyi Zhao,William Wang,Yuxiang Zhang,Qing Guo
http://arxiv.org/abs/2312.02751v1
Compressor summary: The authors propose C-NERF, a method to detect changes in a scene represented by neural radiance fields (NeRFs), which outperforms existing 2D change detection and NeRF-based methods.
Xinnuo Xu,Ivan Titov,Mirella Lapata
http://arxiv.org/abs/2312.02748v1
Compressor summary: The paragraph discusses data-to-text generation challenges, proposes a compositional generalization benchmark, and introduces a new model that clusters predicates for improved textual descriptions.
Kevin Badalian,Lucas Koch,Tobias Brinkmann,Mario Picerno,Marius Wegener,Sung-Yong Lee,Jakob Andert
http://arxiv.org/abs/2312.02739v1
Compressor summary: The paper introduces LExCI, a free and open-source framework that allows training agents on embedded systems using the RLlib library, overcoming challenges faced by professionals in control engineering.
Max Klabunde,Mehdi Ben Amor,Michael Granitzer,Florian Lemmerich
http://arxiv.org/abs/2312.02730v1
Compressor summary: The authors investigate how similar large language models (LLMs) with 7B parameters are and find that some LLMs differ significantly, while cautioning about potential pitfalls in measuring similarity.
Chenhuan Li,Meihua Xiao,Zehuan Li,Mengxi Gao
http://arxiv.org/abs/2312.02725v1
Compressor summary: The authors propose a new method for voxel 3D reconstruction using shifted windows attention, which improves the accuracy of single-view reconstruction compared to existing methods.
Mingyu Huang,Ke Li
http://arxiv.org/abs/2312.02720v1
Compressor summary: The paper proposes using graph data mining techniques to analyze local optima networks and find structural similarities between fitness landscapes of different combinatorial optimization problems, which can help improve problem-solving by analogy.
Jan Schuchardt,Yan Scholten,Stephan Günnemann
http://arxiv.org/abs/2312.02708v1
Compressor summary: The paper proposes a new way to measure adversarial robustness in machine learning models that considers task equivariance and provides methods to achieve provable robustness for various tasks.
Huajun Chen
http://arxiv.org/abs/2312.02706v1
Compressor summary: The article explores how symbolic knowledge such as Knowledge Graphs can enhance Large Language Models (LLMs), which process and manipulate world knowledge in neural networks, and how LLMs can in turn amplify traditional knowledge bases, proposing Large Knowledge Models (LKM) to manage diverse knowledge structures and discussing challenges and principles for building them.
Jianghui Zhang,Yuanyuan Wang,Lina Guo,Jixiang Luo,Tongda Xu,Yan Wang,Zhi Wang,Hongwei Qin
http://arxiv.org/abs/2312.02705v1
Compressor summary: The paper presents a new method for compressing JPEG images that combines lossy and lossless techniques using learned quantization tables and hierarchical variational autoencoders, achieving low distortion near the bitrate limit.
Bo Ding,Zhenfeng Fan,Shuang Yang,Shihong Xia
http://arxiv.org/abs/2312.02703v1
Compressor summary: Myportrait is a framework for generating realistic talking faces with personalized details using monocular video and 3D face morphable space, supporting both video-driven and audio-driven face animation and outperforming state-of-the-art methods.
Vasileios Baltatzis,Rolandos Alexandros Potamias,Evangelos Ververas,Guanxiong Sun,Jiankang Deng,Stefanos Zafeiriou
http://arxiv.org/abs/2312.02702v1
Compressor summary: The proposed 3D diffusion-based model generates realistic Sign Language avatars using a novel graph neural network and outperforms previous methods, potentially reducing communication barriers between Deaf and hearing communities.
Xinpeng Liu,Haowen Hou,Yanchao Yang,Yong-Lu Li,Cewu Lu
http://arxiv.org/abs/2312.02700v1
Compressor summary: The paper proposes a new approach to generate human-scene interaction (HSI) using motion-only data, which can handle complex scenes and generalize well without ground truth 3D scenes.
Muhammad Umer Ramzan,Usman Ali,Syed Haider Abbas Naqvi,Zeeshan Aslam,Tehseen,Husnain Ali,Muhammad Faheem
http://arxiv.org/abs/2312.02699v1
Compressor summary: The paragraph describes an auto management system that uses deep learning models for vehicle entrance and parking, integrating various technologies like vehicle detection, license plate recognition, and face recognition to ensure efficiency, security, and convenience.
Tero Karras,Miika Aittala,Jaakko Lehtinen,Janne Hellsten,Timo Aila,Samuli Laine
http://arxiv.org/abs/2312.02696v1
Compressor summary: The paper improves the ADM diffusion model for data-driven image synthesis by redesigning network layers to preserve magnitudes and introducing a method for tuning exponential moving average parameters after training.
Dezhi Peng,Zhenhua Yang,Jiaxin Zhang,Chongyu Liu,Yongxin Shi,Kai Ding,Fengjun Guo,Lianwen Jin
http://arxiv.org/abs/2312.02694v1
Compressor summary: The paper introduces UPOCR, a simple and effective generalist model that unifies diverse OCR tasks as image-to-image transformation using a vision Transformer encoder-decoder with learnable task prompts, achieving state-of-the-art performance on three tasks.
Xiaze Zhang,Ziheng Ding,Qi Jing,Yuejie Zhang,Wenchao Ding,Rui Feng
http://arxiv.org/abs/2312.02684v1
Compressor summary: The paper presents DeepPointMap, a unified architecture that uses neural networks to extract sparse neural descriptors from point clouds, achieving high localization accuracy and memory-efficient map representation for SLAM tasks and multi-agent collaboration.
Zhengyao Jiang,Yingchen Xu,Nolan Wagener,Yicheng Luo,Michael Janner,Edward Grefenstette,Tim Rocktäschel,Yuandong Tian
http://arxiv.org/abs/2312.02682v1
Compressor summary: The paper introduces H-GAP, a model that generates humanoid trajectories from human motion-captured data and can handle various control tasks with MPC, outperforming baselines and transferring behaviors flexibly.
Mila Gorecki,Jakob H. Macke,Michael Deistler
http://arxiv.org/abs/2312.02674v1
Compressor summary: The authors propose a neural network method for Bayesian decision making on stochastic simulators without computing explicit posterior approximations, and show its effectiveness in both benchmark problems and a real-world medical neurosciences application.
Rosario Leonardi,Antonino Furnari,Francesco Ragusa,Giovanni Maria Farinella
http://arxiv.org/abs/2312.02672v1
Compressor summary: This study shows that synthetic data and domain adaptation techniques can improve egocentric hand-object interaction detection performance while reducing the need for real data annotations.
N. Ordonez,M. Tromp,P. M. Julbe,W. Böhmer
http://arxiv.org/abs/2312.02665v1
Compressor summary: The paragraph describes a method for training DQN agents to handle real-world changes in observations, using a neural network with hidden representations and a new loss function that lets agents keep acting until they receive a recognized observation again, demonstrating robustness to temporary blindness.
Yuxuan Yan,Chi Zhang,Rui Wang,Pei Cheng,Gang Yu,Bin Fu
http://arxiv.org/abs/2312.02663v1
Compressor summary: The study presents a new approach for creating personalized images that maintain the subject's identity by combining stylized, facial, and textual guidance, achieving efficient and high-quality results.
Pere Izquierdo Gomez,Miguel E. Lopez Gajardo,Nenad Mijatovic,Tomislav Dragicevic
http://arxiv.org/abs/2312.02661v1
Compressor summary: The text describes an edge computing method that uses autonomous data selection to improve condition monitoring of power electronic converters using field data and machine learning.
Andrew J. Charlton-Perez,Helen F. Dacre,Simon Driscoll,Suzanne L. Gray,Ben Harvey,Natalie J. Harvey,Kieran M. R. Hunt,Robert W. Lee,Ranjini Swaminathan,Remy Vandaele,Ambrogio Volonté
http://arxiv.org/abs/2312.02658v1
Compressor summary: The paragraph compares the accuracy of four machine learning models in forecasting the structure and details of Storm Ciarán, a European windstorm, with numerical weather prediction models.
Hong-En Chen,Bin-Shih Wu,Sheng-Yu Huang,Yu-Chiang Frank Wang
http://arxiv.org/abs/2312.02647v1
Compressor summary: The paper introduces TPA3D, a GAN-based model for fast text-to-3D generation using attention mechanisms on text features and 3D shape data.
Xiaobei Zou,Luolin Xiong,Yang Tang,Jurgen Kurths
http://arxiv.org/abs/2312.02646v1
Compressor summary: The authors propose a new framework for spatio-temporal forecasting that considers time delays and multi-scale interactions by using a series-aligned graph convolution layer and a multi-scale graph learning architecture.
Camillo Quattrocchi,Antonino Furnari,Daniele Di Mauro,Mario Valerio Giuffrida,Giovanni Maria Farinella
http://arxiv.org/abs/2312.02638v1
Compressor summary: The paper proposes a method to adapt a temporal action segmentation system from exocentric to egocentric cameras using existing labeled and unlabeled video pairs without collecting new labels, achieving similar performance to supervised methods.
Yichi Zhang,Xiaogang Xu
http://arxiv.org/abs/2312.02625v1
Compressor summary: The paper proposes a new image representation called Diffusion Noise Feature (DNF) that can effectively detect generated images by exploiting the differences between real and fake images in an inverse diffusion process within a pre-trained diffusion model.
Jiahang Li,Yakun Song,Xiang Song,David Paul Wipf
http://arxiv.org/abs/2312.02622v1
Compressor summary: Virgo is a new initialization method for GNNs that reduces variance instability by considering the effects of activation function, hidden dimension, graph structure and message passing on forward and backward propagation.
Wangbin Sun,Jintang Li,Liang Chen,Bingzhe Wu,Yatao Bian,Zibin Zheng
http://arxiv.org/abs/2312.02619v1
Compressor summary: The paper proposes SGCL, a simple and efficient graph self-supervised learning framework that eliminates negative samples and reduces model complexity by using outputs from two consecutive iterations as positive pairs.
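The negative-sample-free objective this summary describes can be illustrated with a minimal sketch (an assumption-laden illustration, not SGCL's actual code; `negative_free_loss` and plain-list embeddings are invented here): each node's embedding from the current iteration is pulled toward its embedding from the previous iteration, with no negative pairs involved.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def negative_free_loss(emb_prev, emb_curr):
    """Treat each node's embeddings from two consecutive iterations as a
    positive pair and maximize their alignment; 0 when perfectly aligned."""
    sims = [cosine(p, c) for p, c in zip(emb_prev, emb_curr)]
    return 1.0 - sum(sims) / len(sims)
```

Dropping negatives removes the quadratic pairwise-comparison cost that many contrastive objectives pay, which is the efficiency gain the summary alludes to.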
Evlampios Apostolidis,Konstantinos Apostolidis,Vasileios Mezaris
http://arxiv.org/abs/2312.02616v1
Compressor summary: The paper introduces a web tool that creates custom video summaries for social media platforms using AI models.
Sungik Choi,Hankook Lee,Honglak Lee,Moontae Lee
http://arxiv.org/abs/2312.02615v1
Compressor summary: PR is an efficient novelty detection method for diffusion models that uses perceptual distance and recursive projections to detect abnormal samples with similar background information to in-distribution data.
Xuan Long Do,Yiran Zhao,Hannah Brown,Yuxi Xie,James Xu Zhao,Nancy F. Chen,Kenji Kawaguchi,Michael Qizhe Xie,Junxian He
http://arxiv.org/abs/2312.02614v1
Compressor summary: The paper introduces adv-ICL, a new method that optimizes prompts for in-context learning using adversarial learning with pre-trained models, which improves performance on various tasks and is computationally efficient.
Niccolò Bisagno,Nicola Garau,Antonio Luigi Stefani,Nicola Conci
http://arxiv.org/abs/2312.02613v1
Compressor summary: The paragraph describes a human crowd simulator called UniCrowd that can generate annotated data for various computer vision tasks involving crowds.
Shashi Raj Pandey,Pierre Pinson,Petar Popovski
http://arxiv.org/abs/2312.02611v1
Compressor summary: Data similarity and privacy preferences both matter when designing data markets, and the paper proposes a protocol based on local differential privacy to account for them in a two-party data acquisition mechanism.
Florian Kofler,Hendrik Möller,Josef A. Buchner,Ezequiel de la Rosa,Ivan Ezhov,Marcel Rosier,Isra Mekki,Suprosanna Shit,Moritz Negwer,Rami Al-Maskari,Ali Ertürk,Shankeeth Vinayahalingam,Fabian Isensee,Sarthak Pati,Daniel Rueckert,Jan S. Kirschke,Stefan K. Ehrlich,Annika Reinke,Bjoern Menze,Benedikt Wiestler,Marie Piraud
http://arxiv.org/abs/2312.02608v1
Compressor summary: panoptica is a new Python package that computes various metrics to evaluate 2D and 3D segmentation quality for biomedical applications.
Mikhail Tikhomirov,Daniil Chernyshev
http://arxiv.org/abs/2312.02598v1
Compressor summary: The paper investigates using vocabulary substitution to improve non-English performance and efficiency of large language models, and shows positive results on Russian Super Glue benchmark and human evaluation.
Anuradha Kumari,M. Tanveer
http://arxiv.org/abs/2312.02596v1
Compressor summary: The paper introduces TSVR+, a fusion of twin support vector regression with learning using privileged information, which uses both regular and privileged features for training and improves the efficiency of prediction.
Alexandru Ţifrea,Preethi Lahoti,Ben Packer,Yoni Halpern,Ahmad Beirami,Flavien Prost
http://arxiv.org/abs/2312.02592v1
Compressor summary: The paper presents a new post-processing technique for group fairness that overcomes the limitations of existing methods and achieves similar performance to in-processing approaches.
Tanmay Chavan,Ved Patwardhan
http://arxiv.org/abs/2312.02590v1
Compressor summary: The paper presents a method for estimating intimacy level in text using multilingual models and various data augmentation techniques, with applications to tweets in multiple languages.
Tanmay Chavan,Kshitij Deshpande,Sheetal Sonawane
http://arxiv.org/abs/2312.02578v1
Compressor summary: The paper describes an approach for detecting empathy and distress in natural language conversations using BERT-based models and ensemble methods, achieving third place in a shared task.
Ioannis Kontostathis,Evlampios Apostolidis,Vasileios Mezaris
http://arxiv.org/abs/2312.02576v1
Compressor summary: This paper introduces an integrated system for creating concise summaries of 360-degree videos by detecting important events and using different methods depending on camera movement.
Junjie Gao,Xiangyu Zheng,DongDong Wang,Zhixiang Huang,Bangqi Zheng,Kai Yang
http://arxiv.org/abs/2312.02573v1
Compressor summary: Uplift modeling uses machine learning to estimate the net effect of an action on a customer outcome; the paper proposes two adaptations of the Gradient Boosting Decision Trees algorithm that improve uplift estimation and introduces UTBoost, an open-source system for uplift modeling.
Jianmeng Liu,Yuyao Zhang,Zeyuan Meng,Yu-Wing Tai,Chi-Keung Tang
http://arxiv.org/abs/2312.02568v1
Compressor summary: The paper presents Prompt2NeRF-PIL, a fast and easy way to generate 3D scenes from text or images using a pre-trained implicit latent space of NeRF parameters, which also speeds up existing prompt-to-NeRF methods.
Michael Igorevich Ivanitskiy,Alex F. Spies,Tilman Räuker,Guillaume Corlouer,Chris Mathwin,Lucia Quirke,Can Rager,Rusheb Shah,Dan Valentine,Cecilia Diniz Behn,Katsumi Inoue,Samy Wu Fung
http://arxiv.org/abs/2312.02566v1
Compressor summary: The authors study how small transformer models learn to solve mazes and discover that they form structured representations of the maze topology and paths, as well as identifying specific attention heads for path-following.
Youpeng Zhao,Yudong Lu,Jian Zhao,Wengang Zhou,Houqiang Li
http://arxiv.org/abs/2312.02561v1
Compressor summary: The authors develop and evaluate an AI program for the complex card game GuanDan using deep Monte Carlo techniques and policy-based reinforcement learning, achieving superior performance compared to baseline methods.
Tianchi Cai,Xierui Song,Jiyan Jiang,Fei Teng,Jinjie Gu,Guannan Zhang
http://arxiv.org/abs/2312.02554v1
Compressor summary: The paper proposes a method for aligning language models to user's intent using both supervised fine-tuning and point-wise preference learning, and introduces a new dataset for harmlessness.
Thong Nguyen,Xiaobao Wu,Xinshuai Dong,Cong-Duy Nguyen,See-Kiong Ng,Luu Anh Tuan
http://arxiv.org/abs/2312.02549v1
Compressor summary: The paper proposes an energy-based model and a novel Transformer-based architecture to improve localizing video moments corresponding to natural language queries using attention mechanisms.
Soroush Abbasi Koohpayegani,Anuj Singh,K L Navaneet,Hadi Jamali-Rad,Hamed Pirsiavash
http://arxiv.org/abs/2312.02548v1
Compressor summary: GeNIe is a data augmentation technique that uses diffusion models to generate challenging samples for target categories by merging images from source and target categories, improving deep model training in few-shot and long-tail distribution settings.
Zhuo Huang,Chang Liu,Yinpeng Dong,Hang Su,Shibao Zheng,Tongliang Liu
http://arxiv.org/abs/2312.02546v1
Compressor summary: The paper proposes a method to improve vision models' robustness by using multi-modal language models to provide guidance on correcting noisy predictions in an unsupervised manner.
Yuntao Shou,Wei Ai,Tao Meng
http://arxiv.org/abs/2312.02545v1
Compressor summary: The paper proposes a simple contrastive vision GNN (SC-ViG) architecture for remote sensing segmentation, which adapts to irregular objects and minimizes task-independent redundant information using information bottleneck theory.