This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-11-29, generated by Compressor, my personal LLM-based project.
Daniel Geng,Inbum Park,Andrew Owens
http://arxiv.org/abs/2311.17919v1
Compressor summary: The authors propose a simple method to create multi-view optical illusions using text-to-image diffusion models and noise estimation from different views, resulting in visual anagrams that change appearance under certain transformations.
Soumik Mukhopadhyay,Matthew Gwilliam,Yosuke Yamaguchi,Vatsal Agarwal,Namitha Padmanabhan,Archana Swaminathan,Tianyi Zhou,Abhinav Shrivastava
http://arxiv.org/abs/2311.17921v1
Compressor summary: The paper proposes a unified representation learner that combines generative and discriminative tasks using diffusion models, whose U-Nets remove noise to produce diverse, high-quality images, and introduces new feature-fusion and feedback mechanisms to improve performance on various image tasks.
Mohammad Fahes,Tuan-Hung Vu,Andrei Bursuc,Patrick Pérez,Raoul de Charette
http://arxiv.org/abs/2311.17922v1
Compressor summary: The paper presents a framework for improving semantic segmentation with neural networks by using language to randomize and augment the data, while preserving CLIP's robustness.
Yuqi Wang,Jiawei He,Lue Fan,Hongxin Li,Yuntao Chen,Zhaoxiang Zhang
http://arxiv.org/abs/2311.17918v1
Compressor summary: Drive-WM is a driving world model that generates high-fidelity multiview videos in driving scenes to enhance autonomous vehicle safety and efficiency by predicting future events and evaluating risks.
Qidong Huang,Xiaoyi Dong,Pan Zhang,Bin Wang,Conghui He,Jiaqi Wang,Dahua Lin,Weiming Zhang,Nenghai Yu
http://arxiv.org/abs/2311.17911v1
Compressor summary: OPERA is a new method to reduce hallucination in multi-modal language models by penalizing over-trust and retrospecting token selection during decoding.
Muhammed Kocabas,Jen-Hao Rick Chang,James Gabriel,Oncel Tuzel,Anurag Ranjan
http://arxiv.org/abs/2311.17910v1
Compressor summary: HUGS uses 3D Gaussian Splatting and a monocular video with a small number of frames to learn an animatable human avatar from a static scene, achieving state-of-the-art rendering quality and speed.
Alexander Vilesov,Pradyumna Chari,Achuta Kadambi
http://arxiv.org/abs/2311.17907v1
Compressor summary: CG3D is a method for generating detailed 3D graphics with text-conditioned guidance, overcoming constraints such as limited scene complexity and physically unrealistic compositions.
Jang Hyun Cho,Philipp Krähenbühl
http://arxiv.org/abs/2311.17902v1
Compressor summary: The paper introduces DECOLA, an open-vocabulary detection framework that uses image-level labels and detailed annotations to train language-conditioned and unconditioned detectors for zero-shot performance on various benchmarks.
Drew A. Hudson,Daniel Zoran,Mateusz Malinowski,Andrew K. Lampinen,Andrew Jaegle,James L. McClelland,Loic Matthey,Felix Hill,Alexander Lerchner
http://arxiv.org/abs/2311.17901v1
Compressor summary: SODA is a self-supervised diffusion model that learns strong visual representations by generating related novel views from compact source view encodings, enabling unsupervised ImageNet classification, reconstruction, editing, and synthesis tasks.
Jinqi Luo,Kwan Ho Ryan Chan,Dimitris Dimos,René Vidal
http://arxiv.org/abs/2311.17898v1
Compressor summary: KPP is a zero-shot framework that uses external knowledge to enhance text-driven generative models' quality and faithfulness in multiple tasks without accessing their parameters.
Shuangrui Ding,Rui Qian,Haohang Xu,Dahua Lin,Hongkai Xiong
http://arxiv.org/abs/2311.17893v1
Compressor summary: The paper presents a simple self-supervised video object segmentation method that uses DINO-pretrained Transformers and clustering to achieve state-of-the-art results without auxiliary modalities or slot attention.
Jonathon Liu,Razin A. Shaikh,Benjamin Rodatz,Richie Yeung,Bob Coecke
http://arxiv.org/abs/2311.17892v1
Compressor summary: DisCoCirc is a new model that connects linguistic theory and modern NLP by representing text as circuits that capture meaning and can be used with classical or quantum methods.
Or Hirschorn,Shai Avidan
http://arxiv.org/abs/2311.17891v1
Compressor summary: The paper introduces a new category-agnostic pose estimation method that uses a Graph Transformer Decoder to capture geometrical relations between keypoints, improving accuracy on the MP-100 benchmark.
Chaerin Min,Sehyun Cha,Changhee Won,Jongwoo Lim
http://arxiv.org/abs/2311.17878v1
Compressor summary: The paper proposes a new method to speed up multi-view neural surface reconstruction by using the Truncated Signed Distance Field (TSDF) of the scene, which reduces the number of samplings and maintains high rendering quality.
Tristan Gomez,Harold Mouchère
http://arxiv.org/abs/2311.17876v1
Compressor summary: This paper uses Krippendorff's alpha to measure the reliability of image classification explanation methods and suggests model modifications to improve it.
Wen Jiang,Boshu Lei,Kostas Daniilidis
http://arxiv.org/abs/2311.17874v1
Compressor summary: The study proposes a new method using Fisher Information to efficiently select informative views and quantify uncertainty in Neural Radiance Fields, achieving state-of-the-art results and fast performance.
Yatao Li,Jianfeng Zhan
http://arxiv.org/abs/2311.17869v1
Compressor summary: AI4S research uses machine learning to improve scientific computing, but needs better benchmarking methods like structural interpretation to ensure accuracy in real-world applications.
Rameen Abdal,Wang Yifan,Zifan Shi,Yinghao Xu,Ryan Po,Zhengfei Kuang,Qifeng Chen,Dit-Yan Yeung,Gordon Wetzstein
http://arxiv.org/abs/2311.17857v1
Compressor summary: Gaussian Shell Maps (GSMs) are a new framework for generating high-quality, multi-view consistent 3D digital humans using an articulable scaffold of inflated and deflated shells with 3D Gaussian rendering primitives, avoiding the need for volume representations or view-inconsistent upsamplers.
Puja Trivedi,Ryan Rossi,David Arbour,Tong Yu,Franck Dernoncourt,Sungchul Kim,Nedim Lipka,Namyong Park,Nesreen K. Ahmed,Danai Koutra
http://arxiv.org/abs/2311.17856v1
Compressor summary: The paper introduces a new graph generative framework called SGDM that uses subgraph diffusion to refine noisy and incomplete networks in various ways, such as removing unwanted subgraphs, expanding existing ones, and changing their style.
Amin Rakhsha,Mete Kemertas,Mohammad Ghavamzadeh,Amir-massoud Farahmand
http://arxiv.org/abs/2311.17855v1
Compressor summary: The paper proposes a method for planning in reinforcement learning using an approximate model that can reduce error, accelerate convergence, and outperform traditional approaches.
Rishabh Kabra,Loic Matthey,Alexander Lerchner,Niloy J. Mitra
http://arxiv.org/abs/2311.17851v1
Compressor summary: The paper proposes a method to leverage pretrained vision language models for various annotation tasks involving unlabeled 3D objects by aggregating their scores and improving downstream predictions.
Alexandre Araujo,Jean Ponce,Julien Mairal
http://arxiv.org/abs/2311.17846v1
Compressor summary: The paper introduces a new dataset and deep learning algorithm for focus stacking in photography that works well with long bursts of real-world images and is more robust to noise.
Etai Sella,Gal Fiebelman,Noam Atia,Hadar Averbuch-Elor
http://arxiv.org/abs/2311.17834v1
Compressor summary: SPiC-E is a neural network that improves 3D diffusion models by using cross-entity attention to learn structural guidance from auxiliary shapes, enabling various applications with high quality and speed.
Maximilian Augustin,Yannic Neuhaus,Matthias Hein
http://arxiv.org/abs/2311.17833v1
Compressor summary: The paper proposes a framework for generating images that help analyze and improve image classifiers' reliability and explainability, revealing new and existing failure modes.
Seyedeh Gol Ara Ghoreishi,Sonia Moshfeghi,Muhammad Tanveer Jan,Joshua Conniff,KwangSoo Yang,Jinwoo Jang,Borko Furht,Ruth Tappen,David Newman,Monica Rosselli,Jiannan Zhai
http://arxiv.org/abs/2311.17822v1
Compressor summary: The paper proposes a method to detect drivers with unusual behavior from large datasets of detailed trajectories, which can help with applications like MCI detection and safe route recommendations for older drivers.
Alexis Toumi,Giovanni de Felice
http://arxiv.org/abs/2311.17813v1
Compressor summary: The authors introduce a new type of linguistic model based on diagram-valued functions that can handle non-linear language phenomena and have a Python implementation.
Ting Liu,Yue Hu,Wansen Wu,Youkai Wang,Kai Xu,Quanjun Yin
http://arxiv.org/abs/2311.17812v1
Compressor summary: The paper describes a new method (DAP) for improving vision-and-language models' performance in navigation tasks by learning soft visual prompts from in-domain image-text pairs.
David Komorowicz,Lu Sang,Ferdinand Maiwald,Daniel Cremers
http://arxiv.org/abs/2311.17810v1
Compressor summary: The authors propose a volumetric rendering technique to reconstruct 3D models of historical buildings from limited datasets, including color appearance loss and a new historical dataset.
Gustav Bredell,Marcel Fischer,Przemyslaw Szostak,Samaneh Abbasi-Sureshjani,Alvaro Gomariz
http://arxiv.org/abs/2311.17804v1
Compressor summary: The paper discusses how the performance of feature extractor models in digital pathology depends on the choice of aggregation model hyperparameters, and proposes a comprehensive evaluation approach to understand this relationship better.
L. Jeff Hong,Yanxi Hou,Qingkai Zhang,Xiaowei Zhang
http://arxiv.org/abs/2311.17797v1
Compressor summary: Generative metamodeling is a new technique that quickly generates outputs from complex simulation models for real-time decision-making, while preserving the distribution of inputs, and the paper proposes a new algorithm called quantile-regression-based generative metamodeling (QRGMM).
Guy Hay,Ohad Volk
http://arxiv.org/abs/2311.17795v1
Compressor summary: The paper proposes Marginal Laplacian Score, a modified unsupervised feature selection method for handling imbalanced data, which improves the performance of Differentiable Unsupervised Feature Selection on synthetic and real-world data sets.
Yong-Min Shin,Won-Yong Shin
http://arxiv.org/abs/2311.17781v1
Compressor summary: The authors propose a method called Propagate & Distill (P&D) to train a student MLP using knowledge distillation from a teacher GNN, which injects structural information by propagating the output of the teacher before distillation.
Gen Li,Deqing Sun,Laura Sevilla-Lara,Varun Jampani
http://arxiv.org/abs/2311.17776v1
Compressor summary: The paper introduces a vision-language framework for learning object affordances from one example per category, which improves upon existing models' understanding and performance in this task.
Simão Gonçalves,Gonçalo Correia,Diogo Pernes,Afonso Mendes
http://arxiv.org/abs/2311.17771v1
Compressor summary: The paper describes an enhanced version of the centroid method for extractive multi-document summarization that uses beam search and attention to achieve better performance across multiple languages.
Weixin Mao,Tiancai Wang,Diankun Zhang,Junjie Yan,Osamu Yoshie
http://arxiv.org/abs/2311.17770v1
Compressor summary: The paper improves pillar-based 3D object detection by using pretrained 2D ConvNets as backbones, which adapt to point cloud features like sparsity and irregularity.
Bernd Bassimir,Rolf Wanka
http://arxiv.org/abs/2311.17766v1
Compressor summary: The authors discuss different robust optimization methods for solving the examination timetabling problem with uncertainty, and evaluate their performance on real and random instances.
Xuekun Jiang,Anyi Rao,Jingbo Wang,Dahua Lin,Bo Dai
http://arxiv.org/abs/2311.17754v1
Compressor summary: The authors propose a method for optimizing camera movements and transferring shot types to new videos or virtual environments using NeRF and SMPL techniques.
Zijian Chen,Wei Sun,Jun Jia,Fangfang Lu,Zicheng Zhang,Jing Liu,Ru Huang,Xiongkuo Min,Guangtao Zhai
http://arxiv.org/abs/2311.17752v1
Compressor summary: The paper presents a large dataset for image banding assessment, proposes an effective method for detecting and evaluating banding artifacts using convolutional neural networks, and shows high correlation between banding intensity and perceptual quality.
Maud Biquard,Marie Chabert,Thomas Oberlin
http://arxiv.org/abs/2311.17744v1
Compressor summary: The paper proposes a new algorithm (VBLE) for regularization in computational imaging using compressive autoencoders, which are smaller and easier to train than generative models, and shows that it performs well and runs fast compared to existing methods.
Lokesh Madasu,Gopichand Kanumolu,Nirmal Surange,Manish Shrivastava
http://arxiv.org/abs/2311.17743v1
Compressor summary: Mukhyansh is a large multilingual dataset for headline generation in Indian languages, overcoming challenges posed by low-resource settings and limited data quality.
Can Cui,Imran Ahamad Sheikh,Mostafa Sadeghi,Emmanuel Vincent
http://arxiv.org/abs/2311.17741v1
Compressor summary: The paper compares two methods to train a speech recognition model that produces transcriptions with punctuation and capitalization, using limited labeled data and achieving different performance on out-of-domain data.
Lei Li,Angela Dai
http://arxiv.org/abs/2311.17737v1
Compressor summary: GenZI is a method for generating 3D human-scene interactions using natural language descriptions and no 3D data, by distilling interaction priors from vision-language models and optimizing a 3D model's pose and shape.
Chi-Pin Huang,Kai-Po Chang,Chung-Ting Tsai,Yung-Hsuan Lai,Yu-Chiang Frank Wang
http://arxiv.org/abs/2311.17717v1
Compressor summary: Receler is a method to remove specific concepts from text-to-image models by enforcing locality and robustness, improving performance over previous erasing methods.
Mutian Xu,Xingyilang Yin,Lingteng Qiu,Yang Liu,Xin Tong,Xiaoguang Han
http://arxiv.org/abs/2311.17707v1
Compressor summary: SAMPro3D is a method for segmenting 3D indoor scenes from 2D frames using pretrained SAM, with techniques to improve alignment, quality, and diversity of results without additional training.
Chenxi Dong
http://arxiv.org/abs/2311.17696v1
Compressor summary: AI Tutor is a web application that uses a large language model to provide personalized, evidence-based tutoring in any subject based on course materials.
Jia Li,Lijie Hu,Jingfeng Zhang,Tianhang Zheng,Hua Zhang,Di Wang
http://arxiv.org/abs/2311.17695v1
Compressor summary: Fair Mapping is a model-agnostic method for generating fair and diverse images from text-to-image diffusion models by controlling prompts and using a linear mapping network.
Liya Wang,Jason Chou,Xin Zhou,Alex Tien,Diane M Baumgartner
http://arxiv.org/abs/2311.17686v1
Compressor summary: AviationGPT is a large language model designed for the aviation domain that can handle various NLP tasks and improve the efficiency and safety of NAS operations.
Jonathan Ivey,Susan Gauch
http://arxiv.org/abs/2311.17676v1
Compressor summary: The authors evaluate psychological stress models on detecting minority stress and suggest using emotion-infused models to improve performance for these vulnerable populations.
Zheng Chu,Jingchang Chen,Qianglong Chen,Weijiang Yu,Haotian Wang,Ming Liu,Bing Qin
http://arxiv.org/abs/2311.17667v1
Compressor summary: The paper introduces TimeBench, a benchmark for testing the temporal reasoning abilities of large language models, which reveals a performance gap between current LLMs and humans.
Junyi Ma,Xieyuanli Chen,Jiawei Huang,Jingyi Xu,Zhen Luo,Jintao Xu,Weihao Gu,Rui Ai,Hesheng Wang
http://arxiv.org/abs/2311.17663v1
Compressor summary: The paper introduces a new benchmark, Cam4DOcc, for camera-only 4D occupancy forecasting in autonomous driving applications that considers future scene changes, built on multiple public datasets and used to evaluate four baseline methods.
Jacob Lin,Miguel Farinha,Edward Gryspeerdt,Ronald Clark
http://arxiv.org/abs/2311.17657v1
Compressor summary: The paper presents a novel deep learning approach that uses stereo images to reconstruct the shape and dynamics of volumetric phenomena like clouds and fog.
Somaieh Amraee,Bishoy Galoaa,Matthew Goodwin,Elaheh Hatamimajoumerd,Sarah Ostadabbas
http://arxiv.org/abs/2311.17656v1
Compressor summary: The paper presents MTTSort, a method for accurately tracking toddlers in indoor videos using the DeepSORT algorithm while addressing challenges such as unpredictable movements, occlusions, and limited fields of view.
Pavel Korshunov,Haolin Chen,Philip N. Garner,Sebastien Marcel
http://arxiv.org/abs/2311.17655v1
Compressor summary: The paper introduces SWAN-DF, a realistic audio-visual deepfakes database that tests the vulnerability of face and speech recognition systems to synthetic media.
Yujie Lu,Xiujun Li,William Yang Wang,Yejin Choi
http://arxiv.org/abs/2311.17647v1
Compressor summary: VIM is a framework that tests how well multimodal language models understand visual instructions by embedding them in scenes, revealing performance differences among models.
Alexander Becker,Rodrigo Caye Daudt,Nando Metzger,Jan Dirk Wegner,Konrad Schindler
http://arxiv.org/abs/2311.17643v1
Compressor summary: The authors propose a novel way to design neural fields for single image super-resolution that incorporates a Gaussian PSF as an anti-aliasing technique without increasing computational cost.
Mreenav Shyam Deka,Lu Sang,Daniel Cremers
http://arxiv.org/abs/2311.17634v1
Compressor summary: The paper presents a method for creating new views of outdoor urban scenes using neural point light fields and dynamic object detection, while optimizing camera pose and refining both elements.
Tong Xiao,Jingbo Zhu
http://arxiv.org/abs/2311.17633v1
Compressor summary: The paper provides an overview of Transformers, their architecture, refinements, applications, and limitations in natural language processing.
Jiaqi Zhao,Zeyu Ding,Yong Zhou,Hancheng Zhu,Wenliang Du,Rui Yao,Abdulmotaleb El Saddik
http://arxiv.org/abs/2311.17629v1
Compressor summary: The proposed end-to-end oriented detector uses RRoI attention for multi-scale feature alignment and SDQ for efficient query optimization, achieving state-of-the-art performance on multiple datasets.
Yuan Wang,Naisong Luo,Tianzhu Zhang
http://arxiv.org/abs/2311.17626v1
Compressor summary: The paper presents a new few-shot segmentation model, AMFormer, that focuses on query information and achieves accurate results with minimal support guidance or labels.
Fukun Yin,Xin Chen,Chi Zhang,Biao Jiang,Zibo Zhao,Jiayuan Fan,Gang Yu,Taihao Li,Tao Chen
http://arxiv.org/abs/2311.17618v1
Compressor summary: ShapeGPT is a multimodal framework that uses language models to generate and edit 3D shapes based on instructions in natural language.
Andrey Voynov,Amir Hertz,Moab Arar,Shlomi Fruchter,Daniel Cohen-Or
http://arxiv.org/abs/2311.17609v1
Compressor summary: The study presents a framework that combines a text-to-image diffusion model with lens geometry to create realistic images with diverse visual effects like fish-eye and panorama.
Xiaoyue Mi,Fan Tang,Zonghan Yang,Danding Wang,Juan Cao,Peng Li,Yang Liu
http://arxiv.org/abs/2311.17608v1
Compressor summary: The study proposes a new memory-based continual learning method that improves robustness against adversarial attacks by adjusting data logits and using gradient-based data selection.
Xiaoyue Mi,Fan Tang,Yepeng Weng,Danding Wang,Juan Cao,Sheng Tang,Peng Li,Yang Liu
http://arxiv.org/abs/2311.17607v1
Compressor summary: The study proposes TRAIN, a method that preserves the structure of natural samples in the representation space during adversarial training, improving both natural and robust accuracies on various image datasets.
Martin Wistuba,Prabhu Teja Sivaprasad,Lukas Balles,Giovanni Zappella
http://arxiv.org/abs/2311.17601v1
Compressor summary: The paper proposes CoLoR, a continual learning method that uses Low Rank Adaptation (LoRA) to update pre-trained transformers and maintain their performance on new data without relying on prompt tuning.
Andrea Marinoni,Pietro Lio',Alessandro Barp,Christian Jutten,Mark Girolami
http://arxiv.org/abs/2311.17598v1
Compressor summary: The paper introduces soft manifolds, a new class of mathematical structures for graph embedding that can handle weighted connections and missing data in complex datasets, leading to more accurate and reliable graph analysis.
Yiwen Ye,Yutong Xie,Jianpeng Zhang,Ziyang Chen,Qi Wu,Yong Xia
http://arxiv.org/abs/2311.17597v1
Compressor summary: The paper proposes MedCoSS, a continuous self-supervised learning approach for multi-modal medical data that addresses representation conflicts and catastrophic forgetting using rehearsal-based continual learning and feature distillation.
Rudra P. K. Poudel,Harit Pandya,Chao Zhang,Roberto Cipolla
http://arxiv.org/abs/2311.17593v1
Compressor summary: The authors propose a method called LanGWM that uses language to improve reinforcement learning models' ability to handle out-of-distribution tasks and demonstrate its effectiveness in iGibson point navigation tasks.
Ziqiao Peng,Wentao Hu,Yue Shi,Xiangyu Zhu,Xiaomei Zhang,Hao Zhao,Jun He,Hongyan Liu,Zhaoxin Fan
http://arxiv.org/abs/2311.17590v1
Compressor summary: SyncTalk is a NeRF-based method that improves the realism of talking head videos by synchronizing facial expressions, lip movements, and head poses using innovative techniques.
Xu Liu,Shu Zhou,Yurong Song,Wenzhe Luo,Xin Zhang
http://arxiv.org/abs/2311.17583v1
Compressor summary: The proposed face liveness detection method uses image-text pairs and contrastive learning to detect eight types of financial field attack behaviors, achieving high performance and robustness on various datasets.
Daan Van Wesenbeeck,Aras Yurtman,Wannes Meert,Hendrik Blockeel
http://arxiv.org/abs/2311.17582v1
Compressor summary: LoCoMotif is a novel method for time series motif discovery that overcomes existing limitations and performs better in a physiotherapy use case.
Wenhao Zhong,Jie Jiang
http://arxiv.org/abs/2311.17571v1
Compressor summary: The paper proposes a novel convolutional transformer that captures both local and global features for image matching under extreme conditions, outperforming existing methods.
Lisheng Wu,Ke Chen
http://arxiv.org/abs/2311.17565v1
Compressor summary: The paper analyzes two types of off-policy biases in goal-conditioned reinforcement learning, proposes solutions to leverage their benefits, and shows improved efficiency and performance in challenging ten-step scenarios.
Yu Chen,Nivedita Bijlani,Samaneh Kouchaki,Payam Barnaghi
http://arxiv.org/abs/2311.17560v1
Compressor summary: The paper presents an algorithm to interpret latent states and predictions in machine learning models, which can help identify patterns and predict patient outcomes in digital healthcare.
Leonie Henschel,David Kügler,Lilla Zöllei,Martin Reuter
http://arxiv.org/abs/2311.17546v1
Compressor summary: The paper presents VINNA, a method for segmenting neonatal brain images that uses resolution-aware internal augmentations and a 4-DOF transform module to improve accuracy and robustness.
Bo Qiao,Liqun Li,Xu Zhang,Shilin He,Yu Kang,Chaoyun Zhang,Fangkai Yang,Hang Dong,Jue Zhang,Lu Wang,Minghua Ma,Pu Zhao,Si Qin,Xiaoting Qin,Chao Du,Yong Xu,Qingwei Lin,Saravan Rajmohan,Dongmei Zhang
http://arxiv.org/abs/2311.17541v1
Compressor summary: TaskWeaver is a framework that uses LLMs to create chatbots with rich data structures, flexible plugins, and secure code execution for complex tasks in specific domains.
Sungbin Shin,Dongyeop Lee,Maksym Andriushchenko,Namhoon Lee
http://arxiv.org/abs/2311.17539v1
Compressor summary: The paper investigates how overparameterization affects sharpness-aware minimization (SAM) and finds that it improves generalization, convergence rate, and stability of minima in neural networks.
Liang Peng,Haoran Cheng,Zheng Yang,Ruisi Zhao,Linxuan Xia,Chaotian Song,Qinglin Lu,Wei Liu,Boxi Wu
http://arxiv.org/abs/2311.17536v1
Compressor summary: This paper proposes a noise constraint for one-shot video tuning methods to improve consistency and smoothness, and introduces a new metric to evaluate video smoothness better.
Xingqun Qi,Jiahao Pan,Peng Li,Ruibin Yuan,Xiaowei Chi,Mengfei Li,Wenhan Luo,Wei Xue,Shanghang Zhang,Qifeng Liu,Yike Guo
http://arxiv.org/abs/2311.17532v1
Compressor summary: The paper proposes a novel method for generating realistic 3D co-speech gestures with emotional transitions using ChatGPT-4, audio inpainting, weakly supervised training, and keyframe sampling.
Shen Zhang,Zhaowei Chen,Zhenyu Zhao,Zhenyuan Chen,Yao Tang,Yuhao Chen,Wengang Cao,Jiajun Liang
http://arxiv.org/abs/2311.17528v1
Compressor summary: HiDiffusion is a framework that improves high-resolution image synthesis by adjusting feature map size and using dynamic window attention in U-Net, achieving state-of-the-art performance without tuning.