This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-20, generated by the compressor, my personal LLM-based project.
Jianghang Lin,Yunhang Shen,Bingquan Wang,Shaohui Lin,Ke Li,Liujuan Cao
http://arxiv.org/abs/2312.12437v1
Compressor summary: The paper proposes WSOVOD, a framework for weakly supervised open-vocabulary object detection that can detect novel concepts and use diverse datasets with only image-level annotations.
Chaoyou Fu,Renrui Zhang,Haojia Lin,Zihan Wang,Timin Gao,Yongdong Luo,Yubo Huang,Zhengye Zhang,Longtian Qiu,Gaoxiang Ye,Yunhang Shen,Mengdan Zhang,Peixian Chen,Sirui Zhao,Xiawu Zheng,Shaohui Lin,Deqiang Jiang,Di Yin,Peng Gao,Ke Li,Xing Sun,Rongrong Ji
http://arxiv.org/abs/2312.12436v1
Compressor summary: The paper explores Gemini Pro's visual understanding abilities and compares it with GPT-4V and Sphinx, finding that Gemini can be a strong challenger to GPT-4V in multi-modal tasks.
Cheng-Yen Hsieh,Tarasha Khurana,Achal Dave,Deva Ramanan
http://arxiv.org/abs/2312.12433v1
Compressor summary: The paper introduces a new benchmark dataset and a module that improves amodal perception for detection and tracking of occluded objects in videos.
Viet Nguyen,Giang Vu,Tung Nguyen Thanh,Khoat Than,Toan Tran
http://arxiv.org/abs/2312.12431v1
Compressor summary: The paper proposes a new loss function for Denoising Probabilistic Models that considers the correlation between timesteps, improving image quality and generalization.
Aditya Murali,Deepak Alapatt,Pietro Mascagni,Armine Vardazaryan,Alain Garcia,Nariaki Okamoto,Guido Costamagna,Didier Mutter,Jacques Marescaux,Bernard Dallemagne,Nicolas Padoy
http://arxiv.org/abs/2312.12429v1
Compressor summary: The Endoscapes dataset contains LC videos with detailed annotations for assessing CVS and other aspects of the surgery, along with benchmarks and public access to the data and models.
Mengyu Wang,Henghui Ding,Jun Hao Liew,Jiajun Liu,Yao Zhao,Yunchao Wei
http://arxiv.org/abs/2312.12425v1
Compressor summary: SegRefiner is a model-agnostic method that refines object masks from different segmentation models using a discrete diffusion process to predict label and transition probabilities for each pixel, improving both metrics and mask details across various segmentation tasks.
Shraman Pramanick,Guangxing Han,Rui Hou,Sayan Nag,Ser-Nam Lim,Nicolas Ballas,Qifan Wang,Rama Chellappa,Amjad Almahairi
http://arxiv.org/abs/2312.12423v1
Compressor summary: VistaLLM is a visual system that uses instruction-guided image tokenization and adaptive sampling to perform various vision-language tasks with single and multiple input images, while leveraging the CoinIt dataset and introducing the AttCoSeg task.
Jinghao Zhou,Tomas Jakab,Philip Torr,Christian Rupprecht
http://arxiv.org/abs/2312.12419v1
Compressor summary: The paper presents a framework that stylizes 3D assets to fit into 2D scenes using differentiable ray tracing and text-to-image diffusion priors, enabling object stylization and realistic composition across different environments and objects.
Shweta Mahajan,Tanzila Rahman,Kwang Moo Yi,Leonid Sigal
http://arxiv.org/abs/2312.12416v1
Compressor summary: The authors propose a method to obtain interpretable language prompts from text-to-image diffusion models by using a delayed projection scheme and focusing on later timesteps of the diffusion process.
Raven Beutner,Bernd Finkbeiner
http://arxiv.org/abs/2312.12403v1
Compressor summary: HyperATL$^*_S$ extends ATL$^*$ to compare outcomes of multiple strategic interactions and enforce shared strategies among agents, capturing important AI properties and enabling decidable model checking.
Koji Ichikawa,Shinji Ito,Daisuke Hatano,Hanna Sumita,Takuro Fukunaga,Naonori Kakimura,Ken-ichi Kawarabayashi
http://arxiv.org/abs/2312.12400v1
Compressor summary: The paper proposes new distribution classes that allow the greedy algorithm to be applied to more contextual bandit problems without strong assumptions on arm feature diversity.
Yunhao Gou,Zhili Liu,Kai Chen,Lanqing Hong,Hang Xu,Aoxue Li,Dit-Yan Yeung,James T. Kwok,Yu Zhang
http://arxiv.org/abs/2312.12379v1
Compressor summary: MoCLE is a new MoE architecture that improves LVLMs' instruction-following abilities and generalization across various zero-shot vision-language tasks by activating task-specific model parameters based on instruction clusters and adding a universal expert.
Zhimeng Jiang,Xiaotian Han,Chao Fan,Zirui Liu,Na Zou,Ali Mostafavi,Xia Hu
http://arxiv.org/abs/2312.12369v1
Compressor summary: The paper proposes a new graph neural network architecture called Fair Message Passing that improves fairness and prediction performance by explicitly using sensitive attributes and mitigating biases in node classification tasks.
Piotr Pęzik,Sylwia Karasińska,Anna Cichosz,Łukasz Jałowiecki,Konrad Kaczyński,Małgorzata Krawentek,Karolina Walkusz,Paweł Wilk,Mariusz Kleć,Krzysztof Szklanny,Szymon Marszałkowski
http://arxiv.org/abs/2312.12364v1
Compressor summary: SpokesBiz is a large conversational Polish corpus with many uses in linguistics and ASR development.
Monika Wysoczańska,Oriane Siméoni,Michaël Ramamonjisoa,Andrei Bursuc,Tomasz Trzciński,Patrick Pérez
http://arxiv.org/abs/2312.12359v1
Compressor summary: The paper proposes a zero-shot open-vocabulary semantic segmentation method that improves MaskCLIP features with self-supervised localization priors and achieves state-of-the-art results on various benchmarks using only one pass through CLIP.
Feixiang Zhou,Zheheng Jiang,Huiyu Zhou,Xuelong Li
http://arxiv.org/abs/2312.12347v1
Compressor summary: The paper proposes a novel approach for semi-supervised action segmentation in long untrimmed videos using contrastive learning with intra- and inter-information variations exploration and neighbour consistency enforcement.
Yucheng Li,Frank Guerin,Chenghua Lin
http://arxiv.org/abs/2312.12343v1
Compressor summary: LatestEval is an automatic method that creates uncontaminated reading comprehension evaluations by using texts published within a recent time window, avoiding overlap with pre-trained language models' training corpora.
Suwei Yang,Kuldeep S. Meel
http://arxiv.org/abs/2312.12341v1
Compressor summary: The paper introduces PBCount, an exact Pseudo-Boolean model counter using knowledge compilation via algebraic decision diagrams, which can handle more instances than existing methods.
Ruiyuan Zhang,Jiaxiang Liu,Zexi Li,Hao Dong,Jie Fu,Chao Wu
http://arxiv.org/abs/2312.12340v1
Compressor summary: The paper presents a co-creation framework with multiple assemblers for geometric fracture assembly without semantic information, using a novel loss function to handle collisions; it outperforms existing frameworks on two datasets while offering linear computational complexity, enhanced abstraction, and improved generalization.
Kiran Lekkala,Henghui Bao,Sumedh Sontakke,Laurent Itti
http://arxiv.org/abs/2312.12339v1
Compressor summary: The proposed method enables learning task-independent representations from value function estimates that can transfer skills across different tasks regardless of their appearance and dynamics.
David Charatan,Sizhe Li,Andrea Tagliasacchi,Vincent Sitzmann
http://arxiv.org/abs/2312.12337v1
Compressor summary: PixelSplat is a model that learns to reconstruct 3D images from two images using Gaussian primitives, enabling fast and memory-efficient rendering and 3D reconstruction.
Efthymios Georgiou,Yannis Avrithis,Alexandros Potamianos
http://arxiv.org/abs/2312.12334v1
Compressor summary: PowMix is a novel embedding space regularizer for multimodal sentiment analysis that improves performance without sacrificing robustness or text dominance.
Benjamin Colburn,Jose C. Principe,Luis G. Sanchez Giraldo
http://arxiv.org/abs/2312.12318v1
Compressor summary: The paper presents a new approach to Kernel Adaptive Filtering (KAF) that avoids the linear growth of model size with the number of training samples by using correntropy as a nonlinear functional, preserving the time structure of stochastic processes in a Reproducing Kernel Hilbert Space (RKHS).
Stefan Schoder
http://arxiv.org/abs/2312.12314v1
Compressor summary: The study explores how 2D-object detection algorithms like YOLO can improve road safety for autonomous driving in Austria by detecting and tracking objects in various conditions.
Yinhong Liu,Yixuan Su,Ehsan Shareghi,Nigel Collier
http://arxiv.org/abs/2312.12299v1
Compressor summary: The paper proposes Instruct-SCTG, a framework that uses instruction-tuned language models to generate structurally coherent articles in various domains with section-by-section alignment and measures discourse divergence using a new metric.
Muhammad Suffian,Ulrike Kuhl,Jose M. Alonso-Moral,Alessandro Bogliolo
http://arxiv.org/abs/2312.12290v1
Compressor summary: The paper introduces an AI system (CL-XAI) that helps learners understand how AI models work and evaluates its effectiveness through human feedback.
Junxiang Wang,Guangji Bai,Wei Cheng,Zhengzhang Chen,Liang Zhao,Haifeng Chen
http://arxiv.org/abs/2312.12276v1
Compressor summary: POND is a novel prompt-based deep learning model for multi-source time series domain adaptation, focusing on extracting and leveraging domain-specific meta-data information.
Ilya Zisman,Vladislav Kurenkov,Alexander Nikulin,Viacheslav Sinii,Sergey Kolesnikov
http://arxiv.org/abs/2312.12275v1
Compressor summary: $AD^{\epsilon}$ is a method that improves in-context learning with suboptimal human demonstrations by gradually introducing noise into them.
Peter Kocsis,Vincent Sitzmann,Matthias Nießner
http://arxiv.org/abs/2312.12274v1
Compressor summary: Intrinsic Image Diffusion is a model that generates multiple realistic material explanations for indoor scenes, using probabilistic methods and learned priors from real images to overcome challenges in appearance decomposition.
Chun-Mei Feng,Yang Bai,Tao Luo,Zhen Li,Salman Khan,Wangmeng Zuo,Xinxing Xu,Rick Siow Mong Goh,Yong Liu
http://arxiv.org/abs/2312.12273v1
Compressor summary: The paper proposes a VQA-based post-processing approach (VQA4CIR) to improve Composed Image Retrieval by detecting and correcting inconsistent results with captions using LLM and LVLM fine-tuning.
Gloria Araiza-Illan,Luke Meyer,Khiet P. Truong,Deniz Baskent
http://arxiv.org/abs/2312.12269v1
Compressor summary: The text proposes an automated DIN test built on the Kaldi-NL toolkit that evaluates spoken responses without a human supervisor, and assesses its performance in two studies.
Steven Mortier,Amir Hamedpour,Bart Bussmann,Ruth Phoebe Tchana Wandji,Steven Latré,Bjarni D. Sigurdsson,Tom De Schepper,Tim Verdonck
http://arxiv.org/abs/2312.12258v1
Compressor summary: This study shows that soil temperature affects the start and peak of the growing season in subarctic grasslands, while other factors like air temperature, precipitation, and irradiance also play a role in vegetation phenology.
Jiayu Chen,Guosheng Li,Chao Yu,Xinyi Yang,Botian Xu,Huazhong Yang,Yu Wang
http://arxiv.org/abs/2312.12255v1
Compressor summary: The paper presents TaskFlex Solver, a reinforcement learning and curriculum learning method for multi-agent pursuit problems in complex environments, which achieves high capture rates and adapts to various task conditions.
Demircan Tas,Rohit Priyadarshi Sanatani
http://arxiv.org/abs/2312.12253v1
Compressor summary: The authors develop a sentiment analysis model for urban environments that can extract specific aspects and their sentiments from crowdsourced reviews of public parks, and show its improvement in prediction accuracy using BERT with LCF.
Idris Hamoud,Muhammad Abdullah Jamal,Vinkle Srivastav,Didier Mutter,Nicolas Padoy,Omid Mohareri
http://arxiv.org/abs/2312.12250v1
Compressor summary: The paper introduces a new object-based approach to recognize surgical activities in the OR using geometric arrangements between clinicians and devices, improving efficiency and performance with less data.
Asbjørn Munk,Ao Ma,Mads Nielsen
http://arxiv.org/abs/2312.12246v1
Compressor summary: The MDD-UNet is an unsupervised domain adaptation framework for U-Nets that improves image segmentation performance across different data characteristics using an approach with theoretical guarantees.
Mehran Kazemi,Hamidreza Alvari,Ankit Anand,Jialin Wu,Xi Chen,Radu Soricut
http://arxiv.org/abs/2312.12241v1
Compressor summary: The paper evaluates vision language models' mathematical reasoning abilities on geometry problems and finds they struggle with higher-depth problems requiring long chains of reasoning.
Yue Duan,Zhen Zhao,Lei Qi,Luping Zhou,Lei Wang,Yinghuan Shi
http://arxiv.org/abs/2312.12237v1
Compressor summary: The paper proposes a method to improve semi-supervised learning for fine-grained visual classification by selecting soft labels based on class transition tracking and confidence.
Xinying Zou,Samir M. Perlaza,Iñaki Esnaola,Eitan Altman
http://arxiv.org/abs/2312.12236v1
Compressor summary: The paper introduces a new tool to measure generalization in ML algorithms using the worst-case probability measure, which is a Gibbs probability measure.
Lingjun Zhang,Xinyuan Chen,Yaohui Wang,Yue Lu,Yu Qiao
http://arxiv.org/abs/2312.12232v1
Compressor summary: Diff-Text is a training-free framework that uses Stable Diffusion to generate realistic images with text in any language, improving text recognition and background blending.
Gaoge Han,Shaoli Huang,Mingming Gong,Jinglei Tang
http://arxiv.org/abs/2312.12227v1
Compressor summary: HuTuMotion is a new method that leverages limited human feedback to adapt the prior distribution in latent diffusion models, improving the quality of generated natural human motions.
Satoki Ishikawa,Ryo Karakida
http://arxiv.org/abs/2312.12226v1
Compressor summary: The study proposes a specific parameterization for second-order optimization that enhances feature learning and allows transferring hyperparameters across different network widths.
Alonso Urbano,David W. Romero
http://arxiv.org/abs/2312.12223v1
Compressor summary: The paper proposes a method to detect different levels of symmetry in data without labels, improving generalization and robustness of models.
Junjue Wang,Zhuo Zheng,Zihang Chen,Ailong Ma,Yanfei Zhong
http://arxiv.org/abs/2312.12222v1
Compressor summary: The text introduces EarthVQA, a multi-modal multi-task dataset for advancing relational reasoning in Earth vision, and SOBA, a framework that leverages object semantics and relations for VQA.
Tobias Hyrup,Anton Danholt Lautrup,Arthur Zimek,Peter Schneider-Kamp
http://arxiv.org/abs/2312.12216v1
Compressor summary: The text discusses the need for a common framework to evaluate the privacy of synthetic data in healthcare, proposing four principles (CAIR) and a rubric to assess existing metrics.
Jie Qiao,Zhengming Chen,Jianhua Yu,Ruichu Cai,Zhifeng Hao
http://arxiv.org/abs/2312.12206v1
Compressor summary: The paper investigates how to learn causal structure from data with self-masking missingness, which makes existing methods fail, using an additive noise model, and proposes a practical algorithm based on theoretical results.
Yong Xien Chng,Henry Zheng,Yizeng Han,Xuchong Qiu,Gao Huang
http://arxiv.org/abs/2312.12198v1
Compressor summary: The text introduces MagNet, a novel method for referring image segmentation that uses mask grounding and cross-modal alignment to improve visual grounding and correspondence between language and images.
Dongwei Ye,Mengwu Guo
http://arxiv.org/abs/2312.12193v1
Compressor summary: The paper proposes a new Bayesian method to learn nonlinear dynamics from time series data without explicitly calculating state derivatives, which can improve the accuracy of learned models when data is scarce or noisy.
Chenyu Sun,Hangwei Qian,Chunyan Miao
http://arxiv.org/abs/2312.12191v1
Compressor summary: The paper proposes a Curiosity-driven Unsupervised Data Collection method that improves multi-task offline reinforcement learning by expanding the feature space with adaptive temporal distances and collecting higher-quality data.
Yuecen Wei,Haonan Yuan,Xingcheng Fu,Qingyun Sun,Hao Peng,Xianxian Li,Chunming Hu
http://arxiv.org/abs/2312.12183v1
Compressor summary: PoinDP is a framework that uses hyperbolic geometry to protect hierarchical graph data from inference attacks and privacy leaks, while preserving performance in node classification tasks.
Jose L. Gómez,Manuel Silva,Antonio Seoane,Agnès Borrás,Mario Noriega,Germán Ros,Jose A. Iglesias-Guitian,Antonio M. López
http://arxiv.org/abs/2312.12176v1
Compressor summary: UrbanSyn is a high-quality synthetic urban driving dataset that enhances existing ones for unsupervised domain adaptation in image semantic segmentation.
Zhihang Liu,Jun Li,Hongtao Xie,Pandeng Li,Jiannan Ge,Sun-Ao Liu,Guoqing Jin
http://arxiv.org/abs/2312.12155v1
Compressor summary: MESM is a novel framework for improving video moment retrieval by enhancing video and text features to balance the modality gap between videos and queries.
Lingling Xu,Haoran Xie,Si-Zhao Joe Qin,Xiaohui Tao,Fu Lee Wang
http://arxiv.org/abs/2312.12148v1
Compressor summary: This paper reviews parameter efficient fine-tuning (PEFT) methods for pretrained language models, discussing their applications and future directions, and conducting experiments to understand their effectiveness.
Jinyi Liu,Zhi Wang,Yan Zheng,Jianye Hao,Chenjia Bai,Junjie Ye,Zhen Wang,Haiyin Piao,Yang Sun
http://arxiv.org/abs/2312.12145v1
Compressor summary: OVD-Explorer is a new method for noisy environment exploration in continuous control RL that balances optimism with over-exploration mitigation.
Siran Chen,Yue Ma,Yu Qiao,Yali Wang
http://arxiv.org/abs/2312.12144v1
Compressor summary: The paper proposes a Masked Bird-Eye-View (M-BEV) perception framework that improves robustness to failed camera views by randomly masking and reconstructing them in end-to-end training, achieving significant performance gains on the NuScenes benchmark.
Akshat Kishore Shrivastava,Tapan Kumar Gandhi
http://arxiv.org/abs/2312.12143v1
Compressor summary: The paper proposes a new method for waste classification that simulates nystagmus, a biological phenomenon affecting human vision, improving the accuracy of the Vision Transformer model by 2%.
Zhenhua Yang,Dezhi Peng,Yuxin Kong,Yuyi Zhang,Cong Yao,Lianwen Jin
http://arxiv.org/abs/2312.12142v1
Compressor summary: FontDiffuser is a diffusion-based method that improves font generation by combining global and local content cues, handling large style variations, and learning style representation from images.
Zeping Yu,Kailai Yang,Zhiwei Liu,Sophia Ananiadou
http://arxiv.org/abs/2312.12141v1
Compressor summary: This paper explores residual connections in transformers to better understand how they store and merge knowledge for language modeling, and proposes a method to analyze the influence of previous layers.
Po-An Wang,Ruo-Chun Tzeng,Alexandre Proutiere
http://arxiv.org/abs/2312.12137v1
Compressor summary: The paper studies how to identify the best arm in stochastic Multi-Armed Bandits with a fixed sampling budget, and proposes a new adaptive algorithm that outperforms existing ones using Large Deviation techniques.
Wooju Lee,Dasol Hong,Hyungtae Lim,Hyun Myung
http://arxiv.org/abs/2312.12133v1
Compressor summary: The paper proposes an object-aware domain generalization method for single-domain generalization in object detection, using data augmentation and training strategy to improve object localization and classification.
Shuli Wang,Kun Gao,Lanfang Zhang,Yang Liu,Lei Chen
http://arxiv.org/abs/2312.12123v1
Compressor summary: The paper proposes a framework that combines Mixture Density Networks and LSTM-based encoder-decoder networks to predict vehicle trajectories considering driver heterogeneity, achieving better predictions than existing models.
Xiang Feng,Yongbo He,Yubo Wang,Chengkai Wang,Zhenzhong Kuang,Jiajun Ding,Feiwei Qin,Jun Yu,Jianping Fan
http://arxiv.org/abs/2312.12122v1
Compressor summary: The paper proposes a zero-shot super-resolution framework for NeRF that uses internal learning to synthesize high-quality high-resolution novel views from low-resolution training data, without requiring external high-resolution data or additional scene information.
Susanne Hindennach,Lei Shi,Filip Miletić,Andreas Bulling
http://arxiv.org/abs/2312.12119v1
Compressor summary: The study examines how mind-attributing explanations in AI research affect perceptions of AI awareness and responsibility, finding that they can conceal AI responsibility from users.
Gwladys Kelodjou,Laurence Rozé,Véronique Masson,Luis Galárraga,Romaric Gaudel,Maurice Tchuente,Alexandre Termier
http://arxiv.org/abs/2312.12115v1
Compressor summary: The paper proposes two improvements for Kernel SHAP, a post-hoc explainability method for black-box machine learning models: (1) making it fully stable by changing its neighbor selection procedure, and (2) using the coalitions of Layer 1 for faster and more meaningful feature attribution.
Nabeel Seedat,Nicolas Huynh,Boris van Breugel,Mihaela van der Schaar
http://arxiv.org/abs/2312.12112v1
Compressor summary: CLLM is a method that uses large language models to generate and curate high-quality augmented datasets for machine learning tasks in low-data settings, improving performance compared to conventional generators.
Xiangyu Liu,Yang Liu,Wei Hu
http://arxiv.org/abs/2312.12108v1
Compressor summary: The paper proposes a new model, CCA, that uses both textual and graph information to better detect errors in knowledge graphs, especially noisy triplets with similar correct ones.
Yao Rong,Peizhu Qian,Vaibhav Unhelkar,Enkelejda Kasneci
http://arxiv.org/abs/2312.12102v1
Compressor summary: I-CEE is a framework for explaining image classification models to users based on their expertise level, using example images and local explanations tailored to each user.
Jaeyeul Kim,Jungwan Woo,Jeonghoon Kim,Sunghoon Im
http://arxiv.org/abs/2312.12098v1
Compressor summary: The paper introduces DDFE, a module that extracts features related to LiDAR point cloud density and improves domain generalization for 3D perception tasks.
Chunjie Luo,Fei Luo,Yusen Wang,Enxu Zhao,Chunxia Xiao
http://arxiv.org/abs/2312.12096v1
Compressor summary: DLCA-Recon is a method to create human avatars from monocular videos, using physical connection information and dynamic deformation fields to accurately model loose clothing movement.
Haodong Yan,Zhiming Hu,Syn Schmitt,Andreas Bulling
http://arxiv.org/abs/2312.12090v1
Compressor summary: GazeMoDiff is a new model that uses eye gaze to guide the generation of realistic human body motions for virtual reality applications, outperforming existing methods.
James Hong,Lu Yuan,Michaël Gharbi,Matthew Fisher,Kayvon Fatahalian
http://arxiv.org/abs/2312.12080v1
Compressor summary: GenCrop is a weakly-supervised method that learns subject-aware cropping from stock images and text-to-image diffusion models, achieving competitive results with supervised methods.
Wengang Guo,Jiayi Yang,Huilin Yin,Qijun Chen,Wei Ye
http://arxiv.org/abs/2312.12068v1
Compressor summary: PICNN is a novel method that improves the interpretability and performance of CNNs by clustering filters into class-specific groups using Bernoulli sampling and a reparameterization trick.
Nai-Chieh Huang,Ping-Chun Hsieh,Kuo-Hao Ho,I-Chen Wu
http://arxiv.org/abs/2312.12065v1
Compressor summary: The paper provides the first global convergence results for a variant of the Proximal Policy Optimization algorithm with clipping, using new analysis techniques and showing that the clipping range affects only the convergence constant.
Marian-Leontin Pop,Levente Tamas
http://arxiv.org/abs/2312.12064v1
Compressor summary: The paper proposes using Feature Pyramid Networks to reduce multipath inference artifacts in pulse-based ToF cameras, improving surface detection on planar surfaces.
Lena G. M. Bauer,Collin Leiber,Christian Böhm,Claudia Plant
http://arxiv.org/abs/2312.12050v1
Compressor summary: The paper proposes a sigmoid function to replace look-up tables for transforming Dip-values to Dip-p-values, improving computation speed and integration with learning algorithms.
Alexander Nikulin,Vladislav Kurenkov,Ilya Zisman,Artem Agarkov,Viacheslav Sinii,Sergey Kolesnikov
http://arxiv.org/abs/2312.12044v1
Compressor summary: XLand-MiniGrid is a JAX-based library for meta-reinforcement learning research with scalable grid-worlds and diverse tasks.
Zhiming Hu,Jiahui Xu,Syn Schmitt,Andreas Bulling
http://arxiv.org/abs/2312.12042v1
Compressor summary: The paper analyzes human eye and body movement coordination during everyday activities, and proposes Pose2Gaze, a model that generates realistic eye movements from full-body poses, outperforming existing methods.
Sichao Xiong,Yigit Ihlamur
http://arxiv.org/abs/2312.12037v1
Compressor summary: The research develops a new way to evaluate how well startup founders match their ideas using advanced language models, suggesting that each idea's success depends on the founder's background.
Jiachun Pan,Hanshu Yan,Jun Hao Liew,Jiashi Feng,Vincent Y. F. Tan
http://arxiv.org/abs/2312.12030v1
Compressor summary: SAG is a training-free guidance technique for diffusion models that uses the symplectic adjoint method to accurately estimate clean images and generate high-quality images and videos.
Siamul Karim Khan,Patrick Tinsley,Mahsa Mitcheff,Patrick Flynn,Kevin W. Bowyer,Adam Czajka
http://arxiv.org/abs/2312.12028v1
Compressor summary: The paper presents a method to create realistic iris images that change size and preserve identity, improving iris recognition and forensic analysis.
DaLuo,Yanglei Gan,Rui Hou,Run Lin,Qiao Liu,Yuxiang Cai,Wannian Gao
http://arxiv.org/abs/2312.12021v1
Compressor summary: The paper proposes a novel framework for few-shot relation extraction that uses sentence-anchored and label-anchored contrastive losses to learn robust and uniform representations from incomplete instance-label pairs.
Marcel Boersma,Krishna Manoorkar,Alessandra Palmigiano,Mattia Panettiere,Apostolos Tzimoulis,Nachoem Wijnberg
http://arxiv.org/abs/2312.12010v1
Compressor summary: The paper presents a method to categorize business processes using bipartite graphs, formal concept analysis, and Dempster-Shafer theory to obtain explainable results for auditing purposes.
Top Piriyakulkij,Volodymyr Kuleshov,Kevin Ellis
http://arxiv.org/abs/2312.12009v1
Compressor summary: The paper proposes an algorithm to help large language models ask better questions and infer user preferences more efficiently in interactive systems, improving task performance and reducing user interactions.
Md. Rafiul Biswas,Ashhadul Islam,Zubair Shah,Wajdi Zaghouani,Samir Brahim Belhaouari
http://arxiv.org/abs/2312.12006v1
Compressor summary: This study evaluates a fine-tuned ChatGPT model as a personal medical assistant in Arabic, using online datasets and human evaluation metrics.
Leander van den Heuvel,Gertjan Burghouts,David W. Zhang,Gwenn Englebienne,Sabina B. van Rooij
http://arxiv.org/abs/2312.12000v1
Compressor summary: The text proposes a diffusion model that refines bounding boxes for object detection, improves performance, and uses the results for semi-supervised learning without human involvement.
Zhuowei Zhang,Mengting Hu,Yinhao Bai,Zhen Zhang
http://arxiv.org/abs/2312.11997v1
Compressor summary: The paper proposes a coreference-guided mind-map generation network (CMGN) that uses a coreference graph to capture structural information and improve the understanding of document logic and semantics, achieving better performance than existing methods.
Korrawe Karunratanakul,Konpat Preechakul,Emre Aksan,Thabo Beeler,Supasorn Suwajanakorn,Siyu Tang
http://arxiv.org/abs/2312.11994v1
Compressor summary: DNO is a new method that uses existing motion diffusion models to optimize motion-related tasks without training new models or relying on complex algorithms.
Hongyin Zhu,Prayag Tiwari
http://arxiv.org/abs/2312.11985v1
Compressor summary: The paper introduces a framework to assess climate crisis knowledge in large language models using diverse questions and comprehensive metrics, revealing gaps in their up-to-date information.
Yongqi An,Xu Zhao,Tao Yu,Ming Tang,Jinqiao Wang
http://arxiv.org/abs/2312.11983v1
Compressor summary: FLAP is a novel retraining-free structured pruning framework for large language models that reduces storage, enhances inference speed, and outperforms existing methods without retraining.
Dongmin Kim,Sunghyun Park,Jaegul Choo
http://arxiv.org/abs/2312.11976v1
Compressor summary: The paper proposes a method for detecting anomalies in time-series data that adapts to changing normalities over time and improves performance on real-world benchmarks.
Haeyong Kang,Jaehong Yoon,Sung Ju Hwang,Chang D. Yoo
http://arxiv.org/abs/2312.11973v1
Compressor summary: The paper introduces Winning Subnetworks (WSN), a method that uses reused weights in dense networks to improve continual learning, and proposes the Fourier Subneural Operator (FSO) to address limitations in video incremental learning.
Pengxiang Ding,Qiongjie Cui,Min Zhang,Mengyuan Liu,Haofan Wang,Donglin Wang
http://arxiv.org/abs/2312.11972v1
Compressor summary: The text proposes a novel framework that predicts both coarse and fine-grained human activities collaboratively, achieving state-of-the-art performance on a large-scale benchmark.
Chen Gao,Xiaochong Lan,Nian Li,Yuan Yuan,Jingtao Ding,Zhilun Zhou,Fengli Xu,Yong Li
http://arxiv.org/abs/2312.11970v1
Compressor summary: This paper reviews how large language models can improve agent-based modeling and simulation, exploring their challenges and applications in various domains.
Anubha Pandey,Aditi Rai,Maneet Singh,Deepak Bhatt,Tanmoy Bhowmik
http://arxiv.org/abs/2312.11969v1
Compressor summary: The paper proposes a new method to reduce bias in automated prediction algorithms by mixing feature statistics across different groups based on protected attributes, improving fairness metrics with minimal accuracy loss.
Wei Tang,Liang Li,Xuejing Liu,Lu Jin,Jinhui Tang,Zechao Li
http://arxiv.org/abs/2312.11967v1
Compressor summary: Our proposed framework for visual grounding uses context disentangling and prototype inheriting to improve discrimination and generalization, achieving state-of-the-art results on standard and open-vocabulary scenes.
Nan Jiang,Md Nasim,Yexiang Xue
http://arxiv.org/abs/2312.11955v1
Compressor summary: Vertical Symbolic Regression is a method to speed up AI-driven scientific discovery by fitting simple expressions involving a few independent variables at a time and gradually adding more variables, resulting in a significantly smaller search space than horizontal approaches.
Huafeng Qin,Xin Jin,Yun Jiang,Mounim A. El-Yacoubi,Xinbo Gao
http://arxiv.org/abs/2312.11954v1
Compressor summary: AdAutomixup is an adversarial automatic mixup augmentation method that generates diverse and challenging mixed samples to improve image classification accuracy by alternatively optimizing a mixed example generator and a target classifier.
Collin Leiber,Dominik Mautz,Claudia Plant,Christian Böhm
http://arxiv.org/abs/2312.11952v1
Compressor summary: The paper presents a framework using the Minimum Description Length Principle (MDL) to automatically find the number of subspaces and clusters in high-dimensional datasets, with efficient procedures for splitting, merging, and outlier detection.
Rui Liu,Yifan Hu,Yi Ren,Xiang Yin,Haizhou Li
http://arxiv.org/abs/2312.11947v1
Compressor summary: The paper proposes ECSS, a novel model that enhances emotion understanding using a heterogeneous graph-based mechanism and achieves emotion rendering with contrastive learning, improving emotional CSS performance.
Haowei Du,Dingyu Zhang,Chen Li,Yang Li,Dongyan Zhao
http://arxiv.org/abs/2312.11945v1
Compressor summary: The authors propose a new framework for incomplete utterance rewriting that captures multi-granularity semantic information, selects relevant context, and constructs an edit matrix, achieving state-of-the-art results on two benchmark datasets.
Xiyuan Jin,Jing Wang,Lei Liu,Youfang Lin
http://arxiv.org/abs/2312.11939v1
Compressor summary: This study introduces a modification to time-series contrastive learning that addresses false negatives and class imbalance issues, improving representation learning for minority classes using instance graphs and semi-supervised classification.
Yuang Liu,Jing Wang,Qiang Zhou,Fan Wang,Jun Wang,Wei Zhang
http://arxiv.org/abs/2312.11938v1
Compressor summary: DMT is a pretraining method that uses multiple self-supervised models to improve general visual representations and achieve state-of-the-art results on classification and dense tasks.
Yuyang Xia,Shuncheng Liu,Quanlin Yu,Liwei Deng,You Zhang,Han Su,Kai Zheng
http://arxiv.org/abs/2312.11935v1
Compressor summary: AUTO is a deep reinforcement learning framework for autonomous vehicles that handles complex environments and considers the impact on surrounding vehicles, leading to improved safety, efficiency, and comfort.
Wei Chen,Zhiyi Huang,Ruichu Cai,Zhifeng Hao,Kun Zhang
http://arxiv.org/abs/2312.11934v1
Compressor summary: The text describes a novel method to identify causal edges between observed variables using higher-order cumulants and latent variable influence, with an asymmetry criterion to determine the causal direction.
Yujie Li,Zezhi Shao,Yongjun Xu,Qiang Qiu,Zhaogang Cao,Fei Wang
http://arxiv.org/abs/2312.11933v1
Compressor summary: The paper proposes a novel dynamic frequency domain graph convolution network (DFDGCN) for traffic prediction that captures spatial dependencies, mitigates time-shift effects, and handles noise in data.
Pengwei Yan,Kaisong Song,Zhuoren Jiang,Yangyang Kang,Tianqianjin Lin,Changlong Sun,Xiaozhong Liu
http://arxiv.org/abs/2312.11927v1
Compressor summary: DGPM is a new method that improves graph pretraining by discovering and using significant graph motifs for better representation learning and transferability.
Yulai Cong,Sijia Li
http://arxiv.org/abs/2312.11926v1
Compressor summary: The Big Learning EM algorithm improves mixture model training by using a foundation model approach to avoid bad local optima and achieve optimal results.
Xiaomeng Yang,Zhi Qiao,Yu Zhou,Weiping Wang
http://arxiv.org/abs/2312.11923v1
Compressor summary: This paper presents a fast and accurate scene text recognition method using a parallel and iterative decoder with an easy-first strategy and discrete diffusion for bidirectional context exploration.
Haowei Du,Quzhe Huang,Chen Li,Chen Zhang,Yang Li,Dongyan Zhao
http://arxiv.org/abs/2312.11922v1
Compressor summary: The paper proposes a new method for answering questions in knowledge graphs using dual relation graphs that improve the representations of entities and relations and achieve better performance.
Chen Li
http://arxiv.org/abs/2312.11920v1
Compressor summary: The paper proposes a novel TTS method using large language models and prompt learning to disambiguate polyphonic characters in Mandarin Chinese.
Ganesh Bikshandi,Jay Shah
http://arxiv.org/abs/2312.11918v1
Compressor summary: The paper presents an optimized implementation of the FlashAttention-2 attention algorithm on NVIDIA Hopper GPUs using the CUTLASS library, achieving 20-50% higher FLOPs/s than the previous Ampere version.
Weipeng Guan,Peiyu Chen,Huibin Zhao,Yu Wang,Peng Lu
http://arxiv.org/abs/2312.11911v1
Compressor summary: EVI-SAM is a novel event-based hybrid tracking framework that exploits the motion-activated sensors of event cameras to track 6-DoF pose and reconstruct 3D scenes with high accuracy, robustness, and efficiency.
Jie Liu,Yijia Cao,Yong Li,Yixiu Guo,Wei Deng
http://arxiv.org/abs/2312.11898v1
Compressor summary: The study introduces a new method (Attention-GCN-LSTM) that uses graph convolutional networks, long short-term memory, and attention to predict line loss rates accurately across multiple horizons in distribution networks.
Bruno Korbar,Yongqin Xian,Alessio Tonioni,Andrew Zisserman,Federico Tombari
http://arxiv.org/abs/2312.11897v1
Compressor summary: The paper introduces a text-conditioned video resampler module that uses a frozen visual encoder and a large language model to process long videos for various tasks, achieving state-of-the-art results on several benchmarks.
Mosam Dabhi,Laszlo A. Jeni,Simon Lucey
http://arxiv.org/abs/2312.11894v1
Compressor summary: The 3D Lifting Foundation Model (3D-LFM) is a transformer-based approach that can reconstruct various object classes from 2D landmarks, overcoming limitations of traditional methods and handling occlusions and perspectives.
Unggi Lee,Sungjun Yoon,Joon Seo Yun,Kyoungsoo Park,YoungHoon Jung,Damji Stratton,Hyeoncheol Kim
http://arxiv.org/abs/2312.11890v1
Compressor summary: The paper introduces new methods to improve knowledge tracing models by considering question and concept difficulty levels using contrastive learning and a large language model for prediction.
Ziqian Zeng,Yihuai Hong,Hongliang Dai,Huiping Zhuang,Cen Chen
http://arxiv.org/abs/2312.11882v1
Compressor summary: ConsistentEE is a reinforcement learning-based early exiting method for efficient inference that ensures correct prediction by one internal classifier and adapts to instance difficulty.
Taehong Jang,Joonmo Ahn,Sojung Lucia Kim
http://arxiv.org/abs/2312.11881v1
Compressor summary: The authors developed models to predict punctuation and spacing for Korean historical texts, achieving good results and enabling fast inference on low-performance GPUs.
Alperen Enes Bayar,Ufuk Uyan,Elif Toprak,Cao Yuheng,Tang Juncheng,Ahmet Alp Kindiroglu
http://arxiv.org/abs/2312.11880v1
Compressor summary: The paper applies RandLA-Net with transfer learning and class remapping to segment 3D point cloud data in urban areas of three Chinese cities, overcoming data scarcity and achieving over 80% F1 score for each city.
Weixi Song,Zuchao Li,Lefei Zhang,Hai Zhao,Bo Du
http://arxiv.org/abs/2312.11875v1
Compressor summary: The paper investigates the loss landscape transition in fine-tuning pre-trained models, proposes Sparse Increment Fine-Tuning (SIFT) algorithm to exploit sparsity in gradients for efficient adaptation, and shows its effectiveness on various tasks.
Yanqi Ge,Qiang Nie,Ye Huang,Yong Liu,Chengjie Wang,Feng Zheng,Wen Li,Lixin Duan
http://arxiv.org/abs/2312.11872v1
Compressor summary: The paper proposes Semantic Anchor Regularization, a method that uses pre-defined class anchors to guide feature learning and achieve compactness within classes and separability between classes, while avoiding biases from long-tailed data.
Zizhong Li,Haopeng Zhang,Jiawei Zhang
http://arxiv.org/abs/2312.11870v1
Compressor summary: The paper proposes an augmented fake news dataset using ChatGPT for fact-checking, which can help reduce bias and improve detection compared to relying on human journalists alone.
Kaiyi Zhang,Yang Chen,Ximing Yang,Weizhong Zhang,Cheng Jin
http://arxiv.org/abs/2312.11867v1
Compressor summary: The text proposes a four-stage process for point cloud part editing and introduces SGAS, a model that uses feature disentanglement and constraint strategies to improve diversity, fidelity, and quality.
Weiyu Ma,Qirui Mi,Xue Yan,Yuqiao Wu,Runji Lin,Haifeng Zhang,Jun Wang
http://arxiv.org/abs/2312.11865v1
Compressor summary: The paper presents TextStarCraft II, a textual StarCraft II environment, and a Chain of Summarization method that together enable large language model agents to play the game effectively; the agents master StarCraft II knowledge, achieve performance comparable to or better than average players, and defeat the built-in AI.
Di Wu,Yuling Jiao,Li Shen,Haizhao Yang,Xiliang Lu
http://arxiv.org/abs/2312.11863v1
Compressor summary: This paper analyzes the theoretical guarantees of deep reinforcement learning in offline decision-making scenarios, considering general neural network approximation and various data properties.
Karthikeyan Natesan Ramamurthy,Aldo Guzmán-Sáenz,Mustafa Hajij
http://arxiv.org/abs/2312.11862v1
Compressor summary: Topo-MLP is a fast and robust MLP-based algorithm to learn representations in higher order network models using a novel HONC loss that implicitly incorporates the simplicial structure.
Boshi Tang,Zhiyong Wu,Xixin Wu,Qiaochu Huang,Jun Chen,Shun Lei,Helen Meng
http://arxiv.org/abs/2312.11858v1
Compressor summary: The paper proposes a novel GNN calibration framework, SimCalib, that considers nodewise similarity at global and local levels to improve performance in cost-sensitive scenarios.
Jiarong Guo,Xiaogang Xu,Hengshuang Zhao
http://arxiv.org/abs/2312.11856v1
Compressor summary: The paper proposes a self-supervised learning technique for 3D-GANs that uses an encoder and a cyclic constraint to improve the quality of 3D geometrical modeling.
Zheng Wei Lim,Ekaterina Vylomova,Charles Kemp,Trevor Cohn
http://arxiv.org/abs/2312.11852v1
Compressor summary: The study uses data from human translators and a machine translation model to explore how surprisal and attention affect translation difficulty and time spent on translation tasks.
Xinshun Wang,Qiongjie Cui,Chen Chen,Mengyuan Liu
http://arxiv.org/abs/2312.11850v1
Compressor summary: The paper introduces Universal Graph Convolution (UniGC), a new graph convolution concept that subsumes different graph convolutions as special cases, and GCNext, a novel GCN-building paradigm that dynamically selects the best-fitting graph convolutions for human motion prediction, achieving state-of-the-art performance with up to 9x lower computational cost.
Guangming Liu,Qi Liu,Jing Liang,Quanying Sun
http://arxiv.org/abs/2312.11849v1
Compressor summary: The paper presents a new variational active contour model based on Aubert-Aujol denoising that can segment images with multiplicative gamma noise, and proposes two fast fixed point algorithms to solve the problem efficiently.
Avinandan Bose,Mihaela Curmei,Daniel L. Jiang,Jamie Morgenstern,Sarah Dean,Lillian J. Ratliff,Maryam Fazel
http://arxiv.org/abs/2312.11846v1
Compressor summary: The paper proposes a randomized algorithm to learn user preferences for multiple subpopulations with heterogeneous data distributions, and shows that it achieves near-optimal results in terms of total loss.
Chaojian Li,Bichen Wu,Peter Vajda,Yingyan,Lin
http://arxiv.org/abs/2312.11841v1
Compressor summary: MixRT is a novel NeRF representation that enables real-time rendering on edge devices using a low-quality mesh, a displacement map, and a compressed model.
Zeinab Zakani,Hadi Moradi,Sogand Ghasemzadeh,Maryam Riazi,Fatemeh Mortazavi
http://arxiv.org/abs/2312.11832v1
Compressor summary: FishFinder is a video game that can accurately identify ADHD in children by measuring their attention, impulsivity, and hyperactivity during gameplay.
Yacine Izza,Kuldeep S. Meel,Joao Marques-Silva
http://arxiv.org/abs/2312.11831v1
Compressor summary: The paper presents efficient algorithms to compute approximate probabilistic abductive explanations for machine learning, addressing their theoretical and practical complexity.
Burak Aksar,Yara Rizk,Tathagata Chakraborti
http://arxiv.org/abs/2312.11828v1
Compressor summary: The paper proposes an efficient algorithm for parsing and orchestrating multi-intent inputs in decentralized NLU for chatbots in multi-agent systems, achieving high accuracy and speed.
Yufei Cai,Yuxiang Wei,Zhilong Ji,Jinfeng Bai,Hu Han,Wangmeng Zuo
http://arxiv.org/abs/2312.11826v1
Compressor summary: DETEX is a novel text-to-image generation approach that learns disentangled concept embeddings with multiple word embeddings and attribute mappers, achieving better representation and editability of the target concept.
Youshao Xiao,Weichang Wu,Zhenglei Zhou,Fagui Mao,Shangchun Zhao,Lin Ju,Lei Liang,Xiaolu Zhang,Jun Zhou
http://arxiv.org/abs/2312.11819v1
Compressor summary: The paper presents an adaptive model placement framework for distributed RLHF training with two strategies, Interleaving and Separation, that improves throughput, reduces memory redundancy and communication costs, and significantly outperforms state-of-the-art approaches in various scenarios.
Phuoc Nguyen,Truyen Tran,Sunil Gupta,Thin Nguyen,Svetha Venkatesh
http://arxiv.org/abs/2312.11818v1
Compressor summary: The paper proposes a noisy functional causal model for identifying root causes of anomalies in causal processes by considering both node and edge contributions, using Bayesian learning and inference methods and an efficient gradient-based attribution method.
Shezheng Song,Shan Zhao,Chengyu Wang,Tianwei Yan,Shasha Li,Xiaoguang Mao,Meng Wang
http://arxiv.org/abs/2312.11816v1
Compressor summary: The paper presents DWE, a Dual-way Enhanced framework for multimodal entity linking that refines queries with multimodal data, leverages fine-grained image attributes, and enriches entity semantics with Wikipedia descriptions, outperforming existing methods on three benchmarks.
Fengli Xu,Jun Zhang,Chen Gao,Jie Feng,Yong Li
http://arxiv.org/abs/2312.11813v1
Compressor summary: The paper introduces Urban Generative Intelligence (UGI), a platform that uses large language models to create embodied agents for various urban tasks, simulating and addressing complex urban systems.
Mahmoud SalahEldin Kasem,Mohamed Mahmoud,Hyun-Soo Kang
http://arxiv.org/abs/2312.11812v1
Compressor summary: This paper reviews applications, methodologies, and challenges of Arabic Optical Character Recognition (OCR) and identifies research gaps to guide future development.
Gemini Team,Rohan Anil,Sebastian Borgeaud,Yonghui Wu,Jean-Baptiste Alayrac,Jiahui Yu,Radu Soricut,Johan Schalkwyk,Andrew M. Dai,Anja Hauth,Katie Millican,David Silver,Slav Petrov,Melvin Johnson,Ioannis Antonoglou,Julian Schrittwieser,Amelia Glaese,Jilin Chen,Emily Pitler,Timothy Lillicrap,Angeliki Lazaridou,Orhan Firat,James Molloy,Michael Isard,Paul R. Barham,Tom Hennigan,Benjamin Lee,Fabio Viola,Malcolm Reynolds,Yuanzhong Xu,Ryan Doherty,Eli Collins,Clemens Meyer,Eliza Rutherford,Erica Moreira,Kareem Ayoub,Megha Goel,George Tucker,Enrique Piqueras,Maxim Krikun,Iain Barr,Nikolay Savinov,Ivo Danihelka,Becca Roelofs,Anaïs White,Anders Andreassen,Tamara von Glehn,Lakshman Yagati,Mehran Kazemi,Lucas Gonzalez,Misha Khalman,Jakub Sygnowski,Alexandre Frechette,Charlotte Smith,Laura Culp,Lev Proleev,Yi Luan,Xi Chen,James Lottes,Nathan Schucher,Federico Lebron,Alban Rrustemi,Natalie Clay,Phil Crone,Tomas Kocisky,Jeffrey Zhao,Bartek Perz,Dian Yu,Heidi Howard,Adam Bloniarz,Jack W. Rae,Han Lu,Laurent Sifre,Marcello Maggioni,Fred Alcober,Dan Garrette,Megan Barnes,Shantanu Thakoor,Jacob Austin,Gabriel Barth-Maron,William Wong,Rishabh Joshi,Rahma Chaabouni,Deeni Fatiha,Arun Ahuja,Ruibo Liu,Yunxuan Li,Sarah Cogan,Jeremy Chen,Chao Jia,Chenjie Gu,Qiao Zhang,Jordan Grimstad,Ale Jakse Hartman,Martin Chadwick,Gaurav Singh Tomar,Xavier Garcia,Evan Senter,Emanuel Taropa,Thanumalayan Sankaranarayana Pillai,Jacob Devlin,Michael Laskin,Diego de Las Casas,Dasha Valter,Connie Tao,Lorenzo Blanco,Adrià Puigdomènech Badia,David Reitter,Mianna Chen,Jenny Brennan,Clara Rivera,Sergey Brin,Shariq Iqbal,Gabriela Surita,Jane Labanowski,Abhi Rao,Stephanie Winkler,Emilio Parisotto,Yiming Gu,Kate Olszewska,Yujing Zhang,Ravi Addanki,Antoine Miech,Annie Louis,Laurent El Shafey,Denis Teplyashin,Geoff Brown,Elliot Catt,Nithya Attaluri,Jan Balaguer,Jackie Xiang,Pidong Wang,Zoe Ashwood,Anton Briukhov,Albert Webson,Sanjay Ganapathy,Smit Sanghavi,Ajay Kannan,Ming-Wei Chang,Axel Stjerngren,Josip Djolonga,Yuting 
Sun,Ankur Bapna,Matthew Aitchison,Pedram Pejman,Henryk Michalewski,Tianhe Yu,Cindy Wang,Juliette Love,Junwhan Ahn,Dawn Bloxwich,Kehang Han,Peter Humphreys,Thibault Sellam,James Bradbury,Varun Godbole,Sina Samangooei,Bogdan Damoc,Alex Kaskasoli,Sébastien M. R. Arnold,Vijay Vasudevan,Shubham Agrawal,Jason Riesa,Dmitry Lepikhin,Richard Tanburn,Srivatsan Srinivasan,Hyeontaek Lim,Sarah Hodkinson,Pranav Shyam,Johan Ferret,Steven Hand,Ankush Garg,Tom Le Paine,Jian Li,Yujia Li,Minh Giang,Alexander Neitz,Zaheer Abbas,Sarah York,Machel Reid,Elizabeth Cole,Aakanksha Chowdhery,Dipanjan Das,Dominika Rogozińska,Vitaly Nikolaev,Pablo Sprechmann,Zachary Nado,Lukas Zilka,Flavien Prost,Luheng He,Marianne Monteiro,Gaurav Mishra,Chris Welty,Josh Newlan,Dawei Jia,Miltiadis Allamanis,Clara Huiyi Hu,Raoul de Liedekerke,Justin Gilmer,Carl Saroufim,Shruti Rijhwani,Shaobo Hou,Disha Shrivastava,Anirudh Baddepudi,Alex Goldin,Adnan Ozturel,Albin Cassirer,Yunhan Xu,Daniel Sohn,Devendra Sachan,Reinald Kim Amplayo,Craig Swanson,Dessie Petrova,Shashi Narayan,Arthur Guez,Siddhartha Brahma,Jessica Landon,Miteyan Patel,Ruizhe Zhao,Kevin Villela,Luyu Wang,Wenhao Jia,Matthew Rahtz,Mai Giménez,Legg Yeung,Hanzhao Lin,James Keeling,Petko Georgiev,Diana Mincu,Boxi Wu,Salem Haykal,Rachel Saputro,Kiran Vodrahalli,James Qin,Zeynep Cankara,Abhanshu Sharma,Nick Fernando,Will Hawkins,Behnam Neyshabur,Solomon Kim,Adrian Hutter,Priyanka Agrawal,Alex Castro-Ros,George van den Driessche,Tao Wang,Fan Yang,Shuo-yiin Chang,Paul Komarek,Ross McIlroy,Mario Lučić,Guodong Zhang,Wael Farhan,Michael Sharman,Paul Natsev,Paul Michel,Yong Cheng,Yamini Bansal,Siyuan Qiao,Kris Cao,Siamak Shakeri,Christina Butterfield,Justin Chung,Paul Kishan Rubenstein,Shivani Agrawal,Arthur Mensch,Kedar Soparkar,Karel Lenc,Timothy Chung,Aedan Pope,Loren Maggiore,Jackie Kay,Priya Jhakra,Shibo Wang,Joshua Maynez,Mary Phuong,Taylor Tobin,Andrea Tacchetti,Maja Trebacz,Kevin Robinson,Yash Katariya,Sebastian Riedel,Paige Bailey,Kefan Xiao,Nimesh 
Ghelani,Lora Aroyo,Ambrose Slone,Neil Houlsby,Xuehan Xiong,Zhen Yang,Elena Gribovskaya,Jonas Adler,Mateo Wirth,Lisa Lee,Music Li,Thais Kagohara,Jay Pavagadhi,Sophie Bridgers,Anna Bortsova,Sanjay Ghemawat,Zafarali Ahmed,Tianqi Liu,Richard Powell,Vijay Bolina,Mariko Iinuma,Polina Zablotskaia,James Besley,Da-Woon Chung,Timothy Dozat,Ramona Comanescu,Xiance Si,Jeremy Greer,Guolong Su,Martin Polacek,Raphaël Lopez Kaufman,Simon Tokumine,Hexiang Hu,Elena Buchatskaya,Yingjie Miao,Mohamed Elhawaty,Aditya Siddhant,Nenad Tomasev,Jinwei Xing,Christina Greer,Helen Miller,Shereen Ashraf,Aurko Roy,Zizhao Zhang,Ada Ma,Angelos Filos,Milos Besta,Rory Blevins,Ted Klimenko,Chih-Kuan Yeh,Soravit Changpinyo,Jiaqi Mu,Oscar Chang,Mantas Pajarskas,Carrie Muir,Vered Cohen,Charline Le Lan,Krishna Haridasan,Amit Marathe,Steven Hansen,Sholto Douglas,Rajkumar Samuel,Mingqiu Wang,Sophia Austin,Chang Lan,Jiepu Jiang,Justin Chiu,Jaime Alonso Lorenzo,Lars Lowe Sjösund,Sébastien Cevey,Zach Gleicher,Thi Avrahami,Anudhyan Boral,Hansa Srinivasan,Vittorio Selo,Rhys May,Konstantinos Aisopos,Léonard Hussenot,Livio Baldini Soares,Kate Baumli,Michael B. 
Chang,Adrià Recasens,Ben Caine,Alexander Pritzel,Filip Pavetic,Fabio Pardo,Anita Gergely,Justin Frye,Vinay Ramasesh,Dan Horgan,Kartikeya Badola,Nora Kassner,Subhrajit Roy,Ethan Dyer,Víctor Campos,Alex Tomala,Yunhao Tang,Dalia El Badawy,Elspeth White,Basil Mustafa,Oran Lang,Abhishek Jindal,Sharad Vikram,Zhitao Gong,Sergi Caelles,Ross Hemsley,Gregory Thornton,Fangxiaoyu Feng,Wojciech Stokowiec,Ce Zheng,Phoebe Thacker,Çağlar Ünlü,Zhishuai Zhang,Mohammad Saleh,James Svensson,Max Bileschi,Piyush Patil,Ankesh Anand,Roman Ring,Katerina Tsihlas,Arpi Vezer,Marco Selvi,Toby Shevlane,Mikel Rodriguez,Tom Kwiatkowski,Samira Daruki,Keran Rong,Allan Dafoe,Nicholas FitzGerald,Keren Gu-Lemberg,Mina Khan,Lisa Anne Hendricks,Marie Pellat,Vladimir Feinberg,James Cobon-Kerr,Tara Sainath,Maribeth Rauh,Sayed Hadi Hashemi,Richard Ives,Yana Hasson,YaGuang Li,Eric Noland,Yuan Cao,Nathan Byrd,Le Hou,Qingze Wang,Thibault Sottiaux,Michela Paganini,Jean-Baptiste Lespiau,Alexandre Moufarek,Samer Hassan,Kaushik Shivakumar,Joost van Amersfoort,Amol Mandhane,Pratik Joshi,Anirudh Goyal,Matthew Tung,Andrew Brock,Hannah Sheahan,Vedant Misra,Cheng Li,Nemanja Rakićević,Mostafa Dehghani,Fangyu Liu,Sid Mittal,Junhyuk Oh,Seb Noury,Eren Sezener,Fantine Huot,Matthew Lamm,Nicola De Cao,Charlie Chen,Gamaleldin Elsayed,Ed Chi,Mahdis Mahdieh,Ian Tenney,Nan Hua,Ivan Petrychenko,Patrick Kane,Dylan Scandinaro,Rishub Jain,Jonathan Uesato,Romina Datta,Adam Sadovsky,Oskar Bunyan,Dominik Rabiej,Shimu Wu,John Zhang,Gautam Vasudevan,Edouard Leurent,Mahmoud Alnahlawi,Ionut Georgescu,Nan Wei,Ivy Zheng,Betty Chan,Pam G Rabinovitch,Piotr Stanczyk,Ye Zhang,David Steiner,Subhajit Naskar,Michael Azzam,Matthew Johnson,Adam Paszke,Chung-Cheng Chiu,Jaume Sanchez Elias,Afroz Mohiuddin,Faizan Muhammad,Jin Miao,Andrew Lee,Nino Vieillard,Sahitya Potluri,Jane Park,Elnaz Davoodi,Jiageng Zhang,Jeff Stanway,Drew Garmon,Abhijit Karmarkar,Zhe Dong,Jong Lee,Aviral Kumar,Luowei Zhou,Jonathan Evens,William Isaac,Zhe Chen,Johnson Jia,Anselm 
Levskaya,Zhenkai Zhu,Chris Gorgolewski,Peter Grabowski,Yu Mao,Alberto Magni,Kaisheng Yao,Javier Snaider,Norman Casagrande,Paul Suganthan,Evan Palmer,Geoffrey Irving,Edward Loper,Manaal Faruqui,Isha Arkatkar,Nanxin Chen,Izhak Shafran,Michael Fink,Alfonso Castaño,Irene Giannoumis,Wooyeol Kim,Mikołaj Rybiński,Ashwin Sreevatsa,Jennifer Prendki,David Soergel,Adrian Goedeckemeyer,Willi Gierke,Mohsen Jafari,Meenu Gaba,Jeremy Wiesner,Diana Gage Wright,Yawen Wei,Harsha Vashisht,Yana Kulizhskaya,Jay Hoover,Maigo Le,Lu Li,Chimezie Iwuanyanwu,Lu Liu,Kevin Ramirez,Andrey Khorlin,Albert Cui,Tian LIN,Marin Georgiev,Marcus Wu,Ricardo Aguilar,Keith Pallo,Abhishek Chakladar,Alena Repina,Xihui Wu,Tom van der Weide,Priya Ponnapalli,Caroline Kaplan,Jiri Simsa,Shuangfeng Li,Olivier Dousse,Fan Yang,Jeff Piper,Nathan Ie,Minnie Lui,Rama Pasumarthi,Nathan Lintz,Anitha Vijayakumar,Lam Nguyen Thiet,Daniel Andor,Pedro Valenzuela,Cosmin Paduraru,Daiyi Peng,Katherine Lee,Shuyuan Zhang,Somer Greene,Duc Dung Nguyen,Paula Kurylowicz,Sarmishta Velury,Sebastian Krause,Cassidy Hardin,Lucas Dixon,Lili Janzer,Kiam Choo,Ziqiang Feng,Biao Zhang,Achintya Singhal,Tejasi Latkar,Mingyang Zhang,Quoc Le,Elena Allica Abellan,Dayou Du,Dan McKinnon,Natasha Antropova,Tolga Bolukbasi,Orgad Keller,David Reid,Daniel Finchelstein,Maria Abi Raad,Remi Crocker,Peter Hawkins,Robert Dadashi,Colin Gaffney,Sid Lall,Ken Franko,Egor Filonov,Anna Bulanova,Rémi Leblond,Vikas Yadav,Shirley Chung,Harry Askham,Luis C. Cobo,Kelvin Xu,Felix Fischer,Jun Xu,Christina Sorokin,Chris Alberti,Chu-Cheng Lin,Colin Evans,Hao Zhou,Alek Dimitriev,Hannah Forbes,Dylan Banarse,Zora Tung,Jeremiah Liu,Mark Omernick,Colton Bishop,Chintu Kumar,Rachel Sterneck,Ryan Foley,Rohan Jain,Swaroop Mishra,Jiawei Xia,Taylor Bos,Geoffrey Cideron,Ehsan Amid,Francesco Piccinno,Xingyu Wang,Praseem Banzal,Petru Gurita,Hila Noga,Premal Shah,Daniel J. 
Mankowitz,Alex Polozov,Nate Kushman,Victoria Krakovna,Sasha Brown,MohammadHossein Bateni,Dennis Duan,Vlad Firoiu,Meghana Thotakuri,Tom Natan,Anhad Mohananey,Matthieu Geist,Sidharth Mudgal,Sertan Girgin,Hui Li,Jiayu Ye,Ofir Roval,Reiko Tojo,Michael Kwong,James Lee-Thorp,Christopher Yew,Quan Yuan,Sumit Bagri,Danila Sinopalnikov,Sabela Ramos,John Mellor,Abhishek Sharma,Aliaksei Severyn,Jonathan Lai,Kathy Wu,Heng-Tze Cheng,David Miller,Nicolas Sonnerat,Denis Vnukov,Rory Greig,Jennifer Beattie,Emily Caveness,Libin Bai,Julian Eisenschlos,Alex Korchemniy,Tomy Tsai,Mimi Jasarevic,Weize Kong,Phuong Dao,Zeyu Zheng,Frederick Liu,Fan Yang,Rui Zhu,Mark Geller,Tian Huey Teh,Jason Sanmiya,Evgeny Gladchenko,Nejc Trdin,Andrei Sozanschi,Daniel Toyama,Evan Rosen,Sasan Tavakkol,Linting Xue,Chen Elkind,Oliver Woodman,John Carpenter,George Papamakarios,Rupert Kemp,Sushant Kafle,Tanya Grunina,Rishika Sinha,Alice Talbert,Abhimanyu Goyal,Diane Wu,Denese Owusu-Afriyie,Cosmo Du,Chloe Thornton,Jordi Pont-Tuset,Pradyumna Narayana,Jing Li,Sabaer Fatehi,John Wieting,Omar Ajmeri,Benigno Uria,Tao Zhu,Yeongil Ko,Laura Knight,Amélie Héliou,Ning Niu,Shane Gu,Chenxi Pang,Dustin Tran,Yeqing Li,Nir Levine,Ariel Stolovich,Norbert Kalb,Rebeca Santamaria-Fernandez,Sonam Goenka,Wenny Yustalim,Robin Strudel,Ali Elqursh,Balaji Lakshminarayanan,Charlie Deck,Shyam Upadhyay,Hyo Lee,Mike Dusenberry,Zonglin Li,Xuezhi Wang,Kyle Levin,Raphael Hoffmann,Dan Holtmann-Rice,Olivier Bachem,Summer Yue,Sho Arora,Eric Malmi,Daniil Mirylenka,Qijun Tan,Christy Koh,Soheil Hassas Yeganeh,Siim Põder,Steven Zheng,Francesco Pongetti,Mukarram Tariq,Yanhua Sun,Lucian Ionita,Mojtaba Seyedhosseini,Pouya Tafti,Ragha Kotikalapudi,Zhiyu Liu,Anmol Gulati,Jasmine Liu,Xinyu Ye,Bart Chrzaszcz,Lily Wang,Nikhil Sethi,Tianrun Li,Ben Brown,Shreya Singh,Wei Fan,Aaron Parisi,Joe Stanton,Chenkai Kuang,Vinod Koverkathu,Christopher A. 
Choquette-Choo,Yunjie Li,TJ Lu,Abe Ittycheriah,Prakash Shroff,Pei Sun,Mani Varadarajan,Sanaz Bahargam,Rob Willoughby,David Gaddy,Ishita Dasgupta,Guillaume Desjardins,Marco Cornero,Brona Robenek,Bhavishya Mittal,Ben Albrecht,Ashish Shenoy,Fedor Moiseev,Henrik Jacobsson,Alireza Ghaffarkhah,Morgane Rivière,Alanna Walton,Clément Crepy,Alicia Parrish,Yuan Liu,Zongwei Zhou,Clement Farabet,Carey Radebaugh,Praveen Srinivasan,Claudia van der Salm,Andreas Fidjeland,Salvatore Scellato,Eri Latorre-Chimoto,Hanna Klimczak-Plucińska,David Bridson,Dario de Cesare,Tom Hudson,Piermaria Mendolicchio,Lexi Walker,Alex Morris,Ivo Penchev,Matthew Mauger,Alexey Guseynov,Alison Reid,Seth Odoom,Lucia Loher,Victor Cotruta,Madhavi Yenugula,Dominik Grewe,Anastasia Petrushkina,Tom Duerig,Antonio Sanchez,Steve Yadlowsky,Amy Shen,Amir Globerson,Adam Kurzrok,Lynette Webb,Sahil Dua,Dong Li,Preethi Lahoti,Surya Bhupatiraju,Dan Hurt,Haroon Qureshi,Ananth Agarwal,Tomer Shani,Matan Eyal,Anuj Khare,Shreyas Rammohan Belle,Lei Wang,Chetan Tekur,Mihir Sanjay Kale,Jinliang Wei,Ruoxin Sang,Brennan Saeta,Tyler Liechty,Yi Sun,Yao Zhao,Stephan Lee,Pandu Nayak,Doug Fritz,Manish Reddy Vuyyuru,John Aslanides,Nidhi Vyas,Martin Wicke,Xiao Ma,Taylan Bilal,Evgenii Eltyshev,Daniel Balle,Nina Martin,Hardie Cate,James Manyika,Keyvan Amiri,Yelin Kim,Xi Xiong,Kai Kang,Florian Luisier,Nilesh Tripuraneni,David Madras,Mandy Guo,Austin Waters,Oliver Wang,Joshua Ainslie,Jason Baldridge,Han Zhang,Garima Pruthi,Jakob Bauer,Feng Yang,Riham Mansour,Jason Gelman,Yang Xu,George Polovets,Ji Liu,Honglong Cai,Warren Chen,XiangHai Sheng,Emily Xue,Sherjil Ozair,Adams Yu,Christof Angermueller,Xiaowei Li,Weiren Wang,Julia Wiesinger,Emmanouil Koukoumidis,Yuan Tian,Anand Iyer,Madhu Gurumurthy,Mark Goldenson,Parashar Shah,MK Blake,Hongkun Yu,Anthony Urbanowicz,Jennimaria Palomaki,Chrisantha Fernando,Kevin Brooks,Ken Durden,Harsh Mehta,Nikola Momchev,Elahe Rahimtoroghi,Maria Georgaki,Amit Raul,Sebastian Ruder,Morgan Redshaw,Jinhyuk Lee,Komal 
Jalan,Dinghua Li,Ginger Perng,Blake Hechtman,Parker Schuh,Milad Nasr,Mia Chen,Kieran Milan,Vladimir Mikulik,Trevor Strohman,Juliana Franco,Tim Green,Demis Hassabis,Koray Kavukcuoglu,Jeffrey Dean,Oriol Vinyals
http://arxiv.org/abs/2312.11805v1
Compressor summary: The report presents Gemini, a new family of multimodal models with various sizes for different applications, that show impressive performance across image, audio, video, and text understanding tasks.
Maria Antoniak,Aakanksha Naik,Carla S. Alvarado,Lucy Lu Wang,Irene Y. Chen
http://arxiv.org/abs/2312.11803v1
Compressor summary: The authors propose nine ethical principles for using large language models in healthcare applications, based on input from healthcare workers and birthing people.
Lang Yu,Qin Chen,Jie Zhou,Liang He
http://arxiv.org/abs/2312.11795v1
Compressor summary: MELO is a plug-in Model Editing method that dynamically adjusts LLM behavior using inner vector database indices for efficient and effective editing of various NLP tasks.
Zhaowei Lu,Li Jiang
http://arxiv.org/abs/2312.11793v1
Compressor summary: The text describes an improved keypoint-based algorithm for detecting copy-move forgery in images, using entropy images to increase the number of keypoints and a clustering algorithm to handle the non-ideal distribution of grayscale values.
Yi Cheng,Wenge Liu,Jian Wang,Chak Tou Leong,Yi Ouyang,Wenjie Li,Xian Wu,Yefeng Zheng
http://arxiv.org/abs/2312.11792v1
Compressor summary: The Cooper framework is a novel dialogue system that coordinates multiple specialized agents to achieve complex goals, such as negotiation and emotional support, by focusing on different aspects of these goals.
Aadirupa Saha,Vitaly Feldman,Tomer Koren,Yishay Mansour
http://arxiv.org/abs/2312.11788v1
Compressor summary: The paper studies convex optimization with preference feedback, where it designs efficient algorithms with improved convergence rates for batched and multiway comparisons, and provides lower bounds to show their optimality.
Zhangdie Yuan,Andreas Vlachos
http://arxiv.org/abs/2312.11785v1
Compressor summary: The paper introduces a zero-shot method for fact-checking that uses semantic triples and external knowledge graphs to generalize to unseen data and domains.
Zihui Xue,Kumar Ashutosh,Kristen Grauman
http://arxiv.org/abs/2312.11782v1
Compressor summary: The authors propose VidOSC, a method that uses text and vision-language models to learn how objects change over time without manual annotations, and introduce an open-world benchmark for video object state changes called HowToChange.
Anaelia Ovalle,Ninareh Mehrabi,Palash Goyal,Jwala Dhamala,Kai-Wei Chang,Richard Zemel,Aram Galstyan,Yuval Pinter,Rahul Gupta
http://arxiv.org/abs/2312.11779v1
Compressor summary: This paper investigates how data scarcity affects large language models' misgendering of non-binary people and proposes a new tokenization method, PTP, to improve neopronoun consistency.
Yuze He,Yushi Bai,Matthieu Lin,Jenny Sheng,Yubin Hu,Qi Wang,Yu-Hui Wen,Yong-Jin Liu
http://arxiv.org/abs/2312.11774v1
Compressor summary: The authors propose a text-to-3D method that improves fine-grained view consistency and reduces floaters and empty spaces by incorporating multi-view image conditions into NeRF optimization.
Emily Kaczmarek,Olivier X. Miguel,Alexa C. Bowie,Robin Ducharme,Alysha L. J. Dingwall-Harvey,Steven Hawken,Christine M. Armour,Mark C. Walker,Kevin Dick
http://arxiv.org/abs/2312.11772v1
Compressor summary: CAManim is a novel XAI visualization method that animates CAM-based network activation maps through all layers to improve understanding of CNN predictions.
Alyssa R. Amod,Alexandra Smith,Pearly Joubert,Confidence Raymond,Dong Zhang,Udunna C. Anazodo,Dodzi Motchon,Tinashe E. M. Mutsvangwa,Sébastien Quetin
http://arxiv.org/abs/2312.11770v1
Compressor summary: The authors investigated how different training approaches affect tumor segmentation models' performance on poor-quality neuroimaging data from Sub-Saharan Africa and found that fine-tuning a model pretrained on high-quality data improved its results.
Ilias Diakonikolas,Daniel M. Kane,Jasper C. H. Lee,Thanasis Pittas
http://arxiv.org/abs/2312.11769v1
Compressor summary: The paper proposes efficient algorithms for clustering mixtures of bounded covariance distributions with a fine-grained separation assumption and shows their applicability to various settings, including high-dimensional log-concave distributions and robust clustering.
Rupali Bhati,Sai Krishna Gottipati,Clodéric Mars,Matthew E. Taylor
http://arxiv.org/abs/2312.11768v1
Compressor summary: The paper investigates which types and curricula of cooperative teammates best train a learning agent in multi-agent reinforcement learning, balancing task performance and overall team reward, and finds that less-skilled but pre-trained teammates and skill-decreasing curricula work best.
Manabu Mukai,Hidekata Hontani,Tatsuya Yokota
http://arxiv.org/abs/2312.11763v1
Compressor summary: The paper presents a new optimization algorithm for tensor decomposition that supports a variety of loss functions and decomposition models, making it applicable to a wide range of tasks.
Jay Mahajan,Samuel Hum,Jack Henhapl,Diya Yunus,Matthew Gadbury,Emi Brown,Jeff Ginger,H. Chad Lane
http://arxiv.org/abs/2312.11761v1
Compressor summary: MineObserver 2.0 is an improved AI framework that helps assess the accuracy of learner-generated descriptions of Minecraft images related to science, using computer vision and natural language processing, and provides feedback to teachers.