arxiv compressed, 2023-12-20

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-20 generated by the compressor, my personal LLM-based project.


Weakly Supervised Open-Vocabulary Object Detection

Jianghang Lin,Yunhang Shen,Bingquan Wang,Shaohui Lin,Ke Li,Liujuan Cao

http://arxiv.org/abs/2312.12437v1

Compressor summary: The paper proposes WSOVOD, a framework for weakly supervised open-vocabulary object detection that can detect novel concepts and use diverse datasets with only image-level annotations.


A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Chaoyou Fu,Renrui Zhang,Haojia Lin,Zihan Wang,Timin Gao,Yongdong Luo,Yubo Huang,Zhengye Zhang,Longtian Qiu,Gaoxiang Ye,Yunhang Shen,Mengdan Zhang,Peixian Chen,Sirui Zhao,Xiawu Zheng,Shaohui Lin,Deqiang Jiang,Di Yin,Peng Gao,Ke Li,Xing Sun,Rongrong Ji

http://arxiv.org/abs/2312.12436v1

Compressor summary: The paper explores Gemini Pro's visual understanding abilities, comparing it with GPT-4V and Sphinx and finding that Gemini can be a strong challenger to GPT-4V in multi-modal tasks.


Tracking Any Object Amodally

Cheng-Yen Hsieh,Tarasha Khurana,Achal Dave,Deva Ramanan

http://arxiv.org/abs/2312.12433v1

Compressor summary: The paper introduces a new benchmark dataset and a module that improves amodal perception for detection and tracking of occluded objects in videos.


On Inference Stability for Diffusion Models

Viet Nguyen,Giang Vu,Tung Nguyen Thanh,Khoat Than,Toan Tran

http://arxiv.org/abs/2312.12431v1

Compressor summary: The paper proposes a new loss function for Denoising Diffusion Probabilistic Models that accounts for the correlation between timesteps, improving image quality and generalization.


The Endoscapes Dataset for Surgical Scene Segmentation, Object Detection, and Critical View of Safety Assessment: Official Splits and Benchmark

Aditya Murali,Deepak Alapatt,Pietro Mascagni,Armine Vardazaryan,Alain Garcia,Nariaki Okamoto,Guido Costamagna,Didier Mutter,Jacques Marescaux,Bernard Dallemagne,Nicolas Padoy

http://arxiv.org/abs/2312.12429v1

Compressor summary: The Endoscapes dataset contains laparoscopic cholecystectomy (LC) videos with detailed annotations for assessing the critical view of safety (CVS) and other aspects of the surgery, along with benchmarks and public access to the data and models.


SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process

Mengyu Wang,Henghui Ding,Jun Hao Liew,Jiajun Liu,Yao Zhao,Yunchao Wei

http://arxiv.org/abs/2312.12425v1

Compressor summary: SegRefiner is a model-agnostic solution that refines object masks from different segmentation models using a discrete diffusion process that predicts label and transition probabilities for each pixel, improving both metrics and details across various segmentation tasks.


Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

Shraman Pramanick,Guangxing Han,Rui Hou,Sayan Nag,Ser-Nam Lim,Nicolas Ballas,Qifan Wang,Rama Chellappa,Amjad Almahairi

http://arxiv.org/abs/2312.12423v1

Compressor summary: VistaLLM is a visual system that uses instruction-guided image tokenization and adaptive sampling to perform various vision-language tasks with single and multiple input images, while leveraging the CoinIt dataset and introducing the AttCoSeg task.


Scene-Conditional 3D Object Stylization and Composition

Jinghao Zhou,Tomas Jakab,Philip Torr,Christian Rupprecht

http://arxiv.org/abs/2312.12419v1

Compressor summary: The paper presents a framework that adapts 3D assets to match 2D scenes using differentiable ray tracing and text-to-image diffusion priors, enabling object stylization and realistic composition across different environments and objects.


Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Shweta Mahajan,Tanzila Rahman,Kwang Moo Yi,Leonid Sigal

http://arxiv.org/abs/2312.12416v1

Compressor summary: The authors propose a method to obtain interpretable language prompts from text-to-image diffusion models by using a delayed projection scheme and focusing on later timesteps of the diffusion process.


On Alternating-time Temporal Logic, Hyperproperties, and Strategy Sharing

Raven Beutner,Bernd Finkbeiner

http://arxiv.org/abs/2312.12403v1

Compressor summary: HyperATL$^*_S$ extends ATL$^*$ to compare outcomes of multiple strategic interactions and enforce shared strategies among agents, capturing important AI properties and enabling decidable model checking.


New classes of the greedy-applicable arm feature distributions in the sparse linear bandit problem

Koji Ichikawa,Shinji Ito,Daisuke Hatano,Hanna Sumita,Takuro Fukunaga,Naonori Kakimura,Ken-ichi Kawarabayashi

http://arxiv.org/abs/2312.12400v1

Compressor summary: The paper proposes new distribution classes that allow the greedy algorithm to be applied to more contextual bandit problems without strong assumptions on arm feature diversity.


Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Yunhao Gou,Zhili Liu,Kai Chen,Lanqing Hong,Hang Xu,Aoxue Li,Dit-Yan Yeung,James T. Kwok,Yu Zhang

http://arxiv.org/abs/2312.12379v1

Compressor summary: MoCLE is a new MoE architecture that improves LVLMs' instruction-following abilities and generalization across various zero-shot vision-language tasks by activating task-specific model parameters based on instruction clusters and adding a universal expert.


Chasing Fairness in Graphs: A GNN Architecture Perspective

Zhimeng Jiang,Xiaotian Han,Chao Fan,Zirui Liu,Na Zou,Ali Mostafavi,Xia Hu

http://arxiv.org/abs/2312.12369v1

Compressor summary: The paper proposes a new graph neural network architecture called Fair Message Passing that improves fairness and prediction performance by explicitly using sensitive attributes and mitigating biases in node classification tasks.


SpokesBiz -- an Open Corpus of Conversational Polish

Piotr Pęzik,Sylwia Karasińska,Anna Cichosz,Łukasz Jałowiecki,Konrad Kaczyński,Małgorzata Krawentek,Karolina Walkusz,Paweł Wilk,Mariusz Kleć,Krzysztof Szklanny,Szymon Marszałkowski

http://arxiv.org/abs/2312.12364v1

Compressor summary: SpokesBiz is a large conversational Polish corpus with many uses in linguistics and ASR development.


CLIP-DINOiser: Teaching CLIP a few DINO tricks

Monika Wysoczańska,Oriane Siméoni,Michaël Ramamonjisoa,Andrei Bursuc,Tomasz Trzciński,Patrick Pérez

http://arxiv.org/abs/2312.12359v1

Compressor summary: The paper proposes a zero-shot open-vocabulary semantic segmentation method that improves MaskCLIP features with self-supervised localization priors and achieves state-of-the-art results on various benchmarks using only one pass through CLIP.


SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Action Segmentation

Feixiang Zhou,Zheheng Jiang,Huiyu Zhou,Xuelong Li

http://arxiv.org/abs/2312.12347v1

Compressor summary: The paper proposes a novel approach for semi-supervised action segmentation in long untrimmed videos, using contrastive learning that explores intra- and inter-information variations and enforces neighbour consistency.


Avoiding Data Contamination in Language Model Evaluation: Dynamic Test Construction with Latest Materials

Yucheng Li,Frank Geurin,Chenghua Lin

http://arxiv.org/abs/2312.12343v1

Compressor summary: LatestEval is an automatic method that creates uncontaminated reading comprehension evaluations by using texts published within a recent time window, avoiding overlap with pre-trained language models' training corpora.
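The core filtering step behind this kind of dynamic test construction can be sketched generically; the helper name and data layout below are hypothetical, not the paper's actual code.

```python
from datetime import date, timedelta

def latest_window(docs, days=180, today=None):
    """Keep only documents published within a recent window, so an
    evaluation set built from them postdates a model's training cutoff.
    `docs` is assumed to be an iterable of (publish_date, text) pairs."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    return [text for published, text in docs if published >= cutoff]
```

Benchmarks would then be regenerated periodically from the surviving texts, keeping the evaluation ahead of any model's training corpus.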


Engineering an Exact Pseudo-Boolean Model Counter

Suwei Yang,Kuldeep S. Meel

http://arxiv.org/abs/2312.12341v1

Compressor summary: The paper introduces PBCount, an exact Pseudo-Boolean model counter using knowledge compilation via algebraic decision diagrams, which can handle more instances than existing methods.


Scalable Geometric Fracture Assembly via Co-creation Space among Assemblers

Ruiyuan Zhang,Jiaxiang Liu,Zexi Li,Hao Dong,Jie Fu,Chao Wu

http://arxiv.org/abs/2312.12340v1

Compressor summary: The paper presents a co-creation space with multiple assemblers and a novel collision-handling loss function for assembling fragmented 3D objects without semantic information, outperforming existing frameworks on two datasets with linear computational complexity, enhanced abstraction, and improved generalization.


Value Explicit Pretraining for Goal-Based Transfer Learning

Kiran Lekkala,Henghui Bao,Sumedh Sontakke,Laurent Itti

http://arxiv.org/abs/2312.12339v1

Compressor summary: The proposed method enables learning task-independent representations from value function estimates that can transfer skills across different tasks regardless of their appearance and dynamics.


pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

David Charatan,Sizhe Li,Andrea Tagliasacchi,Vincent Sitzmann

http://arxiv.org/abs/2312.12337v1

Compressor summary: PixelSplat is a model that learns to reconstruct a 3D scene from a pair of images using Gaussian primitives, enabling fast and memory-efficient rendering and 3D reconstruction.


PowMix: A Versatile Regularizer for Multimodal Sentiment Analysis

Efthymios Georgiou,Yannis Avrithis,Alexandros Potamianos

http://arxiv.org/abs/2312.12334v1

Compressor summary: PowMix is a novel embedding space regularizer for multimodal sentiment analysis that improves performance without sacrificing robustness or text dominance.


An Alternate View on Optimal Filtering in an RKHS

Benjamin Colburn,Jose C. Principe,Luis G. Sanchez Giraldo

http://arxiv.org/abs/2312.12318v1

Compressor summary: The paper presents a new view of optimal filtering that avoids Kernel Adaptive Filtering's linear growth of model size with the number of training samples, using correntropy as a nonlinear functional to preserve the time structure of stochastic processes in a Reproducing Kernel Hilbert Space.


First qualitative observations on deep learning vision model YOLO and DETR for automated driving in Austria

Stefan Schoder

http://arxiv.org/abs/2312.12314v1

Compressor summary: The study explores how 2D object detection algorithms like YOLO and DETR can improve road safety for autonomous driving in Austria by detecting and tracking objects in various conditions.


Instruct-SCTG: Guiding Sequential Controlled Text Generation through Instructions

Yinhong Liu,Yixuan Su,Ehsan Shareghi,Nigel Collier

http://arxiv.org/abs/2312.12299v1

Compressor summary: The paper proposes Instruct-SCTG, a framework that uses instruction-tuned language models to generate structurally coherent articles in various domains with section-by-section alignment and measures discourse divergence using a new metric.


Toward enriched Cognitive Learning with XAI

Muhammad Suffian,Ulrike Kuhl,Jose M. Alonso-Moral,Alessandro Bogliolo

http://arxiv.org/abs/2312.12290v1

Compressor summary: The paper introduces an AI system (CL-XAI) that helps learners understand how AI models work and evaluates its effectiveness through human feedback.


Prompt-based Domain Discrimination for Multi-source Time Series Domain Adaptation

Junxiang Wang,Guangji Bai,Wei Cheng,Zhengzhang Chen,Liang Zhao,Haifeng Chen

http://arxiv.org/abs/2312.12276v1

Compressor summary: POND is a novel prompt-based deep learning model for multi-source time series domain adaptation, focusing on extracting and leveraging domain-specific meta-data information.


Emergence of In-Context Reinforcement Learning from Noise Distillation

Ilya Zisman,Vladislav Kurenkov,Alexander Nikulin,Viacheslav Sinii,Sergey Kolesnikov

http://arxiv.org/abs/2312.12275v1

Compressor summary: $AD^{\epsilon}$ is a method that improves in-context learning with suboptimal human demonstrations by gradually introducing noise into them.
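The noise-curriculum idea can be illustrated generically: corrupt a demonstration with decreasing amounts of random-action noise, so the resulting sequence resembles a learning history from near-random behaviour to the original demonstration. This is only a sketch under that assumption; the paper's actual AD^ε procedure may differ.

```python
import random

def noise_curriculum(demo_actions, n_actions, n_stages=5):
    """Build a synthetic 'improvement history' from one demonstration by
    replacing each action with a uniformly random one with probability eps,
    where eps anneals from 1.0 (fully random) to 0.0 (clean demo)."""
    stages = []
    for stage in range(n_stages):
        eps = 1.0 - stage / (n_stages - 1)  # 1.0 -> 0.0 across stages
        corrupted = [
            random.randrange(n_actions) if random.random() < eps else a
            for a in demo_actions
        ]
        stages.append(corrupted)
    return stages
```

An in-context learner trained on such histories can then imitate the improvement trend rather than the (possibly suboptimal) demonstrations themselves.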


Intrinsic Image Diffusion for Single-view Material Estimation

Peter Kocsis,Vincent Sitzmann,Matthias Nießner

http://arxiv.org/abs/2312.12274v1

Compressor summary: Intrinsic Image Diffusion is a model that generates multiple realistic material explanations for indoor scenes, using probabilistic methods and learned priors from real images to overcome challenges in appearance decomposition.


VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

Chun-Mei Feng,Yang Bai,Tao Luo,Zhen Li,Salman Khan,Wangmeng Zuo,Xinxing Xu,Rick Siow Mong Goh,Yong Liu

http://arxiv.org/abs/2312.12273v1

Compressor summary: The paper proposes a VQA-based post-processing approach (VQA4CIR) to improve Composed Image Retrieval by detecting and correcting inconsistent results with captions using LLM and LVLM fine-tuning.


Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?

Gloria Araiza-Illan,Luke Meyer,Khiet P. Truong,Deniz Baskent

http://arxiv.org/abs/2312.12269v1

Compressor summary: The text proposes an automated digits-in-noise (DIN) test that uses the Kaldi-NL toolkit to evaluate spoken responses without a human supervisor, and assesses its performance in two studies.


Inferring the relationship between soil temperature and the normalized difference vegetation index with machine learning

Steven Mortier,Amir Hamedpour,Bart Bussmann,Ruth Phoebe Tchana Wandji,Steven Latré,Bjarni D. Sigurdsson,Tom De Schepper,Tim Verdonck

http://arxiv.org/abs/2312.12258v1

Compressor summary: This study shows that soil temperature affects the start and peak of the growing season in subarctic grasslands, while other factors like air temperature, precipitation, and irradiance also play a role in vegetation phenology.


TaskFlex Solver for Multi-Agent Pursuit via Automatic Curriculum Learning

Jiayu Chen,Guosheng Li,Chao Yu,Xinyi Yang,Botian Xu,Huazhong Yang,Yu Wang

http://arxiv.org/abs/2312.12255v1

Compressor summary: The paper presents TaskFlex Solver, a reinforcement learning and curriculum learning method for multi-agent pursuit problems in complex environments, which achieves high capture rates and adapts to various task conditions.


Geo-located Aspect Based Sentiment Analysis (ABSA) for Crowdsourced Evaluation of Urban Environments

Demircan Tas,Rohit Priyadarshi Sanatani

http://arxiv.org/abs/2312.12253v1

Compressor summary: The authors develop a sentiment analysis model for urban environments that extracts specific aspects and their sentiments from crowdsourced reviews of public parks, and show improved prediction accuracy using BERT with local context focus (LCF).


ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room

Idris Hamoud,Muhammad Abdullah Jamal,Vinkle Srivastav,Didier Mutter,Nicolas Padoy,Omid Mohareri

http://arxiv.org/abs/2312.12250v1

Compressor summary: The paper introduces a new object-based approach to recognize surgical activities in the OR using geometric arrangements between clinicians and devices, improving efficiency and performance with less data.


MDD-UNet: Domain Adaptation for Medical Image Segmentation with Theoretical Guarantees, a Proof of Concept

Asbjørn Munk,Ao Ma,Mads Nielsen

http://arxiv.org/abs/2312.12246v1

Compressor summary: The MDD-UNet is an unsupervised domain adaptation framework for U-Nets that improves image segmentation performance across data with different characteristics, with theoretical guarantees.


GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning

Mehran Kazemi,Hamidreza Alvari,Ankit Anand,Jialin Wu,Xi Chen,Radu Soricut

http://arxiv.org/abs/2312.12241v1

Compressor summary: The paper evaluates vision language models' mathematical reasoning abilities on geometry problems and finds they struggle with higher-depth problems requiring long chains of reasoning.


Roll With the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning

Yue Duan,Zhen Zhao,Lei Qi,Luping Zhou,Lei Wang,Yinghuan Shi

http://arxiv.org/abs/2312.12237v1

Compressor summary: The paper proposes a method to improve semi-supervised learning for fine-grained visual classification by selecting soft labels based on class transition tracking and confidence.


Generalization Analysis of Machine Learning Algorithms via the Worst-Case Data-Generating Probability Measure

Xinying Zou,Samir M. Perlaza,Iñaki Esnaola,Eitan Altman

http://arxiv.org/abs/2312.12236v1

Compressor summary: The paper introduces a new tool to measure generalization in ML algorithms using the worst-case probability measure, which is a Gibbs probability measure.
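As a generic illustration of the Gibbs form (notation assumed here, not necessarily the paper's), a worst-case data-generating measure exponentially tilts a reference measure toward high-loss data:

```latex
% Generic Gibbs (exponentially tilted) probability measure; illustrative
% notation, not the paper's exact formulation.
P^{\star}(\mathrm{d}z)
  = \frac{\exp\!\big(\tfrac{1}{\lambda}\,\ell(\theta, z)\big)}
         {\int \exp\!\big(\tfrac{1}{\lambda}\,\ell(\theta, z')\big)\, Q(\mathrm{d}z')}
    \; Q(\mathrm{d}z)
```

Here ℓ(θ, z) is the loss of model θ on datum z, Q is a reference measure, and λ > 0 controls how adversarial the tilt is; expected loss under such a measure upper-bounds the loss under benign data distributions.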


Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Lingjun Zhang,Xinyuan Chen,Yaohui Wang,Yue Lu,Yu Qiao

http://arxiv.org/abs/2312.12232v1

Compressor summary: Diff-Text is a training-free framework that uses Stable Diffusion to generate realistic images with text in any language, improving text recognition and background blending.


HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback

Gaoge Han,Shaoli Huang,Mingming Gong,Jinglei Tang

http://arxiv.org/abs/2312.12227v1

Compressor summary: HuTuMotion is a new method that uses limited human feedback to improve the quality of generating natural human motions by adapting the prior distribution in latent diffusion models.


On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

Satoki Ishikawa,Ryo Karakida

http://arxiv.org/abs/2312.12226v1

Compressor summary: The study proposes a specific parameterization for second-order optimization that enhances feature learning and allows transferring hyperparameters across different network widths.


Self-Supervised Detection of Perfect and Partial Input-Dependent Symmetries

Alonso Urbano,David W. Romero

http://arxiv.org/abs/2312.12223v1

Compressor summary: The paper proposes a method to detect different levels of symmetry in data without labels, improving generalization and robustness of models.


EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Junjue Wang,Zhuo Zheng,Zihang Chen,Ailong Ma,Yanfei Zhong

http://arxiv.org/abs/2312.12222v1

Compressor summary: The text introduces EarthVQA, a multi-modal multi-task dataset for advancing relational reasoning in Earth vision, and SOBA, a framework that leverages object semantics and relations for VQA.


Sharing is CAIRing: Characterizing Principles and Assessing Properties of Universal Privacy Evaluation for Synthetic Tabular Data

Tobias Hyrup,Anton Danholt Lautrup,Arthur Zimek,Peter Schneider-Kamp

http://arxiv.org/abs/2312.12216v1

Compressor summary: The text discusses the need for a common framework to evaluate the privacy of synthetic data in healthcare, proposing four principles (CAIR) and a rubric to assess existing metrics.


Identification of Causal Structure in the Presence of Missing Data with Additive Noise Model

Jie Qiao,Zhengming Chen,Jianhua Yu,Ruichu Cai,Zhifeng Hao

http://arxiv.org/abs/2312.12206v1

Compressor summary: Using an additive noise model, the paper investigates how to learn causal structure from data with self-masking missingness, where existing methods fail, and proposes a practical algorithm based on its theoretical results.


Mask Grounding for Referring Image Segmentation

Yong Xien Chng,Henry Zheng,Yizeng Han,Xuchong Qiu,Gao Huang

http://arxiv.org/abs/2312.12198v1

Compressor summary: The text introduces MagNet, a novel method for referring image segmentation that uses mask grounding and cross-modal alignment to improve visual grounding and correspondence between language and images.


Gaussian process learning of nonlinear dynamics

Dongwei Ye,Mengwu Guo

http://arxiv.org/abs/2312.12193v1

Compressor summary: The paper proposes a new Bayesian method to learn nonlinear dynamics from time series data without explicitly calculating state derivatives, which can improve the accuracy of learned models when data is scarce or noisy.


CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning

Chenyu Sun,Hangwei Qian,Chunyan Miao

http://arxiv.org/abs/2312.12191v1

Compressor summary: The paper proposes a Curiosity-driven Unsupervised Data Collection method that improves multi-task offline reinforcement learning by expanding the feature space with adaptive temporal distances to collect higher-quality data.


Poincaré Differential Privacy for Hierarchy-aware Graph Embedding

Yuecen Wei,Haonan Yuan,Xingcheng Fu,Qingyun Sun,Hao Peng,Xianxian Li,Chunming Hu

http://arxiv.org/abs/2312.12183v1

Compressor summary: PoinDP is a framework that uses hyperbolic geometry to protect hierarchical graph data from inference attacks and privacy leaks, while preserving performance in node classification tasks.


All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes

Jose L. Gómez,Manuel Silva,Antonio Seoane,Agnès Borrás,Mario Noriega,Germán Ros,Jose A. Iglesias-Guitian,Antonio M. López

http://arxiv.org/abs/2312.12176v1

Compressor summary: UrbanSyn is a high-quality synthetic urban driving dataset that enhances existing ones for unsupervised domain adaptation in image semantic segmentation.


Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval

Zhihang Liu,Jun Li,Hongtao Xie,Pandeng Li,Jiannan Ge,Sun-Ao Liu,Guoqing Jin

http://arxiv.org/abs/2312.12155v1

Compressor summary: MESM is a novel framework for improving video moment retrieval by enhancing video and text features to balance the modality gap between videos and queries.


Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment

Lingling Xu,Haoran Xie,Si-Zhao Joe Qin,Xiaohui Tao,Fu Lee Wang

http://arxiv.org/abs/2312.12148v1

Compressor summary: This paper reviews parameter efficient fine-tuning (PEFT) methods for pretrained language models, discussing their applications and future directions, and conducting experiments to understand their effectiveness.


OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Jinyi Liu,Zhi Wang,Yan Zheng,Jianye Hao,Chenjia Bai,Junjie Ye,Zhen Wang,Haiyin Piao,Yang Sun

http://arxiv.org/abs/2312.12145v1

Compressor summary: OVD-Explorer is a new method for noisy environment exploration in continuous control RL that balances optimism with over-exploration mitigation.


M-BEV: Masked BEV Perception for Robust Autonomous Driving

Siran Chen,Yue Ma,Yu Qiao,Yali Wang

http://arxiv.org/abs/2312.12144v1

Compressor summary: The paper proposes a Masked Bird-Eye-View (M-BEV) perception framework that improves robustness to failed camera views by randomly masking and reconstructing them in end-to-end training, achieving significant performance gains on the NuScenes benchmark.


Integrating Human Vision Perception in Vision Transformers for Classifying Waste Items

Akshat Kishore Shrivastava,Tapan Kumar Gandhi

http://arxiv.org/abs/2312.12143v1

Compressor summary: The paper proposes a new method for waste classification that simulates nystagmus, a biological phenomenon affecting human vision, improving the accuracy of the Vision Transformer model by 2%.


FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang,Dezhi Peng,Yuxin Kong,Yuyi Zhang,Cong Yao,Lianwen Jin

http://arxiv.org/abs/2312.12142v1

Compressor summary: FontDiffuser is a diffusion-based method that improves font generation by combining global and local content cues, handling large style variations, and learning style representation from images.


Exploring the Residual Stream of Transformers

Zeping Yu,Kailai Yang,Zhiwei Liu,Sophia Ananiadou

http://arxiv.org/abs/2312.12141v1

Compressor summary: This paper explores residual connections in transformers to better understand how they store and merge knowledge for language modeling, and proposes a method to analyze the influence of previous layers.


Best Arm Identification with Fixed Budget: A Large Deviation Perspective

Po-An Wang,Ruo-Chun Tzeng,Alexandre Proutiere

http://arxiv.org/abs/2312.12137v1

Compressor summary: The paper studies how to identify the best arm in stochastic Multi-Armed Bandits with a fixed sampling budget, and proposes a new adaptive algorithm that outperforms existing ones using Large Deviation techniques.


Object-Aware Domain Generalization for Object Detection

Wooju Lee,Dasol Hong,Hyungtae Lim,Hyun Myung

http://arxiv.org/abs/2312.12133v1

Compressor summary: The paper proposes an object-aware domain generalization method for single-domain generalization in object detection, using data augmentation and a training strategy that improve object localization and classification.


Probabilistic Prediction of Longitudinal Trajectory Considering Driving Heterogeneity with Interpretability

Shuli Wang,Kun Gao,Lanfang Zhang,Yang Liu,Lei Chen

http://arxiv.org/abs/2312.12123v1

Compressor summary: The paper proposes a framework that combines Mixture Density Networks and LSTM-based encoder-decoder networks to predict vehicle trajectories considering driver heterogeneity, achieving better predictions than existing models.


ZS-SRT: An Efficient Zero-Shot Super-Resolution Training Method for Neural Radiance Fields

Xiang Feng,Yongbo He,Yubo Wang,Chengkai Wang,Zhenzhong Kuang,Jiajun Ding,Feiwei Qin,Jun Yu,Jianping Fan

http://arxiv.org/abs/2312.12122v1

Compressor summary: The paper proposes a zero-shot super-resolution framework for NeRF that uses internal learning to synthesize high-quality high-resolution novel views from low-resolution training data, without requiring external high-resolution data or additional scene information.


Mindful Explanations: Prevalence and Impact of Mind Attribution in XAI Research

Susanne Hindennach,Lei Shi,Filip Miletić,Andreas Bulling

http://arxiv.org/abs/2312.12119v1

Compressor summary: The study examines how mind-attributing explanations in AI research affect perceptions of AI awareness and responsibility, finding that they can conceal AI responsibility from users.


Shaping Up SHAP: Enhancing Stability through Layer-Wise Neighbor Selection

Gwladys Kelodjou,Laurence Rozé,Véronique Masson,Luis Galárraga,Romaric Gaudel,Maurice Tchuente,Alexandre Termier

http://arxiv.org/abs/2312.12115v1

Compressor summary: The paper proposes two improvements for Kernel SHAP, a post-hoc explainability method for black-box machine learning models: (1) making it fully stable by changing its neighbor selection procedure, and (2) using the coalitions of Layer 1 for faster and more meaningful feature attribution.


Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in ultra low-data regimes

Nabeel Seedat,Nicolas Huynh,Boris van Breugel,Mihaela van der Schaar

http://arxiv.org/abs/2312.12112v1

Compressor summary: CLLM is a method that uses large language models to generate and curate high-quality augmented datasets for machine learning tasks in low-data settings, improving performance compared to conventional generators.


Knowledge Graph Error Detection with Contrastive Confidence Adaption

Xiangyu Liu,Yang Liu,Wei Hu

http://arxiv.org/abs/2312.12108v1

Compressor summary: The paper proposes a new model, CCA, that uses both textual and graph information to better detect errors in knowledge graphs, especially noisy triplets that resemble correct ones.


I-CEE: Tailoring Explanations of Image Classifications Models to User Expertise

Yao Rong,Peizhu Qian,Vaibhav Unhelkar,Enkelejda Kasneci

http://arxiv.org/abs/2312.12102v1

Compressor summary: I-CEE is a framework for explaining image classification models to users based on their expertise level, using example images and local explanations tailored to each user.


Domain Generalization in LiDAR Semantic Segmentation Leveraged by Density Discriminative Feature Embedding

Jaeyeul Kim,Jungwan Woo,Jeonghoon Kim,Sunghoon Im

http://arxiv.org/abs/2312.12098v1

Compressor summary: The paper introduces DDFE, a module that extracts features related to LiDAR point cloud density and improves domain generalization for 3D perception tasks.


DLCA-Recon: Dynamic Loose Clothing Avatar Reconstruction from Monocular Videos

Chunjie Luo,Fei Luo,Yusen Wang,Enxu Zhao,Chunxia Xiao

http://arxiv.org/abs/2312.12096v1

Compressor summary: DLCA-Recon is a method to create human avatars from monocular videos, using physical connection information and dynamic deformation fields to accurately model loose clothing movement.


GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction

Haodong Yan,Zhiming Hu,Syn Schmitt,Andreas Bulling

http://arxiv.org/abs/2312.12090v1

Compressor summary: GazeMoDiff is a new model that uses eye gaze to guide the generation of realistic human body motions for virtual reality applications, outperforming existing methods.


Learning Subject-Aware Cropping by Outpainting Professional Photos

James Hong,Lu Yuan,Michaël Gharbi,Matthew Fisher,Kayvon Fatahalian

http://arxiv.org/abs/2312.12080v1

Compressor summary: GenCrop is a weakly-supervised method that learns subject-aware cropping from stock images and text-to-image diffusion models, achieving competitive results with supervised methods.


PICNN: A Pathway towards Interpretable Convolutional Neural Networks

Wengang Guo,Jiayi Yang,Huilin Yin,Qijun Chen,Wei Ye

http://arxiv.org/abs/2312.12068v1

Compressor summary: PICNN is a novel method that improves the interpretability and performance of CNNs by clustering filters into class-specific groups using Bernoulli sampling and a reparameterization trick.


PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping

Nai-Chieh Huang,Ping-Chun Hsieh,Kuo-Hao Ho,I-Chen Wu

http://arxiv.org/abs/2312.12065v1

Compressor summary: The paper provides the first global convergence results for a variant of the Proximal Policy Optimization algorithm with clipping, using new analysis techniques and showing that the clipping range affects only the convergence constant.


MPI Planar Correction of Pulse Based ToF Cameras

Marian-Leontin Pop,Levente Tamas

http://arxiv.org/abs/2312.12064v1

Compressor summary: The paper proposes using Feature Pyramid Networks to reduce multipath interference artifacts in pulse-based ToF cameras, improving surface detection on planar surfaces.


Extension of the Dip-test Repertoire -- Efficient and Differentiable p-value Calculation for Clustering

Lena G. M. Bauer,Collin Leiber,Christian Böhm,Claudia Plant

http://arxiv.org/abs/2312.12050v1

Compressor summary: The paper proposes a sigmoid function to replace look-up tables for transforming Dip-values to Dip-p-values, improving computation speed and integration with learning algorithms.
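The shape of such a sigmoid replacement can be sketched as follows; the parameters a and b here are hypothetical placeholders, not the fitted values from the paper.

```python
import math

def dip_to_p(dip, n, a=17.3, b=5.7):
    """Map a Dip statistic to an approximate p-value via a smooth,
    differentiable sigmoid in sqrt(n)*dip, replacing a bootstrapped
    look-up table. Parameters a, b are illustrative placeholders."""
    x = math.sqrt(n) * dip
    return 1.0 / (1.0 + math.exp(a * x - b))
```

Because the mapping is a closed-form differentiable function, it can sit inside gradient-based clustering objectives, which a look-up table cannot.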


XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX

Alexander Nikulin,Vladislav Kurenkov,Ilya Zisman,Artem Agarkov,Viacheslav Sinii,Sergey Kolesnikov

http://arxiv.org/abs/2312.12044v1

Compressor summary: XLand-MiniGrid is a JAX-based library for meta-reinforcement learning research with scalable grid-worlds and diverse tasks.


Pose2Gaze: Generating Realistic Human Gaze Behaviour from Full-body Poses using an Eye-body Coordination Model

Zhiming Hu,Jiahui Xu,Syn Schmitt,Andreas Bulling

http://arxiv.org/abs/2312.12042v1

Compressor summary: The paper analyzes human eye and body movement coordination during everyday activities, and proposes Pose2Gaze, a model that generates realistic eye movements from full-body poses, outperforming existing methods.


Founder-GPT: Self-play to evaluate the Founder-Idea fit

Sichao Xiong,Yigit Ihlamur

http://arxiv.org/abs/2312.12037v1

Compressor summary: The research develops a new way to evaluate how well startup founders match their ideas using advanced language models, suggesting that each idea's success depends on the founder's background.


Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method

Jiachun Pan,Hanshu Yan,Jun Hao Liew,Jiashi Feng,Vincent Y. F. Tan

http://arxiv.org/abs/2312.12030v1

Compressor summary: SAG is a training-free guidance technique for diffusion models that uses symplectic adjoint method to accurately estimate clean images and generate high-quality images and videos.


EyePreserve: Identity-Preserving Iris Synthesis

Siamul Karim Khan,Patrick Tinsley,Mahsa Mitcheff,Patrick Flynn,Kevin W. Bowyer,Adam Czajka

http://arxiv.org/abs/2312.12028v1

Compressor summary: The paper presents a method to create realistic iris images that change size and preserve identity, improving iris recognition and forensic analysis.


Synergistic Anchored Contrastive Pre-training for Few-Shot Relation Extraction

Da Luo,Yanglei Gan,Rui Hou,Run Lin,Qiao Liu,Yuxiang Cai,Wannian Gao

http://arxiv.org/abs/2312.12021v1

Compressor summary: The paper proposes a novel framework for few-shot relation extraction that uses sentence-anchored and label-anchored contrastive losses to learn robust and uniform representations from incomplete instance-label pairs.


Flexible categorization using formal concept analysis and Dempster-Shafer theory

Marcel Boersma,Krishna Manoorkar,Alessandra Palmigiano,Mattia Panettiere,Apostolos Tzimoulis,Nachoem Wijnberg

http://arxiv.org/abs/2312.12010v1

Compressor summary: The paper presents a method to categorize business processes using bipartite graphs, formal concept analysis, and Dempster-Shafer theory to obtain explainable results for auditing purposes.


Active Preference Inference using Language Models and Probabilistic Reasoning

Top Piriyakulkij,Volodymyr Kuleshov,Kevin Ellis

http://arxiv.org/abs/2312.12009v1

Compressor summary: The paper proposes an algorithm to help large language models ask better questions and infer user preferences more efficiently in interactive systems, improving task performance and reducing user interactions.


Can ChatGPT be Your Personal Medical Assistant?

Md. Rafiul Biswas,Ashhadul Islam,Zubair Shah,Wajdi Zaghouani,Samir Brahim Belhaouari

http://arxiv.org/abs/2312.12006v1

Compressor summary: This study evaluates a fine-tuned ChatGPT model as a personal medical assistant in Arabic, using online datasets and human evaluation metrics.


Diffusing More Objects for Semi-Supervised Domain Adaptation with Less Labeling

Leander van den Heuvel,Gertjan Burghouts,David W. Zhang,Gwenn Englebienne,Sabina B. van Rooij

http://arxiv.org/abs/2312.12000v1

Compressor summary: The text proposes a diffusion model that refines bounding boxes for object detection, improves performance, and uses the results for semi-supervised learning without human involvement.


Coreference Graph Guidance for Mind-Map Generation

Zhuowei Zhang,Mengting Hu,Yinhao Bai,Zhen Zhang

http://arxiv.org/abs/2312.11997v1

Compressor summary: The paper proposes a coreference-guided mind-map generation network (CMGN) that uses a coreference graph to capture structural information and improve the understanding of document logic and semantics, achieving better performance than existing methods.


Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Korrawe Karunratanakul,Konpat Preechakul,Emre Aksan,Thabo Beeler,Supasorn Suwajanakorn,Siyu Tang

http://arxiv.org/abs/2312.11994v1

Compressor summary: DNO is a new method that uses existing motion diffusion models to optimize motion-related tasks without training new models or relying on complex algorithms.


Climate Change from Large Language Models

Hongyin Zhu,Prayag Tiwari

http://arxiv.org/abs/2312.11985v1

Compressor summary: The paper introduces a framework to assess climate crisis knowledge in large language models using diverse questions and comprehensive metrics, revealing gaps in their up-to-date information.


Fluctuation-based Adaptive Structured Pruning for Large Language Models

Yongqi An,Xu Zhao,Tao Yu,Ming Tang,Jinqiao Wang

http://arxiv.org/abs/2312.11983v1

Compressor summary: FLAP is a novel retraining-free structured pruning framework for large language models that reduces storage, enhances inference speed, and outperforms existing methods without retraining.


When Model Meets New Normals: Test-time Adaptation for Unsupervised Time-series Anomaly Detection

Dongmin Kim,Sunghyun Park,Jaegul Choo

http://arxiv.org/abs/2312.11976v1

Compressor summary: The paper proposes a method for detecting anomalies in time-series data that adapts to changing normalities over time and improves performance on real-world benchmarks.


Continual Learning: Forget-free Winning Subnetworks for Video Representations

Haeyong Kang,Jaehong Yoon,Sung Ju Hwang,Chang D. Yoo

http://arxiv.org/abs/2312.11973v1

Compressor summary: The paper introduces Winning Subnetworks (WSN), a method that uses reused weights in dense networks to improve continual learning, and proposes the Fourier Subneural Operator (FSO) to address limitations in video incremental learning.


Expressive Forecasting of 3D Whole-body Human Motions

Pengxiang Ding,Qiongjie Cui,Min Zhang,Mengyuan Liu,Haofan Wang,Donglin Wang

http://arxiv.org/abs/2312.11972v1

Compressor summary: The text proposes a novel framework that predicts both coarse and fine-grained human activities collaboratively, achieving state-of-the-art performance on a large-scale benchmark.


Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives

Chen Gao,Xiaochong Lan,Nian Li,Yuan Yuan,Jingtao Ding,Zhilun Zhou,Fengli Xu,Yong Li

http://arxiv.org/abs/2312.11970v1

Compressor summary: This paper reviews how large language models can improve agent-based modeling and simulation, exploring their challenges and applications in various domains.


GroupMixNorm Layer for Learning Fair Models

Anubha Pandey,Aditi Rai,Maneet Singh,Deepak Bhatt,Tanmoy Bhowmik

http://arxiv.org/abs/2312.11969v1

Compressor summary: The paper proposes a new method to reduce bias in automated prediction algorithms by mixing feature statistics across different groups based on protected attributes, improving fairness metrics with minimal accuracy loss.


Context Disentangling and Prototype Inheriting for Robust Visual Grounding

Wei Tang,Liang Li,Xuejing Liu,Lu Jin,Jinhui Tang,Zechao Li

http://arxiv.org/abs/2312.11967v1

Compressor summary: Our proposed framework for visual grounding uses context disentangling and prototype inheriting to improve discrimination and generalization, achieving state-of-the-art results on standard and open-vocabulary scenes.


Vertical Symbolic Regression

Nan Jiang,Md Nasim,Yexiang Xue

http://arxiv.org/abs/2312.11955v1

Compressor summary: Vertical Symbolic Regression is a method to speed up AI-driven scientific discovery by fitting simple expressions involving a few independent variables at a time and gradually adding more variables, resulting in a significantly smaller search space than horizontal approaches.


Adversarial AutoMixup

Huafeng Qin,Xin Jin,Yun Jiang,Mounim A. El-Yacoubi,Xinbo Gao

http://arxiv.org/abs/2312.11954v1

Compressor summary: AdAutomixup is an adversarial automatic mixup augmentation method that generates diverse and challenging mixed samples to improve image classification accuracy by alternatively optimizing a mixed example generator and a target classifier.


Automatic Parameter Selection for Non-Redundant Clustering

Collin Leiber,Dominik Mautz,Claudia Plant,Christian Böhm

http://arxiv.org/abs/2312.11952v1

Compressor summary: The paper presents a framework using the Minimum Description Length Principle (MDL) to automatically find the number of subspaces and clusters in high-dimensional datasets, with efficient procedures for splitting, merging, and outlier detection.


Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

Rui Liu,Yifan Hu,Yi Ren,Xiang Yin,Haizhou Li

http://arxiv.org/abs/2312.11947v1

Compressor summary: The paper proposes ECSS, a novel model that enhances emotion understanding using a heterogeneous graph-based mechanism and achieves emotion rendering with contrastive learning, improving emotional CSS performance.


Multi-Granularity Information Interaction Framework for Incomplete Utterance Rewriting

Haowei Du,Dingyu Zhang,Chen Li,Yang Li,Dongyan Zhao

http://arxiv.org/abs/2312.11945v1

Compressor summary: The authors propose a new framework for incomplete utterance rewriting that captures multi-granularity semantic information, selects relevant context, and constructs an edit matrix, achieving state-of-the-art results on two benchmark datasets.


Time-Series Contrastive Learning against False Negatives and Class Imbalance

Xiyuan Jin,Jing Wang,Lei Liu,Youfang Lin

http://arxiv.org/abs/2312.11939v1

Compressor summary: This study introduces a modification to time-series contrastive learning that addresses false negatives and class imbalance issues, improving representation learning for minority classes using instance graphs and semi-supervised classification.


DMT: Comprehensive Distillation with Multiple Self-supervised Teachers

Yuang Liu,Jing Wang,Qiang Zhou,Fan Wang,Jun Wang,Wei Zhang

http://arxiv.org/abs/2312.11938v1

Compressor summary: DMT is a pretraining method that uses multiple self-supervised models to improve general visual representations and achieve state-of-the-art results on classification and dense tasks.


Parameterized Decision-making with Multi-modal Perception for Autonomous Driving

Yuyang Xia,Shuncheng Liu,Quanlin Yu,Liwei Deng,You Zhang,Han Su,Kai Zheng

http://arxiv.org/abs/2312.11935v1

Compressor summary: AUTO is a deep reinforcement learning framework for autonomous vehicles that handles complex environments and considers the impact on surrounding vehicles, leading to improved safety, efficiency, and comfort.


Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants

Wei Chen,Zhiyi Huang,Ruichu Cai,Zhifeng Hao,Kun Zhang

http://arxiv.org/abs/2312.11934v1

Compressor summary: The text describes a novel method to identify causal edges between observed variables using higher-order cumulants and latent variable influence, with an asymmetry criterion to determine the causal direction.


Dynamic Frequency Domain Graph Convolutional Network for Traffic Forecasting

Yujie Li,Zezhi Shao,Yongjun Xu,Qiang Qiu,Zhaogang Cao,Fei Wang

http://arxiv.org/abs/2312.11933v1

Compressor summary: The paper proposes a novel dynamic frequency domain graph convolution network (DFDGCN) for traffic prediction that captures spatial dependencies, mitigates time-shift effects, and handles noise in data.


Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery

Pengwei Yan,Kaisong Song,Zhuoren Jiang,Yangyang Kang,Tianqianjin Lin,Changlong Sun,Xiaozhong Liu

http://arxiv.org/abs/2312.11927v1

Compressor summary: DGPM is a new method that improves graph pretraining by discovering and using significant graph motifs for better representation learning and transferability.


Big Learning Expectation Maximization

Yulai Cong,Sijia Li

http://arxiv.org/abs/2312.11926v1

Compressor summary: The Big Learning EM algorithm improves mixture model training by using a foundation model approach to avoid bad local optima and achieve optimal results.


IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition

Xiaomeng Yang,Zhi Qiao,Yu Zhou,Weiping Wang

http://arxiv.org/abs/2312.11923v1

Compressor summary: This paper presents a fast and accurate scene text recognition method using a parallel and iterative decoder with an easy-first strategy and discrete diffusion for bidirectional context exploration.


Relation-Aware Question Answering for Heterogeneous Knowledge Graphs

Haowei Du,Quzhe Huang,Chen Li,Chen Zhang,Yang Li,Dongyan Zhao

http://arxiv.org/abs/2312.11922v1

Compressor summary: The paper proposes a new method for answering questions in knowledge graphs using dual relation graphs that improve the representations of entities and relations and achieve better performance.


External Knowledge Augmented Polyphone Disambiguation Using Large Language Model

Chen Li

http://arxiv.org/abs/2312.11920v1

Compressor summary: The paper proposes a novel TTS method using large language models and prompt learning to disambiguate polyphonic characters in Mandarin Chinese.


A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library

Ganesh Bikshandi,Jay Shah

http://arxiv.org/abs/2312.11918v1

Compressor summary: The paper presents an optimized implementation of the FlashAttention-2 attention algorithm on NVIDIA Hopper GPUs using the CUTLASS library, achieving 20-50% higher FLOPs/s than the previous Ampere version.
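The paper's contribution is a fused CUTLASS kernel for Hopper GPUs, which cannot be reproduced here; the algorithmic core FlashAttention builds on, however — an online (streaming) softmax over key/value tiles that never materializes the full attention matrix — can be sketched in NumPy:

```python
import numpy as np

def attention_online(q, K, V, block=64):
    """Single-query attention via online softmax over K/V tiles.

    Processes keys/values block by block, keeping a running max (m) and
    normalizer (l) so the full score matrix is never formed -- the
    memory-saving trick underlying FlashAttention (a sketch, not the
    paper's Hopper implementation).
    """
    m, l = -np.inf, 0.0
    acc = np.zeros_like(V[0], dtype=np.float64)
    scale = 1.0 / np.sqrt(q.shape[-1])
    for start in range(0, K.shape[0], block):
        s = (K[start:start + block] @ q) * scale   # scores for this tile
        m_new = max(m, s.max())
        p = np.exp(s - m_new)                      # numerically safe exps
        correction = np.exp(m - m_new)             # rescale previous state
        l = l * correction + p.sum()
        acc = acc * correction + p @ V[start:start + block]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q = rng.normal(size=4)
K, V = rng.normal(size=(256, 4)), rng.normal(size=(256, 4))
out = attention_online(q, K, V)

# Reference: the naive, fully materialized computation
s_ref = (K @ q) / np.sqrt(4.0)
e = np.exp(s_ref - s_ref.max())
ref = (e @ V) / e.sum()
```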


EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping

Weipeng Guan,Peiyu Chen,Huibin Zhao,Yu Wang,Peng Lu

http://arxiv.org/abs/2312.11911v1

Compressor summary: EVI-SAM is a novel event-based hybrid tracking framework that tightly couples event, visual, and inertial measurements to track 6 DoF pose and reconstruct dense 3D maps with high accuracy, robustness, and efficiency.


Short-Term Multi-Horizon Line Loss Rate Forecasting of a Distribution Network Using Attention-GCN-LSTM

Jie Liu,Yijia Cao,Yong Li,Yixiu Guo,Wei Deng

http://arxiv.org/abs/2312.11898v1

Compressor summary: The study introduces a new method (Attention-GCN-LSTM) that uses graph convolutional networks, long short-term memory, and attention to predict line loss rates accurately across multiple horizons in distribution networks.


Text-Conditioned Resampler For Long Form Video Understanding

Bruno Korbar,Yongqin Xian,Alessio Tonioni,Andrew Zisserman,Federico Tombari

http://arxiv.org/abs/2312.11897v1

Compressor summary: The paper introduces a text-conditioned video resampler module that uses a frozen visual encoder and a large language model to process long videos for various tasks, achieving state-of-the-art results on several benchmarks.


3D-LFM: Lifting Foundation Model

Mosam Dabhi,Laszlo A. Jeni,Simon Lucey

http://arxiv.org/abs/2312.11894v1

Compressor summary: The 3D Lifting Foundation Model (3D-LFM) is a transformer-based approach that can reconstruct various object classes from 2D landmarks, overcoming limitations of traditional methods and handling occlusions and perspectives.


Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction

Unggi Lee,Sungjun Yoon,Joon Seo Yun,Kyoungsoo Park,YoungHoon Jung,Damji Stratton,Hyeoncheol Kim

http://arxiv.org/abs/2312.11890v1

Compressor summary: The paper introduces new methods to improve knowledge tracing models by considering question and concept difficulty levels using contrastive learning and a large language model for prediction.


ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference

Ziqian Zeng,Yihuai Hong,Hongliang Dai,Huiping Zhuang,Cen Chen

http://arxiv.org/abs/2312.11882v1

Compressor summary: ConsistentEE is a reinforcement learning-based early exiting method for efficient inference that ensures correct prediction by one internal classifier and adapts to instance difficulty.


Punctuation restoration Model and Spacing Model for Korean Ancient Document

Taehong Jang,Joonmo Ahn,Sojung Lucia Kim

http://arxiv.org/abs/2312.11881v1

Compressor summary: The authors developed models to predict punctuation and spacing for Korean historical texts, achieving good results and enabling fast inference on low-performance GPUs.


Point Cloud Segmentation Using Transfer Learning with RandLA-Net: A Case Study on Urban Areas

Alperen Enes Bayar,Ufuk Uyan,Elif Toprak,Cao Yuheng,Tang Juncheng,Ahmet Alp Kindiroglu

http://arxiv.org/abs/2312.11880v1

Compressor summary: The paper applies RandLA-Net with transfer learning and class remapping to segment 3D point cloud data in urban areas of three Chinese cities, overcoming data scarcity and achieving over 80% F1 score in each city.


Sparse is Enough in Fine-tuning Pre-trained Large Language Model

Weixi Song,Zuchao Li,Lefei Zhang,Hai Zhao,Bo Du

http://arxiv.org/abs/2312.11875v1

Compressor summary: The paper investigates the loss landscape transition in fine-tuning pre-trained models, proposes Sparse Increment Fine-Tuning (SIFT) algorithm to exploit sparsity in gradients for efficient adaptation, and shows its effectiveness on various tasks.
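The summary only states that SIFT exploits sparsity in gradients; as an illustration (a generic sparse-increment step, not the paper's actual selection rule), one can update only the top-k weights by gradient magnitude and leave the rest of the pre-trained model untouched:

```python
def sparse_increment_step(weights, grads, lr=0.01, density=0.05):
    """Generic sparse fine-tuning step: update only the fraction
    `density` of weights with the largest absolute gradients.
    """
    k = max(1, int(density * len(grads)))
    threshold = sorted((abs(g) for g in grads), reverse=True)[k - 1]
    return [w - lr * g if abs(g) >= threshold else w
            for w, g in zip(weights, grads)]

# 100 pre-trained weights; only the 5 largest-gradient entries move
w = [0.0] * 100
g = [float(i) for i in range(1, 101)]
w_new = sparse_increment_step(w, g)
```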


Beyond Prototypes: Semantic Anchor Regularization for Better Representation Learning

Yanqi Ge,Qiang Nie,Ye Huang,Yong Liu,Chengjie Wang,Feng Zheng,Wen Li,Lixin Duan

http://arxiv.org/abs/2312.11872v1

Compressor summary: The paper proposes Semantic Anchor Regularization, a method that uses pre-defined class anchors to guide feature learning and achieve compactness within classes and separability between classes, while avoiding biases from long-tailed data.


A Revisit of Fake News Dataset with Augmented Fact-checking by ChatGPT

Zizhong Li,Haopeng Zhang,Jiawei Zhang

http://arxiv.org/abs/2312.11870v1

Compressor summary: The paper proposes an augmented fake news dataset using ChatGPT for fact-checking, which can help reduce bias and improve detection compared to relying on human journalists alone.


Point Cloud Part Editing: Segmentation, Generation, Assembly, and Selection

Kaiyi Zhang,Yang Chen,Ximing Yang,Weizhong Zhang,Cheng Jin

http://arxiv.org/abs/2312.11867v1

Compressor summary: The text proposes a four-stage process for point cloud part editing and introduces SGAS, a model that uses feature disentanglement and constraint strategies to improve diversity, fidelity, and quality.


Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Weiyu Ma,Qirui Mi,Xue Yan,Yuqiao Wu,Runji Lin,Haifeng Zhang,Jun Wang

http://arxiv.org/abs/2312.11865v1

Compressor summary: The paper develops TextStarCraft II, a textual environment for the complex RTS game StarCraft II, together with a Chain of Summarization method that leverages the reasoning abilities of large language model agents; evaluations show the agents master StarCraft II knowledge, match or beat average human players, and defeat the built-in AI.


Neural Network Approximation for Pessimistic Offline Reinforcement Learning

Di Wu,Yuling Jiao,Li Shen,Haizhao Yang,Xiliang Lu

http://arxiv.org/abs/2312.11863v1

Compressor summary: This paper analyzes the theoretical guarantees of deep reinforcement learning in offline decision-making scenarios, considering general neural network approximation and various data properties.


Topo-MLP : A Simplicial Network Without Message Passing

Karthikeyan Natesan Ramamurthy,Aldo Guzmán-Sáenz,Mustafa Hajij

http://arxiv.org/abs/2312.11862v1

Compressor summary: Topo-MLP is a fast and robust MLP-based algorithm to learn representations in higher order network models using a novel HONC loss that implicitly incorporates the simplicial structure.


SimCalib: Graph Neural Network Calibration based on Similarity between Nodes

Boshi Tang,Zhiyong Wu,Xixin Wu,Qiaochu Huang,Jun Chen,Shun Lei,Helen Meng

http://arxiv.org/abs/2312.11858v1

Compressor summary: The paper proposes a novel GNN calibration framework, SimCalib, that considers nodewise similarity at global and local levels to improve performance in cost-sensitive scenarios.


Self-supervised Learning for Enhancing Geometrical Modeling in 3D-Aware Generative Adversarial Network

Jiarong Guo,Xiaogang Xu,Hengshuang Zhao

http://arxiv.org/abs/2312.11856v1

Compressor summary: The paper proposes a self-supervised learning technique for 3D-GANs that uses an encoder and a cyclic constraint to improve the quality of 3D geometrical modeling.


Predicting Human Translation Difficulty with Neural Machine Translation

Zheng Wei Lim,Ekaterina Vylomova,Charles Kemp,Trevor Cohn

http://arxiv.org/abs/2312.11852v1

Compressor summary: The study uses data from human translators and a machine translation model to explore how surprisal and attention affect translation difficulty and time spent on translation tasks.


GCNext: Towards the Unity of Graph Convolutions for Human Motion Prediction

Xinshun Wang,Qiongjie Cui,Chen Chen,Mengyuan Liu

http://arxiv.org/abs/2312.11850v1

Compressor summary: The paper introduces Universal Graph Convolution (UniGC), a concept that unifies different graph convolutions as special cases, and GCNext, a novel GCN-building paradigm that dynamically selects the best-fitting graph convolutions for human motion prediction, achieving state-of-the-art performance with up to 9x lower computational cost.


Active contours driven by local and global intensity fitting energy with application to SAR image segmentation and its fast solvers

Guangming Liu,Qi Liu,Jing Liang,Quanying Sun

http://arxiv.org/abs/2312.11849v1

Compressor summary: The paper presents a new variational active contour model based on Aubert-Aujol denoising that can segment images with multiplicative gamma noise, and proposes two fast fixed point algorithms to solve the problem efficiently.


Initializing Services in Interactive ML Systems for Diverse Users

Avinandan Bose,Mihaela Curmei,Daniel L. Jiang,Jamie Morgenstern,Sarah Dean,Lillian J. Ratliff,Maryam Fazel

http://arxiv.org/abs/2312.11846v1

Compressor summary: The paper proposes a randomized algorithm to learn user preferences for multiple subpopulations with heterogeneous data distributions, and shows that it achieves near-optimal results in terms of total loss.


MixRT: Mixed Neural Representations For Real-Time NeRF Rendering

Chaojian Li,Bichen Wu,Peter Vajda,Yingyan Lin

http://arxiv.org/abs/2312.11841v1

Compressor summary: MixRT is a novel NeRF representation that enables real-time rendering on edge devices using a low-quality mesh, a displacement map, and a compressed model.


The Validity of a Machine Learning-Based Video Game in the Objective Screening of Attention Deficit Hyperactivity Disorder in Children Aged 5 to 12 Years

Zeinab Zakani,Hadi Moradi,Sogand Ghasemzadeh,Maryam Riazi,Fatemeh Mortazavi

http://arxiv.org/abs/2312.11832v1

Compressor summary: The FishFinder is a video game that can accurately identify ADHD in children by measuring their attention, impulsivity, and hyperactivity during gameplay.


Locally-Minimal Probabilistic Explanations

Yacine Izza,Kuldeep S. Meel,Joao Marques-Silva

http://arxiv.org/abs/2312.11831v1

Compressor summary: The paper presents efficient algorithms to compute approximate probabilistic abductive explanations for machine learning, addressing their theoretical and practical complexity.


TESS: A Multi-intent Parser for Conversational Multi-Agent Systems with Decentralized Natural Language Understanding Models

Burak Aksar,Yara Rizk,Tathagata Chakraborti

http://arxiv.org/abs/2312.11828v1

Compressor summary: The paper proposes an efficient algorithm for parsing and orchestrating multi-intent inputs in decentralized NLU for chatbots in multi-agent systems, achieving high accuracy and speed.


Decoupled Textual Embeddings for Customized Image Generation

Yufei Cai,Yuxiang Wei,Zhilong Ji,Jinfeng Bai,Hu Han,Wangmeng Zuo

http://arxiv.org/abs/2312.11826v1

Compressor summary: DETEX is a novel text-to-image generation approach that learns disentangled concept embeddings with multiple word embeddings and attribute mappers, achieving better representation and editability of the target concept.


An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training

Youshao Xiao,Weichang Wu,Zhenglei Zhou,Fagui Mao,Shangchun Zhao,Lin Ju,Lei Liang,Xiaolu Zhang,Jun Zhou

http://arxiv.org/abs/2312.11819v1

Compressor summary: The paper proposes an adaptive model placement framework for distributed RLHF training of large language models, with two strategies (Interleaving and Separation) that avoid the bottlenecks of treating all models as one flattened entity, improving throughput, reducing memory redundancy and communication costs, and significantly outperforming SOTA approaches in various scenarios.


Root Cause Explanation of Outliers under Noisy Mechanisms

Phuoc Nguyen,Truyen Tran,Sunil Gupta,Thin Nguyen,Svetha Venkatesh

http://arxiv.org/abs/2312.11818v1

Compressor summary: The paper proposes a noisy functional causal model for identifying root causes of anomalies in causal processes by considering both node and edge contributions, using Bayesian learning and inference methods and an efficient gradient-based attribution method.


A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking

Shezheng Song,Shan Zhao,Chengyu Wang,Tianwei Yan,Shasha Li,Xiaoguang Mao,Meng Wang

http://arxiv.org/abs/2312.11816v1

Compressor summary: The paper proposes DWE, a Dual-way Enhanced framework for Multimodal Entity Linking that refines queries with multimodal data, leverages fine-grained image attributes, and enriches entity semantics with Wikipedia descriptions, addressing modality impurity and ambiguity and achieving state-of-the-art performance on three benchmarks.


Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment

Fengli Xu,Jun Zhang,Chen Gao,Jie Feng,Yong Li

http://arxiv.org/abs/2312.11813v1

Compressor summary: The paper introduces Urban Generative Intelligence (UGI), a platform that uses large language models to create embodied agents for various urban tasks, simulating and addressing complex urban systems.


Advancements and Challenges in Arabic Optical Character Recognition: A Comprehensive Survey

Mahmoud SalahEldin Kasem,Mohamed Mahmoud,Hyun-Soo Kang

http://arxiv.org/abs/2312.11812v1

Compressor summary: This paper reviews applications, methodologies, and challenges of Arabic Optical Character Recognition (OCR) and identifies research gaps to guide future development.


Gemini: A Family of Highly Capable Multimodal Models

Gemini Team,Rohan Anil,Sebastian Borgeaud,Yonghui Wu,Jean-Baptiste Alayrac,Jiahui Yu,Radu Soricut,Johan Schalkwyk,Andrew M. Dai,Anja Hauth,Katie Millican,David Silver,Slav Petrov,Melvin Johnson,Ioannis Antonoglou,Julian Schrittwieser,Amelia Glaese,Jilin Chen,Emily Pitler,Timothy Lillicrap,Angeliki Lazaridou,Orhan Firat,James Molloy,Michael Isard,Paul R. Barham,Tom Hennigan,Benjamin Lee,Fabio Viola,Malcolm Reynolds,Yuanzhong Xu,Ryan Doherty,Eli Collins,Clemens Meyer,Eliza Rutherford,Erica Moreira,Kareem Ayoub,Megha Goel,George Tucker,Enrique Piqueras,Maxim Krikun,Iain Barr,Nikolay Savinov,Ivo Danihelka,Becca Roelofs,Anaïs White,Anders Andreassen,Tamara von Glehn,Lakshman Yagati,Mehran Kazemi,Lucas Gonzalez,Misha Khalman,Jakub Sygnowski,Alexandre Frechette,Charlotte Smith,Laura Culp,Lev Proleev,Yi Luan,Xi Chen,James Lottes,Nathan Schucher,Federico Lebron,Alban Rrustemi,Natalie Clay,Phil Crone,Tomas Kocisky,Jeffrey Zhao,Bartek Perz,Dian Yu,Heidi Howard,Adam Bloniarz,Jack W. Rae,Han Lu,Laurent Sifre,Marcello Maggioni,Fred Alcober,Dan Garrette,Megan Barnes,Shantanu Thakoor,Jacob Austin,Gabriel Barth-Maron,William Wong,Rishabh Joshi,Rahma Chaabouni,Deeni Fatiha,Arun Ahuja,Ruibo Liu,Yunxuan Li,Sarah Cogan,Jeremy Chen,Chao Jia,Chenjie Gu,Qiao Zhang,Jordan Grimstad,Ale Jakse Hartman,Martin Chadwick,Gaurav Singh Tomar,Xavier Garcia,Evan Senter,Emanuel Taropa,Thanumalayan Sankaranarayana Pillai,Jacob Devlin,Michael Laskin,Diego de Las Casas,Dasha Valter,Connie Tao,Lorenzo Blanco,Adrià Puigdomènech Badia,David Reitter,Mianna Chen,Jenny Brennan,Clara Rivera,Sergey Brin,Shariq Iqbal,Gabriela Surita,Jane Labanowski,Abhi Rao,Stephanie Winkler,Emilio Parisotto,Yiming Gu,Kate Olszewska,Yujing Zhang,Ravi Addanki,Antoine Miech,Annie Louis,Laurent El Shafey,Denis Teplyashin,Geoff Brown,Elliot Catt,Nithya Attaluri,Jan Balaguer,Jackie Xiang,Pidong Wang,Zoe Ashwood,Anton Briukhov,Albert Webson,Sanjay Ganapathy,Smit Sanghavi,Ajay Kannan,Ming-Wei Chang,Axel Stjerngren,Josip Djolonga,Yuting 
Sun,Ankur Bapna,Matthew Aitchison,Pedram Pejman,Henryk Michalewski,Tianhe Yu,Cindy Wang,Juliette Love,Junwhan Ahn,Dawn Bloxwich,Kehang Han,Peter Humphreys,Thibault Sellam,James Bradbury,Varun Godbole,Sina Samangooei,Bogdan Damoc,Alex Kaskasoli,Sébastien M. R. Arnold,Vijay Vasudevan,Shubham Agrawal,Jason Riesa,Dmitry Lepikhin,Richard Tanburn,Srivatsan Srinivasan,Hyeontaek Lim,Sarah Hodkinson,Pranav Shyam,Johan Ferret,Steven Hand,Ankush Garg,Tom Le Paine,Jian Li,Yujia Li,Minh Giang,Alexander Neitz,Zaheer Abbas,Sarah York,Machel Reid,Elizabeth Cole,Aakanksha Chowdhery,Dipanjan Das,Dominika Rogozińska,Vitaly Nikolaev,Pablo Sprechmann,Zachary Nado,Lukas Zilka,Flavien Prost,Luheng He,Marianne Monteiro,Gaurav Mishra,Chris Welty,Josh Newlan,Dawei Jia,Miltiadis Allamanis,Clara Huiyi Hu,Raoul de Liedekerke,Justin Gilmer,Carl Saroufim,Shruti Rijhwani,Shaobo Hou,Disha Shrivastava,Anirudh Baddepudi,Alex Goldin,Adnan Ozturel,Albin Cassirer,Yunhan Xu,Daniel Sohn,Devendra Sachan,Reinald Kim Amplayo,Craig Swanson,Dessie Petrova,Shashi Narayan,Arthur Guez,Siddhartha Brahma,Jessica Landon,Miteyan Patel,Ruizhe Zhao,Kevin Villela,Luyu Wang,Wenhao Jia,Matthew Rahtz,Mai Giménez,Legg Yeung,Hanzhao Lin,James Keeling,Petko Georgiev,Diana Mincu,Boxi Wu,Salem Haykal,Rachel Saputro,Kiran Vodrahalli,James Qin,Zeynep Cankara,Abhanshu Sharma,Nick Fernando,Will Hawkins,Behnam Neyshabur,Solomon Kim,Adrian Hutter,Priyanka Agrawal,Alex Castro-Ros,George van den Driessche,Tao Wang,Fan Yang,Shuo-yiin Chang,Paul Komarek,Ross McIlroy,Mario Lučić,Guodong Zhang,Wael Farhan,Michael Sharman,Paul Natsev,Paul Michel,Yong Cheng,Yamini Bansal,Siyuan Qiao,Kris Cao,Siamak Shakeri,Christina Butterfield,Justin Chung,Paul Kishan Rubenstein,Shivani Agrawal,Arthur Mensch,Kedar Soparkar,Karel Lenc,Timothy Chung,Aedan Pope,Loren Maggiore,Jackie Kay,Priya Jhakra,Shibo Wang,Joshua Maynez,Mary Phuong,Taylor Tobin,Andrea Tacchetti,Maja Trebacz,Kevin Robinson,Yash Katariya,Sebastian Riedel,Paige Bailey,Kefan Xiao,Nimesh 
Ghelani,Lora Aroyo,Ambrose Slone,Neil Houlsby,Xuehan Xiong,Zhen Yang,Elena Gribovskaya,Jonas Adler,Mateo Wirth,Lisa Lee,Music Li,Thais Kagohara,Jay Pavagadhi,Sophie Bridgers,Anna Bortsova,Sanjay Ghemawat,Zafarali Ahmed,Tianqi Liu,Richard Powell,Vijay Bolina,Mariko Iinuma,Polina Zablotskaia,James Besley,Da-Woon Chung,Timothy Dozat,Ramona Comanescu,Xiance Si,Jeremy Greer,Guolong Su,Martin Polacek,Raphaël Lopez Kaufman,Simon Tokumine,Hexiang Hu,Elena Buchatskaya,Yingjie Miao,Mohamed Elhawaty,Aditya Siddhant,Nenad Tomasev,Jinwei Xing,Christina Greer,Helen Miller,Shereen Ashraf,Aurko Roy,Zizhao Zhang,Ada Ma,Angelos Filos,Milos Besta,Rory Blevins,Ted Klimenko,Chih-Kuan Yeh,Soravit Changpinyo,Jiaqi Mu,Oscar Chang,Mantas Pajarskas,Carrie Muir,Vered Cohen,Charline Le Lan,Krishna Haridasan,Amit Marathe,Steven Hansen,Sholto Douglas,Rajkumar Samuel,Mingqiu Wang,Sophia Austin,Chang Lan,Jiepu Jiang,Justin Chiu,Jaime Alonso Lorenzo,Lars Lowe Sjösund,Sébastien Cevey,Zach Gleicher,Thi Avrahami,Anudhyan Boral,Hansa Srinivasan,Vittorio Selo,Rhys May,Konstantinos Aisopos,Léonard Hussenot,Livio Baldini Soares,Kate Baumli,Michael B. 
Chang,Adrià Recasens,Ben Caine,Alexander Pritzel,Filip Pavetic,Fabio Pardo,Anita Gergely,Justin Frye,Vinay Ramasesh,Dan Horgan,Kartikeya Badola,Nora Kassner,Subhrajit Roy,Ethan Dyer,Víctor Campos,Alex Tomala,Yunhao Tang,Dalia El Badawy,Elspeth White,Basil Mustafa,Oran Lang,Abhishek Jindal,Sharad Vikram,Zhitao Gong,Sergi Caelles,Ross Hemsley,Gregory Thornton,Fangxiaoyu Feng,Wojciech Stokowiec,Ce Zheng,Phoebe Thacker,Çağlar Ünlü,Zhishuai Zhang,Mohammad Saleh,James Svensson,Max Bileschi,Piyush Patil,Ankesh Anand,Roman Ring,Katerina Tsihlas,Arpi Vezer,Marco Selvi,Toby Shevlane,Mikel Rodriguez,Tom Kwiatkowski,Samira Daruki,Keran Rong,Allan Dafoe,Nicholas FitzGerald,Keren Gu-Lemberg,Mina Khan,Lisa Anne Hendricks,Marie Pellat,Vladimir Feinberg,James Cobon-Kerr,Tara Sainath,Maribeth Rauh,Sayed Hadi Hashemi,Richard Ives,Yana Hasson,YaGuang Li,Eric Noland,Yuan Cao,Nathan Byrd,Le Hou,Qingze Wang,Thibault Sottiaux,Michela Paganini,Jean-Baptiste Lespiau,Alexandre Moufarek,Samer Hassan,Kaushik Shivakumar,Joost van Amersfoort,Amol Mandhane,Pratik Joshi,Anirudh Goyal,Matthew Tung,Andrew Brock,Hannah Sheahan,Vedant Misra,Cheng Li,Nemanja Rakićević,Mostafa Dehghani,Fangyu Liu,Sid Mittal,Junhyuk Oh,Seb Noury,Eren Sezener,Fantine Huot,Matthew Lamm,Nicola De Cao,Charlie Chen,Gamaleldin Elsayed,Ed Chi,Mahdis Mahdieh,Ian Tenney,Nan Hua,Ivan Petrychenko,Patrick Kane,Dylan Scandinaro,Rishub Jain,Jonathan Uesato,Romina Datta,Adam Sadovsky,Oskar Bunyan,Dominik Rabiej,Shimu Wu,John Zhang,Gautam Vasudevan,Edouard Leurent,Mahmoud Alnahlawi,Ionut Georgescu,Nan Wei,Ivy Zheng,Betty Chan,Pam G Rabinovitch,Piotr Stanczyk,Ye Zhang,David Steiner,Subhajit Naskar,Michael Azzam,Matthew Johnson,Adam Paszke,Chung-Cheng Chiu,Jaume Sanchez Elias,Afroz Mohiuddin,Faizan Muhammad,Jin Miao,Andrew Lee,Nino Vieillard,Sahitya Potluri,Jane Park,Elnaz Davoodi,Jiageng Zhang,Jeff Stanway,Drew Garmon,Abhijit Karmarkar,Zhe Dong,Jong Lee,Aviral Kumar,Luowei Zhou,Jonathan Evens,William Isaac,Zhe Chen,Johnson Jia,Anselm 
Levskaya,Zhenkai Zhu,Chris Gorgolewski,Peter Grabowski,Yu Mao,Alberto Magni,Kaisheng Yao,Javier Snaider,Norman Casagrande,Paul Suganthan,Evan Palmer,Geoffrey Irving,Edward Loper,Manaal Faruqui,Isha Arkatkar,Nanxin Chen,Izhak Shafran,Michael Fink,Alfonso Castaño,Irene Giannoumis,Wooyeol Kim,Mikołaj Rybiński,Ashwin Sreevatsa,Jennifer Prendki,David Soergel,Adrian Goedeckemeyer,Willi Gierke,Mohsen Jafari,Meenu Gaba,Jeremy Wiesner,Diana Gage Wright,Yawen Wei,Harsha Vashisht,Yana Kulizhskaya,Jay Hoover,Maigo Le,Lu Li,Chimezie Iwuanyanwu,Lu Liu,Kevin Ramirez,Andrey Khorlin,Albert Cui,Tian LIN,Marin Georgiev,Marcus Wu,Ricardo Aguilar,Keith Pallo,Abhishek Chakladar,Alena Repina,Xihui Wu,Tom van der Weide,Priya Ponnapalli,Caroline Kaplan,Jiri Simsa,Shuangfeng Li,Olivier Dousse,Fan Yang,Jeff Piper,Nathan Ie,Minnie Lui,Rama Pasumarthi,Nathan Lintz,Anitha Vijayakumar,Lam Nguyen Thiet,Daniel Andor,Pedro Valenzuela,Cosmin Paduraru,Daiyi Peng,Katherine Lee,Shuyuan Zhang,Somer Greene,Duc Dung Nguyen,Paula Kurylowicz,Sarmishta Velury,Sebastian Krause,Cassidy Hardin,Lucas Dixon,Lili Janzer,Kiam Choo,Ziqiang Feng,Biao Zhang,Achintya Singhal,Tejasi Latkar,Mingyang Zhang,Quoc Le,Elena Allica Abellan,Dayou Du,Dan McKinnon,Natasha Antropova,Tolga Bolukbasi,Orgad Keller,David Reid,Daniel Finchelstein,Maria Abi Raad,Remi Crocker,Peter Hawkins,Robert Dadashi,Colin Gaffney,Sid Lall,Ken Franko,Egor Filonov,Anna Bulanova,Rémi Leblond,Vikas Yadav,Shirley Chung,Harry Askham,Luis C. Cobo,Kelvin Xu,Felix Fischer,Jun Xu,Christina Sorokin,Chris Alberti,Chu-Cheng Lin,Colin Evans,Hao Zhou,Alek Dimitriev,Hannah Forbes,Dylan Banarse,Zora Tung,Jeremiah Liu,Mark Omernick,Colton Bishop,Chintu Kumar,Rachel Sterneck,Ryan Foley,Rohan Jain,Swaroop Mishra,Jiawei Xia,Taylor Bos,Geoffrey Cideron,Ehsan Amid,Francesco Piccinno,Xingyu Wang,Praseem Banzal,Petru Gurita,Hila Noga,Premal Shah,Daniel J. 
Mankowitz,Alex Polozov,Nate Kushman,Victoria Krakovna,Sasha Brown,MohammadHossein Bateni,Dennis Duan,Vlad Firoiu,Meghana Thotakuri,Tom Natan,Anhad Mohananey,Matthieu Geist,Sidharth Mudgal,Sertan Girgin,Hui Li,Jiayu Ye,Ofir Roval,Reiko Tojo,Michael Kwong,James Lee-Thorp,Christopher Yew,Quan Yuan,Sumit Bagri,Danila Sinopalnikov,Sabela Ramos,John Mellor,Abhishek Sharma,Aliaksei Severyn,Jonathan Lai,Kathy Wu,Heng-Tze Cheng,David Miller,Nicolas Sonnerat,Denis Vnukov,Rory Greig,Jennifer Beattie,Emily Caveness,Libin Bai,Julian Eisenschlos,Alex Korchemniy,Tomy Tsai,Mimi Jasarevic,Weize Kong,Phuong Dao,Zeyu Zheng,Frederick Liu,Fan Yang,Rui Zhu,Mark Geller,Tian Huey Teh,Jason Sanmiya,Evgeny Gladchenko,Nejc Trdin,Andrei Sozanschi,Daniel Toyama,Evan Rosen,Sasan Tavakkol,Linting Xue,Chen Elkind,Oliver Woodman,John Carpenter,George Papamakarios,Rupert Kemp,Sushant Kafle,Tanya Grunina,Rishika Sinha,Alice Talbert,Abhimanyu Goyal,Diane Wu,Denese Owusu-Afriyie,Cosmo Du,Chloe Thornton,Jordi Pont-Tuset,Pradyumna Narayana,Jing Li,Sabaer Fatehi,John Wieting,Omar Ajmeri,Benigno Uria,Tao Zhu,Yeongil Ko,Laura Knight,Amélie Héliou,Ning Niu,Shane Gu,Chenxi Pang,Dustin Tran,Yeqing Li,Nir Levine,Ariel Stolovich,Norbert Kalb,Rebeca Santamaria-Fernandez,Sonam Goenka,Wenny Yustalim,Robin Strudel,Ali Elqursh,Balaji Lakshminarayanan,Charlie Deck,Shyam Upadhyay,Hyo Lee,Mike Dusenberry,Zonglin Li,Xuezhi Wang,Kyle Levin,Raphael Hoffmann,Dan Holtmann-Rice,Olivier Bachem,Summer Yue,Sho Arora,Eric Malmi,Daniil Mirylenka,Qijun Tan,Christy Koh,Soheil Hassas Yeganeh,Siim Põder,Steven Zheng,Francesco Pongetti,Mukarram Tariq,Yanhua Sun,Lucian Ionita,Mojtaba Seyedhosseini,Pouya Tafti,Ragha Kotikalapudi,Zhiyu Liu,Anmol Gulati,Jasmine Liu,Xinyu Ye,Bart Chrzaszcz,Lily Wang,Nikhil Sethi,Tianrun Li,Ben Brown,Shreya Singh,Wei Fan,Aaron Parisi,Joe Stanton,Chenkai Kuang,Vinod Koverkathu,Christopher A. 
Choquette-Choo,Yunjie Li,TJ Lu,Abe Ittycheriah,Prakash Shroff,Pei Sun,Mani Varadarajan,Sanaz Bahargam,Rob Willoughby,David Gaddy,Ishita Dasgupta,Guillaume Desjardins,Marco Cornero,Brona Robenek,Bhavishya Mittal,Ben Albrecht,Ashish Shenoy,Fedor Moiseev,Henrik Jacobsson,Alireza Ghaffarkhah,Morgane Rivière,Alanna Walton,Clément Crepy,Alicia Parrish,Yuan Liu,Zongwei Zhou,Clement Farabet,Carey Radebaugh,Praveen Srinivasan,Claudia van der Salm,Andreas Fidjeland,Salvatore Scellato,Eri Latorre-Chimoto,Hanna Klimczak-Plucińska,David Bridson,Dario de Cesare,Tom Hudson,Piermaria Mendolicchio,Lexi Walker,Alex Morris,Ivo Penchev,Matthew Mauger,Alexey Guseynov,Alison Reid,Seth Odoom,Lucia Loher,Victor Cotruta,Madhavi Yenugula,Dominik Grewe,Anastasia Petrushkina,Tom Duerig,Antonio Sanchez,Steve Yadlowsky,Amy Shen,Amir Globerson,Adam Kurzrok,Lynette Webb,Sahil Dua,Dong Li,Preethi Lahoti,Surya Bhupatiraju,Dan Hurt,Haroon Qureshi,Ananth Agarwal,Tomer Shani,Matan Eyal,Anuj Khare,Shreyas Rammohan Belle,Lei Wang,Chetan Tekur,Mihir Sanjay Kale,Jinliang Wei,Ruoxin Sang,Brennan Saeta,Tyler Liechty,Yi Sun,Yao Zhao,Stephan Lee,Pandu Nayak,Doug Fritz,Manish Reddy Vuyyuru,John Aslanides,Nidhi Vyas,Martin Wicke,Xiao Ma,Taylan Bilal,Evgenii Eltyshev,Daniel Balle,Nina Martin,Hardie Cate,James Manyika,Keyvan Amiri,Yelin Kim,Xi Xiong,Kai Kang,Florian Luisier,Nilesh Tripuraneni,David Madras,Mandy Guo,Austin Waters,Oliver Wang,Joshua Ainslie,Jason Baldridge,Han Zhang,Garima Pruthi,Jakob Bauer,Feng Yang,Riham Mansour,Jason Gelman,Yang Xu,George Polovets,Ji Liu,Honglong Cai,Warren Chen,XiangHai Sheng,Emily Xue,Sherjil Ozair,Adams Yu,Christof Angermueller,Xiaowei Li,Weiren Wang,Julia Wiesinger,Emmanouil Koukoumidis,Yuan Tian,Anand Iyer,Madhu Gurumurthy,Mark Goldenson,Parashar Shah,MK Blake,Hongkun Yu,Anthony Urbanowicz,Jennimaria Palomaki,Chrisantha Fernando,Kevin Brooks,Ken Durden,Harsh Mehta,Nikola Momchev,Elahe Rahimtoroghi,Maria Georgaki,Amit Raul,Sebastian Ruder,Morgan Redshaw,Jinhyuk Lee,Komal 
Jalan,Dinghua Li,Ginger Perng,Blake Hechtman,Parker Schuh,Milad Nasr,Mia Chen,Kieran Milan,Vladimir Mikulik,Trevor Strohman,Juliana Franco,Tim Green,Demis Hassabis,Koray Kavukcuoglu,Jeffrey Dean,Oriol Vinyals

http://arxiv.org/abs/2312.11805v1

Compressor summary: The report presents Gemini, a new family of multimodal models in various sizes for different applications, which shows impressive performance across image, audio, video, and text understanding tasks.


Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health

Maria Antoniak,Aakanksha Naik,Carla S. Alvarado,Lucy Lu Wang,Irene Y. Chen

http://arxiv.org/abs/2312.11803v1

Compressor summary: The authors propose nine ethical principles for using large language models in healthcare applications, based on input from healthcare workers and birthing people.


MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA

Lang Yu,Qin Chen,Jie Zhou,Liang He

http://arxiv.org/abs/2312.11795v1

Compressor summary: MELO is a plug-in model editing method that dynamically adjusts LLM behavior by activating LoRA blocks indexed in an inner vector database, enabling efficient and effective editing across various NLP tasks.
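
The routing idea behind index-based editing is easy to sketch: each edit stores a key vector (e.g. the embedding of the edited prompt) alongside a low-rank LoRA-style weight delta, and at inference the nearest stored key decides whether any delta is applied. The toy below is an illustrative reduction of that mechanism, not the authors' implementation; all class and parameter names are made up.

```python
import numpy as np

class ToyEditIndex:
    """Toy sketch of index-routed model editing (names are hypothetical)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # min cosine similarity to trigger an edit
        self.keys, self.deltas = [], []

    def add_edit(self, key, A, B):
        # key: embedding of the edited prompt; A @ B: low-rank (LoRA-style) delta
        self.keys.append(key / np.linalg.norm(key))
        self.deltas.append(A @ B)

    def route(self, query, W):
        q = query / np.linalg.norm(query)
        sims = np.array([k @ q for k in self.keys])
        if sims.size and sims.max() >= self.threshold:
            return W + self.deltas[int(sims.argmax())]  # apply the matched edit
        return W  # outside every edit's scope: leave the weights untouched
```

Queries near a stored key receive the corresponding edited weights; unrelated queries fall through to the unmodified matrix, which is what makes this style of editing non-destructive.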


An effective image copy-move forgery detection using entropy image

Zhaowei Lu,Li Jiang

http://arxiv.org/abs/2312.11793v1

Compressor summary: The paper presents an improved keypoint-based algorithm for detecting copy-move forgery in images, which uses entropy images to increase the number of keypoints and a clustering algorithm to handle the non-ideal distribution of grayscale values.
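
The "entropy image" idea is straightforward to illustrate: flat regions have near-zero local entropy while textured regions score high, so running a keypoint detector on the entropy image surfaces candidates a plain intensity image would miss. A minimal pure-NumPy sketch (the function name, window size, and parameters are illustrative, not taken from the paper):

```python
import numpy as np

def local_entropy(gray, win=7, levels=256):
    """Shannon entropy of the grayscale histogram in each win x win window."""
    h, w = gray.shape
    r = win // 2
    padded = np.pad(gray, r, mode="edge")  # replicate borders
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + win, j:j + win]
            hist = np.bincount(patch.ravel(), minlength=levels)
            p = hist[hist > 0] / patch.size
            out[i, j] = -np.sum(p * np.log2(p))  # 0 for flat, high for textured
    return out
```

A uniform patch maps to an all-zero entropy image, while a noisy patch scores several bits per pixel; a real pipeline would compute this far more efficiently (e.g. with sliding-window histogram updates) before keypoint extraction.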


COOPER: Coordinating Specialized Agents towards a Complex Dialogue Goal

Yi Cheng,Wenge Liu,Jian Wang,Chak Tou Leong,Yi Ouyang,Wenjie Li,Xian Wu,Yefeng Zheng

http://arxiv.org/abs/2312.11792v1

Compressor summary: The Cooper framework is a novel dialogue system that coordinates multiple specialized agents to achieve complex goals, such as negotiation and emotional support, by focusing on different aspects of these goals.


Faster Convergence with Multiway Preferences

Aadirupa Saha,Vitaly Feldman,Tomer Koren,Yishay Mansour

http://arxiv.org/abs/2312.11788v1

Compressor summary: The paper studies convex optimization with preference feedback, where it designs efficient algorithms with improved convergence rates for batched and multiway comparisons, and provides lower bounds to show their optimality.
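
The single-comparison baseline this line of work builds on can be sketched in a few lines: probe a random direction, ask a 1-bit preference oracle which side it prefers, and step that way. The toy below shows only that baseline under made-up hyperparameters; the paper's contribution is replacing the one-bit query with batched and multiway comparisons for provably faster convergence.

```python
import numpy as np

def comparison_descent(compare, x0, steps=2000, delta=1e-3, eta=0.05, seed=0):
    """Minimize a convex function using only preference feedback.

    `compare(a, b)` returns True iff f(a) < f(b) -- a 1-bit oracle;
    the function f itself is never evaluated directly here.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        u = rng.normal(size=x.shape)
        u /= np.linalg.norm(u)                    # random unit direction
        prefer_plus = compare(x + delta * u, x - delta * u)
        x = x + eta * u if prefer_plus else x - eta * u  # step toward the preferred side
    return x
```

With a fixed step size the iterate settles into a small neighborhood of the optimum; getting tighter rates (and exploiting m-way comparisons) is exactly what the paper's algorithms address.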


Zero-Shot Fact-Checking with Semantic Triples and Knowledge Graphs

Zhangdie Yuan,Andreas Vlachos

http://arxiv.org/abs/2312.11785v1

Compressor summary: The paper introduces a zero-shot method for fact-checking that uses semantic triples and external knowledge graphs to generalize to unseen data and domains.
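
The triple-matching step at the core of this kind of pipeline reduces to a simple lookup once claims are in (subject, relation, object) form. A hedged toy version (the real system extracts triples from text and queries external knowledge graphs; the verdict logic and all names below are illustrative):

```python
def verdict(claim_triple, kg):
    """Toy zero-shot check of a claim triple against a KG (a set of triples)."""
    s, r, o = claim_triple
    if (s, r, o) in kg:
        return "SUPPORTED"
    # same subject and relation but a different object contradicts the claim
    # (a simplification that assumes the relation is functional)
    if any(ks == s and kr == r for ks, kr, ko in kg):
        return "REFUTED"
    return "NOT ENOUGH INFO"

kg = {("Paris", "capital_of", "France"), ("Ada_Lovelace", "born_in", "1815")}
print(verdict(("Paris", "capital_of", "France"), kg))   # SUPPORTED
print(verdict(("Paris", "capital_of", "Germany"), kg))  # REFUTED
print(verdict(("Oslo", "capital_of", "Norway"), kg))    # NOT ENOUGH INFO
```

Because the check runs against symbolic triples rather than memorized training text, the same procedure transfers to unseen claims and domains, which is the zero-shot property the paper targets.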


Learning Object State Changes in Videos: An Open-World Perspective

Zihui Xue,Kumar Ashutosh,Kristen Grauman

http://arxiv.org/abs/2312.11782v1

Compressor summary: The authors propose VidOSC, a method that uses text and vision-language models to learn how objects change over time without manual annotations, and introduce an open-world benchmark for video object state changes called HowToChange.


Are you talking to ['xem'] or ['x', 'em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity

Anaelia Ovalle,Ninareh Mehrabi,Palash Goyal,Jwala Dhamala,Kai-Wei Chang,Richard Zemel,Aram Galstyan,Yuval Pinter,Rahul Gupta

http://arxiv.org/abs/2312.11779v1

Compressor summary: This paper investigates how data scarcity affects large language models' misgendering of non-binary people and proposes a new tokenization method, PTP, to improve neopronoun consistency.
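
The fragmentation in the title is a direct consequence of how subword tokenizers handle rare strings. A toy greedy longest-match tokenizer (a stand-in for BPE; the vocabulary below is hypothetical, chosen only to reproduce the title's example) shows the effect:

```python
def greedy_tokenize(word, vocab):
    """Greedy longest-match subword tokenization over a fixed vocabulary."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):        # try the longest piece first
            piece = word[i:j]
            if piece in vocab or j == i + 1:     # single chars are always allowed
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"she", "her", "them", "em", "x"}
print(greedy_tokenize("them", vocab))  # ['them']    -> one coherent token
print(greedy_tokenize("xem", vocab))   # ['x', 'em'] -> fragmented
```

Frequent pronouns earn dedicated vocabulary entries, while scarce neopronouns get split into unrelated pieces; pronoun tokenization parity, as proposed in the paper, aims to close that gap.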


Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation

Yuze He,Yushi Bai,Matthieu Lin,Jenny Sheng,Yubin Hu,Qi Wang,Yu-Hui Wen,Yong-Jin Liu

http://arxiv.org/abs/2312.11774v1

Compressor summary: The authors propose a text-to-3D method that improves fine-grained view consistency and reduces floaters and empty spaces by incorporating multi-view image conditions into NeRF optimization.


CAManim: Animating end-to-end network activation maps

Emily Kaczmarek,Olivier X. Miguel,Alexa C. Bowie,Robin Ducharme,Alysha L. J. Dingwall-Harvey,Steven Hawken,Christine M. Armour,Mark C. Walker,Kevin Dick

http://arxiv.org/abs/2312.11772v1

Compressor summary: CAManim is a novel XAI visualization method that animates CAM-based network activation maps through all layers to improve understanding of CNN predictions.


Bridging the Gap: Generalising State-of-the-Art U-Net Models to Sub-Saharan African Populations

Alyssa R. Amod,Alexandra Smith,Pearly Joubert,Confidence Raymond,Dong Zhang,Udunna C. Anazodo,Dodzi Motchon,Tinashe E. M. Mutsvangwa,Sébastien Quetin

http://arxiv.org/abs/2312.11770v1

Compressor summary: The authors investigated how different training approaches affect tumor segmentation models' performance on poor-quality neuroimaging data from Sub-Saharan Africa and found that fine-tuning a model pretrained on high-quality data improved its results.


Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation

Ilias Diakonikolas,Daniel M. Kane,Jasper C. H. Lee,Thanasis Pittas

http://arxiv.org/abs/2312.11769v1

Compressor summary: The paper proposes efficient algorithms for clustering mixtures of bounded covariance distributions with a fine-grained separation assumption and shows their applicability to various settings, including high-dimensional log-concave distributions and robust clustering.


Curriculum Learning for Cooperation in Multi-Agent Reinforcement Learning

Rupali Bhati,Sai Krishna Gottipati,Clodéric Mars,Matthew E. Taylor

http://arxiv.org/abs/2312.11768v1

Compressor summary: The paper investigates which types and curricula of cooperative teammates best train a learning agent in multi-agent reinforcement learning, finding that less skilled but pre-trained teammates and skill-decreasing curricula yield better task performance and overall team reward.


ADMM-MM Algorithm for General Tensor Decomposition

Manabu Mukai,Hidekata Hontani,Tatsuya Yokota

http://arxiv.org/abs/2312.11763v1

Compressor summary: The paper presents a new optimization algorithm for tensor decomposition that supports different loss functions and models, and can be applied to various applications.


MineObserver 2.0: A Deep Learning & In-Game Framework for Assessing Natural Language Descriptions of Minecraft Imagery

Jay Mahajan,Samuel Hum,Jack Henhapl,Diya Yunus,Matthew Gadbury,Emi Brown,Jeff Ginger,H. Chad Lane

http://arxiv.org/abs/2312.11761v1

Compressor summary: MineObserver 2.0 is an improved AI framework that helps assess the accuracy of learner-generated descriptions of Minecraft images related to science, using computer vision and natural language processing, and provides feedback to teachers.