This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-05 generated by the compressor, my personal LLM-based project.
Anh-Quan Cao,Angela Dai,Raoul de Charette
http://arxiv.org/abs/2312.02158v1
Compressor summary: The paper proposes a new task called Panoptic Scene Completion (PSC) that adds instance-level information to the Semantic Scene Completion (SSC) task, and introduces a method to estimate uncertainties using a multi-input multi-output strategy.
Can Wang,Mingming He,Menglei Chai,Dongdong Chen,Jing Liao
http://arxiv.org/abs/2312.02157v1
Compressor summary: The paper proposes a new approach that combines mesh guiding and neural implicit fields for easy editing of 3D scenes while maintaining high-quality rendering.
Shunyuan Zheng,Boyao Zhou,Ruizhi Shao,Boning Liu,Shengping Zhang,Liqiang Nie,Yebin Liu
http://arxiv.org/abs/2312.02155v1
Compressor summary: The GPS-Gaussian method allows real-time, high-resolution view synthesis of characters without fine-tuning or optimization by using Gaussian parameter maps and training on human scan data.
Kangfu Mei,Luis Figueroa,Zhe Lin,Zhihong Ding,Scott Cohen,Vishal M. Patel
http://arxiv.org/abs/2312.02156v1
Compressor summary: The paper proposes using diffusion models to gradually refine shadow regions and improve texture recovery, conditioning on a learned latent feature space and fusing noise features with the network, achieving significant improvements over previous methods.
Yunhang Shen,Chaoyou Fu,Peixian Chen,Mengdan Zhang,Ke Li,Xing Sun,Yunsheng Wu,Shaohui Lin,Rongrong Ji
http://arxiv.org/abs/2312.02153v1
Compressor summary: The paper introduces APE, a universal visual perception model that performs diverse tasks like detection, segmentation, and grounding in images using instance-level sentence-object matching and without task-specific fine-tuning.
Georg Bökman,Johan Edstedt,Michael Felsberg,Fredrik Kahl
http://arxiv.org/abs/2312.02152v1
Compressor summary: The authors propose a method to learn a linear transform called a steerer that encodes rotations in image keypoint descriptions, improving their robustness to camera rotation without sacrificing performance or runtime.
Wele Gedara Chaminda Bandara,Celso M. De Melo,Vishal M. Patel
http://arxiv.org/abs/2312.02151v1
Compressor summary: Mixed Barlow Twins is a method to improve self-supervised learning by enhancing sample interaction and reducing feature overfitting, leading to better downstream task performance.
Grace Luo,Trevor Darrell,Oliver Wang,Dan B Goldman,Aleksander Holynski
http://arxiv.org/abs/2312.02150v1
Compressor summary: Readout Guidance is a method that uses lightweight networks to guide text-to-image diffusion models with user-defined targets, requiring fewer parameters and training samples than prior methods.
Xiaojuan Wang,Janne Kontkanen,Brian Curless,Steve Seitz,Ira Kemelmacher,Ben Mildenhall,Pratul Srinivasan,Dor Verbin,Aleksander Holynski
http://arxiv.org/abs/2312.02149v1
Compressor summary: The text describes a method for creating images that can be zoomed in on extensively while maintaining consistency across different scales using a joint multi-scale diffusion sampling approach.
Sucheng Ren,Zeyu Wang,Hongru Zhu,Junfei Xiao,Alan Yuille,Cihang Xie
http://arxiv.org/abs/2312.02147v1
Compressor summary: The paper presents D-iGPT, a modified version of image-GPT that uses semantic tokens and predicts both visible and hidden tokens, achieving strong visual representation learning results on ImageNet-1K and other tasks.
Hannah Lawrence,Mitchell Tong Harris
http://arxiv.org/abs/2312.02146v1
Compressor summary: Neural networks can solve positivity optimization and certification problems for polynomials faster and more accurately, using data-driven methods and adapting to non-compact group equivariant structures.
Bingxin Ke,Anton Obukhov,Shengyu Huang,Nando Metzger,Rodrigo Caye Daudt,Konrad Schindler
http://arxiv.org/abs/2312.02145v1
Compressor summary: Marigold is a method for monocular depth estimation that uses generative diffusion models to improve generalization and achieve state-of-the-art performance with synthetic training data.
Yunzhong Hou,Xingjian Leng,Tom Gedeon,Liang Zheng
http://arxiv.org/abs/2312.02144v1
Compressor summary: The text describes a novel solution that uses reinforcement learning and transformers to autonomously generate camera configurations for multi-view pedestrian detection systems, achieving better results than existing methods.
Yiming Huang,Zhenghao Lin,Xiao Liu,Yeyun Gong,Shuai Lu,Fangyu Lei,Yaobo Liang,Yelong Shen,Chen Lin,Nan Duan,Weizhu Chen
http://arxiv.org/abs/2312.02143v1
Compressor summary: The paper evaluates GPT-4's reasoning skills on Codeforces problems, finding a decline in performance after September 2021, suggesting data contamination and challenges for existing LLMs to solve complex reasoning tasks.
Kaiyu Yue,Bor-Chun Chen,Jonas Geiping,Hengduo Li,Tom Goldstein,Ser-Nam Lim
http://arxiv.org/abs/2312.02142v1
Compressor summary: The paper proposes an efficient object recognition method using a language decoder to predict text tokens from image embeddings, with a custom attention mask and one-shot sampling for parallel label generation.
Zitong Zhan,Dasong Gao,Yun-Jou Lin,Youjie Xia,Chen Wang
http://arxiv.org/abs/2312.02141v1
Compressor summary: The paper introduces imperative learning (IL), a self-supervised scheme for training feature correspondence in computer vision, which improves performance on tasks like feature matching and pose estimation.
Ali Hatamizadeh,Jiaming Song,Guilin Liu,Jan Kautz,Arash Vahdat
http://arxiv.org/abs/2312.02139v1
Compressor summary: This paper proposes Diffusion Vision Transformers (DiffiT), a hybrid hierarchical architecture with a U-shaped encoder and decoder, that uses time-dependent self-attention for efficient denoising in diffusion-based generative learning, achieving state-of-the-art results on various image synthesis tasks.
Chandradeep Pokhariya,Ishaan N Shah,Angela Xing,Zekun Li,Kefan Chen,Avinash Sharma,Srinath Sridhar
http://arxiv.org/abs/2312.02137v1
Compressor summary: MANUS is a novel method for capturing hand-object grasps using articulated 3D Gaussians, which enables accurate estimation of contacts between hands and objects, and requires tens of camera views.
Qihang Zhang,Yinghao Xu,Yujun Shen,Bo Dai,Bolei Zhou,Ceyuan Yang
http://arxiv.org/abs/2312.02136v1
Compressor summary: The paper proposes a new way to generate large-scale 3D scenes using an equivariant radiance field and a bird's-eye view map, which allows for easy manipulation of objects and smooth stitching of local scenes.
Yao-Chih Lee,Zhoutong Zhang,Kevin Blackburn-Matzen,Simon Niklaus,Jianming Zhang,Jia-Bin Huang,Feng Liu
http://arxiv.org/abs/2312.02135v1
Compressor summary: The paper proposes a fast and efficient method to synthesize novel views from monocular videos using explicit video representations, treating static and dynamic content separately.
Liangxiao Hu,Hongwen Zhang,Yuxiang Zhang,Boyao Zhou,Boning Liu,Shengping Zhang,Liqiang Nie
http://arxiv.org/abs/2312.02134v1
Compressor summary: GaussianAvatar creates realistic human avatars with dynamic 3D appearances from a single video using animatable 3D Gaussians and pose-dependent appearance modeling.
Amir Hertz,Andrey Voynov,Shlomi Fruchter,Daniel Cohen-Or
http://arxiv.org/abs/2312.02133v1
Compressor summary: StyleAligned is a technique for maintaining consistent style across generated images using text-to-image models by sharing attention during the diffusion process.
Edith Cohen,Xin Lyu,Jelani Nelson,Tamas Sarlos,Uri Stemmer
http://arxiv.org/abs/2312.02132v1
Compressor summary: Hot PATE is a method for transferring knowledge from multiple teacher models to a student model while preserving privacy, focusing on diverse tasks where there may not be a clear label for each example.
Vitor Miguel Xavier Peres,Greice Pinho Dal Molin,Soraia Raupp Musse
http://arxiv.org/abs/2312.02128v1
Compressor summary: The research evaluates Ekman's action units in real and virtual human faces, posed and spontaneous, to find differences and similarities that can help various fields of knowledge.
Nikhil Keetha,Jay Karhade,Krishna Murthy Jatavallabhula,Gengshan Yang,Sebastian Scherer,Deva Ramanan,Jonathon Luiten
http://arxiv.org/abs/2312.02126v1
Compressor summary: The paper introduces SplaTAM, a method that uses 3D Gaussians to enable dense SLAM with a single unposed monocular RGB-D camera, improving performance in pose estimation, map construction, and novel-view synthesis while allowing real-time rendering.
Amir Panahandeh,Hanie Asemi,Esmail Nourani
http://arxiv.org/abs/2312.02125v1
Compressor summary: The study trains a Persian classical poetry generation model using a transformer architecture on a specialized dataset without pretraining, and proposes a novel decoding method to enhance coherence and meaningfulness in the generated poetry.
Majed El Helou,Doruk Cetin,Petar Stamenkovic,Fabio Zund
http://arxiv.org/abs/2312.02124v1
Compressor summary: VerA is a versatile facial image anonymization method that preserves semantic areas for medical intervention and works well on before-and-after results, outperforming or matching existing methods in regular images.
Yuxiang Wei,Zhe Wang,Jiawei Liu,Yifeng Ding,Lingming Zhang
http://arxiv.org/abs/2312.02120v1
Compressor summary: Magicoder is an open-source LLM for code that uses OSS-Instruct to generate diverse and realistic instruction data from open-source code snippets, resulting in high performance on coding benchmarks.
Anay Mehrotra,Manolis Zampetakis,Paul Kassianik,Blaine Nelson,Hyrum Anderson,Yaron Singer,Amin Karbasi
http://arxiv.org/abs/2312.02119v1
Compressor summary: The paper introduces TAP, an automated method that uses tree-of-thoughts reasoning to generate jailbreaks for LLMs with only black-box access and minimal queries, outperforming previous methods.
Benjamin Litterer,David Jurgens,Dallas Card
http://arxiv.org/abs/2312.02118v1
Compressor summary: The authors develop a method to identify media storms from online news articles and study their characteristics and effects on media coverage and agenda setting.
Michael Tschannen,Cian Eastwood,Fabian Mentzer
http://arxiv.org/abs/2312.02116v1
Compressor summary: The paper introduces GIVT, which are transformers that generate real-valued vector sequences from infinite vocabularies, and shows their applications in image generation and other tasks.
Lucas Farndale,Robert Insall,Ke Yuan
http://arxiv.org/abs/2312.02111v1
Compressor summary: TriDeNT is a self-supervised method that uses additional data unavailable during inference to improve computational pathology models' performance on various tasks.
Dar-Yen Chen,Hamish Tennent,Ching-Wen Hsu
http://arxiv.org/abs/2312.02109v1
Compressor summary: ArtAdapter is a new text-to-image framework that transfers high-level artistic elements from text to image with unprecedented fidelity and no content borrowing, outperforming existing methods.
Sunghun Kang,Junbum Cha,Jonghwan Mun,Byungseok Roh,Chang D. Yoo
http://arxiv.org/abs/2312.02103v1
Compressor summary: The paper presents PLAC, a method to learn image-to-text mapping for arbitrary concepts in open-vocabulary object detection.
Jan Mielniczuk,Adam Wawrzeńczyk
http://arxiv.org/abs/2312.02095v1
Compressor summary: The paper compares different classifiers for positive unlabeled data in case-control and single-sample scenarios and shows that their performance can vary significantly depending on the scenario.
Mohamad Ali-Dib,Kristen Menou
http://arxiv.org/abs/2312.02091v1
Compressor summary: The paper evaluates LLMs on 50 original PhD-level computational physics problems across different domains and packages, finding that GPT-4 fails most of them but produces mostly correct lines of code with some physics and coding errors.
Yuchao Gu,Yipin Zhou,Bichen Wu,Licheng Yu,Jia-Wei Liu,Rui Zhao,Jay Zhangjie Wu,David Junhao Zhang,Mike Zheng Shou,Kevin Tang
http://arxiv.org/abs/2312.02087v1
Compressor summary: The VideoSwap framework uses semantic point correspondences instead of dense correspondences for shape-preserving video subject swapping with user-friendly interactions.
Maxim Borisyak,Stefan Born,Peter Neubauer,Nicolás Cruz-Bournazou
http://arxiv.org/abs/2312.02079v1
Compressor summary: The authors propose a deep learning method that can handle sparse and irregular time series from bio-process data without needing imputation or alignment procedures, and demonstrate its effectiveness in forecasting tasks.
Giovanni Monea,Maxime Peyrard,Martin Josifoski,Vishrav Chaudhary,Jason Eisner,Emre Kıcıman,Hamid Palangi,Barun Patra,Robert West
http://arxiv.org/abs/2312.02073v1
Compressor summary: The paper introduces Fakepedia, a dataset to study how large language models ground their knowledge in contradictory situations, and analyzes the differences between GPT-4-turbo and Mistral-7B in this context.
Shenhan Qian,Tobias Kirschstein,Liam Schoneveld,Davide Davoli,Simon Giebenhain,Matthias Nießner
http://arxiv.org/abs/2312.02069v1
Compressor summary: GaussianAvatars is a new method for creating realistic and controllable 3D head avatars using Gaussian splats and a parametric morphable face model.
Donya Rooein,Amanda Cercas Curry,Dirk Hovy
http://arxiv.org/abs/2312.02065v1
Compressor summary: The paper evaluates the readability of answers generated by large language models (LLMs) for science questions targeting different age groups and education levels, finding that LLMs need improvement to better adapt to diverse audiences in educational settings.
Marco Cotogni,Jacopo Bonato,Luigi Sabetta,Francesco Pelosin,Alessandro Nicolosi
http://arxiv.org/abs/2312.02052v1
Compressor summary: DUCK is a new unlearning algorithm that uses metric learning to remove specific samples and achieve state-of-the-art performance in ensuring privacy in AI models.
Shuhuai Ren,Linli Yao,Shicheng Li,Xu Sun,Lu Hou
http://arxiv.org/abs/2312.02051v1
Compressor summary: TimeChat is a multimodal language model that can understand long videos by processing their visual and temporal information and following instructions from users.
Han Zhang,Quan Gan,David Wipf,Weinan Zhang
http://arxiv.org/abs/2312.02037v1
Compressor summary: The paragraph describes a novel framework called Graph-based Feature Synthesis (GFS) that uses heterogeneous graphs to train machine learning models on multi-table relational databases without feature engineering, preserving the data's inherent relationships and structure.
Mohammad Altillawi,Shile Li,Sai Manoj Prakhya,Ziyuan Liu,Joan Serrat
http://arxiv.org/abs/2312.02029v1
Compressor summary: The paper proposes a learning method for global visual localization that uses minimal pose labels to learn 3D scene geometry and improve pose estimation accuracy using rigid alignment and additional learning constraints.
Christoph Hümmer,Manuel Schwonberg,Liangwei Zhong,Hu Cao,Alois Knoll,Hanno Gottschalk
http://arxiv.org/abs/2312.02021v1
Compressor summary: VLTSeg is a vision-language method that improves domain generalization in semantic segmentation by using CLIP and EVA-CLIP encoders, outperforming previous approaches on several benchmarks.
Xingyuan Zhang,Philip Becker-Ehmck,Patrick van der Smagt,Maximilian Karl
http://arxiv.org/abs/2312.02019v1
Compressor summary: The paper introduces AIME, a method that enables agents to learn new behaviors by imitating expert demonstrations without needing further training or environment interactions.
Yufei Shi,Beijia Lu,Jia-Wei Liu,Ming Li,Mike Zheng Shou
http://arxiv.org/abs/2312.02015v1
Compressor summary: The ColonNeRF framework uses neural rendering to reconstruct the entire colon in a piecewise manner, addressing challenges like shape dissimilarity, geometry complexity, and sparse views for accurate long-sequence colonoscopy reconstruction.
M. R. Mahani,Igor A. Nechepurenko,Yasmin Rahimof,Andreas Wicht
http://arxiv.org/abs/2312.02012v1
Compressor summary: The text proposes a method for creating a minimal yet informative database using Bayesian optimization and Gaussian process regression to train accurate machine learning models with less data points.
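The entry above relies on two standard building blocks, Gaussian process regression and uncertainty-driven point selection. As a rough illustration only (not the paper's actual pipeline), a minimal NumPy sketch of a GP posterior with an RBF kernel, used to pick the candidate point the model is least certain about:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """GP posterior mean and pointwise variance at `x_query`."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_query, x_train)
    Kss = rbf_kernel(x_query, x_query)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

def select_next_point(x_train, y_train, candidates):
    """Pick the candidate where the GP is most uncertain (pure exploration)."""
    _, var = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(var)]
```

The paper's Bayesian optimization would use an acquisition function trading off exploration and exploitation; this sketch only shows the exploration half.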
Duo Zheng,Shijia huang,Lin Zhao,Yiwu Zhong,Liwei Wang
http://arxiv.org/abs/2312.02010v1
Compressor summary: The paper introduces NaviLLM, a generalist AI model for embodied navigation that adapts large language models to various tasks using schema-based instructions and achieves state-of-the-art performance and generalizability.
Geonmo Gu,Sanghyuk Chun,Wonjae Kim,Yoohoon Kang,Sangdoo Yun
http://arxiv.org/abs/2312.01998v1
Compressor summary: The proposed LinCIR framework trains a composed image retrieval model using only text datasets with a self-supervision technique called self-masking projection, achieving high performance on four benchmarks and outperforming some supervised methods.
Jungwon Choi,Seongho Keum,EungGu Yun,Byung-Hoon Kim,Juho Lee
http://arxiv.org/abs/2312.01994v1
Compressor summary: The authors propose a generative self-supervised learning method for graph neural networks to improve accuracy and interpretability in modeling dynamic functional connectivity from fMRI data, addressing challenges such as high data cost and limited generalization.
Mohammad Ali Vahedifar,Azim Akhtarshenas,Mariam Sabbaghian,Mohammad Rafatpanah
http://arxiv.org/abs/2312.01991v1
Compressor summary: The paper introduces IMKNN, a novel method that improves the KNN algorithm by using Mutual Information and Shapley values to assign weights to neighbors, and shows its superior performance in various classification tasks.
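For context, the baseline that IMKNN modifies is weighted KNN, where each neighbor's vote carries a weight. A minimal sketch with default inverse-distance weights (the paper's mutual-information and Shapley-value weighting is not reproduced here):

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, weight_fn=None):
    """Classify `query` by a weighted vote of its k nearest neighbors.

    `train` is a list of (feature_vector, label) pairs. `weight_fn` maps a
    distance to a vote weight; inverse-distance weighting is the default.
    """
    if weight_fn is None:
        weight_fn = lambda d: 1.0 / (d + 1e-9)  # closer neighbors vote more
    # Euclidean distance from every training point to the query.
    dists = [(math.dist(x, query), y) for x, y in train]
    dists.sort(key=lambda t: t[0])
    votes = defaultdict(float)
    for d, y in dists[:k]:
        votes[y] += weight_fn(d)
    return max(votes, key=votes.get)
```

Swapping in a different `weight_fn` is the generic hook where schemes like IMKNN's plug in.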
Ziteng Gao,Zhan Tong,Kevin Qinghong Lin,Joya Chen,Mike Zheng Shou
http://arxiv.org/abs/2312.01987v1
Compressor summary: The paper proposes a method to bootstrap SparseFormer architectures from ViT-based vision foundation models, reducing computational costs and enabling zero-shot performance with minimal training samples.
Lu Qi,Lehan Yang,Weidong Guo,Yu Xu,Bo Du,Varun Jampani,Ming-Hsuan Yang
http://arxiv.org/abs/2312.01985v1
Compressor summary: The paper presents a new representation for diffusion models that enables image generation, segmentation, and adaptation to various tasks with efficient modules and inpainting pipeline.
Haodong Zhang,ZhiKe Chen,Haocheng Xu,Lei Hao,Xiaofei Wu,Songcen Xu,Zhensong Zhang,Yue Wang,Rong Xiong
http://arxiv.org/abs/2312.01964v1
Compressor summary: The SMT method uses vision-language models to extract and maintain meaningful motion semantics for motion retargeting between animation characters, with a two-stage pipeline that ensures preservation of both fine-grained details and high-level semantics.
Francesca Cairoli,Luca Bortolussi,Nicola Paoletti
http://arxiv.org/abs/2312.01959v1
Compressor summary: This tutorial explains how to use machine learning and conformal prediction to efficiently predict and monitor future violations of requirements in complex systems.
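Split conformal prediction, one ingredient of the tutorial above, reduces to a quantile computation over held-out residuals. A minimal sketch, assuming exchangeable calibration and test data:

```python
import math

def split_conformal_interval(residuals, alpha=0.1):
    """Half-width q of a split conformal prediction interval.

    `residuals` are absolute errors |y_i - yhat_i| on a held-out calibration
    set. With probability at least 1 - alpha, a new point's true value lies
    in [yhat - q, yhat + q].
    """
    n = len(residuals)
    # Conformal quantile rank: ceil((n + 1)(1 - alpha)), clipped to the sample.
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)
    return sorted(residuals)[rank - 1]
```

The `(n + 1)` correction is what turns an ordinary empirical quantile into a finite-sample coverage guarantee.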
Victor Gallego
http://arxiv.org/abs/2312.01957v1
Compressor summary: The paper presents dSC, a method for improving LLM outputs using synthetic data and Bayesian inference, and shows its potential in various tasks.
Andrea Papaluca,Daniel Krefl,Sergio Mendez Rodriguez,Artem Lensky,Hanna Suominen
http://arxiv.org/abs/2312.01954v1
Compressor summary: The authors evaluate how Large Language Models (LLMs) use contextual information from a Knowledge Base (KB) to perform Triplet Extraction (TE) in Zero- and Few-Shots settings, finding that the quality of the KB context strongly affects TE performance.
Jian Lin,Chengze Li,Xueting Liu,Zhongping Ge
http://arxiv.org/abs/2312.01943v1
Compressor summary: The authors introduce a new dataset and model for accurately segmenting characters in cartoons, which enables various creative applications in cartoon editing.
Markus Wulfmeier,Arunkumar Byravan,Sarah Bechtle,Karol Hausman,Nicolas Heess
http://arxiv.org/abs/2312.01939v1
Compressor summary: The paragraph discusses how artificial intelligence systems are becoming more general and the challenges and opportunities in improving their knowledge representation and transfer across different domains using reinforcement learning modalities.
Xiaobo Hu,Youfang Lin,Yue Liu,Jinwen Wang,Shuo Wang,Hehe Fan,Kai Lv
http://arxiv.org/abs/2312.01915v1
Compressor summary: The BiT model leverages bidirectional prediction of environmental transitions to extract reliable representations for vision-based control tasks.
Alexander Frotscher,Jaivardhan Kapoor,Thomas Wolfers,Christian F. Baumgartner
http://arxiv.org/abs/2312.01904v1
Compressor summary: ANDi is a new unsupervised anomaly detection method for brain MRI that outperforms existing approaches and better identifies diverse types of anomalies.
Min Yang,Huan Gao,Ping Guo,Limin Wang
http://arxiv.org/abs/2312.01897v1
Compressor summary: The paper proposes a new mechanism for adapting pre-trained ViT models as a unified long-form video transformer to capture inter-snippet relations for temporal action detection in untrimmed videos, while maintaining low computation overhead and memory consumption.
Cameron Martin,Fucai Ke,Hao Wang
http://arxiv.org/abs/2312.01887v1
Compressor summary: This paper presents a novel method for detecting electric vehicle (EV) charging at the feeder level using sliding-window feature extraction and machine learning techniques, achieving high accuracy in both offline and online detection.
Xunguang Wang,Zhenlan Ji,Pingchuan Ma,Zongjie Li,Shuai Wang
http://arxiv.org/abs/2312.01886v1
Compressor summary: The paper proposes InstructTA, a targeted adversarial attack on large vision-language models that uses a text-to-image model, GPT-4, and a local surrogate model to generate instruction-aware features and optimize the adversarial example.
Mattia Setzu,Salvatore Ruggieri
http://arxiv.org/abs/2312.01884v1
Compressor summary: The paragraph discusses how univariate and multivariate decision trees, which partition data differently, have similar performance despite the latter being more powerful, possibly due to dataset pre-processing bias.
Yimin Sun,Chao Wang,Yan Peng
http://arxiv.org/abs/2312.01882v1
Compressor summary: The paper introduces ZFDDA, a zero-shot VQA model for flood damage assessment, and FFD-IQA, a new dataset with diverse question types and more data to evaluate the model's performance.
Xingtong Yu,Zemin Liu,Yuan Fang,Xinming Zhang
http://arxiv.org/abs/2312.01878v1
Compressor summary: HGPROMPT is a novel framework that unifies pre-training and downstream tasks for homogeneous and heterogeneous graphs using dual-template design and dual-prompt to bridge the gap between them.
Yitao Peng,Lianghua He,Die Hu,Yihang Liu,Longzhen Yang,Shaohua Shang
http://arxiv.org/abs/2312.01871v1
Compressor summary: The paper proposes FeaInfNet, a model for interpretable medical image diagnosis that simulates doctors' reasoning process, uses local feature masks and adaptive dynamic masks to enhance expressivity and interpretability, and achieves state-of-the-art performance on multiple datasets.
Zichao Li,Ines Arous,Siva Reddy,Jackie C. K. Cheung
http://arxiv.org/abs/2312.01858v1
Compressor summary: The paragraph discusses a proposed evaluation protocol, DepEdit, for assessing the editing process of large language models (LLMs) with respect to logical constraints and implications of edited facts.
Joshua Niemeijer,Manuel Schwonberg,Jan-Aike Termöhlen,Nico M. Schmidt,Tim Fingscheidt
http://arxiv.org/abs/2312.01850v1
Compressor summary: The authors propose DIDEX, a diffusion-based domain extension method that generates diverse pseudo-target images with text prompts and trains a model to adapt towards them, achieving improved domain generalization results on various datasets without using target data.
Xusen Sun,Longhao Zhang,Hao Zhu,Peng Zhang,Bang Zhang,Xinya Ji,Kangneng Zhou,Daiheng Gao,Liefeng Bo,Xun Cao
http://arxiv.org/abs/2312.01841v1
Compressor summary: VividTalk is a framework that generates high-quality talking head videos with lip-sync, expressive facial expressions, natural head pose, and high video quality by learning two motions in two stages.
Yuxia Geng,Jiaoyan Chen,Yuhang Zeng,Zhuo Chen,Wen Zhang,Jeff Z. Pan,Yuxiang Wang,Xiaoliang Xu
http://arxiv.org/abs/2312.01837v1
Compressor summary: The paper proposes a new Knowledge Graph Completion method (PDKGC) that uses prompts to train a frozen pre-trained language model, improving entity prediction by combining textual and structural information.
Longhui Yuan,Shuang Li,Zhuo He,Binhui Xie
http://arxiv.org/abs/2312.01835v1
Compressor summary: The paper proposes active test-time adaptation (ATASeg) for semantic segmentation, which uses a human-in-the-loop pattern to query few labels online and reduce the performance gap between unsupervised and supervised methods.
Zhangyue Yin,Qiushi Sun,Cheng Chang,Qipeng Guo,Junqi Dai,Xuanjing Huang,Xipeng Qiu
http://arxiv.org/abs/2312.01823v1
Compressor summary: The Exchange-of-Thought framework allows large language models to communicate with each other during problem-solving, improving their performance on complex reasoning tasks by incorporating external insights.
Elizaveta Tennant,Stephen Hailes,Mirco Musolesi
http://arxiv.org/abs/2312.01818v1
Compressor summary: The paper explores different approaches to embedding morality in AI systems, argues that hybrid solutions combining hard-coded rules and learned preferences are needed, and presents three case studies using reinforcement learning to provide moral principles to agents.
Wassim Tenachi,Rodrigo Ibata,Thibaut L. François,Foivos I. Diakogiannis
http://arxiv.org/abs/2312.01816v1
Compressor summary: Class Symbolic Regression is a framework that finds a single function to fit multiple data sets with different parameters, using the idea that similar phenomena follow common laws. It improves on previous symbolic regression methods by integrating dimensional analysis and deep reinforcement learning, and shows its usefulness in astrophysics by finding an analytic galaxy potential from simulated orbits.
Christopher Diehl,Tobias Klosek,Martin Krüger,Nils Murzyn,Timo Osterburg,Torsten Bertram
http://arxiv.org/abs/2312.01811v1
Compressor summary: The authors propose a game-theoretic approach to model multi-agent interactions in robotics, combining energy-based models with neural networks for inference and optimization, which improves interpretability and predictive performance.
Nicola Dall'Asen,Willi Menapace,Elia Peruzzo,Enver Sangineto,Yiming Wang,Elisa Ricci
http://arxiv.org/abs/2312.01800v1
Compressor summary: The text describes a novel AI-based collaborative painting task that aims to produce coherent paintings with humans and machines, using parametrized strokes, attention mechanisms, and a new dataset.
Martin Hellkvist,Ayça Özçelikkale,Anders Ahlén
http://arxiv.org/abs/2312.01795v1
Compressor summary: The paper analytically characterizes the generalization error of COCOA, a distributed learning algorithm, for continual learning with time-varying signals, shows how network size, task similarity, and the number of tasks affect that error, and demonstrates the results on a digit classification task.
Sergey Kolesnikov
http://arxiv.org/abs/2312.01792v1
Compressor summary: Wild-Tab is a benchmark for testing out-of-distribution generalization in tabular regression tasks, using real-world datasets from weather prediction and power consumption estimation.
Konstantinos Triaridis,Vasileios Mezaris
http://arxiv.org/abs/2312.01790v1
Compressor summary: The paper presents two methods for merging the outputs of different filters to improve image manipulation localization and detection (IMLD), achieving competitive results compared to existing approaches.
Chengyin Hu,Weiwen Shi
http://arxiv.org/abs/2312.01789v1
Compressor summary: This paper introduces a novel attack method, TOUAP, for compromising cross-modal visible-infrared detectors in real-world scenarios using a two-stage optimization process involving an irregular polygonal infrared patch and a color QR code.
Toygar Tanyel,Besher Alkurdi,Serkan Ayvaz
http://arxiv.org/abs/2312.01787v1
Compressor summary: The paper proposes a data augmentation method to reduce human bias in offensive language detection on social media, aiming to improve accuracy and fairness in classifying offensive content across multiple languages.
Jiarui Xu,Yossi Gandelsman,Amir Bar,Jianwei Yang,Jianfeng Gao,Trevor Darrell,Xiaolong Wang
http://arxiv.org/abs/2312.01771v1
Compressor summary: The paper introduces IMProv, a generative model that learns visual tasks from textual and image prompts, achieving improvements in various computer vision tasks.
Ameiy Acharya,Chakka Sai Pradeep,Neelam Sinha
http://arxiv.org/abs/2312.01768v1
Compressor summary: The study uses fMRI and NSS to identify brain regions in the DMN that are most impacted by MCI, finding significant differences for PCC and Fusiform nodes.
Chen Zhang,Guorong Li,Yuankai Qi,Hanhua Ye,Laiyun Qing,Ming-Hsuan Yang,Qingming Huang
http://arxiv.org/abs/2312.01764v1
Compressor summary: The paper proposes a Dynamic Erasing Network (DE-Net) that learns multi-scale temporal features for weakly supervised video anomaly detection, handling duration variations and encouraging the discovery of gentle abnormal segments.
Yuntao Shou,Wei Ai,Tao Meng,Keqin Li
http://arxiv.org/abs/2312.01758v1
Compressor summary: The paper proposes a novel method called CZL-CIAE that leverages CLIP and FourierFormer to improve zero-shot age estimation from images and text, leading to better prediction results.
Yousuf Rayhan Emon,Md Golam Rabbani,Dr. Md. Taimur Ahad,Faruk Ahmed
http://arxiv.org/abs/2312.01756v1
Compressor summary: The paragraph discusses a literature review of machine learning methods for detecting sweet orange leaf diseases using image classification techniques.
Charika De Alvis,Dishanika Denipitiyage,Suranga Seneviratne
http://arxiv.org/abs/2312.01753v1
Compressor summary: Rebalanced Contrastive Learning (RCL) improves long tail classification by balancing feature space, reducing intra-class distance, and regularizing margins for imbalanced classes.
Qiaole Dong,Bo Zhao,Yanwei Fu
http://arxiv.org/abs/2312.01746v1
Compressor summary: The authors reproduce the closed-source DDVM model for image-to-image translation, making it open-source, and achieve comparable performance to the original with public data and GPUs.
Dixuan Lin,Yixing Peng,Jingke Meng,Wei-Shi Zheng
http://arxiv.org/abs/2312.01745v1
Compressor summary: The paper proposes a new method for person re-identification that builds fine bidirectional cross-modal associations between visual and textual modalities using adaptive dual association modules, ATP and ARA.
Ryo Watanabe,Yusuke Mukuta,Tatsuya Harada
http://arxiv.org/abs/2312.01742v1
Compressor summary: The paper proposes a novel approach, FSDDIM, to create a diffusion model within spiking neural networks (SNNs) using synaptic current learning (SCL), which enables high-speed and low-energy image generation while maintaining the advantages of SNNs.
Bingkun Nian,Fenghe Tang,Jianrui Ding,Pingping Zhang,Jie Yang,S. Kevin Zhou,Wei Liu
http://arxiv.org/abs/2312.01741v1
Compressor summary: The paper introduces a new deep neural network for weak target image segmentation that leverages reconstruction tasks and outperforms existing methods on seven datasets.
Hui Ouyang,Cheng Chen,Ke Tang
http://arxiv.org/abs/2312.01739v1
Compressor summary: This paper introduces a novel divide-and-conquer strategy for large-scale Dynamic Bayesian Network structure learning, specifically focusing on Time-sliced Bayesian Networks, and shows substantial improvements in scalability, accuracy, and computational efficiency.
Yunhao Liu,Lu Qi,Yu-Ju Tsai,Xiangtai Li,Kelvin C. K. Chan,Ming-Hsuan Yang
http://arxiv.org/abs/2312.01734v1
Compressor summary: The paper proposes an adapter for face recognition models that processes both low-quality and enhanced images using dual-input structures to overcome the limitations of traditional approaches and achieve better performance in the wild.
Fan Lu,Kai Zhu,Kecheng Zheng,Wei Zhai,Yang Cao
http://arxiv.org/abs/2312.01732v1
Compressor summary: The paper proposes a new method for detecting out-of-distribution samples in images and text that uses semantic alignment and likelihood-aware sampling to adapt to complex domain transformations.
Jie Liu,Qilin Li,Senjian An,Bradley Ezard,Ling Li
http://arxiv.org/abs/2312.01729v1
Compressor summary: EdgeConvFormer is a novel anomaly detection method for multivariate time series that combines Time2vec embedding, dynamic graph CNN, and Transformer to extract global and local spatial-time information and outperforms existing approaches on various real-world datasets.
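Time2vec itself is a standard published embedding; as a point of reference, here is a minimal sketch of its usual formulation (a generic illustration, not the paper's EdgeConvFormer implementation — `omega` and `phi` would be learnable parameters in practice):

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Generic Time2vec embedding of a scalar timestamp tau.

    The first component is linear in time; the remaining components
    are periodic (sine), letting the model capture both trends and
    cyclic patterns. omega and phi are fixed here for illustration.
    """
    linear = omega[0] * tau + phi[0]
    periodic = np.sin(omega[1:] * tau + phi[1:])
    return np.concatenate([[linear], periodic])

# Example: embed timestamp tau = 3.0 into a 4-dimensional vector
omega = np.array([0.5, 1.0, 2.0, 4.0])
phi = np.zeros(4)
emb = time2vec(3.0, omega, phi)
```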
Tong Nie,Guoyang Qin,Yuewen Mei,Jian Sun
http://arxiv.org/abs/2312.01728v1
Compressor summary: The paper proposes an effective and versatile deep neural model for multivariate time series imputation, which incorporates low-rank properties and achieves superior performance on various datasets.
Jeongho Kim,Gyojung Gu,Minho Park,Sunghyun Park,Jaegul Choo
http://arxiv.org/abs/2312.01725v1
Compressor summary: StableVITON is an image-based virtual try-on method that leverages a pre-trained diffusion model to learn the semantic correspondence between clothing and body in its latent space, using zero cross-attention blocks, an attention total variation loss, and augmentation to preserve clothing details and generate high-fidelity images with sharp attention maps.
Moritz Lampert,Ingo Scholtes
http://arxiv.org/abs/2312.01721v1
Compressor summary: The self-loop paradox is a phenomenon where the information a node gains from itself can be smaller in graphs with self-loops than in graphs without them, depending on the GNN architecture, the number of layers, and whether the layer number is even or odd.
Bingshuai Liu,Chenyang Lyu,Zijun Min,Zhanyu Wang,Jinsong Su,Longyue Wang
http://arxiv.org/abs/2312.01714v1
Compressor summary: The paper proposes a method to improve multi-modal reasoning in LLMs by using retrieval mechanisms to select relevant examples, achieving state-of-the-art results on ScienceQA dataset.
Xubin Zhong,Changxing Ding,Yupeng Hu,Dacheng Tao
http://arxiv.org/abs/2312.01713v1
Compressor summary: The paper proposes Shunted Cross-Attention (SCA) and Interaction-aware Pose Estimation (IPE) to improve one-stage HOI detection by extracting disentangled interaction representations using different attention heads and a novel attention module for human pose features.
Mingyue Guo,Li Yuan,Zhaoyi Yan,Binghui Chen,Yaowei Wang,Qixiang Ye
http://arxiv.org/abs/2312.01711v1
Compressor summary: mPrompt is a method that uses both point and segmentation annotations to guide each other, reducing bias and improving accuracy in crowd counting tasks.
Lei Wang,Jiabang He,Shenshen Li,Ning Liu,Ee-Peng Lim
http://arxiv.org/abs/2312.01701v1
Compressor summary: The paper introduces ReCaption, a framework to reduce fine-grained object hallucinations in instruction-tuned large vision-language models using ChatGPT and a new probing-based evaluation method called Fine-Grained Object Hallucination Evaluation.
Zige Wang,Wanjun Zhong,Yufei Wang,Qi Zhu,Fei Mi,Baojun Wang,Lifeng Shang,Xin Jiang,Qun Liu
http://arxiv.org/abs/2312.01700v1
Compressor summary: The paragraph discusses the importance of data management in training Large Language Models, and provides a survey of current research and challenges in this field.
Jinguo Cheng,Ke Li,Yuxuan Liang,Lijun Sun,Junchi Yan,Yuankai Wu
http://arxiv.org/abs/2312.01699v1
Compressor summary: SUMformer is a novel approach to urban mobility prediction that treats city data as complex multivariate time series and uses a special attention mechanism to capture temporal and cross-variable correlations, achieving better results than existing methods.
Yizhou Wang,Yixuan Wu,Shixiang Tang,Weizhen He,Xun Guo,Feng Zhu,Lei Bai,Rui Zhao,Jian Wu,Tong He,Wanli Ouyang
http://arxiv.org/abs/2312.01697v1
Compressor summary: Hulk is a multimodal human-centric generalist model that can handle various perception tasks without fine-tuning, by using discrete and continuous representations for different modalities.
Zhenxin Li,Shiyi Lan,Jose M. Alvarez,Zuxuan Wu
http://arxiv.org/abs/2312.01696v1
Compressor summary: The paper proposes BEVNeXt, a dense Bird's Eye View framework for 3D object detection that combines depth estimation, temporal aggregation, and perspective techniques with CRF-modulated depth embedding, achieving state-of-the-art results on the nuScenes benchmark.
Bracha Laufer-Goldshtein,Adam Fisch,Regina Barzilay,Tommi Jaakkola
http://arxiv.org/abs/2312.01692v1
Compressor summary: The paper proposes a method to find machine learning model configurations that balance various risks and metrics using Bayesian Optimization and risk-controlling procedures.
Hongjie Liu,Haotian Shi,Sicheng Fu,Tengfei Yuan,Xinhuan Zhang,Hongzhe Xu,Bin Ran
http://arxiv.org/abs/2312.01687v1
Compressor summary: The study presents a method for extracting features from bus travel data using Point of Interest (POI) data and enhanced P-KMEANS and P-LDA algorithms, which can increase bus travel attractiveness and usage while reducing congestion and emissions through a better understanding of travel behavior.
Shi Zhenning,Dong Changsheng,Xie Xueshuo,Pan Bin,He Along,Li Tao
http://arxiv.org/abs/2312.01682v1
Compressor summary: ResEnsemble-DDPM is a method that combines denoising diffusion probabilistic models and end-to-end models for better image segmentation by introducing a residual term and using ensemble learning.
Haochen Zhang,Yuyang Dong,Chuan Xiao,Masafumi Oyamada
http://arxiv.org/abs/2312.01678v1
Compressor summary: The paper introduces Jellyfish, an open-source LLM for data preprocessing (DP) tasks that can operate on a low-priced GPU, learn domain knowledge during tuning, and explain its output decisions with an interpreter.
Xin Lin,Chao Ren,Kelvin C. K. Chan,Lu Qi,Jinshan Pan,Ming-Hsuan Yang
http://arxiv.org/abs/2312.01677v1
Compressor summary: DINO-IR is a novel multi-task image restoration approach that uses robust features from DINOv2 to achieve better performance than existing methods in various tasks.
Jingyu Pan,Chen-Chia Chang,Zhiyao Xie,Yiran Chen
http://arxiv.org/abs/2312.01674v1
Compressor summary: The paper introduces EDALearn, a benchmark suite for Machine Learning in Electronic Design Automation, which provides a comprehensive and open-source dataset with end-to-end data collection, analysis, and reproducibility to promote research and efficiency in VLSI design.
Zheng Chen,Huming Liu
http://arxiv.org/abs/2312.01672v1
Compressor summary: STADEE is a novel deep detection method that combines statistics with deep learning to identify machine-generated text, outperforming existing methods in various scenarios.
Hanyu Wang,Pengxiang Wu,Kevin Dela Rosa,Chen Wang,Abhinav Shrivastava
http://arxiv.org/abs/2312.01671v1
Compressor summary: The text introduces a novel method for MultiModality-guided Image Style Transfer (MMIST) that improves style transfer based on text guidance and allows inputs from various sources, achieving state-of-the-art performance on the TIST task and effectiveness on the MMIST task.
Runze He,Shaofei Huang,Xuecheng Nie,Tianrui Hui,Luoqi Liu,Jiao Dai,Jizhong Han,Guanbin Li,Si Liu
http://arxiv.org/abs/2312.01663v1
Compressor summary: The paper introduces a CustomNeRF model that can edit 3D scenes based on texts or images, addressing challenges like foreground editing and multi-view consistency.
Phuoc Pham Van Long,Duc Anh Vu,Nhat M. Hoang,Xuan Long Do,Anh Tuan Luu
http://arxiv.org/abs/2312.01661v1
Compressor summary: The paragraph discusses using ChatGPT, a large language model, to generate mathematical questions for different levels of education and evaluates its performance in both context-aware and context-unaware settings.
Chi-Hsi Kung,Chieh-Chi Yang,Pang-Yuan Pao,Shu-Wei Lu,Pin-Lun Chen,Hsin-Cheng Lu,Yi-Ting Chen
http://arxiv.org/abs/2312.01659v1
Compressor summary: The paper introduces RiskBench, a benchmark for evaluating risk identification algorithms in intelligent driving systems that aim to achieve zero collisions.
Yun Yue,Zhiling Ye,Jiadi Jiang,Yongchao Liu,Ke Zhang
http://arxiv.org/abs/2312.01658v1
Compressor summary: The paper proposes AGD, a new optimizer for deep learning that uses a novel preconditioning matrix and an auto-switching function to improve generalization performance on various datasets.
Sheikh Waqas Akhtar
http://arxiv.org/abs/2312.01657v1
Compressor summary: Neural ODEs use continuous-depth neural networks to solve differential equations, offering advantages in memory efficiency, adaptability, and flexibility, but they have stability issues that can be addressed with a Nesterov's accelerated gradient (NAG)-based ODE solver.
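The NAG update the summary refers to is a standard optimization step; here is a minimal generic sketch of it (illustrative only — not the paper's ODE-solver code, and the quadratic objective is a made-up example):

```python
import numpy as np

def nag_step(x, v, grad_fn, lr=0.1, momentum=0.9):
    """One Nesterov accelerated gradient (NAG) update.

    The gradient is evaluated at the look-ahead point x + momentum * v,
    which is the key difference from plain momentum and the source of
    NAG's improved stability and convergence behavior.
    """
    lookahead = x + momentum * v
    v_new = momentum * v - lr * grad_fn(lookahead)
    return x + v_new, v_new

# Example: minimize f(x) = x^2 (gradient 2x) starting from x = 5.0
x, v = np.array(5.0), np.array(0.0)
for _ in range(100):
    x, v = nag_step(x, v, lambda z: 2 * z)
```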
Evan Dogariu
http://arxiv.org/abs/2312.01653v1
Compressor summary: The text describes a new end-to-end training pipeline for neural network sparsification that reduces model size, complexity, and memory footprint while maintaining competitive performance.
Linh Van Ma,Muhammad Ishfaq Hussain,JongHyun Park,Jeongbae Kim,Moongu Jeon
http://arxiv.org/abs/2312.01650v1
Compressor summary: The paper presents an improved version of ByteTrack, a multiple object tracking algorithm, that adapts its confidence threshold based on detection performance.
Randall Balestriero,Romain Cosentino,Sarath Shekkizhar
http://arxiv.org/abs/2312.01648v1
Compressor summary: The authors propose a geometric approach to understand and manipulate large language models, revealing new features that help solve various tasks without relying on approximations or fine-tuning.
Aditya Paranjape,Yash Patwardhan,Vedant Deshpande,Aniket Darp,Jayashree Jagdale
http://arxiv.org/abs/2312.01642v1
Compressor summary: The paper presents a voice-based chatbot for cars to improve road safety by automating tasks such as navigation, calls, weather forecasts, and music using voice commands instead of manual actions.
Jiandong Jin,Xiao Wang,Chenglong Li,Lili Huang,Jin Tang
http://arxiv.org/abs/2312.01640v1
Compressor summary: The paper proposes a new generative model called SequencePAR for pedestrian attribute recognition that uses visual features, text prompts, and attention mechanisms to improve performance on complex and imbalanced data.
Evan Dogariu,Jiatong Yu
http://arxiv.org/abs/2312.01634v1
Compressor summary: The paper introduces statistical learning, robust streaming techniques, and their connections, aiming to enlighten and inspire further research in both fields.
Jie Wang,Xianyan Li,Jiucheng Xie,Feng Xu,Hao Gao
http://arxiv.org/abs/2312.01632v1
Compressor summary: GaussianHead is a head avatar algorithm that uses 3D gaussian primitives to represent dynamic scenes efficiently and accurately, achieving optimal visual results in various tasks.
Piotr Teterwak,Ximeng Sun,Bryan A. Plummer,Kate Saenko,Ser-Nam Lim
http://arxiv.org/abs/2312.01629v1
Compressor summary: The paper explores adapting large language models (LLMs) for image classification using contrastive learning, improving their performance and retaining their generative abilities.
Muhammad Kamran Janjua,Haseeb Shah,Martha White,Erfan Miahi,Marlos C. Machado,Adam White
http://arxiv.org/abs/2312.01624v1
Compressor summary: The paper explores using reinforcement learning for predicting water treatment plant operations and shows that online learning improves prediction accuracy.
Yong Liu,Cairong Zhang,Yitong Wang,Jiahao Wang,Yujiu Yang,Yansong Tang
http://arxiv.org/abs/2312.01623v1
Compressor summary: The paper introduces UniLSeg, a universal segmentation model that can perform segmentation at any semantic level using language instructions and a unified data format.
Zhengyu Hu,Jieyu Zhang,Yue Yu,Yuchen Zhuang,Hui Xiong
http://arxiv.org/abs/2312.01619v1
Compressor summary: LEMR is a framework that reduces annotation costs for model selection tasks by using ensemble methods, uncertainty sampling, and Z-score refinement, achieving comparable results to fully labeled datasets with less labeling effort.
Yunfei Fan,Tianyu Zhao,Guidong Wang
http://arxiv.org/abs/2312.01616v1
Compressor summary: The SchurVINS framework combines high accuracy and low computational complexity for visual inertial navigation systems by using a complete residual model and Schur complement.
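The Schur complement trick named in the summary is a standard linear-algebra device in VINS/SLAM back-ends; a generic sketch of block elimination on a symmetric positive-definite system (assumptions: a toy 6x6 system, not the paper's residual model):

```python
import numpy as np

def solve_via_schur(A, B, D, b1, b2):
    """Solve the block system [[A, B], [B.T, D]] @ [x1; x2] = [b1; b2]
    by eliminating x2 with the Schur complement S = A - B D^{-1} B.T.

    In VINS/SLAM, this is how landmark states are marginalized so that
    only the much smaller pose block is solved densely.
    """
    D_inv = np.linalg.inv(D)
    S = A - B @ D_inv @ B.T
    x1 = np.linalg.solve(S, b1 - B @ D_inv @ b2)
    x2 = D_inv @ (b2 - B.T @ x1)
    return x1, x2

# Build a random symmetric positive-definite system and split into blocks
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
H = M @ M.T + 6 * np.eye(6)
A, B, D = H[:2, :2], H[:2, 2:], H[2:, 2:]
b = rng.standard_normal(6)
x1, x2 = solve_via_schur(A, B, D, b[:2], b[2:])
```

The block solve agrees with solving the full system directly, but only ever inverts the small diagonal block and the Schur complement.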
Duc Q. Nguyen,Thanh Toan Nguyen,Tho Quan
http://arxiv.org/abs/2312.01612v1
Compressor summary: The article introduces xNeuSM, an explainable neural subgraph matching method that adapts attention factors for each node and improves prediction accuracy and query time compared to existing methods.
Amena Darwish,Stefan Ericson,Rohollah Ghasemi,Tobias Andersson,Dan Lönn,Andreas Andersson Lassila,Kent Salomonsson
http://arxiv.org/abs/2312.01606v1
Compressor summary: The study proposes a deep learning model that predicts two critical weld characteristics using various laser welding input factors and shows promising results for improving welding quality assurance.
Mulham Fawakherji,Eduard Vazquez,Pasquale Giampa,Binod Bhattarai
http://arxiv.org/abs/2312.01605v1
Compressor summary: The paragraph discusses a new text augmentation technique called CutMixOut, which combines cutout and cutmix methods to improve multimodal person re-identification performance.
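Cutout and cutmix are well-known augmentations; a simplified sketch of how they might look on token sequences (hypothetical helpers for illustration — not the paper's CutMixOut recipe, which is defined for multimodal re-identification):

```python
import random

def cutout_text(tokens, span=2, mask="[MASK]"):
    """Generic 'cutout' for text: replace a random contiguous span
    of tokens with a mask token. A simplified illustration only."""
    toks = list(tokens)
    if len(toks) <= span:
        return toks
    start = random.randrange(len(toks) - span + 1)
    for i in range(start, start + span):
        toks[i] = mask
    return toks

def cutmix_text(tokens_a, tokens_b, span=2):
    """Generic 'cutmix' for text: splice a random span from one
    sequence into the other at the same positions."""
    toks = list(tokens_a)
    n = min(len(tokens_a), len(tokens_b))
    if n <= span:
        return toks
    start = random.randrange(n - span + 1)
    toks[start:start + span] = tokens_b[start:start + span]
    return toks
```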
Wei Chen,Huaiyu Wan,Yuting Wu,Shuyuan Zhao,Jiayaqi Cheng,Yuxin Li,Youfang Lin
http://arxiv.org/abs/2312.01601v1
Compressor summary: The paper proposes a new method, LogCL, for predicting future facts in temporal knowledge graphs by using contrastive learning to fuse local and global historical information and improving robustness against noise.
Kaiwen Yang,Tao Shen,Xinmei Tian,Xiubo Geng,Chongyang Tao,Dacheng Tao,Tianyi Zhou
http://arxiv.org/abs/2312.01598v1
Compressor summary: The paragraph introduces QVix, a new prompting strategy for large vision-language models to improve their zero-shot image reasoning capabilities by asking more detailed questions about the input images.
Feng Wang,Jieru Mei,Alan Yuille
http://arxiv.org/abs/2312.01597v1
Compressor summary: The paper proposes a novel self-attention mechanism called Correlative Self-Attention (CSA) that adapts CLIP for zero-shot semantic segmentation, achieving significant improvements over existing methods.
Cong-Duy Nguyen,The-Anh Vu-Le,Thong Nguyen,Tho Quan,Luu Anh Tuan
http://arxiv.org/abs/2312.01592v1
Compressor summary: The paper introduces GroundedBERT, a method that combines BERT with visual information using Optimal Transport to improve language learning for grounded tasks.
Haoyu Jiang,Haiyang Yu,Nan Li,Ping Yi
http://arxiv.org/abs/2312.01585v1
Compressor summary: The study proposes OCGEC, a novel one-class classification framework using graph neural networks for model-level backdoor detection in DNNs with minimal clean data and achieves high AUC scores.
Eleftheria Briakou,Navita Goyal,Marine Carpuat
http://arxiv.org/abs/2312.01582v1
Compressor summary: The authors propose a technique to generate contrastive highlights for explaining predictions of semantic divergence models, which improves on existing saliency methods in capturing fine-grained meaning differences.
Sachit Kuhar,Yash Jain,Alexey Tumanov
http://arxiv.org/abs/2312.01581v1
Compressor summary: Signed Binarization is a framework that improves accuracy and efficiency of DNNs on edge devices by combining hardware-software systems, quantization functions, and representation learning techniques to balance repetition and sparsity during inference.
Jodie A. Cochrane,Adrian G. Wills,Sarah J. Johnson
http://arxiv.org/abs/2312.01577v1
Compressor summary: The paper proposes a new algorithm for learning Bayesian decision trees using Hamiltonian Monte Carlo (HMC) to improve efficiency and exploration of the posterior.
Yiyun Zhang,Zijian Wang,Yadan Luo,Xin Yu,Zi Huang
http://arxiv.org/abs/2312.01576v1
Compressor summary: The paper proposes U-BDD++, a self-supervised framework for detecting building damage from unlabelled satellite images, using vision-language models to handle domain-specific issues and improve training quality.
Keito Kudo,Haruki Nagasawa,Jun Suzuki,Nobuyuki Shimizu
http://arxiv.org/abs/2312.01575v1
Compressor summary: The paper introduces a new video summarization task that combines keyframe selection and caption generation, creates a dataset to evaluate it, and proposes a practical application for the task.
Li Li,Jiawei Peng,Huiyi Chen,Chongyang Gao,Xu Yang
http://arxiv.org/abs/2312.01571v1
Compressor summary: The study explores diverse in-context configurations for Large Vision-Language Models using Visual Question Answering and improves their performance, while gaining insights into the inner properties of these models.
Omer Subasi
http://arxiv.org/abs/2312.01567v1
Compressor summary: The paper presents MUSE, a search algorithm for quantum variational machine learning that improves accuracy in classification and regression tasks compared to previous methods.
Sanjoy Chowdhury,Sayan Nag,Dinesh Manocha
http://arxiv.org/abs/2312.01564v1
Compressor summary: APoLLo is a method that combines adapter and prompt learning for vision-language models, improving their generalization in few-shot settings by using trainable cross-attention layers and enforcing encoder consistency.
Yan Xu,Kris Kitani
http://arxiv.org/abs/2312.01561v1
Compressor summary: PME is a method for cross-view person matching and 3D human pose estimation in multi-camera networks without requiring 3D data or camera poses, using clustering and geometric constraints to solve the problem.
Shima Rezasoltani,Faisal Z. Qureshi
http://arxiv.org/abs/2312.01558v1
Compressor summary: The paper proposes a hyperspectral image compression method using neural networks that outperforms existing methods at low bitrates and is faster with sampling.
Stephanie Baker,Wei Xiang
http://arxiv.org/abs/2312.01555v1
Compressor summary: The paragraph discusses how explainable AI (XAI) is not only important for transparency but also crucial for ensuring fairness, robustness, privacy, security, and transparency in various applications of responsible AI (RAI).
Bill Yuchen Lin,Abhilasha Ravichander,Ximing Lu,Nouha Dziri,Melanie Sclar,Khyathi Chandu,Chandra Bhagavatula,Yejin Choi
http://arxiv.org/abs/2312.01552v1
Compressor summary: The paragraph discusses a study (LIMA) that shows alignment tuning in large language models may be superficial, as base models and aligned versions perform similarly on most tokens, and proposes a new method (URIAL) for tuning-free alignment using in-context learning.
Xiaoyuan Cheng,Yiming Yang,Wei Jiang,Yukun Hu
http://arxiv.org/abs/2312.01544v1
Compressor summary: The paper presents KEEC, a method for learning and controlling dynamical systems with complex dynamics using equivariant geometry, which achieves quadratic convergence and outperforms other loss functions.