This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-11-30, generated by the compressor, my personal LLM-based project.
Zeyuan Yin,Zhiqiang Shen
http://arxiv.org/abs/2311.18838v1
Compressor summary: The text describes a new method called CDA that improves the accuracy of distilling large datasets like ImageNet-1K and 21K under a standard resolution, outperforming previous approaches and reducing the gap to full-data training.
Lihao Liu,Yanqi Cheng,Zhongying Deng,Shujun Wang,Dongdong Chen,Xiaowei Hu,Pietro Liò,Carola-Bibiane Schönlieb,Angelica Aviles-Rivero
http://arxiv.org/abs/2311.18839v1
Compressor summary: The text introduces TrafficMOT, a diverse and complex dataset for multi-object tracking in traffic videos, which can improve traffic monitoring accuracy and road safety by testing advanced machine learning algorithms.
Dominick Reilly,Srijan Das
http://arxiv.org/abs/2311.18840v1
Compressor summary: PI-ViT (π-ViT) is a new approach that improves video transformers for human action recognition in Activities of Daily Living by adding 2D and 3D pose information to RGB images, achieving state-of-the-art performance without poses or extra computation during inference.
Yao Feng,Jing Lin,Sai Kumar Dwivedi,Yu Sun,Priyanka Patel,Michael J. Black
http://arxiv.org/abs/2311.18836v1
Compressor summary: PoseGPT is a framework that uses large language models to understand and generate 3D human poses from images or text descriptions, enabling advanced tasks like speculative pose generation and reasoning about pose estimation.
Zhen Xing,Qi Dai,Zihao Zhang,Hui Zhang,Han Hu,Zuxuan Wu,Yu-Gang Jiang
http://arxiv.org/abs/2311.18837v1
Compressor summary: Video Instruction Diffusion (VIDiff) is a fast and versatile model that edits and enhances long videos based on written instructions, using diffusion models for various video tasks.
Rongyao Fang,Shilin Yan,Zhaoyang Huang,Jingqiu Zhou,Hao Tian,Jifeng Dai,Hongsheng Li
http://arxiv.org/abs/2311.18835v1
Compressor summary: InstructSeq is a framework that uses natural language instructions to control diverse vision tasks, enabling more versatile and human-like artificial intelligence.
Wenming Weng,Ruoyu Feng,Yanhui Wang,Qi Dai,Chunyu Wang,Dacheng Yin,Zhiyuan Zhao,Kai Qiu,Jianmin Bao,Yuhui Yuan,Chong Luo,Yueyi Zhang,Zhiwei Xiong
http://arxiv.org/abs/2311.18834v1
Compressor summary: ART·V is a framework for generating videos frame by frame with diffusion models, avoiding the need to model complex long-range motions, preserving high fidelity, and mitigating drifting issues, enabling various applications at high quality.
Hsin-Ying Lee,Hung-Yu Tseng,Hsin-Ying Lee,Ming-Hsuan Yang
http://arxiv.org/abs/2311.18832v1
Compressor summary: The paper proposes a new pipeline that uses pre-trained text-to-image models to predict pixel-level properties from images, while addressing the domain gap and making the process deterministic.
Shuyuan Tu,Qi Dai,Zhi-Qi Cheng,Han Hu,Xintong Han,Zuxuan Wu,Yu-Gang Jiang
http://arxiv.org/abs/2311.18830v1
Compressor summary: MotionEditor is a diffusion model for video motion editing that incorporates a content-aware motion adapter into ControlNet to preserve the original background and protagonist appearance while modifying the motion information.
Yanhui Wang,Jianmin Bao,Wenming Weng,Ruoyu Feng,Dacheng Yin,Tao Yang,Jingxu Zhang,Qi Dai,Zhiyuan Zhao,Chunyu Wang,Kai Qiu,Yuhui Yuan,Xiaoyan Sun,Chong Luo,Baining Guo
http://arxiv.org/abs/2311.18829v1
Compressor summary: MicroCinema is a framework that generates coherent text-to-video by dividing the process into two stages and using advanced image models to enhance appearance preservation and motion dynamics.
Tianwei Yin,Michaël Gharbi,Richard Zhang,Eli Shechtman,Frédo Durand,William T. Freeman,Taesung Park
http://arxiv.org/abs/2311.18828v1
Compressor summary: DMD is a method that turns diffusion models into one-step image generators with minimal quality loss by matching distributions using score functions and a regression loss, achieving competitive results compared to other few-step diffusion approaches while being much faster.
Kaiwen Hou
http://arxiv.org/abs/2311.18826v1
Compressor summary: The paper introduces a new method for causal inference using continuous normalizing flows that improves efficiency, robustness, and geometric properties of parametric submodels.
Dongho Lee,Jongseo Lee,Jinwoo Choi
http://arxiv.org/abs/2311.18825v1
Compressor summary: The CAST model uses RGB input to achieve a balanced spatio-temporal understanding of videos for action recognition and outperforms existing methods on various benchmarks.
Zhiqiu Xu,Yanjie Chen,Kirill Vishniakov,Yida Yin,Zhiqiang Shen,Trevor Darrell,Lingjie Liu,Zhuang Liu
http://arxiv.org/abs/2311.18823v1
Compressor summary: Weight selection is a method to initialize smaller neural networks by selecting some weights from a pretrained larger model, improving their performance and reducing training time.
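As a generic illustration of the weight-selection idea — initializing a smaller network from a subset of a larger pretrained model's weights — here is a toy numpy sketch (not the paper's code; the selection rule is an assumption for illustration):

```python
import numpy as np

def select_weights(large_w, small_shape, seed=0):
    """Initialize a smaller layer by keeping a random subset of rows and
    columns of a larger pretrained weight matrix (toy sketch only)."""
    rng = np.random.default_rng(seed)
    rows = np.sort(rng.choice(large_w.shape[0], small_shape[0], replace=False))
    cols = np.sort(rng.choice(large_w.shape[1], small_shape[1], replace=False))
    return large_w[np.ix_(rows, cols)]

large = np.arange(24, dtype=np.float32).reshape(6, 4)  # stand-in for pretrained weights
small = select_weights(large, (3, 2))
print(small.shape)  # (3, 2)
```

Every entry of the smaller matrix comes from the pretrained one, rather than from a random initializer.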
Moayed Haji-Ali,Guha Balakrishnan,Vicente Ordonez
http://arxiv.org/abs/2311.18822v1
Compressor summary: ElasticDiffusion is a training-free method that allows text-to-image diffusion models to generate images of various sizes by decoupling the generation trajectory into local and global signals, resulting in better image coherence and quality compared to other methods.
Kaifeng Lyu,Jikai Jin,Zhiyuan Li,Simon S. Du,Jason D. Lee,Wei Hu
http://arxiv.org/abs/2311.18817v1
Compressor summary: The paper investigates the "grokking" phenomenon in neural networks, where test accuracy improves drastically long after training accuracy has saturated, and attributes it to implicit biases during the learning process.
Yijia Zheng,Raymond A. Yeh
http://arxiv.org/abs/2311.18815v1
Compressor summary: The text discusses a new method called IMMA that protects image models from generating harmful or unauthorized content by making them difficult to fine-tune with malicious adaptations.
Yudong Wang,Jichang Guo,Wanru He,Huan Gao,Huihui Yue,Zenan Zhang,Chongyi Li
http://arxiv.org/abs/2311.18814v1
Compressor summary: The authors study how 18 underwater image enhancement algorithms affect the performance of 7 object detectors on underwater object detection tasks, using a total of 133 models.
Raphael Tang,Xinyu Zhang,Jimmy Lin,Ferhan Ture
http://arxiv.org/abs/2311.18812v1
Compressor summary: The study uses a probe to examine sociodemographic biases in large language models' latent representations, even when they refuse to respond.
Chicago Park,Shirin Shoushtari,Weijie Gan,Ulugbek S. Kamilov
http://arxiv.org/abs/2311.18810v1
Compressor summary: The paper explains why PnP-ADMM works well with expansive CNNs by relating them to MMSE denoisers and proximal operators.
Evin Pınar Örnek,Yann Labbé,Bugra Tekin,Lingni Ma,Cem Keskin,Christian Forster,Tomas Hodan
http://arxiv.org/abs/2311.18809v1
Compressor summary: FoundPose is a 6D pose estimation method for unseen rigid objects from a single RGB image that uses DINOv2, a vision foundation model, and performs well on the BOP benchmark.
Jake M. Hofman,Angelos Chatzimparmpas,Amit Sharma,Duncan J. Watts,Jessica Hullman
http://arxiv.org/abs/2311.18807v1
Compressor summary: The authors explore using pre-registration, a practice from explanatory modeling, to improve reproducibility and reliability in predictive modeling by preventing biased estimates and data-dependent decision-making.
Akshay Punjabi,Pablo Izquierdo Ayala
http://arxiv.org/abs/2311.18806v1
Compressor summary: The authors propose a minimalist U-Net model for accurate precipitation forecasting that considers the environmental impact of computational resources.
Qi Cao,Takeshi Kojima,Yutaka Matsuo,Yusuke Iwasawa
http://arxiv.org/abs/2311.18805v1
Compressor summary: The study examines the ability of large language models, especially GPT-4, to handle and recover from scrambled sentences and finds that they can almost perfectly reconstruct original sentences even when all letters within each word are scrambled.
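The perturbation being probed — shuffling all letters within each word while keeping word order — can be sketched as follows (a generic illustration; the study's exact scrambling protocol may differ):

```python
import random

def scramble_words(sentence, seed=0):
    """Shuffle all letters within each word, keeping word order intact."""
    rng = random.Random(seed)
    return " ".join(
        "".join(rng.sample(word, len(word))) for word in sentence.split()
    )

print(scramble_words("large language models are surprisingly robust"))
```

Each output word is an anagram of the corresponding input word, which is the input an LLM would be asked to reconstruct.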
Samuel Stevens,Jiaman Wu,Matthew J Thompson,Elizabeth G Campolongo,Chan Hee Song,David Edward Carlyn,Li Dong,Wasila M Dahdul,Charles Stewart,Tanya Berger-Wolf,Wei-Lun Chao,Yu Su
http://arxiv.org/abs/2311.18803v1
Compressor summary: The authors present TreeOfLife-10M, a large dataset of biology images, and BioCLIP, a foundation model for the tree of life that uses computer vision to extract information from these images, achieving superior performance on various biology classification tasks.
Ayush Baid,John Lambert,Travis Driver,Akshay Krishnan,Hayk Stepanyan,Frank Dellaert
http://arxiv.org/abs/2311.18801v1
Compressor summary: The authors compare global and incremental Structure-from-Motion methods using a modular framework that combines recent developments in feature extraction and matching, but find that SIFT features still perform best for incremental SfM.
Artemis Panagopoulou,Le Xue,Ning Yu,Junnan Li,Dongxu Li,Shafiq Joty,Ran Xu,Silvio Savarese,Caiming Xiong,Juan Carlos Niebles
http://arxiv.org/abs/2311.18799v1
Compressor summary: The paper introduces a cross-modality framework for integrating various modalities without extensive customization, using frozen large language models and collecting high-quality instruction tuning data.
Linfeng Du,Ji Xin,Alex Labach,Saba Zuberi,Maksims Volkovs,Rahul G. Krishnan
http://arxiv.org/abs/2311.18780v1
Compressor summary: MultiResFormer is a new transformer-based model that adapts to different periodicity in time series data and achieves better performance on long-term forecasting tasks compared to existing methods.
Saurabh Page,Sudeep Mangalvedhekar,Kshitij Deshpande,Tanmay Chavan,Sheetal Sonawane
http://arxiv.org/abs/2311.18778v1
Compressor summary: The paper describes a study on detecting violence-inciting texts in Bangla, using BERT-based models and achieving an ensemble F1 score of 0.737.
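The reported metric can be computed from scratch as below (a sketch of binary F1 for reference; the shared task may use a macro-averaged variant and its own evaluation script):

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(round(f1_score([1, 0, 1, 1], [1, 0, 0, 1]), 3))  # 0.8
```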
Zineng Tang,Ziyi Yang,Mahmoud Khademi,Yang Liu,Chenguang Zhu,Mohit Bansal
http://arxiv.org/abs/2311.18775v1
Compressor summary: CoDi-2 is an advanced AI system that can understand and generate various types of inputs and outputs, such as text, vision, and audio, by following complex instructions and examples.
Rohan Myer Krishnan,Zitian Tang,Zhiqiu Yu,Chen Sun
http://arxiv.org/abs/2311.18773v1
Compressor summary: Spacewalk-18 is a benchmark for evaluating video-language models' ability to learn skills from human demonstrations in long and multimodal spacewalk videos, which challenges current methods.
Lei Xin,George Chiu,Shreyas Sundaram
http://arxiv.org/abs/2311.18769v1
Compressor summary: The paper proposes an online change point detection method for linear systems with unknown dynamics and temporal correlations that can achieve a pre-specified false alarm bound and provides finite-sample-based guarantees on detection probability and delay.
Yanqing Liu,Kai Wang,Wenqi Shao,Ping Luo,Yu Qiao,Mike Zheng Shou,Kaipeng Zhang,Yang You
http://arxiv.org/abs/2311.18765v1
Compressor summary: This paper shows that using multi-modal large language models to generate multiple captions for images improves visual-language representation learning and image-text retrieval performance.
James Seale Smith,Yen-Chang Hsu,Zsolt Kira,Yilin Shen,Hongxia Jin
http://arxiv.org/abs/2311.18763v1
Compressor summary: STAMINA is a novel method that improves text-to-image diffusion models for sequential concept learning by using low-ranked attention-masked adapters and customized MLP tokens, achieving better performance on 50-concept landmarks and human faces benchmarks.
Aryaman Chobey,Oliver Smith,Anzi Wang,Grusha Prasad
http://arxiv.org/abs/2311.18761v1
Compressor summary: The paper investigates if using a curriculum based on sentence-level surprisal estimates from teacher models trained on the BabyLM dataset can improve linguistic knowledge acquisition in neural language models, but finds that it does not result in better alignment with human behavior.
Yongliang Shen,Kaitao Song,Xu Tan,Wenqi Zhang,Kan Ren,Siyu Yuan,Weiming Lu,Dongsheng Li,Yueting Zhuang
http://arxiv.org/abs/2311.18760v1
Compressor summary: TaskBench is a system for evaluating large language models' ability in task automation by decomposing tasks into sub-tasks, invoking tools, and predicting parameters based on user instructions.
Daoan Zhang,Yunhao Luo,Jianguo Zhang
http://arxiv.org/abs/2311.18758v1
Compressor summary: The paper proposes a new method for semi-supervised semantic segmentation that boosts uncertainty on unlabeled data to reduce the gap between labeled and unlabeled datasets, improving model generalization and achieving state-of-the-art results.
Hiroki Furuta,Yutaka Matsuo,Aleksandra Faust,Izzeddin Gur
http://arxiv.org/abs/2311.18751v1
Compressor summary: The paragraph discusses a new benchmark called CompWoB that tests language model agents' ability to perform compositional web automation tasks, showing that their performance degrades when tasks are combined or their order changes.
Jie Shi,Arno P. J. M. Siebes,Siamak Mehrkanoon
http://arxiv.org/abs/2311.18749v1
Compressor summary: The paper presents a novel interpretable two-stream transformer model, TransCORALNet, for accurate supply chain credit assessment under segment industry and cold start problems using domain adaptation and LIME explanations.
Uchechukwu F. Njoku,Alberto Abelló,Besim Bilalli,Gianluca Bontempi
http://arxiv.org/abs/2311.18746v1
Compressor summary: The paper introduces a new method to help data scientists choose the best feature subset from many options by combining visualization and post-processing techniques for MOFS outcomes.
Xiao Liu,Xuanyu Lei,Shengyuan Wang,Yue Huang,Zhuoer Feng,Bosi Wen,Jiale Cheng,Pei Ke,Yifan Xu,Weng Lam Tam,Xiaohan Zhang,Lichao Sun,Hongning Wang,Jing Zhang,Minlie Huang,Yuxiao Dong,Jie Tang
http://arxiv.org/abs/2311.18743v1
Compressor summary: The paragraph introduces AlignBench, a benchmark for evaluating Chinese LLMs' alignment that uses human-in-the-loop data curation and a companion evaluator LLM called CritiqueLLM.
Vedant Deshpande,Yash Patwardhan,Kshitij Deshpande,Sudeep Mangalvedhekar,Ravindra Murumkar
http://arxiv.org/abs/2311.18739v1
Compressor summary: The paper presents a method for country-level Arabic dialect identification using pre-trained transformer models and ensembling, achieving 76.65 F1-score at the NADI 2023 Shared Task.
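The ensembling step can be illustrated with a simple hard-voting sketch (a generic majority vote over per-model predictions — not the paper's ensembling scheme, and the dialect labels below are placeholders):

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: per example, pick the label most models agree on
    (ties broken by first occurrence)."""
    return [Counter(p).most_common(1)[0][0] for p in zip(*predictions)]

model_a = ["EGY", "MSA", "GLF"]
model_b = ["EGY", "LEV", "GLF"]
model_c = ["EGY", "LEV", "MSA"]
print(majority_vote([model_a, model_b, model_c]))  # ['EGY', 'LEV', 'GLF']
```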
Suman Sapkota,Binod Bhattarai
http://arxiv.org/abs/2311.18735v1
Compressor summary: The paper explores how different neural architectures can be improved by using dimension mixing techniques inspired by the Fast Fourier Transform and proposes new non-linear mixers for CNNs, Transformers, and MLP-Mixers.
Sudeep Mangalvedhekar,Kshitij Deshpande,Yash Patwardhan,Vedant Deshpande,Ravindra Murumkar
http://arxiv.org/abs/2311.18730v1
Compressor summary: The paper describes an approach for two Arabic AI tasks related to persuasion and disinformation detection using pre-trained transformer models and ensembling, achieving top ranks on both tasks.
Yu Deng,Duomin Wang,Xiaohang Ren,Xingyu Chen,Baoyuan Wang
http://arxiv.org/abs/2311.18729v1
Compressor summary: The method learns one-shot 4D head synthesis from monocular videos using large-scale synthetic data generated by a 4D generative model and a transformer-based reconstructor, with a novel learning strategy for better generalization to real images.
Lénaïc Chizat,Praneeth Netrapalli
http://arxiv.org/abs/2311.18718v1
Compressor summary: The paper proposes a way to predict, measure, and control feature learning in deep learning by aligning feature updates with the backward pass, and studies its implications on hyperparameter tuning and neural network architectures.
Qing Wang,Haojie Jia,Wenfei Song,Qi Li
http://arxiv.org/abs/2311.18712v1
Compressor summary: The paper presents a new model called CoRec, which uses two components to identify coordinators and conjunct boundaries in sentences more effectively and efficiently than existing methods that rely on syntactic parsers.
Matúš Pikuliak,Andrea Hrckova,Stefan Oresko,Marián Šimko
http://arxiv.org/abs/2311.18711v1
Compressor summary: GEST is a new dataset for evaluating how well AI systems understand gender stereotypes across 9 Slavic languages and English, finding widespread stereotypical reasoning.
Matthieu Terris,Thomas Moreau
http://arxiv.org/abs/2311.18710v1
Compressor summary: The paper proposes a meta-learning approach for imaging inverse problems that can handle unsupervised settings, fine-tune to specific tasks, and recover the Bayes optimal estimator with few fine-tuning steps.
Daniel Jarne Ornia,Giannis Delimpaltadakis,Jens Kober,Javier Alonso-Mora
http://arxiv.org/abs/2311.18703v1
Compressor summary: PA-RL is a method that makes RL agents more predictable by using state sequence entropy rate as a measure and applying policy-dependent and action-dependent rewards based on entropy.
Pei Ke,Bosi Wen,Zhuoer Feng,Xiao Liu,Xuanyu Lei,Jiale Cheng,Shengyuan Wang,Aohan Zeng,Yuxiao Dong,Hongning Wang,Jie Tang,Minlie Huang
http://arxiv.org/abs/2311.18702v1
Compressor summary: The authors propose CritiqueLLM, a new critique generation model that uses dialogue-based prompting for high-quality evaluation data and shows promising scaling properties compared to GPT-4 in evaluating large language models.
Cheng Sun,Wei-En Tai,Yu-Lin Shih,Kuan-Wei Chen,Yong-Jing Syu,Kent Selwyn The,Yu-Chiang Frank Wang,Hwann-Tzong Chen
http://arxiv.org/abs/2311.18695v1
Compressor summary: The paper introduces Seg2Reg, a method that combines 1D regression and 2D segmentation for room layout reconstruction using density fields and volume rendering, improving accuracy and generalization with a strong baseline model.
Jared Markowitz,Jesse Silverberg,Gary Collins
http://arxiv.org/abs/2311.18684v1
Compressor summary: The paper proposes new off-policy actor-critic methods for reinforcement learning with mixed-sign rewards, which improve sample efficiency and performance over existing approaches.
Chantal Pellegrini,Ege Özsoy,Benjamin Busam,Nassir Navab,Matthias Keicher
http://arxiv.org/abs/2311.18681v1
Compressor summary: RaDialog is a conversational AI tool that generates accurate radiology reports for medical images and can interactively answer questions or correct errors, advancing the field of radiology.
Hewen Xiao,Jie Mei,Guangfu Ma,Weiren Wu
http://arxiv.org/abs/2311.18675v1
Compressor summary: The article proposes a novel network structure and a deep supervision strategy to address information distortion caused by interpolation in deep convolutional neural networks for salient object detection.
Sahar Nasirihaghighi,Negin Ghamsarian,Daniela Stefanics,Klaus Schoeffmann,Heinrich Husslein
http://arxiv.org/abs/2311.18666v1
Compressor summary: The authors propose a new network and framework for recognizing actions in laparoscopic surgeries, which handle challenges such as content distortion, duration variation, and scene variations using recurrent layers and frame sampling.
Ari Goodman,Gurpreet Singh,Ryan O'Shea,Peter Teague,James Hing
http://arxiv.org/abs/2311.18665v1
Compressor summary: The ASIST system for safely arresting helicopters on ships was improved by a research project called PETA, which developed a computer vision prototype that can track helicopters in real-time without hardware installation requirements.
Pedro Esteban Chavarrias Solano,Andrew Bulpitt,Venkataraman Subramanian,Sharib Ali
http://arxiv.org/abs/2311.18664v1
Compressor summary: The authors propose a multi-task learning approach using surface normal prediction and attention mechanisms to improve depth estimation in colonoscopy videos.
Jiawei Peng,Ju He,Prakhar Kaushik,Zihao Xiao,Jiteng Mu,Alan Yuille
http://arxiv.org/abs/2311.18661v1
Compressor summary: The paper presents a method to learn part segmentation from synthetic animals using SMAL models, improves domain adaptation with CB-FDM, and shows transferability across quadrupeds in PartImageNet.
Shitou Zhang,Zuchao Li,Xingshen Liu,Liming Yang,Ping Wang
http://arxiv.org/abs/2311.18658v1
Compressor summary: The paper introduces ArcMMLU, a benchmark to evaluate large language models' knowledge and reasoning in the Chinese Library & Information Science domain.
Gwanghyun Kim,Dong Un Kang,Hoigi Seo,Hayeon Kim,Se Young Chun
http://arxiv.org/abs/2311.18654v1
Compressor summary: DetText2Scene is a novel method for generating large-scale images from detailed human-centric text descriptions with high faithfulness, controllability, and naturalness in a global context.
Sijin Chen,Xin Chen,Chi Zhang,Mingsheng Li,Gang Yu,Hao Fei,Hongyuan Zhu,Jiayuan Fan,Tao Chen
http://arxiv.org/abs/2311.18651v1
Compressor summary: The paper introduces LL3DA, a language model that can understand and interact with point cloud 3D scenes directly, improving human-machine communication in complex environments.
Hai Zhang,Junzhe Xu,Shanlin Jiang,Zhenan He
http://arxiv.org/abs/2311.18649v1
Compressor summary: The paper proposes Semantic Evolution to generate high-quality semantics for few-shot learning and shows that a simple two-layer network with these semantics outperforms previous methods.
Franciskus Xaverius Erick,Mina Rezaei,Johanna Paula Müller,Bernhard Kainz
http://arxiv.org/abs/2311.18645v1
Compressor summary: The authors propose a novel stochastic vision transformer that incorporates uncertainty and distance awareness into self-supervised learning pipelines using Wasserstein distance-based attention and regularization, leading to improved performance on various tasks.
Carlos G. Correa,Sophia Sanborn,Mark K. Ho,Frederick Callaway,Nathaniel D. Daw,Thomas L. Griffiths
http://arxiv.org/abs/2311.18644v1
Compressor summary: The paper studies how people create hierarchical plans using a programming task, and finds that humans prefer reusable programs over shorter ones.
Tobias Kirschstein,Simon Giebenhain,Matthias Nießner
http://arxiv.org/abs/2311.18635v1
Compressor summary: The authors propose DiffusionAvatars, a diffusion-based neural renderer that creates high-fidelity 3D head avatars of people with intuitive control over pose and expression using a neural parametric head model, cross-attention, and TriPlane lookup.
Yau Shing Jonathan Cheung,Xi Chen,Lihe Yang,Hengshuang Zhao
http://arxiv.org/abs/2311.18628v1
Compressor summary: The paper proposes a lightweight clustering method using self-supervised vision transformer features for unsupervised semantic segmentation, achieving state-of-the-art results on two datasets.
Shishir Muralidhara,Sravan Kumar Jagadeesh,René Schuster,Didier Stricker
http://arxiv.org/abs/2311.18618v1
Compressor summary: The paper introduces Joint Panoptic Part Fusion (JPPF), a method to combine semantic areas, object instances, and semantic parts in computer vision, which is evaluated on two datasets and shows fair fusion and generalization without fine-tuning.
Tyler J. Bradshaw,Alan B. McMillan
http://arxiv.org/abs/2311.18614v1
Compressor summary: The article aims to educate readers about AI principles, focusing on aspects relevant to PET imaging using examples like convolutional neural networks and U-Net.
Daoyi Gao,Dávid Rozenberszki,Stefan Leutenegger,Angela Dai
http://arxiv.org/abs/2311.18610v1
Compressor summary: DiffCAD is a weakly-supervised probabilistic method that learns to reconstruct 3D objects from RGB images using diffusion and multiple plausible CAD models, achieving competitive performance even on real data without supervision.
Yingdi Guo
http://arxiv.org/abs/2311.18609v1
Compressor summary: The paper presents a method to improve the arithmetic capabilities of large language models by combining them with small pretrained models and prompt injection, addressing limitations like toxicity and poor performance.
Hyelin Nam,Gihyun Kwon,Geon Yeong Park,Jong Chul Ye
http://arxiv.org/abs/2311.18608v1
Compressor summary: The Contrastive Denoising Score (CDS) technique improves image editing by preserving structural details and transforming content in latent diffusion models using intermediate features from self-attention layers.
Ping Chen,Xingpeng Zhang,Chengtao Zhou,Dichao Fan,Peng Tu,Le Zhang,Yanlin Qian
http://arxiv.org/abs/2311.18605v1
Compressor summary: The authors propose a method called Triangular Distribution Transform (TDT) to map feature discrepancies to label discrepancies in convolutional neural networks for label distribution learning tasks, improving performance and correctness.
Kale-ab Tessera,Callum Rhys Tilbury,Sasha Abramowitz,Ruan de Kock,Omayma Mahjoub,Benjamin Rosman,Sara Hooker,Arnu Pretorius
http://arxiv.org/abs/2311.18598v1
Compressor summary: GANNO is a MARL approach that learns to improve neural network optimization by dynamically scheduling hyperparameters at a layerwise level using agents per layer.
Dong Li,Jiandong Jin,Yuhao Zhang,Yanlin Zhong,Yaoyang Wu,Lan Chen,Xiao Wang,Bin Luo
http://arxiv.org/abs/2311.18592v1
Compressor summary: The study presents a novel pattern recognition framework that fuses RGB frames, event streams, and semantic labels using large-scale vision-language models like CLIP.
Juyoung Yun
http://arxiv.org/abs/2311.18587v1
Compressor summary: The study proposes using 16-bit precision for ongoing training of 32-bit deep learning models, which improves speed without sacrificing accuracy and reduces computational resources.
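The precision trade-off at stake can be seen by casting float32 parameters down to float16 (an illustration of the numeric format only; the paper's training setup is more involved):

```python
import numpy as np

# Cast float32 weights to float16, as in continued 16-bit training,
# and inspect what rounding each weight incurs.
w32 = np.array([1.0000001, 3.14159265, 65504.0], dtype=np.float32)
w16 = w32.astype(np.float16)

print(w16.dtype, w16)                         # the downcast weights
print(np.abs(w32 - w16.astype(np.float32)))   # per-weight rounding error
```

float16 keeps about 3 decimal digits of precision and tops out at 65504, which is why naive downcasting is usually paired with care around small gradients and large activations.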
Shiyao Cui,Zhenyu Zhang,Yilong Chen,Wenyuan Zhang,Tianyun Liu,Siqi Wang,Tingwen Liu
http://arxiv.org/abs/2311.18580v1
Compressor summary: The paper introduces FFT, a new benchmark to evaluate the potential harms of large language models based on factuality, fairness, and toxicity.
Yongjie Duan,Zhiyu Pan,Jianjiang Feng,Jie Zhou
http://arxiv.org/abs/2311.18576v1
Compressor summary: The paper introduces LDRF, a flexible and accurate fixed-length fingerprint representation that uses localized deep learning to handle different visible areas and poses, and proposes a matching score normalization technique to reduce false matches in large databases.
Yuli Slavutsky,Yuval Benjamini
http://arxiv.org/abs/2311.18575v1
Compressor summary: The paper proposes an algorithm for zero-shot classifiers to handle distribution shifts in unseen classes by using hierarchical data sampling and out-of-distribution generalization.
Avijit Dasgupta,C. V. Jawahar,Karteek Alahari
http://arxiv.org/abs/2311.18572v1
Compressor summary: The paper proposes a self-training based source-free video domain adaptation method that handles noisy labels and uses a teacher-student framework to improve performance on target domain videos.
Karolina Stańczak,Kevin Du,Adina Williams,Isabelle Augenstein,Ryan Cotterell
http://arxiv.org/abs/2311.18567v1
Compressor summary: The paragraph discusses a study that challenges the neo-Whorfian hypothesis by showing that grammatical gender has little to no impact on how people choose adjectives for inanimate nouns when controlling for meaning.
Tianli Liao,Chenyang Zhao,Lei Li,Heling Cao
http://arxiv.org/abs/2311.18564v1
Compressor summary: The paper presents a local alignment and stitching method that improves image quality by evaluating seam quality and adjusting pixel regions with low quality.
Yurui Chen,Chun Gu,Junzhe Jiang,Xiatian Zhu,Li Zhang
http://arxiv.org/abs/2311.18561v1
Compressor summary: The Periodic Vibration Gaussian model uses a 3D Gaussian splatting technique with periodic vibrations to represent dynamic urban scenes and outperforms existing methods without relying on object labels or optical flow estimation.
Alexandru Ţifrea,Gizem Yüce,Amartya Sanyal,Fanny Yang
http://arxiv.org/abs/2311.18557v1
Compressor summary: The paragraph discusses the limitations and potential of semi-supervised learning (SSL) algorithms in improving over supervised learning (SL) and unsupervised learning (UL) methods, using 2-Gaussian mixture models as an example.
Daniel Grimm,Maximilian Zipfl,Felix Hertlein,Alexander Naumann,Jürgen Lüttin,Steffen Thoma,Stefan Schmid,Lavdim Halilaj,Achim Rettinger,J. Marius Zöllner
http://arxiv.org/abs/2311.18553v1
Compressor summary: The paragraph describes a new vector-based approach for predicting traffic trajectories that improves on existing methods by using a semantic scene graph, image-based map features, and anchor paths to account for agent interactions, context, and constraints.
Tuomas Jalonen,Mohammad Al-Sa'd,Serkan Kiranyaz,Moncef Gabbouj
http://arxiv.org/abs/2311.18547v1
Compressor summary: The paper presents a real-time CNN model for diagnosing multiple bearing faults under various conditions and compares it to the current state-of-the-art approach, showing significant accuracy gains and robustness to noise.
Jiwon Kim,Byeongho Heo,Sangdoo Yun,Seungryong Kim,Dongyoon Han
http://arxiv.org/abs/2311.18540v1
Compressor summary: The paper presents a simple method that uses unlabeled pairs to improve semantic correspondence without extra annotations, achieving better results than existing methods.
Ju He,Qihang Yu,Inkyu Shin,Xueqing Deng,Xiaohui Shen,Alan Yuille,Liang-Chieh Chen
http://arxiv.org/abs/2311.18537v1
Compressor summary: MaXTron is a framework that uses Mask XFormer with Trajectory Attention for panoptic segmentation, enhancing temporal consistency with within-clip and cross-clip tracking modules.
Haoyang Liu,Tiancheng Xing,Luwei Li,Vibhu Dalal,Jingrui He,Haohan Wang
http://arxiv.org/abs/2311.18531v1
Compressor summary: The paper proposes a new dataset distillation method using Wasserstein distance to match synthetic data with extensive datasets, achieving better performance on several benchmarks.
Maciej Besta,Afonso Claudino Catarino,Lukas Gianinazzi,Nils Blach,Piotr Nyczyk,Hubert Niewiadomski,Torsten Hoefler
http://arxiv.org/abs/2311.18526v1
Compressor summary: The HOT model improves dynamic link prediction by using higher-order graph structures and hierarchy in attention matrices, achieving high accuracy with low memory usage.
Alison Peard,Jim Hall
http://arxiv.org/abs/2311.18521v1
Compressor summary: The authors propose a new method using GANs to simulate realistic compound hazards from climate risk data, which can help with climate adaptation and disaster preparedness.
Pakizar Shamoi,Muragul Muratbekova
http://arxiv.org/abs/2311.18518v1
Compressor summary: The paper presents a fuzzy set approach to classify emotions in paintings using color associations, which correlates well with human judgments and has potential applications in various fields.
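A basic building block of such fuzzy-set approaches is a membership function; a triangular one is sketched below (a generic example, not the paper's color-emotion model — the "warm hue" parameters are invented for illustration):

```python
def triangular_membership(x, a, b, c):
    """Triangular fuzzy membership: 0 at a, peaks at 1 at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Degree to which a hue of 30 degrees belongs to a fuzzy "warm" set peaking at 20
print(triangular_membership(30, 0, 20, 60))  # 0.75
```

Membership degrees in [0, 1] replace hard class boundaries, which is what lets such models mirror graded human color-emotion judgments.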
Aritra Bhowmik,Martin R. Oswald,Pascal Mettes,Cees G. M. Snoek
http://arxiv.org/abs/2311.18512v1
Compressor summary: The paper proposes a simpler and more effective alternative for detecting objects in images by regressing to intersections between proposals and ground truth boxes, instead of overlapping areas.
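The quantity being regressed to — the intersection rectangle of a proposal and a ground-truth box — can be computed with a few lines (a toy geometric helper, not the paper's code):

```python
def intersection_box(a, b):
    """Intersection rectangle of two (x1, y1, x2, y2) boxes; None if disjoint."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x1 >= x2 or y1 >= y2:
        return None
    return (x1, y1, x2, y2)

print(intersection_box((0, 0, 4, 4), (2, 2, 6, 6)))  # (2, 2, 4, 4)
```

Unlike a scalar IoU, the intersection box is itself a localized target, which is the property the proposed regression exploits.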
Tengjin Weng,Yang Shen,Zhidong Zhao,Zhiming Cheng,Shuai Wang
http://arxiv.org/abs/2311.18496v1
Compressor summary: The proposed method uses multiple initialization networks to generate pseudo-labels and consensus information to denoise optic disc and cup segmentation data for better glaucoma screening and diagnosis.
Avery Ma,Amir-massoud Farahmand,Yangchen Pan,Philip Torr,Jindong Gu
http://arxiv.org/abs/2311.18495v1
Compressor summary: The paper proposes a method to make neural networks better at generating adversarial perturbations that work across different models by fine-tuning the source model using a witness model, and shows improved transferability in experiments.
Violeta Menéndez González,Andrew Gilbert,Graeme Phillipson,Stephen Jolly,Simon Hadfield
http://arxiv.org/abs/2311.18491v1
Compressor summary: ZeST-NeRF is a novel approach that can generate temporal NeRFs for new scenes without retraining, using multi-view synthesis techniques and scene flow-field estimation, achieving improved visual and quantitative results.
Jin-Chuan Shi,Miao Wang,Hao-Bin Duan,Shao-Hua Guan
http://arxiv.org/abs/2311.18482v1
Compressor summary: The text introduces Language Embedded 3D Gaussians, a new way to represent scenes for open-vocabulary query tasks that uses less memory and performs better than previous approaches.
Lokesh Mishra,Cesar Berrospi,Kasper Dinkla,Diego Antognini,Francesco Fusco,Benedikt Bothur,Maksym Lysak,Nikolaos Livathinos,Ahmed Nassar,Panagiotis Vagenas,Lucas Morin,Christoph Auer,Michele Dolfi,Peter Staar
http://arxiv.org/abs/2311.18481v1
Compressor summary: Deep Search DocQA is a conversational AI system that helps users extract information from ESG reports using computer vision, NLP, and language models.
Bruno D. Ferreira-Saraiva,Joao P. Matos-Carvalho,Manuel Pita
http://arxiv.org/abs/2311.18466v1
Compressor summary: The text discusses how explicit replies in computer-mediated communication affect the structure of conversations and how to identify roles of utterances using a hierarchical topic model.
Maresa Schröder,Dennis Frauen,Stefan Feuerriegel
http://arxiv.org/abs/2311.18460v1
Compressor summary: This paper studies how unobserved confounding affects causal fairness in machine learning and proposes a new neural framework to learn fair predictions despite this challenge.
Hrushikesh Loya,Łukasz Dudziak,Abhinav Mehrotra,Royson Lee,Javier Fernandez-Marques,Nicholas D. Lane,Hongkai Wen
http://arxiv.org/abs/2311.18451v1
Compressor summary: The text discusses using meta-learning methods from few-shot adaptation to improve neural architecture search for diverse tasks, with a focus on reducing its cost and uncertainty in under-represented domains.
Zicong Fan,Maria Parelli,Maria Eleni Kadoglou,Muhammed Kocabas,Xu Chen,Michael J. Black,Otmar Hilliges
http://arxiv.org/abs/2311.18448v1
Compressor summary: The paper introduces HOLD, a method that can reconstruct 3D hand and object interactions from monocular videos without relying on pre-scanned templates or limited data, using an implicit model and hand-object constraints.
Bin Huang,Xin Wang,Hong Chen,Zihan Song,Wenwu Zhu
http://arxiv.org/abs/2311.18445v1
Compressor summary: VTimeLLM is a novel Video LLM that uses a three-stage training strategy to improve fine-grained video moment understanding and reasoning with respect to time boundaries, outperforming existing Video LLMs in various tasks.
Victor Boone
http://arxiv.org/abs/2311.18437v1
Compressor summary: The paper investigates how well no-regret algorithms perform in one-shot stochastic bandits and finds that randomized methods have better sliding regret than index policies.
Zipeng Qi,Guoxi Huang,Zebin Huang,Qin Guo,Jinwen Chen,Junyu Han,Jian Wang,Gang Zhang,Lufei Liu,Errui Ding,Jingdong Wang
http://arxiv.org/abs/2311.18435v1
Compressor summary: The paper presents two innovations for improving spatial controllability in text-based diffusion models, called Vision Guidance and Layered Rendering Diffusion, which lead to more efficient and accurate image synthesis with specific spatial and contextual requirements.
Felix Koulischer,Cédric Goemaere,Tom van der Meersch,Johannes Deleu,Thomas Demeester
http://arxiv.org/abs/2311.18434v1
Compressor summary: The paper investigates how the inverse temperature hyperparameter affects the performance of Modern Hopfield Networks (MHNs) and suggests that understanding it could help optimize Transformers in the future.
Xiuhong Lin,Changjie Qiu,Zhipeng Cai,Siqi Shen,Yu Zang,Weiquan Liu,Xuesheng Bian,Matthias Müller,Cheng Wang
http://arxiv.org/abs/2311.18433v1
Compressor summary: E2PNet is a novel method that registers 2D RGB images to 3D point clouds using event data, outperforming other methods and being robust to extreme illumination or fast motion.
Lianrui Mu,Jianhong Bai,Xiaoxuan He,Jiangnan Ye,Xiaoyu Liang,Yuchen Yang,Jiedong Zhuang,Haoji Hu
http://arxiv.org/abs/2311.18420v1
Compressor summary: TeG-DG is a framework that leverages text information to improve the domain generalization of Face Anti-Spoofing techniques, achieving better performance especially with limited source domain data.
Jianhao Zeng,Dan Song,Weizhi Nie,Hongshuo Tian,Tongtong Wang,Anan Liu
http://arxiv.org/abs/2311.18405v1
Compressor summary: CAT-DM is a new method for virtual try-on that combines controllability and acceleration using a diffusion model, outperforming previous methods in image quality and pattern reproduction.
Xianlong Wang,Shengshan Hu,Minghui Li,Zhifei Yu,Ziqi Zhou,Leo Yu Zhang,Hai Jin
http://arxiv.org/abs/2311.18403v1
Compressor summary: The authors propose a new image corruption method to defend against a type of unlearnable dataset that uses convolution and random matrices to counteract imperceptible perturbations in training data.
Dan Song,Xinwei Fu,Weizhi Nie,Wenhui Li,Anan Liu
http://arxiv.org/abs/2311.18402v1
Compressor summary: The paper proposes view selection and hierarchical prompts to improve zero-shot 3D shape recognition using language-image pre-trained models like CLIP.
Rafael Pablos Sarabia,Joachim Nyborg,Morten Birk,Ira Assent
http://arxiv.org/abs/2311.18398v1
Compressor summary: The paper proposes a 2D U-Net model that uses refined satellite images, an improved cross-entropy loss function, and Conditioning Lead Time to forecast high-resolution precipitation more accurately than the official 3D U-Net baseline.
Zhebin Zhang,Xinyu Zhang,Yuanhang Ren,Saijiang Shi,Meng Han,Yongkang Wu,Ruofei Lai,Zhao Cao
http://arxiv.org/abs/2311.18397v1
Compressor summary: The paper introduces Induction-Augmented Generation (IAG), which uses inductive reasoning to generate implicit knowledge for open-domain QA tasks, improving performance over existing methods.
Bernd Frauenknecht,Tobias Ehlgen,Sebastian Trimpe
http://arxiv.org/abs/2311.18393v1
Compressor summary: The paper explores three data-efficient deep RL methods for vehicle trajectory control and proposes a new model-based formulation that improves their performance over standard approaches like soft actor-critic.
Seongmin Hong,Kyeonghyun Lee,Suh Yoon Jeon,Hyewon Bae,Se Young Chun
http://arxiv.org/abs/2311.18387v1
Compressor summary: The paper proposes algorithms for finding the initial noise from images generated by diffusion probabilistic models, improving robustness and quality of image editing tasks.
Jiaxin Mei,Tao Zhou,Kaiwen Huang,Yizhe Zhang,Yi Zhou,Ye Wu,Huazhu Fu
http://arxiv.org/abs/2311.18373v1
Compressor summary: This paper reviews polyp segmentation algorithms, comparing traditional methods with deep learning models and evaluating their performance on benchmark datasets.
Beatrix M. G. Nielsen,Lars Kai Hansen
http://arxiv.org/abs/2311.18364v1
Compressor summary: The authors study Sentence-BERT embeddings and find that they have a problem called hubness, which affects the quality of semantic representations; they propose a combination of two methods to reduce this issue and improve the results.
Ziyang Chen,Yiwen Ye,Mengkang Lu,Yongsheng Pan,Yong Xia
http://arxiv.org/abs/2311.18363v1
Compressor summary: The paper proposes a new method called VPTTA that adapts visual prompts to adjust semantic segmentation models for different medical images without updating the pre-trained model, achieving better results than other methods.
Eyob Mengiste,Borja Garcia de Soto,Timo Hartmann
http://arxiv.org/abs/2311.18361v1
Compressor summary: The study presents a method to automatically generate lookahead plans for construction projects using a neural network model that considers material conditions, space utilization, and project timeline.
Weikai Li,Hongfeng Wei,Yanlai Wu,Jie Yang,Yudi Ruan,Yuan Li,Ying Tang
http://arxiv.org/abs/2311.18358v1
Compressor summary: TIDE is a novel FSOD method that learns from untuned support instances and uses cross-attention and multi-scale resizing to improve performance, overcoming limitations of existing methods in Industry 5.0 scenarios.
Thorben Werner,Johannes Burchert,Lars Schmidt-Thieme
http://arxiv.org/abs/2311.18356v1
Compressor summary: The paper presents an Active Learning framework to compare algorithms fairly across tasks and domains, and proposes the first benchmark testing algorithms in Tabular, Image, and Text domains.
Akira Kawabata,Saku Sugawara
http://arxiv.org/abs/2311.18353v1
Compressor summary: The authors present a dataset to test language models' understanding of the rationale behind critical reasoning in logical reading comprehension tasks, and find that current models struggle to explain why incorrect options should be eliminated.
Cyrille Berger,Simon Lacroix
http://arxiv.org/abs/2311.18344v1
Compressor summary: The paper proposes a fast, robust, and parameter-free model-driven method for detecting image line segments using a linear Kalman filter and a pyramidal extension.
Lu Han,Xu-Yang Chen,Han-Jia Ye,De-Chuan Zhan
http://arxiv.org/abs/2311.18341v1
Compressor summary: This paper proposes TFI, a technique to generate synthetic data from adjacent frames and a multi-level dice loss to improve precipitation forecasting models' robustness against spatial-temporal shifts.
Jianjian Qin,Chunzhi Gu,Jun Yu,Chao Zhang
http://arxiv.org/abs/2311.18332v1
Compressor summary: The paper proposes a new anomaly detection method called CutSwap, which uses saliency maps to generate realistic negative samples for self-supervised learning in computer vision.
Sumanth Udupa,Prajwal Gurunath,Aniruddh Sikdar,Suresh Sundaram
http://arxiv.org/abs/2311.18331v1
Compressor summary: The paper proposes a technique called MRFP to improve semantic scene understanding by randomizing fine-grained and coarse features in deep neural networks, enabling better generalization from simulated data to real-world scenes.
Yingshu Chen,Guocheng Shao,Ka Chun Shum,Binh-Son Hua,Sai-Kit Yeung
http://arxiv.org/abs/2311.18328v1
Compressor summary: The paper surveys recent advances in using artificial intelligence to create digital art by transforming 3D data into various styles, and explores its applications and challenges.
Y. Wang,J. Xu,Y. Zeng,Y. Gong
http://arxiv.org/abs/2311.18311v1
Compressor summary: The paper proposes a method to improve NeRF's view synthesis by learning anisotropic features that eliminate ambiguity and enhance scene representation, achieving better rendering quality on synthetic and real data.
Yuxiao Chen,Sander Tonkens,Marco Pavone
http://arxiv.org/abs/2311.18307v1
Compressor summary: CTT is a traffic model that outputs continuous and categorical predictions, has an interpretable latent space, and can integrate with large language models for better autonomous vehicle planning and simulation.
Zhangsihao Yang,Mingyuan Zhou,Mengyi Shan,Bingbing Wen,Ziwei Xuan,Mitch Hill,Junjie Bai,Guo-Jun Qi,Yalin Wang
http://arxiv.org/abs/2311.18303v1
Compressor summary: The paper proposes a model to generate diverse and realistic animal motions from text descriptions using knowledge from human motion synthesis and introduces a new dataset with 36 animal identities.
Karim Makki,Adrien Bartoli
http://arxiv.org/abs/2311.18299v1
Compressor summary: The paper proposes a new method to use specularities in endoscopic images as cues for 3D perception by reconstructing the tissue's normal direction and shape from a single image.
Tu Bui,Shruti Agarwal,John Collomosse
http://arxiv.org/abs/2311.18297v1
Compressor summary: TrustMark is a GAN-based watermarking method that balances image quality and watermark recovery accuracy, with robustness to various perturbations and a watermark remover counterpart.
Zhiwei Deng,Ting Chen,Yang Li
http://arxiv.org/abs/2311.18296v1
Compressor summary: The paper presents the Perceptual Group Tokenizer, a model that uses perceptual grouping to extract visual features and learn representations without label supervision, achieving competitive performance on ImageNet-1K benchmark.
Juhyeon Park,Seokhyeon Jeong,Taesup Moon
http://arxiv.org/abs/2311.18291v1
Compressor summary: The paper proposes TLDR, a method to reduce spurious correlations in image classifiers by using texts generated by large language models as proxies for images and filtering noisy words.
Haiyao Xiao,Chenglai Zhong,Xuan Gao,Yudong Guo,Juyong Zhang
http://arxiv.org/abs/2311.18288v1
Compressor summary: CosAvatar is a framework for portrait tuning that uses monocular video and text inputs to create animatable portraits with temporal and 3D consistency, enabling precise editing of styles and attributes based on text instructions.
Lingyi Hong,Wei Zhang,Shuyong Gao,Hong Lu,WenQiang Zhang
http://arxiv.org/abs/2311.18286v1
Compressor summary: SimulFlow is a novel method for unsupervised video object segmentation that simultaneously extracts features and identifies targets using a SimulFlow Attention mechanism, achieving state-of-the-art performance while addressing computational complexity and fusion difficulties.
Zhuohao Yin,Xin Huang
http://arxiv.org/abs/2311.18273v1
Compressor summary: The paper introduces a multi-modal framework that uses pretrained models, knowledge bases, and datasets to disambiguate word meanings from images, and shares insights and code for the research community.
Younggeol Cho,Youngrae Kim,Dongman Lee
http://arxiv.org/abs/2311.18270v1
Compressor summary: BESTTA is a novel method that uses style transfer to adapt models to changing environments with limited resources, achieving accuracy and efficiency using only a single image.
Ruxiao Duan,Yaoyao Liu,Jieneng Chen,Adam Kortylewski,Alan Yuille
http://arxiv.org/abs/2311.18266v1
Compressor summary: ESCORT is a novel method for class-incremental learning that compresses old images into prompts and generates diverse exemplars from them using an off-the-shelf diffusion model, improving performance significantly on multiple benchmarks.
Ninad Aithal,Chakka Sai Pradeep,Neelam Sinha
http://arxiv.org/abs/2311.18265v1
Compressor summary: The study uses resting state fMRI to analyze brain network dynamics and classify healthy subjects from those with Mild Cognitive Impairment with a high accuracy rate.
Kristen Grauman,Andrew Westbury,Lorenzo Torresani,Kris Kitani,Jitendra Malik,Triantafyllos Afouras,Kumar Ashutosh,Vijay Baiyya,Siddhant Bansal,Bikram Boote,Eugene Byrne,Zach Chavis,Joya Chen,Feng Cheng,Fu-Jen Chu,Sean Crane,Avijit Dasgupta,Jing Dong,Maria Escobar,Cristhian Forigua,Abrham Gebreselasie,Sanjay Haresh,Jing Huang,Md Mohaiminul Islam,Suyog Jain,Rawal Khirodkar,Devansh Kukreja,Kevin J Liang,Jia-Wei Liu,Sagnik Majumder,Yongsen Mao,Miguel Martin,Effrosyni Mavroudi,Tushar Nagarajan,Francesco Ragusa,Santhosh Kumar Ramakrishnan,Luigi Seminara,Arjun Somayazulu,Yale Song,Shan Su,Zihui Xue,Edward Zhang,Jinxu Zhang,Angela Castillo,Changan Chen,Xinzhu Fu,Ryosuke Furuta,Cristina Gonzalez,Prince Gupta,Jiabo Hu,Yifei Huang,Yiming Huang,Weslie Khoo,Anush Kumar,Robert Kuo,Sach Lakhavani,Miao Liu,Mi Luo,Zhengyi Luo,Brighid Meredith,Austin Miller,Oluwatumininu Oguntola,Xiaqing Pan,Penny Peng,Shraman Pramanick,Merey Ramazanova,Fiona Ryan,Wei Shan,Kiran Somasundaram,Chenan Song,Audrey Southerland,Masatoshi Tateno,Huiyu Wang,Yuchen Wang,Takuma Yagi,Mingfei Yan,Xitong Yang,Zecheng Yu,Shengxin Cindy Zha,Chen Zhao,Ziwei Zhao,Zhifan Zhu,Jeff Zhuo,Pablo Arbelaez,Gedas Bertasius,David Crandall,Dima Damen,Jakob Engel,Giovanni Maria Farinella,Antonino Furnari,Bernard Ghanem,Judy Hoffman,C. V. Jawahar,Richard Newcombe,Hyun Soo Park,James M. Rehg,Yoichi Sato,Manolis Savva,Jianbo Shi,Mike Zheng Shou,Michael Wray
http://arxiv.org/abs/2311.18259v1
Compressor summary: Ego-Exo4D is a large and diverse dataset of multimodal videos with various human activities, contexts, and annotations for research purposes.
Jing Nathan Yan,Jiatao Gu,Alexander M. Rush
http://arxiv.org/abs/2311.18257v1
Compressor summary: DiffuSSM is a scalable state space model for high-fidelity image generation that preserves detailed images without global compression, offering better performance and lower computational cost than existing models.
Guangming Zhu,Siyuan Wang,Qing Cheng,Kelong Wu,Hao Li,Liang Zhang
http://arxiv.org/abs/2311.18254v1
Compressor summary: This study introduces SketchIME, a sketch input method for creating situation maps in C4I systems, with a new dataset and recognition architecture that adapts to new users and tasks.
Yi Li,Aarti Gupta,Sharad Malik
http://arxiv.org/abs/2311.18246v1
Compressor summary: COSMA is an optimization framework that minimizes additional data accesses in specialized hardware accelerators for Deep Neural Networks.
Yongjun Zhang
http://arxiv.org/abs/2311.18241v1
Compressor summary: The authors fine-tuned two large transformer models, Longformer and Swin Transformer V2, to identify potential protests in news articles and images using the DoCA Corpus and UCLA-protest imagery data, and provided the models via GitHub for social movement scholars.
Raviteja Vemulapalli,Hadi Pouransari,Fartash Faghri,Sachin Mehta,Mehrdad Farajtabar,Mohammad Rastegari,Oncel Tuzel
http://arxiv.org/abs/2311.18237v1
Compressor summary: The paper proposes a simple and effective method to use large pretrained models in resource-limited settings by transferring task-specific knowledge from them to small task-oriented models.
Marwa Abdulhai,Isadora White,Charlie Snell,Charles Sun,Joey Hong,Yuexiang Zhai,Kelvin Xu,Sergey Levine
http://arxiv.org/abs/2311.18232v1
Compressor summary: The paper discusses the potential of reinforcement learning to create goal-directed language agents using large language models and introduces LMRL-Gym, an open-source benchmark for evaluating multi-turn RL for LLMs on various tasks.
Hantao Yao,Rui Zhang,Changsheng Xu
http://arxiv.org/abs/2311.18231v1
Compressor summary: TCP is a method for improving visual-language models' ability to generate task-specific textual classifiers by incorporating prior knowledge about classes and using Textual Knowledge Embedding.
Zijian Chen,Wei Sun,Zicheng Zhang,Ru Huang,Fangfang Lu,Xiongkuo Min,Guangtao Zhai,Wenjun Zhang
http://arxiv.org/abs/2311.18216v1
Compressor summary: The paper introduces a new model called FS-BAND that can detect and evaluate banding artifacts, a common video compression issue, using frequency analysis and outperforms existing image quality assessment methods.
Sungjoo Byun,Dongjun Jang,Hyemi Jo,Hyopil Shin
http://arxiv.org/abs/2311.18215v1
Compressor summary: KoTox is a collection of toxic instructions that helps train Large Language Models to produce less unethical language and respond better to toxic inputs in NLP applications.
Mengfei Xia,Yujun Shen,Ceyuan Yang,Ran Yi,Wenping Wang,Yong-jin Liu
http://arxiv.org/abs/2311.18208v1
Compressor summary: This paper proposes using score matching to improve GANs' ability to generate data that matches the real data manifold, resulting in better synthesis performance on diverse and complex datasets.
Haruka Kiyohara,Ren Kishimoto,Kosuke Kawakami,Ken Kobayashi,Kazuhide Nakata,Yuta Saito
http://arxiv.org/abs/2311.18207v1
Compressor summary: SharpeRatio@k is a new metric that measures the risk-return tradeoff of policy portfolios formed by an offline evaluation method called Off-Policy Evaluation (OPE), helping to identify the most efficient estimator for online deployment.
Haruka Kiyohara,Ren Kishimoto,Kosuke Kawakami,Ken Kobayashi,Kazuhide Nakata,Yuta Saito
http://arxiv.org/abs/2311.18206v1
Compressor summary: SCOPE-RL is an open-source Python software that supports offline RL and OPE with flexible and reliable OPE modules, user-friendly APIs, and comprehensive documentation.
Hengchao Shang,Zongyao Li,Daimeng Wei,Jiaxin Guo,Minghan Wang,Xiaoyu Chen,Lizhi Lei,Hao Yang
http://arxiv.org/abs/2311.18200v1
Compressor summary: The paper introduces INarIG, a model that predicts target words in Word-Level Auto Completion (WLAC) tasks by using human typed sequences as Instruction Units and iterative decoding with subwords to improve translation efficiency for low-frequency words.
Mohammad Aminul Islam,Wangzhi Xing,Jun Zhou,Yongsheng Gao,Kuldip K. Paliwal
http://arxiv.org/abs/2311.18199v1
Compressor summary: The paper proposes Hy-Tracker, a framework that uses YOLOv7 for object detection and tracking in hyperspectral videos, addressing challenges like multiple spectral bands, scarce annotations, occlusions, and cluttered backgrounds.
Pengqian Han,Jiamou Liu,Jialing He,Zeyu Zhang,Song Yang,Yanni Tang,Partha Roop
http://arxiv.org/abs/2311.18198v1
Compressor summary: The paper introduces a novel spatial-temporal conditional random field model for pedestrian trajectory prediction that incorporates intention information, improving performance over existing methods.
Jongin Kim,Byeo Rhee Back,Aditya Agrawal,Jiaxi Wu,Veronika J. Wirtz,Traci Hong,Derry Wijaya
http://arxiv.org/abs/2311.18195v1
Compressor summary: The paper presents a multilingual dataset of COVID-19 vaccine misinformation tweets from Brazil, Indonesia, and Nigeria, and proposes two methods to improve misinformation detection models.
Yongqiang Chen,Binghui Xie,Kaiwen Zhou,Bo Han,Yatao Bian,James Cheng
http://arxiv.org/abs/2311.18194v1
Compressor summary: The paper investigates the limitations of in-context learning (ICL) in large language models and proposes a new architecture, DeepSet, which preserves input symmetry for better ICL performance.
Keke Huang,Pietro Liò
http://arxiv.org/abs/2311.18177v1
Compressor summary: The paper proposes UniBasis, a universal polynomial basis that adapts to different levels of graph heterophily, and UniFilter, a general polynomial filter that uses UniBasis for efficient graph analysis.
Xiaosheng He,Fan Yang,Fayao Liu,Guosheng Lin
http://arxiv.org/abs/2311.18169v1
Compressor summary: The paper proposes a new image translation module for GAN transferring that helps preserve content and style when training with limited data, outperforming existing methods.
Karren D. Yang,Anurag Ranjan,Jen-Hao Rick Chang,Raviteja Vemulapalli,Oncel Tuzel
http://arxiv.org/abs/2311.18168v1
Compressor summary: The paper proposes a probabilistic approach for animating 3D facial geometry from speech signals, addressing key challenges in data and metrics, and showing applications such as generating diverse speech-driven 3D facial motion and improving downstream audio-visual models.
Weilian Song,Jieliang Luo,Dale Zhao,Yan Fu,Chin-Yi Cheng,Yasutaka Furukawa
http://arxiv.org/abs/2311.18166v1
Compressor summary: The paper introduces an assistive system for architects that converts large-scale point clouds into standardized digital building models using predicted editing operations as APIs of Autodesk Revit software.
KL Navaneet,Kossar Pourahmadi Meibodi,Soroush Abbasi Koohpayegani,Hamed Pirsiavash
http://arxiv.org/abs/2311.18159v1
Compressor summary: The paper introduces a vector quantization and compression method to reduce the storage cost of 3D Gaussian Splatting, a fast NeRF alternative, while maintaining image quality.
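The storage saving behind vector quantization can be illustrated generically with a tiny k-means codebook (a toy sketch under assumed shapes, not the paper's method: each row stands in for one Gaussian's parameter vector):

```python
import numpy as np

def kmeans_codebook(X, n_codes=16, iters=20, seed=0):
    """Tiny k-means: learn a codebook and assign each vector a code index."""
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), n_codes, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codebook entry
        d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        codes = d.argmin(axis=1)
        # Move each codebook entry to the mean of its assigned vectors
        for j in range(n_codes):
            members = X[codes == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook, codes

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8)).astype(np.float32)  # stand-in for per-Gaussian parameters
codebook, codes = kmeans_codebook(X, n_codes=16)
X_hat = codebook[codes]  # decompressed approximation
# Storage drops from 1000x8 floats to 16x8 floats plus 1000 small code indices.
```

The trade-off is reconstruction error (`X_hat` vs `X`), which such methods keep small enough that rendered image quality is preserved.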
Yifan Zhang,Bryan Hooi
http://arxiv.org/abs/2311.18158v1
Compressor summary: HiPA is a method to improve the quality and speed of text-to-image diffusion by focusing on enhancing high-frequency information using low-rank adaptors.