This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-07, generated by Compressor, my personal LLM-based summarization project.
Xinshun Wang,Zhongbin Fang,Xia Li,Xiangtai Li,Chen Chen,Mengyuan Liu
http://arxiv.org/abs/2312.03703v1
Compressor summary: Skeleton-in-Context (SiC) is a framework for in-context learning of skeleton sequence modeling that can handle multiple tasks simultaneously, adapt to new tasks, and achieve state-of-the-art performance.
Tianhong Li,Dina Katabi,Kaiming He
http://arxiv.org/abs/2312.03701v1
Compressor summary: RCG is a new image generation method that uses self-supervised representation distribution and achieves high quality results without human annotations.
Jiaming Han,Kaixiong Gong,Yiyuan Zhang,Jiaqi Wang,Kaipeng Zhang,Dahua Lin,Yu Qiao,Peng Gao,Xiangyu Yue
http://arxiv.org/abs/2312.03700v1
Compressor summary: The paper introduces OneLLM, a unified framework for aligning eight modalities to language, and presents a multimodal instruction dataset for evaluating its performance on various tasks.
Wenyuan Wu,Jasmin Heierli,Max Meisterhans,Adrian Moser,Andri Färber,Mateusz Dolata,Elena Gavagnin,Alexandre de Spindler,Gerhard Schwabe
http://arxiv.org/abs/2312.03699v1
Compressor summary: PROMISE is a framework that helps create and control complex language-based interactions with information systems, improving their effectiveness and efficiency.
Chris Careaga,Yağız Aksoy,S. Mahdi H. Miangoleh
http://arxiv.org/abs/2312.03698v1
Compressor summary: The authors propose a self-supervised method for image harmonization that adjusts shading and albedo to match lighting between foreground and background in composited images.
Sudhanshu Chanpuriya,Cameron Musco,Konstantinos Sotiropoulos,Charalampos Tsourakakis
http://arxiv.org/abs/2312.03691v1
Compressor summary: The authors propose an evaluation framework for graph generative models that accounts for overlap among model-generated graphs, categorize models into three complexity levels, derive theoretical bounds on their output quality, and introduce new models based on dense subgraph discovery that are competitive with popular approaches.
Alex Tamkin,Amanda Askell,Liane Lovitt,Esin Durmus,Nicholas Joseph,Shauna Kravec,Karina Nguyen,Jared Kaplan,Deep Ganguli
http://arxiv.org/abs/2312.03689v1
Compressor summary: The authors propose a method for evaluating the potential discriminatory impact of language models in various use cases by generating diverse prompts with different demographic information, and suggest ways to reduce discrimination through prompt engineering.
Jiayuan Mao,Tomás Lozano-Pérez,Joshua B. Tenenbaum,Leslie Pack Kaelbling
http://arxiv.org/abs/2312.03682v1
Compressor summary: The paper analyzes how relational neural networks, such as graph neural networks and transformers, can be used to learn goal-conditioned policies for planning problems, and identifies three classes of planning problems based on circuit width and depth.
Lennart Bastian,Yizheng Xie,Nassir Navab,Zorah Lähner
http://arxiv.org/abs/2312.03678v1
Compressor summary: The proposed method combines different basis functions to create a hybrid spectral space for shape correspondence, improving performance on non-isometric deformations and noisy data.
Ziqi Li
http://arxiv.org/abs/2312.03675v1
Compressor summary: GeoShapley is a game theory-based approach for measuring the importance of location and its synergies with other features in various machine learning models, and it can be applied to both statistical and black-box models.
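GeoShapley builds on the game-theoretic Shapley value, which has a standard exact form. A minimal sketch of that generic machinery (not the paper's GeoShapley method; the feature names and toy additive model are invented for illustration):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values for a small set of players (features).

    value_fn maps a frozenset of players to a real-valued payoff.
    Exponential in len(players); suitable for illustration only.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                # Marginal contribution of p to coalition S
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi

# Toy additive payoff: the value of a coalition is the sum of its members'
# weights, so each feature's Shapley value equals its own weight.
weights = {"location": 2.0, "income": 1.0, "age": 0.5}
vals = shapley_values(list(weights), lambda s: sum(weights[f] for f in s))
```

For an additive game like this toy one, the Shapley values recover each feature's individual weight exactly; real model explanations replace `value_fn` with the model's expected prediction over coalitions.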
Xujie Zhang,Xiu Li,Michael Kampffmeyer,Xin Dong,Zhenyu Xie,Feida Zhu,Haoye Dong,Xiaodan Liang
http://arxiv.org/abs/2312.03667v1
Compressor summary: WarpDiffusion is a novel method that improves Virtual Try-On by combining warping-based and diffusion-based techniques with attention mechanisms to enhance realism and retain garment details.
Alexander Sasha Vezhnevets,John P. Agapiou,Avia Aharon,Ron Ziv,Jayd Matyas,Edgar A. Duéñez-Guzmán,William A. Cunningham,Simon Osindero,Danny Karmon,Joel Z. Leibo
http://arxiv.org/abs/2312.03664v1
Compressor summary: Concordia is a library that facilitates constructing and working with Generative Agent-Based Models, which use Large Language Models to apply common sense, control technologies, and communicate in simulations of physical or digital environments.
Ming Nie,Renyuan Peng,Chunwei Wang,Xinyue Cai,Jianhua Han,Hang Xu,Li Zhang
http://arxiv.org/abs/2312.03661v1
Compressor summary: Reason2Drive is a new dataset for studying interpretable reasoning in complex driving environments using large vision-language models.
Dan Friedman,Andrew Lampinen,Lucas Dixon,Danqi Chen,Asma Ghandeharioun
http://arxiv.org/abs/2312.03656v1
Compressor summary: The simplified representations of deep learning models may not accurately capture their behavior outside the training data and may lead to wrong conclusions about their generalization abilities.
Ziyan Wang,Yali Du,Yudi Zhang,Meng Fang,Biwei Huang
http://arxiv.org/abs/2312.03644v1
Compressor summary: MACCA is a method to accurately assign credit to individual agents in offline multi-agent reinforcement learning by modeling the causal relationships between rewards using a Dynamic Bayesian Network, which works well in both discrete and continuous action settings.
Matthew L. Olson,Shusen Liu,Jayaraman J. Thiagarajan,Bogdan Kustowski,Weng-Keen Wong,Rushil Anirudh
http://arxiv.org/abs/2312.03642v1
Compressor summary: The paper proposes a new transformer-based method that combines graph hyper-parameter optimization with multi-modal data to improve prediction accuracy in simulation and real-world scenarios.
Zhouxia Wang,Ziyang Yuan,Xintao Wang,Tianshui Chen,Menghan Xia,Ping Luo,Ying Shan
http://arxiv.org/abs/2312.03641v1
Compressor summary: The paper introduces MotionCtrl, a novel motion controller for video generation that independently controls camera and object motion, enabling more fine-grained control and diverse combinations of motions.
Jingye Yang,Da Wu,Kai Wang
http://arxiv.org/abs/2312.03633v1
Compressor summary: The study found that while the bidirectional language model BERT can avoid the reversal curse, both encoder and decoder models struggle with logical reasoning involving three sets.
Assaf Ben-Kish,Moran Yanuka,Morris Alper,Raja Giryes,Hadar Averbuch-Elor
http://arxiv.org/abs/2312.03631v1
Compressor summary: MOCHa uses reinforcement learning to reduce hallucinations and improve caption quality in image captioning without strong supervision, and introduces OpenCHAIR, a new benchmark for evaluating open-vocabulary hallucinations.
Xumeng Han,Longhui Wei,Xuehui Yu,Zhiyang Dou,Xin He,Kuiran Wang,Zhenjun Han,Qi Tian
http://arxiv.org/abs/2312.03628v1
Compressor summary: Sambor is a new model that improves SAM by adding the ability to detect objects based on human inputs and category names, using a novel module and an open-set region proposal network.
Zirui Wang,Zhizhou Sha,Zheng Ding,Yilin Wang,Zhuowen Tu
http://arxiv.org/abs/2312.03626v1
Compressor summary: TokenCompose is a Latent Diffusion Model that improves text-to-image generation by introducing token-wise consistency terms between image content and object segmentation maps during finetuning, achieving better multi-category instance composition and photorealism.
Wassim Tenachi,Rodrigo Ibata,Foivos I. Diakogiannis
http://arxiv.org/abs/2312.03612v1
Compressor summary: The paper introduces a method that uses reinforcement learning to generate equations with physical units, achieving better results than other methods in noisy conditions.
Yunhan Yang,Yukun Huang,Xiaoyang Wu,Yuan-Chen Guo,Song-Hai Zhang,Hengshuang Zhao,Tong He,Xihui Liu
http://arxiv.org/abs/2312.03611v1
Compressor summary: The paper introduces DreamComposer, a framework that improves existing view-aware diffusion models by using multiple views of an object to generate high-quality novel views for 3D object reconstruction and other tasks.
Ryan Rubel,Andrew Dudash,Mohammad Goli,James O'Hara,Karl Wunderlich
http://arxiv.org/abs/2312.03608v1
Compressor summary: The authors propose a method to automatically label LiDAR and camera data for object detection in indoor settings using an IPS, which is much faster than manual annotation.
Samar Khanna,Patrick Liu,Linqi Zhou,Chenlin Meng,Robin Rombach,Marshall Burke,David Lobell,Stefano Ermon
http://arxiv.org/abs/2312.03606v1
Compressor summary: The paper introduces DiffusionSat, a large generative model for satellite images that uses metadata and diffusion techniques to generate realistic samples and solve various tasks.
Ekkasit Pinyoanuntapong,Pu Wang,Minwoo Lee,Chen Chen
http://arxiv.org/abs/2312.03596v1
Compressor summary: MMM is a novel motion generation method that uses a tokenizer and a transformer to capture dependencies between motion and text tokens, allowing for high-fidelity, high-speed, and editable motion generation.
Junhao Zhuang,Yanhong Zeng,Wenran Liu,Chun Yuan,Kai Chen
http://arxiv.org/abs/2312.03594v1
Compressor summary: PowerPaint is a model that excels at context-aware image inpainting and text-guided object inpainting by using learnable task prompts and tailored fine-tuning strategies.
Sharon Lee,Yunzhi Zhang,Shangzhe Wu,Jiajun Wu
http://arxiv.org/abs/2312.03587v1
Compressor summary: The paragraph discusses learning a language-informed visual concept representation from large pre-trained vision-language models and using it to generate images with novel compositions of visual concepts.
Xiaobo Yang,Xiaojin Gong
http://arxiv.org/abs/2312.03585v1
Compressor summary: The paper proposes a framework using pre-trained models CLIP and SAM to generate segmentation seeds for weakly supervised semantic segmentation, achieving state-of-the-art performance on PASCAL VOC 2012 and competitive results on MS COCO 2014.
Ivona Najdenkoska,Animesh Sinha,Abhimanyu Dubey,Dhruv Mahajan,Vignesh Ramanathan,Filip Radenovic
http://arxiv.org/abs/2312.03584v1
Compressor summary: Context Diffusion is a framework for generating images from contextual examples and text prompts, improving image quality and adaptability.
Eojin Jeon,Mingyu Lee,Juhyeong Park,Yeachan Kim,Wing-Lam Mok,SangKeun Lee
http://arxiv.org/abs/2312.03577v1
Compressor summary: The text discusses a new debiasing framework for models that uses binary classifiers called bias experts to improve bias identification and mitigate its negative effects on performance.
Risab Biswas,Swalpa Kumar Roy,Ning Wang,Umapada Pal,Guang-Bin Huang
http://arxiv.org/abs/2312.03568v1
Compressor summary: The DocBinFormer is a new transformer-based architecture for effective document image binarization that captures global and local features using two-level vision transformers, outperforming existing methods on several benchmarks.
Joel Stremmel,Ardavan Saeedi,Hamid Hassanzadeh,Sanjit Batra,Jeffrey Hertzberg,Jaime Murillo,Eran Halperin
http://arxiv.org/abs/2312.03567v1
Compressor summary: XAIQA is a novel method that generates synthetic QA pairs from electronic health records data for extractive QA systems, outperforming existing approaches in semantic matches and clinical abbreviations, and improving GPT-4's performance on difficult questions.
El Ouanas Belabbaci,Mohammed Khammari,Ammar Chouchane,Mohcene Bessaoudi,Abdelmalik Ouamane,Yassine Himeur,Shadi Atalla,Wathiq Mansoor
http://arxiv.org/abs/2312.03562v1
Compressor summary: The authors propose a new method for verifying family relationships from facial images using Multiscale Retinex, deep and shallow texture descriptors, and Logistic Regression, achieving promising results on three kinship datasets.
Wenhui Wang,Shuming Ma,Hanwen Xu,Naoto Usuyama,Jiayu Ding,Hoifung Poon,Furu Wei
http://arxiv.org/abs/2312.03558v1
Compressor summary: LongViT is a vision Transformer for gigapixel images that splits an image into millions of patches and models them with LongNet within practical computation and memory constraints, outperforming previous methods for cancer diagnosis and prognosis in computational pathology.
Jianjin Xu,Saman Motamed,Praneetha Vaddamanu,Chen Henry Wu,Christian Haene,Jean-Charles Bazin,Fernando de la Torre
http://arxiv.org/abs/2312.03556v1
Compressor summary: The paper proposes a method called Parallel Visual Attention (PVA) that uses attention modules and an identity encoder to improve face inpainting results, preserving identity and semantic attributes, and reducing computational complexity compared to existing techniques.
Fei Yang,Shuang Peng,Ning Sun,Fangyu Wang,Ke Tan,Fu Wu,Jiezhong Qiu,Aimin Pan
http://arxiv.org/abs/2312.03549v1
Compressor summary: Holmes is a novel LLM training framework for heterogeneous NIC environments that uses data and model parallelism strategies, intelligent tasklet scheduling, and pipeline parallel techniques to achieve high training efficiency.
Gongyang Li,Zhen Bai,Zhi Liu
http://arxiv.org/abs/2312.03548v1
Compressor summary: The Texture-Semantic Collaboration Network (TSCNet) is a novel approach for salient object detection in optical remote sensing images that leverages both texture and semantic cues to address the challenges of multiple, small, low-illumination, and irregularly shaped objects.
Haicheng Liao,Huanming Shen,Zhenning Li,Chengyue Wang,Guofa Li,Yiming Bie,Chengzhong Xu
http://arxiv.org/abs/2312.03543v1
Compressor summary: The paper presents a CAVG model that uses multiple encoders and LLMs to improve visual grounding in autonomous vehicles, achieving high accuracy and efficiency in various scenarios.
Olivia Markham,Yuhao Chen,Chi-en Amy Tai,Alexander Wong
http://arxiv.org/abs/2312.03540v1
Compressor summary: FoodFusion is a Latent Diffusion model that generates realistic and diverse food images from textual descriptions using a large curated dataset and data cleaning methods.
Anh Thai,Ahmad Humayun,Stefan Stojanov,Zixuan Huang,Bikram Boote,James M. Rehg
http://arxiv.org/abs/2312.03533v1
Compressor summary: The paper proposes LSME, a new object learning task based on mutual exclusivity bias, and presents a dataset, baselines, and a top-performing method for it.
Maria Priisalu,Ted Kronvall,Cristian Sminchisescu
http://arxiv.org/abs/2312.03528v1
Compressor summary: The paper proposes a new way to adapt human motion prediction models to individual movement patterns, which is important for systems like delivery robots that interact with the same person over time.
Peng Sun,Bei Shi,Daiwei Yu,Tao Lin
http://arxiv.org/abs/2312.03526v1
Compressor summary: The authors propose RDED, a new data distillation method that addresses the challenges of large-scale and high-resolution datasets by focusing on realism, diversity, and efficiency.
Talia Tseriotou,Ryan Sze-Yin Chan,Adam Tsakalidis,Iman Munire Bilal,Elena Kochkina,Terry Lyons,Maria Liakata
http://arxiv.org/abs/2312.03523v1
Compressor summary: Sig-Networks is a new open-source toolkit that uses Signature-based Neural Network models to perform well in temporal NLP tasks like counselling conversations, rumour stance switch and mood changes in social media threads.
Chang Liu,Tamas Sziranyi
http://arxiv.org/abs/2312.03519v1
Compressor summary: The paper proposes using UAV vision and satellite image analysis for detecting wildfires, extracting road networks, and planning dynamic escape routes for people in distress during wilderness rescues.
Junhyuk So,Jungwon Lee,Eunhyeok Park
http://arxiv.org/abs/2312.03517v1
Compressor summary: The paper introduces FRDiff, a technique that uses feature reuse and reduced score function evaluations to speed up diffusion models without compromising quality.
Vladimir Arkhipkin,Andrei Filatov,Viacheslav Vasilev,Anastasia Maltseva,Said Azizov,Igor Pavlov,Julia Agafonova,Andrey Kuznetsov,Denis Dimitrov
http://arxiv.org/abs/2312.03511v1
Compressor summary: The paper introduces Kandinsky 3.0, an improved text-to-image generation model with a larger backbone, encoder, and no diffusion mapping, which enhances quality and domain adaptability.
Neil Kichler,Sher Afghan,Uwe Naumann
http://arxiv.org/abs/2312.03510v1
Compressor summary: The paper proposes a new method to create accurate and efficient surrogate models for complex phenomena by using sensitivity information during learning and pruning, which can be applied beyond quantitative finance.
Nikomidisz Eftimiu,Michal Kozubek
http://arxiv.org/abs/2312.03509v1
Compressor summary: The paper introduces a new computer vision technique using gravitational force fields for detecting, segmenting, and tracking cells in fluorescence microscopy images, which can be faster and more explainable than machine learning methods.
Haojie Zhang,Yongyi Su,Xun Xu,Kui Jia
http://arxiv.org/abs/2312.03502v1
Compressor summary: The paragraph discusses a new self-training strategy to improve the image segmentation model SAM's robustness and efficiency under different distribution shifts, outperforming existing methods.
Shiro Takagi
http://arxiv.org/abs/2312.03497v1
Compressor summary: The paper explores the concept of artificial agents capable of conducting research, discussing their core components and challenges, and suggesting prototyping as a first step to overcome them.
Kim van den Houten,David M. J. Tax,Esteban Freydell,Mathijs de Weerdt
http://arxiv.org/abs/2312.03492v1
Compressor summary: Decision-focused learning adapts to stochastic scheduling problems with uncertain processing times by using historical realizations and outperforms existing methods.
Talha Chafekar,Aafiya Hussain,Grishma Sharma,Deepak Sharma
http://arxiv.org/abs/2312.03483v1
Compressor summary: The authors experiment with different methods to incorporate target answers into question generation for RNN models and find that answer prompting without additional modes performs best.
Jonas Groschwitz,Shay B. Cohen,Lucia Donatelli,Meaghan Fowlie
http://arxiv.org/abs/2312.03480v1
Compressor summary: GrAPES is a challenge set that tests AMR parsers on various aspects of sentence meaning, revealing their strengths and weaknesses.
Weitao Du,Jiujiu Chen,Xuecang Zhang,Zhiming Ma,Shengchao Liu
http://arxiv.org/abs/2312.03475v1
Compressor summary: The text introduces a new method called MoleculeJAE that can learn the geometry and topology of molecules using self-supervised learning, improving drug discovery with better geometrical representation.
Hao Wen,Jakob Zeitler,Connor Rupnow
http://arxiv.org/abs/2312.03466v1
Compressor summary: The paragraph discusses optimizing Bayesian search strategies for self-driving laboratories with asynchronous parallel experiments and delayed feedback.
Kai Li,Yi Luo
http://arxiv.org/abs/2312.03464v1
Compressor summary: The paper proposes a new way to train a large neural network and extract smaller subnetworks from it during inference based on size or complexity constraints, which improves performance and reduces training time compared to training separate subnetworks from scratch.
Tianshu Wang,Hongyu Lin,Xianpei Han,Le Sun,Xiaoyang Chen,Hao Wang,Zhenyu Zeng
http://arxiv.org/abs/2312.03463v1
Compressor summary: DBCopilot is a framework that simplifies database interactions by routing natural language questions through massive databases using a compact neural network router and leveraging large language models for SQL generation.
Yuheng Jiang,Zhehao Shen,Penghao Wang,Zhuo Su,Yu Hong,Yingliang Zhang,Jingyi Yu,Lan Xu
http://arxiv.org/abs/2312.03461v1
Compressor summary: HiFi4G is a technique that uses 3D Gaussians to render realistic human performance from dense footage, enabling efficient compression and non-rigid tracking.
Sitong Su,Jianzhi Liu,Lianli Gao,Jingkuan Song
http://arxiv.org/abs/2312.03459v1
Compressor summary: The authors propose F3-Pruning, a training-free, generalized pruning strategy that speeds up inference of large T2V models without sacrificing quality.
Chengguang Gan,Qinghao Zhang,Tatsunori Mori
http://arxiv.org/abs/2312.03458v1
Compressor summary: The study introduces "Think from Words" (TFW) and "TFW with Extra word-level information" (TFW Extra), two methods that aim to improve Large Language Models' (LLMs) text comprehension by starting at the word level and using additional word-level data, and evaluates their effectiveness on six Japanese datasets.
Lukas Drees,Dereje T. Demie,Madhuri R. Paul,Johannes Leonhardt,Sabine J. Seidel,Thomas F. Döring,Ribana Roscher
http://arxiv.org/abs/2312.03443v1
Compressor summary: The paper presents a two-stage framework for realistic image prediction and plant phenotyping using conditional Wasserstein generative adversarial networks, which can integrate multiple growth-influencing conditions and help precision agriculture by revealing spatial crop development over time.
Yuxuan Han,Junfeng Lyu,Feng Xu
http://arxiv.org/abs/2312.03442v1
Compressor summary: This paper presents a new, easy-to-use method for capturing high-quality 3D face scans with skin, hair, eyes, and mouth interior using a single smartphone flashlight sequence in a dim room.
Jialong Zuo,Hanyu Zhou,Ying Nie,Feng Zhang,Tianyu Guo,Nong Sang,Yunhe Wang,Changxin Gao
http://arxiv.org/abs/2312.03441v1
Compressor summary: The paper introduces UFineBench, a new benchmark for text-based person retrieval with ultra-fine granularity, and presents a new dataset (UFine6926), an evaluation paradigm (UFine3C), and an efficient algorithm (CFAM) to address the problem of coarse-grained annotations.
Youtian Lin,Zuozhuo Dai,Siyu Zhu,Yao Yao
http://arxiv.org/abs/2312.03431v1
Compressor summary: Gaussian-Flow is a fast point-based approach for dynamic scene reconstruction and real-time rendering from videos using a novel Dual-Domain Deformation Model.
Zhuoyan Liu,Bo Wang,Lizhi Wang,Chenyu Mao,Ye Li
http://arxiv.org/abs/2312.03430v1
Compressor summary: The ShareCMP framework improves RGB-Polarization semantic segmentation for underwater scenarios with less parameters and better performance.
Yingyan Xu,Prashanth Chandran,Sebastian Weiss,Markus Gross,Gaspard Zoss,Derek Bradley
http://arxiv.org/abs/2312.03420v1
Compressor summary: The text describes a new method to create realistic and animatable digital heads that can be relit in any environment and perform various expressions.
Daria Cherniuk,Aleksandr Mikhalev,Ivan Oseledets
http://arxiv.org/abs/2312.03415v1
Compressor summary: LoRA is a technique that speeds up neural network training by using low-rank adapters, and the RunLoRA framework optimizes this technique for efficiency.
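The low-rank adapter idea behind LoRA has a standard form: freeze the pretrained weight W and train only a small update BA. A minimal NumPy sketch of that generic idea (not RunLoRA's optimized implementation; the dimensions and init scales are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x, scale=1.0):
    # y = Wx + scale * B(Ax): only A and B are trained,
    # rank * (d_in + d_out) parameters instead of d_in * d_out.
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted model starts out identical
# to the frozen pretrained model.
assert np.allclose(lora_forward(x), W @ x)
```

The zero-initialized B is the usual trick that makes the adapter a no-op at the start of training; the speedup comes from only computing gradients for the small A and B matrices.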
Jang-Hyun Kim,Junyoung Yeom,Sangdoo Yun,Hyun Oh Song
http://arxiv.org/abs/2312.03414v1
Compressor summary: The paper introduces a method to compress and store context for Transformer language models in online scenarios, reducing memory and computation while maintaining performance.
Mitchell Keegan,Mahdi Abolghasemi
http://arxiv.org/abs/2312.03413v1
Compressor summary: The paper proposes neural network models that use the Lagrangian Dual Framework to approximate Knapsack Problem solutions, improving constraint satisfaction while maintaining strong optimization performance.
Negin Ghamsarian,Sebastian Wolf,Martin Zinkernagel,Klaus Schoeffmann,Raphael Sznitman
http://arxiv.org/abs/2312.03409v1
Compressor summary: DeepPyramid+ is a neural network architecture that tackles various challenges in medical image and surgical video segmentation using Pyramid View Fusion and Deformable Pyramid Reception modules.
Hongyang Li,Yang Li,Huijie Wang,Jia Zeng,Pinlong Cai,Huilin Xu,Dahua Lin,Junchi Yan,Feng Xu,Lu Xiong,Jingdong Wang,Futang Zhu,Kai Yan,Chunjing Xu,Tiancai Wang,Beipeng Mu,Shaoqing Ren,Zhihui Peng,Yu Qiao
http://arxiv.org/abs/2312.03408v1
Compressor summary: The paragraph discusses a comprehensive review of over seventy open-source autonomous driving datasets, assessing their characteristics and the challenges they pose for the evolution of the industry ecosystem.
Chao Chen,Tian Zhou,Yanjun Zhao,Hui Liu,Liang Sun,Rong Jin
http://arxiv.org/abs/2312.03406v1
Compressor summary: SVQ is a sparse vector quantization method that improves spatiotemporal forecasting tasks by balancing details and noise reduction using a two-layer MLP and a randomly fixed or learnable matrix, achieving state-of-the-art results in various fields.
Sangwoong Yoon,Dohyun Kwon,Himchan Hwang,Yung-Kyun Noh,Frank C. Park
http://arxiv.org/abs/2312.03397v1
Compressor summary: GCD trains an energy-based model and a sampler together, generalizing Contrastive Divergence by using a trainable sampler instead of MCMC, and can improve both models' performance.
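Classic Contrastive Divergence, which GCD generalizes, updates the energy parameters from the gap between data samples and model samples. A minimal sketch, assuming a toy Gaussian-family energy and exact sampling in place of the MCMC (or trainable) sampler; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy energy family: E(x; mu) = 0.5 * (x - mu)**2, so dE/dmu = -(x - mu).
def energy_grad_mu(x, mu):
    return -(x - mu)

def cd_update(mu, data, sampler, lr=0.1):
    """One contrastive-divergence step: lower the energy on data and
    raise it on model samples (the `negatives`)."""
    negatives = sampler(mu, len(data))
    grad = energy_grad_mu(data, mu).mean() - energy_grad_mu(negatives, mu).mean()
    return mu - lr * grad

# For this energy the model distribution is N(mu, 1), so we can sample it
# exactly; classic CD would run a short Markov chain here instead, and
# GCD would replace it with a trainable sampler.
sampler = lambda mu, n: mu + rng.standard_normal(n)

data = rng.normal(3.0, 1.0, 2000)   # synthetic "dataset" centered at 3.0
mu = 0.0
for _ in range(200):
    mu = cd_update(mu, data, sampler)
```

After training, `mu` drifts toward the data mean: the data term pulls energy down where observations live, while the negative samples push it up everywhere the model currently puts mass.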
Ivan Rodin,Antonino Furnari,Kyle Min,Subarna Tripathi,Giovanni Maria Farinella
http://arxiv.org/abs/2312.03391v1
Compressor summary: Egocentric Action Scene Graphs (EASGs) are a new way to understand long egocentric videos, using graphs to describe actions, objects, and relationships over time.
Taeyoung Kim,Hongseok Yang
http://arxiv.org/abs/2312.03386v1
Compressor summary: The paper extends infinite-width analysis to the Jacobian of deep neural networks, showing that MLPs and their Jacobians converge to Gaussian processes in the infinite-width limit.
Tharindu Ranasinghe,Marcos Zampieri
http://arxiv.org/abs/2312.03379v1
Compressor summary: The paper introduces pre-trained encoder-decoder models for offensive language detection that outperform existing transformer-based models and achieve state-of-the-art results in multiple languages.
Junfei Shi,Wei Wang,Haiyan Jin,Mengmeng Nie,Shanshan Ji
http://arxiv.org/abs/2312.03378v1
Compressor summary: The paper proposes a new deep learning method for PolSAR image classification that directly uses the complex matrix as input, learns its structure in Riemannian space, and improves performance over existing methods.
Hongyu Huang,Guoji Tian,Chongcheng Chen
http://arxiv.org/abs/2312.03372v1
Compressor summary: The study uses Neural Radiance Fields (NeRF) to reconstruct three-dimensional trees from two-dimensional images, showing its efficiency and adaptability but with lower resolution and accuracy compared to photogrammetric methods.
Arthur Hemmer,Mickaël Coustaty,Nicola Bartolo,Jérôme Brachat,Jean-Marc Ogier
http://arxiv.org/abs/2312.03367v1
Compressor summary: The authors study how to improve probabilistic models for information extraction by combining them with constrained decoding methods, proposing a new method called Lazy-$k$, and showing its benefits over existing approaches.
Hamed Hematian Hemati,Arash Lagzian,Moein Salimi Sartakhti,Hamid Beigy,Ehsaneddin Asgari
http://arxiv.org/abs/2312.03361v1
Compressor summary: The paper introduces Khabarchin, a new dataset for detecting important news in Persian language, and proposes learning-based models to tackle this task.
Kan Hatakeyama-Sato,Yasuhiko Igarashi,Shun Katakami,Yuta Nabae,Teruaki Hayakawa
http://arxiv.org/abs/2312.03360v1
Compressor summary: The paragraph discusses using additional training to embed specialized scientific knowledge into a large language model, addressing challenges such as text scarcity and hyperparameter optimization.
Doriand Petit,Steve Bourgeois,Dumitru Pavel,Vincent Gay-Bellile,Florian Chabot,Loic Barthe
http://arxiv.org/abs/2312.03357v1
Compressor summary: The RING-NeRF architecture uses Residual Implicit Neural Grids to control the level of detail and achieve fast training and state-of-the-art performance in 3D reconstruction and new view synthesis tasks.
Xin Cao,Xinxin Han,Yifan Wang,Mengna Yang,Kang Li
http://arxiv.org/abs/2312.03350v1
Compressor summary: PointMoment is a novel self-supervised representation learning framework for point clouds that uses a high-order mixed moment loss function to reduce feature redundancy and improve downstream tasks such as 3D point cloud classification and segmentation.
Ke Alexander Wang,Emily B. Fox
http://arxiv.org/abs/2312.03344v1
Compressor summary: The paper presents a hybrid variational autoencoder that learns interpretable representations of CGM and meal data for diabetes, grounding its latent space in mechanistic differential-equation inputs that reflect physiological quantities and inferring glucose appearance rate from unreliable meal logs; the unsupervised embeddings separate individuals in proportion to disease severity and yield better clusters than other features.
Amandine Decker,Ellen Breitholtz,Christine Howes,Staffan Larsson
http://arxiv.org/abs/2312.03342v1
Compressor summary: The paper proposes separating and defining genre and topic concepts to improve dialogue system flexibility and reliability.
Zhixin Zhang,Yiyuan Zhang,Xiaohan Ding,Fusheng Jin,Xiangyu Yue
http://arxiv.org/abs/2312.03341v1
Compressor summary: GeMap is a method that learns Euclidean shapes and relations of map instances beyond basic perception, achieving state-of-the-art performance on two datasets.
Xin Cao,Huan Xia,Xinxin Han,Yifan Wang,Kang Li,Linzhi Su
http://arxiv.org/abs/2312.03339v1
Compressor summary: PointJEM is a self-supervised point cloud representation learning method that reduces feature redundancy using joint entropy and performs well in downstream tasks.
Aaron J. Snoswell,Lucinda Nelson,Hao Xue,Flora D. Salim,Nicolas Suzor,Jean Burgess
http://arxiv.org/abs/2312.03330v1
Compressor summary: The paper argues that generic toxicity classifiers are not suitable for measuring misogyny in natural language generation and proposes using a misogyny-specific lexicon instead.
Xiaobo Hu,Youfang Lin,HeHe Fan,Shuo Wang,Zhihao Wu,Kai Lv
http://arxiv.org/abs/2312.03327v1
Compressor summary: The paper proposes a Category Relation Graph (CRG) to learn object layout knowledge and a Temporal-Spatial-Region (TSR) attention architecture to capture object dependencies for visual navigation.
Yuexing Han,Guanxin Wan,Bing Wang
http://arxiv.org/abs/2312.03325v1
Compressor summary: The authors propose Geodesic curve feature augmentation (GCFA), a method that projects image features into a shape space and generates new features along a geodesic curve, improving data preprocessing for deep learning models in small sample environments.
Zhimiao Yu,Tiancheng Lin,Yi Xu
http://arxiv.org/abs/2312.03322v1
Compressor summary: The paper proposes a new pre-training method for few-shot segmentation called Background Clustering Pre-Training, which separates novel classes from the background and uses clustering and base classes to improve the performance.
Saurabh Garg,Amrith Setlur,Zachary Chase Lipton,Sivaraman Balakrishnan,Virginia Smith,Aditi Raghunathan
http://arxiv.org/abs/2312.03318v1
Compressor summary: This paper investigates how combining self-training and contrastive learning techniques improve unsupervised domain adaptation and semi-supervised learning, with varying success depending on the setting.
Wonjun Lee,Gary Geunbae Lee,Yunsu Kim
http://arxiv.org/abs/2312.03312v1
Compressor summary: The authors propose a method to improve speech recognition in low-resource languages by enhancing phoneme recognition and translation models with articulatory characteristics and realistic noise generation.
Xiaoqian Liu,Junge Zhang,Mingyi Zhang,Peipei Yang
http://arxiv.org/abs/2312.03309v1
Compressor summary: The paper proposes a unified evaluation paradigm for continual learning models based on cognitive properties supporting human continual learning, such as adaptability, sensitivity to task variations, and efficiency.
Ilya Tyagin,Ilya Safro
http://arxiv.org/abs/2312.03303v1
Compressor summary: The paper introduces Dyport, a new benchmarking system for evaluating biomedical hypothesis generation systems using curated datasets and dynamic graphs to assess both accuracy and impact.
Yanlong Li,Chamara Madarasingha,Kanchana Thilakarathna
http://arxiv.org/abs/2312.03298v1
Compressor summary: DiffPMAE is a self-supervised learning method for point cloud reconstruction that combines Masked Auto-Encoding and Diffusion Model, outperforming many existing techniques on various tasks.
Xu Yao,Shuang Liang,Songqiao Han,Hailiang Huang
http://arxiv.org/abs/2312.03292v1
Compressor summary: The GNN-MoCE architecture uses a mixture of collaborative experts to predict biochemical properties from molecular graphs, addressing data scarcity and imbalance in the Molecular Property Prediction task by exploiting task commonalities and enhancing expert diversity and influence.
Weitang Liu,Ying Wai Li,Tianle Wang,Yi-Zhuang You,Jingbo Shang
http://arxiv.org/abs/2312.03291v1
Compressor summary: The OmniInput framework evaluates an AI/ML model's quality on all possible inputs by using a self-constructed test set and analyzing its output distribution.
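The OmniInput idea of evaluating a model over all possible inputs is only tractable for tiny input spaces; as a hedged toy sketch (not the paper's actual framework), one can exhaustively enumerate a small binary input space and tally the model's output distribution:

```python
from itertools import product

def toy_model(bits):
    # Stand-in "model": classifies a 4-bit input as 1 if it has
    # at least two 1s, else 0. Purely illustrative.
    return 1 if sum(bits) >= 2 else 0

def output_distribution(model, n_bits):
    # Enumerate every possible input and tally the model's outputs.
    counts = {}
    for bits in product([0, 1], repeat=n_bits):
        y = model(bits)
        counts[y] = counts.get(y, 0) + 1
    return counts

dist = output_distribution(toy_model, 4)
print(dist)  # {0: 5, 1: 11}
```

For realistic models the input space is astronomically large, which is why the paper relies on a self-constructed test set rather than brute-force enumeration.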
Junjie Sheng,Zixiao Huang,Chuyun Shen,Wenhao Li,Yun Hua,Bo Jin,Hongyuan Zha,Xiangfeng Wang
http://arxiv.org/abs/2312.03290v1
Compressor summary: The authors investigate whether language agents can serve as alternatives to PPO agents in sequential decision-making tasks by using the TextGym simulator, introducing scenarios of different difficulty levels, and proposing a novel EXE agent.
Seungju Cho,Hongshin Lee,Changick Kim
http://arxiv.org/abs/2312.03289v1
Compressor summary: The study proposes ARCIL, a method that combines adversarial robustness and incremental learning, and introduces FPD and LAD losses to address the loss of robustness in this setting, achieving significantly better results than existing methods on three datasets.
Nguyen Huu Bao Long
http://arxiv.org/abs/2312.03288v1
Compressor summary: The paper proposes a new method for skeleton-based action recognition using graph convolutional networks, cross-attention modules, and temporal attention transformers that outperforms previous methods on two datasets.
Hongsin Lee,Seungju Cho,Changick Kim
http://arxiv.org/abs/2312.03286v1
Compressor summary: The paper proposes a new technique, IGDM, to improve adversarial robustness by transferring input gradient knowledge from a teacher model to a student model, which enhances the performance of existing adversarial distillation methods without additional data augmentation.
Jimmy Li,Igor Kozlov,Di Wu,Xue Liu,Gregory Dudek
http://arxiv.org/abs/2312.03277v1
Compressor summary: The paper proposes a scalable framework using reinforcement learning and anomaly detection to optimize cellular RAN across many cell sites with varying traffic patterns, efficiently using computational resources.
Keifer Lee,Shubham Gupta,Sunglyoung Kim,Bhargav Makwana,Chao Chen,Chen Feng
http://arxiv.org/abs/2312.03266v1
Compressor summary: SOAR is a method for selecting good views for NeRF using interpretable functions and a learned network, improving speed and quality compared to baselines.
Sina Baharlouei,Shivam Patel,Meisam Razaviyayn
http://arxiv.org/abs/2312.03259v1
Compressor summary: The paper proposes a stochastic optimization framework for fair machine learning that works with small data batches, has convergence guarantees, and performs well on both training and test data distribution shifts.
Hailin Zhang,Zirui Liu,Boxuan Chen,Yikai Zhao,Tong Zhao,Tong Yang,Bin Cui
http://arxiv.org/abs/2312.03256v1
Compressor summary: CAFE is a new compression framework for Deep Learning Recommendation Models that uses HotSketch to capture feature importance and hash embedding for non-hot features, achieving better performance than existing methods.
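CAFE's actual mechanism (HotSketch-driven importance tracking) is specific to the paper, but the general hot/cold embedding split it builds on can be sketched in a few lines; the class below is a hypothetical illustration, with dedicated rows for frequent ("hot") feature ids and a small shared hash-bucket table for everything else:

```python
import random

class HotAwareEmbedding:
    """Toy sketch: dedicated embedding rows for hot feature ids,
    a small shared hash-bucket table for all other (cold) ids."""

    def __init__(self, hot_ids, n_buckets, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One private vector per hot feature id.
        self.hot = {fid: [rng.gauss(0, 0.1) for _ in range(dim)]
                    for fid in hot_ids}
        # A much smaller shared table for cold features.
        self.buckets = [[rng.gauss(0, 0.1) for _ in range(dim)]
                        for _ in range(n_buckets)]

    def lookup(self, fid):
        if fid in self.hot:          # exact row for a hot feature
            return self.hot[fid]
        # Cold features share bucket rows and may collide.
        return self.buckets[hash(fid) % len(self.buckets)]

emb = HotAwareEmbedding(hot_ids={7, 42}, n_buckets=4, dim=8)
v_hot = emb.lookup(42)       # private vector
v_cold = emb.lookup(123456)  # shared bucket vector
```

The memory saving comes from the bucket table being far smaller than one row per feature, at the cost of collisions among cold features.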
Zikun Ye,Reza Yousefi Maragheh,Lalitesh Morishetti,Shanu Vashishtha,Jason Cho,Kaushiki Nag,Sushant Kumar,Kannan Achan
http://arxiv.org/abs/2312.03253v1
Compressor summary: The paper proposes an optimization model and a gradient-based algorithm to improve seller fairness in online marketplaces by balancing recommendation rewards and a fairness metric.
Haowen Wang,Tao Sun,Cong Fan,Jinjie Gu
http://arxiv.org/abs/2312.03248v1
Compressor summary: C-Poly is a novel approach for improving neural networks' knowledge organization, leading to better cross-task generalization using customized skills and shared skills learned with low-rank techniques.
Jiale Yan,Hiroaki Ito,Ángel López García-Arias,Yasuyuki Okoshi,Hikari Otsuka,Kazushi Kawamura,Thiem Van Chu,Masato Motomura
http://arxiv.org/abs/2312.03236v1
Compressor summary: This paper explores subnetworks in graph neural networks (GNNs) using scalar pruning mask methods, discovering untrained recurrent networks with high performance and reducing memory usage by up to 98.7%.
Rafal Kocielnik,Elyssa Y. Wong,Timothy N. Chu,Lydia Lin,De-An Huang,Jiayun Wang,Anima Anandkumar,Andrew J. Hung
http://arxiv.org/abs/2312.03231v1
Compressor summary: This paper develops a machine learning model to classify five types of surgical feedback (Anatomic, Technical, Procedural, Praise, Visual Aid) from text, audio, and video inputs to help improve surgical training.
Aravind Sundaresan,Brian Burns,Indranil Sur,Yi Yao,Xiao Lin,Sujeong Kim
http://arxiv.org/abs/2312.03227v1
Compressor summary: The HMID system, trained with shape, pose, and biometric losses, improves biometric identification performance on raw images of human bodies in various conditions.
Mengke Song,Linfeng Li,Dunquan Wu,Wenfeng Song,Chenglizhao Chen
http://arxiv.org/abs/2312.03226v1
Compressor summary: The paper presents a new method for ranking salient objects by importance order, addressing challenges in existing methods such as ill-defined ground truth, multi-task conflicts, and complex model designs.
Heng Huang,Xin Jin,Yaqi Liu,Hao Lou,Chaoen Xiao,Shuai Cui,Xinning Li,Dongqing Zou
http://arxiv.org/abs/2312.03222v1
Compressor summary: The paper proposes a new model, F2S, that predicts image aesthetic attributes using feature extractors instead of labels, enabling the learning of meaningful attribute scores from overall scores.
Yuanshi Liu,Hanzhen Zhao,Yang Xu,Pengyun Yue,Cong Fang
http://arxiv.org/abs/2312.03218v1
Compressor summary: This paper proposes adaptive gradient-based algorithms that achieve improved complexity bounds for machine learning problems by refining the description of degenerate conditions with two factors, addressing limitations of existing optimization modeling and analysis.
Eric H. Jiang,Andrew Lizarraga
http://arxiv.org/abs/2312.03216v1
Compressor summary: The paper presents a new algorithm, SDSRA, that improves efficiency and policy quality in reinforcement learning tasks by combining skill-based strategies with a robust Actor-Critic framework.
Polina Turishcheva,Jason Ramapuram,Sinead Williamson,Dan Busbridge,Eeshan Dhekane,Russ Webb
http://arxiv.org/abs/2312.03213v1
Compressor summary: BYOV combines self-supervised learning and Bayesian methods to estimate uncertainty in model predictions, outperforming a deterministic baseline and providing preliminary evidence of its usefulness.
Shengbo Wang,Ke Li
http://arxiv.org/abs/2312.03212v1
Compressor summary: The paper proposes an efficient and provable method for solving expensive partially observable constrained optimization problems using improved acquisition functions and a surrogate model that better represents feasible regions.
Felix Wimbauer,Bichen Wu,Edgar Schoenfeld,Xiaoliang Dai,Ji Hou,Zijian He,Artsiom Sanakoyeu,Peizhao Zhang,Sam Tsai,Jonas Kohler,Christian Rupprecht,Daniel Cremers,Peter Vajda,Jialiang Wang
http://arxiv.org/abs/2312.03209v1
Compressor summary: The authors propose block caching, a technique that reuses outputs from previous layer blocks in diffusion models to speed up image generation while maintaining high visual quality.
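The paper's caching policy for diffusion blocks is more sophisticated than a fixed schedule; as a hedged toy sketch of the underlying idea only, a block's output can be reused across denoising steps and recomputed at a fixed interval:

```python
class CachedBlock:
    """Toy sketch of block caching: reuse a block's output across
    diffusion steps, recomputing only every `refresh_every` steps."""

    def __init__(self, block_fn, refresh_every):
        self.block_fn = block_fn
        self.refresh_every = refresh_every
        self.cached = None
        self.calls = 0  # counts actual recomputations

    def __call__(self, x, step):
        if self.cached is None or step % self.refresh_every == 0:
            self.calls += 1
            self.cached = self.block_fn(x)
        return self.cached

# 12 denoising steps, but the block runs only 3 times.
block = CachedBlock(lambda x: [v * 2 for v in x], refresh_every=4)
outputs = [block([1.0, 2.0], step) for step in range(12)]
print(block.calls)  # 3
```

The speedup comes from skipping redundant block evaluations between steps whose activations change little; the trade-off is a potential quality drop if outputs are reused for too long.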
Patrick Beukema,Favyen Bastani,Piper Wolters,Henry Herzog,Joe Ferdinando
http://arxiv.org/abs/2312.03207v1
Compressor summary: The paper introduces three specialized computer vision models for satellite data to monitor global IUU fishing and presents best practices for real-time maritime conservation using the Skylight platform.
Shijie Zhou,Haoran Chang,Sicheng Jiang,Zhiwen Fan,Zehao Zhu,Dejia Xu,Pradyumna Chari,Suya You,Zhangyang Wang,Achuta Kadambi
http://arxiv.org/abs/2312.03203v1
Compressor summary: The paper presents a method to extend NeRF's functionality for semantic tasks using 3D Gaussian Splatting and 2D foundation models, while addressing speed and quality issues.
Seungyeon Lee,Thai-Hoang Pham,Zhao Cheng,Ping Zhang
http://arxiv.org/abs/2312.03196v1
Compressor summary: The study introduces a neural network model called DREAM that improves automatic sleep staging by learning domain generalized representations from diverse physiological signals and modeling sleep dynamics, outperforming existing methods on three datasets.
Alex Kim,Sangwon Yoon
http://arxiv.org/abs/2312.03195v1
Compressor summary: The authors propose a double-channel model for classifying rumors on social media as true, false, or unverifiable based on their informativeness and use it to achieve state-of-the-art results on a dataset.
Alex Kim,Sangwon Yoon
http://arxiv.org/abs/2312.03194v1
Compressor summary: The study uses BERT to analyze company disclosures and improve bankruptcy prediction by enhancing the input dataset quality, achieving high accuracy rates.