arxiv compressed, 2023-12-07

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-07 generated by the compressor, my personal LLM-based project.


Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Xinshun Wang,Zhongbin Fang,Xia Li,Xiangtai Li,Chen Chen,Mengyuan Liu

http://arxiv.org/abs/2312.03703v1

Compressor summary: Skeleton-in-Context (SiC) is a framework for in-context learning of skeleton sequence modeling that can handle multiple tasks simultaneously, adapt to new tasks, and achieve state-of-the-art performance.


Self-conditioned Image Generation via Generating Representations

Tianhong Li,Dina Katabi,Kaiming He

http://arxiv.org/abs/2312.03701v1

Compressor summary: RCG is a new image generation method that uses self-supervised representation distribution and achieves high quality results without human annotations.


OneLLM: One Framework to Align All Modalities with Language

Jiaming Han,Kaixiong Gong,Yiyuan Zhang,Jiaqi Wang,Kaipeng Zhang,Dahua Lin,Yu Qiao,Peng Gao,Xiangyu Yue

http://arxiv.org/abs/2312.03700v1

Compressor summary: The paper introduces OneLLM, a unified framework for aligning eight modalities to language, and presents a multimodal instruction dataset for evaluating its performance on various tasks.


PROMISE: A Framework for Model-Driven Stateful Prompt Orchestration

Wenyuan Wu,Jasmin Heierli,Max Meisterhans,Adrian Moser,Andri Färber,Mateusz Dolata,Elena Gavagnin,Alexandre de Spindler,Gerhard Schwabe

http://arxiv.org/abs/2312.03699v1

Compressor summary: PROMISE is a framework that helps create and control complex language-based interactions with information systems, improving their effectiveness and efficiency.


Intrinsic Harmonization for Illumination-Aware Compositing

Chris Careaga,Yağız Aksoy,S. Mahdi H. Miangoleh

http://arxiv.org/abs/2312.03698v1

Compressor summary: The authors propose a self-supervised method for image harmonization that adjusts shading and albedo to match lighting between foreground and background in composited images.


On the Role of Edge Dependency in Graph Generative Models

Sudhanshu Chanpuriya,Cameron Musco,Konstantinos Sotiropoulos,Charalampos Tsourakakis

http://arxiv.org/abs/2312.03691v1

Compressor summary: The authors propose a new evaluation framework for graph generative models that considers model-generated graph overlap, categorize them into three complexity levels, derive theoretical bounds on their output quality, introduce new models based on dense subgraph discovery, and show competitive results with popular models.


Evaluating and Mitigating Discrimination in Language Model Decisions

Alex Tamkin,Amanda Askell,Liane Lovitt,Esin Durmus,Nicholas Joseph,Shauna Kravec,Karina Nguyen,Jared Kaplan,Deep Ganguli

http://arxiv.org/abs/2312.03689v1

Compressor summary: The authors propose a method for evaluating the potential discriminatory impact of language models in various use cases by generating diverse prompts with different demographic information, and suggest ways to reduce discrimination through prompt engineering.
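The evaluation recipe described in the summary (the same decision prompt with systematically varied demographic attributes) can be sketched generically; the template and attribute values below are hypothetical illustrations, not the paper's actual prompts:

```python
from itertools import product

# Hedged sketch: generate one decision prompt per combination of demographic
# attributes, so model decisions can be compared across otherwise identical
# variants. Template and attribute lists are invented for illustration.

TEMPLATE = ("The applicant is a {age}-year-old {gender} {ethnicity} person "
            "applying for a small business loan. Should the loan be approved?")

def demographic_variants(ages, genders, ethnicities):
    # Yield (attributes, filled-in prompt) for every combination.
    for age, gender, eth in product(ages, genders, ethnicities):
        attrs = {"age": age, "gender": gender, "ethnicity": eth}
        yield attrs, TEMPLATE.format(**attrs)

variants = list(demographic_variants([30, 60], ["male", "female"],
                                     ["White", "Black"]))
print(len(variants))  # 2 * 2 * 2 = 8 prompt variants to probe for disparities
```

Disparities then show up as systematic differences in the model's decisions across variants that differ only in a protected attribute.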


What Planning Problems Can A Relational Neural Network Solve?

Jiayuan Mao,Tomás Lozano-Pérez,Joshua B. Tenenbaum,Leslie Pack Kaelbling

http://arxiv.org/abs/2312.03682v1

Compressor summary: The paper analyzes how relational neural networks, such as graph neural networks and transformers, can be used to learn goal-conditioned policies for planning problems, and identifies three classes of planning problems based on circuit width and depth.


Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching

Lennart Bastian,Yizheng Xie,Nassir Navab,Zorah Lähner

http://arxiv.org/abs/2312.03678v1

Compressor summary: The proposed method combines different basis functions to create a hybrid spectral space for shape correspondence, improving performance on non-isometric deformations and noisy data.


GeoShapley: A Game Theory Approach to Measuring Spatial Effects in Machine Learning Models

Ziqi Li

http://arxiv.org/abs/2312.03675v1

Compressor summary: GeoShapley is a game theory-based approach for measuring the importance of location and its synergies with other features in various machine learning models, and it can be applied to both statistical and black-box models.


WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on

Xujie Zhang,Xiu Li,Michael Kampffmeyer,Xin Dong,Zhenyu Xie,Feida Zhu,Haoye Dong,Xiaodan Liang

http://arxiv.org/abs/2312.03667v1

Compressor summary: WarpDiffusion is a novel method that improves Virtual Try-On by combining warping-based and diffusion-based techniques with attention mechanisms to enhance realism and retain garment details.


Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia

Alexander Sasha Vezhnevets,John P. Agapiou,Avia Aharon,Ron Ziv,Jayd Matyas,Edgar A. Duéñez-Guzmán,William A. Cunningham,Simon Osindero,Danny Karmon,Joel Z. Leibo

http://arxiv.org/abs/2312.03664v1

Compressor summary: Concordia is a library that facilitates constructing and working with Generative Agent-Based Models, which use Large Language Models to apply common sense, control technologies, and communicate in simulations of physical or digital environments.


Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie,Renyuan Peng,Chunwei Wang,Xinyue Cai,Jianhua Han,Hang Xu,Li Zhang

http://arxiv.org/abs/2312.03661v1

Compressor summary: Reason2Drive is a new dataset for studying interpretable reasoning in complex driving environments using large vision-language models.


Interpretability Illusions in the Generalization of Simplified Models

Dan Friedman,Andrew Lampinen,Lucas Dixon,Danqi Chen,Asma Ghandeharioun

http://arxiv.org/abs/2312.03656v1

Compressor summary: The simplified representations of deep learning models may not accurately capture their behavior outside the training data and may lead to wrong conclusions about their generalization abilities.


MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

Ziyan Wang,Yali Du,Yudi Zhang,Meng Fang,Biwei Huang

http://arxiv.org/abs/2312.03644v1

Compressor summary: MACCA is a method to accurately assign credit to individual agents in offline multi-agent reinforcement learning by modeling the causal relationships between rewards using a Dynamic Bayesian Network, which works well in both discrete and continuous action settings.


Transformer-Powered Surrogates Close the ICF Simulation-Experiment Gap with Extremely Limited Data

Matthew L. Olson,Shusen Liu,Jayaraman J. Thiagarajan,Bogdan Kustowski,Weng-Keen Wong,Rushil Anirudh

http://arxiv.org/abs/2312.03642v1

Compressor summary: The paper proposes a new transformer-based method that combines graph hyper-parameter optimization with multi-modal data to improve prediction accuracy in simulation and real-world scenarios.


MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Zhouxia Wang,Ziyang Yuan,Xintao Wang,Tianshui Chen,Menghan Xia,Ping Luo,Ying Shan

http://arxiv.org/abs/2312.03641v1

Compressor summary: The paper introduces MotionCtrl, a novel motion controller for video generation that independently controls camera and object motion, enabling more fine-grained control and diverse combinations of motions.


Not All Large Language Models (LLMs) Succumb to the "Reversal Curse": A Comparative Study of Deductive Logical Reasoning in BERT and GPT Models

Jingye Yang,Da Wu,Kai Wang

http://arxiv.org/abs/2312.03633v1

Compressor summary: The study finds that the bidirectional model BERT can avoid the "reversal curse," but both encoder and decoder models struggle with deductive logical reasoning involving three sets.


MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations

Assaf Ben-Kish,Moran Yanuka,Morris Alper,Raja Giryes,Hadar Averbuch-Elor

http://arxiv.org/abs/2312.03631v1

Compressor summary: MOCHa uses reinforcement learning to reduce hallucinations and improve caption quality in image captioning without strong supervision, and introduces OpenCHAIR, a new benchmark for evaluating open-vocabulary hallucinations.


Boosting Segment Anything Model Towards Open-Vocabulary Learning

Xumeng Han,Longhui Wei,Xuehui Yu,Zhiyang Dou,Xin He,Kuiran Wang,Zhenjun Han,Qi Tian

http://arxiv.org/abs/2312.03628v1

Compressor summary: Sambor is a new model that improves SAM by adding the ability to detect objects based on human inputs and category names, using a novel module and an open-set region proposal network.


TokenCompose: Grounding Diffusion with Token-level Supervision

Zirui Wang,Zhizhou Sha,Zheng Ding,Yilin Wang,Zhuowen Tu

http://arxiv.org/abs/2312.03626v1

Compressor summary: TokenCompose is a Latent Diffusion Model that improves text-to-image generation by introducing token-wise consistency terms between image content and object segmentation maps during finetuning, achieving better multi-category instance composition and photorealism.


Physical Symbolic Optimization

Wassim Tenachi,Rodrigo Ibata,Foivos I. Diakogiannis

http://arxiv.org/abs/2312.03612v1

Compressor summary: The paper introduces a method that uses reinforcement learning to generate equations with physical units, achieving better results than other methods in noisy conditions.
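The unit constraint at the heart of this approach can be illustrated with a toy dimensional-analysis checker (an assumption-laden background sketch, not the paper's reinforcement-learning setup):

```python
# Toy unit-consistency checker, the kind of constraint that rules out
# physically meaningless candidate equations in symbolic regression.
# A unit is a tuple of exponents over the base dimensions (m, s, kg);
# e.g. velocity = m/s = (1, -1, 0). Illustrative only.

def mul(u, v):
    # Units multiply by adding exponents: m * s^-1 -> (1, -1, 0).
    return tuple(a + b for a, b in zip(u, v))

def add(u, v):
    # Addition is only legal between identical units.
    if u != v:
        raise ValueError(f"unit mismatch: {u} vs {v}")
    return u

METER, SECOND = (1, 0, 0), (0, 1, 0)
velocity = mul(METER, tuple(-e for e in SECOND))  # m * s^-1
distance = mul(velocity, SECOND)                  # (m/s) * s = m
print(distance)  # (1, 0, 0)
```

A generator constrained this way can discard, during search, any expression tree that tries to add a length to a time.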


DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang,Yukun Huang,Xiaoyang Wu,Yuan-Chen Guo,Song-Hai Zhang,Hengshuang Zhao,Tong He,Xihui Liu

http://arxiv.org/abs/2312.03611v1

Compressor summary: The paper introduces DreamComposer, a framework that improves existing view-aware diffusion models by using multiple views of an object to generate high-quality novel views for 3D object reconstruction and other tasks.


Automated Multimodal Data Annotation via Calibration With Indoor Positioning System

Ryan Rubel,Andrew Dudash,Mohammad Goli,James O'Hara,Karl Wunderlich

http://arxiv.org/abs/2312.03608v1

Compressor summary: The authors propose a method to automatically label LiDAR and camera data for object detection in indoor settings using an IPS, which is much faster than manual annotation.


DiffusionSat: A Generative Foundation Model for Satellite Imagery

Samar Khanna,Patrick Liu,Linqi Zhou,Chenlin Meng,Robin Rombach,Marshall Burke,David Lobell,Stefano Ermon

http://arxiv.org/abs/2312.03606v1

Compressor summary: The paper introduces DiffusionSat, a large generative model for satellite images that uses metadata and diffusion techniques to generate realistic samples and solve various tasks.


MMM: Generative Masked Motion Model

Ekkasit Pinyoanuntapong,Pu Wang,Minwoo Lee,Chen Chen

http://arxiv.org/abs/2312.03596v1

Compressor summary: MMM is a novel motion generation method that uses a tokenizer and a transformer to capture dependencies between motion and text tokens, allowing for high-fidelity, high-speed, and editable motion generation.


A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

Junhao Zhuang,Yanhong Zeng,Wenran Liu,Chun Yuan,Kai Chen

http://arxiv.org/abs/2312.03594v1

Compressor summary: PowerPaint is a model that excels at context-aware image inpainting and text-guided object inpainting by using learnable task prompts and tailored fine-tuning strategies.


Language-Informed Visual Concept Learning

Sharon Lee,Yunzhi Zhang,Shangzhe Wu,Jiajun Wu

http://arxiv.org/abs/2312.03587v1

Compressor summary: The paper proposes learning a language-informed visual concept representation from large pre-trained vision-language models and using it to generate images with novel compositions of visual concepts.


Foundation Model Assisted Weakly Supervised Semantic Segmentation

Xiaobo Yang,Xiaojin Gong

http://arxiv.org/abs/2312.03585v1

Compressor summary: The paper proposes a framework using pre-trained models CLIP and SAM to generate segmentation seeds for weakly supervised semantic segmentation, achieving state-of-the-art performance on PASCAL VOC 2012 and competitive results on MS COCO 2014.


Context Diffusion: In-Context Aware Image Generation

Ivona Najdenkoska,Animesh Sinha,Abhimanyu Dubey,Dhruv Mahajan,Vignesh Ramanathan,Filip Radenovic

http://arxiv.org/abs/2312.03584v1

Compressor summary: Context Diffusion is a framework for generating images from contextual examples and text prompts, improving image quality and adaptability.


Improving Bias Mitigation through Bias Experts in Natural Language Understanding

Eojin Jeon,Mingyu Lee,Juhyeong Park,Yeachan Kim,Wing-Lam Mok,SangKeun Lee

http://arxiv.org/abs/2312.03577v1

Compressor summary: The paper proposes a debiasing framework that uses binary classifiers called bias experts to improve bias identification and mitigate its negative effects on model performance.


DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization

Risab Biswas,Swalpa Kumar Roy,Ning Wang,Umapada Pal,Guang-Bin Huang

http://arxiv.org/abs/2312.03568v1

Compressor summary: The DocBinFormer is a new transformer-based architecture for effective document image binarization that captures global and local features using two-level vision transformers, outperforming existing methods on several benchmarks.


XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering

Joel Stremmel,Ardavan Saeedi,Hamid Hassanzadeh,Sanjit Batra,Jeffrey Hertzberg,Jaime Murillo,Eran Halperin

http://arxiv.org/abs/2312.03567v1

Compressor summary: XAIQA is a novel method that generates synthetic QA pairs from electronic health records data for extractive QA systems, outperforming existing approaches in semantic matches and clinical abbreviations, and improving GPT-4's performance on difficult questions.


Enhancing Kinship Verification through Multiscale Retinex and Combined Deep-Shallow features

El Ouanas Belabbaci,Mohammed Khammari,Ammar Chouchane,Mohcene Bessaoudi,Abdelmalik Ouamane,Yassine Himeur,Shadi Atalla,Wathiq Mansoor

http://arxiv.org/abs/2312.03562v1

Compressor summary: The authors propose a new method for verifying family relationships from facial images using Multiscale Retinex, deep and shallow texture descriptors, and Logistic Regression, achieving promising results on three kinship datasets.


When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology

Wenhui Wang,Shuming Ma,Hanwen Xu,Naoto Usuyama,Jiayu Ding,Hoifung Poon,Furu Wei

http://arxiv.org/abs/2312.03558v1

Compressor summary: LongViT is a vision Transformer for gigapixel images that splits an image into millions of patches and models them with LongNet within practical computation and memory constraints, enabling better cancer diagnosis and prognosis in computational pathology and outperforming previous methods.


Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention

Jianjin Xu,Saman Motamed,Praneetha Vaddamanu,Chen Henry Wu,Christian Haene,Jean-Charles Bazin,Fernando de la Torre

http://arxiv.org/abs/2312.03556v1

Compressor summary: The paper proposes a method called Parallel Visual Attention (PVA) that uses attention modules and an identity encoder to improve face inpainting results, preserving identity and semantic attributes, and reducing computational complexity compared to existing techniques.


Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment

Fei Yang,Shuang Peng,Ning Sun,Fangyu Wang,Ke Tan,Fu Wu,Jiezhong Qiu,Aimin Pan

http://arxiv.org/abs/2312.03549v1

Compressor summary: Holmes is a novel LLM training framework for heterogeneous NIC environments that uses data and model parallelism strategies, intelligent tasklet scheduling, and pipeline parallel techniques to achieve high training efficiency.


Texture-Semantic Collaboration Network for ORSI Salient Object Detection

Gongyang Li,Zhen Bai,Zhi Liu

http://arxiv.org/abs/2312.03548v1

Compressor summary: The Texture-Semantic Collaboration Network (TSCNet) is a novel approach for salient object detection in optical remote sensing images that leverages both texture and semantic cues to address the challenges of multiple, small, low-illumination, and irregularly shaped objects.


GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models

Haicheng Liao,Huanming Shen,Zhenning Li,Chengyue Wang,Guofa Li,Yiming Bie,Chengzhong Xu

http://arxiv.org/abs/2312.03543v1

Compressor summary: The paper presents a CAVG model that uses multiple encoders and LLMs to improve visual grounding in autonomous vehicles, achieving high accuracy and efficiency in various scenarios.


FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation

Olivia Markham,Yuhao Chen,Chi-en Amy Tai,Alexander Wong

http://arxiv.org/abs/2312.03540v1

Compressor summary: FoodFusion is a Latent Diffusion model that generates realistic and diverse food images from textual descriptions using a large curated dataset and data cleaning methods.


Low-shot Object Learning with Mutual Exclusivity Bias

Anh Thai,Ahmad Humayun,Stefan Stojanov,Zixuan Huang,Bikram Boote,James M. Rehg

http://arxiv.org/abs/2312.03533v1

Compressor summary: The paper proposes LSME, a new object learning task based on mutual exclusivity bias, and presents a dataset, baselines, and a top-performing method for it.


Personalized Pose Forecasting

Maria Priisalu,Ted Kronvall,Cristian Sminchisescu

http://arxiv.org/abs/2312.03528v1

Compressor summary: The paper proposes a new way to adapt human motion prediction models to individual movement patterns, which is important for systems like delivery robots that interact with the same person over time.


On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

Peng Sun,Bei Shi,Daiwei Yu,Tao Lin

http://arxiv.org/abs/2312.03526v1

Compressor summary: The authors propose RDED, a new data distillation method that addresses the challenges of large-scale and high-resolution datasets by focusing on realism, diversity, and efficiency.


Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling

Talia Tseriotou,Ryan Sze-Yin Chan,Adam Tsakalidis,Iman Munire Bilal,Elena Kochkina,Terry Lyons,Maria Liakata

http://arxiv.org/abs/2312.03523v1

Compressor summary: Sig-Networks is a new open-source toolkit of Signature-based Neural Network models that performs well on temporal NLP tasks such as counselling conversations, rumour stance-switch detection, and mood-change detection in social media threads.


Active Wildfires Detection and Dynamic Escape Routes Planning for Humans through Information Fusion between Drones and Satellites

Chang Liu,Tamas Sziranyi

http://arxiv.org/abs/2312.03519v1

Compressor summary: The paper proposes using UAV vision and satellite image analysis for detecting wildfires, extracting road networks, and planning dynamic escape routes for people in distress during wilderness rescues.


FRDiff: Feature Reuse for Exquisite Zero-shot Acceleration of Diffusion Models

Junhyuk So,Jungwon Lee,Eunhyeok Park

http://arxiv.org/abs/2312.03517v1

Compressor summary: The paper introduces FRDiff, a technique that uses feature reuse and reduced score function evaluations to speed up diffusion models without compromising quality.


Kandinsky 3.0 Technical Report

Vladimir Arkhipkin,Andrei Filatov,Viacheslav Vasilev,Anastasia Maltseva,Said Azizov,Igor Pavlov,Julia Agafonova,Andrey Kuznetsov,Denis Dimitrov

http://arxiv.org/abs/2312.03511v1

Compressor summary: The paper introduces Kandinsky 3.0, an improved text-to-image generation model with a larger backbone, encoder, and no diffusion mapping, which enhances quality and domain adaptability.


Towards Sobolev Training

Neil Kichler,Sher Afghan,Uwe Naumann

http://arxiv.org/abs/2312.03510v1

Compressor summary: The paper proposes a new method to create accurate and efficient surrogate models for complex phenomena by using sensitivity information during learning and pruning, which can be applied beyond quantitative finance.
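The core idea of Sobolev training, fitting a surrogate to target values and their derivatives (sensitivities) jointly, can be sketched with a toy loss; the data below is illustrative and this is not the paper's pruning-aware method:

```python
import math

# Toy Sobolev-style loss: penalize errors in both the function values and
# the derivatives. Matching sensitivities typically yields surrogates that
# generalize better from few samples. Illustrative sketch only.

def sobolev_loss(pred, pred_grad, target, target_grad, alpha=1.0):
    n = len(pred)
    value_term = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    grad_term = sum((g - t) ** 2 for g, t in zip(pred_grad, target_grad)) / n
    return value_term + alpha * grad_term

xs = [i / 49 for i in range(50)]
target = [math.sin(x) for x in xs]        # values to match
target_grad = [math.cos(x) for x in xs]   # sensitivities to match
pred, pred_grad = xs, [1.0] * len(xs)     # crude surrogate f(x) = x
print(sobolev_loss(pred, pred_grad, target, target_grad))
```

In practice the derivative targets come from algorithmic differentiation of the expensive model being replaced, and `alpha` trades off value accuracy against sensitivity accuracy.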


Gravitational cell detection and tracking in fluorescence microscopy data

Nikomidisz Eftimiu,Michal Kozubek

http://arxiv.org/abs/2312.03509v1

Compressor summary: The paper introduces a new computer vision technique using gravitational force fields for detecting, segmenting, and tracking cells in fluorescence microscopy images, which can be faster and more explainable than machine learning methods.


Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation

Haojie Zhang,Yongyi Su,Xun Xu,Kui Jia

http://arxiv.org/abs/2312.03502v1

Compressor summary: The paper proposes a new self-training strategy that improves the robustness and efficiency of the image segmentation model SAM under various distribution shifts, outperforming existing methods.


Speculative Exploration on the Concept of Artificial Agents Conducting Autonomous Research

Shiro Takagi

http://arxiv.org/abs/2312.03497v1

Compressor summary: The paper explores the concept of artificial agents capable of conducting research, discussing their core components and challenges, and suggesting prototyping as a first step to overcome them.


Learning From Scenarios for Stochastic Repairable Scheduling

Kim van den Houten,David M. J. Tax,Esteban Freydell,Mathijs de Weerdt

http://arxiv.org/abs/2312.03492v1

Compressor summary: Decision-focused learning adapts to stochastic scheduling problems with uncertain processing times by using historical realizations and outperforms existing methods.


Exploring Answer Information Methods for Question Generation with Transformers

Talha Chafekar,Aafiya Hussain,Grishma Sharma,Deepak Sharma

http://arxiv.org/abs/2312.03483v1

Compressor summary: The authors experiment with different methods of incorporating target answers into transformer-based question generation and find that answer prompting without additional modes performs best.


AMR Parsing is Far from Solved: GrAPES, the Granular AMR Parsing Evaluation Suite

Jonas Groschwitz,Shay B. Cohen,Lucia Donatelli,Meaghan Fowlie

http://arxiv.org/abs/2312.03480v1

Compressor summary: GrAPES is a challenge set that tests AMR parsers on various aspects of sentence meaning, revealing their strengths and weaknesses.


Molecule Joint Auto-Encoding: Trajectory Pretraining with 2D and 3D Diffusion

Weitao Du,Jiujiu Chen,Xuecang Zhang,Zhiming Ma,Shengchao Liu

http://arxiv.org/abs/2312.03475v1

Compressor summary: The paper introduces MoleculeJAE, a self-supervised method that learns the geometry and topology of molecules, improving drug discovery through better geometrical representations.


Search Strategies for Self-driving Laboratories with Pending Experiments

Hao Wen,Jakob Zeitler,Connor Rupnow

http://arxiv.org/abs/2312.03466v1

Compressor summary: The paper studies how to optimize Bayesian search strategies for self-driving laboratories with asynchronous parallel experiments and delayed feedback.


Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference

Kai Li,Yi Luo

http://arxiv.org/abs/2312.03464v1

Compressor summary: The paper proposes a new way to train a large neural network and extract smaller subnetworks from it during inference based on size or complexity constraints, which improves performance and reduces training time compared to training separate subnetworks from scratch.


DBCopilot: Scaling Natural Language Querying to Massive Databases

Tianshu Wang,Hongyu Lin,Xianpei Han,Le Sun,Xiaoyang Chen,Hao Wang,Zhenyu Zeng

http://arxiv.org/abs/2312.03463v1

Compressor summary: DBCopilot is a framework that simplifies database interactions by routing natural language questions through massive databases using a compact neural network router and leveraging large language models for SQL generation.


HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Yuheng Jiang,Zhehao Shen,Penghao Wang,Zhuo Su,Yu Hong,Yingliang Zhang,Jingyi Yu,Lan Xu

http://arxiv.org/abs/2312.03461v1

Compressor summary: HiFi4G is a technique that uses 3D Gaussians to render realistic human performance from dense footage, enabling efficient compression and non-rigid tracking.


F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis

Sitong Su,Jianzhi Liu,Lianli Gao,Jingkuan Song

http://arxiv.org/abs/2312.03459v1

Compressor summary: The authors propose F3-Pruning, a training-free and generalized pruning strategy that speeds up inference of large text-to-video (T2V) models without losing quality.


Think from Words(TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level Classification

Chengguang Gan,Qinghao Zhang,Tatsunori Mori

http://arxiv.org/abs/2312.03458v1

Compressor summary: The study introduces "Think from Words" (TFW) and "TFW with Extra word-level information" (TFW Extra), two methods that aim to improve Large Language Models' (LLMs) text comprehension by starting at the word level and using additional word-level data, and evaluates their effectiveness on six Japanese datasets.


Data-driven Crop Growth Simulation on Time-varying Generated Images using Multi-conditional Generative Adversarial Networks

Lukas Drees,Dereje T. Demie,Madhuri R. Paul,Johannes Leonhardt,Sabine J. Seidel,Thomas F. Döring,Ribana Roscher

http://arxiv.org/abs/2312.03443v1

Compressor summary: The paper presents a two-stage framework for realistic image prediction and plant phenotyping using conditional Wasserstein generative adversarial networks, which can integrate multiple growth-influencing conditions and help precision agriculture by revealing spatial crop development over time.


High-Quality Facial Geometry and Appearance Capture at Home

Yuxuan Han,Junfeng Lyu,Feng Xu

http://arxiv.org/abs/2312.03442v1

Compressor summary: This paper presents a new, easy-to-use method for capturing high-quality 3D face scans with skin, hair, eyes, and mouth interior using a single smartphone flashlight sequence in a dim room.


UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity

Jialong Zuo,Hanyu Zhou,Ying Nie,Feng Zhang,Tianyu Guo,Nong Sang,Yunhe Wang,Changxin Gao

http://arxiv.org/abs/2312.03441v1

Compressor summary: The paper introduces UFineBench, a new benchmark for text-based person retrieval with ultra-fine granularity, and presents a new dataset (UFine6926), an evaluation paradigm (UFine3C), and an efficient algorithm (CFAM) to address the problem of coarse-grained annotations.


Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

Youtian Lin,Zuozhuo Dai,Siyu Zhu,Yao Yao

http://arxiv.org/abs/2312.03431v1

Compressor summary: Gaussian-Flow is a fast point-based approach for dynamic scene reconstruction and real-time rendering from videos using a novel Dual-Domain Deformation Model.


ShareCMP: Polarization-Aware RGB-P Semantic Segmentation

Zhuoyan Liu,Bo Wang,Lizhi Wang,Chenyu Mao,Ye Li

http://arxiv.org/abs/2312.03430v1

Compressor summary: The ShareCMP framework improves RGB-Polarization semantic segmentation for underwater scenarios with less parameters and better performance.


Artist-Friendly Relightable and Animatable Neural Heads

Yingyan Xu,Prashanth Chandran,Sebastian Weiss,Markus Gross,Gaspard Zoss,Derek Bradley

http://arxiv.org/abs/2312.03420v1

Compressor summary: The paper presents a new method for creating realistic, animatable digital heads that can be relit in any environment and perform various expressions.


Run LoRA Run: Faster and Lighter LoRA Implementations

Daria Cherniuk,Aleksandr Mikhalev,Ivan Oseledets

http://arxiv.org/abs/2312.03415v1

Compressor summary: LoRA is a technique that speeds up neural network training by using low-rank adapters, and the RunLoRA framework optimizes this technique for efficiency.


Compressed Context Memory For Online Language Model Interaction

Jang-Hyun Kim,Junyoung Yeom,Sangdoo Yun,Hyun Oh Song

http://arxiv.org/abs/2312.03414v1

Compressor summary: The paper introduces a method to compress and store context for Transformer language models in online scenarios, reducing memory and computation while maintaining performance.


Approximating Solutions to the Knapsack Problem using the Lagrangian Dual Framework

Mitchell Keegan,Mahdi Abolghasemi

http://arxiv.org/abs/2312.03413v1

Compressor summary: The paper proposes neural network models that use the Lagrangian Dual Framework to approximate Knapsack Problem solutions, improving constraint satisfaction while maintaining strong optimization performance.
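As background on the Lagrangian Dual Framework (this is the classical relaxation, not the paper's neural-network models): relaxing the knapsack capacity constraint with a multiplier gives a bound that decomposes per item and can be minimized over the multiplier:

```python
# Classical Lagrangian relaxation of the 0/1 knapsack (background sketch).
# Relaxing the capacity constraint with multiplier lam >= 0 gives
#   L(lam) = sum_i max(v_i - lam * w_i, 0) + lam * C,
# an upper bound on the optimum: for fixed lam, each item is taken iff its
# adjusted value v_i - lam * w_i is positive. Minimizing over lam tightens it.

def lagrangian_bound(values, weights, capacity, lam):
    return sum(max(v - lam * w, 0.0) for v, w in zip(values, weights)) \
        + lam * capacity

def dual_bound(values, weights, capacity, grid=1000):
    # L(lam) is convex in lam, so a simple grid search approximates
    # the dual optimum min_{lam >= 0} L(lam).
    hi = max(v / w for v, w in zip(values, weights))  # beyond this, no item helps
    return min(lagrangian_bound(values, weights, capacity, i * hi / grid)
               for i in range(grid + 1))

values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
print(dual_bound(values, weights, capacity))  # upper bound on the optimum (220)
```

The paper's contribution, as summarized, is to bake this dual structure into a neural network's training signal so that predicted solutions respect the capacity constraint more often.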


DeepPyramid+: Medical Image Segmentation using Pyramid View Fusion and Deformable Pyramid Reception

Negin Ghamsarian,Sebastian Wolf,Martin Zinkernagel,Klaus Schoeffmann,Raphael Sznitman

http://arxiv.org/abs/2312.03409v1

Compressor summary: DeepPyramid+ is a neural network architecture that tackles various challenges in medical image and surgical video segmentation using Pyramid View Fusion and Deformable Pyramid Reception modules.


Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

Hongyang Li,Yang Li,Huijie Wang,Jia Zeng,Pinlong Cai,Huilin Xu,Dahua Lin,Junchi Yan,Feng Xu,Lu Xiong,Jingdong Wang,Futang Zhu,Kai Yan,Chunjing Xu,Tiancai Wang,Beipeng Mu,Shaoqing Ren,Zhihui Peng,Yu Qiao

http://arxiv.org/abs/2312.03408v1

Compressor summary: The paper comprehensively reviews over seventy open-source autonomous driving datasets, assessing their characteristics and the challenges they pose for the evolution of the industry ecosystem.


SVQ: Sparse Vector Quantization for Spatiotemporal Forecasting

Chao Chen,Tian Zhou,Yanjun Zhao,Hui Liu,Liang Sun,Rong Jin

http://arxiv.org/abs/2312.03406v1

Compressor summary: SVQ is a sparse vector quantization method that improves spatiotemporal forecasting tasks by balancing details and noise reduction using a two-layer MLP and a randomly fixed or learnable matrix, achieving state-of-the-art results in various fields.


Generalized Contrastive Divergence: Joint Training of Energy-Based Model and Diffusion Model through Inverse Reinforcement Learning

Sangwoong Yoon,Dohyun Kwon,Himchan Hwang,Yung-Kyun Noh,Frank C. Park

http://arxiv.org/abs/2312.03397v1

Compressor summary: GCD trains an energy-based model and a sampler together, generalizing Contrastive Divergence by using a trainable sampler instead of MCMC, and can improve both models' performance.


Action Scene Graphs for Long-Form Understanding of Egocentric Videos

Ivan Rodin,Antonino Furnari,Kyle Min,Subarna Tripathi,Giovanni Maria Farinella

http://arxiv.org/abs/2312.03391v1

Compressor summary: Egocentric Action Scene Graphs (EASGs) are a new way to understand long egocentric videos, using graphs to describe actions, objects, and relationships over time.


An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network

Taeyoung Kim,Hongseok Yang

http://arxiv.org/abs/2312.03386v1

Compressor summary: The paper extends infinite-width analysis to the Jacobian of deep neural networks, showing that MLPs and their Jacobians converge to Gaussian processes in the infinite-width limit.


A Text-to-Text Model for Multilingual Offensive Language Identification

Tharindu Ranasinghe,Marcos Zampieri

http://arxiv.org/abs/2312.03379v1

Compressor summary: The paper introduces pre-trained encoder-decoder models for offensive language detection that outperform existing transformer-based models and achieve state-of-the-art results in multiple languages.


Riemannian Complex Matrix Convolution Network for PolSAR Image Classification

Junfei Shi,Wei Wang,Haiyan Jin,Mengmeng Nie,Shanshan Ji

http://arxiv.org/abs/2312.03378v1

Compressor summary: The paper proposes a new deep learning method for PolSAR image classification that directly uses the complex matrix as input, learns its structure in Riemannian space, and improves performance over existing methods.


Evaluating the point cloud of individual trees generated from images based on Neural Radiance fields (NeRF) method

Hongyu Huang,Guoji Tian,Chongcheng Chen

http://arxiv.org/abs/2312.03372v1

Compressor summary: The study uses Neural Radiance Fields (NeRF) to reconstruct three-dimensional trees from two-dimensional images, showing its efficiency and adaptability but with lower resolution and accuracy compared to photogrammetric methods.


Lazy-k: Decoding for Constrained Token Classification

Arthur Hemmer,Mickaël Coustaty,Nicola Bartolo,Jérôme Brachat,Jean-Marc Ogier

http://arxiv.org/abs/2312.03367v1

Compressor summary: The authors study how to improve probabilistic models for information extraction by combining them with constrained decoding methods, proposing a new method called Lazy-k, and showing its benefits over existing approaches.
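The core idea of constrained decoding with lazy candidate expansion can be sketched as follows: instead of enumerating all label sequences, candidates are popped in order of decreasing probability and expanded on demand until one satisfies the constraint. This is a minimal illustration of that general strategy, not the paper's exact Lazy-k algorithm; the function name and interface are hypothetical.

```python
import heapq

def lazy_k_decode(probs, constraint, max_candidates=1000):
    """Lazily enumerate label sequences in order of decreasing joint
    probability until one satisfies `constraint`.

    probs: list of per-position dicts {label: probability}.
    constraint: callable(tuple_of_labels) -> bool.
    Returns the best constraint-satisfying sequence, or None.
    """
    # Sort labels at each position by probability, best first.
    ranked = [sorted(p.items(), key=lambda kv: -kv[1]) for p in probs]

    def score(state):
        # Joint probability of picking rank state[pos] at each position.
        s = 1.0
        for pos, r in enumerate(state):
            s *= ranked[pos][r][1]
        return s

    start = (0,) * len(ranked)  # all-best sequence
    heap = [(-score(start), start)]
    seen = {start}
    for _ in range(max_candidates):
        if not heap:
            break
        _, state = heapq.heappop(heap)
        seq = tuple(ranked[pos][r][0] for pos, r in enumerate(state))
        if constraint(seq):
            return seq
        # Expand neighbours: demote one position to its next-best label.
        for pos in range(len(state)):
            if state[pos] + 1 < len(ranked[pos]):
                nxt = state[:pos] + (state[pos] + 1,) + state[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))
    return None
```

Because candidates are generated best-first, the decoder touches only as many sequences as needed to satisfy the constraint, rather than the full exponential label space.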


KhabarChin: Automatic Detection of Important News in the Persian Language

Hamed Hematian Hemati,Arash Lagzian,Moein Salimi Sartakhti,Hamid Beigy,Ehsaneddin Asgari

http://arxiv.org/abs/2312.03361v1

Compressor summary: The paper introduces Khabarchin, a new dataset for detecting important news in Persian language, and proposes learning-based models to tackle this task.


Teaching Specific Scientific Knowledge into Large Language Models through Additional Training

Kan Hatakeyama-Sato,Yasuhiko Igarashi,Shun Katakami,Yuta Nabae,Teruaki Hayakawa

http://arxiv.org/abs/2312.03360v1

Compressor summary: The paragraph discusses using additional training to embed specialized scientific knowledge into a large language model, addressing challenges such as text scarcity and hyperparameter optimization.


RING-NeRF: A Versatile Architecture based on Residual Implicit Neural Grids

Doriand Petit,Steve Bourgeois,Dumitru Pavel,Vincent Gay-Bellile,Florian Chabot,Loic Barthe

http://arxiv.org/abs/2312.03357v1

Compressor summary: The RING-NeRF architecture uses Residual Implicit Neural Grids to control the level of detail and achieve fast training and state-of-the-art performance in 3D reconstruction and new view synthesis tasks.


PointMoment: Mixed-Moment-based Self-Supervised Representation Learning for 3D Point Clouds

Xin Cao,Xinxin Han,Yifan Wang,Mengna Yang,Kang Li

http://arxiv.org/abs/2312.03350v1

Compressor summary: PointMoment is a novel self-supervised representation learning framework for point clouds that uses a high-order mixed moment loss function to reduce feature redundancy and improve downstream tasks such as 3D point cloud classification and segmentation.


Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild

Ke Alexander Wang,Emily B. Fox

http://arxiv.org/abs/2312.03344v1

Compressor summary: The paper presents a hybrid variational autoencoder that learns interpretable representations of CGM and meal data for diabetes, grounding the latent space in mechanistic differential-equation inputs that reflect physiological quantities and introducing a novel method to infer glucose appearance rate from unreliable meal logs; the unsupervised embeddings separate individuals in proportion to disease severity and produce better clusters than other features.


Topic and genre in dialogue

Amandine Decker,Ellen Breitholtz,Christine Howes,Staffan Larsson

http://arxiv.org/abs/2312.03342v1

Compressor summary: The paper proposes separating and defining genre and topic concepts to improve dialogue system flexibility and reliability.


Online Vectorized HD Map Construction using Geometry

Zhixin Zhang,Yiyuan Zhang,Xiaohan Ding,Fusheng Jin,Xiangyu Yue

http://arxiv.org/abs/2312.03341v1

Compressor summary: GeMap is a method that learns Euclidean shapes and relations of map instances beyond basic perception, achieving state-of-the-art performance on two datasets.


PointJEM: Self-supervised Point Cloud Understanding for Reducing Feature Redundancy via Joint Entropy Maximization

Xin Cao,Huan Xia,Xinxin Han,Yifan Wang,Kang Li,Linzhi Su

http://arxiv.org/abs/2312.03339v1

Compressor summary: PointJEM is a self-supervised point cloud representation learning method that reduces feature redundancy using joint entropy and performs well in downstream tasks.
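The redundancy-reduction principle behind joint entropy maximization can be illustrated with a simple histogram estimator: redundant (near-identical) feature dimensions have low joint entropy, while independent ones have high joint entropy, so maximizing it pushes features apart. This is a simplified estimator for illustration; PointJEM's exact formulation may differ.

```python
import numpy as np

def joint_entropy(f1, f2, bins=8):
    """Histogram estimate of the joint entropy H(f1, f2) of two
    feature dimensions across a batch of samples."""
    hist, _, _ = np.histogram2d(f1, f2, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins before taking the log
    return float(-np.sum(p * np.log(p)))
```

For identical features f1 == f2, only the diagonal bins are populated and the joint entropy collapses to the marginal entropy; for independent features it approaches the sum of the marginals, which is why it serves as a redundancy penalty when maximized.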


Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit Communities

Aaron J. Snoswell,Lucinda Nelson,Hao Xue,Flora D. Salim,Nicolas Suzor,Jean Burgess

http://arxiv.org/abs/2312.03330v1

Compressor summary: The paper argues that generic toxicity classifiers are not suitable for measuring misogyny in natural language generation and proposes using a misogyny-specific lexicon instead.


Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation

Xiaobo Hu,Youfang Lin,HeHe Fan,Shuo Wang,Zhihao Wu,Kai Lv

http://arxiv.org/abs/2312.03327v1

Compressor summary: The paper proposes a Category Relation Graph (CRG) to learn object layout knowledge and a Temporal-Spatial-Region (TSR) attention architecture to capture object dependencies for visual navigation.


GCFA: Geodesic Curve Feature Augmentation via Shape Space Theory

Yuexing Han,Guanxin Wan,Bing Wang

http://arxiv.org/abs/2312.03325v1

Compressor summary: The authors propose Geodesic curve feature augmentation (GCFA), a method that projects image features into a shape space and generates new features along a geodesic curve, improving data preprocessing for deep learning models in small sample environments.


Background Clustering Pre-training for Few-shot Segmentation

Zhimiao Yu,Tiancheng Lin,Yi Xu

http://arxiv.org/abs/2312.03322v1

Compressor summary: The paper proposes a new pre-training method for few-shot segmentation called Background Clustering Pre-Training, which separates novel classes from the background and uses clustering and base classes to improve the performance.


Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift

Saurabh Garg,Amrith Setlur,Zachary Chase Lipton,Sivaraman Balakrishnan,Virginia Smith,Aditi Raghunathan

http://arxiv.org/abs/2312.03318v1

Compressor summary: This paper investigates how combining self-training and contrastive learning techniques improves unsupervised domain adaptation and semi-supervised learning, with varying success depending on the setting.


Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation

Wonjun Lee,Gary Geunbae Lee,Yunsu Kim

http://arxiv.org/abs/2312.03312v1

Compressor summary: The authors propose a method to improve speech recognition in low-resource languages by enhancing phoneme recognition and translation models with articulatory characteristics and realistic noise generation.


Benchmarking Continual Learning from Cognitive Perspectives

Xiaoqian Liu,Junge Zhang,Mingyi Zhang,Peipei Yang

http://arxiv.org/abs/2312.03309v1

Compressor summary: The paper proposes a unified evaluation paradigm for continual learning models based on cognitive properties supporting human continual learning, such as adaptability, sensitivity to task variations, and efficiency.


Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique

Ilya Tyagin,Ilya Safro

http://arxiv.org/abs/2312.03303v1

Compressor summary: The paper introduces Dyport, a new benchmarking system for evaluating biomedical hypothesis generation systems using curated datasets and dynamic graphs to assess both accuracy and impact.


DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction

Yanlong Li,Chamara Madarasingha,Kanchana Thilakarathna

http://arxiv.org/abs/2312.03298v1

Compressor summary: DiffPMAE is a self-supervised learning method for point cloud reconstruction that combines Masked Auto-Encoding and Diffusion Model, outperforming many existing techniques on various tasks.


Enhancing Molecular Property Prediction via Mixture of Collaborative Experts

Xu Yao,Shuang Liang,Songqiao Han,Hailiang Huang

http://arxiv.org/abs/2312.03292v1

Compressor summary: The GNN-MoCE architecture uses a mixture of collaborative experts to predict biochemical properties from molecular graphs, addressing data scarcity and imbalance in the Molecular Property Prediction task by exploiting task commonalities and enhancing expert diversity and influence.


OMNIINPUT: A Model-centric Evaluation Framework through Output Distribution

Weitang Liu,Ying Wai Li,Tianle Wang,Yi-Zhuang You,Jingbo Shang

http://arxiv.org/abs/2312.03291v1

Compressor summary: The OmniInput framework evaluates an AI/ML model's quality on all possible inputs by using a self-constructed test set and analyzing its output distribution.


Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym

Junjie Sheng,Zixiao Huang,Chuyun Shen,Wenhao Li,Yun Hua,Bo Jin,Hongyuan Zha,Xiangfeng Wang

http://arxiv.org/abs/2312.03290v1

Compressor summary: The authors investigate if language agents can be alternatives to PPO agents in sequential decision-making tasks by using the TextGym simulator, introducing different levels of scenarios, and proposing a novel EXE agent.


Class Incremental Learning for Adversarial Robustness

Seungju Cho,Hongshin Lee,Changick Kim

http://arxiv.org/abs/2312.03289v1

Compressor summary: The study proposes ARCIL, a method that combines adversarial robustness and incremental learning, and introduces FPD and LAD losses to address the loss of robustness in this setting, achieving significantly better results than existing methods on three datasets.


STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition

Nguyen Huu Bao Long

http://arxiv.org/abs/2312.03288v1

Compressor summary: The paper proposes a new method for skeleton-based action recognition using graph convolutional networks, cross-attention modules, and temporal attention transformers that outperforms previous methods on two datasets.


Indirect Gradient Matching for Adversarial Robust Distillation

Hongsin Lee,Seungju Cho,Changick Kim

http://arxiv.org/abs/2312.03286v1

Compressor summary: The paper proposes a new technique, IGDM, to improve adversarial robustness by transferring input gradient knowledge from a teacher model to a student model, which enhances the performance of existing adversarial distillation methods without additional data augmentation.


Anomaly Detection for Scalable Task Grouping in Reinforcement Learning-based RAN Optimization

Jimmy Li,Igor Kozlov,Di Wu,Xue Liu,Gregory Dudek

http://arxiv.org/abs/2312.03277v1

Compressor summary: The paper proposes a scalable framework using reinforcement learning and anomaly detection to optimize cellular RAN across many cell sites with varying traffic patterns, efficiently using computational resources.


SO-NeRF: Active View Planning for NeRF using Surrogate Objectives

Keifer Lee,Shubham Gupta,Sunglyoung Kim,Bhargav Makwana,Chao Chen,Chen Feng

http://arxiv.org/abs/2312.03266v1

Compressor summary: SOAR is a method for selecting good views for NeRF using interpretable functions and a learned network, improving speed and quality compared to baselines.


f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization

Sina Baharlouei,Shivam Patel,Meisam Razaviyayn

http://arxiv.org/abs/2312.03259v1

Compressor summary: The paper proposes a stochastic optimization framework for fair machine learning that works with small data batches, has convergence guarantees, and performs well on both training and test data distribution shifts.


CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models

Hailin Zhang,Zirui Liu,Boxuan Chen,Yikai Zhao,Tong Zhao,Tong Yang,Bin Cui

http://arxiv.org/abs/2312.03256v1

Compressor summary: CAFE is a new compression framework for Deep Learning Recommendation Models that uses HotSketch to capture feature importance and hash embedding for non-hot features, achieving better performance than existing methods.
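The hot/cold split at the heart of this kind of embedding compression can be sketched in a few lines: important ("hot") feature ids get dedicated embedding rows, while the long tail shares a small hashed table. The class name, fixed hot set, and modulo hashing here are illustrative assumptions; CAFE identifies hot features dynamically with its HotSketch structure.

```python
import numpy as np

class HotColdEmbedding:
    """Dedicated embeddings for hot feature ids, shared hashed
    embeddings for everything else (a static sketch of the idea)."""

    def __init__(self, hot_ids, dim=8, hash_buckets=64, seed=0):
        rng = np.random.default_rng(seed)
        # One private row per hot id: no collisions for important features.
        self.hot = {i: rng.normal(size=dim) for i in hot_ids}
        # A small shared table for the long tail: collisions allowed.
        self.cold = rng.normal(size=(hash_buckets, dim))
        self.hash_buckets = hash_buckets

    def lookup(self, feature_id):
        if feature_id in self.hot:
            return self.hot[feature_id]                    # collision-free
        return self.cold[feature_id % self.hash_buckets]   # may collide
```

Memory scales with the number of hot ids plus a fixed bucket count rather than the full vocabulary, which is the source of the compression.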


Seller-side Outcome Fairness in Online Marketplaces

Zikun Ye,Reza Yousefi Maragheh,Lalitesh Morishetti,Shanu Vashishtha,Jason Cho,Kaushiki Nag,Sushant Kumar,Kannan Achan

http://arxiv.org/abs/2312.03253v1

Compressor summary: The paper proposes an optimization model and a gradient-based algorithm to improve seller fairness in online marketplaces by balancing recommendation rewards and a fairness metric.


Customizable Combination of Parameter-Efficient Modules for Multi-Task Learning

Haowen Wang,Tao Sun,Cong Fan,Jinjie Gu

http://arxiv.org/abs/2312.03248v1

Compressor summary: C-Poly is a novel approach for improving neural networks' knowledge organization, leading to better cross-task generalization using customized skills and shared skills learned with low-rank techniques.


Multicoated and Folded Graph Neural Networks with Strong Lottery Tickets

Jiale Yan,Hiroaki Ito,Ángel López García-Arias,Yasuyuki Okoshi,Hikari Otsuka,Kazushi Kawamura,Thiem Van Chu,Masato Motomura

http://arxiv.org/abs/2312.03236v1

Compressor summary: This paper explores subnetworks in graph neural networks (GNNs) using scalar pruning mask methods, discovering untrained recurrent networks with high performance and reducing memory usage by up to 98.7%.


Deep Multimodal Fusion for Surgical Feedback Classification

Rafal Kocielnik,Elyssa Y. Wong,Timothy N. Chu,Lydia Lin,De-An Huang,Jiayun Wang,Anima Anandkumar,Andrew J. Hung

http://arxiv.org/abs/2312.03231v1

Compressor summary: This paper develops a machine learning model to classify five types of surgical feedback (Anatomic, Technical, Procedural, Praise, Visual Aid) from text, audio, and video inputs to help improve surgical training.


Human Body Model based ID using Shape and Pose Parameters

Aravind Sundaresan,Brian Burns,Indranil Sur,Yi Yao,Xiao Lin,Sujeong Kim

http://arxiv.org/abs/2312.03227v1

Compressor summary: The HMID system, trained with shape, pose, and biometric losses, improves biometric identification performance on raw images of human bodies in various conditions.


Rethinking Object Saliency Ranking: A Novel Whole-flow Processing Paradigm

Mengke Song,Linfeng Li,Dunquan Wu,Wenfeng Song,Chenglizhao Chen

http://arxiv.org/abs/2312.03226v1

Compressor summary: The paper presents a new method for ranking salient objects by importance order, addressing challenges in existing methods such as ill-defined ground truth, multi-task conflicts, and complex model designs.


Predicting Scores of Various Aesthetic Attribute Sets by Learning from Overall Score Labels

Heng Huang,Xin Jin,Yaqi Liu,Hao Lou,Chaoen Xiao,Shuai Cui,Xinning Li,Dongqing Zou

http://arxiv.org/abs/2312.03222v1

Compressor summary: The paper proposes a new model, F2S, that predicts image aesthetic attributes using feature extractors instead of labels, enabling the learning of meaningful attribute scores from overall scores.


Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization

Yuanshi Liu,Hanzhen Zhao,Yang Xu,Pengyun Yue,Cong Fang

http://arxiv.org/abs/2312.03218v1

Compressor summary: This paper proposes adaptive gradient-based algorithms that improve complexities for machine learning problems by refining the description of degenerated conditions using two factors and addressing the limitations of existing optimization modeling and analysis.


SDSRA: A Skill-Driven Skill-Recombination Algorithm for Efficient Policy Learning

Eric H. Jiang,Andrew Lizarraga

http://arxiv.org/abs/2312.03216v1

Compressor summary: The paper presents a new algorithm, SDSRA, that improves efficiency and policy quality in reinforcement learning tasks by combining skill-based strategies with robust Actor-Critic framework.


Bootstrap Your Own Variance

Polina Turishcheva,Jason Ramapuram,Sinead Williamson,Dan Busbridge,Eeshan Dhekane,Russ Webb

http://arxiv.org/abs/2312.03213v1

Compressor summary: BYOV combines self-supervised learning and Bayesian methods to estimate uncertainty in model predictions, outperforming a deterministic baseline and providing preliminary evidence of its usefulness.


Constrained Bayesian Optimization Under Partial Observations: Balanced Improvements and Provable Convergence

Shengbo Wang,Ke Li

http://arxiv.org/abs/2312.03212v1

Compressor summary: The paper proposes an efficient and provable method for solving expensive partially observable constrained optimization problems using improved acquisition functions and a surrogate model that better represents feasible regions.


Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Felix Wimbauer,Bichen Wu,Edgar Schoenfeld,Xiaoliang Dai,Ji Hou,Zijian He,Artsiom Sanakoyeu,Peizhao Zhang,Sam Tsai,Jonas Kohler,Christian Rupprecht,Daniel Cremers,Peter Vajda,Jialiang Wang

http://arxiv.org/abs/2312.03209v1

Compressor summary: The authors propose block caching, a technique that reuses outputs from previous layer blocks in diffusion models to speed up image generation while maintaining high visual quality.
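The caching mechanism can be sketched with a thin wrapper that serves a stored block output instead of recomputing it at every denoising step. The paper decides when to refresh based on how much block outputs actually change between timesteps; the fixed `refresh_every` schedule below is a simplifying assumption for illustration.

```python
class CachedBlock:
    """Wrap a layer block and reuse its cached output on steps where
    it is expected to change little (fixed refresh schedule assumed)."""

    def __init__(self, block_fn, refresh_every=2):
        self.block_fn = block_fn
        self.refresh_every = refresh_every
        self.cache = None

    def __call__(self, x, step):
        if self.cache is None or step % self.refresh_every == 0:
            self.cache = self.block_fn(x)  # recompute and store
        return self.cache                  # otherwise serve the cache
```

With `refresh_every=2`, half of the block evaluations across the denoising trajectory are skipped, which is where the speedup comes from; visual quality depends on how slowly the skipped blocks actually change.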


Satellite Imagery and AI: A New Era in Ocean Conservation, from Research to Deployment and Impact

Patrick Beukema,Favyen Bastani,Piper Wolters,Henry Herzog,Joe Ferdinando

http://arxiv.org/abs/2312.03207v1

Compressor summary: The paper introduces three specialized computer vision models for satellite data to monitor global IUU fishing and presents best practices for real-time maritime conservation using the Skylight platform.


Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Shijie Zhou,Haoran Chang,Sicheng Jiang,Zhiwen Fan,Zehao Zhu,Dejia Xu,Pradyumna Chari,Suya You,Zhangyang Wang,Achuta Kadambi

http://arxiv.org/abs/2312.03203v1

Compressor summary: The paper presents a method to extend NeRF's functionality for semantic tasks using 3D Gaussian Splatting and 2D foundation models, while addressing speed and quality issues.


Domain Invariant Representation Learning and Sleep Dynamics Modeling for Automatic Sleep Staging

Seungyeon Lee,Thai-Hoang Pham,Zhao Cheng,Ping Zhang

http://arxiv.org/abs/2312.03196v1

Compressor summary: The study introduces a neural network model called DREAM that improves automatic sleep staging by learning domain generalized representations from diverse physiological signals and modeling sleep dynamics, outperforming existing methods on three datasets.


Detecting Rumor Veracity with Only Textual Information by Double-Channel Structure

Alex Kim,Sangwon Yoon

http://arxiv.org/abs/2312.03195v1

Compressor summary: The authors propose a double-channel model for classifying rumors on social media as true, false, or unverifiable based on their informativeness and use it to achieve state-of-the-art results on a dataset.


Corporate Bankruptcy Prediction with Domain-Adapted BERT

Alex Kim,Sangwon Yoon

http://arxiv.org/abs/2312.03194v1

Compressor summary: The study uses BERT to analyze company disclosures and improve bankruptcy prediction by enhancing the input dataset quality, achieving high accuracy rates.