arxiv compressed, 2023-12-07

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-07 generated by the compressor, my personal LLM-based project.

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Xinshun Wang,Zhongbin Fang,Xia Li,Xiangtai Li,Chen Chen,Mengyuan Liu

Compressor summary: Skeleton-in-Context (SiC) is a framework for in-context learning of skeleton sequence modeling that can handle multiple tasks simultaneously, adapt to new tasks, and achieve state-of-the-art performance.

Self-conditioned Image Generation via Generating Representations

Tianhong Li,Dina Katabi,Kaiming He

Compressor summary: RCG is a new image generation method that uses a self-supervised representation distribution and achieves high-quality results without human annotations.

OneLLM: One Framework to Align All Modalities with Language

Jiaming Han,Kaixiong Gong,Yiyuan Zhang,Jiaqi Wang,Kaipeng Zhang,Dahua Lin,Yu Qiao,Peng Gao,Xiangyu Yue

Compressor summary: The paper introduces OneLLM, a unified framework for aligning eight modalities to language, and presents a multimodal instruction dataset for evaluating its performance on various tasks.

PROMISE: A Framework for Model-Driven Stateful Prompt Orchestration

Wenyuan Wu,Jasmin Heierli,Max Meisterhans,Adrian Moser,Andri Färber,Mateusz Dolata,Elena Gavagnin,Alexandre de Spindler,Gerhard Schwabe

Compressor summary: PROMISE is a framework that helps create and control complex language-based interactions with information systems, improving their effectiveness and efficiency.

Intrinsic Harmonization for Illumination-Aware Compositing

Chris Careaga,Yağız Aksoy,S. Mahdi H. Miangoleh

Compressor summary: The authors propose a self-supervised method for image harmonization that adjusts shading and albedo to match lighting between foreground and background in composited images.

On the Role of Edge Dependency in Graph Generative Models

Sudhanshu Chanpuriya,Cameron Musco,Konstantinos Sotiropoulos,Charalampos Tsourakakis

Compressor summary: The authors propose a new evaluation framework for graph generative models that considers model-generated graph overlap, categorize them into three complexity levels, derive theoretical bounds on their output quality, introduce new models based on dense subgraph discovery, and show competitive results with popular models.

Evaluating and Mitigating Discrimination in Language Model Decisions

Alex Tamkin,Amanda Askell,Liane Lovitt,Esin Durmus,Nicholas Joseph,Shauna Kravec,Karina Nguyen,Jared Kaplan,Deep Ganguli

Compressor summary: The authors propose a method for evaluating the potential discriminatory impact of language models in various use cases by generating diverse prompts with different demographic information, and suggest ways to reduce discrimination through prompt engineering.

What Planning Problems Can A Relational Neural Network Solve?

Jiayuan Mao,Tomás Lozano-Pérez,Joshua B. Tenenbaum,Leslie Pack Kaelbling

Compressor summary: The paper analyzes how relational neural networks, such as graph neural networks and transformers, can be used to learn goal-conditioned policies for planning problems, and identifies three classes of planning problems based on circuit width and depth.

Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching

Lennart Bastian,Yizheng Xie,Nassir Navab,Zorah Lähner

Compressor summary: The proposed method combines different basis functions to create a hybrid spectral space for shape correspondence, improving performance on non-isometric deformations and noisy data.

GeoShapley: A Game Theory Approach to Measuring Spatial Effects in Machine Learning Models

Ziqi Li

Compressor summary: GeoShapley is a game theory-based approach for measuring the importance of location and its synergies with other features in various machine learning models, and it can be applied to both statistical and black-box models.

WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on

Xujie Zhang,Xiu Li,Michael Kampffmeyer,Xin Dong,Zhenyu Xie,Feida Zhu,Haoye Dong,Xiaodan Liang

Compressor summary: WarpDiffusion is a novel method that improves Virtual Try-On by combining warping-based and diffusion-based techniques with attention mechanisms to enhance realism and retain garment details.

Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia

Alexander Sasha Vezhnevets,John P. Agapiou,Avia Aharon,Ron Ziv,Jayd Matyas,Edgar A. Duéñez-Guzmán,William A. Cunningham,Simon Osindero,Danny Karmon,Joel Z. Leibo

Compressor summary: Concordia is a library that facilitates constructing and working with Generative Agent-Based Models, which use Large Language Models to apply common sense, control technologies, and communicate in simulations of physical or digital environments.

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie,Renyuan Peng,Chunwei Wang,Xinyue Cai,Jianhua Han,Hang Xu,Li Zhang

Compressor summary: Reason2Drive is a new dataset for studying interpretable reasoning in complex driving environments using large vision-language models.

Interpretability Illusions in the Generalization of Simplified Models

Dan Friedman,Andrew Lampinen,Lucas Dixon,Danqi Chen,Asma Ghandeharioun

Compressor summary: The simplified representations of deep learning models may not accurately capture their behavior outside the training data and may lead to wrong conclusions about their generalization abilities.

MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

Ziyan Wang,Yali Du,Yudi Zhang,Meng Fang,Biwei Huang

Compressor summary: MACCA is a method to accurately assign credit to individual agents in offline multi-agent reinforcement learning by modeling the causal relationships between rewards using a Dynamic Bayesian Network, which works well in both discrete and continuous action settings.

Transformer-Powered Surrogates Close the ICF Simulation-Experiment Gap with Extremely Limited Data

Matthew L. Olson,Shusen Liu,Jayaraman J. Thiagarajan,Bogdan Kustowski,Weng-Keen Wong,Rushil Anirudh

Compressor summary: The paper proposes a new transformer-based method that combines graph hyper-parameter optimization with multi-modal data to improve prediction accuracy in simulation and real-world scenarios.

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Zhouxia Wang,Ziyang Yuan,Xintao Wang,Tianshui Chen,Menghan Xia,Ping Luo,Ying Shan

Compressor summary: The paper introduces MotionCtrl, a novel motion controller for video generation that independently controls camera and object motion, enabling more fine-grained control and diverse combinations of motions.

Not All Large Language Models (LLMs) Succumb to the "Reversal Curse": A Comparative Study of Deductive Logical Reasoning in BERT and GPT Models

Jingye Yang,Da Wu,Kai Wang

Compressor summary: The study found that while bidirectional LLM BERT can avoid the reversal curse, both encoder and decoder models struggle with logical reasoning involving three sets.

MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations

Assaf Ben-Kish,Moran Yanuka,Morris Alper,Raja Giryes,Hadar Averbuch-Elor

Compressor summary: MOCHa uses reinforcement learning to reduce hallucinations and improve caption quality in image captioning without strong supervision, and introduces OpenCHAIR, a new benchmark for evaluating open-vocabulary hallucinations.

Boosting Segment Anything Model Towards Open-Vocabulary Learning

Xumeng Han,Longhui Wei,Xuehui Yu,Zhiyang Dou,Xin He,Kuiran Wang,Zhenjun Han,Qi Tian

Compressor summary: Sambor is a new model that improves SAM by adding the ability to detect objects based on human inputs and category names, using a novel module and an open-set region proposal network.

TokenCompose: Grounding Diffusion with Token-level Supervision

Zirui Wang,Zhizhou Sha,Zheng Ding,Yilin Wang,Zhuowen Tu

Compressor summary: TokenCompose is a Latent Diffusion Model that improves text-to-image generation by introducing token-wise consistency terms between image content and object segmentation maps during finetuning, achieving better multi-category instance composition and photorealism.

Physical Symbolic Optimization

Wassim Tenachi,Rodrigo Ibata,Foivos I. Diakogiannis

Compressor summary: The paper introduces a method that uses reinforcement learning to generate equations with physical units, achieving better results than other methods in noisy conditions.

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang,Yukun Huang,Xiaoyang Wu,Yuan-Chen Guo,Song-Hai Zhang,Hengshuang Zhao,Tong He,Xihui Liu

Compressor summary: The paper introduces DreamComposer, a framework that improves existing view-aware diffusion models by using multiple views of an object to generate high-quality novel views for 3D object reconstruction and other tasks.

Automated Multimodal Data Annotation via Calibration With Indoor Positioning System

Ryan Rubel,Andrew Dudash,Mohammad Goli,James O'Hara,Karl Wunderlich

Compressor summary: The authors propose a method to automatically label LiDAR and camera data for object detection in indoor settings using an IPS, which is much faster than manual annotation.

DiffusionSat: A Generative Foundation Model for Satellite Imagery

Samar Khanna,Patrick Liu,Linqi Zhou,Chenlin Meng,Robin Rombach,Marshall Burke,David Lobell,Stefano Ermon

Compressor summary: The paper introduces DiffusionSat, a large generative model for satellite images that uses metadata and diffusion techniques to generate realistic samples and solve various tasks.

MMM: Generative Masked Motion Model

Ekkasit Pinyoanuntapong,Pu Wang,Minwoo Lee,Chen Chen

Compressor summary: MMM is a novel motion generation method that uses a tokenizer and a transformer to capture dependencies between motion and text tokens, allowing for high-fidelity, high-speed, and editable motion generation.

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

Junhao Zhuang,Yanhong Zeng,Wenran Liu,Chun Yuan,Kai Chen

Compressor summary: PowerPaint is a model that excels at context-aware image inpainting and text-guided object inpainting by using learnable task prompts and tailored fine-tuning strategies.

Language-Informed Visual Concept Learning

Sharon Lee,Yunzhi Zhang,Shangzhe Wu,Jiajun Wu

Compressor summary: The paper learns a language-informed visual concept representation from large pre-trained vision-language models and uses it to generate images with novel compositions of visual concepts.

Foundation Model Assisted Weakly Supervised Semantic Segmentation

Xiaobo Yang,Xiaojin Gong

Compressor summary: The paper proposes a framework using pre-trained models CLIP and SAM to generate segmentation seeds for weakly supervised semantic segmentation, achieving state-of-the-art performance on PASCAL VOC 2012 and competitive results on MS COCO 2014.

Context Diffusion: In-Context Aware Image Generation

Ivona Najdenkoska,Animesh Sinha,Abhimanyu Dubey,Dhruv Mahajan,Vignesh Ramanathan,Filip Radenovic

Compressor summary: Context Diffusion is a framework for generating images from contextual examples and text prompts, improving image quality and adaptability.

Improving Bias Mitigation through Bias Experts in Natural Language Understanding

Eojin Jeon,Mingyu Lee,Juhyeong Park,Yeachan Kim,Wing-Lam Mok,SangKeun Lee

Compressor summary: The paper proposes a new debiasing framework that uses binary classifiers called bias experts to improve bias identification and mitigate the negative effects of bias on model performance.

DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization

Risab Biswas,Swalpa Kumar Roy,Ning Wang,Umapada Pal,Guang-Bin Huang

Compressor summary: DocBinFormer is a new transformer-based architecture for document image binarization that captures global and local features using two-level vision transformers, outperforming existing methods on several benchmarks.

XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering

Joel Stremmel,Ardavan Saeedi,Hamid Hassanzadeh,Sanjit Batra,Jeffrey Hertzberg,Jaime Murillo,Eran Halperin

Compressor summary: XAIQA is a novel method that generates synthetic QA pairs from electronic health records data for extractive QA systems, outperforming existing approaches in semantic matches and clinical abbreviations, and improving GPT-4's performance on difficult questions.

Enhancing Kinship Verification through Multiscale Retinex and Combined Deep-Shallow features

El Ouanas Belabbaci,Mohammed Khammari,Ammar Chouchane,Mohcene Bessaoudi,Abdelmalik Ouamane,Yassine Himeur,Shadi Atalla,Wathiq Mansoor

Compressor summary: The authors propose a new method for verifying family relationships from facial images using Multiscale Retinex, deep and shallow texture descriptors, and Logistic Regression, achieving promising results on three kinship datasets.

When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology

Wenhui Wang,Shuming Ma,Hanwen Xu,Naoto Usuyama,Jiayu Ding,Hoifung Poon,Furu Wei

Compressor summary: Key points:
- LongViT is a vision Transformer for gigapixel images
- It splits the image into millions of patches and uses LongNet to model them
- It can handle computation and memory constraints
- It is applied in computational pathology for cancer diagnosis and prognosis
- It outperforms previous methods
Summary: LongViT is a new vision Transformer that can process gigapixel images in a fast and efficient way, enabling better cancer diagnosis and prognosis in computational pathology.

Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention

Jianjin Xu,Saman Motamed,Praneetha Vaddamanu,Chen Henry Wu,Christian Haene,Jean-Charles Bazin,Fernando de la Torre

Compressor summary: The paper proposes a method called Parallel Visual Attention (PVA) that uses attention modules and an identity encoder to improve face inpainting results, preserving identity and semantic attributes, and reducing computational complexity compared to existing techniques.

Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment

Fei Yang,Shuang Peng,Ning Sun,Fangyu Wang,Ke Tan,Fu Wu,Jiezhong Qiu,Aimin Pan

Compressor summary: Holmes is a novel LLM training framework for heterogeneous NIC environments that uses data and model parallelism strategies, intelligent tasklet scheduling, and pipeline parallel techniques to achieve high training efficiency.

Texture-Semantic Collaboration Network for ORSI Salient Object Detection

Gongyang Li,Zhen Bai,Zhi Liu

Compressor summary: The Texture-Semantic Collaboration Network (TSCNet) is a novel approach for salient object detection in optical remote sensing images that leverages both texture and semantic cues to address the challenges of multiple, small, low-illumination, and irregularly shaped objects.

GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models

Haicheng Liao,Huanming Shen,Zhenning Li,Chengyue Wang,Guofa Li,Yiming Bie,Chengzhong Xu

Compressor summary: The paper presents a CAVG model that uses multiple encoders and LLMs to improve visual grounding in autonomous vehicles, achieving high accuracy and efficiency in various scenarios.

FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation

Olivia Markham,Yuhao Chen,Chi-en Amy Tai,Alexander Wong

Compressor summary: FoodFusion is a Latent Diffusion model that generates realistic and diverse food images from textual descriptions using a large curated dataset and data cleaning methods.

Low-shot Object Learning with Mutual Exclusivity Bias

Anh Thai,Ahmad Humayun,Stefan Stojanov,Zixuan Huang,Bikram Boote,James M. Rehg

Compressor summary: The paper proposes LSME, a new object learning task based on mutual exclusivity bias, and presents a dataset, baselines, and a top-performing method for it.

Personalized Pose Forecasting

Maria Priisalu,Ted Kronvall,Cristian Sminchisescu

Compressor summary: The paper proposes a new way to adapt human motion prediction models to individual movement patterns, which is important for systems like delivery robots that interact with the same person over time.

On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

Peng Sun,Bei Shi,Daiwei Yu,Tao Lin

Compressor summary: The authors propose RDED, a new data distillation method that addresses the challenges of large-scale and high-resolution datasets by focusing on realism, diversity, and efficiency.

Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling

Talia Tseriotou,Ryan Sze-Yin Chan,Adam Tsakalidis,Iman Munire Bilal,Elena Kochkina,Terry Lyons,Maria Liakata

Compressor summary: Sig-Networks is a new open-source toolkit that uses Signature-based Neural Network models to perform well in temporal NLP tasks like counselling conversations, rumour stance switch and mood changes in social media threads.

Active Wildfires Detection and Dynamic Escape Routes Planning for Humans through Information Fusion between Drones and Satellites

Chang Liu,Tamas Sziranyi

Compressor summary: The paper proposes using UAV vision and satellite image analysis for detecting wildfires, extracting road networks, and planning dynamic escape routes for people in distress during wilderness rescues.

FRDiff: Feature Reuse for Exquisite Zero-shot Acceleration of Diffusion Models

Junhyuk So,Jungwon Lee,Eunhyeok Park

Compressor summary: The paper introduces FRDiff, a technique that uses feature reuse and reduced score function evaluations to speed up diffusion models without compromising quality.

Kandinsky 3.0 Technical Report

Vladimir Arkhipkin,Andrei Filatov,Viacheslav Vasilev,Anastasia Maltseva,Said Azizov,Igor Pavlov,Julia Agafonova,Andrey Kuznetsov,Denis Dimitrov

Compressor summary: The paper introduces Kandinsky 3.0, an improved text-to-image generation model with a larger backbone, a larger encoder, and no diffusion mapping, which enhances quality and domain adaptability.

Towards Sobolev Training

Neil Kichler,Sher Afghan,Uwe Naumann

Compressor summary: The paper proposes a new method to create accurate and efficient surrogate models for complex phenomena by using sensitivity information during learning and pruning, which can be applied beyond quantitative finance.

Gravitational cell detection and tracking in fluorescence microscopy data

Nikomidisz Eftimiu,Michal Kozubek

Compressor summary: The paper introduces a new computer vision technique using gravitational force fields for detecting, segmenting, and tracking cells in fluorescence microscopy images, which can be faster and more explainable than machine learning methods.

Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation

Haojie Zhang,Yongyi Su,Xun Xu,Kui Jia

Compressor summary: The paper proposes a new self-training strategy that improves the robustness and efficiency of the image segmentation model SAM under distribution shifts, outperforming existing methods.

Speculative Exploration on the Concept of Artificial Agents Conducting Autonomous Research

Shiro Takagi

Compressor summary: The paper explores the concept of artificial agents capable of conducting research, discussing their core components and challenges, and suggesting prototyping as a first step to overcome them.

Learning From Scenarios for Stochastic Repairable Scheduling

Kim van den Houten,David M. J. Tax,Esteban Freydell,Mathijs de Weerdt

Compressor summary: Decision-focused learning adapts to stochastic scheduling problems with uncertain processing times by using historical realizations and outperforms existing methods.

Exploring Answer Information Methods for Question Generation with Transformers

Talha Chafekar,Aafiya Hussain,Grishma Sharma,Deepak Sharma

Compressor summary: The authors experiment with different methods to incorporate target answers into question generation for Transformer models and find that answer prompting without additional modes performs best.

AMR Parsing is Far from Solved: GrAPES, the Granular AMR Parsing Evaluation Suite

Jonas Groschwitz,Shay B. Cohen,Lucia Donatelli,Meaghan Fowlie

Compressor summary: GrAPES is a challenge set that tests AMR parsers on various aspects of sentence meaning, revealing their strengths and weaknesses.

Molecule Joint Auto-Encoding: Trajectory Pretraining with 2D and 3D Diffusion

Weitao Du,Jiujiu Chen,Xuecang Zhang,Zhiming Ma,Shengchao Liu

Compressor summary: The paper introduces a new method called MoleculeJAE that learns the geometry and topology of molecules using self-supervised learning, improving drug discovery with better geometrical representation.

Search Strategies for Self-driving Laboratories with Pending Experiments

Hao Wen,Jakob Zeitler,Connor Rupnow

Compressor summary: The paper discusses optimizing Bayesian search strategies for self-driving laboratories with asynchronous parallel experiments and delayed feedback.

Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference

Kai Li,Yi Luo

Compressor summary: The paper proposes a new way to train a large neural network and extract smaller subnetworks from it during inference based on size or complexity constraints, which improves performance and reduces training time compared to training separate subnetworks from scratch.

DBCopilot: Scaling Natural Language Querying to Massive Databases

Tianshu Wang,Hongyu Lin,Xianpei Han,Le Sun,Xiaoyang Chen,Hao Wang,Zhenyu Zeng

Compressor summary: DBCopilot is a framework that simplifies database interactions by routing natural language questions through massive databases using a compact neural network router and leveraging large language models for SQL generation.

HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Yuheng Jiang,Zhehao Shen,Penghao Wang,Zhuo Su,Yu Hong,Yingliang Zhang,Jingyi Yu,Lan Xu

Compressor summary: HiFi4G is a technique that uses 3D Gaussians to render realistic human performance from dense footage, enabling efficient compression and non-rigid tracking.

F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis

Sitong Su,Jianzhi Liu,Lianli Gao,Jingkuan Song

Compressor summary: The authors propose F3-Pruning, a training-free and generalized pruning strategy that speeds up inference for large text-to-video (T2V) models without losing quality.

Think from Words (TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level Classification

Chengguang Gan,Qinghao Zhang,Tatsunori Mori

Compressor summary: The study introduces "Think from Words" (TFW) and "TFW with Extra word-level information" (TFW Extra), two methods that aim to improve Large Language Models' (LLMs) text comprehension by starting at the word level and using additional word-level data, and evaluates their effectiveness on six Japanese datasets.

Data-driven Crop Growth Simulation on Time-varying Generated Images using Multi-conditional Generative Adversarial Networks

Lukas Drees,Dereje T. Demie,Madhuri R. Paul,Johannes Leonhardt,Sabine J. Seidel,Thomas F. Döring,Ribana Roscher

Compressor summary: The paper presents a two-stage framework for realistic image prediction and plant phenotyping using conditional Wasserstein generative adversarial networks, which can integrate multiple growth-influencing conditions and help precision agriculture by revealing spatial crop development over time.

High-Quality Facial Geometry and Appearance Capture at Home

Yuxuan Han,Junfeng Lyu,Feng Xu

Compressor summary: This paper presents a new, easy-to-use method for capturing high-quality 3D face scans with skin, hair, eyes, and mouth interior using a single smartphone flashlight sequence in a dim room.

UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity

Jialong Zuo,Hanyu Zhou,Ying Nie,Feng Zhang,Tianyu Guo,Nong Sang,Yunhe Wang,Changxin Gao

Compressor summary: The paper introduces UFineBench, a new benchmark for text-based person retrieval with ultra-fine granularity, and presents a new dataset (UFine6926), an evaluation paradigm (UFine3C), and an efficient algorithm (CFAM) to address the problem of coarse-grained annotations.

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

Youtian Lin,Zuozhuo Dai,Siyu Zhu,Yao Yao

Compressor summary: Gaussian-Flow is a fast point-based approach for dynamic scene reconstruction and real-time rendering from videos using a novel Dual-Domain Deformation Model.

ShareCMP: Polarization-Aware RGB-P Semantic Segmentation

Zhuoyan Liu,Bo Wang,Lizhi Wang,Chenyu Mao,Ye Li

Compressor summary: The ShareCMP framework improves RGB-Polarization semantic segmentation for underwater scenarios with fewer parameters and better performance.

Artist-Friendly Relightable and Animatable Neural Heads

Yingyan Xu,Prashanth Chandran,Sebastian Weiss,Markus Gross,Gaspard Zoss,Derek Bradley

Compressor summary: The paper describes a new method to create realistic and animatable digital heads that can be relit in any environment and perform various expressions.

Run LoRA Run: Faster and Lighter LoRA Implementations

Daria Cherniuk,Aleksandr Mikhalev,Ivan Oseledets

Compressor summary: LoRA is a technique that reduces the number of trainable parameters in a neural network by using low-rank adapters, and the RunLoRA framework makes LoRA implementations faster and lighter.

Compressed Context Memory For Online Language Model Interaction

Jang-Hyun Kim,Junyoung Yeom,Sangdoo Yun,Hyun Oh Song

Compressor summary: The paper introduces a method to compress and store context for Transformer language models in online scenarios, reducing memory and computation while maintaining performance.

Approximating Solutions to the Knapsack Problem using the Lagrangian Dual Framework

Mitchell Keegan,Mahdi Abolghasemi

Compressor summary: The paper proposes neural network models that use the Lagrangian Dual Framework to approximate Knapsack Problem solutions, improving constraint satisfaction while maintaining strong optimization performance.

DeepPyramid+: Medical Image Segmentation using Pyramid View Fusion and Deformable Pyramid Reception

Negin Ghamsarian,Sebastian Wolf,Martin Zinkernagel,Klaus Schoeffmann,Raphael Sznitman

Compressor summary: DeepPyramid+ is a neural network architecture that tackles various challenges in medical image and surgical video segmentation using Pyramid View Fusion and Deformable Pyramid Reception modules.

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

Hongyang Li,Yang Li,Huijie Wang,Jia Zeng,Pinlong Cai,Huilin Xu,Dahua Lin,Junchi Yan,Feng Xu,Lu Xiong,Jingdong Wang,Futang Zhu,Kai Yan,Chunjing Xu,Tiancai Wang,Beipeng Mu,Shaoqing Ren,Zhihui Peng,Yu Qiao

Compressor summary: The paper presents a comprehensive review of over seventy open-source autonomous driving datasets, assessing their characteristics and challenges for the evolution of the industry ecosystem.

SVQ: Sparse Vector Quantization for Spatiotemporal Forecasting

Chao Chen,Tian Zhou,Yanjun Zhao,Hui Liu,Liang Sun,Rong Jin

Compressor summary: SVQ is a sparse vector quantization method that improves spatiotemporal forecasting tasks by balancing details and noise reduction using a two-layer MLP and a randomly fixed or learnable matrix, achieving state-of-the-art results in various fields.

Generalized Contrastive Divergence: Joint Training of Energy-Based Model and Diffusion Model through Inverse Reinforcement Learning

Sangwoong Yoon,Dohyun Kwon,Himchan Hwang,Yung-Kyun Noh,Frank C. Park

Compressor summary: GCD trains an energy-based model and a sampler together, generalizing Contrastive Divergence by using a trainable sampler instead of MCMC, and can improve both models' performance.

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

Ivan Rodin,Antonino Furnari,Kyle Min,Subarna Tripathi,Giovanni Maria Farinella

Compressor summary: Egocentric Action Scene Graphs (EASGs) are a new way to understand long egocentric videos, using graphs to describe actions, objects, and relationships over time.

An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network

Taeyoung Kim,Hongseok Yang

Compressor summary: The paper extends infinite-width analysis to the Jacobian of deep neural networks, showing that MLPs and their Jacobians converge to Gaussian processes in the infinite-width limit.

A Text-to-Text Model for Multilingual Offensive Language Identification

Tharindu Ranasinghe,Marcos Zampieri

Compressor summary: The paper introduces pre-trained encoder-decoder models for offensive language detection that outperform existing transformer-based models and achieve state-of-the-art results in multiple languages.

Riemannian Complex Matrix Convolution Network for PolSAR Image Classification

Junfei Shi,Wei Wang,Haiyan Jin,Mengmeng Nie,Shanshan Ji

Compressor summary: The paper proposes a new deep learning method for PolSAR image classification that directly uses the complex matrix as input, learns its structure in Riemannian space, and improves performance over existing methods.

Evaluating the point cloud of individual trees generated from images based on Neural Radiance fields (NeRF) method

Hongyu Huang,Guoji Tian,Chongcheng Chen

Compressor summary: The study uses Neural Radiance Fields (NeRF) to reconstruct three-dimensional trees from two-dimensional images, showing its efficiency and adaptability but with lower resolution and accuracy compared to photogrammetric methods.

Lazy-k: Decoding for Constrained Token Classification

Arthur Hemmer,Mickaël Coustaty,Nicola Bartolo,Jérôme Brachat,Jean-Marc Ogier

Compressor summary: The authors study how to improve probabilistic models for information extraction by combining them with constrained decoding methods, proposing a new method called Lazy-k, and showing its benefits over existing approaches.

KhabarChin: Automatic Detection of Important News in the Persian Language

Hamed Hematian Hemati,Arash Lagzian,Moein Salimi Sartakhti,Hamid Beigy,Ehsaneddin Asgari

Compressor summary: The paper introduces KhabarChin, a new dataset for detecting important news in the Persian language, and proposes learning-based models to tackle this task.

Teaching Specific Scientific Knowledge into Large Language Models through Additional Training

Kan Hatakeyama-Sato,Yasuhiko Igarashi,Shun Katakami,Yuta Nabae,Teruaki Hayakawa

Compressor summary: The paper uses additional training to embed specialized scientific knowledge into a large language model, addressing challenges such as text scarcity and hyperparameter optimization.

RING-NeRF: A Versatile Architecture based on Residual Implicit Neural Grids

Doriand Petit,Steve Bourgeois,Dumitru Pavel,Vincent Gay-Bellile,Florian Chabot,Loic Barthe

Compressor summary: The RING-NeRF architecture uses Residual Implicit Neural Grids to control the level of detail and achieve fast training and state-of-the-art performance in 3D reconstruction and new view synthesis tasks.

PointMoment: Mixed-Moment-based Self-Supervised Representation Learning for 3D Point Clouds

Xin Cao,Xinxin Han,Yifan Wang,Mengna Yang,Kang Li

Compressor summary: PointMoment is a novel self-supervised representation learning framework for point clouds that uses a high-order mixed moment loss function to reduce feature redundancy and improve downstream tasks such as 3D point cloud classification and segmentation.

Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild

Ke Alexander Wang,Emily B. Fox

Compressor summary: Key points:
- Paper proposes hybrid variational autoencoder to learn interpretable representations of CGM and meal data for diabetes
- Latent space grounded to mechanistic differential equation inputs, reflecting physiological quantities
- Novel method to infer glucose appearance rate from unreliable meal logs
- Unsupervised representation discovers separation between individuals proportional to disease severity
- Embeddings produce clusters better than other features
Summary: The paper presents a new method to learn interpretable and accurate representations of CGM and meal data for diabetes using a hybrid variational autoencoder that connects latent space to physiological quantities and infers glucose appearance rate. The method reveals disease severity and outperforms other features.

Topic and genre in dialogue

Amandine Decker,Ellen Breitholtz,Christine Howes,Staffan Larsson

Compressor summary: The paper proposes separating and defining genre and topic concepts to improve dialogue system flexibility and reliability.

Online Vectorized HD Map Construction using Geometry

Zhixin Zhang,Yiyuan Zhang,Xiaohan Ding,Fusheng Jin,Xiangyu Yue

Compressor summary: GeMap is a method that learns Euclidean shapes and relations of map instances beyond basic perception, achieving state-of-the-art performance on two datasets.

PointJEM: Self-supervised Point Cloud Understanding for Reducing Feature Redundancy via Joint Entropy Maximization

Xin Cao,Huan Xia,Xinxin Han,Yifan Wang,Kang Li,Linzhi Su

Compressor summary: PointJEM is a self-supervised point cloud representation learning method that reduces feature redundancy using joint entropy and performs well in downstream tasks.

Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit Communities

Aaron J. Snoswell,Lucinda Nelson,Hao Xue,Flora D. Salim,Nicolas Suzor,Jean Burgess

Compressor summary: The paper argues that generic toxicity classifiers are not suitable for measuring misogyny in natural language generation and proposes using a misogyny-specific lexicon instead.

Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation

Xiaobo Hu,Youfang Lin,HeHe Fan,Shuo Wang,Zhihao Wu,Kai Lv

Compressor summary: The paper proposes a Category Relation Graph (CRG) to learn object layout knowledge and a Temporal-Spatial-Region (TSR) attention architecture to capture object dependencies for visual navigation.

GCFA: Geodesic Curve Feature Augmentation via Shape Space Theory

Yuexing Han,Guanxin Wan,Bing Wang

Compressor summary: The authors propose Geodesic curve feature augmentation (GCFA), a method that projects image features into a shape space and generates new features along a geodesic curve, improving data preprocessing for deep learning models in small sample environments.

Background Clustering Pre-training for Few-shot Segmentation

Zhimiao Yu,Tiancheng Lin,Yi Xu

Compressor summary: The paper proposes Background Clustering Pre-Training, a new pre-training method for few-shot segmentation that separates novel classes from the background and uses clustering and base classes to improve performance.

Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift

Saurabh Garg,Amrith Setlur,Zachary Chase Lipton,Sivaraman Balakrishnan,Virginia Smith,Aditi Raghunathan

Compressor summary: This paper investigates how combining self-training and contrastive learning improves unsupervised domain adaptation and semi-supervised learning, with varying success depending on the setting.

Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation

Wonjun Lee,Gary Geunbae Lee,Yunsu Kim

Compressor summary: The authors propose a method to improve speech recognition in low-resource languages by enhancing phoneme recognition and translation models with articulatory characteristics and realistic noise generation.

Benchmarking Continual Learning from Cognitive Perspectives

Xiaoqian Liu,Junge Zhang,Mingyi Zhang,Peipei Yang

Compressor summary: The paper proposes a unified evaluation paradigm for continual learning models based on cognitive properties supporting human continual learning, such as adaptability, sensitivity to task variations, and efficiency.

Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique

Ilya Tyagin,Ilya Safro

Compressor summary: The paper introduces Dyport, a new benchmarking system for evaluating biomedical hypothesis generation systems using curated datasets and dynamic graphs to assess both accuracy and impact.

DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction

Yanlong Li,Chamara Madarasingha,Kanchana Thilakarathna

Compressor summary: DiffPMAE is a self-supervised learning method for point cloud reconstruction that combines masked autoencoding with a diffusion model, outperforming many existing techniques on various tasks.

Enhancing Molecular Property Prediction via Mixture of Collaborative Experts

Xu Yao,Shuang Liang,Songqiao Han,Hailiang Huang

Compressor summary: The GNN-MoCE architecture uses a mixture of collaborative experts to predict biochemical properties from molecular graphs, addressing data scarcity and imbalance in the Molecular Property Prediction task by exploiting task commonalities and enhancing expert diversity and influence.

OMNIINPUT: A Model-centric Evaluation Framework through Output Distribution

Weitang Liu,Ying Wai Li,Tianle Wang,Yi-Zhuang You,Jingbo Shang

Compressor summary: The OmniInput framework evaluates an AI/ML model's quality on all possible inputs by using a self-constructed test set and analyzing its output distribution.

Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym

Junjie Sheng,Zixiao Huang,Chuyun Shen,Wenhao Li,Yun Hua,Bo Jin,Hongyuan Zha,Xiangfeng Wang

Compressor summary: The authors investigate if language agents can be alternatives to PPO agents in sequential decision-making tasks by using the TextGym simulator, introducing different levels of scenarios, and proposing a novel EXE agent.

Class Incremental Learning for Adversarial Robustness

Seungju Cho,Hongshin Lee,Changick Kim

Compressor summary: The study proposes ARCIL, a method that combines adversarial robustness and incremental learning, and introduces FPD and LAD losses to address the loss of robustness in this setting, achieving significantly better results than existing methods on three datasets.

STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition

Nguyen Huu Bao Long

Compressor summary: The paper proposes a new method for skeleton-based action recognition using graph convolutional networks, cross-attention modules, and temporal attention transformers that outperforms previous methods on two datasets.

Indirect Gradient Matching for Adversarial Robust Distillation

Hongsin Lee,Seungju Cho,Changick Kim

Compressor summary: The paper proposes a new technique, IGDM, to improve adversarial robustness by transferring input gradient knowledge from a teacher model to a student model, which enhances the performance of existing adversarial distillation methods without additional data augmentation.

Anomaly Detection for Scalable Task Grouping in Reinforcement Learning-based RAN Optimization

Jimmy Li,Igor Kozlov,Di Wu,Xue Liu,Gregory Dudek

Compressor summary: The paper proposes a scalable framework using reinforcement learning and anomaly detection to optimize cellular RAN across many cell sites with varying traffic patterns, efficiently using computational resources.

SO-NeRF: Active View Planning for NeRF using Surrogate Objectives

Keifer Lee,Shubham Gupta,Sunglyoung Kim,Bhargav Makwana,Chao Chen,Chen Feng

Compressor summary: SOAR is a method for selecting good views for NeRF using interpretable functions and a learned network, improving speed and quality compared to baselines.

f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization

Sina Baharlouei,Shivam Patel,Meisam Razaviyayn

Compressor summary: The paper proposes a stochastic optimization framework for fair machine learning that works with small data batches, has convergence guarantees, and performs well on both training and test data distribution shifts.

CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models

Hailin Zhang,Zirui Liu,Boxuan Chen,Yikai Zhao,Tong Zhao,Tong Yang,Bin Cui

Compressor summary: CAFE is a new compression framework for Deep Learning Recommendation Models that uses HotSketch to capture feature importance and hash embedding for non-hot features, achieving better performance than existing methods.

Seller-side Outcome Fairness in Online Marketplaces

Zikun Ye,Reza Yousefi Maragheh,Lalitesh Morishetti,Shanu Vashishtha,Jason Cho,Kaushiki Nag,Sushant Kumar,Kannan Achan

Compressor summary: The paper proposes an optimization model and a gradient-based algorithm to improve seller fairness in online marketplaces by balancing recommendation rewards and a fairness metric.

Customizable Combination of Parameter-Efficient Modules for Multi-Task Learning

Haowen Wang,Tao Sun,Cong Fan,Jinjie Gu

Compressor summary: C-Poly is a novel approach for improving neural networks' knowledge organization, leading to better cross-task generalization using customized skills and shared skills learned with low-rank techniques.

Multicoated and Folded Graph Neural Networks with Strong Lottery Tickets

Jiale Yan,Hiroaki Ito,Ángel López García-Arias,Yasuyuki Okoshi,Hikari Otsuka,Kazushi Kawamura,Thiem Van Chu,Masato Motomura

Compressor summary: This paper explores subnetworks in graph neural networks (GNNs) using scalar pruning mask methods, discovering untrained recurrent networks with high performance and reducing memory usage by up to 98.7%.

Deep Multimodal Fusion for Surgical Feedback Classification

Rafal Kocielnik,Elyssa Y. Wong,Timothy N. Chu,Lydia Lin,De-An Huang,Jiayun Wang,Anima Anandkumar,Andrew J. Hung

Compressor summary: This paper develops a machine learning model to classify five types of surgical feedback (Anatomic, Technical, Procedural, Praise, Visual Aid) from text, audio, and video inputs to help improve surgical training.

Human Body Model based ID using Shape and Pose Parameters

Aravind Sundaresan,Brian Burns,Indranil Sur,Yi Yao,Xiao Lin,Sujeong Kim

Compressor summary: The HMID system, trained with shape, pose, and biometric losses, improves biometric identification performance on raw images of human bodies in various conditions.

Rethinking Object Saliency Ranking: A Novel Whole-flow Processing Paradigm

Mengke Song,Linfeng Li,Dunquan Wu,Wenfeng Song,Chenglizhao Chen

Compressor summary: The paper presents a new method for ranking salient objects by importance order, addressing challenges in existing methods such as ill-defined ground truth, multi-task conflicts, and complex model designs.

Predicting Scores of Various Aesthetic Attribute Sets by Learning from Overall Score Labels

Heng Huang,Xin Jin,Yaqi Liu,Hao Lou,Chaoen Xiao,Shuai Cui,Xinning Li,Dongqing Zou

Compressor summary: The paper proposes a new model, F2S, that predicts image aesthetic attributes using feature extractors instead of labels, enabling the learning of meaningful attribute scores from overall scores.

Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization

Yuanshi Liu,Hanzhen Zhao,Yang Xu,Pengyun Yue,Cong Fang

Compressor summary: This paper proposes adaptive gradient-based algorithms that improve complexities for machine learning problems by refining the description of degenerated conditions using two factors and addressing the limitations of existing optimization modeling and analysis.

SDSRA: A Skill-Driven Skill-Recombination Algorithm for Efficient Policy Learning

Eric H. Jiang,Andrew Lizarraga

Compressor summary: The paper presents a new algorithm, SDSRA, that improves efficiency and policy quality in reinforcement learning tasks by combining skill-based strategies with a robust Actor-Critic framework.

Bootstrap Your Own Variance

Polina Turishcheva,Jason Ramapuram,Sinead Williamson,Dan Busbridge,Eeshan Dhekane,Russ Webb

Compressor summary: BYOV combines self-supervised learning and Bayesian methods to estimate uncertainty in model predictions, outperforming a deterministic baseline and providing preliminary evidence of its usefulness.

Constrained Bayesian Optimization Under Partial Observations: Balanced Improvements and Provable Convergence

Shengbo Wang,Ke Li

Compressor summary: The paper proposes an efficient and provable method for solving expensive partially observable constrained optimization problems using improved acquisition functions and a surrogate model that better represents feasible regions.

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Felix Wimbauer,Bichen Wu,Edgar Schoenfeld,Xiaoliang Dai,Ji Hou,Zijian He,Artsiom Sanakoyeu,Peizhao Zhang,Sam Tsai,Jonas Kohler,Christian Rupprecht,Daniel Cremers,Peter Vajda,Jialiang Wang

Compressor summary: The authors propose block caching, a technique that reuses outputs from previous layer blocks in diffusion models to speed up image generation while maintaining high visual quality.

Satellite Imagery and AI: A New Era in Ocean Conservation, from Research to Deployment and Impact

Patrick Beukema,Favyen Bastani,Piper Wolters,Henry Herzog,Joe Ferdinando

Compressor summary: The paper introduces three specialized computer vision models for satellite data to monitor global IUU fishing and presents best practices for real-time maritime conservation using the Skylight platform.

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Shijie Zhou,Haoran Chang,Sicheng Jiang,Zhiwen Fan,Zehao Zhu,Dejia Xu,Pradyumna Chari,Suya You,Zhangyang Wang,Achuta Kadambi

Compressor summary: The paper presents a method to extend NeRF's functionality for semantic tasks using 3D Gaussian Splatting and 2D foundation models, while addressing speed and quality issues.

Domain Invariant Representation Learning and Sleep Dynamics Modeling for Automatic Sleep Staging

Seungyeon Lee,Thai-Hoang Pham,Zhao Cheng,Ping Zhang

Compressor summary: The study introduces a neural network model called DREAM that improves automatic sleep staging by learning domain generalized representations from diverse physiological signals and modeling sleep dynamics, outperforming existing methods on three datasets.

Detecting Rumor Veracity with Only Textual Information by Double-Channel Structure

Alex Kim,Sangwon Yoon

Compressor summary: The authors propose a double-channel model for classifying rumors on social media as true, false, or unverifiable based on their informativeness and use it to achieve state-of-the-art results on a dataset.

Corporate Bankruptcy Prediction with Domain-Adapted BERT

Alex Kim,Sangwon Yoon

Compressor summary: The study uses BERT to analyze company disclosures and improve bankruptcy prediction by enhancing the input dataset quality, achieving high accuracy rates.