arXiv compressed, 2023-12-26

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-26, generated by the compressor, my personal LLM-based project.


MACS: Mass Conditioned 3D Hand and Object Motion Synthesis

Soshi Shimada, Franziska Mueller, Jan Bednarik, Bardia Doosti, Bernd Bickel, Danhang Tang, Vladislav Golyanik, Jonathan Taylor, Christian Theobalt, Thabo Beeler

http://arxiv.org/abs/2312.14929v1

Compressor summary: MACS is a new approach for synthesizing natural 3D hand and object motions based on object mass and interaction type, which can be used for various applications such as generating training data, fast animation, and character interactions in computer games.


A Survey of Reinforcement Learning from Human Feedback

Timo Kaufmann, Paul Weng, Viktor Bengs, Eyke Hüllermeier

http://arxiv.org/abs/2312.14925v1

Compressor summary: RLHF is a technique that learns from human feedback to enhance AI performance and align its objectives with human values, with applications ranging from language models to various other domains.
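
For readers new to the area, the central ingredient in most RLHF pipelines is a reward model trained on pairwise human preferences with a Bradley-Terry objective. Below is a minimal sketch of that loss; the reward_model callable and tensor shapes are illustrative assumptions, not code from the survey.

```python
# Minimal sketch (assumed interface, not from the survey): Bradley-Terry loss
# for training an RLHF reward model on pairwise human preferences.
import torch.nn.functional as F

def reward_model_loss(reward_model, prompts, chosen, rejected):
    """Negative log-likelihood that the preferred response outscores the rejected one."""
    r_chosen = reward_model(prompts, chosen)      # scalar score per example, shape (batch,)
    r_rejected = reward_model(prompts, rejected)  # shape (batch,)
    # P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The policy is then fine-tuned to maximize the learned reward, typically with PPO plus a KL penalty toward the pretrained model to limit reward over-optimization.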


Training Convolutional Neural Networks with the Forward-Forward algorithm

Riccardo Scodellaro, Ajinkya Kulkarni, Frauke Alves, Matthias Schröter

http://arxiv.org/abs/2312.14924v1

Compressor summary: The paper proposes a new training method for Convolutional Neural Networks (CNNs) using the Forward-Forward (FF) algorithm and achieves 99.0% accuracy on the MNIST dataset with a novel labeling technique.
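
For context, the Forward-Forward algorithm replaces backpropagation with a local, per-layer objective on a "goodness" score. The sketch below shows that generic per-layer loss from Hinton's formulation; it illustrates the underlying algorithm, not the CNN architecture or the labeling technique proposed in this paper.

```python
# Minimal sketch (generic Forward-Forward layer objective, not this paper's
# method): each layer is trained locally so that "goodness" (the sum of squared
# activations) is high on positive samples and low on negative samples.
import torch.nn.functional as F

def ff_layer_loss(layer, x_pos, x_neg, threshold=2.0):
    g_pos = layer(x_pos).flatten(1).pow(2).sum(dim=1)  # goodness of positive data
    g_neg = layer(x_neg).flatten(1).pow(2).sum(dim=1)  # goodness of negative data
    # Logistic losses push positive goodness above and negative goodness below the threshold.
    return F.softplus(threshold - g_pos).mean() + F.softplus(g_neg - threshold).mean()
```

In the full algorithm the output of each layer is length-normalized before it is passed to the next layer, so goodness information cannot simply be copied forward.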


Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models

Guihong Li, Hsiang Hsu, Chun-Fu Chen, Radu Marculescu

http://arxiv.org/abs/2312.14923v1

Compressor summary: Fast-NTK is a novel algorithm that allows selective data removal from large-scale neural networks without retraining, reducing computational complexity by incorporating parameter-efficient fine-tuning methods.


A Novel Sampled Clustering Algorithm for Rice Phenotypic Data

Mithun Singh, Kapil Ahuja, Milind B. Ratnaparkhe

http://arxiv.org/abs/2312.14920v1

Compressor summary: The authors improve a spectral clustering algorithm for rice species by modifying the similarity matrix construction and scaling factor, resulting in better accuracy and speed compared to hierarchical clustering.


Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers

James Gunn, Zygmunt Lenyk, Anuj Sharma, Andrea Donati, Alexandru Buburuzan, John Redford, Romain Mueller

http://arxiv.org/abs/2312.14919v1

Compressor summary: The paper proposes a novel fusion method for autonomous driving that bypasses monocular depth estimation and uses attention to select and fuse camera and lidar features in a bird's-eye-view grid, leading to better 3D object detection.


PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF

Mohsen Gholami, Rabab Ward, Z. Jane Wang

http://arxiv.org/abs/2312.14915v1

Compressor summary: PoseGen uses NeRFs to generate diverse 3D human pose datasets, which improve the robustness of pre-trained pose estimators when applied to out-of-distribution samples.


FAST: Feature Aware Similarity Thresholding for Weak Unlearning in Black-Box Generative Models

Subhodip Panda, Prathosh AP

http://arxiv.org/abs/2312.14895v1

Compressor summary: The text discusses the need for precise control over deep generative models, especially when they generate harmful content, and proposes a method called FAST that filters out unwanted features in black-box systems.


Theory of Hallucinations based on Equivariance

Hisaichi Shibata

http://arxiv.org/abs/2312.14504v1

Compressor summary: The text proposes a new theory that links insufficient equivariance in language models to hallucinations, and presents a novel technique based on the T5 model to test this theory on a toy model.


ViStripformer: A Token-Efficient Transformer for Versatile Video Restoration

Fu-Jen Tsai, Yan-Tsung Peng, Chen-Yu Chang, Chan-Yu Li, Yen-Yu Lin, Chung-Chi Tsai, Chia-Wen Lin

http://arxiv.org/abs/2312.14502v1

Compressor summary: ViStripformer is a video restoration method that uses strip attention to capture spatial and temporal information, outperforming traditional transformers in efficiency and effectiveness.


Hutchinson Trace Estimation for High-Dimensional and High-Order Physics-Informed Neural Networks

Zheyuan Hu, Zekun Shi, George Em Karniadakis, Kenji Kawaguchi

http://arxiv.org/abs/2312.14499v1

Compressor summary: The text introduces Hutchinson Trace Estimation (HTE), which improves the performance of Physics-Informed Neural Networks (PINNs) when solving high-dimensional and high-order partial differential equations (PDEs) by reducing computational cost and memory consumption.
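
For context, the Hutchinson estimator approximates a matrix trace with random probe vectors, which lets a Laplacian-style term in a PDE residual be computed from Hessian-vector products instead of the full Hessian. The sketch below shows the classical estimator for the trace of a Hessian; it illustrates the standard technique with assumed function names, not the paper's HTE implementation or its extension to higher-order operators.

```python
# Minimal sketch (classical Hutchinson estimator, not the paper's HTE variant):
# tr(H) ≈ (1/M) * sum_i v_i^T H v_i, with Rademacher probes v_i and H the Hessian
# of a scalar function f at x. Each v^T H v needs only a Hessian-vector product.
import torch

def hutchinson_hessian_trace(f, x, num_probes=64):
    x = x.detach().requires_grad_(True)
    grad = torch.autograd.grad(f(x), x, create_graph=True)[0]
    estimate = 0.0
    for _ in range(num_probes):
        v = (torch.randint(0, 2, x.shape) * 2 - 1).to(x.dtype)  # Rademacher ±1 probe
        hvp = torch.autograd.grad(grad, x, grad_outputs=v, retain_graph=True)[0]
        estimate = estimate + (v * hvp).sum()                    # v^T H v
    return estimate / num_probes
```

Because the estimator is unbiased, it can stand in for an exact Laplacian inside a PINN loss, trading a small amount of variance for large savings in memory and compute as the dimension grows.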


Context Enhanced Transformer for Single Image Object Detection

Seungjun An, Seonghoon Park, Gyeongnyeon Kim, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim

http://arxiv.org/abs/2312.14492v1

Compressor summary: The paper proposes CETR, a single image object detection method that incorporates temporal context from videos using a memory module.


Language Model is a Branch Predictor for Simultaneous Machine Translation

Aoxiong Yin, Tianyun Zhong, Haoyuan Li, Siliang Tang, Zhou Zhao

http://arxiv.org/abs/2312.14488v1

Compressor summary: The paper proposes using branch prediction techniques from CPUs to reduce translation latency in simultaneous machine translation, while preserving quality by predicting future source words and decoding output accordingly.


Part to Whole: Collaborative Prompting for Surgical Instrument Segmentation

Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, Zongyuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang

http://arxiv.org/abs/2312.14481v1

Compressor summary: The paper proposes a new method, SP-SAM, for segmenting surgical instruments using text prompts and joint visual embeddings to better understand instrument structures and categories.


MonoLSS: Learnable Sample Selection For Monocular 3D Detection

Zhenjia Li, Jinrang Jia, Yifeng Shi

http://arxiv.org/abs/2312.14474v1

Compressor summary: The paper proposes a method to improve monocular 3D detection of cars, cyclists, and pedestrians by selecting suitable samples adaptively using a Learnable Sample Selection module and enriching data with MixUp3D.


Not All Tasks Are Equally Difficult: Multi-Task Reinforcement Learning with Dynamic Depth Routing

Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

http://arxiv.org/abs/2312.14472v1

Compressor summary: Dynamic Depth Routing (D2R) is a framework that learns to flexibly adjust the number of modules used for different tasks in multi-task reinforcement learning, improving data efficiency and performance on robotics manipulation tasks.


Prototype-based Cross-Modal Object Tracking

Lei Liu, Chenglong Li, Futian Wang, Longfeng Shen, Jin Tang

http://arxiv.org/abs/2312.14471v1

Compressor summary: ProtoTrack is a cross-modal object tracker that adapts to target appearance variations using multi-modal prototypes generated with novel algorithms.


Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration

Honghao Wei, Xin Liu, Lei Ying

http://arxiv.org/abs/2312.14470v1

Compressor summary: The paper proposes a safe Reinforcement Learning algorithm that handles hard instantaneous constraints without knowing a safe action set or a safe graph, and works for general cost functions in Reproducing Kernel Hilbert Space.


FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection

Dongmei Zhang, Chang Li, Ray Zhang, Shenghao Xie, Wei Xue, Xiaodong Xie, Shanghang Zhang

http://arxiv.org/abs/2312.14465v1

Compressor summary: The paper proposes FM-OV3D, a method that blends knowledge from multiple pre-trained foundation models to improve open-vocabulary 3D detection tasks without dataset constraints.


How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection?

Soumya Suvra Ghosal, Yiyou Sun, Yixuan Li

http://arxiv.org/abs/2312.14452v1

Compressor summary: The paper proposes a new OOD detection method called Subspace Nearest Neighbor (SNN) that uses subspace learning to reduce the curse-of-dimensionality and improve distance-based detection.
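
For background, the distance-based baseline that SNN refines scores a test sample by its distance to the k-th nearest training feature in embedding space. A minimal sketch of that standard kNN OOD score follows; it does not include the subspace-learning step that is the paper's contribution.

```python
# Minimal sketch (standard kNN-based OOD scoring on L2-normalized features,
# not SNN's subspace variant): a larger distance to the k-th nearest training
# feature means the sample is more likely out-of-distribution.
import numpy as np

def knn_ood_score(train_feats, test_feat, k=50):
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    z = test_feat / np.linalg.norm(test_feat)
    dists = np.linalg.norm(train - z, axis=1)   # distances to all training features
    return np.sort(dists)[k - 1]                # k-th nearest-neighbor distance
```

A sample is then flagged as OOD when this score exceeds a threshold chosen, for example, to retain 95% of in-distribution validation data.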


PUMA: Efficient Continual Graph Learning with Graph Condensation

Yilun Liu, Ruihong Qiu, Yanran Tang, Hongzhi Yin, Zi Huang

http://arxiv.org/abs/2312.14439v1

Compressor summary: The paper proposes PUMA, a memory bank for graph representation learning that condenses both labelled and unlabelled nodes and uses training-from-scratch and propagation to improve efficiency and effectiveness.


PC-Conv: Unifying Homophily and Heterophily with Two-fold Filtering

Bingheng Li, Erlin Pan, Zhao Kang

http://arxiv.org/abs/2312.14438v1

Compressor summary: The paper proposes a two-fold filtering mechanism to extract homophilic information from heterophilic graphs and vice versa, using the graph heat equation and Poisson-Charlier polynomials, and applies it to node classification with PCNet.


Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning

Jay Shenoy, Axel Levy, Frédéric Poitevin, Gordon Wetzstein

http://arxiv.org/abs/2312.14432v1

Compressor summary: X-RAI is a new online framework for reconstructing 3D structures of biomolecules from large X-ray free-electron laser datasets, enabling real-time capture and analysis of fleeting states under near-physiological conditions.


A Unified Industrial Large Knowledge Model Framework in Smart Manufacturing

Jay Lee, Hanqi Su

http://arxiv.org/abs/2312.14428v1

Compressor summary: This paper introduces ILKMs, a framework for applying large language models in Industry 4.0 and smart manufacturing by incorporating domain-specific knowledge, and compares them with LLMs from eight perspectives.


GROOD: GRadient-aware Out-Of-Distribution detection in interpolated manifolds

Mostafa ElAraby, Sabyasachi Sahoo, Yann Pequignot, Paul Novello, Liam Paull

http://arxiv.org/abs/2312.14427v1

Compressor summary: GROOD is a novel framework that uses gradient space and class prototypes to distinguish between in-distribution and out-of-distribution samples in image classification tasks, improving robustness against over-confident predictions.


Efficacy of Machine-Generated Instructions

Samaksh Gulati, Anshit Verma, Manoj Parmar, Palash Chaudhary

http://arxiv.org/abs/2312.14423v1

Compressor summary: The study shows that machine-generated annotations can be a more efficient alternative to human-written instructions for fine-tuning language models.


Enhancing Actionable Formal Concept Identification with Base-Equivalent Conceptual-Relevance

Ayao Bobi, Rokia Missaoui, Mohamed Hamza Ibrahim

http://arxiv.org/abs/2312.14421v1

Compressor summary: The paper proposes a new measure, BECR, for identifying important concepts in large data sets using formal concept analysis, based on the number of base and equivalent attributes and minimal generators per concept intent.


A Multi-Stage Adaptive Feature Fusion Neural Network for Multimodal Gait Recognition

Shinan Zou, Jianbo Xiong, Chao Fan, Shiqi Yu, Jin Tang

http://arxiv.org/abs/2312.14410v1

Compressor summary: The paper introduces a novel multimodal gait recognition algorithm that exploits the complementary advantages of multiple modalities using a multi-stage feature fusion strategy, an adaptive feature fusion module, and a multiscale spatial-temporal feature extractor.


AdvCloak: Customized Adversarial Cloak for Privacy Protection

Xuannan Liu, Yaoyao Zhong, Xing Cui, Yuhang Zhang, Peipei Li, Weihong Deng

http://arxiv.org/abs/2312.14407v1

Compressor summary: AdvCloak is a framework that protects privacy by automatically generating personalized adversarial masks for faces using generative models, achieving high naturalness and generalization ability.


Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection

Ze Yu Zhao, Zheng Zhu, Guilin Li, Wenhan Wang, Bo Wang

http://arxiv.org/abs/2312.14406v1

Compressor summary: The authors propose a novel GPT-based autoregressive model that detects fraud in payment systems by reconstructing behavioral transaction sequences without labeled data through unsupervised pretraining, handling token explosion, and enhancing anomaly detection with a differential convolutional approach applicable across various transactional contexts.


Graph Attention-Based Symmetry Constraint Extraction for Analog Circuits

Qi Xu, Lijie Wang, Jing Wang, Song Chen, Lin Cheng, Yi Kang

http://arxiv.org/abs/2312.14405v1

Compressor summary: The paper proposes a graph-based learning framework to automatically extract symmetry constraints in analog circuit layout, improving performance and reducing runtime compared to existing methods.


Cross-Covariate Gait Recognition: A Benchmark

Shinan Zou, Chao Fan, Jianbo Xiong, Chuanfu Shen, Shiqi Yu, Jin Tang

http://arxiv.org/abs/2312.14404v1

Compressor summary: The paper presents a new large and diverse gait dataset (CCGR) and proposes a parsing-based approach for cross-covariate gait recognition, which is a challenging but important problem in gait research.


The Fairness Fair: Bringing Human Perception into Collective Decision-Making

Hadi Hosseini

http://arxiv.org/abs/2312.14402v1

Compressor summary: This text discusses the importance of studying fairness in collective decision-making from various perspectives, including human perception, cognition, and interaction with AI, to better capture its complexities in real-world problems.


Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances

Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Ehsan Abbasnejad, Hamed Damirchi, Ignacio M. Jara, Felipe Bravo-Marquez, Anton van den Hengel

http://arxiv.org/abs/2312.14400v1

Compressor summary: The paper investigates how different neural architectures perform with CLIP and proposes a method to combine their predictions for better image classification.


Unsupervised Deep Learning Image Verification Method

Enoch Solomon, Abraham Woubie, Eyael Solomon Emiru

http://arxiv.org/abs/2312.14395v1

Compressor summary: The paper proposes a method to improve face verification using an autoencoder that converts face vectors into a novel representation by reconstructing neighboring face vectors based on cosine similarity, achieving a 56% relative improvement in equal error rate (EER) over the baseline system.


AdapTraj: A Multi-Source Domain Generalization Framework for Multi-Agent Trajectory Prediction

Tangwen Qian, Yile Chen, Gao Cong, Yongjun Xu, Fei Wang

http://arxiv.org/abs/2312.14394v1

Compressor summary: AdapTraj is a new framework for multi-agent trajectory prediction that leverages multiple source domains and uses a causal formulation to model domain-invariant and domain-specific features, improving performance over existing methods.


StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors

Wanchao Su, Can Wang, Chen Liu, Hangzhou Han, Hongbo Fu, Jing Liao

http://arxiv.org/abs/2312.14389v1

Compressor summary: StyleRetoucher is a novel automatic portrait image retouching framework that uses StyleGAN's generation and generalization ability to improve skin condition while preserving facial details, outperforming existing solutions.


Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation

Chaowei Fang, Ziyin Zhou, Junye Chen, Hanjing Su, Qingyao Wu, Guanbin Li

http://arxiv.org/abs/2312.14387v1

Compressor summary: The paper proposes a new method to improve point-based interactive image segmentation by refining the initial mask with consistent inferences and target-preserving zooming, achieving state-of-the-art results on various datasets.


Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

http://arxiv.org/abs/2312.14378v1

Compressor summary: MAM is a method that transfers knowledge from text and image models to speech and audio models using attention matrices, improving their performance on downstream tasks.


Learning Socio-Temporal Graphs for Multi-Agent Trajectory Prediction

Yuke Li, Lixiong Chen, Guangyi Chen, Ching-Yao Chan, Kun Zhang, Stefano Anzellotti, Donglai Wei

http://arxiv.org/abs/2312.14373v1

Compressor summary: The paper proposes STGformer, an attention-based model that captures pair-wise socio-temporal interactions among pedestrians using Directed Acyclic Graphs and achieves state-of-the-art prediction accuracy in trajectory prediction.


Training Neural Networks with Internal State, Unconstrained Connectivity, and Discrete Activations

Alexander Grushin

http://arxiv.org/abs/2312.14359v1

Compressor summary: The paper explores the possibility of training machine learning models with internal state using binary activations and few layers, and proposes a new algorithm to do so, while discussing its limitations and potential benefits.


Don't Believe Everything You Read: Enhancing Summarization Interpretability through Automatic Identification of Hallucinations in Large Language Models

Priyesh Vakharia, Devavrat Joshi, Meenal Chavan, Dhananjay Sonawane, Bhrigu Garg, Parsa Mazaheri, Ian Lane

http://arxiv.org/abs/2312.14346v1

Compressor summary: The paper investigates LLM hallucinations, proposes a token-level approach to identify them, and applies it to improve dialogue summarization's interpretability and reliability.


Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs

Behnam Rahdari, Hao Ding, Ziwei Fan, Yifei Ma, Zhuotong Chen, Anoop Deoras, Branislav Kveton

http://arxiv.org/abs/2312.14345v1

Compressor summary: Logic-Scaffolding is a framework that uses aspect-based explanation and chain-of-thought prompting to help Large Language Models generate reliable zero-shot explanations through intermediate reasoning steps.