This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-26, generated by the compressor, my personal LLM-based project.
Soshi Shimada,Franziska Mueller,Jan Bednarik,Bardia Doosti,Bernd Bickel,Danhang Tang,Vladislav Golyanik,Jonathan Taylor,Christian Theobalt,Thabo Beeler
http://arxiv.org/abs/2312.14929v1
Compressor summary: MACS is a new approach for synthesizing natural 3D hand and object motions based on object mass and interaction type, which can be used for various applications such as generating training data, fast animation, and character interactions in computer games.
Timo Kaufmann,Paul Weng,Viktor Bengs,Eyke Hüllermeier
http://arxiv.org/abs/2312.14925v1
Compressor summary: RLHF is a technique that learns from human feedback to enhance AI performance and align its objectives with human values, with applications ranging from language models to various other domains.
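As a quick illustration of one ingredient behind RLHF (general background, not taken from this paper), here is a minimal numpy sketch of the pairwise Bradley-Terry loss commonly used to fit a reward model from human preference pairs; the reward scores below are hypothetical placeholders.

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise Bradley-Terry loss for fitting a reward model from human
    preference pairs: pushes r(chosen) above r(rejected)."""
    margin = np.asarray(r_chosen) - np.asarray(r_rejected)
    # -log sigmoid(margin), written in a numerically stable form
    return np.mean(np.logaddexp(0.0, -margin))

# Hypothetical reward-model scores for three preference pairs
r_chosen = [2.1, 0.3, 1.5]     # responses the annotators preferred
r_rejected = [0.4, 0.9, -0.2]  # responses the annotators rejected
print(reward_model_loss(r_chosen, r_rejected))
```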
Riccardo Scodellaro,Ajinkya Kulkarni,Frauke Alves,Matthias Schröter
http://arxiv.org/abs/2312.14924v1
Compressor summary: The paper proposes a new training method for Convolutional Neural Networks (CNNs) using the Forward-Forward (FF) algorithm and achieves 99.0% accuracy on the MNIST dataset with a novel labeling technique.
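For background, here is a minimal numpy sketch of the generic Forward-Forward idea (per-layer "goodness" objective trained without backpropagation across layers); it illustrates the algorithm family only, not the paper's CNN architecture or labeling technique.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ff_layer_step(W, x, is_positive, theta=2.0, lr=0.01):
    """One Forward-Forward update for a single layer.

    Goodness = sum of squared ReLU activations; the layer is trained to
    push goodness above `theta` for positive samples and below it for
    negative samples, with no backward pass through other layers.
    """
    h = np.maximum(W @ x, 0.0)             # layer activations
    goodness = np.sum(h ** 2)
    y = 1.0 if is_positive else 0.0
    # d(logistic loss)/d(goodness), and d(goodness)/dW = 2 * outer(h, x)
    dloss_dg = sigmoid(goodness - theta) - y
    W -= lr * dloss_dg * 2.0 * np.outer(h, x)
    return W, goodness

W = rng.normal(scale=0.1, size=(16, 8))
x_pos = rng.normal(size=8)   # stand-in "positive" (real) sample
x_neg = rng.normal(size=8)   # stand-in "negative" (corrupted) sample
for _ in range(100):
    W, _ = ff_layer_step(W, x_pos, is_positive=True)
    W, _ = ff_layer_step(W, x_neg, is_positive=False)
```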
Guihong Li,Hsiang Hsu,Chun-Fu Chen,Radu Marculescu
http://arxiv.org/abs/2312.14923v1
Compressor summary: Fast-NTK is a novel algorithm that allows selective data removal from large-scale neural networks without retraining, reducing computational complexity by incorporating parameter-efficient fine-tuning methods.
Mithun Singh,Kapil Ahuja,Milind B. Ratnaparkhe
http://arxiv.org/abs/2312.14920v1
Compressor summary: The authors improve a spectral clustering algorithm for rice species by modifying the similarity matrix construction and scaling factor, resulting in better accuracy and speed compared to hierarchical clustering.
James Gunn,Zygmunt Lenyk,Anuj Sharma,Andrea Donati,Alexandru Buburuzan,John Redford,Romain Mueller
http://arxiv.org/abs/2312.14919v1
Compressor summary: The paper proposes a novel fusion method for autonomous driving that bypasses monocular depth estimation and uses attention to select and fuse camera and lidar features in a bird's-eye-view grid, leading to better 3D object detection.
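As a generic illustration of attention-based sensor fusion (not the paper's architecture), here is a numpy sketch of cross-attention in which camera bird's-eye-view cells act as queries and lidar cells as keys and values; shapes and features are hypothetical.

```python
import numpy as np

def cross_attention(q, k, v):
    """Standard scaled dot-product cross-attention: each query (e.g. a
    camera BEV cell) attends over all keys/values (e.g. lidar BEV cells)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # (Nq, Nk)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (Nq, d_v)

rng = np.random.default_rng(0)
cam_bev = rng.normal(size=(100, 64))    # hypothetical camera BEV features
lidar_bev = rng.normal(size=(100, 64))  # hypothetical lidar BEV features
fused = cross_attention(cam_bev, lidar_bev, lidar_bev)
print(fused.shape)  # (100, 64)
```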
Mohsen Gholami,Rabab Ward,Z. Jane Wang
http://arxiv.org/abs/2312.14915v1
Compressor summary: PoseGen uses NeRFs to generate diverse 3D human pose datasets, which improve the robustness of pre-trained pose estimators when applied to out-of-distribution samples.
Subhodip Panda,Prathosh AP
http://arxiv.org/abs/2312.14895v1
Compressor summary: The text discusses the need for precise control over deep generative models, especially when they generate harmful content, and proposes a method called FAST that filters out unwanted features in black-box systems.
Hisaichi Shibata
http://arxiv.org/abs/2312.14504v1
Compressor summary: The text proposes a new theory linking insufficient equivariance in language models to hallucinations, and presents a novel T5-based technique to test this theory on a toy model.
Fu-Jen Tsai,Yan-Tsung Peng,Chen-Yu Chang,Chan-Yu Li,Yen-Yu Lin,Chung-Chi Tsai,Chia-Wen Lin
http://arxiv.org/abs/2312.14502v1
Compressor summary: ViStripformer is a video restoration method that uses strip attention to capture spatial and temporal information, outperforming traditional transformers in efficiency and effectiveness.
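A rough numpy sketch of strip-style (axial) attention, where each position attends only within its row and its column of a feature map; this is one plausible reading of strip attention and is cheaper than full global attention, but the actual ViStripformer design may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def strip_attention(feat):
    """Axial/strip-style self-attention over an (H, W, C) feature map:
    attention is computed within horizontal strips (rows) and vertical
    strips (columns) instead of over all HxW positions at once."""
    H, W, C = feat.shape
    out = np.zeros_like(feat)
    # Horizontal strips: attention along the width dimension
    for i in range(H):
        q = feat[i]                              # (W, C)
        attn = softmax(q @ q.T / np.sqrt(C))
        out[i] += attn @ q
    # Vertical strips: attention along the height dimension
    for j in range(W):
        q = feat[:, j]                           # (H, C)
        attn = softmax(q @ q.T / np.sqrt(C))
        out[:, j] += attn @ q
    return out

feat = np.random.default_rng(0).normal(size=(8, 8, 16))
print(strip_attention(feat).shape)  # (8, 8, 16)
```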
Zheyuan Hu,Zekun Shi,George Em Karniadakis,Kenji Kawaguchi
http://arxiv.org/abs/2312.14499v1
Compressor summary: The text introduces Hutchinson Trace Estimation (HTE), which improves the performance of Physics-Informed Neural Networks (PINNs) when solving high-dimensional and high-order partial differential equations (PDEs) by reducing computational cost and memory consumption.
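For context, a self-contained numpy illustration of the underlying Hutchinson trace estimator, tr(A) ≈ E[vᵀAv] with Rademacher probe vectors; in the PINN setting A would be a Hessian accessed through Hessian-vector products rather than an explicit matrix.

```python
import numpy as np

def hutchinson_trace(matvec, dim, num_samples=1000, rng=None):
    """Estimate tr(A) using only matrix-vector products:
    tr(A) = E[v^T A v] for random v with E[v v^T] = I (Rademacher here)."""
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        total += v @ matvec(v)
    return total / num_samples

# Toy check against the exact trace of an explicit symmetric matrix;
# for a PINN one would instead pass a Hessian-vector-product routine.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 50))
A = (A + A.T) / 2
print(hutchinson_trace(lambda v: A @ v, dim=50), np.trace(A))
```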
Seungjun An,Seonghoon Park,Gyeongnyeon Kim,Jeongyeol Baek,Byeongwon Lee,Seungryong Kim
http://arxiv.org/abs/2312.14492v1
Compressor summary: The paper proposes CETR, a single-image object detection method that incorporates temporal context from videos using a memory module.
Aoxiong Yin,Tianyun Zhong,Haoyuan Li,Siliang Tang,Zhou Zhao
http://arxiv.org/abs/2312.14488v1
Compressor summary: The paper proposes using branch prediction techniques from CPUs to reduce translation latency in simultaneous machine translation, while preserving quality by predicting future source words and decoding output accordingly.
Wenxi Yue,Jing Zhang,Kun Hu,Qiuxia Wu,Zongyuan Ge,Yong Xia,Jiebo Luo,Zhiyong Wang
http://arxiv.org/abs/2312.14481v1
Compressor summary: The paper proposes a new method, SP-SAM, for segmenting surgical instruments using text prompts and joint visual embeddings to better understand instrument structures and categories.
Zhenjia Li,Jinrang Jia,Yifeng Shi
http://arxiv.org/abs/2312.14474v1
Compressor summary: The paper proposes a method to improve monocular 3D detection of cars, cyclists, and pedestrians by selecting suitable samples adaptively using a Learnable Sample Selection module and enriching data with MixUp3D.
Jinmin He,Kai Li,Yifan Zang,Haobo Fu,Qiang Fu,Junliang Xing,Jian Cheng
http://arxiv.org/abs/2312.14472v1
Compressor summary: Dynamic Depth Routing (D2R) is a framework that learns to flexibly adjust the number of modules used for different tasks in multi-task reinforcement learning, improving data efficiency and performance on robotics manipulation tasks.
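A toy numpy sketch of soft module routing, in which a per-task gate mixes the outputs of shared modules; this only illustrates the routing idea, and the paper's Dynamic Depth Routing additionally learns how many modules each task should use, which is not modelled here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def routed_forward(x, modules, task_logits):
    """Soft routing: a task-specific gate weights the outputs of a set of
    shared modules, so different tasks can emphasise different modules."""
    gate = softmax(task_logits)                        # (num_modules,)
    outputs = np.stack([m @ x for m in modules])       # (num_modules, d_out)
    return gate @ outputs                              # weighted mixture

rng = np.random.default_rng(0)
modules = [rng.normal(size=(32, 16)) for _ in range(4)]  # shared modules
task_logits = rng.normal(size=4)                         # learned per task
x = rng.normal(size=16)
print(routed_forward(x, modules, task_logits).shape)  # (32,)
```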
Lei Liu,Chenglong Li,Futian Wang,Longfeng Shen,Jin Tang
http://arxiv.org/abs/2312.14471v1
Compressor summary: ProtoTrack is a cross-modal object tracker that adapts to target appearance variations using multi-modal prototypes and generates them with novel algorithms.
Honghao Wei,Xin Liu,Lei Ying
http://arxiv.org/abs/2312.14470v1
Compressor summary: The paper proposes a safe Reinforcement Learning algorithm that handles hard instantaneous constraints without knowing a safe action set or a safe graph, and works for general cost functions in Reproducing Kernel Hilbert Space.
Dongmei Zhang,Chang Li,Ray Zhang,Shenghao Xie,Wei Xue,Xiaodong Xie,Shanghang Zhang
http://arxiv.org/abs/2312.14465v1
Compressor summary: The paper proposes FM-OV3D, a method that blends knowledge from multiple pre-trained foundation models to improve open-vocabulary 3D detection tasks without dataset constraints.
Soumya Suvra Ghosal,Yiyou Sun,Yixuan Li
http://arxiv.org/abs/2312.14452v1
Compressor summary: The paper proposes a new OOD detection method called Subspace Nearest Neighbor (SNN) that uses subspace learning to mitigate the curse of dimensionality and improve distance-based detection.
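A simplified numpy sketch of the general idea of combining subspace projection with nearest-neighbour distances for OOD scoring; the exact SNN construction in the paper may differ, and the features below are hypothetical.

```python
import numpy as np

def fit_subspace(features, k=10):
    """Top-k principal directions of the training features (via SVD)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k]            # (d,), (k, d)

def ood_score(x, train_features, mean, basis):
    """Project everything onto the learned subspace, then use the
    distance to the nearest training sample as the OOD score."""
    z = (x - mean) @ basis.T                      # (k,)
    z_train = (train_features - mean) @ basis.T   # (n, k)
    return np.min(np.linalg.norm(z_train - z, axis=1))

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 64))                 # in-distribution features
mean, basis = fit_subspace(train, k=10)
x_id = rng.normal(size=64)                         # ID-like test feature
x_ood = rng.normal(size=64) + 5.0                  # shifted, OOD-like feature
print(ood_score(x_id, train, mean, basis), ood_score(x_ood, train, mean, basis))
```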
Yilun Liu,Ruihong Qiu,Yanran Tang,Hongzhi Yin,Zi Huang
http://arxiv.org/abs/2312.14439v1
Compressor summary: The paper proposes PUMA, a memory bank for graph representation learning that condenses both labelled and unlabelled nodes and combines training-from-scratch with propagation to improve efficiency and effectiveness.
Bingheng Li,Erlin Pan,Zhao Kang
http://arxiv.org/abs/2312.14438v1
Compressor summary: The paper proposes a two-fold filtering mechanism that extracts homophily from heterophilic graphs and vice versa, using the graph heat equation and Poisson-Charlier polynomials, and applies it to node classification with PCNet.
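For intuition, a small numpy sketch of generic low-pass and high-pass graph filters built from the symmetric normalized Laplacian (smooth signals capture homophily, difference signals capture heterophily); the paper's heat-equation and Poisson-Charlier construction is more elaborate than this.

```python
import numpy as np

def graph_filters(adj, features):
    """Low-pass filtering smooths features over edges (homophily),
    high-pass filtering emphasises differences across edges (heterophily)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    a_norm = d_inv_sqrt @ adj @ d_inv_sqrt           # normalized adjacency
    laplacian = np.eye(adj.shape[0]) - a_norm        # symmetric normalized Laplacian
    low_pass = a_norm @ features                     # (I - L) applied to features
    high_pass = laplacian @ features                 # L applied to features
    return low_pass, high_pass

# Tiny 4-node path graph with 2-D node features
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])
low, high = graph_filters(adj, features)
print(low, high, sep="\n")
```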
Jay Shenoy,Axel Levy,Frédéric Poitevin,Gordon Wetzstein
http://arxiv.org/abs/2312.14432v1
Compressor summary: X-RAI is a new online framework for reconstructing 3D structures of biomolecules from large X-ray free-electron laser datasets, enabling real-time capture and analysis of fleeting states under near-physiological conditions.
Jay Lee,Hanqi Su
http://arxiv.org/abs/2312.14428v1
Compressor summary: This paper introduces ILKMs, a framework for applying large language models in Industry 4.0 and smart manufacturing by incorporating domain-specific knowledge, and compares them with LLMs from eight perspectives.
Mostafa ElAraby,Sabyasachi Sahoo,Yann Pequignot,Paul Novello,Liam Paull
http://arxiv.org/abs/2312.14427v1
Compressor summary: GROOD is a novel framework that uses gradient space and class prototypes to distinguish between in-distribution and out-of-distribution samples in image classification tasks, improving robustness against over-confident predictions.
Samaksh Gulati,Anshit Verma,Manoj Parmar,Palash Chaudhary
http://arxiv.org/abs/2312.14423v1
Compressor summary: The study shows that machine-generated annotations can be a more efficient alternative to human-written instructions for fine-tuning language models.
Ayao Bobi,Rokia Missaoui,Mohamed Hamza Ibrahim
http://arxiv.org/abs/2312.14421v1
Compressor summary: The paper proposes a new measure, BECR, for identifying important concepts in large data sets using formal concept analysis, based on the number of base and equivalent attributes and minimal generators per concept intent.
Shinan Zou,Jianbo Xiong,Chao Fan,Shiqi Yu,Jin Tang
http://arxiv.org/abs/2312.14410v1
Compressor summary: The paper introduces a novel multimodal gait recognition algorithm that exploits the complementary advantages of multiple modalities using a multi-stage feature fusion strategy, an adaptive feature fusion module, and a multiscale spatial-temporal feature extractor.
Xuannan Liu,Yaoyao Zhong,Xing Cui,Yuhang Zhang,Peipei Li,Weihong Deng
http://arxiv.org/abs/2312.14407v1
Compressor summary: AdvCloak is a framework that protects privacy by automatically generating personalized adversarial masks for faces using generative models, achieving high naturalness and generalization ability.
Ze Yu Zhao,Zheng Zhu,Guilin Li,Wenhan Wang,Bo Wang
http://arxiv.org/abs/2312.14406v1
Compressor summary: The authors propose a GPT-based autoregressive model for fraud detection in payment systems that confronts token explosion, reconstructs behavioral sequences without labeled data via unsupervised pretraining, integrates a differential convolutional approach for anomaly detection, and is applicable across various transactional contexts.
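As a schematic illustration (not the paper's scoring rule), here is a numpy sketch of scoring a behavioural sequence by its average negative log-likelihood under an autoregressive model; the per-event probabilities below are hypothetical stand-ins for the model's outputs.

```python
import numpy as np

def sequence_anomaly_score(event_probs):
    """Average negative log-likelihood of a transaction/behaviour sequence
    under an autoregressive model; unusually high values suggest the
    sequence is poorly explained by normal behaviour (possible fraud)."""
    event_probs = np.asarray(event_probs)
    return -np.mean(np.log(np.clip(event_probs, 1e-12, 1.0)))

# Hypothetical per-event probabilities assigned by the model
normal_seq = [0.4, 0.6, 0.5, 0.7]          # well-predicted behaviour
suspicious_seq = [0.4, 0.02, 0.01, 0.6]    # events the model finds surprising
print(sequence_anomaly_score(normal_seq), sequence_anomaly_score(suspicious_seq))
```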
Qi Xu,Lijie Wang,Jing Wang,Song Chen,Lin Cheng,Yi Kang
http://arxiv.org/abs/2312.14405v1
Compressor summary: The paper proposes a graph-based learning framework to automatically extract symmetric constraints in analog circuit layout, improving performance and reducing runtime compared to existing methods.
Shinan Zou,Chao Fan,Jianbo Xiong,Chuanfu Shen,Shiqi Yu,Jin Tang
http://arxiv.org/abs/2312.14404v1
Compressor summary: The paper presents a new large and diverse gait dataset (CCGR) and proposes a parsing-based approach for cross-covariate gait recognition, which is a challenging but important problem in gait research.
Hadi Hosseini
http://arxiv.org/abs/2312.14402v1
Compressor summary: This text discusses the importance of studying fairness in collective decision-making from various perspectives, including human perception, cognition, and interaction with AI, to better capture its complexities in real-world problems.
Cristian Rodriguez-Opazo,Edison Marrese-Taylor,Ehsan Abbasnejad,Hamed Damirchi,Ignacio M. Jara,Felipe Bravo-Marquez,Anton van den Hengel
http://arxiv.org/abs/2312.14400v1
Compressor summary: The paper investigates how different neural architectures perform with CLIP and proposes a method to combine their predictions for better image classification.
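A minimal numpy sketch of one simple way to combine zero-shot predictions from several CLIP backbones by averaging their class probabilities; the logits are hypothetical and the paper's combination rule may be more sophisticated than this.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_zero_shot(logits_per_model):
    """Average the per-class probabilities produced by several CLIP
    backbones (e.g. ResNet and ViT image encoders) and pick the argmax."""
    probs = np.mean([softmax(l) for l in logits_per_model], axis=0)
    return probs.argmax(axis=-1), probs

# Hypothetical image-text similarity logits from two different backbones
logits_vit = np.array([[2.0, 0.5, 0.1]])
logits_rn = np.array([[1.2, 1.4, 0.2]])
pred, probs = ensemble_zero_shot([logits_vit, logits_rn])
print(pred, probs)
```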
Enoch Solomon,Abraham Woubie,Eyael Solomon Emiru
http://arxiv.org/abs/2312.14395v1
Compressor summary: The paper proposes a method to improve face verification using an autoencoder that converts face vectors into a novel representation by reconstructing neighboring face vectors based on cosine similarity, achieving a 56% relative improvement in EER over the baseline system.
Tangwen Qian,Yile Chen,Gao Cong,Yongjun Xu,Fei Wang
http://arxiv.org/abs/2312.14394v1
Compressor summary: AdapTraj is a new framework for multi-agent trajectory prediction that leverages multiple source domains and uses a causal formulation to model domain-invariant and domain-specific features, improving performance over existing methods.
Wanchao Su,Can Wang,Chen Liu,Hangzhou Han,Hongbo Fu,Jing Liao
http://arxiv.org/abs/2312.14389v1
Compressor summary: StyleRetoucher is a novel automatic portrait image retouching framework that uses StyleGAN's generation and generalization ability to improve skin condition while preserving facial details, outperforming existing solutions.
Chaowei Fang,Ziyin Zhou,Junye Chen,Hanjing Su,Qingyao Wu,Guanbin Li
http://arxiv.org/abs/2312.14387v1
Compressor summary: The paper proposes a new method to improve point-based interactive image segmentation by refining the initial mask with consistent inferences and target-preserving zooming, achieving state-of-the-art results on various datasets.
Anirudh S. Sundar,Chao-Han Huck Yang,David M. Chan,Shalini Ghosh,Venkatesh Ravichandran,Phani Sankar Nidadavolu
http://arxiv.org/abs/2312.14378v1
Compressor summary: MAM is a method that transfers knowledge from text and image models to speech and audio models using attention matrices, improving their performance on downstream tasks.
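A schematic numpy sketch of matching a student (speech/audio) attention matrix to a teacher (text/image) attention matrix with a simple MSE objective, as one generic way to transfer attention structure across modalities; it is not the paper's exact MAM formulation.

```python
import numpy as np

def attention_transfer_loss(student_attn, teacher_attn):
    """Mean-squared error between row-stochastic attention matrices,
    encouraging the student model to attend the way the teacher does."""
    return np.mean((student_attn - teacher_attn) ** 2)

def random_attention(n, rng):
    a = rng.random((n, n))
    return a / a.sum(axis=1, keepdims=True)   # rows sum to 1

rng = np.random.default_rng(0)
teacher = random_attention(6, rng)   # e.g. from a pretrained text model
student = random_attention(6, rng)   # e.g. from a speech model in training
print(attention_transfer_loss(student, teacher))
```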
Yuke Li,Lixiong Chen,Guangyi Chen,Ching-Yao Chan,Kun Zhang,Stefano Anzellotti,Donglai Wei
http://arxiv.org/abs/2312.14373v1
Compressor summary: The paper proposes STGformer, an attention-based model that captures pair-wise socio-temporal interactions among pedestrians using Directed Acyclic Graphs and achieves state-of-the-art prediction accuracy in trajectory prediction.
Alexander Grushin
http://arxiv.org/abs/2312.14359v1
Compressor summary: The paper explores the possibility of training machine learning models with internal state using binary activations and only a few layers, proposes a new algorithm for doing so, and discusses its limitations and potential benefits.
Priyesh Vakharia,Devavrat Joshi,Meenal Chavan,Dhananjay Sonawane,Bhrigu Garg,Parsa Mazaheri,Ian Lane
http://arxiv.org/abs/2312.14346v1
Compressor summary: The paper investigates LLM hallucinations, proposes a token-level approach to identify them, and applies it to improve dialogue summarization's interpretability and reliability.
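A toy numpy sketch of one common token-level signal, flagging generated tokens whose probability falls below a threshold; the tokens and probabilities are hypothetical and the paper's detector is more involved than this.

```python
import numpy as np

def flag_low_confidence_tokens(tokens, token_probs, threshold=0.1):
    """Return the tokens whose model probability is below `threshold`,
    a crude token-level proxy for possibly hallucinated content."""
    token_probs = np.asarray(token_probs)
    return [t for t, p in zip(tokens, token_probs) if p < threshold]

# Hypothetical summary tokens and their generation probabilities
tokens = ["The", "meeting", "is", "on", "Friday", "at", "noon"]
probs = [0.9, 0.8, 0.95, 0.7, 0.04, 0.6, 0.05]
print(flag_low_confidence_tokens(tokens, probs))  # ['Friday', 'noon']
```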
Behnam Rahdari,Hao Ding,Ziwei Fan,Yifei Ma,Zhuotong Chen,Anoop Deoras,Branislav Kveton
http://arxiv.org/abs/2312.14345v1
Compressor summary: Logic-Scaffolding is a framework that uses aspect-based explanation and chain-of-thought prompting to help Large Language Models generate reliable zero-shot explanations through intermediate reasoning steps.
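An illustrative, entirely hypothetical prompt template in the spirit of aspect-based, chain-of-thought explanation prompting; the framework's actual prompts are not reproduced here.

```python
# Hypothetical prompt template illustrating aspect-based, chain-of-thought
# style explanation prompting; not the framework's actual prompt.
EXPLANATION_PROMPT = """You are recommending {item} to a user who liked {history}.
Step 1: List the aspects of {item} that matter for this user (e.g. genre, price, style).
Step 2: For each aspect, reason briefly about how it matches the user's history.
Step 3: Combine the reasoning into a short, faithful explanation of the recommendation."""

print(EXPLANATION_PROMPT.format(item="a noise-cancelling headphone",
                                history="compact travel gadgets"))
```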