arxiv compressed, 2024-01-02

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-02, generated by the compressor, my personal LLM-based project.


Principled Gradient-based Markov Chain Monte Carlo for Text Generation

Li Du,Afra Amini,Lucas Torroba Hennigen,Xinyan Velocity Yu,Jason Eisner,Holden Lee,Ryan Cotterell

http://arxiv.org/abs/2312.17710v1

Compressor summary: The paper proposes new faithful gradient-based sampling algorithms for energy-based text generation that improve fluency and adherence to control objectives.
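As a rough illustration of the family of samplers involved (not the paper's text-specific algorithm), Metropolis-adjusted Langevin dynamics proposes gradient-informed moves and then accepts or rejects them so the chain remains faithful to the energy-based distribution. A minimal sketch on a toy 1-D quadratic energy:

```python
import math
import random

def mala_sample(grad_e, energy, x0, step=0.1, n=5000, seed=0):
    """Metropolis-adjusted Langevin: propose a gradient-informed move,
    then accept/reject so the chain stays faithful to exp(-energy)."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n):
        # Langevin proposal: gradient step plus Gaussian noise
        noise = rng.gauss(0.0, math.sqrt(2 * step))
        prop = x - step * grad_e(x) + noise

        def log_q(a, b):  # log-density (up to a constant) of proposing a from b
            return -((a - b + step * grad_e(b)) ** 2) / (4 * step)

        # Metropolis correction keeps the sampler exact (the "faithful" part)
        log_alpha = (energy(x) - energy(prop)) + log_q(x, prop) - log_q(prop, x)
        if math.log(rng.random()) < log_alpha:
            x = prop
        samples.append(x)
    return samples

# Target: standard Gaussian, energy E(x) = x^2 / 2
samples = mala_sample(grad_e=lambda x: x, energy=lambda x: x * x / 2, x0=3.0)
mean = sum(samples) / len(samples)
```

The accept/reject step is what distinguishes a faithful sampler from plain (unadjusted) Langevin dynamics, which only approximates the target.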


TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models

Felipe Oliveira,Victoria Reis,Nelson Ebecken

http://arxiv.org/abs/2312.17704v1

Compressor summary: TuPy-E is a large annotated Brazilian Portuguese corpus for hate speech detection, released open source and accompanied by a comprehensive analysis of models such as BERT.


Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization

Ioanna Ntinou,Enrique Sanchez,Georgios Tzimiropoulos

http://arxiv.org/abs/2312.17686v1

Compressor summary: The paper proposes a single-stage method for action localization using a vision transformer with bipartite matching loss, improving performance and speed over two-stage methods.
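The core of a bipartite matching loss is assigning each prediction to at most one ground-truth instance by solving a minimum-cost assignment. A brute-force sketch of that assignment step (illustrative only; real implementations use the Hungarian algorithm, e.g. scipy.optimize.linear_sum_assignment):

```python
from itertools import permutations

def min_cost_assignment(cost):
    """Find the one-to-one prediction->ground-truth assignment with the
    lowest total cost by trying every permutation (fine for tiny inputs)."""
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best_perm, best

# cost[i][j]: mismatch between prediction i and ground truth j
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.6],
    [0.5, 0.9, 0.1],
]
perm, total = min_cost_assignment(cost)
```

Once matched, the loss is computed only between matched pairs, which removes the need for a second (post-processing) stage.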


FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

Feng Liang,Bichen Wu,Jialiang Wang,Licheng Yu,Kunpeng Li,Yinan Zhao,Ishan Misra,Jia-Bin Huang,Peizhao Zhang,Peter Vajda,Diana Marculescu

http://arxiv.org/abs/2312.17681v1

Compressor summary: The paper proposes a video-to-video synthesis framework that leverages spatial and temporal information to maintain consistency across video frames, while being flexible, efficient, and high-quality.


Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models

Kay Liu,Hengrui Zhang,Ziqing Hu,Fangxin Wang,Philip S. Yu

http://arxiv.org/abs/2312.17679v1

Compressor summary: GODM is a novel data augmentation method that uses diffusion models to generate synthetic graph data for supervised graph outlier detection, mitigating class imbalance issues.


Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

Kaiyuan Yang,Fabio Musio,Yihui Ma,Norman Juchler,Johannes C. Paetzold,Rami Al-Maskari,Luciano Höher,Hongwei Bran Li,Ibrahim Ethem Hamamci,Anjany Sekuboyina,Suprosanna Shit,Houjing Huang,Diana Waldmannstetter,Florian Kofler,Fernando Navarro,Martin Menten,Ivan Ezhov,Daniel Rueckert,Iris Vos,Ynte Ruigrok,Birgitta Velthuis,Hugo Kuijf,Julien Hämmerli,Catherine Wurster,Philippe Bijlenga,Laura Westphal,Jeroen Bisschop,Elisa Colombo,Hakim Baazaoui,Andrew Makmur,James Hallinan,Bene Wiestler,Jan S. Kirschke,Roland Wiest,Emmanuel Montagnon,Laurent Letourneau-Guillon,Adrian Galdran,Francesco Galati,Daniele Falcetta,Maria A. Zuluaga,Chaolong Lin,Haoran Zhao,Zehan Zhang,Sinyoung Ra,Jongyun Hwang,Hyunjin Park,Junqiang Chen,Marek Wodzinski,Henning Müller,Pengcheng Shi,Wei Liu,Ting Ma,Cansu Yalçin,Rachika E. Hamadache,Joaquim Salvi,Xavier Llado,Uma Maria Lal-Trehan Estrada,Valeriia Abramova,Luca Giancardo,Arnau Oliver,Jialu Liu,Haibin Huang,Yue Cui,Zehang Lin,Yusheng Liu,Shunzhi Zhu,Tatsat R. Patel,Vincent M. Tutino,Maysam Orouskhani,Huayu Wang,Mahmud Mossa-Basha,Chengcheng Zhu,Maximilian R. Rokuss,Yannick Kirchhoff,Nico Disch,Julius Holzschuh,Fabian Isensee,Klaus Maier-Hein,Yuki Sato,Sven Hirsch,Susanne Wegener,Bjoern Menze

http://arxiv.org/abs/2312.17670v1

Compressor summary: The TopCoW Challenge 2023 aimed to improve the characterization of the Circle of Willis (a network of brain arteries) using a public dataset with annotated images from MRA and CTA modalities, attracting over 140 participants worldwide.


AIJack: Security and Privacy Risk Simulator for Machine Learning

Hideaki Takahashi

http://arxiv.org/abs/2312.17667v1

Compressor summary: AIJack is an open-source library that helps evaluate security and privacy risks in machine learning models.


Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale

Hao Zhang,Shuaijie Zhang

http://arxiv.org/abs/2312.17663v1

Compressor summary: The article introduces a new bounding box regression method that considers the shape and scale of the boxes themselves, improving object detection performance and achieving state-of-the-art results in various tasks.
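For context, plain IoU between two axis-aligned boxes is the quantity that Shape-IoU re-weights by box shape and scale (the exact weighting is paper-specific and not reproduced here):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (width/height clamp to 0 when boxes are disjoint)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Shape-aware variants add penalty terms on top of this base quantity so that boxes of the same IoU but different aspect ratios or scales are regressed differently.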


Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models

Yuqing Wang,Yun Zhao

http://arxiv.org/abs/2312.17661v1

Compressor summary: Gemini is a multimodal large language model that performs well in complex commonsense reasoning tasks across different domains and modalities.


Normalization of Lithuanian Text Using Regular Expressions

Pijus Kasparaitis

http://arxiv.org/abs/2312.17660v2

Compressor summary: The paper presents a taxonomy for Lithuanian language semiotic classes, rules for detecting and expanding non-standard words, and evaluates their accuracy in different data sets.
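Regex-driven normalization of this kind detects non-standard words (digits, abbreviations) and expands them to spoken form. A minimal English-language sketch of the detect-and-expand pattern (the paper's actual rules target Lithuanian and are far richer; the tables below are toy examples):

```python
import re

# Hypothetical toy expansion tables -- real systems use per-class rules
ABBREVIATIONS = {"dr.": "doctor", "kg": "kilograms"}
DIGITS = {"1": "one", "2": "two", "3": "three"}

def normalize(text):
    """Expand abbreviations and lone digits into standard words."""
    def expand(match):
        token = match.group(0)
        return ABBREVIATIONS.get(token.lower(), DIGITS.get(token, token))
    # One alternation per semiotic class: abbreviations, then digits
    pattern = r"(?:\bdr\.)|\bkg\b|\b[123]\b"
    return re.sub(pattern, expand, text, flags=re.IGNORECASE)
```

Each semiotic class in the taxonomy would contribute its own detection pattern and expansion rule in the same way.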


Solar Radiation Prediction in the UTEQ based on Machine Learning Models

Jordy Anchundia Troncoso,Ángel Torres Quijije,Byron Oviedo,Cristian Zambrano-Vega

http://arxiv.org/abs/2312.17659v1

Compressor summary: The research compares various machine learning algorithms for predicting solar radiation at UTEQ using meteorological variables and finds Gradient Boosting and Random Forest to be the best performers.


Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang,Li Chen,Yanan Sun,Hongyang Li

http://arxiv.org/abs/2312.17655v1

Compressor summary: The paper introduces ViDAR, a pre-training model for visual autonomous driving that uses visual point cloud forecasting to capture semantics, 3D geometry, and temporal dynamics and improves downstream tasks.


Bridging Modality Gap for Visual Grounding with Effective Cross-modal Distillation

Jiaxi Wang,Wenhui Hu,Xueyang Liu,Beihu Wu,Yuting Qiu,YingYing Cai

http://arxiv.org/abs/2312.17648v1

Compressor summary: EpmVG is a framework that uses cross-modal distillation to align images and texts in a multimodal pre-trained model for better visual grounding.


Research on the Laws of Multimodal Perception and Cognition from a Cross-cultural Perspective -- Taking Overseas Chinese Gardens as an Example

Ran Chen,Xueqi Yao,Jing Zhao,Shuhan Xu,Sirui Zhang,Yijun Mao

http://arxiv.org/abs/2312.17642v1

Compressor summary: The study analyzes perceptual and cognitive interactions in overseas Chinese gardens using social media data, deep learning, and multi-agent systems, revealing new insights into aesthetic experience and cultural communication.


Decision-focused predictions via pessimistic bilevel optimization: a computational study

Víctor Bucarey,Sophia Calderón,Gonzalo Muñoz,Frederic Semet

http://arxiv.org/abs/2312.17640v1

Compressor summary: This paper proposes a method to optimize decisions under uncertainty by minimizing regret, and shows its advantages on shortest-path problems.


XAI for In-hospital Mortality Prediction via Multimodal ICU Data

Xingqiao Li,Jindong Gu,Zhiyong Wang,Yancheng Yuan,Bo Du,Fengxiang He

http://arxiv.org/abs/2312.17624v1

Compressor summary: The paper proposes X-MMP, a multimodal AI system for predicting ICU mortality that provides explainability and visualization of its decisions and features.


Large Language Models for Generative Information Extraction: A Survey

Derong Xu,Wei Chen,Wenjun Peng,Chao Zhang,Tong Xu,Xiangyu Zhao,Xian Wu,Yefeng Zheng,Enhong Chen

http://arxiv.org/abs/2312.17617v1

Compressor summary: The paper systematically reviews recent advances in using large language models for generative information extraction and offers insights and future directions.


One-Shot Multi-Rate Pruning of Graph Convolutional Networks

Hichem Sahbi

http://arxiv.org/abs/2312.17615v1

Compressor summary: The paper proposes a novel method to create lightweight Graph Convolutional Networks by jointly training network topology and weights using a variational approach that aligns the weight distribution with an a priori distribution, achieving better performance in skeleton-based recognition tasks, especially at high pruning rates.


P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion

Linlian Jiang,Pan Chen,Ye Wang,Tieru Wu,Rui Ma

http://arxiv.org/abs/2312.17611v1

Compressor summary: The P2M2-Net framework uses text prompts to guide a Transformer network in completing missing regions of 3D point clouds with controllable and diverse results.


Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Dongfang Li,Baotian Hu,Qingcai Chen,Shan He

http://arxiv.org/abs/2312.17591v1

Compressor summary: REGEX is a method to improve text classification model explanations by enhancing robustness and similarity between attention and feature attributions.


Interpretable and Explainable Machine Learning Methods for Predictive Process Monitoring: A Systematic Literature Review

Nijat Mehdiyev,Maxim Majlatow,Peter Fettke

http://arxiv.org/abs/2312.17584v1

Compressor summary: The paper reviews literature on making machine learning models in predictive process mining explainable and interpretable, discussing challenges, methods, and future directions.


Action-Item-Driven Summarization of Long Meeting Transcripts

Logan Golia,Jugal Kalita

http://arxiv.org/abs/2312.17581v1

Compressor summary: The paper presents a new algorithm that generates meeting summaries based on action items, divides transcripts into topic-based sections, and outperforms the current state-of-the-art model by 4.98%.


Informative Rays Selection for Few-Shot Neural Radiance Fields

Marco Orsingher,Anthony Dell'Eva,Paolo Zani,Paolo Medici,Massimo Bertozzi

http://arxiv.org/abs/2312.17561v1

Compressor summary: KeyNeRF is a method for training NeRF with few input views by selecting key informative rays using view and pixel-level selection algorithms.


A Fully Automated Pipeline Using Swin Transformers for Deep Learning-Based Blood Segmentation on Head CT Scans After Aneurysmal Subarachnoid Hemorrhage

Sergio Garcia Garcia,Santiago Cepeda,Ignacio Arrese,Rosario Sarabia

http://arxiv.org/abs/2312.17553v1

Compressor summary: The researchers developed and validated an artificial intelligence tool that can automatically segment blood in subarachnoid hemorrhage patients on CT scans, improving accuracy and efficiency.


Building Efficient Universal Classifiers with Natural Language Inference

Moritz Laurer,Wouter van Atteveldt,Andreu Casas,Kasper Welbers

http://arxiv.org/abs/2312.17543v1

Compressor summary: The paper introduces a BERT-like universal classifier based on NLI that can do any text classification task without fine-tuning or few-shot learning, and shares the code for building it.
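The NLI-as-universal-classifier trick rephrases each candidate label as a hypothesis ("This text is about {label}") and picks the label whose hypothesis the model entails most strongly. A sketch of the reduction itself, with the entailment scorer left as a pluggable callable (a toy keyword scorer stands in here for a real NLI model):

```python
def nli_classify(text, labels, entail_score, template="This text is about {}."):
    """Zero-shot classification via NLI: score premise/hypothesis pairs
    and return the label with the highest entailment score."""
    scored = {
        label: entail_score(premise=text, hypothesis=template.format(label))
        for label in labels
    }
    return max(scored, key=scored.get), scored

# Toy stand-in for a real NLI model's entailment probability
def keyword_entail(premise, hypothesis):
    words = set(premise.lower().split())
    label = hypothesis.rsplit(" ", 1)[-1].rstrip(".")
    return 1.0 if label.lower() in words else 0.0

label, scores = nli_classify(
    "Fans cheered the sports team after the match",
    labels=["sports", "politics"],
    entail_score=keyword_entail,
)
```

Because the label set is supplied at inference time, the same NLI model covers arbitrary classification tasks without fine-tuning.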


Distance Guided Generative Adversarial Network for Explainable Binary Classifications

Xiangyu Xiong,Yue Sun,Xiaohong Liu,Wei Ke,Chan-Tong Lam,Jiangang Chen,Mingfeng Jiang,Mingwei Wang,Hui Xie,Tong Tong,Qinquan Gao,Hao Chen,Tao Tan

http://arxiv.org/abs/2312.17538v1

Compressor summary: The paper proposes a new data augmentation method called DisGAN, which generates diverse samples in the hyperplane space for binary classification using vertical and horizontal distances.


Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs

Shaojie Zhu,Zhaobin Wang,Chengxiang Zhuo,Hui Lu,Bo Hu,Zang Li

http://arxiv.org/abs/2312.17535v1

Compressor summary: Olapa-MCoT is an LLM that improves chain-of-thought and Chinese mathematical reasoning with the SimRRHF algorithm and data relearning, achieving a 36% improvement over llama2-13B.


Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception

Yuncheng Huang,Qianyu He,Jiaqing Liang,Sihang Jiang,Yanghua Xiao,Yunwen Chen

http://arxiv.org/abs/2312.17532v1

Compressor summary: The authors present a framework to improve LLMs' quantitative reasoning by enhancing their dimension perception, which is crucial for understanding quantities with units and improving performance on related benchmarks.


RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation

Weiying Xie,Zixuan Wang,Jitao Ma,Daixun Li,Yunsong Li

http://arxiv.org/abs/2312.17530v1

Compressor summary: RS-DGC is a dynamic gradient compression technique for distributed deep learning in remote sensing applications that leverages neighborhood statistics to sparsify gradients and reduce communication costs while maintaining performance.
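Gradient compression schemes of this family typically transmit only the largest-magnitude gradient entries and accumulate the rest locally as a residual. A generic top-k sparsification sketch (RS-DGC's neighborhood statistics for choosing what to keep are not reproduced here):

```python
def topk_sparsify(grads, k):
    """Keep the k largest-magnitude gradient entries, zero the rest;
    return the sparse gradient and the residual to feed back next step."""
    idx = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)[:k]
    keep = set(idx)
    sparse = [g if i in keep else 0.0 for i, g in enumerate(grads)]
    residual = [0.0 if i in keep else g for i, g in enumerate(grads)]
    return sparse, residual

sparse, residual = topk_sparsify([0.5, -2.0, 0.1, 1.5], k=2)
```

Feeding the residual into the next step's gradients preserves the dropped information over time, which is why aggressive sparsification can keep accuracy close to the dense baseline.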


Noise-free Optimization in Early Training Steps for Image Super-Resolution

MinKyu Lee,Jae-Pil Heo

http://arxiv.org/abs/2312.17526v1

Compressor summary: The paper investigates deep-learning-based single image super-resolution methods and proposes a new optimization method that improves their stability and performance by estimating the optimal centroid of high-resolution images and removing inherent noise.


Overview of the PromptCBLUE Shared Task in CHIP2023

Wei Zhu,Xiaoling Wang,Mosha Chen,Buzhou Tang

http://arxiv.org/abs/2312.17522v1

Compressor summary: The paper introduces a shared task that tests Chinese large language models in medical natural language processing using two tracks: prompt tuning and in-context learning.


Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting

Raquel Espinosa,Fernando Jiménez,José Palma

http://arxiv.org/abs/2312.17517v1

Compressor summary: The paper proposes a novel feature selection method for time series forecasting using LSTM networks, an evolutionary algorithm, and ensemble learning, which improves generalization and reduces overfitting.


Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game

Zijing Shi,Meng Fang,Shunfeng Zheng,Shilong Deng,Ling Chen,Yali Du

http://arxiv.org/abs/2312.17515v1

Compressor summary: The study explores how large language models can collaborate in ad hoc teamwork scenarios using CodeAct, an agent that improves communication and adaptability by combining memory and code-driven reasoning.


Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation

Tuan-Anh Vu,Duc Thanh Nguyen,Qing Guo,Binh-Son Hua,Nhat Minh Chung,Ivor W. Tsang,Sai-Kit Yeung

http://arxiv.org/abs/2312.17505v1

Compressor summary: The paper proposes a text-to-image diffusion model that leverages cross-domain features for camouflaged object segmentation, outperforming existing methods on benchmark datasets.


HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning

Hao Wang,Bo Tang,Chi Harold Liu,Shangqin Mao,Jiahong Zhou,Zipeng Dai,Yaqi Sun,Qianlong Xie,Xingxing Wang,Dong Wang

http://arxiv.org/abs/2312.17503v1

Compressor summary: The paper introduces HiBid, a hierarchical offline DRL framework for cross-channel constrained bidding that pairs a high-level budget-allocation planner with a low-level executor using a CPC-guided action selection mechanism, outperforming six baselines and deployed on the Meituan advertising platform.


Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction

Xiaohua Lu,Liangxu Xie,Lei Xu,Rongzhi Mao,Shan Chang,Xiaojun Xu

http://arxiv.org/abs/2312.17495v1

Compressor summary: The text describes a multimodal deep learning model that predicts molecular properties better than mono-modal models by using different representations of drug molecules and fusion methods.


QGFace: Quality-Guided Joint Training For Mixed-Quality Face Recognition

Youzhe Song,Feng Wang

http://arxiv.org/abs/2312.17494v1

Compressor summary: The paper proposes a novel method for mixed-quality face recognition that applies different learning methods to HQ and LQ images, using classification-based methods for HQ data and self-supervised contrastive learning for LQ data.


HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping

Xin Zhang,Jinheng Xie,Yuan Yuan,Michael Bi Mi,Robby T. Tan

http://arxiv.org/abs/2312.17492v1

Compressor summary: HEAP is a novel framework that uses cross-attention and contrastive losses to group patches into regions for efficient hierarchical image decomposition and improved object discovery and differentiation.


Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning

Zhongzhi Chen,Xingwu Sun,Xianfeng Jiao,Fengzong Lian,Zhanhui Kang,Di Wang,Cheng-Zhong Xu

http://arxiv.org/abs/2312.17484v1

Compressor summary: Truth Forest is a method that makes LLMs more truthful by uncovering hidden truth representations with orthogonal probes and the Random Peek technique, improving performance on the TruthfulQA dataset.


MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

Jacob Portes,Alex Trott,Sam Havens,Daniel King,Abhinav Venigalla,Moin Nadeem,Nikhil Sardana,Daya Khudia,Jonathan Frankle

http://arxiv.org/abs/2312.17482v1

Compressor summary: MosaicBERT is a fast, optimized BERT-style encoder architecture that allows efficient pretraining with minimal costs.


Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning

Nigini Oliveira,Jasmine Li,Koosha Khalvati,Rodolfo Cortes Barragan,Katharina Reinecke,Andrew N. Meltzoff,Rajesh P. N. Rao

http://arxiv.org/abs/2312.17479v1

Compressor summary: The authors propose using inverse reinforcement learning to help AI agents learn the cultural values and norms of the community they operate in by observing human behavior in a virtual world.


Exploring the Sensitivity of LLMs' Decision-Making Capabilities: Insights from Prompt Variation and Hyperparameters

Manikanta Loya,Divya Anand Sinha,Richard Futrell

http://arxiv.org/abs/2312.17476v1

Compressor summary: The study examines how variations in prompts and hyperparameters affect large language models' decision making abilities, finding that they can exhibit a human-like exploration-exploitation tradeoff.


EHR Interaction Between Patients and AI: NoteAid EHR Interaction

Xiaocheng Zhang,Zonghai Yao,Hong Yu

http://arxiv.org/abs/2312.17475v1

Compressor summary: The paper presents an approach using generative large language models to help patients understand their Electronic Health Records by providing explanations and answering questions, and evaluates its performance on two novel tasks.


FerKD: Surgical Label Adaptation for Efficient Distillation

Zhiqiang Shen

http://arxiv.org/abs/2312.17473v1

Compressor summary: FerKD is a novel knowledge distillation framework that improves convergence speed and accuracy by calibrating less-confident regions, mixing similar image regions, and using hard ground truth labels.


Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift

Benjamin Eyre,Elliot Creager,David Madras,Vardan Papyan,Richard Zemel

http://arxiv.org/abs/2312.17463v1

Compressor summary: The paper proposes a simple spectral adaptation method to improve the performance of neural regression models on out-of-distribution data.


Tracking with Human-Intent Reasoning

Jiawen Zhu,Zhi-Qi Cheng,Jun-Yan He,Chenyang Li,Bin Luo,Huchuan Lu,Yifeng Geng,Xuansong Xie

http://arxiv.org/abs/2312.17448v1

Compressor summary: The paper proposes a new tracking task called Instruction Tracking, which uses a Large Vision-Language Model to provide implicit tracking instructions and achieve competitive performance on referring video object segmentation benchmarks.


ClST: A Convolutional Transformer Framework for Automatic Modulation Recognition by Knowledge Distillation

Dongbin Hou,Lixin Li,Wensheng Lin,Junli Liang,Zhu Han

http://arxiv.org/abs/2312.17446v1

Compressor summary: The paper proposes a new neural network (ClST) and a knowledge distillation method (SKD) to improve automatic modulation recognition (AMR) using deep learning, especially on miniaturized devices.


SMoT: Think in State Machine

Jia Liu,Jie Shuai

http://arxiv.org/abs/2312.17445v1

Compressor summary: SMoT is a prompting paradigm that guides LLM reasoning with predefined state machines instead of relying on the model's autonomous (and error-prone) exploration of reasoning paths, adds a multi-agent mechanism that assigns different objectives to agents, and reaches 98% accuracy on an array reasoning task.
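The state-machine idea constrains each reasoning step to the transitions a predefined machine allows, rejecting moves the machine does not permit instead of exploring them. A minimal sketch with a hypothetical machine that accepts alternating digit/operator token sequences:

```python
def run_state_machine(transitions, start, accept, tokens):
    """Follow predefined transitions; reject any token the machine
    does not allow from the current state (no fruitless exploration)."""
    state = start
    for tok in tokens:
        key = (state, "digit" if tok.isdigit() else "op")
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accept

# Hypothetical machine accepting alternating sequences like "1 + 2"
transitions = {
    ("expect_digit", "digit"): "expect_op",
    ("expect_op", "op"): "expect_digit",
}
ok = run_state_machine(transitions, "expect_digit", {"expect_op"}, "1 + 2".split())
```

In SMoT's setting the "tokens" are reasoning steps proposed by the LLM, and the machine prunes any step that leaves the valid transition set.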


Video Understanding with Large Language Models: A Survey

Yunlong Tang,Jing Bi,Siting Xu,Luchuan Song,Susan Liang,Teng Wang,Daoan Zhang,Jie An,Jingyang Lin,Rongyi Zhu,Ali Vosoughi,Chao Huang,Zeliang Zhang,Feng Zheng,Jianguo Zhang,Ping Luo,Jiebo Luo,Chenliang Xu

http://arxiv.org/abs/2312.17432v1

Compressor summary: This survey summarizes recent advancements in using large language models for video understanding, exploring their capabilities, types, tasks, datasets, applications, and limitations.


Commonsense for Zero-Shot Natural Language Video Localization

Meghana Holla,Ismini Lourentzou

http://arxiv.org/abs/2312.17429v1

Compressor summary: CORONET is a framework that uses commonsense reasoning to improve zero-shot Natural Language-Video Localization by bridging the gap between videos and generated pseudo-queries with Graph Convolution Networks and cross-attention mechanisms.


ChangeNet: Multi-Temporal Asymmetric Change Detection Dataset

Deyi Ji,Siqi Gao,Mingyuan Tao,Hongtao Lu,Feng Zhao

http://arxiv.org/abs/2312.17428v1

Compressor summary: ChangeNet is a large-scale practical-oriented dataset for multi-temporal change detection with realistic perspective distortions and six annotated categories, covering various complex scenes from 100 cities.


Context-based Transfer and Efficient Iterative Learning for Unbiased Scene Graph Generation

Qishen Chen,Xinyu Lyu,Haonan Zhang,Pengpeng Zeng,Lianli Gao,Jingkuan Song

http://arxiv.org/abs/2312.17425v1

Compressor summary: CITrans is a plug-and-play method for scene graph generation that uses context-restricted transfer and efficient iterative learning to improve data transfer and training efficiency.


Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation

Melrose Roderick,Felix Berkenkamp,Fatemeh Sheikholeslami,Zico Kolter

http://arxiv.org/abs/2312.17411v1

Compressor summary: Generative Posterior Networks (GPNs) are a new generative model that uses unlabeled data to estimate epistemic uncertainty in high-dimensional problems by approximating the Bayesian posterior distribution.


Comparing roughness descriptors for distinct terrain surfaces in point cloud data

Lei Fan,Yang Zhao

http://arxiv.org/abs/2312.17407v1

Compressor summary: This study compares five ways of measuring surface roughness and shows that using multiple methods can improve accuracy in analyzing different terrains.
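One of the simplest descriptors in this family is RMS roughness, the root-mean-square deviation of surface heights about their mean; a minimal sketch (the study's five descriptors are more varied than this single example):

```python
import math

def rms_roughness(heights):
    """Root-mean-square deviation of surface heights about their mean --
    a basic roughness descriptor for a terrain patch's point heights."""
    mean = sum(heights) / len(heights)
    return math.sqrt(sum((h - mean) ** 2 for h in heights) / len(heights))
```

Comparing several such descriptors on the same patch is what lets the study quantify where single-descriptor analyses fall short.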


Parameter Optimization with Conscious Allocation (POCA)

Joshua Inman,Tanmay Khandait,Giulia Pedrielli,Lalitha Sankar

http://arxiv.org/abs/2312.17404v1

Compressor summary: POCA is a new hyperband-based algorithm that adaptively allocates budget to hyperparameter configurations using Bayesian sampling, and outperforms its competitors in finding optimal configurations for machine learning models.
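Hyperband-style methods build on successive halving: give many configurations a small budget, keep the best fraction, and repeat with more budget. The core loop, with a toy loss in place of real model training (POCA's Bayesian sampling and adaptive allocation are not reproduced here):

```python
def successive_halving(configs, evaluate, budget=1, eta=2, rounds=3):
    """Repeatedly evaluate surviving configs with a growing budget,
    keeping the top 1/eta fraction each round."""
    survivors = list(configs)
    for _ in range(rounds):
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = scored[: max(1, len(scored) // eta)]
        budget *= eta  # best configs earn more resources next round
    return survivors[0]

# Toy objective: loss is distance to 0.3, independent of budget
best = successive_halving(
    configs=[0.1, 0.25, 0.3, 0.6, 0.9, 0.45, 0.05, 0.8],
    evaluate=lambda cfg, budget: abs(cfg - 0.3),
)
```

POCA's contribution is in deciding how much budget each configuration deserves, rather than halving uniformly as above.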