This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-02, generated by Compressor, my personal LLM-based summarization project.
Li Du,Afra Amini,Lucas Torroba Hennigen,Xinyan Velocity Yu,Jason Eisner,Holden Lee,Ryan Cotterell
http://arxiv.org/abs/2312.17710v1
Compressor summary: The paper proposes new faithful gradient-based sampling algorithms for energy-based text generation that improve fluency and adherence to control objectives.
Felipe Oliveira,Victoria Reis,Nelson Ebecken
http://arxiv.org/abs/2312.17704v1
Compressor summary: TuPy-E is a large annotated Portuguese corpus for hate speech detection that uses an open-source approach and advanced techniques like BERT models.
Ioanna Ntinou,Enrique Sanchez,Georgios Tzimiropoulos
http://arxiv.org/abs/2312.17686v1
Compressor summary: The paper proposes a single-stage method for action localization using a vision transformer with bipartite matching loss, improving performance and speed over two-stage methods.
Feng Liang,Bichen Wu,Jialiang Wang,Licheng Yu,Kunpeng Li,Yinan Zhao,Ishan Misra,Jia-Bin Huang,Peizhao Zhang,Peter Vajda,Diana Marculescu
http://arxiv.org/abs/2312.17681v1
Compressor summary: The paper proposes a video-to-video synthesis framework that leverages spatial and temporal information to maintain consistency across video frames while remaining flexible and efficient and producing high-quality results.
Kay Liu,Hengrui Zhang,Ziqing Hu,Fangxin Wang,Philip S. Yu
http://arxiv.org/abs/2312.17679v1
Compressor summary: GODM is a novel data augmentation method that uses diffusion models to generate synthetic graph data for supervised graph outlier detection, mitigating class imbalance issues.
Kaiyuan Yang,Fabio Musio,Yihui Ma,Norman Juchler,Johannes C. Paetzold,Rami Al-Maskari,Luciano Höher,Hongwei Bran Li,Ibrahim Ethem Hamamci,Anjany Sekuboyina,Suprosanna Shit,Houjing Huang,Diana Waldmannstetter,Florian Kofler,Fernando Navarro,Martin Menten,Ivan Ezhov,Daniel Rueckert,Iris Vos,Ynte Ruigrok,Birgitta Velthuis,Hugo Kuijf,Julien Hämmerli,Catherine Wurster,Philippe Bijlenga,Laura Westphal,Jeroen Bisschop,Elisa Colombo,Hakim Baazaoui,Andrew Makmur,James Hallinan,Bene Wiestler,Jan S. Kirschke,Roland Wiest,Emmanuel Montagnon,Laurent Letourneau-Guillon,Adrian Galdran,Francesco Galati,Daniele Falcetta,Maria A. Zuluaga,Chaolong Lin,Haoran Zhao,Zehan Zhang,Sinyoung Ra,Jongyun Hwang,Hyunjin Park,Junqiang Chen,Marek Wodzinski,Henning Müller,Pengcheng Shi,Wei Liu,Ting Ma,Cansu Yalçin,Rachika E. Hamadache,Joaquim Salvi,Xavier Llado,Uma Maria Lal-Trehan Estrada,Valeriia Abramova,Luca Giancardo,Arnau Oliver,Jialu Liu,Haibin Huang,Yue Cui,Zehang Lin,Yusheng Liu,Shunzhi Zhu,Tatsat R. Patel,Vincent M. Tutino,Maysam Orouskhani,Huayu Wang,Mahmud Mossa-Basha,Chengcheng Zhu,Maximilian R. Rokuss,Yannick Kirchhoff,Nico Disch,Julius Holzschuh,Fabian Isensee,Klaus Maier-Hein,Yuki Sato,Sven Hirsch,Susanne Wegener,Bjoern Menze
http://arxiv.org/abs/2312.17670v1
Compressor summary: The TopCoW Challenge 2023 aimed to improve the characterization of the Circle of Willis (a network of brain arteries) using a public dataset with annotated images from MRA and CTA modalities, attracting over 140 participants worldwide.
Hideaki Takahashi
http://arxiv.org/abs/2312.17667v1
Compressor summary: AIJack is an open-source library that helps evaluate security and privacy risks in machine learning models.
Hao Zhang,Shuaijie Zhang
http://arxiv.org/abs/2312.17663v1
Compressor summary: The article introduces a new bounding box regression method that considers the shape and scale of the boxes themselves, improving object detection performance and achieving state-of-the-art results in various tasks.
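As background for IoU-based bounding-box regression, here is a minimal sketch of plain intersection-over-union for axis-aligned boxes. This is the standard IoU computation, not the shape- and scale-aware loss the paper itself proposes.

```python
# Standard IoU for axis-aligned boxes given as (x1, y1, x2, y2).
# Illustrative background only; the paper's proposed loss additionally
# accounts for the shape and scale of the boxes themselves.

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

IoU-based losses typically minimize `1 - iou(pred, target)`; the paper's contribution is to augment this with terms sensitive to box shape and scale.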
Yuqing Wang,Yun Zhao
http://arxiv.org/abs/2312.17661v1
Compressor summary: Gemini is a multimodal large language model that performs well in complex commonsense reasoning tasks across different domains and modalities.
Pijus Kasparaitis
http://arxiv.org/abs/2312.17660v2
Compressor summary: The paper presents a taxonomy for Lithuanian language semiotic classes, rules for detecting and expanding non-standard words, and evaluates their accuracy in different data sets.
Jordy Anchundia Troncoso,Ángel Torres Quijije,Byron Oviedo,Cristian Zambrano-Vega
http://arxiv.org/abs/2312.17659v1
Compressor summary: The research compares various machine learning algorithms for predicting solar radiation at UTEQ using meteorological variables and finds Gradient Boosting and Random Forest to be the best performers.
Zetong Yang,Li Chen,Yanan Sun,Hongyang Li
http://arxiv.org/abs/2312.17655v1
Compressor summary: The paper introduces ViDAR, a pre-training model for visual autonomous driving that uses visual point cloud forecasting to capture semantics, 3D geometry, and temporal dynamics and improves downstream tasks.
Jiaxi Wang,Wenhui Hu,Xueyang Liu,Beihu Wu,Yuting Qiu,YingYing Cai
http://arxiv.org/abs/2312.17648v1
Compressor summary: EpmVG is a framework that uses cross-modal distillation to align images and texts in a multimodal pre-trained model for better visual grounding.
Ran Chen,Xueqi Yao,Jing Zhao,Shuhan Xu,Sirui Zhang,Yijun Mao
http://arxiv.org/abs/2312.17642v1
Compressor summary: The study analyzes perceptual and cognitive interactions in overseas Chinese gardens using social media data, deep learning, and multi-agent systems, revealing new insights into aesthetic experience and cultural communication.
Víctor Bucarey,Sophia Calderón,Gonzalo Muñoz,Frederic Semet
http://arxiv.org/abs/2312.17640v1
Compressor summary: This paper proposes a method to optimize decisions under uncertainty by minimizing regret, and shows its advantages on shortest-path problems.
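To make the regret notion concrete, here is a toy sketch on a two-path shortest-path instance: regret is the cost of the chosen path minus the cost of the best path under the realized edge costs. The graph, edge costs, and function names are invented for illustration and are not from the paper.

```python
# Toy illustration of decision regret on a shortest-path instance
# (hypothetical example, not the paper's optimization method).

EDGES = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 1, ("b", "t"): 3}
PATHS = [["s", "a", "t"], ["s", "b", "t"]]  # all s->t paths in this tiny DAG

def path_cost(path, costs):
    # Sum the cost of each consecutive edge along the path.
    return sum(costs[(u, v)] for u, v in zip(path, path[1:]))

def regret(chosen, costs):
    # How much worse the chosen path is than the best path in hindsight.
    best = min(path_cost(p, costs) for p in PATHS)
    return path_cost(chosen, costs) - best
```

Here choosing `["s", "b", "t"]` (cost 4) against the optimal `["s", "a", "t"]` (cost 3) incurs a regret of 1; a regret-minimizing decision rule would hedge against the worst-case realization of uncertain edge costs.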
Xingqiao Li,Jindong Gu,Zhiyong Wang,Yancheng Yuan,Bo Du,Fengxiang He
http://arxiv.org/abs/2312.17624v1
Compressor summary: The paper proposes X-MMP, a multimodal AI system for predicting ICU mortality that provides explainability and visualization of its decisions and features.
Derong Xu,Wei Chen,Wenjun Peng,Chao Zhang,Tong Xu,Xiangyu Zhao,Xian Wu,Yefeng Zheng,Enhong Chen
http://arxiv.org/abs/2312.17617v1
Compressor summary: The paper systematically reviews recent advancements in using large language models for information extraction tasks and provides insights and future directions.
Hichem Sahbi
http://arxiv.org/abs/2312.17615v1
Compressor summary: The paper proposes a novel method to create lightweight Graph Convolutional Networks by jointly training network topology and weights using a variational approach that aligns weight distribution with an a priori distribution, achieving better performance in skeleton-based recognition tasks especially at high pruning rates.
Linlian Jiang,Pan Chen,Ye Wang,Tieru Wu,Rui Ma
http://arxiv.org/abs/2312.17611v1
Compressor summary: The P2M2-Net framework uses text prompts to guide a Transformer network in completing missing regions of 3D point clouds with controllable and diverse results.
Dongfang Li,Baotian Hu,Qingcai Chen,Shan He
http://arxiv.org/abs/2312.17591v1
Compressor summary: REGEX is a method to improve text classification model explanations by enhancing robustness and similarity between attention and feature attributions.
Nijat Mehdiyev,Maxim Majlatow,Peter Fettke
http://arxiv.org/abs/2312.17584v1
Compressor summary: The paper reviews literature on making machine learning models in predictive process mining explainable and interpretable, discussing challenges, methods, and future directions.
Logan Golia,Jugal Kalita
http://arxiv.org/abs/2312.17581v1
Compressor summary: The paper presents a new algorithm that generates meeting summaries based on action items, divides transcripts into topic-based sections, and outperforms the current state-of-the-art model by 4.98%.
Marco Orsingher,Anthony Dell'Eva,Paolo Zani,Paolo Medici,Massimo Bertozzi
http://arxiv.org/abs/2312.17561v1
Compressor summary: KeyNeRF is a method for training NeRF with few input views by selecting key informative rays using view and pixel-level selection algorithms.
Sergio Garcia Garcia,Santiago Cepeda,Ignacio Arrese,Rosario Sarabia
http://arxiv.org/abs/2312.17553v1
Compressor summary: The researchers developed and validated an artificial intelligence tool that can automatically segment blood in subarachnoid hemorrhage patients on CT scans, improving accuracy and efficiency.
Moritz Laurer,Wouter van Atteveldt,Andreu Casas,Kasper Welbers
http://arxiv.org/abs/2312.17543v1
Compressor summary: The paper introduces a BERT-like universal classifier based on NLI that can do any text classification task without fine-tuning or few-shot learning, and shares the code for building it.
Xiangyu Xiong,Yue Sun,Xiaohong Liu,Wei Ke,Chan-Tong Lam,Jiangang Chen,Mingfeng Jiang,Mingwei Wang,Hui Xie,Tong Tong,Qinquan Gao,Hao Chen,Tao Tan
http://arxiv.org/abs/2312.17538v1
Compressor summary: The paper proposes a new data augmentation method called DisGAN, which generates diverse samples in the hyperplane space for binary classification using vertical and horizontal distances.
Shaojie Zhu,Zhaobin Wang,Chengxiang Zhuo,Hui Lu,Bo Hu,Zang Li
http://arxiv.org/abs/2312.17535v1
Compressor summary: Olapa-MCoT is an LLM that improves chain-of-thought (CoT) and Chinese mathematical reasoning with the SimRRHF algorithm and data relearning, achieving a 36% improvement over llama2-13B.
Yuncheng Huang,Qianyu He,Jiaqing Liang,Sihang Jiang,Yanghua Xiao,Yunwen Chen
http://arxiv.org/abs/2312.17532v1
Compressor summary: The authors present a framework to improve LLMs' quantitative reasoning by enhancing their dimension perception, which is crucial for understanding quantities with units and improving performance on related benchmarks.
Weiying Xie,Zixuan Wang,Jitao Ma,Daixun Li,Yunsong Li
http://arxiv.org/abs/2312.17530v1
Compressor summary: RS-DGC is a dynamic gradient compression technique for distributed deep learning in remote sensing applications that leverages neighborhood statistics to sparsify gradients and reduce communication costs while maintaining performance.
MinKyu Lee,Jae-Pil Heo
http://arxiv.org/abs/2312.17526v1
Compressor summary: The paper investigates deep-learning-based single image super-resolution methods and proposes a new optimization method that improves their stability and performance by estimating the optimal centroid of high-resolution images and removing inherent noise.
Wei Zhu,Xiaoling Wang,Mosha Chen,Buzhou Tang
http://arxiv.org/abs/2312.17522v1
Compressor summary: The paper introduces a shared task that tests Chinese large language models in medical natural language processing using two tracks: prompt tuning and in-context learning.
Raquel Espinosa,Fernando Jiménez,José Palma
http://arxiv.org/abs/2312.17517v1
Compressor summary: The paper proposes a novel feature selection method for time series forecasting using LSTM networks, an evolutionary algorithm, and ensemble learning, which improves generalization and reduces overfitting.
Zijing Shi,Meng Fang,Shunfeng Zheng,Shilong Deng,Ling Chen,Yali Du
http://arxiv.org/abs/2312.17515v1
Compressor summary: The study explores how large language models can collaborate in ad hoc teamwork scenarios using CodeAct, an agent that improves communication and adaptability by combining memory and code-driven reasoning.
Tuan-Anh Vu,Duc Thanh Nguyen,Qing Guo,Binh-Son Hua,Nhat Minh Chung,Ivor W. Tsang,Sai-Kit Yeung
http://arxiv.org/abs/2312.17505v1
Compressor summary: The paper proposes a text-to-image diffusion model that leverages cross-domain features for camouflaged object segmentation, outperforming existing methods on benchmark datasets.
Hao Wang,Bo Tang,Chi Harold Liu,Shangqin Mao,Jiahong Zhou,Zipeng Dai,Yaqi Sun,Qianlong Xie,Xingxing Wang,Dong Wang
http://arxiv.org/abs/2312.17503v1
Compressor summary: The paper introduces HiBid, a hierarchical offline DRL framework for cross-channel constrained bidding, pairing a high-level budget-allocation planner with a low-level executor that uses a CPC-guided action selection mechanism; it outperforms six baselines and is deployed on the Meituan advertising platform.
Xiaohua Lu,Liangxu Xie,Lei Xu,Rongzhi Mao,Shan Chang,Xiaojun Xu
http://arxiv.org/abs/2312.17495v1
Compressor summary: The text describes a multimodal deep learning model that predicts molecular properties better than mono-modal models by using different representations of drug molecules and fusion methods.
Youzhe Song,Feng Wang
http://arxiv.org/abs/2312.17494v1
Compressor summary: The paper proposes a novel method for mixed-quality face recognition that applies different learning methods to HQ and LQ images, using classification-based methods for HQ data and self-supervised contrastive learning for LQ data.
Xin Zhang,Jinheng Xie,Yuan Yuan,Michael Bi Mi,Robby T. Tan
http://arxiv.org/abs/2312.17492v1
Compressor summary: HEAP is a novel framework that uses cross-attention and contrastive losses to group patches into regions for efficient hierarchical image decomposition and improved object discovery and differentiation.
Zhongzhi Chen,Xingwu Sun,Xianfeng Jiao,Fengzong Lian,Zhanhui Kang,Di Wang,Cheng-Zhong Xu
http://arxiv.org/abs/2312.17484v1
Compressor summary: Truth Forest is a method to make LLMs more truthful by finding hidden truth representations using orthogonal probes and Random Peek technique, improving performance on TruthfulQA dataset.
Jacob Portes,Alex Trott,Sam Havens,Daniel King,Abhinav Venigalla,Moin Nadeem,Nikhil Sardana,Daya Khudia,Jonathan Frankle
http://arxiv.org/abs/2312.17482v1
Compressor summary: MosaicBERT is a fast, optimized BERT-style encoder architecture that allows efficient pretraining with minimal costs.
Nigini Oliveira,Jasmine Li,Koosha Khalvati,Rodolfo Cortes Barragan,Katharina Reinecke,Andrew N. Meltzoff,Rajesh P. N. Rao
http://arxiv.org/abs/2312.17479v1
Compressor summary: The authors propose using inverse reinforcement learning to help AI agents learn the cultural values and norms of the community they operate in by observing human behavior in a virtual world.
Manikanta Loya,Divya Anand Sinha,Richard Futrell
http://arxiv.org/abs/2312.17476v1
Compressor summary: The study examines how variations in prompts and hyperparameters affect large language models' decision making abilities, finding that they can exhibit a human-like exploration-exploitation tradeoff.
Xiaocheng Zhang,Zonghai Yao,Hong Yu
http://arxiv.org/abs/2312.17475v1
Compressor summary: The paper presents an approach using generative large language models to help patients understand their Electronic Health Records by providing explanations and answering questions, and evaluates its performance on two novel tasks.
Zhiqiang Shen
http://arxiv.org/abs/2312.17473v1
Compressor summary: FerKD is a novel knowledge distillation framework that improves convergence speed and accuracy by calibrating less-confident regions, mixing similar image regions, and using hard ground truth labels.
Benjamin Eyre,Elliot Creager,David Madras,Vardan Papyan,Richard Zemel
http://arxiv.org/abs/2312.17463v1
Compressor summary: The paper proposes a simple spectral adaptation method to improve the performance of neural regression models on out-of-distribution data.
Jiawen Zhu,Zhi-Qi Cheng,Jun-Yan He,Chenyang Li,Bin Luo,Huchuan Lu,Yifeng Geng,Xuansong Xie
http://arxiv.org/abs/2312.17448v1
Compressor summary: The paper proposes a new tracking task called Instruction Tracking, which uses a Large Vision-Language Model to provide implicit tracking instructions and achieve competitive performance on referring video object segmentation benchmarks.
Dongbin Hou,Lixin Li,Wensheng Lin,Junli Liang,Zhu Han
http://arxiv.org/abs/2312.17446v1
Compressor summary: The paper proposes a new neural network (ClST) and a knowledge distillation method (SKD) to improve automatic modulation recognition (AMR) using deep learning, especially on miniaturized devices.
Jia Liu,Jie Shuai
http://arxiv.org/abs/2312.17445v1
Compressor summary: SMoT is a novel prompting paradigm that guides LLM reasoning with predefined state machines to eliminate fruitless autonomous exploration and uses a multi-agent mechanism that assigns different objectives to agents, achieving 98% accuracy on an array reasoning task.
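As a rough illustration of the state-machine idea, the following sketch walks a toy array task through a fixed sequence of states, so the "reasoner" can never wander off the predefined path. All state names, transitions, and the task itself are invented for illustration; this is not the SMoT implementation.

```python
# Hypothetical sketch of state-machine-guided reasoning (illustrative only;
# the states and transitions are invented, not taken from the SMoT paper).
# Toy task: sum the positive elements of an array, one state at a time.

TRANSITIONS = {
    "parse_input": "select_rule",
    "select_rule": "apply_rule",
    "apply_rule": "check_done",
}

def run_state_machine(array):
    state, data, rule, total = "parse_input", [], None, 0
    while state != "done":
        if state == "parse_input":
            data = list(array)                      # read the problem input
        elif state == "select_rule":
            rule = lambda x: x > 0                  # pick the filtering rule
        elif state == "apply_rule":
            total = sum(x for x in data if rule(x)) # execute the rule
        elif state == "check_done":
            state = "done"                          # terminal state reached
            continue
        state = TRANSITIONS[state]                  # follow the fixed path
    return total
```

In the paper's setting the states would constrain which reasoning step an LLM performs next, rather than executing the step directly in code.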
Yunlong Tang,Jing Bi,Siting Xu,Luchuan Song,Susan Liang,Teng Wang,Daoan Zhang,Jie An,Jingyang Lin,Rongyi Zhu,Ali Vosoughi,Chao Huang,Zeliang Zhang,Feng Zheng,Jianguo Zhang,Ping Luo,Jiebo Luo,Chenliang Xu
http://arxiv.org/abs/2312.17432v1
Compressor summary: This survey summarizes recent advancements in using large language models for video understanding, exploring their capabilities, types, tasks, datasets, applications, and limitations.
Meghana Holla,Ismini Lourentzou
http://arxiv.org/abs/2312.17429v1
Compressor summary: CORONET is a framework that uses commonsense reasoning to improve zero-shot Natural Language-Video Localization by bridging the gap between videos and generated pseudo-queries with Graph Convolution Networks and cross-attention mechanisms.
Deyi Ji,Siqi Gao,Mingyuan Tao,Hongtao Lu,Feng Zhao
http://arxiv.org/abs/2312.17428v1
Compressor summary: ChangeNet is a large-scale practical-oriented dataset for multi-temporal change detection with realistic perspective distortions and six annotated categories, covering various complex scenes from 100 cities.
Qishen Chen,Xinyu Lyu,Haonan Zhang,Pengpeng Zeng,Lianli Gao,Jingkuan Song
http://arxiv.org/abs/2312.17425v1
Compressor summary: CITrans is a plug-and-play method for scene graph generation that uses context-restricted transfer and efficient iterative learning to improve data transfer and training efficiency.
Melrose Roderick,Felix Berkenkamp,Fatemeh Sheikholeslami,Zico Kolter
http://arxiv.org/abs/2312.17411v1
Compressor summary: Generative Posterior Networks (GPNs) are a new generative model that uses unlabeled data to estimate epistemic uncertainty in high-dimensional problems by approximating the Bayesian posterior distribution.
Lei Fan,Yang Zhao
http://arxiv.org/abs/2312.17407v1
Compressor summary: This study compares five ways of measuring surface roughness and shows that using multiple methods can improve accuracy in analyzing different terrains.
Joshua Inman,Tanmay Khandait,Giulia Pedrielli,Lalitha Sankar
http://arxiv.org/abs/2312.17404v1
Compressor summary: POCA is a new hyperband-based algorithm that adaptively allocates budget to hyperparameter configurations using Bayesian sampling, and outperforms its competitors in finding optimal configurations for machine learning models.