arxiv compressed, 2024-01-11

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-11, generated by the compressor, my personal LLM-based project.


InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes

Mohamad Shahbazi,Liesbeth Claessens,Michael Niemeyer,Edo Collins,Alessio Tonioni,Luc Van Gool,Federico Tombari

http://arxiv.org/abs/2401.05335v1

Compressor summary: InseRF is a novel method that can generate new objects in 3D scenes based on textual descriptions and 2D bounding boxes, without requiring explicit 3D information as input.


Towards Online Sign Language Recognition and Translation

Ronglai Zuo,Fangyun Wei,Brian Mak

http://arxiv.org/abs/2401.05336v1

Compressor summary: The authors present a novel online sign language recognition system that outperforms existing offline methods on three benchmarks.


URHand: Universal Relightable Hands

Zhaoxi Chen,Gyeongsik Moon,Kaiwen Guo,Chen Cao,Stanislav Pidhorskyi,Tomas Simon,Rohan Joshi,Yuan Dong,Yichen Xu,Bernardo Pires,He Wen,Lucas Evans,Bo Peng,Julia Buffalini,Autumn Trimble,Kevyn McPhail,Melissa Schoeller,Shoou-I Yu,Javier Romero,Michael Zollhöfer,Yaser Sheikh,Ziwei Liu,Shunsuke Saito

http://arxiv.org/abs/2401.05334v1

Compressor summary: URHand is a universal relightable hand model that can be personalized with few images, generalize to natural illuminations and novel identities, and render photorealistically under any lighting condition.


Arrival Time Prediction for Autonomous Shuttle Services in the Real World: Evidence from Five Cities

Carolin Schmidt,Mathias Tygesen,Filipe Rodrigues

http://arxiv.org/abs/2401.05322v1

Compressor summary: The study presents a punctuality prediction system for autonomous shuttles using various models, including graph neural networks, to enhance customer trust in shared automated vehicles.


Leveraging Print Debugging to Improve Code Generation in Large Language Models

Xueyu Hu,Kun Kuang,Jiankai Sun,Hongxia Yang,Fei Wu

http://arxiv.org/abs/2401.05319v1

Compressor summary: The paper proposes an in-context learning method to improve LLMs' performance in debugging programming problems using print statements.
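The print-debugging loop this summary alludes to can be sketched generically. This is an illustrative reconstruction, not the paper's actual prompting setup; all names and prompt wording below are hypothetical:

```python
# Generic sketch of a print-debugging prompt for LLM code generation:
# show the model a failing program and ask it to instrument, trace,
# and fix the bug. The prompt wording here is made up for illustration.

def build_debug_prompt(code: str, failing_call: str, observed: str) -> str:
    """Assemble an in-context prompt asking the model to add print
    statements and reason over the captured runtime trace."""
    return (
        "The following program fails on the given input.\n"
        f"Program:\n{code}\n"
        f"Failing call: {failing_call}\n"
        f"Observed output: {observed}\n"
        "Step 1: insert print statements to expose intermediate values.\n"
        "Step 2: run the instrumented program and inspect the printed trace.\n"
        "Step 3: explain the bug and produce a corrected program.\n"
    )

prompt = build_debug_prompt("def add(a, b): return a - b", "add(2, 3)", "-1")
```

The prompt would then be sent to the model, and the trace it produces fed back in a second turn.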


Can Probabilistic Feedback Drive User Impacts in Online Platforms?

Jessica Dai,Bailey Flanigan,Nika Haghtalab,Meena Jagadeesan,Chara Podimata

http://arxiv.org/abs/2401.05304v1

Compressor summary: The text explains how content recommender systems can negatively affect users due to differences in feedback rates and how no-regret algorithms may not address this issue.


I am a Strange Dataset: Metalinguistic Tests for Language Models

Tristan Thrush,Jared Moore,Miguel Monares,Christopher Potts,Douwe Kiela

http://arxiv.org/abs/2401.05300v1

Compressor summary: The paper introduces a new dataset, "I am a Strange Dataset", to test large language models' ability to handle metalinguistic self-reference and other metalinguistic language tasks.


Enhanced Muscle and Fat Segmentation for CT-Based Body Composition Analysis: A Comparative Study

Benjamin Hou,Tejas Sudharshan Mathai,Jianfei Liu,Christopher Parnell,Ronald M. Summers

http://arxiv.org/abs/2401.05294v1

Compressor summary: The study compares an internal tool with TotalSegmentator for segmenting muscle and fat from CT scans, finding the internal tool more accurate for subcutaneous fat and muscle while having a high agreement for visceral fat.


Score Distillation Sampling with Learned Manifold Corrective

Thiemo Alldieck,Nikos Kolotouros,Cristian Sminchisescu

http://arxiv.org/abs/2401.05293v1

Compressor summary: The paper analyzes a popular image diffusion method for controlling optimization problems using text prompts, identifies its noisy gradient issue, and proposes a fix by training a shallow network to mimic the denoising deficiency of the model.


INACIA: Integrating Large Language Models in Brazilian Audit Courts: Opportunities and Challenges

Jayr Pereira,Andre Assumpcao,Julio Trecenti,Luiz Airosa,Caio Lente,Jhonatan Cléto,Guilherme Dobins,Rodrigo Nogueira,Luis Mitchell,Roberto Lotufo

http://arxiv.org/abs/2401.05273v1

Compressor summary: INACIA is a system that uses artificial intelligence to automate case analysis and generate recommendations for the Brazilian Federal Court of Accounts, demonstrating high performance and potential for improving efficiency and fairness in legal systems.


AUTOACT: Automatic Agent Learning from Scratch via Self-Planning

Shuofei Qiao,Ningyu Zhang,Runnan Fang,Yujie Luo,Wangchunshu Zhou,Yuchen Eleanor Jiang,Chengfei Lv,Huajun Chen

http://arxiv.org/abs/2401.05268v1

Compressor summary: AutoAct is a framework that automatically learns and synthesizes planning trajectories for language agents without relying on large-scale annotated data or closed-source models, achieving comparable performance to GPT-3.5-Turbo.


PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

Junsong Chen,Yue Wu,Simian Luo,Enze Xie,Sayak Paul,Ping Luo,Hang Zhao,Zhenguo Li

http://arxiv.org/abs/2401.05252v1

Compressor summary: PIXART-δ is a fast and efficient text-to-image synthesis framework that combines LCM and ControlNet with PIXART-α, enabling high-quality image generation in 2-4 steps with fine-grained control and low memory requirements.


ReACT: Reinforcement Learning for Controller Parametrization using B-Spline Geometries

Thomas Rudolf,Daniel Flögel,Tobias Schürmann,Simon Süß,Stefan Schwab,Sören Hohmann

http://arxiv.org/abs/2401.05251v1

Compressor summary: This paper proposes a novel approach using deep reinforcement learning to automatically parametrize controllers for complex, nonlinear systems with parameter-variant behavior, using B-spline geometries and long short-term memory neural networks for efficient adaptation and actor regularizations.


CASA: Causality-driven Argument Sufficiency Assessment

Xiao Liu,Yansong Feng,Kai-Wei Chang

http://arxiv.org/abs/2401.05249v1

Compressor summary: The paper proposes CASA, a framework to assess if an argument's premises support its conclusion using causality-driven probability of sufficiency and large language models, and demonstrates its effectiveness in fallacy detection and writing assistance.


Decoupling Decision-Making in Fraud Prevention through Classifier Calibration for Business Logic Action

Emanuele Luzio,Moacir Antonelli Ponti,Christian Ramirez Arevalo,Luis Argerich

http://arxiv.org/abs/2401.05240v1

Compressor summary: The paper explores how calibration strategies can help decouple machine learning classifiers from score-based actions in business logic frameworks, and evaluates their trade-offs and performance in a real-world scenario.


Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects

Tianhang Cheng,Wei-Chiu Ma,Kaiyu Guan,Antonio Torralba,Shenlong Wang

http://arxiv.org/abs/2401.05236v1

Compressor summary: Structure from Duplicates (SfD) is a novel inverse graphics framework that reconstructs 3D geometry, material, and illumination from a single image containing multiple identical objects, using the duplicates as a prior together with a robust SfM formulation for joint pose estimation, and achieving more realistic and detailed 3D reconstructions than existing models.


Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces

Yaqi Duan,Martin J. Wainwright

http://arxiv.org/abs/2401.05233v1

Compressor summary: The paper presents a new framework for analyzing continuous state-action RL with fast convergence rates, focusing on two key stability properties related to value functions and policies, and connecting off-line and transfer learning.


Measuring Natural Scenes SFR of Automotive Fisheye Cameras

Daniel Jakab,Eoin Martino Grua,Brian Micheal Deegan,Anthony Scanlan,Pepijn Van De Ven,Ciarán Eising

http://arxiv.org/abs/2401.05232v1

Compressor summary: The paper introduces a modified NS-SFR algorithm for measuring MTF and optical quality in wide FOV camera datasets for vehicle automation.


Do Vision and Language Encoders Represent the World Similarly?

Mayug Maniparambil,Raiymbek Akshulakov,Yasser Abdelaziz Dahou Djilali,Sanath Narayan,Mohamed El Amine Seddik,Karttikeya Mangalam,Noel E. O'Connor

http://arxiv.org/abs/2401.05224v1

Compressor summary: The study explores if uni-modal vision and language models can be aligned without training by using graph matching techniques, showing promising results on downstream tasks.


Invariant Causal Prediction with Locally Linear Models

Alexander Mey,Rui Manuel Castro

http://arxiv.org/abs/2401.05218v1

Compressor summary: The paper proposes LoLICaP, a method for identifying causal parents from observational data under linear and invariant causal structures across different environments.


Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box Method

Chenxi Yang,Yujia Liu,Dingquan Li,Tingting Jiang

http://arxiv.org/abs/2401.05217v1

Compressor summary: The paper introduces a novel query-based black box attack on no-reference image quality assessment methods, which overcomes limitations of existing attacks and reveals the vulnerability of these methods to black-box attacks.


Pre-trained Large Language Models for Financial Sentiment Analysis

Wei Luo,Dihong Gong

http://arxiv.org/abs/2401.05215v1

Compressor summary: The paper adapts pretrained large language models, which are trained on huge text corpora and can be fine-tuned with few samples, to classify financial news titles into positive, negative, or neutral sentiment categories, using the Llama2-7B model with supervised fine-tuning to outperform previous state-of-the-art algorithms.


A Novel Prompt-tuning Method: Incorporating Scenario-specific Concepts into a Verbalizer

Yong Ma,Senlin Luo,Yu-Ming Shang,Zhengjun Li,Yong Liu

http://arxiv.org/abs/2401.05204v1

Compressor summary: Our novel approach to constructing verbalizers for prompt-tuning uses task-specific scenarios to create label words, improving zero-shot text classification performance and reducing bias.


Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits

Helena Russello,Rik van der Tol,Menno Holzhauer,Eldert J. van Henten,Gert Kootstra

http://arxiv.org/abs/2401.05202v1

Compressor summary: The study developed an automated lameness detection system for cows using deep-learning image processing, extracting keypoints and locomotion traits from outdoor videos, and improving classification accuracy by including multiple traits.


Monte Carlo Tree Search for Recipe Generation using GPT-2

Karan Taneja,Richard Segal,Richard Goodwin

http://arxiv.org/abs/2401.05199v1

Compressor summary: RecipeMC is a method to generate more credible and preferred food recipes using GPT-2 and Monte Carlo Tree Search with reward functions.


Experiment Planning with Function Approximation

Aldo Pacchiano,Jonathan N. Lee,Emma Brunskill

http://arxiv.org/abs/2401.05193v1

Compressor summary: The paper proposes two experiment planning strategies for contextual bandit problems using function approximation, one with eluder planning and sampling that achieves optimality guarantees, and another with uniform sampler for small action spaces.


Divide and Conquer for Large Language Models Reasoning

Zijie Meng,Yan Zhang,Zhaopeng Feng,Yang Feng,Gaoang Wang,Joey Tianyi Zhou,Jian Wu,Zuozhu Liu

http://arxiv.org/abs/2401.05190v1

Compressor summary: The paper proposes a Divide and Conquer strategy for large language models to improve their reasoning abilities on multi-choice questions by handling different subsets of tasks with different methods.


Can ChatGPT Rival Neural Machine Translation? A Comparative Study

Zhaokun Jiang,Ziyin Zhang

http://arxiv.org/abs/2401.05176v1

Compressor summary: The paper compares ChatGPT and NMT engines for Chinese-English translation using automated metrics and human evaluation, finding that ChatGPT performs better with contextual information.


CLIP-guided Source-free Object Detection in Aerial Images

Nanqing Liu,Xun Xu,Yongyi Su,Chengxin Liu,Peiliang Gong,Heng-Chao Li

http://arxiv.org/abs/2401.05168v1

Compressor summary: The paper proposes SFOD, a source-free object detection method for aerial images facing domain adaptation challenges, which uses self-training and CLIP-guided aggregation to generate pseudo-labels from unlabeled data and outperforms other methods on two new datasets.


Watermark Text Pattern Spotting in Document Images

Mateusz Krubinski,Stefan Matcovici,Diana Grigore,Daniel Voinea,Alin-Ionut Popa

http://arxiv.org/abs/2401.05167v1

Compressor summary: The paper presents Wextract, a solution for detecting and extracting watermark text from document images, which can reveal information about the scope, audience, and authenticity of records, along with a new benchmark (K-Watermark); despite the varied writing styles found in the wild, it surpasses baselines by 5 AP points in detection and 4 points in character accuracy.


REACT 2024: the Second Multiple Appropriate Facial Reaction Generation Challenge

Siyang Song,Micol Spitale,Cheng Luo,Cristina Palmero,German Barquero,Hengde Zhu,Sergio Escalera,Michel Valstar,Tobias Baur,Fabien Ringeval,Elisabeth Andre,Hatice Gunes

http://arxiv.org/abs/2401.05166v1

Compressor summary: The REACT 2024 challenge aims to develop machine learning models that generate diverse and realistic human facial reactions in response to speaker behaviors in dyadic interactions, using data from NOXI and RECOLA datasets.


MISS: A Generative Pretraining and Finetuning Approach for Med-VQA

Jiawei Chen,Dingkang Yang,Yue Jiang,Yuxuan Lei,Lihua Zhang

http://arxiv.org/abs/2401.05163v1

Compressor summary: The paper proposes a large-scale self-supervised learning framework for medical visual question answering, treating it as a generative task and extending traditional image datasets with language models.


Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN

Muhammad Ali Farooq,Wang Yao,Michael Schukat,Mark A Little,Peter Corcoran

http://arxiv.org/abs/2401.05159v1

Compressor summary: The study shows that using synthetic skin lesion data from stable diffusion models enhances the performance and generalization of machine learning models for real-world skin lesion analysis.


Toward distortion-aware change detection in realistic scenarios

Yitao Zhao,Heng-Chao Li,Nanqing Liu,Rui Wang

http://arxiv.org/abs/2401.05157v1

Compressor summary: The paper proposes a self-supervised framework to address bitemporal geometric distortion in change detection tasks using pretext representation pre-training, image alignment, and fine-tuning.


CrossDiff: Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model

Yinghui Xing,Litao Qu,ShiZhou Zhang,Xiuwei Zhang,Yanning Zhang

http://arxiv.org/abs/2401.05153v1

Compressor summary: The paper proposes CrossDiff, a cross-predictive diffusion model that uses self-supervised representation to improve pansharpening by combining spatial and spectral features from PAN and MS images.


Machine Learning to Promote Translational Research: Predicting Patent and Clinical Trial Inclusion in Dementia Research

Matilda Beinat,Julian Beinat,Mohammed Shoaib,Jorge Gomez Magenti

http://arxiv.org/abs/2401.05145v1

Compressor summary: This study uses machine learning to predict which dementia research papers will have practical applications, aiming to improve translation of discoveries into treatments and reduce the societal and economic burden of the disease.


Yes, this is what I was looking for! Towards Multi-modal Medical Consultation Concern Summary Generation

Abhisek Tiwari,Shreyangshu Bera,Sriparna Saha,Pushpak Bhattacharyya,Samrat Ghosh

http://arxiv.org/abs/2401.05134v1

Compressor summary: The paper proposes a system to generate short summaries of patients' concerns during doctor-patient consultations using nonverbal cues, personal information, and a multitasking framework.


Neural Population Learning beyond Symmetric Zero-sum Games

Siqi Liu,Luke Marris,Marc Lanctot,Georgios Piliouras,Joel Z. Leibo,Nicolas Heess

http://arxiv.org/abs/2401.05133v1

Compressor summary: The paper proposes NeuPL-JPSRO, a method for finding equilibria in complex general-sum games using neural networks and transfer learning, which works well empirically and theoretically.


Efficient Fine-Tuning with Domain Adaptation for Privacy-Preserving Vision Transformer

Teru Nagamori,Sayaka Shiota,Hitoshi Kiya

http://arxiv.org/abs/2401.05126v1

Compressor summary: The proposed method enables privacy-preserving training and testing of deep neural networks using encrypted images without sacrificing performance.


BELHD: Improving Biomedical Entity Linking with Homonym Disambiguation

Samuele Garda,Ulf Leser

http://arxiv.org/abs/2401.05125v1

Compressor summary: BELHD is a new method for biomedical entity linking that handles homonyms by preprocessing the knowledge base and using candidate sharing for contrastive learning, improving results on 10 corpora.


Any-Way Meta Learning

Junhoo Lee,Yearim Kim,Hyunho Lee,Nojun Kwak

http://arxiv.org/abs/2401.05097v1

Compressor summary: The paper introduces an "any-way" learning paradigm that overcomes fixed cardinality constraints in meta-learning by using label equivalence from episodic task sampling and improves performance, convergence speed, and stability.


SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image

Jiayuan Tian,Jie Lei,Jiaqing Zhang,Weiying Xie,Yunsong Li

http://arxiv.org/abs/2401.05093v1

Compressor summary: SwiMDiff is a novel self-supervised pre-training framework for remote sensing images that addresses challenges in contrastive learning by scene-wide matching and pixel-level diffusion constraints, improving performance in change detection and land-cover classification tasks.


Hierarchical Classification of Transversal Skills in Job Ads Based on Sentence Embeddings

Florin Leon,Marius Gavrilescu,Sabina-Adriana Floria,Alina-Adriana Minea

http://arxiv.org/abs/2401.05073v1

Compressor summary: The paper presents a deep learning model that identifies transversal skills needed for different jobs by analyzing job ads in multiple languages and using ESCO taxonomy.


Aligning Translation-Specific Understanding to General Understanding in Large Language Models

Yichong Huang,Xiaocheng Feng,Baohang Li,Chengpeng Fu,Wenshuai Huo,Ting Liu,Bing Qin

http://arxiv.org/abs/2401.05072v1

Compressor summary: The paper proposes a novel translation process (xIoD) that aligns general understanding and content-specific knowledge in LLMs by interpreting difficult words across languages and enhancing translations with these interpretations.


MISS: Multiclass Interpretable Scoring Systems

Michal K. Grzeszczyk,Tomasz Trzciński,Arkadiusz Sitek

http://arxiv.org/abs/2401.05069v1

Compressor summary: The authors propose a new machine-learning method for creating interpretable scoring systems for multiclass problems, useful for decision making in healthcare and criminal justice.


Application of Deep Learning in Blind Motion Deblurring: Current Status and Future Prospects

Yawen Xiang,Heng Zhou,Chengyang Li,Fangwei Sun,Zhongbo Li,Yongqiang Xie

http://arxiv.org/abs/2401.05055v1

Compressor summary: This paper reviews deep learning methods for blind motion deblurring, comparing their advantages and limitations on various datasets.


Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding

Yuu Jinnai,Ukyo Honda,Tetsuro Morimura,Peinan Zhang

http://arxiv.org/abs/2401.05054v1

Compressor summary: The paper proposes two new methods, DMBR and KMBR, to generate high-quality and diverse sentences by adding diversity objectives to MBR decoding, and shows their effectiveness on various text generation tasks.
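The base algorithm the paper extends, plain Minimum Bayes Risk decoding, picks the candidate with the highest expected utility against the other sampled outputs. A minimal sketch with a toy unigram-overlap utility (the DMBR/KMBR diversity objectives are not shown; all names here are illustrative):

```python
# Plain MBR decoding over a pool of sampled candidates: score each
# candidate by its average utility against every other sample, then
# return the highest-scoring one. Utility here is toy unigram F1.

def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility: F1 overlap between the unigram sets of two strings."""
    h, r = set(hyp.split()), set(ref.split())
    common = len(h & r)
    if common == 0:
        return 0.0
    p, q = common / len(h), common / len(r)
    return 2 * p * q / (p + q)

def mbr_decode(candidates: list[str]) -> str:
    """Return the candidate maximizing average utility vs. all other samples."""
    return max(
        candidates,
        key=lambda c: sum(unigram_f1(c, o) for o in candidates if o is not c),
    )

samples = ["the cat sat", "the cat sat down", "the black cat sat"]
print(mbr_decode(samples))  # -> "the cat sat"
```

In practice the utility would be a real metric such as BLEU or BERTScore, and the candidate pool would come from sampling the language model.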


Content-Aware Depth-Adaptive Image Restoration

Tom Richard Vargis,Siavash Ghiasvand

http://arxiv.org/abs/2401.05049v1

Compressor summary: The paper presents a modular image restoration pipeline using existing models and allowing user control over the process, adaptable for various object categories like medical images.


CreINNs: Credal-Set Interval Neural Networks for Uncertainty Estimation in Classification Tasks

Kaizheng Wang,Keivan Shariatmadar,Shireen Kudukkil Manchingal,Fabio Cuzzolin,David Moens,Hans Hallez

http://arxiv.org/abs/2401.05043v1

Compressor summary: CreINNs are novel interval neural networks that estimate both weight uncertainty and credal sets for improved out-of-distribution detection with less computational complexity than existing methods.


Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk

Dennis Ulmer,Elman Mansimov,Kaixiang Lin,Justin Sun,Xibin Gao,Yi Zhang

http://arxiv.org/abs/2401.05033v1

Compressor summary: Because instruction tuning requires many human-generated samples that are costly or unavailable, the paper proposes a self-talk method in which large language models converse with each other in different roles to collect fine-tuning data without human input, and introduces an automated metric that measures dialogue success and filters the collected data, improving the results and quality of dialogues.


AdvMT: Adversarial Motion Transformer for Long-term Human Motion Prediction

Sarmad Idrees,Jongeun Choi,Seokman Sohn

http://arxiv.org/abs/2401.05018v1

Compressor summary: AdvMT is a new model for predicting human movements that uses a transformer encoder and a temporal continuity discriminator to capture spatial and temporal dependencies, resulting in more accurate and realistic motion predictions.


An Information Theoretic Approach to Interaction-Grounded Learning

Xiaoyan Hu,Farzan Farnia,Ho-fung Leung

http://arxiv.org/abs/2401.05015v1

Compressor summary: VI-IGL is a new information-theoretic method for reinforcement learning that infers latent rewards from feedback variables using conditional mutual information and shows improved performance compared to previous methods.


Source-Free Cross-Modal Knowledge Transfer by Unleashing the Potential of Task-Irrelevant Data

Jinjing Zhu,Yucheng Chen,Lin Wang

http://arxiv.org/abs/2401.05014v1

Compressor summary: The paper proposes a framework to transfer knowledge between modalities without access to source data by using task-irrelevant data to bridge the gaps and enhance knowledge transfer.


HiMTM: Hierarchical Multi-Scale Masked Time Series Modeling for Long-Term Forecasting

Shubao Zhao,Ming Jin,Zhaoxiang Hou,Chengyi Yang,Zengxiang Li,Qingsong Wen,Yi Wang

http://arxiv.org/abs/2401.05012v1

Compressor summary: HiMTM is a new hierarchical multi-scale model for time series forecasting that outperforms existing self-supervised and end-to-end methods.


Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection

Yucheng Han,Na Zhao,Weiling Chen,Keng Teck Ma,Hanwang Zhang

http://arxiv.org/abs/2401.05011v1

Compressor summary: DPKE is a novel approach for semi-supervised 3D object detection that enriches knowledge from data and feature perspectives, improving performance over existing methods.


Less is More: A Closer Look at Multi-Modal Few-Shot Learning

Chunpeng Zhou,Haishuai Wang,Xilu Yuan,Zhi Yu,Jiajun Bu

http://arxiv.org/abs/2401.05010v1

Compressor summary: The paper proposes a simple and effective framework for few-shot learning that leverages textual information and pre-trained language models, achieving impressive results and surpassing state-of-the-art methods in 1-shot learning tasks.


Temporal Analysis of World Disaster Risk: A Machine Learning Approach to Cluster Dynamics

Christian Mulomba Mukendi,Hyebong Choi

http://arxiv.org/abs/2401.05007v1

Compressor summary: This paper evaluates global disaster risk management efforts using the World Risk Index and finds that traditional long-term strategies are not effective, suggesting a need for innovative approaches tailored to highly vulnerable countries.


Optimising Graph Representation for Hardware Implementation of Graph Convolutional Networks for Event-based Vision

Kamil Jeziorek,Piotr Wzorek,Krzysztof Blachut,Andrea Pinna,Tomasz Kryjak

http://arxiv.org/abs/2401.04988v1

Compressor summary: The paper presents a hardware implementation of graph generation from event camera data for Graph Convolutional Network-based event vision, proposing simplifications and modifications to the graph representation that have minimal impact on object detection performance.


Structure-Preserving Physics-Informed Neural Networks With Energy or Lyapunov Structure

Haoyu Chu,Yuto Miyatake,Wenjun Cui,Shikui Wei,Daisuke Furihata

http://arxiv.org/abs/2401.04986v1

Compressor summary: The text proposes structure-preserving physics-informed neural networks to improve learning efficiency and apply them to downstream tasks like robust image recognition.


MGNet: Learning Correspondences via Multiple Graphs

Luanyuan Dai,Xiaoyu Du,Hanwang Zhang,Jinhui Tang

http://arxiv.org/abs/2401.04984v1

Compressor summary: The paper proposes MGNet, a method that combines multiple complementary graphs to find correspondences using graph neural networks and Graph Soft Degree Attention.


Invertible Solution of Neural Differential Equations for Analysis of Irregularly-Sampled Time Series

YongKyung Oh,Dongyoung Lim,Sungil Kim

http://arxiv.org/abs/2401.04979v1

Compressor summary: The authors propose a novel Neural CDEs-based method that ensures invertibility and better modeling of dynamic temporal dynamics for analyzing irregular and incomplete time series data, outperforming existing models in classification and interpolation tasks.


Closed-Form Interpretation of Neural Network Classifiers with Symbolic Regression Gradients

Sebastian Johann Wetzel

http://arxiv.org/abs/2401.04978v1

Compressor summary: The paper presents a method for interpreting neural network classifiers by finding an intersection between their equivalence classes and symbolic equations, enabling automated scientific discovery.


HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition

Qian Wu,Ruoxuan Cui,Yuke Li,Haoqi Zhu

http://arxiv.org/abs/2401.04975v1

Compressor summary: The paper introduces HaltingVT, a video transformer that adaptively removes redundant tokens to improve efficiency and performance in action recognition tasks.


Whose wife is it anyway? Assessing bias against same-gender relationships in machine translation

Ian Stewart,Rada Mihalcea

http://arxiv.org/abs/2401.04972v1

Compressor summary: This text discusses how machine translation (MT) systems often fail to accurately translate sentences about same-gender relationships, and examines factors that influence this bias in natural language processing (NLP).


Large Model based Sequential Keyframe Extraction for Video Summarization

Kailong Tan,Yuxiang Zhou,Qianchen Xia,Rui Liu,Yong Chen

http://arxiv.org/abs/2401.04962v1

Compressor summary: LMSKE is a video summarization method that uses large models to cut videos into shots, generate visual features, cluster candidate keyframes, and eliminate redundancy to create a summary with minimum frames.


ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection

Yuncheng Jiang,Zixun Zhang,Yiwen Hu,Guanbin Li,Xiang Wan,Song Wu,Shuguang Cui,Silin Huang,Zhen Li

http://arxiv.org/abs/2401.04961v1

Compressor summary: ECC-PolypDet is a new method that uses contrastive learning and other techniques to improve polyp detection in colorectal cancer screening, outperforming existing approaches.


EmMixformer: Mix transformer for eye movement recognition

Huafeng Qin,Hongyu Zhu,Xin Jin,Qun Song,Mounim A. El-Yacoubi,Xinbo Gao

http://arxiv.org/abs/2401.04956v1

Compressor summary: EmMixformer uses a transformer, attention LSTM, and Fourier transformer to capture local and global temporal dependencies in eye movement data for improved recognition accuracy.


Can AI Write Classical Chinese Poetry like Humans? An Empirical Study Inspired by Turing Test

Zekun Deng,Hao Yang,Jun Wang

http://arxiv.org/abs/2401.04952v1

Compressor summary: The paper introduces ProFTAP, a framework to evaluate AI's poetry writing ability, and shows that current LLMs can write classical Chinese poems comparable to humans, even surpassing GPT-4.


Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics

Beiwen Tian,Huan-ang Gao,Leiyao Cui,Yupeng Zheng,Lan Luo,Baofeng Wang,Rong Zhi,Guyue Zhou,Hao Zhao

http://arxiv.org/abs/2401.04942v1

Compressor summary: The text introduces a new video anomaly segmentation dataset for autonomous driving, using synthetic data enhanced with a generative adversarial network to improve realism and providing two novel metrics for evaluating the safety of the algorithm.


Rethinking Test-time Likelihood: The Likelihood Path Principle and Its Application to OOD Detection

Sicong Huang,Jiawei He,Kry Yik Chau Lui

http://arxiv.org/abs/2401.04933v1

Compressor summary: The authors propose a new method for out of distribution detection using variational autoencoders, with provable guarantees and better empirical results than existing techniques.


The Impact of Reasoning Step Length on Large Language Models

Mingyu Jin,Qinkai Yu,Dong Shu,Haiyan Zhao,Wenyue Hua,Yanda Meng,Yongfeng Zhang,Mengnan Du

http://arxiv.org/abs/2401.04925v1

Compressor summary: The number and length of reasoning steps in chain of thought prompts greatly affect large language models' reasoning abilities, with longer steps enhancing performance and shorter steps diminishing it.
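The variable under study, the number of explicit reasoning steps in a chain-of-thought demonstration, can be made concrete with a toy example. The demonstrations and the step-counting helper below are made up for illustration, not taken from the paper:

```python
# Two chain-of-thought demonstrations for the same question, differing
# only in how many explicit reasoning steps they spell out. The paper's
# finding is that longer step chains tend to improve LLM reasoning.

QUESTION = "Roger has 5 balls. He buys 2 cans of 3 balls each. How many now?"

short_cot = "5 + 2 * 3 = 11. Answer: 11."

long_cot = (
    "Step 1: Roger starts with 5 balls.\n"
    "Step 2: Each can holds 3 balls, and he buys 2 cans.\n"
    "Step 3: The cans add 2 * 3 = 6 balls.\n"
    "Step 4: In total he has 5 + 6 = 11 balls.\n"
    "Answer: 11."
)

def count_steps(cot: str) -> int:
    """Count explicit 'Step N:' reasoning lines in a demonstration."""
    return sum(line.startswith("Step") for line in cot.splitlines())

print(count_steps(short_cot), count_steps(long_cot))  # -> 0 4
```

Both demonstrations reach the same answer; only the length of the visible reasoning chain differs.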


Inconsistency-Based Data-Centric Active Open-Set Annotation

Ruiyu Mao,Ouyang Xu,Yunhui Guo

http://arxiv.org/abs/2401.04923v1

Compressor summary: NEAT is an efficient data-centric method for annotating open-set data, which improves upon existing active learning approaches by handling unknown classes and using clusterability of labels to select informative samples.


Diffusion-based Pose Refinement and Multi-hypothesis Generation for 3D Human Pose Estimation

Hongbo Kang,Yong Wang,Mengyuan Liu,Doudou Wu,Peng Liu,Xinlin Yuan,Wenming Yang

http://arxiv.org/abs/2401.04921v1

Compressor summary: The DRPose framework refines the output of deterministic models for 3D human pose estimation using diffusion, denoising, and multi-step refinement, achieving state-of-the-art performance on both single and multi-hypothesis tasks.


SnapCap: Efficient Snapshot Compressive Video Captioning

Jianqiao Sun,Yudi Su,Hao Zhang,Ziheng Cheng,Zequn Zeng,Zhengjue Wang,Bo Chen,Xin Yuan

http://arxiv.org/abs/2401.04903v1

Compressor summary: The paper proposes a novel video captioning pipeline called SnapCap that generates captions directly from compressed measurements obtained by a snapshot compressive sensing camera, improving efficiency and quality over traditional methods.


ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese Domain

Bingchao Wang

http://arxiv.org/abs/2401.04898v1

Compressor summary: This paper introduces ANGO, a Chinese multi-choice question benchmark for evaluating large language models with improved interpretability, question difficulty, and sampling strategies.


Multi-User Chat Assistant (MUCA): a Framework Using LLMs to Facilitate Group Conversations

Manqing Mao,Paishun Ting,Yijian Xiang,Mingyang Xu,Julia Chen,Jianzhe Lin

http://arxiv.org/abs/2401.04883v1

Compressor summary: The paper introduces MUCA, an LLM-based chatbot framework for group discussions that considers 3W design dimensions (what, when, and who) and uses an LLM-based simulator to improve efficiency in developing the framework.


Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing

Zi Yang,Nan Hua

http://arxiv.org/abs/2401.04881v1

Compressor summary: The paper proposes an efficient way to handle long sequences in LLMs by using eviction policies and a wait-to-attend mechanism that adapts to different architectures and tasks.


Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

Yu Liu,Yuexin Zhang,Kunming Li,Yongliang Qiao,Stewart Worrall,You-Fu Li,He Kong

http://arxiv.org/abs/2401.04872v1

Compressor summary: The paper proposes a graph transformer with self-attention and domain adaptation to improve pedestrian motion prediction across various scenarios, and introduces a new metric for evaluation.


Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection

Koji Inoue,Bing'er Jiang,Erik Ekstedt,Tatsuya Kawahara,Gabriel Skantze

http://arxiv.org/abs/2401.04868v1

Compressor summary: The paper presents a real-time turn-taking prediction system using voice activity projection (VAP), which combines contrastive predictive coding (CPC) and self/cross-attention transformers to map audio to future voice activities.


An Analysis of User Behaviours for Objectively Evaluating Spoken Dialogue Systems

Koji Inoue,Divesh Lala,Keiko Ochi,Tatsuya Kawahara,Gabriel Skantze

http://arxiv.org/abs/2401.04867v1

Compressor summary: The paper proposes an objective framework for evaluating spoken dialogue systems based on users' behaviours and shows how different behaviours correlate with subjective scores in three social dialogue tasks.


CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video

Xingyu Miao,Yang Bai,Haoran Duan,Yawen Huang,Fan Wan,Yang Long,Yefeng Zheng

http://arxiv.org/abs/2401.04861v1

Compressor summary: Our method improves novel view synthesis from monocular videos by modeling object motion with a time-frequency domain module built on a generalizable NeRF.


Modality-Aware Representation Learning for Zero-shot Sketch-based Image Retrieval

Eunyi Lyou,Doyeon Lee,Jooeun Kim,Joonseok Lee

http://arxiv.org/abs/2401.04860v1

Compressor summary: The paper proposes a novel framework for zero-shot sketch-based image retrieval that aligns sketches and photos through texts and bridges the modality gap using an explicit modality encoding.


User Embedding Model for Personalized Language Prompting

Sumanth Doddapaneni,Krishna Sayana,Ambarish Jash,Sukhdeep Sodhi,Dima Kuzmin

http://arxiv.org/abs/2401.04858v1

Compressor summary: The study proposes a User Embedding Module (UEM) that converts user history in text form into embeddings, which improve recommendation systems' personalization and handle longer histories better than conventional methods.
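The pooling idea behind such a module can be sketched in a few lines. This is a toy illustration, not the paper's learned UEM: the text-embedding function here is a hashing stand-in, and the names are hypothetical; the point is only how per-item embeddings of a textual history collapse into one fixed-size user vector.

```python
def embed_text(text: str, dim: int = 8) -> list[float]:
    """Deterministic toy text embedding via character hashing,
    L2-normalized (a stand-in for a learned text encoder)."""
    v = [0.0] * dim
    for i, ch in enumerate(text):
        v[(i + ord(ch)) % dim] += 1.0
    norm = sum(x * x for x in v) ** 0.5 or 1.0
    return [x / norm for x in v]

def user_embedding(history: list[str], dim: int = 8) -> list[float]:
    """Mean-pool per-item embeddings into one fixed-size user vector
    that could be prepended to an LM prompt as soft context."""
    items = [embed_text(h, dim) for h in history]
    return [sum(col) / len(items) for col in zip(*items)]

u = user_embedding(["watched: The Matrix", "rated 5: Blade Runner"])
```

Because the pooled vector has a fixed size, history length no longer inflates the prompt, which is what lets such a module handle long histories.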


Transportation Market Rate Forecast Using Signature Transform

Haotian Gu,Tim Jacobs,Philip Kaminsky,Xin Guo,Xinyu Li

http://arxiv.org/abs/2401.04857v1

Compressor summary: The authors propose a signature-based statistical technique for more accurate and interpretable marketplace rate forecasts, outperforming existing methods during crises like Covid-19 and the Ukraine war.
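The signature transform underlying such features can be computed for a discrete path with a short recursion. This is a generic truncated-signature sketch (levels 1 and 2, via Chen-style iterated-sum updates), not the authors' implementation; the time-augmented path below is an invented toy example.

```python
def signature_level2(path):
    """Truncated path signature (levels 1 and 2) of a list of
    d-dimensional points, via discrete iterated integrals."""
    d = len(path[0])
    s1 = [0.0] * d                      # level 1: total increments
    s2 = [[0.0] * d for _ in range(d)]  # level 2: iterated integrals
    for k in range(1, len(path)):
        dx = [path[k][i] - path[k - 1][i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                # Chen-style update: S^{ij} += S^i * dx^j + dx^i dx^j / 2
                s2[i][j] += s1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for i in range(d):
            s1[i] += dx[i]
    return s1, s2

# Example: a time-augmented rate series (t, rate) as a 2-D path.
path = [(0.0, 1.0), (1.0, 1.5), (2.0, 1.2)]
s1, s2 = signature_level2(path)
```

Level-1 terms recover the total displacement, and the level-2 terms satisfy the shuffle identity S^{12} + S^{21} = S^1 · S^2, which is what makes signatures a principled, interpretable feature set for irregular time series.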


A Good Score Does not Lead to A Good Generative Model

Sixu Li,Shi Chen,Qin Li

http://arxiv.org/abs/2401.04856v1

Compressor summary: The paper shows that score-based generative models can fail to generate diverse samples even when the score function is well learned, by providing a counterexample where they produce blurry versions of training data.
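The sampling mechanism at issue can be sketched with unadjusted Langevin dynamics and an exact score. This is a generic one-dimensional illustration, not the paper's counterexample: for a standard Gaussian the true score is s(x) = -x, and long chains converge to N(0, 1); the paper's point is that a well-learned score alone does not guarantee sharp, diverse samples in general.

```python
import random

def langevin_sample(score, x0, step=0.01, n_steps=2000, seed=0):
    """Run an unadjusted Langevin chain x <- x + h*score(x) + sqrt(2h)*xi
    and return its final state."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = x + step * score(x) + (2 * step) ** 0.5 * rng.gauss(0, 1)
    return x

# With the exact Gaussian score, chains started far away (x0 = 10)
# still land near the target distribution N(0, 1).
samples = [langevin_sample(lambda x: -x, x0=10.0, seed=s) for s in range(200)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```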


Are Language Models More Like Libraries or Like Librarians? Bibliotechnism, the Novel Reference Problem, and the Attitudes of LLMs

Harvey Lederman,Kyle Mahowald

http://arxiv.org/abs/2401.04854v1

Compressor summary: The text discusses whether LLMs are cultural technologies that transmit information or possess a limited form of agency based on their ability to generate novel reference using new names for entities.