This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-11, generated by the compressor, my personal LLM-based summarization project.
Mohamad Shahbazi,Liesbeth Claessens,Michael Niemeyer,Edo Collins,Alessio Tonioni,Luc Van Gool,Federico Tombari
http://arxiv.org/abs/2401.05335v1
Compressor summary: InseRF is a novel method that can generate new objects in 3D scenes based on textual descriptions and 2D bounding boxes, without requiring explicit 3D information as input.
Ronglai Zuo,Fangyun Wei,Brian Mak
http://arxiv.org/abs/2401.05336v1
Compressor summary: The authors present a novel online sign language recognition system that outperforms existing offline methods on three benchmarks.
Zhaoxi Chen,Gyeongsik Moon,Kaiwen Guo,Chen Cao,Stanislav Pidhorskyi,Tomas Simon,Rohan Joshi,Yuan Dong,Yichen Xu,Bernardo Pires,He Wen,Lucas Evans,Bo Peng,Julia Buffalini,Autumn Trimble,Kevyn McPhail,Melissa Schoeller,Shoou-I Yu,Javier Romero,Michael Zollhöfer,Yaser Sheikh,Ziwei Liu,Shunsuke Saito
http://arxiv.org/abs/2401.05334v1
Compressor summary: URHand is a universal relightable hand model that can be personalized with few images, generalize to natural illuminations and novel identities, and render photorealistically under any lighting condition.
Carolin Schmidt,Mathias Tygesen,Filipe Rodrigues
http://arxiv.org/abs/2401.05322v1
Compressor summary: The study presents a punctuality prediction system for autonomous shuttles using various models, including graph neural networks, to enhance customer trust in shared automated vehicles.
Xueyu Hu,Kun Kuang,Jiankai Sun,Hongxia Yang,Fei Wu
http://arxiv.org/abs/2401.05319v1
Compressor summary: The paper proposes an in-context learning method to improve LLMs' performance in debugging programming problems using print statements.
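The paper's exact prompt template is not shown here, but the general idea of in-context print-statement debugging can be sketched with a hypothetical prompt builder (all names and the example format below are my own assumptions, not the paper's):

```python
# Minimal sketch of print-statement debugging via in-context prompting.
# The one-shot example and template are illustrative, not the paper's.

def build_debug_prompt(buggy_code: str, failing_input: str) -> str:
    """Assemble a prompt asking an LLM to instrument code with prints."""
    example = (
        "Code:\n"
        "def mean(xs): return sum(xs) / len(xs)\n"
        "Failing input: []\n"
        "Instrumented code:\n"
        "def mean(xs):\n"
        "    print('xs =', xs, 'len =', len(xs))\n"
        "    return sum(xs) / len(xs)\n"
    )
    return (
        "Insert print statements to expose intermediate state, "
        "then diagnose the bug.\n\n"
        f"{example}\n"
        f"Code:\n{buggy_code}\n"
        f"Failing input: {failing_input}\n"
        "Instrumented code:\n"
    )

prompt = build_debug_prompt("def head(xs): return xs[0]", "[]")
```

The prompt would then be sent to an LLM, whose instrumented output is executed to surface the runtime state that caused the failure.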
Jessica Dai,Bailey Flanigan,Nika Haghtalab,Meena Jagadeesan,Chara Podimata
http://arxiv.org/abs/2401.05304v1
Compressor summary: The text explains how content recommender systems can negatively affect users due to differences in feedback rates and how no-regret algorithms may not address this issue.
Tristan Thrush,Jared Moore,Miguel Monares,Christopher Potts,Douwe Kiela
http://arxiv.org/abs/2401.05300v1
Compressor summary: The paper introduces a new dataset, "I am a Strange Dataset", to test large language models' ability to handle metalinguistic self-reference and other metalinguistic language tasks.
Benjamin Hou,Tejas Sudharshan Mathai,Jianfei Liu,Christopher Parnell,Ronald M. Summers
http://arxiv.org/abs/2401.05294v1
Compressor summary: The study compares an internal tool with TotalSegmentator for segmenting muscle and fat from CT scans, finding the internal tool more accurate for subcutaneous fat and muscle while having a high agreement for visceral fat.
Thiemo Alldieck,Nikos Kolotouros,Cristian Sminchisescu
http://arxiv.org/abs/2401.05293v1
Compressor summary: The paper analyzes a popular method for guiding optimization problems with text prompts via image diffusion models, identifies its noisy gradient issue, and proposes a fix: training a shallow network to mimic the model's denoising deficiency.
Jayr Pereira,Andre Assumpcao,Julio Trecenti,Luiz Airosa,Caio Lente,Jhonatan Cléto,Guilherme Dobins,Rodrigo Nogueira,Luis Mitchell,Roberto Lotufo
http://arxiv.org/abs/2401.05273v1
Compressor summary: INACIA is a system that uses artificial intelligence to automate case analysis and generate recommendations for the Brazilian Federal Court of Accounts, demonstrating high performance and potential for improving efficiency and fairness in legal systems.
Shuofei Qiao,Ningyu Zhang,Runnan Fang,Yujie Luo,Wangchunshu Zhou,Yuchen Eleanor Jiang,Chengfei Lv,Huajun Chen
http://arxiv.org/abs/2401.05268v1
Compressor summary: AutoAct is a framework that automatically learns and synthesizes planning trajectories for language agents without relying on large-scale annotated data or closed-source models, achieving comparable performance to GPT-3.5-Turbo.
Junsong Chen,Yue Wu,Simian Luo,Enze Xie,Sayak Paul,Ping Luo,Hang Zhao,Zhenguo Li
http://arxiv.org/abs/2401.05252v1
Compressor summary: PIXART-δ is a fast and efficient text-to-image synthesis framework that combines LCM and ControlNet with PIXART-α, enabling high-quality image generation in 2-4 steps with fine-grained control and low memory requirements.
Thomas Rudolf,Daniel Flögel,Tobias Schürmann,Simon Süß,Stefan Schwab,Sören Hohmann
http://arxiv.org/abs/2401.05251v1
Compressor summary: This paper proposes a deep reinforcement learning approach that automatically parametrizes controllers for complex, nonlinear systems with parameter-variant behavior, using B-spline geometries and long short-term memory networks for efficient adaptation, together with actor regularizations.
Xiao Liu,Yansong Feng,Kai-Wei Chang
http://arxiv.org/abs/2401.05249v1
Compressor summary: The paper proposes CASA, a framework to assess if an argument's premises support its conclusion using causality-driven probability of sufficiency and large language models, and demonstrates its effectiveness in fallacy detection and writing assistance.
Emanuele Luzio,Moacir Antonelli Ponti,Christian Ramirez Arevalo,Luis Argerich
http://arxiv.org/abs/2401.05240v1
Compressor summary: The paper explores how calibration strategies can help decouple machine learning classifiers from score-based actions in business logic frameworks, and evaluates their trade-offs and performance in a real-world scenario.
Tianhang Cheng,Wei-Chiu Ma,Kaiyu Guan,Antonio Torralba,Shenlong Wang
http://arxiv.org/abs/2401.05236v1
Compressor summary: Structure from Duplicates (SfD) is an inverse graphics framework that reconstructs 3D geometry, material, and illumination from a single image containing multiple identical objects, using the duplicates as a prior together with a robust SfM formulation for joint pose estimation, and achieves more realistic and detailed 3D reconstructions than existing models.
Yaqi Duan,Martin J. Wainwright
http://arxiv.org/abs/2401.05233v1
Compressor summary: The paper presents a new framework for analyzing continuous state-action RL with fast convergence rates, focusing on two key stability properties related to value functions and policies, and connecting off-line and transfer learning.
Daniel Jakab,Eoin Martino Grua,Brian Micheal Deegan,Anthony Scanlan,Pepijn Van De Ven,Ciarán Eising
http://arxiv.org/abs/2401.05232v1
Compressor summary: The paper introduces a modified NS-SFR algorithm for measuring MTF and optical quality in wide FOV camera datasets for vehicle automation.
Mayug Maniparambil,Raiymbek Akshulakov,Yasser Abdelaziz Dahou Djilali,Sanath Narayan,Mohamed El Amine Seddik,Karttikeya Mangalam,Noel E. O'Connor
http://arxiv.org/abs/2401.05224v1
Compressor summary: The study explores if uni-modal vision and language models can be aligned without training by using graph matching techniques, showing promising results on downstream tasks.
Alexander Mey,Rui Manuel Castro
http://arxiv.org/abs/2401.05218v1
Compressor summary: The paper proposes LoLICaP, a method for identifying causal parents from observational data under linear and invariant causal structures across different environments.
Chenxi Yang,Yujia Liu,Dingquan Li,Tingting Jiang
http://arxiv.org/abs/2401.05217v1
Compressor summary: The paper introduces a novel query-based black box attack on no-reference image quality assessment methods, which overcomes limitations of existing attacks and reveals the vulnerability of these methods to black-box attacks.
Wei Luo,Dihong Gong
http://arxiv.org/abs/2401.05215v1
Compressor summary: The paper adapts a pretrained large language model (Llama2-7B) with supervised fine-tuning to classify financial news titles into positive, negative, or neutral sentiment categories, outperforming previous state-of-the-art algorithms.
Yong Ma,Senlin Luo,Yu-Ming Shang,Zhengjun Li,Yong Liu
http://arxiv.org/abs/2401.05204v1
Compressor summary: Our novel approach to constructing verbalizers for prompt-tuning uses task-specific scenarios to create label words, improving zero-shot text classification performance and reducing bias.
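The paper's scenario-driven verbalizer construction is not reproduced here, but the basic mechanics of a verbalizer in prompt-tuning can be illustrated with a toy sketch (the label words and mock probabilities below are invented for illustration):

```python
# Toy verbalizer sketch: each class label maps to label words, and a class
# is scored by summing the (mock) [MASK]-token probabilities of its words.

verbalizer = {
    "sports":  ["game", "team", "match"],
    "finance": ["stock", "market", "bank"],
}

# Mock probabilities for the [MASK] position, as a real LM head would emit.
mask_token_probs = {"game": 0.02, "team": 0.01, "match": 0.005,
                    "stock": 0.30, "market": 0.25, "bank": 0.10}

def classify(token_probs, verbalizer):
    """Pick the label whose label words carry the most probability mass."""
    scores = {label: sum(token_probs.get(w, 0.0) for w in words)
              for label, words in verbalizer.items()}
    return max(scores, key=scores.get)

print(classify(mask_token_probs, verbalizer))  # prints "finance"
```

In zero-shot text classification the quality of these label-word sets largely determines accuracy, which is what the paper's scenario-based construction targets.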
Helena Russello,Rik van der Tol,Menno Holzhauer,Eldert J. van Henten,Gert Kootstra
http://arxiv.org/abs/2401.05202v1
Compressor summary: The study developed an automated lameness detection system for cows using deep-learning image processing, extracting keypoints and locomotion traits from outdoor videos, and improving classification accuracy by including multiple traits.
Karan Taneja,Richard Segal,Richard Goodwin
http://arxiv.org/abs/2401.05199v1
Compressor summary: RecipeMC is a method to generate more credible and preferred food recipes using GPT-2 and Monte Carlo Tree Search with reward functions.
Aldo Pacchiano,Jonathan N. Lee,Emma Brunskill
http://arxiv.org/abs/2401.05193v1
Compressor summary: The paper proposes two experiment planning strategies for contextual bandit problems using function approximation, one with eluder planning and sampling that achieves optimality guarantees, and another with uniform sampler for small action spaces.
Zijie Meng,Yan Zhang,Zhaopeng Feng,Yang Feng,Gaoang Wang,Joey Tianyi Zhou,Jian Wu,Zuozhu Liu
http://arxiv.org/abs/2401.05190v1
Compressor summary: The paper proposes a Divide and Conquer strategy for large language models to improve their reasoning abilities on multi-choice questions by handling different subsets of tasks with different methods.
Zhaokun Jiang,Ziyin Zhang
http://arxiv.org/abs/2401.05176v1
Compressor summary: The paper compares ChatGPT and NMT engines for Chinese-English translation using automated metrics and human evaluation, finding that ChatGPT performs better with contextual information.
Nanqing Liu,Xun Xu,Yongyi Su,Chengxin Liu,Peiliang Gong,Heng-Chao Li
http://arxiv.org/abs/2401.05168v1
Compressor summary: The text proposes SFOD, a novel object detection method for aerial images facing domain adaptation challenges, which uses self-training and CLIP-guided aggregation to generate pseudo-labels from unlabeled data and outperforms other methods on two new datasets.
Mateusz Krubinski,Stefan Matcovici,Diana Grigore,Daniel Voinea,Alin-Ionut Popa
http://arxiv.org/abs/2401.05167v1
Compressor summary: The paper tackles watermark text spotting in document images, which can reveal the scope, audience, and authenticity of records; it introduces a new benchmark (K-Watermark) and a detection-and-extraction solution (Wextract) that surpasses baselines by 5 AP points in detection and 4 points in character accuracy.
Siyang Song,Micol Spitale,Cheng Luo,Cristina Palmero,German Barquero,Hengde Zhu,Sergio Escalera,Michel Valstar,Tobias Baur,Fabien Ringeval,Elisabeth Andre,Hatice Gunes
http://arxiv.org/abs/2401.05166v1
Compressor summary: The REACT 2024 challenge aims to develop machine learning models that generate diverse and realistic human facial reactions in response to speaker behaviors in dyadic interactions, using data from NOXI and RECOLA datasets.
Jiawei Chen,Dingkang Yang,Yue Jiang,Yuxuan Lei,Lihua Zhang
http://arxiv.org/abs/2401.05163v1
Compressor summary: The paper proposes a large-scale self-supervised learning framework for medical visual question answering, treating it as a generative task and extending traditional image datasets with language models.
Muhammad Ali Farooq,Wang Yao,Michael Schukat,Mark A Little,Peter Corcoran
http://arxiv.org/abs/2401.05159v1
Compressor summary: The study shows that using synthetic skin lesion data from stable diffusion models enhances the performance and generalization of machine learning models for real-world skin lesion analysis.
Yitao Zhao,Heng-Chao Li,Nanqing Liu,Rui Wang
http://arxiv.org/abs/2401.05157v1
Compressor summary: The paper proposes a self-supervised framework to address bitemporal geometric distortion in change detection tasks using pretext representation pre-training, image alignment, and fine-tuning.
Yinghui Xing,Litao Qu,ShiZhou Zhang,Xiuwei Zhang,Yanning Zhang
http://arxiv.org/abs/2401.05153v1
Compressor summary: The paper proposes CrossDiff, a cross-predictive diffusion model that uses self-supervised representation to improve pansharpening by combining spatial and spectral features from PAN and MS images.
Matilda Beinat,Julian Beinat,Mohammed Shoaib,Jorge Gomez Magenti
http://arxiv.org/abs/2401.05145v1
Compressor summary: This study uses machine learning to predict which dementia research papers will have practical applications, aiming to improve translation of discoveries into treatments and reduce the societal and economic burden of the disease.
Abhisek Tiwari,Shreyangshu Bera,Sriparna Saha,Pushpak Bhattacharyya,Samrat Ghosh
http://arxiv.org/abs/2401.05134v1
Compressor summary: The paper proposes a system to generate short summaries of patients' concerns during doctor-patient consultations using nonverbal cues, personal information, and a multitasking framework.
Siqi Liu,Luke Marris,Marc Lanctot,Georgios Piliouras,Joel Z. Leibo,Nicolas Heess
http://arxiv.org/abs/2401.05133v1
Compressor summary: The paper proposes NeuPL-JPSRO, a method for finding equilibria in complex general-sum games using neural networks and transfer learning, which works well empirically and theoretically.
Teru Nagamori,Sayaka Shiota,Hitoshi Kiya
http://arxiv.org/abs/2401.05126v1
Compressor summary: The proposed method enables privacy-preserving training and testing of deep neural networks using encrypted images without sacrificing performance.
Samuele Garda,Ulf Leser
http://arxiv.org/abs/2401.05125v1
Compressor summary: BELHD is a new method for biomedical entity linking that handles homonyms by preprocessing the knowledge base and using candidate sharing for contrastive learning, improving results on 10 corpora.
Junhoo Lee,Yearim Kim,Hyunho Lee,Nojun Kwak
http://arxiv.org/abs/2401.05097v1
Compressor summary: The paper introduces an "any-way" learning paradigm that overcomes fixed cardinality constraints in meta-learning by using label equivalence from episodic task sampling and improves performance, convergence speed, and stability.
Jiayuan Tian,Jie Lei,Jiaqing Zhang,Weiying Xie,Yunsong Li
http://arxiv.org/abs/2401.05093v1
Compressor summary: SwiMDiff is a novel self-supervised pre-training framework for remote sensing images that addresses challenges in contrastive learning by scene-wide matching and pixel-level diffusion constraints, improving performance in change detection and land-cover classification tasks.
Florin Leon,Marius Gavrilescu,Sabina-Adriana Floria,Alina-Adriana Minea
http://arxiv.org/abs/2401.05073v1
Compressor summary: The paper presents a deep learning model that identifies transversal skills needed for different jobs by analyzing job ads in multiple languages and using ESCO taxonomy.
Yichong Huang,Xiaocheng Feng,Baohang Li,Chengpeng Fu,Wenshuai Huo,Ting Liu,Bing Qin
http://arxiv.org/abs/2401.05072v1
Compressor summary: The paper proposes a novel translation process (xIoD) that aligns general understanding and content-specific knowledge in LLMs by interpreting difficult words across languages and enhancing translations with these interpretations.
Michal K. Grzeszczyk,Tomasz Trzciński,Arkadiusz Sitek
http://arxiv.org/abs/2401.05069v1
Compressor summary: The authors propose a new machine-learning method for creating interpretable scoring systems for multiclass problems, useful for decision making in healthcare and criminal justice.
Yawen Xiang,Heng Zhou,Chengyang Li,Fangwei Sun,Zhongbo Li,Yongqiang Xie
http://arxiv.org/abs/2401.05055v1
Compressor summary: This paper reviews deep learning methods for blind motion deblurring, comparing their advantages and limitations on various datasets.
Yuu Jinnai,Ukyo Honda,Tetsuro Morimura,Peinan Zhang
http://arxiv.org/abs/2401.05054v1
Compressor summary: The paper proposes two new methods, DMBR and KMBR, to generate high-quality and diverse sentences by adding diversity objectives to MBR decoding, and shows their effectiveness on various text generation tasks.
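The paper's DMBR and KMBR diversity objectives are not reproduced here, but plain MBR decoding, which they extend, can be sketched in a few lines (the unigram-overlap utility is a toy stand-in for a real metric such as BLEU or BERTScore):

```python
# Generic Minimum Bayes Risk (MBR) decoding sketch: choose the candidate
# with the highest expected utility against sampled pseudo-references.

def unigram_overlap(a: str, b: str) -> float:
    """Toy utility: unigram overlap between two sentences."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / max(len(sa), len(sb))

def mbr_decode(candidates, samples, utility=unigram_overlap):
    """Return the candidate maximizing average utility over the samples."""
    return max(candidates,
               key=lambda c: sum(utility(c, s) for s in samples) / len(samples))

samples = ["the cat sat", "a cat sat down", "the cat sat down"]
print(mbr_decode(samples, samples))  # prints "the cat sat down"
```

Note how MBR picks the "consensus" candidate; the paper's contribution is adding diversity objectives so the selected set is not only high-quality but also varied.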
Tom Richard Vargis,Siavash Ghiasvand
http://arxiv.org/abs/2401.05049v1
Compressor summary: The paper presents a modular image restoration pipeline using existing models and allowing user control over the process, adaptable for various object categories like medical images.
Kaizheng Wang,Keivan Shariatmadar,Shireen Kudukkil Manchingal,Fabio Cuzzolin,David Moens,Hans Hallez
http://arxiv.org/abs/2401.05043v1
Compressor summary: CreINNs are novel interval neural networks that estimate both weight uncertainty and credal sets for improved out-of-distribution detection with less computational complexity than existing methods.
Dennis Ulmer,Elman Mansimov,Kaixiang Lin,Justin Sun,Xibin Gao,Yi Zhang
http://arxiv.org/abs/2401.05033v1
Compressor summary: The paper proposes a self-talk method in which large language models converse in different roles to collect fine-tuning data without costly human-generated samples, introduces an automated metric that measures dialogue success and filters the data for training, and shows that self-talk data improves the results and quality of dialogues.
Sarmad Idrees,Jongeun Choi,Seokman Sohn
http://arxiv.org/abs/2401.05018v1
Compressor summary: AdvMT is a new model for predicting human movements that uses a transformer encoder and a temporal continuity discriminator to capture spatial and temporal dependencies, resulting in more accurate and realistic motion predictions.
Xiaoyan Hu,Farzan Farnia,Ho-fung Leung
http://arxiv.org/abs/2401.05015v1
Compressor summary: VI-IGL is a new information-theoretic method for reinforcement learning that infers latent rewards from feedback variables using conditional mutual information and shows improved performance compared to previous methods.
Jinjing Zhu,Yucheng Chen,Lin Wang
http://arxiv.org/abs/2401.05014v1
Compressor summary: The paper proposes a framework to transfer knowledge between modalities without access to source data by using task-irrelevant data to bridge the gaps and enhance knowledge transfer.
Shubao Zhao,Ming Jin,Zhaoxiang Hou,Chengyi Yang,Zengxiang Li,Qingsong Wen,Yi Wang
http://arxiv.org/abs/2401.05012v1
Compressor summary: HiMTM is a new hierarchical multi-scale model for time series forecasting that outperforms existing self-supervised and end-to-end methods.
Yucheng Han,Na Zhao,Weiling Chen,Keng Teck Ma,Hanwang Zhang
http://arxiv.org/abs/2401.05011v1
Compressor summary: DPKE is a novel approach for semi-supervised 3D object detection that enriches knowledge from data and feature perspectives, improving performance over existing methods.
Chunpeng Zhou,Haishuai Wang,Xilu Yuan,Zhi Yu,Jiajun Bu
http://arxiv.org/abs/2401.05010v1
Compressor summary: The paper proposes a simple and effective framework for few-shot learning that leverages textual information and pre-trained language models, achieving impressive results and surpassing state-of-the-art methods in 1-shot learning tasks.
Christian Mulomba Mukendi,Hyebong Choi
http://arxiv.org/abs/2401.05007v1
Compressor summary: This paper evaluates global disaster risk management efforts using the World Risk Index and finds that traditional long-term strategies are not effective, suggesting a need for innovative approaches tailored to highly vulnerable countries.
Kamil Jeziorek,Piotr Wzorek,Krzysztof Blachut,Andrea Pinna,Tomasz Kryjak
http://arxiv.org/abs/2401.04988v1
Compressor summary: The paper presents a hardware implementation of graph generation from event camera data for event-based vision with Graph Convolutional Networks, proposing simplifications and modifications to the graph representation that have minimal impact on object detection performance.
Haoyu Chu,Yuto Miyatake,Wenjun Cui,Shikui Wei,Daisuke Furihata
http://arxiv.org/abs/2401.04986v1
Compressor summary: The text proposes structure-preserving physics-informed neural networks to improve learning efficiency and apply them to downstream tasks like robust image recognition.
Luanyuan Dai,Xiaoyu Du,Hanwang Zhang,Jinhui Tang
http://arxiv.org/abs/2401.04984v1
Compressor summary: The paper proposes MGNet, a method that combines multiple complementary graphs to find correspondences using graph neural networks and Graph Soft Degree Attention.
YongKyung Oh,Dongyoung Lim,Sungil Kim
http://arxiv.org/abs/2401.04979v1
Compressor summary: The authors propose a novel Neural CDEs-based method that ensures invertibility and better modeling of dynamic temporal dynamics for analyzing irregular and incomplete time series data, outperforming existing models in classification and interpolation tasks.
Sebastian Johann Wetzel
http://arxiv.org/abs/2401.04978v1
Compressor summary: The paper presents a method for interpreting neural network classifiers by finding an intersection between their equivalence classes and symbolic equations, enabling automated scientific discovery.
Qian Wu,Ruoxuan Cui,Yuke Li,Haoqi Zhu
http://arxiv.org/abs/2401.04975v1
Compressor summary: The paper introduces HaltingVT, a video transformer that adaptively removes redundant tokens to improve efficiency and performance in action recognition tasks.
Ian Stewart,Rada Mihalcea
http://arxiv.org/abs/2401.04972v1
Compressor summary: This text discusses how machine translation (MT) systems often fail to accurately translate sentences about same-gender relationships, and examines factors that influence this bias in natural language processing (NLP).
Kailong Tan,Yuxiang Zhou,Qianchen Xia,Rui Liu,Yong Chen
http://arxiv.org/abs/2401.04962v1
Compressor summary: LMSKE is a video summarization method that uses large models to cut videos into shots, generate visual features, cluster candidate keyframes, and eliminate redundancy to create a summary with minimum frames.
Yuncheng Jiang,Zixun Zhang,Yiwen Hu,Guanbin Li,Xiang Wan,Song Wu,Shuguang Cui,Silin Huang,Zhen Li
http://arxiv.org/abs/2401.04961v1
Compressor summary: ECC-PolypDet is a new method that uses contrastive learning and other techniques to improve polyp detection in colorectal cancer screening, outperforming existing approaches.
Huafeng Qin,Hongyu Zhu,Xin Jin,Qun Song,Mounim A. El-Yacoubi,Xinbo Gao
http://arxiv.org/abs/2401.04956v1
Compressor summary: EmMixformer uses a transformer, attention LSTM, and Fourier transformer to capture local and global temporal dependencies in eye movement data for improved recognition accuracy.
Zekun Deng,Hao Yang,Jun Wang
http://arxiv.org/abs/2401.04952v1
Compressor summary: The paper introduces ProFTAP, a framework to evaluate AI's poetry writing ability, and shows that current LLMs can write classical Chinese poems comparable to humans, even surpassing GPT-4.
Beiwen Tian,Huan-ang Gao,Leiyao Cui,Yupeng Zheng,Lan Luo,Baofeng Wang,Rong Zhi,Guyue Zhou,Hao Zhao
http://arxiv.org/abs/2401.04942v1
Compressor summary: The text introduces a new video anomaly segmentation dataset for autonomous driving, using synthetic data enhanced with a generative adversarial network to improve realism and providing two novel metrics for evaluating the safety of the algorithm.
Sicong Huang,Jiawei He,Kry Yik Chau Lui
http://arxiv.org/abs/2401.04933v1
Compressor summary: The authors propose a new method for out of distribution detection using variational autoencoders, with provable guarantees and better empirical results than existing techniques.
Mingyu Jin,Qinkai Yu,Dong Shu,Haiyan Zhao,Wenyue Hua,Yanda Meng,Yongfeng Zhang,Mengnan Du
http://arxiv.org/abs/2401.04925v1
Compressor summary: The number and length of reasoning steps in chain of thought prompts greatly affect large language models' reasoning abilities, with longer steps enhancing performance and shorter steps diminishing it.
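The paper's actual prompts are not given here, but the manipulated variable, the number of worked reasoning steps in a chain-of-thought prompt, can be illustrated with a hypothetical template:

```python
# Illustrative chain-of-thought templates for the same question with few
# vs. many reasoning steps (the template is an assumption, not the paper's).

def cot_prompt(question: str, steps: list[str]) -> str:
    body = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    return f"Q: {question}\nLet's think step by step.\n{body}\nAnswer:"

short = cot_prompt("What is 17 * 6?", ["17 * 6 = 102."])
long = cot_prompt(
    "What is 17 * 6?",
    ["17 * 6 = 17 * (5 + 1).",
     "17 * 5 = 85.",
     "17 * 1 = 17.",
     "85 + 17 = 102."],
)
assert long.count("Step") > short.count("Step")
```

The paper's finding is that, other things equal, prompts like `long` tend to elicit better reasoning from large models than prompts like `short`.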
Ruiyu Mao,Ouyang Xu,Yunhui Guo
http://arxiv.org/abs/2401.04923v1
Compressor summary: NEAT is an efficient data-centric method for annotating open-set data, which improves upon existing active learning approaches by handling unknown classes and using clusterability of labels to select informative samples.
Hongbo Kang,Yong Wang,Mengyuan Liu,Doudou Wu,Peng Liu,Xinlin Yuan,Wenming Yang
http://arxiv.org/abs/2401.04921v1
Compressor summary: The DRPose framework refines the output of deterministic models for 3D human pose estimation using diffusion, denoising, and multi-step refinement, achieving state-of-the-art performance on both single and multi-hypothesis tasks.
Jianqiao Sun,Yudi Su,Hao Zhang,Ziheng Cheng,Zequn Zeng,Zhengjue Wang,Bo Chen,Xin Yuan
http://arxiv.org/abs/2401.04903v1
Compressor summary: The paper proposes a novel video captioning pipeline called SnapCap that generates captions directly from compressed measurements obtained by a snapshot compressive sensing camera, improving efficiency and quality over traditional methods.
Bingchao Wang
http://arxiv.org/abs/2401.04898v1
Compressor summary: This paper introduces ANGO, a Chinese multi-choice question benchmark for evaluating large language models with improved interpretability, question difficulty, and sampling strategies.
Manqing Mao,Paishun Ting,Yijian Xiang,Mingyang Xu,Julia Chen,Jianzhe Lin
http://arxiv.org/abs/2401.04883v1
Compressor summary: The paper introduces MUCA, an LLM-based chatbot framework for group discussions that considers 3W design dimensions (what, when, and who) and uses an LLM-based simulator to improve efficiency in developing the framework.
Zi Yang,Nan Hua
http://arxiv.org/abs/2401.04881v1
Compressor summary: The paper proposes an efficient way to handle long sequences in LLMs by using eviction policies and a wait-to-attend mechanism that adapts to different architectures and tasks.
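The paper's specific eviction policies and wait-to-attend mechanism are not reproduced here, but score-based KV-cache eviction in general can be sketched as keeping a recent window plus the highest-scoring older entries (the scores and budget below are invented):

```python
# Generic sketch of score-based KV-cache eviction for long contexts:
# always keep the newest tokens, then fill the remaining budget with the
# highest-scoring older entries, preserving original order.

def evict(cache, budget, recent_window=2):
    """cache: list of (token, attention_score); return the kept entries."""
    if len(cache) <= budget:
        return cache
    recent = cache[-recent_window:]                 # always keep newest tokens
    older = cache[:-recent_window]
    older.sort(key=lambda kv: kv[1], reverse=True)  # rank older by score
    keep_old = older[: budget - recent_window]
    # restore original order among kept entries
    return [kv for kv in cache if kv in keep_old or kv in recent]

cache = [("a", 0.9), ("b", 0.1), ("c", 0.4), ("d", 0.2), ("e", 0.3)]
print(evict(cache, budget=4))  # keeps "a" and "c", plus recent "d" and "e"
```

Real systems apply this per attention head over key/value tensors rather than token tuples, but the budget-versus-recency trade-off is the same.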
Yu Liu,Yuexin Zhang,Kunming Li,Yongliang Qiao,Stewart Worrall,You-Fu Li,He Kong
http://arxiv.org/abs/2401.04872v1
Compressor summary: The paper proposes a graph transformer with self-attention and domain adaptation to improve pedestrian motion prediction across various scenarios, and introduces a new metric for evaluation.
Koji Inoue,Bing'er Jiang,Erik Ekstedt,Tatsuya Kawahara,Gabriel Skantze
http://arxiv.org/abs/2401.04868v1
Compressor summary: The paper presents a real-time turn-taking prediction system using voice activity projection (VAP), which combines contrastive predictive coding (CPC) and self/cross-attention transformers to map audio to future voice activities.
Koji Inoue,Divesh Lala,Keiko Ochi,Tatsuya Kawahara,Gabriel Skantze
http://arxiv.org/abs/2401.04867v1
Compressor summary: The paper proposes an objective framework for evaluating spoken dialogue systems based on users' behaviours and shows how different behaviours correlate with subjective scores in three social dialogue tasks.
Xingyu Miao,Yang Bai,Haoran Duan,Yawen Huang,Fan Wan,Yang Long,Yefeng Zheng
http://arxiv.org/abs/2401.04861v1
Compressor summary: Our method improves novel view generation from monocular videos by modeling object motion using a time-frequency domain module on top of a generalization NeRF.
Eunyi Lyou,Doyeon Lee,Jooeun Kim,Joonseok Lee
http://arxiv.org/abs/2401.04860v1
Compressor summary: The paper proposes a novel framework for zero-shot sketch-based image retrieval that aligns sketches and photos through texts and bridges the modality gap using an explicit modality encoding.
Sumanth Doddapaneni,Krishna Sayana,Ambarish Jash,Sukhdeep Sodhi,Dima Kuzmin
http://arxiv.org/abs/2401.04858v1
Compressor summary: The study proposes a User Embedding Module (UEM) that converts user history in text form into embeddings, which improve recommendation systems' personalization and handle longer histories better than conventional methods.
Haotian Gu,Tim Jacobs,Philip Kaminsky,Xin Guo,Xinyu Li
http://arxiv.org/abs/2401.04857v1
Compressor summary: The authors propose a signature-based statistical technique for more accurate and interpretable marketplace rate forecasts, outperforming existing methods during crises like Covid-19 and the Ukraine war.
Sixu Li,Shi Chen,Qin Li
http://arxiv.org/abs/2401.04856v1
Compressor summary: The paper shows that score-based generative models can fail to generate diverse samples even when the score function is well learned, by providing a counterexample where they produce blurry versions of training data.
Harvey Lederman,Kyle Mahowald
http://arxiv.org/abs/2401.04854v1
Compressor summary: The text discusses whether LLMs are cultural technologies that transmit information or possess a limited form of agency based on their ability to generate novel reference using new names for entities.