This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-04, generated by the compressor, my personal LLM-based project.
Guillaume Le Moing,Jean Ponce,Cordelia Schmid
http://arxiv.org/abs/2312.00786v1
Compressor summary: DOT is a point tracking method that uses key regions, nearest-neighbor interpolation, and a learnable optical flow estimator to handle occlusions, outperforming existing techniques while running much faster.
Yutong Bai,Xinyang Geng,Karttikeya Mangalam,Amir Bar,Alan Yuille,Trevor Darrell,Jitendra Malik,Alexei A Efros
http://arxiv.org/abs/2312.00785v1
Compressor summary: The paragraph introduces a novel method to train a large vision model using visual sentences without linguistic data and shows it can handle various tasks with different prompts.
Mu Cai,Haotian Liu,Siva Karthik Mustikovela,Gregory P. Meyer,Yuning Chai,Dennis Park,Yong Jae Lee
http://arxiv.org/abs/2312.00784v1
Compressor summary: The paragraph introduces a new multimodal model that can understand user-friendly visual prompts like colored boxes or arrows on images and performs well on region-understanding tasks.
Hengyi Wang,Jingwen Wang,Lourdes Agapito
http://arxiv.org/abs/2312.00778v1
Compressor summary: MorpheuS is a framework for reconstructing dynamic, deformable objects in 360 degrees from casually captured RGB-D videos using neural representations and a view-dependent diffusion prior.
Yuming Jiang,Tianxing Wu,Shuai Yang,Chenyang Si,Dahua Lin,Yu Qiao,Chen Change Loy,Ziwei Liu
http://arxiv.org/abs/2312.00777v1
Compressor summary: VideoBooth is a video generation framework that uses image prompts to create customized and high-quality videos with coarse-to-fine embeddings and cross-frame attention layers.
Junfeng Liu,Zhuocheng Mei,Kewen Peng,Ranga Raju Vatsavai
http://arxiv.org/abs/2312.00774v1
Compressor summary: The paper introduces a new method, PK-NCLI, that uses low-level normalized contextual latent interaction to efficiently identify relevant auxiliary information for improving conversational agents' responses and outperforms the existing state-of-the-art method, PK-FoCus.
Fatemeh Taheri Dezaki,Himanshu Arora,Rahul Suresh,Amin Banitalebi-Dehkordi
http://arxiv.org/abs/2312.00766v1
Compressor summary: The paper introduces an automated pipeline using machine learning to extract material attributes from makeup product images, enhancing product discovery and virtual try-on experiences.
Svetoslav Nizhnichenkov,Rahul Nair,Elizabeth Daly,Brian Mac Namee
http://arxiv.org/abs/2312.00765v1
Compressor summary: The paper proposes an explainable meta-classifier to identify cohorts affected by bias mitigation strategies, and shows that some mitigation methods negatively impact certain groups even with improved fairness metrics.
Sangamesh Kodge,Gobinda Saha,Kaushik Roy
http://arxiv.org/abs/2312.00761v1
Compressor summary: The paper introduces a new algorithm for machine unlearning that strategically removes classes from a model using novel spaces and a singular value decomposition-based technique, achieving good performance and efficiency in retaining accuracy and improving privacy against attacks.
Albert Gu,Tri Dao
http://arxiv.org/abs/2312.00752v1
Compressor summary: Mamba is a fast and scalable sequence model that uses selective structured state space models for content-based reasoning and achieves state-of-the-art performance on various modalities.
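The core idea behind a selective state space model can be sketched in a few lines: the state transition parameters become functions of the input, so the model decides per token what to keep in state. This is a toy NumPy sketch under assumed shapes and an Euler-style discretization, not Mamba's optimized hardware-aware scan:

```python
import numpy as np

def selective_ssm(x, W_B, W_C, W_delta, A):
    """Toy selective state-space scan: B, C and the step size delta are
    functions of the input x, so the model can choose what to remember.
    x: (T, D) input sequence; A: (N,) diagonal state matrix (negative)."""
    T, D = x.shape
    N = A.shape[0]
    h = np.zeros((D, N))                                  # one state vector per channel
    ys = np.empty((T, D))
    for t in range(T):
        delta = np.log1p(np.exp(x[t] @ W_delta))          # softplus step size, (D,)
        B = x[t] @ W_B                                    # input-dependent input map, (N,)
        C = x[t] @ W_C                                    # input-dependent output map, (N,)
        A_bar = np.exp(delta[:, None] * A[None, :])       # discretized decay, (D, N)
        h = A_bar * h + delta[:, None] * np.outer(x[t], B)
        ys[t] = h @ C
    return ys

rng = np.random.default_rng(0)
T, D, N = 10, 4, 8
x = rng.normal(size=(T, D))
y = selective_ssm(x,
                  W_B=rng.normal(size=(D, N)) * 0.1,
                  W_C=rng.normal(size=(D, N)) * 0.1,
                  W_delta=rng.normal(size=(D, D)) * 0.1,
                  A=-np.exp(rng.normal(size=N)))
assert y.shape == (T, D)
```

Because `A_bar` depends on the input through `delta`, uninformative tokens can be given decay close to 1 (state preserved) and salient tokens a large update, which is the content-based selection mechanism the summary refers to.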
Tam Nguyen,Tan M. Nguyen,Richard G. Baraniuk
http://arxiv.org/abs/2312.00751v1
Compressor summary: The paper introduces a new type of transformer model that reduces token representation over-smoothing by penalizing the difference between input and output tokens using self-attention.
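The anti-over-smoothing idea can be illustrated with a toy penalty on the distance between input and output token representations (a hypothetical simplified loss, not the paper's exact regularizer):

```python
import numpy as np

def oversmoothing_penalty(tokens_in, tokens_out, lam=0.1):
    """Penalize output tokens drifting toward a common (over-smoothed)
    representation by keeping them close to their distinct inputs.
    tokens_in, tokens_out: (T, D) arrays of token embeddings."""
    return lam * np.mean(np.sum((tokens_out - tokens_in) ** 2, axis=-1))

T, D = 16, 32
rng = np.random.default_rng(1)
x = rng.normal(size=(T, D))
# fully smoothed outputs: every token collapsed onto the mean embedding
collapsed = np.tile(x.mean(axis=0), (T, 1))
assert oversmoothing_penalty(x, x) == 0.0
assert oversmoothing_penalty(x, collapsed) > 0.0
```

Added to the task loss, such a term discourages deep attention stacks from mapping all tokens to near-identical vectors.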
Dekun Wu,Haochen Shi,Zhiyuan Sun,Bang Liu
http://arxiv.org/abs/2312.00746v1
Compressor summary: The study explores using Large Language Models (LLMs) in Chinese murder mystery role-playing games, introducing a new dataset and framework to improve AI agent performance and evaluation.
Min Wei,Jingkai Zhou,Junyao Sun,Xuesong Zhang
http://arxiv.org/abs/2312.00739v1
Compressor summary: The paper proposes a new score distillation method (ASD) that improves stability and performance in various tasks by optimizing the discriminator using the Wasserstein Generative Adversarial Network (WGAN) paradigm.
Xuan-Phi Nguyen,Wenxuan Zhang,Xin Li,Mahani Aljunied,Qingyu Tan,Liying Cheng,Guanzheng Chen,Yue Deng,Sen Yang,Chaoqun Liu,Hang Zhang,Lidong Bing
http://arxiv.org/abs/2312.00738v1
Compressor summary: SeaLLMs are language models for Southeast Asian languages that respect local culture and perform better than ChatGPT in non-Latin languages.
Mingqiao Ye,Martin Danelljan,Fisher Yu,Lei Ke
http://arxiv.org/abs/2312.00732v1
Compressor summary: Gaussian Grouping extends Gaussian Splatting to jointly reconstruct and segment objects in 3D scenes using Identity Encodings, 2D mask predictions, and spatial consistency regularization, enabling versatile scene editing applications.
Xiaoyuan Cheng,Boli Chen,Liz Varga,Yukun Hu
http://arxiv.org/abs/2312.00727v1
Compressor summary: The paper proposes a stochastic model-based approach for safe reinforcement learning in partially observable environments using Predictive State Representation and Reproducing Kernel Hilbert Space.
Chenyu Wang,Sharut Gupta,Caroline Uhler,Tommi Jaakkola
http://arxiv.org/abs/2312.00718v1
Compressor summary: InfoCORE is an information maximization method for removing batch effects in drug screening data, improving molecular property prediction and retrieval.
Mauricio Tec,Ana Trisovic,Michelle Audirac,Sophie Woodward,Naeem Khoshnevis,Francesca Dominici
http://arxiv.org/abs/2312.00710v1
Compressor summary: SpaCE is a toolkit for assessing causal inference methods in studies with spatial confounding by providing realistic benchmark datasets and tools.
Loick Chambon,Eloi Zablocki,Mickael Chen,Florent Bartoccioni,Patrick Perez,Matthieu Cord
http://arxiv.org/abs/2312.00703v1
Compressor summary: PointBeV is a sparse Bird's-eye View segmentation model that uses sparse cells instead of dense grids, improving memory efficiency and enabling better performance on vehicle, pedestrian, and lane detection tasks.
Chinmay Savadikar,Xi Song,Tianfu Wu
http://arxiv.org/abs/2312.00700v1
Compressor summary: GIFT is a method for fine-tuning Transformer models with built-in interpretability, using a Parameter-to-Cluster Attention mechanism to generate and explain the fine-tuning parameters.
Bin Xiao,Murat Simsek,Burak Kantarci,Ala Abu Alkheir
http://arxiv.org/abs/2312.00699v1
Compressor summary: The paragraph discusses limitations of existing table detection methods, compares two-stage and transformer-based models, and identifies key design aspects for improving a two-stage model's performance in table structure recognition tasks.
Martin Georg Ljungqvist,Otto Nordander,Markus Skans,Arvid Mildner,Tony Liu,Pierre Nugues
http://arxiv.org/abs/2312.00694v1
Compressor summary: The paper explores how training a neural network object detector on real vs synthetic data affects each layer using a similarity analysis method, and finds the largest differences in the head part of the network.
Benedikt W. Hosp,Martin Dechant,Yannick Sauer,Rajat Agarwala,Siegfried Wahl
http://arxiv.org/abs/2312.00692v1
Compressor summary: The study introduces a virtual reality simulation tool for evaluating vision science methods in various real-world scenarios with high control and flexibility.
Jaime Corsetti,Davide Boscaini,Changjae Oh,Andrea Cavallaro,Fabio Poiesi
http://arxiv.org/abs/2312.00690v1
Compressor summary: The paragraph describes a new open-vocabulary object 6D pose estimation method that uses a textual prompt, no object model, and two viewpoints of two different scenes, outperforming existing methods on a new benchmark.
Hadi Wazni,Mehrnoosh Sadrzadeh
http://arxiv.org/abs/2312.00688v1
Compressor summary: The paper presents a Quantum Natural Language Processing system that uses Parametrised Quantum Circuits to perform pronoun resolution tasks and shows its effectiveness compared to classical systems.
Pablo Gamallo
http://arxiv.org/abs/2312.00680v1
Compressor summary: The paper proposes a transparent and interpretable method for encoding word contexts based on semantic compositionality, and shows that it can compete with Transformers in calculating word sense similarity.
Tianyu Ding,Tianyi Chen,Haidong Zhu,Jiachen Jiang,Yiqi Zhong,Jinxin Zhou,Guangzhi Wang,Zhihui Zhu,Ilya Zharkov,Luming Liang
http://arxiv.org/abs/2312.00678v1
Compressor summary: The text summarizes a survey of algorithmic advancements to improve the efficiency of Large Language Models (LLMs) in various aspects, such as scaling laws, data utilization, and training strategies.
Ying Nie,Wei He,Kai Han,Yehui Tang,Tianyu Guo,Fanyi Du,Yunhe Wang
http://arxiv.org/abs/2312.00674v1
Compressor summary: The paper proposes a multi-level alignment and masked language modeling approach to train lightweight CLIP models for vision-language tasks without increasing inference cost.
Mehdi Naouar,Gabriel Kalweit,Anusha Klett,Yannick Vogt,Paula Silvestrini,Diana Laura Infante Ramirez,Roland Mertelsmann,Joschka Boedecker,Maria Kalweit
http://arxiv.org/abs/2312.00671v1
Compressor summary: CellMixer is an annotation-free approach that uses image-level labels to train a segmentation model for identifying different cell types in heterogeneous cell populations.
Kangcheng Liu,Yong-Jin Liu,Kai Tang,Ming Liu,Baoquan Chen
http://arxiv.org/abs/2312.00663v1
Compressor summary: This paper presents a method to improve 3D scene understanding with limited labels by using pre-trained vision-language models, boundary awareness, and unsupervised learning.
Fabio Fehr,James Henderson
http://arxiv.org/abs/2312.00662v1
Compressor summary: The authors propose extending Nonparametric Variational Information Bottleneck (NVIB) to all attention functions in Transformers, which improves out-of-domain generalisation without additional training and suggests that pretrained Transformers are implicitly nonparametric variational Bayesian models.
Ehsan Beikihassan,Amy K. Hoover,Ioannis Koutis,Ali Parviz,Niloofar Aghaieabiane
http://arxiv.org/abs/2312.00660v1
Compressor summary: The paper explores how natural knowledge diffusion processes in networks of artificial learners can optimize performance under resource constraints, inspired by human peer learning.
Cuong N. Nguyen,Phong Tran,Lam Si Tung Ho,Vu Dinh,Anh T. Tran,Tal Hassner,Cuong V. Nguyen
http://arxiv.org/abs/2312.00656v1
Compressor summary: The authors propose two efficient methods for estimating how well deep learning models transfer from one task to another in regression tasks, and show that their methods significantly outperform existing approaches in both accuracy and efficiency.
Stefan Hegselmann,Antonio Parziale,Divya Shanmugam,Shengpu Tang,Mercy Nyamewaa Asiedu,Serina Chang,Thomas Hartvigsen,Harvineet Singh
http://arxiv.org/abs/2312.00655v1
Compressor summary: The paragraph describes the collection of accepted Findings papers from the 3rd Machine Learning for Health symposium (ML4H 2023), which featured health-related topics and two submission tracks; all submissions underwent double-blind peer review.
Pengxiang Li,Zhili Liu,Kai Chen,Lanqing Hong,Yunzhi Zhuge,Dit-Yan Yeung,Huchuan Lu,Xu Jia
http://arxiv.org/abs/2312.00651v1
Compressor summary: The paper introduces TrackDiffusion, a new architecture that generates continuous video sequences from tracklets, improving instance consistency and perceptual metrics, and enabling better training of multi-object tracking systems.
Ioannis Kakogeorgiou,Spyros Gidaris,Konstantinos Karantzalos,Nikos Komodakis
http://arxiv.org/abs/2312.00648v1
Compressor summary: This paper introduces two new techniques to improve slot-based autoencoders for unsupervised object segmentation in complex scenes.
Paul Bricman
http://arxiv.org/abs/2312.00645v1
Compressor summary: The authors propose hashmarking, a method to evaluate language models on sensitive topics without revealing the correct answers, by using cryptographic hashing of solutions before publication.
Karim Kassab,Antoine Schnepf,Jean-Yves Franceschi,Laurent Caraffa,Jeremie Mary,Valérie Gouet-Brunet
http://arxiv.org/abs/2312.00639v1
Compressor summary: EvE is a new method that uses generative networks to improve in-the-wild scene modeling and produce more realistic images, outperforming existing methods on novel view synthesis tasks.
Yuxin Li,Qiang Han,Mengying Yu,Yuxin Jiang,Chaikiat Yeo,Yiheng Li,Zihang Huang,Nini Liu,Hsuanhan Chen,Xiaojun Wu
http://arxiv.org/abs/2312.00633v1
Compressor summary: BEVENet is a fast and efficient 3D object detection framework that uses convolutional neural networks instead of vision-transformers, making it suitable for autonomous driving applications.
Michail Tarasiou,Jiankang Deng,Stefanos Zafeiriou
http://arxiv.org/abs/2312.00627v1
Compressor summary: The authors propose a framework for heterogeneous face recognition that uses large neural networks pre-trained on homogeneous visible data and fine-tuned on near-infrared data, achieving state-of-the-art results.
Joschka Herteux,Christoph Räth,Amine Baha,Giulia Martini,Duccio Piovani
http://arxiv.org/abs/2312.00626v1
Compressor summary: The authors present a new quantitative method to forecast food consumption levels 60 days ahead in four countries using data from the World Food Programme's hunger monitoring system and various models, finding that Reservoir Computing performs best for this task.
Jose Pablo Folch,James Odgers,Shiqiang Zhang,Robert M Lee,Behrang Shafei,David Walz,Calvin Tsay,Mark van der Wilk,Ruth Misener
http://arxiv.org/abs/2312.00622v1
Compressor summary: The paper presents an extended SnAKe algorithm that handles costs and constraints for Bayesian optimization in data-driven experimental design.
Maren Hackenberg,Michelle Pfaffenlehner,Max Behrens,Astrid Pechmann,Janbernd Kirschner,Harald Binder
http://arxiv.org/abs/2312.00616v1
Compressor summary: The paper explores using deep learning and domain adaptation to combine different measurement instruments for assessing individuals over time, specifically in a spinal muscular atrophy registry.
Maorong Wang,Nicolas Michel,Ling Xiao,Toshihiko Yamasaki
http://arxiv.org/abs/2312.00600v1
Compressor summary: The paper proposes Collaborative Continual Learning (CCL) to address the challenge of acquiring new knowledge in online learning, and introduces Distillation Chain (DC) as a novel collaborative learning scheme that improves model plasticity and performance.
João Carreira,Michael King,Viorica Pătrăucean,Dilara Gokay,Cătălin Ionescu,Yi Yang,Daniel Zoran,Joseph Heyward,Carl Doersch,Yusuf Aytar,Dima Damen,Andrew Zisserman
http://arxiv.org/abs/2312.00598v1
Compressor summary: The authors present a framework for online learning from continuous video streams, addressing challenges related to high frame correlations, and demonstrate its effectiveness through experiments with pixel-to-pixel modelling and future prediction tasks.
Afifa Khaled,Chao Li,Jia Ning,Kun He
http://arxiv.org/abs/2312.00596v1
Compressor summary: BCN is a novel normalization technique for deep learning that adapts to both channel and batch dependence, enabling higher learning rates and better performance on different tasks.
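The idea of adapting to both batch and channel statistics can be sketched as a learnable blend of batch-wise and channel-wise normalization; this is a minimal toy version under assumed (B, C) activations, not the paper's exact BCN formulation:

```python
import numpy as np

def batch_channel_norm(x, lam=0.5, eps=1e-5):
    """Blend batch-wise and channel-wise normalization.
    x: (B, C) activations; lam in [0, 1] weights batch vs. channel statistics
    (in the paper this trade-off would be learned, not fixed)."""
    bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)            # per-feature, over the batch
    ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(
        x.var(axis=1, keepdims=True) + eps)                             # per-sample, over channels
    return lam * bn + (1.0 - lam) * ln

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 8))
y = batch_channel_norm(x, lam=1.0)   # lam=1.0 reduces to pure batch normalization
assert np.allclose(y.mean(axis=0), 0.0, atol=1e-8)
```

Setting `lam=1.0` or `lam=0.0` recovers pure batch or pure layer normalization, so the blend interpolates between the two regimes depending on which statistics are more reliable.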
Sahar Nasirihaghighi,Negin Ghamsarian,Heinrich Husslein,Klaus Schoeffmann
http://arxiv.org/abs/2312.00593v1
Compressor summary: The paper introduces a dataset for recognizing critical events in laparoscopic gynecology videos using a hybrid transformer architecture and a frame sampling strategy that improves event recognition accuracy.
Emma Cramer,Jonas Reiher,Sebastian Trimpe
http://arxiv.org/abs/2312.00592v1
Compressor summary: The paper proposes a metric to evaluate how well spatial autoencoders (SAEs) can track objects in images, which is important for robotic reinforcement learning (RL), and suggests three modifications to improve SAE architectures.
Xudong Li,Jingyuan Zheng,Xiawu Zheng,Runze Hu,Enwei Zhang,Yuting Gao,Yunhang Shen,Ke Li,Yutao Liu,Pingyang Dai,Yan Zhang,Rongrong Ji
http://arxiv.org/abs/2312.00591v1
Compressor summary: The paper proposes a new method to assess image quality without reference images by learning from non-aligned references using feature distillation, achieving state-of-the-art performance.
Samantha Visbeek,Erman Acar,Floris den Hengst
http://arxiv.org/abs/2312.00586v1
Compressor summary: The paper proposes Deep Symbolic Classification (DSC), a framework that combines deep neural networks and reinforcement learning to search for explainable, transparent, and data-driven fraud detection models that can handle class imbalance without oversampling or undersampling.
Josef Valvoda,Alec Thompson,Ryan Cotterell,Simone Teufel
http://arxiv.org/abs/2312.00584v1
Compressor summary: The paper discusses the ethical challenges of using NLP models to automate the role of judges in common law systems, arguing that current models are not capable of shaping the law and even if they were, there would still be ethical concerns.
Stephen Wu,Yu Otake,Yosuke Higo,Ikumasa Yoshida
http://arxiv.org/abs/2312.00581v1
Compressor summary: The paper discusses how data-driven methods and deep learning can improve geotechnics by addressing soil complexity and promoting open science, and envisions a future where advanced computational tools revolutionize the field.
Khai Loong Aw,Syrielle Montariol,Badr AlKhamissi,Martin Schrimpf,Antoine Bosselut
http://arxiv.org/abs/2312.00575v1
Compressor summary: Instruction-tuning enhances language models' similarity to human brain activity but not behavior on a reading task.
Aleksi Knuutila
http://arxiv.org/abs/2312.00570v1
Compressor summary: The paper applies GANs to study visual aspects of social processes in London, mapping how different areas vary in health, income, and environmental quality using image synthesis and comparing the results from three inversion techniques.
Iakes Goenaga,Aitziber Atutxa,Koldo Gojenola,Maite Oronoz,Rodrigo Agerri
http://arxiv.org/abs/2312.00567v1
Compressor summary: The paper presents a new Spanish dataset for extractive question answering in Evidence-Based Medicine, with explanations written by medical doctors and benchmarks both correct and incorrect answers.
Tingting Ni,Maryam Kamgarpour
http://arxiv.org/abs/2312.00561v1
Compressor summary: The paper proposes a new zeroth-order interior point method for constrained Markov decision processes that guarantees constraint satisfaction during learning and converges faster than existing algorithms.
Aniket Deroy,Subhankar Maity
http://arxiv.org/abs/2312.00554v1
Compressor summary: The study examines how biases in legal dataset summaries and large language models affect justice systems and explores various types of biases, such as gender, race, crime against women, country names, and religious keywords.
Qing Wang,Kang Zhou,Qiao Qiao,Yuepei Li,Qi Li
http://arxiv.org/abs/2312.00552v1
Compressor summary: The paper introduces AugURE, a method for unsupervised relation extraction that increases positive pair diversity with cross-sentence augmentation and uses margin loss instead of NCE to improve relation representation learning.
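A margin loss of the kind mentioned can be sketched as a standard triplet objective on relation embeddings: pull positive pairs together and push negatives apart by at least a margin. This is a generic sketch, not AugURE's exact objective:

```python
import numpy as np

def margin_loss(anchor, positive, negative, margin=1.0):
    """Triplet-style margin loss over (anchor, positive, negative)
    relation embeddings of shape (B, D)."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])   # close pair: margin already satisfied
n = np.array([[5.0, 0.0]])   # far negative
assert margin_loss(a, p, n) == 0.0
assert margin_loss(a, n, p) > 0.0
```

Unlike NCE-style losses, the hinge goes exactly to zero once the margin is met, so easy negatives stop contributing gradient, which is one reason a margin loss can be less sensitive to the sampling of negatives.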
Sungho Choi,Seungyul Han,Woojun Kim,Jongseong Chae,Whiyoung Jung,Youngchul Sung
http://arxiv.org/abs/2312.00548v1
Compressor summary: The paper presents a novel framework for cross-domain imitation learning with visual observation that extracts domain-independent features to improve performance in practical scenarios.
Tianlang He,Zhiqiu Xia,Jierun Chen,Haoliang Li,S. -H. Gary Chan
http://arxiv.org/abs/2312.00540v1
Compressor summary: TASFAR is a new target-agnostic source-free domain adaptation method for regression tasks that uses prediction confidence to estimate a label density map and calibrate the source model on the target domain, achieving superior performance compared to existing approaches.
Jannis Vamvas,Tobias Domhan,Sony Trenous,Rico Sennrich,Eva Hasler
http://arxiv.org/abs/2312.00536v1
Compressor summary: The paper compares two metrics for machine translation evaluation and finds that the trained one is more robust to machine-translated references, indicating unintended positive effects of metric training.
Jose Luis Apellániz,Mikel García,Nerea Aranjuelo,Javier Barandiarán,Marcos Nieto
http://arxiv.org/abs/2312.00534v1
Compressor summary: The paper presents a method for detecting 3D curbs from LiDAR point clouds and shows how it reduces manual annotation time by 50.99%.
Christina Gsaxner,Shohei Mori,Dieter Schmalstieg,Jan Egger,Gerhard Paar,Werner Bailer,Denis Kalkofen
http://arxiv.org/abs/2312.00532v1
Compressor summary: DeepDR is a new RGB-D inpainting framework that can remove real objects from scenes and generate coherent structure and 3D geometry, achieving high quality results at real-time speeds.
Archchana Sindhujan,Diptesh Kanojia,Constantin Orasan,Tharindu Ranasinghe
http://arxiv.org/abs/2312.00525v1
Compressor summary: The paper describes an approach that uses autoencoder pre-trained language models within the MonoTransQuest architecture to assess translation quality without reference, and shows that MonoTQ-InfoXLM-large performs best among the tested models.
Haotian Gao,Renhe Jiang,Zheng Dong,Jinliang Deng,Xuan Song
http://arxiv.org/abs/2312.00516v1
Compressor summary: The paper introduces a new method called STD-MAE that uses masked autoencoders to learn complex spatio-temporal patterns in traffic data and improve forecasting performance.
Yueguan Wang,Naoki Yoshinaga
http://arxiv.org/abs/2312.00513v1
Compressor summary: SUMMaug is a data augmentation technique that uses summarization to generate pseudo examples for document classification tasks, improving robustness and accuracy.
Aristotelis Ballas,Vasileios Papapanagiotou,Christos Diou
http://arxiv.org/abs/2312.00502v1
Compressor summary: The authors propose contrastive self-supervised learning (SSL) to improve deep-learning models' effectiveness and robustness in detecting abnormalities in phonocardiogram signals using audio-based augmentations, addressing the scarcity of labeled data in medicine.
Mohammad Altillawi,Zador Pataki,Shile Li,Ziyuan Liu
http://arxiv.org/abs/2312.00500v1
Compressor summary: The authors propose a novel approach to train a deep neural network for camera localization in robotics and AR/VR applications using relative spatial and temporal geometric constraints, achieving better performance with very limited ground-truth data.
Wahidul Hasan Abir,Md. Fahim Uddin,Faria Rahman Khanam,Mohammad Monirujjaman Khan
http://arxiv.org/abs/2312.00487v1
Compressor summary: The paper proposes an automated detection method for Acute Lymphoblastic Leukemia using deep learning models and explainable artificial intelligence to improve accuracy and efficiency in diagnosis.
William Bankes,George Hughes,Ilija Bogunovic,Zi Wang
http://arxiv.org/abs/2312.00486v1
Compressor summary: REDUCR is a data downsampling method that uses class priority reweighting to reduce training costs and improve worst-class generalization performance in image and text classification tasks.
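Class priority reweighting for downsampling can be sketched as sampling a training subset with per-example scores scaled by a per-class weight; the scoring rule below is a hypothetical simplification, not REDUCR's actual selection objective:

```python
import numpy as np

def reweighted_downsample(losses, labels, class_weights, budget, rng):
    """Keep `budget` examples, sampling in proportion to per-example loss
    scaled by a per-class priority weight (toy scoring rule)."""
    scores = losses * class_weights[labels]
    probs = scores / scores.sum()
    return rng.choice(len(losses), size=budget, replace=False, p=probs)

rng = np.random.default_rng(3)
losses = rng.uniform(0.1, 1.0, size=1000)
labels = rng.integers(0, 2, size=1000)
weights = np.array([1.0, 5.0])     # up-weight the worst-performing class (here, class 1)
kept = reweighted_downsample(losses, labels, weights, budget=200, rng=rng)
assert len(kept) == 200
assert (labels[kept] == 1).mean() > 0.5   # the prioritized class dominates the kept set
```

Raising a class's weight shifts the retained subset toward that class, which is how reweighted selection can protect worst-class generalization while still cutting total training cost.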
Junkai Mao,Yuexing Han,Gouhei Tanaka,Bing Wang
http://arxiv.org/abs/2312.00485v1
Compressor summary: The proposed Backbone-based Dynamic Graph Spatio-Temporal Network (BDGSTN) is a novel deep learning model that combines static backbone graphs with temporal models for accurate epidemic forecasting, overcoming limitations of recurrent structures and showing superior complexity and efficiency.
Ambroise Heurtebise,Pierre Ablin,Alexandre Gramfort
http://arxiv.org/abs/2312.00484v1
Compressor summary: MVICAD is an improved MultiView ICA algorithm that accounts for source delays and better separates sources in observed signals, applicable to neuroscience data analysis.
Hiroaki Yamada,Takenobu Tokunaga,Ryutaro Ohara,Akira Tokutsu,Keisuke Takeshita,Mihoko Sumida
http://arxiv.org/abs/2312.00480v1
Compressor summary: The paper introduces JTD, a novel dataset for Japanese Legal Judgment Prediction with two tasks: tort prediction and rationale extraction, which requires identifying court-accepted arguments from party allegations.
Matthieu Blanke,Marc Lelarge
http://arxiv.org/abs/2312.00477v1
Compressor summary: The authors propose a simpler learning model for multi-environment generalization in machine learning, which can identify the physical parameters of the system and enable interpretable learning while having competitive performance and low computational cost.
Antonio Sabbatella,Andrea Ponti,Antonio Candelieri,Ilaria Giordani,Francesco Archetti
http://arxiv.org/abs/2312.00471v1
Compressor summary: The paper proposes a Bayesian optimization method for discrete prompt tuning in classification tasks, which can efficiently search for optimal token sequences without relying on large language models.
A. M. Ershov,D. V. Tropin,E. E. Limonova,D. P. Nikolaev,V. V. Arlazarov
http://arxiv.org/abs/2312.00467v1
Compressor summary: Unfolder is a novel algorithm that rectifies images of documents creased by being folded in half, outperforming advanced neural network methods while running fast on smartphones.
Kerui Gu,Zhihao Li,Shiyong Liu,Jianzhuang Liu,Songcen Xu,Youliang Yan,Michael Bi Mi,Kenji Kawaguchi,Angela Yao
http://arxiv.org/abs/2312.00462v1
Compressor summary: The text discusses how removing orthogonalization from rotation matrices improves training efficiency and leads to better results in 3D computer vision tasks like human pose estimation.
Mayalen Etcheverry,Bert Wang-Chak Chan,Clément Moulin-Frier,Pierre-Yves Oudeyer
http://arxiv.org/abs/2312.00455v1
Compressor summary: The article presents a framework for generating endless complex artifacts in Minecraft using a complex system and a meta-diversity search algorithm, which learns to discover diverse patterns and seek novel sources of diversity.
Laura Smets,Werner Van Leekwijck,Ing Jyh Tsang,Steven Latré
http://arxiv.org/abs/2312.00454v1
Compressor summary: Hyperdimensional Computing (HDC) is a brain-inspired, lightweight machine learning method that performs well in image classification tasks with a novel encoding approach that preserves pattern similarity and improves accuracy and robustness.
Yajie Liu,Pu Ge,Haoxiang Ma,Shichao Fan,Qingjie Liu,Di Huang,Yunhong Wang
http://arxiv.org/abs/2312.00452v1
Compressor summary: The paper proposes a novel RIS method that improves generalization by using a prompt to handle linguistic style changes and a multi-modal fusion module to leverage spatial relations, achieving consistent gains on various datasets.
Zehao Zhu,Zhiwen Fan,Yifan Jiang,Zhangyang Wang
http://arxiv.org/abs/2312.00451v1
Compressor summary: The paper proposes FSGS, a few-shot view synthesis framework that uses 3D Gaussian Splatting to generate real-time and photo-realistic views from as few as three training images while accurately filling in sparse scene details.
Yingzi Ma,Yulong Cao,Jiachen Sun,Marco Pavone,Chaowei Xiao
http://arxiv.org/abs/2312.00438v1
Compressor summary: Dolphins is a vision-language model that can process multimodal inputs and generate outputs for various autonomous driving tasks by using Grounded Chain of Thought, an open-source pretrained model, and human-like capabilities.
Pooja Bhatnagar,Sai Mrunaal,Sachin Kamnure
http://arxiv.org/abs/2312.00435v1
Compressor summary: This research compares different neural architectures for image captioning, proposes a new quality metric, and highlights the importance of data refinement and hyperparameter optimization.
Sumit Agarwal,Aditya Srikanth Veerubhotla,Srijan Bansal
http://arxiv.org/abs/2312.00434v1
Compressor summary: PEFTDebias is a new method that uses parameter-efficient fine-tuning to reduce biases in foundation models by acquiring debiasing parameters and incorporating them during training.
Pietro Bonazzi,Sizhen Bian,Giovanni Lippolis,Yawei Li,Sadique Sheik,Michele Magno
http://arxiv.org/abs/2312.00425v1
Compressor summary: The paper presents a neuromorphic eye-tracking method using a spiking neural network model called Retina, which performs better than the latest method with less power consumption and fewer parameters.
Hamid Sarmadi,Thorsteinn Rögnvaldsson,Nils Roger Carlsson,Mattias Ohlsson,Ibrahim Wahab,Ola Hall
http://arxiv.org/abs/2312.00416v1
Compressor summary: The paper examines how deep convolutional neural networks predict poverty and development indicators from satellite images and identifies key features that influence their predictions.
Taichi Nishimura,Shota Nakada,Masayoshi Kondo
http://arxiv.org/abs/2312.00414v1
Compressor summary: The paper presents an efficient method for retrieving partially relevant videos using super images and large-scale vision-and-language models, outperforming previous methods.
Deepak Sridhar,Yunsheng Li,Nuno Vasconcelos
http://arxiv.org/abs/2312.00412v1
Compressor summary: The paper proposes SCHEME, a method to improve Vision Transformers by using sparse feature mixing and a channel covariance attention mechanism, leading to better performance with fewer computations.
Yeshuo Shu,Gangcheng Zhang,Keyi Liu,Jintong Tang,Liyan Xu
http://arxiv.org/abs/2312.00411v1
Compressor summary: The study presents a method to extract high-order features from human mobility data and cluster users into different lifestyle profiles based on their movement patterns, time series, and place semantics.
Kai Lv,Shuo Zhang,Tianle Gu,Shuhao Xing,Jiawei Hong,Keyu Chen,Xiaoran Liu,Yuqing Yang,Honglin Guo,Tengxiao Liu,Yu Sun,Qipeng Guo,Hang Yan,Xipeng Qiu
http://arxiv.org/abs/2312.00407v1
Compressor summary: CoLLiE is an efficient library that enables collaborative training of large language models with various optimizers and fine-tuning methods, offering efficiency, ease of use, and customization.
Hyunju Kim,Heesuk Son,Dongman Lee
http://arxiv.org/abs/2312.00404v1
Compressor summary: The paper proposes an efficient group activity recognition scheme using causality patterns extracted from pervasive sensor data without user identification, achieving high accuracy and low runtime overhead in real environments.
Yaoyao Zhong,Mengshi Qi,Rui Wang,Yuhan Qiu,Yang Zhang,Huadong Ma
http://arxiv.org/abs/2312.00401v1
Compressor summary: The paper introduces VIoTGPT, a framework that uses large language models to interact with humans, query knowledge from videos, and invoke vision models for complex tasks in the Video Internet of Things.
Sida Li,Ioana Marinescu,Sebastian Musslick
http://arxiv.org/abs/2312.00396v1
Compressor summary: The paper proposes a new method (GFN-SR) for symbolic regression using deep learning and stochastic policy learning, which performs better than existing methods in noisy data scenarios.
Kshitij Deshpande,Varad Mashalkar,Kaustubh Mhaisekar,Amaan Naikwadi,Archana Ghotkar
http://arxiv.org/abs/2312.00392v1
Compressor summary: This paper surveys the application, methodology, data sources, and challenges of gesture recognition systems in various sectors and compares different techniques for capturing gestures.
Junchen Zhao,Yurun Song,Simeng Liu,Ian G. Harris,Sangeetha Abdu Jyothi
http://arxiv.org/abs/2312.00388v1
Compressor summary: LinguaLinked is a system that enables efficient distributed inference of large language models on mobile devices by optimizing model assignment, data transmission, and runtime load balancing.
Georgios Makridis,Vasileios Koukos,Georgios Fatouros,Dimosthenis Kyriazis
http://arxiv.org/abs/2312.00380v1
Compressor summary: The paragraph introduces a comprehensive framework that combines various XAI techniques to interpret models trained on trajectory data and improve the understanding of model decisions for different user demographics.
Noga Alon,Dmitrii Avdiukhin,Dor Elboim,Orr Fischer,Grigory Yaroslavtsev
http://arxiv.org/abs/2312.00379v1
Compressor summary: The paper studies how many labeled examples are needed to learn good representations using contrastive learning, and gives optimal bounds for various distance functions.
Fangxin Shang,Jie Fu,Yehui Yang,Lei Ma
http://arxiv.org/abs/2312.00377v1
Compressor summary: The SynFundus-1M dataset provides over 1 million realistic synthetic retinal fundus images with annotations for medical imaging research, sidestepping privacy restrictions, with models trained on it outperforming existing methods.
Yunjie Wu,Yapeng Meng,Zhipeng Hu,Lincheng Li,Haoqian Wu,Kun Zhou,Weiwei Xu,Xin Yu
http://arxiv.org/abs/2312.00375v1
Compressor summary: The paper presents a text-guided framework for generating and editing 3D faces using geometry-texture decoupling and diffusion models, with improved quality and consistency.
Alexey V. Calabourdin,Konstantin A. Aksenov
http://arxiv.org/abs/2312.00373v1
Compressor summary: The authors present an online learning method for hierarchical Bayesian models, a generalized fat-tailed LTV model, and its application to commercial LTV data.
Jiayi Li,Rohan Taori,Tatsunori B. Hashimoto
http://arxiv.org/abs/2312.00364v1
Compressor summary: Active learning's effectiveness on large real-world datasets is underexplored; existing research mostly ignores multi-domain data, a gap our new benchmark and dataset aim to address.
Ziyu Wang,Yue Xu,Cewu Lu,Yong-Lu Li
http://arxiv.org/abs/2312.00362v1
Compressor summary: The authors present a new method for efficient machine learning with videos by disentangling static and dynamic information using still images and a learnable memory block.
Shaohua Dong,Yunhe Feng,Qing Yang,Yan Huang,Dongfang Liu,Heng Fan
http://arxiv.org/abs/2312.00360v1
Compressor summary: The paper introduces DPLNet, a simple and efficient network for multimodal semantic segmentation that adapts a frozen pre-trained RGB model with two prompt learning modules.
Yefan Zhou,Tianyu Pang,Keqin Liu,Charles H. Martin,Michael W. Mahoney,Yaoqing Yang
http://arxiv.org/abs/2312.00359v1
Compressor summary: TempBalance is a layer-wise learning rate method based on Heavy-Tailed Self-Regularization Theory, which improves performance in neural network training by balancing temperature across layers.
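For context, Heavy-Tailed Self-Regularization diagnostics typically fit a power-law exponent to each layer's weight-matrix eigenvalue spectrum and adjust training accordingly. The sketch below uses the standard Hill estimator and a simple relative scaling rule as an illustrative stand-in, not the paper's actual procedure:

```python
import numpy as np

def hill_alpha(eigs, k=None):
    """Hill estimator of the power-law tail exponent of an eigenvalue
    spectrum; larger alpha means a lighter tail."""
    eigs = np.sort(np.asarray(eigs, dtype=float))[::-1]  # descending order
    k = k or max(2, len(eigs) // 2)                      # tail size
    return 1.0 + k / np.sum(np.log(eigs[:k] / eigs[k - 1]))

def layerwise_lrs(base_lr, spectra):
    """Scale each layer's learning rate by its tail exponent relative to
    the mean, so heavier-tailed (smaller-alpha) layers train more slowly."""
    alphas = np.array([hill_alpha(s) for s in spectra])
    return base_lr * alphas / alphas.mean()
```

The scaling preserves the average learning rate across layers while redistributing it according to each layer's fitted exponent.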
Pei-Chi Lo,Yi-Hang Tsai,Ee-Peng Lim,San-Yih Hwang
http://arxiv.org/abs/2312.00353v1
Compressor summary: The paper studies how large language models use their pre-trained knowledge graphs for reasoning tasks and identifies two types of hallucinations that may occur.
Haokun Chen,Xu Yang,Yuhang Huang,Zihan Wu,Jing Wang,Xin Geng
http://arxiv.org/abs/2312.00351v1
Compressor summary: The paper proposes two strategies to improve In-context Learning for Vision-Language Models by manipulating the label space of in-context examples, leading to better classification performance on various datasets.
Julian Michael
http://arxiv.org/abs/2312.00349v1
Compressor summary: The author proposes a method to develop scalable theories of linguistic structure using machine learning and Question-Answer driven Semantic Role Labeling, aiming to contribute to intelligible AI systems.
Xiao Wang,Yaoyu Li,Tian Gan,Zheng Zhang,Jingjing Lv,Liqiang Nie
http://arxiv.org/abs/2312.00347v1
Compressor summary: The RTQ framework tackles challenges in video-language understanding by refining information, modeling temporal relations, and querying task-specific details, achieving high performance without pre-training.
Xianda Guo,Juntao Lu,Chenming Zhang,Yiqi Wang,Yiqun Duan,Tian Yang,Zheng Zhu,Long Chen
http://arxiv.org/abs/2312.00343v1
Compressor summary: The paper introduces OpenStereo, a comprehensive and efficient stereo matching toolbox with over 12 network models, and evaluates its performance on the SceneFlow dataset.
Dohyeong Kim,Songhwai Oh
http://arxiv.org/abs/2312.00342v1
Compressor summary: The paper presents off-policy TRC, an RL method with CVaR constraints that uses surrogate functions to reduce estimation error and adapts trust-region constraint to ensure policy stability in complex environments.
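For reference, the CVaR risk measure behind such constraints is the expected cost over the worst alpha-fraction of outcomes; a minimal, generic sketch (not the paper's implementation, names are illustrative):

```python
import numpy as np

def cvar(costs, alpha=0.1):
    """Conditional Value at Risk at level alpha: the mean cost of the
    worst alpha-fraction of sampled outcomes."""
    costs = np.sort(np.asarray(costs, dtype=float))
    k = max(1, int(np.ceil(alpha * len(costs))))
    return costs[-k:].mean()  # average over the k largest costs
```

A CVaR-constrained policy search then requires something like `cvar(trajectory_costs, alpha) <= d` for a chosen risk threshold `d`.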
Shilin Qu,Weiqing Wang,Yuan-Fang Li,Xin Zhou,Fajie Yuan
http://arxiv.org/abs/2312.00336v1
Compressor summary: The paragraph describes a novel one-stage message passing paradigm for hypergraph node representation learning that combines Transformers and hypergraph Laplacian to model both global and local information, achieving state-of-the-art results on semi-supervised hypernode classification.
Ziyu Zhou,Haozhe Luo,Jiaxuan Pang,Xiaowei Ding,Michael Gotway,Jianming Liang
http://arxiv.org/abs/2312.00335v1
Compressor summary: PEAC is a new self-supervised learning approach for medical images that leverages anatomical consistency to improve performance and interpretability in various downstream tasks.
Yuyi Mao,Xianghao Yu,Kaibin Huang,Ying-Jun Angela Zhang,Jun Zhang
http://arxiv.org/abs/2312.00333v1
Compressor summary: The text discusses how artificial intelligence technologies are becoming essential across industries, but notes that their resource-intensive nature and appetite for data pose challenges for edge AI on wireless networks near end-user devices, motivating an energy-conscious approach to ensure optimal performance.
Peng Wang
http://arxiv.org/abs/2312.00332v1
Compressor summary: The paper proposes a method for matching weak informative ontologies (WIOs) using semantic subgraphs and a similarity propagation model that balances efficiency and quality in ontology matching.
Gongye Liu,Menghan Xia,Yong Zhang,Haoxin Chen,Jinbo Xing,Xintao Wang,Yujiu Yang,Ying Shan
http://arxiv.org/abs/2312.00330v1
Compressor summary: StyleCrafter is a method that improves text-to-video models to generate diverse, stylized videos by using a style control adapter trained with style-rich images and a decoupling learning strategy.
Zhangcheng Qiang,Weiqing Wang,Kerry Taylor
http://arxiv.org/abs/2312.00326v1
Compressor summary: The paper introduces a novel agent-powered language model approach for ontology matching, which improves performance on complex and few-shot tasks compared to existing systems.
Dengbo Li,Jieren Cheng,Boyi Liu
http://arxiv.org/abs/2312.00316v1
Compressor summary: The paper proposes a new way to control camera movement in self-driving cars using neural networks, which performs better by offloading some tasks to a remote server.
Seyedalireza Khoshsirat,Chandra Kambhamettu
http://arxiv.org/abs/2312.00313v1
Compressor summary: The paper introduces a novel method to use the James-Stein estimator in deep learning normalization layers, which improves mean and variance estimation and enhances computer vision task performance without extra computational cost.
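As a reference point, the classic positive-part James-Stein estimator shrinks a noisy d-dimensional mean estimate toward zero; the sketch below shows the textbook formula only, not the paper's in-layer integration:

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein shrinkage of a noisy mean estimate
    x ~ N(theta, sigma2 * I), valid for dimension d >= 3."""
    x = np.asarray(x, dtype=float)
    d = x.size
    shrink = max(0.0, 1.0 - (d - 2) * sigma2 / np.dot(x, x))
    return shrink * x
```

The key property is that the estimator shrinks small-norm estimates strongly and leaves large-norm estimates nearly untouched, which dominates the plain sample mean in squared error for d >= 3.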
Yiming Zhao,Tao Zhou,Yunqi Gu,Yi Zhou,Yizhe Zhang,Ye Wu,Huazhu Fu
http://arxiv.org/abs/2312.00312v1
Compressor summary: The paper proposes SAM-CLNet, a novel method for scribble-supervised polyp segmentation that uses a collaborative learning process between the segmentation network and SAM to boost performance, outperforming existing methods.
Zidu Wang,Xiangyu Zhu,Tianshuo Zhang,Baiqin Wang,Zhen Lei
http://arxiv.org/abs/2312.00311v1
Compressor summary: The paper introduces Part Re-projection Distance Loss (PRDL), a method that uses facial part segmentation geometry to improve 3D face reconstruction with extreme expressions, outperforming renderer-based methods in various experiments.
Longfeng Nie,Yuntian Chen,Mengge Du,Changqi Sun,Dongxiao Zhang
http://arxiv.org/abs/2312.00308v1
Compressor summary: The text describes a new cloud-type identification system called CldNet that uses satellite data and improves the accuracy of identifying different cloud types.
Niranjan Rajesh,Debayan Gupta
http://arxiv.org/abs/2312.00304v1
Compressor summary: DPT is a curriculum-based pre-training approach for deep neural networks that teaches basic features like edges and shapes, inspired by human visual development, to address data scarcity issues in object recognition tasks.
Biqian Cheng,Evangelos E. Papalexakis,Jia Chen
http://arxiv.org/abs/2312.00296v1
Compressor summary: The proposed ACCA method aligns multiple data perspectives and embeds them in a correlated latent space using an iterative approach.
Baohua Zhang,Yongyi Huang,Wenyao Cui,Huaping Zhang,Jianyun Shang
http://arxiv.org/abs/2312.00293v1
Compressor summary: The paper proposes PsyAttention, a method for personality detection that adapts different psychological models, encodes features more effectively, and achieves higher accuracy than existing methods.
Anku Rani,Dwip Dalal,Shreya Gautam,Pankaj Gupta,Vinija Jain,Aman Chadha,Amit Sheth,Amitava Das
http://arxiv.org/abs/2312.00292v1
Compressor summary: This study proposes a novel framework using NLP techniques to detect lies of omission in deception, and analyzes their relationship with propaganda techniques.
Peetak P. Mitra,Vivek Ramavajjala
http://arxiv.org/abs/2312.00290v1
Compressor summary: The paper proposes a two-stage method that enables adding new diagnostic variables to a weather prediction model without retraining the entire model, using an autoencoder to embed prognostic variables into a latent space and then training downstream models on these representations.
Xingqiu He,Chaoqun You,Tony Q. S. Quek
http://arxiv.org/abs/2312.00279v1
Compressor summary: The text discusses a new definition of Age of Information for mobile edge computing applications, which can be minimized using reinforcement learning algorithms with post-decision states to improve performance and efficiency.
Lei Sha,Thomas Lukasiewicz
http://arxiv.org/abs/2312.00277v1
Compressor summary: The paper proposes a semi-supervised contrastive learning method for disentangling attributes in text without changing content, which improves on previous methods by using a closed-loop process and reducing computation cost.
Kazuki Irie,Róbert Csordás,Jürgen Schmidhuber
http://arxiv.org/abs/2312.00276v1
Compressor summary: The text proposes Automated Continual Learning (ACL), a method that trains neural networks to meta-learn their own algorithms for preventing catastrophic forgetting in changing environments, and shows its effectiveness on various image classification tasks.
Teo Susnjak,Elise Griffin,Mitchell McCutcheon,Kathleen Potter
http://arxiv.org/abs/2312.00271v1
Compressor summary: The researchers developed an interpretable machine learning survival model for elderly aged care residents, which can predict 6-month survival probabilities based on various factors like age, gender, health status, and more.
Hayden Moore
http://arxiv.org/abs/2312.00269v1
Compressor summary: The paper proposes synchronizing robust data operations and model fine-tuning driven by uncertainty quantification (UQ) to improve adaptability of computer vision (CV) systems in command and control (C2) at the tactical edge.
Hugo Jair Escalante,Aleksandra Kruchinina
http://arxiv.org/abs/2312.00268v1
Compressor summary: Academic challenges in machine learning and related fields advance research, highlight specific topics and problems, and promote diversity in participation and access to research.
Viraj Mehta,Vikramjeet Das,Ojash Neopane,Yijia Dai,Ilija Bogunovic,Jeff Schneider,Willie Neiswanger
http://arxiv.org/abs/2312.00267v1
Compressor summary: The paper proposes an algorithm that actively selects the contexts on which to collect human feedback in reinforcement learning, improving performance and reducing sample cost.