This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-21, generated by the compressor, my personal LLM-based project.
Quan Sun,Yufeng Cui,Xiaosong Zhang,Fan Zhang,Qiying Yu,Zhengxiong Luo,Yueze Wang,Yongming Rao,Jingjing Liu,Tiejun Huang,Xinlong Wang
http://arxiv.org/abs/2312.13286v1
Compressor summary: Emu2 is a 37-billion-parameter generative multimodal model that excels at solving various multimodal tasks with minimal input using in-context learning and reasoning.
Fangjinhua Wang,Marie-Julie Rakotosaona,Michael Niemeyer,Richard Szeliski,Marc Pollefeys,Federico Tombari
http://arxiv.org/abs/2312.13285v1
Compressor summary: UniSDF is a 3D reconstruction method that accurately models complex scenes with reflections using blended representation techniques and a multi-resolution grid backbone.
Pierluigi Zama Ramirez,Luca De Luigi,Daniele Sirocchi,Adriano Cardace,Riccardo Spezialetti,Francesco Ballerini,Samuele Salti,Luigi Di Stefano
http://arxiv.org/abs/2312.13277v1
Compressor summary: The paper introduces nf2vec, a framework that converts Neural Fields representing 3D data into compact embeddings for use in deep learning tasks.
Junwu Zhang,Zhenyu Tang,Yatian Pang,Xinhua Cheng,Peng Jin,Yida Wei,Wangbo Yu,Munan Ning,Li Yuan
http://arxiv.org/abs/2312.13271v1
Compressor summary: Repaint123 improves 3D image generation by combining a 2D diffusion model, repainting strategy, and visibility-aware adaptive strength for consistent multi-view images with fine textures and fast speed.
Zixiang Wei,Yiting Wang,Lichao Sun,Athanasios V. Vasilakos,Lin Wang
http://arxiv.org/abs/2312.13265v1
Compressor summary: ClassLIE is a novel framework that combines CNNs and transformers to enhance low-light images by classifying and adaptively learning structural and illumination information, achieving state-of-the-art performance.
Pablo M. Rodriguez Bertorello,Jean Rodmond Junior Laguerre
http://arxiv.org/abs/2312.13264v1
Compressor summary: dIR is a novel approach that lets users query structured and unstructured data in natural language, converting the queries into SQL for efficient retrieval.
Rajesh Shrestha,Bowen Xie
http://arxiv.org/abs/2312.13253v1
Compressor summary: This paper proposes methods to speed up conditional image generation using pre-trained diffusion models and reduce the need for training and computational resources.
Saurabh Saxena,Junhwa Hur,Charles Herrmann,Deqing Sun,David J. Fleet
http://arxiv.org/abs/2312.13252v1
Compressor summary: The paper proposes DMD, a diffusion model that uses log-scale depth parameterization, FOV conditioning, and synthetic data augmentation to achieve significant improvements in zero-shot metric depth estimation for indoor and outdoor scenes.
Amit Rozner,Barak Battash,Ofir Lindenbaum,Lior Wolf
http://arxiv.org/abs/2312.13240v1
Compressor summary: The paper proposes a novel face verification method that uses a hypernetwork to generate efficient neural models for personalized face identification, achieving state-of-the-art results with fewer parameters and computational cost.
Subham Sekhar Sahoo,Aaron Gokaslan,Chris De Sa,Volodymyr Kuleshov
http://arxiv.org/abs/2312.13236v1
Compressor summary: The paper introduces MuLAN, a learned diffusion process for image synthesis that adapts noise levels across an image, improving performance on density estimation tasks.
Christian A. Scholbeck,Julia Moosbauer,Giuseppe Casalicchio,Hoshin Gupta,Bernd Bischl,Christian Heumann
http://arxiv.org/abs/2312.13234v1
Compressor summary: The authors propose that sensitivity analysis, a method used to explain complex systems in various fields, can also be used to interpret machine learning models and highlight the benefits of this unified view for both researchers and practitioners.
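To make the connection concrete, below is a minimal Python sketch of one-at-a-time sensitivity analysis applied to a black-box predictor. It is a generic illustration of the idea rather than the unified framework the paper develops, and the function names are mine.

```python
import numpy as np

def sensitivity_scores(predict, x, delta=1e-2):
    """One-at-a-time sensitivity: change in prediction per unit change in each input."""
    base = predict(x)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] += delta
        scores[i] = (predict(x_pert) - base) / delta
    return scores

# Toy check: for a linear model the sensitivities recover the coefficients.
coeffs = np.array([2.0, -1.0, 0.5])
linear_model = lambda v: float(coeffs @ v)
print(sensitivity_scores(linear_model, np.array([1.0, 1.0, 1.0])))  # ~ [ 2.  -1.   0.5]
```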
Shiu-hong Kao,Jierun Chen,S. H. Gary Chan
http://arxiv.org/abs/2312.13223v1
Compressor summary: StableKD is a novel knowledge distillation framework that achieves more stable optimization and boosts model accuracy by breaking the Inter-Block Optimization Entanglement phenomenon using Decomposition and Recomposition operations.
Jean V. Alves,Diogo Leitão,Sérgio Jesus,Marco O. P. Sampaio,Pedro Saleiro,Mário A. T. Figueiredo,Pedro Bizarro
http://arxiv.org/abs/2312.13218v1
Compressor summary: The paper introduces FiFAR, a synthetic dataset for learning to defer algorithms in financial fraud detection, which considers human work capacity constraints and allows for benchmarking of hybrid human-AI decision systems.
Octave Mariotti,Oisin Mac Aodha,Hakan Bilen
http://arxiv.org/abs/2312.13216v1
Compressor summary: The authors propose a new self-supervised method for semantic correspondence estimation that uses weak 3D understanding with a spherical prior, improving performance on challenging image characteristics like symmetries and repeated parts.
Rahul Chand,Yashoteja Prabhu,Pratyush Kumar
http://arxiv.org/abs/2312.13211v1
Compressor summary: DSFormer is a novel weight factorization method for compressing transformer models in natural language understanding that improves efficiency-accuracy trade-off by using a semi-structured sparse matrix and a task-aware learning algorithm.
Yingji Zhang,Danilo S. Carvalho,Ian Pratt-Hartmann,André Freitas
http://arxiv.org/abs/2312.13208v1
Compressor summary: LlaMaVAE combines a sentence encoder (sentenceT5) with a language model (LlaMA) and a VAE to improve text generation control and performance on various tasks compared to previous models.
Neeraj Kumar Singh,Koyel Ghosh,Joy Mahapatra,Utpal Garain,Apurbalal Senapati
http://arxiv.org/abs/2312.13193v1
Compressor summary: The paper proposes HCDIR, an end-to-end model for detecting hateful comments and reducing their intensity in social media posts, focusing on low-resource languages like Indian languages.
Arshad Kaji,Manan Shah
http://arxiv.org/abs/2312.13179v1
Compressor summary: Large language models perform well on many tasks but struggle with code-switching in machine translation due to their training methods.
Zhuangzhuang Jia,Grani A. Hanasusanto,Phebe Vayanos,Weijun Xie
http://arxiv.org/abs/2312.13173v1
Compressor summary: The paper proposes a framework for learning fair and interpretable policies for multi-stage selection problems from observational data; it handles various fairness constraints and linear selection rules, improving precision and reducing unfairness compared to the existing policy.
Erez Peterfreund,Iryna Burak,Ofir Lindenbaum,Jim Gimlett,Felix Dietrich,Ronald R. Coifman,Ioannis G. Kevrekidis
http://arxiv.org/abs/2312.13155v1
Compressor summary: The paper proposes a neural network pipeline that fuses partial and heterogeneous measurements from different sensors by using multiple slightly perturbed instances to estimate local distortion and create a consistent latent space.
Zhongchang Sun,Yousef El-Laham,Svitlana Vyetrenko
http://arxiv.org/abs/2312.13152v1
Compressor summary: The paper presents a change point detection algorithm for time series modeled as neural SDEs, jointly learning the change points and the SDE parameters with GANs and outperforming classical benchmarks, standard GAN-based neural SDEs, and other deep generative models.
Stanislaw Szymanowicz,Christian Rupprecht,Andrea Vedaldi
http://arxiv.org/abs/2312.13150v1
Compressor summary: Splatter Image is a fast and accurate monocular 3D object reconstruction method based on Gaussian Splatting and neural networks.
Yousef El-Laham,Elizabeth Fons,Dillon Daudert,Svitlana Vyetrenko
http://arxiv.org/abs/2312.13141v1
Compressor summary: UMAP Mixup is a new data augmentation technique for deep learning models that uses uniform manifold approximation and projection to create synthetic samples on the data manifold, improving generalization performance in various regression tasks.
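For reference, here is a minimal sketch of plain mixup for regression; UMAP Mixup differs in that the interpolated samples are additionally constrained to lie on the data manifold learned by UMAP, which this sketch does not attempt. The function name is mine.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=np.random.default_rng(0)):
    """Vanilla mixup: a convex combination of two samples and their regression targets."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x_aug, y_aug = mixup(np.array([0.0, 1.0]), 1.0, np.array([1.0, 0.0]), 0.0)
print(x_aug, y_aug)
```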
Edoardo Debenedetti,Zishen Wan,Maksym Andriushchenko,Vikash Sehwag,Kshitij Bhardwaj,Bhavya Kailkhura
http://arxiv.org/abs/2312.13131v1
Compressor summary: The paper derives scaling laws for adversarial robustness and analyzes the impact of computing power, model size, and training techniques on performance improvements.
Haili Ye,Xiaoqing Zhang,Yan Hu,Huazhu Fu,Jiang Liu
http://arxiv.org/abs/2312.13116v1
Compressor summary: The paper proposes a novel network (VSR-Net) to improve segmentation of vessel-like structures in medical images by rehabilitating subsection ruptures and calibrating model predictions.
Oguzhan Ulucan,Diclehan Ulucan,Marc Ebner
http://arxiv.org/abs/2312.13114v1
Compressor summary: The text discusses how analyzing color illusions can help improve computational color constancy methods, enabling them to estimate light sources in scenes with multiple illuminants.
Daiki Koge,Naoaki Ono,Shigehiko Kanaya
http://arxiv.org/abs/2312.13110v1
Compressor summary: The text introduces Boltzmann GNN, a pre-training method for molecular GNNs that generates latent vectors for multiple conformations from 2D molecular graphs, outperforming existing methods.
Difei Gao,Lei Ji,Zechen Bai,Mingyu Ouyang,Peiran Li,Dongxing Mao,Qinchen Wu,Weichen Zhang,Peiyi Wang,Xiangwu Guo,Hengxu Wang,Luowei Zhou,Mike Zheng Shou
http://arxiv.org/abs/2312.13108v1
Compressor summary: The paper introduces AssistGUI, a new benchmark to test AI agents' abilities to automate complex tasks on Windows using mouse and keyboard input, and proposes an improved framework that performs better than existing methods but still has room for improvement.
Sushil Sharma,Aryan Singh,Ganesh Sistu,Mark Halton,Ciarán Eising
http://arxiv.org/abs/2312.13104v1
Compressor summary: The authors propose using Graph Neural Networks and Bird's Eye View perspectives for future trajectory prediction in autonomous driving systems, improving on traditional DNN-based methods.
Jinge Wu,Yunsoo Kim,Eva C. Keller,Jamie Chow,Adam P. Levine,Nikolas Pontikos,Zina Ibrahim,Paul Taylor,Michelle C. Williams,Honghan Wu
http://arxiv.org/abs/2312.13103v1
Compressor summary: The paper presents an LLM-based assistant for radiologists that helps them find errors in their reports and outperforms other models and clinicians on a SIMPLE error-checking task.
Li Ma,Vasu Agrawal,Haithem Turki,Changil Kim,Chen Gao,Pedro Sander,Michael Zollhöfer,Christian Richardt
http://arxiv.org/abs/2312.13102v1
Compressor summary: The paper proposes a learnable Gaussian directional encoding to better model glossy surfaces' appearance under near-field lighting and introduces a data-driven geometry prior to improve specular reflection modeling in neural radiance fields.
William Heyden,Habib Ullah,M. Salman Siddiqui,Fadi Al Machot
http://arxiv.org/abs/2312.13100v1
Compressor summary: The paper introduces a dual strategy for GZSL that enhances semantic information with an innovative encoder and refines generative capabilities with a novel loss function, improving generalization and performance in diverse settings.
Elizaveta Kuznetsova,Mykola Makhortykh,Victoria Vziatysheva,Martha Stolze,Ani Baghumyan,Aleksandra Urman
http://arxiv.org/abs/2312.13096v1
Compressor summary: The article compares two AI-based chatbots' ability to detect true and false statements about political topics in different languages and finds that ChatGPT performs better than Bing Chat.
Abdallah Dib,Luiz Gustavo Hafemann,Emeline Got,Trevor Anderson,Amin Fadaeinejad,Rafael M. O. Cruz,Marc-Andre Carbonneau
http://arxiv.org/abs/2312.13091v1
Compressor summary: MoSAR is a method to generate 3D avatars from monocular images using semi-supervised learning and differentiable shading, producing realistic and relightable results.
Joseph Heyward,João Carreira,Dima Damen,Andrew Zisserman,Viorica Pătrăucean
http://arxiv.org/abs/2312.13090v1
Compressor summary: The First Perception Test challenge assessed various video models on seven tasks involving different modalities at the ICCV 2023 conference.
Alexandra Zytek,Wei-En Wang,Dongyu Liu,Laure Berti-Equille,Kalyan Veeramachaneni
http://arxiv.org/abs/2312.13084v1
Compressor summary: Pyreal is a system that helps users create understandable explanations for machine learning predictions using Python.
Sushil Sharma,Arindam Das,Ganesh Sistu,Mark Halton,Ciarán Eising
http://arxiv.org/abs/2312.13081v1
Compressor summary: The paper proposes BEVSeg2TP, a system that predicts the ego vehicle's future trajectory using semantic segmentation of objects in surround-view camera images and a spatiotemporal probabilistic network.
Xingyilang Yin,Xi Yang,Liangchen Liu,Nannan Wang,Xinbo Gao
http://arxiv.org/abs/2312.13071v1
Compressor summary: PDNet is a new MLP-based network that uses Point Deformable Aggregation Module (PDAM) to capture long-range dependencies in point cloud analysis by aggregating information from adaptive deformable reference points, improving representation capability and performance.
Abdulkadir Celikkanat,Nikolaos Nakis,Morten Mørup
http://arxiv.org/abs/2312.13068v1
Compressor summary: The text proposes GraSSP, a novel stochastic process using survival functions to model intermittent edge-persistent networks, improving representation learning for evolving networks.
Yue-Jiang Dong,Yuan-Chen Guo,Ying-Tian Liu,Fang-Lue Zhang,Song-Hai Zhang
http://arxiv.org/abs/2312.13066v1
Compressor summary: PPEA-Depth is a method to improve self-supervised depth estimation in dynamic scenes by transferring knowledge from pre-trained image models using compact encoder and decoder adapters.
Jordan Vice,Naveed Akhtar,Richard Hartley,Ajmal Mian
http://arxiv.org/abs/2312.13053v1
Compressor summary: The paper proposes a method to evaluate biases in text-to-image models without preconceived notions, using three metrics and testing on various scenarios.
Weixuan Wang,Barry Haddow,Alexandra Birch
http://arxiv.org/abs/2312.13040v1
Compressor summary: ReMaKE is a method to update LLMs' knowledge in multilingual settings using retrieved information from a multilingual database.
Raphael Fischer,Amal Saadallah
http://arxiv.org/abs/2312.13038v1
Compressor summary: AutoXPCR is a novel AutoML method that selects and explains DNNs for time series forecasting based on predictive quality, complexity, and resource consumption.
Weigang Lu,Ziyu Guan,Wei Zhao,Long Jin
http://arxiv.org/abs/2312.13032v1
Compressor summary: The paper introduces NodeMixup, a method to improve graph neural networks' performance by addressing the under-reaching issue caused by uneven labeled node distribution in graphs.
Zijian Li,Zhihui Wang
http://arxiv.org/abs/2312.13031v1
Compressor summary: DP-SACTGAN is a new framework for creating private and accurate tabular data using GANs.
Byung Hyun Lee,Min-hwan Oh,Se Young Chun
http://arxiv.org/abs/2312.13027v1
Compressor summary: The paper proposes DPCL, a novel framework for task-free continual learning that uses input and decision-making perturbations to prevent catastrophic forgetting and improve plasticity.
Yuming Gu,Hongyi Xu,You Xie,Guoxian Song,Yichun Shi,Di Chang,Jing Yang,Lingjie Luo
http://arxiv.org/abs/2312.13016v1
Compressor summary: DiffPortrait3D is a conditional diffusion model that synthesizes 3D-consistent, photo-realistic novel views from a single portrait, combining the generative prior of 2D diffusion models with disentangled attentive control of appearance and camera pose and a cross-view attention module, achieving state-of-the-art results on in-the-wild and multi-view benchmarks.
Dong Huang,Qingwen Bu,Jie M. Zhang,Michael Luck,Heming Cui
http://arxiv.org/abs/2312.13010v1
Compressor summary: AgentCoder is a novel multi-agent framework that improves code generation by collaboratively generating test cases, executing them, and providing feedback to the programmer agent.
Ishan Rajendrakumar Dave,Simon Jenni,Mubarak Shah
http://arxiv.org/abs/2312.13008v1
Compressor summary: The authors propose a new frame-level temporal self-supervision method for videos that improves feature learning and generalization performance on various tasks.
Jiaxi Cui,Liuzhenghao Lv,Jing Wen,Jing Tang,YongHong Tian,Li Yuan
http://arxiv.org/abs/2312.12999v1
Compressor summary: The paper introduces Machine Mindset, a method to integrate MBTI personality traits into large language models for personalized AI applications.
Bruno Arcanjo,Bruno Ferrarini,Maria Fasli,Michael Milford,Klaus D. McDonald-Maier,Shoaib Ehsan
http://arxiv.org/abs/2312.12995v1
Compressor summary: RegionDrosoNet is a novel multi-DrosoNet system that achieves improved visual place recognition performance while maintaining low computational requirements by specializing DrosoNets on different partitions of the image and using a voting module to combine outputs.
Emily Groves,Minhong Wang,Yusuf Abdulle,Holger Kunz,Jason Hoelscher-Obermaier,Ronin Wu,Honghan Wu
http://arxiv.org/abs/2312.12989v1
Compressor summary: The study compares NLP paradigms for biomedical ontology curation and shows that in-context learning (ICL) with GPT-4 excels in tasks requiring less data, while fine-tuning (FT) and supervised learning (ML) perform better with more data.
Dhawal Gupta,Scott M. Jordan,Shreyas Chaudhari,Bo Liu,Philip S. Thomas,Bruno Castro da Silva
http://arxiv.org/abs/2312.12972v1
Compressor summary: The paper proposes a bidirectional value function that considers both past and future rewards to improve credit assignment and policy evaluation in reinforcement learning.
Junjie Gao,Pengfei Wang,Qiujie Dong,Qiong Zeng,Shiqing Xin,Caiming Zhang
http://arxiv.org/abs/2312.12970v1
Compressor summary: D3Former is a new point cloud matching method that jointly learns repeatable keypoint detectors and feature-enhanced descriptors, improving accuracy on indoor and outdoor benchmarks.
Thibaud Ehret,Roger Marí,Dawa Derksen,Nicolas Gasnier,Gabriele Facciolo
http://arxiv.org/abs/2312.12961v1
Compressor summary: The paper introduces "radar fields", an extension of radiance fields to radar images, enabling surface modeling from radar image collections and hybrid methods with optical images.
Eerik Alamikkotervo,Risto Ojala,Alvari Seppänen,Kari Tammi
http://arxiv.org/abs/2312.12954v1
Compressor summary: TADAP is a method for automatically labeling drivable areas in winter conditions using satellite trajectories and pre-trained visual features, which improves self-supervised driving detection by 9.6%.
Hamidreza Gholamrezaei,Alireza Koochali,Andreas Dengel,Sheraz Ahmed
http://arxiv.org/abs/2312.12946v1
Compressor summary: SNS-GAN is a new generative model that embeds class labels in the noise space and performs well in both image and time series data generation.
The Tien Mai
http://arxiv.org/abs/2312.12945v1
Compressor summary: The study analyzes the prediction error of two logistic regression methods for 1-bit matrix completion and shows that nuclear-norm penalization achieves the optimal rate.
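For context, the nuclear-norm-penalized estimator for 1-bit matrix completion is usually written as a penalized logistic log-likelihood; the standard textbook form is below and may differ in details from the exact estimator analyzed in the paper.

\hat{M} \in \arg\min_{M}\; -\frac{1}{|\Omega|}\sum_{(i,j)\in\Omega}\Big[\mathbb{1}\{Y_{ij}=+1\}\log\sigma(M_{ij}) + \mathbb{1}\{Y_{ij}=-1\}\log\big(1-\sigma(M_{ij})\big)\Big] + \lambda\,\lVert M\rVert_{*}

where \sigma(t)=1/(1+e^{-t}) is the logistic link, \Omega is the set of observed entries, and \lVert M\rVert_{*} is the nuclear norm.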
Jonathan Wilton,Nan Ye
http://arxiv.org/abs/2312.12937v1
Compressor summary: The paper studies how to train decision trees with noisy labels using robust loss functions, proposes a framework for constructing them, and introduces a new loss called negative exponential loss.
Eleonora Poeta,Gabriele Ciravegna,Eliana Pastor,Tania Cerquitelli,Elena Baralis
http://arxiv.org/abs/2312.12936v1
Compressor summary: The paper reviews concept-based explainable artificial intelligence (C-XAI) approaches, defines and categorizes them, and suggests evaluation strategies to help advance the field.
Lucia Testa,Claudio Battiloro,Stefania Sardellitti,Sergio Barbarossa
http://arxiv.org/abs/2312.12934v1
Compressor summary: This paper investigates how changing a small number of edges in a graph affects the stability of Graph Convolutional Neural Networks (GCNs) and provides a way to measure and analyze this effect.
Yi-Fan Zhang,Zhang Zhang,Liang Wang,Rong Jin
http://arxiv.org/abs/2312.12918v1
Compressor summary: The text explores advanced language models for detecting AI-generated texts across different topics without needing labeled data, addressing challenges in real-world scenarios.
Pan Xie,Taiyi Peng,Yao Du,Qipeng Zhang
http://arxiv.org/abs/2312.12917v1
Compressor summary: The research presents a new method to generate high-quality sign videos from sign glosses using improved 3D VQ-GAN and sequence-to-sequence attention, achieving better results than previous approaches on two datasets.
Shuyuan Wang,Qi Li,Huiyuan Luo,Chengkan Lv,Zhengtao Zhang
http://arxiv.org/abs/2312.12913v1
Compressor summary: POUTA is a novel method for visual anomaly detection that improves accuracy and efficiency by reusing and refining features from a reconstruction-based network.
Pau Torras,Sanket Biswas,Alicia Fornés
http://arxiv.org/abs/2312.12908v1
Compressor summary: The paper proposes a new music representation language, Music Tree Notation (MTN), to enable standardized comparison of Optical Music Recognition systems using a specific set of metrics.
Xiangjuan Li,Feifan Li,Yang Li,Quan Pan
http://arxiv.org/abs/2312.12904v1
Compressor summary: The paper proposes a generative model to create adversarial examples for deep reinforcement learning, measuring stealthiness by action consistency ratio, and showing fast and effective attacks compared to other methods.
William Hill,Ireton Liu,Anita De Mello Koch,Damion Harvey,George Konidaris,Steven James
http://arxiv.org/abs/2312.12891v1
Compressor summary: The authors create a new Minecraft planning benchmark that tests state-of-the-art planners on various challenges and provide a framework for creating new tasks.
Junkang Wu,Jiawei Chen,Jiancan Wu,Wentao Shi,Jizhi Zhang,Xiang Wang
http://arxiv.org/abs/2312.12882v1
Compressor summary: The paper investigates why Softmax loss performs well in recommendation models and proposes a new loss function, Bilateral SoftMax Loss, that improves robustness and fairness on both positive and negative examples.
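As background, the softmax loss that such recommendation models typically optimize is the standard form below; the paper's Bilateral SoftMax Loss modifies this objective, and the exact modification is described in the paper.

\mathcal{L}_{\text{softmax}}(u, i^{+}) = -\log \frac{\exp\big(s(u, i^{+})/\tau\big)}{\sum_{j \in \mathcal{I}} \exp\big(s(u, j)/\tau\big)}

where s(u,i) is the model's user-item score, \tau is a temperature, i^{+} is the observed positive item, and \mathcal{I} is the candidate set (the positive plus sampled negatives).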
Sara El Mekkaoui,Loubna Benabbou,Abdelaziz Berrado
http://arxiv.org/abs/2312.12878v1
Compressor summary: The paper reviews different approaches for extracting rules from feedforward neural networks to improve interpretability in AI systems.
Wenbin Lin,Chengwei Zheng,Jun-Hai Yong,Feng Xu
http://arxiv.org/abs/2312.12877v1
Compressor summary: The text proposes a method to create realistic 3D digital avatars that can adapt to different lighting and poses, using novel techniques for modeling geometry and shadow changes.
Bo Liu,Liqiang Yu,Chang Che,Qunwei Lin,Hao Hu,Xinyu Zhao
http://arxiv.org/abs/2312.12872v1
Compressor summary: The paper analyzes how deep learning and computer vision technologies are integrated for better image classification and object detection, while discussing their limitations and future directions.
Yu Liu,Runzhe Wan,James McQueen,Doug Hains,Jinxiang Gu,Rui Song
http://arxiv.org/abs/2312.12871v1
Compressor summary: The paper proposes two data-driven methods for selecting the assumed effect size in online experiments, which can improve accuracy and efficiency compared to traditional domain knowledge-based methods.
Wenqi Jia,Miao Liu,Hao Jiang,Ishwarya Ananthabhotla,James M. Rehg,Vamsi Krishna Ithapu,Ruohan Gao
http://arxiv.org/abs/2312.12870v1
Compressor summary: The paper introduces Av-CONV, a multi-modal, multi-task framework that predicts exocentric conversational interactions from egocentric videos, outperforming baselines in experiments.
Théo Vincent,Alberto Maria Metelli,Boris Belousov,Jan Peters,Marcello Restelli,Carlo D'Eramo
http://arxiv.org/abs/2312.12869v1
Compressor summary: The paper proposes a new reinforcement learning algorithm that learns an approximate version of the Bellman operator, called the projected Bellman operator (PBO), to improve generalization and avoid explicit projection steps.
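For reference, the optimal Bellman operator that PBO approximates is the standard one below; how the paper parameterizes and learns its approximation so as to avoid the projection step is specific to the paper.

(\mathcal{T}^{*}Q)(s,a) = r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}\Big[\max_{a'} Q(s',a')\Big]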
Ardavan S. Nobandegani,Irina Rish,Thomas R. Shultz
http://arxiv.org/abs/2312.12868v1
Compressor summary: The text studies how trust emerges in human social interactions using reinforcement learning and simulations to analyze the trust game.
Fernando Pérez-García,Sam Bond-Taylor,Pedro P. Sanchez,Boris van Breugel,Daniel C. Castro,Harshita Sharma,Valentina Salvatelli,Maria T. A. Wetscherek,Hannah Richardson,Matthew P. Lungren,Aditya Nori,Javier Alvarez-Valle,Ozan Oktay,Maximilian Ilse
http://arxiv.org/abs/2312.12865v1
Compressor summary: The text proposes using generative image editing with a text-to-image diffusion model to simulate dataset shifts and diagnose failure modes of biomedical vision models, improving their performance and robustness without additional data collection.
Zhecheng Wang,Rajanie Prabha,Tianyuan Huang,Jiajun Wu,Ram Rajagopal
http://arxiv.org/abs/2312.12856v1
Compressor summary: The authors create SkyScript, a large vision-language dataset for remote sensing images, by linking them to OpenStreetMap data using geo-coordinates, and use it to train a versatile vision language model that improves zero-shot scene classification and other tasks.
Dan Shi,Chaobin You,Jiantao Huang,Taihao Li,Deyi Xiong
http://arxiv.org/abs/2312.12853v1
Compressor summary: The paper introduces CORECODE, a dataset for evaluating Chinese language models' commonsense reasoning and conflict detection skills in dialogues, by annotating 19,700 conversations with 76,787 pieces of commonsense knowledge.
Bram Vanroy
http://arxiv.org/abs/2312.12852v1
Compressor summary: The authors introduce two fine-tuned Dutch language models and provide data, benchmarks, and a leaderboard to improve the state of the art in Dutch natural language processing.
Michael Dalvean
http://arxiv.org/abs/2312.12850v1
Compressor summary: The paper uses stochastic methods to rank English place names by their similarity to other languages' place names, helping to determine their origin.
Naiyu Yin,Tian Gao,Yue Yu,Qiang Ji
http://arxiv.org/abs/2312.12844v1
Compressor summary: The paper proposes a novel method for learning causal DAGs that accounts for heteroscedastic noise, which improves accuracy and efficiency over existing methods.
Hannah Blocher,Georg Schollmeyer,Malte Nalenz,Christoph Jansen
http://arxiv.org/abs/2312.12839v1
Compressor summary: The paper proposes a depth function for partial orders and uses it to compare machine learning algorithms on standard data sets, offering a novel analysis approach.
Yuhao Yi,Ronghui You,Hong Liu,Changxin Liu,Yuan Wang,Jiancheng Lv
http://arxiv.org/abs/2312.12835v1
Compressor summary: The paper proposes near-optimal resilient aggregation rules for Byzantine machine learning using outlier-robust clustering, and discusses attacks and a two-phase framework to improve security.
Yiwei Li,Peiwen Yuan,Shaoxiong Feng,Boyuan Pan,Bin Sun,Xinglin Wang,Heda Wang,Kan Li
http://arxiv.org/abs/2312.12832v1
Compressor summary: The authors propose a model specialization framework that uses both positive and negative samples to distill reasoning ability from large language models for arithmetic reasoning tasks.
Yuqi Lin,Minghao Chen,Kaipeng Zhang,Hengjia Li,Mingming Li,Zheng Yang,Dongqin Lv,Binbin Lin,Haifeng Liu,Deng Cai
http://arxiv.org/abs/2312.12828v1
Compressor summary: The paper proposes a local-to-global framework to improve CLIP's multi-label classification performance by preserving patch-wise spatial information and applying it to weakly supervised semantic segmentation.
Yuhui Wu,Guoqing Wang,Zhiwen Wang,Yang Yang,Tianyu Li,Peng Wang,Chongyi Li,Heng Tao Shen
http://arxiv.org/abs/2312.12826v1
Compressor summary: ReCo-Diff is a novel method that uses Retinex theory as a pre-processing condition to improve low-light image enhancement by guiding a conditional diffusion model with feature- and image-level information.
Zhangbin Li,Dan Guo,Jinxing Zhou,Jing Zhang,Meng Wang
http://arxiv.org/abs/2312.12816v1
Compressor summary: The paper proposes a model that uses fine-grained visual objects and multi-modal relations to answer questions from untrimmed audible videos, improving both feature interaction and model optimization.
Luke Yoffe,Aditya Sharma,Tobias Höllerer
http://arxiv.org/abs/2312.12815v1
Compressor summary: The paper presents an open-vocabulary method for placing virtual objects in augmented reality using recent advances in segmentation, vision-language, and LLMs, and shows its performance compared to human experts.
Hiroki Onozeki,Zhiyang Qi,Kazuma Akiyama,Ryutaro Asahara,Takumasa Kaneko,Michimasa Inaba
http://arxiv.org/abs/2312.12808v1
Compressor summary: The paper presents a dialogue system for a travel agency that helps users choose sightseeing plans in Kyoto, using flexible and stable dialogue flow control and motion-speech cues.
Seunghoo Hong,Juhun Lee,Simon S. Woo
http://arxiv.org/abs/2312.12807v1
Compressor summary: The paper proposes a new approach to remove unwanted content from image generation models while maintaining their synthesis quality and user control.
Yan Cai,Linlin Wang,Ye Wang,Gerard de Melo,Ya Zhang,Yanfeng Wang,Liang He
http://arxiv.org/abs/2312.12806v1
Compressor summary: MedBench is a benchmark for Chinese medical language models that assesses their knowledge and reasoning abilities across various domains and findings reveal their strengths and weaknesses.
Bo Yang,Hong Peng,Xiaohui Luo,Jun Wang,Xianzhong Long
http://arxiv.org/abs/2312.12804v1
Compressor summary: The paper proposes a multi-stage attention architecture using NSNP neurons with autapses for breast cancer classification, which improves accuracy and preserves valuable data features.
Sahil Singla,Yifan Wang
http://arxiv.org/abs/2312.12794v1
Compressor summary: The paper studies sequential posted pricing auctions in the bandit learning model, obtaining tight regret bounds for various buyer distributions and showing a new half-concavity property of the revenue function.
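As background, the per-buyer revenue function in posted pricing is the standard one below; the half-concavity property the paper establishes is a new structural result about this function.

\mathrm{rev}(p) = p\,\big(1 - F(p)\big)

where F is the CDF of the buyer's value, i.e., the posted price times the probability that a buyer with value drawn from F accepts it.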
Tianliang Ma,Zhihui Deng,Xuguang Sun,Leilai Shao
http://arxiv.org/abs/2312.12784v1
Compressor summary: The text proposes a machine learning model using graph neural networks for fast and accurate cell library characterization in semiconductor process development, achieving high prediction accuracy and significant speed-up compared to traditional methods.
Mrinal Mathur,Sergey Plis
http://arxiv.org/abs/2312.12781v1
Compressor summary: DynaLay is an adaptive deep learning model that adjusts its computational effort based on input complexity, improving efficiency without sacrificing accuracy.
Carol Anderson,Phil Crone
http://arxiv.org/abs/2312.12773v1
Compressor summary: The paper proposes a novel deep learning model for segmenting marriage announcements from newspapers, a setting where existing text segmentation methods, which work well on narrative texts with distinct topics, struggle because the text is unstructured, the OCR output is noisy, and adjacent segments are similar; the new model beats the state-of-the-art method.
Beibei Jing,Youjia Zhang,Zikai Song,Junqing Yu,Wei Yang
http://arxiv.org/abs/2312.12763v1
Compressor summary: The Adaptable Motion Diffusion (AMD) model uses a Large Language Model to parse text descriptions of complex human motions into anatomical scripts and then synthesizes realistic motion sequences from them.
Wenhao Xu,Rongtao Xu,Changwei Wang,Shibiao Xu,Li Guo,Man Zhang,Xiaopeng Zhang
http://arxiv.org/abs/2312.12754v1
Compressor summary: The paper proposes SPT-SEG, a one-stage approach that uses Spectral Prompt Tuning and Spectral Guided Decoder to improve CLIP's zero-shot pixel-level segmentation performance on unseen classes.
Edmund Mills,Shiye Su,Stuart Russell,Scott Emmons
http://arxiv.org/abs/2312.12747v1
Compressor summary: ALMANACS is a benchmark to measure how well language model explainability methods improve prediction of new inputs on safety-relevant topics, and it finds no existing method outperforms the explanation-free control.
M Tran,C Sun
http://arxiv.org/abs/2312.12746v1
Compressor summary: The study presents an application that uses openFDA data to help caregivers in low-resource settings reduce medical errors and improve patient safety by analyzing prescriptions.
Lipeng Gu,Xuefeng Yan,Liangliang Nan,Dingkun Zhu,Honghua Chen,Weiming Wang,Mingqiang Wei
http://arxiv.org/abs/2312.12743v1
Compressor summary: PointeNet is a lightweight network for efficient point cloud analysis that captures 3D geometries and enhances semantic perception, outperforming existing methods on object-level and scene-level tasks.
Zhaoyang Zhang,Wenqi Shao,Yixiao Ge,Xiaogang Wang,Jinwei Gu,Ping Luo
http://arxiv.org/abs/2312.12742v1
Compressor summary: The Cached Transformer is a new model that improves self-attention with a memory cache, enabling it to handle longer dependencies and perform better on various language and vision tasks.
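Below is a minimal numpy sketch of attending over a memory cache, in the spirit of Transformer-XL-style memory: current queries attend to cached keys/values concatenated with the current segment's. The paper's cache mechanism may be maintained and updated differently; this is only a generic illustration and the function names are mine.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cached_attention(q, k, v, cache_k, cache_v):
    """Attend over cached keys/values concatenated with the current segment's keys/values."""
    K = np.concatenate([cache_k, k], axis=0)   # (cache_len + seq_len, d)
    V = np.concatenate([cache_v, v], axis=0)
    scores = q @ K.T / np.sqrt(q.shape[-1])    # (seq_len, cache_len + seq_len)
    return softmax(scores) @ V                 # (seq_len, d)

rng = np.random.default_rng(0)
d = 8
q, k, v = (rng.normal(size=(4, d)) for _ in range(3))
cache_k, cache_v = (rng.normal(size=(16, d)) for _ in range(2))
print(cached_attention(q, k, v, cache_k, cache_v).shape)  # (4, 8)
```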
Masahiro Kato
http://arxiv.org/abs/2312.12741v1
Compressor summary: The paper proposes a new strategy to identify the best arm in two-armed Gaussian bandits with unknown variances, and shows that it is asymptotically optimal under a small-gap regime.
Yasmin Moslem,Rejwanul Haque,Andy Way
http://arxiv.org/abs/2312.12740v1
Compressor summary: The paper shows how fine-tuning the Mistral 7B language model improves its adaptive machine translation capabilities, outperforming or matching other models in zero-shot and one-shot translation scenarios within the medical domain.
Rebecca M. Neeser,Bruno Correia,Philippe Schwaller
http://arxiv.org/abs/2312.12737v1
Compressor summary: The Focused Synthesizability score (FSscore) is a scoring approach that learns to rank molecules based on binary preferences using a graph attention network and human expert feedback, improving synthetic feasibility assessment in chemistry and drug discovery.
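As background, learning a scorer from binary preferences is typically done with a pairwise ranking loss of the form below; the FSscore specifics (the graph attention network scorer and the human-feedback fine-tuning) are described in the paper.

\mathcal{L}(x_a, x_b) = -\log\sigma\big(f_\theta(x_a) - f_\theta(x_b)\big)

where x_a is the molecule preferred (e.g., judged easier to synthesize) over x_b and f_\theta is the learned score.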
Jiachen Zhao,Zhun Deng,David Madras,James Zou,Mengye Ren
http://arxiv.org/abs/2312.12736v1
Compressor summary: The authors propose a forgetting-based filtering algorithm called ForgetFilter to ensure safe large language models finetuning by removing unsafe data based on how easily the model forgets it.
Libo Wang,Sijun Dong,Ying Chen,Xiaoliang Meng,Shenghui Fang
http://arxiv.org/abs/2312.12735v1
Compressor summary: The authors propose MetaSegNet, a metadata-collaborative multimodal segmentation network that uses vision-language representation learning for semantic segmentation of remote sensing images, improving generalization and accuracy compared to existing methods.
Wen Huang,Xintao Wu
http://arxiv.org/abs/2312.12731v1
Compressor summary: The paper proposes a causal approach to deal with biases in bandit problems using data from offline observations, leading to better decision policies and reduced regret.
Julio Silva-Rodriguez,Sina Hajimiri,Ismail Ben Ayed,Jose Dolz
http://arxiv.org/abs/2312.12730v1
Compressor summary: CLAP is a new approach for efficient transfer learning that adapts to different classes and tasks without relying on large labeled samples or case-specific hyperparameter tuning.
Haoxing Chen,Yaohui Li,Zhangxuan Gu,Zhuoer Xu,Jun Lan,Huaxiong Li
http://arxiv.org/abs/2312.12729v1
Compressor summary: SRIN is a new technique that uses semantic segmentation maps to improve image harmonization by matching foreground and background features.
Qihang Fang,Yafei Song,Keqiang Li,Liefeng Bo
http://arxiv.org/abs/2312.12726v1
Compressor summary: The paper proposes a more adaptive method to reduce shape-radiance ambiguity in neural radiance fields by estimating the color field based on the density field and posed images, and then applying it to regularize NeRF's density field.
Chengxiang Yin,Zhengping Che,Kun Wu,Zhiyuan Xu,Jian Tang
http://arxiv.org/abs/2312.12723v1
Compressor summary: The paper proposes a novel framework for Knowledge-based Visual Question Answering that uses Multiple Clues for Reasoning with Memory Neural Networks to exploit external knowledge and answer more general questions.
Jiang-Tian Zhai,Xialei Liu,Lu Yu,Ming-Ming Cheng
http://arxiv.org/abs/2312.12722v1
Compressor summary: The proposed method learns new tasks without accessing past data by selectively distilling patches for plasticity and stability, and by restoring old-task knowledge with realistic prototypes.
Chengxiang Yin,Zhengping Che,Kun Wu,Zhiyuan Xu,Qinru Qiu,Jian Tang
http://arxiv.org/abs/2312.12721v1
Compressor summary: The paper introduces EC-GNNs, a model that uses dense captioning to distill event-correlated information for cross-modal reasoning in VideoQA tasks.
Guangtao Zheng,Mengdi Huai,Aidong Zhang
http://arxiv.org/abs/2312.12720v1
Compressor summary: The paper proposes AdvST, a simple but effective method for single domain generalization that uses standard data augmentations with learnable parameters to manipulate sample semantics and learns a robust model with the augmented data.
Yunye Gong,Robik Shrestha,Jared Claypoole,Michael Cogswell,Arijit Ray,Christopher Kanan,Ajay Divakaran
http://arxiv.org/abs/2312.12716v1
Compressor summary: The proposed BloomVQA dataset evaluates vision-language models on different levels of comprehension using picture stories based on Bloom's Taxonomy, revealing weaknesses and inconsistencies in current models.
Jianheng Huang,Ante Wang,Linfeng Gao,Linfeng Song,Jinsong Su
http://arxiv.org/abs/2312.12713v1
Compressor summary: The paper proposes a semi-supervised learning framework, SemiDQG, to improve dialogue query generation by using response-augmented queries and pseudo instances for training.
Aritra Bhowmick,Mert Kosan,Zexi Huang,Ambuj Singh,Sourav Medya
http://arxiv.org/abs/2312.12697v1
Compressor summary: DGCluster is a new method for graph clustering that uses graph neural networks, automatically determines the number of clusters, and performs well across various metrics and datasets.
Shichong Peng,Alireza Moazeni,Ke Li
http://arxiv.org/abs/2312.12691v1
Compressor summary: This study compares different deep generative models on three inverse problems and finds that CHIMLE produces the best valid solutions and reliable uncertainty estimates.
Tannon Kew,Florian Schottmann,Rico Sennrich
http://arxiv.org/abs/2312.12683v1
Compressor summary: The text discusses the need for cross-lingual transfer in large language models and shows that multilingual instruction tuning with only three languages can improve performance on generative tasks but is less important for structured tasks.
Tim Valicenti,Justice Vidal,Ritik Patnaik
http://arxiv.org/abs/2312.12682v1
Compressor summary: The paper introduces Mini-GPTs, smaller and efficient language models created by contextual pruning of traditional LLMs, and demonstrates their effectiveness on diverse domains.
Hen Emuna,Nadav Borenstein,Xin Qian,Hyeonsu Kang,Joel Chan,Aniket Kittur,Dafna Shahaf
http://arxiv.org/abs/2312.12681v1
Compressor summary: BARcode is a search engine for finding biological solutions to engineering problems by mining inspirations from the web at scale, overcoming the limitations of existing hand-curated datasets.
Abdulkadhem A. Abdulkadhem
http://arxiv.org/abs/2312.12680v1
Compressor summary: The paper presents an innovative method to extract camera trajectories from video footage without GPS in noisy environments, using phase correlation and dynamic chain code techniques.
Pei Huang,Haoze Wu,Yuting Yang,Ieva Daukantas,Min Wu,Yedi Zhang,Clark Barrett
http://arxiv.org/abs/2312.12679v1
Compressor summary: The authors propose a framework for verifying properties of quantized neural networks using integer linear programming, heuristic search methods, and bound-propagation techniques, which improves scalability and efficiency compared to existing approaches.
Jack Sandberg,Niklas Åkerblom,Morteza Haghir Chehreghani
http://arxiv.org/abs/2312.12676v1
Compressor summary: The paper studies a combinatorial bandit problem with time-varying arm availability and provides novel regret bounds for three GP-based algorithms; it also applies these methods to an energy-efficient navigation problem on real roads.