This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-15, generated by the compressor, my personal LLM-based project.
Haoyu Guo,He Zhu,Sida Peng,Yuang Wang,Yujun Shen,Ruizhen Hu,Xiaowei Zhou
http://arxiv.org/abs/2312.08372v1
Compressor summary: The paper proposes a novel 3D-to-2D query framework to use 2D segmentation models for 3D instance segmentation, improving generalization ability and robustness across various scenes.
Kuan-Chih Huang,Weijie Lyu,Ming-Hsuan Yang,Yi-Hsuan Tsai
http://arxiv.org/abs/2312.08371v1
Compressor summary: The paper proposes a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection using current-frame objects and their historical trajectories as input.
Xijun Wang,Junbang Liang,Chun-Kai Wang,Kenan Deng,Yu Lou,Ming Lin,Shan Yang
http://arxiv.org/abs/2312.08367v1
Compressor summary: Key points:
- VLAP model for efficient and effective video-language alignment
- Frame-Prompter and QFormer-Distiller modules for frame sampling and cross-modal alignment
- Improved accuracy, latency, and speed up compared to prior work and state-of-the-art methods
Summary: The VLAP model uses Frame-Prompter and QFormer-Distiller to align video and language efficiently and effectively, achieving better results on video question-answering benchmarks.
Tsung-Han Wu,Giscard Biamby,David Chan,Lisa Dunlap,Ritwik Gupta,Xudong Wang,Joseph E. Gonzalez,Trevor Darrell
http://arxiv.org/abs/2312.08366v1
Compressor summary: The paper proposes a method for large multimodal models to detect and correct false premises in image segmentation tasks, improving their performance and human interaction.
Bernhard Jaeger,Andreas Geiger
http://arxiv.org/abs/2312.08365v1
Compressor summary: The text introduces reinforcement learning as a generalization of supervised learning for optimizing non-differentiable objectives in deep neural networks, and provides an accessible introduction to state-of-the-art algorithms like PPO.
Zeyu Ma,Alexander Raistrick,Lahav Lipson,Jia Deng
http://arxiv.org/abs/2312.08364v1
Compressor summary: OcMesher is a mesh extraction algorithm that creates octrees from signed distance functions and camera views to generate high-quality synthetic data for computer vision tasks.
Alexander Borzunov,Max Ryabinin,Artem Chumachenko,Dmitry Baranchuk,Tim Dettmers,Younes Belkada,Pavel Samygin,Colin Raffel
http://arxiv.org/abs/2312.08361v1
Compressor summary: The authors propose cost-efficient methods for running large language models on geodistributed devices, addressing reliability and load-balancing challenges with special algorithms to achieve up to 10x faster inference.
Anand Siththaranjan,Cassidy Laidlaw,Dylan Hadfield-Menell
http://arxiv.org/abs/2312.08358v1
Compressor summary: Preference learning from human feedback depends on incomplete data with hidden context, which can lead to counter-intuitive results and vulnerabilities in RLHF; distributional preference learning (DPL) methods can mitigate these issues.
Bowen Wen,Wei Yang,Jan Kautz,Stan Birchfield
http://arxiv.org/abs/2312.08344v1
Compressor summary: FoundationPose is a single model that estimates and tracks 6D object poses using CAD models or reference images, achieving strong generalization with synthetic training and large language models.
Thomas Tanay,Matteo Maggioni
http://arxiv.org/abs/2312.08338v1
Compressor summary: The text describes ConvGLR, a new method for novel view synthesis that uses a global rendering operator in a low-resolution latent space to improve performance over existing methods.
Srikumar Sastry,Xin Xing,Aayush Dhakal,Subash Khanal,Adeel Ahmad,Nathan Jacobs
http://arxiv.org/abs/2312.08334v1
Compressor summary: The authors propose a language model-based method for species distribution modeling that uses taxonomic hierarchy and a novel proximity-aware evaluation metric, achieving state-of-the-art results on various tasks.
Xin You,Ming Ding,Minghui Zhang,Hanxiao Zhang,Yi Yu,Jie Yang,Yun Gu
http://arxiv.org/abs/2312.08323v1
Compressor summary: The paper proposes a unified network, PnPNet, that uses pushing and pulling branches to generate precise boundary segmentation of volumetric images, improving diagnosis and intervention in clinical practice.
Jiang Zhang,Qiong Wu,Yiming Xu,Cheng Cao,Zheng Du,Konstantinos Psounis
http://arxiv.org/abs/2312.08303v1
Compressor summary: BD-LLM is a method to improve toxic content detection by using decision trees to guide Large Language Models and distill their knowledge into smaller, faster models.
Van Minh Nguyen,Nasheen Nur,William Stern,Thomas Mercer,Chiradeep Sen,Siddhartha Bhattacharyya,Victor Tumbiolo,Seng Jhing Goh
http://arxiv.org/abs/2312.08299v1
Compressor summary: The study uses AI to analyze social media posts and identify patterns of suicidal behavior by assigning attribution scores to tokens in users' texts.
Guénolé Fiche,Simon Leglaive,Xavier Alameda-Pineda,Antonio Agudo,Francesc Moreno-Noguer
http://arxiv.org/abs/2312.08291v1
Compressor summary: The authors propose a novel method for human pose and shape estimation using a low-dimensional discrete latent representation, achieving realistic results and outperforming current non-parametric approaches.
Piyush Arora,Pratik Mazumder
http://arxiv.org/abs/2312.08288v1
Compressor summary: This paper proposes a novel method to reduce bias in deep learning models without prior knowledge, especially when training data is limited, by synthesizing hybrid samples that balance the bias and improve predictions.
Anup Shakya,Abisha Thapa Magar,Somdeb Sarkhel,Deepak Venugopal
http://arxiv.org/abs/2312.08287v1
Compressor summary: The text proposes a framework using Hybrid Markov Logic Networks to verify complex properties of learned representations from Deep Neural Networks by encoding verification as a Mixed Integer Linear Program.
Aldan Creo,Manuel Lama,Juan C. Vidal
http://arxiv.org/abs/2312.08282v1
Compressor summary: This paper proposes new ways to help automatic summarizers handle long and complex scientific articles by providing them with key terms from the articles, improving performance especially for smaller models.
Songchi Zhou,Sheng Yu
http://arxiv.org/abs/2312.08274v2
Compressor summary: Key points:
- The paper proposes a method for high-throughput biomedical relation extraction using large language models (LLMs) and their reading comprehension ability and world knowledge.
- The method formulates the task as a binary classification problem, uses a biomedical thesaurus to match head entities, and slices text into chunks for compatibility with LLMs.
- The method achieves performance comparable to GPT-4 on a curated benchmark dataset and can be extended to different semi-structured biomedical websites.
Summary: The paper presents a method that leverages large language models' reading comprehension and world knowledge for extracting various types of biomedical relations from semi-structured web articles, with performance comparable to GPT-4.
Arul Selvam Periyasamy,Vladimir Tsaturyan,Sven Behnke
http://arxiv.org/abs/2312.08268v1
Compressor summary: The paper proposes a new vision transformer model for multi-object pose estimation with inductive biases, deformable attention, and query aggregation, achieving state-of-the-art results on the YCB-Video dataset.
Gwilherm Lesné,Yann Gousseau,Saïd Ladjal,Alasdair Newson
http://arxiv.org/abs/2312.08256v1
Compressor summary: Key points:
- The paper proposes an auto-encoder for controlled image editing using StyleGAN
- The auto-encoder re-organizes the latent space of StyleGAN to encourage disentanglement of attributes
- The approach has shorter training time and better disentanglement than competing methods
Summary: The paper presents a simple auto-encoder that edits images using StyleGAN by re-organizing its latent space for disentangled attribute editing, with faster training and higher quality.
Huan Yan,Yong Li
http://arxiv.org/abs/2312.08248v1
Compressor summary: Generative AI is a vital tool for improving traffic management and optimization by addressing various issues in different tasks within intelligent transportation systems.
Yujun Chen,Xin Tan,Zhizhong Zhang,Yanyun Qu,Yuan Xie
http://arxiv.org/abs/2312.08234v1
Compressor summary: The paper presents a semi-supervised segmentation method for point clouds using latent labels from LiDAR and image data, improving performance over the state-of-the-art method, LaserMix.
Gregor Kobsik,Isaak Lim,Leif Kobbelt
http://arxiv.org/abs/2312.08230v1
Compressor summary: Key points:
- The paper proposes a self-supervised method for detecting partial and extrinsic symmetries in 3D shapes using contrastive learning and geodesic point cloud patches.
- The method learns rotation, reflection, translation and scale invariant local shape features that generalize across different datasets and classes.
- The paper introduces a new benchmark test for this task and shows how the detected symmetries can be used for 3D shape partitioning.
Summary: The paper presents a novel self-supervised approach to detect partial and extrinsic symmetries in 3D shapes using contrastive learning on geodesic point cloud patches, and demonstrates its applications for shape analysis and partitioning.
Haoran Ye,Jiarui Wang,Helan Liang,Zhiguang Cao,Yong Li,Fanzhang Li
http://arxiv.org/abs/2312.08224v1
Compressor summary: GLOP is a hierarchical framework that combines non-autoregressive and autoregressive neural heuristics to scale efficiently to large routing problems, achieving state-of-the-art real-time performance.
Chanyong Jung,Gihyun Kwon,Jong Chul Ye
http://arxiv.org/abs/2312.08223v1
Compressor summary: The paper proposes a method for image translation using graph neural networks and patch-wise similarity to capture semantic correspondence between input and output images.
Jin Li,Qirong Zhang,Shuling Xu,Xinlong Chen,Longkun Guo,Yang-Geng Fu
http://arxiv.org/abs/2312.08221v2
Compressor summary: Key points:
- The paper proposes a soft graph normalization method and a label-smoothing-based learning framework to improve deep GNNs.
- The method can preserve node diversity, capture input knowledge, and enhance optimization with residual connections.
- The method outperforms existing baselines on twelve real-world node classification benchmarks.
Summary: The paper presents a new approach to deepen graph neural networks by normalizing graphs softly, using residual connections, and learning from smoothed labels, which improves node classification performance.
Peiqi Duan,Boyu Li,Yixin Yang,Hanyue Lou,Minggui Teng,Yi Ma,Boxin Shi
http://arxiv.org/abs/2312.08220v1
Compressor summary: The paper shows how event cameras can enhance images and videos from traditional frame-based cameras in dynamic range, speed, and other aspects, covering five specific tasks and a new dataset.
Jingsheng Gao,Jiacheng Ruan,Suncheng Xiang,Zefang Yu,Ke Ji,Mingye Xie,Ting Liu,Yuzhuo Fu
http://arxiv.org/abs/2312.08212v1
Compressor summary: The paper introduces LAMM, a label alignment method for pre-trained visual-language models that improves few-shot learning and continual learning performance by adjusting category embeddings and using a hierarchical loss.
Yunchen Li,Zhou Yu,Gaoqi He,Yunhang Shen,Ke Li,Xing Sun,Shaohui Lin
http://arxiv.org/abs/2312.08200v1
Compressor summary: The paper proposes a novel generative model, SPD-DDPM, which uses Gaussian distributions in the SPD space for efficient predictions on large-scale data.
Kamil Kanclerz,Julita Bielaniewicz,Marcin Gruza,Jan Kocon,Stanisław Woźniak,Przemysław Kazienko
http://arxiv.org/abs/2312.08198v1
Compressor summary: Key points:
- Data annotated by humans is valuable but costly for subjective NLP tasks
- A new model-based approach reduces annotations needed with little knowledge loss
- The method highlights the importance of diverse data collection and multi-task learning for subjective NLP
Summary: The paper proposes a model that cuts annotation costs for subjective NLP problems by selecting tasks individually, while preserving knowledge quality and emphasizing the need for diverse data.
Pu Cao,Lu Yang,Feng Zhou,Tianrui Huang,Qing Song
http://arxiv.org/abs/2312.08195v1
Compressor summary: The paper proposes a framework for customizing diffusion models to generate high-quality images for specific concepts while maintaining versatility and controllability, using generalized classifier-free guidance and a concept-specific generator.
Mojtaba Najafi Khatounabad,Hacer Yalim Keles,Selma Kadioglu
http://arxiv.org/abs/2312.08194v1
Compressor summary: The study introduces SVInvNet, a deep learning-based approach for seismic velocity inversion that performs better than FWI with diverse seismic models and varying noise levels.
Tao Zhang,Kun Ding,Jinyong Wen,Yu Xiong,Zeyu Zhang,Shiming Xiang,Chunhong Pan
http://arxiv.org/abs/2312.08192v1
Compressor summary: The paper proposes PAD, a pre-training paradigm for infrared images that uses adapters to learn domain-specific features while retaining general feature extraction ability, and shows its effectiveness on three downstream tasks.
Peilin Cai
http://arxiv.org/abs/2312.08177v1
Compressor summary: The paper applies advanced image segmentation techniques using CNNs and the U-Net model to analyze C-fos gene expression, a marker for neural activity, and develops a novel workflow with pre-processing steps and labeling approaches for efficient and automated segmentation.
Yuan Yao,Tian-Sheuan Chang
http://arxiv.org/abs/2312.08176v1
Compressor summary: The proposed technique compresses feature maps using adaptive scale interpolation and independent channel indexing, achieving high compression rates and low hardware costs.
Haifeng Huang,Zehan Wang,Rongjie Huang,Luping Liu,Xize Cheng,Yang Zhao,Tao Jin,Zhou Zhao
http://arxiv.org/abs/2312.08168v1
Compressor summary: The paper proposes a method to use object identifiers for referring to multiple objects in 3D scenes and fine-tunes a language model on various tasks using instruction tuning.
Qian Chen,Taolin Zhang,Dongyang Li,Xiaofeng He
http://arxiv.org/abs/2312.08157v1
Compressor summary: CIDR is a new method for finding minimal features in natural language processing models by detecting interactions between them using Cooperative Integrated Gradients and solving a knapsack problem.
Thomas Robinson,Niek Tax,Richard Mudd,Ido Guy
http://arxiv.org/abs/2312.08150v1
Compressor summary: The paper proposes UCB-EU, a cost-based sampling strategy for active learning that reduces the impact of biased non-response on prediction models' performance in real-world contexts.
Juan Luis Gonzalez Bello,Minh-Quan Viet Bui,Munchurl Kim
http://arxiv.org/abs/2312.08136v1
Compressor summary: ProNeRF is a novel neural rendering method that balances memory, speed, and quality by using projection-aware sampling and a new training strategy for efficient ray exploration and exploitation.
Amirhossein Habibian,Amir Ghodrati,Noor Fathima,Guillaume Sautiere,Risheek Garrepalli,Fatih Porikli,Jens Petersen
http://arxiv.org/abs/2312.08128v1
Compressor summary: Clockwork Diffusion is a method that saves computational resources in text-to-image diffusion models by reusing previous denoising operations on low-resolution feature maps.
Heechan Yoon,Seungkyu Lee
http://arxiv.org/abs/2312.08118v1
Compressor summary: The paper proposes a method to improve NeRF for synthesizing transparent objects by considering their refraction using visual hull, Snell's law, and NeRF sampling.
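For context, Snell's law (n1 sin θ1 = n2 sin θ2) determines how a ray bends when it crosses the surface of a transparent object. The sketch below is a generic vector form of that law, not the paper's implementation; the function name `refract` and its argument conventions are assumptions chosen for illustration.

```python
import math

def refract(d, n, eta):
    """Refract unit direction d at a surface with unit normal n.

    n must point against the incoming ray; eta is the ratio n1 / n2 of the
    refractive indices (incident medium over transmitting medium).
    Returns the refracted unit direction, or None on total internal reflection.
    """
    cos_i = -sum(di * ni for di, ni in zip(d, n))    # cosine of incidence angle
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)       # Snell: sin(t) = eta * sin(i)
    if sin2_t > 1.0:
        return None                                  # no transmitted ray exists
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))
```

A ray entering glass (index 1.5) from air would use `eta = 1.0 / 1.5`; a ray leaving the glass at a steep angle returns `None`.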
Clemens Seibold,Anna Hilsmann,Peter Eisert
http://arxiv.org/abs/2312.08111v1
Compressor summary: The paper proposes a method that prevents ghosting artifacts in face morphing by aligning pixels during generation, improving both detection resistance and biometric quality.
Yorgos Felekis,Fabio Massimo Zennaro,Nicola Branchini,Theodoros Damoulas
http://arxiv.org/abs/2312.08107v1
Compressor summary: COTA is a novel method to learn abstraction maps between causal models using multi-marginal Optimal Transport and interventional data, without assuming complete knowledge of the underlying models.
Wenjie Wu,Changjun Fan,Jincai Huang,Zhong Liu,Junchi Yan
http://arxiv.org/abs/2312.08103v1
Compressor summary: The article surveys machine learning methods for solving multi-dimensional bin packing problems and provides a benchmark dataset and future research directions.
Antoine Schnepf,Flavian Vasile,Ugo Tanielian
http://arxiv.org/abs/2312.08094v1
Compressor summary: 3DGEN is a model that uses Neural Radiance Fields and GANs to generate realistic 3D meshes from images for various creative applications.
Tianshuo Peng,Zuchao Li,Ping Wang,Lefei Zhang,Hai Zhao
http://arxiv.org/abs/2312.08084v1
Compressor summary: DQPSA is a novel framework for multi-modal sentiment analysis that uses prompt as dual query to extract relevant visual information and an energy-based pairwise expert to model target boundaries.
Hao Ma,Zhiyuan Peng,Mingjie Shao,Jing Li,Ju Liu
http://arxiv.org/abs/2312.08079v1
Compressor summary: The paper proposes prompt tuning to extend a single-talker ASR model to target-speaker ASR, achieving comparable performance to full fine-tuning with much less training cost and retaining the original features of the model.
Wenting Chen,Xiang Li,Linlin Shen,Yixuan Yuan
http://arxiv.org/abs/2312.08078v2
Compressor summary: The Adaptive patch-word Matching (AdaMatch) model uses adaptive patches and keywords to generate explainable chest X-ray reports from images and text.
Ruituo Wu,Jiani Liu,Ce Zhu,Anh-Huy Phan,Ivan V. Oseledets,Yipeng Liu
http://arxiv.org/abs/2312.08075v1
Compressor summary: The paper proposes a tensor ring decomposition method for probabilistic graph modeling, which improves expressiveness and flexibility compared to existing methods, and uses an ensemble learning-inspired mixture model to incorporate multiple permutation candidates for better probability density estimation.
Juan Luis Gonzalez Bello,Munchurl Kim
http://arxiv.org/abs/2312.08071v1
Compressor summary: Key points:
- Paper proposes a self-supervised learning method for single-image NVS that considers view-dependent effects (VDE)
- VDEs are modeled as negative disparity using camera motion priors and specularities
- Method uses relaxed volumetric rendering to improve efficiency
- Outperforms state-of-the-art methods on two datasets
Summary: The paper presents a self-supervised learning method for synthesizing novel views that captures view-dependent effects by modeling them as negative disparity and using relaxed volumetric rendering, achieving better results than existing methods.
Jouseau Roxane,Salva Sébastien,Samir Chafik
http://arxiv.org/abs/2312.08066v1
Compressor summary: The paper introduces a new metric to measure data quality for machine learning models, based on the relationship between performance and data deterioration.
Evdoxia Taka,Yuri Nakao,Ryosuke Sonoda,Takuya Yokota,Lin Luo,Simone Stumpf
http://arxiv.org/abs/2312.08064v1
Compressor summary: The authors explore how to involve non-experts in improving AI fairness by collecting feedback on a credit model and studying its effects, while also providing resources for further research.
Vihari Piratla,Juyeon Heo,Sukriti Singh,Adrian Weller
http://arxiv.org/abs/2312.08063v1
Compressor summary: The authors propose a Bayesian method to improve the reliability of concept explanations in multi-modal learning models, which are valuable for interpreting and debugging predictions using human-understandable concepts.
Florian Fervers,Sebastian Bullinger,Christoph Bodensteiner,Michael Arens,Rainer Stiefelhagen
http://arxiv.org/abs/2312.08060v1
Compressor summary: The paper proposes C-BEV, a novel retrieval method for geolocating street-view images using bird's eye view maps, which outperforms existing methods in challenging scenarios and infers camera pose without metric groundtruth.
Fares Fourati,Christopher John Quinn,Mohamed-Slim Alouini,Vaneet Aggarwal
http://arxiv.org/abs/2312.08057v1
Compressor summary: The SGB algorithm is a novel combinatorial bandit method that samples and selects actions from a subset of unselected arms, achieving better regret bounds than existing methods for constrained social influence maximization.
Shengguang Wu,Zhenglun Chen,Qi Su
http://arxiv.org/abs/2312.08056v1
Compressor summary: Key points:
- The paper proposes a novel artifact image synthesis approach that uses diffusion models, archaeological knowledge, and historical expertise.
- The approach generates higher-quality images that capture intricate details and align with written documents.
- The approach outperforms existing methods in automatic metrics and human evaluation.
Summary: The paper presents a new method to generate high-quality images of ancient artifacts using diffusion models, archaeological knowledge, and historical expertise, achieving better alignment with written texts and outperforming previous approaches.
Zifan Wang,Zhuorui Ye,Haoran Wu,Junyu Chen,Li Yi
http://arxiv.org/abs/2312.08054v1
Compressor summary: The paper proposes SCSFNet, a novel network that predicts future scenes and their semantic labels from dynamic point cloud sequences using a hybrid geometric representation and an attention-based skip connection scheme.
Jihao Xin,Ivan Ilin,Shunkang Zhang,Marco Canini,Peter Richtárik
http://arxiv.org/abs/2312.08053v1
Compressor summary: Kimad is an adaptive compression method for distributed deep learning that adjusts to network layer needs and improves communication efficiency.
Yuanbo Tang,Zhiyuan Peng,Yang Li
http://arxiv.org/abs/2312.08052v1
Compressor summary: The paper proposes a sparse and interpretable trajectory representation framework using pathlet dictionaries that improves downstream applications like traffic analysis and data compression.
Xu-Lu Zhang,Xiao-Yong Wei,Jin-Lin Wu,Tian-Yi Zhang,Zhaoxiang Zhang,Zhen Lei,Qing Li
http://arxiv.org/abs/2312.08048v2
Compressor summary: The paper proposes a method to improve inversion methods that generate personalized images by guiding the inversion process towards the core distribution and using spatial regularization, resulting in more diverse and balanced compositions of concepts.
Huaiyuan Ying,Zhengyun Zhao,Yang Zhao,Sihang Zeng,Sheng Yu
http://arxiv.org/abs/2312.08036v1
Compressor summary: CoRTEx uses ChatGPT explanations and contrastive learning to improve term clustering in biomedical knowledge graphs, achieving high accuracy and robustness.
Mona Schirmer,Dan Zhang,Eric Nalisnick
http://arxiv.org/abs/2312.08033v1
Compressor summary: The paper investigates how different model disagreement measures based on divergences perform in detecting out-of-distribution data and estimating test errors, using various vision models.
Jie Yan,Jing Liu,Zhong-yuan Zhang
http://arxiv.org/abs/2312.08029v1
Compressor summary: The paper proposes a new EM framework for clustering using DDPMs, which shows better performance in clustering and related tasks than VAEs and GANs.
Jinta Weng,Jiarui Zhang,Yue Hu,Daidong Fa,Xiaofeng Xu,Heyan Huang
http://arxiv.org/abs/2312.08027v1
Compressor summary: MTPrompt is a method that improves the performance of chatbots using large language models by incorporating task-related information into prompts, making it easier to access the model's knowledge.
Yang Zhan,Yuan Yuan,Zhitong Xiong
http://arxiv.org/abs/2312.08022v1
Compressor summary: The paper introduces a new task, Mono3DRefer, that uses language descriptions with appearance and geometry information to locate 3D objects in monocular RGB images, and proposes a transformer-based network, Mono3DVG-TR, that leverages both text and visual features for this task.
Yuyang Sun,Huy H. Nguyen,Chun-Shien Lu,ZhiYong Zhang,Lu Sun,Isao Echizen
http://arxiv.org/abs/2312.08020v1
Compressor summary: The text proposes a blending-based method for detecting digital face manipulations using synthetic training samples and a multi-scale feature reconstruction network, which performs well on unseen data.
Zhiyuan Ma,Guoli Jia,Bowen Zhou
http://arxiv.org/abs/2312.08019v1
Compressor summary: The paper proposes AdapEdit, an algorithm that adapts image editing based on text-driven instructions using soft-attention, improving visual contents generation without needing extra training or data.
Vsevolod Skorokhodov,Darya Drozdova,Dmitry Yudin
http://arxiv.org/abs/2312.08012v1
Compressor summary: The paper introduces uSF, a neural network model that reconstructs 3D scenes with color and semantic labels, as well as uncertainty estimates, improving performance with limited training data.
Shahzad Ahmad,Sukalpa Chanda,Yogesh S Rawat
http://arxiv.org/abs/2312.08010v1
Compressor summary: EZ-CLIP is an efficient adaptation of CLIP that leverages temporal visual prompts for video action recognition and zero-shot learning with fewer parameters.
Kewei Wang,Yizheng Wu,Zhiyu Pan,Xingyi Li,Ke Xian,Zhe Wang,Zhiguo Cao,Guosheng Lin
http://arxiv.org/abs/2312.08009v2
Compressor summary: The study explores semi-supervised learning for class-agnostic motion prediction in autonomous driving, using a consistency-based self-training paradigm, a novel motion selection and re-generation module, and two data augmentation strategies to improve performance while reducing annotation costs.
Wenxuan Wang,Tongtian Yue,Yisi Zhang,Longteng Guo,Xingjian He,Xinlong Wang,Jing Liu
http://arxiv.org/abs/2312.08007v1
Compressor summary: The paper introduces a finer-grained, part-level referring expression segmentation task, the RefCOCOm and MRES-32M datasets, and the UniRES model, which outperforms previous methods on both object-level and part-level vision-language understanding.
Yang Jiao,Zequn Jie,Shaoxiang Chen,Lechao Cheng,Jingjing Chen,Lin Ma,Yu-Gang Jiang
http://arxiv.org/abs/2312.08004v1
Compressor summary: The paper introduces IA-BEV, a method that integrates instance awareness into depth estimation for object detection in autonomous driving using camera-based bird's-eye-view perception.
Zeynep G. Saribatur,Stefan Woltran
http://arxiv.org/abs/2312.07993v1
Compressor summary: The paper introduces a new equivalence notion for ASP programs called relativized simplifications, which captures different notions of forgetting and abstraction in knowledge representation and reasoning.
Alon Mor,Yonatan Belinkov,Benny Kimelfeld
http://arxiv.org/abs/2312.07991v1
Compressor summary: The text describes methods to quickly find and rank important words in a document using statistical analysis, while reducing noise and computational cost.
Róbert Csordás,Piotr Piękos,Kazuki Irie,Jürgen Schmidhuber
http://arxiv.org/abs/2312.07987v2
Compressor summary: The SwitchHead method reduces the compute and memory needs of self-attention layers in Transformers using Mixture-of-Experts (MoE) layers, achieving speedup without sacrificing performance.
Xiaobo Zhu,Yan Wu,Zhipeng Li,Hailong Su,Jin Che,Zhanheng Chen,Liying Wang
http://arxiv.org/abs/2312.07983v1
Compressor summary: The MPFA model combines evolving and raw perspectives in graph networks, enabling efficient learning of interleaved dynamics with fewer temporal neighbors.
Haiming Yi,Lei Hou,Yuhong Jin,Nasser A. Saeed
http://arxiv.org/abs/2312.07981v1
Compressor summary: The paper proposes a Time Series Diffusion Method (TSDM) for vibration signal generation using a U-Net architecture with an attention block, which improves small-sample fault diagnosis accuracy for bearing datasets.
Prameela Madambakam,Shathanaa Rajmohan,Himangshu Sharma,Tummepalli Anka Chandrahas Purushotham Gupta
http://arxiv.org/abs/2312.07979v1
Compressor summary: The proposed semantic-extraction-based legal judgment prediction (LJP) model uses pretrained transformers to understand complex legal case documents, extract semantics at multiple levels, and predict judgments with an attention mechanism.
T. Kim,H. Jeon,Y. Lim
http://arxiv.org/abs/2312.07976v2
Compressor summary: The study presents a new dataset for testing object detection in different weather conditions using the CARLA simulator and evaluates YOLO series' performance in various rain scenarios.
Zhiyuan Ma,Zhihuan Yu,Jianjun Li,Bowen Zhou
http://arxiv.org/abs/2312.07971v1
Compressor summary: LMD is a faster image reconstruction framework that combines masking diffusion from autoencoders and diffusion probabilistic models, reducing training time and improving inference speed.
Yanling Tian,Di Chen,Yunan Liu,Jian Yang,Shanshan Zhang
http://arxiv.org/abs/2312.07970v1
Compressor summary: The paper proposes a hybrid pre-training framework for person search, using sub-task data only, and shows significant improvements across diverse protocols.
Shiyun Chen,Li Lin,Pujin Cheng,Xiaoying Tang
http://arxiv.org/abs/2312.07969v1
Compressor summary: ASLseg is a new semi-supervised framework that improves liver tumor segmentation by adapting and refining the SAM model using pseudo-labels and an adaptation network.
Mathieu Schumann,Quentin Reynaud,François Sempé,Julien Guibourdenche,Jean-Baptiste Ly,Nicolas Sabouret
http://arxiv.org/abs/2312.07966v1
Compressor summary: The SMACH approach combines qualitative and quantitative data with agent-based simulation to improve the representation of Time-Use Surveys in activity and energy simulation.
Shengsheng Qian,Yifei Wang,Dizhan Xue,Shengjie Zhang,Huaiwen Zhang,Changsheng Xu
http://arxiv.org/abs/2312.07955v1
Compressor summary: The paper proposes a novel PoisonCAM method that precisely detects and removes SSL backdoors by cluster activation masking, achieving better results than existing methods on ImageNet-100.
Zhaorui Tan,Xi Yang,Kaizhu Huang
http://arxiv.org/abs/2312.07951v1
Compressor summary: SADA is a novel framework for text-to-image synthesis that uses semantic-aware data augmentation and image semantic regularization to improve consistency and quality of generated images.
Xin Ding,Xiaoyu Liu,Yun Zhang,Zhijun Tu,Wei Li,Jie Hu,Hanting Chen,Yehui Tang,Zhiwei Xiong,Baoqun Yin,Yunhe Wang
http://arxiv.org/abs/2312.07950v1
Compressor summary: CBQ is a method for efficient large language models that uses cross-block reconstruction to reduce errors from block quantization and outlier handling techniques for better low-bit quantization.
Haowen Bai,Zixiang Zhao,Jiangshe Zhang,Yichen Wu,Lilun Deng,Yukun Cui,Shuang Xu,Baisong Jiang
http://arxiv.org/abs/2312.07943v1
Compressor summary: ReFusion is a meta-learning based framework that learns optimal fusion loss from reconstructing source images, allowing it to adapt to diverse image fusion tasks.
Wenqian Zhang,Molin Huang,Yuxuan Zhou,Juze Zhang,Jingyi Yu,Jingya Wang,Lan Xu
http://arxiv.org/abs/2312.07937v1
Compressor summary: The paper introduces BOTH57M, a new dataset for generating realistic two-hand motions based on both body dynamics and text prompts, and proposes BOTH2Hands, a method that combines diffusion models and a cross-attention transformer for this task.
Ranjan Sapkota,Dawood Ahmed,Manoj Karkee
http://arxiv.org/abs/2312.07935v1
Compressor summary: YOLOv8 outperforms Mask R-CNN in instance segmentation for apple trees and immature apples under different conditions, with faster inference times.
Xiang Wei,Alan J. X. Guo,Sihan Sun,Mengyi Wei,Wei Yu
http://arxiv.org/abs/2312.07931v1
Compressor summary: The paper proposes a neural network-based sequence embedding technique using Poisson regression for efficient computation or approximation of Levenshtein distance in biological applications like DNA storage.
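For context, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one sequence into another. The sketch below is the textbook dynamic program that computes it exactly in quadratic time, which is the cost that embedding-based approximations aim to avoid. It is not the paper's method, and the function name is illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact edit distance in O(len(a) * len(b)) time and O(len(b)) memory."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]
```

For DNA-like strings, `levenshtein("ACGT", "AGT")` counts the single deleted base.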
Baihe Huang,Banghua Zhu,Hanlin Zhu,Jason D. Lee,Jiantao Jiao,Michael I. Jordan
http://arxiv.org/abs/2312.07930v1
Compressor summary: The paper proposes a new statistical framework for watermarking, improves the rate and trade-off between error types, and explores model-agnostic and robust watermarking scenarios.
Weiguang Zhang,Qiufeng Wang,Kaizhu Huang
http://arxiv.org/abs/2312.07925v1
Compressor summary: Key points:
- Document dewarping aims to remove geometric distortion in photos for text recognition
- Polar coordinates representation (Polar-Doc) is explored as an alternative to Cartesian coordinates
- One-stage model with multi-scope constraints achieves state-of-the-art results on two benchmarks
Summary: The paper proposes a novel document dewarping method using Polar coordinates, which improves efficiency and performance compared to existing methods.
Hong Zhang,Yu Zhang
http://arxiv.org/abs/2312.07922v1
Compressor summary: The paper proposes reversible spiking neural networks that reduce memory costs during training, enabling deeper and larger SNN models with similar accuracies to ANNs.
Xiaoyu Zhou,Zhiwei Lin,Xiaojun Shan,Yongtao Wang,Deqing Sun,Ming-Hsuan Yang
http://arxiv.org/abs/2312.07920v1
Compressor summary: DrivingGaussian is a framework that efficiently reconstructs dynamic autonomous driving scenes with moving objects and high-fidelity details using Gaussian models and a LiDAR prior.
Aiwei Liu,Leyi Pan,Yijian Lu,Jingjing Li,Xuming Hu,Lijie Wen,Irwin King,Philip S. Yu
http://arxiv.org/abs/2312.07913v1
Compressor summary: The text summarizes current text watermarking technologies that can prevent misuse and piracy of generated texts by large language models.
Kaijie Zhu,Qinlin Zhao,Hao Chen,Jindong Wang,Xing Xie
http://arxiv.org/abs/2312.07910v1
Compressor summary: PromptBench is a library with various components for evaluating large language models in research.
Mingle Xu,Ji Eun Park,Jaehwan Lee,Jucheng Yang,Sook Yoon
http://arxiv.org/abs/2312.07905v1
Compressor summary: The study proposes a taxonomy to describe potential plant disease datasets and suggests future directions for creating more challenge-oriented datasets for deep learning applications in plant disease recognition and species identification.
Junhao Zheng,Shengjie Qiu,Qianli Ma
http://arxiv.org/abs/2312.07887v1
Compressor summary: SEQ* is a simple method for incremental learning with pre-trained language models (PLMs) that outperforms existing techniques, showing that those techniques underestimate PLMs' anti-forgetting ability.
Kai Huang,Boyuan Yang,Wei Gao
http://arxiv.org/abs/2312.07886v1
Compressor summary: mPnP-LLM is a technique that lets LLMs adapt their input modalities at runtime, reducing FLOPs and memory usage while maintaining or improving accuracy on embodied AI tasks.
Yufeng Liu,Haobo Zuo,Liangliang Yao,Kunhan Lu,Guangze Zheng,Changhong Fu
http://arxiv.org/abs/2312.07884v1
Compressor summary: The paper proposes a novel mutual-learning knowledge distillation framework for nighttime UAV tracking that uses low-light enhancers and a tight coupling-aware tracking backbone to improve performance while reducing computational burden.
Zhenduo Zhang,Bowen Zhang,Guang Liu
http://arxiv.org/abs/2312.07879v1
Compressor summary: CoIE is a technique that enhances text-to-image editing models by using a series of instructions from a large language model to manipulate multiple facial attributes more precisely and effectively.
Wei Zhao,Zhe Li,Jun Sun
http://arxiv.org/abs/2312.07876v1
Compressor summary: The study proposes a framework to analyze and understand the security vulnerabilities of large language models, revealing overfitting to harmful prompts and a mysterious neuron with high causal effect on output that can be targeted by attacks.
Guangming Zhu,Siyuan Wang,Tianci Wu,Liang Zhang
http://arxiv.org/abs/2312.07875v1
Compressor summary: The paper proposes a structured sketch recognition network with a semantic component-level memory module that can handle different situations of sketch recognition and improve the network's explainability.
Yanzuo Lu,Meng Shen,Andy J Ma,Xiaohua Xie,Jian-Huang Lai
http://arxiv.org/abs/2312.07871v2
Compressor summary: The paper proposes a novel Mutual Learning Network (MLNet) for universal domain adaptation, which reduces intra-domain variations and improves unknown-class identification using confidence-guided invariant feature learning and cross-domain mixup.
Yizhe Yang,Heyan Huang,Yihang Liu,Yang Gao
http://arxiv.org/abs/2312.07868v1
Compressor summary: The text discusses how to generate informative responses in dialogue using different types of knowledge sources and their effects on the task.
Xiaojie Hong,Zixin Song,Liangzhi Li,Xiaoli Wang,Feiyan Liu
http://arxiv.org/abs/2312.07867v1
Compressor summary: The paper introduces BESTMVQA, a system that helps users build and evaluate medical image question answering datasets and models.
Bang Wu,He Zhang,Xiangwen Yang,Shuo Wang,Minhui Xue,Shirui Pan,Xingliang Yuan
http://arxiv.org/abs/2312.07861v1
Compressor summary: GraphGuard is a training-data-free method for detecting and mitigating data misuse in graph neural networks on cloud-based machine learning platforms.
Yoshiro Kitamura,Yuanzhong Li,Wataru Ito,Hiroshi Ishikawa
http://arxiv.org/abs/2312.07860v1
Compressor summary: The paper presents a new image segmentation technique that uses higher-order potentials to model prior knowledge and improve segment quality, especially for pulmonary vessels.
Zhe Xu,Menghai Pan,Yuzhong Chen,Huiyuan Chen,Yuchen Yan,Mahashweta Das,Hanghang Tong
http://arxiv.org/abs/2312.07859v1
Compressor summary: The paper proposes invariant graph Transformer (IGT), a method that uses self-attention to perform fine-grained interventions on graph data for rationale discovery in graph machine learning, leading to improved model performance.
Minghao Fu,Ke Zhu,Jianxin Wu
http://arxiv.org/abs/2312.07856v1
Compressor summary: DTL uses a lightweight Compact Side Network to disentangle trainable parameters from the backbone, reducing GPU memory usage and improving accuracy on downstream tasks.
Tianxun Zhou,Muhammad Nur Shahril Iskandar,Keng-Hwee Chiam
http://arxiv.org/abs/2312.07854v1
Compressor summary: This study proposes a new method using image generation diffusion models to improve markerless gait analysis for lower-limb amputees, providing valuable insights for rehabilitation.
Liuxiang Qiu,Si Chen,Yan Yan,Jing-Hao Xue,Da-Han Wang,Shunzhi Zhu
http://arxiv.org/abs/2312.07853v2
Compressor summary: A novel network called HOS-Net uses short- and long-range features and high-order structure learning to improve visible-infrared person re-identification, achieving state-of-the-art performance.
Karthik Elamvazhuthi,Samet Oymak,Fabio Pasqualetti
http://arxiv.org/abs/2312.07851v2
Compressor summary: The paper compares neural ODEs and neural SDEs as reverse processes in SGMs, showing that stochasticity improves approximation of trajectories and enables controllability even with limited network width.
Feibo Jiang,Li Dong,Yubo Peng,Kezhi Wang,Kun Yang,Cunhua Pan,Dusit Niyato,Octavia A. Dobre
http://arxiv.org/abs/2312.07850v1
Compressor summary: The text describes how integrating large language models with agents' abilities can enhance 6G communications and proposes a multi-agent system for natural language communication tasks in 6G.
Yuanbo Wen,Tao Gao,Ziqi Li,Jing Zhang,Ting Chen
http://arxiv.org/abs/2312.07849v1
Compressor summary: RSHazeNet is a new framework that uses adaptive transposed self-attention, cross-level multi-view interaction, and view-progressive feature learning to efficiently remove haze from remote sensing images.
Shane Storm Strachan
http://arxiv.org/abs/2312.07848v1
Compressor summary: The project aims to create a large language model for Classics that is accurate, consistent, and appealing by fine-tuning an open-source LLM on a refined dataset and addressing its occasional hallucinations through continuous fine-tuning.
Xiong Zhou,Xianming Liu,Hanzhang Wang,Deming Zhai,Junjun Jiang,Xiangyang Ji
http://arxiv.org/abs/2312.07841v1
Compressor summary: The paper proposes a new loss function called unhinged loss that allows more in-depth analysis of deep learning dynamics and offers practical techniques for enhancing training.
Berkay H. Tosunlu,Joseph H. A. Guillaume,Alexis Tsoukiàs
http://arxiv.org/abs/2312.07838v1
Compressor summary: The text proposes a framework for using problem structuring methods to transform cognitive maps into value trees, enabling design-oriented decision support for conflict management with higher innovation potential.
Alexander Decruyenaere,Heidelinde Dehaene,Paloma Rabaey,Christiaan Polet,Johan Decruyenaere,Stijn Vansteelandt,Thomas Demeester
http://arxiv.org/abs/2312.07837v1
Compressor summary: The text discusses the challenges of drawing inferences from synthetic data and emphasizes the need for better statistical inference methods for such data.
Gaurav Shrivastava,Ser-Nam Lim,Abhinav Shrivastava
http://arxiv.org/abs/2312.07835v1
Compressor summary: The paper proposes a robust framework for low-level vision tasks that learns from corrupted test sequences without external data and uses a novel spatial pyramid loss for noise robustness.
C Kupferschmidt,A. D. Binns,K. L. Kupferschmidt,G. W Taylor
http://arxiv.org/abs/2312.07833v1
Compressor summary: Text-to-image generative models can produce realistic images of rivers but may carry biases from their training data that need to be addressed in earth sciences applications.
Nhu-Thanh Nguyen,Khoa Thi-Kim Phan,Duc-Vu Nguyen,Ngan Luu-Thuy Nguyen
http://arxiv.org/abs/2312.07831v1
Compressor summary: The authors created a Vietnamese dataset for detecting abuse in texts using NLP methods and found that PhoBERT performed best among tested models.
Minh Duong,Long Nguyen,Yen Vuong,Trong Le,Ha-Thanh Nguyen
http://arxiv.org/abs/2312.07824v1
Compressor summary: The paper introduces a deep learning system that generates concise and relevant summaries of legal case documents using natural language processing techniques, aiming to benefit legal professionals by reducing workload and increasing efficiency.
Qi Tang,Yao Zhao,Meiqin Liu,Jian Jin,Chao Yao
http://arxiv.org/abs/2312.07823v1
Compressor summary: The Semantic Lens is a novel method for video super-resolution that uses semantic priors from degraded videos to improve pixel-level alignment and generate realistic visual results.
Srishti Gautam,Ahcene Boubekki,Marina M. C. Höhne,Michael C. Kampffmeyer
http://arxiv.org/abs/2312.07822v1
Compressor summary: KMEx is a simple method that transforms pre-trained models into self-explainable ones, providing diverse and trustworthy explanations without retraining.
Wei Zhang,Alexandre Salle
http://arxiv.org/abs/2312.07819v1
Compressor summary: The text describes experiments with GPT models like GPT-4 in Native Language Identification (NLI), where they predict a writer's first language from their second language writings, achieving high performance and providing justifications based on various linguistic features.
Ming Y. Lu,Bowen Chen,Drew F. K. Williamson,Richard J. Chen,Kenji Ikamura,Georg Gerber,Ivy Liang,Long Phi Le,Tong Ding,Anil V Parwani,Faisal Mahmood
http://arxiv.org/abs/2312.07814v1
Compressor summary: PathChat is a multimodal AI assistant for human pathology that combines a vision encoder pretrained on histology images with a language model; it achieves high diagnostic accuracy and is both more accurate than other AI assistants and preferred over them by experts.