This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-05, generated by the compressor, my personal LLM-based summarization project.
Muhammad Uzair Khattak,Muhammad Ferjad Naeem,Muzammal Naseer,Luc Van Gool,Federico Tombari
http://arxiv.org/abs/2401.02418v1
Compressor summary: The authors propose a method to learn prompts for vision-language tasks using only text data derived from large language models, enabling zero-shot transfer and reducing costs.
Ayush Jain,Pushkal Katara,Nikolaos Gkanatsios,Adam W. Harley,Gabriel Sarch,Kriti Aggarwal,Vishrav Chaudhary,Katerina Fragkiadaki
http://arxiv.org/abs/2401.02416v1
Compressor summary: ODIN is a transformer model that simultaneously segments and labels 2D images and 3D point clouds, achieving state-of-the-art performance on various 3D perception benchmarks.
Chengyue Wu,Yukang Gan,Yixiao Ge,Zeyu Lu,Jiahao Wang,Ye Feng,Ping Luo,Ying Shan
http://arxiv.org/abs/2401.02415v1
Compressor summary: The paper proposes a new method to improve LLMs by expanding Transformer blocks and tuning them with a new corpus, achieving better performance on general tasks, programming, and math.
Jie An,Zhengyuan Yang,Jianfeng Wang,Linjie Li,Zicheng Liu,Lijuan Wang,Jiebo Luo
http://arxiv.org/abs/2401.02414v1
Compressor summary: Cas-DM improves DDPM by cascading two modules that predict the noise and the clean image respectively, allowing metric functions such as LPIPS loss to be applied to improve image quality.
Rachit Bansal,Bidisha Samanta,Siddharth Dalmia,Nitish Gupta,Shikhar Vashishth,Sriram Ganapathy,Abhishek Bapna,Prateek Jain,Partha Talukdar
http://arxiv.org/abs/2401.02412v1
Compressor summary: CALM is a method that composes foundation models with specific models using cross-attention to enable new capabilities while preserving existing ones and improving efficiency.
Alex Trevithick,Matthew Chan,Towaki Takikawa,Umar Iqbal,Shalini De Mello,Manmohan Chandraker,Ravi Ramamoorthi,Koki Nagano
http://arxiv.org/abs/2401.02411v1
Compressor summary: Our method improves the resolution and consistency of 3D geometry generated by 3D GANs using high-resolution neural volume rendering without post-processing superresolution.
Pouyan Sajadi,Mostafa Rahmani Dehaghani,Yifan Tang,G. Gary Wang
http://arxiv.org/abs/2401.02403v1
Compressor summary: The paper presents a physics-informed neural network that predicts temperature fields in metal additive manufacturing using real-time data and can handle different scenarios.
Zihao Xiao,Longlong Jing,Shangxuan Wu,Alex Zihao Zhu,Jingwei Ji,Chiyu Max Jiang,Wei-Chih Hung,Thomas Funkhouser,Weicheng Kuo,Anelia Angelova,Yin Zhou,Shiwei Sheng
http://arxiv.org/abs/2401.02402v1
Compressor summary: Our paper proposes a novel 3D open-vocabulary panoptic segmentation method that fuses LiDAR and vision features, using a single classification head and new distillation losses to improve performance on novel classes.
Zizhang Li,Dor Litvak,Ruining Li,Yunzhi Zhang,Tomas Jakab,Christian Rupprecht,Shangzhe Wu,Andrea Vedaldi,Jiajun Wu
http://arxiv.org/abs/2401.02400v1
Compressor summary: The authors propose 3D-Fauna, a method that learns a deformable 3D animal model from 2D images and the Semantic Bank of Skinned Models to handle rare species with limited data.
Erisa Hasani,Rachel A. Ward
http://arxiv.org/abs/2401.02398v1
Compressor summary: The paper proposes a new method to create synthetic training data for deep learning neural operators without using classical numerical solvers for partial differential equations (PDEs).
Peiyuan Zhang,Guangtao Zeng,Tianduo Wang,Wei Lu
http://arxiv.org/abs/2401.02385v1
Compressor summary: TinyLlama is a small, efficient language model that performs well on various tasks and is available for free on GitHub.
Fanqing Meng,Wenqi Shao,Quanfeng Lu,Peng Gao,Kaipeng Zhang,Yu Qiao,Ping Luo
http://arxiv.org/abs/2401.02384v1
Compressor summary: The text introduces ChartAssistant, a vision-language model for universal chart comprehension and reasoning, which outperforms existing methods without task-specific fine-tuning.
Darshan Venkatrayappa,Alain Tremeau,Damien Muselet,Philippe Colantoni
http://arxiv.org/abs/2401.02383v1
Compressor summary: The study compares 3D body shape and pose estimation methods for contemporary dance, showing that multi-frame methods perform better than single-frame methods.
Griffin Adams,Jason Zucker,Noémie Elhadad
http://arxiv.org/abs/2401.02369v1
Compressor summary: Scientists fine-tune LLMs to generate hospital discharge summaries, using a smaller encoder model to predict salient entities and sentence-level planning with entity retrieval (SPEER) to improve coverage and faithfulness.
Shahed Rezaei,Ahmad Moeineddin,Michael Kaliske,Markus Apel
http://arxiv.org/abs/2401.02363v1
Compressor summary: The paper proposes a physics-informed deep learning method that solves steady-state heat equations in heterogeneous solids faster and more accurately than classical methods or pure data-driven approaches.
Xiangyu Zhao,Yicheng Chen,Shilin Xu,Xiangtai Li,Xinjiang Wang,Yining Li,Haian Huang
http://arxiv.org/abs/2401.02361v1
Compressor summary: MM-Grounding-DINO is an open-source version of Grounding-DINO, a state-of-the-art model for multiple vision tasks, with comprehensive technical details and better performance.
Marwan Taher,Ignacio Alzugaray,Andrew J. Davison
http://arxiv.org/abs/2401.02357v1
Compressor summary: The paper presents a system that uses the density field from an efficient radiance-field method to accurately and robustly estimate the pose of small, challenging, reflective objects using a single wrist-mounted camera on a robot arm.
Ezgi Korkmaz
http://arxiv.org/abs/2401.02349v1
Compressor summary: The paper discusses the challenges of overfitting and generalization in deep reinforcement learning, and proposes a unified framework to improve robustness.
Longtian Qiu,Shan Ning,Xuming He
http://arxiv.org/abs/2401.02347v1
Compressor summary: The paper proposes a text-only trained zero-shot image captioning framework that leverages subregion features and reduces the modality gap between images and texts using noise injection and CLIP reranking, improving performance on common datasets.
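The noise-injection idea behind text-only captioner training can be illustrated with a minimal sketch (assumed generic formulation, not this paper's exact method; the `noise_std` value is illustrative): Gaussian noise is added to a normalized CLIP text embedding so that, at training time, it behaves like a nearby image embedding.

```python
import numpy as np

def noisy_text_embedding(text_emb, noise_std=0.016, rng=None):
    # Text-only training trick: perturb the (unit-norm) CLIP text embedding
    # with Gaussian noise so it mimics a nearby image embedding, shrinking
    # the modality gap between the two encoders.
    if rng is None:
        rng = np.random.default_rng(0)
    noisy = text_emb + rng.normal(0.0, noise_std, size=text_emb.shape)
    return noisy / np.linalg.norm(noisy)  # re-project onto the unit sphere
```

At inference the real image embedding is used directly; since the captioner was trained on perturbed text embeddings, the gap it must bridge is smaller.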
Yabin Wang,Zhiwu Huang,Zhiheng Ma,Xiaopeng Hong
http://arxiv.org/abs/2401.02335v1
Compressor summary: The paper introduces DFLIP-3K, a large and diverse deepfake database that enables the development of convincing and explainable deepfake detection methods through linguistic profiling of deepfakes.
Uday Allu,Biddwan Ahmed,Vishesh Tripathi
http://arxiv.org/abs/2401.02333v1
Compressor summary: The paper proposes an improved method for finding accurate information from complex tables in PDF documents using RAG-based systems and various natural language processing techniques.
Yichen Zhu,Minjie Zhu,Ning Liu,Zhicai Ou,Xiaofeng Mou,Jian Tang
http://arxiv.org/abs/2401.02330v1
Compressor summary: LLaVA-Phi is a small but powerful multi-modal assistant that uses a tiny language model to interact effectively with both text and images, making it suitable for real-time applications and resource-efficient systems.
Xinyang Pu,Hecheng Jia,Linghao Zheng,Feng Wang,Feng Xu
http://arxiv.org/abs/2401.02326v1
Compressor summary: The ClassWise-SAM-Adapter (CWSAM) adapts a large vision foundation model to efficiently classify landcover on SAR images using fewer resources and outperforming conventional methods.
Parvin Malekzadeh,Konstantinos N. Plataniotis,Zissis Poulos,Zeyu Wang
http://arxiv.org/abs/2401.02325v1
Compressor summary: The paper proposes a new quantile Huber loss function based on Wasserstein distance that improves robustness and generalization in distributional reinforcement learning by accounting for noise in predicted and target quantiles.
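For context, the standard quantile Huber loss that this paper generalizes can be sketched as follows (a minimal NumPy version of the QR-DQN-style formulation; the paper's Wasserstein-based variant is not reproduced here):

```python
import numpy as np

def huber(u, kappa=1.0):
    # Classic Huber loss: quadratic near zero, linear in the tails.
    return np.where(np.abs(u) <= kappa,
                    0.5 * u**2,
                    kappa * (np.abs(u) - 0.5 * kappa))

def quantile_huber_loss(pred_quantiles, target, taus, kappa=1.0):
    # Asymmetric weighting |tau - 1{u < 0}| applied to the Huber loss,
    # as used in distributional RL (e.g. QR-DQN).
    u = target - pred_quantiles
    weight = np.abs(taus - (u < 0).astype(float))
    return float(np.mean(weight * huber(u, kappa)))
```

The asymmetric weight makes each output head converge to its quantile level `tau`, while the Huber component keeps gradients bounded under noisy targets.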
Yiran Song,Qianyu Zhou,Xiangtai Li,Deng-Ping Fan,Xuequan Lu,Lizhuang Ma
http://arxiv.org/abs/2401.02317v1
Compressor summary: The paper presents Scalable Bias-Mode Attention Mask (BA-SAM), a method that enhances the Segment Anything Model's adaptability to varying image resolutions by introducing a scaling factor and a bias-mode attention mask that prioritizes neighboring information, achieving better or state-of-the-art segmentation results in zero-shot and fine-tuning settings on diverse datasets.
Leng Kai,Zhang Zhijie,Liu Jie,Zed Boukhers,Sui Wei,Cong Yang,Li Zhijun
http://arxiv.org/abs/2401.02313v1
Compressor summary: This paper introduces a self-supervised approach for edge detection using multi-level, multi-homography techniques to transfer annotations from synthetic to real data, and SuperEdge, a model that extracts edges at pixel and object levels.
Hao Sun,Mingyao Zhou,Wenjing Chen,Wei Xie
http://arxiv.org/abs/2401.02309v1
Compressor summary: The paper proposes TR-DETR, a method that exploits the reciprocal relationship between video moment retrieval and highlight detection, using a local-global multi-modal alignment module and a task cooperation module to improve performance on both tasks.
Marcin Łoś,Maciej Paszyński
http://arxiv.org/abs/2401.02300v1
Compressor summary: The authors propose a Robust Physics-Informed Neural Network (RPINN) that improves the loss function in standard PINNs by incorporating the residual and the inverse of the Gram matrix, resulting in a more accurate approximation of PDE solutions with better convergence.
Seyed Mahed Mousavi,Gabriel Roccabruna,Simone Alghisi,Massimo Rizzoli,Mirco Ravanelli,Giuseppe Riccardi
http://arxiv.org/abs/2401.02297v1
Compressor summary: This paper evaluates the performance of large language models on spoken task-oriented dialogues and finds that they are not robust to noise by default, but can be improved with fine-tuning.
Iara Cunha,Marcos Eduardo Valle
http://arxiv.org/abs/2401.02296v1
Compressor summary: The paper presents a new training algorithm for single-layer morphological perceptrons that combines two existing methods and shows its effectiveness in binary classification problems.
Shengtao Li,Ge Gao,Yudong Liu,Yu-Shen Liu,Ming Gu
http://arxiv.org/abs/2401.02292v1
Compressor summary: The paper introduces GridFormer, a method that uses an attention mechanism between grid and point features for efficient 3D surface reconstruction with high precision.
Heng Chang,Jiangnan Ye,Alejo Lopez Avila,Jinhua Du,Jia Li
http://arxiv.org/abs/2401.02290v1
Compressor summary: Power-Link is a novel path-based explainer for Knowledge Graph Completion models that uses a simplified graph-powering technique to generate interpretable explanations efficiently and scalably.
Simon Thomine,Hichem Snoussi
http://arxiv.org/abs/2401.02287v1
Compressor summary: The text proposes a fast and robust unsupervised fabric defect detection method based on reverse knowledge distillation, which avoids reconstructing anomalies and mitigates classifier bias across different types of fabrics and defects.
Lukas Meyer,Floris Erich,Yusuke Yoshiyasu,Marc Stamminger,Noriaki Ando,Yukiyasu Domae
http://arxiv.org/abs/2401.02281v1
Compressor summary: PEGASUS is a versatile dataset generator that creates realistic scenes by combining environments and objects using 3D Gaussian Splatting and physics simulation, enabling pose estimation networks to transfer from synthetic to real-world data.
Febrian Kurniawan,Gandeva Bayu Satrya,Firuz Kamalov
http://arxiv.org/abs/2401.02278v1
Compressor summary: The study develops a machine learning model that identifies fish species and whether they are edible, using a lightweight modified MobileNet that runs on limited hardware; trained on a large dataset of Indonesian fish images, it achieves high accuracy and could aid sustainable fishing and conserve marine resources.
Marcos Eduardo Valle,Wington L. Vital,Guilherme Vieira
http://arxiv.org/abs/2401.02277v1
Compressor summary: The paper extends the universal approximation theorem to various vector-valued neural networks, including hypercomplex ones, by defining non-degenerate algebras and applying it to neural networks on these algebras.
Katharina Bendig,René Schuster,Didier Stricker
http://arxiv.org/abs/2401.02274v1
Compressor summary: The paper proposes a new event data augmentation method for DVS classification and object detection that introduces synthetic events for moving objects and improves accuracy.
Linglong Qian,Zina Ibrahim,Richard Dobson
http://arxiv.org/abs/2401.02258v1
Compressor summary: DEARI is a deep recurrent neural network that jointly imputes missing values and their uncertainty in heterogeneous multivariate time series, outperforming the current state-of-the-art methods.
Yuma Tsuta,Naoki Yoshinaga,Shoetsu Sato,Masashi Toyoda
http://arxiv.org/abs/2401.02256v1
Compressor summary: The study examines how to create an automatic response evaluator for open-domain dialogue systems that considers the human perspective and interlocutor awareness.
Chi Ian Tang,Lorena Qendro,Dimitris Spathis,Fahim Kawsar,Akhil Mathur,Cecilia Mascolo
http://arxiv.org/abs/2401.02255v1
Compressor summary: The paper proposes two models for wearable-based human activity recognition using continual learning, comparing them with existing approaches and exploring the balance between retention and adaptation.
Aishwarya Mirashi,Srushti Sonavane,Purva Lingayat,Tejas Padhiyar,Raviraj Joshi
http://arxiv.org/abs/2401.02254v1
Compressor summary: L3Cube-IndicNews is a multilingual text classification corpus for Indian regional languages, covering news headlines, articles, and sub-articles in 10 languages, with evaluation using different models.
Qian Lin,Chao Yu,Zongkai Liu,Zifan Wu
http://arxiv.org/abs/2401.02244v1
Compressor summary: The paper proposes a method to train a policy for multi-objective RL using only offline trajectory data, and addresses the preference-inconsistent demonstration problem by filtering, regularizing, and adapting the policy.
Di Qi,Tong Yang,Xiangyu Zhang
http://arxiv.org/abs/2401.02241v1
Compressor summary: sVORF is a novel unsupervised method for decomposing complex scenes into individual objects from a single image using volumetric object radiance fields and object slots guided by transformers.
Xiang Ma,Xuemei Li,Lexin Fang,Tianlong Zhao,Caiming Zhang
http://arxiv.org/abs/2401.02236v1
Compressor summary: U-Mixer is a framework that combines Unet and Mixer to tackle non-stationarity in time series forecasting by correcting stationarity and preserving temporal dependencies, achieving better performance than SOTA methods.
Guojian Wang,Faguo Wu,Xiao Zhang
http://arxiv.org/abs/2401.02225v1
Compressor summary: The text introduces a new DRL method that uses offline demonstrations as guidance to learn policies faster in tasks with sparse rewards, using a novel trajectory distance based on MMD for optimization.
Rikui Huang,Wei Wei,Xiaoye Qu,Wenfeng Xie,Xianling Mao,Dangyang Chen
http://arxiv.org/abs/2401.02212v1
Compressor summary: JMFRN is a model that jointly reasons about multiple temporal facts from a knowledge graph to answer complex questions, using entity-aware and time-aware attention modules and an answer type discrimination task.
Songbo Hu,Xiaobin Wang,Zhangdie Yuan,Anna Korhonen,Ivan Vulić
http://arxiv.org/abs/2401.02208v1
Compressor summary: DIALIGHT is an open-source toolkit for developing and evaluating multilingual task-oriented dialogue systems using pretrained and large language models, with a focus on systematic comparisons and user feedback.
Haonan Li,Martin Tomko,Timothy Baldwin
http://arxiv.org/abs/2401.02187v1
Compressor summary: The paper presents an efficient and effective method for tourism QA using dense vector retrieval, which encodes questions and POIs separately with pretrained language models and a location encoder, outperforms previous methods on a real-world dataset, and enables global evaluation with a larger search space.
Shih-Chi Ma,Tatiana Ermakova,Benjamin Fabian
http://arxiv.org/abs/2401.02183v1
Compressor summary: FairGridSearch is a framework for comparing fairness-enhancing models in binary classification, considering various factors such as metric selection, base estimator choice, and classification threshold, which can affect model fairness differently across datasets.
Weihao Li,Lei Tan,Pingyang Dai,Yan Zhang
http://arxiv.org/abs/2401.02173v1
Compressor summary: The paper proposes a two-stage training approach with prompt tuning strategy to improve text-to-image person re-identification using CLIP model by decoupling domain adaptation and task adaptation.
Yukang Zhang,Yang Lu,Yan Yan,Hanzi Wang,Xuelong Li
http://arxiv.org/abs/2401.02162v1
Compressor summary: This paper proposes a novel method to reduce modality discrepancy in visible-infrared person re-identification by exploring frequency domain information, which improves performance on two datasets.
Xuanhua He,Tao Hu,Guoli Wang,Zejin Wang,Run Wang,Qian Zhang,Keyu Yan,Ziyi Chen,Rui Li,Chenjun Xie,Jie Zhang,Man Zhou
http://arxiv.org/abs/2401.02161v1
Compressor summary: The authors propose FourierISP, a novel neural network framework that enhances both color and structure of smartphone RAW images by separating them in the frequency domain and using three subnetworks for optimization.
Rushi Chavda,Darshan Makwana,Vraj Patel,Anupam Shukla
http://arxiv.org/abs/2401.02158v1
Compressor summary: The paper reports on Team Shayona's success in two shared tasks involving binary classification of COVID-19 tweets and social anxiety Reddit posts using BERT and LightGBM models.
Yuxuan Liu,Haozhao Wang,Shuang Wang,Zhiming He,Wenchao Xu,Jialiang Zhu,Fan Yang
http://arxiv.org/abs/2401.02154v1
Compressor summary: The text proposes a method that estimates causal effects among events across private data silos using a disentangled architecture with shared and private branches and global constraints, improving the accuracy of causal effect estimation over existing methods.
Xuanhua He,Keyu Yan,Rui Li,Chengjun Xie,Jie Zhang,Man Zhou
http://arxiv.org/abs/2401.02151v1
Compressor summary: The FAME learning framework is a novel method for pan-sharpening that uses frequency domain techniques to reconstruct missing high-frequency information in multi-spectral images, outperforming existing methods.
Mei Wang,Weihong Deng,Sen Su
http://arxiv.org/abs/2401.02150v1
Compressor summary: The paper proposes a novel network to learn debiased representations by using a marginal softmax loss that penalizes spurious correlations and adapts margin parameters through meta learning.
Ziqiang Zheng,Yiwei Chen,Jipeng Zhang,Tuan-Anh Vu,Huimin Zeng,Yue Him Wong Tim,Sai-Kit Yeung
http://arxiv.org/abs/2401.02147v1
Compressor summary: The study evaluates GPT-4V's performance on marine analysis tasks, finds it lacking domain-specific knowledge, and establishes a benchmark for future MLLM development.
Cheng-Te Li,Yu-Che Tsai,Chih-Yao Chen,Jay Chiehen Liao
http://arxiv.org/abs/2401.02143v1
Compressor summary: The text surveys Graph Neural Networks (GNNs) for Tabular Data Learning (TDL), highlighting their strengths, challenges, applications, and future directions.
Xuehao Gao,Yang Yang,Zhenyu Xie,Shaoyi Du,Zhongqian Sun,Yang Wu
http://arxiv.org/abs/2401.02142v1
Compressor summary: The paper introduces GUESS, a generative framework that synthesizes human motion from text descriptions using a cascaded latent diffusion model and multi-condition fusion mechanism to improve accuracy, realisticness, and diversity.
Xinzhe Luo,Xin Wang,Linda Shapiro,Chun Yuan,Jianfeng Feng,Xiahai Zhuang
http://arxiv.org/abs/2401.02141v1
Compressor summary: The article presents a Bayesian learning framework that uses hierarchical variational auto-encoding to perform multi-modal groupwise registration on medical images, modelling the image generative process with latent variables for common anatomy and geometric variations, and achieving superior accuracy, efficiency, scalability, and interpretability without complex similarity measures.
Jinfu Liu,Runwei Ding,Yuhang Wen,Nan Dai,Fanyang Meng,Shen Zhao,Mengyuan Liu
http://arxiv.org/abs/2401.02138v1
Compressor summary: The paper introduces a new dual-branch framework called EPP-Net that uses both skeletons and human parsing features for action recognition, improving performance over existing methods.
Ziping Ma,Furong Xu,Jian Liu,Ming Yang,Qingpei Guo
http://arxiv.org/abs/2401.02137v1
Compressor summary: SyCoCa is a multimodal alignment method that improves contrastive captioning by introducing bidirectional interactions between images and texts at both global and local levels using textual and visual cues.
Wendi Cui,Jiaxin Zhang,Zhuohang Li,Lopez Damien,Kamalika Das,Bradley Malin,Sricharan Kumar
http://arxiv.org/abs/2401.02132v1
Compressor summary: The paper proposes DCR, a framework to evaluate and improve the consistency of text generated by LLMs using divide-conquer-reasoning, which outperforms existing methods in multiple tasks and reduces hallucination issues.
Jiacheng Wang,Ping Liu,Wei Xu
http://arxiv.org/abs/2401.02126v1
Compressor summary: The paper presents a text-to-image editing framework that performs both rigid and non-rigid edits guided by text or reference images, using dual-path injection, self-attention, and latent fusion to achieve high quality and flexibility.
Tzu-Han Lin,How-Shing Wang,Hao-Yung Weng,Kuang-Chen Peng,Zih-Ching Chen,Hung-yi Lee
http://arxiv.org/abs/2401.02122v1
Compressor summary: PEFT methods have varying effects on speech processing, and using an ensemble approach with majority voting improves performance over DARTS and a baseline method.
Zhenwen Li,Tao Xie
http://arxiv.org/abs/2401.02115v1
Compressor summary: The paper proposes a text-to-SQL re-ranking method that generates databases, uses LLMs to predict execution results as test cases, and selects the best candidate SQL query based on pass counts and generation probabilities, improving some state-of-the-art models by 3.6%.
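The re-ranking step reduces to a simple scoring rule; a minimal sketch (illustrative only — the candidate fields and tie-breaking order are assumptions, not the paper's exact scoring):

```python
def rerank_sql(candidates):
    # candidates: list of dicts, each with "passes" (number of generated
    # test cases whose predicted execution result the query matches) and
    # "logprob" (the model's generation log-probability).
    # Prefer more passed test cases; break ties by generation probability.
    return max(candidates, key=lambda c: (c["passes"], c["logprob"]))
```

In practice each candidate query is executed against the generated databases, and its results are compared with the LLM-predicted expected outputs to obtain `passes`.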
Fahim Faisal Niloy,Kishor Kumar Bhaumik,Simon S. Woo
http://arxiv.org/abs/2401.02113v1
Compressor summary: The paper presents a fast, lightweight, backpropagation-free test-time adaptation (TTA) method for satellite image segmentation that adapts to distribution shifts by estimating global Batch Normalization statistics and refining predicted masks using global class centers.
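The Batch Normalization part of such TTA methods can be sketched generically (this shows the common BN-statistics adaptation idea under assumed per-channel activations; the paper's global estimation and mask refinement with class centers are not reproduced):

```python
import numpy as np

def adapt_bn_stats(running_mean, running_var, test_batches, momentum=0.1):
    # Backpropagation-free adaptation: update BatchNorm running statistics
    # from unlabeled test batches so normalization matches the shifted
    # test distribution. Each batch x has shape (N, C): N samples,
    # C channels.
    for x in test_batches:
        mu, var = x.mean(axis=0), x.var(axis=0)
        running_mean = (1 - momentum) * running_mean + momentum * mu
        running_var = (1 - momentum) * running_var + momentum * var
    return running_mean, running_var
```

Because only forward passes over unlabeled data are needed, the adaptation adds negligible cost at deployment time.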
Debapriya Roy,Sanchayan Santra,Diganta Mukherjee,Bhabatosh Chanda
http://arxiv.org/abs/2401.02110v1
Compressor summary: The paper presents a new system for virtual try-on (VTON) of clothes, addressing the limitations of existing methods by using anatomy-aware geometric transformations and part-based warping.
Zeyu Li,Jingsheng Gao,Tong Yu,Suncheng Xiang,Jiacheng Ruan,Ting Liu,Yuzhuo Fu
http://arxiv.org/abs/2401.02099v1
Compressor summary: The text introduces CLAPP, a novel model that uses contrastive language-audio pre-training to improve underwater vessel classification and recognition from raw audio data and vessel state text pairs.
Jeffrey Zhang,Shao-Yu Chang,Kedan Li,David Forsyth
http://arxiv.org/abs/2401.02097v1
Compressor summary: The paper discusses how to improve retail photography using Stable Diffusion methods by addressing inconsistencies in image generation and backgrounds.
Mincong Huang,Chao Wang,Chi Ma,Yineng Zhang,Peng Zhang,Lei Yu
http://arxiv.org/abs/2401.02088v1
Compressor summary: BPipe improves memory utilization for large Transformer models like GPT-3, but not for LLaMA, and its benefits depend on flash attention.
Tingyang Chen,Dazhuo Qiu,Yinghui Wu,Arijit Khan,Xiangyu Ke,Yunjun Gao
http://arxiv.org/abs/2401.02086v1
Compressor summary: The paper proposes GVEX, a method to generate graph views for explanation, which helps understand specific class labels assigned by graph neural networks (GNNs) in analytical tasks like graph classification.
Yan Wang,Ling Guo,Hao Wu,Tao Zhou
http://arxiv.org/abs/2401.02080v1
Compressor summary: The energy-based diffusion generator is a new sampler that uses a variational autoencoder with a diffusion model encoder and a generalized Hamiltonian dynamics decoder to generate samples from arbitrary target distributions, outperforming existing methods.
Hanhui Wang,Huaize Ye,Yi Xia,Xueyan Zhang
http://arxiv.org/abs/2401.02076v1
Compressor summary: The paper proposes an improved single-source domain generalization (SDG) method for medical image segmentation using a parallel framework with the Segment Anything Model (SAM).
Chen Zheng,Ke Sun,Da Tang,Yukun Ma,Yuyu Zhang,Chenguang Xi,Xun Zhou
http://arxiv.org/abs/2401.02072v1
Compressor summary: ICE-GRT is a new AI model that uses Reinforcement Learning from Human Feedback to improve its understanding and reasoning abilities in domain-specific tasks without sacrificing general task performance, outperforming other large language models.
Hien Dang,Tho Tran,Tan Nguyen,Nhat Ho
http://arxiv.org/abs/2401.02058v1
Compressor summary: The paper studies how neural collapse, a phenomenon where deep neural networks' last-layer features become extreme points of a simplex, changes when the training data is imbalanced between classes.
Hao Yang,Hong-Yu Zhou,Cheng Li,Weijian Huang,Jiarun Liu,Shanshan Wang
http://arxiv.org/abs/2401.02044v1
Compressor summary: AFLoc is a new model that can find diseases in medical images without expert annotations, using multi-level learning and image-text alignment to adapt to diverse pathologies and outperform existing methods.
Chuanming Wang,Yuxin Yang,Mengshi Qi,Huadong Ma
http://arxiv.org/abs/2401.02041v1
Compressor summary: The paper proposes a cloud-edge collaborative inference framework for ReID systems, which are currently centralized and impractical for many videos, with DaCM, a model that leverages spatial-temporal correlations among instances to reduce transmission overhead and improve efficiency and scalability.
Yiheng Liu,Hao He,Tianle Han,Xu Zhang,Mengyuan Liu,Jiaming Tian,Yutong Zhang,Jiaqi Wang,Xiaohui Gao,Tianyang Zhong,Yi Pan,Shaochen Xu,Zihao Wu,Zhengliang Liu,Xin Zhang,Shu Zhang,Xintao Hu,Tuo Zhang,Ning Qiang,Tianming Liu,Bao Ge
http://arxiv.org/abs/2401.02038v1
Compressor summary: This paper reviews how ChatGPT's introduction boosted Large Language Models' usage for downstream tasks, emphasizing cost-efficient training and deployment techniques and their evolution.
Wei Zhu,Wenfeng Li,Xing Tian,Pengfei Wang,Xiaoling Wang,Jin Chen,Yuanbin Wu,Yuan Ni,Guotong Xie
http://arxiv.org/abs/2401.02034v1
Compressor summary: The text presents Text2MDT, a novel task for automatically extracting medical decision trees from texts, introduces an annotated Chinese dataset, and evaluates two methods: an end-to-end approach with LLMs, which shows promising results with CoT prompting, and a pipeline approach with encoder-based models that performs comparably.
Yunfan Ye,Kai Xu,Yuhang Huang,Renjiao Yi,Zhiping Cai
http://arxiv.org/abs/2401.02032v1
Compressor summary: The DiffusionEdge model uses a diffusion probabilistic approach to improve edge detection accuracy and sharpness, achieving superior results on various datasets.
Ruofei Wang,Renjie Wan,Zongyu Guo,Qing Guo,Rui Huang
http://arxiv.org/abs/2401.02031v1
Compressor summary: Spy-Watermark is a novel backdoor attack method that uses a learnable watermark embedded in images to deceive victim models while resisting data corruption and defense measures.
Qiang Zhang,Ruida Zhou,Yang Shen,Tie Liu
http://arxiv.org/abs/2401.02019v1
Compressor summary: The paper proposes a new approach to offline optimization that views it as sampling from a generative model and uses re-weighting with a PAC lower bound to learn a weight function and a score-based generative model, achieving robustly competitive performance on benchmarks.
Ling Yang,Jingwei Liu,Shenda Hong,Zhilong Zhang,Zhilin Huang,Zheming Cai,Wentao Zhang,Bin Cui
http://arxiv.org/abs/2401.02015v1
Compressor summary: ConPreDiff is a diffusion model that predicts neighborhood context for better image synthesis using a context decoder during training and removing it for inference.
Jing Wu,Suiyao Chen,Qi Zhao,Renat Sergazinov,Chen Li,Shengjie Liu,Chongchao Zhao,Tianpei Xie,Hanqing Guo,Cheng Ji,Daniel Cociorva,Hakan Brunzel
http://arxiv.org/abs/2401.02013v1
Compressor summary: SwitchTab is a novel self-supervised method for tabular data that captures latent dependencies and improves downstream tasks by producing more representative embeddings and enhancing traditional classification methods.
Allen Minch,Hung Anh Vu,Anne Marie Warren
http://arxiv.org/abs/2401.02012v1
Compressor summary: The project aims to create fairer AI models by using adversarial training techniques that address the bias inherent in Deep Neural Networks.
Wenjing Yan,Xuanyu Cao
http://arxiv.org/abs/2401.02011v1
Compressor summary: The paper proposes a robust decentralized saddle-point algorithm for multi-task optimization with random link failures, achieving regret and constraint violation bounds matching those of perfect communication scenarios.
Wenqi Zhang,Yongliang Shen,Linjuan Wu,Qiuying Peng,Jun Wang,Yueting Zhuang,Weiming Lu
http://arxiv.org/abs/2401.02009v1
Compressor summary: Self-Contrast improves Large Language Model's reflection by exploring diverse perspectives, contrasting differences, and summarizing them into a checklist for re-evaluation and error reduction.
Farhad Pourkamali-Anaraki,Jamal F. Husseini,Evan J. Pineda,Brett A. Bednarcyk,Scott E. Stapleton
http://arxiv.org/abs/2401.02008v1
Compressor summary: The paper presents a new two-stage machine learning framework that solves inverse problems by identifying promising input designs and evaluating them efficiently using conformal inference.