This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-17 generated by the compressor, my personal LLM-based project.
Jiahao Nie,Yun Xing,Gongjie Zhang,Pei Yan,Aoran Xiao,Yap-Peng Tan,Alex C. Kot,Shijian Lu
http://arxiv.org/abs/2401.08407v1
Compressor summary: The paper proposes a novel cross-domain fine-tuning strategy for few-shot segmentation, addressing domain transfer and overfitting issues using bi-directional and iterative methods.
Aman Gupta,Anup Shirgaonkar,Angels de Luis Balaguer,Bruno Silva,Daniel Holstein,Dawei Li,Jennifer Marsman,Leonardo O. Nunes,Mahsa Rouzbahman,Morris Sharp,Nick Mecklenburg,Rafael Padilha,Ranveer Chandra,Renato Luiz de Freitas Cunha,Roberto de M. Estevão Filho,Ryan Tsang,Sara Malvar,Swati Sharma,Todd Hendry,Vijay Aski,Vijetha Vijayendran,Vinamra Benara
http://arxiv.org/abs/2401.08406v1
Compressor summary: The paper compares fine-tuning and retrieval-augmented generation (RAG) approaches for integrating domain-specific data into large language models (LLMs), proposes a pipeline to evaluate both methods, and applies it to an agricultural dataset, showing improved accuracy and answer similarity.
Qiao Jin,Fangyuan Chen,Yiliang Zhou,Ziyang Xu,Justin M. Cheung,Robert Chen,Ronald M. Summers,Justin F. Rousseau,Peiyun Ni,Marc J Landsman,Sally L. Baxter,Subhi J. Al'Aref,Yijia Li,Michael F. Chiang,Yifan Peng,Zhiyong Lu
http://arxiv.org/abs/2401.08396v1
Compressor summary: GPT-4V can answer medical image quizzes better than humans but sometimes has flawed reasoning.
Zongxin Yang,Guikun Chen,Xiaodi Li,Wenguan Wang,Yi Yang
http://arxiv.org/abs/2401.08392v1
Compressor summary: DoraemonGPT is a system that uses large language models to process videos, store task-related attributes, reason spatially and temporally, access external knowledge, and plan actions for various real-world applications.
Wasim Ahmad,Maha Shadaydeh,Joachim Denzler
http://arxiv.org/abs/2401.08386v1
Compressor summary: The text describes a method to identify causal relationships among groups of variables in nonlinear systems using deep learning and interventions, which improves on existing methods and helps understand complex systems like climate and brain networks.
Jinghan Yao,Quentin Anthony,Aamir Shafi,Hari Subramoni,Dhabaleswar K. Panda
http://arxiv.org/abs/2401.08383v1
Compressor summary: ExFlow optimizes MoE models for parallel inference by exploiting inter-layer expert affinity, reducing communication overhead and improving throughput on distributed systems.
Miquel Esplà-Gomis,Víctor M. Sánchez-Cartagena,Juan Antonio Pérez-Ortiz,Felipe Sánchez-Martínez
http://arxiv.org/abs/2401.08374v1
Compressor summary: The paper introduces a neural computer-aided translation (CAT) tool that uses both translation memories (TMs) and in-domain monolingual corpora, and evaluates its effectiveness on four language pairs.
Kengatharaiyer Sarveswaran
http://arxiv.org/abs/2401.08367v1
Compressor summary: The paper describes the structure and features of modern Tamil, useful for linguists and computer scientists working with the language.
Xiaotong Liu,Jinxin Wang,Di Wang,Shao-Bo Lin
http://arxiv.org/abs/2401.08364v1
Compressor summary: The paper proposes a weighted spectral filter method to improve kernel interpolation for noisy data in image sciences, using spherical positive quadrature rules and high-pass spectral filters.
Junliang Luo,Tianyu Li,Di Wu,Michael Jenkin,Steve Liu,Gregory Dudek
http://arxiv.org/abs/2401.08358v1
Compressor summary: The report reviews existing methods to detect and reduce factually incorrect responses generated by large language models like ChatGPT, Bard, and Llama.
Xilai Li,Xiaosong Li,Haishu Tan,Jinyang Li
http://arxiv.org/abs/2401.08357v1
Compressor summary: The study presents a new image fusion algorithm that enhances small focus areas and improves object detection performance.
Jianhui Pang,Fanghua Ye,Longyue Wang,Dian Yu,Derek F. Wong,Shuming Shi,Zhaopeng Tu
http://arxiv.org/abs/2401.08350v1
Compressor summary: This study examines six core challenges in neural machine translation and finds that large language models improve some aspects but introduce new challenges related to efficiency, low-resource languages, and evaluation.
Jakub Białek,Wojtek Kuberski,Nikolaos Perrakis
http://arxiv.org/abs/2401.08348v1
Compressor summary: M-CBPE is a novel method for estimating ML model performance on unlabeled data, accounting for covariate shift and working for any performance metric and data type.
Fei Guo,YiKang Wang,Han Qi,WenPing Jin,Li Zhu
http://arxiv.org/abs/2401.08345v1
Compressor summary: The paper proposes a method for few-shot action recognition using multi-modal fusion and multi-view distillation to improve robustness to class overlaps and outliers.
Zhaoge Liu,Xiaohao Xu,Yunkang Cao,Weiming Shen
http://arxiv.org/abs/2401.08332v1
Compressor summary: Generative Denoise Distillation (GDD) is a novel method that improves knowledge transfer from complex models to simpler ones by adding noise and aligning features, achieving state-of-the-art results in computer vision tasks.
Qixin Zhang,Zongqi Wan,Zengde Deng,Zaiyi Chen,Xiaoming Sun,Jialin Zhang,Yu Yang
http://arxiv.org/abs/2401.08330v1
Compressor summary: The paper proposes a boosting technique for continuous DR-submodular maximization problems that improves the approximation guarantee of standard Projected Gradient Ascent to optimal levels with minimal modifications.
Devavrat Tomar,Guillaume Vray,Jean-Philippe Thiran,Behzad Bozorgtabar
http://arxiv.org/abs/2401.08328v1
Compressor summary: The paper introduces UnMix-TNS, a novel method that simulates the i.i.d. environment for test-time adaptation by mixing instance-wise statistics with multiple unmixed components, improving stability and performance in various scenarios.
Junjie Ye,Yilong Wu,Songyang Gao,Sixian Li,Guanyu Li,Xiaoran Fan,Qi Zhang,Tao Gui,Xuanjing Huang
http://arxiv.org/abs/2401.08326v1
Compressor summary: RoTBench is a benchmark for testing the robustness of large language models in tool learning across different noise levels, revealing their weaknesses and proposing RoTTuning to improve them.
Yizhuo Wu,Gagan Deep Singh,Mohammadreza Beikmirza,Leo de Vreede,Morteza Alavi,Chang Gao
http://arxiv.org/abs/2401.08318v1
Compressor summary: This paper introduces OpenDPD, an open-source framework for fast digital pre-distortion exploration and comparison using a novel Dense Gated Recurrent Unit (DGRU)-DPD model that outperforms previous models on a digital power amplifier.
Chengguang Gan,Qinghao Zhang,Tatsunori Mori
http://arxiv.org/abs/2401.08315v1
Compressor summary: The paper introduces a novel agent framework using large language models to efficiently summarize, grade, and select candidates from resumes, achieving significant speed and accuracy improvements over traditional methods.
Zhongwang Zhang,Zhiwei Wang,Junjie Yao,Zhangchen Zhou,Xiaolong Li,Weinan E,Zhi-Qin John Xu
http://arxiv.org/abs/2401.08309v1
Compressor summary: The text proposes the concept of an anchor function as a benchmark for studying transformer-based language models, enabling academic research with limited resources to explore their behavior and operations in various tasks.
Weixiang Zhao,Shilong Wang,Yulin Hu,Yanyan Zhao,Bing Qin,Xuanyu Zhang,Qing Yang,Dongliang Xu,Wanxiang Che
http://arxiv.org/abs/2401.08295v1
Compressor summary: The paper proposes a new method called Dual Attention Framework (DAPT) that improves large language models' ability to learn continually by aligning learning and selection modules, enhancing their performance on dynamic tasks.
Shuming Shi,Enbo Zhao,Deng Cai,Leyang Cui,Xinting Huang,Huayang Li
http://arxiv.org/abs/2401.08294v1
Compressor summary: Inferflow is a flexible and efficient inference engine for large language models that supports 3.5-bit quantization and hybrid model partitioning for multi-GPU inference.
Matthijs Douze,Alexandr Guzhva,Chengqi Deng,Jeff Johnson,Gergely Szilvasy,Pierre-Emmanuel Mazaré,Maria Lomeli,Lucas Hosseini,Hervé Jégou
http://arxiv.org/abs/2401.08281v1
Compressor summary: Faiss is a vector similarity search toolkit with various indexing methods and applications for large collections of embeddings.
Yipo Huang,Quan Yuan,Xiangfei Sheng,Zhichao Yang,Haoning Wu,Pengfei Chen,Yuzhe Yang,Leida Li,Weisi Lin
http://arxiv.org/abs/2401.08276v1
Compressor summary: AesBench is an expert benchmark for evaluating multimodal large language models' image aesthetics perception abilities using a database with diverse images and high-quality annotations.
Bin Zhang,Xiangyu Zhu,Xiaoyu Zhang,Zhen Lei
http://arxiv.org/abs/2401.08275v1
Compressor summary: The paper proposes a new face anti-spoofing method that uses diffusion models to denoise spoof images and genuine images, and considers their difference as a discriminative cue for detecting presentation attacks.
Pittawat Taveekitworachai,Febri Abdullah,Ruck Thawonmas
http://arxiv.org/abs/2401.08273v1
Compressor summary: Null-shot prompting exploits hallucination in large language models to improve task performance by instructing the model to use information from an "Examples" section that does not exist in the context.
Zahra Tabatabaei,Adrián Colomer,Javier Oliver Moll,Valery Naranjo
http://arxiv.org/abs/2401.08272v1
Compressor summary: The authors propose two novel Content-Based Histopathological Image Retrieval (CBHIR) methods using a Siamese network for assisting pathologists in diagnosing breast and skin cancers, outperforming existing methods and addressing the challenges of Spitzoid Tumors of Uncertain Malignant Potential (STUMP).
Bruno Arcanjo,Bruno Ferrarini,Michael Milford,Klaus D. McDonald-Maier,Shoaib Ehsan
http://arxiv.org/abs/2401.08263v1
Compressor summary: MuSIC is a system that improves visual place recognition by selecting the best technique for each image based on how well it matches previous images.
Oluwatosin Alabi,Tom Vercauteren,Miaojing Shi
http://arxiv.org/abs/2401.08256v1
Compressor summary: The paper surveys multitask learning (MTL) systems for understanding videos of minimally invasive surgery (MIS), a beneficial but complex procedure for surgeons, and discusses their benefits, limitations, trends, and future directions for data-driven surgical vision algorithms.
Tom Roth,Inigo Jauregi Unanue,Alsharif Abuadbba,Massimo Piccardi
http://arxiv.org/abs/2401.08255v1
Compressor summary: The paper presents a method to fool multilingual classifiers using a paraphrase model fine-tuned with adversarial and quality objectives, trained with pre-trained models and vocabulary-mapping matrices for quality and consistency control, and achieves better query efficiency than existing baselines on two multilingual datasets and five languages.
Asuka Tamaru,Junya Hara,Hiroshi Higashi,Yuichi Tanaka,Antonio Ortega
http://arxiv.org/abs/2401.08245v1
Compressor summary: The paper proposes a graph signal processing-based method to optimize the number of neighbors in k-nearest neighbor graphs for various applications, including point cloud denoising.
Mulomba Mukendi Christian,Yun Seon Kim,Hyebong Choi,Jaeyoung Lee,SongHee You
http://arxiv.org/abs/2401.08233v1
Compressor summary: The study proposes a novel feature engineering approach that enhances the accuracy and resilience of deep learning models in predicting wind speed and power by altering data input shapes, achieving high performance across different forecasting horizons.
Chongzhi Zhang,Mingyuan Zhang,Zhiyang Teng,Jiayi Li,Xizhou Zhu,Lewei Lu,Ziwei Liu,Aixin Sun
http://arxiv.org/abs/2401.08232v1
Compressor summary: The paper proposes a new method for natural language video localization that generates a global 2D temporal map using a conditional denoising diffusion process, addressing sparsity and discontinuity issues with a multi-scale technique and an innovative diffusion decoder.
Fabien Geyer,Johannes Freitag,Tobias Schulz,Sascha Uhrig
http://arxiv.org/abs/2401.08225v1
Compressor summary: The paper examines challenges and proposes solutions for using machine learning and neural networks in flying taxis, focusing on number representations and arithmetic efficiency.
Hang Chen,Xinyu Yang,Keqing Du
http://arxiv.org/abs/2401.08221v1
Compressor summary: The authors propose a probabilistic framework to learn causal structures and representations from indeterminate data like dialogue and video with multiple structures and values, and release two datasets with causal annotations for this purpose.
Hanjia Lyu,Weihong Qi,Zhongyu Wei,Jiebo Luo
http://arxiv.org/abs/2401.08212v1
Compressor summary: The study explores how GPT-4V uses emojis in online communication and finds differences between human and AI behaviors, possibly due to cultural biases and limited language training.
Zhongbin Fang,Xia Li,Xiangtai Li,Shen Zhao,Mengyuan Liu
http://arxiv.org/abs/2401.08210v1
Compressor summary: The paper introduces ModelNet-O, a large synthetic dataset with self-occlusion for 3D point cloud classification, and proposes PointMLS, a robust method that uses critical point sampling.
Leheng Zhang,Yawei Li,Xingyu Zhou,Xiaorui Zhao,Shuhang Gu
http://arxiv.org/abs/2401.08209v1
Compressor summary: The paper introduces an ATD-SR method for single image super-resolution, which uses adaptive token dictionaries and category-based self-attention to improve performance.
Zhongtian Ma,Qiaosheng Zhang,Zhen Wang
http://arxiv.org/abs/2401.08197v1
Compressor summary: The paper studies how to complete a rating matrix using sub-sampled data and observed social graphs and hypergraphs, finding a sharp threshold for success and developing an efficient algorithm that exploits these structures.
Yuefeng Zhang,Kai Lin
http://arxiv.org/abs/2401.08194v1
Compressor summary: The text describes a new image compression model that uses frequency-oriented transform to separate images into distinct bands, enabling better compression and preservation of semantic fidelity.
Minpeng Liao,Wei Luo,Chengxi Li,Jing Wu,Kai Fan
http://arxiv.org/abs/2401.08190v1
Compressor summary: The paper introduces a new math dataset enriched with Python code interpretation and proposes a fine-tuning protocol to improve mathematical reasoning in large language models.
Weize Kong,Spurthi Amba Hombaiah,Mingyang Zhang,Qiaozhu Mei,Michael Bendersky
http://arxiv.org/abs/2401.08189v1
Compressor summary: The paper proposes PRewrite, an automated tool using reinforcement learning to optimize and generate effective prompts for LLM-based applications, improving on manual and previous methods.
Bingcai Wei
http://arxiv.org/abs/2401.08185v1
Compressor summary: The paper proposes a dual-branch attention fusion network for image rain removal that combines features from convolutional neural networks and Transformers using an attention module.
Seok-Hwan Oh,Guil Jung,Myeong-Gee Kim,Sang-Yun Kim,Young-Min Kim,Hyeon-Jik Lee,Hyuk-Sool Kwon,Hyeon-Min Bae
http://arxiv.org/abs/2401.08178v1
Compressor summary: The Key-point-guided Diffusion probabilistic Model (KDM) is a generative model that uses an object's key points to manipulate images, producing realistic and consistent results on various tasks such as face generation, human pose synthesis, and echocardiography video prediction.
Zhen Zhou,Junfeng Fan,Yunkai Ma,Sihan Zhao,Fengshui Jing,Min Tan
http://arxiv.org/abs/2401.08174v1
Compressor summary: CFNet is a coarse-to-fine framework for segmenting completely occluded and dense objects using box prompt-based segmentation foundation models, which improves performance by exploiting geometric properties and reducing dependency on bounding box detection.
Zida Chen,Ziran Zhang,Haoying Li,Menghao Li,Yueting Chen,Qi Li,Huajun Feng,Zhihai Xu,Shiqi Chen
http://arxiv.org/abs/2401.08171v1
Compressor summary: The paper proposes JARNet, a two-stage network to remove distortion and blur in LAP images using optical flow correction and jitter-aware techniques, and presents a data synthesis pipeline for realistic degradation simulation.
Wei Jiang,Yongqi Zhai,Hangyu Li,Ronggang Wang
http://arxiv.org/abs/2401.08154v1
Compressor summary: The paper, by Team TLIC, presents an image compression method that improves perceptual quality using adversarial and region-of-interest (ROI) losses.
Sanaz Hasanzadeh Fard
http://arxiv.org/abs/2401.08147v1
Compressor summary: This paper reviews lesser-explored applications of dynamic graph learning in various domains using machine learning.
Kiyohiro Nakayama,Mikaela Angelina Uy,Yang You,Ke Li,Leonidas Guibas
http://arxiv.org/abs/2401.08140v1
Compressor summary: The text introduces ProvNeRF, a model that improves NeRFs by incorporating per-point provenance for sparse and unconstrained views, enabling better understanding and reconstruction of 3D scenes.
Fu Feng,Jing Wang,Xin Geng
http://arxiv.org/abs/2401.08139v1
Compressor summary: The authors propose Genetic Transfer Learning (GTL), a framework inspired by evolution, to efficiently transfer essential knowledge from ancestor networks to descendant networks using learngenes, achieving improved performance on downstream tasks with fewer parameters.
Thanh Nguyen Canh,Xiem HoangVan
http://arxiv.org/abs/2401.08135v1
Compressor summary: The paper proposes a machine learning-based approach to detect blackhole attacks, which disrupt VANET communication and compromise its security and integrity, using a comprehensive dataset and various algorithms.
Xinni Jiang,Zengsheng Kuang,Chunle Guo,Ruixun Zhang,Lei Cai,Xiao Fan,Chongyi Li
http://arxiv.org/abs/2401.08123v1
Compressor summary: The D2A2 network restores depth details from RGB images using dynamic dual alignment and mask-to-pixel feature aggregation to handle modal and geometrical misalignments.
Gengyue Han,Xiaohan Liu,Xianyue Peng,Hao Wang,Yu Han
http://arxiv.org/abs/2401.08121v1
Compressor summary: CycLight is a novel cycle-level deep RL approach for adaptive traffic signal control that reduces computational burden, enhances practicality and safety, and works well with multi-agent cooperation and an attention mechanism.
Lequan Lin,Dai Shi,Andi Han,Junbin Gao
http://arxiv.org/abs/2401.08119v1
Compressor summary: SpecSTG is a novel spectral diffusion framework that leverages spatial information in traffic forecasting by generating Fourier representations of future time series and using fast spectral graph convolution.
Qiang Qu,Yiran Shen,Xiaoming Chen,Yuk Ying Chung,Tongliang Liu
http://arxiv.org/abs/2401.08117v1
Compressor summary: The text proposes a novel method called E2HQV that uses a theory-inspired model to generate high-quality video frames from event camera inputs, outperforming existing data-driven approaches.
Mohammad Khateri,Morteza Ghahremani,Alejandra Sierra,Jussi Tohka
http://arxiv.org/abs/2401.08115v1
Compressor summary: The authors propose EMSR, a deep learning network architecture that reconstructs clean high-resolution 3D electron microscopy (3D-EM) images of brain tissue from noisy low-resolution ones, investigating training without clean references and comparing different loss functions and training strategies to show the approach's feasibility.
Steven A. Grosz,Akash Godbole,Anil K. Jain
http://arxiv.org/abs/2401.08111v1
Compressor summary: The paper presents Palm-ID, a contactless palmprint recognition system that combines global and local features using vision transformer and convolutional neural network, achieving high accuracy and low latency.
Yixuan Li,Peilin Chen,Hanwei Zhu,Keyan Ding,Leida Li,Shiqi Wang
http://arxiv.org/abs/2401.08107v1
Compressor summary: The text proposes a new method for assessing image quality called Shape-Texture Adaptive Fusion, which uses both shape-biased and texture-biased deep features to form a well-rounded statistical description of images and predicts image quality based on the variant Mahalanobis Distance between inner and outer statistics.
Austin Briley,Fatemeh Afghah
http://arxiv.org/abs/2401.08105v1
Compressor summary: This paper develops a real-time image classification model for wildfire detection on UAVs using NVIDIA's TensorRT and other optimization techniques, improving speed and maintaining accuracy.
Anh-Cuong Pham,Van-Quang Nguyen,Thi-Hong Vuong,Quang-Thuy Ha
http://arxiv.org/abs/2401.08100v1
Compressor summary: KTVIC is a large Vietnamese image captioning dataset for daily life activities that can improve research and applications in this domain.
Hancheng Zuo,Bernard Tiddeman
http://arxiv.org/abs/2401.08099v1
Compressor summary: The study presents a new GAN method for filling missing areas in normal maps, which are important for performance capture, by adapting existing image inpainting techniques.
Mengwei Xu,Wangsong Yin,Dongqi Cai,Rongjie Yi,Daliang Xu,Qipeng Wang,Bingyang Wu,Yihao Zhao,Chen Yang,Shihe Wang,Qiyang Zhang,Zhenyan Lu,Li Zhang,Shangguang Wang,Yuanchun Li,Yunxin Liu,Xin Jin,Xuanzhe Liu
http://arxiv.org/abs/2401.08092v1
Compressor summary: This survey explores the growing need for resource-efficient strategies to support large foundation models across various aspects, from architecture to implementation, due to their significant impact on hardware resources and environmental sustainability.
Fu Li,Xueying Wang,Bin Li,Yunlong Wu,Yanzhen Wang,Xiaodong Yi
http://arxiv.org/abs/2401.08089v1
Compressor summary: The paper proposes a novel method to generate behavior trees for complex tasks using large language models and synthetic data, improving their performance and adaptability.
Yachao Li,Junhui Li,Jing Jiang,Min Zhang
http://arxiv.org/abs/2401.08088v1
Compressor summary: The paper proposes an approach to improve large language models' document-level translation performance by combining sentence-level and document-level instructions, addressing the issue of sentence-level coverage and enhancing translation quality.
Yukun Su,Yiwen Cao,Jingliang Deng,Fengyun Rao,Qingyao Wu
http://arxiv.org/abs/2401.08086v1
Compressor summary: The paper proposes S2CNet, a network for cropping user generated content that preserves both aesthetics and content integrity, using spatial-semantic collaboration and adaptive attention.
Xin Zhang,Yu Liu,Yuming Lin,Qingming Liao,Yong Li
http://arxiv.org/abs/2401.08083v1
Compressor summary: The text describes a new computer vision technique called UV-SAM that uses satellite images to accurately identify and monitor urban villages, which are informal residential areas with poor living conditions in or around cities.
Alireza Nezhadettehad,Arkady Zaslavsky,Rakib Abdur,Siraj Ahmed Shaikh,Seng W. Loke,Guang-Li Huang,Alireza Hassani
http://arxiv.org/abs/2401.08081v1
Compressor summary: The text summarizes the current state and future possibilities of predicting the next location of mobile objects using context-aware artificial intelligence and machine learning techniques, with applications in various domains such as traffic control and public health.
Huafeng Qin,Yiquan Wu,Mounim A. El-Yacoubi,Jun Wang,Guangxiang Yang
http://arxiv.org/abs/2401.08079v1
Compressor summary: Vein recognition is highly secure and private but lacks training data, so the paper proposes AMCL, a method that generates challenging samples to train a robust contrastive learning model for palm-vein recognition, outperforming existing methods and achieving state-of-the-art results on three databases.
Shubham Singh,Mayur Bhat
http://arxiv.org/abs/2401.08077v1
Compressor summary: The text describes a study that uses a transformer neural network to predict Ethereum prices based on other cryptocurrencies and their sentiment, outperforming some alternatives and suggesting a sentiment-driven illusion of causality in crypto markets.
Beibei Yang,Weiling Li,Yan Fang
http://arxiv.org/abs/2401.08068v1
Compressor summary: The paper proposes a new way to represent event streams from neuromorphic sensors using tensor decomposition and a special model to capture both spatial and temporal correlations for better performance in tasks like noise filtering.
Ching-Hao Chiu,Yu-Jen Chen,Yawen Wu,Yiyu Shi,Tsung-Yi Ho
http://arxiv.org/abs/2401.08066v1
Compressor summary: The paper proposes a method for improving fairness and accuracy in medical image diagnosis by enhancing features and regularizing feature entanglement without using sensitive attributes during training.
Lei Duan,Ziyang Jiang,David Carlson
http://arxiv.org/abs/2401.08061v1
Compressor summary: The authors propose using unlabeled satellite images with pseudo-labels from ordinary kriging to improve the performance of a CNN-RF model for climate data.
Cooper Gamble,Shahriar Faghani,Bradley J. Erickson
http://arxiv.org/abs/2401.08058v1
Compressor summary: The study applies conformal prediction to a deep learning model for intracranial hemorrhage detection, improving its ability to identify challenging cases and enhancing trustworthiness in radiology.
Haoran Zhu,Chang Xu,Wen Yang,Ruixiang Zhang,Yan Zhang,Gui-Song Xia
http://arxiv.org/abs/2401.08056v1
Compressor summary: The study proposes a method (DN-TOD) to improve tiny object detection in remote sensing imagery by correcting labels and handling bounding box errors caused by noisy annotations.
Zhixuan Liu,Peter Schaldenbrand,Beverley-Claire Okogwu,Wenxuan Peng,Youngsik Yun,Andrew Hundt,Jihie Kim,Jean Oh
http://arxiv.org/abs/2401.08053v1
Compressor summary: The authors propose a method to improve inclusive representation in generated images by collecting a culturally diverse dataset (CCUB) and using Self-Contrastive Fine-Tuning (SCoFT) to reduce biases and stereotypes.
Bingyuan Zhang,Xulong Zhang,Ning Cheng,Jun Yu,Jing Xiao,Jianzong Wang
http://arxiv.org/abs/2401.08049v1
Compressor summary: EmoTalker is a method for generating realistic and emotionally editable talking faces based on a diffusion model and a new dataset.
Somnath Basu Roy Chowdhury,Nicholas Monath,Avinava Dubey,Manzil Zaheer,Andrew McCallum,Amr Ahmed,Snigdha Chaturvedi
http://arxiv.org/abs/2401.08047v1
Compressor summary: CoverSumm is a fast and accurate algorithm for extractive opinion summarization in an incremental setting, using a cover tree to index review representations and maintaining a reservoir of candidate summary sentences.
Zhicheng Dou,Yuchen Guo,Ching-Chun Chang,Huy H. Nguyen,Isao Echizen
http://arxiv.org/abs/2401.08046v1
Compressor summary: The paper analyzes LLMs' impact on academic writing, highlights a lack of robustness in a GPT detector, and proposes Synthetic-Siamese, a reference-based method that improves detection performance.
Xu Yan,Haiming Zhang,Yingjie Cai,Jingming Guo,Weichao Qiu,Bin Gao,Kaiqiang Zhou,Yue Zhao,Huan Jin,Jiantao Gao,Zhen Li,Lihui Jiang,Wei Zhang,Hongbo Zhang,Dengxin Dai,Bingbing Liu
http://arxiv.org/abs/2401.08045v1
Compressor summary: The paper analyzes challenges and techniques for developing foundation models tailored for autonomous driving, and provides a roadmap and resources for future research.
Wenjun Qiu,David Lie,Lisa Austin
http://arxiv.org/abs/2401.08038v1
Compressor summary: Calpric is a low-cost method to generate labeled data for training accurate deep learning models on privacy policies using text selection, segmentation, active learning, and crowdsourced annotators.
Haibin Zhou,Jun Chang,Tao Lu,Huabing Zhou
http://arxiv.org/abs/2401.08036v1
Compressor summary: The study proposes a joint modeling approach for accurate 3D lane detection, using Bezier curves and interpolation methods, and introduces a novel 3D Spatial Constructor for front-view or surround-view lane detection in complex road conditions.
Chandrika Saha,Md. Mostafijur Rahman
http://arxiv.org/abs/2401.08035v1
Compressor summary: BanglaNet is a CNN ensemble model that achieves high recognition accuracies for Bangla handwritten characters, compound characters, numerals, and modifiers using augmented and non-augmented inputs.
Fengzhu Zeng,Wei Gao
http://arxiv.org/abs/2401.08026v1
Compressor summary: The text proposes a method for generating explanations (justifications) for fact-checking claims using a new dataset and a language model that incorporates retrieval information.
Syeda Nahida Akter,Aman Madaan,Sangwu Lee,Yiming Yang,Eric Nyberg
http://arxiv.org/abs/2401.08025v1
Compressor summary: Self-Imagine is a method that uses a Vision-Language Model to generate a visual representation of a question, answer it using both the question and the image, and improves the performance on math and reasoning tasks.
Ji Huang,Hui Wang
http://arxiv.org/abs/2401.08017v1
Compressor summary: The study proposes two improvements for the RT-DETR model in small object detection: fine-grained path augmentation and adaptive feature fusion, which enhance semantic and detailed information input to the Transformer and improve multi-scale feature integration.