arxiv compressed, 2024-01-17

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-17, generated by the compressor, my personal LLM-based project.


Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Jiahao Nie,Yun Xing,Gongjie Zhang,Pei Yan,Aoran Xiao,Yap-Peng Tan,Alex C. Kot,Shijian Lu

http://arxiv.org/abs/2401.08407v1

Compressor summary: The paper proposes a novel cross-domain fine-tuning strategy for few-shot segmentation, addressing domain transfer and overfitting issues using bi-directional and iterative methods.


RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture

Aman Gupta,Anup Shirgaonkar,Angels de Luis Balaguer,Bruno Silva,Daniel Holstein,Dawei Li,Jennifer Marsman,Leonardo O. Nunes,Mahsa Rouzbahman,Morris Sharp,Nick Mecklenburg,Rafael Padilha,Ranveer Chandra,Renato Luiz de Freitas Cunha,Roberto de M. Estevão Filho,Ryan Tsang,Sara Malvar,Swati Sharma,Todd Hendry,Vijay Aski,Vijetha Vijayendran,Vinamra Benara

http://arxiv.org/abs/2401.08406v1

Compressor summary: The paper compares fine-tuning and retrieval-augmented generation (RAG) approaches for integrating domain-specific data into large language models (LLMs), proposes a pipeline to evaluate both methods, and applies it to an agricultural dataset, showing improved accuracy and answer similarity.


Hidden Flaws Behind Expert-Level Accuracy of GPT-4 Vision in Medicine

Qiao Jin,Fangyuan Chen,Yiliang Zhou,Ziyang Xu,Justin M. Cheung,Robert Chen,Ronald M. Summers,Justin F. Rousseau,Peiyun Ni,Marc J Landsman,Sally L. Baxter,Subhi J. Al'Aref,Yijia Li,Michael F. Chiang,Yifan Peng,Zhiyong Lu

http://arxiv.org/abs/2401.08396v1

Compressor summary: GPT-4V can answer medical image quizzes better than humans but sometimes has flawed reasoning.


DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

Zongxin Yang,Guikun Chen,Xiaodi Li,Wenguan Wang,Yi Yang

http://arxiv.org/abs/2401.08392v1

Compressor summary: DoraemonGPT is a system that uses large language models to process videos, store task-related attributes, reason spatially and temporally, access external knowledge, and plan actions for various real-world applications.


Deep Learning-based Group Causal Inference in Multivariate Time-series

Wasim Ahmad,Maha Shadaydeh,Joachim Denzler

http://arxiv.org/abs/2401.08386v1

Compressor summary: The text describes a method to identify causal relationships among groups of variables in nonlinear systems using deep learning and interventions, which improves on existing methods and helps understand complex systems like climate and brain networks.


Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

Jinghan Yao,Quentin Anthony,Aamir Shafi,Hari Subramoni,Dhabaleswar K. Panda

http://arxiv.org/abs/2401.08383v1

Compressor summary: ExFlow optimizes MoE models for parallel inference by exploiting inter-layer expert affinity, reducing communication overhead and improving throughput on distributed systems.


Cross-lingual neural fuzzy matching for exploiting target-language monolingual corpora in computer-aided translation

Miquel Esplà-Gomis,Víctor M. Sánchez-Cartagena,Juan Antonio Pérez-Ortiz,Felipe Sánchez-Martínez

http://arxiv.org/abs/2401.08374v1

Compressor summary: The paper introduces a neural CAT tool that uses both TMs and in-domain monolingual corpora, and evaluates its effectiveness on four language pairs.


Morphology and Syntax of the Tamil Language

Kengatharaiyer Sarveswaran

http://arxiv.org/abs/2401.08367v1

Compressor summary: The paper describes the structure and features of modern Tamil, useful for linguists and computer scientists working with the language.


Weighted Spectral Filters for Kernel Interpolation on Spheres: Estimates of Prediction Accuracy for Noisy Data

Xiaotong Liu,Jinxin Wang,Di Wang,Shao-Bo Lin

http://arxiv.org/abs/2401.08364v1

Compressor summary: The paper proposes a weighted spectral filter method to improve kernel interpolation for noisy data in image sciences, using spherical positive quadrature rules and high-pass spectral filters.


Hallucination Detection and Hallucination Mitigation: An Investigation

Junliang Luo,Tianyu Li,Di Wu,Michael Jenkin,Steve Liu,Gregory Dudek

http://arxiv.org/abs/2401.08358v1

Compressor summary: The report reviews existing methods to detect and reduce factually incorrect responses generated by large language models like ChatGPT, Bard, and Llama.


SAMF: Small-Area-Aware Multi-focus Image Fusion for Object Detection

Xilai Li,Xiaosong Li,Haishu Tan,Jinyang Li

http://arxiv.org/abs/2401.08357v1

Compressor summary: The study presents a new image fusion algorithm that enhances small focus areas and improves object detection performance.


Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models

Jianhui Pang,Fanghua Ye,Longyue Wang,Dian Yu,Derek F. Wong,Shuming Shi,Zhaopeng Tu

http://arxiv.org/abs/2401.08350v1

Compressor summary: This study examines six core challenges in neural machine translation and finds that large language models improve some aspects but introduce new challenges related to efficiency, low-resource languages, and evaluation.


We don't need no labels: Estimating post-deployment model performance under covariate shift without ground truth

Jakub Białek,Wojtek Kuberski,Nikolaos Perrakis

http://arxiv.org/abs/2401.08348v1

Compressor summary: M-CBPE is a novel method for estimating ML model performance on unlabeled data, accounting for covariate shift and working for any performance metric and data type.
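
A minimal sketch of the confidence-based idea behind label-free performance estimation, assuming a binary classifier whose predicted probabilities are well calibrated. It is not the paper's M-CBPE procedure, and the function name `estimate_accuracy` is hypothetical.

```python
def estimate_accuracy(probs, threshold=0.5):
    """Estimate accuracy on unlabeled data from calibrated P(y=1) scores.

    If a score p is calibrated, a positive prediction is correct with
    probability p and a negative prediction with probability 1 - p.
    """
    if not probs:
        raise ValueError("need at least one prediction")
    expected_correct = 0.0
    for p in probs:
        predicted_positive = p >= threshold
        expected_correct += p if predicted_positive else 1.0 - p
    return expected_correct / len(probs)


# Scores for four unlabeled examples (illustrative values only).
scores = [0.9, 0.8, 0.3, 0.1]
print(estimate_accuracy(scores))
```

The estimate is only as good as the calibration of the scores on the new data, so miscalibrated scores give a misleading number.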


Multi-view Distillation based on Multi-modal Fusion for Few-shot Action Recognition (CLIP-$\mathrm{M^2}$DF)

Fei Guo,YiKang Wang,Han Qi,WenPing Jin,Li Zhu

http://arxiv.org/abs/2401.08345v1

Compressor summary: The paper proposes a method for few-shot action recognition using multi-modal fusion and multi-view distillation to improve robustness to class overlaps and outliers.


Generative Denoise Distillation: Simple Stochastic Noises Induce Efficient Knowledge Transfer for Dense Prediction

Zhaoge Liu,Xiaohao Xu,Yunkang Cao,Weiming Shen

http://arxiv.org/abs/2401.08332v1

Compressor summary: Generative Denoise Distillation (GDD) is a novel method that improves knowledge transfer from complex models to simpler ones by adding noise and aligning features, achieving state-of-the-art results in computer vision tasks.


Boosting Gradient Ascent for Continuous DR-submodular Maximization

Qixin Zhang,Zongqi Wan,Zengde Deng,Zaiyi Chen,Xiaoming Sun,Jialin Zhang,Yu Yang

http://arxiv.org/abs/2401.08330v1

Compressor summary: The paper proposes a boosting technique for continuous DR-submodular maximization problems that improves the approximation guarantee of standard Projected Gradient Ascent to optimal levels with minimal modifications.
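
For orientation, here is a minimal sketch of standard Projected Gradient Ascent, the baseline the summary refers to; the paper's boosting modification is not shown. The objective is an assumed toy example: a quadratic with entrywise non-negative `A`, which is DR-submodular because every second partial derivative is non-positive. This particular instance is also concave, which DR-submodular functions need not be; it was chosen so the answer can be checked by hand.

```python
def projected_gradient_ascent(A, b, steps=500, lr=0.1):
    """Maximize f(x) = b.x - 0.5 * x^T A x over the box [0, 1]^n."""
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        # Gradient of f at x is b - A x (A is assumed symmetric).
        grad = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Ascent step followed by projection onto the box (clipping).
        x = [min(1.0, max(0.0, x[i] + lr * grad[i])) for i in range(n)]
    return x


A = [[1.0, 0.5], [0.5, 1.0]]
print(projected_gradient_ascent(A, [1.0, 1.0]))  # interior maximizer near (2/3, 2/3)
print(projected_gradient_ascent(A, [3.0, 3.0]))  # maximizer clipped to the corner (1, 1)
```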


Un-Mixing Test-Time Normalization Statistics: Combatting Label Temporal Correlation

Devavrat Tomar,Guillaume Vray,Jean-Philippe Thiran,Behzad Bozorgtabar

http://arxiv.org/abs/2401.08328v1

Compressor summary: The paper introduces UnMix-TNS, a novel method that simulates the i.i.d. environment for test-time adaptation by mixing instance-wise statistics with multiple unmixed components, improving stability and performance in various scenarios.


RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning

Junjie Ye,Yilong Wu,Songyang Gao,Sixian Li,Guanyu Li,Xiaoran Fan,Qi Zhang,Tao Gui,Xuanjing Huang

http://arxiv.org/abs/2401.08326v1

Compressor summary: RoTBench is a benchmark for testing the robustness of large language models in tool learning across different noise levels, revealing their weaknesses and proposing RoTTuning to improve them.


OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion

Yizhuo Wu,Gagan Deep Singh,Mohammadreza Beikmirza,Leo de Vreede,Morteza Alavi,Chang Gao

http://arxiv.org/abs/2401.08318v1

Compressor summary: This paper introduces OpenDPD, an open-source framework for fast digital pre-distortion exploration and comparison using a novel Dense Gated Recurrent Unit (DGRU)-DPD model that outperforms previous models on a digital power amplifier.


Application of LLM Agents in Recruitment: A Novel Framework for Resume Screening

Chengguang Gan,Qinghao Zhang,Tatsunori Mori

http://arxiv.org/abs/2401.08315v1

Compressor summary: The paper introduces a novel agent framework using large language models to efficiently summarize, grade, and select candidates from resumes, achieving significant speed and accuracy improvements over traditional methods.


Anchor function: a type of benchmark functions for studying language models

Zhongwang Zhang,Zhiwei Wang,Junjie Yao,Zhangchen Zhou,Xiaolong Li,Weinan E,Zhi-Qin John Xu

http://arxiv.org/abs/2401.08309v1

Compressor summary: The text proposes the concept of an anchor function as a benchmark for studying transformer-based language models, enabling academic research with limited resources to explore their behavior and operations in various tasks.


DAPT: A Dual Attention Framework for Parameter-Efficient Continual Learning of Large Language Models

Weixiang Zhao,Shilong Wang,Yulin Hu,Yanyan Zhao,Bing Qin,Xuanyu Zhang,Qing Yang,Dongliang Xu,Wanxiang Che

http://arxiv.org/abs/2401.08295v1

Compressor summary: The paper proposes a new method called Dual Attention Framework (DAPT) that improves large language models' ability to learn continually by aligning learning and selection modules, enhancing their performance on dynamic tasks.


Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models

Shuming Shi,Enbo Zhao,Deng Cai,Leyang Cui,Xinting Huang,Huayang Li

http://arxiv.org/abs/2401.08294v1

Compressor summary: Inferflow is a flexible and efficient inference engine for large language models that supports 3.5-bit quantization and hybrid model partitioning for multi-GPU inference.


The Faiss library

Matthijs Douze,Alexandr Guzhva,Chengqi Deng,Jeff Johnson,Gergely Szilvasy,Pierre-Emmanuel Mazaré,Maria Lomeli,Lucas Hosseini,Hervé Jégou

http://arxiv.org/abs/2401.08281v1

Compressor summary: Faiss is a vector similarity search toolkit with various indexing methods and applications for large collections of embeddings.
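
To make "vector similarity search" concrete, here is a minimal sketch of exact brute-force nearest-neighbor search in plain Python. It does not use the Faiss API; the function names and the toy data are assumptions for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(database, query, k):
    """Return the k nearest (index, distance) pairs, closest first."""
    scored = [(i, l2_distance(vec, query)) for i, vec in enumerate(database)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Four toy 2-D embeddings and one query.
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(search(database, [0.9, 0.1], k=2))
```

This scan touches every stored vector, so its cost grows linearly with the collection size. Indexing methods exist to avoid that full scan on large collections, usually at some cost in accuracy or memory.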


AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception

Yipo Huang,Quan Yuan,Xiangfei Sheng,Zhichao Yang,Haoning Wu,Pengfei Chen,Yuzhe Yang,Leida Li,Weisi Lin

http://arxiv.org/abs/2401.08276v1

Compressor summary: AesBench is an expert benchmark for evaluating multimodal large language models' image aesthetics perception abilities using a database with diverse images and high-quality annotations.


Modeling Spoof Noise by De-spoofing Diffusion and its Application in Face Anti-spoofing

Bin Zhang,Xiangyu Zhu,Xiaoyu Zhang,Zhen Lei

http://arxiv.org/abs/2401.08275v1

Compressor summary: The paper proposes a new face anti-spoofing method that uses diffusion models to denoise spoof images and genuine images, and considers their difference as a discriminative cue for detecting presentation attacks.


Large Language Models are Null-Shot Learners

Pittawat Taveekitworachai,Febri Abdullah,Ruck Thawonmas

http://arxiv.org/abs/2401.08273v1

Compressor summary: Null-shot prompting exploits hallucination in large language models to improve task performance by instructing the model to use information from an "Examples" section that does not exist in the context.


Siamese Content-based Search Engine for a More Transparent Skin and Breast Cancer Diagnosis through Histological Imaging

Zahra Tabatabaei,Adrián Colomer,Javier Oliver Moll,Valery Naranjo

http://arxiv.org/abs/2401.08272v1

Compressor summary: The authors propose two novel Content-Based Histopathological Image Retrieval (CBHIR) methods using a Siamese network for assisting pathologists in diagnosing breast and skin cancers, outperforming existing methods and addressing the challenges of Spitzoid Tumors of Uncertain Malignant Potential (STUMP).


Multi-Technique Sequential Information Consistency For Dynamic Visual Place Recognition In Changing Environments

Bruno Arcanjo,Bruno Ferrarini,Michael Milford,Klaus D. McDonald-Maier,Shoaib Ehsan

http://arxiv.org/abs/2401.08263v1

Compressor summary: MuSIC is a system that improves visual place recognition by selecting the best technique for each image based on how well it matches previous images.


Multitask Learning in Minimally Invasive Surgical Vision: A Review

Oluwatosin Alabi,Tom Vercauteren,Miaojing Shi

http://arxiv.org/abs/2401.08256v1

Compressor summary: Key points:
- MIS is beneficial but complex for surgeons
- Data-driven surgical vision algorithms can help improve MIS systems
- Multitask learning (MTL) is a promising approach for understanding MIS videos
- The paper reviews the current MTL systems and discusses their benefits, limitations, trends, and directions

Summary: The paper surveys multitask learning systems that use videos from minimally invasive surgery to improve surgical vision algorithms and overcome the complexity of this procedure.


A Generative Adversarial Attack for Multilingual Text Classifiers

Tom Roth,Inigo Jauregi Unanue,Alsharif Abuadbba,Massimo Piccardi

http://arxiv.org/abs/2401.08255v1

Compressor summary: Key points:
- The paper proposes an approach to generate adversarial examples against multilingual classifiers using a fine-tuned paraphrase model.
- The model is trained with pre-trained models and vocabulary-mapping matrices for quality and consistency control.
- The approach outperforms existing baselines in query efficiency on two multilingual datasets and five languages.

Summary: The paper presents a method to fool multilingual classifiers using a paraphrase model fine-tuned with adversarial and quality objectives, achieving better query efficiency than prior methods.


Optimizing $k$ in $k$NN Graphs with Graph Learning Perspective

Asuka Tamaru,Junya Hara,Hiroshi Higashi,Yuichi Tanaka,Antonio Ortega

http://arxiv.org/abs/2401.08245v1

Compressor summary: The paper proposes a graph signal processing-based method to optimize the number of neighbors in k-nearest neighbor graphs for various applications, including point cloud denoising.


Enhancing Wind Speed and Wind Power Forecasting Using Shape-Wise Feature Engineering: A Novel Approach for Improved Accuracy and Robustness

Mulomba Mukendi Christian,Yun Seon Kim,Hyebong Choi,Jaeyoung Lee,SongHee You

http://arxiv.org/abs/2401.08233v1

Compressor summary: The study proposes a novel feature engineering approach that enhances the accuracy and resilience of deep learning models in predicting wind speed and power by altering data input shapes, achieving high performance across different forecasting horizons.


Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization

Chongzhi Zhang,Mingyuan Zhang,Zhiyang Teng,Jiayi Li,Xizhou Zhu,Lewei Lu,Ziwei Liu,Aixin Sun

http://arxiv.org/abs/2401.08232v1

Compressor summary: The paper proposes a new method for natural language video localization that generates a global 2D temporal map using a conditional denoising diffusion process, addressing sparsity and discontinuity issues with a multi-scale technique and an innovative diffusion decoder.


Efficient and Mathematically Robust Operations for Certified Neural Networks Inference

Fabien Geyer,Johannes Freitag,Tobias Schulz,Sascha Uhrig

http://arxiv.org/abs/2401.08225v1

Compressor summary: The paper examines challenges and proposes solutions for using machine learning and neural networks in flying taxis, focusing on number representations and arithmetic efficiency.


Towards Causal Relationship in Indefinite Data: Baseline Model and New Datasets

Hang Chen,Xinyu Yang,Keqing Du

http://arxiv.org/abs/2401.08221v1

Compressor summary: The authors propose a probabilistic framework to learn causal structures and representations from indeterminate data like dialogue and video with multiple structures and values, and release two datasets with causal annotations for this purpose.


Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication

Hanjia Lyu,Weihong Qi,Zhongyu Wei,Jiebo Luo

http://arxiv.org/abs/2401.08212v1

Compressor summary: The study explores how GPT-4V uses emojis in online communication and finds differences between human and AI behaviors, possibly due to cultural biases and limited language training.


ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point Cloud Classification

Zhongbin Fang,Xia Li,Xiangtai Li,Shen Zhao,Mengyuan Liu

http://arxiv.org/abs/2401.08210v1

Compressor summary: The paper introduces ModelNet-O, a large synthetic dataset with self-occlusion for 3D point cloud classification, and proposes PointMLS, a robust method that uses critical point sampling.


Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Leheng Zhang,Yawei Li,Xingyu Zhou,Xiaorui Zhao,Shuhang Gu

http://arxiv.org/abs/2401.08209v1

Compressor summary: The paper introduces an ATD-SR method for single image super-resolution, which uses adaptive token dictionaries and category-based self-attention to improve performance.


Matrix Completion with Hypergraphs: Sharp Thresholds and Efficient Algorithms

Zhongtian Ma,Qiaosheng Zhang,Zhen Wang

http://arxiv.org/abs/2401.08197v1

Compressor summary: The paper studies how to complete a rating matrix using sub-sampled data and observed social graphs and hypergraphs, finding a sharp threshold for success and developing an efficient algorithm that exploits these structures.


End-to-End Optimized Image Compression with the Frequency-Oriented Transform

Yuefeng Zhang,Kai Lin

http://arxiv.org/abs/2401.08194v1

Compressor summary: The text describes a new image compression model that uses a frequency-oriented transform to separate images into distinct bands, enabling better compression and preservation of semantic fidelity.


MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline

Minpeng Liao,Wei Luo,Chengxi Li,Jing Wu,Kai Fan

http://arxiv.org/abs/2401.08190v1

Compressor summary: The paper introduces a new math dataset enriched with Python code interpretation and proposes a fine-tuning protocol to improve mathematical reasoning in large language models.


PRewrite: Prompt Rewriting with Reinforcement Learning

Weize Kong,Spurthi Amba Hombaiah,Mingyang Zhang,Qiaozhu Mei,Michael Bendersky

http://arxiv.org/abs/2401.08189v1

Compressor summary: The paper proposes PRewrite, an automated tool using reinforcement learning to optimize and generate effective prompts for LLM-based applications, improving on manual and previous methods.


DPAFNet: Dual Path Attention Fusion Network for Single Image Deraining

Bingcai Wei

http://arxiv.org/abs/2401.08185v1

Compressor summary: The paper proposes a dual-branch attention fusion network for image rain removal that combines features from convolutional neural networks and Transformers using an attention module.


Key-point Guided Deformable Image Manipulation Using Diffusion Model

Seok-Hwan Oh,Guil Jung,Myeong-Gee Kim,Sang-Yun Kim,Young-Min Kim,Hyeon-Jik Lee,Hyuk-Sool Kwon,Hyeon-Min Bae

http://arxiv.org/abs/2401.08178v1

Compressor summary: The Key-point-guided Diffusion probabilistic Model (KDM) is a generative model that uses an object's key-points to manipulate images, producing realistic and consistent results on various tasks such as face generation, human pose synthesis, and echocardiography video prediction.


Completely Occluded and Dense Object Instance Segmentation Using Box Prompt-Based Segmentation Foundation Models

Zhen Zhou,Junfeng Fan,Yunkai Ma,Sihan Zhao,Fengshui Jing,Min Tan

http://arxiv.org/abs/2401.08174v1

Compressor summary: CFNet is a coarse-to-fine framework for segmenting completely occluded and dense objects using box prompt-based segmentation foundation models, which improves performance by exploiting geometric properties and reducing dependency on bounding box detection.


Deep Linear Array Pushbroom Image Restoration: A Degradation Pipeline and Jitter-Aware Restoration Network

Zida Chen,Ziran Zhang,Haoying Li,Menghao Li,Yueting Chen,Qi Li,Huajun Feng,Zhihai Xu,Shiqi Chen

http://arxiv.org/abs/2401.08171v1

Compressor summary: The paper proposes JARNet, a two-stage network to remove distortion and blur in LAP images using optical flow correction and jitter-aware techniques, and presents a data synthesis pipeline for realistic degradation simulation.


Learned Image Compression with ROI-Weighted Distortion and Bit Allocation

Wei Jiang,Yongqi Zhai,Hangyu Li,Ronggang Wang

http://arxiv.org/abs/2401.08154v1

Compressor summary: The paper presents Team TLIC's image compression method, which improves perceptual quality using adversarial and ROI losses.


Machine Learning on Dynamic Graphs: A Survey on Applications

Sanaz Hasanzadeh Fard

http://arxiv.org/abs/2401.08147v1

Compressor summary: This paper reviews lesser-explored applications of dynamic graph learning in various domains using machine learning.


ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Process

Kiyohiro Nakayama,Mikaela Angelina Uy,Yang You,Ke Li,Leonidas Guibas

http://arxiv.org/abs/2401.08140v1

Compressor summary: The text introduces ProvNeRF, a model that improves NeRFs by incorporating per-point provenance for sparse and unconstrained views, enabling better understanding and reconstruction of 3D scenes.


Transferring Core Knowledge via Learngenes

Fu Feng,Jing Wang,Xin Geng

http://arxiv.org/abs/2401.08139v1

Compressor summary: The authors propose Genetic Transfer Learning (GTL), a framework inspired by evolution, to efficiently transfer essential knowledge from ancestor networks to descendant networks using learngenes, achieving improved performance on downstream tasks with fewer parameters.


Machine Learning-Based Malicious Vehicle Detection for Security Threats and Attacks in Vehicle Ad-hoc Network (VANET) Communications

Thanh Nguyen Canh,Xiem HoangVan

http://arxiv.org/abs/2401.08135v1

Compressor summary: The paper proposes a machine learning-based approach to detect blackhole attacks, which disrupt VANET communication and compromise its security and integrity, using a comprehensive dataset and various algorithms.


The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation

Xinni Jiang,Zengsheng Kuang,Chunle Guo,Ruixun Zhang,Lei Cai,Xiao Fan,Chongyi Li

http://arxiv.org/abs/2401.08123v1

Compressor summary: The D2A2 network restores details in low-resolution depth maps under RGB guidance, using dynamic dual alignment and mask-to-pixel feature aggregation to handle modal and geometrical misalignments.


CycLight: learning traffic signal cooperation with a cycle-level strategy

Gengyue Han,Xiaohan Liu,Xianyue Peng,Hao Wang,Yu Han

http://arxiv.org/abs/2401.08121v1

Compressor summary: CycLight is a novel cycle-level deep RL approach for adaptive traffic signal control that reduces computational burden, enhances practicality and safety, and works well with multi-agent cooperation and an attention mechanism.


SpecSTG: A Fast Spectral Diffusion Framework for Probabilistic Spatio-Temporal Traffic Forecasting

Lequan Lin,Dai Shi,Andi Han,Junbin Gao

http://arxiv.org/abs/2401.08119v1

Compressor summary: SpecSTG is a novel spectral diffusion framework that leverages spatial information in traffic forecasting by generating Fourier representations of future time series and using fast spectral graph convolution.


E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning

Qiang Qu,Yiran Shen,Xiaoming Chen,Yuk Ying Chung,Tongliang Liu

http://arxiv.org/abs/2401.08117v1

Compressor summary: The text proposes a novel method called E2HQV that uses a theory-inspired model to generate high-quality video frames from event camera inputs, outperforming existing data-driven approaches.


No-Clean-Reference Image Super-Resolution: Application to Electron Microscopy

Mohammad Khateri,Morteza Ghahremani,Alejandra Sierra,Jussi Tohka

http://arxiv.org/abs/2401.08115v1

Compressor summary: Key points:
- Deep learning approach to reconstruct clean high-resolution 3D-EM images from noisy low-resolution ones
- Investigate training with no-clean references and different loss functions
- Introduce a novel network architecture, EMSR, for enhancing resolution and reducing noise
- Compare different training strategies and show the feasibility of the approach

Summary: The authors propose a deep learning method, EMSR, to improve the quality of low-resolution electron microscopy images of brain tissue by using different loss functions and training strategies.


Mobile Contactless Palmprint Recognition: Use of Multiscale, Multimodel Embeddings

Steven A. Grosz,Akash Godbole,Anil K. Jain

http://arxiv.org/abs/2401.08111v1

Compressor summary: The paper presents Palm-ID, a contactless palmprint recognition system that combines global and local features using vision transformer and convolutional neural network, achieving high accuracy and low latency.


Deep Shape-Texture Statistics for Completely Blind Image Quality Evaluation

Yixuan Li,Peilin Chen,Hanwei Zhu,Keyan Ding,Leida Li,Shiqi Wang

http://arxiv.org/abs/2401.08107v1

Compressor summary: The text proposes a new method for assessing image quality called Shape-Texture Adaptive Fusion, which uses both shape-biased and texture-biased deep features to form a well-rounded statistical description of images and predicts image quality based on a variant of the Mahalanobis distance between inner and outer statistics.
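
As background for the distance mentioned above, here is a minimal sketch of the standard Mahalanobis distance in two dimensions. The paper's variant is not reproduced here, real deep-feature statistics would be much higher-dimensional, and the function name `mahalanobis_2d` is hypothetical.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from a distribution.

    cov is a 2x2 covariance matrix given as [[a, b], [c, d]].
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    # Quadratic form dx^T * inv * dx.
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


# With variance 4 along the first axis, a point 2 units away counts as 1.
print(mahalanobis_2d([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))
```

With an identity covariance the result reduces to the ordinary Euclidean distance.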


Hardware Acceleration for Real-Time Wildfire Detection Onboard Drone Networks

Austin Briley,Fatemeh Afghah

http://arxiv.org/abs/2401.08105v1

Compressor summary: This paper develops a real-time image classification model for wildfire detection on UAVs using NVIDIA's TensorRT and other optimization techniques, improving speed and maintaining accuracy.


KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain

Anh-Cuong Pham,Van-Quang Nguyen,Thi-Hong Vuong,Quang-Thuy Ha

http://arxiv.org/abs/2401.08100v1

Compressor summary: KTVIC is a large Vietnamese image captioning dataset for daily life activities that can improve research and applications in this domain.


Inpainting Normal Maps for Lightstage data

Hancheng Zuo,Bernard Tiddeman

http://arxiv.org/abs/2401.08099v1

Compressor summary: The study presents a new GAN method for filling missing areas in normal maps, which are important for performance capture, by adapting existing image inpainting techniques.


A Survey of Resource-efficient LLM and Multimodal Foundation Models

Mengwei Xu,Wangsong Yin,Dongqi Cai,Rongjie Yi,Daliang Xu,Qipeng Wang,Bingyang Wu,Yihao Zhao,Chen Yang,Shihe Wang,Qiyang Zhang,Zhenyan Lu,Li Zhang,Shangguang Wang,Yuanchun Li,Yunxin Liu,Xin Jin,Xuanzhe Liu

http://arxiv.org/abs/2401.08092v1

Compressor summary: This survey explores the growing need for resource-efficient strategies to support large foundation models across various aspects, from architecture to implementation, due to their significant impact on hardware resources and environmental sustainability.


A Study on Training and Developing Large Language Models for Behavior Tree Generation

Fu Li,Xueying Wang,Bin Li,Yunlong Wu,Yanzhen Wang,Xiaodong Yi

http://arxiv.org/abs/2401.08089v1

Compressor summary: The paper proposes a novel method to generate behavior trees for complex tasks using large language models and synthetic data, improving their performance and adaptability.


Enhancing Document-level Translation of Large Language Model via Translation Mixed-instructions

Yachao Li,Junhui Li,Jing Jiang,Min Zhang

http://arxiv.org/abs/2401.08088v1

Compressor summary: The paper proposes an approach to improve large language models' document-level translation performance by combining sentence-level and document-level instructions, addressing the issue of sentence-level coverage and enhancing translation quality.


Spatial-Semantic Collaborative Cropping for User Generated Content

Yukun Su,Yiwen Cao,Jingliang Deng,Fengyun Rao,Qingyao Wu

http://arxiv.org/abs/2401.08086v1

Compressor summary: The paper proposes S2CNet, a network for cropping user generated content that preserves both aesthetics and content integrity, using spatial-semantic collaboration and adaptive attention.


UV-SAM: Adapting Segment Anything Model for Urban Village Identification

Xin Zhang,Yu Liu,Yuming Lin,Qingming Liao,Yong Li

http://arxiv.org/abs/2401.08083v1

Compressor summary: The text describes a new computer vision technique called UV-SAM that uses satellite images to accurately identify and monitor urban villages, which are informal residential areas with poor living conditions in or around cities.


Predicting Next Useful Location With Context-Awareness: The State-Of-The-Art

Alireza Nezhadettehad,Arkady Zaslavsky,Rakib Abdur,Siraj Ahmed Shaikh,Seng W. Loke,Guang-Li Huang,Alireza Hassani

http://arxiv.org/abs/2401.08081v1

Compressor summary: The text summarizes the current state and future possibilities of predicting the next location of mobile objects using context-aware artificial intelligence and machine learning techniques, with applications in various domains such as traffic control and public health.


Adversarial Masking Contrastive Learning for vein recognition

Huafeng Qin,Yiquan Wu,Mounim A. El-Yacoubi,Jun Wang,Guangxiang Yang

http://arxiv.org/abs/2401.08079v1

Compressor summary: Key points:
- Vein recognition is highly secure and private, but lacks training data
- AMCL generates challenging samples to train a robust contrastive learning model for palm-vein recognition
- AMCL outperforms existing methods and achieves state-of-the-art results on three databases

Summary: The paper proposes AMCL, a method that generates challenging samples to improve vein recognition by contrastive learning, and shows its superior performance on three databases.


Transformer-based approach for Ethereum Price Prediction Using Crosscurrency correlation and Sentiment Analysis

Shubham Singh,Mayur Bhat

http://arxiv.org/abs/2401.08077v1

Compressor summary: The text describes a study that uses a transformer neural network to predict Ethereum prices based on other cryptocurrencies and their sentiment, outperforming some alternatives and suggesting a sentiment-driven illusion of causality in crypto markets.


Representation Learning on Event Stream via an Elastic Net-incorporated Tensor Network

Beibei Yang,Weiling Li,Yan Fang

http://arxiv.org/abs/2401.08068v1

Compressor summary: The paper proposes a new way to represent event streams from neuromorphic sensors using tensor decomposition and a special model to capture both spatial and temporal correlations for better performance in tasks like noise filtering.


Achieve Fairness without Demographics for Dermatological Disease Diagnosis

Ching-Hao Chiu,Yu-Jen Chen,Yawen Wu,Yiyu Shi,Tsung-Yi Ho

http://arxiv.org/abs/2401.08066v1

Compressor summary: The paper proposes a method for improving fairness and accuracy in medical image diagnosis by enhancing features and regularizing feature entanglement without using sensitive attributes during training.


Augmenting Ground-Level PM2.5 Prediction via Kriging-Based Pseudo-Label Generation

Lei Duan,Ziyang Jiang,David Carlson

http://arxiv.org/abs/2401.08061v1

Compressor summary: The authors propose using unlabeled satellite images with pseudo-labels generated by ordinary kriging to improve a CNN-RF model's ground-level PM2.5 predictions.


Toward Clinically Trustworthy Deep Learning: Applying Conformal Prediction to Intracranial Hemorrhage Detection

Cooper Gamble,Shahriar Faghani,Bradley J. Erickson

http://arxiv.org/abs/2401.08058v1

Compressor summary: The study applies conformal prediction to a deep learning model for intracranial hemorrhage detection, improving its ability to identify challenging cases and enhancing trustworthiness in radiology.
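
A minimal sketch of split conformal prediction for classification, the general technique named above. It assumes a held-out calibration set and uses one common nonconformity score (one minus the probability of the true class); the paper's exact setup, data, and function names are not reproduced here.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute the score threshold from a held-out calibration set.

    cal_probs[i][y] is the model's probability for class y on example i.
    The nonconformity score is 1 - probability of the true class.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: include every class
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return every class whose score 1 - p falls within the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= threshold]


# Illustrative two-class calibration data (values are made up).
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
cal_labels = [0, 1, 0, 1]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(prediction_set([0.95, 0.05], threshold))
```

A prediction set that is empty or contains more than one class signals an uncertain case, which is one way such a method can flag examples for closer review.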


Robust Tiny Object Detection in Aerial Images amidst Label Noise

Haoran Zhu,Chang Xu,Wen Yang,Ruixiang Zhang,Yan Zhang,Gui-Song Xia

http://arxiv.org/abs/2401.08056v1

Compressor summary: The study proposes a method (DN-TOD) to improve tiny object detection in remote sensing imagery by correcting labels and handling bounding box errors caused by noisy annotations.


SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

Zhixuan Liu,Peter Schaldenbrand,Beverley-Claire Okogwu,Wenxuan Peng,Youngsik Yun,Andrew Hundt,Jihie Kim,Jean Oh

http://arxiv.org/abs/2401.08053v1

Compressor summary: The authors propose a method to improve inclusive representation in generated images by collecting a culturally diverse dataset (CCUB) and using Self-Contrastive Fine-Tuning (SCoFT) to reduce biases and stereotypes.


EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model

Bingyuan Zhang,Xulong Zhang,Ning Cheng,Jun Yu,Jing Xiao,Jianzong Wang

http://arxiv.org/abs/2401.08049v1

Compressor summary: EmoTalker is a method for generating realistic and emotionally editable talking faces based on a diffusion model and a new dataset.


Incremental Extractive Opinion Summarization Using Cover Trees

Somnath Basu Roy Chowdhury,Nicholas Monath,Avinava Dubey,Manzil Zaheer,Andrew McCallum,Amr Ahmed,Snigdha Chaturvedi

http://arxiv.org/abs/2401.08047v1

Compressor summary: CoverSumm is a fast and accurate algorithm for extractive opinion summarization in an incremental setting, using a cover tree to index review representations and maintaining a reservoir of candidate summary sentences.


Enhancing Robustness of LLM-Synthetic Text Detectors for Academic Writing: A Comprehensive Analysis

Zhicheng Dou,Yuchen Guo,Ching-Chun Chang,Huy H. Nguyen,Isao Echizen

http://arxiv.org/abs/2401.08046v1

Compressor summary: The paper analyzes LLMs' impact on academic writing, highlights a lack of robustness in a GPT detector, and proposes Synthetic-Siamese, a reference-based method that improves detection performance.


Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities

Xu Yan,Haiming Zhang,Yingjie Cai,Jingming Guo,Weichao Qiu,Bin Gao,Kaiqiang Zhou,Yue Zhao,Huan Jin,Jiantao Gao,Zhen Li,Lihui Jiang,Wei Zhang,Hongbo Zhang,Dengxin Dai,Bingbing Liu

http://arxiv.org/abs/2401.08045v1

Compressor summary: The paper analyzes challenges and techniques for developing foundation models tailored for autonomous driving, and provides a roadmap and resources for future research.


Calpric: Inclusive and Fine-grain Labeling of Privacy Policies with Crowdsourcing and Active Learning

Wenjun Qiu,David Lie,Lisa Austin

http://arxiv.org/abs/2401.08038v1

Compressor summary: Calpric is a low-cost method to generate labeled data for training accurate deep learning models on privacy policies using text selection, segmentation, active learning, and crowdsourced annotators.


3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching

Haibin Zhou,Jun Chang,Tao Lu,Huabing Zhou

http://arxiv.org/abs/2401.08036v1

Compressor summary: The study proposes a joint modeling approach for accurate 3D lane detection, using Bezier curves and interpolation methods, and introduces a novel 3D Spatial Constructor for front-view or surround-view lane detection in complex road conditions.
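
As background on the curve representation mentioned above, a Bezier curve is a weighted sum of control points with Bernstein polynomial weights. The sketch below evaluates such a curve in 3D; the function name and the control points are hypothetical, and how the paper actually fits or uses its curves is not shown here.

```python
from math import comb

def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1]."""
    n = len(control_points) - 1          # curve degree
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, cp in enumerate(control_points):
        # Bernstein basis weight for control point i
        w = comb(n, i) * (1 - t) ** (n - i) * t ** i
        for d in range(dim):
            point[d] += w * cp[d]
    return tuple(point)

# Made-up cubic lane curve: (x forward, y lateral, z height) in meters.
lane = [(0.0, 0.0, 0.0), (10.0, 1.0, 0.0), (20.0, 3.0, 0.5), (30.0, 6.0, 1.0)]
samples = [bezier_point(lane, k / 4) for k in range(5)]
```

Four control points describe the whole lane segment, so a detector can regress a handful of 3D points instead of a dense set of lane pixels.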


BanglaNet: Bangla Handwritten Character Recognition using Ensembling of Convolutional Neural Network

Chandrika Saha,Md. Mostafijur Rahman

http://arxiv.org/abs/2401.08035v1

Compressor summary: BanglaNet is a CNN ensemble model that achieves high recognition accuracies for Bangla handwritten characters, compound characters, numerals, and modifiers using augmented and non-augmented inputs.


JustiLM: Few-shot Justification Generation for Explainable Fact-Checking of Real-world Claims

Fengzhu Zeng,Wei Gao

http://arxiv.org/abs/2401.08026v1

Compressor summary: The paper proposes a method for generating justifications (explanations) for fact-checking claims, using a new dataset and a language model that incorporates retrieved information.


Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination

Syeda Nahida Akter,Aman Madaan,Sangwu Lee,Yiming Yang,Eric Nyberg

http://arxiv.org/abs/2401.08025v1

Compressor summary: Self-Imagine is a method that uses a Vision-Language Model to generate a visual representation of a question and then answer it using both the question and the image, improving performance on math and reasoning tasks.


Small Object Detection by DETR via Information Augmentation and Adaptive Feature Fusion

Ji Huang,Hui Wang

http://arxiv.org/abs/2401.08017v1

Compressor summary: The study proposes two improvements for the RT-DETR model in small object detection: fine-grained path augmentation and adaptive feature fusion, which enhance semantic and detailed information input to the Transformer and improve multi-scale feature integration.