arxiv compressed, 2024-01-03

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-03 generated by the compressor, my personal LLM-based project.


Street Gaussians for Modeling Dynamic Urban Scenes

Yunzhi Yan,Haotong Lin,Chenxu Zhou,Weijie Wang,Haiyang Sun,Kun Zhan,Xianpeng Lang,Xiaowei Zhou,Sida Peng

http://arxiv.org/abs/2401.01339v1

Compressor summary: The paper presents Street Gaussians, a new scene representation for dynamic urban street scenes that enables fast rendering and editing and achieves state-of-the-art performance even when the available vehicle poses are imprecise.


Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Zixiang Chen,Yihe Deng,Huizhuo Yuan,Kaixuan Ji,Quanquan Gu

http://arxiv.org/abs/2401.01335v1

Compressor summary: The paper introduces SPIN, a self-play fine-tuning method for Large Language Models that improves their performance by generating and refining training data from themselves, without requiring additional human-annotated data.
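
At its core, SPIN can be viewed as an iterated pairwise objective in which the previous-iteration model plays the opponent: the model is trained to prefer the human-written response over its own earlier generation. A minimal sketch of such a DPO-style logistic loss, assuming that formulation (function names and the lambda value are illustrative, not the authors' code):

    import math

    def spin_pair_loss(logp_real_cur, logp_real_prev, logp_synth_cur, logp_synth_prev, lam=1.0):
        """Loss on one (human response, self-generated response) pair.

        logp_*_cur  : log-probability of the response under the model being trained
        logp_*_prev : log-probability under the previous-iteration model (the opponent)
        """
        margin = lam * ((logp_real_cur - logp_real_prev) - (logp_synth_cur - logp_synth_prev))
        return math.log(1.0 + math.exp(-margin))  # small when the model prefers the human response

    # Toy numbers: the current model already prefers the human response slightly.
    print(spin_pair_loss(logp_real_cur=-12.0, logp_real_prev=-13.0,
                         logp_synth_cur=-15.0, logp_synth_prev=-14.0))

Each iteration would then sample synthetic responses from the previous model, minimize this loss over the SFT prompts, and promote the trained model to be the next opponent.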


An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction

Urchade Zaratiana,Nadi Tomeh,Pierre Holat,Thierry Charnois

http://arxiv.org/abs/2401.01326v1

Compressor summary: The paper presents a new span-based method for extracting entities and relations from text using a transformer encoder-decoder with a pointing mechanism.


LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Hongye Jin,Xiaotian Han,Jingfeng Yang,Zhimeng Jiang,Zirui Liu,Chia-Yuan Chang,Huiyuan Chen,Xia Hu

http://arxiv.org/abs/2401.01325v1

Compressor summary: The paper proposes a simple method called Self-Extend that extends the context window of large language models without fine-tuning, by combining grouped attention for distant tokens with standard attention for nearby tokens (bi-level attention).
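
The trick is a remapping of relative positions: nearby tokens keep their exact relative positions, while distant tokens share coarse, floor-divided "grouped" positions so that no position exceeds the range seen during pretraining. A minimal sketch of that mapping (the group size and neighbor window below are illustrative values, not the paper's defaults):

    def self_extend_rel_pos(query_idx, key_idx, group_size=8, neighbor_window=512):
        """Relative position used when a query attends to an earlier key."""
        rel = query_idx - key_idx
        if rel <= neighbor_window:
            return rel                      # standard attention for neighboring tokens
        grouped = rel // group_size         # coarse position for distant tokens
        shift = neighbor_window - neighbor_window // group_size
        return grouped + shift              # keeps the mapping continuous at the window boundary

    print(self_extend_rel_pos(5000, 4990))  # neighbor: exact relative position 10
    print(self_extend_rel_pos(5000, 100))   # distant: grouped position, far smaller than 4900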


A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

S. M Towhidul Islam Tonmoy,S M Mehedi Zaman,Vinija Jain,Anku Rani,Vipula Rawte,Aman Chadha,Amitava Das

http://arxiv.org/abs/2401.01313v1

Compressor summary: This paper surveys 32 methods to reduce hallucination in large language models, which can generate factual-looking but ungrounded text, and proposes a taxonomy to categorize them based on various parameters.


Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models

Matthew Dahl,Varun Magesh,Mirac Suzgun,Daniel E. Ho

http://arxiv.org/abs/2401.01301v1

Compressor summary: The text discusses the problem of legal hallucinations in large language models, which can lead to inconsistent and potentially harmful responses in legal tasks, and suggests caution in using them without supervision.


A Comprehensive Study of Knowledge Editing for Large Language Models

Ningyu Zhang,Yunzhi Yao,Bozhong Tian,Peng Wang,Shumin Deng,Mengru Wang,Zekun Xi,Shengyu Mao,Jintian Zhang,Yuansheng Ni,Siyuan Cheng,Ziwen Xu,Xin Xu,Jia-Chen Gu,Yong Jiang,Pengjun Xie,Fei Huang,Lei Liang,Zhiqiang Zhang,Xiaowei Zhu,Jun Zhou,Huajun Chen

http://arxiv.org/abs/2401.01286v1

Compressor summary: The paper surveys knowledge editing techniques for efficient, lightweight modification of Large Language Models' behavior within specific domains without degrading overall performance, introduces a new benchmark, and discusses potential applications.


Quality and Quantity of Machine Translation References for Automated Metrics

Vilém Zouhar,Ondřej Bojar

http://arxiv.org/abs/2401.01283v1

Compressor summary: The text discusses how the quality and number of human translations affect machine translation evaluation metrics and offers guidance for optimizing reference creation within a budget.


CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation

Quan Tu,Shilong Fan,Zihang Tian,Rui Yan

http://arxiv.org/abs/2401.01275v1

Compressor summary: CharacterEval is a new benchmark for evaluating Chinese role-playing conversational agents that shows promising capabilities of Chinese LLMs compared to GPT-4.


Learning-based agricultural management in partially observable environments subject to climate variability

Zhaoan Wang,Shaoping Xiao,Junchao Li,Jun Wang

http://arxiv.org/abs/2401.01273v1

Compressor summary: The study presents an innovative framework using Deep Reinforcement Learning and Recurrent Neural Networks to train an intelligent agent for optimal nitrogen fertilization management in corn crops under variable climate conditions, including extreme weather events.


MOC-RVQ: Multilevel Codebook-assisted Digital Generative Semantic Communication

Yingbin Zhou,Yaping Sun,Guanying Chen,Xiaodong Xu,Hao Chen,Binhong Huang,Shuguang Cui,Ping Zhang

http://arxiv.org/abs/2401.01272v1

Compressor summary: The proposed system combines a multi-head octonary codebook with residual vector quantization and Swin Transformer to create a high-quality semantic knowledge base for efficient image communication.


Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions

Haobo Zhang,Yicheng Li,Weihao Lu,Qian Lin

http://arxiv.org/abs/2401.01270v1

Compressor summary: The study investigates the generalization error of kernel ridge regression in high dimensions, showing its minimax optimality for some source conditions and revealing periodic plateau behavior and multiple descent behavior.
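
For reference, the estimator under study is standard kernel ridge regression; a minimal numpy sketch with an RBF kernel (the kernel choice, regularization, and data here are illustrative, not the paper's large-dimensional setting):

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        # K[i, j] = exp(-gamma * ||a_i - b_j||^2)
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def krr_fit_predict(X, y, X_test, lam=1e-2, gamma=1.0):
        """f_hat(x) = K(x, X) (K(X, X) + lam * n * I)^{-1} y"""
        n = len(X)
        K = rbf_kernel(X, X, gamma)
        alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
        return rbf_kernel(X_test, X, gamma) @ alpha

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 3))                 # the paper lets the dimension grow with n
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
    X_test = rng.uniform(-1, 1, size=(5, 3))
    print(krr_fit_predict(X, y, X_test))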


$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy

Nicola Novello,Andrea M. Tonello

http://arxiv.org/abs/2401.01268v1

Compressor summary: The paper proposes a new objective function for classification tasks in deep learning using a shifted log $f$-divergence, which improves accuracy in various applications.
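
The general idea is to replace cross-entropy (the KL case) with an objective built from another f-divergence between the target label distribution and the predicted posterior. The sketch below uses the squared Hellinger divergence purely as an illustrative member of the family; it is not the paper's shifted-log objective:

    import math

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def cross_entropy(logits, label):
        # KL divergence between the one-hot target and the model posterior: the usual log-loss.
        return -math.log(softmax(logits)[label])

    def hellinger_loss(logits, label):
        # Squared Hellinger divergence H^2(p, q) = 1 - sum_i sqrt(p_i * q_i),
        # which reduces to 1 - sqrt(q_label) for a one-hot target p.
        return 1.0 - math.sqrt(softmax(logits)[label])

    logits, label = [2.0, 0.5, -1.0], 0
    print(cross_entropy(logits, label), hellinger_loss(logits, label))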


Fairness Certification for Natural Language Processing and Large Language Models

Vincent Freiberger,Erik Buchmann

http://arxiv.org/abs/2401.01262v1

Compressor summary: The text discusses the importance of developing a fairness certification for NLP systems due to their widespread use and potential for causing harm or discrimination.


Do Concept Bottleneck Models Obey Locality?

Naveen Raman,Mateo Espinosa Zarlenga,Juyeon Heo,Mateja Jamnik

http://arxiv.org/abs/2401.01259v1

Compressor summary: Concept Bottleneck Models, a type of interpretable deep learning architecture, may fail to obey locality: their concept predictions can depend on spatially irrelevant features and on semantically unrelated concepts.
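
As background, a concept bottleneck model first predicts a vector of human-interpretable concepts from the input and then predicts the label only from those concepts; a minimal numpy sketch of that structure (weights and sizes are arbitrary placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    n_features, n_concepts, n_classes = 32, 5, 3
    W_concept = 0.1 * rng.normal(size=(n_concepts, n_features))   # input -> concept scores
    W_label = 0.1 * rng.normal(size=(n_classes, n_concepts))      # concepts -> class scores

    def cbm_forward(x):
        concepts = 1.0 / (1.0 + np.exp(-W_concept @ x))   # interpretable concept probabilities
        logits = W_label @ concepts                        # the label sees only the concepts
        return concepts, logits

    concepts, logits = cbm_forward(rng.normal(size=n_features))
    print(concepts, logits)
    # The locality question: do concept predictions rely only on the input features relevant
    # to each concept, and are semantically unrelated concepts predicted independently?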


VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM

Fuchen Long,Zhaofan Qiu,Ting Yao,Tao Mei

http://arxiv.org/abs/2401.01256v1

Compressor summary: VideoDrafter is a novel framework that uses large language models to generate high-quality, consistent multi-scene videos from input prompts.


Deep Learning-Based Computational Model for Disease Identification in Cocoa Pods (Theobroma cacao L.)

Darlyn Buenaño Vera,Byron Oviedo,Washington Chiriboga Casanova,Cristian Zambrano-Vega

http://arxiv.org/abs/2401.01247v1

Compressor summary: The paper presents a deep learning model for identifying diseases in cocoa pods, which was trained on a dataset of images and integrated into a mobile app for easy use by farmers.


Contrastive Sequential Interaction Network Learning on Co-Evolving Riemannian Spaces

Li Sun,Junda Ye,Jiawei Zhang,Yong Yang,Mingsheng Liu,Feiyang Wang,Philip S. Yu

http://arxiv.org/abs/2401.01243v1

Compressor summary: CSINCERE is a novel model for sequential interaction network learning that uses co-evolving Riemannian spaces and co-contrastive learning to address issues in previous methods, achieving superior results on 5 public datasets.


Encoding Binary Events from Continuous Time Series in Rooted Trees using Contrastive Learning

Tobias Engelhardt Rasmussen,Siv Sørensen

http://arxiv.org/abs/2401.01242v1

Compressor summary: The study presents a method to infer local network topology using discrete time series data and learn a binary event encoder from it.


Graph Elimination Networks

Shuo Wang,Ge Cheng,Yun Zhang

http://arxiv.org/abs/2401.01233v1

Compressor summary: Graph Elimination Networks (GENs) improve deep layer performance in Graph Neural Networks (GNNs) by eliminating redundancies during neighborhood propagation, enabling better capture of long-distance dependencies.


Motif-aware Riemannian Graph Neural Network with Generative-Contrastive Learning

Li Sun,Zhenhao Huang,Zixi Wang,Feiyang Wang,Hao Peng,Philip Yu

http://arxiv.org/abs/2401.01232v1

Compressor summary: The text introduces Motif-aware Riemannian Graph Representation Learning, a method that uses diverse curvature to capture complex graph structures in a self-supervised way.


IdentiFace : A VGG Based Multimodal Facial Biometric System

Mahmoud Rabea,Hanya Ahmed,Sohaila Mahmoud,Nourhan Sayed

http://arxiv.org/abs/2401.01227v1

Compressor summary: IdentiFace is a multimodal facial biometric system that combines different soft biometric traits and uses a VGG-16-inspired architecture for high recognition accuracy.


Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond

Dimitrios Kollias,Viktoriia Sharmanska,Stefanos Zafeiriou

http://arxiv.org/abs/2401.01219v1

Compressor summary: The paper proposes a novel multi-task learning method that uses distribution matching for knowledge exchange between related tasks, enabling successful learning with little or no annotated data overlap and improving performance in various domains.


Zero-Shot Position Debiasing for Large Language Models

Zhongkun Liu,Zheng Chen,Mengqi Zhang,Zhaochun Ren,Zhumin Chen,Pengjie Ren

http://arxiv.org/abs/2401.01218v1

Compressor summary: The paper proposes ZOE, a zero-shot position debiasing framework for large language models that leverages unsupervised responses from pre-trained models without external knowledge or datasets.


Noise-NeRF: Hide Information in Neural Radiance Fields using Trainable Noise

Qinglong Huang,Yong Liao,Yanbin Hao,Pengyuan Zhou

http://arxiv.org/abs/2401.01216v1

Compressor summary: Noise-NeRF is a novel method for hiding information in 3D images using neural radiance fields, achieving high quality and efficiency in steganography and super-resolution.


YOLO algorithm with hybrid attention feature pyramid network for solder joint defect detection

Li Ang,Siti Khatijah Nor Abdul Rahim,Raseeda Hamzah,Raihah Aminuddin,Gao Yousheng

http://arxiv.org/abs/2401.01214v1

Compressor summary: The proposed hybrid attention mechanism improves solder joint defect detection in surface mount technology by increasing accuracy and reducing computational cost.


FGENet: Fine-Grained Extraction Network for Congested Crowd Counting

Hao-Yuan Ma,Li Zhang,Xiang-Yi Wei

http://arxiv.org/abs/2401.01208v1

Compressor summary: The Fine-Grained Extraction Network (FGENet) directly learns the coordinate points of individuals for crowd counting, reducing annotation noise and outperforming state-of-the-art methods.


Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

Renshuai Liu,Bowen Ma,Wei Zhang,Zhipeng Hu,Changjie Fan,Tangjie Lv,Yu Ding,Xuan Cheng

http://arxiv.org/abs/2401.01207v1

Compressor summary: The paper presents a novel multi-modal face generation framework that can control identity and expression in portrait images using a conditional diffusion model with innovative designs.


Whole-examination AI estimation of fetal biometrics from 20-week ultrasound scans

Lorenzo Venturini,Samuel Budd,Alfonso Farruggia,Robert Wright,Jacqueline Matthew,Thomas G. Day,Bernhard Kainz,Reza Razavi,Jo V. Hajnal

http://arxiv.org/abs/2401.01201v1

Compressor summary: The paper presents a new method for fetal anomaly screening that uses a neural network and Bayesian estimation to automatically extract and analyze ultrasound images without human intervention, achieving human-level performance.


Skin cancer diagnosis using NIR spectroscopy data of skin lesions in vivo using machine learning algorithms

Flavio P. Loss,Pedro H. da Cunha,Matheus B. Rocha,Madson Poltronieri Zanoni,Leandro M. de Lima,Isadora Tavares Nascimento,Isabella Rezende,Tania R. P. Canuto,Luciana de Paula Vieira,Renan Rossoni,Maria C. S. Santos,Patricia Lyra Frasson,Wanderson Romão,Paulo R. Filgueiras,Renato A. Krohling

http://arxiv.org/abs/2401.01200v1

Compressor summary: The text discusses the use of near-infrared (NIR) spectroscopy and machine learning algorithms for automated diagnosis of skin cancer, including a new dataset (NIR-SC-UFES) and results obtained with the LightGBM algorithm.


JMA: a General Algorithm to Craft Nearly Optimal Targeted Adversarial Example

Benedetta Tondi,Wei Guo,Mauro Barni

http://arxiv.org/abs/2401.01199v1

Compressor summary: The paper proposes an optimal targeted attack on Deep Learning classifiers using Jacobian-induced Mahalanobis distance minimization and non-negative least squares, which is effective in various output encoding schemes and even multi-label classification scenarios.


Uncertainty Resolution in Misinformation Detection

Yury Orlovskiy,Camille Thibault,Anne Imouza,Jean-François Godbout,Reihaneh Rabbany,Kellin Pelrine

http://arxiv.org/abs/2401.01197v1

Compressor summary: The paper proposes a framework to categorize missing information in statements with misleading content, generating better questions to resolve uncertainty and improve the performance of large language models in combating misinformation.


Deep-ELA: Deep Exploratory Landscape Analysis with Self-Supervised Pretrained Transformers for Single- and Multi-Objective Continuous Optimization Problems

Moritz Vinzent Seiler,Pascal Kerschke,Heike Trautmann

http://arxiv.org/abs/2401.01192v1

Compressor summary: The paper proposes Deep-ELA, a hybrid approach that combines deep learning and Exploratory Landscape Analysis features for characterizing and understanding single- and multi-objective continuous optimization problems.


Unifying Structured Data as Graph for Data-to-Text Pre-Training

Shujie Li,Liang Li,Ruiying Geng,Min Yang,Binhua Li,Guanghu Yuan,Wanwei He,Shao Yuan,Can Ma,Fei Huang,Yongbin Li

http://arxiv.org/abs/2401.01183v1

Compressor summary: This paper proposes a structure-enhanced Transformer for data-to-text generation that leverages the graph format and positional information to handle different types of structured data.


Query-Based Knowledge Sharing for Open-Vocabulary Multi-Label Classification

Xuelin Zhu,Jian Liu,Dongqi Tang,Jiawei Ge,Weijia Liu,Bo Liu,Jiuxin Cao

http://arxiv.org/abs/2401.01181v1

Compressor summary: The paper proposes a novel query-based knowledge sharing method to improve open-vocabulary multi-label classification using vision-language pre-training models.


Accurate and Efficient Urban Street Tree Inventory with Deep Learning on Mobile Phone Imagery

Asim Khan,Umair Nawaz,Anwaar Ulhaq,Iqbal Gondal,Sajid Javed

http://arxiv.org/abs/2401.01180v1

Compressor summary: The paper proposes a smartphone-based method using deep learning to accurately estimate tree trunk diameter and improve forest management for climate change mitigation.


Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training

Jiuming Qin,Che Liu,Sibo Cheng,Yike Guo,Rossella Arcucci

http://arxiv.org/abs/2401.01179v1

Compressor summary: The paper introduces a lightweight framework called Adaptor that preserves prior information in pre-trained encoders for medical vision tasks and reduces computation cost.


GBSS:a global building semantic segmentation dataset for large-scale remote sensing building extraction

Yuping Hu,Xin Huang,Jiayi Li,Zhen Zhang

http://arxiv.org/abs/2401.01178v1

Compressor summary: The authors introduce a new global dataset with diverse building samples for evaluating building semantic segmentation models and their transfer learning capabilities.


Learning Surface Scattering Parameters From SAR Images Using Differentiable Ray Tracing

Jiangtao Wei,Yixiang Luomei,Xu Zhang,Feng Xu

http://arxiv.org/abs/2401.01175v1

Compressor summary: The paper proposes a microwave-domain surface scattering model for realistic SAR image simulations and target parameter reconstruction using differentiable ray tracing and fast mapping projection techniques.


En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

Yifang Men,Biwen Lei,Yuan Yao,Miaomiao Cui,Zhouhui Lian,Xuansong Xie

http://arxiv.org/abs/2401.01173v1

Compressor summary: En3D is a novel 3D generative scheme that creates realistic, accurate, and diverse human avatars from synthetic 2D images using physical modeling and optimization modules.


Quadratic Time-Frequency Analysis of Vibration Signals for Diagnosing Bearing Faults

Mohammad Al-Sa'd,Tuomas Jalonen,Serkan Kiranyaz,Moncef Gabbouj

http://arxiv.org/abs/2401.01172v1

Compressor summary: The paper proposes a time-frequency convolutional neural network (TF-CNN) for bearing fault diagnosis under realistic conditions such as speed changes and varying noise levels; it captures non-stationary features and outperforms existing methods.


Reinforcement Learning for SAR View Angle Inversion with Differentiable SAR Renderer

Yanni Wang,Hecheng Jia,Shilei Fu,Huiping Lin,Feng Xu

http://arxiv.org/abs/2401.01165v1

Compressor summary: The paper proposes an interactive deep reinforcement learning framework with a differentiable SAR renderer to predict radar view angles given a target model, addressing challenges such as data scarcity and background interference.


Distilling Local Texture Features for Colorectal Tissue Classification in Low Data Regimes

Dmitry Demidov,Roba Al Majzoub,Amandeep Kumar,Fahad Khan

http://arxiv.org/abs/2401.01164v1

Compressor summary: KD-CTCNet is a method that uses knowledge distillation to improve classification of colorectal tissue samples, especially when there is limited training data.


NU-Class Net: A Novel Deep Learning-based Approach for Video Quality Enhancement

Parham Zilouchian Moghaddam,Mehdi Modarressi,MohammadAmin Sadeghi

http://arxiv.org/abs/2401.01163v1

Compressor summary: NU-Class Net is a deep-learning model that improves the quality of low-bit-rate videos captured by IoT edge node cameras by reducing compression artifacts.


Hybrid Pooling and Convolutional Network for Improving Accuracy and Training Convergence Speed in Object Detection

Shiwen Zhao,Wei Wang,Junhui Hou,Hai Wu

http://arxiv.org/abs/2401.01134v1

Compressor summary: HPC-Net is a hybrid pooling and convolutional network that improves object detection accuracy and training convergence speed.


Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

Bicheng Xu,Qi Yan,Renjie Liao,Lele Wang,Leonid Sigal

http://arxiv.org/abs/2401.01130v1

Compressor summary: The paper proposes DiffuseSG, a diffusion model that generates scene graphs from noise, enabling efficient and interpretable control for image generation and improving downstream applications.


SSP: A Simple and Safe automatic Prompt engineering method towards realistic image synthesis on LVM

Weijin Cheng,Jianzhi Liu,Jiawen Deng,Fuji Ren

http://arxiv.org/abs/2401.01128v1

Compressor summary: The paper proposes SSP, a prompt engineering method that improves text-to-image synthesis on large vision models by selecting and appending optimal camera descriptions to prompts, addressing unsafe factors in previous prompting methods and yielding significant gains in semantic consistency and safety metrics.


Explainable Adaptive Tree-based Model Selection for Time Series Forecasting

Matthias Jakobs,Amal Saadallah

http://arxiv.org/abs/2401.01124v1

Compressor summary: The paper proposes a novel online method for selecting tree-based models for time series forecasting using TreeSHAP explainability, which adapts to changing distributions and provides interpretability on three levels.


Utilizing Autoregressive Networks for Full Lifecycle Data Generation of Rolling Bearings for RUL Prediction

Junliang Wang,Qinghua Zhang,Guanhua Zhu,Guoxi Sun

http://arxiv.org/abs/2401.01119v1

Compressor summary: The paper introduces a new model (CVGAN) that generates vibration signals for rolling bearings based on historical data and remaining useful life, improving predictions of their lifespan.


Q-Refine: A Perceptual Quality Refiner for AI-Generated Image

Chunyi Li,Haoning Wu,Zicheng Zhang,Hongkun Hao,Kaiwei Zhang,Lei Bai,Xiaohong Liu,Xiongkuo Min,Weisi Lin,Guangtao Zhai

http://arxiv.org/abs/2401.01117v1

Compressor summary: Q-Refine is a refining method for text-to-image generation that optimizes image quality based on human vision preferences using adaptive pipelines.


Unveiling Comparative Sentiments in Vietnamese Product Reviews: A Sequential Classification Framework

Ha Le,Bao Tran,Phuong Le,Tan Nguyen,Dac Nguyen,Ngoan Pham,Dang Huynh

http://arxiv.org/abs/2401.01108v1

Compressor summary: The paper introduces a method for identifying, extracting, and classifying comparative sentiments in Vietnamese product reviews and reports its performance at a challenge.


CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series

Tianyuan Huang,Zejia Wu,Jiajun Wu,Jackelyn Hwang,Ram Rajagopal

http://arxiv.org/abs/2401.01107v1

Compressor summary: The authors propose a street view time series dataset and a change detection model to accurately measure urban transformations, using high-definition images from a pedestrian perspective.


Dual Teacher Knowledge Distillation with Domain Alignment for Face Anti-spoofing

Zhe Kong,Wentian Zhang,Tao Wang,Kaihao Zhang,Yuexiang Li,Xiaoying Tang,Wenhan Luo

http://arxiv.org/abs/2401.01102v1

Compressor summary: The paper proposes a dual teacher knowledge distillation with domain alignment framework to improve face anti-spoofing performance by combining domain adversarial attack and perturbations to the input images.


Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding

Dehua Peng,Zhipeng Gui,Wenzhang Wei,Huayi Wu

http://arxiv.org/abs/2401.01100v1

Compressor summary: scML is a scalable manifold learning method that preserves global structure, works efficiently on large-scale data, and can be applied to various domains like single-cell transcriptomics and anomaly detection in ECG signals.


Robust single-particle cryo-EM image denoising and restoration

Jing Zhang,Tengfei Zhao,ShiYu Hu,Xin Zhao

http://arxiv.org/abs/2401.01097v1

Compressor summary: The paper presents a new method to improve the quality of cryo-electron microscopy images by removing noise and enhancing resolution.


Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector

Jitao Ma,Weiying Xie,Yunsong Li

http://arxiv.org/abs/2401.01093v1

Compressor summary: This paper proposes a new hyperspectral anomaly detection method (STAD) that mimics human visual perception and reduces false positives, while also making the model lightweight and efficient for edge devices.


Quokka: An Open-source Large Language Model ChatBot for Material Science

Xianjun Yang,Stephen D. Wilson,Linda Petzold

http://arxiv.org/abs/2401.01089v1

Compressor summary: The paper introduces a specialized chatbot for materials science, using the Llama-2 language model and the S2ORC dataset, to help researchers with context-aware queries.


Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction

Jie Feng,Ke Wei,Jinchi Chen

http://arxiv.org/abs/2401.01084v1

Compressor summary: NPG-HM is a new policy search method in reinforcement learning that uses the Hessian-aided momentum technique for variance reduction and achieves the best known sample complexity for natural policy gradient type methods under certain assumptions.


Vietnamese Poem Generation & The Prospect Of Cross-Language Poem-To-Poem Translation

Triet Huynh Minh,Quan Le Bao

http://arxiv.org/abs/2401.01078v1

Compressor summary: The paper proposes using large language models like GPT-3 Babbage to generate Vietnamese "luc bat" poetry from natural language prompts and explore paraphrasing for translation and content control.


Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions

Piao Hu,Jiashuo Jiang,Guodong Lyu,Hao Su

http://arxiv.org/abs/2401.01077v1

Compressor summary: The paper proposes online algorithms for a stochastic optimization problem with long-term constraints, using adversarial learning techniques, and analyzes their performance under various settings.


DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever

Zhichao Yin,Binyuan Hui,Min Yang,Fei Huang,Yongbin Li

http://arxiv.org/abs/2401.01076v1

Compressor summary: DialCLIP is a prompt-tuning method that enhances multi-modal dialog systems by learning context features and domain prompts for efficient and effective dialog retrieval.


Depth-discriminative Metric Learning for Monocular 3D Object Detection

Wonhyeok Choi,Mingyu Shin,Sunghoon Im

http://arxiv.org/abs/2401.01075v1

Compressor summary: The paper proposes a novel metric learning scheme for monocular 3D object detection that extracts depth-discriminative features without increasing inference time or model size, improving performance by 23.51% and 5.78% on average across KITTI and Waymo datasets.


AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis

Qiuhui Chen,Xinyue Hu,Zirui Wang,Yi Hong

http://arxiv.org/abs/2401.01074v1

Compressor summary: Alifuse is a transformer-based framework that aligns and fuses multi-modal medical data by converting images and texts into vision and language tokens and learning holistic representations with attention mechanisms, outperforming eight baselines and achieving state-of-the-art performance in classifying Alzheimer's disease on five datasets.


Discovering Significant Topics from Legal Decisions with Selective Inference

Jerrold Soh

http://arxiv.org/abs/2401.01068v1

Compressor summary: The authors present a method to find important topics in legal texts using topic models and machine learning, which can help understand case outcomes and identify representative cases for each topic.


DTBS: Dual-Teacher Bi-directional Self-training for Domain Adaptation in Nighttime Semantic Segmentation

Fanding Huang,Zihao Yao,Wenhui Zhou

http://arxiv.org/abs/2401.01066v1

Compressor summary: The paper proposes a dual-teacher bi-directional self-training framework to improve semantic segmentation for autonomous vehicles in nighttime conditions by addressing style and illumination shifts.


BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving

Dafeng Wei,Tian Gao,Zhengyu Jia,Changwei Cai,Chengkai Hou,Peng Jia,Fu Liu,Kun Zhan,Jingchen Fan,Yixing Zhao,Yang Wang

http://arxiv.org/abs/2401.01065v1

Compressor summary: The paper proposes BEV-CLIP, a multimodal method for scene retrieval in autonomous driving that uses text and knowledge graph inputs, achieving high accuracy on the NuScenes dataset.


LLaMA Beyond English: An Empirical Study on Language Capability Transfer

Jun Zhao,Zhihao Zhang,Qi Zhang,Tao Gui,Xuanjing Huang

http://arxiv.org/abs/2401.01055v1

Compressor summary: The paper explores how to improve large language models' performance in non-English languages using less pretraining data and various methods, and evaluates their knowledge alignment and response quality across 13 low-resource languages.


Elastic Multi-Gradient Descent for Parallel Continual Learning

Fan Lyu,Wei Feng,Yuepan Li,Qing Sun,Fanhua Shang,Liang Wan,Liang Wang

http://arxiv.org/abs/2401.01054v1

Compressor summary: This paper proposes Elastic Multi-Gradient Descent (EMGD) for Parallel Continual Learning, which adjusts the learning direction to minimize negative impacts on previous tasks and balances training with a memory editing mechanism.
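
For context, the classic two-task ingredient such methods build on is a min-norm combination of task gradients (MGDA): take the shortest vector in the convex hull of the gradients, so that descending along its negative does not increase either loss to first order. A minimal numpy sketch of that two-gradient case (EMGD's elastic weighting and memory editing are not shown):

    import numpy as np

    def min_norm_direction(g1, g2):
        """Shortest vector a*g1 + (1-a)*g2 with a in [0, 1].

        Closed form for two gradients: a* = clip(<g2 - g1, g2> / ||g1 - g2||^2, 0, 1).
        """
        diff = g1 - g2
        denom = float(diff @ diff)
        if denom == 0.0:
            return g1
        a = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
        return a * g1 + (1.0 - a) * g2

    g_old_task = np.array([1.0, 0.0])   # gradient of a previous task's loss
    g_new_task = np.array([0.0, 2.0])   # gradient of the current task's loss
    d = min_norm_direction(g_old_task, g_new_task)
    print(d, d @ g_old_task >= 0, d @ g_new_task >= 0)  # non-negative: stepping along -d harms neither task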


Cheetah: Natural Language Generation for 517 African Languages

Ife Adebara,AbdelRahim Elmadany,Muhammad Abdul-Mageed

http://arxiv.org/abs/2401.01053v1

Compressor summary: Cheetah is a multilingual natural language generation model that supports 517 African languages and performs well across various downstream tasks, promoting linguistic diversity in NLP.


PAC-Bayesian Domain Adaptation Bounds for Multi-view learning

Mehdi Hennequin,Khalid Benabdeslem,Haytham Elghazel

http://arxiv.org/abs/2401.01048v1

Compressor summary: The paper introduces a new distance measure and generalization bounds for multi-view domain adaptation using PAC-Bayesian theory.


Sharp Analysis of Power Iteration for Tensor PCA

Yuchen Wu,Kangjie Zhou

http://arxiv.org/abs/2401.01047v1

Compressor summary: The paper analyzes the tensor power iteration algorithm for finding planted signals in high-dimensional data and provides sharp bounds, a smaller threshold, and a stopping criterion for its convergence.
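
The algorithm itself is simple: for a spiked order-3 tensor T = beta * v⊗v⊗v + noise, repeatedly apply T along two modes to the current estimate and renormalize. A minimal numpy sketch with random restarts (the dimension, signal strength, and noise normalization are arbitrary for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n, beta, n_restarts, n_iters = 50, 20.0, 10, 30
    v = rng.normal(size=n); v /= np.linalg.norm(v)
    noise = rng.normal(size=(n, n, n)) / n                  # one possible noise normalization
    T = beta * np.einsum('i,j,k->ijk', v, v, v) + noise     # rank-1 spike plus noise

    best_u, best_val = None, -np.inf
    for _ in range(n_restarts):
        u = rng.normal(size=n); u /= np.linalg.norm(u)      # random initialization
        for _ in range(n_iters):
            u = np.einsum('ijk,j,k->i', T, u, u)            # apply the tensor along two modes
            u /= np.linalg.norm(u)
        val = np.einsum('ijk,i,j,k->', T, u, u, u)          # keep the restart maximizing T(u, u, u)
        if val > best_val:
            best_u, best_val = u, val

    print(abs(best_u @ v))   # correlation with the planted spike; near 1 here, near 0 below the recovery threshold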


Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation

Mohammad Rostami,Dayuan Jian

http://arxiv.org/abs/2401.01042v1

Compressor summary: The proposed algorithm adapts a deep neural network trained on frame-based data to work well with event-based data using self-supervised learning and uncorrelated conditioning, improving classification accuracy.


Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AI

Zishen Wan,Che-Kai Liu,Hanchen Yang,Chaojian Li,Haoran You,Yonggan Fu,Cheng Wan,Tushar Krishna,Yingyan Lin,Arijit Raychowdhury

http://arxiv.org/abs/2401.01040v1

Compressor summary: Neuro-symbolic AI combines neural, symbolic, and probabilistic methods to improve interpretability, robustness, and trustworthiness in AI systems while enabling human-AI collaboration and reasoning with less data.


Online Continual Domain Adaptation for Semantic Image Segmentation Using Internal Representations

Serban Stan,Mohammad Rostami

http://arxiv.org/abs/2401.01035v1

Compressor summary: The proposed online unsupervised domain adaptation algorithm improves semantic segmentation on unannotated domains by minimizing the distributional distance between source and target latent features, using a Gaussian mixture model as a surrogate when access to source data is restricted.