arxiv compressed, 2024-01-10

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-10, generated by the compressor, my personal LLM-based summarization project.


A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars

Ronglai Zuo,Fangyun Wei,Zenggui Chen,Brian Mak,Jiaolong Yang,Xin Tong

http://arxiv.org/abs/2401.04730v1

Compressor summary: The paper proposes Spoken2Sign, a system that translates spoken languages into 3D sign language using a simple three-step baseline.


Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

Xiyi Chen,Marko Mihajlovic,Shaofei Wang,Sergey Prokudin,Siyu Tang

http://arxiv.org/abs/2401.04728v1

Compressor summary: Key points:
- The paper presents a new diffusion model for generating photorealistic human avatars from a single image or text prompt.
- The model integrates a 3D morphable model to enable controllable facial expressions and body poses.
- The model outperforms existing models on novel view and novel expression synthesis tasks.
Summary: The paper introduces a new diffusion model that can create realistic human avatars with expressive faces and poseable bodies from one image or text, using a 3D morphable model for guidance.


Revisiting Adversarial Training at Scale

Zeyu Wang,Xianhang Li,Hongru Zhu,Cihang Xie

http://arxiv.org/abs/2401.04727v1

Compressor summary: Key points:
- The paper proposes AdvXL, an efficient and effective framework for adversarial training with giant models and web-scale data.
- AdvXL achieves new state-of-the-art robust accuracy records under AutoAttack on ImageNet-1K.
- The code is available at https://github.com/UCSC-VLAA/AdvXL.
Summary: The paper introduces AdvXL, a novel framework for training robust visual models with large-scale data and models, which sets new benchmarks for robust accuracy on ImageNet-1K.


Low-resource finetuning of foundation models beats state-of-the-art in histopathology

Benedikt Roth,Valentin Koch,Sophia J. Wagner,Julia A. Schnabel,Carsten Marr,Tingying Peng

http://arxiv.org/abs/2401.04720v1

Compressor summary: The study shows that foundation models in computer vision can be finetuned with minimal resources to match or outperform existing feature extractors for computational pathology tasks.


Jump Cut Smoothing for Talking Heads

Xiaojuan Wang,Taesung Park,Yang Zhou,Eli Shechtman,Richard Zhang

http://arxiv.org/abs/2401.04718v1

Compressor summary: The paper presents a method to smooth jump cuts in talking head videos by using keypoints and landmarks from surrounding frames, interpolating them, and translating pixels with cross-modal attention.


Low-Resource Vision Challenges for Foundation Models

Yunhua Zhang,Hazel Doughty,Cees G. M. Snoek

http://arxiv.org/abs/2401.04716v1

Compressor summary: This paper explores low-resource image tasks in computer vision, where foundation models struggle to transfer well due to data scarcity, fine-grained differences, and distribution shift.


Model Editing Can Hurt General Abilities of Large Language Models

Jia-Chen Gu,Hao-Xiang Xu,Jun-Yu Ma,Pan Lu,Zhen-Hua Ling,Kai-Wei Chang,Nanyun Peng

http://arxiv.org/abs/2401.04700v1

Compressor summary: The paper discusses the trade-off between improving factuality and preserving general abilities in large language models through model editing methods.


Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers

Gal Yona,Roee Aharoni,Mor Geva

http://arxiv.org/abs/2401.04695v1

Compressor summary: GRANOLA QA is a novel evaluation setting that scores question answering models against multi-granularity answers to measure both accuracy and informativeness, which improves their measured performance, especially on rare entities.


AI-based Mapping of the Conservation Status of Orchid Assemblages at Global Scale

Joaquim Estopinan,Maximilien Servajean,Pierre Bonnet,Alexis Joly,François Munoz

http://arxiv.org/abs/2401.04691v1

Compressor summary: This paper uses deep learning to map the global conservation status of orchids and finds high threat levels in Madagascar and Sumatra, highlighting the need for protection.


Mixture of multilayer stochastic block models for multiview clustering

Kylliann De Santiago,Marie Szafranski,Christophe Ambroise

http://arxiv.org/abs/2401.04682v1

Compressor summary: Key points:
- A new method for aggregating multiple clusterings from different sources.
- Uses a mixture of multilayer SBMs to group co-membership matrices and partition observations.
- Provides identifiability, the optimal number of clusters and components, and comparison with other methods.
- Applied to global food trading networks.
Summary: The paper presents a novel method that combines multiple clustering sources using multilayer SBMs, and shows its advantages and applications in food trading networks.


CoordGate: Efficiently Computing Spatially-Varying Convolutions in Convolutional Neural Networks

Sunny Howard,Peter Norreys,Andreas Döpp

http://arxiv.org/abs/2401.04680v1

Compressor summary: CoordGate is a new module for convolutional neural networks that enhances spatially-varying convolutions and improves image deblurring.


RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation

Mahdi Nikdan,Soroush Tabesh,Dan Alistarh

http://arxiv.org/abs/2401.04679v1

Compressor summary: RoSA is a new method for fine-tuning large language models that uses low-rank and sparse components to improve accuracy while saving computational resources.


Transfer-Learning-Based Autotuning Using Gaussian Copula

Thomas Randall,Jaehoon Koo,Brice Videau,Michael Kruse,Xingfu Wu,Paul Hovland,Mary Hall,Rong Ge,Prasanna Balaprakash

http://arxiv.org/abs/2401.04669v1

Compressor summary: Generative transfer learning using Gaussian copula improves autotuning for high-performance computing systems by efficiently sampling high-performing configurations and estimating few-shot budgets.


Benchmark Analysis of Various Pre-trained Deep Learning Models on ASSIRA Cats and Dogs Dataset

Galib Muhammad Shahriar Himel,Md. Masudul Islam

http://arxiv.org/abs/2401.04666v1

Compressor summary: This research compares pre-trained models for image classification on a cat and dog dataset, achieving 99.65% accuracy with NASNet Large on different computer architectures.


Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Zhen Qin,Weigao Sun,Dong Li,Xuyang Shen,Weixuan Sun,Yiran Zhong

http://arxiv.org/abs/2401.04658v1

Compressor summary: Lightning Attention-2 is a new linear attention algorithm that enables constant training speed for sequences of unlimited length by using tiling and GPU hardware optimization.


DepressionEmo: A novel dataset for multilabel classification of depression emotions

Abu Bakar Siddiqur Rahman,Hoang-Thang Ta,Lotfollah Najjar,Azad Azadmanesh,Ali Saffet Gönül

http://arxiv.org/abs/2401.04655v1

Compressor summary: The paper introduces DepressionEmo, a novel dataset to detect emotions associated with depression from Reddit user posts, and evaluates various text classification methods using it.


Learning to Prompt Segment Anything Models

Jiaxing Huang,Kai Jiang,Jingyi Zhang,Han Qiu,Lewei Lu,Shijian Lu,Eric Xing

http://arxiv.org/abs/2401.04651v1

Compressor summary: SSPrompt is a method that learns effective semantic and spatial prompts to improve the segmentation masks produced by promptable segmentation models such as SEEM and SAM.


A novel framework for generalization of deep hidden physics models

Vijay Kag,Birupaksha Pal

http://arxiv.org/abs/2401.04648v1

Compressor summary: The paper proposes a new method to improve hidden physics models, allowing them to generalize to different input changes and system configurations.


Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks

Tanmay Garg,Deepika Vemuri,Vineeth N Balasubramanian

http://arxiv.org/abs/2401.04647v1

Compressor summary: The paper proposes an explanation module that generates visual concepts from classifier networks, which improves interpretability and performance in visual classification tasks using adversarial training.


Deep Reinforcement Multi-agent Learning framework for Information Gathering with Local Gaussian Processes for Water Monitoring

Samuel Yanes Luis,Dmitriy Shutin,Juan Marchal Gómez,Daniel Gutiérrez Reina,Sergio Toral Marín

http://arxiv.org/abs/2401.04631v1

Compressor summary: Key points:
- The paper proposes a multi-agent system of autonomous vehicles to monitor water quality using Local Gaussian Processes and Deep Reinforcement Learning.
- The approach improves the estimation accuracy of water quality variables and algae blooms compared to existing methods.
Summary: The paper presents a novel multi-agent system that uses Local Gaussian Processes and Deep Reinforcement Learning to monitor water quality more accurately than traditional approaches.


Agent Alignment in Evolving Social Norms

Shimin Li,Tianxiang Sun,Xipeng Qiu

http://arxiv.org/abs/2401.04620v1

Compressor summary: The paper proposes EvolutionaryAgent, a framework that aligns AI agents with social norms by evolving and selecting agents based on their fitness in adapting to changing norms.


Language Detection for Transliterated Content

Selva Kumar S,Afifah Khan Mohammed Ajmal Khan,Chirag Manjeshwar,Imadh Ajaz Banday

http://arxiv.org/abs/2401.04619v1

Compressor summary: The paper presents a dataset and methods to identify and classify languages in text messages that use the English alphabet to write native languages (transliteration), enhancing digital communication and overcoming language barriers.


Generic Knowledge Boosted Pre-training For Remote Sensing Images

Ziyue Huang,Mingming Zhang,Yuan Gong,Qingjie Liu,Yunhong Wang

http://arxiv.org/abs/2401.04614v1

Compressor summary: GeRSP is a novel pre-training framework for remote sensing images that combines self-supervised and supervised learning from both remote sensing and natural images to improve understanding tasks.


Distribution-Free Conformal Joint Prediction Regions for Neural Marked Temporal Point Processes

Victor Dheur,Tanguy Bosser,Rafael Izbicki,Souhaib Ben Taieb

http://arxiv.org/abs/2401.04612v1

Compressor summary: The paper proposes a framework to improve uncertainty quantification in neural Temporal Point Process models by using conformal prediction methods that generate more reliable and sharper joint prediction regions for event arrival times and marks.


EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

Jingyuan Yang,Jiawei Feng,Hui Huang

http://arxiv.org/abs/2401.04608v1

Compressor summary: The paper introduces Emotional Image Content Generation (EICG), a new task to generate images that convey specific emotions using CLIP and novel losses for semantic clarity and emotion fidelity.


An Assessment on Comprehending Mental Health through Large Language Models

Mihael Arcan,Paul-David Niland,Fionn Delahunty

http://arxiv.org/abs/2401.04592v1

Compressor summary: The study evaluates the performance of different language models in understanding mental health expressions and finds that transformer-based models like BERT and XLNet perform better than others.


Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models

Xuewen Liu,Zhikai Li,Junrui Xiao,Qingyi Gu

http://arxiv.org/abs/2401.04585v1

Compressor summary: EDA-DM is a method to improve the performance of post-training quantization for diffusion models by addressing distribution mismatch issues in both calibration sample and reconstruction output levels.


Effective pruning of web-scale datasets based on complexity of concept clusters

Amro Abbas,Evgenia Rusak,Kushal Tirumala,Wieland Brendel,Kamalika Chaudhuri,Ari S. Morcos

http://arxiv.org/abs/2401.04578v1

Compressor summary: The authors prune large-scale multimodal datasets for training CLIP-style models to improve efficiency and performance, achieving better results with less data and compute.


Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding

Yatong Bai,Utsav Garg,Apaar Shanker,Haoming Zhang,Samyak Parajuli,Erhan Bas,Isidora Filipovic,Amelia N. Chu,Eugenia D Fomitcheva,Elliot Branson,Aerin Kim,Somayeh Sojoudi,Kyunghyun Cho

http://arxiv.org/abs/2401.04575v1

Compressor summary: The Let's Go Shopping (LGS) dataset provides high-quality, large-scale image-caption pairs from e-commerce websites for vision-language tasks, overcoming the limitations of existing general-domain datasets.


Robust Imitation Learning for Automated Game Testing

Pierluigi Vito Amadori,Timothy Bradley,Ryan Spick,Guy Moss

http://arxiv.org/abs/2401.04572v1

Compressor summary: EVOLUTE is a new architecture for automated game testing that combines behavioural cloning with energy-based models, improving efficiency and generalisation in shooting-and-driving games.


Phase-shifted remote photoplethysmography for estimating heart rate and blood pressure from facial video

Gyutae Hwang,Sang Jun Lee

http://arxiv.org/abs/2401.04560v1

Compressor summary: Key points:
- The text proposes a vision-based method to estimate heart rate and blood pressure using deep learning.
- The method consists of two stages: one for rPPG signal detection and another for blood pressure estimation.
- The method achieves high accuracy on two datasets and reduces the MAE by 34.31% for heart rate estimation.
Summary: The text presents a deep learning framework that uses vision to estimate heart rate and blood pressure with high accuracy and low error compared to existing methods.


WaveletFormerNet: A Transformer-based Wavelet Network for Real-world Non-homogeneous and Dense Fog Removal

Shengli Zhang,Zhiyong Tao,Sen Lin

http://arxiv.org/abs/2401.04550v1

Compressor summary: The paper proposes a new Transformer-based wavelet network to improve foggy image recovery in complex real-world conditions by preserving texture details, avoiding color distortion, and enhancing feature extraction.


Evaluating Language Model Agency through Negotiations

Tim R. Davidson,Veniamin Veselovsky,Martin Josifoski,Maxime Peyrard,Antoine Bosselut,Michal Kosinski,Robert West

http://arxiv.org/abs/2401.04536v1

Compressor summary: The text discusses evaluating language models in negotiation games, which better reflect real-world scenarios and reveal their decision-making processes.


MERA: A Comprehensive LLM Evaluation in Russian

Alena Fenogenova,Artem Chervyakov,Nikita Martynov,Anastasia Kozlova,Maria Tikhonova,Albina Akhmetgareeva,Anton Emelyanov,Denis Shevelev,Pavel Lebedev,Leonid Sinev,Ulyana Isaeva,Katerina Kolomeytseva,Daniil Moskovskiy,Elizaveta Goncharova,Nikita Savushkin,Polina Mikhailova,Denis Dimitrov,Alexander Panchenko,Sergei Markov

http://arxiv.org/abs/2401.04531v1

Compressor summary: The paper introduces MERA, a benchmark for evaluating Russian foundation models in various tasks and domains, aiming to better understand their capabilities, limitations, and risks.


LUNA: A Framework for Language Understanding and Naturalness Assessment

Marat Saidov,Aleksandra Bakalova,Ekaterina Taktasheva,Vladislav Mikhailov,Ekaterina Artemova

http://arxiv.org/abs/2401.04522v1

Compressor summary: LUNA is a tool that simplifies the evaluation of Natural Language Generation (NLG) models by providing a unified interface for 20 NLG metrics, making it easy to add new ones.


The Critique of Critique

Shichao Sun,Junlong Li,Weizhe Yuan,Ruifeng Yuan,Wenjie Li,Pengfei Liu

http://arxiv.org/abs/2401.04518v1

Compressor summary: This paper introduces MetaCritique, a framework that evaluates critiques of large language models using precision, recall, and rationale, improving the quality of generative AI.


Exploring Prompt-Based Methods for Zero-Shot Hypernym Prediction with Large Language Models

Mikhail Tikhomirov,Natalia Loukachevitch

http://arxiv.org/abs/2401.04515v1

Compressor summary: The article explores using language models to predict hypernymy relationships and improve them with co-hyponym information and iterative methods.


TechGPT-2.0: A large language model project to solve the task of knowledge graph construction

Jiaqi Wang,Yuying Chang,Zhong Li,Ning An,Qi Ma,Lei Hei,Haibo Luo,Yifei Lu,Feiliang Ren

http://arxiv.org/abs/2401.04507v1

Compressor summary: TechGPT-2.0 is a large language model for knowledge graph construction tasks with improved capabilities in various domains and lengthy texts, trained on Huawei's Ascend server.


Optimal Survival Trees: A Dynamic Programming Approach

Tim Huisman,Jacobus G. M. van der Linden,Emir Demirović

http://arxiv.org/abs/2401.04489v1

Compressor summary: The paper presents a dynamic programming approach to computing optimal survival trees that model complex nonlinear relations in survival analysis, yielding an optimal and scalable method that performs well in experiments.


Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks

Yufei Guo,Yuanpei Chen

http://arxiv.org/abs/2401.04486v1

Compressor summary: The paper proposes a shortcut back-propagation method for training spiking neural networks and an evolutionary training framework to balance accuracy and ease of training, achieving better performance than existing methods.


Continuously Learning New Words in Automatic Speech Recognition

Christian Huber,Alexander Waibel

http://arxiv.org/abs/2401.04482v1

Compressor summary: Key points:
- ASR systems struggle with recognizing special words in lectures.
- A self-supervised continual learning approach is proposed.
- The approach uses a memory-enhanced ASR model and adaptation datasets from slides.
- The approach improves performance on new words and preserves general performance.
Summary: The paper proposes a self-supervised continual learning method for ASR systems to learn special words in lectures using slides and memory enhancement.


Fighting Fire with Fire: Adversarial Prompting to Generate a Misinformation Detection Dataset

Shrey Satapara,Parth Mehta,Debasis Ganguly,Sandip Modha

http://arxiv.org/abs/2401.04481v1

Compressor summary: The paper proposes using large language models to generate summaries that contain specific types of misinformation, creating a ground-truth dataset for detecting misinformation in news articles.


TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models

Xue Zhang,Xiangyu Shi,Xinyue Lou,Rui Qi,Yufeng Chen,Jinan Xu,Wenjuan Han

http://arxiv.org/abs/2401.04471v1

Compressor summary: The authors introduce TransportationGames, a benchmark to evaluate large language models' performance in the transportation domain across various tasks based on Bloom's Taxonomy levels.


MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

Weimin Wang,Jiawei Liu,Zhijie Lin,Jiangqiao Yan,Shuo Chen,Chetwin Low,Tuyen Hoang,Jie Wu,Jun Hao Liew,Hanshu Yan,Daquan Zhou,Jiashi Feng

http://arxiv.org/abs/2401.04468v1

Compressor summary: MagicVideo-V2 is a text-to-video system that creates high-quality videos from text descriptions using various modules and achieves superior performance compared to other Text-to-Video systems.


PhilEO Bench: Evaluating Geo-Spatial Foundation Models

Casper Fibaek,Luke Camilleri,Andreas Luyts,Nikolaos Dionelis,Bertrand Le Saux

http://arxiv.org/abs/2401.04464v1

Compressor summary: The PhilEO Bench is a new evaluation framework for testing Earth Observation Foundation Models on a large Sentinel-2 dataset with three downstream tasks.


D3AD: Dynamic Denoising Diffusion Probabilistic Model for Anomaly Detection

Justin Tebbe,Jawad Tayyub

http://arxiv.org/abs/2401.04463v1

Compressor summary: The text proposes a new framework that improves diffusion models for anomaly detection by enhancing reconstruction and localization of variously sized anomalies.


AI Competitions and Benchmarks, Practical issues: Proposals, grant money, sponsors, prizes, dissemination, publicity

Magali Richard,Yuna Blum,Justin Guinney,Gustavo Stolovitzky,Adrien Pavão

http://arxiv.org/abs/2401.04452v1

Compressor summary: The chapter covers various aspects of organizing AI competitions, including motivating participants, engaging the community, managing logistics, and sharing results.


A Novel Dataset for Non-Destructive Inspection of Handwritten Documents

Eleonora Breci,Luca Guarnera,Sebastiano Battiato

http://arxiv.org/abs/2401.04448v1

Compressor summary: The text discusses a forensic handwriting analysis method that uses advanced software and machine learning to compare digitized documents, and presents a new dataset with both traditional and digital handwritten samples.


Image classification network enhancement methods based on knowledge injection

Yishuang Tian,Ning Wang,Liang Zhang

http://arxiv.org/abs/2401.04441v1

Compressor summary: The paper proposes a multi-level hierarchical deep learning algorithm that uses a human cognition model and existing knowledge to train deep neural networks, improving their interpretability and classification performance.


Empirical Analysis of Anomaly Detection on Hyperspectral Imaging Using Dimension Reduction Methods

Dongeon Kim,YeongHyeon Park

http://arxiv.org/abs/2401.04437v1

Compressor summary: The paper proposes a feature selection method for hyperspectral imaging to detect foreign matters in products faster and with better explainability than existing methods.


Uncertainty-aware Sampling for Long-tailed Semi-supervised Learning

Kuo Yang,Duo Li,Menghan Hu,Guangtao Zhai,Xiaokang Yang,Xiao-Ping Zhang

http://arxiv.org/abs/2401.04435v1

Compressor summary: The paper proposes a method to improve semi-supervised learning with imbalanced classes by using uncertainty-aware dynamic thresholds for pseudo-label selection, achieving better performance on long-tailed datasets.


i-Rebalance: Personalized Vehicle Repositioning for Supply Demand Balance

Haoyang Chen,Peiyan Sun,Qiyuan Song,Wanyuan Wang,Weiwei Wu,Wencan Zhang,Guanyu Gao,Yan Lyu

http://arxiv.org/abs/2401.04429v1

Compressor summary: i-Rebalance is a personalized vehicle reposition technique using deep reinforcement learning to balance demand and supply in ride-hailing platforms while considering drivers' preferences.


Meta-forests: Domain generalization on random forests with meta-learning

Yuyang Sun,Panagiotis Kosmas

http://arxiv.org/abs/2401.04425v1

Compressor summary: Meta-forests is a novel domain generalization algorithm that improves classifier performance by using meta-learning and maximum mean discrepancy to reduce correlation among trees and increase their strength.


Estimating Text Similarity based on Semantic Concept Embeddings

Tim vor der Brück,Marc Pouly

http://arxiv.org/abs/2401.04422v1

Compressor summary: Semantic Concept Embeddings improve Word2Vec word embeddings by capturing human thought processes and handling ambiguity in highly context-dependent words.


MapAI: Precision in Building Segmentation

Sander Riisøen Jyhne,Morten Goodwin,Per Arne Andersen,Ivar Oveland,Alexander Salveson Nossum,Karianne Ormseth,Mathilde Ørstavik,Andrew C. Flatman

http://arxiv.org/abs/2401.04406v1

Compressor summary: MapAI is a 2022 building segmentation competition using aerial images and LiDAR data, evaluated by IoU and Boundary IoU metrics.


MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation

Long Xu,Shanghong Li,Yongquan Chen,Jun Luo

http://arxiv.org/abs/2401.04403v1

Compressor summary: The paper proposes a new interactive segmentation algorithm that uses multi-scale tokens and contrastive loss to improve accuracy and efficiency, outperforming existing methods.


IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health records

Ghadeer O. Ghosheh,Jin Li,Tingting Zhu

http://arxiv.org/abs/2401.04402v1

Compressor summary: IGNITE is a deep-learning model that learns patient dynamics from sparse and missing EHRs to generate personalized realistic values for data-driven models in personalized medicine.


Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Zilong Wang,Hao Zhang,Chun-Liang Li,Julian Martin Eisenschlos,Vincent Perot,Zifeng Wang,Lesly Miculicich,Yasuhisa Fujii,Jingbo Shang,Chen-Yu Lee,Tomas Pfister

http://arxiv.org/abs/2401.04398v1

Compressor summary: Chain-of-Table is a framework that uses tabular data to guide large language models in generating a chain of operations for table understanding tasks, improving accuracy and reliability.


The Role of Higher-Order Cognitive Models in Active Learning

Oskar Keurulainen,Gokhan Alcan,Ville Kyrki

http://arxiv.org/abs/2401.04397v1

Compressor summary: The text discusses a new paradigm for active learning with human feedback, considering how humans' higher levels of agency affect rational communication and providing an example and a computational study.


Learning with Noisy Labels: Interconnection of Two Expectation-Maximizations

Heewon Kim,Hyun Sung Chang,Kiho Cho,Jaeyun Lee,Bohyung Han

http://arxiv.org/abs/2401.04390v1

Compressor summary: The paper proposes an EM-based method to learn with noisy labels in computer vision, where two networks collaborate to distinguish clean labels and refurbish corrupted ones.


Machine unlearning through fine-grained model parameters perturbation

Zhiwei Zuo,Zhuo Tang,Kenli Li,Anwitaman Datta

http://arxiv.org/abs/2401.04385v1

Compressor summary: Key points:
- Machine unlearning helps with user privacy but is costly.
- Proposed fine-grained strategies address privacy and computational efficiency.
- Introduced new metrics and SPD-GAN to evaluate the effectiveness and degree of unlearning.
Summary: The authors propose efficient machine unlearning techniques, novel metrics, and a data perturbation method (SPD-GAN) to balance user privacy protection and model performance.


Towards Explainable Artificial Intelligence (XAI): A Data Mining Perspective

Haoyi Xiong,Xuhong Li,Xiaofei Zhang,Jiamin Chen,Xinhao Sun,Yuchen Li,Zeyi Sun,Mengnan Du

http://arxiv.org/abs/2401.04374v1

Compressor summary: This work reviews data-centric approaches to make deep neural networks more interpretable by analyzing how data collection, processing, and analysis affect model behavior and knowledge discovery.


Air Quality Forecasting Using Machine Learning: A Global perspective with Relevance to Low-Resource Settings

Mulomba Mukendi Christian,Hyebong Choi

http://arxiv.org/abs/2401.04369v1

Compressor summary: This study proposes a machine learning approach that uses two months of data to accurately predict air quality in 197 capital cities, with the Random Forest algorithm achieving high performance and interpretability.


Enhancing Acute Kidney Injury Prediction through Integration of Drug Features in Intensive Care Units

Gabriel D. M. Manalu,Mulomba Mukendi Christian,Songhee You,Hyebong Choi

http://arxiv.org/abs/2401.04368v1

Compressor summary: The study proposes a novel multimodal approach using patient prescription data and drug embeddings to improve AKI prediction in the critical care setting, showing significant improvement over baseline models.


Probabilistic emotion and sentiment modelling of patient-reported experiences

Curtis Murray,Lewis Mitchell,Jonathan Tuke,Mark Mackay

http://arxiv.org/abs/2401.04367v1

Compressor summary: The study presents a new method to model emotions from online patient stories and develops a recommender system that predicts emotions and sentiments using topic analysis, outperforming existing methods.


SoK: Facial Deepfake Detectors

Binh M. Le,Jiwon Kim,Shahroz Tariq,Kristen Moore,Alsharif Abuadbba,Simon S. Woo

http://arxiv.org/abs/2401.04364v1

Compressor summary: The paper reviews state-of-the-art deepfake detectors and categorizes them into groups based on critical criteria, providing insights into their effectiveness across different attack scenarios.


Representative Feature Extraction During Diffusion Process for Sketch Extraction with One Example

Kwan Yun,Youngseo Kim,Kwanggyoon Seo,Chang Wook Seo,Junyong Noh

http://arxiv.org/abs/2401.04362v1

Compressor summary: DiffSketch is a method that creates stylized sketches from images using deep features and can be trained with one manual drawing, outperforming other methods.


Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive Learning

Jiaan Wang,Jianfeng Qu,Kexin Wang,Zhixu Li,Wen Hua,Ximing Li,An Liu

http://arxiv.org/abs/2401.04361v1

Compressor summary: The paper proposes a contrastive learning framework to improve knowledge-grounded dialogue's robustness against real-world noises like misspellings, abbreviations, incomplete, erroneous, and outdated facts in external knowledge graphs.


Iterative Feedback Network for Unsupervised Point Cloud Registration

Yifan Xie,Boyu Wang,Shiqi Li,Jihua Zhu

http://arxiv.org/abs/2401.04357v1

Compressor summary: The paper proposes a novel Iterative Feedback Network (IFNet) that improves unsupervised point cloud registration by efficiently enriching low-level features with high-level ones and using a geometry-awareness descriptor for more precise results.


Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition

Xuzheng Yu,Chen Jiang,Wei Zhang,Tian Gan,Linlin Chao,Jianan Zhao,Yuan Cheng,Qingpei Guo,Wei Chu

http://arxiv.org/abs/2401.04354v1

Compressor summary: The paper proposes a novel two-stream framework for video scene recognition that uses temporal and non-temporal perspectives, self-distillation, and knowledge-enhanced feature fusion to classify scenes in videos effectively.


A Change Point Detection Integrated Remaining Useful Life Estimation Model under Variable Operating Conditions

Anushiya Arunan,Yan Qin,Xiaoli Li,Chau Yuen

http://arxiv.org/abs/2401.04351v1

Compressor summary: The paper proposes a model that detects changes in device health using temporal correlation features and improves remaining useful life estimation accuracy by considering heterogeneous change points.


Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness

Sibo Wang,Jie Zhang,Zheng Yuan,Shiguang Shan

http://arxiv.org/abs/2401.04350v1

Compressor summary: PMG-AFT is a method that enhances CLIP's zero-shot adversarial robustness by preserving the generalization features of the pre-trained model using an auxiliary branch and minimizing the distance between feature distributions.


LAMPAT: Low-Rank Adaption for Multilingual Paraphrasing Using Adversarial Training

Khoi M. Le,Trinh Pham,Tho Quan,Anh Tuan Luu

http://arxiv.org/abs/2401.04348v1

Compressor summary: The paper introduces LAMPAT, an unsupervised multilingual paraphrasing model that uses low-rank adaptation and adversarial training to generate diverse and human-like sentences from monolingual data without parallel corpora.


RomniStereo: Recurrent Omnidirectional Stereo Matching

Hualie Jiang,Rui Xu,Minglang Tan,Wenjie Jiang

http://arxiv.org/abs/2401.04345v1

Compressor summary: The paper proposes a recurrent omnidirectional stereo matching algorithm that improves the performance over previous methods and introduces two techniques to enhance it further.


Private Fine-tuning of Large Language Models with Zeroth-order Optimization

Xinyu Tang,Ashwinee Panda,Milad Nasr,Saeed Mahloujifar,Prateek Mittal

http://arxiv.org/abs/2401.04343v1

Compressor summary: DP-ZO is a method for privately fine-tuning large language models by randomizing gradient directions and privatizing step sizes with Laplace or Gaussian noise, achieving a strong trade-off between privacy and performance.


Memory-Efficient Personalization using Quantized Diffusion Model

Hyogon Ryu,Seohyun Lim,Hyunjung Shim

http://arxiv.org/abs/2401.04339v1

Compressor summary: This paper explores fine-tuning quantized diffusion models for generative AI and proposes two strategies to improve personalization and prompt fidelity without compromising image quality.


G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems

Youshao Xiao,Shangchun Zhao,Zhenglei Zhou,Zhaoxin Huan,Lin Ju,Xiaolu Zhang,Lin Wang,Jun Zhou

http://arxiv.org/abs/2401.04338v1

Compressor summary: G-Meta is a high-performance framework for large-scale meta learning based recommendation models that improves efficiency and statistical performance in distributed training on GPU clusters.


Mix-GENEO: A flexible filtration for multiparameter persistent homology detects digital images

Jiaxing He,Bingzhe Hou,Tieru Wu,Yue Xin

http://arxiv.org/abs/2401.04332v1

Compressor summary: The paper proposes three multifiltrations (multi-GENEO, multi-DGENEO, mix-GENEO) for Topological Data Analysis and shows their stability and effectiveness on MNIST dataset.


BD-MSA: Body decouple VHR Remote Sensing Image Change Detection method guided by multi-scale feature information aggregation

Yonghui Tan,Xiaolong Li,Yishu Chen,Jinquan Ai

http://arxiv.org/abs/2401.04330v1

Compressor summary: The BD-MSA model uses deep learning to detect changes in remote sensing images by extracting boundary information and separating the main body from the boundary, outperforming other models on public datasets.


RadarCam-Depth: Radar-Camera Fusion for Depth Estimation with Learned Metric Scale

Han Li,Yukai Ma,Yaqing Gu,Kewei Hu,Yong Liu,Xingxing Zuo

http://arxiv.org/abs/2401.04325v1

Compressor summary: The paper proposes a method to fuse Radar and image data for accurate dense depth estimation using four stages of processing.


Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs

Junjie Wang,Dan Yang,Binbin Hu,Yue Shen,Ziqi Liu,Wen Zhang,Jinjie Gu,Zhiqiang Zhang

http://arxiv.org/abs/2401.04319v1

Compressor summary: The paper proposes ARALLM, a method to use large language models for transforming natural language demands into structured logical languages for user targeting, by enhancing their reasoning ability with analogical prompts and distilling multi-task models.


Private Truly-Everlasting Robust-Prediction

Uri Stemmer

http://arxiv.org/abs/2401.04311v1

Compressor summary: PEP is a private learning model that predicts labels from unlabeled examples, with improvements in robustness, privacy-parameter independence, and reduced sample complexity.


Advancing Deep Active Learning & Data Subset Selection: Unifying Principles with Information-Theory Intuitions

Andreas Kirsch

http://arxiv.org/abs/2401.04305v1

Compressor summary: The thesis explores data subset selection techniques, such as active learning and active sampling, in deep learning models using information-theoretic principles to improve label and training efficiency.


Setting the Record Straight on Transformer Oversmoothing

Gbètondji J-S Dovonon,Michael M. Bronstein,Matt J. Kusner

http://arxiv.org/abs/2401.04301v1

Compressor summary: This paper analyzes the causes of oversmoothing in Transformers and proposes a way to control their spectrum, leading to improved generalization.