arxiv compressed, 2023-11-24

This page contains one-sentence summaries of cs.AI/ML/CV papers announced on 2023-11-24, generated by the compressor, my personal LLM-based project.


SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

Lingchen Meng,Shiyi Lan,Hengduo Li,Jose M. Alvarez,Zuxuan Wu,Yu-Gang Jiang

http://arxiv.org/abs/2311.14671v1

Compressor summary: SEGIC is an end-to-end framework that uses a single vision foundation model to learn segmentation rules from few labeled examples and achieve state-of-the-art performance on one-shot segmentation tasks.


Understanding Self-Supervised Features for Learning Unsupervised Instance Segmentation

Paul Engstler,Luke Melas-Kyriazi,Christian Rupprecht,Iro Laina

http://arxiv.org/abs/2311.14665v1

Compressor summary: The paper explores how self-supervised learning methods can be used for instance segmentation without manual annotations and compares different methods based on their ability to separate instances.


Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks

Zhen Qin,Xuwei Tan,Zhihui Zhu

http://arxiv.org/abs/2311.14658v1

Compressor summary: The text discusses the benefits and challenges of using orthonormal weight matrices in deep neural networks, and presents a new approach to analyze their convergence with linear gradient descent and a class of loss functions.
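
For readers unfamiliar with the constraint, here is a minimal, generic sketch of projected gradient descent onto matrices with orthonormal columns (via the orthogonal polar factor); it only illustrates what enforcing orthonormal weights looks like and is not the paper's analyzed update rule or loss class.

```python
import numpy as np

def project_stiefel(W):
    """Nearest matrix with orthonormal columns to W (in Frobenius norm),
    obtained from the orthogonal polar factor U @ Vt of the SVD."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

def orthonormal_gd_step(W, grad, lr=1e-2):
    """One projected gradient step for an orthonormally constrained layer.
    Illustrative only; the paper's convergence analysis may use a different scheme."""
    return project_stiefel(W - lr * grad)

# Quick check: the projection really returns orthonormal columns
W = np.random.default_rng(0).normal(size=(8, 4))
W = project_stiefel(W)
assert np.allclose(W.T @ W, np.eye(4), atol=1e-6)
```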


Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

Jonathan Roberts,Timo Lüddecke,Rehan Sheikh,Kai Han,Samuel Albanie

http://arxiv.org/abs/2311.14656v1

Compressor summary: The paper explores how well multimodal large language models can perform geographic and geospatial tasks and evaluates GPT-4V's performance on a new visual benchmark.


Data-driven Prior Learning for Bayesian Optimisation

Sigrid Passano Hellan,Christopher G. Lucas,Nigel H. Goddard

http://arxiv.org/abs/2311.14653v1

Compressor summary: The paper proposes a weaker assumption for transfer learning in Bayesian optimization, analyses the method Prior Learning for Bayesian Optimization (PLeBO), and shows its effectiveness using synthetic and real-world data.


One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

Raghav Addanki,Chenyang Li,Zhao Song,Chiwun Yang

http://arxiv.org/abs/2311.14652v1

Compressor summary: The authors propose a new algorithm that uses sublinear space to store Key and Value matrices for large language models in streaming applications, improving memory efficiency.
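
As background only, the sketch below shows the classic online-softmax recurrence: one pass over a stream of (key, value) pairs with working memory that is independent of sequence length. It illustrates what "one-pass streaming attention" means for a single query; the paper's actual sublinear-space sketching algorithm is different.

```python
import numpy as np

def streaming_attention(query, kv_stream):
    """Attention output for one query computed in a single pass over a
    (key, value) stream, using the online-softmax recurrence.
    Memory is O(d), independent of the number of tokens. Illustrative sketch."""
    m = -np.inf      # running maximum of scores (numerical stability)
    denom = 0.0      # running softmax normaliser
    out = None       # running weighted sum of values
    for k, v in kv_stream:
        s = float(query @ k)
        m_new = max(m, s)
        scale = np.exp(m - m_new) if np.isfinite(m) else 0.0
        denom = denom * scale + np.exp(s - m_new)
        out = (out * scale if out is not None else 0.0) + np.exp(s - m_new) * v
        m = m_new
    return out / denom

rng = np.random.default_rng(0)
q = rng.normal(size=16)
stream = ((rng.normal(size=16), rng.normal(size=16)) for _ in range(10_000))
print(streaming_attention(q, stream))
```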


Learning in Deep Factor Graphs with Gaussian Belief Propagation

Seth Nabarro,Mark van der Wilk,Andrew J Davison

http://arxiv.org/abs/2311.14649v1

Compressor summary: The authors propose an efficient method to train and predict using Gaussian factor graphs, which can handle large-scale problems and enable continual learning.


More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory

James B. Simon,Dhruva Karkada,Nikhil Ghosh,Mikhail Belkin

http://arxiv.org/abs/2311.14646v1

Compressor summary: The paper provides theoretical support for the idea that larger models, more data, and more computation improve performance in random feature regression models, which are equivalent to shallow neural networks with only the last layer trained.
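
For context, the random feature regression model studied here can be written in a few lines: a frozen random first layer followed by a ridge-regressed readout. The sketch below is a generic illustration, not the paper's experimental setup.

```python
import numpy as np

def random_feature_regression(X, y, n_features=1024, ridge=1e-3, seed=0):
    """Fit a shallow ReLU network whose first layer is frozen at random values
    and only the last (linear) layer is trained, via closed-form ridge regression."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)   # frozen random features
    Phi = np.maximum(X @ W, 0.0)                        # ReLU feature map
    A = Phi.T @ Phi + ridge * np.eye(n_features)        # ridge normal equations
    coef = np.linalg.solve(A, Phi.T @ y)
    return W, coef

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
W, coef = random_feature_regression(X, y)
pred = np.maximum(X @ W, 0.0) @ coef
```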


A General Framework for User-Guided Bayesian Optimization

Carl Hvarfner,Frank Hutter,Luigi Nardi

http://arxiv.org/abs/2311.14645v1

Compressor summary: ColaBO is a Bayesian optimization framework that allows domain experts to customize the optimization routine by incorporating their prior beliefs about the function being optimized.


Continuous football player tracking from discrete broadcast data

Matthew J. Penn,Christl A. Donnelly,Samir Bhatt

http://arxiv.org/abs/2311.14642v1

Compressor summary: The paper proposes a method to estimate continuous full-pitch player tracking data from broadcast footage, which could be affordable for many football teams.


Automated Detection and Counting of Windows using UAV Imagery based Remote Sensing

Dhruv Patel,Shivani Chepuri,Sarvesh Thakur,K. Harikumar,Ravi Kiran S.,K. Madhava Krishna

http://arxiv.org/abs/2311.14635v1

Compressor summary: The paper proposes a method to use UAVs and computer vision to automatically count windows in buildings for earthquake analysis.


One Strike, You're Out: Detecting Markush Structures in Low Signal-to-Noise Ratio Images

Thomas Jurriaans,Kinga Szarkowska,Eric Nalisnick,Markus Schwoerer,Camilo Thorne,Saber Akhondi

http://arxiv.org/abs/2311.14633v1

Compressor summary: The paper presents a novel method for classifying Markush chemical structures with an end-to-end CNN, which significantly outperforms fixed-feature extraction and has the potential to improve OCSR pipelines.


Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach

Xinwei Zhang,Zhiqi Bu,Zhiwei Steven Wu,Mingyi Hong

http://arxiv.org/abs/2311.14632v1

Compressor summary: The paper proposes an error-feedback differential privacy algorithm for training deep learning models that reduces the constant bias from gradient clipping and provides better performance and privacy guarantees.
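
A rough, hypothetical sketch of the general error-feedback idea applied to per-sample clipping is shown below: buffer the residual that clipping removed and re-inject it at the next step. It is not the paper's exact algorithm, and the privacy accounting is omitted.

```python
import numpy as np

def dp_sgd_step_with_error_feedback(params, per_sample_grads, error_buf,
                                    clip_norm=1.0, noise_mult=1.0, lr=0.1,
                                    rng=np.random.default_rng(0)):
    """One illustrative DP-SGD-style step: add back last step's clipping
    residual, clip per-sample gradients, add Gaussian noise, and store the
    new residual. Sketch of error feedback only, not a privacy-audited method."""
    corrected = per_sample_grads + error_buf                 # re-inject residual
    norms = np.linalg.norm(corrected, axis=1, keepdims=True)
    clipped = corrected * np.minimum(1.0, clip_norm / (norms + 1e-12))
    new_error_buf = (corrected - clipped).mean(axis=0)       # carry residual forward
    noise = rng.normal(0.0, noise_mult * clip_norm, size=params.shape)
    grad_estimate = clipped.mean(axis=0) + noise / len(per_sample_grads)
    return params - lr * grad_estimate, new_error_buf

d, n = 10, 32
params, err = np.zeros(d), np.zeros(d)
grads = np.random.default_rng(1).normal(size=(n, d))
params, err = dp_sgd_step_with_error_feedback(params, grads, err)
```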


CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization

Ruoyu Zhao,Mingrui Zhu,Shiyin Dong,Nannan Wang,Xinbo Gao

http://arxiv.org/abs/2311.14631v1

Compressor summary: CatVersion is a text-to-image method that learns a personalized concept from few examples, preserves prior knowledge in diffusion models, and improves image alignment scores for better editing.


Neural Style Transfer for Computer Games

Eleftherios Ioannou,Steve Maddock

http://arxiv.org/abs/2311.14617v1

Compressor summary: The paper presents a method for applying depth-aware Neural Style Transfer to 3D computer games in real-time, resulting in high-quality stylized scenes that surpass existing image and video NST techniques.


Animate124: Animating One Image to 4D Dynamic Scene

Yuyang Zhao,Zhiwen Yan,Enze Xie,Lanqing Hong,Zhenguo Li,Gim Hee Lee

http://arxiv.org/abs/2311.14603v1

Compressor summary: Animate124 is a new method that can animate a single image into a 4D dynamic scene (3D content over time) using textual descriptions and a neural model with multiple diffusion priors to address semantic drift.


A Metalearned Neural Circuit for Nonparametric Bayesian Inference

Jake C. Snell,Gianluca Bencomo,Thomas L. Griffiths

http://arxiv.org/abs/2311.14601v1

Compressor summary: The paper presents a method to transfer the inductive bias of nonparametric Bayesian models to neural networks, allowing them to handle long-tailed class distributions and perform sequential inference over an open set of classes efficiently.


Example-Based Explanations of Random Forest Predictions

Henrik Boström

http://arxiv.org/abs/2311.14581v1

Compressor summary: The text describes a method for explaining random forest predictions by using a subset of training examples, which can reduce the number of examples and improve the explanations' usefulness.
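
One standard way to surface explanatory training examples from a random forest is leaf co-occurrence (forest proximity): return the training points that most often land in the same leaves as the query. The sketch below illustrates that generic idea with scikit-learn and is not necessarily the paper's selection rule.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def top_k_prototypes(forest, X_train, x_query, k=3):
    """Indices of the k training examples with highest proximity to the query,
    where proximity = fraction of trees in which both fall in the same leaf."""
    train_leaves = forest.apply(X_train)           # shape (n_train, n_trees)
    query_leaves = forest.apply(x_query[None, :])  # shape (1, n_trees)
    proximity = (train_leaves == query_leaves).mean(axis=1)
    return np.argsort(-proximity)[:k], proximity

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
idx, prox = top_k_prototypes(rf, X, X[0])
print(idx, prox[idx])
```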


Large Language Models as Automated Aligners for benchmarking Vision-Language Models

Yuanfeng Ji,Chongjian Ge,Weikai Kong,Enze Xie,Zhengying Liu,Zhengguo Li,Ping Luo

http://arxiv.org/abs/2311.14580v1

Compressor summary: Auto-Bench is a new benchmark that uses large language models to create question-answer-reasoning tasks to evaluate vision-language models' alignment with human intelligence.


RAISE -- Radiology AI Safety, an End-to-end lifecycle approach

M. Jorge Cardoso,Julia Moosbauer,Tessa S. Cook,B. Selnur Erdal,Brad Genereaux,Vikash Gupta,Bennett A. Landman,Tiarna Lee,Parashkev Nachev,Elanchezhian Somasundaram,Ronald M. Summers,Khaled Younis,Sebastien Ourselin,Franz MJ Pfister

http://arxiv.org/abs/2311.14570v1

Compressor summary: The paper discusses the importance of rigorous evaluation, safety, effectiveness, and collaboration for integrating AI into radiology to achieve its potential benefits while addressing risks and challenges.


Electric Vehicles coordination for grid balancing using multi-objective Harris Hawks Optimization

Cristina Bianca Pop,Tudor Cioara,Viorica Chifu,Ionut Anghel,Francesco Bellesini

http://arxiv.org/abs/2311.14563v1

Compressor summary: The paper proposes a model for coordinating electric vehicles (EVs) charging and discharging to balance the local grid, using Harris Hawks Optimization (HHO) to optimize schedules based on energy, time, and location criteria.


Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

Yufei Zhan,Yousong Zhu,Zhiyang Chen,Fan Yang,Ming Tang,Jinqiao Wang

http://arxiv.org/abs/2311.14552v1

Compressor summary: The paper introduces a novel dataset and a baseline model, Griffon, that shows LVLMs can perform fine-grained object perception and location awareness without additional modules or expert models.


Inferring Latent Class Statistics from Text for Robust Visual Few-Shot Learning

Yassir Bendou,Vincent Gripon,Bastien Pasdeloup,Giulia Lioi,Lukas Mauch,Fabien Cardinaux,Ghouthi Boukli Hacene

http://arxiv.org/abs/2311.14544v1

Compressor summary: The paper proposes a novel approach using text to predict mean and covariance statistics of visual features for each class, improving few-shot learning robustness and generalizability.


ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model

Eslam Mohamed Bakr,Liangbing Zhao,Vincent Tao Hu,Matthieu Cord,Patrick Perez,Mohamed Elhoseiny

http://arxiv.org/abs/2311.14542v1

Compressor summary: ToddlerDiffusion is an interpretable image synthesis framework that generates contours, palettes, and detailed colored images, outperforming existing methods while being faster and more efficient.


Finding Foundation Models for Time Series Classification with a PreText Task

Ali Ismail-Fawaz,Maxime Devanne,Stefano Berretti,Jonathan Weber,Germain Forestier

http://arxiv.org/abs/2311.14534v1

Compressor summary: The paper proposes a method to reduce overfitting in Time Series Classification using pre-trained domain foundation models that can identify the originating dataset of each sample and apply flexible convolution filters across different datasets.


Comparing Feature Engineering and End-to-End Deep Learning for Autism Spectrum Disorder Assessment based on Fullbody-Tracking

Alberto Altozano,Maria Eleonora Minissi,Mariano Alcañiz,Javier Marín-Morales

http://arxiv.org/abs/2311.14533v1

Compressor summary: The paper presents a study comparing end-to-end models and hand-crafted features for autism spectrum disorder assessment using virtual reality tasks, finding that both methods have strengths and weaknesses.


GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Yiwen Chen,Zilong Chen,Chi Zhang,Feng Wang,Xiaofeng Yang,Yikai Wang,Zhongang Cai,Lei Yang,Huaping Liu,Guosheng Lin

http://arxiv.org/abs/2311.14521v1

Compressor summary: GaussianEditor is an efficient 3D editing algorithm based on Gaussian Splatting that improves precision, control, and performance in complex scenes using novel techniques like semantic tracing and hierarchical splatting.


Multi-Class Anomaly Detection based on Regularized Discriminative Coupled hypersphere-based Feature Adaptation

Mehdi Rafiei,Alexandros Iosifidis

http://arxiv.org/abs/2311.14506v1

Compressor summary: The paper presents a new model for multi-class anomaly detection that combines a modified Regularized Discriminative Variational Auto-Encoder (RD-VAE) with Coupled-hypersphere-based Feature Adaptation (CFA), achieving better results than eight existing methods.


StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

Shida Wang,Qianxiao Li

http://arxiv.org/abs/2311.14495v1

Compressor summary: The paper explores how different parameterizations affect the long-term memory learning abilities of state-space models and introduces new techniques to improve their performance.


MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation

Zhiqi Li,Yiming Chen,Lingzhe Zhao,Peidong Liu

http://arxiv.org/abs/2311.14494v1

Compressor summary: MVControl is a new neural network architecture that improves multi-view image generation by incorporating extra input conditions, enabling controllable image creation and view-consistent 3D content generation using a hybrid diffusion prior.


Towards Interpretable Classification of Leukocytes based on Deep Learning

Stefan Röhrl,Johannes Groll,Manuel Lengl,Simon Schumann,Christian Klenk,Dominik Heim,Martin Knopp,Oliver Hayden,Klaus Diepold

http://arxiv.org/abs/2311.14485v1

Compressor summary: This work explores label-free cytological imaging using machine learning, confidence calibration, visual explanations, and detection patterns in neural networks for automated leukocyte classification and analysis.


MRxaI: Black-Box Explainability for Image Classifiers in a Medical Setting

Nathan Blake,Hana Chockler,David A. Kelly,Santiago Calderon Pena,Akchunya Chanchal

http://arxiv.org/abs/2311.14471v1

Compressor summary: The paper compares black-box methods with the white-box method Grad-CAM for explaining medical image classifications and finds that most black-box tools are not suitable, but one, the causal explainability-based ReX, performs as well as Grad-CAM.


Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling

Corentin Salaün,Xingchang Huang,Iliyan Georgiev,Niloy J. Mitra,Gurprit Singh

http://arxiv.org/abs/2311.14468v1

Compressor summary: The paper introduces an algorithm that incorporates existing importance functions into a framework for adaptive or importance sampling in SGD, improving convergence in classification and regression tasks with minimal computational overhead.
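
The core trick is to sample examples non-uniformly and reweight each draw by 1/(N * p_i) so the minibatch gradient stays unbiased. The sketch below shows this for least squares with a user-supplied importance distribution; the paper's specific importance functions and adaptive scheme are not reproduced.

```python
import numpy as np

def importance_sampled_grad(X, y, w, probs, batch_size=32, seed=0):
    """Unbiased minibatch gradient of 0.5/N * ||Xw - y||^2 when examples are
    drawn with probabilities `probs` and reweighted by 1/(N * p_i)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.choice(n, size=batch_size, p=probs)
    residual = X[idx] @ w - y[idx]
    per_sample = X[idx] * residual[:, None]        # per-sample gradients
    weights = 1.0 / (n * probs[idx])               # importance-sampling weights
    return (weights[:, None] * per_sample).sum(axis=0) / batch_size

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8)); y = X @ rng.normal(size=8)
w = np.zeros(8)
scores = np.abs(X @ w - y) + 1e-3                  # e.g. residual-based importance
g = importance_sampled_grad(X, y, w, scores / scores.sum())
```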


Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD Simulation

Loh Sher En Jessica,Naheed Anjum Arafat,Wei Xian Lim,Wai Lee Chan,Adams Wai Kin Kong

http://arxiv.org/abs/2311.14464v1

Compressor summary: The paper proposes new geometric representations and features for graph neural network-based computational fluid dynamics simulations to improve accuracy and reduce computation cost.


IDD-AW: A Benchmark for Safe and Robust Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather

Furqan Ahmed Shaik,Abhishek Malreddy,Nikhil Reddy Billa,Kunal Chaudhary,Sunny Manchanda,Girish Varma

http://arxiv.org/abs/2311.14459v1

Compressor summary: The IDD-AW dataset contains 5000 pairs of annotated images in various adverse weather and traffic conditions, designed to evaluate the safety and robustness of autonomous vehicles.


How to ensure a safe control strategy? Towards a SRL for urban transit autonomous operation

Zicong Zhao

http://arxiv.org/abs/2311.14457v1

Compressor summary: The paper proposes an SSA-DRL framework that combines linear temporal logic, reinforcement learning, Monte Carlo tree search, and an additional actor to ensure safe and efficient autonomous operation of urban rail transit trains.


Universal Jailbreak Backdoors from Poisoned Human Feedback

Javier Rando,Florian Tramèr

http://arxiv.org/abs/2311.14455v1

Compressor summary: The paper explores how adversaries can create powerful backdoors in large language models trained with Reinforcement Learning from Human Feedback (RLHF) by poisoning the training data, enabling harmful responses with a single trigger word.


GCPV: Guided Concept Projection Vectors for the Explainable Inspection of CNN Feature Spaces

Georgii Mikriukov,Gesina Schwalbe,Christian Hellert,Korinna Bade

http://arxiv.org/abs/2311.14435v1

Compressor summary: The paragraph introduces a new approach called Guided Concept Projection Vectors (GCPV) that improves the interpretation and debugging of computer vision neural networks by generating precise and multi-layer concept vectors from latent representations.


Human-Machine Cooperative Multimodal Learning Method for Cross-subject Olfactory Preference Recognition

Xiuxin Xia,Yuchen Guo,Yanwei Wang,Yuchao Yang,Yan Shi,Hong Men

http://arxiv.org/abs/2311.14426v1

Compressor summary: The paper proposes a multimodal learning method combining E-nose and olfactory EEG to improve cross-subject odor preference recognition, overcoming their individual limitations and achieving better results than existing methods.


A Comparison of PDF Projection with Normalizing Flows and SurVAE

Paul M. Baggenstoss,Felix Govaers

http://arxiv.org/abs/2311.14412v1

Compressor summary: SurVAE extends normalizing flows to handle dimension-altering transformations, but it is essentially a re-invention of an older technique called PDF projection.


Unveiling The Factors of Aesthetic Preferences with Explainable AI

Derya Soydaner,Johan Wagemans

http://arxiv.org/abs/2311.14410v1

Compressor summary: The authors use machine learning models and explainable AI techniques to understand how different aesthetic attributes affect people's preferences for images.


LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design

Niklas Dobberstein,Astrid Maass,Jan Hamaekers

http://arxiv.org/abs/2311.14407v1

Compressor summary: LLamol is a novel generative transformer model that can create organic compounds with various conditions by using stochastic context learning and token sequences.


OneFormer3D: One Transformer for Unified Point Cloud Segmentation

Maxim Kolodiazhnyi,Anna Vorontsova,Anton Konushin,Danila Rukhovich

http://arxiv.org/abs/2311.14405v1

Compressor summary: OneFormer3D is a unified model that performs instance, semantic, and panoptic segmentation of 3D point clouds using learnable kernels trained with a transformer-based decoder, achieving state-of-the-art results on several benchmarks.


BHGNN-RT: Network embedding for directed heterogeneous graphs

Xiyang Sun,Fumiyasu Komaki

http://arxiv.org/abs/2311.14404v1

Compressor summary: The paper proposes a bidirectional heterogeneous graph neural network (BHGNN-RT) for directed heterogeneous graphs that uses message-passing and teleportation to overcome over-smoothing, achieving state-of-the-art performance in node classification and clustering tasks.


TEA: Test-time Energy Adaptation

Yige Yuan,Bingbing Xu,Liang Hou,Fei Sun,Huawei Shen,Xueqi Cheng

http://arxiv.org/abs/2311.14402v1

Compressor summary: TEA is a novel energy-based method to improve model generalizability by enhancing its perception of test data distributions without needing training data or processes.


Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification

Ke Cheng,Xuecheng Hua,Hu Lu,Juanjuan Tu,Yuanquan Wang,Shitong Wang

http://arxiv.org/abs/2311.14395v1

Compressor summary: The paper proposes a network called MSCMNet to effectively use semantic features and modality information for person re-identification tasks by using multiple scales, novel components, and a specific loss function.


Directly Attention Loss Adjusted Prioritized Experience Replay

Zhuoying Chen,Huiping Li,Zhaoxu Wang

http://arxiv.org/abs/2311.14390v1

Compressor summary: DALAP is a new off-policy RL framework that uses self-attention to correct the distribution shift caused by PER, and also optimizes sample screening for faster and more stable training.
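
For background, the standard prioritized experience replay sampling and importance-sampling correction that DALAP builds on looks like the sketch below; the paper's self-attention-based adjustment of these weights is not shown.

```python
import numpy as np

def sample_per(priorities, batch_size, alpha=0.6, beta=0.4, seed=0):
    """Draw transition indices with probability p_i^alpha / sum_j p_j^alpha and
    return importance-sampling weights that correct the induced distribution
    shift (beta is typically annealed towards 1 during training)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(priorities, dtype=float) ** alpha
    p /= p.sum()
    idx = rng.choice(len(p), size=batch_size, p=p)
    weights = (len(p) * p[idx]) ** (-beta)
    weights /= weights.max()                 # normalise for update stability
    return idx, weights

idx, w = sample_per(priorities=np.random.default_rng(1).random(1000),
                    batch_size=64)
```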


A Parameterized Generative Adversarial Network Using Cyclic Projection for Explainable Medical Image Classification

Xiangyu Xiong,Yue Sun,Xiaohong Liu,ChanTong Lam,Tong Tong,Hao Chen,Qinquan Gao,Wei Ke,Tao Tan

http://arxiv.org/abs/2311.14388v1

Compressor summary: ParaGAN is a novel method that uses projection distance parameters and class-difference maps to generate domain-specific synthetic samples for improved image classification on small-scale datasets.


Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

Mingze Wang,Zeping Min,Lei Wu

http://arxiv.org/abs/2311.14387v1

Compressor summary: The paper proposes a new algorithm called Progressive Rescaling Gradient Descent (PRGD) that can efficiently maximize the margin for linearly separable data, unlike existing algorithms like gradient descent and normalized gradient descent.


Ethical implications of ChatGPT in higher education: A scoping review

Ming Li,Ariunaa Enkhtur,Fei Cheng,Beverley Anne Yamamoto

http://arxiv.org/abs/2311.14378v1

Compressor summary: The scoping review examines the ethical issues of using ChatGPT in higher education by reviewing academic articles and identifying six main areas of concern.


Deciphering and integrating invariants for neural operator learning with various physical mechanisms

Rui Zhang,Qi Meng,Zhi-Ming Ma

http://arxiv.org/abs/2311.14361v1

Compressor summary: PIANO is a novel neural operator method that learns from physical invariants in PDEs and achieves better performance than existing techniques on various forecasting tasks.


Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion

Minshan Xie,Hanyuan Liu,Chengze Li,Tien-Tsin Wong

http://arxiv.org/abs/2311.14343v1

Compressor summary: The paper proposes a synchronized multi-frame diffusion framework for text-guided video stylization that maintains visual details and temporal consistency by sharing information among frames using optical flow.


Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models

Cristiano Patrício,Luís F. Teixeira,João C. Neves

http://arxiv.org/abs/2311.14339v1

Compressor summary: The authors propose a vision-language model that uses CLIP with textual embeddings based on concepts to classify skin lesions, reducing the need for concept-annotated data and outperforming other methods.


TVT: Training-Free Vision Transformer Search on Tiny Datasets

Zimian Wei,Hengyue Pan,Lujun Li,Peijie Dong,Zhiliang Tian,Xin Niu,Dongsheng Li

http://arxiv.org/abs/2311.14337v1

Compressor summary: The paper proposes a training-free method to search for the best ViT model for distilling with ConvNet teachers, using teacher-aware and student-capability metrics, and shows improved efficiency and effectiveness compared to previous methods.


Comparative Analysis of Transformers for Modeling Tabular Data: A Casestudy using Industry Scale Dataset

Usneek Singh,Piyush Arora,Shamika Ganesan,Mohit Kumar,Siddhant Kulkarni,Salil R. Joshi

http://arxiv.org/abs/2311.14335v1

Compressor summary: The paper compares transformer-based models for tabular data on a large industry dataset, addressing challenges like high-dimensional data and efficient pre-processing, and discusses trade-offs between resources and performance.


Maximizing Discrimination Capability of Knowledge Distillation with Energy-based Score

Seonghak Kim,Gyeongdo Ham,Suin Lee,Donggon Jang,Daeshik Kim

http://arxiv.org/abs/2311.14334v1

Compressor summary: The authors propose an energy-based knowledge distillation method that uses temperature scaling to adjust non-target class predictions, improving performance on various datasets and enabling data augmentation on resource-limited devices.


Cycle Invariant Positional Encoding for Graph Representation Learning

Zuoyu Yan,Tengfei Ma,Liangcai Gao,Zhi Tang,Chao Chen,Yusu Wang

http://arxiv.org/abs/2311.14333v1

Compressor summary: CycleNet is a structure encoding module for graph neural networks that uses edge structure encoding to incorporate cycle information in a permutation invariant way, improving network performance on various benchmarks.


GATGPT: A Pre-trained Large Language Model with Graph Attention Network for Spatiotemporal Imputation

Yakun Chen,Xianzhi Wang,Guandong Xu

http://arxiv.org/abs/2311.14332v1

Compressor summary: The GATGPT framework combines a graph attention mechanism with pre-trained large language models to impute missing values in spatiotemporal data, improving on traditional methods.


Large Language Models as Topological Structure Enhancers for Text-Attributed Graphs

Shengyin Sun,Yuxiang Ren,Chen Ma,Xuecang Zhang

http://arxiv.org/abs/2311.14324v1

Compressor summary: The authors explore using large language models to improve the structure of text-attributed graphs for node classification tasks by removing unreliable edges, adding reliable ones, and refining edge weights with pseudo-labels.


Binarized 3D Whole-body Human Mesh Recovery

Zhiteng Li,Yulun Zhang,Jing Lin,Haotong Qin,Jinjin Gu,Xin Yuan,Linghe Kong,Xiaokang Yang

http://arxiv.org/abs/2311.14323v1

Compressor summary: The paper introduces BiDRN, a binarization method for 3D whole-body human mesh recovery from a single image, built from BiDRB units with Local Convolution Residual and Block Residual modules, which achieves performance comparable to Hand4Whole while using far fewer parameters and operations.


Robust Domain Misinformation Detection via Multi-modal Feature Alignment

Hui Liu,Wenya Wang,Hao Sun,Anderson Rocha,Haoliang Li

http://arxiv.org/abs/2311.14315v1

Compressor summary: The paper proposes RDCM, a new approach for detecting multi-modal misinformation on social media that aligns textual and visual modalities and handles domain shift issues.


Stable Cluster Discrimination for Deep Clustering

Qi Qian

http://arxiv.org/abs/2311.14310v1

Compressor summary: The paper proposes SeCu, a novel method for one-stage deep clustering that overcomes challenges in representation learning and clustering by introducing a stable cluster discrimination task and a hardness-aware criterion.


Cosine Similarity Knowledge Distillation for Individual Class Information Transfer

Gyeongdo Ham,Seonghak Kim,Suin Lee,Jae-Hyeok Lee,Daeshik Kim

http://arxiv.org/abs/2311.14307v1

Compressor summary: The paper introduces a novel Knowledge Distillation method using cosine similarity and a weighted temperature technique to improve student performance, achieving results comparable or better than teacher models.
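
As a hedged illustration of mixing cosine similarity into distillation (not the paper's exact formulation), the sketch below weights each sample's temperature-scaled KL term by the cosine similarity between the teacher's and student's softened predictions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cosine_weighted_kd_loss(student_logits, teacher_logits, T=4.0):
    """Per-sample KL(teacher || student) at temperature T, weighted by the
    cosine similarity between the two softened probability vectors.
    Illustrative only; the paper's weighting scheme may differ."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=1)
    cos = (p_t * p_s).sum(axis=1) / (
        np.linalg.norm(p_t, axis=1) * np.linalg.norm(p_s, axis=1) + 1e-12)
    return float((cos * kl * T * T).mean())

rng = np.random.default_rng(0)
loss = cosine_weighted_kd_loss(rng.normal(size=(16, 10)), rng.normal(size=(16, 10)))
```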


New Epochs in AI Supervision: Design and Implementation of an Autonomous Radiology AI Monitoring System

Vasantha Kumar Venugopal,Abhishek Gupta,Rohit Takhar,Vidur Mahajan

http://arxiv.org/abs/2311.14305v1

Compressor summary: The authors propose two metrics to monitor AI radiology models' accuracy and stability, ensuring reliable AI use in healthcare.


AdaMedGraph: Adaboosting Graph Neural Networks for Personalized Medicine

Jie Lian,Xufang Luo,Caihua Shan,Dongqi Han,Varut Vardhanabhuti,Dongsheng Li

http://arxiv.org/abs/2311.14304v1

Compressor summary: The paper presents a novel algorithm that automatically selects important features to build patient similarity graphs and uses graph neural networks for precision medicine, improving performance in two real-world medical scenarios.


GeoViT: A Versatile Vision Transformer Architecture for Geospatial Image Analysis

Madhav Khirwar,Ankur Narang

http://arxiv.org/abs/2311.14301v1

Compressor summary: The paper introduces GeoViT, a compact vision transformer model that processes satellite imagery to estimate CO2 and NO2 emissions and power generation, outperforming previous models and helping monitor and regulate greenhouse gas emissions.


Decouple Content and Motion for Conditional Image-to-Video Generation

Cuifeng Shen,Yulu Gan,Chen Chen,Xiongwei Zhu,Lele Cheng,Jinzhi Wang

http://arxiv.org/abs/2311.14294v1

Compressor summary: The paper proposes a novel method for conditional image-to-video generation that disentangles spatial content and temporal motions, improving motion consistency and visual continuity while being more efficient than previous approaches.


Paragraph-to-Image Generation with Information-Enriched Diffusion Model

Weijia Wu,Zhuang Li,Yefei He,Mike Zheng Shou,Chunhua Shen,Lele Cheng,Yan Li,Tingting Gao,Di Zhang,Zhongyuan Wang

http://arxiv.org/abs/2311.14284v1

Compressor summary: The paper proposes a new model called ParaDiffusion that uses a language model to encode long paragraphs and generate images with better alignment and fidelity than existing models.


Image Super-Resolution with Text Prompt Diffusion

Zheng Chen,Yulun Zhang,Jinjin Gu,Xin Yuan,Linghe Kong,Guihai Chen,Xiaokang Yang

http://arxiv.org/abs/2311.14282v1

Compressor summary: Text prompts are used to improve image super-resolution by providing degradation information in a flexible and abstract manner, resulting in excellent performance on synthetic and real-world images.


Multi-modal Instance Refinement for Cross-domain Action Recognition

Yuan Qing,Naixing Wu,Shaohua Wan,Lixin Duan

http://arxiv.org/abs/2311.14281v1

Compressor summary: The paper proposes a reinforcement learning-based method to reduce negative transfer in unsupervised cross-domain action recognition by refining training data with a multi-modal instance refinement technique.


Cooperative Dual Attention for Audio-Visual Speech Enhancement with Facial Cues

Feixiang Wang,Shuang Yang,Shiguang Shan,Xilin Chen

http://arxiv.org/abs/2311.14275v1

Compressor summary: The paper proposes a DualAVSE method that leverages facial cues beyond the lip region for robust Audio-Visual Speech Enhancement, ignoring speech-unrelated information and dynamically integrating audio and visual features.


CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning

Shivam Aggarwal,Kuluhan Binici,Tulika Mitra

http://arxiv.org/abs/2311.14272v1

Compressor summary: The paper introduces CRISP, a new pruning method for machine learning models that combines structured sparsity patterns and class-aware saliency scores to reduce memory consumption and improve efficiency while maintaining accuracy.


Segmentation-Based Parametric Painting

Manuel Ladron de Guevara,Matthew Fisher,Aaron Hertzmann

http://arxiv.org/abs/2311.14271v1

Compressor summary: The method creates high-quality paintings from large images using segmentation and dynamic attention maps, allowing for control over details and style.


Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery

Ekaterina Nikonova,Cheng Xue,Jochen Renz

http://arxiv.org/abs/2311.14270v1

Compressor summary: The paper proposes a framework for deep reinforcement learning agents that enables them to discover task-specific rules in new environments and self-supervise their learning, improving their ability to adapt to novelties.


Bursting Spikes: Efficient and High-performance SNNs for Event-based Vision

Ziqing Wang,Yuetong Fang,Jiahang Cao,Renjing Xu

http://arxiv.org/abs/2311.14265v1

Compressor summary: The authors propose a burst-spike mechanism for spiking neural networks (SNNs) that reduces conversion errors, lowers latency, and saves energy compared to state-of-the-art methods.


ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation

Yuheng Xue,Nenglun Chen,Jun Liu,Wenyun Sun

http://arxiv.org/abs/2311.14262v1

Compressor summary: ZeroPS is a novel pipeline for zero-shot 3D part segmentation that leverages multi-view correspondences and prompt mechanisms of 2D pretrained foundational models, achieving state-of-the-art results without training or fine-tuning.


Out-of-Distribution Generalized Dynamic Graph Neural Network with Disentangled Intervention and Invariance Promotion

Zeyang Zhang,Xin Wang,Ziwei Zhang,Haoyang Li,Wenwu Zhu

http://arxiv.org/abs/2311.14255v1

Compressor summary: I-DIDA is a novel model that handles distribution shifts in dynamic graphs by discovering invariant patterns and making predictions based on them.


RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling

Xiaoyue Wan,Zhuo Chen,Yiming Bao,Xu Zhao

http://arxiv.org/abs/2311.14242v1

Compressor summary: The authors propose a method for 3D human pose estimation using binocular cameras that handles view inconsistency and occlusions by utilizing disparity and joint correlations.


Pseudo-label Correction for Instance-dependent Noise Using Teacher-student Framework

Eugene Kim

http://arxiv.org/abs/2311.14237v1

Compressor summary: The paper proposes P-LC, a teacher-student framework that uses a triple encoder and pseudo-label correction to handle label noise and improve generalization for deep learning models.