arxiv compressed, 2024-01-23

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-23, generated by the compressor, my personal LLM-based project.


Exploring Simple Open-Vocabulary Semantic Segmentation

http://arxiv.org/abs/2401.12217v1

Compressor summary: S-Seg is a novel model that trains a MaskFormer using pseudo-masks and language for open-vocabulary semantic segmentation without relying on image-level VL models, ground truth masks, or custom grouping encoders.


Less Could Be Better: Parameter-efficient Fine-tuning Advances Medical Vision Foundation Models

http://arxiv.org/abs/2401.12215v1

Compressor summary: The paper investigates using parameter-efficient fine-tuning (PEFT) for transfer learning on chest radiography foundation models, showing its effectiveness compared to full-parameter fine-tuning and setting new state-of-the-art results.


Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks for Accurate Bangla Sign Language Recognition

http://arxiv.org/abs/2401.12210v1

Compressor summary: This paper introduces a new Bangla Sign Language dataset and two recognition models, showing lexical similarity with other sign languages and the need for more research on low-resource sign languages.


CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

http://arxiv.org/abs/2401.12208v1

Compressor summary: The text introduces CheXinstruct, a large dataset for CXR interpretation; CheXagent, a foundation model (FM) that analyzes and summarizes CXRs; and CheXbench, a benchmark to evaluate FMs on CXR interpretation tasks.


Retrieval-Guided Reinforcement Learning for Boolean Circuit Minimization

http://arxiv.org/abs/2401.12205v1

Compressor summary: The paper proposes a method (ABC-RL) that adjusts recommendations from pre-trained agents for logic synthesis, improving circuit quality and reducing runtime significantly.


APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

http://arxiv.org/abs/2401.12200v1

Compressor summary: APT adaptively prunes and tunes parameters for language models, improving both training and inference efficiency.


LONEStar: The Lunar Flashlight Optical Navigation Experiment

http://arxiv.org/abs/2401.12198v1

Compressor summary: The paper reports on the LONEStar experiment, which used optical observations of celestial bodies to navigate a spacecraft in heliocentric space after a failed lunar mission.


Text Embedding Inversion Attacks on Multilingual Language Models

http://arxiv.org/abs/2401.12192v1

Compressor summary: This paper explores the security risks of multilingual language models due to embedding inversion attacks and calls for more research on their prevention.


WARM: On the Benefits of Weight Averaged Reward Models

http://arxiv.org/abs/2401.12187v1

Compressor summary: Weight Averaged Reward Models (WARM) improve large language model predictions by averaging fine-tuned reward models, addressing challenges like distribution shifts and preference inconsistencies in reinforcement learning.
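
The weight-averaging idea described here can be illustrated with a short, hypothetical sketch: averaging the parameters of several reward models fine-tuned from the same initialization. The function and variable names below are placeholders, not WARM's actual implementation.

```python
# Minimal sketch (assumed, not WARM's released code): average the weights of
# several reward models that were fine-tuned from the same initialization.
import copy
import torch


def average_reward_models(models):
    """Return a model whose parameters are the element-wise mean of the
    fine-tuned reward models in `models` (all sharing one architecture)."""
    averaged = copy.deepcopy(models[0])
    avg_state = averaged.state_dict()
    for name, tensor in avg_state.items():
        if tensor.is_floating_point():
            # Stack the corresponding tensor from every model and take the mean.
            avg_state[name] = torch.stack(
                [m.state_dict()[name] for m in models]
            ).mean(dim=0)
    averaged.load_state_dict(avg_state)
    return averaged
```

Per the summary, the single averaged model then serves as the reward model during RL fine-tuning, in place of any one fine-tuned copy.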


Universal Neurons in GPT2 Language Models

http://arxiv.org/abs/2401.12181v1

Compressor summary: The paper investigates whether individual neurons in GPT2 models have consistent functions across different random seeds and finds that some universal neurons exist with clear interpretations.


In-Context Learning for Extreme Multi-Label Classification

http://arxiv.org/abs/2401.12178v1

Compressor summary: The paper proposes a general program using in-context learning and retrievers to solve multi-label classification problems with thousands of classes efficiently, achieving state-of-the-art results on several benchmarks.


Broiler-Net: A Deep Convolutional Framework for Broiler Behavior Analysis in Poultry Houses

http://arxiv.org/abs/2401.12176v1

Compressor summary: This paper presents a novel real-time framework that uses deep learning to detect abnormal behaviors in cage-free poultry houses, such as inactive broilers and huddling behavior.


Single-View 3D Human Digitalization with Large Reconstruction Models

http://arxiv.org/abs/2401.12175v1

Compressor summary: Human-LRM is a model that predicts 3D human NeRFs from single images using a conditional diffusion strategy and outperforms prior methods.


SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

http://arxiv.org/abs/2401.12168v1

Compressor summary: Key points:
- The paper proposes a system to train VLMs with internet-scale 3D spatial reasoning data
- The system generates 2 billion VQA examples on real-world images
- The system improves VLMs' performance on qualitative and quantitative spatial VQA
- The system enables novel applications in chain-of-thought spatial reasoning and robotics
Summary: The paper presents a system that generates massive 3D spatial reasoning data from real images to train VLMs for better spatial VQA and new robotics applications.


Semi-supervised segmentation of land cover images using nonlinear canonical correlation analysis with multiple features and t-SNE

http://arxiv.org/abs/2401.12164v1

Compressor summary: The paper proposes a semi-supervised segmentation method for remote sensing data using a modified canonical correlation analysis algorithm with radial basis functions and k-means clustering.


Automated facial recognition system using deep learning for pain assessment in adults with cerebral palsy

http://arxiv.org/abs/2401.12161v1

Compressor summary: The study aimed to develop an automatic facial recognition system using deep learning to assess pain in individuals with cerebral palsy, showing promising results and highlighting the need for a larger dataset specific to this population.


Anisotropy Is Inherent to Self-Attention in Transformers

http://arxiv.org/abs/2401.12143v1

Compressor summary: This paper investigates the representation degeneration phenomenon, known as anisotropy, in Transformers and shows it affects various tasks and modalities, suggesting it's inherent to these models.


Evaluation of QCNN-LSTM for Disability Forecasting in Multiple Sclerosis Using Sequential Multisequence MRI

http://arxiv.org/abs/2401.12132v1

Compressor summary: The study compared quantum and classical neural networks for predicting Multiple Sclerosis disability using MRI data, finding that quantum models are competitive and faster to train.


Out-of-Distribution Detection & Applications With Ablated Learned Temperature Energy

http://arxiv.org/abs/2401.12129v1

Compressor summary: AbeT is a method that combines two existing scores to effectively identify Out-of-Distribution inputs in deep neural networks, reducing false positives and improving performance in various tasks.


The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models

http://arxiv.org/abs/2401.12117v1

Compressor summary: The study evaluates multi-modal large language models' abstract reasoning abilities using Raven's Matrices and improves their performance with Chain-of-Thought prompting.


Extracting Formulae in Many-Valued Logic from Deep Neural Networks

http://arxiv.org/abs/2401.12113v1

Compressor summary: The paper suggests that deep ReLU networks can be seen as a type of many-valued logic and presents an algorithm to extract logical formulas from them using real-valued weights.


On-Time Delivery in Crowdshipping Systems: An Agent-Based Approach Using Streaming Data

http://arxiv.org/abs/2401.12108v1

Compressor summary: The paper proposes an agent-based system that uses smartphone sensor data to predict and prevent delivery delays in crowdshipping by transferring tasks to more promising couriers.


An Empirical Analysis of In-context Learning Abilities of LLMs for MT

http://arxiv.org/abs/2401.12097v1

Compressor summary: This paper explores how different aspects of in-context learning affect natural language generation and finds varying robustness to perturbations across large language models.


Unsupervised Learning of Graph from Recipes

http://arxiv.org/abs/2401.12088v1

Compressor summary: Key points:
- The paper proposes a model to identify relevant information from recipes and generate a graph representing the sequence of actions
- The model uses an unsupervised approach that learns text-to-graph and graph-to-text iteratively
- The approach is evaluated by various metrics and compared with state-of-the-art methods
Summary: The paper presents an unsupervised model that learns to generate graphs from recipes, which can then be decoded back into text and evaluated against other methods.


Revisiting Demonstration Selection Strategies in In-Context Learning

http://arxiv.org/abs/2401.12087v1

Compressor summary: The study explores factors affecting in-context learning (ICL) performance with large language models (LLMs), proposes a data- and model-dependent demonstration selection method, and shows its improvements and explanatory power.


West-of-N: Synthetic Preference Generation for Improved Reward Modeling

http://arxiv.org/abs/2401.12086v1

Compressor summary: The paper proposes a self-training method using Best-of-N sampling to generate synthetic preference data and improve reward models for language model alignment in reinforcement learning from human feedback.
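
As a rough illustration of Best-of-N-style preference generation, the sketch below samples N candidate responses, scores them with a reward model, and pairs the highest- and lowest-scored candidates as a synthetic preference. The pairing rule and the `generate`/`reward` callables are assumptions for illustration, not the paper's exact recipe.

```python
# Hypothetical sketch of Best-of-N style synthetic preference generation.
# `generate` and `reward` are placeholders for a policy LM and a reward model.
def synth_preference(prompt, generate, reward, n=8):
    """Sample n responses and return a (chosen, rejected) preference pair."""
    candidates = [generate(prompt) for _ in range(n)]
    scored = sorted(candidates, key=lambda resp: reward(prompt, resp))
    rejected, chosen = scored[0], scored[-1]  # worst- and best-scored samples
    return chosen, rejected
```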


Temporal Blind Spots in Large Language Models

http://arxiv.org/abs/2401.12078v1

Compressor summary: This study examines the limitations of large language models for temporal question answering and identifies conditions under which their performance declines.


Cross-lingual Transfer Learning for Javanese Dependency Parsing

http://arxiv.org/abs/2401.12072v1

Compressor summary: The study examines how transfer learning can improve dependency parsing for Javanese, a low-resource language, using two methods that differ in the number of source languages involved.


Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

http://arxiv.org/abs/2401.12070v1

Compressor summary: Binoculars is a novel and accurate method to detect machine-generated text using a pair of pre-trained language models without needing training data or model-specific adjustments.
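
The summary only says that a pair of pre-trained LMs is used; the sketch below shows one simple way two models can jointly score a text (a ratio of their mean token negative log-likelihoods). It is a stand-in to illustrate the paired-model idea, not Binoculars' actual scoring function, and the model choices are arbitrary.

```python
# Illustrative only: compare how two pre-trained LMs score the same text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def mean_token_nll(model, tokenizer, text):
    """Mean next-token negative log-likelihood (i.e. log-perplexity)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()


tok = AutoTokenizer.from_pretrained("gpt2")
model_a = AutoModelForCausalLM.from_pretrained("gpt2")
model_b = AutoModelForCausalLM.from_pretrained("gpt2-medium")

text = "Some passage whose origin we want to check."
score = mean_token_nll(model_a, tok, text) / mean_token_nll(model_b, tok, text)
# Thresholding a paired-model score like this is only a stand-in for the
# detector described in the paper.
print(score)
```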


Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles

http://arxiv.org/abs/2401.12069v1

Compressor summary: TreeSHAP-IQ is a method that computes any-order additive Shapley interactions for predictions of tree-based models, making ensemble models more interpretable.


The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization

http://arxiv.org/abs/2401.12058v1

Compressor summary: The paper investigates how gradient methods' generalization performance depends on the problem's dimension, showing lower bounds of $\Omega(\sqrt{d})$ for full-batch GD and SGD in stochastic convex optimization settings.


CloSe: A 3D Clothing Segmentation Dataset and Model

http://arxiv.org/abs/2401.12051v1

Compressor summary: CloSe-D is a large dataset for 3D clothing segmentation with CloSe-Net, a learning-based model that uses local point features and attention to segment clothing from colored point clouds, and CloSe-T, a 3D interactive tool for refining labels.


Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling

http://arxiv.org/abs/2401.12039v1

Compressor summary: The paper proposes an audio-visual method for generating character-aware subtitles without face detection or tracking, using high-precision audio exemplars and speaker identity classification.


Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

http://arxiv.org/abs/2401.12033v1

Compressor summary: Momentum-SAM (MSAM) is a new optimization algorithm for deep neural networks that combines momentum and sharpness awareness to improve training and reduce overfitting, with minimal additional computational costs compared to SGD or Adam.


Stereo-Matching Knowledge Distilled Monocular Depth Estimation Filtered by Multiple Disparity Consistency

http://arxiv.org/abs/2401.12019v1

Compressor summary: The paper proposes a self-supervised method to improve monocular depth estimation by using multiple disparity maps to filter errors in pseudo-depth maps without ground truth data.


Robustness to distribution shifts of compressed networks for edge devices

http://arxiv.org/abs/2401.12014v1

Compressor summary: The study shows that compressed neural networks are less robust to data distribution shifts than their original networks, with post-training quantization being a reliable method for improving robustness.


Tensor-view Topological Graph Neural Network

http://arxiv.org/abs/2401.12007v1

Compressor summary: The paper introduces a new method for graph classification using tensor learning to capture topological information, which improves performance and reduces computation compared to existing graph neural networks.


ALMs: Authorial Language Models for Authorship Attribution

http://arxiv.org/abs/2401.12005v1

Compressor summary: Authorial Language Models (ALMs) is a method that identifies the most likely author of a document by measuring its perplexity using causal language models fine-tuned on candidates' writings, achieving high accuracy on both Blogs50 and CCAT50 datasets.
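
The attribution rule stated here (the candidate whose fine-tuned LM assigns the document the lowest perplexity wins) can be sketched directly. The helper below assumes you already have one causal LM fine-tuned per candidate author and a perplexity function; the names are illustrative, not the paper's code.

```python
# Illustrative sketch of perplexity-based authorship attribution:
# the candidate whose fine-tuned LM gives the document the lowest
# perplexity is predicted as the author.
def attribute_author(document, candidate_models, perplexity_fn):
    """candidate_models: dict mapping author name -> fine-tuned causal LM.
    perplexity_fn(model, text): perplexity of `text` under `model`."""
    scores = {
        author: perplexity_fn(model, document)
        for author, model in candidate_models.items()
    }
    return min(scores, key=scores.get)  # lowest perplexity = most likely author
```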


HgbNet: predicting hemoglobin level/anemia degree from EHR data

http://arxiv.org/abs/2401.12002v1

Compressor summary: HgbNet is a machine learning-based model that predicts hemoglobin levels and anemia degree from electronic health records (EHRs) data, addressing challenges such as missing values and irregular time intervals, and improving the quality of life for affected individuals.


Modeling Stereo-Confidence Out of the End-to-End Stereo-Matching Network via Disparity Plane Sweep

http://arxiv.org/abs/2401.12001v1

Compressor summary: The text introduces a new way to measure how confident stereo-matching networks are in their results by comparing multiple disparity maps and using them as a 3-D volume, which can improve the performance of learning-based approaches.


Integrating Statistical Significance and Discriminative Power in Pattern Discovery

http://arxiv.org/abs/2401.12000v1

Compressor summary: The authors propose a method to improve pattern discovery algorithms by integrating statistical significance and discriminative power criteria, and demonstrate its effectiveness on triclustering tasks using multivariate time series data.


Expert-Driven Monitoring of Operational ML Models

http://arxiv.org/abs/2401.11993v1

Compressor summary: Expert Monitoring is a method that uses domain knowledge to improve machine learning models' ability to detect and handle changes in their input data.


Scaling Face Interaction Graph Networks to Real World Scenes

http://arxiv.org/abs/2401.11985v1

Compressor summary: Key points:
- Learned simulators based on graph networks can capture complex real dynamics such as contact and friction
- Applying them to real scenes requires handling large numbers of objects with complicated 3D shapes and inputs from perception
- The method introduces a memory-efficient simulation model and a perceptual interface in the form of editable NeRFs
- The method retains accuracy while using less memory than previous graph-based simulators and can apply to real world scenes from multiple camera angles
Summary: The paper presents a memory-efficient method for applying learned simulators based on graph networks to real world scenes using perceptual information and editable NeRFs, which preserves accuracy while reducing memory requirements.


Cross-Validation Conformal Risk Control

http://arxiv.org/abs/2401.11974v1

Compressor summary: CV-CRC is a new method for conformal risk control that uses cross-validation instead of validation and allows for more risk functions while reducing the average set size in limited data scenarios.


Synergizing Machine Learning & Symbolic Methods: A Survey on Hybrid Approaches to Natural Language Processing

http://arxiv.org/abs/2401.11972v1

Compressor summary: The text summarizes hybrid approaches in Natural Language Processing (NLP) that combine machine learning and symbolic methods to overcome their individual weaknesses and enhance their strengths, covering various tasks and resources, as well as challenges and future directions.


Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research

http://arxiv.org/abs/2401.11969v1

Compressor summary: Key points:
- The text discusses automated fact-checking, especially for multilingual data and methods.
- It surveys existing research on detecting claims needing verification across different platforms and languages.
- It categorizes the research into three factors: verifiability, priority, and similarity.
Summary: The text reviews automated fact-checking methods for multilingual claims detection, covering existing datasets, challenges, and categorization of research factors.


Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method

http://arxiv.org/abs/2401.11960v1

Compressor summary: The paper proposes a new downscaling method for meteorological fields that uses hypernetworks and multi-scale observational priors, achieving better results than previous deep learning approaches.


RUMBoost: Gradient Boosted Random Utility Models

http://arxiv.org/abs/2401.11954v1

Compressor summary: The paper proposes a new discrete choice modelling approach called RUMBoost that combines deep learning and Random Utility Models for better predictive ability and interpretability, with applications to mode choice data in London.


Feature Denoising Diffusion Model for Blind Image Quality Assessment

http://arxiv.org/abs/2401.11949v1

Compressor summary: The paper proposes a new method, PFD-IQA, to remove noise from quality-aware features in blind image quality assessment using a diffusion model and perceptual prior discovery and aggregation.


CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

http://arxiv.org/abs/2401.11944v1

Compressor summary: CMMMU is a Chinese benchmark for evaluating large multimodal models' advanced knowledge and reasoning abilities in various disciplines using 12k questions from different sources.


Benchmarking Large Multimodal Models against Common Corruptions

http://arxiv.org/abs/2401.11943v1

Compressor summary: The report evaluates how well large multimodal models handle common corruptions in text, image, and speech tasks, and introduces a new benchmark called MMCBench.


Low-Tubal-Rank Tensor Recovery via Factorized Gradient Descent

http://arxiv.org/abs/2401.11940v1

Compressor summary: The paper proposes a fast and effective method for recovering low-tubal-rank tensors from corrupted measurements using a factorization technique and factorized gradient descent, without requiring tensor Singular Value Decomposition or precise tubal-rank estimation.


The Bigger the Better? Rethinking the Effective Model Scale in Long-term Time Series Forecasting

http://arxiv.org/abs/2401.11929v1

Compressor summary: The HDformer is a lightweight Transformer variant for long-term time series forecasting that achieves high accuracy with over 99% fewer parameters than existing models by using conditional correlation and auto-correlation to eliminate redundancies in the input data.


A Saliency Enhanced Feature Fusion based multiscale RGB-D Salient Object Detection Network

http://arxiv.org/abs/2401.11914v1

Compressor summary: The paper presents a new feature fusion module for multiscale CNNs in RGB-D saliency detection, which enhances features using saliency maps and achieves better results than existing methods.


Large receptive field strategy and important feature extraction strategy in 3D object detection

http://arxiv.org/abs/2401.11913v1

Compressor summary: The study proposes a Dynamic Feature Fusion Module (DFFM) to expand the 3D convolutional kernel's receptive field and a Feature Selection Module (FSM) to eliminate redundant features for better object detection in autonomous driving using LiDAR point clouds.


Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts for Open-Domain QA?

http://arxiv.org/abs/2401.11911v1

Compressor summary: The study investigates LLMs' ability to integrate generated and retrieved contexts and finds a significant bias towards generated contexts due to similarity and segmentation issues.


Considerations on Approaches and Metrics in Automated Theorem Generation/Finding in Geometry

http://arxiv.org/abs/2401.11905v1

Compressor summary: The paper explores different methods for automatically discovering and ranking geometric theorems, while acknowledging that judging their interestingness is a complex and non-deterministic task.


Automation of Triangle Ruler-and-Compass Constructions Using Constraint Solvers

http://arxiv.org/abs/2401.11903v1

Compressor summary: The paper proposes a method for automated triangle construction using finite-domain constraint solvers and shows its advantages over dedicated tools in terms of efficiency and optimality.


Automated Completion of Statements and Proofs in Synthetic Geometry: an Approach based on Constraint Solving

http://arxiv.org/abs/2401.11898v1

Compressor summary: The paper presents a framework for completing incomplete conjectures and proofs in mathematical practice using synthetic geometry, coherent logic, and constraint solving.


PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety

http://arxiv.org/abs/2401.11880v1

Compressor summary: The text proposes a psychology-based framework to identify, mitigate, and evaluate safety risks in multi-agent systems involving large language models, highlighting collective dangerous behaviors and self-reflection.


Evaluating the Feasibility of Standard Facial Expression Recognition in Individuals with Moderate to Severe Intellectual Disabilities

http://arxiv.org/abs/2401.11877v1

Compressor summary: The study explores using deep learning to recognize facial expressions in people with intellectual disabilities and shows that tailored training methods can help machines understand their unique emotions.


Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

http://arxiv.org/abs/2401.11874v1

Compressor summary: The paper proposes a tree construction method to analyze the hierarchical structure of documents created with structured formats, such as LaTeX, Word, or HTML, and introduces a new benchmark (Comp-HRDoc) for evaluating the approach.


Toward Semantic Interoperability of Electronic Health Records

http://arxiv.org/abs/2401.11865v1

Compressor summary: The paper proposes an ontology-based approach to achieve semantic interoperability of electronic health records by focusing on medical diagnoses statements and using canonical ontologies, modules, and mapping axioms.


Improving Small Language Models' Mathematical Reasoning via Mix Thoughts Distillation

http://arxiv.org/abs/2401.11864v1

Compressor summary: The paper proposes EoTD and MTD techniques to compress advanced LLMs into smaller SLMs without losing their reasoning capabilities, and shows improved performance in fine-tuning SLMs for equation-based and multiple thought processes.


A Review of Physics-Informed Machine Learning Methods with Applications to Condition Monitoring and Anomaly Detection

http://arxiv.org/abs/2401.11860v1

Compressor summary: The study surveys physics-informed machine learning (PIML) techniques for condition monitoring, highlighting their advantages, limitations, and case studies, and suggests they will improve maintenance and reliability in engineering systems.


The Right Model for the Job: An Evaluation of Legal Multi-Label Classification Baselines

http://arxiv.org/abs/2401.11852v1

Compressor summary: The authors evaluate various multi-label classification methods on legal datasets and find DistilRoBERTa, LegalBERT, and T5 to perform well, while the CrossEncoder offers potential for improvement but is more computationally expensive.


Self-Labeling the Job Shop Scheduling Problem

http://arxiv.org/abs/2401.11849v1

Compressor summary: The paper introduces a Self-Supervised training method for combinatorial problems that uses generative models, avoids costly target solutions, and improves performance on Job Shop Scheduling (JSP).


ExtruOnt: An ontology for describing a type of manufacturing machine for Industry 4.0 systems

http://arxiv.org/abs/2401.11848v1

Compressor summary: The paper introduces ExtruOnt, an ontology for describing extruders, a type of manufacturing machine, and explains its modules and development process.


SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning

http://arxiv.org/abs/2401.11847v1

Compressor summary: SignVTCL is a sign language recognition framework that uses visual-textual contrastive learning to improve multi-modal data integration and alignment for better performance.


Adaptive Fusion of Multi-view Remote Sensing data for Optimal Sub-field Crop Yield Prediction

http://arxiv.org/abs/2401.11844v1

Compressor summary: Key points:
- Crop yield prediction is complex and depends on multiple factors
- Multi-view learning approach combines heterogeneous data sources (optical images, weather data, soil properties, topography)
- Multi-view Gated Fusion (MVGF) model adaptively fuses view-specific representations with learned weights
- MVGF outperforms conventional models and achieves best results by incorporating all data sources
Summary: The authors propose a novel multi-view learning approach, MVGF, that effectively fuses heterogeneous data sources to predict crop yield for different crops and regions.


Learning to Approximate Adaptive Kernel Convolution on Graphs

http://arxiv.org/abs/2401.11840v1

Compressor summary: The paper proposes a diffusion learning framework that adapts the feature aggregation range using a scale parameter, overcomes limitations of conventional GNNs, and achieves state-of-the-art performance on node-wise classification tasks.


AI for social science and social science of AI: A Survey

http://arxiv.org/abs/2401.11839v1

Compressor summary: The text discusses how recent advancements in artificial intelligence, especially large language models, have led to a reevaluation of general AI possibilities and increased interest in combining AI with social science research, focusing on enhancing social science methods and studying AI as a social entity.


Unveiling the Human-like Similarities of Automatic Facial Expression Recognition: An Empirical Exploration through Explainable AI

http://arxiv.org/abs/2401.11835v1

Compressor summary: The study compares deep learning models for facial expression recognition and their similarity to human perception using heatmaps and finds limited alignment between humans and AIs, with pre-training affecting the results.


A Fair Evaluation of Various Deep Learning-Based Document Image Binarization Approaches

http://arxiv.org/abs/2401.11831v1

Compressor summary: The paper compares different deep learning methods for document image binarization using DIBCO datasets and provides public resources for further research.


Rethinking Centered Kernel Alignment in Knowledge Distillation

http://arxiv.org/abs/2401.11824v1

Compressor summary: The paper proposes Relation-Centered Kernel Alignment (RCKA), a novel framework to improve knowledge distillation using centered kernel alignment (CKA) by connecting it to maximum mean discrepancy (MMD) and customizing its application for different tasks.


SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese

http://arxiv.org/abs/2401.11819v1

Compressor summary: SC-Math6 is a challenging dataset for testing mathematical reasoning skills of Chinese language models, with top models like GPT-4 performing well.


Hallucination is Inevitable: An Innate Limitation of Large Language Models

http://arxiv.org/abs/2401.11817v1

Compressor summary: The paper shows that hallucination in large language models is unavoidable due to computational limitations and discusses its implications for safety and mitigation strategies.


Symbrain: A large-scale dataset of MRI images for neonatal brain symmetry analysis

http://arxiv.org/abs/2401.11814v1

Compressor summary: The paper introduces a labeled dataset of neonatal brain MRI images that can help train models to detect anomalies and diagnose clinical pathologies by analyzing symmetry patterns.


Generalization and Informativeness of Conformal Prediction

http://arxiv.org/abs/2401.11810v1

Compressor summary: This paper connects the generalization error of a base predictor to the informativeness of its conformal prediction sets and derives an upper bound on their expected size.


Knowledge Distillation on Spatial-Temporal Graph Convolutional Network for Traffic Prediction

http://arxiv.org/abs/2401.11798v1

Compressor summary: The paper proposes a cost function for knowledge distillation to improve real-time traffic prediction using a spatio-temporal graph neural network while reducing its parameters and execution time.


Local Agnostic Video Explanations: a Study on the Applicability of Removal-Based Explanations to Video

http://arxiv.org/abs/2401.11796v1

Compressor summary: The paper presents a framework for explaining deep learning video models by adapting six existing techniques and evaluating their performance.


SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

http://arxiv.org/abs/2401.11791v1

Compressor summary: The paper proposes a method called SemPLeS that uses semantic prompt learning to improve weakly-supervised semantic segmentation by enhancing the alignment between segmented regions and target object categories.


Deep Learning for Computer Vision based Activity Recognition and Fall Detection of the Elderly: a Systematic Review

http://arxiv.org/abs/2401.11790v1

Compressor summary: Key points:
- The text discusses a systematic review of literature on AAL systems for elderly people's safety
- The focus is on fall detection and HAR using DL approaches on computer vision data
- The text provides data collections, strengths, weaknesses, and recommendations for future works
Summary: The text reviews AAL systems that use DL methods to detect falls and recognize human activities for elderly safety, and offers data, analysis, and suggestions for further research.


Full-Body Motion Reconstruction with Sparse Sensing from Graph Perspective

http://arxiv.org/abs/2401.11783v1

Compressor summary: The paper proposes a novel framework for full-body motion reconstruction from sparse sensor data using a Body Pose Graph that captures temporal and spatial features of human joints.


Collaborative Position Reasoning Network for Referring Image Segmentation

http://arxiv.org/abs/2401.11775v1

Compressor summary: The paper proposes a new method for referring image segmentation using Collaborative Position Reasoning Network (CPRN) with Row-and-Column and Guided Holistic interactive modules to explicitly model entity localization and achieve accurate segmentation.


LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation Learning

http://arxiv.org/abs/2401.11772v1

Compressor summary: LightDiC is a scalable directed graph neural network that uses the magnetic Laplacian and achieves high performance on various downstream tasks with fewer parameters and faster training.


ADA-GNN: Atom-Distance-Angle Graph Neural Network for Crystal Material Property Prediction

http://arxiv.org/abs/2401.11768v1

Compressor summary: The authors propose a method to predict properties of crystal materials using a graph neural network that considers both bond distances and angles, while reducing the time cost by partitioning neighbors into different scales.


Concealed Object Segmentation with Hierarchical Coherence Modeling

http://arxiv.org/abs/2401.11767v1

Compressor summary: The paper proposes a Hierarchical Coherence Modeling (HCM) segmenter for concealed object segmentation (COS), which improves feature coherence and uses a reversible re-calibration decoder to detect previously undetected parts in low-confidence regions.


Towards Effective and General Graph Unlearning via Mutual Evolution

http://arxiv.org/abs/2401.11760v1

Compressor summary: MEGU is a new graph unlearning method that evolves both prediction and unlearning capabilities, improving efficiency and performance on various graph tasks.


From Knowledge Organization to Knowledge Representation and Back

http://arxiv.org/abs/2401.11753v1

Compressor summary: The paper compares and combines Knowledge Organization and Knowledge Representation methods for modeling knowledge in different domains and showcases their integration in a real-world application.


Boosting Multi-view Stereo with Late Cost Aggregation

http://arxiv.org/abs/2401.11751v1

Compressor summary: Late aggregation preserves pairwise costs and enables more accurate Multi-view Stereo estimations by fully utilizing geometric matching cues without losing cost fidelity.


Multi-level Cross-modal Alignment for Image Clustering

http://arxiv.org/abs/2401.11740v1

Compressor summary: Key points:
- Cross-modal pretraining model can produce pseudo-labels for image clustering
- Erroneous alignments can degrade clustering performance
- Multi-level Cross-modal Alignment method improves alignments by building a smaller but better semantic space and aligning at three levels
- Theoretical and experimental results support the effectiveness of the new method
Summary: The paper proposes a novel method to improve cross-modal pretraining for image clustering by aligning images and texts at different levels, achieving better performance with theoretical and empirical evidence.


EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models

http://arxiv.org/abs/2401.11739v1

Compressor summary: The authors propose a method to generate fine-grained image segmentation masks from pre-trained diffusion models without additional training by exploiting the generation process and semantic correspondences between pixels and low-dimensional feature maps.


MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation

http://arxiv.org/abs/2401.11738v1

Compressor summary: MetaSeg is a meta learning based semantic segmentation method that uses a content-aware meta-net to identify and ignore noisy labels in pseudo segmentation labels, improving performance with a decoupled training strategy.


Colorectal Polyp Segmentation in the Deep Learning Era: A Comprehensive Survey

http://arxiv.org/abs/2401.11734v1

Compressor summary: The paper provides a comprehensive review of deep learning-based colorectal polyp segmentation methods, covering network architectures, supervision levels, learning paradigms, datasets, evaluation metrics, and current challenges and trends.


Detecting Out-of-Distribution Samples via Conditional Distribution Entropy with Optimal Transport

http://arxiv.org/abs/2401.11726v1

Compressor summary: The paper proposes a novel OOD detection method using conditional distribution entropy based on optimal transport to utilize both training and test input distributions effectively.


Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language Conversion for Language Models

http://arxiv.org/abs/2401.11725v1

Compressor summary: S2L helps large language models understand and reason with symbols by converting them to natural language representations.


Augmenting Prototype Network with TransMix for Few-shot Hyperspectral Image Classification

http://arxiv.org/abs/2401.11724v1

Compressor summary: APNT uses TransMix data augmentation to improve few-shot hyperspectral image classification by generating synthetic boundary patches and labels.


Graph Condensation: A Survey

http://arxiv.org/abs/2401.11720v1

Compressor summary: The text introduces graph condensation as an innovative solution to address challenges in working with large graphs, provides an overview of existing research on graph condensation, and discusses its applications and future directions.


SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation

http://arxiv.org/abs/2401.11719v1

Compressor summary: Our method improves semantic segmentation by calibrating shared features in classifier weights using class prototypes and a Multi-Scaled Distribution-Weighted loss, addressing the issue of long-tailed distribution affecting CAM quality.


MsSVT++: Mixed-scale Sparse Voxel Transformer with Center Voting for 3D Object Detection

http://arxiv.org/abs/2401.11718v1

Compressor summary: MsSVT++ is a 3D object detection method that uses mixed-scale features and a divide-and-conquer approach to capture long-range and fine-grained information, improving accuracy in outdoor scenes.


Medical Image Debiasing by Learning Adaptive Agreement from a Biased Council

http://arxiv.org/abs/2401.11713v1

Compressor summary: Ada-ABC is a debiasing framework for medical image classification that uses multiple classifiers to learn dataset bias and guides a debiasing model to agree or disagree with them on correctly and incorrectly predicted samples, improving accuracy and fairness.


HG3-NeRF: Hierarchical Geometric, Semantic, and Photometric Guided Neural Radiance Fields for Sparse View Inputs

http://arxiv.org/abs/2401.11711v1

Compressor summary: HG3-NeRF is a novel method for improving NeRF's performance in synthesizing scenes with sparse view inputs by incorporating geometric, semantic, and photometric guidance.


Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

http://arxiv.org/abs/2401.11708v1

Compressor summary: The paper introduces RPG, a text-to-image generation/editing framework that uses multimodal LLMs for chain-of-thought reasoning to enhance compositional image generation and editing.


EK-Net:Real-time Scene Text Detection with Expand Kernel Distance

http://arxiv.org/abs/2401.11704v1

Compressor summary: EK-Net is a new method for scene text detection that compensates for the shrunken-kernel problem and achieves high accuracy and speed.


Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers

http://arxiv.org/abs/2401.11700v1

Compressor summary: The study proposes a new way to transfer knowledge from a BERT language model to an ASR model using intermediate layers and an attention decoder, improving accuracy and parallel decoding.


Admission Prediction in Undergraduate Applications: an Interpretable Deep Learning Approach

http://arxiv.org/abs/2401.11698v1

Compressor summary: The authors propose deep learning models for validating undergraduate admissions decisions and incorporate an interpretability module to improve fairness and accuracy.


Parametric Matrix Models

http://arxiv.org/abs/2401.11694v1

Compressor summary: Parametric matrix models are machine learning algorithms that use matrix equations to approximate solutions efficiently and can be applied to various problems without needing high-fidelity model calculations.


Memory-Efficient Prompt Tuning for Incremental Histopathology Classification

http://arxiv.org/abs/2401.11674v1

Compressor summary: The text proposes a memory-efficient method to improve histopathology classification by attaching lightweight prompts to the initial model and updating them incrementally with a style-augmented graph attention network.


MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo

http://arxiv.org/abs/2401.11673v1

Compressor summary: The paper introduces MVSFormer++, a transformer-based model with attention mechanisms that enhances different components of the MVS pipeline by infusing cross-view information and optimizing various design details.


An Improved Grey Wolf Optimization Algorithm for Heart Disease Prediction

http://arxiv.org/abs/2401.11669v1

Compressor summary: The paper proposes a novel algorithm that combines adaptive curve grey wolf optimization and neural network backpropagation to improve medical image processing and achieve better heart disease prediction.


INCPrompt: Task-Aware incremental Prompting for Rehearsal-Free Class-incremental Learning

http://arxiv.org/abs/2401.11667v1

Compressor summary: INCPrompt is a novel approach to continual learning that prevents catastrophic forgetting by using adaptive key-learner and task-aware prompts to encode general and specific knowledge.


P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer

http://arxiv.org/abs/2401.11666v1

Compressor summary: P2DT is a novel method to reduce catastrophic forgetting in reinforcement learning by dynamically appending decision tokens during new task training, leveraging knowledge from previous tasks and enhancing transformer-based models.


Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM

http://arxiv.org/abs/2401.11664v1

Compressor summary: This paper proposes a zero-space fault protection mechanism for ReRAM-based DNNs that uses structure pruning, weight duplication, and embedding duplicated MSBs to reduce prediction errors due to hardware failures.


Differentiable Tree Search in Latent State Space

http://arxiv.org/abs/2401.11660v1

Compressor summary: Differentiable Tree Search (DTS) is a new neural network architecture that improves decision-making with limited data by combining a world model with an online search algorithm, resulting in better performance than other methods in Procgen games and grid navigation tasks.


ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition

http://arxiv.org/abs/2401.11654v1

Compressor summary: The authors propose a new approach for zero-shot action recognition using rich video descriptions from a large-scale dataset (ActionHub), and a novel framework (CoCo) that aligns features across actions and modalities.


PointGL: A Simple Global-Local Framework for Efficient Point Cloud Analysis

http://arxiv.org/abs/2401.11650v1

Compressor summary: PointGL is a fast and efficient point cloud analysis architecture that uses global point embedding and local graph pooling to reduce computational redundancy.


M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

http://arxiv.org/abs/2401.11649v1

Compressor summary: Key points:
- The paper introduces a new framework called M2-CLIP that uses multimodal adapters and a multi-task decoder to improve video action recognition.
- The framework preserves high supervised performance and robust transferability.
Summary: The paper presents M2-CLIP, a novel framework for video action recognition that enhances visual and text representations with multimodal adapters and leverages multiple tasks in the decoder for strong performance and generalization.


Next Visit Diagnosis Prediction via Medical Code-Centric Multimodal Contrastive EHR Modelling with Hierarchical Regularisation

http://arxiv.org/abs/2401.11648v1

Compressor summary: NECHO is a new framework that uses medical codes and other EHR data to predict future diagnoses, while considering the hierarchy and diversity in EHR data.


Friends Across Time: Multi-Scale Action Segmentation Transformer for Surgical Phase Recognition

http://arxiv.org/abs/2401.11644v1

Compressor summary: Key points:
- The paper proposes two transformers (MS-AST and MS-ASCT) for surgical phase recognition using spatial and temporal features
- The method uses multi-scale temporal self-attention and cross-attention to model temporal relationships
- The method achieves new state-of-the-art results on online and offline surgical datasets and non-medical video action segmentation datasets
Summary: The paper introduces two transformers that use spatial and temporal features to recognize surgical phases, with multi-scale attention for capturing temporal relationships, and shows their superior performance on various datasets.


Revolutionizing Finance with LLMs: An Overview of Applications and Insights

http://arxiv.org/abs/2401.11641v1

Compressor summary: Large Language Models (LLMs) like ChatGPT are being used in financial tasks such as report generation, market forecasting, sentiment analysis, and personalized advice, improving efficiency and customer satisfaction in the finance industry.


Zoom-shot: Fast and Efficient Unsupervised Zero-Shot Transfer of CLIP to Vision Encoders with Multimodal Loss

http://arxiv.org/abs/2401.11633v1

Compressor summary: Zoom-shot is a novel method that transfers CLIP's zero-shot capabilities to any pre-trained vision encoder using multimodal losses, enabling efficient and unsupervised Vision-Language Model development.