This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-23, generated by Compressor, my personal LLM-based project.
http://arxiv.org/abs/2401.12217v1
Compressor summary: S-Seg is a novel model that trains a MaskFormer using pseudo-masks and language for open-vocabulary semantic segmentation without relying on image-level VL models, ground truth masks, or custom grouping encoders.
http://arxiv.org/abs/2401.12215v1
Compressor summary: The paper investigates using parameter-efficient fine-tuning (PEFT) for transfer learning on chest radiography foundation models, showing its effectiveness compared to full-parameter fine-tuning and setting new state-of-the-art results.
http://arxiv.org/abs/2401.12210v1
Compressor summary: This paper introduces a new Bangla Sign Language dataset and two recognition models, showing lexical similarity with other sign languages and the need for more research on low-resource sign languages.
http://arxiv.org/abs/2401.12208v1
Compressor summary: The text introduces CheXinstruct, a large dataset for chest X-ray (CXR) interpretation; CheXagent, a foundation model (FM) that analyzes and summarizes CXRs; and CheXbench, a benchmark to evaluate FMs on CXR interpretation tasks.
http://arxiv.org/abs/2401.12205v1
Compressor summary: The paper proposes a method (ABC-RL) that adjusts recommendations from pre-trained agents for logic synthesis, improving circuit quality and reducing runtime significantly.
http://arxiv.org/abs/2401.12200v1
Compressor summary: APT adaptively prunes and tunes parameters for language models, improving both training and inference efficiency.
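APT's adaptive pruning-and-tuning schedule is specific to the paper, but the simplest non-adaptive building block it improves on is plain magnitude pruning, which can be sketched in a few lines (toy illustration only; APT decides what to prune and tune adaptively during fine-tuning, which this sketch does not model):

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights.

    A basic magnitude-pruning step for illustration; APT itself chooses
    which parameters to prune and tune adaptively during fine-tuning.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Prune half of a small "layer": the two smallest weights are zeroed.
print(magnitude_prune([0.05, -0.9, 0.3, -0.02]))  # [0.0, -0.9, 0.3, 0.0]
```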
http://arxiv.org/abs/2401.12198v1
Compressor summary: The paper reports on the LONEStar experiment, which used optical observations of celestial bodies to navigate a spacecraft in heliocentric space after a failed lunar mission.
http://arxiv.org/abs/2401.12192v1
Compressor summary: This paper explores the security risks of multilingual language models due to embedding inversion attacks and calls for more research on their prevention.
http://arxiv.org/abs/2401.12187v1
Compressor summary: Weight Averaged Reward Models (WARM) improve large language model predictions by averaging fine-tuned reward models, addressing challenges like distribution shifts and preference inconsistencies in reinforcement learning.
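The weight-averaging core of WARM can be sketched in a few lines: given several reward models fine-tuned from a shared initialization, average their parameters element-wise. This is a minimal toy with flat Python lists standing in for transformer checkpoints:

```python
def average_weights(models):
    """Element-wise mean of several parameter vectors: a toy stand-in
    for averaging fine-tuned reward-model checkpoints as in WARM."""
    n = len(models)
    return [sum(params) / n for params in zip(*models)]

# Three "fine-tuned reward models", each a flat parameter vector.
m1 = [0.2, 1.0, -0.5]
m2 = [0.4, 0.8, -0.3]
m3 = [0.0, 1.2, -0.1]
warm = average_weights([m1, m2, m3])  # elementwise mean, about [0.2, 1.0, -0.3]
```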
http://arxiv.org/abs/2401.12181v1
Compressor summary: The paper investigates whether individual neurons in GPT2 models have consistent functions across different random seeds and finds that some universal neurons exist with clear interpretations.
http://arxiv.org/abs/2401.12178v1
Compressor summary: The paper proposes a general program using in-context learning and retrievers to solve multi-label classification problems with thousands of classes efficiently, achieving state-of-the-art results on several benchmarks.
http://arxiv.org/abs/2401.12176v1
Compressor summary: This paper presents a novel real-time framework that uses deep learning to detect abnormal behaviors in cage-free poultry houses, such as inactive broilers and huddling behavior.
http://arxiv.org/abs/2401.12175v1
Compressor summary: Human-LRM is a model that predicts 3D human NeRFs from single images using a conditional diffusion strategy and outperforms prior methods.
http://arxiv.org/abs/2401.12168v1
Compressor summary:
Key points:
- The paper proposes a system to train VLMs with internet-scale 3D spatial reasoning data
- The system generates 2 billion VQA examples on real-world images
- The system improves VLMs' performance on qualitative and quantitative spatial VQA
- The system enables novel applications in chain-of-thought spatial reasoning and robotics
Summary: The paper presents a system that generates massive 3D spatial reasoning data from real images to train VLMs for better spatial VQA and new robotics applications.
http://arxiv.org/abs/2401.12164v1
Compressor summary: The paper proposes a semi-supervised segmentation method for remote sensing data using a modified canonical correlation analysis algorithm with radial basis functions and k-means clustering.
http://arxiv.org/abs/2401.12161v1
Compressor summary: The study aimed to develop an automatic facial recognition system using deep learning to assess pain in individuals with cerebral palsy, showing promising results and highlighting the need for a larger dataset specific to this population.
http://arxiv.org/abs/2401.12143v1
Compressor summary: This paper investigates the representation degeneration phenomenon, known as anisotropy, in Transformers and shows it affects various tasks and modalities, suggesting it's inherent to these models.
http://arxiv.org/abs/2401.12132v1
Compressor summary: The study compared quantum and classical neural networks for predicting Multiple Sclerosis disability using MRI data, finding that quantum models are competitive and faster to train.
http://arxiv.org/abs/2401.12129v1
Compressor summary: AbeT is a method that combines two existing scores to effectively identify Out-of-Distribution inputs in deep neural networks, reducing false positives and improving performance in various tasks.
http://arxiv.org/abs/2401.12117v1
Compressor summary: The study evaluates multi-modal large language models' abstract reasoning abilities using Raven's Matrices and improves their performance with Chain-of-Thought prompting.
http://arxiv.org/abs/2401.12113v1
Compressor summary: The paper suggests that deep ReLU networks can be seen as a type of many-valued logic and presents an algorithm to extract logical formulas from them using real-valued weights.
http://arxiv.org/abs/2401.12108v1
Compressor summary: The paper proposes an agent-based system that uses smartphone sensor data to predict and prevent delivery delays in crowdshipping by transferring tasks to more promising couriers.
http://arxiv.org/abs/2401.12097v1
Compressor summary: This paper explores how different aspects of in-context learning affect natural language generation and finds varying robustness to perturbations across large language models.
http://arxiv.org/abs/2401.12088v1
Compressor summary:
Key points:
- The paper proposes a model to identify relevant information from recipes and generate a graph representing the sequence of actions
- The model uses an unsupervised approach that learns text-to-graph and graph-to-text iteratively
- The approach is evaluated by various metrics and compared with state-of-the-art methods
Summary: The paper presents an unsupervised model that learns to generate graphs from recipes, which can then be decoded back into text and evaluated against other methods.
http://arxiv.org/abs/2401.12087v1
Compressor summary: The study explores factors affecting in-context learning (ICL) performance with large language models (LLMs), proposes a data- and model-dependent demonstration selection method, and shows its improvements and explanatory power.
http://arxiv.org/abs/2401.12086v1
Compressor summary: The paper proposes a self-training method using Best-of-N sampling to generate synthetic preference data and improve reward models for language model alignment in reinforcement learning from human feedback.
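The sampling loop behind this kind of Best-of-N preference construction is simple to sketch: draw N candidate responses, score them with a reward model, and pair the best with the worst as synthetic preference data. Everything below is a hypothetical stand-in (the toy "reward model" just prefers longer answers); it does not reproduce the paper's models or data:

```python
import random

def best_of_n_preference(prompt, generate, reward, n=4, seed=0):
    """Sample n candidate responses, score each with a reward model,
    and return a synthetic (chosen, rejected) preference pair built
    from the best- and worst-scoring candidates."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    ranked = sorted(candidates, key=reward, reverse=True)
    return ranked[0], ranked[-1]

# Toy stand-ins: the "policy" emits answers of random length and the
# "reward model" simply prefers longer ones (hypothetical).
gen = lambda prompt, rng: prompt + " answer" * rng.randint(1, 5)
rew = len
chosen, rejected = best_of_n_preference("Q:", gen, rew)
assert rew(chosen) >= rew(rejected)
```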
http://arxiv.org/abs/2401.12078v1
Compressor summary: This study examines the limitations of large language models for temporal question answering and identifies conditions under which their performance declines.
http://arxiv.org/abs/2401.12072v1
Compressor summary: The study examines how transfer learning can improve dependency parsing for Javanese, a low-resource language, using two methods that differ in the number of source languages involved.
http://arxiv.org/abs/2401.12070v1
Compressor summary: Binoculars is a novel and accurate method to detect machine-generated text using a pair of pre-trained language models without needing training data or model-specific adjustments.
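The scoring idea admits a compact sketch: compare how surprising the text is to one LM against the same quantity from a second LM, and flag text whose ratio is unusually low. This simplified version works from per-token log-probabilities only; the actual Binoculars cross-perplexity uses full next-token distributions:

```python
def binoculars_style_score(obs_logprobs, perf_logprobs):
    """Simplified two-model detection score: average negative log-prob
    under an "observer" LM divided by the same quantity under a second
    "performer" LM. (Toy sketch, not the paper's exact cross-perplexity.)"""
    ppl = -sum(obs_logprobs) / len(obs_logprobs)
    xppl = -sum(perf_logprobs) / len(perf_logprobs)
    return ppl / xppl

# Hypothetical per-token log-probabilities for one sentence under two LMs.
score = binoculars_style_score([-1.2, -0.8, -2.0, -1.5],
                               [-1.0, -0.9, -1.8, -1.6])
# In the paper's setup, low scores indicate likely machine-generated text.
```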
http://arxiv.org/abs/2401.12069v1
Compressor summary: TreeSHAP-IQ is a method that computes any-order additive Shapley interactions for predictions of tree-based models, making ensemble models more interpretable.
http://arxiv.org/abs/2401.12058v1
Compressor summary: The paper investigates how gradient methods' generalization performance depends on the problem's dimension, showing lower bounds of $\Omega(\sqrt{d})$ for full-batch GD and SGD in stochastic convex optimization settings.
http://arxiv.org/abs/2401.12051v1
Compressor summary: CloSe-D is a large dataset for 3D clothing segmentation with CloSe-Net, a learning-based model that uses local point features and attention to segment clothing from colored point clouds, and CloSe-T, a 3D interactive tool for refining labels.
http://arxiv.org/abs/2401.12039v1
Compressor summary: The paper proposes an audio-visual method for generating character-aware subtitles without face detection or tracking, using high-precision audio exemplars and speaker identity classification.
http://arxiv.org/abs/2401.12033v1
Compressor summary: Momentum-SAM (MSAM) is a new optimization algorithm for deep neural networks that combines momentum and sharpness awareness to improve training and reduce overfitting, with minimal additional computational costs compared to SGD or Adam.
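A toy version of the idea (hedged: sign and normalization conventions here are simplified, and a scalar quadratic stands in for a network) perturbs the weight along the accumulated momentum direction before the gradient is evaluated, so no extra gradient pass is needed:

```python
def msam_step(w, grad_fn, velocity, lr=0.1, mu=0.9, rho=0.001):
    """One Momentum-SAM-style step on a scalar parameter: evaluate the
    gradient at a point perturbed along the momentum direction instead
    of computing SAM's extra ascent gradient. (Toy sketch.)"""
    norm = abs(velocity) or 1.0
    w_pert = w + rho * velocity / norm   # sharpness-probing perturbation
    g = grad_fn(w_pert)                  # single gradient evaluation
    velocity = mu * velocity + g
    return w - lr * velocity, velocity

grad = lambda w: 2.0 * w                 # gradient of f(w) = w**2
w, v = 1.0, 0.0
for _ in range(200):
    w, v = msam_step(w, grad, v)
# The iterate settles into a small neighborhood of the minimum at 0.
```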
http://arxiv.org/abs/2401.12019v1
Compressor summary: The paper proposes a self-supervised method to improve monocular depth estimation by using multiple disparity maps to filter errors in pseudo-depth maps without ground truth data.
http://arxiv.org/abs/2401.12014v1
Compressor summary: The study shows that compressed neural networks are less robust to data distribution shifts than their original networks, with post-training quantization being a reliable method for improving robustness.
http://arxiv.org/abs/2401.12007v1
Compressor summary: The paper introduces a new method for graph classification using tensor learning to capture topological information, which improves performance and reduces computation compared to existing graph neural networks.
http://arxiv.org/abs/2401.12005v1
Compressor summary: Authorial Language Models (ALMs) is a method that identifies the most likely author of a document by measuring its perplexity using causal language models fine-tuned on candidates' writings, achieving high accuracy on both Blogs50 and CCAT50 datasets.
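The decision rule behind perplexity-based attribution is easy to sketch: score the document under each candidate author's language model and pick the author whose model is least surprised. Toy unigram models stand in here for the fine-tuned causal LMs the paper actually uses:

```python
import math

def attribute_author(doc_tokens, author_models):
    """Pick the candidate author whose language model assigns the lowest
    perplexity to the document (ALM-style attribution; toy unigram LMs
    stand in for fine-tuned causal LMs)."""
    def perplexity(model):
        # Unseen tokens get a small floor probability.
        logp = sum(math.log(model.get(tok, 1e-6)) for tok in doc_tokens)
        return math.exp(-logp / len(doc_tokens))
    return min(author_models, key=lambda a: perplexity(author_models[a]))

# Hypothetical per-author unigram models learned from their writings.
models = {
    "alice": {"the": 0.3, "sea": 0.4, "ship": 0.3},
    "bob":   {"the": 0.3, "code": 0.4, "bug": 0.3},
}
assert attribute_author(["the", "sea", "ship"], models) == "alice"
```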
http://arxiv.org/abs/2401.12002v1
Compressor summary: HgbNet is a machine learning-based model that predicts hemoglobin levels and anemia degree from electronic health records (EHRs) data, addressing challenges such as missing values and irregular time intervals, and improving the quality of life for affected individuals.
http://arxiv.org/abs/2401.12001v1
Compressor summary: The text introduces a new way to measure how confident stereo-matching networks are in their results by comparing multiple disparity maps and using them as a 3-D volume, which can improve the performance of learning-based approaches.
http://arxiv.org/abs/2401.12000v1
Compressor summary: The authors propose a method to improve pattern discovery algorithms by integrating statistical significance and discriminative power criteria, and demonstrate its effectiveness on triclustering tasks using multivariate time series data.
http://arxiv.org/abs/2401.11993v1
Compressor summary: Expert Monitoring is a method that uses domain knowledge to improve machine learning models' ability to detect and handle changes in their input data.
http://arxiv.org/abs/2401.11985v1
Compressor summary:
Key points:
- Learned simulators based on graph networks can capture complex real dynamics such as contact and friction
- Applying them to real scenes requires handling large numbers of objects with complicated 3D shapes and inputs from perception
- The method introduces a memory-efficient simulation model and a perceptual interface in the form of editable NeRFs
- The method retains accuracy while using less memory than previous graph-based simulators and can apply to real-world scenes from multiple camera angles
Summary: The paper presents a memory-efficient method for applying learned simulators based on graph networks to real-world scenes using perceptual information and editable NeRFs, which preserves accuracy while reducing memory requirements.
http://arxiv.org/abs/2401.11974v1
Compressor summary: CV-CRC is a new method for conformal risk control that uses cross-validation instead of validation and allows for more risk functions while reducing the average set size in limited data scenarios.
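As background, the split-conformal baseline that CV-CRC generalizes is short enough to sketch: compute a calibrated quantile of nonconformity scores on held-out data, then include in the prediction set every label scoring below it. (This is the standard split procedure, not the paper's cross-validation variant:)

```python
import math

def split_conformal_quantile(scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration nonconformity score. (Standard baseline that CV-CRC
    improves on by reusing data through cross-validation.)"""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(scores)[min(k, n) - 1]

cal = [0.1, 0.4, 0.2, 0.9, 0.3, 0.7, 0.5, 0.8, 0.6, 1.0]
q = split_conformal_quantile(cal, alpha=0.2)
# At test time, the prediction set is every label with score <= q.
assert q == 0.9
```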
http://arxiv.org/abs/2401.11972v1
Compressor summary: The text summarizes hybrid approaches in Natural Language Processing (NLP) that combine machine learning and symbolic methods to overcome their individual weaknesses and enhance their strengths, covering various tasks and resources, as well as challenges and future directions.
http://arxiv.org/abs/2401.11969v1
Compressor summary:
Key points:
- The text discusses automated fact-checking, especially for multilingual data and methods.
- It surveys existing research on detecting claims needing verification across different platforms and languages.
- It categorizes the research into three factors: verifiability, priority, and similarity.
Summary: The text reviews automated fact-checking methods for multilingual claims detection, covering existing datasets, challenges, and categorization of research factors.
http://arxiv.org/abs/2401.11960v1
Compressor summary: The paper proposes a new downscaling method for meteorological fields that uses hypernetworks and multi-scale observational priors, achieving better results than previous deep learning approaches.
http://arxiv.org/abs/2401.11954v1
Compressor summary: The paper proposes a new discrete choice modelling approach called RUMBoost that combines deep learning and Random Utility Models for better predictive ability and interpretability, with applications to mode choice data in London.
http://arxiv.org/abs/2401.11949v1
Compressor summary: The paper proposes a new method, PFD-IQA, to remove noise from quality-aware features in blind image quality assessment using a diffusion model and perceptual prior discovery and aggregation.
http://arxiv.org/abs/2401.11944v1
Compressor summary: CMMMU is a Chinese benchmark for evaluating large multimodal models' advanced knowledge and reasoning abilities in various disciplines using 12k questions from different sources.
http://arxiv.org/abs/2401.11943v1
Compressor summary: The report evaluates how well large multimodal models handle common corruptions in text, image, and speech tasks, and introduces a new benchmark called MMCBench.
http://arxiv.org/abs/2401.11940v1
Compressor summary: The paper proposes a fast and effective method for recovering low-tubal-rank tensors from corrupted measurements using a factorization technique and factorized gradient descent, without requiring tensor Singular Value Decomposition or precise tubal-rank estimation.
http://arxiv.org/abs/2401.11929v1
Compressor summary: The HDformer is a lightweight Transformer variant for long-term time series forecasting that achieves high accuracy with over 99% fewer parameters than existing models by using conditional correlation and auto-correlation to eliminate redundancies in the input data.
http://arxiv.org/abs/2401.11914v1
Compressor summary: The paper presents a new feature fusion module for multiscale CNNs in RGB-D saliency detection, which enhances features using saliency maps and achieves better results than existing methods.
http://arxiv.org/abs/2401.11913v1
Compressor summary: The study proposes a Dynamic Feature Fusion Module (DFFM) to expand the 3D convolutional kernel's receptive field and a Feature Selection Module (FSM) to eliminate redundant features for better object detection in autonomous driving using LiDAR point clouds.
http://arxiv.org/abs/2401.11911v1
Compressor summary: The study investigates LLMs' ability to integrate generated and retrieved contexts and finds a significant bias towards generated contexts due to similarity and segmentation issues.
http://arxiv.org/abs/2401.11905v1
Compressor summary: The paper explores different methods for automatically discovering and ranking geometric theorems, while acknowledging that judging their interestingness is a complex and non-deterministic task.
http://arxiv.org/abs/2401.11903v1
Compressor summary: The paper proposes a method for automated triangle construction using finite-domain constraint solvers and shows its advantages over dedicated tools in terms of efficiency and optimality.
http://arxiv.org/abs/2401.11898v1
Compressor summary: The paper presents a framework for completing incomplete conjectures and proofs in mathematical practice using synthetic geometry, coherent logic, and constraint solving.
http://arxiv.org/abs/2401.11880v1
Compressor summary: The text proposes a psychology-based framework to identify, mitigate, and evaluate safety risks in multi-agent systems involving large language models, highlighting collective dangerous behaviors and self-reflection.
http://arxiv.org/abs/2401.11877v1
Compressor summary: The study explores using deep learning to recognize facial expressions in people with intellectual disabilities and shows that tailored training methods can help machines understand their unique emotions.
http://arxiv.org/abs/2401.11874v1
Compressor summary: The paper proposes a tree construction method to analyze the hierarchical structure of documents created with structured formats, such as LaTeX, Word, or HTML, and introduces a new benchmark (Comp-HRDoc) for evaluating the approach.
http://arxiv.org/abs/2401.11865v1
Compressor summary: The paper proposes an ontology-based approach to achieve semantic interoperability of electronic health records by focusing on medical diagnoses statements and using canonical ontologies, modules, and mapping axioms.
http://arxiv.org/abs/2401.11864v1
Compressor summary: The paper proposes EoTD and MTD techniques to compress advanced LLMs into smaller SLMs without losing their reasoning capabilities, and shows improved performance in fine-tuning SLMs for equation-based and multiple thought processes.
http://arxiv.org/abs/2401.11860v1
Compressor summary: The study surveys physically informed machine learning (PIML) techniques for condition monitoring, highlighting their advantages, limitations, and case studies, and suggests they will improve maintenance and reliability in engineering systems.
http://arxiv.org/abs/2401.11852v1
Compressor summary: The authors evaluate various multi-label classification methods on legal datasets and find DistilRoBERTa, LegalBERT, and T5 to perform well, while the CrossEncoder offers potential for improvement but is more computationally expensive.
http://arxiv.org/abs/2401.11849v1
Compressor summary: The paper introduces a Self-Supervised training method for combinatorial problems that uses generative models, avoids costly target solutions, and improves performance on Job Shop Scheduling (JSP).
http://arxiv.org/abs/2401.11848v1
Compressor summary: The paper introduces ExtruOnt, an ontology for describing extruders, a type of manufacturing machine, and explains its modules and development process.
http://arxiv.org/abs/2401.11847v1
Compressor summary: SignVTCL is a sign language recognition framework that uses visual-textual contrastive learning to improve multi-modal data integration and alignment for better performance.
http://arxiv.org/abs/2401.11844v1
Compressor summary:
Key points:
- Crop yield prediction is complex and depends on multiple factors
- Multi-view learning approach combines heterogeneous data sources (optical images, weather data, soil properties, topography)
- Multi-view Gated Fusion (MVGF) model adaptively fuses view-specific representations with learned weights
- MVGF outperforms conventional models and achieves best results by incorporating all data sources
Summary: The authors propose a novel multi-view learning approach, MVGF, that effectively fuses heterogeneous data sources to predict crop yield for different crops and regions.
http://arxiv.org/abs/2401.11840v1
Compressor summary: The paper proposes a diffusion learning framework that adapts the feature aggregation range using a scale parameter, overcomes limitations of conventional GNNs, and achieves state-of-the-art performance on node-wise classification tasks.
http://arxiv.org/abs/2401.11839v1
Compressor summary: The text discusses how recent advancements in artificial intelligence, especially large language models, have led to a reevaluation of general AI possibilities and increased interest in combining AI with social science research, focusing on enhancing social science methods and studying AI as a social entity.
http://arxiv.org/abs/2401.11835v1
Compressor summary: The study compares deep learning models for facial expression recognition and their similarity to human perception using heatmaps and finds limited alignment between humans and AIs, with pre-training affecting the results.
http://arxiv.org/abs/2401.11831v1
Compressor summary: The paper compares different deep learning methods for document image binarization using DIBCO datasets and provides public resources for further research.
http://arxiv.org/abs/2401.11824v1
Compressor summary: The paper proposes Relation-Centered Kernel Alignment (RCKA), a novel framework to improve knowledge distillation using centered kernel alignment (CKA) by connecting it to maximum mean discrepancy (MMD) and customizing its application for different tasks.
http://arxiv.org/abs/2401.11819v1
Compressor summary: SC-Math6 is a challenging dataset for testing mathematical reasoning skills of Chinese language models, with top models like GPT-4 performing well.
http://arxiv.org/abs/2401.11817v1
Compressor summary: The paper shows that hallucination in large language models is unavoidable due to computational limitations and discusses its implications for safety and mitigation strategies.
http://arxiv.org/abs/2401.11814v1
Compressor summary: The paper introduces a labeled dataset of neonatal brain MRI images that can help train models to detect anomalies and diagnose clinical pathologies by analyzing symmetry patterns.
http://arxiv.org/abs/2401.11810v1
Compressor summary: This paper connects the generalization error of a base predictor to the informativeness of its conformal prediction sets and derives an upper bound on their expected size.
http://arxiv.org/abs/2401.11798v1
Compressor summary: The paper proposes a cost function for knowledge distillation to improve real-time traffic prediction using a spatio-temporal graph neural network while reducing its parameters and execution time.
http://arxiv.org/abs/2401.11796v1
Compressor summary: The paper presents a framework for explaining deep learning video models by adapting six existing techniques and evaluating their performance.
http://arxiv.org/abs/2401.11791v1
Compressor summary: The paper proposes a method called SemPLeS that uses semantic prompt learning to improve weakly-supervised semantic segmentation by enhancing the alignment between segmented regions and target object categories.
http://arxiv.org/abs/2401.11790v1
Compressor summary:
Key points:
- The text discusses a systematic review of literature on AAL systems for elderly people's safety
- The focus is on fall detection and HAR using DL approaches on computer vision data
- The text provides data collections, strengths, weaknesses, and recommendations for future works
Summary: The text reviews AAL systems that use DL methods to detect falls and recognize human activities for elderly safety, and offers data, analysis, and suggestions for further research.
http://arxiv.org/abs/2401.11783v1
Compressor summary: The paper proposes a novel framework for full-body motion reconstruction from sparse sensor data using a Body Pose Graph that captures temporal and spatial features of human joints.
http://arxiv.org/abs/2401.11775v1
Compressor summary: The paper proposes a new method for referring image segmentation using Collaborative Position Reasoning Network (CPRN) with Row-and-Column and Guided Holistic interactive modules to explicitly model entity localization and achieve accurate segmentation.
http://arxiv.org/abs/2401.11772v1
Compressor summary: LightDiC is a scalable directed graph neural network that uses the magnetic Laplacian and achieves high performance on various downstream tasks with fewer parameters and faster training.
http://arxiv.org/abs/2401.11768v1
Compressor summary: The authors propose a method to predict properties of crystal materials using a graph neural network that considers both bond distances and angles, while reducing the time cost by partitioning neighbors into different scales.
http://arxiv.org/abs/2401.11767v1
Compressor summary: The paper proposes a Hierarchical Coherence Modeling (HCM) segmenter for concealed object segmentation (COS), which improves feature coherence and uses a reversible re-calibration decoder to detect previously undetected parts in low-confidence regions.
http://arxiv.org/abs/2401.11760v1
Compressor summary: MEGU is a new graph unlearning method that evolves both prediction and unlearning capabilities, improving efficiency and performance on various graph tasks.
http://arxiv.org/abs/2401.11753v1
Compressor summary: The paper compares and combines Knowledge Organization and Knowledge Representation methods for modeling knowledge in different domains and showcases their integration in a real-world application.
http://arxiv.org/abs/2401.11751v1
Compressor summary: Late aggregation preserves pairwise costs and enables more accurate Multi-view Stereo estimations by fully utilizing geometric matching cues without losing cost fidelity.
http://arxiv.org/abs/2401.11740v1
Compressor summary:
Key points:
- Cross-modal pretraining models can produce pseudo-labels for image clustering
- Erroneous alignments can degrade clustering performance
- The Multi-level Cross-modal Alignment method improves alignments by building a smaller but better semantic space and aligning at three levels
- Theoretical and experimental results support the effectiveness of the new method
Summary: The paper proposes a novel method to improve cross-modal pretraining for image clustering by aligning images and texts at different levels, achieving better performance with theoretical and empirical evidence.
http://arxiv.org/abs/2401.11739v1
Compressor summary: The authors propose a method to generate fine-grained image segmentation masks from pre-trained diffusion models without additional training by exploiting the generation process and semantic correspondences between pixels and low-dimensional feature maps.
http://arxiv.org/abs/2401.11738v1
Compressor summary: MetaSeg is a meta learning based semantic segmentation method that uses a content-aware meta-net to identify and ignore noisy labels in pseudo segmentation labels, improving performance with a decoupled training strategy.
http://arxiv.org/abs/2401.11734v1
Compressor summary: The paper provides a comprehensive review of deep learning-based colorectal polyp segmentation methods, covering network architectures, supervision levels, learning paradigms, datasets, evaluation metrics, and current challenges and trends.
http://arxiv.org/abs/2401.11726v1
Compressor summary: The paper proposes a novel OOD detection method using conditional distribution entropy based on optimal transport to utilize both training and test input distributions effectively.
http://arxiv.org/abs/2401.11725v1
Compressor summary: S2L helps large language models understand and reason with symbols by converting them to natural language representations.
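The conversion step can be illustrated with a trivial token mapping (a hypothetical table for illustration; the paper's S2L conversions are richer than a fixed lookup):

```python
# Hypothetical symbol-to-phrase table; S2L's actual conversions are richer.
SYMBOL_NAMES = {"+": "plus", "<=": "is less than or equal to", "%": "percent"}

def symbols_to_language(expr_tokens):
    """Replace symbol tokens with natural-language names before
    handing the expression to an LLM."""
    return " ".join(SYMBOL_NAMES.get(t, t) for t in expr_tokens)

print(symbols_to_language(["x", "<=", "3"]))  # x is less than or equal to 3
```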
http://arxiv.org/abs/2401.11724v1
Compressor summary: APNT uses TransMix data augmentation to improve few-shot hyperspectral image classification by generating synthetic boundary patches and labels.
http://arxiv.org/abs/2401.11720v1
Compressor summary: The text introduces graph condensation as an innovative solution to address challenges in working with large graphs, provides an overview of existing research on graph condensation, and discusses its applications and future directions.
http://arxiv.org/abs/2401.11719v1
Compressor summary: Our method improves semantic segmentation by calibrating shared features in classifier weights using class prototypes and a Multi-Scaled Distribution-Weighted loss, addressing the issue of long-tailed distribution affecting CAM quality.
http://arxiv.org/abs/2401.11718v1
Compressor summary: MsSVT++ is a 3D object detection method that uses mixed-scale features and a divide-and-conquer approach to capture long-range and fine-grained information, improving accuracy in outdoor scenes.
http://arxiv.org/abs/2401.11713v1
Compressor summary: Ada-ABC is a debiasing framework for medical image classification that uses multiple classifiers to learn dataset bias and guides a debiasing model to agree or disagree with them on correctly and incorrectly predicted samples, improving accuracy and fairness.
http://arxiv.org/abs/2401.11711v1
Compressor summary: HG3-NeRF is a novel method for improving NeRF's performance in synthesizing scenes with sparse view inputs by incorporating geometric, semantic, and photometric guidance.
http://arxiv.org/abs/2401.11708v1
Compressor summary: The paper introduces RPG, a text-to-image generation/editing framework that uses multimodal LLMs for chain-of-thought reasoning to enhance compositional image generation and editing.
http://arxiv.org/abs/2401.11704v1
Compressor summary: EK-Net is a new method for scene text detection that compensates for the shrunken kernel problem and achieves high accuracy and speed.
http://arxiv.org/abs/2401.11700v1
Compressor summary: The study proposes a new way to transfer knowledge from a BERT language model to an ASR model using intermediate layers and an attention decoder, improving accuracy and parallel decoding.
http://arxiv.org/abs/2401.11698v1
Compressor summary: The authors propose deep learning models for validating undergraduate admissions decisions and incorporate an interpretability module to improve fairness and accuracy.
http://arxiv.org/abs/2401.11694v1
Compressor summary: Parametric matrix models are machine learning algorithms that use matrix equations to approximate solutions efficiently and can be applied to various problems without needing high-fidelity model calculations.
http://arxiv.org/abs/2401.11674v1
Compressor summary: The text proposes a memory-efficient method to improve histopathology classification by attaching lightweight prompts to the initial model and updating them incrementally with a style-augmented graph attention network.
http://arxiv.org/abs/2401.11673v1
Compressor summary: The paper introduces MVSFormer++, a transformer-based model with attention mechanisms that enhances different components of the MVS pipeline by infusing cross-view information and optimizing various design details.
http://arxiv.org/abs/2401.11669v1
Compressor summary: The paper proposes a novel algorithm that combines adaptive curve grey wolf optimization and neural network backpropagation to improve medical image processing and achieve better heart disease prediction.
http://arxiv.org/abs/2401.11667v1
Compressor summary: INCPrompt is a novel approach to continual learning that prevents catastrophic forgetting by using adaptive key-learner and task-aware prompts to encode general and specific knowledge.
http://arxiv.org/abs/2401.11666v1
Compressor summary: P2DT is a novel method to reduce catastrophic forgetting in reinforcement learning by dynamically appending decision tokens during new task training, leveraging knowledge from previous tasks and enhancing transformer-based models.
http://arxiv.org/abs/2401.11664v1
Compressor summary: This paper proposes a zero-space fault protection mechanism for ReRAM-based DNNs that uses structure pruning, weight duplication, and embedding duplicated MSBs to reduce prediction errors due to hardware failures.
http://arxiv.org/abs/2401.11660v1
Compressor summary: Differentiable Tree Search (DTS) is a new neural network architecture that improves decision-making with limited data by combining a world model with an online search algorithm, resulting in better performance than other methods in Procgen games and grid navigation tasks.
http://arxiv.org/abs/2401.11654v1
Compressor summary: The authors propose a new approach for zero-shot action recognition using rich video descriptions from a large-scale dataset (ActionHub), and a novel framework (CoCo) that aligns features across actions and modalities.
http://arxiv.org/abs/2401.11650v1
Compressor summary: PointGL is a fast and efficient point cloud analysis architecture that uses global point embedding and local graph pooling to reduce computational redundancy.
http://arxiv.org/abs/2401.11649v1
Compressor summary:
Key points:
- The paper introduces a new framework called MMClip that uses multimodal adapters and a multi-task decoder to improve video action recognition.
- The framework preserves high supervised performance and robust transferability.
Summary: The paper presents MMClip, a novel framework for video action recognition that enhances visual and text representations with multimodal adapters and leverages multiple tasks in the decoder for strong performance and generalization.
http://arxiv.org/abs/2401.11648v1
Compressor summary: NECHO is a new framework that uses medical codes and other EHR data to predict future diagnoses, while considering the hierarchy and diversity in EHR data.
http://arxiv.org/abs/2401.11644v1
Compressor summary:
Key points:
- The paper proposes two transformers (MS-AST and MS-ASCT) for surgical phase recognition using spatial and temporal features
- The method uses multi-scale temporal self-attention and cross-attention to model temporal relationships
- The method achieves new state-of-the-art results on online and offline surgical datasets and non-medical video action segmentation datasets
Summary: The paper introduces two transformers that use spatial and temporal features to recognize surgical phases, with multi-scale attention for capturing temporal relationships, and shows their superior performance on various datasets.
http://arxiv.org/abs/2401.11641v1
Compressor summary: Large Language Models (LLMs) like ChatGPT are being used in financial tasks such as report generation, market forecasting, sentiment analysis, and personalized advice, improving efficiency and customer satisfaction in the finance industry.
http://arxiv.org/abs/2401.11633v1
Compressor summary: Zoom-shot is a novel method that transfers CLIP's zero-shot capabilities to any pre-trained vision encoder using multimodal losses, enabling efficient and unsupervised Vision-Language Model development.