This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-04-02, generated by the compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2403.20331v1
Compressor summary: The paper proposes Unsolvable Problem Detection (UPD) to test Vision Language Models' ability to handle unanswerable questions in VQA tasks and explores various solutions to improve their performance.
http://arxiv.org/abs/2403.20330v1
Compressor summary: MMStar is a new benchmark for evaluating large vision-language models that requires visual content and avoids unintentional data leakage, addressing issues in current multi-modal evaluations.
http://arxiv.org/abs/2403.20329v1
Compressor summary: The paper shows how large language models can be used to improve reference resolution for different types of entities, including on-screen ones, and achieve performance comparable to or better than GPT-4.
http://arxiv.org/abs/2403.20327v1
Compressor summary: Gecko is a compact text embedding model that uses distilled knowledge from large language models to achieve strong retrieval performance.
http://arxiv.org/abs/2403.20324v1
Compressor summary: The paper presents Transformer models with cross-channel attention for localizing the epileptogenic focus using electrical stimulation responses, achieving better results than previous methods and handling different electrode placements and patient variability.
http://arxiv.org/abs/2403.20322v1
Compressor summary: The paper proposes a framework for systematically evaluating rationalizing explanations in NLP, with examples from automated fact verification.
http://arxiv.org/abs/2403.20320v1
Compressor summary: MTLoRA is a novel framework for efficient fine-tuning of multi-task learning models that achieves better accuracy and efficiency than existing methods while reducing the number of trainable parameters by 3.6x.
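MTLoRA builds on LoRA-style low-rank adaptation. As a minimal sketch of the generic LoRA update (not MTLoRA's task-specific modules, whose design is not detailed in the summary), the frozen weight is augmented with a trainable low-rank product:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Generic LoRA forward pass: y = x @ (W + alpha * A @ B).

    W is the frozen pretrained weight (d_in x d_out); only the low-rank
    factors A (d_in x r) and B (r x d_out) are trained, so the trainable
    parameter count drops from d_in*d_out to r*(d_in + d_out).
    """
    return x @ W + alpha * (x @ A) @ B

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.standard_normal((d_in, d_out))
A = rng.standard_normal((d_in, r)) * 0.01
B = np.zeros((r, d_out))       # B starts at zero, so the adapter is a no-op initially
x = rng.standard_normal((8, d_in))

y = lora_forward(x, W, A, B)
assert np.allclose(y, x @ W)   # zero-init B leaves the pretrained output unchanged
```

With these shapes the adapter trains 4 * (64 + 32) = 384 parameters instead of 64 * 32 = 2048, which is the kind of reduction the 3.6x figure refers to.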
http://arxiv.org/abs/2403.20318v1
Compressor summary: The paper investigates the problem of monocular 3D detectors generalizing to large objects and proposes SeaBird, a method that uses segmentation in bird's view with dice loss for better noise-robustness.
http://arxiv.org/abs/2403.20317v1
Compressor summary: ConvPrompt is a novel convolutional prompt creation mechanism for continuous learning that uses layer-specific embeddings, generates text descriptions for each category, and adapts the number of prompts based on task similarity, improving performance without increasing parameter overhead.
http://arxiv.org/abs/2403.20312v1
Compressor summary: This paper introduces CoN-CLIP, a framework that improves vision-language models' understanding of negations by using CC-Neg, a new dataset, and modifying CLIP's contrastive loss, leading to better zero-shot image classification and compositionality performance.
http://arxiv.org/abs/2403.20309v1
Compressor summary: InstantSplat is a framework that combines point-based representations with dense stereo models to quickly estimate camera intrinsics and extrinsics from sparse-view images, improving novel view synthesis performance.
http://arxiv.org/abs/2403.20308v1
Compressor summary: ChainNet is a lexical resource that captures how senses of words are related by metaphor or metonymy in the Open English Wordnet.
http://arxiv.org/abs/2403.20306v1
Compressor summary: The paper discusses how to optimize energy usage and performance of large language models in data centers by exploring trade-offs between various parameters.
http://arxiv.org/abs/2403.20288v1
Compressor summary: Large Language Models can assist and correct physicians in medical decision-making tasks by interacting effectively with them, depending on prompt design and accuracy.
http://arxiv.org/abs/2403.20287v1
Compressor summary: The paper introduces a framework for evaluating counterfactual image generation methods using various metrics and provides a Python package to benchmark different approaches.
http://arxiv.org/abs/2403.20284v1
Compressor summary: The paper analyzes BERT components, finds output LayerNorm is crucial for fine-tuning, and shows that fine-tuning a small part of it can achieve comparable or better results than full fine-tuning.
http://arxiv.org/abs/2403.20280v1
Compressor summary: The paper explores how masked multimodal transformers can learn robust embeddings when modalities are sparse and proposes a new attention mechanism (MCA) that improves embedding quality and task performance.
http://arxiv.org/abs/2403.20279v1
Compressor summary: Luq is a new UQ method for long text generation that helps identify and reduce nonfactual outputs in large language models.
http://arxiv.org/abs/2403.20275v1
Compressor summary: Tactile-Informed 3DGS is a new method that uses touch data and vision to create more accurate and smoother 3D object models, especially for non-Lambertian surfaces like shiny or reflective ones.
http://arxiv.org/abs/2403.20273v1
Compressor summary: CATSNet is a context-aware deep learning method that uses convolutional neural networks to estimate forest and ground heights from TomoSAR data, outperforming existing techniques by leveraging patch-based information and context within MB TomoSAR data.
http://arxiv.org/abs/2403.20271v1
Compressor summary: The paper introduces a new model and dataset for visual prompting with artificial intelligence, improving the ability of multimodal language models to understand images and follow instructions.
http://arxiv.org/abs/2403.20266v1
Compressor summary: Latxa is a large language model family for Basque with new pretraining and evaluation datasets, improving Basque LLM performance.
http://arxiv.org/abs/2403.20262v1
Compressor summary: ELITR-Bench is a new benchmark for long-context LLMs focused on a meeting assistant scenario, revealing gaps between open-source and proprietary models and limitations of GPT-4's evaluation method.
http://arxiv.org/abs/2403.20260v1
Compressor summary: The paper proposes a framework to evaluate the quality of interpretable prototype-based models for breast cancer prediction using mammography, finding that while they perform well compared to black-box models, prototype quality still needs improvement.
http://arxiv.org/abs/2403.20254v1
Compressor summary: This paper evaluates the robustness of temporal action detection methods to frame corruptions, builds two benchmarks, and proposes a simple but effective method to improve robustness using FrameDrop augmentation and Temporal-Robust Consistency loss.
http://arxiv.org/abs/2403.20253v1
Compressor summary: The paper presents MedCLIP-SAM, a novel framework that uses CLIP and SAM models to generate segmentation of medical images using text prompts in zero-shot and weakly supervised settings, improving data efficiency and generalizability.
http://arxiv.org/abs/2403.20252v1
Compressor summary: The text discusses using large language models to align with human population preferences for various applications and evaluates different fine-tuning approaches and a new loss term for this purpose.
http://arxiv.org/abs/2403.20249v1
Compressor summary: The paper proposes Relation Rectification, a method that uses Heterogeneous Graph Convolutional Networks to adjust text embeddings and improve the visual representation of relationships between objects in text-to-image diffusion models.
http://arxiv.org/abs/2403.20246v1
Compressor summary: The study proposes a method to increase interpretability of dimension-reduced biomedical data by overlaying class and feature centroids on scatter plots.
http://arxiv.org/abs/2403.20236v1
Compressor summary: LTAD is a novel method that detects defects in images from multiple and long-tailed classes without relying on class names, using reconstruction and semantic modules.
http://arxiv.org/abs/2403.20234v1
Compressor summary: The article explores the use of artificial neural networks to classify motor/sensory stimuli from electroneurographic signals in implanted nerve interfaces for neuropathy recovery.
http://arxiv.org/abs/2403.20231v1
Compressor summary: The study proposes a new method for fine-grained visual appearance personalization that uses user-provided sentences and a decoupled self-augmentation strategy to learn target attributes and improve controllability and flexibility.
http://arxiv.org/abs/2403.20221v1
Compressor summary: GRADE is a novel graph neural network model that uses nonlinear diffusion and aggregation to avoid over-smoothing and create node clusters.
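The idea of nonlinear graph diffusion can be illustrated with a single update step. This is an illustrative sketch under my own assumptions, not GRADE's actual update rule: features flow along a GCN-style normalized adjacency, but through a nonlinearity, which is one way to slow the over-smoothing of plain linear diffusion.

```python
import numpy as np

def nonlinear_diffusion_step(X, A, alpha=0.1):
    """One generic nonlinear graph diffusion step (illustrative only):
    the graph Laplacian is applied to a nonlinearly transformed signal,
    so neighboring node features drift toward each other sub-linearly.
    """
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # symmetric normalization
    A_norm = (A_hat * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    return X + alpha * ((A_norm - np.eye(len(X))) @ np.tanh(X))

# Two connected nodes with opposite features drift toward each other
A = np.array([[0.0, 1.0], [1.0, 0.0]])
X = np.array([[1.0], [-1.0]])
X1 = nonlinear_diffusion_step(X, A)
assert abs(X1[0, 0]) < 1.0 and abs(X1[1, 0]) < 1.0
```

Iterating such steps contracts features within well-connected regions faster than across them, which is how diffusion dynamics can induce node clusters.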
http://arxiv.org/abs/2403.20215v1
Compressor summary: The paper introduces a revised Arabic WordNet that improves its quality and covers multiple aspects of lexico-semantic resources.
http://arxiv.org/abs/2403.20213v1
Compressor summary: The authors developed a helpful and honest remote sensing vision-language model (H2RSVLM) that improves spatial perception and answers only answerable questions using two new datasets, HqDC-1.4M and RSSA.
http://arxiv.org/abs/2403.20212v1
Compressor summary: The paper investigates how various factors affect the performance and generalization of unsupervised learning methods for solving the Travelling Salesman Problem (TSP) using a graph neural network and a heat map approach.
http://arxiv.org/abs/2403.20208v1
Compressor summary: This research trains a large language model on a corpus of tables with annotations and shows its effectiveness in solving classification, regression, and imputation tasks for tabular data.
http://arxiv.org/abs/2403.20204v1
Compressor summary: The text proposes a comprehensive debunking process using AI that detects rumors and provides explanations to refute misinformation, ensuring high credibility assessment and relevant knowledge retrieval.
http://arxiv.org/abs/2403.20196v1
Compressor summary: The paper presents a fully automatic method to map discourse relations from different frameworks using label embeddings learned by contrastive learning.
http://arxiv.org/abs/2403.20195v1
Compressor summary: The Spatially Constrained Bayesian Network (SCB-Net) is a new architecture that effectively uses auxiliary data and learns from spatial patterns to create reliable geological maps with uncertainty assessment.
http://arxiv.org/abs/2403.20193v1
Compressor summary: The paper proposes Motion Embeddings, a new way to represent and manipulate motion in videos using one-dimensional vectors that work well with video diffusion models.
http://arxiv.org/abs/2403.20186v1
Compressor summary: The text describes a novel workflow that uses generative AI to create floorplans and 3D models from sketches, enabling faster architectural design based on textual descriptions.
http://arxiv.org/abs/2403.20183v1
Compressor summary: HARMamba is a lightweight selective state space model for real-time wearable sensor activity recognition that outperforms Transformer-based models while reducing computational and memory overhead.
http://arxiv.org/abs/2403.20180v1
Compressor summary: This paper introduces TMLU, a new evaluation suite for Chinese language models, especially Taiwanese Mandarin, that covers various subjects and assesses their knowledge and reasoning skills with explanations.
http://arxiv.org/abs/2403.20177v1
Compressor summary: The text discusses the theoretical and empirical possibility of artificial consciousness, suggesting that dimensions and profiles of consciousness should be used for a balanced discussion, and outlines a research strategy for realizing "awareness" in artificial systems.
http://arxiv.org/abs/2403.20159v1
Compressor summary: The paper proposes HGS-Mapping, a fast and accurate online dense mapping framework for urban scenes using Hybrid Gaussian Representation, which models different parts of the scene with Gaussians with distinct properties.
http://arxiv.org/abs/2403.20158v1
Compressor summary: The study compares ChatGPT's ability to detect different types of media bias against fine-tuned models, showing mixed results.
http://arxiv.org/abs/2403.20157v1
Compressor summary: Subword methods improve machine translation for low-resource languages, but their effectiveness depends on orthographic word boundaries and fine-tuning methods.
http://arxiv.org/abs/2403.20153v1
Compressor summary: The paper introduces Talk3D, a framework that uses 3D-aware generative prior to synthesize realistic talking head animations from audio inputs, achieving better performance than existing methods.
http://arxiv.org/abs/2403.20151v1
Compressor summary: This paper proposes a decentralized incentive mechanism using multi-agent deep reinforcement learning for allocating AI-generated content on roadside units in the Internet of Vehicles, improving user experience and reducing latency.
http://arxiv.org/abs/2403.20150v1
Compressor summary: TFB is an automated benchmark for comparing time series forecasting methods across diverse domains, datasets, and evaluation strategies.
http://arxiv.org/abs/2403.20149v1
Compressor summary: The paper investigates how conformal prediction, a probabilistic forecasting method, improves photovoltaic power predictions for electricity markets using different bidding strategies and shows that it outperforms linear methods in profit and energy balance.
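The core split conformal recipe the paper applies to PV forecasts is simple to state: calibrate a point forecaster's absolute residuals on held-out data, then widen every test prediction by the appropriate residual quantile. A minimal regression sketch (the bidding strategies themselves are not modeled here):

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal prediction: wrap any point forecaster with a
    distribution-free (1 - alpha) prediction interval.

    Nonconformity score = absolute residual on a held-out calibration set.
    """
    n = len(cal_true)
    scores = np.abs(cal_true - cal_pred)
    # Finite-sample-corrected quantile level
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    return test_pred - q, test_pred + q

rng = np.random.default_rng(1)
cal_pred = rng.uniform(0, 100, 500)               # e.g. forecast PV output in kW
cal_true = cal_pred + rng.normal(0, 5, 500)       # noisy realized output
lo, hi = split_conformal_interval(cal_pred, cal_true, np.array([50.0]))
assert lo[0] < 50.0 < hi[0]
```

The coverage guarantee holds regardless of the underlying forecaster, which is what makes the method attractive for downstream bidding decisions.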
http://arxiv.org/abs/2403.20147v1
Compressor summary: IndiBias is a new dataset for evaluating social biases in Indian context and languages, covering various dimensions and intersectionality, based on existing resources and LLMs' inputs.
http://arxiv.org/abs/2403.20145v1
Compressor summary: The authors evaluate state-of-the-art language models for generating summaries from mental health examinations, finding that their fine-tuned model performs well and could potentially improve support in developing countries.
http://arxiv.org/abs/2403.20142v1
Compressor summary: StegoGAN is a novel GAN-based model that uses steganography to prevent spurious features in non-bijective image translation tasks, enhancing semantic consistency without extra supervision.
http://arxiv.org/abs/2403.20137v1
Compressor summary: The paper proposes a novel method to improve quantization accuracy using low precision BFP formats by rearranging outliers in weights and activations, reducing memory footprint without compromising model accuracy.
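The outlier problem with block floating point (BFP) formats is easy to demonstrate: every value in a block shares one exponent, so a single outlier forces a large shared exponent and flushes the small values to zero. A minimal sketch (the exact mantissa width and blocking scheme are my assumptions, not the paper's):

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=4):
    """Block floating point: all values in a block share one exponent,
    taken from the block's largest magnitude; each value keeps only a
    short signed mantissa.
    """
    max_exp = np.floor(np.log2(np.max(np.abs(block)) + 1e-30))
    scale = 2.0 ** (max_exp - (mantissa_bits - 1))
    mantissa = np.clip(np.round(block / scale),
                       -(2 ** (mantissa_bits - 1)),
                       2 ** (mantissa_bits - 1) - 1)
    return mantissa * scale

block = np.array([0.01, 0.02, -0.015, 3.0])   # one outlier dominates the exponent
q = bfp_quantize(block)
assert q[3] != 0.0            # the outlier survives
assert np.all(q[:3] == 0.0)   # small values are flushed to zero by the shared exponent
```

Rearranging weights and activations so that outliers are grouped into their own blocks avoids exactly this precision loss for the non-outlier values.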
http://arxiv.org/abs/2403.20134v1
Compressor summary: The text discusses how AI assistants can better understand users' mental states to provide more personalized guidance during tasks using large language models.
http://arxiv.org/abs/2403.20127v1
Compressor summary: This paper examines how prompts affect the accuracy of zero-shot detectors in identifying AI-generated texts and proposes a framework to evaluate them.
http://arxiv.org/abs/2403.20126v1
Compressor summary: The paper presents ECLIPSE, a novel method for continual panoptic segmentation that fine-tunes prompt embeddings and uses logit manipulation to address catastrophic forgetting and plasticity, achieving state-of-the-art results.
http://arxiv.org/abs/2403.20124v1
Compressor summary: The study proposes a novel machine learning approach to classify patients undergoing metabolic bariatric surgery, showing that enhanced KNN and Decision Tree models with oversampling techniques can achieve high accuracy in predicting patient outcomes.
http://arxiv.org/abs/2403.20122v1
Compressor summary: LUGSI is a new learning paradigm that improves classification performance and speed on large-scale datasets by using granularity statistical invariants to enhance structural information and reduce computational cost.
http://arxiv.org/abs/2403.20112v1
Compressor summary: The paper proposes a doctor-in-the-loop method to improve machine learning models for breast cancer analysis using genomic data and image analysis, but finds that it is not always effective due to the complexity of the domain.
http://arxiv.org/abs/2403.20109v1
Compressor summary: Mol-AIR is a reinforcement learning framework for generating molecules with desired properties using adaptive intrinsic rewards and random distillation network, improving drug discovery efficiency.
http://arxiv.org/abs/2403.20106v1
Compressor summary: The paper proposes an efficient image deblurring network using selective structured state spaces and aggregate local and global blocks to balance between accuracy and efficiency.
http://arxiv.org/abs/2403.20105v1
Compressor summary: The text describes a zero-shot, training-free method for image segmentation using foundation models like CLIP and diffusion models, which achieves competitive results compared to weakly-supervised approaches.
http://arxiv.org/abs/2403.20103v1
Compressor summary: The paper provides a guide for NLP researchers to conduct counterspeech research against online hate by describing steps, best practices, and open challenges.
http://arxiv.org/abs/2403.20101v1
Compressor summary: RealKIE is a new benchmark with five diverse datasets for key information extraction research, focusing on enterprise applications and addressing real-world challenges like text serialization, sparse annotations, and complex tables.
http://arxiv.org/abs/2403.20097v1
Compressor summary: The paper introduces a computational consciousness structure called ITCM and an agent (ITCMA) that enhances LLMs' understanding of implicit instructions and common-sense knowledge for better performance in open-world settings, including real-world tasks with robots.
http://arxiv.org/abs/2403.20092v1
Compressor summary: The paper presents a novel approach to handling multiple weather conditions in outdoor scenes for computer vision tasks, modeling weather uncertainty with a Gaussian mixture model and prior-posterior learning, and introduces the MePe dataset as a benchmark, achieving state-of-the-art performance and generalization.
http://arxiv.org/abs/2403.20089v1
Compressor summary: The text discusses fairness in AI from a European law perspective, highlighting the need for bridging algorithmic fairness and non-discrimination law through the AI Act, which may affect bias detection and correction strategies.
http://arxiv.org/abs/2403.20088v1
Compressor summary: The authors study how different languages influence the performance of pre-trained multilingual models on various target languages, using adapter units to disentangle language effects and provide a list of recommended transfer configurations.
http://arxiv.org/abs/2403.20084v1
Compressor summary: The text discusses the need for a standardized IPA system for Bengali pronunciation and presents a new framework that includes a novel dataset and deep learning-based benchmarks.
http://arxiv.org/abs/2403.20080v1
Compressor summary: The paper presents a method to optimize and compress large vision models using mixed-precision search and memory-efficient training, achieving significant BitOPs reduction without sacrificing performance.
http://arxiv.org/abs/2403.20079v1
Compressor summary: The text proposes a new approach that improves neural rendering for street scenes by combining a diffusion model and multi-modal data to handle deviations from training viewpoints.
http://arxiv.org/abs/2403.20078v1
Compressor summary: NegLabel is a novel OOD detection method for vision-language models that uses negative labels from corpus databases and achieves state-of-the-art performance on various benchmarks and domains.
http://arxiv.org/abs/2403.20056v1
Compressor summary: The study examines how well multilingual language models perform in recognizing named entities across different languages, finding that transfer ability depends on shared entity chunks and robustness to input perturbations.
http://arxiv.org/abs/2403.20047v1
Compressor summary: Sparse training can make deep neural networks unreliable in detecting out-of-distribution data, but a new method improves their performance and reliability without increasing costs or requiring extra data.
http://arxiv.org/abs/2403.20046v1
Compressor summary: The study explores how large language models can learn from their mistakes in reasoning tasks using new benchmark CoTErrorSet and two methods: self-rethinking prompting and mistake tuning.
http://arxiv.org/abs/2403.20041v1
Compressor summary: The authors propose four optimization techniques to improve LLM deployment on mobile devices and achieve significant speedups in inference tasks compared to existing methods.
http://arxiv.org/abs/2403.20034v1
Compressor summary: NeSLAM is a framework that improves 3D reconstruction and camera tracking in RGB-D SLAM systems using NeRF, dense depth estimation, SDF scene representation, and self-supervised feature tracking.
http://arxiv.org/abs/2403.20032v1
Compressor summary: HO-Gaussian is a hybrid method that improves neural rendering of urban scenes by combining 3D Gaussian Splatting with grid-based volume and view-dependent color representation, overcoming previous limitations and enabling real-time photo-realistic results on multi-camera datasets.
http://arxiv.org/abs/2403.20031v1
Compressor summary: The paper presents a unified framework for human-centric point cloud video understanding that leverages prior knowledge and inherent data features, rather than the huge labeled datasets prior works rely on, to achieve state-of-the-art results on action recognition and 3D pose estimation.
http://arxiv.org/abs/2403.20026v1
Compressor summary: The Feature Swapping Multi-modal Reasoning (FSMR) model enhances textual and visual understanding by exchanging features between images and words, using a pre-trained visual-language encoder and a multi-modal cross-attention mechanism.
http://arxiv.org/abs/2403.20022v1
Compressor summary: Psychometry is an omnifit model that uses fMRI data from different subjects to reconstruct images by capturing inter-subject commonalities and individual differences, enhancing the representation with subject-specific memories.
http://arxiv.org/abs/2403.20015v1
Compressor summary: The paper presents a text data augmentation method that deletes adverbs to preserve semantics while being efficient and effective for various tasks.
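The adverb-deletion idea can be sketched in a few lines. Note the adverb detector below is a crude "-ly" suffix heuristic I'm using as a stand-in; the paper's actual tagging tooling is not specified in this summary, and a real implementation would use a proper POS tagger.

```python
import random

def is_adverb(token):
    # Crude stand-in for a POS tagger: treat "-ly" words as adverbs.
    return token.lower().endswith("ly") and len(token) > 3

def delete_adverbs(sentence, p=1.0, seed=0):
    """Drop each detected adverb with probability p; other tokens are
    kept, so the core predicate-argument structure (and hence the
    sentence semantics) largely survives."""
    rng = random.Random(seed)
    kept = [t for t in sentence.split()
            if not (is_adverb(t) and rng.random() < p)]
    return " ".join(kept)

out = delete_adverbs("The model quickly and reliably converges")
assert out == "The model and converges"
```

Because adverbs are mostly modifiers, removing them perturbs the surface form while leaving the label-relevant meaning intact, which is the property an augmentation method needs.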
http://arxiv.org/abs/2403.20013v1
Compressor summary: The paper presents a method that combines an attention network with a Neural Radiance Field model to remove waterdrops from weather-degraded multi-view images, generating clear 3D scenes and high-quality novel-view images and surpassing existing SOTA methods for adhesive waterdrop removal.
http://arxiv.org/abs/2403.20012v1
Compressor summary: The study introduces colorful cutout, a curriculum data augmentation technique for images that gradually increases noise and difficulty, improving generalization and performance.
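A curriculum variant of cutout is straightforward to sketch. The patch count and size schedule below are illustrative assumptions, not the paper's exact settings; the point is that difficulty (number of randomly colored occlusions) grows with the training epoch.

```python
import numpy as np

def colorful_cutout(image, epoch, max_epochs, rng):
    """Curriculum cutout sketch: paint an increasing number of randomly
    colored patches onto the image as training progresses."""
    h, w, _ = image.shape
    n_patches = 1 + (3 * epoch) // max_epochs      # 1 -> 4 patches over training
    out = image.copy()
    for _ in range(n_patches):
        ph, pw = h // 4, w // 4                    # fixed patch size (assumption)
        y = rng.integers(0, h - ph + 1)
        x = rng.integers(0, w - pw + 1)
        color = rng.integers(1, 256, size=3)       # random fill color, not zeros
        out[y:y + ph, x:x + pw] = color
    return out

rng = np.random.default_rng(0)
img = np.zeros((32, 32, 3), dtype=np.uint8)
aug = colorful_cutout(img, epoch=9, max_epochs=10, rng=rng)
assert aug.shape == img.shape
assert aug.sum() > 0   # at least one colored patch was painted
```

Starting with mild occlusion and ramping up follows the usual curriculum-learning argument: easy examples early stabilize training, harder ones later improve robustness.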
http://arxiv.org/abs/2403.20009v1
Compressor summary: The text investigates how large language models can know correct answers but still hallucinate, using inference dynamics analysis and a classifier based on output token probabilities.
http://arxiv.org/abs/2403.20005v1
Compressor summary: The paper proposes situational dialogue models for second language learners to practice conversational skills using large language models, which can handle various topics and are evaluated automatically.
http://arxiv.org/abs/2403.20002v1
Compressor summary: The paper introduces a theoretical framework for analyzing grid-based neural field models and develops a novel model, MulFAGrid, which outperforms existing models in various tasks.
http://arxiv.org/abs/2403.19996v1
Compressor summary: The paper proposes a deep learning model combining a CNN and a BGRU to learn both local and global features from heterogeneous IoT sensor data (differing timestamps, frequencies, locations, and units), achieving better classification results than state-of-the-art methods and baselines.
http://arxiv.org/abs/2403.19995v1
Compressor summary: The text presents a brain-inspired neural network model that integrates vision, proprioception, and language to teach robots compositionality, the ability to reuse learned language and action parts in new situations, and shows that increased task variation improves generalization and that visual attention and working memory are crucial for achieving linguistic goals.
http://arxiv.org/abs/2403.19992v1
Compressor summary: MindArm is a low-cost, non-invasive neuro-driven prosthetic arm system that uses EEG electrodes and a deep neural network to translate brain signals into prosthetic arm motions.
http://arxiv.org/abs/2403.19985v1
Compressor summary: The paper introduces an algorithm that uses a new surface regularization technique called ASDF to generate high-quality novel views from few input images, making it faster and more stable than existing methods.
http://arxiv.org/abs/2403.19980v1
Compressor summary: The paper introduces ICRWE, a new large-scale dataset for cattle face recognition in wild environments, and PANet, a novel parallel attention network that achieves state-of-the-art 88.03% accuracy on it.
http://arxiv.org/abs/2403.19979v1
Compressor summary: The paper proposes a method for continuous learning that improves on previous approaches by using adapter tuning and feature sampling without expanding the model or retaining old samples.
http://arxiv.org/abs/2403.19976v1
Compressor summary: Event cameras can be used for static traffic monitoring with high performance, as shown by the novel eTraM dataset and its evaluation on various scenarios and models.
http://arxiv.org/abs/2403.19975v1
Compressor summary: Tracking by natural language specification (TNL) aims to track a target in a video from a language description, but existing methods suffer from drift and ambiguity; the proposed joint multi-modal framework improves accuracy and consistency by leveraging visual and linguistic cues and decoding them together.
http://arxiv.org/abs/2403.19969v1
Compressor summary: The SMART pruner is a new technique for improving the accuracy of DNN pruning by using a learnable probability mask, differentiable Top k operator, and dynamic temperature parameter in a dynamic, differentiable, and adaptable way.
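For context, the plain deterministic baseline that SMART relaxes is magnitude Top-k pruning: keep the k largest-magnitude weights and zero the rest. This sketch shows only that baseline; SMART instead learns a probability mask and makes the Top-k selection differentiable, which is not reproduced here.

```python
import numpy as np

def topk_magnitude_mask(weights, k):
    """Keep the k largest-magnitude weights, zero the rest.
    The hard threshold makes this non-differentiable, which is the
    limitation learnable/soft masks are designed to address."""
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, -k)[-k]              # k-th largest magnitude
    return (np.abs(weights) >= threshold).astype(weights.dtype)

W = np.array([[0.1, -2.0],
              [0.5,  0.05]])
mask = topk_magnitude_mask(W, k=2)
assert mask.sum() == 2
assert mask[0, 1] == 1 and mask[1, 0] == 1              # two largest magnitudes survive
```

Applying `W * mask` during training and annealing toward a hard mask is the usual route from this baseline to differentiable pruning.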
http://arxiv.org/abs/2403.19967v1
Compressor summary: The paper explores how element-wise multiplication (star operation) can create non-linear features in neural networks without increasing size and introduces StarNet, a prototype that shows promising results.
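The star operation itself is just an element-wise product of two linear branches, `(W1 x) * (W2 x)`; expanding the product shows it computes pairwise feature products, i.e. an implicit quadratic feature map, without widening the network. A minimal sketch (branch shapes are my choice):

```python
import numpy as np

def star_block(x, W1, W2):
    """Element-wise multiply two linear projections of the same input.
    The product expands into pairwise feature interactions, giving a
    nonlinear map with no activation function and no extra width."""
    return (x @ W1) * (x @ W2)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W1 = rng.standard_normal((8, 8))
W2 = rng.standard_normal((8, 8))
y = star_block(x, W1, W2)

# Nonlinearity check: star(2x) = 4 * star(x), unlike a linear map's 2 * f(x)
assert np.allclose(star_block(2 * x, W1, W2), 4 * y)
```

The quadratic scaling under input rescaling is the quickest way to see that the operation is genuinely nonlinear despite containing only linear layers and a multiply.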
http://arxiv.org/abs/2403.19964v1
Compressor summary: FairRAG is a framework that uses external images to improve fairness and diversity in text-to-image generation models by applying debiasing strategies.
http://arxiv.org/abs/2403.19963v1
Compressor summary: The text introduces a novel design for efficient vision networks called EfficientMod, which improves accuracy and efficiency by using a modulation mechanism with an MLP block.
http://arxiv.org/abs/2403.19962v1
Compressor summary: The authors propose a method to improve LLMs' abilities as intelligent agents by using GPT-4-constructed data, supervised fine-tuning, and techniques like multi-path reasoning and task decomposition.
http://arxiv.org/abs/2403.19950v1
Compressor summary: The paper proposes a method for making confident predictions in out-of-distribution settings by modifying split conformal prediction and proves its validity.
http://arxiv.org/abs/2403.19949v1
Compressor summary: The paper introduces FairVLMed, the first medical vision-language dataset for fairness analysis, shows that the widely used CLIP and BLIP2 models exhibit significant biases across four protected attributes, and proposes FairCLIP, an optimal-transport-based method to mitigate these biases.
http://arxiv.org/abs/2403.19944v1
Compressor summary: The paper explores using extremely compact binary neural networks for low-light raw video enhancement, addressing issues of temporal information fusion and performance gap between binary and full precision convolutions.
http://arxiv.org/abs/2403.19943v1
Compressor summary: TDANet is a novel Deep Learning technique for fault diagnosis in noisy industrial environments, using multi-scale 2D convolutions and attention modules to enhance signal features and achieve high diagnostic accuracy.
http://arxiv.org/abs/2403.19941v1
Compressor summary: The paper proposes Diverse Feature Learning (DFL), which combines self-distillation and reset to help models preserve important features and learn new ones for image classification.
http://arxiv.org/abs/2403.19936v1
Compressor summary: SLFNet is a new neural network for natural language interfaces that uses syntactic information, semantic probability graphs, and Multi-Head SLF Attention to convert user commands into structured forms.
http://arxiv.org/abs/2403.19935v1
Compressor summary: The authors review feature point detection and description methods for high dynamic range images and propose two modified algorithms (SIFT for HDR and Harris for HDR) that improve performance in computer vision tasks.
http://arxiv.org/abs/2403.19930v1
Compressor summary: The paper studies how to fine-tune large language models for a specific natural language understanding task in supervised settings, focusing on Chinese short text matching and examining different factors that affect performance.
http://arxiv.org/abs/2403.19928v1
Compressor summary: The paper introduces DiJiang, a linear complexity model for Transformers that reduces training costs and inference speeds using frequency domain kernelization and Discrete Cosine Transform.
http://arxiv.org/abs/2403.19926v1
Compressor summary: The paper presents a novel video-based human pose regression method that efficiently captures spatial and temporal dependencies using a Decoupled Space-Time Aggregation network (DSTA) without relying on intermediate heatmaps.
http://arxiv.org/abs/2403.19925v1
Compressor summary: The paper explores combining Decision Transformer and Mamba frameworks to improve sequential decision-making in reinforcement learning tasks by enhancing sequence modeling efficiency and effectiveness.
http://arxiv.org/abs/2403.19924v1
Compressor summary: The paper proposes a novel network called SceneTracker that can estimate long-term 3D motion in scenes by using appearance and depth correlation features and the Transformer architecture.
http://arxiv.org/abs/2403.19920v1
Compressor summary: MI-NeRF is a single neural network that learns non-rigid facial motion for multiple identities from monocular videos, using a multiplicative module to capture identity and non-identity information interactions.
http://arxiv.org/abs/2403.19919v1
Compressor summary: The diffusion matching model uses a denoising process to create robust correspondences for registration tasks, addressing challenges like large deformation and scale inconsistency in complex scenarios.
http://arxiv.org/abs/2403.19913v1
Compressor summary: MANGO is a benchmark for evaluating large language models' mapping and navigation skills using text-based mazes and questions.
http://arxiv.org/abs/2403.19912v1
Compressor summary: The method uses machine learning to identify and segment radio waves from space in 3D data, with high accuracy and recall rates.
http://arxiv.org/abs/2403.19907v1
Compressor summary: The paper introduces ORAL, a novel method for discovering novel classes on graphs using semi-supervised learning and multi-scale features.
http://arxiv.org/abs/2403.19905v1
Compressor summary: The paper presents a CAD system that uses fine-tuned CNNs to automatically classify fundus images of different resolutions into five diabetic retinopathy (DR) classes, reporting the AUC values achieved by various deep learning models on the Kaggle platform.
http://arxiv.org/abs/2403.19904v1
Compressor summary: The paper presents a novel localization method using only 2D-3D line geometry, avoiding visual descriptors and achieving fast and accurate results for challenging scenes.
http://arxiv.org/abs/2403.19902v1
Compressor summary: HCLNet is a novel method that uses heterogeneous architecture and contrastive learning to improve PolSAR image classification with few-shot learning and multi-features, addressing the challenges of labeled data scarcity and scattering confusion.
http://arxiv.org/abs/2403.19898v1
Compressor summary: The paper proposes a new model called StrDiffusion that guides image inpainting with structure-based semantics to reduce semantic discrepancy between masked and unmasked regions.
http://arxiv.org/abs/2403.19897v1
Compressor summary: The paper presents a novel GAN framework that enables fine-grained control over race-related facial features in 2D images using a new dataset and preserving facial identity for racial bias mitigation.
http://arxiv.org/abs/2403.19896v1
Compressor summary: The paper introduces an improved activation function for neural networks, which can increase accuracy without much extra computation, but has some tradeoffs and requires adjustable parameters.
http://arxiv.org/abs/2403.19893v1
Compressor summary: This paper proposes PLoc, a novel evaluation criterion for object detection in autonomous driving based on physical location, and presents ApolloScape-R, a re-annotated dataset reflecting this criterion.
http://arxiv.org/abs/2403.19889v1
Compressor summary: The paper proposes LogicSumm, a framework to evaluate large language models' robustness in summarization tasks, and SummRAG, a system to improve their performance by fine-tuning them on realistic scenarios.
http://arxiv.org/abs/2403.19888v1
Compressor summary: MambaMixer combines selective token and channel mixing to improve scalability and performance in vision and time series tasks, outperforming existing models.