This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-05-23, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2405.12981v1
Compressor summary: CLA reduces KV cache size by 2x while maintaining accuracy in transformer-based autoregressive LLMs.
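The one-line summary doesn't spell out CLA's mechanism, but the general idea of sharing key/value projections across adjacent layers, so that only one KV cache entry is stored per layer pair, can be sketched as follows. This is a minimal illustrative example with made-up dimensions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # standard scaled dot-product attention
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

d = 64
Wq_a, Wq_b = torch.randn(d, d), torch.randn(d, d)  # per-layer query projections
Wk, Wv = torch.randn(d, d), torch.randn(d, d)      # one K/V projection shared by both layers

x = torch.randn(1, 16, d)              # (batch, seq, dim)
k_cache, v_cache = x @ Wk, x @ Wv      # computed and cached once

h1 = attention(x @ Wq_a, k_cache, v_cache)   # layer A attends over the shared cache
h2 = attention(h1 @ Wq_b, k_cache, v_cache)  # layer B reuses it: half the KV entries
print(h2.shape)                              # torch.Size([1, 16, 64])
```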
http://arxiv.org/abs/2405.12979v1
Compressor summary: OmniGlue is a learnable image matcher that uses a vision foundation model to improve generalization and a novel keypoint position-guided attention mechanism for better matching descriptors, achieving significant gains on various image domains.
http://arxiv.org/abs/2405.12978v1
Compressor summary: The method generates efficient text-to-image diffusion models by learning residuals for specific concepts and applying localized attention-guided sampling to combine concept identity and generative prior.
http://arxiv.org/abs/2405.12971v1
Compressor summary: BiomedParse is a foundation model that uses textual descriptions to perform joint segmentation, detection, and recognition of 82 object types across 9 imaging modalities in biomedical image analysis.
http://arxiv.org/abs/2405.12970v1
Compressor summary: Face-Adapter is an efficient adapter for high-precision and high-fidelity face editing using pre-trained diffusion models, decoupling target structure, ID, and attribute control.
http://arxiv.org/abs/2405.12969v1
Compressor summary: EchoAlign is a new approach that aligns instance features with noisy labels using controllable generative models and selective sampling to improve machine learning accuracy.
http://arxiv.org/abs/2405.12961v1
Compressor summary: ERA is a scalable algorithm that optimizes autoregressive policies using a gradient-based objective with an explicit reward function, achieving robust molecular search and alignment in chemical space.
http://arxiv.org/abs/2405.12958v1
Compressor summary: The paper proposes an efficient online learning algorithm for linear classifiers in the presence of Massart noise and extends it to a contextual bandit setting with consistent rewards.
http://arxiv.org/abs/2405.12952v1
Compressor summary: The paper presents faster randomized algorithms for finding an $\epsilon$-optimal policy in discounted Markov decision processes with known or estimated transition matrices, improving previous state-of-the-art results.
http://arxiv.org/abs/2405.12944v1
Compressor summary: The paper introduces AMFD, a framework that uses Modal Extraction Alignment to train student networks for multispectral pedestrian detection without increasing inference time, and presents the SMOD dataset for this task.
http://arxiv.org/abs/2405.12939v1
Compressor summary: AoR is a new framework that improves the reasoning capabilities of large language models by selecting answers based on the evaluation of reasoning chains and adjusting the number of reasoning chains dynamically.
http://arxiv.org/abs/2405.12933v1
Compressor summary: The Skin-in-the-Game framework enhances moral reasoning in large language models by simulating accountability for actions and considering multiple stakeholder perspectives.
http://arxiv.org/abs/2405.12930v1
Compressor summary: Pytorch-Wildlife is an open-source platform that simplifies AI model development for wildlife monitoring using PyTorch, overcoming technical and interdisciplinary barriers.
http://arxiv.org/abs/2405.12929v1
Compressor summary: The study evaluates how large language models perform in code-mixed settings for sentiment analysis and offensive language detection tasks, finding that bilingual and multilingual specialized models are the most successful.
http://arxiv.org/abs/2405.12927v1
Compressor summary: The image registration problem in machine vision involves aligning images to subpixel accuracy using methods that depend on function complexity, pixel size, and sampling count.
http://arxiv.org/abs/2405.12926v1
Compressor summary: The paper proposes a multi-objective optimization method to balance fairness and data quality in bias mitigation techniques, allowing users to choose the best subset for their application.
http://arxiv.org/abs/2405.12915v1
Compressor summary: The paper proposes a gradient-based method to automatically select high-quality and diverse instruction finetuning data for machine translation by clustering on gradients and resampling.
http://arxiv.org/abs/2405.12914v1
Compressor summary: The paper proposes a three-stage training pipeline to integrate Large Language Models (LLMs) into text-to-image generation, improving language understanding and image quality.
http://arxiv.org/abs/2405.12910v1
Compressor summary: The paper presents a novel taxonomy for topic modelling summary judgment cases in the UK using AI, revealing patterns in their application across legal domains and demonstrating the potential of combining traditional and AI-driven approaches in legal classification.
http://arxiv.org/abs/2405.12900v1
Compressor summary: The study introduces a new training algorithm (ADPO) for open-domain dialogue systems that reduces toxicity by generating preferred and unsafe responses using a toxic control token and improves performance and stability compared to traditional methods.
http://arxiv.org/abs/2405.12891v1
Compressor summary: The paper presents a fast and effective image enhancement method for low-light conditions using machine learning and CNNs, which improves clarity and color accuracy without over-enhancement or unnatural colors.
http://arxiv.org/abs/2405.12888v1
Compressor summary: The paper explores conservation laws in non-Euclidean geometries and momentum-based dynamics for neural network training, finding temporal dependence and loss of conservation laws when transitioning from gradient flows to momentum dynamics.
http://arxiv.org/abs/2405.12884v1
Compressor summary: The paper studies persuasive techniques in Arabic social media using Pre-trained Language Models (PLMs) and finds that fine-tuning achieves the best results, while few-shot learning can improve the GPT model's performance.
http://arxiv.org/abs/2405.12875v1
Compressor summary: The paper proposes a probabilistic diffusion model to generate descriptive language for semantic changes in bi-temporal remote sensing images, addressing the pixel problem and enhancing terrain change localization accuracy.
http://arxiv.org/abs/2405.12868v1
Compressor summary: This paper proposes Equivariant Spatio-Temporal Attentive Graph Networks (ESTAG), a new model that uses a novel Equivariant Discrete Fourier Transform (EDFT) to extract periodic patterns from past frames and improve the dynamics simulation of physical systems.
http://arxiv.org/abs/2405.12864v1
Compressor summary: The paper proposes a method to synthetically augment datasets with spatially varying distortions and evaluates its impact on semantic segmentation models' performance.
http://arxiv.org/abs/2405.12862v1
Compressor summary: The study examines how different ethical frameworks affect an agent's goal formulation and planning, and the importance of metacognitive judgments in resolving ethical conflicts.
http://arxiv.org/abs/2405.12861v1
Compressor summary: The paper presents a dataset and evaluates how different levels of water droplet contamination on transparent objects affect computer vision tasks like segmentation.
http://arxiv.org/abs/2405.12850v1
Compressor summary: The paper proposes a new algorithm for aligning CT and MRI images to improve cervical cancer diagnosis and treatment by using weakly supervised registration.
http://arxiv.org/abs/2405.12833v1
Compressor summary: The text discusses recent advances in deep learning-based methods for generating radiology reports from multi-modal data, summarizes key techniques, and proposes a general workflow with five components.
http://arxiv.org/abs/2405.12832v1
Compressor summary: Wav-KAN is a new neural network architecture that uses wavelet functions to improve interpretability, speed, robustness, and performance over traditional MLPs and Spl-KAN.
http://arxiv.org/abs/2405.12819v1
Compressor summary: This study investigates how large language models (LLMs) are used in natural language processing (NLP), their current achievements and future possibilities, and provides a taxonomy and challenges for LLMs in NLP.
http://arxiv.org/abs/2405.12807v1
Compressor summary: The paper improves the Adam optimizer by correcting its flaws and refining it with insights from information geometry, leading to better performance in various domains.
http://arxiv.org/abs/2405.12806v1
Compressor summary: The MOSS framework uses kinematic information to create realistic motion-aware 3D clothed human reconstructions from single-view monocular videos.
http://arxiv.org/abs/2405.12802v1
Compressor summary: The paper proposes a method to infer plate rigidity and other properties using physics-informed Gaussian Processes and Bayesian inference from noisy measurements, with potential applications in structural health monitoring and uncertainty quantification.
http://arxiv.org/abs/2405.12801v1
Compressor summary: The CMC framework compares a query and multiple candidates using self-attention layers, improving reranking performance while being scalable and lightweight.
http://arxiv.org/abs/2405.12796v1
Compressor summary: DisenStudio is a novel framework that generates text-guided videos for multiple customized subjects using a diffusion-based model enhanced with spatial-disentangled cross-attention and motion-preserved disentangled finetuning.
http://arxiv.org/abs/2405.12791v1
Compressor summary: The paper proposes a new method for medical image registration that adapts boundary conditions based on flow fields, improving accuracy in two registration tasks.
http://arxiv.org/abs/2405.12789v1
Compressor summary:
Key points:
- The paper proposes a method to predict object state changes in images and videos based on visual and linguistic cues
- The method uses the Ego4D dataset and introduces new annotations (Ego4D-OSCA) for this task
- The method shows good performance on predicting object state changes in dynamic scenarios
Summary: The paper presents a novel framework that leverages visual and linguistic features to anticipate object state changes in images and videos, using a large dataset and new annotations.
http://arxiv.org/abs/2405.12788v1
Compressor summary: Non-autoregressive translation methods have improved but still lag behind autoregressive ones in terms of quality and reliability.
http://arxiv.org/abs/2405.12785v1
Compressor summary: Predictive Maintenance (PdM) using AI is crucial for improving steel industry efficiency and sustainability, but faces challenges in practical implementation and reproducibility.
http://arxiv.org/abs/2405.12784v1
Compressor summary: The authors propose a method to insert synthetic polyps into endoscopic images using inpainting, and improve pseudo-masks with a guided refinement network and data augmentation, resulting in better polyp segmentation performance.
http://arxiv.org/abs/2405.12781v1
Compressor summary: SwinFUSE uses multi-modal pre-training to enhance 3D medical imaging segmentation by learning from CT and MRI, improving adaptability and generalizability.
http://arxiv.org/abs/2405.12779v1
Compressor summary: The text reviews how the Transformer model, which excels at natural language processing, can be applied to improve tactile perception tasks like object recognition and manipulation.
http://arxiv.org/abs/2405.12774v1
Compressor summary: The paper proposes a blind separation method for vibration sources from rotating machinery, enabling early detection of gear-related and bearing faults using dilated CNNs and whitening-based deconvolution.
http://arxiv.org/abs/2405.12759v1
Compressor summary: The authors propose a method that combines gated imaging with stereo HDR RCCB cameras to capture accurate depth information using low-cost sensors and flood-illumination, outperforming existing methods for long ranges.
http://arxiv.org/abs/2405.12757v1
Compressor summary: BIMM is a framework that mimics the human brain's visual pathway to learn comprehensive video representations using masked modeling and partial parameter sharing.
http://arxiv.org/abs/2405.12756v1
Compressor summary: Ordinal regression is a method to classify ordinal data using a one-dimensional transformation of the explanatory variable and optimal threshold labeling, which can be learned efficiently with parallel processing.
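As an illustration of threshold labeling in general (not this paper's algorithm), a one-dimensional score can be mapped to ordered classes with a set of cut points; the thresholds below are hypothetical.

```python
import numpy as np

def ordinal_label(score: float, thresholds: np.ndarray) -> int:
    """Return the index of the ordered interval the score falls into."""
    return int(np.searchsorted(thresholds, score))

thresholds = np.array([-1.0, 0.0, 1.5])   # hypothetical learned cut points
for s in [-2.0, -0.5, 0.7, 3.0]:
    print(s, "->", ordinal_label(s, thresholds))   # ordered labels 0..3
```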
http://arxiv.org/abs/2405.12755v1
Compressor summary: This paper investigates grokking in real-world datasets using deep neural networks and introduces three new progress measures related to generalization that better explain grokking than weight norms.
http://arxiv.org/abs/2405.12752v1
Compressor summary: The paper proposes C3L, a new method for generating VLIT data using contrastive learning and a content relevance module to improve the quality and content match between image instructions and generated images.
http://arxiv.org/abs/2405.12744v1
Compressor summary: The study investigates how multilingual language models (MLMs) revise cultural values during fine-tuning when exposed to new linguistic experience from different data sources and languages.
http://arxiv.org/abs/2405.12742v1
Compressor summary: The paper proposes Multi-Subject Personalization (MSP) to improve the quality and coherence of text-to-image models for creative story illustrations with multiple characters or objects.
http://arxiv.org/abs/2405.12739v1
Compressor summary: The paper proposes Sequential Preference Optimization (SPO), a method to fine-tune large language models to align with multiple aspects of human preferences, without using explicit reward models.
http://arxiv.org/abs/2405.12736v1
Compressor summary: The paper presents a method to predict how rain and fog affect pedestrian detection by radar and lidar sensors using empirical data, improving upon existing models.
http://arxiv.org/abs/2405.12728v1
Compressor summary: The paper presents a novel method for estimating the 6D pose of an unknown spacecraft relative to a monocular camera using a Neural Radiance Field trained on sparse real-world images.
http://arxiv.org/abs/2405.12724v1
Compressor summary: RemoCap is a method that uses spatial disentanglement and motion disentanglement to accurately reconstruct 3D human bodies from realistic motion sequences despite occlusions.
http://arxiv.org/abs/2405.12721v1
Compressor summary: The paper proposes StarLKNet, a large kernel convolution-based network for palm-vein identification, which improves security and convenience using Mixup and achieves superior performance on two public datasets.
http://arxiv.org/abs/2405.12716v1
Compressor summary: The MAPDES simulator shows that using renewables and P2P energy trading can save costs and reduce demand for dairy farms.
http://arxiv.org/abs/2405.12713v1
Compressor summary: DIAN is a novel network that uses dynamic identity-guided attention to mine modality-consistent embeddings for visible-infrared person re-identification, achieving state-of-the-art performance.
http://arxiv.org/abs/2405.12711v1
Compressor summary: The study proposes a machine learning approach to recognize micro activities of the Otago Exercise Program for older adults, using a Transformer encoder and a Temporal Convolutional Network, which can help monitor exercise intensity and difficulty automatically.
http://arxiv.org/abs/2405.12710v1
Compressor summary: GLSCL is a fast and effective text-video retrieval method that uses latent shared semantics across modalities, global and local interactions, and inter-consistency and intra-diversity losses to achieve high performance and efficiency.
http://arxiv.org/abs/2405.12708v1
Compressor summary: The article describes a method to detect crowd anomalies from video data using pattern recognition and segmentation, which can help in decision-making for sectors like tourism and security.
http://arxiv.org/abs/2405.12705v1
Compressor summary: This paper proposes a multimodal early exit model design that balances performance and efficiency for visually-rich document understanding tasks, achieving faster speeds with similar accuracy to traditional models.
http://arxiv.org/abs/2405.12701v1
Compressor summary: The paper introduces MedLFQA, a benchmark dataset for evaluating factuality in long medical questions and answers, and proposes OLAPH, a framework to improve factuality using sampling predictions and preference optimization.
http://arxiv.org/abs/2405.12695v1
Compressor summary: The paper presents a new signature verifier that is easy to understand and performs well on public datasets, aiming to improve its use in forensic science and law.
http://arxiv.org/abs/2405.12689v1
Compressor summary: The paper proposes a method to detect AI-paraphrased text spans in a text by scoring each sentence based on its paraphrasing degree and evaluates it on a new dataset called PASTED.
http://arxiv.org/abs/2405.12681v1
Compressor summary: The paper presents a multimodal transformer-based detector for precise autonomous UAV landing and a DQN-based decision-making model for adaptive behaviour in outdoor scenarios.
http://arxiv.org/abs/2405.12676v1
Compressor summary: The text describes wrinkle defects in industrial products, presents a meso-mechanical modeling method to assess their stiffness, and compares two non-destructive testing methods (Shearography and FPP) for measuring their displacement responses.
http://arxiv.org/abs/2405.12669v1
Compressor summary: The paper provides an extensive overview of 99 multi-modal machine translation studies, analyzing factors affecting performance and discussing future directions.
http://arxiv.org/abs/2405.12661v1
Compressor summary: EmoEdit is a novel framework that uses psychological insights to modify images with content changes as well as color and style adjustments, enhancing emotional impact while preserving image composition.
http://arxiv.org/abs/2405.12658v1
Compressor summary: The authors propose a method to detect out-of-distribution inputs in neural networks by measuring extreme activation values in the penultimate layer, which serves as a proxy for overconfidence.
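A minimal sketch of the general idea, assuming a simple PyTorch classifier: score each input by its most extreme penultimate-layer activation and flag scores beyond a threshold calibrated on in-distribution data. The scoring rule and the 95th-percentile threshold are illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16), nn.ReLU())
head = nn.Linear(16, 10)  # classifier head on top of the penultimate features

def ood_score(x):
    feats = backbone(x)                    # penultimate-layer activations
    return feats.abs().max(dim=1).values   # extreme-activation score per sample

# Calibrate a threshold on (assumed) in-distribution data, then flag new inputs.
in_dist = torch.randn(1000, 32)
threshold = torch.quantile(ood_score(in_dist), 0.95)
new_x = torch.randn(5, 32) * 5.0           # deliberately out-of-range inputs
print(ood_score(new_x) > threshold)        # True entries flagged as likely OOD
```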
http://arxiv.org/abs/2405.12656v1
Compressor summary: The paper proposes a new task and framework to improve extrapolation in large language models using structured knowledge graphs and retrieval-augmentation, addressing hallucination and expensive training costs.
http://arxiv.org/abs/2405.12654v1
Compressor summary: The paper proposes using class expressions from description logic to explain node classification in graph-structured data more expressively and compares two scoring functions to identify the best explanation among multiple candidates.
http://arxiv.org/abs/2405.12648v1
Compressor summary: CooK is a model that improves scene graph generation by incorporating co-occurrence knowledge between objects and addressing the long-tail problem in the training dataset.
http://arxiv.org/abs/2405.12646v1
Compressor summary: The paper introduces a new efficient algorithm for estimating camera pose using intersections of a hyperbola and the unit circle, with simplified solutions for specific scenarios.
http://arxiv.org/abs/2405.12638v1
Compressor summary: This paper presents a new neural network method for analyzing rough surface lubrication that adapts to different frequency components, improving on existing methods and offering a more accurate and efficient tool for this application.
http://arxiv.org/abs/2405.12633v1
Compressor summary: Haar Cascade is a simple and low-cost algorithm for face detection in images and videos that can be used with OpenCV2 and NVIDIA Jetson Nano for efficient attendance tracking.
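The Haar cascade pipeline itself is standard OpenCV; a minimal detection sketch looks like this (file names are hypothetical, and the paper's attendance-tracking logic is omitted).

```python
import cv2

# Load the pretrained frontal-face Haar cascade that ships with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("frame.jpg")                    # hypothetical input frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                       # draw a box around each face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", img)
```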
http://arxiv.org/abs/2405.12630v1
Compressor summary: This paper compares masked language modeling (MLM) and causal language modeling (CLM) for text generation tasks and shows that MLM consistently generates better texts, but the quality of the generated texts does not strongly correlate with their performance in downstream tasks.
http://arxiv.org/abs/2405.12621v1
Compressor summary: ToM modelling is less important for effective collaboration in dialogue-based CPA tasks than previously thought, as performance improves more by predicting one's own missing knowledge.
http://arxiv.org/abs/2405.12619v1
Compressor summary: MentalQA is an Arabic dataset with conversational Q&A on mental health, annotated using a rigorous schema, to support text mining tools for diagnosis and treatment.
http://arxiv.org/abs/2405.12617v1
Compressor summary: The paper proposes a method to measure the "intelligent" behaviors of large language models (LLMs) by comparing entropy reduction at different levels of abstraction, and shows its effectiveness and novel insights on emergent patterns.
http://arxiv.org/abs/2405.12615v1
Compressor summary: The paper introduces Object-Oriented CDM (OOCDM), a model that learns causal dependencies among objects in large-scale environments, and shows its advantages over existing CDMs in various aspects.
http://arxiv.org/abs/2405.12612v1
Compressor summary:
Key points:
- High quality dataset of 70k prompt-response pairs in 74 languages
- Train open source English LLM to chat multilingually
- Multilingual model outperforms previous state-of-the-art models
- More multilingual data helps performance in target language (Japanese)
Summary: The authors create a high quality dataset of multilingual prompt-response pairs and use it to train an open source English LLM that chats better than previous models, especially in Japanese.
http://arxiv.org/abs/2405.12607v1
Compressor summary: S3O is a new method that learns parametric models of dynamic objects from monocular videos without extra annotations, using a phased approach for efficient and robust 3D reconstruction.
http://arxiv.org/abs/2405.12604v1
Compressor summary: The paper proposes a sentinel model that reduces toxicity in responses from large language models by adding a few tokens to the input prompt, using an interleaved training method with PPO to improve both red team and sentinel models.
http://arxiv.org/abs/2405.12601v1
Compressor summary: The paper proposes a method, FFAM, to generate high-quality visual explanations for LiDAR-based 3D object detection models using non-negative matrix factorization and feature gradient refinement.
http://arxiv.org/abs/2405.12591v1
Compressor summary: DecoQuant is a data-free technique that compresses the key-value cache of large language models by using low-bit quantization on tensor decomposition, achieving significant memory savings and maintaining high inference quality.
http://arxiv.org/abs/2405.12579v1
Compressor summary: This paper proposes a self-instruction based fine-tuning method for fact-checking that balances accuracy and explainability using data augmentation and improved DPO, achieving comparable or better results than traditional methods while ensuring data security.
http://arxiv.org/abs/2405.12556v1
Compressor summary: The study explores how different feature vector splitting strategies and combinations improve biometric signature recognition in e-security applications by considering cognitive principles.
http://arxiv.org/abs/2405.12543v1
Compressor summary: The paper proposes a method called BiKop to improve few-shot learning by leveraging both textual and visual knowledge in a hierarchical representation that balances generalization and specificity.
http://arxiv.org/abs/2405.12541v1
Compressor summary: DrHouse is a novel LLM-based virtual doctor system that uses smart device data, updates medical databases, and evaluates diseases' likelihood for more accurate diagnoses.
http://arxiv.org/abs/2405.12540v1
Compressor summary: The LMR approach uses Large Language Models to improve video context representation, align visual and language modalities, and achieve state-of-the-art results in Video Moment Retrieval, especially for complex queries.
http://arxiv.org/abs/2405.12538v1
Compressor summary: The text proposes a framework to improve visual content generation by incorporating various knowledge sources and iteratively refining the output to better align with user intentions.
http://arxiv.org/abs/2405.12533v1
Compressor summary: The paper introduces a new multi-task Urdu scene text dataset for text detection, recognition, and VQA tasks, addressing the challenges of diverse text layouts and orientations.
http://arxiv.org/abs/2405.12532v1
Compressor summary: PyramidInfer compresses the key-value cache of large language models by retaining crucial context layer by layer, improving performance and reducing memory consumption.
http://arxiv.org/abs/2405.12531v1
Compressor summary: The paper proposes CustomText, a method that enhances image generation with precise text customization using a TextDiffuser model and a ControlNet model.
http://arxiv.org/abs/2405.12528v1
Compressor summary: SirLLM is a system that helps large language models process long inputs and maintain memory during infinite-length dialogues using token entropy and memory decay.
http://arxiv.org/abs/2405.12523v1
Compressor summary: The text proposes a method called Single Image Unlearning (SIU) for forgetting visual data in Multimodal Large Language Models (MLLMs) by fine-tuning a single image, introduces new evaluation metrics, and shows its effectiveness against attacks.
http://arxiv.org/abs/2405.12522v1
Compressor summary: The paper presents a method using discrete sparse autoencoders to efficiently discover interpretable circuits in large language models by training on positive and negative examples and measuring attention head overlaps.
http://arxiv.org/abs/2405.12521v1
Compressor summary: GNN-Diff is a framework that generates high-performing graph neural networks by learning from checkpoints during a light coarse search, reducing computational costs and improving generalization accuracy.
http://arxiv.org/abs/2405.12519v1
Compressor summary: MAGE is a novel motif-based approach to generate interpretable explanations for molecular graphs using attention, motif decomposition, and graph generation techniques.
http://arxiv.org/abs/2405.12512v1
Compressor summary: The paper proposes a new optical flow estimation method that combines apparent and kinetics information, improves efficiency, considers warping and occlusion, and uses a self-supervised loss function to achieve better performance than existing methods.
http://arxiv.org/abs/2405.12509v1
Compressor summary: The paper proposes a method to improve active object detection by using informed priors about possible interactions and knowledge distillation.
http://arxiv.org/abs/2405.12505v1
Compressor summary:
Key points:
- The paper proposes a new framework (NOVA-3D) for reconstructing anime characters from non-overlapped views
- The framework uses view-aware feature fusion and synthesis to learn 3D features effectively
- The paper also collects a new dataset (NOVA-Human) with multi-view images and camera parameters for 3D anime characters
- The method outperforms baseline approaches and achieves high quality reconstruction results
Summary: The paper presents NOVA-3D, a novel framework that can reconstruct full-body anime characters from non-overlapped front and back views using view-aware feature fusion and synthesis. The paper also introduces a new dataset (NOVA-Human) for this task and shows superior performance over baselines.
http://arxiv.org/abs/2405.12503v1
Compressor summary: CLRKDNet is a streamlined model that balances lane detection accuracy with real-time performance using teacher-student distillation and new distillation losses, reducing inference time by up to 60% compared to the state-of-the-art CLRNet.
http://arxiv.org/abs/2405.12502v1
Compressor summary: The text proposes a zero-label entropy metric for detecting outliers in unlabeled contaminated datasets, enabling efficient training of deep outlier detection models with robust performance.
http://arxiv.org/abs/2405.12500v1
Compressor summary: The entropic associative memory model can effectively process complex and unconventional images of animals and vehicles, generating meaningful associations between them.
http://arxiv.org/abs/2405.12493v1
Compressor summary: The paper studies the complex loss landscapes of deep neural networks and categorizes different types of 1D and 2D curves that represent perturbation directions and surfaces, also providing theoretical insights using the Hessian matrix.
http://arxiv.org/abs/2405.12490v1
Compressor summary: The text proposes a new image editing framework that enables users to customize effects with few image pairs using directional transformations and diffusion models, improving performance across different scenarios.
http://arxiv.org/abs/2405.12487v1
Compressor summary: The 3D-Spectral-Spatial Mamba (3DSS-Mamba) framework uses a novel scanning mechanism to model global spectral-spatial relationships for hyperspectral image classification with improved efficiency and performance.
http://arxiv.org/abs/2405.12477v1
Compressor summary: The HUGS framework uses semantic priors and high-frequency features to improve 3D human reconstruction by capturing geometric topology and surface details.
http://arxiv.org/abs/2405.12476v1
Compressor summary: FishPhenoKey is a large dataset with detailed annotations for measuring subtle morphological phenotypes in fish, along with new evaluation and loss functions to improve keypoint detection accuracy.
http://arxiv.org/abs/2405.12475v1
Compressor summary: GASE is a learning-based method that uses graph attention sampling with edge fusion to improve node embedding for vehicle routing problems, achieving better performance than existing methods.
http://arxiv.org/abs/2405.12474v1
Compressor summary: The paper introduces UniFilter, a novel adaptive polynomial filter for Graph Neural Networks that accommodates varying degrees of heterophily and improves convolution and propagation.
http://arxiv.org/abs/2405.12468v1
Compressor summary: The authors create a large dataset of diverse dialogue state tracking data using synthetic data generation and show that it improves zero-shot DST accuracy.
http://arxiv.org/abs/2405.12465v1
Compressor summary: The text proposes a new physics-informed operator learning framework that predicts spatiotemporal dynamics using finite element method and shows its effectiveness in a thermal conduction problem with arbitrary geometry.
http://arxiv.org/abs/2405.12462v1
Compressor summary: The article introduces a new design for Transformer-based models that improves efficiency and accuracy for long sequence time series forecasting by replacing some components with Surrogate Attention Blocks and Surrogate FFN Blocks.
http://arxiv.org/abs/2405.12461v1
Compressor summary: WorldAfford is a framework that uses natural language instructions to locate affordance regions of multiple objects in complex scenes, overcoming limitations of previous approaches.
http://arxiv.org/abs/2405.12460v1
Compressor summary: The text describes a physics-based approach to automatically generate realistic scene layouts for 3D animation that considers physical constraints and interaction affordances using reinforcement learning.
http://arxiv.org/abs/2405.12459v1
Compressor summary: The proposed PLM4Traj model uses pre-trained language models to effectively learn trajectory features and adapt to different spatio-temporal data mining tasks.
http://arxiv.org/abs/2405.12452v1
Compressor summary: Spatio-Temporal Graph Prompting (STGP) is a framework for adapting spatio-temporal graph neural networks to diverse tasks and domains using a unified template and learnable prompts.
http://arxiv.org/abs/2405.12447v1
Compressor summary: The paper proposes a method to improve face recognition by adaptively updating prototypes using sample feature similarity and adjustable margins, which helps counter the effect of hard samples on the model performance.
http://arxiv.org/abs/2405.12443v1
Compressor summary: The FFL algorithm improves neural network training by optimizing label processing, revising label integration, and introducing feedback loops that enhance learning performance and efficiency.
http://arxiv.org/abs/2405.12439v1
Compressor summary: The paper studies interactive optimization of natural-concave functions using stochastic and adversarial feedback, proving near-optimal regret bounds for some settings and showing impossibility results for others.
http://arxiv.org/abs/2405.12434v1
Compressor summary: ScenaFuse is a novel adapter for natural language inference that combines linguistic knowledge with visual information to enhance understanding and inference in ambiguous language.
http://arxiv.org/abs/2405.12433v1
Compressor summary: The proposed approach combines logical reasoning, classical AI planning, and an LLM to answer user queries accurately and handle missing information in API orchestration tasks.
http://arxiv.org/abs/2405.12427v1
Compressor summary: The paper explores using Deep Learning to improve wireless communication for IoT devices by estimating channels from noisy signal strength measurements.