This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-25, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2409.16288v1
Compressor summary: A simple, self-supervised approach to tracking any point uses a global matching transformer with attention-based cycles to find consistent tracks through video via random walks on a space-time graph, achieving strong performance while avoiding the complexities of other methods.
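To make the "random walks on a space-time graph" idea concrete, here is a minimal sketch of a generic cycle-consistent random walk. It is not the paper's global matching transformer: it assumes each frame is a list of node feature vectors (all frames with the same number of nodes), turns dot-product affinities into transition probabilities with a softmax, walks forward through the frames and back again, and penalizes probability mass that fails to return to its starting node. The function names and the temperature value are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def affinity(a: Matrix, b: Matrix) -> Matrix:
    # Dot-product similarity between every node in frame a and every node in frame b.
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


def softmax_rows(scores: Matrix, temperature: float) -> Matrix:
    # Turn each row of affinities into transition probabilities (a row-stochastic matrix).
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp((s - m) / temperature) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def matmul(p: Matrix, q: Matrix) -> Matrix:
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def cycle_loss(frames: List[Matrix], temperature: float = 0.1) -> float:
    # Walk f0 -> ... -> fn -> ... -> f0 and chain the per-step transition matrices.
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    path = frames + frames[-2::-1]
    walk = None
    for a, b in zip(path, path[1:]):
        step = softmax_rows(affinity(a, b), temperature)
        walk = step if walk is None else matmul(walk, step)
    # Each node should return to itself: cross-entropy against identity targets.
    n = len(frames[0])
    return -sum(math.log(walk[i][i] + 1e-12) for i in range(n)) / n
```

With distinctive node features the walk returns home almost surely and the loss is near zero; with indistinguishable features the return probability drops to chance and the loss grows.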
http://arxiv.org/abs/2409.16280v1
Compressor summary: The paper proposes a simple method to use one transformer for both text and visual generation, achieving comparable performance to existing methods.
http://arxiv.org/abs/2409.16271v1
Compressor summary: The AIM 2024 UHD-IQA Challenge is a competition to develop No-Reference Image Quality Assessment models for high-resolution photos, with winners determined by performance, accuracy, and computational efficiency.
http://arxiv.org/abs/2409.16261v1
Compressor summary: The paper introduces a new dataset and method to improve large multimodal models' ability to describe changes between bi-temporal remote sensing images.
http://arxiv.org/abs/2409.16253v1
Compressor summary: The paper proposes a method to train experts for legacy devices in the context of learning with abstention, using Bayes-optimal rejection rules and surrogate losses.
http://arxiv.org/abs/2409.16243v1
Compressor summary: The authors propose a new tagging scheme for identifying discontinuous named entities using a weighted finite-state automaton that guarantees correctness and simplicity.
http://arxiv.org/abs/2409.16241v1
Compressor summary: This study explores how large language models like GPT-4 can spread misinformation in simulated social media environments and highlights the need for ethical safeguards.
http://arxiv.org/abs/2409.16239v1
Compressor summary: LADD enhances dataset distillation by adding label augmentations to synthetic images, improving training efficiency and performance.
http://arxiv.org/abs/2409.16238v1
Compressor summary: SPECTRUM is a scalable framework for learning accurate and explainable logical models from relational data using rule utility and linear-time mining of recurrent structures.
http://arxiv.org/abs/2409.16235v1
Compressor summary: The EuroLLM project develops open-weight multilingual LLMs for various European languages and releases initial models with good performance on multilingual tasks.
http://arxiv.org/abs/2409.16231v1
Compressor summary: The paper uses advanced machine learning and transformer techniques to predict cognitive decline in MCI patients using metabolomics data, showing their potential for early detection and intervention in Alzheimer's disease, and highlighting the importance of non-invasive biomarkers.
http://arxiv.org/abs/2409.16223v1
Compressor summary: This paper investigates why fine-tuning a model degrades its accuracy on classes not seen during fine-tuning, and proposes post-processing calibration as a simple solution that restores performance on those classes and reveals improved features for all classes.
http://arxiv.org/abs/2409.16218v1
Compressor summary: PoAC is an AutoML framework that adapts to diverse clustering tasks by using a surrogate model trained on previous clustering solutions, enabling algorithm-agnostic and customizable clustering pipelines.
http://arxiv.org/abs/2409.16213v1
Compressor summary: The authors propose an XAI computer vision pipeline to evaluate precision spraying systems post-spraying using semantic segmentation and class-specific spray deposition estimation.
http://arxiv.org/abs/2409.16211v1
Compressor summary: The study introduces an improved VQGAN and an embedding-free image generation network using bit tokens, achieving high performance and state-of-the-art results on the ImageNet 256x256 benchmark.
http://arxiv.org/abs/2409.16209v1
Compressor summary: LLMCount uses large language models to improve crowd detection in millimeter-wave sensing by compensating for signal power, achieving higher accuracy with lower latency.
http://arxiv.org/abs/2409.16202v1
Compressor summary: CJEval is a new benchmark for evaluating Large Language Models' performance in educational applications, built on Chinese junior high school exam questions with detailed annotations.
http://arxiv.org/abs/2409.16198v1
Compressor summary: The paper proposes AiRTran, a method to select the best PLM for text ranking by estimating its expected rank, which captures the model's ranking performance better than previous methods.
http://arxiv.org/abs/2409.16197v1
Compressor summary: The paper presents novel algorithms for contextual bandits with function approximation that improve on the regret bounds of optimism-based methods such as optimistic least squares when the reward variances are unknown and vary over time.
http://arxiv.org/abs/2409.16191v1
Compressor summary: HelloBench is a benchmark for evaluating large language models' long text generation capabilities using five subtasks and a human-aligned evaluation method called HelloEval.
http://arxiv.org/abs/2409.16183v1
Compressor summary: RadFound is a vision-language foundation model tailored for radiology tasks that uses an extensive dataset to perform better than existing models on various multimodal interpretation and generation tasks.
http://arxiv.org/abs/2409.16178v1
Compressor summary: SDFit is a novel framework that uses learned signed-distance-function models as a strong shape prior to recover 3D object pose and shape from single images, refining estimates through an iterative render-and-compare process.
http://arxiv.org/abs/2409.16167v1
Compressor summary: The paper introduces Minimal Semantic Units (MSUs) for fine-tuning large language models by disassembling and reassembling multiple Low-Rank Adaptations (LoRAs), improving performance with the LoRA-LEGO framework.
http://arxiv.org/abs/2409.16165v1
Compressor summary: EnIGMA is a language model agent for solving Capture The Flag challenges with new Agent-Computer Interfaces and interactive command-line utilities, achieving state-of-the-art results on two benchmarks.
http://arxiv.org/abs/2409.16160v1
Compressor summary: MIMO is a novel framework that can generate realistic character videos with controllable attributes using spatial decomposition and encoded signals, achieving scalability, generality, and applicability in various scenarios.
http://arxiv.org/abs/2409.16159v1
Compressor summary: The text describes a pipeline that uses Vision-Language Models to generate informative captions for comic panels, along with a new metric and test set to evaluate them.
http://arxiv.org/abs/2409.16146v1
Compressor summary: The paper proposes a counterfactual prompting framework for RAG models to improve predictive uncertainty and control risks in their predictions.
http://arxiv.org/abs/2409.16143v1
Compressor summary: The paper presents a dataset of faces seen in random objects, studies human and machine face detection, and proposes a statistical model of pareidolia.
http://arxiv.org/abs/2409.16136v1
Compressor summary: The paper proposes a method to improve mainstream open-vocabulary object detection (OVD) models by highlighting fine-grained attributes in the input text through explicit linear composition, achieving better results.
http://arxiv.org/abs/2409.16133v1
Compressor summary: The text discusses using Item Response Theory in computer-aided language learning to assess learner proficiency efficiently and accurately through adaptive tests and exercise sessions.
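For readers unfamiliar with Item Response Theory, the sketch below shows the textbook two-parameter logistic (2PL) model and a common adaptive-testing loop: pick the item with maximum Fisher information at the current ability estimate, then re-estimate ability by maximum likelihood. This is a generic illustration, not the paper's system; the `Item` class, the grid-search estimator, and the item parameters are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    a: float  # discrimination
    b: float  # difficulty


def p_correct(theta: float, item: Item) -> float:
    # 2PL: P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))


def information(theta: float, item: Item) -> float:
    # Fisher information of a 2PL item: a^2 * p * (1 - p)
    p = p_correct(theta, item)
    return item.a ** 2 * p * (1 - p)


def next_item(theta: float, pool: list) -> Item:
    # Adaptive testing: choose the item most informative at the current ability estimate.
    return max(pool, key=lambda it: information(theta, it))


def update_theta(responses: list, grid=None) -> float:
    # Crude maximum-likelihood ability estimate over a grid of theta values.
    grid = grid or [x / 10 for x in range(-40, 41)]

    def loglik(theta: float) -> float:
        total = 0.0
        for item, correct in responses:
            p = p_correct(theta, item)
            total += math.log(p if correct else 1 - p)
        return total

    return max(grid, key=loglik)
```

Maximum-information selection is only one heuristic; it tends to pick items whose difficulty is close to the learner's current estimated ability, which is what makes adaptive tests shorter than fixed ones.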
http://arxiv.org/abs/2409.16126v1
Compressor summary: VisioPhysioENet is a new system that uses both visual and physiological cues to accurately measure learner engagement in online education.
http://arxiv.org/abs/2409.16125v1
Compressor summary: The text proposes two methods for estimating AI agent capabilities on rare tasks, but both introduce bias and underestimate performance.
http://arxiv.org/abs/2409.16118v1
Compressor summary: TabEBM is a novel generative method that creates a distinct energy-based model (EBM) for each class to generate high-quality synthetic tabular data for data augmentation, which improves classification performance.
http://arxiv.org/abs/2409.16112v1
Compressor summary: This work shows how to interpret self-attention in transformer models as a derivative of local energy terms, enabling a recurrent model without backpropagation that can learn from train and test examples.
http://arxiv.org/abs/2409.16099v1
Compressor summary: The text discusses the challenges of drone detection using RGB cameras and proposes a new model that combines neuromorphic and RGB data to improve detection performance.
http://arxiv.org/abs/2409.16098v1
Compressor summary: The paper proposes a platform that uses AI and Reinforcement Learning to optimize mobile health applications for various use cases, including supply chain, patient management, and capacity building, with potential benefits for resource-poor settings and efficiency improvements in general.
http://arxiv.org/abs/2409.16096v1
Compressor summary: HINTQA is a novel method for preparing context for QA systems by using LLMs to generate hints about potential answers, outperforming retrieval-based and generation-based approaches.
http://arxiv.org/abs/2409.16089v1
Compressor summary: The study proposes a chatbot that explains how face recognition works by combining explainable AI and natural language processing techniques, enhancing its interpretability without sacrificing accuracy.
http://arxiv.org/abs/2409.16086v1
Compressor summary: The paper studies how different hyperparameters in neural networks affect their simplicity and stability when using the MNIST dataset.
http://arxiv.org/abs/2409.16084v1
Compressor summary: The paper introduces MM-CamObj, a new dataset for camouflaged objects in visual-language tasks, and CamObj-Llava, an LVLM model that uses curriculum learning to improve its performance on these tasks.
http://arxiv.org/abs/2409.16082v1
Compressor summary: GS-Net uses global self-attention to improve multi-stage glaucoma classification from retinal fundus images, outperforming existing methods.
http://arxiv.org/abs/2409.16073v1
Compressor summary: The paper proposes a method to train an object detector that can detect novel objects and extract rich features in open-world scenarios using Vision Foundation Models, improving robustness and generalizability for various applications.
http://arxiv.org/abs/2409.16071v1
Compressor summary: The text discusses soft label learning, which considers uncertainty in class labels, and shows its potential benefits for classification models, especially when dealing with noisy or limited data.
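As a minimal sketch of what "soft labels" means in practice: instead of a one-hot target, the cross-entropy loss is taken against a probability distribution over classes. The example target (for instance, vote shares from several hypothetical annotators) is an assumption for illustration, not the paper's setup.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def soft_cross_entropy(logits, target):
    # target is a probability distribution over classes, e.g. annotator vote shares
    # such as [0.3, 0.7] instead of a hard one-hot label like [0.0, 1.0].
    assert abs(sum(target) - 1.0) < 1e-6
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
```

The loss is minimized when the predicted distribution matches the target, so the model is rewarded for expressing the same uncertainty the labels carry rather than being pushed toward full confidence.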
http://arxiv.org/abs/2409.16069v1
Compressor summary: The text reviews computer vision techniques for automatically, non-destructively, and cost-effectively detecting defects in solar PV modules, which reduce their efficiency and environmental benefit. Existing approaches mostly use deep learning, especially convolutional neural networks, whose interpretability analysis reveals a focus on darker image regions, and gaps remain in the geometric, physics-based, and interpretability aspects of these models.
http://arxiv.org/abs/2409.16063v1
Compressor summary: The study presents a benchmark for assessing the robustness of endoscopic depth estimation models, introducing the Depth Estimation Robustness Score (DERS) to improve surgical precision and patient safety.
http://arxiv.org/abs/2409.16058v1
Compressor summary: The text proposes a deep learning method to model and generate synthetic aortic shapes using neural signed distance fields and trainable embedding vectors, trained on a dataset of aortic root meshes from CT images.
http://arxiv.org/abs/2409.16057v1
Compressor summary: The paper proposes a tailored framework to detect and remove backdoors in object detection models by analyzing inconsistencies between their modules.
http://arxiv.org/abs/2409.16045v1
Compressor summary: LTN is a framework that combines deep learning and logical reasoning using fuzzy logic, allowing neural models to be optimized by minimizing a loss function based on logical formulas.
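A minimal sketch of how a logical formula can become a differentiable loss under fuzzy logic. It assumes one commonly used set of operators (product t-norm, probabilistic sum, Reichenbach implication, and a p-mean-error aggregator for "for all"); LTN's actual configuration may differ. The formula "for all x: Cat(x) implies Animal(x)" and the truth scores fed to it are hypothetical stand-ins for neural predicate outputs in [0, 1].

```python
def t_and(a: float, b: float) -> float:
    return a * b  # product t-norm


def t_or(a: float, b: float) -> float:
    return a + b - a * b  # probabilistic sum


def t_not(a: float) -> float:
    return 1.0 - a


def implies(a: float, b: float) -> float:
    return 1.0 - a + a * b  # Reichenbach implication


def forall(values, p: float = 2) -> float:
    # p-mean error aggregator: 1 - (mean((1 - v)^p))^(1/p)
    err = sum((1.0 - v) ** p for v in values) / len(values)
    return 1.0 - err ** (1.0 / p)


def loss(cat_scores, animal_scores, p: float = 2) -> float:
    # Satisfaction of "for all x: Cat(x) -> Animal(x)"; training minimizes 1 - satisfaction.
    sat = forall([implies(c, a) for c, a in zip(cat_scores, animal_scores)], p)
    return 1.0 - sat
```

Because every operator is a smooth function of the predicate scores, gradients of this loss flow back into whatever networks produce those scores.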
http://arxiv.org/abs/2409.16040v1
Compressor summary: Time-MoE is a scalable and efficient deep learning architecture for time series forecasting that leverages mixture-of-experts and large-scale pre-training on Time-300B data, achieving better precision and reducing costs.
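The efficiency claim rests on sparse mixture-of-experts routing, where each input activates only a few experts. The sketch below shows generic top-k softmax gating; it is not Time-MoE's exact router, and the gate vectors and toy experts are illustrative assumptions (experts are assumed to return vectors of the same length as the input).

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def moe_forward(x: Vector, gate_weights: List[Vector],
                experts: List[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    # Gate score per expert: dot product of its gate vector with the input.
    logits = [sum(w * xi for w, xi in zip(g, x)) for g in gate_weights]
    probs = softmax(logits)
    # Keep only the k highest-scoring experts (sparse activation).
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)  # only the selected experts are evaluated
        for j, v in enumerate(y):
            out[j] += probs[i] / norm * v
    return out
```

Since compute scales with k rather than with the total number of experts, the parameter count can grow without a proportional increase in per-input cost.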
http://arxiv.org/abs/2409.16025v1
Compressor summary: MCPQA is a new task that involves answering product questions from different markets and languages using information from other marketplaces, and a large dataset (McMarket) with over 7 million questions across 11 languages was created to evaluate this task.
http://arxiv.org/abs/2409.16024v1
Compressor summary: The paper proposes a novel method for building language-conditioned agents by first finding an environment configuration that matches the desired task description, then using a goal-conditioned policy to reach it, improving speed and quality with distilled models and multiple viewpoints.
http://arxiv.org/abs/2409.16022v1
Compressor summary: This text investigates whether large language models are affected by the threshold priming effect, a cognitive bias that influences relevance judgments, and suggests considering human-like biases when designing and evaluating these models.
http://arxiv.org/abs/2409.16005v1
Compressor summary: This paper proposes a novel training approach to enhance large language models' performance in automatic speech recognition by pre-training them on Pinyin embedding sequences and fine-tuning their parameters.
http://arxiv.org/abs/2409.16002v1
Compressor summary: The text discusses using generative models and image selection methods to create realistic synthetic histopathology image patches for enhancing image classification tasks.
http://arxiv.org/abs/2409.16001v1
Compressor summary: This text discusses how human intelligence, evolved over time, now interacts with artificial intelligence, shaping its development and ethical considerations for future advancements in AI.
http://arxiv.org/abs/2409.15997v1
Compressor summary: The report describes modifications to SDXL for creating high-quality anime images with NovelAI Diffusion V3.
http://arxiv.org/abs/2409.15986v1
Compressor summary: This study examines how three common anomaly detection metrics behave under different conditions and challenges conventional understanding of their reliability and distinctiveness.
http://arxiv.org/abs/2409.15985v1
Compressor summary: The paper presents a system that converts natural language queries into SQL commands using fine-tuned models and datasets, improving data access for non-experts and achieving high accuracy.
http://arxiv.org/abs/2409.15980v1
Compressor summary: The study presents a low-cost visual anomaly detection system using unsupervised learning models and Raspberry Pi hardware, which achieves high accuracy with minimal data.
http://arxiv.org/abs/2409.15979v1
Compressor summary: The authors propose a finetuning method for large language models to improve their efficiency and accuracy in natural language generation assessment using comparative probabilities.
http://arxiv.org/abs/2409.15973v1
Compressor summary: The text discusses challenges and solutions for running deep learning at the edge, focusing on selective collaborative schemes that reduce data redundancy and improve performance metrics.
http://arxiv.org/abs/2409.15968v1
Compressor summary: Adversarial Backdoor Defense (ABD) is a novel data augmentation strategy that aligns features with adversarial examples to disrupt backdoor associations and provide robust defense against multimodal backdoor attacks targeting CLIP-like models.
http://arxiv.org/abs/2409.15963v1
Compressor summary: The paper introduces two efficient exploration algorithms for Inverse Constrained Reinforcement Learning that reduce error and strategically constrain the exploration policy, and shows their effectiveness in different environments.
http://arxiv.org/abs/2409.15953v1
Compressor summary: The text introduces the Prompt-Aware Counting (PrACo) benchmark to evaluate the ability of vision-and-language models to understand and count objects based on natural language prompts, addressing limitations in current evaluation protocols.
http://arxiv.org/abs/2409.15949v1
Compressor summary: The paper analyzes gender bias in English song lyrics using topic modeling, clustering, and word embedding techniques, finding thematic shifts over time and varying biases across topics and genres.
http://arxiv.org/abs/2409.15950v1
Compressor summary: TSFeatLIME is an explainable AI technique for time series forecasting that uses feature integration and Euclidean distances to improve surrogate model fidelity and user understanding, especially for non-experts.
http://arxiv.org/abs/2409.15939v1
Compressor summary: The paper proposes a non-adversarial self-supervised method for 3D shape completion that leverages correspondences and involves an involutory constraint on the completion function.
http://arxiv.org/abs/2409.15934v1
Compressor summary: The paper introduces a test generation pipeline to evaluate large language models as conversational AI agents, using a new dataset for customer support scenarios.
http://arxiv.org/abs/2409.15933v1
Compressor summary: The paper proposes a framework for zero-shot Named Entity Recognition in Italian using instruction-tuning and demonstrates its effectiveness compared to other models.
http://arxiv.org/abs/2409.15931v1
Compressor summary: The paper proposes a method for registering noninvasive second-harmonic microscopy images with hematoxylin and eosin slides using keypoint matching and deformable registration, achieving good alignment and low registration error on a challenge dataset.
http://arxiv.org/abs/2409.15927v1
Compressor summary: The text studies how facial symmetry affects the performance of black box models in recognizing expressions and shows that reduced symmetry lowers output activations for all investigated classifiers.
http://arxiv.org/abs/2409.15924v1
Compressor summary: The article describes Huawei's participation and strategies for translating Spanish into three low-resource languages at WMT 2024.
http://arxiv.org/abs/2409.15922v1
Compressor summary: VLM-based reward models can improve agent performance in sparse reward environments, but only if they use the novel noise-resilient reward function BiMI to avoid false positive rewards.
http://arxiv.org/abs/2409.15919v1
Compressor summary: The paper proposes C3R, a method that learns compact channel correlation representation for LiDAR place recognition, reducing computation and improving accuracy.
http://arxiv.org/abs/2409.15915v1
Compressor summary: The text proposes a novel approach to improve natural language planning with large language models by generating multiple action schemas and ranking them without expert intervention.
http://arxiv.org/abs/2409.15912v1
Compressor summary: SMER is a new feature importance method for word embedding-based models that provides perfect fidelity and better explanations than LIME for predicting impactful research articles.
http://arxiv.org/abs/2409.15911v1
Compressor summary: The paper proposes a new method called Modular Gradient Conflict Mitigation (MGCM) that improves simultaneous speech translation performance and reduces GPU memory consumption by resolving optimization conflicts at a modular level using gradient projection.
http://arxiv.org/abs/2409.15910v1
Compressor summary: The paper describes a new app that lets plants "talk" to humans using soil sensors, AI language models, and real-time data to provide insights on their health and mood, improving plant care and promoting sustainability.
http://arxiv.org/abs/2409.15907v1
Compressor summary: The paper proposes a method to improve LLMs' performance in Text-to-SQL tasks by injecting domain-specific database knowledge, which reduces errors and shows generalizability.
http://arxiv.org/abs/2409.15904v1
Compressor summary: Unimotion is a novel human motion model that enables flexible control of avatar motion with global or local text inputs and provides frame-level text and pose outputs for various applications.
http://arxiv.org/abs/2409.15903v1
Compressor summary: This paper discusses the origins, future, emotions, risks, and singularity of Artificial Intelligence amidst society's controversy and fear.
http://arxiv.org/abs/2409.15902v1
Compressor summary: Konstruktor is an approach that uses structured knowledge graphs to answer simple questions with complex entities, integrating language models and knowledge graphs for entity extraction, relation prediction, and querying the knowledge graph.
http://arxiv.org/abs/2409.15893v1
Compressor summary: The paper introduces a novel unsupervised domain adaptation method for oracle character recognition that considers visual perceptual plausibility and enforces attention consistency and separability, improving interpretability and outperforming previous methods on the Oracle-241 dataset by 8.5%.
http://arxiv.org/abs/2409.15892v1
Compressor summary: The paper explores how state symmetries affect planning and generalized planning, and evaluates the expressive requirements for learning general policies using different methods.
http://arxiv.org/abs/2409.15890v1
Compressor summary: This paper presents a benchmark to evaluate how well language models mimic human communication using 10 psycholinguistic experiments and human responses.
http://arxiv.org/abs/2409.15887v1
Compressor summary: The proposed framework combines manifold learning with K-means to achieve one-step dimensionality reduction clustering without hyperparameters or class imbalance issues.
http://arxiv.org/abs/2409.15882v1
Compressor summary: The article proposes a new method to anonymize speech by disentangling its components and modifying speaker identity while keeping linguistic and emotional content intact, which works well for preserving emotions but needs improvement for other privacy tasks.
http://arxiv.org/abs/2409.15879v1
Compressor summary: For the WMT24 Indian Languages MT Shared Task, HW-TSC applied two knowledge transfer strategies, fine-tuning and multilingual model transfer learning, to four low-resource languages (Assamese, Manipuri, Khasi, and Mizo), achieving high BLEU scores.
http://arxiv.org/abs/2409.15875v1
Compressor summary: The ZED detector uses a lossless image encoder to measure the surprise of AI-generated images compared to a model of real images, without needing training data or knowledge of generative architectures.
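One simple way to quantify "surprise" with a lossless coder is to compare the ideal code length of the data under a probability model with the code length the model itself expects. The sketch below does this for a toy i.i.d. symbol model; it only illustrates the coding-cost idea and is not ZED's actual statistic, which relies on a learned model of real images. The function names and probabilities are assumptions.

```python
import math


def coding_cost_bits(symbols, model):
    # Ideal lossless code length: sum of -log2 p(symbol) under the model.
    return sum(-math.log2(model[s]) for s in symbols)


def expected_cost_bits(n, model):
    # Expected code length for n symbols drawn from the model itself: n * entropy.
    entropy = -sum(p * math.log2(p) for p in model.values() if p > 0)
    return n * entropy


def surprise_gap(symbols, model):
    # Per-symbol difference between actual and expected coding cost;
    # values far from zero indicate data that does not behave like the model's data.
    return (coding_cost_bits(symbols, model) - expected_cost_bits(len(symbols), model)) / len(symbols)
```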
http://arxiv.org/abs/2409.15868v1
Compressor summary: The paper presents a benchmark to assess privacy risks in NLP models, studies the impact of auxiliary data on attacks, and proposes an improved attack method using Knowledge Distillation and a chained framework for multiple attacks.
http://arxiv.org/abs/2409.15867v1
Compressor summary: In-context ensemble learning improves video-language models' ability to generate Standard Operating Procedures from demonstration videos.
http://arxiv.org/abs/2409.15861v1
Compressor summary: The authors propose a zero-shot, open-vocabulary system for dialogue state tracking that integrates domain classification and refines question-answering tasks, achieving better performance on the MultiWOZ 2.1 dataset with fewer LLM API requests.
http://arxiv.org/abs/2409.15848v1
Compressor summary: The paper proposes using visual analytics to guide the creation of synthetic data for text classification, addressing data deficiencies and improving model accuracy.
http://arxiv.org/abs/2409.15843v1
Compressor summary: SAM is an innovative online education platform that uses AI to provide personalized, context-specific assistance and improve learning outcomes.
http://arxiv.org/abs/2409.15834v1
Compressor summary: The paper introduces a large dataset for cephalometric landmark detection and evaluates state-of-the-art deep learning methods, achieving high accuracy with room for improvement.
http://arxiv.org/abs/2409.15827v1
Compressor summary: This study explores how large language models capture linguistic aspects using psycholinguistic paradigms and reveals that specific neurons correspond to different competencies.
http://arxiv.org/abs/2409.15825v1
Compressor summary: The text explores how to effectively fine-tune large language models for question-answering tasks by categorizing and analyzing supervised fine-tuning data based on memorized knowledge.
http://arxiv.org/abs/2409.15820v1
Compressor summary: This paper investigates how large language models (LLMs) adapt to complex tasks with scarce data and proposes a gradient-based method to improve their efficiency and effectiveness using attention patterns.
http://arxiv.org/abs/2409.15817v1
Compressor summary: The text describes how artificial intelligence systems, especially Large Language Models (LLMs), can be used to improve drug discovery by enhancing their accuracy, incorporating external tools, and generating target dossiers.
http://arxiv.org/abs/2409.15815v1
Compressor summary: AsthmaBot is a multi-lingual, multi-modal system that uses large language models and curated documents to provide accurate and interactive asthma support, especially in developing countries with limited medical care access.
http://arxiv.org/abs/2409.15812v1
Compressor summary: Stable Diffusion is fine-tuned using four methods to assist bridge-type innovation, generating new and inspiring designs for human designers.
http://arxiv.org/abs/2409.15810v1
Compressor summary: HyperIPC is a hyperbolic contrastive learning method that improves object classification and few-shot learning for multi-modal data by exploring intra-modal and cross-modal correlations in hyperbolic space.
http://arxiv.org/abs/2409.15806v1
Compressor summary: The paper proposes a pre-training method for state representations that improves performance in reinforcement learning, navigation, and multimodal language models.
http://arxiv.org/abs/2409.15804v1
Compressor summary: The study develops a named-entity recognition model for the fashion and luxury industry, addressing challenges such as entity disambiguation, French jargon, and diverse company structures.
http://arxiv.org/abs/2409.15803v1
Compressor summary: 3D-JEPA is a non-generative 3D self-supervised representation learning framework that uses multi-block sampling and context-aware decoding to improve semantic modeling and efficiency on downstream tasks.
http://arxiv.org/abs/2409.15794v1
Compressor summary: The paper proposes a foundation model for natural gas demand forecasting that leverages contrastive learning and industry-specific fine-tuning, outperforming existing methods in accuracy.
http://arxiv.org/abs/2409.15790v1
Compressor summary: The text is a survey of 59 small language models, analyzing their innovations and performance for on-device tasks.
http://arxiv.org/abs/2409.15781v1
Compressor summary: The proposed method detects unauthorized use of text-to-image models by exploiting their memorization characteristic and identifying consistent behaviors on specific samples.
http://arxiv.org/abs/2409.15771v1
Compressor summary: Foundation models can make competitive forecasts of chaotic systems without explicit re-training or fine-tuning, preserving their long-term behavior even when point forecasts fail.
http://arxiv.org/abs/2409.15766v1
Compressor summary: CHBench is a benchmark dataset to evaluate the performance of large language models on health-related topics in Chinese, revealing their limitations and potential for improvement.
http://arxiv.org/abs/2409.15764v1
Compressor summary: The paper proposes ST-MoGE, a framework for collectively predicting multiple crime types that uses attentive-gated MGEs to capture diverse and shared crime patterns, CECL (a contrastive learning component) to reduce blending and redundancy among experts, and HALR (a loss re-weighting scheme) to address imbalanced spatial distribution.
http://arxiv.org/abs/2409.15762v1
Compressor summary: XTRUST is a benchmark to evaluate the trustworthiness of large language models across 10 languages and various topics, revealing their strengths and weaknesses.
http://arxiv.org/abs/2409.15761v1
Compressor summary: The paper presents a new algorithmic framework for training-free guidance in conditional generation, improving performance across different diffusion models and tasks.
http://arxiv.org/abs/2409.15753v1
Compressor summary: The study proposes an AI system that optimizes heparin dosing for ICU patients based on their individual conditions using reinforcement learning, improving safety and efficacy.
http://arxiv.org/abs/2409.15750v1
Compressor summary: The paper surveys generative artificial intelligence (GenAI) applications in the Internet of electric vehicles (IoEV), categorizing them into four layers and providing recommendations for future research.
http://arxiv.org/abs/2409.15749v1
Compressor summary: This paper presents an automated grading system using AI for STEM subjects that evaluates textual answers and diagrams, such as flowcharts, by converting them into textual representations and comparing them with sample answers.
http://arxiv.org/abs/2409.15747v1
Compressor summary: The authors propose a method to make neural networks more interpretable by splitting them into non-interacting clusters that learn different parts of the task.
http://arxiv.org/abs/2409.15740v1
Compressor summary: The text describes using lightweight deep learning techniques on edge servers for real-time pedestrian detection in intelligent transportation systems.
http://arxiv.org/abs/2409.15739v1
Compressor summary: T3-DiffWeather is a novel pipeline that uses a prompt pool to adaptively handle intricate weather degradations and achieve state-of-the-art performance in adverse weather restoration.
http://arxiv.org/abs/2409.15732v1
Compressor summary: The paper proposes an attention-based encoder-decoder method for transcribing multi-speaker utterances in real-world scenarios that uses speaker cluster tokens and clusters hypotheses by agglomerative hierarchical clustering (AHC) based on edit distance, proving effective especially in complex 3-mix environments.
http://arxiv.org/abs/2409.15721v1
Compressor summary: The paper proposes a new version of the Binary-Addition-Tree algorithm that uses incremental learning to adapt to dynamic and large-scale networks, improving efficiency and solution quality.
http://arxiv.org/abs/2409.15715v1
Compressor summary: The paper proposes a new method to improve triplane-based radiance fields for 3D scene disentanglement with better camera pose estimation and faster optimization.
http://arxiv.org/abs/2409.15699v1
Compressor summary: FlexRAG is a novel approach that compresses retrieved contexts into embeddings to enhance question-answering performance while reducing costs.
http://arxiv.org/abs/2409.15698v1
Compressor summary: GraphGI is a novel graph explanation technique that identifies and presents the subgraph with the highest interaction strength to explain GNN predictions using game-theoretic values.
http://arxiv.org/abs/2409.15690v1
Compressor summary: The paper surveys stance detection techniques that automatically detect users' opinions on contentious topics in social media, discussing their benefits and limitations for understanding public sentiment.
http://arxiv.org/abs/2409.15689v1
Compressor summary: The paper introduces a novel compact 3D representation called PPNG that encodes a scene from 2D images and allows real-time rendering on various platforms.
http://arxiv.org/abs/2409.15687v1
Compressor summary: The study evaluates large language models' abilities in mental health tasks using social media data and finds that prompt engineering and few-shot learning improve their accuracy, while highlighting challenges such as dataset variability and ethical concerns.
http://arxiv.org/abs/2409.15682v1
Compressor summary: The paper introduces a framework to address interference in linear contextual bandits, providing algorithms with theoretical guarantees for sublinear regret and other properties.
http://arxiv.org/abs/2409.15680v1
Compressor summary: The paper introduces an online bandit optimization algorithm for nonconvex loss functions over a time-varying digraph that uses one-point residual feedback, which estimates the gradient from two points, to reduce the regret bound; evaluated by dynamic regret, its performance is comparable to existing algorithms.
http://arxiv.org/abs/2409.15679v1
Compressor summary: The authors create new datasets for UAV-based pest and disease detection in crops and develop a high-precision object detection model called YOLO-Dense Pest.
http://arxiv.org/abs/2409.15664v1
Compressor summary: ORACLE is a new method to improve cross-lingual sentence embeddings by reducing semantic leakage and enhancing semantic alignment using a novel training objective.
http://arxiv.org/abs/2409.15662v1
Compressor summary: DPA-STIFormer is a new STGNN that extracts dynamic spatial information from stock data using continuous feature changes as tokens and a novel double-path fusion mechanism, achieving state-of-the-art results in stock prediction tasks.
http://arxiv.org/abs/2409.15657v1
Compressor summary: The paper proposes Multimodal Prompt Tuning, a method that finetunes pretrained models on multimodal tasks using visual and textual prompts, improving zero-shot generalization for unseen domains.
http://arxiv.org/abs/2409.15652v1
Compressor summary: The authors propose a new model that uses both Bi-GRU and CNN to automatically identify and filter inappropriate content on social media platforms.
http://arxiv.org/abs/2409.15650v1
Compressor summary: ImPoster is an unsupervised algorithm that generates target images of a source subject performing a driving action using text descriptions, pretrained latent diffusion models, and image frequency guidance.
http://arxiv.org/abs/2409.15647v1
Compressor summary: Looped Transformers with adaptive steps improve the ability of AI models to handle inputs of different lengths, especially for tasks requiring multiple iterations.
http://arxiv.org/abs/2409.15637v1
Compressor summary: Synatra uses indirect knowledge, such as online tutorials, to create direct supervision for large language models, improving their performance on web-based tasks and reducing data costs.
http://arxiv.org/abs/2409.15631v1
Compressor summary: The article proposes a framework that augments learning performance data using tensor factorization and generative AI, addressing data sparsity in adaptive learning systems.