This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-01, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.21794v1
Compressor summary: This text summarizes the evolution of out-of-distribution detection and related problems in vision language models, highlights the changes in definitions and benchmarks, and discusses future challenges and directions.
http://arxiv.org/abs/2407.21792v1
Compressor summary: The paper analyzes AI safety benchmarks and their relationship with general capabilities, suggesting that many benchmarks may be misleadingly correlated with capability improvements, and proposing a clearer framework for AI safety research.
http://arxiv.org/abs/2407.21788v1
Compressor summary: The paper explores using vision language models to improve handwriting verification by providing clear explanations and adapting to diverse styles, but finds that CNN-based ResNet-18 performs better.
http://arxiv.org/abs/2407.21787v1
Compressor summary: Repeated inference sampling improves language model performance on various tasks by increasing coverage and cost-effectiveness.
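As a generic illustration of the coverage metric this line of repeated-sampling work measures, here is the standard unbiased pass@k estimator (a textbook formula, not code from the paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n attempts (c of them correct),
    solves the task."""
    if n - c < k:
        return 1.0  # fewer wrong attempts than draws: a correct one is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 attempts, 2 of them correct, drawing more samples raises coverage.
print(round(pass_at_k(10, 2, 1), 2))  # 0.2
print(round(pass_at_k(10, 2, 5), 2))  # 0.78
```

Coverage rising with k is exactly the effect the summary describes: more samples make it more likely that at least one is correct.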
http://arxiv.org/abs/2407.21783v1
Compressor summary: Llama 3 is a multilingual language model with various capabilities that compares well to GPT-4 and can be integrated with other modalities like image, video, and speech.
http://arxiv.org/abs/2407.21778v1
Compressor summary: Tulip agent is an autonomous AI agent that can search for tools in a large library, reducing inference costs and enabling adaptation and extension of its tool set.
http://arxiv.org/abs/2407.21773v1
Compressor summary: RainMamba is a new video deraining method that uses state space models, Hilbert scanning, and dynamic contrastive locality learning to effectively remove rain from outdoor vision systems.
http://arxiv.org/abs/2407.21772v1
Compressor summary: ShieldGemma is a suite of models that use large language models to accurately predict safety risks in user input and generated output, outperforming existing models and providing a valuable resource to the research community.
http://arxiv.org/abs/2407.21771v1
Compressor summary: The paper proposes an algorithm to address text inertia and reduce hallucination in large vision-language models by adjusting attention weights and subtracting logits of multi-modal inputs.
http://arxiv.org/abs/2407.21770v1
Compressor summary: MoMa is a new architecture that improves the efficiency of mixed-modal language models by dividing expert modules into modality-specific groups and routing them to optimize pre-training.
http://arxiv.org/abs/2407.21757v1
Compressor summary: The paper introduces MovieSeq, a multimodal language model that represents videos as interleaved sequences of images, plots, videos, subtitles, and other information to improve understanding and interaction with narrative videos.
http://arxiv.org/abs/2407.21742v1
Compressor summary: HGOE is a model-agnostic framework that uses external and internal outliers to improve OOD detection for graph data by adaptively assigning weights to them with a boundary-aware loss function.
http://arxiv.org/abs/2407.21740v1
Compressor summary: Contrastive Factor Analysis is a novel framework that combines contrastive learning and factor analysis to leverage their respective advantages in unsupervised representational learning.
http://arxiv.org/abs/2407.21735v1
Compressor summary: The EventMatch framework uses a single model to perform optical flow, stereo matching, and depth estimation with event cameras by comparing feature similarities across different inputs.
http://arxiv.org/abs/2407.21729v1
Compressor summary: The paper proposes an improved local search solver for Pseudo-Boolean Optimization problems that balances hard constraints and objective function scores, and develops a parallel version that shares solutions to guide the search and enhances scoring with polarity density.
http://arxiv.org/abs/2407.21726v1
Compressor summary: The paper discusses AI applications for energy efficiency in smart buildings, focusing on multi-agent systems, IoT, Big Data, anomaly detection, and classifications of Intelligent Energy Management Systems.
http://arxiv.org/abs/2407.21720v1
Compressor summary: The authors present a method to detect and mitigate memorization in diffusion models, ensuring that generated images are not replications of training data and addressing legal concerns.
http://arxiv.org/abs/2407.21717v1
Compressor summary: The text discusses the need for oversight of AI technologies by policymakers who lack technical knowledge, and provides an overview of existing guidelines and regulations at various levels.
http://arxiv.org/abs/2407.21714v1
Compressor summary: UMMAN is a novel method that uses Graph Neural Networks to predict intestinal flora diseases by learning the complex associations among gut microbes in an unsupervised way.
http://arxiv.org/abs/2407.21713v1
Compressor summary: The text discusses the role of social learning in human intelligence development and its potential application in machine learning, focusing on the use of embodied agents and natural language processing techniques.
http://arxiv.org/abs/2407.21712v1
Compressor summary: The paper proposes a gating model (RAGate) to determine whether external knowledge is needed for improved system responses in conversational systems, based on human judgements and conversation context.
http://arxiv.org/abs/2407.21708v1
Compressor summary: The authors propose a method to recognize chemical entities and their roles in scientific text using ontological knowledge from ChEBI and language understanding from LLMs, and create a knowledge graph (CEAR) to extend ChEBI.
http://arxiv.org/abs/2407.21705v1
Compressor summary: Tora is a framework that generates videos with controllable motion by integrating trajectory information into a diffusion transformer model, enabling high-quality and dynamic video generation.
http://arxiv.org/abs/2407.21703v1
Compressor summary: Forgedit is a text-guided image editing method that can handle complex problems by remembering and understanding input images during finetuning, and uses a simple workflow with efficient hyper-parameter tuning for editing.
http://arxiv.org/abs/2407.21693v1
Compressor summary: The study introduces a new dataset for task-oriented dialogue systems, called TransferTOD, which simulates human-machine conversations in 30 life service scenarios and improves the performance of large language models in information gathering.
http://arxiv.org/abs/2407.21691v1
Compressor summary: This study uses video-based group activity recognition to develop a machine learning model that can objectively and continuously quantify behaviors in autism spectrum disorder (ASD) in real-world classroom environments, helping to track intervention effectiveness and allocate resources.
http://arxiv.org/abs/2407.21687v1
Compressor summary: The paper proposes a Transformer-based method for incremental object detection that uses dynamic learnable queries to represent new and old classes and mitigates catastrophic forgetting through bipartite matching and risk-balanced calibration.
http://arxiv.org/abs/2407.21686v1
Compressor summary: ExAvatar is a 3D human avatar that learns from monocular video and supports expressive whole-body movements, addressing challenges like limited diversity and absent 3D observations with a hybrid representation of mesh and 3D Gaussians.
http://arxiv.org/abs/2407.21674v1
Compressor summary: Synthetic data can introduce simplicity bias in neural networks when there is a strong correlation between the data source and the task label, leading to poor deployment performance.
http://arxiv.org/abs/2407.21670v1
Compressor summary: This paper presents a parallelization strategy for deep learning models based on the Universal Approximation Theorem to reduce training and inference times as more layers are added.
http://arxiv.org/abs/2407.21669v1
Compressor summary: The paper introduces Synth-Empathy, a system that uses large language models to generate and select high-quality empathetic data, improving empathetic response performance and achieving state-of-the-art results on benchmarks and human evaluations.
http://arxiv.org/abs/2407.21666v1
Compressor summary: The text describes an explainable deep learning pipeline using vision transformers that detects drought stress in potato crops from aerial images, achieving high accuracy and providing insights into the plant features associated with stress.
http://arxiv.org/abs/2407.21659v1
Compressor summary: The text introduces CIDER, a cross-modality information detector that uses image and text similarity to detect jailbreaking attacks on vision language models.
http://arxiv.org/abs/2407.21656v1
Compressor summary: Comgra is a PyTorch library that helps inspect neural networks by visualizing their internal activations and gradients in a GUI.
http://arxiv.org/abs/2407.21652v1
Compressor summary: The paper proposes a method to improve YOLO's performance in object detection by integrating spatial transformer networks, which focus on important image areas and enhance spatial invariance.
http://arxiv.org/abs/2407.21647v1
Compressor summary: The study finds that using an SVM model with Cohere embeddings is the best way to classify human interactions in AIDA, a chatbot system, based on speed and accuracy.
http://arxiv.org/abs/2407.21646v1
Compressor summary: CLASI is a human-like simultaneous speech translation system that uses a data-driven read-write strategy and multi-modal retrieval to convey information accurately and efficiently in various languages and scenarios.
http://arxiv.org/abs/2407.21642v1
Compressor summary: The paper proposes a principled way to adapt time weighting in Physics-Informed Neural Networks using Lyapunov exponents to handle different dynamics.
http://arxiv.org/abs/2407.21638v1
Compressor summary: The authors propose a framework for quality control of AI-generated radiology reports by using auxiliary auditing components that assess the reliability and importance of the diagnoses.
http://arxiv.org/abs/2407.21633v1
Compressor summary: The paper proposes Dual Low-Rank Adaptation (DualLoRA), a method to improve zero-shot dialogue state tracking by enhancing prompt influence in transformer models without increasing inference latency.
http://arxiv.org/abs/2407.21631v1
Compressor summary: RoadFormer+ is a model that fuses different types of data for urban scene parsing, improving efficiency and performance over the previous RoadFormer model.
http://arxiv.org/abs/2407.21630v1
Compressor summary: TAROT is a new method for hiding an author's identity in a text while maintaining its usefulness, using policy optimization over small language models.
http://arxiv.org/abs/2407.21616v1
Compressor summary: The paper proposes a new event encoder for zero-shot object recognition using event camera data and synthetic RGB images, improving performance over previous methods that relied on RGB frame reconstructions.
http://arxiv.org/abs/2407.21604v1
Compressor summary: The paper introduces MicroMIL, a weakly-supervised framework that uses deep cluster embedding and Gumbel Softmax to analyze microscopy images for histopathology research, improving efficiency and accuracy over existing methods.
http://arxiv.org/abs/2407.21590v1
Compressor summary: This paper introduces IDPE, a novel method for evaluating unsupervised embeddings based on preserving Mahalanobis distances between data points in original and embedded spaces, providing a more reliable and comprehensive assessment than traditional extrinsic metrics.
http://arxiv.org/abs/2407.21581v1
Compressor summary: InScope is a new infrastructure-side 3D collaborative perception dataset, captured with LiDAR sensors mounted on the infrastructure side under the V2X paradigm, that helps autonomous vehicles detect and track occluded, small, and distant objects and provides benchmarks for evaluating V2X technologies in occlusion scenarios.
http://arxiv.org/abs/2407.21577v1
Compressor summary: The paper proposes a class-incremental learning method for echocardiography view classification that combines expert networks with a score fusion model to handle data diversity and privacy concerns.
http://arxiv.org/abs/2407.21571v1
Compressor summary: PMoE is a novel method that uses an asymmetric Transformer to reduce forgetting in large Language Models by adding progressive experts and routing new knowledge to appropriate layers.
http://arxiv.org/abs/2407.21566v1
Compressor summary: TRGR is a novel system that uses transmissive RIS to enhance gait recognition through walls using only magnitude measurements of RF signals.
http://arxiv.org/abs/2407.21560v1
Compressor summary: The study introduces a generative sentiment analysis model that addresses challenges in fine-grained sentiment analysis by using a latent category distribution variable, a variational autoencoder, and a trie data structure with constrained decoding.
http://arxiv.org/abs/2407.21556v1
Compressor summary: The paper introduces a framework to compare different semantics of choice constructs in logic programming.
http://arxiv.org/abs/2407.21554v1
Compressor summary: Prompt2Guard is a novel deepfake detection method that uses vision-language models and multimodal prompts to continuously detect photorealistic fake images without relying on prompt selection accuracy or multiple forward passes.
http://arxiv.org/abs/2407.21553v1
Compressor summary: The paper introduces a CX Simulator that uses large language models to predict user behavior transitions and simulate web-marketing campaign effects without costly online testing.
http://arxiv.org/abs/2407.21546v1
Compressor summary: This paper explores how meta-learning can enhance reinforcement learning by optimizing intrinsic rewards without using meta-gradients, and compares it to other methods in continuous control tasks with sparse rewards.
http://arxiv.org/abs/2407.21535v1
Compressor summary: The paper proposes probabilistic scoring lists (PSL), an extension of scoring systems that represent uncertainty with probability distributions, and a method for learning PSLs from data to improve explainability in AI decisions.
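A hedged sketch of the general idea behind such probabilistic scoring lists: the total score of fired rules is mapped to a probability rather than a hard decision. The feature names and the score-to-probability table below are invented for illustration, not taken from the paper:

```python
# Each rule adds points when its binary feature fires; the total score
# is then mapped to a probability estimate instead of a hard label.
RULES = [("fever", 2), ("cough", 1), ("fatigue", 1)]  # hypothetical features
SCORE_TO_PROB = {0: 0.05, 1: 0.15, 2: 0.40, 3: 0.70, 4: 0.90}  # would be learned from data

def predict_proba(patient: dict) -> float:
    """Sum the points of the rules that fire, then look up the probability."""
    score = sum(pts for feat, pts in RULES if patient.get(feat))
    return SCORE_TO_PROB[score]

print(predict_proba({"fever": True, "cough": True}))  # score 3 -> 0.7
```

The probability table is what makes the list "probabilistic": it exposes the model's uncertainty at each score level instead of a single cutoff.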
http://arxiv.org/abs/2407.21534v1
Compressor summary: The paper proposes a way to improve multimodal language models' visual referring ability by adjusting visual tokens based on text prompts during inference, without needing extra training.
http://arxiv.org/abs/2407.21530v1
Compressor summary: The CONDA 2024 workshop investigates data contamination in natural language processing and aims to create a shared task and database to collect evidence and prevent evaluation results on contaminated resources.
http://arxiv.org/abs/2407.21525v1
Compressor summary: The paper introduces Spatial-Structural GCN, a new method for skeleton-based human activity recognition that leverages both the topological structure and the dynamic similarity of edge node sequences in graph convolutional networks.
http://arxiv.org/abs/2407.21523v1
Compressor summary: The text reviews the progress and prospects of tabular data augmentation (TDA) for machine learning, covering the pre-augmentation, augmentation, and post-augmentation stages of the TDA pipeline and the trend toward generative AI approaches.
http://arxiv.org/abs/2407.21519v1
Compressor summary: PhysFlow is a method for improving remote heart rate estimation by augmenting skin diversity using conditional normalizing flows, reducing errors especially in darker skin tones.
http://arxiv.org/abs/2407.21517v1
Compressor summary: The paper proposes a simple framework to reduce computational cost in video snapshot compressive imaging using low-bit quantization and improves the performance of existing methods.
http://arxiv.org/abs/2407.21510v1
Compressor summary: PEAR is a novel model that anticipates both interaction intention and manipulation in first-person hand-object interaction, addressing uncertainties using cross-alignment and bidirectional constraints.
http://arxiv.org/abs/2407.21503v1
Compressor summary: The study proposes a data-driven ensemble approach that analyzes productivity losses in automation systems, identifies root causes, and improves efficiency by integrating information theory and machine learning methods with stream processing.
http://arxiv.org/abs/2407.21497v1
Compressor summary: The text proposes an unsupervised out-of-distribution detection method for mitral regurgitation diagnosis using ultrasound videos, which can improve accuracy and reduce misdiagnosis.
http://arxiv.org/abs/2407.21491v1
Compressor summary: The text introduces a new system called GPT-Talker that generates natural and expressive conversational speech for user-agent interactions using multimodal information, and proposes a large dataset to evaluate its performance.
http://arxiv.org/abs/2407.21489v1
Compressor summary: Maverick is a simple and efficient pipeline for coreference resolution that outperforms large generative models with up to 13 billion parameters using only 500 million parameters, achieving state-of-the-art results on the CoNLL-2012 benchmark.
http://arxiv.org/abs/2407.21485v1
Compressor summary: The paper evaluates how to speed up generalized planning by applying parallel search techniques to a novel algorithm called Best-First Generalized Planning (BFGP).
http://arxiv.org/abs/2407.21483v1
Compressor summary: The paper proposes a four-valued logic query language called eSPARQL for operating with multiple and sometimes conflicting beliefs in epistemic RDF-star metadata.
http://arxiv.org/abs/2407.21476v1
Compressor summary: The text discusses how different TTS architectures affect synthetic data generation for ASR and SLT, but finds no clear relation between TTS performance and ASR performance.
http://arxiv.org/abs/2407.21475v1
Compressor summary: The paper proposes a novel algorithm that generates high-quality videos from image synthesis models without extra training or optimization, using dependency noise model and temporal momentum attention for content consistency and animation coherence.
http://arxiv.org/abs/2407.21467v1
Compressor summary: The study introduces a deep learning method that accurately predicts myopia progression and risk in children using fundus images and refraction data, enabling early interventions and reducing healthcare costs.
http://arxiv.org/abs/2407.21465v1
Compressor summary: The paper proposes MarvelOVD, a method that improves open vocabulary detection by using the detector as auxiliary guidance for vision language models and addressing the noise and bias issues in pseudo-labels.
http://arxiv.org/abs/2407.21460v1
Compressor summary: The paper evaluates a multi-agent Q-learning solution for improving network performance in VANETs without increasing computational burden or compatibility issues.
http://arxiv.org/abs/2407.21459v1
Compressor summary: This study develops KemenkeuGPT, a Large Language Model-based tool that helps Indonesia's Ministry of Finance make decisions involving complex financial data and regulations, showing its potential as an essential tool.
http://arxiv.org/abs/2407.21454v1
Compressor summary: The paper introduces StreetSurfaceVis, a dataset with 9,122 street images annotated for road surface type and quality to train models for assessing road surfaces, addressing the imbalance and reducing manual annotation using various strategies.
http://arxiv.org/abs/2407.21453v1
Compressor summary: The paper compares tinyML neural network architectures and compression techniques for bird song detection, focusing on the corn bunting species.
http://arxiv.org/abs/2407.21450v1
Compressor summary: The paper proposes a 3D video extrapolation method that disentangles geometry and motion, improving accuracy and quality of future video predictions.
http://arxiv.org/abs/2407.21448v1
Compressor summary: The proposed method (PCSR) adaptively allocates computational resources to pixels based on their restoration difficulty, improving efficiency and performance for single image super-resolution.
http://arxiv.org/abs/2407.21443v1
Compressor summary: SliSum is a novel summary generation strategy that improves the faithfulness of LLMs by dividing the source article into overlapping windows and generating local summaries for each window, then aggregating them using clustering and majority voting.
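The windowing step of such a sliding-generation scheme can be sketched generically; the window and stride sizes below are arbitrary illustration values, not the paper's settings:

```python
def overlapping_windows(sentences: list, size: int, stride: int) -> list:
    """Split an article (as a sentence list) into overlapping windows;
    each window would then be summarized locally before the local
    summaries are aggregated (e.g. by clustering and majority voting)."""
    windows = []
    for start in range(0, len(sentences), stride):
        window = sentences[start:start + size]
        if window:
            windows.append(window)
        if start + size >= len(sentences):
            break  # last window already reaches the end of the article
    return windows

sents = [f"s{i}" for i in range(7)]
print(overlapping_windows(sents, size=4, stride=2))
# [['s0', 's1', 's2', 's3'], ['s2', 's3', 's4', 's5'], ['s4', 's5', 's6']]
```

Because stride < size, adjacent windows share sentences, so statements appearing in several local summaries can be cross-checked by majority voting.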
http://arxiv.org/abs/2407.21441v1
Compressor summary: The paper shows that automated question generation can improve fact-checking efficiency and sometimes yield better evidence than human-written questions.
http://arxiv.org/abs/2407.21439v1
Compressor summary: RagLLaVA is a novel framework that uses knowledge-enhanced reranking and noise-injected training to improve multimodal retrieval-augmented generation for dynamic contexts, addressing the multi-granularity noisy correspondence problem.
http://arxiv.org/abs/2407.21438v1
Compressor summary: CEFA is a new framework for improving human-object interaction (HOI) detection on rare human-object categories, which existing methods struggle with due to real-world bias; its feature alignment and context enhancement modules align generated data with the original data at the feature level and bridge the domain gap.
http://arxiv.org/abs/2407.21436v1
Compressor summary: The proposed method enhances thermal point clouds with building semantics and location, improving thermal analysis and supporting deep learning models.
http://arxiv.org/abs/2407.21432v1
Compressor summary: The paper proposes a novel car localization method using image features and detailed 3D building models, improving accuracy in GNSS-denied urban areas.
http://arxiv.org/abs/2407.21424v1
Compressor summary: The text describes a pipeline for detecting hallucinations in large language models' outputs by scoring, calibrating, and thresholding their confidence, and proposes a multi-scoring framework to improve performance and reduce costs.
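The score-then-threshold step of such a pipeline can be sketched generically; the scorer names, weights, and threshold below are made up for illustration, and the paper's actual scoring and calibration methods are not reproduced here:

```python
def aggregate_confidence(scores: dict, weights: dict) -> float:
    """Combine several confidence scores (one per scorer, as in a
    multi-scoring framework) into a single value via a weighted average."""
    total_w = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_w

def is_hallucination(scores: dict, weights: dict, threshold: float = 0.5) -> bool:
    """Flag an output whose aggregated confidence falls below the threshold."""
    return aggregate_confidence(scores, weights) < threshold

weights = {"self_consistency": 2.0, "entropy": 1.0}  # hypothetical scorers
print(is_hallucination({"self_consistency": 0.9, "entropy": 0.8}, weights))  # False
print(is_hallucination({"self_consistency": 0.2, "entropy": 0.3}, weights))  # True
```

In practice the threshold would be set on a calibration split so that flagged outputs correspond to a target error rate.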
http://arxiv.org/abs/2407.21422v1
Compressor summary: The paper proposes a new task, dataset, and framework for detecting text tampering in images by generative AI models, aiming to improve generalization and perception of forgery detection.
http://arxiv.org/abs/2407.21418v1
Compressor summary: FTuner is a new technique for deep learning compilers that uses uKernels to patch together small tensors, achieving comparable performance and speedup while reducing tuning time significantly.
http://arxiv.org/abs/2407.21417v1
Compressor summary: ReSet is a method to improve language models by combining self-instruction and rejection sampling, which enhances both faithfulness and instruction following compared to multi-task learning.
http://arxiv.org/abs/2407.21416v1
Compressor summary: VIPeR is a novel visual incremental place recognition method that adapts to new environments while preserving previous ones, using an adaptive mining strategy and a memory bank with probabilistic knowledge distillation.
http://arxiv.org/abs/2407.21408v1
Compressor summary: The paper constructs LGVQ, a dataset of 2,808 AI-generated videos from 6 models and 468 text prompts, evaluates existing quality metrics on it, and proposes the UGVQ model to assess AI-generated video quality across spatial, temporal, and text-to-video alignment aspects.
http://arxiv.org/abs/2407.21402v1
Compressor summary: The paper presents a novel network, DD-rPPGNet, that eliminates interference in remote photoplethysmography (rPPG) signals to improve estimation performance.
http://arxiv.org/abs/2407.21385v1
Compressor summary: SmileyNet is a neural network that uses smiley faces and positive reinforcement to predict coin flips using tea leaf images, outperforming other models and enabling lottery wins.
http://arxiv.org/abs/2407.21384v1
Compressor summary: GEGA is a novel graph neural network-based model for extracting relations between entities from unstructured document text, addressing challenges in evidence retrieval and complex cross-relations.
http://arxiv.org/abs/2407.21376v1
Compressor summary: The study proposes a novel EKLF model that uses an Extended Kalman Filter to track complex temporal patterns and an ALS algorithm to train latent features for representing dynamic weighted directed graphs, achieving better prediction accuracy and efficiency than existing models.
http://arxiv.org/abs/2407.21368v1
Compressor summary: The authors propose two prompting strategies that reduce hallucination in Large Vision-Language Models (LVLMs) and improve medical VQA performance on complex pathologies, and note that the methods can be extended to general LVLM domains.
http://arxiv.org/abs/2407.21363v1
Compressor summary: This paper introduces ESIQAD, the first quality assessment database for egocentric spatial images in eXtended Reality, and evaluates 22 IQA models on it.
http://arxiv.org/abs/2407.21359v1
Compressor summary: ProSpec is a Reinforcement Learning method that uses prospective thinking to make optimal, lower-risk decisions by imagining future trajectories and employing cycle consistency for state reversibility and data efficiency.
http://arxiv.org/abs/2407.21358v1
Compressor summary: Tree-of-Traversals is a new algorithm that helps large language models use knowledge graphs for better reasoning and question answering.
http://arxiv.org/abs/2407.21347v1
Compressor summary: DP-BloGS is a new privacy-preserving algorithm for deep learning that shuffles gradients probabilistically and achieves fast training with strong protection against data extraction.
http://arxiv.org/abs/2407.21344v1
Compressor summary: The paper proposes a new method to model how emotions change over time using neural networks and constraints, and shows good results on a speech emotion database.
http://arxiv.org/abs/2407.21341v1
Compressor summary: The authors developed a 3D shape completion network (CoRe++) that uses RGB-D cameras to estimate potato yield more accurately by completing the 3D shape of individual tubers, achieving fast and accurate results on an operational harvester.
http://arxiv.org/abs/2407.21338v1
Compressor summary: The text introduces a new reinforcement learning method called NaSA-TD3 that uses intrinsic motivation to improve exploration and achieve better performance in complex, sparse environments with image inputs.
http://arxiv.org/abs/2407.21333v1
Compressor summary: Chat2Layout is an interactive system that uses large language models to generate and arrange furniture layouts in response to user input, enabling seamless communication and feedback-driven refinement.
http://arxiv.org/abs/2407.21331v1
Compressor summary: CAMAv2 is a vision-centric approach that generates high-quality, consistent, and accurate 3D annotations of static map elements without LiDAR inputs, improving the performance of models trained with these annotations.
http://arxiv.org/abs/2407.21330v1
Compressor summary: Recent large language models, especially Claude and GPT-4o, have improved Sinhala performance compared to previous versions and other models, while Llama and Mistral can be enhanced by fine-tuning for better results.
http://arxiv.org/abs/2407.21320v1
Compressor summary: MetaOpenFOAM is a novel framework that automates complex CFD simulations using natural language input and LLMs, achieving high pass rates and low costs per task.
http://arxiv.org/abs/2407.21319v1
Compressor summary: The text discusses how cooperation in training foundation models leads to advancements in artificial intelligence and proposes a new model, BigLearn-GAN, as an example.
http://arxiv.org/abs/2407.21317v1
Compressor summary: Pathology AI, especially Foundation Models, has greatly improved diagnosis and decision-making, but faces challenges for clinical application.
http://arxiv.org/abs/2407.21314v1
Compressor summary: The text introduces a new data-driven algorithm (SOAD) for assimilating nonlinear physical and observational models, which improves accuracy over traditional methods.
http://arxiv.org/abs/2407.21311v1
Compressor summary: The paper introduces EUDA, an efficient domain adaptation framework that uses DINOv2 as a feature extractor and SDAL to balance adaptation and alignment, achieving comparable results with significantly fewer parameters.
http://arxiv.org/abs/2407.21308v1
Compressor summary: The paper introduces an improved self-checkout system using YOLOv10 for retail automation, with better product recognition and faster checkout speed than existing methods.
http://arxiv.org/abs/2407.21298v1
Compressor summary: The authors propose a new geometric vectorization method for persistence diagrams, which improves protein function prediction by using topological data analysis.
http://arxiv.org/abs/2407.21290v1
Compressor summary: The paper introduces TrackSorter, a novel algorithm based on Transformers, that converts particle data into discrete tokens and sorts them to find tracks in High Energy Physics.
http://arxiv.org/abs/2407.21276v1
Compressor summary: The paper proposes a multi-layer knowledge pyramid approach in Retrieval-Augmented Generation methods to improve precision and recall, and shows better results than existing methods on two benchmarks.
http://arxiv.org/abs/2407.21275v1
Compressor summary: The paper proposes a new method for long-term time series forecasting using frequency domain representations and a novel attention mechanism based on the Kramers-Kronig relations, achieving significant performance improvements over existing methods.
http://arxiv.org/abs/2407.21272v1
Compressor summary: The authors propose an automated algorithm to measure hyperreflective foci in retinal images, which could help diagnose and monitor various retinal diseases.
http://arxiv.org/abs/2407.21264v1
Compressor summary: The paper proposes a novel Supervised Contrastive Learning method to improve model attribution for machine-generated disinformation by focusing on the differences between large language models.
http://arxiv.org/abs/2407.21260v1
Compressor summary: The paper analyzes how distributional reinforcement learning can improve performance in stochastic environments by using Bellman unbiasedness and moment functionals, and proposes an efficient algorithm with a regret bound.
http://arxiv.org/abs/2407.21252v1
Compressor summary: The paper introduces lifelong person search, a problem where models are incrementally trained on new datasets while preserving old dataset knowledge, using techniques like knowledge distillation and rehearsal-based instance matching.