This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-08, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.03936v1
Compressor summary: The study uses a Portuguese language model to improve applications of the Mercosur Common Nomenclature and proposes SLIM-RAFT, a simplified fine-tuning technique that outperforms existing models on the task.
http://arxiv.org/abs/2408.03934v1
Compressor summary: The paper proposes a method using fine-tuned LLMs to predict the future impact of new research articles based on titles and abstracts, outperforming traditional methods and showing real-world application potential.
http://arxiv.org/abs/2408.03922v1
Compressor summary: FMiFood is a new multi-modal contrastive learning framework that uses food category text descriptions and GPT-4 to improve food image classification accuracy by enhancing feature discrimination.
http://arxiv.org/abs/2408.03915v1
Compressor summary: The paper argues that the underlying distribution affects the interpretability of ML models and proposes considering it in assessing model complexity.
http://arxiv.org/abs/2408.03913v1
Compressor summary: AdapMTL is a framework that adaptively adjusts sparsity levels in multitask learning models for efficient multimedia processing, outperforming existing methods.
http://arxiv.org/abs/2408.03900v1
Compressor summary: Speech-MASSIVE is a multilingual spoken language understanding dataset with various tasks and applications for assessing foundation models.
http://arxiv.org/abs/2408.03899v1
Compressor summary: The authors propose a method to simplify scholarly abstracts using language models that improve readability while preserving content accuracy.
http://arxiv.org/abs/2408.03885v1
Compressor summary: The GlintIQA model combines ViTs and CNNs to assess image quality by extracting both global and local features, integrating them progressively, and using content similarity-based labeling with subjective scores.
http://arxiv.org/abs/2408.03877v1
Compressor summary: GraphProbe is a framework that investigates how well various graph learning methods encode different types of graph properties into node representations for downstream tasks.
http://arxiv.org/abs/2408.03874v1
Compressor summary: The authors propose a novel technique that improves draft clinical notes by modeling physician conversation styles and preferences and enables easy onboarding of new physicians without retraining the model, significantly improving ROUGE-2 scores for three sections of the note.
http://arxiv.org/abs/2408.03872v1
Compressor summary: The authors propose a Transformer-based forecasting method for supply chain demand prediction that captures interactions between time series and handles sparsity, and show its effectiveness on both private and public datasets.
http://arxiv.org/abs/2408.03871v1
Compressor summary: The report describes the authors' participation in a biomedical abstract simplification task; several of their models ranked highly in both the automatic and human evaluations, and their code, fine-tuned models, prompts, and data splits are released on GitHub.
http://arxiv.org/abs/2408.03867v1
Compressor summary: The paper proposes a surgical phase recognition method called Surgformer, which uses divided spatial-temporal attention and hierarchical temporal attention to model spatial-temporal dependency and reduce redundancy.
http://arxiv.org/abs/2408.03865v1
Compressor summary: PackMamba is a high-throughput Mamba architecture that efficiently handles variable-length sequences in generative AI by modifying parallel operators and reducing bottlenecks.
http://arxiv.org/abs/2408.03855v1
Compressor summary: The text discusses how transformer neural networks outperform other language models and suggests they should be more seriously considered as theories of language.
http://arxiv.org/abs/2408.03849v1
Compressor summary: The authors developed an Amharic hate speech detection model that classifies text into four categories, using a custom annotated dataset and an SBi-LSTM deep learning network.
http://arxiv.org/abs/2408.03842v1
Compressor summary: The paper proposes a Transformer-based image compression method with a novel block that accounts for frequency components, improving compression efficiency and outperforming existing learned image compression methods.
http://arxiv.org/abs/2408.03837v1
Compressor summary: WalledEval is an AI safety testing toolkit with various features and benchmarks to evaluate large language models.
http://arxiv.org/abs/2408.03834v1
Compressor summary: The text discusses how large vision and language models can improve information extraction systems by using targeted prompts to generate accurate and specific answers from document images.
http://arxiv.org/abs/2408.03822v1
Compressor summary: The text proposes a method to reduce memory and storage requirements for 3D scene representation using learnable masks, grid-based neural fields, and residual vector quantization, while maintaining performance and quality.
http://arxiv.org/abs/2408.03819v1
Compressor summary: The paper proposes a counterfactual data augmentation method for active learning that uses artificial datapoints generated by LLMs and rule-based models to enhance data efficiency and address the cold start problem.
http://arxiv.org/abs/2408.03816v1
Compressor summary: The authors propose a method to predict clinical variables using time series forecasting, which allows interpreting the causes of sepsis and other labels, and achieve better results with iterative multi-step decoders and dense encoders.
http://arxiv.org/abs/2408.03811v1
Compressor summary: This study explores using generative language models to improve automated short answer scoring in education, by combining vector databases, encoders, and models to analyze similar responses and assign scores.
http://arxiv.org/abs/2408.03795v1
Compressor summary: This note defines and compares two ways to measure analogical proportions between numbers using triangular norms and generalized means.
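As background only (these are the standard definitions, not necessarily the note's exact ones), the generalized mean and the two classical numeric analogical proportions look like this:

```latex
% Generalized (power) mean of two numbers a, b with exponent p:
\[ M_p(a, b) = \left( \frac{a^p + b^p}{2} \right)^{1/p} \]
% Classical analogical proportions a : b :: c : d between numbers:
\[ a - b = c - d \ \text{(arithmetic)}, \qquad \frac{a}{b} = \frac{c}{d} \ \text{(geometric)} \]
% A triangular norm is a commutative, associative, monotone map
% T : [0,1]^2 -> [0,1] with T(a, 1) = a, e.g. the product t-norm T(a, b) = ab.
```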
http://arxiv.org/abs/2408.03771v1
Compressor summary: The paper presents a transparent VAE-MLP model for predicting post-hepatectomy liver failure (PHLF) in HCC patients that integrates counterfactuals and layer-wise relevance propagation (LRP) to explain its decisions, proposes a framework for evaluating AI explanations, and shows that the explanations improved clinicians' prediction accuracy and confidence.
http://arxiv.org/abs/2408.03765v1
Compressor summary: The paper proposes a new method (NS4GC) for graph clustering that uses an estimated node similarity matrix to guide representation learning, improving accuracy and efficiency.
http://arxiv.org/abs/2408.03762v1
Compressor summary: The paper describes how the authors fine-tuned Llama3, a foundation model, for Financial Text Summarization and achieved third place with a ROUGE-1 score of 0.521.
http://arxiv.org/abs/2408.03753v1
Compressor summary: 3iGS improves 3D Gaussian Splatting by expressing outgoing radiance as a function of local illumination and BRDF features, optimizing both for realistic view-dependent effects.
http://arxiv.org/abs/2408.03747v1
Compressor summary: The text discusses a survey on time-series anomaly detection methods and their challenges, such as benchmarking, data sets, evaluation metrics, and threshold selection.
http://arxiv.org/abs/2408.03748v1
Compressor summary: The paper introduces an edge-guided conditional diffusion model that generates realistic pseudo-thermal images from visible edges, which are then used to train deep learning models for object detection in low light and adverse weather conditions.
http://arxiv.org/abs/2408.03746v1
Compressor summary: The paper introduces a new approach to improve Bayesian Last Layer models by using implicit priors and diffusion techniques, which enhances their performance on various datasets and tasks.
http://arxiv.org/abs/2408.03745v1
Compressor summary: The paper presents I2FCM, a novel framework that applies intuitionistic fuzzy c-means to image classification, making CNN models more interpretable by estimating hesitancy and focusing on informative image regions.
http://arxiv.org/abs/2408.03735v1
Compressor summary: The paper proposes QSLAW, a method that uses parameter quantization and multimodal warmup to improve vision-language instruction tuning efficiency for large language models while maintaining performance.
http://arxiv.org/abs/2408.03734v1
Compressor summary: The study proposes a novel deep learning architecture, SHAU, for multiscale shadow removal in complex scenes and introduces a new synthetic dataset, MSRD, to benchmark future methods.
http://arxiv.org/abs/2408.03732v1
Compressor summary: Question Rephrasing evaluates the input uncertainty of large language models and, combined with sampling methods, assesses their overall uncertainty on chemical tasks.
http://arxiv.org/abs/2408.03728v1
Compressor summary: FISTAPruner is a new pruning method for large language models that improves efficiency without sacrificing performance by using convex optimization and a correction mechanism.
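FISTAPruner's details are in the paper; as a hedged illustration of the convex solver it is named after, here is generic FISTA for l1-regularized least squares (a sketch of the underlying algorithm, not the pruner itself):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm: shrinks x toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista_lasso(A, b, lam, n_iters=200):
    """Generic FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Illustrative only; FISTAPruner applies this style of solver to
    layer-wise weight pruning, with specifics given in the paper."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iters):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x
```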
http://arxiv.org/abs/2408.03717v1
Compressor summary: The paper introduces SeRankDet, a deep network that uses selective ranking and attention to improve infrared small target detection in complex backgrounds, achieving high accuracy without the conventional trade-off between precision and false alarms.
http://arxiv.org/abs/2408.03706v1
Compressor summary: The text proposes using complexity measures of the local topology of a contextual language model's latent space as features that describe how an embedding vector relates to similar vectors, improving sequence tagging tasks such as dialogue term extraction.
http://arxiv.org/abs/2408.03695v1
Compressor summary: Openstory++ is a large-scale dataset with instance-level annotations and better training methodology to improve image generation models for creating coherent visual stories from long captions.
http://arxiv.org/abs/2408.03691v1
Compressor summary: This paper explores using deep learning and artificial intelligence to generate periodic orbits for space missions and astrodynamics research.
http://arxiv.org/abs/2408.03685v1
Compressor summary: RL-ADN is a new open-source library that improves the performance and efficiency of deep reinforcement learning for optimizing energy storage systems in distribution networks using advanced data augmentation, network modeling, and power flow solver techniques.
http://arxiv.org/abs/2408.03677v1
Compressor summary: L4DR is a weather-robust method that fuses LiDAR and 4D radar for 3D object detection, improving performance under fog and other adverse weather conditions.
http://arxiv.org/abs/2408.03675v1
Compressor summary: The paper proposes NACL, a framework for efficient eviction of unnecessary tokens from the KV Cache in large language models, improving performance on short and long text tasks while reducing memory usage.
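As a rough sketch of the general KV-cache-eviction idea (a generic attention-mass heuristic with hypothetical tensor shapes, not NACL's actual eviction policy):

```python
import torch

def evict_kv(keys, values, attn_scores, keep: int):
    """Generic KV cache eviction sketch: keep the `keep` cached tokens with
    the highest accumulated attention mass, drop the rest. NACL's actual
    eviction policy differs; see the paper.
    keys/values: [seq, heads, dim]; attn_scores: [queries, seq]."""
    importance = attn_scores.sum(dim=0)                # total attention per cached token
    idx = importance.topk(keep).indices.sort().values  # keep positional order
    return keys[idx], values[idx]
```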
http://arxiv.org/abs/2408.03669v1
Compressor summary: This paper analyzes the real problem behind the performance degradation of deep Graph Neural Networks (GNNs), showing that the trainability of their MLP components is the main challenge and that it can be improved by constraining gradient flow.
http://arxiv.org/abs/2408.03664v1
Compressor summary: This paper proposes a reinforcement learning method to reduce human-induced seismicity in geothermal energy and carbon capture systems by adjusting controller parameters in real-time.
http://arxiv.org/abs/2408.03657v1
Compressor summary: The paper presents a method to improve ultrasound image resolution using physics-based deconvolution with neural networks and B-mode images, outperforming traditional methods in tests.
http://arxiv.org/abs/2408.03655v1
Compressor summary: The paper presents a GAN system that generates realistic synthetic retail transaction data by integrating consumer behavior modeling with SKU availability and stock constraints, addressing assortment optimization and demand prediction while showing enhanced realism over previous methods.
http://arxiv.org/abs/2408.03652v1
Compressor summary: Arabic KNN-NER is a method for identifying and classifying entities in Arabic text using KNN search over cached training data to improve fine-grained flat-entity recognition.
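The kNN-augmented tagging recipe behind this family of methods is generic (kNN-LM-style interpolation); a minimal sketch under those assumptions, with hypothetical shapes and hyperparameters rather than the paper's exact configuration:

```python
import numpy as np

def knn_ner_probs(token_emb, datastore_keys, datastore_tags,
                  model_probs, n_tags, k=8, lam=0.3, temp=1.0):
    """kNN-NER-style interpolation sketch (not the paper's exact recipe):
    retrieve the k nearest cached training tokens and blend a distribution
    over their gold tags with the base model's tag distribution."""
    dists = np.linalg.norm(datastore_keys - token_emb, axis=1)
    nn = np.argsort(dists)[:k]                    # k nearest cached tokens
    weights = np.exp(-dists[nn] / temp)
    knn_probs = np.zeros(n_tags)
    for i, w in zip(nn, weights):
        knn_probs[datastore_tags[i]] += w         # vote for the neighbor's tag
    knn_probs /= knn_probs.sum()
    return lam * knn_probs + (1 - lam) * model_probs
```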
http://arxiv.org/abs/2408.03637v1
Compressor summary: TALE is a training-free framework that uses latent space manipulation to seamlessly incorporate user-specified objects into different visual contexts.
http://arxiv.org/abs/2408.03636v1
Compressor summary: SpectralX provides time-frequency explanations for black-box time-series classifiers using a plug-and-play framework and a new perturbation-based method called FIA.
http://arxiv.org/abs/2408.03633v1
Compressor summary: CARE is a reading assistant for customer service representatives that helps them find proper responses from user manuals faster and more accurately by using self-supervised learning and explicit clue chains.
http://arxiv.org/abs/2408.03632v1
Compressor summary: The paper introduces Concept Conductor, a method to generate multi-concept images without training, ensuring visual fidelity, correct layout, and semantic consistency by isolating sampling processes, using self-attention, and injecting concepts with shape-aware masks.
http://arxiv.org/abs/2408.03631v1
Compressor summary: The text proposes using large language models and autonomous agents to optimize base station siting in a more efficient, cost-effective, and reliable way, reducing human effort.
http://arxiv.org/abs/2408.03630v1
Compressor summary: The paper introduces PAGED, a benchmark for evaluating procedural graph extraction from documents, and shows that existing methods are limited while large language models have potential but also gaps.
http://arxiv.org/abs/2408.03627v1
Compressor summary: The paper proposes BIDFC, a contrastive learning framework for synthetic aperture radar automatic target recognition (SAR ATR) that uses weakly contrastive learning and a dynamic-weighted variance loss to improve classification accuracy with less labeled data.
http://arxiv.org/abs/2408.03626v1
Compressor summary: Random feature maps with optimal internal weights can accurately predict dynamical systems' behavior with much less computation than traditional neural networks.
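The general random-feature forecasting recipe is standard (fixed random features plus a ridge-regression readout, reservoir-computing style); a minimal sketch of that generic recipe, not the paper's internal-weight selection scheme:

```python
import numpy as np

def fit_random_feature_forecaster(X, D=512, w_scale=1.0, b_scale=1.0, reg=1e-6):
    """Random-feature one-step forecaster sketch (generic recipe, not the
    paper's weight-optimization method): map states through fixed random
    features, then fit only the linear readout by ridge regression.
    X: [T, d] trajectory of the dynamical system."""
    rng = np.random.default_rng(0)
    d = X.shape[1]
    W = rng.normal(scale=w_scale, size=(D, d))   # fixed internal weights
    b = rng.uniform(-b_scale, b_scale, size=D)
    Phi = np.tanh(X[:-1] @ W.T + b)              # features of current states
    Y = X[1:]                                    # next states as targets
    # Ridge-regression readout: the only trained parameters.
    C = np.linalg.solve(Phi.T @ Phi + reg * np.eye(D), Phi.T @ Y)
    return lambda x: np.tanh(x @ W.T + b) @ C
```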
http://arxiv.org/abs/2408.03624v1
Compressor summary: The paper proposes AgentsCoMerge, a framework built on large language models with observation, planning, communication, and training modules that enables connected and autonomous vehicles (CAVs) to collaborate and merge safely and efficiently, with experiments showing its superiority in various scenarios.
http://arxiv.org/abs/2408.03622v1
Compressor summary: This study developed an innovative spelling error correction method for Persian clinical text using a fine-tuned model and PERTO algorithm, achieving high precision in detecting and correcting word errors.
http://arxiv.org/abs/2408.03619v1
Compressor summary: The paper proposes a new training criterion for machine learning models that penalizes poor loss concentration to improve performance on rare or difficult data points and is compatible with loss transformations like CVaR or DRO.
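For reference, the CVaR loss transformation mentioned here simply averages the worst alpha-fraction of per-example losses; a minimal sketch of that standard formulation (not the paper's new criterion):

```python
import torch

def cvar_loss(per_example_losses: torch.Tensor, alpha: float = 0.1):
    """CVaR_alpha of the loss distribution: average over the worst
    alpha-fraction of per-example losses. Generic formulation, shown only
    to illustrate the family of tail-focused criteria the paper targets."""
    k = max(1, int(alpha * per_example_losses.numel()))
    worst, _ = per_example_losses.topk(k)   # the k largest losses
    return worst.mean()
```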
http://arxiv.org/abs/2408.03618v1
Compressor summary: The FIPO framework improves the quality and logic of arguments generated by large language models by reducing fallacy errors.
http://arxiv.org/abs/2408.03617v1
Compressor summary: The study compares GPT-2 models trained on child-directed speech and synthetic TinyDialogues to BabyLM datasets, finding that local data properties affect performance but global ones do not, suggesting children's learning is more efficient than language modeling.
http://arxiv.org/abs/2408.03615v1
Compressor summary: The paper proposes a Hybrid Multimodal Memory module to improve long-horizon task completion in artificial intelligence agents by enabling better planning and reflection with world knowledge and multimodal experience.
http://arxiv.org/abs/2408.03612v1
Compressor summary: JARViS is a two-stage video action detection framework that uses Transformer attention to model interactions between actors and scenes, achieving state-of-the-art performance on three VAD datasets.
http://arxiv.org/abs/2408.03608v1
Compressor summary: InPer is a framework that uses causality to enhance domain generalization by refining causal variable selection during training and identifying anti-interference samples for prototype classification during testing.
http://arxiv.org/abs/2408.03599v1
Compressor summary: The paper proposes a framework that unifies and improves activation functions for neural networks, achieving better performance with minimal complexity increase.
http://arxiv.org/abs/2408.03598v1
Compressor summary: PRISM is a detector-free image matching method that prunes irrelevant features, tackles scale discrepancy, and achieves leading accuracy on various benchmarks.
http://arxiv.org/abs/2408.03591v1
Compressor summary: The study presents a calibration-free method using machine learning and LSTM networks to accurately estimate focal depth from eye movements, improving autofocal glasses usability and enabling their use in extended reality environments.
http://arxiv.org/abs/2408.03585v1
Compressor summary: The paper introduces real-world Traveling Salesman Problem scenarios and proposes a hierarchical approach combining Hypernetworks with the Expectation-Maximization algorithm to improve routing solutions.
http://arxiv.org/abs/2408.03574v1
Compressor summary: The paper presents NumCLIP, a method that improves the ordinal regression performance of pre-trained vision-language models by decomposing the problem into coarse classification and fine prediction stages, using language to leverage numerical bins, and introducing a novel cross-modal ranking loss to maintain alignment.