arxiv compressed, 2024-08-08

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-08, generated by the compressor, my personal LLM-based project.


SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature

http://arxiv.org/abs/2408.03936v1

Compressor summary: The study uses a Portuguese language model to improve applications involving the Mercosur Common Nomenclature and proposes SLIM-RAFT, a simplified fine-tuning technique that outperforms existing models on this task.


From Words to Worth: Newborn Article Impact Prediction with LLM

http://arxiv.org/abs/2408.03934v1

Compressor summary: The paper proposes a method using fine-tuned LLMs to predict the future impact of new research articles based on titles and abstracts, outperforming traditional methods and showing real-world application potential.


FMiFood: Multi-modal Contrastive Learning for Food Image Classification

http://arxiv.org/abs/2408.03922v1

Compressor summary: FMiFood is a new multi-modal contrastive learning framework that uses food category text descriptions and GPT-4 to improve food image classification accuracy by enhancing feature discrimination.


Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation

http://arxiv.org/abs/2408.03915v1

Compressor summary: The paper argues that the underlying distribution affects the interpretability of ML models and proposes considering it in assessing model complexity.


AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

http://arxiv.org/abs/2408.03913v1

Compressor summary: AdapMTL is a framework that adaptively adjusts sparsity levels in multitask learning models for efficient multimedia processing, outperforming existing methods.


Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond

http://arxiv.org/abs/2408.03900v1

Compressor summary: Speech-MASSIVE is a multilingual spoken language understanding dataset with various tasks and applications for assessing foundation models.


Simplifying Scholarly Abstracts for Accessible Digital Libraries

http://arxiv.org/abs/2408.03899v1

Compressor summary: The authors propose a method to simplify scholarly abstracts using language models that improve readability while preserving content accuracy.


Global-Local Progressive Integration Network for Blind Image Quality Assessment

http://arxiv.org/abs/2408.03885v1

Compressor summary: The GlintIQA model combines ViTs and CNNs to assess image quality by extracting both global and local features, integrating them progressively, and using content similarity-based labeling with subjective scores.


Knowledge Probing for Graph Representation Learning

http://arxiv.org/abs/2408.03877v1

Compressor summary: GraphProbe is a framework that investigates how well various graph learning methods encode different types of graph properties into node representations for downstream tasks.


Personalized Clinical Note Generation from Doctor-Patient Conversations

http://arxiv.org/abs/2408.03874v1

Compressor summary: The authors propose a novel technique to improve draft clinical notes by modeling physician conversation styles and preferences, and enabling easy onboarding of new physicians without re-training the model. The technique significantly improves ROUGE-2 scores for three sections of the note.


Inter-Series Transformer: Attending to Products in Time Series Forecasting

http://arxiv.org/abs/2408.03872v1

Compressor summary: The authors propose a Transformer-based forecasting method for supply chain demand prediction that captures interactions between time series and handles sparsity, and show its effectiveness on both private and public datasets.


BeeManc at the PLABA Track of TAC-2023: Investigating LLMs and Controllable Attributes for Improving Biomedical Text Readability

http://arxiv.org/abs/2408.03871v1

Compressor summary: The authors participated in a biomedical abstract simplification task, ranked highly in both automatic and human evaluations with several of their models, and released their code, fine-tuned models, prompts, and data splits on GitHub.


Surgformer: Surgical Transformer with Hierarchical Temporal Attention for Surgical Phase Recognition

http://arxiv.org/abs/2408.03867v1

Compressor summary: The paper proposes a surgical phase recognition method called Surgformer, which uses divided spatial-temporal attention and hierarchical temporal attention to model spatial-temporal dependency and reduce redundancy.


PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training

http://arxiv.org/abs/2408.03865v1

Compressor summary: PackMamba is a high-throughput Mamba architecture that efficiently handles variable-length sequences in generative AI by modifying parallel operators and reducing bottlenecks.


Why transformers are obviously good models of language

http://arxiv.org/abs/2408.03855v1

Compressor summary: The text discusses how transformer neural networks outperform other language models and suggests they should be more seriously considered as theories of language.


Hate Speech Detection and Classification in Amharic Text with Deep Learning

http://arxiv.org/abs/2408.03849v1

Compressor summary: The authors developed an Amharic hate speech detector model that can classify text into four categories using a custom annotated dataset and SBi-LSTM deep learning.


Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression

http://arxiv.org/abs/2408.03842v1

Compressor summary: The proposed Transformer-based image compression method introduces a novel block that accounts for frequency components, improving compression efficiency and outperforming existing learned image compression methods.


WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

http://arxiv.org/abs/2408.03837v1

Compressor summary: WalledEval is an AI safety testing toolkit with various features and benchmarks to evaluate large language models.


Target Prompting for Information Extraction with Vision Language Model

http://arxiv.org/abs/2408.03834v1

Compressor summary: The text discusses how large vision and language models can improve information extraction systems by using targeted prompts to generate accurate and specific answers from document images.


Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields

http://arxiv.org/abs/2408.03822v1

Compressor summary: The text proposes a method to reduce memory and storage requirements for 3D scene representation using learnable masks, grid-based neural fields, and residual vector quantization, while maintaining performance and quality.


Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning

http://arxiv.org/abs/2408.03819v1

Compressor summary: The paper proposes a counterfactual data augmentation method for active learning that uses artificial datapoints generated by LLMs and rule-based models to enhance data efficiency and address the cold start problem.


Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting

http://arxiv.org/abs/2408.03816v1

Compressor summary: The authors propose a method to predict clinical variables through long-term time series forecasting, enabling interpretation of the causes of sepsis and other labels, and achieve better results with iterative multi-step decoders and dense encoders.


Generative Language Models with Retrieval Augmented Generation for Automated Short Answer Scoring

http://arxiv.org/abs/2408.03811v1

Compressor summary: This study explores using generative language models to improve automated short answer scoring in education, by combining vector databases, encoders, and models to analyze similar responses and assign scores.


Frank's triangular norms in Piaget's logical proportions

http://arxiv.org/abs/2408.03795v1

Compressor summary: This note defines and compares two ways to measure analogical proportions between numbers using triangular norms and generalized means.


Methodological Explainability Evaluation of an Interpretable Deep Learning Model for Post-Hepatectomy Liver Failure Prediction Incorporating Counterfactual Explanations and Layerwise Relevance Propagation: A Prospective In Silico Trial

http://arxiv.org/abs/2408.03771v1

Compressor summary: The paper presents a transparent VAE-MLP model for predicting post-hepatectomy liver failure (PHLF) in HCC patients that integrates counterfactual explanations and layerwise relevance propagation, proposes a framework for evaluating AI explanations, and shows in a prospective in silico trial that the explanations improved clinicians' prediction accuracy and confidence.


Reliable Node Similarity Matrix Guided Contrastive Graph Clustering

http://arxiv.org/abs/2408.03765v1

Compressor summary: The paper proposes a new method (NS4GC) for graph clustering that uses an estimated node similarity matrix to guide representation learning, improving accuracy and efficiency.


'Finance Wizard' at the FinLLM Challenge Task: Financial Text Summarization

http://arxiv.org/abs/2408.03762v1

Compressor summary: The paper describes how the authors fine-tuned Llama3, a foundation model, for Financial Text Summarization and achieved third place with a ROUGE-1 score of 0.521.


3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting

http://arxiv.org/abs/2408.03753v1

Compressor summary: 3iGS improves 3D Gaussian Splatting by expressing outgoing radiance as a function of local illumination and BRDF features, optimising both for realistic view-dependent effects.


Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions

http://arxiv.org/abs/2408.03747v1

Compressor summary: The text discusses a survey on time-series anomaly detection methods and their challenges, such as benchmarking, data sets, evaluation metrics, and threshold selection.


Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model

http://arxiv.org/abs/2408.03748v1

Compressor summary: The paper introduces an edge-guided adversarial conditional diffusion model that generates realistic pseudo thermal images from visible-light edges, enabling deep learning models to be trained for object detection in low light and adverse weather conditions.


Flexible Bayesian Last Layer Models Using Implicit Priors and Diffusion Posterior Sampling

http://arxiv.org/abs/2408.03746v1

Compressor summary: The paper introduces a new approach to improve Bayesian Last Layer models by using implicit priors and diffusion techniques, which enhances their performance on various datasets and tasks.


Intuitionistic Fuzzy Cognitive Maps for Interpretable Image Classification

http://arxiv.org/abs/2408.03745v1

Compressor summary: The paper presents I2FCM, a novel framework that applies intuitionistic fuzzy cognitive maps to image classification, making CNN models more interpretable by estimating hesitancy and focusing on informative image regions.


Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation

http://arxiv.org/abs/2408.03735v1

Compressor summary: The paper proposes QSLAW, a method that uses parameter quantization and multimodal warmup to improve vision-language instruction tuning efficiency for large language models while maintaining performance.


Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal

http://arxiv.org/abs/2408.03734v1

Compressor summary: The study proposes a novel deep learning architecture, SHAU, for multiscale shadow removal in complex scenes and introduces a new synthetic dataset, MSRD, to benchmark future methods.


Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks

http://arxiv.org/abs/2408.03732v1

Compressor summary: Question Rephrasing evaluates the input uncertainty of large language models and, combined with sampling methods for output uncertainty, assesses overall uncertainty on chemical tasks.


A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models

http://arxiv.org/abs/2408.03728v1

Compressor summary: FISTAPruner is a new pruning method for large language models that improves efficiency without sacrificing performance by using convex optimization and a correction mechanism.


Pick of the Bunch: Detecting Infrared Small Targets Beyond Hit-Miss Trade-Offs via Selective Rank-Aware Attention

http://arxiv.org/abs/2408.03717v1

Compressor summary: The paper introduces SeRankDet, a deep network that uses selective ranking and attention to improve infrared small target detection in complex backgrounds, achieving high accuracy without the conventional trade-off between precision and false alarms.


Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction

http://arxiv.org/abs/2408.03706v1

Compressor summary: The text proposes complexity measures of the local topology of a contextual language model's latent space as features describing how an embedding vector relates to similar vectors, improving sequence tagging tasks such as dialogue term extraction.


Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling

http://arxiv.org/abs/2408.03695v1

Compressor summary: Openstory++ is a large-scale dataset with instance-level annotations and better training methodology to improve image generation models for creating coherent visual stories from long captions.


Generative Design of Periodic Orbits in the Restricted Three-Body Problem

http://arxiv.org/abs/2408.03691v1

Compressor summary: This paper explores using deep learning and artificial intelligence to generate periodic orbits for space missions and astrodynamics research.


RL-ADN: A High-Performance Deep Reinforcement Learning Environment for Optimal Energy Storage Systems Dispatch in Active Distribution Networks

http://arxiv.org/abs/2408.03685v1

Compressor summary: RL-ADN is a new open-source library that improves the performance and efficiency of deep reinforcement learning for optimizing energy storage systems in distribution networks using advanced data augmentation, network modeling, and power flow solver techniques.


L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

http://arxiv.org/abs/2408.03677v1

Compressor summary: L4DR is a weather-robust method that fuses LiDAR and 4D radar for 3D object detection, improving performance under fog and other adverse weather conditions.


NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

http://arxiv.org/abs/2408.03675v1

Compressor summary: The paper proposes NACL, a framework for efficient eviction of unnecessary tokens from the KV Cache in large language models, improving performance on short and long text tasks while reducing memory usage.
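
The core idea in this summary, dropping low-value entries from the KV cache under a memory budget, can be illustrated with a toy sketch. This is a generic score-based eviction scheme, not NACL's actual algorithm: the `score` field, the `keep_recent` window, and the budget are all illustrative assumptions.

```python
def evict_kv(cache, budget, keep_recent=2):
    """Generic KV-cache eviction sketch: unconditionally keep the most recent
    tokens, keep the highest-scoring older ones, and drop the rest once the
    cache exceeds the budget."""
    if len(cache) <= budget:
        return cache
    recent = cache[-keep_recent:]
    older = sorted(cache[:-keep_recent], key=lambda e: e["score"], reverse=True)
    kept = older[: budget - keep_recent]
    # Restore original positional order among the surviving entries.
    return sorted(kept + recent, key=lambda e: e["pos"])

# Toy cache: one entry per cached token, with a hypothetical importance score
# (e.g. accumulated attention mass).
cache = [
    {"pos": 0, "tok": "The", "score": 0.9},
    {"pos": 1, "tok": "cat", "score": 0.2},
    {"pos": 2, "tok": "sat", "score": 0.1},
    {"pos": 3, "tok": "on", "score": 0.05},
    {"pos": 4, "tok": "the", "score": 0.3},
    {"pos": 5, "tok": "mat", "score": 0.4},
]
pruned = evict_kv(cache, budget=4)
print([e["tok"] for e in pruned])  # → ['The', 'cat', 'the', 'mat']
```

Keeping a recency window alongside score-based selection is a common design choice in this family of methods, since decoding attends heavily to the most recent tokens regardless of their historical scores.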


Beyond Over-smoothing: Uncovering the Trainability Challenges in Deep Graph Neural Networks

http://arxiv.org/abs/2408.03669v1

Compressor summary: This paper analyzes the real problem behind performance degradation of deep Graph Neural Networks (GNNs) and shows that trainability issues of MLPs are the main challenge, which can be improved by constraining gradient flow.


AI-Driven approach for sustainable extraction of earth's subsurface renewable energy while minimizing seismic activity

http://arxiv.org/abs/2408.03664v1

Compressor summary: This paper proposes a reinforcement learning method to reduce human-induced seismicity in geothermal energy and carbon capture systems by adjusting controller parameters in real-time.


PHOCUS: Physics-Based Deconvolution for Ultrasound Resolution Enhancement

http://arxiv.org/abs/2408.03657v1

Compressor summary: The paper presents a method to improve ultrasound image resolution using physics-based deconvolution with neural networks and B-mode images, outperforming traditional methods in tests.


Consumer Transactions Simulation through Generative Adversarial Networks

http://arxiv.org/abs/2408.03655v1

Compressor summary: The paper presents a GAN system that generates realistic synthetic retail transaction data by combining consumer behavior modeling with SKU availability and stock constraints, addressing assortment optimization and demand prediction challenges.


mucAI at WojoodNER 2024: Arabic Named Entity Recognition with Nearest Neighbor Search

http://arxiv.org/abs/2408.03652v1

Compressor summary: Arabic KNN-NER is a method for identifying and classifying entities in Arabic text using KNN search over cached training data to improve fine-grained flat-entity recognition.
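
The retrieval idea in this summary, tagging tokens by nearest-neighbor search over cached training data, is generic enough to sketch. The embeddings and tags below are toy stand-ins, not the paper's datastore or model:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def knn_label(query, datastore, k=3):
    """Retrieve the k most similar cached token embeddings and
    majority-vote their entity tags."""
    ranked = sorted(datastore, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy datastore of (embedding, tag) pairs cached from training data.
datastore = [
    ([1.0, 0.1, 0.0], "PER"),
    ([0.9, 0.2, 0.1], "PER"),
    ([0.0, 1.0, 0.2], "LOC"),
    ([0.1, 0.9, 0.3], "LOC"),
    ([0.0, 0.1, 1.0], "O"),
]

print(knn_label([0.95, 0.15, 0.05], datastore))  # → PER
```

In practice the query would be a contextual embedding from the fine-tuned model, and the KNN vote would typically be interpolated with the model's own tag distribution rather than used alone.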


TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization

http://arxiv.org/abs/2408.03637v1

Compressor summary: TALE is a training-free framework that uses latent space manipulation to seamlessly incorporate user-specified objects into different visual contexts.


Time is Not Enough: Time-Frequency based Explanation for Time-Series Black-Box Models

http://arxiv.org/abs/2408.03636v1

Compressor summary: SpectralX provides time-frequency explanations for black-box time-series classifiers using a plug-and-play framework and a new perturbation-based method called FIA.


CARE: A Clue-guided Assistant for CSRs to Read User Manuals

http://arxiv.org/abs/2408.03633v1

Compressor summary: CARE is a reading assistant for customer service representatives that helps them find proper responses from user manuals faster and more accurately by using self-supervised learning and explicit clue chains.


Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis

http://arxiv.org/abs/2408.03632v1

Compressor summary: The paper introduces Concept Conductor, a method to generate multi-concept images without training, ensuring visual fidelity, correct layout, and semantic consistency by isolating sampling processes, using self-attention, and injecting concepts with shape-aware masks.


Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent

http://arxiv.org/abs/2408.03631v1

Compressor summary: The text proposes using large language models and autonomous agents to optimize base station siting in a more efficient, cost-effective, and reliable way, reducing human effort.


PAGED: A Benchmark for Procedural Graphs Extraction from Documents

http://arxiv.org/abs/2408.03630v1

Compressor summary: The paper introduces PAGED, a benchmark for evaluating procedural graph extraction from documents, and shows that existing methods are limited while large language models have potential but also gaps.


Weakly Contrastive Learning via Batch Instance Discrimination and Feature Clustering for Small Sample SAR ATR

http://arxiv.org/abs/2408.03627v1

Compressor summary: The paper proposes BIDFC, a contrastive learning framework for SAR ATR, which uses weakly contrastive learning and dynamic-weighted variance loss to improve classification accuracy with less labeled data.


On the choice of the non-trainable internal weights in random feature maps

http://arxiv.org/abs/2408.03626v1

Compressor summary: Random feature maps with optimal internal weights can accurately predict dynamical systems' behavior with much less computation than traditional neural networks.
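
The random-feature-map approach this summary refers to is simple enough to demonstrate end to end: fix random internal weights, compute nonlinear features, and train only a linear readout by ridge regression. The sketch below forecasts the logistic map; the feature count, ridge strength, and test points are illustrative assumptions, not the paper's choices.

```python
import math
import random

random.seed(0)

# Target dynamics: the logistic map x_{t+1} = r * x_t * (1 - x_t).
r = 3.7
xs = [0.3]
for _ in range(300):
    xs.append(r * xs[-1] * (1 - xs[-1]))

# Fixed (non-trainable) random internal weights for D tanh features.
D = 20
W = [random.uniform(-2, 2) for _ in range(D)]
b = [random.uniform(-1, 1) for _ in range(D)]

def features(x):
    return [math.tanh(W[i] * x + b[i]) for i in range(D)]

# Ridge-regression readout: solve (F^T F + lam*I) w = F^T y.
F = [features(x) for x in xs[:-1]]
y = xs[1:]
lam = 1e-4
A = [[sum(F[t][i] * F[t][j] for t in range(len(F))) + (lam if i == j else 0.0)
      for j in range(D)] for i in range(D)]
rhs = [sum(F[t][i] * y[t] for t in range(len(F))) for i in range(D)]

# Gaussian elimination with partial pivoting, then back-substitution.
for col in range(D):
    piv = max(range(col, D), key=lambda row: abs(A[row][col]))
    A[col], A[piv] = A[piv], A[col]
    rhs[col], rhs[piv] = rhs[piv], rhs[col]
    for row in range(col + 1, D):
        f = A[row][col] / A[col][col]
        for k in range(col, D):
            A[row][k] -= f * A[col][k]
        rhs[row] -= f * rhs[col]
w = [0.0] * D
for i in reversed(range(D)):
    w[i] = (rhs[i] - sum(A[i][j] * w[j] for j in range(i + 1, D))) / A[i][i]

def predict(x):
    return sum(wi * fi for wi, fi in zip(w, features(x)))

# One-step prediction error at a few points inside the sampled range.
err = max(abs(predict(x) - r * x * (1 - x)) for x in (0.3, 0.5, 0.9))
print(f"max one-step error: {err:.2e}")
```

Because only the linear readout is fitted, training reduces to a single linear solve, which is the computational advantage over backpropagation-trained networks; the paper's contribution concerns how to choose the fixed internal weights well.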


AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging

http://arxiv.org/abs/2408.03624v1

Compressor summary: The paper presents AgentsCoMerge, a framework with observation, planning, communication, and training modules that enables connected and autonomous vehicles to collaborate on safe and efficient ramp merging using large language models, with experiments showing its superiority across various scenarios.


Improving the quality of Persian clinical text with a novel spelling correction system

http://arxiv.org/abs/2408.03622v1

Compressor summary: This study developed an innovative spelling error correction method for Persian clinical text using a fine-tuned model and PERTO algorithm, achieving high precision in detecting and correcting word errors.


Making Robust Generalizers Less Rigid with Soft Ascent-Descent

http://arxiv.org/abs/2408.03619v1

Compressor summary: The paper proposes a new training criterion for machine learning models that penalizes poor loss concentration to improve performance on rare or difficult data points and is compatible with loss transformations like CVaR or DRO.


A Logical Fallacy-Informed Framework for Argument Generation

http://arxiv.org/abs/2408.03618v1

Compressor summary: The FIPO framework improves the quality and logic of arguments generated by large language models by reducing fallacy errors.


Is Child-Directed Speech Effective Training Data for Language Models?

http://arxiv.org/abs/2408.03617v1

Compressor summary: The study compares GPT-2 models trained on child-directed speech and synthetic TinyDialogues against BabyLM datasets, finding that local data properties affect performance while global ones do not, suggesting that children learn language more efficiently than language models do.


Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

http://arxiv.org/abs/2408.03615v1

Compressor summary: The paper proposes a Hybrid Multimodal Memory module to improve long-horizon task completion in artificial intelligence agents by enabling better planning and reflection with world knowledge and multimodal experience.


JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling

http://arxiv.org/abs/2408.03612v1

Compressor summary: JARViS is a two-stage video action detection framework that uses Transformer attention to model interactions between actors and scenes, achieving state-of-the-art performance on three VAD datasets.


InPer: Whole-Process Domain Generalization via Causal Intervention and Perturbation

http://arxiv.org/abs/2408.03608v1

Compressor summary: InPer is a framework that uses causality to enhance domain generalization by refining causal variable selection during training and identifying anti-interference samples for prototype classification during testing.


Activations Through Extensions: A Framework To Boost Performance Of Neural Networks

http://arxiv.org/abs/2408.03599v1

Compressor summary: The paper proposes a framework that unifies and improves activation functions for neural networks, achieving better performance with minimal complexity increase.


PRISM: PRogressive dependency maxImization for Scale-invariant image Matching

http://arxiv.org/abs/2408.03598v1

Compressor summary: PRISM is a detector-free image matching method that prunes irrelevant features, tackles scale discrepancy, and achieves leading accuracy on various benchmarks.


Focal Depth Estimation: A Calibration-Free, Subject- and Daytime Invariant Approach

http://arxiv.org/abs/2408.03591v1

Compressor summary: The study presents a calibration-free method using machine learning and LSTM networks to accurately estimate focal depth from eye movements, improving autofocal glasses usability and enabling their use in extended reality environments.


Hierarchical Neural Constructive Solver for Real-world TSP Scenarios

http://arxiv.org/abs/2408.03585v1

Compressor summary: The paper introduces real-world Traveling Salesman Problem scenarios and proposes a hierarchical approach combining hypernetworks and an Expectation-Maximization algorithm to improve routing solutions.


Teach CLIP to Develop a Number Sense for Ordinal Regression

http://arxiv.org/abs/2408.03574v1

Compressor summary: We present NumCLIP, a method that improves ordinal regression performance of pre-trained vision-language models by disassembling the problem into coarse classification and fine prediction stages, using language to leverage numerical bins, and introducing a novel cross-modal ranking loss to maintain alignment.