arxiv compressed, 2024-07-30

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-30 generated by the compressor, my personal LLM-based project.


Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing

http://arxiv.org/abs/2407.20232v1

Compressor summary: The SANE pipeline uses a large language model to resolve ambiguities in text-based editing instructions and improve the performance and diversity of diffusion-based editing systems.


SAPG: Split and Aggregate Policy Gradients

http://arxiv.org/abs/2407.20230v1

Compressor summary: SAPG is a new on-policy RL algorithm that improves performance in large-scale environments by splitting them into chunks and fusing them back together via importance sampling.
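To make the "fuse via importance sampling" idea concrete, here is a toy sketch of reweighting returns collected in separate environment chunks so they estimate the target policy's performance. All names, shapes, and the synthetic data are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def importance_weighted_return(logp_behavior, logp_target, returns):
    """Reweight returns collected under a behavior policy so they
    estimate the expected return under the target policy."""
    w = np.exp(logp_target - logp_behavior)  # per-trajectory importance ratio
    return float(np.mean(w * returns))

rng = np.random.default_rng(0)
chunk_estimates = []
for _ in range(2):  # two environment chunks with slightly stale policies
    logp_b = rng.normal(-1.0, 0.1, size=256)           # behavior log-probs
    logp_t = logp_b + rng.normal(0.0, 0.05, size=256)  # target log-probs
    returns = rng.normal(1.0, 0.5, size=256)           # per-trajectory returns
    chunk_estimates.append(importance_weighted_return(logp_b, logp_t, returns))

fused = float(np.mean(chunk_estimates))  # aggregate estimate across chunks
```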


FlexAttention for Efficient High-Resolution Vision-Language Models

http://arxiv.org/abs/2407.20228v1

Compressor summary: FlexAttention is a novel attention mechanism for vision-language models that uses low- and high-resolution tokens to reduce computational costs without sacrificing performance.


Can Editing LLMs Inject Harm?

http://arxiv.org/abs/2407.20224v1

Compressor summary: The paper shows how knowledge editing techniques can be exploited as a new type of safety threat for Large Language Models, enabling stealthy, hard-to-defend editing attacks that inject misinformation and bias and undermine the models' reliability and fairness.


Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning

http://arxiv.org/abs/2407.20223v1

Compressor summary: The paper presents a novel unsupervised method for registering point clouds in SE(3) space, using RKHS functions and a new distance metric, which performs better than classical methods on noisy real datasets.


Global Structure-from-Motion Revisited

http://arxiv.org/abs/2407.20219v1

Compressor summary: The authors propose GLOMAP, a global Structure-from-Motion system that is fast and achieves accuracy comparable to or better than the state-of-the-art incremental method COLMAP.


Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning

http://arxiv.org/abs/2407.20209v1

Compressor summary: The paper investigates how to identify stable and unstable minima in overparameterized optimization tasks using the characteristic Lyapunov exponent for gradient descent algorithms.
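A minimal illustration of the Lyapunov-exponent criterion (my own toy example, not the paper's construction): for a quadratic minimum with curvature `lam`, gradient descent iterates x ← (1 - eta·lam)·x, so the exponent is log|1 - eta·lam|; negative means the minimum is dynamically stable, positive means unstable.

```python
import numpy as np

def lyapunov_exponent(eta, lam, x0=1e-3, steps=200):
    """Average per-step log expansion rate of GD on f(x) = lam * x**2 / 2."""
    x = x0
    logs = []
    for _ in range(steps):
        x_new = x - eta * lam * x  # one gradient-descent step
        logs.append(np.log(abs(x_new / x)))
        x = x_new
    return float(np.mean(logs))

stable = lyapunov_exponent(eta=0.1, lam=5.0)     # lam < 2/eta: exponent < 0
unstable = lyapunov_exponent(eta=0.1, lam=25.0)  # lam > 2/eta: exponent > 0
```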


Supertrust: Evolution-based superalignment strategy for safe coexistence

http://arxiv.org/abs/2407.20208v1

Compressor summary: The paper proposes a new approach to solve the unsolvable alignment problem of controlling superintelligence by establishing mutual trust between humans and AI through instinctive nature rather than nurturing.


QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval

http://arxiv.org/abs/2407.20207v1

Compressor summary: QAEA-DR is a novel text augmentation framework that uses question-answer pairs and event extraction to improve dense retrieval without modifying embedding or retrieval methods.


Learning Random Numbers to Realize Appendable Memory System for Artificial Intelligence to Acquire New Knowledge after Deployment

http://arxiv.org/abs/2407.20197v1

Compressor summary: The study developed a learning method to teach AI to memorize and recall data without updating parameters, using an Appendable Memory system with two components: the Memorizer and the Recaller.


Time series forecasting with high stakes: A field study of the air cargo industry

http://arxiv.org/abs/2407.20192v1

Compressor summary: The paper presents a mixture-of-experts machine learning method that combines statistical and deep learning models to forecast air cargo demand at the O&D level, beating industry benchmarks and supporting capacity allocation and strategic decisions in a volatile market.


MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

http://arxiv.org/abs/2407.20183v1

Compressor summary: MindSearch is an LLM-based multi-agent framework that mimics human cognition in web information seeking and integration, outperforming existing methods in response quality and efficiency.


AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs

http://arxiv.org/abs/2407.20177v1

Compressor summary: AutoScale is a tool that automatically adjusts data composition for pretraining LLMs based on target scale and improves performance across downstream tasks.


Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning

http://arxiv.org/abs/2407.20174v1

Compressor summary: The authors propose a data engine to enhance training datasets for chart question answering, improve visual encodings, and adapt multimodal language models to better recognize charts with rich text elements.


Machine Learning for predicting chaotic systems

http://arxiv.org/abs/2407.20158v1

Compressor summary: The paper compares lightweight and heavyweight machine learning architectures for predicting chaotic systems, finding that simple methods often outperform deep learning models depending on data and resources.


rLLM: Relational Table Learning with LLMs

http://arxiv.org/abs/2407.20157v1

Compressor summary: rLLM is a PyTorch library that helps create models for learning from tables with large language models, by standardizing modules and providing a simple framework.


Hierarchically Disentangled Recurrent Network for Factorizing System Dynamics of Multi-scale Systems

http://arxiv.org/abs/2407.20152v1

Compressor summary: The text proposes a KGML framework that uses a hierarchical recurrent neural architecture to model multi-scale processes and improve streamflow forecasting in hydrology, incorporating new observations without expensive optimization approaches.


ByteCheckpoint: A Unified Checkpointing System for LLM Development

http://arxiv.org/abs/2407.20143v1

Compressor summary: ByteCheckpoint is a PyTorch-native system that enables efficient checkpointing for large language models by supporting online resharding, disaggregated storage, and tensor merging techniques.


DDAP: Dual-Domain Anti-Personalization against Text-to-Image Diffusion Models

http://arxiv.org/abs/2407.20141v1

Compressor summary: The paper proposes a dual-domain framework to defend against personalized visual content generation technologies by disrupting their output and generating high-quality adversarial samples.


Tightening the Evaluation of PAC Bounds Using Formal Verification Results

http://arxiv.org/abs/2407.20122v1

Compressor summary: The paper proposes using formal verification of neural systems to improve PAC bounds and evaluate machine learning models' generalisation capacity more accurately.


Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number

http://arxiv.org/abs/2407.20119v1

Compressor summary: ASRC is a self-supervised deep clustering method for unstructured data that adapts to graph structure and edge weights, learns feature representations with contrastive learning, and achieves superior performance over other methods.


Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning

http://arxiv.org/abs/2407.20109v1

Compressor summary: Diffusion-DICE is a novel offline reinforcement learning method that uses diffusion models to transform the behavior distribution into the optimal policy distribution, following a guide-then-select approach to actions in order to approach the global optimum.


Strong Copyright Protection for Language Models via Adaptive Model Fusion

http://arxiv.org/abs/2407.20105v1

Compressor summary: The paper proposes a method to prevent language models from generating copyrighted material by adaptively combining them with a balancing property.


RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding

http://arxiv.org/abs/2407.20099v1

Compressor summary: The text discusses how SNNs with Poisson coding are more adversarially robust than ANNs, and proposes a Randomized Smoothing Coding method to improve their performance on large-scale datasets.
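Poisson (rate) coding, the SNN input scheme the summary mentions, turns each pixel intensity in [0, 1] into a Bernoulli spike train whose firing rate matches the intensity; the inherent randomness is what the paper links to robustness. A minimal sketch, with the time-step count being an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_encode(image, timesteps=100):
    """spikes[t, i, j] = 1 with probability image[i, j] at each timestep."""
    return (rng.random((timesteps,) + image.shape) < image).astype(np.uint8)

img = np.array([[0.0, 0.5],
                [1.0, 0.25]])
spikes = poisson_encode(img)
rates = spikes.mean(axis=0)  # empirical firing rates approximate intensities
```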


An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation

http://arxiv.org/abs/2407.20083v1

Compressor summary: The paper proposes an energy-based model for Word-level AutoCompletion in Computer-aided Translation, which improves efficiency and effectiveness using three strategies and outperforms previous models by about 6%.


UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation

http://arxiv.org/abs/2407.20080v1

Compressor summary: The paper introduces a comprehensive benchmark, UniTTA, for Test-Time Adaptation (TTA) that considers various domain and class conditions, and proposes a versatile framework with two methods to address these challenges.


Investigating the Impact of Semi-Supervised Methods with Data Augmentation on Offensive Language Detection in Romanian Language

http://arxiv.org/abs/2407.20076v1

Compressor summary: This paper explores semi-supervised learning methods and data augmentation techniques for offensive language detection, finding that they can improve model accuracy and robustness.


An Interpretable Rule Creation Method for Black-Box Models based on Surrogate Trees -- SRules

http://arxiv.org/abs/2407.20070v1

Compressor summary: The paper presents SRules, a method to create transparent and interpretable decision trees from black-box machine learning models, enabling stakeholders to understand and trust AI systems' decisions.


xAI-Drop: Don't Use What You Cannot Explain

http://arxiv.org/abs/2407.20067v1

Compressor summary: xAI-Drop is a novel dropping regularizer for GNNs that uses explainability to identify and exclude noisy nodes, improving robustness, generalization, and interpretability.


SalNAS: Efficient Saliency-prediction Neural Architecture Search with self-knowledge distillation

http://arxiv.org/abs/2407.20062v1

Compressor summary: The paper proposes SalNAS, a neural architecture search framework for saliency prediction, and Self-KD, a self-knowledge distillation approach to improve generalization without computing gradients in the teacher model.


RelBench: A Benchmark for Deep Learning on Relational Databases

http://arxiv.org/abs/2407.20060v1

Compressor summary: RelBench is a benchmark for using graph neural networks to solve predictive tasks over relational databases, showing that end-to-end learned models can outperform manual feature engineering.


Shapley Value Computation in Ontology-Mediated Query Answering

http://arxiv.org/abs/2407.20058v1

Compressor summary: The paper explores using Shapley values in ontology-mediated query answering, shows that it is PF/#P-hard, and extends the results to probabilistic queries with constants.
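For readers unfamiliar with the attribution formula, here is a generic exact Shapley-value routine over a small player set; the ontology-mediated query setting in the paper is far richer, and this sketch only illustrates the formula whose computation the paper analyzes.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """v maps a frozenset of players to a real-valued 'worth'."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for combo in combinations(others, k):
                S = frozenset(combo)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (v(S | {p}) - v(S))  # marginal contribution
        phi[p] = total
    return phi

# Example: an additive game, where Shapley values recover the weights.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}
phi = shapley_values(list(weights), lambda S: sum(weights[p] for p in S))
```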


Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models

http://arxiv.org/abs/2407.20053v1

Compressor summary: Orca is a machine learning framework that uses spatial and temporal encoding to improve significant wave height estimation from limited buoy data.


Denoising ESG: quantifying data uncertainty from missing data with Machine Learning and prediction intervals

http://arxiv.org/abs/2407.20047v1

Compressor summary: The paper applies machine learning techniques to fill missing data gaps in ESG datasets and quantifies uncertainty using prediction intervals, improving the reliability of ESG ratings.
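A rough sketch of quantifying predictive uncertainty with prediction intervals, split-conformal style: fit on one half of the data, then use a residual quantile from the other half as the interval half-width. The model, synthetic data, and 90% level are my illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 500)
y = 2.0 * x + rng.normal(0, 1.0, 500)  # synthetic "ESG metric"

fit, cal = slice(0, 250), slice(250, 500)
slope = np.sum(x[fit] * y[fit]) / np.sum(x[fit] ** 2)  # least squares, no intercept
resid = np.abs(y[cal] - slope * x[cal])                # calibration residuals
q = float(np.quantile(resid, 0.9))                     # 90% interval half-width

def predict_interval(x_new):
    center = slope * x_new
    return center - q, center + q

lo, hi = predict_interval(5.0)  # interval around the point prediction at x = 5
```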


Exploring Large Language Models to generate Easy to Read content

http://arxiv.org/abs/2407.20046v1

Compressor summary: The study investigates using AI and NLP to simplify Spanish texts for people with cognitive impairments, creating a corpus of Easy to Read content and testing a Llama2 model.


MaskInversion: Localized Embeddings via Optimization of Explainability Maps

http://arxiv.org/abs/2407.20034v1

Compressor summary: MaskInversion uses explainability maps from pre-trained models like CLIP to generate context-aware embeddings for specific image regions, improving performance on various vision-language tasks.


MimiQ: Low-Bit Data-Free Quantization of Vision Transformers

http://arxiv.org/abs/2407.20021v1

Compressor summary: The paper proposes an improved DFQ method for ViTs that aligns attention maps of synthetic and full-precision data to enhance the performance of quantized networks.


ImagiNet: A Multi-Content Dataset for Generalizable Synthetic Image Detection via Contrastive Learning

http://arxiv.org/abs/2407.20020v1

Compressor summary: ImagiNet is a dataset for detecting synthetic images in photos, paintings, faces, and uncategorized content, created using various generative models, to support defensive methods against image impersonation and misinformation.


Classification of freshwater snails of the genus Radomaniola with multimodal triplet networks

http://arxiv.org/abs/2407.20013v1

Compressor summary: The paper proposes a machine learning system for classifying freshwater snails using images, measurements, and genetic data, addressing challenges like dataset imbalance and visual similarity.
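A bare-bones triplet loss in NumPy to illustrate the training signal a triplet network uses: pull the anchor toward a same-class positive, push it from a different-class negative. The actual system embeds images, measurements, and genetic data with learned encoders, which this sketch omits.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between positive and negative squared distances."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])  # same species: close to anchor
n = np.array([3.0, 0.0])  # different species: far away
```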


On the Effects of Irrelevant Variables in Treatment Effect Estimation with Deep Disentanglement

http://arxiv.org/abs/2407.20003v1

Compressor summary: The paper proposes a method to separate relevant and irrelevant variables in observational data for estimating treatment effects, using an autoencoder and orthogonalization.


Do LLMs Really Adapt to Domains? An Ontology Learning Perspective

http://arxiv.org/abs/2407.19998v1

Compressor summary: This paper explores if large language models can adapt to domains and reason over structured knowledge, or if they only learn lexical senses, using a controlled experiment with synthetic corpora containing English and gibberish terms.


Reproducibility Study of "ITI-GEN: Inclusive Text-to-Image Generation"

http://arxiv.org/abs/2407.19996v1

Compressor summary: The study reproduces and evaluates ITI-GEN, a text-to-image generation model that improves inclusiveness, but finds some limitations and proposes methods to address them.


A Study on the Implementation Method of an Agent-Based Advanced RAG System Using Graph

http://arxiv.org/abs/2407.19994v1

Compressor summary: The study develops an improved knowledge-based question-answering system using Graph technology to generate more accurate and diverse responses with real-time data integration.


More precise edge detections

http://arxiv.org/abs/2407.19992v1

Compressor summary: The paper proposes more precise edge detection models built on cascaded skipping density blocks (CSDB), showing improved performance without down-sampling operations and with novel data augmentation techniques.


Classification of Alzheimer's Dementia vs. Healthy subjects by studying structural disparities in fMRI Time-Series of DMN

http://arxiv.org/abs/2407.19990v1

Compressor summary: The paper proposes a method to use deviation from stochasticity (DS) measure and autoencoders to distinguish between healthy and Alzheimer's disease brains based on fMRI time series.


Mixture of Nested Experts: Adaptive Processing of Visual Tokens

http://arxiv.org/abs/2407.19985v1

Compressor summary: MoNE is a new model that uses nested experts to process images and videos more efficiently, reducing computation costs by over two-fold without sacrificing performance.


Confidence Estimation for Automatic Detection of Depression and Alzheimer's Disease Based on Clinical Interviews

http://arxiv.org/abs/2407.19984v1

Compressor summary: The paper presents a novel Bayesian approach that estimates confidence in speech-based diagnosis of Alzheimer's disease and depression, using a dynamic Dirichlet prior to model predictive-distribution uncertainty, and outperforms baselines on two datasets.


A Temporal Psycholinguistics Approach to Identity Resolution of Social Media Users

http://arxiv.org/abs/2407.19967v1

Compressor summary: The thesis presents a method to match social media profiles across platforms using topics, sentiments, and timings of posts, and tests various approaches with mixed results.


Simply Trainable Nearest Neighbour Machine Translation with GPU Inference

http://arxiv.org/abs/2407.19965v1

Compressor summary: The paper proposes a simple and trainable nearest neighbor machine translation method that improves domain adaptation and translation quality, and can be efficiently implemented on GPUs.
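A sketch of the general nearest-neighbour MT idea: interpolate the base model's next-token distribution with a distribution built from retrieved (context, token) datastore entries. The datastore contents, temperature, and mixing weight lambda below are illustrative assumptions, not the paper's trained components.

```python
import numpy as np

def knn_interpolate(model_probs, neighbor_tokens, neighbor_dists,
                    vocab_size, lam=0.5, temp=1.0):
    """Mix the model distribution with a softmax over retrieved neighbours."""
    knn_probs = np.zeros(vocab_size)
    scores = np.exp(-np.asarray(neighbor_dists, dtype=float) / temp)
    scores /= scores.sum()
    for tok, s in zip(neighbor_tokens, scores):
        knn_probs[tok] += s  # neighbours vote for their target tokens
    return (1.0 - lam) * model_probs + lam * knn_probs

model_probs = np.array([0.7, 0.2, 0.1])  # base model prefers token 0
mixed = knn_interpolate(model_probs, neighbor_tokens=[2, 2, 1],
                        neighbor_dists=[0.1, 0.2, 2.0], vocab_size=3)
# close retrieved neighbours shift probability mass toward token 2
```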


Can I trust my anomaly detection system? A case study based on explainable AI

http://arxiv.org/abs/2407.19951v1

Compressor summary: The text discusses how anomaly detection systems based on variational autoencoder generative models may be misled by spurious features and explores using explainable AI methods to assess their robustness.


Inference acceleration for large language models using "stairs" assisted greedy generation

http://arxiv.org/abs/2407.19947v1

Compressor summary: The paper proposes an inference-acceleration method that combines a small language model's fast greedy generation with a large model's batched prediction and validation steps, producing text faster without sacrificing accuracy.


Noise-Resilient Unsupervised Graph Representation Learning via Multi-Hop Feature Quality Estimation

http://arxiv.org/abs/2407.19944v1

Compressor summary: The paper proposes a novel unsupervised graph representation learning method that estimates the quality of multi-hop propagated features using a learnable "meta-representation" to handle noisy features in real data.


Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank

http://arxiv.org/abs/2407.19943v1

Compressor summary: The paper proposes two methods to improve safe counterfactual learning to rank (CLTR) by addressing limitations of existing safety measures and handling trust bias, while ensuring performance and safety in deployment.


Boosting Graph Foundation Model from Structural Perspective

http://arxiv.org/abs/2407.19941v1

Compressor summary: The paper proposes a graph foundation model called BooG that unifies structural characteristics across domains by constructing virtual super nodes and uses contrastive learning for pre-training.


Monetizing Currency Pair Sentiments through LLM Explainability

http://arxiv.org/abs/2407.19922v1

Compressor summary: The paper presents a new technique to use large language models for explaining and improving sentiment analysis and price prediction in the financial domain.


FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

http://arxiv.org/abs/2407.19918v1

Compressor summary: FreeLong is a training-free approach that extends short video diffusion models to long video generation by blending global and local video features during denoising, countering the distortion of high-frequency components and improving consistency and fidelity while supporting coherent multi-prompt generation.


Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models

http://arxiv.org/abs/2407.19914v1

Compressor summary: The authors apply transformer models like BERT and T5 to perform sentiment analysis on Lithuanian five-star-based online reviews, achieving high accuracy and outperforming the commercial GPT-4 model.


Cell Culture Assistive Application for Precipitation Image Diagnosis

http://arxiv.org/abs/2407.19913v1

Compressor summary: The authors develop an AI application that automatically detects and classifies precipitation in microscopy images of cell cultures for regenerative medicine research, improving consistency and resolution compared to human inspection.


BEExAI: Benchmark to Evaluate Explainable AI

http://arxiv.org/abs/2407.19897v1

Compressor summary: BEExAI is a benchmark tool for comparing post-hoc explainability methods in machine learning using various evaluation metrics.


End-to-end SYNTAX score prediction: benchmark and methods

http://arxiv.org/abs/2407.19894v1

Compressor summary: The paper introduces a new method to automatically estimate coronary disease severity using a comprehensive dataset and multi-view X-ray videos, achieving an R2 of 0.51.


Leveraging Foundation Models for Zero-Shot IoT Sensing

http://arxiv.org/abs/2407.19893v1

Compressor summary: The paper proposes a method to use foundation models for zero-shot learning in IoT sensing tasks, leveraging cross-attention and data augmentation to generate semantic embeddings from sensor signals.


Self-Supervised Learning for Text Recognition: A Critical Survey

http://arxiv.org/abs/2407.19889v1

Compressor summary: This paper reviews self-supervised learning methods for text recognition, compares their performance, and proposes standards and future directions for the field.


A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation

http://arxiv.org/abs/2407.19886v1

Compressor summary: The paper proposes a new unified model for multi-modal recommender systems called Unified Multi-modal Graph Transformer (UGT), which improves feature extraction and modality modelling by using a multi-way transformer and a graph neural network, resulting in better item representations and user preferences prediction.


Preliminary WMT24 Ranking of General MT Systems and LLMs

http://arxiv.org/abs/2407.19884v1

Compressor summary: This preliminary report ranks WMT24 general Machine Translation systems and LLMs using automatic metrics; the official ranking will come from human evaluation, which overrides this report.


Exploring Robust Face-Voice Matching in Multilingual Environments

http://arxiv.org/abs/2407.19875v1

Compressor summary: For the Face-Voice Association in Multilingual Environments (FAME) challenge at ACM Multimedia 2024, the paper studies how different languages affect face-voice matching using Fusion and Orthogonal Projection (FOP) with four key components (a dual-branch structure, dynamic weighting, robust augmentation, and a score polarization strategy), achieving low EER on the V2-EH and V1-EU datasets.


OpenUAS: Embeddings of Cities in Japan with Anchor Data for Cross-city Analysis of Area Usage Patterns

http://arxiv.org/abs/2407.19872v1

Compressor summary: OpenUAS is a novel dataset of area embeddings that captures urban usage patterns in eight major Japanese cities, enabling analysis and comparison across different cities and periods.


Distances Between Partial Preference Orderings

http://arxiv.org/abs/2407.19869v1

Compressor summary: The paper suggests two ways to measure the difference between partial preferences, one based on combinatorics and another based on belief functions, with the latter being more efficient for high-dimensional problems.
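One simple combinatorial way to compare partial preferences (an illustrative metric in the same spirit, not necessarily either of the two measures the paper proposes): represent each partial ordering as a set of "a preferred to b" pairs and count disagreements via the symmetric difference.

```python
def pair_set(ranking_fragments):
    """Each fragment is a tuple ordered best-to-worst; derive all implied pairs."""
    pairs = set()
    for frag in ranking_fragments:
        for i in range(len(frag)):
            for j in range(i + 1, len(frag)):
                pairs.add((frag[i], frag[j]))
    return pairs

def partial_order_distance(p1, p2):
    s1, s2 = pair_set(p1), pair_set(p2)
    return len(s1 ^ s2)  # pairs asserted by exactly one of the two orderings

# Two agents with partial views over {a, b, c}:
d = partial_order_distance([("a", "b", "c")], [("a", "c"), ("b", "c")])
# they agree on a>c and b>c; only a>b is asserted by the first alone
```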


Imitation Learning for Intra-Day Power Grid Operation through Topology Actions

http://arxiv.org/abs/2407.19865v1

Compressor summary: The paper investigates using imitation learning to train artificial agents to assist human dispatchers in operating complex power grids, showing promising results and computational efficiency.


Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning

http://arxiv.org/abs/2407.19860v1

Compressor summary: The paper proposes a safe reinforcement learning method that uses anomalous state sequences to detect unsafe states in safety-critical environments, such as self-driving cars.


Online Multi-Source Domain Adaptation through Gaussian Mixtures and Dataset Dictionary Learning

http://arxiv.org/abs/2407.19853v1

Compressor summary: The paper presents a novel online method for multi-source domain adaptation using Gaussian Mixture Models and Wasserstein geometry, which can adapt to new target domains in real time and store data as memory.


Normality Addition via Normality Detection in Industrial Image Anomaly Detection Models

http://arxiv.org/abs/2407.19849v1

Compressor summary: The paper proposes a new method called NAND that adjusts image anomaly detection models to incorporate new types of normality using vision-language models and textual descriptions.


BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

http://arxiv.org/abs/2407.19845v1

Compressor summary: BackdoorBench is a benchmark for backdoor learning that integrates 20 attack and 32 defense algorithms, provides comprehensive evaluations, and offers insights and resources for researchers.


VortSDF: 3D Modeling with Centroidal Voronoi Tesselation on Signed Distance Field

http://arxiv.org/abs/2407.19837v1

Compressor summary: The paper proposes using Centroidal Voronoi Tesselation for volumetric shape representations in multi-view reconstruction tasks, improving performance and quality on complex scenes.


ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation

http://arxiv.org/abs/2407.19835v1

Compressor summary: The ATHAR dataset provides high-quality translations of Classical Arabic texts covering various topics and can improve large language models' performance in translating this language.


ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2

http://arxiv.org/abs/2407.19832v1

Compressor summary: ML-Mamba is a fast and efficient multimodal language model that uses Mamba-2 for processing images and text, achieving competitive results in various tasks.


Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost

http://arxiv.org/abs/2407.19825v1

Compressor summary: The paper analyzes how output lengths affect large language models and proposes a refined prompt engineering strategy, Constrained-CoT, to improve correctness and conciseness of answers.


ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality

http://arxiv.org/abs/2407.19820v1

Compressor summary: ActivityCLIP enhances group activity recognition by using text information from action labels alongside image information.


Comparative Analysis of Encoder-Based NER and Large Language Models for Skill Extraction from Russian Job Vacancies

http://arxiv.org/abs/2407.19816v1

Compressor summary: The study compares NER models and LLMs for skill extraction from Russian job vacancies, finding that traditional NER models perform better and faster.


Improving Retrieval Augmented Language Model with Self-Reasoning

http://arxiv.org/abs/2407.19813v1

Compressor summary: The text proposes a self-reasoning framework that improves reliability and traceability of Retrieval-Augmented Language Models by leveraging reasoning trajectories generated by the model itself.


Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture

http://arxiv.org/abs/2407.19811v1

Compressor summary: The study uses artificial neural networks to generate thermal videos, which can help improve automatic pain assessment in patients by monitoring their facial expressions.


Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS

http://arxiv.org/abs/2407.19809v1

Compressor summary: The study proposes a multimodal framework using facial videos and fNIRS to automatically assess pain, achieving 46.76% accuracy in a challenge.


Cool-Fusion: Fuse Large Language Models without Training

http://arxiv.org/abs/2407.19807v1

Compressor summary: Cool-Fusion fuses heterogeneous large language models without training or vocabulary alignment by having each source LLM generate text segments that are then jointly reranked, achieving up to a 17.8% accuracy boost on benchmark datasets.


Imputation for prediction: beware of diminishing returns

http://arxiv.org/abs/2407.19804v1

Compressor summary: Improving imputation for predictive models is less important when using expressive models, incorporating missingness indicators, or dealing with real-data outcomes.
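The "missingness indicator" idea the summary mentions is easy to make concrete: alongside a crude mean imputation, append a binary column flagging which entries were missing, so an expressive downstream model can learn from the missingness pattern itself. Purely illustrative; the function name and data are made up.

```python
import numpy as np

def impute_with_indicator(col):
    """Mean-impute a 1-D column and append a missingness-indicator column."""
    mask = np.isnan(col)
    filled = np.where(mask, np.nanmean(col), col)
    return np.column_stack([filled, mask.astype(float)])

col = np.array([1.0, np.nan, 3.0, np.nan])
features = impute_with_indicator(col)
# missing entries become the column mean (2.0); the second column flags them
```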


Teaching LLMs at Charles University: Assignments and Activities

http://arxiv.org/abs/2407.19798v1

Compressor summary: The paper shares teaching materials for a new course on large language models, covering experiments with inference and translation, and activities like quizzes, research, and paper discussion.


VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

http://arxiv.org/abs/2407.19795v1

Compressor summary: VolDoGer is a new dataset for vision-language tasks that helps evaluate and improve domain generalization in deep learning models.


Introducing a new hyper-parameter for RAG: Context Window Utilization

http://arxiv.org/abs/2407.19794v1

Compressor summary: This paper proposes a new hyper-parameter for Retrieval-Augmented Generation systems that optimizes chunk size to improve answer quality and suggests that context window utilization is essential for RAG performance.
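A minimal fixed-size chunker makes the chunk-size knob concrete: `chunk_size` and `overlap` together control how much of the LLM's context window the retrieved passages fill. Parameter names here are mine, not the paper's.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-level chunks."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"w{i}" for i in range(500))
chunks = chunk_text(doc)  # 3 overlapping chunks starting at words 0, 150, 300
```

Sweeping `chunk_size` and measuring answer quality is one way to tune how fully retrieved context occupies the window.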


Hashing based Contrastive Learning for Virtual Screening

http://arxiv.org/abs/2407.19790v1

Compressor summary: DrugHash is a hashing-based contrastive learning method for virtual screening that uses binary hash codes to reduce memory and time costs while achieving high accuracy.
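A toy sketch of hashing-based screening: map embeddings to binary codes with random hyperplanes (LSH-style) and rank by Hamming distance. DrugHash learns its hash codes; the random projection here is only a stand-in to show why binary codes cut memory and lookup cost.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_codes(embeddings, n_bits=32):
    """One bit per random hyperplane: sign of the projection."""
    planes = rng.normal(size=(embeddings.shape[1], n_bits))
    return (embeddings @ planes > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.sum(a != b))

library = rng.normal(size=(100, 8))              # 100 "molecule" embeddings
query = library[42] + 0.01 * rng.normal(size=8)  # near-duplicate of item 42

codes = binary_codes(np.vstack([library, query]))  # hash library and query together
dists = [hamming(codes[-1], c) for c in codes[:100]]
best = int(np.argmin(dists))                       # should retrieve item 42
```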


Interpreting Low-level Vision Models with Causal Effect Maps

http://arxiv.org/abs/2407.19789v1

Compressor summary: The paper introduces Causal Effect Map (CEM), a method to interpret and diagnose low-level vision models using causality theory, and demonstrates several interesting insights gained from applying it to various tasks.


SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters

http://arxiv.org/abs/2407.19787v1

Compressor summary: The study presents SciPostLayout, a dataset for scientific poster generation and analysis from papers, which challenges existing computer vision models and shows the potential of LLM for poster creation.


Survey and Taxonomy: The Role of Data-Centric AI in Transformer-Based Time Series Forecasting

http://arxiv.org/abs/2407.19784v1

Compressor summary: The text discusses the importance of data-centric AI in training efficient transformer-based models, especially for time series forecasting, and reviews previous research on this topic using a proposed taxonomy.


Synthesizing Scientific Summaries: An Extractive and Abstractive Approach

http://arxiv.org/abs/2407.19779v1

Compressor summary: The paper presents a hybrid summarization method for research papers that combines extractive (unsupervised) and abstractive (transformer language model) approaches to capture key findings and motivation, reaching human-level abstractiveness with certain hyperparameter combinations.


Multimodal Large Language Models for Bioimage Analysis

http://arxiv.org/abs/2407.19778v1

Compressor summary: Multimodal Large Language Models can help extract complex information from biological data and aid human researchers in understanding and analyzing the biological world.


Revisiting Agnostic PAC Learning

http://arxiv.org/abs/2407.19777v1

Compressor summary: PAC learning is a classic model for studying supervised learning, and this paper shows that ERM, a common learning algorithm, is sub-optimal and proposes a better one with novel ideas.
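For context on the baseline the paper improves on: for a finite hypothesis class, the classical uniform-convergence argument (Hoeffding's inequality plus a union bound) gives empirical risk minimization (ERM) the guarantee below. This is the standard textbook bound, not a formula from the paper; the paper's point is that an algorithm can beat ERM's excess-error term in the agnostic setting.

```latex
% With probability at least 1 - \delta over m i.i.d. samples,
\operatorname{err}(\hat h_{\mathrm{ERM}})
  \;\le\; \min_{h \in \mathcal{H}} \operatorname{err}(h)
  \;+\; 2\sqrt{\frac{\ln\!\left(2|\mathcal{H}|/\delta\right)}{2m}}
```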


Garment Animation NeRF with Color Editing

http://arxiv.org/abs/2407.19774v1

Compressor summary: The paper proposes a novel approach to synthesize realistic garment animations from body motion sequences using a neural radiance field, capturing detailed features and enabling recoloring.


Generating Unseen Code Tests In Infinitum

http://arxiv.org/abs/2407.19772v1

Compressor summary: The authors propose a method to create benchmark variations that generalize across coding tasks and languages, mitigating the issue of benchmark leakage into training data when evaluating large language models (LLMs) on coding-related tasks.


Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network

http://arxiv.org/abs/2407.19768v1

Compressor summary: Our method improves face super-resolution by using wavelet transform to decompose and enhance features, and a full domain Transformer to extract facial information efficiently.
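The paper applies 2-D wavelet decompositions to feature maps; as a minimal 1-D sketch of the underlying idea (not the paper's architecture), one Haar step splits a signal into a low-frequency average band and a high-frequency detail band:

```python
def haar_step(x: list) -> tuple:
    """One level of the (unnormalized) Haar wavelet transform.
    Returns the low-pass (pairwise average) and high-pass (pairwise
    difference) halves of an even-length signal."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

low, high = haar_step([4, 2, 6, 8])
print(low, high)  # [3.0, 7.0] [1.0, -1.0]
```

Enhancing the two bands separately is what lets such methods treat coarse facial structure and fine detail with different amounts of compute.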


Map2Traj: Street Map Piloted Zero-shot Trajectory Generation with Diffusion Model

http://arxiv.org/abs/2407.19765v1

Compressor summary: Key points:
- User mobility modeling is important for wireless network analysis and optimization
- Existing methods are limited by privacy issues or lack of real data
- The paper proposes Map2Traj, a novel method that uses street maps and a diffusion model to generate realistic trajectories
- Map2Traj performs well on zero-shot trajectory generation and has potential for wireless network optimization
Summary: The paper introduces Map2Traj, a new method that generates realistic user trajectories from street maps with a diffusion model, overcoming privacy and data limitations and showing promising results for wireless network analysis and optimization.


Legal Minds, Algorithmic Decisions: How LLMs Apply Constitutional Principles in Complex Scenarios

http://arxiv.org/abs/2407.19760v1

Compressor summary: GPT-4 tends to support liberal views on bioethics issues, possibly due to data bias, raising concerns about its use in legal decision-making.


PredIN: Towards Open-Set Gesture Recognition via Prediction Inconsistency

http://arxiv.org/abs/2407.19753v1

Compressor summary: PredIN is an ensemble learning approach that improves open-set gesture recognition from surface electromyography (sEMG) signals by enhancing diversity among class feature distributions and optimizing inter-class separability.


Contextuality Helps Representation Learning for Generalized Category Discovery

http://arxiv.org/abs/2407.19752v1

Compressor summary: The paper proposes a new method for discovering categories in unlabeled data using contextual information from both instance and cluster levels, which improves classification accuracy compared to existing techniques.


Octave-YOLO: Cross frequency detection network with octave convolution

http://arxiv.org/abs/2407.19746v1

Compressor summary: Octave-YOLO is a real-time object detection model that processes high-resolution images using a cross frequency partial network (CFPNet) to efficiently divide and operate on feature maps, achieving comparable performance to YOLOv8 with reduced computational demands.
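Octave-style designs process part of each feature map at half resolution. A minimal sketch of the downsampling that produces the low-frequency branch (this is only the split step, with toy values; CFPNet itself is more involved) is 2x2 average pooling:

```python
def avg_pool_2x2(fmap: list) -> list:
    """Downsample a 2-D feature map (list of rows, even dimensions)
    by 2x2 average pooling, yielding the half-resolution branch of an
    octave-style split."""
    return [[(fmap[i][j] + fmap[i][j + 1]
              + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

print(avg_pool_2x2([[1, 2], [3, 4]]))  # [[2.5]]
```

Running convolutions on the pooled branch touches a quarter as many spatial positions, which is where the computational savings come from.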


KNOWCOMP POKEMON Team at DialAM-2024: A Two-Stage Pipeline for Detecting Relations in Dialogical Argument Mining

http://arxiv.org/abs/2407.19740v1

Compressor summary: The paper introduces a two-stage pipeline for dialogical argument mining, achieving good results and ranking high in the DialAM-2024 shared task.


Sensor Selection via GFlowNets: A Deep Generative Modeling Framework to Navigate Combinatorial Complexity

http://arxiv.org/abs/2407.19736v1

Compressor summary: The paper proposes a new sensor selection method using deep generative models that optimizes quality of service, outperforms existing methods, and handles multiple objectives.


Do Text-to-Vis Benchmarks Test Real Use of Visualisations?

http://arxiv.org/abs/2407.19726v1

Compressor summary: The paper evaluates existing benchmark datasets for code generation of visualisations and finds a gap in their representativeness, suggesting the need for new benchmarks to better support users' needs.


Constructing artificial life and materials scientists with accelerated AI using Deep AndersoNN

http://arxiv.org/abs/2407.19724v1

Compressor summary: Deep AndersoNN accelerates AI by using a single implicit layer and iterative solvers, achieving significant speed-ups and improving accuracy in computational life and materials science tasks.


Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images

http://arxiv.org/abs/2407.19719v1

Compressor summary: The text describes a new method using multimodal large language models and image features to automatically evaluate urban safety, which could help city decision-makers and researchers.


CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare

http://arxiv.org/abs/2407.19705v1

Compressor summary: The study shows how using diverse and well-distributed datasets can improve the performance of smaller language models in medical scenarios, even without increasing their size.


Classification Matters: Improving Video Action Detection with Class-Specific Attention

http://arxiv.org/abs/2407.19698v1

Compressor summary: The paper proposes a new video action detection (VAD) method that attends to contextual information for accurate classification of actor actions in videos, using class-dedicated queries and reducing bias toward actor regions.


Multiscale Representation Enhanced Temporal Flow Fusion Model for Long-Term Workload Forecasting

http://arxiv.org/abs/2407.19697v1

Compressor summary: The paper proposes a novel framework that uses multiscale representation learning and temporal flow fusion to capture long-term and near-term workload patterns for accurate forecasting in cloud computing systems.


Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images

http://arxiv.org/abs/2407.19696v1

Compressor summary: The paper introduces a novel feature pyramid network for small object detection in aerial images, with two attention blocks and cross-layer contextual information.


Structural damage detection via hierarchical damage information with volumetric assessment

http://arxiv.org/abs/2407.19694v1

Compressor summary: Guided-DetNet is a novel approach that uses generative attention, hierarchical elimination, and volumetric contour visual assessment to improve structural damage detection and classification.


Causal Interventional Prediction System for Robust and Explainable Effect Forecasting

http://arxiv.org/abs/2407.19688v1

Compressor summary: The authors propose a causal interventional prediction system (CIPS) to improve the robustness and explainability of AI-based forecasting systems by using a variational autoencoder and multiple imputations.


Revisiting the robustness of post-hoc interpretability methods

http://arxiv.org/abs/2407.19683v1

Compressor summary: The text discusses the importance of post-hoc interpretability methods in XAI, the limitations of current evaluation strategies, and proposes an approach with new metrics for a fine-grained assessment of these methods.


Harnessing Large Vision and Language Models in Agriculture: A Review

http://arxiv.org/abs/2407.19679v1

Compressor summary: Large models can help farmers solve various challenges in agriculture by providing multimodal information and improving production efficiency.


Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment

http://arxiv.org/abs/2407.19675v1

Compressor summary: The paper proposes a semi-supervised method for action quality assessment (AQA) using a teacher-reference-student architecture, which leverages unlabeled data and pseudo-labels to improve performance.


Advancing Prompt Learning through an External Layer

http://arxiv.org/abs/2407.19674v1

Compressor summary: The paper proposes a new method for adapting visual-language models using an external layer and novel techniques to align textual and visual features, improving performance on various tasks.


SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

http://arxiv.org/abs/2407.19672v1

Compressor summary: SeaLLMs 3 is a large language model tailored for Southeast Asian languages, improving performance and reducing costs while prioritizing safety and inclusivity.


Overview of PerpectiveArg2024: The First Shared Task on Perspective Argument Retrieval

http://arxiv.org/abs/2407.19670v1

Compressor summary: The paper introduces a shared task on perspective argument retrieval that considers demographic and socio-cultural factors, evaluates six systems, and discusses challenges and biases in incorporating perspectives.


mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

http://arxiv.org/abs/2407.19669v1

Compressor summary: Key points:
- The paper proposes a multilingual text representation model (TRM) and reranker for long-context retrieval
- The TRM is based on a base-sized encoder enhanced with RoPE and unpadding, pre-trained in a longer context than previous models
- The TRM and reranker are trained with contrastive learning and achieve comparable or better performance than large-sized models
- The proposed models are more efficient during training and inference
Summary: The paper presents efficient and effective multilingual text representation models and rerankers for long-context retrieval, based on a pre-trained encoder with RoPE and unpadding.


Smart Language Agents in Real-World Planning

http://arxiv.org/abs/2407.19667v1

Compressor summary: Key points:
- The paper aims to improve travel planning using large language models (LLMs) in the sole-planning mode
- It proposes a semi-automated prompt generation framework that combines LLM and human input
- It shows that human feedback boosts LLM performance significantly
Summary: The paper presents a method to enhance travel planning with LLMs by generating prompts that involve human input, which improves the output quality by 139%.


Take A Step Back: Rethinking the Two Stages in Visual Reasoning

http://arxiv.org/abs/2407.19666v1

Compressor summary: The paper proposes a two-stage visual reasoning framework that separates symbolization and logical reasoning, leading to better generalization ability on various tasks.


Adaptive Soft Error Protection for Deep Learning

http://arxiv.org/abs/2407.19664v1

Compressor summary: The authors propose an adaptive soft error protection strategy for deep learning systems that adjusts protection based on input complexity, reducing protection costs by 46.9% on average while maintaining reliability.


Short-Term Forecasting of Photovoltaic Power Generation Based on Entropy during the Foggy Winter

http://arxiv.org/abs/2407.19663v1

Compressor summary: The paper presents a new model for predicting solar energy output during foggy winters using entropy, clustering, and modified retention network, which improves accuracy over existing methods.
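The summary names entropy as the key quantity; the Shannon entropy of a discrete distribution is the standard formula below (how the paper applies it to foggy-winter PV variability is more specific than this sketch):

```python
from math import log2

def shannon_entropy(probs: list) -> float:
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # 1.0  (maximally uncertain)
print(shannon_entropy([1.0]))       # no uncertainty
```

Higher entropy in the weather/irradiance series signals more volatile conditions, which is the kind of regime a foggy-winter forecaster has to detect.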


Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications

http://arxiv.org/abs/2407.19660v1

Compressor summary: The paper proposes a new foundation model for remote sensing geoscience applications that uses multi-modal data (spectral imagery and weather) and variable step forecasting as its pretraining task, which improves embeddings for downstream applications like crop mapping.


AI-Driven Healthcare: A Survey on Ensuring Fairness and Mitigating Bias

http://arxiv.org/abs/2407.19655v1

Compressor summary: The text discusses how artificial intelligence is improving healthcare services but also creating ethical challenges related to biases in data and algorithms that can affect different demographic groups.


SALVE: A 3D Reconstruction Benchmark of Wounds from Consumer-grade Videos

http://arxiv.org/abs/2407.19652v1

Compressor summary: The paper evaluates 3D wound reconstruction methods from consumer-grade videos using a new dataset and finds that neural rendering approaches are promising for clinical wound assessment.


ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck

http://arxiv.org/abs/2407.19651v1

Compressor summary: Key points:
- First study of adapting compressed image latents for MLLM-based vision tasks
- Novel framework with lightweight transform-neck and surrogate loss
- Generic and applicable to different scenarios and MLLMs
- Excludes downstream MLLMs from training the transform-neck
- Achieves good performance with low complexity
Summary: The paper presents a novel framework to adapt compressed image latents for various MLLM-based vision tasks using a lightweight transform-neck and a surrogate loss, without involving downstream MLLMs in training.


Practical Video Object Detection via Feature Selection and Aggregation

http://arxiv.org/abs/2407.19650v1

Compressor summary: The paper proposes a feature selection and aggregation strategy for video object detection that improves accuracy and efficiency, achieving a new record performance on the ImageNet VID dataset.


Foundations for Unfairness in Anomaly Detection -- Case Studies in Facial Imaging Data

http://arxiv.org/abs/2407.19646v1

Compressor summary: This paper explores how deep anomaly detection algorithms can be unfair to certain groups in facial imaging data, identifying sources and factors contributing to this unfairness.


Realizing Unaligned Block-wise Pruning for DNN Acceleration on Mobile Devices

http://arxiv.org/abs/2407.19644v1

Compressor summary: The paper introduces Block Expansion and Division (BED), a fast block selection algorithm for unaligned block pruning, and an efficient inference kernel for mobile devices to reduce DNN model accuracy drop while maintaining low latency.
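As a toy illustration of block-wise pruning (this sketch handles only *aligned* blocks; BED's contribution is efficiently selecting unaligned block positions, which this does not do), whole blocks of a weight vector are kept or zeroed by their L1 norm:

```python
def block_prune(w: list, block_size: int, keep_ratio: float) -> list:
    """Zero out entire blocks of a flat weight vector, keeping the
    keep_ratio fraction of blocks with the largest L1 norm."""
    blocks = [w[i:i + block_size] for i in range(0, len(w), block_size)]
    n_keep = max(1, int(len(blocks) * keep_ratio))
    keep = set(sorted(range(len(blocks)),
                      key=lambda i: -sum(abs(x) for x in blocks[i]))[:n_keep])
    out = []
    for i, b in enumerate(blocks):
        out.extend(b if i in keep else [0.0] * len(b))
    return out

print(block_prune([1, 1, 0.1, 0.1, 2, 2, 0.2, 0.2],
                  block_size=2, keep_ratio=0.5))
# [1, 1, 0.0, 0.0, 2, 2, 0.0, 0.0]
```

Pruning whole blocks rather than individual weights is what makes the resulting sparsity pattern friendly to mobile inference kernels.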


Prometheus Chatbot: Knowledge Graph Collaborative Large Language Model for Computer Components Recommendation

http://arxiv.org/abs/2407.19643v1

Compressor summary: The text describes Prometheus, a chatbot that uses knowledge graphs and large language models to provide personalized recommendations for computer components based on natural language inputs.


From Pre-training Corpora to Large Language Models: What Factors Influence LLM Performance in Causal Discovery Tasks?

http://arxiv.org/abs/2407.19638v1

Compressor summary: This study explores how exposure to causal information and context influence the performance of Large Language Models in identifying cause-effect relationships.


OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale

http://arxiv.org/abs/2407.19633v1

Compressor summary: The OptiMUS system uses a large language model to automatically formulate and solve linear programming problems from natural language descriptions, improving efficiency and correctness.


"A Good Bot Always Knows Its Limitations": Assessing Autonomous System Decision-making Competencies through Factorized Machine Self-confidence

http://arxiv.org/abs/2407.19631v1

Compressor summary: The paper proposes a framework called FaMSeC that helps intelligent machines assess their competencies in completing tasks using self-confidence indicators based on problem-solving statistics and human-interpretable reports.


LLMs' Understanding of Natural Language Revealed

http://arxiv.org/abs/2407.19630v1

Compressor summary: The authors claim that large language models have exaggerated language understanding capabilities and propose testing them by querying what they understood from given snippets of text.


Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer

http://arxiv.org/abs/2407.19628v1

Compressor summary: Text2LiDAR is a novel model that generates high-quality, controllable LiDAR data from text for various scenarios, using an equirectangular transformer architecture and global-to-focused attention mechanism.


LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment

http://arxiv.org/abs/2407.19625v1

Compressor summary: The paper proposes a novel approach called LoginMEA for aligning equivalent entities between multi-modal knowledge graphs by fusing local and global interactions in a unified framework.


Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation

http://arxiv.org/abs/2407.19619v1

Compressor summary: The paper proposes a Retrieval-Augmented Generation method that uses contextual examples from existing code translations to enhance code translation quality for complex tasks.
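A minimal sketch of the retrieval step (the similarity metric and prompt format here are assumptions for illustration, not taken from the paper): pick the stored translation pair most similar to the query snippet and prepend it as a few-shot example:

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two code snippets."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_prompt(query_src: str, corpus: list) -> str:
    """corpus: (source, translation) pairs from existing translations.
    Retrieve the most similar pair and use it as a one-shot example."""
    src, tgt = max(corpus, key=lambda pair: jaccard(query_src, pair[0]))
    return f"Example:\n{src}\n=>\n{tgt}\n\nTranslate:\n{query_src}\n=>\n"

# Hypothetical C-to-Python translation memory.
corpus = [("int add(int a, int b)", "def add(a, b):"),
          ("void log(char *msg)", "def log(msg):")]
prompt = build_prompt("int mul(int a, int b)", corpus)
```

Real systems typically retrieve with embedding similarity rather than token overlap, but the prompt-assembly pattern is the same.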


AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs

http://arxiv.org/abs/2407.19617v1

Compressor summary: AgEval is a benchmark to evaluate the performance of multimodal large language models in plant stress phenotyping, showing improvements with few-shot learning and highlighting the need for subject matter expertise.


TopicTag: Automatic Annotation of NMF Topic Models Using Chain of Thought and Prompt Tuning with LLMs

http://arxiv.org/abs/2407.19616v1

Compressor summary: Topic modeling uses NMF to find themes in text, but manual labeling is needed. The paper proposes a method to automate topic labeling using LLMs and improve knowledge management.
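The non-LLM half of such a pipeline is easy to sketch: given one row of NMF's topic-word matrix H, take the highest-weight vocabulary words as keywords, then hand them to an LLM for a label (the weights and prompt wording below are made up for illustration):

```python
def top_words(h_row: list, vocab: list, k: int = 3) -> list:
    """Top-k vocabulary words by weight in one row of NMF's H matrix."""
    idx = sorted(range(len(h_row)), key=lambda j: -h_row[j])[:k]
    return [vocab[j] for j in idx]

vocab = ["neural", "market", "stock", "network", "training", "price"]
h_row = [0.9, 0.1, 0.05, 0.8, 0.7, 0.02]  # made-up topic weights
keywords = top_words(h_row, vocab)
prompt = "Give a short label for a topic with keywords: " + ", ".join(keywords)
print(keywords)  # ['neural', 'network', 'training']
```

The paper's contribution lies in how the LLM is prompted (chain of thought, prompt tuning) for the labeling step; this sketch only covers the keyword extraction feeding into it.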