This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-30, generated by the Compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2407.20232v1
Compressor summary: The SANE pipeline uses a large language model to resolve ambiguities in text-based editing instructions and improve the performance and diversity of diffusion-based editing systems.
http://arxiv.org/abs/2407.20230v1
Compressor summary: SAPG is a new on-policy RL algorithm that improves performance in large-scale environments by splitting them into chunks and fusing them back together via importance sampling.
http://arxiv.org/abs/2407.20228v1
Compressor summary: FlexAttention is a novel attention mechanism for vision-language models that uses low- and high-resolution tokens to reduce computational costs without sacrificing performance.
http://arxiv.org/abs/2407.20224v1
Compressor summary: The paper shows how knowledge editing techniques can be exploited as a new safety threat for Large Language Models, enabling stealthy, hard-to-defend editing attacks that perform Misinformation Injection and Bias Injection and undermine the models' safety alignment, reliability, and fairness.
http://arxiv.org/abs/2407.20223v1
Compressor summary: The paper presents a novel unsupervised method for registering point clouds in SE(3) space, using RKHS functions and a new distance metric, which performs better than classical methods on noisy real datasets.
http://arxiv.org/abs/2407.20219v1
Compressor summary: The authors propose GLOMAP, a global Structure-from-Motion system that is fast and achieves accuracy comparable to or better than the state-of-the-art incremental method COLMAP.
http://arxiv.org/abs/2407.20209v1
Compressor summary: The paper investigates how to identify stable and unstable minima in overparameterized optimization tasks using the characteristic Lyapunov exponent for gradient descent algorithms.
http://arxiv.org/abs/2407.20208v1
Compressor summary: The paper proposes a new approach to solve the unsolvable alignment problem of controlling superintelligence by establishing mutual trust between humans and AI through instinctive nature rather than nurturing.
http://arxiv.org/abs/2407.20207v1
Compressor summary: QAEA-DR is a novel text augmentation framework that uses question-answer pairs and event extraction to improve dense retrieval without modifying embedding or retrieval methods.
http://arxiv.org/abs/2407.20197v1
Compressor summary: The study developed a learning method to teach AI to memorize and recall data without updating parameters, using an Appendable Memory system with two components: the Memorizer and the Recaller.
http://arxiv.org/abs/2407.20192v1
Compressor summary: The paper presents a mixture-of-experts machine learning method that combines statistical and deep learning models to forecast air cargo demand at the O&D level, beating industry benchmarks and supporting capacity allocation and strategic decisions in a volatile market.
http://arxiv.org/abs/2407.20183v1
Compressor summary: MindSearch is a LLM-based multi-agent framework that mimics human cognition in web information seeking and integration, outperforming existing methods in response quality and efficiency.
http://arxiv.org/abs/2407.20177v1
Compressor summary: AutoScale is a tool that automatically adjusts data composition for pretraining LLMs based on target scale and improves performance across downstream tasks.
http://arxiv.org/abs/2407.20174v1
Compressor summary: The authors propose a data engine to enhance training datasets for chart question answering, improve visual encodings, and adapt multimodal language models to better recognize charts with rich text elements.
http://arxiv.org/abs/2407.20158v1
Compressor summary: The paper compares lightweight and heavyweight machine learning architectures for predicting chaotic systems, finding that simple methods often outperform deep learning models depending on data and resources.
http://arxiv.org/abs/2407.20157v1
Compressor summary: rLLM is a PyTorch library that helps create models for learning from tables with large language models, by standardizing modules and providing a simple framework.
http://arxiv.org/abs/2407.20152v1
Compressor summary: The text proposes a KGML framework that uses a hierarchical recurrent neural architecture to model multi-scale processes and improve streamflow forecasting in hydrology, incorporating new observations without expensive optimization approaches.
http://arxiv.org/abs/2407.20143v1
Compressor summary: ByteCheckpoint is a PyTorch-native system that enables efficient checkpointing for large language models by supporting online resharding, disaggregated storage, and tensor merging techniques.
http://arxiv.org/abs/2407.20141v1
Compressor summary: The paper proposes a dual-domain framework to defend against personalized visual content generation technologies by disrupting their output and generating high-quality adversarial samples.
http://arxiv.org/abs/2407.20122v1
Compressor summary: The paper proposes using formal verification of neural systems to improve PAC bounds and evaluate machine learning models' generalisation capacity more accurately.
http://arxiv.org/abs/2407.20119v1
Compressor summary: ASRC is a self-supervised deep clustering method for unstructured data that adapts to graph structure and edge weights, learns feature representations with contrastive learning, and achieves superior performance over other methods.
http://arxiv.org/abs/2407.20109v1
Compressor summary: Diffusion-DICE is a novel offline reinforcement learning method that transforms behavior distribution to optimal policy distribution using diffusion models and guides-then-selects actions to approach global optimum.
http://arxiv.org/abs/2407.20105v1
Compressor summary: The paper proposes a method to prevent language models from generating copyrighted material by adaptively combining them with a balancing property.
http://arxiv.org/abs/2407.20099v1
Compressor summary: The text discusses how SNNs with Poisson coding are more adversarially robust than ANNs, and proposes a Randomized Smoothing Coding method to improve their performance on large-scale datasets.
http://arxiv.org/abs/2407.20083v1
Compressor summary: The paper proposes an energy-based model for Word-level AutoCompletion in Computer-aided Translation, which improves efficiency and effectiveness using three strategies and outperforms previous models by about 6%.
http://arxiv.org/abs/2407.20080v1
Compressor summary: The paper introduces a comprehensive benchmark, UniTTA, for Test-Time Adaptation (TTA) that considers various domain and class conditions, and proposes a versatile framework with two methods to address these challenges.
http://arxiv.org/abs/2407.20076v1
Compressor summary: This paper explores semi-supervised learning methods and data augmentation techniques for offensive language detection, finding that they can improve model accuracy and robustness.
http://arxiv.org/abs/2407.20070v1
Compressor summary: The paper presents SRules, a method to create transparent and interpretable decision trees from black-box machine learning models, enabling stakeholders to understand and trust AI systems' decisions.
http://arxiv.org/abs/2407.20067v1
Compressor summary: xAI-Drop is a novel dropping regularizer for GNNs that uses explainability to identify and exclude noisy nodes, improving robustness, generalization, and interpretability.
http://arxiv.org/abs/2407.20062v1
Compressor summary: The paper proposes SalNAS, a neural architecture search framework for saliency prediction, and Self-KD, a self-knowledge distillation approach to improve generalization without computing gradients in the teacher model.
http://arxiv.org/abs/2407.20060v1
Compressor summary: RelBench is a benchmark for using graph neural networks to solve predictive tasks over relational databases, showing that end-to-end learned models can outperform manual feature engineering.
http://arxiv.org/abs/2407.20058v1
Compressor summary: The paper explores using Shapley values in ontology-mediated query answering, shows that it is PF/#P-hard, and extends the results to probabilistic queries with constants.
http://arxiv.org/abs/2407.20053v1
Compressor summary: Orca is a machine learning framework that uses spatial and temporal encoding to improve significant wave height estimation from limited buoy data.
http://arxiv.org/abs/2407.20047v1
Compressor summary: The paper applies machine learning techniques to fill missing data gaps in ESG datasets and quantifies uncertainty using prediction intervals, improving the reliability of ESG ratings.
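As a rough illustration of how prediction intervals can quantify imputation uncertainty, here is a split-conformal-style sketch of my own — the function name, the residual-based recipe, and the coverage level are all assumptions, not the paper's method:

```python
import numpy as np

def prediction_interval(point_preds, calib_residuals, alpha=0.1):
    """Attach a symmetric (1 - alpha) prediction interval to point
    imputations, using the empirical quantile of held-out absolute
    residuals as the interval half-width."""
    q = np.quantile(np.abs(calib_residuals), 1 - alpha)
    preds = np.asarray(point_preds, dtype=float)
    # One (lower, upper) pair per imputed value
    return np.stack([preds - q, preds + q], axis=1)
```

For example, if the 90th-percentile absolute residual on held-out data is 2.0, a point imputation of 5.0 gets the interval [3.0, 7.0].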
http://arxiv.org/abs/2407.20046v1
Compressor summary: The study investigates using AI and NLP to simplify Spanish texts for people with cognitive impairments, creating a corpus of Easy to Read content and testing a Llama2 model.
http://arxiv.org/abs/2407.20034v1
Compressor summary: MaskInversion uses explainability maps from pre-trained models like CLIP to generate context-aware embeddings for specific image regions, improving performance on various vision-language tasks.
http://arxiv.org/abs/2407.20021v1
Compressor summary: The paper proposes an improved DFQ method for ViTs that aligns attention maps of synthetic and full-precision data to enhance the performance of quantized networks.
http://arxiv.org/abs/2407.20020v1
Compressor summary: ImagiNet is a dataset of synthetic and real images spanning photos, paintings, faces, and uncategorized content, created with various generative models to support defenses against image impersonation and misinformation.
http://arxiv.org/abs/2407.20013v1
Compressor summary: The paper proposes a machine learning system for classifying freshwater snails using images, measurements, and genetic data, addressing challenges like dataset imbalance and visual similarity.
http://arxiv.org/abs/2407.20003v1
Compressor summary: The paper proposes a method to separate relevant and irrelevant variables in observational data for estimating treatment effects, using an autoencoder and orthogonalization.
http://arxiv.org/abs/2407.19998v1
Compressor summary: This paper explores if large language models can adapt to domains and reason over structured knowledge, or if they only learn lexical senses, using a controlled experiment with synthetic corpora containing English and gibberish terms.
http://arxiv.org/abs/2407.19996v1
Compressor summary: The study reproduces and evaluates ITI-GEN, a text-to-image generation model that improves inclusiveness, but finds some limitations and proposes methods to address them.
http://arxiv.org/abs/2407.19994v1
Compressor summary: The study develops an improved knowledge-based question-answering system using Graph technology to generate more accurate and diverse responses with real-time data integration.
http://arxiv.org/abs/2407.19992v1
Compressor summary: The paper proposes more precise edge detection models using cascaded skipping density blocks (CSDB) and shows improved performance without down-sample operations and with novel data augmentation techniques.
http://arxiv.org/abs/2407.19990v1
Compressor summary: The paper proposes a method to use deviation from stochasticity (DS) measure and autoencoders to distinguish between healthy and Alzheimer's disease brains based on fMRI time series.
http://arxiv.org/abs/2407.19985v1
Compressor summary: MoNE is a new model that uses nested experts to process images and videos more efficiently, cutting computation costs by more than half without sacrificing performance.
http://arxiv.org/abs/2407.19984v1
Compressor summary: The paper presents a Bayesian approach that estimates confidence in speech-based diagnosis of Alzheimer's disease and depression by modeling predictive-distribution uncertainty with a dynamic Dirichlet prior, outperforming baselines on two datasets.
http://arxiv.org/abs/2407.19967v1
Compressor summary: The thesis presents a method to match social media profiles across platforms using topics, sentiments, and timings of posts, and tests various approaches with mixed results.
http://arxiv.org/abs/2407.19965v1
Compressor summary: The paper proposes a simple and trainable nearest neighbor machine translation method that improves domain adaptation and translation quality, and can be efficiently implemented on GPUs.
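The general nearest-neighbor translation recipe — retrieve cached decoder states and interpolate their token distribution with the base model's — can be sketched as follows. This is a toy illustration with an assumed datastore layout, distance metric, and fixed interpolation weight, not the paper's implementation:

```python
import numpy as np

def knn_next_token_probs(hidden, datastore_keys, datastore_tokens,
                         model_probs, k=4, temperature=10.0, lam=0.5):
    """Interpolate the base model's next-token distribution with a
    distribution built from the k nearest datastore entries."""
    # L2 distances from the decoder hidden state to all stored keys
    dists = np.linalg.norm(datastore_keys - hidden, axis=1)
    nearest = np.argsort(dists)[:k]
    # Turn distances into normalized weights over the retrieved tokens
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    knn_probs = np.zeros_like(model_probs)
    for w, idx in zip(weights, nearest):
        knn_probs[datastore_tokens[idx]] += w
    # Fixed-weight interpolation of retrieval and model distributions
    return lam * knn_probs + (1 - lam) * model_probs
```

In a real system the datastore holds millions of (hidden state, target token) pairs and the search uses an approximate index rather than a brute-force scan.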
http://arxiv.org/abs/2407.19951v1
Compressor summary: The text discusses how anomaly detection systems based on variational autoencoder generative models may be misled by spurious features and explores using explainable AI methods to assess their robustness.
http://arxiv.org/abs/2407.19947v1
Compressor summary: The paper proposes a method that speeds up language model inference by combining a small model's fast generation with a large model's batched prediction and validation steps, preserving accuracy.
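One common shape for this kind of small-fast / large-accurate combination is a draft-then-verify loop. The sketch below is a toy stand-in — the function names and the greedy agreement rule are my assumptions, not the paper's algorithm:

```python
def draft_and_verify(small_next, large_next, prompt, n_draft=3, max_len=10):
    """Toy draft-then-verify loop: the small model proposes n_draft tokens,
    the large model checks each position, and the run keeps the agreed
    prefix plus the large model's token at the first mismatch."""
    out = list(prompt)
    while len(out) < max_len:
        draft = [small_next(out)]
        while len(draft) < n_draft:
            draft.append(small_next(out + draft))
        for tok in draft:                 # verified in one batch in practice
            expected = large_next(out)
            out.append(expected)          # always keep the verified token
            if tok != expected or len(out) >= max_len:
                break                     # first mismatch ends the round
    return out
```

When the small model's drafts agree with the large model, several tokens are accepted per large-model round; on disagreement, only the corrected token survives.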
http://arxiv.org/abs/2407.19944v1
Compressor summary: The paper proposes a novel unsupervised graph representation learning method that estimates the quality of multi-hop propagated features using a learnable "meta-representation" to handle noisy features in real data.
http://arxiv.org/abs/2407.19943v1
Compressor summary: The paper proposes two methods to improve safe counterfactual learning to rank (CLTR) by addressing limitations of existing safety measures and handling trust bias, while ensuring performance and safety in deployment.
http://arxiv.org/abs/2407.19941v1
Compressor summary: The paper proposes a graph foundation model called BooG that unifies structural characteristics across domains by constructing virtual super nodes and uses contrastive learning for pre-training.
http://arxiv.org/abs/2407.19922v1
Compressor summary: The paper presents a new technique to use large language models for explaining and improving sentiment analysis and price prediction in the financial domain.
http://arxiv.org/abs/2407.19918v1
Compressor summary: FreeLong is a training-free approach that extends short video diffusion models to long video generation by blending global and local video features during denoising, countering the distortion of high-frequency components and improving the consistency and fidelity of long videos while supporting coherent multi-prompt generation.
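The frequency-domain intuition — keep low-frequency structure from a global feature and high-frequency detail from a local one — can be shown in a toy 1-D form. This is a hypothetical numpy sketch; the paper operates on video feature maps, not 1-D signals, and its exact blending rule may differ:

```python
import numpy as np

def spectral_blend(global_feat, local_feat, cutoff=0.25):
    """Blend two 1-D signals in the frequency domain: low frequencies
    come from the global signal, high frequencies from the local one."""
    gf = np.fft.fft(global_feat)
    lf = np.fft.fft(local_feat)
    freqs = np.fft.fftfreq(len(global_feat))
    low = np.abs(freqs) <= cutoff        # boolean low-frequency mask
    blended = np.where(low, gf, lf)      # low from global, high from local
    return np.fft.ifft(blended).real
```

Blending identical signals returns the input unchanged, which makes the decomposition easy to sanity-check.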
http://arxiv.org/abs/2407.19914v1
Compressor summary: The authors apply transformer models like BERT and T5 to perform sentiment analysis on Lithuanian five-star-based online reviews, achieving high accuracy and outperforming the commercial GPT-4 model.
http://arxiv.org/abs/2407.19913v1
Compressor summary: The authors develop an AI application that automatically detects and classifies precipitation in microscopy images of cell cultures for regenerative medicine research, improving consistency and resolution compared to human inspection.
http://arxiv.org/abs/2407.19897v1
Compressor summary: BEExAI is a benchmark tool for comparing post-hoc explainability methods in machine learning using various evaluation metrics.
http://arxiv.org/abs/2407.19894v1
Compressor summary: The paper introduces a new method to automatically estimate coronary disease severity using a comprehensive dataset and multi-view X-ray videos, achieving an R² of 0.51.
http://arxiv.org/abs/2407.19893v1
Compressor summary: The paper proposes a method to use foundation models for zero-shot learning in IoT sensing tasks, leveraging cross-attention and data augmentation to generate semantic embeddings from sensor signals.
http://arxiv.org/abs/2407.19889v1
Compressor summary: This paper reviews self-supervised learning methods for text recognition, compares their performance, and proposes standards and future directions for the field.
http://arxiv.org/abs/2407.19886v1
Compressor summary: The paper proposes a new unified model for multi-modal recommender systems called Unified Multi-modal Graph Transformer (UGT), which improves feature extraction and modality modelling by using a multi-way transformer and a graph neural network, resulting in better item representations and user preferences prediction.
http://arxiv.org/abs/2407.19884v1
Compressor summary: The report ranks Machine Translation systems based on automatic metrics; the official ranking will come from human evaluation and will override this preliminary report.
http://arxiv.org/abs/2407.19875v1
Compressor summary: The paper tackles Face-Voice Association in Multilingual Environments (FAME) at ACM Multimedia 2024, studying how different languages affect face-voice matching with Fusion and Orthogonal Projection (FOP) plus four key components (a dual-branch structure, dynamic weighting, robust augmentation, and a score polarization strategy), achieving low EER on the V2-EH and V1-EU datasets.
http://arxiv.org/abs/2407.19872v1
Compressor summary: OpenUAS is a novel dataset of area embeddings that captures urban usage patterns in eight major Japanese cities, enabling analysis and comparison across different cities and periods.
http://arxiv.org/abs/2407.19869v1
Compressor summary: The paper suggests two ways to measure the difference between partial preferences, one based on combinatorics and another based on belief functions, with the latter being more efficient for high-dimensional problems.
http://arxiv.org/abs/2407.19865v1
Compressor summary: The paper investigates using imitation learning to train artificial agents to assist human dispatchers in operating complex power grids, showing promising results and computational efficiency.
http://arxiv.org/abs/2407.19860v1
Compressor summary: The paper proposes a safe reinforcement learning method that uses anomalous state sequences to detect unsafe states in safety-critical environments, such as self-driving cars.
http://arxiv.org/abs/2407.19853v1
Compressor summary: The paper presents a novel online method for multi-source domain adaptation using Gaussian Mixture Models and Wasserstein geometry, which can adapt to new target domains in real time and store data as memory.
http://arxiv.org/abs/2407.19849v1
Compressor summary: The paper proposes a new method called NAND that adjusts image anomaly detection models to incorporate new types of normality using vision-language models and textual descriptions.
http://arxiv.org/abs/2407.19845v1
Compressor summary: BackdoorBench is a benchmark for backdoor learning that integrates 20 attack and 32 defense algorithms, provides comprehensive evaluations, and offers insights and resources for researchers.
http://arxiv.org/abs/2407.19837v1
Compressor summary: The paper proposes using Centroidal Voronoi Tessellation for volumetric shape representations in multi-view reconstruction tasks, improving performance and quality on complex scenes.
http://arxiv.org/abs/2407.19835v1
Compressor summary: The ATHAR dataset provides high-quality translations of Classical Arabic texts covering various topics and can improve large language models' performance in translating this language.
http://arxiv.org/abs/2407.19832v1
Compressor summary: ML-Mamba is a fast and efficient multimodal language model that uses Mamba-2 for processing images and text, achieving competitive results in various tasks.
http://arxiv.org/abs/2407.19825v1
Compressor summary: The paper analyzes how output lengths affect large language models and proposes a refined prompt engineering strategy, Constrained-CoT, to improve correctness and conciseness of answers.
http://arxiv.org/abs/2407.19820v1
Compressor summary: ActivityCLIP enhances group activity recognition by using text information from action labels alongside image information.
http://arxiv.org/abs/2407.19816v1
Compressor summary: The study compares NER models and LLMs for skill extraction from Russian job vacancies, finding that traditional NER models perform better and faster.
http://arxiv.org/abs/2407.19813v1
Compressor summary: The text proposes a self-reasoning framework that improves reliability and traceability of Retrieval-Augmented Language Models by leveraging reasoning trajectories generated by the model itself.
http://arxiv.org/abs/2407.19811v1
Compressor summary: The study uses artificial neural networks to generate thermal videos, which can help improve automatic pain assessment in patients by monitoring their facial expressions.
http://arxiv.org/abs/2407.19809v1
Compressor summary: The study proposes a multimodal framework using facial videos and fNIRS to automatically assess pain, achieving 46.76% accuracy in a challenge.
http://arxiv.org/abs/2407.19807v1
Compressor summary: Cool-Fusion fuses heterogeneous large language models without training or vocabulary alignment by generating text segments from each source LLM and jointly reranking them, boosting accuracy by up to 17.8% on benchmark datasets.
http://arxiv.org/abs/2407.19804v1
Compressor summary: Improving imputation for predictive models is less important when using expressive models, incorporating missingness indicators, or dealing with real-data outcomes.
http://arxiv.org/abs/2407.19798v1
Compressor summary: The paper shares teaching materials for a new course on large language models, covering experiments with inference and translation, and activities like quizzes, research, and paper discussion.
http://arxiv.org/abs/2407.19795v1
Compressor summary: VolDoGer is a new dataset for vision-language tasks that helps evaluate and improve domain generalization in deep learning models.
http://arxiv.org/abs/2407.19794v1
Compressor summary: This paper proposes a new hyper-parameter for Retrieval-Augmented Generation systems that optimizes chunk size to improve answer quality and suggests that context window utilization is essential for RAG performance.
http://arxiv.org/abs/2407.19790v1
Compressor summary: DrugHash is a hashing-based contrastive learning method for virtual screening that uses binary hash codes to reduce memory and time costs while achieving high accuracy.
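The core retrieval trick — comparing compact binary hash codes by Hamming distance instead of full molecular representations — reduces to cheap bit operations. A minimal sketch with made-up codes follows; how DrugHash actually learns the codes is beyond this toy:

```python
def hamming_screen(query_code, library_codes, top_n=2):
    """Rank library molecules by Hamming distance between binary hash
    codes (stored as Python ints), returning the top_n closest indices."""
    dists = [(bin(query_code ^ code).count("1"), i)
             for i, code in enumerate(library_codes)]
    dists.sort()  # smallest Hamming distance first
    return [i for _, i in dists[:top_n]]
```

Because each comparison is a single XOR plus a popcount, screening scales to very large libraries while the binary codes keep memory costs low.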
http://arxiv.org/abs/2407.19789v1
Compressor summary: The paper introduces Causal Effect Map (CEM), a method to interpret and diagnose low-level vision models using causality theory, and demonstrates several interesting insights gained from applying it to various tasks.
http://arxiv.org/abs/2407.19787v1
Compressor summary: The study presents SciPostLayout, a dataset for scientific poster generation and analysis from papers, which challenges existing computer vision models and shows the potential of LLM for poster creation.
http://arxiv.org/abs/2407.19784v1
Compressor summary: The text discusses the importance of data-centric AI in training efficient transformer-based models, especially for time series forecasting, and reviews previous research on this topic using a proposed taxonomy.
http://arxiv.org/abs/2407.19779v1
Compressor summary: The paper presents a hybrid summarization technique for research papers that combines unsupervised extractive models with transformer-based abstractive models to capture a paper's key findings and motivation, reaching human-level abstractiveness with certain hyperparameter combinations.
http://arxiv.org/abs/2407.19778v1
Compressor summary: Multimodal Large Language Models can help extract complex information from biological data and aid human researchers in understanding and analyzing the biological world.
http://arxiv.org/abs/2407.19777v1
Compressor summary: PAC learning is a classic model for studying supervised learning, and this paper shows that ERM, a common learning algorithm, is sub-optimal and proposes a better one with novel ideas.
http://arxiv.org/abs/2407.19774v1
Compressor summary: The paper proposes a novel approach to synthesize realistic garment animations from body motion sequences using a neural radiance field, capturing detailed features and enabling recoloring.
http://arxiv.org/abs/2407.19772v1
Compressor summary: The authors propose a method to create benchmark variations that generalize across coding tasks and languages, mitigating leaking into training data issue for evaluating large language models (LLMs) in coding-related tasks.
http://arxiv.org/abs/2407.19768v1
Compressor summary: The paper's method improves face super-resolution by using the wavelet transform to decompose and enhance features and a full-domain Transformer to extract facial information efficiently.
http://arxiv.org/abs/2407.19765v1
Compressor summary: Map2Traj is a novel method that generates realistic user trajectories from street maps with a diffusion model, overcoming the privacy and data limitations of existing mobility models and showing strong zero-shot trajectory generation with potential for wireless network analysis and optimization.
http://arxiv.org/abs/2407.19760v1
Compressor summary: GPT-4 tends to support liberal views on bioethics issues, possibly due to data bias, raising concerns about its use in legal decision-making.
http://arxiv.org/abs/2407.19753v1
Compressor summary: PredIN is an ensemble learning approach that improves open-set gesture recognition by enhancing diversity among class feature distributions and optimizing inter-class separability in sEMG.
http://arxiv.org/abs/2407.19752v1
Compressor summary: The paper proposes a new method for discovering categories in unlabeled data using contextual information from both instance and cluster levels, which improves classification accuracy compared to existing techniques.
http://arxiv.org/abs/2407.19746v1
Compressor summary: Octave-YOLO is a real-time object detection model that processes high-resolution images using a cross frequency partial network (CFPNet) to efficiently divide and operate on feature maps, achieving comparable performance to YOLOv8 with reduced computational demands.
http://arxiv.org/abs/2407.19740v1
Compressor summary: The paper introduces a two-stage pipeline for dialogical argument mining, achieving good results and ranking high in the DialAM-2024 shared task.
http://arxiv.org/abs/2407.19736v1
Compressor summary: The paper proposes a new sensor selection method using deep generative models that optimizes quality of service, outperforms existing methods, and handles multiple objectives.
http://arxiv.org/abs/2407.19726v1
Compressor summary: The paper evaluates existing benchmark datasets for code generation of visualisations and finds a gap in their representativeness, suggesting the need for new benchmarks to better support users' needs.
http://arxiv.org/abs/2407.19724v1
Compressor summary: Deep AndersoNN accelerates AI by using a single implicit layer and iterative solvers, achieving significant speed-ups and improving accuracy in computational life and materials science tasks.
http://arxiv.org/abs/2407.19719v1
Compressor summary: The text describes a new method using multimodal large language models and image features to automatically evaluate urban safety, which could help city decision-makers and researchers.
http://arxiv.org/abs/2407.19705v1
Compressor summary: The study shows how using diverse and well-distributed datasets can improve the performance of smaller language models in medical scenarios, even without increasing their size.
http://arxiv.org/abs/2407.19698v1
Compressor summary: The paper proposes a new VAD method that pays attention to contextual information for accurate classification of actor actions in videos, using class-dedicated queries and reducing bias toward actor regions.
http://arxiv.org/abs/2407.19697v1
Compressor summary: The paper proposes a novel framework that uses multiscale representation learning and temporal flow fusion to capture long-term and near-term workload patterns for accurate forecasting in cloud computing systems.
http://arxiv.org/abs/2407.19696v1
Compressor summary: The paper introduces a novel feature pyramid network for small object detection in aerial images, with two attention blocks and cross-layer contextual information.
http://arxiv.org/abs/2407.19694v1
Compressor summary: Guided-DetNet is a novel approach that uses generative attention, hierarchical elimination, and volumetric contour visual assessment to improve structural damage detection and classification.
http://arxiv.org/abs/2407.19688v1
Compressor summary: The authors propose a causal interventional prediction system (CIPS) to improve the robustness and explainability of AI-based forecasting systems by using a variational autoencoder and multiple imputations.
http://arxiv.org/abs/2407.19683v1
Compressor summary: The text discusses the importance of post-hoc interpretability methods in XAI, the limitations of current evaluation strategies, and proposes an approach with new metrics for a fine-grained assessment of these methods.
http://arxiv.org/abs/2407.19679v1
Compressor summary: Large models can help farmers solve various challenges in agriculture by providing multimodal information and improving production efficiency.
http://arxiv.org/abs/2407.19675v1
Compressor summary: The paper proposes a semi-supervised method for action quality assessment (AQA) using a teacher-reference-student architecture, which leverages unlabeled data and pseudo-labels to improve performance.
http://arxiv.org/abs/2407.19674v1
Compressor summary: The paper proposes a new method for adapting visual-language models using an external layer and novel techniques to align textual and visual features, improving performance on various tasks.
http://arxiv.org/abs/2407.19672v1
Compressor summary: SeaLLMs 3 is a large language model tailored for Southeast Asian languages, improving performance and reducing costs while prioritizing safety and inclusivity.
http://arxiv.org/abs/2407.19670v1
Compressor summary: The paper introduces a shared task on perspective argument retrieval that considers demographic and socio-cultural factors, evaluates six systems, and discusses challenges and biases in incorporating perspectives.
http://arxiv.org/abs/2407.19669v1
Compressor summary: The paper presents efficient multilingual text representation models and rerankers for long-context retrieval, built on a base-sized encoder enhanced with RoPE and unpadding, pre-trained with longer context and trained with contrastive learning, matching or beating large-sized models while remaining more efficient during training and inference.
http://arxiv.org/abs/2407.19667v1
Compressor summary: Key points:
- The paper aims to improve travel planning using large language models (LLMs) in the sole-planning mode
- It proposes a semi-automated prompt generation framework that combines LLM and human input
- It shows that human feedback boosts LLM performance significantly

Summary: The paper presents a method to enhance travel planning with LLMs by generating prompts that incorporate human input, which improves output quality by 139%.
http://arxiv.org/abs/2407.19666v1
Compressor summary: The paper proposes a two-stage visual reasoning framework that separates symbolization and logical reasoning, leading to better generalization ability on various tasks.
http://arxiv.org/abs/2407.19664v1
Compressor summary: The authors propose an adaptive soft error protection strategy for deep learning systems that adjusts protection based on input complexity, reducing protection costs by 46.9% on average while maintaining reliability.
http://arxiv.org/abs/2407.19663v1
Compressor summary: The paper presents a new model for predicting solar energy output during foggy winters that combines entropy measures, clustering, and a modified retention network, improving accuracy over existing methods.
http://arxiv.org/abs/2407.19660v1
Compressor summary: The paper proposes a new foundation model for remote sensing geoscience applications that uses multi-modal data (spectral imagery and weather) and variable step forecasting as its pretraining task, which improves embeddings for downstream applications like crop mapping.
http://arxiv.org/abs/2407.19655v1
Compressor summary: The text discusses how artificial intelligence is improving healthcare services but also creating ethical challenges related to biases in data and algorithms that can affect different demographic groups.
http://arxiv.org/abs/2407.19652v1
Compressor summary: The paper evaluates 3D wound reconstruction methods from consumer-grade videos using a new dataset and finds that neural rendering approaches are promising for clinical wound assessment.
http://arxiv.org/abs/2407.19651v1
Compressor summary: Key points:
- First study of adapting compressed image latents for MLLM-based vision tasks
- Novel framework with a lightweight transform-neck and a surrogate loss
- Generic and applicable to different scenarios and MLLMs
- Excludes downstream MLLMs from training the transform-neck
- Achieves good performance with low complexity

Summary: The paper presents a novel framework that adapts compressed image latents for various MLLM-based vision tasks using a lightweight transform-neck and a surrogate loss, without involving downstream MLLMs in training.
http://arxiv.org/abs/2407.19650v1
Compressor summary: The paper proposes a feature selection and aggregation strategy for video object detection that improves accuracy and efficiency, achieving a new record performance on the ImageNet VID dataset.
http://arxiv.org/abs/2407.19646v1
Compressor summary: This paper explores how deep anomaly detection algorithms can be unfair to certain groups in facial imaging data, identifying sources and factors contributing to this unfairness.
http://arxiv.org/abs/2407.19644v1
Compressor summary: The paper introduces Block Expansion and Division (BED), a fast block selection algorithm for unaligned block pruning, along with an efficient inference kernel for mobile devices, reducing the accuracy drop of pruned DNN models while maintaining low latency.
http://arxiv.org/abs/2407.19643v1
Compressor summary: The text describes Prometheus, a chatbot that uses knowledge graphs and large language models to provide personalized recommendations for computer components based on natural language inputs.
http://arxiv.org/abs/2407.19638v1
Compressor summary: This study explores how exposure to causal information and context influence the performance of Large Language Models in identifying cause-effect relationships.
http://arxiv.org/abs/2407.19633v1
Compressor summary: The OptiMUS system uses a large language model to automatically formulate and solve linear programming problems from natural language descriptions, improving efficiency and correctness.
http://arxiv.org/abs/2407.19631v1
Compressor summary: The paper proposes a framework called FaMSeC that helps intelligent machines assess their competencies in completing tasks using self-confidence indicators based on problem-solving statistics and human-interpretable reports.
http://arxiv.org/abs/2407.19630v1
Compressor summary: The authors argue that the language understanding capabilities of large language models are overstated, and propose testing those capabilities by querying what the models understood from given snippets of text.
http://arxiv.org/abs/2407.19628v1
Compressor summary: Text2LiDAR is a novel model that generates high-quality, controllable LiDAR data from text for various scenarios, using an equirectangular transformer architecture and global-to-focused attention mechanism.
http://arxiv.org/abs/2407.19625v1
Compressor summary: The paper proposes a novel approach called LoginMEA for aligning equivalent entities between multi-modal knowledge graphs by fusing local and global interactions in a unified framework.
http://arxiv.org/abs/2407.19619v1
Compressor summary: The paper proposes a Retrieval-Augmented Generation method that uses contextual examples from existing code translations to enhance code translation quality for complex tasks.
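Such retrieval-augmented prompting can be sketched generically: score existing translation pairs against the new source and prepend the best matches as few-shot examples. The corpus, Jaccard scoring, and prompt format below are illustrative assumptions, not the paper's actual method:

```python
def similarity(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens -- a deliberately simple retrieval score."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_prompt(source: str, corpus: list[tuple[str, str]], k: int = 2) -> str:
    """Prepend the k most similar (source, translation) pairs as few-shot examples."""
    examples = sorted(corpus, key=lambda pair: similarity(source, pair[0]), reverse=True)[:k]
    shots = "\n\n".join(f"Source:\n{s}\nTranslation:\n{t}" for s, t in examples)
    return f"{shots}\n\nSource:\n{source}\nTranslation:\n"

# Toy corpus of prior Python -> C++ translations
corpus = [
    ("for i in range(10): print(i)",
     'for (int i = 0; i < 10; i++) printf("%d\\n", i);'),
    ("x = [v * 2 for v in xs]",
     "std::vector<int> x; for (int v : xs) x.push_back(v * 2);"),
]
prompt = build_prompt("for i in range(5): print(i * i)", corpus, k=1)
print(prompt)  # the loop example is retrieved as the single few-shot example
```

In a real system the prompt would be sent to an LLM and the scoring would use embeddings rather than token overlap, but the retrieve-then-prepend structure is the same.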
http://arxiv.org/abs/2407.19617v1
Compressor summary: AgEval is a benchmark to evaluate the performance of multimodal large language models in plant stress phenotyping, showing improvements with few-shot learning and highlighting the need for subject matter expertise.
http://arxiv.org/abs/2407.19616v1
Compressor summary: Topic modeling with NMF uncovers latent themes in text but requires manual topic labeling; the paper proposes automating that labeling with LLMs to improve knowledge management.
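As an illustration of the underlying technique (the general NMF recipe, not this paper's pipeline), a toy factorization whose per-topic top words are the kind of output one would hand to an LLM for automatic labeling:

```python
import numpy as np

# Toy term-count matrix: one row per document, one column per vocabulary word.
vocab = ["solar", "wind", "energy", "neural", "training", "model"]
X = np.array([
    [2, 0, 1, 0, 0, 0],
    [0, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
], dtype=float)

k = 2                                    # number of topics
rng = np.random.default_rng(0)
W = rng.random((X.shape[0], k)) + 0.1    # document-topic weights
H = rng.random((k, X.shape[1])) + 0.1    # topic-term weights
eps = 1e-9
for _ in range(500):                     # Lee-Seung multiplicative updates
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

for t in range(k):
    top = [vocab[i] for i in H[t].argsort()[::-1][:3]]
    print(f"topic {t}: {top}")           # candidate words for an LLM to label
```

The LLM's job in the proposed setting is the last step: turning each topic's top-word list into a human-readable label.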