arxiv compressed, 2024-07-31

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-31 generated by the compressor, my personal LLM-based project.


ThinK: Thinner Key Cache by Query-Driven Pruning

http://arxiv.org/abs/2407.21018v1

Compressor summary: ThinK is a novel query-dependent KV cache pruning method that reduces memory costs by over 20% without compromising LLM performance on long sequences by exploiting the low-rank structure and unbalanced magnitude distribution in attention weights.


Matting by Generation

http://arxiv.org/abs/2407.21017v1

Compressor summary: The paper proposes a new image matting method using latent diffusion models and pre-trained knowledge, which achieves high resolution and detail in the mattes, and outperforms existing methods on three benchmarks.


Add-SD: Rational Generation without Manual Reference

http://arxiv.org/abs/2407.21016v1

Compressor summary: Add-SD is a diffusion model that can insert objects into realistic scenes based on text prompts, improving downstream tasks like object detection in images.


CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning

http://arxiv.org/abs/2407.21011v1

Compressor summary: CLEFT is a new method that combines large pre-trained models and context-based prompts for efficient contrastive learning of language and images in medical applications, achieving state-of-the-art results on chest X-ray and mammography datasets with reduced model and resource requirements.


AI-Assisted Generation of Difficult Math Questions

http://arxiv.org/abs/2407.21009v1

Compressor summary: The text proposes a design framework that combines LLMs and human input to generate diverse and challenging math questions for training mathematical reasoning.


Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection

http://arxiv.org/abs/2407.21004v1

Compressor summary: Evolver is a system that uses large multimodal models and chain-of-evolution prompting to detect hateful memes by simulating their evolving process and extracting relevant information from similar memes.


XHand: Real-time Expressive Hand Avatar

http://arxiv.org/abs/2407.21002v1

Compressor summary: The paper introduces XHand, a method to create expressive and photo-realistic hand avatars in real-time for extended reality and gaming applications.


GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models

http://arxiv.org/abs/2407.21001v1

Compressor summary: VLMs are biased towards associating activities with the expected gender due to ingrained stereotypes and sample selection bias, leading to a 13.2% performance drop in complex scenarios.


MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning

http://arxiv.org/abs/2407.20999v1

Compressor summary: MoFO is a new fine-tuning algorithm for large language models that selects and updates parameters with the largest momentum magnitudes to prevent knowledge forgetting without accessing pre-training data or altering the original loss function.
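
The selection idea in this summary can be sketched in a few lines. This is only an illustration, not the paper's implementation: the flat parameter list, the plain momentum update, and the names `momentum_filtered_step` and `keep_fraction` are assumptions made for the example.

```python
def momentum_filtered_step(params, grads, momentum, lr=0.1, beta=0.9, keep_fraction=0.5):
    """One illustrative step: refresh momentum, then update only the entries
    whose momentum magnitude is among the largest (hypothetical helper)."""
    # Exponential moving average of the gradients.
    new_momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grads)]
    # Number of entries allowed to move this step (at least one).
    k = max(1, int(len(params) * keep_fraction))
    # Indices of the k entries with the largest momentum magnitude.
    ranked = sorted(range(len(params)), key=lambda i: abs(new_momentum[i]), reverse=True)
    selected = set(ranked[:k])
    # Selected entries take a momentum step; all others keep their current value.
    new_params = [
        p - lr * new_momentum[i] if i in selected else p
        for i, p in enumerate(params)
    ]
    return new_params, new_momentum
```

Entries that are filtered out stay exactly at their previous values, which is the intuition for staying close to the pre-trained weights.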


From Feature Importance to Natural Language Explanations Using LLMs with RAG

http://arxiv.org/abs/2407.20990v1

Compressor summary: The authors propose traceable question-answering, which uses an external knowledge repository to help large language models explain their predictions in a scene understanding task, employing counterfactual reasoning and social science insights for better human explanations.


PIXELMOD: Improving Soft Moderation of Visual Misleading Information on Twitter

http://arxiv.org/abs/2407.20987v1

Compressor summary: The paper introduces PIXELMOD, a system that uses perceptual hashes, vector databases, and OCR to efficiently identify visually misleading images on Twitter for soft moderation.


MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

http://arxiv.org/abs/2407.20962v1

Compressor summary: MMTrail is a large multi-modality video-language dataset with diverse topics and custom background music, designed to facilitate cross-modality studies and train advanced language models.


An Effective Dynamic Gradient Calibration Method for Continual Learning

http://arxiv.org/abs/2407.20956v1

Compressor summary: Key points:
- Continual learning (CL) faces the challenge of catastrophic forgetting due to limited memory
- The paper proposes an algorithm that calibrates the gradient to guide the model in the right direction
- The approach can be combined with other CL methods and is evaluated on benchmark datasets
Summary: The paper presents a gradient-based method for continual learning that reduces catastrophic forgetting and improves performance by adjusting the gradient direction.


An evidence-based methodology for human rights impact assessment (HRIA) in the development of AI data-intensive systems

http://arxiv.org/abs/2407.20951v1

Compressor summary: This text discusses a third way to regulate AI using human rights, presenting a methodology and model for Human Rights Impact Assessment (HRIA) based on empirical analysis and case studies.


dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple Humans

http://arxiv.org/abs/2407.20950v1

Compressor summary: The authors present dopanim, a new benchmark dataset for multi-annotator learning research, which contains challenging animal images with human-estimated likelihoods and annotator metadata.


UniProcessor: A Text-induced Unified Low-level Image Processor

http://arxiv.org/abs/2407.20928v1

Compressor summary: The paper introduces UniProcessor, a text-induced unified image processor for low-level vision tasks that can effectively process various degradation types and levels using multimodal control.


SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition

http://arxiv.org/abs/2407.20920v1

Compressor summary: SSPA is a new framework for multi-label image recognition that uses in-context learning, split-and-synthesize prompting, and gated dual-modal alignments to improve performance and generalizability.


The Realizability of Revision and Contraction Operators in Epistemic Spaces

http://arxiv.org/abs/2407.20918v1

Compressor summary: The paper investigates when AGM belief revision and contraction can be implemented in epistemic spaces and finds that they require precise epistemic spaces and linear change operators.


How to Choose a Reinforcement-Learning Algorithm

http://arxiv.org/abs/2407.20917v1

Compressor summary: The paper presents a guide to help select reinforcement learning algorithms and action-distribution families for sequential decision-making problems, with an interactive online version available.


What Are Good Positional Encodings for Directed Graphs?

http://arxiv.org/abs/2407.20912v1

Compressor summary: This paper proposes a new positional encoding method for directed graphs called Multi-q Magnetic Laplacian PE, which can better capture spatial relations and outperforms existing methods on various tasks.


Enabling Contextual Soft Moderation on Social Media through Contrastive Textual Deviation

http://arxiv.org/abs/2407.20910v1

Compressor summary: The paper proposes a new stance detection method for automated soft-moderation systems that reduces contextual false positives and improves the accuracy of warnings on social media content.


Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

http://arxiv.org/abs/2407.20908v1

Compressor summary: DynaVol-S is a 3D generative model that learns object-centric representations from unsupervised videos by voxelizing scenes and integrating semantic features for improved novel view synthesis and scene decomposition.


Automated Review Generation Method Based on Large Language Models

http://arxiv.org/abs/2407.20906v1

Compressor summary: Key points:
- The method uses Large Language Models (LLMs) to generate comprehensive reviews from scientific articles automatically.
- It can process a large number of articles quickly, providing deep insights into catalysts' composition, structure, and performance.
- It has a quality control strategy to ensure reliability and minimize hallucination risks.
- It includes expert verification and a user-friendly Windows application.
Summary: The authors propose an automated review generation method based on LLMs that can rapidly analyze thousands of articles on catalysts, providing valuable insights while ensuring quality and reliability.


Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach

http://arxiv.org/abs/2407.20899v1

Compressor summary: The paper proposes a natural language explanation method for image classification that uses influential neurons and activation maps to generate accurate and accessible explanations without affecting the classifier's performance.


MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network

http://arxiv.org/abs/2407.20893v1

Compressor summary: MambaCapsule is a deep neural network that improves the explainability and accuracy of ECG arrhythmia classification by using Mamba for feature extraction and Capsule networks for prediction, while mimicking human brain processing.


What is YOLOv5: A deep look into the internal features of the popular object detector

http://arxiv.org/abs/2407.20892v1

Compressor summary: The study analyzes the YOLOv5 object detection model, detailing its architecture, training methods, performance, and transition to PyTorch, showing its advantages for edge devices.


Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks

http://arxiv.org/abs/2407.20891v1

Compressor summary: Bayesian Low-Rank LeArning (Bella) is a framework that reduces the computational complexity of Bayesian neural networks, enabling their use in large-scale tasks and achieving comparable or better performance than conventional methods.


Effective Black Box Testing of Sentiment Analysis Classification Networks

http://arxiv.org/abs/2407.20884v1

Compressor summary: Key points:
- The paper proposes coverage criteria for testing transformer-based sentiment analysis networks
- The approach uses input space partitioning and a k-projection metric to generate tests with emotional features
- The experiments show increased test coverage and decreased model accuracy, indicating vulnerabilities
Summary: The paper presents a method to test the dependability of transformer-based sentiment analysis networks using input space partitioning and a k-projection metric; the generated tests increase coverage and lower model accuracy, exposing vulnerabilities.


A Scalable Tool For Analyzing Genomic Variants Of Humans Using Knowledge Graphs and Machine Learning

http://arxiv.org/abs/2407.20879v1

Compressor summary: VariantKG is a tool that uses knowledge graphs and graph machine learning to analyze COVID-19 patient genomic data, helping understand complex genetic relationships at the RNA level.


Automatic Die Studies for Ancient Numismatics

http://arxiv.org/abs/2407.20876v1

Compressor summary: The paper presents a new method for automatically identifying ancient coins from images using computer vision techniques, which improves on existing approaches and requires less manual work.


Co-Neighbor Encoding Schema: A Light-cost Structure Encoding Method for Dynamic Link Prediction

http://arxiv.org/abs/2407.20871v1

Compressor summary: CNES is a memory-efficient technique that stores structure encoding information for evolving temporal graphs, enabling parallel vector computation and long-term/short-term neighbor structural learning.


Mean of Means: A 10-dollar Solution for Human Localization with Calibration-free and Unconstrained Camera Settings

http://arxiv.org/abs/2407.20870v1

Compressor summary: The proposed probabilistic approach improves human localization accuracy in the Metaverse era using cheap webcams and without requiring expensive hardware or strict setup constraints.


Assessing Graphical Perception of Image Embedding Models using Channel Effectiveness

http://arxiv.org/abs/2407.20845v1

Compressor summary: The paper introduces a new framework to evaluate how well vision models understand charts by measuring channel accuracy and discriminability of image embeddings.


DFE-IANet: A Method for Polyp Image Classification Based on Dual-domain Feature Extraction and Interaction Attention

http://arxiv.org/abs/2407.20843v1

Compressor summary: The paper introduces DFE-IANet, a novel network that uses spectral transformation and feature interaction to detect polyps in the gastrointestinal tract with high efficiency and accuracy, achieving state-of-the-art results on the Kvasir dataset.


Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

http://arxiv.org/abs/2407.20836v1

Compressor summary: This paper investigates the robustness of AI-generated image detectors against adversarial attacks, proposing a new frequency-based post-train Bayesian attack method that can bypass them.


How to Measure the Intelligence of Large Language Models?

http://arxiv.org/abs/2407.20828v1

Compressor summary: The text discusses the capabilities, limitations, and trustworthiness of large language models (LLMs), suggesting that their intelligence should be assessed using both quantitative and qualitative measures.


DyGKT: Dynamic Graph Learning for Knowledge Tracing

http://arxiv.org/abs/2407.20824v1

Compressor summary: The DyGKT model uses a dynamic graph to track students' learning states based on their question-answering behaviors, time intervals, and evolving relationships with questions and concepts.


Adding Circumscription to Decidable Fragments of First-Order Logic: A Complexity Rollercoaster

http://arxiv.org/abs/2407.20822v1

Compressor summary: The paper investigates the expressive power and decidability of circumscription in fragments of first-order logic, and shows that minimizing unary predicates preserves decidability while increasing complexity.


WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection

http://arxiv.org/abs/2407.20818v1

Compressor summary: The study introduces a synthetic dataset and a framework called WARM-3D that improves roadside monocular 3D detection using weak supervision from 2D labels and enhances performance across different real-world environments.


Robust Load Prediction of Power Network Clusters Based on Cloud-Model-Improved Transformer

http://arxiv.org/abs/2407.20817v1

Compressor summary: The Cloud Model Improved Transformer (CMIT) method combines a leading load prediction model with a cloud model and particle swarm optimization to achieve more accurate power load forecasts in power network clusters.


ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning

http://arxiv.org/abs/2407.20806v1

Compressor summary: ARCLE is a tool to help researchers study reinforcement learning on a challenging inductive reasoning benchmark called ARC.


SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting

http://arxiv.org/abs/2407.20799v1

Compressor summary: Key points:
- The paper proposes a framework for facial expression spotting using sliding windows, multi-resolution optical flow, and a spatio-temporal Transformer
- The method can handle subtle motions and complete micro-expressions, as well as general macro- and micro-expressions
- The method uses supervised contrastive learning to enhance expression discrimination
Summary: The paper presents a framework that uses sliding windows, multi-resolution optical flow, and a spatio-temporal Transformer to spot facial expressions accurately, especially micro-expressions, by applying supervised contrastive learning.


Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

http://arxiv.org/abs/2407.20798v1

Compressor summary: The text introduces a novel framework called Diffusion Augmented Agents (DAAG) that uses large language models, vision language models, and diffusion models to enhance reinforcement learning for embodied agents in simulated robotics environments.


How Novice Programmers Use and Experience ChatGPT when Solving Programming Exercises in an Introductory Course

http://arxiv.org/abs/2407.20792v1

Compressor summary: This paper investigates how students use ChatGPT for introductory programming tasks and their perceptions of the tool.


Be aware of overfitting by hyperparameter optimization!

http://arxiv.org/abs/2407.20786v1

Compressor summary: Key points:
- Hyperparameter optimization can cause overfitting in solubility prediction with graph-based methods
- Transformer CNN, an NLP method based on SMILES, performed better than graph-based methods in most cases
- Using pre-set hyperparameters reduced computational effort significantly
- Consistent statistical measures are important for comparison
Summary: The authors compared different methods for solubility prediction and found that overfitting was an issue with hyperparameter optimization of graph-based methods. They showed that an NLP method based on SMILES outperformed them in most cases, while using pre-set hyperparameters saved time and consistent statistical measures ensured fair comparison.


Metaheuristic Enhanced with Feature-Based Guidance and Diversity Management for Solving the Capacitated Vehicle Routing Problem

http://arxiv.org/abs/2407.20777v1

Compressor summary: The paper presents a new metaheuristic algorithm for the Capacitated Vehicle Routing Problem that uses feature-based guidance from a Machine Learning model and outperforms existing methods.


Interpretable Pre-Trained Transformers for Heart Time-Series Data

http://arxiv.org/abs/2407.20775v1

Compressor summary: The authors create interpretable pre-trained cardiac models for ECG and PPG data using decoder-only transformers and show their fine-tuning effectiveness for classifying atrial fibrillation.


HyperMM: Robust Multimodal Learning with Varying-sized Inputs

http://arxiv.org/abs/2407.20768v1

Compressor summary: HyperMM is an end-to-end framework for multimodal learning with incomplete imaging data, which uses a conditional hypernetwork and a permutation-invariant neural network to process variable-sized inputs without imputation.


OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance

http://arxiv.org/abs/2407.20761v1

Compressor summary: The paper proposes an omniverse balanced training framework to improve the efficiency of vision-language instruct-tuning models by rebalancing the computation load across devices.


SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models

http://arxiv.org/abs/2407.20756v1

Compressor summary: SynthVLM is a novel data synthesis pipeline for Vision Large Language Models that generates and selects high-resolution images from captions, achieving SoTA performance on vision question answering tasks with reduced computational overhead and privacy concerns.


Re-localization acceleration with Medoid Silhouette Clustering

http://arxiv.org/abs/2407.20749v1

Compressor summary: The paper proposes a new tree-like search method that significantly speeds up visual re-localization without sacrificing accuracy.


Meltemi: The first open Large Language Model for Greek

http://arxiv.org/abs/2407.20743v1

Compressor summary: Meltemi 7B is an open large language model for Greek, trained on a huge corpus and adapted from Mistral, with an instruction-tuned version available.


Improving PINNs By Algebraic Inclusion of Boundary and Initial Conditions

http://arxiv.org/abs/2407.20741v1

Compressor summary: The paper proposes a method to improve the stability and performance of Physics-Informed Neural Networks by incorporating boundary and initial conditions algebraically, leading to a significant reduction in fractional errors.


Efficient Pareto Manifold Learning with Low-Rank Structure

http://arxiv.org/abs/2407.20734v1

Compressor summary: The paper proposes a new method for multi-task learning that reduces parameters, extracts shared features, and improves performance, especially on large datasets.


Autogenic Language Embedding for Coherent Point Tracking

http://arxiv.org/abs/2407.20730v1

Compressor summary: The paper proposes a novel method that uses language embeddings to improve point tracking in long videos by enhancing visual feature coherence and semantic consistency.


Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework

http://arxiv.org/abs/2407.20729v1

Compressor summary: The paper presents a novel safe-for-work text classifier for the Malaysian language to ensure responsible deployment of large language models.


SceneTeller: Language-to-3D Scene Generation

http://arxiv.org/abs/2407.20727v1

Compressor summary: The paper introduces a text-based approach for creating high-quality 3D room scenes using generative AI and natural language prompts, making it accessible for non-experts.


Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection

http://arxiv.org/abs/2407.20708v1

Compressor summary: The authors propose SpikeYOLO, a spiking neural network-based object detection method that simplifies YOLO and uses a new spiking neuron design to improve performance and energy efficiency over previous methods.


Industrial-Grade Smart Troubleshooting through Causal Technical Language Processing: a Proof of Concept

http://arxiv.org/abs/2407.20700v1

Compressor summary: The paper introduces a method that uses a large language model and industrial knowledge to diagnose industrial problems from Return on Experience records.


Time Series Anomaly Detection with CNN for Environmental Sensors in Healthcare-IoT

http://arxiv.org/abs/2407.20695v1

Compressor summary: The research proposes a new method to identify DDoS attacks in healthcare-IoT using CNNs that analyze environmental sensor data with high accuracy.


Detecting Causality in the Frequency Domain with Cross-Mapping Coherence

http://arxiv.org/abs/2407.20694v1

Compressor summary: The Cross-Mapping Coherence method is a new technique that uses time-series data to discover causal connections in complex, nonlinear systems, and it outperforms existing methods in accuracy, sensitivity, and robustness.


Boosting Audio Visual Question Answering via Key Semantic-Aware Cues

http://arxiv.org/abs/2407.20693v1

Compressor summary: The Temporal-Spatial Perception Model (TSPM) is a framework for answering complex questions related to multimodal videos by perceiving key audio-visual cues using declarative sentence prompts and cross-modal interaction.


The Susceptibility of Example-Based Explainability Methods to Class Outliers

http://arxiv.org/abs/2407.20678v1

Compressor summary: The study evaluates example-based explainability methods for black-box machine learning models, showing their limitations when dealing with class outliers and suggesting the need for better solutions.


Label-Guided Prompt for Multi-label Few-shot Aspect Category Detection

http://arxiv.org/abs/2407.20673v1

Compressor summary: The proposed label-guided prompt method improves aspect category detection by representing sentences and categories using contextual and semantic information from large language models.


Mimicking the Mavens: Agent-based Opinion Synthesis and Emotion Prediction for Social Media Influencers

http://arxiv.org/abs/2407.20668v1

Compressor summary: The study presents a novel computational framework to predict opinion leaders' views and public emotions on social media using an automatic 5W1H formulation engine and enhanced LLM-based agents, achieving high fidelity and accuracy in predicting sentiments about the Russia-Ukraine War.


Rethinking the Function of Neurons in KANs

http://arxiv.org/abs/2407.20667v1

Compressor summary: This paper explores replacing the sum operation in Kolmogorov-Arnold Networks with the average function to improve performance on machine learning tasks.
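
A small sketch of the change being described, under loud assumptions: a KAN neuron is modelled here as an aggregation of one univariate function per input edge, the fixed lambdas stand in for the learnable edge functions, and `kan_neuron` is a hypothetical name rather than anything from the paper's code.

```python
def kan_neuron(inputs, edge_functions, aggregate="sum"):
    """Apply one univariate function per input edge, then aggregate the results.

    aggregate="sum" mirrors the usual KAN neuron; aggregate="mean" is the
    averaging variant the summary refers to.
    """
    activations = [phi(x) for phi, x in zip(edge_functions, inputs)]
    total = sum(activations)
    if aggregate == "sum":
        return total
    if aggregate == "mean":
        # Dividing by the fan-in keeps the output scale independent of input count.
        return total / len(activations)
    raise ValueError(f"unknown aggregate: {aggregate!r}")


# Fixed stand-ins for the learnable edge functions (illustrative only).
edges = [lambda x: x, lambda x: x * x, lambda x: -x, lambda x: 2.0 * x]
summed = kan_neuron([1.0, 2.0, 3.0, 4.0], edges, aggregate="sum")
averaged = kan_neuron([1.0, 2.0, 3.0, 4.0], edges, aggregate="mean")
```

With four edges the averaged output is the summed output divided by four, so its scale does not grow with the number of inputs.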


ArabicNLU 2024: The First Arabic Natural Language Understanding Shared Task

http://arxiv.org/abs/2407.20663v1

Compressor summary: The paper describes an Arabic Natural Language Understanding shared task that evaluates systems for resolving word ambiguity and identifying locations in Arabic text using two subtasks and novel datasets.


DocXPand-25k: a large and diverse benchmark dataset for identity documents analysis

http://arxiv.org/abs/2407.20662v1

Compressor summary: The DocXPand-25k dataset provides 25,000 synthetic identity document images with diverse backgrounds and features for ID analysis research.


What makes for good morphology representations for spatial omics?

http://arxiv.org/abs/2407.20660v1

Compressor summary: Spatial omics and imaging AI can help understand tissue architecture by combining gene expression patterns with morphological features, either by translating them to predict gene expression or integrating them to enrich information.


Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks

http://arxiv.org/abs/2407.20657v1

Compressor summary: The paper proposes PDCL-Attack, a method that uses CLIP to generate transferable adversarial examples by guiding a generative model with text prompts based on image labels.


Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian

http://arxiv.org/abs/2407.20654v1

Compressor summary: The paper explores using smaller, domain-specific encoder language models for Italian bureaucratic and legal tasks, improving performance with prompting techniques and calibration methods.


FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks

http://arxiv.org/abs/2407.20653v1

Compressor summary: The paper proposes a feature contrastive approach in the frequency domain to generate robust adversarial examples that work across different domains and models.


Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations

http://arxiv.org/abs/2407.20651v1

Compressor summary: The paper introduces CSR, a causality-guided self-adaptive representation method for reinforcement learning agents to generalize across tasks with changing dynamics, distribution shifts, and environment variations.


No learning rates needed: Introducing SALSA -- Stable Armijo Line Search Adaptation

http://arxiv.org/abs/2407.20650v1

Compressor summary: The paper proposes and evaluates improved line search methods that enhance the performance of stochastic gradient descent techniques for various architectures and data domains.
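
For context, the classic backtracking Armijo rule that the title refers to can be sketched as below. This is the textbook procedure, not SALSA itself; the function name and the default values for `c` and `shrink` are assumptions chosen for the example.

```python
def armijo_step_size(f, x, grad, initial_step=1.0, c=0.1, shrink=0.5, max_halvings=50):
    """Backtracking line search along the negative gradient.

    Shrinks the step until f(x - t * grad) <= f(x) - c * t * ||grad||^2.
    """
    fx = f(x)
    grad_norm_sq = sum(g * g for g in grad)
    t = initial_step
    for _ in range(max_halvings):
        candidate = [xi - t * gi for xi, gi in zip(x, grad)]
        if f(candidate) <= fx - c * t * grad_norm_sq:
            return t  # sufficient decrease reached
        t *= shrink
    return t  # fall back to the smallest step tried


def quadratic(v):
    return sum(vi * vi for vi in v)


# At x = 3 the gradient of x^2 is 6; a full step overshoots, half a step is accepted.
step = armijo_step_size(quadratic, [3.0], [6.0])
```

The accepted step replaces a hand-tuned learning rate for that update, which is the sense in which line search methods remove the need to choose one.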


Leveraging Multi-facet Paths for Heterogeneous Graph Representation Learning

http://arxiv.org/abs/2407.20648v1

Compressor summary: MF2Vec is a model that uses random walks to generate multi-faceted paths, improving node embeddings and relationships in complex networks for various tasks.


Image Re-Identification: Where Self-supervision Meets Vision-Language Learning

http://arxiv.org/abs/2407.20647v1

Compressor summary: The paper proposes a new image re-identification method, SVLL-ReID, that uses self-supervision to improve the performance of CLIP, a large-scale vision-language pre-trained model.


Generalizing AI-driven Assessment of Immunohistochemistry across Immunostains and Cancer Types: A Universal Immunohistochemistry Analyzer

http://arxiv.org/abs/2407.20643v1

Compressor summary: The Universal IHC (UIHC) analyzer is an AI model that can interpret various types of cancer and IHC images, improving objective assessment and potentially enabling personalized medicine.


Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos

http://arxiv.org/abs/2407.20642v1

Compressor summary: ClipSitu is a multimodal model that uses CLIP embeddings to predict nouns in different roles for given verbs, achieving state-of-the-art results in situation recognition and localization tasks.


Improved Bounds for Pure Private Agnostic Learning: Item-Level and User-Level Privacy

http://arxiv.org/abs/2407.20640v1

Compressor summary: The paper studies pure private learning in the agnostic model, improving upper bounds for item-level and user-level privacy and presenting an efficient algorithm for learning thresholds.


Spiking-DD: Neuromorphic Event Camera based Driver Distraction Detection with Spiking Neural Network

http://arxiv.org/abs/2407.20633v1

Compressor summary: This paper proposes a new approach to detect driver distraction using event cameras and spiking neural networks, which offers better performance, privacy, and efficiency than existing methods.


SharkTrack: an accurate, generalisable software for streamlining shark and ray underwater video analysis

http://arxiv.org/abs/2407.20623v1

Compressor summary: SharkTrack is an AI-enhanced software that detects, tracks, and counts sharks and rays using BRUVS footage, reducing analysis time from hours to minutes.


Decoding Linguistic Representations of Human Brain

http://arxiv.org/abs/2407.20622v1

Compressor summary: This paper proposes a taxonomy of brain-to-language decoding methods that could potentially help people with limited articulation and advance brain-computer interface research.


The Entrapment Problem in Random Walk Decentralized Learning

http://arxiv.org/abs/2407.20611v1

Compressor summary: The paper proposes a decentralized SGD algorithm with Lévy jumps to speed up convergence in graph-based distributed learning, overcoming the entrapment problem.


Investigating Sparsity in Recurrent Neural Networks

http://arxiv.org/abs/2407.20601v1

Compressor summary: The text investigates how pruning and random graphs can improve the performance of Recurrent Neural Networks (RNNs) by making their architecture sparse.


Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning

http://arxiv.org/abs/2407.20600v1

Compressor summary: The paper proposes a new deep metric learning method that fuses hierarchical knowledge about image classes to improve image recognition using a triplet loss function.
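
For readers unfamiliar with the loss mentioned here, the standard triplet loss is sketched below. It is the generic form only; how the paper folds hierarchical knowledge into it is not shown, and the margin value and function names are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: the anchor should be closer to the positive
    than to the negative by at least `margin`, otherwise a penalty applies."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# The positive is far from the anchor and the negative is close, so the loss is large.
violated = triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0))
# The positive is close and the negative is far, so the loss vanishes.
satisfied = triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0))
```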


Joint Diffusion Processes as an Inductive Bias in Sheaf Neural Networks

http://arxiv.org/abs/2407.20597v1

Compressor summary: The authors propose two new methods to learn sheaf structures in Sheaf Neural Networks that improve performance, are intuitive, and use fewer parameters than existing methods.


EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos

http://arxiv.org/abs/2407.20592v1

Compressor summary: EgoSonics generates high-quality and synchronized audio for silent egocentric videos using latent diffusion models, enabling new applications in virtual reality and assistive technologies.


Enhancing Agricultural Machinery Management through Advanced LLM Integration

http://arxiv.org/abs/2407.20588v1

Compressor summary: The text proposes a new method using large language models to enhance agricultural machinery management, showing significant improvement over existing approaches.


Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

http://arxiv.org/abs/2407.20584v1

Compressor summary: The Adaptive Sparse Trainer (AST) method compresses large language models by up to 16x with minimal performance loss, reducing the zero-shot accuracy gap between dense and sparse models.


Image-based Detection of Segment Misalignment in Multi-mirror Satellites using Transfer Learning

http://arxiv.org/abs/2407.20582v1

Compressor summary: The paper presents a transfer learning system using image-based methods to detect segment misalignment in multi-mirror satellites, achieving high accuracy with binary models and intensity classification.


Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings

http://arxiv.org/abs/2407.20581v1

Compressor summary: Knesset-DictaBERT is a Hebrew language model that improves understanding of parliamentary language using the Knesset Corpus dataset.


Comparison of Large Language Models for Generating Contextually Relevant Questions

http://arxiv.org/abs/2407.20578v1

Compressor summary: The study compares three large language models' ability to generate educational questions from slides and finds GPT-3.5 and Llama 2-Chat 13B slightly better than Flan T5 XXL in clarity and alignment.


Monocular Human-Object Reconstruction in the Wild

http://arxiv.org/abs/2407.20566v1

Compressor summary: The authors propose a method that learns a 3D human-object spatial relation prior from 2D images in the wild, using a flow-based neural network to learn the 2D keypoint layout and viewports distribution, and show improved performance over previous methods on reconstructing human-object interactions in real-world scenarios.


CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

http://arxiv.org/abs/2407.20564v1

Compressor summary: This paper tests large language models' ability to reason logically with general and biomedical knowledge graphs, finding they excel at general knowledge but struggle with specialized domain knowledge and set intersections.


Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering

http://arxiv.org/abs/2407.20563v1

Compressor summary: PyramidCoder is a new framework for visual question answering that uses a single large language model to generate executable programs for complex questions by rephrasing queries, generating code, and aggregating answers in three hierarchical levels.


Invariant deep neural networks under the finite group for solving partial differential equations

http://arxiv.org/abs/2407.20560v1

Compressor summary: The paper proposes a symmetry-enhanced neural network for physics-informed learning that improves accuracy, reduces parameters, and simplifies architecture compared to existing methods.


DiffusionCounterfactuals: Inferring High-dimensional Counterfactuals with Guidance of Causal Representations

http://arxiv.org/abs/2407.20553v1

Compressor summary: The paper proposes a new framework that uses causal representation to generate accurate counterfactual outcomes for complex causal relationships in high-dimensional data.


StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset

http://arxiv.org/abs/2407.20545v1

Compressor summary: The paper proposes a simple and efficient method to encode 3D human-object spatial relations using the Human-Object Offset and a novel Stacked Normalizing Flow (StackFLOW) to infer it from monocular images for 3D human-object interaction perception.


Can LLMs be Fooled? Investigating Vulnerabilities in LLMs

http://arxiv.org/abs/2407.20529v1

Compressor summary: The text discusses the vulnerabilities of large language models in natural language processing, especially in terms of leaking personal data, and proposes mitigation strategies such as model editing and chroma teaming to enhance their security and resilience.


Contrastive Feedback Mechanism for Simultaneous Speech Translation

http://arxiv.org/abs/2407.20524v1

Compressor summary: The text introduces a new method, contrastive feedback mechanism (CFM), for simultaneous speech translation systems to improve quality by using unstable predictions as feedback.


Machine Unlearning in Generative AI: A Survey

http://arxiv.org/abs/2407.20516v1

Compressor summary: The text discusses the problem of undesirable knowledge in generative AI models, and surveys machine unlearning techniques that aim to address it.


Markers Identification for Relative Pose Estimation of an Uncooperative Target

http://arxiv.org/abs/2407.20515v1

Compressor summary: The paper presents a new method using chaser spacecraft image processing and CNNs to detect markers on ENVISAT for safe de-orbiting, with promising results for space sustainability.


Prompt2DeModel: Declarative Neuro-Symbolic Modeling with Natural Language

http://arxiv.org/abs/2407.20513v1

Compressor summary: The paper introduces a method for generating graph-based knowledge representations and neural models from natural language prompts using large language models and domain expert feedback.


Unveiling the Potential of Spiking Dynamics in Graph Representation Learning through Spatial-Temporal Normalization and Coding Strategies

http://arxiv.org/abs/2407.20508v1

Compressor summary: This work proposes a spike-based graph neural network model that uses spiking dynamics and a novel feature normalization technique to improve efficiency and stability in graph representation learning on non-Euclidean data, offering competitive performance with lower computational costs than traditional GNNs.


Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge

http://arxiv.org/abs/2407.20506v1

Compressor summary: Causal exploration is a strategy that uses causal knowledge to improve the efficiency and reliability of world model learning in reinforcement learning by selecting actions that yield useful causal insights.


Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate

http://arxiv.org/abs/2407.20505v1

Compressor summary: The paper proposes a self-reflection scheme and a multi-agent debate approach to reduce hallucinations in MLLMs and interpret their causes, as well as distinguish creativity from hallucination.


Restoring Real-World Degraded Events Improves Deblurring Quality

http://arxiv.org/abs/2407.20502v1

Compressor summary: RDNet improves image deblurring by modeling event degradation and using a new real-world dataset, DavisMCR.


Optimizing Long-tailed Link Prediction in Graph Neural Networks through Structure Representation Enhancement

http://arxiv.org/abs/2407.20499v1

Compressor summary: The text discusses the degree-based long-tailed problem that constrains the efficacy of graph neural networks on link prediction and proposes a framework to improve the performance of tail node pairs by increasing common neighbors.


Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs

http://arxiv.org/abs/2407.20496v1

Compressor summary: Gyro-permutation is a channel rearrangement method for hierarchical N:M sparsity that improves the accuracy of compressed deep neural networks using Sparse Tensor Core technology.


A2SF: Accumulative Attention Scoring with Forgetting Factor for Token Pruning in Transformer Decoder

http://arxiv.org/abs/2407.20485v1

Compressor summary: The paper proposes A2SF, a technique that introduces a Forgetting Factor in the Attention Score accumulation process for large language models, improving accuracy in long sequence handling.


Distribution Learning for Molecular Regression

http://arxiv.org/abs/2407.20475v1

Compressor summary: The paper proposes Distributional Mixture of Experts (DMoE), a model-independent and data-independent regression method that predicts probability distributions of targets, and evaluates its performance on molecular property prediction tasks.


Relaxed Equivariant Graph Neural Networks

http://arxiv.org/abs/2407.20471v1

Compressor summary: Relaxed Euclidean graph equivariant neural networks can learn and represent symmetry breaking in continuous groups by using relaxed weights.