arxiv compressed, 2024-07-26

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-26, generated by the compressor, my personal LLM-based project.


Sparse vs Contiguous Adversarial Pixel Perturbations in Multimodal Models: An Empirical Analysis

http://arxiv.org/abs/2407.18251v1

Compressor summary: The paper studies how robust multimodal models are against different types of adversarial pixel perturbations, finding that unimodal DNNs are more resilient and that models with ViT-based image encoders are more vulnerable.


Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

http://arxiv.org/abs/2407.18248v1

Compressor summary: The paper proposes self-training with Direct Preference Optimization (DPO) to enhance mathematical reasoning in small-scale language models, offering a cheaper and more stable alternative to using large proprietary LMs.
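The DPO preference loss itself is standard (from the original DPO paper); a minimal pure-Python sketch with made-up log-probabilities, not the paper's self-training pipeline:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_*    : policy log-prob of the chosen (w) / rejected (l) response
    ref_logp_*: same quantities under the frozen reference model
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# If the policy prefers the chosen response more than the reference does,
# the loss drops below the chance value -log(0.5) ~ 0.693.
loss = dpo_loss(logp_w=-4.0, logp_l=-6.0, ref_logp_w=-5.0, ref_logp_l=-5.0)
```

In self-training, the preference pairs would come from the model's own sampled solutions (correct vs. incorrect), rather than human annotations.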


RegionDrag: Fast Region-Based Image Editing with Diffusion Models

http://arxiv.org/abs/2407.18247v1

Compressor summary: RegionDrag is a faster and more accurate region-based copy-and-paste dragging method for image editing that overcomes the limitations of point-drag-based approaches.


VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads

http://arxiv.org/abs/2407.18245v1

Compressor summary: VGGHeads is a large synthetic dataset with over 1 million images of human heads annotated with 3D meshes, landmarks, and boxes, used to train models that detect and reconstruct human heads in real images.


LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

http://arxiv.org/abs/2407.18242v1

Compressor summary: LoRA-Pro improves LoRA by using an equivalent gradient to approximate the full fine-tuning optimization process, closing the performance gap on NLP tasks.
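LoRA's core low-rank reparameterization, which LoRA-Pro builds on, can be sketched in numpy; dimensions, scaling, and initialization below are illustrative conventions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 16, 16, 4, 8          # illustrative sizes

W = rng.normal(size=(d, k))            # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01     # trainable low-rank factor
B = np.zeros((d, r))                   # B starts at zero, so the update starts at 0

def forward(x):
    # LoRA adds a scaled low-rank update B @ A to the frozen weight
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=(2, k))
# With B = 0 the adapted layer is exactly the pretrained layer.
```

LoRA-Pro's contribution concerns how the gradients of A and B are adjusted during optimization to track full fine-tuning; that adjustment is not reproduced here.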


Numerical Literals in Link Prediction: A Critical Examination of Models and Datasets

http://arxiv.org/abs/2407.18241v1

Compressor summary: The text discusses the importance of using numerical literals in link prediction over knowledge graphs, proposing a methodology to evaluate their effectiveness with a new synthetic dataset and ablation strategies on existing benchmarks.


LION: Linear Group RNN for 3D Object Detection in Point Clouds

http://arxiv.org/abs/2407.18232v1

Compressor summary: LION is a window-based framework that uses linear group RNN to improve 3D object detection in sparse point clouds by enhancing spatial features and voxel generation.


Automated Ensemble Multimodal Machine Learning for Healthcare

http://arxiv.org/abs/2407.18227v1

Compressor summary: The paper introduces AutoPrognosis-M, a multimodal machine learning framework that combines structured clinical data and medical imaging for diagnosis and prognosis, using various models and fusion strategies.


Recursive Introspection: Teaching Language Model Agents How to Self-Improve

http://arxiv.org/abs/2407.18219v1

Compressor summary: RISE is an approach to fine-tune LLMs to introspect and correct their mistakes sequentially on hard problems using iterative multi-turn strategies inspired by online imitation learning and reinforcement learning.


Exploring Scaling Trends in LLM Robustness

http://arxiv.org/abs/2407.18213v1

Compressor summary: Larger language models are more resistant to adversarial prompts after adversarial training, but model size alone does not improve robustness.


Geometry Fidelity for Spherical Images

http://arxiv.org/abs/2407.18207v1

Compressor summary: Omnidirectional FID (OmniFID) and Discontinuity Score (DS) are new metrics for measuring the geometric accuracy of spherical images, which account for field-of-view and seam alignment constraints not considered by traditional FID.


AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction

http://arxiv.org/abs/2407.18184v1

Compressor summary: The paper introduces a large dataset for antibody-specific epitope prediction and proposes a new method, WALLE, that combines language models and graph neural networks to achieve significant performance improvement.


Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning

http://arxiv.org/abs/2407.18181v1

Compressor summary: The paper presents a novel joint graph learning approach that combines single-cell language models and gene regulatory networks to infer gene regulatory networks from scRNA-seq data, achieving superior performance over existing methods.


PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations

http://arxiv.org/abs/2407.18178v1

Compressor summary: PianoMime is a framework that uses internet piano demonstrations, such as YouTube videos, to train a generalist piano-playing agent capable of playing any arbitrary song.


Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

http://arxiv.org/abs/2407.18175v1

Compressor summary: Quasar-ViT is a framework that designs efficient ViT models for hardware implementation by using quantization-aware architecture search techniques and model-adaptive designs on FPGA platforms.


The FIGNEWS Shared Task on News Media Narratives

http://arxiv.org/abs/2407.18147v1

Compressor summary: The FIGNEWS shared task is a multilingual effort to develop annotation guidelines for bias and propaganda in news posts about the Israel War on Gaza, with 17 teams participating and producing over 129,000 data points.


Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation

http://arxiv.org/abs/2407.18143v1

Compressor summary: The paper proposes a method to implement maximum entropy reinforcement learning (MaxEnt RL) in on-policy settings by separating the entropy objective from the main objective and shows that it improves policy optimisation performance and generalisation in various tasks.
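The paper's exact entropy-advantage estimator is not reproduced here, but the general idea of treating the entropy term as a separate signal added to the usual reward advantage can be illustrated (all numbers hypothetical):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of each action distribution (rows of p)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

# Hypothetical per-step quantities for a 4-action policy over 3 timesteps.
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.25, 0.25, 0.25, 0.25],
                  [0.40, 0.30, 0.20, 0.10]])
reward_advantage = np.array([1.0, -0.5, 0.2])   # from the usual critic
tau = 0.01                                      # entropy temperature

# MaxEnt RL augments the return with tau * H(pi); keeping the entropy
# contribution separate from the main objective means it can be estimated
# and weighted independently before combining.
entropy_bonus = tau * entropy(probs)
augmented_advantage = reward_advantage + entropy_bonus
```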


XS-VID: An Extremely Small Video Object Detection Dataset

http://arxiv.org/abs/2407.18137v1

Compressor summary: The XS-VID dataset provides diverse aerial scenes with small objects for evaluating and improving small video object detection methods, especially for very small objects.


$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

http://arxiv.org/abs/2407.18134v1

Compressor summary: The authors propose a new contrastive loss that encodes how samples relate to others and show that it improves vision model performance across various tasks and data regimes.
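The paper's specific similarity-graph construction is not given here; a generic soft-target contrastive loss in this spirit, of which standard one-hot InfoNCE is a special case, might look like the following (toy embeddings and graphs are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_contrastive_loss(emb, target_graph, temp=0.1):
    """Cross-entropy between the batch similarity distribution and a
    soft target graph encoding how samples relate to one another."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    logits = emb @ emb.T / temp
    np.fill_diagonal(logits, -np.inf)          # exclude self-similarity
    logp = np.log(softmax(logits) + 1e-12)
    target = target_graph / target_graph.sum(axis=1, keepdims=True)
    return -np.mean(np.sum(target * logp, axis=1))

# Samples 0 and 1 point the same way; a graph linking them fits better
# than a graph linking each to the orthogonal sample 2.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
link_near = np.array([[0, 1, 0], [1, 0, 0], [0.5, 0.5, 0]], dtype=float)
link_far  = np.array([[0, 0, 1], [0, 0, 1], [1.0, 0.0, 0]], dtype=float)
```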


Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic

http://arxiv.org/abs/2407.18129v1

Compressor summary: Dallah is a state-of-the-art Arabic multimodal assistant built on an advanced LLaMA-2 language model that handles text and image interactions in Modern Standard Arabic (MSA) and multiple Arabic dialects.


Estimating Earthquake Magnitude in Sentinel-1 Imagery via Ranking

http://arxiv.org/abs/2407.18128v1

Compressor summary: The paper proposes using metric learning to estimate earthquake magnitudes from satellite images, improving accuracy over existing methods.
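A generic pairwise ranking loss of the kind used in metric learning (not necessarily the paper's exact formulation) could look like:

```python
import numpy as np

def pairwise_ranking_loss(scores, magnitudes, margin=1.0):
    """Hinge loss pushing the predicted score of the larger-magnitude
    earthquake above the smaller one by at least `margin`."""
    loss, n = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if magnitudes[i] > magnitudes[j]:
                loss += max(0.0, margin - (scores[i] - scores[j]))
                n += 1
    return loss / max(n, 1)

# Hypothetical model outputs for three events of known magnitude.
mags = np.array([4.2, 5.8, 6.9])
good = np.array([0.1, 2.0, 4.5])   # scores ordered like the magnitudes
bad  = np.array([4.5, 2.0, 0.1])   # reversed ordering is penalized
```

Ranking sidesteps regressing absolute magnitudes directly: the model only has to order events correctly, and a calibration step can map scores back to magnitudes.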


Efficient Inference of Vision Instruction-Following Models with Elastic Cache

http://arxiv.org/abs/2407.18121v1

Compressor summary: Elastic Cache is a novel approach for improving the efficiency of large multimodal instruction-following models by applying different acceleration methods and an importance-driven cache merging strategy.


Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification

http://arxiv.org/abs/2407.18119v1

Compressor summary: The study investigates how linguistic information like grammatical number or semantic role is reflected and localized in sentence embeddings of transformer-based models.


Keypoint Promptable Re-Identification

http://arxiv.org/abs/2407.18112v1

Compressor summary: Keypoint Promptable ReID (KPR) is a new method for identifying occluded individuals by adding keypoints to bounding boxes and introducing a new dataset, Occluded-PoseTrack ReID, with keypoint labels.


Graph Neural Ordinary Differential Equations for Coarse-Grained Socioeconomic Dynamics

http://arxiv.org/abs/2407.18108v1

Compressor summary: The paper proposes a machine learning method to simplify complex socioeconomic systems and predict their behavior, using Baltimore as a case study.


PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization

http://arxiv.org/abs/2407.18078v1

Compressor summary: The text introduces a new dataset, PEFT-U, for evaluating and building NLP models that can personalize large language models like ChatGPT to individual users' preferences.


HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data

http://arxiv.org/abs/2407.18067v1

Compressor summary: Human-like Video Models (HVM-1) are large-scale video models trained on human-like videos and outperform image-based models in few-shot recognition tasks, thanks to their ability to capture temporal regularities.


Multi-Agent Deep Reinforcement Learning for Resilience Optimization in 5G RAN

http://arxiv.org/abs/2407.18066v1

Compressor summary: The paper proposes a resilience management technique for future radio networks using multi-agent deep reinforcement learning to dynamically adjust antennas and power, improving service availability and coverage.


Difficulty Estimation and Simplification of French Text Using LLMs

http://arxiv.org/abs/2407.18061v1

Compressor summary: The authors propose using generative language models for difficulty estimation and text simplification in foreign languages, achieving high accuracy and meaningful simplifications with minimal fine-tuning.


Cross-Vendor Reproducibility of Radiomics-based Machine Learning Models for Computer-aided Diagnosis

http://arxiv.org/abs/2407.18060v1

Compressor summary: The study compares SVM and RF models using radiomic features from different MRI libraries for prostate cancer detection, finding multimodal feature integration can improve robustness and generalizability.


GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

http://arxiv.org/abs/2407.18046v1

Compressor summary: GaussianSR is a novel super-resolution method that uses 2D Gaussian Splatting to represent pixels as continuous Gaussian fields, improving representation ability and performance over traditional discrete latent codes in the encoder.


The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation

http://arxiv.org/abs/2407.18044v1

Compressor summary: QB-RAG is a novel approach that uses pre-computed queries to improve the accuracy of healthcare question answering by LLMs.


Lifelong Graph Summarization with Neural Networks: 2012, 2022, and a Time Warp

http://arxiv.org/abs/2407.18042v1

Compressor summary: The paper explores neural networks for lifelong summarization of web graphs over time, comparing the GNNs Graph-MLP and GraphSAINT with an MLP baseline on 1-hop and 2-hop summaries. It also studies the impact of reusing parameters from previous snapshots and finds that the heterogeneity of web graphs affects summary accuracy.


How to Train the Teacher Model for Effective Knowledge Distillation

http://arxiv.org/abs/2407.18041v1

Compressor summary: The paper shows that training a teacher model for knowledge distillation with mean squared error (MSE) loss improves the student's accuracy by making the teacher's output closer to the true Bayes conditional probability density (BCPD).
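A toy sketch of the proposed recipe, training the teacher's softmax output against one-hot labels with MSE rather than cross-entropy (all numbers illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mse_teacher_loss(logits, onehot):
    # Regress the softmax output onto the one-hot label with MSE, so the
    # teacher's output stays close to the Bayes conditional probability
    # rather than being driven toward extreme logits as with CE.
    p = softmax(logits)
    return np.mean((p - onehot) ** 2)

# Toy example: two samples, three classes.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0]])
labels = np.eye(3)[[0, 2]]
loss = mse_teacher_loss(logits, labels)
```

The student would then be distilled from this teacher's soft outputs in the usual way; that step is unchanged.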


RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

http://arxiv.org/abs/2407.18035v1

Compressor summary: RestoreAgent is an intelligent image restoration system that uses multimodal large language models to autonomously assess and restore images with multiple degradations, outperforming human experts.


AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

http://arxiv.org/abs/2407.18034v1

Compressor summary: AttentionHand is a novel method for generating controllable hand images from text and various modalities to overcome challenges in 3D hand reconstruction in the wild.


ECG Arrhythmia Detection Using Disease-specific Attention-based Deep Learning Model

http://arxiv.org/abs/2407.18033v1

Compressor summary: The authors propose DANet, a novel attention-based deep learning model for arrhythmia detection from short ECG recordings, which provides interpretable waveform regions for diagnosis guidance.


Self-Supervision Improves Diffusion Models for Tabular Data Imputation

http://arxiv.org/abs/2407.18013v1

Compressor summary: SimpDM is a new diffusion model that improves tabular data imputation by regularizing noise alignment and enhancing robustness with state-dependent data augmentation.


HANNA: Hard-constraint Neural Network for Consistent Activity Coefficient Prediction

http://arxiv.org/abs/2407.18011v1

Compressor summary: HANNA is a novel neural network that predicts thermodynamic activity coefficients while strictly adhering to physical laws, outperforming the current state-of-the-art model UNIFAC.


Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption

http://arxiv.org/abs/2407.18003v1

Compressor summary: Large language models struggle with long texts due to the Transformer architecture, and the KV-Cache improves efficiency at the cost of increased GPU memory overhead; the text reviews the KV-Cache compression methods proposed for different phases of LLM development, discussing their advantages, disadvantages, and applications.
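The mechanism being optimized can be sketched as a minimal single-head decode loop, where each step appends only the new token's key/value to a cache instead of recomputing them for the whole prefix (dimensions illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []

def decode_step(x):
    """One autoregressive step: compute K/V only for the new token,
    reuse earlier ones from the cache (memory grows with sequence length)."""
    k_cache.append(x @ Wk)
    v_cache.append(x @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)
    q = x @ Wq
    attn = softmax(q @ K.T / np.sqrt(d))
    return attn @ V

for _ in range(5):
    out = decode_step(rng.normal(size=d))
```

The linear growth of `k_cache`/`v_cache` with sequence length is exactly the memory cost that the surveyed compression methods (eviction, quantization, merging, etc.) attack.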


Network Inversion of Convolutional Neural Nets

http://arxiv.org/abs/2407.18002v1

Compressor summary: The paper proposes a network inversion technique that uses a conditioned generator to reconstruct inputs likely to produce specific outputs, making neural networks more interpretable and trustworthy.


Investigation to answer three key questions concerning plant pest identification and development of a practical identification framework

http://arxiv.org/abs/2407.18000v1

Compressor summary: The paper presents a new plant pest identification framework that uses ROI detection and CNN-based identification, achieving high accuracy and fast speed on a large dataset of images from various plants and pests.


On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures

http://arxiv.org/abs/2407.17997v1

Compressor summary: The paper evaluates using synthetic speech for training ASR systems and compares different architectures' sensitivity to synthetic data generation.


Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography

http://arxiv.org/abs/2407.17996v1

Compressor summary: The proposed framework combines RGB and spectral images to improve image quality and enhance dynamic range, color mapping, and material semantics in mobile photography using a joint decomposition and prior-guided enhancement model.


Amortized Active Learning for Nonparametric Functions

http://arxiv.org/abs/2407.17992v1

Compressor summary: The paper proposes a fast and efficient active learning method using neural networks and Gaussian processes for function learning.


Personalized and Context-aware Route Planning for Edge-assisted Vehicles

http://arxiv.org/abs/2407.17980v1

Compressor summary: The paper proposes a novel graph neural network and deep reinforcement learning framework that customizes routes for autonomous vehicles based on individual driver preferences, outperforming conventional route planners in terms of travel time, congestion level, and satisfaction.


What does Kiki look like? Cross-modal associations between speech sounds and visual shapes in vision-and-language models

http://arxiv.org/abs/2407.17974v1

Compressor summary: The study explores how multimodal AI models represent visio-linguistic associations and whether they share the human cross-modal preference for the bouba-kiki effect, finding that results depend on model features.


Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks

http://arxiv.org/abs/2407.17963v1

Compressor summary: The text explains that different arithmetic tasks affect how well large language models (LLMs) perform, depending on the task properties and positional encoding used, and proposes a unified theoretical framework to understand these behaviors.


The Curious Case of Representational Alignment: Unravelling Visio-Linguistic Tasks in Emergent Communication

http://arxiv.org/abs/2407.17960v1

Compressor summary: The paper explores how representational alignment affects the emergence of linguistic properties in simulated communication and suggests that it may explain mixed results compared to human language experiments.


Neural Networks for Generating Better Local Optima in Topology Optimization

http://arxiv.org/abs/2407.17957v1

Compressor summary: Neural network material discretizations can improve acoustic topology optimization by finding better local optima when combined with the Adam optimizer, but their advantages are limited compared to constrained and higher-order techniques.


SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images

http://arxiv.org/abs/2407.17956v1

Compressor summary: SaccadeDet is a new object detection method for gigapixel images that mimics human eye movement to quickly and efficiently find objects of interest.


Scaling Training Data with Lossy Image Compression

http://arxiv.org/abs/2407.17954v1

Compressor summary: The paper proposes a storage scaling law for computer vision tasks that balances the trade-off between storage space and model quality when using lossy data compression.


BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

http://arxiv.org/abs/2407.17952v1

Compressor summary: BetterDepth is a conditional diffusion-based refiner that uses pre-trained MDE predictions as depth conditioning to achieve geometrically correct and detailed monocular depth estimation while being efficient and easy to use with other models.


Pruning Boolean d-DNNF Circuits Through Tseitin-Awareness

http://arxiv.org/abs/2407.17951v1

Compressor summary: The paper introduces Tseitin artifacts as irrelevant subcircuits in d-DNNF compilation and proposes methods to detect and remove them for better probabilistic inference.


Real Time American Sign Language Detection Using Yolo-v9

http://arxiv.org/abs/2407.17950v1

Compressor summary: The paper investigates the performance of YOLO-v9, a real-time American Sign Language detection model that is new to this domain.


Positive Text Reframing under Multi-strategy Optimization

http://arxiv.org/abs/2407.17940v1

Compressor summary: The paper proposes a multi-strategy optimization framework to improve positive reframing using pre-trained language models, which involves designing rewards, decoding approaches, and a re-ranking method for generating fluent and diverse texts with preserved meaning.


Comparison of different Artificial Neural Networks for Bitcoin price forecasting

http://arxiv.org/abs/2407.17930v1

Compressor summary: The study explores how varying the length of sequences used to predict cryptocurrency returns using ANNs affects accuracy and suggests optimizing sequence configurations for better financial forecasting.


Invariance of deep image quality metrics to affine transformations

http://arxiv.org/abs/2407.17927v1

Compressor summary: The authors evaluate image quality metrics by testing their invariance to natural transformations like rotation and illumination changes, and find that none of the current state-of-the-art models match human vision.


Modelling Multimodal Integration in Human Concept Processing with Vision-and-Language Models

http://arxiv.org/abs/2407.17914v1

Compressor summary: The study investigates whether multimodal vision-and-language DNN models better represent human meaning and brain activity than unimodal ones, finding mixed results.


Separating Novel Features for Logical Anomaly Detection: A Straightforward yet Effective Approach

http://arxiv.org/abs/2407.17909v1

Compressor summary: The technical report proposes improving KD-based methods for detecting logical defects in industrial settings by using a margin-based constraint to prevent false negatives and increase AUROC by 1.3%.


Amortized Posterior Sampling with Diffusion Prior Distillation

http://arxiv.org/abs/2407.17907v1

Compressor summary: The authors present a variational inference method that uses a conditional flow model trained from a pre-trained diffusion model to sample efficiently from the posterior distribution for inverse problems in Euclidean and manifold spaces.


Hierarchical Object Detection and Recognition Framework for Practical Plant Disease Diagnosis

http://arxiv.org/abs/2407.17906v1

Compressor summary: HODRF is a two-stage system that combines object detection and classification for plant disease diagnosis, improving accuracy and reducing labeling costs.


Exploring the Effect of Dataset Diversity in Self-Supervised Learning for Surgical Computer Vision

http://arxiv.org/abs/2407.17904v1

Compressor summary: The study finds that using diverse unlabeled surgical data in self-supervised learning improves performance for surgical computer vision applications, and provides a public dataset and model.


The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer

http://arxiv.org/abs/2407.17900v1

Compressor summary: This paper proposes an ensemble method that combines large language models with machine learning to improve the accuracy of predicting lymph node metastasis in lung cancer patients using patient data and medical knowledge.


An Iterative Approach to Topic Modelling

http://arxiv.org/abs/2407.17892v1

Compressor summary: The paper proposes an iterative process for topic modelling that improves its quality and produces a sense of completeness, using the BERTopic package and clustering comparison measures on a COVIDSenti-A dataset subset.


DAM: Towards A Foundation Model for Time Series Forecasting

http://arxiv.org/abs/2407.17880v1

Compressor summary: The DAM is a neural model that uses randomly sampled histories to forecast non-fixed horizons, outperforming existing models in universal forecasting across multiple domains.


A Large-Scale Sensitivity Analysis on Latent Embeddings and Dimensionality Reductions for Text Spatializations

http://arxiv.org/abs/2407.17876v1

Compressor summary: The study analyzes how changes in text corpora, hyperparameters, and randomness affect the stability of map-like metaphors visualizing semantic similarity between documents.


Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions

http://arxiv.org/abs/2407.17874v1

Compressor summary: The paper proposes a method to improve speech recognition for domain specific words by using Whisper and various training techniques, including generating descriptions with a large language model.


EllipBench: A Large-scale Benchmark for Machine-learning based Ellipsometry Modeling

http://arxiv.org/abs/2407.17869v1

Compressor summary: The authors propose a deep learning framework that uses residual connections, self-attention, and a reconstruction loss to solve the inverse problem of ellipsometry faster and more accurately than traditional machine learning methods.


factgenie: A Framework for Span-based Evaluation of Generated Texts

http://arxiv.org/abs/2407.17863v1

Compressor summary: Factgenie is a tool that helps analyze and visualize word spans in text outputs, using both human and machine annotations.


Exploring Description-Augmented Dataless Intent Classification

http://arxiv.org/abs/2407.17862v1

Compressor summary: The paper proposes methods for intent classification using text embeddings without labelled data and evaluates their performance and limitations on four datasets.


Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network

http://arxiv.org/abs/2407.17857v1

Compressor summary: Mew is a novel framework for multiplex immunofluorescence images that addresses cellular heterogeneity and scalability issues using a multiplex network with two layers, a Voronoi network and a Cell-type network, and an interpretable attention module.


MDS-ED: Multimodal Decision Support in the Emergency Department -- a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine

http://arxiv.org/abs/2407.17856v1

Compressor summary: The authors present a multimodal dataset for benchmarking decision support algorithms in emergency care, showing improved performance using raw waveform data.


Shapley Value-based Contrastive Alignment for Multimodal Information Extraction

http://arxiv.org/abs/2407.17854v1

Compressor summary: The paper introduces a new Image-Context-Text interaction paradigm and proposes Shap-CA, a contrastive learning method to align context-text and context-image pairs for Multimodal Information Extraction, achieving state-of-the-art results.


Scaling A Simple Approach to Zero-Shot Speech Recognition

http://arxiv.org/abs/2407.17852v1

Compressor summary: The paper introduces MMS Zero-shot, a method that improves zero-shot automatic speech recognition by using romanization and an acoustic model trained on more languages than previous work.


FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

http://arxiv.org/abs/2407.17850v1

Compressor summary: FlexiEdit is a new method that improves image editing by reducing high-frequency components in specific areas to better preserve the original layout and features during non-rigid edits.


Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing

http://arxiv.org/abs/2407.17847v1

Compressor summary: The proposed tuning-free method enables simultaneous object editing and background preservation in image editing using two branches, inversion and editing, with self-attention for consistency.


DragText: Rethinking Text Embedding in Point-based Image Editing

http://arxiv.org/abs/2407.17843v1

Compressor summary: The study investigates how text and image embeddings interact during point-based image editing, proposing DragText to optimize text embedding while preserving content integrity.


On the Opportunities of (Re)-Exploring Atmospheric Science by Foundation Models: A Case Study

http://arxiv.org/abs/2407.17842v1

Compressor summary: Current AI applications in atmospheric science rely on classic deep learning and have limitations, whereas multimodal foundation models like GPT-4o can process diverse data and execute complex tasks; the report evaluates GPT-4o's performance on four main classes of atmospheric scientific tasks.


Long-term Fairness in Ride-Hailing Platform

http://arxiv.org/abs/2407.17839v1

Compressor summary: The text proposes a dynamic Markov Decision Process model for ride-hailing that balances efficiency and fairness by predicting future requests and using a customised scalarisation function.


UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation

http://arxiv.org/abs/2407.17838v1

Compressor summary: UMono is a new framework for estimating depth from a single underwater image by considering light, medium, and feature fusion.


IsUMap: Manifold Learning and Data Visualization leveraging Vietoris-Rips filtrations

http://arxiv.org/abs/2407.17835v1

Compressor summary: IsUMap is a new method for better representing complex data using UMAP, Isomap, and Vietoris-Rips filtrations.


Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks

http://arxiv.org/abs/2407.17834v1

Compressor summary: Normalization techniques can reduce spectral bias in coordinate networks, improving their performance in various scientific computing tasks.


Unified Lexical Representation for Interpretable Visual-Language Alignment

http://arxiv.org/abs/2407.17827v1

Compressor summary: LexVLA is a framework that learns interpretable lexical representations for both visual and language modalities without complex design, achieving better cross-modal retrieval performance than baselines.


Advanced deep-reinforcement-learning methods for flow control: group-invariant and positional-encoding networks improve learning speed and quality

http://arxiv.org/abs/2407.17822v1

Compressor summary: This study proposes deep-reinforcement-learning methods for flow control in energy systems, using group-invariant networks and positional encoding to improve learning speed and quality.


Demystifying Verbatim Memorization in Large Language Models

http://arxiv.org/abs/2407.17817v1

Compressor summary: The study finds that large language models memorize sequences by encoding high-level features and using general language modeling abilities, making it hard to remove without affecting the model's performance.


NC-NCD: Novel Class Discovery for Node Classification

http://arxiv.org/abs/2407.17816v1

Compressor summary: SWORD is a novel self-training framework that clusters unlabeled data for node classification, preventing forgetting of old categories and improving performance on new ones.


Enhancing Model Performance: Another Approach to Vision-Language Instruction Tuning

http://arxiv.org/abs/2407.17813v1

Compressor summary: The Bottleneck Adapter is a novel approach to improve multimodal functionalities of large language models and vision-language tasks by using lightweight adapters for joint optimization, achieving 90.12% accuracy.


EEG-SSM: Leveraging State-Space Model for Dementia Detection

http://arxiv.org/abs/2407.17801v1

Compressor summary: EEG-SSM is a novel model that combines temporal and spectral components to effectively classify dementia using EEG data, achieving high accuracy and outperforming existing models.


A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models

http://arxiv.org/abs/2407.17797v1

Compressor summary: The authors propose Feature Guidance Attack (FGA), which uses text representations to perturb clean images, and its enhanced version FGA-T that leverages text attack, data augmentation, and momentum, showing superior black-box transferability against VLP models.


Enhancing Diversity in Multi-objective Feature Selection

http://arxiv.org/abs/2407.17795v1

Compressor summary: The paper proposes a method to improve feature selection in genetic algorithms by initializing the population with diverse individuals and re-initializing it with new random individuals in each generation.


Investigating learning-independent abstract reasoning in artificial neural networks

http://arxiv.org/abs/2407.17791v1

Compressor summary: This study shows that artificial neural networks can perform non-trivial visual reasoning tasks without prior training, similar to humans' learning-independent reasoning.


Exploring the Limitations of Kolmogorov-Arnold Networks in Classification: Insights to Software Training and Hardware Implementation

http://arxiv.org/abs/2407.17790v1

Compressor summary: KANs are a novel neural network type proposed to replace MLPs with higher accuracy and interpretability, but their assessment has been limited and no prior study has explored their hardware implementation. The paper tests KANs on classification tasks across four datasets and implements them in hardware with the Vitis HLS tool, finding that MLPs are more efficient and accurate than KANs, especially on highly complex datasets, where KANs also require more resources.


How Lightweight Can A Vision Transformer Be

http://arxiv.org/abs/2407.17783v1

Compressor summary: The paper proposes a simple vision transformer using MoE, feedforward networks, and grouped query attention to reduce complexity and improve performance.
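The grouped query attention the summary mentions shrinks the KV cache by letting several query heads share one key/value head. A minimal NumPy sketch (head counts and shapes are illustrative, not the paper's configuration):

```python
import numpy as np

def grouped_query_attention(q, k, v, n_groups):
    """Grouped query attention: n_heads query heads share n_groups KV heads.

    q: (n_heads, seq, d)   k, v: (n_groups, seq, d)
    """
    n_heads, seq, d = q.shape
    assert n_heads % n_groups == 0
    heads_per_group = n_heads // n_groups
    out = np.empty_like(q)
    for h in range(n_heads):
        g = h // heads_per_group                      # KV head for this group
        scores = q[h] @ k[g].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        out[h] = weights @ v[g]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))   # 8 query heads
k = rng.normal(size=(2, 4, 16))   # only 2 KV heads -> 4x smaller KV cache
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v, n_groups=2)
```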


Integrating Ensemble Kalman Filter with AI-based Weather Prediction Model ClimaX

http://arxiv.org/abs/2407.17781v1

Compressor summary: The study shows that combining an AI-based weather prediction model with a data assimilation method can improve forecasts, especially in areas with limited observations.
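The ensemble Kalman filter used here updates a forecast ensemble toward observations using the ensemble's own sample covariance. A minimal sketch of the stochastic EnKF analysis step on a toy state (the observation setup is a placeholder, not ClimaX):

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng):
    """Stochastic EnKF analysis step.

    X: (n_state, n_ens) forecast ensemble
    y: (n_obs,) observation
    H: (n_obs, n_state) linear observation operator
    R: (n_obs, n_obs) observation error covariance
    """
    n_state, n_ens = X.shape
    A = X - X.mean(axis=1, keepdims=True)          # ensemble anomalies
    P = A @ A.T / (n_ens - 1)                      # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    # perturbed observations, one per ensemble member
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=n_ens).T
    return X + K @ (Y - H @ X)

rng = np.random.default_rng(1)
truth = np.array([1.0, -2.0, 0.5])
H = np.eye(2, 3)                                   # observe first 2 components
R = 0.01 * np.eye(2)
y = H @ truth + rng.multivariate_normal(np.zeros(2), R)
X = truth[:, None] + rng.normal(scale=1.0, size=(3, 40))  # forecast ensemble
Xa = enkf_analysis(X, y, H, R, rng)
```

The analysis mean of the observed components is pulled close to the observation, which is the mechanism that compensates for sparse observations in the study.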


DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction

http://arxiv.org/abs/2407.17779v1

Compressor summary: The paper presents DAC, a divide-and-conquer framework for 2D-3D cross-modal retrieval that handles noisy annotations through adaptive division, alignment, and correction strategies, achieving state-of-the-art results on both traditional and newly proposed benchmarks.


KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

http://arxiv.org/abs/2407.17773v1

Compressor summary: The paper introduces a new benchmark for testing visual analogical reasoning in large multimodal models (LMMs) and compares their performance to human adults and children, finding LMMs struggle with complex tasks requiring 3D understanding.


Banyan: Improved Representation Learning with Explicit Structure

http://arxiv.org/abs/2407.17771v1

Compressor summary: Banyan is an improved model that learns semantic representations by resolving multiple constituent structures into a shared one, leading to better performance and memory efficiency than prior approaches.


BotEval: Facilitating Interactive Human Evaluation

http://arxiv.org/abs/2407.17770v1

Compressor summary: BotEval is an open-source toolkit that allows human evaluators to interact with NLP models in complex tasks like negotiations and conversation moderation, providing templates and compatibility with crowdsourcing platforms.


Mpox Detection Advanced: Rapid Epidemic Response Through Synthetic Data

http://arxiv.org/abs/2407.17762v1

Compressor summary: The study presents a novel approach using synthetic data to train a computer vision model that accurately detects Mpox lesions on various body parts and skin tones, achieving high accuracy, precision, recall, and F1-Score metrics.


Enhancing Eye Disease Diagnosis with Deep Learning and Synthetic Data Augmentation

http://arxiv.org/abs/2407.17755v1

Compressor summary: The paper proposes an ensemble learning technique using machine learning and deep learning models for improved diagnosis of diabetic retinopathy with high accuracy.


Beyond Entity Alignment: Towards Complete Knowledge Graph Alignment via Entity-Relation Synergy

http://arxiv.org/abs/2407.17745v1

Compressor summary: The paper proposes a new model, EREM, for knowledge graph alignment that decomposes the task into entity alignment and relation alignment, achieving better results than existing models.


Balancing Complementarity and Consistency via Delayed Activation in Incomplete Multi-view Clustering

http://arxiv.org/abs/2407.17744v1

Compressor summary: The paper proposes a dual network with delayed activation to balance complementarity and consistency in incomplete multi-view clustering, improving performance over 12 baselines.


Enhancing Fine-grained Object Detection in Aerial Images via Orthogonal Mapping

http://arxiv.org/abs/2407.17738v1

Compressor summary: The paper introduces Orthogonal Mapping, a method to improve fine-grained object detection by reducing semantic confusion using orthogonal vectors and improving classification accuracy.
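The core idea of assigning classes mutually orthogonal target vectors, so that fine-grained class logits cannot interfere, can be sketched as follows; this is a generic illustration of orthogonal class prototypes, not the paper's exact Orthogonal Mapping method:

```python
import numpy as np

def orthogonal_class_vectors(n_classes, dim, seed=0):
    """Fixed, mutually orthogonal unit vectors, one per class (needs dim >= n_classes)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(dim, n_classes)))  # orthonormal columns
    return Q.T                                              # (n_classes, dim)

W = orthogonal_class_vectors(n_classes=5, dim=32)
# classify a feature by cosine similarity to each class vector
feat = W[3] + 0.1 * np.random.default_rng(1).normal(size=32)
sims = W @ feat / np.linalg.norm(feat)
pred = int(np.argmax(sims))
```

Because the prototypes are orthogonal, a feature close to one class vector scores near zero against every other class, reducing semantic confusion between similar categories.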


Cost-effective Instruction Learning for Pathology Vision and Language Analysis

http://arxiv.org/abs/2407.17734v1

Compressor summary: CLOVER is a cost-effective instruction learning framework for conversational pathology using GPT-3.5 and template-based instructions that outperforms strong baselines in answering questions.


Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy?

http://arxiv.org/abs/2407.17730v1

Compressor summary: The authors evaluate the feasibility of using large language models for cognitive behavioral therapy by testing their emotional tendency, structured dialogue pattern, and proactive questioning ability on a CBT corpus from online videos.


Multi-modal Data Binding for Survival Analysis Modeling with Incomplete Data and Annotations

http://arxiv.org/abs/2407.17726v1

Compressor summary: The text describes a new framework that handles incomplete data from different sources and censored survival labels, using advanced foundation models to improve survival analysis accuracy.


Your Graph Recommender is Provably a Single-view Graph Contrastive Learning

http://arxiv.org/abs/2407.17723v1

Compressor summary: This paper analyzes the theoretical relationship between graph recommender (GR) and graph contrastive learning (GCL), showing their equivalence in terms of encoders and loss functions, and suggesting cross-field research directions.


A Two-Stage Imaging Framework Combining CNN and Physics-Informed Neural Networks for Full-Inverse Tomography: A Case Study in Electrical Impedance Tomography (EIT)

http://arxiv.org/abs/2407.17721v1

Compressor summary: The text proposes a hybrid learning framework that combines CNNs and PINNs to solve the full inverse EIT problem using supervised and unsupervised learning.


Revisiting Machine Unlearning with Dimensional Alignment

http://arxiv.org/abs/2407.17710v1

Compressor summary: The text introduces a novel framework for machine unlearning that uses dimensional alignment as a regularizer loss to remove information from specific data while preserving knowledge from the rest, and also criticizes existing evaluation metrics for machine unlearning.


ALMRR: Anomaly Localization Mamba on Industrial Textured Surface with Feature Reconstruction and Refinement

http://arxiv.org/abs/2407.17705v1

Compressor summary: The paper proposes a novel anomaly localization method that combines Mamba with feature reconstruction and refinement, using artificially simulated anomalies for better training and performance.


Context-aware knowledge graph framework for traffic speed forecasting using graph neural network

http://arxiv.org/abs/2407.17703v1

Compressor summary: The text proposes a novel framework that uses context-aware knowledge graphs and neural networks to improve traffic speed forecasting by considering spatial and temporal urban contexts.


Superior Scoring Rules for Probabilistic Evaluation of Single-Label Multi-Class Classification Tasks

http://arxiv.org/abs/2407.17697v1

Compressor summary: The study presents new scoring rules (Penalized Brier Score and Penalized Logarithmic Loss) that reward correct predictions more and help improve probabilistic classification models.
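For reference, the standard (unpenalized) scoring rules that the proposed penalized variants build on look like this; the penalties themselves are the paper's contribution and are not reproduced here:

```python
import numpy as np

def brier_score(probs, label):
    """Standard multi-class Brier score: squared distance to the one-hot label."""
    onehot = np.zeros_like(probs)
    onehot[label] = 1.0
    return float(np.sum((probs - onehot) ** 2))

def log_loss(probs, label):
    """Standard logarithmic loss: negative log-probability of the true class."""
    return float(-np.log(probs[label]))

confident_right = np.array([0.9, 0.05, 0.05])
hedged = np.array([0.4, 0.3, 0.3])
# lower is better; a confident correct prediction scores better on both rules
```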


Enhancing Agent Learning through World Dynamics Modeling

http://arxiv.org/abs/2407.17695v1

Compressor summary: DiVE is a framework that helps large language models learn and improve their understanding of world dynamics from limited demonstrations, enabling them to make better decisions like human players.


Examining the Influence of Political Bias on Large Language Model Performance in Stance Classification

http://arxiv.org/abs/2407.17688v1

Compressor summary: The study examines political biases in large language models and how they affect stance classification tasks, finding significant differences across datasets but not models or prompting schemes.


Transformers on Markov Data: Constant Depth Suffices

http://arxiv.org/abs/2407.17686v1

Compressor summary: Transformers can model generative processes of higher-order Markov sources by learning the in-context conditional empirical distribution, as shown by both theoretical and empirical results.
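The in-context conditional empirical distribution for a k-th order Markov source is simply the frequency of each next symbol given the last k symbols observed so far in the sequence; a minimal sketch:

```python
from collections import Counter, defaultdict

def empirical_conditional(seq, k):
    """In-context conditional empirical distribution P(next | last k symbols)."""
    counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        counts[tuple(seq[i - k:i])][seq[i]] += 1
    return {ctx: {s: c / sum(ctr.values()) for s, c in ctr.items()}
            for ctx, ctr in counts.items()}

# alternating sequence over {0, 1}, treated as an order-2 source
seq = [0, 1, 0, 1, 0, 1, 0, 1]
dist = empirical_conditional(seq, k=2)
# after context (0, 1) the next symbol is always 0 in this sequence
```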


Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads

http://arxiv.org/abs/2407.17678v1

Compressor summary: Sparsely-Sharded (S2) Attention is a new attention algorithm that divides context into partitions for different heads, improving efficiency and memory reduction without sacrificing model quality.