arxiv compressed, 2024-07-17

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-17, generated by the compressor, my personal LLM-based project.


Does Refusal Training in LLMs Generalize to the Past Tense?

http://arxiv.org/abs/2407.11969v1

Compressor summary: The authors show that many LLMs can be tricked into generating harmful outputs by simply reformulating requests in the past tense, revealing a generalization gap in current refusal training methods.
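The reformulation step is simple enough to sketch. The paper reportedly uses an LLM to rewrite requests into the past tense; the rule-based `to_past_tense_probe` below is a hypothetical simplification that only illustrates the idea:

```python
# Minimal sketch of the past-tense reformulation idea: rewrite a present-tense
# request into a past-tense framing before sending it to a refusal-trained
# model. This template-based rewriter is a simplification for illustration.

def to_past_tense_probe(request: str) -> str:
    """Turn 'How do I X?' style requests into a past-tense variant."""
    r = request.rstrip("?").strip()
    if r.lower().startswith("how do i "):
        r = "How did people " + r[len("How do I "):]
    elif r.lower().startswith("how to "):
        r = "How did people " + r[len("How to "):]
    return r + "?"

print(to_past_tense_probe("How do I pick a lock?"))
# → How did people pick a lock?
```

The reformulated prompt is then sent to the target model; the paper's finding is that refusal training often fails to generalize to this tense shift.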


Efficient Training with Denoised Neural Weights

http://arxiv.org/abs/2407.11966v1

Compressor summary: The paper proposes a weight generator that uses GANs and text conditions to synthesize neural weights, which reduces training time and improves image quality.


UrbanWorld: An Urban World Model for 3D City Generation

http://arxiv.org/abs/2407.11965v1

Compressor summary: UrbanWorld is a generative model that creates realistic 3D city environments for training AI agents to perceive, decide, and act like humans.


NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

http://arxiv.org/abs/2407.11963v1

Compressor summary: NeedleBench is a framework for testing large language models' abilities to retrieve and reason from long texts in different languages, revealing their limitations in complex reasoning tasks.


Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling

http://arxiv.org/abs/2407.11962v1

Compressor summary: MoCo-NeRF is a framework that uses radiance residual fields to model non-rigid motions in dynamic clothed humans, achieving state-of-the-art free-viewpoint rendering quality with efficient learning and simultaneous multi-subject support.


Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation

http://arxiv.org/abs/2407.11954v1

Compressor summary: The paper proposes a Gated Temporal Diffusion (GTD) network that models uncertainty in long-term action anticipation, using a Gated Anticipation Network (GTAN) to model past and future frames jointly.


Temporally Consistent Stereo Matching

http://arxiv.org/abs/2407.11950v1

Compressor summary: The paper proposes a video stereo matching method that uses temporal information to improve consistency, accuracy, and efficiency by completing the previous disparity map and refining it iteratively in both disparity and gradient spaces.


Rethinking Transformer-based Multi-document Summarization: An Empirical Investigation

http://arxiv.org/abs/2407.11948v1

Compressor summary: This paper investigates Transformer-based models' performance and behaviors in multi-document summarization using five empirical studies and various evaluation metrics.


Hierarchical Separable Video Transformer for Snapshot Compressive Imaging

http://arxiv.org/abs/2407.11946v1

Compressor summary: The authors propose an efficient reconstruction architecture for Snapshot Compressive Imaging that uses Hierarchical Separable Video Transformer (HiSViT) to improve performance and efficiency.


Beyond Spatial Explanations: Explainable Face Recognition in the Frequency Domain

http://arxiv.org/abs/2407.11941v1

Compressor summary: The authors propose a method to explain face recognition decisions by analyzing the influence of frequency components in images, which has not been done before.


Thermal Imaging and Radar for Remote Sleep Monitoring of Breathing and Apnea

http://arxiv.org/abs/2407.11936v1

Compressor summary: The study compares radar and thermal imaging for non-contact sleep monitoring and finds that thermal imaging outperforms radar in detecting and classifying sleep apnea.


Learning Multi-view Anomaly Detection

http://arxiv.org/abs/2407.11935v1

Compressor summary: The study proposes MVAD, a framework for multi-view anomaly detection that learns and integrates features from multiple views using the MVAS algorithm, achieving state-of-the-art performance with minimal computational overhead.


Fairly Accurate: Optimizing Accuracy Parity in Fair Target-Group Detection

http://arxiv.org/abs/2407.11933v1

Compressor summary: The paper introduces GAP, a new differentiable loss function for target detection that balances accuracy across demographic groups and reduces disparate impact.
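To make the target concrete, here is a plain, non-differentiable illustration of the accuracy-parity quantity that a loss like GAP balances; the function name and example data are made up, and the paper's actual loss is a differentiable surrogate of this hard-count version:

```python
# Illustration of accuracy parity: per-group accuracy and the worst-case gap
# between demographic groups. A parity-oriented loss drives this gap toward
# zero; this hard-count version only shows what is being balanced.

def accuracy_parity_gap(y_true, y_pred, groups):
    per_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        correct, total = per_group.get(g, (0, 0))
        per_group[g] = (correct + (t == p), total + 1)
    accs = {g: c / n for g, (c, n) in per_group.items()}
    return accs, max(accs.values()) - min(accs.values())

accs, gap = accuracy_parity_gap(
    y_true=[1, 0, 1, 1, 0, 1],
    y_pred=[1, 0, 0, 1, 1, 0],
    groups=["a", "a", "a", "b", "b", "b"],
)
print(accs, gap)  # group "a": 2/3 accuracy, group "b": 1/3, gap 1/3
```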


Fine-grained Hallucination Detection and Mitigation in Long-form Question Answering

http://arxiv.org/abs/2407.11930v1

Compressor summary: HaluQuestQA is a new dataset with localized error annotations for LFQA to evaluate and improve answer quality and comprehensiveness.


Tackling Oversmoothing in GNN via Graph Sparsification: A Truss-based Approach

http://arxiv.org/abs/2407.11928v1

Compressor summary: The paper proposes a truss-based graph sparsification method that prunes redundant edges in dense regions, where excessive mixing of node embeddings causes oversmoothing, improving the graph classification performance of various GNN models.
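For intuition, a single-pass sketch of the truss-style pruning rule: keep an edge only if it participates in at least k-2 triangles. The real k-truss computation iterates removal to a fixed point, and the paper's pipeline involves more than this, so treat it as a toy:

```python
# Single-pass truss-style sparsification sketch: an edge's "support" is the
# number of triangles it closes (common neighbours of its endpoints); edges
# with support below k-2 are pruned, thinning the dense regions where GNN
# oversmoothing occurs. (Full k-truss repeats this until a fixed point.)

def truss_sparsify(edges, k):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    kept = []
    for u, v in edges:
        support = len(adj[u] & adj[v])  # common neighbours = triangles on (u, v)
        if support >= k - 2:
            kept.append((u, v))
    return kept

# A 4-clique {0,1,2,3} plus a pendant edge (3,4): every clique edge closes
# two triangles, while the pendant edge closes none and is pruned for k=3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(truss_sparsify(edges, k=3))
```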


IPA-NeRF: Illusory Poisoning Attack Against Neural Radiance Fields

http://arxiv.org/abs/2407.11921v1

Compressor summary: The text describes a new attack called IPA-NeRF that can embed hidden backdoors in Neural Radiance Fields, allowing them to produce illusory outputs when triggered by specific views.


What's Wrong? Refining Meeting Summaries with LLM Feedback

http://arxiv.org/abs/2407.11919v1

Compressor summary: The authors propose a multi-LLM correction method for meeting summarization that uses human feedback on error types and refines the summary based on relevance, informativeness, conciseness, and coherence.


Global Optimisation of Black-Box Functions with Generative Models in the Wasserstein Space

http://arxiv.org/abs/2407.11917v1

Compressor summary: The paper proposes a new way to estimate uncertainty in gradient-free optimization of black-box simulators using deep generative models and Wasserstein distance.
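The base metric here is easy to demystify: for two equal-size 1-D samples, the 1-Wasserstein distance reduces to the mean absolute difference of the sorted values. A minimal sketch (the paper works with generative models in Wasserstein space; this is only the underlying distance):

```python
# 1-Wasserstein distance between two equal-size 1-D empirical samples:
# sort both and average the absolute differences of matched order statistics.

def wasserstein_1d(xs, ys):
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # → 1.0
```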


Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data

http://arxiv.org/abs/2407.11913v1

Compressor summary: The method combines local image patches with global frequencies learned from data to describe images more efficiently in quantised autoencoders.


Benchmarking the Attribution Quality of Vision Models

http://arxiv.org/abs/2407.11910v1

Compressor summary: The paper proposes a new evaluation protocol for attribution methods, finding that intrinsically explainable models and raw attribution values perform better than previous methods, and different network designs affect attribution quality.


GraphFM: A Scalable Framework for Multi-Graph Pretraining

http://arxiv.org/abs/2407.11907v1

Compressor summary: Graph Foundation Model (GraphFM) is a scalable pretraining approach for node classification on diverse graphs using Perceiver-based encoders and latent tokens, improving adaptability, stability, and generalization across domains.


Encapsulating Knowledge in One Prompt

http://arxiv.org/abs/2407.11902v1

Compressor summary: The KiOP paradigm encapsulates knowledge from multiple models into one prompt without modifying them or needing training data, solving the low reusability and high storage consumption of data-free knowledge transfer and performing well across datasets and models even with no data and limited storage.


OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

http://arxiv.org/abs/2407.11895v1

Compressor summary: OmniBind is a large-scale multimodal joint representation model that efficiently integrates 3D, audio, image, and language inputs by remapping and binding pre-trained specialist models.


Deep Learning without Global Optimization by Random Fourier Neural Networks

http://arxiv.org/abs/2407.11894v1

Compressor summary: The paper presents a new training algorithm for deep neural networks with random complex exponential activation functions that uses Markov chain Monte Carlo sampling, achieving provable approximation guarantees and efficient feature learning without Gibbs phenomena.


DepGAN: Leveraging Depth Maps for Handling Occlusions and Transparency in Image Composition

http://arxiv.org/abs/2407.11890v1

Compressor summary: DepGAN uses depth maps and alpha channels to improve image composition by rectifying occlusions and enhancing transparency effects with a novel Depth Aware Loss function.


Learning Confidence Bounds for Classification with Imbalanced Data

http://arxiv.org/abs/2407.11878v1

Compressor summary: The paper proposes a novel framework that uses class-dependent confidence bounds to improve the robustness and reliability of classification models for imbalanced data.


Simplifying the Theory on Over-Smoothing

http://arxiv.org/abs/2407.11876v1

Compressor summary: The paper simplifies over-smoothing theory in graph convolutions by relating it to power iteration, introduces rank collapse and rank-one distance as new concepts, and identifies more models affected by rank collapse.
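The power-iteration connection can be seen in a few lines: repeated neighbourhood averaging (a simplified graph convolution) collapses node features toward rank one, the "rank collapse" the paper names. A toy sketch, not the paper's formal setting:

```python
# Power-iteration view of over-smoothing: repeatedly averaging each node's
# feature over its closed neighbourhood (itself plus neighbours) on a
# connected graph drives all features to a common value, i.e. the feature
# matrix collapses to rank one. Toy 3-node path graph with 1-D features.

def smooth_step(feats, neighbours):
    return [
        sum(feats[j] for j in nbrs | {i}) / (len(nbrs) + 1)
        for i, nbrs in enumerate(neighbours)
    ]

neighbours = [{1}, {0, 2}, {1}]      # path graph 0 - 1 - 2
feats = [0.0, 3.0, 9.0]
for _ in range(50):
    feats = smooth_step(feats, neighbours)
spread = max(feats) - min(feats)
print(feats, spread)  # spread is near zero: all nodes collapse to one value
```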


Single Layer Single Gradient Unlearning

http://arxiv.org/abs/2407.11867v1

Compressor summary: The paper proposes an efficient machine unlearning method that modifies only a single critical layer of model parameters using one-time gradient computation, enabling effective erasure of certain training samples with low computational cost and general utility retention.


A Novel Lexicon for the Moral Foundation of Liberty

http://arxiv.org/abs/2407.11862v1

Compressor summary: The paper presents a new Liberty lexicon for analyzing moral values in social issues using word embeddings and compositional semantics.


What Makes a Meme a Meme? Identifying Memes for Memetics-Aware Dataset Creation

http://arxiv.org/abs/2407.11861v1

Compressor summary: The paper proposes a meme identification protocol based on memetics to improve meme classification and suggests that existing meme datasets lack genuine memes.


Evaluating Task-Oriented Dialogue Consistency through Constraint Satisfaction

http://arxiv.org/abs/2407.11857v1

Compressor summary: The text proposes using Constraint Satisfaction Problems (CSPs) to detect inconsistencies in task-oriented dialogues, finding that LLMs struggle to re-lexicalize dialogues consistently and to accurately reflect domain knowledge.


Scaling Sign Language Translation

http://arxiv.org/abs/2407.11855v1

Compressor summary: This paper presents a large-scale sign language translation pretraining method that uses various data sources and improves open-domain performance on multiple languages and sign languages.


Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection

http://arxiv.org/abs/2407.11854v1

Compressor summary: The paper presents a two-stage fine-tuning pipeline that uses multilingual synthetic data to generate synthetic error corpora for grammatical error detection in low-resource languages, outperforming current annotation-free methods.


SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images

http://arxiv.org/abs/2407.11850v1

Compressor summary: SpaceJAM is an efficient and simple model for joint image alignment that reduces complexity and improves speed by 10x.


InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback

http://arxiv.org/abs/2407.11843v1

Compressor summary: InferAct is a new method that uses LLMs' Theory-of-Mind to detect potential errors before risky actions are taken, and it integrates human feedback to improve decision-making.


MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification

http://arxiv.org/abs/2407.11840v1

Compressor summary: MVG-Splatting improves 3D reconstruction by adaptively adjusting densification levels and normal calculations for better rendering quality and mesh extraction.


LoFTI: Localization and Factuality Transfer to Indian Locales

http://arxiv.org/abs/2407.11833v1

Compressor summary: LoFTI is a new benchmark to evaluate large language models' ability to localize and transfer factual information for different locations in India.


Approximating the Number of Relevant Variables in a Parity Implies Proper Learning

http://arxiv.org/abs/2407.11832v1

Compressor summary: The paper shows that approximating the number of relevant variables in a parity function is as hard as learning parities, and presents new algorithms for learning sparse parities with low-degree noise.
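For context, the noiseless version of the problem is easy: a parity is a linear function over GF(2), so Gaussian elimination on labeled samples recovers the relevant variables exactly; the hardness the paper studies comes from noise and sparsity. A sketch of the noiseless case (variable names and instance are made up):

```python
# Noiseless parity learning as linear algebra over GF(2): each sample gives a
# linear equation in the unknown indicator vector of relevant variables, and
# Gaussian elimination mod 2 solves the system exactly.
import random

def parity(x, relevant):
    return sum(x[i] for i in relevant) % 2

def solve_gf2(rows, rhs, n):
    """Gaussian elimination mod 2; returns one consistent solution."""
    rows = [r[:] for r in rows]; rhs = rhs[:]
    sol, pivots, r = [0] * n, [], 0
    for c in range(n):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rhs[r], rhs[piv] = rhs[piv], rhs[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
                rhs[i] ^= rhs[r]
        pivots.append(c)
        r += 1
    for r_i, c in enumerate(pivots):
        sol[c] = rhs[r_i]
    return sol

random.seed(0)
n, relevant = 8, {1, 4, 6}
samples = [[random.randint(0, 1) for _ in range(n)] for _ in range(40)]
labels = [parity(x, relevant) for x in samples]
sol = solve_gf2(samples, labels, n)
print(sorted(i for i, b in enumerate(sol) if b))  # recovers the relevant set
```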


GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text

http://arxiv.org/abs/2407.11827v1

Compressor summary: The study develops RhetAnn, a web app that simplifies annotating text with rhetorical and linguistic features of persuasion, and uses GPT to fine-tune models for detecting propaganda techniques cost-effectively and interpretably.


Harmonizing Safety and Speed: A Human-Algorithm Approach to Enhance the FDA's Medical Device Clearance Policy

http://arxiv.org/abs/2407.11823v1

Compressor summary: The paper proposes a machine learning-based policy to help the FDA reduce recalls and workload in its 510(k) medical device approval process, potentially saving billions of dollars annually.


Approximating Probabilistic Inference in Statistical EL with Knowledge Graph Embeddings

http://arxiv.org/abs/2407.11821v1

Compressor summary: Knowledge graph embeddings can help make probabilistic inference in statistical extensions of description logics more efficient and accurate.


Contrastive Sequential-Diffusion Learning: An approach to Multi-Scene Instructional Video Synthesis

http://arxiv.org/abs/2407.11814v1

Compressor summary: The paper proposes a method to create consistent multi-scene videos from action-centric text descriptions using contrastive sequential video diffusion.


DFDRNN: A dual-feature based neural network for drug repositioning

http://arxiv.org/abs/2407.11812v1

Compressor summary: The DFDRNN model uses two features to precisely encode drugs and diseases for drug repositioning, outperforming six existing methods on four datasets.


Invariant Consistency for Knowledge Distillation

http://arxiv.org/abs/2407.11802v1

Compressor summary: ICD is a new knowledge distillation method that uses contrastive learning and invariance penalties to transfer more structural knowledge from a teacher model to a student model, achieving better results on several image datasets.


PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

http://arxiv.org/abs/2407.11798v1

Compressor summary: PipeInfer is a new technique to speed up large language models across computer clusters by combining continuous asynchronous speculation and early inference cancellation.


Characterizing and Understanding HGNN Training on GPUs

http://arxiv.org/abs/2407.11790v1

Compressor summary: The study analyzes the efficiency of heterogeneous graph neural network (HGNN) training and identifies performance bottlenecks to optimize it.


Large Language Models as Misleading Assistants in Conversation

http://arxiv.org/abs/2407.11789v1

Compressor summary: The study shows that large language models can be deceptive in reading comprehension tasks, leading to significant accuracy drops when used as assistants.


Defining 'Good': Evaluation Framework for Synthetic Smart Meter Data

http://arxiv.org/abs/2407.11785v1

Compressor summary: The paper proposes a framework to evaluate synthetic smart meter data for privacy and utility, using outlier injection and differential privacy methods.


Cryptocurrency Price Forecasting Using XGBoost Regressor and Technical Indicators

http://arxiv.org/abs/2407.11786v1

Compressor summary: The study proposes a machine learning model using technical indicators to predict Bitcoin prices and help traders make better decisions.


Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development

http://arxiv.org/abs/2407.11784v1

Compressor summary: The paper introduces a sandbox for co-developing data and multi-modal generative models, improving performance and efficiency, and provides resources to foster progress in the field.


SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

http://arxiv.org/abs/2407.11781v1

Compressor summary: The paper presents SlingBAG, a 3D photoacoustic image reconstruction algorithm that combines differentiable rendering with adaptive point-cloud growth, achieving higher quality and efficiency than traditional methods under sparse or limited views, and releases a new dataset and code for future research.


SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models

http://arxiv.org/abs/2407.11780v1

Compressor summary: The paper proposes a method to prevent catastrophic forgetting in large language models when adapting to new tasks by switching between tuned models.


Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions

http://arxiv.org/abs/2407.11778v1

Compressor summary: The paper introduces SUWR, a local feature selection method that avoids misleading explanations by ensuring no label or feature leakage, improving interpretability of complex models.


Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text

http://arxiv.org/abs/2407.11774v1

Compressor summary: The paper presents a RoBERTa-based neural model for detecting machine-generated text in English, achieving 78.9% accuracy and ranking 57th in SemEval-2024 Subtask A.


Educational Personalized Learning Path Planning with Large Language Models

http://arxiv.org/abs/2407.11773v1

Compressor summary: The paper proposes using large language models and prompt engineering to create adaptable, interactive, and transparent personalized learning path planning systems that improve learning efficiency and engagement.


Robust Utility-Preserving Text Anonymization Based on Large Language Models

http://arxiv.org/abs/2407.11770v1

Compressor summary: The paper presents a framework using LLMs to evaluate privacy and utility of anonymized data and optimize the anonymization process, achieving better results than existing methods.


ITI-IQA: a Toolbox for Heterogeneous Univariate and Multivariate Missing Data Imputation Quality Assessment

http://arxiv.org/abs/2407.11767v1

Compressor summary: ITI-IQA is a toolbox for assessing imputation quality, selecting best imputers, filtering low-quality features, and diagnosing missing data issues in various data types.


Vectoring Languages

http://arxiv.org/abs/2407.11766v1

Compressor summary: The article introduces a new language structure inspired by large language models that better captures linguistic diversity and suggests it could improve scientific research.


Self-Duplicating Random Walks for Resilient Decentralized Learning on Graphs

http://arxiv.org/abs/2407.11762v1

Compressor summary: DECAFORK is a decentralized algorithm that maintains the number of random walks in a graph around a desired value by forking them when failures are likely, ensuring failure resilience.
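A toy birth-death sketch of the self-duplicating idea (parameters hypothetical; DECAFORK's actual rule forks based on estimated failure likelihood rather than a fixed probability):

```python
# Toy birth-death version of self-duplicating random walks: with per-step
# loss probability q and fork probability p = q / (1 - q), the expected
# number of offspring per walk is (1 - q) * (1 + p) = 1, so the walk count
# is preserved in expectation despite failures.
import random

random.seed(1)
q = 0.1                 # per-step probability a walk is lost (node failure)
p = q / (1 - q)         # fork probability chosen to balance the losses
walks = 20
counts = []
for _ in range(200):
    survivors = sum(1 for _ in range(walks) if random.random() > q)
    forks = sum(1 for _ in range(survivors) if random.random() < p)
    walks = survivors + forks
    counts.append(walks)
print(counts[0], counts[-1])
```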


A Theoretical Formulation of Many-body Message Passing Neural Networks

http://arxiv.org/abs/2407.11756v1

Compressor summary: The paper introduces a framework that models higher-order node interactions using tree-shaped motifs and spectral filters, and shows its effectiveness on regression and classification tasks.


A Channel Attention-Driven Hybrid CNN Framework for Paddy Leaf Disease Detection

http://arxiv.org/abs/2407.11753v1

Compressor summary: The research presents a new hybrid deep learning model for accurate and early rice leaf disease identification, which could improve agricultural efficiency and reduce crop loss.


Why long model-based rollouts are no reason for bad Q-value estimates

http://arxiv.org/abs/2407.11751v1

Compressor summary: The paper argues that long model-based rollouts in offline reinforcement learning do not inherently lead to bad Q-value estimates and may even improve them.


Cycle Contrastive Adversarial Learning for Unsupervised image Deraining

http://arxiv.org/abs/2407.11750v1

Compressor summary: The paper proposes a new unsupervised method called CCLGAN for single image deraining that combines cycle contrastive learning and location contrastive learning to improve image quality.


ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection

http://arxiv.org/abs/2407.11735v1

Compressor summary: The text introduces ProSub, a new open-set semi-supervised learning (OSSL) framework that uses angles in feature space for in-distribution/out-of-distribution classification and conditional distribution estimation, achieving state-of-the-art performance on several benchmarks.


How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies

http://arxiv.org/abs/2407.11733v1

Compressor summary: The authors evaluate LLMs for stereotyping and find that they improve with a safety system prompt but still struggle with certain toxic harms, especially for intersectional identities, and call for more accountability and awareness in NLP.


Exploring Quantization for Efficient Pre-Training of Transformer Language Models

http://arxiv.org/abs/2407.11722v1

Compressor summary: This study explores how quantization can make pre-training Transformer models more efficient for language modeling by applying it to various components during training.
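As background, the basic quantize-dequantize operation such studies build on can be sketched generically (this is standard symmetric uniform quantization, not the paper's specific recipe; the function name is made up):

```python
# Generic symmetric uniform quantization (quantize-dequantize): scale values
# onto an integer grid, round, clamp, and rescale. Quantized pre-training
# studies where in the Transformer this operation can be applied during
# training without hurting the language model.

def fake_quant(values, bits=8):
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [qi * scale for qi in q], scale

deq, scale = fake_quant([-1.0, -0.25, 0.0, 0.5, 1.0])
print(deq)  # values snapped to the int8 grid and rescaled back to [-1, 1]
```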


Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

http://arxiv.org/abs/2407.11717v1

Compressor summary: Turbo is a plug-and-play module that prunes data redundancy in vision-language models using information degree, achieving efficiency and performance trade-offs.


Novel Artistic Scene-Centric Datasets for Effective Transfer Learning in Fragrant Spaces

http://arxiv.org/abs/2407.11701v1

Compressor summary: The text discusses how olfaction shapes human experiences, and proposes a transfer-learning method to classify fragrant spaces in artistic scenes using weakly labeled data.


Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

http://arxiv.org/abs/2407.11700v1

Compressor summary: The text proposes a new image compression method for machines that allows users to control the bitrate, quality, and task accuracy with one neural model.


Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

http://arxiv.org/abs/2407.11699v1

Compressor summary: The paper proposes Relation-DETR, an enhanced DETR method that uses position relation embeddings to improve convergence, performance, and speed for object detection tasks.


NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks

http://arxiv.org/abs/2407.11698v1

Compressor summary: NITRO-D is a novel framework for training deep integer-only Convolutional Neural Networks that operate entirely in the integer-only domain, reducing memory and computational requirements while maintaining performance.


Global atmospheric data assimilation with multi-modal masked autoencoders

http://arxiv.org/abs/2407.11696v1

Compressor summary: EarthNet is a model that learns to predict global atmospheric conditions from satellite observations, producing accurate and efficient data assimilation for weather forecasting.


VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

http://arxiv.org/abs/2407.11691v1

Compressor summary: VLMEvalKit is an open-source PyTorch toolkit for evaluating large multi-modal models and benchmarks, with a user-friendly interface and automatic handling of various tasks.


CCoE: A Compact LLM with Collaboration of Experts

http://arxiv.org/abs/2407.11686v1

Compressor summary: The paper proposes the CCoE architecture, which combines multiple domain experts into one compact LLM, improving performance while reducing training costs.


Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning

http://arxiv.org/abs/2407.11683v1

Compressor summary: The paper proposes a network that learns stable image representations under distractors and generates captions based on reliable difference features using cross-modal contrastive regularization.


MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation

http://arxiv.org/abs/2407.11682v1

Compressor summary: MapDistill uses knowledge distillation to transfer knowledge from a camera-LiDAR fusion model to a lightweight camera model for efficient high-definition map construction in autonomous driving.


MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models

http://arxiv.org/abs/2407.11681v1

Compressor summary: The paper proposes a hybrid pruning method for large language models that uses gradients estimated from forward passes to efficiently remove less critical components and improve performance on various tasks.


Theoretical Insights into CycleGAN: Analyzing Approximation and Estimation Errors in Unpaired Data Generation

http://arxiv.org/abs/2407.11678v1

Compressor summary: The paper analyzes the excess risk of CycleGAN, a model that transforms unpaired data while ensuring consistent mappings, by decomposing it into approximation and estimation errors and exploring their trade-offs.


Video-Language Alignment Pre-training via Spatio-Temporal Graph Transformer

http://arxiv.org/abs/2407.11677v1

Compressor summary: The paper introduces a new module for pre-training video-language models that leverages spatio-temporal graph structure to learn contexts and improve alignment accuracy for downstream tasks.


SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation

http://arxiv.org/abs/2407.11676v1

Compressor summary: SKADA-Bench is a framework to evaluate and compare unsupervised domain adaptation methods fairly and realistically, using nested cross-validation and various scores.


Learning to Make Keypoints Sub-Pixel Accurate

http://arxiv.org/abs/2407.11668v1

Compressor summary: The authors propose a novel network that improves sub-pixel accuracy in detecting 2D local features by learning an offset vector for each feature, leading to better keypoint localization and faster computation.


Neural Compression of Atmospheric States

http://arxiv.org/abs/2407.11666v1

Compressor summary: The authors propose a method to compress large atmospheric data sets using neural networks, enabling faster access and analysis for various stakeholders.


Affective Behavior Analysis using Task-adaptive and AU-assisted Graph Network

http://arxiv.org/abs/2407.11663v1

Compressor summary: The paper presents a solution for the Multi-Task Learning Challenge of the ABAW7 Competition that uses a pre-trained model and cross-attention to extract features for action unit detection, facial expression recognition, and valence-arousal estimation tasks.


ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues

http://arxiv.org/abs/2407.11660v1

Compressor summary: The authors introduce GenResCoh, an open-source dataset for evaluating turn-level coherence in multilingual dialogues, created as an alternative to relying on the closed-source GPT-4.


Statistics-aware Audio-visual Deepfake Detector

http://arxiv.org/abs/2407.11650v1

Compressor summary: The paper proposes an improved audio-visual deepfake detection method that uses statistical features, waveform representation, normalization, and shallower networks to enhance performance and reduce complexity.


Dataset Dictionary Learning in a Wasserstein Space for Federated Domain Adaptation

http://arxiv.org/abs/2407.11647v1

Compressor summary: The paper proposes a novel decentralized dataset dictionary learning approach that uses Wasserstein barycenters to adapt multiple related and heterogeneous source datasets to an unlabeled target dataset without centralizing clients' data or violating privacy.


Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures

http://arxiv.org/abs/2407.11644v1

Compressor summary: The paper proposes Perception Helps Planning (PHP), a framework that integrates lane-level perception and planning for safe and efficient autonomous driving, achieving state-of-the-art performance on three CARLA benchmarks.


A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting

http://arxiv.org/abs/2407.11638v1

Compressor summary: This paper evaluates the ability of large language models (LLMs) in temporal event forecasting and proposes new methods using a constructed dataset and various input formats.


REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching

http://arxiv.org/abs/2407.11637v1

Compressor summary: REMM is a framework for multimodal image matching that encodes rotational differences in descriptors, improving performance and robustness to any angle.


Scaling Diffusion Transformers to 16 Billion Parameters

http://arxiv.org/abs/2407.11633v1

Compressor summary: DiT-MoE is a sparse diffusion Transformer that optimizes inference and achieves competitive performance with dense networks in conditional image generation using expert routing and balance loss.


Dynamic Dimension Wrapping (DDW) Algorithm: A Novel Approach for Efficient Cross-Dimensional Search in Dynamic Multidimensional Spaces

http://arxiv.org/abs/2407.11626v1

Compressor summary: The Dynamic Dimension Wrapping (DDW) algorithm is a new optimization method for efficiently searching multi-dimensional spaces with varying dimensions, using a fitness function based on mapping relationships between time series and a novel cross-dimensional search mechanism.


Beware of Validation by Eye: Visual Validation of Linear Trends in Scatterplots

http://arxiv.org/abs/2407.11625v1

Compressor summary: The text discusses two experiments on visual validation of linear regression models in scatterplots and finds that people are biased towards steeper slopes and error lines reduce bias but do not improve accuracy.


Rethinking Fair Graph Neural Networks from Re-balancing

http://arxiv.org/abs/2407.11624v1

Compressor summary: FairGB is a method to make Graph Neural Networks (GNNs) more fair by balancing the contributions of different groups in the data using counterfactual node mixup and contribution alignment loss.


Strategic Littlestone Dimension: Improved Bounds on Online Strategic Classification

http://arxiv.org/abs/2407.11619v1

Compressor summary: The paper studies online binary classification with strategic agents who can manipulate their features to get positive labels, introduces a new complexity measure for the learning problem, and provides algorithms with improved regret bounds in different settings.


Graph Dimension Attention Networks for Enterprise Credit Assessment

http://arxiv.org/abs/2407.11615v1

Compressor summary: Graph Dimension Attention Network (GDAN) improves credit risk evaluation by considering different feature dimensions using a dimension-level attention mechanism and provides edge-level interpretability through GDAN-DistShift.


MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction

http://arxiv.org/abs/2407.11610v1

Compressor summary: MergeNet is a novel method that predicts local connectivity and filters out irrelevant edges to reconstruct meshes from sparse point clouds efficiently and accurately.


The Foundations of Tokenization: Statistical and Computational Concerns

http://arxiv.org/abs/2407.11606v1

Compressor summary: The paper proposes a formal framework to analyze and design tokenization models for neural language modeling, addressing the lack of theory in this critical NLP step.


HyperAggregation: Aggregating over Graph Edges with Hypernetworks

http://arxiv.org/abs/2407.11596v1

Compressor summary: HyperAggregation is a novel aggregation function for Graph Neural Networks that uses a hypernetwork to generate weights for aggregating variable-sized neighborhoods, and shows promising results on various graph tasks.


AdaptEval: Evaluating Large Language Models on Domain Adaptation for Text Summarization

http://arxiv.org/abs/2407.11591v1

Compressor summary: The study evaluates how well Large Language Models adapt to different domains for summarization tasks and introduces a new evaluation suite called AdaptEval.


Progressive Pretext Task Learning for Human Trajectory Prediction

http://arxiv.org/abs/2407.11588v1

Compressor summary: The PPT framework progressively trains a model to predict pedestrian positions using short-term dynamics and long-term dependencies, achieving state-of-the-art performance with high efficiency.


QVD: Post-training Quantization for Video Diffusion Models

http://arxiv.org/abs/2407.11585v1

Compressor summary: The paper proposes a post-training quantization strategy (QVD) for video diffusion models to reduce latency and memory consumption while preserving temporal discriminability and improving channel coverage.


Enhancing stop location detection for incomplete urban mobility datasets

http://arxiv.org/abs/2407.11579v1

Compressor summary: The study applies classification algorithms to improve stop location detection from noisy or incomplete GPS datasets, using multiple features and prioritizing recall over precision.


UP-Diff: Latent Diffusion Model for Remote Sensing Urban Prediction

http://arxiv.org/abs/2407.11578v1

Compressor summary: The study proposes a new task and method (UP-Diff) for forecasting future urban layouts using remote sensing and planned change maps, which could help with city development planning.


TGIF: Text-Guided Inpainting Forgery Dataset

http://arxiv.org/abs/2407.11566v1

Compressor summary: Text-Guided Inpainting Forgery (TGIF) is a new dataset for evaluating image forgery localization and synthetic image detection methods in the context of text-guided inpainting, a powerful generative AI technique for editing images.


Self-Guided Generation of Minority Samples Using Diffusion Models

http://arxiv.org/abs/2407.11555v1

Compressor summary: The authors propose a self-contained, guided sampling method for generating realistic low-likelihood minority samples using diffusion models, with time-scheduling techniques to manage the guidance influence and improved performance on real datasets.


Optimizing KV Cache Eviction in LLMs: Adaptive Allocation for Enhanced Budget Utilization

http://arxiv.org/abs/2407.11550v1

Compressor summary: The authors propose an adaptive allocation algorithm for efficient and high-quality language model inference by reducing the cache size within a memory budget without compromising generation quality.
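
The mechanism the summary hints at can be illustrated with a toy sketch (hypothetical; not the paper's actual algorithm): instead of giving every attention head the same cache budget, distribute a global budget in proportion to each head's total attention mass, then evict the lowest-scoring cached positions per head.

```python
# Toy adaptive KV-cache budget allocation: heads that attend more
# broadly keep more cached tokens. Illustrative only; function names
# and the proportional rule are assumptions, not the paper's method.

def allocate_budgets(head_scores, total_budget):
    """head_scores: per-head lists of attention scores per cached token."""
    masses = [sum(s) for s in head_scores]
    total = sum(masses) or 1.0
    # Proportional allocation; the floor of 1 slot per head means the
    # sum can slightly exceed total_budget in this simplified sketch.
    return [max(1, round(total_budget * m / total)) for m in masses]

def evict(head_scores, budgets):
    """Keep the top-`budget` cached positions (by score) for each head."""
    kept = []
    for scores, b in zip(head_scores, budgets):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        kept.append(sorted(order[:b]))
    return kept

scores = [[0.5, 0.1, 0.9, 0.2], [0.05, 0.02, 0.01, 0.03]]
budgets = allocate_budgets(scores, total_budget=4)
print(budgets)                  # the high-mass head receives more slots
print(evict(scores, budgets))
```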


How Personality Traits Influence Negotiation Outcomes? A Simulation based on Large Language Models

http://arxiv.org/abs/2407.11549v1

Compressor summary: The paper presents a simulation framework using Large Language Model agents with synthetic personality traits that can mimic human negotiation behavior and strategically impact outcomes.


V2X-M2C: Efficient Multi-Module Collaborative Perception with Two Connections

http://arxiv.org/abs/2407.11546v1

Compressor summary: The paper proposes a collaborative perception model for autonomous vehicles that improves detection accuracy and reduces computational cost by communicating with other vehicles and infrastructures in either sequential or parallel connections.


A Discrete Perspective Towards the Construction of Sparse Probabilistic Boolean Networks

http://arxiv.org/abs/2407.11543v1

Compressor summary: The paper presents a new algorithm (GER) for building sparse PBNs, which are mathematical models used in various domains, and shows its superior performance over existing methods.


Understanding Counting in Small Transformers: The Interplay between Attention and Feed-Forward Layers

http://arxiv.org/abs/2407.11542v1

Compressor summary: The text analyzes simple transformer models trained on counting items in sequences, showing that different architectures can implement relation- or inventory-based counting mechanisms depending on various factors.


Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets

http://arxiv.org/abs/2407.11540v1

Compressor summary: The paper introduces NAIM, a transformer-based model that handles missing values in tabular datasets without imputation techniques, and shows its superior performance over existing models.


AEMIM: Adversarial Examples Meet Masked Image Modeling

http://arxiv.org/abs/2407.11537v1

Compressor summary: The authors propose using adversarial examples as reconstruction targets for masked image modeling to enhance representation learning, improve generalization, and increase robustness.


Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise

http://arxiv.org/abs/2407.11536v1

Compressor summary: This study explores why medical LLMs struggle with long-context understanding and suggests adjusting the fine-tuning data composition to improve their performance.


LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

http://arxiv.org/abs/2407.11534v1

Compressor summary: Low-Rank Quantization (LRQ) is a post-training weight quantization method for large language models that uses low-rank weight-scaling matrices to improve accuracy and reduce parameters, achieving better results than existing techniques under various quantization schemes.
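
Why scaling weights before quantization helps can be seen in a minimal sketch (the hand-picked scale below is illustrative; LRQ learns low-rank weight-scaling matrices, and this is not the paper's exact procedure):

```python
# Scaling an outlier weight down before uniform quantization shrinks
# the quantization step, so small weights survive; the scale is divided
# back out afterwards. Toy example with assumed names and values.

def quantize(ws, bits=4):
    """Symmetric uniform quantization of a list of weights."""
    qmax = 2 ** (bits - 1) - 1
    step = max(abs(w) for w in ws) / qmax
    return [round(w / step) * step for w in ws]

def scaled_quantize(ws, scales, bits=4):
    """Scale weights, quantize, then divide the scale back out."""
    deq = quantize([w * s for w, s in zip(ws, scales)], bits)
    return [q / s for q, s in zip(deq, scales)]

weights = [0.1, -0.2, 0.15, 8.0]          # one outlier dominates the range
plain = quantize(weights)
scaled = scaled_quantize(weights, [1.0, 1.0, 1.0, 0.1])  # shrink the outlier

err = lambda qs: sum(abs(q - w) for q, w in zip(qs, weights)) / len(weights)
print(err(plain), err(scaled))            # scaling lowers the mean error
```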


Length-Aware Motion Synthesis via Latent Diffusion

http://arxiv.org/abs/2407.11532v1

Compressor summary: The paper proposes a new model, Length-Aware Latent Diffusion (LADiff), that can generate 3D human motions with variable target lengths from textual descriptors.


FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models

http://arxiv.org/abs/2407.11522v1

Compressor summary: FIRE is a new dataset for improving vision language models' ability to refine their responses based on user feedback in various tasks, and it is used to evaluate and develop better models.


ColorwAI: Generative Colorways of Textiles through GAN and Diffusion Disentanglement

http://arxiv.org/abs/2407.11514v1

Compressor summary: The paper presents ColorwAI, a framework for generating textile colorways (samples with alternative colors but the same pattern) by disentangling color information from StyleGAN and Diffusion models; it introduces ShapleyVec, a variation of InterfaceGAN for supervised disentanglement, finds StyleGAN's W space most aligned with human color perception, and evaluates creative colorway generation with experts.


Reasoning with Large Language Models, a Survey

http://arxiv.org/abs/2407.11511v1

Compressor summary: The paper reviews prompt-based reasoning with large language models, explores different approaches and open problems, and discusses the relation between reasoning and other aspects of artificial intelligence.


Haze-Aware Attention Network for Single-Image Dehazing

http://arxiv.org/abs/2407.11505v1

Compressor summary: The Haze-Aware Attention Network (HAA-Net) combines a novel attention module based on atmospheric scattering and a multiscale frequency enhancement module to effectively remove haze from images, outperforming existing methods.


How Control Information Influences Multilingual Text Image Generation and Editing?

http://arxiv.org/abs/2407.11502v1

Compressor summary: The paper proposes TextGen, a framework to improve visual text generation by optimizing control information using Fourier analysis and a two-stage generation process, achieving state-of-the-art results in Chinese and English.


Diff-MTS: Temporal-Augmented Conditional Diffusion-based AIGC for Industrial Time Series Towards the Large Model Era

http://arxiv.org/abs/2407.11501v1

Compressor summary: The paper proposes a new model, Diff-MTS, for generating multivariate time series data in the industrial field, which improves diversity, fidelity, and utility compared to existing GAN-based methods.


An AI System for Continuous Knee Osteoarthritis Severity Grading Using Self-Supervised Anomaly Detection with Limited Data

http://arxiv.org/abs/2407.11500v1

Compressor summary: The paper proposes a method to automatically grade knee osteoarthritis severity using self-supervised anomaly detection, denoising with CLIP, and dual centre representation learning, outperforming existing techniques and achieving human-level correlation.


Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection

http://arxiv.org/abs/2407.11499v1

Compressor summary: The paper proposes a method called "Bridge Past and Future" (BPF) that aligns models across stages to overcome inconsistent optimization objectives in incremental object detection, and introduces a new loss called "Distillation with Future" (DwF) that leverages background probability to mitigate forgetting and improve adaptability.


Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction

http://arxiv.org/abs/2407.11494v1

Compressor summary: The paper proposes Semantic Latent Directions, a method to improve stochastic human motion prediction by constraining the latent space to learn meaningful motion semantics and offering controllable predictions with diverse queries.


A Meta-Learning Approach for Multi-Objective Reinforcement Learning in Sustainable Home Environments

http://arxiv.org/abs/2407.11489v1

Compressor summary: The paper proposes a meta-learning approach for residential appliance scheduling that adapts quickly to changing contexts, reduces electricity bills, increases user comfort, and saves utility while using less data and training time.


PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation

http://arxiv.org/abs/2407.11487v1

Compressor summary: The paper proposes a navigation method that aligns instructions with trajectories on a directed graph, improving efficiency and performance compared to previous methods.


An efficient framework based on large foundation model for cervical cytopathology whole slide image screening

http://arxiv.org/abs/2407.11486v1

Compressor summary: The paper proposes an efficient framework for cervical cytopathology WSI classification using unsupervised and weakly supervised learning, which enhances the performance of various MIL methods and achieves SOTA results.


Scientific QA System with Verifiable Answers

http://arxiv.org/abs/2407.11485v1

Compressor summary: The VerifAI project is an open-source system that generates and verifies referenced claims from scientific papers using semantic search, retrieval-augmented generation, and a verification engine.


The Oscars of AI Theater: A Survey on Role-Playing with Language Models

http://arxiv.org/abs/2407.11484v1

Compressor summary: The survey explores the development and challenges of role-playing with language models, focusing on their ability to create complex character simulations using various methods and resources.


Multi-Channel Masked Autoencoder and Comprehensive Evaluations for Reconstructing 12-Lead ECG from Arbitrary Single-Lead ECG

http://arxiv.org/abs/2407.11481v1

Compressor summary: This paper proposes a method to generate realistic 12-lead ECG signals from single-lead ECG using a multi-channel masked autoencoder, and introduces a benchmark for evaluating the quality of synthetic ECGs.


AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models

http://arxiv.org/abs/2407.11480v1

Compressor summary: This paper overviews generative models for industrial time series, discussing their applications, frameworks, technologies, and challenges.


XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More

http://arxiv.org/abs/2407.11477v1

Compressor summary: The XTraffic dataset combines spatiotemporally-aligned traffic and incident data to enable new research on traffic-related tasks with higher interpretability and practice.


Quantum Maximum Entropy Inference and Hamiltonian Learning

http://arxiv.org/abs/2407.11473v1

Compressor summary: The paper develops quantum versions of maximum entropy inference and graphical model learning algorithms, improves their convergence rates using quasi-Newton methods, and applies them to Hamiltonian learning.


Safe Online Convex Optimization with Multi-Point Feedback

http://arxiv.org/abs/2407.11471v1

Compressor summary: The paper proposes an algorithm for safe online convex optimization using only zero-order information, achieving sublinear regret and zero constraint violation with smooth and strongly convex constraints.
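
The "zero-order information" in the summary refers to learning from function values alone. A standard two-point zeroth-order gradient estimator, shown below as background (an assumed illustration of the setting, not the paper's exact algorithm), queries the loss at x ± δu along a random direction u:

```python
import math
import random

def two_point_grad(f, x, delta=1e-4, rng=None):
    """Two-point zeroth-order gradient estimate of f at x."""
    rng = rng or random.Random()
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]                 # random unit direction
    fp = f([xi + delta * ui for xi, ui in zip(x, u)])
    fm = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (fp - fm) / (2 * delta)
    return [scale * ui for ui in u]

f = lambda x: sum(xi * xi for xi in x)        # true gradient at [1, 2] is [2, 4]
rng = random.Random(0)
n = 4000
avg = [0.0, 0.0]
for _ in range(n):                            # unbiased: the average of many
    g = two_point_grad(f, [1.0, 2.0], rng=rng)  # estimates approaches [2, 4]
    avg = [a + gi / n for a, gi in zip(avg, g)]
print(avg)
```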


AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder

http://arxiv.org/abs/2407.11468v1

Compressor summary: The paper proposes a new video pre-training method for facial action unit detection that uses multi-label properties, temporal label consistency, and prior knowledge matrices to improve performance on limited data.


Graceful task adaptation with a bi-hemispheric RL agent

http://arxiv.org/abs/2407.11456v1

Compressor summary: A reinforcement learning agent with specialized hemispheres can exploit generalist knowledge for better initial performance on novel tasks while maintaining learning capabilities.


Isometric Representation Learning for Disentangled Latent Space of Diffusion Models

http://arxiv.org/abs/2407.11451v1

Compressor summary: Isometric Diffusion is a technique that improves diffusion models by learning a disentangled latent space for better interpolation, inversion, and attribute control.
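
For context on what "better interpolation" in a latent space means, spherical linear interpolation (slerp) is the standard construction that isometry-style regularizers aim to make semantically smooth; it is shown here as background, not as the paper's method:

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b at fraction t."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    theta = math.acos(max(-1.0, min(1.0, dot / (na * nb))))
    if theta < 1e-6:                      # nearly parallel: fall back to lerp
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sa = math.sin((1 - t) * theta) / math.sin(theta)
    sb = math.sin(t * theta) / math.sin(theta)
    return [sa * x + sb * y for x, y in zip(a, b)]

print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))  # midpoint on the unit circle
```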


Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights

http://arxiv.org/abs/2407.11449v1

Compressor summary: Controllable Contextualized Image Captioning (Ctrl-CIC) generates focused captions for images based on a user-defined highlight, using two approaches and a GPT-4V evaluator.


cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process

http://arxiv.org/abs/2407.11448v1

Compressor summary: The paper proposes a Bayesian nonparametric framework for multiple instance learning in histopathology image analysis, using cascade of Dirichlet processes to improve feature aggregation and prevent overfitting.


EARN Fairness: Explaining, Asking, Reviewing and Negotiating Artificial Intelligence Fairness Metrics Among Stakeholders

http://arxiv.org/abs/2407.11442v1

Compressor summary: EARN Fairness is a new framework that helps stakeholders without AI expertise choose and agree on fairness metrics for AI models.


Repurformer: Transformers for Repurposing-Aware Molecule Generation

http://arxiv.org/abs/2407.11439v1

Compressor summary: Repurformer is a model that uses multi-hop relationships among proteins and compounds to generate diverse molecules with desired properties for drug discovery, overcoming the sample bias problem.


Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild

http://arxiv.org/abs/2407.11438v1

Compressor summary: The study analyzes personal disclosures in human-chatbot interactions, revealing privacy risks from leaking identifiable information and sensitive topics in various contexts.


CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation

http://arxiv.org/abs/2407.11433v1

Compressor summary: The paper introduces CycleHOI, a new learning framework for computer vision that bridges detection and generation tasks using cycle consistency loss, feature distillation, and data augmentation to improve human-object interaction detection.


MRIo3DS-Net: A Mutually Reinforcing Images to 3D Surface RNN-like framework for model-adaptation indoor 3D reconstruction

http://arxiv.org/abs/2407.11431v1

Compressor summary: The paper presents an end-to-end framework that uses a recurrent neural network-like structure to mutually reinforce multi-view dense matching and point cloud surface optimization for indoor 3D reconstruction, improving both tasks and achieving better results.


Semi-Supervised Generative Models for Disease Trajectories: A Case Study on Systemic Sclerosis

http://arxiv.org/abs/2407.11427v1

Compressor summary: The authors propose a deep generative method that models complex disease trajectories, particularly Systemic Sclerosis, by learning latent temporal representations and using medical knowledge to semi-supervise the disentanglement of the latent space, enhancing interpretability and enabling the discovery of new disease aspects and sub-types as well as personalized monitoring and prediction.


Generally-Occurring Model Change for Robust Counterfactual Explanations

http://arxiv.org/abs/2407.11426v1

Compressor summary: The paper proposes a general framework for counterfactual explanations in machine learning that is robust to model and data changes.


Model Inversion Attacks Through Target-Specific Conditional Diffusion Models

http://arxiv.org/abs/2407.11424v1

Compressor summary: Diff-MI is a novel diffusion model-based method that improves generative fidelity and privacy invasion of model inversion attacks by incorporating the target classifier into the learning process and using an iterative image reconstruction technique.


Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

http://arxiv.org/abs/2407.11422v1

Compressor summary: Reflective instruction tuning improves LVLMs' reasoning proficiency by learning rationales behind correct and incorrect responses, as demonstrated by the REVERIE dataset and benchmark results.


States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

http://arxiv.org/abs/2407.11421v1

Compressor summary: The paper reveals that large language models can perform complex arithmetic calculations without explicit chain-of-thought steps, possibly using implicit discrete state representations, but these representations are not lossless and cause inaccuracies.


TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs

http://arxiv.org/abs/2407.11419v1

Compressor summary: The study proposes TeethDreamer, a framework that uses five intra-oral photos to reconstruct 3D dental models for remote orthodontic monitoring, improving on previous methods with better geometry accuracy.


SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions

http://arxiv.org/abs/2407.11417v1

Compressor summary: The SPINACH dataset and agent improve Knowledge Base Question Answering (KBQA) by handling complex questions and reasoning about large, incomplete schemas, achieving state-of-the-art results on several datasets.


SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models

http://arxiv.org/abs/2407.11414v1

Compressor summary: SDPT improves the performance of fusion-based VLPMs by using learnable unified prototype tokens to represent aligned semantics of text and image modalities across different prompts.


Representation Bias in Political Sample Simulations with Large Language Models

http://arxiv.org/abs/2407.11409v1

Compressor summary: The study examines how well the GPT-3.5-Turbo model simulates political behavior and opinions across countries, languages, demographics, and regimes, finding that it performs best in English-speaking bipartisan democracies.


Revisiting the Impact of Pursuing Modularity for Code Generation

http://arxiv.org/abs/2407.11406v1

Compressor summary: The study finds that modular programming does not necessarily improve the performance of code generation models using large language models, challenging conventional wisdom.


Accounting for Work Zone Disruptions in Traffic Flow Forecasting

http://arxiv.org/abs/2407.11407v1

Compressor summary: The paper proposes a new graph convolutional network model that incorporates roadway maintenance work-zone information to improve traffic speed forecasting, with benefits for the economy and public well-being.


Mapping savannah woody vegetation at the species level with multispectral drone and hyperspectral EnMAP data

http://arxiv.org/abs/2407.11404v1

Compressor summary: The study uses EnMAP and Sentinel-2 data to accurately map fractional woody cover of three species in a South African savannah, helping to protect the ecosystem from invasive plants.


EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis

http://arxiv.org/abs/2407.11401v1

Compressor summary: EndoFinder is a framework that uses content-based image retrieval to find similar polyps in a reference database, enabling explainable diagnostics and optical biopsy for colorectal cancer screening.


Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

http://arxiv.org/abs/2407.11398v1

Compressor summary: Animate3D is a novel framework for animating any static 3D model using multi-view video diffusion and 4D Score Distillation Sampling to achieve better spatiotemporal consistency.


DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

http://arxiv.org/abs/2407.11394v1

Compressor summary: DreamCatalyst is a novel framework that improves 3D editing quality and reduces training time by interpreting Score Distillation Sampling as a diffusion reverse process, offering fast and high-quality modes for NeRF scene editing.


CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation

http://arxiv.org/abs/2407.11393v1

Compressor summary: The paper proposes a method to generate diverse, high-quality, and focused image descriptions using a structured semantic representation, which improves the performance of controllable image captioning models.


InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains

http://arxiv.org/abs/2407.11384v1

Compressor summary: InvAgent is a novel approach using large language models to manage multi-agent inventory systems, enhancing resilience and efficiency in supply chain management by leveraging zero-shot learning, explainability, and adaptability.


TM-PATHVQA: 90,000+ Textless Multilingual Questions for Medical Visual Question Answering

http://arxiv.org/abs/2407.11383v1

Compressor summary: The authors introduce a multilingual speech-based VQA dataset for medical diagnostics and evaluate different systems using acoustic and visual features.


Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

http://arxiv.org/abs/2407.11382v1

Compressor summary: The paper presents a Segment, Lift, and Fit (SLF) algorithm that labels 3D object shapes from 2D prompts for autonomous driving by predicting 3D shapes instead of bounding boxes; it requires no dataset-specific training and achieves better generalization and pseudo-label quality than previous methods.


NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition

http://arxiv.org/abs/2407.11380v1

Compressor summary: NAMER is a novel non-autoregressive model for handwritten mathematical expression recognition that leverages visual and linguistic contexts and achieves better performance and speed than existing methods.


Exploring connections of spectral analysis and transfer learning in medical imaging

http://arxiv.org/abs/2407.11379v1

Compressor summary: The paper explores how spectral analysis of model gradients can reveal transfer learning biases and frequency shortcuts in medical imaging, and suggests source data editing as a way to reduce overfitting to artifacts.


Mask-Free Neuron Concept Annotation for Interpreting Neural Networks in Medical Domain

http://arxiv.org/abs/2407.11375v1

Compressor summary: The paper introduces MAMMI, a novel method for interpreting deep neural networks in medical imaging without expensive pixel-level annotations, enabling transparent clinical decision-making.


Reliable Reasoning Beyond Natural Language

http://arxiv.org/abs/2407.11373v1

Compressor summary: The study proposes a neurosymbolic approach using Prolog to improve LLMs' reasoning skills and introduces a new dataset for testing non-linear reasoning abilities.


Estimating Agreement by Chance for Sequence Annotation

http://arxiv.org/abs/2407.11371v1

Compressor summary: The paper proposes a model for generating random annotations to estimate chance agreement in sequence annotation tasks, which can help evaluate their reliability.
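
The idea of estimating chance agreement from random annotations can be sketched with a small Monte-Carlo simulation (a hypothetical setup for illustration, not the paper's model): two annotators each place random spans on a sequence, and we measure how often their spans coincide exactly by chance.

```python
import random

def random_spans(seq_len, n_spans, span_len, rng):
    """Place n_spans fixed-length spans at random distinct start positions."""
    starts = rng.sample(range(seq_len - span_len + 1), n_spans)
    return {(s, s + span_len) for s in starts}

def chance_agreement(seq_len, n_spans, span_len, trials=10000, seed=0):
    """Estimate the per-span exact-match rate between two random annotators."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = random_spans(seq_len, n_spans, span_len, rng)
        b = random_spans(seq_len, n_spans, span_len, rng)
        hits += len(a & b)
    return hits / (trials * n_spans)

# With 3 spans of length 5 on a 100-token sequence, exact matches by
# chance are rare but nonzero, which is what a chance-corrected
# agreement coefficient must account for.
print(chance_agreement(seq_len=100, n_spans=3, span_len=5))
```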


Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach

http://arxiv.org/abs/2407.11368v1

Compressor summary: The study compares three methods for translating ancient texts with sparse corpora and proposes a new inter methodological approach that outperforms previous models in BLEU score.


Graph Structure Prompt Learning: A Novel Methodology to Improve Performance of Graph Neural Networks

http://arxiv.org/abs/2407.11361v1

Compressor summary: GPL is a novel method that enhances GNN training by capturing intrinsic graph characteristics using task-independent structure losses, improving node and graph representations and achieving state-of-the-art results on several tasks.


Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias

http://arxiv.org/abs/2407.11360v1

Compressor summary: The paper compares human-AI interactions to giraffes and acacias on the Savannah, discussing how humans adapt to and shape AI while addressing ethical risks using the HHH framework.


Feature Inference Attack on Shapley Values

http://arxiv.org/abs/2407.11359v1

Compressor summary: The paper explores how Shapley value-based model interpretability methods can expose private features in machine learning models and suggests the need for privacy-preserving alternatives.


SES: Bridging the Gap Between Explainability and Prediction of Graph Neural Networks

http://arxiv.org/abs/2407.11358v1

Compressor summary: The paper proposes a new graph neural network (SES) that combines explainable training and enhanced predictive learning to improve accuracy and interpretability of predictions.


Flatfish Disease Detection Based on Part Segmentation Approach and Disease Image Generation

http://arxiv.org/abs/2407.11348v1

Compressor summary: The study introduces a new way of creating and using fish disease images for automated detection in farmed flatfish, improving performance by 12%.


I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM

http://arxiv.org/abs/2407.11347v1

Compressor summary: The inverse image-formation module enhances visual SLAM pipelines by integrating physical imaging and optimizing variables to handle motion blur and appearance variation in casually captured videos.


Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models

http://arxiv.org/abs/2407.11345v1

Compressor summary: The text discusses the challenges of detecting different types of speech errors caused by aphasia using automatic methods and presents novel approaches based on pretrained transformers and end-to-end models that perform better than previous ones.


Ev-GS: Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering

http://arxiv.org/abs/2407.11343v1

Compressor summary: Ev-GS is a new method that uses event cameras to reconstruct realistic views with less blur and improved quality, while being faster and more efficient than existing methods.


Continuity Preserving Online CenterLine Graph Learning

http://arxiv.org/abs/2407.11337v1

Compressor summary: CGNet is an end-to-end network that improves centerline graphs for autonomous driving by incorporating junction prediction, Bézier-space continuity constraints, and iterative refinement of topological connectivity.


LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

http://arxiv.org/abs/2407.11335v1

Compressor summary: The paper proposes LaMI-DETR, an open-vocabulary object detection method that leverages visual concepts and relationships to improve concept representation and avoid overfitting, achieving state-of-the-art performance on OV-LVIS.


COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation

http://arxiv.org/abs/2407.11315v1

Compressor summary: COMET is a novel approach for generating high-quality mathematical problems by combining stem generation and problem solving, using a three-stage fine-tuning framework guided by "Cone of Experience" and a Chinese multimodal mathematical problem dataset.


Digital Twin Vehicular Edge Computing Network: Task Offloading and Resource Allocation

http://arxiv.org/abs/2407.11310v1

Compressor summary: The paper proposes a method using digital twins and multi-agent reinforcement learning to optimize task offloading and resource allocation in vehicular edge computing networks.


Gaussian Splatting LK

http://arxiv.org/abs/2407.11309v1

Compressor summary: The authors propose a novel method to improve the reconstruction of dynamic 3D scenes from 2D images by regularizing the native warp field within the dynamic Gaussian Splatting framework using an analytical velocity field derived from the forward warp field network.


PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer

http://arxiv.org/abs/2407.11306v1

Compressor summary: PADRe is a framework that replaces self-attention in transformer models with polynomial functions for faster computation without sacrificing accuracy.


COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation

http://arxiv.org/abs/2407.11294v1

Compressor summary: The paper presents a graph-based masked autoencoder (GMAE) for generating realistic, context-sensitive urban layouts across various styles in US cities.