This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-28, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.15241v1
Compressor summary: The paper introduces GenRec, a framework that learns spatial-temporal representations for video generation and recognition, and shows its effectiveness and robustness in various tasks.
http://arxiv.org/abs/2408.15240v1
Compressor summary: The authors propose a new way to improve reasoning performance of large language models by training verifiers jointly on generation and verification tasks, achieving better results than previous methods.
http://arxiv.org/abs/2408.15239v1
Compressor summary: The paper introduces a technique to generate videos with smooth motion between two input frames using a modified pretrained image-to-video diffusion model.
http://arxiv.org/abs/2408.15237v1
Compressor summary:
Key points:
- Linear RNN architectures like Mamba can compete with Transformers in language modeling
- Pretrained Transformers can be distilled into linear RNNs using attention weights
- The hybrid model outperforms some open-source models and achieves comparable performance to GPT-4 on chat benchmarks
- A hardware-aware speculative decoding algorithm accelerates the inference speed of Mamba and hybrid models
Summary: The paper shows how to distill pretrained Transformers into linear RNNs using attention weights, creating a hybrid model that performs well on language modeling tasks and is faster to decode with a hardware-aware algorithm.
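Speculative decoding, mentioned in the last key point, is worth unpacking: a cheap draft model proposes several tokens and the expensive target model verifies them with an accept/reject rule. The sketch below shows the standard speculative sampling scheme, not the paper's hardware-aware variant; the toy `draft_probs`/`target_probs` distributions are hypothetical stand-ins for real models.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 4  # toy vocabulary size

def draft_probs(ctx):
    """Cheap draft model: a hypothetical uniform distribution."""
    return np.full(V, 1.0 / V)

def target_probs(ctx):
    """Expensive target model: a hypothetical skewed distribution."""
    p = np.arange(1, V + 1, dtype=float)
    return p / p.sum()

def speculative_step(ctx, k=4):
    """Draft up to k tokens, then accept/reject so that accepted samples
    are distributed exactly as under the target model."""
    out = []
    for _ in range(k):
        q = draft_probs(ctx + out)
        x = int(rng.choice(V, p=q))
        p = target_probs(ctx + out)
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)                      # draft token accepted
        else:
            r = np.clip(p - q, 0.0, None)      # residual distribution
            out.append(int(rng.choice(V, p=r / r.sum())))
            break                              # stop after a rejection
    return out

print(speculative_step([]))
```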
http://arxiv.org/abs/2408.15232v1
Compressor summary: Co-STORM is an AI system that lets users explore unknown information through conversations with several LM agents that ask questions on the user's behalf and organize the results into a dynamic mind map and report.
http://arxiv.org/abs/2408.15213v1
Compressor summary: The paper presents a method to automatically classify populist language in political speeches using machine learning, achieving high accuracy rates across different contexts and data amounts.
http://arxiv.org/abs/2408.15204v1
Compressor summary: Confidence-Driv
http://arxiv.org/abs/2408.15185v1
Compressor summary: PoseWatch is a novel transformer-based architecture for detecting anomalous human behaviors in videos, using pose and motion information.
http://arxiv.org/abs/2408.15171v1
Compressor summary: The project proposes using Naive Bayes classification to estimate the factuality of summaries generated by large language models, addressing the problem of "hallucination."
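As a rough illustration of the approach in the preceding summary, here is a minimal Naive Bayes text classifier using scikit-learn; the tiny labeled examples are invented placeholders, and the project's actual features for factuality estimation are not shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: summaries labeled 1 (factual) or 0 (hallucinated).
texts = ["the study reports a 2% gain", "the model cures all diseases",
         "results match the baseline", "the authors invented the internet"]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a multinomial Naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["the paper reports modest gains"]))
```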
http://arxiv.org/abs/2408.15165v1
Compressor summary: The paper proposes a method to include long-range interactions in machine learning potentials for molecular simulations, improving predictions and reducing artifacts.
http://arxiv.org/abs/2408.15159v1
Compressor summary: This paper presents a new method for synthesizing facial expressions in sign language, which improves sign language production by integrating sentiment information and outperforms existing approaches on benchmark datasets.
http://arxiv.org/abs/2408.15158v1
Compressor summary: The paper studies a stochastic Multi-armed Bandit problem where the payoff depends on the delay, and provides optimal regret bounds for both cost and reward settings.
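For readers unfamiliar with the setting, the sketch below is a plain UCB1 loop on Bernoulli arms; the paper's delay-dependent payoff and its tailored regret analysis are not reproduced here, and the arm means are made up.

```python
import math
import random

def ucb1(arm_means, horizon=10000, seed=0):
    """Minimal UCB1 on Bernoulli arms: play each arm once, then pick the arm
    with the highest empirical mean plus exploration bonus."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n
    sums = [0.0] * n
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1  # initialization round
        else:
            arm = max(range(n), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

print(ucb1([0.2, 0.5, 0.8]))  # most pulls should go to the 0.8 arm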
http://arxiv.org/abs/2408.15143v1
Compressor summary:
Key points:
- Deep models excel at individual image restoration tasks but struggle with real-world challenges
- General image restoration (GIR) is a new problem that covers most individual tasks and aims to address generalization and complex degradations
- The paper defines GIR, introduces new datasets and an evaluation framework, and analyzes existing approaches
- The paper highlights the effectiveness of GIR and its practical difficulties, and suggests future directions for research
Summary: The paper proposes general image restoration (GIR), a unified problem that subsumes various individual image restoration tasks in real-world scenarios, and evaluates existing methods while identifying challenges and opportunities.
http://arxiv.org/abs/2408.15138v1
Compressor summary: The paper proposes a method to control positional correlations in sequence models on trees using encoder-only transformers that implement optimal Belief Propagation, which is shown by analyzing attention maps.
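Since the summary above hinges on Belief Propagation, here is exact sum-product BP on a tiny three-node chain (a tree), with made-up potentials; this shows what the transformers are claimed to implement, not the paper's model.

```python
import numpy as np

# Sum-product BP on a binary chain x0 - x1 - x2.
# psi[i] are unary potentials; phi is a shared pairwise potential.
psi = [np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.3, 0.7])]
phi = np.array([[0.9, 0.1],
                [0.1, 0.9]])       # favors equal neighboring states

# Leaf-to-root messages (root = x1), then the exact marginal at the root.
m0_to_1 = phi.T @ psi[0]           # message from x0 to x1
m2_to_1 = phi.T @ psi[2]           # message from x2 to x1
belief1 = psi[1] * m0_to_1 * m2_to_1
belief1 /= belief1.sum()           # exact marginal P(x1) on a tree
print(belief1)
```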
http://arxiv.org/abs/2408.15133v1
Compressor summary: The text discusses using causal inference methods and counterfactuals to generate natural language explanations for automated decision-making in Explainable AI, using a multi-step pipeline with LLMs.
http://arxiv.org/abs/2408.15128v1
Compressor summary: The text surveys tools and methods for evaluating the energy consumption of Machine Learning (ML), compares them, and provides a systematic literature review along with open-source repositories for further exploration.
http://arxiv.org/abs/2408.15121v1
Compressor summary: The study investigates how to select appropriate Explainable AI methods for medical devices in compliance with EU regulations, using a categorization of smart devices, an analysis of legal requirements, and a classification of XAI objectives.
http://arxiv.org/abs/2408.15119v1
Compressor summary: The paper presents an OCR model for Urdu text with transformer-based architecture and attention mechanisms, achieving high accuracy and handling diverse styles but facing challenges in certain conditions.
http://arxiv.org/abs/2408.15116v1
Compressor summary: The paper describes a mechanism by which future AI systems could develop alignment issues through destabilized priorities, and evaluates two risk factors for this mechanism using current large language models.
http://arxiv.org/abs/2408.15114v1
Compressor summary: The paper proposes a new method for learning Neural Signed Distance Functions (SDF) from sparse 3D point clouds by using adversarial samples to improve the representation of shapes.
http://arxiv.org/abs/2408.15113v1
Compressor summary: APC is a new system for anomaly detection in manufacturing images that uses fine-tuned feature extractors and memory banks to identify unusual features better than existing methods.
http://arxiv.org/abs/2408.15101v1
Compressor summary:
Key points:
- Multi-task dense scene understanding trains a model for multiple tasks and has many applications
- MTMamba++ is a new architecture with a Mamba-based decoder that handles long-range dependency and cross-task interaction
- It has two types of core blocks, STM and CTM, which use state-space models and feature/semantic perspectives to enhance information exchange
- Experiments show that MTMamba++ outperforms CNN-based and Transformer-based methods on various datasets
Summary: MTMamba++ is a novel architecture for multi-task dense scene understanding that uses a Mamba-based decoder, state-space models, and feature/semantic perspectives to improve performance over existing methods.
http://arxiv.org/abs/2408.15099v1
Compressor summary: The study finds that state-of-the-art Unsupervised Environment Design methods for reinforcement learning are not robust in a real-world robotics problem, and proposes a simple and intuitive approach based on training agents on levels with high learnability.
http://arxiv.org/abs/2408.15098v1
Compressor summary: The paper proposes a new model, CLIP-AGIQA, to assess the quality of AI-generated images using the visual and textual knowledge of CLIP, a powerful visual language model.
http://arxiv.org/abs/2408.15096v1
Compressor summary: The paper presents a new post-processing algorithm that reduces bias without sensitive attribute input and maintains minimal changes between biased and debiased predictions using a multiplicative factor on logit values.
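To make the mechanism concrete, the snippet below applies a single multiplicative factor to logits and maps back to probabilities; the factor `alpha` is a placeholder, as the paper's procedure for fitting it is not detailed in the summary.

```python
import numpy as np

def rescale_logits(p, alpha):
    """Multiply logits by alpha, then convert back to probabilities.

    alpha < 1 pulls predictions toward 0.5 (a minimal, attribute-free
    adjustment); alpha = 1 leaves them unchanged.
    """
    logit = np.log(p) - np.log1p(-p)
    return 1.0 / (1.0 + np.exp(-alpha * logit))

p = np.array([0.9, 0.2, 0.65])
print(rescale_logits(p, alpha=0.5))  # predictions move toward 0.5
```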
http://arxiv.org/abs/2408.15094v1
Compressor summary: The paper proposes constrained diffusion models that generate data with desired distributions and requirements, reducing the gap between original and generated data while obeying constraints.
http://arxiv.org/abs/2408.15091v1
Compressor summary: The paper proposes a relation-focused perspective to improve knowledge editing in transformer language models and reduce over-generalization.
http://arxiv.org/abs/2408.15079v1
Compressor summary:
Key points:
- Data processing pipeline to scale up and improve the quality of pretraining datasets
- BaichuanSEED: a 7B LLM baseline trained with the open-source data pipeline
- Comparable performance to commercial models on benchmarks
- Potential for further optimization on downstream tasks
Summary: The paper introduces an open-source data processing pipeline for large language models and shows that BaichuanSEED, a 7B LLM trained with it, performs well on various tasks.
http://arxiv.org/abs/2408.15077v1
Compressor summary: MMASD+ is a multimodal dataset and algorithm for autism diagnosis and action prediction that improves accuracy over single-modality approaches.
http://arxiv.org/abs/2408.15076v1
Compressor summary: MiWaves is an RL algorithm that delivers personalized intervention messages to reduce cannabis use among emerging adults.
http://arxiv.org/abs/2408.15073v1
Compressor summary: DAVOTS is an interactive visual analytics tool that helps users explore and understand explanations from deep neural networks for time series data using dense-pixel visualization, clustering, and ordering strategies.
http://arxiv.org/abs/2408.15069v1
Compressor summary: The paper proposes an efficient method to address geometric artifacts in Symmetric Multi-Linear Computed Tomography (SMLCT) by using the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) algorithm for image registration.
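GCC-PHAT itself is a standard signal-alignment tool; the 1-D version below (the paper applies the idea to image registration) whitens the cross-power spectrum so that only phase, i.e. the relative shift, contributes to the correlation peak.

```python
import numpy as np

def gcc_phat(sig, ref):
    """1-D GCC-PHAT: return the integer lag that best aligns `sig` to `ref`.
    The cross-power spectrum is normalized to unit magnitude (the phase
    transform), leaving only phase information in the correlation."""
    n = len(sig) + len(ref)
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-15
    cc = np.fft.irfft(R, n=n)
    shift = int(np.argmax(np.abs(cc)))
    return shift - n if shift > n // 2 else shift  # wrap negative lags

a = np.random.default_rng(0).standard_normal(256)
b = np.roll(a, 7)            # b is a delayed (circularly) by 7 samples
print(gcc_phat(b, a))        # expected lag: 7
```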
http://arxiv.org/abs/2408.15063v1
Compressor summary:
Key points:
- The paper proposes a novel framework to use pre-trained SAM for multi-modal SOD tasks
- The framework incorporates multi-modal saliency-specific knowledge into SAM using semantic feature fusion and an adapter
- The framework also uses semantic-geometric prompts to generate embeddings with various saliency cues
- Experiments show the effectiveness of the proposed framework on RGB-D and RGB-T SOD benchmarks
Summary: The paper presents a new framework that adapts pre-trained SAM for multi-modal SOD by fusing multi-modal semantic features, using an adapter to encode them, and generating embeddings with saliency cues. The framework outperforms existing methods on RGB-D and RGB-T datasets.
http://arxiv.org/abs/2408.15057v1
Compressor summary: mobDRF is an interpretable representation learning algorithm that uses transparent IF-THEN rules to improve existing models without sacrificing accuracy, helping to identify key risk factors for cognitive decline in healthcare.
http://arxiv.org/abs/2408.15055v1
Compressor summary:
Key points:
- HTE and CATE are important for personalized treatment recommendations
- Existing approaches are good at estimating HTE but not very interpretable
- Causal Rule Forest (CRF) is a new model that learns hidden patterns and converts them into interpretable rules
- CRF improves the performance and interpretability of other causal inference models for HTE and CATE estimation
Summary: The authors propose Causal Rule Forest, a novel model that learns hidden patterns from data and generates interpretable rules to improve the accuracy and interpretability of personalized treatment recommendations based on HTE and CATE.
http://arxiv.org/abs/2408.15050v1
Compressor summary: The paper proposes a new method called BoxTM that improves topic taxonomy discovery by using box embeddings to model semantic scopes and asymmetric distances for hierarchical relations among topics.
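The asymmetry that box embeddings provide is easy to see in a few lines; the sketch below scores how much one axis-aligned box sits inside another, with arbitrary example boxes (the paper's training objective is not shown).

```python
import numpy as np

def volume(lo, hi):
    """Volume of an axis-aligned box; zero if it is empty."""
    return float(np.prod(np.clip(hi - lo, 0.0, None)))

def containment(child, parent):
    """Asymmetric score vol(child ∩ parent) / vol(child):
    1.0 if child lies inside parent, small if it does not."""
    lo = np.maximum(child[0], parent[0])
    hi = np.minimum(child[1], parent[1])
    return volume(lo, hi) / volume(*child)

sports   = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))  # broad topic box
football = (np.array([1.0, 1.0]), np.array([2.0, 2.0]))  # narrower topic box
print(containment(football, sports))   # 1.0: football fits inside sports
print(containment(sports, football))   # 0.0625: not the other way around
```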
http://arxiv.org/abs/2408.15045v1
Compressor summary: DocLayLLM is a multi-modal extension of LLMs for text-rich document understanding, leveraging visual and positional tokens, and enhancing perception of OCR information with chain-of-thought techniques.
http://arxiv.org/abs/2408.15041v1
Compressor summary: The paper proposes a new technique using Graph Neural Networks and Deep Reinforcement Learning to select and schedule Earth observation satellite observations while respecting constraints and maximizing benefit.
http://arxiv.org/abs/2408.15040v1
Compressor summary: The paper gives an introduction to large language models and their applications in EU languages, describing different families of models and the data used to train them.
http://arxiv.org/abs/2408.15038v1
Compressor summary: The text introduces a new method (DNMMSI) for estimating occlusion boundaries in images, a synthetic benchmark (OB-FUTURE) for training, and a real benchmark (OB-LabName) for evaluation.
http://arxiv.org/abs/2408.15037v1
Compressor summary: EATQA is a framework that improves generative question answering by predicting all combinations of (Question, Evidence, Answer) triplets and learning the logical relations between them.
http://arxiv.org/abs/2408.15032v1
Compressor summary: Mamba2MIL is a novel framework for Multiple Instance Learning in Computational Pathology that improves feature fusion and sequence information utilization, achieving better performance than existing methods on various datasets.
http://arxiv.org/abs/2408.15026v1
Compressor summary: The text proposes a sequence-aware self-supervised pre-training method for cardiac ultrasound probe guidance that learns personalized 2D and 3D cardiac structural features, reducing navigation errors.
http://arxiv.org/abs/2408.15020v1
Compressor summary:
Key points:
- The paper proposes a hierarchical graph interaction network (HGINet) for camouflaged object detection
- HGINet uses region-aware token focusing attention, hierarchical graph interaction transformer, and confidence aggregated feature fusion modules
- HGINet outperforms existing methods on four datasets
Summary: The paper introduces HGINet, a network that detects camouflaged objects through hierarchical feature interaction using attention and fusion modules, achieving state-of-the-art results on four datasets.
http://arxiv.org/abs/2408.15011v1
Compressor summary: TPP is a simple framework that pre-trains the new parameters introduced during fine-tuning on a defined pretext task, improving performance in self-supervised learning.
http://arxiv.org/abs/2408.14998v1
Compressor summary: FastTextSpotter is a new framework that combines Swin Transformer and Transformer Encoder-Decoder with a faster self-attention unit to improve the accuracy and efficiency of text spotting in various environments, achieving state-of-the-art results for multilingual scene text.
http://arxiv.org/abs/2408.14991v1
Compressor summary: The paper surveys transformer techniques for speech processing and recognition tasks, covering background, models, data, features, architecture, decoding, evaluation, and toolkits, while discussing challenges and future directions.
http://arxiv.org/abs/2408.14976v1
Compressor summary: The paper proposes a novel method called Prior-free Balanced Replay (PBR) for long-tailed continual learning that uses uncertainty-guided reservoir sampling and two prior-free components to reduce forgetting without knowing the label distribution of the data stream.
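The reservoir-sampling component is the classic streaming primitive below (Algorithm R); PBR's uncertainty-guided acceptance rule would replace the uniform `j < k` test, but that rule is specific to the paper and is not reproduced here.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Algorithm R: maintain a uniform random sample of size k from a
    stream of unknown length, using O(k) memory."""
    rng = random.Random(seed)
    buf = []
    for i, x in enumerate(stream):
        if i < k:
            buf.append(x)          # fill the reservoir first
        else:
            j = rng.randint(0, i)  # item i kept with probability k/(i+1)
            if j < k:
                buf[j] = x
    return buf

print(reservoir_sample(range(1000), k=5))
```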
http://arxiv.org/abs/2408.14975v1
Compressor summary: MegActor-Σ is a new mixed-modal conditional diffusion transformer that enables flexible control of portrait animation using both audio and visual inputs, achieving better results than previous methods.
http://arxiv.org/abs/2408.14972v1
Compressor summary: The text introduces AgentMonitor, a framework that predicts multi-agent system performance before execution and enhances their security by detecting and correcting malicious agents in real time.
http://arxiv.org/abs/2408.14964v1
Compressor summary: The Multi-Modal Fusion framework combines Graph Neural Networks and Large Language Models to improve molecular property predictions by leveraging their complementary strengths in graph data and linguistic knowledge.
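A common way to combine such modalities is simple late fusion, sketched here with PyTorch; the embedding dimensions and the MLP head are arbitrary choices for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Toy late-fusion head: concatenate a graph embedding and a text
    embedding, then regress a molecular property."""
    def __init__(self, d_graph=64, d_text=768, d_hidden=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(d_graph + d_text, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, g_emb, t_emb):
        return self.head(torch.cat([g_emb, t_emb], dim=-1))

model = LateFusion()
pred = model(torch.randn(8, 64), torch.randn(8, 768))  # batch of 8 molecules
print(pred.shape)  # torch.Size([8, 1])
```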
http://arxiv.org/abs/2408.14962v1
Compressor summary:
http://arxiv.org/abs/2408.14961v1
Compressor summary:
http://arxiv.org/abs/2408.14960v1
Compressor summary:
http://arxiv.org/abs/2408.14950v1
Compressor summary:
http://arxiv.org/abs/2408.14941v1
Compressor summary:
http://arxiv.org/abs/2408.14935v1
Compressor summary:
http://arxiv.org/abs/2408.14930v1
Compressor summary:
http://arxiv.org/abs/2408.14916v1
Compressor summary:
http://arxiv.org/abs/2408.14909v1
Compressor summary:
http://arxiv.org/abs/2408.14906v1
Compressor summary:
http://arxiv.org/abs/2408.14899v1
Compressor summary:
http://arxiv.org/abs/2408.14895v1
Compressor summary:
http://arxiv.org/abs/2408.14892v1
Compressor summary:
http://arxiv.org/abs/2408.14874v1
Compressor summary:
http://arxiv.org/abs/2408.14871v1
Compressor summary:
http://arxiv.org/abs/2408.14868v1
Compressor summary:
http://arxiv.org/abs/2408.14866v1
Compressor summary:
http://arxiv.org/abs/2408.14864v1
Compressor summary:
http://arxiv.org/abs/2408.14860v1
Compressor summary:
http://arxiv.org/abs/2408.14855v1
Compressor summary:
http://arxiv.org/abs/2408.14853v1
Compressor summary:
http://arxiv.org/abs/2408.14849v1
Compressor summary:
http://arxiv.org/abs/2408.14846v1
Compressor summary:
http://arxiv.org/abs/2408.14845v1
Compressor summary:
http://arxiv.org/abs/2408.14843v1
Compressor summary:
http://arxiv.org/abs/2408.14842v1
Compressor summary:
http://arxiv.org/abs/2408.14841v1
Compressor summary:
http://arxiv.org/abs/2408.14840v1
Compressor summary:
http://arxiv.org/abs/2408.14837v1
Compressor summary:
http://arxiv.org/abs/2408.14829v1
Compressor summary:
http://arxiv.org/abs/2408.14826v1
Compressor summary:
http://arxiv.org/abs/2408.14825v1
Compressor summary:
http://arxiv.org/abs/2408.14823v1
Compressor summary:
http://arxiv.org/abs/2408.14821v1
Compressor summary:
http://arxiv.org/abs/2408.14819v1
Compressor summary:
http://arxiv.org/abs/2408.14817v1
Compressor summary:
http://arxiv.org/abs/2408.14812v1
Compressor summary:
http://arxiv.org/abs/2408.14811v1
Compressor summary:
http://arxiv.org/abs/2408.14809v1
Compressor summary:
http://arxiv.org/abs/2408.14806v1
Compressor summary:
http://arxiv.org/abs/2408.14805v1
Compressor summary:
http://arxiv.org/abs/2408.14802v1
Compressor summary:
http://arxiv.org/abs/2408.14791v1
Compressor summary:
http://arxiv.org/abs/2408.14788v1
Compressor summary:
http://arxiv.org/abs/2408.14785v1
Compressor summary:
http://arxiv.org/abs/2408.14780v1
Compressor summary:
http://arxiv.org/abs/2408.14774v1
Compressor summary:
http://arxiv.org/abs/2408.14772v1
Compressor summary:
http://arxiv.org/abs/2408.14770v1
Compressor summary:
http://arxiv.org/abs/2408.14765v1
Compressor summary:
http://arxiv.org/abs/2408.14764v1
Compressor summary:
http://arxiv.org/abs/2408.14763v1
Compressor summary:
http://arxiv.org/abs/2408.14762v1
Compressor summary:
http://arxiv.org/abs/2408.14757v1
Compressor summary:
http://arxiv.org/abs/2408.14756v1
Compressor summary:
http://arxiv.org/abs/2408.14750v1
Compressor summary:
http://arxiv.org/abs/2408.14744v1
Compressor summary:
http://arxiv.org/abs/2408.14743v1
Compressor summary:
http://arxiv.org/abs/2408.14738v1
Compressor summary:
http://arxiv.org/abs/2408.14734v1
Compressor summary:
http://arxiv.org/abs/2408.14732v1
Compressor summary:
http://arxiv.org/abs/2408.14724v1
Compressor summary:
http://arxiv.org/abs/2408.14723v1
Compressor summary:
http://arxiv.org/abs/2408.14721v1
Compressor summary: