arxiv compressed, 2024-09-27

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-27 generated by the compressor, my personal LLM-based project.


Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

http://arxiv.org/abs/2409.17146v1

Compressor summary: The paper introduces Molmo, a new family of open VLMs with state-of-the-art performance, based on a novel image caption dataset collected from human speech annotations and diverse fine-tuning data.


DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion

http://arxiv.org/abs/2409.17145v1

Compressor summary: DreamWaltz-G is a novel learning framework for generating high-quality, animatable 3D avatars from text by integrating skeleton controls into 2D diffusion models and using a hybrid 3D Gaussian representation for efficient rendering and animation.


Differential Privacy Regularization: Protecting Training Data Through Loss Function Regularization

http://arxiv.org/abs/2409.17144v1

Compressor summary: The paper proposes an efficient regularization method for DP-SGD to protect privacy while training neural networks.
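For context, the DP-SGD baseline this regularization method targets clips each per-example gradient and adds Gaussian noise before averaging. A minimal pure-Python sketch of one aggregation step (the clipping norm `C` and noise scale `sigma` are illustrative, not values from the paper):

```python
import math
import random

def dpsgd_step(per_example_grads, C=1.0, sigma=0.5):
    """One DP-SGD aggregation step: clip each per-example gradient
    to L2 norm at most C, then add Gaussian noise scaled by sigma * C."""
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, C / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n, dim = len(clipped), len(clipped[0])
    noisy_sum = [
        sum(g[j] for g in clipped) + random.gauss(0.0, sigma * C)
        for j in range(dim)
    ]
    return [s / n for s in noisy_sum]  # averaged, privatized gradient

grads = [[3.0, 4.0], [0.1, -0.2]]  # first gradient has norm 5, gets clipped
update = dpsgd_step(grads)
```

The paper's contribution is to replace part of this machinery with a loss-function regularizer; the snippet only shows the standard recipe being improved upon.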


Attention Prompting on Image for Large Vision-Language Models

http://arxiv.org/abs/2409.17143v1

Compressor summary: The paper introduces a new visual prompting technique that enhances large vision-language models' abilities to follow text instructions by overlaying a text-query-guided attention heatmap on the input image.


FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression

http://arxiv.org/abs/2409.17141v1

Compressor summary: This paper analyzes neural network and transformer-based text compression techniques and proposes FineZip, an LLM-based system that improves compression time by combining online memorization and dynamic context, achieving compression performance comparable to prior LLM-based approaches while outperforming traditional methods.


Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents

http://arxiv.org/abs/2409.17140v1

Compressor summary: AXIS is a new framework that uses large language models to improve agent performance by prioritizing API actions over UI interactions, reducing task completion time and cognitive workload.


PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization

http://arxiv.org/abs/2409.17137v1

Compressor summary: Parameter-efficient fine-tuning (PEFT) improves vision transformer performance but can sacrifice generalization; PACE combines PEFT with consistency regularization to reduce gradient norms, align fine-tuned models with their pre-trained counterparts, and improve generalization.


Streaming Neural Images

http://arxiv.org/abs/2409.17134v1

Compressor summary: This paper explores the limitations of Implicit Neural Representations (INRs) for image compression, focusing on computational cost, performance stability, and robustness.


Assessing the Level of Toxicity Against Distinct Groups in Bangla Social Media Comments: A Comprehensive Investigation

http://arxiv.org/abs/2409.17130v1

Compressor summary: The study uses Bangla-BERT to identify and categorize toxic comments targeting transgender, indigenous, and migrant people on social media in the Bengali language, finding that Bangla-BERT performs best among various models.


On-orbit Servicing for Spacecraft Collision Avoidance With Autonomous Decision Making

http://arxiv.org/abs/2409.17125v1

Compressor summary: The study presents an AI-based autonomous servicer that uses Reinforcement Learning to detect collisions, dock with endangered satellites, and perform collision avoidance maneuvers.


Small data deep learning methodology for in-field disease detection

http://arxiv.org/abs/2409.17119v1

Compressor summary: The study presents a machine learning model that uses high-resolution images from the field to detect early symptoms of late blight in potato crops, overcoming previous limitations and showing promising results for disease detection in agriculture.


Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

http://arxiv.org/abs/2409.17115v1

Compressor summary: ProX is a novel framework that enables small language models to refine corpora by generating and executing fine-grained operations for each example, achieving human-level data quality and performance improvements across various tasks.


Characterizing stable regions in the residual stream of LLMs

http://arxiv.org/abs/2409.17113v1

Compressor summary: Stable regions in Transformers' outputs are areas where the model is insensitive to small changes but sensitive at boundaries, and these regions correspond to semantic distinctions.


Unveiling Ontological Commitment in Multi-Modal Foundation Models

http://arxiv.org/abs/2409.17109v1

Compressor summary: The text proposes a method to extract meaningful superclass hierarchies from multimodal deep neural networks for the validation and verification of qualitative reasoning models.


Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts

http://arxiv.org/abs/2409.17106v1

Compressor summary: Text2CAD is an AI framework that generates parametric CAD models from natural language instructions, speeding up prototyping and aiding design.


General Detection-based Text Line Recognition

http://arxiv.org/abs/2409.17095v1

Compressor summary: The paper presents DTLR, a detection-based text line recognition approach that works for various scripts, improving performance on Chinese and cipher recognition tasks.


Accumulator-Aware Post-Training Quantization

http://arxiv.org/abs/2409.17092v1

Compressor summary: The paper introduces AXE, a framework for post-training quantization that improves overflow avoidance and supports multi-stage accumulation for large language models.


Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification

http://arxiv.org/abs/2409.17091v1

Compressor summary: Ctrl-GenAug is a framework that uses diffusion-based generative augmentation to create high-quality synthetic medical datasets for downstream tasks while controlling semantics and sequential coherence.


Locally Regularized Sparse Graph by Fast Proximal Gradient Descent

http://arxiv.org/abs/2409.17090v1

Compressor summary: The SRSG method creates a sparse graph that preserves local geometric structure and uses support regularization to encourage smoothness, leading to better data clustering than existing methods.


Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation

http://arxiv.org/abs/2409.17085v1

Compressor summary: This paper explores how parameter-efficient fine-tuning methods can improve the reliability of monocular depth estimation using Bayesian neural networks.


Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?

http://arxiv.org/abs/2409.17080v1

Compressor summary: SVAT benchmark tests if large vision-language models can learn new visuospatial tasks from visual demonstrations alone, finding they struggle without curriculum learning.


Efficient Feature Interactions with Transformers: Improving User Spending Propensity Predictions in Gaming

http://arxiv.org/abs/2409.17077v1

Compressor summary: Dream11 uses a new transformer model to predict user spending in fantasy sports games based on past transactions, improving accuracy over existing models.


Enhancing Post-Hoc Attributions in Long Document Comprehension via Coarse Grained Answer Decomposition

http://arxiv.org/abs/2409.17073v1

Compressor summary: The paper proposes a novel method for decomposing generated answers into their source documents using in-context learning and negative sampling, aiming to improve answer attribution for long documents.


VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

http://arxiv.org/abs/2409.17066v1

Compressor summary: VPTQ is a vector quantization method for large language models that achieves extremely low-bit quantization using second-order optimization and improves accuracy and compression efficiency.
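As background, vector quantization compresses weights by mapping groups of values to entries of a shared codebook; a toy nearest-centroid sketch (the codebook, vectors, and plain Euclidean assignment are illustrative only, not the paper's second-order optimization):

```python
def quantize_vectors(vectors, codebook):
    """Map each weight vector to the index of its nearest codeword
    (squared Euclidean distance) -- the core step of vector quantization."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
            for v in vectors]

def dequantize(indices, codebook):
    """Reconstruct weights by codebook lookup; the per-vector storage
    cost drops to log2(len(codebook)) bits."""
    return [codebook[i] for i in indices]

codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
weights = [[0.9, 1.1], [-0.8, -1.2], [0.1, -0.1]]
idx = quantize_vectors(weights, codebook)  # each weight pair -> one index
```

Extremely low-bit regimes follow from shrinking the codebook; the paper's actual method additionally weights the assignment by curvature information.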


Benchmarking Domain Generalization Algorithms in Computational Pathology

http://arxiv.org/abs/2409.17063v1

Compressor summary: The study evaluates 30 domain generalization algorithms on 3 computational pathology tasks, finding self-supervised learning and stain augmentation to be the most effective methods.


Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors

http://arxiv.org/abs/2409.17058v1

Compressor summary: The paper proposes a one-step image super-resolution method using a degradation-guided Low-Rank Adaptation module that corrects model parameters based on low-resolution image information and improves efficiency and quality over existing diffusion-based methods.


DRIM: Learning Disentangled Representations from Incomplete Multimodal Healthcare Data

http://arxiv.org/abs/2409.17055v1

Compressor summary: DRIM is a multimodal deep learning method that captures shared and unique representations from diverse medical data to improve prognosis prediction and treatment pathways.


Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia

http://arxiv.org/abs/2409.17054v1

Compressor summary: The paper proposes using AI to automate transcription, translation, and summarization of doctor-patient conversations to improve efficiency, quality, and accuracy in Puskesmas, Indonesian primary healthcare centers.


ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis

http://arxiv.org/abs/2409.17049v1

Compressor summary: ControlCity is a multi-source geographic data transformation method that uses text, images, metadata, and road network data to generate accurate and diverse urban building footprints from volunteer geographic information.


Detecting Temporal Ambiguity in Questions

http://arxiv.org/abs/2409.17046v1

Compressor summary: TEMPAMBIQA is a new dataset to study temporally ambiguous open-domain questions and proposes novel search strategies and baselines to detect them.


GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design

http://arxiv.org/abs/2409.17045v1

Compressor summary: The paper introduces GeoBiked, a dataset for deep generative models (DGMs) in engineering design, and explores automated data labeling using foundation models, proposing two techniques to label technical images: hyperfeatures for geometric correspondence and text descriptions with VLMs, balancing creativity and accuracy.


How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not

http://arxiv.org/abs/2409.17044v1

Compressor summary: The paper investigates how different components of speech-to-text systems affect performance and whether the best adapter design varies depending on the model used.


Counterfactual Token Generation in Large Language Models

http://arxiv.org/abs/2409.17027v1

Compressor summary: The paper proposes a causal model that enables large language models to generate counterfactual alternatives to tokens they have previously generated, at low cost and without fine-tuning or prompt engineering; the authors implement it on Llama 3 8B-instruct and apply it to bias detection.


CombU: A Combined Unit Activation for Fitting Mathematical Expressions with Neural Networks

http://arxiv.org/abs/2409.17021v1

Compressor summary: CombU is a novel neural network activation function that combines existing functions at various dimensions across layers to approximate complex data relationships, showing better performance than current methods in experiments.


CNN Mixture-of-Depths

http://arxiv.org/abs/2409.17016v1

Compressor summary: Mixture-of-Depths (MoD) is a novel approach that improves the computational efficiency of CNNs by selectively processing channels based on their relevance to the current prediction, achieving similar or better performance with reduced inference times and parameters.
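The core idea of processing only the most relevant channels can be sketched as a top-k routing step (the relevance scores, k, and the stand-in "heavy op" are illustrative; the paper learns the routing):

```python
def route_top_k_channels(channels, scores, k):
    """Send only the k highest-scoring channels through the expensive
    path; the rest pass through unchanged, saving their compute."""
    order = sorted(range(len(channels)), key=lambda i: scores[i], reverse=True)
    selected = set(order[:k])
    def heavy_op(ch):            # stand-in for the expensive conv path
        return [2 * x for x in ch]
    return [heavy_op(c) if i in selected else c  # skipped channels: identity
            for i, c in enumerate(channels)]

feats = [[1, 1], [2, 2], [3, 3]]
out = route_top_k_channels(feats, scores=[0.1, 0.9, 0.5], k=2)
# channels 1 and 2 are processed; channel 0 is passed through untouched
```

The compute saving comes from `heavy_op` running on k channels instead of all of them, with k tunable against accuracy.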


AI-Driven Risk-Aware Scheduling for Active Debris Removal Missions

http://arxiv.org/abs/2409.17012v1

Compressor summary: The paper presents a deep reinforcement learning model for autonomous decision-making in orbital debris removal missions, enabling efficient and adaptive planning.


LLM-CARD: Towards a Description and Landscape of Large Language Models

http://arxiv.org/abs/2409.17011v1

Compressor summary: The paper presents a system to automatically extract key information about large language models from academic papers using NER and RE methods, aiming to help researchers with information overload.


Models Can and Should Embrace the Communicative Nature of Human-Generated Math

http://arxiv.org/abs/2409.17005v1

Compressor summary: The text argues that treating math as situated linguistic communication can benefit language models and suggests two case studies showing how language models interpret and generate math based on communicative intentions.


Adverse Weather Optical Flow: Cumulative Homogeneous-Heterogeneous Adaptation

http://arxiv.org/abs/2409.17001v1

Compressor summary: The paper proposes a framework to improve optical flow in adverse weather conditions by using synthetic degraded domains as an intermediate between clean and real data, and transferring knowledge progressively and explicitly.


INT-FlashAttention: Enabling Flash Attention for INT8 Quantization

http://arxiv.org/abs/2409.16997v1

Compressor summary: INT-FlashAttention is a fast and memory-efficient attention module that works with fully INT8 activations, remains compatible with other data formats, and improves inference speed while reducing quantization error.
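As background, fully-INT8 attention relies on quantizing activations to 8-bit integers plus a float scale; a minimal sketch of the symmetric absmax round trip (this is the textbook recipe, not necessarily the paper's exact scheme):

```python
def quantize_int8(xs):
    """Symmetric absmax quantization: map floats into [-127, 127]
    integers plus a single float scale for dequantization."""
    scale = max(abs(x) for x in xs) / 127.0
    q = [round(x / scale) for x in xs]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats by multiplying back by the scale."""
    return [v * scale for v in q]

q, s = quantize_int8([0.5, -1.27, 0.01])
approx = dequantize_int8(q, s)  # close to the original floats
```

Keeping the whole attention kernel in INT8 avoids the dequantize/requantize round trips that make mixed-precision kernels slower.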


What is the relationship between Slow Feature Analysis and the Successor Representation?

http://arxiv.org/abs/2409.16991v1

Compressor summary: The text compares two machine learning methods (SFA and SR) by analyzing their mathematical properties and applications in an MDP and a gridworld.


Single Image, Any Face: Generalisable 3D Face Generation

http://arxiv.org/abs/2409.16990v1

Compressor summary: The text presents Gen3D-Face, a new method that creates realistic 3D human face avatars from single unconstrained images using a multi-view consistent diffusion framework and geometry information, outperforming previous alternatives in both in-domain and out-of-domain settings.


Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

http://arxiv.org/abs/2409.16986v1

Compressor summary: Quad is a data selection method for large language models that uses data influence to balance quality and diversity, improving pre-training results.


AXCEL: Automated eXplainable Consistency Evaluation using LLMs

http://arxiv.org/abs/2409.16984v1

Compressor summary: AXCEL is a new prompt-based metric for evaluating generated text consistency that provides explanations and generalizes across tasks without changing the prompt, outperforming existing metrics.


Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions

http://arxiv.org/abs/2409.16974v1

Compressor summary: The paper reviews literature on large language models, their developments, impacts, limitations, and future directions, focusing on responsible development, algorithmic improvements, ethical challenges, and societal implications.


Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization

http://arxiv.org/abs/2409.16973v1

Compressor summary: ASLS is a framework that uses self-supervised learning techniques to personalize large language models dynamically and improve user engagement and satisfaction.


Bridge to Real Environment with Hardware-in-the-loop for Wireless Artificial Intelligence Paradigms

http://arxiv.org/abs/2409.16968v1

Compressor summary: The text describes a new approach to test machine learning solutions for VANET using hardware-in-the-loop, which combines simulated and real-world testing to avoid unexpected outcomes.


ABCFair: an Adaptable Benchmark approach for Comparing Fairness Methods

http://arxiv.org/abs/2409.16965v1

Compressor summary: ABCFair is a benchmark approach that adapts to different problem settings and enables proper comparability of fairness methods across various use cases, including pre-, in-, and postprocessing techniques on both traditional and dual label datasets.


Informed deep hierarchical classification: a non-standard analysis inspired approach

http://arxiv.org/abs/2409.16956v1

Compressor summary: The proposed lexicographic hybrid deep neural network (LH-DNN) is a novel multi-output architecture that efficiently classifies data according to multiple labels in a hierarchical structure.


Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition

http://arxiv.org/abs/2409.16954v1

Compressor summary: The paper proposes a method to improve low-resource language recognition in multilingual ASR models by using weighted cross-entropy and data augmentation.
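The weighted cross-entropy idea — upweighting low-resource languages' contribution to the loss — can be sketched as follows (the per-language weights and the low-resource language chosen are illustrative, not from the paper):

```python
import math

def weighted_cross_entropy(probs, targets, langs, lang_weight):
    """Weighted average negative log-likelihood: each example's loss is
    scaled by a per-language weight (higher for low-resource languages)."""
    total, weight_sum = 0.0, 0.0
    for p, t, lang in zip(probs, targets, langs):
        w = lang_weight[lang]
        total += -w * math.log(p[t])   # standard NLL, scaled by w
        weight_sum += w
    return total / weight_sum

# two examples: one high-resource (en), one low-resource (haw, hypothetical)
probs = [[0.7, 0.3], [0.4, 0.6]]
loss = weighted_cross_entropy(probs, targets=[0, 1],
                              langs=["en", "haw"],
                              lang_weight={"en": 1.0, "haw": 3.0})
```

With the weights above, errors on the low-resource language move the loss three times as much, pushing the shared model to allocate capacity to it.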


DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

http://arxiv.org/abs/2409.16949v1

Compressor summary: The paper presents a framework that uses an LLM and a diffusion model to generate semantically rich synthetic images for few-shot learning, adjusting the guidance weight based on CLIPScore to balance diversity and adherence to the target distribution, and outperforming baselines on several benchmarks.


NTIRE 2024 Challenge on Stereo Image Super-Resolution: Methods and Results

http://arxiv.org/abs/2409.16947v1

Compressor summary: The paper describes the 3rd NTIRE challenge on stereo image super-resolution, which aims to improve low-resolution stereo images by a factor of x4 while maintaining consistency and using limited resources.


Setting the AI Agenda -- Evidence from Sweden in the ChatGPT Era

http://arxiv.org/abs/2409.16946v1

Compressor summary: The paper analyzes how the AI meta-debate in Sweden has shifted from politicians to academics and become more focused on risks since ChatGPT's release.


Face Forgery Detection with Elaborate Backbone

http://arxiv.org/abs/2409.16945v1

Compressor summary: The text discusses improving Face Forgery Detection (FFD) models by revisiting the complete FFD workflow and integrating a ViT network with self-supervised learning to pre-train a backbone for better facial representation and forgery cue extraction.


Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model

http://arxiv.org/abs/2409.16938v1

Compressor summary: The paper proposes a new method for inserting objects into 3D scenes using Gaussian Splatting, MVInpainter, ControlNet, and mask-aware 3D reconstruction, which achieves better results than existing methods.


Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents

http://arxiv.org/abs/2409.16934v1

Compressor summary: The paper explores how to enhance named entity recognition (NER) for historical documents by identifying and neutralizing OCR-sensitive neurons in Transformer models.


Game4Loc: A UAV Geo-Localization Benchmark from Game Data

http://arxiv.org/abs/2409.16925v1

Compressor summary: The authors present a new vision-based geo-localization method for UAVs in GPS-denied environments, using a large-range dataset (GTA-UAV) built from computer games and weight-based contrastive learning that avoids post-processing matching steps.


AI-assisted Gaze Detection for Proctoring Online Exams

http://arxiv.org/abs/2409.16923v1

Compressor summary: The text describes a study that proposes an AI-assisted gaze detection system for online exams to help proctors identify test takers looking away from the screen and potentially using external resources.


Decomposition of Equivariant Maps via Invariant Maps: Application to Universal Approximation under Symmetry

http://arxiv.org/abs/2409.16922v1

Compressor summary: The paper presents a theory linking invariant and equivariant maps in group-symmetric deep neural networks, enabling the construction of universal equivariant architectures from invariant ones, and exploring their complexity and approximation rate.


Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness

http://arxiv.org/abs/2409.16914v1

Compressor summary: The paper introduces a new feature called token cohesiveness that helps detect if text is generated by a large language model or written by humans, and proposes a generic dual-channel detection method named TOCSIN.


Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing

http://arxiv.org/abs/2409.16913v1

Compressor summary: The paper evaluates and improves the performance of Role-Playing Agents in handling different types of conflicting queries using a representation editing approach.


Pruning Multilingual Large Language Models for Multilingual Inference

http://arxiv.org/abs/2409.16911v1

Compressor summary: The study explores how to improve zero-shot learning of multilingual large language models by leveraging their translation capabilities between English and other languages, focusing on features with large magnitude that are critical for translation.


Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering

http://arxiv.org/abs/2409.16909v1

Compressor summary: The paper presents a new framework that improves time-sensitive question answering by enhancing temporal awareness and reasoning using Temporal Information-Aware Embedding and Granular Contrastive Reinforcement Learning, outperforming existing language models in TSQA tasks.


An Adaptive Screen-Space Meshing Approach for Normal Integration

http://arxiv.org/abs/2409.16907v1

Compressor summary: The paper introduces an adaptive triangle mesh generation method for photometric stereo that uses normals to compute curvature, reduces vertex count, and speeds up normal integration.


Discriminative Anchor Learning for Efficient Multi-view Clustering

http://arxiv.org/abs/2409.16904v1

Compressor summary: The paper proposes DALMC, a method that learns discriminative view-specific features for multi-view clustering and builds a consensus anchor graph to capture complementary information across views.


AI-driven View Guidance System in Intra-cardiac Echocardiography Imaging

http://arxiv.org/abs/2409.16898v1

Compressor summary: The AI-driven closed-loop view guidance system assists operators in navigating intra-cardiac echocardiography catheter manipulation, improving accuracy and efficiency.


HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space

http://arxiv.org/abs/2409.16897v1

Compressor summary: The Hyperbolic Vision Transformer (HVT) extends the traditional Vision Transformer by using hyperbolic geometry to better model hierarchical and relational dependencies in images.


Shifting from endangerment to rebirth in the Artificial Intelligence Age: An Ensemble Machine Learning Approach for Hawrami Text Classification

http://arxiv.org/abs/2409.16884v1

Compressor summary: The paper presents various text classification models for Hawrami, an endangered Kurdish dialect, using four different algorithms and achieving 96% accuracy with Linear SVM.


Revisiting Space Mission Planning: A Reinforcement Learning-Guided Approach for Multi-Debris Rendezvous

http://arxiv.org/abs/2409.16882v1

Compressor summary: The paper proposes a masked PPO algorithm to optimize the sequence of visiting space debris using Lambert solver, achieving significant time efficiency and computational speed improvements over standard methods.


Automating Traffic Model Enhancement with AI Research Agent

http://arxiv.org/abs/2409.16876v1

Compressor summary: TR-Agent is an open-source AI system that autonomously develops and refines traffic models using four modules covering idea generation, theory formulation, evaluation, and optimization; it improves performance across multiple traffic models and explains its optimizations.


Ethical and Scalable Automation: A Governance and Compliance Framework for Business Applications

http://arxiv.org/abs/2409.16872v1

Compressor summary: This paper proposes a framework to ensure ethical and controllable AI in businesses by balancing factors like performance and explainability, validated through case studies in finance and healthcare.


Multi-objective Evolution of Heuristic Using Large Language Model

http://arxiv.org/abs/2409.16867v1

Compressor summary: The paper proposes a new LLM-based framework for automatic generation of multiple heuristics in heuristic search, considering efficiency and scalability, and shows its effectiveness on two combinatorial optimization problems.


Risk-averse learning with delayed feedback

http://arxiv.org/abs/2409.16866v1

Compressor summary: The paper proposes two risk-averse learning algorithms that account for delayed feedback and bounded delays, and shows that one-point algorithms achieve sublinear regret under certain conditions while two-point algorithms perform better overall.
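Risk-averse objectives typically replace the mean cost with a tail measure such as Conditional Value-at-Risk; a sketch of an empirical CVaR estimate (CVaR as the specific risk measure is an assumption here, not confirmed by the summary):

```python
def empirical_cvar(costs, alpha=0.9):
    """Empirical Conditional Value-at-Risk: the mean of the worst
    (1 - alpha) fraction of observed costs."""
    sorted_costs = sorted(costs, reverse=True)  # worst costs first
    k = max(1, int(round(len(costs) * (1 - alpha))))
    tail = sorted_costs[:k]
    return sum(tail) / len(tail)

costs = [1.0, 2.0, 3.0, 10.0, 1.5, 2.5, 1.2, 0.8, 9.0, 2.2]
risk = empirical_cvar(costs, alpha=0.8)  # mean of the worst 20% of costs
```

Minimizing this quantity, rather than the average cost, is what makes a learner "risk-averse": rare catastrophic outcomes dominate the objective.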


Linking in Style: Understanding learned features in deep learning models

http://arxiv.org/abs/2409.16865v1

Compressor summary: The paper proposes an automatic method to visualize and analyze learned features in convolutional neural networks using a linking network that maps the penultimate layer of a pre-trained classifier to the latent space of a generative model, enabling interpretability and quantification of representations.


Towards Unified 3D Hair Reconstruction from Single-View Portraits

http://arxiv.org/abs/2409.16863v1

Compressor summary: The paper proposes a novel method for reconstructing 3D hair from single-view images, including both braided and un-braided styles, using a large-scale synthetic dataset and specialized diffusion priors.


The Role of Language Models in Modern Healthcare: A Comprehensive Review

http://arxiv.org/abs/2409.16860v1

Compressor summary: Large language models can process complex medical data, generate natural language for various healthcare tasks, and improve clinical decision-making, but they face challenges such as privacy, bias, and ethics.


A Versatile and Differentiable Hand-Object Interaction Representation

http://arxiv.org/abs/2409.16855v1

Compressor summary: CHOIR is a novel, versatile, and fully differentiable representation of hand-object interactions that improves contact accuracy and physical realism in grasp refinement and synthesis tasks using JointDiffusion, a diffusion model that learns from noisy interactions or object geometries.


Dispute resolution in legal mediation with quantitative argumentation

http://arxiv.org/abs/2409.16854v1

Compressor summary: The paper proposes QuAM, a framework that integrates facts and norms in legal mediation, and develops a new formalism to model goal argument acceptability based on variable values.


Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms

http://arxiv.org/abs/2409.16850v1

Compressor summary: Our novel method for scene change detection uses a visual foundation model and full-image cross-attention to handle varying lighting, seasonal variations, and viewpoint differences, achieving significant improvements in F1-score and generalization over existing methods.


Exposing Assumptions in AI Benchmarks through Cognitive Modelling

http://arxiv.org/abs/2409.16849v1

Compressor summary: The authors suggest using explicit cognitive models to expose assumptions in cultural AI benchmarks, improving construct measurement and evaluation science.


IRASNet: Improved Feature-Level Clutter Reduction for Domain Generalized SAR-ATR

http://arxiv.org/abs/2409.16845v1

Compressor summary: IRASNet is a framework that uses clutter reduction and domain-invariant feature learning to improve automatic target recognition (ATR) in synthetic aperture radar (SAR) data.


Explicitly Modeling Pre-Cortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness

http://arxiv.org/abs/2409.16838v1

Compressor summary: The text introduces two new biologically-inspired CNN models that improve the robustness of image classification by simulating pre-cortical and V1 features, leading to better performance under common corruptions.


Demo2Vec: Learning Region Embedding with Demographic Information

http://arxiv.org/abs/2409.16837v1

Compressor summary: This study demonstrates how using demographic data improves region embedding and predictive performances for urban tasks like check-in prediction, crime rate prediction, and house price prediction.


Asynchronous Fractional Multi-Agent Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing

http://arxiv.org/abs/2409.16832v1

Compressor summary: The paper proposes a framework for fractional reinforcement learning to optimize task scheduling and minimize the age of information in mobile edge computing for cyber-physical systems.


Learning phase-space flows using time-discrete implicit Runge-Kutta PINNs

http://arxiv.org/abs/2409.16826v1

Compressor summary: The paper introduces a method to solve complex nonlinear systems of differential equations using advanced neural networks, which can efficiently model particles' motion in various fields.


Uncertainty Representations in State-Space Layers for Deep Reinforcement Learning under Partial Observability

http://arxiv.org/abs/2409.16824v1

Compressor summary: Kalman filter layers improve reinforcement learning under partial observability by incorporating uncertainty in the latent state representation, leading to better decisions.


XAI-guided Insulator Anomaly Detection for Imbalanced Datasets

http://arxiv.org/abs/2409.16821v1

Compressor summary: The paper proposes a novel pipeline using object detection and fine-tuning to improve anomaly detection in powerline insulator components, and employs explainable-AI tools for precise localization and explanation of defects.


CodeInsight: A Curated Dataset of Practical Coding Solutions from Stack Overflow

http://arxiv.org/abs/2409.16819v1

Compressor summary: The paper presents CodeInsight, a dataset for code generation crafted by Python experts for fine-tuning and evaluation, with clear-intent examples, code snippets, and unit tests covering Pandas, Numpy, Regex, and 70+ standard libraries from Stack Overflow; it is refined to reduce data contamination, tested on leading models including GPT-4, and available at \url{https://github.com/NathanaelBeau/CodeInsight}.


A parametric framework for kernel-based dynamic mode decomposition using deep learning

http://arxiv.org/abs/2409.16817v1

Compressor summary: The paper proposes a parametric framework for surrogate modelling that combines kernel-based dynamic mode decomposition with the LANDO algorithm, featuring an offline training stage and an online prediction stage, dimensionality reduction to cut computational cost, and three numerical examples demonstrating its efficiency and effectiveness.


Accelerating TinyML Inference on Microcontrollers through Approximate Kernels

http://arxiv.org/abs/2409.16815v1

Compressor summary: The authors propose a kernel-based approximation framework that reduces latency and memory usage of approximate CNNs on MCUs without sacrificing accuracy in TinyML applications.


PeerArg: Argumentative Peer Review with LLMs

http://arxiv.org/abs/2409.16813v1

Compressor summary: The paper proposes a new system called PeerArg that combines large language models with knowledge representation methods to support and understand peer review processes, and shows that it performs better than an end-to-end LLM for predicting paper acceptance from reviews.


Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices

http://arxiv.org/abs/2409.16808v1

Compressor summary: This paper evaluates object detection models' efficiency and performance on various edge devices, finding trade-offs between accuracy, speed, and energy efficiency.


A Few Hypocrites: Few-Shot Learning and Subtype Definitions for Detecting Hypocrisy Accusations in Online Climate Change Debates

http://arxiv.org/abs/2409.16807v1

Compressor summary: The paper introduces a new dataset and evaluation method for detecting different types of hypocrisy accusations in climate change discussions using large language models.


Topological SLAM in colonoscopies leveraging deep features and topological priors

http://arxiv.org/abs/2409.16806v1

Compressor summary: ColonSLAM is a system that creates topological maps of the colon using SLAM, deep features, and topological priors to relate submaps from different times.


Large Language Model Predicts Above Normal All India Summer Monsoon Rainfall in 2024

http://arxiv.org/abs/2409.16799v1

Compressor summary: The research adapts and fine-tunes an LLM to predict AISMR with high accuracy and low error, forecasting an above-normal monsoon for 2024.


Scalable Ensemble Diversification for OOD Generalization and Detection

http://arxiv.org/abs/2409.16797v1

Compressor summary: The paper introduces Scalable Ensemble Diversification (SED), a method that improves OOD generalization and detection by encouraging disagreement among models on hard training samples, without requiring OOD samples.
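The general recipe described here, identifying the samples the ensemble struggles with and then penalizing agreement among members on exactly those, can be sketched in a toy form (hypothetical helper and loss shape, not SED's actual objective):

```python
import numpy as np

def diversify_on_hard(probs, labels, frac_hard=0.25):
    """Toy diversification signal for an ensemble.

    probs: (M, N, C) class probabilities from M models on N samples.
    labels: (N,) true classes. Returns a penalty that is large when
    members AGREE on the hardest samples, so minimizing it spreads
    their predictions apart on those samples.
    """
    mean_p = probs.mean(axis=0)                       # (N, C) ensemble average
    nll = -np.log(mean_p[np.arange(len(labels)), labels] + 1e-9)
    k = max(1, int(frac_hard * len(labels)))
    hard = np.argsort(nll)[-k:]                       # hardest k samples
    p = probs[:, hard, :]                             # (M, k, C)
    # Agreement = mean pairwise dot product of member predictions.
    return np.einsum('mnc,onc->mno', p, p).mean()
```

Identical members maximize this agreement score; members that commit to different classes on the hard samples lower it, which is the disagreement the summary refers to.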


Spacewalker: Traversing Representation Spaces for Fast Interactive Exploration and Annotation of Unstructured Data

http://arxiv.org/abs/2409.16793v1

Compressor summary: Spacewalker is an interactive tool that helps users explore, annotate, and analyze unstructured data from various industries by visualizing it in low-dimensional spaces and detecting semantic similarities.


Symbolic State Partition for Reinforcement Learning

http://arxiv.org/abs/2409.16791v1

Compressor summary: The text proposes a method to improve reinforcement learning in continuous state spaces by using symbolic execution to extract partitions that capture the key structure of the environment dynamics, leading to faster learning and better policy performance.


Mitigating the Bias of Large Language Model Evaluation

http://arxiv.org/abs/2409.16788v1

Compressor summary: The paper proposes methods to reduce bias in LLM-as-a-Judge evaluations by calibrating closed-source models and using contrastive training with negative samples for open-source models, improving both probability and prompt level quality.


Enhancing Feature Selection and Interpretability in AI Regression Tasks Through Feature Attribution

http://arxiv.org/abs/2409.16787v1

Compressor summary: The study proposes a feature selection method that uses Explainable Artificial Intelligence to improve regression prediction accuracy and stability for blade vibration analysis in turbo machinery.


Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction

http://arxiv.org/abs/2409.16783v1

Compressor summary: HARM is a holistic method to test large language models by generating diverse and realistic adversarial examples using a fine-grained risk taxonomy and multi-turn probing.


LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ

http://arxiv.org/abs/2409.16779v1

Compressor summary: LLaMa-SciQ is an educational chatbot that helps students solve STEM MCQs by using Retrieval-Augmented Generation and a compressed LLaMa-8B model, which improves accessibility but does not significantly enhance accuracy.


Super Level Sets and Exponential Decay: A Synergistic Approach to Stable Neural Network Training

http://arxiv.org/abs/2409.16769v1

Compressor summary: The paper develops a dynamic learning rate algorithm for neural networks that improves optimization by ensuring stable and consistent training dynamics using Lyapunov stability principles.


Interpreting Deep Neural Network-Based Receiver Under Varying Signal-To-Noise Ratios

http://arxiv.org/abs/2409.16768v1

Compressor summary: The proposed method helps interpret convolutional neural networks by identifying which units contribute most or least to specific parameters, providing global and local insights, and generalizing to other architectures.


Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training

http://arxiv.org/abs/2409.16767v1

Compressor summary: The paper analyzes the interaction between data representations and classification weights using information-theoretic metrics and proposes new ones to improve supervised and semi-supervised learning.


MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features

http://arxiv.org/abs/2409.16765v1

Compressor summary: The paper introduces MaViLS, a benchmark dataset for video-to-slide alignment, together with a multimodal algorithm that aligns lecture videos with slides using speech, OCR text, and image features; the algorithm is faster and more accurate than SIFT, OCR and audio transcripts prove crucial for alignment, and matching accuracy varies across lectures with video quality and lecture style.


Offline and Distributional Reinforcement Learning for Radio Resource Management

http://arxiv.org/abs/2409.16764v1

Compressor summary: The paper proposes an offline and distributional reinforcement learning scheme for radio resource management that performs better than conventional models and online RL in real-world stochastic environments.


Statewide Visual Geolocalization in the Wild

http://arxiv.org/abs/2409.16763v1

Compressor summary: The paper introduces a method that uses aerial images to predict the geolocation of street-view photos in large regions, achieving 60.6% accuracy for non-panoramic photos in Massachusetts.


Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics

http://arxiv.org/abs/2409.16756v1

Compressor summary: LATEC is a large-scale benchmark that evaluates 17 XAI methods with 20 metrics, considering different model architectures and input data types, to help practitioners choose the best method for their problem.


E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL

http://arxiv.org/abs/2409.16751v1

Compressor summary: E-SQL is a novel pipeline for translating natural language queries into SQL that addresses challenges like database schema complexity, query ambiguity, and intricate query structures using direct schema linking and candidate predicate augmentation.


Commonly Interesting Images

http://arxiv.org/abs/2409.16736v1

Compressor summary: Images can be interesting depending on the viewer's preferences and characteristics, with some being universally appealing for their aesthetics and others being personally meaningful.


GB-RVFL: Fusion of Randomized Neural Network and Granular Ball Computing

http://arxiv.org/abs/2409.16735v1

Compressor summary: The paper proposes granular ball RVFL (GB-RVFL) and graph embedding GB-RVFL (GE-GB-RVFL), which improve the scalability and robustness of random vector functional link network for classification tasks by using granular balls as inputs and preserving the dataset's geometric structure.


Non-stationary BERT: Exploring Augmented IMU Data For Robust Human Activity Recognition

http://arxiv.org/abs/2409.16730v1

Compressor summary: The paper introduces OPPOHAR, a new dataset for human activity recognition using phone IMU data, and proposes Non-stationary BERT, a lightweight network with a two-stage training method and a data augmentation technique that outperforms existing approaches.


RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems

http://arxiv.org/abs/2409.16727v1

Compressor summary: This paper analyzes character hallucination in large language model-powered role-playing systems, introduces the RoleBreak framework to study it as an attack, and proposes the Narrator Mode defense strategy to improve query generalization and narrative coherence.


Verified Relative Safety Margins for Neural Network Twins

http://arxiv.org/abs/2409.16726v1

Compressor summary: The paper introduces Relative Safety Margins (RSMs) to measure how robustly two DNN classifiers make decisions on the same input, and proposes a framework to estimate RSM gains or losses under perturbations.
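The basic quantity behind such margins, how far the correct-class logit sits above the strongest rival, is easy to state; a hypothetical sketch of a per-input margin and the relative margin between two "twin" classifiers (illustrative definitions, not the paper's formal RSM):

```python
import numpy as np

def margin(logits, label):
    """Signed gap between the true-class logit and the best rival.
    Positive = correctly and confidently classified; negative = wrong."""
    rival = np.max(np.delete(logits, label))
    return logits[label] - rival

def relative_margin(logits_a, logits_b, label):
    """How much more safety margin network A has than its twin B
    on the same input (illustrative, simplified definition)."""
    return margin(logits_a, label) - margin(logits_b, label)
```

A positive relative margin on an input suggests A's decision there is more robust than B's; the paper's contribution is verifying bounds on such gains or losses under perturbations rather than measuring them pointwise.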


EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

http://arxiv.org/abs/2409.16723v1

Compressor summary: EAGLE is a novel multimodal large language model that can understand arbitrary referring visual prompts without specialized feature encoding or fine-tuning, using colored patches on images and geometry-agnostic learning.


PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning

http://arxiv.org/abs/2409.16722v1

Compressor summary: PMSS improves low-rank adaptation by selecting skeletons from pre-trained weights and learning a small matrix with high-rank updates for efficient inference.


A Multi-Dataset Classification-Based Deep Learning Framework for Electronic Health Records and Predictive Analysis in Healthcare

http://arxiv.org/abs/2409.16721v1

Compressor summary: The study proposes a novel framework that combines Residual Networks and Artificial Neural Networks to classify multiple datasets from electronic health records, achieving high accuracies for detecting diseases such as heart conditions, cirrhosis, and retinal issues.


Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification

http://arxiv.org/abs/2409.16718v1

Compressor summary: The paper proposes ClipFit, a method to improve zero-shot CLIP performance by fine-tuning specific bias terms and normalization layers without extra parameters.


Pose-Guided Fine-Grained Sign Language Video Generation

http://arxiv.org/abs/2409.16709v1

Compressor summary: The authors propose a new Pose-Guided Motion Model (PGMM) for generating high-quality sign language videos by using optical flow warping, pose fusion, and a novel metric to measure temporal consistency.


Probing Omissions and Distortions in Transformer-based RDF-to-Text Models

http://arxiv.org/abs/2409.16707v1

Compressor summary: The paper investigates how to probe omissions and distortions in RDF-to-Text generation using two methods, and finds that the encoder is responsible for some information loss.


Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation

http://arxiv.org/abs/2409.16706v1

Compressor summary: Pix2Next is a novel image translation framework that generates high-quality Near-Infrared images from RGB inputs using an encoder-decoder architecture with cross-attention and a multi-scale PatchGAN discriminator, improving both quantitative metrics and visual quality and enabling the scaling up of NIR datasets for computer vision applications.


Numerical Approximation Capacity of Neural Networks with Bounded Parameters: Do Limits Exist, and How Can They Be Measured?

http://arxiv.org/abs/2409.16697v1

Compressor summary: The text explores how bounded parameters affect neural networks' approximation capacity, introduces new concepts to measure it, and discusses connections with random parameter networks and various network design choices.


A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms

http://arxiv.org/abs/2409.16694v1

Compressor summary: The paper surveys low-bit quantization methods for large language models, covering their principles, implementations, strategies, frameworks, systems, techniques, and trends.


CaBRNet, an open-source library for developing and evaluating Case-Based Reasoning Models

http://arxiv.org/abs/2409.16693v1

Compressor summary: CaBRNet is a new framework to create self-explainable AI models that can be easily compared and reproduced.


Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model

http://arxiv.org/abs/2409.16689v1

Compressor summary: The paper introduces Layout-Corrector, a module that helps discrete diffusion models improve layout generation by identifying and correcting inharmonious elements in the layout.


MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making

http://arxiv.org/abs/2409.16686v1

Compressor summary: MSI-Agent is an embodied agent that improves planning and decision-making by summarizing, storing, and utilizing insight across different scales using a three-part pipeline.


Erase then Rectify: A Training-Free Parameter Editing Approach for Cost-Effective Graph Unlearning

http://arxiv.org/abs/2409.16684v1

Compressor summary: The paper proposes Erase then Rectify (ETR), a training-free approach for efficient and scalable graph unlearning that preserves model utility without additional training or full data access.


SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA

http://arxiv.org/abs/2409.16682v1

Compressor summary: The paper compares Text-to-SQL and E2E TQA for Table-based Question Answering, identifying their strengths and weaknesses, and proposes a synergistic approach that combines both methods using answer selectors.


GraphLoRA: Structure-Aware Contrastive Low-Rank Adaptation for Cross-Graph Transfer Learning

http://arxiv.org/abs/2409.16670v1

Compressor summary: GraphLoRA is a method to transfer well-trained Graph Neural Networks (GNNs) to different graph domains by aligning feature and structural distributions using low-rank adaptation and structure-aware regularization.


Topic-aware Causal Intervention for Counterfactual Detection

http://arxiv.org/abs/2409.16668v1

Compressor summary: The paper proposes a new counterfactual detection model that uses a neural topic model to capture global semantics, improves performance over existing models, and reduces bias via causal intervention.


A Character-Centric Creative Story Generation via Imagination

http://arxiv.org/abs/2409.16667v1

Compressor summary: The text introduces CCI, a novel story generation framework that uses DALL-E 3 to create visual representations of key elements and enhances character detail in creative stories.


TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans

http://arxiv.org/abs/2409.16666v1

Compressor summary: The paper introduces TalkinNeRF, a framework that learns to generate realistic full-body talking humans from monocular videos using a dynamic neural radiance field, capturing body pose, hand gestures, and facial expressions.


Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts

http://arxiv.org/abs/2409.16658v1

Compressor summary: The authors find that pre-trained language models often produce different generation probabilities and uncertainty distributions for unfaithfully generated texts, and use this to develop a training algorithm that reduces hallucination and improves faithfulness and quality.


The Credibility Transformer

http://arxiv.org/abs/2409.16653v1

Compressor summary: The authors propose a new Transformer architecture for tabular data with a credibility mechanism that improves stability and performance over existing models.


Progressive Representation Learning for Real-Time UAV Tracking

http://arxiv.org/abs/2409.16652v1

Compressor summary: The paper introduces PRL-Track, a novel framework for visual object tracking on UAVs that uses progressive representation learning with appearance and semantic regulators to handle aspect ratio changes and occlusion in complex environments.


Domain-Independent Automatic Generation of Descriptive Texts for Time-Series Data

http://arxiv.org/abs/2409.16647v1

Compressor summary: The authors propose a method to generate domain-independent descriptive texts for time-series data using two approaches, and create a new dataset called TACO to train a model based on contrastive learning.


Cross-Lingual and Cross-Cultural Variation in Image Descriptions

http://arxiv.org/abs/2409.16646v1

Compressor summary: The study shows how different languages describe images differently, with some entities being more or less frequently mentioned depending on language and culture.


Task Addition in Multi-Task Learning by Geometrical Alignment

http://arxiv.org/abs/2409.16645v1

Compressor summary: The text introduces a new method to improve molecular property prediction by adding tasks to an existing algorithm, enhancing its performance while keeping computation costs low.


Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

http://arxiv.org/abs/2409.16636v1

Compressor summary: Debate training improves language model performance in reading comprehension tasks, while consultancy training does not.


Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models

http://arxiv.org/abs/2409.16635v1

Compressor summary: The paper introduces Judgment of Thought, a prompt engineering technique that uses three roles (lawyer, prosecutor, and judge) to improve binary logical reasoning performance in both LLM benchmarks and real-world tasks.


Functional Stochastic Gradient MCMC for Bayesian Neural Networks

http://arxiv.org/abs/2409.16632v1

Compressor summary: Functional SGMCMC is a new method for inferring Bayesian neural networks that uses diffusion dynamics to incorporate informative functional priors and achieve better accuracy and uncertainty quantification.


Enhancing Nighttime UAV Tracking with Light Distribution Suppression

http://arxiv.org/abs/2409.16631v1

Compressor summary: This paper proposes LDEnhancer, a novel enhancer for nighttime UAV tracking that suppresses uneven light distribution and improves image content, and introduces a new dataset (NAT2024-2) for evaluating low-light enhancement methods.


Stochastic Subsampling With Average Pooling

http://arxiv.org/abs/2409.16630v1

Compressor summary: The authors propose stochastic average pooling, a new pooling module for deep neural networks that introduces Dropout-like stochasticity into pooling to achieve regularization without degrading performance.
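The general idea of Dropout-like stochasticity in pooling can be sketched as follows: at training time each window's output is a random convex combination of its elements (whose expectation is the window average), while at eval time plain average pooling is used. This is a generic 1-D sketch under those assumptions, not the paper's exact formulation:

```python
import numpy as np

def stochastic_avg_pool1d(x, k=2, training=True, rng=None):
    """Pool non-overlapping windows of size k.

    Training: random Dirichlet weights per window give noisy outputs
    whose mean equals the window average (Dropout-like regularization).
    Eval: plain average pooling, matching the training expectation.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = x[: len(x) // k * k].reshape(-1, k)   # drop any ragged tail
    if not training:
        return x.mean(axis=1)
    w = rng.dirichlet(np.ones(k), size=x.shape[0])   # random convex weights
    return (x * w).sum(axis=1)
```

Because the weights are convex, each training-time output stays within its window's min/max, and averaging at eval time keeps train and test statistics aligned in expectation.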


Ascend HiFloat8 Format for Deep Learning

http://arxiv.org/abs/2409.16626v1

Compressor summary: The paper proposes a new 8-bit floating-point data format for deep learning called HiFloat8 that balances precision and dynamic range and can be used in both training and inference of AI models.


On Your Mark, Get Set, Predict! Modeling Continuous-Time Dynamics of Cascades for Information Popularity Prediction

http://arxiv.org/abs/2409.16623v1

Compressor summary: ConCat is a model that uses neural ODEs and temporal point processes to predict information popularity by capturing the continuous-time dynamics of cascades with irregular events.


Entailment-Driven Privacy Policy Classification with LLMs

http://arxiv.org/abs/2409.16621v1

Compressor summary: The paper proposes an entailment-driven LLM-based framework that classifies paragraphs of lengthy, complicated privacy policies into meaningful labels, improving F1 score by 11.2% over traditional methods while providing explainable predictions that help users avoid uninformed consent to data collection.


Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment

http://arxiv.org/abs/2409.16620v1

Compressor summary: The paper introduces an optimized MCTS algorithm for solving complex decision-making problems like the FrozenLake environment, which uses cumulative reward and visit count tables along with UCT formula to learn efficiently and outperform other methods.
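The UCT formula the summary mentions balances exploitation (average reward from the tables) against exploration (visit counts). A standard implementation of the selection score, shown here as generic UCT rather than the paper's tuned variant:

```python
import math

def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Upper Confidence bound for Trees: mean reward plus an
    exploration bonus that shrinks as a child is visited more."""
    if visits == 0:
        return float('inf')   # always try unvisited children first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """Pick the index maximizing UCT; children = [(total_reward, visits), ...]."""
    return max(range(len(children)),
               key=lambda i: uct_score(*children[i], parent_visits))
```

With the infinite score for unvisited children, selection first covers every action once, then gradually concentrates on the highest-mean branches as counts grow.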


CasFT: Future Trend Modeling for Information Popularity Prediction with Dynamic Cues-Driven Diffusion Models

http://arxiv.org/abs/2409.16619v1

Compressor summary: CasFT is a method that predicts content popularity on social platforms by combining spatiotemporal patterns and future trends using neural ODEs and diffusion models.


Claim-Guided Textual Backdoor Attack for Practical Applications

http://arxiv.org/abs/2409.16618v1

Compressor summary: The Claim-Guided Backdoor Attack (CGBA) is a novel method that uses textual claims as triggers to fool language models without requiring post-distribution manipulation, enhancing the feasibility of practical backdoor attacks.


DeformStream: Deformation-based Adaptive Volumetric Video Streaming

http://arxiv.org/abs/2409.16615v1

Compressor summary: Deformation-based Adaptive Volumetric Video Streaming is a new framework that uses embedded deformation to improve volumetric video streaming performance by reducing bandwidth usage and ensuring visual coherence between frames, while accounting for network conditions and quality of experience.


Random Forest Regression Feature Importance for Climate Impact Pathway Detection

http://arxiv.org/abs/2409.16609v1

Compressor summary: The paper proposes a new method to identify and rank the causal chain of climate disturbances using Random Forest Regression and SHAP, and tests it on synthetic and real data sets.


Evaluating and Enhancing Large Language Models for Novelty Assessment in Scholarly Publications

http://arxiv.org/abs/2409.16605v1

Compressor summary: The paper introduces SchNovel, a benchmark to evaluate large language models' ability to assess novelty in scholarly papers, and RAG-Novelty, which simulates the human review process.


Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement

http://arxiv.org/abs/2409.16604v1

Compressor summary: The paper proposes a semi-supervised low-light image enhancement framework using mean-teacher, semantic-aware contrastive loss, Mamba-based backbone, and perceptive loss to address color cast and texture issues.


Overview of the First Shared Task on Clinical Text Generation: RRG24 and "Discharge Me!"

http://arxiv.org/abs/2409.16603v1

Compressor summary: The text describes a shared task involving natural language generation in healthcare to generate radiology reports and discharge summaries, aiming to reduce clinician workload.


EventHallusion: Diagnosing Event Hallucinations in Video LLMs

http://arxiv.org/abs/2409.16597v1

Compressor summary: EventHallusion is a benchmark to evaluate VideoLLMs' hallucination problem in video event comprehension, and Temporal Contrastive Decoding (TCD) is a method to improve their performance.


Pre-trained Graphformer-based Ranking at Web-scale Search (Extended Abstract)

http://arxiv.org/abs/2409.16590v1

Compressor summary: MPGraf is a new model that combines Transformers and Graph Neural Networks for learning to rank web search results by integrating their complementary strengths.


AutoSTF: Decoupled Neural Architecture Search for Cost-Effective Automated Spatio-Temporal Forecasting

http://arxiv.org/abs/2409.16586v1

Compressor summary: AutoSTF is a framework that efficiently searches for optimal neural network architectures for spatio-temporal forecasting by decoupling the search space and using multi-patch transfer.


SelectiveKD: A semi-supervised framework for cancer detection in DBT through Knowledge Distillation and Pseudo-labeling

http://arxiv.org/abs/2409.16581v1

Compressor summary: SelectiveKD is a semi-supervised learning framework that uses knowledge distillation to train effective cancer detection models for Digital Breast Tomosynthesis with limited annotated data.


Efficient and generalizable nested Fourier-DeepONet for three-dimensional geological carbon sequestration

http://arxiv.org/abs/2409.16572v1

Compressor summary: The nested Fourier-DeepONet is a machine learning model that improves efficiency and prediction accuracy for geological carbon sequestration simulations by combining two techniques, without compromising generalization and extrapolation ability.


Disentangling Questions from Query Generation for Task-Adaptive Retrieval

http://arxiv.org/abs/2409.16570v1

Compressor summary: The paper introduces EGG, a query generator that adapts to different search intents by compiling high-level tasks into task-adaptive queries, and shows its effectiveness on the BeIR benchmark using a smaller model than existing approaches.


Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels

http://arxiv.org/abs/2409.16563v1

Compressor summary: This paper explores using synthetic labels to improve a lightweight language model's performance on medical tasks, showing its potential for specializing large language models in the medical domain.


Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference

http://arxiv.org/abs/2409.16560v1

Compressor summary: The paper proposes dynamic-width speculative beam decoding (DSBD), a method that integrates speculative decoding and beam sampling to improve the speed and quality of inference for large language models.


EMIT- Event-Based Masked Auto Encoding for Irregular Time Series

http://arxiv.org/abs/2409.16554v1

Compressor summary: The paper introduces EMIT, a pretraining framework for irregular time series that uses event-based masking in the latent space to enhance model performance and robustness.


AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization

http://arxiv.org/abs/2409.16546v1

Compressor summary: The paper proposes a mixed-precision quantization approach that evaluates parameter importance using 'precision alignment' and develops a dynamic KV-Cache technique to reduce memory access latency and speed up large language model inference.
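Mixed-precision quantization of this kind generally keeps a small fraction of "important" values at higher precision and the rest at lower precision. A toy per-value scheme using magnitude as the importance criterion (hypothetical; AlignedKV's precision-alignment criterion is different):

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization to the given bit width."""
    levels = 2 ** (bits - 1) - 1
    m = float(np.max(np.abs(x)))
    scale = (m / levels) if m > 0 else 1.0
    return np.round(x / scale) * scale

def mixed_precision(x, frac_hi=0.25, hi=8, lo=4):
    """Quantize the largest-magnitude fraction at hi bits, rest at lo."""
    out = quantize(x, lo)
    k = max(1, int(frac_hi * x.size))
    idx = np.argsort(np.abs(x))[-k:]   # most 'important' values by magnitude
    out[idx] = quantize(x, hi)[idx]
    return out
```

The memory saving comes from storing most values at the low bit width; the point of the paper is choosing which values deserve the extra bits and organizing the cache so the mixed layout is fast to read.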


Monge-Kantorovich Fitting With Sobolev Budgets

http://arxiv.org/abs/2409.16541v1

Compressor summary: The paper studies how to approximate an $n$-dimensional probability measure using a measure on a lower-dimensional space, and proposes a functional to minimize the approximation error while constraining the complexity of the mapping between the spaces.


Context-aware and Style-related Incremental Decoding framework for Discourse-Level Literary Translation

http://arxiv.org/abs/2409.16539v1

Compressor summary: The authors present a novel method for translating literary texts that uses Continual Pre-training, Supervised Fine-tuning, and Incremental Decoding to maintain coherence and preserve original quality.


Source-Free Domain Adaptation for YOLO Object Detection

http://arxiv.org/abs/2409.16538v1

Compressor summary: SF-YOLO is a teacher-student framework that adapts YOLO object detection to new domains without using source data, achieving competitive results with simplicity and efficiency.


A QoE-Aware Split Inference Accelerating Algorithm for NOMA-based Edge Intelligence

http://arxiv.org/abs/2409.16537v1

Compressor summary: The paper proposes a resource allocation algorithm (ERA) for edge intelligence to balance inference delay, quality of experience (QoE), and resource consumption in model split inference scenarios.


Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models

http://arxiv.org/abs/2409.16535v1

Compressor summary: Prompt Sliders is a text-based method to learn and control image attributes across different diffusion models, improving efficiency and generalizability compared to Concept Sliders.


Graph Pruning Based Spatial and Temporal Graph Convolutional Network with Transfer Learning for Traffic Prediction

http://arxiv.org/abs/2409.16532v1

Compressor summary: The study proposes a novel network (TL-GPSTGN) that uses graph pruning and transfer learning to improve prediction accuracy in road networks with limited data.


Understanding the Cognitive Complexity in Language Elicited by Product Images

http://arxiv.org/abs/2409.16521v1

Compressor summary: The text describes how product images can reveal cognitive processes through the language used to describe them, and presents an approach for measuring and validating this cognitive complexity using natural language models.


SynChart: Synthesizing Charts from Language Models

http://arxiv.org/abs/2409.16517v1

Compressor summary: The paper explores how to use large language models (LLMs) for data generation in multi-modality tasks, focusing on chart understanding, and presents a large dataset and a chart-expert model that outperform GPT-4V.