This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-26 generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2409.17146v1
Compressor summary: The paper introduces Molmo, a new family of open VLMs with state-of-the-art performance, based on a novel image caption dataset collected from human speech annotations and diverse fine-tuning data.
http://arxiv.org/abs/2409.17145v1
Compressor summary: DreamWaltz-G is a novel learning framework for generating high-quality, animatable 3D avatars from text by integrating skeleton controls into 2D diffusion models and using a hybrid 3D Gaussian representation for efficient rendering and animation.
http://arxiv.org/abs/2409.17144v1
Compressor summary: The paper proposes an efficient regularization method for DP-SGD to protect privacy while training neural networks.
http://arxiv.org/abs/2409.17143v1
Compressor summary: The paper introduces a new visual prompting technique that enhances large vision-language models' abilities to follow text instructions by overlaying a text-query-guided attention heatmap on the input image.
http://arxiv.org/abs/2409.17141v1
Compressor summary: This paper analyzes neural network and transformer-based text compression techniques and proposes FineZip, an LLM-based system that improves compression time by combining online memorization and dynamic context, achieving comparable performance and outperforming traditional methods.
http://arxiv.org/abs/2409.17140v1
Compressor summary: AXIS is a new framework that uses large language models to improve agent performance by prioritizing API actions over UI interactions, reducing task completion time and cognitive workload.
http://arxiv.org/abs/2409.17137v1
Compressor summary: PEFT improves vision transformer performance but sacrifices generalization; PACE combines PEFT with consistency regularization to reduce gradients, align models, and improve generalization in fine-tuned models.
http://arxiv.org/abs/2409.17134v1
Compressor summary: This paper explores the limitations of Implicit Neural Representations (INRs) for image compression, focusing on computational cost, performance stability, and robustness.
http://arxiv.org/abs/2409.17130v1
Compressor summary: The study uses Bangla-BERT to identify and categorize toxic comments targeting transgender, indigenous, and migrant people on social media in the Bengali language, finding that Bangla-BERT performs best among various models.
http://arxiv.org/abs/2409.17125v1
Compressor summary: The study presents an AI-based autonomous servicer that uses Reinforcement Learning to detect collisions, dock with endangered satellites, and perform collision avoidance maneuvers.
http://arxiv.org/abs/2409.17119v1
Compressor summary: The study presents a machine learning model that uses high-resolution images from the field to detect early symptoms of late blight in potato crops, overcoming previous limitations and showing promising results for disease detection in agriculture.
http://arxiv.org/abs/2409.17115v1
Compressor summary: Key points:
- Small language models can refine corpora as well as human experts using ProX framework
- ProX treats data refinement as a programming task and generates operations for each example
- ProX improves performance on various benchmarks and domain-specific continual pre-training
Summary: ProX is a novel framework that enables small language models to refine corpora by generating and executing fine-grained operations for each example, achieving human-level data quality and performance improvements across various tasks.
http://arxiv.org/abs/2409.17113v1
Compressor summary:
http://arxiv.org/abs/2409.17109v1
Compressor summary: The text proposes a method to extract meaningful superclass hierarchies from multimodal deep neural networks for validating and verifying qualitative reasoning models.
http://arxiv.org/abs/2409.17106v1
Compressor summary: Text2CAD is an AI framework that generates parametric CAD models from natural language instructions, speeding up prototyping and aiding design.
http://arxiv.org/abs/2409.17095v1
Compressor summary: The paper presents DTLR, a detection-based text line recognition approach that works for various scripts, improving performance on Chinese and cipher recognition tasks.
http://arxiv.org/abs/2409.17092v1
Compressor summary: The paper introduces AXE, a framework for post-training quantization that improves overflow avoidance and supports multi-stage accumulation for large language models.
http://arxiv.org/abs/2409.17091v1
Compressor summary: Ctrl-GenAug is a framework that uses diffusion-based generative augmentation to create high-quality synthetic medical datasets for downstream tasks while controlling semantics and sequential coherence.
http://arxiv.org/abs/2409.17090v1
Compressor summary: The SRSG method creates a sparse graph that preserves local geometric structure and uses support regularization to encourage smoothness, leading to better data clustering than existing methods.
http://arxiv.org/abs/2409.17085v1
Compressor summary: This paper explores how parameter-efficient fine-tuning methods can improve the reliability of monocular depth estimation using Bayesian neural networks.
http://arxiv.org/abs/2409.17080v1
Compressor summary: SVAT benchmark tests if large vision-language models can learn new visuospatial tasks from visual demonstrations alone, finding they struggle without curriculum learning.
http://arxiv.org/abs/2409.17077v1
Compressor summary: Dream11 uses a new transformer model to predict user spending in fantasy sports games based on past transactions, improving accuracy over existing models.
http://arxiv.org/abs/2409.17073v1
Compressor summary: The paper proposes a novel method for decomposing generated answers into their source documents using in-context learning and negative sampling, aiming to improve answer attribution for long documents.
http://arxiv.org/abs/2409.17066v1
Compressor summary: VPTQ is a vector quantization method for large language models that achieves extremely low-bit quantization using second-order optimization and improves accuracy and compression efficiency.
http://arxiv.org/abs/2409.17063v1
Compressor summary: The study evaluates 30 domain generalization algorithms on 3 computational pathology tasks, finding self-supervised learning and stain augmentation to be the most effective methods.
http://arxiv.org/abs/2409.17058v1
Compressor summary: The paper proposes a one-step image super-resolution method using a degradation-guided Low-Rank Adaptation module that corrects model parameters based on low-resolution image information and improves efficiency and quality over existing diffusion-based methods.
http://arxiv.org/abs/2409.17055v1
Compressor summary: DRIM is a multimodal deep learning method that captures shared and unique representations from diverse medical data to improve prognosis prediction and treatment pathways.
http://arxiv.org/abs/2409.17054v1
Compressor summary: The paper proposes using AI to automate transcription, translation, and summarization of doctor-patient conversations to improve efficiency, quality, and accuracy in Puskesmas, Indonesian primary healthcare centers.
http://arxiv.org/abs/2409.17049v1
Compressor summary: ControlCity is a multi-source geographic data transformation method that uses text, images, metadata, and road network data to generate accurate and diverse urban building footprints from volunteer geographic information.
http://arxiv.org/abs/2409.17046v1
Compressor summary: TEMPAMBIQA is a new dataset to study temporally ambiguous open-domain questions and proposes novel search strategies and baselines to detect them.
http://arxiv.org/abs/2409.17045v1
Compressor summary: Key points:
- Dataset (GeoBiked) for Deep Generative Models (DGMs) in engineering design
- Automated data labeling using large-scale foundation models
- Two techniques: hyperfeatures for geometric correspondence and text descriptions with VLMs
- Trade-off between creativity and accuracy in text generation
Summary: The paper introduces GeoBiked, a dataset for DGMs in engineering design, and explores automated data labeling using foundation models. It proposes two techniques to label technical images: hyperfeatures for geometric correspondence and text descriptions with VLMs, balancing creativity and accuracy.
http://arxiv.org/abs/2409.17044v1
Compressor summary: The paper investigates how different components of speech-to-text systems affect performance and whether the best adapter design varies depending on the model used.
http://arxiv.org/abs/2409.17027v1
Compressor summary: Key points:
- The text is about a story generated by a large language model and how it cannot produce counterfactual alternatives to tokens it has generated before.
- The authors propose a causal model that enables such counterfactual generation at low cost, simplicity, and without fine-tuning or prompt engineering.
- They apply their model on Llama 3 8B-instruct and use it for bias detection.
Summary: The text presents a causal model that lets large language models generate alternative stories based on their tokens, without additional cost or effort. The authors test the model on a story generation task and find insights about the model's worldview.
http://arxiv.org/abs/2409.17021v1
Compressor summary: CombU is a novel neural network activation function that combines existing functions at various dimensions across layers to approximate complex data relationships, showing better performance than current methods in experiments.
http://arxiv.org/abs/2409.17016v1
Compressor summary: Mixture-of-Depths (MoD) is a novel approach that improves the computational efficiency of CNNs by selectively processing channels based on their relevance to the current prediction, achieving similar or better performance with reduced inference times and parameters.
http://arxiv.org/abs/2409.17012v1
Compressor summary: The paper presents a deep reinforcement learning model for autonomous decision-making in orbital debris removal missions, enabling efficient and adaptive planning.
http://arxiv.org/abs/2409.17011v1
Compressor summary: The paper presents a system to automatically extract key information about large language models from academic papers using NER and RE methods, aiming to help researchers with information overload.
http://arxiv.org/abs/2409.17005v1
Compressor summary: The text argues that treating math as situated linguistic communication can benefit language models and suggests two case studies showing how language models interpret and generate math based on communicative intentions.
http://arxiv.org/abs/2409.17001v1
Compressor summary: The paper proposes a framework to improve optical flow in adverse weather conditions by using synthetic degraded domains as an intermediate between clean and real data, and transferring knowledge progressively and explicitly.
http://arxiv.org/abs/2409.16997v1
Compressor summary: INT-FlashAttention is a fast and memory-efficient attention module that works with fully INT8 activations and can be compatible with other data formats, improving inference speed and reducing quantization error.
http://arxiv.org/abs/2409.16991v1
Compressor summary: The text compares two machine learning methods, Slow Feature Analysis (SFA) and the Successor Representation (SR), by analyzing their mathematical properties and applications in a Markov decision process (MDP) and a gridworld.
http://arxiv.org/abs/2409.16990v1
Compressor summary: Key points:
- The text introduces a novel model, Gen3D-Face, that generates 3D human faces from unconstrained single images
- The model uses a multi-view consistent diffusion framework, input-conditioned mesh estimation, and multi-view joint generation to achieve photorealistic results
- The model outperforms previous alternatives for out-of-domain and in-domain settings
Summary: The text presents Gen3D-Face, a new method that creates realistic 3D human face avatars from single images using a multi-view framework and geometry information.
http://arxiv.org/abs/2409.16986v1
Compressor summary: Quad is a data selection method for large language models that uses data influence to balance quality and diversity, improving pre-training results.
http://arxiv.org/abs/2409.16984v1
Compressor summary: AXCEL is a new prompt-based metric for evaluating generated text consistency that provides explanations and generalizes across tasks without changing the prompt, outperforming existing metrics.
http://arxiv.org/abs/2409.16974v1
Compressor summary: The paper reviews literature on large language models, their developments, impacts, limitations, and future directions, focusing on responsible development, algorithmic improvements, ethical challenges, and societal implications.
http://arxiv.org/abs/2409.16973v1
Compressor summary: ASLS is a framework that uses self-supervised learning techniques to personalize large language models dynamically and improve user engagement and satisfaction.
http://arxiv.org/abs/2409.16968v1
Compressor summary: The text describes a new approach to test machine learning solutions for VANET using hardware-in-the-loop, which combines simulated and real-world testing to avoid unexpected outcomes.
http://arxiv.org/abs/2409.16965v1
Compressor summary: ABCFair is a benchmark approach that adapts to different problem settings and enables proper comparability of fairness methods across various use cases, including pre-, in-, and postprocessing techniques on both traditional and dual label datasets.
http://arxiv.org/abs/2409.16956v1
Compressor summary: The proposed lexicographic hybrid deep neural network (LH-DNN) is a novel multi-output architecture that efficiently classifies data according to multiple labels in a hierarchical structure.
http://arxiv.org/abs/2409.16954v1
Compressor summary: The paper proposes a method to improve low-resource language recognition in multilingual ASR models by using weighted cross-entropy and data augmentation.
http://arxiv.org/abs/2409.16949v1
Compressor summary: Key points:
- The paper proposes a data augmentation framework using LLM and DM to generate synthetic images for few-shot learning
- The method embeds semantic information into text prompts and adjusts the guidance weight based on CLIPScore
- The approach improves diversity and adherence to target distribution and outperforms baselines on several benchmarks
Summary: The paper presents a framework that uses LLM and DM to generate semantically rich synthetic images for few-shot learning, adjusting the guidance weight based on CLIPScore to balance diversity and target adherence.
http://arxiv.org/abs/2409.16947v1
Compressor summary: The paper describes the 3rd NTIRE challenge on stereo image super-resolution, which aims to improve low-resolution stereo images by a factor of x4 while maintaining consistency and using limited resources.
http://arxiv.org/abs/2409.16946v1
Compressor summary: The paper analyzes how the AI meta-debate in Sweden has shifted from politicians to academics and become more focused on risks since ChatGPT's release.
http://arxiv.org/abs/2409.16945v1
Compressor summary: The text discusses improving Face Forgery Detection (FFD) models by revisiting the complete FFD workflow and integrating a ViT network with self-supervised learning to pre-train a backbone for better facial representation and forgery cue extraction.
http://arxiv.org/abs/2409.16938v1
Compressor summary: The paper proposes a new method for inserting objects into 3D scenes using Gaussian Splatting, MVInpainter, ControlNet, and mask-aware 3D reconstruction, which achieves better results than existing methods.
http://arxiv.org/abs/2409.16934v1
Compressor summary: The paper explores how to enhance named entity recognition (NER) for historical documents by identifying and neutralizing OCR-sensitive neurons in Transformer models.
http://arxiv.org/abs/2409.16925v1
Compressor summary: Key points:
- The text introduces a new dataset (GTA-UAV) and a learning approach for vision-based geo-localization of UAVs in GPS-denied environments.
- The dataset is based on computer games and covers different altitudes, attitudes, scenes, and targets.
- The learning approach uses weight-based contrastive learning to avoid post-processing matching steps.
Summary: The authors present a new vision-based geo-localization method for UAVs using a large-range dataset from computer games and weight-based contrastive learning.
http://arxiv.org/abs/2409.16923v1
Compressor summary: The text describes a study that proposes an AI-assisted gaze detection system for online exams to help proctors identify test takers looking away from the screen and potentially using external resources.
http://arxiv.org/abs/2409.16922v1
Compressor summary: The paper presents a theory linking invariant and equivariant maps in group-symmetric deep neural networks, enabling the construction of universal equivariant architectures from invariant ones, and exploring their complexity and approximation rate.
http://arxiv.org/abs/2409.16914v1
Compressor summary: The paper introduces a new feature called token cohesiveness that helps detect if text is generated by a large language model or written by humans, and proposes a generic dual-channel detection method named TOCSIN.
http://arxiv.org/abs/2409.16913v1
Compressor summary: The paper evaluates and improves the performance of Role-Playing Agents in handling different types of conflicting queries using a representation editing approach.
http://arxiv.org/abs/2409.16911v1
Compressor summary: The study explores how to improve zero-shot learning of multilingual large language models by leveraging their translation capabilities between English and other languages, focusing on features with large magnitude that are critical for translation.
http://arxiv.org/abs/2409.16909v1
Compressor summary: The paper presents a new framework that improves time-sensitive question answering by enhancing temporal awareness and reasoning using Temporal Information-Aware Embedding and Granular Contrastive Reinforcement Learning, outperforming existing language models in TSQA tasks.
http://arxiv.org/abs/2409.16907v1
Compressor summary: The paper introduces an adaptive triangle mesh generation method for photometric stereo that uses normals to compute curvature, reduces vertex count, and speeds up normal integration.
http://arxiv.org/abs/2409.16904v1
Compressor summary: The paper proposes DALMC, a method that learns discriminative view-specific features for multi-view clustering and builds a consensus anchor graph to capture complementary information across views.
http://arxiv.org/abs/2409.16898v1
Compressor summary: The AI-driven closed-loop view guidance system assists operators in navigating intra-cardiac echocardiography catheter manipulation, improving accuracy and efficiency.
http://arxiv.org/abs/2409.16897v1
Compressor summary: The Hyperbolic Vision Transformer (HVT) extends the traditional Vision Transformer by using hyperbolic geometry to better model hierarchical and relational dependencies in images.
http://arxiv.org/abs/2409.16884v1
Compressor summary: The paper presents various text classification models for Hawrami, an endangered Kurdish dialect, using four different algorithms and achieving 96% accuracy with Linear SVM.
http://arxiv.org/abs/2409.16882v1
Compressor summary: The paper proposes a masked PPO algorithm to optimize the sequence of visiting space debris using Lambert solver, achieving significant time efficiency and computational speed improvements over standard methods.
http://arxiv.org/abs/2409.16876v1
Compressor summary: Key points:
- Traffic Research Agent (TR-Agent) is an AI system for developing and refining traffic models efficiently
- TR-Agent consists of four modules that perform idea generation, theory formulation, evaluation, and optimization
- TR-Agent improves performance across multiple traffic models and provides explanations for its optimizations
- TR-Agent is open-source and supports research and collaboration
Summary: TR-Agent is an AI system that autonomously develops and refines traffic models using four modules that perform various stages of the research pipeline. It improves performance across multiple models and explains its optimizations, while being open-source and fostering research collaboration.
http://arxiv.org/abs/2409.16872v1
Compressor summary: This paper proposes a framework to ensure ethical and controllable AI in businesses by balancing factors like performance and explainability, validated through case studies in finance and healthcare.
http://arxiv.org/abs/2409.16867v1
Compressor summary: The paper proposes a new LLM-based framework for automatic generation of multiple heuristics in heuristic search, considering efficiency and scalability, and shows its effectiveness on two combinatorial optimization problems.
http://arxiv.org/abs/2409.16866v1
Compressor summary: The paper proposes two risk-averse learning algorithms that account for delayed feedback and bounded delays, and shows that one-point algorithms achieve sublinear regret under certain conditions while two-point algorithms perform better overall.
http://arxiv.org/abs/2409.16865v1
Compressor summary: The paper proposes an automatic method to visualize and analyze learned features in convolutional neural networks using a linking network that maps the penultimate layer of a pre-trained classifier to the latent space of a generative model, enabling interpretability and quantification of representations.
http://arxiv.org/abs/2409.16863v1
Compressor summary: The paper proposes a novel method for reconstructing 3D hair from single-view images, including both braided and un-braided styles, using a large-scale synthetic dataset and specialized diffusion priors.
http://arxiv.org/abs/2409.16860v1
Compressor summary: Large language models can process complex medical data, generate natural language for various healthcare tasks, and improve clinical decision-making, but they face challenges such as privacy, bias, and ethics.
http://arxiv.org/abs/2409.16855v1
Compressor summary: CHOIR is a novel, versatile, and fully differentiable representation of hand-object interactions that improves contact accuracy and physical realism in grasp refinement and synthesis tasks using JointDiffusion, a diffusion model that learns from noisy interactions or object geometries.
http://arxiv.org/abs/2409.16854v1
Compressor summary: The paper proposes QuAM, a framework that integrates facts and norms in legal mediation, and develops a new formalism to model goal argument acceptability based on variable values.
http://arxiv.org/abs/2409.16850v1
Compressor summary: The paper's method for scene change detection uses a visual foundation model and full-image cross-attention to handle varying lighting, seasonal variations, and viewpoint differences, achieving significant improvements in F1-score and generalization over existing methods.
http://arxiv.org/abs/2409.16849v1
Compressor summary: The authors suggest using explicit cognitive models to expose assumptions in cultural AI benchmarks, improving construct measurement and evaluation science.
http://arxiv.org/abs/2409.16845v1
Compressor summary: IRASNet is a framework that uses clutter reduction and domain-invariant feature learning to improve automatic target recognition (ATR) in synthetic aperture radar (SAR) data.
http://arxiv.org/abs/2409.16838v1
Compressor summary: The text introduces two new biologically-inspired CNN models that improve the robustness of image classification by simulating pre-cortical and V1 features, leading to better performance under common corruptions.
http://arxiv.org/abs/2409.16837v1
Compressor summary: This study demonstrates how using demographic data improves region embedding and predictive performances for urban tasks like check-in prediction, crime rate prediction, and house price prediction.
http://arxiv.org/abs/2409.16832v1
Compressor summary: The paper proposes a framework for fractional reinforcement learning to optimize task scheduling and minimize the age of information in mobile edge computing for cyber-physical systems.
http://arxiv.org/abs/2409.16826v1
Compressor summary: The paper introduces a method to solve complex nonlinear systems of differential equations using advanced neural networks, which can efficiently model particles' motion in various fields.
http://arxiv.org/abs/2409.16824v1
Compressor summary: Kalman filter layers improve reinforcement learning under partial observability by incorporating uncertainty in the latent state representation, leading to better decisions.
http://arxiv.org/abs/2409.16821v1
Compressor summary: The paper proposes a novel pipeline using object detection and fine-tuning to improve anomaly detection in powerline insulator components, and employs explainable-AI tools for precise localization and explanation of defects.
http://arxiv.org/abs/2409.16819v1
Compressor summary: Key points:
- A new dataset for code generation with clear intent, code snippets, and unit tests
- Covers various libraries including Pandas, Numpy, Regex, and 70+ standard libraries from Stack Overflow
- Crafted by Python experts for finetuning and evaluation purposes
- Refined to reduce data contamination and tested on leading models and GPT-4
- Available at https://github.com/NathanaelBeau/CodeInsight
Summary: The paper presents a new dataset for code generation with examples that have clear intent, code snippets, and unit tests for different libraries. The dataset is crafted by Python experts to reduce data contamination and evaluate models like GPT-4. It can be accessed at https://github.com/NathanaelBeau/CodeInsight.
http://arxiv.org/abs/2409.16817v1
Compressor summary: Key points:
- Surrogate modelling uses simplified models to speed up real-time simulations
- The paper proposes a parametric framework for kernel-based dynamic mode decomposition method based on LANDO algorithm
- The framework has two stages: offline (training) and online (prediction)
- Dimensionality reduction is used to reduce computational cost
- Three numerical examples show the efficiency and effectiveness of the framework
Summary: The paper presents a parametric framework that uses kernel-based dynamic mode decomposition and LANDO algorithm for surrogate modelling, with two stages of training and prediction, dimensionality reduction, and three numerical examples.
http://arxiv.org/abs/2409.16815v1
Compressor summary: The authors propose a kernel-based approximation framework that reduces latency and memory usage of approximate CNNs on MCUs without sacrificing accuracy in TinyML applications.
http://arxiv.org/abs/2409.16813v1
Compressor summary: The paper proposes a new system called PeerArg that combines large language models with knowledge representation methods to support and understand peer review processes, and shows that it performs better than an end-to-end LLM for predicting paper acceptance from reviews.
http://arxiv.org/abs/2409.16808v1
Compressor summary: This paper evaluates object detection models' efficiency and performance on various edge devices, finding trade-offs between accuracy, speed, and energy efficiency.
http://arxiv.org/abs/2409.16807v1
Compressor summary: The paper introduces a new dataset and evaluation method for detecting different types of hypocrisy accusations in climate change discussions using large language models.
http://arxiv.org/abs/2409.16806v1
Compressor summary: ColonSLAM is a system that creates topological maps of the colon using SLAM, deep features, and topological priors to relate submaps from different times.
http://arxiv.org/abs/2409.16799v1
Compressor summary: The research adapts and fine-tunes an LLM to predict the All India Summer Monsoon Rainfall (AISMR) with high accuracy and low error, forecasting an above-normal monsoon for 2024.
http://arxiv.org/abs/2409.16797v1
Compressor summary: The paper introduces Scalable Ensemble Diversification (SED), a method that improves OOD generalization and detection by encouraging disagreement among models on hard training samples, without requiring OOD samples.
http://arxiv.org/abs/2409.16793v1
Compressor summary: Spacewalker is an interactive tool that helps users explore, annotate, and analyze unstructured data from various industries by visualizing it in low-dimensional spaces and detecting semantic similarities.
http://arxiv.org/abs/2409.16791v1
Compressor summary: The text proposes a method to improve reinforcement learning in continuous state spaces by using symbolic execution to extract partitions that capture the key structure of the environment dynamics, leading to faster learning and better policy performance.
http://arxiv.org/abs/2409.16788v1
Compressor summary: The paper proposes methods to reduce bias in LLM-as-a-Judge evaluations by calibrating closed-source models and using contrastive training with negative samples for open-source models, improving both probability and prompt level quality.
http://arxiv.org/abs/2409.16787v1
Compressor summary: The study proposes a feature selection method that uses Explainable Artificial Intelligence to improve regression prediction accuracy and stability for blade vibration analysis in turbo machinery.
http://arxiv.org/abs/2409.16783v1
Compressor summary: HARM is a holistic method to test large language models by generating diverse and realistic adversarial examples using a fine-grained risk taxonomy and multi-turn probing.
http://arxiv.org/abs/2409.16779v1
Compressor summary: LLaMa-SciQ is an educational chatbot that helps students solve STEM MCQs by using Retrieval-Augmented Generation and a compressed LLaMa-8B model, which improves accessibility but does not significantly enhance accuracy.
http://arxiv.org/abs/2409.16769v1
Compressor summary: The paper develops a dynamic learning rate algorithm for neural networks that improves optimization by ensuring stable and consistent training dynamics using Lyapunov stability principles.
http://arxiv.org/abs/2409.16768v1
Compressor summary: The proposed method helps interpret convolutional neural networks by identifying which units contribute most or least to specific parameters, providing global and local insights, and generalizing to other architectures.
http://arxiv.org/abs/2409.16767v1
Compressor summary: The paper analyzes the interaction between data representations and classification weights using information-theoretic metrics and proposes new ones to improve supervised and semi-supervised learning.
http://arxiv.org/abs/2409.16765v1
Compressor summary: Key points:
- Paper introduces a new dataset and algorithm for aligning lecture videos with slides
- Algorithm uses speech, text, and image features and is faster and more accurate than SIFT
- OCR and audio transcripts are important for alignment
- Matching accuracy varies across lectures due to video quality and lecture style
Summary: The paper proposes a multimodal algorithm that aligns lecture videos with slides using speech, text, and image features. It outperforms SIFT in speed and accuracy, and shows that OCR and audio transcripts are crucial for alignment. However, it faces challenges from video quality and lecture style variations.
http://arxiv.org/abs/2409.16764v1
Compressor summary: The paper proposes an offline and distributional reinforcement learning scheme for radio resource management that performs better than conventional models and online RL in real-world stochastic environments.
http://arxiv.org/abs/2409.16763v1
Compressor summary: The paper introduces a method that uses aerial images to predict the geolocation of street-view photos in large regions, achieving 60.6% accuracy for non-panoramic photos in Massachusetts.
http://arxiv.org/abs/2409.16756v1
Compressor summary: LATEC is a large-scale benchmark that evaluates 17 XAI methods with 20 metrics, considering different model architectures and input data types, to help practitioners choose the best method for their problem.
http://arxiv.org/abs/2409.16751v1
Compressor summary: E-SQL is a novel pipeline for translating natural language queries into SQL that addresses challenges like database schema complexity, query ambiguity, and intricate query structures using direct schema linking and candidate predicate augmentation.
http://arxiv.org/abs/2409.16736v1
Compressor summary: Images can be interesting depending on the viewer's preferences and characteristics, with some being universally appealing for their aesthetics and others being personally meaningful.
http://arxiv.org/abs/2409.16735v1
Compressor summary: The paper proposes granular ball RVFL (GB-RVFL) and graph embedding GB-RVFL (GE-GB-RVFL), which improve the scalability and robustness of the random vector functional link (RVFL) network for classification tasks by using granular balls as inputs and preserving the dataset's geometric structure.
http://arxiv.org/abs/2409.16730v1
Compressor summary: The paper introduces OPPOHAR, a new dataset for human activity recognition using phone IMU data, and proposes Non-stationary BERT, a lightweight network with a two-stage training method and a data augmentation technique that outperforms existing approaches.
http://arxiv.org/abs/2409.16727v1
Compressor summary: This paper analyzes character hallucination in large language model-powered role-playing systems, introduces the RoleBreak framework to study it as an attack, and proposes the Narrator Mode defense strategy to improve query generalization and narrative coherence.
http://arxiv.org/abs/2409.16726v1
Compressor summary: The paper introduces Relative Safety Margins (RSMs) to measure how robustly two DNN classifiers make decisions on the same input, and proposes a framework to estimate RSM gains or losses under perturbations.
http://arxiv.org/abs/2409.16723v1
Compressor summary: EAGLE is a novel multimodal large language model that can understand arbitrary referring visual prompts without specialized feature encoding or fine-tuning, using colored patches on images and geometry-agnostic learning.
http://arxiv.org/abs/2409.16722v1
Compressor summary: PMSS improves low-rank adaptation by selecting skeletons from pre-trained weights and learning a small matrix with high-rank updates for efficient inference.
http://arxiv.org/abs/2409.16721v1
Compressor summary: The study proposes a novel framework that combines Residual Networks and Artificial Neural Networks to classify multiple datasets from electronic health records, achieving high accuracies for detecting diseases such as heart conditions, cirrhosis, and retinal issues.
http://arxiv.org/abs/2409.16718v1
Compressor summary: The paper proposes ClipFit, a method to improve zero-shot CLIP performance by fine-tuning specific bias terms and normalization layers without extra parameters.
http://arxiv.org/abs/2409.16709v1
Compressor summary: The authors propose a new Pose-Guided Motion Model (PGMM) for generating high-quality sign language videos by using optical flow warping, pose fusion, and a novel metric to measure temporal consistency.
http://arxiv.org/abs/2409.16707v1
Compressor summary: The paper investigates how to probe omissions and distortions in RDF-to-Text generation using two methods, and finds that the encoder is responsible for some information loss.
http://arxiv.org/abs/2409.16706v1
Compressor summary: Pix2Next is a novel image translation framework that generates high-quality Near-Infrared images from RGB inputs using an encoder-decoder architecture with cross-attention and a multi-scale PatchGAN discriminator, improving both quantitative metrics and visual quality and enabling the scaling up of NIR datasets for computer vision applications.
http://arxiv.org/abs/2409.16697v1
Compressor summary: The text explores how bounded parameters affect neural networks' approximation capacity, introduces new concepts to measure it, and discusses connections with random parameter networks and various network design choices.
http://arxiv.org/abs/2409.16694v1
Compressor summary: The paper surveys low-bit quantization methods for large language models, covering their principles, implementations, strategies, frameworks, systems, techniques, and trends.
http://arxiv.org/abs/2409.16693v1
Compressor summary: CaBRNet is a new framework to create self-explainable AI models that can be easily compared and reproduced.
http://arxiv.org/abs/2409.16689v1
Compressor summary: The paper introduces Layout-Corrector, a module that helps discrete diffusion models improve layout generation by identifying and correcting inharmonious elements in the layout.
http://arxiv.org/abs/2409.16686v1
Compressor summary: MSI-Agent is an embodied agent that improves planning and decision-making by summarizing, storing, and utilizing insight across different scales using a three-part pipeline.
http://arxiv.org/abs/2409.16684v1
Compressor summary: The paper proposes Erase then Rectify (ETR), a training-free approach for efficient and scalable graph unlearning that preserves model utility without additional training or full data access.
http://arxiv.org/abs/2409.16682v1
Compressor summary: The paper compares Text-to-SQL and E2E TQA for Table-based Question Answering, identifying their strengths and weaknesses, and proposes a synergistic approach that combines both methods using answer selectors.
http://arxiv.org/abs/2409.16670v1
Compressor summary: GraphLoRA is a method to transfer well-trained Graph Neural Networks (GNNs) to different graph domains by aligning feature and structural distributions using low-rank adaptation and structure-aware regularization.
http://arxiv.org/abs/2409.16668v1
Compressor summary: The paper proposes a new counterfactual detection model that uses a neural topic model to capture global semantics, improves performance over existing models, and reduces bias through causal intervention.
http://arxiv.org/abs/2409.16667v1
Compressor summary: The text introduces CCI, a novel story generation framework that uses DALL-E 3 to create visual representations of key elements and enhances character detail in creative stories.
http://arxiv.org/abs/2409.16666v1
Compressor summary: The paper introduces TalkinNeRF, a framework that learns to generate realistic full-body talking humans from monocular videos using a dynamic neural radiance field, capturing body pose, hand gestures, and facial expressions.
http://arxiv.org/abs/2409.16658v1
Compressor summary: The authors find that pre-trained language models often assign different generation probabilities and uncertainty distributions to unfaithfully generated text than to faithful text, and use this to develop a training algorithm that reduces hallucination and improves faithfulness and quality.
http://arxiv.org/abs/2409.16653v1
Compressor summary: The authors propose a new Transformer architecture for tabular data with a credibility mechanism that improves stability and performance over existing models.
http://arxiv.org/abs/2409.16652v1
Compressor summary: The paper introduces PRL-Track, a novel framework for visual object tracking on UAVs that uses progressive representation learning with appearance and semantic regulators to handle aspect ratio changes and occlusion in complex environments.
http://arxiv.org/abs/2409.16647v1
Compressor summary: The authors propose a method to generate domain-independent descriptive texts for time-series data using two approaches, and create a new dataset called TACO to train a model based on contrastive learning.
http://arxiv.org/abs/2409.16646v1
Compressor summary: The study shows how different languages describe images differently, with some entities being more or less frequently mentioned depending on language and culture.
http://arxiv.org/abs/2409.16645v1
Compressor summary: The text introduces a new method to improve molecular property prediction by adding tasks to an existing algorithm, enhancing its performance while keeping computation costs low.
http://arxiv.org/abs/2409.16636v1
Compressor summary: Debate training improves language model performance in reading comprehension tasks, while consultancy training does not.
http://arxiv.org/abs/2409.16635v1
Compressor summary: The paper introduces Judgment of Thought, a prompt engineering technique that uses three roles (lawyer, prosecutor, and judge) to improve binary logical reasoning performance in both LLM benchmarks and real-world tasks.
http://arxiv.org/abs/2409.16632v1
Compressor summary: Functional SGMCMC is a new method for inferring Bayesian neural networks that uses diffusion dynamics to incorporate informative functional priors and achieve better accuracy and uncertainty quantification.
http://arxiv.org/abs/2409.16631v1
Compressor summary: This paper proposes LDEnhancer, a novel enhancer for nighttime UAV tracking that suppresses uneven light distribution and improves image content, and introduces a new dataset (NAT2024-2) for evaluating low-light enhancement methods.
http://arxiv.org/abs/2409.16630v1
Compressor summary: The authors propose stochastic average pooling, a new module for deep neural networks that incorporates Dropout-like stochasticity into pooling to achieve regularization without degrading performance.
http://arxiv.org/abs/2409.16626v1
Compressor summary: The paper proposes a new 8-bit floating-point data format for deep learning called HiFloat8 that balances precision and dynamic range and can be used in both training and inference of AI models.
http://arxiv.org/abs/2409.16623v1
Compressor summary: ConCat is a model that uses neural ODEs and temporal point processes to predict information popularity by capturing the continuous-time dynamics of cascades with irregular events.
http://arxiv.org/abs/2409.16621v1
Compressor summary: Because privacy policies are lengthy and complicated and lead to uninformed consent for data collection, the paper proposes an entailment-driven, LLM-based framework that classifies paragraphs of privacy policies into meaningful labels, improving F1 score by 11.2% and providing explainable predictions that help users make informed decisions.
http://arxiv.org/abs/2409.16620v1
Compressor summary: The paper introduces an optimized MCTS algorithm for complex decision-making problems such as the FrozenLake environment; it uses cumulative reward and visit count tables along with the UCT formula to learn efficiently and outperform other methods.
http://arxiv.org/abs/2409.16619v1
Compressor summary: CasFT is a method that predicts content popularity on social platforms by combining spatiotemporal patterns and future trends using neural ODEs and diffusion models.
http://arxiv.org/abs/2409.16618v1
Compressor summary: The Claim-Guided Backdoor Attack (CGBA) is a novel method that uses textual claims as triggers to fool language models without requiring post-distribution manipulation, enhancing the feasibility of practical backdoor attacks.
http://arxiv.org/abs/2409.16615v1
Compressor summary: Deformation-based Adaptive Volumetric Video Streaming is a new framework that uses embedded deformation to improve volumetric video streaming performance by reducing bandwidth usage and ensuring visual coherence between frames, while accounting for network conditions and quality of experience.
http://arxiv.org/abs/2409.16609v1
Compressor summary: The paper proposes a new method to identify and rank the causal chain of climate disturbances using Random Forest Regression and SHAP, and tests it on synthetic and real data sets.
http://arxiv.org/abs/2409.16605v1
Compressor summary: The paper introduces SchNovel, a benchmark to evaluate large language models' ability to assess novelty in scholarly papers, and RAG-Novelty, which simulates the human review process.
http://arxiv.org/abs/2409.16604v1
Compressor summary: The paper proposes a semi-supervised low-light image enhancement framework that uses a mean-teacher scheme, a semantic-aware contrastive loss, a Mamba-based backbone, and a perceptive loss to address color cast and texture issues.
http://arxiv.org/abs/2409.16603v1
Compressor summary: The text describes a shared task involving natural language generation in healthcare to generate radiology reports and discharge summaries, aiming to reduce clinician workload.
http://arxiv.org/abs/2409.16597v1
Compressor summary: EventHallusion is a benchmark to evaluate VideoLLMs' hallucination problem in video event comprehension, and Temporal Contrastive Decoding (TCD) is a method to improve their performance.
http://arxiv.org/abs/2409.16590v1
Compressor summary: MPGraf is a new model that combines Transformers and Graph Neural Networks for learning to rank web search results by integrating their complementary strengths.
http://arxiv.org/abs/2409.16586v1
Compressor summary: AutoSTF is a framework that efficiently searches for optimal neural network architectures for spatio-temporal forecasting by decoupling the search space and using multi-patch transfer.
http://arxiv.org/abs/2409.16581v1
Compressor summary: SelectiveKD is a semi-supervised learning framework that uses knowledge distillation to train effective cancer detection models for Digital Breast Tomosynthesis with limited annotated data.
http://arxiv.org/abs/2409.16572v1
Compressor summary: The nested Fourier-DeepONet is a machine learning model that improves efficiency and prediction accuracy for geological carbon sequestration simulations by combining two techniques, without compromising generalization and extrapolation ability.
http://arxiv.org/abs/2409.16570v1
Compressor summary: The paper introduces EGG, a query generator that adapts to different search intents by compiling high-level tasks into task-adaptive queries, and shows its effectiveness on the BeIR benchmark using a smaller model than existing approaches.
http://arxiv.org/abs/2409.16563v1
Compressor summary: This paper explores using synthetic labels to improve a lightweight language model's performance on medical tasks, showing its potential for specializing large language models in the medical domain.
http://arxiv.org/abs/2409.16560v1
Compressor summary: The paper proposes dynamic-width speculative beam decoding (DSBD), a method that integrates speculative decoding and beam sampling to improve the speed and quality of inference for large language models.
http://arxiv.org/abs/2409.16554v1
Compressor summary: The paper introduces EMIT, a pretraining framework for irregular time series that uses event-based masking in the latent space to enhance model performance and robustness.
http://arxiv.org/abs/2409.16546v1
Compressor summary: The paper proposes a mixed-precision quantization approach that evaluates parameter importance using 'precision alignment' and develops a dynamic KV-Cache technique to reduce memory access latency and speed up large language model inference.
http://arxiv.org/abs/2409.16541v1
Compressor summary: The paper studies how to approximate an $n$-dimensional probability measure using a measure on a lower-dimensional space, and proposes a functional to minimize the approximation error while constraining the complexity of the mapping between the spaces.
http://arxiv.org/abs/2409.16539v1
Compressor summary: The authors present a novel method for translating literary texts that uses Continual Pre-training, Supervised Fine-tuning, and Incremental Decoding to maintain coherence and preserve original quality.
http://arxiv.org/abs/2409.16538v1
Compressor summary: SF-YOLO is a teacher-student framework that adapts YOLO object detection to new domains without using source data, achieving competitive results with simplicity and efficiency.
http://arxiv.org/abs/2409.16537v1
Compressor summary: The paper proposes a resource allocation algorithm (ERA) for edge intelligence to balance inference delay, quality of experience (QoE), and resource consumption in model split inference scenarios.
http://arxiv.org/abs/2409.16535v1
Compressor summary: Prompt Sliders is a text-based method to learn and control image attributes across different diffusion models, improving efficiency and generalizability compared to Concept Sliders.
http://arxiv.org/abs/2409.16532v1
Compressor summary: The study proposes a novel network (TL-GPSTGN) that uses graph pruning and transfer learning to improve prediction accuracy in road networks with limited data.
http://arxiv.org/abs/2409.16521v1
Compressor summary: The text describes how product images can reveal cognitive processes through the language used to describe them, and presents an approach for measuring and validating this cognitive complexity using natural language models.
http://arxiv.org/abs/2409.16517v1
Compressor summary: The paper explores how to use large language models (LLMs) for data generation in multi-modality tasks, focusing on chart understanding, and presents a large dataset and a chart-expert model that outperform GPT-4V.