arxiv compressed, 2024-08-27

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-27 generated by the compressor, my personal LLM-based project.


A Practitioner's Guide to Continual Multimodal Pretraining

http://arxiv.org/abs/2408.14471v1

Compressor summary: The text discusses FoMo-in-Flux, a benchmark for continual multimodal pretraining with realistic constraints and practical guidance, exploring various aspects of updating models in real-world scenarios.


Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models

http://arxiv.org/abs/2408.14470v1

Compressor summary: ID^3 is a selective fine-tuning method that calculates parameter importance dynamically and balances exploration and exploitation to improve efficiency on various language tasks.


Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

http://arxiv.org/abs/2408.14469v1

Compressor summary: The paper introduces MH-VidQA, a new task that requires answering visual questions and localizing relevant time intervals in long-form egocentric videos; it proposes GeLM, a novel architecture that enhances multi-modal large language models for this task.


K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences

http://arxiv.org/abs/2408.14468v1

Compressor summary: K-Sort Arena is an efficient and reliable platform for evaluating visual generative models using K-wise comparisons, probabilistic modeling, and exploration-exploitation strategies.


Explicit Inductive Inference using Large Language Models

http://arxiv.org/abs/2408.14467v1

Compressor summary: The paper proposes a method to reduce attestation bias in large language models for inference tasks by using alternative entailments.


A domain decomposition-based autoregressive deep learning model for unsteady and nonlinear partial differential equations

http://arxiv.org/abs/2408.14461v1

Compressor summary: The paper presents transient-CoMLSim, a domain decomposition-based deep learning framework for modeling unsteady and nonlinear PDEs with reduced computational complexity, improved scalability, and better prediction accuracy.


Dense Center-Direction Regression for Object Counting and Localization with Point Supervision

http://arxiv.org/abs/2408.14457v1

Compressor summary: The paper introduces CeDiRNet, a novel method for object counting and localization that uses dense regression of center-directions instead of point annotations, improving performance on six datasets.


Center Direction Network for Grasping Point Localization on Cloths

http://arxiv.org/abs/2408.14456v1

Compressor summary: The authors present CeDiRNet-3DoF, a deep learning model for grasp point detection on deformable objects such as cloth that won first place in the ICRA 2023 Cloth Manipulation Challenge, and introduce the ViCoS Towel Dataset as a robust benchmark.


Reconstructing physiological signals from fMRI across the adult lifespan

http://arxiv.org/abs/2408.14453v1

Compressor summary: The authors propose a new framework using Transformer-based models to reconstruct respiratory and cardiac signals from fMRI data in older adults, outperforming previous methods and showing the potential of attention mechanisms for modeling fMRI-physiological relationships.


Symmetry & Critical Points

http://arxiv.org/abs/2408.14445v1

Compressor summary: If a symmetric critical point of an invariant function exists, most neighboring points tend to break symmetry and help optimize neural networks more efficiently.


Model Parallel Training and Transfer Learning for Convolutional Neural Networks by Domain Decomposition

http://arxiv.org/abs/2408.14442v1

Compressor summary: The authors propose and compare parallel CNN-DNN architectures based on input data decomposition and various aggregation methods to efficiently train complex image processing models.


Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification

http://arxiv.org/abs/2408.14441v1

Compressor summary: Attend-Fusion is a compact audio-visual fusion method that classifies videos effectively and efficiently, reducing model size by nearly 80%.


Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study

http://arxiv.org/abs/2408.14438v1

Compressor summary: The study introduces a novel dataset to evaluate large language models on spatial tasks, finding that gpt-4o performs best overall and prompt strategies significantly affect performance.


Social perception of faces in a vision-language model

http://arxiv.org/abs/2408.14435v1

Compressor summary: The study examines how CLIP, a vision-language model, judges human faces against social psychology terms by manipulating six facial attributes, finding that age, gender, and race bias its judgments and that facial expression affects them more strongly than lighting or age.


Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications

http://arxiv.org/abs/2408.14432v1

Compressor summary: TS-Conf is a contextual bandit algorithm that addresses feedback bias due to herding effects in recommendation systems, improving learning speed and accuracy.
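TS-Conf's herding-aware correction is specific to the paper and is not reproduced here, but it builds on Thompson sampling; a minimal Beta-Bernoulli Thompson-sampling sketch (the arm reward probabilities below are invented for illustration):

```python
import random

def thompson_sampling(true_probs, rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling over len(true_probs) arms."""
    rng = random.Random(seed)
    alpha = [1] * len(true_probs)  # Beta posterior: observed successes + 1
    beta = [1] * len(true_probs)   # Beta posterior: observed failures + 1
    pulls = [0] * len(true_probs)
    for _ in range(rounds):
        # Sample a plausible reward rate per arm, then play the best sample.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = samples.index(max(samples))
        reward = rng.random() < true_probs[arm]
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.5, 0.7])
# The highest-reward arm accumulates most of the pulls over time.
```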


Evaluating saliency scores in point clouds of natural environments by learning surface anomalies

http://arxiv.org/abs/2408.14421v1

Compressor summary: The text proposes a learning-based mechanism to detect salient objects in noisy and textured natural environments using deep neural networks that reconstruct the underlying surface.


CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models

http://arxiv.org/abs/2408.14419v1

Compressor summary: CHARTOM is a visual test for language models that requires them to understand and evaluate charts, helping ensure they don't mislead humans.


MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

http://arxiv.org/abs/2408.14418v1

Compressor summary: MEDSAGE uses large language models to generate synthetic data for data augmentation, improving the accuracy and robustness of medical dialogue summarization despite noisy ASR outputs.


Language-specific Calibration for Pruning Multilingual Language Models

http://arxiv.org/abs/2408.14398v1

Compressor summary: The paper explores effective strategies for calibrating pruning of multilingual language models and presents the first comprehensive empirical study comparing different calibration languages across tasks, models, and techniques.


Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs

http://arxiv.org/abs/2408.14397v1

Compressor summary: The ReXKG system creates a knowledge graph from AI-generated radiology reports to evaluate their understanding and granularity compared to human reports.


Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning

http://arxiv.org/abs/2408.14387v1

Compressor summary: The paper proposes a hybrid approach that combines language models with traditional forecasting methods to handle large and complex spatio-temporal datasets for various sectors, achieving better forecast accuracy.


Learning Tree-Structured Composition of Data Augmentation

http://arxiv.org/abs/2408.14381v1

Compressor summary: The paper proposes efficient algorithms for data augmentation using binary tree-structured compositions of transformations, achieving faster runtime complexity and improving performance on graph and image datasets.


Probing Causality Manipulation of Large Language Models

http://arxiv.org/abs/2408.14380v1

Compressor summary: The paper explores how to probe large language models' ability to understand and manipulate causality in natural language using retrieval augmented generation and in-context learning.


SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery

http://arxiv.org/abs/2408.14371v1

Compressor summary: The paper proposes 'self-expertise', a novel concept that enhances the model's ability to discover and classify fine-grained categories using unsupervised and supervised strategies, outperforming state-of-the-art methods.


Exploiting Conjugate Label Information for Multi-Instance Partial-Label Learning

http://arxiv.org/abs/2408.14369v1

Compressor summary: ELIMIPL is a new algorithm that uses label information from both candidate and non-candidate sets to improve disambiguation in multi-instance partial-label learning scenarios.


An Embedding is Worth a Thousand Noisy Labels

http://arxiv.org/abs/2408.14358v1

Compressor summary: WANN is a method that uses self-supervised features and reliability scores to weight votes from data labels, improving robustness and efficiency in deep neural networks.
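WANN's core idea of weighting neighbor votes by a per-sample reliability score can be sketched with a toy weighted k-NN classifier (the points, labels, and reliability values below are invented for illustration, not the paper's setup):

```python
import math
from collections import defaultdict

def weighted_knn(query, points, labels, reliability, k=3):
    """Classify `query` by a reliability-weighted vote of its k nearest points."""
    nearest = sorted(
        range(len(points)),
        key=lambda i: math.dist(query, points[i]),
    )[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[labels[i]] += reliability[i]  # suspected noisy labels count less
    return max(votes, key=votes.get)

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
labels = ["a", "a", "b", "b", "b"]
reliability = [1.0, 1.0, 0.1, 1.0, 1.0]  # (1, 0) carries a suspected noisy label
result = weighted_knn((0.2, 0.2), points, labels, reliability)  # → "a"
```

Down-weighting the unreliable "b" vote lets the clean "a" neighbors win even though all three nearest points would tie 2-1 under plain majority voting.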


Assessing Contamination in Large Language Models: Introducing the LogProber method

http://arxiv.org/abs/2408.14352v1

Compressor summary: The paper introduces LogProber, a tool for detecting data contamination in large language models using token probability, and explores its limitations.


Deep learning-based ecological analysis of camera trap images is impacted by training data quality and size

http://arxiv.org/abs/2408.14348v1

Compressor summary: The study compares ecological metrics derived from expert-generated species identifications with those generated by deep neural networks and finds that while some factors affect these metrics, others remain robust and resilient.


A Brief Analysis of the Iterative Next Boundary Detection Network for Tree Rings Delineation in Images of Pinus taeda

http://arxiv.org/abs/2408.14343v1

Compressor summary: The INBD network is a two-stage U-Net-based method that segments tree rings in smartphone-captured RGB images of Pinus taeda cross sections, achieving an F-Score of 77.5.


ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty

http://arxiv.org/abs/2408.14339v1

Compressor summary: ConceptMix is a benchmark for evaluating Text-to-Image models' compositional generation ability by automatically generating diverse text prompts and checking how well the models capture the prompted concepts in the images.


Machine Learning for Quantifier Selection in cvc5

http://arxiv.org/abs/2408.14338v1

Compressor summary: The paper presents an efficient machine learning approach to improve SMT solving on quantified problems by guiding quantifier selection using decision trees.


One-layer transformers fail to solve the induction heads task

http://arxiv.org/abs/2408.14332v1

Compressor summary: The text shows that a single-layer transformer needs to be much bigger than a two-layer one to perform the induction heads task.
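The induction heads task has a standard form: the final token repeats an earlier one, and the model must predict the token that followed that earlier occurrence. A tiny data-generating sketch (vocabulary size and sequence length are arbitrary choices, not the paper's):

```python
import random

def induction_sample(vocab_size=20, length=12, seed=None):
    """Generate a sequence whose last token repeats an earlier one;
    the correct prediction is the token that followed it before."""
    rng = random.Random(seed)
    seq = [rng.randrange(vocab_size) for _ in range(length)]
    pos = rng.randrange(length - 1)
    seq.append(seq[pos])       # query token: a repeat of seq[pos]
    target = seq[pos + 1]      # induction target: the token that followed it
    return seq, target

seq, target = induction_sample(seed=0)
```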


Automated Machine Learning in Insurance

http://arxiv.org/abs/2408.14331v1

Compressor summary: The paper introduces an AutoML workflow for insurance applications that requires minimal human intervention and addresses domain-specific challenges.


PHEVA: A Privacy-preserving Human-centric Video Anomaly Detection Dataset

http://arxiv.org/abs/2408.14329v1

Compressor summary: PHEVA is a privacy-friendly video anomaly detection dataset with more data and context-specific cameras than previous ones, and it includes continual learning benchmarks for real-world deployment.


Function-Space MCMC for Bayesian Wide Neural Networks

http://arxiv.org/abs/2408.14325v1

Compressor summary: This paper explores how to efficiently sample from the posterior distribution of Bayesian Neural Networks using preconditioned Crank-Nicolson and Langevin methods, which improve as the network width increases.
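The preconditioned Crank-Nicolson proposal is a standard construction: it mixes the current state with a fresh prior draw so that the acceptance ratio depends only on the likelihood term. A one-dimensional sketch with a made-up potential (not the paper's BNN posterior):

```python
import math
import random

def pcn(phi, beta=0.3, steps=20000, seed=0):
    """Preconditioned Crank-Nicolson MCMC for a target proportional to
    exp(-phi(x)) times a standard-Gaussian prior. The pCN proposal
    leaves the prior invariant, so the accept ratio uses only phi."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(steps):
        prop = math.sqrt(1 - beta ** 2) * x + beta * rng.gauss(0, 1)
        # Metropolis accept: u < exp(phi(x) - phi(prop))
        if math.log(rng.random() + 1e-300) < phi(x) - phi(prop):
            x = prop
        samples.append(x)
    return samples

# Toy target: N(0,1) prior with potential phi(x) = (x - 1)^2;
# the resulting posterior is Gaussian with mean 2/3.
samples = pcn(lambda x: (x - 1.0) ** 2)
mean = sum(samples) / len(samples)
```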


Rethinking Knowledge Transfer in Learning Using Privileged Information

http://arxiv.org/abs/2408.14319v1

Compressor summary: The paper critically examines the assumptions and effectiveness of learning using privileged information in supervised machine learning.


Claim Verification in the Age of Large Language Models: A Survey

http://arxiv.org/abs/2408.14317v1

Compressor summary: This survey summarizes recent claim verification frameworks using large language models (LLMs) and their components, such as retrieval, prompting, and fine-tuning, as well as available datasets for the task.


LLM-3D Print: Large Language Models To Monitor and Control 3D Printing

http://arxiv.org/abs/2408.14307v1

Compressor summary: The text proposes a framework that uses large language models and 3D printers to detect and fix common 3D printing errors without human intervention.


May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels

http://arxiv.org/abs/2408.14284v1

Compressor summary: For continual learning under noisy labels, AER exploits forgetting to separate noisy, complex samples from clean ones, while ABS prioritizes purity on the current task and retains relevant past samples.


Predictability and Causality in Spanish and English Natural Language Generation

http://arxiv.org/abs/2408.14283v1

Compressor summary: The paper compares causal and non-causal language modeling for English and Spanish, finding that different generation strategies may be more suitable for each language depending on their grammatical structures.


Uncertainties of Latent Representations in Computer Vision

http://arxiv.org/abs/2408.14281v1

Compressor summary: The text discusses methods to estimate and incorporate uncertainty in latent representations of pretrained computer vision models, making machine learning more trustworthy and accessible.


1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit

http://arxiv.org/abs/2408.14267v1

Compressor summary: The paper proposes 1-bit Fully Quantized Training (FQT) and introduces two techniques to improve it: Activation Gradient Pruning (AGP) and Sample Channel joint Quantization (SCQ), achieving higher accuracy and faster training speed compared to per-sample quantization.
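The paper's AGP and SCQ techniques are specific to its training scheme and are not reproduced here; a generic sign-based 1-bit quantizer with a mean-absolute-value scale illustrates the basic building block such methods start from:

```python
def quantize_1bit(x):
    """Quantize a list of floats to {-scale, +scale}, where scale is the
    mean absolute value -- standard sign-based binarization."""
    scale = sum(abs(v) for v in x) / len(x)
    return [scale if v >= 0 else -scale for v in x], scale

vals, scale = quantize_1bit([0.5, -1.5, 2.0, -0.2])
# scale = (0.5 + 1.5 + 2.0 + 0.2) / 4 = 1.05
```

Each value is stored as a single sign bit plus one shared scale, which is what makes fully quantized training at 1 bit so memory- and bandwidth-efficient.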


Self-supervised Speech Representations Still Struggle with African American Vernacular English

http://arxiv.org/abs/2408.14262v1

Compressor summary: SSL speech models perform similarly to traditional ASR systems in recognizing AAVE and MAE, but have higher error rates on AAVE features.


An Evaluation of Explanation Methods for Black-Box Detectors of Machine-Generated Text

http://arxiv.org/abs/2408.14252v1

Compressor summary: The study compares different explanation methods for machine-generated text detectors and finds that SHAP is the best in faithfulness and stability, while LIME is the most usable but worst in predicting detector behavior.


Beyond Few-shot Object Detection: A Detailed Survey

http://arxiv.org/abs/2408.14249v1

Compressor summary: The paper surveys various few-shot object detection methods that adapt to new object categories quickly with fewer labeled samples, comparing their performance and exploring their applications and challenges.


Cascaded Temporal Updating Network for Efficient Video Super-Resolution

http://arxiv.org/abs/2408.14244v1

Compressor summary: The paper proposes a new VSR method (CTUN) that uses a cascaded alignment module and a unidirectional propagation network to improve efficiency and performance for video super-resolution on resource-constrained devices.


DSTI at LLMs4OL 2024 Task A: Intrinsic versus extrinsic knowledge for type classification

http://arxiv.org/abs/2408.14236v1

Compressor summary: Semantic towers are a way to represent knowledge outside of language models, and their performance depends on how well they connect to intrinsic knowledge in large models.


FSDEM: Feature Selection Dynamic Evaluation Metric

http://arxiv.org/abs/2408.14234v1

Compressor summary: The paper introduces a new evaluation metric for feature selection algorithms that considers both performance and stability, and demonstrates its effectiveness through experiments and comparisons.


Gallery-Aware Uncertainty Estimation For Open-Set Face Recognition

http://arxiv.org/abs/2408.14229v1

Compressor summary: The text discusses a method to improve open-set face recognition by estimating both gallery and embedding uncertainties using a Bayesian probabilistic model, and tests it on challenging datasets.


TC-PDM: Temporally Consistent Patch Diffusion Models for Infrared-to-Visible Video Translation

http://arxiv.org/abs/2408.14227v1

Compressor summary: The paper introduces TC-PDM, a diffusion method that translates infrared videos to visible ones while preserving semantics and temporal consistency, achieving better results than previous methods.


Provable Imbalanced Point Clustering

http://arxiv.org/abs/2408.14225v1

Compressor summary: The paper presents efficient methods to approximate imbalanced point clustering using coresets and choice clustering, and demonstrates their effectiveness on various datasets.


Fact Probability Vector Based Goal Recognition

http://arxiv.org/abs/2408.14224v1

Compressor summary: The paper proposes a new method for recognizing goals in agent behavior by comparing observed facts with their expected probabilities, which are estimated and approximated efficiently.


MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

http://arxiv.org/abs/2408.14211v1

Compressor summary: MagicMan is a multi-view diffusion model that uses a pre-trained 2D diffusion model and a parametric SMPL-X body prior to generate high-quality images from a single reference image, improving generalizability and 3D human reconstruction.


Lemon and Orange Disease Classification using CNN-Extracted Features and Machine Learning Classifier

http://arxiv.org/abs/2408.14206v1

Compressor summary: The text proposes a disease classification approach for lemons and oranges using deep learning and machine learning algorithms to improve citrus farming and reduce yield losses.


Representative Arm Identification: A fixed confidence approach to identify cluster representatives

http://arxiv.org/abs/2408.14195v1

Compressor summary: The paper studies the representative arm identification (RAI) problem in the multi-armed bandit framework, where arms are clustered by reward distribution and the goal is to identify a fixed number of arms from each cluster with minimal pulls; it derives an instance-dependent lower bound on sample complexity, proposes two confidence-interval-based algorithms that match the bound in order, and demonstrates their empirical advantages on synthetic and real datasets.


Feature Aligning Few shot Learning Method Using Local Descriptors Weighted Rules

http://arxiv.org/abs/2408.14192v1

Compressor summary: FAFD-LDWR is a novel few-shot classification method that uses cross-normalization and local descriptor alignment to improve performance, reduce noise, and enhance interpretability.


Ensemble Predicate Decoding for Unbiased Scene Graph Generation

http://arxiv.org/abs/2408.14187v1

Compressor summary: The paper proposes Ensemble Predicate Decoding (EPD) to address predicate bias in scene graph generation by using multiple decoders, including auxiliary ones trained on less common predicates, which improves the model's representation and prediction capability for all predicate categories.


Affine steerers for structured keypoint description

http://arxiv.org/abs/2408.14186v1

Compressor summary: The authors propose a method to train keypoint descriptors that are invariant to affine transformations using representation theory of GL(2) and steerers, achieving state-of-the-art results in image matching.


DynamicRouteGPT: A Real-Time Multi-Vehicle Dynamic Navigation Framework Based on Large Language Models

http://arxiv.org/abs/2408.14185v1

Compressor summary: The paper proposes DynamicRouteGPT, which uses causal inference to balance global and local optimality for real-time dynamic path planning in complex traffic environments, considering traffic, preferences, and unexpected events.


I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

http://arxiv.org/abs/2408.14180v1

Compressor summary: The paper introduces I2EBench, a comprehensive benchmark to evaluate instruction-based image editing models based on 16 dimensions that cover both high-level and low-level aspects of image quality, aligned with human perception, and offering valuable research insights for further development.


NimbleD: Enhancing Self-supervised Monocular Depth Estimation with Pseudo-labels and Large-scale Video Pre-training

http://arxiv.org/abs/2408.14177v1

Compressor summary: NimbleD is a self-supervised monocular depth estimation framework that uses pseudo-labels from a large vision model, does not need camera intrinsics, and achieves low latency inference for virtual and augmented reality applications.


SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

http://arxiv.org/abs/2408.14176v1

Compressor summary: The paper proposes improvements to SwiftBrush, a one-step text-to-image diffusion model, achieving state-of-the-art performance and quality.


BackFlip: The Impact of Local and Global Data Augmentations on Artistic Image Aesthetic Assessment

http://arxiv.org/abs/2408.14173v1

Compressor summary: The paper proposes BackFlip, a local data augmentation technique for artistic image aesthetic assessment, which often outperforms global augmentations without changing the composition of the art images.


Investigating the effect of Mental Models in User Interaction with an Adaptive Dialog Agent

http://arxiv.org/abs/2408.14154v1

Compressor summary: The text discusses the importance of mental models in user interactions with intelligent systems, presents a new dataset to study them, and suggests that implicit adaptation can improve system usability and success.


Explaining Vision-Language Similarities in Dual Encoders with Feature-Pair Attributions

http://arxiv.org/abs/2408.14153v1

Compressor summary: The paper investigates how dual encoder models like CLIP compare two inputs and finds that they learn detailed connections between image regions and captions, though performance varies by class and data distribution and improves with in-domain training.


Application of Disentanglement to Map Registration Problem

http://arxiv.org/abs/2408.14152v1

Compressor summary: The paper proposes a method to align different types of geospatial data using a combination of variational autoencoder and adversarial training, while preserving their geographic information and artistic styles.


TSAK: Two-Stage Semantic-Aware Knowledge Distillation for Efficient Wearable Modality and Model Optimization in Manufacturing Lines

http://arxiv.org/abs/2408.14146v1

Compressor summary: TSAK is a two-stage approach that uses knowledge distillation to create efficient, privacy-aware, and wearable human activity recognition systems for smart factories with smaller models and reduced sensor inputs.


Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?

http://arxiv.org/abs/2408.14141v1

Compressor summary: Crowd-Calibrator is a method that uses crowd worker agreement to calibrate models for subjective NLP tasks and inform whether they should abstain from decisions, improving performance on hate speech detection and natural language inference.


Multi-Faceted Evaluation of Modeling Languages for Augmented Reality Applications -- The Case of ARWFML

http://arxiv.org/abs/2408.14137v1

Compressor summary: The paper introduces two design iterations of ARWFML, a modeling language for creating augmented reality scenarios without programming knowledge, and presents a comprehensibility study based on various evaluations.


Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models

http://arxiv.org/abs/2408.14135v1

Compressor summary: The paper introduces FC22k, a large food image composite dataset, and Foodfusion, a novel method that uses diffusion models to synthesize natural images by fusing foreground and background information.


Exploring the Potential of Large Language Models for Heterophilic Graphs

http://arxiv.org/abs/2408.14134v1

Compressor summary: The text proposes a novel two-stage framework that leverages Large Language Models (LLMs) to improve Graph Neural Networks (GNNs) for handling heterophilic graphs, where connected nodes have dissimilar features.


GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets

http://arxiv.org/abs/2408.14131v1

Compressor summary: GenFormer is a data augmentation strategy that uses generated images to improve transformer accuracy and robustness on small-scale image classification tasks.


Theoretical Proportion Label Perturbation for Learning from Label Proportions in Large Bags

http://arxiv.org/abs/2408.14130v1

Compressor summary: The study presents a method for learning from large-sized bags in weakly supervised learning by generating mini-bags, perturbing their proportions, and weighting losses to reduce overfitting.


Enhancing Fairness through Reweighting: A Path to Attain the Sufficiency Rule

http://arxiv.org/abs/2408.14126v1

Compressor summary: The authors propose a new way to improve fairness in model training by adjusting the weights of training data using a bilevel formulation and discretization, which enhances both prediction performance and fairness measures.


Contrastive Learning Subspace for Text Clustering

http://arxiv.org/abs/2408.14119v1

Compressor summary: The paper introduces Subspace Contrastive Learning, a new text clustering method that models cluster-wise relationships and outperforms existing methods on various datasets.


Towards Lifelong Learning Embeddings: An Algorithmic Approach to Dynamically Extend Embeddings

http://arxiv.org/abs/2408.14118v1

Compressor summary: The paper presents a modular algorithm for dynamic embedding in e-commerce that extends input size, preserves learned knowledge, and handles new product introductions better than traditional embeddings.


Hierarchical Learning and Computing over Space-Ground Integrated Networks

http://arxiv.org/abs/2408.14116v1

Compressor summary: The paper proposes a framework that aggregates AI-training models from IoT devices over space-ground integrated networks, using LEO satellites for low latency and GEO satellites for global coverage, and formulates network energy minimization as a DST problem solved by the proposed TAEER algorithm, reducing communication overhead and privacy concerns.


Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model

http://arxiv.org/abs/2408.14111v1

Compressor summary: The paper proposes a spatial-temporal attention-based model for hand gesture-based sign language recognition using hand joint skeletons, improving accuracy, privacy, and efficiency.


Estimating Causal Effects from Learned Causal Networks

http://arxiv.org/abs/2408.14101v1

Compressor summary: The paper proposes a new way to answer causal-effect queries over discrete observable variables by learning the causal Bayesian network and its latent variables from observational data, which can be more effective than estimand approaches for larger models.


LSM-YOLO: A Compact and Effective ROI Detector for Medical Detection

http://arxiv.org/abs/2408.14087v1

Compressor summary: The text proposes a novel model called LSM-YOLO, which improves real-time medical ROI detection by refining feature extraction and enhancing fusion between ROI features and neighboring features.


HABD: a houma alliance book ancient handwritten character recognition database

http://arxiv.org/abs/2408.14084v1

Compressor summary: The paper introduces a new database and benchmark for recognizing and studying ancient characters in the Houma Alliance Book using deep learning and digital technology.


Score-based change point detection via tracking the best of infinitely many experts

http://arxiv.org/abs/2408.14073v1

Compressor summary: The paper proposes a new online method for detecting changes in data using an algorithm that combines multiple experts and improves the fixed share forecaster's performance.
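The fixed share forecaster mentioned above is a classical expert-aggregation algorithm: after an exponential-weights update, a fraction of each expert's weight is shared uniformly, which lets the mixture re-track a new best expert after a change point. A minimal sketch under squared loss (the two constant experts and the shifting data stream are invented for illustration):

```python
import math

def fixed_share(expert_preds, outcomes, eta=2.0, alpha=0.05):
    """Fixed-share aggregation of experts under squared loss."""
    n = len(expert_preds)
    w = [1.0 / n] * n
    mixture = []
    for t, y in enumerate(outcomes):
        preds = [expert_preds[i][t] for i in range(n)]
        mixture.append(sum(wi * p for wi, p in zip(w, preds)))
        # Exponential-weights update, then share a fraction alpha uniformly.
        v = [wi * math.exp(-eta * (p - y) ** 2) for wi, p in zip(w, preds)]
        z = sum(v)
        w = [(1 - alpha) * vi / z + alpha / n for vi in v]
    return mixture

# Two constant experts; the data mean jumps from 0 to 1 at t = 50.
experts = [[0.0] * 100, [1.0] * 100]
outcomes = [0.0] * 50 + [1.0] * 50
preds = fixed_share(experts, outcomes)
# preds track ~0 before the change point and recover toward ~1 after it.
```

The uniform sharing term keeps every expert's weight above alpha/n, so the previously bad expert can regain mass quickly once the data change.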


Revisiting Vacuous Reduct Semantics for Abstract Argumentation (Extended Version)

http://arxiv.org/abs/2408.14069v1

Compressor summary: The text discusses vacuous reduct semantics, a method to refine abstract argumentation frameworks by accepting only extensions with no non-empty alternative extensions, and analyzes its principles and behavior.


Evaluating the Visual Similarity of Southwest China's Ethnic Minority Brocade Based on Deep Learning

http://arxiv.org/abs/2408.14060v1

Compressor summary: The paper uses deep learning to analyze visual similarities of ethnic minority patterns in Southwest China, developing a custom network that outperforms other models and using three metrics to evaluate features, resulting in an ethnic thematic map.


Enhancing Depression Diagnosis with Chain-of-Thought Prompting

http://arxiv.org/abs/2408.14053v1

Compressor summary: The paper proposes using chain-of-thought prompting to improve AI models' accuracy in detecting depressive disorder symptoms based on PHQ-8 scores.
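Chain-of-thought prompting itself is a generic technique; a hypothetical prompt template for scoring a single PHQ-8 item shows the shape of such a prompt (the wording, item text, and transcript below are illustrative, not the paper's prompts):

```python
def cot_prompt(transcript, item):
    """Build a chain-of-thought prompt that asks a model to reason
    step by step before scoring one PHQ-8 item on a 0-3 scale."""
    return (
        "You are assessing a PHQ-8 item from an interview transcript.\n"
        f"Item: {item}\n"
        f"Transcript:\n{transcript}\n\n"
        "First, quote the passages relevant to this item. "
        "Then reason step by step about frequency and severity. "
        "Finally, answer with 'Score: N' where N is 0, 1, 2, or 3."
    )

prompt = cot_prompt("I haven't been sleeping well lately...",
                    "Trouble falling or staying asleep")
```

Eliciting the intermediate reasoning before the final score is the essence of chain-of-thought prompting; the structured "Score: N" suffix makes the answer easy to parse.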


Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection

http://arxiv.org/abs/2408.14051v1

Compressor summary: The proposed V2I-DETR method leverages video context for lesion detection in medical videos, while maintaining fast inference speed.


PAGE: Parametric Generative Explainer for Graph Neural Network

http://arxiv.org/abs/2408.14042v1

Compressor summary: PAGE is a framework for generating explanations for graph neural networks by training an auto-encoder to extract causal features from latent space and map them to substructures of the input graph.


MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents

http://arxiv.org/abs/2408.14033v1

Compressor summary: The MLR-Copilot system uses large language models to generate and implement research ideas, increasing productivity in machine learning research by speeding up experimentation and reducing complexity.


SurGen: Text-Guided Diffusion Model for Surgical Video Generation

http://arxiv.org/abs/2408.14028v1

Compressor summary: SurGen is a text-guided diffusion model that generates high-resolution, long surgical videos with good quality and alignment to text prompts, showing its potential as an educational tool.


Empowering Low-Resource Language ASR via Large-Scale Pseudo Labeling

http://arxiv.org/abs/2408.14026v1

Compressor summary: The study proposes a framework that uses pseudo-labeling to improve automatic speech recognition (ASR) for low-resource languages like Hindi, using a new benchmark called IndicYT with YouTube audio files.


An Item Response Theory-based R Module for Algorithm Portfolio Analysis

http://arxiv.org/abs/2408.14025v1

Compressor summary: The paper presents AIRT-Module, an IRT-based analysis tool for evaluating algorithm performance across diverse tasks using difficulty and consistency measures.


Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos

http://arxiv.org/abs/2408.14023v1

Compressor summary: Video-CCAM is a robust video-language model that uses cross-attention layers and causal cross-attention masks to process videos of various lengths, achieving excellent performance in several benchmarks.


Pixel-Aligned Multi-View Generation with Depth Guided Decoder

http://arxiv.org/abs/2408.14016v1

Compressor summary: The paper proposes a latent video diffusion model with epipolar attention layers that generates multiple views from a single image, correcting pixel-level misalignment across views by focusing on spatially adjacent regions and improving downstream multi-view to 3D reconstruction.


Category-Theoretical and Topos-Theoretical Frameworks in Machine Learning: A Survey

http://arxiv.org/abs/2408.14014v1

Compressor summary: This survey covers recent advances in category theory-based machine learning, focusing on gradient-based, probability-based, invariance-based, and topos-based learning approaches.


A Multiscale Gradient Fusion Method for Edge Detection in Color Images Utilizing the CBM3D Filter

http://arxiv.org/abs/2408.14013v1

Compressor summary: The paper proposes a color edge detection method that combines collaborative filtering with multiscale gradient fusion to enhance noise robustness and edge quality.


Improving Water Quality Time-Series Prediction in Hong Kong using Sentinel-2 MSI Data and Google Earth Engine Cloud Computing

http://arxiv.org/abs/2408.14010v1

Compressor summary: The study develops models to predict water quality parameters using satellite data and cloud computing, showing improved accuracy compared to previous methods.


LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models

http://arxiv.org/abs/2408.14008v1

Compressor summary: The authors propose a new model (LMM-VQA) that uses large multimodal models to assess video quality by extracting spatial and temporal features and aligning them with language tokens, achieving state-of-the-art performance on five VQA benchmarks.


Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control

http://arxiv.org/abs/2408.13995v1

Compressor summary: The Avatar Concept Slider (ACS) is a method for precise 3D avatar editing using semantic concepts and three designs that preserve identity and efficiency.


Dual-CBA: Improving Online Continual Learning via Dual Continual Bias Adaptors from a Bi-level Optimization Perspective

http://arxiv.org/abs/2408.13991v1

Compressor summary: The paper proposes Dual-CBA, a bi-level framework that adapts to catastrophic distribution shifts in online continual learning, using class-specific and class-agnostic modules, and Incremental Batch Normalization to alleviate feature bias.


Automatic Medical Report Generation: Methods and Applications

http://arxiv.org/abs/2408.13988v1

Compressor summary: This review analyzes artificial intelligence methods for automatically generating medical reports from 2021 to 2024, discussing challenges, applications, datasets, evaluation metrics, and future directions.


Question answering system of bridge design specification based on large language model

http://arxiv.org/abs/2408.13282v1

Compressor summary: The paper develops a question answering system for bridge design specifications based on fine-tuned and self-built language models, achieving high accuracy on the training set but showing limited generalization.


Focused Large Language Models are Stable Many-Shot Learners

http://arxiv.org/abs/2408.13987v1

Compressor summary: FocusICL is a training-free method that improves large language models' task adaptation by filtering out unimportant content and ensuring sufficient attention to the remainder, outperforming vanilla ICL in many-shot settings.


AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework

http://arxiv.org/abs/2408.13986v1

Compressor summary: AgentMove is an LLM-based framework for generalized human mobility prediction in any city worldwide, combining a spatial-temporal memory for individual patterns, a world knowledge generator for urban structure, and a collective knowledge extractor for shared population patterns; it outperforms the best baselines across various metrics and shows less geographical bias.


Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation

http://arxiv.org/abs/2408.13983v1

Compressor summary: This paper proposes a dual-path token lifting method for transformer models to efficiently separate input signals into principal components and noise components, improving test-time domain adaptation performance.


Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models

http://arxiv.org/abs/2408.13979v1

Compressor summary: This paper explores the impact of soft-prompt norms on vision-language models and proposes a method to normalize them for better performance.
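Normalizing a soft prompt usually means rescaling each learnable prompt vector to a fixed L2 norm so that norm growth does not dominate training. The following is a minimal sketch of that operation on plain Python lists; the target norm and data layout are illustrative assumptions, not Nemesis's actual normalization scheme.

```python
import math

def normalize_soft_prompts(prompts, target_norm=1.0):
    """Rescale each soft-prompt vector to a fixed L2 norm.

    prompts: list of vectors (one list of floats per soft-prompt token).
    Vectors with zero norm are left as all-zeros.
    """
    normalized = []
    for vec in prompts:
        norm = math.sqrt(sum(x * x for x in vec))
        scale = target_norm / norm if norm > 0 else 0.0
        normalized.append([x * scale for x in vec])
    return normalized
```

In a real prompt-tuning setup this rescaling would be applied to the learnable embedding parameters (e.g. a tensor of shape `[num_tokens, dim]`) rather than Python lists.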


DynaSurfGS: Dynamic Surface Reconstruction with Planar-based Gaussian Splatting

http://arxiv.org/abs/2408.13972v1

Compressor summary: DynaSurfGS is a dynamic scene reconstruction method that integrates 4D neural voxels, planar-based Gaussian splatting, normal regularization, and an ARAP constraint to achieve photorealistic real-time rendering and high-fidelity surface reconstruction, outperforming existing methods in both.


Reducing the Cost: Cross-Prompt Pre-Finetuning for Short Answer Scoring

http://arxiv.org/abs/2408.13966v1

Compressor summary: The paper proposes a two-phase approach for automated short answer scoring that uses key phrases and cross-prompt data to reduce training costs and improve accuracy.