This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-27, generated by the compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2408.14471v1
Compressor summary: The text discusses FoMo-in-Flux, a benchmark for continual multimodal pretraining with realistic constraints and practical guidance, exploring various aspects of updating models in real-world scenarios.
http://arxiv.org/abs/2408.14470v1
Compressor summary: ID^3 is a selective fine-tuning method that calculates parameter importance dynamically and balances exploration and exploitation to improve efficiency on various language tasks.
http://arxiv.org/abs/2408.14469v1
Compressor summary: The paper introduces MH-VidQA, a new task that requires answering visual questions and localizing relevant time intervals in long-form egocentric videos; it proposes GeLM, a novel architecture that enhances multi-modal large language models for this task.
http://arxiv.org/abs/2408.14468v1
Compressor summary: K-Sort Arena is an efficient and reliable platform for evaluating visual generative models using K-wise comparisons, probabilistic modeling, and exploration-exploitation strategies.
http://arxiv.org/abs/2408.14467v1
Compressor summary: The paper proposes a method to reduce attestation bias in large language models for inference tasks by using alternative entailments.
http://arxiv.org/abs/2408.14461v1
Compressor summary: The paper presents transient-CoMLSim, a domain decomposition-based deep learning framework for modeling unsteady and nonlinear PDEs with reduced computational complexity, improved scalability, and better prediction accuracy.
http://arxiv.org/abs/2408.14457v1
Compressor summary: The paper introduces CeDiRNet, a novel method for object counting and localization that uses dense regression of center-directions instead of point annotations, improving performance on six datasets.
http://arxiv.org/abs/2408.14456v1
Compressor summary: The authors present CeDiRNet-3DoF, a deep learning model for grasp point detection on deformable objects like cloth, which achieved first place in ICRA 2023's Cloth Manipulation Challenge and introduced the ViCoS Towel Dataset as a robust benchmark.
http://arxiv.org/abs/2408.14453v1
Compressor summary: The authors propose a new framework using Transformer-based models to reconstruct respiratory and cardiac signals from fMRI data in older adults, outperforming previous methods and showing the potential of attention mechanisms for modeling fMRI-physiological relationships.
http://arxiv.org/abs/2408.14445v1
Compressor summary: If a symmetric critical point of an invariant function exists, most neighboring points tend to break symmetry and help optimize neural networks more efficiently.
http://arxiv.org/abs/2408.14442v1
Compressor summary: The authors propose and compare parallel CNN-DNN architectures based on input data decomposition and various aggregation methods to efficiently train complex image processing models.
http://arxiv.org/abs/2408.14441v1
Compressor summary: Attend-Fusion is a compact audio-visual fusion method that classifies videos effectively and efficiently, reducing model size by nearly 80%.
http://arxiv.org/abs/2408.14438v1
Compressor summary: The study introduces a novel dataset to evaluate large language models on spatial tasks, finding that gpt-4o performs best overall and prompt strategies significantly affect performance.
http://arxiv.org/abs/2408.14435v1
Compressor summary: The study probes how CLIP, a vision-language model, perceives human faces described with social psychology terms while six face attributes are systematically manipulated, finding that age, gender, and race bias CLIP's judgments and that facial expression affects them more than lighting or age does.
http://arxiv.org/abs/2408.14432v1
Compressor summary: TS-Conf is a contextual bandit algorithm that addresses feedback bias due to herding effects in recommendation systems, improving learning speed and accuracy.
http://arxiv.org/abs/2408.14421v1
Compressor summary: The text proposes a learning-based mechanism to detect salient objects in noisy and textured natural environments using deep neural networks that reconstruct the underlying surface.
http://arxiv.org/abs/2408.14419v1
Compressor summary: CHARTOM is a visual test for language models that requires them to understand and evaluate charts, helping ensure they don't mislead humans.
http://arxiv.org/abs/2408.14418v1
Compressor summary: MEDSAGE uses large language models to generate synthetic data for data augmentation, improving the accuracy and robustness of medical dialogue summarization despite noisy ASR outputs.
http://arxiv.org/abs/2408.14398v1
Compressor summary: The paper explores effective strategies for calibrating pruning of multilingual language models and presents the first comprehensive empirical study comparing different calibration languages across tasks, models, and techniques.
http://arxiv.org/abs/2408.14397v1
Compressor summary: The ReXKG system creates a knowledge graph from AI-generated radiology reports to evaluate their understanding and granularity compared to human reports.
http://arxiv.org/abs/2408.14387v1
Compressor summary: The paper proposes a hybrid approach that combines language models with traditional forecasting methods to handle large and complex spatio-temporal datasets for various sectors, achieving better forecast accuracy.
http://arxiv.org/abs/2408.14381v1
Compressor summary: The paper proposes efficient algorithms for data augmentation using binary tree-structured compositions of transformations, achieving faster runtime complexity and improving performance on graph and image datasets.
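The paper's algorithms aren't reproduced here, but the core idea of composing augmentation transforms along a binary tree can be illustrated with a toy sketch (the tree layout, numeric stand-in transforms, and function names are all invented for illustration):

```python
import random

def sample_path_composition(tree, rng):
    """Compose the transforms along a random root-to-leaf path of a binary
    tree. `tree` is (transform, left_subtree, right_subtree), with None
    children at the leaves."""
    fn, left, right = tree
    fns = [fn]
    node = rng.choice([left, right]) if left and right else None
    while node is not None:
        f, l, r = node
        fns.append(f)
        node = rng.choice([l, r]) if l and r else None

    def composed(x):
        # Apply the collected transforms in root-to-leaf order.
        for f in fns:
            x = f(x)
        return x
    return composed

# Toy numeric "augmentations" standing in for image/graph transforms.
tree = (lambda x: x + 1,
        (lambda x: x * 2, None, None),
        (lambda x: x * 3, None, None))
aug = sample_path_composition(tree, random.Random(0))
print(aug(5))  # either (5+1)*2 = 12 or (5+1)*3 = 18, depending on the path
```

Sampling a fresh path per training example is what makes the composition stochastic; the paper's contribution concerns doing this search efficiently, which the sketch does not attempt.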
http://arxiv.org/abs/2408.14380v1
Compressor summary: The paper explores how to probe large language models' ability to understand and manipulate causality in natural language using retrieval augmented generation and in-context learning.
http://arxiv.org/abs/2408.14371v1
Compressor summary: The paper proposes 'self-expertise', a novel concept that enhances the model's ability to discover and classify fine-grained categories using unsupervised and supervised strategies, outperforming state-of-the-art methods.
http://arxiv.org/abs/2408.14369v1
Compressor summary: ELIMIPL is a new algorithm that uses label information from both candidate and non-candidate sets to improve disambiguation in multi-instance partial-label learning scenarios.
http://arxiv.org/abs/2408.14358v1
Compressor summary: WANN is a method that uses self-supervised features and reliability scores to weight votes from data labels, improving robustness and efficiency in deep neural networks.
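WANN's exact reliability scoring isn't spelled out in this summary; a generic sketch of reliability-weighted nearest-neighbor label voting (the cosine-similarity metric, toy features, and function name are assumptions, not the paper's method) might look like:

```python
import numpy as np

def weighted_knn_vote(query, bank_feats, bank_labels, reliability, k=3):
    """Label a query by reliability-weighted voting over its k nearest
    neighbors in feature space (cosine similarity)."""
    sims = bank_feats @ query / (
        np.linalg.norm(bank_feats, axis=1) * np.linalg.norm(query) + 1e-12
    )
    top = np.argsort(-sims)[:k]  # indices of the k most similar items
    votes = {}
    for i in top:
        # Each neighbor's vote is scaled by its reliability score.
        votes[bank_labels[i]] = votes.get(bank_labels[i], 0.0) + reliability[i]
    return max(votes, key=votes.get)

# Toy bank: two classes, one low-reliability (possibly mislabeled) sample.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = ["cat", "cat", "dog"]
rel = np.array([1.0, 1.0, 0.2])
print(weighted_knn_vote(np.array([0.8, 0.2]), feats, labels, rel, k=3))
```

Down-weighting votes by reliability is what lets such schemes tolerate noisy labels: a mislabeled neighbor with a low score contributes little to the final decision.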
http://arxiv.org/abs/2408.14352v1
Compressor summary: The paper introduces LogProber, a tool for detecting data contamination in large language models using token probability, and explores its limitations.
http://arxiv.org/abs/2408.14348v1
Compressor summary: The study compares ecological metrics derived from expert-generated species identifications with those generated by deep neural networks and finds that while some factors affect these metrics, others remain robust and resilient.
http://arxiv.org/abs/2408.14343v1
Compressor summary: The INBD network is a two-stage U-Net-based method that segments tree rings in RGB images of Pinus taeda cross sections captured by a smartphone, achieving, among other metrics, an F-Score of 77.5.
http://arxiv.org/abs/2408.14339v1
Compressor summary: ConceptMix is a benchmark for evaluating Text-to-Image models' compositional generation ability by automatically generating diverse text prompts and checking how well the models capture the prompted concepts in the images.
http://arxiv.org/abs/2408.14338v1
Compressor summary: The paper presents an efficient machine learning approach to improve SMT solving on quantified problems by guiding quantifier selection using decision trees.
http://arxiv.org/abs/2408.14332v1
Compressor summary: The text shows that a single-layer transformer needs to be much bigger than a two-layer one to perform the induction heads task.
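The induction heads task referenced here is well documented in the mechanistic-interpretability literature: having seen a pair (a, b) earlier in the sequence, the model should predict b the next time a appears. A minimal sketch of that target rule (the toy sequence is invented for illustration):

```python
def induction_prediction(tokens):
    """Next-token prediction via the induction rule: if the final token
    occurred earlier in the sequence, predict the token that followed
    its most recent previous occurrence; otherwise return None."""
    last = tokens[-1]
    # Scan backwards for the most recent earlier occurrence of `last`.
    for j in range(len(tokens) - 2, -1, -1):
        if tokens[j] == last:
            return tokens[j + 1]
    return None

print(induction_prediction(["a", "b", "x", "y", "a"]))  # the rule gives "b"
```

The cited result concerns how much width a single-layer transformer needs to implement this lookup, which a two-layer model handles with a small attention circuit; the sketch only states the task, not the circuit.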
http://arxiv.org/abs/2408.14331v1
Compressor summary: The paper introduces an AutoML workflow for insurance applications that requires minimal human intervention and addresses domain-specific challenges.
http://arxiv.org/abs/2408.14329v1
Compressor summary: PHEVA is a privacy-friendly video anomaly detection dataset with more data and context-specific cameras than previous ones, and it includes continual learning benchmarks for real-world deployment.
http://arxiv.org/abs/2408.14325v1
Compressor summary: This paper explores how to efficiently sample from the posterior distribution of Bayesian Neural Networks using preconditioned Crank-Nicolson and Langevin methods, which improve as the network width increases.
http://arxiv.org/abs/2408.14319v1
Compressor summary: The paper critically examines the assumptions and effectiveness of learning using privileged information in supervised machine learning.
http://arxiv.org/abs/2408.14317v1
Compressor summary: This survey summarizes recent claim verification frameworks using large language models (LLMs) and their components, such as retrieval, prompting, and fine-tuning, as well as available datasets for the task.
http://arxiv.org/abs/2408.14307v1
Compressor summary: The text proposes a framework that uses large language models and 3D printers to detect and fix common 3D printing errors without human intervention.
http://arxiv.org/abs/2408.14284v1
Compressor summary: AER uses forgetting to separate noisy, complex samples from clean ones and ABS prioritizes purity on the current task while retaining relevant past samples for Continual Learning under Noisy Labels.
http://arxiv.org/abs/2408.14283v1
Compressor summary: The paper compares causal and non-causal language modeling for English and Spanish, finding that different generation strategies may be more suitable for each language depending on their grammatical structures.
http://arxiv.org/abs/2408.14281v1
Compressor summary: The text discusses methods to estimate and incorporate uncertainty in latent representations of pretrained computer vision models, making machine learning more trustworthy and accessible.
http://arxiv.org/abs/2408.14267v1
Compressor summary: The paper proposes 1-bit Fully Quantized Training (FQT) and introduces two techniques to improve it: Activation Gradient Pruning (AGP) and Sample Channel joint Quantization (SCQ), achieving higher accuracy and faster training speed compared to per-sample quantization.
http://arxiv.org/abs/2408.14262v1
Compressor summary: SSL speech models perform similarly to traditional ASR systems in recognizing AAVE and MAE, but have higher error rates on AAVE features.
http://arxiv.org/abs/2408.14252v1
Compressor summary: The study compares different explanation methods for machine-generated text detectors and finds that SHAP is the best in faithfulness and stability, while LIME is the most usable but worst in predicting detector behavior.
http://arxiv.org/abs/2408.14249v1
Compressor summary: The paper surveys various few-shot object detection methods that adapt to new object categories quickly with fewer labeled samples, comparing their performance and exploring their applications and challenges.
http://arxiv.org/abs/2408.14244v1
Compressor summary: The paper proposes a new VSR method (CTUN) that uses a cascaded alignment module and a unidirectional propagation network to improve efficiency and performance for video super-resolution on resource-constrained devices.
http://arxiv.org/abs/2408.14236v1
Compressor summary: Semantic towers are a way to represent knowledge outside of language models, and their performance depends on how well they connect to intrinsic knowledge in large models.
http://arxiv.org/abs/2408.14234v1
Compressor summary: The paper introduces a new evaluation metric for feature selection algorithms that considers both performance and stability, and demonstrates its effectiveness through experiments and comparisons.
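The paper's metric is not given here; one common generic way to quantify the stability half of such a metric is the mean pairwise Jaccard similarity between the feature subsets selected across repeated runs, sketched below (the toy runs and function name are invented):

```python
def selection_stability(runs):
    """Mean pairwise Jaccard similarity between feature subsets selected
    across repeated runs -- one generic way to quantify stability."""
    sims = []
    for i in range(len(runs)):
        for j in range(i + 1, len(runs)):
            a, b = set(runs[i]), set(runs[j])
            sims.append(len(a & b) / len(a | b))
    return sum(sims) / len(sims)

# Three runs each selecting three features; overlap varies per pair.
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f3", "f4"}]
print(selection_stability(runs))
```

A combined metric would then trade this stability score off against downstream predictive performance; how the paper weights the two is not recoverable from the summary.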
http://arxiv.org/abs/2408.14229v1
Compressor summary: The text discusses a method to improve open-set face recognition by estimating both gallery and embedding uncertainties using a Bayesian probabilistic model, and tests it on challenging datasets.
http://arxiv.org/abs/2408.14227v1
Compressor summary: The paper introduces TC-DPM, a diffusion method that translates infrared videos to visible ones while preserving semantics and temporal consistency, achieving better results than previous methods.
http://arxiv.org/abs/2408.14225v1
Compressor summary: The paper presents efficient methods to approximate imbalanced point clustering using coresets and choice clustering, and demonstrates their effectiveness on various datasets.
http://arxiv.org/abs/2408.14224v1
Compressor summary: The paper proposes a new method for recognizing goals in agent behavior by comparing observed facts with their expected probabilities, which are estimated and approximated efficiently.
http://arxiv.org/abs/2408.14211v1
Compressor summary: MagicMan is a multi-view diffusion model that uses a pre-trained 2D diffusion model and a parametric SMPL-X body prior to generate high-quality images from a single reference image, improving generalizability and 3D human reconstruction.
http://arxiv.org/abs/2408.14206v1
Compressor summary: The text proposes a disease classification approach for lemons and oranges using deep learning and machine learning algorithms to improve citrus farming and reduce yield losses.
http://arxiv.org/abs/2408.14195v1
Compressor summary: The paper studies the RAI problem in the MAB framework, where arms are clustered by reward distribution and the goal is to identify a fixed number of arms from each cluster with minimal pulls; it provides an instance-dependent lower bound on the sample complexity, proposes two confidence-interval-based algorithms that match the lower bound in order, and demonstrates their superior empirical performance on synthetic and real datasets.
http://arxiv.org/abs/2408.14192v1
Compressor summary: FAFD-LDWR is a novel few-shot classification method that uses cross-normalization and local descriptor alignment to improve performance, reduce noise, and enhance interpretability.
http://arxiv.org/abs/2408.14187v1
Compressor summary: The paper proposes Ensemble Predicate Decoding (EPD) to address predicate bias in scene graph generation by using multiple decoders, including auxiliary ones trained on less common predicates, which improves the model's representation and prediction capability for all predicate categories.
http://arxiv.org/abs/2408.14186v1
Compressor summary: The authors propose a method to train keypoint descriptors that are invariant to affine transformations using representation theory of GL(2) and steerers, achieving state-of-the-art results in image matching.
http://arxiv.org/abs/2408.14185v1
Compressor summary: The paper proposes DynamicRouteGPT, which uses causal inference to balance global and local optimality for real-time dynamic path planning in complex traffic environments, considering traffic, preferences, and unexpected events.
http://arxiv.org/abs/2408.14180v1
Compressor summary: The paper introduces I2EBench, a comprehensive benchmark to evaluate instruction-based image editing models based on 16 dimensions that cover both high-level and low-level aspects of image quality, aligned with human perception, and offering valuable research insights for further development.
http://arxiv.org/abs/2408.14177v1
Compressor summary: NimbleD is a self-supervised monocular depth estimation framework that uses pseudo-labels from a large vision model, does not need camera intrinsics, and achieves low latency inference for virtual and augmented reality applications.
http://arxiv.org/abs/2408.14176v1
Compressor summary: The paper proposes improvements to SwiftBrush, a one-step text-to-image diffusion model, achieving state-of-the-art performance and quality.
http://arxiv.org/abs/2408.14173v1
Compressor summary: The paper proposes BackFlip, a local data augmentation technique for artistic image aesthetic assessment, which often outperforms global augmentations without changing the composition of the art images.
http://arxiv.org/abs/2408.14154v1
Compressor summary: The text discusses the importance of mental models in user interactions with intelligent systems, presents a new dataset to study them, and suggests that implicit adaptation can improve system usability and success.
http://arxiv.org/abs/2408.14153v1
Compressor summary: The paper investigates how dual encoder models like CLIP compare two inputs and find that they learn detailed connections between image regions and captions, but performance varies by class and data distribution, and improves with in-domain training.
http://arxiv.org/abs/2408.14152v1
Compressor summary: The paper proposes a method to align different types of geospatial data using a combination of variational autoencoder and adversarial training, while preserving their geographic information and artistic styles.
http://arxiv.org/abs/2408.14146v1
Compressor summary: TSAK is a two-stage approach that uses knowledge distillation to create efficient, privacy-aware, and wearable human activity recognition systems for smart factories with smaller models and reduced sensor inputs.
http://arxiv.org/abs/2408.14141v1
Compressor summary: Crowd-Calibrator is a method that uses crowd worker agreement to calibrate models for subjective NLP tasks and inform whether they should abstain from decisions, improving performance on hate speech detection and natural language inference.
http://arxiv.org/abs/2408.14137v1
Compressor summary: The paper introduces two design iterations of ARWFML, a modeling language for creating augmented reality scenarios without programming knowledge, and presents a comprehensibility study based on various evaluations.
http://arxiv.org/abs/2408.14135v1
Compressor summary: The paper introduces FC22k, a large food image composite dataset, and Foodfusion, a novel method that uses diffusion models to synthesize natural images by fusing foreground and background information.
http://arxiv.org/abs/2408.14134v1
Compressor summary: The text proposes a novel two-stage framework that leverages Large Language Models (LLMs) to improve Graph Neural Networks (GNNs) for handling heterophilic graphs, where connected nodes have dissimilar features.
http://arxiv.org/abs/2408.14131v1
Compressor summary: GenFormer is a data augmentation strategy that uses generated images to improve transformer accuracy and robustness on small-scale image classification tasks.
http://arxiv.org/abs/2408.14130v1
Compressor summary: The study presents a method for learning from large-sized bags in weakly supervised learning by generating mini-bags, perturbing their proportions, and weighting losses to reduce overfitting.
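The summary describes learning from label proportions; a standard proportion-matching loss for a single bag (a generic sketch, not the paper's exact objective, with the function name assumed) compares a bag's known positive-label proportion against the mean predicted probability over its instances:

```python
import math

def proportion_loss(pred_probs, bag_proportion):
    """Cross-entropy between a bag's known positive-label proportion and
    the mean predicted positive probability over instances in the bag."""
    mean_p = sum(pred_probs) / len(pred_probs)
    eps = 1e-12  # guard against log(0)
    return -(bag_proportion * math.log(mean_p + eps)
             + (1 - bag_proportion) * math.log(1 - mean_p + eps))

# A bag whose true positive fraction is 0.4, with per-instance predictions.
print(round(proportion_loss([0.2, 0.4, 0.6], 0.4), 4))
```

The paper's mini-bag generation and proportion perturbation would operate on top of a bag-level loss of this kind, splitting large bags into smaller ones so the supervision signal is less diluted.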
http://arxiv.org/abs/2408.14126v1
Compressor summary: The authors propose a new way to improve fairness in model training by adjusting the weights of training data using a bilevel formulation and discretization, which enhances both prediction performance and fairness measures.
http://arxiv.org/abs/2408.14119v1
Compressor summary: The paper introduces Subspace Contrastive Learning, a new text clustering method that models cluster-wise relationships and outperforms existing methods on various datasets.
http://arxiv.org/abs/2408.14118v1
Compressor summary: The paper presents a modular algorithm for dynamic embedding in e-commerce that extends input size, preserves learned knowledge, and handles new product introductions better than traditional embeddings.
http://arxiv.org/abs/2408.14116v1
Compressor summary: The paper presents a framework that aggregates models from IoT devices for AI training via LEO and GEO satellites, which provide low latency and global coverage, formulating a network energy minimization problem as a DST problem and solving it with the TAEER algorithm to reduce communication overhead and privacy concerns.
http://arxiv.org/abs/2408.14111v1
Compressor summary: The paper proposes a spatial-temporal attention-based model for hand gesture-based sign language recognition using hand joint skeletons, improving accuracy, privacy, and efficiency.
http://arxiv.org/abs/2408.14101v1
Compressor summary: The paper proposes a new way to answer causal-effect queries over discrete observable variables by learning the causal Bayesian network and its latent variables from observational data, which can be more effective than estimand approaches for larger models.
http://arxiv.org/abs/2408.14087v1
Compressor summary: The text proposes a novel model called LSM-YOLO, which improves real-time medical ROI detection by refining feature extraction and enhancing fusion between ROI features and neighboring features.
http://arxiv.org/abs/2408.14084v1
Compressor summary: The paper introduces a new database and benchmark for recognizing and studying ancient characters in the Houma Alliance Book using deep learning and digital technology.
http://arxiv.org/abs/2408.14073v1
Compressor summary: The paper proposes a new online method for detecting changes in data using an algorithm that combines multiple experts and improves the fixed share forecaster's performance.
http://arxiv.org/abs/2408.14069v1
Compressor summary: The text discusses vacuous reduct semantics, a method to refine abstract argumentation frameworks by accepting only extensions with no non-empty alternative extensions, and analyzes its principles and behavior.
http://arxiv.org/abs/2408.14060v1
Compressor summary: The paper uses deep learning to analyze visual similarities of ethnic minority patterns in Southwest China, developing a custom network that outperforms other models and using three metrics to evaluate features, resulting in an ethnic thematic map.
http://arxiv.org/abs/2408.14053v1
Compressor summary: The paper proposes using chain-of-thought prompting to improve AI models' accuracy in detecting depressive disorder symptoms based on PHQ-8 scores.
http://arxiv.org/abs/2408.14051v1
Compressor summary: The proposed V2I-DETR method leverages video context for lesion detection in medical videos, while maintaining fast inference speed.
http://arxiv.org/abs/2408.14042v1
Compressor summary: PAGE is a framework for generating explanations for graph neural networks by training an auto-encoder to extract causal features from latent space and map them to substructures of the input graph.
http://arxiv.org/abs/2408.14033v1
Compressor summary: The MLR-Copilot system uses large language models to generate and implement research ideas, increasing productivity in machine learning research by speeding up experimentation and reducing complexity.
http://arxiv.org/abs/2408.14028v1
Compressor summary: SurGen is a text-guided diffusion model that generates high-resolution, long surgical videos with good quality and alignment to text prompts, showing its potential as an educational tool.
http://arxiv.org/abs/2408.14026v1
Compressor summary: The study proposes a framework that uses pseudo-labeling to improve automatic speech recognition (ASR) for low-resource languages like Hindi, using a new benchmark called IndicYT with YouTube audio files.
http://arxiv.org/abs/2408.14025v1
Compressor summary: The paper presents AIRT-Module, an IRT-based analysis tool for evaluating algorithm performance across diverse tasks using difficulty and consistency measures.
http://arxiv.org/abs/2408.14023v1
Compressor summary: Video-CCAM is a robust video-language model that uses cross-attention layers and causal cross-attention masks to process videos of various lengths, achieving excellent performance in several benchmarks.
http://arxiv.org/abs/2408.14016v1
Compressor summary: The paper proposes a method for generating multiple views from a single image using a latent video diffusion model with epipolar attention layers, addressing pixel-level misalignment across views by focusing on spatially adjacent regions and improving downstream multi-view to 3D reconstruction tasks.
http://arxiv.org/abs/2408.14014v1
Compressor summary: This survey covers recent advances in category theory-based machine Learning, focusing on gradient-based, probability-based, invariance-based, and topos-based learning approaches.
http://arxiv.org/abs/2408.14013v1
Compressor summary: The paper proposes a color edge detection method that combines collaborative filtering with multiscale gradient fusion to enhance noise robustness and edge quality.
http://arxiv.org/abs/2408.14010v1
Compressor summary: The study develops models to predict water quality parameters using satellite data and cloud computing, showing improved accuracy compared to previous methods.
http://arxiv.org/abs/2408.14008v1
Compressor summary: The authors propose a new model (LMM-VQA) that uses large multimodal models to assess video quality by extracting spatial and temporal features and aligning them with language tokens, achieving state-of-the-art performance on five VQA benchmarks.
http://arxiv.org/abs/2408.13995v1
Compressor summary: The Avatar Concept Slider (ACS) is a method for precise 3D avatar editing using semantic concepts and three designs that preserve identity and efficiency.
http://arxiv.org/abs/2408.13991v1
Compressor summary: The paper proposes Dual-CBA, a bi-level framework that adapts to catastrophic distribution shifts in online continual learning, using class-specific and class-agnostic modules, and Incremental Batch Normalization to alleviate feature bias.
http://arxiv.org/abs/2408.13988v1
Compressor summary: This review analyzes artificial intelligence methods for automatically generating medical reports from 2021 to 2024, discussing challenges, applications, datasets, evaluation metrics, and future directions.
http://arxiv.org/abs/2408.13282v1
Compressor summary: The paper develops a question answering system for bridge design specifications using different fine-tuning approaches and self-built language models, achieving high accuracy on the training dataset but needing improved generalization.
http://arxiv.org/abs/2408.13987v1
Compressor summary: FocusICL is a training-free method that improves large language models' task adaptation by filtering out unimportant content and ensuring sufficient attention, outperforming vanilla ICL in many-shot settings.
http://arxiv.org/abs/2408.13986v1
Compressor summary: AgentMove is a framework that uses large language models (LLMs) to predict human mobility for any city worldwide through three modules (spatial-temporal memory, world knowledge generator, and collective knowledge extractor) that mine individual patterns, model urban structure, and capture shared population patterns, outperforming the best baseline across various metrics while showing less geographical bias.
http://arxiv.org/abs/2408.13983v1
Compressor summary: This paper proposes a dual-path token lifting method for transformer models to efficiently separate input signals into principal components and noise components, improving test-time domain adaptation performance.
http://arxiv.org/abs/2408.13979v1
Compressor summary: This paper explores the impact of soft-prompt norms on vision-language models and proposes a method to normalize them for better performance.
http://arxiv.org/abs/2408.13972v1
Compressor summary: DynaSurfGS is a novel dynamic scene reconstruction method that integrates 4D neural voxels, Gaussian splatting, normal regularization, and an ARAP constraint to achieve photorealistic real-time rendering and high-fidelity surface reconstruction, outperforming existing methods in both.
http://arxiv.org/abs/2408.13966v1
Compressor summary: The paper proposes a two-phase approach for automated short answer scoring that uses key phrases and cross-prompt data to reduce training costs and improve accuracy.