This page contains one-sentence summaries, generated by the compressor (my personal LLM-based project), of cs.AI/ML/CV/CL papers announced on 2024-01-25.
http://arxiv.org/abs/2401.13666v1
Compressor summary: The paper explores pattern recognition models using two operators and algebraic operations to create a family of algorithms with guaranteed completeness.
http://arxiv.org/abs/2401.13662v1
Compressor summary: This paper gives an overview of on-policy policy gradient algorithms in deep reinforcement learning, including their theory, implementation, and comparison on continuous control environments.
http://arxiv.org/abs/2401.13660v1
Compressor summary: MambaByte is a token-free language model that performs well on byte sequences and competes with subword Transformers while offering faster inference.
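For context on the token-free setting the summary refers to, here is a minimal sketch of byte-level input in general (not MambaByte itself, and the example string is made up): the vocabulary is just the 256 possible byte values, so no tokenizer is trained, at the cost of much longer sequences than subword tokenization.

```python
text = "Byte-level models read raw UTF-8 bytes, so no tokenizer is needed."

byte_ids = list(text.encode("utf-8"))   # vocabulary of at most 256 symbols
print(len(byte_ids), byte_ids[:12])     # sequences are much longer than subword sequences
print(bytes(byte_ids).decode("utf-8"))  # lossless round trip back to the original text
```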
http://arxiv.org/abs/2401.13657v1
Compressor summary: This study examines stochastic deep learning methods for uncertainty estimation in mortality prediction for ICU patients and finds that they underestimate epistemic uncertainty and produce overconfident models, limiting their reliability for clinical decision support.
http://arxiv.org/abs/2401.13652v1
Compressor summary: The paper proposes a new method using Graph-Informed Neural Networks (GINNs) to find discontinuities in high-dimensional functions, which is efficient, accurate, and adaptable to different algorithms.
http://arxiv.org/abs/2401.13649v1
Compressor summary: VisualWebArena is a benchmark to test multimodal web agents' abilities on realistic visually grounded tasks using image-text inputs, natural language instructions, and website actions.
http://arxiv.org/abs/2401.13641v1
Compressor summary: Key points:
- Large Language Models (LLMs) like GPT can perform various tasks without specific training
- ChatGPT is a conversational interface for LLMs that has many applications
- The study explores ChatGPT's ability for face biometrics tasks such as verification and estimation
- ChatGPT could increase explainability and transparency of automatic decisions in human scenarios
- Experiments show the potential of ChatGPT for face biometrics, especially to enhance explainability
Summary: The study evaluates ChatGPT, a conversational interface for LLMs, for face biometrics tasks, finding that it could improve explainability and performance in human scenarios.
http://arxiv.org/abs/2401.13627v1
Compressor summary: SUPIR is a new image restoration method that uses generative models, scaling, and text prompts to improve the quality of images.
http://arxiv.org/abs/2401.13621v1
Compressor summary: The paper proposes a new sentence representation learning method that uses denoising from the intra-sentence perspective and shows its effectiveness on semantic textual similarity and transfer tasks.
http://arxiv.org/abs/2401.13613v1
Compressor summary: CLIP is a pre-trained model that learns to understand images and text, enabling effective image retrieval using natural language queries.
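As a rough illustration of how CLIP-based retrieval typically works (not the paper's specific pipeline), here is a minimal sketch assuming image and text embeddings have already been produced by a CLIP-style encoder; the function name and the random stand-in embeddings are hypothetical.

```python
import numpy as np

def clip_text_to_image_retrieval(text_embedding, image_embeddings, top_k=5):
    """Rank images by cosine similarity to a natural-language query embedding.

    Both inputs are assumed to come from the same CLIP-style model
    (a text encoder and an image encoder trained with a contrastive loss).
    """
    # L2-normalise so that a dot product equals cosine similarity.
    text = text_embedding / np.linalg.norm(text_embedding)
    images = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    scores = images @ text                              # one similarity per image
    top_indices = np.argsort(-scores)[:top_k]
    return top_indices, scores[top_indices]

# Toy usage with random stand-in embeddings (a real system would use CLIP outputs).
rng = np.random.default_rng(0)
query = rng.normal(size=512)
gallery = rng.normal(size=(1000, 512))
indices, sims = clip_text_to_image_retrieval(query, gallery)
print(indices, sims)
```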
http://arxiv.org/abs/2401.13604v1
Compressor summary: The paper proposes a stream-based perception approach for cognitive agents that enables them to perceive meaningful situations from low-level sensor data and use them to guide auctions for collaboration in a crowdshipping case study.
http://arxiv.org/abs/2401.13601v1
Compressor summary: This paper surveys existing and emerging MultiModal Large Language Models (MM-LLMs), their design, performance, and future directions.
http://arxiv.org/abs/2401.13598v1
Compressor summary: The paper proposes a method to generate labeled data for extracting relations from documents using large language models, which improves performance on the task.
http://arxiv.org/abs/2401.13594v1
Compressor summary: The paper introduces a novel method to generate exhaustive and high-quality question answering (QA) training data from procedural text using graph-based representations, which enables training compact and competitive task-specific QA models.
http://arxiv.org/abs/2401.13588v1
Compressor summary: The study evaluated three large language models in understanding clinical notes and found GPT-4 to be the best overall, while GPT-3.5 and text-davinci-003 performed better with specific prompting strategies.
http://arxiv.org/abs/2401.13586v1
Compressor summary: The study examines how adjusting the importance of different parts of a language model affects its performance in instruction tasks, depending on the length of the input text.
http://arxiv.org/abs/2401.13581v1
Compressor summary: The paper introduces DigPro, a novel end-to-end deep clustering framework that extends contrastive learning from instance-level to group-level and performs prototype aggregation in a spherical feature space for efficient and effective representation learning and clustering.
http://arxiv.org/abs/2401.13565v1
Compressor summary: The paper introduces Mistral 7B, a large language model with extended context lengths, and shows its improved performance on Malay grammar tasks.
http://arxiv.org/abs/2401.13560v1
Compressor summary: SegMamba is a fast and efficient 3D medical image segmentation model that uses a State Space Model to capture long-range dependencies in volume features.
http://arxiv.org/abs/2401.13558v1
Compressor summary: Tanh and ReLU activation functions in neural networks affect the representational geometry differently, influencing the disentanglement of inputs and outputs depending on the target output dimension.
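A small illustrative experiment, not the paper's setup, of how the two activations reshape representational geometry: push the same random inputs through one random linear layer, apply tanh or ReLU, and compare the hidden pairwise-distance structure to that of the inputs. The network sizes and the distance-correlation measure are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_distances(x):
    # Euclidean distances between all rows of x.
    diffs = x[:, None, :] - x[None, :, :]
    return np.sqrt((diffs ** 2).sum(-1))

# Random inputs and a shared random linear layer.
inputs = rng.normal(size=(200, 32))
weights = rng.normal(size=(32, 64)) / np.sqrt(32)
pre_activations = inputs @ weights

hidden_tanh = np.tanh(pre_activations)          # saturating activation
hidden_relu = np.maximum(pre_activations, 0.0)  # zeroes out half the pre-activation space

# Compare each hidden geometry to the input geometry via correlation of pairwise distances.
d_in = pairwise_distances(inputs).ravel()
for name, hidden in [("tanh", hidden_tanh), ("relu", hidden_relu)]:
    d_h = pairwise_distances(hidden).ravel()
    corr = np.corrcoef(d_in, d_h)[0, 1]
    print(f"{name}: distance-structure correlation with inputs = {corr:.3f}")
```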
http://arxiv.org/abs/2401.13555v1
Compressor summary: The text introduces a framework for evaluating the fairness and diversity of conditional generative models, particularly in image upsampling, using UnfairFace as a benchmark dataset with balanced racial distribution.
http://arxiv.org/abs/2401.13554v1
Compressor summary: The PanAf20K dataset is a large and diverse video collection of chimpanzees and gorillas in Africa with annotations for AI tasks that help conserve endangered species.
http://arxiv.org/abs/2401.13551v1
Compressor summary: The paper proposes a new unsupervised video anomaly detection method that alternately trains a one-class classification model and a weakly supervised model, and uses weighted one-class classification and adaptive thresholding to improve performance.
http://arxiv.org/abs/2401.13544v1
Compressor summary: The text introduces a method for making interpretable neural networks by performing concept-based interventions on already-trained models, which can improve their effectiveness and calibration.
http://arxiv.org/abs/2401.13531v1
Compressor summary: QAGait is a gait recognition approach that improves reliability and performance by using cost-effective quality assessment strategies to handle various challenging silhouettes in real-world scenarios.
http://arxiv.org/abs/2401.13530v1
Compressor summary: The paper explores new optimization methods on the Wasserstein space, a probability measure metric space, by extending standard stochastic methods to the Riemannian setting.
http://arxiv.org/abs/2401.13527v1
Compressor summary: The text introduces Chain-of-Information Generation (CoIG), a method for efficient speech generation by decoupling semantic and perceptual information, and SpeechGPT-Gen, an 8-billion-parameter SLLM that uses CoIG to excel in various speech tasks.
http://arxiv.org/abs/2401.13516v1
Compressor summary: Delocate is a novel Deepfake detection model that can recognize and locate unknown domain Deepfake videos by recovering real faces and guiding the localization process with supervision.
http://arxiv.org/abs/2401.13512v1
Compressor summary: The study used GPT-3.5 to generate medical documents with ICD-10 codes for data augmentation, improving performance for the generated codes and their families, but the generated texts lacked variety and authenticity.
http://arxiv.org/abs/2401.13505v1
Compressor summary: The paper proposes a novel generative model for human motion stylization that uses latent space of pretrained autoencoders to extract and infuse style, allowing versatile and flexible stylization with content preservation and good performance in various applications.
http://arxiv.org/abs/2401.13504v1
Compressor summary: Large Language Models can detect basic tampering activities but struggle with identifying sophisticated forgeries and realistic AI-generated images.
http://arxiv.org/abs/2401.13503v1
Compressor summary: The paper introduces PICI, a new deep image clustering method that uses a Transformer encoder and three learning modules to achieve better results than existing methods.
http://arxiv.org/abs/2401.13499v1
Compressor summary: The paper introduces LDCA, a novel approach for few-shot image classification that uses local descriptors with contextual augmentation from a visual transformer to improve global understanding and achieve state-of-the-art results.
http://arxiv.org/abs/2401.13463v1
Compressor summary: SpeechDPR is a framework that uses unsupervised ASR and text dense retriever knowledge to find answers in spoken passages without manual transcription.
http://arxiv.org/abs/2401.13460v1
Compressor summary: MADRID is a novel approach to generate diverse adversarial scenarios for testing robustness in multi-agent systems, which can expose strategic vulnerabilities in pre-trained policies.
http://arxiv.org/abs/2401.13447v1
Compressor summary: The authors show how to use reinforcement learning and deep neural networks to automate solving linear equations in symbolic form.
http://arxiv.org/abs/2401.13444v1
Compressor summary: CGPE is a framework that efficiently combines knowledge graphs with large language models, using question clues to explore the required knowledge path, resulting in improved performance and reduced computational overhead.
http://arxiv.org/abs/2401.13432v1
Compressor summary: The paper proposes a novel method called CoupledTPS to improve single-image-based warping tasks by coupling multiple thin-plate splines with limited control points and using semi-supervised learning with unlabeled data.
http://arxiv.org/abs/2401.13418v1
Compressor summary: The paper presents a new framework to evaluate serial biometric fusion systems and shows their benefits over parallel ones using theoretical analysis and experiments.
http://arxiv.org/abs/2401.13414v1
Compressor summary: GTAutoAct is a dataset generation framework that uses game engine technology to create large-scale, diverse, and high-quality action recognition datasets with various viewpoints and annotations.
http://arxiv.org/abs/2401.13408v1
Compressor summary: The paper proposes a causal reasoning framework to formalize perception in automated decision-making systems and its implications for fairness and bias.
http://arxiv.org/abs/2401.13405v1
Compressor summary: The paper proposes a synthetic data generation method for object recognition and pose estimation in robotic grasping that reduces human intervention and costs, and improves segmentation and grasping performance.
http://arxiv.org/abs/2401.13398v1
Compressor summary: The paper shows that text categorization helps improve stopword extraction in nine African languages and French by detecting domain-agnostic stopwords with high success rates, but variances exist across languages.
http://arxiv.org/abs/2401.13391v1
Compressor summary: The paper argues that current bias mitigation techniques in AI are not sufficient because they do not account for changes within subgroups and suggests focusing on ranking precision before ensuring fair representation.
http://arxiv.org/abs/2401.13388v1
Compressor summary: UNIMO-G is a diffusion model that generates images from multimodal inputs, improving image quality and fidelity to textual and visual descriptions.
http://arxiv.org/abs/2401.13386v1
Compressor summary: The paper proposes a hybrid approach for privacy-preserving face recognition using frequency and color fusion, identity-specific mapping, and secure multiparty computation, achieving higher accuracy than existing methods.
http://arxiv.org/abs/2401.13363v1
Compressor summary: The paper introduces a new task, dataset, and evaluation protocol for measuring the generalizability of human dance generation models and proposes a novel zero-shot framework, MultiDance-Zero, that can synthesize realistic videos with multiple persons and complex backgrounds.
http://arxiv.org/abs/2401.13360v1
Compressor summary: The paper proposes a noIse-Tolerant Expert Model (ITEM) to address data and training bias in sample selection for learning with noisy labels, using a robust network architecture and a mixed sampling strategy.
http://arxiv.org/abs/2401.13357v1
Compressor summary: The paper presents a linear relative pose estimation algorithm for n point pairs that filters out outliers by reweighting and improves accuracy even with a high percentage of outliers.
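The reweighting idea can be sketched generically as iteratively reweighted least squares on a homogeneous linear system, where each row would encode one point pair via the epipolar constraint; the Cauchy-style weighting function and the toy data below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def reweighted_homogeneous_lsq(A, iterations=10, eps=1e-6):
    """Solve A x ~ 0 (with ||x|| = 1) while iteratively down-weighting outlier rows.

    A generic IRLS-style sketch: rows with large residuals get small weights.
    In relative pose estimation, each row of A would encode one point pair and
    x the flattened essential matrix.
    """
    weights = np.ones(len(A))
    x = None
    for _ in range(iterations):
        # Weighted homogeneous least squares: smallest right singular vector.
        _, _, vt = np.linalg.svd(weights[:, None] * A)
        x = vt[-1]
        residuals = np.abs(A @ x)
        # Cauchy-style reweighting: larger residual -> smaller weight.
        scale = np.median(residuals) + eps
        weights = 1.0 / (1.0 + (residuals / scale) ** 2)
    return x, weights

# Toy usage: 100 inlier rows roughly orthogonal to a ground-truth direction, plus gross outliers.
rng = np.random.default_rng(0)
x_true = rng.normal(size=9)
x_true /= np.linalg.norm(x_true)
basis = np.linalg.svd(x_true[None, :])[2][1:]          # directions orthogonal to x_true
A_in = rng.normal(size=(100, 8)) @ basis + 0.01 * rng.normal(size=(100, 9))
A_out = rng.normal(size=(30, 9))
x_est, w = reweighted_homogeneous_lsq(np.vstack([A_in, A_out]))
print("alignment with ground truth:", abs(x_est @ x_true))
```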
http://arxiv.org/abs/2401.13352v1
Compressor summary: EndoGaussians is a new method that uses Gaussian Splatting to accurately reconstruct 3D soft body tissues from endoscopic videos, improving medical applications like VR surgery and image analysis.
http://arxiv.org/abs/2401.13334v1
Compressor summary: TNTRules is a new method that explains Bayesian optimization solutions in a way that improves human-AI collaboration and trust in parameter tuning.
http://arxiv.org/abs/2401.13330v1
Compressor summary: This paper proposes NACHOS, a neural architecture search (NAS) framework for designing optimal early-exit neural networks (EENNs) that balance accuracy and multiply-accumulate (MAC) operations while satisfying hardware constraints.
http://arxiv.org/abs/2401.13329v1
Compressor summary: The text describes a method to simulate target domain videos for cross-domain video moment retrieval using generative video diffusion controlled by target sentences, addressing both generation and selection of high-quality simulation videos.
http://arxiv.org/abs/2401.13327v1
Compressor summary: The authors propose a method to generate realistic and privacy-preserving synthetic smartwatch health data for stress detection using GANs and DP safeguards.
http://arxiv.org/abs/2401.13325v1
Compressor summary: MCDL is a semi-supervised learning method that uses memory banks to record historical predictions of unlabeled data, measures their credibility, and divides the data into consistent and inconsistent groups for better learning.
http://arxiv.org/abs/2401.13313v1
Compressor summary: InstructDoc is a collection of 30 diverse visual document understanding datasets with human-written instructions, and InstructDr is a new model that connects document images, image encoders, and large language models to adapt to various VDU tasks.
http://arxiv.org/abs/2401.13311v1
Compressor summary: The paper introduces ConTextual, a benchmark to evaluate large multimodal models' ability to perform text-rich visual reasoning in diverse real-world scenarios, revealing significant performance gaps between current AI and human capabilities.
http://arxiv.org/abs/2401.13307v1
Compressor summary: The study introduces a new task (MRG) and a model (ChatterBox) for multimodal dialogues that handle complex spatial relationships and reasoning among multiple instances.
http://arxiv.org/abs/2401.13303v1
Compressor summary: MaLA-500 is a new large language model that works for 534 languages and improves in-context learning.
http://arxiv.org/abs/2401.13301v1
Compressor summary: The study used machine learning to analyze MRI data from patients with different stages of MS and found that specific brain features could help differentiate between early clinical expressions of the disease with 78% accuracy.
http://arxiv.org/abs/2401.13298v1
Compressor summary: The paper proposes an explainable method to detect harmful memes using multimodal debates between large language models and a fine-tuned small language model as a judge, providing better performance and explanations than existing methods.
http://arxiv.org/abs/2401.13296v1
Compressor summary: The article presents a video-interpretation task for detecting objectification of characters in films using a novel dataset and evaluating existing vision models.
http://arxiv.org/abs/2401.13285v1
Compressor summary: Key points:
- The paper proposes a Siamese network-based method for small object tracking in LiDAR point cloud
- The method consists of TAPM and RGS modules that learn prototypes and recover fine-grained features respectively
- The method improves the tracking performance of small targets without affecting normal-sized objects
Summary: The paper presents a Siamese network method that learns prototype features for accurate tracking of small objects in LiDAR point cloud, using TAPM and RGS modules.
http://arxiv.org/abs/2401.13282v1
Compressor summary: RefreshNet is a multiscale framework that uses convolutional autoencoders and recurrent neural networks to capture latent dynamics of complex systems, enabling efficient long-term predictions with high accuracy.
http://arxiv.org/abs/2401.13280v1
Compressor summary: Color contrast between lesion area and skin affects malignancy detection in skin disease datasets, suggesting that dermatology AI models should consider both color difference and skin tone when evaluating skin conditions.
http://arxiv.org/abs/2401.13275v1
Compressor summary: The paper investigates if AI assistants can recognize and communicate their uncertainty in natural language, focusing on open-domain question answering tasks.
http://arxiv.org/abs/2401.13270v1
Compressor summary: The paper proposes a novel method for automatic image colorization using audio information, which improves the performance of color estimation by incorporating semantic information from both audio and video.
http://arxiv.org/abs/2401.13267v1
Compressor summary: The study proposes a new framework for generating medical reports from images, using dual-modal learning with dynamic traceback to capture both pathological semantics and morphological details, and perform well without text input during inference.
http://arxiv.org/abs/2401.13264v1
Compressor summary: Key points:
- Detection transformer needs abundant training data but faces challenges in cross-domain adaptation
- Proposed method uses adversarial learning and mean-teacher framework to address issues of class imbalance and performance degradation
- Method introduces IoU-aware prediction branch, dynamic category threshold refinement, and instance-level class-aware contrastive learning module
- Method improves performance and alleviates class imbalance in diverse domain-adaptive scenarios
Summary: The paper presents a novel detection transformer method that uses adversarial learning and mean-teacher framework to handle class imbalance and performance issues in cross-domain adaptation, with three innovative components.
http://arxiv.org/abs/2401.13256v1
Compressor summary: The paper proposes UniMS-RAG, a system that uses multiple sources to generate personalized responses, by decomposing the task into three sub-tasks, unifying them in a sequence-to-sequence paradigm, and using self-refinement mechanism.
http://arxiv.org/abs/2401.13246v1
Compressor summary: SEER is a novel method that improves question-answering systems by enabling structured reasoning and explanation using a structure-based return and a fine-grained reward function.
http://arxiv.org/abs/2401.13239v1
Compressor summary: Just-predict-others is a new crowdsourcing approach that uses self-supervised learning and adaptive weighting to produce more accurate group estimates when workers' skills vary or their estimates correlate.
http://arxiv.org/abs/2401.13229v1
Compressor summary: The paper proposes an automatic data selection method to build a small but diverse dataset for few-shot learning, addressing issues with random sampling and crowdsourcing for natural language processing tasks.
http://arxiv.org/abs/2401.13227v1
Compressor summary: The paper introduces LPNL, a framework using natural language prompts and a T5 model to improve scalable link prediction on large heterogeneous graphs.
http://arxiv.org/abs/2401.13223v1
Compressor summary: The authors propose a step-wise pipeline for question answering over hybrid tabular and textual data, using large language models like GPT-4 initially but later developing a specialized smaller model (TAT-LLM) that outperforms existing methods on various benchmarks.
http://arxiv.org/abs/2401.13221v1
Compressor summary: The text introduces U-WADN, a novel image restoration method that adapts to different degradation types and levels using varying width sub-networks, achieving better performance and reducing computational resources.
http://arxiv.org/abs/2401.13218v1
Compressor summary: The paper introduces ULTRA, a hierarchical framework that uses open source LLMs to extract event arguments from documents cost-effectively and without positional bias, and LEAFER to improve argument boundary locating.
http://arxiv.org/abs/2401.13214v1
Compressor summary: A new deep learning method called AMAM improves ship detection in coastal areas by learning multi-scale features and adaptively aggregating salient features from various layers, outperforming existing methods.
http://arxiv.org/abs/2401.13213v1
Compressor summary: The authors propose CSBD, a method to discover dataset feature correlations based on image descriptions, which can help mitigate model bias by adjusting image sampling weights.
http://arxiv.org/abs/2401.13210v1
Compressor summary: MITIGATE is a novel framework for graph anomaly detection that leverages multitask learning and node informativeness to improve performance.
http://arxiv.org/abs/2401.13206v1
Compressor summary: The paper proposes a self-improving interference management framework that combines deep learning and uncertainty quantification to enhance wireless communication performance and address limitations of data-driven models.
http://arxiv.org/abs/2401.13205v1
Compressor summary: The paper proposes a black-box adversarial generative framework that enhances input diversity and adapts step sizes for better transferable adversarial examples.
http://arxiv.org/abs/2401.13203v1
Compressor summary: The text describes a new pipeline for generating 3D indoor scenes with customizable styles and appearances, using professionally designed bounding boxes as guidance.
http://arxiv.org/abs/2401.13201v1
Compressor summary: This paper explores how to adapt multimodal large language models for person re-identification and proposes two methods, Common Instruction and DirectReID, to address the challenges in this task.
http://arxiv.org/abs/2401.13200v1
Compressor summary: PDGNNs with TEM reduce memory complexity, utilize topological information for memory replay, and improve performance on expanding graphs in continual learning.
http://arxiv.org/abs/2401.13193v1
Compressor summary: Catch-up Mix is a novel method that improves deep learning models' performance and robustness by mixing activation maps with lower norms to promote diverse representations and reduce reliance on specific filters.
http://arxiv.org/abs/2401.13192v1
Compressor summary: The paper presents a novel framework for generating new, stable crystal structures using a point cloud representation and a diffusion model, which can help in material design and synthesis.
http://arxiv.org/abs/2401.13191v1
Compressor summary: Key points:
- The text presents a two-stage training approach for face landmark detection in multiple domains using limited data and a pre-trained diffusion model.
- The first stage trains a landmark-conditioned face generation model on real faces, and the second stage fine-tunes it on synthetic pairs with text prompts.
- The method generates high-quality synthetic paired datasets from multiple domains and improves face landmark detection performance.
Summary: The authors propose a method that uses limited data and a pre-trained diffusion model to generate synthetic paired datasets for multi-domain face landmark detection, achieving better results than existing methods.
http://arxiv.org/abs/2401.13185v1
Compressor summary: The authors present three efficient algorithms for computing matrices needed by predictive models like Kernel-Based Partial Least-Squares, which speed up cross-validation and do not leak data.
http://arxiv.org/abs/2401.13178v1
Compressor summary: AgentBoard is an evaluation framework for large language models that provides insights into their capabilities and limitations through interactive visualization and fine-grained progress rate metrics.
http://arxiv.org/abs/2401.13174v1
Compressor summary: The paper proposes a method to improve small semantic segmentation models by teaching them to preserve object boundaries and relations from larger models.
http://arxiv.org/abs/2401.13172v1
Compressor summary: This paper introduces ADMap, a framework for reconstructing HD maps in autonomous driving that reduces jitter and improves stability with multi-scale perception, interactive attention, and direction difference loss.
http://arxiv.org/abs/2401.13171v1
Compressor summary: The paper proposes a method for inverse design using learned diffusion models, which improves performance and allows compositional design of complex systems.
http://arxiv.org/abs/2401.13170v1
Compressor summary: The paper proposes a new evaluation method for question answering that uses expert rules and a lightweight classifier to better align with human judgments.
http://arxiv.org/abs/2401.13165v1
Compressor summary: The chapter explores gender-related errors in machine translation for low-resource languages like Bengali, highlighting the social and computational factors that create linguistic hierarchies and their impacts on representational harms and language preservation.
http://arxiv.org/abs/2401.13161v1
Compressor summary: The paper presents a noise-robust hyperspectral sparse unmixing method using multiscale spatial regularization with group sparsity, which selects the most representative abundance estimation for robust and reproducible results.
http://arxiv.org/abs/2401.13160v1
Compressor summary: SpacTor is a new training method for large language models that combines span corruption and token replacement detection, reducing pre-training time and computational costs while maintaining or improving downstream task performance.
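For reference, a minimal sketch of T5-style span corruption, one of the two objectives the summary mentions; the sentinel naming, span-sampling scheme, and the omission of the replaced-token-detection half are simplifying assumptions, not SpacTor's actual recipe.

```python
import random

def span_corrupt(tokens, noise_density=0.15, mean_span_length=3, seed=0):
    """T5-style span corruption: replace random spans with sentinel tokens.

    Returns (corrupted_input, target), where the target lists each sentinel
    followed by the original tokens it replaced.
    """
    rng = random.Random(seed)
    n_corrupt = max(1, round(len(tokens) * noise_density))
    corrupted = set()
    while len(corrupted) < n_corrupt:
        start = rng.randrange(len(tokens))
        length = max(1, round(rng.expovariate(1 / mean_span_length)))
        corrupted.update(range(start, min(start + length, len(tokens))))

    inputs, targets, sentinel = [], [], 0
    in_span = False
    for i, tok in enumerate(tokens):
        if i in corrupted:
            if not in_span:                       # open a new masked span
                inputs.append(f"<extra_id_{sentinel}>")
                targets.append(f"<extra_id_{sentinel}>")
                sentinel += 1
                in_span = True
            targets.append(tok)                   # span content goes to the target
        else:
            inputs.append(tok)
            in_span = False
    return inputs, targets

ins, tgt = span_corrupt("the quick brown fox jumps over the lazy dog".split())
print(ins)
print(tgt)
```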