This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-29, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2401.15077v1
Compressor summary: EAGLE is a fast and lossless framework for accelerating Large Language Models using speculative sampling at the second-top-layer feature level.
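Speculative sampling in general (the family EAGLE belongs to; its feature-level drafting is not reproduced here) can be sketched with toy stand-in models. `draft_next`, `target_next`, and `speculative_generate` below are hypothetical names for illustration, a minimal sketch rather than EAGLE's method:

```python
# Minimal draft-and-verify loop illustrating speculative decoding in general
# (not EAGLE's feature-level method). draft_next and target_next are toy
# stand-ins for a cheap draft model and the expensive target model.

def draft_next(tok):
    # Hypothetical cheap draft model: guesses the next token.
    return (tok * 2 + 1) % 7

def target_next(tok):
    # Hypothetical expensive target model: the output we must match exactly.
    return 5 if tok == 3 else (tok * 2 + 1) % 7

def speculative_generate(start, n_tokens, k=4):
    """Generate n_tokens: draft k tokens cheaply, verify them against the
    target model, keep the longest matching prefix plus one correction."""
    out = [start]
    while len(out) <= n_tokens:
        # Draft phase: propose k tokens autoregressively with the cheap model.
        draft, tok = [], out[-1]
        for _ in range(k):
            tok = draft_next(tok)
            draft.append(tok)
        # Verify phase: check the draft against the target model.
        prev = out[-1]
        for guess in draft:
            truth = target_next(prev)
            if guess == truth:
                out.append(guess)   # accepted draft token
            else:
                out.append(truth)   # first mismatch: take the target's token
                break               # and restart drafting from here
            prev = truth
    return out[:n_tokens + 1]
```

With greedy acceptance like this, the output is identical to running the target model token by token, which is the sense in which such schemes are "lossless" while amortizing target-model calls over several drafted tokens.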
http://arxiv.org/abs/2401.15075v1
Compressor summary: The authors propose a new way to train generative models, such as GANs and diffusion models, to create more realistic hand images by adding extra information about hands in the training data.
http://arxiv.org/abs/2401.15071v1
Compressor summary: The paper studies how well large language models can handle different types of input (text, code, image, video) and their trustworthiness, generalizability, and causal reasoning abilities using qualitative analysis on various models including GPT-4 and Gemini.
http://arxiv.org/abs/2401.15068v1
Compressor summary: The paper introduces a new corpus of 19th century U.S. literature words with variants, trains neural edit distance models on it, and compares their performance with models trained on L2 English learners' errors, using different negative sample generation strategies.
http://arxiv.org/abs/2401.15062v1
Compressor summary: EWC is a hierarchical contextual bandit framework that leverages low-dimensional latent space to accelerate user preference learning and minimize regret in mobility recommendation systems.
http://arxiv.org/abs/2401.15059v1
Compressor summary: The paper explores how independent agents can communicate in multi-agent reinforcement learning without parameter sharing, proposes a new learning scheme for this setting, and studies the impact of network capacities on communication efficiency.
http://arxiv.org/abs/2401.15055v1
Compressor summary: The text proposes an AI-based computer vision system that can detect and classify different stages of ripening in tomato plants to optimize harvesting time.
http://arxiv.org/abs/2401.15050v1
Compressor summary: LongFin is a multimodal document AI model that can handle long financial documents and outperforms existing models on the new LongForms dataset.
http://arxiv.org/abs/2401.15048v1
Compressor summary: The paper introduces an image distortion technique that keeps biometric facial images unrecognizable to human eyes but identifiable by neural networks for privacy-aware biometric authentication systems.
http://arxiv.org/abs/2401.15043v1
Compressor summary: The authors introduce SimpleDC, a corpus for cancer education text simplification research, and explore various LLM-based methods, finding that RLHF with a novel reward function improves performance across metrics and adapts out-of-domain models to targeted domains.
http://arxiv.org/abs/2401.15042v1
Compressor summary: The study introduces a framework called ProxyQA to evaluate the quality of long-form text generation by LLMs using human-curated meta-questions and proxy-questions with annotated answers.
http://arxiv.org/abs/2401.15030v1
Compressor summary: The study evaluates various neural network architectures' ability to handle different types of multimodal generalization tasks and introduces gCOG, a new benchmark for multimodal reasoning research.
http://arxiv.org/abs/2401.15029v1
Compressor summary: Neural radiance fields improve forest monitoring by capturing fine 3D structures and integrating multiple remote sensing modalities.
http://arxiv.org/abs/2401.15024v1
Compressor summary: SliceGPT is a new sparsification technique that reduces the embedding dimension of large language models, enabling faster inference with fewer GPUs and less memory.
http://arxiv.org/abs/2401.15006v1
Compressor summary: Airavata is a new Hindi-tuned LLM that improves OpenHathi's performance for assistive tasks, and comes with a dataset and evaluation framework to support further research on Indic languages.
http://arxiv.org/abs/2401.15002v1
Compressor summary: BackdoorBench is a comprehensive benchmark for backdoor learning that provides an integrated implementation, comprehensive evaluation, and abundant analysis of state-of-the-art algorithms, helping researchers investigate, develop, and explore this field.
http://arxiv.org/abs/2401.14992v1
Compressor summary: The study proposes a new cluster repair method using graph metrics and active learning to handle duplicate-containing data sources effectively.
http://arxiv.org/abs/2401.14989v1
Compressor summary: The Mapping-to-Parameter function model is a novel approach to nonlinear functional regression that uses B-spline basis functions and a new knot placement algorithm to map complex functions from infinite-dimensional space to finite-dimensional parameter space, outperforming existing methods in various applications.
http://arxiv.org/abs/2401.14966v1
Compressor summary: The paper proposes a new image denoising method called MPI that uses masking strategy to pre-train a model on natural images and then iteratively fills the masked parts for efficient denoising of single noisy images.
http://arxiv.org/abs/2401.14953v1
Compressor summary: This paper explores using Solomonoff Induction, a powerful universal predictor, in neural networks by leveraging meta-learning with Universal Turing Machines data to push the limits of problem solving.
http://arxiv.org/abs/2401.14938v1
Compressor summary: The paper proposes DAM, a point cloud explainability method that uses a novel model and generates high-quality explanations with an adapted path gradient integration method, improving on existing methods in various aspects.
http://arxiv.org/abs/2401.14933v1
Compressor summary: SSDOnt is an ontology for describing and annotating single-subject design studies, enabling complex questions and searches about them.
http://arxiv.org/abs/2401.14931v1
Compressor summary: This paper studies how well large language models (LLMs) remember concepts from known ontologies, finding that their memorization depends on the popularity of these concepts online.
http://arxiv.org/abs/2401.14923v1
Compressor summary: The paper introduces Behavior Model Reinforcement Learning (BMRL), an AI framework that helps individuals achieve their goals by personalizing and interpreting interventions on their decision-making processes.
http://arxiv.org/abs/2401.14919v1
Compressor summary: The paper proposes a fast and accurate real-time method in which a neural network segments noisy input data into clusters representing potential geometric model instances, then estimates each instance's parameters separately using sample and inlier weights in a RANSAC-like fashion, trained via task-specific loss functions and new synthetic datasets.
http://arxiv.org/abs/2401.14895v1
Compressor summary: SQ-b and OPT-m are techniques that improve post-training quantization of vision transformers, achieving significant accuracy improvements on various bit-width settings.
http://arxiv.org/abs/2401.14893v1
Compressor summary: The paper proposes a structured regression method for AI fairness evaluation across intersectional subgroups, improving accuracy and providing confidence intervals and insights into harm factors.
http://arxiv.org/abs/2401.14876v1
Compressor summary: The paper proposes a cross-space adaptive filter (CSF) for Graph Convolutional Networks that combines topology and node attributes to address the over-smoothing problem and improve node classification performance.
http://arxiv.org/abs/2401.14869v1
Compressor summary: F-Eval is a bilingual benchmark to evaluate large language models based on expression, commonsense, and logic, using tasks that better assess their fundamental abilities than previous methods.
http://arxiv.org/abs/2401.14861v1
Compressor summary: The paper presents a new method for controlling active soft bodies using neural networks and a physics-based simulation, which can accurately reproduce facial expressions and is easy to use.
http://arxiv.org/abs/2401.14856v1
Compressor summary: Our proposed method, Memory-Inspired Temporal Prompt Interaction (MITP), uses a two-stage human memory strategy to efficiently align vision and language modalities in large-scale multimodal models, reducing computational cost.
http://arxiv.org/abs/2401.14847v1
Compressor summary: The paper introduces IODDA, a novel algorithm that discovers how decisions are made and structured in complex business processes using object-centric process logs.
http://arxiv.org/abs/2401.14846v1
Compressor summary: The text discusses how machine learning algorithms for domain generalization may perform better than classic empirical risk minimization in dealing with label noise, but this advantage does not always translate to real-world benchmarks.
http://arxiv.org/abs/2401.14845v1
Compressor summary: AdaPT is a point cloud transformer model that adapts its token selection and budget at inference time, enabling efficient processing of large point clouds without sacrificing accuracy.
http://arxiv.org/abs/2401.14840v1
Compressor summary: The text introduces hybrid homomorphic encryption as a solution to protect privacy in machine learning, especially for classifying heart disease using encrypted ECG data.
http://arxiv.org/abs/2401.14838v1
Compressor summary: The paper proposes a new method called DFS for recognizing driver actions using multiple camera modalities in car cabins, which integrates complementary features across modalities and shares feature extraction stages.
http://arxiv.org/abs/2401.14832v1
Compressor summary: The paper introduces two new text inpainting datasets, one for scene text and one for handwritten text, and a novel neural framework called GSDM that uses global structure to restore corrupted texts with improved accuracy and quality.
http://arxiv.org/abs/2401.14828v1
Compressor summary: TIPEditor is a 3D scene editor that uses text, images, and bounding boxes to accurately edit scenes while maintaining their background.
http://arxiv.org/abs/2401.14818v1
Compressor summary: The text introduces ChemDFM, a large language model for chemistry that can understand chemical knowledge and languages better than general-domain models.
http://arxiv.org/abs/2401.14811v1
Compressor summary: This paper examines scalar, Markovian rewards in RL and shows they cannot express many instances of multi-objective, risk-sensitive, and modal RL tasks.
http://arxiv.org/abs/2401.14807v1
Compressor summary: PL-FSCIL uses visual prompts with a pre-trained Vision Transformer to enable deep neural networks to learn new tasks incrementally from few labeled samples without forgetting previous tasks, mimicking human learning patterns.
http://arxiv.org/abs/2401.14792v1
Compressor summary: The study proposes a method for protecting privacy during machine learning using the Privacy Funnel model, which works well for various face recognition tasks.
http://arxiv.org/abs/2401.14786v1
Compressor summary: The text proposes a method that compresses hyperspectral images by randomly subsampling spectral bands after a data sparsification pre-processing stage, then reconstructs them with the gOMP algorithm, which converges quickly and accurately on highly sparsified pixels but reduces image quality relative to the originals.
http://arxiv.org/abs/2401.14785v1
Compressor summary: The text describes a method for estimating human body poses from downwards-facing cameras on head-mounted devices using probabilistic joint rotations and a synthetic egocentric dataset, achieving state-of-the-art results with reduced parameters and faster speed.
http://arxiv.org/abs/2401.14777v1
Compressor summary: The paper studies adaptation methods for large language models (LLMs) in financial sentiment analysis, showing that smaller LLMs can perform similarly to larger ones while being more efficient.
http://arxiv.org/abs/2401.14772v1
Compressor summary: SGN is a new framework that can predict gene expression in tissue slides without training on specific gene types, by using functionality and phenotype information from a language model.
http://arxiv.org/abs/2401.14762v1
Compressor summary: The text compares different algorithms for compressing and recovering hyperspectral images, showing that greedy gOMP algorithm performs best in terms of accuracy and speed.
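For readers unfamiliar with the recovery step, plain orthogonal matching pursuit (which gOMP generalizes by selecting several atoms per iteration) can be sketched as follows; `omp` is an illustrative implementation of the textbook greedy algorithm, not the papers' code:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily recover a k-sparse x from
    y ~= A @ x. gOMP generalizes this by picking several atoms per round."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        # Pick the column (atom) most correlated with the current residual.
        corr = np.abs(A.T @ residual)
        corr[support] = 0.0  # never reselect an already-chosen atom
        support.append(int(np.argmax(corr)))
        # Least-squares fit on the selected support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

The toy test below uses an identity sensing matrix so recovery is deterministic; in compressive sensing practice, A would be a random subsampling/sensing matrix and recovery holds with high probability under sparsity assumptions.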
http://arxiv.org/abs/2401.14758v1
Compressor summary: The paper proposes conservative policy optimization and local policy convexification to improve safety constraints in primal-dual safe RL methods by addressing the uncertainty in cost estimation.
http://arxiv.org/abs/2401.14754v1
Compressor summary: The paper presents a novel end-to-end video transformer method for simultaneously deblurring, enhancing low-light, and denoising videos using a multi-tier architecture and a new dataset.
http://arxiv.org/abs/2401.14743v1
Compressor summary: The paper introduces a new multimodal dataset of daily activities combining video simulations and knowledge graphs for hazard detection in home environments.
http://arxiv.org/abs/2401.14733v1
Compressor summary: The text discusses how movement and appearance in digital characters affect the perceived personality traits of videos altered by motion transfer networks.
http://arxiv.org/abs/2401.14732v1
Compressor summary: QINCo is a neural method for vector quantization that uses specialized codebooks per vector, predicted by a neural network, to improve data compression and search accuracy.
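QINCo's neural specifics aside, the idea builds on plain residual quantization, which can be sketched as follows; `residual_quantize` is an illustrative baseline with fixed codebooks, not QINCo's network-predicted ones:

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Plain residual quantization: each stage quantizes the residual left by
    the previous stages against a fixed codebook. QINCo instead predicts each
    stage's codebook with a neural network conditioned on the partial
    reconstruction, specializing codebooks per vector."""
    codes = []
    recon = np.zeros_like(x, dtype=float)
    for cb in codebooks:
        residual = x - recon
        # Nearest codeword to the current residual (squared Euclidean).
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        recon = recon + cb[idx]
    return codes, recon
```

Stacking stages this way shrinks the residual step by step, so a few small codebooks can approximate a vector far better than one codebook of the same total size.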
http://arxiv.org/abs/2401.14729v1
Compressor summary: The paper presents SRLane, a lane detection method that combines keypoint-based and proposal-based approaches with a "Sketch-and-Refine" paradigm, achieving fast performance and good accuracy.
http://arxiv.org/abs/2401.14726v1
Compressor summary: Du-NeRF is a new method that combines two neural fields to achieve high-quality geometry reconstruction and view rendering for indoor environments, improving both novel view synthesis and 3D reconstruction.
http://arxiv.org/abs/2401.14719v1
Compressor summary: The paper presents a method to map street-level plastic litter using deep learning and vehicle-mounted cameras, creating an open-source dataset and showing its effectiveness with four object detection algorithms.
http://arxiv.org/abs/2401.14718v1
Compressor summary: The paper surveys video prediction methods in computer vision, highlights challenges and trends, and introduces a new taxonomy based on stochasticity.
http://arxiv.org/abs/2401.14717v1
Compressor summary: The authors propose a method that combines neural acoustic modeling with large language modeling to predict turn-taking and backchanneling locations in spoken dialogue, improving human-AI conversation quality.
http://arxiv.org/abs/2401.14707v1
Compressor summary: The paper proposes a disentanglement-based approach to improve adversarial robustness of deep neural networks by separating and aligning latent features in the pre-trained and fine-tuned models.
http://arxiv.org/abs/2401.14702v1
Compressor summary: The paper proposes FairSample, a framework that mitigates demographic parity biases in GCNs by injecting edges, using reinforcement learning for neighbor sampling, and applying regularization.
http://arxiv.org/abs/2401.14698v1
Compressor summary: This paper explores the use of large language models to generate artificial data, highlighting their limitations in capturing human nuances and emphasizing ethical concerns in their application.
http://arxiv.org/abs/2401.14696v1
Compressor summary: The paper proposes a new feature augmentation method called asymptotic midpoint mixup that improves representation learning by addressing both inter-class and intra-class collapse problems in transfer learning tasks.
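For context, the vanilla feature mixup that the asymptotic midpoint variant builds on is just a convex combination of a pair of examples and their labels; the sketch below shows only that baseline (the paper's asymptotic schedule toward the midpoint is not reproduced):

```python
def mixup(x1, y1, x2, y2, lam):
    """Standard mixup: convex combination of two feature vectors and their
    (one-hot) labels. lam in [0, 1]; lam = 0.5 yields the exact midpoint of
    the pair. In practice lam is typically drawn from a Beta distribution."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```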
http://arxiv.org/abs/2401.14695v1
Compressor summary: The paper proposes a novel method (CEGNCDE) that captures both continuous temporal and spatial dependencies in traffic forecasting using a continuously evolving graph generator and a graph neural controlled differential equations framework.
http://arxiv.org/abs/2401.14694v1
Compressor summary: This paper proposes two interpretable deep learning models for predicting clinical outcomes in electronic health records (EHR) with irregular time intervals, and shows their superior performance on datasets for Alzheimer's disease and mortality prediction.
http://arxiv.org/abs/2401.14688v1
Compressor summary: Taiyi-Diffusion-XL is a bilingual Chinese and English text-to-image model that improves image generation and retrieval using CLIP and Stable-Diffusion-XL with vocabulary expansion, position encoding, and large vision-language pre-training.
http://arxiv.org/abs/2401.14686v1
Compressor summary: SSR improves image encoder robustness and semantic segmentation performance using SAM as a regularizer during training, while maintaining efficiency.
http://arxiv.org/abs/2401.14681v1
Compressor summary: The paper presents methods to detect homophobia and transphobia across ten languages using monolingual transformers and ensemble models, achieving top results for eight languages.
http://arxiv.org/abs/2401.14680v1
Compressor summary: The authors trained MaLLaM, a large language model for the Malay language, with different parameter sizes and showed its effectiveness in understanding and generating natural language specific to Malaysia.
http://arxiv.org/abs/2401.14661v1
Compressor summary: The paper proposes a new lightweight model that combines super-resolution and YOLOv5 architecture to improve object detection in aerial images with small and densely clustered objects, achieving better performance than existing methods.
http://arxiv.org/abs/2401.14656v1
Compressor summary: Scientific LLMs are a new subclass of large language models that facilitate scientific discovery by enhancing natural language comprehension and extending across various scientific disciplines, with a focus on biological and chemical domains.
http://arxiv.org/abs/2401.14654v1
Compressor summary: The paper presents a dataset for predicting legal outcomes in insurance disputes in Korean and shows that Sentence Transformer Fine-tuning can achieve similar performance to existing models with limited data.
http://arxiv.org/abs/2401.14645v1
Compressor summary: The paper introduces and studies sufficient statistics for learning omnipredictors that minimize expected loss for various loss functions in supervised learning, especially focusing on the regression setting with continuous labels.
http://arxiv.org/abs/2401.14641v1
Compressor summary: The proposed CNN algorithm improves video quality by reducing artifacts and increasing resolution when streaming with low internet speed.
http://arxiv.org/abs/2401.14640v1
Compressor summary: The paper introduces a new benchmark (CAQA) to evaluate the quality of citations generated by language models for question-answer pairs, using fine-grained categories and knowledge graphs.
http://arxiv.org/abs/2401.14637v1
Compressor summary: T-Rex is a text-assisted retrosynthesis prediction approach that uses pre-trained language models like ChatGPT to generate descriptions and rank candidate reactants, improving the accuracy of synthesizing target molecules.
http://arxiv.org/abs/2401.14636v1
Compressor summary: The paper proposes a new technique for solving Stochastic Shortest Path Problems that reduces unnecessary computation by ignoring sub-optimal actions and improves the efficiency of the iLAO* algorithm.
http://arxiv.org/abs/2401.14630v1
Compressor summary: The paper evaluates the domain adaptation ability of different Chinese Spelling Check (CSC) models using three new datasets from financial, medical, and legal domains and tests ChatGPT's performance as well.
http://arxiv.org/abs/2401.14626v1
Compressor summary: This paper proposes Lifelong Scene Graph Generation (LSGG), a novel framework that enables scene graph generation models to continuously learn new relationships without forgetting previous knowledge, using a limited number of exemplars and in-context learning techniques.
http://arxiv.org/abs/2401.14625v1
Compressor summary: The paper proposes an Error Explainable Benchmark dataset to evaluate automatic speech recognition models based on both speech- and text-level aspects, improving user satisfaction and understanding system weaknesses.
http://arxiv.org/abs/2401.14624v1
Compressor summary: The paper introduces an efficient data collection method that uses a large language model to bootstrap seed information and retrieve related data from public corpora, creating a high-quality dataset called Knowledge Pile that covers four major domains, improves the reasoning performance of large language models, and is open-sourced for academic use.
http://arxiv.org/abs/2401.14619v1
Compressor summary: ResiTTA is a test-time adaptation method that improves model performance by using resilient batch normalization and an entropy-driven memory bank to handle domain shifts and non-i.i.d. test samples.
http://arxiv.org/abs/2401.14616v1
Compressor summary: Alternative Speech is a new approach that offers practical solutions to hate speech by correcting speakers and promoting social change, while working alongside counter-narratives.
http://arxiv.org/abs/2401.14609v1
Compressor summary: The paper proposes a data-physics-hybrid method called PISAL to solve PDEs for complex industrial systems with heterogeneous media and unknown parameters or time-varying interfaces, using a synchronic-adaptive learning strategy.
http://arxiv.org/abs/2401.14591v1
Compressor summary: The text describes a new method to learn nonlinear dynamics from PDEs using an autoencoder with an evolving manifold latent space based on Ricci flow.
http://arxiv.org/abs/2401.14589v1
Compressor summary: The study used a GPT-4 Turbo-based multi-agent system to simulate clinical decision-making conversations and improve diagnosis accuracy by mitigating cognitive biases.
http://arxiv.org/abs/2401.14587v1
Compressor summary: The proposed CNA-TTA method addresses domain shift by selectively training a model with clean and noisy regions in target data clusters using cluster structure and mixup inputs.
http://arxiv.org/abs/2401.14585v1
Compressor summary: The paper introduces DSS-OG, a new optimization method that improves upon the conventional stochastic gradient method for nonconvex problems and distributed scenarios, with a complexity comparable to its counterpart.
http://arxiv.org/abs/2401.14580v1
Compressor summary: The paper presents a model-agnostic framework that improves graph neural networks by introducing additional nodes and rewiring connections with both positive and negative weights, guided by node labeling information, to address over-smoothing, over-squashing, and heterophily adaptation issues.
http://arxiv.org/abs/2401.14579v1
Compressor summary: The study presents an advanced method for recognizing ingredients segmented from food images using CNNs and novel algorithms, with a focus on multi-ingredient recognition.
http://arxiv.org/abs/2401.14578v1
Compressor summary: Graph Output Attribution (GOAt) is a novel method to explain Graph Neural Networks (GNNs) by attributing graph outputs to input features, resulting in faithful, discriminative, and stable explanations.