This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-09, generated by the compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2402.05937v1
Compressor summary: The paper introduces InstaGen, a diffusion model enhanced with an instance-level grounding head that aligns text and visual features, enabling it to generate synthetic images containing arbitrary instances and improve object detector performance in open-vocabulary and data-sparse scenarios.
http://arxiv.org/abs/2402.05935v1
Compressor summary: SPHINX-X is a large multimodal language model that improves on SPHINX by modifying its architecture, using diverse datasets, and training on different base models.
http://arxiv.org/abs/2402.05934v1
Compressor summary: The authors propose a node classification method that does not use graph neural networks (GNNs) at any stage of training or testing, achieving state-of-the-art accuracy on popular benchmarks using smoothness constraints, pseudo-labeling and neighborhood histograms.
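Below is a minimal sketch of the neighborhood-histogram idea as I understand it (my own toy construction, not the paper's code): build per-node histograms of neighbors' pseudo-labels and feed them to a plain classifier, with no GNN anywhere.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def neighborhood_histograms(adj, pseudo_labels, num_classes):
    # For each node, count its neighbors' pseudo-labels and normalize
    # into a histogram feature; no message passing or GNN is involved.
    n = adj.shape[0]
    hist = np.zeros((n, num_classes))
    for i in range(n):
        for j in np.nonzero(adj[i])[0]:
            hist[i, pseudo_labels[j]] += 1
        if hist[i].sum() > 0:
            hist[i] /= hist[i].sum()
    return hist

# Toy usage: a 4-node path graph, with nodes 0 and 3 labeled.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
pseudo = np.array([0, 0, 1, 1])   # pseudo-labels for all nodes
X = neighborhood_histograms(adj, pseudo, num_classes=2)
train_idx = np.array([0, 3])      # the 'labeled' nodes
clf = LogisticRegression().fit(X[train_idx], pseudo[train_idx])
print(clf.predict(X))
```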
http://arxiv.org/abs/2402.05933v1
Compressor summary: The paper explores how representing time series in the frequency domain can improve score-based diffusion models for generative modelling and shows empirical evidence that frequency diffusion models perform better than time diffusion models.
http://arxiv.org/abs/2402.05929v1
Compressor summary: The Interactive Agent Foundation Model trains AI agents using diverse pre-training strategies to perform well in various applications like robotics, gaming, and healthcare.
http://arxiv.org/abs/2402.05930v1
Compressor summary: The authors introduce WEBLINX, a benchmark for conversational web navigation tasks, and evaluate different models on it, finding that finetuned multimodal models perform best but still struggle with generalization to new websites.
http://arxiv.org/abs/2402.05928v1
Compressor summary: This paper studies learning with dependent data using a special class of functions and shows how to minimize the risk without depending on the mixing time of the data.
http://arxiv.org/abs/2402.05919v1
Compressor summary: The paper proposes a method to generate physically-based rendering (PBR) images directly without relying on RGB images, using a novel cross-network communication paradigm.
http://arxiv.org/abs/2402.05917v1
Compressor summary: The paper introduces Point-VOS, a video object segmentation method that reduces annotation effort by using sparse point-wise annotations instead of dense masks.
http://arxiv.org/abs/2402.05916v1
Compressor summary: GenEFT is a framework that uses physics-inspired concepts to study neural network generalization, showing how data size, decoder strength, and learning rates affect the balance between generalization and overfitting.
http://arxiv.org/abs/2402.05913v1
Compressor summary: The proposed progressive subnetwork training framework trains smaller subsets of layers in a large language model at each step, achieving better pre-training loss, fewer FLOPs, and improved downstream performance compared to standard training or gradual stacking methods.
http://arxiv.org/abs/2402.05906v1
Compressor summary: The paper proposes a risk-sensitive reinforcement learning algorithm for non-cooperative multi-agent settings based on cumulative prospect theory, which can capture human loss aversion and probabilistic bias.
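For reference, the standard Tversky-Kahneman probability weighting function from cumulative prospect theory looks like this (the paper may use a different parameterization):

```python
import numpy as np

def cpt_weight(p, gamma=0.61):
    # Tversky-Kahneman probability weighting: overweights rare events
    # and underweights likely ones (gamma=1 recovers the identity).
    p = np.asarray(p, dtype=float)
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

print(cpt_weight([0.01, 0.5, 0.99]))  # small p inflated, large p deflated
```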
http://arxiv.org/abs/2402.05904v1
Compressor summary: FACT-GPT is a system that uses large language models to help fact-check claims on social media, speeding up the process and improving accuracy.
http://arxiv.org/abs/2402.05902v1
Compressor summary: ClickSAM is a method that improves the Segment Anything Model's ability to segment ultrasound images by fine-tuning it with click prompts from human annotators.
http://arxiv.org/abs/2402.05894v1
Compressor summary: LinguGKD is a novel framework that uses language models as teachers and graph neural networks as students to improve node classification in text-attributed graphs, achieving faster inference speed and better performance than traditional methods.
http://arxiv.org/abs/2402.05892v1
Compressor summary: Mamba-ND extends Mamba's state space models to handle arbitrary multi-dimensional data with improved efficiency compared to Transformers and other alternatives.
http://arxiv.org/abs/2402.05889v1
Compressor summary: CREMA is a modular framework that injects new modalities into video reasoning tasks by using existing pre-trained models, query transformers, and fusion modules to efficiently integrate diverse data types for response generation.
http://arxiv.org/abs/2402.05885v1
Compressor summary: EUGENE is an efficient algebraic method that approximates Graph Edit Distance and provides explanatory edit paths without requiring ground truth or data-specific training, achieving high accuracy in comparison to existing neural approaches.
http://arxiv.org/abs/2402.05880v1
Compressor summary: LLM-based conversational search may increase selective exposure and reinforce biased opinions, which could have significant implications for users and policymakers.
http://arxiv.org/abs/2402.05869v1
Compressor summary: The paper proposes a method to learn depth and surface normal from images using geometric context, which improves the accuracy of 3D geometry estimation and outperforms existing methods on various datasets.
http://arxiv.org/abs/2402.05868v1
Compressor summary: PromptCrypt is a mechanism that encrypts user inputs into emojis to protect privacy when using cloud-based large language models, without compromising their performance.
http://arxiv.org/abs/2402.05864v1
Compressor summary: The paper introduces Permute-and-Flip decoder, a faster and more robust method for large language model decoding with low false positive rate and high recall, and a tailored watermarking scheme to protect the generated text.
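The underlying Permute-and-Flip mechanism (McKenna & Sheldon, 2020) is simple enough to sketch as a token sampler; here temperature stands in for the privacy parameter, and the paper's actual decoder and watermarking scheme are not reproduced:

```python
import numpy as np

def permute_and_flip_sample(logits, temperature=1.0, rng=None):
    # Visit candidates in random order; accept candidate i with
    # probability exp((logit_i - max_logit) / T). The argmax is always
    # accepted when reached, so the loop terminates.
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    m = logits.max()
    while True:
        for i in rng.permutation(len(logits)):
            if rng.random() <= np.exp((logits[i] - m) / temperature):
                return int(i)

print(permute_and_flip_sample(np.log([0.7, 0.2, 0.1])))
```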
http://arxiv.org/abs/2402.05863v1
Compressor summary: The paper introduces NegotiationArena, a framework for evaluating how large language models (LLMs) can negotiate with each other in various scenarios, revealing their negotiation tactics, outcomes, and irrationalities.
http://arxiv.org/abs/2402.05862v1
Compressor summary: GraphToken is a method to represent structured data in language models, improving their performance on various reasoning tasks.
http://arxiv.org/abs/2402.05861v1
Compressor summary: The memory-consolidated vision transformer (MC-ViT) extends the context of video understanding far into the past by fine-tuning pre-trained video transformers to attend to non-parametrically derived memories, outperforming methods with more parameters.
http://arxiv.org/abs/2402.05860v1
Compressor summary: To address the catastrophic forgetting problem in DNNs for semantic segmentation in robot-assisted surgery, the authors propose a privacy-preserving synthetic continual framework that blends open-source old instruments with new ones on synthesized backgrounds, and enhance knowledge transfer with overlapping class-aware temperature normalization (CAT) and multi-scale shifted-feature distillation (SD).
http://arxiv.org/abs/2402.05628v1
Compressor summary: RepQuant is a novel post-training quantization method that uses complex quantizers for accurate compression and simple quantizers for efficient inference in large transformer models, achieving better performance than existing methods.
http://arxiv.org/abs/2402.05627v1
Compressor summary: The paper introduces a new mechanism called "cosine binding" that improves understanding of how Rotating Features learn object-centric representations in machine learning, similar to human cognition.
http://arxiv.org/abs/2402.05626v1
Compressor summary: The paper studies the loss landscape and stationarity conditions for one-hidden-layer neural networks with non-differentiable ReLU-like activation functions, showing how escape neurons affect the training process and network embedding.
http://arxiv.org/abs/2402.05624v1
Compressor summary: The text discusses the problem of hateful, abusive, and profane content in large language models trained on web data, and the need to detect and filter it in order to create civil and unbiased models.
http://arxiv.org/abs/2402.05617v1
Compressor summary: This survey provides an overview of deep learning methods, datasets, and terminologies in NLP-driven skill extraction and classification for computational job market analysis.
http://arxiv.org/abs/2402.05616v1
Compressor summary: The authors show that small pretrained generative language models can serve as a general learning framework for sequence-based tasks, demonstrate that instruction fine-tuning with many examples improves performance on challenging cheminformatics tasks, and highlight data formatting and pretrained model selection as key factors for fine-tuning success.
http://arxiv.org/abs/2402.05615v1
Compressor summary: The paper introduces DAPlankton, a new dataset for plankton recognition, which consists of phytoplankton images from different instruments and helps develop domain adaptation methods to overcome instrument-induced domain shifts.
http://arxiv.org/abs/2402.05610v1
Compressor summary: The authors propose a stereo vision method for 6D object pose estimation that uses dense features and outperforms existing methods.
http://arxiv.org/abs/2402.05608v1
Compressor summary: The paper introduces a new type of diffusion model for image generation that uses state space architecture, outperforming or matching CNN-based and Transformer-based models while being more scalable and efficient.
http://arxiv.org/abs/2402.05605v1
Compressor summary: The text proposes an AI manager for hybrid teams of humans and autonomous systems that uses Reinforcement Learning to select the best control agent based on performance and environment, minimizing interventions by avoiding constraint violations, and demonstrates its effectiveness in a simulated driving scenario.
http://arxiv.org/abs/2402.05602v1
Compressor summary: Our method accurately attributes both inputs and latent representations of transformer models with efficient computation, enabling better understanding and concept-based explanations.
http://arxiv.org/abs/2402.05593v1
Compressor summary: The text describes a method to reconstruct medieval statues from their original red sketches using automated techniques and synthetic data.
http://arxiv.org/abs/2402.05591v1
Compressor summary: The paper introduces a technique to improve text data augmentation by using soft labels, which preserves the original meaning and enhances model performance in seven classification tasks.
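Plain label smoothing is one common way to realize soft labels for augmented text; a sketch (the paper's per-sample weighting scheme may differ):

```python
import numpy as np

def soft_label(class_idx, num_classes, confidence=0.9):
    # Spread (1 - confidence) mass uniformly over the other classes,
    # so an aggressively augmented example is not treated as a
    # fully-confident instance of its original class.
    y = np.full(num_classes, (1.0 - confidence) / (num_classes - 1))
    y[class_idx] = confidence
    return y

print(soft_label(2, num_classes=4))  # [0.033, 0.033, 0.9, 0.033]
```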
http://arxiv.org/abs/2402.05589v1
Compressor summary: RESMatch is a new semi-supervised learning approach for referring expression segmentation that significantly improves performance by adapting to challenges in understanding free-form linguistic descriptions and object attributes variability.
http://arxiv.org/abs/2402.05584v1
Compressor summary: The paper proposes using AutoAugment to improve data augmentation for text tasks, addressing the challenges of rule-based methods and enhancing pre-trained language models.
http://arxiv.org/abs/2402.05581v1
Compressor summary: The study uses an unsupervised method with ABX tests to analyze how well vector representations of speech capture extra-linguistic and linguistic characteristics in low-resource language research.
http://arxiv.org/abs/2402.05576v1
Compressor summary: The text discusses how digital computers' finite grids affect machine learning models and proposes new generalization bounds for kernel and deep ReLU MLP regressors, using a non-asymptotic concentration of measure result.
http://arxiv.org/abs/2402.05575v1
Compressor summary: Bi-Level Fairness is a new approach for stochastic multi-armed bandits that ensures fair exposure to groups of arms and within-group meritocratic allocation, achieving optimal regret bounds and sub-linear regret with the BF-UCB algorithm.
http://arxiv.org/abs/2402.05571v1
Compressor summary: The study used machine learning and deep learning models to classify tweets related to eating disorders, finding that transformer-based bidirectional encoder representations performed best.
http://arxiv.org/abs/2402.05569v1
Compressor summary: The paper proposes WCE-GNN, a framework that combines graph neural networks with weighted clique expansion, for hypergraph node classification, achieving higher accuracy and efficiency than existing methods.
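A weighted clique expansion is easy to sketch; the 1/(|e|-1) weighting below is one common choice and may not match the paper's:

```python
import numpy as np
from itertools import combinations

def weighted_clique_expansion(num_nodes, hyperedges):
    # Each hyperedge e (assumed |e| >= 2) becomes a clique over its
    # nodes; pairs get weight 1/(|e|-1) so large hyperedges are not
    # over-counted in the resulting graph adjacency.
    A = np.zeros((num_nodes, num_nodes))
    for e in hyperedges:
        w = 1.0 / (len(e) - 1)
        for u, v in combinations(e, 2):
            A[u, v] += w
            A[v, u] += w
    return A

print(weighted_clique_expansion(4, [(0, 1, 2), (2, 3)]))
```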
http://arxiv.org/abs/2402.05566v1
Compressor summary: The paper proposes a method to combine SHAP and NSHAP, two approaches for explaining black-box models, by partitioning features into interacting sets and generating an interpretable explanation with fewer false interactions.
http://arxiv.org/abs/2402.05557v1
Compressor summary: The Convolutional vision Transformer (CvT) is a new method for yield prediction on remote sensing data that combines convolution and attention, but currently lags behind other approaches.
http://arxiv.org/abs/2402.05548v1
Compressor summary: The study evaluates different classifiers for assessing expression neutrality in face images and its impact on face recognition performance.
http://arxiv.org/abs/2402.05547v1
Compressor summary: ChatCoach is a system that uses NLP and AI to help inexperienced doctors practice medical communication skills with a patient agent and receive real-time feedback from a coaching agent, with Llama2 found to be the most effective model.
http://arxiv.org/abs/2402.05546v1
Compressor summary: Offline actor-critic reinforcement learning can scale to large models like transformers and outperform behavioral cloning for multi-task training on continuous control tasks using a Perceiver-based model with self- and cross-attention modules.
http://arxiv.org/abs/2402.05545v1
Compressor summary: The paper presents an NER model based on SlovakBERT that extracts address parts from speech-to-text transcriptions, and shows its effectiveness when trained on synthetic data generated with GPT API.
http://arxiv.org/abs/2402.05536v1
Compressor summary: The authors propose a hybrid approach that combines knowledge graphs with AI to enhance the categorization of social media posts, particularly for identifying eating disorder-related content to aid in early diagnosis.
http://arxiv.org/abs/2402.05532v1
Compressor summary: The paper presents a new framework called Neural Contact Radiance Field (NCRF) to reconstruct hand-object interactions from sparse videos, using a contact optimization field and a hand-object neural radiance field to achieve photo-realistic novel view synthesis and accurate pose estimation.
http://arxiv.org/abs/2402.05525v1
Compressor summary: The paper proposes DP-MORL, a method for training private reinforcement learning agents from offline data using differentially private neural networks and model-based policy optimization.
http://arxiv.org/abs/2402.05521v1
Compressor summary: RLNet is a robust and efficient model that reduces latency by minimizing ReLU operations, improving performance on clean and corrupted images in client-server applications with data privacy concerns.
http://arxiv.org/abs/2402.05515v1
Compressor summary: NoisyICL perturbs language models to improve in-context learning performance, calibration, fairness, and confidence.
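The core operation, lightly perturbing a model's parameters before in-context inference, fits in a few lines (noise scale and placement here are illustrative, not the paper's tuned settings):

```python
import torch

@torch.no_grad()
def perturb_parameters(model, scale=1e-3, seed=0):
    # Add i.i.d. Gaussian noise to every weight; a light perturbation
    # of this kind is the essence of the NoisyICL idea.
    g = torch.Generator().manual_seed(seed)
    for p in model.parameters():
        p.add_(scale * torch.randn(p.shape, generator=g))

model = torch.nn.Linear(8, 2)        # stand-in for a language model
perturb_parameters(model, scale=1e-3)
```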
http://arxiv.org/abs/2402.05512v1
Compressor summary: The study introduces a cost-efficient method of data annotation for low-resource languages using large language models, and shares an image captioning dataset and the source code.
http://arxiv.org/abs/2402.05495v1
Compressor summary: Cardiovascular diseases are a major cause of death and hard to diagnose given the many variables involved; the authors propose combining deep learning with feature augmentation to assess patients' risk, achieving 90% precision, a significant improvement for early detection and prevention.
http://arxiv.org/abs/2402.05491v1
Compressor summary: The paper proposes a non-intrusive voice analysis method using deep learning techniques to diagnose and monitor severe or non-severe Parkinson's disease, achieving high success rates and outperforming previous approaches.
http://arxiv.org/abs/2402.05476v1
Compressor summary: A novel model-free ensemble reinforcement learning algorithm adapts classical Q-learning to solve network control problems more efficiently and accurately in unknown environments.
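For orientation, classical tabular Q-learning extended to a small ensemble (with ensemble-averaged action selection) looks roughly like this; the paper's algorithm does more than this sketch shows:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_models = 5, 3, 4
Q = rng.normal(0, 0.01, size=(n_models, n_states, n_actions))  # ensemble

def update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Classical Q-learning target applied to each ensemble member.
    for k in range(n_models):
        target = r + gamma * Q[k, s_next].max()
        Q[k, s, a] += alpha * (target - Q[k, s, a])

def act(s):
    # Greedy action w.r.t. the ensemble-averaged value estimate.
    return int(Q[:, s].mean(axis=0).argmax())

update(s=0, a=1, r=1.0, s_next=2)
print(act(0))
```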
http://arxiv.org/abs/2402.05472v1
Compressor summary: QA-ViT is a Question Aware Vision Transformer that improves multimodal reasoning by embedding question awareness in the vision encoder, resulting in better visual features tailored to image questions.
http://arxiv.org/abs/2402.05468v1
Compressor summary: The new algorithm optimizes distributions from parameterized stochastic diffusions by jointly performing optimization and sampling steps, leveraging advances in bilevel optimization and automatic implicit differentiation.
http://arxiv.org/abs/2402.05467v1
Compressor summary: RIPPLE is a new method that optimizes jailbreaking prompts for language models, drawing on the psychological concepts of subconsciousness and echopraxia, and achieves high success rates while evading detection.
http://arxiv.org/abs/2402.05457v1
Compressor summary: The text describes a new method, Uncertainty-Aware Dynamic Fusion (UADF), that improves generative error correction in automatic speech recognition by incorporating acoustic information into the language model's output.
http://arxiv.org/abs/2402.05455v1
Compressor summary: The study examines whether language models can replace human evaluators in psycholinguistics by generating plausibility judgments for linguistic materials, finding that they work well for coarse-grained judgments but not for fine-grained ones.
http://arxiv.org/abs/2402.05453v1
Compressor summary: The Convex-Concave Loss method enhances privacy and utility in machine learning models by adding a concave term that increases the variance of training losses.
http://arxiv.org/abs/2402.05448v1
Compressor summary: The paper introduces Minecraft-ify, a system that generates face-focused textures for 3D virtual characters in the Minecraft game using AI techniques like StyleGAN and StyleCLIP.
http://arxiv.org/abs/2402.05445v1
Compressor summary: IR-QLoRA improves the accuracy of compact LLMs through information retention using statistics-based quantization and finetuning-based elastic connections, with minimal additional time consumption.
http://arxiv.org/abs/2402.05443v1
Compressor summary: The paper introduces a scalable Wasserstein Gradient Flow model that reduces training complexity and achieves competitive performance in image generation.
http://arxiv.org/abs/2402.05441v1
Compressor summary: The paper presents a compact spiking neural network that recognizes hand gestures in different light conditions using photon intensity data and compares its performance with a conventional CNN and an SMLP.
http://arxiv.org/abs/2402.05440v1
Compressor summary: The text discusses improving AI communication skills by using language models to enhance task understanding on a Minecraft dataset, showing better results than previous methods.
http://arxiv.org/abs/2402.05439v1
Compressor summary: UTE is a novel algorithm that uses uncertainty measurement to improve reinforcement learning with action repetition by balancing exploration and exploitation.
http://arxiv.org/abs/2402.05435v1
Compressor summary: The study evaluates GPT-4's ability to generate coherent narratives about life events and develops Machine Learning models to classify the generated narratives as valid or invalid.
http://arxiv.org/abs/2402.05428v1
Compressor summary: The paper proposes two models based on mixture density networks (MDNs) for classification tasks, which fit Gaussian mixtures to data, classify samples by evaluating the cumulative distribution function, and perform well in a real-world product bundling application.
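In one dimension the CDF-based classification idea is easy to sketch; this toy version with a hand-picked decision rule is my own reading, while the paper's MDN-based models are neural and fit the mixtures from data:

```python
import numpy as np
from scipy.stats import norm

def mixture_cdf(x, weights, means, stds):
    # CDF of a 1-D Gaussian mixture: weighted sum of component CDFs.
    return sum(w * norm.cdf(x, m, s) for w, m, s in zip(weights, means, stds))

# Two single-Gaussian 'classes'; assign x to the class whose CDF at x
# is closest to 0.5, i.e. where x is most 'typical' of the class.
params = {0: ([1.0], [0.0], [1.0]), 1: ([1.0], [4.0], [1.0])}
x = 3.2
scores = {c: abs(mixture_cdf(x, *p) - 0.5) for c, p in params.items()}
print(min(scores, key=scores.get))  # -> 1
```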
http://arxiv.org/abs/2402.05427v1
Compressor summary: The paper analyzes implicit neural representations using sampling theory and shows that sinc activations and dynamical systems can improve signal encoding.
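A sinc activation drops straight into a small implicit-representation MLP; a minimal sketch with illustrative layer sizes:

```python
import torch
import torch.nn as nn

class Sinc(nn.Module):
    def forward(self, x):
        return torch.sinc(x)  # sin(pi*x)/(pi*x), with sinc(0) = 1

# Coordinate -> signal MLP, as in implicit neural representations.
inr = nn.Sequential(
    nn.Linear(2, 64), Sinc(),
    nn.Linear(64, 64), Sinc(),
    nn.Linear(64, 3),  # e.g. RGB at each (x, y) coordinate
)
print(inr(torch.rand(5, 2)).shape)  # torch.Size([5, 3])
```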
http://arxiv.org/abs/2402.05424v1
Compressor summary: Neural circuit diagrams are a new graphical language that improves communication of deep learning architectures by precisely showing data arrangement, operations, and parallel behavior, enabling better implementation, analysis, innovation, and ethical assurance.
http://arxiv.org/abs/2402.05423v1
Compressor summary: The paper proposes a new spiking neural network model that can handle complex, non-stationary time series data and achieves better performance on three tasks.
http://arxiv.org/abs/2402.05421v1
Compressor summary: DiffTOP is a new method for deep reinforcement and imitation learning that uses differentiable trajectory optimization to learn cost and dynamics functions end-to-end, achieving state-of-the-art results on various robotic tasks.
http://arxiv.org/abs/2402.05417v1
Compressor summary: The paper proposes a new text captcha classification model using connectionist temporal classification loss that achieves high accuracy and handles complex captchas.
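For context, here is how connectionist temporal classification loss is wired up in PyTorch for variable-length strings like captchas (shapes only; the paper's model is not shown):

```python
import torch
import torch.nn as nn

T, N, C = 24, 2, 37            # time steps, batch, classes (36 chars + blank)
logits = torch.randn(T, N, C, requires_grad=True)  # stand-in model output
log_probs = logits.log_softmax(2)
targets = torch.randint(1, C, (N, 5))              # two 5-character captchas
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 5, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)      # index 0 is reserved for the CTC blank
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```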
http://arxiv.org/abs/2402.05410v1
Compressor summary: SpirDet is a novel approach that uses a dual-branch sparse decoder and a lightweight DO-RepEncoder to efficiently detect small infrared targets, achieving faster inference speed and fewer parameters than existing models.
http://arxiv.org/abs/2402.05408v1
Compressor summary: The Multi-Instance Generation (MIG) task involves generating diverse instances at designated locations with accurate attributes, and a new approach called MIGC is proposed to tackle this challenge using instance enhancement attention and stable diffusion.
http://arxiv.org/abs/2402.05406v1
Compressor summary: The authors propose Bonsai, a method to prune large language models using only forward passes, which results in small, fast, and accurate models that outperform existing methods.
http://arxiv.org/abs/2402.05403v1
Compressor summary: LEAP is a method to improve LLMs' performance on various tasks by learning from mistakes and general principles derived from few input-output examples, outperforming standard few-shot prompting approaches.
http://arxiv.org/abs/2402.05401v1
Compressor summary: This paper investigates how different types of adaptive activation functions affect the accuracy and uncertainty of neural networks in settings with limited data, finding that individual trainable parameters lead to better results than fixed or identical ones.
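PyTorch's PReLU makes the contrast concrete: one shared trainable slope versus one slope per neuron, a simple instance of the "individual trainable parameters" setting the paper studies:

```python
import torch.nn as nn

shared = nn.PReLU()                        # a single trainable slope
per_neuron = nn.PReLU(num_parameters=64)   # one trainable slope per unit

net = nn.Sequential(nn.Linear(16, 64), per_neuron, nn.Linear(64, 1))
print(sum(p.numel() for p in per_neuron.parameters()))  # 64
```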
http://arxiv.org/abs/2402.05400v1
Compressor summary: Loss Conditional Training (LCT) improves imbalanced binary classification by training over a family of loss functions, making models more general and robust to hyperparameter choices.
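The gist of loss conditional training is to sample a loss hyperparameter per batch and let the model condition on it; a toy instantiation for the class-weight case (my own, the paper's conditioning mechanism may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(9, 1)  # last input dim carries the conditioning value

def lct_step(x, y):
    # Sample a positive-class weight from a log-uniform family, use it
    # both in the loss and as an extra input the model can condition on.
    w = 10 ** torch.empty(1).uniform_(-1, 1)
    x_cond = torch.cat([x, w.log10().expand(x.size(0), 1)], dim=1)
    logits = model(x_cond).squeeze(1)
    return F.binary_cross_entropy_with_logits(logits, y, pos_weight=w)

loss = lct_step(torch.randn(32, 8), torch.randint(0, 2, (32,)).float())
loss.backward()
```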
http://arxiv.org/abs/2402.05398v1
Compressor summary: The study presents a streamlined model that can produce high-resolution semantic segmentations without downscaling images or losing details, improving performance using bottom-up information propagation and Noisy Student Training.
http://arxiv.org/abs/2402.05396v1
Compressor summary: TASER is a novel adaptive sampling method for Temporal Graph Neural Networks that improves accuracy, efficiency, and scalability by selecting optimal mini-batches and temporal neighbors based on various properties of the graph data.
http://arxiv.org/abs/2402.05394v1
Compressor summary: ExpressCount is a novel method that uses language-guided exemplar learning to improve zero-shot object counting efficiency and generality by exploiting semantic priors from pre-trained Large Language Models and enhancing similarity learning with dual-branch and cross-attention schemes.
http://arxiv.org/abs/2402.05391v1
Compressor summary: The text surveys over 300 articles on Knowledge Graphs (KGs) and Multi-Modal Knowledge Graphs (MMKGs), exploring their construction, tasks, challenges, and emerging trends in AI applications.
http://arxiv.org/abs/2402.05382v1
Compressor summary: MoCE is a novel MAE-based pre-training method that uses cluster-conditional gates to train each expert with semantically relevant images, improving performance on diverse downstream tasks.
http://arxiv.org/abs/2402.05379v1
Compressor summary: The paper explores two methods for estimating the Fisher information matrix in neural networks, analyzing their accuracy, sample complexity, and trade-offs based on variance dependencies and nonlinearities.
http://arxiv.org/abs/2402.05376v1
Compressor summary: The paper proposes a new zero-shot prompting method for large language models that uses evolutionary algorithms to generate diverse prompts and improve reasoning performance on different tasks.
http://arxiv.org/abs/2402.05375v1
Compressor summary: The paper proposes two methods to improve text-to-image diffusion models by removing unwanted content from text embeddings, enhancing their ability to generate images as described in the prompt.
http://arxiv.org/abs/2402.05374v1
Compressor summary: The paper introduces Culturally-aware Image Captioning, a framework that generates detailed captions describing cultural elements in images, such as traditional clothing from Asian cultures.
http://arxiv.org/abs/2402.05370v1
Compressor summary: The authors propose an attention map structure for transformer-based models to improve multivariate time series forecasting accuracy by leveraging temporal relationships and robust kernel representation.
http://arxiv.org/abs/2402.05369v1
Compressor summary: The paper proposes a general framework for aligning language models using Noise Contrastive Estimation and introduces two algorithms, NCA and InfoNCA, that handle explicit evaluation rewards and preferences.
http://arxiv.org/abs/2402.05367v1
Compressor summary: The paper proposes an optimistic algorithm for preferential Bayesian optimization using preference feedback and a confidence set of the black-box function, with an information-theoretic bound on the cumulative regret and a guaranteed convergence rate.
http://arxiv.org/abs/2402.05359v1
Compressor summary: Our method improves LLMs' ability to handle tasks with repetitive sub-tasks and deceptive content using a Divide-and-Conquer program that enhances expressive power and avoids intermediate errors.
http://arxiv.org/abs/2402.05356v1
Compressor summary: The paper proposes a scoring function based on learning complexity that prunes informative samples for fine-tuning over-parameterized models, proving efficient and effective for classification tasks and outperforming full training for instruction fine-tuning of language models.
http://arxiv.org/abs/2402.05350v1
Compressor summary: The paper introduces DESCAN-18K, a dataset for image restoration from scanned copies with complex degradations, and proposes DescanDiffusion, a model that uses an encoder and a conditional denoising diffusion probabilistic model to restore high-quality images.
http://arxiv.org/abs/2402.05349v1
Compressor summary: Pyro is a web-scraped dataset of annotated wildfire videos for early detection and rapid response, improving object detection models when combined with other datasets.
http://arxiv.org/abs/2402.05346v1
Compressor summary: The KIX framework helps artificial agents learn generalist behaviors by interacting with objects and using type space to acquire transferable interaction concepts.