This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-13, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.06335v1
Compressor summary: The paper proposes Colbert, a model for humor detection that combines syntactic, semantic, and contextual features with BERT embeddings to capture sentence congruity and improve accuracy on unseen data.
http://arxiv.org/abs/2408.06333v1
Compressor summary: FastFiD is a novel approach that speeds up Open Domain Question Answering by selecting valuable sentences from encoded passages, achieving up to 5.7X faster inference while maintaining performance on three datasets.
http://arxiv.org/abs/2408.06332v1
Compressor summary: The paper investigates if LLMs can process animacy like humans do, using different prompting approaches and finding that they show human-like behavior when dealing with typical animate and inanimate entities.
http://arxiv.org/abs/2408.06327v1
Compressor summary: VisualAgentBench (VAB) is a new benchmark for evaluating large multimodal models as visual foundation agents in diverse scenarios, showing their strengths and weaknesses and helping future development.
http://arxiv.org/abs/2408.06318v1
Compressor summary: The paper investigates how large language models perform in real-world planning tasks, identifying their weaknesses and proposing Feedback-Aware Fine-Tuning to improve them.
http://arxiv.org/abs/2408.06310v1
Compressor summary: OWL2Vec4OA is a new method for aligning ontologies that uses edge confidence values to guide its random walk strategy, improving its performance for this task.
http://arxiv.org/abs/2408.06303v1
Compressor summary: VizWiz-LF is a dataset of long-form visual question answers for blind and low vision users, which can provide explanations and suggestions but may contain incorrect details when generated by VQA models.
http://arxiv.org/abs/2408.06302v1
Compressor summary: The paper introduces a method to improve the interpretability of deep binary classifiers by selecting and visualizing representative samples from their decision boundary, helping to identify low-confidence decisions and ensure reliable machine learning systems.
http://arxiv.org/abs/2408.06297v1
Compressor summary: The paper proposes a robust online convex optimization framework with a novel non-convex loss function and provides regret guarantees and experimental validation.
http://arxiv.org/abs/2408.06292v1
Compressor summary: The paper presents The AI Scientist, a framework that enables large language models to independently conduct scientific research and communicate their findings in the form of papers.
http://arxiv.org/abs/2408.06291v1
Compressor summary: Mambular is a modified Mamba architecture optimized for tabular data analysis that shows competitive performance against state-of-the-art models and explores various adaptations for better understanding and expanding deep learning applications in this domain.
http://arxiv.org/abs/2408.06285v1
Compressor summary: SynDial is an approach that uses a large language model to generate and refine synthetic medical dialogues from clinical notes while ensuring privacy, quality, and performance.
http://arxiv.org/abs/2408.06286v1
Compressor summary: The paper proposes a method to improve 3D Gaussian Splatting for novel view synthesis by making Gaussians adaptive to different scales using self-adjusting properties and distribution, inspired by the mipmap technique.
http://arxiv.org/abs/2408.06281v1
Compressor summary: The paper introduces MovieSum, a new dataset for abstractive summarization of movie screenplays, which is larger and more diverse than existing datasets.
http://arxiv.org/abs/2408.06276v1
Compressor summary: EXP3RT is a novel Large Language Model-based recommender that extracts preferences from user and item reviews, generates detailed reasoning, and predicts ratings with high accuracy and explainability.
http://arxiv.org/abs/2408.06273v1
Compressor summary: FuxiTranyu is a multilingual language model that performs well on various tasks across different languages, and it is open-source for research purposes.
http://arxiv.org/abs/2408.06266v1
Compressor summary: The text proposes CLAIR and APO, methods to improve LLM alignment using contrastive preference pairs and controllable objectives, achieving significant gains in performance.
http://arxiv.org/abs/2408.06262v1
Compressor summary: The text introduces a new AI architecture (DUNE) that uses ERA5 monthly data to make global monthly and seasonal temperature forecasts, which outperform existing methods in accuracy and resolution.
http://arxiv.org/abs/2408.06261v1
Compressor summary: The authors introduce open-source infrastructure to easily build generative molecular models into the DeepChem library using MolGAN and Normalizing Flows.
http://arxiv.org/abs/2408.06259v1
Compressor summary: The proposed framework generates diverse and coherent stories from image sequences using pretrained models and a multimodal contrastive objective.
http://arxiv.org/abs/2408.06245v1
Compressor summary: The paper introduces a novel low light image enhancement method (LDE-Net) that disentangles the input image into clean Content and Illumination components in latent space and enhances the Illumination using Content features, showing better results on LLIE benchmarks and downstream tasks.
http://arxiv.org/abs/2408.06227v1
Compressor summary: FLEURS-R is a speech restoration model for 102 languages that improves audio quality and fidelity, enabling better speech technology in low-resource languages.
http://arxiv.org/abs/2408.06223v1
Compressor summary: The paper explores Representation Misdirection for Unlearning (RMU), a method for deleting information from large language models, and proposes Adaptive RMU to improve its effectiveness across different layers.
http://arxiv.org/abs/2408.06220v1
Compressor summary: The text introduces a digital twin framework that uses machine learning to predict tire health, update its model, and make maintenance decisions, enhancing automotive safety and efficiency.
http://arxiv.org/abs/2408.06212v1
Compressor summary: The text discusses the limitations of deep learning in providing performance guarantees for safety-critical applications due to its lack of trustworthiness and computational feasibility issues.
http://arxiv.org/abs/2408.06202v1
Compressor summary: The paper proposes a state abstraction technique for strategy games that limits the number of nodes grouped, improving AI search performance and avoiding the need to abandon abstractions.
http://arxiv.org/abs/2408.06199v1
Compressor summary: The paper presents a novel method to use blocked clause elimination for projected model counting, which counts the number of models of a propositional formula after removing existentially projected variables, by focusing on them during the search and introducing a new data structure.
http://arxiv.org/abs/2408.06195v1
Compressor summary: rStar is a self-play method that improves reasoning in small language models by generating and verifying human-like reasoning trajectories without fine-tuning or superior models.
http://arxiv.org/abs/2408.06186v1
Compressor summary: Structural diversity is a user-defined metric for measuring text diversity based on features, and chain-of-specification prompting helps improve it in large language models.
http://arxiv.org/abs/2408.06167v1
Compressor summary: Blind-Match is a fast and private biometric identification system that uses homomorphic encryption for efficient 1:N matching and outperforms existing methods.
http://arxiv.org/abs/2408.06158v1
Compressor summary: OmniCLIP is a video recognition framework that adapts CLIP by learning omni-scale features using spatial-temporal blocks, temporal adapters, and a self-prompt generator, achieving improved performance in various tasks.
http://arxiv.org/abs/2408.06157v1
Compressor summary: HawkI++ is a method for generating camera-controlled views from a single image that handles complex scenes without 3D data or extensive training.
http://arxiv.org/abs/2408.06150v1
Compressor summary: The study presents LipidBERT, a BERT-like model that learns from a database of 10 million virtual lipids and predicts LNP properties, and shows its superior performance over the GPT-like PhatGPT on downstream tasks and its usefulness for screening and in vivo testing.
http://arxiv.org/abs/2408.06145v1
Compressor summary: The proposed point cloud U-Net diffusion architecture can generate diverse and high-quality 3D shapes quickly and efficiently, outperforming previous methods on various tasks such as unconditional generation, conditional generation, implicit generation, completion, and super-resolution.
http://arxiv.org/abs/2408.06142v1
Compressor summary: Med42-v2 is a new generation of large language models tailored for healthcare, with improved performance and ability to handle clinical queries and reasoning tasks.
http://arxiv.org/abs/2408.06137v1
Compressor summary: MR3D-Net is a dynamic architecture for fusing LiDAR data from multiple vehicles, improving 3D object detection and reducing bandwidth requirements.
http://arxiv.org/abs/2408.06124v1
Compressor summary: The authors built and fine-tuned language models to translate Wikipedia categories from English to Vietnamese, achieving high performance with minimal resources.
http://arxiv.org/abs/2408.06123v1
Compressor summary: The paper introduces DPDETR, a method for robust object detection using infrared and visible images, addressing modality misalignment and fusing complementary features with decoupled position attention and training strategies.
http://arxiv.org/abs/2408.06121v1
Compressor summary: The paper proposes an ensemble learning-based anomaly detection approach for dynamic knowledge graphs in microservices using different graph representations and machine/deep learning models.
http://arxiv.org/abs/2408.06120v1
Compressor summary: The text analyzes how media coverage of AI changed after the launch of ChatGPT, showing increased focus on dangers and risks, and a shift in the types of threats and human-like qualities attributed to AI.
http://arxiv.org/abs/2408.06101v1
Compressor summary: The paper explores the ability of MeshGraphNets to predict fluid dynamics around unseen shapes using a new benchmark dataset for computational fluid dynamics.
http://arxiv.org/abs/2408.06099v1
Compressor summary: The paper proposes HFM, a new fairness measure that uses distances between sets from a manifold perspective to evaluate discrimination in ML models with multiple sensitive attributes, and introduces two approximation algorithms (ApproxDist and ExtendDist) that accelerate the computation, with theoretical and empirical evidence for their effectiveness.
http://arxiv.org/abs/2408.06087v1
Compressor summary: The paper proposes a new method called "Learning then Using" (LTU) that leverages large language models' generalization capabilities to improve decision making across different domains and scenarios.
http://arxiv.org/abs/2408.06083v1
Compressor summary: The paper proposes a method to improve monocular depth estimation for non-Lambertian surfaces by using regional guidance, tone-mapping augmentation, and lighting fusion.
http://arxiv.org/abs/2408.06072v1
Compressor summary: CogVideoX is a large-scale diffusion transformer model that uses a 3D VAE to compress videos and an expert transformer for text-video alignment, achieving state-of-the-art results in generating coherent, long-duration videos from text prompts.
http://arxiv.org/abs/2408.06070v1
Compressor summary: ControlNeXt is an efficient method for controllable image and video generation that reduces learnable parameters, uses minimal additional cost, and integrates with LoRA weights.
http://arxiv.org/abs/2408.06069v1
Compressor summary: The paper proposes a fully Bayesian approach to learn kernel hyperparameters and inducing points in differential Gaussian processes using coupled stochastic differential equations, improving flexibility, accuracy, and realistic posterior approximation.
http://arxiv.org/abs/2408.06068v1
Compressor summary: RHEA CL is a method that uses evolutionary algorithms and curriculum learning to create effective training plans for reinforcement learning agents, which can improve their performance compared to other curriculum schedules.
http://arxiv.org/abs/2408.06067v1
Compressor summary: The study introduces a novel method using a neural network surrogate and Projected Gradient Descent to efficiently and accurately calibrate finite element models of human intervertebral discs for spinal treatments, reducing calibration time from days to seconds.
http://arxiv.org/abs/2408.06065v1
Compressor summary: The paper introduces explainable audio hate speech detection methods that locate specific time intervals in speech as evidence for classification, and creates a synthetic dataset for training and evaluation.
http://arxiv.org/abs/2408.06063v1
Compressor summary: TruVRF is a framework that verifies the honesty of model providers when they perform machine unlearning on legacy data, using three metrics to detect different types of dishonesty.
http://arxiv.org/abs/2408.06062v1
Compressor summary: The paper criticizes the epistemic culture of computational linguistics for its reliance on tables with numbers, which are epistemically irrelevant, environmentally harmful, socially unjust, and commercially motivated.
http://arxiv.org/abs/2408.06051v1
Compressor summary: The authors introduce three enhancements to Playstyle Distance, a metric that measures playstyle similarity in games, which improves accuracy and offers insights into human cognition of similarity.
http://arxiv.org/abs/2408.06050v1
Compressor summary: The paper analyzes why generative models for structure-based drug design using graph neural networks perform poorly and proposes a simpler, faster, and more efficient alternative.
http://arxiv.org/abs/2408.06047v1
Compressor summary: The authors propose a new method for virtual try-on using realistic synthesis without precise masks, improving performance in complex wild scenarios with minimal input.
http://arxiv.org/abs/2408.06044v1
Compressor summary: The authors propose the DESC dataset for assessing depression symptoms in mental health chatbots and show it has better diagnostic accuracy and conversation quality than existing data.
http://arxiv.org/abs/2408.06043v1
Compressor summary: The paper proposes Context Noise Representation Learning (CNRL) to improve dialogue speech recognition accuracy by enhancing robustness against noisy context and using decoder pre-training and noise representation learning for a context encoder.
http://arxiv.org/abs/2408.06040v1
Compressor summary: ARPA is a new architecture that combines language understanding with visual context using transformers and graph neural networks, achieving significant improvements in visual word disambiguation and paving the way for future advances in AI.
http://arxiv.org/abs/2408.06039v1
Compressor summary: The SET model uses equivariant Transformers to capture both spatial and temporal dynamics in graph data, leading to better performance than non-equivariant models on the charged $N$-body problem.
http://arxiv.org/abs/2408.06029v1
Compressor summary: GCCFP is a novel method that uses multi-view feature propagation to improve graph clustering by considering both vertex features and graph topology.
http://arxiv.org/abs/2408.06024v1
Compressor summary: The paper proposes a new matrix decomposition method for convolutional layers to reduce model size and speed up network operations, while maintaining model quality using a subset of convolutions called basis convolutions.
http://arxiv.org/abs/2408.06019v1
Compressor summary: The paper proposes a 3D head avatar creation method that uses prior knowledge from a large-scale dataset and applies it to few-shot personalization for realistic rendering, multi-view consistency, and stable animation.
http://arxiv.org/abs/2408.06010v1
Compressor summary: DEEPTalk is a method to generate realistic and emotional 3D facial animations from speech using probabilistic contrastive learning and a temporally hierarchical VQ-VAE.
http://arxiv.org/abs/2408.06000v1
Compressor summary: Image-to-image translation and style transfer are both deep learning technologies that generate realistic images based on input images, but they differ in concepts, forms, training modes, evaluation processes, and visualization results.