arxiv compressed, 2024-08-13

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-13 generated by the compressor, my personal LLM-based project.


LOLgorithm: Integrating Semantic, Syntactic and Contextual Elements for Humor Classification

http://arxiv.org/abs/2408.06335v1

Compressor summary: The paper proposes Colbert, a model for humor detection that combines syntactic, semantic, and contextual features with BERT embeddings to capture sentence congruity and improve accuracy on unseen data.


FastFiD: Improve Inference Efficiency of Open Domain Question Answering via Sentence Selection

http://arxiv.org/abs/2408.06333v1

Compressor summary: FastFiD is a novel approach that speeds up open-domain question answering by selecting valuable sentences from the encoded passages, achieving up to 5.7x faster inference while maintaining performance on three datasets.


Animate, or Inanimate, That is the Question for Large Language Models

http://arxiv.org/abs/2408.06332v1

Compressor summary: The paper investigates whether LLMs process animacy as humans do, using different prompting approaches, and finds that they behave in a human-like way with typical animate and inanimate entities.


VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

http://arxiv.org/abs/2408.06327v1

Compressor summary: VisualAgentBench (VAB) is a new benchmark for evaluating large multimodal models as visual foundation agents across diverse scenarios; it reveals their strengths and weaknesses to guide future development.


Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example

http://arxiv.org/abs/2408.06318v1

Compressor summary: The paper investigates how large language models perform in real-world planning tasks, identifying their weaknesses and proposing Feedback-Aware Fine-Tuning to improve them.


OWL2Vec4OA: Tailoring Knowledge Graph Embeddings for Ontology Alignment

http://arxiv.org/abs/2408.06310v1

Compressor summary: OWL2Vec4OA is a new method for aligning ontologies that uses edge confidence values to guide its random walk strategy, improving its performance for this task.
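A confidence-guided walk of this kind can be sketched as a random walk whose next hop is sampled in proportion to edge confidence. This is a minimal sketch under assumptions: the adjacency format, the hypothetical `confidence_walk` function, and purely proportional sampling are illustrative choices, not OWL2Vec4OA's actual walk strategy.

```python
import random

# Hypothetical graph format: node -> list of (neighbor, edge confidence) pairs.
Graph = dict[str, list[tuple[str, float]]]


def confidence_walk(graph: Graph, start: str, length: int, rng: random.Random) -> list[str]:
    """Random walk whose next hop is sampled in proportion to edge confidence."""
    walk = [start]
    node = start
    while len(walk) < length:
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: stop the walk early
        neighbors = [n for n, _ in edges]
        weights = [conf for _, conf in edges]
        node = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(node)
    return walk


if __name__ == "__main__":
    toy = {
        "Person": [("Agent", 0.9), ("Thing", 0.1)],
        "Agent": [("Person", 0.9)],
        "Thing": [],
    }
    print(confidence_walk(toy, "Person", length=5, rng=random.Random(0)))
```

High-confidence edges are visited more often, so the walks (and any embeddings trained on them) are biased toward reliable parts of the graph; an edge with confidence 0 is never taken.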


Long-Form Answers to Visual Questions from Blind and Low Vision People

http://arxiv.org/abs/2408.06303v1

Compressor summary: VizWiz-LF is a dataset of long-form answers to visual questions from blind and low vision users; such answers can provide explanations and suggestions, but answers generated by VQA models may contain incorrect details.


Finding Patterns in Ambiguity: Interpretable Stress Testing in the Decision Boundary

http://arxiv.org/abs/2408.06302v1

Compressor summary: The paper introduces a method to improve the interpretability of deep binary classifiers by selecting and visualizing representative samples from their decision boundary, helping to identify low-confidence decisions and ensure reliable machine learning systems.


LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization

http://arxiv.org/abs/2408.06297v1

Compressor summary: The paper proposes LEARN, a novel invex (and generally non-convex) loss function for outlier-robust online convex optimization, and provides regret guarantees and experimental validation.


The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

http://arxiv.org/abs/2408.06292v1

Compressor summary: The paper presents The AI Scientist, a framework that enables large language models to independently conduct scientific research and communicate their findings in the form of papers.


Mambular: A Sequential Model for Tabular Deep Learning

http://arxiv.org/abs/2408.06291v1

Compressor summary: Mambular adapts the Mamba architecture to tabular data, performs competitively with state-of-the-art models, and explores several architectural adaptations to better understand and broaden deep learning for tabular data.


Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM

http://arxiv.org/abs/2408.06285v1

Compressor summary: SynDial is an approach that uses a large language model to generate and refine synthetic medical dialogues from clinical notes while ensuring privacy, quality, and performance.


Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering

http://arxiv.org/abs/2408.06286v1

Compressor summary: The paper proposes a method to improve 3D Gaussian Splatting for novel view synthesis at different scales by letting Gaussians self-adjust their properties and distribution to each scale, inspired by the mipmap technique, to achieve anti-aliased rendering.


MovieSum: An Abstractive Summarization Dataset for Movie Screenplays

http://arxiv.org/abs/2408.06281v1

Compressor summary: The paper introduces MovieSum, a new dataset for abstractive summarization of movie screenplays, which is larger and more diverse than existing datasets.


Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation

http://arxiv.org/abs/2408.06276v1

Compressor summary: EXP3RT is a novel Large Language Model-based recommender that extracts preferences from user and item reviews, generates detailed reasoning, and predicts ratings with high accuracy and explainability.


FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data

http://arxiv.org/abs/2408.06273v1

Compressor summary: FuxiTranyu is a multilingual language model that performs well on various tasks across different languages, and it is open-source for research purposes.


Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

http://arxiv.org/abs/2408.06266v1

Compressor summary: The paper proposes CLAIR, which builds contrastive preference pairs through revisions, and APO, a more controllable alignment objective, to address underspecification in LLM alignment, achieving significant performance gains.


DUNE: A Machine Learning Deep UNet++ based Ensemble Approach to Monthly, Seasonal and Annual Climate Forecasting

http://arxiv.org/abs/2408.06262v1

Compressor summary: The paper introduces DUNE, a new AI architecture that uses ERA5 monthly data to produce global monthly, seasonal, and annual temperature forecasts, which outperform existing methods in accuracy and resolution.


Open-Source Molecular Processing Pipeline for Generating Molecules

http://arxiv.org/abs/2408.06261v1

Compressor summary: The authors introduce open-source infrastructure in the DeepChem library that makes it easy to build generative molecular models, using MolGAN and Normalizing Flows.


Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning

http://arxiv.org/abs/2408.06259v1

Compressor summary: The proposed framework generates diverse and coherent stories from image sequences using pretrained models and a multimodal contrastive objective.


Latent Disentanglement for Low Light Image Enhancement

http://arxiv.org/abs/2408.06245v1

Compressor summary: Key points:
- The paper proposes a new method for low light image enhancement (LLIE), LDE-Net, based on latent disentanglement and content-aware embedding.
- The method separates the input image into clean Content and Illumination components in latent space.
- The method improves the Illumination component using the Content features and achieves superior performance on various LLIE benchmarks and downstream tasks.

Summary: The paper introduces a novel low light image enhancement method (LDE-Net) that disentangles the input image into clean Content and Illumination components in latent space and enhances the Illumination using Content features, showing better results on LLIE benchmarks and downstream tasks.


FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks

http://arxiv.org/abs/2408.06227v1

Compressor summary: FLEURS-R is a version of the FLEURS multilingual speech corpus, covering 102 languages, to which a speech restoration model has been applied to improve audio quality and fidelity, enabling better speech generation technology in low-resource languages.


On Effects of Steering Latent Representation for Large Language Model Unlearning

http://arxiv.org/abs/2408.06223v1

Compressor summary: The paper explores Representation Misdirection for Unlearning (RMU), a method for deleting information from large language models, and proposes Adaptive RMU to improve its effectiveness across different layers.
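For orientation, the RMU objective can be sketched on raw activation arrays. A forget term pushes the updated model's hidden states on forget data toward a scaled random direction c * u. A retain term keeps hidden states on retain data close to the frozen model's. This follows our understanding of the original RMU formulation (MSE losses, u drawn uniformly from [0, 1) and normalized). The `adaptive_coefficient` helper scales c by the norm of the frozen model's forget-data activations. That helper is our reading of the adaptive variant and should be treated as an assumption. All function names are hypothetical.

```python
import numpy as np


def random_unit_vector(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Random direction u: entries drawn uniformly from [0, 1), then normalized."""
    u = rng.random(dim)
    return u / np.linalg.norm(u)


def rmu_loss(h_forget_upd, h_retain_upd, h_retain_frozen, u, c, alpha):
    """RMU-style loss on hidden activations of shape (tokens, dim)."""
    # Forget term: steer updated activations on forget data toward c * u.
    forget = np.mean((h_forget_upd - c * u) ** 2)
    # Retain term: keep updated activations on retain data near the frozen model's.
    retain = np.mean((h_retain_upd - h_retain_frozen) ** 2)
    return forget + alpha * retain


def adaptive_coefficient(h_forget_frozen, beta):
    """Assumed adaptive scaling: c = beta * ||frozen activation|| per token, shape (tokens, 1)."""
    return beta * np.linalg.norm(h_forget_frozen, axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens, dim = 3, 8
    u = random_unit_vector(dim, rng)
    c = adaptive_coefficient(rng.normal(size=(tokens, dim)), beta=5.0)
    h_upd_forget = rng.normal(size=(tokens, dim))
    h_retain = rng.normal(size=(tokens, dim))
    print(rmu_loss(h_upd_forget, h_retain, h_retain, u, c, alpha=100.0))
```

A fixed scalar c uses one steering magnitude for every layer. Tying c to the frozen activation norm lets the target magnitude follow the typical scale of whichever layer is being steered.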


A Digital Twin Framework Utilizing Machine Learning for Robust Predictive Maintenance: Enhancing Tire Health Monitoring

http://arxiv.org/abs/2408.06220v1

Compressor summary: The paper introduces a digital twin framework that uses machine learning to predict tire health, update its model, and make maintenance decisions, enhancing automotive safety and efficiency.


Computability of Classification and Deep Learning: From Theoretical Limits to Practical Feasibility through Quantization

http://arxiv.org/abs/2408.06212v1

Compressor summary: The paper examines the computability limits of classification and deep learning, which prevent performance guarantees in safety-critical applications and undermine trustworthiness, and discusses how quantization can move these tasks toward practical feasibility.


Strategy Game-Playing with Size-Constrained State Abstraction

http://arxiv.org/abs/2408.06202v1

Compressor summary: The paper proposes a state abstraction technique for strategy games that limits the number of nodes grouped, improving AI search performance and avoiding the need to abandon abstractions.
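One simple way to enforce a size cap is greedy grouping. States that share an abstraction key are merged into the same abstract node until it reaches `max_size`, at which point a new node is opened. The key function, the greedy order, and the `size_constrained_groups` name are assumptions for illustration, not the paper's exact construction.

```python
from typing import Callable, Hashable, TypeVar

S = TypeVar("S")


def size_constrained_groups(
    states: list[S], key: Callable[[S], Hashable], max_size: int
) -> list[list[S]]:
    """Greedily merge states sharing an abstraction key, capping each group at max_size."""
    groups: list[list[S]] = []
    open_group: dict[Hashable, int] = {}  # key -> index of the group currently being filled
    for s in states:
        k = key(s)
        gi = open_group.get(k)
        if gi is None or len(groups[gi]) >= max_size:
            groups.append([s])  # open a new abstract node for this key
            open_group[k] = len(groups) - 1
        else:
            groups[gi].append(s)  # join the current node for this key
    return groups


if __name__ == "__main__":
    # Abstract integer states by parity, at most two states per abstract node.
    print(size_constrained_groups(list(range(7)), lambda s: s % 2, max_size=2))
```

Capping group size bounds how much information one abstract node can hide. This is the trade-off the summary describes: coarse enough to help search, small enough that the abstraction need not be abandoned.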


Dynamic Blocked Clause Elimination for Projected Model Counting

http://arxiv.org/abs/2408.06199v1

Compressor summary: The paper shows how blocked clause elimination can be used for projected model counting (counting the models of a propositional formula after existentially eliminating a given set of variables) by focusing on the eliminated variables during the search, and introduces a new data structure to do this efficiently.
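The underlying notions can be made concrete with a small brute-force sketch. A clause C is blocked on a literal l if every resolvent of C on l with a clause containing the negation of l is a tautology. Removing such a clause can change the ordinary model count. However, when the variable of l is existentially eliminated (not in the projection set), the projected count is preserved. The sketch applies this check statically and counts projected models by enumeration. It does not reproduce the paper's dynamic, in-search procedure or its data structure, and the function names are hypothetical.

```python
from itertools import product

# A clause is a frozenset of nonzero ints (DIMACS style): v means x_v, -v means NOT x_v.
Formula = list[frozenset[int]]


def is_blocked(clause: frozenset[int], lit: int, formula: Formula) -> bool:
    """True if every resolvent of `clause` on `lit` with a clause containing -lit is a tautology."""
    assert lit in clause
    for other in formula:
        if -lit not in other:
            continue
        resolvent = (clause - {lit}) | (other - {-lit})
        if not any(-q in resolvent for q in resolvent):
            return False  # found a non-tautological resolvent
    return True


def projected_count(formula: Formula, variables: list[int], projection: list[int]) -> int:
    """Count assignments to `projection` that extend to a model (brute force)."""
    others = [v for v in variables if v not in projection]
    count = 0
    for bits in product([False, True], repeat=len(projection)):
        base = dict(zip(projection, bits))
        for rest in product([False, True], repeat=len(others)):
            assign = {**base, **dict(zip(others, rest))}
            if all(any(assign[abs(q)] == (q > 0) for q in c) for c in formula):
                count += 1
                break  # one witness suffices for this projected assignment
    return count


if __name__ == "__main__":
    # Project onto {x1, x2}; x3 is existentially eliminated.
    F = [frozenset({1, 3}), frozenset({-1, -3}), frozenset({1, 2})]
    C = frozenset({1, 3})
    print(is_blocked(C, 3, F))  # True: C is blocked on x3
    G = [c for c in F if c != C]  # remove the blocked clause
    print(projected_count(F, [1, 2, 3], [1, 2]), projected_count(G, [1, 2, 3], [1, 2]))  # 3 3
    print(projected_count(F, [1, 2, 3], [1, 2, 3]), projected_count(G, [1, 2, 3], [1, 2, 3]))  # 3 4
```

Removing C changes the plain model count from 3 to 4 but leaves the projected count at 3, because its blocking variable x3 is not projected.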


Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

http://arxiv.org/abs/2408.06195v1

Compressor summary: rStar is a self-play mutual reasoning method that improves reasoning in small language models by generating and verifying human-like reasoning trajectories, without fine-tuning or a stronger teacher model.


Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting

http://arxiv.org/abs/2408.06186v1

Compressor summary: The paper defines structural diversity, a metric that measures the diversity of generated text with respect to user-specified features, and shows that chain-of-specification prompting improves it in black-box large language models.
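A structural diversity score of this kind can be sketched by mapping each text to a vector of user-defined boolean features. The score is the normalized Hamming distance between those vectors, averaged over all pairs of texts. The pairwise-Hamming choice, the example features, and the `structural_diversity` name are assumptions; the paper's metric may be defined differently.

```python
from itertools import combinations
from typing import Callable

Feature = Callable[[str], bool]  # user-defined structural feature of a text


def structural_diversity(texts: list[str], features: list[Feature]) -> float:
    """Mean normalized Hamming distance between feature vectors over all text pairs."""
    vectors = [tuple(f(t) for f in features) for t in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs or not features:
        return 0.0
    differing = sum(sum(a != b for a, b in zip(u, v)) for u, v in pairs)
    return differing / (len(pairs) * len(features))


# Example user-defined features (hypothetical).
is_question: Feature = lambda t: t.strip().endswith("?")
is_long: Feature = lambda t: len(t.split()) > 5

if __name__ == "__main__":
    samples = ["Why?", "Why not?", "This is a long declarative sentence here."]
    print(structural_diversity(samples, [is_question, is_long]))  # two of three pairs differ fully
```

A score of 0 means every text has the same feature profile, and 1 means every pair differs on every feature. Because the features are user-defined, the same texts can score differently depending on which structure the user cares about.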


Blind-Match: Efficient Homomorphic Encryption-Based 1:N Matching for Privacy-Preserving Biometric Identification

http://arxiv.org/abs/2408.06167v1

Compressor summary: Blind-Match is a fast and private biometric identification system that uses homomorphic encryption for efficient 1:N matching and outperforms existing methods.


OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning

http://arxiv.org/abs/2408.06158v1

Compressor summary: OmniCLIP is a video recognition framework that adapts CLIP by learning omni-scale features using spatial-temporal blocks, temporal adapters, and a self-prompt generator, achieving improved performance in various tasks.


Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance

http://arxiv.org/abs/2408.06157v1

Compressor summary: HawkI++ is a method for generating camera-controlled views from a single image that handles complex scenes without 3D data or extensive training.


LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library

http://arxiv.org/abs/2408.06150v1

Compressor summary: Key points:
- The study generates and maintains a database of 10 million virtual lipids for various tasks.
- It proposes LipidBERT, a BERT-like model pre-trained with different secondary tasks.
- It compares the performance of LipidBERT and PhatGPT on downstream tasks.
- It demonstrates the effectiveness of a pre-trained language model on virtual lipids and its integration with wet-lab data.

Summary: The study presents LipidBERT, a BERT-like model that learns from a large database of virtual lipids and predicts lipid nanoparticle (LNP) properties, and shows its superior performance over GPT-like models and its usefulness for screening and in vivo testing.


Efficient and Scalable Point Cloud Generation with Sparse Point-Voxel Diffusion Models

http://arxiv.org/abs/2408.06145v1

Compressor summary: The proposed point cloud U-Net diffusion architecture can generate diverse and high-quality 3D shapes quickly and efficiently, outperforming previous methods on various tasks such as unconditional generation, conditional generation, implicit generation, completion, and super-resolution.


Med42-v2: A Suite of Clinical LLMs

http://arxiv.org/abs/2408.06142v1

Compressor summary: Med42-v2 is a new generation of large language models tailored for healthcare, with improved performance and ability to handle clinical queries and reasoning tasks.


MR3D-Net: Dynamic Multi-Resolution 3D Sparse Voxel Grid Fusion for LiDAR-Based Collective Perception

http://arxiv.org/abs/2408.06137v1

Compressor summary: MR3D-Net is a dynamic architecture for fusing LiDAR data from multiple vehicles, improving 3D object detection and reducing bandwidth requirements.


Utilize Transformers for translating Wikipedia category names

http://arxiv.org/abs/2408.06124v1

Compressor summary: The authors built and fine-tuned language models to translate Wikipedia categories from English to Vietnamese, achieving high performance with minimal resources.


DPDETR: Decoupled Position Detection Transformer for Infrared-Visible Object Detection

http://arxiv.org/abs/2408.06123v1

Compressor summary: The paper introduces DPDETR, a method for robust object detection using infrared and visible images, addressing modality misalignment and fusing complementary features with decoupled position attention and training strategies.


A Methodological Report on Anomaly Detection on Dynamic Knowledge Graphs

http://arxiv.org/abs/2408.06121v1

Compressor summary: The paper proposes an ensemble learning-based anomaly detection approach for dynamic knowledge graphs in microservices using different graph representations and machine/deep learning models.


How ChatGPT Changed the Media's Narratives on AI: A Semi-Automated Narrative Analysis Through Frame Semantics

http://arxiv.org/abs/2408.06120v1

Compressor summary: The paper analyzes how media coverage of AI changed after the launch of ChatGPT, finding an increased focus on dangers and risks and a shift in the types of threats and human-like qualities attributed to AI.


Generalization capabilities of MeshGraphNets to unseen geometries for fluid dynamics

http://arxiv.org/abs/2408.06101v1

Compressor summary: The paper explores the ability of MeshGraphNets to predict fluid dynamics around unseen shapes using a new benchmark dataset for computational fluid dynamics.


Approximating Discrimination Within Models When Faced With Several Non-Binary Sensitive Attributes

http://arxiv.org/abs/2408.06099v1

Compressor summary: Key points:
- The paper proposes HFM, a fairness measure based on distances between sets from a manifold perspective.
- HFM has two optional versions for handling multiple sensitive attributes with multiple values.
- It introduces approximation algorithms (ApproxDist and ExtendDist) to speed up the computation of distances.
- It analyzes the effectiveness of ApproxDist under certain assumptions.
- Empirical results validate HFM and the effectiveness of the approximation algorithms.

Summary: The paper proposes a new fairness measure, HFM, that uses distances between sets from a manifold perspective to evaluate discrimination in ML models with multiple sensitive attributes. It also introduces two approximation algorithms to accelerate the computation and provides theoretical and empirical evidence for their effectiveness.
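Set distances of this kind can be sketched with the symmetric Hausdorff distance. It is used here only as a stand-in, because this summary does not give HFM's exact distance. The brute-force pairwise computation costs O(n * m) distance evaluations, which is the kind of cost that approximation algorithms such as ApproxDist are meant to reduce. The following choices are all assumptions: treating each sensitive-attribute value as a group of feature rows, taking the maximum over group pairs, and the function names.

```python
import numpy as np


def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between point sets a (n, d) and b (m, d)."""
    # Brute-force pairwise Euclidean distances, shape (n, m).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


def max_group_distance(X: np.ndarray, groups: np.ndarray) -> float:
    """Largest set distance between any two sensitive-attribute groups (assumed aggregation)."""
    labels = np.unique(groups)
    worst = 0.0
    for i, gi in enumerate(labels):
        for gj in labels[i + 1:]:
            worst = max(worst, hausdorff(X[groups == gi], X[groups == gj]))
    return worst


if __name__ == "__main__":
    X = np.array([[0.0], [1.0], [0.0], [3.0], [0.5]])  # e.g. per-instance model outputs
    groups = np.array([0, 0, 1, 1, 2])  # values of one non-binary sensitive attribute
    print(max_group_distance(X, groups))
```

The nested loop over attribute values is what makes multi-valued attributes expensive: k values give k(k - 1)/2 set comparisons, each quadratic in group size.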


Building Decision Making Models Through Language Model Regime

http://arxiv.org/abs/2408.06087v1

Compressor summary: The paper proposes a new method called "Learning then Using" (LTU) that leverages large language models' generalization capabilities to improve decision making across different domains and scenarios.


Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces

http://arxiv.org/abs/2408.06083v1

Compressor summary: The paper proposes a method to improve monocular depth estimation for non-Lambertian surfaces by using regional guidance, tone-mapping augmentation, and lighting fusion.


CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

http://arxiv.org/abs/2408.06072v1

Compressor summary: CogVideoX is a large-scale diffusion transformer model that uses a 3D VAE to compress videos and an expert transformer for text-video alignment, achieving state-of-the-art results in generating coherent, long-duration videos from text prompts.


ControlNeXt: Powerful and Efficient Control for Image and Video Generation

http://arxiv.org/abs/2408.06070v1

Compressor summary: ControlNeXt is an efficient method for controllable image and video generation that reduces the number of learnable parameters, adds minimal extra computational cost, and integrates with LoRA weights.


Fully Bayesian Differential Gaussian Processes through Stochastic Differential Equations

http://arxiv.org/abs/2408.06069v1

Compressor summary: The paper proposes a fully Bayesian approach to learn kernel hyperparameters and inducing points in differential Gaussian processes using coupled stochastic differential equations, improving flexibility, accuracy, and realistic posterior approximation.


Online Optimization of Curriculum Learning Schedules using Evolutionary Optimization

http://arxiv.org/abs/2408.06068v1

Compressor summary: RHEA CL is a method that uses evolutionary algorithms and curriculum learning to create effective training plans for reinforcement learning agents, which can improve their performance compared to other curriculum schedules.


Don't You (Project Around Discs)? Neural Network Surrogate and Projected Gradient Descent for Calibrating an Intervertebral Disc Finite Element Model

http://arxiv.org/abs/2408.06067v1

Compressor summary: The study introduces a novel method that uses a neural network surrogate and Projected Gradient Descent to calibrate finite element models of human intervertebral discs efficiently and accurately for spinal treatments, reducing calibration time from days to seconds.
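Projected gradient descent itself can be sketched in a few lines. Take a gradient step, then project back onto the feasible set, here a box of admissible parameter values. In this minimal sketch, a hypothetical linear map stands in for the neural network surrogate. The bounds, step size, and parameter values are illustrative assumptions rather than the study's settings.

```python
import numpy as np


def projected_gradient_descent(grad, x0, lower, upper, lr=0.1, steps=500):
    """Minimize via gradient steps, projecting each iterate onto the box [lower, upper]."""
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(steps):
        x = np.clip(x - lr * grad(x), lower, upper)  # gradient step, then projection
    return x


# Hypothetical linear "surrogate" mapping two material parameters to two simulated outputs.
A = np.array([[2.0, 0.5], [0.3, 1.0]])
target = A @ np.array([0.4, 0.7])  # outputs the calibration should reproduce


def calibration_grad(theta):
    # Gradient of 0.5 * ||A @ theta - target||^2 with respect to theta.
    return A.T @ (A @ theta - target)


theta = projected_gradient_descent(calibration_grad, x0=[0.0, 0.0], lower=0.0, upper=1.0)
```

For a box, the projection is just elementwise clipping, so each iteration costs one surrogate gradient. With a differentiable surrogate in place of the finite element solver, that gradient is cheap.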


An Investigation Into Explainable Audio Hate Speech Detection

http://arxiv.org/abs/2408.06065v1

Compressor summary: The paper introduces explainable audio hate speech detection methods that locate specific time intervals in speech as evidence for classification, and creates a synthetic dataset for training and evaluation.


TruVRF: Towards Triple-Granularity Verification on Machine Unlearning

http://arxiv.org/abs/2408.06063v1

Compressor summary: TruVRF is a framework that verifies the honesty of model providers when they perform machine unlearning on legacy data, using three metrics to detect different types of dishonesty.


On Tables with Numbers, with Numbers

http://arxiv.org/abs/2408.06062v1

Compressor summary: The paper criticizes the epistemic culture of computational linguistics for its reliance on tables with numbers, which it argues are epistemically irrelevant, environmentally harmful, socially unjust, and commercially motivated.


Perceptual Similarity for Measuring Decision-Making Style and Policy Diversity in Games

http://arxiv.org/abs/2408.06051v1

Compressor summary: The authors introduce three enhancements to Playstyle Distance, a metric that measures playstyle similarity in games; the enhancements improve accuracy and offer insights into how humans perceive similarity.


What Ails Generative Structure-based Drug Design: Too Little or Too Much Expressivity?

http://arxiv.org/abs/2408.06050v1

Compressor summary: The paper analyzes why generative models for structure-based drug design using graph neural networks perform poorly and proposes a simpler, faster, and more efficient alternative.


BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

http://arxiv.org/abs/2408.06047v1

Compressor summary: The authors propose a new virtual try-on method that trains on realistic synthesized pseudo data without requiring precise masks, improving performance in complex in-the-wild scenarios with minimal input.


DiagESC: Dialogue Synthesis for Integrating Depression Diagnosis into Emotional Support Conversation

http://arxiv.org/abs/2408.06044v1

Compressor summary: The authors propose the DESC dataset for assessing depression symptoms in mental health chatbots and show it has better diagnostic accuracy and conversation quality than existing data.


Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning

http://arxiv.org/abs/2408.06043v1

Compressor summary: The paper proposes Context Noise Representation Learning (CNRL) to improve dialogue speech recognition accuracy by enhancing robustness against noisy context and using decoder pre-training and noise representation learning for a context encoder.


ARPA: A Novel Hybrid Model for Advancing Visual Word Disambiguation Using Large Language Models and Transformers

http://arxiv.org/abs/2408.06040v1

Compressor summary: ARPA is a new architecture that combines language understanding with visual context using transformers and graph neural networks, achieving significant improvements in visual word disambiguation.


Spacetime $E(n)$-Transformer: Equivariant Attention for Spatio-temporal Graphs

http://arxiv.org/abs/2408.06039v1

Compressor summary: The SET model uses equivariant Transformers to capture both spatial and temporal dynamics in graph data, leading to better performance than non-equivariant models on the charged $N$-body problem.


Graph Clustering with Cross-View Feature Propagation

http://arxiv.org/abs/2408.06029v1

Compressor summary: GCCFP is a novel method that uses multi-view feature propagation to improve graph clustering by considering both vertex features and graph topology.
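Feature propagation over a graph can be sketched with the common symmetric-normalized smoothing step X := D^{-1/2} (A + I) D^{-1/2} X. To mimic two views, the sketch propagates once over the topology graph and once over a hypothetical cosine k-nearest-neighbor graph built from the features. It then concatenates the results. This combination is an assumption, not GCCFP's actual cross-view scheme.

```python
import numpy as np


def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate(A: np.ndarray, X: np.ndarray, k: int = 2) -> np.ndarray:
    """Smooth node features X over graph A for k propagation steps."""
    S = normalized_adjacency(A)
    for _ in range(k):
        X = S @ X
    return X


def knn_graph(X: np.ndarray, k: int = 1) -> np.ndarray:
    """Feature-view graph: link each node to its k most cosine-similar nodes (rows must be nonzero)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    A = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]
    A[np.arange(len(X))[:, None], idx] = 1.0
    return np.maximum(A, A.T)  # symmetrize


if __name__ == "__main__":
    X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    A_topo = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
    views = np.hstack([propagate(A_topo, X), propagate(knn_graph(X), X)])
    print(views.shape)
```

Each view smooths features along a different notion of neighborhood: edges in the given graph versus similarity in feature space. A downstream clustering step can then use both.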


Layer-Specific Optimization: Sensitivity Based Convolution Layers Basis Search

http://arxiv.org/abs/2408.06024v1

Compressor summary: The paper proposes a new matrix decomposition method for convolutional layers to reduce model size and speed up network operations, while maintaining model quality using a subset of convolutions called basis convolutions.
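The basis idea relies on the linearity of convolution. If every filter is a linear combination of a few basis filters, the layer can run only the basis convolutions and then mix their outputs. This single-channel 1D NumPy sketch takes the basis as a given subset of the layer's own filters and fits mixing coefficients by least squares. The paper's sensitivity-based basis search is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def basis_decompose(W: np.ndarray, basis_idx: list[int]):
    """Express each filter (row of W) as a linear combination of the chosen basis filters."""
    B = W[basis_idx]  # (r, k): basis filters taken from the layer itself
    # Least-squares fit of C @ B ~= W, solved in the transposed form B.T @ C.T ~= W.T.
    C_T, *_ = np.linalg.lstsq(B.T, W.T, rcond=None)
    return B, C_T.T  # C has shape (n_filters, r)


def conv_via_basis(x: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Run only the r basis convolutions, then mix their outputs with C."""
    basis_out = np.stack([np.convolve(x, b, mode="valid") for b in B])  # (r, L)
    return C @ basis_out  # (n_filters, L), equal to convolving with C @ B


if __name__ == "__main__":
    # Filters 2 and 3 are exact combinations of filters 0 and 1.
    W = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0], [2.0, 1.0, -2.0], [1.0, -1.0, -1.0]])
    B, C = basis_decompose(W, [0, 1])
    x = np.arange(8.0)
    direct = np.stack([np.convolve(x, w, mode="valid") for w in W])
    print(np.allclose(conv_via_basis(x, B, C), direct))
```

When the remaining filters are only approximately spanned by the basis, the least-squares residual measures the quality loss. A search procedure would trade that residual against the savings from running fewer convolutions.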


HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors

http://arxiv.org/abs/2408.06019v1

Compressor summary: The paper proposes a 3D head avatar creation method that uses prior knowledge from a large-scale dataset and applies it to few-shot personalization for realistic rendering, multi-view consistency, and stable animation.


DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation

http://arxiv.org/abs/2408.06010v1

Compressor summary: DEEPTalk is a method to generate realistic and emotional 3D facial animations from speech using probabilistic contrastive learning and a temporally hierarchical VQ-VAE.


An Analysis for Image-to-Image Translation and Style Transfer

http://arxiv.org/abs/2408.06000v1

Compressor summary: The paper compares image-to-image translation and style transfer, two deep learning techniques that generate realistic images from input images, and analyzes their differences in concepts, forms, training modes, evaluation processes, and visualization results.