This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-22, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.11817v1
Compressor summary: GRAB is a synthetic graph analysis benchmark for testing large multimodal models' capabilities in interpreting figures and estimating properties of graphs.
http://arxiv.org/abs/2408.11816v1
Compressor summary: The paper explores whether an object-centric mapping helps agents learn efficiently in reinforcement learning and proposes a hierarchical model-based algorithm that outperforms existing methods.
http://arxiv.org/abs/2408.11815v1
Compressor summary: K-nearest-neighbor language models perform well on memory-intensive tasks but struggle with multi-hop reasoning.
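The kNN-LM setup this summary refers to can be illustrated with a small sketch: interpolate the base model's next-token distribution with a distribution built from nearest neighbors in a datastore of (context vector, next token) pairs. All names, the Euclidean distance, and the interpolation weight below are generic illustrative assumptions in the style of Khandelwal et al., not this paper's code.

```python
import numpy as np

def knn_lm_probs(lm_probs, datastore_keys, datastore_next_ids, query,
                 vocab_size, k=4, temperature=1.0, lam=0.25):
    """Interpolate a base LM distribution with a kNN distribution built
    from a datastore of (context-vector, next-token) pairs."""
    # distances from the query context vector to every stored key
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nn = np.argsort(dists)[:k]                 # indices of the k nearest keys
    # softmax over the neighbors' negative distances
    logits = -dists[nn] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # scatter neighbor weights onto their recorded next tokens
    knn_probs = np.zeros(vocab_size)
    for w, idx in zip(weights, nn):
        knn_probs[datastore_next_ids[idx]] += w
    # final distribution: linear interpolation of the two
    return lam * knn_probs + (1 - lam) * lm_probs
```

The interpolation weight `lam` trades off parametric and retrieved knowledge; memory-intensive tasks benefit from larger values.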
http://arxiv.org/abs/2408.11813v1
Compressor summary: The text introduces Supervised Embedding Alignment (SEA), a method that improves the integration of visual and language representations in Multimodal Large Language Models (MLLMs) using contrastive learning, leading to better performance and interpretability without additional data or computation.
http://arxiv.org/abs/2408.11810v1
Compressor summary: The text proposes a novel attack method for image editing based on diffusion models, which can bypass previous defenses and exploit vulnerabilities in both pixel-domain and latent-domain models.
http://arxiv.org/abs/2408.11804v1
Compressor summary: The text proposes an empirical approach using weight dynamics in deep learning to explain various phenomena, such as optimization bias, memorizing vs. generalizing networks, and sparse subnetworks.
http://arxiv.org/abs/2408.11801v1
Compressor summary: The paper introduces Story3D-Agent, a method that uses large language models to create 3D visualizations of stories with precise control and logical reasoning.
http://arxiv.org/abs/2408.11800v1
Compressor summary: The paper introduces a framework for creating domain-specific benchmarks to evaluate Retrieval Augmented Generation (RAG) in Natural Language Processing, using Human-AI teaming and a case study on wind energy permitting.
http://arxiv.org/abs/2408.11799v1
Compressor summary: The paper presents a fast and accurate intent classification method for enterprise VA systems that combines contrastive learning, multi-task adaptation, and dynamic token pruning; it achieves high accuracy even with few training samples, outperforms commercial solutions, and speeds up inference without extra training.
http://arxiv.org/abs/2408.11796v1
Compressor summary: The authors compress large language models using pruning and distillation, achieving high performance on benchmarks and aligning them with NeMo Aligner for instruct-tuned use.
http://arxiv.org/abs/2408.11795v1
Compressor summary: The paper introduces EE-MLLM, a multimodal language model that balances data and compute efficiency by modifying the self-attention mechanism to enable both computational and weight reuse advantages.
http://arxiv.org/abs/2408.11793v1
Compressor summary: The text describes using large language models to predict molecular properties, generate materials, and retrieve relevant chemistry information for various tasks.
http://arxiv.org/abs/2408.11791v1
Compressor summary: CLoud reward models generate critiques of assistant responses and use them to predict rewards for quality, improving preference classification accuracy and win rate in RLHF.
http://arxiv.org/abs/2408.11788v1
Compressor summary: DreamFactory is a framework that uses language models and multi-agent collaboration to create long, coherent, and stylistic videos with novel evaluation metrics and a new dataset.
http://arxiv.org/abs/2408.11785v1
Compressor summary: The paper presents TBGDiff, a video shadow detection network that aggregates long-term and short-term frames with DSA, attends to shadow boundaries with SBAA, and guides a diffusion process with STEE; it outperforms state-of-the-art methods and is publicly available on GitHub.
http://arxiv.org/abs/2408.11779v1
Compressor summary: Personality Alignment (PA) is a method to tailor large language models' responses based on individual users' behavioral preferences, using the PAPI dataset and an optimization technique that improves efficiency and relevance of AI interactions.
http://arxiv.org/abs/2408.11778v1
Compressor summary: This paper analyzes expressive generative models and introduces a novel class of probabilistic circuits called sum of squares PCs that can be exponentially more powerful than existing ones.
http://arxiv.org/abs/2408.11775v1
Compressor summary: The paper proposes a fine-tuned system using a small language model (Phi-2) to help communicate technical standards effectively by processing diverse document formats and adapting context windows.
http://arxiv.org/abs/2408.11768v1
Compressor summary: The paper proposes a new loss function for binary flare prediction that considers ordinal flare characteristics and improves the performance of a ResNet34-based model using magnetogram features.
http://arxiv.org/abs/2408.11760v1
Compressor summary: R2GConv and SBDet use a relaxed rotation-equivariant group to handle symmetry-breaking and non-rigid transformations in visual data.
http://arxiv.org/abs/2408.11758v1
Compressor summary: MambaCSR is a framework that uses dual-interleaved scanning and position-aligned cross-scale scanning to effectively restore compressed images with contextual information.
http://arxiv.org/abs/2408.11749v1
Compressor summary: The paper explores the susceptibility of multilingual LLMs to embedding inversion attacks across various languages, scripts, and language families, and identifies patterns that could help attackers improve their methods.
http://arxiv.org/abs/2408.11748v1
Compressor summary: The paper evaluates the geometric comprehension of large Vision Language Models (VLMs) in depth and height perception, finding that they consistently struggle with these aspects, and introduces benchmark datasets to improve their capabilities.
http://arxiv.org/abs/2408.11746v1
Compressor summary: Mixed Sparsity Training (MST) is a pretraining method for large language models that reduces computational demands by up to 75% while maintaining performance.
http://arxiv.org/abs/2408.11745v1
Compressor summary: FocusLLM extends the context length of decoder-only large language models by dividing and appending chunks as prompts for parallel decoding, improving performance on long-context tasks with less training cost.
http://arxiv.org/abs/2408.11744v1
Compressor summary: The study uses a Fine-tuned Stable Diffusion Model with ControlNet (FSDMC) to refine depiction techniques from Jiehua artists and outperforms CycleGAN in style transfer.
http://arxiv.org/abs/2408.11743v1
Compressor summary: MARLIN is a technique that speeds up large language model inference on GPUs by efficiently handling batched workloads with quantization and various other optimizations.
http://arxiv.org/abs/2408.11742v1
Compressor summary: The proposed CluMo method uses a novel prompt-based approach with key-key-prompt pairs and clustering to improve generalization capacity and prevent catastrophic forgetting in multimodal continual learning for vision-language models.
http://arxiv.org/abs/2408.11735v1
Compressor summary: The paper examines large language models' advancements, applications, and challenges in the healthcare sector, emphasizing clinical efficiency, ethics, data privacy, and open-source models.
http://arxiv.org/abs/2408.11721v1
Compressor summary: The authors improve object counting in text-to-image models by optimizing generation with a counting loss from an out-of-the-box counting model; the method accommodates non-derivable counting techniques, is plug-and-play, and reuses the optimized counting token for image generation.
http://arxiv.org/abs/2408.11720v1
Compressor summary: The text analyzes how weight patterns in deep learning models affect performance across various datasets and model architectures, finding that successful networks share similar weight statistics and distribution.
http://arxiv.org/abs/2408.11711v1
Compressor summary: ControlCol is an automatic video colorization system that gives users control over the process and outperforms current techniques in quality and preference.
http://arxiv.org/abs/2408.11706v1
Compressor summary: FRAP is a simple approach that adapts token weights in text-to-image diffusion models to improve prompt-image alignment and authenticity, with faster latency and better realism compared to latent code optimization methods.
http://arxiv.org/abs/2408.11700v1
Compressor summary: The paper proposes a representation learning approach with ISIL, a novel loss function modification, to improve assembly state recognition and robustness to execution errors.
http://arxiv.org/abs/2408.11691v1
Compressor summary: A new method uses physical principles to improve neural network models for discovering state variables of dynamical systems.
http://arxiv.org/abs/2408.11687v1
Compressor summary: The paper proposes a new method for evaluating long-term actions in videos that improves interpretability by addressing temporal skipping and using weight-score regression.
http://arxiv.org/abs/2408.11680v1
Compressor summary: The paper proposes a first layer design for neural networks that acts as an implicit adversarial noise filter, improving robustness to adversarial attacks without the need for additional training or computation.
http://arxiv.org/abs/2408.11679v1
Compressor summary: The paper explores how the state space model mechanism in Visual State Space Model affects its robustness against backdoor attacks and suggests that adding a recurrent backdoor makes it more resilient to patch perturbations.
http://arxiv.org/abs/2408.11656v1
Compressor summary: Macformer is a Transformer architecture that uses random Maclaurin features to approximate various dot-product kernels, speeding up attention computations for long sequences.
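Random Maclaurin features, which Macformer builds on, replace an exact dot-product kernel with an inner product of randomized features. Below is a minimal sketch of a single degree-n term using Rademacher projections (the classic Kar-Karnick construction, not Macformer's exact attention kernel; all names are illustrative):

```python
import numpy as np

def random_maclaurin_features(X, degree=2, n_features=2000, seed=0):
    """Random Maclaurin features for the degree-`degree` term of a
    dot-product kernel: each feature is a product of `degree` Rademacher
    projections, so E[z(x) . z(y)] = (x . y) ** degree."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = np.ones((n, n_features))
    for _ in range(degree):
        W = rng.choice([-1.0, 1.0], size=(d, n_features))  # Rademacher signs
        Z *= X @ W                                          # one projection factor
    return Z / np.sqrt(n_features)
```

With features like these, attention-style kernel sums over a length-n sequence can be computed in O(n) instead of the O(n^2) cost of forming the full kernel matrix.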
http://arxiv.org/abs/2408.11649v1
Compressor summary: The paper proposes Video-to-Text Pedestrian Monitoring (VTPM), which generates real-time textual reports on pedestrian movements at intersections, preserving privacy and enhancing safety analysis.
http://arxiv.org/abs/2408.11632v1
Compressor summary: The paper introduces Decision Tree Policy Optimization (DTPO), an algorithm that directly optimizes complete decision trees using policy gradients, making them interpretable alternatives to neural networks in reinforcement learning settings.
http://arxiv.org/abs/2408.11629v1
Compressor summary: The paper proposes a probabilistic model for stochastic iterative algorithms that can learn their optimal parameters and predict their convergence rate and time based on empirical data.
http://arxiv.org/abs/2408.11620v1
Compressor summary: Annealed Sinkhorn, a variant of optimal transport solver, has conditions for convergence and can be improved by Debiased Annealed Sinkhorn to achieve faster annealing schedules.
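The annealing idea can be sketched generically: run standard Sinkhorn scaling updates while the entropic regularization eps decreases over the iterations, warm-starting each step from the previous scaling vectors. The geometric schedule and parameter names below are illustrative assumptions; the paper's debiased variant is not reproduced here.

```python
import numpy as np

def annealed_sinkhorn(C, a, b, eps0=1.0, eps_final=0.01, n_iters=300):
    """Sinkhorn scaling iterations for entropic optimal transport with a
    geometrically decreasing regularization eps; each iteration warm-starts
    from the previous scaling vectors u and v."""
    u = np.ones_like(a)
    v = np.ones_like(b)
    for t in range(n_iters):
        eps = eps0 * (eps_final / eps0) ** (t / (n_iters - 1))
        K = np.exp(-C / eps)        # Gibbs kernel at the current temperature
        u = a / (K @ v)             # match row marginals
        v = b / (K.T @ u)           # match column marginals
    return u[:, None] * K * v[None, :]   # approximate transport plan
```

As eps shrinks, the plan concentrates toward an (unregularized) optimal coupling; annealing too fast is exactly the failure mode whose conditions the paper characterizes.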
http://arxiv.org/abs/2408.11609v1
Compressor summary: Xinyu is an LLM-based system that helps Chinese commentators create well-structured and logically consistent narratives by deconstructing the generation process into sequential steps and addressing advanced requirements with argument ranking, a comprehensive evidence database, and retrieval augmented generation technology.
http://arxiv.org/abs/2408.11608v1
Compressor summary: This article discusses the use of AI in arbitration, arguing that parties can choose AI arbitrators if they agree, and that this approach could improve efficiency, fairness, and flexibility in legal disputes.
http://arxiv.org/abs/2408.11599v1
Compressor summary: The paper introduces a new approach to generate empathetic responses by considering emotions and their causes, using a Chain-of-Thought prompt and external knowledge, and shows its effectiveness on LLaMA-7b.
http://arxiv.org/abs/2408.11598v1
Compressor summary: Focal loss training improves classifier calibration: it raises confidence on training data, and the loss decomposes into a proper loss combined with a confidence-raising transformation.
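For reference, the standard binary focal loss the paper analyzes is cross-entropy damped by a (1 - p_t)^gamma factor; a minimal sketch (the decomposition result itself is not reproduced here):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss (Lin et al.): down-weights well-classified examples
    relative to cross-entropy via the (1 - p_t) ** gamma factor.
    gamma = 0 recovers plain cross-entropy."""
    p_t = np.where(y == 1, p, 1.0 - p)      # probability of the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t)
```

On a confidently correct example (p_t close to 1) the damping factor nearly vanishes, which is what drives the confidence effects the paper studies.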
http://arxiv.org/abs/2408.11589v1
Compressor summary: This paper introduces a new dataset for vehicle color recognition and shows that nighttime scenes are challenging for current models.
http://arxiv.org/abs/2408.11587v1
Compressor summary: The paper introduces EST-Bad, a novel and effective method for backdoor attacks on NLP systems using large language models, which hides the malicious trigger in the data more effectively than previous methods.
http://arxiv.org/abs/2408.11574v1
Compressor summary: The Drama Engine is a framework that uses multi-agent principles to create context-aware companions for narrative purposes with features like companion development, mood systems, and automatic context summarizing.
http://arxiv.org/abs/2408.11571v1
Compressor summary: The CHOTA metric measures all aspects of cell tracking, including global coherence, and improves biological analysis by unifying the evaluation of cell detections, local associations, and lineage tracking.
http://arxiv.org/abs/2408.11564v1
Compressor summary: AutoDirector is an AI framework that helps create realistic multi-sensory films by scheduling production steps, supporting interactive tasks, and improving user feedback.
http://arxiv.org/abs/2408.11561v1
Compressor summary: The Iterative Refinement Process (IRP) is a robust anomaly detection method that improves defect detection accuracy by removing misleading data points and outperforms traditional models in noisy environments.
http://arxiv.org/abs/2408.11554v1
Compressor summary: DCQA is a novel MCQA model that leverages token-level attention and semantic commonalities among choices to differentiate them and provide justifications for choosing the correct answer.
http://arxiv.org/abs/2408.11553v1
Compressor summary: The paper introduces an extended dataset for human generation with diverse clothing items and backgrounds, and proposes AnyDesign, a diffusion-based method for mask-free fashion image editing that uses Fashion DiT with Fashion-Guidance Attention to fuse apparel types and features.
http://arxiv.org/abs/2408.11552v1
Compressor summary: The paper proposes a new approach to improve human activity recognition models by using data augmentation techniques that provide clear and trustworthy explanations for their decisions.
http://arxiv.org/abs/2408.11546v1
Compressor summary: This study finds that in-context learning improves large language model performance by surfacing memorized training data, with the effectiveness of this strategy depending on the level of memorization and the presence or absence of labels.
http://arxiv.org/abs/2408.11541v1
Compressor summary: The paper studies how synthetic images change over time online and how current detectors struggle to tell them apart, proposing a method to improve their performance.
http://arxiv.org/abs/2408.11540v1
Compressor summary: The study introduces 3DRRE, a novel task for reconstructing 3D scenes under rainy conditions, and presents DeRainGS, the first method tailored for this task, which performs better than existing methods.
http://arxiv.org/abs/2408.11531v1
Compressor summary: MuChaPro is a framework that uses existing single-channel despeckling methods to improve multi-channel SAR images, with potential for self-supervised learning.
http://arxiv.org/abs/2408.11527v1
Compressor summary: Google Vizier is a successful Bayesian optimization service that has improved over time and performs well on various benchmarks.
http://arxiv.org/abs/2408.11526v1
Compressor summary: RConE is a novel embedding method for logical multi-hop question answering in Multi-Modal Knowledge Graphs, capturing sub-entities of multi-modal entities as answers.
http://arxiv.org/abs/2408.11518v1
Compressor summary: EmoFace is a novel 3D virtual human model that uses Mesh Attention and self-growing training to generate realistic facial animations with emotions, overcoming data limitations and outperforming existing methods.
http://arxiv.org/abs/2408.11517v1
Compressor summary: The paper presents ImageTeller, a tool that uses GPT-4o to generate narratives from images or image sequences in various genres, allowing user interaction and influence on the story.
http://arxiv.org/abs/2408.11515v1
Compressor summary: The paper introduces a new measure, BED, that clusters similar expressions with errors and improves the smoothness of the error landscape in symbolic regression.
http://arxiv.org/abs/2408.11513v1
Compressor summary: The PDR-ANPG algorithm learns constrained Markov decision processes using entropy and quadratic regularizers, achieving improved sample complexity and last-iterate guarantees compared to previous methods.
http://arxiv.org/abs/2408.11512v1
Compressor summary: The paper presents two multilingual machine translation systems, IKUN and IKUN-C, which use large language models pre-trained on monolingual data and fine-tuned on parallel data, achieving high rankings in WMT24.
http://arxiv.org/abs/2408.11505v1
Compressor summary: MSCPT is a method that uses frozen large language models to generate multi-scale pathological visual language prior knowledge for few-shot weakly supervised whole slide image classification.
http://arxiv.org/abs/2408.11500v1
Compressor summary: SliceGCN is a distributed graph learning method that slices node features to improve scalability and efficiency on large graphs by reducing memory consumption and communication overhead.
http://arxiv.org/abs/2408.11494v1
Compressor summary: The study used a biological technique called mutagenesis screen to explore how different parameters in large language models affect their performance on various tasks, revealing complex relationships and potential ways to improve them.
http://arxiv.org/abs/2408.11493v1
Compressor summary: The study applies a cross-disease transferability framework to chest X-rays, aiming to use models trained on one pulmonary disease to diagnose another disease with similar implications for resource-limited settings and emerging diseases.
http://arxiv.org/abs/2408.11492v1
Compressor summary: The paper proposes a method to estimate causal effects involving peer interactions using attention mechanisms and multi-layer graph neural networks, considering both direct and indirect effects, and incorporating structural information to enhance the model's performance.
http://arxiv.org/abs/2408.11491v1
Compressor summary: SCANS is a method to improve safety alignment in large language models by steering activations and hidden state transitions, achieving better performance while avoiding overly cautious rejections.
http://arxiv.org/abs/2408.11490v1
Compressor summary: The paper introduces DocTabQA, a new question answering task that uses structured tables derived from documents to answer questions, and presents DocTabTalk, a two-stage framework that improves GPT-4's performance on this task.
http://arxiv.org/abs/2408.11481v1
Compressor summary: E-Bench is a benchmark suite for evaluating text-driven video editing quality based on human perception, including a database with various videos, edits, and annotators' scores, as well as a new assessment network that aligns better with human preferences.
http://arxiv.org/abs/2408.11479v1
Compressor summary: The study presents a method to transform neural networks into dissipative dynamical systems, ensuring stability, input-output stability, and energy conservation for various applications like robotic arms and fluid dynamics.
http://arxiv.org/abs/2408.11478v1
Compressor summary: LAKD is a novel knowledge distillation framework that efficiently utilizes distilled information from teacher networks, achieving higher interpretability and competitive performance on various image classification tasks.
http://arxiv.org/abs/2408.11475v1
Compressor summary: TrackGo is a novel method for controlling video generation using free-form masks and arrows, enhanced by the TrackAdapter that integrates into temporal self-attention layers of a pretrained model, achieving state-of-the-art results.
http://arxiv.org/abs/2408.11469v1
Compressor summary: The authors propose an improved version of a test to evaluate how well pretrained language models handle negation, and find that most models still struggle with it.
http://arxiv.org/abs/2408.11465v1
Compressor summary: MeTTA is a test-time adaptation method for 3D reconstruction from single view images that uses generative prior, joint optimization, and learnable virtual cameras to handle out-of-distribution cases and achieve realistic appearance with physically based rendering.
http://arxiv.org/abs/2408.11463v1
Compressor summary: The text introduces LLOT, a benchmark dataset for low-light object tracking, and proposes H-DCPT, a novel tracker that performs better than existing methods in such conditions.
http://arxiv.org/abs/2408.11457v1
Compressor summary: The text describes the creation and evaluation of a new Emakhuwa translation dataset for low-resource languages, highlighting challenges and data availability.
http://arxiv.org/abs/2408.11455v1
Compressor summary: The paper proposes a non-negative training method for actor models in deep reinforcement learning, enabling more interpretable part-based representations while respecting non-negative constraints.
http://arxiv.org/abs/2408.11451v1
Compressor summary: The text introduces SIGMA, a framework that improves sequential recommendations by integrating a bidirectional Partially Flipped Mamba model with a Dense Selective Gate and a Feature Extract GRU.
http://arxiv.org/abs/2408.11449v1
Compressor summary: The paper proposes Model Label Learning (MLL), a method to leverage expert models' functionalities for zero-shot tasks by using a Semantic Directed Acyclic Graph (SDAG) and an algorithm to select suitable models from a model hub.
http://arxiv.org/abs/2408.11448v1
Compressor summary: The paper discusses how lookism, or bias based on physical appearance, is a significant and under-explored issue in computer vision and AI technologies, and calls for systematic study and development of equitable systems.
http://arxiv.org/abs/2408.11447v1
Compressor summary: GaussianOcc is a fast and efficient method for self-supervised 3D occupancy estimation using Gaussian splatting techniques for projection and rendering.
http://arxiv.org/abs/2408.11443v1
Compressor summary: Subword regularization improves NLP models but is biased towards certain tokenizations; a new algorithm improves quality by sampling tokenizations more uniformly.
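The tokenization-sampling bias can be made concrete with a toy sketch: enumerate every valid segmentation of a word over a vocabulary and sample uniformly among them. This brute-force enumeration is exponential and purely illustrative; practical samplers (including the paper's algorithm) work over the tokenization lattice.

```python
import random

def all_tokenizations(word, vocab):
    """Enumerate every way to segment `word` into pieces from `vocab`."""
    if word == "":
        return [[]]
    segs = []
    for i in range(1, len(word) + 1):
        piece = word[:i]
        if piece in vocab:                      # try each in-vocab prefix
            for rest in all_tokenizations(word[i:], vocab):
                segs.append([piece] + rest)
    return segs

def sample_uniform_tokenization(word, vocab, rng=random):
    """Sample one tokenization uniformly over all valid segmentations."""
    return rng.choice(all_tokenizations(word, vocab))
```

Sampling uniformly over segmentations is the contrast to schemes like BPE-dropout or unigram sampling, whose draws are skewed toward particular tokenizations.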
http://arxiv.org/abs/2408.11441v1
Compressor summary: The paper discusses how generative AI can harm collective knowledge and democracy by causing various types of epistemic injustice and suggests ways to design fairer AI systems.
http://arxiv.org/abs/2408.11440v1
Compressor summary: LAHAJA is a diverse benchmark dataset for Hindi ASR that reveals poor performance of existing models and highlights the need for multilingual training and better speaker representation.
http://arxiv.org/abs/2408.11439v1
Compressor summary: BAdd is a method for learning fair representations in computer vision datasets that handles both single- and multi-attribute biases by incorporating features related to these attributes into the model.
http://arxiv.org/abs/2408.11438v1
Compressor summary: Noting that large weather models still depend on NWP-produced inputs and are not yet autonomous, the paper introduces DABench, a benchmark dataset for data-driven data assimilation models that uses ERA5 as ground truth and defines four standard features to guide and evaluate data-driven weather prediction systems, along with DaT, a strong baseline that integrates 4D variational DA into a Transformer model.
http://arxiv.org/abs/2408.11433v1
Compressor summary: The Twin Machine Unlearning (TMU) technique aligns the unlearned model with the original model while removing data without affecting its accuracy.
http://arxiv.org/abs/2408.11432v1
Compressor summary: T2VIndexer is a generative model that quickly generates video identifiers to improve the efficiency of text-video retrieval while maintaining high accuracy.
http://arxiv.org/abs/2408.11431v1
Compressor summary: LaMer is a label-free framework that diagnoses and remedies knowledge gaps in large language models using curricular meaningful learning and relative entropy.
http://arxiv.org/abs/2408.11424v1
Compressor summary: The paper presents EMO-LLaMA, a multimodal large language model that incorporates facial priors and age-gender-race attributes to improve facial expression recognition, addressing the generalization and multimodality challenges FER poses for emotional AI.
http://arxiv.org/abs/2408.11415v1
Compressor summary: The text discusses the challenges of using language models to generate politically nuanced content and suggests a framework for improving these representations.
http://arxiv.org/abs/2408.11413v1
Compressor summary: The paper presents Pano2Room, a method that reconstructs high-quality 3D indoor scenes from a single panoramic image using a panoramic RGBD inpainter and a 3D Gaussian Splatting field.
http://arxiv.org/abs/2408.11412v1
Compressor summary: REF is a linear-time, easy-to-use one-class classification method that performs well on various benchmark datasets and provides robust default settings.
http://arxiv.org/abs/2408.11411v1
Compressor summary: The paper proposes SelfDRSC++, a self-supervised learning framework for correcting rolling shutter distortion using dual reversed images, which can also generate high framerate global shutter videos from low framerate rolling shutter videos.
http://arxiv.org/abs/2408.11407v1
Compressor summary: The paper proposes a new knowledge distillation framework for UAV-based object detection, addressing the feature gap and background complexity issues to improve efficiency and performance.
http://arxiv.org/abs/2408.11402v1
Compressor summary: The paper proposes a new method called FFF-VDI that uses image-to-video diffusion models to improve video inpainting by addressing noise and time consistency issues.
http://arxiv.org/abs/2408.11401v1
Compressor summary: The authors compare two visualizations for ProtoPNet explanations and suggest using similarity maps instead of bounding boxes to better align with the network's purpose.
http://arxiv.org/abs/2408.11397v1
Compressor summary: EAGLE is a novel framework that uses visual enhancement to improve geometric reasoning in large language models by leveraging both CLIP and LLM features.
http://arxiv.org/abs/2408.11396v1
Compressor summary: MoE-LPR is a two-stage method to enhance multilingual capabilities in LLMs without forgetting original language proficiency by using Mixture-of-Experts and Language Priors Routing.
http://arxiv.org/abs/2408.11393v1
Compressor summary: TDA is a training-free method that improves the inference efficiency of large language models by exploiting their sparsity without compromising performance, and it reveals two critical features of LLM sparsity.
http://arxiv.org/abs/2408.11392v1
Compressor summary: The text discusses the importance of developing fairness measures for quality assessment algorithms in biometric systems to ensure equal performance across all individuals regardless of demographic characteristics.
http://arxiv.org/abs/2408.11384v1
Compressor summary: This work uses model explanation methods to identify crucial features and reduce data usage for temporal multimodal geospatial machine learning models.
http://arxiv.org/abs/2408.11382v1
Compressor summary: The text discusses switching from absolute to relative positional embeddings in neural machine translation models, showing that it improves performance with minimal or no loss while maintaining encoder-decoder architecture.
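Relative positional embeddings of the kind this summary mentions index a small table by the clipped offset j - i rather than by absolute positions; a minimal Shaw-style sketch (the function name and clipping scheme are illustrative, not the paper's exact setup):

```python
import numpy as np

def relative_position_bias(seq_len, max_distance, embeddings):
    """Look up a relative-position embedding for every (query i, key j) pair,
    indexing `embeddings` by the offset j - i clipped to
    [-max_distance, max_distance] and shifted to a non-negative index."""
    offsets = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    clipped = np.clip(offsets, -max_distance, max_distance) + max_distance
    return embeddings[clipped]       # shape (seq_len, seq_len, *emb_dims)
```

Because the lookup depends only on offsets, the same table serves any sequence length, which is part of why relative schemes can replace absolute embeddings with little or no loss.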
http://arxiv.org/abs/2408.11381v1
Compressor summary: RAGLAB is an open-source library that allows researchers to compare and create Retrieval Augmented Generation (RAG) algorithms for large language models.
http://arxiv.org/abs/2408.11374v1
Compressor summary: The paper presents a novel framework that tackles continual learning and machine unlearning together using controlled knowledge distillation with a memory buffer, achieving good results on benchmark datasets.
http://arxiv.org/abs/2408.11371v1
Compressor summary: The paper presents a new approach to solve decision theory problems using Probabilistic Answer Set Programming and an efficient algorithm based on Algebraic Model Counting.
http://arxiv.org/abs/2408.11370v1
Compressor summary: GRDL is a new efficient and accurate graph classification method that uses node embeddings as discrete distributions and outperforms existing methods with global pooling operations.
http://arxiv.org/abs/2408.11367v1
Compressor summary: Propper is an improved inductive logic programming method that learns from probabilistic and flawed background knowledge using neurosymbolic inference, BCE, and NoisyCombo, achieving better results for relational patterns in noisy images than binary ILP and Graph Neural Networks.
http://arxiv.org/abs/2408.11366v1
Compressor summary: GeoReasoner is a language model that leverages Large Language Models and geospatial information to improve reasoning on geospatially grounded natural language, outperforming existing methods in three tasks.
http://arxiv.org/abs/2408.11365v1
Compressor summary: This paper reviews the state of research on image anti-forensics, which studies how to detect and prevent manipulation of human faces in images, using bibliometric analysis of publications in the Web of Science database.
http://arxiv.org/abs/2408.11363v1
Compressor summary: ProteinGPT is a chatbot that uses GPT-4o to analyze protein sequences and structures and provide relevant answers to users.
http://arxiv.org/abs/2408.11359v1
Compressor summary: The text presents a self-adapting anomaly detection framework that learns hypergraph structure and temporal trends in multisensor data, predicts anomalies based on forecast error, and provides root cause analysis and recommendations for remediation.
http://arxiv.org/abs/2408.11357v1
Compressor summary: The paper presents a method to generate physically-layered 3D clothed humans from text prompts, separating clothing from the body with a physically-decoupled diffusion model and a dual-representation decoupling framework, enabling reusable and complex clothing matching across different body shapes.
http://arxiv.org/abs/2408.11356v1
Compressor summary: LigPose is a multi-task geometric deep learning model that accurately predicts protein-ligand binding without using docking tools, and shows promise for AI-based drug development.
http://arxiv.org/abs/2408.11349v1
Compressor summary: The authors propose and test a cost-efficient approach using a large language model to assess image quality in e-commerce settings, achieving a significant increase in sales on Mercari's web platform.
http://arxiv.org/abs/2408.11347v1
Compressor summary: The paper presents a 3D simulator that creates artificial video data for Embodied AI development, along with a QA dataset that measures a robot's understanding of human behavior and the home environment in daily life.
http://arxiv.org/abs/2408.11344v1
Compressor summary: The text discusses using transformer models for generating radiology reports from chest X-rays, comparing them to LSTM models, and suggesting new evaluation methods that consider both language and classification metrics.
http://arxiv.org/abs/2408.11338v1
Compressor summary: Automatic Dataset Construction (ADC) is a method that uses large language models and code generation to create image classification datasets with less manual annotation, but faces challenges such as label errors and imbalanced data distributions.
http://arxiv.org/abs/2408.11336v1
Compressor summary: The paper introduces a new framework, FATE, that uses the FocalNet Transformer architecture to improve temperature forecasting and climate change analysis by capturing complex patterns in meteorological data.
http://arxiv.org/abs/2408.11334v1
Compressor summary: The study develops an in-house LLM that matches GPT-4's performance in extracting clinical information from unstructured breast ultrasound radiology reports, offering a cheaper and more private alternative.
http://arxiv.org/abs/2408.11330v1
Compressor summary: LAPT is a novel transfer paradigm for TNAS that uses a large language model to learn design principles from existing architectures, refine them, and reduce the search space, leading to improved efficiency and performance.
http://arxiv.org/abs/2408.11327v1
Compressor summary: The paper proposes a zero-shot ensembling strategy to integrate different models for multimodal tasks, such as machine translation and image processing, by re-ranking beams during decoding.
http://arxiv.org/abs/2408.11326v1
Compressor summary: AutoToS is a method that automates planning problem solving by generating search components using language models, achieving 100% accuracy without human intervention.
http://arxiv.org/abs/2408.11323v1
Compressor summary: The study proposes a novel deep learning-based method to improve B1+ field homogeneity in ultrahigh field MRI, leading to faster and better image quality.
http://arxiv.org/abs/2408.11319v1
Compressor summary: This paper evaluates 11 LLMs and 8 PLMs on six sarcasm benchmark datasets and finds that LLMs underperform PLMs, with GPT-4 performing best among the LLMs and few-shot IO prompting being the most effective method.
http://arxiv.org/abs/2408.11316v1
Compressor summary: The text discusses the challenges of using large language models for clinical predictions due to their unreliable prediction probabilities, which are essential for decision-making, and suggests more research is needed.
http://arxiv.org/abs/2408.11313v1
Compressor summary: ECLIPSE is a novel black-box jailbreaking method that uses natural language instructions and LLM self-reflection to generate adversarial suffixes for malicious queries with high efficiency.
http://arxiv.org/abs/2408.11312v1
Compressor summary: The paper proposes a new framework for visual geo-localization using multiple Large Vision-Language Models that communicate with each other and learn dynamic patterns to improve performance, showing better results than existing methods on a new dataset called GeoGlobe.
http://arxiv.org/abs/2408.11309v1
Compressor summary: The study proposes using Modern Hopfield Networks to improve computer vision models' robustness against minor perturbations like blurring, achieving state-of-the-art results on MNIST-C dataset.
http://arxiv.org/abs/2408.11308v1
Compressor summary: EEG-Defender is a novel defense approach that uses early transformer outputs to detect and stop malicious inputs in large language models.
http://arxiv.org/abs/2408.11306v1
Compressor summary: The paper introduces the Kolmogorov-Arnold Network (KAN) for time series forecasting, which improves on existing methods by having better mathematical properties and interpretability.
http://arxiv.org/abs/2408.11305v1
Compressor summary: UniFashion is a unified framework that handles multimodal generation and retrieval tasks in the fashion domain by integrating image generation with retrieval and text generation tasks, achieving superior performance compared to previous single-task models.
http://arxiv.org/abs/2408.11303v1
Compressor summary: The paper proposes using SVD of the Koopman matrix to improve long-term prediction in nonlinear dynamics modeling by adjusting singular values, which affect eigenvalues and thus forecasting performance.
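The general idea behind adjusting singular values of an estimated Koopman operator can be sketched as follows; this is an illustrative toy (a DMD-style least-squares estimate with singular values clipped at 1), not the paper's actual procedure:

```python
import numpy as np

# Illustrative sketch, not the paper's code: estimate a Koopman matrix K
# from snapshot pairs via least squares, then adjust its singular values
# to damp spurious growth modes before long-term forecasting.

rng = np.random.default_rng(0)

# Toy linear dynamics x_{t+1} = A x_t with a mildly unstable mode (1.01)
A = np.array([[0.9, 0.1],
              [0.0, 1.01]])
X0 = rng.standard_normal((2, 200))    # snapshots at time t
X1 = A @ X0                           # snapshots at time t+1

K = X1 @ np.linalg.pinv(X0)           # least-squares Koopman estimate

# SVD of K; clipping singular values at 1 caps the operator's gain,
# preventing forecasts from blowing up over long horizons.
U, s, Vt = np.linalg.svd(K)
K_adj = U @ np.diag(np.minimum(s, 1.0)) @ Vt
```

Clipping is just one possible adjustment; the point is that the singular values control how the operator amplifies state over repeated application, and hence long-term forecasting behavior.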
http://arxiv.org/abs/2408.11302v1
Compressor summary: The authors propose ArcRec, a deep learning framework for modeling reference-dependent preferences in recommender systems, using historical purchase records, attribute-level networks, and a novel utility function that considers interest and price sensitivity.
http://arxiv.org/abs/2408.11300v1
Compressor summary: The paper proposes an offline method for learning goal-conditioned policies that use skills and abstraction to handle long-horizon goals with distribution shifts.
http://arxiv.org/abs/2408.11297v1
Compressor summary: This paper explores the challenges of using large vision language models (LVLMs) in few-shot classification tasks and proposes a meta-learning approach with label augmentation and candidate selection to improve their performance.
http://arxiv.org/abs/2408.11294v1
Compressor summary: RedWhale is a Korean-specific NLP model that uses continual pretraining and cross-lingual transfer learning to improve accuracy and comprehension while reducing training time and costs.
http://arxiv.org/abs/2408.11288v1
Compressor summary: Large language models show promise for mental health care but need more rigorous evaluation and ethical considerations before being widely used in clinical settings.
http://arxiv.org/abs/2408.11287v1
Compressor summary: BIR-D is a universal diffusion-based method for blind image restoration that adapts to different degradation models without prior assumptions, updates its parameters dynamically, and outperforms existing methods on real and synthetic datasets, including multiple and complex degradations.
http://arxiv.org/abs/2408.11281v1
Compressor summary: BearLLM is a novel framework that uses large language models and vibration signals to manage bearing health by processing user prompts and unifying different tasks.
http://arxiv.org/abs/2408.11271v1
Compressor summary: The text discusses improving biometric system accuracy by filling in missing scores using various score imputation methods and simple sum fusion, especially for high proportions of missing data.
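The combination of score imputation and simple sum fusion can be sketched generically; this toy uses per-matcher mean imputation as a stand-in for the various imputation methods the paper compares:

```python
import numpy as np

# Illustrative sketch (assumed details, not the paper's method): fill in
# missing similarity scores from individual biometric matchers, then fuse
# the completed scores by simple summation.

def impute_and_fuse(scores):
    """scores: (n_samples, n_matchers) array, np.nan marks a missing score."""
    scores = np.asarray(scores, dtype=float)
    col_means = np.nanmean(scores, axis=0)            # per-matcher mean
    filled = np.where(np.isnan(scores), col_means, scores)
    return filled.sum(axis=1)                         # simple sum fusion

scores = np.array([[0.8, np.nan, 0.7],
                   [0.6, 0.5,    np.nan],
                   [np.nan, 0.4, 0.9]])
fused = impute_and_fuse(scores)
```

With high proportions of missing data, the choice of imputation method matters more, which is the regime the paper focuses on.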
http://arxiv.org/abs/2408.11267v1
Compressor summary: The paper proposes an iterative algorithm to recover model parameters using leverage scores gradient, improving data privacy and security, with low time complexity.
http://arxiv.org/abs/2408.11266v1
Compressor summary: This paper explains deep learning and the Deep Galerkin method for solving differential equations, providing step-by-step examples and code snippets.
http://arxiv.org/abs/2408.11261v1
Compressor summary: The paper analyzes and proposes a method to reduce sycophancy, the undue influence of leading or deceptive prompts on large vision-language models, improving their performance and reducing biased outputs.
http://arxiv.org/abs/2408.11258v1
Compressor summary: The paper proposes a sampling-based method and a sequence-to-sequence model for simulating speech recognition errors in neural network acoustic models, improving their robustness and performance on unseen data.
http://arxiv.org/abs/2408.11253v1
Compressor summary: The paper presents a method using Deep Convolutional Neural Networks to accurately grade almonds and their shells, improving global agricultural product classification and trade.
http://arxiv.org/abs/2408.11252v1
Compressor summary: The paper proposes a method to evaluate the faithfulness of explanation methods for autoregressive language models using counterfactual generation.
http://arxiv.org/abs/2408.11251v1
Compressor summary: The paper proposes a system that uses neural network modeling to create 3D twin models and compare them for defect detection in large machinery, reducing the need for manual inspections by personnel.
http://arxiv.org/abs/2408.11250v1
Compressor summary: The paper proposes a vision-based approach using deep CNNs for crack detection in AM surfaces with high accuracy and efficiency.