This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-15, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.09473v1
Compressor summary: StyleSplat is a fast method for applying diverse artistic styles to 3D objects in scenes using 3D Gaussian splatting and feature matching, achieving localized stylization and photorealistic results.
http://arxiv.org/abs/2407.09468v1
Compressor summary: The text discusses the need for a broader mathematical perspective in modern machine learning to handle non-Euclidean data with intricate geometric, topological, and algebraic structures.
http://arxiv.org/abs/2407.09467v1
Compressor summary: FairyLandAI is an AI-driven storytelling model that creates personalized fairytales for children, integrating text and image generation to enhance the storytelling experience and impart moral values.
http://arxiv.org/abs/2407.09453v1
Compressor summary: The text describes a method called weight block sparsity that reduces the size and computational cost of deep neural networks by zeroing some parameters in pre-trained models, leading to faster inference speeds and memory efficiency.
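The general idea behind weight block sparsity can be sketched as follows: score contiguous blocks of a weight matrix and zero out the weakest ones. This is a minimal illustrative sketch of block pruning in general, not the paper's algorithm; the function and parameter names are hypothetical.

```python
import numpy as np

def block_sparsify(weights, block=(4, 4), keep_ratio=0.5):
    """Zero out the lowest-magnitude blocks of a 2-D weight matrix.

    Blocks are scored by their L1 norm and the weakest (1 - keep_ratio)
    fraction is set to zero. Hypothetical names, not the paper's API.
    """
    rows, cols = weights.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0
    # View the matrix as a grid of (br, bc) blocks and score each block.
    grid = weights.reshape(rows // br, br, cols // bc, bc)
    scores = np.abs(grid).sum(axis=(1, 3))          # one score per block
    k = int(scores.size * keep_ratio)               # number of blocks to keep
    cutoff = np.sort(scores.ravel())[::-1][k - 1]   # k-th largest block score
    mask = (scores >= cutoff)[:, None, :, None]     # broadcast back to blocks
    return (grid * mask).reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W_sparse = block_sparsify(W, block=(4, 4), keep_ratio=0.5)
```

Zeroing whole blocks (rather than individual weights) is what makes this kind of sparsity hardware-friendly: entire tiles can be skipped during matrix multiplication.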
http://arxiv.org/abs/2407.09450v1
Compressor summary: EM-LLM integrates human episodic memory into large language models, enabling them to handle infinite context lengths and outperforming state-of-the-art models in various tasks.
http://arxiv.org/abs/2407.09447v1
Compressor summary: The paper proposes a reinforcement learning approach to find prompts that can trigger toxic outputs from language models while maintaining intelligibility and realism.
http://arxiv.org/abs/2407.09435v1
Compressor summary: The paper proposes compatibility metrics and a training strategy to minimize inconsistencies when updating large language models, aiming to improve user experience and satisfaction.
http://arxiv.org/abs/2407.09434v1
Compressor summary: The paper proposes using foundation models, advanced deep learning techniques, to enhance the management of complex and uncertain aspects of electric power grids in the context of climate change and energy transition.
http://arxiv.org/abs/2407.09431v1
Compressor summary: The paper proposes a new framework to count repetitive actions in videos by learning embeddings and predicting action start probabilities, instead of using a temporal self-similarity matrix as an intermediate representation.
http://arxiv.org/abs/2407.09429v1
Compressor summary: The study examines how well instruction-tuned LLMs perform on clinical NLP tasks when given different natural language instructions, finding that domain-specific models can be more brittle than general ones and that phrasing differences affect fairness.
http://arxiv.org/abs/2407.09417v1
Compressor summary: The paper proposes DRAD, a method to detect and correct hallucinations in LLMs by adapting the retrieval process based on real-time detection and using external knowledge.
http://arxiv.org/abs/2407.09415v1
Compressor summary: OfflineMania is a new environment for offline reinforcement learning research that simulates a single-agent racing game and provides datasets to test different algorithms.
http://arxiv.org/abs/2407.09413v1
Compressor summary: SPIQA is a large-scale QA dataset that tests multimodal models' ability to understand figures and tables in computer science research articles using multiple images and a Chain-of-Thought evaluation strategy.
http://arxiv.org/abs/2407.09392v1
Compressor summary: Open-Canopy is an open-access benchmark for estimating very high resolution canopy height and detecting change using satellite and aerial data in France.
http://arxiv.org/abs/2407.09388v1
Compressor summary: The work presents a novel approach to generating new and interesting board games by combining large language models and evolutionary computation to mutate and recombine games encoded in Ludii, a language that describes many existing games; some of the generated games are playable online through the Ludii portal.
http://arxiv.org/abs/2407.09386v1
Compressor summary: Quanta radiance fields use single-photon cameras to train neural networks at the photon level, enabling high-quality view synthesis under challenging conditions like motion, low light, and dynamic range.
http://arxiv.org/abs/2407.09381v1
Compressor summary: Curvature-based rewiring may not improve message passing efficiency in real-world graphs because it does not always target oversquashed edges, and performance gains are due to hyperparameter sweeps.
http://arxiv.org/abs/2407.09379v1
Compressor summary: The paper proposes FANet, a network that uses AFE blocks to incorporate semantic information for better semantic segmentation in complex scenes with cluttered backgrounds and translucent objects.
http://arxiv.org/abs/2407.09378v1
Compressor summary: The paper presents a causal graph neural network explainer that identifies important subgraphs by training neural causal models on the input graph.
http://arxiv.org/abs/2407.09375v1
Compressor summary: This paper investigates how State Space Models can learn from past observations and make predictions without retraining, by using a new weight construction method that approximates input signal derivatives.
http://arxiv.org/abs/2407.09373v1
Compressor summary: The text proposes a method to group ICU patients based on their observation trajectories and develop personalized risk predictions, improving clinical decision making.
http://arxiv.org/abs/2407.09372v1
Compressor summary: The text introduces a new dataset of images for autonomous robots in construction and analyzes their performance, suggesting more data and consistent labels are needed.
http://arxiv.org/abs/2407.09370v1
Compressor summary: Sinusoidal positional encoding (SPE) is a new method that adapts to different tasks without hyperparameter tuning, improving performance in various tasks like 3D view synthesis, Text-to-Speech generation, and 1D regression.
http://arxiv.org/abs/2407.09367v1
Compressor summary: The paper introduces a new approach to adapt pre-trained models to unsupervised domain shifts by using uncertainty-aware data buffering, graph-based class relation preservation, and pseudo-target replay.
http://arxiv.org/abs/2407.09364v1
Compressor summary: WhosAI is a new framework that can detect and reveal whether text was written by humans or AI, using contrastive learning to learn semantic similarity representations from multiple generators.
http://arxiv.org/abs/2407.09359v1
Compressor summary: GLASS is a novel anomaly synthesis strategy that enhances unsupervised anomaly detection by combining global and local synthesis methods, achieving state-of-the-art results in weak defect detection and industrial applications.
http://arxiv.org/abs/2407.09357v1
Compressor summary: STGG+ is a Transformer-based method for generating molecules with desired properties by incorporating random masking, property prediction loss, and other improvements, achieving state-of-the-art results on various tasks.
http://arxiv.org/abs/2407.09352v1
Compressor summary: The paper proposes an implicit method for solving Electromagnetic Inverse Scattering Problems, which improves the accuracy of non-invasively determining the internal relative permittivity of a scatterer using electromagnetic fields.
http://arxiv.org/abs/2407.09344v1
Compressor summary: Point-CPR is a compact point cloud model that uses partial-aware prediction and a local aggregation encoder to improve 3D representation and reduce model size compared to existing Masked Point Modeling methods.
http://arxiv.org/abs/2407.09336v1
Compressor summary: The paper presents a principled method to recommend augmentations for contrastive learning in time series analysis, based on dataset characteristics such as trend and seasonality, and shows its effectiveness over empirical or grid-search baselines on synthetic and real-world datasets.
http://arxiv.org/abs/2407.09327v1
Compressor summary: The article presents a multilingual bias and propaganda annotated corpus from Facebook posts about the Israeli War on Gaza, created for a shared task and used to evaluate performance of detection techniques.
http://arxiv.org/abs/2407.09311v1
Compressor summary: The authors present a new method for finding Bayesian Network structures using multiple LLMs and majority voting, compare it to an alternative method, and discuss its scalability and applicability.
http://arxiv.org/abs/2407.09303v1
Compressor summary: ProDepth is a novel framework that uses a probabilistic approach to address inconsistencies caused by dynamic objects in multi-frame monocular depth estimation, improving performance on various datasets.
http://arxiv.org/abs/2407.09299v1
Compressor summary: The Physics-Informed Diffusion model translates RGB images to infrared images while adhering to physical laws, achieving better results than existing methods.
http://arxiv.org/abs/2407.09298v1
Compressor summary: The study investigates how transformer layers work and how to use them more efficiently for different problems.
http://arxiv.org/abs/2407.09297v1
Compressor summary: The paper proposes a new method to estimate density-based distances in high-dimensional data using normalizing flows and a dimension-adapted Fermat distance.
http://arxiv.org/abs/2407.09294v1
Compressor summary: The text presents a novel method to estimate 3D shapes and refractive index from single-view polarization images using a modified polarization reflection model, reflectance cues, and an inverse rendering-based deep learning framework.
http://arxiv.org/abs/2407.09288v1
Compressor summary: The paper presents a new dataset for winter sports equipment segmentation and tests interactive segmentation models with online adaptation methods to improve efficiency and accuracy.
http://arxiv.org/abs/2407.09287v1
Compressor summary: The study proposes a hierarchical framework that combines language understanding and reinforcement learning to enable AI agents to execute complex language instructions in virtual environments.
http://arxiv.org/abs/2407.09285v1
Compressor summary: The MetaFood Workshop and its challenge aim to improve 3D food reconstruction for nutrition monitoring using a visible checkerboard as a size reference, with 16 teams submitting results on varying difficulty levels.
http://arxiv.org/abs/2407.09283v1
Compressor summary: DAHRS improves multilingual SRL projection accuracy by addressing spurious role labels caused by naturally occurring divergences and using linguistically-informed alignment remediation followed by FCFA projection.
http://arxiv.org/abs/2407.09281v1
Compressor summary: The paper compares large language models and a cognitive instance-based learning model in predicting human behavior in sequential decision-making tasks, finding that LLMs are better at incorporating feedback while the cognitive IBL model captures loss aversion bias.
http://arxiv.org/abs/2407.09276v1
Compressor summary: H2O-Danube3 is a small language model pre-trained on Web data, with high performance and portability for various tasks and devices.
http://arxiv.org/abs/2407.09274v1
Compressor summary: HelixProtX is a system that uses a large multimodal model to transform any input protein modality into any desired protein modality, enabling better understanding and generation of protein data and outperforming existing models in various tasks.
http://arxiv.org/abs/2407.09271v1
Compressor summary: Our method improves continual learning for vision tasks by using incremental neural mesh models, latent space initialization, and positional regularization, achieving better performance in both in-domain and out-of-distribution scenarios.
http://arxiv.org/abs/2407.09252v1
Compressor summary: COCOM compresses long contexts for Retrieval-Augmented Generation, speeding up decoding time and improving answer quality.
http://arxiv.org/abs/2407.09248v1
Compressor summary: The authors propose a method that uses semantic information to improve UV mapping and 3D reconstruction of indoor scenes after clutter removal.
http://arxiv.org/abs/2407.09247v1
Compressor summary: Constrained Intrinsic Motivation (CIM) improves unsupervised skill discovery and exploration with intrinsic motivation in reinforcement learning tasks by addressing challenges like static skills, limited state coverage, and suboptimality.
http://arxiv.org/abs/2407.09241v1
Compressor summary: The paper proposes a sociolinguistic approach to language modeling, considering how different language varieties affect various challenges and emphasizing the importance of accurate representation in training data.
http://arxiv.org/abs/2407.09236v1
Compressor summary: The study proposes a visual intuition model based on Gestalt theory to improve CNN performance by completing missing information in images, and tests it on the MNIST dataset.
http://arxiv.org/abs/2407.09230v1
Compressor summary: The authors propose Surgical Imagen, a text-to-image generative model that can create realistic surgical images from textual descriptions, addressing challenges like high annotation costs and ethical constraints in acquiring surgical data for research and development.
http://arxiv.org/abs/2407.09221v1
Compressor summary: The paper argues for a reform in AI evaluation methods using cognitive sciences and identifies challenges and promising research directions.
http://arxiv.org/abs/2407.09216v1
Compressor summary: The paper corrects an error in panoptic scene graph generation evaluations, shows that two-stage models are competitive to one-stage models, and introduces a new two-stage model (DSFormer) that outperforms existing models.
http://arxiv.org/abs/2407.09215v1
Compressor summary: HUP-3D is a large, diverse, and realistic synthetic dataset for estimating hand-ultrasound probe pose in obstetric ultrasound using markerless 3D joint pose estimation with potential applications in medical education and guidance.
http://arxiv.org/abs/2407.09212v1
Compressor summary: AConE is a novel query embedding method that explains knowledge in the SROI⁻ description logic and outperforms previous models with fewer parameters.
http://arxiv.org/abs/2407.09209v1
Compressor summary: The paper proposes a LLM-based scoring system for automated language learning assessment that uses speech encoder and modality adapter layer to generate accuracy and fluency scores.
http://arxiv.org/abs/2407.09197v1
Compressor summary: ACME is a chatbot that helps migrants in Europe find the best protection option using computational argumentation.
http://arxiv.org/abs/2407.09192v1
Compressor summary: The text describes a method for accurate anatomical landmark detection using diffusion models that generate probability regions as heatmaps, achieving high quality results in medical image processing.
http://arxiv.org/abs/2407.09191v1
Compressor summary: CAFE is a model-agnostic learning strategy for panoptic scene graph generation that incorporates shape-aware features in an easy-to-hard manner and outperforms existing methods.
http://arxiv.org/abs/2407.09187v1
Compressor summary: The study proposes a method to detect depression in Bangla social media posts using advanced natural language processing and deep learning techniques, achieving better results than existing approaches.
http://arxiv.org/abs/2407.09186v1
Compressor summary: SPH-ParVI is a new variational inference method that uses smoothed particle hydrodynamics to simulate fluid flow and sample probabilistic models efficiently.
http://arxiv.org/abs/2407.09184v1
Compressor summary: The study introduces a new dataset (SIKO) for Korean language models to improve their handling of incomplete syntax in natural language processing.
http://arxiv.org/abs/2407.09181v1
Compressor summary: The paper studies how to extract dialogue participant information and evaluate their performance in Russian using Multi-Session Chat dataset, F-score metric, and various models, finding that all models have low recall and larger models improve extraction.
http://arxiv.org/abs/2407.09174v1
Compressor summary: DART is an automated pipeline for object detection in industrial applications that uses image generation, open-vocabulary annotation, and multimodal review to streamline the workflow and improve performance.
http://arxiv.org/abs/2407.09173v1
Compressor summary: Conformal prediction can be applied to transductive node-classification using exchangeable graphs, preserving coverage and statistical efficiency.
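For readers unfamiliar with conformal prediction, the generic split-conformal recipe it builds on can be sketched in a few lines: calibrate a score threshold on held-out data, then emit a set of all classes below the threshold. This is the textbook recipe, not the paper's graph-specific transductive variant; the function name and signature are assumptions.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Generic split conformal prediction for classification.

    Conformity score = 1 - probability of the true class on a held-out
    calibration set; its finite-sample-corrected (1 - alpha) quantile is
    the threshold for test-time prediction sets.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)
    # A class enters the set when its score 1 - p falls below the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

cal_probs = np.tile(np.array([[0.9, 0.05, 0.05]]), (100, 1))
cal_labels = np.zeros(100, dtype=int)
sets = conformal_sets(cal_probs, cal_labels,
                      np.array([[0.95, 0.03, 0.02]]), alpha=0.1)
```

Under exchangeability of calibration and test points, the returned sets cover the true label with probability at least 1 - alpha; the paper's contribution is showing when node-classification on graphs satisfies that exchangeability assumption.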
http://arxiv.org/abs/2407.09172v1
Compressor summary: The study applies generative AI to create unique architectural designs using neural networks trained on human data, producing coherent images with captions shared online.
http://arxiv.org/abs/2407.09167v1
Compressor summary: SE(3)-bi-equivariant transformer (BITR) is a method to align non-overlapped point clouds by exploiting SE(3)-bi-equivariance prior, which ensures robustness against rigid perturbations and initial positions.
http://arxiv.org/abs/2407.09165v1
Compressor summary: Conformal prediction can produce robust prediction sets against adversarial examples, by bounding the change in conformity scores with efficient algorithms.
http://arxiv.org/abs/2407.09162v1
Compressor summary: The paper proposes a method to improve the Tsetlin Machine's word embedding by incorporating feature negations and optimizing its parameters, achieving high accuracy in pattern classification tasks.
http://arxiv.org/abs/2407.09159v1
Compressor summary: The paper presents a video-based method to detect and categorize severity of autism using spatio-temporal features and a shallow TCN-MLP network, which can aid clinicians in autism spectrum analysis.
http://arxiv.org/abs/2407.09152v1
Compressor summary: The study examined four large language models' abilities to generate and detect hallucinations, using ensemble voting for detection, in the CLEF ELOQUENT HalluciGen shared task.
http://arxiv.org/abs/2407.09150v1
Compressor summary: The paper shows that semantic segmentation models are more vulnerable to small adversarial perturbations than image classification models; the authors propose new attacks and analyze the models in detail, finding size-bias and attack-diversity issues that complicate evaluation.
http://arxiv.org/abs/2407.09141v1
Compressor summary: The paper studies how compression techniques affect LLMs' quality and suggests using KL-divergence and flips as better evaluation metrics than accuracy.
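The two metrics named in the summary are easy to state concretely: average KL-divergence between the baseline and compressed models' output distributions, and the "flip" rate, i.e. how often the argmax prediction changes. This is an illustrative sketch of those metrics in general, not the paper's exact protocol; the function name is hypothetical.

```python
import numpy as np

def kl_and_flips(base_probs, comp_probs):
    """Compare a baseline and a compressed model on the same inputs.

    Returns mean KL(base || compressed) over examples and the fraction
    of examples whose argmax prediction changed ('flips').
    """
    eps = 1e-12  # guard against log(0)
    kl = np.sum(base_probs * (np.log(base_probs + eps)
                              - np.log(comp_probs + eps)), axis=-1)
    flips = np.mean(base_probs.argmax(-1) != comp_probs.argmax(-1))
    return kl.mean(), flips

base = np.array([[0.7, 0.3], [0.2, 0.8]])
kl_same, flips_same = kl_and_flips(base, base.copy())
kl_diff, flips_diff = kl_and_flips(base, base[:, ::-1])
```

Unlike accuracy, both quantities stay informative even when the compressed model keeps the same benchmark score while changing its behavior on many individual inputs.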
http://arxiv.org/abs/2407.09136v1
Compressor summary: The authors propose a method to improve dialog tutoring models by detecting student errors in math reasoning problems and generating targeted feedback based on the errors.
http://arxiv.org/abs/2407.09127v1
Compressor summary: The paper evaluates XAI methods using an EAF model and a novel scoring method based on ground truth simulations and sensitivity analysis, showing how well they explain the data-generating process.
http://arxiv.org/abs/2407.09124v1
Compressor summary: The text proposes a photonic-based algorithm for multi-agent decision-making that balances exploration and exploitation without explicit information sharing, using chaotic lasers and decentralized coupling adjustment.
http://arxiv.org/abs/2407.09121v1
Compressor summary: This study proposes a new method, DeRTa, to improve the safety of Large Language Models by training them to recognize and refuse harmful content at any position in a response.
http://arxiv.org/abs/2407.09120v1
Compressor summary: URRL-IMVC is a novel method for incomplete multi-view clustering that leverages multi-view information and neighboring samples to generate robust embeddings without relying on cross-view contrastive learning or missing view recovery.
http://arxiv.org/abs/2407.09115v1
Compressor summary: The paper presents a method to explain ResNet neural networks by extending Layer-wise Relevance Propagation with Relevance Splitting to handle skip connections, and shows its effectiveness on ImageNet and Caltech-UCSD Birds datasets.
http://arxiv.org/abs/2407.09111v1
Compressor summary: The tutorial discusses optimization techniques for fast and efficient inference using AI accelerators with Transformer-based foundation models in various applications.
http://arxiv.org/abs/2407.09105v1
Compressor summary: Packing and Flash Attention with proper masking improve LLM training efficiency and accuracy by combining multiple examples up to the maximum sequence length without wasting GPU resources or affecting attention computation.
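The "proper masking" the summary mentions amounts to a block-diagonal attention mask: several examples share one packed sequence, but attention is only allowed within each example. A minimal sketch of constructing such a mask, assuming hypothetical names and not the exact masking interface of any particular Flash Attention kernel:

```python
import numpy as np

def packed_attention_mask(seq_lengths, max_len):
    """Block-diagonal attention mask for packed examples.

    Examples are concatenated into one sequence of length max_len; the
    mask permits attention only within each example, so packing does
    not leak context across example boundaries.
    """
    ids = np.full(max_len, -1)          # -1 marks padding positions
    pos = 0
    for i, length in enumerate(seq_lengths):
        ids[pos:pos + length] = i       # tag each token with its example id
        pos += length
    # True where query and key belong to the same (non-padding) example.
    return (ids[:, None] == ids[None, :]) & (ids[:, None] >= 0)

mask = packed_attention_mask([3, 2], max_len=6)
```

Because the mask is block-diagonal, each packed example attends exactly as it would have on its own, while the GPU processes the full max_len sequence without wasted padding.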
http://arxiv.org/abs/2407.09103v1
Compressor summary: DANIEL is a fast, end-to-end architecture for handwritten document understanding that integrates language modeling, layout analysis, text recognition, and named entity recognition across multiple languages, layouts, and tasks.
http://arxiv.org/abs/2407.09096v1
Compressor summary: The paper introduces STD-LLM, a model that leverages LLMs to handle both forecasting and imputation of spatial-temporal data using tokenizers, virtual nodes, node embeddings, and hypergraph learning, achieving strong results on various datasets and in few-shot/zero-shot scenarios.
http://arxiv.org/abs/2407.09093v1
Compressor summary: The paper proposes bit-level reversible transformers by treating each block as an Euler integration approximation and using bidirectional integration approximation, which improves accuracy and data-throughput in training.
http://arxiv.org/abs/2407.09087v1
Compressor summary: The paper explores the impact of discrete tokens in masked image modeling, proposes a new metric to measure their effectiveness, and introduces ClusterMIM, a method that outperforms existing approaches on various datasets and models.
http://arxiv.org/abs/2407.09073v1
Compressor summary: The authors propose a method to adapt a pre-trained vision-language model (VLM) for open vocabulary multilabel video classification by using large language models (LLMs) to provide semantic guidance and integrating temporal modeling into the VLM's encoder.
http://arxiv.org/abs/2407.09072v1
Compressor summary: The text discusses the challenges of using direct preference optimization (DPO) methods for fine-tuning language models based on human feedback, and proposes a new loss function to address these issues.
http://arxiv.org/abs/2407.09061v1
Compressor summary: The paper presents a self-supervised graph-based method for unsupervised feature selection from high-dimensional data that uses pseudo-labels derived from graph Laplacian eigenvectors, is robust to outliers and complex substructures, and proves effective on real-world datasets, especially biological ones.
http://arxiv.org/abs/2407.09059v1
Compressor summary: The paper proposes a domain adaptation scheme that uses a blurring model to generate training pairs for fine-tuning a video deblurring model in unseen domains, improving performance on real-world videos.
http://arxiv.org/abs/2407.09057v1
Compressor summary: PersonificationNet is a model that can make a cartoon character or toy act like a person by copying their pose and appearance from a few images.
http://arxiv.org/abs/2407.09052v1
Compressor summary: The system generates tablatures for lead electric guitar from MIDI melodies by solving a multi-attribute optimization problem over fingering, incorporating common clichés, biomechanical feasibility, articulations, and expressive techniques, and converting the output into MusicXML format for easy use.
http://arxiv.org/abs/2407.09048v1
Compressor summary: KUNPENG is an AI model for intelligent maritime that perceives heterogeneous data, makes decisions, and optimizes power so vessels navigate safely and efficiently in complex, dynamic maritime environments.
http://arxiv.org/abs/2407.09047v1
Compressor summary: The paper proposes a method for incremental semantic segmentation that balances learning from old and new classes using prototype-guided techniques and weight-guided consolidation.
http://arxiv.org/abs/2407.09043v1
Compressor summary: AMOLE is a model that enhances the understanding of molecules and their texts by preserving structural similarity and transferring expertise.
http://arxiv.org/abs/2407.09039v1
Compressor summary: TRIL3 is a new method for continuous learning in tabular data classification that uses synthetic data to prevent forgetting and achieves superior performance with minimal synthetic data.
http://arxiv.org/abs/2407.09033v1
Compressor summary: The paper proposes a text-based semantic segmentation method that leverages vision-language models and transformers to improve generalization across domains, achieving state-of-the-art results on GTA5-to-Cityscapes.
http://arxiv.org/abs/2407.09026v1
Compressor summary: The paper proposes HPC, a framework for compressing volumetric videos based on NeRF that enables variable bitrate and quality using a single model, reducing temporal redundancy and optimizing compression with multi-rate-distortion loss function.
http://arxiv.org/abs/2407.09025v1
Compressor summary: SpreadsheetLLM introduces an efficient encoding framework for large language models to effectively understand and reason with spreadsheets using SheetCompressor and Chain of Spreadsheet.
http://arxiv.org/abs/2407.09024v1
Compressor summary: The paper proposes a two-stage optimization method for offline Reinforcement Learning using language model alignment, introduces Efficient Diffusion Alignment (EDA) for continuous control problems, and shows its superior performance on the D4RL benchmark.
http://arxiv.org/abs/2407.09020v1
Compressor summary: This paper proposes a multimodal and multi-teacher approach to improve mental health classification using social media data, which integrates diverse features like text and sound, and distributes learning across specialized teachers.
http://arxiv.org/abs/2407.09017v1
Compressor summary: Copilot Guided Response (CGR) is an ML system that helps security analysts investigate, triage, and remediate security incidents by providing historical context, determining the nature of the incident, and suggesting containment actions.
http://arxiv.org/abs/2407.09014v1
Compressor summary: CompAct is a framework that compresses extensive documents for language models to improve multi-hop question-answering without losing key information.
http://arxiv.org/abs/2407.09013v1
Compressor summary: This paper surveys how generative AI is used for creating various types of content in procedural content generation (PCG) and discusses the challenges of limited domain-specific training data.
http://arxiv.org/abs/2407.09012v1
Compressor summary: TCAN is a pose-driven human image animation method that uses ControlNet and LoRA to create realistic videos that are robust to erroneous poses and temporally consistent, while also allowing for a more static background.
http://arxiv.org/abs/2407.09011v1
Compressor summary: The paper proposes a method to improve question answering systems by using supervised contrastive learning, which enhances robustness and efficiency in handling various user inputs.
http://arxiv.org/abs/2407.09007v1
Compressor summary: The paper introduces a framework to measure the creativity of large language models using two characteristics: convergent and divergent thinking, and applies it to Codeforces problems.
http://arxiv.org/abs/2407.09005v1
Compressor summary: The paper proposes a new AI model for recognizing objects in maritime scenes, introduces a benchmark dataset, and develops a performance evaluation method to standardize and improve autonomous navigation systems.
http://arxiv.org/abs/2407.09003v1
Compressor summary: The authors propose a 'denoising-then-voting' method using large language models to predict stock trends from individual news items instead of merged news, overcoming noise and input length limits and achieving performance comparable to supervised methods.
http://arxiv.org/abs/2407.08995v1
Compressor summary: The study proposes a method for LLMs to generate their own role-play prompts through fine-tuning, improving their performance in various domains and automating complex prompting strategies.
http://arxiv.org/abs/2407.08994v1
Compressor summary: The paper introduces GAD, a network with CPT and DKFF modules that improves point cloud analysis by using global attention and dual-domain feature learning to enhance input embedding and neighboring aggregation, with experiments showing superior performance on various tasks.
http://arxiv.org/abs/2407.08993v1
Compressor summary: The paper explores using deep learning for super-resolution to improve optical character recognition from document scans, introducing a multi-task loss function to address the ill-posedness of the problem.
http://arxiv.org/abs/2407.08992v1
Compressor summary: Emotion Talk is a system that provides emotional support through audio messages in Portuguese, analyzing and responding to users' emotions outside therapy sessions.
http://arxiv.org/abs/2407.08989v1
Compressor summary: This study evaluates large language models' robustness against morphological variations in text and finds that they perform better than previous pre-trained models on real-world benchmarks like GEC and LSC.
http://arxiv.org/abs/2407.08978v1
Compressor summary: The text describes a new dataset of Chinese-English literature with complex discourse structures, proposes chapter-to-chapter translation as a pragmatic context-aware translation task, and explores the performance of machine translation models and large language models on this setting.
http://arxiv.org/abs/2407.08973v1
Compressor summary: The paper proposes an ensembling method that uses transparent and opaque models for easy and hard inputs respectively to balance interpretability and performance.
http://arxiv.org/abs/2407.08972v1
Compressor summary: The paper evaluates the robustness of large-kernel convolutional neural networks (CNNs) and compares their performance to vision transformers (ViTs) on six benchmark datasets, revealing novel insights into their properties.
http://arxiv.org/abs/2407.08971v1
Compressor summary: FuSTAL improves pseudo label quality for action localization in untrimmed videos using cross-video contrastive learning, prior-based filtering, and EMA-based distillation, achieving state-of-the-art results.
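The "EMA-based distillation" mentioned here typically means a teacher whose weights are an exponential moving average of the student's; the snippet below is a generic sketch of that update, not FuSTAL's specific formulation, and the momentum value is an assumption.

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.999):
    """Exponential-moving-average teacher update used in self-distillation:
    teacher <- m * teacher + (1 - m) * student (generic sketch)."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

Because the teacher drifts only slowly toward the student, its outputs are more stable and can serve as higher-quality pseudo labels.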
http://arxiv.org/abs/2407.08968v1
Compressor summary: SlideGCD is a whole slide image (WSI) analysis pipeline that uses slide-based graph construction and graph learning to explore inter-correlations between slides for cancer diagnostics, improving the performance of existing multi-instance learning methods on TCGA datasets.
http://arxiv.org/abs/2407.08967v1
Compressor summary: The text proposes a Dual-System Augmented Relation Extractor (DSARE) that combines traditional RE models with LLMs to address their limitations in few-shot relation extraction tasks.
http://arxiv.org/abs/2407.08966v1
Compressor summary: LAPT is a novel approach for OOD detection that automatically generates prompts with class names and negative labels, improving reliability and reducing manual intervention.
http://arxiv.org/abs/2407.08965v1
Compressor summary: Lite-SAM is an efficient end-to-end solution for SegEvery with a streamlined CNN-Transformer hybrid encoder, an automated prompt proposal network, and a mask decoder that achieves state-of-the-art performance while reducing computational costs.
http://arxiv.org/abs/2407.08964v1
Compressor summary: CA-RL improves traffic efficiency and safety by using communication-aware Reinforcement Learning to optimize CACC in Connected and Autonomous Vehicles.
http://arxiv.org/abs/2407.08959v1
Compressor summary: The paper proposes HierICRF, a method that improves few-shot hierarchical text classification by adapting unstructured semantic spaces in pre-trained language models to the downstream domain hierarchy using iterative language modeling and hierarchical consistency correction.
http://arxiv.org/abs/2407.08952v1
Compressor summary: DAFND is a model that improves fake news detection by enhancing large language models with a Detection, Investigation, Judge, and Determination module that utilizes both inside and outside information.
http://arxiv.org/abs/2407.08950v1
Compressor summary: MSFSNet is a network that improves image restoration by selectively recovering information using spatial and frequency domain knowledge, dynamic filter selection, and skip feature fusion.
http://arxiv.org/abs/2407.08949v1
Compressor summary: The authors propose a method to generate dynamic and expressive talking head videos from a single face image, improving the existing Image2Video model with a Face Locator and Motion Frame mechanism, and provide a demo platform for easy use.
http://arxiv.org/abs/2407.08947v1
Compressor summary: The paper presents a framework that leverages foundation models to construct interpretable models (CBMs) with minimal human effort, reducing biases and vulnerability to spurious correlations, and demonstrates its effectiveness on multiple datasets.
http://arxiv.org/abs/2407.08946v1
Compressor summary: The paper proposes a new self-supervised training objective for diffusion models to improve denoising in regions outside the standard training distribution, leading to better sampling quality and performance.
http://arxiv.org/abs/2407.08944v1
Compressor summary: Bora is a new spatio-temporal diffusion probabilistic model that can generate realistic and diverse biomedical videos from text prompts, potentially improving medical education, decision-making, and training.
http://arxiv.org/abs/2407.08940v1
Compressor summary: The paper evaluates large language models' ability to generate novel hypotheses from biomedical literature, using various settings and metrics, and finds that they can produce valid and diverse hypotheses with some limitations.
http://arxiv.org/abs/2407.08939v1
Compressor summary: The paper introduces LightenDiffusion, a diffusion-based unsupervised framework that enhances low-light images using Retinex theory and physically explainable diffusion models, improving visual quality over existing methods.
http://arxiv.org/abs/2407.08937v1
Compressor summary: The text describes a framework that enables large language models to learn from experience and improve their performance on various tasks, demonstrating the feasibility of using LLMs to mimic human experiential learning.
http://arxiv.org/abs/2407.08934v1
Compressor summary: The text explores how vector embeddings in neural networks relate to conditional independence constraints and helps explain structural patterns in data representations.
http://arxiv.org/abs/2407.08933v1
Compressor summary: The authors develop a novel program that combines rule-based and machine learning approaches to identify and adapt to errors or failures in high-volume manufacturing environments.
http://arxiv.org/abs/2407.08932v1
Compressor summary: The paper proposes a simple framework for autonomous vehicles to make decisions in urban environments by incorporating the significance of surrounding vehicles and contextual information using deep attention, reinforcement learning, and an encoder.
http://arxiv.org/abs/2407.08931v1
Compressor summary: The paper introduces GLIS, a lidar-based open-vocabulary detection (OVD) method that combines object-level and scene-level features from lidar point clouds with LLM-based inference, using RPLG and BAOL for refinement, and demonstrates its effectiveness on two datasets.
http://arxiv.org/abs/2407.08922v1
Compressor summary: This paper introduces a benchmark and a novel evaluation metric (c-score) to assess whether large language models understand the underlying physicochemical mechanisms of gold nanoparticle synthesis, suggesting their potential for advancing scientific discovery.
http://arxiv.org/abs/2407.08918v1
Compressor summary: This paper proposes a new framework using complex networks to analyze and improve knowledge transfer in evolutionary many-task optimization (EMaTO).
http://arxiv.org/abs/2407.08916v1
Compressor summary: The study creates a movie recommendation system using machine learning methods like NMF, SVD, and K-Means clustering to improve user experience.
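One of the techniques named here, SVD-based recommendation, can be sketched in a few lines: reconstruct the user-item rating matrix from a rank-k truncated SVD and recommend the unseen items with the highest reconstructed scores. This is a minimal illustration under the assumption of a dense rating matrix, not the study's pipeline.

```python
import numpy as np

def svd_predict(ratings, k=2):
    """Fill a user-item rating matrix with its rank-k truncated SVD
    reconstruction (a minimal sketch of SVD-based recommendation)."""
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

For a user, the items they have not rated are then ranked by their reconstructed scores to produce recommendations.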
http://arxiv.org/abs/2407.08910v1
Compressor summary: PAIL is a novel method that uses imitation learning and adversarial training to optimize industrial operations for carbon neutrality without predefined rewards.
http://arxiv.org/abs/2407.08909v1
Compressor summary: KGpose is a novel framework that uses keypoint graphs to estimate 6D poses of multiple objects from RGB and point cloud features, achieving competitive results for robotic applications.
http://arxiv.org/abs/2407.08908v1
Compressor summary: CHAIR is a human-in-the-loop image retrieval model that lets humans correct intermediate concepts to improve embedding generation, outperforming similar models even without intervention and benefiting further from human guidance.
http://arxiv.org/abs/2407.08906v1
Compressor summary: AirSketch is a method for generating sketches from hand motions without needing expensive hardware or markers, using controllable image diffusion models.
http://arxiv.org/abs/2407.08902v1
Compressor summary: The paper presents an AI algorithm that can accurately detect Autism Spectrum Disorder by analyzing facial and bodily expressions of children during daily activities.
http://arxiv.org/abs/2407.08898v1
Compressor summary: The paper introduces a toolkit for creating interactive AI agents that understand natural language and execute grounded instructions in a Minecraft-like environment, as well as an evaluation platform with human annotators to assess their performance.