This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-04-17, generated by the Compressor, my personal LLM-based project.
http://arxiv.org/abs/2404.10776v1
Compressor summary: The paper proposes a robust algorithm for learning from human feedback in generative models, even when the feedback is adversarial and manipulative.
http://arxiv.org/abs/2404.10775v1
Compressor summary: The paper proposes a method for multi-agent cooperation using a compositional world model that generates videos from partial observations and enables online planning.
http://arxiv.org/abs/2404.10774v1
Compressor summary: The authors propose a method to train smaller models with low cost that can fact-check LLM outputs by using synthetic data generated with GPT-4.
http://arxiv.org/abs/2404.10772v1
Compressor summary: The paper introduces Gaussian Opacity Fields (GOF), a new method for efficient and high-quality surface reconstruction from 3D Gaussians, using ray-tracing-based volume rendering and marching tetrahedra.
http://arxiv.org/abs/2404.10771v1
Compressor summary: The paper presents Time-Evolving Natural Gradient (TENG), a method that uses neural networks to solve partial differential equations (PDEs) with high accuracy by optimizing variational principles and time integration.
http://arxiv.org/abs/2404.10765v1
Compressor summary: The paper proposes RefFusion, a 3D inpainting method that uses a reference image to enable high-quality synthesis and control over the reconstructed scene, achieving state-of-the-art results for various tasks.
http://arxiv.org/abs/2404.10763v1
Compressor summary: The paper proposes a new diffusion model, LaDiC, for image captioning that leverages a latent space for captions and improves performance without pre-training or extra modules.
http://arxiv.org/abs/2404.10761v1
Compressor summary: TorchSurv is a lightweight Python package that helps create custom deep survival models with PyTorch, especially for complex high-dimensional data.
http://arxiv.org/abs/2404.10760v1
Compressor summary: This work introduces a large-scale COCO-AD dataset for anomaly detection, new evaluation metrics, and an effective InvAD framework for reconstruction-based methods.
http://arxiv.org/abs/2404.10759v1
Compressor summary: The paper explores binary hyperdimensional computing, introduces a new encoding method called Laplace-HDC that improves accuracy, and discusses its limitations and potential solutions for image processing.
http://arxiv.org/abs/2404.10758v1
Compressor summary: This paper proposes a framework for evaluating and improving selective retrieval strategies in continual learning using replay buffers.
http://arxiv.org/abs/2404.10745v1
Compressor summary: The paper proposes a new RL algorithm, Cert-LSVI-UCB, that achieves constant regret guarantees for linear MDPs with misspecified transition kernels and rewards, and provides novel analysis techniques.
http://arxiv.org/abs/2404.10740v1
Compressor summary: The paper introduces N-agent ad hoc teamwork, a new multi-agent reinforcement learning problem, and proposes POAM, an algorithm that learns representations of teammate behaviors for cooperative task adaptation.
http://arxiv.org/abs/2404.10733v1
Compressor summary: BLR-HAC is a method that combines offline data and online logistic regression to initialize and update policies for human-agent collaboration tasks.
http://arxiv.org/abs/2404.10731v1
Compressor summary: The paper seeks a common definition of AGI, characterizing it as adaptation to open environments with limited resources using intelligent principles.
http://arxiv.org/abs/2404.10730v1
Compressor summary: The paper explores IPUs as accelerators for ML in materials science and battery research, using a CNN model for predicting effective conductivity with comparable performance to GPUs.
http://arxiv.org/abs/2404.10719v1
Compressor summary: The paper compares reward-based (PPO) and reward-free (DPO) methods for aligning large language models with human preferences using theoretical, empirical, and benchmarking studies.
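As a hedged aside, the reward-free DPO objective this paper compares against PPO has a standard closed form that can be sketched in a few lines; the log-probability values below are illustrative toys, not from the paper:

```python
import math

# DPO loss: -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l))),
# where logp_* are policy log-probs of the chosen (w) and rejected (l)
# responses and ref_* are reference-model log-probs.
def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    margin = (logp_w - ref_w) - (logp_l - ref_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss drops below log(2).
loss = dpo_loss(logp_w=-10.0, logp_l=-12.0, ref_w=-11.0, ref_l=-11.0)
```

When policy and reference agree exactly, the margin is zero and the loss equals log(2).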
http://arxiv.org/abs/2404.10718v1
Compressor summary: The proposed GazeHTA method detects multiple head-target associations in a scene using a pre-trained diffusion model, enhanced head features, and a connection map for improved gaze target detection.
http://arxiv.org/abs/2404.10717v1
Compressor summary: MPCL is a semi-supervised medical image segmentation method that uses mixed prototypes to improve class embeddings and achieve better performance than existing methods.
http://arxiv.org/abs/2404.10716v1
Compressor summary: MOWA is a novel image warping model that can handle multiple tasks in one single model and generalize well to different scenarios.
http://arxiv.org/abs/2404.10713v1
Compressor summary: This paper explores augmented reality techniques for medical surgeries, focusing on a new approach to ventriculoperitoneal shunt operations using 3D models and the Microsoft HoloLens 2.
http://arxiv.org/abs/2404.10710v1
Compressor summary: The paper introduces a pre-training framework that combines visual and textual data to improve pixel-based language models.
http://arxiv.org/abs/2404.10704v1
Compressor summary: The text explores automated methods for ranking multiple-choice questions by difficulty in English learning tests, comparing task-transfer and zero-shot approaches and finding that zero-shot comparative assessment is more effective than other methods.
http://arxiv.org/abs/2404.10699v1
Compressor summary: ECLAIR is a large outdoor LiDAR dataset with 11 object categories for research in point cloud semantic segmentation.
http://arxiv.org/abs/2404.10696v1
Compressor summary: Our method improves coreference and bridging resolution in chemical patents by using external knowledge in a multi-task learning model.
http://arxiv.org/abs/2404.10690v1
Compressor summary: MathWriting is the largest dataset of handwritten mathematical expressions, containing 630k samples that can be used for offline recognition and benchmarking.
http://arxiv.org/abs/2404.10689v1
Compressor summary: The authors propose an automated method to optimize neural network models for X-ray and electron microscopy using hyperparameter and architecture search, achieving improved performance and reduced resource usage.
http://arxiv.org/abs/2404.10688v1
Compressor summary: The paper proposes Efficient Conditional Diffusion Model with Probability Flow Sampling, a fast and high-quality image super-resolution method that uses a continuous-time conditional diffusion model and a hybrid parametrization for the denoiser network.
http://arxiv.org/abs/2404.10685v1
Compressor summary: TeSMo is a method that generates realistic and diverse human-object interactions in different scenes using denoising diffusion models and detailed scene information.
http://arxiv.org/abs/2404.10684v1
Compressor summary: The paper proposes a new heuristic (DDS) and model (stochastic neural network with random activations) to predict how ridesharing drivers make decisions as they experience fatigue and cognitive decline during their shifts, outperforming existing methods in simulations and real data.
http://arxiv.org/abs/2404.10683v1
Compressor summary: The paper proposes a new reinforcement learning method, CAOSD, to optimize portfolios with allocation constraints, such as investing in green technologies while limiting fossil energy sector exposure, and shows it outperforms existing methods on real-world data.
http://arxiv.org/abs/2404.10681v1
Compressor summary: StyleCity is a system that stylizes large-scale urban scenes using vision and text, generating harmonious backgrounds and enhancing semantics consistency for virtual production prototyping.
http://arxiv.org/abs/2404.10667v1
Compressor summary: VASA is a framework that generates realistic talking faces from images and audio, with high quality and fast performance, enabling engaging avatar interactions.
http://arxiv.org/abs/2404.10664v1
Compressor summary: The study proposes a novel method for defect detection in noisy images using deep learning models and denoising techniques, achieving significant improvements in accuracy compared to previous methods.
http://arxiv.org/abs/2404.10662v1
Compressor summary: The paper proposes a dual generative replay framework for continual offline reinforcement learning that retains previous knowledge and mitigates forgetting by replaying high-fidelity samples of past tasks.
http://arxiv.org/abs/2404.10652v1
Compressor summary: The authors introduce ViTextVQA, a Vietnamese dataset for visual question answering that focuses on understanding text in images, and improve model performance by studying the order in which OCR tokens are processed.
http://arxiv.org/abs/2404.10646v1
Compressor summary: The paper explores how sharing parking spot availability data within a vehicle fleet can help drivers find free spots faster in smart cities.
http://arxiv.org/abs/2404.10645v1
Compressor summary: Distributed Distributional DrQ is a model-free RL algorithm that uses distributional value functions and distributed actor policies to improve performance in continuous control tasks.
http://arxiv.org/abs/2404.10642v1
Compressor summary: The authors study how self-play in an adversarial language game called SPAG can improve large language models' reasoning ability on various benchmarks.
http://arxiv.org/abs/2404.10633v1
Compressor summary: Contextrast is a semantic segmentation method that uses contrastive learning to capture local/global contexts and their relationships, improving performance on various datasets.
http://arxiv.org/abs/2404.10630v1
Compressor summary: The paper presents HLAT, a large language model pre-trained on AWS Trainium using Neuron Distributed Training Library (NDTL), achieving comparable performance to baseline models trained on GPUs and TPUs.
http://arxiv.org/abs/2404.10626v1
Compressor summary: We propose and test simple methods to adapt a trained UNet for canopy cover and height estimation across different geographic settings using remotely sensed data, achieving better results than baselines and existing approaches.
http://arxiv.org/abs/2404.10625v1
Compressor summary: The paper proposes a method to combine NeRF-based 3D GANs with 3D Gaussian Splatting for efficient rendering quality and real-time editing.
http://arxiv.org/abs/2404.10620v1
Compressor summary: PyTorchGeoNodes is a module for 3D object reconstruction from images using shape programs, allowing for semantic reasoning and optimization.
http://arxiv.org/abs/2404.10618v1
Compressor summary: The text discusses the privacy risks posed by large language and multimodal vision-language models that can accurately infer personal attributes from benign images posted online.
http://arxiv.org/abs/2404.10603v1
Compressor summary: CorrespondentDream is an effective method to use cross-view correspondences from diffusion U-Net as additional 3D prior for NeRF models, improving their geometry and coherence with common sense.
http://arxiv.org/abs/2404.10600v1
Compressor summary: The study proposes and evaluates an intra-operative tumour margin evaluation scheme using specimen mammography, deep learning, and image thresholding to reduce the risk of local recurrences after breast retention surgery.
http://arxiv.org/abs/2404.10595v1
Compressor summary: CODA-LM is a vision-language benchmark for self-driving that evaluates large language models' abilities in interpretable autonomous driving scenarios, especially challenging road corner cases.
http://arxiv.org/abs/2404.10588v1
Compressor summary: The study uses diffusion models to analyze how robust classifiers handle semantically altered data and finds that they struggle with low-norm counterfactual examples, suggesting a link between non-robustness and semantic features.
http://arxiv.org/abs/2404.10584v1
Compressor summary: The paper introduces a new dataset, ReWiTe, that uses a hardware setup with two cellphones to capture authentic wide-angle and telephoto images for training deep learning methods in dual camera system fusion tasks, improving performance over existing synthetic datasets.
http://arxiv.org/abs/2404.10579v1
Compressor summary: This paper explores how Augmented Reality (AR) technology can improve remote work and online education by analyzing its features, advantages, challenges, scientific basis, technical support, performance, influencing factors, and future trends.
http://arxiv.org/abs/2404.10575v1
Compressor summary: EMC^2 is an efficient Markov Chain Monte Carlo method for generating negative samples in contrastive learning, which achieves low computation and memory cost, global convergence, and competitive performance.
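For readers unfamiliar with the underlying machinery, MCMC-based negative sampling can be sketched generically with an independence Metropolis-Hastings chain; the names and the toy score below are illustrative assumptions, not the paper's EMC^2 algorithm:

```python
import random

random.seed(0)

def score(item):
    # Toy unnormalized sampling weight; in contrastive learning this would
    # come from embedding similarities.
    return float(item + 1)

def mh_negative_samples(candidates, n_steps=10_000):
    # Independence sampler with a uniform proposal: with a symmetric-in-q
    # proposal, the acceptance ratio reduces to score(proposal)/score(current),
    # so the chain's stationary distribution is proportional to score().
    current = candidates[0]
    samples = []
    for _ in range(n_steps):
        proposal = random.choice(candidates)
        accept = min(1.0, score(proposal) / score(current))
        if random.random() < accept:
            current = proposal
        samples.append(current)
    return samples

samples = mh_negative_samples([0, 1, 2, 3])
counts = [samples.count(i) for i in range(4)]
# Higher-scoring items are sampled more often.
```

This avoids ever computing the normalizing constant over all candidates, which is the usual motivation for MCMC sampling in large candidate pools.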
http://arxiv.org/abs/2404.10574v1
Compressor summary: This paper proposes a novel approach for source-free open-set domain adaptation that improves target-private sample segregation and robustness using clustering, uncertainty-based selection, and a new contrastive loss.
http://arxiv.org/abs/2404.10573v1
Compressor summary: The study presents an end-to-end diffusion model that generates capsid sequences with enhanced viability for rAAV gene-therapy vectors, outperforming traditional methods and producing viable sequences even without AAV9 capsid data, thereby improving specificity and transduction efficiency in gene therapy applications.
http://arxiv.org/abs/2404.10572v1
Compressor summary: Label merge-and-split reduces the number of labels for whole brain parcellation, improves accuracy and efficiency, and can be used in other semantic segmentation tasks.
http://arxiv.org/abs/2404.10571v1
Compressor summary: The paper proposes CMU-Flownet, a model that handles occlusions in LiDAR data by using a Correlation Matrix to estimate point similarity and an Occlusion-aware Cost Volume mechanism for better flow estimation.
http://arxiv.org/abs/2404.10561v1
Compressor summary: The paper proposes a novel deep learning method (HiGraphDTI) for predicting drug-target interactions that leverages hierarchical graph representations to capture chemical information from atoms, motifs, and molecules, as well as an attentional feature fusion module and a hierarchical attention mechanism for interpreting interaction mechanisms.
http://arxiv.org/abs/2404.10555v1
Compressor summary: The study developed a Japanese financial-specific large language model by continually pre-training it with custom datasets, improving its performance and quality of outputs compared to the original model.
http://arxiv.org/abs/2404.10552v1
Compressor summary: The text warns that large language models without ethical alignment can execute malicious instructions, posing significant risks and demanding improved security measures.
http://arxiv.org/abs/2404.10551v1
Compressor summary: The paper investigates how generative AI models like ChatGPT influence higher education, exploring benefits, drawbacks, and transformative changes through a survey and scenario analysis of students' perspectives and attitudes.
http://arxiv.org/abs/2404.10550v1
Compressor summary: The paper presents a method to approximate the gradient of the Evidence Lower Bound in Bayesian networks with clutter problems using the reparameterization trick and local likelihood factor approximations, which is faster and more accurate than classical methods.
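The reparameterization trick the paper builds on is standard and can be sketched minimally: to differentiate an expectation under N(mu, sigma^2), write z = mu + sigma * eps with eps ~ N(0, 1) and differentiate through the samples. The toy integrand f(z) = z^2 and all names below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_mu_reparam(mu, sigma, n_samples=100_000):
    # z = mu + sigma * eps makes the sample a deterministic function of mu,
    # so d f(z)/d mu = f'(z) * dz/dmu = 2z * 1 for f(z) = z**2.
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps
    return np.mean(2.0 * z)

# Analytically E[z^2] = mu^2 + sigma^2, so the true gradient w.r.t. mu
# is 2*mu = 3.0 here; the Monte Carlo estimate should land close to it.
est = grad_mu_reparam(mu=1.5, sigma=0.7)
```

The same pathwise estimator is what makes ELBO gradients low-variance compared to score-function (REINFORCE) estimators.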
http://arxiv.org/abs/2404.10547v1
Compressor summary: The paper introduces UNITE, a new method to estimate global average treatment effects in A/B tests with social connections by using only information about neighbors without knowing the exact network structure.
http://arxiv.org/abs/2404.10527v1
Compressor summary: SPVLoc is a global indoor localization method that uses a convolutional network to match a query image with a panoramic semantic layout of the environment and estimate its 6D camera pose.
http://arxiv.org/abs/2404.10518v1
Compressor summary: MobileNetV4 is an efficient and versatile architecture for mobile devices that uses new blocks and search techniques to achieve high accuracy on various accelerators.
http://arxiv.org/abs/2404.10513v1
Compressor summary: The text introduces a new method for improving QA systems by generating more accurate and correct attributions using Chain-of-Thought reasoning.
http://arxiv.org/abs/2404.10512v1
Compressor summary: The text describes a new AI-based system (DDMS) for accurate and efficient convection nowcasting using diffusion processes and geostationary satellite data, which improves forecast lead time, coverage, and resolution compared to existing methods.
http://arxiv.org/abs/2404.10508v1
Compressor summary: The study finds significant social biases in human and AI-generated texts based on gender, race, and intersectional identities, with minority groups experiencing lower levels of agency.
http://arxiv.org/abs/2404.10505v1
Compressor summary: The paper introduces RLKWiC, a new dataset of real-world knowledge work data that can help study and improve knowledge workers' productivity.
http://arxiv.org/abs/2404.10503v1
Compressor summary: The paper explores how different deep learning networks affect sentiment analysis of medical texts using pre-trained models like BERT, and finds that CNN models perform best on smaller datasets.
http://arxiv.org/abs/2404.10501v1
Compressor summary: The paper proposes an unsupervised method for improving vision-language models by generating and aligning responses with augmented image inputs, achieving high scores on complex reasoning tasks without GPT-4 supervision or human involvement.
http://arxiv.org/abs/2404.10500v1
Compressor summary: This paper introduces an Auto-Prompt Graphical Paradigm (APGP) that combines both stimulating and framework prompts to enhance problem-solving capabilities of large language models across multiple domains using automated approaches.
http://arxiv.org/abs/2404.10499v1
Compressor summary: TSSD is a sample selection framework for noisy label learning that uses PSD to generate reliable samples and MSP to mine semi-hard samples from uncertain data, improving network robustness.
http://arxiv.org/abs/2404.10498v1
Compressor summary: LAECIPS is a new edge-cloud framework for vision tasks that achieves high accuracy, low latency, and adapts to dynamic IoT data streams using plug-and-play models and hard input mining strategy.
http://arxiv.org/abs/2404.10490v1
Compressor summary: The study proposes a new sign language teaching model using real-time vision, mixed reality, and improved hand-posture reconstruction to provide an immersive and effective learning experience.
http://arxiv.org/abs/2404.10484v1
Compressor summary: The paper analyzes the cause of the over-reconstruction issue in 3D Gaussian Splatting and proposes a novel homodirectional view-space positional gradient criterion that splits large Gaussians and recovers fine details for better rendering quality.
http://arxiv.org/abs/2404.10483v1
Compressor summary: The paper presents a new AI model that improves reliability on small medical datasets, addressing trust issues in healthcare AI.
http://arxiv.org/abs/2404.10481v1
Compressor summary: BayesJudge is a novel Bayesian approach using deep learning and Gaussian Processes to improve prediction confidence and accuracy for legal tasks.
http://arxiv.org/abs/2404.10476v1
Compressor summary: The paper presents a novel method to efficiently detect faces using optimally configured dispersed Haar-like filters that balance between-class and within-class variance.
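As background, the classic machinery such dispersed Haar-like filters build on is a rectangle difference computed over an integral image; the two-rectangle layout and names below are a generic illustration, not the paper's optimized configuration:

```python
import numpy as np

def integral_image(img):
    # Summed-area table: ii[r, c] = sum of img[:r+1, :c+1].
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Inclusive rectangle sum in O(1) via the summed-area table.
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def haar_two_rect(img, r0, c0, h, w):
    # Left half minus right half of an (h x 2w) window.
    ii = integral_image(img.astype(np.int64))
    left = rect_sum(ii, r0, c0, r0 + h - 1, c0 + w - 1)
    right = rect_sum(ii, r0, c0 + w, r0 + h - 1, c0 + 2 * w - 1)
    return left - right

img = np.zeros((4, 4), dtype=np.int64)
img[:, :2] = 1  # bright left half, dark right half
feat = haar_two_rect(img, 0, 0, h=4, w=2)
```

The constant-time rectangle sums are what make evaluating thousands of such filters per window cheap enough for real-time detection.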
http://arxiv.org/abs/2404.10475v1
Compressor summary: The paper introduces a new data set from video transcripts to train language models for engaging and effective conversational teaching of scientific concepts across different audiences.
http://arxiv.org/abs/2404.10474v1
Compressor summary: The paper proposes a new benchmark for evaluating out-of-distribution detection in deep neural networks, using ImageNet and Places365 datasets and varying the criteria for in-distribution classes.
http://arxiv.org/abs/2404.10464v1
Compressor summary: DeStein is a novel method that reduces toxic outputs from language models by altering their activation space representations using self-induced steering pairs and arithmetic operations, achieving better performance than previous methods with lower resource and time cost.
http://arxiv.org/abs/2404.10458v1
Compressor summary: Patchformer is a novel Transformer-based model that improves long-term multi-energy load forecasting by segmenting data into patches and capturing local and global dependencies.
http://arxiv.org/abs/2404.10457v1
Compressor summary: The text argues that machine learning for protein-protein interactions needs better evaluation strategies, data preparation, and structural similarity-based data splits to avoid overoptimistic evaluations and unfair benchmarking.
http://arxiv.org/abs/2404.10454v1
Compressor summary: The paper proposes an AI-based automatic monitoring system for detecting anticoagulant substances in test tubes, which is competitive with existing models and could improve efficiency and sustainability in the production of plastic consumables.
http://arxiv.org/abs/2404.10450v1
Compressor summary: This paper reviews different graph-based methods for predicting protein-protein interactions, discussing their applications and classifying them into two groups based on model structures.
http://arxiv.org/abs/2404.10445v1
Compressor summary: The paper proposes a method that improves the deployment efficiency of diffusion models on mobile devices, using sparse masks and progressive sparsity training to control the trade-off between FID and MACs, reducing MACs by 50% while maintaining low FID.
http://arxiv.org/abs/2404.10443v1
Compressor summary: AGHINT is a new model for representing heterogeneous information networks, which improves node classification by considering attribute disparities and incorporating higher-order similar neighbor features.
http://arxiv.org/abs/2404.10441v1
Compressor summary: The report describes the winning method for reconstructing 3D objects from few images using Pixel-NeRF, depth supervision, and positional encoding.
http://arxiv.org/abs/2404.10440v1
Compressor summary: The study examines how well second language learners of English can imitate native speakers' pitch variations in a reading task and finds that proficiency affects entrainment differently at individual and group levels.
http://arxiv.org/abs/2404.10438v1
Compressor summary: The paper presents a simple and effective method for pose refinement using pre-trained features, a particle filter, and a renderable scene representation.
http://arxiv.org/abs/2404.10436v1
Compressor summary: The paper proposes a self-aware framework that learns from past trials and errors to accelerate ABC rejection sampling in generative models with obscured likelihood.
http://arxiv.org/abs/2404.10433v1
Compressor summary: The study investigates how deep neural networks learn to classify Alzheimer's patients from normal controls using quantitative maps, and finds that they focus on brain regions near the basal ganglia.
http://arxiv.org/abs/2404.10429v1
Compressor summary: MEEL is a new method to improve machines' ability to understand event relations across different data types by generating evolving graphs and using them for instruction tuning and guiding discrimination.
http://arxiv.org/abs/2404.10420v1
Compressor summary: The study adapts a deep learning model that can accurately classify bird species from acoustic signals and provide interpretable explanations of its decisions using prototypical patterns.
http://arxiv.org/abs/2404.10416v1
Compressor summary: This paper introduces summary candidates into Multi-Document Scientific Summarization (MDSS) to guide the decoding process, improve global information handling, and generate better summaries using a specialized pairwise comparison method and Conditional Variational Autoencoder.
http://arxiv.org/abs/2404.10411v1
Compressor summary: The paper proposes a framework to create efficient video object detection models using self-training and knowledge distillation, and shows that clustering cameras improves accuracy of distilled models.
http://arxiv.org/abs/2404.10408v1
Compressor summary: The paper proposes an SIS architecture with cross-attention to generate realistic and identity-preserving faces using semantic, style, and identity features.
http://arxiv.org/abs/2404.10407v1
Compressor summary: The study compares four techniques to optimize Vision Transformers (ViT) for resource-constrained environments, improving their performance and efficiency.
http://arxiv.org/abs/2404.10405v1
Compressor summary: The paper proposes an enhanced medical image recognition method by combining self-supervised and semi-supervised learning techniques, which improves accuracy when labeled data is limited.
http://arxiv.org/abs/2404.10394v1
Compressor summary: Portrait3D is a novel text-to-3D-portrait generation framework that overcomes geometry issues by using a joint geometry-appearance prior and a pyramid tri-grid 3D representation.
http://arxiv.org/abs/2404.10393v1
Compressor summary: OTTO is a method to improve offline reinforcement learning by using World Transformers to predict dynamics and reward, and generating high-rewarded data simulations from offline data.
http://arxiv.org/abs/2404.10387v1
Compressor summary: The authors explore how combining different explanations from deep learning models can reveal more reliable patterns and improve evaluation of the model's behavior.
http://arxiv.org/abs/2404.10384v1
Compressor summary: To address LLMs' domain-specific evaluation and hallucination problems, the paper proposes a pipeline that uses the LLM to select reasoning paths from a knowledge graph, together with a subgraph retrieval method based on chain-of-thought and PageRank, achieving results comparable to previous state-of-the-art models with fewer LLM calls.
http://arxiv.org/abs/2404.10383v1
Compressor summary: The paper proposes a two-stage method for sign language performance evaluation using pose reconstruction and smoothing methods, providing effective feedback and consistent results with professional assessments.
http://arxiv.org/abs/2404.10378v1
Compressor summary: The paper overviews the 2nd edition of a face recognition challenge that explores using synthetic data to address privacy, bias, and performance issues in face recognition.
http://arxiv.org/abs/2404.10370v1
Compressor summary: The paper analyzes open set recognition methods and proposes a new approach that improves performance by leveraging feature diversity.
http://arxiv.org/abs/2404.10363v1
Compressor summary: This paper discusses the significance of fault diagnosis in marine diesel engines for maritime safety, efficiency, and reliability, focusing on subsystems, common issues, and data-driven methods.
http://arxiv.org/abs/2404.10358v1
Compressor summary: The paper proposes a novel framework called IREANet, which uses optical flow and residual blocks to align and aggregate features from multiple low dynamic range images to restore high quality high dynamic range images.
http://arxiv.org/abs/2404.10357v1
Compressor summary: CoKnow is a framework that improves Prompt Learning for Vision-Language Models by using Multi-Knowledge Representation to enhance context optimization and achieve better performance on various downstream tasks.
http://arxiv.org/abs/2404.10356v1
Compressor summary: The study proposes a novel framework called CDCT to discover decision-relevant concepts from opaque deep learning models, which could improve trust and advance medical research.
http://arxiv.org/abs/2404.10353v1
Compressor summary: The text introduces a new basis for spectral GNNs that incorporates graph information and decouples positive and negative activation, derived from the Positive and Negative Coupling Analysis (PNCA) framework's analysis of message propagation; the resulting GSCNet achieves better or comparable results with less computational time than existing GNNs.
http://arxiv.org/abs/2404.10346v1
Compressor summary: Self-Explore is a method that helps language models improve their reasoning skills by exploring the first mistake in a rationale and using it as feedback for further improvement, achieving significant gains compared to supervised fine-tuning on GSM8K and MATH datasets.
http://arxiv.org/abs/2404.10343v1
Compressor summary: The paper reviews the NTIRE 2024 challenge on efficient single-image super-resolution, focusing on optimization aspects and outcomes, with four sub-tracks to evaluate runtime, FLOPs, parameters, and overall performance.
http://arxiv.org/abs/2404.10342v1
Compressor summary: The text introduces RFIR, a new image restoration challenge in which models must remove specific degradation types from multiply-degraded images according to human commands, and presents a synthetic RFIR dataset and a transformer-based model, TransRFIR, whose two attention modules (MHASA and MHACA) perceive and remove the specified degradations, achieving state-of-the-art performance; the code is released at https://github.com/GuanRunwei/FIR-CP.
http://arxiv.org/abs/2404.10341v1
Compressor summary: The text describes a bridge incident in Norway where a structural defect was detected by Internet of Things sensors and Digital Twin technology, highlighting the benefits of online monitoring and condition-based maintenance for infrastructure management.
http://arxiv.org/abs/2404.10337v1
Compressor summary: The paper proposes two new positional encodings for transformer-based time series forecasting methods and evaluates their performance in a dual-branch framework.
http://arxiv.org/abs/2404.10335v1
Compressor summary: AdvDiffVLM generates natural and effective adversarial examples for large visual-language models using diffusion models and GradCAM-guided Mask method, improving speed and robustness compared to existing methods.
http://arxiv.org/abs/2404.10332v1
Compressor summary: The paper proposes DFTG, a framework that generates targeted instruction data for different large vision-language models to address their specific hallucination issues and improve their performance on cross-modal tasks.
http://arxiv.org/abs/2404.10329v1
Compressor summary: The paper explores using large language models to automate the complex process of aligning ontologies in the Semantic Web, which is currently done manually by experts.
http://arxiv.org/abs/2404.10324v1
Compressor summary: The text proposes a graph neural network (GNN)-based surrogate model for predicting hydraulic states in urban drainage networks, which incorporates physical constraints and outperforms a fully-connected neural network (NN) model in accuracy and cost-effectiveness.
http://arxiv.org/abs/2404.10322v1
Compressor summary: Our method trains a small adapter to rectify diverse target domain styles to the source domain for better few-shot semantic segmentation, using local-global style perturbations and cyclic domain alignment.
http://arxiv.org/abs/2404.10320v1
Compressor summary: The authors present a new dataset for wind turbine anomaly detection with detailed fault information, and propose a CARE scoring method to evaluate anomaly detection models.
http://arxiv.org/abs/2404.10319v1
Compressor summary: The authors propose methods to improve cell counting in moving streams by adapting training and decision making processes.
http://arxiv.org/abs/2404.10318v1
Compressor summary: The paper proposes SRGS, a method that improves 3D Gaussian Splatting for high-resolution novel view synthesis by densifying and learning texture features from low-resolution inputs using sub-pixel constraints and a pre-trained 2D super-resolution model.
http://arxiv.org/abs/2404.10317v1
Compressor summary: The text introduces LLMs4OM, a novel approach that uses large language models for ontology matching tasks and shows they can outperform traditional methods in data integration.
http://arxiv.org/abs/2404.10315v1
Compressor summary: The paper proposes LePe, a method to improve confidence expression in large language models by learning from past experience, addressing key problems and designing a complete pipeline for data preparation and sampling.
http://arxiv.org/abs/2404.10314v1
Compressor summary: The paper proposes an uncertainty-aware AI model that estimates the uncertainties of its own predictions and uses them for data augmentation and multimodal optimization.
http://arxiv.org/abs/2404.10312v1
Compressor summary: The OmniSSR method uses Stable Diffusion and tangent projection to achieve high-resolution omnidirectional image super-resolution with fidelity and realness, without training or fine-tuning.
http://arxiv.org/abs/2404.10308v1
Compressor summary: HOMER is a new method that divides long inputs into smaller chunks, processes them collectively, and merges them using a hierarchical strategy to overcome the context limit of large language models without requiring training or expensive modifications.
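The divide-then-merge idea behind HOMER can be illustrated with a minimal sketch. This is not the paper's implementation: `embed_chunk` is a hypothetical stand-in for a transformer encoding of one chunk, and the learned merge is approximated by simple averaging; only the chunk-split-then-hierarchical-merge structure reflects the summary above.

```python
import numpy as np

def embed_chunk(chunk_tokens, dim=8):
    """Hypothetical stand-in for encoding one chunk with a model
    whose context window fits the chunk (deterministic per chunk)."""
    rng = np.random.default_rng(abs(hash(tuple(chunk_tokens))) % (2**32))
    return rng.standard_normal(dim)

def merge(a, b):
    """Merge two chunk representations; HOMER uses a learned merge,
    approximated here by averaging purely for illustration."""
    return (a + b) / 2.0

def hierarchical_merge(tokens, chunk_size=4):
    # 1. Split the long input into fixed-size chunks.
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    # 2. Encode each chunk independently.
    reps = [embed_chunk(c) for c in chunks]
    # 3. Merge adjacent representations level by level until one remains.
    while len(reps) > 1:
        nxt = [merge(reps[i], reps[i + 1]) for i in range(0, len(reps) - 1, 2)]
        if len(reps) % 2 == 1:  # carry an unpaired tail to the next level
            nxt.append(reps[-1])
        reps = nxt
    return reps[0]

vec = hierarchical_merge(list(range(10)), chunk_size=4)
```

Each level halves the number of representations, so an input far beyond the context limit is reduced in O(log n) merge rounds without retraining the base model.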
http://arxiv.org/abs/2404.10307v1
Compressor summary: The paper presents a method that uses SegGPT and learnable prompts for few-shot segmentation, addressing catastrophic forgetting, object sizes, and discontinuities, with image similarity search for inference.
http://arxiv.org/abs/2404.10306v1
Compressor summary: CoFiTune is a framework that balances speciality and versatility in aligned large language models by updating specific modules and using soft-masking, achieving better performance across diverse tasks.
http://arxiv.org/abs/2404.10305v1
Compressor summary: The research proposes an end-to-end deep learning pipeline for recognizing tables in document images, improving accuracy and efficiency over existing methods.
http://arxiv.org/abs/2404.10299v1
Compressor summary: This study developed a machine learning model that uses sleep sounds to accurately assess sleep quality and identifies the specific sound events and timing that affect individual sleep satisfaction.
http://arxiv.org/abs/2404.10297v1
Compressor summary: The text introduces future language modeling, a task of predicting future texts based on their temporal history, which can be useful for various human activities and improve upon existing non-temporal language models.
http://arxiv.org/abs/2404.10296v1
Compressor summary: The interpolating neural network (INN) is a new AI approach that uses interpolation points in physical space, reducing parameters, increasing accuracy, and addressing data challenges.
http://arxiv.org/abs/2404.10292v1
Compressor summary: The paper proposes a Filtering-WoRA method to efficiently train person search models using synthetic data with minimal but effective data samples and fine-tuning, improving retrieval performance and reducing training time.
http://arxiv.org/abs/2404.10282v1
Compressor summary: Tripod is a neural network autoencoder with three complementary inductive biases that improve disentangled representation learning, achieving state-of-the-art results on image benchmarks.
http://arxiv.org/abs/2404.10279v1
Compressor summary: EucliDreamer is a method that generates realistic and diverse textures for 3D models based on text prompts using a depth-conditioned Stable Diffusion model, achieving superior quality and faster convergence than existing methods.
http://arxiv.org/abs/2404.10275v1
Compressor summary: The paper proposes a new gradient-based method for optimizing profit margins in insurance markets that directly integrates fairness criteria into pricing, addressing challenges faced by traditional methods.
http://arxiv.org/abs/2404.10274v1
Compressor summary: The study presents a new method that combines UMAP and LASSO to improve predictions of soil fertility from imbalanced datasets, achieving high accuracy and interpretability.
http://arxiv.org/abs/2404.10272v1
Compressor summary: The text introduces two techniques to improve the efficiency of ray-tracing in Occupancy Grid for Neural Radiance Field by using VDB grids and hierarchical digital differential analyzer.
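The digital differential analyzer (DDA) mentioned above is the classic grid-traversal step for ray tracing. A minimal 2D sketch of that step follows; the paper applies a hierarchical variant to occupancy grids backed by VDB structures, which this toy version does not model.

```python
import math

def dda_traverse(origin, direction, grid_size, max_steps=64):
    """Visit grid cells along a ray using DDA stepping:
    at each step, advance to whichever cell boundary (x or y)
    the ray crosses first. Simplified 2D illustration only."""
    x, y = int(origin[0]), int(origin[1])
    dx, dy = direction
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    # Ray parameter t at the next vertical / horizontal cell boundary.
    t_max_x = ((x + (step_x > 0)) - origin[0]) / dx if dx != 0 else math.inf
    t_max_y = ((y + (step_y > 0)) - origin[1]) / dy if dy != 0 else math.inf
    # Increment of t needed to cross one full cell in each axis.
    t_delta_x = abs(1 / dx) if dx != 0 else math.inf
    t_delta_y = abs(1 / dy) if dy != 0 else math.inf
    cells = []
    for _ in range(max_steps):
        if not (0 <= x < grid_size and 0 <= y < grid_size):
            break
        cells.append((x, y))
        if t_max_x < t_max_y:
            t_max_x += t_delta_x
            x += step_x
        else:
            t_max_y += t_delta_y
            y += step_y
    return cells

cells = dda_traverse((0.5, 0.5), (1.0, 0.5), grid_size=4)
```

Because each step moves exactly one cell, every visited cell is adjacent to the previous one, which is what makes DDA traversal cheap to accelerate with hierarchical or sparse grid structures.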
http://arxiv.org/abs/2404.10271v1
Compressor summary: The paper explores how social choice theory can help address ethical and safety challenges in fine-tuning foundation models like GPT-4 based on human preferences and principles.
http://arxiv.org/abs/2404.10268v1
Compressor summary: The paper presents neuro-symbolic goal summarizer and dialogue generation models for health coaching that assist patients in setting and achieving physical activity goals, improve over previous methods, and introduce a new dataset and metric for evaluating patient responses.
http://arxiv.org/abs/2404.10267v1
Compressor summary: OneActor is a novel method that generates consistent and high-quality images from text using cluster-conditioned diffusion models, without relying on external data or expensive tuning.
http://arxiv.org/abs/2404.10263v1
Compressor summary: PreGSU is a generalized pre-trained scene understanding model for autonomous driving that learns universal interactions using graph attention networks and self-supervised tasks.
http://arxiv.org/abs/2404.10259v1
Compressor summary: The authors propose a method using large language models to automatically discover arguments related to specific themes in social media discussions, reducing the need for manual coding and human intervention.
http://arxiv.org/abs/2404.10242v1
Compressor summary: This paper compares weakly supervised classifiers and self-supervised masked autoencoders (MAEs) for featurizing microscopy images, showing that MAEs perform better and introducing a new channel-agnostic MAE architecture that generalizes well across different data.
http://arxiv.org/abs/2404.10241v1
Compressor summary: The paper introduces GOAT, a VLN model that uses causal inference and feature pooling to reduce dataset bias and improve performance on multiple VLN tasks.
http://arxiv.org/abs/2404.10237v1
Compressor summary: MoE-TinyMed is a low-parameter model for medical visual question answering that performs better than existing models with fewer resources.
http://arxiv.org/abs/2404.10234v1
Compressor summary: The paper proposes a framework that combines AI-native multi-modal search with neural image compression, improving storage and retrieval efficiency for large multimedia datasets.
http://arxiv.org/abs/2404.10229v1
Compressor summary: LLM-Stega is a black-box generative text steganography method that uses large language models' user interfaces to securely communicate secret messages with rich semantics.
http://arxiv.org/abs/2404.10228v1
Compressor summary: The paper proposes a two-stage stance labeling method built on user-hashtag and user-user graphs, combining label propagation, semi-supervised learning, and graph neural networks, and outperforming zero-shot LLM stance labeling on climate change and gun control tweets.
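The label-propagation stage over a user-hashtag graph can be sketched as follows. This is a minimal toy version under assumed names (the edges, seed labels, and the simple neighbor-averaging update are all illustrative); the paper's full pipeline adds semi-supervised learning and graph neural networks on top.

```python
def propagate_labels(edges, seed_labels, iterations=10):
    """Label propagation on an undirected (bipartite user-hashtag) graph:
    each unlabeled node's stance score becomes the mean of its
    neighbors' scores; seed nodes stay clamped to their labels."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    scores = {n: seed_labels.get(n, 0.0) for n in adj}
    for _ in range(iterations):
        new = {}
        for n, nbrs in adj.items():
            if n in seed_labels:  # keep seed users clamped
                new[n] = seed_labels[n]
            else:
                new[n] = sum(scores[m] for m in nbrs) / len(nbrs)
        scores = new
    return scores

# Hypothetical tiny example: two seed users with known stances.
edges = [("u1", "#climatehoax"), ("u2", "#climatehoax"),
         ("u2", "#actnow"), ("u3", "#actnow")]
seeds = {"u1": -1.0, "u3": 1.0}  # -1 = denial stance, +1 = pro-action
scores = propagate_labels(edges, seeds)
```

After propagation, hashtags inherit scores from the users who post them, and unlabeled users inherit scores from the hashtags they use, spreading the seed stances across the graph.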
http://arxiv.org/abs/2404.10227v1
Compressor summary: The paper introduces a new model for realistic hand motion analysis that combines a musculoskeletal system with a parametric hand model, and a pose refinement framework that uses a neural network to improve the estimated hand pose.
http://arxiv.org/abs/2404.10226v1
Compressor summary: The text analyzes how to improve visual question answering by using large language models and external knowledge bases, focusing on the impact of explicit supervised retrieval and multi-hop reasoning.
http://arxiv.org/abs/2404.10213v1
Compressor summary: The text proposes a new gait recognition network, GaitPoint+, that uses both silhouette and skeleton features for more robust recognition, with a lightweight and fast key point learning module and a recycling max-pooling method to improve accuracy.
http://arxiv.org/abs/2404.10211v1
Compressor summary: The paper proposes a Transformer autoencoder-based method for detecting and correcting business process anomalies without setting thresholds, outperforming previous methods in accuracy and efficiency.