This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-06 generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.03312v1
Compressor summary: The paper proposes a method for adapting depth completion models in real-time without needing source data, which improves performance by 21.1% on average.
http://arxiv.org/abs/2402.03311v1
Compressor summary: HASSOD is a novel self-supervised object detection method that adapts to the number of objects per image and understands their compositions, achieving better performance and interpretability than existing methods.
http://arxiv.org/abs/2402.03310v1
Compressor summary: V-IRL is a platform that lets AI agents interact with a virtual but realistic version of the physical world to improve their abilities in perception, decision-making, and interaction.
http://arxiv.org/abs/2402.03309v1
Compressor summary: The text describes a new method (AONeuS) that uses acoustic-optical neural fusion to create high-resolution 3D underwater scenes from limited data, improving on existing RGB and sonar methods.
http://arxiv.org/abs/2402.03307v1
Compressor summary: The paper proposes a novel method for synthesizing views in dynamic scenes using anisotropic 4D Gaussians that can capture complex motion dynamics and achieve real-time rendering speeds.
http://arxiv.org/abs/2402.03305v1
Compressor summary: The text explains how diffusion models learn to generate 2D Gaussian bumps by traversing three phases of latent representations and demonstrates that they cannot factorize localization in x and y positions, suggesting the need for better inductive biases.
http://arxiv.org/abs/2402.03303v1
Compressor summary: The paper benchmarks different-sized LLMs on instruction following in conflicting situations and finds that larger models perform better, while instruction following conflicts with safety filters or guidelines.
http://arxiv.org/abs/2402.03302v1
Compressor summary: The paper introduces a new Mamba-based model, Swin-UMamba, for medical image segmentation that leverages ImageNet pretraining to achieve superior performance over existing methods.
http://arxiv.org/abs/2402.03300v1
Compressor summary: DeepSeekMath 7B is a language model pre-trained with math-related web data and optimized for mathematical reasoning using GRPO, achieving high scores on the MATH benchmark without external tools.
http://arxiv.org/abs/2402.03299v1
Compressor summary: Key points:
- The paper proposes a system (GUARD) to generate jailbreaks for testing LLMs' safety and ethical behavior
- GUARD uses a role-playing method, a knowledge graph, and a guideline-following setting
- GUARD is tested on various LLMs across different modalities and shows effectiveness
Summary: The paper introduces GUARD, a system that generates jailbreaks for testing the safety and ethics of LLMs using roles, a knowledge graph, and guidelines. It demonstrates its performance on different LLMs and modalities.
http://arxiv.org/abs/2402.03295v1
Compressor summary: Ginger is a method to improve second-order optimization in deep learning by efficiently computing the inverse of the generalized Gauss-Newton matrix.
http://arxiv.org/abs/2402.03293v1
Compressor summary: Flora uses random projections to enable high-rank weight updates for neural networks, reducing memory usage without compromising performance.
http://arxiv.org/abs/2402.03292v1
Compressor summary: RONIN is a method for detecting out-of-distribution objects by using a diffusion model to inpaint the object with the predicted in-distribution label, making it easier to distinguish between in-distribution and out-of-distribution samples.
http://arxiv.org/abs/2402.03290v1
Compressor summary: InstanceDiffusion is a text-to-image model that allows precise control over individual objects in an image using various location methods and outperforms existing state-of-the-art models.
http://arxiv.org/abs/2402.03289v1
Compressor summary: The paper proposes an automated transformer decoding algorithm that uses Monte Carlo tree-search for generating RTL code with improved PPA efficiency and correctness.
http://arxiv.org/abs/2402.03287v1
Compressor summary: The Lennard-Jones layer (LJL) is a method to equalize the density of 2D and 3D point clouds by simulating interactions between points and adjusting their distribution without retraining neural networks.
http://arxiv.org/abs/2402.03286v1
Compressor summary: ConsiStory is a training-free method for consistent subject generation in text-to-image models using shared activations, subject-driven attention, and feature injection.
http://arxiv.org/abs/2402.03284v1
Compressor summary: FortUne Dial is a task that evaluates language models' ability to represent uncertainty in conversations, using two types of uncertainty representations and fine-tuning strategies to improve calibration.
http://arxiv.org/abs/2402.03282v1
Compressor summary: The paper proposes a new model for reinforcement learning with human feedback (RLHF) that considers partially observed rewards and different types of feedback, and presents efficient algorithms and generalizations based on this model.
http://arxiv.org/abs/2402.03271v1
Compressor summary: UoT is a new algorithm that helps language models ask better questions to seek information and solve tasks more effectively in uncertain situations.
http://arxiv.org/abs/2402.03270v1
Compressor summary: The text discusses how machine learning techniques can improve intrusion detection systems in IoT networks using a dataset with MQTT protocol frames under attack.
http://arxiv.org/abs/2402.03268v1
Compressor summary: Pre-trained language models reason by aggregating indirect reasoning paths from knowledge graphs and math word problems, which can be improved by augmenting unlabeled random walk reasoning paths.
http://arxiv.org/abs/2402.03264v1
Compressor summary: MobilityGPT is a geospatially-aware generative model that uses GPT to create realistic human mobility trajectories with controllable generation, semantic sequence similarity, and road connectivity constraints.
http://arxiv.org/abs/2402.03256v1
Compressor summary: PG losses are new decision-aware surrogate losses that approximate the downstream loss and perform well in misspecified settings with non-central symmetric noise.
http://arxiv.org/abs/2402.03252v1
Compressor summary: The paper proposes a fair ranking method that minimizes the error in groups of items using pairwise comparisons and an oracle, adapting to different fairness preferences by adjusting parameters.
http://arxiv.org/abs/2402.03251v1
Compressor summary: The paper adapts CLIP for monocular depth estimation by jointly training a compact decoder and a tiny embedding matrix named mirror, improving its performance without fine-tuning and refining CLIP's prior knowledge.
http://arxiv.org/abs/2402.03246v1
Compressor summary: SGS-SLAM is a novel system that combines semantic understanding with 3D Gaussian representations for accurate scene interpretation and high-quality visualizations in real time.
http://arxiv.org/abs/2402.03244v1
Compressor summary: Skill Set Optimization (SSO) is a method to improve LLM actor performance by constructing and refining sets of transferable skills using subgoals and instructions.
http://arxiv.org/abs/2402.03243v1
Compressor summary: PINN-BO uses Physics-Informed Neural Networks with Partial Differential Equations to improve black-box optimization efficiency and sample quality.
http://arxiv.org/abs/2402.03242v1
Compressor summary: JobSkape is a framework for generating synthetic job postings to improve skill-to-taxonomy matching using large language models.
http://arxiv.org/abs/2402.03241v1
Compressor summary: FROSTER is a framework that uses residual feature distillation to adapt CLIP for open-vocabulary action recognition while preserving its generalization capability.
http://arxiv.org/abs/2402.03235v1
Compressor summary: The paper proposes ActiveAnno3D, an active learning framework for 3D object detection that selects informative data samples for labeling and minimizes labeling costs.
http://arxiv.org/abs/2402.03232v1
Compressor summary: The paper derives a new loss and algorithm for training vector field models that improves over standard Conditional Flow Matching with smaller variance and better learning results.
http://arxiv.org/abs/2402.03230v1
Compressor summary: Key points:
- The text introduces a benchmark study for different 3D U-shaped models in medical image segmentation for thoracic surgery planning
- It compares the impact of attention mechanisms, resolution stages, and network configurations on accuracy and complexity
- STUNet performs best among the models tested
Summary: The text summarizes a benchmark study that evaluates various 3D U-shaped deep learning models for segmenting thoracic anatomy from CT scans and compares their performance and features.
http://arxiv.org/abs/2402.03227v1
Compressor summary: IGUANe is a 3D deep learning model that harmonizes brain MR images from multiple sites by integrating an arbitrary number of domains and preserving individual information related to age and Alzheimer's disease.
http://arxiv.org/abs/2402.03226v1
Compressor summary: FuseMoE is a novel framework that combines different types of data and handles missing information to improve machine learning models' performance in various tasks, especially in medical settings.
http://arxiv.org/abs/2402.03223v1
Compressor summary: The paper explores the best language for prompting emotion labels on non-English texts using multilingual large language models, and finds that English prompts are consistently better than the target language.
http://arxiv.org/abs/2402.03221v1
Compressor summary: The paper proposes a joint embedding model for detecting different types of offensive speech on social media using limited data and showing promising results.
http://arxiv.org/abs/2402.03216v1
Compressor summary: The paper introduces M3-Embedding, a versatile model that supports multi-lingual and cross-lingual information retrieval across various retrieval functionalities, with new state-of-the-art results and novel training techniques.
http://arxiv.org/abs/2402.03214v1
Compressor summary: The paper evaluates different methods for distinguishing AI-generated images from human art across various settings and scenarios, finding that a hybrid approach combining both human and automated detectors is most effective.
http://arxiv.org/abs/2402.03207v1
Compressor summary: The paper proposes an optimal Schrödinger bridge matching procedure that recovers the SB process with a single step and arbitrary transport plan, and relates it to energy-based modeling objectives.
http://arxiv.org/abs/2402.03201v1
Compressor summary: The paper proposes Diffusion with Spherical Gaussian constraint (DSG), a method that improves conditional diffusion models by constraining guidance steps within the data manifold, leading to better sample quality and faster sampling processes.
http://arxiv.org/abs/2402.03191v1
Compressor summary: The paper argues that isotropy in embedding spaces, a property that has been debated recently, cannot coexist with clustered data, which also harms linear classification tasks.
http://arxiv.org/abs/2402.03190v1
Compressor summary: The paper introduces MHaluBench, a benchmark for evaluating multimodal hallucination detection methods, and UNIHD, a framework that uses auxiliary tools to detect hallucinations in large language models.
http://arxiv.org/abs/2402.03188v1
Compressor summary: The paper proposes a new loss equation for face swapping that improves the realism of the eyes, reducing uncanny valley effects and making it harder to detect deepfakes.
http://arxiv.org/abs/2402.03187v1
Compressor summary: The study shows that connected ensembles with more interaction negatively affect performance but can be improved by re-discovering multi-basin deep ensembles through distillation.
http://arxiv.org/abs/2402.03182v1
Compressor summary: The text reviews methods that use large language models (LLMs) for time series analysis, discussing their challenges, motivations, and applications in various domains.
http://arxiv.org/abs/2402.03181v1
Compressor summary: This paper proposes C-RAG, a framework to certify and reduce generation risks in retrieval-augmented language models by grounding external knowledge and providing theoretical guarantees.
http://arxiv.org/abs/2402.03177v1
Compressor summary: The paper introduces CIDAR, an open Arabic instruction-tuning dataset that reflects the diverse cultures of the Arab region and addresses the biases in existing instruction datasets towards Western culture.
http://arxiv.org/abs/2402.03175v1
Compressor summary: The paper proposes a Bayesian learning model to understand how Large Language Models work by approximating an ideal generative text model based on predicting the next token.
http://arxiv.org/abs/2402.03173v1
Compressor summary: Multi is a comprehensive benchmark for multimodal large language models that tests their understanding of complex figures, tables, and scientific questions in realistic examination styles.
http://arxiv.org/abs/2402.03172v1
Compressor summary: The paper presents a new automated method for assigning ICD codes to clinical texts using a Transformer-based encoder and label embeddings, achieving better performance than previous models.
http://arxiv.org/abs/2402.03171v1
Compressor summary: Homograph attacks severely reduce sentiment analysis accuracy for Arabic dialects, highlighting the need for ethical and responsible machine learning.
http://arxiv.org/abs/2402.03170v1
Compressor summary: Mamba is a new model that performs as well as transformers in learning from sequential data.
http://arxiv.org/abs/2402.03166v1
Compressor summary: RRWNet is an automated framework that uses a neural network to segment retinal blood vessels and correct errors in classification, improving the accuracy of disease biomarkers.
http://arxiv.org/abs/2402.03164v1
Compressor summary: The paper proposes a new approach to represent time in cyber-physical systems using clocks as real-valued fluents, making the reachability problem decidable and enabling Golog program realization.
http://arxiv.org/abs/2402.03163v1
Compressor summary: The paper explores what makes sentences difficult for aspect-based sentiment analysis by analyzing different data sets and using a combination of classifiers and linguistic features to measure difficulty.
http://arxiv.org/abs/2402.03162v1
Compressor summary: The paper introduces Direct-a-Video, a system that enables independent control of object motion and camera movement in text-to-video models using cross-attention layers and self-supervised training.
http://arxiv.org/abs/2402.03161v1
Compressor summary: The paper proposes an efficient method to pre-train LLMs on videos by decomposing them into keyframes and motions, enabling unified generative pre-training of video, image, and text content.
http://arxiv.org/abs/2402.03158v1
Compressor summary: The paper presents new algorithms that improve the efficiency and scalability of adaptive vector quantization (AVQ), enabling its wider use in machine learning optimization.
http://arxiv.org/abs/2402.03146v1
Compressor summary: This paper proposes a multi-step objective for training one-step models in model-based reinforcement learning, which improves trajectory prediction and handling of noisy data.
http://arxiv.org/abs/2402.03142v1
Compressor summary: Key points:
- Neural network pruning is important for reducing model complexity and memory usage
- Existing pruning algorithms have limitations
- KEN is a novel, universal, and unstructured pruning algorithm based on KDE
- KEN preserves the most significant parameters and restores the others to their pre-training state
- KEN achieves better or equal performance with a parameter reduction of 25% or more
- KEN_viz is an explainable tool that visualizes the optimized model composition and subnetwork selection
Summary: KEN is a new pruning algorithm that uses KDE to selectively preserve significant parameters in transformer models, achieving better or equal performance with less memory; it is accompanied by an explainable visualization tool called KEN_viz.
http://arxiv.org/abs/2402.03141v1
Compressor summary: AD-RL improves reinforcement learning in delayed scenarios by using a short-delayed auxiliary task to learn the value function faster and more efficiently for the long-delayed main task.
http://arxiv.org/abs/2402.03139v1
Compressor summary: This paper proposes a new method for selecting subsets from larger sets in drug discovery using neural networks that considers the superset's information, which improves performance over existing methods.
http://arxiv.org/abs/2402.03138v1
Compressor summary: The paper proposes a clustering-based exploration method for 3-D environments using random or pre-trained representations, which outperforms other exploration methods.
http://arxiv.org/abs/2402.03137v1
Compressor summary: The study investigates if pre-trained language models can learn associations between language choice and emotional expression in Hinglish, finding that they do but may overgeneralize this in some cases.
http://arxiv.org/abs/2402.03136v1
Compressor summary: Albatross is a novel algorithm that uses simulated self-play and planning to learn how to interact with agents of any strength in simultaneous games, achieving better results than AlphaZero and previous methods in both competitive and cooperative scenarios.
http://arxiv.org/abs/2402.03131v1
Compressor summary: The authors propose a constrained decoding method for label projection in zero-shot cross-lingual transfer, improving translation quality and performance over existing methods.
http://arxiv.org/abs/2402.03126v1
Compressor summary: The paper explores if there are completely parameter-free methods for stochastic optimization, and shows that simple hyperparameter search can achieve this in non-convex settings, while providing a partially parameter-free method in convex settings.
http://arxiv.org/abs/2402.03119v1
Compressor summary: The paper introduces e$^2$KD, a method that improves knowledge distillation by aligning teacher and student explanations, leading to better accuracy, agreement, and robustness.
http://arxiv.org/abs/2402.03115v1
Compressor summary: The paper proposes methods to create interpretable models from complex image data using disentangled representation learning, sparse neural networks, and symbolic regression, and shows their usefulness in bioimaging for cell state classification.
http://arxiv.org/abs/2402.03112v1
Compressor summary: The text describes a machine learning model that uses a Structural Attention Mechanism to improve the prediction and interpretation of infrared spectra, especially for diazo compounds, by focusing on chemical information near functional groups.
http://arxiv.org/abs/2402.03110v1
Compressor summary: A new model and algorithm for non-stationary multi-armed bandits with latent auto-regressive rewards are proposed and shown to perform better than standard UCB in various settings.
http://arxiv.org/abs/2402.03099v1
Compressor summary: The authors propose a new method for automatic prompt engineering that uses calibration and synthetic data generation to improve the performance of Large Language Models on real-world tasks.
http://arxiv.org/abs/2402.03094v1
Compressor summary: The paper proposes a new method, CD-ViTO, for cross-domain few-shot object detection that improves the performance of open-set detectors by adding novel components and outperforms existing methods on both out-of-domain and in-domain datasets.
http://arxiv.org/abs/2402.03093v1
Compressor summary: The paper examines how artificial intelligence and virtual reality are transforming medical care and services through three categories of applications: visualization enhancement, medical data processing, and intervention assistance.
http://arxiv.org/abs/2402.03082v1
Compressor summary: The authors provide a comprehensive analysis of recent advancements and challenges in the field of visual text processing, covering various tasks, features, learning paradigms, and datasets.
http://arxiv.org/abs/2402.03067v1
Compressor summary: BERTopic performs well in topic modeling for partially preprocessed Serbian tweets, providing more informative topics than LDA and NMF.
http://arxiv.org/abs/2402.03055v1
Compressor summary: PAC is a new reinforcement learning algorithm that combines stochastic policies and critics, using PAC-Bayes analysis to model and adapt uncertainty, leading to better exploration and control.
http://arxiv.org/abs/2402.03053v1
Compressor summary: The paper presents Malaysian language models Llama2 and Mistral fine-tuned for embedding tasks, showing their effectiveness in Semantic Similarity and Retrieval-Augmented Generation.
http://arxiv.org/abs/2402.03049v1
Compressor summary: EasyInstruct is an open-source framework that simplifies instruction processing for large language models and encourages more research on instruction data.
http://arxiv.org/abs/2402.03047v1
Compressor summary: Key points:
- Virtual try-on improves garment shopping experiences but needs accurate segmentation masks
- PFDM is a parser-free virtual try-on method based on a diffusion model
- PFDM uses pseudo-images, Garment Fusion Attention, and a large-scale dataset to synthesize high-fidelity images
Summary: PFDM is a novel parser-free virtual try-on method that seamlessly fits garments onto the target person using a diffusion model, pseudo-images, and Garment Fusion Attention, achieving high-fidelity results.
http://arxiv.org/abs/2402.03046v1
Compressor summary: Open RL Benchmark is a community-driven repository of fully tracked Reinforcement Learning experiments that allows easy comparison and reproducibility of RL algorithms.
http://arxiv.org/abs/2402.03043v1
Compressor summary: The paper proposes SIDU-TXT, an explainable AI method that provides word-level explanations for text classification models, and evaluates its performance on image and text datasets using a comprehensive framework.
http://arxiv.org/abs/2402.03040v1
Compressor summary: InteractiveVideo is a framework that allows users to interactively generate videos using various input mechanisms and refine the result through user instructions.
http://arxiv.org/abs/2402.03038v1
Compressor summary: This paper investigates how different sample selection methods affect few-shot learning performance and proposes a new method (ACSESS) that combines them for better results.
http://arxiv.org/abs/2402.03021v1
Compressor summary: The paper explores how multiscale data affects deep learning and proposes a new gradient descent method that adapts learning rates based on data variations.
http://arxiv.org/abs/2402.03019v1
Compressor summary: Taylor video is a new format that highlights dominant motions in each frame using Taylor series expansion, improving action recognition performance when combined with RGB or optical flow videos.
http://arxiv.org/abs/2402.03017v1
Compressor summary: The text provides a comprehensive overview of Few-Shot Learning, its taxonomy, applications, and future research directions.
http://arxiv.org/abs/2402.03014v1
Compressor summary: The paper proposes a new algorithm, Pri-GP, that improves cooperative learning in multi-agent systems by allowing agents to selectively request predictions from trustworthy neighbors and ensuring reliable predictions.
http://arxiv.org/abs/2402.03011v1
Compressor summary: The paper investigates how output perturbation affects individual and group fairness in binary linear classification under differential privacy.
http://arxiv.org/abs/2402.03009v1
Compressor summary: UniMem is a unified framework for improving large language models' ability to process long contexts by enhancing their memory capabilities, and UniMix integrates the strengths of 16 existing methods based on this framework.
http://arxiv.org/abs/2402.03006v1
Compressor summary: The article presents ENVBO, an algorithm that optimizes systems in changing environments with controllable and uncontrollable parameters by conditioning on measurements of the latter, demonstrating its effectiveness in a wind farm simulator example.
http://arxiv.org/abs/2402.03003v1
Compressor summary: The authors developed two tools to detect dataset usage in medical imaging papers and found that there is a high concentration of usage of a limited set of datasets and inconsistent citation practices.
http://arxiv.org/abs/2402.02998v1
Compressor summary: Bloop is a method for combining auxiliary objectives with training losses in deep learning models using gradient surgery and moving averages, which improves performance on NLP and vision tasks.
http://arxiv.org/abs/2402.02996v1
Compressor summary: The paper proposes Text-Guided Image Clustering, which uses generated text from captioning and VQA models to cluster images, and introduces a novel approach to inject task or domain knowledge for clustering.
http://arxiv.org/abs/2402.02992v1
Compressor summary: DeRa is a method to improve language models by adjusting their alignment without retraining, making them more efficient and less prone to errors and biases.
http://arxiv.org/abs/2402.02986v1
Compressor summary: The paper proposes a new training strategy for object detectors in automated driving that considers the criticality of pedestrians to prevent dangerous misdetections.
http://arxiv.org/abs/2402.02985v1
Compressor summary: The paper proposes an unsupervised road parsing framework that uses a vision language model and a computer vision model to process UAV images without manual annotations, achieving high accuracy and flexibility.
http://arxiv.org/abs/2402.02978v1
Compressor summary: The paper compares different logic programming tools for meta-querying in ontologies under the Metamodeling Semantic Entailment Regime (MSER) using Datalog, a practical approach for sizeable ontologies.
http://arxiv.org/abs/2402.02977v1
Compressor summary: The paper introduces variational inference methods for posterior flows, a class of stochastic processes, and proposes a training-free method to transform linear flows into straight constant-speed flows for faster sampling and improved accuracy.
http://arxiv.org/abs/2402.02976v1
Compressor summary: The paper proposes a randomized boosting algorithm that improves the theoretical performance of voting classifiers by reducing logarithmic dependencies in the generalization error.
http://arxiv.org/abs/2402.02975v1
Compressor summary: The authors investigate how different types of contextual information, such as linguistic, structural, and temporal, can improve stance detection in text classification using a transformer-based model on a large dataset.
http://arxiv.org/abs/2402.02972v1
Compressor summary: RetDream improves text-to-3D generation by using a retrieval-based approach to enhance the quality and geometry of generated scenes, while adapting the diffusion model's 2D prior for view consistency.
http://arxiv.org/abs/2402.02968v1
Compressor summary: Foundation models improve intelligent vehicle capabilities by processing and fusing diverse data modalities and tasks, with potential applications in various learning paradigms.
http://arxiv.org/abs/2402.02964v1
Compressor summary: The authors present a new method for estimating posterior and noise parameters in Bayesian inverse problems using an expectation maximization algorithm and a learned conditional normalizing flow.
http://arxiv.org/abs/2402.02956v1
Compressor summary: The paper proposes a framework called AdaTreeFormer that uses a shared encoder with hierarchical feature extraction and attention mechanisms to estimate tree density from aerial or satellite images in different domains.
http://arxiv.org/abs/2402.02951v1
Compressor summary: DynaBRO is a fault-tolerant distributed machine learning method that can handle dynamic Byzantine behaviors without requiring knowledge of the number of malicious machines or losing convergence speed.
http://arxiv.org/abs/2402.02949v1
Compressor summary: Key points:
- OoD detection is important for DNN reliability
- PCA fails to separate OoD and InD features in a nonlinear subspace
- KPCA with non-linear kernels improves OoD-InD separability
- Reconstruction error in the KPCA subspace is used for efficient detection
- Empirical results show superior efficiency and efficacy of the KPCA-based detector
Summary: The authors propose a Kernel PCA (KPCA)-based method for Out-of-Distribution (OoD) detection in Deep Neural Networks, which uses non-linear kernels to enhance the separability between OoD and In-Distribution features and uses the reconstruction error in the KPCA subspace for efficient and accurate detection.
http://arxiv.org/abs/2402.02946v1
Compressor summary: The paper proposes a new layer for neural networks, HoughToRadon Transform, which improves speed and accuracy in semantic image segmentation tasks by modifying feature maps after Hough Transform.
http://arxiv.org/abs/2402.02941v1
Compressor summary: The text is a comprehensive review of hybrid CNN-ViT architectures in computer vision, exploring their synergies, challenges, and future directions.
http://arxiv.org/abs/2402.02933v1
Compressor summary: InterpretCC is a family of interpretable neural networks that adaptively activate features to provide trustworthy explanations, actionable interpretations, and accurate predictions for human-facing domains.
http://arxiv.org/abs/2402.02928v1
Compressor summary: The text describes a challenge to test machine learning-based image segmentation for identifying parts of a historic airplane in XL-CT images.
http://arxiv.org/abs/2402.02926v1
Compressor summary: The paper proposes a transformer-based method for automated cognate detection in historical linguistics, which uses labeled information and multiple sequence alignments to improve accuracy and efficiency.
http://arxiv.org/abs/2402.02922v1
Compressor summary: The text proposes a new color constancy method for images with multiple light sources, which learns pixel-wise illumination maps and preserves smoothness and natural appearance using total variation loss, bilateral filter, and label-smoothing techniques.
http://arxiv.org/abs/2402.02915v1
Compressor summary: The paper proposes a computer-assisted method using a linguistic model and semantic vectors to study mutual intelligibility between closely related languages, such as German, Dutch, and English.
http://arxiv.org/abs/2402.02910v1
Compressor summary: This study develops a waist-mounted sensor that accurately recognizes four exercises in an older adult rehabilitation program, improving on existing methods.
http://arxiv.org/abs/2402.02906v1
Compressor summary: ViewFusion is an end-to-end generative approach to novel view synthesis that adapts to multiple scenes and object classes, uses a variable number of views, and works well in undetermined conditions, but has limitations in inference speed and dataset size.
http://arxiv.org/abs/2402.02896v1
Compressor summary: The text explores how personality profiles affect the behaviour of large language models in naturalistic dialogues and calls for more research on crafting human-like personas for interactive AI agents.
http://arxiv.org/abs/2402.02892v1
Compressor summary: The paper proposes a novel video frame interpolation method that uses a hierarchical pyramid module to estimate intermediate optical flow, reducing complexity and improving accuracy.
http://arxiv.org/abs/2402.02890v1
Compressor summary: HTBB is a new black-box approximation and optimization method that uses low-rank hierarchical Tucker decomposition and outperforms existing gradient-free methods and tensor train decomposition for high-dimensional problems.
http://arxiv.org/abs/2402.02887v1
Compressor summary: Our proposed adaptation method for foundation models does not backpropagate gradients through the backbone, reducing training time and memory usage while achieving state-of-the-art accuracy-parameter trade-offs on the VTAB benchmark.
http://arxiv.org/abs/2402.02885v1
Compressor summary: The paper explores decentralized AI (DEAI) as an alternative to centralized AI (CEAI) for addressing the ethical challenges of AI, reviewing 71 studies on DEAI and identifying its building blocks.
http://arxiv.org/abs/2402.02883v1
Compressor summary: The authors propose a method for attributing Siamese encoders' decisions to linguistic aspects and compare exact and approximate attributions for understanding their behavior.
http://arxiv.org/abs/2402.02872v1
Compressor summary: The text explores how in-context learning works by merging demonstration features and using attention weights to transfer label information, and proposes a hypothesis with experiments on different models.
http://arxiv.org/abs/2402.02870v1
Compressor summary: Explanation algorithms are complex, but need clear interpretations to avoid errors; papers should clarify how to use and understand them.
http://arxiv.org/abs/2402.02868v1
Compressor summary: Fine-tuning RL models can cause forgetting of pre-trained capabilities, leading to poor transfer performance; however, knowledge retention techniques can mitigate this issue and improve results.
http://arxiv.org/abs/2402.02864v1
Compressor summary: EEVEE is an easy-to-use web-based tool for creating NLP datasets with support for various tasks.
http://arxiv.org/abs/2402.02858v1
Compressor summary: The authors propose a single autoregressive model for model-based reinforcement learning without ensembles, achieving good performance on the D4RL benchmark while analyzing model properties.
http://arxiv.org/abs/2402.02851v1
Compressor summary: Compositional Feature Alignment (CFA) is a two-stage technique that improves the compositional generalization of pretrained models by learning orthogonal class and domain features and finetuning the encoder with them.
http://arxiv.org/abs/2402.02844v1
Compressor summary: The paper evaluates open-domain claim verification systems on biomedical and health claims using different knowledge sources and retrieval techniques.
http://arxiv.org/abs/2402.02837v1
Compressor summary: The paper proposes a formal approach to segment dialogues into topic segments and analyse their interactions to understand topic organization in open-domain conversations.
http://arxiv.org/abs/2402.02834v1
Compressor summary: This paper proposes a depth pruning method for large language models that improves inference speeds, especially when memory is limited, compared to width pruning methods.
http://arxiv.org/abs/2402.02827v1
Compressor summary: PowerGraph is a new dataset for training and explaining graph neural networks on cascading failure events in power grids, which could improve GNN applications across various disciplines.
http://arxiv.org/abs/2402.02826v1
Compressor summary: The study shows how diffusion models can create realistic synthetic images for training a computer vision model to detect genital warts caused by human papillomavirus (HPV), achieving high accuracy, precision, recall, and F1 score for both HPV and normal cases and offering a fast approach for urgent medical situations.
http://arxiv.org/abs/2402.02823v1
Compressor summary: The text discusses the problem of deliberate contamination of large language models' performance measurements by malicious providers and proposes a new method to detect it.
http://arxiv.org/abs/2402.02820v1
Compressor summary: Frequency-enhanced Conditional Variational Autoencoder (FCVAE) is a novel unsupervised method for detecting anomalies in time series data that integrates global and local frequency features to capture both long-periodic and short-periodic patterns, achieving better performance than existing VAE-based methods.
http://arxiv.org/abs/2402.02812v1
Compressor summary: The authors propose different reconstruction methods for real-time urban air pollution maps using city graphs and super-learning models, and test their performance in Paris.
http://arxiv.org/abs/2402.02811v1
Compressor summary: The study uses a novel deep learning technique to analyze multi-scale views of resting-state fMRI volumes and distinguish patients with mild cognitive impairment from healthy controls, revealing differences in brain network activity and dynamics.
http://arxiv.org/abs/2402.02807v1
Compressor summary: The text compares the performance of using lexical cognates and sound correspondences for language family tree reconstruction and finds that cognate-based approaches are more accurate.
http://arxiv.org/abs/2402.02805v1
Compressor summary: The study investigates if large language models can plan asynchronously and presents a new technique called Plan Like a Graph that improves their performance but reveals their limitations in complex tasks.
http://arxiv.org/abs/2402.02801v1
Compressor summary: KS-Lottery is a new method that uses the Kolmogorov-Smirnov test to identify a small subset of effective parameters for multilingual fine-tuning of LLMs, with a theoretical proof that it can find certified winning tickets in the embedding layer and experiments showing performance comparable to full fine-tuning with far fewer parameters.
http://arxiv.org/abs/2402.02800v1
Compressor summary: The paper presents a new method to estimate camera poses from extreme viewpoint differences by using object priors and diffusion models to synthesize novel-view images and match them.
http://arxiv.org/abs/2402.02797v1
Compressor summary: JAFFNet is a feature fusion network for saliency detection of surface defects, which adapts to different scales and backgrounds by using joint attention and dense receptive fields.
http://arxiv.org/abs/2402.02791v1
Compressor summary: The study explores how to optimize tiny language models (1B parameters) for mobile devices, proposes several design formulas, and trains PanGu-$\pi$-1B Pro and PanGu-$\pi$-1.5B Pro on multilingual corpora, showing significant improvement over baselines and even some state-of-the-art models.
http://arxiv.org/abs/2402.02790v1
Compressor summary: The Hyperbolic Tangent Exponential Linear Unit (TeLU) is a novel activation function for neural networks that improves stability and robustness over existing functions like ReLU, GELU, and Mish.
http://arxiv.org/abs/2402.02782v1
Compressor summary: The paper explores how to build incremental parsers that output trees using prefix information, guided by left-to-right language models and tree-decoding modules, and compares them to non-incremental and partially incremental models.
http://arxiv.org/abs/2402.02772v1
Compressor summary: The paper introduces CDiffuser, a novel diffusion-based reinforcement learning method that uses return contrast to improve the base distribution and generate trajectories towards high-return states.
http://arxiv.org/abs/2402.02769v1
Compressor summary: LoT is a new technique that improves generalization in deep neural networks by training auxiliary student learners who teach the main model more abstract and generalizable correlations.
http://arxiv.org/abs/2402.02761v1
Compressor summary: Our improved stochastic Hough transform technique can accurately and quickly detect transmission lines in UAV images by using the Hessian matrix for initial processing and enhancing boundary search and pixel row segmentation.
http://arxiv.org/abs/2402.02750v1
Compressor summary: The paper proposes KIVI, a 2-bit KV cache quantization algorithm that reduces memory usage and enables larger batch sizes for large language models.
http://arxiv.org/abs/2402.02746v1
Compressor summary: Standard Bayesian Optimization with Gaussian process regression often performs well in high-dimensional optimization problems, contrary to common belief.
http://arxiv.org/abs/2402.02741v1
Compressor summary: The paper proposes a new method to optimize hyperparameters called "glocal hypergradient estimation", which combines reliability and efficiency by using Koopman operator theory to approximate global hypergradients from local ones.
http://arxiv.org/abs/2402.02738v1
Compressor summary: The paper evaluates and proposes a method to improve the robustness of LiDAR-camera fusion models for 3D object detection in various weather conditions.
http://arxiv.org/abs/2402.02736v1
Compressor summary: The text describes a method to improve human body pose and shape estimation from a single camera using unannotated videos for extra supervision when there is not enough annotated data available.
http://arxiv.org/abs/2402.02734v1
Compressor summary: The paper introduces InVA, a novel method that uses variational autoencoders to efficiently borrow information from multiple images and capture complex non-linear associations for predictive inference.
http://arxiv.org/abs/2402.02733v1
Compressor summary: The paper introduces a novel method to re-age faces and transfer portrait styles in non-photorealistic images in one generative step, using existing networks trained within the same domain.
http://arxiv.org/abs/2402.02732v1
Compressor summary: The paper proposes a generative surrogate method for black-box attacks on DNNs, which learns the distribution of samples near the target's decision boundaries and crafts adversarial examples with imperceptible differences.
http://arxiv.org/abs/2402.02725v1
Compressor summary: This research shows that analyzing head movement patterns can help detect cybersickness with high accuracy using machine learning algorithms.
http://arxiv.org/abs/2402.02724v1
Compressor summary: The paper introduces a new dataset and method (FDNet) for segmenting astrocytes from microscopy images, which helps to study neuronal metabolism and predict differentiation progress in iPSCs.
http://arxiv.org/abs/2402.02720v1
Compressor summary: The paper proposes an adaptive online learning algorithm that can gracefully forget old information and improve over gradient descent for tasks where the future may differ from the past.
http://arxiv.org/abs/2402.02716v1
Compressor summary: This survey examines how large language models can enhance the planning abilities of autonomous agents, reviewing existing works and categorizing them into different directions, such as task decomposition and memory.
http://arxiv.org/abs/2402.02713v1
Compressor summary: The paper discusses the potential of large language models to revolutionize time series analysis and improve decision-making.
http://arxiv.org/abs/2402.02711v1
Compressor summary: Gaussian activations and preconditioned neural architectures improve the training of physics-informed neural networks for solving partial differential equations.
http://arxiv.org/abs/2402.02705v1
Compressor summary: The paper introduces "Surgery," a method to reduce representation bias in multi-task learning by adjusting the merged model's representation to match individual models.
http://arxiv.org/abs/2402.02701v1
Compressor summary: This paper analyzes the factors affecting the generalization gap in visual reinforcement learning with distractors and shows that minimizing representation distance between training and testing environments is crucial.
http://arxiv.org/abs/2402.02700v1
Compressor summary: The paper studies CMDPs with time-varying environments using two linear function approximation models and proposes novel algorithms with guaranteed suboptimality gap and polynomial sample complexity.
http://arxiv.org/abs/2402.02698v1
Compressor summary: The paper proposes a general framework for learning with stochastic dominance, which extends the concept to compare arbitrary random variables and finds optimal solutions efficiently.
http://arxiv.org/abs/2402.02697v1
Compressor summary: The paper analyzes the connections between implicit deep equilibrium models and explicit neural networks using random matrix theory, showing how their spectral behavior depends on activation functions and weight variances.
http://arxiv.org/abs/2402.02696v1
Compressor summary: The survey discusses how causal feature selection, which identifies features with causal impacts on outcomes and distinguishes correlation from causation, can enhance responsible ML (interpretability, fairness, robustness, and domain generalization) by supporting ethical and social values, reliability, and trustworthiness in high-stakes applications.
http://arxiv.org/abs/2402.02695v1
Compressor summary: The paper proposes a new algorithm that uses class probabilities for black-box sentence-level attacks, which are adversarial sentences that fool text classifiers.
http://arxiv.org/abs/2402.02692v1
Compressor summary: The paper develops a linear GNN architecture (LG-GNN) that accurately predicts edges in graphs generated by a graphon and provides theoretical guarantees for its performance.
http://arxiv.org/abs/2402.02687v1
Compressor summary: PoPBO is a novel, efficient, and robust Bayesian Optimization method that uses a ranking-based surrogate model based on the Poisson process to handle noisy and intractable function responses in optimization problems.
http://arxiv.org/abs/2402.02681v1
Compressor summary: The paper proposes a novel framework for systematically breaking symmetry in equivariant neural networks using symmetry breaking sets, which are data efficient and applicable to any group.
http://arxiv.org/abs/2402.02680v1
Compressor summary: The authors study geographic biases in large language models (LLMs) and show that they are correlated with socioeconomic conditions, affecting predictions on sensitive topics.
http://arxiv.org/abs/2402.02678v1
Compressor summary: The paper proposes a new XAI framework that uses counterfactual probabilities and prior information to explain black-box models without assuming a known causal graph.
http://arxiv.org/abs/2402.02675v1
Compressor summary: The paper introduces a method to verify model evaluations of private neural networks using zero-knowledge proofs, improving transparency and trust in commercial machine learning.
http://arxiv.org/abs/2402.02665v1
Compressor summary: The paper proposes extending the utility-based paradigm from multi-objective reinforcement learning to single-objective reinforcement learning and discusses the resulting benefits and challenges.
http://arxiv.org/abs/2402.02663v1
Compressor summary: The author challenges the claim that counterfactual fairness and demographic parity are equivalent, and clarifies common misconceptions about counterfactual fairness.
http://arxiv.org/abs/2402.02662v1
Compressor summary: The paper proposes ICE, a method that uses generated captions to improve OOD generalization of vision-language models for image classification.
http://arxiv.org/abs/2402.02658v1
Compressor summary: The paper proposes MiPS, a method to automate data curation for process supervision, which improves problem-solving performance by sampling and scoring intermediate step completions using a reasoning model.
http://arxiv.org/abs/2402.02656v1
Compressor summary: The RACER system uses a large language model to analyze semi-structured interviews in healthcare research, producing themes that match human evaluators' results with high agreement.
http://arxiv.org/abs/2402.02655v1
Compressor summary: The paper describes VlogQA, a corpus of Vietnamese spoken language for machine reading comprehension tasks, based on YouTube videos about food and travel, and reports promising results using deep learning models.
http://arxiv.org/abs/2402.02653v1
Compressor summary: PALM is a method that improves OOD detection by modeling each class with multiple prototypes and learning more faithful sample embeddings, achieving state-of-the-art performance on CIFAR-100.
http://arxiv.org/abs/2402.02651v1
Compressor summary: The text proposes a new method for embodied reinforcement learning that uses vision-language models as promptable representations, which improves performance on complex RL tasks in Minecraft and Habitat compared to generic image embeddings or instruction-following methods.
http://arxiv.org/abs/2402.02649v1
Compressor summary: The paper proposes densely decoded networks (ddn) with crutch connections for refined dense prediction in medical image segmentation, and adaptive deep supervision (ads) for robust feature extraction using layer-wise effective receptive fields (lerf).
http://arxiv.org/abs/2402.02648v1
Compressor summary: The paper discusses the issues with large language models' responses to knowledge-intensive questions and proposes a new prompting method, Recursive Chain of Feedback (R-CoF), to improve reliability and validity.