This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-15, generated by Compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.09404v1
Compressor summary: AQA-Bench is a benchmark to test large language models' ability to reason sequentially in algorithmic contexts, revealing insights on closed-source vs open-source models and other factors affecting performance.
http://arxiv.org/abs/2402.09401v1
Compressor summary: The paper proposes a query-efficient reinforcement learning method for aligning large language models with human preferences by formalizing the problem as a contextual dueling bandit problem and designing an active-query-based proximal policy optimization algorithm.
http://arxiv.org/abs/2402.09398v1
Compressor summary: The paper proposes LESS, a cache method for large language models that balances memory efficiency and information retention by integrating a constant-sized cache with eviction-based methods.
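A minimal sketch of the hybrid idea the summary describes, assuming a bounded exact window whose evicted entries are folded into a constant-size running summary; the class name and the mean-based merge rule are illustrative, not LESS's actual recurrence:

```python
import numpy as np

class HybridKVCache:
    """Toy hybrid cache: a bounded exact window plus a constant-size
    running summary of evicted entries (illustrative merge rule)."""

    def __init__(self, window: int, dim: int):
        self.window = window
        self.exact = []                  # recent (key, value) pairs kept verbatim
        self.summary_k = np.zeros(dim)   # constant-size summary of evicted keys
        self.summary_v = np.zeros(dim)   # ... and of evicted values
        self.evicted = 0

    def append(self, k: np.ndarray, v: np.ndarray):
        self.exact.append((k, v))
        if len(self.exact) > self.window:
            old_k, old_v = self.exact.pop(0)
            # Fold the evicted entry into a running mean instead of dropping it,
            # so memory stays constant but some information is retained.
            self.evicted += 1
            self.summary_k += (old_k - self.summary_k) / self.evicted
            self.summary_v += (old_v - self.summary_v) / self.evicted
```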
http://arxiv.org/abs/2402.09394v1
Compressor summary: The paper introduces LEME, a protocol to evaluate model editing techniques for long-form text generation, which reveals different performance aspects than short-form metrics and identifies common failure modes in long-form settings.
http://arxiv.org/abs/2402.09391v1
Compressor summary: The paper introduces SMolInstruct, a large dataset for instruction tuning on chemistry tasks, and shows that LLMs fine-tuned on it achieve strong results across those tasks, outperforming GPT-4.
http://arxiv.org/abs/2402.09390v1
Compressor summary: The hierarchical graph of thoughts (HGOT) is a multi-layered graph approach that improves the factuality and quality of language models by enhancing the retrieval of relevant passages during in-context learning.
http://arxiv.org/abs/2402.09388v1
Compressor summary: The paper proposes an entropy-regularized model-based planner for partially observable problems that improves robustness and objective inference by reducing policy commitment to a single action.
http://arxiv.org/abs/2402.09381v1
Compressor summary: GraSSRep is a novel method that uses graph neural networks to classify DNA sequences as repetitive or non-repetitive, improving repeat detection accuracy in metagenomic data.
http://arxiv.org/abs/2402.09373v1
Compressor summary: The text proposes a Constrained Learning method for long-term time series forecasting that minimizes both average performance and maximum error, using a Primal-Dual algorithm.
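Schematically, and as an assumption about the formulation rather than the paper's exact notation, the constrained forecasting problem and its primal-dual updates look like:

```latex
% Illustrative constrained objective for an H-step forecast horizon:
\min_{\theta}\ \frac{1}{H}\sum_{t=1}^{H}\ell_t(\theta)
\quad\text{s.t.}\quad \ell_t(\theta)\le\epsilon\ \ \forall t
% Primal-dual updates on the Lagrangian
% L(\theta,\lambda)=\frac{1}{H}\sum_t \ell_t(\theta)+\sum_t\lambda_t(\ell_t(\theta)-\epsilon):
\theta \leftarrow \theta-\eta_\theta\nabla_\theta L(\theta,\lambda),
\qquad
\lambda_t \leftarrow \big[\lambda_t+\eta_\lambda\big(\ell_t(\theta)-\epsilon\big)\big]_+
```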
http://arxiv.org/abs/2402.09371v1
Compressor summary: The paper investigates how data format and position encoding affect length generalization in Transformers on integer addition, achieving extrapolation to 2.5x the training length but finding the result sensitive to random seeds and other factors.
http://arxiv.org/abs/2402.09369v1
Compressor summary: The paper introduces CultureAtlas, a dataset for acquiring multicultural knowledge from Wikipedia links to improve language models' cross-cultural communication and understanding.
http://arxiv.org/abs/2402.09368v1
Compressor summary: VCD is a simple framework for generating videos with controlled subject identities using various modules to disentangle and enhance identity information.
http://arxiv.org/abs/2402.09367v1
Compressor summary: The study presents a computer vision-based approach using deep CNN models to predict sludge settling characteristics in wastewater treatment plants from microscopy images, testing transfer learning, data augmentation, and various CNN architectures to provide less labour-intensive, objective, and consistent assessments than existing techniques.
http://arxiv.org/abs/2402.09363v1
Compressor summary: The paper proposes planting copyright traps, fictitious entries embedded in original content, to detect the use of copyrighted materials in training Large Language Models, even models that do not naturally memorize.
http://arxiv.org/abs/2402.09360v1
Compressor summary: HiRE is a method that combines compression and efficient multi-device operations to speed up autoregressive decoding with sparse large language models on accelerators.
http://arxiv.org/abs/2402.09358v1
Compressor summary: The study shows how to use a cloud-based AI similar to ChatGPT to analyze radiology reports in hospitals, keeping data private and improving accuracy, reliability, and interpretability.
http://arxiv.org/abs/2402.09353v1
Compressor summary: DoRA is a novel method that combines weight decomposition and LoRA to improve fine-tuning performance while avoiding extra inference costs.
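A minimal PyTorch sketch of the magnitude/direction reparameterization DoRA is built on (per-column normalization is one common convention; initialization details are simplified):

```python
import torch

def dora_weight(W0, A, B, m):
    """DoRA-style reparameterization (sketch): apply the LoRA update to the
    directional component, with a separate learnable per-column magnitude m.
    W0: (out, in) frozen weight; B: (out, r); A: (r, in); m: (1, in)."""
    V = W0 + B @ A                          # directional component with LoRA update
    V = V / V.norm(dim=0, keepdim=True)     # normalize each column to unit norm
    return m * V                            # rescale by the learned magnitude

out_dim, in_dim, r = 16, 32, 4
W0 = torch.randn(out_dim, in_dim)
A, B = torch.randn(r, in_dim), torch.zeros(out_dim, r)  # B = 0 => no update yet
m = W0.norm(dim=0, keepdim=True)                        # init magnitude from W0
W = dora_weight(W0, A, B, m)
```

With B initialized to zero and m to the column norms of W0, the reparameterized weight starts exactly at W0, so fine-tuning begins from the pretrained model.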
http://arxiv.org/abs/2402.09346v1
Compressor summary: The paper proposes an automatic method to create probes to audit large language models using different versions of the same question and human verification, increasing transparency and scientific rigor.
http://arxiv.org/abs/2402.09345v1
Compressor summary: The paper proposes InfoRM, an information theoretic-based framework for reward modeling in reinforcement learning from human feedback, which can detect and mitigate reward overoptimization by using a variational information bottleneck objective and a cluster deviation score.
http://arxiv.org/abs/2402.09344v1
Compressor summary: The paper introduces methods to generate more diverse translations using perturbed k-nearest neighbor machine translation, addressing the overcorrection problem in previous methods.
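For background, standard kNN-MT, which the perturbed variants build on, interpolates the base translation model's distribution with a nearest-neighbor distribution over retrieved target tokens; a minimal sketch with illustrative parameter names:

```python
import numpy as np

def knn_mt_probs(p_model, neighbors, dists, vocab, T=10.0, lam=0.5):
    """Standard kNN-MT interpolation (background for the perturbed variants):
    p(y|x) = lam * p_kNN(y|x) + (1 - lam) * p_model(y|x).
    neighbors: target-token ids of retrieved datastore entries;
    dists: their distances to the current decoder state."""
    w = np.exp(-np.asarray(dists) / T)   # closer neighbors weigh more
    p_knn = np.zeros(vocab)
    for tok, wi in zip(neighbors, w):
        p_knn[tok] += wi
    p_knn /= p_knn.sum()
    return lam * p_knn + (1 - lam) * p_model
```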
http://arxiv.org/abs/2402.09334v1
Compressor summary: AuditLLM is a tool that evaluates the safety, consistency, and reliability of Large Language Models by probing them with multiple variations of a single question, helping to identify potential issues such as bias or hallucinations.
http://arxiv.org/abs/2402.09329v1
Compressor summary: The YOLOv8-AM model incorporating attention mechanisms improves fracture detection performance and achieves state-of-the-art results in computer-assisted diagnosis of wrist trauma.
http://arxiv.org/abs/2402.09327v1
Compressor summary: This paper studies how memorizing training data affects learning performance in stochastic convex optimization and shows a tradeoff between accuracy and memorization measured by conditional mutual information.
http://arxiv.org/abs/2402.09326v1
Compressor summary: The text discusses ranking functions that incorporate uncertainty in classification tasks, focusing on their stability and multigroup fairness properties.
http://arxiv.org/abs/2402.09325v1
Compressor summary: PC-NeRF is a novel framework for 3D scene reconstruction and view synthesis using sparse LiDAR frames, which divides the scene into different levels of representation and leverages hierarchical spatial partitioning.
http://arxiv.org/abs/2402.09320v1
Compressor summary: ICDPO is a novel approach that improves LLMs' ability to generate safe content by borrowing human preference alignment (HPA) capabilities from superior LLMs via In-context Learning, without the need for fine-tuning.
http://arxiv.org/abs/2402.09316v1
Compressor summary: The study presents a method to protect image data privacy by generating human-perceivable images that can be accurately classified by authorized models while confusing unauthorized ones.
http://arxiv.org/abs/2402.09315v1
Compressor summary: The paper proposes a sparse context transformer (SCT) for few-shot object detection, which leverages source domain knowledge and learns sparse context from few target domain images to reduce class confusion and enhance detector performance.
http://arxiv.org/abs/2402.09305v1
Compressor summary: Causal Pretraining is a method to learn causal graphs from time series data in a supervised way, which can improve performance with more data and larger models.
http://arxiv.org/abs/2402.09303v1
Compressor summary: This study compares how humans and deep neural networks learn image classification and finds significant differences in their representational changes during the learning process.
http://arxiv.org/abs/2402.09290v1
Compressor summary: PSRL is a reinforcement learning framework that fuses supervised and unsupervised learning to improve policy interpretability and performance.
http://arxiv.org/abs/2402.09288v1
Compressor summary: The paper introduces EcoVal, a fast and practical framework to estimate the value of clusters of similar data points for machine learning models, using intrinsic and extrinsic values and a production function concept.
http://arxiv.org/abs/2402.09286v1
Compressor summary: The study proposes a Model Facts template to increase AI trust and transparency in firearm injury research, allowing users to assess the validity and biases of models without technical knowledge.
http://arxiv.org/abs/2402.09283v1
Compressor summary: This text is a survey of research on how to make large language models safe for conversational applications and prevent harmful responses.
http://arxiv.org/abs/2402.09282v1
Compressor summary: This paper shows how using GPT-4's Chain of Thought technique can improve NER performance in BERT by combining distilled knowledge from GPT-4 with human annotations, leading to better results and cost savings.
http://arxiv.org/abs/2402.09281v1
Compressor summary: The paper proposes a novel method that combines covariance and Hessian matrices to improve binary classification by maximizing between-class distance and minimizing within-class variance in a two-dimensional space.
http://arxiv.org/abs/2402.09271v1
Compressor summary: BAGNET is a hybrid machine learning model that accurately predicts the toxicity levels of mollusc meat due to harmful algal blooms, helping to control shellfish production areas effectively.
http://arxiv.org/abs/2402.09270v1
Compressor summary: Window-based event denoising is a method that improves interpretability and real-time processing for deep learning-based noise removal in complex scenes, using temporal and spatial analysis to filter out irrelevant events.
http://arxiv.org/abs/2402.09268v1
Compressor summary: Transformers can efficiently simulate and be simulated by communication rounds, enabling them to solve basic computational tasks faster than other neural sequence models.
http://arxiv.org/abs/2402.09267v1
Compressor summary: The paper proposes a method to reduce factual errors in large language models by using their own self-evaluation abilities for training and fine-tuning.
http://arxiv.org/abs/2402.09266v1
Compressor summary: The text proposes a predictive model using the kNN algorithm to support precautionary closures in mussel farming based on harmful algal blooms, achieving high accuracy and sensitivity values.
http://arxiv.org/abs/2402.09264v1
Compressor summary: UR2M is a framework for accurate event detection and reliable uncertainty estimation on resource-constrained wearable devices.
http://arxiv.org/abs/2402.09262v1
Compressor summary: MultiMedEval is an open-source toolkit for evaluating medical vision-language models on various tasks and datasets.
http://arxiv.org/abs/2402.09259v1
Compressor summary: SyntaxShap is a new method to explain text generation by considering syntactic dependencies in the data.
http://arxiv.org/abs/2402.09257v1
Compressor summary: The Temporal Dilated Video Transformer (TDViT) is a model that efficiently extracts spatiotemporal representations for dense video tasks, such as object detection and instance segmentation, by using temporal dilated transformer blocks and hierarchical structures.
http://arxiv.org/abs/2402.09249v1
Compressor summary: The paper introduces a new adaptive activation function for neural networks that can transform and generalize existing activation functions, improving their performance and versatility.
http://arxiv.org/abs/2402.09242v1
Compressor summary: Zero-Shot Food Detection (ZSFD) is tackled by a new framework, ZSFDet, which uses multi-source graphs to model the correlation between food categories and attributes and improves performance on FOWA and UECFOOD-256 datasets.
http://arxiv.org/abs/2402.09241v1
Compressor summary: The paper proposes a framework to improve one-stage video object detection by exploiting temporal consistency, reducing computational costs, and increasing efficiency.
http://arxiv.org/abs/2402.09240v1
Compressor summary: Switch EMA (SEMA) is a simple modification to the Exponential Moving Average regularization method in deep neural networks, which improves generalization and convergence without extra costs.
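A minimal sketch of the idea as the summary states it, assuming the usual EMA update plus a periodic switch of the smoothed weights back into the online model (the once-per-epoch switch schedule is an assumption):

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    # Standard exponential moving average of the online weights.
    for pe, p in zip(ema_model.parameters(), model.parameters()):
        pe.mul_(decay).add_(p, alpha=1 - decay)

@torch.no_grad()
def switch_ema(model, ema_model):
    # SEMA-style switch (sketch): copy the EMA weights back into the
    # online model so training continues from the smoothed point.
    for p, pe in zip(model.parameters(), ema_model.parameters()):
        p.copy_(pe)

model = torch.nn.Linear(8, 2)
ema_model = copy.deepcopy(model)
# In training: ema_update(ema_model, model) after each optimizer step,
# and switch_ema(model, ema_model) at the end of each epoch.
```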
http://arxiv.org/abs/2402.09239v1
Compressor summary: The paper proposes a new method to train temporal graph neural networks (TGNNs) by using importance-based negative sampling instead of uniform random sampling, which improves their performance in future-link prediction tasks.
http://arxiv.org/abs/2402.09237v1
Compressor summary: The paper improves image retrieval for visual localization by generating synthetic variants of training images and using a tailored training approach.
http://arxiv.org/abs/2402.09236v1
Compressor summary: The paper proposes a method to learn interpretable concepts from data using ideas from causal representation learning and foundation models, and demonstrates its effectiveness on synthetic and natural language data.
http://arxiv.org/abs/2402.09234v1
Compressor summary: The authors propose a multi-hierarchical framework for creating surrogate models of computationally expensive crash simulations at different resolutions, using graph convolutional networks to learn low-dimensional latent dynamics and transfer learning to pass information between levels, capturing macroscale and microscale features efficiently.
http://arxiv.org/abs/2402.09226v1
Compressor summary: The paper studies how two-homogeneous neural networks with small initializations converge in direction to KKT points of a correlation function using gradient flow dynamics and saddle-to-saddle dynamics.
http://arxiv.org/abs/2402.09225v1
Compressor summary: The paper presents MINT, a method to determine if an AI model was trained with specific data, using two novel architectures based on MLP and CNNs, and evaluates them on face recognition tasks with six databases.
http://arxiv.org/abs/2402.09221v1
Compressor summary: The paper introduces a method to analyze and control attention mechanisms in transformer-based LLMs using spectral filters on intermediate representations.
http://arxiv.org/abs/2402.09216v1
Compressor summary: The paper explores using Large Language Models (LLMs) to create Intelligent Tutoring Systems with handcrafted pedagogical designs, and presents MWPTutor, a sample system that outperforms GPT-4 in human evaluation on math word problems.
http://arxiv.org/abs/2402.09211v1
Compressor summary: DivaTrack is a deep learning framework that improves full-body pose estimation in digital reality by using linear accelerations from IMUs and blending predictions in two reference frames, outperforming existing methods on diverse body sizes and activities.
http://arxiv.org/abs/2402.09205v1
Compressor summary: IN3 is a benchmark for inspecting users' implicit intentions in language model-driven agents, while Mistral-Interact is a powerful model that uses IN3 to improve user-agent interaction by refining vague tasks into actionable goals.
http://arxiv.org/abs/2402.09204v1
Compressor summary: The text proposes a new post-hoc calibration method for deep neural networks that adapts to different test sets by using data augmentation and simulating domain shifts.
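For reference, the classic post-hoc baseline that adaptive methods like this extend is temperature scaling, which fits a single scalar on held-out logits by minimizing negative log-likelihood; a minimal sketch:

```python
import torch

def fit_temperature(logits, labels, steps=200, lr=0.01):
    """Temperature scaling (standard post-hoc baseline): learn one scalar T
    so that softmax(logits / T) is better calibrated on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)  # T = exp(log_t) stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()
```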
http://arxiv.org/abs/2402.09201v1
Compressor summary: The paper proposes a novel better-than-KL divergence for PAC-Bayes concentration inequalities and shows it can achieve strictly tighter bounds for estimating the mean of random sequences.
http://arxiv.org/abs/2402.09199v1
Compressor summary: The paper proposes POGER, a method to improve AIGT detection by estimating word generation probabilities using multiple re-sampling in black-box settings.
http://arxiv.org/abs/2402.09197v1
Compressor summary: The paper develops a feature contribution method for GBDT, a black-box tree-ensemble model, that uses node residues to calculate node decisions and explain the model's behavior, supporting ethical analysis of AI and compliance with GDPR.
http://arxiv.org/abs/2402.09193v1
Compressor summary: The paper evaluates language models' rational reasoning abilities using cognitive tasks and finds that they display irrationality similar to humans but with different biases and inconsistent responses.
http://arxiv.org/abs/2402.09178v1
Compressor summary: FHIQA is a learning-based method for assessing portrait quality in images that uses image semantics to improve precision and generalize to various scenes, as shown by experiments on the PIQ23 benchmark.
http://arxiv.org/abs/2402.09177v1
Compressor summary: The authors propose a new Jailbreaking attack on large language models that uses preliminary question-answer pairs to guide the model's response towards revealing harmful information.
http://arxiv.org/abs/2402.09173v1
Compressor summary: The paper proposes new decentralized online convex optimization algorithms with improved regret bounds by using an online accelerated gossip strategy and exploiting network topology spectral properties.
http://arxiv.org/abs/2402.09167v1
Compressor summary: The paper introduces ERBM-KNet, a novel online clustering algorithm that combines an evolving restricted Boltzmann machine with a Kohonen network, achieving improved performance and handling streaming data efficiently.
http://arxiv.org/abs/2402.09166v1
Compressor summary: The paper presents a new deinterleaving method for mixtures of discrete renewal Markov chains that maximizes a penalized likelihood score, proves its accuracy under mild conditions, validates the theory on synthetic data, and applies the method successfully to pulse trains in a RESM context.
http://arxiv.org/abs/2402.09165v1
Compressor summary: The PNSIS framework uses probability theory to extract invariant subgraphs and improve generalization for graph Out-of-Distribution tasks, outperforming existing methods.
http://arxiv.org/abs/2402.09164v1
Compressor summary: The paper proposes a new image attribution method using submodular subset selection to improve model interpretability and accuracy for various samples, outperforming existing methods on three datasets.
http://arxiv.org/abs/2402.09161v1
Compressor summary: The article shows how Large Language Models and ChatGPT can improve teaching quality and student engagement through role-playing simulations during the digital transformation of education.
http://arxiv.org/abs/2402.09154v1
Compressor summary: Our method improves the speed and efficiency of creating adversarial prompts against large language models while maintaining their effectiveness.
http://arxiv.org/abs/2402.09152v1
Compressor summary: The paper proposes a new algorithm for bandit convex optimization with delayed feedback that improves the regret bound for various scenarios, including strong convexity and unconstrained action sets.
http://arxiv.org/abs/2402.09151v1
Compressor summary: The authors created a specialized pre-trained language model for psychological text analysis using a large dataset from Chinese social media platforms and integrated psychological lexicons into the training process.
http://arxiv.org/abs/2402.09147v1
Compressor summary: The paper proposes a framework for large language models (LLMs) to learn unknown knowledge from their own hallucinations, using a score called Points in The Unknown (PiUs), and shows that finetuned or aligned 7B-Mistral models can self-learn effectively.
http://arxiv.org/abs/2402.09146v1
Compressor summary: The paper proposes Residual Quanvolutional Neural Networks (ResQuNNs) that enable training of quanvolutional layers, improving the performance of quantum deep learning by addressing gradient-based optimization challenges.
http://arxiv.org/abs/2402.09142v1
Compressor summary: The study proposes an effective theory of representation learning in deep neural networks that shows how different architectures learn similar representations when they are flexible enough and highlights behaviors that are conserved across various models.
http://arxiv.org/abs/2402.09141v1
Compressor summary: The study evaluates text augmentation techniques and their effects on NLP tasks, proposing Modified Cyclical Curriculum Learning (MCCL) as a novel approach to improve performance.
http://arxiv.org/abs/2402.09136v1
Compressor summary: DolphCoder is a diverse instruction model for code generation that learns from various instruction targets and self-evaluates its code quality, achieving superior performance on benchmarks.
http://arxiv.org/abs/2402.09132v1
Compressor summary: The text discusses the security risks of large language models, which can generate adversarial examples that fool hate speech detection systems and other safety measures.
http://arxiv.org/abs/2402.09113v1
Compressor summary: The paper introduces a new measure called Exploration Index, which quantifies how much an RL algorithm explores compared to supervised learning, by measuring the distance travelled in the data distribution space using optimal transport metrics.
http://arxiv.org/abs/2402.09107v1
Compressor summary: The paper introduces a new multimodal database with diverse and ethically compliant volumetric data of people speaking and wearing HMDs, recorded using a VoCap studio and a Lytro Illum camera, to aid XR algorithm development and testing.
http://arxiv.org/abs/2402.09100v1
Compressor summary: The paper proposes a network for removing occlusions from facial videos using GANs and preserving emotional expression, which can be useful for various applications like video conferencing and virtual makeup.
http://arxiv.org/abs/2402.09099v1
Compressor summary: The study explores how neuron interactions evolve during training in large language models, using self-organization and multifractal analysis to reveal emergent behavior and propose Neuron-based Multifractal Analysis (NeuroMFA) as a tool for quantitative analysis.
http://arxiv.org/abs/2402.09092v1
Compressor summary: The paper presents a large survey of 400 activation functions for neural networks, providing a comprehensive overview and systematization with links to original sources.
http://arxiv.org/abs/2402.09085v1
Compressor summary: The paper proves that different probabilistic circuit models for binary distributions are equivalent and explores the challenges of extending them to categorical random variables.
http://arxiv.org/abs/2402.09084v1
Compressor summary: Sobolev Training improves model performance by using derivative information on irregular meshes for operator learning.
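In its usual form, Sobolev training augments the value-matching loss with a derivative-matching term (on irregular meshes the target derivatives must be approximated numerically); schematically, with the weight λ and the norm illustrative:

```latex
\mathcal{L}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\Big(
\big\|u_\theta(x_i)-u(x_i)\big\|^2
+\lambda\,\big\|\nabla u_\theta(x_i)-\nabla u(x_i)\big\|^2\Big)
```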
http://arxiv.org/abs/2402.09078v1
Compressor summary: The paper presents two new RL algorithms (ExpD3 and BE-TD3) that address and exploit estimation biases for better performance on continuous control tasks, outperforming existing methods like TD3 in some scenarios.
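For background on the bias in question: TD3 counters overestimation with a clipped double-Q target that takes the minimum of two target critics, and variants like these adjust how the two estimates are combined. A sketch of the standard TD3 target (the actor and critics are assumed to be callables on states and state-action pairs):

```python
import torch

@torch.no_grad()
def td3_target(r, s_next, done, actor_t, q1_t, q2_t,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Standard TD3 target: minimum of two target critics
    (pessimistic, to counter overestimation bias)."""
    a_next = actor_t(s_next)
    noise = (torch.randn_like(a_next) * noise_std).clamp(-noise_clip, noise_clip)
    a_next = (a_next + noise).clamp(-1.0, 1.0)   # target policy smoothing
    q_min = torch.min(q1_t(s_next, a_next), q2_t(s_next, a_next))
    return r + gamma * (1.0 - done) * q_min
```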
http://arxiv.org/abs/2402.09071v1
Compressor summary: The paper proposes adding a predictive module to self-supervised learning models that improves their performance and efficiency by constraining representations with an additional loss term based on an affine transformation.
http://arxiv.org/abs/2402.09066v1
Compressor summary: Remote sensing using EO satellites can help detect and monitor illegal landfills, mitigating pollution and health hazards, by providing high-resolution data at low cost.
http://arxiv.org/abs/2402.09063v1
Compressor summary: The paper proposes a new attack on open-source language models that targets their continuous input representations, showing how it can lead to harmful behaviors and extract deleted information.
http://arxiv.org/abs/2402.09059v1
Compressor summary: BlindTuner is a system that enables privacy-preserving fine-tuning of transformer models on encrypted image data for image classification, achieving comparable accuracy and significant speed improvements over existing methods.
http://arxiv.org/abs/2402.09056v1
Compressor summary: The paper discusses challenges and novel insights of evidential deep learning for quantifying uncertainty in ML systems, focusing on optimizing second-order loss functions and interpreting epistemic uncertainty measures.
http://arxiv.org/abs/2402.09055v1
Compressor summary: The paper proposes a new multi-modal humor detection model for short videos that aligns video and text in a shared meaning space and beats existing methods on two datasets.
http://arxiv.org/abs/2402.09052v1
Compressor summary: L3GO is a language agent that uses large language models to reason about 3D mesh generation of unconventional objects in simulation environments, outperforming standard GPT-4 and other models.
http://arxiv.org/abs/2402.09051v1
Compressor summary: The paper presents FGeoDRL, a neural-symbolic system that uses reinforcement learning to perform human-like geometric deductive reasoning without human supervision.
http://arxiv.org/abs/2402.09050v1
Compressor summary: E2E training outperforms non-E2E methods due to efficient input information propagation and layer-role differentiation in intermediate representations.
http://arxiv.org/abs/2402.09047v1
Compressor summary: This paper introduces FGeo-TP, a theorem predictor that uses Transformers to help solve geometry problems faster and with fewer timeouts.
http://arxiv.org/abs/2402.09046v1
Compressor summary: The paper proposes a probabilistic inference theory that unifies reasoning and learning by modeling how data leads to symbolic knowledge through abstraction and selective ignorance.
http://arxiv.org/abs/2402.09043v1
Compressor summary: The paper investigates how difficult it is to audit web platforms with large-capacity models that can fit any data, and shows that such platforms are hard to audit robustly.
http://arxiv.org/abs/2402.09036v1
Compressor summary: This paper proposes a framework called GTI-MM that uses text-to-image models to help multi-modal learning for vision recognition tasks, improving data efficiency and robustness when some visual modalities are missing.
http://arxiv.org/abs/2402.09034v1
Compressor summary: Squared Sigmoid TanH (SST) activation improves the performance of sequential neural networks on small and sparse datasets by amplifying differences between strong and weak activations over time.
http://arxiv.org/abs/2402.09025v1
Compressor summary: SLEB prunes redundant transformer blocks from large language models to speed up inference without sacrificing linguistic capabilities.
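A hedged sketch of the greedy strategy the summary suggests: repeatedly drop the block whose removal hurts a proxy metric, such as calibration-set perplexity, the least; `eval_ppl` and the block list are illustrative stand-ins, not SLEB's actual criterion or API:

```python
def prune_blocks(blocks, eval_ppl, n_remove):
    """Greedy block pruning (sketch): at each step, remove the transformer
    block whose absence degrades calibration-set perplexity the least."""
    blocks = list(blocks)
    for _ in range(n_remove):
        candidates = [
            (eval_ppl(blocks[:i] + blocks[i + 1:]), i)
            for i in range(len(blocks))
        ]
        _, best = min(candidates)   # least-damaging removal
        del blocks[best]
    return blocks
```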
http://arxiv.org/abs/2402.09016v1
Compressor summary: The paper proposes a pyramid attention network (PAN) for deformable medical image registration, which improves feature representation and motion pattern analysis using a dual-stream encoder and a local attention Transformer decoder.
http://arxiv.org/abs/2402.09015v1
Compressor summary: AgentEval is a framework that simplifies verifying the utility of LLM-powered applications by proposing criteria tailored to their unique purposes and assessing their performance against them.
http://arxiv.org/abs/2402.09008v1
Compressor summary: CrisisFACTS is a shared task that challenges participants to develop disaster summarization systems over web sources; the described approach focuses on fact extraction and QA-based prompting with the LLaMA-13b model.
http://arxiv.org/abs/2402.09004v1
Compressor summary: GAP is a regularizer for Test-time Adaptation that uses gradient alignment and prototype features to prevent negative impacts from misclassified pseudo labels and improve performance on various datasets.
http://arxiv.org/abs/2402.08998v1
Compressor summary: The paper proposes a new algorithm for the Stochastic Shortest Path problem with a linear mixture transition kernel, which eliminates restrictive assumptions and achieves near-optimality in regret bound.
http://arxiv.org/abs/2402.08994v1
Compressor summary: The CLIP-MUSED method uses a Transformer-based feature extractor, learnable subject-specific tokens, and representational similarity analysis to decode visual neural information from multiple subjects, overcoming limitations of prior methods.
http://arxiv.org/abs/2402.08982v1
Compressor summary: MEL is a novel evolutionary computational approach that uses multi-task learning to improve feature selection in high-dimensional data and achieve better performance and efficiency than existing methods.
http://arxiv.org/abs/2402.08975v1
Compressor summary: This review surveys how Transformers and their variants are used for anomaly detection, discussing the challenges, datasets, evaluation metrics, and future trends in this domain, and compiling over 100 references on Transformer-based anomaly detection.
http://arxiv.org/abs/2402.08968v1
Compressor summary: GrounDial is a safe conversational AI system that uses commonsense social rules instead of fine-tuning, improving safety and reducing costs.
http://arxiv.org/abs/2402.08966v1
Compressor summary: The paper introduces PLURAL, a pretrained vision-language model that excels at difference visual question answering for chest X-ray images, by using longitudinal data to train on both natural and longitudinal image-text pairs.
http://arxiv.org/abs/2402.08964v1
Compressor summary: The authors present a method to predict user experience on laptops from hardware specifications, using web browsing, video playback, and audio/video calls as indicators.
http://arxiv.org/abs/2402.08963v1
Compressor summary: DUEL is a novel framework that uses active data filtering during self-supervised pre-training to address class imbalances cost-efficiently by enhancing distinctiveness information in an active memory.
http://arxiv.org/abs/2402.08961v1
Compressor summary: HyCubE is a novel 3D circular convolutional neural network that improves knowledge hypergraph embedding performance and efficiency, achieving significant improvements over existing methods.
http://arxiv.org/abs/2402.08960v1
Compressor summary: The paper proposes a weakly-supervised open-vocabulary segmentation framework that uses independent image-mask and image-text pairs to predict masks and associate them with entities in CLIP embedding space, improving performance on challenging datasets.
http://arxiv.org/abs/2402.08958v1
Compressor summary: The paper proposes aespa, a novel PTQ algorithm for Transformers that balances accuracy and efficiency by quantizing layer-wise and considering cross-layer dependency.
http://arxiv.org/abs/2402.08957v1
Compressor summary: MUSTARD is a data generation framework that creates high-quality theorem and proof datasets for training large language models in mathematical reasoning tasks.
http://arxiv.org/abs/2402.08955v1
Compressor summary: The study shows that large language models struggle with analogies that are dissimilar to their training data, unlike humans who maintain high performance across different analogy problems.
http://arxiv.org/abs/2402.08948v1
Compressor summary: The paper investigates learning sparse polynomials with neural networks and gradient descent, and provides a basis-free generalization and a nearly sufficient condition for learnability.
http://arxiv.org/abs/2402.08946v1
Compressor summary: This paper introduces a method to measure grokking in neural networks, a phenomenon where validation performance improves sharply long after training performance has plateaued, and investigates its sharpness under two settings.
http://arxiv.org/abs/2402.08943v1
Compressor summary: The paper proposes a synthesis framework to model variations between time-series data sequences and evaluate different DTW measures for alignment and classification tasks.
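Since the evaluation centers on DTW measures, here is the standard quadratic-time DTW recurrence that the compared variants build on:

```python
import numpy as np

def dtw(x, y):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Best of match, insertion, and deletion at each cell.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```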
http://arxiv.org/abs/2402.08939v1
Compressor summary: LLMs struggle with reasoning tasks when the order of premises is changed, even though the underlying task remains unchanged.
http://arxiv.org/abs/2402.08936v1
Compressor summary: The paper proposes a temporal attention mechanism for DVS cameras that reduces power consumption by only focusing on unpredictable visual events, filtering out noise and decreasing computational workload.
http://arxiv.org/abs/2402.08931v1
Compressor summary: Our method improves stereo matching and depth estimation for textured and texture-less images using a volume refinement scheme that incorporates depth, attention, and a new evaluation metric.
http://arxiv.org/abs/2402.08929v1
Compressor summary: The paper proposes a simple and practical online decision making algorithm for high dimensional data that achieves optimal regret bounds for a large class of convex functions, including linear, quadratic, and generalized linear models.
http://arxiv.org/abs/2402.08925v1
Compressor summary: The paper proposes a method to align language models with diverse human preferences using a mixture of preference distributions and a MaxMin alignment objective, achieving better performance and fairness than conventional methods.
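Schematically, and as an assumption about the form rather than the paper's exact objective, a MaxMin alignment objective optimizes the worst-off preference group's reward, typically with a KL regularizer toward a reference policy:

```latex
% Illustrative MaxMin objective over preference groups g = 1..G:
\max_{\pi}\ \min_{g\in\{1,\dots,G\}}\
\mathbb{E}_{x\sim\mathcal{D},\ y\sim\pi(\cdot\mid x)}\big[r_g(x,y)\big]
-\beta\,\mathrm{KL}\big(\pi\,\|\,\pi_{\mathrm{ref}}\big)
```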
http://arxiv.org/abs/2402.08923v1
Compressor summary: The paper proposes a new method for predicting human poses using IMU data, which improves upon previous models by optimizing IMU placement and using a transformer-based model.
http://arxiv.org/abs/2402.08922v1
Compressor summary: The paper introduces the Mirrored Influence Hypothesis, which suggests that evaluating the influence of training data on test predictions can be reformulated as an inverse problem, and proposes a new method for estimating this influence more efficiently.
http://arxiv.org/abs/2402.08919v1
Compressor summary: The paper proposes a method to measure conceptual similarity between images using captions and shows it correlates with human judgement and outperforms existing methods.
http://arxiv.org/abs/2402.08918v1
Compressor summary: The paper introduces SimMLP, a framework to learn MLPs on graphs without supervision, achieving better generalization and acceleration than existing methods.
http://arxiv.org/abs/2402.08910v1
Compressor summary: The paper proposes a learning-based method for automatic bone quality classification in spinal metastasis using CT images, which improves performance with multi-task learning and self-paced learning.
http://arxiv.org/abs/2402.08907v1
Compressor summary: The paper studies negative transfer in graph learning, finding that structural differences cause node embedding discrepancies, and proposes subgraph pooling methods to address this issue.
http://arxiv.org/abs/2402.08892v1
Compressor summary: WISS is a weakly supervised method that uses corner landmarks to segment vertebral bodies from CT images with high accuracy and reduced annotation costs.
http://arxiv.org/abs/2402.08882v1
Compressor summary: The paper presents a new encoder-decoder neural network for moving object proposals (MOP) that combines optical flow estimation and semantic segmentation, trained on the DAVIS Dataset, with TensorFlow code provided and run on an AWS EC2 instance.
http://arxiv.org/abs/2402.08876v1
Compressor summary: The paper proposes a hyperbolic scaling method for learning unsigned distance fields, which improves 3D reconstruction quality, training performance, and enables accurate computation of topological properties.
http://arxiv.org/abs/2402.08875v1
Compressor summary: The paper presents a new TikTok dataset, TikTokActions, for human action recognition and shows its effectiveness in improving computer vision models.
http://arxiv.org/abs/2402.08874v1
Compressor summary: TEAROOM is a novel framework that helps large language models understand hierarchical text structures and improve their performance in estimating task-specific properties.
http://arxiv.org/abs/2402.08871v1
Compressor summary: Topological deep learning uses topological features to enhance deep learning models and has promising applications and theoretical foundations that require further investigation.
http://arxiv.org/abs/2402.08869v1
Compressor summary: ScamSpot is a system that combats spam and fraud in Instagram comments using a browser extension, a BERT model, and a REST API, with data annotation, user feedback, and open-source availability.
http://arxiv.org/abs/2402.08859v1
Compressor summary: The paper proposes a new method to use large language models for improving item descriptions in recommendations by leveraging graph information and avoiding hallucination problems.