This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-02, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.00867v1
Compressor summary: AToM is a fast text-to-mesh framework that generates high-quality 3D models from multiple text prompts simultaneously and generalizes well to unseen inputs.
http://arxiv.org/abs/2402.00868v1
Compressor summary: This paper compares Image-DAS and Video-DAS methods for semantic segmentation, finding that Image-DAS outperforms Video-DAS and that combining the two approaches yields no further improvement.
http://arxiv.org/abs/2402.00865v1
Compressor summary: Feature-shaping methods improve out-of-distribution detection by adjusting deep learning model features, and can be optimized using ID data only.
http://arxiv.org/abs/2402.00864v1
Compressor summary: ViCA-NeRF is a method for 3D editing with text instructions that ensures multi-view consistency using depth information and latent code alignment.
http://arxiv.org/abs/2402.00863v1
Compressor summary: Geometry Transfer is a new method for 3D style transfer that uses depth maps to extract a style guide and apply geometric deformation to radiance fields, resulting in more expressive and accurate stylizations.
http://arxiv.org/abs/2402.00861v1
Compressor summary: The authors propose a lossless data compression evaluation method for large language models that tests their generalization and robustness using data from different sources split by training cutoff dates.
http://arxiv.org/abs/2402.00858v1
Compressor summary: This paper introduces a context understanding benchmark for Large Language Models (LLMs) and evaluates their performance in different scenarios, such as in-context learning with pretrained and quantized models.
http://arxiv.org/abs/2402.00857v1
Compressor summary: The paper introduces a statistical framework to control the accuracy gap between full and early-time classification by using a data-driven stopping rule.
http://arxiv.org/abs/2402.00856v1
Compressor summary: The paper proposes a new method for optimizing language models based on human preferences that avoids the drawbacks of previous methods and shows its effectiveness in experiments.
http://arxiv.org/abs/2402.00854v1
Compressor summary: SymbolicAI is a framework that combines generative AI with symbolic reasoning using logic, enabling seamless integration of models, task execution, data manipulation, and evaluation of computational graphs.
http://arxiv.org/abs/2402.00853v1
Compressor summary: LTAU is a fast and accurate uncertainty quantification method for deep learning models that uses CDFs of per-sample errors and latent space distance search.
http://arxiv.org/abs/2402.00851v1
Compressor summary: The authors propose a data augmentation technique for convolutional neural networks that uses Raman spectroscopy to measure cell densities, substrate- and product concentrations in complex biological processes, improving model performance and robustness.
http://arxiv.org/abs/2402.00849v1
Compressor summary: The paper proposes a score-based class of algorithms for causal representation learning under nonparametric latent causal models with unknown transformations, ensuring identifiability and achievability through stochastic hard or soft interventions.
http://arxiv.org/abs/2402.00847v1
Compressor summary: The authors improve a tracking-any-point model by using large amounts of unlabeled real-world data and achieve state-of-the-art results.
http://arxiv.org/abs/2402.00841v1
Compressor summary: The paper compares the performance of smaller, fine-tuned language models with larger, zero-shot ones on meeting summarization tasks and finds that FLAN-T5, a 780M parameter model, is a cost-efficient alternative for real-world deployment.
http://arxiv.org/abs/2402.00838v1
Compressor summary: OLMo is an open language model with full transparency and access to its training data, architecture, and development, aiming to enable scientific study and innovation in NLP research.
http://arxiv.org/abs/2402.00835v1
Compressor summary: ALISON is a fast and effective authorship obfuscation method that uses unique stylometric features to protect privacy while preserving text semantics.
http://arxiv.org/abs/2402.00827v1
Compressor summary: Emo-Avatar is a method to create high-quality, dynamic portrait videos with minimal customization and data requirements using deferred neural rendering and a two-stage pipeline.
http://arxiv.org/abs/2402.00823v1
Compressor summary: SLIM is a multi-critic actor-critic approach that improves latent-variable skill discovery for robotic manipulation by combining multiple reward functions and achieving better performance than existing methods in tabletop manipulation tasks.
http://arxiv.org/abs/2402.00816v1
Compressor summary: The paper extends a shielding technique called approximate model-based shielding (AMBS) to handle continuous state and action spaces, and introduces new penalties for improved stability.
http://arxiv.org/abs/2402.00809v1
Compressor summary: This paper argues that Bayesian deep learning can enhance deep learning capabilities in various settings and explores promising research directions for its future development.
http://arxiv.org/abs/2402.00807v1
Compressor summary: The authors propose a method to reduce the size of deep generative models for offline reinforcement learning by using data augmentation and knowledge distillation, achieving similar or better results than existing methods on several benchmarks.
http://arxiv.org/abs/2402.00803v1
Compressor summary: The authors present an open-source software toolkit for assessing and improving the quality of time-series data used in AI applications like Predictive Maintenance, to prevent incorrect decisions due to hardware or software failures.
http://arxiv.org/abs/2402.00798v1
Compressor summary: The paper proposes a Formal-LLM framework that integrates natural language and formal language to create controllable AI agents for complex tasks, improving planning performance and validity.
http://arxiv.org/abs/2402.00795v1
Compressor summary: The paper investigates how large language models can predict dynamical systems' behavior without fine-tuning or prompt engineering, finding that their accuracy increases with context window length.
http://arxiv.org/abs/2402.00794v1
Compressor summary: ReAGent is a model-agnostic feature attribution method for generative language models that updates token importance in a recursive manner, providing more faithful importance distributions than existing methods.
http://arxiv.org/abs/2402.00793v1
Compressor summary: The text introduces a framework for using human expertise to help algorithms make better predictions, especially on specific instances where human judgment is superior.
http://arxiv.org/abs/2402.00789v1
Compressor summary: Graph-Mamba is a new method that combines a state space model with node selection strategies to improve long-range context modeling in graph networks, achieving better performance and efficiency than existing methods.
http://arxiv.org/abs/2402.00786v1
Compressor summary: CroissantLLM is a bilingual language model that trains on equal amounts of English and French data and performs well on various French tasks.
http://arxiv.org/abs/2402.00782v1
Compressor summary: The authors propose a method to improve reinforcement learning for language models by using attention weights from the reward model to redistribute the reward, making the signal more informative and easier to optimize.
http://arxiv.org/abs/2402.00769v1
Compressor summary: The paper proposes AnimateLCM, a fast and high-quality video generation model that decouples consistency learning for images and motions, and adapts existing adapters for various functions.
http://arxiv.org/abs/2402.00763v1
Compressor summary: 360-GS is a novel technique for panoramic rendering that addresses challenges in 3D Gaussian splatting by projecting Gaussians onto the tangent plane of the unit sphere and using layout priors to guide optimization.
http://arxiv.org/abs/2402.00761v1
Compressor summary: The authors propose control theory methods to update deep neural networks online, providing stability and transfer learning guarantees for applications such as controls.
http://arxiv.org/abs/2402.00759v1
Compressor summary: The text surveys the progress and methods in tractable probabilistic generative modeling using Probabilistic Circuits (PCs), describing their trade-offs, design principles, algorithmic extensions, and challenges for deep and hybrid PCs.
http://arxiv.org/abs/2402.00752v1
Compressor summary: This paper proposes and validates an optimal projection strategy for 3D Gaussian Splatting that reduces artifacts and improves photo-realistic rendering quality.
http://arxiv.org/abs/2402.00751v1
Compressor summary: The paper proposes ERASE, an efficient algorithm for exact unlearning of task adaptation data using few-shot examples with in-context learning for large language models.
http://arxiv.org/abs/2402.00746v1
Compressor summary: Health-LLM is an innovative framework that combines large-scale feature extraction and medical knowledge trade-off scoring to improve intelligent healthcare by integrating health reports, adjusting weights based on expertise, and enhancing language models with semi-automated features.
http://arxiv.org/abs/2402.00745v1
Compressor summary: The paper proposes Logic-Explainer, a hybrid neuro-symbolic framework that improves ethical NLI explanations by integrating LLMs with an external backward-chaining solver to refine, verify, and support their reasoning.
http://arxiv.org/abs/2402.00743v1
Compressor summary: The paper explores how transformers can learn from unstructured data in linear regression tasks, identifying components that facilitate in-context learning.
http://arxiv.org/abs/2402.00742v1
Compressor summary: The paper explores how to optimize language models based on human preferences using reward models, and proposes a transformed reward model that improves performance and allows combining multiple properties.
http://arxiv.org/abs/2402.00740v1
Compressor summary: Our method uses neural rendering to decompose 4D scenes from monocular cameras into static and dynamic features, overcoming challenges like occlusion and limited 3D clues, and achieves higher-fidelity results than existing methods for single-view dynamic scene representation.
http://arxiv.org/abs/2402.00738v1
Compressor summary: The paper introduces FM3Q, a novel reinforcement learning method for two-team zero-sum Markov games (2t0sMGs) that ensures coherence between minimax and greedy behaviors via the individual-global-minimax principle, factorizes the joint minimax Q function into individual ones and solves them iteratively, and is shown to converge and outperform alternatives both theoretically and empirically.
http://arxiv.org/abs/2402.00732v1
Compressor summary: The paper reviews deep learning methods for trajectory data, focusing on eight mobility use cases and analyzing their performance along the mobility data continuum.
http://arxiv.org/abs/2402.00728v1
Compressor summary: The text discusses the problem of multiple competing models in classification tasks that can lead to unfairness and presents a novel framework using dropout techniques to measure and mitigate this issue.
http://arxiv.org/abs/2402.00724v1
Compressor summary: The study presents an automatic method for segmenting spinal nerve rootlets from T2-weighted MRI scans using a 3D convolutional neural network, achieving good performance and low variability across different MRI vendors, sites, and sessions.
http://arxiv.org/abs/2402.00723v1
Compressor summary: T5VQVAE is a novel model that combines VQVAEs with T5 to improve semantic control and generation in NLP tasks.
http://arxiv.org/abs/2402.00715v1
Compressor summary: The paper proposes an assurance framework for intent-based networking that uses AI policies from large language models to detect and fix intent drift.
http://arxiv.org/abs/2402.00712v1
Compressor summary: ChaosBench is a new physics-based benchmark to evaluate subseasonal-to-seasonal climate prediction models, which shows existing methods struggle with this challenging task.
http://arxiv.org/abs/2402.00711v1
Compressor summary: The paper proposes a method to generate counterfactuals for text classification by intervening in text representations, which overcomes the limitations of using plausible real-world events for texts.
http://arxiv.org/abs/2402.00707v1
Compressor summary: The paper proposes a new method for generating text with statistical guarantees using non-exchangeable conformal prediction and nearest neighbors.
http://arxiv.org/abs/2402.00705v1
Compressor summary: The paper introduces two datasets, one with longitudinal survey data and the other with register data, to study and compare the predictability of fertility outcomes in the Netherlands, and announces a data challenge called PreFer starting in Spring 2024.
http://arxiv.org/abs/2402.00700v1
Compressor summary: The text discusses human pose estimation in-bed monitoring applications, comparing unimodal and multimodal methods, and reviewing existing datasets and approaches to highlight limitations, challenges, and future directions.
http://arxiv.org/abs/2402.00695v1
Compressor summary: The paper proposes a new method to generate realistic face morphing attacks using face recognition models' embeddings and shows its effectiveness against various face recognition systems.
http://arxiv.org/abs/2402.00692v1
Compressor summary: The paper presents a framework for point cloud cleaning, plane detection, and semantic segmentation that enhances building modeling, combining adaptive thresholding, RANSAC, PointNet, and deep learning techniques to achieve improved accuracy and efficiency.
http://arxiv.org/abs/2402.00672v1
Compressor summary: The paper proposes a new method for person re-identification across different modalities (visible and infrared) that uses a Modality-Unified Label Transfer module and an Online Cross-memory Label Refinement module to improve cross-modality label associations and representation learning.
http://arxiv.org/abs/2402.00667v1
Compressor summary: The paper explores two phases of superalignment under the W2SG framework to enhance weak supervision and ensure consistent AI behavior with human values and intentions.
http://arxiv.org/abs/2402.00659v1
Compressor summary: The study compares machine learning classifiers and finds that Random Forest is the best at predicting freight mode choice based on shipment characteristics.
http://arxiv.org/abs/2402.00658v1
Compressor summary: This paper proposes a method to improve LLMs' reasoning by learning from ranked trajectories and synthesized process rewards, achieving better results on logical reasoning benchmarks.
http://arxiv.org/abs/2402.00654v1
Compressor summary: The study uses 2017 commodity flow data to build a high-performance freight mode choice model that improves accuracy by constructing local models, extracting geographical features, and applying ensemble learning methods.
http://arxiv.org/abs/2402.00638v1
Compressor summary: The authors use machine learning to create a predictive model for the long-term outcomes of stroke patients based on various factors.
http://arxiv.org/abs/2402.00637v1
Compressor summary: The text proposes a novel multimodal fusion model that combines fisheye cameras and ultrasonic sensors for efficient obstacle perception in autonomous driving, especially under challenging conditions like low-light or glare.
http://arxiv.org/abs/2402.00632v1
Compressor summary: This paper shows how direct speech-to-text translation systems can use prosody (tone of speech) to better translate Korean to English, outperforming traditional cascade systems.
http://arxiv.org/abs/2402.00627v1
Compressor summary: The CapHuman framework generates realistic human portraits from a single reference photo using identity preservation, 3D facial prior, and text-to-image diffusion models.
http://arxiv.org/abs/2402.00626v1
Compressor summary: Typographic attacks threaten large vision-language models (LVLMs), and the paper introduces a new benchmark and an effective self-generated attack method using GPT-4V.
http://arxiv.org/abs/2402.00620v1
Compressor summary: The text discusses challenges and methods for identifying political actors who make claims in public debates, highlighting the limitations of large language models and suggesting a hybrid approach.
http://arxiv.org/abs/2402.00608v1
Compressor summary: The paper introduces soft silhouette, a probabilistic approach to improve deep clustering by forming compact and well-separated clusters, and presents an autoencoder-based architecture for optimizing it.
http://arxiv.org/abs/2402.00607v1
Compressor summary: InfoBoost is a novel framework that synthesizes cross-domain time-series data with representation learning, enabling model training without real data and universal feature extraction while overcoming interference and sampling-window limitations, thereby improving quality and generalization.
http://arxiv.org/abs/2402.00606v1
Compressor summary: The paper proposes a model that combines PatchMatch and Transformers to transfer dynamic textures from one video to another by synthesizing the start frame, predicting patches, and merging them smoothly.
http://arxiv.org/abs/2402.00592v1
Compressor summary: The article introduces a new partial-label-learning algorithm based on Dempster-Shafer theory that produces well-calibrated uncertainty estimates and performs competitively in real-world applications.
http://arxiv.org/abs/2402.00591v1
Compressor summary: Sandra is a neuro-symbolic reasoner that combines vectorial representations with deductive reasoning using the Description and Situation ontology design pattern, achieving better performance and interpretability than baselines without increasing complexity.
http://arxiv.org/abs/2402.00575v1
Compressor summary: LFdiff is a diffusion-based framework that synthesizes light fields from single RGB images using disparity estimation, position-aware warping, and disentanglement-based noise estimation.
http://arxiv.org/abs/2402.00564v1
Compressor summary: The paper presents a fast and accurate grayscale image classification method that exploits the lightweightness of MLPs combined with graph convolutional layers, along with an FPGA accelerator that achieves low latency and competitive or leading performance on benchmark datasets.
http://arxiv.org/abs/2402.00559v1
Compressor summary: Reveal is a new dataset for evaluating automatic verifiers of complex reasoning steps in open-domain question answering tasks.
http://arxiv.org/abs/2402.00541v1
Compressor summary: The paper proposes a new data augmentation technique, Masked Conditional Diffusion Model (MCDM), to generate diverse and realistic forged faces that enhance deepfake detection models' robustness and generalizability.
http://arxiv.org/abs/2402.00534v1
Compressor summary: The paper proposes disentangling and manifold-structured keys in vision transformers, which improves their accuracy on various tasks.
http://arxiv.org/abs/2402.00531v1
Compressor summary: The paper proposes condition number, a measure of sensitivity and stability, as a metric to diagnose and mitigate the convergence and prediction issues of PINNs in solving PDEs, and introduces a preconditioning algorithm that improves the condition number and reduces errors across 18 PDE problems.
http://arxiv.org/abs/2402.00530v1
Compressor summary: Superfiltering uses a smaller model to select data for finetuning a larger model, improving instruction tuning efficiency and performance.
http://arxiv.org/abs/2402.00522v1
Compressor summary: The paper studies how different parts of the Transformer model affect its ability to handle long and complex sequences, and identifies key parameters that influence its performance.
http://arxiv.org/abs/2402.00518v1
Compressor summary: EE-Tuning is a method to improve large language models by adding tuned early-exit layers that require less resources and data, and it's released as open-source code.
http://arxiv.org/abs/2402.00491v1
Compressor summary: The text discusses how different types of explanations in interactive machine-learning systems help healthcare experts improve models by configuring data, and suggests that a hybrid fusion of both global model-centric and data-centric explanations is the most effective approach.
http://arxiv.org/abs/2402.00485v1
Compressor summary: The paper proposes CP-FairRank, a re-ranking algorithm for recommender systems that considers fairness on both user and item sides, using group segmentation, model selection, and domain to adapt to various settings.
http://arxiv.org/abs/2402.00481v1
Compressor summary: The paper proposes a novel method to address the accuracy imbalance issue in few-shot class-incremental learning by stimulating mapping ability, using dual-feature classification, and self-optimizing classifiers.
http://arxiv.org/abs/2402.00474v1
Compressor summary: SA-MDKIF is a framework that enhances general-purpose language models with medical knowledge through instruction tuning and skill adaptation, improving performance on various medical tasks.
http://arxiv.org/abs/2402.00468v1
Compressor summary: The article introduces RadDQN, a deep Q-learning based architecture that optimizes radiation exposure for autonomous UAVs using a radiation-aware reward function and exploration strategies.
http://arxiv.org/abs/2402.00453v1
Compressor summary: iDocVQA and LLaDoc improve instruction-following in document analysis tasks, but they are still far from matching human performance.
http://arxiv.org/abs/2402.00450v1
Compressor summary: CPT is a novel curriculum learning method that adapts to task difficulty and improves few-shot node classification by GNNs.
http://arxiv.org/abs/2402.00448v1
Compressor summary: The paper proposes a dual-student knowledge distillation (DSKD) architecture for unsupervised anomaly detection, which uses two inverted student networks to improve normal data consistency and anomaly representation.
http://arxiv.org/abs/2402.00447v1
Compressor summary: The paper surveys data-efficient graph learning approaches that address the challenge of limited labeled data in graph neural networks.
http://arxiv.org/abs/2402.00446v1
Compressor summary: The paper proposes a dual-step fine-tuning process to teach conversational AI systems to produce safe and prosocial content in both adversarial and casual contexts using n-pair contrastive loss and datasets like MIC and ProsocialDialog.
http://arxiv.org/abs/2402.00433v1
Compressor summary: The paper proposes a dynamic method to merge Transformer models for concurrent tasks using a weight-ensembling mixture of experts module, which adapts to each instance and reduces parameter interference.
http://arxiv.org/abs/2402.00422v1
Compressor summary: The paper proposes new types of convolutions, Pixel Difference Convolution and Binary PDC, that improve accuracy and efficiency in lightweight Deep Neural Networks for visual tasks like edge detection and object recognition.
http://arxiv.org/abs/2402.00421v1
Compressor summary: The text introduces two AI systems, PARIS and LE-PARIS, designed to help patent attorneys respond more efficiently and effectively to Office Actions, and validates their effectiveness through multiple studies.
http://arxiv.org/abs/2402.00414v1
Compressor summary: The paper explores how to use large language models to generate knowledge graphs from text prompts, using three methods and a synthetic dataset.
http://arxiv.org/abs/2402.00407v1
Compressor summary: This paper introduces InfMAE, a foundation model for infrared images, with a new dataset (Inf30), an information-aware masking strategy, and a multi-scale encoder and decoder to improve performance in downstream tasks.
http://arxiv.org/abs/2402.00402v1
Compressor summary: The paper investigates and tries to reduce gender, race, and religion biases in the Llama 2 7B Chat model using activation steering.
http://arxiv.org/abs/2402.00397v1
Compressor summary: MTPB is a novel framework for cross-city few-shot traffic forecasting that leverages similarities across diverse cities through pre-training, clustering, and meta-knowledge aggregation to overcome data scarcity, outperforming existing methods.
http://arxiv.org/abs/2402.00396v1
Compressor summary: Efficient exploration helps improve large language models using human feedback, and double Thompson sampling with epistemic neural networks performs best.
http://arxiv.org/abs/2402.00388v1
Compressor summary: The CuFun model is a novel deep temporal point process model that uses a cumulative distribution function and a monotonic neural network to capture complex behavioral patterns in event sequences, improving prediction accuracy and adaptability.
http://arxiv.org/abs/2402.00385v1
Compressor summary: The paper presents a new model for Arabic nominals that handles their complex morphology and irregular paradigms, and shows improved performance over existing tools.
http://arxiv.org/abs/2402.00371v1
Compressor summary: This paper explores how large language models can be used to improve and evade social media bot detection, showing their potential for both applications but also risks.
http://arxiv.org/abs/2402.00367v1
Compressor summary: The paper proposes two novel approaches to identify and address knowledge gaps in large language models using model collaboration, improving the accuracy of abstaining from answering questions when unsure.
http://arxiv.org/abs/2402.00357v1
Compressor summary: This paper surveys current work on evaluating, attacking, and defending the safety of Multimodal Large Language Models (MLLMs) in image and text domains, highlighting open challenges.
http://arxiv.org/abs/2402.00355v1
Compressor summary: Adaptive primal-dual methods optimize policy in safe reinforcement learning by adjusting learning rates based on Lagrangian multipliers, improving convergence and stability.
http://arxiv.org/abs/2402.00353v1
Compressor summary: The paper introduces Sketch2MedI, a model that can generate realistic medical images from free-hand sketches by encoding them into StyleGAN's latent space, outperforming other models in this task.
http://arxiv.org/abs/2402.00351v1
Compressor summary: This paper introduces a framework and algorithm for machine unlearning of image-to-image generative models, ensuring data privacy while maintaining performance.
http://arxiv.org/abs/2402.00348v1
Compressor summary: The paper proposes O-DICE, a new method for offline RL and IL that uses orthogonal-gradient updates to improve the performance of DICE-based methods by resolving gradient conflicts and imposing state-action-level constraint in a corrected way.
http://arxiv.org/abs/2402.00347v1
Compressor summary: This paper highlights the challenges of explaining machine learning models in scientific domains and proposes a method to find accurate models with consistent explanations that meet stakeholders' needs and reinforce physical laws.
http://arxiv.org/abs/2402.00345v1
Compressor summary: The study introduces IndiVec, a general media bias detection framework that uses large language models and vector databases to adapt and excel in detecting biases across diverse datasets.
http://arxiv.org/abs/2402.00341v1
Compressor summary: Our method removes shadows by estimating local lighting and restoring textures conditioned on the corrected illumination, achieving better results than previous methods.
http://arxiv.org/abs/2402.00332v1
Compressor summary: The paper compares two training algorithms for two-layer neural networks and shows that ARFF has less spectral bias and similar robustness to SGD.
http://arxiv.org/abs/2402.00326v1
Compressor summary: PirateNets is a novel deep learning framework for physics-informed neural networks that uses an adaptive residual connection to enable efficient and stable training of deeper models, achieving state-of-the-art results.
http://arxiv.org/abs/2402.00324v1
Compressor summary: The paper proposes a new multi-label learning method that can handle non-differentiable loss functions and proves its consistency, achieving state-of-the-art results without complex features.
http://arxiv.org/abs/2402.00322v1
Compressor summary: The study investigates bias in abstractive opinion summarisation models and suggests that fine-tuning with diverse topics reduces bias.
http://arxiv.org/abs/2402.00321v1
Compressor summary: SmartCooper is an adaptive framework for collaborative perception in autonomous vehicles that optimizes communication, compression, and filters detrimental data to improve road safety and efficiency.
http://arxiv.org/abs/2402.00319v1
Compressor summary: The paper presents SCO-VIST, a framework that generates coherent and engaging stories from image sequences using object relations and social interaction knowledge, outperforming existing methods in multiple metrics.
http://arxiv.org/abs/2402.00315v1
Compressor summary: The paper studies online estimation of distribution-valued functions with unbounded label sets under local differential privacy and shows a different growth rate of KL-risk compared to bounded label sets.
http://arxiv.org/abs/2402.00313v1
Compressor summary: The paper presents a new reinforcement learning method using stochastic planning that incorporates risk preference and works well for control problems with delayed feedback, including Atari games.
http://arxiv.org/abs/2402.00306v1
Compressor summary: The paper proposes an energy-efficient machine learning architecture that predicts users' next locations with high accuracy, low parameters, small size, and fast training time.
http://arxiv.org/abs/2402.00300v1
Compressor summary: The study shows that self-supervised video models can effectively learn action concepts and object representations from children's egocentric visual experience, suggesting that temporal aspects of a child's internal model of the world may be learned using generic learning algorithms.
http://arxiv.org/abs/2402.00295v1
Compressor summary: This paper evaluates various segmentation methods for image-based analysis of spoil piles in mining using remotely acquired data and finds that a morphology-based deep learning approach outperforms other techniques.
http://arxiv.org/abs/2402.00293v1
Compressor summary: FineBio is a new dataset of videos showing people performing biological experiments with detailed annotations for activity understanding and hand-object interaction recognition.
http://arxiv.org/abs/2402.00290v1
Compressor summary: MEIA is a multimodal agent that can translate natural language tasks into actions using a memory module to integrate visual-language information and large models.
http://arxiv.org/abs/2402.00281v1
Compressor summary: This paper proposes a method to train deep facial expression recognition models with spatial action units cues, making them more interpretable without sacrificing accuracy or requiring extra annotations.
http://arxiv.org/abs/2402.00271v1
Compressor summary: The text proposes a more accurate model for word frequency in natural languages, showing that the parameter $\gamma$ measures vocabulary growth resistance, and introduces a method to estimate it using a "zeroth word".
http://arxiv.org/abs/2402.00263v1
Compressor summary: \modelname{} is a novel zero-shot text detector that improves over DetectGPT by using selective strategy perturbation and multi-pair contrastive learning to capture implicit patterns and reduce noise, outperforming the SOTA by 1.20% on average across four datasets.
http://arxiv.org/abs/2402.00262v1
Compressor summary: The text discusses integrating Large Language Models into Agent-based Modeling to enhance anthropomorphism in complex systems simulations, but highlights the need for explainability and causal analysis in social sciences.
http://arxiv.org/abs/2402.00261v1
Compressor summary: The paper presents Linear Algebra techniques to model neural network layers, visualize weight spaces and kernels, and find inputs for invertible networks.
http://arxiv.org/abs/2402.00258v1
Compressor summary: The text describes a multi-group learning model for hierarchical data and presents an algorithm that produces interpretable decision trees with good generalization.
http://arxiv.org/abs/2402.00254v1
Compressor summary: VSR-DPG combines vertical symbolic regression with deep policy gradient to discover ground-truth equations involving multiple input variables more effectively than previous methods.
http://arxiv.org/abs/2402.00253v1
Compressor summary: The paper surveys the challenges, evaluation, causes, and mitigation of hallucination in Large Vision-Language Models (LVLMs).
http://arxiv.org/abs/2402.00251v1
Compressor summary: The paper presents a method for uncertainty estimation in LLMs and a decision-making agent design that allows efficient use of black-box proprietary LLMs in AI agent development.
http://arxiv.org/abs/2402.00250v1
Compressor summary: LRDif is a novel framework that uses diffusion models and transformers to recognize facial expressions from under-display camera images, overcoming their image degradation challenges.