arxiv compressed, 2024-02-12

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-12 generated by the compressor, my personal LLM-based project.


Feedback Loops With Language Models Drive In-Context Reward Hacking

http://arxiv.org/abs/2402.06627v1

Compressor summary: This text discusses how language models can manipulate their outputs to optimize objectives and create negative side effects, such as increasing social media toxicity, due to feedback loops from interactions with the external world.


Understanding the Effects of Iterative Prompting on Truthfulness

http://arxiv.org/abs/2402.06625v1

Compressor summary: The text explores how iterative prompting can improve the accuracy and truthfulness of large language models, and proposes new prompting variants to address existing challenges.


Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

http://arxiv.org/abs/2402.06619v1

Compressor summary: The paper introduces Aya, an initiative to create and share multilingual instruction-following datasets and resources to enable more diverse language models in artificial intelligence.


FaBERT: Pre-training BERT on Persian Blogs

http://arxiv.org/abs/2402.06617v1

Compressor summary: FaBERT is a pre-trained Persian BERT model that excels in various NLP tasks, thanks to its exposure to diverse and cleaned texts from the HmBlogs corpus.


The Complexity of Sequential Prediction in Dynamical Systems

http://arxiv.org/abs/2402.06614v1

Compressor summary: The paper explores how to predict the next state of an unknown dynamical system using novel measures and dimensions without assuming any parametric constraints.


Image-based Deep Learning for the time-dependent prediction of fresh concrete properties

http://arxiv.org/abs/2402.06611v1

Compressor summary: The paper presents a method to predict fresh concrete properties during mixing using CNNs and stereoscopic images, which could help reduce CO2 emissions in concrete production.


TIC: Translate-Infer-Compile for accurate 'text to plan' using LLMs and logical intermediate representations

http://arxiv.org/abs/2402.06608v1

Compressor summary: The study presents a three-step approach (translate, infer, compile) that generates plans from natural language requests using an LLM and a classical planner; a logically interpretable intermediate representation reduces LLM errors and yields high accuracy on task PDDL generation across seven domains.


RQP-SGD: Differential Private Machine Learning through Noisy SGD and Randomized Quantization

http://arxiv.org/abs/2402.06606v1

Compressor summary: RQP-SGD is a new method to train machine learning models with privacy and low memory requirements for IoT devices.
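As a hedged illustration of the two ingredients named in the title (noisy DP-style SGD plus randomized quantization), not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sgd_step(w, grad, lr=0.1, clip=1.0, noise_std=0.5):
    """One DP-style SGD step: clip the gradient's L2 norm, add Gaussian noise."""
    grad = grad / max(1.0, np.linalg.norm(grad) / clip)
    grad = grad + rng.normal(0.0, noise_std * clip, size=grad.shape)
    return w - lr * grad

def randomized_quantize(w, levels=np.array([-1.0, 0.0, 1.0])):
    """Stochastically round each weight to a neighboring quantization level,
    unbiased in expectation."""
    out = np.empty_like(w)
    for i, x in enumerate(w):
        x = np.clip(x, levels[0], levels[-1])
        hi = min(max(np.searchsorted(levels, x), 1), len(levels) - 1)
        lo = hi - 1
        p = (x - levels[lo]) / (levels[hi] - levels[lo])
        out[i] = levels[hi] if rng.random() < p else levels[lo]
    return out

w = np.array([0.2, -0.7, 0.9])
w = noisy_sgd_step(w, grad=np.array([1.0, -2.0, 0.5]))
w_q = randomized_quantize(w)
assert set(np.unique(w_q)).issubset({-1.0, 0.0, 1.0})
```

The quantized weights live on a tiny fixed grid, which is where the low memory footprint for IoT deployment comes from.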


On the Out-Of-Distribution Generalization of Multimodal Large Language Models

http://arxiv.org/abs/2402.06599v1

Compressor summary: The text investigates how well multimodal large language models generalize to different tasks and scenarios, finds that they struggle with distribution shifts, and proposes in-context learning as a possible remedy.


Understanding the Weakness of Large Language Model Agents within a Complex Android Environment

http://arxiv.org/abs/2402.06596v1

Compressor summary: AndroidArena is an environment and benchmark for evaluating large language models on a modern operating system, revealing their weaknesses in understanding, reasoning, exploration, and reflection.


Self-consistent context aware conformer transducer for speech recognition

http://arxiv.org/abs/2402.06592v1

Compressor summary: The text introduces a new neural network architecture for ASR systems that improves recognition of uncommon words using contextual information flow and shallow fusion with a context language model.


Predictive representations: building blocks of intelligence

http://arxiv.org/abs/2402.06590v1

Compressor summary: The paper explores how predictive representations, such as the successor representation, can serve as foundational elements for adaptive behavior and intelligence, drawing on reinforcement learning theory and cognitive neuroscience.
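The successor representation mentioned in the summary has a compact temporal-difference form; a minimal sketch on a toy three-state cycle (illustrative, not from the paper):

```python
import numpy as np

# TD update of the successor representation M(s, s') ~ E[sum_t gamma^t 1{s_t = s'} | s_0 = s]
n_states, gamma, alpha = 3, 0.9, 0.1
M = np.zeros((n_states, n_states))

def sr_update(M, s, s_next):
    onehot = np.eye(n_states)[s]
    td_error = onehot + gamma * M[s_next] - M[s]   # standard SR TD error
    M[s] = M[s] + alpha * td_error
    return M

# walk a deterministic cycle 0 -> 1 -> 2 -> 0 until M converges
for _ in range(2000):
    for s, s_next in [(0, 1), (1, 2), (2, 0)]:
        M = sr_update(M, s, s_next)

# the value of any reward vector r is then a linear read-out: V = M @ r
r = np.array([0.0, 0.0, 1.0])
V = M @ r
assert V[1] > V[0]   # state 1 is one step closer to the rewarded state 2
```

The linear read-out is what makes the SR attractive as a building block: the same learned M re-evaluates instantly under any new reward vector.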


G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German

http://arxiv.org/abs/2402.06584v1

Compressor summary: G-SciEdBERT is a contextualized language model for German science responses that improves automated scoring accuracy compared to G-BERT.


More than the Sum of Its Parts: Ensembling Backbone Networks for Few-Shot Segmentation

http://arxiv.org/abs/2402.06581v1

Compressor summary: The paper proposes and compares two ensembling techniques for semantic segmentation that fuse features from different backbones, improving performance in challenging conditions with limited data.


SAE: Single Architecture Ensemble Neural Networks

http://arxiv.org/abs/2402.06580v1

Compressor summary: The authors propose a Single Architecture Ensemble method that learns the optimal number and depth of exits per input in a single neural network, improving accuracy and calibration while reducing computational resources.


On the Universality of Coupling-based Normalizing Flows

http://arxiv.org/abs/2402.06578v1

Compressor summary: The paper introduces a new theory for understanding how coupling-based normalizing flows like RealNVP work and helps choose the best coupling functions for different applications.
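For readers unfamiliar with coupling-based flows such as RealNVP, a minimal affine coupling layer (an illustrative toy with a made-up conditioner) shows why these maps are invertible by construction:

```python
import numpy as np

# Split x into two halves; transform one half with a scale/shift computed
# from the other half only, so inversion never needs to undo a neural net.
def coupling_forward(x, w, b):
    x1, x2 = x[:2], x[2:]
    s = np.tanh(w @ x1 + b)          # log-scale, conditioned on x1 only
    t = w @ x1                       # shift (toy conditioner)
    return np.concatenate([x1, x2 * np.exp(s) + t])

def coupling_inverse(y, w, b):
    y1, y2 = y[:2], y[2:]
    s = np.tanh(w @ y1 + b)
    t = w @ y1
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

rng = np.random.default_rng(1)
w, b = rng.normal(size=(2, 2)), rng.normal(size=2)
x = rng.normal(size=4)
y = coupling_forward(x, w, b)
assert np.allclose(coupling_inverse(y, w, b), x)
```

The choice of coupling function (affine here, splines or others elsewhere) is exactly the design knob the paper's universality theory is meant to inform.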


Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

http://arxiv.org/abs/2402.06570v1

Compressor summary: HyperDistill is a method that uses a morphology-conditioned hypernetwork and policy distillation to learn efficient robot policies that can generalize to different robot shapes.


What is Hiding in Medicine's Dark Matter? Learning with Missing Data in Medical Practices

http://arxiv.org/abs/2402.06563v1

Compressor summary: The study uses machine learning to analyze and fill missing data in electronic patient records, showing its link to health care practices and improving clinical decisions.


Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning

http://arxiv.org/abs/2402.06560v1

Compressor summary: Video Annotator (VA) is a framework that allows domain experts to directly label and manage video data for machine learning model development, improving efficiency, usability, and effectiveness of video classifiers.


Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following

http://arxiv.org/abs/2402.06559v1

Compressor summary: DiffusionES is a method that optimizes non-differentiable objectives using gradient-free optimization and trajectory denoising, achieving state-of-the-art performance in autonomous driving and complex behavior generation.


The Quantified Boolean Bayesian Network: Theory and Experiments with a Logical Graphical Model

http://arxiv.org/abs/2402.06557v1

Compressor summary: The paper proposes QBBN, a model that combines logic and probability to reason with human language, using LBP for efficient inference.


Deceptive Path Planning via Reinforcement Learning with Graph Neural Networks

http://arxiv.org/abs/2402.06552v1

Compressor summary: The paper proposes a reinforcement learning scheme for path planning that hides the goal from observers, overcoming limitations of existing methods with a local perception model, graph neural networks, and adaptive deception bonuses.


Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA

http://arxiv.org/abs/2402.06549v1

Compressor summary: The study examines how large language models like GPT-4 can improve climate activism tasks, such as detecting hate speech and stance, by enhancing zero-shot settings with retrieval augmentation and re-ranking.


Calibrating Long-form Generations from Large Language Models

http://arxiv.org/abs/2402.06544v1

Compressor summary: The paper introduces a unified calibration framework for Large Language Models to improve their reliability and correctness in long-form generation tasks, and proposes two self-consistency methods along with various techniques to enhance calibration performance.


Hybridnet for depth estimation and semantic segmentation

http://arxiv.org/abs/2402.06539v1

Compressor summary: The paper presents a hybrid convolutional network that jointly performs semantic segmentation and depth estimation from a single image, two tasks usually handled separately but needed together in applications like robotics and autonomous navigation; by separating the features relevant to each task, it achieves results comparable with state-of-the-art methods.


Feature Density Estimation for Out-of-Distribution Detection via Normalizing Flows

http://arxiv.org/abs/2402.06537v1

Compressor summary: The paper proposes an unsupervised method to detect out-of-distribution data using feature density estimation via normalizing flows, which can be applied to any pretrained model and achieves strong results on image classification tasks.
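The core recipe (score inputs by the log-density of their features under a model fit to in-distribution data) can be sketched with a diagonal Gaussian standing in for the learned normalizing flow; this is an illustration, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
feats_in = rng.normal(0.0, 1.0, size=(5000, 8))   # stand-in for pretrained-model features

# fit a diagonal Gaussian to in-distribution features (a flow would go here)
mu, sigma = feats_in.mean(0), feats_in.std(0)

def log_density(f):
    """Log-density of a feature vector under the fitted diagonal Gaussian."""
    z = (f - mu) / sigma
    return float(-0.5 * np.sum(z**2 + np.log(2 * np.pi * sigma**2)))

in_sample = rng.normal(0.0, 1.0, size=8)
ood_sample = rng.normal(6.0, 1.0, size=8)          # far from the training features
assert log_density(in_sample) > log_density(ood_sample)
```

A threshold on this score gives the OOD decision; the appeal of the approach is that only the density model is trained, so any frozen pretrained backbone can supply the features.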


Generative Adversarial Bayesian Optimization for Surrogate Objectives

http://arxiv.org/abs/2402.06532v1

Compressor summary: GABO is a method for optimizing surrogate objective functions using a Lipschitz-bounded source critic model to guide the optimization in reliable regions, and it performs better than existing baselines on various offline optimization tasks.


Transferring facade labels between point clouds with semantic octrees while considering change detection

http://arxiv.org/abs/2402.06531v1

Compressor summary: The paper proposes a method to transfer semantic labels from a labeled point cloud to an unlabeled one using an octree structure, which also detects changes between the clouds.


Refining Myocardial Infarction Detection: A Novel Multi-Modal Composite Kernel Strategy in One-Class Classification

http://arxiv.org/abs/2402.06530v1

Compressor summary: The study presents a novel method for early MI detection using multi-view echocardiography and a one-class classification algorithm that improves accuracy and offers more precise and efficient diagnostic tools.


Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty

http://arxiv.org/abs/2402.06529v1

Compressor summary: This paper proposes introspective planning to help large language models create uncertainty-aware plans for robotic tasks, improving success rates and safety without fine-tuning.


Reconstructing facade details using MLS point clouds and Bag-of-Words approach

http://arxiv.org/abs/2402.06521v1

Compressor summary: The paper presents a new method for reconstructing 3D façade details using point clouds and a pre-defined 3D model library, improving on conventional approaches without making rectangularity assumptions.


Multimodal Clinical Trial Outcome Prediction with Large Language Models

http://arxiv.org/abs/2402.06512v1

Compressor summary: LIFTED is a multimodal mixture-of-experts approach for clinical trial outcome prediction that transforms different modalities into natural language descriptions and integrates them with a sparse Mixture-of-Experts framework, improving performance over the best baseline.


Asking the Right Question at the Right Time: Human and Model Uncertainty Guidance to Ask Clarification Questions

http://arxiv.org/abs/2402.06509v1

Compressor summary: The paper investigates how model uncertainty relates to human clarification-seeking behavior and proposes a method to generate clarification questions based on model uncertainty estimation to improve dialogue system performance.


Classifying point clouds at the facade-level using geometric features and deep learning networks

http://arxiv.org/abs/2402.06506v1

Compressor summary: The paper proposes a method to classify 3D building models with facade details using deep neural networks that fuse geometric features with point cloud data, improving performance and advancing semantic segmentation.


ACTER: Diverse and Actionable Counterfactual Sequences for Explaining and Diagnosing RL Policies

http://arxiv.org/abs/2402.06503v1

Compressor summary: ACTER is an algorithm that generates actionable and diverse counterfactual sequences for explaining and preventing failure in reinforcement learning.


Scalable Interactive Machine Learning for Future Command and Control

http://arxiv.org/abs/2402.06501v1

Compressor summary: The paper proposes using interactive machine learning to enhance human-AI collaboration in complex Command and Control operations, addressing three research focus areas: planning, team optimization, and scalability.


On the Fly Detection of Root Causes from Observed Data with Application to IT Systems

http://arxiv.org/abs/2402.06500v1

Compressor summary: The paper proposes a model and algorithm for detecting root causes of anomalies in threshold-based IT systems using causal discovery and subgraph traversal, with an agent-based extension for relaxing causality assumptions.


BarlowTwins-CXR : Enhancing Chest X-Ray abnormality localization in heterogeneous data with cross-domain self-supervised learning

http://arxiv.org/abs/2402.06499v1

Compressor summary: The study proposes BarlowTwins-CXR, a self-supervised learning method that improves chest X-ray image analysis and abnormality localization by addressing domain inconsistency issues in cross-domain transfer learning.


Iris-SAM: Iris Segmentation Using a Foundational Model

http://arxiv.org/abs/2402.06497v1

Compressor summary: The authors develop a pixel-level iris segmentation model using Segment Anything Model (SAM) and fine-tune it with Focal Loss, achieving high accuracy on three datasets.


Deep Learning-Based Auto-Segmentation of Planning Target Volume for Total Marrow and Lymph Node Irradiation

http://arxiv.org/abs/2402.06494v1

Compressor summary: The paper proposes using deep learning models (2D and 3D U-Net) to automate the segmentation of the target volume in complex radiotherapy treatments, improving accuracy and efficiency compared to manual contouring.


Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings

http://arxiv.org/abs/2402.06492v1

Compressor summary: The SQ-Transformer model improves compositionality in language understanding tasks by clustering word embeddings and encouraging systematic attention patterns, especially when the training data is low-complexity.


Le Nozze di Giustizia. Interactions between Artificial Intelligence, Law, Logic, Language and Computation with some case studies in Traffic Regulations and Health Care

http://arxiv.org/abs/2402.06487v1

Compressor summary: The paper explains the basics of mathematical logic for legal professionals working with rule-based AI, discussing its limitations and interactions in logical, computational, and mathematical aspects using traffic regulations as examples.


Large Language Models for Captioning and Retrieving Remote Sensing Images

http://arxiv.org/abs/2402.06475v1

Compressor summary: RS-CapRet is a Vision and Language method for remote sensing tasks that uses a large language model and image encoders pre-trained with contrastive learning to generate captions and retrieve images based on textual queries.


On Differentially Private Subspace Estimation Without Distributional Assumptions

http://arxiv.org/abs/2402.06465v1

Compressor summary: The paper investigates how to privately estimate low-dimensional structures in datasets using different types of singular value gaps and provides new upper and lower bounds for the required number of points.


Sequential Flow Matching for Generative Modeling

http://arxiv.org/abs/2402.06461v1

Compressor summary: SeqRF is a method that improves the speed and quality of sampling in continuous-time generative models by learning a linear path to straighten the probability flow and reduce global truncation error.


V-STaR: Training Verifiers for Self-Taught Reasoners

http://arxiv.org/abs/2402.06457v1

Compressor summary: V-STaR improves large language models by using both correct and incorrect solutions during self-improvement to train a verifier that selects better solutions at inference time, leading to significant test accuracy improvements on various tasks.
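The verifier-based selection step can be sketched as follows (an illustrative toy; the correctness check and scorer are stand-ins, not the paper's models):

```python
import numpy as np

# Self-generated attempts are split by a correctness check into positives and
# negatives; BOTH become verifier training pairs, unlike self-improvement
# loops that discard incorrect attempts.
def build_verifier_data(attempts, is_correct):
    return [(a, 1 if ok else 0) for a, ok in zip(attempts, is_correct)]

# At inference, the trained verifier ranks candidate solutions (best-of-n).
def rank_candidates(candidates, verifier):
    scores = [verifier(c) for c in candidates]
    return candidates[int(np.argmax(scores))]

attempts = ["2+2=4", "2+2=5", "2+2=22"]
pairs = build_verifier_data(attempts, [True, False, False])
assert pairs == [("2+2=4", 1), ("2+2=5", 0), ("2+2=22", 0)]

toy_verifier = lambda c: 1.0 if c.endswith("=4") else 0.0   # stand-in scorer
assert rank_candidates(attempts, toy_verifier) == "2+2=4"
```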


An Algorithmic Framework for Constructing Multiple Decision Trees by Evaluating Their Combination Performance Throughout the Construction Process

http://arxiv.org/abs/2402.06452v1

Compressor summary: The study introduces a new algorithmic framework that simultaneously constructs and evaluates combinations of decision trees for prediction, unlike existing methods like bagging and boosting that do not directly evaluate the combination performance.


ControlUDA: Controllable Diffusion-assisted Unsupervised Domain Adaptation for Cross-Weather Semantic Segmentation

http://arxiv.org/abs/2402.06446v1

Compressor summary: ControlUDA is a diffusion-assisted framework that uses text-to-image models and target prior to generate realistic images for unsupervised domain adaptation in semantic segmentation under adverse weather conditions.


The Deep Equilibrium Algorithmic Reasoner

http://arxiv.org/abs/2402.06445v1

Compressor summary: Graph neural networks can learn classical algorithms without iterating through them, by training on their equilibria instead.
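The deep-equilibrium idea (solve for the fixed point of a layer instead of unrolling it step by step) can be sketched with simple fixed-point iteration on a contractive map; this is an illustration, not the paper's reasoner:

```python
import numpy as np

# A deep-equilibrium model solves z* = f(z*, x) directly; here f is a toy
# contractive layer (||W|| < 1), so plain iteration converges.
def f(z, x, W):
    return np.tanh(W @ z + x)

def solve_equilibrium(x, W, tol=1e-8, max_iter=500):
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_new = f(z, x, W)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

W = 0.3 * np.eye(2)
x = np.array([0.5, -0.2])
z_star = solve_equilibrium(x, W)
assert np.allclose(z_star, f(z_star, x, W), atol=1e-6)   # z* is a fixed point
```

Classical iterative algorithms also terminate at a fixed point of their update rule, which is why training on equilibria can substitute for imitating every intermediate step.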


Explaining Veracity Predictions with Evidence Summarization: A Multi-Task Model Approach

http://arxiv.org/abs/2402.06443v1

Compressor summary: The paper proposes a multi-task explainable neural model for detecting misinformation that generates explanations as text summaries.


Incorporating Taylor Series and Recursive Structure in Neural Networks for Time Series Prediction

http://arxiv.org/abs/2402.06441v1

Compressor summary: The paper introduces a new neural network architecture that combines ResNet and Taylor series, improving time series analysis accuracy and opening up new possibilities for research and applications.


Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation

http://arxiv.org/abs/2402.06436v1

Compressor summary: The study compares two types of image-to-image translation networks for estimating 2D-3D correspondences and finds that diffusion models perform better than GANs for 6D object pose estimation.


Where is the Truth? The Risk of Getting Confounded in a Continual World

http://arxiv.org/abs/2402.06434v1

Compressor summary: Confounded datasets cause problems in continual learning that are not addressed by standard methods; training jointly on all tasks can help overcome these challenges.


CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention

http://arxiv.org/abs/2402.06423v1

Compressor summary: CurveFormer++ is a single-stage Transformer method for 3D lane detection from perspective images without image feature transformation, using curve propagation and attention mechanisms.


Findings of the First Workshop on Simulating Conversational Intelligence in Chat

http://arxiv.org/abs/2402.06420v1

Compressor summary: This workshop gathers experts on open-domain dialogue research, focusing on simulating human intelligence in conversations, and includes a research track and shared task with live human evaluation.


Trust the Process: Zero-Knowledge Machine Learning to Enhance Trust in Generative AI Interactions

http://arxiv.org/abs/2402.06414v1

Compressor summary: The paper proposes using cryptographic techniques like Zero-Knowledge Proofs to ensure fairness, transparency, and privacy in generative AI models for domains like medicine and law.


Hierarchical Transformers are Efficient Meta-Reinforcement Learners

http://arxiv.org/abs/2402.06402v1

Compressor summary: HTrMRL is a meta-reinforcement learning method that leverages past experiences to improve performance, efficiency, and generalization in new tasks.


ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting

http://arxiv.org/abs/2402.06390v1

Compressor summary: Emerging deep-learning techniques like NeRFs and GS are revolutionizing computer graphics, while deepfake methods raise ethical concerns but offer potential for creating realistic avatars when combined with other technologies.


Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example

http://arxiv.org/abs/2402.06389v1

Compressor summary: The paper presents a prompting-free generative method to create personalized painterly content based on users' aesthetic preferences using semantic injection and a genetic algorithm.


Maia: A Real-time Non-Verbal Chat for Human-AI Interaction

http://arxiv.org/abs/2402.06385v1

Compressor summary: The authors propose using facial expressions and head movements for Human-AI interaction instead of text chats, and present three approaches to model non-verbal cues in real-time.


Optimal estimation of Gaussian (poly)trees

http://arxiv.org/abs/2402.06380v1

Compressor summary: The authors develop algorithms for learning Gaussian trees and polytrees from data, with optimal and efficient methods for distribution and structure learning, and provide theoretical and empirical results.


Learning using privileged information for segmenting tumors on digital mammograms

http://arxiv.org/abs/2402.06379v1

Compressor summary: The text proposes a technique called Learning Using Privileged Information, which improves tumor segmentation on digital mammograms by using an auxiliary model that accesses more data than the main model, leading to better results and a higher F1 score.


FD-Vision Mamba for Endoscopic Exposure Correction

http://arxiv.org/abs/2402.06378v1

Compressor summary: The FDVM-Net is a frequency-domain based network that corrects exposure abnormalities in endoscopic images by reconstructing their frequency domain, using a C-SSM block to capture local and long-range dependencies, and achieves state-of-the-art results in speed and accuracy.


High-Precision Geosteering via Reinforcement Learning and Particle Filters

http://arxiv.org/abs/2402.06377v1

Compressor summary: Reinforcement learning and particle filtering are integrated to optimize geosteering decisions by processing real-time well-log data and estimating the well's location relative to stratigraphic layers.


TEE4EHR: Transformer Event Encoder for Better Representation Learning in Electronic Health Records

http://arxiv.org/abs/2402.06367v1

Compressor summary: The paper proposes a transformer event encoder with point process loss to handle irregular sampling patterns in electronic health records, and shows its effectiveness in various benchmark and real-world datasets.


Modelling Human Values for AI Reasoning

http://arxiv.org/abs/2402.06359v1

Compressor summary: The paper proposes a formal model of human values, grounded in social psychology research, for computational representation of and reasoning over values in AI systems; the model helps address the value alignment problem and supports individuals and communities in making informed decisions in real-world use cases.


Towards actionability for open medical imaging datasets: lessons from community-contributed platforms for data management and stewardship

http://arxiv.org/abs/2402.06353v1

Compressor summary: Medical imaging datasets are crucial for AI in healthcare, but their quality on community-contributed platforms (CCPs) is often poor; the paper investigates 20 popular datasets on CCPs, finds issues such as vague licenses, missing metadata, and duplicates, and introduces a commons-based stewardship model to improve their documentation, sharing, and maintenance.


Fairness of Exposure in Online Restless Multi-armed Bandits

http://arxiv.org/abs/2402.06348v1

Compressor summary: The paper presents a new framework for restless multi-armed bandits that ensures fair exposure to each arm by proportional allocation of pulls based on their stationary reward distribution, achieving sublinear regret in single and multiple pull scenarios.


Promoting Target Data in Context-aware Neural Machine Translation

http://arxiv.org/abs/2402.06342v1

Compressor summary: The paper explores how to improve document-level neural machine translation by giving more weight to target language context in the source sentences.


RareBench: Can LLMs Serve as Rare Diseases Specialists?

http://arxiv.org/abs/2402.06341v1

Compressor summary: RareBench is a benchmark and dataset to evaluate large language models' ability to diagnose rare diseases, showing promising results compared to specialist physicians.


InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

http://arxiv.org/abs/2402.06332v1

Compressor summary: The paper introduces InternLM-Math, an open-source math reasoning model that can reason, verify, prove, and augment math problems in various settings, achieving state-of-the-art performance on several benchmarks.


Taking Class Imbalance Into Account in Open Set Recognition Evaluation

http://arxiv.org/abs/2402.06331v1

Compressor summary: This paper evaluates Open Set Recognition methods, which handle uncertainty by distinguishing between known and unknown classes, and provides guidelines for their assessment.


A Network for structural dense displacement based on 3D deformable mesh model and optical flow

http://arxiv.org/abs/2402.06329v1

Compressor summary: The study presents a network that uses FlowNet2 and POFRN-Net to recognize the displacement of an RC frame structure from monocular camera videos, and demonstrates its application on four floors across three videos.


Continual Learning on Graphs: A Survey

http://arxiv.org/abs/2402.06330v1

Compressor summary: This article surveys recent advances in continual graph learning, focusing on overcoming catastrophic forgetting and improving performance continuously.


Prompt Learning on Temporal Interaction Graphs

http://arxiv.org/abs/2402.06326v1

Compressor summary: TIGPrompt is a versatile framework that bridges temporal and semantic gaps in TIG models by using temporally-aware prompts for various tasks.


How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

http://arxiv.org/abs/2402.06323v1

Compressor summary: The paper proves that random over-parameterized neural networks with a flat prior and an underlying narrow teacher network can generalize well by biasing towards simpler functions, reducing sample complexity.


TimEHR: Image-based Time Series Generation for Electronic Health Records

http://arxiv.org/abs/2402.06318v1

Compressor summary: TimEHR is a novel GAN model that generates time series data from EHRs by treating them as images and using two conditional GANs to handle missingness patterns and values, achieving better performance than existing methods.


Multisource Semisupervised Adversarial Domain Generalization Network for Cross-Scene Sea-Land Clutter Classification

http://arxiv.org/abs/2402.06315v1

Compressor summary: The paper presents MSADGN, a deep learning method for cross-scene sea-land clutter classification whose three modules extract domain-invariant, domain-specific, and domain-related features from labeled and unlabeled source domains and generalize them to an arbitrary target domain; it outperforms 10 state-of-the-art methods across twelve domain generalization scenarios.


Multimodal Interpretable Data-Driven Models for Early Prediction of Antimicrobial Multidrug Resistance Using Multivariate Time-Series

http://arxiv.org/abs/2402.06295v1

Compressor summary: The study presents an interpretable multimodal deep neural network approach to predict antimicrobial multidrug resistance in ICU patients using electronic health records.


Probabilistic Forecasting of Irregular Time Series via Conditional Flows

http://arxiv.org/abs/2402.06293v1

Compressor summary: ProFITi is a new model for forecasting irregular time series with missing values, using conditional normalizing flows to learn joint distributions without fixed-shape assumptions.


MLS2LoD3: Refining low LoDs building models with MLS point clouds to reconstruct semantic LoD3 building models

http://arxiv.org/abs/2402.06288v1

Compressor summary: The paper proposes a new refinement strategy for creating Level of Detail 3 (LoD3) building models using lower-level models and laser point clouds, overcoming challenges in standardization and accuracy, and enabling more applications.


AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems

http://arxiv.org/abs/2402.06287v1

Compressor summary: The text introduces a new taxonomy for studying human-machine interaction in high-stake tasks using machine learning systems.


Safe Active Learning for Time-Series Modeling with Gaussian Processes

http://arxiv.org/abs/2402.06276v1

Compressor summary: The study presents an active learning method for time-series models that considers safety constraints and dynamically explores the input space using Gaussian processes.


YAMLE: Yet Another Machine Learning Environment

http://arxiv.org/abs/2402.06268v1

Compressor summary: YAMLE is an open-source framework that helps machine learning researchers and practitioners to easily prototype, experiment, and reproduce their models using PyTorch libraries.


Value function interference and greedy action selection in value-based multi-objective reinforcement learning

http://arxiv.org/abs/2402.06266v1

Compressor summary: MORL algorithms face challenges with scalarization methods and value function interference, leading to suboptimal policies in certain environments.


LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education

http://arxiv.org/abs/2402.06264v1

Compressor summary: This study develops LLaVA-Docent, a multi-modal large language model that enhances art appreciation education using technology.


On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference

http://arxiv.org/abs/2402.06262v1

Compressor summary: The paper proposes RoCo, a robust cache eviction policy based on temporal attention scores and robustness measures that addresses the limitations of existing policies in importance-score calculation and eviction-scope construction, outperforming them in both the prefilling and auto-regressive decoding stages of memory-constrained Large Language Model inference; the authors also release EasyKV, a user-friendly software package for key-value constrained generative inference.
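The basic eviction idea (keep the cached key/value entries with the highest accumulated attention, evict the rest) can be sketched as below; RoCo's actual policy additionally uses temporal averaging and robustness measures, so this is only an illustration:

```python
import numpy as np

def evict_kv(keys, values, attn_scores, budget):
    """Keep the `budget` cache entries with the highest accumulated attention."""
    keep = np.argsort(attn_scores)[-budget:]
    keep = np.sort(keep)                       # preserve positional order
    return keys[keep], values[keep], keep

keys = np.arange(6 * 4, dtype=float).reshape(6, 4)   # 6 cached tokens, head dim 4
values = keys.copy()
scores = np.array([0.05, 0.40, 0.10, 0.30, 0.02, 0.08])
k, v, kept = evict_kv(keys, values, scores, budget=3)
assert list(kept) == [1, 2, 3]
assert k.shape == (3, 4)
```

The cache shrinks from 6 to 3 entries while the tokens that attention actually uses survive, which is the memory/quality trade-off such policies navigate.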


Studious Bob Fight Back Against Jailbreaking via Prompt Adversarial Tuning

http://arxiv.org/abs/2402.06255v1

Compressor summary: The paper proposes Prompt Adversarial Tuning (PAT), which trains a defense control mechanism and embeds it as a prefix to user prompts, protecting Large Language Models from producing harmful information by reducing the success rate of advanced attacks while maintaining high answer accuracy.


Insomnia Identification via Electroencephalography

http://arxiv.org/abs/2402.06251v1

Compressor summary: The study uses deep learning to automatically identify insomnia patients by extracting optimal features from EEG signals and achieves high accuracy without sleep stage annotation.


Anomaly Unveiled: Securing Image Classification against Adversarial Patch Attacks

http://arxiv.org/abs/2402.06249v1

Compressor summary: This paper proposes a clustering-based defense mechanism to counter adversarial patch attacks in image classification tasks by isolating and neutralizing anomalous image segments, achieving better accuracy than existing methods.
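A minimal sketch of the isolate-and-neutralize idea: split the image into cells, flag statistically anomalous cells, and zero them out. This uses a grid plus z-score outlier test in place of the paper's actual clustering step, and all names and thresholds here are illustrative:

```python
import statistics

def mask_anomalous_cells(image, cell=4, z_thresh=2.5):
    """Toy patch defense for a grayscale image (list of rows):
    flag cell x cell blocks whose mean intensity is a statistical
    outlier and neutralize them by zeroing their pixels."""
    h, w = len(image), len(image[0])
    means, coords = [], []
    for r in range(0, h, cell):
        for c in range(0, w, cell):
            block = [image[i][j]
                     for i in range(r, min(r + cell, h))
                     for j in range(c, min(c + cell, w))]
            means.append(sum(block) / len(block))
            coords.append((r, c))
    mu = statistics.mean(means)
    sd = statistics.pstdev(means) or 1.0  # avoid divide-by-zero
    cleaned = [row[:] for row in image]
    for m, (r, c) in zip(means, coords):
        if abs(m - mu) / sd > z_thresh:
            for i in range(r, min(r + cell, h)):
                for j in range(c, min(c + cell, w)):
                    cleaned[i][j] = 0
    return cleaned
```

The intuition being illustrated: adversarial patches are locally conspicuous, so isolating and neutralizing anomalous segments can restore the classifier's input to something close to the clean image.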


Quantifying and Enhancing Multi-modal Robustness with Modality Preference

http://arxiv.org/abs/2402.06244v1

Compressor summary: The paper proposes a new training procedure called CRMT that improves the robustness of multi-modal models by regulating essential components and reducing modality preference influence.


Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models

http://arxiv.org/abs/2402.06223v1

Compressor summary: The paper introduces a causal model for multimodal data that reveals how contrastive representation learning identifies coupled variables and suggests linear independent component analysis as an effective tool for learning disentangled representations.


ResumeFlow: An LLM-facilitated Pipeline for Personalized Resume Generation and Refinement

http://arxiv.org/abs/2402.06221v1

Compressor summary: ResumeFlow is a tool that uses large language models like GPT-4 to quickly tailor resumes to specific job postings, making the process easier and more accurate for applicants.


A Unified Causal View of Instruction Tuning

http://arxiv.org/abs/2402.06220v1

Compressor summary: The paper proposes a meta-SCM method to identify and use causal factors from different NLP tasks, improving zero-shot capabilities and reducing spurious correlations.


Multi-source-free Domain Adaptation via Uncertainty-aware Adaptive Distillation

http://arxiv.org/abs/2402.06213v1

Compressor summary: UAD is a novel uncertainty-aware adaptive distillation method for multi-source-free domain adaptation (MSFDA) that adapts models from multiple institutions without accessing their data and achieves better performance on image-based diagnosis tasks.


Halo Reduction in Display Systems through Smoothed Local Histogram Equalization and Human Visual System Modeling

http://arxiv.org/abs/2402.06212v1

Compressor summary: The authors propose a method to reduce halos in image enhancement algorithms by considering how our eyes perceive light and dark variations.


The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate

http://arxiv.org/abs/2402.06204v1

Compressor summary: The paper investigates if large language models are good at both generating and evaluating answers, finding that they perform worse in evaluation tasks and sometimes give unfaithful evaluations.


GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data

http://arxiv.org/abs/2402.06198v1

Compressor summary: GS-CLIP uses 3D Gaussian Splatting and a pre-trained vision-language model to enhance 3D representation for object identification and classification.


Large Language Models: A Survey

http://arxiv.org/abs/2402.06196v1

Compressor summary: The paper surveys Large Language Models (LLMs), powerful natural language models trained on massive text data, reviewing three popular LLM families (GPT, LLaMA, PaLM) along with related techniques, datasets, metrics, and benchmarks, and discussing their characteristics, contributions, limitations, and open challenges for future research.


The Berkeley Single Cell Computational Microscopy (BSCCM) Dataset

http://arxiv.org/abs/2402.06191v1

Compressor summary: The BSCCM dataset is a large collection of images of white blood cells with surface protein measurements, aiming to aid in developing and testing computational imaging algorithms for biomedical purposes.


Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain

http://arxiv.org/abs/2402.06190v1

Compressor summary: LoGoNet is a new neural network architecture that uses self-supervised learning to improve medical image segmentation with less data and cost.


A self-supervised framework for learning whole slide representations

http://arxiv.org/abs/2402.06188v1

Compressor summary: S3L is a self-supervised framework that learns high-quality features from whole slide images for biomedical diagnostic tasks.


Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

http://arxiv.org/abs/2402.06187v1

Compressor summary: Premier-TACO is a multitask feature representation learning method that improves few-shot policy learning efficiency in sequential decision-making tasks by pretraining with multitask offline datasets and using a novel negative example sampling strategy.


Development and validation of an artificial intelligence model to accurately predict spinopelvic parameters

http://arxiv.org/abs/2402.06185v1

Compressor summary: The study presents SpinePose, an artificial intelligence tool that accurately and reliably predicts spinopelvic parameters from X-ray images without manual entry.


The boundary of neural network trainability is fractal

http://arxiv.org/abs/2402.06184v1

Compressor summary: The paper shows that the boundary between stable and divergent neural network training is fractal when the training outcome is viewed as a function of hyperparameter values.


Pushing Boundaries: Mixup's Influence on Neural Collapse

http://arxiv.org/abs/2402.06171v1

Compressor summary: Mixup augments training data by convexly combining pairs of instances and labels, and the distinctive geometric configuration it induces in deep networks enhances calibration by aligning activations along decision boundaries.
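The mixup operation itself is a one-line convex combination. The sketch below assumes dense feature lists and one-hot labels, and `alpha=0.2` is just a common choice of Beta-distribution parameter, not something prescribed by this paper:

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.2):
    """Mixup: draw lambda ~ Beta(alpha, alpha) and return the convex
    combination of two (features, one-hot label) training pairs."""
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```

Because the mixed label is itself a convex combination of distributions, it remains a valid probability vector, which is what lets mixed examples sit along decision boundaries during training.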


Learning Contrastive Feature Representations for Facial Action Unit Detection

http://arxiv.org/abs/2402.06165v1

Compressor summary: The study proposes a contrastive learning framework that uses supervised and self-supervised signals to detect facial action units (AUs) without relying on pixel-level information, improving performance over existing methods.


Improved Evidential Deep Learning via a Mixture of Dirichlet Distributions

http://arxiv.org/abs/2402.06160v1

Compressor summary: The paper proposes a new approach to estimate predictive uncertainty using evidential deep learning and shows that existing methods have spurious uncertainty issues, which are resolved by modeling a consistent target distribution with a mixture of Dirichlet distributions.


Model Editing with Canonical Examples

http://arxiv.org/abs/2402.06155v1

Compressor summary: The authors propose and evaluate model editing with canonical examples, a method that improves language models' performance on various tasks by finetuning selected sense vectors using a small number of examples for desired behaviors.


Target Recognition Algorithm for Monitoring Images in Electric Power Construction Process

http://arxiv.org/abs/2402.06152v1

Compressor summary: The text describes a new algorithm that uses infrared imaging and color processing to accurately identify targets in electric power construction monitoring videos with low false recognition rates.


Domain Generalization with Small Data

http://arxiv.org/abs/2402.06150v1

Compressor summary: The paper proposes a probabilistic framework to learn domain-invariant representations from insufficient data by measuring discrepancy between mixture distributions and aligning probabilistic embeddings.


HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

http://arxiv.org/abs/2402.06149v1

Compressor summary: HeadStudio is a framework that uses 3D Gaussian splatting to create realistic and animated avatars from text prompts, achieving high-quality results and interactive control.


On the Privacy of Selection Mechanisms with Gaussian Noise

http://arxiv.org/abs/2402.06137v1

Compressor summary: This paper improves the privacy analysis of two classical differentially private mechanisms using Gaussian noise under certain assumptions, and proposes a new adaptive mechanism for high privacy scenarios.
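For reference, the classical Gaussian mechanism that such privacy analyses start from can be sketched as follows. This is the standard noise calibration (valid for epsilon < 1), not the paper's improved bound:

```python
import math
import random

def gaussian_sigma(sensitivity, epsilon, delta):
    """Standard noise scale for the classical (epsilon, delta)-DP
    Gaussian mechanism: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def gaussian_mechanism(value, sensitivity, epsilon, delta):
    """Release the query answer perturbed with N(0, sigma^2) noise."""
    return value + random.gauss(0.0, gaussian_sigma(sensitivity, epsilon, delta))
```

Tighter analyses of the kind the paper pursues matter in practice because a smaller admissible sigma for the same (epsilon, delta) directly improves the utility of the released values.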


SIR: Multi-view Inverse Rendering with Decomposable Shadow for Indoor Scenes

http://arxiv.org/abs/2402.06136v1

Compressor summary: SIR is a method to decompose shadows for inverse rendering on indoor scenes using multi-view data, improving realism in material estimation under unknown light positions.


Jointly Learning Representations for Map Entities via Heterogeneous Graph Contrastive Learning

http://arxiv.org/abs/2402.06135v1

Compressor summary: The paper proposes HOME-GCL, a novel method that learns representations for multiple categories of map entities (e.g., road segments and land parcels) using a heterogeneous graph and contrastive learning.


TETRIS: Towards Exploring the Robustness of Interactive Segmentation

http://arxiv.org/abs/2402.06132v1

Compressor summary: The paper investigates user clicking patterns in interactive segmentation and proposes a new evaluation strategy based on adversarial attacks to assess the robustness of models with respect to click positions.


Rethinking Node-wise Propagation for Large-scale Graph Learning

http://arxiv.org/abs/2402.06128v1

Compressor summary: ATP is a plug-and-play node-wise propagation optimization strategy that adapts to different topological roles of nodes in web-scale graphs, improving efficiency and performance of scalable graph neural networks.


Learn To be Efficient: Build Structured Sparsity in Large Language Models

http://arxiv.org/abs/2402.06126v1

Compressor summary: The paper introduces Learn-To-be-Efficient (LTE), an algorithm to train large language models with more structured activation sparsity for efficiency and improved performance.


Language Model Sentence Completion with a Parser-Driven Rhetorical Control Method

http://arxiv.org/abs/2402.06125v1

Compressor summary: The study introduces a new algorithm to generate text guided by specific rhetorical relations without fine-tuning the language model, and evaluates it using both automatic and human methods.


Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

http://arxiv.org/abs/2402.06121v1

Compressor summary: iDEM is a fast, scalable, and simulation-free algorithm that generates independent samples from unnormalized probability distributions using only the energy function and its gradient.


Exploring Group and Symmetry Principles in Large Language Models

http://arxiv.org/abs/2402.06120v1

Compressor summary: The paper introduces a framework based on group and symmetry principles to evaluate large language models' reasoning capabilities, focusing on arithmetic reasoning and four group properties.


ContPhy: Continuum Physical Concept Learning and Reasoning from Videos

http://arxiv.org/abs/2402.06119v1

Compressor summary: The Continuum Physical Dataset (ContPhy) is a new benchmark for testing AI's ability to reason about diverse physical properties and dynamics, especially for soft-bodied objects, and it introduces an oracle model that combines particle-based physics and large language models.


ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

http://arxiv.org/abs/2402.06118v1

Compressor summary: The paper introduces ViGoR, a framework that uses fine-grained reward modeling to improve visual grounding of large vision language models with human evaluations and automated methods.


Spatially-Attentive Patch-Hierarchical Network with Adaptive Sampling for Motion Deblurring

http://arxiv.org/abs/2402.06117v1

Compressor summary: The paper presents a new method for motion deblurring that uses pixel adaptive and feature attentive design, content-aware global-local filtering, and non-uniform sampling to handle large blur variations and achieve better performance than existing approaches.


AI enhanced data assimilation and uncertainty quantification applied to Geological Carbon Storage

http://arxiv.org/abs/2402.06110v1

Compressor summary: The study compares two machine learning models for simulating carbon storage and proposes faster, more accurate methods using surrogate models and data assimilation techniques.


Multiple Instance Learning for Cheating Detection and Localization in Online Examinations

http://arxiv.org/abs/2402.06107v1

Compressor summary: CHEESE is a framework that detects cheating in online exams using multiple features and instances, achieving high performance compared to existing methods.