arxiv compressed, 2024-02-05

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-05, generated by the compressor, my personal LLM-based project.


Beyond Lengthscales: No-regret Bayesian Optimisation With Unknown Hyperparameters Of Any Type

http://arxiv.org/abs/2402.01632v1

Compressor summary: HE-GP-UCB is a new algorithm for Bayesian optimization with unknown hyperparameters that has the no-regret property, works in both Bayesian and frequentist settings, and performs well empirically.
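
For orientation, here is a minimal sketch of the classical GP-UCB acquisition step that HE-GP-UCB builds on; the paper's hyperparameter-elimination mechanism is not shown, and the kernel and beta below are arbitrary placeholders.

```python
# Minimal GP-UCB sketch (not HE-GP-UCB itself): fit a GP to past
# observations, then pick the candidate maximizing mu + sqrt(beta) * sigma.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb_step(X_obs, y_obs, candidates, beta=2.0):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
    gp.fit(X_obs, y_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(mu + np.sqrt(beta) * sigma)]

# toy usage: maximize f(x) = -(x - 0.3)^2 on [0, 1]
f = lambda x: -(x - 0.3) ** 2
X = np.array([[0.0], [1.0]]); y = f(X).ravel()
for _ in range(5):
    x_next = gp_ucb_step(X, y, np.linspace(0, 1, 101).reshape(-1, 1))
    X = np.vstack([X, [x_next]]); y = np.append(y, f(x_next))
print("best x found:", X[np.argmax(y)])
```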


Position Paper: Generalized grammar rules and structure-based generalization beyond classical equivariance for lexical tasks and transduction

http://arxiv.org/abs/2402.01629v1

Compressor summary: The paper proposes a framework for building compositional models that can generalize using symmetry-based constraints called Generalized Grammar Rules (GGRs), which can handle transduction tasks and relate to other research areas like reinforcement learning.


TravelPlanner: A Benchmark for Real-World Planning with Language Agents

http://arxiv.org/abs/2402.01622v1

Compressor summary: TravelPlanner is a new planning benchmark that tests language agents' ability to handle complex travel planning scenarios, showing their current limitations but also potential for future improvement.


Stochastic Two Points Method for Deep Model Zeroth-order Optimization

http://arxiv.org/abs/2402.01621v1

Compressor summary: The paper proposes a gradient-free method called Accelerated Stochastic Two-Point (AS2P) that efficiently optimizes large deep models, including language models, by exploiting theoretical convergence properties and achieving significant speed-ups.
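
For context, a minimal sketch of the generic two-point zeroth-order gradient estimator that methods like AS2P accelerate; the paper's specific acceleration schedule is not reproduced here.

```python
# Two function evaluations along a random direction approximate the
# directional derivative, which (times the direction) estimates the gradient.
import numpy as np

def two_point_grad_estimate(f, x, delta=1e-3, rng=None):
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    return (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

# toy usage: minimize f(x) = ||x||^2 without gradients
f = lambda x: float(np.sum(x ** 2))
x = np.ones(10)
for _ in range(500):
    x -= 0.1 * two_point_grad_estimate(f, x)
print("final loss:", f(x))
```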


MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models

http://arxiv.org/abs/2402.01620v1

Compressor summary: MAGDi improves reasoning in smaller models by distilling knowledge from multiple LLMs using graph-based representations and objectives, achieving higher efficiency and generalization than other methods.


KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases

http://arxiv.org/abs/2402.01619v1

Compressor summary: KB-Plugin is a framework that uses self-supervised learning to encode schema information into plug-and-play modules, enabling language models to induce programs over low-resourced knowledge bases.


Style Vectors for Steering Generative Large Language Model

http://arxiv.org/abs/2402.01618v1

Compressor summary: The research proposes a method for controlling the style of text generated by large language models using style vectors derived from layer activations.
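
A hedged sketch of the general activation-steering idea (a style vector as the mean difference of layer activations, added back at inference); the paper's exact derivation may differ, and the arrays below are random stand-ins for real model activations.

```python
# Style vector = mean(activations of styled texts) - mean(neutral texts);
# steering shifts a hidden state toward the target style during decoding.
import numpy as np

def style_vector(styled_acts, neutral_acts):
    """styled_acts, neutral_acts: arrays of shape (n_texts, hidden_dim)."""
    return styled_acts.mean(axis=0) - neutral_acts.mean(axis=0)

def steer(hidden_state, v_style, alpha=1.0):
    """Shift a hidden state toward the target style; alpha scales strength."""
    return hidden_state + alpha * v_style

# toy usage with random stand-in activations
rng = np.random.default_rng(0)
v = style_vector(rng.normal(1.0, 1, (32, 64)), rng.normal(0.0, 1, (32, 64)))
h = steer(rng.normal(size=64), v, alpha=0.5)
```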


L2G2G: a Scalable Local-to-Global Network Embedding with Graph Autoencoders

http://arxiv.org/abs/2402.01614v1

Compressor summary: L2G2G is a Local2Global method that improves graph autoencoder accuracy by dynamically synchronising node representations during training and using a local patch decoder.


Nomic Embed: Training a Reproducible Long Context Text Embedder

http://arxiv.org/abs/2402.01613v1

Compressor summary: The report introduces a new English text embedding model, nomic-embed-text-v1, that surpasses OpenAI models on short and long-context tasks and provides fully reproducible training code and data.


Contingency Analysis of a Grid of Connected EVs for Primary Frequency Control of an Industrial Microgrid Using Efficient Control Scheme

http://arxiv.org/abs/2402.01608v1

Compressor summary: The text discusses how electric vehicles can improve frequency regulation and reliability in industrial microgrids using the Vehicle-to-Grid approach.


Natural Counterfactuals With Necessary Backtracking

http://arxiv.org/abs/2402.01607v1

Compressor summary: The authors propose a new method for generating natural counterfactuals in causal reasoning that minimizes deviations from realistic scenarios and introduces an optimization framework to control the extent of backtracking.


Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning

http://arxiv.org/abs/2402.01602v1

Compressor summary: The paper proposes a framework for how agents can interact with foundation models (large language models) to guide them for specific tasks, addressing limitations in trustworthiness and usability.


Towards Sustainable Workplace Mental Health: A Novel Approach to Early Intervention and Support

http://arxiv.org/abs/2402.01592v1

Compressor summary: The text discusses a stress detection algorithm that uses chatbot technology to measure and improve employee mental health in real-time, showing its effectiveness in reducing workplace issues like attrition and absenteeism.


NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities

http://arxiv.org/abs/2402.01590v1

Compressor summary: The authors present NeuroCine, a framework to generate video from fMRI data, which outperforms previous methods in decoding brain activities and shows biological plausibility.


TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution

http://arxiv.org/abs/2402.01586v1

Compressor summary: The paper proposes TrustAgent, a framework that improves the safety of LLM-based agents by using pre-, in-, and post-planning strategies to inject safety knowledge and prevent potential dangers.


Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study

http://arxiv.org/abs/2402.01582v1

Compressor summary: The paper presents new methods using neural networks and typological data to partially automate linguistic phylogenetic inference, improving on previous semi-automated approaches.


Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

http://arxiv.org/abs/2402.01567v1

Compressor summary: The paper analyzes Adam optimizer's algorithmic components using an online learning framework and shows their importance for achieving better convergence rates than non-adaptive algorithms like SGD.


Boximator: Generating Rich and Controllable Motions for Video Synthesis

http://arxiv.org/abs/2402.01566v1

Compressor summary: Boximator is a new approach for fine-grained motion control in video synthesis, using hard and soft boxes to define object position, shape, or motion path, while preserving the base model's knowledge.


Deep Continuous Networks

http://arxiv.org/abs/2402.01557v1

Compressor summary: DCNs are a new type of neural network that better model biological vision by using continuous filters and learning receptive field sizes, improving performance on image tasks and pattern completion.


SLYKLatent, a Learning Framework for Facial Features Estimation

http://arxiv.org/abs/2402.01555v1

Compressor summary: SLYKLatent is a new learning framework for gaze estimation that addresses dataset challenges and outperforms existing methods.


Adaptive Optimization for Prediction with Missing Data

http://arxiv.org/abs/2402.01543v1

Compressor summary: The paper proposes adaptive linear regression models that learn from imputed data and improve prediction accuracy, especially when data is not missing at random.


Closing the Gap in Human Behavior Analysis: A Pipeline for Synthesizing Trimodal Data

http://arxiv.org/abs/2402.01537v1

Compressor summary: The research introduces a new technique to create trimodal datasets for human behavior analysis using RGB, thermal, and depth images, addressing challenges such as lighting conditions and privacy concerns.


An Empirical Analysis of Diversity in Argument Summarization

http://arxiv.org/abs/2402.01535v1

Compressor summary: The text discusses the importance of capturing diversity in online argument summarization and evaluates existing approaches, suggesting a combination of strategies for better results.


Decoding Speculative Decoding

http://arxiv.org/abs/2402.01528v1

Compressor summary: The paper investigates how to choose the best draft model for speculative decoding in large language models to achieve optimal speedup and proposes an analytical model and a new draft model.
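
For context, a minimal sketch of the greedy speculative-decoding loop whose draft-model choice the paper studies; draft_next and target_next are hypothetical stand-ins, and real systems verify all k draft tokens in a single target forward pass rather than one call per token as this toy does.

```python
# A cheap draft model proposes k tokens; the target keeps the longest
# matching prefix and substitutes its own token at the first mismatch,
# so greedy output is identical to decoding with the target alone.
def speculative_decode(draft_next, target_next, prompt, k=4, max_len=32):
    tokens = list(prompt)
    while len(tokens) < max_len:
        proposal, ctx = [], tokens[:]
        for _ in range(k):                 # draft proposes k tokens
            t = draft_next(ctx)
            proposal.append(t); ctx.append(t)
        for t in proposal:                 # target verifies them
            expected = target_next(tokens)
            if t == expected:
                tokens.append(t)
            else:
                tokens.append(expected)    # correct and restart drafting
                break
    return tokens[:max_len]

# toy usage: the draft agrees with the target most of the time
target_next = lambda ctx: len(ctx) % 7
draft_next = lambda ctx: len(ctx) % 7 if len(ctx) % 5 else 0
print(speculative_decode(draft_next, target_next, [0]))
```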


HyperPlanes: Hypernetwork Approach to Rapid NeRF Adaptation

http://arxiv.org/abs/2402.01524v1

Compressor summary: The text introduces a few-shot learning approach for NeRFs using hypernetworks to efficiently generate high-quality 3D object representations from limited images.


K-Level Reasoning with Large Language Models

http://arxiv.org/abs/2402.01521v1

Compressor summary: The text explores how large language models can improve their ability to reason and make decisions in rapidly changing environments using a new approach called "K-Level Reasoning".
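
For context, a minimal sketch of the classic level-k recursion that this line of work adapts to LLM agents (the paper's prompting strategy is not shown); naive_action and best_response are hypothetical stand-ins.

```python
# Level-0 acts naively; a level-k agent best-responds to opponents
# modeled as level k-1, recursively.
def k_level_action(k, state, naive_action, best_response):
    if k == 0:
        return naive_action(state)
    opponent_action = k_level_action(k - 1, state, naive_action, best_response)
    return best_response(state, opponent_action)

# toy usage: a game where each reasoning level undercuts the previous one
print(k_level_action(3, 100,
                     naive_action=lambda s: s // 2,
                     best_response=lambda s, a: max(a - 10, 0)))
```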


Cross-view Masked Diffusion Transformers for Person Image Synthesis

http://arxiv.org/abs/2402.01516v1

Compressor summary: X-MDPT is a new image generation model that uses masked diffusion transformers, latent patches, and semantic information to produce high-quality human images with fewer parameters and faster inference than existing methods.


Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

http://arxiv.org/abs/2402.01515v1

Compressor summary: The authors propose a unified framework to analyze the convergence of first-order optimization methods under non-convex conditions and introduce two new acceleration methods, Reject Accelerating and Random Vector Accelerating.


Mapping the Multiverse of Latent Representations

http://arxiv.org/abs/2402.01514v1

Compressor summary: PRESTO is a framework that analyzes the variability in latent representations of machine-learning models using persistent homology, enabling various applications like sensitivity analysis, anomaly detection, and hyperparameter search.


Multilingual Gradient Word-Order Typology from Universal Dependencies

http://arxiv.org/abs/2402.01513v1

Compressor summary: The paper proposes a new continuous-valued seed dataset for word-order typology that improves NLP performance by addressing inconsistencies in existing categorical datasets.


Distractor Generation for Multiple-Choice Questions: A Survey of Methods, Datasets, and Evaluation

http://arxiv.org/abs/2402.01512v1

Compressor summary: This paper reviews distractor generation tasks for English multiple-choice questions, focusing on their characteristics, datasets, and evaluation metrics, finding a lack of open domain and multimodal data.


A Hybrid Strategy for Chat Transcript Summarization

http://arxiv.org/abs/2402.01510v1

Compressor summary: The paper proposes a hybrid method that combines extractive and abstractive summarization with reinforcement learning to produce readable punctuated summaries from chat transcripts without manual annotations.


Code-Switched Language Identification is Harder Than You Think

http://arxiv.org/abs/2402.01505v1

Compressor summary: The paper explores code-switched language identification (LID) for corpus building, using realistic scenarios and simpler models to improve performance and evaluation metrics.


Developing and Evaluating a Design Method for Positive Artificial Intelligence

http://arxiv.org/abs/2402.01499v1

Compressor summary: The article introduces and evaluates a human-centered method to design AI systems that promote wellbeing by translating aspirations into concrete practices using four steps and a feedback cycle.


A Comparative Analysis of Conversational Large Language Models in Knowledge-Based Text Generation

http://arxiv.org/abs/2402.01495v1

Compressor summary: The study evaluates how well conversational large language models generate natural language text from semantic triples derived from knowledge graphs, and suggests ways to improve their performance.


Connecting the Dots: Is Mode-Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks?

http://arxiv.org/abs/2402.01484v1

Compressor summary: The paper proposes a method to infer properties of Bayesian neural networks by leveraging the relationship between their weights and functions, and provides guidelines for sampling and convergence diagnosis.


Multi-level protein pre-training with Vabs-Net

http://arxiv.org/abs/2402.01481v1

Compressor summary: The authors propose a span mask pre-training strategy that improves 3D protein representation learning by capturing both residue and atom information for better downstream task performance.


Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes

http://arxiv.org/abs/2402.01476v1

Compressor summary: KEP-SVGP is a novel method to estimate uncertainty in self-attention using asymmetric kernels and sparse variational Gaussian processes, achieving better performance and reduced complexity.


Synthetic Data for the Mitigation of Demographic Biases in Face Recognition

http://arxiv.org/abs/2402.01472v1

Compressor summary: The study shows that using synthetic data can help reduce demographic biases in face recognition systems by controlling the representation of different groups.


AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback

http://arxiv.org/abs/2402.01469v1

Compressor summary: AMOR is an agent framework that uses LLMs, external knowledge bases, and human supervision to reason over finite state machines (FSMs) and adapt to specific domains through process feedback.


Scaled 360 layouts: Revisiting non-central panoramas

http://arxiv.org/abs/2402.01466v1

Compressor summary: This paper proposes a novel method to recover 3D layouts of indoor environments using deep learning and non-central panoramas, which handles occlusions and recovers scaled layouts for both Manhattan and Atlanta environments.


3D Vertebrae Measurements: Assessing Vertebral Dimensions in Human Spine Mesh Models Using Local Anatomical Vertebral Axes

http://arxiv.org/abs/2402.01462v1

Compressor summary: The study presents a new automated method for measuring vertebral morphology from 3D spine models with high accuracy and potential for clinical applications.


Visual Gyroscope: Combination of Deep Learning Features and Direct Alignment for Panoramic Stabilization

http://arxiv.org/abs/2402.01461v1

Compressor summary: The article describes a visual gyroscope using panoramas, which combines three methods to accurately estimate camera attitude and tests it on two aerial image sequences.


GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting

http://arxiv.org/abs/2402.01459v1

Compressor summary: GaMeS is a novel method for fast and realistic image rendering using neural networks, which combines meshes and Gaussian distributions to condition splats on the object surface.


Convolution kernel adaptation to calibrated fisheye

http://arxiv.org/abs/2402.01456v1

Compressor summary: The paper proposes adapting convolution kernels to the radially symmetric distortion of calibrated fisheye cameras, deforming the kernels according to the camera calibration and improving CNN performance in depth estimation and semantic segmentation after fine-tuning.


Integrating Large Language Models in Causal Discovery: A Statistical Causal Approach

http://arxiv.org/abs/2402.01454v1

Compressor summary: The paper proposes a method that combines statistical causal discovery with knowledge from large language models to improve causal inference in various domains.


The Queen of England is not England's Queen: On the Lack of Factual Coherency in PLMs

http://arxiv.org/abs/2402.01453v1

Compressor summary: The study examines the coherency of factual knowledge in pre-trained language models (PLMs) by measuring how well they can predict a subject entity given an object entity, finding that PLMs struggle with inverse relations and need improvement to serve as knowledge bases.


Improving importance estimation in covariate shift for providing accurate prediction error

http://arxiv.org/abs/2402.01450v1

Compressor summary: The paper proposes incorporating target information into the KLIEP algorithm to improve error estimation in machine learning with covariate shift problems.


Mission Critical -- Satellite Data is a Distinct Modality in Machine Learning

http://arxiv.org/abs/2402.01444v1

Compressor summary: The text argues that satellite data is a unique modality for machine learning and calls for a new research agenda to improve its quality and impact.


Few-Shot Learning on Graphs: from Meta-learning to Pre-training and Prompting

http://arxiv.org/abs/2402.01440v1

Compressor summary: The text surveys recent advancements in few-shot learning on graphs, categorizing existing methods into meta-learning, pre-training, and hybrid approaches, and discussing their strengths and limitations.


From Words to Molecules: A Survey of Large Language Models in Chemistry

http://arxiv.org/abs/2402.01439v1

Compressor summary: This paper explores how Large Language Models (LLMs) are integrated into chemistry, discussing various representation methods, input data categorization, pretraining objectives, and applications, while identifying promising research directions.


Approximate Control for Continuous-Time POMDPs

http://arxiv.org/abs/2402.01431v1

Compressor summary: The paper presents a framework for making decisions in complex systems with many hidden states by approximating their distributions and using control heuristics from fully observable systems.


The effect of diversity on group decision-making

http://arxiv.org/abs/2402.01427v1

Compressor summary: The study examines how cognitive diversity influences group decision-making by analysing 500 online discussions of a problem-solving task and finding that greater diversity leads to better outcomes.


Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations

http://arxiv.org/abs/2402.01423v1

Compressor summary: The paper investigates the causes of disagreements in expert-annotated named entity datasets for English, Danish, and Bavarian and finds that text ambiguity and artificial guideline changes are major factors.


EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation

http://arxiv.org/abs/2402.01422v1

Compressor summary: EmoSpeaker is a system that generates realistic facial animations for expressing fine-grained emotions using only a portrait and an audio recording, improving the quality and personalization of generated content.


Sequence Shortening for Context-Aware Machine Translation

http://arxiv.org/abs/2402.01416v1

Compressor summary: The study proposes a multi-encoder architecture that caches and reuses the source sentence representation as context, improving translation accuracy and exploring sequence shortening techniques.


SMLP: Symbolic Machine Learning Prover

http://arxiv.org/abs/2402.01415v1

Compressor summary: SMLP is a system exploration tool that uses machine learning, statistics, and formal methods to analyze and optimize complex hardware designs.


XAI for Skin Cancer Detection with Prototypes and Non-Expert Supervision

http://arxiv.org/abs/2402.01410v1

Compressor summary: The paper proposes a new interpretable model for skin cancer diagnosis using binary masks and user feedback to guide learning relevant features.


Climbing the Ladder of Interpretability with Counterfactual Concept Bottleneck Models

http://arxiv.org/abs/2402.01408v1

Compressor summary: CF-CBMs are models that can predict, explain, and imagine alternative scenarios for classification tasks without running post-hoc searches.


On Measuring Context Utilization in Document-Level MT Systems

http://arxiv.org/abs/2402.01404v1

Compressor summary: The paper proposes perturbation-based analyses of how document-level MT systems use supporting context and supports their evaluation with automatically annotated data.


Zero-Shot Machine Unlearning at Scale via Lipschitz Regularization

http://arxiv.org/abs/2402.01401v1

Compressor summary: The paper proposes a method to unlearn information from machine learning models in a zero-shot scenario, by inducing smoothing of the output related to the data to be forgotten, while maintaining model performance.


A Probabilistic Model to explain Self-Supervised Representation Learning

http://arxiv.org/abs/2402.01399v1

Compressor summary: The paper proposes a generative latent variable model for self-supervised learning that unifies contrastive methods and explains their performance on content and style tasks.


ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data

http://arxiv.org/abs/2402.01393v1

Compressor summary: The authors propose ALERT-Transformer, a hybrid pipeline that pairs asynchronous event sensing with synchronous processing: a PointNet-based ALERT module integrates events, a patch-based approach inspired by the Vision Transformer exploits sparsity, and a transformer model performs object and gesture recognition with lower latency and an adaptable sampling rate.


SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation

http://arxiv.org/abs/2402.01389v1

Compressor summary: This paper presents SiMA-Hand, a method to reconstruct 3D hand mesh from RGB images with occlusion by adapting information from multiple views during training and fusing features at different levels.


LLM-based NLG Evaluation: Current Status and Challenges

http://arxiv.org/abs/2402.01383v1

Compressor summary: The paper surveys various methods to evaluate natural language generation using large language models and discusses their advantages and disadvantages, as well as potential collaboration with humans.


Efficient Dynamic-NeRF Based Volumetric Video Coding with Rate Distortion Optimization

http://arxiv.org/abs/2402.01380v1

Compressor summary: The paper proposes a volumetric video compression method based on dynamic NeRF, which improves compression efficiency by jointly optimizing modeling and compression processes.


Regularized boosting with an increasing coefficient magnitude stop criterion as meta-learner in hyperparameter optimization stacking ensemble

http://arxiv.org/abs/2402.01379v1

Compressor summary: The text describes various hyperparameter optimization (HPO) ensemble methods, focusing on boosting as a promising stacking meta-learner with improved regularization and a novel non-parametric stop criterion for HPO.


LoTR: Low Tensor Rank Weight Adaptation

http://arxiv.org/abs/2402.01376v1

Compressor summary: LoTR is a new method for efficiently fine-tuning large language models using tensor decomposition that improves parameter efficiency compared to previous low-rank adaptation methods.


Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization

http://arxiv.org/abs/2402.01375v1

Compressor summary: The study investigates the reasons for cross-topic performance differences among pre-trained language models and suggests factors that improve their robustness.


Critic-Actor for Average Reward MDPs with Function Approximation: A Finite-Time Analysis

http://arxiv.org/abs/2402.01371v1

Compressor summary: The paper presents a new critic-actor algorithm with function approximation for long-run average reward settings, providing the first finite-time convergence analysis and showing improved sample complexity compared to actor-critic methods.


Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors

http://arxiv.org/abs/2402.01369v1

Compressor summary: MMP-Attack is a targeted image generation method that uses both text and image features to insert a new object and remove another one, showing superior performance against commercial models like DALL-E 3.


LIR: Efficient Degradation Removal for Lightweight Image Restoration

http://arxiv.org/abs/2402.01368v1

Compressor summary: LIR is a lightweight image restoration network that uses an Efficient Adaptive Attention Block to remove degradation from images efficiently and with good visual quality, outperforming other networks in terms of parameters and computations.


Continual Learning for Large Language Models: A Survey

http://arxiv.org/abs/2402.01364v1

Compressor summary: The paper discusses continual learning techniques for large language models (LLMs) that balance updates with high training costs and the need to stay current with human knowledge.


To the Max: Reinventing Reward in Reinforcement Learning

http://arxiv.org/abs/2402.01361v1

Compressor summary: This paper proposes max-reward RL, a new way to learn from rewards that works for various environments and improves performance in goal-reaching tasks.
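
As one reading of the title (an assumption on my part, not a formula quoted from the paper), the contrast is between the standard discounted-sum objective and an objective that optimizes the best single reward attained along a trajectory:

```latex
% standard cumulative objective vs. a max-reward objective (sketch)
J_{\text{sum}}(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t} r_t\right]
\qquad\text{vs.}\qquad
J_{\max}(\pi) = \mathbb{E}_{\pi}\!\left[\max_{0 \le t \le T} r_t\right]
```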


What Makes Medical Claims (Un)Verifiable? Analyzing Entity and Relation Properties for Fact Verification

http://arxiv.org/abs/2402.01360v1

Compressor summary: The authors study how claim properties, such as entities and relations, affect biomedical fact verification and create a new corpus (BEAR-Fact) for this purpose.


TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (Extended Version)

http://arxiv.org/abs/2402.01359v1

Compressor summary: This paper proposes fair experiment design constraints, a new classifier robustness metric (AUT), an algorithm for data tuning, and an open-source framework (TESSERACT) to improve malware detection in real-world settings by addressing spatial and temporal biases in existing studies.


Describing Images Fast and Slow: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes

http://arxiv.org/abs/2402.01352v1

Compressor summary: The text discusses how human behavior and signals when describing images vary due to image properties, and explores whether pretrained vision encoders can capture this variation.


Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models

http://arxiv.org/abs/2402.01349v1

Compressor summary: The study finds that Large Language Models' performance on Multiple Choice Question Answering is inconsistent and suggests a need for better evaluation methods.


CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive Replay

http://arxiv.org/abs/2402.01348v1

Compressor summary: CORE is a novel replay-based method for continuous learning that adaptively adjusts the replay buffer size and selects high-quality data samples to mitigate catastrophic forgetting.


Skip \n: A simple method to reduce hallucination in Large Vision-Language Models

http://arxiv.org/abs/2402.01345v1

Compressor summary: The paper suggests that semantic shifts around paragraph breaks in LVLMs' training data cause multimodal hallucinations, and proposes a simple method to reduce them.


Monotone, Bi-Lipschitz, and Polyak-Łojasiewicz Networks

http://arxiv.org/abs/2402.01344v1

Compressor summary: The paper introduces BiLipNet, a new invertible neural network that can control its output sensitivity and input distinguishability, using a novel residual layer and fast algorithms for model inverse calculation.


Shapelet-based Model-agnostic Counterfactual Local Explanations for Time Series Classification

http://arxiv.org/abs/2402.01343v1

Compressor summary: Time-CF is a post-hoc method for time series classification that uses shapelets and TimeGAN to provide counterfactual explanations with improved performance on four explainability metrics.


Fundamental Properties of Causal Entropy and Information Gain

http://arxiv.org/abs/2402.01341v1

Compressor summary: The paper introduces causal entropy and causal information gain as new measures to quantify causal control in structural causal models, and studies their properties and relationship with stochastic interventions to improve causal machine learning tasks.


Simulator-Free Visual Domain Randomization via Video Games

http://arxiv.org/abs/2402.01335v1

Compressor summary: BehAVE is a video understanding framework that uses commercial video games for domain randomization and improves transferability of vision models across visually distinct domains.


A general framework for rotation invariant point cloud analysis

http://arxiv.org/abs/2402.01331v1

Compressor summary: The authors propose a general method for deep learning based point cloud analysis that is invariant to rotation and show its effectiveness on common benchmarks.


Supervised Algorithmic Fairness in Distribution Shifts: A Survey

http://arxiv.org/abs/2402.01327v1

Compressor summary: The text introduces a field that addresses fair and unbiased machine learning under changing data distributions, reviews existing methods and datasets, and discusses challenges and future directions.


AutoGCN -- Towards Generic Human Activity Recognition with Neural Architecture Search

http://arxiv.org/abs/2402.01313v1

Compressor summary: AutoGCN is a novel NAS algorithm that uses GCNs to recognize human activities from skeletal graphs, achieving better performance and generalization than conventional methods.


Deep Multimodal Fusion of Data with Heterogeneous Dimensionality via Projective Networks

http://arxiv.org/abs/2402.01311v1

Compressor summary: The text proposes a new deep learning framework for fusing multimodal data with different dimensions that can improve diagnosis and treatment of diseases like age-related macular degeneration.


KTO: Model Alignment as Prospect Theoretic Optimization

http://arxiv.org/abs/2402.01306v1

Compressor summary: Kahneman-Tversky Optimization (KTO) uses a human utility model from prospect theory to align LLMs with binary human feedback, outperforming current methods that rely on preferences.


Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection

http://arxiv.org/abs/2402.01304v1

Compressor summary: PGST uses textual prompts and style transfer to enhance object detection in unseen domains using only source domain data, achieving state-of-the-art performance.


A Unified Framework for Gradient-based Clustering of Distributed Data

http://arxiv.org/abs/2402.01302v1

Compressor summary: The paper introduces a family of distributed clustering algorithms (DGC-$\mathcal{F}_\rho$) for networks of users that communicate only with neighbors, converging to consensus fixed points or Lloyd points under certain conditions.


Two Approaches to Diachronic Normalization of Polish Texts

http://arxiv.org/abs/2402.01300v1

Compressor summary: The paper compares a rule-based and a neural approach to normalize Polish texts over time, presenting data, experiments, and analysis results that show the rule-based method is better for now, but both have pros and cons.


Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum

http://arxiv.org/abs/2402.01297v1

Compressor summary: The paper develops novel bounds for the condition number of kernel matrices and studies how different kernel spectra affect overfitting in kernel ridge regressors.


Bi-CryptoNets: Leveraging Different-Level Privacy for Encrypted Inference

http://arxiv.org/abs/2402.01296v1

Compressor summary: The text proposes a new approach to privacy-preserving neural networks using bi-CryptoNets, which separate sensitive and insensitive data segments and use homomorphic encryption and knowledge distillation.


ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast

http://arxiv.org/abs/2402.01295v1

Compressor summary: The paper proposes Exloss and ExEnsemble, two novel methods for improving the forecast of extreme weather events using machine learning and data-driven approaches.


Can MLLMs Perform Text-to-Image In-Context Learning?

http://arxiv.org/abs/2402.01293v1

Compressor summary: The paper introduces CoBSAT, a new benchmark dataset for Text-to-Image In-Context Learning (T2I-ICL), and evaluates six MLLMs on it, finding challenges in multimodality and image generation.


Towards the new XAI: A Hypothesis-Driven Approach to Decision Support Using Evidence

http://arxiv.org/abs/2402.01292v1

Compressor summary: The paper proposes a hypothesis-driven XAI method based on the Weight of Evidence framework, which generates positive and negative evidence for hypotheses, and shows it improves decision accuracy compared to other approaches.


UCVC: A Unified Contextual Video Compression Framework with Joint P-frame and B-frame Coding

http://arxiv.org/abs/2402.01289v1

Compressor summary: The paper introduces a unified framework for joint video compression of P-frames and B-frames that can achieve comparable efficiency to specialized methods.


Spiking CenterNet: A Distillation-boosted Spiking Neural Network for Object Detection

http://arxiv.org/abs/2402.01287v1

Compressor summary: Spiking CenterNet uses spiking neural networks and knowledge distillation to achieve energy-efficient and accurate object detection on event data.


Spectrum-guided Feature Enhancement Network for Event Person Re-Identification

http://arxiv.org/abs/2402.01269v1

Compressor summary: SFE-Net is a novel network for event-based person re-identification, using spectrum attention and patch dropout to filter noise and enhance discriminative features.


The Human and the Mechanical: logos, truthfulness, and ChatGPT

http://arxiv.org/abs/2402.01267v1

Compressor summary: The paper argues that 'mechanical minds' cannot form veridicality judgments, as they lack both exogenous and endogenous evidence, unlike human speakers who manipulate their judgments and communicate them transparently.


Direct side information learning for zero-shot regression

http://arxiv.org/abs/2402.01264v1

Compressor summary: The paper proposes a new method for zero-shot regression using a special kernel to jointly incorporate features and target side information in a single learning process, improving performance on both artificial and real datasets.


A Differentiable POGLM with Forward-Backward Message Passing

http://arxiv.org/abs/2402.01263v1

Compressor summary: The paper introduces a new differentiable partially observable generalized linear model (POGLM) that improves the learning of neural connectivity from spike train data using variational inference and message-passing sampling.


Cascaded Scaling Classifier: class incremental learning with probability scaling

http://arxiv.org/abs/2402.01262v1

Compressor summary: The paper proposes Margin Dampening and Cascaded Scaling Classifier, two methods to reduce forgetting in neural networks when learning new tasks continuously, by combining a soft constraint, knowledge distillation, and gated incremental classifiers.


TEDDY: Trimming Edges with Degree-based Discrimination strategY

http://arxiv.org/abs/2402.01261v1

Compressor summary: TEDDY is a novel framework for finding sparse graph lottery tickets (GLT) by leveraging edge-degree information and achieving better generalization than conventional iterative methods.


Target inductive methods for zero-shot regression

http://arxiv.org/abs/2402.01252v1

Compressor summary: This paper presents two zero-shot regression methods for predicting air pollutants using side information from meteorological stations, and compares them with a baseline method.


Two Heads Are Better Than One: Boosting Graph Sparse Training via Semantic and Topological Awareness

http://arxiv.org/abs/2402.01242v1

Compressor summary: Graph Sparse Training (GST) is a new method that dynamically adjusts sparsity in graphs to improve performance and efficiency of Graph Neural Networks (GNNs).


Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?

http://arxiv.org/abs/2402.01241v1

Compressor summary: CISP is a new method that improves 3D shape generation from images by aligning them in a shared embedding space, enhancing coherence and leveraging 3D knowledge in the generative process.


PRIME: Protect Your Videos From Malicious Editing

http://arxiv.org/abs/2402.01239v1

Compressor summary: PRIME is a new video protection method that is faster and more effective than existing methods, addressing privacy concerns caused by generative models.


Flexible Variational Information Bottleneck: Achieving Diverse Compression with a Single Training

http://arxiv.org/abs/2402.01238v1

Compressor summary: Flexible Variational Information Bottleneck (FVIB) is a new framework that optimizes the objective function of Information Bottleneck for classification tasks by finding the best value of Lagrange multiplier $\beta$ with a single, efficient training cycle.


Unveiling Delay Effects in Traffic Forecasting: A Perspective from Spatial-Temporal Delay Differential Equations

http://arxiv.org/abs/2402.01231v1

Compressor summary: The paper proposes a new neural model (STDDE) that captures time delay in spatial information propagation for traffic flow forecasting, allowing predictions at various frequencies and improving accuracy.


HW-SW Optimization of DNNs for Privacy-preserving People Counting on Low-resolution Infrared Arrays

http://arxiv.org/abs/2402.01226v1

Compressor summary: The paper proposes an automated optimization flow for DNNs on low-resolution IR array sensors for people counting, achieving significant reductions in model size, code size, and energy consumption while maintaining accuracy.


Delving into Decision-based Black-box Attacks on Semantic Segmentation

http://arxiv.org/abs/2402.01220v1

Compressor summary: The paper explores decision-based black-box attacks on semantic segmentation models and proposes the Discrete Linear Attack, which effectively degrades the adversarial robustness of five models using only 50 queries.


Taming Uncertainty in Sparse-view Generalizable NeRF via Indirect Diffusion Guidance

http://arxiv.org/abs/2402.01217v1

Compressor summary: ID-NeRF uses a diffusion prior to reduce uncertainty and improve novel view synthesis in Neural Radiance Fields with sparse inputs.


TSJNet: A Multi-modality Target and Semantic Awareness Joint-driven Image Fusion Network

http://arxiv.org/abs/2402.01212v1

Compressor summary: TSJNet is a fusion network that integrates object and semantic information from different modalities, improving image quality and target detection in multi-modality fusion tasks.


Location Agnostic Adaptive Rain Precipitation Prediction using Deep Learning

http://arxiv.org/abs/2402.01208v1

Compressor summary: The paper proposes a location-agnostic adaptive deep learning framework for rain precipitation prediction that generalizes across locations despite location-dependent features and changing weather patterns, showing significant improvements for Paris, Los Angeles, and Tokyo.


Efficient Causal Graph Discovery Using Large Language Models

http://arxiv.org/abs/2402.01207v1

Compressor summary: The proposed framework uses a BFS approach with LLMs to efficiently discover causal relationships in large graphs, incorporating observational data if available.
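
A hedged sketch of what a BFS-style expansion with an LLM could look like: each frontier variable is queried for its direct effects, and newly mentioned variables join the queue; ask_llm_effects is a hypothetical stand-in for the paper's prompting.

```python
# BFS over variables: pop a cause, ask the LLM which variables it
# directly affects, record edges, and enqueue unseen effects.
from collections import deque

def bfs_causal_discovery(seeds, all_vars, ask_llm_effects):
    edges, visited = set(), set(seeds)
    frontier = deque(seeds)
    while frontier:
        cause = frontier.popleft()
        for effect in ask_llm_effects(cause, all_vars):
            edges.add((cause, effect))
            if effect not in visited:
                visited.add(effect)
                frontier.append(effect)
    return edges

# toy usage with a fixed oracle standing in for the LLM
oracle = {"smoking": ["cancer"], "cancer": ["fatigue"], "fatigue": []}
print(bfs_causal_discovery(["smoking"], list(oracle),
                           lambda c, _: oracle.get(c, [])))
```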


Comparative Evaluation of Weather Forecasting using Machine Learning Models

http://arxiv.org/abs/2402.01206v1

Compressor summary: This paper examines how machine learning algorithms can improve weather forecasting by predicting precipitation and temperature patterns in Dhaka city using various algorithms and performance metrics.


A Survey on Self-Supervised Learning for Non-Sequential Tabular Data

http://arxiv.org/abs/2402.01204v1

Compressor summary: This survey reviews recent progress, challenges, applications, benchmarks, and future directions for self-supervised learning (SSL) on non-sequential tabular data, focusing on learning robust representations from unlabeled data.


Structured World Modeling via Semantic Vector Quantization

http://arxiv.org/abs/2402.01203v1

Compressor summary: The paper introduces a new model (SVQ) for learning semantic neural discrete representations from object-centric data, which improves generation and scene understanding tasks.


Few-Shot Class-Incremental Learning with Prior Knowledge

http://arxiv.org/abs/2402.01201v1

Compressor summary: LwPK uses unlabeled data from new classes to improve a pre-trained model's generalization and prevent catastrophic forgetting in few-shot class-incremental learning.


Conditional Normalizing Flows for Active Learning of Coarse-Grained Molecular Representations

http://arxiv.org/abs/2402.01195v1

Compressor summary: The authors propose a method to improve the efficiency and exploration of molecular systems using normalizing flows conditioned on coarse-grained simulations with active learning, achieving significant speedups compared to existing approaches.


Unsupervised Generation of Pseudo Normal PET from MRI with Diffusion Model for Epileptic Focus Localization

http://arxiv.org/abs/2402.01191v1

Compressor summary: The study explores unsupervised deep learning methods to generate pseudo normal FDG PET images for localizing the epileptic focus when a healthy control group is unavailable.


Segment Any Change

http://arxiv.org/abs/2402.01188v1

Compressor summary: AnyChange is a new model that can detect changes in images without training on them, using semantic similarities and point queries to generalize to unseen change types and data distributions.


DeepBranchTracer: A Generally-Applicable Approach to Curvilinear Structure Reconstruction Using Multi-Feature Learning

http://arxiv.org/abs/2402.01187v1

Compressor summary: The paper introduces DeepBranchTracer, a method that learns image and geometric features to reconstruct curvilinear structures from images.


In-Context Learning for Few-Shot Nested Named Entity Recognition

http://arxiv.org/abs/2402.01182v1

Compressor summary: The paper proposes a novel example selection method for few-shot nested named entity recognition using contrastive learning and pretrained language models.


Towards a Unified Language Model for Knowledge-Intensive Tasks Utilizing External Corpus

http://arxiv.org/abs/2402.01176v1

Compressor summary: This paper proposes a unified language model that uses generative retrieval to improve factual accuracy in knowledge-intensive tasks by integrating continuous decoding strategies and auxiliary understanding tasks.


Efficient Prompt Caching via Embedding Similarity

http://arxiv.org/abs/2402.01173v1

Compressor summary: The paper proposes a method to improve LLM inference efficiency by fine-tuning prompt embeddings for better caching prediction using distillation and finite-sample guarantees, achieving better results on a hard question-answering dataset.
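
A minimal sketch of the underlying caching idea: reuse a cached response when a new prompt's embedding is sufficiently similar to a cached one. embed and call_llm are hypothetical stand-ins, and the paper's contribution (fine-tuning the embedding for better caching prediction) is not shown.

```python
# Cosine-similarity prompt cache: hit if the nearest cached prompt's
# embedding similarity clears a threshold, otherwise call the LLM.
import numpy as np

class PromptCache:
    def __init__(self, embed, call_llm, threshold=0.9):
        self.embed, self.call_llm, self.threshold = embed, call_llm, threshold
        self.keys, self.values = [], []

    def query(self, prompt):
        e = self.embed(prompt)
        e = e / np.linalg.norm(e)
        if self.keys:
            sims = np.stack(self.keys) @ e          # cosine similarities
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                return self.values[best]            # cache hit
        response = self.call_llm(prompt)            # cache miss
        self.keys.append(e)
        self.values.append(response)
        return response

# toy usage with a bag-of-characters "embedding"
embed = lambda s: np.array([s.count(c) for c in "abcdefghijklmnopqrstuvwxyz "], float) + 1e-9
cache = PromptCache(embed, call_llm=lambda p: f"answer({p})", threshold=0.95)
print(cache.query("what is rain"))
print(cache.query("what is rain?"))  # likely a cache hit
```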


Streaming Sequence Transduction through Dynamic Compression

http://arxiv.org/abs/2402.01172v1

Compressor summary: STAR is a Transformer model that compresses input streams for efficient sequence-to-sequence transduction, achieving near lossless compression and superior performance in speech recognition and synchronization tasks.


Faster Inference of Integer SWIN Transformer by Removing the GELU Activation

http://arxiv.org/abs/2402.01169v1

Compressor summary: The paper proposes a method to speed up the SWIN transformer model for image classification by replacing GELU activation with ReLU and using iterative knowledge distillation, achieving up to 11% faster inference while maintaining low accuracy loss.
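
A hedged sketch of the two ingredients the summary names, shown on a generic MLP rather than the actual SWIN blocks: replace GELU with the cheaper-to-quantize ReLU, then distill from the original model to recover accuracy.

```python
# Swap GELU for ReLU in a copy of the model, then match the teacher's
# outputs on unlabeled inputs (a simple stand-in for the paper's
# iterative knowledge distillation).
import copy
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 10))

student = copy.deepcopy(teacher)   # same weights, GELU replaced by ReLU
student[1] = nn.ReLU()

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
for _ in range(100):               # distillation steps
    x = torch.randn(32, 64)
    with torch.no_grad():
        t_out = teacher(x)
    loss = nn.functional.mse_loss(student(x), t_out)
    opt.zero_grad(); loss.backward(); opt.step()
print("distillation loss:", loss.item())
```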


A Comprehensive Survey on 3D Content Generation

http://arxiv.org/abs/2402.01166v1

Compressor summary: This text summarizes recent advances in creating 3D content using different methods, and provides a website with resources on the topic.


Enhanced Urban Region Profiling with Adversarial Self-Supervised Learning

http://arxiv.org/abs/2402.01163v1

Compressor summary: EUPAS is a new graph collaborative filtering model that uses human mobility, POIs, and geographic data to generate region embeddings for smart cities, improving on existing methods with self-supervised learning and spatial perturbation augmentation.


2AFC Prompting of Large Multimodal Models for Image Quality Assessment

http://arxiv.org/abs/2402.01162v1

Compressor summary: The text studies how large multimodal models can assess image quality using two-alternative forced choice prompting and introduces three evaluation criteria for this purpose.


Truncated Non-Uniform Quantization for Distributed SGD

http://arxiv.org/abs/2402.01160v1

Compressor summary: Our method improves distributed SGD's communication efficiency by using truncation and non-uniform quantization of gradients, with theoretical guarantees and optimal parameters derived for best performance.
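
A hedged sketch of the two stages named in the title: truncate (clip) gradients to a range, then quantize with non-uniformly spaced levels that are denser near zero, where gradient mass concentrates. The paper derives optimal truncation thresholds and levels; the ones below are arbitrary placeholders.

```python
# Clip each gradient coordinate, then snap it to the nearest of a set
# of cube-spaced quantization levels (denser around zero).
import numpy as np

def truncated_nonuniform_quantize(g, clip=1.0, bits=4):
    g = np.clip(g, -clip, clip)                       # truncation
    u = np.linspace(-1, 1, 2 ** bits)
    levels = clip * np.sign(u) * np.abs(u) ** 3       # non-uniform levels
    idx = np.abs(g[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]                                # dequantized values

g = np.random.default_rng(0).normal(0, 0.3, 8)
print(g)
print(truncated_nonuniform_quantize(g))
```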


LLM-Detector: Improving AI-Generated Chinese Text Detection with Open-Source LLM Instruction Tuning

http://arxiv.org/abs/2402.01158v1

Compressor summary: The paper introduces LLM-Detector, a method that uses instruction tuning of large language models to detect AI-generated texts at sentence and document levels, improving detection performance and generalization.


Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale

http://arxiv.org/abs/2402.01157v1

Compressor summary: SFUDA is a challenging task in which a model adapts to an unlabeled target domain without access to source data; the proposed method consolidates multiple hypotheses about prediction rationale and uses pseudo-labeling and semi-supervised learning for improved performance.


CABINET: Content Relevance based Noise Reduction for Table Question Answering

http://arxiv.org/abs/2402.01155v1

Compressor summary: CABINET (Content RelevAnce-Based NoIse ReductioN for TablE QuesTion-Answering) is a framework that helps large language models focus on relevant information in tables to improve question-answering performance and robustness.


AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents

http://arxiv.org/abs/2402.01152v1

Compressor summary: AccentFold is a method that uses spatial relationships between learned accent embeddings to improve speech recognition, especially for African accents.


Scale Equalization for Multi-Level Feature Fusion

http://arxiv.org/abs/2402.01149v1

Compressor summary: The text discusses a flaw in deep neural networks for semantic segmentation called scale disequilibrium caused by bilinear upsampling, and proposes scale equalizers to fix it.


Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

http://arxiv.org/abs/2402.01147v1

Compressor summary: ACHQ is a policy gradient algorithm that efficiently learns a soft threshold policy for routing jobs to heterogeneous servers, achieving convergence guarantees and improving performance over the greedy policy.


Limited Memory Online Gradient Descent for Kernelized Pairwise Learning with Dynamic Averaging

http://arxiv.org/abs/2402.01146v1

Compressor summary: The text introduces a scalable online gradient descent algorithm for pairwise learning that generalizes to nonlinear models and reduces complexity, performing better than existing methods on real-world data.
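
A hedged sketch of the limited-memory idea: each incoming example is paired only against a small buffer of past examples instead of the full history; the paper's kernelization and dynamic averaging are not reproduced, and the buffer update below is a simplified reservoir-style placeholder.

```python
# Online gradient descent on a hinge-style pairwise ranking loss,
# pairing each new example only against a bounded buffer.
import numpy as np

def ogd_pairwise(stream, buffer_size=16, lr=0.1, rng=None):
    rng = rng or np.random.default_rng()
    w, buffer = None, []
    for x, y in stream:
        if w is None:
            w = np.zeros_like(x)
        for xb, yb in buffer:
            if y == yb:
                continue
            sign = 1.0 if y > yb else -1.0
            if sign * w @ (x - xb) < 1:        # hinge margin violated
                w += lr * sign * (x - xb)
        if len(buffer) < buffer_size:          # bounded-memory buffer
            buffer.append((x, y))
        else:
            buffer[rng.integers(buffer_size)] = (x, y)
    return w

# toy usage: rank class-1 points above class-0 points
rng = np.random.default_rng(0)
data = [(rng.normal(size=5) + (i % 2), i % 2) for i in range(200)]
print(ogd_pairwise(iter(data)))
```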


Learning Network Representations with Disentangled Graph Auto-Encoder

http://arxiv.org/abs/2402.01143v1

Compressor summary: The paper proposes Disentangled Graph Auto-Encoder (DGA) and Disentangled Variational Graph Auto-Encoder (DVGA), methods that use generative models to learn independent latent factors for graph data, improving analysis and explanation of representations.


Root Cause Analysis In Microservice Using Neural Granger Causal Discovery

http://arxiv.org/abs/2402.01140v1

Compressor summary: RUN is a novel approach for root cause analysis in microservices using neural Granger causal discovery with contrastive learning, which captures temporal relationships and efficiently recommends top-k causes.


DeepAAT: Deep Automated Aerial Triangulation for Fast UAV-based Mapping

http://arxiv.org/abs/2402.01134v1

Compressor summary: DeepAAT is a deep learning network for automated aerial triangulation of UAV images, improving efficiency and accuracy in 3D reconstruction tasks by considering spatial and spectral imagery characteristics.


Seeing Objects in a Cluttered World: Computational Objectness from Motion in Video

http://arxiv.org/abs/2402.01126v1

Compressor summary: The text proposes a method to perceive objects in cluttered scenes using motion cues and spatio-temporal attention, which can handle real video data with blur and camera shake.


A Single Simple Patch is All You Need for AI-generated Image Detection

http://arxiv.org/abs/2402.01123v1

Compressor summary: The paper introduces SSP, a simple and effective method to detect AI-generated images by using noise patterns from a single patch.
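
A hedged sketch of the pipeline as the summary describes it: take a single patch, extract its high-frequency noise residual, and classify the residual. The patch-selection rule and classifier here are simple placeholders, not the paper's exact choices.

```python
# High-pass the patch (subtract a 3x3 box blur) to isolate the noise
# pattern, then hand the residual to a classifier.
import numpy as np

def noise_residual(patch):
    """High-pass residual: patch minus a 3x3 box-blurred copy."""
    padded = np.pad(patch, 1, mode="edge")
    blurred = sum(padded[i:i + patch.shape[0], j:j + patch.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    return patch - blurred

def detect(image, classifier, patch_size=32):
    patch = image[:patch_size, :patch_size]   # placeholder patch choice
    return classifier(noise_residual(patch))

# toy usage: residual energy as a stand-in for a trained classifier
img = np.random.default_rng(0).random((256, 256))
print(detect(img, classifier=lambda r: float(np.mean(r ** 2))))
```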


PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models

http://arxiv.org/abs/2402.01118v1

Compressor summary: PokéLLMon is a text-based AI that learns from human feedback and external knowledge to play Pokémon battles effectively and win against human players.


DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models

http://arxiv.org/abs/2402.01117v1

Compressor summary: The text proposes a two-stage fine-tuning method for the text-to-SQL task that reduces the gap between small open-source and large proprietary language models, addressing data privacy concerns.


Interpretation of Intracardiac Electrograms Through Textual Representations

http://arxiv.org/abs/2402.01115v1

Compressor summary: The authors propose using pretrained language models for interpreting and classifying atrial fibrillation electrograms, achieving competitive performance and providing interpretability studies to aid clinical decision-making.


Double-Dip: Thwarting Label-Only Membership Inference Attacks with Transfer Learning and Randomization

http://arxiv.org/abs/2402.01114v1

Compressor summary: Double-Dip combines transfer learning and randomization to protect overfitted DNNs from membership inference attacks, improving both privacy and accuracy.


Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints

http://arxiv.org/abs/2402.01111v1

Compressor summary: The paper proposes an algorithm for multi-agent reinforcement learning with adaptivity constraints that achieves a near-optimal trade-off between regret and batch complexity, and also extends to related problems like bandit games and reward-free MARL.


Vaccine: Perturbation-aware Alignment for Large Language Model

http://arxiv.org/abs/2402.01109v1

Compressor summary: The text discusses a new attack on large language models through finetuning, and proposes Vaccine, a technique to make models more robust against harmful data.


Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions

http://arxiv.org/abs/2402.01108v1

Compressor summary: The authors propose reasoning capacity as a unifying criterion for optimizing, evaluating, and improving multi-agent systems that integrate large language models for real-world tasks, addressing constraints like budget, resources, and time.


Simulation of Graph Algorithms with Looped Transformers

http://arxiv.org/abs/2402.01107v1

Compressor summary: This paper studies how looped transformer networks with extra attention heads can simulate graph algorithms and proves their ability to do so for any graph size, while also discussing limitations due to finite precision and achieving Turing Completeness with constant width.


A Survey for Foundation Models in Autonomous Driving

http://arxiv.org/abs/2402.01105v1

Compressor summary: The survey reviews how foundation models, which combine language and vision processing, improve autonomous driving by enhancing tasks like planning, simulation, object detection, and scenario creation.


Compositional Generative Modeling: A Single Model is Not All You Need

http://arxiv.org/abs/2402.01103v1

Compressor summary: The paper proposes a compositional approach to large generative models, improving data efficiency, generalization, and programmability.


Bayesian Deep Learning for Remaining Useful Life Estimation via Stein Variational Gradient Descent

http://arxiv.org/abs/2402.01098v1

Compressor summary: The authors use Stein variational gradient descent to train Bayesian deep learning models for predictive maintenance, achieving better performance and uncertainty estimates than other methods.


Let's Negotiate! A Survey of Negotiation Dialogue Systems

http://arxiv.org/abs/2402.01097v1

Compressor summary: The text reviews recent studies on negotiation dialogue systems, which aim to create intelligent agents that help people negotiate, and discusses future directions for this research area.


Trustworthy Distributed AI Systems: Robustness, Privacy, and Governance

http://arxiv.org/abs/2402.01096v1

Compressor summary: The paper reviews techniques and algorithms for ensuring security, privacy, and fairness in distributed AI systems, which have many economic and societal benefits but also new risks.


How many views does your deep neural network use for prediction?

http://arxiv.org/abs/2402.01095v1

Compressor summary: The paper introduces Minimal Sufficient Views (MSVs), a method to efficiently estimate a set of distinct features in an input that preserve a model's prediction, and shows its relationship with prediction accuracy across various deep neural networks.


Specialized Language Models with Cheap Inference from Limited Domain Data

http://arxiv.org/abs/2402.01093v1

Compressor summary: This work analyzes the trade-offs between different variables when applying large language models to tasks with limited resources and proposes alternative approaches to improve performance.


Reading Between the Tweets: Deciphering Ideological Stances of Interconnected Mixed-Ideology Communities

http://arxiv.org/abs/2402.01091v1

Compressor summary: This paper proposes a new approach to analyze the nuanced views of online communities discussing the 2020 U.S. election on Twitter, using message passing to finetune language models for probing ideologies.


Recent Advances in Predictive Modeling with Electronic Health Records

http://arxiv.org/abs/2402.01077v1

Compressor summary: This paper surveys recent advances in using deep learning to create predictive models from electronic health records data in healthcare.


DoseGNN: Improving the Performance of Deep Learning Models in Adaptive Dose-Volume Histogram Prediction through Graph Neural Networks

http://arxiv.org/abs/2402.01076v1

Compressor summary: The paper proposes efficient deep learning models for DVH prediction in radiotherapy using graph neural networks, which can handle different input images and improve performance.


Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities

http://arxiv.org/abs/2402.01071v1

Compressor summary: Chameleon is a system that uses generative AI and large language models to augment data sets with synthetic tuples and reduce under-representation of minorities, especially in multi-modal settings.