arxiv compressed, 2024-02-07

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-07, generated by the compressor, my personal LLM-based project.


AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

http://arxiv.org/abs/2402.04253v1

Compressor summary: AnyTool is a large language model agent that uses over 16,000 APIs to solve user queries and outperforms previous models like ToolLLM.


EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

http://arxiv.org/abs/2402.04252v1

Compressor summary: The authors present EVA-CLIP-18B, a powerful open-source CLIP model with 18 billion parameters that outperforms other CLIP models on image classification benchmarks using a smaller training dataset.


Linear-time Minimum Bayes Risk Decoding with Reference Aggregation

http://arxiv.org/abs/2402.04251v1

Compressor summary: This paper proposes a method to make MBR decoding cheaper and faster by using aggregated reference representations instead of pairwise calculations.
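
Loosely, the speed-up comes from replacing the O(n^2) pairwise utility computation of standard sampling-based MBR with a single comparison per hypothesis against an aggregated reference representation. A minimal sketch, assuming an embedding-based utility (all names are illustrative, not the paper's code):

```python
import numpy as np

def mbr_pairwise(hyp_embs, utility):
    # Standard sampling-based MBR: score every hypothesis against every
    # pseudo-reference -> O(n^2) utility calls for n samples.
    n = len(hyp_embs)
    scores = [sum(utility(hyp_embs[i], hyp_embs[j]) for j in range(n)) / n
              for i in range(n)]
    return int(np.argmax(scores))

def mbr_aggregated(hyp_embs, utility):
    # Reference aggregation: compare each hypothesis once against a single
    # aggregated representation (here a mean embedding) -> O(n) utility calls.
    ref = hyp_embs.mean(axis=0)
    scores = [utility(h, ref) for h in hyp_embs]
    return int(np.argmax(scores))

cosine = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
hyps = np.random.randn(8, 16)  # toy embeddings of 8 sampled hypotheses
print(mbr_pairwise(hyps, cosine), mbr_aggregated(hyps, cosine))
```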


HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

http://arxiv.org/abs/2402.04249v1

Compressor summary: HarmBench is a framework for evaluating automated red teaming methods against large language models, uncovering risks and improving LLM robustness.


Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

http://arxiv.org/abs/2402.04248v1

Compressor summary: State-space models (SSMs) perform similarly to Transformers in standard regression tasks and better in sparse parity learning, but struggle with non-standard retrieval tasks; a hybrid model improves their performance across tasks.


CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers

http://arxiv.org/abs/2402.04239v1

Compressor summary: CAST is a novel self-attention mechanism that uses learnable surrogate tokens to cluster input sequences and reduce quadratic complexity, improving efficiency and performance on long-range sequence modeling tasks.


CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

http://arxiv.org/abs/2402.04236v1

Compressor summary: The paper introduces Chain of Manipulations, a mechanism that improves VLMs' visual reasoning by applying a series of operations to the visual input; CogCoM, a 17B VLM equipped with this mechanism, achieves state-of-the-art performance on complex visual tasks and produces more faithful responses.


Can Generative Agents Predict Emotion?

http://arxiv.org/abs/2402.04232v1

Compressor summary: The authors propose a novel architecture for LLMs to understand new experiences in context by comparing them to past memories, aiming to improve their emotional alignment with humans.


MusicRL: Aligning Music Generation to Human Preferences

http://arxiv.org/abs/2402.04229v1

Compressor summary: MusicRL is a system that generates music based on text inputs using reinforcement learning with human feedback, improving upon existing text-to-music models.


What is 'Typological Diversity' in NLP?

http://arxiv.org/abs/2402.04222v1

Compressor summary: The paper investigates how NLP research uses and defines 'typological diversity' in language selection, and suggests improving the criteria for measuring it.


Variational Shapley Network: A Probabilistic Approach to Self-Explaining Shapley values with Uncertainty Quantification

http://arxiv.org/abs/2402.04211v1

Compressor summary: The text introduces a new method to compute Shapley values for model explainability using a probabilistic framework and a latent embedding space, improving computation speed and handling uncertainty.


"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors

http://arxiv.org/abs/2402.04210v1

Compressor summary: The paper explores using large vision and language models as verifiers for robot tasks, evaluates their effectiveness, and suggests ways to integrate their feedback into policy refinement.


Acute kidney injury prediction for non-critical care patients: a retrospective external and internal validation study

http://arxiv.org/abs/2402.04209v1

Compressor summary: Deep learning and conventional machine learning models can predict progression to Stage 2 or higher acute kidney injury with high accuracy, especially when trained locally.


Human-Like Geometric Abstraction in Large Pre-trained Neural Networks

http://arxiv.org/abs/2402.04203v1

Compressor summary: The study shows that large pre-trained neural networks in AI exhibit more human-like abilities in processing complex, regular geometric shapes and their parts and relations, challenging previous claims of a fundamental difference between human and neural network geometric reasoning.


Instance by Instance: An Iterative Framework for Multi-instance 3D Registration

http://arxiv.org/abs/2402.04195v1

Compressor summary: The paper introduces a new iterative framework for multi-instance 3D registration, which improves accuracy by eliminating outliers and achieves state-of-the-art performance on synthetic and real datasets.


Gradient Coding in Decentralized Learning for Evading Stragglers

http://arxiv.org/abs/2402.04193v1

Compressor summary: The paper presents a new decentralized learning method that combines gossip-based averaging and gradient coding to handle stragglers and improve performance for strongly convex loss functions.


Reinforcement Learning with Ensemble Model Predictive Safety Certification

http://arxiv.org/abs/2402.04182v1

Compressor summary: The paper proposes a new algorithm that combines model-based deep RL with tube-based MPC to minimize safety constraint violations and enable real-world deployment of reinforcement learning on safety-critical tasks.


SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models

http://arxiv.org/abs/2402.04178v1

Compressor summary: The paper introduces SHIELD, a benchmark to evaluate multimodal large language models' ability to detect face spoofing and forgery using various questions and modalities.


Scaling Laws for Downstream Task Performance of Large Language Models

http://arxiv.org/abs/2402.04177v1

Compressor summary: This paper studies how pretraining data size and distribution alignment affect machine translation quality in large language models and provides guidelines for selecting suitable pretraining data.


Informed Reinforcement Learning for Situation-Aware Traffic Rule Exceptions

http://arxiv.org/abs/2402.04168v1

Compressor summary: The paper proposes Informed Reinforcement Learning, which uses a structured rulebook and situation-aware rewards to improve autonomous driving in complex scenarios.


Tempered Calculus for ML: Application to Hyperbolic Model Embedding

http://arxiv.org/abs/2402.04163v1

Compressor summary: The paper proposes a generalization of mathematical distortions used in machine learning (ML), focusing on properties related to metricity, hyperbolicity, and encoding, and applies it to improve hyperbolic embeddings for decision trees.


Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains

http://arxiv.org/abs/2402.04161v1

Compressor summary: The paper proposes a framework to study transformers' sequential modeling capabilities using Markov chains and analyzes the effect of data properties, architecture, and learnt distribution on their performance.


Harnessing the Plug-and-Play Controller by Prompting

http://arxiv.org/abs/2402.04160v1

Compressor summary: The paper presents a novel method for flexible attribute control in text generation using pre-trained language models with plug-and-play controllers and reinforcement learning, resulting in improved smoothness and attribute consistency.


Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction

http://arxiv.org/abs/2402.04154v1

Compressor summary: The paper proposes a "read-to-play" capability for artificial agents by using multimodal game instructions to improve their multitasking and generalization skills in Reinforcement Learning.


Advancing Legal Reasoning: The Integration of AI to Navigate Complexities and Biases in Global Jurisprudence with Semi-Automated Arbitration Processes (SAAPs)

http://arxiv.org/abs/2402.04140v1

Compressor summary: This study uses AI applications like SHIRLEY, SAM, and SARA to analyze court judgments from five countries, detect biases, and facilitate a fair arbitration process with human collaboration.


U-shaped Vision Mamba for Single Image Dehazing

http://arxiv.org/abs/2402.04139v1

Compressor summary: UVM-Net is a new, efficient single-image dehazing network that uses a Bi-SSM block to model long-range dependencies and overcome the limitations of Transformer architecture on resource-constrained devices.


OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning

http://arxiv.org/abs/2402.04129v1

Compressor summary: The paper proposes a regularization method for rehearsal-free class-incremental learning using virtual outliers and a simplified prompt-based approach with fewer parameters and lower cost, achieving comparable or better results than previous methods on ImageNet-R and CIFAR-100.


Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science

http://arxiv.org/abs/2402.04119v1

Compressor summary: The text introduces ChEBI-20-MM, a multi-modal benchmark to assess large language models' performance and knowledge acquisition in molecular science.


Behind the Screen: Investigating ChatGPT's Dark Personality Traits and Conspiracy Beliefs

http://arxiv.org/abs/2402.04110v1

Compressor summary: The paper investigates GPT-3.5 and GPT-4's dark personality traits and conspiracy beliefs using psychological tests and finds no significant differences between them, except for GPT-4's increased belief in information withholding.


Hierarchical Delay Attribution Classification using Unstructured Text in Train Management Systems

http://arxiv.org/abs/2402.04108v1

Compressor summary: The paper proposes a machine learning-based decision support for assigning delay attribution codes to train delays in Sweden, which performs better than a random classifier but not as well as the current manual method.


An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail Market

http://arxiv.org/abs/2402.04103v1

Compressor summary: The paper develops a customer segmentation model for online retail using a UK dataset and shows that the Gaussian mixture model performs best.


VRMM: A Volumetric Relightable Morphable Head Model

http://arxiv.org/abs/2402.04101v1

Compressor summary: The paper presents a novel volumetric and parametric facial prior (VRMM) for 3D face modeling that efficiently disentangles and encodes identity, expression, and lighting into low-dimensional representations, enabling relighting capabilities and various applications.


Analysis of Deep Image Prior and Exploiting Self-Guidance for Image Reconstruction

http://arxiv.org/abs/2402.04097v1

Compressor summary: The text describes a study on deep image prior, its limitations, and a proposed self-driven method to improve image restoration without needing reference images or supervision.


The Use of a Large Language Model for Cyberbullying Detection

http://arxiv.org/abs/2402.04088v1

Compressor summary: The paper explores the use of large language models like RoBERTa for detecting cyberbullying in social media, showing its superior performance over other models.


A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation

http://arxiv.org/abs/2402.04087v1

Compressor summary: The paper proposes a fast and efficient method using Gaussian Discriminant Analysis to enhance CLIP's performance in various tasks without additional training time or resources.
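
For readers unfamiliar with the baseline: Gaussian Discriminant Analysis with a shared covariance yields a closed-form linear classifier over pre-extracted CLIP features, so no gradient training is needed. A minimal sketch, assuming uniform class priors; the paper's exact recipe (e.g. any ensembling with the zero-shot classifier) is not shown:

```python
import numpy as np

def fit_gda(feats, labels, num_classes):
    # Class means plus a pooled (shared) covariance give linear
    # decision functions in closed form -- no iterative training.
    d = feats.shape[1]
    means = np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])
    centered = feats - means[labels]
    cov = centered.T @ centered / len(feats) + 1e-4 * np.eye(d)  # regularized
    precision = np.linalg.inv(cov)
    W = means @ precision                                         # (C, d) weights
    b = -0.5 * np.einsum('cd,de,ce->c', means, precision, means)  # (C,) biases
    return W, b

def predict(W, b, feats):
    return (feats @ W.T + b).argmax(axis=1)
```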


Provably learning a multi-head attention layer

http://arxiv.org/abs/2402.04084v1

Compressor summary: The text introduces a study on provably learning a multi-head attention layer from random examples, providing algorithms, upper and lower bounds, and analyzing different settings.


An Optimal House Price Prediction Algorithm: XGBoost

http://arxiv.org/abs/2402.04082v1

Compressor summary: The text discusses using machine learning techniques to predict house prices in Ames, Iowa, USA, and finds that XGBoost performs best.


Improved Generalization of Weight Space Networks via Augmentations

http://arxiv.org/abs/2402.04081v1

Compressor summary: The paper analyzes the overfitting problem in deep weight space models and proposes a MixUp method for data augmentation to increase diversity and improve performance.


Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning

http://arxiv.org/abs/2402.04080v1

Compressor summary: The paper introduces a diffusion policy with entropy regularization and Q-ensembles for offline reinforcement learning, achieving state-of-the-art results on D4RL benchmarks.


Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models

http://arxiv.org/abs/2402.04075v1

Compressor summary: The study presents a teacher-student architecture using LLMs to improve prostate cancer radiotherapy symptom extraction from clinical notes, achieving significant improvements in accuracy and other metrics.


Retrieve to Explain: Evidence-driven Predictions with Language Models

http://arxiv.org/abs/2402.04068v1

Compressor summary: R2E is a retrieval-based language model that explains its predictions using Shapley values and can adapt to new evidence without retraining, improving drug target identification from scientific literature.


Multi-class Road Defect Detection and Segmentation using Spatial and Channel-wise Attention for Autonomous Road Repairing

http://arxiv.org/abs/2402.04064v1

Compressor summary: The authors propose a novel end-to-end method for multi-class road defect detection and segmentation using attention blocks, a key capability for autonomous road repair systems, and show that it outperforms existing methods on a new dataset.


Link Prediction with Relational Hypergraphs

http://arxiv.org/abs/2402.04062v1

Compressor summary: The paper proposes two frameworks for link prediction with relational hypergraphs using graph neural networks, analyzing their expressive power and empirically showing their effectiveness.


Deep Learning for Multivariate Time Series Imputation: A Survey

http://arxiv.org/abs/2402.04059v1

Compressor summary: The paper surveys deep learning methods for imputing missing values in multivariate time series data and evaluates their impact on downstream tasks, while providing a taxonomy and highlighting strengths and limitations.


More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms

http://arxiv.org/abs/2402.04054v1

Compressor summary: The text presents a new framework for analyzing and designing meta-learning methods using PAC-Bayesian theory, which allows more direct and flexible transfer of knowledge between tasks.


Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching

http://arxiv.org/abs/2402.04051v1

Compressor summary: The paper analyzes how weight matching (WM) helps identify linear mode connectivity (LMC) by aligning the directions of singular vectors with large singular values across models for effective model merging.


Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

http://arxiv.org/abs/2402.04050v1

Compressor summary: The paper proposes CraFT, a method to fine-tune black-box vision-language models using input prompts and output predictions, achieving significant improvements in few-shot classification with less memory and faster training.


Systematic Biases in LLM Simulations of Debates

http://arxiv.org/abs/2402.04049v1

Compressor summary: Large language models struggle to simulate human political debates due to inherent social biases that affect their behavior and deviation from established social dynamics.


On provable privacy vulnerabilities of graph representations

http://arxiv.org/abs/2402.04033v1

Compressor summary: This paper studies how sensitive information can be inferred through edge reconstruction attacks on graph neural models, and explores the effectiveness of a private graph representation method against such attacks.


Polyp-DDPM: Diffusion-Based Semantic Polyp Synthesis for Enhanced Segmentation

http://arxiv.org/abs/2402.04031v1

Compressor summary: Polyp-DDPM is a diffusion-based method that generates realistic images of polyps using segmentation masks, improving image quality and polyp segmentation performance.


Reducing the Cost of Quantum Chemical Data By Backpropagating Through Density Functional Theory

http://arxiv.org/abs/2402.04030v1

Compressor summary: The text describes how neural networks can be trained directly with the energy function of density functional theory to predict molecular properties faster and more efficiently than previous methods.


Positive concave deep equilibrium models

http://arxiv.org/abs/2402.04029v1

Compressor summary: pcDEQ models improve deep equilibrium models by ensuring existence, uniqueness, and stability of the fixed point through nonnegative and concave constraints, with theoretical convergence guarantees.


AlbNews: A Corpus of Headlines for Topic Modeling in Albanian

http://arxiv.org/abs/2402.04028v1

Compressor summary: AlbNews is a new text corpus for Albanian news headlines that can be used for research in topic modeling and machine learning, with initial classification scores reported.


Google Translate Error Analysis for Mental Healthcare Information: Evaluating Accuracy, Comprehensibility, and Implications for Multilingual Healthcare Communication

http://arxiv.org/abs/2402.04023v1

Compressor summary: The study evaluates Google Translate's accuracy and comprehensibility for translating mental health information into different languages and finds challenges in medical terminology, fluency, and formatting.


Privacy Leakage on DNNs: A Survey of Model Inversion Attacks and Defenses

http://arxiv.org/abs/2402.04013v1

Compressor summary: This paper provides a comprehensive survey of MI attacks and defenses on DNNs, covering various modalities and learning tasks.


Efficient Availability Attacks against Supervised and Contrastive Learning Simultaneously

http://arxiv.org/abs/2402.04010v1

Compressor summary: The paper proposes new methods to protect data from unauthorized use by making it hard for both supervised and contrastive learning algorithms to learn from it.


Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning

http://arxiv.org/abs/2402.04009v1

Compressor summary: LAST is a method that finetunes pretrained models efficiently by disentangling trainable modules from the frozen model using low-rank self-attention, reducing GPU memory consumption and training time.


Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning

http://arxiv.org/abs/2402.04005v1

Compressor summary: The paper proposes a new gradient aggregation method for multi-task learning that considers uncertainty in gradient dimensions using Bayesian inference, leading to improved performance.


Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought

http://arxiv.org/abs/2402.04004v1

Compressor summary: This study examines how different types and intensities of noise in chain of thought traces affect performance of large language models on algorithmically solvable tasks using a custom framework to generate noisy execution traces.


Gradient Sketches for Training Data Attribution and Studying the Loss Landscape

http://arxiv.org/abs/2402.03994v1

Compressor summary: The text discusses the importance of random projections for storing many vectors while maintaining accurate geometry information in neural networks, and proposes a design space for scalable sketching algorithms.


Neural Rank Collapse: Weight Decay and Small Within-Class Variability Yield Low-Rank Bias

http://arxiv.org/abs/2402.03991v1

Compressor summary: The paper investigates low-rank bias and neural rank collapse in nonlinear deep networks, showing that increasing weight decay leads to lower layer ranks proportional to hidden-space variability.


YOLOPoint Joint Keypoint and Object Detection

http://arxiv.org/abs/2402.03989v1

Compressor summary: YOLOPoint is a fast and accurate neural network model that detects keypoints and objects in images for GNSS-independent navigation of intelligent vehicles.


A Bias-Variance Decomposition for Ensembles over Multiple Synthetic Datasets

http://arxiv.org/abs/2402.03985v1

Compressor summary: Using multiple synthetic datasets for supervised learning improves accuracy and model selection, especially for high-variance predictors, according to a new theoretical analysis.
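
The recipe the analysis supports is simple: train one predictor per synthetic dataset and average their outputs, which shrinks the variance term of the bias-variance decomposition. A hedged sketch (the generator producing the synthetic datasets is assumed given; the model choice is illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ensemble_over_synthetic(synthetic_datasets, X_test):
    # One high-variance learner per synthetic dataset; averaging their
    # predictions reduces variance across the generator's randomness.
    preds = [DecisionTreeRegressor().fit(X, y).predict(X_test)
             for X, y in synthetic_datasets]
    return np.mean(preds, axis=0)
```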


Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting

http://arxiv.org/abs/2402.03981v1

Compressor summary: The paper proposes a new method, Controllable Diffusion Trajectory (CDT), for predicting future vehicle trajectories in complex traffic scenarios using map information and social interactions with a conditional denoising diffusion model that generates diverse and realistic predictions.


Cross Entropy versus Label Smoothing: A Neural Collapse Perspective

http://arxiv.org/abs/2402.03979v1

Compressor summary: The paper explores how label smoothing affects deep neural networks' convergence, performance, and calibration using Neural Collapse theory and empirical evidence.
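
As a refresher on the object under study: label smoothing moves a fraction eps of the target probability mass from the true class to a uniform distribution over all classes, and eps = 0 recovers plain one-hot cross-entropy targets. A minimal sketch:

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    # (1 - eps) on the true class, eps spread uniformly over all classes.
    onehot = np.eye(num_classes)[y]
    return (1 - eps) * onehot + eps / num_classes

print(smooth_labels(np.array([2]), num_classes=4))  # [[0.025 0.025 0.925 0.025]]
```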


Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given Enough Time

http://arxiv.org/abs/2402.03973v1

Compressor summary: Humans recognize objects in unusual poses better than deep networks, but need extra time to do so, suggesting that the mental processes involved differ from feed-forward computation.


Tabular Data: Is Attention All You Need?

http://arxiv.org/abs/2402.03970v1

Compressor summary: The paper compares neural networks and transformers with decision trees and traditional MLPs on tabular data and finds that neural networks are competitive against decision trees, while transformers do not outperform simpler MLP variants.


In-context learning agents are asymmetric belief updaters

http://arxiv.org/abs/2402.03969v1

Compressor summary: The study investigates how large language models learn from feedback and shows that their learning is influenced by the problem's framing, similar to human cognition.


On dimensionality of feature vectors in MPNNs

http://arxiv.org/abs/2402.03966v1

Compressor summary: For any non-polynomial activation function, message-passing graph neural networks (MPNNs) can be equivalent to the Weisfeiler-Leman isomorphism test with constant-dimension feature vectors, unlike previous results that required higher dimensions depending on the graph size.
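
For context, the Weisfeiler-Leman test referenced here is 1-WL colour refinement: each node's colour is repeatedly re-hashed together with the multiset of its neighbours' colours. A minimal sketch of the test itself (the paper's constant-dimension construction is not reproduced):

```python
def wl_colours(adj, rounds=3):
    # adj: list of neighbour lists. Iteratively refine node colours by
    # hashing (own colour, sorted multiset of neighbour colours).
    colours = [0] * len(adj)
    for _ in range(rounds):
        colours = [hash((colours[v], tuple(sorted(colours[u] for u in adj[v]))))
                   for v in range(len(adj))]
    return colours

# Two non-isomorphic graphs that 1-WL distinguishes:
path = [[1], [0, 2], [1]]            # path on 3 nodes
triangle = [[1, 2], [0, 2], [0, 1]]  # 3-cycle
print(sorted(wl_colours(path)) != sorted(wl_colours(triangle)))  # True
```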


Position Paper: Against Spurious Sparks-Dovelating Inflated AI Claims

http://arxiv.org/abs/2402.03962v1

Compressor summary: The paper discusses how anthropomorphism affects Machine Learning research, leading to over-attribution of human-like qualities to Large Language Models, and calls for academic caution and integrity in interpreting AI results.


Sparse Graph Representations for Procedural Instructional Documents

http://arxiv.org/abs/2402.03957v1

Compressor summary: The paper proposes two methods to improve document similarity computation by using directed and sparse graphs that capture sequential information, achieving better results than a traditional undirected graph approach.


Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping

http://arxiv.org/abs/2402.03951v1

Compressor summary: The paper proposes a new attacking strategy, DeCoWA, that can effectively transfer adversarial examples across different model genera, such as CNNs and Transformers.


Discovery of the Hidden World with Large Language Models

http://arxiv.org/abs/2402.03941v1

Compressor summary: COAT is a tool that uses large language models to help discover hidden causal variables from raw observational data, and then uses a causal learning module to provide explanations and feedback for improvement.


Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

http://arxiv.org/abs/2402.03927v1

Compressor summary: The paper analyzes data contamination in OpenAI's GPT-3.5 and GPT-4 models, finding they received ~4.7M samples from 263 benchmarks, and exposing evaluation malpractices.


Return-Aligned Decision Transformer

http://arxiv.org/abs/2402.03923v1

Compressor summary: RADT is a new model for offline reinforcement learning that improves the alignment between the actual and target returns by decoupling them from the input sequence.


Large Language Models to Enhance Bayesian Optimization

http://arxiv.org/abs/2402.03921v1

Compressor summary: LLMs can enhance BO by proposing promising solutions based on historical evaluations, improving surrogate modeling and candidate sampling in the early stages of search.


Elastic Feature Consolidation for Cold Start Exemplar-free Incremental Learning

http://arxiv.org/abs/2402.03917v1

Compressor summary: Elastic Feature Consolidation (EFC) is a method that improves Exemplar-Free Class Incremental Learning (EFCIL) by consolidating features, regularizing drift, and balancing prototype rehearsal for cold start scenarios.


Learning Metrics that Maximise Power for Accelerated A/B-Tests

http://arxiv.org/abs/2402.03915v1

Compressor summary: The paper proposes learning metrics from short-term signals to improve the statistical power and reduce the cost of online controlled experiments in technology companies.


EscherNet: A Generative Model for Scalable View Synthesis

http://arxiv.org/abs/2402.03908v1

Compressor summary: EscherNet is a diffusion model that learns implicit 3D representations and can synthesize multiple views with high quality and flexibility, outperforming existing methods in various tasks.


Employee Turnover Analysis Using Machine Learning Algorithms

http://arxiv.org/abs/2402.03905v1

Compressor summary: The paper explores using machine learning to predict employee turnover and its impact on organizational knowledge.


Deep MSFOP: Multiple Spectral filter Operators Preservation in Deep Functional Maps for Unsupervised Shape Matching

http://arxiv.org/abs/2402.03904v1

Compressor summary: The paper proposes a new method for shape matching using functional maps that preserves multiple spectral filter operators, which leads to more informative and stable results, outperforming existing methods.


Compound Returns Reduce Variance in Reinforcement Learning

http://arxiv.org/abs/2402.03903v1

Compressor summary: The text introduces compound returns, a method to reduce variance in multistep reinforcement learning by using weighted averages of n-step returns and two-bootstrap returns, which improve sample efficiency with minimal extra cost.
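
Concretely, an n-step return bootstraps from the value estimate n steps ahead, and a compound return is a convex combination of several n-step returns; exponentially decaying weights recover the familiar lambda-return. An illustrative sketch, not the paper's exact estimator:

```python
def n_step_return(rewards, values, t, n, gamma):
    # G_t^(n) = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * V(s_{t+n})
    G = sum(gamma**k * rewards[t + k] for k in range(n))
    return G + gamma**n * values[t + n]

def compound_return(rewards, values, t, weights, gamma=0.99):
    # weights: {n: w_n} with the w_n summing to one (a convex combination).
    assert abs(sum(weights.values()) - 1.0) < 1e-8
    return sum(w * n_step_return(rewards, values, t, n, gamma)
               for n, w in weights.items())

# Averaging just two n-step returns already blends short- and long-horizon info:
rewards, values = [1.0] * 10, [0.5] * 11
print(compound_return(rewards, values, t=0, weights={1: 0.5, 4: 0.5}))
```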


A phase transition between positional and semantic learning in a solvable model of dot-product attention

http://arxiv.org/abs/2402.03902v1

Compressor summary: The paper studies how a neural network layer learns to attend to tokens based on their positions or meanings, and shows that it can learn either mechanism depending on the data size and quality.


Pro-HAN: A Heterogeneous Graph Attention Network for Profile-Based Spoken Language Understanding

http://arxiv.org/abs/2402.03900v1

Compressor summary: The paper proposes Pro-HAN, a method that uses a heterogeneous graph attention network to reason across multiple types of profile information for spoken language understanding tasks.


DistiLLM: Towards Streamlined Distillation for Large Language Models

http://arxiv.org/abs/2402.03898v1

Compressor summary: DistiLLM is a new knowledge distillation framework for language models that uses a novel skew divergence loss and an adaptive off-policy approach to compress teacher models, reduce inference costs, and achieve significant speedups.


Convincing Rationales for Visual Question Answering Reasoning

http://arxiv.org/abs/2402.03896v1

Compressor summary: The text introduces CRVQA, a method to generate visual and textual rationales for VQA answers, which improves accuracy and trust in the predictions.


Shifting social norms as a driving force for linguistic change: Struggles about language and gender in the German Bundestag

http://arxiv.org/abs/2402.03887v1

Compressor summary: The paper examines how language and gender have been a recurring issue in the German Bundestag since the 1980s, using examples of linguistic practices related to gender inclusivity and discussing their implications for the current debate on gender-inclusive language.


MOMENT: A Family of Open Time-series Foundation Models

http://arxiv.org/abs/2402.03885v1

Compressor summary: MOMENT is a family of open-source time-series foundation models that overcomes challenges in pre-training and evaluation, and shows effectiveness on diverse tasks with minimal data.


Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models

http://arxiv.org/abs/2402.03877v1

Compressor summary: This paper explores the challenges large language models face in constructive geometric problem-solving and proposes a framework to enhance their reasoning abilities using an internal dialogue among specialized agents.


Less than one percent of words would be affected by gender-inclusive language in German press texts

http://arxiv.org/abs/2402.03870v1

Compressor summary: Our study finds that rewriting non-gender-inclusive texts in German to be gender-inclusive would require changing less than 1% of all tokens on average, challenging the arguments that gender-inclusive language makes texts too long or negatively affects language learners.


The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks

http://arxiv.org/abs/2402.03864v1

Compressor summary: The text discusses how the Neural Tangent Kernel perspective can help analyze Physics-Informed Neural Networks for solving nonlinear Partial Differential Equations, highlighting the benefits of using second-order methods and addressing their challenges with numerical examples and validation.


Position Paper: Toward New Frameworks for Studying Model Representations

http://arxiv.org/abs/2402.03855v1

Compressor summary: The text discusses the importance of studying hidden representations in neural networks for mechanistic interpretability and argues that current methods are insufficient for this purpose.


ANLS* -- A Universal Document Processing Metric for Generative Large Language Models

http://arxiv.org/abs/2402.03848v1

Compressor summary: The paper introduces ANLS*, a new metric for evaluating generative large language models (GLLMs) on various tasks, and shows that SFT prompting technique outperforms others in most cases.


Efficient Generation of Hidden Outliers for Improved Outlier Detection

http://arxiv.org/abs/2402.03846v1

Compressor summary: BISECT is a new outlier generation method that creates realistic outliers with 'multiple views' property, improving outlier detection in diverse datasets.


On gauge freedom, conservativity and intrinsic dimensionality estimation in diffusion models

http://arxiv.org/abs/2402.03845v1

Compressor summary: The authors analyze the vector field of diffusion models, which can be either conservative or not, and show its impact on density estimation and sampling performance.


A new method for optical steel rope non-destructive damage detection

http://arxiv.org/abs/2402.03843v1

Compressor summary: The paper introduces an algorithm using RGBD-UNet and VovNetV3.5 models to detect damage in steel ropes in high-altitude environments with improved accuracy and background augmentation.


An SVD-free Approach to Nonlinear Dictionary Learning based on RVFL

http://arxiv.org/abs/2402.03833v1

Compressor summary: The paper proposes a new nonlinear dictionary learning algorithm based on a feed-forward neural network called Random Vector Functional Link, which learns a sparse-to-dense feature map and incorporates higher-order dependencies between input coefficients and dictionary atoms, achieving good performance in image classification and reconstruction tasks.


Rethinking Skill Extraction in the Job Market Domain using Large Language Models

http://arxiv.org/abs/2402.03832v1

Compressor summary: The paper explores using large language models for skill extraction, which can handle complex skill mentions better than supervised models.


OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

http://arxiv.org/abs/2402.03830v1

Compressor summary: OASim is an open and adaptive autonomous driving data generator using implicit neural rendering to create high-quality, customizable, and diverse datasets for training algorithms efficiently and safely.


Estimating Barycenters of Distributions with Neural Optimal Transport

http://arxiv.org/abs/2402.03828v1

Compressor summary: The paper proposes a new scalable method to find an average distribution from multiple probability measures using Neural OT solver and shows its advantages and error bounds in various scenarios.
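
In standard notation, the fixed-weight barycenter problem such solvers target is the following (the paper's neural parametrization of the candidate measure is not shown):

```latex
% Given measures \mu_1, \dots, \mu_K and weights \lambda_k \ge 0 with \sum_k \lambda_k = 1:
\bar{\mu} \;=\; \operatorname*{arg\,min}_{\nu} \; \sum_{k=1}^{K} \lambda_k \, \mathrm{OT}_c\!\left(\nu, \mu_k\right)
```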


A call for embodied AI

http://arxiv.org/abs/2402.03824v1

Compressor summary: Embodied AI is a new approach to artificial intelligence that emphasizes perception, action, memory, and learning as essential components of an embodied agent, aiming to achieve Artificial General Intelligence through cognitive architectures and active inference principles.


RevOrder: A Novel Method for Enhanced Arithmetic in Language Models

http://arxiv.org/abs/2402.03822v1

Compressor summary: RevOrder is a technique that improves arithmetic operations in large language models by reversing output digits, reducing complexity, and enhancing performance especially with division tasks.


Asymptotic generalization error of a single-layer graph convolutional network

http://arxiv.org/abs/2402.03818v1

Compressor summary: The article analyzes the generalization performance of graph convolutional networks on data from attributed stochastic block models in different settings and compares their convergence rates to the Bayes-optimal rate.


Masked Graph Autoencoder with Non-discrete Bandwidths

http://arxiv.org/abs/2402.03814v1

Compressor summary: The paper proposes a new graph self-supervised learning method using continuous edge masks that improve message propagation and node classification on graph neural networks.


SEABO: A Simple Search-Based Method for Offline Imitation Learning

http://arxiv.org/abs/2402.03807v1

Compressor summary: SEABO is a search-based method for offline imitation learning that learns a reward function from expert and unlabeled data, achieving competitive performance to offline RL with ground-truth rewards and outperforming prior methods.


ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs

http://arxiv.org/abs/2402.03804v1

Compressor summary: Sparse computation using non-ReLU activation functions improves efficiency and performance of Large Language Models, with ReLU^2 being the best choice among tested functions.
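
ReLU^2 is simply the squared ReLU: it keeps exact zeros on the negative side (which is what enables activation sparsity) while smoothing the positive branch. A one-liner, assuming PyTorch:

```python
import torch

def relu2(x: torch.Tensor) -> torch.Tensor:
    # Squared ReLU: exact zeros for x <= 0 (sparse activations),
    # quadratic growth for x > 0.
    return torch.relu(x) ** 2
```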


Face Detection: Present State and Research Directions

http://arxiv.org/abs/2402.03796v1

Compressor summary: The review paper discusses the current state and future challenges of face detection in computer vision applications involving humans.


Energy-based Domain-Adaptive Segmentation with Depth Guidance

http://arxiv.org/abs/2402.03795v1

Compressor summary: SMART is a novel framework for unsupervised domain adaptation in semantic segmentation that uses Energy-Based Models to reduce discrepancy between semantic and depth features and assess feature fusion reliability.


No-Regret Reinforcement Learning in Smooth MDPs

http://arxiv.org/abs/2402.03792v1

Compressor summary: The paper proposes two algorithms for no-regret reinforcement learning in continuous state and action spaces based on a novel structural assumption called u-smoothness.


Weakly Supervised Anomaly Detection via Knowledge-Data Alignment

http://arxiv.org/abs/2402.03785v1

Compressor summary: KDAlign is a novel framework that uses rule knowledge from experts and Optimal Transport technique to improve weakly supervised anomaly detection accuracy on web-based applications.


AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction

http://arxiv.org/abs/2402.03784v1

Compressor summary: AirPhyNet combines physics principles and neural networks for better air quality prediction and understanding.


Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning

http://arxiv.org/abs/2402.03783v1

Compressor summary: The paper proposes MedPrompt, a weakly supervised method to automatically generate medical text prompts for vision-language models, reducing the need for manual annotations and expert input in medical image recognition tasks.


Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More

http://arxiv.org/abs/2402.03782v1

Compressor summary: The paper studies soft prompt tuning (SPT) for cross-lingual transfer by training only the learnable embeddings without modifying the model parameters, reducing costs and improving performance for linguistically distant languages.


Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification

http://arxiv.org/abs/2402.03780v1

Compressor summary: The paper presents the PPN dataset of propaganda news articles and tests various NLP techniques to identify their stylistic features.


Large Language Models As MOOCs Graders

http://arxiv.org/abs/2402.03776v1

Compressor summary: The study explores using large language models to replace peer grading in online courses, showing promising results when combined with instructor guidance and rubrics.


Learning a Decision Tree Algorithm with Transformers

http://arxiv.org/abs/2402.03774v1

Compressor summary: MetaTree is a transformer-based model that produces high-quality decision trees for classification by learning from outputs of classical algorithms and adapting strategies based on context.


Reinforcement Learning from Bagged Reward: A Transformer-based Approach for Instance-Level Reward Redistribution

http://arxiv.org/abs/2402.03771v1

Compressor summary: RLBR is a new setting in which agents learn from rewards aggregated over bags of transitions, and RBT is a Transformer-based model that redistributes those bagged rewards to individual instances, helping agents explore and understand them better.


MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

http://arxiv.org/abs/2402.03766v1

Compressor summary: The authors present MobileVLM V2, an improved family of vision language models that perform well with fewer parameters than previous models.


MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction

http://arxiv.org/abs/2402.03762v1

Compressor summary: MoD-SLAM is a monocular dense mapping method for real-time global pose optimization and 3D reconstruction in unbounded scenes, overcoming the limitations of existing neural SLAM approaches.


Virtual Classification: Modulating Domain-Specific Knowledge for Multidomain Crowd Counting

http://arxiv.org/abs/2402.03758v1

Compressor summary: The study presents MDKNet, a method to handle domain bias in multidomain crowd counting by modulating the information flow and learning a domain-separable latent space.


The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs

http://arxiv.org/abs/2402.03757v1

Compressor summary: The paper introduces CorrelationQA, a benchmark to measure how well multi-modal language models can resist being fooled by spurious images that are relevant but inconsistent with the correct answers.


QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language Model

http://arxiv.org/abs/2402.03755v1

Compressor summary: The paper presents a framework to improve LLM-based agents for specialized domains like quantitative investment by iteratively refining their knowledge base from real-world scenarios and demonstrates its effectiveness with an agent named QuantAgent.


Intensive Vision-guided Network for Radiology Report Generation

http://arxiv.org/abs/2402.03754v1

Compressor summary: The paper proposes an automatic radiology report generation model that better mimics clinicians' reasoning by integrating multi-view vision perception and multi-modal information, and shows its superior performance on two datasets.


Enhanced sampling of robust molecular datasets with uncertainty-based collective variables

http://arxiv.org/abs/2402.03753v1

Compressor summary: The proposed method uses uncertainty as a collective variable to guide the acquisition of chemically-relevant data points for improving machine learned interatomic potentials.


Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images

http://arxiv.org/abs/2402.03752v1

Compressor summary: The report shows that a lightweight Vision Transformer can outperform Convolutional Neural Networks on small image resolutions and datasets with minimal scaling and pre-training using a masked auto-encoder technique.


Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning Approach

http://arxiv.org/abs/2402.03750v1

Compressor summary: The text introduces a new method, Digital Twin Mobility Profiling (DTMP), which uses alignment diagrams and a specialized network to learn spatio-temporal patterns in urban traffic data for intelligent transportation systems.


Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

http://arxiv.org/abs/2402.03749v1

Compressor summary: The paper proposes a new loss function for weak-to-strong supervision, which improves the performance of vision foundation models in various scenarios, outperforming strong-to-strong and fine-tuning methods.


An invariance constrained deep learning network for PDE discovery

http://arxiv.org/abs/2402.03747v1

Compressor summary: The study proposes a deep learning network (ICNet) to discover partial differential equations (PDEs) from sparse data with high noise by incorporating Galilean invariance and other physical constraints, achieving excellent results for fluid mechanics and wave equations.


Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback

http://arxiv.org/abs/2402.03746v1

Compressor summary: The paper introduces a new method for aligning video and text modalities using reinforcement learning from AI feedback, which improves performance on various video benchmarks.


INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection

http://arxiv.org/abs/2402.03744v1

Compressor summary: The text proposes a new method to detect when large language models make mistakes by analyzing their internal states and using a simple metric called EigenScore.
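
The summary names EigenScore, a spectrum-based consistency measure over the internal embeddings of several sampled responses. A heavily hedged sketch of the idea follows; the paper's exact definition may differ:

```python
import numpy as np

def eigen_style_score(embs, alpha=1e-3):
    # embs: (K, d) internal embeddings of K sampled responses to one query.
    # A broad spectrum of the regularized Gram matrix (large mean log-eigenvalue)
    # means the responses disagree semantically -- a hallucination signal.
    K = len(embs)
    centered = embs - embs.mean(axis=0)
    gram = centered @ centered.T / K + alpha * np.eye(K)
    return float(np.mean(np.log(np.linalg.eigvalsh(gram))))
```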


SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems

http://arxiv.org/abs/2402.03741v1

Compressor summary: The paper proposes a novel black-box attack (SUB-PLAY) that exploits the vulnerabilities of multi-agent systems under partial observability, inducing significant changes in their policy networks and posing security threats.


AoSRNet: All-in-One Scene Recovery Networks via Multi-knowledge Integration

http://arxiv.org/abs/2402.03738v1

Compressor summary: The paper proposes a scene recovery network (AoSRNet) that improves low-visibility imaging by integrating multiple techniques to enhance contrast, color, and texture in challenging conditions like haze, dust, and low light.


Differentially Private High Dimensional Bandits

http://arxiv.org/abs/2402.03737v1

Compressor summary: PrivateLASSO is a differentially private LASSO bandit algorithm that works well for high-dimensional stochastic contextual linear bandit problems with sparse parameters and privacy constraints.


Deep Outdated Fact Detection in Knowledge Graphs

http://arxiv.org/abs/2402.03732v1

Compressor summary: DEAN is a deep learning framework that uses contrastive learning on a pre-defined graph to detect outdated facts in knowledge graphs effectively.


Consistent Joint Decision-Making with Heterogeneous Learning Models

http://arxiv.org/abs/2402.03728v1

Compressor summary: The paper presents a framework that improves decision-making consistency across heterogeneous models using external knowledge and global normalization based on integer linear programming (ILP).


Learning Granger Causality from Instance-wise Self-attentive Hawkes Processes

http://arxiv.org/abs/2402.03726v1

Compressor summary: The paper proposes a new deep learning framework, ISAHP, that can discover fine-grained causal relationships among events without strong assumptions or heuristics.


Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos

http://arxiv.org/abs/2402.03723v1

Compressor summary: This paper introduces Rig3DGS, a method to generate 3D human portraits from smartphone videos by using learned deformations guided by a 3D morphable model, which improves rendering quality and facial expression control.


Similarity-based Neighbor Selection for Graph LLMs

http://arxiv.org/abs/2402.03720v1

Compressor summary: SNS is a method that uses large language models to improve node classification in text-attributed graphs by selecting similar neighbors, leading to better graph representation and generalization.
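
The core operation is plain top-k retrieval: rank candidate neighbours by embedding similarity and keep the most similar ones for the prompt handed to the LLM. A minimal sketch (the names and the choice of cosine similarity are illustrative):

```python
import numpy as np

def select_neighbors(node_emb, cand_embs, k=5):
    # Cosine similarity between the target node and every candidate,
    # then keep the indices of the k most similar neighbours.
    sims = cand_embs @ node_emb / (
        np.linalg.norm(cand_embs, axis=1) * np.linalg.norm(node_emb) + 1e-9)
    return np.argsort(-sims)[:k]
```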


Empowering Language Models with Active Inquiry for Deeper Understanding

http://arxiv.org/abs/2402.03719v1

Compressor summary: The paper introduces LaMAI, a method that uses active learning to help large language models ask better questions and improve their responses in conversational contexts, leading to more accurate answers and better user experiences.


Attention-based Shape and Gait Representations Learning for Video-based Cloth-Changing Person Re-Identification

http://arxiv.org/abs/2402.03716v1

Compressor summary: The paper proposes a method to identify people across videos when they change clothes using gait and body shape features learned with graph attention networks, improving performance significantly.


Clarify: Improving Model Robustness With Natural Language Corrections

http://arxiv.org/abs/2402.03715v1

Compressor summary: Clarify is a method that uses natural language feedback to improve model training by correcting misconceptions.


SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images

http://arxiv.org/abs/2402.03708v1

Compressor summary: The paper introduces SISP, a new benchmark dataset for ship instance segmentation in panchromatic satellite images, with well-annotated data and a proposed method (DFRInst) to improve performance on real-world scenes.


FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution

http://arxiv.org/abs/2402.03705v1

Compressor summary: TAGA attacks use natural perturbations like blur to manipulate image attributes in guided image synthesis methods, raising ethical concerns about their potential to contradict user intentions.


Improving and Unifying Discrete&Continuous-time Discrete Denoising Diffusion

http://arxiv.org/abs/2402.03701v1

Compressor summary: The paper introduces USD3, a simplified and unified framework for training and sampling discrete diffusion models on discrete data like language and graphs, improving performance over existing methods.


Estimating the Local Learning Coefficient at Scale

http://arxiv.org/abs/2402.03698v1

Compressor summary: The local learning coefficient (LLC) measures model complexity and can be estimated accurately for deep linear networks using a method from singular learning theory.


SHMC-Net: A Mask-guided Feature Fusion Network for Sperm Head Morphology Classification

http://arxiv.org/abs/2402.03697v1

Compressor summary: SHMC-Net is a new method for sperm head morphology classification that uses segmentation masks to guide the analysis and improves accuracy on small datasets with noisy labels.


3Doodle: Compact Abstraction of Objects with 3D Strokes

http://arxiv.org/abs/2402.03690v1

Compressor summary: 3Doodle generates view-consistent sketches of 3D objects from multi-view images by optimizing 3D strokes that minimize perceptual losses and capture essential shapes, avoiding the inefficiency and subjectivity of free-hand sketching.


Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation

http://arxiv.org/abs/2402.03687v1

Compressor summary: Pard is a new graph generation model that combines autoregressive and diffusion methods for efficient and high-quality graph generation, achieving state-of-the-art results on various datasets.


Minds versus Machines: Rethinking Entailment Verification with Language Models

http://arxiv.org/abs/2402.03686v1

Compressor summary: The paper compares human and LLM inference judgments on a curated entailment verification benchmark, finding that LLMs are better at multi-hop reasoning while humans excel at simple deductive reasoning, and introduces a fine-tuned Flan-T5 model that outperforms GPT-3.5 and rivals GPT-4 in entailment verification and explanation generation.


Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents

http://arxiv.org/abs/2402.03678v1

Compressor summary: LSTS is a novel approach that learns a set of RL policies to guide an agent from an initial state to a goal state based on high-level task specifications, while minimizing the number of environmental interactions.


Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning

http://arxiv.org/abs/2402.03667v1

Compressor summary: The paper proposes a novel Indirect Reasoning method for LLMs that uses contrapositives and contradictions to improve their reasoning abilities in tasks like factual reasoning and mathematical proof.


QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

http://arxiv.org/abs/2402.03666v1

Compressor summary: The paper identifies three properties that make low-bit quantization of diffusion models difficult (imbalanced activation distributions, imprecise temporal information, and vulnerability to perturbations of specific modules) and proposes selective finetuning of the quantized model, achieving state-of-the-art image generation performance.


Efficient Solvers for Partial Gromov-Wasserstein

http://arxiv.org/abs/2402.03664v1

Compressor summary: The paper proposes two new solvers for the partial Gromov-Wasserstein problem, which enables comparison and matching of measures with unequal masses in different metric spaces, and shows their effectiveness on shape-matching and positive-unlabeled learning problems.


Symbol Correctness in Deep Neural Networks Containing Symbolic Layers

http://arxiv.org/abs/2402.03663v1

Compressor summary: The paper introduces Neurosymbolic Deep Neural Networks, which combine neural and symbolic layers for perception and reasoning tasks, and proposes the principle of symbol correctness for designing and analyzing these models.


Transductive Reward Inference on Graph

http://arxiv.org/abs/2402.03661v1

Compressor summary: The study proposes a method to estimate rewards for unlabelled data in offline reinforcement learning using graph-based reward inference and limited human annotations, improving task performance.


Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm

http://arxiv.org/abs/2402.03660v1

Compressor summary: The paper discovers that finetuned models from a common pretrained checkpoint have a linear relationship across tasks, which reveals new insights into model merging and editing.


Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models

http://arxiv.org/abs/2402.03659v1

Compressor summary: The SEP framework uses a self-reflective agent and PPO to train a LLM to generate explainable stock predictions without human annotations.


Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue

http://arxiv.org/abs/2402.03658v1

Compressor summary: The text introduces EDGE, a novel framework that incorporates utterance, video, and audio sentiment signals to generate natural language explanations for sarcastic dialogues, addressing challenges in previous models.


Operator SVD with Neural Networks via Nested Low-Rank Approximation

http://arxiv.org/abs/2402.03655v1

Compressor summary: This paper introduces a new method to learn eigenfunctions using neural networks that approximates low-rank matrices and preserves orthogonality efficiently.


Reviewing FID and SID Metrics on Generative Adversarial Networks

http://arxiv.org/abs/2402.03654v1

Compressor summary: This paper evaluates two metrics (FID and SID) for measuring the performance of image-to-image GANs and finds that SID might be more efficient and effective than FID.
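
For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. A standard computation, given feature means and covariances:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, cov1, mu2, cov2):
    # ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 * (C1 @ C2)^{1/2})
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2 * covmean))
```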


CAMBranch: Contrastive Learning with Augmented MILPs for Branching

http://arxiv.org/abs/2402.03647v1

Compressor summary: The text introduces CAMBranch, a framework that uses contrastive learning and variable shifting to improve machine learning-based branching policies for MILPs with limited expert data.


Lens: A Foundation Model for Network Traffic

http://arxiv.org/abs/2402.03646v1

Compressor summary: Lens is a network traffic model using T5 architecture that learns representations from large-scale unlabeled data and performs well in downstream tasks with less labeled data needed.


Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation

http://arxiv.org/abs/2402.03642v1

Compressor summary: The Stanceosaurus 2.0 dataset adds Russian and Spanish tweets to analyze cross-cultural and cross-lingual misinformation using stance classification.


torchmSAT: A GPU-Accelerated Approximation To The Maximum Satisfiability Problem

http://arxiv.org/abs/2402.03640v1

Compressor summary: The authors propose a new method to solve MaxSAT using a single differentiable function and a neural network architecture that works without training data or an underlying SAT solver, achieving better results than existing solvers.


BEAM: Beta Distribution Ray Denoising for Multi-view 3D Object Detection

http://arxiv.org/abs/2402.03634v1

Compressor summary: BEAM is a novel technique that improves multi-view 3D object detection by using Beta Distribution Ray Denoising to handle ambiguous depth information and achieve state-of-the-art results.


CAT-SAM: Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model

http://arxiv.org/abs/2402.03631v1

Compressor summary: CAT-SAM is a method that adapts SAM for various unconventional image segmentation tasks with few-shot target samples by tuning the mask decoder and image encoder together using a prompt bridge structure.


Disparate Impact on Group Accuracy of Linearization for Private Inference

http://arxiv.org/abs/2402.03629v1

Compressor summary: The paper explores how reducing non-linear activations in neural networks for privacy-preserving inference may harm minority groups' accuracy and proposes a mitigation strategy.


Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies

http://arxiv.org/abs/2402.03628v1

Compressor summary: PAgents are autonomous agents using large language models to develop expertise and provide professional services in various domains.


Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time

http://arxiv.org/abs/2402.03625v1

Compressor summary: The paper analyzes how two-layer ReLU networks with weight decay regularization and their convex relaxations perform on random training data, showing that a simple algorithm can solve the non-convex problem efficiently and that random initialization leads to low training loss for local gradient methods.


Neural Network Approximators for Marginal MAP in Probabilistic Circuits

http://arxiv.org/abs/2402.03621v1

Compressor summary: The paper proposes a self-supervised neural network approach to approximate near-optimal solutions for (M)MAP inference tasks in probabilistic circuits, achieving linear time performance and outperforming existing methods on benchmark datasets.


Self-Discover: Large Language Models Self-Compose Reasoning Structures

http://arxiv.org/abs/2402.03620v1

Compressor summary: SELF-DISCOVER is a framework that helps large language models improve their reasoning skills by composing multiple reasoning modules into an explicit structure.


Comparing Abstraction in Humans and Large Language Models Using Multimodal Serial Reproduction

http://arxiv.org/abs/2402.03618v1

Compressor summary: The text studies how humans create mental abstractions from sensory data using serial reproduction, comparing unimodal and multimodal chains with both humans and GPT-4, and finds that adding language increases human abstractions more than GPT-4's.


Leveraging Large Language Models for Hybrid Workplace Decision Support

http://arxiv.org/abs/2402.03616v1

Compressor summary: Large Language Models (LLMs) assist workers in choosing suitable workspaces for hybrid work environments by providing intelligent suggestions and explanations based on resource trade-offs.


Bayesian Factorised Granger-Causal Graphs For Multivariate Time-series Data

http://arxiv.org/abs/2402.03614v1

Compressor summary: The paper proposes a new Bayesian VAR model with a hierarchical graph prior for discovering Granger causal relations from observational multivariate time-series data, which improves uncertainty quantification, has fewer hyperparameters, and performs better than existing methods.


RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents

http://arxiv.org/abs/2402.03610v1

Compressor summary: The Retrieval-Augmented Planning (RAP) framework improves large language models' decision-making abilities by dynamically using past experiences in various textual and multimodal scenarios.


Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning

http://arxiv.org/abs/2402.03607v1

Compressor summary: The text discusses how combining explicit commonsense knowledge in the form of knowledge graphs with large Vision Language Models can improve predicting the effectiveness of multi-modal marketing campaigns, enabling early detection and assessment of persuasive campaigns.


Identifying Reasons for Contraceptive Switching from Real-World Data Using Large Language Models

http://arxiv.org/abs/2402.03597v1

Compressor summary: The study demonstrates GPT-4's ability to accurately extract reasons for contraceptive switching from clinical notes, outperforming baseline models and showing that patient preference, adverse events, and insurance are key factors in switch decisions.


GRASP: GRAph-Structured Pyramidal Whole Slide Image Representation

http://arxiv.org/abs/2402.03592v1

Compressor summary: GRASP is a novel framework that uses graph structures and multi-magnification information to improve cancer subtyping in digital pathology, outperforming existing methods while being smaller and more interpretable.