This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-07, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.04253v1
Compressor summary: AnyTool is a large language model agent that uses over 16,000 APIs to solve user queries and outperforms previous models like ToolLLM.
http://arxiv.org/abs/2402.04252v1
Compressor summary: The authors present EVA-CLIP-18B, a powerful open-source CLIP model with 18 billion parameters that outperforms other CLIP models on image classification benchmarks using a smaller training dataset.
http://arxiv.org/abs/2402.04251v1
Compressor summary: This paper proposes a method to make MBR decoding cheaper and faster by using aggregated reference representations instead of pairwise calculations.
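The aggregation trick is easiest to see against the standard pairwise baseline it replaces. Below is a minimal sketch of vanilla MBR decoding with a toy token-overlap F1 standing in for real utilities like BLEU or COMET; the paper's contribution is collapsing the inner loop over references into a single aggregated representation, which this sketch does not reproduce.

```python
# A toy token-overlap F1 stands in for utilities like BLEU or COMET.
def utility(candidate: str, reference: str) -> float:
    c, r = set(candidate.split()), set(reference.split())
    overlap = len(c & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(c), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def mbr_decode(candidates: list[str], references: list[str]) -> str:
    # Pairwise MBR: pick the candidate with the highest mean utility
    # over pseudo-references, at O(|C| * |R|) utility calls.
    return max(candidates,
               key=lambda c: sum(utility(c, r) for r in references) / len(references))

samples = ["the cat sat", "a cat sat down", "the dog ran"]
print(mbr_decode(samples, samples))  # sampled outputs double as pseudo-references
```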
http://arxiv.org/abs/2402.04249v1
Compressor summary: HarmBench is a framework for evaluating automated red teaming methods against large language models, uncovering risks and improving LLM robustness.
http://arxiv.org/abs/2402.04248v1
Compressor summary: State-space models (SSMs) perform similarly to Transformers in standard regression tasks and better in sparse parity learning, but struggle with non-standard retrieval tasks; a hybrid model improves their performance across tasks.
http://arxiv.org/abs/2402.04239v1
Compressor summary: CAST is a novel self-attention mechanism that uses learnable surrogate tokens to cluster input sequences and reduce quadratic complexity, improving efficiency and performance on long-range sequence modeling tasks.
http://arxiv.org/abs/2402.04236v1
Compressor summary: VLMs often fail on complex visual problems and give unfaithful responses; the paper introduces Chain of Manipulations, a mechanism that has the model solve problems through a series of operations on the visual input, and CogCoM, a 17B VLM equipped with it, achieves state-of-the-art performance on complex visual tasks.
http://arxiv.org/abs/2402.04232v1
Compressor summary: The authors propose a novel architecture for LLMs to understand new experiences in context by comparing them to past memories, aiming to improve their emotional alignment with humans.
http://arxiv.org/abs/2402.04229v1
Compressor summary: MusicRL is a system that generates music based on text inputs using reinforcement learning with human feedback, improving upon existing text-to-music models.
http://arxiv.org/abs/2402.04222v1
Compressor summary: The paper investigates how NLP research uses and defines 'typological diversity' in language selection, and suggests improving the criteria for measuring it.
http://arxiv.org/abs/2402.04211v1
Compressor summary: The text introduces a new method to compute Shapley values for model explainability using a probabilistic framework and a latent embedding space, improving computation speed and handling uncertainty.
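For context, the expensive baseline that such methods accelerate is Monte Carlo estimation of Shapley values over random feature permutations. A minimal sketch follows, with a toy model and a zero baseline as illustrative masking choices; the paper's probabilistic, latent-space framework is not reproduced here.

```python
import numpy as np

def shapley_mc(model, x, baseline, n_perms=200, rng=np.random.default_rng(0)):
    """Monte Carlo Shapley values via random feature permutations."""
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perms):
        perm = rng.permutation(d)
        z = baseline.copy()
        prev = model(z)
        for j in perm:                 # add features one at a time
            z[j] = x[j]
            cur = model(z)
            phi[j] += cur - prev       # marginal contribution of feature j
            prev = cur
    return phi / n_perms

model = lambda v: 3 * v[0] + 2 * v[1] * v[2]     # toy model with an interaction
x, baseline = np.array([1.0, 1.0, 1.0]), np.zeros(3)
print(shapley_mc(model, x, baseline))  # approx [3, 1, 1]
```

The interaction term is split evenly between features 1 and 2, as Shapley's symmetry axiom requires, and the three values sum to model(x) - model(baseline) = 5.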
http://arxiv.org/abs/2402.04210v1
Compressor summary: The paper explores using large vision and language models as verifiers for robot tasks, evaluates their effectiveness, and suggests ways to integrate their feedback into policy refinement.
http://arxiv.org/abs/2402.04209v1
Compressor summary: Deep learning and conventional machine learning models can predict progression to Stage 2 or higher acute kidney injury with high accuracy, especially when trained locally.
http://arxiv.org/abs/2402.04203v1
Compressor summary: The study shows that large pre-trained neural networks in AI exhibit more human-like abilities in processing complex, regular geometric shapes and their parts and relations, challenging previous claims of a fundamental difference between human and neural network geometric reasoning.
http://arxiv.org/abs/2402.04195v1
Compressor summary: The paper introduces a new iterative framework for multi-instance 3D registration, which improves accuracy by eliminating outliers and achieves state-of-the-art performance on synthetic and real datasets.
http://arxiv.org/abs/2402.04193v1
Compressor summary: The paper presents a new decentralized learning method that combines gossip-based averaging and gradient coding to handle stragglers and improve performance for strongly convex loss functions.
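To ground the gossip half of the method, here is a minimal sketch of gossip-based averaging on a ring of workers, assuming a doubly stochastic mixing matrix; the gradient-coding component for straggler tolerance is not reproduced.

```python
import numpy as np

n, d = 5, 3
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i, i + 1):
        W[i, j % n] = 1 / 3            # uniform mixing with ring neighbors

x = np.arange(n * d, dtype=float).reshape(n, d)   # per-worker parameter vectors
for _ in range(50):
    x = W @ x                           # one gossip round: mix with neighbors
print(np.allclose(x, x.mean(axis=0)))   # True: consensus on the global average
```

Because W is doubly stochastic, each round preserves the average while shrinking disagreement, so repeated rounds converge to consensus.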
http://arxiv.org/abs/2402.04182v1
Compressor summary: The paper proposes a new algorithm that combines model-based deep RL with tube-based MPC to minimize safety constraint violations and enable real-world deployment of reinforcement learning on safety-critical tasks.
http://arxiv.org/abs/2402.04178v1
Compressor summary: The paper introduces SHIELD, a benchmark to evaluate multimodal large language models' ability to detect face spoofing and forgery using various questions and modalities.
http://arxiv.org/abs/2402.04177v1
Compressor summary: This paper studies how pretraining data size and distribution alignment affect machine translation quality in large language models and provides guidelines for selecting suitable pretraining data.
http://arxiv.org/abs/2402.04168v1
Compressor summary: The paper proposes Informed Reinforcement Learning, which uses a structured rulebook and situation-aware rewards to improve autonomous driving in complex scenarios.
http://arxiv.org/abs/2402.04163v1
Compressor summary: The paper proposes a generalization of mathematical distortions used in machine learning (ML), focusing on properties related to metricity, hyperbolicity, and encoding, and applies it to improve hyperbolic embeddings for decision trees.
http://arxiv.org/abs/2402.04161v1
Compressor summary: The paper proposes a framework to study transformers' sequential modeling capabilities using Markov chains and analyzes the effect of data properties, architecture, and learnt distribution on their performance.
http://arxiv.org/abs/2402.04160v1
Compressor summary: The paper presents a novel method for flexible attribute control in text generation using pre-trained language models with plug-and-play controllers and reinforcement learning, resulting in improved smoothness and attribute consistency.
http://arxiv.org/abs/2402.04154v1
Compressor summary: The paper proposes a "read-to-play" capability for artificial agents by using multimodal game instructions to improve their multitasking and generalization skills in Reinforcement Learning.
http://arxiv.org/abs/2402.04140v1
Compressor summary: This study uses AI applications like SHIRLEY, SAM, and SARA to analyze court judgments from five countries, detect biases, and facilitate a fair arbitration process with human collaboration.
http://arxiv.org/abs/2402.04139v1
Compressor summary: UVM-Net is a new, efficient single-image dehazing network that uses a Bi-SSM block to model long-range dependencies and overcome the limitations of Transformer architecture on resource-constrained devices.
http://arxiv.org/abs/2402.04129v1
Compressor summary: The paper proposes a regularization method for rehearsal-free class-incremental learning using virtual outliers and a simplified prompt-based approach with fewer parameters and lower cost, achieving comparable or better results than previous methods on ImageNet-R and CIFAR-100.
http://arxiv.org/abs/2402.04119v1
Compressor summary: The text introduces ChEBI-20-MM, a multi-modal benchmark to assess large language models' performance and knowledge acquisition in molecular science.
http://arxiv.org/abs/2402.04110v1
Compressor summary: The paper investigates GPT-3.5 and GPT-4's dark personality traits and conspiracy beliefs using psychological tests and finds no significant differences between them, except for GPT-4's increased belief in information withholding.
http://arxiv.org/abs/2402.04108v1
Compressor summary: The paper proposes a machine learning-based decision support for assigning delay attribution codes to train delays in Sweden, which performs better than a random classifier but not as well as the current manual method.
http://arxiv.org/abs/2402.04103v1
Compressor summary: The paper develops a customer segmentation model for online retail using a UK dataset and shows that the Gaussian mixture model performs best.
http://arxiv.org/abs/2402.04101v1
Compressor summary: The paper presents a novel volumetric and parametric facial prior (VRMM) for 3D face modeling that efficiently disentangles and encodes identity, expression, and lighting into low-dimensional representations, enabling relighting capabilities and various applications.
http://arxiv.org/abs/2402.04097v1
Compressor summary: The text describes a study on deep image prior, its limitations, and a proposed self-driven method to improve image restoration without needing reference images or supervision.
http://arxiv.org/abs/2402.04088v1
Compressor summary: The paper explores the use of large language models like RoBERTa for detecting cyberbullying in social media, showing its superior performance over other models.
http://arxiv.org/abs/2402.04087v1
Compressor summary: The paper proposes a fast and efficient method using Gaussian Discriminant Analysis to enhance CLIP's performance in various tasks without additional training time or resources.
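Gaussian Discriminant Analysis with a shared covariance reduces to a linear classifier built from class means, which is why it adds essentially no training cost on top of a frozen encoder. A minimal sketch, with random toy vectors standing in for CLIP embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(2.0, 1.0, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
classes = np.unique(y)

means = np.stack([X[y == k].mean(axis=0) for k in classes])
pooled = sum(np.cov(X[y == k].T, bias=True) * (y == k).sum() for k in classes) / len(X)
prec = np.linalg.inv(pooled + 1e-6 * np.eye(X.shape[1]))  # regularized inverse

W = means @ prec                                          # one row per class
b = -0.5 * np.einsum("kd,de,ke->k", means, prec, means)   # per-class quadratic term
preds = (X @ W.T + b).argmax(axis=1)
print("train accuracy:", (preds == y).mean())
```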
http://arxiv.org/abs/2402.04084v1
Compressor summary: The text introduces a study on provably learning a multi-head attention layer from random examples, providing algorithms, upper and lower bounds, and analyzing different settings.
http://arxiv.org/abs/2402.04082v1
Compressor summary: The text discusses using machine learning techniques to predict house prices in Ames City, USA, and finds that XGBoost performs best.
http://arxiv.org/abs/2402.04081v1
Compressor summary: The paper analyzes the overfitting problem in deep weight space models and proposes a MixUp method for data augmentation to increase diversity and improve performance.
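The underlying MixUp operation is standard: convex combinations of input pairs and their labels. A minimal sketch on generic vectors; the paper's setting, where the inputs are flattened network weights, is not shown here.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=np.random.default_rng(0)):
    # Sample a mixing coefficient from Beta(alpha, alpha) and interpolate
    # both inputs and labels with the same coefficient.
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x_mixed, y_mixed = mixup(np.ones(4), np.array([1.0, 0.0]),
                         np.zeros(4), np.array([0.0, 1.0]))
print(x_mixed, y_mixed)
```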
http://arxiv.org/abs/2402.04080v1
Compressor summary: The paper introduces a diffusion policy with entropy regularization and Q-ensembles for offline reinforcement learning, achieving state-of-the-art results on D4RL benchmarks.
http://arxiv.org/abs/2402.04075v1
Compressor summary: The study presents a teacher-student architecture using LLMs to improve prostate cancer radiotherapy symptom extraction from clinical notes, achieving significant improvements in accuracy and other metrics.
http://arxiv.org/abs/2402.04068v1
Compressor summary: R2E is a retrieval-based language model that explains its predictions using Shapley values and can adapt to new evidence without retraining, improving drug target identification from scientific literature.
http://arxiv.org/abs/2402.04064v1
Compressor summary: The authors propose a novel end-to-end method that uses attention blocks to detect and segment multiple classes of road defects, a key capability for autonomous road repair systems, and show that it outperforms existing methods on a new dataset.
http://arxiv.org/abs/2402.04062v1
Compressor summary: The paper proposes two frameworks for link prediction with relational hypergraphs using graph neural networks, analyzing their expressive power and empirically showing their effectiveness.
http://arxiv.org/abs/2402.04059v1
Compressor summary: The paper surveys deep learning methods for imputing missing values in multivariate time series data and evaluates their impact on downstream tasks, while providing a taxonomy and highlighting strengths and limitations.
http://arxiv.org/abs/2402.04054v1
Compressor summary: The text presents a new framework for analyzing and designing meta-learning methods using PAC-Bayesian theory, which allows more direct and flexible transfer of knowledge between tasks.
http://arxiv.org/abs/2402.04051v1
Compressor summary: The paper analyzes how weight matching (WM) helps identify linear mode connectivity (LMC) by aligning the directions of singular vectors with large singular values across models for effective model merging.
http://arxiv.org/abs/2402.04050v1
Compressor summary: The paper proposes CraFT, a method to fine-tune black-box vision-language models using input prompts and output predictions, achieving significant improvements in few-shot classification with less memory and faster training.
http://arxiv.org/abs/2402.04049v1
Compressor summary: Large language models struggle to simulate human political debates because inherent social biases cause their behavior to deviate from established social dynamics.
http://arxiv.org/abs/2402.04033v1
Compressor summary: This paper studies how sensitive information can be inferred through edge reconstruction attacks on graph neural models, and explores the effectiveness of a private graph representation method against such attacks.
http://arxiv.org/abs/2402.04031v1
Compressor summary: Polyp-DDPM is a diffusion-based method that generates realistic images of polyps using segmentation masks, improving image quality and polyp segmentation performance.
http://arxiv.org/abs/2402.04030v1
Compressor summary: The text describes how neural networks can be trained directly with the energy function of density functional theory to predict molecular properties faster and more efficiently than previous methods.
http://arxiv.org/abs/2402.04029v1
Compressor summary: pcDEQ models improve deep equilibrium models by ensuring existence, uniqueness, and stability of the fixed point through nonnegative and concave constraints, with theoretical convergence guarantees.
http://arxiv.org/abs/2402.04028v1
Compressor summary: AlbNews is a new text corpus for Albanian news headlines that can be used for research in topic modeling and machine learning, with initial classification scores reported.
http://arxiv.org/abs/2402.04023v1
Compressor summary: The study evaluates Google Translate's accuracy and comprehensibility for translating mental health information into different languages and finds challenges in medical terminology, fluency, and formatting.
http://arxiv.org/abs/2402.04013v1
Compressor summary: This paper provides a comprehensive survey of MI attacks and defenses on DNNs, covering various modalities and learning tasks.
http://arxiv.org/abs/2402.04010v1
Compressor summary: The paper proposes new methods to protect data from unauthorized use by making it hard for both supervised and contrastive learning algorithms to learn from it.
http://arxiv.org/abs/2402.04009v1
Compressor summary: LAST is a method that finetunes pretrained models efficiently by disentangling trainable low-rank self-attention modules from the frozen model, reducing GPU memory consumption and training time.
http://arxiv.org/abs/2402.04005v1
Compressor summary: The paper proposes a new gradient aggregation method for multi-task learning that considers uncertainty in gradient dimensions using Bayesian inference, leading to improved performance.
http://arxiv.org/abs/2402.04004v1
Compressor summary: This study examines how different types and intensities of noise in chain of thought traces affect performance of large language models on algorithmically solvable tasks using a custom framework to generate noisy execution traces.
http://arxiv.org/abs/2402.03994v1
Compressor summary: The text discusses the importance of random projections for storing many vectors while maintaining accurate geometry information in neural networks, and proposes a design space for scalable sketching algorithms.
http://arxiv.org/abs/2402.03991v1
Compressor summary: The paper investigates low-rank bias and neural rank collapse in nonlinear deep networks, showing that increasing weight decay leads to lower layer ranks proportional to hidden-space variability.
http://arxiv.org/abs/2402.03989v1
Compressor summary: YOLOPoint is a fast and accurate neural network model that detects keypoints and objects in images for GNSS-independent navigation of intelligent vehicles.
http://arxiv.org/abs/2402.03985v1
Compressor summary: Using multiple synthetic datasets for supervised learning improves accuracy and model selection, especially for high-variance predictors, according to a new theoretical analysis.
http://arxiv.org/abs/2402.03981v1
Compressor summary: The paper proposes a new method, Controllable Diffusion Trajectory (CDT), for predicting future vehicle trajectories in complex traffic scenarios using map information and social interactions with a conditional denoising diffusion model that generates diverse and realistic predictions.
http://arxiv.org/abs/2402.03979v1
Compressor summary: The paper explores how label smoothing affects deep neural networks' convergence, performance, and calibration using Neural Collapse theory and empirical evidence.
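Label smoothing itself is a one-line transform, q = (1 - eps) * onehot + eps / K; a minimal sketch for reference (the paper's Neural Collapse analysis is not reproduced):

```python
import numpy as np

def smooth_labels(labels: np.ndarray, num_classes: int, eps: float = 0.1) -> np.ndarray:
    # Mix each one-hot target with the uniform distribution over classes.
    onehot = np.eye(num_classes)[labels]
    return (1 - eps) * onehot + eps / num_classes

print(smooth_labels(np.array([0, 2]), num_classes=3))
# [[0.9333 0.0333 0.0333]
#  [0.0333 0.0333 0.9333]]
```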
http://arxiv.org/abs/2402.03973v1
Compressor summary: The text shows that humans recognize objects in unusual poses better than deep networks but need extra time to do so, suggesting mental processes that differ from feed-forward networks.
http://arxiv.org/abs/2402.03970v1
Compressor summary: The paper compares neural networks and transformers with decision trees and traditional MLPs on tabular data and finds that neural networks are competitive against decision trees, while transformers do not outperform simpler MLP variants.
http://arxiv.org/abs/2402.03969v1
Compressor summary: The study investigates how large language models learn from feedback and shows that their learning is influenced by the problem's framing, similar to human cognition.
http://arxiv.org/abs/2402.03966v1
Compressor summary: For any non-polynomial activation function, message-passing graph neural networks (MPNNs) can be equivalent to the Weisfeiler-Leman isomorphism test with constant dimension feature vectors, unlike previous results that required higher dimensions depending on the graph size.
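For reference, the 1-WL color-refinement test whose distinguishing power MPNNs match; a minimal sketch on a plain adjacency-list graph, independent of any particular MPNN:

```python
def wl_refine(adj: dict, colors: dict, rounds: int = 3) -> dict:
    for _ in range(rounds):
        # New color = hash of (own color, multiset of neighbor colors).
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                      for v in adj}
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

triangle_plus_edge = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(wl_refine(triangle_plus_edge, {v: 0 for v in triangle_plus_edge}))
# Nodes 0 and 1 end up with the same color; nodes 2 and 3 get distinct colors.
```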
http://arxiv.org/abs/2402.03962v1
Compressor summary: The paper discusses how anthropomorphism affects Machine Learning research, leading to over-attribution of human-like qualities to Large Language Models, and calls for academic caution and integrity in interpreting AI results.
http://arxiv.org/abs/2402.03957v1
Compressor summary: The paper proposes two methods to improve document similarity computation by using directed and sparse graphs that capture sequential information, achieving better results than a traditional undirected graph approach.
http://arxiv.org/abs/2402.03951v1
Compressor summary: The paper proposes a new attacking strategy, DeCoWA, that can effectively transfer adversarial examples across different model genera, such as CNNs and Transformers.
http://arxiv.org/abs/2402.03941v1
Compressor summary: COAT is a tool that uses large language models to help discover hidden causal variables from raw observational data, and then uses a causal learning module to provide explanations and feedback for improvement.
http://arxiv.org/abs/2402.03927v1
Compressor summary: The paper analyzes data contamination in OpenAI's GPT-3.5 and GPT-4 models, finding they received ~4.7M samples from 263 benchmarks and exposing evaluation malpractices.
http://arxiv.org/abs/2402.03923v1
Compressor summary: RADT is a new model for offline reinforcement learning that improves the alignment between the actual and target returns by decoupling them from the input sequence.
http://arxiv.org/abs/2402.03921v1
Compressor summary: LLMs can enhance BO by proposing promising solutions based on historical evaluations, improving surrogate modeling and candidate sampling in the early stages of search.
http://arxiv.org/abs/2402.03917v1
Compressor summary: Elastic Feature Consolidation (EFC) is a method that improves Exemplar-Free Class Incremental Learning (EFCIL) by consolidating features, regularizing drift, and balancing prototype rehearsal for cold start scenarios.
http://arxiv.org/abs/2402.03915v1
Compressor summary: The paper proposes learning metrics from short-term signals to improve the statistical power and reduce the cost of online controlled experiments in technology companies.
http://arxiv.org/abs/2402.03908v1
Compressor summary: EscherNet is a diffusion model that learns implicit 3D representations and can synthesize multiple views with high quality and flexibility, outperforming existing methods in various tasks.
http://arxiv.org/abs/2402.03905v1
Compressor summary: The paper explores using machine learning to predict employee turnover and its impact on organizational knowledge.
http://arxiv.org/abs/2402.03904v1
Compressor summary: The paper proposes a new method for shape matching using functional maps that preserves multiple spectral filter operators, which leads to more informative and stable results, outperforming existing methods.
http://arxiv.org/abs/2402.03903v1
Compressor summary: The text introduces compound returns, a method to reduce variance in multistep reinforcement learning by using weighted averages of n-step returns and two-bootstrap returns, which improve sample efficiency with minimal extra cost.
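A compound return is any convex combination of n-step returns, which preserves the expected value while reducing variance. A minimal sketch on a single trajectory, with illustrative weights rather than the paper's specific two-bootstrap scheme:

```python
def n_step_return(rewards, values, t, n, gamma=0.99):
    # G_t^(n) = sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * V(s_{t+n})
    g = sum(gamma**k * rewards[t + k] for k in range(n))
    return g + gamma**n * values[t + n]

def compound_return(rewards, values, t, weights, gamma=0.99):
    # Any convex combination of n-step returns is a valid return estimate.
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(w * n_step_return(rewards, values, t, n, gamma)
               for n, w in weights.items())

rewards = [1.0, 0.5, 0.0, 2.0]
values = [0.8, 0.7, 0.6, 0.5, 0.4]   # V(s_t) estimates, incl. bootstrap states
print(compound_return(rewards, values, t=0, weights={1: 0.5, 3: 0.5}))
```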
http://arxiv.org/abs/2402.03902v1
Compressor summary: The paper studies how a neural network layer learns to attend to tokens based on their positions or meanings, and shows that it can learn either mechanism depending on the data size and quality.
http://arxiv.org/abs/2402.03900v1
Compressor summary: The paper proposes Pro-HAN, a method that uses a heterogeneous graph attention network to reason across multiple types of profile information for spoken language understanding tasks.
http://arxiv.org/abs/2402.03898v1
Compressor summary: DistiLLM is a new knowledge distillation framework for language models that uses a novel skew divergence loss and an adaptive off-policy approach to compress teacher models, reduce inference costs, and achieve significant speedups.
http://arxiv.org/abs/2402.03896v1
Compressor summary: The text introduces CRVQA, a method to generate visual and textual rationales for VQA answers, which improves accuracy and trust in the predictions.
http://arxiv.org/abs/2402.03887v1
Compressor summary: The paper examines how language and gender have been a recurring issue in the German Bundestag since the 1980s, using examples of linguistic practices related to gender inclusivity and discussing their implications for the current debate on gender-inclusive language.
http://arxiv.org/abs/2402.03885v1
Compressor summary: MOMENT is a family of open-source time-series foundation models that overcomes challenges in pre-training and evaluation, and shows effectiveness on diverse tasks with minimal data.
http://arxiv.org/abs/2402.03877v1
Compressor summary: This paper explores the challenges large language models face in constructive geometric problem-solving and proposes a framework to enhance their reasoning abilities using an internal dialogue among specialized agents.
http://arxiv.org/abs/2402.03870v1
Compressor summary: Our study finds that rewriting non-gender-inclusive texts in German to be gender-inclusive would require changing less than 1% of all tokens on average, challenging the arguments that gender-inclusive language makes texts too long or negatively affects language learners.
http://arxiv.org/abs/2402.03864v1
Compressor summary: The text discusses how the Neural Tangent Kernel perspective can help analyze Physics-Informed Neural Networks for solving nonlinear Partial Differential Equations, highlighting the benefits of using second-order methods and addressing their challenges with numerical examples and validation.
http://arxiv.org/abs/2402.03855v1
Compressor summary: The text discusses the importance of studying hidden representations in neural networks for mechanistic interpretability and argues that current methods are insufficient for this purpose.
http://arxiv.org/abs/2402.03848v1
Compressor summary: The paper introduces ANLS*, a new metric for evaluating generative large language models (GLLMs) on various tasks, and shows that SFT prompting technique outperforms others in most cases.
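For context, the classic ANLS metric that ANLS* generalizes scores each prediction by its normalized Levenshtein similarity to the gold answer, zeroed below a 0.5 threshold, then averages over samples. A minimal sketch; ANLS*'s extensions to lists and structured outputs are not shown.

```python
def levenshtein(a: str, b: str) -> int:
    # Standard dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def anls(predictions: list[str], answers: list[str], tau: float = 0.5) -> float:
    scores = []
    for p, a in zip(predictions, answers):
        sim = 1 - levenshtein(p.lower(), a.lower()) / max(len(p), len(a), 1)
        scores.append(sim if sim >= tau else 0.0)
    return sum(scores) / len(scores)

print(anls(["42 dollars", "paris"], ["42 dollar", "London"]))  # 0.45
```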
http://arxiv.org/abs/2402.03846v1
Compressor summary: BISECT is a new outlier generation method that creates realistic outliers with 'multiple views' property, improving outlier detection in diverse datasets.
http://arxiv.org/abs/2402.03845v1
Compressor summary: The authors analyze the vector field of diffusion models, which can be either conservative or not, and show its impact on density estimation and sampling performance.
http://arxiv.org/abs/2402.03843v1
Compressor summary: The paper introduces an algorithm using RGBD-UNet and VovNetV3.5 models to detect damage in steel ropes in high-altitude environments with improved accuracy and background augmentation.
http://arxiv.org/abs/2402.03833v1
Compressor summary: The paper proposes a new nonlinear dictionary learning algorithm based on a feed-forward neural network called Random Vector Functional Link, which learns a sparse-to-dense feature map and incorporates higher-order dependencies between input coefficients and dictionary atoms, achieving good performance in image classification and reconstruction tasks.
http://arxiv.org/abs/2402.03832v1
Compressor summary: The paper explores using large language models for skill extraction, which can handle complex skill mentions better than supervised models.
http://arxiv.org/abs/2402.03830v1
Compressor summary: OASim is an open and adaptive autonomous driving data generator using implicit neural rendering to create high-quality, customizable, and diverse datasets for training algorithms efficiently and safely.
http://arxiv.org/abs/2402.03828v1
Compressor summary: The paper proposes a new scalable method to find an average distribution from multiple probability measures using Neural OT solver and shows its advantages and error bounds in various scenarios.
http://arxiv.org/abs/2402.03824v1
Compressor summary: Embodied AI is a new approach to artificial intelligence that emphasizes perception, action, memory, and learning as essential components of an embodied agent, aiming to achieve Artificial General Intelligence through cognitive architectures and active inference principles.
http://arxiv.org/abs/2402.03822v1
Compressor summary: RevOrder is a technique that improves arithmetic operations in large language models by reversing output digits, reducing complexity, and enhancing performance especially with division tasks.
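The core trick is emitting answer digits least-significant first, matching the order in which carries are produced during column arithmetic. A minimal sketch of building such training examples; the formatting details are illustrative, not the paper's exact scheme.

```python
def reversed_answer(a: int, b: int) -> str:
    # Emit the sum least-significant digit first.
    return str(a + b)[::-1]

def make_example(a: int, b: int) -> str:
    # The model writes the units digit before any carry-dependent
    # higher digits, e.g. 347 + 285 = 632 is written as "236".
    return f"{a}+{b}={reversed_answer(a, b)}"

print(make_example(347, 285))  # "347+285=236"
```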
http://arxiv.org/abs/2402.03818v1
Compressor summary: The article analyzes the generalization performance of graph convolutional networks on data from attributed stochastic block models in different settings and compares their convergence rates to the Bayes-optimal rate.
http://arxiv.org/abs/2402.03814v1
Compressor summary: The paper proposes a new graph self-supervised learning method using continuous edge masks that improve message propagation and node classification on graph neural networks.
http://arxiv.org/abs/2402.03807v1
Compressor summary: SEABO is a search-based method for offline imitation learning that learns a reward function from expert and unlabeled data, achieving competitive performance to offline RL with ground-truth rewards and outperforming prior methods.
http://arxiv.org/abs/2402.03804v1
Compressor summary: Sparse computation using non-ReLU activation functions improves efficiency and performance of Large Language Models, with ReLU$^2$ being the best choice among tested functions.
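ReLU^2 (squared ReLU) keeps ReLU's exact zeros, and hence its activation sparsity, while squaring the positive part; a minimal sketch:

```python
import numpy as np

def relu2(x: np.ndarray) -> np.ndarray:
    # Squared ReLU: zero for x <= 0, x^2 for x > 0.
    return np.square(np.maximum(x, 0.0))

x = np.array([-1.5, -0.2, 0.0, 0.3, 2.0])
print(relu2(x))                             # [0.   0.   0.   0.09 4.  ]
print("sparsity:", np.mean(relu2(x) == 0))  # fraction of exact zeros
```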
http://arxiv.org/abs/2402.03796v1
Compressor summary: The review paper discusses the current state and future challenges of face detection in computer vision applications involving humans.
http://arxiv.org/abs/2402.03795v1
Compressor summary: SMART is a novel framework for unsupervised domain adaptation in semantic segmentation that uses Energy-Based Models to reduce discrepancy between semantic and depth features and assess feature fusion reliability.
http://arxiv.org/abs/2402.03792v1
Compressor summary: The paper proposes two algorithms for no-regret reinforcement learning in continuous state and action spaces based on a novel structural assumption called u-smoothness.
http://arxiv.org/abs/2402.03785v1
Compressor summary: KDAlign is a novel framework that uses rule knowledge from experts and Optimal Transport technique to improve weakly supervised anomaly detection accuracy on web-based applications.
http://arxiv.org/abs/2402.03784v1
Compressor summary: AirPhyNet combines physics principles and neural networks for better air quality prediction and understanding.
http://arxiv.org/abs/2402.03783v1
Compressor summary: The paper proposes MedPrompt, a weakly supervised method to automatically generate medical text prompts for vision-language models, reducing the need for manual annotations and expert input in medical image recognition tasks.
http://arxiv.org/abs/2402.03782v1
Compressor summary: The paper studies soft prompt tuning (SPT) for cross-lingual transfer by training only the learnable embeddings without modifying the model parameters, reducing costs and improving performance for linguistically distant languages.
http://arxiv.org/abs/2402.03780v1
Compressor summary: The paper presents the PPN dataset of propaganda news articles and tests various NLP techniques to identify their stylistic features.
http://arxiv.org/abs/2402.03776v1
Compressor summary: The study explores using large language models to replace peer grading in online courses, showing promising results when combined with instructor guidance and rubrics.
http://arxiv.org/abs/2402.03774v1
Compressor summary: MetaTree is a transformer-based model that produces high-quality decision trees for classification by learning from outputs of classical algorithms and adapting strategies based on context.
http://arxiv.org/abs/2402.03771v1
Compressor summary: RLBR is a new setting where agents learn from bagged rewards, and RBT is a Transformer-based model that helps them explore and understand these rewards better.
http://arxiv.org/abs/2402.03766v1
Compressor summary: The authors present MobileVLM V2, an improved family of vision language models that perform well with fewer parameters than previous models.
http://arxiv.org/abs/2402.03762v1
Compressor summary: MoD-SLAM is a monocular dense mapping method for real-time global pose optimization and 3D reconstruction in unbounded scenes, overcoming the limitations of existing neural SLAM approaches.
http://arxiv.org/abs/2402.03758v1
Compressor summary: The study presents MDKNet, a method to handle domain bias in multidomain crowd counting by modulating the information flow and learning a domain-separable latent space.
http://arxiv.org/abs/2402.03757v1
Compressor summary: The paper introduces CorrelationQA, a benchmark to measure how well multi-modal language models can resist being fooled by spurious images that are relevant but inconsistent with the correct answers.
http://arxiv.org/abs/2402.03755v1
Compressor summary: The paper presents a framework to improve LLM-based agents for specialized domains like quantitative investment by iteratively refining their knowledge base from real-world scenarios and demonstrates its effectiveness with an agent named QuantAgent.
http://arxiv.org/abs/2402.03754v1
Compressor summary: The paper proposes an automatic radiology report generation model that better mimics clinicians' reasoning, tackling the multi-view and context reasoning challenges by integrating multi-view vision perception with multi-modal information, and demonstrates superior performance on two datasets.
http://arxiv.org/abs/2402.03753v1
Compressor summary: The proposed method uses uncertainty as a collective variable to guide the acquisition of chemically-relevant data points for improving machine learned interatomic potentials.
http://arxiv.org/abs/2402.03752v1
Compressor summary: The report shows that a lightweight Vision Transformer can outperform Convolutional Neural Networks on small image resolutions and datasets with minimal scaling and pre-training using a masked auto-encoder technique.
http://arxiv.org/abs/2402.03750v1
Compressor summary: The text introduces a new method, Digital Twin Mobility Profiling (DTMP), which uses alignment diagrams and a specialized network to learn spatio-temporal patterns in urban traffic data for intelligent transportation systems.
http://arxiv.org/abs/2402.03749v1
Compressor summary: The paper proposes a new loss function for weak-to-strong supervision, which improves the performance of vision foundation models in various scenarios, outperforming strong-to-strong and fine-tuning methods.
http://arxiv.org/abs/2402.03747v1
Compressor summary: The study proposes a deep learning network (ICNet) to discover partial differential equations (PDEs) from sparse data with high noise by incorporating Galilean invariance and other physical constraints, achieving excellent results for fluid mechanics and wave equations.
http://arxiv.org/abs/2402.03746v1
Compressor summary: The paper introduces a new method for aligning video and text modalities using reinforcement learning from AI feedback, which improves performance on various video benchmarks.
http://arxiv.org/abs/2402.03744v1
Compressor summary: The text proposes a new method to detect when large language models make mistakes by analyzing their internal states and using a simple metric called EigenScore.
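A hedged sketch of an EigenScore-style consistency measure, assuming it scores the mean log-eigenvalue of the covariance of embeddings across several sampled answers; high dispersion across samples signals hallucination. The paper's exact embedding choice and normalization may differ.

```python
import numpy as np

def eigen_score(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    # embeddings: one row per sampled answer (e.g., hidden-state embeddings).
    centered = embeddings - embeddings.mean(axis=0)
    cov = centered @ centered.T / embeddings.shape[1]  # K x K covariance
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(len(cov)))
    return float(np.mean(np.log(eigvals)))             # log-determinant style score

rng = np.random.default_rng(0)
consistent = rng.normal(0, 0.01, (5, 64)) + rng.normal(0, 1, 64)  # near-identical answers
scattered = rng.normal(0, 1.0, (5, 64))                           # diverse answers
print(eigen_score(consistent), "<", eigen_score(scattered))
```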
http://arxiv.org/abs/2402.03741v1
Compressor summary: The paper proposes a novel black-box attack (SUB-PLAY) that exploits the vulnerabilities of multi-agent systems under partial observability, inducing significant changes in their policy networks and posing security threats.
http://arxiv.org/abs/2402.03738v1
Compressor summary: The paper proposes a scene recovery network (AoSRNet) that improves low-visibility imaging by integrating multiple techniques to enhance contrast, color, and texture in challenging conditions like haze, dust, and low light.
http://arxiv.org/abs/2402.03737v1
Compressor summary: PrivateLASSO is a differentially private LASSO bandit algorithm that works well for high-dimensional stochastic contextual linear bandit problems with sparse parameters and privacy constraints.
http://arxiv.org/abs/2402.03732v1
Compressor summary: DEAN is a deep learning framework that uses contrastive learning on a pre-defined graph to detect outdated facts in knowledge graphs effectively.
http://arxiv.org/abs/2402.03728v1
Compressor summary: The paper presents a framework that improves decision-making consistency using external knowledge and global normalization based on ILP.
http://arxiv.org/abs/2402.03726v1
Compressor summary: The paper proposes a new deep learning framework, ISAHP, that can discover fine-grained causal relationships among events without strong assumptions or heuristics.
http://arxiv.org/abs/2402.03723v1
Compressor summary: This paper introduces Rig3DGS, a method to generate 3D human portraits from smartphone videos by using learned deformations guided by a 3D morphable model, which improves rendering quality and facial expression control.
http://arxiv.org/abs/2402.03720v1
Compressor summary: SNS is a method that uses Large Language Models to improve node classification in Text-attributed Graphs by selecting similar neighbors, leading to better graph representation and generalization.
http://arxiv.org/abs/2402.03719v1
Compressor summary: The paper introduces LaMAI, a method that uses active learning to help large language models ask better questions and improve their responses in conversational contexts, leading to more accurate answers and better user experiences.
http://arxiv.org/abs/2402.03716v1
Compressor summary: The paper proposes a method to identify people across videos when they change clothes using gait and body shape features learned with graph attention networks, improving performance significantly.
http://arxiv.org/abs/2402.03715v1
Compressor summary: Clarify is a method that uses natural language feedback to improve model training by correcting misconceptions.
http://arxiv.org/abs/2402.03708v1
Compressor summary: The paper introduces SISP, a new benchmark dataset for ship instance segmentation in panchromatic satellite images, with well-annotated data and a proposed method (DFRInst) to improve performance on real-world scenes.
http://arxiv.org/abs/2402.03705v1
Compressor summary: TAGA attacks use natural perturbations like blur to manipulate image attributes in guided image synthesis methods, raising ethical concerns about their potential to contradict user intentions.
http://arxiv.org/abs/2402.03701v1
Compressor summary: The paper introduces USD3, a simplified and unified framework for training and sampling discrete diffusion models on discrete data like language and graphs, improving performance over existing methods.
http://arxiv.org/abs/2402.03698v1
Compressor summary: The local learning coefficient (LLC) measures model complexity and can be estimated accurately for deep linear networks using a method from singular learning theory.
http://arxiv.org/abs/2402.03697v1
Compressor summary: SHMC-Net is a new method for sperm head morphology classification that uses segmentation masks to guide the analysis and improves accuracy on small datasets with noisy labels.
http://arxiv.org/abs/2402.03690v1
Compressor summary: 3Doodle generates view-consistent sketches of 3D objects from multi-view images by optimizing 3D strokes to minimize perceptual losses and capture essential 3D shapes, avoiding the inefficiency and subjectivity of free-hand sketching.
http://arxiv.org/abs/2402.03687v1
Compressor summary: Pard is a new graph generation model that combines autoregressive and diffusion methods for efficient and high-quality graph generation, achieving state-of-the-art results on various datasets.
http://arxiv.org/abs/2402.03686v1
Compressor summary: The paper compares human and LLM judgments on a curated entailment verification benchmark, finding that LLMs are better at multi-hop reasoning while humans excel at simple deductive reasoning, and introduces a fine-tuned Flan-T5 model that outperforms GPT-3.5 and rivals GPT-4 on entailment verification and explanation generation.
http://arxiv.org/abs/2402.03678v1
Compressor summary: LSTS is a novel approach that learns a set of RL policies to guide an agent from an initial state to a goal state based on high-level task specifications, while minimizing the number of environmental interactions.
http://arxiv.org/abs/2402.03667v1
Compressor summary: The paper proposes a novel Indirect Reasoning method for LLMs that uses contrapositives and contradictions to improve their reasoning abilities in tasks like factual reasoning and mathematical proof.
http://arxiv.org/abs/2402.03666v1
Compressor summary: Existing methods for compressing and accelerating diffusion models fail when quantized to low bits; the paper identifies three properties that make low-bit quantization difficult (imbalanced activation distributions, imprecise temporal information, and vulnerability of specific modules to perturbations) and proposes finetuning the quantized model to address them, achieving state-of-the-art performance on image generation tasks.
http://arxiv.org/abs/2402.03664v1
Compressor summary: The paper proposes two new solvers for the partial Gromov-Wasserstein problem, which enables comparison and matching of measures with unequal masses in different metric spaces, and shows their effectiveness on shape-matching and positive-unlabeled learning problems.
http://arxiv.org/abs/2402.03663v1
Compressor summary: The paper introduces Neurosymbolic Deep Neural Networks, which combine neural and symbolic layers for perception and reasoning tasks, and proposes the principle of symbol correctness for designing and analyzing these models.
http://arxiv.org/abs/2402.03661v1
Compressor summary: The study proposes a method to estimate rewards for unlabelled data in offline reinforcement learning using graph-based reward inference and limited human annotations, improving task performance.
http://arxiv.org/abs/2402.03660v1
Compressor summary: The paper discovers that finetuned models from a common pretrained checkpoint have a linear relationship across tasks, which reveals new insights into model merging and editing.
http://arxiv.org/abs/2402.03659v1
Compressor summary: The SEP framework uses a self-reflective agent and PPO to train a LLM to generate explainable stock predictions without human annotations.
http://arxiv.org/abs/2402.03658v1
Compressor summary: The text introduces EDGE, a novel framework that incorporates utterance, video, and audio sentiment signals to generate natural language explanations for sarcastic dialogues, addressing challenges in previous models.
http://arxiv.org/abs/2402.03655v1
Compressor summary: This paper introduces a new method to learn eigenfunctions using neural networks that approximates low-rank matrices and preserves orthogonality efficiently.
http://arxiv.org/abs/2402.03654v1
Compressor summary: This paper evaluates two metrics (FID and SID) for measuring the performance of image-to-image GANs and finds that SID might be more efficient and effective than FID.
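For reference, the standard FID (Frechet Inception Distance) computation between two feature sets: FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^(1/2)). A minimal sketch using scipy, with toy Gaussian features standing in for Inception activations; SID is not reproduced here.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats1: np.ndarray, feats2: np.ndarray) -> float:
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    s1 = np.cov(feats1, rowvar=False)
    s2 = np.cov(feats2, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):          # discard tiny numerical imaginary parts
        covmean = covmean.real
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2 - 2 * covmean))

rng = np.random.default_rng(0)
print(fid(rng.normal(0.0, 1.0, (500, 16)), rng.normal(0.5, 1.0, (500, 16))))
```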
http://arxiv.org/abs/2402.03647v1
Compressor summary: The text introduces CAMBranch, a framework that uses contrastive learning and variable shifting to improve machine learning-based branching policies for MILPs with limited expert data.
http://arxiv.org/abs/2402.03646v1
Compressor summary: Lens is a network traffic model using T5 architecture that learns representations from large-scale unlabeled data and performs well in downstream tasks with less labeled data needed.
http://arxiv.org/abs/2402.03642v1
Compressor summary: The Stanceosaurus 2.0 dataset adds Russian and Spanish tweets to analyze cross-cultural and cross-lingual misinformation using stance classification.
http://arxiv.org/abs/2402.03640v1
Compressor summary: The authors propose a new method to solve MaxSAT using a single differentiable function and a neural network architecture that works without training data or an underlying SAT solver, achieving better results than existing solvers.
http://arxiv.org/abs/2402.03634v1
Compressor summary: BEAM is a novel technique that improves multi-view 3D object detection by using Beta Distribution Ray Denoising to handle ambiguous depth information and achieve state-of-the-art results.
http://arxiv.org/abs/2402.03631v1
Compressor summary: CAT-SAM is a method that adapts SAM for various unconventional image segmentation tasks with few-shot target samples by tuning the mask decoder and image encoder together using a prompt bridge structure.
http://arxiv.org/abs/2402.03629v1
Compressor summary: The paper explores how reducing non-linear activations in neural networks for privacy-preserving inference may harm minority groups' accuracy and proposes a mitigation strategy.
http://arxiv.org/abs/2402.03628v1
Compressor summary: PAgents are autonomous agents using large language models to develop expertise and provide professional services in various domains.
http://arxiv.org/abs/2402.03625v1
Compressor summary: The paper analyzes how two-layer ReLU networks with weight decay regularization and their convex relaxations perform when trained with random data, showing that a simple algorithm can solve the non-convex problem efficiently and proving that random initialization leads to low training loss for local gradient methods.
http://arxiv.org/abs/2402.03621v1
Compressor summary: The paper proposes a self-supervised neural network approach to approximate near-optimal solutions for (M)MAP inference tasks in probabilistic circuits, achieving linear time performance and outperforming existing methods on benchmark datasets.
http://arxiv.org/abs/2402.03620v1
Compressor summary: SELF-DISCOVER is a framework that helps large language models improve their reasoning skills by composing multiple reasoning modules into an explicit structure.
http://arxiv.org/abs/2402.03618v1
Compressor summary: The text studies how humans create mental abstractions from sensory data using serial reproduction, comparing unimodal and multimodal chains in both humans and GPT-4, and finds that adding language increases abstraction more for humans than for GPT-4.
http://arxiv.org/abs/2402.03616v1
Compressor summary: Large Language Models (LLMs) assist workers in choosing suitable workspaces for hybrid work environments by providing intelligent suggestions and explanations based on resource trade-offs.
http://arxiv.org/abs/2402.03614v1
Compressor summary: The paper proposes a new Bayesian VAR model with a hierarchical graph prior for discovering Granger causal relations from observational multivariate time-series data, which improves uncertainty quantification, has fewer hyperparameters, and performs better than existing methods.
http://arxiv.org/abs/2402.03610v1
Compressor summary: The Retrieval-Augmented Planning (RAP) framework improves large language models' decision-making abilities by dynamically using past experiences in various textual and multimodal scenarios.
http://arxiv.org/abs/2402.03607v1
Compressor summary: The text discusses how combining explicit commonsense knowledge in the form of knowledge graphs with large Vision Language Models can improve predicting the effectiveness of multi-modal marketing campaigns, enabling early detection and assessment of persuasive campaigns.
http://arxiv.org/abs/2402.03597v1
Compressor summary: The study demonstrates GPT-4's ability to accurately extract reasons for contraceptive switching from clinical notes, outperforming baseline models and showing that patient preference, adverse events, and insurance are key factors in switch decisions.
http://arxiv.org/abs/2402.03592v1
Compressor summary: GRASP is a novel framework that uses graph structures and multi-magnification information to improve cancer subtyping in digital pathology, outperforming existing methods while being smaller and more interpretable.