arxiv compressed, 2024-02-13

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-13, generated by the Compressor, my personal LLM-based project.


FAST: Factorizable Attention for Speeding up Transformers

http://arxiv.org/abs/2402.07901v1

Compressor summary: The paper proposes a faster, more efficient attention mechanism for transformers that uses a factorizable form of attention to reduce computational and memory complexity while maintaining full all-to-all relationships between tokens.
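
The summary doesn't spell out the paper's exact factorization, but the generic trick behind factorizable attention is to replace softmax(QK^T)V with phi(Q)(phi(K)^T V) for some feature map phi, so the n-by-n score matrix is never formed. A minimal NumPy sketch of that idea, with phi an arbitrary positive feature map of my choosing rather than the paper's:

```python
import numpy as np

def factorized_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Standard attention costs O(n^2 * d): softmax(Q @ K.T) @ V.
    # With a kernel feature map phi, attention factorizes as
    # phi(Q) @ (phi(K).T @ V), which costs O(n * d^2) instead.
    Qf, Kf = phi(Q), phi(K)        # (n, d) feature-mapped queries/keys
    KV = Kf.T @ V                  # (d, d_v), shared across all queries
    Z = Qf @ Kf.sum(axis=0)        # (n,) per-query normalizer
    return (Qf @ KV) / Z[:, None]
```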


Wavefront Randomization Improves Deconvolution

http://arxiv.org/abs/2402.07900v1

Compressor summary: Adding a random mask to an imaging system reduces optical aberrations and improves image quality by making deconvolution less sensitive to noise.
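
For context on why deconvolution is noise-sensitive: inverting the blur means dividing by the point spread function's spectrum, which amplifies noise wherever that spectrum is near zero. Classical Wiener deconvolution, sketched below as background (this is not the paper's wavefront-randomization method), regularizes exactly those frequencies:

```python
import numpy as np

def wiener_deconvolve(blurred, psf, snr=100.0):
    # Wiener deconvolution in the Fourier domain. The 1/snr term
    # regularizes frequencies where the PSF response |H| is small --
    # exactly where naive inverse filtering amplifies noise.
    H = np.fft.fft2(psf, s=blurred.shape)
    G = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(W * G))
```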


A systematic investigation of learnability from single child linguistic input

http://arxiv.org/abs/2402.07899v1

Compressor summary: The study trains six model architectures on five datasets containing subsets of a child's linguistic input to examine their ability to form meaningful syntactic and semantic representations, finding that they consistently match previous results.


Suppressing Pink Elephants with Direct Principle Feedback

http://arxiv.org/abs/2402.07896v1

Compressor summary: Key points:
- Existing methods for controlling language models are not always suitable for use at inference time
- The paper proposes a novel method called Direct Principle Feedback that simplifies Constitutional AI
- The method is tested on the Pink Elephant Problem, where an LLM should avoid discussing a certain entity and focus on another
- The results show that the proposed method performs well compared to other models and GPT-4

Summary: The paper introduces Direct Principle Feedback, a simplified version of Constitutional AI, for controlling language models at inference time. It demonstrates its effectiveness on the Pink Elephant Problem, where an LLM should avoid mentioning a forbidden entity and discuss a preferred one.


Detection of Spider Mites on Labrador Beans through Machine Learning Approaches Using Custom Datasets

http://arxiv.org/abs/2402.07895v1

Compressor summary: The study presents a visual machine learning method for plant disease detection using real-world camera data, showing improved accuracy with a sequential CNN model.


MODIPHY: Multimodal Obscured Detection for IoT using PHantom Convolution-Enabled Faster YOLO

http://arxiv.org/abs/2402.07894v1

Compressor summary: YOLO Phantom is a small, efficient object detection model that works well in low-light and occluded scenarios for IoT applications.


Label-Efficient Model Selection for Text Generation

http://arxiv.org/abs/2402.07891v1

Compressor summary: DiffUse is an efficient method for choosing between text generation models by clustering embeddings and reducing the need for preference annotations.


MAIDCRL: Semi-centralized Multi-Agent Influence Dense-CNN Reinforcement Learning

http://arxiv.org/abs/2402.07890v1

Compressor summary: The paper introduces MAIDCRL, a semi-centralized reinforcement learning method for multi-agent control using convolutional layers, which improves performance and speed on both homogeneous and heterogeneous StarCraft scenarios.


WildfireGPT: Tailored Large Language Model for Wildfire Analysis

http://arxiv.org/abs/2402.07877v1

Compressor summary: WildfireGPT is a prototype LLM agent that uses climate projections and scientific literature to provide detailed, domain-specific insights on wildfire risks for various end users.


Policy Improvement using Language Feedback Models

http://arxiv.org/abs/2402.07876v1

Compressor summary: Language Feedback Models (LFMs) improve instruction following by identifying desirable actions using feedback from large language models, and they generalize to unseen environments.


Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States

http://arxiv.org/abs/2402.07875v1

Compressor summary: The paper explores how the implicit bias of policy gradient in reinforcement learning affects extrapolation to unseen initial states and suggests selecting initial states wisely for better performance.


Scaling Laws for Fine-Grained Mixture of Experts

http://arxiv.org/abs/2402.07871v1

Compressor summary: This paper analyzes how Mixture of Experts models can be optimized by adjusting a new hyperparameter called granularity, leading to more efficient and better performing language models than dense Transformers.


Nesting Particle Filters for Experimental Design in Dynamical Systems

http://arxiv.org/abs/2402.07868v1

Compressor summary: The paper presents a new Bayesian Experimental Design method that optimizes risk-sensitive policies using nested sequential Monte Carlo estimators, outperforming existing methods on dynamical systems.


Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

http://arxiv.org/abs/2402.07865v1

Compressor summary: The authors evaluate, analyze, and improve visually-conditioned language models (VLMs) for visual dialogue and related tasks, providing a unified framework, code, and checkpoints.


Lissard: Long and Simple Sequential Reasoning Datasets

http://arxiv.org/abs/2402.07859v1

Compressor summary: The paper presents Lissard, a benchmark to test language models' ability to handle long sequences with repetitive rules, and shows that existing models perform worse on these tasks as the sequence length increases.


Multiscale Neuroimaging Features for the Identification of Medication Class and Non-Responders in Mood Disorder Treatment

http://arxiv.org/abs/2402.07858v1

Compressor summary: The text discusses using neuroimaging features at multiple spatial scales to help identify patients with mood disorders who may not respond to standard treatments and to find better alternatives.


Comparing skill of historical rainfall data based monsoon rainfall prediction in India with NCEP-NWP forecasts

http://arxiv.org/abs/2402.07851v1

Compressor summary: Key points:
- The paper trains neural networks to forecast rainfall in India using historical data from 1901 to 2022
- The paper compares the neural network predictions with NWP forecasts and persistence estimates
- The paper finds that neural network predictions are more accurate than both alternatives, especially for three-day forecasts
- The paper suggests that NWP forecasts can be improved by using more diverse data and better neural network architectures

Summary: The paper shows how neural networks trained on historical rainfall data in India outperform existing methods in predicting short-term rainfall, and proposes ways to improve NWP forecasts.


Generative Modeling of Discrete Joint Distributions by E-Geodesic Flow Matching on Assignment Manifolds

http://arxiv.org/abs/2402.07846v1

Compressor summary: The paper proposes a new generative model for discrete distributions using normalizing flows, which gradually assign categories and avoid discretization issues, and can represent complex dependencies in structured data.


An Investigation into Using Unsupervised Metrics to Optimise GNNs for Node Clustering

http://arxiv.org/abs/2402.07845v1

Compressor summary: This paper shows that modularity can be used to optimize graph neural networks (GNNs) without ground-truth comparisons and investigates its limitations on synthetic datasets with different information partitioning scenarios.


Do Membership Inference Attacks Work on Large Language Models?

http://arxiv.org/abs/2402.07841v1

Compressor summary: The paper studies how well membership inference attacks can guess if a text is part of the training data for large language models, finding that these attacks are mostly ineffective due to factors like dataset size and fuzzy boundaries between members and non-members.


Towards Meta-Pruning via Optimal Transport

http://arxiv.org/abs/2402.07839v1

Compressor summary: Intra-Fusion is a novel neural network pruning method that uses fusion and Optimal Transport to create a more effective sparse model without the need for fine-tuning, and it can also reduce training time.


Generalizing across Temporal Domains with Koopman Operators

http://arxiv.org/abs/2402.07834v1

Compressor summary: The study proposes Temporal Koopman Networks (TKNets) for addressing the challenging problem of generalizing predictive models to evolving domains using Koopman theory.


Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

http://arxiv.org/abs/2402.07827v1

Compressor summary: Aya is a multilingual language model that performs well on various tasks across 101 languages and introduces new evaluation methods to assess its performance.


Understanding fitness landscapes in morpho-evolution via local optima networks

http://arxiv.org/abs/2402.07822v1

Compressor summary: The paper applies Local Optima Network analysis to compare the fitness landscapes of three genetic encodings for robot morpho-evolution and locomotion tasks, providing insights for designing better algorithms.


On Computationally Efficient Multi-Class Calibration

http://arxiv.org/abs/2402.07821v1

Compressor summary: This work proposes a novel notion of multi-class calibration called projected smooth calibration, which provides strong guarantees for downstream binary classification tasks and can be computed efficiently in polynomial time.


A Benchmark Grocery Dataset of Realworld Point Clouds From Single View

http://arxiv.org/abs/2402.07819v1

Compressor summary: The text introduces a new large-scale 3D grocery dataset (3DGrocery100) for computer vision applications, addressing the lack of fine-grained and real-world data in this domain.


Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning

http://arxiv.org/abs/2402.07818v1

Compressor summary: The paper proposes stagewise differentially private zeroth-order methods for LLM finetuning that balance privacy, utility, and scalability, reducing the number of trainable parameters via data-free pruning.


Injecting Wiktionary to improve token-level contextual representations using contrastive learning

http://arxiv.org/abs/2402.07817v1

Compressor summary: Key points:
- Contextual word embeddings are sensitive to context but need more supervision
- The paper proposes injecting a lexicon as an alternative source of supervision, using Wiktionary
- The paper evaluates the approach on the Word-In-Context task and achieves new state-of-the-art results

Summary: The paper introduces a novel method to improve contextual word embeddings by using Wiktionary as extra supervision and shows its effectiveness on the Word-In-Context task.


PBADet: A One-Stage Anchor-Free Approach for Part-Body Association

http://arxiv.org/abs/2402.07814v1

Compressor summary: PBADet is a new method for detecting human body parts and their associations with individuals using multi-scale features without anchors, achieving better accuracy and efficiency than existing methods.


Retrieval-Augmented Thought Process as Sequential Decision Making

http://arxiv.org/abs/2402.07812v1

Compressor summary: RATP is a method that uses Monte-Carlo Tree Search to improve the thought generation of large language models by leveraging external knowledge and addressing privacy, hallucination, and context handling issues.


Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation

http://arxiv.org/abs/2402.07808v1

Compressor summary: Key points:
- The task is to estimate a distribution of parameters that can generate data-consistent simulations
- The problem is ill-posed because many source distributions can match the data
- The proposed approach maximizes entropy to retain uncertainty and uses the Sliced-Wasserstein distance (see the sketch below)
- The method works on sample-based tasks and recovers high-entropy source distributions without sacrificing fidelity
- The method is applied to infer parameters of a neuron model from experimental datasets

Summary: The authors propose a method for inferring uncertain source distributions of simulator parameters using maximum entropy and the Sliced-Wasserstein distance, and demonstrate its application to a neuron model.
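
The Sliced-Wasserstein distance used above is easy to estimate from samples: project both sample sets onto random directions and average the resulting one-dimensional Wasserstein distances. A minimal sketch (assuming equal sample sizes; not the authors' implementation):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, seed=0):
    # Monte Carlo estimate of the Sliced-Wasserstein-1 distance between
    # two equally sized d-dimensional sample sets: in 1D, the Wasserstein
    # distance is just the mean gap between sorted samples.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)
        total += np.mean(np.abs(np.sort(X @ theta) - np.sort(Y @ theta)))
    return total / n_projections
```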


Generalising Planning Environment Redesign

http://arxiv.org/abs/2402.07799v1

Compressor summary: The paper proposes a general, metric-agnostic approach to planning environment redesign that can handle various objectives and outperforms existing approaches on benchmarks.


From Uncertainty to Precision: Enhancing Binary Classifier Performance through Calibration

http://arxiv.org/abs/2402.07790v1

Compressor summary: Key points:
- Binary classifier performance is typically measured by accuracy, which ignores uncertainty
- Calibration is important for interpreting model scores as probabilities
- The Local Calibration Score is introduced as a refined metric to detect score distortions
- Local regressions are recommended as effective recalibration tools and visualization facilitators
- The approach is applied to a Random Forest classifier for credit default prediction

Summary: The paper proposes the Local Calibration Score, a new metric to measure and improve calibration of binary classifiers, especially in sensitive domains like finance, using local regressions.


Multi-Intent Attribute-Aware Text Matching in Searching

http://arxiv.org/abs/2402.07788v1

Compressor summary: The study proposes a multi-intent attribute-aware matching model (MIM) that leverages attributes from both queries and items to improve text matching in searching platforms.


Extensible Multi-Granularity Fusion Network for Aspect-based Sentiment Analysis

http://arxiv.org/abs/2402.07787v1

Compressor summary: The paper introduces EMGF, a framework that efficiently integrates diverse linguistic and structural features for improved Aspect-based Sentiment Analysis using multi-anchor triplet learning and orthogonal projection.


HYPO: Hyperspherical Out-of-Distribution Generalization

http://arxiv.org/abs/2402.07785v1

Compressor summary: HYPO is a novel framework for machine learning models to learn domain-invariant features across different environments by using a hyperspherical space and a prototypical learning objective, which improves out-of-distribution generalization.


TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection

http://arxiv.org/abs/2402.07776v1

Compressor summary: The text proposes a novel framework for detecting fake news that integrates human expertise, logical predicates, and generalizable rules to achieve explainability, generalizability, and controllability in the detection process.


End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty

http://arxiv.org/abs/2402.07772v1

Compressor summary: The paper presents a method for integrating nondifferentiable optimization problems with uncertain parameters and fairness/robustness properties into machine learning models using the Predict-Then-Optimize paradigm.


Text Detoxification as Style Transfer in English and Hindi

http://arxiv.org/abs/2402.07767v1

Compressor summary: The paper proposes three methods to automatically convert toxic text into non-toxic text while keeping its meaning and fluency, using a dataset with expert-annotated detoxified versions of toxic sentences.


Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model

http://arxiv.org/abs/2402.07757v1

Compressor summary: The text describes a synthetic graph navigation task to study stepwise inference in autoregressive Transformer models, revealing various phenomena related to reasoning and generalization.


Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

http://arxiv.org/abs/2402.07754v1

Compressor summary: The paper introduces Diffusion-of-Thought, a new model that combines diffusion and Chain-of-Thought techniques to improve reasoning in text processing tasks.


Predictive Churn with the Set of Good Models

http://arxiv.org/abs/2402.07745v1

Compressor summary: The paper studies predictive churn due to model updates and proposes using predictive multiplicity measures to examine expected churn over the Rashomon set of prospective models.
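
Predictive churn itself is a simple quantity: the fraction of examples whose prediction flips between the old model and its update. A one-liner to make that concrete:

```python
import numpy as np

def churn(preds_old, preds_new):
    # Fraction of examples whose predicted label changes after a model
    # update -- the quantity the paper studies over the Rashomon set.
    return np.mean(np.asarray(preds_old) != np.asarray(preds_new))
```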


Towards Unified Alignment Between Agents, Humans, and Environment

http://arxiv.org/abs/2402.07744v1

Compressor summary: The paper introduces $\mathbf{UA}^2$ principles for aligning agents with human intentions, environmental dynamics, and self-constraints to improve their performance in realistic environments.


Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search

http://arxiv.org/abs/2402.07742v1

Compressor summary: The text proposes adding images to clarifying questions in conversational search systems to improve multimodal query clarification, introduces a new dataset (Melon) and a model (Marto) for this task, and shows significant improvements in retrieval performance.


Task-conditioned adaptation of visual features in multi-task policy learning

http://arxiv.org/abs/2402.07739v1

Compressor summary: The paper proposes task-conditioned adapters for multi-task policy learning in autonomous agents, enabling them to flexibly adapt their perception modules based on current tasks without finetuning pre-trained models and using example demonstrations when the task is unknown.


Universal link predictor by In-context Learning

http://arxiv.org/abs/2402.07738v1

Compressor summary: UniLP is a novel link prediction model that adapts to different graphs without targeted training by combining heuristic and parametric approaches with In-context Learning.


Unsupervised Sign Language Translation and Generation

http://arxiv.org/abs/2402.07726v1

Compressor summary: USLNet is an unsupervised model that translates and generates sign language from text and video data without parallel sign language data, using reconstruction modules and a cross-modality back-translation procedure.


LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation

http://arxiv.org/abs/2402.07721v1

Compressor summary: LoRA-drop is a method to improve resource efficiency in fine-tuning large pre-trained models by analyzing and retaining LoRA output for important layers.
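
The summary suggests ranking adapters by the magnitude of their output; here is a hypothetical sketch of that kind of criterion (the scoring rule and names are my assumptions, not the paper's code):

```python
import torch

def rank_lora_adapters(xs, loras):
    # Hypothetical importance score in the spirit of LoRA-drop: measure
    # the norm of each adapter's output delta (xs @ A.T @ B.T) on sample
    # activations, so low-scoring layers' adapters can be pruned.
    scores = {}
    for name, (A, B) in loras.items():   # LoRA parameterizes dW = B @ A
        delta = xs @ A.T @ B.T           # (batch, d_out) adapter output
        scores[name] = delta.norm().item()
    return sorted(scores, key=scores.get, reverse=True)
```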


Model Collapse Demystified: The Case of Regression

http://arxiv.org/abs/2402.07712v1

Compressor summary: The paper studies how large language models like ChatGPT deteriorate when trained on their own generated data and proposes an adaptive regularization strategy to prevent this.


Optimization of Sparse Convolution for 3D-Point Cloud on GPUs with CUDA

http://arxiv.org/abs/2402.07710v1

Compressor summary: Deep learning methods, especially CNNs, are widely used for analyzing structured grid data like images but face challenges with the sparse, unstructured 3D point clouds produced by LiDAR and 3D sensors, motivating the paper's CUDA-based optimization of sparse convolution on GPUs.


Signed Distance Field based Segmentation and Statistical Shape Modelling of the Left Atrial Appendage

http://arxiv.org/abs/2402.07708v1

Compressor summary: The paper proposes a pipeline for automatic segmentation, mesh model creation, and statistical shape modelling of the left atrial appendage in patients with atrial fibrillation using deep learning methods.


Online Sequential Decision-Making with Unknown Delays

http://arxiv.org/abs/2402.07703v1

Compressor summary: Key points:
- The text is about online sequential decision-making with delays using the online convex optimization (OCO) framework
- The text proposes three families of delayed algorithms based on approximate solutions for different types of feedback (a baseline is sketched below)
- The text provides regret bounds and demonstrates efficiency under different norms

Summary: The text presents three families of OCO algorithms for online sequential decision-making with delays, along with their regret bounds and efficiency in various norms.
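
To make the delayed-feedback setting concrete, here is a minimal sketch of the standard baseline (online gradient descent that applies each gradient only once it arrives); the paper's three algorithm families refine this idea for different feedback types:

```python
import numpy as np

def delayed_ogd(grad_oracles, delays, dim, eta=0.1):
    # Online gradient descent under unknown delays: the gradient queried
    # at round t only becomes available at round t + delays[t], so each
    # round applies whatever feedback has arrived by then.
    x = np.zeros(dim)
    inbox = {}                                 # arrival round -> gradients
    iterates = []
    for t in range(len(grad_oracles)):
        iterates.append(x.copy())
        inbox.setdefault(t + delays[t], []).append(grad_oracles[t](x))
        for g in inbox.pop(t, []):             # apply what arrived now
            x = x - eta * g
    return iterates
```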


Boundary Exploration for Bayesian Optimization With Unknown Physical Constraints

http://arxiv.org/abs/2402.07692v1

Compressor summary: BE-CBO is a new Bayesian optimization method that uses neural networks to learn constraints and efficiently explore the boundary between feasible and infeasible regions of the design space.


OrderBkd: Textual backdoor attack through repositioning

http://arxiv.org/abs/2402.07689v1

Compressor summary: Our new backdoor attack on NLP systems uses the repositioning of two words as its trigger, maintains a high success rate, and is robust against the ONION defense.


CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity

http://arxiv.org/abs/2402.07688v1

Compressor summary: CyberMetric is a dataset with 10,000 questions to benchmark LLMs in cybersecurity, showing they often perform better than human experts.


Contrastive Multiple Instance Learning for Weakly Supervised Person ReID

http://arxiv.org/abs/2402.07685v1

Compressor summary: CMIL is a new weakly supervised ReID framework that leverages contrastive losses and outperforms baselines on three datasets, introducing the WL-MUDD dataset.


Auxiliary Tasks to Boost Biaffine Semantic Dependency Parsing

http://arxiv.org/abs/2402.07682v1

Compressor summary: The paper proposes auxiliary tasks that improve the biaffine parser's performance on semantic dependency parsing by introducing interdependence between arcs while preserving its O(n^2) complexity.


Large Language Models "Ad Referendum": How Good Are They at Machine Translation in the Legal Domain?

http://arxiv.org/abs/2402.07681v1

Compressor summary: The study finds that large language models perform well in translating legal texts, despite lower scores on automatic evaluation metrics, suggesting the importance of human evaluation methods.


AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision Transformer

http://arxiv.org/abs/2402.07680v1

Compressor summary: AYDIV is a new framework that aligns LiDAR and camera data to improve long-distance object detection for autonomous driving systems.


GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance

http://arxiv.org/abs/2402.07677v1

Compressor summary: GBOT is a novel graph-based tracking approach for augmented reality assembly guidance that handles occlusions, complex assembly states, and multiple objects in real time.


The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models

http://arxiv.org/abs/2402.07658v1

Compressor summary: This study shows that Large Language Models can significantly improve the accuracy of Automatic Speech Recognition in medical transcription by enhancing various aspects of transcript quality, including word errors, medical concepts, speaker diarization, and semantic coherence.


Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models

http://arxiv.org/abs/2402.07645v1

Compressor summary: The authors developed a tool using a large language model to extract and label factors from electronic health records that are associated with difficult-to-treat depression, achieving good performance on real and synthetic data.


A Flow-based Credibility Metric for Safety-critical Pedestrian Detection

http://arxiv.org/abs/2402.07642v1

Compressor summary: This paper proposes a new metric, c-flow, for evaluating how well object detectors in automated driving can avoid safety-critical mistakes by using optical flow and without needing extra labels.


Complete Instances Mining for Weakly Supervised Instance Segmentation

http://arxiv.org/abs/2402.07633v1

Compressor summary: The paper proposes a novel approach for weakly supervised instance segmentation using MaskIoU heads, Complete Instances Mining strategy, and Anti-noise strategy to refine proposals and improve robustness, achieving state-of-the-art performance on PASCAL VOC 2012 and MS COCO datasets.


Overconfident and Unconfident AI Hinder Human-AI Collaboration

http://arxiv.org/abs/2402.07632v1

Compressor summary: The text discusses how an AI's overconfidence or underconfidence affects human trust, acceptance of AI suggestions, and collaboration outcomes, and argues that aligning the AI's expressed confidence with its actual performance, together with calibrating human trust, is crucial for enhancing human-AI collaboration.


G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering

http://arxiv.org/abs/2402.07630v1

Compressor summary: The paper presents a question-answering framework for textual graphs that integrates GNNs, LLMs, and RAG, and introduces a benchmark and outperforms baselines on various tasks.


AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts

http://arxiv.org/abs/2402.07625v1

Compressor summary: Our method improves language models' math skills by using meta-prompted models to select high-quality math content from the AutoMathText dataset, achieving significant token efficiency gains.


Anchor-based Large Language Models

http://arxiv.org/abs/2402.07616v1

Compressor summary: The Anchor-based LLM (AnLLM) uses a new self-attention network and inference strategy to compress sequence information into an anchor token, reducing cache and improving inference efficiency for large language models.


Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping

http://arxiv.org/abs/2402.07610v1

Compressor summary: The paper explores how multi-time bootstrapping self-alignment can improve large language models' performance by exploiting data diversity from in-context learning and proposes Step-On-Feet Tuning (SOFT) to enhance zero or one-shot capabilities.


Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

http://arxiv.org/abs/2402.07598v1

Compressor summary: The paper introduces a new algorithm for distributional reinforcement learning that approximates return distributions using a generative model and proves its minimax-optimality, along with new theoretical results and experimental comparisons.


Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription

http://arxiv.org/abs/2402.07596v1

Compressor summary: The Sheet Music Transformer is a new model for optical music recognition that can handle complex musical scores without relying on monophonic strategies.


Foundational Inference Models for Dynamical Systems

http://arxiv.org/abs/2402.07594v1

Compressor summary: The paper introduces a new method for inferring ordinary differential equations (ODEs) from noisy data using neural networks, and shows its effectiveness on various systems.


Identifying architectural design decisions for achieving green ML serving

http://arxiv.org/abs/2402.07585v1

Compressor summary: This paper reviews ML serving architectural design choices and quality characteristics, focusing on energy efficiency for achieving green AI.


Topic Modeling as Multi-Objective Contrastive Optimization

http://arxiv.org/abs/2402.07577v1

Compressor summary: The paper proposes a new method for neural topic modeling that balances between document-level contrastive learning and evidence lower bound optimization to improve topic coherence and downstream performance.


Only the Curve Shape Matters: Training Foundation Models for Zero-Shot Multivariate Time Series Forecasting through Next Curve Shape Prediction

http://arxiv.org/abs/2402.07570v1

Compressor summary: The General Time Transformer (GTT) is a foundation model for zero-shot multivariate time series forecasting that uses a channel-wise framework to predict next curve shapes based on past ones, achieving superior results on unseen datasets and surpassing supervised baselines.


Weisfeiler-Leman at the margin: When more expressivity matters

http://arxiv.org/abs/2402.07568v1

Compressor summary: The paper explores how subgraph information and margin theory can improve the generalization performance of graph isomorphism algorithms, such as $1$-WL, and message-passing graph neural networks (MPNNs).
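
For reference, the 1-WL test that anchors this line of work is just iterated color refinement; a compact sketch:

```python
def weisfeiler_leman(adj, rounds=3):
    # 1-WL color refinement: repeatedly hash each node's color together
    # with the multiset of its neighbors' colors. Two graphs whose final
    # color histograms differ are guaranteed non-isomorphic (the converse
    # does not hold, which is where expressivity questions start).
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return sorted(colors.values())
```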


TransAxx: Efficient Transformers with Approximate Computing

http://arxiv.org/abs/2402.07545v1

Compressor summary: The authors propose TransAxx, a framework that enables fast support for approximate arithmetic on ViT models and uses MCTS to generate approximate accelerators for them, achieving significant trade-offs between accuracy and power.


Show Me How It's Done: The Role of Explanations in Fine-Tuning Language Models

http://arxiv.org/abs/2402.07543v1

Compressor summary: Fine-tuning language models with explanations improves their performance and enables them to solve tasks they couldn't before, especially for smaller models.


BreakGPT: A Large Language Model with Multi-stage Structure for Financial Breakout Detection

http://arxiv.org/abs/2402.07536v1

Compressor summary: BreakGPT is a large language model that improves financial breakout detection accuracy using a multi-stage structure.


Morse sequences

http://arxiv.org/abs/2402.07526v1

Compressor summary: A Morse sequence is a sequence of expansions and fillings that represents the gradient vector field of a discrete Morse function on a simplicial complex.


MAFIA: Multi-Adapter Fused Inclusive LanguAge Models

http://arxiv.org/abs/2402.07519v1

Compressor summary: The authors propose a method to debias pretrained language models across multiple dimensions using structured knowledge and a large generative model, improving performance on various tasks and languages.


Physics-informed machine learning as a kernel method

http://arxiv.org/abs/2402.07514v1

Compressor summary: Physics-informed machine learning combines data and physical models for better regression, with faster convergence rates depending on the physical error.


The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese

http://arxiv.org/abs/2402.07513v1

Compressor summary: The study investigates biases in speech recognition systems for casual Portuguese conversations using Whisper and MMS methods and shows that oversampling techniques reduce stereotypical biases.


Secret Collusion Among Generative AI Agents

http://arxiv.org/abs/2402.07510v1

Compressor summary: The paper explores privacy and security issues in systems of communicating AI agents, focusing on the potential use of steganography for secret collusion, and proposes a framework to test and monitor these risks.


Clustering Dynamics for Improved Speed Prediction Deriving from Topographical GPS Registrations

http://arxiv.org/abs/2402.07507v1

Compressor summary: The paper proposes a method to predict traffic speeds using sparse GPS data and topographical features, outperforming existing methods in regions with limited data coverage.


NeuralSentinel: Safeguarding Neural Network Reliability and Trustworthiness

http://arxiv.org/abs/2402.07506v1

Compressor summary: NeuralSentinel is a tool that validates AI models by combining attack and defence strategies, explainability concepts, and an easy-to-use interface, which was tested on a skin cancer image detector in a Hackathon event.


ClusterTabNet: Supervised clustering method for table detection and table structure recognition

http://arxiv.org/abs/2402.07502v1

Compressor summary: The paper proposes a new deep learning method for detecting tables in documents using word relations and shows it is more accurate and efficient than existing methods.


One Train for Two Tasks: An Encrypted Traffic Classification Framework Using Supervised Contrastive Learning

http://arxiv.org/abs/2402.07501v1

Compressor summary: The paper proposes CLE-TFE, a model that uses contrastive learning and graph data augmentation to improve encrypted traffic classification by jointly training packet-level and flow-level tasks with less computational overhead.


Understanding Deep Learning defenses Against Adversarial Examples Through Visualizations for Dynamic Risk Assessment

http://arxiv.org/abs/2402.07496v1

Compressor summary: The text discusses the importance of studying possible attacks on Deep Neural Network models used in critical tasks and visualizing the effectiveness of different defenses against adversarial example attacks.


Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial

http://arxiv.org/abs/2402.07487v1

Compressor summary: The article explains score-based diffusion models using stochastic differential equations (SDE), covering sampling and score matching methods, with proofs and examples for both beginners and practitioners.
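
For orientation, the forward/reverse SDE pair at the heart of such tutorials (in Song et al.'s standard formulation, where the learned score s_theta(x, t) approximates the gradient of log p_t(x) and plugs into the reverse equation for sampling):

```latex
\begin{aligned}
  \mathrm{d}x_t &= f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w_t
    && \text{(forward, noising)} \\
  \mathrm{d}x_t &= \bigl[f(x_t, t) - g(t)^2\,\nabla_x \log p_t(x_t)\bigr]\,\mathrm{d}t
    + g(t)\,\mathrm{d}\bar{w}_t
    && \text{(reverse, sampling)}
\end{aligned}
```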


T-RAG: Lessons from the LLM Trenches

http://arxiv.org/abs/2402.07483v1

Compressor summary: The text describes the development and deployment of Tree-RAG, a question answering system using a large language model that incorporates a tree structure to represent entity hierarchies in private enterprise documents.


Topological Safeguard for Evasion Attack based on the Interpretability of Artificial Neural Network Behavior

http://arxiv.org/abs/2402.07480v1

Compressor summary: The text discusses a novel evasion attack detector for Deep Learning models that uses Graph Convolutional Neural Networks (GCN) to analyze neuron activations and model topology, aiming to improve cybersecurity against such threats.


Food Recommendation as Language Processing (F-RLP): A Personalized and Contextual Paradigm

http://arxiv.org/abs/2402.07477v1

Compressor summary: The text introduces F-RLP, a new framework that combines language models and food-specific data to create better food recommendations.


Pushing The Limit of LLM Capacity for Text Classification

http://arxiv.org/abs/2402.07470v1

Compressor summary: RGPT is a framework that uses adaptive boosting to create a specialized text classification LLM by recurrently ensembling base learners, which significantly outperforms existing models and humans.


Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck Equations

http://arxiv.org/abs/2402.07465v1

Compressor summary: The paper proposes a novel score-based method to solve high-dimensional Fokker-Planck equations, overcoming the curse of dimensionality and the numerical errors that limit existing methods.


A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse?

http://arxiv.org/abs/2402.07462v1

Compressor summary: HALO is a regulatory paradigm for artificial intelligence that uses hormetic analysis to ensure safe and optimal limits of AI behaviors by modeling them as allostatic opponent processes, solving the value-loading problem and weak-to-strong generalization problem.


On the Distance from Calibration in Sequential Prediction

http://arxiv.org/abs/2402.07458v1

Compressor summary: The paper studies a binary prediction setting with a calibration distance measure that evaluates deviation from perfect calibration and proves an O(sqrt(T)) bound on it for a forecasting algorithm.


OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

http://arxiv.org/abs/2402.07456v1

Compressor summary: OS-Copilot is a framework that helps create digital agents that can interact with various elements in an operating system, such as the web, files, and applications, and improve their skills on different tasks.


Bandit-Feedback Online Multiclass Classification: Variants and Tradeoffs

http://arxiv.org/abs/2402.07453v1

Compressor summary: We analyze the impact of bandit feedback on multiclass classification loss and show that it increases the optimal mistake bound by a factor of at most $k$, where $k$ is the number of labels, compared to full information. We also reveal nearly optimal bounds for the gap between randomized and deterministic learners, and adaptive and oblivious adversaries in bandit feedback settings, which differ significantly from the full information scenario.


TriAug: Out-of-Distribution Detection for Robust Classification of Imbalanced Breast Lesion in Ultrasound

http://arxiv.org/abs/2402.07452v1

Compressor summary: The paper proposes a method for detecting out-of-distribution samples in breast ultrasound images and improving classification accuracy with triplet state augmentation and balanced sphere loss.


AraSpider: Democratizing Arabic-to-SQL

http://arxiv.org/abs/2402.07448v1

Compressor summary: The study introduces AraSpider, an Arabic version of Spider dataset, improves Arabic NLP with multilingual translation models, and highlights the importance of context, back translation, and data sharing in NLP research.


Quality Does Matter: A Detailed Look at the Quality and Utility of Web-Mined Parallel Corpora

http://arxiv.org/abs/2402.07446v1

Compressor summary: The study evaluates the quality of web-mined corpora for low-resource languages using similarity measures and shows that NMT models can perform well with high-quality portions of these corpora.


The I/O Complexity of Attention, or How Optimal is Flash Attention?

http://arxiv.org/abs/2402.07443v1

Compressor summary: The FlashAttention algorithm optimizes the Transformer's self-attention by reducing its I/O complexity, and this paper investigates how close that complexity comes to optimal across various memory hierarchies.
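
The I/O savings come from tiling: attention is computed block by block with running softmax statistics, so the full n-by-n score matrix never hits slow memory. A NumPy sketch of that recurrence (the real kernel is fused CUDA; this only shows the math):

```python
import numpy as np

def blockwise_attention(Q, K, V, block=64):
    # Tiled attention in the FlashAttention style: stream over K/V blocks
    # while maintaining a running max (m) and normalizer (l) per query,
    # rescaling partial outputs whenever the running max changes.
    n, d = Q.shape
    out = np.zeros((n, V.shape[1]))
    m = np.full(n, -np.inf)
    l = np.zeros(n)
    for s in range(0, K.shape[0], block):
        Kb, Vb = K[s:s + block], V[s:s + block]
        S = Q @ Kb.T / np.sqrt(d)            # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))
        p = np.exp(S - m_new[:, None])
        scale = np.exp(m - m_new)            # rescale old partial sums
        l = l * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]
```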


Game Agent Driven by Free-Form Text Command: Using LLM-based Code Generation and Behavior Branch

http://arxiv.org/abs/2402.07442v1

Compressor summary: This paper introduces a natural language-based text command control system for game agents that can understand and execute free-form commands using a large language model and behavior trees in a Pokémon game simulation.


Intrinsic Task-based Evaluation for Referring Expression Generation

http://arxiv.org/abs/2402.07432v1

Compressor summary: The study proposes a new evaluation method for Referring Expression Generation (REG) models that considers referential success and alternative suggestions, improving on previous ratings-based evaluations.


SALAD: Smart AI Language Assistant Daily

http://arxiv.org/abs/2402.07431v1

Compressor summary: SALAD is an AI-powered app that helps foreigners learn Japanese by providing translations, speech recognition, audio, vocabulary tracking, grammar explanations, and songs using daily data to improve fluency and confidence in communication with native speakers.


Particle Filter SLAM for Vehicle Localization

http://arxiv.org/abs/2402.07429v1

Compressor summary: The paper presents a Particle Filter SLAM method that combines encoded data, fiber optic gyro information, and lidar technology to enable precise estimation of vehicle motion and environmental perception for simultaneous localization and mapping in robotics.


News Recommendation with Attention Mechanism

http://arxiv.org/abs/2402.07422v1

Compressor summary: The paper introduces NRAM, a new algorithm for news recommendation with attention mechanism, which could greatly enhance personalization of news content on digital platforms.


Conditional Generative Models are Sufficient to Sample from Any Causal Effect Estimand

http://arxiv.org/abs/2402.07419v1

Compressor summary: The paper presents a method to compute causal effects from observational image data using conditional generative models and diffusion techniques, and applies it to evaluate conditional generative models on CelebA dataset.


SemTra: A Semantic Skill Translator for Cross-Domain Zero-Shot Policy Adaptation

http://arxiv.org/abs/2402.07418v1

Compressor summary: The text introduces a framework called SemTra that uses multi-modal models and a pretrained language model to adapt semantic skills from user input snippets for cross-domain long-horizon tasks.


An Empirical Study Into What Matters for Calibrating Vision-Language Models

http://arxiv.org/abs/2402.07417v1

Compressor summary: This study examines how well vision-language models can estimate uncertainty across different settings and shows that temperature scaling improves their calibration, even with few examples.


Context-aware Multi-Model Object Detection for Diversely Heterogeneous Compute Systems

http://arxiv.org/abs/2402.07415v1

Compressor summary: SHIFT is a system that adapts object detection models based on contextual information and computational resources, improving energy efficiency and latency in autonomous systems.


Auxiliary Reward Generation with Transition Distance Representation Learning

http://arxiv.org/abs/2402.07412v1

Compressor summary: The text proposes a novel representation learning approach for reinforcement learning that generates auxiliary rewards based on the transition distance between states, improving learning efficiency and stability in manipulation tasks.


Potential-Based Reward Shaping For Intrinsic Motivation

http://arxiv.org/abs/2402.07411v1

Compressor summary: The paper introduces Potential-Based Intrinsic Motivation (PBIM), which preserves optimal policies and prevents suboptimal behavior in complex environments by converting intrinsic motivation rewards into a potential-based form.
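
For context, classical potential-based shaping (Ng et al., 1999) adds a shaping term of the form below for an arbitrary state potential Phi, which provably leaves optimal policies unchanged; PBIM's contribution, per the summary, is casting intrinsic-motivation bonuses into this form:

```latex
F(s, a, s') = \gamma\,\Phi(s') - \Phi(s)
```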


A Closer Look at the Robustness of Contrastive Language-Image Pre-Training (CLIP)

http://arxiv.org/abs/2402.07410v1

Compressor summary: This paper investigates the safety objectives of CLIP models, focusing on their resilience to visual factor variations, uncertainty estimations, and anomalous input detection, by testing 83 CLIP models and 127 ImageNet classifiers under various conditions.


Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English

http://arxiv.org/abs/2402.07405v1

Compressor summary: The paper introduces Toisón de Oro, a bilingual framework for financial natural language processing in Spanish and English, which includes a large curated dataset, a finetuned LLM, and an evaluation benchmark to address the gap in Spanish financial NLP research.


Enhancing Multi-Criteria Decision Analysis with AI: Integrating Analytic Hierarchy Process and GPT-4 for Automated Decision Support

http://arxiv.org/abs/2402.07404v1

Compressor summary: The study introduces a novel framework that combines the Analytic Hierarchy Process and GPT-4 to automate and enhance cybersecurity decision-making processes using AI-driven virtual experts.


Make it more specific: A novel uncertainty based airway segmentation application on 3D U-Net and its variants

http://arxiv.org/abs/2402.07403v1

Compressor summary: The paper proposes two new network structures, B-UNet and B-CE-UNet, for improving lung trachea segmentation by adding branch loss and central line loss to learn fine branch features and uncertainty estimation for confidence.


Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate

http://arxiv.org/abs/2402.07401v1

Compressor summary: The study investigates how Large Language Models can generate more faithful explanations for fact-checking using a Multi-Agent Debate Refinement framework that improves credibility and trustworthiness.


VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization

http://arxiv.org/abs/2402.07398v1

Compressor summary: VisLingInstruct optimizes instructions and visual features for MMLMs to improve zero-shot performance in multi-modal tasks, achieving significant gains on TextVQA and HatefulMemes datasets.


Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples

http://arxiv.org/abs/2402.07386v1

Compressor summary: Chain-of-Layer is a method for automatically constructing taxonomies from entities using in-context learning and ranking filters to minimize errors.


Exploring Perceptual Limitation of Multimodal Large Language Models

http://arxiv.org/abs/2402.07384v1

Compressor summary: This paper studies how small objects in images affect the performance of large language models in answering visual questions and identifies four factors that limit their perception.


Unsupervised Discovery of Object-Centric Neural Fields

http://arxiv.org/abs/2402.07376v1

Compressor summary: The text introduces uOCF, an unsupervised method for learning 3D object representations from real images, which improves generalization and enables applications like segmentation and scene manipulation.


Real-World Atmospheric Turbulence Correction via Domain Adaptation

http://arxiv.org/abs/2402.07371v1

Compressor summary: The paper proposes a domain adaptation framework that combines supervised simulated and unsupervised real-world atmospheric turbulence correction to improve image quality and downstream vision tasks.


SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder

http://arxiv.org/abs/2402.07370v1

Compressor summary: The paper proposes SAMAE, a self-supervised face swapping method that enhances model training by masking facial regions, using disentangled features, and addressing shape misalignment issues.


Diff-RNTraj: A Structure-aware Diffusion Model for Road Network-constrained Trajectory Generation

http://arxiv.org/abs/2402.07369v1

Compressor summary: The paper proposes Diff-RNTraj, a diffusion model that generates road network-constrained trajectories with road-related information to address privacy concerns and scale limitations in existing trajectory data.


Assessing Generalization for Subpopulation Representative Modeling via In-Context Learning

http://arxiv.org/abs/2402.07368v1

Compressor summary: The study examines how well LLM-based SRMs generalize from data and respond to different demographic groups, finding in-context learning helps some groups while hurting others.


A Novel Gaussian Min-Max Theorem and its Applications

http://arxiv.org/abs/2402.07356v1

Compressor summary: The paper introduces a new pair of Gaussian processes that allows extending classical theorems in high-dimensional statistics and machine learning to non-identically-distributed rows.


Data Distribution-based Curriculum Learning

http://arxiv.org/abs/2402.07352v1

Compressor summary: The paper introduces Data Distribution-based Curriculum Learning (DDCL), a new approach to ordering training samples from easy to hard, which improves the performance and speed of classification for different classifiers and datasets.
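
The summary doesn't say which distributional statistic DDCL uses to order samples, but one simple illustrative proxy (my assumption, not necessarily the paper's) is distance to the class centroid, with nearby samples treated as easy:

```python
import numpy as np

def easy_to_hard_order(X, y):
    # Illustrative distribution-based curriculum: rank samples by their
    # distance to the class centroid and present the closest ("easiest")
    # samples first. DDCL's actual scoring may differ.
    centroids = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    dists = np.array([np.linalg.norm(x - centroids[c]) for x, c in zip(X, y)])
    return np.argsort(dists)   # indices ordered easiest -> hardest
```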


Antagonistic AI

http://arxiv.org/abs/2402.07350v1

Compressor summary: The paper explores the idea of antagonistic AI systems that challenge users and may have benefits, such as helping them build resilience or healthier relationships, while discussing ethical considerations for their responsible design.


Measurement Scheduling for ICU Patients with Offline Reinforcement Learning

http://arxiv.org/abs/2402.07344v1

Compressor summary: The study explores new offline reinforcement learning methods to optimize laboratory test scheduling for ICU patients using a preprocessed dataset.


Random Geometric Graph Alignment with Graph Neural Networks

http://arxiv.org/abs/2402.07340v1

Compressor summary: The paper studies how one-layer graph neural networks can recover correct vertex alignments between two noisy graphs with random geometric structure and features, outperforming direct assignment methods in high noise levels.


Exploring Saliency Bias in Manipulation Detection

http://arxiv.org/abs/2402.07338v1

Compressor summary: The text discusses the importance of considering semantics when detecting image manipulations that spread misinformation through social media.