arxiv compressed, 2024-02-06

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-06, generated by the compressor, my personal LLM-based project.


Test-Time Adaptation for Depth Completion

http://arxiv.org/abs/2402.03312v1

Compressor summary: The paper proposes a method for adapting depth completion models in real-time without needing source data, which improves performance by 21.1% on average.


HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

http://arxiv.org/abs/2402.03311v1

Compressor summary: HASSOD is a novel self-supervised object detection method that adapts to the number of objects per image and understands their compositions, achieving better performance and interpretability than existing methods.


V-IRL: Grounding Virtual Intelligence in Real Life

http://arxiv.org/abs/2402.03310v1

Compressor summary: V-IRL is a platform that lets AI agents interact with a virtual but realistic version of the physical world to improve their abilities in perception, decision-making, and interaction.


AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion

http://arxiv.org/abs/2402.03309v1

Compressor summary: The text describes a new method (AONeuS) that uses acoustic-optical neural fusion to create high-resolution 3D underwater scenes from limited data, improving on existing RGB and sonar methods.


4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes

http://arxiv.org/abs/2402.03307v1

Compressor summary: The paper proposes a novel method for synthesizing views in dynamic scenes using anisotropic 4D Gaussians that can capture complex motion dynamics and achieve real-time rendering speeds.


Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?

http://arxiv.org/abs/2402.03305v1

Compressor summary: The text explains how diffusion models learn to generate 2D Gaussian bumps by traversing three phases of latent representations and demonstrates that they cannot factorize localization in x and y positions, suggesting the need for better inductive biases.


Nevermind: Instruction Override and Moderation in Large Language Models

http://arxiv.org/abs/2402.03303v1

Compressor summary: The paper benchmarks LLMs of different sizes on instruction following in conflicting situations and finds that larger models perform better, but that stronger instruction following can conflict with safety filters or guidelines.


Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

http://arxiv.org/abs/2402.03302v1

Compressor summary: The paper introduces a new Mamba-based model, Swin-UMamba, for medical image segmentation that leverages ImageNet pretraining to achieve superior performance over existing methods.


DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

http://arxiv.org/abs/2402.03300v1

Compressor summary: DeepSeekMath 7B is a language model pre-trained with math-related web data and optimized for mathematical reasoning using GRPO, achieving high scores on MATH benchmark without external tools.


GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

http://arxiv.org/abs/2402.03299v1

Compressor summary: Key points:
- The paper proposes a system (GUARD) to generate jailbreaks for testing LLMs' safety and ethical behavior
- GUARD uses a role-playing method, a knowledge graph, and a guideline-following setting
- GUARD is tested on various LLMs across different modalities and shows effectiveness
Summary: The paper introduces GUARD, a system that generates jailbreaks for testing the safety and ethics of LLMs using roles, a knowledge graph, and guidelines. It demonstrates its performance on different LLMs and modalities.


Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks

http://arxiv.org/abs/2402.03295v1

Compressor summary: Ginger is a method to improve second-order optimization in deep learning by efficiently computing the inverse of the generalized Gauss-Newton matrix.


Flora: Low-Rank Adapters Are Secretly Gradient Compressors

http://arxiv.org/abs/2402.03293v1

Compressor summary: Flora uses random projections to enable high-rank weight updates for neural networks, reducing memory usage without compromising performance.
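To make the "random projections" idea concrete, here is a minimal sketch of compressing a gradient matrix with a seeded random projection and mapping it back. It is an illustration under stated assumptions, not Flora's actual implementation: the Gaussian projection with variance 1/r, the helper names (`projection_matrix`, `compress`, `decompress`), and the toy gradient are all hypothetical.

```python
import random

def projection_matrix(seed, r, n):
    """Regenerate an r x n Gaussian matrix from a seed (entries ~ N(0, 1/r)).

    Only the seed needs to be stored, not the matrix itself (an assumption
    about how memory could be saved, not a claim about the paper's code).
    """
    rng = random.Random(seed)
    scale = 1.0 / (r ** 0.5)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(r)]

def compress(grad, seed, r):
    """Project an m x n gradient down to m x r by computing G @ A^T."""
    n = len(grad[0])
    a = projection_matrix(seed, r, n)
    return [[sum(g_row[k] * a_row[k] for k in range(n)) for a_row in a]
            for g_row in grad]

def decompress(compressed, seed, n):
    """Map the m x r buffer back to m x n by computing (G @ A^T) @ A."""
    r = len(compressed[0])
    a = projection_matrix(seed, r, n)
    return [[sum(c_row[j] * a[j][k] for j in range(r)) for k in range(n)]
            for c_row in compressed]

# Toy 2 x 4 gradient compressed to rank-2 storage and expanded again.
grad = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]
small = compress(grad, seed=0, r=2)
approx = decompress(small, seed=0, n=4)
```

With entries of variance 1/r, the product A^T A equals the identity in expectation, so `approx` is a noisy but unbiased reconstruction of `grad`; the stored buffer has r columns instead of n.
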


Zero-shot Object-Level OOD Detection with Context-Aware Inpainting

http://arxiv.org/abs/2402.03292v1

Compressor summary: RONIN is a method for detecting out-of-distribution objects by using a diffusion model to inpaint the object with the predicted in-distribution label, making it easier to distinguish between in-distribution and out-of-distribution samples.


InstanceDiffusion: Instance-level Control for Image Generation

http://arxiv.org/abs/2402.03290v1

Compressor summary: InstanceDiffusion is a text-to-image model that allows precise control over individual objects in an image using various location methods and outperforms existing state-of-the-art models.


Make Every Move Count: LLM-based High-Quality RTL Code Generation Using MCTS

http://arxiv.org/abs/2402.03289v1

Compressor summary: The paper proposes an automated transformer decoding algorithm that uses Monte Carlo tree-search for generating RTL code with improved PPA efficiency and correctness.


A Lennard-Jones Layer for Distribution Normalization

http://arxiv.org/abs/2402.03287v1

Compressor summary: The Lennard-Jones layer (LJL) is a method to equalize the density of 2D and 3D point clouds by simulating interactions between points and adjusting their distribution without retraining neural networks.
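For readers unfamiliar with the interaction this layer is named after, the sketch below evaluates the textbook Lennard-Jones pair potential, V(r) = 4ε[(σ/r)^12 − (σ/r)^6], and the corresponding radial force. It shows only the classical formula; how the paper turns it into a layer is not reproduced here, and the function names and default parameters are illustrative assumptions.

```python
def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Classical Lennard-Jones pair potential at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dV/dr; positive values push the two points apart."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r

# Equilibrium distance for sigma = 1, where attraction and repulsion balance.
r_min = 2.0 ** (1.0 / 6.0)
```

Points closer than `r_min` repel and points farther away attract weakly, which is the mechanism that tends to even out the spacing between neighbouring points.
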


Training-Free Consistent Text-to-Image Generation

http://arxiv.org/abs/2402.03286v1

Compressor summary: ConsiStory is a training-free method for consistent subject generation in text-to-image models using shared activations, subject-driven attention, and feature injection.


Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models

http://arxiv.org/abs/2402.03284v1

Compressor summary: FortUne Dial is a task that evaluates language models' ability to represent uncertainty in conversations, using two types of uncertainty representations and fine-tuning strategies to improve calibration.


A Framework for Partially Observed Reward-States in RLHF

http://arxiv.org/abs/2402.03282v1

Compressor summary: The paper proposes a new model for reinforcement learning with human feedback (RLHF) that considers partially observed rewards and different types of feedback, and presents efficient algorithms and generalizations based on this model.


Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

http://arxiv.org/abs/2402.03271v1

Compressor summary: UoT is a new algorithm that helps language models ask better questions to seek information and solve tasks more effectively in uncertain situations.


Multiclass Classification Procedure for Detecting Attacks on MQTT-IoT Protocol

http://arxiv.org/abs/2402.03270v1

Compressor summary: The text discusses how machine learning techniques can improve intrusion detection systems in IoT networks using a dataset with MQTT protocol frames under attack.


Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation

http://arxiv.org/abs/2402.03268v1

Compressor summary: Pre-trained language models reason by aggregating indirect reasoning paths from knowledge graphs and math word problems, which can be improved by augmenting unlabeled random walk reasoning paths.


MobilityGPT: Enhanced Human Mobility Modeling with a GPT model

http://arxiv.org/abs/2402.03264v1

Compressor summary: MobilityGPT is a geospatially-aware generative model that uses GPT to create realistic human mobility trajectories with controllable generation, semantic sequence similarity, and road connectivity constraints.


Learning Best-in-Class Policies for the Predict-then-Optimize Framework

http://arxiv.org/abs/2402.03256v1

Compressor summary: PG losses are new decision-aware surrogate losses that approximate the downstream loss and perform well in misspecified settings with non-central symmetric noise.


Fair Active Ranking from Pairwise Preferences

http://arxiv.org/abs/2402.03252v1

Compressor summary: The paper proposes a fair ranking method that minimizes the error in groups of items using pairwise comparisons and an oracle, adapting to different fairness preferences by adjusting parameters.


CLIP Can Understand Depth

http://arxiv.org/abs/2402.03251v1

Compressor summary: The paper adapts CLIP for monocular depth estimation by jointly training a compact decoder and a tiny embedding matrix named mirror, improving its performance without fine-tuning and refining CLIP's prior knowledge.


SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

http://arxiv.org/abs/2402.03246v1

Compressor summary: SGS-SLAM is a novel system that combines semantic understanding with 3D Gaussian representations for accurate scene interpretation and high-quality visualizations in real time.


Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills

http://arxiv.org/abs/2402.03244v1

Compressor summary: Skill Set Optimization (SSO) is a method to improve LLM actor performance by constructing and refining sets of transferable skills using subgoals and instructions.


PINN-BO: A Black-box Optimization Algorithm using Physics-Informed Neural Networks

http://arxiv.org/abs/2402.03243v1

Compressor summary: PINN-BO uses Physics-Informed Neural Networks with Partial Differential Equations to improve black-box optimization efficiency and sample quality.


JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching

http://arxiv.org/abs/2402.03242v1

Compressor summary: JobSkape is a framework for generating synthetic job postings to improve skill-to-taxonomy matching using large language models.


FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition

http://arxiv.org/abs/2402.03241v1

Compressor summary: FROSTER is a framework that uses residual feature distillation to adapt CLIP for open-vocabulary action recognition while preserving its generalization capability.


ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection

http://arxiv.org/abs/2402.03235v1

Compressor summary: The paper proposes ActiveAnno3D, an active learning framework for 3D object detection that selects informative data samples for labeling and minimizes labeling costs.


Smart Flow Matching: On The Theory of Flow Matching Algorithms with Applications

http://arxiv.org/abs/2402.03232v1

Compressor summary: The paper derives a new loss and algorithm for training vector field models that improves over standard Conditional Flow Matching with smaller variance and better learning results.


CT-based Anatomical Segmentation for Thoracic Surgical Planning: A Benchmark Study for 3D U-shaped Deep Learning Models

http://arxiv.org/abs/2402.03230v1

Compressor summary: Key points:
- The text introduces a benchmark study for different 3D U-shaped models in medical image segmentation for thoracic surgery planning
- It compares the impact of attention mechanisms, resolution stages, and network configurations on accuracy and complexity
- STUNet performs best among the models tested
Summary: The text summarizes a benchmark study that evaluates various 3D U-shaped deep learning models for segmenting thoracic anatomy from CT scans and compares their performance and features.


IGUANe: a 3D generalizable CycleGAN for multicenter harmonization of brain MR images

http://arxiv.org/abs/2402.03227v1

Compressor summary: IGUANe is a 3D deep learning model that harmonizes brain MR images from multiple sites by integrating an arbitrary number of domains and preserving individual information related to age and Alzheimer's disease.


FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion

http://arxiv.org/abs/2402.03226v1

Compressor summary: FuseMoE is a novel framework that combines different types of data and handles missing information to improve machine learning models' performance in various tasks, especially in medical settings.


English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts

http://arxiv.org/abs/2402.03223v1

Compressor summary: The paper explores the best language for prompting emotion labels on non-English texts using multilingual large language models, and finds that English prompts are consistently better than the target language.


"Define Your Terms" : Enhancing Efficient Offensive Speech Classification with Definition

http://arxiv.org/abs/2402.03221v1

Compressor summary: The paper proposes a joint embedding model for detecting different types of offensive speech on social media using limited data and showing promising results.


BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

http://arxiv.org/abs/2402.03216v1

Compressor summary: The paper introduces M3-Embedding, a versatile model for multi-lingual, cross-lingual, and various functionalities of information retrieval, with new state-of-the-art results and novel training techniques.


Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?

http://arxiv.org/abs/2402.03214v1

Compressor summary: The paper evaluates different methods for distinguishing AI-generated images from human art across various settings and scenarios, finding that a hybrid approach combining both human and automated detectors is most effective.


Light and Optimal Schrödinger Bridge Matching

http://arxiv.org/abs/2402.03207v1

Compressor summary: The paper proposes an optimal Schrödinger bridge matching procedure that recovers the SB process with a single step and arbitrary transport plan, and relates it to energy-based modeling objectives.


Guidance with Spherical Gaussian Constraint for Conditional Diffusion

http://arxiv.org/abs/2402.03201v1

Compressor summary: The paper proposes Diffusion with Spherical Gaussian constraint (DSG), a method that improves conditional diffusion models by constraining guidance steps within the data manifold, leading to better sample quality and faster sampling processes.


Isotropy, Clusters, and Classifiers

http://arxiv.org/abs/2402.03191v1

Compressor summary: The paper argues that isotropy in embedding spaces, a recently debated property, is incompatible with clustered data and also harms linear classification tasks.


Unified Hallucination Detection for Multimodal Large Language Models

http://arxiv.org/abs/2402.03190v1

Compressor summary: The paper introduces MHaluBench, a benchmark for evaluating multimodal hallucination detection methods, and UNIHD, a framework that uses auxiliary tools to detect hallucinations in large language models.


Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms

http://arxiv.org/abs/2402.03188v1

Compressor summary: The paper proposes a new loss equation for face swapping that improves the realism of the eyes, reducing uncanny valley effects and making it harder to detect deepfakes.


How Good is a Single Basin?

http://arxiv.org/abs/2402.03187v1

Compressor summary: The study shows that connected ensembles with more interaction negatively affect performance but can be improved by re-discovering multi-basin deep ensembles through distillation.


Empowering Time Series Analysis with Large Language Models: A Survey

http://arxiv.org/abs/2402.03182v1

Compressor summary: The text reviews methods that use large language models (LLMs) for time series analysis, discussing their challenges, motivations, and applications in various domains.


C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

http://arxiv.org/abs/2402.03181v1

Compressor summary: This paper proposes C-RAG, a framework to certify and reduce generation risks in retrieval-augmented language models by grounding external knowledge and providing theoretical guarantees.


CIDAR: Culturally Relevant Instruction Dataset For Arabic

http://arxiv.org/abs/2402.03177v1

Compressor summary: The paper introduces CIDAR, an open Arabic instruction-tuning dataset that reflects the diverse cultures of the Arab region and addresses the biases in existing instruction datasets towards Western culture.


The Matrix: A Bayesian learning model for LLMs

http://arxiv.org/abs/2402.03175v1

Compressor summary: The paper proposes a Bayesian learning model to understand how Large Language Models work by approximating an ideal generative text model based on predicting the next token.


Multi: Multimodal Understanding Leaderboard with Text and Images

http://arxiv.org/abs/2402.03173v1

Compressor summary: Multi is a comprehensive benchmark for multimodal large language models that tests their understanding of complex figures, tables, and scientific questions in realistic examination styles.


Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings

http://arxiv.org/abs/2402.03172v1

Compressor summary: The paper presents a new automated method for assigning ICD codes to clinical texts using a Transformer-based encoder and label embeddings, achieving better performance than previous models.


Homograph Attacks on Maghreb Sentiment Analyzers

http://arxiv.org/abs/2402.03171v1

Compressor summary: Homograph attacks severely reduce sentiment analysis accuracy for Arabic dialects, highlighting the need for ethical and responsible machine learning.


Is Mamba Capable of In-Context Learning?

http://arxiv.org/abs/2402.03170v1

Compressor summary: Mamba is a new sequence model that performs as well as transformers at in-context learning.


RRWNet: Recursive Refinement Network for Effective Retinal Artery/Vein Segmentation and Classification

http://arxiv.org/abs/2402.03166v1

Compressor summary: RRWNet is an automated framework that uses a neural network to segment retinal blood vessels and correct errors in classification, improving the accuracy of disease biomarkers.


Decidable Reasoning About Time in Finite-Domain Situation Calculus Theories

http://arxiv.org/abs/2402.03164v1

Compressor summary: The paper proposes a new approach to represent time in cyber-physical systems using clocks as real-valued fluents, making the reachability problem decidable and enabling Golog program realization.


Linguistic features for sentence difficulty prediction in ABSA

http://arxiv.org/abs/2402.03163v1

Compressor summary: The paper explores what makes sentences difficult for aspect-based sentiment analysis by analyzing different data sets and using a combination of classifiers and linguistic features to measure difficulty.


Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

http://arxiv.org/abs/2402.03162v1

Compressor summary: The paper introduces Direct-a-Video, a system that enables independent control of object motion and camera movement in text-to-video models using cross-attention layers and self-supervised training.


Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

http://arxiv.org/abs/2402.03161v1

Compressor summary: The paper proposes an efficient method to pre-train LLMs on videos by decomposing them into keyframes and motions, enabling unified generative pre-training of video, image, and text content.


Optimal and Near-Optimal Adaptive Vector Quantization

http://arxiv.org/abs/2402.03158v1

Compressor summary: The paper presents new algorithms that improve the efficiency and scalability of adaptive vector quantization (AVQ), enabling its wider use in machine learning optimization.


A Multi-step Loss Function for Robust Learning of the Dynamics in Model-based Reinforcement Learning

http://arxiv.org/abs/2402.03146v1

Compressor summary: This paper proposes a multi-step objective for training one-step models in model-based reinforcement learning, which improves trajectory prediction and handling of noisy data.
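A multi-step objective for a one-step model can be sketched as follows: roll the model forward several steps from each recorded state and sum the weighted errors against the recorded trajectory. This is a minimal illustration with scalar states; the squared-error form, the uniform default weights, and the toy dynamics are assumptions, not the paper's exact loss.

```python
def multi_step_loss(model, states, actions, horizon, weights=None):
    """Average weighted squared error of `horizon`-step rollouts.

    `states` holds len(actions) + 1 scalar states; `model(s, a)` is a
    one-step predictor that is applied repeatedly to its own output.
    """
    if weights is None:
        weights = [1.0] * horizon  # assumed uniform weighting
    total, count = 0.0, 0
    for t in range(len(actions) - horizon + 1):
        pred = states[t]
        for h in range(horizon):
            pred = model(pred, actions[t + h])  # feed prediction back in
            total += weights[h] * (pred - states[t + h + 1]) ** 2
        count += 1
    return total / count if count else 0.0

# Toy trajectory from hypothetical linear dynamics s' = 0.9 * s + a.
def true_model(s, a):
    return 0.9 * s + a

def biased_model(s, a):
    return 0.8 * s + a

actions = [0.1, -0.2, 0.3, 0.0, 0.1]
states = [1.0]
for a in actions:
    states.append(true_model(states[-1], a))

loss_true = multi_step_loss(true_model, states, actions, horizon=3)
loss_biased = multi_step_loss(biased_model, states, actions, horizon=3)
```

Because each prediction is fed back into the model, small one-step errors compound over the horizon, so the biased model is penalized more than a purely one-step loss would penalize it.
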


Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models

http://arxiv.org/abs/2402.03142v1

Compressor summary: Key points:
- Neural network pruning is important for reducing model complexity and memory usage
- Existing pruning algorithms have limitations
- KEN is a novel, universal, and unstructured pruning algorithm based on KDE
- KEN preserves the most significant parameters and restores others to pre-training state
- KEN achieves better or equal performance with 25% parameter reduction or more
- KEN_viz is an explainable tool that visualizes the optimized model composition and subnetwork selection
Summary: KEN is a new pruning algorithm that uses KDE to selectively preserve significant parameters in transformer models, achieving better or equal performance with less memory and an explainable tool called KEN_viz.


Boosting Long-Delayed Reinforcement Learning with Auxiliary Short-Delayed Task

http://arxiv.org/abs/2402.03141v1

Compressor summary: AD-RL improves reinforcement learning in delayed scenarios by using a short-delayed auxiliary task to learn the value function faster and more efficiently for the long-delayed main task.


Enhancing Neural Subset Selection: Integrating Background Information into Set Representations

http://arxiv.org/abs/2402.03139v1

Compressor summary: This paper proposes a new method for selecting subsets from larger sets in drug discovery using neural networks that considers the superset's information, which improves performance over existing methods.


Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained Representations

http://arxiv.org/abs/2402.03138v1

Compressor summary: The paper proposes a clustering-based exploration method for 3-D environments using random or pre-trained representations, which outperforms other exploration methods.


Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification

http://arxiv.org/abs/2402.03137v1

Compressor summary: The study investigates if pre-trained language models can learn associations between language choice and emotional expression in Hinglish, finding that they do but may overgeneralize this in some cases.


Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games

http://arxiv.org/abs/2402.03136v1

Compressor summary: Albatross is a novel algorithm that uses simulated self-play and planning to learn how to interact with agents of any strength in simultaneous games, achieving better results than AlphaZero and previous methods in both competitive and cooperative scenarios.


Constrained Decoding for Cross-lingual Label Projection

http://arxiv.org/abs/2402.03131v1

Compressor summary: The authors propose a constrained decoding method for label projection in zero-shot cross-lingual transfer, improving translation quality and performance over existing methods.


How Free is Parameter-Free Stochastic Optimization?

http://arxiv.org/abs/2402.03126v1

Compressor summary: The paper explores if there are completely parameter-free methods for stochastic optimization, and shows that simple hyperparameter search can achieve this in non-convex settings, while providing a partially parameter-free method in convex settings.


Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

http://arxiv.org/abs/2402.03119v1

Compressor summary: The paper introduces e$^2$KD, a method that improves knowledge distillation by aligning teacher and student explanations, leading to better accuracy, agreement, and robustness.


Discovering interpretable models of scientific image data with deep learning

http://arxiv.org/abs/2402.03115v1

Compressor summary: The paper proposes methods to create interpretable models from complex image data using disentangled representation learning, sparse neural networks, and symbolic regression, and shows their usefulness in bioimaging for cell state classification.


Infrared Spectra Prediction for Diazo Groups Utilizing a Machine Learning Approach with Structural Attention Mechanism

http://arxiv.org/abs/2402.03112v1

Compressor summary: The text describes a machine learning model that uses Structural Attention Mechanism to improve the prediction and interpretation of infrared spectra, especially for diazo compounds, by focusing on chemical information near functional groups.


Non-Stationary Latent Auto-Regressive Bandits

http://arxiv.org/abs/2402.03110v1

Compressor summary: A new model and algorithm for non-stationary multi-armed bandits with latent auto-regressive rewards are proposed and shown to perform better than standard UCB in various settings.


Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases

http://arxiv.org/abs/2402.03099v1

Compressor summary: The authors propose a new method for automatic prompt engineering that uses calibration and synthetic data generation to improve the performance of Large Language Models on real-world tasks.


Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

http://arxiv.org/abs/2402.03094v1

Compressor summary: The paper proposes a new method, CD-ViTO, for cross-domain few-shot object detection that improves the performance of open-set detectors by adding novel components and outperforms existing methods on both out-of-domain and in-domain datasets.


AI-Enhanced Virtual Reality in Medicine: A Comprehensive Survey

http://arxiv.org/abs/2402.03093v1

Compressor summary: The paper examines how artificial intelligence and virtual reality are transforming medical care and services through three categories of applications: visualization enhancement, medical data processing, and intervention assistance.


Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing

http://arxiv.org/abs/2402.03082v1

Compressor summary: The authors provide a comprehensive analysis of recent advancements and challenges in the field of visual text processing, covering various tasks, features, learning paradigms, and datasets.


Multilingual transformer and BERTopic for short text topic modeling: The case of Serbian

http://arxiv.org/abs/2402.03067v1

Compressor summary: BERTopic performs well in topic modeling for partially preprocessed Serbian tweets, providing more informative topics than LDA and NMF.


Probabilistic Actor-Critic: Learning to Explore with PAC-Bayes Uncertainty

http://arxiv.org/abs/2402.03055v1

Compressor summary: PAC is a new reinforcement learning algorithm that combines stochastic policies and critics, using PAC-Bayes analysis to model and adapt uncertainty, leading to better exploration and control.


Multi-Lingual Malaysian Embedding: Leveraging Large Language Models for Semantic Representations

http://arxiv.org/abs/2402.03053v1

Compressor summary: The paper presents Malaysian language models, specifically Llama2 and Mistral, fine-tuned for embedding tasks, showing their effectiveness in Semantic Similarity and Retrieval-Augmented Generation.


EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models

http://arxiv.org/abs/2402.03049v1

Compressor summary: EasyInstruct is an open-source framework that simplifies instruction processing for large language models and encourages more research on instruction data.


PFDM: Parser-Free Virtual Try-on via Diffusion Model

http://arxiv.org/abs/2402.03047v1

Compressor summary: Key points:
- Virtual try-on improves garment shopping experiences but needs accurate segmentation masks
- PFDM is a parser-free virtual try-on method based on a diffusion model
- PFDM uses pseudo-images, Garment Fusion Attention, and a large-scale dataset to synthesize high-fidelity images
Summary: PFDM is a novel parser-free virtual try-on method that seamlessly fits garments onto the target person using a diffusion model, pseudo-images, and Garment Fusion Attention, achieving high-fidelity results.


Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

http://arxiv.org/abs/2402.03046v1

Compressor summary: Open RL Benchmark is a community-driven repository of fully tracked Reinforcement Learning experiments that allows easy comparison and reproducibility of RL algorithms.


SIDU-TXT: An XAI Algorithm for NLP with a Holistic Assessment Approach

http://arxiv.org/abs/2402.03043v1

Compressor summary: The paper proposes SIDU-TXT, an explainable AI method that provides word-level explanations for text classification models, and evaluates its performance on image and text datasets using a comprehensive framework.


InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

http://arxiv.org/abs/2402.03040v1

Compressor summary: InteractiveVideo is a framework that allows users to interactively generate videos using various input mechanisms and refine the result through user instructions.


Automatic Combination of Sample Selection Strategies for Few-Shot Learning

http://arxiv.org/abs/2402.03038v1

Compressor summary: This paper investigates how different sample selection methods affect few-shot learning performance and proposes a new method (ACSESS) that combines them for better results.


Data-induced multiscale losses and efficient multirate gradient descent schemes

http://arxiv.org/abs/2402.03021v1

Compressor summary: The paper explores how multiscale data affects deep learning and proposes a new gradient descent method that adapts learning rates based on data variations.


Taylor Videos for Action Recognition

http://arxiv.org/abs/2402.03019v1

Compressor summary: Taylor video is a new format that highlights dominant motions in each frame using Taylor series expansion, improving action recognition performance when combined with RGB or optical flow videos.


Toward Green and Human-Like Artificial Intelligence: A Complete Survey on Contemporary Few-Shot Learning Approaches

http://arxiv.org/abs/2402.03017v1

Compressor summary: The text provides a comprehensive overview of Few-Shot Learning, its taxonomy, applications, and future research directions.


Whom to Trust? Elective Learning for Distributed Gaussian Process Regression

http://arxiv.org/abs/2402.03014v1

Compressor summary: The paper proposes a new algorithm, Pri-GP, that improves cooperative learning in multi-agent systems by allowing agents to selectively request predictions from trustworthy neighbors and ensuring reliable predictions.


On the Impact of Output Perturbation on Fairness in Binary Linear Classification

http://arxiv.org/abs/2402.03011v1

Compressor summary: The paper investigates how output perturbation affects individual and group fairness in binary linear classification under differential privacy.
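Output perturbation means adding random noise to the trained weights before releasing the model. The sketch below applies Gaussian noise to a toy linear classifier and measures how the gap in positive-prediction rates between two groups changes. Everything here is illustrative: the weights, data, and noise scale are made up, and the calibration of the noise to a differential privacy budget is deliberately not shown.

```python
import random

def predict(weights, x):
    """Binary linear classifier: 1 if w . x >= 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0

def perturb(weights, noise_scale, seed=0):
    """Output perturbation: add independent Gaussian noise to each weight."""
    rng = random.Random(seed)
    return [w + rng.gauss(0.0, noise_scale) for w in weights]

def positive_rate(weights, group):
    """Fraction of points in a group that receive the positive label."""
    return sum(predict(weights, x) for x in group) / len(group)

def parity_gap(weights, group_a, group_b):
    """Absolute difference in positive rates (a simple group-fairness gap)."""
    return abs(positive_rate(weights, group_a) - positive_rate(weights, group_b))

# Hypothetical trained weights and two small groups of 2-D points.
w = [1.0, -0.5]
group_a = [[1.0, 0.5], [0.2, 1.0], [0.8, 0.1], [-0.3, 0.2]]
group_b = [[0.1, 0.9], [-0.5, 0.4], [0.6, 0.3], [0.05, 0.2]]

gap_before = parity_gap(w, group_a, group_b)
gap_after = parity_gap(perturb(w, noise_scale=0.5, seed=1), group_a, group_b)
```

Points that lie close to the decision boundary are the ones whose labels are most likely to flip under noise, so groups with more near-boundary points can be affected differently.
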


UniMem: Towards a Unified View of Long-Context Large Language Models

http://arxiv.org/abs/2402.03009v1

Compressor summary: UniMem is a unified framework for improving large language models' ability to process long contexts by enhancing their memory capabilities, and UniMix integrates the strengths of 16 existing methods based on this framework.


On the development of a practical Bayesian optimisation algorithm for expensive experiments and simulations with changing environmental conditions

http://arxiv.org/abs/2402.03006v1

Compressor summary: The article presents ENVBO, an algorithm that optimizes systems in changing environments with controllable and uncontrollable parameters by conditioning on measurements of the latter, demonstrating its effectiveness in a wind farm simulator example.


[Citation needed] Data usage and citation practices in medical imaging conferences

http://arxiv.org/abs/2402.03003v1

Compressor summary: The authors developed two tools to detect dataset usage in medical imaging papers and found that there is a high concentration of usage of a limited set of datasets and inconsistent citation practices.


Careful with that Scalpel: Improving Gradient Surgery with an EMA

http://arxiv.org/abs/2402.02998v1

Compressor summary: Bloop is a method for combining auxiliary objectives with training losses in deep learning models using gradient surgery and moving averages, which improves performance on NLP and vision tasks.


Text-Guided Image Clustering

http://arxiv.org/abs/2402.02996v1

Compressor summary: The paper proposes Text-Guided Image Clustering, which uses generated text from captioning and VQA models to cluster images, and introduces a novel approach to inject task or domain knowledge for clustering.


Decoding-time Realignment of Language Models

http://arxiv.org/abs/2402.02992v1

Compressor summary: DeRa is a method to improve language models by adjusting their alignment without retraining, making them more efficient and less prone to errors and biases.


A Safety-Adapted Loss for Pedestrian Detection in Automated Driving

http://arxiv.org/abs/2402.02986v1

Compressor summary: The paper proposes a new training strategy for object detectors in automated driving that considers the criticality of pedestrians to prevent dangerous misdetections.


Unsupervised semantic segmentation of high-resolution UAV imagery for road scene parsing

http://arxiv.org/abs/2402.02985v1

Compressor summary: The paper proposes an unsupervised road parsing framework that uses a vision language model and a computer vision model to process UAV images without manual annotations, achieving high accuracy and flexibility.


Evaluating Datalog Tools for Meta-reasoning over OWL 2 QL

http://arxiv.org/abs/2402.02978v1

Compressor summary: The paper compares different logic programming tools for meta-querying in ontologies under the Metamodeling Semantic Entailment Regime (MSER) using Datalog, a practical approach for sizeable ontologies.


Variational Flow Models: Flowing in Your Style

http://arxiv.org/abs/2402.02977v1

Compressor summary: The paper introduces variational inference methods for posterior flows, a class of stochastic processes, and proposes a training-free method to transform linear flows into straight constant-speed flows for faster sampling and improved accuracy.


Boosting, Voting Classifiers and Randomized Sample Compression Schemes

http://arxiv.org/abs/2402.02976v1

Compressor summary: The paper proposes a randomized boosting algorithm that improves the theoretical performance of voting classifiers by reducing logarithmic dependencies in the generalization error.


Putting Context in Context: the Impact of Discussion Structure on Text Classification

http://arxiv.org/abs/2402.02975v1

Compressor summary: The authors investigate how different types of contextual information, such as linguistic, structural, and temporal, can improve stance detection in text classification using a transformer-based model on a large dataset.


Retrieval-Augmented Score Distillation for Text-to-3D Generation

http://arxiv.org/abs/2402.02972v1

Compressor summary: RetDream improves text-to-3D generation by using a retrieval-based approach to enhance the quality and geometry of generated scenes, while adapting the diffusion model's 2D prior for view consistency.


Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

http://arxiv.org/abs/2402.02968v1

Compressor summary: Foundation models improve intelligent vehicle capabilities by processing and fusing diverse data modalities and tasks, with potential applications in various learning paradigms.


Mixed Noise and Posterior Estimation with Conditional DeepGEM

http://arxiv.org/abs/2402.02964v1

Compressor summary: The authors present a new method for estimating posterior and noise parameters in Bayesian inverse problems using an expectation maximization algorithm and a learned conditional normalizing flow.


AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image

http://arxiv.org/abs/2402.02956v1

Compressor summary: The paper proposes a framework called AdaTreeFormer that uses a shared encoder with hierarchical feature extraction and attention mechanisms to estimate tree density from aerial or satellite images in different domains.


Dynamic Byzantine-Robust Learning: Adapting to Switching Byzantine Workers

http://arxiv.org/abs/2402.02951v1

Compressor summary: DynaBRO is a fault-tolerant distributed machine learning method that can handle dynamic Byzantine behaviors without requiring knowledge of the number of malicious machines or losing convergence speed.


Kernel PCA for Out-of-Distribution Detection

http://arxiv.org/abs/2402.02949v1

Compressor summary: Key points:
- OoD detection is important for DNN reliability
- PCA fails to separate OoD and InD features in nonlinear subspace
- KPCA with non-linear kernels improves OoD-InD separability
- Reconstruction error in KPCA subspace is used for efficient detection
- Empirical results show superior efficiency and efficacy of KPCA-based detector

Summary: The authors propose a Kernel PCA (KPCA)-based method for Out-of-Distribution (OoD) detection in Deep Neural Networks, which uses non-linear kernels to enhance the separability between OoD and In-Distribution features and achieves efficient and accurate detection with low reconstruction error.


HoughToRadon Transform: New Neural Network Layer for Features Improvement in Projection Space

http://arxiv.org/abs/2402.02946v1

Compressor summary: The paper proposes a new layer for neural networks, HoughToRadon Transform, which improves speed and accuracy in semantic image segmentation tasks by modifying feature maps after Hough Transform.


Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey

http://arxiv.org/abs/2402.02941v1

Compressor summary: The text is a comprehensive review of hybrid CNN-ViT architectures in computer vision, exploring their synergies, challenges, and future directions.


InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks

http://arxiv.org/abs/2402.02933v1

Compressor summary: InterpretCC is a family of interpretable neural networks that adaptively activate features to provide trustworthy explanations, actionable interpretations, and accurate predictions for human-facing domains.


Instance Segmentation XXL-CT Challenge of a Historic Airplane

http://arxiv.org/abs/2402.02928v1

Compressor summary: The text describes a challenge to test machine learning-based image segmentation for identifying parts of a historic airplane in XXL-CT images.


Automated Cognate Detection as a Supervised Link Prediction Task with Cognate Transformer

http://arxiv.org/abs/2402.02926v1

Compressor summary: The paper proposes a transformer-based method for automated cognate detection in historical linguistics, which uses labeled information and multiple sequence alignments to improve accuracy and efficiency.


Pixel-Wise Color Constancy via Smoothness Techniques in Multi-Illuminant Scenes

http://arxiv.org/abs/2402.02922v1

Compressor summary: The text proposes a new color constancy method for images with multiple light sources, which learns pixel-wise illumination maps and preserves smoothness and natural appearance using total variation loss, bilateral filter, and label-smoothing techniques.


A Computational Model for the Assessment of Mutual Intelligibility Among Closely Related Languages

http://arxiv.org/abs/2402.02915v1

Compressor summary: The paper proposes a computer-assisted method using a linguistic model and semantic vectors to study mutual intelligibility between closely related languages, such as German, Dutch, and English.


DS-MS-TCN: Otago Exercises Recognition with a Dual-Scale Multi-Stage Temporal Convolutional Network

http://arxiv.org/abs/2402.02910v1

Compressor summary: This study develops a dual-scale multi-stage temporal convolutional network (DS-MS-TCN) that accurately recognizes four exercises in an older adult rehabilitation program from waist-mounted sensor data, improving on existing methods.


ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis

http://arxiv.org/abs/2402.02906v1

Compressor summary: ViewFusion is an end-to-end generative approach to novel view synthesis that adapts to multiple scenes and object classes, uses a variable number of views, and works well in undetermined conditions, but has limitations in inference speed and dataset size.


LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models

http://arxiv.org/abs/2402.02896v1

Compressor summary: The text explores how personality profiles affect the behaviour of large language models in naturalistic dialogues and calls for more research on crafting human-like personas for interactive AI agents.


Motion-Aware Video Frame Interpolation

http://arxiv.org/abs/2402.02892v1

Compressor summary: The paper proposes a novel video frame interpolation method that uses a hierarchical pyramid module to estimate intermediate optical flow, reducing complexity and improving accuracy.


Black-Box Approximation and Optimization with Hierarchical Tucker Decomposition

http://arxiv.org/abs/2402.02890v1

Compressor summary: HTBB is a new black-box approximation and optimization method that uses low-rank hierarchical Tucker decomposition and outperforms existing gradient-free methods and tensor train decomposition for high-dimensional problems.


Time-, Memory- and Parameter-Efficient Visual Adaptation

http://arxiv.org/abs/2402.02887v1

Compressor summary: The proposed adaptation method for foundation models does not backpropagate gradients through the backbone, which reduces training time and memory usage while achieving state-of-the-art accuracy-parameter trade-offs on the VTAB benchmark.


A Review on Building Blocks of Decentralized Artificial Intelligence

http://arxiv.org/abs/2402.02885v1

Compressor summary: Key points:
- AI is transforming our lives but has ethical issues
- Decentralized AI (DEAI) is an alternative to centralized AI (CEAI)
- The paper reviews 71 studies on DEAI and its building blocks

Summary: The paper explores decentralized AI as a way to address ethical challenges of AI, by reviewing existing work and identifying its components.


Approximate Attributions for Off-the-Shelf Siamese Transformers

http://arxiv.org/abs/2402.02883v1

Compressor summary: The authors propose a method for attributing Siamese encoders' decisions to linguistic aspects and compare exact and approximate attributions for understanding their behavior.


How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning

http://arxiv.org/abs/2402.02872v1

Compressor summary: The text explores how in-context learning works by merging demonstration features and using attention weights to transfer label information, and proposes a hypothesis with experiments on different models.


Statistics without Interpretation: A Sober Look at Explainable Machine Learning

http://arxiv.org/abs/2402.02870v1

Compressor summary: Explanation algorithms are complex, but need clear interpretations to avoid errors; papers should clarify how to use and understand them.


Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

http://arxiv.org/abs/2402.02868v1

Compressor summary: Fine-tuning RL models can cause forgetting of pre-trained capabilities, leading to poor transfer performance; however, knowledge retention techniques can mitigate this issue and improve results.


EEVEE: An Easy Annotation Tool for Natural Language Processing

http://arxiv.org/abs/2402.02864v1

Compressor summary: EEVEE is an easy-to-use web-based tool for creating NLP datasets with support for various tasks.


Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning

http://arxiv.org/abs/2402.02858v1

Compressor summary: The authors propose a single autoregressive model for model-based reinforcement learning without ensembles, achieving good performance on the D4RL benchmark while analyzing model properties.


Enhancing Compositional Generalization via Compositional Feature Alignment

http://arxiv.org/abs/2402.02851v1

Compressor summary: Compositional Feature Alignment (CFA) is a two-stage technique that improves the compositional generalization of pretrained models by learning orthogonal class and domain features and finetuning the encoder with them.


Comparing Knowledge Sources for Open-Domain Scientific Claim Verification

http://arxiv.org/abs/2402.02844v1

Compressor summary: The paper evaluates open-domain claim verification systems on biomedical and health claims using different knowledge sources and retrieval techniques.


With a Little Help from my (Linguistic) Friends: Topic Segmentation of Multi-party Casual Conversations

http://arxiv.org/abs/2402.02837v1

Compressor summary: The paper proposes a formal approach to segment dialogues into topic segments and analyse their interactions to understand topic organization in open-domain conversations.


Shortened LLaMA: A Simple Depth Pruning for Large Language Models

http://arxiv.org/abs/2402.02834v1

Compressor summary: This paper proposes a depth pruning method for large language models that improves inference speeds, especially when memory is limited, compared to width pruning methods.


PowerGraph: A power grid benchmark dataset for graph neural networks

http://arxiv.org/abs/2402.02827v1

Compressor summary: PowerGraph is a new dataset for training and explaining graph neural networks on cascading failure events in power grids, which could improve GNN applications across various disciplines.


SynthVision -- Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data

http://arxiv.org/abs/2402.02826v1

Compressor summary: Key points:
- Computer vision models can detect Human Papilloma Virus Genital warts using synthetic data generated by diffusion models
- The model achieved high accuracy, precision, recall, and F1 Score for both HPV and normal cases
- The approach is fast and innovative for urgent medical situations

Summary: The study shows how diffusion models can create realistic synthetic images for training a computer vision model to detect genital warts caused by Human Papilloma Virus with high accuracy and reliability.


Evading Data Contamination Detection for Language Models is (too) Easy

http://arxiv.org/abs/2402.02823v1

Compressor summary: The text discusses the problem of deliberate contamination of large language models' performance measurements by malicious providers and shows that such contamination can easily evade existing detection methods.


Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective

http://arxiv.org/abs/2402.02820v1

Compressor summary: Frequency-enhanced Conditional Variational Autoencoder (FCVAE) is a novel unsupervised method for detecting anomalies in time series data that integrates global and local frequency features to capture both long-periodic and short-periodic patterns, achieving better performance than existing VAE-based methods.


State estimation of urban air pollution with statistical, physical, and super-learning graph models

http://arxiv.org/abs/2402.02812v1

Compressor summary: The authors propose different reconstruction methods for real-time urban air pollution maps using city graphs and super-learning models, and test their performance in Paris.


Multi-scale fMRI time series analysis for understanding neurodegeneration in MCI

http://arxiv.org/abs/2402.02811v1

Compressor summary: The study uses a novel deep learning technique to analyze multi-scale views of resting-state fMRI volumes and classify mild cognitive impairment from healthy controls, revealing differences in brain network activity and dynamics.


Are Sounds Sound for Phylogenetic Reconstruction?

http://arxiv.org/abs/2402.02807v1

Compressor summary: The text compares the performance of using lexical cognates and sound correspondences for language family tree reconstruction and finds that cognate-based approaches are more accurate.


Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

http://arxiv.org/abs/2402.02805v1

Compressor summary: The study investigates if large language models can plan asynchronously and presents a new technique called Plan Like a Graph that improves their performance but reveals their limitations in complex tasks.


KS-Lottery: Finding Certified Lottery Tickets for Multilingual Language Models

http://arxiv.org/abs/2402.02801v1

Compressor summary: Key points:
- Paper proposes KS-Lottery, a method to find effective parameters for multilingual fine-tuning of LLMs using Kolmogorov-Smirnov Test
- Theoretically proves that KS-Lottery can find certified winning tickets in the embedding layer
- Experiments show that KS-Lottery achieves comparable performance with full fine-tuning using much fewer parameters

Summary: KS-Lottery is a new method to efficiently fine-tune LLMs for multilingual tasks by identifying a small subset of parameters that perform well, using a statistical test and a theoretical guarantee.


Extreme Two-View Geometry From Object Poses with Diffusion Models

http://arxiv.org/abs/2402.02800v1

Compressor summary: The paper presents a new method to estimate camera poses from extreme viewpoint differences by using object priors and diffusion models to synthesize novel-view images and match them.


Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects

http://arxiv.org/abs/2402.02797v1

Compressor summary: JAFFNet is a feature fusion network for saliency detection of surface defects, which adapts to different scales and backgrounds by using joint attention and dense receptive fields.


Rethinking Optimization and Architecture for Tiny Language Models

http://arxiv.org/abs/2402.02791v1

Compressor summary: Key points:
- The study explores how to optimize tiny language models (1B parameters) for mobile devices.
- It proposes several design formulas and trains PanGu-$\pi$-1B Pro and PanGu-$\pi$-1.5B Pro on multilingual corpora.
- The experiments show significant improvement over baselines and even some state-of-the-art models.

Summary: The study optimizes tiny language models for mobile devices using design formulas and achieves better performance than many larger models.


Stable and Robust Deep Learning By Hyperbolic Tangent Exponential Linear Unit (TeLU)

http://arxiv.org/abs/2402.02790v1

Compressor summary: The Hyperbolic Tangent Exponential Linear Unit (TeLU) is a novel activation function for neural networks that improves stability and robustness over existing functions like ReLU, GELU, and Mish.


From Partial to Strictly Incremental Constituent Parsing

http://arxiv.org/abs/2402.02782v1

Compressor summary: The paper explores how to build incremental parsers that output trees using prefix information, guided by left-to-right language models and tree-decoding modules, and compares them to non-incremental and partially incremental models.


Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning

http://arxiv.org/abs/2402.02772v1

Compressor summary: The paper introduces CDiffuser, a novel diffusion-based reinforcement learning method that uses return contrast to improve the base distribution and generate trajectories towards high-return states.


Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate

http://arxiv.org/abs/2402.02769v1

Compressor summary: LoT is a new technique that improves generalization in deep neural networks by training auxiliary student learners who teach the main model more abstract and generalizable correlations.


Transmission Line Detection Based on Improved Hough Transform

http://arxiv.org/abs/2402.02761v1

Compressor summary: The paper's improved stochastic Hough transform technique can accurately and quickly detect transmission lines in UAV images by using the Hessian matrix for initial processing and enhancing boundary search and pixel row segmentation.


KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

http://arxiv.org/abs/2402.02750v1

Compressor summary: The paper proposes KIVI, a 2bit KV cache quantization algorithm that reduces memory usage and enables larger batch sizes for large language models.


Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

http://arxiv.org/abs/2402.02746v1

Compressor summary: Standard Bayesian Optimization with Gaussian process regression often performs well in high-dimensional optimization problems, contrary to common belief.


Glocal Hypergradient Estimation with Koopman Operator

http://arxiv.org/abs/2402.02741v1

Compressor summary: The paper proposes a new method to optimize hyperparameters called "glocal hypergradient estimation", which combines reliability and efficiency by using Koopman operator theory to approximate global hypergradients from local ones.


Improving Robustness of LiDAR-Camera Fusion Model against Weather Corruption from Fusion Strategy Perspective

http://arxiv.org/abs/2402.02738v1

Compressor summary: The paper evaluates and proposes a method to improve the robustness of LiDAR-camera fusion models for 3D object detection in various weather conditions.


Using Motion Cues to Supervise Single-Frame Body Pose and Shape Estimation in Low Data Regimes

http://arxiv.org/abs/2402.02736v1

Compressor summary: The text describes a method to improve human body pose and shape estimation from a single camera using unannotated videos for extra supervision when there is not enough annotated data available.


InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data

http://arxiv.org/abs/2402.02734v1

Compressor summary: The paper introduces InVA, a novel method that uses variational autoencoders to efficiently borrow information from multiple images and capture complex non-linear associations for predictive inference.


ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer

http://arxiv.org/abs/2402.02733v1

Compressor summary: The paper introduces a novel method to re-age faces and transfer portrait styles in non-photorealistic images in one generative step, using existing networks trained within the same domain.


A Generative Approach to Surrogate-based Black-box Attacks

http://arxiv.org/abs/2402.02732v1

Compressor summary: The paper proposes a generative surrogate method for black-box attacks on DNNs, which learns the distribution of samples near the target's decision boundaries and crafts adversarial examples with imperceptible differences.


Innovative Cybersickness Detection: Exploring Head Movement Patterns in Virtual Reality

http://arxiv.org/abs/2402.02725v1

Compressor summary: This research shows that analyzing head movement patterns can help detect cybersickness with high accuracy using machine learning algorithms.


FDNet: Frequency Domain Denoising Network For Cell Segmentation in Astrocytes Derived From Induced Pluripotent Stem Cells

http://arxiv.org/abs/2402.02724v1

Compressor summary: The paper introduces a new dataset and method (FDNet) for segmenting astrocytes from microscopy images, which helps to study neuronal metabolism and predict differentiation progress in iPSCs.


Discounted Adaptive Online Prediction

http://arxiv.org/abs/2402.02720v1

Compressor summary: The paper proposes an adaptive online learning algorithm that can gracefully forget old information and improve over gradient descent for tasks where the future may differ from the past.


Understanding the planning of LLM agents: A survey

http://arxiv.org/abs/2402.02716v1

Compressor summary: This survey examines how large language models can enhance the planning abilities of autonomous agents, reviewing existing works and categorizing them into different directions, such as task decomposition and memory.


Position Paper: What Can Large Language Models Tell Us about Time Series Analysis

http://arxiv.org/abs/2402.02713v1

Compressor summary: The paper discusses the potential of large language models to revolutionize time series analysis and improve decision-making.


Architectural Strategies for the optimization of Physics-Informed Neural Networks

http://arxiv.org/abs/2402.02711v1

Compressor summary: Gaussian activations and preconditioned neural architectures improve the training of physics-informed neural networks for solving partial differential equations.


Representation Surgery for Multi-Task Model Merging

http://arxiv.org/abs/2402.02705v1

Compressor summary: The paper introduces "Surgery," a method to reduce representation bias in multi-task learning by adjusting the merged model's representation to match individual models.


Understanding What Affects Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence

http://arxiv.org/abs/2402.02701v1

Compressor summary: This paper analyzes the factors affecting the generalization gap in visual reinforcement learning with distractors and shows that minimizing representation distance between training and testing environments is crucial.


Sample Complexity Characterization for Linear Contextual MDPs

http://arxiv.org/abs/2402.02700v1

Compressor summary: The paper studies contextual Markov decision processes (CMDPs) with time-varying environments using two linear function approximation models and proposes novel algorithms with guaranteed suboptimality gap and polynomial sample complexity.


Beyond Expectations: Learning with Stochastic Dominance Made Practical

http://arxiv.org/abs/2402.02698v1

Compressor summary: The paper proposes a general framework for learning with stochastic dominance, which extends the concept to compare arbitrary random variables and finds optimal solutions efficiently.


Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

http://arxiv.org/abs/2402.02697v1

Compressor summary: The paper analyzes the connections between implicit deep equilibrium models and explicit neural networks using random matrix theory, showing how their spectral behavior depends on activation functions and weight variances.


Causal Feature Selection for Responsible Machine Learning

http://arxiv.org/abs/2402.02696v1

Compressor summary: Key points:
- Responsible ML addresses issues like interpretability, fairness, robustness, and domain generalization.
- Feature selection is important for responsible ML tasks.
- Causal feature selection identifies features with causal impacts on outcomes and distinguishes correlation from causation.

Summary: The survey discusses how causal feature selection can enhance responsible ML by ensuring ethical and social values, reliability, and trustworthiness in high-stakes applications.


Exploiting Class Probabilities for Black-box Sentence-level Attacks

http://arxiv.org/abs/2402.02695v1

Compressor summary: The paper proposes a new algorithm that uses class probabilities for black-box sentence-level attacks, which are adversarial sentences that fool text classifiers.


Statistical Guarantees for Link Prediction using Graph Neural Networks

http://arxiv.org/abs/2402.02692v1

Compressor summary: The paper develops a linear GNN architecture (LG-GNN) that accurately predicts edges in graphs generated by a graphon and provides theoretical guarantees for its performance.


Poisson Process for Bayesian Optimization

http://arxiv.org/abs/2402.02687v1

Compressor summary: PoPBO is a novel, efficient, and robust Bayesian Optimization method that uses a ranking-based surrogate model based on the Poisson process to handle noisy and intractable function responses in optimization problems.


Equivariant Symmetry Breaking Sets

http://arxiv.org/abs/2402.02681v1

Compressor summary: The paper proposes a novel framework for systematically breaking symmetry in equivariant neural networks using symmetry breaking sets, which are data efficient and applicable to any group.


Large Language Models are Geographically Biased

http://arxiv.org/abs/2402.02680v1

Compressor summary: The authors study geographic biases in large language models (LLMs) and show that they are correlated with socioeconomic conditions, affecting predictions on sensitive topics.


Counterfactual Explanations of Black-box Machine Learning Models using Causal Discovery with Applications to Credit Rating

http://arxiv.org/abs/2402.02678v1

Compressor summary: The paper proposes a new explainable AI (XAI) framework that uses counterfactual probabilities and prior information to explain black-box models without assuming a known causal graph.


Verifiable evaluations of machine learning models using zkSNARKs

http://arxiv.org/abs/2402.02675v1

Compressor summary: The paper introduces a method to verify model evaluations of private neural networks using zero-knowledge proofs, improving transparency and trust in commercial machine learning.


Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

http://arxiv.org/abs/2402.02665v1

Compressor summary: The paper proposes extending the utility-based paradigm from multi-objective reinforcement learning to single-objective reinforcement learning and discusses the resulting benefits and challenges.


Counterfactual Fairness Is Not Demographic Parity, and Other Observations

http://arxiv.org/abs/2402.02663v1

Compressor summary: The author challenges the claim that counterfactual fairness and demographic parity are equivalent, and clarifies common misconceptions about counterfactual fairness.


Image-Caption Encoding for Improving Zero-Shot Generalization

http://arxiv.org/abs/2402.02662v1

Compressor summary: The paper proposes ICE, a method that uses generated captions to improve the out-of-distribution (OOD) generalization of vision-language models for image classification.


Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision

http://arxiv.org/abs/2402.02658v1

Compressor summary: The paper proposes MiPS, a method to automate data curation for process supervision, which improves problem-solving performance by sampling and scoring intermediate step completions using a reasoning model.


RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews

http://arxiv.org/abs/2402.02656v1

Compressor summary: The RACER system uses a large language model to analyze semi-structured interviews in healthcare research, producing themes that match human evaluators' results with high agreement.


VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension

http://arxiv.org/abs/2402.02655v1

Compressor summary: The paper describes VlogQA, a corpus of Vietnamese spoken language for machine reading comprehension tasks, based on YouTube videos about food and travel, and reports promising results using deep learning models.


Learning with Mixture of Prototypes for Out-of-Distribution Detection

http://arxiv.org/abs/2402.02653v1

Compressor summary: PALM is a method that improves OOD detection by modeling each class with multiple prototypes and learning more faithful sample embeddings, achieving state-of-the-art performance on CIFAR-100.


Vision-Language Models Provide Promptable Representations for Reinforcement Learning

http://arxiv.org/abs/2402.02651v1

Compressor summary: The text proposes a new method for embodied reinforcement learning that uses vision-language models as promptable representations, which improves performance on complex RL tasks in Minecraft and Habitat compared to generic image embeddings or instruction-following methods.


Densely Decoded Networks with Adaptive Deep Supervision for Medical Image Segmentation

http://arxiv.org/abs/2402.02649v1

Compressor summary: The paper proposes densely decoded networks (DDN) with crutch connections for refined dense prediction in medical image segmentation, and adaptive deep supervision (ADS) for robust feature extraction using layer-wise effective receptive fields (LERF).


Chain-of-Feedback: Mitigating the Effects of Inconsistency in Responses

http://arxiv.org/abs/2402.02648v1

Compressor summary: The paper discusses the issues with large language models' responses to knowledge-intensive questions and proposes a new prompting method, Recursive Chain of Feedback (R-CoF), to improve reliability and validity.