arxiv compressed, 2024-01-31

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-31 generated by the compressor, my personal LLM-based project.


A simple, strong baseline for building damage detection on the xBD dataset

http://arxiv.org/abs/2401.17271v1

Compressor summary: The authors present a simple, easy-to-apply building damage detection method based on a winning xView2 solution, and show that both their simplified model and the original struggle to generalize to unseen locations.


YOLO-World: Real-Time Open-Vocabulary Object Detection

http://arxiv.org/abs/2401.17270v1

Compressor summary: YOLO-World augments YOLO detectors with vision-language modeling to detect objects beyond a fixed vocabulary efficiently and accurately.


Weaver: Foundation Models for Creative Writing

http://arxiv.org/abs/2401.17268v1

Compressor summary: Weaver is a family of large language models specialized in content creation that outperforms generalist LLMs on various writing tasks and supports retrieval-augmented generation and function calling.


ReacLLaMA: Merging chemical and textual information in chemical reactivity AI models

http://arxiv.org/abs/2401.17267v1

Compressor summary: The text presents two methods to improve chemical reactivity prediction by incorporating procedural text information into a Graphormer model.


You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

http://arxiv.org/abs/2401.17258v1

Compressor summary: YONOS-SR is a new image super-resolution method that uses scale distillation to train a diffusion model, achieving state-of-the-art results with just one DDIM step.


Weak-to-Strong Jailbreaking on Large Language Models

http://arxiv.org/abs/2401.17256v1

Compressor summary: The paper proposes and demonstrates a weak-to-strong jailbreaking attack on large language models, revealing a safety issue that needs to be addressed when aligning them.


LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation

http://arxiv.org/abs/2401.17244v1

Compressor summary: LLaMP is a multimodal framework that uses reasoning-and-acting agents to reduce hallucination in Large Language Models for materials informatics tasks, such as data retrieval and synthesis procedures.


ReAlnet: Achieving More Human Brain-Like Vision via Human Neural Representational Alignment

http://arxiv.org/abs/2401.17231v1

Compressor summary: ReAlnet is a new AI vision model that aligns with human brain activity using non-invasive EEG recordings, improving its performance and robustness.


Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding Space using Contrastive Learning

http://arxiv.org/abs/2401.17228v1

Compressor summary: The authors propose a pluralist moral sentence embedding space using contrastive learning, which captures the nuances of moral judgment but needs supervised learning with human labels.


MouSi: Poly-Visual-Expert Vision-Language Models

http://arxiv.org/abs/2401.17221v1

Compressor summary: The paper proposes an ensemble-of-experts technique to improve vision-language models by synergizing different visual encoders and reducing positional-encoding waste.


ContactGen: Contact-Guided Interactive 3D Human Generation for Partners

http://arxiv.org/abs/2401.17212v1

Compressor summary: The paper introduces a new method called ContactGen that can generate 3D interactive humans with different poses and contact regions based on an interaction label using a guided diffusion framework.


Self-Supervised Representation Learning for Nerve Fiber Distribution Patterns in 3D-PLI

http://arxiv.org/abs/2401.17207v1

Compressor summary: The authors propose a new data-driven method to characterize nerve fiber architecture in 3D brain images using contrastive learning, enabling downstream analysis tasks and improving understanding of the human brain organization.


Gazetteer-Enhanced Bangla Named Entity Recognition with BanglaBERT Semantic Embeddings K-Means-Infused CRF Model

http://arxiv.org/abs/2401.17206v1

Compressor summary: This paper reviews Bangla Named Entity Recognition, identifies its limitations, and proposes a Gazetteer and a new NER solution using advanced NLP tools.


CPR++: Object Localization via Single Coarse Point Supervision

http://arxiv.org/abs/2401.17203v1

Compressor summary: The paper proposes coarse point refinement (CPR) and CPR++ methods to reduce semantic variance in point-based object localization by selecting a semantic centre point and using variance regularization, improving object detection performance on four datasets.


NormEnsembleXAI: Unveiling the Strengths and Weaknesses of XAI Ensemble Techniques

http://arxiv.org/abs/2401.17200v1

Compressor summary: The paper compares XAI ensemble methods, proposes a new method called NormEnsembleXAI for improving interpretability, and provides a library to implement it.


Single Word Change is All You Need: Designing Attacks and Defenses for Text Classifiers

http://arxiv.org/abs/2401.17196v1

Compressor summary: The paper proposes a metric to measure classifier robustness against single-word changes in text classification, an efficient attack method exploiting this vulnerability, and a defense mechanism to improve robustness.
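Single-word attacks of this kind are a generic class; as an illustration only (not the paper's actual attack method), a greedy single-word substitution against a toy keyword classifier can be sketched as:

```python
def toy_classifier(text):
    # Toy sentiment rule: any negative keyword flips the label to "neg".
    negative = {"bad", "awful", "terrible"}
    return "neg" if any(w in negative for w in text.lower().split()) else "pos"

def single_word_attack(text, classifier, candidates):
    """Greedily try swapping each word for a candidate until the prediction flips."""
    original = classifier(text)
    words = text.split()
    for i in range(len(words)):
        for c in candidates:
            perturbed = " ".join(words[:i] + [c] + words[i + 1:])
            if classifier(perturbed) != original:
                return perturbed
    return None  # classifier is robust to single-word changes on this input
```

The classifier and candidate list here are hypothetical stand-ins; the paper's measured robustness metric and defense are more involved.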


Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning

http://arxiv.org/abs/2401.17186v1

Compressor summary: The paper proposes CLL-CLIP, a multilingual VL model that learns to update its language knowledge incrementally without forgetting, and evaluates it on image-text retrieval tasks across 36 languages.


Transfer Learning for Text Diffusion Models

http://arxiv.org/abs/2401.17181v1

Compressor summary: Text diffusion could potentially replace autoregressive decoding in large language models, with some tasks showing better performance and faster generation, but more research is needed.


GraphViz2Vec: A Structure-aware Feature Generation Model to Improve Classification in GNNs

http://arxiv.org/abs/2401.17178v1

Compressor summary: GraphViz2Vec creates initial embeddings for GNNs using energy diagrams from random walks to capture structural information and improve performance on node and link classification tasks.


Zero-Shot Reinforcement Learning via Function Encoders

http://arxiv.org/abs/2401.17173v1

Compressor summary: The paper introduces the function encoder, an algorithm that helps reinforcement learning agents transfer between related tasks using a coherent vector representation of the reward or transition function.


Conditional and Modal Reasoning in Large Language Models

http://arxiv.org/abs/2401.17169v1

Compressor summary: The paper investigates how well large language models can reason with conditionals and epistemic modals, which are important for human reasoning.


Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

http://arxiv.org/abs/2401.17167v1

Compressor summary: UltraTool is a novel benchmark that evaluates large language models' ability in planning, creating, and using tools in complex real-world scenarios.


Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis

http://arxiv.org/abs/2401.17159v1

Compressor summary: The paper proposes a Monte Carlo Tree Search (MCTS) based method for automatic synthesis of effective SMT strategies, which improves performance on various SMT logics compared to existing methods and solvers.


Large Language Model Evaluation via Matrix Entropy

http://arxiv.org/abs/2401.17139v1

Compressor summary: The paper presents matrix entropy, a novel metric that assesses how well LLMs compress data in both single-modal and multi-modal settings, revealing their scaling laws and alignment quality.
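The paper's exact definition may differ, but a common formulation of matrix entropy is the von Neumann entropy of the trace-normalized covariance of representations; a minimal numpy sketch under that assumption:

```python
import numpy as np

def matrix_entropy(reps):
    """Von Neumann entropy of the trace-normalized covariance of row-wise representations."""
    X = reps - reps.mean(axis=0)            # center the representations
    cov = X.T @ X / X.shape[0]              # empirical covariance matrix
    rho = cov / np.trace(cov)               # normalize to unit trace ("density matrix")
    eig = np.linalg.eigvalsh(rho)
    eig = eig[eig > 1e-12]                  # drop numerical zeros
    return float(-(eig * np.log(eig)).sum())

rng = np.random.default_rng(0)
spread = rng.normal(size=(100, 8))                               # near-isotropic representations
collapsed = rng.normal(size=(100, 1)) @ rng.normal(size=(1, 8))  # near rank-1 (collapsed)
```

Spread-out representations score high entropy (hard to compress), while collapsed, low-rank ones score near zero.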


Personalized Differential Privacy for Ridge Regression

http://arxiv.org/abs/2401.17127v1

Compressor summary: The text introduces a novel method called Personalized-DP Output Perturbation that allows training machine learning models with individual privacy levels, and provides theoretical accuracy guarantees for this approach.


Explainable data-driven modeling via mixture of experts: towards effective blending of grey and black-box models

http://arxiv.org/abs/2401.17118v1

Compressor summary: The proposed framework combines diverse local models using a "mixture of experts" rationale, enabling accurate and interpretable predictions for complex systems.


Evaluation in Neural Style Transfer: A Review

http://arxiv.org/abs/2401.17109v1

Compressor summary: The text discusses the challenges of evaluating Neural Style Transfer (NST) methods, highlighting inconsistencies and limitations, and providing recommendations for a standardized framework to compare and understand results better.


MT-Ranker: Reference-free machine translation evaluation by inter-system ranking

http://arxiv.org/abs/2401.17099v1

Compressor summary: The paper proposes a new way to evaluate machine translation quality without references by ranking translations in pairs and shows that it correlates better with human judgments and outperforms existing methods on various benchmarks.


CharNet: Generalized Approach for High-Complexity Character Classification

http://arxiv.org/abs/2401.17098v1

Compressor summary: CharNet is a simple and effective method for classifying handwritten characters with complex structures.


Traffic estimation in unobserved network locations using data-driven macroscopic models

http://arxiv.org/abs/2401.17095v1

Compressor summary: The paper proposes a model called MaTE that uses macroscopic flow theory and multi-source data to estimate traffic flow and travel time accurately, even when sensor measurements are unavailable.


StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

http://arxiv.org/abs/2401.17093v1

Compressor summary: StrokeNUWA represents vector graphics as stroke tokens, enabling LLMs to synthesize higher-quality visuals faster.


NNOSE: Nearest Neighbor Occupational Skill Extraction

http://arxiv.org/abs/2401.17092v1

Compressor summary: The paper proposes NNOSE, a method that leverages multiple occupational skill datasets by retrieving similar skills from external sources and improves skill extraction without additional fine-tuning.


Active Generation Network of Human Skeleton for Action Recognition

http://arxiv.org/abs/2401.17086v1

Compressor summary: The active generative network (AGN) can create diverse and temporally consistent actions for human action recognition with very little data by adapting motion styles and using uncertainty metrics.


SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

http://arxiv.org/abs/2401.17072v1

Compressor summary: The paper introduces SemScore, a simple but effective evaluation metric for instruction-tuned LLMs that compares generated responses to gold target responses using semantic textual similarity and shows it outperforms other metrics in correlating with human evaluation.
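SemScore itself uses a sentence-embedding model; as a dependency-free sketch of the core computation (cosine similarity between embedded responses), with a toy bag-of-words embedder standing in for the real model and illustrative function names:

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a sentence-embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def sem_score(generated, gold):
    # Semantic similarity between a model response and the gold target response.
    return cosine(embed(generated), embed(gold))
```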


Outline of an Independent Systematic Blackbox Test for ML-based Systems

http://arxiv.org/abs/2401.17062v1

Compressor summary: The article suggests a testing procedure for ML models and systems that considers their black box nature and stochastic properties, and provides test results and method extensions.


OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision

http://arxiv.org/abs/2401.17061v1

Compressor summary: The paper introduces a tool that synthesizes realistic omnidirectional images with pixel-wise semantic and depth ground truth in a virtual environment, using various projection models and lenses, enabling accurate training and testing of computer vision methods.


Atlanta Scaled layouts from non-central panoramas

http://arxiv.org/abs/2401.17058v1

Compressor summary: The paper proposes a novel method to reconstruct 3D indoor layouts from non-central panoramas using neural networks and geometry reasoning, outperforming previous methods and solving the problem in Manhattan and Atlanta environments.


BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

http://arxiv.org/abs/2401.17053v1

Compressor summary: BlockFusion is a diffusion-based model that creates 3D scenes using unit blocks and can seamlessly extend them by incorporating new blocks, achieving diverse, high-quality, and geometrically consistent results.


Making Parametric Anomaly Detection on Tabular Data Non-Parametric Again

http://arxiv.org/abs/2401.17052v1

Compressor summary: The authors propose a new method for anomaly detection on tabular data using deep learning models that leverage retrieval techniques to improve reconstruction accuracy.


ViTree: Single-path Neural Tree for Step-wise Interpretable Fine-grained Visual Categorization

http://arxiv.org/abs/2401.17050v1

Compressor summary: ViTree combines vision transformers and neural decision trees to create interpretable, fine-grained visual categorization models that outperform competitors.


Explaining Explanations in Probabilistic Logic Programming

http://arxiv.org/abs/2401.17045v1

Compressor summary: The authors propose an approach to improve the explanations generated by probabilistic logic programming systems by defining a query-driven inference mechanism that allows for causal analysis and relevance filtering.


Scalable Mechanism Design for Multi-Agent Path Finding

http://arxiv.org/abs/2401.17044v1

Compressor summary: The text discusses the challenges and solutions for scalable mechanism design in multi-agent path finding, where self-interested agents may misrepresent their goals to achieve better outcomes.


CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

http://arxiv.org/abs/2401.17043v1

Compressor summary: RAG is a technique that improves language models by using external knowledge sources, but current benchmarks are limited; this paper creates a large-scale benchmark with four CRUD application types and evaluates all RAG components in various scenarios.


Forecasting VIX using Bayesian Deep Learning

http://arxiv.org/abs/2401.17042v1

Compressor summary: The paper proposes a probabilistic deep learning model for volatility index prediction using TCN, Transformers, and uncertainty calibration methods, achieving better performance than traditional models.


Taking Action Towards Graceful Interaction: The Effects of Performing Actions on Modelling Policies for Instruction Clarification Requests

http://arxiv.org/abs/2401.17039v1

Compressor summary: This study explores how action taking as a side task affects learning to ask clarification requests in instruction-following interactions and finds that it has limited impact, but uncertainty in predictions can help.


Bayesian Optimization with Noise-Free Observations: Improved Regret Bounds via Random Exploration

http://arxiv.org/abs/2401.17037v1

Compressor summary: The paper presents new Bayesian optimization algorithms that use scattered data approximation and random exploration to improve query point distribution and regret bounds, while being easy to implement and performing better than existing methods.


Intrinsic Data Constraints and Upper Bounds in Binary Classification Performance

http://arxiv.org/abs/2401.17036v1

Compressor summary: The way a dataset is organized constrains machine learning performance: binary classification accuracy has a theoretical upper bound determined by the dataset's intrinsic characteristics and class overlap.


Robust Kernel Sparse Subspace Clustering

http://arxiv.org/abs/2401.17035v1

Compressor summary: The paper proposes a robust kernel sparse subspace clustering algorithm for data with gross sparse corruptions and shows its improved performance compared to a linear robust algorithm.


Multilayer Graph Approach to Deep Subspace Clustering

http://arxiv.org/abs/2401.17033v1

Compressor summary: The paper proposes a method to improve deep subspace clustering by using information from multiple layers of the encoder network and integrating them with a multilayer graph, leading to better performance on four datasets.


Heterogeneous treatment effect estimation with subpopulation identification for personalized medicine in opioid use disorder

http://arxiv.org/abs/2401.17027v1

Compressor summary: The text introduces SubgroupTE, a neural network framework that estimates treatment effects for diverse subgroups, improving personalized recommendations for conditions like opioid use disorder.


Static and Dynamic Synthesis of Bengali and Devanagari Signatures

http://arxiv.org/abs/2401.17026v1

Compressor summary: The text describes a method to create synthetic handwriting in Indic scripts using a motor equivalence model and evaluates its effectiveness.


MF-MOS: A Motion-Focused Model for Moving Object Segmentation

http://arxiv.org/abs/2401.17023v1

Compressor summary: The paper proposes a novel motion-focused model for LiDAR moving object segmentation that uses both range images and residual maps, achieving state-of-the-art performance on the SemanticKITTI dataset.


Evaluation of Out-of-Distribution Detection Performance on Autonomous Driving Datasets

http://arxiv.org/abs/2401.17013v1

Compressor summary: The paper proposes a method to measure the risk of Deep Neural Networks (DNNs) producing incorrect outputs, by using a Mahalanobis distance-based score that can reduce classification risk at the cost of pixel coverage.
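Mahalanobis distance to class-conditional means is a standard construction for OOD scores; a minimal sketch of that idea (illustrative only, not the paper's exact per-pixel formulation; the means below are made up):

```python
import numpy as np

def mahalanobis_ood_score(x, class_means, cov):
    """OOD score: squared Mahalanobis distance to the nearest class mean
    under a shared covariance; large values suggest out-of-distribution inputs."""
    prec = np.linalg.inv(cov)  # shared precision matrix
    return min(float((x - m) @ prec @ (x - m)) for m in class_means)

# Hypothetical 2-D feature space with two class means.
means = [np.zeros(2), np.array([5.0, 5.0])]
```

Thresholding this score trades classification risk against coverage, as the summary above describes.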


Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels

http://arxiv.org/abs/2401.16991v1

Compressor summary: The paper proposes Category-wise Fine-Tuning (CFT), a method that calibrates model predictions for partially labeled image datasets by fine-tuning logistic regressions using known labels and genetic algorithm, achieving state-of-the-art results on three benchmarks.


CORE: Towards Scalable and Efficient Causal Discovery with Reinforcement Learning

http://arxiv.org/abs/2401.16974v1

Compressor summary: CORE is a deep reinforcement learning method for discovering causal structures and performing interventions on graphs with up to 10 variables, which outperforms existing approaches in accuracy and efficiency.


Deep 3D World Models for Multi-Image Super-Resolution Beyond Optical Flow

http://arxiv.org/abs/2401.16972v1

Compressor summary: EpiMISR is a multi-image super-resolution method that uses epipolar geometry and transformer networks to increase resolution from images with arbitrary camera positions and orientations, outperforming existing methods.


Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution

http://arxiv.org/abs/2401.16968v1

Compressor summary: The paper investigates using pretrained Authorship Verification models to identify speakers in English novels based on stylistic and topical information, but finds that it is not always more accurate than semantic-only models.


Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity Alignment

http://arxiv.org/abs/2401.16960v1

Compressor summary: The study proposes a new entity alignment method that combines knowledge graph embeddings with large language model inference to find equivalent entities across different knowledge graphs.


Online Resource Allocation with Non-Stationary Customers

http://arxiv.org/abs/2401.16945v1

Compressor summary: The algorithm combines stochastic contextual bandits with knapsacks and online matching with adversarial arrivals to allocate resources online under non-stationary customer arrivals and unknown click-through rates, achieving sublinear regret under near-stationary arrivals and an optimal competitive ratio in general, with near-optimal revenues in numerical experiments.


Segmentation and Characterization of Macerated Fibers and Vessels Using Deep Learning

http://arxiv.org/abs/2401.16937v1

Compressor summary: The paper presents a fast and accurate deep learning method for segmenting and characterizing wood cell types in microscopy images, using the YOLOv8 model and providing a user-friendly web application.


Multi-modal Representation Learning for Cross-modal Prediction of Continuous Weather Patterns from Discrete Low-Dimensional Data

http://arxiv.org/abs/2401.16936v1

Compressor summary: The text discusses the importance of wind energy for reducing greenhouse gas emissions and proposes a deep learning method to improve wind data analysis by addressing three challenges: data resolution, dimensionality reduction, and extrapolation.


Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation

http://arxiv.org/abs/2401.16923v1

Compressor summary: The paper introduces MISS, a task for studying modality incompleteness in multi-modal segmentation, and proposes MMS and FPT methods to improve robustness against missing modalities.


Energy-conserving equivariant GNN for elasticity of lattice architected metamaterials

http://arxiv.org/abs/2401.16914v1

Compressor summary: The paper proposes a higher-order graph neural network model that predicts the stiffness of periodic lattices, with improved accuracy and efficiency compared to traditional methods.


Cross-Lingual Transfer from Related Languages: Treating Low-Resource Maltese as Multilingual Code-Switching

http://arxiv.org/abs/2401.16895v1

Compressor summary: The authors propose a method to improve cross-lingual transfer for Maltese by selectively transliterating words based on their etymology and present results on four downstream tasks.


CAFCT: Contextual and Attentional Feature Fusions of Convolutional Neural Networks and Transformer for Liver Tumor Segmentation

http://arxiv.org/abs/2401.16886v1

Compressor summary: The authors present a hybrid CNN and Transformer model with three context-enhancing modules (AFF, ASPP, and attention gates) for liver tumor segmentation, achieving high IoU and Dice scores on the LiTS dataset.


Zero-shot Classification using Hyperdimensional Computing

http://arxiv.org/abs/2401.16876v1

Compressor summary: The HDC-ZSC model uses symbol-like representations and attribute encoders to perform zero-shot learning and classification tasks with high accuracy and fewer parameters.


A Tournament of Transformation Models: B-Spline-based vs. Mesh-based Multi-Objective Deformable Image Registration

http://arxiv.org/abs/2401.16867v1

Compressor summary: The paper compares B-spline and mesh transformation models for deformable image registration using a multi-objective optimization method and shows their impact on registration outcomes in cervical cancer patients.


State Value Generation with Prompt Learning and Self-Training for Low-Resource Dialogue State Tracking

http://arxiv.org/abs/2401.16862v1

Compressor summary: SVAG is a novel framework for low-resource dialogue state tracking that generates state values and domain slot types using self-training and an estimator to improve performance and generalization.


Repositioning the Subject within Image

http://arxiv.org/abs/2401.16861v1

Compressor summary: The paper introduces SEELE, a framework for dynamic image manipulation using diffusion generative models and task inversion techniques, applied to the novel task of subject repositioning.


Checkmating One, by Using Many: Combining Mixture of Experts with MCTS to Improve in Chess

http://arxiv.org/abs/2401.16852v1

Compressor summary: The paper proposes a new deep learning chess method that uses a combination of specialized models, MoE, and MCTS to improve playing strength and align with strategic phases of chess.


Evaluating ML-Based Anomaly Detection Across Datasets of Varied Integrity: A Case Study

http://arxiv.org/abs/2401.16843v1

Compressor summary: This paper introduces refined datasets for network traffic anomaly detection and evaluates the performance of a Random Forest algorithm across different datasets, finding it robust against data integrity issues.


Coseparable Nonnegative Tensor Factorization With T-CUR Decomposition

http://arxiv.org/abs/2401.16836v1

Compressor summary: Coseparable Nonnegative Tensor Factorization (NTF) extends coseparable NMF to tensors, preserving multi-dimensional correlations in high-dimensional data and offering a more efficient core representation using alternating index selection methods.


EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

http://arxiv.org/abs/2401.16822v1

Compressor summary: EarthGPT is a new large language model that can interpret various remote sensing (RS) images and perform well in different RS tasks using a large dataset called MMRS.


H2O-Danube-1.8B Technical Report

http://arxiv.org/abs/2401.16818v1

Compressor summary: H2O-Danube-1.8B is a large language model with strong performance on various benchmarks, released under an open license.


Reviving Undersampling for Long-Tailed Learning

http://arxiv.org/abs/2401.16811v1

Compressor summary: The paper proposes using balanced undersampling and model ensembling to improve the performance of long-tailed recognition, focusing on harmonic and geometric mean accuracy rather than average accuracy.
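The balanced-undersampling step is generic; as a sketch of that step alone (not the authors' full ensembling pipeline):

```python
import random
from collections import defaultdict

def balanced_undersample(samples, labels, seed=0):
    """Undersample every class down to the size of the rarest (tail) class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):  # random subset of each class
            balanced.append((x, y))
    return balanced
```

An ensemble then trains several models on different balanced subsets, which is what shifts the focus from average accuracy toward harmonic/geometric mean accuracy.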


An Embeddable Implicit IUVD Representation for Part-based 3D Human Surface Reconstruction

http://arxiv.org/abs/2401.16810v1

Compressor summary: The paper proposes a new 3D human reconstruction method that uses a combination of parametric body models and neural implicit functions, improving accuracy, speed, and robustness.


Encoding Temporal Statistical-space Priors via Augmented Representation

http://arxiv.org/abs/2401.16808v1

Compressor summary: The paper introduces SSAR, a simple but effective technique that improves time series forecasting by augmenting the representation with a statistical prior; SSAR outperforms five baselines on two datasets with different algorithms, and is modular and easy to apply.


Online Algorithm for Node Feature Forecasting in Temporal Graphs

http://arxiv.org/abs/2401.16800v1

Compressor summary: The paper introduces an online algorithm called "mspace" that accurately predicts node features in temporal graphs, outperforming existing methods especially when training data is limited.


Learnable Prompt as Pseudo-Imputation: Reassessing the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction

http://arxiv.org/abs/2401.16796v1

Compressor summary: The paper proposes PAI, a new training protocol that uses learnable prompts to model missing values in EHR without injecting imputed data, improving performance and robustness of EHR analysis models.


Performance Insights-based AI-driven Football Transfer Fee Prediction

http://arxiv.org/abs/2401.16795v1

Compressor summary: The text describes an AI-based model that predicts the transfer fees of football players, helping clubs to optimize their player acquisition and retention strategies.


Accelerated Cloud for Artificial Intelligence (ACAI)

http://arxiv.org/abs/2401.16791v1

Compressor summary: ACAI is a cloud-based platform that automates ML workflows, improves productivity, and reduces costs and experiment time by providing data storage, resource provisioning, job scheduling, and experiment tracking.


Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate

http://arxiv.org/abs/2401.16788v1

Compressor summary: ScaleEval is a framework that uses multiple LLM agents to help human annotators evaluate other LLMs effectively and efficiently in various tasks and scenarios.


Enhancing Efficiency and Robustness in Support Vector Regression with HawkEye Loss

http://arxiv.org/abs/2401.16785v1

Compressor summary: The paper introduces a new loss function for support vector regression that is bounded, smooth, and insensitive, improving its robustness and performance on various datasets.


Graph Fairness Learning under Distribution Shifts

http://arxiv.org/abs/2401.16784v1

Compressor summary: The paper proposes FatraGNN, a framework that keeps GNNs fair under distribution shifts by using a graph generator to create diverse, biased graphs for training and minimizing representation distances between groups, improving both accuracy and fairness in experiments.


Addressing Distribution Shift in Time Series Forecasting with Instance Normalization Flows

http://arxiv.org/abs/2401.16777v1

Compressor summary: The paper presents a new method for time series forecasting that handles distribution shifts, works with various models, and uses a novel invertible network for transformation.


Activity Detection for Massive Connectivity in Cell-free Networks with Unknown Large-scale Fading, Channel Statistics, Noise Variance, and Activity Probability: A Bayesian Approach

http://arxiv.org/abs/2401.16775v1

Compressor summary: The paper proposes Bayesian methods for activity detection in cell-free networks without requiring precise network information, outperforming existing approaches.


Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator

http://arxiv.org/abs/2401.16772v1

Compressor summary: Discriminator Soft Q Imitation Learning (DSQIL) is a method that combines soft Q-learning with adversarial inverse reinforcement learning to improve imitation learning efficiency and robustness in unknown states.


MolPLA: A Molecular Pretraining Framework for Learning Cores, R-Groups and their Linker Joints

http://arxiv.org/abs/2401.16771v1

Compressor summary: MolPLA is a new framework that uses graph contrastive learning to understand molecular structures and help chemists find better R-groups for drug development.


Detection and Recovery Against Deep Neural Network Fault Injection Attacks Based on Contrastive Learning

http://arxiv.org/abs/2401.16766v1

Compressor summary: The paper proposes a self-supervised learning approach to make DNNs more resilient to fault injection attacks during inference.


BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

http://arxiv.org/abs/2401.16764v1

Compressor summary: BoostDream is a fast and effective method for improving the quality of 3D models generated by text-to-3D approaches, combining 3D model distillation, multi-view SDS loss, and prompt-based guidance.


Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization

http://arxiv.org/abs/2401.16762v1

Compressor summary: Pick-and-Draw is a training-free method to improve text-to-image personalization by using appearance and layout guidance from reference images, enhancing identity consistency and diversity.


One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training

http://arxiv.org/abs/2401.16760v1

Compressor summary: The paper proposes a loss-aware quantization method that uses a one-step forward and backtrack approach to find more accurate and stable gradient directions for faster model convergence on edge devices.


SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget

http://arxiv.org/abs/2401.16757v1

Compressor summary: SwapNet is a middleware that efficiently swaps DNN blocks to run large models on edge AI devices with limited memory, reducing latency and maintaining accuracy.


Diffusion model for relational inference

http://arxiv.org/abs/2401.16755v1

Compressor summary: The Diffusion model for Relational Inference (DiffRI) is a new method that can learn and discover hidden interactions between components of complex systems using observable dynamics, without any supervision.


AI Oversight and Human Mistakes: Evidence from Centre Court

http://arxiv.org/abs/2401.16754v1

Compressor summary: The introduction of Hawk-Eye review in tennis increased the mistake rate of umpires due to psychological costs of being overruled by AI.


MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images

http://arxiv.org/abs/2401.16753v1

Compressor summary: The paper proposes a novel zero-shot anomaly classification and segmentation method for industrial vision, using mutual scoring of unlabeled test images to exploit cues for anomaly determination without training or prompts.


Detecting Racist Text in Bengali: An Ensemble Deep Learning Framework

http://arxiv.org/abs/2401.16748v1

Compressor summary: The paper presents a Bengali racist comment detection system built with NLP and deep learning techniques (RNN, LSTM, BERT embeddings, and MCNN-LSTM), achieving 87.94% accuracy with a novel annotated and validated dataset and an ensemble approach.


MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models

http://arxiv.org/abs/2401.16745v1

Compressor summary: MT-Eval is a benchmark that evaluates large language models' ability to handle complex multi-turn conversations by categorizing interaction patterns and analyzing their strengths and weaknesses.


ShaRP: Explaining Rankings with Shapley Values

http://arxiv.org/abs/2401.16744v1

Compressor summary: ShaRP is a framework that explains how features contribute to different aspects of ranked outcomes using Shapley values, and can be applied to both score-based and learned ranking models.
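The Shapley values the summary refers to have a standard exact definition. As a hedged illustration of the underlying idea (this is a generic exact computation over a toy additive scoring function, not ShaRP's actual API):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values for a set-valued scoring function.

    features: list of feature names
    value_fn: maps a frozenset of features to a score
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weight of this coalition in the Shapley average.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

# Hypothetical ranking score: a weighted sum of feature indicators.
weights = {"gpa": 0.5, "test": 0.3, "essay": 0.2}
score = lambda s: sum(weights[f] for f in s)
print(shapley_values(list(weights), score))
# For an additive score, each feature's Shapley value equals its weight.
```

Exact computation is exponential in the number of features; frameworks like ShaRP rely on approximations for realistic feature counts.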


MESA: Matching Everything by Segmenting Anything

http://arxiv.org/abs/2401.16741v1

Compressor summary: MESA is a new method to reduce matching redundancy in computer vision by using SAM's image segmentation and a multi-relational graph to find precise area matches.


Engineering A Large Language Model From Scratch

http://arxiv.org/abs/2401.16736v1

Compressor summary: Atinuke is a Transformer-based neural network that uses attention mechanisms and advanced matrix operations to perform well on various natural language tasks while maintaining interpretability and robustness.


Towards Generating Informative Textual Description for Neurons in Language Models

http://arxiv.org/abs/2401.16731v1

Compressor summary: The paper proposes a novel framework to generate human-interpretable textual descriptions for neurons in transformer-based language models using generative language models and an unsupervised approach.


Widely Linear Matched Filter: A Lynchpin towards the Interpretability of Complex-valued CNNs

http://arxiv.org/abs/2401.16729v1

Compressor summary: The study introduces a new paradigm for interpreting complex-valued CNNs using matched filtering, showing improved performance and physical meaning compared to standard methods.


Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models

http://arxiv.org/abs/2401.16727v1

Compressor summary: The text surveys recent advances in hate speech moderation using large language models and multimodal models, while highlighting gaps and challenges in underrepresented languages and contexts.


Optimal-Landmark-Guided Image Blending for Face Morphing Attacks

http://arxiv.org/abs/2401.16722v1

Compressor summary: The paper introduces a new face morphing attack using optimal-landmark-guided image blending and Graph Convolutional Networks to create realistic and effective morphed images that evade face recognition systems.


SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing

http://arxiv.org/abs/2401.16720v1

Compressor summary: SmartFRZ is a generic and efficient training framework for AI models that uses attention-guided layer freezing to reduce computation and achieve training acceleration without compromising accuracy.


LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras

http://arxiv.org/abs/2401.16712v1

Compressor summary: The paper proposes a new method called LF Tracy that uses light field cameras and a single pipeline to improve salient object detection, outperforming existing methods with fewer parameters.


Multivariate Beta Mixture Model: Probabilistic Clustering With Flexible Cluster Shapes

http://arxiv.org/abs/2401.16708v1

Compressor summary: The paper presents a new model for soft clustering called multivariate beta mixture model, which can adapt to different cluster shapes using a flexible probability density function and shows its effectiveness on synthetic and real datasets.
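As a rough sketch of what a beta mixture density looks like (a toy version with independent Beta marginals per component; the paper's exact multivariate beta parameterization may differ):

```python
from math import gamma

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1 - x)**(b - 1)

def mixture_density(x, weights, params):
    """Mixture of product-of-Beta components on the unit hypercube.

    x: point in (0, 1)^d
    weights: mixing proportions (sum to 1)
    params: per component, a list of (a, b) pairs, one per dimension
    """
    total = 0.0
    for w, comp in zip(weights, params):
        p = w
        for xi, (a, b) in zip(x, comp):
            p *= beta_pdf(xi, a, b)
        total += p
    return total

# Two hypothetical 2D clusters: one skewed toward the origin, one toward (1, 1).
weights = [0.6, 0.4]
params = [[(2, 5), (2, 5)], [(5, 2), (5, 2)]]
print(mixture_density((0.2, 0.3), weights, params))
```

Varying the (a, b) shape parameters per dimension is what lets each component take skewed, uniform, or bell-like shapes, which is the flexibility the summary alludes to.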


Multi-granularity Correspondence Learning from Long-term Noisy Videos

http://arxiv.org/abs/2401.16702v1

Compressor summary: The paper proposes Norton, a method that uses optimal transport to address misalignment between video clips and captions, improving temporal learning and video understanding.


Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

http://arxiv.org/abs/2401.16700v1

Compressor summary: The paper proposes a multi-stage framework for 3D human pose estimation using transformers and self-attention to capture spatial-temporal correlations from multi-view video data, achieving state-of-the-art results on the Human3.6M dataset.


EdgeOL: Efficient in-situ Online Learning on Edge Devices

http://arxiv.org/abs/2401.16694v1

Compressor summary: EdgeOL is an edge learning framework that optimizes inference accuracy, fine-tuning speed, and energy efficiency for DNNs in applications like robot-assisted eldercare and object recognition.


Calibration-then-Calculation: A Variance Reduced Metric Framework in Deep Click-Through Rate Prediction Models

http://arxiv.org/abs/2401.16692v1

Compressor summary: The paper introduces Calibrated Loss Metric, a new framework that reduces variance in neural network evaluation metrics and improves accuracy in detecting effective modeling improvement.


Characterization of Magnetic Labyrinthine Structures through Junctions and Terminals Detection using Template Matching and CNN

http://arxiv.org/abs/2401.16688v1

Compressor summary: The study introduces TM-CNN, a novel technique that combines template matching and convolutional neural networks to accurately detect defects in magnetic labyrinthine patterns, overcoming the limitations of previous methods.


The Detection and Understanding of Fictional Discourse

http://arxiv.org/abs/2401.16678v1

Compressor summary: The paper explores different datasets for detecting fiction and introduces new features to generalize semantics, aiming to enhance cultural archive knowledge and understand fictional storytelling.


Is Artificial Intelligence Providing the Second Revolution for Weather Forecasting?

http://arxiv.org/abs/2401.16669v1

Compressor summary: This study proposes "Three Large Rules" for developing large artificial intelligence weather forecast models, which can revolutionize numerical weather prediction by integrating with traditional models.


Fast Dual-Regularized Autoencoder for Sparse Biological Data

http://arxiv.org/abs/2401.16664v1

Compressor summary: The paper proposes a fast and accurate shallow autoencoder for predicting drug-target interactions and drug-disease associations using sparse matrix completion.


Generalization of LiNGAM that allows confounding

http://arxiv.org/abs/2401.16661v1

Compressor summary: LiNGAM-MMI is a method that improves LiNGAM by quantifying and minimizing the effect of confounding on variable order determination using KL divergence and the shortest path problem.


Recovering Mental Representations from Large Language Models with Markov Chain Monte Carlo

http://arxiv.org/abs/2401.16657v1

Compressor summary: The paper proposes using Large Language Models as elements of a sampling algorithm to study their mental representations, finding increased efficiency and performance compared to direct prompting.


OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

http://arxiv.org/abs/2401.16658v1

Compressor summary: The authors improved the performance and efficiency of an open speech model by using a new architecture and released their work for public use.


Gradient-Based Language Model Red Teaming

http://arxiv.org/abs/2401.16656v1

Compressor summary: GBRT is a method for automatically generating diverse, coherent prompts that can find unsafe responses in generative language models, outperforming reinforcement learning-based red teaming.


Augmenting Replay in World Models for Continual Reinforcement Learning

http://arxiv.org/abs/2401.16650v1

Compressor summary: Our method improves continual RL by using less memory, but may sometimes struggle with learning new tasks.


Using Motion Forecasting for Behavior-Based Virtual Reality (VR) Authentication

http://arxiv.org/abs/2401.16649v1

Compressor summary: The authors propose a new approach that uses Transformer-based forecasting to predict future user behavior in VR environments, improving task-based behavioral biometric authentication.


Incoherent Probability Judgments in Large Language Models

http://arxiv.org/abs/2401.16646v1

Compressor summary: The text discusses how large language models can make incoherent probability judgments, similar to humans, due to their connection to implicit Bayesian inference.


Speeding up and reducing memory usage for scientific machine learning via mixed precision

http://arxiv.org/abs/2401.16645v1

Compressor summary: Mixed precision training improves the efficiency of physics-informed neural networks without sacrificing accuracy in solving complex problems.


TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese

http://arxiv.org/abs/2401.16640v1

Compressor summary: The study introduces TeenyTinyLlama, two small language models for Brazilian Portuguese text generation, to address limitations of large language models in low-resource settings.


Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs

http://arxiv.org/abs/2401.16638v1

Compressor summary: The paper proposes a method to enhance NLP classification tasks by using task-specific context attribution, which improves generalizability and performance on three datasets.


Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble

http://arxiv.org/abs/2401.16635v1

Compressor summary: RLHF may produce outputs misaligned with human values due to inaccurate reward model predictions, but a reward ensemble method can improve its alignment performance.


The Why, When, and How to Use Active Learning in Large-Data-Driven 3D Object Detection for Safe Autonomous Driving: An Empirical Exploration

http://arxiv.org/abs/2401.16634v1

Compressor summary: Entropy querying helps select informative 3D object detection samples, reducing annotation costs and improving model performance in autonomous driving datasets.
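Entropy querying itself is a standard active-learning acquisition rule: rank unlabeled samples by the Shannon entropy of the model's predictive distribution and annotate the most uncertain ones. A minimal sketch with hypothetical sample IDs and class probabilities:

```python
from math import log

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * log(p) for p in probs if p > 0)

def select_most_uncertain(predictions, k):
    """Pick the k samples whose predictive distribution has highest entropy."""
    ranked = sorted(predictions, key=lambda item: entropy(item[1]), reverse=True)
    return [sample_id for sample_id, _ in ranked[:k]]

# Hypothetical per-sample class probabilities from a 3D object detector.
predictions = [
    ("frame_001", [0.98, 0.01, 0.01]),  # confident -> low entropy
    ("frame_002", [0.34, 0.33, 0.33]),  # uncertain -> high entropy
    ("frame_003", [0.70, 0.20, 0.10]),
]
print(select_most_uncertain(predictions, 2))  # ['frame_002', 'frame_003']
```

Only the selected frames are sent for (expensive) 3D annotation, which is how the querying reduces labeling cost.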