arxiv compressed, 2024-02-08

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-08, generated by the compressor, my personal LLM-based project.


Edu-ConvoKit: An Open-Source Library for Education Conversation Data

http://arxiv.org/abs/2402.05111v1

Compressor summary: Edu-ConvoKit is an open-source library for analyzing education conversation data with pre-processing, annotation, and analysis features.


Opening the AI black box: program synthesis via mechanistic interpretability

http://arxiv.org/abs/2402.05110v1

Compressor summary: MIPS is a new method that synthesizes Python code from neural networks trained on algorithmic tasks, making their behavior more interpretable and trustworthy.


Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

http://arxiv.org/abs/2402.05109v1

Compressor summary: The paper introduces Hydra heads, a sequentially dependent replacement for standard draft heads in speculative decoding, which improves accuracy and throughput compared to existing methods.
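
To make the contrast concrete, here is a minimal PyTorch sketch (the layer shapes and single-linear mixing are illustrative assumptions, not the paper's exact architecture): a Medusa-style head predicts a future token from the base model's last hidden state alone, while a Hydra-style head also conditions on an embedding of the tokens drafted so far.

    import torch
    import torch.nn as nn

    class MedusaHead(nn.Module):
        """Independent draft head: predicts the token k steps ahead
        from the base model's last hidden state alone."""
        def __init__(self, d_model, vocab_size):
            super().__init__()
            self.proj = nn.Linear(d_model, vocab_size)

        def forward(self, h_last):               # h_last: (batch, d_model)
            return self.proj(h_last)             # logits: (batch, vocab_size)

    class HydraHead(nn.Module):
        """Sequentially dependent draft head: also sees an embedding
        of the tokens already drafted this round."""
        def __init__(self, d_model, vocab_size):
            super().__init__()
            self.mix = nn.Linear(2 * d_model, d_model)
            self.proj = nn.Linear(d_model, vocab_size)

        def forward(self, h_last, drafted_emb):  # both: (batch, d_model)
            z = torch.relu(self.mix(torch.cat([h_last, drafted_emb], dim=-1)))
            return self.proj(z)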


Image captioning for Brazilian Portuguese using GRIT model

http://arxiv.org/abs/2402.05106v1

Compressor summary: The authors developed a Brazilian Portuguese version of the GRIT model, which generates better image captions by combining two types of visual features (grid and region features).


Hydragen: High-Throughput LLM Inference with Shared Prefixes

http://arxiv.org/abs/2402.05099v1

Compressor summary: Hydragen is a hardware-aware attention implementation that significantly improves efficiency for large language models with shared prefixes.
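
The enabling observation is a standard softmax decomposition: attention over the shared prefix keys and over each sequence's suffix keys can be computed separately and recombined exactly using each part's log-sum-exp denominator, so the prefix part only needs to be computed once for the whole batch. A hedged sketch of the recombination step (tensor names and shapes are assumptions):

    import torch

    def combine_attention(o_prefix, lse_prefix, o_suffix, lse_suffix):
        # o_*: attention outputs over each key block, (batch, heads, q_len, d)
        # lse_*: log-sum-exp of each block's softmax denominator,
        #        (batch, heads, q_len, 1)
        # Exact recombination: weight each block by its share of softmax mass.
        w = torch.sigmoid(lse_prefix - lse_suffix)
        return w * o_prefix + (1.0 - w) * o_suffix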


On diffusion models for amortized inference: Benchmarking and improving stochastic control and sampling

http://arxiv.org/abs/2402.05098v1

Compressor summary: The paper compares different diffusion models, proposes a new exploration strategy for off-policy methods, and provides open-source code for future research.


NITO: Neural Implicit Fields for Resolution-free Topology Optimization

http://arxiv.org/abs/2402.05073v1

Compressor summary: NITO is a novel deep learning approach for topology optimization that offers faster, more efficient, and domain-agnostic solutions compared to existing methods.


A Roadmap to Pluralistic Alignment

http://arxiv.org/abs/2402.05070v1

Compressor summary: The text discusses the challenge of creating AI systems that serve diverse human values and proposes a roadmap with different types of pluralistic models and benchmarks to address this issue.


LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

http://arxiv.org/abs/2402.05054v1

Compressor summary: The paper introduces LGM, a novel framework that generates high-resolution 3D models from text or single-view images using multi-view Gaussian features and an asymmetric U-Net backbone.


Causal Representation Learning from Multiple Distributions: A General Setting

http://arxiv.org/abs/2402.05052v1

Compressor summary: The paper proposes a general nonparametric method for learning causal representations from multiple distributions and shows that under certain conditions, it can recover the underlying causal graph and latent variables.


How VADER is your AI? Towards a definition of artificial intelligence systems appropriate for regulation

http://arxiv.org/abs/2402.05048v1

Compressor summary: The paper proposes a framework (VADER) to assess how well definitions of AI are suited for regulation, highlighting cases where current AI regulation proposals would also affect works that do not involve AI.


Efficient Multi-Resolution Fusion for Remote Sensing Data with Label Uncertainty

http://arxiv.org/abs/2402.05045v1

Compressor summary: The paper introduces a new method for fusing multi-modal and multi-resolution remote sensing data without pixel-level labels, improving efficiency by using binary fuzzy measures in place of the more general fuzzy measures used previously.


SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

http://arxiv.org/abs/2402.05044v1

Compressor summary: SALAD-Bench is a safety benchmark for evaluating the robustness of large language models against various attacks and defense methods.


PAC Learnability under Explanation-Preserving Graph Perturbations

http://arxiv.org/abs/2402.05039v1

Compressor summary: This paper studies how to use graph explanations to improve GNNs' performance, sample complexity, and robustness.


A Survey on Domain Generalization for Medical Image Analysis

http://arxiv.org/abs/2402.05035v1

Compressor summary: Key points:
- The paper reviews developments in Domain Generalization (DG) for Medical Image Analysis (MedIA), a tool for computer-aided diagnosis systems using deep learning.
- It defines domain shift and DG, discusses settings, summarizes methods from three viewpoints, introduces datasets, and suggests future research topics.
- It also provides a GitHub project with supporting resources.

Summary: The paper gives an overview of Domain Generalization for Medical Image Analysis, which deals with the performance drop of deep learning models across different medical data distributions. It covers definitions, settings, methods, datasets, and future directions, and provides a GitHub project as a resource.


How BERT Speaks Shakespearean English? Evaluating Historical Bias in Contextual Language Models

http://arxiv.org/abs/2402.05034v1

Compressor summary: The paper examines how well BERT and other models capture historical changes in English by testing them on fill-in-the-blank questions with sentences from different time periods.


Simulated Overparameterization

http://arxiv.org/abs/2402.05033v1

Compressor summary: Key points:
- Simulated Overparametrization (SOP): trains larger model with fewer parameters for inference
- Majority kernels: novel algorithm that integrates with different architectures and boosts performance
- Low overhead and strong results on various datasets and models

Summary: The paper proposes SOP, a method to train overparameterized models with fewer parameters for inference, and majority kernels, an algorithm that improves performance across architectures with minimal cost.


Strong convexity-guided hyper-parameter optimization for flatter losses

http://arxiv.org/abs/2402.05025v1

Compressor summary: The proposed white-box method optimizes hyper-parameters for neural networks by minimizing the strong convexity of the loss to improve flatness and generalization.


A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?

http://arxiv.org/abs/2402.05015v1

Compressor summary: The text discusses using large language models to improve Bayesian optimization in molecular space discovery, but only if they are pretrained or finetuned with relevant data.


Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth

http://arxiv.org/abs/2402.05013v1

Compressor summary: Shallow autoencoders lose the sparse structure of input data during gradient descent, but adding denoising and multi-layer decoding improves compression for sparse data.


Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching

http://arxiv.org/abs/2402.05011v1

Compressor summary: The paper proposes a new lossless graph condensation method that uses curriculum learning and expanding window matching to better transfer knowledge from the original graph to the condensed one, reducing computational cost for training GNNs.


EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

http://arxiv.org/abs/2402.05008v1

Compressor summary: EfficientViT-SAM is a faster yet accurate segment anything model that combines EfficientViT and SAM, achieving a 48.9x speedup on an A100 GPU without compromising performance.


Example-based Explanations for Random Forests using Machine Unlearning

http://arxiv.org/abs/2402.05007v1

Compressor summary: FairDebugger is a system that uses machine unlearning to find and explain unfair outcomes in tree-based classifiers.


Randomized Confidence Bounds for Stochastic Partial Monitoring

http://arxiv.org/abs/2402.05002v1

Compressor summary: The paper introduces new randomized strategies for sequential learning problems with incomplete feedback and shows their effectiveness in contextual and non-contextual settings with stochastic outcomes, using a real-world example of classifier monitoring.


Pedagogical Alignment of Large Language Models

http://arxiv.org/abs/2402.05000v1

Compressor summary: The paper introduces pedagogically aligned large language models that use reinforcement learning and human feedback to guide students towards solving complex problems in education, outperforming supervised fine-tuning.


PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses

http://arxiv.org/abs/2402.04987v1

Compressor summary: The paper explores how to construct aggregation sets for better event-level prediction using one-dimensional clustering and the PriorBoost algorithm, which improves homogeneity of samples and considers label differential privacy.
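
As a rough illustration of the one-dimensional clustering idea (the actual PriorBoost algorithm is adaptive and iterative; this sketch only shows the bagging step under assumed inputs):

    import numpy as np

    def make_bags(prior_scores, bag_size):
        # Sort examples by a one-dimensional prior (e.g. the current model's
        # prediction) and cut the order into consecutive fixed-size bags, so
        # each aggregation set is as label-homogeneous as possible.
        order = np.argsort(prior_scores)
        n_full = (len(order) // bag_size) * bag_size
        return order[:n_full].reshape(-1, bag_size)  # indices, one row per bag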


Beyond explaining: XAI-based Adaptive Learning with SHAP Clustering for Energy Consumption Prediction

http://arxiv.org/abs/2402.04982v1

Compressor summary: The paper proposes a method that combines explainable AI with adaptive learning for energy consumption prediction models, using SHAP clustering to provide insights and balance complexity and performance.


Detection and Pose Estimation of flat, Texture-less Industry Objects on HoloLens using synthetic Training

http://arxiv.org/abs/2402.04979v1

Compressor summary: The authors propose a client-server-based augmented reality app that uses synthetic data to enable object pose estimation on edge devices like HoloLens 2 and iPad, without relying on real photographs.


An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated Collaboration

http://arxiv.org/abs/2402.04978v1

Compressor summary: The study proposes a cooperative reasoning scheme between Knowledge Graph and Large Language Models to improve their performance in knowledge-based reasoning tasks and enhance transparency.


Multi-Sender Persuasion -- A Computational Perspective

http://arxiv.org/abs/2402.04971v1

Compressor summary: The paper proposes a novel method using differentiable neural networks to find local Nash equilibria in complex signaling games with multiple senders and self-interested receivers, showing improvements over existing approaches.


Text or Image? What is More Important in Cross-Domain Generalization Capabilities of Hate Meme Detection Models?

http://arxiv.org/abs/2402.04967v1

Compressor summary: The paper shows that text helps multimodal hate meme detection across domains, while images hinder it.


ConvLoRA and AdaBN based Domain Adaptation via Self-Training

http://arxiv.org/abs/2402.04964v1

Compressor summary: Key points:
- ConvLoRA is a method for multi-target domain adaptation that reduces parameters by adding low-rank decomposition matrices to convolutional layers and using adaptive batch normalization.
- It outperforms or matches independently fine-tuned networks with far fewer trainable parameters.
- It can be applied to any architecture with convolutional and batch normalization layers.

Summary: ConvLoRA is a simple and effective method for multi-target domain adaptation that uses low-rank decomposition and adaptive batch normalization to reduce parameters and improve performance.


Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation

http://arxiv.org/abs/2402.04958v1

Compressor summary: Test-time adaptation with selective channel adaptation improves robustness to label distribution shifts and reduces failure risks in deep neural networks.


Reconfidencing LLMs from the Grouping Loss Perspective

http://arxiv.org/abs/2402.04957v1

Compressor summary: Large language models like ChatGPT and LLaMA can generate wrong answers confidently; researchers propose a method to correct their overconfidence using a knowledge base.


4-Dimensional deformation part model for pose estimation using Kalman filter constraints

http://arxiv.org/abs/2402.04953v1

Compressor summary: The article explores how a Kalman filter enhances pose estimation accuracy in 4D deformation models using two data sets.


An approach to automated videogame beta testing

http://arxiv.org/abs/2402.04938v1

Compressor summary: The paper proposes a way to automate quality assurance in AAA game development, which is currently mostly manual and done by human beta testers.


A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health

http://arxiv.org/abs/2402.04933v1

Compressor summary: BCoR is an online RL approach for RMABs that combines Bayesian modeling and Thompson sampling to handle contextual and non-stationary settings, improving performance in public health interventions.


Blue noise for diffusion models

http://arxiv.org/abs/2402.04930v1

Compressor summary: The paper introduces a new class of diffusion models that use correlated noise to improve the training process and generation quality for computer graphics tasks.


Source-Free Domain Adaptation with Diffusion-Guided Source Data Generation

http://arxiv.org/abs/2402.04929v1

Compressor summary: The paper proposes DM-SFDA, a method that uses diffusion models to generate source domain images from target features, improving domain adaptation performance.


Two Trades is not Baffled: Condense Graph via Crafting Rational Gradient Matching

http://arxiv.org/abs/2402.04924v1

Compressor summary: The paper proposes a new graph condensation method, CTRL, that reduces errors in training trajectories and improves performance on various graph datasets and tasks.


Prompting Implicit Discourse Relation Annotation

http://arxiv.org/abs/2402.04918v1

Compressor summary: The text explores how to improve ChatGPT's performance in identifying discourse relations using different prompting techniques, but finds that it still struggles with the task even with advanced methods.


Moco: A Learnable Meta Optimizer for Combinatorial Optimization

http://arxiv.org/abs/2402.04915v1

Compressor summary: Moco is a meta optimizer that learns to adapt its solution construction procedure based on features extracted from the current search state, improving performance on combinatorial optimization problems like TSP and MIS.


Personalized Text Generation with Fine-Grained Linguistic Control

http://arxiv.org/abs/2402.04914v1

Compressor summary: The paper proposes a new benchmark for evaluating generative models' ability to control fine-grained linguistic attributes in text generation and analyzes various language models' performance on it.


Conformal Monte Carlo Meta-learners for Predictive Inference of Individual Treatment Effects

http://arxiv.org/abs/2402.04906v1

Compressor summary: The authors propose a new method called Conformal Monte Carlo (CMC) meta-learners that can estimate the uncertainty in the treatment effect and help make personalized decisions based on this information.


L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ

http://arxiv.org/abs/2402.04902v1

Compressor summary: L4Q is an algorithm that combines parameter-efficient fine-tuning and quantization-aware training for large language models, improving accuracy with sub-4-bit precision.


The Strain of Success: A Predictive Model for Injury Risk Mitigation and Team Success in Soccer

http://arxiv.org/abs/2402.04898v1

Compressor summary: Key points:
- novel sequential team selection model in soccer
- models player injury and unavailability using real-world data
- Monte-Carlo Tree Search for optimal long-term performance
- validated on 2018/19 English Premier League season
- reduced injuries and costs compared to benchmark

Summary: The paper proposes a soccer team selection model that uses real-world data and Monte-Carlo Tree Search to reduce injuries and costs while maintaining performance.


A Unified Framework for Probabilistic Verification of AI Systems via Weighted Model Integration

http://arxiv.org/abs/2402.04892v1

Compressor summary: The paper proposes a general framework for verifying various properties of AI systems using Weighted Model Integration, which can handle different models and properties without strong distributional assumptions.


Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration

http://arxiv.org/abs/2402.04883v1

Compressor summary: The paper proposes a cascade framework that uses depth information for effective feature lifting and 3D object localization in camera-based 3D detection, improving performance on NuScenes benchmark and other detectors.


STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation

http://arxiv.org/abs/2402.04878v1

Compressor summary: The paper proposes a texture-agnostic object detection method that focuses on shape features by randomizing textures during training with CAD models, addressing the challenge of textureless and metallic objects in robotics.


On Provable Length and Compositional Generalization

http://arxiv.org/abs/2402.04875v1

Compressor summary: The text explores proving how various sequence-to-sequence models achieve length and compositional generalization, which are essential forms of out-of-distribution generalization.


Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy

http://arxiv.org/abs/2402.04869v1

Compressor summary: The paper proposes a framework for incorporating causality into reinforcement learning, using interventions for causal structure learning during exploration and policy guidance during exploitation, and evaluates it on a simulated fault alarm environment.


CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay

http://arxiv.org/abs/2402.04858v1

Compressor summary: The paper introduces CodeIt, a self-improvement method for large language models that uses program sampling, hindsight relabeling, and prioritized experience replay to solve tasks requiring human-level reasoning, achieving state-of-the-art performance on the Abstraction and Reasoning Corpus.


Explaining Learned Reward Functions with Counterfactual Trajectories

http://arxiv.org/abs/2402.04856v1

Compressor summary: CTEs are explanations for reinforcement learning reward functions that show how different actions affect outcomes and can help users understand and evaluate learned rewards better.


Dual-Path Coupled Image Deraining Network via Spatial-Frequency Interaction

http://arxiv.org/abs/2402.04855v1

Compressor summary: DPCNet is a novel image deraining method that uses spatial and frequency information from two separate feature extraction blocks and an adaptive fusion module to outperform existing methods and provide visually pleasing results.


Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning

http://arxiv.org/abs/2402.04852v1

Compressor summary: aLLM4TS adapts Large Language Models for time-series representation learning by using multi-patch prediction, which captures temporal dynamics better than traditional methods, and achieves superior performance in various downstream tasks.


Data-efficient Large Vision Models through Sequential Autoregression

http://arxiv.org/abs/2402.04841v1

Compressor summary: Key points:
- Paper proposes an efficient vision model that works on minimal visual data without linguistic inputs
- Model uses autoregression and reduces parameter size and training data requirements
- Model shows proficiency in various high-level and low-level visual tasks

Summary: The paper introduces a new vision model that can understand visual data with little data and no language, using an autoregressive architecture to improve efficiency and adaptability.


PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition

http://arxiv.org/abs/2402.04838v1

Compressor summary: The study proposes PaDeLLM-NER, a method to reduce NER generation latency using parallel decoding in large language models without additional modules or architecture changes.
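
A hedged sketch of the general idea: issue one short, independent generation per entity type instead of one long sequential enumeration. The llm.generate interface is hypothetical, and the paper parallelizes inside the decoding process itself rather than via threads as approximated here.

    from concurrent.futures import ThreadPoolExecutor

    def parallel_ner(llm, text, entity_types):
        # One short generation per label, all issued concurrently.
        prompts = [f"Text: {text}\nList all {t} entities:" for t in entity_types]
        with ThreadPoolExecutor() as pool:
            answers = list(pool.map(llm.generate, prompts))  # hypothetical API
        return dict(zip(entity_types, answers))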


On the Completeness of Invariant Geometric Deep Learning Models

http://arxiv.org/abs/2402.04836v1

Compressor summary: The paper studies the expressiveness of invariant geometric deep learning models, introduces a new model called GeoNGNN that handles challenging corner cases, and proves E(3)-completeness for GeoNGNN and three existing models.


SARI: Simplistic Average and Robust Identification based Noisy Partial Label Learning

http://arxiv.org/abs/2402.04835v1

Compressor summary: SARI is a novel framework for noisy partial label learning that uses pseudo-labeling and neural network classification to achieve state-of-the-art results in various settings.


Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

http://arxiv.org/abs/2402.04833v1

Compressor summary: The paper suggests that fine-tuning LLMs on the 1,000 longest instructions from standard datasets is a simple but effective baseline, and that refining these long instructions further improves performance.
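
The baseline itself is nearly one line of code; a sketch under an assumed record layout (the field names are hypothetical):

    def longest_subset(dataset, k=1000):
        # dataset: list of dicts with "instruction"/"output" fields (assumed).
        def length(ex):
            return len(ex.get("instruction", "")) + len(ex.get("output", ""))
        return sorted(dataset, key=length, reverse=True)[:k]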


Structured d-DNNF Is Not Closed Under Negation

http://arxiv.org/abs/2402.04832v1

Compressor summary: Structured d-DNNF is more succinct than SDD and less tractable in terms of transformations, while OBDD supports more tractable transformations but is less succinct.


Closing the Gap Between SGP4 and High-Precision Propagation via Differentiable Programming

http://arxiv.org/abs/2402.04830v1

Compressor summary: dSGP4 is a differentiable version of SGP4 that enables fast and precise orbital propagation for space applications, integrating with machine learning techniques to further improve precision.


NeRF as Non-Distant Environment Emitter in Physics-based Inverse Rendering

http://arxiv.org/abs/2402.04829v1

Compressor summary: The paper proposes using NeRF as a non-distant environment emitter for inverse rendering, improving accuracy over the common distant environment map approach.


Learning Communication Policies for Different Follower Behaviors in a Collaborative Reference Game

http://arxiv.org/abs/2402.04824v1

Compressor summary: The authors evaluate how well neural agents can adapt their language grounding and coordination strategies to different Follower behaviors in a collaborative reference game, using PPO reinforcement learning with an additional communicative effort signal.


How Realistic Is Your Synthetic Data? Constraining Deep Generative Models for Tabular Data

http://arxiv.org/abs/2402.04823v1

Compressor summary: The paper proposes Constrained Deep Generative Models (C-DGMs) that ensure synthetic data complies with given constraints by integrating a Constraint Layer with the model, improving utility and detection.


E(3)-Equivariant Mesh Neural Networks

http://arxiv.org/abs/2402.04821v1

Compressor summary: The paper proposes EMNN, an improved equivariant method for 3D mesh tasks that simplifies geometric deep learning by incorporating face information and hierarchy, achieving better results with less complexity and pre-processing.


BOWLL: A Deceptively Simple Open World Lifelong Learner

http://arxiv.org/abs/2402.04814v1

Compressor summary: Key points:
- Deep learning often optimizes for scalar performance on benchmarks, not real-world applications
- Open world lifelong learning is a new trend that requires recognition of novel concepts, avoidance of uninformative data, and retention of previous knowledge
- The paper introduces a simple baseline using batch normalization to repurpose standard models for open world lifelong learning
- The approach shows promising results and should be a future standard for this field

Summary: The paper proposes a batch normalization-based baseline for open world lifelong learning, a challenging real-world task that requires adaptability and knowledge retention.


Aspect-Based Sentiment Analysis for Open-Ended HR Survey Responses

http://arxiv.org/abs/2402.04812v1

Compressor summary: The paper proposes a machine learning method for analyzing employee satisfaction surveys in Dutch, identifying key aspects and using pre-trained language models.


Spiking-PhysFormer: Camera-Based Remote Photoplethysmography with Parallel Spike-driven Transformer

http://arxiv.org/abs/2402.04798v1

Compressor summary: The Spiking-PhysFormer model combines artificial neural networks and spiking neural networks to measure cardiac activity and physiological signals from facial videos with less power consumption than existing methods.


Scalable Multi-view Clustering via Explicit Kernel Features Maps

http://arxiv.org/abs/2402.04794v1

Compressor summary: The paper introduces a new scalable framework for multi-view subspace clustering, using kernel feature maps to reduce computation time, and shows its effectiveness on large networks.


Direct Language Model Alignment from Online AI Feedback

http://arxiv.org/abs/2402.04792v1

Compressor summary: OAIF improves direct-alignment-from-preferences methods by providing online feedback from a large language model annotator.
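
A minimal sketch of one online-feedback step, with hypothetical policy.sample and annotator_llm.prefer interfaces; the fresh preference pair would then feed a direct-alignment update such as DPO.

    def oaif_step(policy, annotator_llm, prompt):
        # Sample two candidate responses from the current policy...
        y1, y2 = policy.sample(prompt), policy.sample(prompt)
        # ...ask an LLM annotator which one it prefers (the online feedback)...
        winner = annotator_llm.prefer(prompt, y1, y2)  # returns y1 or y2
        chosen, rejected = (y1, y2) if winner == y1 else (y2, y1)
        # ...and hand the fresh pair to a direct-alignment update (e.g. DPO).
        return {"prompt": prompt, "chosen": chosen, "rejected": rejected}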


MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

http://arxiv.org/abs/2402.04788v1

Compressor summary: The paper introduces a new benchmark, MLLM-as-a-Judge, to evaluate multimodal large language models in assisting judges, but finds they still have limitations and biases.


A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models

http://arxiv.org/abs/2402.04787v1

Compressor summary: The authors propose a framework that uses Bayesian networks to generate and compare explanations for natural language inference tasks with LLM-generated explanations, aiming to understand how LLMs solve problems.


Analyzing the Neural Tangent Kernel of Periodically Activated Coordinate Networks

http://arxiv.org/abs/2402.04783v1

Compressor summary: The paper analyzes how periodic activation functions improve performance in vision tasks and provides theoretical evidence for their better behavior compared to ReLU-activated networks.


StableMask: Refining Causal Masking in Decoder-only Transformer

http://arxiv.org/abs/2402.04779v1

Compressor summary: StableMask improves the decoder-only Transformer by refining the causal mask to balance attention distributions, encode absolute positional information, and support efficient extrapolation and integration with other techniques.


Code as Reward: Empowering Reinforcement Learning with VLMs

http://arxiv.org/abs/2402.04764v1

Compressor summary: The paper proposes Code as Reward (VLM-CaR), a framework that generates dense reward functions from VLMs through code generation, enabling faster and more accurate training of RL agents.


Color Recognition in Challenging Lighting Environments: CNN Approach

http://arxiv.org/abs/2402.04762v1

Compressor summary: The text proposes a CNN-based color detection method for computer vision that improves robustness in various lighting conditions and outperforms existing methods.


Boundary-aware Contrastive Learning for Semi-supervised Nuclei Instance Segmentation

http://arxiv.org/abs/2402.04756v1

Compressor summary: The paper introduces a network that uses contrastive learning to improve nuclei boundary denoising in semi-supervised segmentation, addressing challenges in pathological images due to color and morphological variations.


Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints

http://arxiv.org/abs/2402.04754v1

Compressor summary: LACE is a continuous diffusion model for controllable layout generation that incorporates aesthetic constraints and outperforms existing methods.


Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

http://arxiv.org/abs/2402.04744v1

Compressor summary: This paper studies the problem of high-sparsity regions in N:M structured sparsity models and proposes a method to reduce induced noise and improve performance by using decay mechanisms on gradient flows.


Graph Cuts with Arbitrary Size Constraints Through Optimal Transport

http://arxiv.org/abs/2402.04732v1

Compressor summary: Key points:
- The paper proposes a new graph cut algorithm for partitioning graphs under arbitrary size constraints
- The algorithm is based on a regularized Gromov-Wasserstein problem and uses an accelerated proximal gradient descent method
- The algorithm has several advantages over classical methods, such as global convergence, sparsity and efficiency

Summary: The paper presents a new graph partitioning algorithm that can handle arbitrary size constraints by formulating the problem as a regularized Gromov-Wasserstein optimization and solving it with an efficient gradient-based method.


InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior

http://arxiv.org/abs/2402.04717v1

Compressor summary: InstructScene is a new method for generating 3D indoor scenes from natural language instructions, improving controllability and fidelity with a semantic graph prior and a layout decoder.


Incorporating Retrieval-based Causal Learning with Information Bottlenecks for Interpretable Graph Neural Networks

http://arxiv.org/abs/2402.04710v1

Compressor summary: The paper proposes a novel interpretable causal Graph Neural Network framework that combines retrieval-based causal learning and Graph Information Bottleneck theory to improve both explanation and prediction.


EvoSeed: Unveiling the Threat on Deep Neural Networks with Real-World Illusions

http://arxiv.org/abs/2402.04699v1

Compressor summary: Key points:
- EvoSeed is a new algorithm to generate natural adversarial samples for deep neural networks
- It uses evolutionary strategy, diffusion model, and classifier model in a black-box setting
- The generated samples are of high quality and transferable to different classifiers

Summary: EvoSeed is an evolutionary algorithm that creates natural adversarial samples for deep neural networks by using a diffusion model and a classifier model in a black-box way, resulting in high-quality and transferable samples.


The Influence of Autofocus Lenses in the Camera Calibration Process

http://arxiv.org/abs/2402.04686v1

Compressor summary: The paper analyzes camera calibration in robotics and computer vision, proposing a modified method that considers distance-dependent focal length to improve accuracy.


Large Language Models As Faithful Explainers

http://arxiv.org/abs/2402.04678v1

Compressor summary: xLLM is a framework that improves the faithfulness and accuracy of natural language explanations for large language models' decisions by optimizing an evaluator that quantifies faithfulness.


Source Identification in Abstractive Summarization

http://arxiv.org/abs/2402.04677v1

Compressor summary: The paper studies how neural summarization models convert source information into summaries by analyzing the source sentences of reference and system summaries on two datasets.


G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection

http://arxiv.org/abs/2402.04672v1

Compressor summary: The paper proposes G-NAS, a method that uses Differentiable NAS and Generalizable loss to train object detectors on one source domain and generalize to multiple target domains with complex feature imbalance issues.


V2VSSC: A 3D Semantic Scene Completion Benchmark for Perception with Vehicle to Vehicle Communication

http://arxiv.org/abs/2402.04671v1

Compressor summary: The paper introduces a collaborative semantic scene completion framework for autonomous vehicles using vehicle-to-vehicle communication to overcome occlusion and short-range perception challenges, improving performance on geometric and visual metrics.


A Perspective on Individualized Treatment Effects Estimation from Time-series Health Data

http://arxiv.org/abs/2402.04668v1

Compressor summary: This paper provides an overview of individualized treatment effects methods for electronic health records data, discussing challenges and future research directions in this emerging field.


Open-Vocabulary Calibration for Vision-Language Models

http://arxiv.org/abs/2402.04655v1

Compressor summary: The paper proposes a method called Distance-Aware Calibration (DAC) to improve confidence calibration in vision-language models fine-tuned with prompt learning, especially for open-vocabulary tasks.


An Over Complete Deep Learning Method for Inverse Problems

http://arxiv.org/abs/2402.04653v1

Compressor summary: The authors propose a new method to improve machine learning techniques for solving inverse problems by embedding the solution into higher dimensions and jointly designing and learning the regularizer.


OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding

http://arxiv.org/abs/2402.04648v1

Compressor summary: OV-NeRF improves semantic field learning for 3D scenes using pre-trained vision and language models, addressing noisy and view-inconsistent semantics issues with single-view and cross-view strategies.


Latent Plan Transformer: Planning as Latent Variable Inference

http://arxiv.org/abs/2402.04647v1

Compressor summary: The Latent Plan Transformer (LPT) is a model that uses latent space to connect a Trajectory Generator and the final return, enabling improved decisions and planning with suboptimal trajectories in tasks without step-wise rewards.


Learning with Diversification from Block Sparse Signal

http://arxiv.org/abs/2402.04646v1

Compressor summary: The paper presents a new prior for block sparse learning that adapts to data and reduces sensitivity to pre-defined block information, leading to better performance than existing methods.


LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

http://arxiv.org/abs/2402.04644v1

Compressor summary: LEVI is a novel method to improve fine-tuning generalization by adaptively ensembling pre-trained and task-specific models, addressing limitations in both data sources.


Domain Bridge: Generative model-based domain forensic for black-box models

http://arxiv.org/abs/2402.04640v1

Compressor summary: The paper presents an enhanced approach to determine not only the general data domain but also its specific attributes using image embeddings and generative models, leveraging the large LAION-5B dataset.


TransLLaMa: LLM-based Simultaneous Translation System

http://arxiv.org/abs/2402.04636v1

Compressor summary: The study shows that large language models can perform simultaneous machine translation by generating a special "wait" token to request more source input, achieving results comparable to state-of-the-art baselines.
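
A hedged sketch of the read/write loop the "wait" token implies, with a hypothetical llm.next_token interface and a cap on tokens emitted per read as a safety simplification:

    WAIT = "<wait>"

    def simultaneous_translate(llm, source_words, max_emit_per_read=8):
        # After each new source word, the model either emits target tokens
        # or a special "wait" token asking for more input before committing.
        context, output = [], []
        for word in source_words:
            context.append(word)
            for _ in range(max_emit_per_read):
                tok = llm.next_token(context, output)  # hypothetical interface
                if tok == WAIT:
                    break                              # read more source first
                output.append(tok)
        return output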


GSN: Generalisable Segmentation in Neural Radiance Field

http://arxiv.org/abs/2402.04632v1

Compressor summary: The paper introduces a new representation called GSN that combines generalized radiance fields with distilled semantic features, enabling multi-view segmentation of unseen scenes.


LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors

http://arxiv.org/abs/2402.04630v1

Compressor summary: DVDet is a detector that uses conditional context prompts and hierarchical textual descriptors to align visual embeddings with fine-grained text descriptions of object parts, improving open-vocabulary detection performance.


SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph

http://arxiv.org/abs/2402.04627v1

Compressor summary: The study evaluates strategies for fine-tuning an LLM for question answering over life science KGs, using data augmentation and semantic clues in queries to overcome the scarcity of training data.


Noise Map Guidance: Inversion with Spatial Context for Real Image Editing

http://arxiv.org/abs/2402.04625v1

Compressor summary: Noise Map Guidance (NMG) is a new text-guided diffusion model that improves real-image editing by preserving quality and context without requiring optimization.


MEMORYLLM: Towards Self-Updatable Large Language Models

http://arxiv.org/abs/2402.04624v1

Compressor summary: MEMORYLLM is a self-updating language model that can memorize new knowledge and maintain its performance over time.


Feature Distribution on Graph Topology Mediates the Effect of Graph Convolution: Homophily Perspective

http://arxiv.org/abs/2402.04621v1

Compressor summary: Randomly shuffling features among nodes of the same class improves graph neural network performance by reducing the dependence between graph topology and features.
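
The intervention is easy to state in code; a minimal PyTorch sketch of shuffling features among same-class nodes, assuming dense feature and label tensors:

    import torch

    def shuffle_within_class(x, y):
        # x: (num_nodes, num_features), y: (num_nodes,) integer class labels.
        # Permuting features among same-class nodes breaks the dependence
        # between topology and features while keeping class-conditional
        # feature distributions intact.
        x = x.clone()
        for c in y.unique():
            idx = (y == c).nonzero(as_tuple=True)[0]
            x[idx] = x[idx[torch.randperm(len(idx))]]
        return x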


Multi-Scale Semantic Segmentation with Modified MBConv Blocks

http://arxiv.org/abs/2402.04618v1

Compressor summary: The paper proposes a new adaptation of MBConv blocks for U-Net architectures to improve semantic segmentation by extracting more detailed spatial information.


InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

http://arxiv.org/abs/2402.04617v1

Compressor summary: InfLLM is a memory-based method that allows large language models to process and understand long sequences without training, improving their performance on long-distance dependency tasks.


TinyLLM: Learning a Small Student from Multiple Large Language Models

http://arxiv.org/abs/2402.04616v1

Compressor summary: TinyLLM is a novel knowledge distillation approach that leverages multiple large teacher models to train a small student model with diverse reasoning skills and contextual understanding.
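
TinyLLM's actual objective distills teacher rationales; purely as a generic illustration of learning from several larger teachers at once, here is a textbook multi-teacher logit-distillation loss (a swapped-in standard technique, assuming all models share an output space):

    import torch.nn.functional as F

    def multi_teacher_loss(student_logits, teacher_logits_list, labels,
                           alpha=0.5, temperature=2.0):
        # Hard-label term plus an averaged, temperature-scaled KL term
        # against each teacher's softened output distribution.
        ce = F.cross_entropy(student_logits, labels)
        T = temperature
        kd = sum(
            F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                     F.softmax(t / T, dim=-1), reduction="batchmean")
            for t in teacher_logits_list
        ) / len(teacher_logits_list)
        return alpha * ce + (1.0 - alpha) * (T * T) * kd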


ScreenAI: A Vision-Language Model for UI and Infographics Understanding

http://arxiv.org/abs/2402.04615v1

Compressor summary: ScreenAI is a vision-language model that understands UIs and infographics, using a unique mixture of datasets and text annotations to improve performance on various tasks.


Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

http://arxiv.org/abs/2402.04614v1

Compressor summary: The text discusses the trade-off between faithfulness and plausibility in self-explanations generated by large language models, emphasizing the importance of faithfulness for high-stakes decision-making and suggesting ways to improve it without losing plausibility.


Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach

http://arxiv.org/abs/2402.04609v1

Compressor summary: Key points:
- Post-editing improves text quality of LLMs but has limitations
- Neural programmer-interpreter approach preserves domain generalization and adapts editing actions for text generation
- Approach outperforms other post-editing methods in cross-domain settings

Summary: The paper proposes a neural programmer-interpreter that enhances LLM text quality by adapting editing actions to text generation tasks, while maintaining domain generalization.


Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector

http://arxiv.org/abs/2402.04601v1

Compressor summary: Key points:
- The paper proposes an alignment-enhanced corrector for Chinese grammatical error correction (CGEC) to address overcorrection problems in Seq2Seq models and decoder-only LLMs.
- The method involves training a correction model, using two alignment models, and transferring knowledge from them to the correction model.
- The approach improves CGEC performance on three datasets.

Summary: The paper presents an alignment-enhanced corrector for CGEC that uses two alignment models and knowledge transfer to reduce overcorrection in both Seq2Seq and decoder-only LLMs, leading to better CGEC results.


Meet JEANIE: a Similarity Measure for 3D Skeleton Sequences via Temporal-Viewpoint Alignment

http://arxiv.org/abs/2402.04599v1

Compressor summary: JEANIE is a method for aligning 3D skeleton sequences by adjusting temporal and camera views to improve few-shot action recognition and clustering.


CMSA algorithm for solving the prioritized pairwise test data generation problem in software product lines

http://arxiv.org/abs/2402.04597v1

Compressor summary: The paper proposes a new hybrid metaheuristic algorithm to generate test data for software product families, which outperforms existing methods in quality but takes longer to run.


Towards Improved Imbalance Robustness in Continual Multi-Label Learning with Dual Output Spiking Architecture (DOSA)

http://arxiv.org/abs/2402.04596v1

Compressor summary: The paper proposes a new spiking neural network architecture for continual multi-label learning that is computationally efficient and robust to data imbalance.


UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset

http://arxiv.org/abs/2402.04588v1

Compressor summary: This paper introduces UltraLink, an open-source multilingual supervised fine-tuning dataset that considers both language-specific and language-agnostic abilities of large language models to improve their cross-lingual transfer capabilities.


Sparse Anatomical Prompt Semi-Supervised Learning with Masked Image Modeling for CBCT Tooth Segmentation

http://arxiv.org/abs/2402.04587v1

Compressor summary: The study proposes a new method using a self-supervised masked autoencoder and a sparse masked boundary prompt to accurately segment teeth in CBCT dental images with limited labeled data.


A Psychological Study: Importance of Contrast and Luminance in Color to Grayscale Mapping

http://arxiv.org/abs/2402.04583v1

Compressor summary: The paper compares different algorithms for converting color images to grayscale using a psychological experiment with participants imagining a "colorless world" and evaluates their effectiveness based on visual quality, information preservation, and selection times.
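
For context, the simplest luminance-based mapping such studies typically compare against is the Rec. 601 weighting of the RGB channels (whether this exact baseline appears in the paper is an assumption):

    def to_grayscale_rec601(rgb):
        # rgb: array-like with channels last; Rec. 601 luma weights.
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        return 0.299 * r + 0.587 * g + 0.114 * b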


Collective Counterfactual Explanations via Optimal Transport

http://arxiv.org/abs/2402.04579v1

Compressor summary: The text proposes a collective method for generating counterfactual explanations that considers the current density of individuals, which improves upon classical approaches by using optimal transport.


S-Agents: self-organizing agents in open-ended environment

http://arxiv.org/abs/2402.04578v1

Compressor summary: The paper proposes a self-organizing agent system (S-Agents) for flexible collaboration in open-ended settings, inspired by human organizational behavior.


Progressive Conservative Adaptation for Evolving Target Domains

http://arxiv.org/abs/2402.04573v1

Compressor summary: PCAda is a meta-learning approach for evolving domain adaptation that fine-tunes classifier heads with progressive class prototypes and uses conservative sparse attention to prevent interference with historical knowledge.


OIL-AD: An Anomaly Detection Framework for Sequential Decision Sequences

http://arxiv.org/abs/2402.04567v1

Compressor summary: OIL-AD is an unsupervised method for detecting anomalies in decision-making sequences using offline imitation learning and two features derived from Q function and state value function.


Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention

http://arxiv.org/abs/2402.04563v1

Compressor summary: The paper proposes an attention-guided visualization method for ViT models that explains their decisions and localizes objects with high performance using only class labels.


Can Large Language Model Agents Simulate Human Trust Behaviors?

http://arxiv.org/abs/2402.04559v1

Compressor summary: The paper examines whether large language model agents can simulate human trust behaviors in Trust Games and finds that they can, with potential implications for scenarios where trust is important.


DMAT: A Dynamic Mask-Aware Transformer for Human De-occlusion

http://arxiv.org/abs/2402.04558v1

Compressor summary: The paper introduces a dynamic mask-aware transformer (DMAT) for human de-occlusion, using an expanded convolution head, a multi-head attention mechanism, and an amodal loss to improve the model's performance on AHP dataset.


FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models

http://arxiv.org/abs/2402.04555v1

Compressor summary: The paper proposes a probabilistic label fusion and instance refinement method to improve instance-aware semantic mapping from object detection generated by foundation models, achieving better zero-shot performance on real-world datasets.


BirdNeRF: Fast Neural Reconstruction of Large-Scale Scenes From Aerial Imagery

http://arxiv.org/abs/2402.04554v1

Compressor summary: BirdNeRF is a novel method that uses aerial imagery to reconstruct large-scale scenes faster and with better visual fidelity than traditional approaches, by decomposing the images into smaller sub-scenes and using a projection-guided re-rendering strategy.


Curvature-Informed SGD via General Purpose Lie-Group Preconditioners

http://arxiv.org/abs/2402.04553v1

Compressor summary: The paper proposes a novel method to accelerate stochastic gradient descent using curvature information and preconditioners based on connected Lie groups, which improves convergence and performance across various tasks and architectures.


Share What You Already Know: Cross-Language-Script Transfer and Alignment for Sentiment Detection in Code-Mixed Data

http://arxiv.org/abs/2402.04542v1

Compressor summary: The study proposes a method to improve multilingual models by using native scripts for each language and aligning their representations in code-switched texts, achieving better results on Nepali-English and Hindi-English datasets.


BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception

http://arxiv.org/abs/2402.04541v1

Compressor summary: The authors created a large dataset of five types of brightness illusions and tested data-driven neural network approaches for classifying and locating them, achieving high accuracy and pixel accuracy.


Learning Diverse Policies with Soft Self-Generated Guidance

http://arxiv.org/abs/2402.04539v1

Compressor summary: The paper proposes a reinforcement learning method that uses diverse past trajectories as guidance to learn faster and more efficiently, even with sparse and deceptive rewards.


Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers

http://arxiv.org/abs/2402.04538v1

Compressor summary: The Triplet Graph Transformer (TGT) is a new model that enables direct communication between neighboring pairs in graphs and achieves state-of-the-art results on various molecular property prediction and optimization tasks.


SumRec: A Framework for Recommendation using Open-Domain Dialogue

http://arxiv.org/abs/2402.04523v1

Compressor summary: The SumRec framework uses chat summaries to personalize information recommendations based on speakers' interests, preferences, and experiences.


On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

http://arxiv.org/abs/2402.04520v1

Compressor summary: The paper studies the efficiency of modern Hopfield models for memory retrieval and shows a phase transition behavior based on pattern norms, with efficient variants possible under SETH assumptions.
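
For reference, the retrieval step whose cost is being analyzed is usually written in the modern Hopfield form below (Ramsauer et al.'s update; the notation is assumed): the query x is pulled toward the stored patterns, with beta controlling retrieval sharpness.

    % One-step modern Hopfield retrieval; stored patterns are the columns of X.
    x_{\mathrm{new}} = X \, \operatorname{softmax}\!\left(\beta X^{\top} x\right),
    \qquad X = [\xi_1, \dots, \xi_M] \in \mathbb{R}^{d \times M}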


BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision

http://arxiv.org/abs/2402.04519v1

Compressor summary: BioDrone is a bionic drone-based visual benchmark for single object tracking (SOT) that evaluates the robustness of SOT methods under challenging conditions such as tiny targets, fast motion, camera shake, and drastic changes between frames.


Online Cascade Learning for Efficient Inference over Streams

http://arxiv.org/abs/2402.04513v1

Compressor summary: The paper proposes online cascade learning, a method to use lower-capacity models and a deferral policy to answer queries about data streams with the help of large language models, achieving high accuracy while reducing inference costs by up to 90%.
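
A minimal sketch of a confidence-thresholded deferral policy (the paper learns the deferral policy online; the fixed threshold and llm.answer interface here are simplifying assumptions):

    def cascade_predict(small_model, llm, x, threshold=0.9):
        # Answer with the cheap model when it is confident enough,
        # otherwise defer to the expensive LLM.
        probs = small_model.predict_proba([x])[0]  # sklearn-style interface
        if probs.max() >= threshold:
            return int(probs.argmax()), "answered-by-small-model"
        return llm.answer(x), "deferred-to-llm"    # hypothetical llm.answer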


Developments in Sheaf-Theoretic Models of Natural Language Ambiguities

http://arxiv.org/abs/2402.04505v1

Compressor summary: Sheaves are mathematical tools used to model discourse ambiguities in natural language processing and improve contextual models.


Text2Street: Controllable Text-to-image Generation for Street Views

http://arxiv.org/abs/2402.04504v1

Compressor summary: The Text2Street framework generates controllable street-view images from text by using a lane-aware road topology generator, an object layout generator, and a multiple control image generator.


The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

http://arxiv.org/abs/2402.04497v1

Compressor summary: This paper shows that the time complexity of training large language models (LLMs) is almost-linear for some parameter regimes, and provides a complete characterization of their fine-grained complexity.


Grandmaster-Level Chess Without Search

http://arxiv.org/abs/2402.04494v1

Compressor summary: This paper trains a large transformer model on a huge chess dataset to achieve strong chess performance without complex heuristics or explicit search algorithms.
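
Searchless play reduces to scoring candidate moves with the learned network; a sketch using the python-chess package, where value_fn stands in for the trained transformer (a hypothetical interface, not the paper's code):

    import chess  # python-chess

    def pick_move(value_fn, board):
        # Score every legal successor position with the learned value
        # function and play the argmax; no tree search of any kind.
        def score(move):
            board.push(move)
            v = value_fn(board.fen())  # value_fn: hypothetical trained net
            board.pop()
            return v
        return max(board.legal_moves, key=score)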


ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation

http://arxiv.org/abs/2402.04492v1

Compressor summary: The ColorSwap dataset helps evaluate and improve multimodal models' ability to match objects with their colors by providing image-caption pairs with swapped color words.


De-amplifying Bias from Differential Privacy in Language Model Fine-tuning

http://arxiv.org/abs/2402.04489v1

Compressor summary: The text discusses the trade-offs between privacy and fairness in machine learning, showing that differential privacy amplifies bias but can be mitigated by counterfactual data augmentation.
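
A toy sketch of counterfactual data augmentation, the mitigation the summary mentions: train on each example together with a protected-attribute-flipped copy (the pronoun map here is deliberately simplistic and illustrative):

    SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
             "his": "hers", "hers": "his"}

    def counterfactual_augment(sentence):
        # Train on the original plus an attribute-flipped copy; a real
        # implementation needs POS-aware handling ("her" vs. "hers" etc.).
        flipped = " ".join(SWAPS.get(w, w) for w in sentence.lower().split())
        return [sentence, flipped]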


BEBLID: Boosted efficient binary local image descriptor

http://arxiv.org/abs/2402.04482v1

Compressor summary: BEBLID is a learned binary image descriptor that improves matching accuracy and efficiency for computer vision applications on devices with limited hardware and energy resources.