arxiv compressed, 2024-03-06

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-06, generated by the compressor, my personal LLM-based project.


FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

http://arxiv.org/abs/2403.03221v1

Compressor summary: The paper proposes a method that combines the strengths of correspondence-based and direct pose prediction methods to achieve both precision and robustness in 6DoF camera pose estimation.


LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

http://arxiv.org/abs/2403.03219v1

Compressor summary: This study proposes a new algorithm for the linear contextual bandit problem that improves regret bounds and relaxes suboptimality gap assumptions, based on Follow-The-Regularized-Leader with Tsallis entropy.


The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

http://arxiv.org/abs/2403.03218v1

Compressor summary: The WMDP benchmark evaluates hazardous knowledge in large language models and serves as a basis for developing methods to unlearn such knowledge.


Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion

http://arxiv.org/abs/2403.03217v1

Compressor summary: Key points:
- The text is about a 3D patient body modeling method that uses multi-modal keypoint detection and self-supervised mesh regression.
- The method addresses the issues of customization, data requirement, and annotation cost of existing solutions.
- The method shows superior performance in patient positioning across different scenarios.
Summary: The authors propose a 3D patient body modeling method that uses multi-modal keypoint detection and self-supervised mesh regression to improve patient positioning accuracy and efficiency, without the need for customization, extensive data, or costly annotations.


Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

http://arxiv.org/abs/2403.03206v1

Compressor summary: The authors propose an improved noise sampling technique for rectified flow models that focuses on perceptually relevant scales, a novel transformer-based architecture for text-to-image synthesis, and show its superior performance in various metrics and human evaluations.
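A common way to focus flow/diffusion training on perceptually relevant intermediate noise levels is to sample timesteps from a logit-normal distribution rather than uniformly. The sketch below illustrates that general idea; the location/scale parameters are illustrative, not the paper's tuned values.

```python
import numpy as np

# Hedged sketch: logit-normal timestep sampling, one standard way to
# concentrate training on intermediate noise levels. loc/scale are
# illustrative assumptions, not the paper's exact scheme.
rng = np.random.default_rng(0)
z = rng.normal(loc=0.0, scale=1.0, size=10_000)
t = 1.0 / (1.0 + np.exp(-z))  # sigmoid maps z to timesteps in (0, 1)

mid_fraction = ((t > 0.25) & (t < 0.75)).mean()  # most mass lands mid-range
```

Because the sigmoid of a standard normal piles probability near t = 0.5, well over half of the sampled timesteps fall in the middle of the noise schedule.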


CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments

http://arxiv.org/abs/2403.03203v1

Compressor summary: Key points:
- Research agenda in AI focuses on learning and reasoning integration
- Humans use background knowledge (constraints) to infer answers from partially observed scenes
- CLEVR-POC is a novel benchmark for reasoning-intensive VQA under constraints
- Neuro-symbolic models outperform pre-trained vision language models on CLEVR-POC
Summary: The paper introduces CLEVR-POC, a new challenge for AI to use logical constraints and visual cues to answer questions about hidden objects in partially observable scenes, and shows that neuro-symbolic models excel at this task.


MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets

http://arxiv.org/abs/2403.03194v1

Compressor summary: MAGID is a framework for generating diverse and high-quality images to augment text-only dialogues using a diffusion model and a feedback loop.


Triple-CFN: Restructuring Conceptual Spaces for Enhancing Abstract Reasoning process

http://arxiv.org/abs/2403.03190v1

Compressor summary: The paper proposes the Triple-CFN approach and its variants, Meta Triple-CFN and Re-space layer, to improve artificial intelligence's performance on abstract reasoning problems such as Bongard-Logo and RPM.


Towards Democratized Flood Risk Management: An Advanced AI Assistant Enabled by GPT-4 for Enhanced Interpretability and Public Engagement

http://arxiv.org/abs/2403.03188v1

Compressor summary: The study introduces an AI Assistant powered by GPT-4 to help communicate flood risks effectively, integrate real-time warnings with maps and data, and provide actionable advice for decision-makers and the public.


Reliable, Adaptable, and Attributable Language Models with Retrieval

http://arxiv.org/abs/2403.03187v1

Compressor summary: The paper argues that retrieval-augmented language models can overcome the limitations of parametric ones by incorporating large-scale datastores, but current implementations are not yet optimal and need improvement.


Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study

http://arxiv.org/abs/2403.03186v1

Compressor summary: The text introduces Cradle, a foundation agent that can learn any computer task from screen images and audio inputs, and demonstrates its capabilities in the game Red Dead Redemption II.


Preventing Reward Hacking with Occupancy Measure Regularization

http://arxiv.org/abs/2403.03185v1

Compressor summary: The paper proposes using Occupancy Measure (OM) divergence instead of Action Distribution (AD) divergence to prevent reward hacking in reinforcement learning by regularizing towards a safe policy.
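The occupancy measure of a policy is its long-run distribution over visited states. A minimal sketch of the general idea (not the paper's algorithm): estimate each policy's state-visitation distribution from rollouts and penalize divergence from the safe policy's distribution.

```python
import numpy as np
from collections import Counter

# Hedged sketch of occupancy-measure regularization in a tiny discrete MDP.
# Trajectories and state counts here are toy values for illustration.
def occupancy_from_rollouts(trajectories, num_states):
    """Empirical state-visitation distribution over rollouts."""
    counts = Counter(s for traj in trajectories for s in traj)
    occ = np.array([counts[s] for s in range(num_states)], dtype=float)
    return occ / occ.sum()

def tv_distance(p, q):
    """Total variation distance, one simple divergence between occupancies."""
    return 0.5 * np.abs(p - q).sum()

safe = occupancy_from_rollouts([[0, 1, 1, 2], [0, 1, 2, 2]], num_states=4)
learned = occupancy_from_rollouts([[0, 3, 3, 3], [0, 3, 3, 2]], num_states=4)
penalty = tv_distance(safe, learned)  # regularization term for the objective
```

Unlike an action-distribution divergence, this penalty directly reflects how differently the two policies occupy the state space.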


How Well Can Transformers Emulate In-context Newton's Method?

http://arxiv.org/abs/2403.03183v1

Compressor summary: The text discusses how Transformers can approximate higher order optimization methods and perform tasks like logistic regression using linear attention and ReLU layers.
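For reference, a plain Newton's-method step for logistic regression, the classical second-order algorithm the paper asks Transformers to emulate (a standalone textbook sketch, unrelated to any Transformer internals):

```python
import numpy as np

def newton_step(w, X, y):
    """One Newton step on the logistic-regression negative log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
    grad = X.T @ (p - y)               # gradient of the logistic loss
    W = p * (1 - p)                    # per-sample Hessian weights
    H = X.T @ (X * W[:, None])         # Hessian of the loss
    return w - np.linalg.solve(H + 1e-6 * np.eye(len(w)), grad)

# Tiny non-separable toy problem: a few Newton steps reach the MLE.
X = np.array([[1.0, 0.0], [1.0, 2.0], [1.0, 1.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.zeros(2)
for _ in range(5):
    w = newton_step(w, X, y)
```

The small ridge term added to the Hessian keeps the solve stable if the Hessian is near-singular.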


Behavior Generation with Latent Actions

http://arxiv.org/abs/2403.03181v1

Compressor summary: VQ-BeT is a model that improves behavior generation by tokenizing continuous actions with vector quantization, achieving better performance and faster inference than existing models in various environments.
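The core of action tokenization via vector quantization is mapping each continuous action to its nearest entry in a learned codebook, so behavior generation becomes prediction over discrete tokens. A hedged sketch of that general mechanism (codebook values below are random stand-ins, not learned):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = np.asarray(rng.normal(size=(16, 2)))  # 16 codes for 2-D actions

def tokenize(action):
    """Index of the nearest codebook entry: the discrete action 'token'."""
    dists = np.linalg.norm(codebook - action, axis=1)
    return int(np.argmin(dists))

def detokenize(token):
    """Map a discrete token back to its continuous codebook action."""
    return codebook[token]

a = np.array([0.3, -0.7])
t = tokenize(a)
a_hat = detokenize(t)  # quantized reconstruction of the original action
```

In practice the codebook is trained (e.g., with a VQ-VAE objective) so that the reconstruction error of `a_hat` is small on the action distribution.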


Unifying and Certifying Top-Quality Planning

http://arxiv.org/abs/2403.03176v1

Compressor summary: The paper proposes a unified definition for multiple high-quality planning problems, enabling efficient certification using task transformations.


Solving the bongard-logo problem by modeling a probabilistic model

http://arxiv.org/abs/2403.03173v1

Compressor summary: The study introduces PMoC and Pose-Transformer, which improve AI's abstract reasoning and cognitive pattern recognition by modeling probability and learning positional information from image data.


Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination

http://arxiv.org/abs/2403.03172v1

Compressor summary: The paper proposes a model-based framework, MAGI, that uses an Imagined common goal to coordinate multiple agents in cooperative reinforcement learning tasks, improving sample efficiency and performance.


PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset

http://arxiv.org/abs/2403.03167v1

Compressor summary: PARADISE is a Q&A task that tests large language models' abductive reasoning and implicit knowledge inference in practical procedural text, revealing limitations of current models and providing valuable insights for future research.


Design2Code: How Far Are We From Automating Front-End Engineering?

http://arxiv.org/abs/2403.03163v1

Compressor summary: The authors benchmark the Design2Code task, in which multimodal LLMs convert screenshots of real-world webpages into code implementations, and find that GPT-4V performs best on it.


PalmProbNet: A Probabilistic Approach to Understanding Palm Distributions in Ecuadorian Tropical Forest via Transfer Learning

http://arxiv.org/abs/2403.03161v1

Compressor summary: PalmProbNet is a new probabilistic approach using transfer learning to automatically detect palm trees in dense tropical rainforests using UAV-derived orthomosaic imagery, achieving high accuracy and visualizing palm distribution with probability heatmaps.


Deep-Learned Compression for Radio-Frequency Signal Classification

http://arxiv.org/abs/2403.03150v1

Compressor summary: The paper proposes a deep learned compression model, HQARF, that uses learned vector quantization to compress RF signals for AI processing, reducing bandwidth and latency costs.


Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

http://arxiv.org/abs/2403.03145v1

Compressor summary: The paper proposes a novel semi-supervised learning framework for audio-visual source localization that uses two teachers to generate high-quality pseudo-labels and outperforms current methods by a large margin.


Language Guided Exploration for RL Agents in Text Environments

http://arxiv.org/abs/2403.03141v1

Compressor summary: The Language Guided Exploration framework uses a language model to help reinforcement learning agents explore and learn better in complex text environments, improving performance compared to baselines.


Simplicity in Complexity

http://arxiv.org/abs/2403.03134v1

Compressor summary: The paper proposes a simple linear model using segmentation models to measure image complexity, showing that it is well explained by the number of segments and classes in an image across diverse datasets.


CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following

http://arxiv.org/abs/2403.03129v1

Compressor summary: CoGenesis is a collaborative generation framework that integrates large and small language models on cloud and local devices to address privacy concerns while maintaining efficient command execution.


NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

http://arxiv.org/abs/2403.03122v1

Compressor summary: The authors propose Neural Riemannian Distance Fields (NRDFs), a data-driven method for modeling the space of plausible articulations, which improves pose generation, estimation, and inverse kinematics tasks across humans, hands, and animals.


Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution

http://arxiv.org/abs/2403.03121v1

Compressor summary: The study investigates gendered emotion attribution in large language models and finds that they consistently exhibit emotions influenced by gender stereotypes.


Motion-Corrected Moving Average: Including Post-Hoc Temporal Information for Improved Video Segmentation

http://arxiv.org/abs/2403.03120v1

Compressor summary: The paper proposes a technique to improve video segmentation performance without additional labeling by using optical flow to account for motion in the moving average calculation.
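The idea can be sketched as follows (a hedged toy version, not the paper's implementation): before averaging per-frame segmentation scores over time, warp the running average by the estimated motion so that the average tracks moving objects. Here a constant integer flow and `np.roll` stand in for real optical flow.

```python
import numpy as np

def motion_corrected_ema(frames_scores, flows, alpha=0.5):
    """Exponential moving average whose history is warped by motion first."""
    avg = frames_scores[0]
    for scores, (dy, dx) in zip(frames_scores[1:], flows):
        warped = np.roll(avg, shift=(dy, dx), axis=(0, 1))  # motion correction
        avg = alpha * warped + (1 - alpha) * scores
    return avg

frames = [np.zeros((4, 4)) for _ in range(3)]
frames[0][1, 1] = 1.0  # object at (1, 1) in frame 0
frames[1][1, 2] = 1.0  # object moved one pixel right
frames[2][1, 3] = 1.0  # and again
out = motion_corrected_ema(frames, flows=[(0, 1), (0, 1)])
```

Without the warp, the average would smear the object across old positions; with it, the temporal average stays aligned with the moving object.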


Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection

http://arxiv.org/abs/2403.03111v1

Compressor summary: The paper proposes a framework to improve real-time LiDAR odometry and mapping for self-driving cars by using semantic information and rejecting outliers in the matching process.


Emergent Equivariance in Deep Ensembles

http://arxiv.org/abs/2403.03103v1

Compressor summary: In the infinite-width limit, deep ensembles (groups of neural networks trained from different weight initializations) become equivariant when trained with data augmentation, meaning their averaged output transforms consistently with transformations of the input; this holds for any architecture and at every point in input space.


"In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning

http://arxiv.org/abs/2403.03102v1

Compressor summary: In-Dialogue Learning (IDL) is a fine-tuning framework that improves persona-based personalized dialogue generation without pre-defined profiles by using large language models and dialogue history, achieving significant improvements in BLEU and ROUGE scores and human evaluations.


KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents

http://arxiv.org/abs/2403.03101v1

Compressor summary: KnowAgent improves language agents' planning capabilities by incorporating explicit action knowledge, reducing planning hallucination and achieving better performance on complex tasks.


Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

http://arxiv.org/abs/2403.03095v1

Compressor summary: XPL is a novel semi-supervised AVSL method that uses cross-refinement and two components (soft pseudo-labels and curriculum data selection) to avoid bias, improve stability, and achieve state-of-the-art performance.


Recall-Oriented Continual Learning with Generative Adversarial Meta-Model

http://arxiv.org/abs/2403.03082v1

Compressor summary: The paper proposes a two-level framework for continual learning that separates stability and plasticity mechanisms using generative adversarial meta-models to maintain past knowledge.


MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding

http://arxiv.org/abs/2403.03077v1

Compressor summary: The MiKASA Transformer is a novel model for 3D visual grounding that improves accuracy, spatial understanding, and explainability, achieving the highest overall performance in two challenges.


Detecting Concrete Visual Tokens for Multimodal Machine Translation

http://arxiv.org/abs/2403.03075v1

Compressor summary: The paper presents new methods for detecting and selecting visually-grounded text tokens in multimodal machine translation systems, using different techniques and improving performance.


Improving Variational Autoencoder Estimation from Incomplete Data with Mixture Variational Families

http://arxiv.org/abs/2403.03069v1

Compressor summary: The paper proposes two methods to improve variational autoencoder estimation from incomplete data by addressing the increased complexity of the latent variable distribution.


CrackNex: a Few-shot Low-light Crack Segmentation Model Based on Retinex Theory for UAV Inspections

http://arxiv.org/abs/2403.03063v1

Compressor summary: CrackNex is a framework that uses Retinex Theory to learn illumination-invariant representations for crack segmentation in low-light conditions, and utilizes few-shot segmentation to overcome the data efficiency issue.


Adding Multimodal Capabilities to a Text-only Translation Model

http://arxiv.org/abs/2403.03045v1

Compressor summary: We propose a method to improve multimodal machine translation models by combining a text-only MT model with vision-text adapter layers and pre-training and fine-tuning on specific datasets.


A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives

http://arxiv.org/abs/2403.03037v1

Compressor summary: Key points:
- The text is about transferring holistic perception to intelligent machines
- It proposes a unified approach to video understanding with shared temporal modelling of human actions
- It introduces EgoPack, a solution that creates a collection of task perspectives for multiple downstream tasks
- It shows the effectiveness and efficiency of its approach on four benchmarks
Summary: The text presents EgoPack, a unified approach to video understanding that enables intelligent machines to learn holistic perception from different tasks and use it for novel skills.


Learning to Use Tools via Cooperative and Interactive Agents

http://arxiv.org/abs/2403.03031v1

Compressor summary: The authors propose a cooperative framework that improves tool learning in large language models by modularizing the workflow and enabling agents to adapt based on feedback.


Socratic Reasoning Improves Positive Text Rewriting

http://arxiv.org/abs/2403.03029v1

Compressor summary: The authors propose a framework called SocraticReframe that uses question-answer pairs to help reframe negative thoughts into positive ones, improving the performance of language models in this task.


Word Importance Explains How Prompts Affect Language Model Outputs

http://arxiv.org/abs/2403.03028v1

Compressor summary: The study introduces a method to improve the explainability of LLMs by varying individual words in prompts to measure their impact on model outputs and various text scores, addressing concerns about their transparency, reliability, and ethical use.
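The generic ablation idea behind such word-importance measures can be sketched in a few lines: score each word by how much removing it changes the model's output metric. `score_fn` below is a hypothetical stand-in for any model-output score, and the toy "model" just counts characters to exercise the loop.

```python
def word_importance(prompt, score_fn):
    """Leave-one-out importance: |score(full prompt) - score(prompt minus word)|."""
    words = prompt.split()
    base = score_fn(" ".join(words))
    importances = []
    for i in range(len(words)):
        ablated = " ".join(words[:i] + words[i + 1:])
        importances.append(abs(base - score_fn(ablated)))
    return list(zip(words, importances))

# Toy score function (character count) standing in for a real model metric.
scores = word_importance("summarize this paper", lambda p: len(p))
```

Real variants replace deletion with substitution of neutral words and use scores such as log-likelihoods or downstream metrics instead of a toy length function.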


SplAgger: Split Aggregation for Meta-Reinforcement Learning

http://arxiv.org/abs/2403.03020v1

Compressor summary: The paper explores how sequence models with different properties affect meta-RL performance and proposes a new method called SplAgger that combines them.


CRISPR: Ensemble Model

http://arxiv.org/abs/2403.03018v1

Compressor summary: Key points:
- CRISPR is a gene editing technology with challenges in predicting sgRNA efficacy and off-target effects
- The paper proposes a novel ensemble learning method that combines multiple models to improve accuracy and generalizability
- The method outperformed existing methods on a benchmark dataset and could have implications for clinical use of CRISPR
Summary: The paper presents a new method to design sgRNAs for CRISPR gene editing that uses multiple machine learning models to improve the accuracy and generalizability of predictions, which could lead to safer and more effective treatments for diseases.


OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following

http://arxiv.org/abs/2403.03017v1

Compressor summary: The text introduces OPEx, a framework to analyze the impact of various components on embodied instruction following tasks using large language models, and shows that combining multi-agent dialogue and LLMs improves performance.


The Case for Evaluating Multimodal Translation Models on Text Datasets

http://arxiv.org/abs/2403.03014v1

Compressor summary: The paper suggests that multimodal machine translation (MMT) models should be evaluated using three different test sets to capture their use of visual information and complex sentence translation abilities, which are not measured by the current Multi30k testing set.


Knowledge Graphs as Context Sources for LLM-Based Explanations of Learning Recommendations

http://arxiv.org/abs/2403.03008v1

Compressor summary: The paper proposes using knowledge graphs to improve the quality and accuracy of explanations generated by large language models for personalized education.


Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

http://arxiv.org/abs/2403.03003v1

Compressor summary: Mixture-of-Resolution Adaptation (MRA) improves multimodal large language models' visual recognition by combining low- and high-resolution features from images.


Localized Zeroth-Order Prompt Optimization

http://arxiv.org/abs/2403.02993v1

Compressor summary: The paper explores how focusing on local optima instead of global optimization can improve prompt optimization for large language models, and proposes a new algorithm called ZOPO that uses a Gaussian process to efficiently search for local optima.


MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

http://arxiv.org/abs/2403.02991v1

Compressor summary: MADTP is a framework that aligns different modalities and dynamically adjusts compression ratios to reduce VLT computation costs while maintaining performance.


Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges

http://arxiv.org/abs/2403.02990v1

Compressor summary: This paper surveys how large language models impact data augmentation techniques, discussing challenges and opportunities in natural language processing and other domains.


Evolution Transformer: In-Context Evolutionary Optimization

http://arxiv.org/abs/2403.02985v1

Compressor summary: Evolution Transformer is a meta-optimization system that improves evolutionary algorithms by learning from data and using causal Transformers to update search distributions.


Doubly Abductive Counterfactual Inference for Text-based Image Editing

http://arxiv.org/abs/2403.02981v1

Compressor summary: The paper proposes a Doubly Abductive Counterfactual inference framework (DAC) to improve text-based image editing by balancing editability and fidelity, supporting various user intents.


A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching

http://arxiv.org/abs/2403.02975v1

Compressor summary: Keywords: semantic matching, natural language processing, multilingual, MCP-SM framework


Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

http://arxiv.org/abs/2403.02969v1

Compressor summary: The proposed AnyRef model generates pixel-wise object perceptions and natural language descriptions from various multi-modality references for more flexible visual-language interactions.


Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering

http://arxiv.org/abs/2403.02966v1

Compressor summary: EFSum is a method that creates evidence-focused summaries of facts for knowledge graphs to enhance question answering with large language models.


ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities

http://arxiv.org/abs/2403.02965v1

Compressor summary: The paper shows that ChatGPT can perform well in face recognition, gender detection, and age estimation tasks, despite avoiding sensitive information.


WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction

http://arxiv.org/abs/2403.02962v1

Compressor summary: The paper introduces a new dataset for testing large language models' ability to edit irregular tables and evaluates their performance on it.


SimuCourt: Building Judicial Decision-Making Agents with Real-world Judgement Documents

http://arxiv.org/abs/2403.02959v1

Compressor summary: SimuCourt is a judicial benchmark that evaluates agents' judicial decision-making using real-world judgement documents and a large legal knowledge base, on which the proposed agent approach outperforms existing methods.


On the Asymptotic Mean Square Error Optimality of Diffusion Probabilistic Models

http://arxiv.org/abs/2403.02957v1

Compressor summary: The paper proves that a specific diffusion probabilistic model denoising method converges to the best possible estimator, and shows how it can be used for noise removal and generation.


Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

http://arxiv.org/abs/2403.02951v1

Compressor summary: Key points:
- Large Language Models (LLMs) are powerful tools for Text-to-SQL, but no consensus on prompt templates and design frameworks
- Existing benchmarks lack comprehensive evaluation of LLMs across sub-tasks
- New dataset and five evaluation tasks to assess LLMs' performance and propose optimal in-context learning solutions
Summary: The paper introduces a new dataset and evaluation tasks to assess and optimize LLM-based Text-to-SQL systems, addressing the limitations of existing benchmarks and prompt templates.


A general approach to enhance the survivability of backdoor attacks by decision path coupling

http://arxiv.org/abs/2403.02950v1

Compressor summary: Venom is a new attack method to enhance existing backdoors and make them harder to detect and remove by model reconstruction-based defenses.


SAFFIRA: a Framework for Assessing the Reliability of Systolic-Array-Based DNN Accelerators

http://arxiv.org/abs/2403.02946v1

Compressor summary: This paper proposes a faster way to test the reliability of systolic array-based DNN accelerators using a new fault injection method.


Unsupervised Learning Approaches for Identifying ICU Patient Subgroups: Do Results Generalise?

http://arxiv.org/abs/2403.02945v1

Compressor summary: The paper tests if common patient subgroups in ICUs exist by comparing two different datasets but finds limited similarities, suggesting that standardized restructuring may not work and customization might be better.


Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity

http://arxiv.org/abs/2403.02944v1

Compressor summary: Text-guided image compression algorithm achieves high perceptual and pixel-wise fidelity by using text-adaptive encoding and training with joint image-text loss, outperforming all baselines in terms of LPIPS.


AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models

http://arxiv.org/abs/2403.02938v1

Compressor summary: The system adjusts playback speed based on phonemes and uses speech recognizer score to ensure intelligibility, improving time efficiency of content comprehension.


AdAM: Adaptive Fault-Tolerant Approximate Multiplier for Edge DNN Accelerators

http://arxiv.org/abs/2403.02936v1

Compressor summary: The paper introduces a new fault-tolerant approximate multiplier designed for ASIC-based deep learning chips.


Fuzzy Datalog$^\exists$ over Arbitrary t-Norms

http://arxiv.org/abs/2403.02933v1

Compressor summary: The paper proposes a fuzzy extension of Datalog with existential rules to perform logical reasoning with uncertain neural and symbolic data from various sources.
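For context, a t-norm is the operator that generalizes logical conjunction to truth degrees in [0, 1]. The three textbook t-norms the fuzzy-logic literature typically considers (standard definitions, not the paper's code):

```python
# Standard t-norms: each is commutative, associative, monotone,
# and has 1 as identity, generalizing Boolean AND to [0, 1].
def godel(a, b):
    """Gödel (minimum) t-norm."""
    return min(a, b)

def product(a, b):
    """Product t-norm."""
    return a * b

def lukasiewicz(a, b):
    """Łukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)
```

A fuzzy Datalog rule body's truth degree is then the t-norm of its atoms' degrees, with the choice of t-norm determining the reasoning semantics.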


RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules

http://arxiv.org/abs/2403.02932v1

Compressor summary: The paper proposes RulePrompt, a method for text classification that uses logical expressions to characterize category meanings and improves performance by iteratively updating these rules and pseudo labels.


A Second Look on BASS -- Boosting Abstractive Summarization with Unified Semantic Graphs -- A Replication Study

http://arxiv.org/abs/2403.02930v1

Compressor summary: The authors thoroughly investigated and replicated the BASS summarization system, finding performance discrepancies and stressing the importance of clear communication in replication studies.


From Spectra to Biophysical Insights: End-to-End Learning with a Biased Radiative Transfer Model

http://arxiv.org/abs/2403.02922v1

Compressor summary: The text describes a new machine learning method that improves the accuracy of retrieving forest variables from satellite data and corrects biases in radiative transfer models.


TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax

http://arxiv.org/abs/2403.02920v1

Compressor summary: TaylorShift is a new method that improves the efficiency of attention mechanisms for long sequences in Transformers without sacrificing performance or token-to-token interactions.
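The general linear-attention trick that Taylor-expansion methods build on (a hedged sketch, not the paper's exact TaylorShift formulation): approximating exp(q·k) ≈ 1 + q·k lets the softmax numerator and denominator be reordered so the cost is linear in sequence length N instead of quadratic.

```python
import numpy as np

def taylor_linear_attention(Q, K, V):
    """First-order Taylor approximation of softmax attention, computed in O(N)."""
    N, d = Q.shape
    # Numerator: sum_j (1 + q.k_j) v_j = sum_j v_j + Q @ (K^T V)
    num = V.sum(axis=0, keepdims=True) + Q @ (K.T @ V)  # O(N d^2), not O(N^2)
    # Denominator: sum_j (1 + q.k_j) = N + Q @ sum_j k_j
    den = N + Q @ K.sum(axis=0)                         # O(N d)
    return num / den[:, None]

rng = np.random.default_rng(0)
Q, K, V = (0.1 * rng.normal(size=(8, 4)) for _ in range(3))
out = taylor_linear_attention(Q, K, V)  # close to exact attention for small scores
```

For small attention scores the approximation tracks exact softmax attention closely; higher-order Taylor terms tighten it at higher computational cost.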


Cross-Domain Image Conversion by CycleDM

http://arxiv.org/abs/2403.02919v1

Compressor summary: The paper presents CycleDM, a machine learning method that converts machine-printed characters to handwritten ones and vice versa, by combining CycleGAN with a diffusion model.


DynST: Dynamic Sparse Training for Resource-Constrained Spatio-Temporal Forecasting

http://arxiv.org/abs/2403.02914v1

Compressor summary: The paper proposes a novel approach called DynST for optimizing sensor deployment in earth science systems by dynamically filtering important sensors based on spatio-temporal data.


ImgTrojan: Jailbreaking Vision-Language Models with ONE Image

http://arxiv.org/abs/2403.02910v1

Compressor summary: The paper proposes a jailbreak attack on vision language models that can bypass their safety barriers and perform harmful actions using poisoned image-text data pairs.


Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks

http://arxiv.org/abs/2403.02909v1

Compressor summary: The paper presents a novel method for predicting gaze direction in low-light conditions using temporal event encoding and a dedicated neural network, achieving high accuracy.


Demonstrating Mutual Reinforcement Effect through Information Flow

http://arxiv.org/abs/2403.02902v1

Compressor summary: The study uses information flow analysis to show how word- and text-level classifications can mutually improve each other in text classification tasks and applies this knowledge to prompt learning for better predictions.


A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods

http://arxiv.org/abs/2403.02901v1

Compressor summary: This paper surveys automatic text summarization (ATS) methods using natural language processing and large language models, focusing on practical real-world implementations.


Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation

http://arxiv.org/abs/2403.02899v1

Compressor summary: DAMP is a technique that uses mutual prompting to align visual and textual embeddings for better unsupervised domain adaptation, improving performance on three benchmarks.


Zero-Shot Cross-Lingual Document-Level Event Causality Identification with Heterogeneous Graph Contrastive Transfer Learning

http://arxiv.org/abs/2403.02893v1

Compressor summary: The paper proposes a model that can detect causal relations between events in texts across different languages using heterogeneous graphs and contrastive transfer learning.


Enhancing Long-Term Person Re-Identification Using Global, Local Body Part, and Head Streams

http://arxiv.org/abs/2403.02892v1

Compressor summary: The paper proposes a framework for long-term person re-identification that uses global and local information, including body part features, to handle clothes-changing scenarios and achieves better performance than existing methods.


In Search of Truth: An Interrogation Approach to Hallucination Detection

http://arxiv.org/abs/2403.02889v1

Compressor summary: The paper proposes a new method to detect hallucinations in large language models, which can help improve their reliability and real-world adoption.


Enhancing the Rate-Distortion-Perception Flexibility of Learned Image Codecs with Conditional Diffusion Decoders

http://arxiv.org/abs/2403.02887v1

Compressor summary: The paper proposes conditional diffusion models as a promising decoder for generative image compression, which considers both rate and visual quality by adjusting tradeoffs.


Revisiting Confidence Estimation: Towards Reliable Failure Prediction

http://arxiv.org/abs/2403.02886v1

Compressor summary: The paper reveals that many confidence estimation methods are actually harmful for detecting misclassification errors, and proposes a new method that enlarges the confidence gap between correct and incorrect predictions to improve failure prediction performance.
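A toy illustration of the failure-prediction setting: sort predictions by confidence and measure the error rate (risk) as coverage grows; a well-separated confidence gap keeps risk low until the misclassified examples are finally accepted. The confidence values and correctness flags below are made up, not from a real model.

```python
import numpy as np

# Hypothetical confidences and whether each prediction was correct.
conf = np.array([0.95, 0.90, 0.60, 0.55, 0.40])
correct = np.array([1, 1, 1, 0, 0])

order = np.argsort(-conf)                               # most confident first
errors = 1 - correct[order]
coverage = np.arange(1, len(conf) + 1) / len(conf)      # fraction accepted
risk = np.cumsum(errors) / np.arange(1, len(conf) + 1)  # error rate so far
```

Here the two errors have the lowest confidence, so risk stays at zero until coverage exceeds 60%: the ideal shape a failure-prediction method aims for.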


MathScale: Scaling Instruction Tuning for Mathematical Reasoning

http://arxiv.org/abs/2403.02884v1

Compressor summary: MathScale is a method to create high-quality math reasoning data using large language models, which improves their mathematical problem-solving skills.


Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement

http://arxiv.org/abs/2403.02879v1

Compressor summary: Zero-LED is a novel diffusion model that enhances low-light images without paired training data, using bidirectional constraints and frequency-domain based appearance reconstruction.


ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving

http://arxiv.org/abs/2403.02877v1

Compressor summary: The paper proposes an active learning method for autonomous driving that efficiently annotates data based on route planning criteria and achieves similar performance to state-of-the-art methods with much less data.


Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples

http://arxiv.org/abs/2403.02875v1

Compressor summary: The authors propose a new pretraining method using hard negative text examples and introduce a challenging dataset (InpaintCOCO) to improve fine-grained concept understanding in multimodal models.


A Note on High-Probability Analysis of Algorithms with Exponential, Sub-Gaussian, and General Light Tails

http://arxiv.org/abs/2403.02873v1

Compressor summary: The paper presents a method for analyzing probabilistic algorithms with light-tailed randomization by simplifying them to use bounded random variables, which works for various distributions like exponential and sub-Gaussian, and gives examples of its application.


Precise Extraction of Deep Learning Models via Side-Channel Attacks on Edge/Endpoint Devices

http://arxiv.org/abs/2403.02870v1

Compressor summary: The text discusses how side-channel attacks can reveal vital information about deep learning models, enabling model extraction attacks even without prior knowledge of the target model.


An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers

http://arxiv.org/abs/2403.02839v1

Compressor summary: This study compares different judge models' evaluation of LLMs, finding that fine-tuned judge models outperform GPT4 on in-domain tasks but lag behind GPT4 in generalization and fairness.


SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix

http://arxiv.org/abs/2403.02833v1

Compressor summary: The paper presents SOFIM, a stochastic optimization method that uses the regularized Fisher information matrix to approximate the Hessian and achieve faster convergence in machine learning model training than existing methods.


Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation

http://arxiv.org/abs/2403.02827v1

Compressor summary: The paper proposes a method to improve image-to-video fidelity by adding noise and rectifying it in a tuning-free way.


An Adaptive Hydropower Management Approach for Downstream Ecosystem Preservation

http://arxiv.org/abs/2403.02821v1

Compressor summary: The text proposes using hydropower plants as ecosystem protectors by adjusting discharges with a neural network and integrating it into management software, potentially increasing electricity production and mitigating climate change impacts.


Reconstruction for Sparse View Tomography of Long Objects Applied to Imaging in the Wood Industry

http://arxiv.org/abs/2403.02820v1

Compressor summary: The paper proposes a new neural network-based method for reconstructing 3D images of wood logs from 2D slices, improving the identification of biological features.


Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud?

http://arxiv.org/abs/2403.02818v1

Compressor summary: Key points:
- Sparsely-annotated framework for 3D object detection requires only one 3D bounding box per scene
- SS3D++ method improves training and generates confident fully-annotated scenes from sparse seeds
- Achieves competitive or better performance with significantly less annotation cost than SOTA methods
Summary: The paper proposes a sparsely-annotated framework for 3D object detection, which uses the SS3D++ method to generate confident fully-annotated scenes and reduces annotation cost while maintaining or improving performance.


InjectTST: A Transformer Method of Injecting Global Information into Independent Channels for Long Time Series Forecasting

http://arxiv.org/abs/2403.02814v1

Compressor summary: InjectTST is a method to improve multivariate time series forecasting by selectively injecting global information into individual channels of a channel-independent Transformer model, enhancing its robustness and performance.


Dynamic Gaussian Graph Operator: Learning parametric partial differential equations in arbitrary discrete mechanics problems

http://arxiv.org/abs/2403.02810v1

Compressor summary: The paper introduces a new operator learning algorithm, DGGO, that can learn parametric PDEs and generalize on arbitrary discretization schemes for discrete mechanics problems using dynamic Gaussian graph kernels and Fourier neural operators.


DPPA: Pruning Method for Large Language Model to Model Merging

http://arxiv.org/abs/2403.02799v1

Compressor summary: The paper introduces DPPA, a dual-stage method for merging complex fine-tuned models that combines dynamic pruning with partition amplification to enhance performance while retaining fewer parameters.


Evaluating and Optimizing Educational Content with Large Language Model Judgments

http://arxiv.org/abs/2403.02795v1

Compressor summary: Key points:
- The paper proposes using Language Models (LMs) as educational experts to assess the impact of instructions on learning outcomes
- GPT-3.5 can replicate well-established educational findings such as the Expertise Reversal Effect and the Variability Effect
- The paper introduces an instruction optimization approach using one LM as a reward function for another LM to generate math word problem worksheets
- Human teachers' evaluations show a significant alignment between LM judgments and human teacher preferences
Summary: The paper shows how Language Models can evaluate and optimize instructional materials by replicating educational findings and creating math worksheets that align with human teacher preferences.


Semi-Supervised Graph Representation Learning with Human-centric Explanation for Predicting Fatty Liver Disease

http://arxiv.org/abs/2403.02786v1

Compressor summary: This study uses graph neural networks to predict fatty liver disease from health checkup data, while providing personalized explanations for better clinical interpretability.


DDF: A Novel Dual-Domain Image Fusion Strategy for Remote Sensing Image Semantic Segmentation with Unsupervised Domain Adaptation

http://arxiv.org/abs/2403.02784v1

Compressor summary: The paper presents a hybrid training strategy and dual-domain image fusion for semantic segmentation of remote sensing images using unsupervised domain adaptation, improving performance through pseudo-label region-specific weighting.


Where the Really Hard Quadratic Assignment Problems Are: the QAP-SAT instances

http://arxiv.org/abs/2403.02783v1

Compressor summary: This paper investigates the phase transition in the Quadratic Assignment Problem (QAP), a challenging problem in combinatorial optimization, by introducing a new submodular-based design and analyzing its correlation with solving effort.


Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

http://arxiv.org/abs/2403.02782v1

Compressor summary: The paper introduces KEPP, a system that uses probabilistic procedural knowledge graphs to help agents create strategic plans for real-life tasks based on visual observations and instructions.


PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

http://arxiv.org/abs/2403.02781v1

Compressor summary: The paper proposes an unsupervised domain prompt distillation framework to transfer knowledge from a larger CLIP model to a smaller one using unlabeled images and pre-stored text features.


Data Collaboration Analysis Over Matrix Manifolds

http://arxiv.org/abs/2403.02780v1

Compressor summary: This paper presents a rigorous theoretical foundation for Non-Readily Identifiable Data Collaboration (NRI-DC) framework to improve the quality and diversity of training datasets while protecting user privacy, and shows that the proposed approach enhances model performance without compromising communication efficiency or privacy protections.


A Zero-Shot Reinforcement Learning Strategy for Autonomous Guidewire Navigation

http://arxiv.org/abs/2403.02777v1

Compressor summary: The paper presents a zero-shot learning strategy for autonomous endovascular navigation using reinforcement learning that can generalize to unseen vascular anatomies without retraining and achieve high success rates.


EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs

http://arxiv.org/abs/2403.02775v1

Compressor summary: EasyQuant is a training-free, data-independent method for reducing the size of large language models without sacrificing performance or generalization.


Rehabilitation Exercise Quality Assessment through Supervised Contrastive Learning with Hard and Soft Negatives

http://arxiv.org/abs/2403.02772v1

Compressor summary: The paper introduces a novel AI-driven virtual rehabilitation framework that uses supervised contrastive learning with hard and soft negative samples to train a single model applicable to all exercise types, improving generalizability and reducing complexity in rehabilitation exercise assessment.


HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes

http://arxiv.org/abs/2403.02769v1

Compressor summary: The paper proposes an unsupervised method to detect humans in complex scenes by transferring knowledge from synthetic data to real data using novel modules for representation and feature alignment.


DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking

http://arxiv.org/abs/2403.02767v1

Compressor summary: DeconfuseTrack is a new multi-object tracker that uses Decomposed Data Association and Occlusion-aware Non-Maximum Suppression to reduce confusion when tracking multiple objects across video frames.


G4-Attention: Deep Learning Model with Attention for predicting DNA G-Quadruplexes

http://arxiv.org/abs/2403.02765v1

Compressor summary: The text describes a new machine learning model called G4-attention that uses neural networks and attention layers to accurately predict four-stranded nucleic acid structures involved in various biological roles.


Emerging Synergies Between Large Language Models and Machine Learning in Ecommerce Recommendations

http://arxiv.org/abs/2403.02760v1

Compressor summary: The text discusses how large language models can improve recommendation systems by understanding users' interests and capturing textual information better than deep neural networks.


In-Memory Learning: A Declarative Learning Framework for Large Language Models

http://arxiv.org/abs/2403.02757v1

Compressor summary: The paper proposes a novel learning framework called In-memory Learning, where agents improve themselves by refining notes in their memory using natural language.


Role Prompting Guided Domain Adaptation with General Capability Preserve for Large Language Models

http://arxiv.org/abs/2403.02756v1

Compressor summary: REGA is a novel approach to adapt Large Language Models for multiple domains without compromising their general capabilities or causing confusion between domains.


Learning Group Activity Features Through Person Attribute Prediction

http://arxiv.org/abs/2403.02753v1

Compressor summary: This paper presents a novel method for learning features of multi-person activities without manual annotations, using person attributes and location guidance to disentangle the complex features and achieve state-of-the-art performance on two datasets.


Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels

http://arxiv.org/abs/2403.02746v1

Compressor summary: Paraformer is a framework that uses a parallel CNN-Transformer feature extractor to map large-scale high-resolution land-cover from low-resolution historical data with weak supervision.


CURATRON: Complete Robust Preference Data for Robust Alignment of Large Language Models

http://arxiv.org/abs/2403.02745v1

Compressor summary: The paper proposes a method to robustly recalibrate values in preference datasets for large language models, handling incomplete and corrupted data.


Towards Training A Chinese Large Language Model for Anesthesiology

http://arxiv.org/abs/2403.02742v1

Compressor summary: Hypnos is a Chinese Anesthesia model that improves data quality, fine-tunes with general medicine data, and introduces a standardized benchmark for evaluating medical large language models in anesthesiology.


Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment

http://arxiv.org/abs/2403.02738v1

Compressor summary: The paper proposes a novel causal prompting method for large language models to mitigate biases using front-door adjustment, without accessing the model's parameters or logits.
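For reference, the front-door adjustment mentioned here is a standard identity from causal inference: with a mediator $M$ between treatment $X$ and outcome $Y$, the interventional distribution is identified from observational quantities alone (the paper's specific instantiation of $X$, $M$, and $Y$ for prompting may differ):

```latex
P(y \mid \mathrm{do}(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
```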


Neural Fractional Differential Equations

http://arxiv.org/abs/2403.02737v1

Compressor summary: The text introduces Neural FDEs, a new deep learning architecture that adapts Fractional Differential Equations for modelling non-local and memory-dependent systems, potentially outperforming Neural ODEs in such tasks.


Bootstrapping Rare Object Detection in High-Resolution Satellite Imagery

http://arxiv.org/abs/2403.02736v1

Compressor summary: The paper presents new cluster-based sampling methods for detecting rare objects in high-resolution images without labeled data or spatial prior, and applies them to identify bomas in the Serengeti Mara region.


A Two-Stage Training Method for Modeling Constrained Systems With Neural Networks

http://arxiv.org/abs/2403.02730v1

Compressor summary: The paper proposes a two-stage training method for Neural ODEs that solves constrained optimization problems without hyperparameters, improving model performance and explainability.


HARGPT: Are LLMs Zero-Shot Human Activity Recognizers?

http://arxiv.org/abs/2403.02727v1

Compressor summary: The paper shows that large language models can recognize human activities from raw sensor data using appropriate prompts and outperform traditional and deep learning baselines.
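The "appropriate prompts" the summary refers to can be pictured as serializing raw sensor readings into text plus a chain-of-thought instruction. A minimal hypothetical sketch (function name and prompt wording are assumptions, not HARGPT's actual prompt):

```python
def build_har_prompt(samples, rate_hz=50):
    """Format raw tri-axial accelerometer readings into a zero-shot
    activity-classification prompt for an LLM (hypothetical wording;
    HARGPT's exact prompt is in the paper)."""
    rows = "\n".join(f"{x:+.2f}, {y:+.2f}, {z:+.2f}" for x, y, z in samples)
    return (
        f"The following are accelerometer readings (x, y, z in g) "
        f"sampled at {rate_hz} Hz from a phone carried by a person:\n"
        f"{rows}\n"
        "Think step by step, then answer with one activity: "
        "walking, running, sitting, or climbing stairs."
    )

prompt = build_har_prompt([(0.01, -0.02, 0.98), (0.15, -0.05, 1.10)])
```

The resulting string would then be sent to the LLM as-is; the paper's finding is that such plain-text serializations, combined with step-by-step reasoning instructions, already beat trained baselines.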


Minimum Topology Attacks for Graph Neural Networks

http://arxiv.org/abs/2403.02723v1

Compressor summary: The paper proposes a new type of attack on Graph Neural Networks that adaptively finds the minimum perturbation needed to fool each node, and shows its effectiveness on different datasets.


Multi-Scale Subgraph Contrastive Learning

http://arxiv.org/abs/2403.02719v1

Compressor summary: The paper proposes a multi-scale subgraph contrastive learning method for graphs that captures fine-grained semantic information by generating global and local views based on subgraph sampling.


DP-CRE: Continual Relation Extraction via Decoupled Contrastive Learning and Memory Structure Preservation

http://arxiv.org/abs/2403.02718v1

Compressor summary: The paper proposes a new framework for incrementally learning relations from data streams that balances preserving old information and acquiring new information, improving performance over existing methods.


Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

http://arxiv.org/abs/2403.02715v1

Compressor summary: The paper presents a Vietnamese language model fine-tuning method and an evaluation framework to improve large language models' effectiveness in processing Vietnamese.


DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization

http://arxiv.org/abs/2403.02714v1

Compressor summary: The text introduces a new task (Adaptive Domain Generalization) and dataset (DomainVerse) for vision-language models to adapt to various realistic domains without fine-tuning, and proposes two tuning-free methods (Domain CLIP and Domain++ CLIP).


Android in the Zoo: Chain-of-Action-Thought for GUI Agents

http://arxiv.org/abs/2403.02713v1

Compressor summary: The paper proposes CoAT, a chain-of-action-thought method that improves smartphone GUI agents by reasoning in natural language over screen semantics and the thinking behind each action.


Breeze-7B Technical Report

http://arxiv.org/abs/2403.02712v1

Compressor summary: Breeze-7B is an open-source language model that improves Chinese comprehension and chatbot skills, outperforming similar models.


Enhancing Generalization in Medical Visual Question Answering Tasks via Gradient-Guided Model Perturbation

http://arxiv.org/abs/2403.02707v1

Compressor summary: The paper introduces a method to improve medical VQA models by applying gradient-guided parameter perturbations during pre-training and fine-tuning.


Causal Walk: Debiasing Multi-Hop Fact Verification with Front-Door Adjustment

http://arxiv.org/abs/2403.02698v1

Compressor summary: Causal Walk is a novel method for debiasing multi-hop fact verification using causal inference and front-door adjustment, which can handle complicated bias patterns hidden in multiple hops of evidence.


Controllable Prompt Tuning For Balancing Group Distributional Robustness

http://arxiv.org/abs/2403.02695v1

Compressor summary: The paper proposes an optimization scheme to improve performance on different groups or domains in distribution shifts, using Controllable Prompt Tuning (CPT) to reduce computational cost.


InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

http://arxiv.org/abs/2403.02691v1

Compressor summary: InjecAgent is a benchmark to assess and mitigate indirect prompt injection attacks on tool-integrated large language models, finding them vulnerable to such attacks.


Dirichlet-based Per-Sample Weighting by Transition Matrix for Noisy Label Learning

http://arxiv.org/abs/2403.02690v1

Compressor summary: The paper proposes a new method called RENT for learning with noisy labels by resampling based on the noise transition matrix, and shows its superior performance over existing methods.
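RENT itself resamples the training data using the noise transition matrix; as a point of orientation only (not the paper's method), the widely used forward-correction baseline trains against the predicted noisy-label distribution $T^\top p$, where $T_{ij} = P(\tilde{y}=j \mid y=i)$:

```python
import numpy as np

def forward_corrected_ce(probs, noisy_labels, T):
    """Cross-entropy with forward noise correction: train the model's
    clean-class posteriors against the *noisy* labels by pushing them
    through the transition matrix T (rows: P(noisy | clean))."""
    noisy_probs = probs @ T  # (n, k) predicted noisy-label distribution
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return -np.mean(np.log(picked + 1e-12))

# Toy check: with an identity transition matrix the correction is a no-op,
# so confident correct predictions give near-zero loss.
T = np.eye(3)
probs = np.array([[0.98, 0.01, 0.01],
                  [0.01, 0.98, 0.01]])
labels = np.array([0, 1])
loss = forward_corrected_ce(probs, labels, T)
```

The paper's argument is that resampling by these transition probabilities behaves better than such reweighting/correction schemes.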


Deep Common Feature Mining for Efficient Video Semantic Segmentation

http://arxiv.org/abs/2403.02689v1

Compressor summary: DCFM is a novel video semantic segmentation approach that leverages feature sharing to address challenges like redundant computation and feature propagation reliability.


Learning to Defer to a Population: A Meta-Learning Approach

http://arxiv.org/abs/2403.02683v1

Compressor summary: The L2D framework adapts to new experts at test-time using meta-learning and attention mechanism for safe and robust autonomous systems.


Time Weaver: A Conditional Time Series Generation Model

http://arxiv.org/abs/2403.02682v1

Compressor summary: Time Weaver is a new model that uses various types of contextual metadata to generate more realistic time series for applications like electricity demand forecasting, outperforming existing methods by up to 27%.


SGD with Partial Hessian for Deep Neural Networks Optimization

http://arxiv.org/abs/2403.02681v1

Compressor summary: The paper proposes a new optimization method for deep neural networks that combines first-order and second-order techniques, improving both accuracy and generalization.


Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

http://arxiv.org/abs/2403.02677v1

Compressor summary: The proposed framework uses fine-tuned multimodal language models to filter image-text data more effectively than existing methods, leading to improved performance on popular models and tasks.


Revisiting Meta-evaluation for Grammatical Error Correction

http://arxiv.org/abs/2403.02674v1

Compressor summary: SEEDA is a new dataset for grammatical error correction meta-evaluation that improves correlation by using consistent granularity and considering modern systems.


Few-shot Learner Parameterization by Diffusion Time-steps

http://arxiv.org/abs/2403.02649v1

Compressor summary: TiF learner uses diffusion models and low-rank adapters to extract nuanced class attributes from few-shot images for classification.


Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad

http://arxiv.org/abs/2403.02648v1

Compressor summary: Key points:
- KATE is a new optimization algorithm based on AdaGrad
- KATE is scale-invariant for Generalized Linear Models
- KATE has a convergence rate similar to AdaGrad and Adam
- KATE performs better than AdaGrad and matches/surpasses Adam in various tasks
Summary: KATE is a novel, scale-invariant optimization algorithm that achieves comparable or better performance than existing adaptive methods like Adam and AdaGrad in different machine learning problems.
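For reference, the baseline KATE modifies is the per-coordinate AdaGrad update, whose step is divided by the square root of the accumulated squared gradients (KATE's actual square-root-free rule is given in the paper; this sketch only shows the standard update it departs from):

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update: accumulate squared gradients and divide the
    step by the square root of the accumulator (per coordinate)."""
    accum = accum + grad ** 2
    x = x - lr * grad / (np.sqrt(accum) + eps)
    return x, accum

# Minimize f(x) = (x - 3)^2 from x = 0.
x, accum = 0.0, 0.0
for _ in range(500):
    grad = 2.0 * (x - 3.0)  # gradient of (x - 3)^2
    x, accum = adagrad_step(x, grad, accum, lr=0.5)
```

Because the effective step size shrinks with the raw gradient scale, AdaGrad is not scale-invariant; removing the square root from the denominator is what the paper argues restores scale invariance for generalized linear models.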


FinReport: Explainable Stock Earnings Forecasting via News Factor Analyzing Model

http://arxiv.org/abs/2403.02647v1

Compressor summary: The paper presents FinReport, a system that helps ordinary investors collect, analyze, and generate reports on stock earnings using financial news and a multi-factor model.


HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

http://arxiv.org/abs/2403.02640v1

Compressor summary: Key points:
- V2X is a popular topic in autonomous driving
- VIC improves roadside perception with multi-sensor holographic systems
- HoloVIC is a large-scale dataset with different sensors and intersections
- HoloVIC has four tasks and benchmarks for research purposes
Summary: The paper introduces HoloVIC, a large-scale dataset for VIC research in autonomous driving, using multi-sensor holographic systems at various intersections.


False Positive Sampling-based Data Augmentation for Enhanced 3D Object Detection Accuracy

http://arxiv.org/abs/2403.02639v1

Compressor summary: The paper introduces false-positive sampling, a technique that improves 3D object detection models by retraining them using point clouds predicted as false positives, and shows its effectiveness on KITTI and Waymo Open datasets.


BSDP: Brain-inspired Streaming Dual-level Perturbations for Online Open World Object Detection

http://arxiv.org/abs/2403.02637v1

Compressor summary: The paper proposes a method called Brain-inspired Streaming Dual-level Perturbations (BSDP) that uses old samples as perturbations to help deep learning models learn new categories without forgetting the old ones, inspired by how humans learn.


Interactive Continual Learning: Fast and Slow Thinking

http://arxiv.org/abs/2403.02628v1

Compressor summary: The paper proposes a novel Interactive Continual Learning framework that leverages collaborative interactions among different-sized language models, enabling them to better emulate advanced life forms' continual learning abilities.


Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

http://arxiv.org/abs/2403.02626v1

Compressor summary: The proposed framework replaces manual image labeling with natural language interactions to quickly and effectively train classifiers for subjective or nuanced concepts, reducing the need for crowd-sourced annotations and outperforming existing methods.


Pareto-Optimal Estimation and Policy Learning on Short-term and Long-term Treatment Effects

http://arxiv.org/abs/2403.02624v1

Compressor summary: The paper presents a Pareto-Efficient algorithm for optimal treatment selection by balancing short-term and long-term effects, using Pareto-Optimal Estimation and Policy Learning methods.


World Models for Autonomous Driving: An Initial Survey

http://arxiv.org/abs/2403.02622v1

Compressor summary: The paper reviews world models, a promising approach for autonomous driving systems to predict future scenarios and compensate for information gaps, and discusses their theoretical foundations, practical applications, and ongoing research challenges.


Unsupervised Spatio-Temporal State Estimation for Fine-grained Adaptive Anomaly Diagnosis of Industrial Cyber-physical Systems

http://arxiv.org/abs/2403.02616v1

Compressor summary: MAD-Transformer is a method to detect and diagnose abnormal behaviors in industrial cyber-physical systems using fine-grained adaptive anomaly diagnosis, capturing temporal, spatial, and series dependencies among multivariate time series.


Exploring the Limitations of Large Language Models in Compositional Relation Reasoning

http://arxiv.org/abs/2403.02615v1

Compressor summary: The MCR benchmark tests LLMs' ability to reason about different types of composition relations in English and five other languages, evaluating their robustness and adaptability.


A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning

http://arxiv.org/abs/2403.02611v1

Compressor summary: The paper proposes a unified framework using multi-pyramid transformer and extended frequency contrastive regularization to address defocus blur in microscope imaging, achieving state-of-the-art performance.


ChatGPT4PCG 2 Competition: Prompt Engineering for Science Birds Level Generation

http://arxiv.org/abs/2403.02610v1

Compressor summary: The paper describes the second ChatGPT4PCG competition at IEEE Conference on Games, which improves on the first edition by introducing a new evaluation metric, allowing Python submissions, and making other changes to foster prompt engineering for procedural content generation.


DNNLasso: Scalable Graph Learning for Matrix-Variate Data

http://arxiv.org/abs/2403.02608v1

Compressor summary: The paper introduces DNNLasso, a faster and more accurate method for estimating a sparse Kronecker-sum structure of precision matrices in matrix-variate Gaussian graphical models.


TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts

http://arxiv.org/abs/2403.02600v1

Compressor summary: The paper proposes TESTAM, a deep learning model that separately captures recurring and non-recurring traffic patterns using a mixture-of-experts approach with three experts for temporal, spatio-temporal, and dynamic dependency modeling.


Pooling Image Datasets With Multiple Covariate Shift and Imbalance

http://arxiv.org/abs/2403.02598v1

Compressor summary: The paper proposes a Category theory-based approach to handle shifts/imbalances in covariates for overparameterized models, simplifying data analysis and unifying different problem settings.


Improving Event Definition Following For Zero-Shot Event Detection

http://arxiv.org/abs/2403.02586v1

Compressor summary: The authors propose a new approach to improve zero-shot event detection by creating a diverse dataset of event definitions and fine-tuning a LLaMA-2-7B model, achieving better results than GPT-3.5 on open benchmarks.


VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing

http://arxiv.org/abs/2403.02581v1

Compressor summary: VEglue is a new testing method for visual entailment systems that uses object alignment and joint erasing to detect issues and improve model performance.


What do we learn from inverting CLIP models?

http://arxiv.org/abs/2403.02580v1

Compressor summary: Inverting CLIP models generates images related to the given target prompts, which can reveal insights into their abilities and biases, but may also produce NSFW content.


Learning-augmented Online Minimization of Age of Information and Transmission Costs

http://arxiv.org/abs/2403.02573v1

Compressor summary: The paper proposes a learning-augmented online algorithm for resource-constrained wireless communication that balances transmission and staleness costs while ensuring worst-case performance guarantees and good average performance.


DPAdapter: Improving Differentially Private Deep Learning through Noise Tolerance Pre-training

http://arxiv.org/abs/2403.02571v1

Compressor summary: DPAdapter is a novel technique that boosts the performance of differentially private machine learning models by improving parameter robustness.


Eliciting Better Multilingual Structured Reasoning from LLMs through Code

http://arxiv.org/abs/2403.02567v1

Compressor summary: xSTREAM is a multilingual dataset for structured reasoning and explanation tasks that reveals LLM performance gaps between English and other languages, and proposes methods to improve reasoning by incorporating code using machine translation and step-by-step prompts.


Systemic Biases in Sign Language AI Research: A Deaf-Led Call to Reevaluate Research Agendas

http://arxiv.org/abs/2403.02563v1

Compressor summary: The paper reviews 101 recent papers on sign language AI, revealing significant biases and lack of input from Deaf stakeholders, and calls for more ethical development and inclusion of Deaf perspectives in the field.


Semantic Human Mesh Reconstruction with Textures

http://arxiv.org/abs/2403.02561v1

Compressor summary: SHERT is a novel pipeline for reconstructing semantic human meshes with textures and high-precision details, overcoming challenges faced by current methods in industrial applications.


Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research

http://arxiv.org/abs/2403.02558v1

Compressor summary: The text discusses the need for updated guidelines in using and evaluating generative models in clinical AI research, building on the MI-CLAIM checklist.