arxiv compressed, 2024-02-27

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-27 generated by the compressor, my personal LLM-based project.


Pre-training Cross-lingual Open Domain Question Answering with Large-scale Synthetic Supervision

http://arxiv.org/abs/2402.16508v1

Compressor summary: The paper presents a single encoder-decoder model for cross-lingual question answering (CLQA) that uses self-supervision from Wikipedia's link structure to perform retrieval and answer generation without auxiliary resources like machine translation.


Stochastic Conditional Diffusion Models for Semantic Image Synthesis

http://arxiv.org/abs/2402.16506v1

Compressor summary: The paper proposes SCDM, a robust conditional diffusion model for semantic image synthesis with noisy labels, which enhances robustness by stochastically perturbing the semantic label maps through Label Diffusion and using a class-wise noise schedule.


Memory GAPS: Would LLM pass the Tulving Test?

http://arxiv.org/abs/2402.16505v1

Compressor summary: The paper applies the Tulving Test, which measures memory performance to probe a model of human recall, and explores whether that model also describes LLMs' remembering abilities.


LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments

http://arxiv.org/abs/2402.16499v1

Compressor summary: LLMArena is a framework for testing large language models' abilities in multi-agent dynamics using seven gaming environments and Trueskill scoring.
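
As background on the TrueSkill scoring mentioned above, here is a minimal sketch of maintaining ratings over pairwise agent match-ups with the open-source `trueskill` package; the agent names and match outcomes are made up for illustration, not taken from the paper.

```python
# Minimal TrueSkill-style scoring over head-to-head agent games,
# using the open-source `trueskill` package (pip install trueskill).
# Agents and outcomes below are hypothetical, not from the paper.
from trueskill import Rating, rate_1vs1

ratings = {name: Rating() for name in ["agent-a", "agent-b", "agent-c"]}

# Each tuple records (winner, loser) of one game.
matches = [("agent-a", "agent-b"),
           ("agent-c", "agent-a"),
           ("agent-a", "agent-b")]

for winner, loser in matches:
    ratings[winner], ratings[loser] = rate_1vs1(ratings[winner], ratings[loser])

# Rank agents by the conservative skill estimate mu - 3*sigma.
for name, r in sorted(ratings.items(),
                      key=lambda kv: kv[1].mu - 3 * kv[1].sigma, reverse=True):
    print(f"{name}: mu={r.mu:.2f}, sigma={r.sigma:.2f}")
```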


Intelligent Known and Novel Aircraft Recognition -- A Shift from Classification to Similarity Learning for Combat Identification

http://arxiv.org/abs/2402.16486v1

Compressor summary: The text describes a novel AI method that can accurately identify both known and unknown military and civilian aircraft from low-resolution images, overcoming the limitations of traditional methods.


On Languaging a Simulation Engine

http://arxiv.org/abs/2402.16482v1

Compressor summary: The Lang2Sim framework uses functionalized language models to transform textual descriptions of material simulations into executable code, enabling interactive navigation and efficient programming.


Edge Detectors Can Make Deep Convolutional Neural Networks More Robust

http://arxiv.org/abs/2402.16479v1

Compressor summary: This paper proposes a binary edge feature branch for deep convolutional neural networks to improve their robustness against adversarial examples by incorporating shape-like features with texture features.


DCVSMNet: Double Cost Volume Stereo Matching Network

http://arxiv.org/abs/2402.16473v1

Compressor summary: DCVSMNet is a fast stereo matching network that uses two small cost volumes to produce accurate results with some trade-offs in inference time.


mEdIT: Multilingual Text Editing via Instruction Tuning

http://arxiv.org/abs/2402.16472v1

Compressor summary: mEdIT is a multi-lingual text editing model that takes user instructions and uses pre-trained language models to perform tasks like error correction, simplification, and paraphrasing across diverse languages.


Unveiling Vulnerability of Self-Attention

http://arxiv.org/abs/2402.16470v1

Compressor summary: The paper proposes HackAttend, a perturbation technique that attacks PLMs by altering attention scores in self-attention mechanisms, and introduces S-Attend, a smoothing technique that makes self-attention robust to such attacks.


Learning to Schedule Online Tasks with Bandit Feedback

http://arxiv.org/abs/2402.16463v1

Compressor summary: DOL-RM is an online task scheduling algorithm that optimizes performance by estimating rewards, costs, and arrival distributions under uncertainty.


Defending LLMs against Jailbreaking Attacks via Backtranslation

http://arxiv.org/abs/2402.16459v1

Compressor summary: The paper proposes a backtranslation method to reveal the hidden intent of original prompts and defend language models against jailbreaking attacks.
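
A rough sketch of the backtranslation idea as described in the summary: infer a prompt back from the model's own response, then check whether the model refuses that inferred prompt. `generate` and `is_refusal` are hypothetical stand-ins, not the paper's actual implementation.

```python
# Sketch of a backtranslation-based jailbreak defense. The helper
# functions `generate` and `is_refusal` are hypothetical placeholders.

def defend_with_backtranslation(generate, is_refusal, user_prompt: str) -> str:
    response = generate(user_prompt)
    if is_refusal(response):
        return response  # the model already declined

    # "Backtranslate": ask the model what request this answer responds to,
    # which tends to surface the hidden intent of a jailbroken prompt.
    inferred_prompt = generate(
        "Guess the user request that this answer responds to:\n" + response
    )

    # If the model refuses the inferred (cleaned-up) prompt, treat the
    # original prompt as a jailbreak attempt and refuse as well.
    if is_refusal(generate(inferred_prompt)):
        return "I can't help with that."
    return response
```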


ID-XCB: Data-independent Debiasing for Fair and Accurate Transformer-based Cyberbullying Detection

http://arxiv.org/abs/2402.16458v1

Compressor summary: ID-XCB is a new debiasing technique for cyberbullying detection that mitigates bias from swear words without harming performance, and it generalizes well across datasets.


RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering

http://arxiv.org/abs/2402.16457v1

Compressor summary: This paper introduces RetrievalQA, a benchmark to evaluate adaptive retrieval-augmented generation methods, and proposes Time-Aware Adaptive Retrieval (TA-ARE), which improves the efficiency and relevance of sourced information.


ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

http://arxiv.org/abs/2402.16444v1

Compressor summary: ShieldLM is a safety detector for Large Language Models that aligns with human safety standards, supports customization, and provides explanations.


On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions

http://arxiv.org/abs/2402.16442v1

Compressor summary: The paper proposes a distributed bounding algorithm for subset selection problems with provable approximation guarantees and shows its effectiveness on large-scale datasets.


Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

http://arxiv.org/abs/2402.16438v1

Compressor summary: The paper proposes a method to identify language-specific regions in Transformer architectures of large language models (LLMs) and demonstrates how to control their output language by activating or deactivating these regions.


Training Implicit Generative Models via an Invariant Statistical Loss

http://arxiv.org/abs/2402.16435v1

Compressor summary: The authors propose a discriminator-free method for training one-dimensional and multivariate implicit generative models that learns complex data distributions and avoids mode-dropping issues.


RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions

http://arxiv.org/abs/2402.16431v1

Compressor summary: The paper proposes using code-style instructions to improve LLMs' robustness to adversarial samples and introduces a novel method for few-shot learning with both clean and adversarial contexts.


COMAE: COMprehensive Attribute Exploration for Zero-shot Hashing

http://arxiv.org/abs/2402.16424v1

Compressor summary: COMAE is a method for improving zero-shot hashing by exploring locality relationships and utilizing continuous-value attributes, achieving better performance on large-scale retrieval scenarios.


Outline-Guided Object Inpainting with Diffusion Models

http://arxiv.org/abs/2402.16421v1

Compressor summary: The paper presents a diffusion-based inpainting technique that creates diverse instance segmentation data by filling masked areas with desired object classes under object outline guidance, preserving mask annotations and shape characteristics while introducing diversity within the augmented area; the method can also be combined with text guidance and other image augmentation techniques.


Predicting Sustainable Development Goals Using Course Descriptions -- from LLMs to Conventional Foundation Models

http://arxiv.org/abs/2402.16420v1

Compressor summary: The paper proposes a method to generate training data using PaLM 2 and train smaller models to predict SDGs for university courses, achieving an F1-score of 0.786.


TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis

http://arxiv.org/abs/2402.16412v1

Compressor summary: The paper introduces TOTEM, a method to represent time series data with discrete vectors, enabling generalist training across tasks and domains without tuning.
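
To illustrate the general idea of discrete time-series tokens (not TOTEM's exact pipeline), here is a generic vector-quantization sketch that maps patches of a series to nearest-codebook indices; the codebook here is random where TOTEM's would be learned.

```python
# Generic vector-quantization tokenization of a time series: split into
# fixed-length patches, map each patch to its nearest codebook vector.
# Patch length and codebook are illustrative only.
import numpy as np

def tokenize(series: np.ndarray, codebook: np.ndarray, patch: int) -> np.ndarray:
    """Map non-overlapping patches of `series` to nearest-codebook indices."""
    n = len(series) // patch
    patches = series[: n * patch].reshape(n, patch)
    # Squared Euclidean distances: (n_patches, codebook_size)
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 16))  # 256 codes of patch length 16 (would be learned)
signal = np.sin(np.linspace(0, 20, 480)) + 0.1 * rng.normal(size=480)
tokens = tokenize(signal, codebook, patch=16)
print(tokens)  # discrete token ids usable by a downstream sequence model
```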


CMC: Few-shot Novel View Synthesis via Cross-view Multiplane Consistency

http://arxiv.org/abs/2402.16407v1

Compressor summary: The paper proposes a simple depth-aware consistency method for NeRF that uses layered representations and constrained rendering to improve novel view synthesis when few input views are available.


From RAGs to riches: Using large language models to write documents for clinical trials

http://arxiv.org/abs/2402.16406v1

Compressor summary: The text evaluates the use of large language models (LLMs) to generate clinical trial protocols and shows that combining them with retrieval-augmented generation can improve their quality significantly.


Graph Learning with Distributional Edge Layouts

http://arxiv.org/abs/2402.16402v1

Compressor summary: The paper introduces Distributional Edge Layouts (DELs), a pre-processing method for Graph Neural Networks (GNNs) that samples edge layouts using Langevin dynamics and Boltzmann distribution, improving GNN performance on various tasks.


Analysis of Embeddings Learned by End-to-End Machine Learning Eye Movement-driven Biometrics Pipeline

http://arxiv.org/abs/2402.16399v1

Compressor summary: The paper investigates how eye movement biometrics using machine learning are affected by input data variations in terms of temporal persistence, reliability, and efficacy.


Placing Objects in Context via Inpainting for Out-of-distribution Segmentation

http://arxiv.org/abs/2402.16392v1

Compressor summary: The paper introduces Placing Objects in Context (POC), a pipeline that can realistically add any object into any image, and shows how it can improve anomaly segmentation and learning new classes for semantic segmentation models.


MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property

http://arxiv.org/abs/2402.16389v1

Compressor summary: The paper introduces MoZIP, a new benchmark for evaluating large language models in the intellectual property domain, and presents MoZi, a multilingual IP-oriented model that outperforms other LLMs on the benchmark.


On the Generalization Capability of Temporal Graph Learning Algorithms: Theoretical Insights and a Simpler Method

http://arxiv.org/abs/2402.16387v1

Compressor summary: This paper explores the generalization ability of different Temporal Graph Learning (TGL) algorithms and proposes the Simplified-Temporal-Graph-Network, which achieves improved performance with lower complexity.


Self Supervised Correlation-based Permutations for Multi-View Clustering

http://arxiv.org/abs/2402.16383v1

Compressor summary: The paper proposes a novel end-to-end deep learning framework for multi-view clustering that learns fused data representations and cluster assignments simultaneously.


Immunization against harmful fine-tuning attacks

http://arxiv.org/abs/2402.16382v1

Compressor summary: The paper proposes "Immunization conditions" as a framework for defending against harmful fine-tuning attacks on large language models (LLMs) by bad actors.


Improving LLM-based Machine Translation with Systematic Self-Correction

http://arxiv.org/abs/2402.16379v1

Compressor summary: The TER framework improves LLM-based machine translation through systematic self-correction, using estimation feedback to refine translations across different languages.


Graph Learning under Distribution Shifts: A Comprehensive Survey on Domain Adaptation, Out-of-distribution, and Continual Learning

http://arxiv.org/abs/2402.16374v1

Compressor summary: The text discusses graph learning methods that handle distribution shifts in real-world data, categorizes them into different scenarios, and provides a survey of existing approaches and future directions.


DEYO: DETR with YOLO for End-to-End Object Detection

http://arxiv.org/abs/2402.16370v1

Compressor summary: DEYO uses a step-by-step training strategy that initializes the backbone with a classic detector (YOLO) and then trains the DETR decoder from scratch, achieving real-time performance and accuracy without extra training data.


Generative AI in Vision: A Survey on Models, Metrics and Applications

http://arxiv.org/abs/2402.16369v1

Compressor summary: This paper surveys generative AI diffusion models, their techniques, applications, and challenges across various domains, highlighting their potential for creative tasks and data augmentation.


Unraveling Babel: Exploring Multilingual Activation Patterns within Large Language Models

http://arxiv.org/abs/2402.16367v1

Compressor summary: The study analyzes how large language models process multiple languages using a Mixture of Experts architecture, discovering both non-language-specific and language-specific neurons that can be used to improve performance and guide model training.


SPC-NeRF: Spatial Predictive Compression for Voxel Based Radiance Field

http://arxiv.org/abs/2402.16366v1

Compressor summary: The paper introduces SPC-NeRF, a new compression technique for Neural Radiance Fields using spatial predictive coding, which achieves better efficiency and quality than existing methods.


Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions

http://arxiv.org/abs/2402.16364v1

Compressor summary: The paper introduces the Rendezvous (RVS) task and dataset for studying geospatial instructions with map knowledge that involve more complex spatial relations than previous navigation benchmarks.


LLM Inference Unveiled: Survey and Roofline Model Insights

http://arxiv.org/abs/2402.16363v1

Compressor summary: This paper surveys the current state of research on efficient Large Language Model inference, introducing a roofline model-based framework to analyze and compare various techniques, and provides valuable insights for practical implementation.


Layer-wise Regularized Dropout for Neural Language Models

http://arxiv.org/abs/2402.16361v1

Compressor summary: LR-Drop is a layer-wise regularization technique for Transformer-based language models that uses consistency training between dropout-sampled sub-models, leading to improved performance on various natural language understanding and generation tasks.
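
For flavor, a sketch of dropout consistency regularization in the style this family of methods uses: two stochastic forward passes of the same batch plus a symmetric KL term between their output distributions. This shows only an output-level term, not LR-Drop's exact layer-wise formulation.

```python
# Output-level dropout consistency loss (R-Drop style): two forward
# passes with dropout active behave as two sub-models, and a symmetric
# KL term pulls their predictive distributions together.
import torch
import torch.nn.functional as F

def consistency_loss(model, x, y, alpha: float = 1.0):
    logits1 = model(x)  # dropout active -> a different sub-model each pass
    logits2 = model(x)
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
                + F.kl_div(p2, p1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl
```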


Feedback Efficient Online Fine-Tuning of Diffusion Models

http://arxiv.org/abs/2402.16359v1

Compressor summary: The paper proposes a new RL method to efficiently explore and optimize complex data distributions using diffusion models.


An Integrated Data Processing Framework for Pretraining Foundation Models

http://arxiv.org/abs/2402.16358v1

Compressor summary: The paper introduces a data processing framework to improve foundation models' performance by refining their pretraining data quality using operators at different levels, probing, and evaluation tools.


What Text Design Characterizes Book Genres?

http://arxiv.org/abs/2402.16356v1

Compressor summary: The study examines how text design on book covers affects our understanding of book genres using semantic information and visual design.


Language-guided Skill Learning with Temporal Variational Inference

http://arxiv.org/abs/2402.16354v1

Compressor summary: The algorithm uses LLMs to segment trajectories and then merges them to discover reusable skills for agents, improving their performance on new tasks.


MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs

http://arxiv.org/abs/2402.16352v1

Compressor summary: MathGenie is a new method to create diverse and reliable math problems from seed data, using augmentation, back-translation, and rationale-based verification, achieving state-of-the-art performance in mathematical reasoning.


Impression-CLIP: Contrastive Shape-Impression Embedding for Fonts

http://arxiv.org/abs/2402.16350v1

Compressor summary: Impression-CLIP is a machine-learning model that uses CLIP to co-embed font images and their impressions for cross-modal retrieval, achieving better accuracy than existing methods and being robust to noise and missing tags.


C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory

http://arxiv.org/abs/2402.16349v1

Compressor summary: Controlled-GAIL (C-GAIL) is a new algorithm that uses control theory to improve the stability and efficiency of generative adversarial imitation learning.


CodeS: Towards Building Open-source Language Models for Text-to-SQL

http://arxiv.org/abs/2402.16347v1

Compressor summary: The paper introduces CodeS, an open-source series of pre-trained language models designed for the text-to-SQL task, which outperforms existing closed-source models in accuracy and robustness while having smaller parameter sizes.


Boosting Graph Pooling with Persistent Homology

http://arxiv.org/abs/2402.16346v1

Compressor summary: The paper proposes a novel method to improve graph neural networks by injecting global topological invariance into pooling layers using persistent homology.


Contingency Planning Using Bi-level Markov Decision Processes for Space Missions

http://arxiv.org/abs/2402.16342v1

Compressor summary: The paper proposes a bi-level MDP framework for efficient and flexible rover mission contingency planning, addressing computational challenges in stochastic scenarios.


BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM

http://arxiv.org/abs/2402.16338v1

Compressor summary: BLO-SAM is a model that improves semantic segmentation by automatically identifying objects without manual prompts and reducing overfitting risk through bi-level optimization.


Achieving $\tilde{O}(1/ε)$ Sample Complexity for Constrained Markov Decision Process

http://arxiv.org/abs/2402.16324v1

Compressor summary: The paper proposes a new framework for analyzing constrained Markov decision processes (CMDPs), deriving optimal regret bounds and improving sample complexity by operating in the primal space and resolving the primal LP online with adaptive resource capacities.


Data-free Weight Compress and Denoise for Large Language Models

http://arxiv.org/abs/2402.16319v1

Compressor summary: The paper proposes a novel data-free method for compressing large language models' parameters by using rank-k approximation, achieving significant parameter reduction while preserving performance.
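
A minimal sketch of rank-k weight approximation via truncated SVD, which illustrates the low-rank idea; the paper's joint compress-and-denoise procedure is more involved.

```python
# Rank-k approximation of a weight matrix via truncated SVD: store the
# two thin factors A and B instead of the full matrix. Matrix size and
# rank below are illustrative only.
import numpy as np

def rank_k_approx(W: np.ndarray, k: int):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]  # (out_dim, k), singular values folded in
    B = Vt[:k]            # (k, in_dim)
    return A, B           # W ~= A @ B, storing k*(out_dim+in_dim) numbers

W = np.random.randn(1024, 1024)
A, B = rank_k_approx(W, k=128)
kept = (A.size + B.size) / W.size
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"params kept: {kept:.1%}, relative error: {err:.3f}")
```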


Gradient-Guided Modality Decoupling for Missing-Modality Robustness

http://arxiv.org/abs/2402.16318v1

Compressor summary: The paper proposes a method to improve multimodal learning with incomplete data by reducing modality dominance and decoupling modalities, achieving better performance on three benchmarks.


Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models

http://arxiv.org/abs/2402.16315v1

Compressor summary: Instruction-tuned large vision-language models struggle with fine-grained visual categorization and explanation due to a modality gap, and the proposed Finer benchmark aims to better evaluate these abilities.


Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering

http://arxiv.org/abs/2402.16313v1

Compressor summary: The Chain-of-Discussion framework uses multiple open-source LLMs to improve the correctness and comprehensiveness of answers for open-ended question answering tasks.


Cross-domain Chinese Sentence Pattern Parsing

http://arxiv.org/abs/2402.16311v1

Compressor summary: The paper introduces a new SPS parsing method that uses large language models and self-training to adapt to different domains, improving performance over rule-based approaches.


REPLAY: Modeling Time-Varying Temporal Regularities of Human Mobility for Location Prediction over Sparse Trajectories

http://arxiv.org/abs/2402.16310v1

Compressor summary: REPLAY is a new RNN architecture for location prediction that incorporates timestamp embeddings with adaptive bandwidths to capture time-varying temporal regularities in human mobility data.


Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion

http://arxiv.org/abs/2402.16305v1

Compressor summary: The paper proposes a training-free method to improve text-image alignment in Diffusion Probabilistic Models by using discriminative Vision-Language Models and Score Distillation Sampling, achieving near state-of-the-art performance on T2I-Compbench.


Graph Diffusion Policy Optimization

http://arxiv.org/abs/2402.16302v1

Compressor summary: Graph Diffusion Policy Optimization (GDPO) is a new method that uses reinforcement learning to optimize graph diffusion models for arbitrary objectives, achieving state-of-the-art performance in various graph generation tasks.


Conformalized Selective Regression

http://arxiv.org/abs/2402.16300v1

Compressor summary: The paper proposes conformalized selective regression, a selective regression method that uses conformal prediction to quantify uncertainty while accounting for model-specific biases.
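
As a generic illustration of the conformal machinery involved (not the paper's exact bias-aware construction), here is a normalized split-conformal sketch where the model abstains whenever the calibrated interval exceeds a width budget; `predict` and `sigma` are hypothetical callables.

```python
# Normalized split-conformal selective regression: calibrate a quantile
# of difficulty-normalized residuals, then abstain on points whose
# resulting interval is wider than a budget. `predict` returns point
# predictions; `sigma` returns a per-point difficulty estimate.
import numpy as np

def calibrate(predict, sigma, X_cal, y_cal, alpha=0.1):
    # Normalized nonconformity score: |residual| / difficulty estimate.
    scores = np.abs(y_cal - predict(X_cal)) / sigma(X_cal)
    n = len(scores)
    # Finite-sample-corrected (1 - alpha) quantile.
    return np.quantile(scores, min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0))

def predict_or_abstain(predict, sigma, X, q, max_width):
    preds = predict(X)
    widths = 2 * q * sigma(X)      # per-point interval widths
    accept = widths <= max_width   # abstain on uncertain points
    return preds, accept
```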


MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer

http://arxiv.org/abs/2402.16298v1

Compressor summary: The paper proposes a new transformer-based multi-view network for breast cancer classification that leverages inter-view correlations using a novel dynamic attention block.


Poisson-Gamma Dynamical Systems with Non-Stationary Transition Dynamics

http://arxiv.org/abs/2402.16297v1

Compressor summary: The paper proposes a new Bayesian model for count time series that can adapt to changing dynamics over time, and shows that it performs better than existing models in predicting future values.


mAPm: multi-scale Attention Pyramid module for Enhanced scale-variation in RLD detection

http://arxiv.org/abs/2402.16291v1

Compressor summary: The proposed multi-scale Attention Pyramid module (mAPm) enhances object detection by integrating dilated convolutions, global self-attention, and refined up-sampling into the Feature Pyramid Network (FPN), achieving significant improvements in scale-variant tasks like Rice Leaf Disease detection.


PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering

http://arxiv.org/abs/2402.16288v1

Compressor summary: PerLTQA is a dataset for question answering that incorporates personalized memories, focusing on social interactions and events, to enhance dialogues using a novel framework for memory classification, retrieval, and synthesis.


Few-Shot Learning for Annotation-Efficient Nucleus Instance Segmentation

http://arxiv.org/abs/2402.16280v1

Compressor summary: The paper proposes a meta-learning based framework for nucleus instance segmentation using few-shot learning and structural guidance, which achieves high performance with minimal annotations.


A Self-matching Training Method with Annotation Embedding Models for Ontology Subsumption Prediction

http://arxiv.org/abs/2402.16278v1

Compressor summary: The paper proposes a self-matching training method to improve concept subsumption prediction in ontologies using InME and CoME embeddings, which capture global and local information in annotation axioms.


From Large Language Models and Optimization to Decision Optimization CoPilot: A Research Manifesto

http://arxiv.org/abs/2402.16269v1

Compressor summary: The paper proposes DOCP, an AI tool that uses LLMs to help users create and solve optimization models for business problems in natural language.


Foundation Model Transparency Reports

http://arxiv.org/abs/2402.16268v1

Compressor summary: The authors propose Foundation Model Transparency Reports based on 6 design principles and 100 indicators to ensure responsible development and deployment of AI models.


Infrared and visible Image Fusion with Language-driven Loss in CLIP Embedding Space

http://arxiv.org/abs/2402.16267v1

Compressor summary: The paper proposes using natural language to express the objective of infrared-visible image fusion, improving performance by encoding texts into a multi-modal embedding space and constructing a language-driven fusion model with a supervised loss function.


UniRetriever: Multi-task Candidates Selection for Various Context-Adaptive Conversational Retrieval

http://arxiv.org/abs/2402.16261v1

Compressor summary: The text proposes a multi-task framework with a dual-encoder architecture to act as a universal retriever for persona, knowledge, and response selection in conversational retrieval systems, improving efficiency and performance.


SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking

http://arxiv.org/abs/2402.16249v1

Compressor summary: SeqTrack3D is a novel Sequence-to-Sequence tracker that combines point clouds and bounding boxes to improve 3D single object tracking performance, especially in scenes with sparse points.


Topic-to-essay generation with knowledge-based content selection

http://arxiv.org/abs/2402.16248v1

Compressor summary: The paper proposes a novel copy mechanism model for generating diverse and coherent paragraphs from given topics, and introduces an improved prefix tuning method and a new Chinese dataset for this task.


Learning Translations: Emergent Communication Pretraining for Cooperative Language Acquisition

http://arxiv.org/abs/2402.16247v1

Compressor summary: The text introduces a new AI challenge called CLAP, where a 'joiner' agent learns communication strategies by imitating or translating existing interactions in a target community.


Real-Time Vehicle Detection and Urban Traffic Behavior Analysis Based on UAV Traffic Videos on Mobile Devices

http://arxiv.org/abs/2402.16246v1

Compressor summary: The paper presents a system that uses drones and deep learning to collect and analyze urban traffic data in real-time on mobile devices.


HSONet:A Siamese foreground association-driven hard case sample optimization network for high-resolution remote sensing image change detection

http://arxiv.org/abs/2402.16242v1

Compressor summary: The text proposes a new method (HSONet) to improve change detection models by addressing imbalance and missingness issues in learning hard cases using equilibrium optimization and scene context.


Active Level Set Estimation for Continuous Search Space with Theoretical Guarantee

http://arxiv.org/abs/2402.16237v1

Compressor summary: The paper presents an algorithm that finds level sets in continuous search spaces without discretization, using a confidence-based acquisition function, with theoretical convergence guarantees and superior performance on various datasets.
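
For context, a sketch of the standard confidence-bound rule for level set classification with a Gaussian process surrogate (not the paper's exact algorithm): points whose lower bound clears the threshold are confidently above it, points whose upper bound falls short are confidently below, and the rest remain candidates for the next query.

```python
# Confidence-based level set classification with a GP surrogate, a
# generic LSE-style rule rather than the paper's specific method.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def classify_level_set(X_obs, y_obs, X_query, threshold, beta=2.0):
    gp = GaussianProcessRegressor().fit(X_obs, y_obs)
    mu, std = gp.predict(X_query, return_std=True)
    above = mu - beta * std > threshold   # confidently in the superlevel set
    below = mu + beta * std < threshold   # confidently in the sublevel set
    undecided = ~(above | below)          # where to sample next
    return above, below, undecided
```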


GARNN: An Interpretable Graph Attentive Recurrent Neural Network for Predicting Blood Glucose Levels via Multivariate Time Series

http://arxiv.org/abs/2402.16230v1

Compressor summary: GARNNs are interpretable graph attentive recurrent neural networks that predict future blood glucose levels in diabetics using sensor and self-reported data, while explaining variable importance and generating feature maps, outperforming existing methods in accuracy and interpretability.