arxiv compressed, 2024-02-09

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-09 generated by the compressor, my personal LLM-based project.


InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

http://arxiv.org/abs/2402.05937v1

Compressor summary:
Key points:
- Paper introduces InstaGen, a diffusion model enhanced with an instance-level grounding head.
- InstaGen can generate synthetic images with arbitrary instances for object detection.
- InstaGen improves object detector performance in open-vocabulary and data-sparse scenarios.

Summary: The paper presents InstaGen, a diffusion model that can generate realistic images with any objects and improve object detection by aligning text and visual features.


SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

http://arxiv.org/abs/2402.05935v1

Compressor summary: SPHINX-X is a large multimodal language model that improves on SPHINX by modifying its architecture, using diverse datasets, and training on different base models.


Classifying Nodes in Graphs without GNNs

http://arxiv.org/abs/2402.05934v1

Compressor summary: The authors propose a node classification method that does not use graph neural networks (GNNs) at any stage of training or testing, achieving state-of-the-art accuracy on popular benchmarks using smoothness constraints, pseudo-labeling and neighborhood histograms.


Time Series Diffusion in the Frequency Domain

http://arxiv.org/abs/2402.05933v1

Compressor summary: The paper explores how representing time series in the frequency domain can improve score-based diffusion models for generative modelling and shows empirical evidence that frequency diffusion models perform better than time diffusion models.
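To make the frequency-domain idea concrete, here is a minimal sketch (not the paper's implementation) that maps a time series to real-valued frequency coefficients with NumPy's `rfft` and applies one DDPM-style forward-noising step there; the schedule value `alpha_bar_t` is a placeholder for whatever noise schedule the score model actually uses.

```python
import numpy as np

def to_frequency(x):
    """Real FFT: map a time-domain series (T,) to frequency coefficients."""
    c = np.fft.rfft(x)
    # Stack real and imaginary parts so a standard real-valued
    # diffusion model can operate on them.
    return np.concatenate([c.real, c.imag])

def from_frequency(z, length):
    """Inverse of to_frequency: rebuild the time-domain series."""
    half = len(z) // 2
    c = z[:half] + 1j * z[half:]
    return np.fft.irfft(c, n=length)

def diffuse(z0, alpha_bar_t, rng=np.random.default_rng()):
    """Forward diffusion step q(z_t | z_0) on the frequency representation."""
    noise = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * noise

x = np.sin(np.linspace(0, 8 * np.pi, 128))   # toy time series
z = to_frequency(x)
z_noisy = diffuse(z, alpha_bar_t=0.5)
x_rec = from_frequency(z, length=len(x))     # round-trip sanity check
assert np.allclose(x, x_rec, atol=1e-8)
```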


An Interactive Agent Foundation Model

http://arxiv.org/abs/2402.05929v1

Compressor summary: The Interactive Agent Foundation Model trains AI agents using diverse pre-training strategies to perform well in various applications like robotics, gaming, and healthcare.


WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

http://arxiv.org/abs/2402.05930v1

Compressor summary: The authors introduce WEBLINX, a benchmark for conversational web navigation tasks, and evaluate different models on it, finding that finetuned multimodal models perform best but still struggle with generalization to new websites.


Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss

http://arxiv.org/abs/2402.05928v1

Compressor summary: This paper studies learning with dependent data using a special class of functions and shows how to minimize the risk without depending on the mixing time of the data.


Collaborative Control for Geometry-Conditioned PBR Image Generation

http://arxiv.org/abs/2402.05919v1

Compressor summary: The paper proposes a method to generate physically-based rendering (PBR) images directly without relying on RGB images, using a novel cross-network communication paradigm.


Point-VOS: Pointing Up Video Object Segmentation

http://arxiv.org/abs/2402.05917v1

Compressor summary: The paper introduces Point-VOS, a video object segmentation method that reduces annotation effort by using sparse point-wise annotations instead of dense masks.


GenEFT: Understanding Statics and Dynamics of Model Generalization via Effective Theory

http://arxiv.org/abs/2402.05916v1

Compressor summary: GenEFT is a framework that uses physics-inspired concepts to study neural network generalization, showing how data size, decoder strength, and learning rates affect the balance between generalization and overfitting.


Efficient Stagewise Pretraining via Progressive Subnetworks

http://arxiv.org/abs/2402.05913v1

Compressor summary: The proposed progressive subnetwork training framework trains smaller subsets of layers in a large language model at each step, achieving better pre-training loss, fewer FLOPs, and improved downstream performance compared to standard training or gradual stacking methods.


Risk-Sensitive Multi-Agent Reinforcement Learning in Network Aggregative Markov Games

http://arxiv.org/abs/2402.05906v1

Compressor summary: The paper proposes a risk-sensitive reinforcement learning algorithm for non-cooperative multi-agent settings based on cumulative prospect theory, which can capture human loss aversion and probabilistic bias.


FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs

http://arxiv.org/abs/2402.05904v1

Compressor summary: FACT-GPT is a system that uses large language models to help fact-check claims on social media, speeding up the process and improving accuracy.


ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentation

http://arxiv.org/abs/2402.05902v1

Compressor summary: ClickSAM is a method that improves the Segment Anything Model's ability to segment ultrasound images by fine-tuning it with click prompts from human annotators.


Large Language Model Meets Graph Neural Network in Knowledge Distillation

http://arxiv.org/abs/2402.05894v1

Compressor summary: LinguGKD is a novel framework that uses language models as teachers and graph neural networks as students to improve node classification in text-attributed graphs, achieving faster inference speed and better performance than traditional methods.


Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

http://arxiv.org/abs/2402.05892v1

Compressor summary: Mamba-ND extends Mamba's state space models to handle arbitrary multi-dimensional data with improved efficiency compared to Transformers and other alternatives.


CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion

http://arxiv.org/abs/2402.05889v1

Compressor summary: CREMA is a modular framework that injects new modalities into video reasoning tasks by using existing pre-trained models, query transformers, and fusion modules to efficiently integrate diverse data types for response generation.


EUGENE: Explainable Unsupervised Approximation of Graph Edit Distance

http://arxiv.org/abs/2402.05885v1

Compressor summary: EUGENE is an efficient algebraic method that approximates Graph Edit Distance and provides explanatory edit paths without requiring ground truth or data-specific training, achieving high accuracy in comparison to existing neural approaches.


Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking

http://arxiv.org/abs/2402.05880v1

Compressor summary: LLM-based conversational search may increase selective exposure and reinforce biased opinions, which could have significant implications for users and policymakers.


Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images

http://arxiv.org/abs/2402.05869v1

Compressor summary: The paper proposes a method to learn depth and surface normal from images using geometric context, which improves the accuracy of 3D geometry estimation and outperforms existing methods on various datasets.


PromptCrypt: Prompt Encryption for Secure Communication with Large Language Models

http://arxiv.org/abs/2402.05868v1

Compressor summary: PromptCrypt is a mechanism that encrypts user inputs with emojis to protect privacy when using cloud-based large language models, without compromising their performance.


Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs

http://arxiv.org/abs/2402.05864v1

Compressor summary: The paper introduces Permute-and-Flip decoder, a faster and more robust method for large language model decoding with low false positive rate and high recall, and a tailored watermarking scheme to protect the generated text.
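For intuition, here is a small sketch of the classic permute-and-flip mechanism applied to next-token logits: candidates are visited in random order, and candidate i is accepted with probability exp((logit_i - max_logit) / temperature), so the argmax token is always accepted and a single pass terminates. The temperature parameterization is an assumption, and the paper's watermarking scheme is not shown.

```python
import math
import random

def permute_and_flip(logits, temperature=1.0, rng=random):
    """Sample a token index with the permute-and-flip mechanism.

    Visits candidates in random order; candidate i is accepted with
    probability exp((logits[i] - max_logit) / temperature). The best
    candidate has acceptance probability 1, so one pass always returns.
    """
    best = max(logits)
    order = list(range(len(logits)))
    rng.shuffle(order)
    for i in order:
        if rng.random() <= math.exp((logits[i] - best) / temperature):
            return i

token = permute_and_flip([2.1, 0.3, 1.7, -0.5], temperature=0.8)
```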


How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis

http://arxiv.org/abs/2402.05863v1

Compressor summary: The paper introduces NegotiationArena, a framework for evaluating how large language models (LLMs) can negotiate with each other in various scenarios, revealing their negotiation tactics, outcomes, and irrationalities.


Let Your Graph Do the Talking: Encoding Structured Data for LLMs

http://arxiv.org/abs/2402.05862v1

Compressor summary: GraphToken is a method to represent structured data in language models, improving their performance on various reasoning tasks.


Memory Consolidation Enables Long-Context Video Understanding

http://arxiv.org/abs/2402.05861v1

Compressor summary: The memory-consolidated vision transformer (MC-ViT) extends the context of video understanding far into the past by fine-tuning pre-trained video transformers to attend to non-parametrically derived memories, outperforming methods with more parameters.


Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery

http://arxiv.org/abs/2402.05860v1

Compressor summary:
Key points:
- DNNs for semantic segmentation in robot-assisted surgery suffer from catastrophic forgetting.
- A privacy-preserving synthetic continual framework blends open-source old instruments with new ones and a synthesized background.
- Overlapping class-aware temperature normalization (CAT) and multi-scale shifted-feature distillation (SD) techniques improve knowledge transfer.

Summary: The authors propose a privacy-preserving synthetic continual framework that blends open-source old instruments with new ones and a synthesized background to address catastrophic forgetting in DNNs for semantic segmentation in robot-assisted surgery. They also use CAT and SD techniques to enhance knowledge transfer.


RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

http://arxiv.org/abs/2402.05628v1

Compressor summary: RepQuant is a novel post-training quantization method that uses complex quantizers for accurate compression and simple quantizers for efficient inference in large transformer models, achieving better performance than existing methods.


Binding Dynamics in Rotating Features

http://arxiv.org/abs/2402.05627v1

Compressor summary: The paper introduces a new mechanism called "cosine binding" that improves understanding of how Rotating Features learn object-centric representations in machine learning, similar to human cognition.


The Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding

http://arxiv.org/abs/2402.05626v1

Compressor summary: The paper studies the loss landscape and stationarity conditions for one-hidden-layer neural networks with non-differentiable ReLU-like activation functions, showing how escape neurons affect the training process and network embedding.


Efficient Models for the Detection of Hate, Abuse and Profanity

http://arxiv.org/abs/2402.05624v1

Compressor summary: The text discusses the problem of hateful, abusive, and profane content in large language models trained on web data, and the need to detect and filter such content to build civil and unbiased models.


Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings

http://arxiv.org/abs/2402.05617v1

Compressor summary: This survey provides an overview of deep learning methods, datasets, and terminologies in NLP-driven skill extraction and classification for computational job market analysis.


Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based Tasks

http://arxiv.org/abs/2402.05616v1

Compressor summary:
Key points:
- Small pretrained generative language models can be used as a general learning framework for sequence-based tasks.
- Instruction fine-tuning with many examples improves performance on challenging cheminformatics tasks.
- Data formatting and pretrained model selection are important factors for instruction fine-tuning success.

Summary: The authors show how to use small pretrained language models as a general learning framework for sequence-based tasks, and demonstrate the benefits of instruction fine-tuning with many examples on cheminformatics tasks. They also highlight the role of data formatting and pretrained model selection.


DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation

http://arxiv.org/abs/2402.05615v1

Compressor summary: The paper introduces DAPlankton, a new dataset for plankton recognition, which consists of phytoplankton images from different instruments and helps develop domain adaptation methods to overcome instrument-induced domain shifts.


Extending 6D Object Pose Estimators for Stereo Vision

http://arxiv.org/abs/2402.05610v1

Compressor summary: The authors propose a stereo vision method for 6D object pose estimation that uses dense features and outperforms existing methods.


Scalable Diffusion Models with State Space Backbone

http://arxiv.org/abs/2402.05608v1

Compressor summary: The paper introduces a new type of diffusion model for image generation that uses state space architecture, outperforming or matching CNN-based and Transformer-based models while being more scalable and efficient.


Optimizing Delegation in Collaborative Human-AI Hybrid Teams

http://arxiv.org/abs/2402.05605v1

Compressor summary:
Key points:
- The text proposes an AI manager for hybrid teams of humans and autonomous systems.
- The manager uses Reinforcement Learning to select the best control agent based on performance and environment.
- The manager minimizes intervention by avoiding constraint violations.
- The text demonstrates the manager's effectiveness in a simulated driving scenario.

Summary: The text introduces an AI manager that learns to choose the best human or autonomous driver for hybrid teams, based on performance and environment, while minimizing intervention and improving team performance.


AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers

http://arxiv.org/abs/2402.05602v1

Compressor summary: The proposed method accurately attributes relevance to both inputs and latent representations of transformer models with efficient computation, enabling better understanding and concept-based explanations.


A Concept for Reconstructing Stucco Statues from historic Sketches using synthetic Data only

http://arxiv.org/abs/2402.05593v1

Compressor summary: The text describes a method to reconstruct medieval statues from their original red sketches using automated techniques and synthetic data.


SoftEDA: Rethinking Rule-Based Data Augmentation with Soft Labels

http://arxiv.org/abs/2402.05591v1

Compressor summary: The paper introduces a technique to improve text data augmentation by using soft labels, which preserves the original meaning and enhances model performance in seven classification tasks.
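A minimal sketch of the soft-label idea, assuming a toy random-swap operation as a stand-in for rule-based EDA augmentations and standard label smoothing as the softening rule (the paper's exact recipe may differ):

```python
import random

def random_swap(tokens, n_swaps=1, rng=random):
    """Toy stand-in for an EDA operation: swap two random tokens."""
    tokens = tokens[:]
    for _ in range(n_swaps):
        i, j = rng.randrange(len(tokens)), rng.randrange(len(tokens))
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def soft_label(hard_label, num_classes, smoothing=0.1):
    """Soften the one-hot target: augmentation may have drifted the text's
    meaning, so express that uncertainty in the label distribution."""
    off = smoothing / (num_classes - 1)
    return [1.0 - smoothing if c == hard_label else off
            for c in range(num_classes)]

text = "the movie was surprisingly good".split()
augmented = random_swap(text)
target = soft_label(hard_label=1, num_classes=2, smoothing=0.1)
print(" ".join(augmented), target)   # e.g. soft target [0.1, 0.9]
```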


RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner

http://arxiv.org/abs/2402.05589v1

Compressor summary: RESMatch is a new semi-supervised learning approach for referring expression segmentation that significantly improves performance by adapting to challenges in understanding free-form linguistic descriptions and object attributes variability.


AutoAugment Is What You Need: Enhancing Rule-based Augmentation Methods in Low-resource Regimes

http://arxiv.org/abs/2402.05584v1

Compressor summary: The paper proposes using AutoAugment to improve data augmentation for text tasks, addressing the challenges of rule-based methods and enhancing pre-trained language models.


Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models

http://arxiv.org/abs/2402.05581v1

Compressor summary: The study uses an unsupervised method with ABX tests to analyze how well vector representations of speech capture extra-linguistic and linguistic characteristics in low-resource language research.


Digital Computers Break the Curse of Dimensionality: Adaptive Bounds via Finite Geometry

http://arxiv.org/abs/2402.05576v1

Compressor summary: The text discusses how digital computers' finite grids affect machine learning models and proposes new generalization bounds for kernel and deep ReLU MLP regressors, using a non-asymptotic concentration of measure result.


Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits

http://arxiv.org/abs/2402.05575v1

Compressor summary: Bi-Level Fairness is a new approach for stochastic multi-armed bandits that ensures fair exposure to groups of arms and within-group meritocratic allocation, achieving optimal regret bounds and sub-linear regret with BF-UCB algorithm.


Traditional Machine Learning Models and Bidirectional Encoder Representations From Transformer (BERT)-Based Automatic Classification of Tweets About Eating Disorders: Algorithm Development and Validation Study

http://arxiv.org/abs/2402.05571v1

Compressor summary: The study used machine learning and deep learning models to classify tweets related to eating disorders, finding that BERT-based models performed best.


Hypergraph Node Classification With Graph Neural Networks

http://arxiv.org/abs/2402.05569v1

Compressor summary: The paper proposes WCE-GNN, a framework that combines graph neural networks with weighted clique expansion, for hypergraph node classification, achieving higher accuracy and efficiency than existing methods.
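The exact weighting scheme in WCE-GNN is not given here, but a common weighted clique expansion assigns each node pair in a hyperedge e the weight 1/(|e| - 1), so larger hyperedges contribute less per pair; a sketch under that assumption:

```python
from collections import defaultdict

def weighted_clique_expansion(hyperedges):
    """Expand a hypergraph into a weighted graph: each hyperedge becomes a
    clique over its nodes, with edge weight 1 / (|e| - 1). One common
    convention; the exact weighting in WCE-GNN may differ."""
    adj = defaultdict(float)
    for e in hyperedges:
        if len(e) < 2:
            continue
        w = 1.0 / (len(e) - 1)
        nodes = sorted(e)
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                adj[(nodes[i], nodes[j])] += w
    return dict(adj)

# Three hyperedges over five nodes; the weighted graph feeds a standard GNN.
H = [{0, 1, 2}, {1, 2, 3, 4}, {0, 4}]
print(weighted_clique_expansion(H))
```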


Succinct Interaction-Aware Explanations

http://arxiv.org/abs/2402.05566v1

Compressor summary: The paper proposes a method to combine SHAP and NSHAP, two approaches for explaining black-box models, by partitioning features into interacting sets and generating an interpretable explanation with fewer false interactions.


On Convolutional Vision Transformers for Yield Prediction

http://arxiv.org/abs/2402.05557v1

Compressor summary: The Convolution vision Transformer (CvT) is a new method for yield prediction on remote sensing data that combines convolution and attention, but currently lags behind other approaches.


Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction

http://arxiv.org/abs/2402.05548v1

Compressor summary: The study evaluates different classifiers for assessing expression neutrality in face images and its impact on face recognition performance.


Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset

http://arxiv.org/abs/2402.05547v1

Compressor summary: ChatCoach is a system that uses NLP and AI to help inexperienced doctors practice medical communication skills with a patient agent and receive real-time feedback from a coaching agent, using Llama2 as the most effective model.


Offline Actor-Critic Reinforcement Learning Scales to Large Models

http://arxiv.org/abs/2402.05546v1

Compressor summary: Offline actor-critic reinforcement learning can scale to large models like transformers and outperform behavioral cloning for multi-task training on continuous control tasks using a Perceiver-based model with self- and cross-attention modules.


Named Entity Recognition for Address Extraction in Speech-to-Text Transcriptions Using Synthetic Data

http://arxiv.org/abs/2402.05545v1

Compressor summary: The paper presents an NER model based on SlovakBERT that extracts address parts from speech-to-text transcriptions, and shows its effectiveness when trained on synthetic data generated with GPT API.


Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts

http://arxiv.org/abs/2402.05536v1

Compressor summary: The authors propose a hybrid approach that combines knowledge graphs with AI to enhance the categorization of social media posts, particularly for identifying eating disorder-related content to aid in early diagnosis.


NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction

http://arxiv.org/abs/2402.05532v1

Compressor summary: The paper presents a new framework called Neural Contact Radiance Field (NCRF) to reconstruct hand-object interactions from sparse videos, using a contact optimization field and a hand-object neural radiance field to achieve photo-realistic novel view synthesis and accurate pose estimation.


Differentially Private Model-Based Offline Reinforcement Learning

http://arxiv.org/abs/2402.05525v1

Compressor summary: The paper proposes DP-MORL, a method for training private reinforcement learning agents from offline data using differentially private neural networks and model-based policy optimization.


Linearizing Models for Efficient yet Robust Private Inference

http://arxiv.org/abs/2402.05521v1

Compressor summary: RLNet is a robust and efficient model that reduces latency by minimizing ReLU operations, improving performance on clean and corrupted images in client-server applications with data privacy concerns.


NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning

http://arxiv.org/abs/2402.05515v1

Compressor summary: NoisyICL perturbs language models to improve in-context learning performance, calibration, fairness, and confidence.
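A minimal sketch of the parameter-perturbation idea, assuming Gaussian noise scaled by each parameter tensor's own standard deviation (the paper's exact noise distribution and scale may differ):

```python
import copy
import torch

@torch.no_grad()
def perturb_parameters(model, noise_scale=1e-3, seed=0):
    """Add zero-mean Gaussian noise to every parameter in-place.

    Sketch of the NoisyICL idea: a small random perturbation of the
    weights before running in-context learning. Scaling the noise by each
    tensor's std is an assumption, not the paper's exact recipe.
    """
    g = torch.Generator().manual_seed(seed)
    for p in model.parameters():
        scale = float(p.float().std()) if p.numel() > 1 else 1.0
        noise = torch.randn(p.shape, generator=g) * noise_scale * scale
        p.add_(noise.to(p.device, p.dtype))

# Usage: perturb a deep copy, then run the usual ICL evaluation on it.
# noisy = copy.deepcopy(model); perturb_parameters(noisy)
```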


GPTs Are Multilingual Annotators for Sequence Generation Tasks

http://arxiv.org/abs/2402.05512v1

Compressor summary: The study introduces a cost-efficient method of data annotation for low-resource languages using large language models, and shares an image captioning dataset and the source code.


Heart disease risk prediction using deep learning techniques with feature augmentation

http://arxiv.org/abs/2402.05495v1

Compressor summary:
Key points:
- Cardiovascular diseases are a major cause of death and are hard to diagnose given the many variables involved.
- Deep learning methods combined with feature augmentation techniques can evaluate patients' risk better than existing methods.
- The proposed methods achieve 90% precision, a significant improvement for early detection and prevention.

Summary: The authors propose using deep learning and feature augmentation to improve cardiovascular disease risk assessment, achieving 90% precision and potentially saving lives.


Determining the severity of Parkinson's disease in patients using a multi task neural network

http://arxiv.org/abs/2402.05491v1

Compressor summary: The paper proposes a non-intrusive voice analysis method using deep learning techniques to diagnose and monitor severe or non-severe Parkinson's disease, achieving high success rates and outperforming previous approaches.


Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization

http://arxiv.org/abs/2402.05476v1

Compressor summary: A novel model-free ensemble reinforcement learning algorithm adapts classical Q-learning to solve network control problems more efficiently and accurately in unknown environments.
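As a rough illustration of the ensemble idea (not the paper's algorithm), the tabular sketch below keeps one Q-table per learning rate, i.e. per timescale, and averages them for action selection:

```python
import numpy as np

class EnsembleQ:
    """Toy ensemble Q-learning: one Q-table per learning rate
    ("timescale"), averaged for action selection. A generic sketch of
    the idea, not the paper's exact update rule."""

    def __init__(self, n_states, n_actions, lrs=(0.5, 0.1, 0.02), gamma=0.99):
        self.q = np.zeros((len(lrs), n_states, n_actions))
        self.lrs, self.gamma = lrs, gamma

    def act(self, s, eps=0.1, rng=np.random.default_rng()):
        if rng.random() < eps:                      # epsilon-greedy explore
            return int(rng.integers(self.q.shape[2]))
        return int(self.q.mean(axis=0)[s].argmax()) # greedy on the ensemble mean

    def update(self, s, a, r, s_next):
        for k, lr in enumerate(self.lrs):           # each member learns at its own rate
            target = r + self.gamma * self.q[k, s_next].max()
            self.q[k, s, a] += lr * (target - self.q[k, s, a])
```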


Question Aware Vision Transformer for Multimodal Reasoning

http://arxiv.org/abs/2402.05472v1

Compressor summary: QA-ViT is a Question Aware Vision Transformer that improves multimodal reasoning by embedding question awareness in the vision encoder, resulting in better visual features tailored to image questions.


Implicit Diffusion: Efficient Optimization through Stochastic Sampling

http://arxiv.org/abs/2402.05468v1

Compressor summary: The new algorithm optimizes distributions from parameterized stochastic diffusions by jointly performing optimization and sampling steps, leveraging advances in bilevel optimization and automatic implicit differentiation.


Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia

http://arxiv.org/abs/2402.05467v1

Compressor summary: RIPPLE is a new method that optimizes jailbreaking prompts for language models using subconsciousness and echopraxia, achieving high success rates and evading detection.


It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

http://arxiv.org/abs/2402.05457v1

Compressor summary: The text describes a new method, Uncertainty-Aware Dynamic Fusion (UADF), that improves generative error correction in automatic speech recognition by incorporating acoustic information into the language model's output.


Large Language Models for Psycholinguistic Plausibility Pretesting

http://arxiv.org/abs/2402.05455v1

Compressor summary: The study examines whether language models can replace human evaluators in psycholinguistics by generating plausibility judgments for linguistic materials, finding that they work well for coarse-grained judgments but not for fine-grained ones.


Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss

http://arxiv.org/abs/2402.05453v1

Compressor summary: The Convex-Concave Loss method enhances privacy and utility in machine learning models by reducing loss variance and increasing the variability of training losses with a concave term.


Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application

http://arxiv.org/abs/2402.05448v1

Compressor summary: The paper introduces Minecraft-ify, a system that generates face-focused textures for 3D virtual characters in the Minecraft game using AI techniques like StyleGAN and StyleCLIP.


Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

http://arxiv.org/abs/2402.05445v1

Compressor summary: IR-QLoRA improves the accuracy of compact LLMs through information retention using statistics-based quantization and finetuning-based elastic connections, with minimal additional time consumption.


Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport

http://arxiv.org/abs/2402.05443v1

Compressor summary: The paper introduces a scalable Wasserstein Gradient Flow model that reduces training complexity and achieves competitive performance in image generation.


Spiking Neural Network Enhanced Hand Gesture Recognition Using Low-Cost Single-photon Avalanche Diode Array

http://arxiv.org/abs/2402.05441v1

Compressor summary: The paper presents a compact spiking neural network that recognizes hand gestures in different light conditions using photon intensity data and compares its performance with a conventional CNN and an SMLP.


Improving Agent Interactions in Virtual Environments with Language Models

http://arxiv.org/abs/2402.05440v1

Compressor summary: The text discusses improving AI agents' communication skills by using language models to enhance task understanding on a Minecraft dataset, showing better results than previous methods.


Learning Uncertainty-Aware Temporally-Extended Actions

http://arxiv.org/abs/2402.05439v1

Compressor summary: UTE is a novel algorithm that uses uncertainty measurement to improve reinforcement learning with action repetition by balancing exploration and exploitation.


GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study

http://arxiv.org/abs/2402.05435v1

Compressor summary: The study evaluates GPT-4's ability to generate coherent narratives about life events and develops Machine Learning models to classify the generated narratives as valid or invalid.


Mixture Density Networks for Classification with an Application to Product Bundling

http://arxiv.org/abs/2402.05428v1

Compressor summary: The paper proposes two models based on mixture density networks (MDNs) for classification tasks, which fit Gaussian mixtures to data and use them to classify samples by evaluating the cumulative distribution function. The models perform well in a real-world product bundling application.


A Sampling Theory Perspective on Activations for Implicit Neural Representations

http://arxiv.org/abs/2402.05427v1

Compressor summary: The paper analyzes implicit neural representations using sampling theory and shows that sinc activations and dynamical systems can improve signal encoding.
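A minimal sketch of a sinc-activated implicit neural representation fitting a 1-D signal; `torch.sinc` computes sin(πx)/(πx), and the frequency scale `omega` is an assumed hyperparameter, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class SincINR(nn.Module):
    """Tiny implicit neural representation f(coord) -> value with sinc
    activations, in the spirit of the sampling-theory view (a sketch;
    the paper's architecture and frequency scaling are assumptions)."""

    def __init__(self, hidden=64, omega=30.0):
        super().__init__()
        self.omega = omega                        # input frequency scale
        self.l1 = nn.Linear(1, hidden)
        self.l2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        h = torch.sinc(self.l1(self.omega * x))   # sin(pi z) / (pi z)
        h = torch.sinc(self.l2(h))
        return self.out(h)

# Fit a toy 1-D signal: the network maps coordinates in [-1, 1] to samples.
coords = torch.linspace(-1, 1, 256).unsqueeze(1)
signal = torch.sin(9 * coords) + 0.5 * torch.sin(23 * coords)
model = SincINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    loss = ((model(coords) - signal) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```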


Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures

http://arxiv.org/abs/2402.05424v1

Compressor summary: Neural circuit diagrams are a new graphical language that improves communication of deep learning architectures by precisely showing data arrangement, operations, and parallel behavior, enabling better implementation, analysis, innovation, and ethical assurance.


MTSA-SNN: A Multi-modal Time Series Analysis Model Based on Spiking Neural Network

http://arxiv.org/abs/2402.05423v1

Compressor summary: The paper proposes a new spiking neural network model that can handle complex, non-stationary time series data and achieves better performance on three tasks.


DiffTOP: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

http://arxiv.org/abs/2402.05421v1

Compressor summary: DiffTOP is a new method for deep reinforcement and imitation learning that uses differentiable trajectory optimization to learn cost and dynamics functions end-to-end, achieving state-of-the-art results on various robotic tasks.


Segmentation-free Connectionist Temporal Classification loss based OCR Model for Text Captcha Classification

http://arxiv.org/abs/2402.05417v1

Compressor summary: The paper proposes a new text captcha classification model using connectionist temporal classification loss that achieves high accuracy and handles complex captchas.
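The segmentation-free part is exactly what CTC provides: the loss marginalizes over all alignments between per-frame predictions and the unsegmented target string, so no per-character positions are needed. A minimal PyTorch sketch with a placeholder backbone output (the paper's model details are not reproduced):

```python
import torch
import torch.nn as nn

# Per-frame class scores from any conv/recurrent backbone:
# (T, N, C) = (time steps, batch, alphabet size incl. blank at index 0).
T, N, C = 40, 4, 37             # e.g. 26 letters + 10 digits + 1 blank
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)

# Unsegmented target strings as class indices (1..C-1; 0 is the blank).
targets = torch.randint(1, C, (N, 6))           # 6-character captchas
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 6, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)       # marginalizes over all alignments
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                 # would train the backbone end-to-end
print(float(loss))
```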


SpirDet: Towards Efficient, Accurate and Lightweight Infrared Small Target Detector

http://arxiv.org/abs/2402.05410v1

Compressor summary: SpirDet is a novel approach that uses a dual-branch sparse decoder and a lightweight DO-RepEncoder to efficiently detect small infrared targets, achieving faster inference speed and fewer parameters than existing models.


MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

http://arxiv.org/abs/2402.05408v1

Compressor summary: The Multi-Instance Generation (MIG) task involves generating diverse instances at designated locations with accurate attributes, and a new approach called MIGC is proposed to tackle this challenge using instance enhancement attention and stable diffusion.


Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes

http://arxiv.org/abs/2402.05406v1

Compressor summary: The authors propose Bonsai, a method to prune large language models using only forward passes, which results in small, fast, and accurate models that outperform existing methods.


In-Context Principle Learning from Mistakes

http://arxiv.org/abs/2402.05403v1

Compressor summary: LEAP is a method to improve LLMs' performance on various tasks by learning from mistakes and general principles derived from few input-output examples, outperforming standard few-shot prompting approaches.


Adaptive Activation Functions for Predictive Modeling with Sparse Experimental Data

http://arxiv.org/abs/2402.05401v1

Compressor summary: This paper investigates how different types of adaptive activation functions affect the accuracy and uncertainty of neural networks in settings with limited data, finding that individual trainable parameters lead to better results than fixed or identical ones.


Optimizing for ROC Curves on Class-Imbalanced Data by Training over a Family of Loss Functions

http://arxiv.org/abs/2402.05400v1

Compressor summary: Loss Conditional Training (LCT) improves imbalanced binary classification by training over a family of loss functions, making models more general and robust to hyperparameter choices.
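A minimal sketch of the loss-conditional-training idea for an imbalanced binary task: a loss-family parameter (here a positive-class weight, an assumed choice) is sampled per step and fed to the network as an extra input, so a single model covers the whole family of losses:

```python
import torch
import torch.nn as nn

class LossConditionedNet(nn.Module):
    """Binary classifier that also receives the loss parameter as input,
    so one network covers a family of losses (the LCT idea; the concrete
    family below, a class-weighted BCE, is an assumption)."""

    def __init__(self, d_in):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in + 1, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, x, w_pos):
        cond = w_pos.expand(x.shape[0], 1)      # broadcast the condition
        return self.net(torch.cat([x, cond], dim=1)).squeeze(1)

def lct_step(model, opt, x, y):
    w_pos = torch.rand(1) * 0.9 + 0.05          # sample loss param per step
    logits = model(x, w_pos)
    weights = torch.where(y == 1, w_pos, 1 - w_pos)
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits, y.float(), weight=weights)
    loss.backward(); opt.step(); opt.zero_grad()
    return float(loss)
```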


On the Effect of Image Resolution on Semantic Segmentation

http://arxiv.org/abs/2402.05398v1

Compressor summary: The study presents a streamlined model that can produce high-resolution semantic segmentations without downscaling images or losing details, improving performance using bottom-up information propagation and Noisy Student Training.


TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning

http://arxiv.org/abs/2402.05396v1

Compressor summary: TASER is a novel adaptive sampling method for Temporal Graph Neural Networks that improves accuracy, efficiency, and scalability by selecting optimal mini-batches and temporal neighbors based on various properties of the graph data.


Enhancing Zero-shot Counting via Language-guided Exemplar Learning

http://arxiv.org/abs/2402.05394v1

Compressor summary: ExpressCount is a novel method that uses language-guided exemplar learning to improve zero-shot object counting efficiency and generality by exploiting semantic priors from pre-trained Large Language Models and enhancing similarity learning with dual-branch and cross-attention schemes.


Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

http://arxiv.org/abs/2402.05391v1

Compressor summary: The text surveys over 300 articles on Knowledge Graphs (KGs) and Multi-Modal Knowledge Graphs (MMKGs), exploring their construction, tasks, challenges, and emerging trends in AI applications.


Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

http://arxiv.org/abs/2402.05382v1

Compressor summary: MoCE is a novel MAE-based pre-training method that uses cluster-conditional gates to train each expert with semantically relevant images, improving performance on diverse downstream tasks.


Tradeoffs of Diagonal Fisher Information Matrix Estimators

http://arxiv.org/abs/2402.05379v1

Compressor summary: The paper explores two methods for estimating the Fisher information matrix in neural networks, analyzing their accuracy, sample complexity, and trade-offs based on variance dependencies and nonlinearities.


Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models

http://arxiv.org/abs/2402.05376v1

Compressor summary: The paper proposes a new zero-shot prompting method for large language models that uses evolutionary algorithms to generate diverse prompts and improve reasoning performance on different tasks.
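A toy genetic-algorithm loop over candidate zero-shot chain-of-thought instructions; `score_prompt` is a hypothetical evaluator that would run the LLM on a dev set with each candidate instruction, and the mutation/crossover operators are illustrative only:

```python
import random

SEEDS = ["Let's think step by step.",
         "Break the problem into smaller parts before answering.",
         "Carefully reason about each fact, then conclude."]

def mutate(prompt, rng=random):
    """Toy mutation: swap two adjacent words."""
    words = prompt.split()
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def crossover(a, b, rng=random):
    """Toy crossover: splice the halves of two prompts."""
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2 :])

def evolve(score_prompt, pop=SEEDS, generations=10, rng=random):
    # score_prompt(prompt) -> float is a hypothetical evaluator that runs
    # the LLM with the candidate instruction on a dev set.
    for _ in range(generations):
        pop = sorted(pop, key=score_prompt, reverse=True)[:2]   # select
        pop += [crossover(pop[0], pop[1], rng), mutate(pop[0], rng)]
    return max(pop, key=score_prompt)
```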


Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

http://arxiv.org/abs/2402.05375v1

Compressor summary: The paper proposes two methods to improve text-to-image diffusion models by removing unwanted content from text embeddings, enhancing their ability to generate images as described in the prompt.


CIC: A framework for Culturally-aware Image Captioning

http://arxiv.org/abs/2402.05374v1

Compressor summary: The paper introduces Culturally-aware Image Captioning, a framework that generates detailed captions describing cultural elements in images, such as traditional clothing from Asian cultures.


Attention as Robust Representation for Time Series Forecasting

http://arxiv.org/abs/2402.05370v1

Compressor summary: The authors propose an attention map structure for transformer-based models to improve multivariate time series forecasting accuracy by leveraging temporal relationships and robust kernel representation.


Noise Contrastive Alignment of Language Models with Explicit Rewards

http://arxiv.org/abs/2402.05369v1

Compressor summary: The paper proposes a general framework for aligning language models using Noise Contrastive Estimation and introduces two algorithms, NCA and InfoNCA, that handle explicit evaluation rewards and preferences.


Principled Preferential Bayesian Optimization

http://arxiv.org/abs/2402.05367v1

Compressor summary: The paper proposes an optimistic algorithm for preferential Bayesian optimization using preference feedback and a confidence set of the black-box function, with an information-theoretic bound on the cumulative regret and a guaranteed convergence rate.


Guiding Large Language Models with Divide-and-Conquer Program for Discerning Problem Solving

http://arxiv.org/abs/2402.05359v1

Compressor summary: The proposed method improves LLMs' ability to handle tasks with repetitive sub-tasks and deceptive content using a Divide-and-Conquer program that enhances expressive power and avoids intermediate errors.


Exploring Learning Complexity for Downstream Data Pruning

http://arxiv.org/abs/2402.05356v1

Compressor summary:
Key points:
- The paper proposes a new scoring function based on learning complexity to prune informative samples for fine-tuning over-parameterized models.
- The method is efficient and effective for classification tasks and outperforms full training for instruction fine-tuning of language models.

Summary: The paper introduces a learning complexity-based scoring function for data pruning in fine-tuning over-parameterized models, achieving high performance and efficiency for both vision and language tasks.


Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

http://arxiv.org/abs/2402.05350v1

Compressor summary: The paper introduces DESCAN-18K, a dataset for image restoration from scanned copies with complex degradations, and proposes DescanDiffusion, a model that uses an encoder and a conditional denoising diffusion probabilistic model to restore high-quality images.


Scrapping The Web For Early Wildfire Detection

http://arxiv.org/abs/2402.05349v1

Compressor summary: Pyro is a web-scraped dataset of annotated wildfire videos for early detection and rapid response, improving object detection models when combined with other datasets.


KIX: A Metacognitive Generalization Framework

http://arxiv.org/abs/2402.05346v1

Compressor summary: The KIX framework helps artificial agents learn generalist behaviors by interacting with objects and using type space to acquire transferable interaction concepts.