arxiv compressed, 2024-03-25

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-25, generated by the compressor, my personal LLM-based project.


DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data

http://arxiv.org/abs/2403.15389v1

Compressor summary: The paper proposes DiffusionMTL, a novel framework that uses denoising and multi-task learning to improve scene understanding from partially labeled data.


LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

http://arxiv.org/abs/2403.15388v1

Compressor summary: PruMerge is an adaptive approach to reduce visual token redundancy in large multimodal models, achieving 14.4 times compression on average without compromising performance.


LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

http://arxiv.org/abs/2403.15385v1

Compressor summary: LATTE3D is a fast and efficient method for generating high-quality 3D objects from text inputs by using a scalable architecture and leveraging 3D data during optimization.


ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

http://arxiv.org/abs/2403.15383v1

Compressor summary: ThemeStation is a new method for creating diverse and consistent 3D assets from a few examples, using both image and 3D priors.


DragAPart: Learning a Part-Level Motion Prior for Articulated Objects

http://arxiv.org/abs/2403.15382v1

Compressor summary: DragAPart is a method that can create realistic images of objects with interactive parts, like opening drawers, by using a pre-trained image generator and a new synthetic dataset.


Long-CLIP: Unlocking the Long-Text Capability of CLIP

http://arxiv.org/abs/2403.15378v1

Compressor summary: Long-CLIP is a plug-and-play alternative to CLIP that supports long text input and maintains or improves its performance on image retrieval, text-to-image generation, and zero-shot classification tasks.


InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

http://arxiv.org/abs/2403.15377v1

Compressor summary: InternVideo2 is a new video foundation model that uses progressive training and large-scale data to achieve state-of-the-art performance in various video-related tasks, such as action recognition, captioning, dialogue, and long video understanding.


Can large language models explore in-context?

http://arxiv.org/abs/2403.15371v1

Compressor summary: The paper investigates how well large language models can explore and make decisions without additional training, finding that only GPT-4 with some specific features showed satisfactory performance, which suggests the need for algorithmic interventions in more complex scenarios.


Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks

http://arxiv.org/abs/2403.15370v1

Compressor summary: ARSim is a framework that automatically enhances real images with synthetic 3D objects, improving autonomous driving systems' ability to detect diverse objects in different scenarios.


Towards Knowledge-Grounded Natural Language Understanding and Generation

http://arxiv.org/abs/2403.15364v1

Compressor summary: The thesis explores how using structured and diverse knowledge with transformer models can improve natural language understanding and generation tasks, such as fake news detection and cross-lingual transfer.


CoLLEGe: Concept Embedding Generation for Large Language Models

http://arxiv.org/abs/2403.15362v1

Compressor summary: CoLLEGe is a meta-learning framework that generates embeddings for new concepts using example sentences or definitions, improving few-shot concept learning in language models.


Learning Topological Representations for Deep Image Understanding

http://arxiv.org/abs/2403.15361v1

Compressor summary: The dissertation proposes new deep learning methods that use topological data analysis tools to improve the segmentation and uncertainty estimation of complex structures in biomedical applications.


SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

http://arxiv.org/abs/2403.15360v1

Compressor summary: SiMBA combines EinFFT for channel modeling and Mamba for sequence modeling to create a new state-of-the-art State Space Model that outperforms existing SSMs and transformers on various image and time series datasets.


Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities

http://arxiv.org/abs/2403.15356v1

Compressor summary: The Dynamic One-For-All model uses neural plasticity to integrate diverse Earth observation data from multiple sensors into a single versatile Transformer that excels in various tasks, showing great adaptability and performance.


Fully automated workflow for the design of patient-specific orthopaedic implants: application to total knee arthroplasty

http://arxiv.org/abs/2403.15353v1

Compressor summary: The authors propose a fully automated workflow that uses artificial neural networks and statistical shape models to segment and reconstruct bones from CT images, and then design patient-specific implants for total knee arthroplasty in under five minutes.


Multi-Review Fusion-in-Context

http://arxiv.org/abs/2403.15351v1

Compressor summary: Key points:
- The text is about a modular approach to text generation with a focus on Fusion-in-Context (FiC) as a standalone task
- The input consists of source texts with highlighted spans of target content that need to be included in the output
- A curated dataset and a novel evaluation framework are developed for this task
Summary: The text introduces Fusion-in-Context, a modular text generation task where models generate coherent passages from source texts with highlighted information. It presents a new dataset and evaluation method for this task in the reviews domain.


Collaborative AI Teaming in Unknown Environments via Active Goal Deduction

http://arxiv.org/abs/2403.15341v1

Compressor summary: The paper proposes a new framework for AI to collaborate with unknown agents by using Bayesian inverse learning and goal-conditioned policies, which improves teaming performance in various scenarios.


Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization

http://arxiv.org/abs/2403.15330v1

Compressor summary: The paper proposes SID, a text description strategy that reduces biases in text-to-image generation by using multimodal GPT-4 to create more informative descriptions.


CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction

http://arxiv.org/abs/2403.15322v1

Compressor summary: The authors present a cyber mapping dataset for German fund prospectuses that enables entity recognition and relation extraction in outsourcing contexts, with publicly available anonymized data and code.


Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection

http://arxiv.org/abs/2403.15317v1

Compressor summary: The paper introduces Point-DETR3D, a teacher-student framework for weakly semi-supervised 3D detection using point annotations, which enhances positional prior and incorporates dense imagery data for better object localization.


CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking

http://arxiv.org/abs/2403.15313v1

Compressor summary: This paper introduces CR3DT, a camera-RADAR fusion model that improves 3D object detection and tracking for self-driving vehicles, by combining the advantages of cameras and RADAR sensors.


Controlled Training Data Generation with Diffusion Models

http://arxiv.org/abs/2403.15309v1

Compressor summary: The paper proposes a method called Guided Adversarial Prompts that uses two feedback mechanisms to generate training data for supervised learning using a text-to-image model, improving over previous open-loop methods.


Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

http://arxiv.org/abs/2403.15301v1

Compressor summary: The paper proposes using successor features and subpolicies to learn a policy basis for solving non-Markovian reward problems with finite state automata.


Sphere Neural-Networks for Rational Reasoning

http://arxiv.org/abs/2403.15297v1

Compressor summary: Key points:
- The paper proposes Sphere Neural Networks (SphNNs) as a minimalist qualitative extension of traditional neural networks for human-like reasoning
- SphNNs use spheres and spatial relations to guide transformations and determine the validity of syllogistic reasoning in one epoch
- SphNNs can evolve into various types of reasoning, such as spatio-temporal, logical, event, neuro-symbolic, and humour understanding
Summary: The paper introduces Sphere Neural Networks (SphNNs), a new neural model that uses spheres and spatial relations to perform human-like reasoning in various domains.


Human behaviour through a LENS: How Linguistic content triggers Emotions and Norms and determines Strategy choices

http://arxiv.org/abs/2403.15293v1

Compressor summary: The LENS model proposes that linguistic descriptions in decision problems affect behaviour in economic games by triggering emotions, suggesting norms, and shaping strategic choices, and reviews experimental evidence supporting this claim.


Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions

http://arxiv.org/abs/2403.15279v1

Compressor summary: Fundus is a user-friendly news scraper that uses custom content extractors to obtain high-quality news articles from various online newspapers.


Specifying Genericity through Inclusiveness and Abstractness Continuous Scales

http://arxiv.org/abs/2403.15278v1

Compressor summary: The paper presents a new framework for annotating noun phrases' genericity in natural language, which is simple, intuitive, grounded in linguistic theory, and validated by a pilot study and an evaluation.


Event Temporal Relation Extraction based on Retrieval-Augmented on LLMs

http://arxiv.org/abs/2403.15273v1

Compressor summary: The paper proposes a new method to improve event temporal relation extraction by using knowledge from large language models to enhance prompt templates and verbalizers, leading to better results on three datasets.


WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization

http://arxiv.org/abs/2403.15272v1

Compressor summary: WSCLoc is a system that enhances deep learning-based camera relocalization models under weakly-supervised and sparse view conditions using two stages of co-optimization.


Imagination Augmented Generation: Learning to Imagine Richer Context for Question Answering over Large Language Models

http://arxiv.org/abs/2403.15268v1

Compressor summary: IMcQA uses imagination to generate richer context for question answering without external resources, improving performance across various settings.


Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies

http://arxiv.org/abs/2403.15267v1

Compressor summary: The authors propose a sparse, robust, and interpretable control policy for parametric partial differential equations using dictionary learning and differentiable L$_0$ regularization, which improves performance, interpretability, and generalization over traditional deep neural network-based methods.


Hyperbolic Metric Learning for Visual Outlier Detection

http://arxiv.org/abs/2403.15260v1

Compressor summary: Key points:
- OOD detection is important for deep learning models in safety-critical applications
- Conventional Euclidean geometry-based methods are not optimal for visual data
- Hyperbolic geometry can improve OOD detection performance
- Synthetic outliers do not help in Hyperbolic space
- Hyperbolic embedding dimension affects OOD detection performance
Summary: The paper proposes a Hyperbolic geometry-based metric framework for OOD detection in visual data, which outperforms Euclidean methods and addresses practical concerns.


Safe Learning of PDDL Domains with Conditional Effects -- Extended Version

http://arxiv.org/abs/2403.15251v1

Compressor summary: Conditional-SAM is a new algorithm that learns safe action models with conditional effects, which enable powerful planners to solve various planning problems using realistic observations.


Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach

http://arxiv.org/abs/2403.15250v1

Compressor summary: The study re-examines the factors affecting the performance of large language models (LLMs) using a comprehensive statistical methodology, revealing new insights into their emergent abilities and developmental trajectories.


Spectral Motion Alignment for Video Motion Transfer using Diffusion Models

http://arxiv.org/abs/2403.15249v1

Compressor summary: Spectral Motion Alignment (SMA) is a framework that enhances video customization by refining and aligning motion vectors using frequency-domain regularization, mitigating spatial artifacts and improving motion transfer.


Self-Supervised Backbone Framework for Diverse Agricultural Vision Tasks

http://arxiv.org/abs/2403.15248v1

Compressor summary: Key points:
- Computer vision can transform farming into a data-driven, precise, and sustainable industry
- Deep learning needs large annotated datasets, which are costly and time-consuming to obtain
- Self-supervised learning, using SimCLR, can learn robust features from unannotated agriculture images
- The approach is more cost-effective and accessible, enabling wider adoption of computer vision in agriculture
Summary: The paper proposes a self-supervised learning framework that uses SimCLR to train a model on unannotated agriculture images, which can perform diverse tasks and reduce the reliance on expensive annotated data.


Reasoning-Enhanced Object-Centric Learning for Videos

http://arxiv.org/abs/2403.15245v1

Compressor summary: Object-centric learning improves machine understanding of complex scenes using a novel reasoning module called STATM, which incorporates a memory buffer, spatiotemporal attention, and fusion for enhanced perception.


A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness

http://arxiv.org/abs/2403.15244v1

Compressor summary: The paper proposes a fast stochastic quasi-Newton method for non-uniform smoothness in machine learning problems, achieving optimal sample complexity and convergence speedup with gradient clipping and variance reduction techniques.


IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

http://arxiv.org/abs/2403.15241v1

Compressor summary: IS-Fusion is a novel framework that fuses instance- and scene-level information for better 3D perception in autonomous driving using multimodal data.


Shadow Generation for Composite Image Using Diffusion model

http://arxiv.org/abs/2403.15234v1

Compressor summary: The paper presents a foundation model with rich prior knowledge for generating realistic shadows in image composition tasks, using ControlNet and intensity modulation modules.


LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example

http://arxiv.org/abs/2403.15227v1

Compressor summary: LeGO creates highly stylized 3D face models with diverse topologies using a surface deformation network and MAGE, achieving realistic results for applications like image-based generation and animation.


Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations

http://arxiv.org/abs/2403.15218v1

Compressor summary: SAM is a foundation model that can generate segmentation masks for medical images, but its performance for training 3D DL models is not as good as ground-truth annotations when using crowd-sourced data.


GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition

http://arxiv.org/abs/2403.15212v1

Compressor summary: The authors propose a new module called G-DevLSTM that improves skeleton-based action recognition by leveraging path development and Lie group structure, achieving better performance than existing methods on several datasets.


Early Period of Training Impacts Out-of-Distribution Generalization

http://arxiv.org/abs/2403.15210v1

Compressor summary: This paper explores how gradual unfreezing affects neural network performance on in-distribution and out-of-distribution tasks, using trace of Fisher Information and sharpness as indicators.


MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection

http://arxiv.org/abs/2403.15209v1

Compressor summary: The paper proposes a novel framework called MSCoTDet that uses large language models to understand the complementary information between RGB and thermal modalities for improved multispectral pedestrian detection.


Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion

http://arxiv.org/abs/2403.15194v1

Compressor summary: The paper introduces a fast and flexible method to generate image variations as videos for data augmentation in deep learning tasks, improving model accuracy on various datasets.


SFOD: Spiking Fusion Object Detector

http://arxiv.org/abs/2403.15192v1

Compressor summary: The paper proposes a new spiking neural network approach for efficient object detection using event cameras, achieving state-of-the-art results on two datasets.


Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study

http://arxiv.org/abs/2403.15185v1

Compressor summary: The paper compares CodeGPT and UniXcoder, two language models for code completion, on the functional programming language Haskell, finding mixed results and highlighting the need for more high-quality datasets.


PDE-CNNs: Axiomatic Derivations and Applications

http://arxiv.org/abs/2403.15182v1

Compressor summary: PDE-CNNs are a type of neural network that uses evolution equations to learn geometric features, which offer benefits such as fewer parameters, better performance, and data efficiency compared to conventional CNNs.


Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement

http://arxiv.org/abs/2403.15180v1

Compressor summary: The paper proposes a new method for training neural combinatorial optimization models that simplifies the process, improves solutions progressively, and outperforms existing methods on the Job Shop Scheduling Problem.


LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

http://arxiv.org/abs/2403.15173v1

Compressor summary: LSK3DNet is an efficient and effective LiDAR perception method that uses dynamic pruning to amplify 3D kernels, reducing model size and computational cost while improving performance on 3D vision tasks.


Exploring the Task-agnostic Trait of Self-supervised Learning in the Context of Detecting Mental Disorders

http://arxiv.org/abs/2403.15170v1

Compressor summary: The study explores using self-supervised learning to generate a task-agnostic representation for detecting major depressive disorder and post-traumatic stress disorder from audio and video data during interactive sessions.


Transition Graph Properties of Target Class Classification

http://arxiv.org/abs/2403.15167v1

Compressor summary: The paper discusses a mixed classification and transition model that assigns objects to target or normal classes through iterative actions and transitions, and analyzes the structure and properties of realistic transition graphs for medical applications.


FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos

http://arxiv.org/abs/2403.15161v1

Compressor summary: FastCAD is a real-time method that retrieves and aligns CAD models for all objects in a scene, improving the performance of augmented reality and robotics applications.


A Multimodal Approach for Cross-Domain Image Retrieval

http://arxiv.org/abs/2403.15152v1

Compressor summary: Key points:
- The paper proposes a caption-matching approach for cross-domain image retrieval using multimodal language-vision models pre-trained on large datasets
- The method achieves state-of-the-art performance on DomainNet and Office-Home datasets
- The method is tested with AI-generated images from Midjourney platform
Summary: The paper presents a novel caption-matching method for retrieving similar images across domains using multimodal models, which outperforms existing approaches and works well with AI-generated images.


An In-Depth Analysis of Data Reduction Methods for Sustainable Deep Learning

http://arxiv.org/abs/2403.15150v1

Compressor summary: The paper presents eight data reduction methods for tabular and image datasets, along with a Python package to apply them, and evaluates their impact on efficiency and performance.


On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

http://arxiv.org/abs/2403.15146v1

Compressor summary: Adam converges faster than SGDM in non-uniformly bounded smoothness settings and has better convergence guarantees in both deterministic and stochastic scenarios.


Modular Deep Active Learning Framework for Image Annotation: A Technical Report for the Ophthalmo-AI Project

http://arxiv.org/abs/2403.15143v1

Compressor summary: MedDeepCycleAL is an easy-to-use framework that automates image annotation for medical images using deep learning and active learning, saving time and effort.


Deep Generative Model based Rate-Distortion for Image Downscaling Assessment

http://arxiv.org/abs/2403.15139v1

Compressor summary: The paper introduces IDA-RD, a novel process-based measure to quantify image downscaling algorithms' quality based on rate-distortion theory and deep generative models for super-resolution.


CACA Agent: Capability Collaboration based AI Agent

http://arxiv.org/abs/2403.15137v1

Compressor summary: The paper introduces CACA Agent, an open architecture AI system that collaborates with different LLMs and services to enhance extensibility and functionality in various applications.


Transfer CLIP for Generalizable Image Denoising

http://arxiv.org/abs/2403.15132v1

Compressor summary: Key points:
- Image denoising is a fundamental computer vision task
- Deep learning methods struggle with out-of-distribution (OOD) noise
- CLIP model has exceptional open-world image recognition and segmentation capabilities
- Paper explores using CLIP features for generalizable denoising
- Proposed method uses dense features from CLIP encoder in learnable decoder
- Progressive feature augmentation strategy improves robustness
Summary: The paper proposes a generalizable image denoising method that leverages dense features from the CLIP model, which have distortion-invariant and content-related properties, and uses a learnable decoder with progressive feature augmentation to handle diverse out-of-distribution noises.


Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection

http://arxiv.org/abs/2403.15127v1

Compressor summary: The paper introduces a new method to improve semi-supervised object detection on class imbalanced datasets by using gradient-based sampling techniques to balance the influence of majority and minority classes.


EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting

http://arxiv.org/abs/2403.15124v1

Compressor summary: EndoGSLAM is a fast and accurate SLAM method for intrabody medical imaging devices that enables high-quality tissue reconstruction and real-time visualization during endoscopic surgeries.


Quantification using Permutation-Invariant Networks based on Histograms

http://arxiv.org/abs/2403.15123v1

Compressor summary: The paper presents HistNetQ, a novel neural architecture that uses histograms for quantification problems, which outperforms existing methods and requires only prevalence values as input.


SYNCS: Synthetic Data and Contrastive Self-Supervised Training for Central Sulcus Segmentation

http://arxiv.org/abs/2403.15121v1

Compressor summary: This study develops new techniques to better identify the central sulcus in brain images, which can help understand early brain changes in children at risk of bipolar disorder and schizophrenia.


An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification

http://arxiv.org/abs/2403.15119v1

Compressor summary: The paper introduces a new diverse and open-world dataset for person re-identification (ReID) research and proposes a method to improve domain generalization using latent domain expansion (LDE).


Language Models in Dialogue: Conversational Maxims for Human-AI Interactions

http://arxiv.org/abs/2403.15115v1

Compressor summary: The paper proposes a set of six maxims for effective human-AI conversation, including two new ones (benevolence and transparency) that address issues unique to AI interactions.


Text clustering with LLM embeddings

http://arxiv.org/abs/2403.15112v1

Compressor summary: This paper examines how different text embeddings and algorithms affect text clustering, finding that large language models perform well but require careful consideration of trade-offs between nuance and efficiency.


Active Learning for Regression based on Wasserstein distance and GroupSort Neural Networks

http://arxiv.org/abs/2403.15108v1

Compressor summary: The paper proposes a new active learning strategy for regression using Wasserstein distance measured by GroupSort Neural Networks, which improves estimation accuracy and speed.


UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction

http://arxiv.org/abs/2403.15098v1

Compressor summary: The paper introduces UniTraj, a framework that unifies various vehicle trajectory prediction datasets, models, and evaluation criteria to study their generalization and scalability, finding that larger data size and diversity improve performance.


Argument-Aware Approach To Event Linking

http://arxiv.org/abs/2403.15097v1

Compressor summary: The paper proposes an argument-aware approach for event linking that improves the recognition of event mentions and synthesizes out-of-KB training examples using controlled manipulation of event arguments.


Improved Long Short-Term Memory-based Wastewater Treatment Simulators for Deep Reinforcement Learning

http://arxiv.org/abs/2403.15091v1

Compressor summary: The paper discusses challenges and solutions for using Deep Reinforcement Learning to optimize wastewater treatment processes, focusing on improving simulation accuracy by addressing compounding error.


IFSENet : Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence

http://arxiv.org/abs/2403.15089v1

Compressor summary: IFSENet combines few-shot segmentation and interactive segmentation to reduce the annotation effort for training segmentation models for novel classes using clicks instead of pixel-level maps.


CHisIEC: An Information Extraction Corpus for Ancient Chinese History

http://arxiv.org/abs/2403.15088v1

Compressor summary: CHisIEC is a large, diverse, and labeled dataset for NER and RE tasks in ancient Chinese historical texts, with experiments involving LLMs.


SIMAP: A simplicial-map layer for neural networks

http://arxiv.org/abs/2403.15083v1

Compressor summary: SIMAP enhances interpretability of deep learning models by using an enhanced version of Simplicial-Map Neural Networks that can work as a substitute for classic dense layers.


Cell Variational Information Bottleneck Network

http://arxiv.org/abs/2403.15082v1

Compressor summary: Key points:
- Cell Variational Information Bottleneck Network (cellVIB) is a CNN that uses information bottleneck mechanism and end-to-end training
- cellVIB generates feature maps with uncertainty and learns mean and standard deviation terms for each VIB cell
- cellVIB performs well on MNIST, CIFAR-10, and PACS datasets, and is robust against noise and corruption
- cellVIB achieves competitive results on face recognition task
Summary: The paper introduces cellVIB, a CNN that learns uncertainty and variability in feature maps using information bottleneck and end-to-end training. The method shows strong performance and robustness on various image datasets and tasks.


Automated Feature Selection for Inverse Reinforcement Learning

http://arxiv.org/abs/2403.15079v1

Compressor summary: The paper describes a method to learn reward functions from expert demonstrations using polynomial features and feature selection based on trajectory probabilities and correlations.


GTAGCN: Generalized Topology Adaptive Graph Convolutional Networks

http://arxiv.org/abs/2403.15077v1

Compressor summary: The paper proposes GTAGCN, a hybrid approach that combines two techniques and applies to both sequenced and static graph-structured data for node and graph classification tasks.


On the Inclusion of Charge and Spin States in Cartesian Tensor Neural Network Potentials

http://arxiv.org/abs/2403.15073v1

Compressor summary: The letter introduces an improved version of TensorNet that can handle charged molecules and spin states without compromising efficiency or accuracy.


Recent Trends in 3D Reconstruction of General Non-Rigid Scenes

http://arxiv.org/abs/2403.15064v1

Compressor summary: Key points:
- 3D reconstruction of real scenes is important for computer graphics and vision
- Dynamic scenes are challenging and require various techniques and inputs
- The report reviews state-of-the-art methods, applications, and future directions
Summary: The report summarizes the latest techniques for reconstructing 3D models of dynamic real scenes using different data sources and neural representations, highlighting their applications and open challenges.


Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans

http://arxiv.org/abs/2403.15063v1

Compressor summary: The paper proposes CT-SAM3D, a 3D segmentation model that uses a nearly fully labeled dataset and two key technical developments to achieve better performance and efficiency than previous methods for whole-body CT segmentation.


MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration

http://arxiv.org/abs/2403.15059v1

Compressor summary: MM-Diff is a fast and effective method for generating high-quality images with single or multiple subjects using diffusion models, text embeddings, and multimodal cross-attention.


Continual Vision-and-Language Navigation

http://arxiv.org/abs/2403.15049v1

Compressor summary: CVLN is a paradigm for training VLN agents with continual learning and rehearsal-based methods, enabling them to navigate in new environments using natural language and visual information.


Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning

http://arxiv.org/abs/2403.15048v1

Compressor summary: The proposed system detects visual hallucinations in cartoon character images generated by text-to-image models using pose information and vision-language models, improving accuracy over baseline methods.


DP-Dueling: Learning from Preference Feedback without Compromising User Privacy

http://arxiv.org/abs/2403.15045v1

Compressor summary: The paper proposes the first differentially private dueling bandit algorithm that efficiently learns near-optimal actions with user preferences, achieving optimal regret bounds in both finite and infinite decision spaces.


Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild

http://arxiv.org/abs/2403.15044v1

Compressor summary: The paper proposes a method that combines multimodal fusion and pre-trained model features for expression recognition and valence-arousal estimation tasks, using the Aff-Wild2 database.


LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

http://arxiv.org/abs/2403.15042v1

Compressor summary: LLM2LLM is a data augmentation strategy that uses a teacher LLM to generate synthetic data based on incorrect examples from a student LLM, improving performance in low-data regimes.


ESG Classification by Implicit Rule Learning via GPT-4

http://arxiv.org/abs/2403.15040v1

Compressor summary: This paper explores how language models like GPT-4 can be used to evaluate ESG factors without explicit training data, and shows their potential in financial tasks.


Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

http://arxiv.org/abs/2403.15033v1

Compressor summary: The paper introduces Data Amplify Learning (DAL) and TinyBeauty, a compact facial makeup model trained with pixel-to-pixel supervision from few annotations; DAL uses a Residual Diffusion Model to generate high-fidelity details and a Fine-Grained Makeup Module for precise makeup control and combination, and TinyBeauty achieves state-of-the-art performance without face parsing or landmark detection while running fast on mobile devices.


An Integrated Neighborhood and Scale Information Network for Open-Pit Mine Change Detection in High-Resolution Remote Sensing Images

http://arxiv.org/abs/2403.15032v1

Compressor summary: The paper proposes INSINet, a deep learning method that integrates neighborhood and scale information for open-pit mine change detection in high-resolution remote sensing images, improving performance significantly.


Grey-informed neural network for time-series forecasting

http://arxiv.org/abs/2403.15027v1

Compressor summary: The study proposes a grey-informed neural network (GINN) that improves interpretability, handles small data samples, and produces reliable forecasts by following the differential equation model of the grey system.


VRSO: Visual-Centric Reconstruction for Static Object Annotation

http://arxiv.org/abs/2403.15026v1

Compressor summary: VRSO is a visual-centric approach for static object annotation in 3D space that is low-cost, efficient, and high-quality, requiring only camera images as input and minimal manual labeling.


Robust Conformal Prediction under Distribution Shift via Physics-Informed Structural Causal Model

http://arxiv.org/abs/2403.15025v1

Compressor summary: The text discusses how conformal prediction can handle uncertainty in machine learning, but sometimes suffers from coverage loss under distributional shift, and proposes a physics-informed structural causal model to improve robustness.


Insights into the Lottery Ticket Hypothesis and the Iterative Magnitude Pruning

http://arxiv.org/abs/2403.15022v1

Compressor summary: The paper explores how initialization and the iterative pruning process affect deep neural networks' generalization and training performance.


BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation

http://arxiv.org/abs/2403.15019v1

Compressor summary: BSNet is a novel method that uses simulation-assisted transformation to generate accurate pseudo-labels for 3D instance segmentation with bounding box annotations, improving the quality of weakly supervised results.


Vehicle Detection Performance in Nordic Region

http://arxiv.org/abs/2403.15017v1

Compressor summary: The paper evaluates state-of-the-art vehicle detection methods under Nordic winter conditions using a new dataset and proposes enhancements for improved performance.


Extracting Human Attention through Crowdsourced Patch Labeling

http://arxiv.org/abs/2403.15013v1

Compressor summary: The text proposes a patch-labeling method that uses AI and crowdsourcing to reduce bias in image classification by guiding the model's attention to target objects.


Empirical investigation of multi-source cross-validation in clinical machine learning

http://arxiv.org/abs/2403.15012v1

Compressor summary: The text compares two cross-validation methods for evaluating clinical prediction models on multi-source medical datasets, showing that leave-source-out cross-validation provides more reliable performance estimates than K-fold cross-validation.


Cell Tracking according to Biological Needs -- Strong Mitosis-aware Random-finite Sets Tracker with Aleatoric Uncertainty

http://arxiv.org/abs/2403.15011v1

Compressor summary: The authors propose a novel tracker that uses uncertainty estimation and mitosis-aware assignment to improve cell tracking and segmentation in microscopy time-lapse data, outperforming existing methods by a significant margin.


Clean-image Backdoor Attacks

http://arxiv.org/abs/2403.15010v1

Compressor summary: The paper proposes clean-image backdoor attacks that can inject backdoors into image classification models via a fraction of incorrect labels without modifying the training images, posing serious threats to their fairness and robustness.


TexRO: Generating Delicate Textures of 3D Models by Recursive Optimization

http://arxiv.org/abs/2403.15009v1

Compressor summary: TexRO is a new method to create detailed textures for 3D meshes by optimizing their UV maps using a smart viewpoint selection and a recursive optimization pipeline, achieving high-quality results with fast runtime.


Tri-Perspective View Decomposition for Geometry-Aware Depth Completion

http://arxiv.org/abs/2403.15008v1

Compressor summary: The paper introduces a novel framework, Tri-Perspective view Decomposition (TPVD), for depth completion in autonomous driving that explicitly models 3D geometry by decomposing point clouds into three 2D views and updating features through recurrent 2D-3D-2D aggregation.


ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding

http://arxiv.org/abs/2403.15004v1

Compressor summary: ParFormer is a new transformer architecture that combines different token mixers for better feature extraction and outperforms other models in image classification tasks.


Magic for the Age of Quantized DNNs

http://arxiv.org/abs/2403.14999v1

Compressor summary: This paper proposes MaQD, a model compression technology that uses quantization-aware training and a novel normalization method to reduce the computation cost of large DNNs without significantly affecting their accuracy.


Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation

http://arxiv.org/abs/2403.14995v1

Compressor summary: The paper proposes Guidance Training, a novel auxiliary task for unsupervised domain adaptation in semantic segmentation that uses mixed data to guide the model to extract and reconstruct target-domain features without generating divergent synthetic data.


MasonTigers at SemEval-2024 Task 1: An Ensemble Approach for Semantic Textual Relatedness

http://arxiv.org/abs/2403.14990v1

Compressor summary: MasonTigers participated in all languages and tracks of Semantic Textual Relatedness, using a combination of statistical methods, BERT, and sentence transformers to achieve various rankings.


MasonTigers at SemEval-2024 Task 8: Performance Analysis of Transformer-based Models on Machine-Generated Text Detection

http://arxiv.org/abs/2403.14989v1

Compressor summary: The paper describes a system that uses various methods to detect machine-generated text in multiple languages and tasks, achieving good results with ensemble models and zero-shot prompting.


Risk and Response in Large Language Models: Evaluating Key Threat Categories

http://arxiv.org/abs/2403.14988v1

Compressor summary: The paper examines how reward models can help assess risks in large language models, focusing on information hazards and finding that LLMs treat them as less harmful than other risks and are more vulnerable to attacks in that category.


Generative Active Learning for Image Synthesis Personalization

http://arxiv.org/abs/2403.14987v1

Compressor summary: The paper explores active learning for generative models in image synthesis personalization tasks using direction-based uncertainty sampling and shows that open-source models outperform closed-source ones like Google's StyleDrop.


MasonTigers at SemEval-2024 Task 9: Solving Puzzles with an Ensemble of Chain-of-Thoughts

http://arxiv.org/abs/2403.14982v1

Compressor summary: The paper shows how different prompting techniques improve large language models' performance in natural language understanding puzzles, achieving 2nd and 13th place results.


Piecewise-Linear Manifolds for Deep Metric Learning

http://arxiv.org/abs/2403.14977v1

Compressor summary: UDML uses piecewise-linear approximations of data manifolds to estimate similarity between unlabeled points and improves zero-shot image retrieval.


AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies

http://arxiv.org/abs/2403.14974v1

Compressor summary: The paper proposes a new forgery detection method called AVT2-DWF that combines audio and visual information using dual transformers and dynamic weight fusion to enhance detection capabilities.


Trajectory Regularization Enhances Self-Supervised Geometric Representation

http://arxiv.org/abs/2403.14973v1

Compressor summary: The text introduces a new pose-estimation benchmark for evaluating self-supervised learning (SSL) in geometric tasks, and presents two methods to enhance SSL geometric representations without compromising semantic classification.


A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning

http://arxiv.org/abs/2403.14972v1

Compressor summary: The paper introduces a new debating approach called Blueprint Debate on Graphs (BDoG) that tackles challenges of opinion trivialization and distractor concepts in multimodal reasoning by using a top-down, evidence-based method.


DreamFlow: High-Quality Text-to-3D Generation by Approximating Probability Flow

http://arxiv.org/abs/2403.14966v1

Compressor summary: The paper presents DreamFlow, a fast and high-quality text-to-3D generation method that uses a novel optimization algorithm based on probability flow and a predetermined timestep schedule.


Adapprox: Adaptive Approximation in Adam Optimization via Randomized Low-Rank Matrices

http://arxiv.org/abs/2403.14958v1

Compressor summary: Adapprox is a memory-efficient optimizer that uses randomized low-rank matrix approximation to approximate Adam's second moment, achieving better accuracy and stability when training deep learning models.


Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation

http://arxiv.org/abs/2403.14952v1

Compressor summary: The paper proposes a text generation approach called RARG that collects evidence from scientific sources and generates polite, factual counter-misinformation responses using reinforcement learning.


Simple Graph Condensation

http://arxiv.org/abs/2403.14951v1

Compressor summary: SimGC simplifies graph condensation by aligning the condensed graph with the original graph using a pre-trained SGC model, achieving up to 10 times speedup without compromising performance.


KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation

http://arxiv.org/abs/2403.14950v1

Compressor summary: KnowLA is a method to integrate knowledge graph embeddings into large language models for better adaptation to downstream tasks using instruction data and parameter-efficient finetuning.


Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt

http://arxiv.org/abs/2403.14949v1

Compressor summary: D3A is a novel approach to online time series forecasting that detects and adapts to concept drifts using data augmentation and reduces errors compared to existing models.


GPT-Connect: Interaction between Text-Driven Human Motion Generator and 3D Scenes in a Training-free Manner

http://arxiv.org/abs/2403.14947v1

Compressor summary: The paper proposes GPT-Connect, a method to generate scene-aware human motions using an existing blank-background motion generator and ChatGPT, without any additional training.


A Single Linear Layer Yields Task-Adapted Low-Rank Matrices

http://arxiv.org/abs/2403.14946v1

Compressor summary: The study investigates the relationship between initial weight matrices and low-rank matrices in the Low-Rank Adaptation (LoRA) method, and proposes a new method called Conditionally Parameterized LoRA (CondLoRA) that uses a single linear layer to derive task-adapted low-rank matrices.


CLIP-VQDiffusion : Language Free Training of Text To Image generation using CLIP and vector quantized diffusion model

http://arxiv.org/abs/2403.14944v1

Compressor summary: The proposed CLIP-VQDiffusion model uses CLIP pretraining to generate realistic images from text captions, even when the text is out of distribution.


Unifying Lane-Level Traffic Prediction from a Graph Structural Perspective: Benchmark and Baseline

http://arxiv.org/abs/2403.14941v1

Compressor summary: The paper analyzes lane-level traffic prediction research, proposes a unified evaluation framework, and introduces a baseline model using graph structure and MLP networks, while also releasing new datasets and code for the community.


STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

http://arxiv.org/abs/2403.14939v1

Compressor summary: STAG4D is a novel framework that combines diffusion models with dynamic 3D Gaussian splatting to generate high-fidelity 4D content with spatial-temporal consistency from various inputs.


On Zero-Shot Counterspeech Generation by LLMs

http://arxiv.org/abs/2403.14938v1

Compressor summary: The study compares four large language models' performance in zero-shot counterspeech generation and proposes three prompting strategies to improve quality and reduce toxicity.


Survey on Modeling of Articulated Objects

http://arxiv.org/abs/2403.14937v1

Compressor summary: The text surveys the current state-of-the-art in 3D modeling of articulated objects, focusing on part perception and creation, and discusses geometry processing and articulation modeling challenges and future directions.


Attention-Driven Reasoning: Unlocking the Potential of Large Language Models

http://arxiv.org/abs/2403.14932v1

Compressor summary: The paper introduces a method to improve large language models' reasoning by optimizing their attention mechanisms without extra data, focusing on non-STEM questions.


CODA: A COst-efficient Test-time Domain Adaptation Mechanism for HAR

http://arxiv.org/abs/2403.14922v1

Compressor summary: CODA is a cost-efficient mobile sensing adaptation mechanism that uses active learning to handle real-time drifts and improve human activity recognition.


Hierarchical Skip Decoding for Efficient Autoregressive Text Generation

http://arxiv.org/abs/2403.14919v1

Compressor summary: Hierarchical Skip Decoding is a new method for efficient text generation that adapts to the sequence length and preserves most of the quality.


Deep learning-based method for weather forecasting: A case study in Itoshima

http://arxiv.org/abs/2403.14918v1

Compressor summary: The research presents a new multilayer perceptron model that outperforms other models in weather forecasting in Itoshima, Kyushu, Japan.


Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective

http://arxiv.org/abs/2403.14917v1

Compressor summary: The paper investigates how two-layer neural networks learn features using kernel methods in a mean-field regime, showing their advantages over traditional kernel methods and discussing convergence, error, and regularization aspects.


Defying Imbalanced Forgetting in Class Incremental Learning

http://arxiv.org/abs/2403.14910v1

Compressor summary: The text describes a new imbalance phenomenon in Class Incremental Learning and proposes a method called CLAD to address it by enhancing the accuracy of forgotten classes.


Web-based Melanoma Detection

http://arxiv.org/abs/2403.14898v1

Compressor summary: The study presents a fast and accurate melanoma detection method that works well on various datasets and deep learning models, enabling efficient skin cancer screening.


Geometric Generative Models based on Morphological Equivariant PDEs and GANs

http://arxiv.org/abs/2403.14897v1

Compressor summary: The text introduces a geometric generative model based on an equivariant PDE for G-CNNs (PDE-G-CNNs) that uses morphology operators and GANs to extract specific features and reduce complexity, is equivariant under group symmetries with multiscale geometric interpretability, and outperforms a classical GAN on MNIST data.


Stance Reasoner: Zero-Shot Stance Detection on Social Media with Explicit Reasoning

http://arxiv.org/abs/2403.14895v1

Compressor summary: Stance Reasoner is a method that uses background knowledge and reasoning to detect opinions on new topics without training data.