This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-03-25, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2403.15389v1
Compressor summary: The paper proposes DiffusionMTL, a novel framework that uses denoising and multi-task learning to improve scene understanding from partially labeled data.
http://arxiv.org/abs/2403.15388v1
Compressor summary: PruMerge is an adaptive approach to reduce visual token redundancy in large multimodal models, achieving 14.4 times compression on average without compromising performance.
http://arxiv.org/abs/2403.15385v1
Compressor summary: LATTE3D is a fast and efficient method for generating high-quality 3D objects from text inputs by using a scalable architecture and leveraging 3D data during optimization.
http://arxiv.org/abs/2403.15383v1
Compressor summary: ThemeStation is a new method for creating diverse and consistent 3D assets from a few examples, using both image and 3D priors.
http://arxiv.org/abs/2403.15382v1
Compressor summary: DragAPart is a method that can create realistic images of objects with interactive parts, like opening drawers, by using a pre-trained image generator and a new synthetic dataset.
http://arxiv.org/abs/2403.15378v1
Compressor summary: Long-CLIP is a plug-and-play alternative to CLIP that supports long text input and maintains or improves its performance on image retrieval, text-image generation, and zero-shot classification tasks.
http://arxiv.org/abs/2403.15377v1
Compressor summary: InternVideo2 is a new video foundation model that uses progressive training and large-scale data to achieve state-of-the-art performance in various video-related tasks, such as action recognition, captioning, dialogue, and long video understanding.
http://arxiv.org/abs/2403.15371v1
Compressor summary: The text explores how well large language models can explore and make decisions without additional training, finding that only GPT-4 with some specific features showed satisfactory performance, suggesting the need for algorithmic interventions in more complex scenarios.
http://arxiv.org/abs/2403.15370v1
Compressor summary: ARSim is a framework that automatically enhances real images with synthetic 3D objects, improving autonomous driving systems' ability to detect diverse objects in different scenarios.
http://arxiv.org/abs/2403.15364v1
Compressor summary: The thesis explores how using structured and diverse knowledge with transformer models can improve natural language understanding and generation tasks, such as fake news detection and cross-lingual transfer.
http://arxiv.org/abs/2403.15362v1
Compressor summary: CoLLEGe is a meta-learning framework that generates embeddings for new concepts using example sentences or definitions, improving few-shot concept learning in language models.
http://arxiv.org/abs/2403.15361v1
Compressor summary: The dissertation proposes new deep learning methods that use topological data analysis tools to improve the segmentation and uncertainty estimation of complex structures in biomedical applications.
http://arxiv.org/abs/2403.15360v1
Compressor summary: SiMBA combines EinFFT for channel modeling and Mamba for sequence modeling to create a new state-of-the-art State Space Model that outperforms existing SSMs and transformers on various image and time series datasets.
http://arxiv.org/abs/2403.15356v1
Compressor summary: The Dynamic One-For-All model uses neural plasticity to integrate diverse Earth observation data from multiple sensors into a single versatile Transformer that excels in various tasks, showing great adaptability and performance.
http://arxiv.org/abs/2403.15353v1
Compressor summary: The authors propose a fully automated workflow that uses artificial neural networks and statistical shape models to segment and reconstruct bones from CT images, and then design patient-specific implants for total knee arthroplasty in under five minutes.
http://arxiv.org/abs/2403.15351v1
Compressor summary: The paper introduces Fusion-in-Context (FiC), a standalone modular text generation task in which models generate coherent passages from source texts with highlighted spans of target content, and presents a curated dataset and a novel evaluation framework for this task in the reviews domain.
http://arxiv.org/abs/2403.15341v1
Compressor summary: The text proposes a new framework for AI to collaborate with unknown agents by using Bayesian inverse learning and goal-conditioned policies, which improves teaming performance in various scenarios.
http://arxiv.org/abs/2403.15330v1
Compressor summary: The paper proposes SID, a text description strategy that reduces biases in text-to-image generation by using multimodal GPT-4 to create more informative descriptions.
http://arxiv.org/abs/2403.15322v1
Compressor summary: The authors present a cyber mapping dataset for German fund prospectuses that enables entity recognition and relation extraction in outsourcing contexts, with publicly available anonymized data and code.
http://arxiv.org/abs/2403.15317v1
Compressor summary: The paper introduces Point-DETR3D, a teacher-student framework for weakly semi-supervised 3D detection using point annotations, which enhances positional prior and incorporates dense imagery data for better object localization.
http://arxiv.org/abs/2403.15313v1
Compressor summary: This paper introduces CR3DT, a camera-RADAR fusion model that improves 3D object detection and tracking for self-driving vehicles, by combining the advantages of cameras and RADAR sensors.
http://arxiv.org/abs/2403.15309v1
Compressor summary: The paper proposes a method called Guided Adversarial Prompts that uses two feedback mechanisms to generate training data for supervised learning using a text-to-image model, improving over previous open-loop methods.
http://arxiv.org/abs/2403.15301v1
Compressor summary: The text proposes using successor features and subpolicies to learn a policy basis for solving non-Markovian reward problems with finite state automata.
http://arxiv.org/abs/2403.15297v1
Compressor summary: The paper introduces Sphere Neural Networks (SphNNs), a minimalist qualitative extension of traditional neural networks that uses spheres and spatial relations to guide transformations and validate syllogistic reasoning in one epoch, and that can evolve into various types of reasoning, including spatio-temporal, logical, event, neuro-symbolic, and humour understanding.
http://arxiv.org/abs/2403.15293v1
Compressor summary: The LENS model proposes that linguistic descriptions in decision problems affect behaviour in economic games by triggering emotions, suggesting norms, and shaping strategic choices, and reviews experimental evidence supporting this claim.
http://arxiv.org/abs/2403.15279v1
Compressor summary: Fundus is a user-friendly news scraper that uses custom content extractors to obtain high-quality news articles from various online newspapers.
http://arxiv.org/abs/2403.15278v1
Compressor summary: The paper presents a new framework for annotating noun phrases' genericity in natural language, which is simple, intuitive, grounded in linguistic theory, and validated by a pilot study and an evaluation.
http://arxiv.org/abs/2403.15273v1
Compressor summary: The paper proposes a new method to improve event temporal relation extraction by using knowledge from large language models to enhance prompt templates and verbalizers, leading to better results on three datasets.
http://arxiv.org/abs/2403.15272v1
Compressor summary: WSCLoc is a system that enhances deep learning-based camera relocalization models under weakly-supervised and sparse view conditions using two stages of co-optimization.
http://arxiv.org/abs/2403.15268v1
Compressor summary: IMcQA uses imagination to generate richer context for question answering without external resources, improving performance across various settings.
http://arxiv.org/abs/2403.15267v1
Compressor summary: The authors propose a sparse, robust, and interpretable control policy for parametric partial differential equations using dictionary learning and differentiable L$_0$ regularization, which improves performance, interpretability, and generalization over traditional deep neural network-based methods.
http://arxiv.org/abs/2403.15260v1
Compressor summary: The paper proposes a hyperbolic geometry-based metric framework for OOD detection in visual data, which outperforms conventional Euclidean methods; it also finds that synthetic outliers do not help in hyperbolic space and analyzes how the hyperbolic embedding dimension affects detection performance.
http://arxiv.org/abs/2403.15251v1
Compressor summary: Conditional-SAM is a new algorithm that learns safe action models with conditional effects, which enable powerful planners to solve various planning problems using realistic observations.
http://arxiv.org/abs/2403.15250v1
Compressor summary: The study re-examines the factors affecting the performance of large language models (LLMs) using a comprehensive statistical methodology, revealing new insights into their emergent abilities and developmental trajectories.
http://arxiv.org/abs/2403.15249v1
Compressor summary: Spectral Motion Alignment (SMA) is a framework that enhances video customization by refining and aligning motion vectors using frequency-domain regularization, mitigating spatial artifacts and improving motion transfer.
http://arxiv.org/abs/2403.15248v1
Compressor summary: The paper proposes a self-supervised learning framework that uses SimCLR to learn robust features from unannotated agriculture images, making computer vision in farming more cost-effective and accessible by reducing reliance on large annotated datasets.
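For context on the SimCLR objective this framework relies on, here is a minimal NumPy sketch of the NT-Xent contrastive loss SimCLR optimizes — a generic illustration of the technique, not code from the paper:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.
    z1, z2: (N, d) embeddings of two augmented views of the same N images."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # the positive for sample i is its other view at index i +/- n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

Two augmented views of the same image form a positive pair; every other sample in the batch serves as a negative, so the loss drops as matched views align.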
http://arxiv.org/abs/2403.15245v1
Compressor summary: Object-centric learning improves machine understanding of complex scenes using a novel reasoning module called STATM, which incorporates memory buffer, spatiotemporal attention, and fusion for enhanced perception.
http://arxiv.org/abs/2403.15244v1
Compressor summary: The paper proposes a fast stochastic quasi-Newton method for non-uniform smoothness in machine learning problems, achieving optimal sample complexity and convergence speedup with gradient clipping and variance reduction techniques.
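Gradient clipping, one of the two techniques the summary names, is standard enough to sketch; the following global-norm clipping helper is a generic illustration, not the paper's quasi-Newton method:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient arrays so their joint L2 norm does not
    exceed max_norm. Returns the scaled gradients and the pre-clip norm."""
    total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))  # no-op when already small
    return [g * scale for g in grads], total
```

Clipping bounds the effective step size when gradients spike, which is what makes it useful under non-uniform smoothness.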
http://arxiv.org/abs/2403.15241v1
Compressor summary: IS-Fusion is a novel framework that fuses instance- and scene-level information for better 3D perception in autonomous driving using multimodal data.
http://arxiv.org/abs/2403.15234v1
Compressor summary: The paper presents a foundation model with rich prior knowledge for generating realistic shadows in image composition tasks, using ControlNet and intensity modulation modules.
http://arxiv.org/abs/2403.15227v1
Compressor summary: Our method creates highly stylized 3D face models with diverse topologies using a surface deformation network and MAGE, achieving realistic results for applications like image-based generation and animation.
http://arxiv.org/abs/2403.15218v1
Compressor summary: SAM is a foundation model that can generate segmentation masks for medical images, but when using crowd-sourced data, the masks it produces are not as good as ground-truth annotations for training 3D DL models.
http://arxiv.org/abs/2403.15212v1
Compressor summary: The authors propose a new module called G-DevLSTM that improves skeleton-based action recognition by leveraging path development and Lie group structure, achieving better performance than existing methods on several datasets.
http://arxiv.org/abs/2403.15210v1
Compressor summary: This paper explores how gradual unfreezing affects neural network performance on in-distribution and out-of-distribution tasks, using trace of Fisher Information and sharpness as indicators.
http://arxiv.org/abs/2403.15209v1
Compressor summary: The text proposes a novel framework called MSCoTDet that uses large language models to understand the complementary information between RGB and thermal modalities for improved multispectral pedestrian detection.
http://arxiv.org/abs/2403.15194v1
Compressor summary: The paper introduces a fast and flexible method to generate image variations as videos for data augmentation in deep learning tasks, improving model accuracy on various datasets.
http://arxiv.org/abs/2403.15192v1
Compressor summary: The paper proposes a new spiking neural network approach for efficient object detection using event cameras, achieving state-of-the-art results on two datasets.
http://arxiv.org/abs/2403.15185v1
Compressor summary: The text compares CodeGPT and UniXcoder, two language models for code completion, on the functional programming language Haskell, finding mixed results and highlighting the need for more high-quality datasets.
http://arxiv.org/abs/2403.15182v1
Compressor summary: PDE-CNNs are a type of neural network that uses evolution equations to learn geometric features, which offer benefits such as fewer parameters, better performance, and data efficiency compared to conventional CNNs.
http://arxiv.org/abs/2403.15180v1
Compressor summary: The paper proposes a new method for training neural combinatorial optimization models that simplifies the process, improves solutions progressively, and outperforms existing methods on the Job Shop Scheduling Problem.
http://arxiv.org/abs/2403.15173v1
Compressor summary: LSK3DNet is an efficient and effective LiDAR perception method that uses dynamic pruning to amplify 3D kernels, reducing model size, computational cost, and improving performance on 3D vision tasks.
http://arxiv.org/abs/2403.15170v1
Compressor summary: The study explores using self-supervised learning to generate a task-agnostic representation for detecting major depressive disorder and post-traumatic stress disorder from audio and video data during interactive sessions.
http://arxiv.org/abs/2403.15167v1
Compressor summary: The text discusses a mixed classification and transition model that assigns objects to target or normal classes through iterative actions and transitions, and analyzes the structure and properties of realistic transition graphs for medical applications.
http://arxiv.org/abs/2403.15161v1
Compressor summary: FastCAD is a real-time method that retrieves and aligns CAD models for all objects in a scene, improving the performance of augmented reality and robotics applications.
http://arxiv.org/abs/2403.15152v1
Compressor summary: The paper presents a caption-matching method for cross-domain image retrieval built on multimodal language-vision models pre-trained on large datasets, achieving state-of-the-art performance on the DomainNet and Office-Home datasets and working well with AI-generated images from the Midjourney platform.
http://arxiv.org/abs/2403.15150v1
Compressor summary: The paper presents eight data reduction methods for tabular and image datasets, along with a Python package to apply them, and evaluates their impact on efficiency and performance.
http://arxiv.org/abs/2403.15146v1
Compressor summary: Adam converges faster than SGDM in non-uniformly bounded smoothness settings and has better convergence guarantees in both deterministic and stochastic scenarios.
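For reference, the textbook single-step update rules of the two optimizers being compared (standard forms, not the paper's analysis):

```python
import numpy as np

def sgdm_step(w, g, v, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update on parameter w with gradient g."""
    v = momentum * v + g
    return w - lr * v, v

def adam_step(w, g, m, s, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias correction; t is the 1-based step count."""
    m = b1 * m + (1 - b1) * g            # first-moment EMA
    s = b2 * s + (1 - b2) * g ** 2       # second-moment EMA
    m_hat = m / (1 - b1 ** t)            # bias-corrected estimates
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s
```

Adam's per-coordinate normalization by the second moment is precisely what decouples its step size from unbounded gradient magnitudes under non-uniform smoothness.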
http://arxiv.org/abs/2403.15143v1
Compressor summary: MedDeepCycleAL is an easy-to-use framework that automates image annotation for medical images using deep learning and active learning, saving time and effort.
http://arxiv.org/abs/2403.15139v1
Compressor summary: The paper introduces IDA-RD, a novel process-based measure to quantify image downscaling algorithms' quality based on rate-distortion theory and deep generative models for super-resolution.
http://arxiv.org/abs/2403.15137v1
Compressor summary: The paper introduces CACA Agent, an open architecture AI system that collaborates with different LLMs and services to enhance extensibility and functionality in various applications.
http://arxiv.org/abs/2403.15132v1
Compressor summary: The paper proposes a generalizable image denoising method that leverages dense features from the CLIP encoder, which are distortion-invariant and content-related, in a learnable decoder, using a progressive feature augmentation strategy to remain robust to diverse out-of-distribution noise.
http://arxiv.org/abs/2403.15127v1
Compressor summary: The text introduces a new method to improve semi-supervised object detection on class imbalanced datasets by using gradient-based sampling techniques to balance the influence of majority and minority classes.
http://arxiv.org/abs/2403.15124v1
Compressor summary: EndoGSLAM is a fast and accurate SLAM method for intrabody medical imaging devices that enables high-quality tissue reconstruction and real-time visualization during endoscopic surgeries.
http://arxiv.org/abs/2403.15123v1
Compressor summary: The paper presents HistNetQ, a novel neural architecture that uses histograms for quantification problems, which outperforms existing methods and requires only prevalence values as input.
http://arxiv.org/abs/2403.15121v1
Compressor summary: This study develops new techniques to better identify the central sulcus in brain images, which can help understand early brain changes in children at risk of bipolar disorder and schizophrenia.
http://arxiv.org/abs/2403.15119v1
Compressor summary: The paper introduces a new diverse and open-world dataset for person re-identification (ReID) research and proposes a method to improve domain generalization using latent domain expansion (LDE).
http://arxiv.org/abs/2403.15115v1
Compressor summary: The text proposes a set of six maxims for effective human-AI conversation, including two new ones (benevolence and transparency) that address issues unique to AI interactions.
http://arxiv.org/abs/2403.15112v1
Compressor summary: This paper examines how different text embeddings and algorithms affect text clustering, finding that large language models perform well but require careful consideration of trade-offs between nuance and efficiency.
http://arxiv.org/abs/2403.15108v1
Compressor summary: The paper proposes a new active learning strategy for regression using Wasserstein distance measured by GroupSort Neural Networks, which improves estimation accuracy and speed.
http://arxiv.org/abs/2403.15098v1
Compressor summary: The paper introduces UniTraj, a framework that unifies various vehicle trajectory prediction datasets, models, and evaluation criteria to study their generalization and scalability, finding that larger data size and diversity improve performance.
http://arxiv.org/abs/2403.15097v1
Compressor summary: The paper proposes an argument-aware approach for event linking that improves the recognition of event mentions and synthesizes out-of-KB training examples using controlled manipulation of event arguments.
http://arxiv.org/abs/2403.15091v1
Compressor summary: The text discusses challenges and solutions for using Deep Reinforcement Learning to optimize wastewater treatment processes, focusing on improving simulation accuracy by addressing compounding error.
http://arxiv.org/abs/2403.15089v1
Compressor summary: IFSENet combines few-shot segmentation and interactive segmentation to reduce the annotation effort for training segmentation models for novel classes using clicks instead of pixel-level maps.
http://arxiv.org/abs/2403.15088v1
Compressor summary: CHisIEC is a large, diverse, and labeled dataset for NER and RE tasks in ancient Chinese historical texts, with experiments involving LLMs.
http://arxiv.org/abs/2403.15083v1
Compressor summary: SIMAP enhances interpretability of deep learning models by using an enhanced version of Simplicial-Map Neural Networks that can work as a substitute for classic dense layers.
http://arxiv.org/abs/2403.15082v1
Compressor summary: The paper introduces the Cell Variational Information Bottleneck Network (cellVIB), a CNN that combines the information bottleneck mechanism with end-to-end training to generate feature maps with uncertainty, learning mean and standard deviation terms for each VIB cell; it performs well on MNIST, CIFAR-10, and PACS, is robust to noise and corruption, and achieves competitive face recognition results.
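The mean and standard deviation terms learned per VIB cell follow the standard variational reparameterization trick, sketched generically below (an illustration, not the paper's architecture):

```python
import numpy as np

def vib_sample(mu, log_sigma, rng):
    """Reparameterization step of a variational information bottleneck:
    sample z = mu + sigma * eps with eps ~ N(0, I), so gradients can
    flow through the learned mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps
```

Sampling this way keeps the stochastic node differentiable: the randomness lives in `eps`, while `mu` and `log_sigma` stay in the computation graph.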
http://arxiv.org/abs/2403.15079v1
Compressor summary: The text describes a method to learn reward functions from expert demonstrations using polynomial features and feature selection based on trajectory probabilities and correlations.
http://arxiv.org/abs/2403.15077v1
Compressor summary: The paper proposes a hybrid approach based on two techniques, applied to sequenced and static graph-structured data for node and graph classification tasks.
http://arxiv.org/abs/2403.15073v1
Compressor summary: The letter introduces an improved version of TensorNet that can handle charged molecules and spin states without compromising efficiency or accuracy.
http://arxiv.org/abs/2403.15064v1
Compressor summary: Key points: - 3D reconstruction of real scenes is important for computer graphics and vision - Dynamic scenes are challenging and require various techniques and inputs - The report reviews state-of-the-art methods, applications, and future directions Summary: The report summarizes the latest techniques for reconstructing 3D models of dynamic real scenes using different data sources and neural representations, highlighting their applications and open challenges.
http://arxiv.org/abs/2403.15063v1
Compressor summary: The paper proposes CT-SAM3D, a 3D segmentation model that uses a nearly fully labeled dataset and two key technical developments to achieve better performance and efficiency than previous methods for whole-body CT segmentation.
http://arxiv.org/abs/2403.15059v1
Compressor summary: MM-Diff is a fast and effective method for generating high-quality images with single or multiple subjects using diffusion models, text embeddings, and multimodal cross-attention.
http://arxiv.org/abs/2403.15049v1
Compressor summary: CVLN is a paradigm for training VLN agents with continual learning and rehearsal-based methods, enabling them to navigate in new environments using natural language and visual information.
http://arxiv.org/abs/2403.15048v1
Compressor summary: Our system detects visual hallucinations in cartoon character images generated by text-to-image models using pose information and vision-language models, improving accuracy over baseline methods.
http://arxiv.org/abs/2403.15045v1
Compressor summary: The paper proposes the first differentially private dueling bandit algorithm that efficiently learns near-optimal actions with user preferences, achieving optimal regret bounds in both finite and infinite decision spaces.
http://arxiv.org/abs/2403.15044v1
Compressor summary: The paper proposes a method that combines multimodal fusion and pre-trained model features for expression recognition and valence-arousal estimation tasks, using the Aff-Wild2 database.
http://arxiv.org/abs/2403.15042v1
Compressor summary: LLM2LLM is a data augmentation strategy that uses a teacher LLM to generate synthetic data based on incorrect examples from a student LLM, improving performance in low-data regimes.
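The teacher-student loop described above can be sketched with hypothetical callables (`finetune`, `generate_similar`, and `evaluate` are placeholders for illustration, not an API from the paper):

```python
def llm2llm_round(student, teacher, train_set, evaluate):
    """One round of an LLM2LLM-style augmentation loop: fine-tune the
    student, collect the examples it still gets wrong, and have the
    teacher synthesize similar new examples to grow the training set."""
    student.finetune(train_set)
    wrong = [ex for ex in train_set if not evaluate(student, ex)]
    synthetic = [teacher.generate_similar(ex) for ex in wrong]
    return train_set + synthetic
```

Iterating this loop concentrates the teacher's synthetic data on the student's current failure modes, which is why it helps most in low-data regimes.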
http://arxiv.org/abs/2403.15040v1
Compressor summary: This paper explores how language models like GPT-4 can be used to evaluate ESG factors without explicit training data, and shows their potential in financial tasks.
http://arxiv.org/abs/2403.15033v1
Compressor summary: The paper introduces Data Amplify Learning (DAL) and TinyBeauty, a compact makeup model trained with pixel-to-pixel supervision from few annotations: a Residual Diffusion Model generates high-fidelity details and a Fine-Grained Makeup Module provides precise makeup control, yielding state-of-the-art results without face parsing or landmark detection and fast inference on mobile devices.
http://arxiv.org/abs/2403.15032v1
Compressor summary: The paper proposes INSINet, a deep learning method that integrates neighborhood and scale information for open-pit mine change detection in high-resolution remote sensing images, improving performance significantly.
http://arxiv.org/abs/2403.15027v1
Compressor summary: The study proposes a grey-informed neural network (GINN) that improves interpretability, handles small data samples, and produces reliable forecasts by following the differential equation model of the grey system.
http://arxiv.org/abs/2403.15026v1
Compressor summary: VRSO is a visual-centric approach for static object annotation in 3D space that is low cost, high efficiency, and high quality, requiring only camera images as input and minimal manual labeling.
http://arxiv.org/abs/2403.15025v1
Compressor summary: The text discusses how conformal prediction can handle uncertainty in machine learning, but sometimes suffers from coverage loss under distributional shift, and proposes a physics-informed structural causal model to improve robustness.
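For background, the split conformal prediction recipe whose coverage guarantee is at stake can be sketched as follows (the standard exchangeability-based version, not the paper's physics-informed causal variant):

```python
import numpy as np

def split_conformal_interval(cal_residuals, alpha=0.1):
    """Given |y - yhat| residuals on a held-out calibration set, return
    the half-width q such that [yhat - q, yhat + q] covers a new point
    with probability >= 1 - alpha, assuming exchangeability."""
    n = len(cal_residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # conformal quantile rank
    return np.sort(cal_residuals)[min(k, n) - 1]
```

The guarantee rests entirely on exchangeability between calibration and test data, which is exactly what distributional shift breaks — hence the coverage loss the summary mentions.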
http://arxiv.org/abs/2403.15022v1
Compressor summary: The paper explores how initialization and the iterative pruning process affect deep neural networks' generalization and training performance.
http://arxiv.org/abs/2403.15019v1
Compressor summary: BSNet is a novel method that uses simulation-assisted transformation to generate accurate pseudo-labels for 3D instance segmentation with bounding box annotations, improving the quality of weakly supervised results.
http://arxiv.org/abs/2403.15017v1
Compressor summary: The paper evaluates state-of-the-art vehicle detection methods under Nordic winter conditions using a new dataset and proposes enhancements for improved performance.
http://arxiv.org/abs/2403.15013v1
Compressor summary: The text proposes a patch-labeling method that uses AI and crowdsourcing to reduce bias in image classification by guiding the model's attention to target objects.
http://arxiv.org/abs/2403.15012v1
Compressor summary: The text compares two cross-validation methods for evaluating clinical prediction models on multi-source medical datasets, showing that leave-source-out cross-validation provides more reliable performance estimates than K-fold cross-validation.
http://arxiv.org/abs/2403.15011v1
Compressor summary: The authors propose a novel tracker that uses uncertainty estimation and mitosis-aware assignment to improve cell tracking and segmentation in microscopy time-lapse data, outperforming existing methods by a significant margin.
http://arxiv.org/abs/2403.15010v1
Compressor summary: The paper proposes clean-image backdoor attacks that can inject backdoors into image classification models via a fraction of incorrect labels without modifying the training images, posing serious threats to their fairness and robustness.
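The attack's core idea — corrupting only labels while leaving training images untouched — can be sketched generically (an illustration; the paper's actual trigger-selection procedure is not shown):

```python
import numpy as np

def poison_labels(labels, trigger_mask, target_class):
    """Clean-image backdoor sketch: flip only the labels of samples whose
    naturally occurring feature matches the trigger; the images themselves
    are never modified."""
    poisoned = labels.copy()
    poisoned[trigger_mask] = target_class
    return poisoned
```

Because the images are untouched, defenses that scan training inputs for pixel-level perturbations have nothing to find.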
http://arxiv.org/abs/2403.15009v1
Compressor summary: TexRO is a new method to create detailed textures for 3D meshes by optimizing their UV maps using a smart viewpoint selection and a recursive optimization pipeline, achieving high quality results with fast runtime speed.
http://arxiv.org/abs/2403.15008v1
Compressor summary: The paper introduces a novel framework, Tri-Perspective view Decomposition (TPVD), for depth completion in autonomous driving that explicitly models 3D geometry by decomposing point clouds into three 2D views and updating features through recurrent 2D-3D-2D aggregation.
http://arxiv.org/abs/2403.15004v1
Compressor summary: The ParFormer is a new transformer architecture that combines different token mixers for better feature extraction and outperforms other models in image classification tasks.
http://arxiv.org/abs/2403.14999v1
Compressor summary: This paper proposes MaQD, a model compression technology that uses quantization-aware training and a novel normalization method to reduce the computation cost of large DNNs without significantly affecting their accuracy.
http://arxiv.org/abs/2403.14995v1
Compressor summary: The paper proposes Guidance Training, a novel auxiliary task for unsupervised domain adaptation in semantic segmentation that uses mixed data to guide the model to extract and reconstruct target-domain features without generating divergent synthetic data.
http://arxiv.org/abs/2403.14990v1
Compressor summary: MasonTigers participated in all languages and tracks of Semantic Textual Relatedness, using a combination of statistical methods, BERT, and sentence transformers to achieve various rankings.
http://arxiv.org/abs/2403.14989v1
Compressor summary: The paper describes a system that uses various methods to detect machine-generated text in multiple languages and tasks, achieving good results with ensemble models and zero-shot prompting.
http://arxiv.org/abs/2403.14988v1
Compressor summary: The paper examines how reward models can help assess risks in large language models, focusing on information hazards and finding that they are less harmful and vulnerable to attacks than other risks.
http://arxiv.org/abs/2403.14987v1
Compressor summary: The paper explores active learning for generative models in image synthesis personalization tasks using direction-based uncertainty sampling and shows that open-source models outperform closed-source ones like Google's StyleDrop.
http://arxiv.org/abs/2403.14982v1
Compressor summary: The paper shows how different prompting techniques improve large language models' performance in natural language understanding puzzles, achieving 2nd and 13th place results.
http://arxiv.org/abs/2403.14977v1
Compressor summary: UDML uses piecewise-linear approximations of data manifolds to estimate similarity between unlabeled points and improves zero-shot image retrieval.
http://arxiv.org/abs/2403.14974v1
Compressor summary: The paper proposes a new forgery detection method called AVT2-DWF that combines audio and visual information using dual transformers and dynamic weight fusion to enhance detection capabilities.
http://arxiv.org/abs/2403.14973v1
Compressor summary: The text introduces a new pose-estimation benchmark for evaluating self-supervised learning (SSL) in geometric tasks, and presents two methods to enhance SSL geometric representations without compromising semantic classification.
http://arxiv.org/abs/2403.14972v1
Compressor summary: The paper introduces a new debating approach called Blueprint Debate on Graphs (BDoG) that tackles challenges of opinion trivialization and distractor concepts in multimodal reasoning by using a top-down, evidence-based method.
http://arxiv.org/abs/2403.14966v1
Compressor summary: The paper presents DreamFlow, a fast and high-quality text-to-3D generation method that uses a novel optimization algorithm based on probability flow and a predetermined timestep schedule.
http://arxiv.org/abs/2403.14958v1
Compressor summary: Adapprox is a memory-efficient optimizer that uses randomized low-rank matrix approximation to approximate Adam's second moment, achieving better accuracy and stability in deep learning model training.
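To make the Adapprox summary concrete, here is a minimal sketch of randomized low-rank matrix approximation, the general technique the summary refers to (a Halko-style randomized range finder). This is an illustration of the idea only, not the paper's actual optimizer; the function name and parameters are my own.

```python
import numpy as np

def randomized_low_rank(M, rank, oversample=5, seed=0):
    """Approximate M with a low-rank factorization M ~= Q @ (Q.T @ M)."""
    rng = np.random.default_rng(seed)
    # Multiply M by a random Gaussian sketch to capture its column space.
    sketch = M @ rng.standard_normal((M.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(sketch)  # orthonormal basis for the range of M
    return Q, Q.T @ M            # store two thin factors instead of M

# A matrix of exact rank 5 is recovered almost perfectly.
rng = np.random.default_rng(1)
M = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 100))
Q, QtM = randomized_low_rank(M, rank=5)
err = np.linalg.norm(M - Q @ QtM) / np.linalg.norm(M)
assert err < 1e-8
```

Storing the two thin factors `Q` and `Q.T @ M` instead of the full matrix is what yields the memory savings that such optimizers exploit for the second-moment statistics.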
http://arxiv.org/abs/2403.14952v1
Compressor summary: The paper proposes a text generation approach called RARG that collects evidence from scientific sources and generates polite, factual counter-misinformation responses using reinforcement learning.
http://arxiv.org/abs/2403.14951v1
Compressor summary: SimGC simplifies graph condensation by aligning the condensed graph with the original graph using a pre-trained SGC model, achieving up to 10 times speedup without compromising performance.
http://arxiv.org/abs/2403.14950v1
Compressor summary: KnowLA is a method to integrate knowledge graph embeddings into large language models for better adaptation to downstream tasks using instruction data and parameter-efficient finetuning.
http://arxiv.org/abs/2403.14949v1
Compressor summary: D3A is a novel approach to online time series forecasting that detects and adapts to concept drifts using data augmentation and reduces errors compared to existing models.
http://arxiv.org/abs/2403.14947v1
Compressor summary: The paper proposes GPT-connect, a method to generate scene-aware human motions using an existing blank-background motion generator and ChatGPT, without any additional training.
http://arxiv.org/abs/2403.14946v1
Compressor summary: The study investigates the relationship between initial weight matrices and low-rank matrices in the Low-Rank Adaptation (LoRA) method, and proposes Conditionally Parameterized LoRA (CondLoRA), which derives task-adapted low-rank matrices from a single linear layer.
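For context, here is a generic sketch of the standard LoRA forward pass that CondLoRA builds on: a frozen weight `W` is adapted by a trainable low-rank update `B @ A`. This illustrates plain LoRA, not CondLoRA's conditional parameterization; all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2  # r << d_in is the low rank

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)  # adapted forward pass

# Because B is initialized to zero, adaptation starts as a no-op.
assert np.allclose(y, W @ x)
```

Only `A` and `B` (2 * r * d parameters) are trained, which is why LoRA-style methods are called parameter-efficient finetuning.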
http://arxiv.org/abs/2403.14944v1
Compressor summary: The proposed CLIP-VQDiffusion model uses a pretrained CLIP model to generate realistic images from text captions, even when the text is out of distribution.
http://arxiv.org/abs/2403.14941v1
Compressor summary: The paper analyzes lane-level traffic prediction research, proposes a unified evaluation framework, and introduces a baseline model using graph structure and MLP networks, while also releasing new datasets and codes for the community.
http://arxiv.org/abs/2403.14939v1
Compressor summary: STAG4D is a novel framework that combines diffusion models with dynamic 3D Gaussian splatting to generate high-fidelity 4D content with spatial-temporal consistency from various inputs.
http://arxiv.org/abs/2403.14938v1
Compressor summary: The study compares four large language models' performance in zero-shot counterspeech generation and proposes three prompting strategies to improve quality and reduce toxicity.
http://arxiv.org/abs/2403.14937v1
Compressor summary: The text surveys the current state-of-the-art in 3D modeling of articulated objects, focusing on part perception and creation, and discusses geometry processing and articulation modeling challenges and future directions.
http://arxiv.org/abs/2403.14932v1
Compressor summary: The paper introduces a method to improve large language models' reasoning by optimizing their attention mechanisms without extra data, focusing on non-STEM questions.
http://arxiv.org/abs/2403.14922v1
Compressor summary: CODA is a cost-efficient mobile sensing adaptation mechanism that uses active learning to handle real-time drifts and improve human activity recognition.
http://arxiv.org/abs/2403.14919v1
Compressor summary: Hierarchical Skip Decoding is a new method for efficient text generation that adapts to the sequence length and preserves most of the quality.
http://arxiv.org/abs/2403.14918v1
Compressor summary: The research presents a new multilayer perceptron model that outperforms other models in weather forecasting in Itoshima, Kyushu, Japan.
http://arxiv.org/abs/2403.14917v1
Compressor summary: The paper investigates how two-layer neural networks learn features using kernel methods in a mean-field regime, showing their advantages over traditional kernel methods and discussing convergence, error, and regularization aspects.
http://arxiv.org/abs/2403.14910v1
Compressor summary: The text describes a new imbalance phenomenon in Class Incremental Learning and proposes a method called CLAD to address it by enhancing the accuracy of forgotten classes.
http://arxiv.org/abs/2403.14898v1
Compressor summary: The study presents a fast and accurate melanoma detection method that works well on various datasets and deep learning models, enabling efficient skin cancer screening.
http://arxiv.org/abs/2403.14897v1
Compressor summary: The text introduces a novel geometric generative model that uses an equivariant PDE for G-CNNs (PDE-G-CNNs), morphology operators, and GANs to create images with specific features and reduced complexity, while preserving group symmetries and multiscale geometric interpretability; it outperforms a classical GAN on MNIST data.
http://arxiv.org/abs/2403.14895v1
Compressor summary: Stance Reasoner is a method that uses background knowledge and reasoning to detect opinions on new topics without training data.