arxiv compressed, 2023-12-15

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-15, generated by the compressor, my personal LLM-based project.


SAM-guided Graph Cut for 3D Instance Segmentation

Haoyu Guo,He Zhu,Sida Peng,Yuang Wang,Yujun Shen,Ruizhen Hu,Xiaowei Zhou

http://arxiv.org/abs/2312.08372v1

Compressor summary: The paper proposes a novel 3D-to-2D query framework to use 2D segmentation models for 3D instance segmentation, improving generalization ability and robustness across various scenes.


PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Kuan-Chih Huang,Weijie Lyu,Ming-Hsuan Yang,Yi-Hsuan Tsai

http://arxiv.org/abs/2312.08371v1

Compressor summary: The paper proposes a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection using current-frame objects and their historical trajectories as input.


VLAP: Efficient Video-Language Alignment via Frame Prompting and Distilling for Video Question Answering

Xijun Wang,Junbang Liang,Chun-Kai Wang,Kenan Deng,Yu Lou,Ming Lin,Shan Yang

http://arxiv.org/abs/2312.08367v1

Compressor summary: Key points:
- VLAP model for efficient and effective video-language alignment
- Frame-Prompter and QFormer-Distiller modules for frame sampling and cross-modal alignment
- Improved accuracy, latency, and speed up compared to prior work and state-of-the-art methods
Summary: The VLAP model uses Frame-Prompter and QFormer-Distiller to align video and language efficiently and effectively, achieving better results on video question-answering benchmarks.


See, Say, and Segment: Teaching LMMs to Overcome False Premises

Tsung-Han Wu,Giscard Biamby,David Chan,Lisa Dunlap,Ritwik Gupta,Xudong Wang,Joseph E. Gonzalez,Trevor Darrell

http://arxiv.org/abs/2312.08366v1

Compressor summary: The paper proposes a method for large multimodal models to detect and correct false premises in image segmentation tasks, improving their performance and human interaction.


An Invitation to Deep Reinforcement Learning

Bernhard Jaeger,Andreas Geiger

http://arxiv.org/abs/2312.08365v1

Compressor summary: The text introduces reinforcement learning as a generalization of supervised learning for optimizing non-differentiable objectives in deep neural networks, and provides an accessible introduction to state-of-the-art algorithms like PPO.


View-Dependent Octree-based Mesh Extraction in Unbounded Scenes for Procedural Synthetic Data

Zeyu Ma,Alexander Raistrick,Lahav Lipson,Jia Deng

http://arxiv.org/abs/2312.08364v1

Compressor summary: OcMesher is a mesh extraction algorithm that creates octrees from signed distance functions and camera views to generate high-quality synthetic data for computer vision tasks.


Distributed Inference and Fine-tuning of Large Language Models Over The Internet

Alexander Borzunov,Max Ryabinin,Artem Chumachenko,Dmitry Baranchuk,Tim Dettmers,Younes Belkada,Pavel Samygin,Colin Raffel

http://arxiv.org/abs/2312.08361v1

Compressor summary: The authors propose cost-efficient methods for running large language models on geodistributed devices, addressing reliability and load-balancing challenges with special algorithms to achieve up to 10x faster inference.


Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

Anand Siththaranjan,Cassidy Laidlaw,Dylan Hadfield-Menell

http://arxiv.org/abs/2312.08358v1

Compressor summary: Preference learning from human feedback depends on incomplete data with hidden context, which can lead to counter-intuitive results and vulnerabilities in RLHF; distributional preference learning (DPL) methods can mitigate these issues.


FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Bowen Wen,Wei Yang,Jan Kautz,Stan Birchfield

http://arxiv.org/abs/2312.08344v1

Compressor summary: FoundationPose is a single model that estimates and tracks 6D poses of novel objects using CAD models or reference images, achieving strong generalization through large-scale synthetic training aided by large language models.


Global Latent Neural Rendering

Thomas Tanay,Matteo Maggioni

http://arxiv.org/abs/2312.08338v1

Compressor summary: The text describes ConvGLR, a new method for novel view synthesis that uses a global rendering operator in a low-resolution latent space to improve performance over existing methods.


LD-SDM: Language-Driven Hierarchical Species Distribution Modeling

Srikumar Sastry,Xin Xing,Aayush Dhakal,Subash Khanal,Adeel Ahmad,Nathan Jacobs

http://arxiv.org/abs/2312.08334v1

Compressor summary: The authors propose a language model-based method for species distribution modeling that uses taxonomic hierarchy and a novel proximity-aware evaluation metric, achieving state-of-the-art results on various tasks.


PnPNet: Pull-and-Push Networks for Volumetric Segmentation with Boundary Confusion

Xin You,Ming Ding,Minghui Zhang,Hanxiao Zhang,Yi Yu,Jie Yang,Yun Gu

http://arxiv.org/abs/2312.08323v1

Compressor summary: The paper proposes a unified network, PnPNet, that uses pushing and pulling branches to generate precise boundary segmentation of volumetric images, improving diagnosis and intervention in clinical practice.


Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models

Jiang Zhang,Qiong Wu,Yiming Xu,Cheng Cao,Zheng Du,Konstantinos Psounis

http://arxiv.org/abs/2312.08303v1

Compressor summary: BD-LLM is a method to improve toxic content detection by using decision trees to guide Large Language Models and distill their knowledge into smaller, faster models.


Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data

Van Minh Nguyen,Nasheen Nur,William Stern,Thomas Mercer,Chiradeep Sen,Siddhartha Bhattacharyya,Victor Tumbiolo,Seng Jhing Goh

http://arxiv.org/abs/2312.08299v1

Compressor summary: The study uses AI to analyze social media posts and identify patterns of suicidal behavior by assigning attribution scores to tokens in users' texts.


VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space

Guénolé Fiche,Simon Leglaive,Xavier Alameda-Pineda,Antonio Agudo,Francesc Moreno-Noguer

http://arxiv.org/abs/2312.08291v1

Compressor summary: The authors propose a novel method for human pose and shape estimation using a low-dimensional discrete latent representation, achieving realistic results and outperforming current non-parametric approaches.


Hybrid Sample Synthesis-based Debiasing of Classifier in Limited Data Setting

Piyush Arora,Pratik Mazumder

http://arxiv.org/abs/2312.08288v1

Compressor summary: This paper proposes a novel method to reduce bias in deep learning models without prior knowledge, especially when training data is limited, by synthesizing hybrid samples that balance the bias and improve predictions.


On the verification of Embeddings using Hybrid Markov Logic

Anup Shakya,Abisha Thapa Magar,Somdeb Sarkhel,Deepak Venugopal

http://arxiv.org/abs/2312.08287v1

Compressor summary: The text proposes a framework using Hybrid Markov Logic Networks to verify complex properties of learned representations from Deep Neural Networks by encoding verification as a Mixed Integer Linear Program.


Prompting LLMs with content plans to enhance the summarization of scientific articles

Aldan Creo,Manuel Lama,Juan C. Vidal

http://arxiv.org/abs/2312.08282v1

Compressor summary: This paper proposes new ways to help automatic summarizers handle long and complex scientific articles by providing them with key terms from the articles, improving performance especially for smaller models.


High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models

Songchi Zhou,Sheng Yu

http://arxiv.org/abs/2312.08274v2

Compressor summary: Key points:
- The paper proposes a method for high-throughput biomedical relation extraction using large language models (LLMs) and their reading comprehension ability and world knowledge.
- The method formulates the task as a binary classification problem, uses a biomedical thesaurus to match head entities, and slices text into chunks for compatibility with LLMs.
- The method achieves performance comparable to GPT-4 on a curated benchmark dataset and can be extended to different semi-structured biomedical websites.
Summary: The paper presents a method that leverages large language models' reading comprehension and world knowledge for extracting various types of biomedical relations from semi-structured web articles, with performance comparable to GPT-4.


Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation

Arul Selvam Periyasamy,Vladimir Tsaturyan,Sven Behnke

http://arxiv.org/abs/2312.08268v1

Compressor summary: The paper proposes a new vision transformer model for multi-object pose estimation with inductive biases, deformable attention, and query aggregation, achieving state-of-the-art results on the YCB-Video dataset.


A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing

Gwilherm Lesné,Yann Gousseau,Saïd Ladjal,Alasdair Newson

http://arxiv.org/abs/2312.08256v1

Compressor summary: Key points:
- The paper proposes an auto-encoder for controlled image editing using StyleGAN
- The auto-encoder re-organizes the latent space of StyleGAN to encourage disentanglement of attributes
- The approach has shorter training time and better disentanglement than competing methods
Summary: The paper presents a simple auto-encoder that edits images using StyleGAN by re-organizing its latent space for disentangled attribute editing, with faster training and higher quality.


A Survey of Generative AI for Intelligent Transportation Systems

Huan Yan,Yong Li

http://arxiv.org/abs/2312.08248v1

Compressor summary: Generative AI is a vital tool for improving traffic management and optimization by addressing various issues in different tasks within intelligent transportation systems.


Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation

Yujun Chen,Xin Tan,Zhizhong Zhang,Yanyun Qu,Yuan Xie

http://arxiv.org/abs/2312.08234v1

Compressor summary: The paper presents a semi-supervised segmentation method for point clouds using latent labels from LiDAR and image data, improving performance over the state-of-the-art method, LaserMix.


Partial Symmetry Detection for 3D Geometry using Contrastive Learning with Geodesic Point Cloud Patches

Gregor Kobsik,Isaak Lim,Leif Kobbelt

http://arxiv.org/abs/2312.08230v1

Compressor summary: Key points:
- The paper proposes a self-supervised method for detecting partial and extrinsic symmetries in 3D shapes using contrastive learning and geodesic point cloud patches.
- The method learns rotation, reflection, translation and scale invariant local shape features that generalize across different datasets and classes.
- The paper introduces a new benchmark test for this task and shows how the detected symmetries can be used for 3D shape partitioning.
Summary: The paper presents a novel self-supervised approach to detect partial and extrinsic symmetries in 3D shapes using contrastive learning on geodesic point cloud patches, and demonstrates its applications for shape analysis and partitioning.


GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time

Haoran Ye,Jiarui Wang,Helan Liang,Zhiguang Cao,Yong Li,Fanzhang Li

http://arxiv.org/abs/2312.08224v1

Compressor summary: GLOP is a hierarchical framework that combines non-autoregressive and autoregressive neural heuristics to efficiently solve large-scale routing problems, achieving state-of-the-art real-time performance.


Patch-wise Graph Contrastive Learning for Image Translation

Chanyong Jung,Gihyun Kwon,Jong Chul Ye

http://arxiv.org/abs/2312.08223v1

Compressor summary: The paper proposes a method for image translation using graph neural networks and patch-wise similarity to capture semantic correspondence between input and output images.


Curriculum-Enhanced Residual Soft An-Isotropic Normalization for Over-smoothness in Deep GNNs

Jin Li,Qirong Zhang,Shuling Xu,Xinlong Chen,Longkun Guo,Yang-Geng Fu

http://arxiv.org/abs/2312.08221v2

Compressor summary: Key points:
- The paper proposes a soft graph normalization method and a label-smoothing-based learning framework to improve deep GNNs.
- The method can preserve node diversity, capture input knowledge, and enhance optimization with residual connections.
- The method outperforms existing baselines on twelve real-world node classification benchmarks.
Summary: The paper presents a new approach to deepen graph neural networks by normalizing graphs softly, using residual connections, and learning from smoothed labels, which improves node classification performance.


EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset

Peiqi Duan,Boyu Li,Yixin Yang,Hanyue Lou,Minggui Teng,Yi Ma,Boxin Shi

http://arxiv.org/abs/2312.08220v1

Compressor summary: The paper presents EventAid, a real-captured hybrid dataset and benchmark for evaluating event-aided image/video enhancement algorithms on five specific tasks, in which event cameras complement traditional frame-based cameras in dynamic range, speed, and other aspects.


LAMM: Label Alignment for Multi-Modal Prompt Learning

Jingsheng Gao,Jiacheng Ruan,Suncheng Xiang,Zefang Yu,Ke Ji,Mingye Xie,Ting Liu,Yuzhuo Fu

http://arxiv.org/abs/2312.08212v1

Compressor summary: The paper introduces LAMM, a label alignment method for pre-trained visual-language models that improves few-shot learning and continual learning performance by adjusting category embeddings and using a hierarchical loss.


SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space

Yunchen Li,Zhou Yu,Gaoqi He,Yunhang Shen,Ke Li,Xing Sun,Shaohui Lin

http://arxiv.org/abs/2312.08200v1

Compressor summary: The paper proposes a novel generative model, SPD-DDPM, which uses Gaussian distributions in the SPD space for efficient predictions on large-scale data.


Towards Model-Based Data Acquisition for Subjective Multi-Task NLP Problems

Kamil Kanclerz,Julita Bielaniewicz,Marcin Gruza,Jan Kocon,Stanisław Woźniak,Przemysław Kazienko

http://arxiv.org/abs/2312.08198v1

Compressor summary: Key points:
- Data annotated by humans is valuable but costly for subjective NLP tasks
- A new model-based approach reduces annotations needed with little knowledge loss
- The method highlights the importance of diverse data collection and multi-task learning for subjective NLP
Summary: The paper proposes a model that cuts annotation costs for subjective NLP problems by selecting tasks individually, while preserving knowledge quality and emphasizing the need for diverse data.


Concept-centric Personalization with Large-scale Diffusion Priors

Pu Cao,Lu Yang,Feng Zhou,Tianrui Huang,Qing Song

http://arxiv.org/abs/2312.08195v1

Compressor summary: The paper proposes a framework for customizing diffusion models to generate high-quality images for specific concepts while maintaining versatility and controllability, using generalized classifier-free guidance and a concept-specific generator.


SVInvNet: A Densely Connected Encoder-Decoder Architecture for Seismic Velocity Inversion

Mojtaba Najafi Khatounabad,Hacer Yalim Keles,Selma Kadioglu

http://arxiv.org/abs/2312.08194v1

Compressor summary: The study introduces SVInvNet, a deep learning-based approach for seismic velocity inversion that performs better than FWI with diverse seismic models and varying noise levels.


PAD: Self-Supervised Pre-Training with Patchwise-Scale Adapter for Infrared Images

Tao Zhang,Kun Ding,Jinyong Wen,Yu Xiong,Zeyu Zhang,Shiming Xiang,Chunhong Pan

http://arxiv.org/abs/2312.08192v1

Compressor summary: The paper proposes PAD, a pre-training paradigm for infrared images that uses adapters to learn domain-specific features while retaining general feature extraction ability, and shows its effectiveness on three downstream tasks.


Advanced Image Segmentation Techniques for Neural Activity Detection via C-fos Immediate Early Gene Expression

Peilin Cai

http://arxiv.org/abs/2312.08177v1

Compressor summary: The paper applies advanced image segmentation techniques using CNNs and the Unet model to analyze C-fos gene expression, a marker for neural activity, and develops a novel workflow with pre-processing steps and labeling approaches for efficient and automated segmentation.


ASC: Adaptive Scale Feature Map Compression for Deep Neural Network

Yuan Yao,Tian-Sheuan Chang

http://arxiv.org/abs/2312.08176v1

Compressor summary: The proposed technique compresses feature maps using adaptive scale interpolation and independent channel indexing, achieving high compression rates and low hardware costs.


Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers

Haifeng Huang,Zehan Wang,Rongjie Huang,Luping Liu,Xize Cheng,Yang Zhao,Tao Jin,Zhou Zhao

http://arxiv.org/abs/2312.08168v1

Compressor summary: The paper proposes a method to use object identifiers for referring to multiple objects in 3D scenes and fine-tunes a language model on various tasks using instruction tuning.


CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem

Qian Chen,Taolin Zhang,Dongyang Li,Xiaofeng He

http://arxiv.org/abs/2312.08157v1

Compressor summary: CIDR is a new method for finding minimal features in natural language processing models by detecting interactions between them using Cooperative Integrated Gradients and solving a knapsack problem.


Active learning with biased non-response to label requests

Thomas Robinson,Niek Tax,Richard Mudd,Ido Guy

http://arxiv.org/abs/2312.08150v1

Compressor summary: The paper proposes UCB-EU, a cost-based sampling strategy for active learning that reduces the impact of biased non-response on prediction models' performance in real-world contexts.


ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields

Juan Luis Gonzalez Bello,Minh-Quan Viet Bui,Munchurl Kim

http://arxiv.org/abs/2312.08136v1

Compressor summary: ProNeRF is a novel neural rendering method that balances memory, speed, and quality by using projection-aware sampling and a new training strategy for efficient ray exploration and exploitation.


Clockwork Diffusion: Efficient Generation With Model-Step Distillation

Amirhossein Habibian,Amir Ghodrati,Noor Fathima,Guillaume Sautiere,Risheek Garrepalli,Fatih Porikli,Jens Petersen

http://arxiv.org/abs/2312.08128v1

Compressor summary: Clockwork Diffusion is a method that saves computational resources in text-to-image diffusion models by reusing previous denoising operations on low-resolution feature maps.


Neural Radiance Fields for Transparent Object Using Visual Hull

Heechan Yoon,Seungkyu Lee

http://arxiv.org/abs/2312.08118v1

Compressor summary: The paper proposes a method to improve NeRF for synthesizing transparent objects by considering their refraction using visual hull, Snell's law, and NeRF sampling.


Towards Better Morphed Face Images without Ghosting Artifacts

Clemens Seibold,Anna Hilsmann,Peter Eisert

http://arxiv.org/abs/2312.08111v1

Compressor summary: The paper proposes a method to prevent ghosting artifacts in face morphing by aligning pixels during generation and improves detection resistance and biometric quality.


Causal Optimal Transport of Abstractions

Yorgos Felekis,Fabio Massimo Zennaro,Nicola Branchini,Theodoros Damoulas

http://arxiv.org/abs/2312.08107v1

Compressor summary: COTA is a novel method to learn abstraction maps between causal models using multi-marginal Optimal Transport and interventional data, without assuming complete knowledge of the underlying models.


Machine Learning for the Multi-Dimensional Bin Packing Problem: Literature Review and Empirical Evaluation

Wenjie Wu,Changjun Fan,Jincai Huang,Zhong Liu,Junchi Yan

http://arxiv.org/abs/2312.08103v1

Compressor summary: The article surveys machine learning methods for solving multi-dimensional bin packing problems and provides a benchmark dataset and future research directions.


3DGEN: A GAN-based approach for generating novel 3D models from image data

Antoine Schnepf,Flavian Vasile,Ugo Tanielian

http://arxiv.org/abs/2312.08094v1

Compressor summary: 3DGEN is a model that uses Neural Radiance Fields and GANs to generate realistic 3D meshes from images for various creative applications.


A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis

Tianshuo Peng,Zuchao Li,Ping Wang,Lefei Zhang,Hai Zhao

http://arxiv.org/abs/2312.08084v1

Compressor summary: DQPSA is a novel framework for multi-modal sentiment analysis that uses prompt as dual query to extract relevant visual information and an energy-based pairwise expert to model target boundaries.


Extending Whisper with prompt tuning to target-speaker ASR

Hao Ma,Zhiyuan Peng,Mingjie Shao,Jing Li,Ju Liu

http://arxiv.org/abs/2312.08079v1

Compressor summary: The paper proposes prompt tuning to extend a single-talker ASR model to target-speaker ASR, achieving comparable performance to full fine-tuning with much less training cost and retaining the original features of the model.


Fine-Grained Image-Text Alignment in Medical Imaging Enables Cyclic Image-Report Generation

Wenting Chen,Xiang Li,Linlin Shen,Yixuan Yuan

http://arxiv.org/abs/2312.08078v2

Compressor summary: The Adaptive patch-word Matching (AdaMatch) model uses adaptive patches and keywords to generate explainable chest X-ray reports from images and text.


TERM Model: Tensor Ring Mixture Model for Density Estimation

Ruituo Wu,Jiani Liu,Ce Zhu,Anh-Huy Phan,Ivan V. Oseledets,Yipeng Liu

http://arxiv.org/abs/2312.08075v1

Compressor summary: The paper proposes a tensor ring decomposition method for probabilistic graph modeling, which improves expressiveness and flexibility compared to existing methods, and uses an ensemble learning-inspired mixture model to incorporate multiple permutation candidates for better probability density estimation.


Novel View Synthesis with View-Dependent Effects from a Single Image

Juan Luis Gonzalez Bello,Munchurl Kim

http://arxiv.org/abs/2312.08071v1

Compressor summary: Key points:
- Paper proposes a self-supervised learning method for single-image NVS that considers view-dependent effects (VDE)
- VDEs are modeled as negative disparity using camera motion priors and specularities
- Method uses relaxed volumetric rendering to improve efficiency
- Outperforms state-of-the-art methods on two datasets
Summary: The paper presents a self-supervised learning method for synthesizing novel views that captures view-dependent effects by modeling them as negative disparity and using relaxed volumetric rendering, achieving better results than existing methods.


A Novel Metric for Measuring Data Quality in Classification Applications (extended version)

Jouseau Roxane,Salva Sébastien,Samir Chafik

http://arxiv.org/abs/2312.08066v1

Compressor summary: The paper introduces a new metric to measure data quality for machine learning models, based on the relationship between performance and data deterioration.


Exploring the Impact of Lay User Feedback for Improving AI Fairness

Evdoxia Taka,Yuri Nakao,Ryosuke Sonoda,Takuya Yokota,Lin Luo,Simone Stumpf

http://arxiv.org/abs/2312.08064v1

Compressor summary: The authors explore how to involve non-experts in improving AI fairness by collecting feedback on a credit model and studying its effects, while also providing resources for further research.


Estimation of Concept Explanations Should be Uncertainty Aware

Vihari Piratla,Juyeon Heo,Sukriti Singh,Adrian Weller

http://arxiv.org/abs/2312.08063v1

Compressor summary: The authors propose a Bayesian method to improve the reliability of concept explanations in multi-modal learning models, which are valuable for interpreting and debugging predictions using human-understandable concepts.


C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation

Florian Fervers,Sebastian Bullinger,Christoph Bodensteiner,Michael Arens,Rainer Stiefelhagen

http://arxiv.org/abs/2312.08060v1

Compressor summary: The paper proposes C-BEV, a novel retrieval method for geolocating street-view images using bird's eye view maps, which outperforms existing methods in challenging scenarios and infers camera pose without metric groundtruth.


Combinatorial Stochastic-Greedy Bandit

Fares Fourati,Christopher John Quinn,Mohamed-Slim Alouini,Vaneet Aggarwal

http://arxiv.org/abs/2312.08057v1

Compressor summary: The SGB algorithm is a novel combinatorial bandit method that samples and selects actions from a subset of unselected arms, achieving better regret bounds than existing methods for constrained social influence maximization.


Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision

Shengguang Wu,Zhenglun Chen,Qi Su

http://arxiv.org/abs/2312.08056v1

Compressor summary: Key points:
- The paper proposes a novel artifact image synthesis approach that uses diffusion models, archaeological knowledge, and historical expertise.
- The approach generates higher-quality images that capture intricate details and align with written documents.
- The approach outperforms existing methods in automatic metrics and human evaluation.
Summary: The paper presents a new method to generate high-quality images of ancient artifacts using diffusion models, archaeological knowledge, and historical expertise, achieving better alignment with written texts and outperforming previous approaches.


Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence

Zifan Wang,Zhuorui Ye,Haoran Wu,Junyu Chen,Li Yi

http://arxiv.org/abs/2312.08054v1

Compressor summary: The paper proposes SCSFNet, a novel network that predicts future scenes and their semantic labels from dynamic point cloud sequences using a hybrid geometric representation and an attention-based skip connection scheme.


Kimad: Adaptive Gradient Compression with Bandwidth Awareness

Jihao Xin,Ivan Ilin,Shunkang Zhang,Marco Canini,Peter Richtárik

http://arxiv.org/abs/2312.08053v1

Compressor summary: Kimad is an adaptive compression method for distributed deep learning that adjusts to network layer needs and improves communication efficiency.


Explainable Trajectory Representation through Dictionary Learning

Yuanbo Tang,Zhiyuan Peng,Yang Li

http://arxiv.org/abs/2312.08052v1

Compressor summary: The paper proposes a sparse and interpretable trajectory representation framework using pathlet dictionaries that improves downstream applications like traffic analysis and data compression.


Compositional Inversion for Stable Diffusion Models

Xu-Lu Zhang,Xiao-Yong Wei,Jin-Lin Wu,Tian-Yi Zhang,Zhaoxiang Zhang,Zhen Lei,Qing Li

http://arxiv.org/abs/2312.08048v2

Compressor summary: The paper proposes a method to improve inversion methods that generate personalized images by guiding the inversion process towards the core distribution and using spatial regularization, resulting in more diverse and balanced compositions of concepts.


CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs

Huaiyuan Ying,Zhengyun Zhao,Yang Zhao,Sihang Zeng,Sheng Yu

http://arxiv.org/abs/2312.08036v1

Compressor summary: CoRTEx uses ChatGPT explanations and contrastive learning to improve term clustering in biomedical knowledge graphs, achieving high accuracy and robustness.


Beyond Top-Class Agreement: Using Divergences to Forecast Performance under Distribution Shift

Mona Schirmer,Dan Zhang,Eric Nalisnick

http://arxiv.org/abs/2312.08033v1

Compressor summary: The paper investigates how different model disagreement measures based on divergences perform in detecting out-of-distribution data and estimating test errors, using various vision models.


ClusterDDPM: An EM clustering framework with Denoising Diffusion Probabilistic Models

Jie Yan,Jing Liu,Zhong-yuan Zhang

http://arxiv.org/abs/2312.08029v1

Compressor summary: The paper proposes a new EM framework for clustering using DDPMs, which shows better performance in clustering and related tasks than VAEs and GANs.


Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning

Jinta Weng,Jiarui Zhang,Yue Hu,Daidong Fa,Xiaofeng Xuand,Heyan Huang

http://arxiv.org/abs/2312.08027v1

Compressor summary: MTPrompt is a method that improves the performance of chatbots using large language models by incorporating task-related information into prompts, making it easier to access the model's knowledge.


Mono3DVG: 3D Visual Grounding in Monocular Images

Yang Zhan,Yuan Yuan,Zhitong Xiong

http://arxiv.org/abs/2312.08022v1

Compressor summary: The paper introduces a new task, Mono3DVG, that uses language descriptions with appearance and geometry information to locate 3D objects in monocular RGB images, builds the Mono3DRefer dataset for it, and proposes a transformer-based network, Mono3DVG-TR, that leverages both text and visual features for this task.


Generalized Deepfakes Detection with Reconstructed-Blended Images and Multi-scale Feature Reconstruction Network

Yuyang Sun,Huy H. Nguyen,Chun-Shien Lu,ZhiYong Zhang,Lu Sun,Isao Echizen

http://arxiv.org/abs/2312.08020v1

Compressor summary: The text proposes a blending-based method for detecting digital face manipulations using synthetic training samples and a multi-scale feature reconstruction network, which performs well on unseen data.


AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing

Zhiyuan Ma,Guoli Jia,Bowen Zhou

http://arxiv.org/abs/2312.08019v1

Compressor summary: The paper proposes AdapEdit, an algorithm that adapts image editing to text-driven instructions using soft attention, improving visual content generation without needing extra training or data.


uSF: Learning Neural Semantic Field with Uncertainty

Vsevolod Skorokhodov,Darya Drozdova,Dmitry Yudin

http://arxiv.org/abs/2312.08012v1

Compressor summary: The paper introduces uSF, a neural network model that reconstructs 3D scenes with color and semantic labels, as well as uncertainty estimates, improving performance with limited training data.


EZ-CLIP: Efficient Zeroshot Video Action Recognition

Shahzad Ahmad,Sukalpa Chanda,Yogesh S Rawat

http://arxiv.org/abs/2312.08010v1

Compressor summary: EZ-CLIP is an efficient adaptation of CLIP that leverages temporal visual prompts for video action recognition and zero-shot learning with fewer parameters.


Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix

Kewei Wang,Yizheng Wu,Zhiyu Pan,Xingyi Li,Ke Xian,Zhe Wang,Zhiguo Cao,Guosheng Lin

http://arxiv.org/abs/2312.08009v2

Compressor summary: The study explores semi-supervised learning for class-agnostic motion prediction in autonomous driving, using a consistency-based self-training paradigm, a novel motion selection and re-generation module, and two data augmentation strategies to improve performance while reducing annotation costs.


Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

Wenxuan Wang,Tongtian Yue,Yisi Zhang,Longteng Guo,Xingjian He,Xinlong Wang,Jing Liu

http://arxiv.org/abs/2312.08007v1

Compressor summary: The paper introduces a finer-grained, part-level referring expression segmentation task, together with the RefCOCOm and MRES-32M datasets and the UniRES model, which outperforms previous methods on both object-level and part-level vision-language understanding.


Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning

Yang Jiao,Zequn Jie,Shaoxiang Chen,Lechao Cheng,Jingjing Chen,Lin Ma,Yu-Gang Jiang

http://arxiv.org/abs/2312.08004v1

Compressor summary: The paper introduces IA-BEV, a method that integrates instance awareness into depth estimation for object detection in autonomous driving using camera-based bird's-eye-view perception.


A Unified View on Forgetting and Strong Equivalence Notions in Answer Set Programming

Zeynep G. Saribatur,Stefan Woltran

http://arxiv.org/abs/2312.07993v1

Compressor summary: The paper introduces a new equivalence notion for ASP programs called relativized simplifications, which captures different notions of forgetting and abstraction in knowledge representation and reasoning.


Accelerating the Global Aggregation of Local Explanations

Alon Mor,Yonatan Belinkov,Benny Kimelfeld

http://arxiv.org/abs/2312.07991v1

Compressor summary: The text describes methods to quickly find and rank important words in a document using statistical analysis, while reducing noise and computational cost.


SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Róbert Csordás,Piotr Piękos,Kazuki Irie,Jürgen Schmidhuber

http://arxiv.org/abs/2312.07987v2

Compressor summary: The SwitchHead method reduces the compute and memory needs of self-attention layers in Transformers using Mixture-of-Experts (MoE) layers, achieving speedup without sacrificing performance.


Multi-perspective Feedback-attention Coupling Model for Continuous-time Dynamic Graphs

Xiaobo Zhu,Yan Wu,Zhipeng Li,Hailong Su,Jin Che,Zhanheng Chen,Liying Wang

http://arxiv.org/abs/2312.07983v1

Compressor summary: The MPFA model combines evolving and raw perspectives in graph networks, enabling efficient learning of interleaved dynamics with fewer temporal neighbors.


Time Series Diffusion Method: A Denoising Diffusion Probabilistic Model for Vibration Signal Generation

Haiming Yi,Lei Hou,Yuhong Jin,Nasser A. Saeed

http://arxiv.org/abs/2312.07981v1

Compressor summary: The paper proposes a Time Series Diffusion Method (TSDM) for vibration signal generation using a U-Net architecture with an attention block, which improves small-sample fault diagnosis accuracy on bearing datasets.


SLJP: Semantic Extraction based Legal Judgment Prediction

Prameela Madambakam,Shathanaa Rajmohan,Himangshu Sharma,Tummepalli Anka Chandrahas Purushotham Gupta

http://arxiv.org/abs/2312.07979v1

Compressor summary: The proposed semantic-extraction-based legal judgment prediction (LJP) model uses pretrained transformers to understand complex legal case documents, extract semantics at multiple levels, and predict judgments with an attention mechanism.


Challenges of YOLO Series for Object Detection in Extremely Heavy Rain: CALRA Simulator based Synthetic Evaluation Dataset

T. Kim,H. Jeon,Y. Lim

http://arxiv.org/abs/2312.07976v2

Compressor summary: The study presents a new dataset for testing object detection in different weather conditions using the CARLA simulator and evaluates the YOLO series' performance in various rain scenarios.


LMD: Faster Image Reconstruction with Latent Masking Diffusion

Zhiyuan Ma,Zhihuan Yu,Jianjun Li,Bowen Zhou

http://arxiv.org/abs/2312.07971v1

Compressor summary: LMD is a faster image reconstruction framework that combines masking diffusion from autoencoders and diffusion probabilistic models, reducing training time and improving inference speed.


Divide and Conquer: Hybrid Pre-training for Person Search

Yanling Tian,Di Chen,Yunan Liu,Jian Yang,Shanshan Zhang

http://arxiv.org/abs/2312.07970v1

Compressor summary: The paper proposes a hybrid pre-training framework for person search, using sub-task data only, and shows significant improvements across diverse protocols.


ASLseg: Adapting SAM in the Loop for Semi-supervised Liver Tumor Segmentation

Shiyun Chen,Li Lin,Pujin Cheng,Xiaoying Tang

http://arxiv.org/abs/2312.07969v1

Compressor summary: ASLseg is a new semi-supervised framework that improves liver tumor segmentation by adapting and refining the SAM model using pseudo-labels and an adaptation network.


A multi-sourced data and agent-based approach for complementing Time Use Surveys in the context of residential human activity and load curve simulation

Mathieu Schumann,Quentin Reynaud,François Sempé,Julien Guibourdenche,Jean-Baptiste Ly,Nicolas Sabouret

http://arxiv.org/abs/2312.07966v1

Compressor summary: The SMACH approach combines qualitative and quantitative data with agent-based simulation to improve the representation of Time-Use Surveys in activity and energy simulation.


Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking

Shengsheng Qian,Yifei Wang,Dizhan Xue,Shengjie Zhang,Huaiwen Zhang,Changsheng Xu

http://arxiv.org/abs/2312.07955v1

Compressor summary: The paper proposes a novel PoisonCAM method that precisely detects and removes SSL backdoors by cluster activation masking, achieving better results than existing methods on ImageNet-100.


Semantic-aware Data Augmentation for Text-to-image Synthesis

Zhaorui Tan,Xi Yang,Kaizhu Huang

http://arxiv.org/abs/2312.07951v1

Compressor summary: SADA is a novel framework for text-to-image synthesis that uses semantic-aware data augmentation and image semantic regularization to improve consistency and quality of generated images.


CBQ: Cross-Block Quantization for Large Language Models

Xin Ding,Xiaoyu Liu,Yun Zhang,Zhijun Tu,Wei Li,Jie Hu,Hanting Chen,Yehui Tang,Zhiwei Xiong,Baoqun Yin,Yunhe Wang

http://arxiv.org/abs/2312.07950v1

Compressor summary: CBQ is a quantization method for large language models that uses cross-block reconstruction to reduce errors from block quantization, together with outlier handling techniques, for better low-bit quantization.


ReFusion: Learning Image Fusion from Reconstruction with Learnable Loss via Meta-Learning

Haowen Bai,Zixiang Zhao,Jiangshe Zhang,Yichen Wu,Lilun Deng,Yukun Cui,Shuang Xu,Baisong Jiang

http://arxiv.org/abs/2312.07943v1

Compressor summary: ReFusion is a meta-learning based framework that learns optimal fusion loss from reconstructing source images, allowing it to adapt to diverse image fusion tasks.


BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics

Wenqian Zhang,Molin Huang,Yuxuan Zhou,Juze Zhang,Jingyi Yu,Jingya Wang,Lan Xu

http://arxiv.org/abs/2312.07937v1

Compressor summary: The paper introduces BOTH57M, a new dataset for generating realistic two-hand motions based on both body dynamics and text prompts, and proposes BOTH2Hands, a method that combines diffusion models and a cross-attention transformer for this task.


Comparing YOLOv8 and Mask RCNN for object segmentation in complex orchard environments

Ranjan Sapkota,Dawood Ahmed,Manoj Karkee

http://arxiv.org/abs/2312.07935v1

Compressor summary: YOLOv8 outperforms Mask R-CNN in instance segmentation for apple trees and immature apples under different conditions, with faster inference times.


Levenshtein Distance Embedding with Poisson Regression for DNA Storage

Xiang Wei,Alan J. X. Guo,Sihan Sun,Mengyi Wei,Wei Yu

http://arxiv.org/abs/2312.07931v1

Compressor summary: The paper proposes a neural network-based sequence embedding technique using Poisson regression for efficient computation or approximation of Levenshtein distance in biological applications like DNA storage.


Towards Optimal Statistical Watermarking

Baihe Huang,Banghua Zhu,Hanlin Zhu,Jason D. Lee,Jiantao Jiao,Michael I. Jordan

http://arxiv.org/abs/2312.07930v1

Compressor summary: The paper proposes a new statistical framework for watermarking, improves the rate and trade-off between error types, and explores model-agnostic and robust watermarking scenarios.


Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation

Weiguang Zhang,Qiufeng Wang,Kaizhu Huang

http://arxiv.org/abs/2312.07925v1

Compressor summary: The paper proposes Polar-Doc, a one-stage document dewarping model that removes geometric distortion from document photos for text recognition by using a polar coordinate representation instead of Cartesian coordinates together with multi-scope constraints, improving efficiency and achieving state-of-the-art results on two benchmarks.


Memory-Efficient Reversible Spiking Neural Networks

Hong Zhang,Yu Zhang

http://arxiv.org/abs/2312.07922v1

Compressor summary: The paper proposes reversible spiking neural networks that reduce memory costs during training, enabling deeper and larger SNN models with similar accuracies to ANNs.


DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

Xiaoyu Zhou,Zhiwei Lin,Xiaojun Shan,Yongtao Wang,Deqing Sun,Ming-Hsuan Yang

http://arxiv.org/abs/2312.07920v1

Compressor summary: DrivingGaussian is a framework that efficiently reconstructs dynamic autonomous driving scenes with moving objects and high-fidelity details using Gaussian models and a LiDAR prior.


A Survey of Text Watermarking in the Era of Large Language Models

Aiwei Liu,Leyi Pan,Yijian Lu,Jingjing Li,Xuming Hu,Lijie Wen,Irwin King,Philip S. Yu

http://arxiv.org/abs/2312.07913v1

Compressor summary: The text summarizes current text watermarking technologies that can prevent misuse and piracy of generated texts by large language models.


PromptBench: A Unified Library for Evaluation of Large Language Models

Kaijie Zhu,Qinlin Zhao,Hao Chen,Jindong Wang,Xing Xie

http://arxiv.org/abs/2312.07910v1

Compressor summary: PromptBench is a library to evaluate large language models with various components for research purposes.


Plant Disease Recognition Datasets in the Age of Deep Learning: Challenges and Opportunities

Mingle Xu,Ji Eun Park,Jaehwan Lee,Jucheng Yang,Sook Yoon

http://arxiv.org/abs/2312.07905v1

Compressor summary: The study proposes a taxonomy to describe potential plant disease datasets and suggests future directions for creating more challenge-oriented datasets for deep learning applications in plant disease recognition and species identification.


Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models

Junhao Zheng,Shengjie Qiu,Qianli Ma

http://arxiv.org/abs/2312.07887v1

Compressor summary: SEQ* is a simple method for incremental learning with pre-trained language models (PLMs) that outperforms existing techniques, showing that those techniques underestimate PLMs' anti-forgetting ability.


Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI

Kai Huang,Boyuan Yang,Wei Gao

http://arxiv.org/abs/2312.07886v1

Compressor summary: mPnP-LLM is a technique that enables elastic modality adaptation for LLMs at runtime, reducing FLOPs and memory usage while maintaining or improving accuracy on embodied AI tasks.


Mutual-Learning Knowledge Distillation for Nighttime UAV Tracking

Yufeng Liu,Haobo Zuo,Liangliang Yao,Kunhan Lu,Guangze Zheng,Changhong Fu

http://arxiv.org/abs/2312.07884v1

Compressor summary: The paper proposes a novel mutual-learning knowledge distillation framework for nighttime UAV tracking that uses low-light enhancers and a tight coupling-aware tracking backbone to improve performance while reducing computational burden.


CoIE: Chain-of-Instruct Editing for Multi-Attribute Face Manipulation

Zhenduo Zhang,Bowen Zhang,Guang Liu

http://arxiv.org/abs/2312.07879v1

Compressor summary: CoIE is a technique that enhances text-to-image editing models by using a series of instructions from a large language model to manipulate multiple facial attributes more precisely and effectively.


Causality Analysis for Evaluating the Security of Large Language Models

Wei Zhao,Zhe Li,Jun Sun

http://arxiv.org/abs/2312.07876v1

Compressor summary: The study proposes a framework to analyze and understand the security vulnerabilities of large language models, revealing overfitting to harmful prompts and a mysterious neuron with high causal effect on output that can be targeted by attacks.


Enhance Sketch Recognition's Explainability via Semantic Component-Level Parsing

Guangming Zhu,Siyuan Wang,Tianci Wu,Liang Zhang

http://arxiv.org/abs/2312.07875v1

Compressor summary: The paper proposes a structured sketch recognition network with a semantic component-level memory module that can handle different situations of sketch recognition and improve the network's explainability.


MLNet: Mutual Learning Network with Neighborhood Invariance for Universal Domain Adaptation

Yanzuo Lu,Meng Shen,Andy J Ma,Xiaohua Xie,Jian-Huang Lai

http://arxiv.org/abs/2312.07871v2

Compressor summary: The paper proposes a novel Mutual Learning Network (MLNet) for universal domain adaptation, which reduces intra-domain variations and improves unknown-class identification using confidence-guided invariant feature learning and cross-domain mixup.


Graph vs. Sequence: An Empirical Study on Knowledge Forms for Knowledge-Grounded Dialogue

Yizhe Yang,Heyan Huang,Yihang Liu,Yang Gao

http://arxiv.org/abs/2312.07868v1

Compressor summary: The text discusses how to generate informative responses in dialogue using different types of knowledge sources and their effects on the task.


BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering

Xiaojie Hong,Zixin Song,Liangzhi Li,Xiaoli Wang,Feiyan Liu

http://arxiv.org/abs/2312.07867v1

Compressor summary: The paper introduces BESTMVQA, a system that helps users build and evaluate medical image question answering datasets and models.


GraphGuard: Detecting and Counteracting Training Data Misuse in Graph Neural Networks

Bang Wu,He Zhang,Xiangwen Yang,Shuo Wang,Minhui Xue,Shirui Pan,Xingliang Yuan

http://arxiv.org/abs/2312.07861v1

Compressor summary: GraphGuard is a training-data-free method for detecting and mitigating data misuse in graph neural networks on cloud-based machine learning platforms.


Data-Dependent Higher-Order Clique Selection for Artery-Vein Segmentation by Energy Minimization

Yoshiro Kitamura,Yuanzhong Li,Wataru Ito,Hiroshi Ishikawa

http://arxiv.org/abs/2312.07860v1

Compressor summary: The paper presents a new image segmentation technique that uses higher-order potentials to model prior knowledge and improve segment quality, especially for pulmonary vessels.


Invariant Graph Transformer

Zhe Xu,Menghai Pan,Yuzhong Chen,Huiyuan Chen,Yuchen Yan,Mahashweta Das,Hanghang Tong

http://arxiv.org/abs/2312.07859v1

Compressor summary: The paper proposes invariant graph Transformer (IGT), a method that uses self-attention to perform fine-grained interventions on graph data for rationale discovery in graph machine learning, leading to improved model performance.


DTL: Disentangled Transfer Learning for Visual Recognition

Minghao Fu,Ke Zhu,Jianxin Wu

http://arxiv.org/abs/2312.07856v1

Compressor summary: DTL uses a lightweight Compact Side Network to disentangle trainable parameters from the backbone, reducing GPU memory usage and improving accuracy on downstream tasks.


Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users

Tianxun Zhou,Muhammad Nur Shahril Iskandar,Keng-Hwee Chiam

http://arxiv.org/abs/2312.07854v1

Compressor summary: This study proposes a new method using image generation diffusion models to improve markerless gait analysis for lower-limb amputees, providing valuable insights for rehabilitation.


High-Order Structure Based Middle-Feature Learning for Visible-Infrared Person Re-Identification

Liuxiang Qiu,Si Chen,Yan Yan,Jing-Hao Xue,Da-Han Wang,Shunzhi Zhu

http://arxiv.org/abs/2312.07853v2

Compressor summary: A novel network called HOS-Net uses short- and long-range features and high-order structure learning to improve visible-infrared person re-identification, achieving state-of-the-art performance.


Noise in the reverse process improves the approximation capabilities of diffusion models

Karthik Elamvazhuthi,Samet Oymak,Fabio Pasqualetti

http://arxiv.org/abs/2312.07851v2

Compressor summary: The paper compares neural ODEs and neural SDEs as reverse processes in score-based generative models (SGMs), showing that stochasticity improves approximation of trajectories and enables controllability even with limited network width.


Large Language Model Enhanced Multi-Agent Systems for 6G Communications

Feibo Jiang,Li Dong,Yubo Peng,Kezhi Wang,Kun Yang,Cunhua Pan,Dusit Niyato,Octavia A. Dobre

http://arxiv.org/abs/2312.07850v1

Compressor summary: The text describes how integrating large language models with agents' abilities can enhance 6G communications and proposes a multi-agent system for natural language communication tasks in 6G.


Encoder-minimal and Decoder-minimal Framework for Remote Sensing Image Dehazing

Yuanbo Wen,Tao Gao,Ziqi Li,Jing Zhang,Ting Chen

http://arxiv.org/abs/2312.07849v1

Compressor summary: RSHazeNet is a new framework that uses adaptive transposed self-attention, cross-level multi-view interaction, and view-progressive feature learning to efficiently remove haze from remote sensing images.


Finetuning an LLM on Contextual Knowledge of Classics for Q&A

Shane Storm Strachan

http://arxiv.org/abs/2312.07848v1

Compressor summary: The project aims to create a large language model for Classics that is accurate, consistent, and appealing by fine-tuning an open-source LLM with a refined dataset and addressing its occasional hallucinations through continuous finetuning.


On the Dynamics Under the Unhinged Loss and Beyond

Xiong Zhou,Xianming Liu,Hanzhang Wang,Deming Zhai,Junjun Jiang,Xiangyang Ji

http://arxiv.org/abs/2312.07841v1

Compressor summary: The paper proposes a new loss function called unhinged loss that allows more in-depth analysis of deep learning dynamics and offers practical techniques for enhancing training.


Conflict Transformation and Management. From Cognitive Maps to Value Trees

Berkay H. Tosunlu,Joseph H. A. Guillaume,Alexis Tsoukiàs

http://arxiv.org/abs/2312.07838v1

Compressor summary: The text proposes a framework for using problem structuring methods to transform cognitive maps into value trees, enabling design-oriented decision support for conflict management with higher innovation potential.


Synthetic Data: Can We Trust Statistical Estimators?

Alexander Decruyenaere,Heidelinde Dehaene,Paloma Rabaey,Christiaan Polet,Johan Decruyenaere,Stijn Vansteelandt,Thomas Demeester

http://arxiv.org/abs/2312.07837v1

Compressor summary: The text discusses the challenges of drawing statistical inferences from synthetic data and emphasizes the need for better inference methods for this type of data.


Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements

Gaurav Shrivastava,Ser-Nam Lim,Abhinav Shrivastava

http://arxiv.org/abs/2312.07835v1

Compressor summary: The paper proposes a robust framework for low-level vision tasks that learns from corrupted test sequences without external data and uses a novel spatial pyramid loss for noise robustness.


Stable Rivers: A Case Study in the Application of Text-to-Image Generative Models for Earth Sciences

C Kupferschmidt,A. D. Binns,K. L. Kupferschmidt,G. W Taylor

http://arxiv.org/abs/2312.07833v1

Compressor summary: Text-to-image generative models can produce realistic images of rivers but may contain biases from the training data that need to be addressed in earth sciences applications.


Abusive Span Detection for Vietnamese Narrative Texts

Nhu-Thanh Nguyen,Khoa Thi-Kim Phan,Duc-Vu Nguyen,Ngan Luu-Thuy Nguyen

http://arxiv.org/abs/2312.07831v1

Compressor summary: The authors created a Vietnamese dataset for detecting abuse in texts using NLP methods and found that PhoBERT performed best among tested models.


A Deep Learning-Based System for Automatic Case Summarization

Minh Duong,Long Nguyen,Yen Vuong,Trong Le,Ha-Thanh Nguyen

http://arxiv.org/abs/2312.07824v1

Compressor summary: The paper introduces a deep learning system that generates concise and relevant summaries of legal case documents using natural language processing techniques, aiming to benefit legal professionals by reducing workload and increasing efficiency.


Semantic-Lens: Instance-Centric Semantic Alignment for Video Super-Resolution

Qi Tang,Yao Zhao,Meiqin Liu,Jian Jin,Chao Yao

http://arxiv.org/abs/2312.07823v1

Compressor summary: The Semantic Lens is a novel method for video super-resolution that uses semantic priors from degraded videos to improve pixel-level alignment and generate realistic visual results.


Prototypical Self-Explainable Models Without Re-training

Srishti Gautam,Ahcene Boubekki,Marina M. C. Höhne,Michael C. Kampffmeyer

http://arxiv.org/abs/2312.07822v1

Compressor summary: KMEx is a simple method that transforms pre-trained models into self-explainable ones, providing diverse and trustworthy explanations without retraining.


Native Language Identification with Large Language Models

Wei Zhang,Alexandre Salle

http://arxiv.org/abs/2312.07819v1

Compressor summary: The text describes experiments with GPT models like GPT-4 in Native Language Identification (NLI), where they predict a writer's first language from their second language writings, achieving high performance and providing justifications based on various linguistic features.


A Foundational Multimodal Vision Language AI Assistant for Human Pathology

Ming Y. Lu,Bowen Chen,Drew F. K. Williamson,Richard J. Chen,Kenji Ikamura,Georg Gerber,Ivy Liang,Long Phi Le,Tong Ding,Anil V Parwani,Faisal Mahmood

http://arxiv.org/abs/2312.07814v1

Compressor summary: PathChat is a multimodal AI assistant for human pathology that combines a vision encoder pretrained on histology images with a language model, achieving high diagnostic accuracy and producing responses that experts found more accurate and preferable than those of other AI assistants.