arxiv compressed, 2023-11-27

This page contains one-sentence summaries of cs.AI/ML/CV papers announced on 2023-11-27, generated by the compressor, my personal LLM-based summarization project.


Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

Munan Ning,Bin Zhu,Yujia Xie,Bin Lin,Jiaxi Cui,Lu Yuan,Dongdong Chen,Li Yuan

http://arxiv.org/abs/2311.16103v1

Compressor summary: The paper introduces Video-Bench, a comprehensive evaluation system for video-based large language models, with 10 tasks covering understanding, question-answering, and decision-making.


Test-time Adaptation of Discriminative Models via Diffusion Generative Feedback

Mihir Prabhudesai,Tsung-Wei Ke,Alexander C. Li,Deepak Pathak,Katerina Fragkiadaki

http://arxiv.org/abs/2311.16102v1

Compressor summary: Diffusion-TTA adapts pre-trained discriminative models using generative feedback from a diffusion model, improving their accuracy in various tasks.


How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Haoqin Tu,Chenhang Cui,Zijun Wang,Yiyang Zhou,Bingchen Zhao,Junlin Han,Wangchunshu Zhou,Huaxiu Yao,Cihang Xie

http://arxiv.org/abs/2311.16101v1

Compressor summary: This study evaluates Vision LLMs' visual reasoning abilities by introducing a comprehensive safety evaluation suite that covers OOD generalization and adversarial robustness, revealing their strengths and weaknesses in handling different conditions.


GART: Gaussian Articulated Template Models

Jiahui Lei,Yufu Wang,Georgios Pavlakos,Lingjie Liu,Kostas Daniilidis

http://arxiv.org/abs/2311.16099v1

Compressor summary: GART is a model that uses moving 3D Gaussians to represent deformable subjects in monocular videos with efficient reconstruction and rendering.


CG-HOI: Contact-Guided 3D Human-Object Interaction Generation

Christian Diller,Angela Dai

http://arxiv.org/abs/2311.16097v1

Compressor summary: CG-HOI is a method for generating realistic 3D human-object interactions from text by modeling contact between the human body and object geometry.


Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

Zhe Li,Zerong Zheng,Lizhen Wang,Yebin Liu

http://arxiv.org/abs/2311.16096v1

Compressor summary: The paper presents a new method for creating realistic and dynamic human avatars using a combination of 2D and 3D neural networks, which can adapt to different clothing styles and poses.


Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images

Aiyu Cui,Jay Mahajan,Viraj Shah,Preeti Gomathinayagam,Svetlana Lazebnik

http://arxiv.org/abs/2311.16094v1

Compressor summary: The paper introduces a Street TryOn benchmark and a novel method for virtual try-on on in-the-wild scenes without paired data, using DensePose warping correction and diffusion-based inpainting.


Have we built machines that think like people?

Luca M. Schulze Buschoff,Elif Akata,Matthias Bethge,Eric Schulz

http://arxiv.org/abs/2311.16093v1

Compressor summary: The paper evaluates how well vision-based large language models perform in intuitive physics, causal reasoning, and intuitive psychology tasks, finding that they are still far from human capabilities in these domains.


Self-correcting LLM-controlled Diffusion Models

Tsung-Han Wu,Long Lian,Joseph E. Gonzalez,Boyi Li,Trevor Darrell

http://arxiv.org/abs/2311.16090v1

Compressor summary: SLD is a framework that generates images from text prompts, assesses their alignment with the prompt, and performs self-corrections to ensure correctness in the resulting image, without needing additional training and while remaining compatible with existing diffusion models.


DUnE: Dataset for Unified Editing

Afra Feyza Akyürek,Eric Pan,Garry Kuwanto,Derry Wijaya

http://arxiv.org/abs/2311.16087v1

Compressor summary: The paper explores ways to edit language models beyond factual data, introduces a new benchmark called DUnE, and shows that no existing method has completely solved the generalized editing problem.


MAST: Model-Agnostic Sparsified Training

Yury Demidovich,Grigory Malinovsky,Egor Shulgin,Peter Richtárik

http://arxiv.org/abs/2311.16086v1

Compressor summary: The text introduces a new optimization problem that uses pre-trained models and random sketch operators for sparsification during machine learning model training, leading to improved convergence rates and relaxed assumptions.


BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification

Dmitri Roussinov,Serge Sharoff

http://arxiv.org/abs/2311.16083v1

Compressor summary: The paper shows that PLMs struggle with topic changes in text classification tasks, proposes using synthetic texts to improve performance, and provides empirical results and code for replication.


ViT-Lens-2: Gateway to Omni-modal Intelligence

Weixian Lei,Yixiao Ge,Kun Yi,Jianfeng Zhang,Difei Gao,Dylan Sun,Yuying Ge,Ying Shan,Mike Zheng Shou

http://arxiv.org/abs/2311.16081v1

Compressor summary: The paper introduces ViT-Lens-2, a method for efficient learning of diverse modalities using pretrained vision transformers and modality alignment with existing foundation models.


MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Zeming Chen,Alejandro Hernández Cano,Angelika Romanou,Antoine Bonnet,Kyle Matoba,Francesco Salvi,Matteo Pagliardini,Simin Fan,Andreas Köpf,Amirkeivan Mohtashami,Alexandre Sallinen,Alireza Sakhaeirad,Vinitra Swamy,Igor Krawczuk,Deniz Bayazit,Axel Marmet,Syrielle Montariol,Mary-Anne Hartley,Martin Jaggi,Antoine Bosselut

http://arxiv.org/abs/2311.16079v1

Compressor summary: MEDITRON is an open-source suite of large-scale medical language models that outperform several closed-source models on medical benchmarks.


BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights

François Remy,Kris Demuynck,Thomas Demeester

http://arxiv.org/abs/2311.16075v1

Compressor summary: The study uses Large Language Models and UMLS knowledge graph to create high-fidelity representations of biomedical concepts and sentences, improving performance on various tasks and releasing a multilingual model.


A Survey on Vulnerability of Federated Learning: A Learning Algorithm Perspective

Xianghua Xie,Chen Hu,Hanchi Ren,Jingjing Deng

http://arxiv.org/abs/2311.16065v1

Compressor summary: This paper reviews malicious attacks on federated learning (FL) systems, categorizes them into four types, and discusses defense strategies that aim to protect FL's learning process, data, and models from manipulation and sabotage.


DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization

Zhaoyang Xia,Carol Neidle,Dimitris N. Metaxas

http://arxiv.org/abs/2311.16060v1

Compressor summary: The research introduces DiffSLVA, a method that uses diffusion models and image features to anonymize sign language videos without losing linguistic content, potentially benefiting Deaf and Hard-of-Hearing communities.


Metric Space Magnitude for Evaluating Unsupervised Representation Learning

Katharina Limbeck,Rayna Andreeva,Rik Sarkar,Bastian Rieck

http://arxiv.org/abs/2311.16054v1

Compressor summary: The paper introduces magnitude as a measure of the effective size of a space and presents a new quality measure for dimensionality reduction tasks based on the dissimilarity between magnitude functions.
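
For context on the magnitude used here: for a finite metric space, the standard definition (a similarity matrix built from pairwise distances, inverted and summed) fits in a few lines. The sketch below is a generic illustration of that definition at several scales, assuming Euclidean distances and numpy/scipy; it is not the paper's dimensionality-reduction quality measure.

```python
import numpy as np
from scipy.spatial.distance import cdist

def magnitude(points: np.ndarray, t: float = 1.0) -> float:
    """Magnitude of a finite metric space at scale t.

    Builds the similarity matrix Z_ij = exp(-t * d(x_i, x_j)) and returns the
    sum of the entries of its inverse (the standard definition for finite
    metric spaces), computed via the weight vector w solving Z w = 1.
    """
    dists = cdist(points, points)                  # pairwise Euclidean distances
    Z = np.exp(-t * dists)                         # similarity matrix
    w = np.linalg.solve(Z, np.ones(len(points)))   # weights with Z w = 1
    return float(w.sum())                          # magnitude = sum of weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    # Evaluating at several scales t traces out the magnitude function of X.
    for t in (0.5, 1.0, 2.0):
        print(t, magnitude(X, t))
```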


Exploring Attribute Variations in Style-based GANs using Diffusion Models

Rishubh Parihar,Prasanna Balaji,Raghav Magazine,Sarthak Vora,Tejan Karmali,Varun Jampani,R. Venkatesh Babu

http://arxiv.org/abs/2311.16052v1

Compressor summary: The paper proposes a new method for diverse attribute editing by modeling multidimensional attribute edits using disentangled latent spaces of pretrained GANs and training a Denoising Diffusion Probabilistic Model.


Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing

Jian Gao,Chun Gu,Youtian Lin,Hao Zhu,Xun Cao,Li Zhang,Yao Yao

http://arxiv.org/abs/2311.16043v1

Compressor summary: The paper describes a new method to render 3D scenes from multiple images using point-based rendering, which allows for editing, ray tracing, and real-time relighting of the scene.


Weakly-Supervised 3D Reconstruction of Clothed Humans via Normal Maps

Jane Wu,Diego Thomas,Ronald Fedkiw

http://arxiv.org/abs/2311.16042v1

Compressor summary: The paper proposes a new method using deep learning to reconstruct 3D clothed humans from 2D normal maps and RGB images, without volumetric information ground truth.


OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Wenzhao Zheng,Weiliang Chen,Yuanhui Huang,Borui Zhang,Yueqi Duan,Jiwen Lu

http://arxiv.org/abs/2311.16038v1

Compressor summary: The paper proposes OccWorld, a world model that predicts the movement of the ego car and the evolution of surrounding scenes in 3D occupancy space, using scene tokens and a GPT-like generative transformer.


GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions

Jiemin Fang,Junjie Wang,Xiaopeng Zhang,Lingxi Xie,Qi Tian

http://arxiv.org/abs/2311.16037v1

Compressor summary: The GaussianEditor framework allows delicate and precise editing of 3D scenes using text instructions and 3D Gaussians, with faster training speed compared to previous methods.


Machine Learning-Enhanced Aircraft Landing Scheduling under Uncertainties

Yutian Pang,Peng Zhao,Jueming Hu,Yongming Liu

http://arxiv.org/abs/2311.16030v1

Compressor summary: The paper proposes a machine learning-enhanced landing scheduling method that reduces aircraft delays, improves safety, and considers uncertainties in flight events.


A Neural Framework for Generalized Causal Sensitivity Analysis

Dennis Frauen,Fergus Imrie,Alicia Curth,Valentyn Melnychuk,Stefan Feuerriegel,Mihaela van der Schaar

http://arxiv.org/abs/2311.16026v1

Compressor summary: NeuralCSA is a neural framework for causal sensitivity analysis that works with various sensitivity models, treatment types, and causal queries, and can infer valid bounds on the causal query of interest.


Forecasting Auxiliary Energy Consumption for Electric Heavy-Duty Vehicles

Yuantao Fan,Zhenkan Wang,Sepideh Pashami,Slawomir Nowaczyk,Henrik Ydreskog

http://arxiv.org/abs/2311.16003v1

Compressor summary: The paper proposes a method to improve energy consumption prediction and explainability for electric commercial vehicles by training multiple regression models on subsets of data based on relevant sub-populations.


Closing the ODE-SDE gap in score-based diffusion models through the Fokker-Planck equation

Teo Deveney,Jan Stanczuk,Lisa Maria Kreusser,Chris Budd,Carola-Bibiane Schönlieb

http://arxiv.org/abs/2311.15996v1

Compressor summary: This paper analyses the differences between ODE and SDE dynamics in score-based diffusion models and proposes a regularisation term to reduce these differences, but it may degrade SDE sample quality.


Sensitivity-Based Layer Insertion for Residual and Feedforward Neural Networks

Evelyn Herberg,Roland Herzog,Frederik Köhne,Leonie Kreis,Anton Schiela

http://arxiv.org/abs/2311.15995v1

Compressor summary: The paper presents a method to insert new layers in neural networks during training using sensitivity-based techniques, which improves training efficiency and reduces computational effort.


Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework

Shaobo Wang,Xiangdong Zhang,Junchi Yan

http://arxiv.org/abs/2311.15993v1

Compressor summary: UBN is a two-stage framework that alleviates feature condensation and unifies various BN variants to improve neural network training stability and convergence.


DiffAnt: Diffusion Models for Action Anticipation

Zeyun Zhong,Chengzhi Wu,Manuel Martin,Michael Voit,Juergen Gall,Jürgen Beyerer

http://arxiv.org/abs/2311.15991v1

Compressor summary: The authors propose a new generative model that captures different possible future actions by iteratively generating them from Gaussian noise, conditioned on the observed video, and show its effectiveness on four benchmark datasets.


Should We Learn Most Likely Functions or Parameters?

Shikai Qiu,Tim G. J. Rudner,Sanyam Kapoor,Andrew Gordon Wilson

http://arxiv.org/abs/2311.15990v1

Compressor summary: The text discusses alternatives to standard regularized training methods by directly estimating the most likely function implied by a model and data, which can improve generalization and robustness.


Sparsify-then-Classify: From Internal Neurons of Large Language Models To Efficient Text Classifiers

Yilun Liu,Difan Jiao,Ashton Anderson

http://arxiv.org/abs/2311.15983v1

Compressor summary: The Sparsify-then-Classify (STC) approach improves text classification performance by using all internal representations of Large Language Models with multiple pooling strategies and sparsifying task-specific features.
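
As a rough, hypothetical illustration of the "pool internal representations, then sparsify" idea (not the paper's exact STC procedure), the sketch below mean-pools synthetic per-layer hidden states and fits an L1-regularized linear classifier; all shapes and labels are made up for the demo.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for per-layer token hidden states of a language model:
# shape (examples, layers, tokens, hidden_dim). In practice these would come
# from a forward pass that exposes all hidden states.
n, layers, tokens, dim = 500, 4, 32, 64
hidden = rng.normal(size=(n, layers, tokens, dim))

# Mean-pool over tokens and concatenate every layer into one feature vector.
features = hidden.mean(axis=2).reshape(n, layers * dim)

# Toy labels that depend weakly on a few features, so the demo has signal.
labels = (features[:, 0] + features[:, 10] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)

# An L1 penalty drives most weights to zero ("sparsify"), leaving a small
# task-specific subset of internal features to do the classification.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
print("non-zero weights:", int(np.count_nonzero(clf.coef_)))
```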


Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

Yuanxun Lu,Jingyang Zhang,Shiwei Li,Tian Fang,David McKinnon,Yanghai Tsin,Long Quan,Xun Cao,Yao Yao

http://arxiv.org/abs/2311.15980v1

Compressor summary: The authors propose a novel method for generating diverse and high-fidelity 3D content using a multi-view 2.5D diffusion model that is fine-tuned from a pre-trained 2D diffusion model, without the need for score distillation sampling or extensive 3D training data.


Soil Organic Carbon Estimation from Climate-related Features with Graph Neural Network

Weiying Zhao,Natalia Efremova

http://arxiv.org/abs/2311.15979v1

Compressor summary: The study compared four Graph Neural Network operators to estimate soil organic carbon using satellite data and found that PESAGE and PETransformer models performed best, showing the potential of GNNs in predicting SOC.


Text2Loc: 3D Point Cloud Localization from Natural Language

Yan Xia,Letian Shi,Zifeng Ding,João F. Henriques,Daniel Cremers

http://arxiv.org/abs/2311.15977v1

Compressor summary: The Text2Loc neural network uses natural language descriptions and a coarse-to-fine localization pipeline to improve 3D point cloud localization accuracy by up to 2 times over previous methods.


FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding in Open World

Thanh-Dat Truong,Utsav Prabhu,Bhiksha Raj,Jackson Cothren,Khoa Luu

http://arxiv.org/abs/2311.15965v1

Compressor summary: The paper introduces a new Fairness Contrastive Clustering loss to address catastrophic forgetting and fairness in continual learning for semantic scene understanding, and proposes an attention-based visual grammar approach for background shift and unknown classes.


Efficient Pre-training for Localized Instruction Generation of Videos

Anil Batra,Davide Moltisanti,Laura Sevilla-Lara,Marcus Rohrbach,Frank Keller

http://arxiv.org/abs/2311.15964v1

Compressor summary: Sieve-&-Swap is a technique to automatically filter and improve procedural video transcripts for better step localization and instruction generation with less computational resources.


From Pixels to Titles: Video Game Identification by Screenshots using Convolutional Neural Networks

Fabricio Breve

http://arxiv.org/abs/2311.15963v1

Compressor summary: The paper shows how convolutional neural networks can identify video games from single screenshots with high accuracy, using different architectures and initial weights.


Addressing Long-Horizon Tasks by Integrating Program Synthesis and State Machines

Yu-An Lin,Chen-Tao Lee,Guan-Ting Liu,Pu-Jen Cheng,Shao-Hua Sun

http://arxiv.org/abs/2311.15960v1

Compressor summary: The paper introduces Program Machine Policies (POMPs), which combine programmatic RL and state machine policies to represent complex behaviors and address long-term tasks, outperforming previous methods on various tasks and generalizing inductively without fine-tuning.


A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

Shuyue Stella Li,Beining Xu,Xiangyu Zhang,Hexin Liu,Wenhan Chao,Leibny Paola Garcia

http://arxiv.org/abs/2311.15954v1

Compressor summary: The study examines how self-supervised learning (SSL) models perform as feature extractors in cross-lingual settings and proposes a new metric, PSR, to measure their effectiveness using ASR performance.


Replay across Experiments: A Natural Extension of Off-Policy RL

Dhruva Tirumala,Thomas Lampe,Jose Enrique Chen,Tuomas Haarnoja,Sandy Huang,Guy Lever,Ben Moran,Tim Hertweck,Leonard Hasenclever,Martin Riedmiller,Nicolas Heess,Markus Wulfmeier

http://arxiv.org/abs/2311.15951v1

Compressor summary: Replay Across Experiments (RaE) is a simple framework that uses experience from previous experiments to improve RL performance, exploration, and bootstrapping while requiring minimal changes.


GloNets: Globally Connected Neural Networks

Antonio Di Cecco,Carlo Metta,Marco Fantozzi,Francesco Morandin,Maurizio Parton

http://arxiv.org/abs/2311.15947v1

Compressor summary: GloNet is a new architecture that helps deep neural networks work better at higher depths by uniformly connecting and regulating information flow across the network.


Leveraging deep active learning to identify low-resource mobility functioning information in public clinical notes

Tuan-Dung Le,Zhuqi Miao,Samuel Alvarado,Brittany Smith,William Paiva,Thanh Thieu

http://arxiv.org/abs/2311.15946v1

Compressor summary: The paragraph introduces a new dataset for extracting and analyzing mobility functioning information from clinical notes using BERT and CRF models.


Over-Squashing in Riemannian Graph Neural Networks

Julia Balla

http://arxiv.org/abs/2311.15945v1

Compressor summary: The paper investigates whether using Riemannian manifolds of variable curvature in Hyperbolic Graph Neural Networks (HGNNs) can reduce over-squashing, a phenomenon where node features become insensitive to distant nodes in the graph.


Tell2Design: A Dataset for Language-Guided Floor Plan Generation

Sicong Leng,Yang Zhou,Mohammed Haroon Dupty,Wee Sun Lee,Sam Conrad Joyce,Wei Lu

http://arxiv.org/abs/2311.15941v1

Compressor summary: The authors introduce a new dataset, model, and evaluation method for generating floor plans from natural language descriptions, aiming to advance the field of language-guided design generation.


Physics-informed neural networks for transformed geometries and manifolds

Samuel Burbulla

http://arxiv.org/abs/2311.15940v1

Compressor summary: The paper proposes a method to improve physics-informed neural networks (PINNs) by incorporating geometric transformations, allowing them to handle complex or varying shapes better and enable shape optimization.


Unleashing the Power of Prompt-driven Nucleus Instance Segmentation

Zhongyi Shui,Yunlong Zhang,Kai Yao,Chenglu Zhu,Yuxuan Sun,Lin Yang

http://arxiv.org/abs/2311.15939v1

Compressor summary: The paper introduces a novel framework that uses a point prompter and a segment anything model (SAM) for automatic nuclear instance segmentation in histology images, achieving state-of-the-art results.


Optimal Transport Aggregation for Visual Place Recognition

Sergio Izquierdo,Javier Civera

http://arxiv.org/abs/2311.15937v1

Compressor summary: SALAD is a new method for visual place recognition that uses optimal transport to aggregate local features, discards non-informative ones, and leverages a fast-learning backbone to achieve better performance than existing approaches.


A new fuzzy multi-attribute group decision-making method based on TOPSIS and optimization models

Qixiao Hu,Shiquan Zhang,Chaolang Hu,Yuetong Liu

http://arxiv.org/abs/2311.15933v1

Compressor summary: The paper proposes a new method for multi-attribute group decision-making using TOPSIS and optimization models with interval-valued intuitionistic fuzzy sets, which combines subjective and objective weighting methods and is demonstrated on a real case study.
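
For readers unfamiliar with the base algorithm, here is a minimal sketch of classical (crisp) TOPSIS with fixed weights; the paper's interval-valued intuitionistic fuzzy extension and its optimization-based weighting are not reproduced here.

```python
import numpy as np

def topsis(decision_matrix: np.ndarray, weights: np.ndarray, benefit: np.ndarray) -> np.ndarray:
    """Rank alternatives with classical TOPSIS.

    decision_matrix: (alternatives x criteria) scores
    weights: criterion weights summing to 1
    benefit: boolean mask, True for benefit criteria, False for cost criteria
    Returns closeness coefficients in [0, 1]; higher is better.
    """
    # Vector-normalize each criterion column, then apply the weights.
    norm = decision_matrix / np.linalg.norm(decision_matrix, axis=0)
    v = norm * weights

    # Ideal and anti-ideal solutions depend on the criterion direction.
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))

    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)

if __name__ == "__main__":
    m = np.array([[7., 9., 9.], [8., 7., 8.], [9., 6., 8.], [6., 7., 8.]])
    w = np.array([0.5, 0.3, 0.2])
    scores = topsis(m, w, benefit=np.array([True, True, False]))
    print("ranking (best first):", np.argsort(-scores))
```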


WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models

Youssef Benchekroun,Megi Dervishi,Mark Ibrahim,Jean-Baptiste Gaya,Xavier Martinet,Grégoire Mialon,Thomas Scialom,Emmanuel Dupoux,Dieuwke Hupkes,Pascal Vincent

http://arxiv.org/abs/2311.15930v1

Compressor summary: WorldSense is a benchmark to test LLMs' ability to understand simple arrangements of entities, but current chat-LLMs make errors and have response biases even with three objects.


Reinforcement Learning for Wildfire Mitigation in Simulated Disaster Environments

Alexander Tapley,Marissa Dotter,Michael Doyle,Aidan Fennelly,Dhanuj Gandikota,Savanna Smith,Michael Threet,Tim Welsh

http://arxiv.org/abs/2311.15925v1

Compressor summary: The paper introduces SimFire, a realistic wildfire simulator, and SimHarness, an agent-based machine learning system to generate land management strategies, to help prepare for and react to increasingly severe fire seasons due to climate change.


Diagnosis driven Anomaly Detection for CPS

Henrik S. Steude,Lukas Moddemann,Alexander Diedrich,Jonas Ehrhardt,Oliver Niggemann

http://arxiv.org/abs/2311.15924v1

Compressor summary: The authors propose a method that combines deep learning-based anomaly detection with Consistency-Based Diagnosis for holistic diagnosis in Cyber-Physical Systems and show its effectiveness on simulated and real data.


A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning

Jianxiong Li,Shichao Lin,Tianyu Shi,Chujie Tian,Yu Mei,Jian Song,Xianyuan Zhan,Ruimin Li

http://arxiv.org/abs/2311.15920v1

Compressor summary: The paper proposes a data-driven framework for traffic signal control using machine learning and traffic flow theory to infer rewards from coarse-grained data and learn policies from historical datasets.


ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization

Elahe Vahdani,Yingli Tian

http://arxiv.org/abs/2311.15916v1

Compressor summary: The paper introduces ADM-Loc, a novel framework for detecting actions in videos with limited annotations, by generating action proposals from a composite distribution and enforcing consistency in action classification scores.


Computer Vision for Carriers: PATRIOT

Ari Goodman,Gurpreet Singh,James Hing,Ryan O'Shea

http://arxiv.org/abs/2311.15914v1

Compressor summary: PATRIOT is a prototype system that uses existing camera feeds and passive sensing to automatically track and update aircraft positions on a virtual Ouija board interface, improving deck tracking efficiency and safety without GPS sensors.


LIFT OFF: LoRaWAN Installation and Fiducial Tracking Operations for the Flightline of the Future

Ari Goodman,Ryan O'Shea

http://arxiv.org/abs/2311.15912v1

Compressor summary: LIFT OFF is a hybrid framework that uses machine vision, GPS sensors, and LoRaWAN to provide real-time situational awareness of people, equipment, and aircraft positions in various environments, including military flightlines.


Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models

Claudio Rota,Marco Buzzelli,Joost van de Weijer

http://arxiv.org/abs/2311.15908v1

Compressor summary: The paper proposes StableVSR, a method that uses Diffusion Models and Temporal Conditioning Module to enhance the quality of upscaled videos by synthesizing realistic and temporally-consistent details.


MetaDefa: Meta-learning based on Domain Enhancement and Feature Alignment for Single Domain Generalization

Can Sun,Hao Zheng,Zhigang Hu,Liu Yang,Meiguang Zheng,Bo Xu

http://arxiv.org/abs/2311.15906v1

Compressor summary: MetaDefa is a novel meta-learning method that improves SDG model generalization by enhancing domains and aligning features using background substitution, visual corruptions, and class activation maps.


Data Generation for Post-OCR correction of Cyrillic handwriting

Evgenii Davydkin,Aleksandr Markelov,Egor Iuldashev,Anton Dudkin,Ivan Krivorotov

http://arxiv.org/abs/2311.15896v1

Compressor summary: The paper proposes a novel method to generate realistic synthetic Cyrillic handwriting and use it to create a large dataset for training a post-OCR correction model, which can improve error identification and evaluation of student performance.


Stability-Informed Initialization of Neural Ordinary Differential Equations

Theodor Westny,Arman Mohammadi,Daniel Jung,Erik Frisk

http://arxiv.org/abs/2311.15890v1

Compressor summary: The paper explores how different aspects of neural ODE training impact performance and introduces a new initialization technique based on stability regions.


FLASC: A Flare-Sensitive Clustering Algorithm: Extending HDBSCAN* for Detecting Branches in Clusters

D. M. Bot,J. Peeters,J. Liesenborgs,J. Aerts

http://arxiv.org/abs/2311.15887v1

Compressor summary: FLASC is a flare-sensitive clustering algorithm that improves upon HDBSCAN* by differentiating branches within detected clusters and offering two variants with varying computational cost and noise robustness.


EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Jiaxuan Li,Duc Minh Vo,Akihiro Sugimoto,Hideki Nakayama

http://arxiv.org/abs/2311.15879v1

Compressor summary: EVCap is a retrieval-augmented image captioning method that uses external visual-name memory to enable LLMs to describe novel objects without relying on large amounts of data or scaling up network parameters.


RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization

Kwanyoung Kim,Yujin Oh,Sangjoon Park,Hwa Kyung Byun,Jin Sung Kim,Yong Bae Kim,Jong Chul Ye

http://arxiv.org/abs/2311.15876v1

Compressor summary: RO-LLaMA is a versatile AI model that can handle various tasks in radiation oncology, thanks to the CEFTune technique and LLM-driven segmentation framework.


InterControl: Generate Human Motion Interactions by Controlling Every Joint

Zhenzhi Wang,Jingbo Wang,Dahua Lin,Bo Dai

http://arxiv.org/abs/2311.15864v1

Compressor summary: InterControl is a novel approach that uses motion diffusion models and controlnets to generate realistic human interactions with flexible spatial control of every joint.


SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Hsuan-I Ho,Jie Song,Otmar Hilliges

http://arxiv.org/abs/2311.15855v1

Compressor summary: SiTH is a novel pipeline that uses an image-conditioned diffusion model to create lifelike and detailed 3D human reconstructions from single images by decomposing the problem into hallucination and reconstruction subproblems.


A systematic study comparing hyperparameter optimization engines on tabular data

Balazs Kegl

http://arxiv.org/abs/2311.15854v1

Compressor summary: The authors compare different hyperparameter optimization engines using normalization and aggregation methods and identify three top-performing engines.
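
A minimal sketch of the general normalize-then-aggregate idea (per-dataset min-max normalization plus mean rank), using made-up scores; this is not the paper's exact protocol or its list of engines.

```python
import numpy as np

# Hypothetical best validation scores: rows = HPO engines, columns = datasets.
scores = np.array([
    [0.81, 0.62, 0.93],   # engine A
    [0.79, 0.66, 0.91],   # engine B
    [0.84, 0.60, 0.95],   # engine C
])

# Per-dataset min-max normalization puts every dataset on a comparable scale,
# so no single dataset dominates the comparison.
lo, hi = scores.min(axis=0), scores.max(axis=0)
normalized = (scores - lo) / (hi - lo)

# Aggregate across datasets: mean normalized score and mean rank (1 = best).
mean_norm = normalized.mean(axis=1)
ranks = (-scores).argsort(axis=0).argsort(axis=0) + 1
mean_rank = ranks.mean(axis=1)

for i, (s, r) in enumerate(zip(mean_norm, mean_rank)):
    print(f"engine {i}: mean normalized score {s:.3f}, mean rank {r:.2f}")
```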


Single-Model and Any-Modality for Video Object Tracking

Zongwei Wu,Jilai Zheng,Xiangxuan Ren,Florin-Alexandru Vasluianu,Chao Ma,Danda Pani Paudel,Luc Van Gool,Radu Timofte

http://arxiv.org/abs/2311.15851v1

Compressor summary: Un-Track is a single transformer-based tracker that learns a common latent space for multiple modalities using RGB-X pairs and achieves significant F-score gains on various datasets without modality-specific fine-tuning.


Learning with Noisy Low-Cost MOS for Image Quality Assessment via Dual-Bias Calibration

Lei Wang,Qingbo Wu,Desen Yuan,King Ngi Ngan,Hongliang Li,Fanman Meng,Linfeng Xu

http://arxiv.org/abs/2311.15846v1

Compressor summary: The paper proposes a method for learning robust image quality assessment models from low-cost opinion scores, which can perform well even with noisy and limited data.


Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Siteng Huang,Biao Gong,Yutong Feng,Xi Chen,Yuqian Fu,Yu Liu,Donglin Wang

http://arxiv.org/abs/2311.15841v1

Compressor summary: The study introduces a new method called Action-Disentangled Identifier (ADI) for text-to-image generation that improves action customization by learning action-specific identifiers and blocking the inversion of irrelevant features.


Utilizing Explainability Techniques for Reinforcement Learning Model Assurance

Alexander Tapley,Kyle Gatesman,Luis Robaina,Brett Bissey,Joseph Weissman

http://arxiv.org/abs/2311.15838v1

Compressor summary: ARLIN Toolkit helps identify weaknesses in Deep Reinforcement Learning models using clear explanations and visualizations, making them safer to use in real situations.


Syn3DWound: A Synthetic Dataset for 3D Wound Bed Analysis

Léo Lebrat,Rodrigo Santa Cruz,Remi Chierchia,Yulia Arzhaeva,Mohammad Ali Armin,Joshua Goldsmith,Jeremy Oorloff,Prithvi Reddy,Chuong Nguyen,Lars Petersson,Michelle Barakat-Johnson,Georgina Luscombe,Clinton Fookes,Olivier Salvado,David Ahmedt-Aristizabal

http://arxiv.org/abs/2311.15836v1

Compressor summary: Syn3DWound is an open-source dataset of realistic simulated wounds with annotations, aimed at improving machine learning-based wound management through image analysis.


Temporal Action Localization for Inertial-based Human Activity Recognition

Marius Bock,Michael Moeller,Kristof Van Laerhoven

http://arxiv.org/abs/2311.15831v1

Compressor summary: This paper shows how temporal attention models can improve wearable human activity recognition using raw data, outperforming previous methods with up to 25% better results.


Scale-Dropout: Estimating Uncertainty in Deep Neural Networks Using Stochastic Scale

Soyed Tuhin Ahmed,Kamal Danouchi,Michael Hefenbrock,Guillaume Prenat,Lorena Anghel,Mehdi B. Tahoori

http://arxiv.org/abs/2311.15816v1

Compressor summary: The paper proposes Scale Dropout, a novel regularization technique for binary neural networks, and Monte Carlo-Scale Dropout-based Bayesian neural networks for efficient uncertainty estimation on spintronic memory-based computation-in-memory architectures with significant energy savings.
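
For background, standard Monte Carlo dropout uncertainty estimation (keep dropout active at inference and average repeated stochastic forward passes) looks roughly like the sketch below; it does not implement the paper's Scale Dropout or its spintronic computation-in-memory mapping.

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self, in_dim=16, hidden=64, classes=3, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, classes),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, samples: int = 30):
    """Average repeated stochastic forward passes with dropout left on."""
    model.train()  # keeps dropout active; no gradients are taken here
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(samples)])
    mean = probs.mean(dim=0)                                # predictive mean
    # Predictive entropy as a simple per-input uncertainty score.
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean, entropy

if __name__ == "__main__":
    model = SmallNet()
    x = torch.randn(8, 16)
    mean, unc = mc_dropout_predict(model, x)
    print(mean.argmax(dim=-1), unc)
```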


FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax

Yu Lu,Linchao Zhu,Hehe Fan,Yi Yang

http://arxiv.org/abs/2311.15813v1

Compressor summary: FlowZero is a novel framework that uses LLMs and image diffusion models to generate temporally-coherent videos from complex spatio-temporal text descriptions.


C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing

Avigyan Bhattacharya,Mainak Singha,Ankit Jha,Biplab Banerjee

http://arxiv.org/abs/2311.15812v1

Compressor summary: C-SAW is a method that improves CLIP's performance in analyzing optical remote sensing images by enhancing visual features and prompt learning, addressing domain and content variations.


Exploring Artificial Intelligence Methods for Energy Prediction in Healthcare Facilities: An In-Depth Extended Systematic Review

Marjan FatehiJananloo,Helen Stopps,J. J. McArthur

http://arxiv.org/abs/2311.15807v1

Compressor summary: The study reviewed 17 articles using machine learning and AI to predict hospital energy consumption, finding that occupancy and meteorological data are significant factors, while highlighting the need for more research on optimizing methods and integrating real-time data.


PIPE : Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions

Edouard Yvinec,Arnaud Dapogny,Kevin Bailly

http://arxiv.org/abs/2311.15806v1

Compressor summary: PIPE is a data-free quantization method for deep neural networks that adapts well to different devices and achieves good accuracy-speed trade-offs using residual error expansion, group sparsity, and ensemble approximation.


SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields

Quentin Herau,Nathan Piasco,Moussab Bennehar,Luis Roldão,Dzmitry Tsishkou,Cyrille Migniot,Pascal Vasseur,Cédric Demonceaux

http://arxiv.org/abs/2311.15803v1

Compressor summary: The paper proposes a NeRF-based sensor calibration method for autonomous driving that uses overlapping areas and improves accuracy and robustness.


Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective

Lukas Wutschitz,Boris Köpf,Andrew Paverd,Saravan Rajmohan,Ahmed Salem,Shruti Tople,Santiago Zanella-Béguelin,Menglin Xia,Victor Rühle

http://arxiv.org/abs/2311.15792v1

Compressor summary: The authors propose using metadata in machine learning systems to address security and privacy issues and compare two methods for achieving user-level non-interference, finding that retrieval augmented models provide the best balance of utility, scalability, and flexibility.


YUAN 2.0: A Large Language Model with Localized Filtering-based Attention

Shaohua Wu,Xudong Zhao,Shenling Wang,Jiangang Luo,Lingjun Li,Xi Chen,Bing Zhao,Wei Wang,Tong Yu,Rongguo Zhang,Jiahua Zhang,Chao Wang

http://arxiv.org/abs/2311.15786v1

Compressor summary: The paper describes a new language model called Yuan 2.0 that uses local dependencies in natural language to improve attention, has a large number of parameters, and performs well in tasks such as code generation and math problem solving.


Relationship between Model Compression and Adversarial Robustness: A Review of Current Evidence

Svetlana Pavlitska,Hannes Grolig,J. Marius Zöllner

http://arxiv.org/abs/2311.15782v1

Compressor summary: The paper reviews how different techniques to make neural networks smaller can affect their ability to resist attacks, but the results are not consistent.


Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs

Simone Conia,Min Li,Daniel Lee,Umar Farooq Minhas,Ihab Ilyas,Yunyao Li

http://arxiv.org/abs/2311.15781v1

Compressor summary: The authors propose a new task called Knowledge Graph Enhancement (KGE) that aims to improve the quality and quantity of textual information for non-English entity names and descriptions in Wikidata using a novel unsupervised approach, M-NTA, which combines Machine Translation, Web Search, and Large Language Models. They also introduce WikiKGE-10, the first benchmark to evaluate KGE methods across 10 languages and 7 language families.


Stable Segment Anything Model

Qi Fan,Xin Tao,Lei Ke,Mingqiao Ye,Yuan Zhang,Pengfei Wan,Zhongyuan Wang,Yu-Wing Tai,Chi-Keung Tang

http://arxiv.org/abs/2311.15776v1

Compressor summary: The paper analyzes how well SAM can segment objects with low-quality prompts and proposes Stable-SAM, which adjusts feature sampling locations to improve stability without changing the model architecture or adding many parameters.


Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

Biao Gong,Siteng Huang,Yutong Feng,Shiwei Zhang,Yuyuan Li,Yu Liu

http://arxiv.org/abs/2311.15773v1

Compressor summary: SimM is a system that adjusts image generation to match textual layout instructions without needing additional training, using a pipeline of error detection and rectification with minimal overhead.


Attend Who is Weak: Enhancing Graph Condensation via Cross-Free Adversarial Training

Xinglin Li,Kun Wang,Hanhui Deng,Yuxuan Liang,Di Wu

http://arxiv.org/abs/2311.15772v1

Compressor summary: The paper proposes Shock Absorber, a perturbation technique that enhances graph neural networks' robustness and stability by generating synthetic graphs with minimal additional time overhead.


Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning

Huanjin Yao,Wenhao Wu,Zhiheng Li

http://arxiv.org/abs/2311.15769v1

Compressor summary: The paper introduces Side4Video, a method for memory-efficient fine-tuning of large vision models to video understanding using a lightweight spatial-temporal side network.


Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges

Nianwen Si,Hao Zhang,Heyu Chang,Wenlin Zhang,Dan Qu,Weiqiang Zhang

http://arxiv.org/abs/2311.15766v1

Compressor summary: The paper discusses knowledge unlearning as a way to mitigate the risk of large language models retaining harmful knowledge, without compromising their capabilities.


Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

Yunxin Li,Baotian Hu,Wei Wang,Xiaochun Cao,Min Zhang

http://arxiv.org/abs/2311.15759v1

Compressor summary: The paper proposes MKS2, an approach to improve multimodal language models by enhancing their visual memory and collaboration abilities, leading to better reasoning and performance on benchmarks.


Learning Multi-Frequency Partial Correlation Graphs

Gabriele D'Acunto,Paolo Di Lorenzo,Francesco Bonchi,Stefania Sardellitti,Sergio Barbarossa

http://arxiv.org/abs/2311.15756v1

Compressor summary: The paper proposes two methods to learn partial correlations between time series across different frequency bands, and shows their effectiveness on synthetic and financial data.


One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls

Minghui Hu,Jianbin Zheng,Chuanxia Zheng,Chaoyue Wang,Dacheng Tao,Tat-Jen Cham

http://arxiv.org/abs/2311.15744v1

Compressor summary: The text proposes a new method called One More Step (OMS) to improve image quality in diffusion models by integrating a compact network and an additional step during inference while preserving original model parameters.


Machine Learning-Based Jamun Leaf Disease Detection: A Comprehensive Review

Auvick Chandra Bhowmik,Dr. Md. Taimur Ahad,Yousuf Rayhan Emon

http://arxiv.org/abs/2311.15741v1

Compressor summary: The paper reviews image processing techniques and Vision Transformer models used for detecting plant leaf diseases, with potential applications for jamun leaf disease detection.


Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents

Mariana Dias,Carla Teixeira Lopes

http://arxiv.org/abs/2311.15740v1

Compressor summary: The paper evaluates how image processing methods and parameter tuning in Optical Character Recognition (OCR) can improve the recognition of text in images of typewritten cultural heritage documents.


GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

Wenhao Wu,Huanjin Yao,Mengxi Zhang,Yuxin Song,Wanli Ouyang,Jingdong Wang

http://arxiv.org/abs/2311.15732v1

Compressor summary: The paper evaluates GPT-4's linguistic and visual capabilities in zero-shot recognition tasks across images, videos, and point clouds, finding improved performance with rich textual descriptions.


Adinkra Symbol Recognition using Classical Machine Learning and Deep Learning

Michael Adjeisah,Kwame Omono Asamoah,Martha Asamoah Yeboah,Raji Rafiu King,Godwin Ferguson Achaab,Kingsley Adjei

http://arxiv.org/abs/2311.15728v1

Compressor summary: The researchers created a dataset and model to recognize and classify Adinkra symbols, an example of using AI for cultural preservation and community empowerment.


MARIS: Referring Image Segmentation via Mutual-Aware Attention Features

Mengxi Zhang,Yiming Liu,Xiangjun Yin,Huanjing Yue,Jingyu Yang

http://arxiv.org/abs/2311.15727v1

Compressor summary: MARIS is a referring image segmentation method that uses the Segment Anything Model and mutual-aware attention to improve cross-modal fusion for more accurate segmentation.


Italian Crossword Generator: Enhancing Education through Interactive Word Puzzles

Kamyar Zeinalipour,Tommaso Iaquinta,Asya Zanollo,Giovanni Angelini,Leonardo Rigutini,Marco Maggini,Marco Gori

http://arxiv.org/abs/2311.15723v1

Compressor summary: The paper describes how advanced language models can be used to generate and verify educational crossword clues, enhancing student engagement and learning outcomes.


GLIME: General, Stable and Local LIME Explanation

Zeren Tan,Yang Tian,Jian Li

http://arxiv.org/abs/2311.15722v1

Compressor summary: GLIME is an improved method for explaining machine learning models that addresses instability and low local fidelity issues in LIME by using faster convergence, a local and unbiased sampling distribution, and flexible sampling choices.
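
As background on the base method GLIME builds on, a minimal LIME-style sketch is shown below: perturb the instance, weight samples by proximity, and fit a weighted linear surrogate. The Gaussian sampling and exponential kernel here are simple placeholder choices, not GLIME's modified distribution.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(predict_fn, x, n_samples=2000, kernel_width=0.75, seed=0):
    """Explain a single prediction with a LIME-style local linear surrogate.

    predict_fn: maps an (n, d) array to class-1 probabilities of shape (n,)
    x: the instance to explain, shape (d,)
    Returns per-feature weights of the local surrogate model.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Perturb the instance with Gaussian noise around x.
    Z = x + rng.normal(scale=1.0, size=(n_samples, d))
    y = predict_fn(Z)
    # Weight perturbed points by an exponential kernel on their distance to x.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Fit a weighted ridge regression as the interpretable surrogate.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z - x, y, sample_weight=w)
    return surrogate.coef_

if __name__ == "__main__":
    # Toy black box: a logistic function of two of the five features.
    def black_box(a):
        return 1.0 / (1.0 + np.exp(-(2.0 * a[:, 0] - 3.0 * a[:, 1])))

    x0 = np.zeros(5)
    print(lime_explain(black_box, x0))
```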


Variational Autoencoders for Feature Exploration and Malignancy Prediction of Lung Lesions

Benjamin Keel,Aaron Quyn,David Jayne,Samuel D. Relton

http://arxiv.org/abs/2311.15719v1

Compressor summary: The study uses generative AI models to analyze lung cancer lesions from CT scans and develop an interpretable classifier with high accuracy and a clear latent space.


Justifiable Artificial Intelligence: Engineering Large Language Models for Legal Applications

Sabine Wehnert

http://arxiv.org/abs/2311.15716v1

Compressor summary: The author proposes using Justifiable AI to increase trust in Large Language Models' legal outputs by gathering evidence for and against their predictions.


SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation

Jiehong Lin,Lihua Liu,Dekun Lu,Kui Jia

http://arxiv.org/abs/2311.15707v1

Compressor summary: SAM-6D is a framework that uses two sub-networks to detect new objects in cluttered scenes and estimate their 6D poses using instance segmentation and pose estimation, outperforming existing methods on BOP Benchmark datasets.


Cerbero-7B: A Leap Forward in Language-Specific LLMs Through Enhanced Chat Corpus Generation and Evaluation

Federico A. Galatolo,Mario G. C. A. Cimino

http://arxiv.org/abs/2311.15698v1

Compressor summary: The study presents a novel method to generate high-quality chat corpora using a generator and embedder LLM, evaluate them with a new MLM metric, and improve the performance of an Italian LLM.


Automated discovery of trade-off between utility, privacy and fairness in machine learning models

Bogdan Ficiu,Neil D. Lawrence,Andrei Paleyes

http://arxiv.org/abs/2311.15691v1

Compressor summary: The authors propose a pipeline called PFairDP, which uses Bayesian optimization to find Pareto-optimal points between fairness, privacy, and utility of machine learning models in a multi-objective optimization problem.


Information theoretic study of the neural geometry induced by category learning

Laurent Bonnasse-Gahot,Jean-Pierre Nadal

http://arxiv.org/abs/2311.15682v1

Compressor summary: The paper presents an information-theoretic approach to evaluate the efficiency of category learning in biological and artificial neural networks, focusing on coding and decoding costs and the expansion of neural space near decision boundaries.


Model-agnostic Body Part Relevance Assessment for Pedestrian Detection

Maurice Günder,Sneha Banerjee,Rafet Sifa,Christian Bauckhage

http://arxiv.org/abs/2311.15679v1

Compressor summary: The authors propose a framework to use sampling-based explanation methods in pedestrian detection and introduce a new method similar to KernelSHAP that is more efficient for large-scale datasets.


Accelerating Hierarchical Associative Memory: A Deep Equilibrium Approach

Cédric Goemaere,Johannes Deleu,Thomas Demeester

http://arxiv.org/abs/2311.15673v1

Compressor summary: The paper proposes two strategies to speed up memory retrieval in Hierarchical Associative Memory models, which are a type of neural network, by using faster solvers and alternating optimization of layers.


HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images

Xihe Yang,Xingyu Chen,Shaohui Wang,Daiheng Gao,Xiaoguang Han,Baoyuan Wang

http://arxiv.org/abs/2311.15672v1

Compressor summary: The paper proposes HaveFun, a framework for reconstructing human avatars from few-shot unconstrained photos using a tetrahedral representation and a two-phase optimization method.


Deformation-Guided Unsupervised Non-Rigid Shape Matching

Aymen Merrouche,Joao Regateiro,Stefanie Wuhrer,Edmond Boyer

http://arxiv.org/abs/2311.15668v1

Compressor summary: The paper proposes a robust unsupervised method for matching shapes with fine details and different types of noise using a hierarchical patch representation and a near-rigid deformation model.


Technical Report for Argoverse Challenges on 4D Occupancy Forecasting

Pengfei Zheng,Kanokphan Lertniphonphan,Feng Chen,Siwei Chen,Bingchuan Sun,Jun Xie,Zhepeng Wang

http://arxiv.org/abs/2311.15660v1

Compressor summary: The paper introduces a LiDAR-based 4D occupancy forecasting method that outperforms the baseline and ranks first in Argoverse Challenges at CVPR 2023.


Regularization by Texts for Latent Diffusion Inverse Solvers

Jeongsol Kim,Geon Yeong Park,Hyungjin Chung,Jong Chul Ye

http://arxiv.org/abs/2311.15658v1

Compressor summary: The authors propose a new method called TReg that uses textual descriptions to help solve ill-posed inverse problems in latent diffusion models, improving their performance and accuracy.


Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

Chaofeng Chen,Annan Wang,Haoning Wu,Liang Liao,Wenxiu Sun,Qiong Yan,Weisi Lin

http://arxiv.org/abs/2311.15657v1

Compressor summary: Text-to-image diffusion models can be improved by fine-tuning the text encoder using reinforcement learning, leading to better text-image alignment and visual quality.


MoDS: Model-oriented Data Selection for Instruction Tuning

Qianlong Du,Chengqing Zong,Jiajun Zhang

http://arxiv.org/abs/2311.15653v1

Compressor summary: The paper proposes a MoDS approach to select high-quality and necessary instruction data for fine-tuning LLMs, outperforming the full original dataset.


Reinforcement Learning from Diffusion Feedback: Q* for Image Search

Aboli Marathe

http://arxiv.org/abs/2311.15648v1

Compressor summary: The paper introduces two models for image generation using model-agnostic learning, RLDF and noisy diffusion gradient, which use a special CFG encoding to guide semantic priors and produce high-quality images from single input images.


Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation

Thomas Kleine Buening,Aadirupa Saha,Christos Dimitrakakis,Haifeng Xu

http://arxiv.org/abs/2311.15647v1

Compressor summary: The paper proposes a learning algorithm for online recommendation systems that considers both click-through rates and post-click rewards, and designs an incentive mechanism to encourage desirable arm behavior while minimizing regret.


PaintNeSF: Artistic Creation of Stylized Scenes with Vectorized 3D Strokes

Hao-Bin Duan,Miao Wang,Yan-Xun Li,Yong-Liang Yang

http://arxiv.org/abs/2311.15637v1

Compressor summary: PaintNeSF is a new technique that uses vector strokes to create stylized 3D images from multi-view 2D images, optimizing stroke parameters with gradient descent and maintaining consistent appearance across views.


The WebCrow French Crossword Solver

Giovanni Angelini,Marco Ernandes,Tommaso Iaquinta,Caroline Stehlé,Fanny Simões,Kamyar Zeinalipour,Andrea Zugarini,Marco Gori

http://arxiv.org/abs/2311.15626v1

Compressor summary: The authors present a crossword solver for French that uses multiple modules to find candidate answers from various sources and performs well compared to humans in challenges.


Only Positive Cases: 5-fold High-order Attention Interaction Model for Skin Segmentation Derived Classification

Renkai Wu,Yinghao Liu,Pengchen Liang,Qing Chang

http://arxiv.org/abs/2311.15625v1

Compressor summary: The paper proposes MHA-UNet, a model that uses high-order attention interaction to segment skin lesions and detect their presence or absence in an explainable way without needing negative samples.


Injecting linguistic knowledge into BERT for Dialogue State Tracking

Xiaohan Feng,Xixin Wu,Helen Meng

http://arxiv.org/abs/2311.15623v1

Compressor summary: The paper proposes an unsupervised method to improve BERT's performance and interpretability in dialogue state tracking tasks using linguistic knowledge extracted from conversations.


Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

Yifei Chen,Dapeng Chen,Ruijin Liu,Sai Zhou,Wenyuan Xue,Wei Peng

http://arxiv.org/abs/2311.15619v1

Compressor summary: The paper proposes an "Align before Adapt" paradigm for video action recognition that matches region-aware image embeddings to a text corpus and exploits the visual-language alignment of VLP during adaptation, bridging the gap to complex activity semantics.


Technical Report for Argoverse Challenges on Unified Sensor-based Detection, Tracking, and Forecasting

Zhepeng Wang,Feng Chen,Kanokphan Lertniphonphan,Siwei Chen,Jinyao Bao,Pengfei Zheng,Jinbao Zhang,Kaer Huang,Tao Zhang

http://arxiv.org/abs/2311.15615v1

Compressor summary: The report introduces Le3DE2E, a unified network for sensor-based detection, tracking, and forecasting in autonomous driving, which achieved 1st place in Argoverse Challenges at CVPR 2023 WAD.


FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models

Ruixuan Xiao,Yiwen Dong,Junbo Zhao,Runze Wu,Minmin Lin,Gang Chen,Haobo Wang

http://arxiv.org/abs/2311.15614v1

Compressor summary: The authors propose a collaborative learning framework called FreeAL that uses a large language model as an active annotator and a small language model as a student to distill and filter task-specific knowledge, reducing the annotation cost and improving zero-shot performances.


A manometric feature descriptor with linear-SVM to distinguish esophageal contraction vigor

Jialin Liu,Lu Yan,Xiaowei Liu,Yuzhuo Dai,Fanggen Lu,Yuanting Ma,Muzhou Hou,Zheng Wang

http://arxiv.org/abs/2311.15609v1

Compressor summary: The paper describes a study that used image processing of high-resolution manometry data to predict esophageal contraction vigor and make the evaluation of esophageal dynamic function easier and more accurate.


2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation

Ozan Unal,Dengxin Dai,Lukas Hoyer,Yigit Baran Can,Luc Van Gool

http://arxiv.org/abs/2311.15605v1

Compressor summary: IGNet is a method for weakly-supervised LiDAR semantic segmentation that uses RGB images to compensate for boundary estimation and false negative issues, achieving state-of-the-art results with minimal annotations.


UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Xiaohan Ding,Yiyuan Zhang,Yixiao Ge,Sijie Zhao,Lin Song,Xiangyu Yue,Ying Shan

http://arxiv.org/abs/2311.15599v1

Compressor summary: The paper proposes architectural guidelines for large-kernel ConvNets, which outperform conventional ConvNets in image recognition and show universal perception ability across modalities.


Can Vision-Language Models Think from a First-Person Perspective?

Sijie Cheng,Zhicheng Guo,Jingwen Wu,Kechen Fang,Peng Li,Huaping Liu,Yang Liu

http://arxiv.org/abs/2311.15596v1

Compressor summary: EgoThink is a new visual question-answering test for vision-language models that assesses their first-person perspective abilities using egocentric video clips, which can help improve autonomous agents and robotics.


A Simple Geometric-Aware Indoor Positioning Interpolation Algorithm Based on Manifold Learning

Suorong Yang,Geng Zhang,Jian Zhao,Furao Shen

http://arxiv.org/abs/2311.15583v1

Compressor summary: The paper proposes a simple geometric-aware interpolation algorithm for indoor positioning that exploits local topological manifold using manifold learning principles, improving accuracy and efficiency over existing methods.


Real Time GAZED: Online Shot Selection and Editing of Virtual Cameras from Wide-Angle Monocular Video Recordings

Sudheer Achary,Rohit Girmaji,Adhiraj Anil Deshmukh,Vineet Gandhi

http://arxiv.org/abs/2311.15581v1

Compressor summary: Real Time GAZED is a novel system that allows users to create high-quality, professionally edited videos in real-time by combining the GAZED framework with CineFilter, a new camera trajectory stabilization technique.


Experimental Analysis of Large-scale Learnable Vector Storage Compression

Hailin Zhang,Penghao Zhao,Xupeng Miao,Yingxia Shao,Zirui Liu,Tong Yang,Bin Cui

http://arxiv.org/abs/2311.15578v1

Compressor summary: The paper compares 14 embedding compression methods in machine learning tasks, evaluates their performance under different memory budgets, and recommends the best approach for each use case.


EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth

Cindy Le,Congrui Hetang,Ang Cao,Yihui He

http://arxiv.org/abs/2311.15573v1

Compressor summary: The paper introduces a new way to create realistic textures for 3D models using text descriptions and depth information, and shows that it outperforms existing methods in quality, diversity, and speed.


Improving Adaptability and Generalizability of Efficient Transfer Learning for Vision-Language Models

Yongjin Yang,Jongwoo Ko,Se-Young Yun

http://arxiv.org/abs/2311.15569v1

Compressor summary: This paper explores how vision-language models (VLMs) use prompts and adapters for image classification tasks, and proposes an adaptive ensemble method to improve generalization across domains.


Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text

Finbarrs Oketunji

http://arxiv.org/abs/2311.15565v1

Compressor summary: The paper describes a research project that employs advanced deep learning models to distinguish AI-generated texts from human-written ones using a diverse dataset and natural language processing techniques.


Boot and Switch: Alternating Distillation for Zero-Shot Dense Retrieval

Fan Jiang,Qiongkai Xu,Tom Drummond,Trevor Cohn

http://arxiv.org/abs/2311.15564v1

Compressor summary: ABEL is a simple unsupervised method to enhance passage retrieval by iteratively improving a dense retriever and a reranker, achieving strong results on BEIR benchmark and adapting well to new tasks and domains.


Noisy Self-Training with Synthetic Queries for Dense Retrieval

Fan Jiang,Tom Drummond,Trevor Cohn

http://arxiv.org/abs/2311.15563v1

Compressor summary: The paper introduces a new self-training framework that improves neural retrievers without external data and shows better performance on different benchmarks, even with limited training data.


Fully Authentic Visual Question Answering Dataset from Online Communities

Chongyan Chen,Mengchen Liu,Noel Codella,Yunsheng Li,Lu Yuan,Danna Gurari

http://arxiv.org/abs/2311.15562v1

Compressor summary: The paper introduces VQAonline, a new VQA dataset with longer answers from online forums, and evaluates six models on it.


ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Yiming Chen,Zhiqi Li,Peidong Liu

http://arxiv.org/abs/2311.15561v1

Compressor summary: The authors propose a fast text-to-3D generation method that uses images from a pre-trained text-to-image diffusion model to train a 3D generative network, which takes only about 8 milliseconds per 3D asset.


PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images

Jiquan Yuan,Xinyan Cao,Changjin Li,Fanyi Yang,Jinlong Lin,Xixin Cao

http://arxiv.org/abs/2311.15556v1

Compressor summary: The text introduces a new database (PKU-I2IQA) and two benchmark models for evaluating the quality of AI-generated images in various scenarios.


Instruct2Attack: Language-Guided Semantic Adversarial Attacks

Jiang Liu,Chen Wei,Yuxiang Guo,Heng Yu,Alan Yuille,Soheil Feizi,Chun Pong Lau,Rama Chellappa

http://arxiv.org/abs/2311.15551v1

Compressor summary: Instruct2Attack (I2A) is a language-guided semantic attack that uses latent diffusion models to generate natural and diverse adversarial examples based on image and text instructions, breaking state-of-the-art neural networks even under strong defenses.


Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination

Haoqiang Kang,Xiao-Yang Liu

http://arxiv.org/abs/2311.15548v1

Compressor summary: The paper investigates and proposes solutions for the problem of large language models hallucinating or making up information when performing financial tasks.


Dataset Distillation in Latent Space

Yuxuan Duan,Jianfu Zhang,Liqing Zhang

http://arxiv.org/abs/2311.15547v1

Compressor summary: The paper proposes a new method for dataset distillation using latent space to address problems with time and space complexity and info-compactness, enabling better compression and performance.


Out-of-Distribution Generalized Dynamic Graph Neural Network for Human Albumin Prediction

Zeyang Zhang,Xingwang Li,Fei Teng,Ning Lin,Xueling Zhu,Xin Wang,Wenwu Zhu

http://arxiv.org/abs/2311.15545v1

Compressor summary: The paper presents DyG-HAP, a framework that uses dynamic graph regression and attention to capture invariant and variant patterns and accurately predict human albumin levels for critically ill ICU patients, and introduces a new dataset (ANIC) for evaluating albumin prediction methods.


The effect of source disclosure on evaluation of AI-generated messages: A two-part study

Sue Lim,Ralf Schmälzle

http://arxiv.org/abs/2311.15544v1

Compressor summary: This paper explores how people's evaluation of AI-generated health prevention messages changes depending on whether they know the source is AI or human, and how their negative attitudes towards AI affect this preference.


Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models

Tong Zhang,Haoyang Liu,Peiyan Zhang,Yuxuan Cheng,Haohan Wang

http://arxiv.org/abs/2311.15543v1

Compressor summary: The text introduces Simple-SVG-Generation (S²VG²), a method that generates accurate and simple SVGs for images, improving readability and interpretability compared to previous methods.


EAFP-Med: An Efficient Adaptive Feature Processing Module Based on Prompts for Medical Image Detection

Xiang Li,Long Lan,Husam Lahza,Shaowu Yang,Shuihua Wang,Wenjing Yang,Hengzhu Liu,Yudong Zhang

http://arxiv.org/abs/2311.15540v1

Compressor summary: EAFP-Med is a module that uses language models to adaptively process lesion features in different medical imaging technologies, improving disease detection performance.


SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Bin Xie,Jiale Cao,Jin Xie,Fahad Shahbaz Khan,Yanwei Pang

http://arxiv.org/abs/2311.15537v1

Compressor summary: The paper proposes SED, an encoder-decoder model for open-vocabulary semantic segmentation that uses a hierarchical backbone and category early rejection to improve efficiency and accuracy.
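The category early rejection step can be illustrated with a minimal sketch: a cheap coarse score filters categories before the expensive per-pixel pass runs. The threshold, shapes, and random scores below are illustrative assumptions, not SED's actual design.

```python
import torch

torch.manual_seed(0)
num_categories, h, w = 100, 32, 32

# Coarse stage: a cheap per-category score for the whole image
# (e.g., pooled similarity between image and text embeddings).
coarse_scores = torch.rand(num_categories)

# Early rejection: keep only categories whose coarse score clears a threshold,
# so the expensive per-pixel decoding below runs over far fewer classes.
keep = coarse_scores > 0.7
kept_ids = torch.nonzero(keep).squeeze(1)

# Fine stage (placeholder): per-pixel logits only for the surviving categories.
fine_logits = torch.randn(len(kept_ids), h, w)
pred = kept_ids[fine_logits.argmax(dim=0)]   # map back to original category ids

print(f"kept {len(kept_ids)}/{num_categories} categories; prediction map {tuple(pred.shape)}")
```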


SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation

Jia Li,Yanyan Shen,Lei Chen,Charles Wang Wai NG

http://arxiv.org/abs/2311.15530v1

Compressor summary: SSIN is a data-driven self-supervised framework for rainfall spatial interpolation whose SpaFormer model uses a Transformer with random masking to learn embeddings and model spatial correlations, outperforming state-of-the-art methods on two real-world rainfall datasets and also proving effective for traffic spatial interpolation.
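A minimal sketch of the masked self-supervision idea (randomly mask station values and train a Transformer to reconstruct them from the remaining stations) is given below; the featurization, dimensions, and training loop are assumptions, not SpaFormer's actual architecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_stations, d_model = 20, 32

# Toy gauge data: (x, y) coordinates and a rainfall value per station.
coords = torch.rand(num_stations, 2)
rain = torch.rand(num_stations, 1)

embed = nn.Linear(3, d_model)                       # embeds (x, y, value)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(d_model, 1)                        # predicts rainfall
opt = torch.optim.Adam(list(embed.parameters()) +
                       list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(50):
    # Randomly mask some stations' values; the model must infer them from
    # the unmasked stations via attention over spatial relationships.
    mask = torch.rand(num_stations) < 0.3
    if mask.sum() == 0:
        continue
    masked_rain = rain.clone()
    masked_rain[mask] = 0.0
    tokens = embed(torch.cat([coords, masked_rain], dim=-1)).unsqueeze(0)
    pred = head(encoder(tokens)).squeeze(0)
    loss = ((pred[mask] - rain[mask]) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("reconstruction loss:", loss.item())
```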


Efficient Dataset Distillation via Minimax Diffusion

Jianyang Gu,Saeed Vahidian,Vyacheslav Kungurtsev,Haonan Wang,Wei Jiang,Yang You,Yiran Chen

http://arxiv.org/abs/2311.15529v1

Compressor summary: The paper proposes a new method to reduce the storage and computational cost of training networks using generative diffusion techniques that enhance representativeness and diversity, achieving better performance with less distillation time compared to previous methods.


Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization

Mai-Vu Tran,Hoang-Quynh Le,Duy-Cat Can,Quoc-An Nguyen

http://arxiv.org/abs/2311.15525v1

Compressor summary: The paper describes the VLSP 2022 shared task on Vietnamese abstractive multi-document summarization (Abmusu) and presents a human-annotated dataset of Vietnamese news documents in 8 categories.


A Comparative and Experimental Study on Automatic Question Answering Systems and its Robustness against Word Jumbling

Shashidhar Reddy Javaji,Haoran Hu,Sai Sameer Vennam,Vijaya Gajanan Buddhavarapu

http://arxiv.org/abs/2311.15513v1

Compressor summary: The paper compares automatic question answering systems built on NLP models, which are widely used to improve customer satisfaction and ease of use, and examines how robust they are when inputs are corrupted by human errors such as word jumbling.
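A word-jumbling perturbation of the kind studied here can be generated with a few lines of Python; the specific scheme below (shuffling the interior characters of each word) is one common choice and not necessarily the exact corruption used in the paper.

```python
import random

def jumble_word(word: str, rng: random.Random) -> str:
    # Shuffle the interior characters, keeping the first and last fixed,
    # so the word stays roughly recognisable to humans.
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def jumble_question(question: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    return " ".join(jumble_word(w, rng) for w in question.split())

if __name__ == "__main__":
    q = "What is the capital city of France?"
    print(jumble_question(q))   # feed both q and its jumbled form to the QA model
```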


Sparse Pedestrian Character Learning for Trajectory Prediction

Yonghao Dong,Le Wang,Sanpin Zhou,Gang Hua,Changyin Sun

http://arxiv.org/abs/2311.15512v1

Compressor summary: TSNet is a novel network for pedestrian trajectory prediction in autonomous driving that uses a sparse character graph to learn and remove harmful negative character information, achieving state-of-the-art performance.


CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering

Haidong Zhu,Tianyu Ding,Tianyi Chen,Ilya Zharkov,Ram Nevatia,Luming Liang

http://arxiv.org/abs/2311.15510v1

Compressor summary: CaesarNeRF is an end-to-end approach that combines scene-level and pixel-level representations to improve few-shot, generalizable neural rendering with a holistic understanding of scenes.


A Corpus for Named Entity Recognition in Chinese Novels with Multi-genres

Hanjie Zhao,Jinge Xie,Yuchen Yan,Yuxiang Jia,Yawen Ye,Hongying Zan

http://arxiv.org/abs/2311.15509v1

Compressor summary: The authors build a large corpus of annotated named entities from different genres of Chinese novels and study the characteristics, genre differences, and challenges of named entity recognition in literature.


Improving Word Sense Disambiguation in Neural Machine Translation with Salient Document Context

Elijah Rippeth,Marine Carpuat,Kevin Duh,Matt Post

http://arxiv.org/abs/2311.15507v1

Compressor summary: The authors propose a simple and scalable way to resolve translation ambiguity in neural machine translation using extra-sentential context without sense annotation or model changes, and evaluate their method on a new challenge set.
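A minimal sketch of the general recipe (prepend salient document-level words to the source sentence, leaving the model untouched) is shown below, using TF-IDF as an assumed salience score; the separator token and scoring choice are illustrative, not the authors' exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def salient_words(document_sentences, k=3):
    # Score words by TF-IDF over the document's sentences and keep the top-k
    # as a cheap proxy for "salient document context".
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform(document_sentences)
    scores = tfidf.sum(axis=0).A1
    vocab = vec.get_feature_names_out()
    top = scores.argsort()[::-1][:k]
    return [vocab[i] for i in top]

def augment_source(sentence, document_sentences, sep=" <ctx> "):
    # Prepend the salient words as a pseudo-prefix; the translation model
    # itself is left unchanged.
    return " ".join(salient_words(document_sentences)) + sep + sentence

doc = ["The bank approved the loan yesterday.",
       "Interest rates at the bank rose sharply.",
       "The bank near the river flooded."]
print(augment_source("He went to the bank.", doc))
```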


Learning with Complementary Labels Revisited: A Consistent Approach via Negative-Unlabeled Learning

Wei Wang,Takashi Ishida,Yu-Jie Zhang,Gang Niu,Masashi Sugiyama

http://arxiv.org/abs/2311.15502v1

Compressor summary: The paper proposes a novel complementary-label learning method that needs neither a uniform-distribution assumption nor an ordinary-label training set, casts the problem as negative-unlabeled binary classification, and comes with theoretical guarantees and experimental validation.
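The negative-unlabeled view can be sketched as follows: a complementary label "not class k" is treated as a known negative for class k in a one-vs-rest objective. The toy loss below ignores the unlabeled side entirely and omits the paper's risk correction, so it is only an illustration of the framing, not the proposed estimator.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, dim, n = 4, 10, 256

x = torch.randn(n, dim)
comp_labels = torch.randint(0, num_classes, (n,))   # "not this class"

model = torch.nn.Linear(dim, num_classes)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(100):
    logits = model(x)
    # One-vs-rest view: for each example, the complementary label is a known
    # negative for that class; all other classes are treated as unlabeled
    # (here simply ignored -- the actual method corrects for this bias).
    neg_logit = logits.gather(1, comp_labels.unsqueeze(1)).squeeze(1)
    loss = F.binary_cross_entropy_with_logits(neg_logit, torch.zeros(n))
    opt.zero_grad()
    loss.backward()
    opt.step()

print("negative-label loss:", loss.item())
```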


Function-constrained Program Synthesis

Patrick Hajali,Ignas Budvytis

http://arxiv.org/abs/2311.15500v1

Compressor summary: The authors present a technique for using user-provided code and generating modular sub-functions to aid LLMs in solving programming tasks, as well as introducing a new evaluation method for assessing their performance.


Adaptive Image Registration: A Hybrid Approach Integrating Deep Learning and Optimization Functions for Enhanced Precision

Gabriel De Araujo,Shanlin Sun,Xiaohui Xie

http://arxiv.org/abs/2311.15497v1

Compressor summary: The paper proposes a new image registration method that combines learning and optimization to improve accuracy, efficiency, and smoothness.


Optimizing and Fine-tuning Large Language Model for Urban Renewal

Xi Wang,Xianyao Ling,Tom Zhang,Xuecao Li,Shaolan Wang,Zhixing Li,Liang Zhang,Peng Gong

http://arxiv.org/abs/2311.15490v1

Compressor summary: The study uses ChatGLM to generate QA datasets for urban renewal, then fine-tunes the model with prefix-tuning and LoRA to improve performance on knowledge QA tasks.
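For reference, a LoRA fine-tuning setup of this kind is typically wired up with Hugging Face `peft`; the checkpoint name, target module names, and hyperparameters below are illustrative assumptions and not the configuration reported in the study (prefix-tuning would use `PrefixTuningConfig` analogously).

```python
# pip install transformers peft
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "THUDM/chatglm3-6b"   # illustrative checkpoint name; verify before use
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModel.from_pretrained(base, trust_remote_code=True)

# Low-rank adapters on the fused attention projection; r/alpha/dropout are
# common defaults, not values reported by the paper.
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],   # assumed name for ChatGLM-style attention
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()        # only the small adapter weights train
```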


Global $\mathcal{L}^2$ minimization with certainty via geometrically adapted gradient descent in Deep Learning

Thomas Chen

http://arxiv.org/abs/2311.15487v1

Compressor summary: The paper introduces two modified versions of gradient descent flow for different levels of over- and under-parametrization in Deep Learning, with invariant geometric meanings and proven convergence properties.


Automatic Time Signature Determination for New Scores Using Lyrics for Latent Rhythmic Structure

Callie C. Liao,Duoduo Liao,Jesse Guessford

http://arxiv.org/abs/2311.15480v1

Compressor summary: The paper proposes a novel method using lyrics as input to generate time signatures for lyrical songs, discovering patterns and utilizing explainable machine learning models with high accuracy.
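A toy version of the lyrics-to-time-signature idea might extract simple rhythm-related features (word and syllable counts per line) and fit a standard classifier; the feature set, heuristic syllable counter, and tiny example data below are assumptions for illustration only.

```python
import re
from sklearn.ensemble import RandomForestClassifier

def syllables(word: str) -> int:
    # Crude vowel-group heuristic; real systems use a pronunciation lexicon.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def line_features(lyric_line: str):
    counts = [syllables(w) for w in lyric_line.split()]
    return [len(counts), sum(counts), max(counts)]

# Tiny illustrative training set: per-line lyric features -> time signature.
lines = ["Twinkle twinkle little star", "How I wonder what you are",
         "Happy birthday to you", "Happy birthday dear friend"]
labels = ["4/4", "4/4", "3/4", "3/4"]

clf = RandomForestClassifier(random_state=0)
clf.fit([line_features(l) for l in lines], labels)
print(clf.predict([line_features("Row row row your boat")]))
```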


AerialBooth: Mutual Information Guidance for Text Controlled Aerial View Synthesis from a Single Image

Divya Kothandaraman,Tianyi Zhou,Ming Lin,Dinesh Manocha

http://arxiv.org/abs/2311.15478v1

Compressor summary: AerialBooth is a new method that can generate aerial views from a single image based on its text description, using a pretrained model and mutual information guidance.


DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination

Kam Woh Ng,Xiatian Zhu,Yi-Zhe Song,Tao Xiang

http://arxiv.org/abs/2311.15477v1

Compressor summary: DreamCreature is a novel method that generates new hybrid creatures by extracting sub-concepts from unlabeled images and composing them in a text-to-image model.


MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

Yawar Siddiqui,Antonio Alliegro,Alexey Artemov,Tatiana Tommasi,Daniele Sirigatti,Vladislav Rosov,Angela Dai,Matthias Nießner

http://arxiv.org/abs/2311.15475v1

Compressor summary: MeshGPT is a new method for generating compact triangle meshes using a sequence-based approach inspired by large language models, which improves upon existing methods with better shape coverage and lower FID scores.
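The sequence-based formulation can be sketched by quantizing triangle vertex coordinates into discrete tokens and training a small causal Transformer with next-token prediction; the uniform quantizer below stands in for MeshGPT's learned tokenizer, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_bins, d_model = 128, 64

def tokenize_mesh(triangles: torch.Tensor) -> torch.Tensor:
    # triangles: (T, 3, 3) float coords in [0, 1] -> flat token sequence of
    # length 9*T via uniform quantization (a stand-in for a learned tokenizer).
    return (triangles.clamp(0, 1) * (num_bins - 1)).round().long().flatten()

tokens = tokenize_mesh(torch.rand(50, 3, 3)).unsqueeze(0)     # (1, 450)

embed = nn.Embedding(num_bins, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
decoder = nn.TransformerEncoder(layer, num_layers=2)          # used causally
head = nn.Linear(d_model, num_bins)

# Next-token prediction with a causal mask, GPT-style.
seq = embed(tokens[:, :-1])
seq_len = seq.size(1)
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
logits = head(decoder(seq, mask=causal_mask))
loss = nn.functional.cross_entropy(logits.reshape(-1, num_bins),
                                   tokens[:, 1:].reshape(-1))
print("next-token loss:", loss.item())
```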