arxiv compressed, 2023-11-27

This page contains one-sentence summaries of cs.AI/ML/CV papers announced on 2023-11-27, generated by the compressor, my personal LLM-based project.

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

Munan Ning,Bin Zhu,Yujia Xie,Bin Lin,Jiaxi Cui,Lu Yuan,Dongdong Chen,Li Yuan

Compressor summary: The paper introduces Video-Bench, a comprehensive evaluation system for video-based large language models, with 10 tasks covering understanding, question-answering, and decision-making.

Test-time Adaptation of Discriminative Models via Diffusion Generative Feedback

Mihir Prabhudesai,Tsung-Wei Ke,Alexander C. Li,Deepak Pathak,Katerina Fragkiadaki

Compressor summary: Diffusion-TTA adapts pre-trained discriminative models using generative feedback from a diffusion model, improving their accuracy in various tasks.

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Haoqin Tu,Chenhang Cui,Zijun Wang,Yiyang Zhou,Bingchen Zhao,Junlin Han,Wangchunshu Zhou,Huaxiu Yao,Cihang Xie

Compressor summary: This study evaluates Vision LLMs' visual reasoning abilities by introducing a comprehensive safety evaluation suite that covers OOD generalization and adversarial robustness, revealing their strengths and weaknesses in handling different conditions.

GART: Gaussian Articulated Template Models

Jiahui Lei,Yufu Wang,Georgios Pavlakos,Lingjie Liu,Kostas Daniilidis

Compressor summary: GART is a model that uses moving 3D Gaussians to represent deformable subjects in monocular videos with efficient reconstruction and rendering.

CG-HOI: Contact-Guided 3D Human-Object Interaction Generation

Christian Diller,Angela Dai

Compressor summary: CG-HOI is a method for generating realistic 3D human-object interactions from text by modeling contact between the human body and object geometry.

Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

Zhe Li,Zerong Zheng,Lizhen Wang,Yebin Liu

Compressor summary: The paper presents a new method for creating realistic and dynamic human avatars using a combination of 2D and 3D neural networks, which can adapt to different clothing styles and poses.

Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images

Aiyu Cui,Jay Mahajan,Viraj Shah,Preeti Gomathinayagam,Svetlana Lazebnik

Compressor summary: The paper introduces a Street TryOn benchmark and a novel method for virtual try-on on in-the-wild scenes without paired data, using DensePose warping correction and diffusion-based inpainting.

Have we built machines that think like people?

Luca M. Schulze Buschoff,Elif Akata,Matthias Bethge,Eric Schulz

Compressor summary: The paper evaluates how well vision-based large language models perform in intuitive physics, causal reasoning, and intuitive psychology tasks, finding that they are still far from human capabilities in these domains.

Self-correcting LLM-controlled Diffusion Models

Tsung-Han Wu,Long Lian,Joseph E. Gonzalez,Boyi Li,Trevor Darrell

Compressor summary: SLD is a framework that generates images from text prompts, assesses their alignment with the prompt, and self-corrects the resulting image, without additional training, and it can be integrated with existing diffusion models.

DUnE: Dataset for Unified Editing

Afra Feyza Akyürek,Eric Pan,Garry Kuwanto,Derry Wijaya

Compressor summary: The study explores ways to edit language models beyond factual data, introduces a new benchmark called DUnE, and shows that no existing method has completely solved the generalized editing problem.

MAST: Model-Agnostic Sparsified Training

Yury Demidovich,Grigory Malinovsky,Egor Shulgin,Peter Richtárik

Compressor summary: The paper introduces a new optimization problem that uses pre-trained models and random sketch operators for sparsification during machine learning model training, leading to improved convergence rates and relaxed assumptions.

BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification

Dmitri Roussinov,Serge Sharoff

Compressor summary: The paper shows that PLMs struggle with topic changes in text classification tasks, proposes using synthetic texts to improve performance, and provides empirical results and code for replication.

ViT-Lens-2: Gateway to Omni-modal Intelligence

Weixian Lei,Yixiao Ge,Kun Yi,Jianfeng Zhang,Difei Gao,Dylan Sun,Yuying Ge,Ying Shan,Mike Zheng Shou

Compressor summary: The paper introduces ViT-Lens-2, a method for efficient learning of diverse modalities using pretrained vision transformers and modality alignment with existing foundation models.

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Zeming Chen,Alejandro Hernández Cano,Angelika Romanou,Antoine Bonnet,Kyle Matoba,Francesco Salvi,Matteo Pagliardini,Simin Fan,Andreas Köpf,Amirkeivan Mohtashami,Alexandre Sallinen,Alireza Sakhaeirad,Vinitra Swamy,Igor Krawczuk,Deniz Bayazit,Axel Marmet,Syrielle Montariol,Mary-Anne Hartley,Martin Jaggi,Antoine Bosselut

Compressor summary: MEDITRON is an open-source suite of large-scale medical language models that outperform several closed-source models on medical benchmarks.

BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights

François Remy,Kris Demuynck,Thomas Demeester

Compressor summary: The study uses Large Language Models and the UMLS knowledge graph to create high-fidelity representations of biomedical concepts and sentences, improves performance on various tasks, and releases a multilingual model.

A Survey on Vulnerability of Federated Learning: A Learning Algorithm Perspective

Xianghua Xie,Chen Hu,Hanchi Ren,Jingjing Deng

Compressor summary: This paper reviews malicious attacks on federated learning (FL) systems, categorizes them into four types, and discusses defense strategies that aim to protect FL's learning process, data, and models from manipulation and sabotage.

DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization

Zhaoyang Xia,Carol Neidle,Dimitris N. Metaxas

Compressor summary: The research introduces DiffSLVA, a method that uses diffusion models and image features to anonymize sign language videos without losing linguistic content, potentially benefiting Deaf and Hard-of-Hearing communities.

Metric Space Magnitude for Evaluating Unsupervised Representation Learning

Katharina Limbeck,Rayna Andreeva,Rik Sarkar,Bastian Rieck

Compressor summary: The paper introduces magnitude as a measure of the effective size of a space, and presents a new quality measure for dimensionality reduction tasks based on dissimilarity between magnitude functions.

Exploring Attribute Variations in Style-based GANs using Diffusion Models

Rishubh Parihar,Prasanna Balaji,Raghav Magazine,Sarthak Vora,Tejan Karmali,Varun Jampani,R. Venkatesh Babu

Compressor summary: The paper proposes a new method for diverse attribute editing by modeling multidimensional attribute edits using disentangled latent spaces of pretrained GANs and training a Denoising Diffusion Probabilistic Model.

Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing

Jian Gao,Chun Gu,Youtian Lin,Hao Zhu,Xun Cao,Li Zhang,Yao Yao

Compressor summary: The paper describes a new method to render 3D scenes from multiple images using point-based rendering, which allows for editing, ray-tracing, and real-time relighting of the scene.

Weakly-Supervised 3D Reconstruction of Clothed Humans via Normal Maps

Jane Wu,Diego Thomas,Ronald Fedkiw

Compressor summary: The paper proposes a new deep learning method to reconstruct 3D clothed humans from 2D normal maps and RGB images, without ground-truth volumetric information.

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Wenzhao Zheng,Weiliang Chen,Yuanhui Huang,Borui Zhang,Yueqi Duan,Jiwen Lu

Compressor summary: The paper proposes OccWorld, a world model that predicts the movement of the ego car and the evolution of surrounding scenes in 3D occupancy space, using scene tokens and a GPT-like generative transformer.

GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions

Jiemin Fang,Junjie Wang,Xiaopeng Zhang,Lingxi Xie,Qi Tian

Compressor summary: The GaussianEditor framework allows delicate and precise editing of 3D scenes using text instructions and 3D Gaussians, with faster training speed compared to previous methods.

Machine Learning-Enhanced Aircraft Landing Scheduling under Uncertainties

Yutian Pang,Peng Zhao,Jueming Hu,Yongming Liu

Compressor summary: The paper proposes a machine learning-enhanced landing scheduling method that reduces aircraft delays, improves safety, and considers uncertainties in flight events.

A Neural Framework for Generalized Causal Sensitivity Analysis

Dennis Frauen,Fergus Imrie,Alicia Curth,Valentyn Melnychuk,Stefan Feuerriegel,Mihaela van der Schaar

Compressor summary: NeuralCSA is a neural framework for causal sensitivity analysis that works with various sensitivity models, treatment types, and causal queries, and can infer valid bounds on the causal query of interest.

Forecasting Auxiliary Energy Consumption for Electric Heavy-Duty Vehicles

Yuantao Fan,Zhenkan Wang,Sepideh Pashami,Slawomir Nowaczyk,Henrik Ydreskog

Compressor summary: The paper proposes a method to improve energy consumption prediction and explainability for electric commercial vehicles by training multiple regression models on subsets of data based on relevant sub-populations.

Closing the ODE-SDE gap in score-based diffusion models through the Fokker-Planck equation

Teo Deveney,Jan Stanczuk,Lisa Maria Kreusser,Chris Budd,Carola-Bibiane Schönlieb

Compressor summary: This paper analyses the differences between ODE and SDE dynamics in score-based diffusion models and proposes a regularisation term to reduce these differences, although the regularisation may degrade SDE sample quality.

Sensitivity-Based Layer Insertion for Residual and Feedforward Neural Networks

Evelyn Herberg,Roland Herzog,Frederik Köhne,Leonie Kreis,Anton Schiela

Compressor summary: The paper presents a method to insert new layers in neural networks during training using sensitivity-based techniques, which improves training efficiency and reduces computational effort.

Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework

Shaobo Wang,Xiangdong Zhang,Junchi Yan

Compressor summary: UBN is a two-stage framework that alleviates feature condensation and unifies various BN variants to improve neural network training stability and convergence.

DiffAnt: Diffusion Models for Action Anticipation

Zeyun Zhong,Chengzhi Wu,Manuel Martin,Michael Voit,Juergen Gall,Jürgen Beyerer

Compressor summary: The authors propose a new generative model that captures different possible future actions by iteratively generating them from Gaussian noise, conditioned on the observed video, and show its effectiveness on four benchmark datasets.

Should We Learn Most Likely Functions or Parameters?

Shikai Qiu,Tim G. J. Rudner,Sanyam Kapoor,Andrew Gordon Wilson

Compressor summary: The paper discusses alternatives to standard regularized training methods by directly estimating the most likely function implied by a model and data, which can improve generalization and robustness.

Sparsify-then-Classify: From Internal Neurons of Large Language Models To Efficient Text Classifiers

Yilun Liu,Difan Jiao,Ashton Anderson

Compressor summary: The Sparsify-then-Classify (STC) approach improves text classification performance by using all internal representations of Large Language Models with multiple pooling strategies and sparsifying task-specific features.

Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

Yuanxun Lu,Jingyang Zhang,Shiwei Li,Tian Fang,David McKinnon,Yanghai Tsin,Long Quan,Xun Cao,Yao Yao

Compressor summary: The authors propose a novel method for generating diverse and high-fidelity 3D content using a multi-view 2.5D diffusion model that is fine-tuned from a pre-trained 2D diffusion model, without the need for score distillation sampling or extensive 3D training data.

Soil Organic Carbon Estimation from Climate-related Features with Graph Neural Network

Weiying Zhao,Natalia Efremova

Compressor summary: The study compared four Graph Neural Network operators to estimate soil organic carbon using satellite data and found that PESAGE and PETransformer models performed best, showing the potential of GNNs in predicting SOC.

Text2Loc: 3D Point Cloud Localization from Natural Language

Yan Xia,Letian Shi,Zifeng Ding,João F. Henriques,Daniel Cremers

Compressor summary: The Text2Loc neural network uses natural language descriptions and a coarse-to-fine localization pipeline to improve 3D point cloud localization accuracy by up to 2 times over previous methods.

FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding in Open World

Thanh-Dat Truong,Utsav Prabhu,Bhiksha Raj,Jackson Cothren,Khoa Luu

Compressor summary: The paper introduces a new Fairness Contrastive Clustering loss to address catastrophic forgetting and fairness in continual learning for semantic scene understanding, and proposes an attention-based visual grammar approach for background shift and unknown classes.

Efficient Pre-training for Localized Instruction Generation of Videos

Anil Batra,Davide Moltisanti,Laura Sevilla-Lara,Marcus Rohrbach,Frank Keller

Compressor summary: Sieve-&-Swap is a technique to automatically filter and improve procedural video transcripts for better step localization and instruction generation with fewer computational resources.

From Pixels to Titles: Video Game Identification by Screenshots using Convolutional Neural Networks

Fabricio Breve

Compressor summary: The paper shows how convolutional neural networks can identify video games from single screenshots with high accuracy, using different architectures and initial weights.

Addressing Long-Horizon Tasks by Integrating Program Synthesis and State Machines

Yu-An Lin,Chen-Tao Lee,Guan-Ting Liu,Pu-Jen Cheng,Shao-Hua Sun

Compressor summary: The paper introduces Program Machine Policies (POMPs), which combine programmatic RL and state machine policies to represent complex behaviors and address long-horizon tasks, outperforming previous methods on various tasks and generalizing inductively without fine-tuning.

A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

Shuyue Stella Li,Beining Xu,Xiangyu Zhang,Hexin Liu,Wenhan Chao,Leibny Paola Garcia

Compressor summary: The study examines how self-supervised learning (SSL) models perform as feature extractors in cross-lingual settings and proposes a new metric, PSR, to measure their effectiveness using ASR performance.

Replay across Experiments: A Natural Extension of Off-Policy RL

Dhruva Tirumala,Thomas Lampe,Jose Enrique Chen,Tuomas Haarnoja,Sandy Huang,Guy Lever,Ben Moran,Tim Hertweck,Leonard Hasenclever,Martin Riedmiller,Nicolas Heess,Markus Wulfmeier

Compressor summary: Replay Across Experiments (RaE) is a simple framework that uses experience from previous experiments to improve RL performance, exploration, and bootstrapping while requiring minimal changes.

GloNets: Globally Connected Neural Networks

Antonio Di Cecco,Carlo Metta,Marco Fantozzi,Francesco Morandin,Maurizio Parton

Compressor summary: GloNet is a new architecture that helps deep neural networks work better at higher depths by uniformly connecting and regulating information flow across the network.

Leveraging deep active learning to identify low-resource mobility functioning information in public clinical notes

Tuan-Dung Le,Zhuqi Miao,Samuel Alvarado,Brittany Smith,William Paiva,Thanh Thieu

Compressor summary: The paper introduces a new dataset for extracting and analyzing mobility functioning information from clinical notes using BERT and CRF models.

Over-Squashing in Riemannian Graph Neural Networks

Julia Balla

Compressor summary: The paper investigates whether using Riemannian manifolds of variable curvature in Hyperbolic Graph Neural Networks (HGNNs) can reduce over-squashing, a phenomenon where node features become insensitive to distant nodes in the graph.

Tell2Design: A Dataset for Language-Guided Floor Plan Generation

Sicong Leng,Yang Zhou,Mohammed Haroon Dupty,Wee Sun Lee,Sam Conrad Joyce,Wei Lu

Compressor summary: The authors introduce a new dataset, model, and evaluation method for generating floor plans from natural language descriptions, aiming to advance the field of language-guided design generation.

Physics-informed neural networks for transformed geometries and manifolds

Samuel Burbulla

Compressor summary: The paper proposes a method to improve physics-informed neural networks (PINNs) by incorporating geometric transformations, allowing them to handle complex or varying shapes better and enable shape optimization.

Unleashing the Power of Prompt-driven Nucleus Instance Segmentation

Zhongyi Shui,Yunlong Zhang,Kai Yao,Chenglu Zhu,Yuxuan Sun,Lin Yang

Compressor summary: The paper introduces a novel framework that uses a point prompter and the Segment Anything Model (SAM) for automatic nucleus instance segmentation in histology images, achieving state-of-the-art results.

Optimal Transport Aggregation for Visual Place Recognition

Sergio Izquierdo,Javier Civera

Compressor summary: SALAD is a new method for visual place recognition that uses optimal transport to aggregate local features, discards non-informative ones, and leverages a fast-learning backbone to achieve better performance than existing approaches.

A new fuzzy multi-attribute group decision-making method based on TOPSIS and optimization models

Qixiao Hu,Shiquan Zhang,Chaolang Hu,Yuetong Liu

Compressor summary: The paper proposes a new method for multi-attribute group decision-making using TOPSIS and optimization models with interval-valued intuitionistic fuzzy sets, which combines subjective and objective weighting methods and is demonstrated on a real case study.

WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models

Youssef Benchekroun,Megi Dervishi,Mark Ibrahim,Jean-Baptiste Gaya,Xavier Martinet,Grégoire Mialon,Thomas Scialom,Emmanuel Dupoux,Dieuwke Hupkes,Pascal Vincent

Compressor summary: WorldSense is a benchmark to test LLMs' ability to understand simple arrangements of entities, but current chat-LLMs make errors and have response biases even with three objects.

Reinforcement Learning for Wildfire Mitigation in Simulated Disaster Environments

Alexander Tapley,Marissa Dotter,Michael Doyle,Aidan Fennelly,Dhanuj Gandikota,Savanna Smith,Michael Threet,Tim Welsh

Compressor summary: The paper introduces SimFire, a realistic wildfire simulator, and SimHarness, an agent-based machine learning system to generate land management strategies, to help prepare for and react to increasingly severe fire seasons due to climate change.

Diagnosis driven Anomaly Detection for CPS

Henrik S. Steude,Lukas Moddemann,Alexander Diedrich,Jonas Ehrhardt,Oliver Niggemann

Compressor summary: The authors propose a method that combines deep learning-based anomaly detection with Consistency-Based Diagnosis for holistic diagnosis in Cyber-Physical Systems and show its effectiveness on simulated and real data.

A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning

Jianxiong Li,Shichao Lin,Tianyu Shi,Chujie Tian,Yu Mei,Jian Song,Xianyuan Zhan,Ruimin Li

Compressor summary: The paper proposes a data-driven framework for traffic signal control using machine learning and traffic flow theory to infer rewards from coarse-grained data and learn policies from historical datasets.

ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization

Elahe Vahdani,Yingli Tian

Compressor summary: The paper introduces ADM-Loc, a novel framework for detecting actions in videos with limited annotations, by generating action proposals from a composite distribution and enforcing consistency in action classification scores.

Computer Vision for Carriers: PATRIOT

Ari Goodman,Gurpreet Singh,James Hing,Ryan O'Shea

Compressor summary: PATRIOT is a prototype system that uses existing camera feeds and passive sensing to automatically track and update aircraft positions on a virtual Ouija board interface, improving deck tracking efficiency and safety without GPS sensors.

LIFT OFF: LoRaWAN Installation and Fiducial Tracking Operations for the Flightline of the Future

Ari Goodman,Ryan O'Shea

Compressor summary: LIFT OFF is a hybrid framework that uses machine vision, GPS sensors, and LoRaWAN to provide real-time situational awareness of people, equipment, and aircraft positions in various environments, including military flightlines.

Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models

Claudio Rota,Marco Buzzelli,Joost van de Weijer

Compressor summary: The paper proposes StableVSR, a method that uses Diffusion Models and a Temporal Conditioning Module to enhance the quality of upscaled videos by synthesizing realistic and temporally-consistent details.

MetaDefa: Meta-learning based on Domain Enhancement and Feature Alignment for Single Domain Generalization

Can Sun,Hao Zheng,Zhigang Hu,Liu Yang,Meiguang Zheng,Bo Xu

Compressor summary: MetaDefa is a novel meta-learning method that improves model generalization in single domain generalization (SDG) by enhancing domains and aligning features using background substitution, visual corruptions, and class activation maps.

Data Generation for Post-OCR correction of Cyrillic handwriting

Evgenii Davydkin,Aleksandr Markelov,Egor Iuldashev,Anton Dudkin,Ivan Krivorotov

Compressor summary: The paper proposes a novel method to generate realistic synthetic Cyrillic handwriting and use it to create a large dataset for training a post-OCR correction model, which can improve error identification and evaluation of student performance.

Stability-Informed Initialization of Neural Ordinary Differential Equations

Theodor Westny,Arman Mohammadi,Daniel Jung,Erik Frisk

Compressor summary: The paper explores how different aspects of neural ODE training impact performance and introduces a new initialization technique based on stability regions.

FLASC: A Flare-Sensitive Clustering Algorithm: Extending HDBSCAN* for Detecting Branches in Clusters

D. M. Bot,J. Peeters,J. Liesenborgs,J. Aerts

Compressor summary: FLASC is a flare-sensitive clustering algorithm that improves upon HDBSCAN* by differentiating branches within detected clusters and offering two variants with varying computational cost and noise robustness.

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Jiaxuan Li,Duc Minh Vo,Akihiro Sugimoto,Hideki Nakayama

Compressor summary: EVCap is a retrieval-augmented image captioning method that uses external visual-name memory to enable LLMs to describe novel objects without relying on large amounts of data or scaling up network parameters.

RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization

Kwanyoung Kim,Yujin Oh,Sangjoon Park,Hwa Kyung Byun,Jin Sung Kim,Yong Bae Kim,Jong Chul Ye

Compressor summary: RO-LLaMA is a versatile AI model that can handle various tasks in radiation oncology, thanks to the CEFTune technique and LLM-driven segmentation framework.

InterControl: Generate Human Motion Interactions by Controlling Every Joint

Zhenzhi Wang,Jingbo Wang,Dahua Lin,Bo Dai

Compressor summary: InterControl is a novel approach that uses motion diffusion models and controlnets to generate realistic human interactions with flexible spatial control of every joint.

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Hsuan-I Ho,Jie Song,Otmar Hilliges

Compressor summary: SiTH is a novel pipeline that uses an image-conditioned diffusion model to create lifelike and detailed 3D human reconstructions from single images by decomposing the problem into hallucination and reconstruction subproblems.

A systematic study comparing hyperparameter optimization engines on tabular data

Balazs Kegl

Compressor summary: The authors compare different hyperparameter optimization engines using normalization and aggregation methods and identify three top-performing engines.

Single-Model and Any-Modality for Video Object Tracking

Zongwei Wu,Jilai Zheng,Xiangxuan Ren,Florin-Alexandru Vasluianu,Chao Ma,Danda Pani Paudel,Luc Van Gool,Radu Timofte

Compressor summary: Un-Track is a single transformer-based tracker that learns a common latent space for multiple modalities using RGB-X pairs and achieves significant F-score gains on various datasets without modality-specific fine-tuning.

Learning with Noisy Low-Cost MOS for Image Quality Assessment via Dual-Bias Calibration

Lei Wang,Qingbo Wu,Desen Yuan,King Ngi Ngan,Hongliang Li,Fanman Meng,Linfeng Xu

Compressor summary: The paper proposes a method for learning robust image quality assessment models from low-cost opinion scores, which can perform well even with noisy and limited data.

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Siteng Huang,Biao Gong,Yutong Feng,Xi Chen,Yuqian Fu,Yu Liu,Donglin Wang

Compressor summary: The study introduces a new method called Action-Disentangled Identifier (ADI) for text-to-image generation that improves action customization by learning action-specific identifiers and blocking the inversion of irrelevant features.

Utilizing Explainability Techniques for Reinforcement Learning Model Assurance

Alexander Tapley,Kyle Gatesman,Luis Robaina,Brett Bissey,Joseph Weissman

Compressor summary: ARLIN Toolkit helps identify weaknesses in Deep Reinforcement Learning models using clear explanations and visualizations, making them safer to use in real situations.

Syn3DWound: A Synthetic Dataset for 3D Wound Bed Analysis

Léo Lebrat,Rodrigo Santa Cruz,Remi Chierchia,Yulia Arzhaeva,Mohammad Ali Armin,Joshua Goldsmith,Jeremy Oorloff,Prithvi Reddy,Chuong Nguyen,Lars Petersson,Michelle Barakat-Johnson,Georgina Luscombe,Clinton Fookes,Olivier Salvado,David Ahmedt-Aristizabal

Compressor summary: Syn3DWound is an open-source dataset of realistic simulated wounds with annotations, aimed at improving machine learning-based wound management through image analysis.

Temporal Action Localization for Inertial-based Human Activity Recognition

Marius Bock,Michael Moeller,Kristof Van Laerhoven

Compressor summary: This paper shows how temporal action localization models can improve wearable human activity recognition using raw inertial data, outperforming previous methods with up to 25% better results.

Scale-Dropout: Estimating Uncertainty in Deep Neural Networks Using Stochastic Scale

Soyed Tuhin Ahmed,Kamal Danouchi,Michael Hefenbrock,Guillaume Prenat,Lorena Anghel,Mehdi B. Tahoori

Compressor summary: The paper proposes Scale Dropout, a novel regularization technique for binary neural networks, and Monte Carlo-Scale Dropout-based Bayesian neural networks for efficient uncertainty estimation on spintronic memory-based computation-in-memory architectures with significant energy savings.

FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax

Yu Lu,Linchao Zhu,Hehe Fan,Yi Yang

Compressor summary: FlowZero is a novel framework that uses LLMs and image diffusion models to generate temporally-coherent videos from complex spatio-temporal text descriptions.

C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing

Avigyan Bhattacharya,Mainak Singha,Ankit Jha,Biplab Banerjee

Compressor summary: C-SAW is a method that improves CLIP's performance in analyzing optical remote sensing images by enhancing visual features and prompt learning, addressing domain and content variations.

Exploring Artificial Intelligence Methods for Energy Prediction in Healthcare Facilities: An In-Depth Extended Systematic Review

Marjan FatehiJananloo,Helen Stopps,J. J. McArthur

Compressor summary: The study reviewed 17 articles using machine learning and AI to predict hospital energy consumption, finding that occupancy and meteorological data are significant factors, while highlighting the need for more research on optimizing methods and integrating real-time data.

PIPE : Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions

Edouard Yvinec,Arnaud Dapogny,Kevin Bailly

Compressor summary: PIPE is a data-free quantization method for deep neural networks that adapts well to different devices and achieves good accuracy-speed trade-offs using residual error expansion, group sparsity, and ensemble approximation.

SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields

Quentin Herau,Nathan Piasco,Moussab Bennehar,Luis Roldão,Dzmitry Tsishkou,Cyrille Migniot,Pascal Vasseur,Cédric Demonceaux

Compressor summary: The paper proposes a NeRF-based sensor calibration method for autonomous driving that uses overlapping areas and improves accuracy and robustness.

Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective

Lukas Wutschitz,Boris Köpf,Andrew Paverd,Saravan Rajmohan,Ahmed Salem,Shruti Tople,Santiago Zanella-Béguelin,Menglin Xia,Victor Rühle

Compressor summary: The authors propose using metadata in machine learning systems to address security and privacy issues and compare two methods for achieving user-level non-interference, finding that retrieval augmented models provide the best balance of utility, scalability, and flexibility.

YUAN 2.0: A Large Language Model with Localized Filtering-based Attention

Shaohua Wu,Xudong Zhao,Shenling Wang,Jiangang Luo,Lingjun Li,Xi Chen,Bing Zhao,Wei Wang,Tong Yu,Rongguo Zhang,Jiahua Zhang,Chao Wang

Compressor summary: The paper describes a new language model called Yuan 2.0 that uses local dependencies in natural language to improve attention, has a large number of parameters, and can perform well in various tasks such as code generation and math problem-solving.

Relationship between Model Compression and Adversarial Robustness: A Review of Current Evidence

Svetlana Pavlitska,Hannes Grolig,J. Marius Zöllner

Compressor summary: The paper reviews how different techniques to make neural networks smaller can affect their ability to resist attacks, but the results are not consistent.

Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs

Simone Conia,Min Li,Daniel Lee,Umar Farooq Minhas,Ihab Ilyas,Yunyao Li

Compressor summary: The authors propose a new task called Knowledge Graph Enhancement (KGE) that aims to improve the quality and quantity of textual information for non-English entity names and descriptions in Wikidata using a novel unsupervised approach, M-NTA, which combines Machine Translation, Web Search, and Large Language Models. They also introduce WikiKGE-10, the first benchmark to evaluate KGE methods across 10 languages and 7 language families.

Stable Segment Anything Model

Qi Fan,Xin Tao,Lei Ke,Mingqiao Ye,Yuan Zhang,Pengfei Wan,Zhongyuan Wang,Yu-Wing Tai,Chi-Keung Tang

Compressor summary: The paper analyzes how well SAM can segment objects with low-quality prompts and proposes Stable-SAM, which adjusts feature sampling locations to improve stability without changing the model architecture or adding many parameters.

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

Biao Gong,Siteng Huang,Yutong Feng,Shiwei Zhang,Yuyuan Li,Yu Liu

Compressor summary: SimM is a system that adjusts image generation to match textual layout instructions without needing additional training, using a pipeline of error detection and rectification with minimal overhead.

Attend Who is Weak: Enhancing Graph Condensation via Cross-Free Adversarial Training

Xinglin Li,Kun Wang,Hanhui Deng,Yuxuan Liang,Di Wu

Compressor summary: The paper proposes Shock Absorber, a perturbation technique that enhances graph neural networks' robustness and stability by generating synthetic graphs with minimal additional time overhead.

Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning

Huanjin Yao,Wenhao Wu,Zhiheng Li

Compressor summary: The paper introduces Side4Video, a method for memory-efficient fine-tuning of large vision models for video understanding using a lightweight spatial-temporal side network.

Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges

Nianwen Si,Hao Zhang,Heyu Chang,Wenlin Zhang,Dan Qu,Weiqiang Zhang

Compressor summary: The paragraph discusses knowledge unlearning as a solution to mitigate risks associated with large language models' potential to retain harmful knowledge without compromising their capabilities.

Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

Yunxin Li,Baotian Hu,Wei Wang,Xiaochun Cao,Min Zhang

Compressor summary: The paper proposes MKS2, an approach to improve multimodal language models by enhancing their visual memory and collaboration abilities, leading to better reasoning and performance on benchmarks.

Learning Multi-Frequency Partial Correlation Graphs

Gabriele D'Acunto,Paolo Di Lorenzo,Francesco Bonchi,Stefania Sardellitti,Sergio Barbarossa

Compressor summary: The paper proposes two methods to learn partial correlations between time series across different frequency bands, and shows their effectiveness on synthetic and financial data.

One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls

Minghui Hu,Jianbin Zheng,Chuanxia Zheng,Chaoyue Wang,Dacheng Tao,Tat-Jen Cham

Compressor summary: The text proposes a new method called One More Step (OMS) to improve image quality in diffusion models by integrating a compact network and an additional step during inference while preserving original model parameters.

Machine Learning-Based Jamun Leaf Disease Detection: A Comprehensive Review

Auvick Chandra Bhowmik,Dr. Md. Taimur Ahad,Yousuf Rayhan Emon

Compressor summary: The paper reviews image processing techniques and Vision Transformer models used for detecting plant leaf diseases, with potential applications for jamun leaf disease detection.

Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents

Mariana Dias,Carla Teixeira Lopes

Compressor summary: The paper evaluates how image processing methods and parameter tuning in Optical Character Recognition (OCR) can improve the recognition of text in images of typewritten cultural heritage documents.

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

Wenhao Wu,Huanjin Yao,Mengxi Zhang,Yuxin Song,Wanli Ouyang,Jingdong Wang

Compressor summary: The paper evaluates GPT-4's linguistic and visual capabilities in zero-shot recognition tasks across images, videos, and point clouds, finding improved performance with rich textual descriptions.

Adinkra Symbol Recognition using Classical Machine Learning and Deep Learning

Michael Adjeisah,Kwame Omono Asamoah,Martha Asamoah Yeboah,Raji Rafiu King,Godwin Ferguson Achaab,Kingsley Adjei

Compressor summary: The researchers created a dataset and model to recognize and classify Adinkra symbols, an example of using AI for cultural preservation and community empowerment.

MARIS: Referring Image Segmentation via Mutual-Aware Attention Features

Mengxi Zhang,Yiming Liu,Xiangjun Yin,Huanjing Yue,Jingyu Yang

Compressor summary: MARIS is a referring image segmentation method that uses the Segment Anything Model and mutual-aware attention to improve cross-modal fusion for more accurate segmentation.

Italian Crossword Generator: Enhancing Education through Interactive Word Puzzles

Kamyar Zeinalipour,Tommaso Iaquinta,Asya Zanollo,Giovanni Angelini,Leonardo Rigutini,Marco Maggini,Marco Gori

Compressor summary: The paragraph describes how advanced language models can be used to generate and verify educational crossword clues, enhancing student engagement and learning outcomes.

GLIME: General, Stable and Local LIME Explanation

Zeren Tan,Yang Tian,Jian Li

Compressor summary: GLIME is an improved method for explaining machine learning models that addresses instability and low local fidelity issues in LIME by using faster convergence, a local and unbiased sampling distribution, and flexible sampling choices.

Variational Autoencoders for Feature Exploration and Malignancy Prediction of Lung Lesions

Benjamin Keel,Aaron Quyn,David Jayne,Samuel D. Relton

Compressor summary: The study uses generative AI models to analyze lung cancer lesions from CT scans and develop an interpretable classifier with high accuracy and a clear latent space.

Justifiable Artificial Intelligence: Engineering Large Language Models for Legal Applications

Sabine Wehnert

Compressor summary: The author proposes using Justifiable AI to increase trust in Large Language Models' legal outputs by gathering evidence for and against their predictions.

SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation

Jiehong Lin,Lihua Liu,Dekun Lu,Kui Jia

Compressor summary: SAM-6D is a framework that uses two sub-networks to detect new objects in cluttered scenes and estimate their 6D poses using instance segmentation and pose estimation, outperforming existing methods on BOP Benchmark datasets.

Cerbero-7B: A Leap Forward in Language-Specific LLMs Through Enhanced Chat Corpus Generation and Evaluation

Federico A. Galatolo,Mario G. C. A. Cimino

Compressor summary: The study presents a novel method to generate high-quality chat corpora using a generator and embedder LLM, evaluate them with a new MLM metric, and improve the performance of an Italian LLM.

Automated discovery of trade-off between utility, privacy and fairness in machine learning models

Bogdan Ficiu,Neil D. Lawrence,Andrei Paleyes

Compressor summary: The authors propose a pipeline called PFairDP, which uses Bayesian optimization to find Pareto-optimal points between fairness, privacy, and utility of machine learning models in a multi-objective optimization problem.

Information theoretic study of the neural geometry induced by category learning

Laurent Bonnasse-Gahot,Jean-Pierre Nadal

Compressor summary: The paragraph discusses an information theoretic approach to evaluate the efficiency of category learning in biological and artificial neural networks, focusing on the coding and decoding costs and the expansion of neural space near decision boundaries.

Model-agnostic Body Part Relevance Assessment for Pedestrian Detection

Maurice Günder,Sneha Banerjee,Rafet Sifa,Christian Bauckhage

Compressor summary: The authors propose a framework to use sampling-based explanation methods in pedestrian detection and introduce a new method similar to KernelSHAP that is more efficient for large-scale datasets.

Accelerating Hierarchical Associative Memory: A Deep Equilibrium Approach

Cédric Goemaere,Johannes Deleu,Thomas Demeester

Compressor summary: The paper proposes two strategies to speed up memory retrieval in Hierarchical Associative Memory models, which are a type of neural network, by using faster solvers and alternating optimization of layers.

HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images

Xihe Yang,Xingyu Chen,Shaohui Wang,Daiheng Gao,Xiaoguang Han,Baoyuan Wang

Compressor summary: The paper proposes HaveFun, a framework for reconstructing human avatars from few-shot unconstrained photos using a tetrahedral representation and a two-phase optimization method.

Deformation-Guided Unsupervised Non-Rigid Shape Matching

Aymen Merrouche,Joao Regateiro,Stefanie Wuhrer,Edmond Boyer

Compressor summary: The paper proposes a robust unsupervised method for matching shapes with fine details and different types of noise using a hierarchical patch representation and a near-rigid deformation model.

Technical Report for Argoverse Challenges on 4D Occupancy Forecasting

Pengfei Zheng,Kanokphan Lertniphonphan,Feng Chen,Siwei Chen,Bingchuan Sun,Jun Xie,Zhepeng Wang

Compressor summary: The paper introduces a LiDAR-based 4D occupancy forecasting method that outperforms the baseline and ranks first in Argoverse Challenges at CVPR 2023.

Regularization by Texts for Latent Diffusion Inverse Solvers

Jeongsol Kim,Geon Yeong Park,Hyungjin Chung,Jong Chul Ye

Compressor summary: The authors propose a new method called TReg that uses textual descriptions to help solve ill-posed inverse problems in latent diffusion models, improving their performance and accuracy.

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

Chaofeng Chen,Annan Wang,Haoning Wu,Liang Liao,Wenxiu Sun,Qiong Yan,Weisi Lin

Compressor summary: Text-to-image diffusion models can be improved by fine-tuning the text encoder using reinforcement learning, leading to better text-image alignment and visual quality.

MoDS: Model-oriented Data Selection for Instruction Tuning

Qianlong Du,Chengqing Zong,Jiajun Zhang

Compressor summary: The paper proposes MoDS, an approach that selects high-quality and necessary instruction data for fine-tuning LLMs, outperforming training on the full original dataset.

Reinforcement Learning from Diffusion Feedback: Q* for Image Search

Aboli Marathe

Compressor summary: The paper introduces two models for image generation using model-agnostic learning, RLDF and noisy diffusion gradient, which use a special CFG encoding to guide semantic priors and produce high-quality images from single input images.

Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation

Thomas Kleine Buening,Aadirupa Saha,Christos Dimitrakakis,Haifeng Xu

Compressor summary: The paper proposes a learning algorithm for online recommendation systems that considers both click-through rates and post-click rewards, and designs an incentive mechanism to encourage desirable arm behavior while minimizing regret.

PaintNeSF: Artistic Creation of Stylized Scenes with Vectorized 3D Strokes

Hao-Bin Duan,Miao Wang,Yan-Xun Li,Yong-Liang Yang

Compressor summary: PaintNeSF is a new technique that uses vector strokes to create stylized 3D images from multi-view 2D images, optimizing stroke parameters with gradient descent and maintaining consistent appearance across views.

The WebCrow French Crossword Solver

Giovanni Angelini,Marco Ernandes,Tommaso Iaquinta,Caroline Stehlé,Fanny Simões,Kamyar Zeinalipour,Andrea Zugarini,Marco Gori

Compressor summary: The authors present a crossword solver for French that uses multiple modules to find candidate answers from various sources and performs well compared to humans in challenges.

Only Positive Cases: 5-fold High-order Attention Interaction Model for Skin Segmentation Derived Classification

Renkai Wu,Yinghao Liu,Pengchen Liang,Qing Chang

Compressor summary: The paper proposes MHA-UNet, a model that uses high-order attention interaction to segment skin lesions and detect their presence or absence in an explainable way without needing negative samples.

Injecting linguistic knowledge into BERT for Dialogue State Tracking

Xiaohan Feng,Xixin Wu,Helen Meng

Compressor summary: The paper proposes an unsupervised method to improve BERT's performance and interpretability in dialogue state tracking tasks using linguistic knowledge extracted from conversations.

Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

Yifei Chen,Dapeng Chen,Ruijin Liu,Sai Zhou,Wenyuan Xue,Wei Peng

Compressor summary: The paper proposes a new "Align before Adapt" paradigm for video action recognition that leverages region-aware image embeddings matched to a text corpus and exploits the visual-language alignment of VLP during adaptation to better understand actions by bridging the gap with complex activity semantics.

Technical Report for Argoverse Challenges on Unified Sensor-based Detection, Tracking, and Forecasting

Zhepeng Wang,Feng Chen,Kanokphan Lertniphonphan,Siwei Chen,Jinyao Bao,Pengfei Zheng,Jinbao Zhang,Kaer Huang,Tao Zhang

Compressor summary: The report introduces Le3DE2E, a unified network for sensor-based detection, tracking, and forecasting in autonomous driving, which achieved 1st place in Argoverse Challenges at CVPR 2023 WAD.

FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models

Ruixuan Xiao,Yiwen Dong,Junbo Zhao,Runze Wu,Minmin Lin,Gang Chen,Haobo Wang

Compressor summary: The authors propose a collaborative learning framework called FreeAL that uses a large language model as an active annotator and a small language model as a student to distill and filter task-specific knowledge, reducing the annotation cost and improving zero-shot performance.

A manometric feature descriptor with linear-SVM to distinguish esophageal contraction vigor

Jialin Liu,Lu Yan,Xiaowei Liu,Yuzhuo Dai,Fanggen Lu,Yuanting Ma,Muzhou Hou,Zheng Wang

Compressor summary: The paragraph describes a study that used image processing of high-resolution manometry data to predict esophageal contraction vigor and make the evaluation of esophageal dynamic function easier and more accurate.

2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation

Ozan Unal,Dengxin Dai,Lukas Hoyer,Yigit Baran Can,Luc Van Gool

Compressor summary: IGNet is a method for weakly-supervised LiDAR semantic segmentation that uses RGB images to compensate for boundary estimation and false negative issues, achieving state-of-the-art results with minimal annotations.

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Xiaohan Ding,Yiyuan Zhang,Yixiao Ge,Sijie Zhao,Lin Song,Xiangyu Yue,Ying Shan

Compressor summary: The paper proposes architectural guidelines for large-kernel ConvNets, which outperform conventional ConvNets in image recognition and show universal perception ability across modalities.

Can Vision-Language Models Think from a First-Person Perspective?

Sijie Cheng,Zhicheng Guo,Jingwen Wu,Kechen Fang,Peng Li,Huaping Liu,Yang Liu

Compressor summary: EgoThink is a new visual question-answering test for vision-language models that assesses their first-person perspective abilities using egocentric video clips, which can help improve autonomous agents and robotics.

A Simple Geometric-Aware Indoor Positioning Interpolation Algorithm Based on Manifold Learning

Suorong Yang,Geng Zhang,Jian Zhao,Furao Shen

Compressor summary: The paper proposes a simple geometric-aware interpolation algorithm for indoor positioning that exploits local topological manifold using manifold learning principles, improving accuracy and efficiency over existing methods.

Real Time GAZED: Online Shot Selection and Editing of Virtual Cameras from Wide-Angle Monocular Video Recordings

Sudheer Achary,Rohit Girmaji,Adhiraj Anil Deshmukh,Vineet Gandhi

Compressor summary: Real Time GAZED is a novel system that allows users to create high-quality, professionally edited videos in real-time by combining the GAZED framework with CineFilter, a new camera trajectory stabilization technique.

Experimental Analysis of Large-scale Learnable Vector Storage Compression

Hailin Zhang,Penghao Zhao,Xupeng Miao,Yingxia Shao,Zirui Liu,Tong Yang,Bin Cui

Compressor summary: The paper compares 14 embedding compression methods in machine learning tasks, evaluates their performance under different memory budgets, and recommends the best approach for each use case.

EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth

Cindy Le,Congrui Hetang,Ang Cao,Yihui He

Compressor summary: The paper introduces a new way to create realistic textures for 3D models using text descriptions and depth information, and shows that it outperforms existing methods in quality, diversity, and speed.

Improving Adaptability and Generalizability of Efficient Transfer Learning for Vision-Language Models

Yongjin Yang,Jongwoo Ko,Se-Young Yun

Compressor summary: This paper explores how vision-language models (VLMs) use prompts and adapters for image classification tasks, and proposes an adaptive ensemble method to improve generalization across domains.

Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text

Finbarrs Oketunji

Compressor summary: The paragraph describes a research project that employs advanced deep learning models to distinguish AI-generated texts from human-written ones using a diverse dataset and natural language processing techniques.

Boot and Switch: Alternating Distillation for Zero-Shot Dense Retrieval

Fan Jiang,Qiongkai Xu,Tom Drummond,Trevor Cohn

Compressor summary: ABEL is a simple unsupervised method to enhance passage retrieval by iteratively improving a dense retriever and a reranker, achieving strong results on BEIR benchmark and adapting well to new tasks and domains.

Noisy Self-Training with Synthetic Queries for Dense Retrieval

Fan Jiang,Tom Drummond,Trevor Cohn

Compressor summary: The paper introduces a new self-training framework that improves neural retrievers without external data and shows better performance on different benchmarks, even with limited training data.

Fully Authentic Visual Question Answering Dataset from Online Communities

Chongyan Chen,Mengchen Liu,Noel Codella,Yunsheng Li,Lu Yuan,Danna Gurari

Compressor summary: The paper introduces VQAonline, a new VQA dataset with longer answers from online forums, and evaluates six models on it.

ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Yiming Chen,Zhiqi Li,Peidong Liu

Compressor summary: The authors propose a fast text-to-3D generation method that uses images from a pre-trained text-to-image diffusion model to train a 3D generative network, which then generates a 3D asset in only about 8 milliseconds.

PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images

Jiquan Yuan,Xinyan Cao,Changjin Li,Fanyi Yang,Jinlong Lin,Xixin Cao

Compressor summary: The text introduces a new database (PKU-I2IQA) and two benchmark models for evaluating the quality of AI-generated images in various scenarios.

Instruct2Attack: Language-Guided Semantic Adversarial Attacks

Jiang Liu,Chen Wei,Yuxiang Guo,Heng Yu,Alan Yuille,Soheil Feizi,Chun Pong Lau,Rama Chellappa

Compressor summary: Instruct2Attack (I2A) is a language-guided semantic attack that uses latent diffusion models to generate natural and diverse adversarial examples based on image and text instructions, breaking state-of-the-art neural networks even under strong defenses.

Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination

Haoqiang Kang,Xiao-Yang Liu

Compressor summary: The paper investigates and proposes solutions for the problem of large language models hallucinating or making up information when performing financial tasks.

Dataset Distillation in Latent Space

Yuxuan Duan,Jianfu Zhang,Liqing Zhang

Compressor summary: The paper proposes a new method for dataset distillation using latent space to address problems with time and space complexity and info-compactness, enabling better compression and performance.

Out-of-Distribution Generalized Dynamic Graph Neural Network for Human Albumin Prediction

Zeyang Zhang,Xingwang Li,Fei Teng,Ning Lin,Xueling Zhu,Xin Wang,Wenwu Zhu

Compressor summary: Key points:
- Human albumin is important for health, but hard to predict and dose accurately, especially for critically ill patients.
- The paper proposes a framework called DyG-HAP that uses dynamic graph regression and attention to capture invariant and variant patterns in the data.
- The paper also introduces a new dataset (ANIC) for evaluating albumin prediction methods.
Summary: The paper presents DyG-HAP, a framework that uses graphs and attention to predict human albumin levels accurately for ICU patients, and a new dataset (ANIC) to test it on.

The effect of source disclosure on evaluation of AI-generated messages: A two-part study

Sue Lim,Ralf Schmälzle

Compressor summary: This paper explores how people's evaluation of AI-generated health prevention messages changes depending on whether they know the source is AI or human, and how their negative attitudes towards AI affect this preference.

Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models

Tong Zhang,Haoyang Liu,Peiyan Zhang,Yuxuan Cheng,Haohan Wang

Compressor summary: The text introduces Simple-SVG-Generation (S²VG²), a method that generates accurate and simple SVGs for images, improving readability and interpretability compared to previous methods.

EAFP-Med: An Efficient Adaptive Feature Processing Module Based on Prompts for Medical Image Detection

Xiang Li,Long Lan,Husam Lahza,Shaowu Yang,Shuihua Wang,Wenjing Yang,Hengzhu Liu,Yudong Zhang

Compressor summary: EAFP-Med is a module that uses language models to adaptively process lesion features in different medical imaging technologies, improving disease detection performance.

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Bin Xie,Jiale Cao,Jin Xie,Fahad Shahbaz Khan,Yanwei Pang

Compressor summary: The paper proposes SED, an encoder-decoder model for open-vocabulary semantic segmentation that uses a hierarchical backbone and category early rejection to improve efficiency and accuracy.

SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation

Jia Li,Yanyan Shen,Lei Chen,Charles Wang Wai NG

Compressor summary: Key points:
- SSIN is a novel data-driven self-supervised learning framework for rainfall spatial interpolation
- SpaFormer model uses Transformer architecture and random masking to learn embeddings and model spatial correlations
- SSIN outperforms state-of-the-art solutions on two real-world datasets and shows effectiveness on traffic spatial interpolation
Summary: SSIN is a new method that uses SpaFormer, a Transformer-based model with self-supervision, to interpolate rainfall distribution from historical data and achieve better results than existing methods.

Efficient Dataset Distillation via Minimax Diffusion

Jianyang Gu,Saeed Vahidian,Vyacheslav Kungurtsev,Haonan Wang,Wei Jiang,Yang You,Yiran Chen

Compressor summary: The paper proposes a new method to reduce the storage and computational cost of training networks using generative diffusion techniques that enhance representativeness and diversity, achieving better performance with less distillation time compared to previous methods.

Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization

Mai-Vu Tran,Hoang-Quynh Le,Duy-Cat Can,Quoc-An Nguyen

Compressor summary: The paper describes the VLSP 2022 shared task on Vietnamese abstractive multi-document summarization (Abmusu) and presents a human-annotated dataset of Vietnamese news documents in 8 categories.

A Comparative and Experimental Study on Automatic Question Answering Systems and its Robustness against Word Jumbling

Shashidhar Reddy Javaji,Haoran Hu,Sai Sameer Vennam,Vijaya Gajanan Buddhavarapu

Compressor summary: Question-answer generation using NLP models is widely used in various applications, improving customer satisfaction and ease of use, but it can be affected by human errors.

Sparse Pedestrian Character Learning for Trajectory Prediction

Yonghao Dong,Le Wang,Sanpin Zhou,Gang Hua,Changyin Sun

Compressor summary: TSNet is a novel network for pedestrian trajectory prediction in autonomous driving that uses a sparse character graph to learn and remove harmful negative character information, achieving state-of-the-art performance.

CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering

Haidong Zhu,Tianyu Ding,Tianyi Chen,Ilya Zharkov,Ram Nevatia,Luming Liang

Compressor summary: CaesarNeRF is an end-to-end approach that combines scene-level and pixel-level representations to improve few-shot, generalizable neural rendering with a holistic understanding of scenes.

A Corpus for Named Entity Recognition in Chinese Novels with Multi-genres

Hanjie Zhao,Jinge Xie,Yuchen Yan,Yuxiang Jia,Yawen Ye,Hongying Zan

Compressor summary: The authors build a large corpus of annotated named entities from different genres of Chinese novels and study the characteristics, genre differences, and challenges of named entity recognition in literature.

Improving Word Sense Disambiguation in Neural Machine Translation with Salient Document Context

Elijah Rippeth,Marine Carpuat,Kevin Duh,Matt Post

Compressor summary: The authors propose a simple and scalable way to resolve translation ambiguity in neural machine translation using extra-sentential context without sense annotation or model changes, and evaluate their method on a new challenge set.

Learning with Complementary Labels Revisited: A Consistent Approach via Negative-Unlabeled Learning

Wei Wang,Takashi Ishida,Yu-Jie Zhang,Gang Niu,Masashi Sugiyama

Compressor summary: The paper proposes a novel complementary-label learning method that does not need a uniform distribution assumption or an ordinary-label training set, uses negative-unlabeled binary classification, and has theoretical guarantees and experimental validation.

Function-constrained Program Synthesis

Patrick Hajali,Ignas Budvytis

Compressor summary: The authors present a technique for using user-provided code and generating modular sub-functions to aid LLMs in solving programming tasks, as well as introducing a new evaluation method for assessing their performance.

Adaptive Image Registration: A Hybrid Approach Integrating Deep Learning and Optimization Functions for Enhanced Precision

Gabriel De Araujo,Shanlin Sun,Xiaohui Xie

Compressor summary: The paper proposes a new image registration method that combines learning and optimization to improve accuracy, efficiency, and smoothness.

Optimizing and Fine-tuning Large Language Model for Urban Renewal

Xi Wang,Xianyao Ling,Tom Zhang,Xuecao Li,Shaolan Wang,Zhixing Li,Liang Zhang,Peng Gong

Compressor summary: The study uses ChatGLM to generate QA datasets for urban renewal, then fine-tunes it with Prefix and LoRA methods to improve performance in knowledge QA tasks.

Global $\mathcal{L}^2$ minimization with certainty via geometrically adapted gradient descent in Deep Learning

Thomas Chen

Compressor summary: The paper introduces two modified versions of gradient descent flow for different levels of over- and under-parametrization in Deep Learning, with invariant geometric meanings and proven convergence properties.

Automatic Time Signature Determination for New Scores Using Lyrics for Latent Rhythmic Structure

Callie C. Liao,Duoduo Liao,Jesse Guessford

Compressor summary: The paper proposes a novel method using lyrics as input to generate time signatures for lyrical songs, discovering patterns and utilizing explainable machine learning models with high accuracy.

AerialBooth: Mutual Information Guidance for Text Controlled Aerial View Synthesis from a Single Image

Divya Kothandaraman,Tianyi Zhou,Ming Lin,Dinesh Manocha

Compressor summary: AerialBooth is a new method that can generate aerial views from a single image based on its text description, using a pretrained model and mutual information guidance.

DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination

Kam Woh Ng,Xiatian Zhu,Yi-Zhe Song,Tao Xiang

Compressor summary: DreamCreature is a novel method that generates new hybrid creatures by extracting sub-concepts from unlabeled images and composing them in a text-to-image model.

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

Yawar Siddiqui,Antonio Alliegro,Alexey Artemov,Tatiana Tommasi,Daniele Sirigatti,Vladislav Rosov,Angela Dai,Matthias Nießner

Compressor summary: MeshGPT is a new method for generating compact triangle meshes using a sequence-based approach inspired by large language models, which improves upon existing methods with better shape coverage and lower FID scores.