arxiv compressed, 2023-12-04

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-04, generated by the compressor, my personal LLM-based project.

Dense Optical Tracking: Connecting the Dots

Guillaume Le Moing,Jean Ponce,Cordelia Schmid

Compressor summary: DOT is a fast point tracking method that uses key regions, nearest-neighbor interpolation, and a learnable optical flow estimator to handle occlusions and outperforms existing techniques while being much faster.

Sequential Modeling Enables Scalable Learning for Large Vision Models

Yutong Bai,Xinyang Geng,Karttikeya Mangalam,Amir Bar,Alan Yuille,Trevor Darrell,Jitendra Malik,Alexei A Efros

Compressor summary: The paper introduces a novel method to train a large vision model using visual sentences without linguistic data and shows it can handle various tasks with different prompts.

Making Large Multimodal Models Understand Arbitrary Visual Prompts

Mu Cai,Haotian Liu,Siva Karthik Mustikovela,Gregory P. Meyer,Yuning Chai,Dennis Park,Yong Jae Lee

Compressor summary: The paper introduces a new multimodal model that can understand user-friendly visual prompts like colored boxes or arrows on images and performs well on region-understanding tasks.

MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video

Hengyi Wang,Jingwen Wang,Lourdes Agapito

Compressor summary: MorpheuS is a framework for reconstructing dynamic, deformable objects in 360 degrees from casually captured RGB-D videos using neural representations and a view-dependent diffusion prior.

VideoBooth: Diffusion-based Video Generation with Image Prompts

Yuming Jiang,Tianxing Wu,Shuai Yang,Chenyang Si,Dahua Lin,Yu Qiao,Chen Change Loy,Ziwei Liu

Compressor summary: VideoBooth is a video generation framework that uses image prompts to create customized and high-quality videos with coarse-to-fine embeddings and cross-frame attention layers.

Context Retrieval via Normalized Contextual Latent Interaction for Conversational Agent

Junfeng Liu,Zhuocheng Mei,Kewen Peng,Ranga Raju Vatsavai

Compressor summary: The paper introduces a new method, PK-NCLI, that uses low-level normalized contextual latent interaction to efficiently identify relevant auxiliary information for improving conversational agents' responses and outperforms the existing state-of-the-art method, PK-FoCus.

Automated Material Properties Extraction For Enhanced Beauty Product Discovery and Makeup Virtual Try-on

Fatemeh Taheri Dezaki,Himanshu Arora,Rahul Suresh,Amin Banitalebi-Dehkordi

Compressor summary: The paper introduces an automated pipeline using machine learning to extract material attributes from makeup product images, enhancing product discovery and virtual try-on experiences.

Explaining Knock-on Effects of Bias Mitigation

Svetoslav Nizhnichenkov,Rahul Nair,Elizabeth Daly,Brian Mac Namee

Compressor summary: The paper proposes an explainable meta-classifier to identify cohorts affected by bias mitigation strategies, and shows that some mitigation methods negatively impact certain groups even with improved fairness metrics.

Deep Unlearning: Fast and Efficient Training-free Approach to Controlled Forgetting

Sangamesh Kodge,Gobinda Saha,Kaushik Roy

Compressor summary: The paper introduces a new algorithm for machine unlearning that strategically removes classes from a model using novel spaces and a singular value decomposition-based technique, efficiently retaining accuracy while improving privacy against attacks.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu,Tri Dao

Compressor summary: Mamba is a fast and scalable sequence model that uses selective structured state space models for content-based reasoning and achieves state-of-the-art performance on various modalities.

Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals

Tam Nguyen,Tan M. Nguyen,Richard G. Baraniuk

Compressor summary: The paper introduces a new type of transformer model that reduces token representation over-smoothing by penalizing the difference between input and output tokens using self-attention.

Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games

Dekun Wu,Haochen Shi,Zhiyuan Sun,Bang Liu

Compressor summary: The study explores using Large Language Models (LLMs) in Chinese murder mystery role-playing games, introducing a new dataset and framework to improve AI agent performance and evaluation.

Adversarial Score Distillation: When score distillation meets GAN

Min Wei,Jingkai Zhou,Junyao Sun,Xuesong Zhang

Compressor summary: The paper proposes a new score distillation method (ASD) that improves stability and performance in various tasks by optimizing the discriminator using the Wasserstein Generative Adversarial Network (WGAN) paradigm.

SeaLLMs -- Large Language Models for Southeast Asia

Xuan-Phi Nguyen,Wenxuan Zhang,Xin Li,Mahani Aljunied,Qingyu Tan,Liying Cheng,Guanzheng Chen,Yue Deng,Sen Yang,Chaoqun Liu,Hang Zhang,Lidong Bing

Compressor summary: SeaLLMs are language models for Southeast Asian languages that respect local culture and perform better than ChatGPT in non-Latin languages.

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

Mingqiao Ye,Martin Danelljan,Fisher Yu,Lei Ke

Compressor summary: Gaussian Grouping extends Gaussian Splatting to jointly reconstruct and segment objects in 3D scenes using Identity Encodings, 2D mask predictions, and spatial consistency regularization, enabling versatile scene editing applications.

Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space

Xiaoyuan Cheng,Boli Chen,Liz Varga,Yukun Hu

Compressor summary: The paper proposes a stochastic model-based approach for safe reinforcement learning in partially observable environments using Predictive State Representation and Reproducing Kernel Hilbert Space.

Removing Biases from Molecular Representations via Information Maximization

Chenyu Wang,Sharut Gupta,Caroline Uhler,Tommi Jaakkola

Compressor summary: InfoCORE is an information maximization method for removing batch effects in drug screening data, improving molecular property prediction and retrieval.

SpaCE: The Spatial Confounding Environment

Mauricio Tec,Ana Trisovic,Michelle Audirac,Sophie Woodward,Naeem Khoshnevis,Francesca Dominici

Compressor summary: SpaCE is a toolkit for assessing causal inference methods in studies with spatial confounding by providing realistic benchmark datasets and tools.

PointBeV: A Sparse Approach to BeV Predictions

Loick Chambon,Eloi Zablocki,Mickael Chen,Florent Bartoccioni,Patrick Perez,Matthieu Cord

Compressor summary: PointBeV is a sparse Bird's-eye View segmentation model that uses sparse cells instead of dense grids, improving memory efficiency and enabling better performance on vehicle, pedestrian, and lane detection tasks.

GIFT: Generative Interpretable Fine-Tuning Transformers

Chinmay Savadikar,Xi Song,Tianfu Wu

Compressor summary: GIFT is a method for fine-tuning Transformer models with built-in interpretability, using a Parameter-to-Cluster Attention mechanism to generate and explain the fine-tuning parameters.

Rethinking Detection Based Table Structure Recognition for Visually Rich Documents

Bin Xiao,Murat Simsek,Burak Kantarci,Ala Abu Alkheir

Compressor summary: The paper discusses limitations of existing table detection methods, compares two-stage and transformer-based models, and identifies key design aspects for improving a two-stage model's performance in table structure recognition tasks.

Object Detector Differences when using Synthetic and Real Training Data

Martin Georg Ljungqvist,Otto Nordander,Markus Skans,Arvid Mildner,Tony Liu,Pierre Nugues

Compressor summary: The paper explores how training a neural network object detector on real vs synthetic data affects each layer using a similarity analysis method, and finds the largest differences in the head part of the network.

VisionaryVR: An Optical Simulation Tool for Evaluating and Optimizing Vision Correction Solutions in Virtual Reality

Benedikt W. Hosp,Martin Dechant,Yannick Sauer,Rajat Agarwala,Siegfried Wahl

Compressor summary: The study introduces a virtual reality simulation tool for evaluating vision science methods in various real-world scenarios with high control and flexibility.

Open-vocabulary object 6D pose estimation

Jaime Corsetti,Davide Boscaini,Changjae Oh,Andrea Cavallaro,Fabio Poiesi

Compressor summary: The paper describes a new open-vocabulary object 6D pose estimation method that uses a textual prompt, no object model, and two viewpoints of two different scenes, outperforming existing methods on a new benchmark.

Towards Transparency in Coreference Resolution: A Quantum-Inspired Approach

Hadi Wazni,Mehrnoosh Sadrzadeh

Compressor summary: The paper presents a Quantum Natural Language Processing system that uses Parametrised Quantum Circuits to perform pronoun resolution tasks and shows its effectiveness compared to classical systems.

Contextualized word senses: from attention to compositionality

Pablo Gamallo

Compressor summary: The paper proposes a transparent and interpretable method for encoding word contexts based on semantic compositionality, and shows that it can compete with Transformers in calculating word sense similarity.

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

Tianyu Ding,Tianyi Chen,Haidong Zhu,Jiachen Jiang,Yiqi Zhong,Jinxin Zhou,Guangzhi Wang,Zhihui Zhu,Ilya Zharkov,Luming Liang

Compressor summary: The paper surveys algorithmic advancements that improve the efficiency of Large Language Models (LLMs) in various aspects, such as scaling laws, data utilization, and training strategies.

LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models

Ying Nie,Wei He,Kai Han,Yehui Tang,Tianyu Guo,Fanyi Du,Yunhe Wang

Compressor summary: The paper proposes a multi-level alignment and masked language modeling approach to train lightweight CLIP models for vision-language tasks without increasing inference cost.

CellMixer: Annotation-free Semantic Cell Segmentation of Heterogeneous Cell Populations

Mehdi Naouar,Gabriel Kalweit,Anusha Klett,Yannick Vogt,Paula Silvestrini,Diana Laura Infante Ramirez,Roland Mertelsmann,Joschka Boedecker,Maria Kalweit

Compressor summary: CellMixer is an annotation-free approach that uses image-level labels to train a segmentation model for identifying different cell types in heterogeneous cell populations.

Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning

Kangcheng Liu,Yong-Jin Liu,Kai Tang,Ming Liu,Baoquan Chen

Compressor summary: This paper presents a method to improve 3D scene understanding with limited labels by using pre-trained vision-language models, boundary awareness, and unsupervised learning.

Nonparametric Variational Regularisation of Pretrained Transformers

Fabio Fehr,James Henderson

Compressor summary: The authors propose extending Nonparametric Variational Information Bottleneck (NVIB) to all attention functions in Transformers, which improves out-of-domain generalisation without additional training and suggests that pretrained Transformers are implicitly NV Bayesian models.

Resource-constrained knowledge diffusion processes inspired by human peer learning

Ehsan Beikihassan,Amy K. Hoover,Ioannis Koutis,Ali Parviz,Niloofar Aghaieabiane

Compressor summary: The paper explores how natural knowledge diffusion processes in networks of artificial learners can optimize performance under resource constraints, inspired by human peer learning.

Simple Transferability Estimation for Regression Tasks

Cuong N. Nguyen,Phong Tran,Lam Si Tung Ho,Vu Dinh,Anh T. Tran,Tal Hassner,Cuong V. Nguyen

Compressor summary: The authors propose two efficient methods for estimating how well deep learning models transfer from one task to another in regression tasks, and show that their methods significantly outperform existing approaches in both accuracy and efficiency.

Machine Learning for Health symposium 2023 -- Findings track

Stefan Hegselmann,Antonio Parziale,Divya Shanmugam,Shengpu Tang,Mercy Nyamewaa Asiedu,Serina Chang,Thomas Hartvigsen,Harvineet Singh

Compressor summary: This is the collection of accepted Findings papers from the 3rd Machine Learning for Health symposium (ML4H 2023), which featured health-related topics and two submission tracks, with all submissions undergoing a double-blind peer-review process.

TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models

Pengxiang Li,Zhili Liu,Kai Chen,Lanqing Hong,Yunzhi Zhuge,Dit-Yan Yeung,Huchuan Lu,Xu Jia

Compressor summary: The paper introduces TrackDiffusion, a new architecture that generates continuous video sequences from tracklets, improving instance consistency and perceptual metrics, and enabling better training of multi-object tracking systems.

SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

Ioannis Kakogeorgiou,Spyros Gidaris,Konstantinos Karantzalos,Nikos Komodakis

Compressor summary: This paper introduces two new techniques to improve slot-based autoencoders for unsupervised object segmentation in complex scenes.

Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation

Paul Bricman

Compressor summary: The author proposes hashmarking, a method to evaluate language models on sensitive topics without revealing the correct answers, by cryptographically hashing the solutions before publication.
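
The general idea of publishing hashed answers can be illustrated with a minimal sketch. It assumes a salted SHA-256 digest and a simple whitespace/case normalization; the function names and the entry layout are hypothetical and are not taken from the paper, whose actual protocol may differ.

```python
import hashlib
import hmac
import secrets


def normalize(answer: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(answer.lower().split())


def publish_entry(question: str, answer: str) -> dict:
    # Only the salt and the digest are published, never the plaintext answer.
    salt = secrets.token_hex(16)
    digest = hashlib.sha256((salt + normalize(answer)).encode("utf-8")).hexdigest()
    return {"question": question, "salt": salt, "answer_hash": digest}


def check_answer(entry: dict, candidate: str) -> bool:
    # Hash the candidate with the published salt and compare digests.
    digest = hashlib.sha256(
        (entry["salt"] + normalize(candidate)).encode("utf-8")
    ).hexdigest()
    return hmac.compare_digest(digest, entry["answer_hash"])
```

A model's output is scored by calling check_answer on the published entry, so the benchmark can be distributed without the reference answers in plain text. A plain salted hash like this one offers little protection for short or easily guessed answers, which is one reason a real scheme would likely need something stronger.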

EvE: Exploiting Generative Priors for Radiance Field Enrichment

Karim Kassab,Antoine Schnepf,Jean-Yves Franceschi,Laurent Caraffa,Jeremie Mary,Valérie Gouet-Brunet

Compressor summary: EvE is a new method that uses generative networks to improve in-the-wild scene modeling and produce more realistic images, outperforming existing methods on novel view synthesis tasks.

Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach

Yuxin Li,Qiang Han,Mengying Yu,Yuxin Jiang,Chaikiat Yeo,Yiheng Li,Zihang Huang,Nini Liu,Hsuanhan Chen,Xiaojun Wu

Compressor summary: BEVENet is a fast and efficient 3D object detection framework that uses convolutional neural networks instead of vision transformers, making it suitable for autonomous driving applications.

Rethinking the Domain Gap in Near-infrared Face Recognition

Michail Tarasiou,Jiankang Deng,Stefanos Zafeiriou

Compressor summary: The authors propose a framework for heterogeneous face recognition that uses large neural networks pre-trained on homogeneous visible data and fine-tuned on near-infrared data, achieving state-of-the-art results.

Forecasting Trends in Food Security: a Reservoir Computing Approach

Joschka Herteux,Christoph Räth,Amine Baha,Giulia Martini,Duccio Piovani

Compressor summary: The authors present a new quantitative method to forecast food consumption levels for 60 days in four countries using data from the World Food Programme's hunger monitoring system and various models, finding Reservoir Computing as the best performer for this task.

Practical Path-based Bayesian Optimization

Jose Pablo Folch,James Odgers,Shiqiang Zhang,Robert M Lee,Behrang Shafei,David Walz,Calvin Tsay,Mark van der Wilk,Ruth Misener

Compressor summary: The paper presents an extended SnAKe algorithm that handles costs and constraints for Bayesian optimization in data-driven experimental design.

Investigating a domain adaptation approach for integrating different measurement instruments in a longitudinal clinical registry

Maren Hackenberg,Michelle Pfaffenlehner,Max Behrens,Astrid Pechmann,Janbernd Kirschner,Harald Binder

Compressor summary: The paper explores using deep learning and domain adaptation to combine different measurement instruments for assessing individuals over time, specifically in a spinal muscular atrophy registry.

Improving Plasticity in Online Continual Learning via Collaborative Learning

Maorong Wang,Nicolas Michel,Ling Xiao,Toshihiko Yamasaki

Compressor summary: The paper proposes Collaborative Continual Learning (CCL) to address the challenge of acquiring new knowledge in online learning, and introduces Distillation Chain (DC) as a novel collaborative learning scheme that improves model plasticity and performance.

Learning from One Continuous Video Stream

João Carreira,Michael King,Viorica Pătrăucean,Dilara Gokay,Cătălin Ionescu,Yi Yang,Daniel Zoran,Joseph Heyward,Carl Doersch,Yusuf Aytar,Dima Damen,Andrew Zisserman

Compressor summary: The authors present a framework for online learning from continuous video streams, addressing challenges related to high frame correlations, and demonstrate its effectiveness through experiments with pixel-to-pixel modelling and future prediction tasks.

BCN: Batch Channel Normalization for Image Classification

Afifa Khaled,Chao Li,Jia Ning,Kun He

Compressor summary: BCN is a novel normalization technique for deep learning that adapts to both channel and batch dependence, enabling higher learning rates and better performance on different tasks.

Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers

Sahar Nasirihaghighi,Negin Ghamsarian,Heinrich Husslein,Klaus Schoeffmann

Compressor summary: The paper introduces a dataset for recognizing critical events in laparoscopic gynecology videos using a hybrid transformer architecture and a frame sampling strategy that improves event recognition accuracy.

Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version)

Emma Cramer,Jonas Reiher,Sebastian Trimpe

Compressor summary: The paper proposes a metric to evaluate how well spatial autoencoders (SAEs) can track objects in images, which is important for robotic reinforcement learning (RL), and suggests three modifications to improve SAE architectures.

Less is More: Learning Reference Knowledge Using No-Reference Image Quality Assessment

Xudong Li,Jingyuan Zheng,Xiawu Zheng,Runze Hu,Enwei Zhang,Yuting Gao,Yunhang Shen,Ke Li,Yutao Liu,Pingyang Dai,Yan Zhang,Rongrong Ji

Compressor summary: The paper proposes a new method to assess image quality without reference images by learning from non-aligned references using feature distillation, achieving state-of-the-art performance.

Explainable Fraud Detection with Deep Symbolic Classification

Samantha Visbeek,Erman Acar,Floris den Hengst

Compressor summary: The paper proposes Deep Symbolic Classification (DSC), a framework that combines deep neural networks and reinforcement learning to search for explainable, transparent, and data-driven fraud detection models that can handle class imbalance without oversampling or undersampling.

The Ethics of Automating Legal Actors

Josef Valvoda,Alec Thompson,Ryan Cotterell,Simone Teufel

Compressor summary: The paper discusses the ethical challenges of using NLP models to automate the role of judges in common law systems, arguing that current models are not capable of shaping the law and even if they were, there would still be ethical concerns.

Pathway to a fully data-driven geotechnics: lessons from materials informatics

Stephen Wu,Yu Otake,Yosuke Higo,Ikumasa Yoshida

Compressor summary: The paper discusses how data-driven methods and deep learning can improve geotechnics by addressing soil complexity and promoting open science, and envisions a future where advanced computational tools revolutionize the field.

Instruction-tuning Aligns LLMs to the Human Brain

Khai Loong Aw,Syrielle Montariol,Badr AlKhamissi,Martin Schrimpf,Antoine Bosselut

Compressor summary: Instruction-tuning enhances language models' similarity to human brain activity but not behavior on a reading task.

Generative models for visualising abstract social processes: Guiding streetview image synthesis of StyleGAN2 with indices of deprivation

Aleksi Knuutila

Compressor summary: The paper applies GANs to study visual aspects of social processes in London, mapping how different areas vary in health, income, and environmental quality using image synthesis and comparing the results from three inversion techniques.

Explanatory Argument Extraction of Correct Answers in Resident Medical Exams

Iakes Goenaga,Aitziber Atutxa,Koldo Gojenola,Maite Oronoz,Rodrigo Agerri

Compressor summary: The paper presents a new Spanish dataset for extractive question answering in Evidence-Based Medicine, with explanations written by medical doctors that cover both correct and incorrect answers.

Interior Point Constrained Reinforcement Learning with Global Convergence Guarantees

Tingting Ni,Maryam Kamgarpour

Compressor summary: The paper proposes a new zeroth-order interior point method for constrained Markov decision processes that guarantees constraint satisfaction during learning and converges faster than existing algorithms.

Questioning Biases in Case Judgment Summaries: Legal Datasets or Large Language Models?

Aniket Deroy,Subhankar Maity

Compressor summary: The study examines how biases in legal dataset summaries and large language models affect justice systems and explores various types of biases, such as gender, race, crime against women, country names, and religious keywords.

Improving Unsupervised Relation Extraction by Augmenting Diverse Sentence Pairs

Qing Wang,Kang Zhou,Qiao Qiao,Yuepei Li,Qi Li

Compressor summary: The paper introduces AugURE, a method for unsupervised relation extraction that increases positive pair diversity with cross-sentence augmentation and uses margin loss instead of NCE to improve relation representation learning.

Domain Adaptive Imitation Learning with Visual Observation

Sungho Choi,Seungyul Han,Woojun Kim,Jongseong Chae,Whiyoung Jung,Youngchul Sung

Compressor summary: The paper presents a novel framework for cross-domain imitation learning with visual observation that extracts domain-independent features to improve performance in practical scenarios.

Target-agnostic Source-free Domain Adaptation for Regression Tasks

Tianlang He,Zhiqiu Xia,Jierun Chen,Haoliang Li,S. -H. Gary Chan

Compressor summary: TASFAR is a new target-agnostic source-free domain adaptation method for regression tasks that uses prediction confidence to estimate a label density map and calibrate the source model on the target domain, achieving superior performance compared to existing approaches.

Trained MT Metrics Learn to Cope with Machine-translated References

Jannis Vamvas,Tobias Domhan,Sony Trenous,Rico Sennrich,Eva Hasler

Compressor summary: The paper compares two metrics for machine translation evaluation and finds that the trained one is more robust to machine-translated references, indicating unintended positive effects of metric training.

LiDAR-based curb detection for ground truth annotation in automated driving validation

Jose Luis Apellániz,Mikel García,Nerea Aranjuelo,Javier Barandiarán,Marcos Nieto

Compressor summary: The paper presents a method for detecting 3D curbs from LiDAR point clouds and shows how it reduces manual annotation time by 50.99%.

DeepDR: Deep Structure-Aware RGB-D Inpainting for Diminished Reality

Christina Gsaxner,Shohei Mori,Dieter Schmalstieg,Jan Egger,Gerhard Paar,Werner Bailer,Denis Kalkofen

Compressor summary: DeepDR is a new RGB-D inpainting framework that can remove real objects from scenes and generate coherent structure and 3D geometry, achieving high quality results at real-time speeds.

SurreyAI 2023 Submission for the Quality Estimation Shared Task

Archchana Sindhujan,Diptesh Kanojia,Constantin Orasan,Tharindu Ranasinghe

Compressor summary: The paper describes an approach that uses autoencoder pre-trained language models within the MonoTransQuest architecture to assess translation quality without reference, and shows that MonoTQ-InfoXLM-large performs best among the tested models.

Spatio-Temporal-Decoupled Masked Pre-training for Traffic Forecasting

Haotian Gao,Renhe Jiang,Zheng Dong,Jinliang Deng,Xuan Song

Compressor summary: The paper introduces a new method called STD-MAE that uses masked autoencoders to learn complex spatio-temporal patterns in traffic data and improve forecasting performance.

Summarization-based Data Augmentation for Document Classification

Yueguan Wang,Naoki Yoshinaga

Compressor summary: SUMMaug is a data augmentation technique that uses summarization to generate pseudo examples for document classification tasks, improving robustness and accuracy.

On the Out-Of-Distribution Robustness of Self-Supervised Representation Learning for Phonocardiogram Signals

Aristotelis Ballas,Vasileios Papapanagiotou,Christos Diou

Compressor summary: The authors propose contrastive self-supervised learning (SSL) to improve deep-learning models' effectiveness and robustness in detecting abnormalities in phonocardiogram signals using audio-based augmentations, addressing the scarcity of labeled data in medicine.

Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras

Mohammad Altillawi,Zador Pataki,Shile Li,Ziyuan Liu

Compressor summary: The authors propose a novel approach that uses relative spatial and temporal geometric constraints to train a deep neural network for camera localization in robotics and AR/VR applications, achieving better performance with very limited ground-truth data.

Explainable AI in Diagnosing and Anticipating Leukemia Using Transfer Learning Method

Wahidul Hasan Abir,Md. Fahim Uddin,Faria Rahman Khanam,Mohammad Monirujjaman Khan

Compressor summary: The paper proposes an automated detection method for Acute Lymphoblastic Leukemia using deep learning models and explainable artificial intelligence to improve accuracy and efficiency in diagnosis.

REDUCR: Robust Data Downsampling Using Class Priority Reweighting

William Bankes,George Hughes,Ilija Bogunovic,Zi Wang

Compressor summary: REDUCR is a data downsampling method that uses class priority reweighting to reduce training costs and improve worst-class generalization performance in image and text classification tasks.

Backbone-based Dynamic Graph Spatio-Temporal Network for Epidemic Forecasting

Junkai Mao,Yuexing Han,Gouhei Tanaka,Bing Wang

Compressor summary: The proposed Backbone-based Dynamic Graph Spatio-Temporal Network (BDGSTN) is a novel deep learning model that combines static backbone graphs with temporal models for accurate epidemic forecasting, overcoming limitations of recurrent structures with lower complexity and higher efficiency.

MultiView Independent Component Analysis with Delays

Ambroise Heurtebise,Pierre Ablin,Alexandre Gramfort

Compressor summary: MVICAD is an improved MultiView ICA algorithm that accounts for source delays and better separates sources in observed signals, applicable to neuroscience data analysis.

Japanese Tort-case Dataset for Rationale-supported Legal Judgment Prediction

Hiroaki Yamada,Takenobu Tokunaga,Ryutaro Ohara,Akira Tokutsu,Keisuke Takeshita,Mihoko Sumida

Compressor summary: The paper introduces JTD, a novel dataset for Japanese Legal Judgment Prediction with two tasks: tort prediction and rationale extraction, which requires identifying court-accepted arguments from party allegations.

Interpretable Meta-Learning of Physical Systems

Matthieu Blanke,Marc Lelarge

Compressor summary: The authors propose a simpler learning model for multi-environment generalization in machine learning, which can identify the physical parameters of the system and enable interpretable learning while having competitive performance and low computational cost.

A Bayesian approach for prompt optimization in pre-trained language models

Antonio Sabbatella,Andrea Ponti,Antonio Candelieri,Ilaria Giordani,Francesco Archetti

Compressor summary: The paper proposes a Bayesian optimization method for discrete prompt tuning in classification tasks, which can efficiently search for optimal token sequences without relying on large language models.

Unfolder: Fast localization and image rectification of a document with a crease from folding in half

A. M. Ershov,D. V. Tropin,E. E. Limonova,D. P. Nikolaev,V. V. Arlazarov

Compressor summary: Unfolder is a novel algorithm that rectifies images of documents with a crease from folding in half, outperforming advanced neural network methods and having fast runtime on smartphones.

Learning Unorthogonalized Matrices for Rotation Estimation

Kerui Gu,Zhihao Li,Shiyong Liu,Jianzhuang Liu,Songcen Xu,Youliang Yan,Michael Bi Mi,Kenji Kawaguchi,Angela Yao

Compressor summary: The paper discusses how removing orthogonalization from rotation matrices improves training efficiency and leads to better results in 3D computer vision tasks like human pose estimation.

Meta-Diversity Search in Complex Systems, A Recipe for Artificial Open-Endedness ?

Mayalen Etcheverry,Bert Wang-Chak Chan,Clément Moulin-Frier,Pierre-Yves Oudeyer

Compressor summary: The paper presents a framework for generating endless complex artifacts in Minecraft using a complex system and a meta-diversity search algorithm, which learns to discover diverse patterns and seek novel sources of diversity.

An Encoding Framework for Binarized Images using HyperDimensional Computing

Laura Smets,Werner Van Leekwijck,Ing Jyh Tsang,Steven Latré

Compressor summary: Hyperdimensional Computing (HDC) is a brain-inspired, lightweight machine learning method that performs well in image classification tasks with a novel encoding approach that preserves pattern similarity and improves accuracy and robustness.

Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence

Yajie Liu,Pu Ge,Haoxiang Ma,Shichao Fan,Qingjie Liu,Di Huang,Yunhong Wang

Compressor summary: The paper proposes a novel RIS method that improves generalization by using a prompt to handle linguistic style changes and a multi-modal fusion module to leverage spatial relations, achieving consistent gains on various datasets.

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Zehao Zhu,Zhiwen Fan,Yifan Jiang,Zhangyang Wang

Compressor summary: The paper proposes FSGS, a few-shot view synthesis framework that uses 3D Gaussian Splatting to generate real-time and photo-realistic views from as few as three training images while accurately filling in sparse scene details.

Dolphins: Multimodal Language Model for Driving

Yingzi Ma,Yulong Cao,Jiachen Sun,Marco Pavone,Chaowei Xiao

Compressor summary: Dolphins is a vision-language model that can process multimodal inputs and generate outputs for various autonomous driving tasks by using Grounded Chain of Thought, an open-source pretrained model, and human-like capabilities.

Enhancing Image Captioning with Neural Models

Pooja Bhatnagar,Sai Mrunaal,Sachin Kamnure

Compressor summary: This research compares different neural architectures for image captioning, proposes a new quality metric, and highlights the importance of data refinement and hyperparameter optimization.

PEFTDebias : Capturing debiasing information using PEFTs

Sumit Agarwal,Aditya Srikanth Veerubhotla,Srijan Bansal

Compressor summary: PEFTDebias is a new method that uses parameter-efficient fine-tuning to reduce biases in foundation models by acquiring debiasing parameters and incorporating them during training.

A Low-Power Neuromorphic Approach for Efficient Eye-Tracking

Pietro Bonazzi,Sizhen Bian,Giovanni Lippolis,Yawei Li,Sadique Sheik,Michele Magno

Compressor summary: The paper presents a neuromorphic eye-tracking method using a spiking neural network model called Retina, which performs better than the latest method with less power consumption and fewer parameters.

Towards Explaining Satellite Based Poverty Predictions with Convolutional Neural Networks

Hamid Sarmadi,Thorsteinn Rögnvaldsson,Nils Roger Carlsson,Mattias Ohlsson,Ibrahim Wahab,Ola Hall

Compressor summary: The paper examines how deep convolutional neural networks predict poverty and development indicators from satellite images and identifies key features that influence their predictions.

Large-scale Vision-Language Models Learn Super Images for Efficient and High-Performance Partially Relevant Video Retrieval

Taichi Nishimura,Shota Nakada,Masayoshi Kondo

Compressor summary: The paper presents an efficient method for retrieving partially relevant videos using super images and large-scale vision-and-language models, outperforming previous methods.

SCHEME: Scalable Channel Mixer for Vision Transformers

Deepak Sridhar,Yunsheng Li,Nuno Vasconcelos

Compressor summary: The paper proposes SCHEME, a method to improve Vision Transformers by using sparse feature mixing and a channel covariance attention mechanism, leading to better performance with fewer computations.

A framework for mining lifestyle profiles through multi-dimensional and high-order mobility feature clustering

Yeshuo Shu,Gangcheng Zhang,Keyi Liu,Jintong Tang,Liyan Xu

Compressor summary: The study presents a method to extract high-order features from human mobility data and cluster users into different lifestyle profiles based on their movement patterns, time series, and place semantics.

CoLLiE: Collaborative Training of Large Language Models in an Efficient Way

Kai Lv,Shuo Zhang,Tianle Gu,Shuhao Xing,Jiawei Hong,Keyu Chen,Xiaoran Liu,Yuqing Yang,Honglin Guo,Tengxiao Liu,Yu Sun,Qipeng Guo,Hang Yan,Xipeng Qiu

Compressor summary: CoLLiE is an efficient library that enables collaborative training of large language models with various optimizers and fine-tuning methods, offering efficiency, ease of use, and customization.

A Causality-Aware Pattern Mining Scheme for Group Activity Recognition in a Pervasive Sensor Space

Hyunju Kim,Heesuk Son,Dongman Lee

Compressor summary: The paper proposes an efficient group activity recognition scheme using causality patterns extracted from pervasive sensor data without user identification, achieving high accuracy and low runtime overhead in real environments.

VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video Internet of Things

Yaoyao Zhong,Mengshi Qi,Rui Wang,Yuhan Qiu,Yang Zhang,Huadong Ma

Compressor summary: The paper introduces VIoTGPT, a framework that uses large language models to interact with humans, query knowledge from videos, and invoke vision models for complex tasks in the Video Internet of Things.

GFN-SR: Symbolic Regression with Generative Flow Networks

Sida Li,Ioana Marinescu,Sebastian Musslick

Compressor summary: The paper proposes a new method (GFN-SR) for symbolic regression using deep learning and stochastic policy learning, which performs better than existing methods in noisy data scenarios.

Study and Survey on Gesture Recognition Systems

Kshitij Deshpande,Varad Mashalkar,Kaustubh Mhaisekar,Amaan Naikwadi,Archana Ghotkar

Compressor summary: This paper surveys the application, methodology, data sources, and challenges of gesture recognition systems in various sectors and compares different techniques for capturing gestures.

LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices

Junchen Zhao,Yurun Song,Simeng Liu,Ian G. Harris,Sangeetha Abdu Jyothi

Compressor summary: LinguaLinked is a system that enables efficient distributed inference of large language models on mobile devices by optimizing model assignment, data transmission, and runtime load balancing.

Enhancing Explainability in Mobility Data Science through a combination of methods

Georgios Makridis,Vasileios Koukos,Georgios Fatouros,Dimosthenis Kyriazis

Compressor summary: The paragraph introduces a comprehensive framework that combines various XAI techniques to interpret models trained on trajectory data and improve the understanding of model decisions for different user demographics.

Optimal Sample Complexity of Contrastive Learning

Noga Alon,Dmitrii Avdiukhin,Dor Elboim,Orr Fischer,Grigory Yaroslavtsev

Compressor summary: The paper studies how many labeled examples are needed to learn good representations using contrastive learning, and gives optimal bounds for various distance functions.

SynFundus: Generating a synthetic fundus images dataset with millions of samples and multi-disease annotations

Fangxin Shang,Jie Fu,Yehui Yang,Lei Ma

Compressor summary: The SynFundus-1M dataset provides over 1 million realistic synthetic retinal fundus images and annotations for medical imaging research, overcoming the challenge of privacy restrictions and outperforming existing methods.

Text-Guided 3D Face Synthesis -- From Generation to Editing

Yunjie Wu,Yapeng Meng,Zhipeng Hu,Lincheng Li,Haoqian Wu,Kun Zhou,Weiwei Xu,Xin Yu

Compressor summary: The paper presents a text-guided framework for generating and editing 3D faces using geometry-texture decoupling and diffusion models, with improved quality and consistency.

Streaming Bayesian Modeling for predicting Fat-Tailed Customer Lifetime Value

Alexey V. Calabourdin,Konstantin A. Aksenov

Compressor summary: The authors present an online learning method for hierarchical Bayesian models, a generalized fat-tailed LTV model, and its application to commercial LTV data.

Benchmarking Multi-Domain Active Learning on Image Classification

Jiayi Li,Rohan Taori,Tatsunori B. Hashimoto

Compressor summary: Active learning's effectiveness on large real-world datasets is underexplored; existing research mostly ignores multi-domain data, which our new benchmark and dataset aim to address.

Dancing with Images: Video Distillation via Static-Dynamic Disentanglement

Ziyu Wang,Yue Xu,Cewu Lu,Yong-Lu Li

Compressor summary: The authors present a new method for efficient machine learning with videos by disentangling static and dynamic information using still images and a learnable memory block.

Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning

Shaohua Dong,Yunhe Feng,Qing Yang,Yan Huang,Dongfang Liu,Heng Fan

Compressor summary: The paper introduces DPLNet, a simple and efficient network for multimodal semantic segmentation that adapts a frozen pre-trained RGB model with two prompt learning modules.

Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Yefan Zhou,Tianyu Pang,Keqin Liu,Charles H. Martin,Michael W. Mahoney,Yaoqing Yang

Compressor summary: TempBalance is a layer-wise learning rate method based on Heavy-Tailed Self-Regularization Theory, which improves performance in neural network training by balancing temperature across layers.

On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs

Pei-Chi Lo,Yi-Hang Tsai,Ee-Peng Lim,San-Yih Hwang

Compressor summary: The paper studies how large language models use their pre-trained knowledge graphs for reasoning tasks and identifies two types of hallucinations that may occur.

Manipulating the Label Space for In-Context Classification

Haokun Chen,Xu Yang,Yuhang Huang,Zihan Wu,Jing Wang,Xin Geng

Compressor summary: The paper proposes two strategies to improve In-context Learning for Vision-Language Models by manipulating the label space of in-context examples, leading to better classification performance on various datasets.

The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP

Julian Michael

Compressor summary: The author proposes a method to develop scalable theories of linguistic structure using machine learning and Question-Answer driven Semantic Role Labeling, aiming to contribute to intelligible AI systems.

RTQ: Rethinking Video-language Understanding Based on Image-text Model

Xiao Wang,Yaoyu Li,Tian Gan,Zheng Zhang,Jingjing Lv,Liqiang Nie

Compressor summary: The RTQ framework tackles challenges in video-language understanding by refining information, modeling temporal relations, and querying task-specific details, achieving high performance without pre-training.

OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline

Xianda Guo,Juntao Lu,Chenming Zhang,Yiqi Wang,Yiqun Duan,Tian Yang,Zheng Zhu,Long Chen

Compressor summary: The paper introduces OpenStereo, a comprehensive and efficient stereo matching toolbox with over 12 network models, and evaluates its performance on the SceneFlow dataset.

Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk

Dohyeong Kim,Songhwai Oh

Compressor summary: The paper presents off-policy TRC, an RL method with CVaR constraints that uses surrogate functions to reduce estimation error and adapts trust-region constraint to ensure policy stability in complex environments.

Hypergraph Node Representation Learning with One-Stage Message Passing

Shilin Qu,Weiqing Wang,Yuan-Fang Li,Xin Zhou,Fajie Yuan

Compressor summary: The paragraph describes a novel one-stage message passing paradigm for hypergraph node representation learning that combines Transformers and the hypergraph Laplacian to model both global and local information, achieving state-of-the-art results on semi-supervised hypernode classification.

Learning Anatomically Consistent Embedding for Chest Radiography

Ziyu Zhou,Haozhe Luo,Jiaxuan Pang,Xiaowei Ding,Michael Gotway,Jianming Liang

Compressor summary: PEAC is a new self-supervised learning approach for medical images that leverages anatomical consistency to improve performance and interpretability in various downstream tasks.

Green Edge AI: A Contemporary Survey

Yuyi Mao,Xianghao Yu,Kaibin Huang,Ying-Jun Angela Zhang,Jun Zhang

Compressor summary: The survey discusses how AI is becoming essential across industries, and how its resource-intensive, data-hungry nature poses challenges for edge AI on wireless networks near end-user devices, calling for an energy-conscious approach.

Matching Weak Informative Ontologies

Peng Wang

Compressor summary: The paper proposes a method for matching weak informative ontologies (WIOs) using semantic subgraphs and a similarity propagation model that balances efficiency and quality in ontology matching.

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

Gongye Liu,Menghan Xia,Yong Zhang,Haoxin Chen,Jinbo Xing,Xintao Wang,Yujiu Yang,Ying Shan

Compressor summary: StyleCrafter is a method that improves text-to-video models to generate diverse, stylized videos by using a style control adapter trained with style-rich images and a decoupling learning strategy.

Agent-OM: Leveraging Large Language Models for Ontology Matching

Zhangcheng Qiang,Weiqing Wang,Kerry Taylor

Compressor summary: The paper introduces a novel agent-powered language model approach for ontology matching, which improves performance on complex and few-shot tasks compared to existing systems.

Improving Efficiency of DNN-based Relocalization Module for Autonomous Driving with Server-side Computing

Dengbo Li,Jieren Cheng,Boyi Liu

Compressor summary: The paper proposes a more efficient DNN-based relocalization module for autonomous driving that offloads part of the computation to a remote server.

Improving Normalization with the James-Stein Estimator

Seyedalireza Khoshsirat,Chandra Kambhamettu

Compressor summary: The paper introduces a novel method to use the James-Stein estimator in deep learning normalization layers, which improves mean and variance estimation and enhances computer vision task performance without extra computational cost.

Segment Anything Model-guided Collaborative Learning Network for Scribble-supervised Polyp Segmentation

Yiming Zhao,Tao Zhou,Yunqi Gu,Yi Zhou,Yizhe Zhang,Ye Wu,Huazhu Fu

Compressor summary: The paper proposes SAM-CLNet, a novel method for scribble-supervised polyp segmentation that uses a collaborative learning process between the segmentation network and SAM to boost performance, outperforming existing methods.

3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation

Zidu Wang,Xiangyu Zhu,Tianshuo Zhang,Baiqin Wang,Zhen Lei

Compressor summary: The paper introduces Part Re-projection Distance Loss (PRDL), a method that uses facial part segmentation geometry to improve 3D face reconstruction with extreme expressions, outperforming renderer-based methods in various experiments.

A knowledge-based data-driven (KBDD) framework for all-day identification of cloud types using satellite remote sensing

Longfeng Nie,Yuntian Chen,Mengge Du,Changqi Sun,Dongxiao Zhang

Compressor summary: The text describes a new cloud-type identification system called CldNet that uses satellite data and improves the accuracy of identifying different cloud types.

Developmental Pretraining (DPT) for Image Classification Networks

Niranjan Rajesh,Debayan Gupta

Compressor summary: DPT is a curriculum-based pre-training approach for deep neural networks that teaches basic features like edges and shapes, inspired by human visual development, to address data scarcity issues in object recognition tasks.

Towards Aligned Canonical Correlation Analysis: Preliminary Formulation and Proof-of-Concept Results

Biqian Cheng,Evangelos E. Papalexakis,Jia Chen

Compressor summary: The proposed ACCA method aligns multiple data perspectives and embeds them in a correlated latent space using an iterative approach.

PsyAttention: Psychological Attention Model for Personality Detection

Baohua Zhang,Yongyi Huang,Wenyao Cui,Huaping Zhang,Jianyun Shang

Compressor summary: The paper proposes PsyAttention, a method for personality detection that adapts different psychological models, encodes features more effectively, and achieves higher accuracy than existing methods.

SEPSIS: I Can Catch Your Lies -- A New Paradigm for Deception Detection

Anku Rani,Dwip Dalal,Shreya Gautam,Pankaj Gupta,Vinija Jain,Aman Chadha,Amit Sheth,Amitava Das

Compressor summary: This study proposes a novel framework using NLP techniques to detect lies of omission in deception, and analyzes their relationship with propaganda techniques.

Learning to forecast diagnostic parameters using pre-trained weather embedding

Peetak P. Mitra,Vivek Ramavajjala

Compressor summary: The paper proposes a two-stage method that enables adding new diagnostic variables to a weather prediction model without retraining the entire model, using an autoencoder to embed prognostic variables into a latent space and then training downstream models on these representations.

Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach

Xingqiu He,Chaoqun You,Tony Q. S. Quek

Compressor summary: The text discusses a new definition of Age of Information for mobile edge computing applications, which can be minimized using reinforcement learning algorithms with post-decision states to improve performance and efficiency.

Text Attribute Control via Closed-Loop Disentanglement

Lei Sha,Thomas Lukasiewicz

Compressor summary: The paper proposes a semi-supervised contrastive learning method for disentangling attributes in text without changing content, which improves on previous methods by using a closed-loop process and reducing computation cost.

Automating Continual Learning

Kazuki Irie,Róbert Csordás,Jürgen Schmidhuber

Compressor summary: The text proposes Automated Continual Learning (ACL), a method that trains neural networks to meta-learn their own algorithms for preventing catastrophic forgetting in changing environments, and shows its effectiveness on various image classification tasks.

Towards Clinical Prediction with Transparency: An Explainable AI Approach to Survival Modelling in Residential Aged Care

Teo Susnjak,Elise Griffin,Mitchell McCutcheon,Kathleen Potter

Compressor summary: The researchers developed an interpretable machine learning survival model for residents of aged care facilities, which predicts 6-month survival probabilities from factors such as age, gender, and health status.

Adaptability of Computer Vision at the Tactical Edge: Addressing Environmental Uncertainty

Hayden Moore

Compressor summary: The paper proposes synchronizing robust data operations and model fine-tuning driven by uncertainty quantification (UQ) to improve adaptability of computer vision (CV) systems in command and control (C2) at the tactical edge.

Academic competitions

Hugo Jair Escalante,Aleksandra Kruchinina

Compressor summary: Academic competitions in machine learning and related fields advance research, highlight specific topics and problems, and promote diversity in participation and access to research.

Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration

Viraj Mehta,Vikramjeet Das,Ojash Neopane,Yijia Dai,Ilija Bogunovic,Jeff Schneider,Willie Neiswanger

Compressor summary: The paper proposes an algorithm that optimizes contextual choice for human feedback in reinforcement learning, improving performance and reducing sample cost.