arxiv compressed, 2024-01-18

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-18, generated by the compressor, my personal LLM-based project.


GARField: Group Anything with Radiance Fields

Chung Min Kim,Mingxuan Wu,Justin Kerr,Ken Goldberg,Matthew Tancik,Angjoo Kanazawa

http://arxiv.org/abs/2401.09419v1

Compressor summary: GARField is a method to decompose 3D scenes into groups based on posed images, using scale and physical features to handle ambiguity and create multi-view consistent hierarchies.


Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Lianghui Zhu,Bencheng Liao,Qian Zhang,Xinlong Wang,Wenyu Liu,Xinggang Wang

http://arxiv.org/abs/2401.09417v1

Compressor summary: The paper proposes Vim, a new generic vision backbone based on bidirectional Mamba blocks, which improves efficiency and performance compared to existing vision transformers like DeiT for image classification, object detection, and semantic segmentation tasks.


TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

Yu-Ying Yeh,Jia-Bin Huang,Changil Kim,Lei Xiao,Thu Nguyen-Phuoc,Numair Khan,Cheng Zhang,Manmohan Chandraker,Carl S Marshall,Zhao Dong,Zhengqin Li

http://arxiv.org/abs/2401.09416v1

Compressor summary: TextureDreamer is a novel image-guided texture synthesis method that can transfer detailed textures from real-world environments to any 3D object using only a few casually captured images, outperforming previous methods.


Vlogger: Make Your Dream A Vlog

Shaobin Zhuang,Kunchang Li,Xinyuan Chen,Yaohui Wang,Ziwei Liu,Yu Qiao,Yali Wang

http://arxiv.org/abs/2401.09414v1

Compressor summary: Vlogger is an AI system that uses a large language model as director to generate minute-long vlogs with diverse scenes, by coordinating various foundation models for scripting, acting, videography, and voice acting.


Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text

Mazal Bethany,Brandon Wherry,Emet Bethany,Nishant Vishwamitra,Peyman Najafirad

http://arxiv.org/abs/2401.09407v1

Compressor summary: The paper introduces T5LLMCipher, a system that detects machine-generated text in real-world scenarios by combining a pretrained T5 encoder with LLM embedding sub-clustering, achieving better generalization and attribution than existing methods.


Stuck in the Quicksand of Numeracy, Far from AGI Summit: Evaluating LLMs' Mathematical Competency through Ontology-guided Perturbations

Pengfei Hong,Deepanway Ghosal,Navonil Majumder,Somak Aditya,Rada Mihalcea,Soujanya Poria

http://arxiv.org/abs/2401.09395v1

Compressor summary: The authors develop a method to create a dataset of perturbed math problems to test the limits of large language models' mathematical reasoning skills, and find that current models perform worse on these questions, indicating a lack of robustness in their abilities.


Tri$^{2}$-plane: Volumetric Avatar Reconstruction with Feature Pyramid

Luchuan Song,Pinxin Liu,Lele Chen,Celong Liu,Chenliang Xu

http://arxiv.org/abs/2401.09386v1

Compressor summary: The Tri$^2$-plane approach improves monocular photo-realistic volumetric head avatar reconstruction by using multiple tri-planes for detail enhancement and a camera-based sliding window method for robustness.


Unlocking Unlabeled Data: Ensemble Learning with the Hui-Walter Paradigm for Performance Estimation in Online and Static Settings

Kevin Slote,Elaine Lee

http://arxiv.org/abs/2401.09376v1

Compressor summary: The authors propose a new method to estimate performance metrics in machine learning without ground truth, using latent classes and Gibbs sampling from the Hui-Walter paradigm.


Efficient slot labelling

Vladimir Vlasov

http://arxiv.org/abs/2401.09343v1

Compressor summary: A lightweight slot labelling method outperforms heavy PLMs in finding important arguments in user turns, requiring fewer resources and being more suitable for real-world applications.


SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

Baoxiong Jia,Yixin Chen,Huangyue Yu,Yan Wang,Xuesong Niu,Tengyu Liu,Qing Li,Siyuan Huang

http://arxiv.org/abs/2401.09340v1

Compressor summary: The paper introduces SceneVerse, a large-scale 3D indoor scene dataset, and GPS, a pre-training framework for 3D vision-language learning that addresses challenges in grounding language with 3D scenes.


Large Language Models Are Neurosymbolic Reasoners

Meng Fang,Shilong Deng,Yudi Zhang,Zijing Shi,Ling Chen,Mykola Pechenizkiy,Jun Wang

http://arxiv.org/abs/2401.09334v1

Compressor summary: The paper explores using Large Language Models to enhance symbolic reasoning in text-based games, achieving high performance in various symbolic tasks.


Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora

Diana Davila Gordillo,Joan Timoneda,Sebastian Vallejo Vera

http://arxiv.org/abs/2401.09333v1

Compressor summary: Key points:
- The article presents a guideline to identify and classify different forms of racist discourse in large text corpora using XLM-R, a cross-lingual model.
- The approach considers the context and time of interest for each form of racism.
- The approach outperforms other methods and is illustrated with tweets about the Ecuadorian indígena community.

Summary: The article introduces a method to classify various forms of racist language in large texts using a cross-lingual model that adapts to different contexts and times, and shows its effectiveness on tweets about an indigenous group.


Event-Based Visual Odometry on Non-Holonomic Ground Vehicles

Wanting Xu,Si'ao Zhang,Li Cui,Xin Peng,Laurent Kneip

http://arxiv.org/abs/2401.09331v1

Compressor summary: Key points:
- Event-based motion estimation is hard but possible for planar ground vehicles using Ackermann steering model and constrained non-holonomic motion model
- Single feature n-linearities are extended to quasi time-continuous event-tracks via Taylor expansions
- Histogram voting is used for robust averaging over multiple event tracks
- Algorithm achieves accurate and robust estimates comparable to frame-based sensors in normal conditions and outperforms traditional alternatives in challenging illumination scenarios

Summary: The authors propose a method for event-based visual odometry on planar ground vehicles using Taylor expansions, histogram voting, and the Ackermann steering model, which performs similarly or better than frame-based sensors.


Online Stability Improvement of Groebner Basis Solvers using Deep Learning

Wanting Xu,Lan Hu,Manolis C. Tsakiris,Laurent Kneip

http://arxiv.org/abs/2401.09328v1

Compressor summary: The paper presents two contributions: showing that variable reordering affects accuracy in geometric vision problems and proposing a method to train a classifier for selecting the best solver using only the original coefficients.


Siamese Meets Diffusion Network: SMDNet for Enhanced Change Detection in High-Resolution RS Imagery

Jia Jia,Geunho Lee,Zhibo Wang,Lyu Zhi,Yuchu He

http://arxiv.org/abs/2401.09325v1

Compressor summary: SMDNet (Siamese Meets Diffusion Network) is a new deep learning network that combines a Siamese network with a diffusion model to improve edge change detection accuracy and robustness in remote sensing images.


BENO: Boundary-embedded Neural Operators for Elliptic PDEs

Haixin Wang,Jiaxin Li,Anubhav Dwivedi,Kentaro Hara,Tailin Wu

http://arxiv.org/abs/2401.09323v1

Compressor summary: Boundary-Embedded Neural Operators (BENO) is a novel neural operator architecture that solves elliptic PDEs with complex geometries and inhomogeneous boundary values more efficiently by using Graph Neural Networks (GNNs) and a Transformer encoder.


Tight Fusion of Events and Inertial Measurements for Direct Velocity Estimation

Wanting Xu,Xin Peng,Laurent Kneip

http://arxiv.org/abs/2401.09296v1

Compressor summary: The paper proposes a novel method for estimating camera velocities using a dynamic vision sensor and trifocal tensor geometry, improving stability and accuracy in highly dynamic scenarios.


Adaptive Regret for Bandits Made Possible: Two Queries Suffice

Zhou Lu,Qiuyi Zhang,Xinyi Chen,Fred Zhang,David Woodruff,Elad Hazan

http://arxiv.org/abs/2401.09278v1

Compressor summary: The paper presents bandit algorithms that achieve near-optimal adaptive regret in fast-changing environments with minimal queries per round, and shows their effectiveness in various applications.


PixelDINO: Semi-Supervised Semantic Segmentation for Detecting Permafrost Disturbances

Konrad Heidler,Ingmar Nitze,Guido Grosse,Xiao Xiang Zhu

http://arxiv.org/abs/2401.09271v1

Compressor summary: The text describes a semi-supervised learning approach called PixelDINO that uses deep learning to detect retrogressive thaw slumps in Arctic permafrost using remote sensing data and improves generalization across the region.


P$^2$OT: Progressive Partial Optimal Transport for Deep Imbalanced Clustering

Chuyu Zhang,Hui Ren,Xuming He

http://arxiv.org/abs/2401.09266v1

Compressor summary: The paper proposes a novel framework for deep imbalanced clustering that uses pseudo-labeling based on progressive partial optimal transport to handle imbalanced class distributions and achieves good performance on various datasets.


MSHyper: Multi-Scale Hypergraph Transformer for Long-Range Time Series Forecasting

Zongjiang Shang,Ling Chen

http://arxiv.org/abs/2401.09261v1

Compressor summary: MSHyper is a framework for modeling high-order interactions between temporal patterns of different scales for long-range time series forecasting, achieving state-of-the-art results.


A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization

Feiyang Ye,Baijiong Lin,Xiaofeng Cao,Yu Zhang,Ivor Tsang

http://arxiv.org/abs/2401.09257v1

Compressor summary: The paper presents FORUM, a novel first-order multi-gradient method for solving Multi-Objective Bi-Level Optimization (MOBLO) problems efficiently and effectively in different learning scenarios.


3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey

Thiago Lopes Trugillo da Silveira,Paulo Gamarra Lessa Pinto,Jeffri Erwin Murrugarra Llerena,Claudio Rosito Jung

http://arxiv.org/abs/2401.09252v1

Compressor summary: This paper reviews various methods for estimating 3D scene geometry from omnidirectional images, covering monocular, stereo, and multi-view approaches.


Bridging the Gap Between General and Down-Closed Convex Sets in Submodular Maximization

Loay Mualem,Murad Tukan,Moran Feldman

http://arxiv.org/abs/2401.09251v1

Compressor summary: This paper proposes new algorithms for submodular maximization that smoothly interpolate between general and down-closed convex set constraints, unlike previous methods that relied on the minimum $\ell_\infty$ norm.


Learning from Emotions, Demographic Information and Implicit User Feedback in Task-Oriented Document-Grounded Dialogues

Dominic Petrak,Thy Thy Tran,Iryna Gurevych

http://arxiv.org/abs/2401.09248v1

Compressor summary: The authors introduce FEDI, a dialogue dataset for task-oriented document-grounded dialogues with demographic information, user emotions and implicit feedback annotations, which improves system performance and user acceptance.


Uncertainty estimates for semantic segmentation: providing enhanced reliability for automated motor claims handling

Jan Küchler,Daniel Kröll,Sebastian Schoenen,Andreas Witte

http://arxiv.org/abs/2401.09245v1

Compressor summary: The paper proposes a meta-classification model to improve the reliability and quality of car body part segmentation in motor claims handling using low-quality photos.


Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges

Aiqi Jiang,Arkaitz Zubiaga

http://arxiv.org/abs/2401.09244v1

Compressor summary: This paper explores cross-lingual transfer learning techniques for detecting offensive language in social media, analysing 67 papers and summarising three main transfer approaches.


DaFoEs: Mixing Datasets towards the generalization of vision-state deep-learning Force Estimation in Minimally Invasive Robotic Surgery

Mikel De Iturrate Reyzabal,Mingcong Chen,Wei Huang,Sebastien Ourselin,Hongbin Liu

http://arxiv.org/abs/2401.09239v1

Compressor summary: Key points:
- The paper introduces a new dataset and a pipeline to train deep neural models for contact force prediction in MIRS
- It compares single dataset training with dataset mixing and shows that mixing improves translation across different domains
- It also shows that increasing data volume boosts the performance of transformers for force estimation

Summary: The paper presents a new vision-haptic dataset and a method to mix different datasets for contact force prediction in MIRS, which improves generalization and performance compared to single dataset training.


Classification and Reconstruction Processes in Deep Predictive Coding Networks: Antagonists or Allies?

Jan Rathjens,Laurenz Wiskott

http://arxiv.org/abs/2401.09237v1

Compressor summary: The study analyzes how classification and reconstruction processes interact in deep learning architectures inspired by predictive coding and finds a trade-off effect between them in shared latent layers, which can be alleviated by expanding dimensions or increasing network complexity.


A Characterization Theorem for Equivariant Networks with Point-wise Activations

Marco Pacini,Xiaowen Dong,Bruno Lepri,Gabriele Santin

http://arxiv.org/abs/2401.09235v1

Compressor summary: The paper presents a theorem on equivariant neural networks, discusses their practical relevance, and characterizes different types of such networks.


Dynamic Relation Transformer for Contextual Text Block Detection

Jiawei Wang,Shunchi Zhang,Kai Hu,Chixiang Ma,Zhuoyao Zhong,Lei Sun,Qiang Huo

http://arxiv.org/abs/2401.09232v1

Compressor summary: The text introduces a new graph generation framework for detecting contextual text blocks in natural scenes, using DQ-DETR for node detection and DRFormer for edge generation.


UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

Kai Hu,Jiawei Wang,Weihong Lin,Zhuoyao Zhong,Lei Sun,Qiang Huo

http://arxiv.org/abs/2401.09220v1

Compressor summary: UniVIE is a new model for Visual Information Extraction from form-like documents that uses a unified relation prediction approach to handle hierarchical structures.


Space and Time Continuous Physics Simulation From Partial Observations

Janny Steeven,Nadri Madiha,Digne Julie,Wolf Christian

http://arxiv.org/abs/2401.09198v1

Compressor summary: The authors propose a data-driven method for predicting fluid dynamics in a continuous spatial and temporal domain using recurrent GNNs and spatio-temporal attention, achieving better results than existing approaches.


Training-Free Semantic Video Composition via Pre-trained Diffusion Model

Jiaqi Guo,Sitong Su,Junchen Zhu,Lianli Gao,Jingkuan Song

http://arxiv.org/abs/2401.09195v1

Compressor summary: The paper proposes a training-free method using a diffusion model to create harmonious video compositions with different foregrounds and backgrounds, handling various semantic disparities.


GNN-LoFI: a Novel Graph Neural Network through Localized Feature-based Histogram Intersection

Alessandro Bicciato,Luca Cosmo,Giorgia Minello,Luca Rossi,Andrea Torsello

http://arxiv.org/abs/2401.09193v1

Compressor summary: Key points:
- New graph neural network architecture using local feature distribution analysis
- Histogram intersection kernel for similarity measurement
- Outperforms graph kernels and graph neural networks on standard benchmarks

Summary: The paper introduces a novel graph neural network that uses histogram intersection kernel to compare feature distributions locally, and shows its superior performance on graph classification and regression tasks.


Preparing Lessons for Progressive Training on Language Models

Yu Pan,Ye Yuan,Yichun Yin,Jiaxin Shi,Zenglin Xu,Ming Zhang,Lifeng Shang,Xin Jiang,Qun Liu

http://arxiv.org/abs/2401.09192v1

Compressor summary: Apollo is a novel method to train deep Transformers faster by learning high-layer functionality during low-layer training, with techniques like low-value-prioritized sampling and weight sharing.


An Optimal Transport Approach for Computing Adversarial Training Lower Bounds in Multiclass Classification

Nicolas Garcia Trillos,Matt Jacobs,Jakwang Kim,Matthew Werenski

http://arxiv.org/abs/2401.09191v1

Compressor summary: The paper proposes computationally efficient algorithms for adversarial training in multiclass classification using multimarginal optimal transport, and shows their effectiveness on image datasets.


Exploring the Role of Convolutional Neural Networks (CNN) in Dental Radiography Segmentation: A Comprehensive Systematic Literature Review

Walid Brahmi,Imen Jdey,Fadoua Drira

http://arxiv.org/abs/2401.09190v1

Compressor summary: This paper summarizes existing studies on using deep learning techniques for dental imaging analysis to improve diagnostic accuracy and early detection of oral health issues.


Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer

Junhao Zheng,Qianli Ma,Zhen Liu,Binquan Wu,Huawen Feng

http://arxiv.org/abs/2401.09181v1

Compressor summary: Fwd-Prompt is a prompt-based method that reduces catastrophic forgetting and negative forward transfer in multimodal large language models by projecting gradients to minimize interference between tasks and reuse pre-trained knowledge.


Unsupervised Multiple Domain Translation through Controlled Disentanglement in Variational Autoencoder

Almudévar Antonio,Mariotte Théo,Ortega Alfonso,Tahon Marie

http://arxiv.org/abs/2401.09180v1

Compressor summary: Key points:
- The paper proposes a method for unsupervised translation between domains using a modified Variational Autoencoder with two disentangled latent variables.
- One latent variable is controlled by the domain, and the other one by other factors of the data.
- The approach outperforms existing methods on different vision datasets and allows better control and understanding of the latent space.

Summary: The paper presents a new Variational Autoencoder with two disentangled latent variables for unsupervised translation between domains, which improves performance and latent space interpretation over existing methods.


ADCNet: a unified framework for predicting the activity of antibody-drug conjugates

Liye Chen,Biaoshun Li,Yihao Chen,Mujie Lin,Shipeng Zhang,Chenxin Li,Yu Pang,Ling Wang

http://arxiv.org/abs/2401.09176v1

Compressor summary: Key points:
- ADCNet is a deep learning framework to help design potential ADCs
- It integrates protein and small-molecule representation learning models
- It predicts activity based on antigen, antibody, linker, payload, and DAR features
- It outperforms baseline machine learning models and shows stability and robustness
- It has an online platform and open source code

Summary: ADCNet is a deep learning framework that uses protein and small-molecule language models to predict the activity of antibody-drug conjugates based on their structures and features, and provides an online platform and open source code.


QAnswer: Towards Question Answering Search over Websites

Kunpeng Guo,Clement Defretiere,Dennis Diefenbach,Christophe Gravier,Antoine Gourru

http://arxiv.org/abs/2401.09175v1

Compressor summary: The text discusses using Question Answering technologies for website search functionality, combining knowledge graphs and free text, and evaluates their benefits and drawbacks using case studies from Wikimedia Foundation websites.


Fine-tuning Strategies for Domain Specific Question Answering under Low Annotation Budget Constraints

Kunpeng Guo,Dennis Diefenbach,Antoine Gourru,Christophe Gravier

http://arxiv.org/abs/2401.09168v1

Compressor summary: The paper proposes a better strategy for fine-tuning QA models using pre-trained language models, which significantly improves performance under low annotation budgets without extra labeling effort.


Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System

Feng Jiang,Kuang Wang,Haizhou Li

http://arxiv.org/abs/2401.09150v1

Compressor summary: MMAPIS is an open-source system that uses LLMs to preprocess, summarize, and present scientific papers effectively across various modalities and scenarios.


Continuous Piecewise-Affine Based Motion Model for Image Animation

Hexiang Wang,Fengqi Liu,Qianyu Zhou,Ran Yi,Xin Tan,Lizhuang Ma

http://arxiv.org/abs/2401.09146v1

Compressor summary: The paper proposes a new method to animate images using expressive transformations based on keypoints, and improves the semantic consistency and structure alignment between source and driving images.


SM$^3$: Self-Supervised Multi-task Modeling with Multi-view 2D Images for Articulated Objects

Haowen Wang,Zhen Zhao,Zhao Jin,Zhengping Che,Liang Qiao,Yakun Huang,Zhipeng Fan,Xiuquan Qiao,Jian Tang

http://arxiv.org/abs/2401.09133v1

Compressor summary: Key points:
- The paper proposes a self-supervised method (SM$^3$) for modeling articulated objects from multi-view RGB images before and after interaction
- SM$^3$ reconstructs 3D geometries and textures, identifies movable parts, and infers joint parameters without annotations
- The paper introduces the MMArt dataset, which covers diverse categories of articulated objects with multi-view and multi-modal data

Summary: The paper presents SM$^3$, a self-supervised method that models articulated objects from multi-view images and reconstructs their 3D structures and joints without annotations, and the MMArt dataset, which supports this approach.


Objects With Lighting: A Real-World Dataset for Evaluating Reconstruction and Rendering for Object Relighting

Benjamin Ummenhofer,Sanskar Agrawal,Rene Sepulveda,Yixing Lao,Kai Zhang,Tianhang Cheng,Stephan Richter,Shenlong Wang,German Ros

http://arxiv.org/abs/2401.09126v1

Compressor summary: The paper introduces a real-world dataset for measuring object reconstruction and rendering quality under different lighting conditions, showing that novel view synthesis is not a reliable proxy for performance evaluation.


Understanding Heterophily for Graph Neural Networks

Junfu Wang,Yuanfang Guo,Liang Yang,Yunhong Wang

http://arxiv.org/abs/2401.09125v1

Compressor summary: The paper studies how different heterophily patterns affect Graph Neural Networks, proposes Heterophilous Stochastic Block Models to incorporate graph convolution operations, and provides theoretical insights on separability gains for various heterophily patterns and node degrees.


Stream Query Denoising for Vectorized HD Map Construction

Shuo Wang,Fan Jia,Yingfei Liu,Yucheng Zhao,Zehui Chen,Tiancai Wang,Chi Zhang,Xiangyu Zhang,Feng Zhao

http://arxiv.org/abs/2401.09112v1

Compressor summary: The paper proposes a novel method, Stream Query Denoising (SQD), for improving temporal modeling in high-definition map construction for autonomous driving by denoising stream queries.


Trapped in texture bias? A large scale comparison of deep instance segmentation

Johannes Theodoridis,Jessica Hofmann,Johannes Maucher,Andreas Schilling

http://arxiv.org/abs/2401.09109v1

Compressor summary: This study evaluates the robustness of 20 instance segmentation models to out-of-distribution texture and identifies YOLACT++, SOTR, and SOLOv2 as the most robust frameworks.


RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks

Haowen Hou,F. Richard Yu

http://arxiv.org/abs/2401.09093v1

Compressor summary: The authors propose an efficient RNN-based model, RWKV-TS, for time series tasks with three features: low time complexity and memory usage, enhanced long-term sequence information capture, and high computational efficiency.


UniVG: Towards UNIfied-modal Video Generation

Ludan Ruan,Lei Tian,Chuanwei Huang,Xu Zhang,Xinyan Xiao

http://arxiv.org/abs/2401.09084v1

Compressor summary: Key points:
- The paper proposes a Unified-modal Video Generation system that handles multiple tasks across text and image modalities.
- It classifies video generation tasks into high-freedom and low-freedom categories, and uses different methods for each category to generate videos with better alignment and preservation of input conditions.
- It outperforms existing methods in FVD and human evaluations, and is comparable to a closed-source method.

Summary: The paper presents a system that generates videos from text and image inputs in various tasks, using different techniques for high-freedom and low-freedom video generation to improve quality and alignment.


Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models

Haonan Guo,Xin Su,Chen Wu,Bo Du,Liangpei Zhang,Deren Li

http://arxiv.org/abs/2401.09083v1

Compressor summary: Remote Sensing ChatGPT is an AI agent that uses a large language model to understand user requests and execute tasks involving remote sensing images, making interpretation accessible to non-experts.


What makes for a 'good' social actor? Using respect as a lens to evaluate interactions with language agents

Lize Alberts,Geoff Keeling,Amanda McCroskery

http://arxiv.org/abs/2401.09082v1

Compressor summary: The text discusses the importance of considering relational and situational factors in ensuring ethical behavior of dialogue agents based on large language models, focusing on what it means for them to be good social actors.


Code Simulation Challenges for Large Language Models

Emanuele La Malfa,Christoph Weinhuber,Orazio Torre,Fangru Lin,Anthony Cohn,Nigel Shadbolt,Michael Wooldridge

http://arxiv.org/abs/2401.09074v1

Compressor summary: The authors study how well Large Language Models can execute computer code and find that they struggle with complex programs, suggesting a new prompting method to improve their performance.


Fixed-Budget Differentially Private Best Arm Identification

Zhirui Chen,P. N. Karthik,Yeow Meng Chee,Vincent Y. F. Tan

http://arxiv.org/abs/2401.09073v1

Compressor summary: The paper proposes a policy for identifying the best arm in linear bandits with a fixed budget and differential privacy constraints, and analyzes its performance in terms of error probability bounds.


Rethinking Spectral Graph Neural Networks with Spatially Adaptive Filtering

Jingwei Guo,Kaizhu Huang,Xinping Yi,Zixian Su,Rui Zhang

http://arxiv.org/abs/2401.09071v1

Compressor summary: This paper connects spectral filtering in Graph Neural Networks (GNNs) to spatial aggregation, showing their interpretability and proposing a new Spatially Adaptive Filtering (SAF) method that improves node classification performance.


Knowledge Pyramid: A Novel Hierarchical Reasoning Structure for Generalized Knowledge Augmentation and Inference

Qinghua Huang,Yongzhen Wang

http://arxiv.org/abs/2401.09070v1

Compressor summary: This paper introduces a new inference method for knowledge graphs that uses high-level pyramidal knowledge to improve generalization and robustness in reasoning tasks.


DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning

Lixiang Han,Zhen Xiao,Zhenjiang Li

http://arxiv.org/abs/2401.09068v1

Compressor summary: DTMM is a library that efficiently deploys and executes pruned machine learning models on weak IoT devices like MCUs, using techniques for model compression, optimization, acceleration, and storage.


Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding

Depeng Li,Tianqi Wang,Junwei Chen,Qining Ren,Kenji Kawaguchi,Zhigang Zeng

http://arxiv.org/abs/2401.09067v1

Compressor summary: The paper proposes a method to reduce catastrophic forgetting in deep neural networks by overwriting layer-wise parameters and adapting decision boundaries between tasks without using extra memory or privacy-sensitive techniques.


CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding

Yunze Liu,Changxi Chen,Zifan Wang,Li Yi

http://arxiv.org/abs/2401.09057v1

Compressor summary: The paper presents CrossVideo, a self-supervised method for point cloud video understanding using cross-modal contrastive learning between point clouds and images.


Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Zike Wu,Pan Zhou,Xuanyu Yi,Xiaoding Yuan,Hanwang Zhang

http://arxiv.org/abs/2401.09050v1

Compressor summary: The paper proposes a method called Consistent3D that uses ordinary differential equations to generate more accurate and consistent 3D models from text, improving on existing text-to-3D methods.


Enhancing Lidar-based Object Detection in Adverse Weather using Offset Sequences in Time

Raphael van Kempen,Tim Rehbronn,Abin Jose,Johannes Stegmaier,Bastian Lampe,Timo Woopen,Lutz Eckstein

http://arxiv.org/abs/2401.09049v1

Compressor summary: The text investigates how to improve lidar-based object detection in adverse weather by processing sequential data samples from lidar sensors using various neural network architectures, including a novel temporal offset augmentation strategy.


Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Jonghyun Lee,Hansam Cho,Youngjoon Yoo,Seoung Bum Kim,Yonghyun Jeong

http://arxiv.org/abs/2401.09048v1

Compressor summary: The paper introduces a method to generate images by controlling the placement of objects at different depths and applying global style from multiple examples.


VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen,Yong Zhang,Xiaodong Cun,Menghan Xia,Xintao Wang,Chao Weng,Ying Shan

http://arxiv.org/abs/2401.09047v1

Compressor summary: The paper proposes a method to train video models using low-quality videos and synthesized high-quality images, resulting in better quality videos without motion degradation.


LLMs for Relational Reasoning: How Far are We?

Zhiming Li,Yushi Cao,Xiufeng Xu,Junzhe Jiang,Xu Liu,Yon Shin Teo,Shang-wei Lin,Yang Liu

http://arxiv.org/abs/2401.09042v1

Compressor summary: The text evaluates large language models' reasoning ability on an inductive logic programming benchmark, finding them to have poor performance and generalization compared to smaller neural program induction systems.


Textual Summarisation of Large Sets: Towards a General Approach

Kittipitch Kuptavanich,Ehud Reiter,Kees Van Deemter,Advaith Siddharthan

http://arxiv.org/abs/2401.09041v1

Compressor summary: The paper presents and tests a method to create brief descriptions of groups of academic papers or consumer products using rules.


Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

Tong Xie,Haoyu Li,Andrew Bai,Cho-Jui Hsieh

http://arxiv.org/abs/2401.09031v1

Compressor summary: Diffusion-TracIn and Diffusion-ReTrac are methods that help understand how neural networks learn over time by tracing model behavior back to training data and mitigating bias in influence estimation.


Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading

Dunyuan Xu,Xi Wang,Jinyue Cai,Pheng-Ann Heng

http://arxiv.org/abs/2401.09029v1

Compressor summary: The text proposes a novel method for automated brain tumor diagnosis using multiple MRI modalities and dual attention to capture semantic interdependencies, which can improve accuracy and reduce noise.


Explain Thyself Bully: Sentiment Aided Cyberbullying Detection with Explanation

Krishanu Maity,Prince Jha,Raghav Jain,Sriparna Saha,Pushpak Bhattacharyya

http://arxiv.org/abs/2401.09023v1

Compressor summary: The authors propose a new multi-task model for detecting cyberbullying in code-mixed languages, which can also explain why a post is considered bullying.


Residual Alignment: Uncovering the Mechanisms of Residual Networks

Jianing Li,Vardan Papyan

http://arxiv.org/abs/2401.09018v1

Compressor summary: The paper investigates how ResNets work by analyzing their residual blocks and identifies Residual Alignment, a process that leads to good generalization across various settings.


Inductive Models for Artificial Intelligence Systems are Insufficient without Good Explanations

Udesh Habaraduwa

http://arxiv.org/abs/2401.09011v1

Compressor summary: The paper emphasizes the need for more transparent and explainable artificial neural networks that provide insights and not just predictions.


Hybrid of DiffStride and Spectral Pooling in Convolutional Neural Networks

Sulthan Rafif,Mochamad Arfan Ravy Wahyu Pratama,Mohammad Faris Azhar,Ahmad Mustafidul Ibad,Lailil Muflikhah,Novanto Yudistira

http://arxiv.org/abs/2401.09008v1

Compressor summary: The paper proposes a CNN model with learnable stride and spectral pooling techniques to improve image classification accuracy by preserving more information.


Generalized Face Liveness Detection via De-spoofing Face Generator

Xingming Long,Shiguang Shan,Jie Zhang

http://arxiv.org/abs/2401.09006v1

Compressor summary: The paper introduces AG-FAS, a method that leverages real faces to improve face anti-spoofing models by generating "real" versions of input faces using a De-spoofing Face Generator and then extracting anomalous cues with a cross-attention transformer.


Augmenting Math Word Problems via Iterative Question Composing

Haoxiong Liu,Andrew Chi-Chih Yao

http://arxiv.org/abs/2401.09003v1

Compressor summary: The authors introduce MMIQC, a dataset that improves mathematical reasoning skills of large language models by combining web data and synthetic question-response pairs, leading to higher accuracy on math problems.


AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models

Dong Shu,Mingyu Jin,Suiyuan Zhu,Beichen Wang,Zihao Zhou,Chong Zhang,Yongfeng Zhang

http://arxiv.org/abs/2401.09002v1

Compressor summary: The study introduces two novel evaluation frameworks to measure how effective jailbreak attacks are on large language models and develops a comprehensive dataset as a benchmark for future research.


Continuous Time Continuous Space Homeostatic Reinforcement Learning (CTCS-HRRL): Towards Biological Self-Autonomous Agent

Hugo Laurencon,Yesoda Bhargava,Riddhi Zantye,Charbel-Raphaël Ségerie,Johann Lussange,Veeky Baths,Boris Gutkin

http://arxiv.org/abs/2401.08999v1

Compressor summary: The paper proposes a continuous-time, continuous-space model (CTCS-HRRL) to study how living beings maintain their internal balance by learning from their environment and their own changing states.


Attack and Reset for Unlearning: Exploiting Adversarial Noise toward Machine Unlearning through Parameter Re-initialization

Yoonhwa Jung,Ikhyun Cho,Shun-Hsiang Hsu,Julia Hockenmaier

http://arxiv.org/abs/2401.08998v1

Compressor summary: ARU is a novel machine-unlearning approach that uses adversarial noise to create a mask and reset specific parameters, making them unlearnable and improving privacy and regulatory compliance.


MicroNAS: Zero-Shot Neural Architecture Search for MCUs

Ye Qiao,Haocheng Xu,Yifan Zhang,Sitao Huang

http://arxiv.org/abs/2401.08996v1

Compressor summary: MicroNAS is a fast and efficient method for finding optimal neural network architectures for edge devices, like microcontrollers, without sacrificing accuracy.


Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR

Junwen Bai,Bo Li,Qiujia Li,Tara N. Sainath,Trevor Strohman

http://arxiv.org/abs/2401.08992v1

Compressor summary: The study proposes a simple and effective method to improve streaming multilingual ASR by using Language-Dependent Adapters (LDAs) that only account for 0.4% of the full model per language, achieving significant word error rate reduction and alleviating performance degradation.


Rigid Protein-Protein Docking via Equivariant Elliptic-Paraboloid Interface Prediction

Ziyang Yu,Wenbing Huang,Yang Liu

http://arxiv.org/abs/2401.08986v1

Compressor summary: ElliDock is a novel, fast, and accurate learning-based method for protein-protein docking that uses elliptic paraboloid interfaces to represent the docking interface.


ACT-GAN: Radio map construction based on generative adversarial networks with ACT blocks

Chen Qi,Yang Jingjing,Huang Ming,Zhou Qiang

http://arxiv.org/abs/2401.08976v1

Compressor summary: The paper introduces ACT-GAN, a new method to create more accurate radio maps using generative adversarial networks and various blocks, which improves reconstruction accuracy and local texture, and works well in different scenarios.


OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality

Aditya Sharma,Luke Yoffe,Tobias Höllerer

http://arxiv.org/abs/2401.08973v1

Compressor summary: The paper presents OCTO+, a new method for automatically placing virtual objects in natural locations using open-vocabulary vision-language models, and introduces a benchmark for evaluating this task without user studies.


Hearing Loss Detection from Facial Expressions in One-on-one Conversations

Yufeng Yin,Ishwarya Ananthabhotla,Vamsi Krishna Ithapu,Stavros Petridis,Yu-Hsiang Wu,Christi Miller

http://arxiv.org/abs/2401.08972v1

Compressor summary: The paper proposes a self-supervised pre-training strategy to detect hearing loss from facial expressions in real-world conversation scenarios, mitigating age bias using adversarial representation learning.


COCO is "ALL" You Need for Visual Instruction Fine-tuning

Xiaotian Han,Yiqi Wang,Bohan Zhai,Quanzeng You,Hongxia Yang

http://arxiv.org/abs/2401.08968v1

Compressor summary: Multi-modal large language models (MLLMs) need visual instruction fine-tuning (IFT) to align their output with user intentions, and existing visual IFT datasets, built with GPT-4 and rule-based templates, include LLaVA-mix-665k, which is effective but limited for multi-round dialog; the paper proposes a new IFT dataset with diverse and detailed instructions that performs better on open-ended evaluation benchmarks and improves instruction following in dialog settings.


ReFT: Reasoning with Reinforced Fine-Tuning

Trung Quoc Luong,Xinbo Zhang,Zhanming Jie,Peng Sun,Xiaoran Jin,Hang Li

http://arxiv.org/abs/2401.08967v1

Compressor summary: Reinforced Fine-Tuning (ReFT) improves the reasoning capability of Large Language Models (LLMs) by using online reinforcement learning to learn from multiple annotated reasoning paths for math problems, without needing extra training data.


Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices

Lei Xun,Jonathon Hare,Geoff V. Merrett

http://arxiv.org/abs/2401.08965v1

Compressor summary: The paper proposes a system for managing DNN performance trade-offs using dynamic super-networks and runtime resource management to improve efficiency and reduce latency on mobile devices.


Cascading Reinforcement Learning

Yihan Du,R. Srikant,Wei Chen

http://arxiv.org/abs/2401.08961v1

Compressor summary: The paper proposes a generalized cascading bandit framework that considers user states and state transitions in recommendation systems, and develops two computationally efficient and sample-efficient algorithms with near-optimal regret and sample complexity guarantees.


Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback

Teng Xiao,Suhang Wang

http://arxiv.org/abs/2401.08959v1

Compressor summary: The paper proposes a new algorithm that optimizes both user rewards and ranking metrics using offline data in a unified EM framework.


Fluid Dynamic DNNs for Reliable and Adaptive Distributed Inference on Edge Devices

Lei Xun,Mingyu Hu,Hengrui Zhao,Amit Kumar Singh,Jonathon Hare,Geoff V. Merrett

http://arxiv.org/abs/2401.08943v1

Compressor summary: Fluid DyDNNs are a novel distributed DNN inference approach that improves system reliability, adaptability, and efficiency on embedded Arm CPUs.


CEL: A Continual Learning Model for Disease Outbreak Prediction by Leveraging Domain Adaptation via Elastic Weight Consolidation

Saba Aslam,Abdur Rasool,Hongyan Wu,Xiaoli Li

http://arxiv.org/abs/2401.08940v1

Compressor summary: The study introduces a novel Continual Extreme Learning (CEL) model that uses Elastic Weight Consolidation to mitigate catastrophic forgetting and achieve high performance in predicting disease outbreaks.


ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

Weiyao Wang,Pierre Gleize,Hao Tang,Xingyu Chen,Kevin J Liang,Matt Feiszli

http://arxiv.org/abs/2401.08937v1

Compressor summary: ICON is a novel optimization procedure for training NeRFs from 2D video frames without pose initialization, using smooth camera motion and adaptive weighting of gradients based on confidence measures.


DeLF: Designing Learning Environments with Foundation Models

Aida Afshar,Wenchao Li

http://arxiv.org/abs/2401.08936v1

Compressor summary: DeLF is a method that uses large language models to design and code user-intended learning scenarios for reinforcement learning applications.


Learning to detect cloud and snow in remote sensing images from noisy labels

Zili Liu,Hao Chen,Wenyuan Li,Keyan Chen,Zipeng Qi,Chenyang Liu,Zhengxia Zou,Zhenwei Shi

http://arxiv.org/abs/2401.08932v1

Compressor summary: The paper proposes a new dataset and training strategy for cloud and snow detection in remote sensing images, addressing the impact of noisy labels and improving performance with UNet and Segformer models.


3D Human Pose Analysis via Diffusion Synthesis

Haorui Ji,Hongdong Li

http://arxiv.org/abs/2401.08930v1

Compressor summary: PADS is a novel framework that uses diffusion synthesis to learn a task-agnostic pose prior and unify various 3D human pose analysis tasks as inverse problems.


Uncertainty-aware No-Reference Point Cloud Quality Assessment

Songlin Fan,Zixuan Guo,Wei Gao,Ge Li

http://arxiv.org/abs/2401.08926v1

Compressor summary: The paper proposes a probabilistic model for point cloud quality assessment that accounts for human judging stochasticity and generates multiple intermediate ratings to predict the final quality score.


Partial Diacritization: A Context-Contrastive Inference Approach

Muhammad ElNokrashy,Badr AlKhamissi

http://arxiv.org/abs/2401.08919v1

Compressor summary: The study introduces Context-Contrastive Partial Diacritization (CCPD), a new method for improving readability and disambiguation in Arabic texts by marking only some characters based on context, and proposes novel performance indicators and a Transformer-variant model for CCPD.


Efficient Image Super-Resolution via Symmetric Visual Attention Network

Chengxu Wu,Qinrui Fan,Shu Hu,Xi Wu,Xin Wang,Jing Hu

http://arxiv.org/abs/2401.08913v1

Compressor summary: SVAN is a novel network that uses large receptive fields for efficient super-resolution by combining convolution operations and an attention mechanism, achieving high-quality results with fewer parameters.


Characterising Gradients for Unsupervised Accuracy Estimation under Distribution Shift

Renchunzi Xie,Ambroise Odonnat,Vasilii Feofanov,Ievgen Redko,Jianfeng Zhang,Bo An

http://arxiv.org/abs/2401.08909v1

Compressor summary: The paper proposes a new method to estimate test accuracy using gradient norms over one gradient step, which works better than existing methods under distribution shift.


Bridging State and History Representations: Understanding Self-Predictive RL

Tianwei Ni,Benjamin Eysenbach,Erfan Seyedsalehi,Michel Ma,Clement Gehring,Aditya Mahajan,Pierre-Luc Bacon

http://arxiv.org/abs/2401.08898v1

Compressor summary: The paper introduces self-predictive abstraction as a unifying idea behind various representation learning methods in deep reinforcement learning, and presents a minimalist algorithm to learn such representations with theoretical insights and empirical validation.


CFASL: Composite Factor-Aligned Symmetry Learning for Disentanglement in Variational AutoEncoder

Hee-Jun Jung,Jaehyoung Jeong,Kangil Kim

http://arxiv.org/abs/2401.08897v1

Compressor summary: CFASL is a novel method for unsupervised learning of symmetry-based disentanglement in VAEs without knowing the dataset factor information, incorporating three features to align latent vector dimensions and induce group equivariant encoder and decoder.


cedar: Composable and Optimized Machine Learning Input Data Pipelines

Mark Zhao,Emanuel Adamiak,Christos Kozyrakis

http://arxiv.org/abs/2401.08895v1

Compressor summary: Cedar is a framework that simplifies building, optimizing, and executing input data pipelines for machine learning training using various optimizations and operators.


MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

Kaan Ozkara,Can Karakus,Parameswaran Raman,Mingyi Hong,Shoham Sabach,Branislav Kveton,Volkan Cevher

http://arxiv.org/abs/2401.08893v1

Compressor summary: MADA is a framework that learns the best adaptive optimizer for deep learning during training, and it often outperforms existing optimizers like Adam, Lion, and Adan.