arxiv compressed, 2024-01-12

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-12, generated by the compressor, my personal LLM-based project.


Distilling Vision-Language Models on Millions of Videos

Yue Zhao,Long Zhao,Xingyi Zhou,Jialin Wu,Chun-Te Chu,Hui Miao,Florian Schroff,Hartwig Adam,Ting Liu,Boqing Gong,Philipp Krähenbühl,Liangzhe Yuan

http://arxiv.org/abs/2401.06129v1

Compressor summary: The authors create a video-language model using synthesized data and fine-tuning, which improves performance on various benchmarks compared to existing methods.


E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

Yifan Gong,Zheng Zhan,Qing Jin,Yanyu Li,Yerlan Idelbayev,Xian Liu,Andrey Zharkov,Kfir Aberman,Sergey Tulyakov,Yanzhi Wang,Jian Ren

http://arxiv.org/abs/2401.06127v1

Compressor summary: The text describes a novel approach to improve the efficiency of distilling GANs from diffusion models for real-time image editing on mobile devices using innovative techniques like generalized features, Low-Rank Adaptation, and minimal data fine-tuning.


Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors

Jack Saunders,Vinay Namboodiri

http://arxiv.org/abs/2401.06126v1

Compressor summary: The authors propose a method for high-quality visual dubbing using data-efficient neural rendering priors and actor-specific adaptation, which generalizes to limited data and outperforms existing approaches in terms of visual quality and recognizability.


Manipulating Feature Visualizations with Gradient Slingshots

Dilyara Bareeva,Marina M. -C. Höhne,Alexander Warnecke,Lukas Pirch,Klaus-Robert Müller,Konrad Rieck,Kirill Bykov

http://arxiv.org/abs/2401.06122v1

Compressor summary: The paper explores how adversarial manipulations can falsify neural network explanations and proposes a method to protect them.


TOFU: A Task of Fictitious Unlearning for LLMs

Pratyush Maini,Zhili Feng,Avi Schwarzschild,Zachary C. Lipton,J. Zico Kolter

http://arxiv.org/abs/2401.06121v1

Compressor summary: TOFU is a benchmark for evaluating the effectiveness of unlearning methods in large language models, using synthetic author profiles and diverse metrics.


Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian,Andrei Panferov,Denis Kuznedelev,Elias Frantar,Artem Babenko,Dan Alistarh

http://arxiv.org/abs/2401.06118v1

Compressor summary: The paper proposes a new algorithm called Additive Quantization for Language Models (AQLM) that significantly improves the compression and accuracy of large language models, enabling them to run on end-user devices with very low bit counts.
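To make the core idea concrete, here is a minimal NumPy sketch of additive quantization, in which each group of weights is reconstructed as the sum of one centroid from each of several codebooks. The shapes and sizes below are illustrative assumptions, not AQLM's actual configuration.

```python
import numpy as np

# Illustrative sizes: M codebooks of K centroids over groups of d weights.
M, K, d, n_groups = 2, 256, 8, 1024
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(M, K, d))          # learned centroids
codes = rng.integers(0, K, size=(n_groups, M))  # stored per-group indices

# Additive quantization: each weight group is the SUM of one centroid
# from every codebook, so storage is M*log2(K) bits per d weights
# (here 2*8/8 = 2 bits per weight).
weights = codebooks[np.arange(M), codes].sum(axis=1)  # (n_groups, d)
```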


Gaussian Shadow Casting for Neural Characters

Luis Bolanos,Shih-Yang Su,Helge Rhodin

http://arxiv.org/abs/2401.06116v1

Compressor summary: The proposed Gaussian shadow model enables realistic shadows and shading in neural character models by using a simple analytic formula instead of costly sampling, improving reconstructions and poses in various scenes.


Axis Tour: Word Tour Determines the Order of Axes in ICA-transformed Embeddings

Hiroaki Yamagiwa,Yusuke Takase,Hidetoshi Shimodaira

http://arxiv.org/abs/2401.06112v1

Compressor summary: Axis Tour is a novel method that optimizes the order of interpretable semantic axes in word embeddings using Independent Component Analysis (ICA) for improved clarity and performance on downstream tasks.
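For context, the ICA step that produces the interpretable axes is reproducible with scikit-learn; the sketch below shows only that step and replaces Axis Tour's actual tour-based axis ordering with a naive kurtosis proxy. The embedding file path is a placeholder.

```python
import numpy as np
from sklearn.decomposition import FastICA

E = np.load("embeddings.npy")  # (vocab, dim) word embeddings; placeholder path
S = FastICA(n_components=E.shape[1], random_state=0).fit_transform(E)

# Axis Tour orders the ICA axes so that semantically similar axes end up
# adjacent (a tour over axes); as a crude stand-in we sort by excess kurtosis.
kurt = ((S - S.mean(0)) ** 4).mean(0) / (S.var(0) ** 2) - 3.0
S_ordered = S[:, np.argsort(-kurt)]
```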


PALP: Prompt Aligned Personalization of Text-to-Image Models

Moab Arar,Andrey Voynov,Amir Hertz,Omri Avrahami,Shlomi Fruchter,Yael Pritch,Daniel Cohen-Or,Ariel Shamir

http://arxiv.org/abs/2401.06105v1

Compressor summary: Prompt-aligned personalization is a new method for generating personalized images that align well with complex textual prompts, while preserving subject fidelity and user requirements.


Transformers are Multi-State RNNs

Matanel Oren,Michael Hassid,Yossi Adi,Roy Schwartz

http://arxiv.org/abs/2401.06104v1

Compressor summary: This paper shows that transformers can be seen as infinite multi-state RNNs and introduces a new conversion policy, TOVA, which improves performance on long-range tasks while using less cache memory.
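As a rough illustration of what such a conversion policy does, the sketch below caps the KV cache at a fixed size by evicting the token the current step attends to least. Function and variable names are assumptions, not the authors' implementation.

```python
import torch

def tova_style_evict(keys, values, attn_weights, cache_size):
    """Fixed-size multi-state view of a transformer: once the cache
    exceeds `cache_size`, drop the least-attended token (a sketch of a
    TOVA-like policy, not the paper's exact code).

    keys/values: (seq, dim); attn_weights: (seq,) from the latest step.
    """
    if keys.shape[0] <= cache_size:
        return keys, values
    drop = torch.argmin(attn_weights).item()        # least-attended token
    keep = torch.arange(keys.shape[0]) != drop
    return keys[keep], values[keep]
```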


Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models

Asma Ghandeharioun,Avi Caciularu,Adam Pearce,Lucas Dixon,Mor Geva

http://arxiv.org/abs/2401.06102v1

Compressor summary: Patchscopes is a framework that explains the hidden representations of large language models in natural language, addressing limitations of prior methods and enabling new applications.


A Closer Look at AUROC and AUPRC under Class Imbalance

Matthew B. A. McDermott,Lasse Hyldig Hansen,Haoran Zhang,Giovanni Angelotti,Jack Gallifant

http://arxiv.org/abs/2401.06091v1

Compressor summary: The paper challenges the notion that AUPRC is superior to AUROC for binary classification tasks with class imbalance, and shows that AUPRC can be biased and harmful in such cases.
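A small synthetic experiment makes the paper's point easy to reproduce: under heavy class imbalance the two metrics can tell very different stories. The data and scores below are simulated, not taken from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y = (rng.random(100_000) < 0.01).astype(int)   # ~1% positive class
scores = rng.normal(size=y.size) + 2.0 * y     # imperfect classifier

print(roc_auc_score(y, scores))            # high; insensitive to the skew
print(average_precision_score(y, scores))  # much lower under imbalance
```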


PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU

Piyush Sao,Andrey Prokopenko,Damien Lebrun-Grandié

http://arxiv.org/abs/2401.06089v1

Compressor summary: PANDORA is a parallel algorithm that efficiently constructs dendrograms for single-linkage hierarchical clustering using a recursive tree contraction method, achieving significant speed-ups on both CPUs and GPUs.


Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models

K M Sajjadul Islam,Ayesha Siddika Nipu,Praveen Madiraju,Priya Deshpande

http://arxiv.org/abs/2401.06088v1

Compressor summary: The study develops an autocompletion tool for documenting Chief Complaints (CC) in medical records, comparing three BioGPT variants, an LSTM model, and a GPT-4 prompt on perplexity, modified BERTScore, and cosine similarity; BioGPT-Large performs best and yields accurate, well-formatted CC phrases and sentences.


XGBoost Learning of Dynamic Wager Placement for In-Play Betting on an Agent-Based Model of a Sports Betting Exchange

Chawin Terawong,Dave Cliff

http://arxiv.org/abs/2401.06086v1

Compressor summary: The paper uses XGBoost to learn profitable in-play betting strategies from synthetic data generated by an agent-based model of a sports-betting exchange; the learned strategies generalize to outperform the original ones.


Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint

Zhipeng Chen,Kun Zhou,Wayne Xin Zhao,Junchen Wan,Fuzheng Zhang,Di Zhang,Ji-Rong Wen

http://arxiv.org/abs/2401.06081v1

Compressor summary: RLMEC is a new RL method that uses a generative model to provide token-level rewards for training large language models, improving their performance on complex reasoning tasks and reducing harmful outputs.


Secrets of RLHF in Large Language Models Part II: Reward Modeling

Binghai Wang,Rui Zheng,Lu Chen,Yan Liu,Shihan Dou,Caishuang Huang,Wei Shen,Senjie Jin,Enyu Zhou,Chenyu Shi,Songyang Gao,Nuo Xu,Yuhao Zhou,Xiaoran Fan,Zhiheng Xi,Jun Zhao,Xiao Wang,Tao Ji,Hang Yan,Lixing Shen,Zhan Chen,Tao Gui,Qi Zhang,Xipeng Qiu,Xuanjing Huang,Zuxuan Wu,Yu-Gang Jiang

http://arxiv.org/abs/2401.06080v1

Compressor summary: The paper proposes methods to improve reward models for reinforcement learning from human feedback by addressing challenges related to preference data quality and generalization.


Chain of History: Learning and Forecasting with LLMs for Temporal Knowledge Graph Completion

Ruilin Luo,Tianle Gu,Haoling Li,Junzhe Li,Zicheng Lin,Jiayi Li,Yujiu Yang

http://arxiv.org/abs/2401.06072v1

Compressor summary: The paper presents a novel approach to predict missing event links in future timestamps using LLMs fine-tuned with historical data and structural information, achieving state-of-the-art results.


LEGO: Language Enhanced Multi-modal Grounding Model

Zhaowei Li,Qi Xu,Dong Zhang,Hang Song,Yiqing Cai,Qi Qi,Ran Zhou,Junting Pan,Zefeng Li,Van Tu Vu,Zhida Huang,Tao Wang

http://arxiv.org/abs/2401.06071v1

Compressor summary: The paper introduces LEGO, a multi-modal model that captures both global and local information across different modalities, enhancing its performance in various tasks requiring fine-grained understanding of input data.


DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Damai Dai,Chengqi Deng,Chenggang Zhao,R. X. Xu,Huazuo Gao,Deli Chen,Jiashi Li,Wangding Zeng,Xingkai Yu,Y. Wu,Zhenda Xie,Y. K. Li,Panpan Huang,Fuli Luo,Chong Ruan,Zhifang Sui,Wenfeng Liang

http://arxiv.org/abs/2401.06066v1

Compressor summary: DeepSeekMoE is a new language model architecture that improves expert specialization and reduces computational costs compared to conventional MoE architectures like GShard.


Investigating Data Contamination for Pre-training Language Models

Minhao Jiang,Ken Ziyu Liu,Ming Zhong,Rylan Schaeffer,Siru Ouyang,Jiawei Han,Sanmi Koyejo

http://arxiv.org/abs/2401.06059v1

Compressor summary: The paper explores how pre-training language models with evaluation data affects their performance on downstream tasks and highlights limitations of current contamination definitions.


MatSynth: A Modern PBR Materials Dataset

Giuseppe Vecchio,Valentin Deschaintre

http://arxiv.org/abs/2401.06056v1

Compressor summary: MatSynth is a large and diverse dataset of high-quality, public domain materials for use in virtual environments.


Fast High Dynamic Range Radiance Fields for Dynamic Scenes

Guanjun Wu,Taoran Yi,Jiemin Fang,Wenyu Liu,Xinggang Wang

http://arxiv.org/abs/2401.06052v1

Compressor summary: The paper proposes HDR-HexPlane, a dynamic HDR NeRF framework that can learn 3D scenes from dynamic 2D images with various exposures and render high-quality novel-view images at any time point with desired exposure.


Wavelet-Inspired Multiscale Graph Convolutional Recurrent Network for Traffic Forecasting

Qipeng Qian,Tanwi Mallick

http://arxiv.org/abs/2401.06040v1

Compressor summary: WavGCRN is a novel method that combines wavelet transformation, graph convolutional recurrent networks, and road network information to improve spatiotemporal traffic forecasting by modeling multiscale structure in traffic data.


RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks

Partha Ghosh,Soubhik Sanyal,Cordelia Schmid,Bernhard Schölkopf

http://arxiv.org/abs/2401.06035v1

Compressor summary: The authors propose a new generative model for videos that can handle long-term dependencies, reduce computational complexity, and synthesize high-quality video clips efficiently.


LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization

Muhammad Farid Adilazuarda,Samuel Cahyawijaya,Alham Fikri Aji,Genta Indra Winata,Ayu Purwarianti

http://arxiv.org/abs/2401.06034v1

Compressor summary: LinguAlchemy improves PLMs' performance on unseen languages by regularizing them with linguistic constraints, making them more inclusive and accessible.


GE-AdvGAN: Improving the transferability of adversarial samples by gradient editing-based adversarial generative model

Zhiyu Zhu,Huaming Chen,Xinyi Wang,Jiayu Zhang,Zhibo Jin,Kim-Kwang Raymond Choo

http://arxiv.org/abs/2401.06031v1

Compressor summary: GE-AdvGAN improves the efficiency and transferability of adversarial attacks by optimizing generator training with a novel gradient editing mechanism in GANs.


Automatic UAV-based Airport Pavement Inspection Using Mixed Real and Virtual Scenarios

Pablo Alonso,Jon Ander Iñiguez de Gordoa,Juan Diego Ortega,Sara García,Francisco Javier Iriarte,Marcos Nieto

http://arxiv.org/abs/2401.06019v1

Compressor summary: The paper proposes a UAV-based vision system that uses deep learning to automatically detect defects in runway and taxiway pavements, together with a synthetic dataset generation technique that lets the model be trained despite scarce real data.


Surgical-DINO: Adapter Learning of Foundation Model for Depth Estimation in Endoscopic Surgery

Cui Beilei,Islam Mobarakol,Bai Long,Ren Hongliang

http://arxiv.org/abs/2401.06013v1

Compressor summary: The paper introduces Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in robotic surgery, which significantly outperforms existing models on the SCARED dataset.


Attention to detail: inter-resolution knowledge distillation

Rocío del Amor,Julio Silva-Rodríguez,Adrián Colomer,Valery Naranjo

http://arxiv.org/abs/2401.06010v1

Compressor summary: The authors propose using attention maps, such as grad-CAMs, to guide inter-resolution knowledge distillation in digital pathology, transferring discriminative information from high-resolution gigapixel images to models that work at lower resolutions and easing the computational limits imposed by the full image size.


Sea ice detection using concurrent multispectral and synthetic aperture radar imagery

Martin S J Rogers,Maria Fox,Andrew Fleming,Louisa van Zeeland,Jeremy Wilkinson,J. Scott Hosking

http://arxiv.org/abs/2401.06009v1

Compressor summary: ViSual_IceD is a CNN that fuses multispectral and SAR imagery for accurate sea ice detection in polar regions, outperforming other models and complementing passive microwave data.


TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering

Linus Franke,Darius Rückert,Laura Fink,Marc Stamminger

http://arxiv.org/abs/2401.06003v1

Compressor summary: TRIPS is a novel technique that combines ideas from Gaussian Splatting and ADOP to render high-quality images of highly detailed scenes at real-time speeds with a differentiable pipeline.


MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring

Qian Gong,Jieyang Chen,Ben Whitney,Xin Liang,Viktor Reshniak,Tania Banerjee,Jaemoon Lee,Anand Rangarajan,Lipeng Wan,Nicolas Vidal,Qing Liu,Ana Gainaru,Norbert Podhorszki,Richard Archibald,Sanjay Ranka,Scott Klasky

http://arxiv.org/abs/2401.05994v1

Compressor summary: MGARD is a software tool for compressing and managing large scientific data on grids across various computing architectures.


UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization

Rouwan Wu,Xiaoya Cheng,Juelin Zhu,Xuxiang Liu,Maojun Zhang,Shen Yan

http://arxiv.org/abs/2401.05971v1

Compressor summary: The paper introduces a large-scale dataset for UAV localization and presents a two-stage pipeline that combines synthetic data generation and visual localization, as well as a hierarchical system for 3D ground target tracking.


Spatial-Aware Deep Reinforcement Learning for the Traveling Officer Problem

Niklas Strauß,Matthias Schubert

http://arxiv.org/abs/2401.05969v1

Compressor summary: The spatial-aware deep reinforcement learning method SATOP improves performance on the traveling officer problem by exploiting spatial relationships and learning future inter-action correlations, collecting 22% more fines on Melbourne data.


A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting

Yashwardhan Chaudhuri,Ankit Kumar,Orchid Chetia Phukan,Arun Balaji Buduru

http://arxiv.org/abs/2401.05968v1

Compressor summary: The paper introduces two lightweight crowd-counting models that use MobileNet and MobileViT backbones, feature fusion, and are computationally efficient compared to previous methods.


Block-Diagonal Orthogonal Relation and Matrix Entity for Knowledge Graph Embedding

Yihua Zhu,Hidetoshi Shimodaira

http://arxiv.org/abs/2401.05967v1

Compressor summary: OrthogonalE is a novel Knowledge Graph embedding model that uses matrices for entities and block-diagonal orthogonal matrices with Riemannian optimization for relations, improving generality and flexibility over existing methods.


An attempt to generate new bridge types from latent space of PixelCNN

Hongjun Zhang

http://arxiv.org/abs/2401.05964v1

Compressor summary: The authors propose a method to generate new bridge types using generative artificial intelligence, which combines different structural components and can potentially lead to artificial general intelligence.


Machine Learning Insides OptVerse AI Solver: Design Principles and Applications

Xijun Li,Fangzhou Zhu,Hui-Ling Zhen,Weilin Luo,Meng Lu,Yimin Huang,Zhenan Fan,Zirui Zhou,Yufei Kuang,Zhihai Wang,Zijie Geng,Yang Li,Haoyang Liu,Zhiwu An,Muming Yang,Jianshu Li,Jie Wang,Junchi Yan,Defeng Sun,Tao Zhong,Yong Zhang,Jia Zeng,Mingxuan Yuan,Jianye Hao,Jun Yao,Kun Mao

http://arxiv.org/abs/2401.05960v1

Compressor summary: The paper presents a study on enhancing Huawei Cloud's OptVerse AI Solver with machine learning techniques to improve efficiency and performance in mathematical programming tasks.


LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase

Chujie Gao,Dongping Chen,Qihui Zhang,Yue Huang,Yao Wan,Lichao Sun

http://arxiv.org/abs/2401.05952v1

Compressor summary: The text introduces mixcase, a hybrid text form of machine-generated and human-generated content, and MixSet, the first dataset to study mixed modification scenarios in large language models.


Universal Vulnerabilities in Large Language Models: In-context Learning Backdoor Attacks

Shuai Zhao,Meihuizi Jia,Luu Anh Tuan,Jinming Wen

http://arxiv.org/abs/2401.05949v1

Compressor summary: In-context learning is an effective NLP paradigm but has security risks as it can be exploited by ICLAttack, a new backdoor attack method that manipulates model behavior without fine-tuning.


Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments

Antoine Dedieu,Wolfgang Lehrach,Guangyao Zhou,Dileep George,Miguel Lázaro-Gredilla

http://arxiv.org/abs/2401.05946v1

Compressor summary: The paper proposes a transformer model with discrete bottlenecks that can learn compressed representations of observations and actions, enabling it to extract interpretable cognitive maps for path planning in partially observed environments.


SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully

Jushi Kai,Tianhang Zhang,Hai Hu,Zhouhan Lin

http://arxiv.org/abs/2401.05930v1

Compressor summary: The paper proposes a method to help language models generate text more truthfully by highlighting and hesitating on less probable but factual tokens.


Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback

Jiashuo Wang,Chunpu Xu,Chak Tou Leong,Wenjie Li,Jing Li

http://arxiv.org/abs/2401.05928v1

Compressor summary: Muffin is a framework that uses contrastive learning to reduce unhelpful responses in emotional support systems by considering multiple factors such as empathy, support strategies, and coherence.


CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians

Bin Dou,Tianyu Zhang,Yongjia Ma,Zhaohui Wang,Zejian Yuan

http://arxiv.org/abs/2401.05925v1

Compressor summary: Our method improves 3D scene segmentation speed and quality by optimizing Gaussian points, fusing spatial and semantic features, and using a shallow decoding network.


How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational Quizzes

Sabina Elkins,Ekaterina Kochmar,Jackie C. K. Cheung,Iulian Serban

http://arxiv.org/abs/2401.05914v1

Compressor summary: The paper shows how a language model-based question generation system can be used by teachers to create quizzes with learning goals from Bloom's taxonomy, and demonstrates its advantages over handwritten quizzes in terms of quality and metrics.


Prompt-based mental health screening from social media text

Wesley Ramos dos Santos,Ivandre Paraboni

http://arxiv.org/abs/2401.05912v1

Compressor summary: The article describes how to use GPT-3.5 prompts and a simple text classifier to screen for mental health issues in social media posts, achieving results similar to a more complex BERT-based method with less computation.


EpilepsyLLM: Domain-Specific Large Language Model Fine-tuned with Epilepsy Medical Knowledge

Xuyang Zhao,Qibin Zhao,Toshihisa Tanaka

http://arxiv.org/abs/2401.05908v1

Compressor summary: EpilepsyLLM is a customized language model that provides more accurate and relevant medical information about epilepsy in Japanese by fine-tuning a pre-trained LLM with domain-specific datasets.


Efficient Image Deblurring Networks based on Diffusion Models

Kang Chen,Yuanjie Liu

http://arxiv.org/abs/2401.05907v1

Compressor summary: Swintormer is a sliding-window model for defocus deblurring that uses diffusion, Transformer blocks, and optimized MACs (multiply-accumulate operations) to achieve high performance with low memory usage and improved SNR.


PartSTAD: 2D-to-3D Part Segmentation Task Adaptation

Hyunjin Kim,Minhyuk Sung

http://arxiv.org/abs/2401.05906v1

Compressor summary: PartSTAD adapts 2D segmentation models for 3D segmentation tasks using finetuning, merging weights, and a foreground segmentation model, achieving significant improvements on PartNet-Mobility dataset.


ConKeD: Multiview contrastive descriptor learning for keypoint-based retinal image registration

David Rivas-Villar,Álvaro S. Hervella,José Rouco,Jorge Novo

http://arxiv.org/abs/2401.05901v1

Compressor summary: ConKeD is a novel deep learning approach that learns descriptors for retinal image registration using a multi-positive multi-negative contrastive learning strategy, achieving comparable results to state-of-the-art methods with fewer training samples and detected keypoints.


Optimistic Model Rollouts for Pessimistic Offline Policy Optimization

Yuanzhao Zhai,Yiying Li,Zijian Gao,Xudong Gong,Kele Xu,Dawei Feng,Ding Bo,Huaimin Wang

http://arxiv.org/abs/2401.05899v1

Compressor summary: ORPO is an offline RL framework that uses optimistic rollouts to improve policy optimization and generalization with synthetic model rollouts.


Binary Linear Tree Commitment-based Ownership Protection for Distributed Machine Learning

Tianxiu Xie,Keke Gai,Jing Yu,Liehuang Zhu

http://arxiv.org/abs/2401.05895v1

Compressor summary: The paper proposes a novel ownership protection model for distributed machine learning using binary linear tree commitment, which ensures computational integrity with efficient proof aggregation and watermarking.


LiDAR data acquisition and processing for ecology applications

Ion Ciobotari,Adriana Príncipe,Maria Alexandra Oliveira,João Nuno Silva

http://arxiv.org/abs/2401.05891v1

Compressor summary: The text describes a low-cost terrestrial laser scanner (TLS) for ecological data acquisition and its application in two case studies, showing its effectiveness in measuring vegetation structure.


Generative Deduplication For Social Media Data Selection

Xianming Li,Jing Li

http://arxiv.org/abs/2401.05883v1

Compressor summary: Generative deduplication is a novel approach that removes duplicate text from noisy social media data, mitigating model bias, improving performance, and saving training time.


YOIO: You Only Iterate Once by mining and fusing multiple necessary global information in the optical flow estimation

Yu Jing,Tan Yujuan,Ren Ao,Liu Duo

http://arxiv.org/abs/2401.05879v1

Compressor summary: The YOIO framework improves optical flow prediction accuracy in occluded regions using spatiotemporal information from two frames and achieves state-of-the-art results with high efficiency.


Safe reinforcement learning in uncertain contexts

Dominik Baumann,Thomas B. Schön

http://arxiv.org/abs/2401.05876v1

Compressor summary: The paper proposes a safe learning method for robotic systems that can handle discrete environmental changes without directly measuring them, using multi-class classification and experiments to estimate the context.


Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks

Yahui Fu,Haiyue Song,Tianyu Zhao,Tatsuya Kawahara

http://arxiv.org/abs/2401.05871v1

Compressor summary: The text describes a new method for improving personality recognition in robots using data augmentation and a specialized network architecture, leading to better human-robot interactions.


HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models

Hanzhang Wang,Haoran Wang,Jinze Yang,Zhongrui Yu,Zeke Xie,Lei Tian,Xinyan Xiao,Junjun Jiang,Xianming Liu,Mingming Sun

http://arxiv.org/abs/2401.05870v1

Compressor summary: HiCAST is a novel approach for arbitrary style transfer that allows flexible and customized stylization by using a Latent Diffusion Model and a Style Adapter, and can also apply to video AST with improved temporal consistency.


Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models

Pengzhi Gao,Zhongjun He,Hua Wu,Haifeng Wang

http://arxiv.org/abs/2401.05861v1

Compressor summary: The paper proposes a new method, XConST, to improve multilingual zero-shot translation by using prompt strategies and cross-lingual consistency regularization during instruction finetuning on pretrained LLMs.


Inferring Intentions to Speak Using Accelerometer Data In-the-Wild

Litian Li,Jord Molhoek,Jing Zhou

http://arxiv.org/abs/2401.05849v1

Compressor summary: The text studies whether AI can recognize intentions to speak from accelerometer data in real-life settings, but finds that it is not reliable enough and more data sources are needed.


Revisiting Silhouette: From Micro to Macro Aggregation

Georgios Vardakas,John Pavlopoulos,Aristidis Likas

http://arxiv.org/abs/2401.05831v1

Compressor summary: The paper proposes a new way to evaluate data clustering quality, called macro-averaging, which is more robust to cluster imbalance and background noise than the standard micro-averaging method.
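A minimal sketch of the macro-averaged variant, assuming it simply averages per-cluster mean silhouettes with equal weight (the helper name is ours, not the paper's):

```python
import numpy as np
from sklearn.metrics import silhouette_samples

def macro_silhouette(X, labels):
    """Average the mean silhouette of each cluster with equal weight,
    so small clusters count as much as large ones. The standard (micro)
    score is silhouette_samples(X, labels).mean(), i.e. sklearn's
    silhouette_score."""
    labels = np.asarray(labels)
    s = silhouette_samples(X, labels)
    return float(np.mean([s[labels == c].mean() for c in np.unique(labels)]))
```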


Hallucination Benchmark in Medical Visual Question Answering

Jinge Wu,Yunsoo Kim,Honghan Wu

http://arxiv.org/abs/2401.05827v1

Compressor summary: The text discusses the potential of visual assistants for healthcare using large language and vision models, but warns that these models may hallucinate when faced with unfamiliar medical images.


Towards Goal-Oriented Agents for Evolving Problems Observed via Conversation

Michael Free,Andrew Langworthy,Mary Dimitropoulaki,Simon Thompson

http://arxiv.org/abs/2401.05822v1

Compressor summary: The text describes a system that trains a chatbot with reinforcement learning to solve evolving problems by conversing with a simulated user about a virtual game.


Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents

Quentin Delfosse,Sebastian Sztwiertnia,Wolfgang Stammer,Mark Rothermel,Kristian Kersting

http://arxiv.org/abs/2401.05821v1

Compressor summary: SCoBots are transparent agents that use concept bottleneck layers to enable domain experts to understand and regularize their behavior, leading to better human-aligned RL.


Implications of Noise in Resistive Memory on Deep Neural Networks for Image Classification

Yannick Emonds,Kai Xi,Holger Fröning

http://arxiv.org/abs/2401.05820v1

Compressor summary: The text explores how image classification with neural networks can tolerate noise in resistive memory operations and proposes methods to improve resilience.


Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages

Zhuoyuan Mao,Yen Yu

http://arxiv.org/abs/2401.05811v1

Compressor summary: The article proposes AlignInstruct, a method that improves machine translation on large language models for unseen and low-resource languages by using cross-lingual supervision.


On the representation and methodology for wide and short range head pose estimation

Alejandro Cobo,Roberto Valle,José M. Buenaposada,Luis Baumela

http://arxiv.org/abs/2401.05807v1

Compressor summary: The paper analyzes methods and metrics for head pose estimation (HPE), showing that Euler angles work for short-range HPE but break down under extreme rotations; it proposes a new cross-dataset evaluation methodology, a generalization of the geodesic angular distance metric, and a wide-range HPE benchmark built on the CMU Panoptic dataset.


CLIP-Driven Semantic Discovery Network for Visible-Infrared Person Re-Identification

Xiaoyan Yu,Neng Dong,Liehuang Zhu,Hao Peng,Dapeng Tao

http://arxiv.org/abs/2401.05806v1

Compressor summary: The paper proposes a new method, CLIP-Driven Semantic Discovery Network (CSDN), that uses high-level semantics to bridge the modality gap between visible and infrared images for person re-identification.


Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing Values

Yu Zheng,Huan Yee Koh,Ming Jin,Lianhua Chi,Haishuai Wang,Khoa T. Phan,Yi-Ping Phoebe Chen,Shirui Pan,Wei Xiang

http://arxiv.org/abs/2401.05800v1

Compressor summary: The text introduces GST-Pro, a framework for detecting anomalies in irregularly-sampled multivariate time series using neural controlled differential equations and a distribution-based scoring mechanism.


Designing Heterogeneous LLM Agents for Financial Sentiment Analysis

Frank Xing

http://arxiv.org/abs/2401.05799v1

Compressor summary: This paper explores using large language models without fine-tuning for financial sentiment analysis, proposing a framework of heterogeneous LLM agents that leverages their generative power and domain knowledge to improve accuracy, and discusses the implications for business and management.


Bounds on the price of feedback for mistake-bounded online learning

Jesse Geneson,Linus Tang

http://arxiv.org/abs/2401.05794v1

Compressor summary: The paper sharpens upper and lower bounds on the price of feedback for mistake-bounded online learning in several settings and resolves previously open problems.


Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations

Zhihui Xie,Handong Zhao,Tong Yu,Shuai Li

http://arxiv.org/abs/2401.05792v1

Compressor summary: The paper proposes a method to remove language-specific factors from multilingual embeddings using singular value decomposition, which improves semantic tasks like cross-lingual sentence retrieval without finetuning.
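The general recipe is straightforward to sketch: estimate a low-rank subspace from per-language mean embeddings and project it out. The rank and construction below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def remove_language_subspace(E, langs, rank=4):
    """E: (n, d) multilingual sentence embeddings; langs: length-n array
    of language ids. Projects out `rank` directions spanned by the
    language-mean vectors (an illustrative sketch)."""
    langs = np.asarray(langs)
    means = np.stack([E[langs == l].mean(0) for l in np.unique(langs)])
    _, _, Vt = np.linalg.svd(means - means.mean(0), full_matrices=False)
    V = Vt[:rank].T                   # top language-specific directions
    return E - (E @ V) @ V.T          # language-agnostic residual
```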


Evidence to Generate (E2G): A Single-agent Two-step Prompting for Context Grounded and Retrieval Augmented Reasoning

Md Rizwan Parvez

http://arxiv.org/abs/2401.05787v1

Compressor summary: E2G is a novel framework that leverages evidence from context to improve LLMs' reasoning and generation performance across various tasks.


EraseDiff: Erasing Data Influence in Diffusion Models

Jing Wu,Trung Le,Munawar Hayat,Mehrtash Harandi

http://arxiv.org/abs/2401.05779v1

Compressor summary: The paper introduces an algorithm for diffusion models to remove data while preserving their utility and effectiveness on other data.


Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

Tianyu Cui,Yanling Wang,Chuanpu Fu,Yong Xiao,Sijia Li,Xinhao Deng,Yunpeng Liu,Qinglin Zhang,Ziyi Qiu,Peiyang Li,Zhixing Tan,Junwu Xiong,Xinyu Kong,Zujie Wen,Ke Xu,Qi Li

http://arxiv.org/abs/2401.05778v1

Compressor summary: This paper proposes a taxonomy for large language models (LLMs) to analyze and mitigate risks in their four essential modules, and reviews benchmarks for risk assessment.


Probing Structured Semantics Understanding and Generation of Language Models via Question Answering

Jinxin Liu,Shulin Cao,Jiaxin Shi,Tingjian Zhang,Lei Hou,Juanzi Li

http://arxiv.org/abs/2401.05777v1

Compressor summary: This paper evaluates how well large language models understand and generate structured logical forms for question answering using different formal languages and suggests generating more training data rather than directly relying on LLMs for answering questions.


Knowledge Translation: A New Pathway for Model Compression

Wujie Sun,Defang Chen,Jiawei Chen,Yan Feng,Chun Chen,Can Wang

http://arxiv.org/abs/2401.05772v1

Compressor summary: The paper proposes a novel framework called Knowledge Translation that uses neural networks to compress deep learning models without re-training or architectural constraints, inspired by language translation.


Learn From Zoom: Decoupled Supervised Contrastive Learning For WCE Image Classification

Kunpeng Qiu,Zhiying Zhou,Yongxin Guo

http://arxiv.org/abs/2401.05771v1

Compressor summary: The paper proposes a new method for classifying lesions in Wireless Capsule Endoscopy images using Decoupled Supervised Contrastive Learning and Saliency Augmentor, achieving state-of-the-art results.


Evaluating Data Augmentation Techniques for Coffee Leaf Disease Classification

Adrian Gheorghiu,Iulian-Marius Tăiatu,Dumitru-Clementin Cercel,Iuliana Marin,Florin Pop

http://arxiv.org/abs/2401.05768v1

Compressor summary: The paper presents a deep learning-based method for classifying Robusta coffee leaf diseases using the RoCoLe dataset, augmented with synthetic data generated by CycleGAN, improving the performance of the model.


Learning Generalizable Models via Disentangling Spurious and Enhancing Potential Correlations

Na Wang,Lei Qi,Jintao Guo,Yinghuan Shi,Yang Gao

http://arxiv.org/abs/2401.05752v1

Compressor summary: This paper proposes two modules that improve domain generalization by disentangling spurious correlations and enhancing potential correlations from both sample and feature perspectives, achieving better results with both CNN and MLP backbones.


GO-NeRF: Generating Virtual Objects in Neural Radiance Fields

Peng Dai,Feitong Tan,Xin Yu,Yinda Zhang,Xiaojuan Qi

http://arxiv.org/abs/2401.05750v1

Compressor summary: The paper introduces GO-NeRF, a method for creating 3D objects within an existing Neural Radiance Field (NeRF) scene using scene context and producing high-quality, harmonious results with minimal artifacts.


A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

Brian Thompson,Mehak Preet Dhaliwal,Peter Frisch,Tobias Domhan,Marcello Federico

http://arxiv.org/abs/2401.05749v1

Compressor summary: The text suggests that machine translation is widely used for creating low-quality translations of web content in multiple languages, especially in lower resource languages.


Surface Normal Estimation with Transformers

Barry Shichen Hu,Siyun Liang,Johannes Paetzold,Huy H. Nguyen,Isao Echizen,Jiapeng Tang

http://arxiv.org/abs/2401.05745v1

Compressor summary: The paper proposes a simple Transformer-based model that predicts surface normals from noisy point clouds, outperforming previous PointNet-variant and surface-fitting approaches while being simpler, faster, and more robust, with state-of-the-art results on two datasets.


Consistent Query Answering for Existential Rules under Tuple-Deletion Semantics

Lorenzo Marconi,Riccardo Rosati

http://arxiv.org/abs/2401.05743v1

Compressor summary: The paper investigates consistent query answering over knowledge bases with existential rules under tuple-deletion semantics, identifying classes of rules and queries for which the problem is tractable and providing new techniques for handling inconsistencies.


LKCA: Large Kernel Convolutional Attention

Chenghao Li,Boheng Zeng,Yi Lu,Pengbo Shi,Qingzi Chen,Jirui Liu,Lingyun Zhu

http://arxiv.org/abs/2401.05738v1

Compressor summary: The paper introduces Large Kernel Convolutional Attention (LKCA), a new spatial attention method for visual transformers that simplifies the attention operation and combines the advantages of convolutional neural networks and visual transformers, achieving competitive performance in various visual tasks.
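A minimal PyTorch block of this flavor, with a depthwise large-kernel convolution standing in for the attention map followed by a pointwise mix; the kernel size and layout are illustrative guesses, not LKCA's exact design.

```python
import torch.nn as nn

class LargeKernelConvAttention(nn.Module):
    """Sketch: replace self-attention with a large depthwise convolution
    plus a 1x1 pointwise convolution and a residual connection."""
    def __init__(self, dim, kernel_size=13):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel_size,
                            padding=kernel_size // 2, groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x):            # x: (batch, dim, H, W)
        return x + self.pw(self.dw(x))
```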


An experimental evaluation of Deep Reinforcement Learning algorithms for HVAC control

Antonio Manjavacas,Alejandro Campoy-Nieves,Javier Jiménez-Raboso,Miguel Molina-Solana,Juan Gómez-Romero

http://arxiv.org/abs/2401.05737v1

Compressor summary: This paper evaluates state-of-the-art Deep Reinforcement Learning algorithms for HVAC control, showing their potential in comfort and energy efficiency while highlighting challenges in generalization and incremental learning.


Cross-modal Retrieval for Knowledge-based Visual Question Answering

Paul Lerner,Olivier Ferret,Camille Guinaudeau

http://arxiv.org/abs/2401.05736v1

Compressor summary: The text discusses how cross-modal retrieval can help recognize named entities in visual question answering tasks by bridging the semantic gap between entities and their depictions.


Object-Centric Diffusion for Efficient Video Editing

Kumara Kahatapitiya,Adil Karjauv,Davide Abati,Fatih Porikli,Yuki M. Asano,Amirhossein Habibian

http://arxiv.org/abs/2401.05735v1

Compressor summary: The paper analyzes the inefficiencies of diffusion-based video editing, introduces Object-Centric Diffusion (OCD) to reduce computation costs by focusing on foreground regions, and proposes two novel techniques that can significantly speed up video editing without retraining.


Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing

Jaeill Kim,Duhun Hwang,Eunjung Lee,Jangwon Suh,Jimyeong Kim,Wonjong Rhee

http://arxiv.org/abs/2401.05730v1

Compressor summary: The paper proposes a multi-view strategy called ECPP that improves the speed and performance of contrastive and non-contrastive visual representation learning methods.


Zero Resource Cross-Lingual Part Of Speech Tagging

Sahil Chopra

http://arxiv.org/abs/2401.05727v1

Compressor summary: The text discusses using a hidden Markov model to predict part-of-speech tags in low-resource languages by transferring data from source languages, and finds that this method is effective for zero-resource languages.


Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature and Bayesian Optimization

Xu Cai,Jonathan Scarlett

http://arxiv.org/abs/2401.05716v1

Compressor summary: The paper investigates how the difficulty of estimating a normalizing constant using black-box function queries depends on the problem parameter $\lambda$, ranging from Bayesian quadrature to Bayesian optimization and considering noisy function evaluations.


Dynamic Indoor Fingerprinting Localization based on Few-Shot Meta-Learning with CSI Images

Jiyu Jiao,Xiaojun Wang,Chenpei Han,Yuhua Huang,Yizhuo Zhang

http://arxiv.org/abs/2401.05711v1

Compressor summary: The paper proposes a few-shot meta-learning algorithm that leverages historical localization tasks to improve the adaptability and efficiency of indoor fingerprinting localization, reducing data acquisition costs while increasing accuracy.


The Distributional Reward Critic Architecture for Perturbed-Reward Reinforcement Learning

Xi Chen,Zhihui Zhu,Andrew Perrault

http://arxiv.org/abs/2401.05710v1

Compressor summary: The paper proposes a new method for reinforcement learning with unknown reward perturbations that can recover the true rewards in various settings and outperforms existing methods.


CAT-LLM: Prompting Large Language Models with Text Style Definition for Chinese Article-style Transfer

Zhen Tao,Dinghao Xi,Zhiyu Li,Liumin Tang,Wei Xu

http://arxiv.org/abs/2401.05707v1

Compressor summary: The proposed Chinese Article-style Transfer framework (CAT-LLM) leverages Large Language Models to analyze and transfer text features from Chinese articles while preserving the original content's integrity.


R-BI: Regularized Batched Inputs enhance Incremental Decoding Framework for Low-Latency Simultaneous Speech Translation

Jiaxin Guo,Zhanglin Wu,Zongyao Li,Hengchao Shang,Daimeng Wei,Xiaoyu Chen,Zhiqiang Rao,Shaojun Li,Hao Yang

http://arxiv.org/abs/2401.05700v1

Compressor summary: Regularized Batched Inputs is a novel method for low-latency simultaneous speech translation that improves input diversity and reduces output errors with suitable regularization techniques.


Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback

Chengfeng Dou,Zhi Jin,Wenpin Jiao,Haiyan Zhao,Yongqiang Zhao,Zhenwei Tao

http://arxiv.org/abs/2401.05695v1

Compressor summary: PLPF is a method to improve medical dialogue generation by integrating diagnostic logic into large language models using rule modeling, preference data generation, and preference alignment.


UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Jiaxin Guo,Minghan Wang,Xiaosong Qiao,Daimeng Wei,Hengchao Shang,Zongyao Li,Zhengzhe Yu,Yinglu Li,Chang Su,Min Zhang,Shimin Tao,Hao Yang

http://arxiv.org/abs/2401.05689v1

Compressor summary: UCorrect is an unsupervised method for correcting errors in automatic speech recognition (ASR) output without relying on specific training data or fine-tuning, achieving significant word error rate reduction and outperforming popular correction models.


Self Expanding Convolutional Neural Networks

Blaise Appolinary,Alex Deaconu,Sophia Yang

http://arxiv.org/abs/2401.05686v1

Compressor summary: The paper proposes a method to dynamically expand CNNs during training using an expansion score, which improves performance, reduces resource use, and is eco-friendly.


Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection

Weibo Jiang,Weihong Ren,Jiandong Tian,Liangqiong Qu,Zhiyong Wang,Honghai Liu

http://arxiv.org/abs/2401.05676v1

Compressor summary: The paper proposes a new method for human-object interaction detection that considers both self-triplet and cross-triplet dependencies, as well as leveraging the CLIP model to obtain interaction-aware features.


Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Seung Hyun Lee,Yinxiao Li,Junjie Ke,Innfarn Yoo,Han Zhang,Jiahui Yu,Qifei Wang,Fei Deng,Glenn Entis,Junfeng He,Gang Li,Sangpil Kim,Irfan Essa,Feng Yang

http://arxiv.org/abs/2401.05675v1

Compressor summary: Parrot is a new framework for text-to-image generation that uses reinforcement learning to automatically balance multiple quality rewards and improve the generated images.


ConcEPT: Concept-Enhanced Pre-Training for Language Models

Xintao Wang,Zhouhong Gu,Jiaqing Liang,Dakuan Lu,Yanghua Xiao,Wei Wang

http://arxiv.org/abs/2401.05669v1

Compressor summary: ConcEPT is a novel pre-training method for language models that infuses conceptual knowledge by predicting the concepts of entities in the context, improving performance on tasks like entity typing.


EsaCL: Efficient Continual Learning of Sparse Models

Weijieying Ren,Vasant G Honavar

http://arxiv.org/abs/2401.05667v1

Compressor summary: EsaCL is a method for efficient continual learning of sparse models that prunes redundant parameters without retraining and uses intelligent data selection to improve data efficiency.


Root Cause Analysis on Energy Efficiency with Transfer Entropy Flow

Jian Ma

http://arxiv.org/abs/2401.05664v1

Compressor summary: The paper proposes using transfer entropy to diagnose the root causes of low energy efficiency in industrial systems, and tests it on a dataset from a real compressed-air system.


Unveiling the Tapestry of Automated Essay Scoring: A Comprehensive Investigation of Accuracy, Fairness, and Generalizability

Kaixun Yang,Mladen Raković,Yuyang Li,Quanlong Guan,Dragan Gašević,Guanliang Chen

http://arxiv.org/abs/2401.05655v1

Compressor summary: The study investigates the relationship between AES model accuracy, fairness, and generalizability, finding that prompt-specific models perform better in accuracy but may have more bias towards students of different economic statuses compared to cross-prompt models, while traditional machine learning models with engineered features achieve higher accuracy and fairness than complex neural networks.


Towards Conversational Diagnostic AI

Tao Tu,Anil Palepu,Mike Schaekermann,Khaled Saab,Jan Freyberg,Ryutaro Tanno,Amy Wang,Brenna Li,Mohamed Amin,Nenad Tomasev,Shekoofeh Azizi,Karan Singhal,Yong Cheng,Le Hou,Albert Webson,Kavita Kulkarni,S Sara Mahdavi,Christopher Semturs,Juraj Gottweis,Joelle Barral,Katherine Chou,Greg S Corrado,Yossi Matias,Alan Karthikesalingam,Vivek Natarajan

http://arxiv.org/abs/2401.05654v1

Compressor summary: AMIE is an AI system that can have diagnostic dialogues with patients and performs better than primary care physicians in some aspects, but it is not yet ready for real-world use.


Quantifying Marketing Performance at Channel-Partner Level by Using Marketing Mix Modeling (MMM) and Shapley Value Regression

Sean Tang,Sriya Musunuru,Baoshi Zong,Brooks Thornton

http://arxiv.org/abs/2401.05653v1

Compressor summary: The paper shows how Shapley Value Regression can help measure individual partner's impact on marketing performance in financial services, without the need for complex and costly cooperative game theory testing.
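Shapley Value Regression itself is simple to sketch for a handful of channels: average each channel's marginal contribution to the regression R² over all orderings. The brute-force version below is factorial in the number of channels and purely illustrative.

```python
from itertools import permutations

import numpy as np
from sklearn.linear_model import LinearRegression

def shapley_r2(X, y, names):
    """Attribute a linear model's R^2 across channels via Shapley values.
    X: (n_obs, n_channels) spend matrix; y: outcome; names: channel names."""
    def r2(cols):
        if not cols:
            return 0.0
        model = LinearRegression().fit(X[:, cols], y)
        return model.score(X[:, cols], y)

    perms = list(permutations(range(len(names))))
    phi = dict.fromkeys(names, 0.0)
    for order in perms:
        cols = []
        for j in order:
            gain = -r2(cols)          # R^2 before adding channel j
            cols = cols + [j]
            gain += r2(cols)          # marginal contribution of j
            phi[names[j]] += gain / len(perms)
    return phi
```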


On Detecting Cherry-picking in News Coverage Using Large Language Models

Israa Jaradat,Haiqi Zhang,Chengkai Li

http://arxiv.org/abs/2401.05650v1

Compressor summary: Cherry is a novel approach that uses language models and multiple news sources to automatically detect cherry-picked statements in news articles by identifying missing important statements.


Masked Attribute Description Embedding for Cloth-Changing Person Re-identification

Chunlei Peng,Boyu Wang,Decheng Liu,Nannan Wang,Ruimin Hu,Xinbo Gao

http://arxiv.org/abs/2401.05646v1

Compressor summary: The paper proposes a method called MADE that uses attribute descriptions to enhance person re-identification across clothing changes, by masking and embedding them into the Transformer blocks of an image model.


MatSAM: Efficient Materials Microstructure Extraction via Visual Large Model

Changtai Li,Xu Han,Chao Yao,Xiaojuan Ban

http://arxiv.org/abs/2401.05638v1

Compressor summary: MatSAM is a general and efficient solution for microstructure extraction in microscopic images based on SAM, using point-based prompts generation to adapt to different materials and microscopy types.


Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach

Gang Wu,Junjun Jiang,Junpeng Jiang,Xianming Liu

http://arxiv.org/abs/2401.05633v1

Compressor summary: The ConvFormer-based Super-Resolution network (CFSR) is a lightweight method for image super-resolution that replaces the self-attention module with large kernel convolution and uses an edge-preserving feed-forward network to preserve high-frequency information.


Natural Language Processing for Dialects of a Language: A Survey

Aditya Joshi,Raj Dabre,Diptesh Kanojia,Zhuang Li,Haolan Zhan,Gholamreza Haffari,Doris Dippold

http://arxiv.org/abs/2401.05632v1

Compressor summary: The text surveys past research on natural language processing (NLP) for dialects, covering various tasks, languages, and methods, with a focus on improving the equity of language technologies.


Learning Performance-Oriented Control Barrier Functions Under Complex Safety Constraints and Limited Actuation

Shaoru Chen,Mahyar Fazlyab

http://arxiv.org/abs/2401.05629v1

Compressor summary: The paper proposes a self-supervised learning framework for finding control barrier functions that maximize safety and accommodate complex constraints in nonlinear control systems.


Face-GPS: A Comprehensive Technique for Quantifying Facial Muscle Dynamics in Videos

Juni Kim,Zhikang Dong,Pawel Polak

http://arxiv.org/abs/2401.05625v1

Compressor summary: The authors present a new method that uses geometry, smoothing, and spectral analysis to measure facial muscle activity from videos, which could have various applications in security, medicine, and emotion recognition.


The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models

Matthew Renze,Erhan Guven

http://arxiv.org/abs/2401.05618v1

Compressor summary: The paper introduces Concise Chain-of-Thought (CCoT) prompting, which reduces response length and per-token cost for GPT-3.5 and GPT-4 on MCQA tasks, but may impair math problem-solving for GPT-3.5.
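The intervention is purely prompt-level; a sketch of the contrast is below. The wording is our assumption, not the paper's exact prompts.

```python
# Hypothetical prompt templates contrasting standard and concise CoT.
standard_cot = (
    "Q: {question}\n"
    "Think step by step, showing your full reasoning, "
    "then state the final answer."
)
concise_cot = (
    "Q: {question}\n"
    "Think step by step, but keep each step brief, "
    "then state the final answer."
)
# Shorter responses cut per-token cost; the paper reports a math-accuracy
# penalty for GPT-3.5 under the concise variant.
```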


Graph Q-Learning for Combinatorial Optimization

Victoria M. Dax,Jiachen Li,Kevin Leahy,Mykel J. Kochenderfer

http://arxiv.org/abs/2401.05610v1

Compressor summary: The paper shows how Graph Neural Networks (GNNs) can be used to optimize discrete solutions in Combinatorial Optimization problems by learning policies through Q-Learning.


Scaling Laws for Forgetting When Fine-Tuning Large Language Models

Damjan Kalajdzievski

http://arxiv.org/abs/2401.05605v1

Compressor summary: Our study shows that LoRA, a PEFT strategy, causes significant catastrophic forgetting in LLMs and provides scaling laws for its relationship with performance and number of parameters.


REBUS: A Robust Evaluation Benchmark of Understanding Symbols

Andrew Gritsevskiy,Arjun Panickssery,Aaron Kirtland,Derik Kauffman,Hans Gundlach,Irina Gritsevskaya,Joe Cavanagh,Jonathan Chiang,Lydia La Roux,Michelle Hung

http://arxiv.org/abs/2401.05604v1

Compressor summary: The paper introduces a rebus puzzle benchmark to evaluate multimodal language models' performance, which requires various skills and shows current models' weaknesses in reasoning and explanation.


Nucleus subtype classification using inter-modality learning

Lucas W. Remedios,Shunxing Bao,Samuel W. Remedios,Ho Hin Lee,Leon Y. Cai,Thomas Li,Ruining Deng,Can Cui,Jia Li,Qi Liu,Ken S. Lau,Joseph T. Roland,Mary K. Washington,Lori A. Coburn,Keith T. Wilson,Yuankai Huo,Bennett A. Landman

http://arxiv.org/abs/2401.05602v1

Compressor summary: The paper proposes using inter-modality learning to classify more cell types on virtual H&E stains by synthesizing them from multiplexed immunofluorescence images and transferring labels from the latter.


POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation

Shilong Pan,Zhiliang Tian,Liang Ding,Zhen Huang,Zhihua Wen,Dongsheng Li

http://arxiv.org/abs/2401.05596v1

Compressor summary: The paper proposes a novel method called POMP that uses a dynamic graph of multiple auxiliary languages to improve unsupervised neural machine translation for low-resource languages by mitigating linguistic noise in large language models.