arxiv compressed, 2024-01-16

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-16, generated by the compressor, my personal LLM-based project.


Machine Translation Models are Zero-Shot Detectors of Translation Direction

Michelle Wastl,Jannis Vamvas,Rico Sennrich

http://arxiv.org/abs/2401.06769v1

Compressor summary: The authors propose an unsupervised method to detect the translation direction of parallel text using the simplification effect in machine translation, and achieve high accuracies for both NMT and human translations.
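
The core idea is simple enough to sketch. Below is a minimal, hypothetical illustration of direction detection by probability asymmetry: score the pair with an NMT model in both directions and pick the more probable one. The `nmt_logprob` wrapper is a stand-in, not the authors' code.

```python
# Sketch of translation-direction detection via probability asymmetry.
# `nmt_logprob` is a hypothetical wrapper around any seq2seq scoring call
# that returns log P(target | source) under an NMT model.
def detect_direction(text_a: str, text_b: str, nmt_logprob) -> str:
    """Return 'a->b' if translating a into b looks more probable."""
    forward = nmt_logprob(source=text_a, target=text_b)
    backward = nmt_logprob(source=text_b, target=text_a)
    return "a->b" if forward >= backward else "b->a"
```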


Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements

Anton Voronov,Lena Wolf,Max Ryabinin

http://arxiv.org/abs/2401.06766v1

Compressor summary: The prompt template format significantly affects large language models' in-context learning performance, and using Template Ensembles can improve their accuracy by aggregating predictions across different templates.
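
Since the summary only names the aggregation idea, here is a minimal sketch of template ensembling; the template strings and the `score_labels` stub are hypothetical stand-ins for a real LLM scoring call, not taken from the paper.

```python
from collections import defaultdict

TEMPLATES = [
    "Input: {x}\nLabel:",
    "Review: {x}\nSentiment:",
    "{x}\nThe answer is:",
]

def template_ensemble_predict(x, labels, score_labels):
    """Average label probabilities over prompt templates and take the argmax.

    `score_labels(prompt, labels)` is assumed to wrap an LLM call that
    returns a dict of P(label | prompt).
    """
    totals = defaultdict(float)
    for template in TEMPLATES:
        probs = score_labels(template.format(x=x), labels)
        for label, p in probs.items():
            totals[label] += p / len(TEMPLATES)
    return max(totals, key=totals.get)

# Toy usage with a dummy scorer (a real scorer would query the LLM):
dummy = lambda prompt, labels: {"positive": 0.6, "negative": 0.4}
print(template_ensemble_predict("great movie", ["positive", "negative"], dummy))
```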


Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery

Caleb Robinson,Isaac Corley,Anthony Ortiz,Rahul Dodhia,Juan M. Lavista Ferres,Peyman Najafirad

http://arxiv.org/abs/2401.06762v1

Compressor summary: The paper introduces a new benchmark dataset (Chesapeake Roads Spatial Context) for testing how well geospatial machine learning models understand spatial context over long distances, and shows that current models often fail at this task.


APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding

Mingdao Liu,Aohan Zeng,Bowen Wang,Peng Zhang,Jie Tang,Yuxiao Dong

http://arxiv.org/abs/2401.06761v1

Compressor summary: The paper introduces a parallel auto-regressive generation method that speeds up text generation by LLMs and reduces resource consumption.


Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies

Tom Kocmi,Vilém Zouhar,Christian Federmann,Matt Post

http://arxiv.org/abs/2401.06760v1

Compressor summary: The paper explores how different metrics measure human-noticeable differences in machine translation quality using a new dataset, and finds more stable results than p-values.


Synthetic Data Generation Framework, Dataset, and Efficient Deep Model for Pedestrian Intention Prediction

Muhammad Naveed Riaz,Maciej Wielgosz,Abel Garcia Romera,Antonio M. Lopez

http://arxiv.org/abs/2401.06757v1

Compressor summary: ARCANE generates diverse synthetic datasets for pedestrian intention prediction, complementing real-world data, and PedGNN is a fast and memory-efficient model for this task.


Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection

Muhammad Tayyab Zamir,Muhammad Asif Ayub,Asma Gul,Nasir Ahmad,Kashif Ahmad

http://arxiv.org/abs/2401.06752v1

Compressor summary: The paper presents a new framework that uses style analysis to detect authorship and changes in multi-authored documents, improving on existing methods with special characters and weight optimization.


The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

Peter Hase,Mohit Bansal,Peter Clark,Sarah Wiegreffe

http://arxiv.org/abs/2401.06751v1

Compressor summary: This paper shows that current language models can generalize well from easy to hard data using simple training methods, and suggests collecting easy data instead of hard data for better performance.


Using Natural Language Inference to Improve Persona Extraction from Dialogue in a New Domain

Alexandra DeLucia,Mengjie Zhao,Yoshinori Maeda,Makoto Yoda,Keiichi Yamada,Hiromi Wakaki

http://arxiv.org/abs/2401.06742v1

Compressor summary: The authors propose a natural language inference method for adapting a persona extraction model to new settings without needing new data or human annotation.


Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty

Kaitlyn Zhou,Jena D. Hwang,Xiang Ren,Maarten Sap

http://arxiv.org/abs/2401.06730v1

Compressor summary: The text discusses the challenges of AI language models expressing uncertainties in their responses, which can lead to overconfidence and safety risks for users who rely on them.


Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding

Bozhen Hu,Zelin Zang,Jun Xia,Lirong Wu,Cheng Tan,Stan Z. Li

http://arxiv.org/abs/2401.06727v1

Compressor summary: The paper introduces a new method for embedding graph data in a low-dimensional space that preserves topological structure and improves stability and quality, outperforming existing approaches on various tasks.


Reframing Tax Law Entailment as Analogical Reasoning

Xinrui Zou,Ming Zhang,Nathaniel Weir,Benjamin Van Durme,Nils Holzenberger

http://arxiv.org/abs/2401.06715v1

Compressor summary: The paper proposes re-framing statutory reasoning as an analogy task to increase dataset size and interpretability, and improves upon existing methods by combining retrieval and analogy models.


Few-Shot Detection of Machine-Generated Text using Style Representations

Rafael Rivera Soto,Kailin Koch,Aleem Khan,Barry Chen,Marcus Bishop,Nicholas Andrews

http://arxiv.org/abs/2401.06712v1

Compressor summary: The authors propose a method to distinguish human from machine-written text without relying on training data from potentially abusive language models, using style features that work across different models.


Model-Free Approximate Bayesian Learning for Large-Scale Conversion Funnel Optimization

Garud Iyengar,Raghav Singal

http://arxiv.org/abs/2401.06710v1

Compressor summary: Key points:
- The paper studies how to optimize personalized interventions for different consumer states in marketing campaigns.
- It proposes a novel algorithm that combines attribution-based decision-making and approximate Bayesian learning.
- It shows high accuracy, interpretability, scalability, and asymptotic optimality of the algorithm on a real-world email marketing dataset.
Summary: The paper introduces an optimal algorithm for personalized interventions in marketing campaigns that uses attribution-based decision-making and approximate Bayesian learning to learn from consumer interactions.


Reliability Analysis of Psychological Concept Extraction and Classification in User-penned Text

Muskan Garg,MSVPJ Sathvik,Amrit Chadha,Shaina Raza,Sunghwan Sohn

http://arxiv.org/abs/2401.06709v1

Compressor summary: The paper presents a new dataset for analyzing low self-esteem in Reddit posts and suggests that current models should focus more on textual cues related to self-esteem rather than triggers or consequences of mental disturbances.


Multi-Candidate Speculative Decoding

Sen Yang,Shujian Huang,Xinyu Dai,Jiajun Chen

http://arxiv.org/abs/2401.06706v1

Compressor summary: This paper improves speculative decoding by sampling multiple candidates and verifying them in batches, leading to higher acceptance rates for large language models.
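
To make the mechanism concrete, here is a greedy toy sketch of multi-candidate drafting and verification; the paper's batched, sampling-based verification is more involved, and the callables below are stand-ins.

```python
def speculative_step(prefix, draft_sample, target_argmax, k=4, n=5):
    """Sample k candidate continuations of length n from a draft model, then
    keep the candidate whose prefix the target model agrees with longest."""
    candidates = [draft_sample(prefix, n) for _ in range(k)]
    best = []
    for cand in candidates:  # verification could be batched in practice
        accepted = []
        for tok in cand:
            if target_argmax(prefix + accepted) == tok:
                accepted.append(tok)
            else:
                break
        if len(accepted) > len(best):
            best = accepted
    # Always make progress: fall back to one target-model token if nothing matched.
    return best or [target_argmax(prefix)]

# Toy usage with trivial stand-ins (a real setup would wrap two LLMs):
draft = lambda prefix, n: [len(prefix) + i for i in range(n)]
target = lambda prefix: len(prefix)
print(speculative_step([0, 1, 2], draft, target))  # -> [3, 4, 5, 6, 7]
```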


Scalable 3D Panoptic Segmentation With Superpoint Graph Clustering

Damien Robert,Hugo Raguet,Loic Landrieu

http://arxiv.org/abs/2401.06704v1

Compressor summary: SuperCluster is an efficient and scalable method for panoptic segmentation of large 3D point clouds using graph clustering and superpoint adaptation, achieving state-of-the-art results on multiple datasets while being much smaller and faster than previous methods.


A Closed-form Solution for Weight Optimization in Fully-connected Feed-forward Neural Networks

Slavisa Tomic,João Pedro Matos-Carvalho,Marko Beko

http://arxiv.org/abs/2401.06699v1

Compressor summary: The paper proposes a weight optimization method for neural networks using least squares that is faster and easier to implement than existing methods.
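
As a rough illustration of the least-squares idea (not the paper's exact formulation for multi-layer networks), fitting a layer's weights can be posed as one linear solve instead of iterative gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(200, 16))             # hidden activations (synthetic)
H_aug = np.hstack([H, np.ones((200, 1))])  # append a bias column
Y = rng.normal(size=(200, 3))              # regression targets

# Solve min_W ||H_aug @ W - Y||^2 in closed form.
W, *_ = np.linalg.lstsq(H_aug, Y, rcond=None)
print(((H_aug @ W - Y) ** 2).mean())       # training MSE
```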


An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

Gantavya Bhatt,Yifang Chen,Arnav M. Das,Jifan Zhang,Sang T. Truong,Stephen Mussmann,Yinglun Zhu,Jeffrey Bilmes,Simon S. Du,Kevin Jamieson,Jordan T. Ash,Robert D. Nowak

http://arxiv.org/abs/2401.06692v1

Compressor summary: The paper proposes using experimental design techniques to reduce the annotation cost and increase the label efficiency for supervised finetuning on instruction datasets for large language models.


Embedded Planogram Compliance Control System

M. Erkin Yücel,Serkan Topaloğlu,Cem Ünsalan

http://arxiv.org/abs/2401.06690v1

Compressor summary: The study proposes an embedded system using computer vision and deep learning techniques for planogram compliance control in retail, achieving high F1 scores and working stand-alone for up to two years with energy harvesting options.


Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation

Giorgos Vernikos,Andrei Popescu-Belis

http://arxiv.org/abs/2401.06688v1

Compressor summary: QE-fusion is a method that uses quality estimation metrics to improve neural machine translation, achieving better results than existing techniques and scaling linearly with candidate diversity.


Proximal Causal Inference With Text Data

Jacob M. Chen,Rohit Bhattacharya,Katherine A. Keith

http://arxiv.org/abs/2401.06687v1

Compressor summary: The paper proposes a new causal inference method using text data and zero-shot models to handle unobserved confounding variables, which is novel and has low bias.


How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

Yi Zeng,Hongpeng Lin,Jingwen Zhang,Diyi Yang,Ruoxi Jia,Weiyan Shi

http://arxiv.org/abs/2401.06373v1

Compressor summary: The paper proposes using social science research to generate persuasive prompts that can jailbreak large language models, highlighting the need for better defense mechanisms.


WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge

Wenbin Wang,Liang Ding,Li Shen,Yong Luo,Han Hu,Dacheng Tao

http://arxiv.org/abs/2401.06659v1

Compressor summary: WisdoM is a framework that uses large vision-language models to analyze images and text for sentiment analysis, incorporating contextual world knowledge and improving performance by +1.89 F1 score on average.


Decoupling Pixel Flipping and Occlusion Strategy for Consistent XAI Benchmarks

Stefan Blücher,Johanna Vielhaben,Nils Strodthoff

http://arxiv.org/abs/2401.06654v1

Compressor summary: The study proposes the R-OMS score to compare occlusion strategies in XAI and the SRG measure to resolve disagreements between MIF and LIF measures.


Block Majorization Minimization with Extrapolation and Application to $β$-NMF

Le Thi Khanh Hien,Valentin Leplat,Nicolas Gillis

http://arxiv.org/abs/2401.06646v1

Compressor summary: BMMe is a new method to solve multi-convex optimization problems faster by updating parameters adaptively and using extrapolation, which is shown to be efficient for nonnegative matrix factorization with different $\beta$-divergences.
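
For intuition, here is an illustrative sketch of extrapolated multiplicative updates for the Frobenius-norm case (β = 2); BMMe's actual majorizers and adaptive extrapolation parameters are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(50, 40)))   # nonnegative data matrix
r, eps, gamma = 5, 1e-12, 0.5           # rank, safeguard, extrapolation weight
W = np.abs(rng.normal(size=(50, r)))
H = np.abs(rng.normal(size=(r, 40)))
W_prev, H_prev = W.copy(), H.copy()

for it in range(100):
    # Extrapolate from the previous iterate, clipped to stay nonnegative.
    We = np.maximum(W + gamma * (W - W_prev), eps)
    He = np.maximum(H + gamma * (H - H_prev), eps)
    W_prev, H_prev = W.copy(), H.copy()
    # Multiplicative updates applied at the extrapolated point.
    H = He * (We.T @ X) / (We.T @ We @ He + eps)
    W = We * (X @ H.T) / (We @ H @ H.T + eps)

print(np.linalg.norm(X - W @ H))  # reconstruction error
```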


SeizNet: An AI-enabled Implantable Sensor Network System for Seizure Prediction

Ali Saeizadeh,Douglas Schonholtz,Daniel Uvaydov,Raffaele Guida,Emrecan Demirors,Pedram Johari,Jorge M. Jimenez,Joseph S. Neimat,Tommaso Melodia

http://arxiv.org/abs/2401.06644v1

Compressor summary: SeizNet is a closed-loop system that uses deep learning and implantable sensors to predict drug-resistant epileptic seizures with high accuracy and specificity, providing a better alternative to traditional treatments.


Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation

Jan Cegin,Branislav Pecher,Jakub Simko,Ivan Srba,Maria Bielikova,Peter Brusilovsky

http://arxiv.org/abs/2401.06643v1

Compressor summary: The study explores how using taboo words, hints from previous solutions, and chaining on previous solutions can increase text diversity in LLM-generated data and improve downstream models' performance.


Experimental Contexts Can Facilitate Robust Semantic Property Inference in Language Models, but Inconsistently

Kanishka Misra,Allyson Ettinger,Kyle Mahowald

http://arxiv.org/abs/2401.06640v1

Compressor summary: Experimental contexts can improve language models' performance in predicting semantic properties of novel concepts, but this ability is inconsistent and relies on controlling the input examples and instructions.


OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models

Shuai Wang,Liang Ding,Li Shen,Yong Luo,Bo Du,Dacheng Tao

http://arxiv.org/abs/2401.06628v1

Compressor summary: The text introduces an OOP-focused code generation benchmark and evaluation metric, pass@o, which reveals the limitations of current LLMs in OOP.


TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models

Yihong Liu,Chunlan Ma,Haotian Ye,Hinrich Schütze

http://arxiv.org/abs/2401.06620v1

Compressor summary: TransliCo is a framework that fine-tunes pretrained language models to improve crosslingual transfer by contrasting sentences with their transliterations in a unified script.


Identifying Policy Gradient Subspaces

Jan Schneider,Pierre Schumacher,Simon Guist,Le Chen,Daniel Häufle,Bernhard Schölkopf,Dieter Büchler

http://arxiv.org/abs/2401.06604v1

Compressor summary: This paper evaluates how exploiting low-dimensional gradient subspaces can improve the training efficiency of deep policy gradient methods in reinforcement learning.


Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study

Shangding Gu

http://arxiv.org/abs/2401.06603v1

Compressor summary: The study proposes a teacher-student learning framework where a large language model (LLM) acts as a teacher and a reinforcement learning (RL) model serves as a student, enabling both agents to improve each other through feedback and cooperation in challenging tasks.


Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

Pengfei Zhu,Qian Wang,Yu Wang,Jialu Li,Qinghua Hu

http://arxiv.org/abs/2401.06595v1

Compressor summary: The paper proposes a graph clustering method that adapts the weights of self-supervised learning tasks for different nodes and fuses their embeddings, achieving state-of-the-art results.


Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation

Seongyun Lee,Seungone Kim,Sue Hyun Park,Geewook Kim,Minjoon Seo

http://arxiv.org/abs/2401.06591v1

Compressor summary: The authors propose Prometheus-Vision, an evaluator model that uses a new feedback dataset to assess long-form responses generated by Vision-Language Models based on user-defined criteria.


Mapping Transformer Leveraged Embeddings for Cross-Lingual Document Representation

Tsegaye Misikir Tashu,Eduard-Raul Kontos,Matthia Sabatelli,Matias Valdenegro-Toro

http://arxiv.org/abs/2401.06583v1

Compressor summary: The study evaluates four multilingual transformer models in creating cross-lingual document representations to overcome limitations in recommending documents in different languages.


360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model

Qian Wang,Weiqi Li,Chong Mou,Xinhua Cheng,Jian Zhang

http://arxiv.org/abs/2401.06578v1

Compressor summary: The paper introduces a method to generate 360-degree panoramic videos from text prompts using a modified text-to-video diffusion model and a new panorama dataset called WEB360.


Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation

Xu Huang,Zhirui Zhang,Xiang Geng,Yichao Du,Jiajun Chen,Shujian Huang

http://arxiv.org/abs/2401.06568v1

Compressor summary: The study examines how large language models use source and reference information to evaluate translations, finding that reference information improves accuracy while source information sometimes hinders it, indicating a need for better cross-lingual capability in LLMs.


Resource-Efficient Gesture Recognition using Low-Resolution Thermal Camera via Spiking Neural Networks and Sparse Segmentation

Ali Safa,Wout Mommen,Lars Keuninckx

http://arxiv.org/abs/2401.06563v1

Compressor summary: The paper presents a low-cost hand gesture recognition system using a thermal sensor and a Spiking Neural Network, achieving high accuracy and outperforming deep learning approaches.


Intention Analysis Prompting Makes Large Language Models A Good Jailbreak Defender

Yuqi Zhang,Liang Ding,Lefei Zhang,Dacheng Tao

http://arxiv.org/abs/2401.06561v1

Compressor summary: The study proposes a defense strategy called Intention Analysis Prompting (IAPrompt) that reduces the harmfulness of large language models without compromising their helpfulness, by triggering their ability to self-correct and improve.


A General Benchmark Framework is Dynamic Graph Neural Network Need

Yusen Zhang

http://arxiv.org/abs/2401.06559v1

Compressor summary: This paper argues that a unified benchmark framework is needed to accurately evaluate dynamic graph learning models and improve their performance in various applications.


Treatment-Aware Hyperbolic Representation Learning for Causal Effect Estimation with Social Networks

Ziqiang Cui,Xing Tang,Yang Qiao,Bowei He,Liang Chen,Xiuqiang He,Chen Ma

http://arxiv.org/abs/2401.06557v1

Compressor summary: TAHyper is a novel method that uses hyperbolic space and treatment-aware relationship identification to estimate hidden confounders in social networks for individual treatment effect estimation.


Multimodal Learning for detecting urban functional zones using remote sensing image and multi-semantic information

Chuanji Shi,Yingying Zhang,Jiaotuan Wang,Qiqi Zhu

http://arxiv.org/abs/2401.06550v1

Compressor summary: The paper proposes an algorithm using multimodal deep learning to detect urban area-of-interest fence polygons from remote sensing images and multi-semantics data, improving accuracy for mobile Internet businesses.


Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning

Chenyang Wang,Junjun Jiang,Xingyu Hu,Xianming Liu,Xiangyang Ji

http://arxiv.org/abs/2401.06548v1

Compressor summary: Key points:
- Deep learning systems suffer from catastrophic forgetting when learning new tasks without access to old data.
- Data-free data replay methods invert samples from the classification model to reuse old data.
- Existing methods ignore the inconsistency between inverted and real data, which affects performance.
- The proposed CCIL method measures and reduces this inconsistency using a novel loss function and class weight regularization.
Summary: CCIL is a new method that mitigates catastrophic forgetting by measuring and reducing the inconsistency between samples inverted from the model and real data when replaying old data.


Optimizing Feature Selection for Binary Classification with Noisy Labels: A Genetic Algorithm Approach

Vandad Imani,Elaheh Moradi,Carlos Sevilla-Salcedo,Vittorio Fortino,Jussi Tohka

http://arxiv.org/abs/2401.06546v1

Compressor summary: The paper proposes NMFS-GA, a genetic algorithm to choose accurate and interpretable features for binary classification with noisy labels.
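
The summary names the overall recipe, so here is a generic sketch of GA-based feature selection under label noise (binary masks as individuals, cross-validated accuracy as fitness); NMFS-GA's specific operators and noise-robust criteria are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
y ^= rng.random(200) < 0.1  # inject 10% label noise

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=200)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((30, 20)) < 0.5  # population of binary feature masks
for gen in range(20):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]           # truncation selection
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(10, size=2)]
        child = np.where(rng.random(20) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(20) < 0.05                # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```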


Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook

Ziying Song,Lin Liu,Feiyang Jia,Yadan Luo,Guoxin Zhang,Lei Yang,Li Wang,Caiyan Jia

http://arxiv.org/abs/2401.06542v1

Compressor summary: The text discusses the importance of evaluating 3D object detection methods for autonomous driving in terms of accuracy, latency, and robustness against environmental variations and weather changes.


Medical Dialogue Generation via Intuitive-then-Analytical Differential Diagnosis

Kaishuai Xu,Wenjun Hou,Yi Cheng,Jian Wang,Wenjie Li

http://arxiv.org/abs/2401.06541v1

Compressor summary: The IADDx framework generates medical dialogues that include a comprehensive differential diagnosis using retrieval-based intuitive association and graph-enhanced analytic reasoning, helping both clinicians and patients understand the diagnostic process.


INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning

Yutao Zhu,Peitian Zhang,Chenghao Zhang,Yifei Chen,Binyu Xie,Zhicheng Dou,Zheng Liu,Ji-Rong Wen

http://arxiv.org/abs/2401.06532v1

Compressor summary: This paper introduces INTERS, a new dataset for instruction tuning to improve large language models' performance in information retrieval tasks.


PCB-Vision: A Multiscene RGB-Hyperspectral Benchmark Dataset of Printed Circuit Boards

Elias Arbash,Margret Fuchs,Behnood Rasti,Sandra Lorenz,Pedram Ghamisi,Richard Gloaguen

http://arxiv.org/abs/2401.06528v1

Compressor summary: The paper presents PCB-Vision, a dataset of RGB and hyperspectral images for analyzing electronic waste composition to improve recycling efficiency and align with the UN Sustainable Development Goals.


MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection

Paloma Piot,Patricia Martín-Rodilla,Javier Parapar

http://arxiv.org/abs/2401.06526v1

Compressor summary: The paper presents MetaHate, a meta-collection of datasets for studying hate speech, to help develop better models for combating it online.


Domain Adaptation for Time series Transformers using One-step fine-tuning

Subina Khanal,Seshu Tirupathi,Giulio Zizzo,Ambrish Rawat,Torben Bach Pedersen

http://arxiv.org/abs/2401.06524v1

Compressor summary: The paper proposes a one-step fine-tuning method that improves Transformers' time series prediction in domains with limited data by incorporating some source-domain data and fine-tuning with gradual unfreezing.
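
Gradual unfreezing itself is a standard technique; a minimal PyTorch-style sketch (my illustration, not the paper's schedule) trains only the head first, then unfreezes blocks top-down one stage at a time:

```python
import torch.nn as nn

def set_trainable_stage(blocks: nn.ModuleList, head: nn.Module, stage: int):
    """At stage 0 only the head trains; each stage unfreezes one more block,
    starting from the top of the encoder."""
    for p in head.parameters():
        p.requires_grad = True
    for i, block in enumerate(blocks):
        unfreeze = i >= len(blocks) - stage
        for p in block.parameters():
            p.requires_grad = unfreeze
```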


Exploring Diverse Representations for Open Set Recognition

Yu Wang,Junxian Mu,Pengfei Zhu,Qinghua Hu

http://arxiv.org/abs/2401.06521v1

Compressor summary: MEDAF is a discriminative model that learns diverse representations to improve open set recognition and outperforms generative models with less computational cost.


Personalized Reinforcement Learning with a Budget of Policies

Dmitry Ivanov,Omer Ben-Porat

http://arxiv.org/abs/2401.06514v1

Compressor summary: The paper proposes r-MDPs, a framework for balancing personalization and regulatory constraints in high-stakes fields like healthcare using deep reinforcement learning algorithms inspired by K-means clustering.


AntEval: Quantitatively Evaluating Informativeness and Expressiveness of Agent Social Interactions

Yuanzhi Liang,Linchao Zhu,Yi Yang

http://arxiv.org/abs/2401.06509v1

Compressor summary: The authors propose a virtual setting using tabletop role-playing games to foster complex, context-rich interactions among agents and introduce AntEval, a framework to evaluate the informativeness and expressiveness of these interactions using novel metrics.


Frequency Masking for Universal Deepfake Detection

Chandler Timm Doloriel,Ngai-Man Cheung

http://arxiv.org/abs/2401.06506v1

Compressor summary: The paper proposes a novel deepfake detector using frequency masking in the self-supervised pre-training phase, which outperforms existing methods.


Improving the Detection of Small Oriented Objects in Aerial Images

Chandler Timm C. Doloriel,Rhandley D. Cajote

http://arxiv.org/abs/2401.06503v1

Compressor summary: The paper proposes an Attention-Points Network that improves the detection of small oriented objects in aerial images using two losses: Guided-Attention Loss and Box-Points Loss.


An investigation of structures responsible for gender bias in BERT and DistilBERT

Thibaud Leteno,Antoine Gourru,Charlotte Laclau,Christophe Gravier

http://arxiv.org/abs/2401.06495v1

Compressor summary: The paper investigates gender bias in BERT and DistilBERT, finding that bias is uniformly encoded by every attention head except a few in underrepresented classes, and that distillation may increase bias.


Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation

Tianyu Zheng,Shuyue Guo,Xingwei Qu,Jiawei Guo,Weixu Zhang,Xinrun Du,Chenghua Lin,Wenhao Huang,Wenhu Chen,Jie Fu,Ge Zhang

http://arxiv.org/abs/2401.06477v1

Compressor summary: Key points:
- Kun is a novel approach to creating instruction-tuning datasets for LLMs without manual annotations.
- It uses self-training, back-translation, and answer polishing with diverse unlabelled data sources.
- It improves data retention and clarity and reduces manual annotation costs.
- It shows robustness and scalability on the Yi model across various benchmarks.
- It has implications for LLM applications in diverse fields.
Summary: Kun is a new method that generates instruction-tuning datasets for large language models using self-training, back-translation, and answer polishing, without manual annotations. It enhances data quality, reduces costs, and improves LLM performance across different tasks.


Self-supervised Learning of Dense Hierarchical Representations for Medical Image Segmentation

Eytan Kats,Jochen G. Hirsch,Mattias P. Heinrich

http://arxiv.org/abs/2401.06473v1

Compressor summary: The paper presents a self-supervised framework for learning voxel-wise coarse-to-fine representations that balances global and local features, improves downstream tasks with limited annotations, and outperforms baselines.


A Brain-inspired Computational Model for Human-like Concept Learning

Yuwei Wang,Yi Zeng

http://arxiv.org/abs/2401.06471v1

Compressor summary: The study develops a computational model for concept learning based on spiking neural networks that mimic human brain mechanisms and achieves human-like concept representations.


Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning

Kaiyi Zhang,Ang Lv,Yuhan Chen,Hansen Ha,Tao Xu,Rui Yan

http://arxiv.org/abs/2401.06469v1

Compressor summary: Batch-ICL is an efficient and order-agnostic inference algorithm for in-context learning that consistently outperforms most example sequences and reduces computational resources.


Adapting Large Language Models for Document-Level Machine Translation

Minghao Wu,Thuy-Trang Vu,Lizhen Qu,George Foster,Gholamreza Haffari

http://arxiv.org/abs/2401.06468v1

Compressor summary: This paper investigates how to adapt large language models for document-level machine translation, finding that some specialized models outperform GPT-4 while others struggle with off-target translations.


PersianMind: A Cross-Lingual Persian-English Large Language Model

Pedram Rostami,Ali Salemi,Mohammad Javad Dousti

http://arxiv.org/abs/2401.06466v1

Compressor summary: PersianMind is an open-source bilingual large language model that performs similarly to GPT-3.5-turbo in Persian by expanding LLaMa2's vocabulary and training it on a large Persian dataset.


Sanity Checks Revisited: An Exploration to Repair the Model Parameter Randomisation Test

Anna Hedström,Leander Weber,Sebastian Lapuschkin,Marina MC Höhne

http://arxiv.org/abs/2401.06465v1

Compressor summary: Smooth MPRT and Efficient MPRT are new adaptations of the Model Parameter Randomisation Test that address methodological caveats and improve reliability in eXplainable Artificial Intelligence evaluations.


AttributionScanner: A Visual Analytics System for Metadata-Free Data-Slicing Based Model Validation

Xiwei Xuan,Jorge Piazentin Ono,Liang Gou,Kwan-Liu Ma,Liu Ren

http://arxiv.org/abs/2401.06462v1

Compressor summary: AttributionScanner is a Visual Analytics system that helps evaluate machine learning models on images by finding interpretable subgroups with explainable features and enabling users to fix model issues using neural network regularization.


Automated Machine Learning for Positive-Unlabelled Learning

Jack D. Saunders,Alex A. Freitas

http://arxiv.org/abs/2401.06452v1

Compressor summary: The text introduces two new Automated Machine Learning systems for Positive-Unlabelled learning and evaluates them along with a previous system and other methods on various datasets.


BOK-VQA: Bilingual Outside Knowledge-based Visual Question Answering via Graph Representation Pretraining

Minjun Kim,Seungwoo Song,Youhan Lee,Haneol Jang,Kyungtae Lim

http://arxiv.org/abs/2401.06443v1

Compressor summary: The paper introduces a new bilingual dataset for visual question answering and proposes a framework to inject external knowledge into the system using graph embeddings.


RotationDrag: Point-based Image Editing with Rotated Diffusion Features

Minxing Luo,Wentao Cheng,Jian Yang

http://arxiv.org/abs/2401.06442v1

Compressor summary: This paper introduces RotationDrag, a novel point-based image editing method that improves in-plane rotation accuracy by using feature maps of rotated images, and presents RotateBench, the first benchmark to evaluate this task on real and generated images.


Improving Low-Light Image Recognition Performance Based on Image-adaptive Learnable Module

Seitaro Ono,Yuka Ogino,Takahiro Toizumi,Atsushi Ito,Masato Tsukada

http://arxiv.org/abs/2401.06438v1

Compressor summary: The study presents a method to improve image recognition in low-light conditions by applying an adaptive image processing module and forecasting optimal parameters for it.


Improving Graph Convolutional Networks with Transformer Layer in social-based items recommendation

Thi Linh Hoang,Tuan Dung Pham,Viet Cuong Ta

http://arxiv.org/abs/2401.06436v1

Compressor summary: The paper presents a transformer-based GCN model with an improved encoder for node embedding and attention mechanism, which outperforms GCN in predicting social network ratings.


From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape

Changrong Xiao,Wenxing Ma,Sean Xin Xu,Kunpeng Zhang,Yufang Wang,Qi Fu

http://arxiv.org/abs/2401.06431v1

Compressor summary: The study shows that large language models like GPT-4 and fine-tuned GPT-3.5 can significantly improve automated essay scoring for second-language learners by providing accurate, consistent, generalizable, and interpretable feedback, as well as assisting human graders to perform better.


Mutual Distillation Learning For Person Re-Identification

Huiyuan Fu,Kuilong Cui,Chuanming Wang,Mengshi Qi,Huadong Ma

http://arxiv.org/abs/2401.06430v1

Compressor summary: The paper proposes MDPR, a novel approach for person re-identification that extracts features from multiple perspectives using mutual distillation and fusion, achieving state-of-the-art results on widely used datasets.


UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer

Ji Liu,Dehua Tang,Yuanxian Huang,Li Zhang,Xiaocheng Zeng,Dong Li,Mingjie Lu,Jinzhang Peng,Yu Wang,Fan Jiang,Lu Tian,Ashish Sirasao

http://arxiv.org/abs/2401.06426v1

Compressor summary: Key points:
- Traditional channel-wise pruning methods struggle to prune efficient CNN models with depth-wise convolutions and inverted residual blocks.
- The paper proposes a novel depth pruning method that uses a block pruning strategy and progressive training for the subnet, and works on vision transformer models as well.
- The method outperforms existing depth pruning methods and achieves state-of-the-art results on efficiency and performance.
Summary: The paper presents a new depth pruning method that improves the efficiency and performance of efficient CNN models with depth-wise convolutions and inverted residual blocks, as well as vision transformer models.


Uncertainty quantification for probabilistic machine learning in earth observation using conformal prediction

Geethen Singh,Glenn Moncrieff,Zander Venter,Kerry Cawse-Nicholson,Jasper Slingsby,Tamara B Robinson

http://arxiv.org/abs/2401.06421v1

Compressor summary: Conformal prediction is a model-agnostic framework for uncertainty quantification that can improve the reliability of AI systems for Earth Observation applications without requiring access to the underlying model or training dataset.
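
Split conformal prediction is simple enough to show in full; this is the generic regression recipe on synthetic data, with `predict` standing in for any trained Earth-observation model:

```python
import numpy as np

rng = np.random.default_rng(0)
x_cal = rng.uniform(0, 10, 500)
y_cal = 2 * x_cal + rng.normal(0, 1, 500)
predict = lambda x: 2 * x  # stand-in for the black-box model

alpha = 0.1  # target 90% coverage
residuals = np.abs(y_cal - predict(x_cal))
# Finite-sample-corrected quantile of the calibration residuals.
level = np.ceil((1 - alpha) * (len(residuals) + 1)) / len(residuals)
q = np.quantile(residuals, level)

x_new = np.array([3.0, 7.5])
lower, upper = predict(x_new) - q, predict(x_new) + q
print(list(zip(lower, upper)))  # intervals with ~90% coverage
```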


Mission: Impossible Language Models

Julie Kallini,Isabel Papadimitriou,Richard Futrell,Kyle Mahowald,Christopher Potts

http://arxiv.org/abs/2401.06416v1

Compressor summary: The study tests GPT-2's ability to learn synthetic impossible languages and finds that it struggles, challenging the claim that LLMs can learn any possible language.


3D Reconstruction of Interacting Multi-Person in Clothing from a Single Image

Junuk Cha,Hansol Lee,Jaewon Kim,Nhat Nguyen Bao Truong,Jae Shin Yoon,Seungryul Baek

http://arxiv.org/abs/2401.06415v1

Compressor summary: The paper presents a novel pipeline to reconstruct 3D geometry of interacting people in clothing from a single image using priors for complete geometry and surface contacts, overcoming occlusion challenges.


AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

Li Lucy,Suchin Gururangan,Luca Soldaini,Emma Strubell,David Bamman,Lauren Klein,Jesse Dodge

http://arxiv.org/abs/2401.06408v1

Compressor summary: The authors examine how different quality and language identification filters affect web text pretraining data, revealing implicit biases in data curation based on social and geographic contexts.


Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review

Lingchao Mao,Hairong Wang,Leland S. Hu,Nhan L Tran,Peter D Canoll,Kristin R Swanson,Jing Li

http://arxiv.org/abs/2401.06406v1

Compressor summary: Key points:
- Machine learning can analyze multi-omics profiles and medical imaging for cancer diagnosis and prognosis.
- Machine learning models face challenges such as limited labeled samples, high-dimensional data types, heterogeneity, and interpretability.
- Knowledge-informed machine learning integrates biomedical knowledge into data-driven models to improve accuracy, robustness, and interpretability.
- The paper reviews different forms of knowledge representation and integration strategies for four primary data types.
- The paper discusses future directions for advancing cancer research through knowledge-informed machine learning.
Summary: The paper reviews how knowledge-informed machine learning, which integrates biomedical knowledge into data-driven models, can overcome challenges in cancer diagnosis and prognosis using multi-omics profiles and medical imaging.


Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model

Taehee Kim,Yeongjae Cho,Heejun Shin,Yohan Jo,Dongmyung Shin

http://arxiv.org/abs/2401.06400v1

Compressor summary: CoQAH is a new method that uses a sequence of QA interactions between a language model and a VQA model to answer human-written questions for images, achieving state-of-the-art accuracy without finetuning.


An approach for mistranslation removal from popular dataset for Indic MT Task

Sudhansu Bala Das,Leo Raphael Rodrigues,Tapas Kumar Mishra,Bidyut Kr. Patra

http://arxiv.org/abs/2401.06398v1

Compressor summary: The paper proposes an algorithm to remove mistranslations from a parallel dataset for Indian languages and evaluates its impact on neural machine translation quality.


UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding

Bowen Shi,Peisen Zhao,Zichen Wang,Yuhang Zhang,Yaoming Wang,Jin Li,Wenrui Dai,Junni Zou,Hongkai Xiong,Qi Tian,Xiaopeng Zhang

http://arxiv.org/abs/2401.06397v1

Compressor summary: UMG-CLIP is a new model that improves vision-language understanding by aligning local image regions with text tokens at different levels of detail, achieving state-of-the-art performance on various tasks.


ModaVerse: Efficiently Transforming Modalities with LLMs

Xinyu Wang,Bohan Zhuang,Qi Wu

http://arxiv.org/abs/2401.06395v1

Compressor summary: ModaVerse is a new multi-modal language model that can understand and transform images, videos, and audio using natural language without complex latent feature alignments, making it more efficient and cost-effective.


Adaptive Data Augmentation for Aspect Sentiment Quad Prediction

Wenyuan Zhang,Xinghua Zhang,Shiyao Cui,Kun Huang,Xuebin Wang,Tingwen Liu

http://arxiv.org/abs/2401.06394v1

Compressor summary: The paper proposes an Adaptive Data Augmentation framework to address data imbalance issues in aspect-based sentiment analysis by enhancing tail quad patterns and aspect categories.


SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization

Zhenlong Yuan,Jiakai Cao,Zhaoxin Li,Hao Jiang,Zhaoqi Wang

http://arxiv.org/abs/2401.06385v1

Compressor summary: SD-MVS is a new method that uses semantic segmentation, pixel deformation, and refinement techniques to reconstruct 3D models of textureless areas with high quality and efficiency.


Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic Programs

Matthew L. Daggitt,Wen Kokke,Robert Atkey,Natalia Slusarz,Luca Arnaboldi,Ekaterina Komendantskaya

http://arxiv.org/abs/2401.06379v1

Compressor summary: Vehicle is a tool that helps verify neural-symbolic programs by linking problem-space properties to embedding-space properties, enabling formal verification of a simple self-driving car example.


Cognitive BPM as an Equalizer: Improving Access and Efficiency for Employees with (and without) Cognitive Disabilities

Gordon Banks,Gates Bierhuizen,Katherine McCrum,Ellen Wengert

http://arxiv.org/abs/2401.06375v1

Compressor summary: ProcessGPT is an AI model that helps design better business processes considering human cognitive limitations, benefiting both people with and without disabilities.


SamLP: A Customized Segment Anything Model for License Plate Detection

Haoxuan Ding,Junyu Gao,Yuan Yuan,Qi Wang

http://arxiv.org/abs/2401.06374v1

Compressor summary: The paper introduces SamLP, a license plate detector based on a vision foundation model (SAM) that leverages few-shot and zero-shot learning for diverse LP styles and appearances.


Graph Relation Distillation for Efficient Biomedical Instance Segmentation

Xiaoyu Liu,Yueyi Zhang,Zhiwei Xiong,Wei Huang,Bo Hu,Xiaoyan Sun,Feng Wu

http://arxiv.org/abs/2401.06370v1

Compressor summary: The text proposes a graph relation distillation method for efficient biomedical instance segmentation that transfers knowledge from heavy to lightweight networks using instance and pixel relation graphs, achieving high performance with significantly reduced parameters and inference time.


An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation

Md Arafat Sultan,Aashka Trivedi,Parul Awasthy,Avirup Sil

http://arxiv.org/abs/2401.06356v1

Compressor summary: This study examines how different configuration parameters in knowledge distillation affect student performance, identifying an optimal configuration for various natural language processing tasks and student sizes.
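
For reference, the standard distillation objective whose knobs (temperature, loss weight) such studies sweep looks like this; the defaults below are illustrative, not the paper's recommended configuration:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL between temperature-scaled teacher and student.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale to keep gradient magnitude comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage:
s, t = torch.randn(4, 10), torch.randn(4, 10)
print(distillation_loss(s, t, torch.tensor([1, 2, 3, 4])))
```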


Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering

Chang Yu,Junran Peng,Xiangyu Zhu,Zhaoxiang Zhang,Qi Tian,Zhen Lei

http://arxiv.org/abs/2401.06345v1

Compressor summary: The paper proposes a method to improve diffusion models' image generation by learning proper textual descriptions using quality and semantic guidance from pre-trained models.


Hyper-STTN: Social Group-aware Spatial-Temporal Transformer Network for Human Trajectory Prediction with Hypergraph Reasoning

Weizheng Wang,Le Mao,Baijian Yang,Guohua Chen,Byung-Cheol Min

http://arxiv.org/abs/2401.06344v1

Compressor summary: Hyper-STTN is a hypergraph-based method for predicting crowd trajectories that captures both pair-wise and group-wise interactions using spectral convolution and multimodal transformers.


AffordanceLLM: Grounding Affordance from Vision Language Models

Shengyi Qian,Weifeng Chen,Min Bai,Xiong Zhou,Zhuowen Tu,Li Erran Li

http://arxiv.org/abs/2401.06341v1

Compressor summary: The paper proposes a model that leverages large-scale vision language models to improve affordance grounding tasks, achieving better performance on in-the-wild object affordance grounding and handling unseen objects and actions.


Application Of Vision-Language Models For Assessing Osteoarthritis Disease Severity

Banafshe Felfeliyan,Yuyue Zhou,Shrimanti Ghosh,Jessica Kupper,Shaobo Liu,Abhilash Hareendranathan,Jacob L. Jaremko

http://arxiv.org/abs/2401.06331v1

Compressor summary: The study explores using Vision Language Processing models to predict osteoarthritis severity from X-ray images and reports, potentially improving diagnosis and paving the way for specialized AI in medicine.


Learning from Semi-Factuals: A Debiased and Semantic-Aware Framework for Generalized Relation Discovery

Jiaxin Wang,Lingling Zhang,Jun Liu,Tianlin Guo,Wenjun Wu

http://arxiv.org/abs/2401.06327v1

Compressor summary: The paper introduces a new task called GRD that involves finding novel relations or clustering instances in existing relations using semi-factual examples and proposes a framework (SFGRD) that outperforms current models.


Multi-Task Learning for Front-End Text Processing in TTS

Wonjune Kang,Yun Wang,Shun Zhang,Arthur Hinsvark,Qing He

http://arxiv.org/abs/2401.06321v1

Compressor summary: The paper proposes a multi-task learning model for text-to-speech tasks that uses shared representations and pre-trained language embeddings to improve text normalization, part-of-speech tagging, and homograph disambiguation performance.


Striking a Balance in Fairness for Dynamic Systems Through Reinforcement Learning

Yaowei Hu,Jacob Lear,Lu Zhang

http://arxiv.org/abs/2401.06318v1

Compressor summary: The paper proposes an algorithmic framework to integrate fairness and reinforcement learning in dynamic systems with sequential decisions.


Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

Xingyu Zhou,Leheng Zhang,Xiaorui Zhao,Keze Wang,Leida Li,Shuhang Gu

http://arxiv.org/abs/2401.06312v1

Compressor summary: The paper proposes MIA-VSR, a feature-level masked processing framework for video super-resolution that reduces redundant computations and improves memory and computation efficiency.


Beyond the Surface: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation

Akshita Jha,Vinodkumar Prabhakaran,Remi Denton,Sarah Laszlo,Shachi Dave,Rida Qadri,Chandan K. Reddy,Sunipa Dev

http://arxiv.org/abs/2401.06310v1

Compressor summary: This study evaluates visual stereotypes in Text-to-Image models for 135 nationality-based identity groups and finds that they are often present, offensive, and similar across different attributes.


Misconfidence-based Demonstration Selection for LLM In-Context Learning

Shangqing Xu,Chao Zhang

http://arxiv.org/abs/2401.06301v1

Compressor summary: In-Context Reflection (ICR) is a method that selects and refines demonstrations for large language models to improve their learning efficiency and generalization across tasks.