This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-18 generated by the compressor, my personal LLM-based project.
Chung Min Kim,Mingxuan Wu,Justin Kerr,Ken Goldberg,Matthew Tancik,Angjoo Kanazawa
http://arxiv.org/abs/2401.09419v1
Compressor summary: GARField is a method to decompose 3D scenes into groups based on posed images, using scale and physical features to handle ambiguity and create multi-view consistent hierarchies.
Lianghui Zhu,Bencheng Liao,Qian Zhang,Xinlong Wang,Wenyu Liu,Xinggang Wang
http://arxiv.org/abs/2401.09417v1
Compressor summary: The paper proposes Vim, a new generic vision backbone based on bidirectional Mamba blocks, which improves efficiency and performance compared to existing vision transformers like DeiT for image classification, object detection, and semantic segmentation tasks.
Yu-Ying Yeh,Jia-Bin Huang,Changil Kim,Lei Xiao,Thu Nguyen-Phuoc,Numair Khan,Cheng Zhang,Manmohan Chandraker,Carl S Marshall,Zhao Dong,Zhengqin Li
http://arxiv.org/abs/2401.09416v1
Compressor summary: TextureDreamer is a novel image-guided texture synthesis method that can transfer detailed textures from real-world environments to any 3D object using only a few casually captured images, outperforming previous methods.
Shaobin Zhuang,Kunchang Li,Xinyuan Chen,Yaohui Wang,Ziwei Liu,Yu Qiao,Yali Wang
http://arxiv.org/abs/2401.09414v1
Compressor summary: Vlogger is an AI system that uses a large language model as director to generate minute-long vlogs with diverse scenes, by coordinating various foundation models for scripting, acting, videography, and voice acting.
Mazal Bethany,Brandon Wherry,Emet Bethany,Nishant Vishwamitra,Peyman Najafirad
http://arxiv.org/abs/2401.09407v1
Compressor summary: The paper introduces T5LLMCipher, a system that detects machine-generated text in real-world scenarios by combining a pretrained T5 encoder with LLM embedding sub-clustering, achieving better generalization and attribution than existing methods.
Pengfei Hong,Deepanway Ghosal,Navonil Majumder,Somak Aditya,Rada Mihalcea,Soujanya Poria
http://arxiv.org/abs/2401.09395v1
Compressor summary: The authors develop a method to create a dataset of perturbed math problems to test the limits of large language models' mathematical reasoning skills, and find that current models perform worse on these questions, indicating a lack of robustness in their abilities.
Luchuan Song,Pinxin Liu,Lele Chen,Celong Liu,Chenliang Xu
http://arxiv.org/abs/2401.09386v1
Compressor summary: The Tri$^2$-plane approach improves monocular photo-realistic volumetric head avatar reconstruction by using multiple tri-planes for details enhancement and a camera-based sliding window method for robustness.
Kevin Slote,Elaine Lee
http://arxiv.org/abs/2401.09376v1
Compressor summary: The authors propose a new method to estimate performance metrics in machine learning without ground truth, using latent classes and Gibbs sampling within the Hui-Walter paradigm.
Vladimir Vlasov
http://arxiv.org/abs/2401.09343v1
Compressor summary: A lightweight slot-labelling method outperforms heavy pretrained language models (PLMs) at finding important arguments in user turns, while requiring fewer resources and being more suitable for real-world applications.
Baoxiong Jia,Yixin Chen,Huangyue Yu,Yan Wang,Xuesong Niu,Tengyu Liu,Qing Li,Siyuan Huang
http://arxiv.org/abs/2401.09340v1
Compressor summary: The paper introduces SceneVerse, a large-scale 3D indoor scene dataset, and GPS, a pre-training framework for 3D vision-language learning that addresses challenges in grounding language in 3D scenes.
Meng Fang,Shilong Deng,Yudi Zhang,Zijing Shi,Ling Chen,Mykola Pechenizkiy,Jun Wang
http://arxiv.org/abs/2401.09334v1
Compressor summary: The paper explores using Large Language Models to enhance symbolic reasoning in text-based games, achieving high performance in various symbolic tasks.
Diana Davila Gordillo,Joan Timoneda,Sebastian Vallejo Vera
http://arxiv.org/abs/2401.09333v1
Compressor summary: Key points:
- The article presents a guideline to identify and classify different forms of racist discourse in large text corpora using XLM-R, a cross-lingual model.
- The approach considers the context and time of interest for each form of racism.
- The approach outperforms other methods and is illustrated with tweets about the Ecuadorian indígena community.

Summary: The article introduces a method to classify various forms of racist language in large text corpora using a cross-lingual model that adapts to different contexts and times, and shows its effectiveness on tweets about an Indigenous group.
Wanting Xu,Si'ao Zhang,Li Cui,Xin Peng,Laurent Kneip
http://arxiv.org/abs/2401.09331v1
Compressor summary: Key points:
- Event-based motion estimation is hard but possible for planar ground vehicles using the Ackermann steering model and a constrained non-holonomic motion model.
- Single-feature n-linearities are extended to quasi-time-continuous event tracks via Taylor expansions.
- Histogram voting is used for robust averaging over multiple event tracks.
- The algorithm achieves accurate and robust estimates comparable to frame-based sensors in normal conditions and outperforms traditional alternatives in challenging illumination scenarios.

Summary: The authors propose a method for event-based visual odometry on planar ground vehicles using Taylor expansions, histogram voting, and the Ackermann steering model, which performs similarly to or better than frame-based sensors.
Wanting Xu,Lan Hu,Manolis C. Tsakiris,Laurent Kneip
http://arxiv.org/abs/2401.09328v1
Compressor summary: The paper presents two contributions: showing that variable reordering affects accuracy in geometric vision problems and proposing a method to train a classifier for selecting the best solver using only the original coefficients.
Jia Jia,Geunho Lee,Zhibo Wang,Lyu Zhi,Yuchu He
http://arxiv.org/abs/2401.09325v1
Compressor summary: SMDNet (Siamese Meets Diffusion Network) is a new deep learning network that combines a Siamese network with a diffusion model to improve the accuracy and robustness of edge change detection in remote sensing images.
Haixin Wang,Jiaxin Li,Anubhav Dwivedi,Kentaro Hara,Tailin Wu
http://arxiv.org/abs/2401.09323v1
Compressor summary: Boundary-Embedded Neural Operators (BENO) is a novel neural operator architecture that solves elliptic PDEs with complex geometries and inhomogeneous boundary values more efficiently by using Graph Neural Networks (GNNs) and a Transformer encoder.
Wanting Xu,Xin Peng,Laurent Kneip
http://arxiv.org/abs/2401.09296v1
Compressor summary: The paper proposes a novel method for estimating camera velocities using a dynamic vision sensor and trifocal tensor geometry, improving stability and accuracy in highly dynamic scenarios.
Zhou Lu,Qiuyi Zhang,Xinyi Chen,Fred Zhang,David Woodruff,Elad Hazan
http://arxiv.org/abs/2401.09278v1
Compressor summary: The paper presents bandit algorithms that achieve near-optimal adaptive regret in fast-changing environments with minimal queries per round, and shows their effectiveness in various applications.
Konrad Heidler,Ingmar Nitze,Guido Grosse,Xiao Xiang Zhu
http://arxiv.org/abs/2401.09271v1
Compressor summary: The text describes a semi-supervised learning approach called PixelDINO that uses deep learning to detect retrogressive thaw slumps in Arctic permafrost using remote sensing data and improves generalization across the region.
Chuyu Zhang,Hui Ren,Xuming He
http://arxiv.org/abs/2401.09266v1
Compressor summary: The paper proposes a novel framework for deep imbalanced clustering that uses pseudo-labeling based on progressive optimal transport to handle imbalanced class distributions, and achieves good performance on various datasets.
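Optimal-transport pseudo-labeling is commonly computed with Sinkhorn iterations. The sketch below is a plain entropic-OT version with a fixed class marginal, assuming model scores arrive as a samples × clusters list and that the class marginal sums to 1; it does not implement the paper's progressive scheme, and the function name and parameters (`eps`, `n_iters`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sinkhorn_pseudo_labels(
    scores: Sequence[Sequence[float]],
    class_marginal: Sequence[float],
    eps: float = 0.1,
    n_iters: int = 100,
) -> Tuple[List[List[float]], List[int]]:
    """Entropic OT between samples (uniform mass) and clusters (given mass).

    Hypothetical helper: a generic Sinkhorn sketch, not the paper's method.
    """
    n, k = len(scores), len(class_marginal)
    # Gibbs kernel exp(score / eps); subtracting the row max keeps exp() stable
    # and only rescales rows, which the row scaling u absorbs.
    kernel = []
    for row in scores:
        m = max(row)
        kernel.append([math.exp((s - m) / eps) for s in row])
    r = [1.0 / n] * n  # each sample carries equal mass
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Rescale rows to match the sample marginal, then columns to match the class marginal.
        u = [r[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [class_marginal[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    # Hard pseudo-label: the cluster that receives most of each sample's mass.
    labels = [max(range(k), key=lambda j: plan[i][j]) for i in range(n)]
    return plan, labels


plan, labels = sinkhorn_pseudo_labels(
    [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]], class_marginal=[0.5, 0.5]
)
print(labels)
```

The class marginal is where a prior over cluster sizes enters: a uniform marginal forces balanced clusters, which is exactly the assumption an imbalanced-clustering method has to relax.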
Zongjiang Shang,Ling Chen
http://arxiv.org/abs/2401.09261v1
Compressor summary: MSHyper is a framework for modeling high-order interactions between temporal patterns of different scales for long-range time series forecasting, achieving state-of-the-art results.
Feiyang Ye,Baijiong Lin,Xiaofeng Cao,Yu Zhang,Ivor Tsang
http://arxiv.org/abs/2401.09257v1
Compressor summary: The paper presents FORUM, a novel first-order multi-gradient method for solving Multi-Objective Bi-Level Optimization (MOBLO) problems efficiently and effectively in different learning scenarios.
Thiago Lopes Trugillo da Silveira,Paulo Gamarra Lessa Pinto,Jeffri Erwin Murrugarra Llerena,Claudio Rosito Jung
http://arxiv.org/abs/2401.09252v1
Compressor summary: This paper reviews various methods for estimating 3D scene geometry from omnidirectional images, covering monocular, stereo, and multi-view approaches.
Loay Mualem,Murad Tukan,Moran Feldman
http://arxiv.org/abs/2401.09251v1
Compressor summary: This paper proposes new algorithms for non-convex optimization that can smoothly interpolate between different types of constraints, unlike previous methods that relied on the minimum $\ell_\infty$ norm.
Dominic Petrak,Thy Thy Tran,Iryna Gurevych
http://arxiv.org/abs/2401.09248v1
Compressor summary: The authors introduce FEDI, a dialogue dataset for task-oriented document-grounded dialogues with demographic information, user emotions and implicit feedback annotations, which improves system performance and user acceptance.
Jan Küchler,Daniel Kröll,Sebastian Schoenen,Andreas Witte
http://arxiv.org/abs/2401.09245v1
Compressor summary: The paper proposes a meta-classification model to improve the reliability and quality of car body part segmentation in motor claims handling using low-quality photos.
Aiqi Jiang,Arkaitz Zubiaga
http://arxiv.org/abs/2401.09244v1
Compressor summary: This paper explores cross-lingual transfer learning techniques for detecting offensive language in social media, analysing 67 papers and summarising three main transfer approaches.
Mikel De Iturrate Reyzabal,Mingcong Chen,Wei Huang,Sebastien Ourselin,Hongbin Liu
http://arxiv.org/abs/2401.09239v1
Compressor summary: Key points:
- The paper introduces a new dataset and a pipeline to train deep neural models for contact force prediction in minimally invasive robotic surgery (MIRS).
- It compares single-dataset training with dataset mixing and shows that mixing improves translation across different domains.
- It also shows that increasing data volume boosts the performance of transformers for force estimation.

Summary: The paper presents a new vision-haptic dataset and a method to mix different datasets for contact force prediction in MIRS, which improves generalization and performance compared to single-dataset training.
Jan Rathjens,Laurenz Wiskott
http://arxiv.org/abs/2401.09237v1
Compressor summary: The study analyzes how classification and reconstruction processes interact in deep learning architectures inspired by predictive coding and finds a trade-off effect between them in shared latent layers, which can be alleviated by expanding dimensions or increasing network complexity.
Marco Pacini,Xiaowen Dong,Bruno Lepri,Gabriele Santin
http://arxiv.org/abs/2401.09235v1
Compressor summary: The paper presents a theorem on equivariant neural networks, discusses their practical relevance, and characterizes different types of such networks.
Jiawei Wang,Shunchi Zhang,Kai Hu,Chixiang Ma,Zhuoyao Zhong,Lei Sun,Qiang Huo
http://arxiv.org/abs/2401.09232v1
Compressor summary: The text introduces a new graph generation framework for detecting contextual text blocks in natural scenes, using DQ-DETR for node detection and DRFormer for edge generation.
Kai Hu,Jiawei Wang,Weihong Lin,Zhuoyao Zhong,Lei Sun,Qiang Huo
http://arxiv.org/abs/2401.09220v1
Compressor summary: UniVIE is a new model for Visual Information Extraction from form-like documents that uses a unified relation prediction approach to handle hierarchical structures.
Janny Steeven,Nadri Madiha,Digne Julie,Wolf Christian
http://arxiv.org/abs/2401.09198v1
Compressor summary: The authors propose a data-driven method for predicting fluid dynamics in a continuous spatial and temporal domain using recurrent GNNs and spatio-temporal attention, achieving better results than existing approaches.
Jiaqi Guo,Sitong Su,Junchen Zhu,Lianli Gao,Jingkuan Song
http://arxiv.org/abs/2401.09195v1
Compressor summary: The paper proposes a training-free method using a diffusion model to create harmonious video compositions with different foregrounds and backgrounds, handling various semantic disparities.
Alessandro Bicciato,Luca Cosmo,Giorgia Minello,Luca Rossi,Andrea Torsello
http://arxiv.org/abs/2401.09193v1
Compressor summary: Key points:
- A new graph neural network architecture based on analysing local feature distributions.
- A histogram intersection kernel for similarity measurement.
- Outperforms graph kernels and graph neural networks on standard benchmarks.

Summary: The paper introduces a novel graph neural network that uses a histogram intersection kernel to compare feature distributions locally, and shows its superior performance on graph classification and regression tasks.
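For reference, the histogram intersection kernel between two histograms is the sum of their bin-wise minima, $K(h, h') = \sum_b \min(h_b, h'_b)$. The Python sketch below bins hypothetical scalar neighbor features with fixed bin edges and compares two neighborhoods; the binning scheme, the normalization, and the helper names are assumptions for illustration, not the paper's architecture.

```python
from typing import List, Sequence


def neighborhood_histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Normalized histogram of scalar neighbor features over [lo, hi] (hypothetical helper)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi (or out-of-range values) fall into the edge bins.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * bins


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection kernel: sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Two nodes whose neighbors carry scalar features in [0, 1].
h_u = neighborhood_histogram([0.1, 0.2, 0.9], bins=2, lo=0.0, hi=1.0)  # [2/3, 1/3]
h_v = neighborhood_histogram([0.3, 0.7, 0.8], bins=2, lo=0.0, hi=1.0)  # [1/3, 2/3]
print(histogram_intersection(h_u, h_v))
```

With normalized histograms, identical neighborhoods score 1 and neighborhoods with no shared bins score 0, which is what makes the kernel usable as a bounded similarity.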
Yu Pan,Ye Yuan,Yichun Yin,Jiaxin Shi,Zenglin Xu,Ming Zhang,Lifeng Shang,Xin Jiang,Qun Liu
http://arxiv.org/abs/2401.09192v1
Compressor summary: Apollo is a novel method to train deep Transformers faster by learning high-layer functionality during low-layer training, with techniques like low-value-prioritized sampling and weight sharing.
Nicolas Garcia Trillos,Matt Jacobs,Jakwang Kim,Matthew Werenski
http://arxiv.org/abs/2401.09191v1
Compressor summary: The paper proposes computationally efficient algorithms for adversarial training in multiclass classification using multimarginal optimal transport, and shows their effectiveness on image datasets.
Walid Brahmi,Imen Jdey,Fadoua Drira
http://arxiv.org/abs/2401.09190v1
Compressor summary: This paper summarizes existing studies on using deep learning techniques for dental imaging analysis to improve diagnostic accuracy and early detection of oral health issues.
Junhao Zheng,Qianli Ma,Zhen Liu,Binquan Wu,Huawen Feng
http://arxiv.org/abs/2401.09181v1
Compressor summary: Fwd-Prompt is a prompt-based method that reduces catastrophic forgetting and negative forward transfer in multimodal large language models by projecting gradients to minimize interference between tasks and reuse pre-trained knowledge.
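Projecting gradients to limit interference is often done by removing the gradient's component inside a protected subspace (for example, one spanned by directions important to earlier tasks). The Python sketch below shows only that generic orthogonal projection, with a Gram-Schmidt helper to build the basis; how Fwd-Prompt chooses its subspaces, including the one used to reuse pre-trained knowledge, is not reproduced, and all names and values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: Sequence[Sequence[float]], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis for the span of `vectors` (hypothetical helper)."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors that are already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Remove the components of `grad` lying in span(basis); `basis` must be orthonormal."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


# Directions assumed to matter for an earlier task (illustrative values).
protected = orthonormal_basis([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(project_out([2.0, 3.0, 4.0], protected))  # [0.0, 0.0, 4.0]
```

An update along the projected gradient leaves the loss unchanged to first order along the protected directions, which is the intuition behind reducing interference between tasks.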
Almudévar Antonio,Mariotte Théo,Ortega Alfonso,Tahon Marie
http://arxiv.org/abs/2401.09180v1
Compressor summary: Key points:
- The paper proposes a method for unsupervised translation between domains using a modified Variational Autoencoder with two disentangled latent variables.
- One latent variable is controlled by the domain, and the other one by other factors of the data.
- The approach outperforms existing methods on different vision datasets and allows better control and understanding of the latent space.

Summary: The paper presents a new Variational Autoencoder with two disentangled latent variables for unsupervised translation between domains, which improves performance and latent space interpretation over existing methods.
Liye Chen,Biaoshun Li,Yihao Chen,Mujie Lin,Shipeng Zhang,Chenxin Li,Yu Pang,Ling Wang
http://arxiv.org/abs/2401.09176v1
Compressor summary: Key points:
- ADCNet is a deep learning framework to help design potential antibody-drug conjugates (ADCs).
- It integrates protein and small-molecule representation learning models.
- It predicts activity based on antigen, antibody, linker, payload, and drug-to-antibody ratio (DAR) features.
- It outperforms baseline machine learning models and shows stability and robustness.
- It has an online platform and open-source code.

Summary: ADCNet is a deep learning framework that uses protein and small-molecule language models to predict the activity of antibody-drug conjugates based on their structures and features, and provides an online platform and open-source code.
Kunpeng Guo,Clement Defretiere,Dennis Diefenbach,Christophe Gravier,Antoine Gourru
http://arxiv.org/abs/2401.09175v1
Compressor summary: The text discusses using Question Answering technologies for website search functionality, combining knowledge graphs and free text, and evaluates their benefits and drawbacks using case studies from Wikimedia Foundation websites.
Kunpeng Guo,Dennis Diefenbach,Antoine Gourru,Christophe Gravier
http://arxiv.org/abs/2401.09168v1
Compressor summary: The paper proposes a better strategy for fine-tuning QA models using pre-trained language models, which significantly improves performance under low annotation budgets without extra labeling effort.
Feng Jiang,Kuang Wang,Haizhou Li
http://arxiv.org/abs/2401.09150v1
Compressor summary: MMAPIS is an open-source system that uses LLMs to preprocess, summarize, and present scientific papers effectively across various modalities and scenarios.
Hexiang Wang,Fengqi Liu,Qianyu Zhou,Ran Yi,Xin Tan,Lizhuang Ma
http://arxiv.org/abs/2401.09146v1
Compressor summary: The paper proposes a new method to animate images using expressive transformations based on keypoints, and improves the semantic consistency and structure alignment between source and driving images.
Haowen Wang,Zhen Zhao,Zhao Jin,Zhengping Che,Liang Qiao,Yakun Huang,Zhipeng Fan,Xiuquan Qiao,Jian Tang
http://arxiv.org/abs/2401.09133v1
Compressor summary: Key points:
- The paper proposes a self-supervised method (SM$^3$) for modeling articulated objects from multi-view RGB images before and after interaction.
- SM$^3$ reconstructs 3D geometries and textures, identifies movable parts, and infers joint parameters without annotations.
- The paper introduces the MMArt dataset, which covers diverse categories of articulated objects with multi-view and multi-modal data.

Summary: The paper presents SM$^3$, a self-supervised method that models articulated objects from multi-view images and reconstructs their 3D structures and joints without annotations, and the MMArt dataset, which supports this approach.
Benjamin Ummenhofer,Sanskar Agrawal,Rene Sepulveda,Yixing Lao,Kai Zhang,Tianhang Cheng,Stephan Richter,Shenlong Wang,German Ros
http://arxiv.org/abs/2401.09126v1
Compressor summary: The paper introduces a real-world dataset for measuring object reconstruction and rendering quality under different lighting conditions, showing that novel view synthesis is not a reliable proxy for performance evaluation.
Junfu Wang,Yuanfang Guo,Liang Yang,Yunhong Wang
http://arxiv.org/abs/2401.09125v1
Compressor summary: The paper studies how different heterophily patterns affect Graph Neural Networks, proposes Heterophilous Stochastic Block Models to incorporate graph convolution operations, and provides theoretical insights on separability gains for various heterophily patterns and node degrees.
Shuo Wang,Fan Jia,Yingfei Liu,Yucheng Zhao,Zehui Chen,Tiancai Wang,Chi Zhang,Xiangyu Zhang,Feng Zhao
http://arxiv.org/abs/2401.09112v1
Compressor summary: The paper proposes a novel method, Stream Query Denoising (SQD), for improving temporal modeling in high-definition map construction for autonomous driving by denoising stream queries.
Johannes Theodoridis,Jessica Hofmann,Johannes Maucher,Andreas Schilling
http://arxiv.org/abs/2401.09109v1
Compressor summary: This study evaluates the robustness of 20 instance segmentation models to out-of-distribution texture and identifies YOLACT++, SOTR, and SOLOv2 as the most robust frameworks.
Haowen Hou,F. Richard Yu
http://arxiv.org/abs/2401.09093v1
Compressor summary: The authors propose an efficient RNN-based model, RWKV-TS, for time series tasks with three features: low time complexity and memory usage, enhanced long-term sequence information capture, and high computational efficiency.
Ludan Ruan,Lei Tian,Chuanwei Huang,Xu Zhang,Xinyan Xiao
http://arxiv.org/abs/2401.09084v1
Compressor summary: Key points:
- The paper proposes a Unified-modal Video Generation system that handles multiple tasks across text and image modalities.
- It classifies video generation tasks into high-freedom and low-freedom categories, and uses different methods for each category to generate videos with better alignment and preservation of input conditions.
- It outperforms existing methods in FVD and human evaluations, and is comparable to a closed-source method.

Summary: The paper presents a system that generates videos from text and image inputs in various tasks, using different techniques for high-freedom and low-freedom video generation to improve quality and alignment.
Haonan Guo,Xin Su,Chen Wu,Bo Du,Liangpei Zhang,Deren Li
http://arxiv.org/abs/2401.09083v1
Compressor summary: Remote Sensing ChatGPT is an AI agent that uses a large language model to understand user requests and execute tasks involving remote sensing images, making interpretation accessible to non-experts.
Lize Alberts,Geoff Keeling,Amanda McCroskery
http://arxiv.org/abs/2401.09082v1
Compressor summary: The text discusses the importance of considering relational and situational factors in ensuring ethical behavior of dialogue agents based on large language models, focusing on how they can be treated as good social actors.
Emanuele La Malfa,Christoph Weinhuber,Orazio Torre,Fangru Lin,Anthony Cohn,Nigel Shadbolt,Michael Wooldridge
http://arxiv.org/abs/2401.09074v1
Compressor summary: The authors study how well Large Language Models can execute computer code and find that they struggle with complex programs, suggesting a new prompting method to improve their performance.
Zhirui Chen,P. N. Karthik,Yeow Meng Chee,Vincent Y. F. Tan
http://arxiv.org/abs/2401.09073v1
Compressor summary: The paper proposes a policy for identifying the best arm in linear bandits with a fixed budget and differential privacy constraints, and analyzes its performance in terms of error probability bounds.
Jingwei Guo,Kaizhu Huang,Xinping Yi,Zixian Su,Rui Zhang
http://arxiv.org/abs/2401.09071v1
Compressor summary: This paper connects spectral filtering in Graph Neural Networks (GNNs) to spatial aggregation, showing their interpretability and proposing a new Spatially Adaptive Filtering (SAF) method that improves node classification performance.
Qinghua Huang,Yongzhen Wang
http://arxiv.org/abs/2401.09070v1
Compressor summary: This paper introduces a new inference method for knowledge graphs that uses high-level pyramidal knowledge to improve generalization and robustness in reasoning tasks.
Lixiang Han,Zhen Xiao,Zhenjiang Li
http://arxiv.org/abs/2401.09068v1
Compressor summary: DTMM is a library that efficiently deploys and executes pruned machine learning models on weak IoT devices like MCUs, using techniques for model compression, optimization, acceleration, and storage.
Depeng Li,Tianqi Wang,Junwei Chen,Qining Ren,Kenji Kawaguchi,Zhigang Zeng
http://arxiv.org/abs/2401.09067v1
Compressor summary: The paper proposes a method to reduce catastrophic forgetting in deep neural networks by overwriting layer-wise parameters and adapting decision boundaries between tasks without using extra memory or privacy-sensitive techniques.
Yunze Liu,Changxi Chen,Zifan Wang,Li Yi
http://arxiv.org/abs/2401.09057v1
Compressor summary: The paper presents CrossVideo, a self-supervised method for point cloud video understanding using cross-modal contrastive learning between point clouds and images.
Zike Wu,Pan Zhou,Xuanyu Yi,Xiaoding Yuan,Hanwang Zhang
http://arxiv.org/abs/2401.09050v1
Compressor summary: The paper proposes a method called Consistent3D that uses ordinary differential equations to generate more accurate and consistent 3D models from text, improving on existing text-to-3D methods.
Raphael van Kempen,Tim Rehbronn,Abin Jose,Johannes Stegmaier,Bastian Lampe,Timo Woopen,Lutz Eckstein
http://arxiv.org/abs/2401.09049v1
Compressor summary: The text investigates how to improve lidar-based object detection in adverse weather by processing sequential data samples from lidar sensors using various neural network architectures, including a novel temporal offset augmentation strategy.
Jonghyun Lee,Hansam Cho,Youngjoon Yoo,Seoung Bum Kim,Yonghyun Jeong
http://arxiv.org/abs/2401.09048v1
Compressor summary: The paper introduces a method to generate images by controlling the placement of objects at different depths and applying global style from multiple examples.
Haoxin Chen,Yong Zhang,Xiaodong Cun,Menghan Xia,Xintao Wang,Chao Weng,Ying Shan
http://arxiv.org/abs/2401.09047v1
Compressor summary: The paper proposes a method to train video models using low-quality videos and synthesized high-quality images, resulting in better quality videos without motion degradation.
Zhiming Li,Yushi Cao,Xiufeng Xu,Junzhe Jiang,Xu Liu,Yon Shin Teo,Shang-wei Lin,Yang Liu
http://arxiv.org/abs/2401.09042v1
Compressor summary: The text evaluates large language models' reasoning ability on an inductive logic programming benchmark, finding them to have poor performance and generalization compared to smaller neural program induction systems.
Kittipitch Kuptavanich,Ehud Reiter,Kees Van Deemter,Advaith Siddharthan
http://arxiv.org/abs/2401.09041v1
Compressor summary: The paper presents and tests a rule-based method to create brief descriptions of groups of academic papers or consumer products.
Tong Xie,Haoyu Li,Andrew Bai,Cho-Jui Hsieh
http://arxiv.org/abs/2401.09031v1
Compressor summary: Diffusion-TracIn and Diffusion-ReTrac are data attribution methods that help explain how diffusion models learn by tracing model behavior back to training data, with Diffusion-ReTrac mitigating bias in influence estimation.
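As its name suggests, Diffusion-TracIn builds on TracIn, whose checkpoint form scores a training example $z$ against a test example $z'$ by summing learning-rate-weighted gradient dot products over saved checkpoints, $\mathrm{TracIn}(z, z') = \sum_k \eta_k \, \nabla_\theta \ell(\theta_k, z) \cdot \nabla_\theta \ell(\theta_k, z')$. The Python sketch below computes only this vanilla score from precomputed per-checkpoint gradients; the diffusion-specific handling and the bias correction in Diffusion-ReTrac are not reproduced, and the function name and gradient values are hypothetical.

```python
from typing import Sequence


def tracin_score(
    train_grads: Sequence[Sequence[float]],
    test_grads: Sequence[Sequence[float]],
    learning_rates: Sequence[float],
) -> float:
    """Vanilla TracIn (checkpoint form): sum_k lr_k * <grad_k(train), grad_k(test)>."""
    if not (len(train_grads) == len(test_grads) == len(learning_rates)):
        raise ValueError("need one gradient pair and one learning rate per checkpoint")
    score = 0.0
    for g_train, g_test, lr in zip(train_grads, test_grads, learning_rates):
        score += lr * sum(a * b for a, b in zip(g_train, g_test))
    return score


# Two checkpoints of a 2-parameter model, illustrative gradient values.
print(tracin_score([[1.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [0.0, 1.0]], [0.1, 0.5]))
```

A positive score marks a training example whose gradient steps also tended to lower the test loss; a score near zero means its gradients were roughly orthogonal to the test example's.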
Dunyuan Xu,Xi Wang,Jinyue Cai,Pheng-Ann Heng
http://arxiv.org/abs/2401.09029v1
Compressor summary: The text proposes a novel method for automated brain tumor diagnosis using multiple MRI modalities and dual attention to capture semantic interdependencies, which can improve accuracy and reduce noise.
Krishanu Maity,Prince Jha,Raghav Jain,Sriparna Saha,Pushpak Bhattacharyya
http://arxiv.org/abs/2401.09023v1
Compressor summary: The authors propose a new multi-task model for detecting cyberbullying in code-mixed languages, which can also explain why a post is considered bullying.
Jianing Li,Vardan Papyan
http://arxiv.org/abs/2401.09018v1
Compressor summary: The paper investigates how ResNets work by analyzing their residual blocks and identifies Residual Alignment, a process that leads to good generalization across various settings.
Udesh Habaraduwa
http://arxiv.org/abs/2401.09011v1
Compressor summary: The paper emphasizes the need for more transparent and explainable artificial neural networks that provide insights and not just predictions.
Sulthan Rafif,Mochamad Arfan Ravy Wahyu Pratama,Mohammad Faris Azhar,Ahmad Mustafidul Ibad,Lailil Muflikhah,Novanto Yudistira
http://arxiv.org/abs/2401.09008v1
Compressor summary: The paper proposes a CNN model with learnable stride and spectral pooling techniques to improve image classification accuracy by preserving more information.
Xingming Long,Shiguang Shan,Jie Zhang
http://arxiv.org/abs/2401.09006v1
Compressor summary: The paper introduces AG-FAS, a method that leverages real faces to improve face anti-spoofing models by generating "real" versions of input faces using a De-spoofing Face Generator and then extracting anomalous cues with a cross-attention transformer.
Haoxiong Liu,Andrew Chi-Chih Yao
http://arxiv.org/abs/2401.09003v1
Compressor summary: The authors introduce MMIQC, a dataset that improves mathematical reasoning skills of large language models by combining web data and synthetic question-response pairs, leading to higher accuracy on math problems.
Dong shu,Mingyu Jin,Suiyuan Zhu,Beichen Wang,Zihao Zhou,Chong Zhang,Yongfeng Zhang
http://arxiv.org/abs/2401.09002v1
Compressor summary: The study introduces two novel evaluation frameworks to measure how effective jailbreak attacks are on large language models and develops a comprehensive dataset as a benchmark for future research.
Hugo Laurencon,Yesoda Bhargava,Riddhi Zantye,Charbel-Raphaël Ségerie,Johann Lussange,Veeky Baths,Boris Gutkin
http://arxiv.org/abs/2401.08999v1
Compressor summary: The paper proposes a continuous time-space model (CTCS-HRRL) to study how living beings maintain their internal balance by learning from their environment and their own changing states.
Yoonhwa Jung,Ikhyun Cho,Shun-Hsiang Hsu,Julia Hockenmaier
http://arxiv.org/abs/2401.08998v1
Compressor summary: ARU is a novel machine-unlearning approach that uses adversarial noise to create a mask and reset specific parameters, making them unlearnable and improving privacy and regulatory compliance.
Ye Qiao,Haocheng Xu,Yifan Zhang,Sitao Huang
http://arxiv.org/abs/2401.08996v1
Compressor summary: MicroNAS is a fast and efficient method for finding optimal neural network architectures for edge devices, like microcontrollers, without sacrificing accuracy.
Junwen Bai,Bo Li,Qiujia Li,Tara N. Sainath,Trevor Strohman
http://arxiv.org/abs/2401.08992v1
Compressor summary: The study proposes a simple and effective method to improve streaming multilingual ASR by using Language-Dependent Adapters (LDAs) that only account for 0.4% of the full model per language, achieving significant word error rate reduction and alleviating performance degradation.
Ziyang Yu,Wenbing Huang,Yang Liu
http://arxiv.org/abs/2401.08986v1
Compressor summary: ElliDock is a novel, fast, and accurate learning-based method for protein-protein docking that uses elliptic paraboloid interfaces to represent the docking interface.
Chen Qi,Yang Jingjing,Huang Ming,Zhou Qiang
http://arxiv.org/abs/2401.08976v1
Compressor summary: The paper introduces ACT-GAN, a new method to create more accurate radio maps using generative adversarial networks and various blocks, which improves reconstruction accuracy and local texture, and works well in different scenarios.
Aditya Sharma,Luke Yoffe,Tobias Höllerer
http://arxiv.org/abs/2401.08973v1
Compressor summary: The paper presents OCTO+, a new method for automatically placing virtual objects in natural locations using open-vocabulary vision-language models, and introduces a benchmark for evaluating this task without user studies.
Yufeng Yin,Ishwarya Ananthabhotla,Vamsi Krishna Ithapu,Stavros Petridis,Yu-Hsiang Wu,Christi Miller
http://arxiv.org/abs/2401.08972v1
Compressor summary: The paper proposes a self-supervised pre-training strategy to detect hearing loss from facial expressions in real-world conversation scenarios, mitigating age bias using adversarial representation learning.
Xiaotian Han,Yiqi Wang,Bohan Zhai,Quanzeng You,Hongxia Yang
http://arxiv.org/abs/2401.08968v1
Compressor summary: Key points:
- Multimodal large language models (MLLMs) need visual instruction fine-tuning (IFT) to align their outputs with user intentions.
- Visual IFT datasets are constructed with a multifaceted approach using GPT-4 and rule-based templates.
- LLaVA-mix-665k is an effective but limited IFT dataset for multi-round dialog.
- A new IFT dataset with diverse and detailed instructions is proposed and shows better performance on open-ended evaluation benchmarks.

Summary: The paper proposes a new IFT dataset for MLLMs that improves their open-ended generation and instruction-following ability in dialog settings.
Trung Quoc Luong,Xinbo Zhang,Zhanming Jie,Peng Sun,Xiaoran Jin,Hang Li
http://arxiv.org/abs/2401.08967v1
Compressor summary: Reinforced Fine-Tuning (ReFT) improves the reasoning capability of Large Language Models (LLMs) by using online reinforcement learning to learn from multiple annotated reasoning paths for math problems, without needing extra training data.
Lei Xun,Jonathon Hare,Geoff V. Merrett
http://arxiv.org/abs/2401.08965v1
Compressor summary: The text proposes a system for managing DNN performance trade-offs using dynamic super-networks and runtime resource management to improve efficiency and reduce latency on mobile devices.
Yihan Du,R. Srikant,Wei Chen
http://arxiv.org/abs/2401.08961v1
Compressor summary: The text proposes a generalized cascading bandit framework that considers user states and state transitions in recommendation systems, and develops two computationally-efficient and sample-efficient algorithms with near-optimal regret and sample complexity guarantees.
Teng Xiao,Suhang Wang
http://arxiv.org/abs/2401.08959v1
Compressor summary: The paper proposes a new algorithm that optimizes both user rewards and ranking metrics using offline data in a unified EM framework.
Lei Xun,Mingyu Hu,Hengrui Zhao,Amit Kumar Singh,Jonathon Hare,Geoff V. Merrett
http://arxiv.org/abs/2401.08943v1
Compressor summary: Fluid DyDNNs are a novel distributed DNN inference approach that improves system reliability, adaptability, and efficiency on embedded Arm CPUs.
Saba Aslam,Abdur Rasool,Hongyan Wu,Xiaoli Li
http://arxiv.org/abs/2401.08940v1
Compressor summary: The study introduces a novel Continual Extreme Learning (CEL) model that uses Elastic Weight Consolidation to mitigate catastrophic forgetting and achieve high performance in predicting disease outbreaks.
Weiyao Wang,Pierre Gleize,Hao Tang,Xingyu Chen,Kevin J Liang,Matt Feiszli
http://arxiv.org/abs/2401.08937v1
Compressor summary: ICON is a novel optimization procedure for training NeRFs from 2D video frames without pose initialization, using smooth camera motion and adaptive weighting of gradients based on confidence measures.
Aida Afshar,Wenchao Li
http://arxiv.org/abs/2401.08936v1
Compressor summary: DeLF is a method that uses large language models to design and code user-intended learning scenarios for reinforcement learning applications.
Zili Liu,Hao Chen,Wenyuan Li,Keyan Chen,Zipeng Qi,Chenyang Liu,Zhengxia Zou,Zhenwei Shi
http://arxiv.org/abs/2401.08932v1
Compressor summary: The paper proposes a new dataset and training strategy for cloud and snow detection in remote sensing images that addresses the impact of noisy labels and improves performance with UNet and Segformer models.
Haorui Ji,Hongdong Li
http://arxiv.org/abs/2401.08930v1
Compressor summary: PADS is a novel framework that uses diffusion synthesis to learn a task-agnostic pose prior and unify various 3D human pose analysis tasks as inverse problems.
Songlin Fan,Zixuan Guo,Wei Gao,Ge Li
http://arxiv.org/abs/2401.08926v1
Compressor summary: The paper proposes a probabilistic model for point cloud quality assessment that accounts for the stochasticity of human judgments and generates multiple intermediate ratings to predict the final quality score.
Muhammad ElNokrashy,Badr AlKhamissi
http://arxiv.org/abs/2401.08919v1
Compressor summary: The study introduces Context-Contrastive Partial Diacritization (CCPD), a new method for improving readability and disambiguation in Arabic texts by marking only some characters based on context, and proposes novel performance indicators and a Transformer-variant model for CCPD.
Chengxu Wu,Qinrui Fan,Shu Hu,Xi Wu,Xin Wang,Jing Hu
http://arxiv.org/abs/2401.08913v1
Compressor summary: SVAN is a novel network that uses large receptive fields for efficient super-resolution by combining convolution operations and an attention mechanism, achieving high-quality results with fewer parameters.
Renchunzi Xie,Ambroise Odonnat,Vasilii Feofanov,Ievgen Redko,Jianfeng Zhang,Bo An
http://arxiv.org/abs/2401.08909v1
Compressor summary: The paper proposes a new method to estimate test accuracy using gradient norms over one gradient step, which works better than existing methods under distribution shift.
Tianwei Ni,Benjamin Eysenbach,Erfan Seyedsalehi,Michel Ma,Clement Gehring,Aditya Mahajan,Pierre-Luc Bacon
http://arxiv.org/abs/2401.08898v1
Compressor summary: The paper introduces self-predictive abstraction as a unifying idea behind various representation learning methods in deep reinforcement learning, and presents a minimalist algorithm to learn such representations with theoretical insights and empirical validation.
Hee-Jun Jung,Jaehyoung Jeong,Kangil Kim
http://arxiv.org/abs/2401.08897v1
Compressor summary: CFASL is a novel method for unsupervised learning of symmetry-based disentanglement in VAEs without prior knowledge of the dataset's factors. It incorporates three features that align latent vector dimensions and induce a group-equivariant encoder and decoder.
Mark Zhao,Emanuel Adamiak,Christos Kozyrakis
http://arxiv.org/abs/2401.08895v1
Compressor summary: Cedar is a framework that simplifies building, optimizing, and executing input data pipelines for machine learning training using various optimizations and operators.
Kaan Ozkara,Can Karakus,Parameswaran Raman,Mingyi Hong,Shoham Sabach,Branislav Kveton,Volkan Cevher
http://arxiv.org/abs/2401.08893v1
Compressor summary: MADA is a framework that learns the best adaptive optimizer for deep learning during training, and it often outperforms existing optimizers like Adam, Lion, and Adan.