This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2023-12-08, generated by the compressor, my personal LLM-based project.
Lijie Fan,Kaifeng Chen,Dilip Krishnan,Dina Katabi,Phillip Isola,Yonglong Tian
http://arxiv.org/abs/2312.04567v1
Compressor summary: This paper investigates how synthetic data from text-to-image models scales when used to train vision systems, identifies factors that affect this scaling behavior, and finds that synthetic images can be effective in certain scenarios but struggle to depict some concepts, limiting their usefulness for supervised image classifiers.
Saksham Suri,Fanyi Xiao,Animesh Sinha,Sean Chang Culatana,Raghuraman Krishnamoorthi,Chenchen Zhu,Abhinav Shrivastava
http://arxiv.org/abs/2312.04566v1
Compressor summary: Gen2Det is a simple pipeline that uses state-of-the-art image generation methods to create synthetic training data for object detection, improving performance on various settings and tasks.
Haofei Xu,Anpei Chen,Yuedong Chen,Christos Sakaridis,Yulun Zhang,Marc Pollefeys,Andreas Geiger,Fisher Yu
http://arxiv.org/abs/2312.04565v1
Compressor summary: MuRF is a method for sparse view synthesis that uses discretized volumes and convolutional networks to produce high-quality images across various baseline settings and scenes.
Sharath Girish,Kamal Gupta,Abhinav Shrivastava
http://arxiv.org/abs/2312.04564v1
Compressor summary: The authors propose a technique to reduce memory storage requirements for 3D Gaussian splatting in novel-view scene synthesis, achieving faster training and rendering speeds while maintaining visual quality.
Jianyuan Wang,Nikita Karaev,Christian Rupprecht,David Novotny
http://arxiv.org/abs/2312.04563v1
Compressor summary: The paper proposes a new deep learning pipeline called VGGSfM that reconstructs the camera poses and 3D structure of a scene from unconstrained images in an end-to-end differentiable manner, improving performance on three datasets.
Ethan Weber,Aleksander Hołyński,Varun Jampani,Saurabh Saxena,Noah Snavely,Abhishek Kar,Angjoo Kanazawa
http://arxiv.org/abs/2312.04560v1
Compressor summary: NeRFiller uses 2D visual generative models to complete missing parts of 3D scenes or objects, achieving the most 3D consistent and plausible scene completions.
Wen Wang,Kecheng Zheng,Qiuyu Wang,Hao Chen,Zifan Shi,Ceyuan Yang,Yujun Shen,Chunhua Shen
http://arxiv.org/abs/2312.04561v1
Compressor summary: The GenDeF method generates videos by warping a static image with a generative deformation field, which improves visual quality, allows for motion modeling, and enables easy video editing and processing.
Zhaoxi Chen,Fangzhou Hong,Haiyi Mei,Guangcong Wang,Lei Yang,Ziwei Liu
http://arxiv.org/abs/2312.04559v1
Compressor summary: PrimDiffusion is a diffusion-based framework that generates high-quality 3D human models by operating on volumetric primitives, enabling efficient rendering and flexible conditional generation.
Yufan Chen,Lizhen Wang,Qijing Li,Hongjiang Xiao,Shengping Zhang,Hongxun Yao,Yebin Liu
http://arxiv.org/abs/2312.04558v1
Compressor summary: MonoGaussianAvatar is a novel approach that uses 3D Gaussian points and a deformation field to create realistic head avatars from monocular portrait videos, overcoming the limitations of existing methods.
Shoufa Chen,Mengmeng Xu,Jiawei Ren,Yuren Cong,Sen He,Yanping Xie,Animesh Sinha,Ping Luo,Tao Xiang,Juan-Manuel Perez-Rua
http://arxiv.org/abs/2312.04557v1
Compressor summary: GenTron is a family of generative models using Transformer-based diffusion that improves visual quality and can generate videos from text, achieving high win rates in human evaluations.
Simon Frieder,Julius Berner,Philipp Petersen,Thomas Lukasiewicz
http://arxiv.org/abs/2312.04556v1
Compressor summary: The note discusses how large language models can help professional mathematicians by explaining their structure, assessing their mathematical skills, and exploring their impact on the field.
Ruozhen He,Paola Cascante-Bonilla,Ziyan Yang,Alexander C. Berg,Vicente Ordonez
http://arxiv.org/abs/2312.04554v1
Compressor summary: The authors propose SelfEQ, a method that improves object localization in vision-and-language models by generating paraphrases and finetuning for self-consistent visual explanations.
Tomoki Ichikawa,Shohei Nobuhara,Ko Nishino
http://arxiv.org/abs/2312.04553v1
Compressor summary: SPIDeRS uses polarized patterns of light to capture depth, surface normals, and reflectance of objects invisibly for applications in vision, xR, robotics, and HCI.
Sachit Menon,Ishan Misra,Rohit Girdhar
http://arxiv.org/abs/2312.04552v1
Compressor summary: The authors introduce a task called Illustrated Instructions, which generates custom visual instructions based on text input, and propose a new model called StackedDiffusion that outperforms existing methods and enables personalized applications.
Chuanxia Zheng,Andrea Vedaldi
http://arxiv.org/abs/2312.04551v1
Compressor summary: Free3D is a novel view synthesis method that uses a 2D image generator fine-tuned with ray conditioning normalization and multi-view attention to achieve better pose encoding and consistency without needing a 3D representation.
Aritra Dutta,Srijan Das,Jacob Nielsen,Rajatsubhra Chakraborty,Mubarak Shah
http://arxiv.org/abs/2312.04548v1
Compressor summary: MAVREC is a large, diverse video dataset with ground and aerial views for improving object detection in aerial images.
Zhongang Cai,Jianping Jiang,Zhongfei Qing,Xinying Guo,Mingyuan Zhang,Zhengyu Lin,Haiyi Mei,Chen Wei,Ruisi Wang,Wanqi Yin,Xiangyu Fan,Han Du,Liang Pan,Peng Gao,Zhitao Yang,Yang Gao,Jiaqi Li,Tianxiang Ren,Yukun Wei,Xiaogang Wang,Chen Change Loy,Lei Yang,Ziwei Liu
http://arxiv.org/abs/2312.04547v1
Compressor summary: The Digital Life Project is a framework that creates autonomous 3D characters with realistic social interactions and body movements using language and motion synthesis techniques.
Miriam Barrabes,Daniel Mas Montserrat,Margarita Geleta,Xavier Giro-i-Nieto,Alexander G. Ioannidis
http://arxiv.org/abs/2312.04546v1
Compressor summary: The text describes a method for detecting and fixing data shifts using adversarial learning with supervised classifiers and iterative heuristics, which outperforms existing techniques.
Tong Wu,Zhibing Li,Shuai Yang,Pan Zhang,Xingang Pan,Jiaqi Wang,Dahua Lin,Ziwei Liu
http://arxiv.org/abs/2312.04543v1
Compressor summary: HyperDreamer is a new method for creating realistic and editable 3D models from a single image using advanced techniques for viewing, rendering, and editing.
Yuejiang Liu,Ahmad Rahimi,Po-Chien Luan,Frano Rajič,Alexandre Alahi
http://arxiv.org/abs/2312.04540v1
Compressor summary: The authors study how to represent causal relationships in multi-agent systems, propose a metric learning approach for causal awareness, and demonstrate its effectiveness on pedestrian datasets.
Osman Ülger,Maksymilian Kulicki,Yuki Asano,Martin R. Oswald
http://arxiv.org/abs/2312.04539v1
Compressor summary: The paper presents a novel framework called Self-Seg that uses VLMs to perform open-vocabulary image segmentation without textual input, achieving state-of-the-art results on several datasets.
Jonah Philion,Xue Bin Peng,Sanja Fidler
http://arxiv.org/abs/2312.04535v1
Compressor summary: Key points:
- The paper proposes a method to simulate dynamic driving scenarios using discrete sequence modeling.
- The method discretizes trajectories to centimeter-level resolution and models multi-agent interactions with an encoder-decoder.
- The method achieves state-of-the-art realism and outperforms prior work on benchmarks.
- The method can be adapted to improve performance on other datasets and evaluated for scalability and saliency.
Summary: The paper presents a data-driven, tokenized, and encoder-decoder based method to simulate realistic and interactive driving scenarios, which improves self-driving development and can be applied to different tasks.
Shuliang Ning,Duomin Wang,Yipeng Qin,Zirong Jin,Baoyuan Wang,Xiaoguang Han
http://arxiv.org/abs/2312.04534v1
Compressor summary: The paper introduces ucVTON, a novel method for realistic synthesis of personalized clothing on human images, allowing flexible specification of style and texture conditions, and enabling superior quality and user experience in virtual try-on applications.
Genki Kinoshita,Ko Nishino
http://arxiv.org/abs/2312.04530v1
Compressor summary: Key points:
- Monocular depth estimators need scale supervision or suffer from ambiguity.
- StableCamH is a novel scale-aware method that uses object heights and camera height.
- StableCamH does not require auxiliary sensors or supervision.
- StableCamH has a learning-based size prior for car appearance.
- StableCamH achieves state-of-the-art accuracy and generalizability.
Summary: StableCamH is a scale-aware monocular depth estimation method that uses object heights and camera height without auxiliary sensors or supervision. It has a learning-based size prior for cars and outperforms related methods.
Yuto Enyo,Ko Nishino
http://arxiv.org/abs/2312.04529v1
Compressor summary: The paper introduces DRMNet, a stochastic inverse rendering method that recovers the full frequency spectrum of illumination and object reflectance from a single image using a diffusion model.
Michael R. Zhang,Nishkrit Desai,Juhan Bae,Jonathan Lorraine,Jimmy Ba
http://arxiv.org/abs/2312.04528v1
Compressor summary: The paper shows how large language models can help improve hyperparameter optimization efficiency by generating code and making better decisions with limited search budgets.
Kohei Yamashita,Vincent Lepetit,Ko Nishino
http://arxiv.org/abs/2312.04527v1
Compressor summary: The paper introduces reflection correspondences, a new type of correspondence that helps estimate camera pose without relying on the background, and proposes methods for using all three kinds of correspondences for robust object shape estimation.
Ozgur Kara,Bariscan Kurtkaya,Hidir Yesiltepe,James M. Rehg,Pinar Yanardag
http://arxiv.org/abs/2312.04524v1
Compressor summary: RAVE is a zero-shot video editing method that uses text-to-image diffusion models to create high-quality, temporally consistent, and semantically preserved videos with various edits and efficient memory requirements.
Alex Costanzino,Pierluigi Zama Ramirez,Giuseppe Lisanti,Luigi Di Stefano
http://arxiv.org/abs/2312.04521v1
Compressor summary: The paper presents a fast framework for anomaly detection using point clouds and RGB images by learning feature mapping between modalities and detecting inconsistencies, achieving state-of-the-art results and improving efficiency with layer pruning.
Yiduo Hao,Sohrab Madani,Junfeng Guan,Mohammed Alloulah,Saurabh Gupta,Haitham Hassanieh
http://arxiv.org/abs/2312.04519v1
Compressor summary: The paper proposes a self-supervised learning method to train radar models for autonomous vehicles using unlabeled data, improving object detection accuracy.
Xutai Ma,Anna Sun,Siqi Ouyang,Hirofumi Inaguma,Paden Tomasello
http://arxiv.org/abs/2312.04515v1
Compressor summary: EMMA is a new translation model that improves monotonic alignment estimation, training, and inference, achieving top performance in speech-to-text translation for Spanish and English.
Sehoon Kim,Suhong Moon,Ryan Tabrizi,Nicholas Lee,Michael W. Mahoney,Kurt Keutzer,Amir Gholami
http://arxiv.org/abs/2312.04511v1
Compressor summary: LLMCompiler is a tool that improves the efficiency and accuracy of multi-function calling in large language models by executing functions in parallel using classical compiler principles.
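The parallel-execution idea is easy to picture with plain Python concurrency. The sketch below only illustrates dispatching independent tool calls at once, assuming a planner has already decided they do not depend on each other; the tool functions (`search_web`, `get_weather`) are hypothetical stand-ins, not LLMCompiler's API.

```python
# Hedged illustration of parallel function calling, not the LLMCompiler codebase:
# once independent calls are identified, run them concurrently and gather results.
from concurrent.futures import ThreadPoolExecutor

def search_web(query):
    # Hypothetical stand-in tool; a real system would call whatever the LLM planned.
    return f"results for {query!r}"

def get_weather(city):
    # Another hypothetical stand-in tool.
    return f"weather in {city}"

independent_calls = [(search_web, ("LLMCompiler paper",)),
                     (get_weather, ("Berkeley",))]

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fn, *args) for fn, args in independent_calls]
    results = [f.result() for f in futures]
print(results)
```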
Jarad Forristal,Niloofar Mireshghallah,Greg Durrett,Taylor Berg-Kirkpatrick
http://arxiv.org/abs/2312.04510v1
Compressor summary: The paper proposes a new Markov Chain (MC) sampler for energy-based language models that can generate longer texts by iteratively prompting a large language model, improving both efficiency and accuracy in controlled text generation tasks.
Derek Lim,Haggai Maron,Marc T. Law,Jonathan Lorraine,James Lucas
http://arxiv.org/abs/2312.04501v1
Compressor summary: The paper introduces Graph Metanetworks (GMNs), a generalizable method for processing graphs representing input neural networks, which can handle various neural architectures and are expressive and equivariant to parameter permutation symmetries.
Xiang Xu,Lingdong Kong,Hui Shuai,Qingshan Liu
http://arxiv.org/abs/2312.04484v1
Compressor summary: FRNet restores contextual information in range-view LiDAR segmentation using frustum-based feature extraction and fusion, achieving competitive performance with high efficiency.
Zhiwu Qing,Shiwei Zhang,Jiayu Wang,Xiang Wang,Yujie Wei,Yingya Zhang,Changxin Gao,Nong Sang
http://arxiv.org/abs/2312.04483v1
Compressor summary: HiGen is a diffusion model-based method that improves text-to-video generation by decoupling spatial and temporal factors, leading to more realistic and diverse videos with semantics accuracy and motion stability.
Zhongchang Luo,Marion Robin,Pavan Vasishta
http://arxiv.org/abs/2312.04479v1
Compressor summary: GSGFormer is a new generative model that predicts pedestrian trajectories by considering complex interactions between pedestrians and their environment, offering diverse behavioral modalities and performing well even with limited data.
Chengshu Li,Jacky Liang,Andy Zeng,Xinyun Chen,Karol Hausman,Dorsa Sadigh,Sergey Levine,Li Fei-Fei,Fei Xia,Brian Ichter
http://arxiv.org/abs/2312.04474v1
Compressor summary: Chain of Code is a method to improve language models' ability to reason by having them write and emulate code for various linguistic tasks, leading to better performance on reasoning benchmarks.
Chenchen Gu,Xiang Lisa Li,Percy Liang,Tatsunori Hashimoto
http://arxiv.org/abs/2312.04469v1
Compressor summary: The paper proposes watermark distillation, a method for teaching models to generate watermarked text with high detectability, and explores its limitations.
Kiran Chhatre,Radek Daněček,Nikos Athanasiou,Giorgio Becherini,Christopher Peters,Michael J. Black,Timo Bolkart
http://arxiv.org/abs/2312.04466v1
Compressor summary: AMUSE is a model that generates realistic 3D human gestures from speech, controlling for content, emotion, and style.
Stathis Galanakis,Alexandros Lattas,Stylianos Moschoglou,Stefanos Zafeiriou
http://arxiv.org/abs/2312.04465v1
Compressor summary: FitDiff is a diffusion-based 3D face model that uses a 2D image to generate realistic and relightable avatars with high performance.
Jiayi Huang,Han Zhong,Liwei Wang,Lin F. Yang
http://arxiv.org/abs/2312.04464v1
Compressor summary: UCRL-WVTR is an algorithm for reinforcement learning that eliminates the planning horizon, achieves sharp regret bounds, and is computationally efficient.
Zhen Li,Mingdeng Cao,Xintao Wang,Zhongang Qi,Ming-Ming Cheng,Ying Shan
http://arxiv.org/abs/2312.04461v1
Compressor summary: PhotoMaker is a fast text-to-image generation method that preserves identity information by encoding multiple input images into a unified ID representation, enabling various applications.
Yuhan Chen,Ang Lv,Ting-En Lin,Changyu Chen,Yuchuan Wu,Fei Huang,Yongbin Li,Rui Yan
http://arxiv.org/abs/2312.04455v1
Compressor summary: The paper introduces Attention Buckets, a method that improves large language models' tool use performance by shaping their attention waveform with multiple processes and angles.
Shmuel Amar,Liat Schiff,Ori Ernst,Asi Shefer,Ori Shapira,Ido Dagan
http://arxiv.org/abs/2312.04440v1
Compressor summary: The paper introduces OpenAsp, a benchmark dataset for multi-document aspect-based summarization, created from existing datasets using a novel annotation protocol.
Yujie Wei,Shiwei Zhang,Zhiwu Qing,Hangjie Yuan,Zhiheng Liu,Yu Liu,Yingya Zhang,Jingren Zhou,Hongming Shan
http://arxiv.org/abs/2312.04433v1
Compressor summary: DreamVideo is a method to generate personalized videos from static images and motion videos by learning subject appearance and target motion patterns using textual inversion, fine-tuning, and adapters.
Shubham Agarwal,Subrata Mitra,Sarthak Chakraborty,Srikrishna Karanam,Koyel Mukherjee,Shiv Saini
http://arxiv.org/abs/2312.04429v1
Compressor summary: The paper introduces approximate-caching, a technique that reduces resource consumption and latency in text-to-image generation using diffusion models by reusing intermediate noise states for similar prompts.
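As a rough picture of the caching idea, the sketch below keeps intermediate states keyed by prompt embeddings and reuses the closest one when its cosine similarity clears a threshold. The class, the threshold value, and the stored state are illustrative assumptions, not the paper's actual system.

```python
# Minimal sketch of approximate caching by prompt-embedding similarity (illustrative only).
import numpy as np

class ApproximateCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.embeddings = []   # prompt embeddings of cached entries
        self.states = []       # intermediate states to reuse

    def lookup(self, query_emb):
        """Return the cached state of the most similar prompt, or None."""
        if not self.embeddings:
            return None
        embs = np.stack(self.embeddings)
        sims = embs @ query_emb / (
            np.linalg.norm(embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
        best = int(np.argmax(sims))
        return self.states[best] if sims[best] >= self.threshold else None

    def insert(self, query_emb, state):
        self.embeddings.append(query_emb)
        self.states.append(state)

cache = ApproximateCache()
cache.insert(np.random.randn(16), {"noise": "state-A"})
print(cache.lookup(np.random.randn(16)))   # likely None for an unrelated prompt
```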
Yabo Chen,Jiemin Fang,Yuyang Huang,Taoran Yi,Xiaopeng Zhang,Lingxi Xie,Xinggang Wang,Wenrui Dai,Hongkai Xiong,Qi Tian
http://arxiv.org/abs/2312.04424v1
Compressor summary: The authors propose a cascade generation framework called Cascade-Zero123 that uses two Zero-1-to-3 models to generate multi-view 3D images from a single image, addressing the challenges of geometric and visual consistency across views for complex objects.
Shivika Prasanna,Deepthi Rao,Eduardo Simoes,Praveen Rao
http://arxiv.org/abs/2312.04423v1
Compressor summary: The text describes how variant-level information from RNA-sequences of COVID-19 patients was represented as a large, scalable knowledge graph, which was used for analysis and inference tasks.
Michelle W. L. Wan,Jeffrey N. Clark,Edward A. Small,Elena Fillola Mayoral,Raúl Santos-Rodríguez
http://arxiv.org/abs/2312.04416v1
Compressor summary: The authors propose methods to measure and track sustainable development using data integration and machine learning.
Jiayi Guo,Xingqian Xu,Yifan Pu,Zanlin Ni,Chaofei Wang,Manushree Vasu,Shiji Song,Gao Huang,Humphrey Shi
http://arxiv.org/abs/2312.04410v1
Compressor summary: The paper proposes Smooth Diffusion, a new category of diffusion models that improve latent space smoothness for better text-to-image generation and other downstream tasks.
Karima Makhlouf,Heber H. Arcolezi,Sami Zhioua,Ghassen Ben Brahim,Catuscia Palamidessi
http://arxiv.org/abs/2312.04404v1
Compressor summary: This paper studies how local differential privacy (LDP) affects fairness when multiple sensitive attributes are used, and provides recommendations for balancing privacy, fairness, and utility in machine learning applications.
Dongchen Han,Xiaojun Jia,Yang Bai,Jindong Gu,Yang Liu,Xiaochun Cao
http://arxiv.org/abs/2312.04403v1
Compressor summary: The paper proposes a new method, OT-Attack, to generate high-transferability adversarial examples for VLP models by optimizing the alignment between data-augmented image and text pairs using optimal transport theory.
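The optimal-transport ingredient can be illustrated with a small entropically regularized (Sinkhorn) solver that produces a transport plan between two feature sets; how OT-Attack then uses such a plan to shape adversarial perturbations is not reproduced here, and all names and values below are illustrative.

```python
# Hedged sketch: entropic OT (Sinkhorn) plan between two feature sets with uniform marginals.
import numpy as np

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropically regularized transport plan for an m x n cost matrix."""
    m, n = cost.shape
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    K = np.exp(-cost / eps)
    u, v = np.ones(m), np.ones(n)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Example: cost = 1 - cosine similarity between toy image and text features.
img = np.random.randn(5, 64); txt = np.random.randn(7, 64)
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)
plan = sinkhorn_plan(1.0 - img @ txt.T)
print(plan.shape, plan.sum())   # (5, 7), sums to ~1
```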
Yongqi Dong,Xingmin Lu,Ruohan Li,Wei Song,Bart van Arem,Haneen Farah
http://arxiv.org/abs/2312.04398v1
Compressor summary: The paper proposes a four-phase pipeline to detect anomalies in lane rendering maps using self-supervised pre-training with MiM, customized fine-tuning, and post-processing, improving accuracy and efficiency.
Yinhuai Wang,Jing Lin,Ailing Zeng,Zhengyi Luo,Jian Zhang,Lei Zhang
http://arxiv.org/abs/2312.04393v1
Compressor summary: The text describes a new approach called PhysHOI for teaching humanoid robots to imitate human-object interaction using physics-based models and contact graph rewards without task-specific rewards, as well as introducing a dataset of basketball skills for testing the approach.
Carlos E. Luis,Alessandro G. Bottero,Julia Vinogradska,Felix Berkenkamp,Jan Peters
http://arxiv.org/abs/2312.04386v1
Compressor summary: The paper proposes a new uncertainty Bellman equation for model-based reinforcement learning that improves exploration and policy optimization, and introduces QU-SAC, an algorithm that can handle risk-seeking or risk-averse objectives.
Marco Matarese,Francesco Rea,Alessandra Sciutti
http://arxiv.org/abs/2312.04379v1
Compressor summary: The paper proposes an assessment task to measure and compare the information power of XAI systems in user-centred approaches, which could improve interaction between users and systems.
Yunsheng Ma,Can Cui,Xu Cao,Wenqian Ye,Peiran Liu,Juanwu Lu,Amr Abdelraouf,Rohit Gupta,Kyungtae Han,Aniket Bera,James M. Rehg,Ziran Wang
http://arxiv.org/abs/2312.04372v1
Compressor summary: LaMPilot is a framework for autonomous driving that uses code generation with behavioral primitives to handle user instructions, and evaluates LLMs on a custom benchmark with GPT-4 achieving high performance.
Sijing Wu,Yunhao Li,Weitian Zhang,Jun Jia,Yucheng Zhu,Yichao Yan,Guangtao Zhai
http://arxiv.org/abs/2312.04369v1
Compressor summary: The paper introduces SingingHead, a large dataset for singing head animation, and UniSinger, a framework that uses it to achieve both 3D and 2D facial animation for singing.
Dar-Yen Chen,Subhadeep Koley,Aneeshan Sain,Pinaki Nath Chowdhury,Tao Xiang,Ayan Kumar Bhunia,Yi-Zhe Song
http://arxiv.org/abs/2312.04364v1
Compressor summary: The paper introduces methods for generating personalized caricatures from photos and sketches, balancing abstraction and identity while preserving creativity.
Hamed Hematian Hemati,Atousa Toghyani,Atena Souri,Sayed Hesam Alavian,Hossein Sameti,Hamid Beigy
http://arxiv.org/abs/2312.04362v1
Compressor summary: The paragraph introduces PCoQA, a Persian conversational question answering dataset with challenges like open-ended non-factual answers and longer answers.
Zhijing Jin,Yuen Chen,Felix Leeb,Luigi Gresele,Ojasv Kamal,Zhiheng Lyu,Kevin Blin,Fernando Gonzalez Adauto,Max Kleiman-Weiner,Mrinmaya Sachan,Bernhard Schölkopf
http://arxiv.org/abs/2312.04350v1
Compressor summary: The authors propose a new natural language processing task to evaluate whether large language models can perform causal inference using formal rules, and present a challenging dataset and prompting strategy for this purpose.
Jianhua Pei,Jingyu Wang,Dongyuan Shi,Ping Wang
http://arxiv.org/abs/2312.04346v1
Compressor summary: The paper proposes a two-stage denoising diffusion model for accurate power system measurement recovery despite various uncertainties and complex dynamics, with improved efficiency and robustness.
Pengcheng Chen,Ziyan Huang,Zhongying Deng,Tianbin Li,Yanzhou Su,Haoyu Wang,Jin Ye,Yu Qiao,Junjun He
http://arxiv.org/abs/2312.04344v1
Compressor summary: The paper examines how to improve GPT-4V's medical imaging interpretation skills using prompt engineering techniques, leading to more reliable and valuable insights for healthcare.
Ilias Tsoumas,Vasileios Sitokonstantinou,Georgios Giannarakis,Evagelia Lampiri,Christos Athanassiou,Gustau Camps-Valls,Charalampos Kontoes,Ioannis Athanasiadis
http://arxiv.org/abs/2312.04343v1
Compressor summary: The authors propose an advanced data analysis framework to help farmers adopt Integrated Pest Management (IPM) practices by providing accurate pest predictions, interpretable advice, and effective assessments.
Derek Tam,Mohit Bansal,Colin Raffel
http://arxiv.org/abs/2312.04339v1
Compressor summary: The authors propose a new method called MaTS for merging models by matching them based on their task subspace, which improves performance and allows solving intractable problems with various initializations and estimates.
Llukman Cerkezi,Aram Davtyan,Sepehr Sameni,Paolo Favaro
http://arxiv.org/abs/2312.04337v1
Compressor summary: The paper presents a new method for unsupervised training of a diffusion model that can synthesize novel views from single-category datasets using object poses identified by clustering and cross-view consistency ensured by hard-attention guidance, achieving state-of-the-art results on real and synthetic images.
Justine Giroux,Mohammad Reza Karimi Dastjerdi,Yannick Hold-Geoffroy,Javier Vazquez-Corral,Jean-François Lalonde
http://arxiv.org/abs/2312.04334v1
Compressor summary: The authors propose a psychophysical experiment to measure human preference for relit virtual scenes and show that existing image quality assessment metrics do not capture human perception, but a combination of them can improve the evaluation of lighting estimation algorithms.
Nuo Chen,Ning Wu,Shining Liang,Ming Gong,Linjun Shou,Dongmei Zhang,Jia Li
http://arxiv.org/abs/2312.04333v1
Compressor summary: The paper analyzes LLaMA with multiple-choice tasks that probe its reasoning and computation abilities, finding that larger model sizes improve reasoning but not knowledge, and that lower layers lack arithmetic ability and factual knowledge while upper layers hold more computational power and real-world knowledge.
Julia Borisova,Nikolay O. Nikitin
http://arxiv.org/abs/2312.04330v1
Compressor summary: LANE-SI is an adaptive deep learning model that forecasts sea ice concentration in the Arctic, achieving comparable or better results than existing physical models.
Guang Yang,Jie Li,Hanxiao Lei,Xinbo Gao
http://arxiv.org/abs/2312.04328v1
Compressor summary: The authors propose a multi-scale dual attention framework for fusing infrared and visible images, which measures and integrates complementary information at different scales using structure and loss function, and achieves robust and informative results across scenarios.
Thomas Sanchez
http://arxiv.org/abs/2312.04327v1
Compressor summary: The thesis proposes two algorithms for accelerating MRI acquisition and improving image quality, focusing on Cartesian MRI techniques and comparing them with deep learning methods.
Ruyi Gan,Xiaojun Wu,Junyu Lu,Yuanhe Tian,Dixiang Zhang,Ziwei Wu,Renliang Sun,Chang Liu,Jiaxing Zhang,Pingjian Zhang,Yan Song
http://arxiv.org/abs/2312.04326v1
Compressor summary: The paper presents a text-to-image model for interior design that uses curriculum learning and reinforcement learning to improve prompt-following capabilities and generate high-quality images based on textual descriptions.
Dominik Mattern,Pierre Schumacher,Francisco M. López,Marcel C. Raabe,Markus R. Ernst,Arthur Aubret,Jochen Triesch
http://arxiv.org/abs/2312.04318v1
Compressor summary: The paragraph discusses an open-source multi-modal infant model called MIMo, which simulates early human cognitive development through embodied interactions with the physical and social environment.
Zuyao Chen,Jinlin Wu,Zhen Lei,Zhaoxiang Zhang,Changwen Chen
http://arxiv.org/abs/2312.04314v1
Compressor summary: The authors propose a new method, GPT4SGG, to generate scene graphs from detailed narratives based on images, which improves upon traditional language parsing and localization methods for scene graph generation.
Nils Philipp Walter,Jonas Fischer,Jilles Vreeken
http://arxiv.org/abs/2312.04311v1
Compressor summary: The proposed binary neural network architecture DIFFNAPS can extract differential patterns from high-dimensional data in a scalable and interpretable way, improving the understanding of cellular processes and potentially leading to novel treatments.
Ricky Maulana Fajri,Yulong Pei,Lu Yin,Mykola Pechenizkiy
http://arxiv.org/abs/2312.04307v1
Compressor summary: The Structural-Clustering PageRank method for improved Active learning (SPA) is a simple and effective approach to select informative and central nodes from graph-structured data using community detection and PageRank scoring.
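A hedged sketch of the selection recipe the summary describes, combining community detection with PageRank scoring via networkx; the exact clustering and scoring the authors use may differ.

```python
# Illustrative node selection: top-PageRank node(s) per detected community.
import networkx as nx
from networkx.algorithms import community

def spa_style_candidates(G, per_community=1):
    pr = nx.pagerank(G)
    picks = []
    for comm in community.greedy_modularity_communities(G):
        ranked = sorted(comm, key=lambda n: pr[n], reverse=True)
        picks.extend(ranked[:per_community])
    return picks

G = nx.karate_club_graph()
print(spa_style_candidates(G))   # candidate nodes to label next
```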
Felix Stollenwerk
http://arxiv.org/abs/2312.04306v1
Compressor summary: nerblackbox is a python library that simplifies using transformer-based models for named entity recognition, offering various options for training, evaluation, and inference.
Yuechen Zhang,Shengju Qian,Bohao Peng,Shu Liu,Jiaya Jia
http://arxiv.org/abs/2312.04302v1
Compressor summary: The study introduces Prompt Highlighter, a method to control text generation from multi-modal LLMs by highlighting specific prompt spans for focused and customized output.
Julius Weißmann,Markus Seidl,Anya Dietrich,Martin Haltrich
http://arxiv.org/abs/2312.04296v1
Compressor summary: The paper shows how using cross-codex training data and neural networks can improve scribe identification from historic manuscripts, allowing for more accurate and efficient paleographic analysis.
S. Nandini,Sanjjushri Varshini R
http://arxiv.org/abs/2312.04275v1
Compressor summary: The text discusses how machine learning can be used to analyze maternal mortality rates in different countries and to identify similarities and differences in the factors affecting these rates.
Yufan Liao,Qi Wu,Xing Yan
http://arxiv.org/abs/2312.04273v1
Compressor summary: The paper proposes Invariant Decision Tree (IDT) and Invariant Random Forest (IRF), novel methods for out-of-distribution generalization in decision tree models, motivated by theory and validated by experiments.
Dayoung Gong,Joonseok Lee,Deunsol Jung,Suha Kwak,Minsu Cho
http://arxiv.org/abs/2312.04266v1
Compressor summary: The paper introduces an activity grammar to help neural networks predict actions from videos more accurately and understandably.
Zhixiang Wei,Lin Chen,Yi Jin,Xiaoxiao Ma,Tianle Liu,Pengyang Lin,Ben Wang,Huaian Chen,Jinjin Zheng
http://arxiv.org/abs/2312.04265v1
Compressor summary: The paper introduces Rein, a robust fine-tuning method that uses fewer trainable parameters to improve semantic segmentation with pre-trained vision models, achieving state-of-the-art results.
Huachuan Qiu,Anqi Li,Lizhi Ma,Zhenzhong Lan
http://arxiv.org/abs/2312.04262v1
Compressor summary: PsyChat is a client-centric dialogue system that provides psychological support through online chat by recognizing client behaviors and generating appropriate responses.
Francesco Pacenza,Jessica Zangari
http://arxiv.org/abs/2312.04249v1
Compressor summary: The paper proposes an extension to Answer Set Programming (ASP) that approximates non-integers with rational numbers, improving its ability to model real-world data and information while preserving declarativity and reproducibility.
Xuying Zhang,Bo-Wen Yin,Yuming Chen,Zheng Lin,Yunheng Li,Qibin Hou,Ming-Ming Cheng
http://arxiv.org/abs/2312.04248v1
Compressor summary: TeMO is a novel framework that uses Decoupled Graph Attention and Cross-Grained Contrast supervision to style multiple objects in 3D scenes.
Yiqun Zhang,Zhenyue Qin,Yang Liu,Dylan Campbell
http://arxiv.org/abs/2312.04236v1
Compressor summary: The authors present a method to correct anatomical errors in hand images generated by Stable Diffusion using a specialized dataset, detection, pose estimation, ControlNet, and InstructPix2Pix.
Jeongwhan Choi,Hyowon Wi,Jayoung Kim,Yehjin Shin,Kookjin Lee,Nathaniel Trask,Noseong Park
http://arxiv.org/abs/2312.04234v1
Compressor summary: The authors propose a new self-attention mechanism called graph-filter-based self-attention (GFSA) that improves Transformer performance across different tasks by addressing the oversmoothing problem.
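One way to read "graph-filter-based self-attention" is to treat the softmax attention matrix as a graph shift operator and apply a low-order polynomial filter in it instead of a single multiplication. The numpy sketch below shows that reading with made-up filter coefficients; it is not the paper's GFSA implementation.

```python
# Hedged sketch: replace A @ V with a polynomial graph filter in the attention matrix A.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def filtered_attention(Q, K, V, w=(0.2, 1.0, 0.3)):
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    I = np.eye(A.shape[0])
    H = w[0] * I + w[1] * A + w[2] * (A @ A)   # illustrative filter coefficients
    return H @ V

Q, K, V = (np.random.randn(6, 16) for _ in range(3))
print(filtered_attention(Q, K, V).shape)       # (6, 16)
```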
Kang Ge,Chen Wang,Yutao Guo
http://arxiv.org/abs/2312.04233v1
Compressor summary: The authors propose CrackSAM, a large foundation model fine-tuned for crack segmentation using two efficient methods, and show its excellent performance on two unique datasets and challenging conditions.
Mayank Vatsa,Anubhooti Jain,Richa Singh
http://arxiv.org/abs/2312.04231v1
Compressor summary: This paper examines vision-language transformers using BRI principles to improve their trustworthiness and accountability in various applications.
Shuangmei Wang,Yang Cao,Tieru Wu
http://arxiv.org/abs/2312.04225v1
Compressor summary: TLCE is a method that uses multiple pre-trained models and episodic training to recognize new classes without forgetting old ones or overfitting, achieving better results than existing few-shot class-incremental learning approaches.
Ramon Ferrer-i-Cancho,Savithry Namboodiripad
http://arxiv.org/abs/2312.04219v1
Compressor summary: The paragraph discusses the principle of swap distance minimization in word order variations and its cognitive underpinning, and tests it on three flexible order SOV languages.
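Swap distance here is usually the minimum number of adjacent transpositions needed to turn one word order into another (the Kendall tau distance between permutations). A tiny worked example, assuming that reading:

```python
# Swap distance between two word orders = number of adjacent transpositions (inversions).
def swap_distance(order_a, order_b):
    pos = {w: i for i, w in enumerate(order_b)}
    seq = [pos[w] for w in order_a]
    swaps = 0
    for i in range(len(seq)):                   # count inversions via bubble sort
        for j in range(len(seq) - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                swaps += 1
    return swaps

print(swap_distance(list("SOV"), list("SVO")))  # 1: swap O and V
print(swap_distance(list("SOV"), list("VOS")))  # 3: full reversal
```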
Timothy K. Mathes,Jessica Inman,Andrés Colón,Simon Khan
http://arxiv.org/abs/2312.04216v1
Compressor summary: The paper introduces CODEX, a method that uses semantic clustering to summarize RL agent behavior in state-action space, making it easier to explain and build trust in high-risk applications.
Manuel Combarro Simón,Pierre Talbot,Grégoire Danoy,Jedrzej Musial,Mohammed Alswaitti,Pascal Bouvry
http://arxiv.org/abs/2312.04210v1
Compressor summary: Key points:
- Selecting a satellite image mosaic is challenging when multiple parameters must be optimized.
- The input includes an area of interest, satellite images, requirements, and objectives.
- The authors propose a new dataset and two models to solve the problem.
Summary: The paper presents a new problem of selecting satellite images to create mosaics that meet various criteria, and proposes two models and a realistic dataset to address it.
Eliabelle Mauduit,Andrea Simonetto
http://arxiv.org/abs/2312.04209v1
Compressor summary: The paragraph discusses a method for clustering words with both horizontal and vertical constraints using a two-step algorithm that combines soft constraints, graph coarsening, and optimal cut heights.
Ronan Docherty,Isaac Squires,Antonis Vamvakeros,Samuel J. Cooper
http://arxiv.org/abs/2312.04197v1
Compressor summary: SAMBA is a web-based trainable segmentation tool for materials science images that uses SAM for label suggestions and a random forest classifier for robust segmentations.
Adrián Bazaga,Pietro Liò,Gos Micklem
http://arxiv.org/abs/2312.04193v1
Compressor summary: The authors develop a smaller, efficient Spanish language model for question answering based on knowledge distillation from a larger model.
Peng Tang,Xintong Yan,Yang Nan,Xiaobin Hu,Bjoern H Menze,Sebastian Krammer,Tobias Lasser
http://arxiv.org/abs/2312.04189v1
Compressor summary: The paper proposes a new fusion method combining dermatological images and patient metadata for skin cancer classification using a joint-individual fusion structure and a fusion attention module, which improves accuracy over existing methods.
Dandan Qiao,Huaxia Rui,Qian Xiong
http://arxiv.org/abs/2312.04180v1
Compressor summary: The text discusses how artificial intelligence performance affects human workers' jobs in different occupations and proposes a framework to analyze the impact of AI on employment.
Cong Guo
http://arxiv.org/abs/2312.04171v1
Compressor summary: The paper proposes a new framework for selecting features on incomplete datasets that considers feature importance in the imputation process and uses an improved reliefF algorithm to learn the feature importance vector.
Jiawei Fan,Chao Li,Xiaolong Liu,Meina Song,Anbang Yao
http://arxiv.org/abs/2312.04168v1
Compressor summary: Af-DCD is a new contrastive learning method for semantic segmentation that improves efficiency and accuracy by using masked features and feature partitions without data augmentation or memory buffer.
Xiaoyu Lin,Laurent Girin,Xavier Alameda-Pineda
http://arxiv.org/abs/2312.04167v1
Compressor summary: The paper introduces MixDVAE, a latent-variable generative model for multi-source dynamics estimation, and demonstrates its effectiveness on computer vision and audio processing tasks.
Xuelin Zhu,Jiuxin Cao,Jian Liu,Dongqi Tang,Furong Xu,Weijia Liu,Jiawei Ge,Bo Liu,Qingpei Guo,Tianyi Zhang
http://arxiv.org/abs/2312.04160v1
Compressor summary: The authors propose a method to improve vision-language pre-trained models for multi-label image classification by using an adapter network with random perturbation and large language models for text generation, enabling automated visual label recognition.
Fei Wang,Dan Guo,Kun Li,Meng Wang
http://arxiv.org/abs/2312.04152v1
Compressor summary: The paper proposes a novel dynamic filtering strategy for video motion magnification that separates texture and shape, eliminates noise, and preserves critical features using a global dynamic sparse cross-covariance attention mechanism and a multi-scale dual-path gating mechanism.
Nir Zabari,Aharon Azulay,Alexey Gorkor,Tavi Halperin,Ohad Fried
http://arxiv.org/abs/2312.04145v1
Compressor summary: The paper proposes a new method for colorizing grayscale images using diffusion techniques and text prompts, improving both visual quality and user control over the process.
Tiantian Wang,Xinxin Zuo,Fangzhou Mu,Jian Wang,Ming-Hsuan Yang
http://arxiv.org/abs/2312.04143v1
Compressor summary: The paper proposes a method for stylizing human videos in 4D (3D and time) by using NeRFs to represent both the person and their surroundings, allowing for animation across poses and viewpoints.
Ching Chang,Chiao-Tung Chan,Wei-Yao Wang,Wen-Chih Peng,Tien-Fu Chen
http://arxiv.org/abs/2312.04142v1
Compressor summary: TimeDRL is a novel framework that learns disentangled embeddings from multivariate time-series data using timestamp-predictive and instance-contrastive tasks, without relying on augmentation methods or transformation-invariance.
Ryota Maeda,Shinsaku Hiura
http://arxiv.org/abs/2312.04140v1
Compressor summary: The paper introduces a novel polarimetric method to decompose specular inter-reflections of metal objects by analyzing the rotation direction of linear polarization.
Edwin C. Y. Koh
http://arxiv.org/abs/2312.04134v1
Compressor summary: The paper proposes a workflow using a large language model to help create Design Structure Matrices for complex engineering systems, which could save time and resources compared to traditional manual methods.
Yanrui Du,Sendong Zhao,Ming Ma,Yuhan Chen,Bing Qin
http://arxiv.org/abs/2312.04127v1
Compressor summary: The paper introduces a new jailbreak attack method called RADIAL, which exploits the inherent response tendencies of large language models to generate harmful responses when given specific real-world instructions with embedded malicious instructions.
Rasel Ahmed Bhuiyan,Adam Czajka
http://arxiv.org/abs/2312.04125v1
Compressor summary: The paper presents a new iris synthesis model using StyleGAN to generate realistic post-mortem iris images for data collection and training purposes in forensic identification.
Timothy Schaumlöffel,Arthur Aubret,Gemma Roig,Jochen Triesch
http://arxiv.org/abs/2312.04118v1
Compressor summary: The study proposes a computational model to investigate how caregivers' utterances during play sessions can enhance infants' ability to recognize and categorize objects visually.
Yunhan Zhao,Haoyu Ma,Shu Kong,Charless Fowlkes
http://arxiv.org/abs/2312.04117v1
Compressor summary: The authors introduce a new dataset and evaluation protocol for instance tracking in real-world 3D scenes from egocentric videos, and present a simple method that outperforms SOT-based approaches.
Zijian Shen,Zhenping Mu,Xiangxiang Li
http://arxiv.org/abs/2312.04113v1
Compressor summary: The text describes a new neural network model for vehicle target detection and distance estimation in automobiles, which improves safety warnings and provides suggestions based on nonparametric testing.
Henan Sun,Xunkai Li,Zhengyu Wu,Daohan Su,Rong-Hua Li,Guoren Wang
http://arxiv.org/abs/2312.04111v1
Compressor summary: AMUD introduces a new GNN model that adapts to homophily and heterophily in directed graphs, improving node representations and graph learning efficiency.
Jiayi Kong,Baixin Xu,Xurui Song,Chen Qian,Jun Luo,Ying He
http://arxiv.org/abs/2312.04106v1
Compressor summary: The proposed method reconstructs 3D head geometry with NeRF using identity-obscured inputs to preserve facial privacy.
Wei Liu,Haozhao Wang,Jun Wang,Zhiying Deng,YuanKai Zhang,Cheng Wang,Ruixuan Li
http://arxiv.org/abs/2312.04103v1
Compressor summary: The paper proposes DAR, a method that improves explanation quality in deep learning models by aligning the selected rationale with the original input to avoid the rationale shift problem.
Tuan Hoang,Santu Rana,Sunil Gupta,Svetha Venkatesh
http://arxiv.org/abs/2312.04095v1
Compressor summary: Projected-Gradient Unlearning (PGU) is a method for removing specific data samples from a machine learning model without affecting its performance on the remaining dataset, using an efficient algorithm that can handle any model and dataset size.
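The projection idea can be sketched in a few lines: remove from the forgetting gradient its component in the span of directions deemed important for the retained data, then take a step. The basis construction and step size below are illustrative assumptions, not the PGU algorithm itself.

```python
# Hedged sketch: gradient step projected onto the orthogonal complement of retained directions.
import numpy as np

def project_out(step, retained_dirs):
    """Remove the component of `step` lying in span(retained_dirs)."""
    Q, _ = np.linalg.qr(retained_dirs.T)       # orthonormal basis of the span
    return step - Q @ (Q.T @ step)

theta = np.random.randn(100)                   # toy flattened parameters
forget_grad = np.random.randn(100)             # gradient on the data to forget
retained_dirs = np.random.randn(5, 100)        # assumed "important" retained directions
theta -= 0.1 * project_out(forget_grad, retained_dirs)
```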
Yong Liu,Sule Bai,Guanbin Li,Yitong Wang,Yansong Tang
http://arxiv.org/abs/2312.04089v1
Compressor summary: The paper proposes SCAN, a method for open-vocabulary segmentation that uses CLIP's generalized contextual prior to improve alignment of visual content with unbounded text and introduces SG-IoU, a new metric to address semantic duplication issues.
Zongjie Li,Chaozheng Wang,Chaowei Liu,Pingchuan Ma,Daoyuan Wu,Shuai Wang,Cuiyun Gao
http://arxiv.org/abs/2312.04087v1
Compressor summary: The study analyzes the performance of Large Multimodal Models (LMMs) using various visual referring prompting strategies, introducing a new benchmark dataset called VRPTEST and finding that the choice of prompt strategy significantly affects accuracy.
Gyeongrok Oh,Jaehwan Jeong,Sieun Kim,Wonmin Byeon,Jinkyu Kim,Sungwoong Kim,Hyeokmin Kwon,Sangpil Kim
http://arxiv.org/abs/2312.04086v1
Compressor summary: The authors propose a novel method for generating videos from multiple texts with diverse events, using a pre-trained diffusion-based model and several techniques to ensure visual consistency and coherence.
Dario Piga,Filippo Pura,Marco Forgione
http://arxiv.org/abs/2312.04083v1
Compressor summary: The paper explores how adapting meta-models can improve predictive performance in different scenarios of system identification, enhancing robustness and versatility.
Zhaoheng Zheng,Jingmin Wei,Xuefeng Hu,Haidong Zhu,Ram Nevatia
http://arxiv.org/abs/2312.04076v1
Compressor summary: The paper proposes LLaMP, a method to integrate Large Language Models into pre-trained Vision-Language models for low-shot image classification by generating adaptive prompts for the CLIP text encoder.
Florian Lalande,Yoshitomo Matsubara,Naoya Chiba,Tatsunori Taniai,Ryo Igarashi,Yoshitaka Ushiku
http://arxiv.org/abs/2312.04070v1
Compressor summary: The paper introduces a new Transformer model for Symbolic Regression that can find mathematical expressions for datasets without interpretation issues, but with more computation and flexibility needed to avoid overfitting, achieving state-of-the-art results on SRSD datasets.
Dehua Peng,Zhipeng Gui,Huayi Wu
http://arxiv.org/abs/2312.04067v1
Compressor summary: The paper proposes a new graph clustering method, MeanCut, that uses path-based similarity to handle non-spherical data and improve cluster associations, while reducing computational complexity and enhancing robustness.
Thomas Westfechtel,Dexuan Zhang,Tatsuya Harada
http://arxiv.org/abs/2312.04066v1
Compressor summary: The paper proposes a method that combines unsupervised domain adaptation with vision-language models to improve zero-shot prediction accuracy on image classification tasks, using data from source and target domains and adjusting class probabilities.
Dehua Peng,Zhipeng Gui,Huayi Wu
http://arxiv.org/abs/2312.04065v1
Compressor summary: The paper proposes LoDD, a method for detecting boundary points in machine learning tasks, which uses KNN and eigenvalues of the covariance matrix to measure centrality and performs well on synthetic and real datasets.
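As a loose illustration of a KNN-plus-covariance-eigenvalue score, the sketch below rates each point by how anisotropic its local neighborhood is; the actual LoDD centrality measure is likely defined differently.

```python
# Illustrative boundary score from the eigenvalue spread of the local neighborhood covariance.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def boundary_scores(X, k=10):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                  # idx[:, 0] is the point itself
    scores = []
    for i, neighbors in enumerate(idx):
        local = X[neighbors[1:]] - X[i]        # offsets to the k nearest neighbors
        evals = np.linalg.eigvalsh(np.cov(local, rowvar=False))
        scores.append(1.0 - evals.min() / (evals.max() + 1e-12))
    return np.array(scores)

X = np.random.randn(200, 2)
print(boundary_scores(X)[:5])
```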
Israt Zarin Era,Imtiaz Ahmed,Zhichao Liu,Srinjoy Das
http://arxiv.org/abs/2312.04063v1
Compressor summary: The paragraph describes a framework for real-time image segmentation in manufacturing using a Foundation model with unsupervised prompt generation, which could improve product quality and enable Industry 4.0.
Junsheng Zhou,Baorui Ma,Wenyuan Zhang,Yi Fang,Yu-Shen Liu,Zhizhong Han
http://arxiv.org/abs/2312.04060v1
Compressor summary: The authors propose a novel method to register 2D images and 3D point clouds using a structured cross-modality latent space learned by a triplet network and a differentiable probabilistic PnP solver, achieving state-of-the-art results on KITTI and nuScenes datasets.
Zhuoran Huang,Michael P. Berry,Christina Chwyl,Gary Hsieh,Jing Wei,Evan M. Forman
http://arxiv.org/abs/2312.04059v1
Compressor summary: LLM AI chatbots like ChatGPT can generate personalized and novel weight-loss coaching messages that are as helpful as human-written ones, but need improvements in authenticity and data focus.
Fei Huang,Jianrong Lv,Yang Yue
http://arxiv.org/abs/2312.04055v1
Compressor summary: The paper proposes a new method (ST-GraphRL) to represent human trajectories in a way that captures their spatial and temporal dependencies, which improves the performance of geospatial foundation models.
Amica De Jager,Vukosi Marivate,Abioudun Modupe
http://arxiv.org/abs/2312.04052v1
Compressor summary: The paper presents a multimodal misinformation detection model for South African social media that uses textual and visual information, and shows its improved performance compared to unimodal models.
Qiuxiao Chen,Xiaojun Qi
http://arxiv.org/abs/2312.04044v1
Compressor summary: The paper proposes a Residual Graph Convolutional (RGC) module for Bird's-Eye-View semantic segmentation that improves global information and region-level semantic relationships using graph space projection and data augmentation.
Hmrishav Bandyopadhyay,Subhadeep Koley,Ayan Das,Aneeshan Sain,Pinaki Nath Chowdhury,Tao Xiang,Ayan Kumar Bhunia,Yi-Zhe Song
http://arxiv.org/abs/2312.04043v1
Compressor summary: The paper presents a new framework for generating 3D shapes from sketches that simplifies the process, allows editing, and works efficiently.
Zhijun Zeng,Pipi Hu,Chenglong Bao,Yi Zhu,Zuoqiang Shi
http://arxiv.org/abs/2312.04038v1
Compressor summary: The paper proposes a method to reconstruct dynamical systems from data without time labels using sliced Wasserstein distance to minimize distribution loss.
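The sliced Wasserstein distance itself is simple to compute: project both point sets onto random directions and average the 1D Wasserstein distances of the sorted projections. A minimal numpy version (of the distance only, not the paper's reconstruction method):

```python
# Sliced Wasserstein distance between two equally sized point sets.
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, seed=0):
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_projections, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    px, py = X @ dirs.T, Y @ dirs.T            # (n_points, n_projections)
    px, py = np.sort(px, axis=0), np.sort(py, axis=0)
    return float(np.mean(np.abs(px - py)))     # 1D W1 averaged over directions

X = np.random.randn(500, 3)
Y = np.random.randn(500, 3) + 0.5
print(sliced_wasserstein(X, Y))
```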
Weilin Wan,Yiming Huang,Shutong Wu,Taku Komura,Wenping Wang,Dinesh Jayaraman,Lingjie Liu
http://arxiv.org/abs/2312.04036v1
Compressor summary: The study presents a method for generating diverse and smooth human motion sequences from text descriptions using a network encoder and a conditional diffusion model in the frequency domain.
Jaehyung Kim,Yuning Mao,Rui Hou,Hanchao Yu,Davis Liang,Pascale Fung,Qifan Wang,Fuli Feng,Lifu Huang,Madian Khabsa
http://arxiv.org/abs/2312.04032v1
Compressor summary: The paper proposes RoAST, a technique that enhances the multi-perspective robustness of pre-trained language models by incorporating adversarial perturbation and selective training during fine-tuning.
Athul Paul Jacob,Abhishek Gupta,Jacob Andreas
http://arxiv.org/abs/2312.04030v1
Compressor summary: The latent inference budget model (L-IBM) is a new approach that explicitly simulates agents' computational constraints in models of bounded rationality, and shows promising results in various tasks involving suboptimal decision-making.
Zhenduo Zhang
http://arxiv.org/abs/2312.04029v1
Compressor summary: The paper introduces a new method for improving face recognition by using cluster knowledge from face clustering tasks in two ways, extending ArcFace with a cluster-guided angular margin and aligning cluster centers with class centers in the classifier.
Mingwu Zheng,Haiyu Zhang,Hongyu Yang,Liming Chen,Di Huang
http://arxiv.org/abs/2312.04028v1
Compressor summary: The paper introduces ImFace++, a novel 3D morphable face model that learns continuous neural representations with disentangled deformation fields, refinement displacement field, and Neural Blend-Field to capture complex facial shapes and expressions for various computer vision and graphics applications.
Binghui Peng
http://arxiv.org/abs/2312.04027v1
Compressor summary: The paper provides an algorithm for multi-distribution learning with sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$ and resolves a COLT 2023 open problem.
Shashank Kotyan,Ueda Tatsuya,Danilo Vasconcellos Vargas
http://arxiv.org/abs/2312.04024v1
Compressor summary: The k* Distribution method analyzes the structure of sample distributions within specific classes in the learned latent space of neural networks, revealing different distribution types and enabling a deeper understanding of how these networks process various classes.
Hanlin Zhang,Yi-Fan Zhang,Yaodong Yu,Dhruv Madeka,Dean Foster,Eric Xing,Hima Lakkaraju,Sham Kakade
http://arxiv.org/abs/2312.04021v1
Compressor summary: The study examines the trade-offs between performance and calibration of large language models in in-context learning tasks and suggests that current recalibration techniques may not be sufficient for ensuring reliability.
Ardian Umam,Cheng-Kun Yang,Min-Hung Chen,Jen-Hui Chuang,Yen-Yu Lin
http://arxiv.org/abs/2312.04016v1
Compressor summary: The paper introduces PartDistill, a framework that transfers 2D knowledge from vision-language models to improve 3D shape part segmentation using cross-modal distillation.
Kairui Yang,Zihao Guo,Gengjie Lin,Haotian Dong,Die Zuo,Jibin Peng,Zhao Huang,Zhecheng Xu,Fupeng Li,Ziyun Bai,Di Lin
http://arxiv.org/abs/2312.04008v1
Compressor summary: The authors propose a natural-language-driven simulation for creating realistic object interactions in virtual driving scenes and present a new method called SimCopilot to evaluate their approach using the Language-to-Interaction dataset.
Youngwan Lee,Kwanyong Park,Yoorhim Cho,Yong-Ju Lee,Sung Ju Hwang
http://arxiv.org/abs/2312.04005v1
Compressor summary: The paper proposes an efficient text-to-image model by distilling knowledge from the larger and faster Stable Diffusion XL model, addressing its high computation cost and size requirements.
Vimal Thilak,Chen Huang,Omid Saremi,Laurent Dinh,Hanlin Goh,Preetum Nakkiran,Joshua M. Susskind,Etai Littwin
http://arxiv.org/abs/2312.04000v1
Compressor summary: LiDAR is a metric that measures the quality of representations in joint embedding architectures by quantifying the rank of the LDA matrix associated with a surrogate self-supervised learning task.
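The "rank" in question is typically a smooth effective rank rather than a hard matrix rank. The sketch below computes an effective rank as the exponential of the entropy of a matrix's normalized eigenvalue spectrum, applied to a toy covariance in place of the LDA matrix the paper actually builds from its surrogate task.

```python
# Effective rank via the entropy of the normalized eigenvalue spectrum (toy stand-in matrix).
import numpy as np

def effective_rank(M, eps=1e-12):
    evals = np.clip(np.linalg.eigvalsh(M), 0.0, None)
    p = evals / (evals.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))

Z = np.random.randn(1000, 64)                  # toy embeddings
print(effective_rank(np.cov(Z, rowvar=False)))
```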
Navid Mohammadi Foumani,Chang Wei Tan,Geoffrey I. Webb,Mahsa Salehi
http://arxiv.org/abs/2312.03998v1
Compressor summary: The authors propose a new self-supervised method called Series2Vec, which predicts the similarity between two time series in both temporal and spectral domains, and shows its effectiveness on various real-world datasets.
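As a rough picture of comparing series in both domains, the sketch below measures temporal similarity with correlation and spectral similarity with cosine similarity of FFT magnitudes; these are illustrative proxies, not the Series2Vec objective.

```python
# Illustrative temporal- and spectral-domain similarity between two time series.
import numpy as np

def temporal_similarity(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spectral_similarity(x, y):
    fx, fy = np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y))
    return float(fx @ fy / (np.linalg.norm(fx) * np.linalg.norm(fy) + 1e-12))

t = np.linspace(0, 10, 512)
x = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(t.size)
y = np.sin(2 * np.pi * t + 0.3) + 0.1 * np.random.randn(t.size)
print(temporal_similarity(x, y), spectral_similarity(x, y))
```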
Boyang Deng,Yuzhen Lu
http://arxiv.org/abs/2312.03996v1
Compressor summary: The paragraph discusses using stable diffusion generative models to improve object detection and classification tasks with synthetic images from small datasets, and evaluates their performance on various categories from the COCO dataset and weed species in Michigan.
Sloke Shrestha,Sundar Sripada V. S.,Asvin Venkataramanan
http://arxiv.org/abs/2312.03993v1
Compressor summary: The report describes using stable-diffusion-v1.5 with LoRA to perform style transfer on Calvin and Hobbes comics, achieving good visual results.
Xiao-Yin Liu,Xiao-Hu Zhou,Guo-Tao Li,Hao Li,Mei-Jiang Gui,Tian-Yu Xiang,De-Xing Huang,Zeng-Guang Hou
http://arxiv.org/abs/2312.03991v1
Compressor summary: The paper proposes a new model-based offline reinforcement learning algorithm (MICRO) that balances performance and robustness by using a conservative Bellman operator and reduces computation cost compared to previous methods.
Weijian Zheng,Jun-Sang Park,Peter Kenesei,Ahsan Ali,Zhengchun Liu,Ian T. Foster,Nicholas Schwarz,Rajkumar Kettimuthu,Antonino Miceli,Hemant Sharma
http://arxiv.org/abs/2312.03989v1
Compressor summary: The paragraph describes a new automated technique for quickly detecting plasticity in metallic materials using high-energy X-ray microscopy, which is faster and works with sparser data than traditional methods.
Meihao Fan,Xiaoyue Han,Ju Fan,Chengliang Chai,Nan Tang,Guoliang Li,Xiaoyong Du
http://arxiv.org/abs/2312.03987v1
Compressor summary: The paper proposes BATCHER, a cost-effective batch prompting approach for entity resolution using large language models without fine-tuning or manual prompting.
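Batch prompting for entity resolution can be pictured as packing several candidate record pairs into one prompt so a single LLM call answers many match questions. The prompt template below is a hypothetical illustration, not BATCHER's design.

```python
# Hypothetical batch prompt: many match/non-match questions in one LLM call.
def build_batch_prompt(pairs):
    lines = ["Decide for each pair whether the two records refer to the same entity.",
             "Answer with one line per pair: '<index>: yes' or '<index>: no'.", ""]
    for i, (a, b) in enumerate(pairs, start=1):
        lines.append(f"{i}. Record A: {a}")
        lines.append(f"   Record B: {b}")
    return "\n".join(lines)

pairs = [
    ("iPhone 14 Pro 128GB, black", "Apple iPhone 14 Pro (128 GB) - Black"),
    ("Dell XPS 13 9310 laptop", "Lenovo ThinkPad X1 Carbon Gen 9"),
]
print(build_batch_prompt(pairs))
```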
Yuni Lai,Yulin Zhu,Bailin Pan,Kai Zhou
http://arxiv.org/abs/2312.03979v1
Compressor summary: The text introduces a new framework for defending DGL models against graph injection attacks, which is model-agnostic and provides theoretical and empirical evidence of its effectiveness.
Shibin Wu,Bang Yang,Zhiyu Ye,Haoqian Wang,Hairong Zheng,Tong Zhang
http://arxiv.org/abs/2312.03970v1
Compressor summary: The study improves medical report generation using a customized vision-language model that integrates adapter tuning and medical knowledge enhancement, achieving better accuracy and coherence than existing methods.