This page contains one-sentence summaries of cs.AI/ML/CV papers announced on 2023-11-27, generated by Compressor, my personal LLM-based summarization project.
Munan Ning,Bin Zhu,Yujia Xie,Bin Lin,Jiaxi Cui,Lu Yuan,Dongdong Chen,Li Yuan
http://arxiv.org/abs/2311.16103v1
Compressor summary: The paper introduces Video-Bench, a comprehensive evaluation system for video-based large language models, with 10 tasks covering understanding, question-answering, and decision-making.
Mihir Prabhudesai,Tsung-Wei Ke,Alexander C. Li,Deepak Pathak,Katerina Fragkiadaki
http://arxiv.org/abs/2311.16102v1
Compressor summary: Diffusion-TTA adapts pre-trained discriminative models using generative feedback from a diffusion model, improving their accuracy in various tasks.
Haoqin Tu,Chenhang Cui,Zijun Wang,Yiyang Zhou,Bingchen Zhao,Junlin Han,Wangchunshu Zhou,Huaxiu Yao,Cihang Xie
http://arxiv.org/abs/2311.16101v1
Compressor summary: This study evaluates Vision LLMs' visual reasoning abilities by introducing a comprehensive safety evaluation suite that covers OOD generalization and adversarial robustness, revealing their strengths and weaknesses in handling different conditions.
Jiahui Lei,Yufu Wang,Georgios Pavlakos,Lingjie Liu,Kostas Daniilidis
http://arxiv.org/abs/2311.16099v1
Compressor summary: GART is a model that uses moving 3D Gaussians to represent deformable subjects in monocular videos with efficient reconstruction and rendering.
Christian Diller,Angela Dai
http://arxiv.org/abs/2311.16097v1
Compressor summary: CG-HOI is a method for generating realistic 3D human-object interactions from text by modeling contact between the human body and object geometry.
Zhe Li,Zerong Zheng,Lizhen Wang,Yebin Liu
http://arxiv.org/abs/2311.16096v1
Compressor summary: The paper presents a new method for creating realistic and dynamic human avatars using a combination of 2D and 3D neural networks, which can adapt to different clothing styles and poses.
Aiyu Cui,Jay Mahajan,Viraj Shah,Preeti Gomathinayagam,Svetlana Lazebnik
http://arxiv.org/abs/2311.16094v1
Compressor summary: The paper introduces a Street TryOn benchmark and a novel method for virtual try-on on in-the-wild scenes without paired data, using DensePose warping correction and diffusion-based inpainting.
Luca M. Schulze Buschoff,Elif Akata,Matthias Bethge,Eric Schulz
http://arxiv.org/abs/2311.16093v1
Compressor summary: The paper evaluates how well vision-based large language models perform in intuitive physics, causal reasoning, and intuitive psychology tasks, finding that they are still far from human capabilities in these domains.
Tsung-Han Wu,Long Lian,Joseph E. Gonzalez,Boyi Li,Trevor Darrell
http://arxiv.org/abs/2311.16090v1
Compressor summary: SLD is a framework that generates images from text prompts, assesses their alignment with the prompt, and performs self-corrections to ensure correctness in the resulting image; it requires no additional training and can be integrated with existing diffusion models.
Afra Feyza Akyürek,Eric Pan,Garry Kuwanto,Derry Wijaya
http://arxiv.org/abs/2311.16087v1
Compressor summary: The paper explores different ways to edit language models beyond factual data, introduces a new benchmark called DUnE, and shows that no existing methods have completely solved the generalized editing problem.
Yury Demidovich,Grigory Malinovsky,Egor Shulgin,Peter Richtárik
http://arxiv.org/abs/2311.16086v1
Compressor summary: The paper introduces a new optimization problem that uses pre-trained models and random sketch operators for sparsification during machine learning model training, leading to improved convergence rates and relaxed assumptions.
Dmitri Roussinov,Serge Sharoff
http://arxiv.org/abs/2311.16083v1
Compressor summary: The paper shows that PLMs struggle with topic changes in text classification tasks, proposes using synthetic texts to improve performance, and provides empirical results and code for replication.
Weixian Lei,Yixiao Ge,Kun Yi,Jianfeng Zhang,Difei Gao,Dylan Sun,Yuying Ge,Ying Shan,Mike Zheng Shou
http://arxiv.org/abs/2311.16081v1
Compressor summary: The paper introduces ViT-Lens-2, a method for efficient learning of diverse modalities using pretrained vision transformers and modality alignment with existing foundation models.
Zeming Chen,Alejandro Hernández Cano,Angelika Romanou,Antoine Bonnet,Kyle Matoba,Francesco Salvi,Matteo Pagliardini,Simin Fan,Andreas Köpf,Amirkeivan Mohtashami,Alexandre Sallinen,Alireza Sakhaeirad,Vinitra Swamy,Igor Krawczuk,Deniz Bayazit,Axel Marmet,Syrielle Montariol,Mary-Anne Hartley,Martin Jaggi,Antoine Bosselut
http://arxiv.org/abs/2311.16079v1
Compressor summary: MEDITRON is an open-source suite of large-scale medical language models that outperform several closed-source models on medical benchmarks.
François Remy,Kris Demuynck,Thomas Demeester
http://arxiv.org/abs/2311.16075v1
Compressor summary: The study uses Large Language Models and UMLS knowledge graph to create high-fidelity representations of biomedical concepts and sentences, improving performance on various tasks and releasing a multilingual model.
Xianghua Xie,Chen Hu,Hanchi Ren,Jingjing Deng
http://arxiv.org/abs/2311.16065v1
Compressor summary: This paper reviews malicious attacks on federated learning (FL) systems, categorizes them into four types, and discusses defense strategies that aim to protect FL's learning process, data, and models from manipulation and sabotage.
Zhaoyang Xia,Carol Neidle,Dimitris N. Metaxas
http://arxiv.org/abs/2311.16060v1
Compressor summary: The research introduces DiffSLVA, a method that uses diffusion models and image features to anonymize sign language videos without losing linguistic content, potentially benefiting Deaf and Hard-of-Hearing communities.
Katharina Limbeck,Rayna Andreeva,Rik Sarkar,Bastian Rieck
http://arxiv.org/abs/2311.16054v1
Compressor summary: The paper introduces magnitude as a measure of the effective size of a space and presents a new quality measure for dimensionality reduction tasks based on the dissimilarity between magnitude functions.
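Magnitude is a standard construction from metric geometry, independent of this paper: for a finite metric space it is the sum of the entries of the inverse of the similarity matrix Z with Z[i, j] = exp(-d(x_i, x_j)). A minimal sketch (my own generic illustration, not the authors' code):

```python
import numpy as np

def magnitude(points):
    """Magnitude of a finite metric space (rows of `points` are the points).

    Builds the similarity matrix Z[i, j] = exp(-d(x_i, x_j)) and returns
    the sum of the entries of Z^{-1}.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    Z = np.exp(-d)
    return float(np.linalg.inv(Z).sum())

# Magnitude behaves like an "effective number of points": two points far
# apart count as ~2, two nearly coincident points count as ~1.
far = magnitude(np.array([[0.0], [10.0]]))
near = magnitude(np.array([[0.0], [0.01]]))
```

Scaling the metric (d -> t*d for t > 0) and tracking magnitude as a function of t yields the magnitude function the summary refers to.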
Rishubh Parihar,Prasanna Balaji,Raghav Magazine,Sarthak Vora,Tejan Karmali,Varun Jampani,R. Venkatesh Babu
http://arxiv.org/abs/2311.16052v1
Compressor summary: The paper proposes a new method for diverse attribute editing by modeling multidimensional attribute edits using disentangled latent spaces of pretrained GANs and training a Denoising Diffusion Probabilistic Model.
Jian Gao,Chun Gu,Youtian Lin,Hao Zhu,Xun Cao,Li Zhang,Yao Yao
http://arxiv.org/abs/2311.16043v1
Compressor summary: The paper describes a new method to render 3D scenes from multiple images using point-based rendering, which allows for editing, ray-tracing, and real-time relighting of the scene.
Jane Wu,Diego Thomas,Ronald Fedkiw
http://arxiv.org/abs/2311.16042v1
Compressor summary: The paper proposes a new method using deep learning to reconstruct 3D clothed humans from 2D normal maps and RGB images, without volumetric information ground truth.
Wenzhao Zheng,Weiliang Chen,Yuanhui Huang,Borui Zhang,Yueqi Duan,Jiwen Lu
http://arxiv.org/abs/2311.16038v1
Compressor summary: The paper proposes OccWorld, a world model that predicts the movement of the ego car and the evolution of surrounding scenes in 3D occupancy space, using scene tokens and a GPT-like generative transformer.
Jiemin Fang,Junjie Wang,Xiaopeng Zhang,Lingxi Xie,Qi Tian
http://arxiv.org/abs/2311.16037v1
Compressor summary: The GaussianEditor framework allows delicate and precise editing of 3D scenes using text instructions and 3D Gaussians, with faster training speed compared to previous methods.
Yutian Pang,Peng Zhao,Jueming Hu,Yongming Liu
http://arxiv.org/abs/2311.16030v1
Compressor summary: The paper proposes a machine learning-enhanced landing scheduling method that reduces aircraft delays, improves safety, and considers uncertainties in flight events.
Dennis Frauen,Fergus Imrie,Alicia Curth,Valentyn Melnychuk,Stefan Feuerriegel,Mihaela van der Schaar
http://arxiv.org/abs/2311.16026v1
Compressor summary: NeuralCSA is a neural framework for causal sensitivity analysis that works with various sensitivity models, treatment types, and causal queries, and can infer valid bounds on the causal query of interest.
Yuantao Fan,Zhenkan Wang,Sepideh Pashami,Slawomir Nowaczyk,Henrik Ydreskog
http://arxiv.org/abs/2311.16003v1
Compressor summary: The paper proposes a method to improve energy consumption prediction and explainability for electric commercial vehicles by training multiple regression models on subsets of data based on relevant sub-populations.
Teo Deveney,Jan Stanczuk,Lisa Maria Kreusser,Chris Budd,Carola-Bibiane Schönlieb
http://arxiv.org/abs/2311.15996v1
Compressor summary: This paper analyses the differences between ODE and SDE dynamics in score-based diffusion models and proposes a regularisation term to reduce these differences, but it may degrade SDE sample quality.
Evelyn Herberg,Roland Herzog,Frederik Köhne,Leonie Kreis,Anton Schiela
http://arxiv.org/abs/2311.15995v1
Compressor summary: The paper presents a method to insert new layers in neural networks during training using sensitivity-based techniques, which improves training efficiency and reduces computational effort.
Shaobo Wang,Xiangdong Zhang,Junchi Yan
http://arxiv.org/abs/2311.15993v1
Compressor summary: UBN is a two-stage framework that alleviates feature condensation and unifies various BN variants to improve neural network training stability and convergence.
Zeyun Zhong,Chengzhi Wu,Manuel Martin,Michael Voit,Juergen Gall,Jürgen Beyerer
http://arxiv.org/abs/2311.15991v1
Compressor summary: The authors propose a new generative model that captures different possible future actions by iteratively generating them from Gaussian noise, conditioned on the observed video, and show its effectiveness on four benchmark datasets.
Shikai Qiu,Tim G. J. Rudner,Sanyam Kapoor,Andrew Gordon Wilson
http://arxiv.org/abs/2311.15990v1
Compressor summary: The paper discusses alternatives to standard regularized training methods by directly estimating the most likely function implied by a model and data, which can improve generalization and robustness.
Yilun Liu,Difan Jiao,Ashton Anderson
http://arxiv.org/abs/2311.15983v1
Compressor summary: The Sparsify-then-Classify (STC) approach improves text classification performance by using all internal representations of Large Language Models with multiple pooling strategies and sparsifying task-specific features.
Yuanxun Lu,Jingyang Zhang,Shiwei Li,Tian Fang,David McKinnon,Yanghai Tsin,Long Quan,Xun Cao,Yao Yao
http://arxiv.org/abs/2311.15980v1
Compressor summary: The authors propose a novel method for generating diverse and high-fidelity 3D content using a multi-view 2.5D diffusion model that is fine-tuned from a pre-trained 2D diffusion model, without the need for score distillation sampling or extensive 3D training data.
Weiying Zhao,Natalia Efremova
http://arxiv.org/abs/2311.15979v1
Compressor summary: The study compared four Graph Neural Network operators to estimate soil organic carbon using satellite data and found that PESAGE and PETransformer models performed best, showing the potential of GNNs in predicting SOC.
Yan Xia,Letian Shi,Zifeng Ding,João F. Henriques,Daniel Cremers
http://arxiv.org/abs/2311.15977v1
Compressor summary: The Text2Loc neural network uses natural language descriptions and a coarse-to-fine localization pipeline to improve 3D point cloud localization accuracy by up to 2 times over previous methods.
Thanh-Dat Truong,Utsav Prabhu,Bhiksha Raj,Jackson Cothren,Khoa Luu
http://arxiv.org/abs/2311.15965v1
Compressor summary: The paper introduces a new Fairness Contrastive Clustering loss to address catastrophic forgetting and fairness in continual learning for semantic scene understanding, and proposes an attention-based visual grammar approach for background shift and unknown classes.
Anil Batra,Davide Moltisanti,Laura Sevilla-Lara,Marcus Rohrbach,Frank Keller
http://arxiv.org/abs/2311.15964v1
Compressor summary: Sieve-&-Swap is a technique to automatically filter and improve procedural video transcripts for better step localization and instruction generation with less computational resources.
Fabricio Breve
http://arxiv.org/abs/2311.15963v1
Compressor summary: The paper shows how convolutional neural networks can identify video games from single screenshots with high accuracy, using different architectures and initial weights.
Yu-An Lin,Chen-Tao Lee,Guan-Ting Liu,Pu-Jen Cheng,Shao-Hua Sun
http://arxiv.org/abs/2311.15960v1
Compressor summary: The paper introduces Program Machine Policies (POMPs), which combine programmatic RL and state machine policies to represent complex behaviors and address long-term tasks, outperforming previous methods on various tasks and generalizing inductively without fine-tuning.
Shuyue Stella Li,Beining Xu,Xiangyu Zhang,Hexin Liu,Wenhan Chao,Leibny Paola Garcia
http://arxiv.org/abs/2311.15954v1
Compressor summary: The study examines how self-supervised learning (SSL) models perform as feature extractors in cross-lingual settings and proposes a new metric, PSR, to measure their effectiveness using ASR performance.
Dhruva Tirumala,Thomas Lampe,Jose Enrique Chen,Tuomas Haarnoja,Sandy Huang,Guy Lever,Ben Moran,Tim Hertweck,Leonard Hasenclever,Martin Riedmiller,Nicolas Heess,Markus Wulfmeier
http://arxiv.org/abs/2311.15951v1
Compressor summary: Replay Across Experiments (RaE) is a simple framework that uses experience from previous experiments to improve RL performance, exploration, and bootstrapping while requiring minimal changes.
Antonio Di Cecco,Carlo Metta,Marco Fantozzi,Francesco Morandin,Maurizio Parton
http://arxiv.org/abs/2311.15947v1
Compressor summary: GloNet is a new architecture that helps deep neural networks work better at higher depths by uniformly connecting and regulating information flow across the network.
Tuan-Dung Le,Zhuqi Miao,Samuel Alvarado,Brittany Smith,William Paiva,Thanh Thieu
http://arxiv.org/abs/2311.15946v1
Compressor summary: The paragraph introduces a new dataset for extracting and analyzing mobility functioning information from clinical notes using BERT and CRF models.
Julia Balla
http://arxiv.org/abs/2311.15945v1
Compressor summary: The paper investigates whether using Riemannian manifolds of variable curvature in Hyperbolic Graph Neural Networks (HGNNs) can reduce over-squashing, a phenomenon where node features become insensitive to distant nodes in the graph.
Sicong Leng,Yang Zhou,Mohammed Haroon Dupty,Wee Sun Lee,Sam Conrad Joyce,Wei Lu
http://arxiv.org/abs/2311.15941v1
Compressor summary: The authors introduce a new dataset, model, and evaluation method for generating floor plans from natural language descriptions, aiming to advance the field of language-guided design generation.
Samuel Burbulla
http://arxiv.org/abs/2311.15940v1
Compressor summary: The paper proposes a method to improve physics-informed neural networks (PINNs) by incorporating geometric transformations, allowing them to handle complex or varying shapes better and enable shape optimization.
Zhongyi Shui,Yunlong Zhang,Kai Yao,Chenglu Zhu,Yuxuan Sun,Lin Yang
http://arxiv.org/abs/2311.15939v1
Compressor summary: The paper introduces a novel framework that uses a point prompter and a segment anything model (SAM) for automatic nuclear instance segmentation in histology images, achieving state-of-the-art results.
Sergio Izquierdo,Javier Civera
http://arxiv.org/abs/2311.15937v1
Compressor summary: SALAD is a new method for visual place recognition that uses optimal transport to aggregate local features, discards non-informative ones, and leverages a fast-learning backbone to achieve better performance than existing approaches.
Qixiao Hu,Shiquan Zhang,Chaolang Hu,Yuetong Liu
http://arxiv.org/abs/2311.15933v1
Compressor summary: The paper proposes a new method for multi-attribute group decision-making using TOPSIS and optimization models with interval-valued intuitionistic fuzzy sets, which combines subjective and objective weighting methods and is demonstrated on a real case study.
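TOPSIS itself is a classic multi-criteria decision-making procedure (rank alternatives by closeness to an ideal point and distance from an anti-ideal point); the sketch below shows the standard crisp version only, not the paper's interval-valued intuitionistic fuzzy extension:

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Classic TOPSIS ranking.

    matrix:  alternatives x criteria decision matrix
    weights: one weight per criterion
    benefit: benefit[j] is True if larger is better for criterion j
    Returns a closeness coefficient per alternative (higher is better).
    """
    X = np.asarray(matrix, dtype=float)
    # Vector-normalize each criterion column, then apply the weights.
    V = X / np.linalg.norm(X, axis=0) * np.asarray(weights, dtype=float)
    # Ideal and anti-ideal points, respecting each criterion's direction.
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)

# An alternative that dominates on every criterion gets closeness 1,
# one that is worst on every criterion gets 0.
scores = topsis([[5, 5], [1, 1], [3, 2]], [0.5, 0.5], [True, True])
```

The fuzzy variants replace the crisp matrix entries with interval-valued membership/non-membership pairs but keep this same ideal/anti-ideal ranking logic.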
Youssef Benchekroun,Megi Dervishi,Mark Ibrahim,Jean-Baptiste Gaya,Xavier Martinet,Grégoire Mialon,Thomas Scialom,Emmanuel Dupoux,Dieuwke Hupkes,Pascal Vincent
http://arxiv.org/abs/2311.15930v1
Compressor summary: WorldSense is a benchmark to test LLMs' ability to understand simple arrangements of entities, but current chat-LLMs make errors and have response biases even with three objects.
Alexander Tapley,Marissa Dotter,Michael Doyle,Aidan Fennelly,Dhanuj Gandikota,Savanna Smith,Michael Threet,Tim Welsh
http://arxiv.org/abs/2311.15925v1
Compressor summary: The paper introduces SimFire, a realistic wildfire simulator, and SimHarness, an agent-based machine learning system to generate land management strategies, to help prepare for and react to increasingly severe fire seasons due to climate change.
Henrik S. Steude,Lukas Moddemann,Alexander Diedrich,Jonas Ehrhardt,Oliver Niggemann
http://arxiv.org/abs/2311.15924v1
Compressor summary: The authors propose a method that combines deep learning-based anomaly detection with Consistency-Based Diagnosis for holistic diagnosis in Cyber-Physical Systems and show its effectiveness on simulated and real data.
Jianxiong Li,Shichao Lin,Tianyu Shi,Chujie Tian,Yu Mei,Jian Song,Xianyuan Zhan,Ruimin Li
http://arxiv.org/abs/2311.15920v1
Compressor summary: The paper proposes a data-driven framework for traffic signal control using machine learning and traffic flow theory to infer rewards from coarse-grained data and learn policies from historical datasets.
Elahe Vahdani,Yingli Tian
http://arxiv.org/abs/2311.15916v1
Compressor summary: The paper introduces ADM-Loc, a novel framework for detecting actions in videos with limited annotations, by generating action proposals from a composite distribution and enforcing consistency in action classification scores.
Ari Goodman,Gurpreet Singh,James Hing,Ryan O'Shea
http://arxiv.org/abs/2311.15914v1
Compressor summary: PATRIOT is a prototype system that uses existing camera feeds and passive sensing to automatically track and update aircraft positions on a virtual Ouija board interface, improving deck tracking efficiency and safety without GPS sensors.
Ari Goodman,Ryan O'Shea
http://arxiv.org/abs/2311.15912v1
Compressor summary: LIFT OFF is a hybrid framework that uses machine vision, GPS sensors, and LoRaWAN to provide real-time situational awareness of people, equipment, and aircraft positions in various environments, including military flightlines.
Claudio Rota,Marco Buzzelli,Joost van de Weijer
http://arxiv.org/abs/2311.15908v1
Compressor summary: The paper proposes StableVSR, a method that uses Diffusion Models and Temporal Conditioning Module to enhance the quality of upscaled videos by synthesizing realistic and temporally-consistent details.
Can Sun,Hao Zheng,Zhigang Hu,Liu Yang,Meiguang Zheng,Bo Xu
http://arxiv.org/abs/2311.15906v1
Compressor summary: MetaDefa is a novel meta-learning method that improves SDG model generalization by enhancing domains and aligning features using background substitution, visual corruptions, and class activation maps.
Evgenii Davydkin,Aleksandr Markelov,Egor Iuldashev,Anton Dudkin,Ivan Krivorotov
http://arxiv.org/abs/2311.15896v1
Compressor summary: The paper proposes a novel method to generate realistic synthetic Cyrillic handwriting and use it to create a large dataset for training a post-OCR correction model, which can improve error identification and evaluation of student performance.
Theodor Westny,Arman Mohammadi,Daniel Jung,Erik Frisk
http://arxiv.org/abs/2311.15890v1
Compressor summary: The paper explores how different aspects of neural ODE training impact performance and introduces a new initialization technique based on stability regions.
D. M. Bot,J. Peeters,J. Liesenborgs,J. Aerts
http://arxiv.org/abs/2311.15887v1
Compressor summary: FLASC is a flare-sensitive clustering algorithm that improves upon HDBSCAN* by differentiating branches within detected clusters and offering two variants with varying computational cost and noise robustness.
Jiaxuan Li,Duc Minh Vo,Akihiro Sugimoto,Hideki Nakayama
http://arxiv.org/abs/2311.15879v1
Compressor summary: EVCap is a retrieval-augmented image captioning method that uses external visual-name memory to enable LLMs to describe novel objects without relying on large amounts of data or scaling up network parameters.
Kwanyoung Kim,Yujin Oh,Sangjoon Park,Hwa Kyung Byun,Jin Sung Kim,Yong Bae Kim,Jong Chul Ye
http://arxiv.org/abs/2311.15876v1
Compressor summary: RO-LLaMA is a versatile AI model that can handle various tasks in radiation oncology, thanks to the CEFTune technique and LLM-driven segmentation framework.
Zhenzhi Wang,Jingbo Wang,Dahua Lin,Bo Dai
http://arxiv.org/abs/2311.15864v1
Compressor summary: InterControl is a novel approach that uses motion diffusion models and controlnets to generate realistic human interactions with flexible spatial control of every joint.
Hsuan-I Ho,Jie Song,Otmar Hilliges
http://arxiv.org/abs/2311.15855v1
Compressor summary: SiTH is a novel pipeline that uses an image-conditioned diffusion model to create lifelike and detailed 3D human reconstructions from single images by decomposing the problem into hallucination and reconstruction subproblems.
Balazs Kegl
http://arxiv.org/abs/2311.15854v1
Compressor summary: The authors compare different hyperparameter optimization engines using normalization and aggregation methods and identify three top-performing engines.
Zongwei Wu,Jilai Zheng,Xiangxuan Ren,Florin-Alexandru Vasluianu,Chao Ma,Danda Pani Paudel,Luc Van Gool,Radu Timofte
http://arxiv.org/abs/2311.15851v1
Compressor summary: Un-Track is a single transformer-based tracker that learns a common latent space for multiple modalities using RGB-X pairs and achieves significant F-score gains on various datasets without modality-specific fine-tuning.
Lei Wang,Qingbo Wu,Desen Yuan,King Ngi Ngan,Hongliang Li,Fanman Meng,Linfeng Xu
http://arxiv.org/abs/2311.15846v1
Compressor summary: The paper proposes a method for learning robust image quality assessment models from low-cost opinion scores, which can perform well even with noisy and limited data.
Siteng Huang,Biao Gong,Yutong Feng,Xi Chen,Yuqian Fu,Yu Liu,Donglin Wang
http://arxiv.org/abs/2311.15841v1
Compressor summary: The study introduces a new method called Action-Disentangled Identifier (ADI) for text-to-image generation that improves action customization by learning action-specific identifiers and blocking the inversion of irrelevant features.
Alexander Tapley,Kyle Gatesman,Luis Robaina,Brett Bissey,Joseph Weissman
http://arxiv.org/abs/2311.15838v1
Compressor summary: ARLIN Toolkit helps identify weaknesses in Deep Reinforcement Learning models using clear explanations and visualizations, making them safer to use in real situations.
Léo Lebrat,Rodrigo Santa Cruz,Remi Chierchia,Yulia Arzhaeva,Mohammad Ali Armin,Joshua Goldsmith,Jeremy Oorloff,Prithvi Reddy,Chuong Nguyen,Lars Petersson,Michelle Barakat-Johnson,Georgina Luscombe,Clinton Fookes,Olivier Salvado,David Ahmedt-Aristizabal
http://arxiv.org/abs/2311.15836v1
Compressor summary: Syn3DWound is an open-source dataset of realistic simulated wounds with annotations, aimed at improving machine learning-based wound management through image analysis.
Marius Bock,Michael Moeller,Kristof Van Laerhoven
http://arxiv.org/abs/2311.15831v1
Compressor summary: This paper shows how temporal attention models can improve wearable human activity recognition using raw data, outperforming previous methods by up to 25%.
Soyed Tuhin Ahmed,Kamal Danouchi,Michael Hefenbrock,Guillaume Prenat,Lorena Anghel,Mehdi B. Tahoori
http://arxiv.org/abs/2311.15816v1
Compressor summary: The paper proposes Scale Dropout, a novel regularization technique for binary neural networks, and Monte Carlo-Scale Dropout-based Bayesian neural networks for efficient uncertainty estimation on spintronic memory-based computation-in-memory architectures with significant energy savings.
Yu Lu,Linchao Zhu,Hehe Fan,Yi Yang
http://arxiv.org/abs/2311.15813v1
Compressor summary: FlowZero is a novel framework that uses LLMs and image diffusion models to generate temporally-coherent videos from complex spatio-temporal text descriptions.
Avigyan Bhattacharya,Mainak Singha,Ankit Jha,Biplab Banerjee
http://arxiv.org/abs/2311.15812v1
Compressor summary: C-SAW is a method that improves CLIP's performance in analyzing optical remote sensing images by enhancing visual features and prompt learning, addressing domain and content variations.
Marjan FatehiJananloo,Helen Stopps,J. J. McArthur
http://arxiv.org/abs/2311.15807v1
Compressor summary: The study reviewed 17 articles using machine learning and AI to predict hospital energy consumption, finding that occupancy and meteorological data are significant factors, while highlighting the need for more research on optimizing methods and integrating real-time data.
Edouard Yvinec,Arnaud Dapogny,Kevin Bailly
http://arxiv.org/abs/2311.15806v1
Compressor summary: PIPE is a data-free quantization method for deep neural networks that adapts well to different devices and achieves good accuracy-speed trade-offs using residual error expansion, group sparsity, and ensemble approximation.
Quentin Herau,Nathan Piasco,Moussab Bennehar,Luis Roldão,Dzmitry Tsishkou,Cyrille Migniot,Pascal Vasseur,Cédric Demonceaux
http://arxiv.org/abs/2311.15803v1
Compressor summary: The paper proposes a NeRF-based sensor calibration method for autonomous driving that uses overlapping areas and improves accuracy and robustness.
Lukas Wutschitz,Boris Köpf,Andrew Paverd,Saravan Rajmohan,Ahmed Salem,Shruti Tople,Santiago Zanella-Béguelin,Menglin Xia,Victor Rühle
http://arxiv.org/abs/2311.15792v1
Compressor summary: The authors propose using metadata in machine learning systems to address security and privacy issues and compare two methods for achieving user-level non-interference, finding that retrieval augmented models provide the best balance of utility, scalability, and flexibility.
Shaohua Wu,Xudong Zhao,Shenling Wang,Jiangang Luo,Lingjun Li,Xi Chen,Bing Zhao,Wei Wang,Tong Yu,Rongguo Zhang,Jiahua Zhang,Chao Wang
http://arxiv.org/abs/2311.15786v1
Compressor summary: The paper describes Yuan 2.0, a new language model that uses local dependencies in natural language to improve attention, has a large number of parameters, and performs well in tasks such as code generation and math problem-solving.
Svetlana Pavlitska,Hannes Grolig,J. Marius Zöllner
http://arxiv.org/abs/2311.15782v1
Compressor summary: The paper reviews how different techniques to make neural networks smaller can affect their ability to resist attacks, but the results are not consistent.
Simone Conia,Min Li,Daniel Lee,Umar Farooq Minhas,Ihab Ilyas,Yunyao Li
http://arxiv.org/abs/2311.15781v1
Compressor summary: The authors propose a new task called Knowledge Graph Enhancement (KGE) that aims to improve the quality and quantity of textual information for non-English entity names and descriptions in Wikidata using a novel unsupervised approach, M-NTA, which combines Machine Translation, Web Search, and Large Language Models. They also introduce WikiKGE-10, the first benchmark to evaluate KGE methods across 10 languages and 7 language families.
Qi Fan,Xin Tao,Lei Ke,Mingqiao Ye,Yuan Zhang,Pengfei Wan,Zhongyuan Wang,Yu-Wing Tai,Chi-Keung Tang
http://arxiv.org/abs/2311.15776v1
Compressor summary: The paper analyzes how well SAM can segment objects with low-quality prompts and proposes Stable-SAM, which adjusts feature sampling locations to improve stability without changing the model architecture or adding many parameters.
Biao Gong,Siteng Huang,Yutong Feng,Shiwei Zhang,Yuyuan Li,Yu Liu
http://arxiv.org/abs/2311.15773v1
Compressor summary: SimM is a system that adjusts image generation to match textual layout instructions without needing additional training, using a pipeline of error detection and rectification with minimal overhead.
Xinglin Li,Kun Wang,Hanhui Deng,Yuxuan Liang,Di Wu
http://arxiv.org/abs/2311.15772v1
Compressor summary: The paper proposes Shock Absorber, a perturbation technique that enhances graph neural networks' robustness and stability by generating synthetic graphs with minimal additional time overhead.
Huanjin Yao,Wenhao Wu,Zhiheng Li
http://arxiv.org/abs/2311.15769v1
Compressor summary: The paper introduces Side4Video, a method for memory-efficient fine-tuning of large vision models to video understanding using a lightweight spatial-temporal side network.
Nianwen Si,Hao Zhang,Heyu Chang,Wenlin Zhang,Dan Qu,Weiqiang Zhang
http://arxiv.org/abs/2311.15766v1
Compressor summary: The paper discusses knowledge unlearning as a solution to mitigate risks associated with large language models' potential to retain harmful knowledge without compromising their capabilities.
Yunxin Li,Baotian Hu,Wei Wang,Xiaochun Cao,Min Zhang
http://arxiv.org/abs/2311.15759v1
Compressor summary: The paper proposes MKS2, an approach to improve multimodal language models by enhancing their visual memory and collaboration abilities, leading to better reasoning and performance on benchmarks.
Gabriele D'Acunto,Paolo Di Lorenzo,Francesco Bonchi,Stefania Sardellitti,Sergio Barbarossa
http://arxiv.org/abs/2311.15756v1
Compressor summary: The paper proposes two methods to learn partial correlations between time series across different frequency bands, and shows their effectiveness on synthetic and financial data.
Minghui Hu,Jianbin Zheng,Chuanxia Zheng,Chaoyue Wang,Dacheng Tao,Tat-Jen Cham
http://arxiv.org/abs/2311.15744v1
Compressor summary: The text proposes a new method called One More Step (OMS) to improve image quality in diffusion models by integrating a compact network and an additional step during inference while preserving original model parameters.
Auvick Chandra Bhowmik,Dr. Md. Taimur Ahad,Yousuf Rayhan Emon
http://arxiv.org/abs/2311.15741v1
Compressor summary: The paper reviews image processing techniques and Vision Transformer models used for detecting plant leaf diseases, with potential applications for jamun leaf disease detection.
Mariana Dias,Carla Teixeira Lopes
http://arxiv.org/abs/2311.15740v1
Compressor summary: The paper evaluates how image processing methods and parameter tuning in Optical Character Recognition (OCR) can improve the recognition of text in images of typewritten cultural heritage documents.
Wenhao Wu,Huanjin Yao,Mengxi Zhang,Yuxin Song,Wanli Ouyang,Jingdong Wang
http://arxiv.org/abs/2311.15732v1
Compressor summary: The paper evaluates GPT-4's linguistic and visual capabilities in zero-shot recognition tasks across images, videos, and point clouds, finding improved performance with rich textual descriptions.
Michael Adjeisah,Kwame Omono Asamoah,Martha Asamoah Yeboah,Raji Rafiu King,Godwin Ferguson Achaab,Kingsley Adjei
http://arxiv.org/abs/2311.15728v1
Compressor summary: The researchers created a dataset and model to recognize and classify Adinkra symbols, an example of using AI for cultural preservation and community empowerment.
Mengxi Zhang,Yiming Liu,Xiangjun Yin,Huanjing Yue,Jingyu Yang
http://arxiv.org/abs/2311.15727v1
Compressor summary: MARIS is a referring image segmentation method that uses the Segment Anything Model and mutual-aware attention to improve cross-modal fusion for more accurate segmentation.
Kamyar Zeinalipour,Tommaso Iaquinta,Asya Zanollo,Giovanni Angelini,Leonardo Rigutini,Marco Maggini,Marco Gori
http://arxiv.org/abs/2311.15723v1
Compressor summary: The paper describes how advanced language models can be used to generate and verify educational crossword clues, enhancing student engagement and learning outcomes.
Zeren Tan,Yang Tian,Jian Li
http://arxiv.org/abs/2311.15722v1
Compressor summary: GLIME is an improved method for explaining machine learning models that addresses LIME's instability and low local fidelity through a faster-converging, local, and unbiased sampling distribution and flexible sampling choices.
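For context, the baseline LIME recipe that GLIME refines works by sampling perturbations around an instance, weighting them by proximity, and fitting a weighted linear surrogate. A minimal generic sketch of that baseline (my own illustration; the sampling scale and kernel width here are arbitrary choices, not values from either paper):

```python
import numpy as np

def lime_explain(predict, x, n_samples=2000, width=0.75, seed=0):
    """Minimal LIME-style local explanation for a tabular instance x.

    Samples perturbations around x, weights them with an RBF proximity
    kernel, and fits a weighted linear surrogate; returns the per-feature
    coefficients of that surrogate as local importances.
    """
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    y = predict(Z)
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / width ** 2)
    # Weighted least squares with an intercept column.
    A = np.hstack([Z, np.ones((n_samples, 1))]) * np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A, y * np.sqrt(w), rcond=None)
    return coef[:-1]

# A model that only uses feature 0 should get local weights ~[2, 0].
local_weights = lime_explain(lambda Z: 2.0 * Z[:, 0], np.array([1.0, 1.0]))
```

GLIME's changes target exactly the pieces hard-coded above: the sampling distribution (made local and unbiased) and its convergence behavior.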
Benjamin Keel,Aaron Quyn,David Jayne,Samuel D. Relton
http://arxiv.org/abs/2311.15719v1
Compressor summary: The study uses generative AI models to analyze lung cancer lesions from CT scans and develop an interpretable classifier with high accuracy and a clear latent space.
Sabine Wehnert
http://arxiv.org/abs/2311.15716v1
Compressor summary: The author proposes using Justifiable AI to increase trust in Large Language Models' legal outputs by gathering evidence for and against their predictions.
Jiehong Lin,Lihua Liu,Dekun Lu,Kui Jia
http://arxiv.org/abs/2311.15707v1
Compressor summary: SAM-6D is a framework with two sub-networks, for instance segmentation and pose estimation, that detects novel objects in cluttered scenes and estimates their 6D poses, outperforming existing methods on the BOP Benchmark datasets.
Federico A. Galatolo,Mario G. C. A. Cimino
http://arxiv.org/abs/2311.15698v1
Compressor summary: The study presents a novel method to generate high-quality chat corpora using a generator and embedder LLM, evaluate them with a new MLM metric, and improve the performance of an Italian LLM.
Bogdan Ficiu,Neil D. Lawrence,Andrei Paleyes
http://arxiv.org/abs/2311.15691v1
Compressor summary: The authors propose a pipeline called PFairDP, which uses Bayesian optimization to find Pareto-optimal points between fairness, privacy, and utility of machine learning models in a multi-objective optimization problem.
Laurent Bonnasse-Gahot,Jean-Pierre Nadal
http://arxiv.org/abs/2311.15682v1
Compressor summary: The paragraph discusses an information theoretic approach to evaluate the efficiency of category learning in biological and artificial neural networks, focusing on the coding and decoding costs and the expansion of neural space near decision boundaries.
Maurice Günder,Sneha Banerjee,Rafet Sifa,Christian Bauckhage
http://arxiv.org/abs/2311.15679v1
Compressor summary: The authors propose a framework to use sampling-based explanation methods in pedestrian detection and introduce a new method similar to KernelSHAP that is more efficient for large-scale datasets.
Cédric Goemaere,Johannes Deleu,Thomas Demeester
http://arxiv.org/abs/2311.15673v1
Compressor summary: The paper proposes two strategies to speed up memory retrieval in Hierarchical Associative Memory models, which are a type of neural network, by using faster solvers and alternating optimization of layers.
Xihe Yang,Xingyu Chen,Shaohui Wang,Daiheng Gao,Xiaoguang Han,Baoyuan Wang
http://arxiv.org/abs/2311.15672v1
Compressor summary: The paper proposes HaveFun, a framework for reconstructing human avatars from few-shot unconstrained photos using a tetrahedral representation and a two-phase optimization method.
Aymen Merrouche,Joao Regateiro,Stefanie Wuhrer,Edmond Boyer
http://arxiv.org/abs/2311.15668v1
Compressor summary: The paper proposes a robust unsupervised method for matching shapes with fine details and different types of noise using a hierarchical patch representation and a near-rigid deformation model.
Pengfei Zheng,Kanokphan Lertniphonphan,Feng Chen,Siwei Chen,Bingchuan Sun,Jun Xie,Zhepeng Wang
http://arxiv.org/abs/2311.15660v1
Compressor summary: The paper introduces a LiDAR-based 4D occupancy forecasting method that outperforms the baseline and ranks first in Argoverse Challenges at CVPR 2023.
Jeongsol Kim,Geon Yeong Park,Hyungjin Chung,Jong Chul Ye
http://arxiv.org/abs/2311.15658v1
Compressor summary: The authors propose a new method called TReg that uses textual descriptions to help solve ill-posed inverse problems in latent diffusion models, improving their performance and accuracy.
Chaofeng Chen,Annan Wang,Haoning Wu,Liang Liao,Wenxiu Sun,Qiong Yan,Weisi Lin
http://arxiv.org/abs/2311.15657v1
Compressor summary: Text-to-image diffusion models can be improved by fine-tuning the text encoder using reinforcement learning, leading to better text-image alignment and visual quality.
Qianlong Du,Chengqing Zong,Jiajun Zhang
http://arxiv.org/abs/2311.15653v1
Compressor summary: The paper proposes a MoDS approach to select high-quality and necessary instruction data for fine-tuning LLMs, outperforming the full original dataset.
Aboli Marathe
http://arxiv.org/abs/2311.15648v1
Compressor summary: The paper introduces two models for image generation using model-agnostic learning, RLDF and noisy diffusion gradient, which use a special CFG encoding to guide semantic priors and produce high-quality images from single input images.
Thomas Kleine Buening,Aadirupa Saha,Christos Dimitrakakis,Haifeng Xu
http://arxiv.org/abs/2311.15647v1
Compressor summary: The paper proposes a learning algorithm for online recommendation systems that considers both click-through rates and post-click rewards, and designs an incentive mechanism to encourage desirable arm behavior while minimizing regret.
Hao-Bin Duan,Miao Wang,Yan-Xun Li,Yong-Liang Yang
http://arxiv.org/abs/2311.15637v1
Compressor summary: PaintNeSF is a new technique that uses vector strokes to create stylized 3D images from multi-view 2D images, optimizing stroke parameters with gradient descent and maintaining consistent appearance across views.
Giovanni Angelini,Marco Ernandes,Tommaso Iaquinta,Caroline Stehlé,Fanny Simões,Kamyar Zeinalipour,Andrea Zugarini,Marco Gori
http://arxiv.org/abs/2311.15626v1
Compressor summary: The authors present a crossword solver for French that uses multiple modules to find candidate answers from various sources and performs well compared to humans in challenges.
Renkai Wu,Yinghao Liu,Pengchen Liang,Qing Chang
http://arxiv.org/abs/2311.15625v1
Compressor summary: The paper proposes MHA-UNet, a model that uses high-order attention interaction to segment skin lesions and detect their presence or absence in an explainable way without needing negative samples.
Xiaohan Feng,Xixin Wu,Helen Meng
http://arxiv.org/abs/2311.15623v1
Compressor summary: The paper proposes an unsupervised method to improve BERT's performance and interpretability in dialogue state tracking tasks using linguistic knowledge extracted from conversations.
Yifei Chen,Dapeng Chen,Ruijin Liu,Sai Zhou,Wenyuan Xue,Wei Peng
http://arxiv.org/abs/2311.15619v1
Compressor summary: The paper proposes an "Align before Adapt" paradigm for video action recognition that matches region-aware image embeddings to a text corpus and exploits the visual-language alignment of VLP during adaptation, bridging the gap to complex activity semantics.
Zhepeng Wang,Feng Chen,Kanokphan Lertniphonphan,Siwei Chen,Jinyao Bao,Pengfei Zheng,Jinbao Zhang,Kaer Huang,Tao Zhang
http://arxiv.org/abs/2311.15615v1
Compressor summary: The report introduces Le3DE2E, a unified network for sensor-based detection, tracking, and forecasting in autonomous driving, which achieved 1st place in Argoverse Challenges at CVPR 2023 WAD.
Ruixuan Xiao,Yiwen Dong,Junbo Zhao,Runze Wu,Minmin Lin,Gang Chen,Haobo Wang
http://arxiv.org/abs/2311.15614v1
Compressor summary: The authors propose a collaborative learning framework called FreeAL that uses a large language model as an active annotator and a small language model as a student to distill and filter task-specific knowledge, reducing annotation cost and improving zero-shot performance.
Jialin Liu,Lu Yan,Xiaowei Liu,Yuzhuo Dai,Fanggen Lu,Yuanting Ma,Muzhou Hou,Zheng Wang
http://arxiv.org/abs/2311.15609v1
Compressor summary: The paragraph describes a study that used image processing of high-resolution manometry data to predict esophageal contraction vigor and make the evaluation of esophageal dynamic function easier and more accurate.
Ozan Unal,Dengxin Dai,Lukas Hoyer,Yigit Baran Can,Luc Van Gool
http://arxiv.org/abs/2311.15605v1
Compressor summary: IGNet is a method for weakly-supervised LiDAR semantic segmentation that uses RGB images to compensate for boundary estimation and false negative issues, achieving state-of-the-art results with minimal annotations.
Xiaohan Ding,Yiyuan Zhang,Yixiao Ge,Sijie Zhao,Lin Song,Xiangyu Yue,Ying Shan
http://arxiv.org/abs/2311.15599v1
Compressor summary: The paper proposes architectural guidelines for large-kernel ConvNets, which outperform conventional ConvNets in image recognition and show universal perception ability across modalities.
Sijie Cheng,Zhicheng Guo,Jingwen Wu,Kechen Fang,Peng Li,Huaping Liu,Yang Liu
http://arxiv.org/abs/2311.15596v1
Compressor summary: EgoThink is a new visual question-answering test for vision-language models that assesses their first-person perspective abilities using egocentric video clips, which can help improve autonomous agents and robotics.
Suorong Yang,Geng Zhang,Jian Zhao,Furao Shen
http://arxiv.org/abs/2311.15583v1
Compressor summary: The paper proposes a simple geometry-aware interpolation algorithm for indoor positioning that exploits the local topological manifold using manifold learning principles, improving accuracy and efficiency over existing methods.
Sudheer Achary,Rohit Girmaji,Adhiraj Anil Deshmukh,Vineet Gandhi
http://arxiv.org/abs/2311.15581v1
Compressor summary: Real Time GAZED is a novel system that allows users to create high-quality, professionally edited videos in real-time by combining the GAZED framework with CineFilter, a new camera trajectory stabilization technique.
Hailin Zhang,Penghao Zhao,Xupeng Miao,Yingxia Shao,Zirui Liu,Tong Yang,Bin Cui
http://arxiv.org/abs/2311.15578v1
Compressor summary: The paper compares 14 embedding compression methods in machine learning tasks, evaluates their performance under different memory budgets, and recommends the best approach for each use case.
Cindy Le,Congrui Hetang,Ang Cao,Yihui He
http://arxiv.org/abs/2311.15573v1
Compressor summary: The paper introduces a new way to create realistic textures for 3D models using text descriptions and depth information, and shows that it outperforms existing methods in quality, diversity, and speed.
Yongjin Yang,Jongwoo Ko,Se-Young Yun
http://arxiv.org/abs/2311.15569v1
Compressor summary: This paper explores how vision-language models (VLMs) use prompts and adapters for image classification tasks, and proposes an adaptive ensemble method to improve generalization across domains.
Finbarrs Oketunji
http://arxiv.org/abs/2311.15565v1
Compressor summary: The paragraph describes a research project that employs advanced deep learning models to distinguish AI-generated texts from human-written ones using a diverse dataset and natural language processing techniques.
Fan Jiang,Qiongkai Xu,Tom Drummond,Trevor Cohn
http://arxiv.org/abs/2311.15564v1
Compressor summary: ABEL is a simple unsupervised method to enhance passage retrieval by iteratively improving a dense retriever and a reranker, achieving strong results on BEIR benchmark and adapting well to new tasks and domains.
Fan Jiang,Tom Drummond,Trevor Cohn
http://arxiv.org/abs/2311.15563v1
Compressor summary: The paper introduces a new self-training framework that improves neural retrievers without external data and shows better performance on different benchmarks, even with limited training data.
Chongyan Chen,Mengchen Liu,Noel Codella,Yunsheng Li,Lu Yuan,Danna Gurari
http://arxiv.org/abs/2311.15562v1
Compressor summary: The paper introduces VQAonline, a new VQA dataset with longer answers from online forums, and evaluates six models on it.
Yiming Chen,Zhiqi Li,Peidong Liu
http://arxiv.org/abs/2311.15561v1
Compressor summary: The authors propose a fast text-to-3D generation method that uses images from a pre-trained text-to-image diffusion model to train a 3D generative network, which takes only about 8 milliseconds per 3D asset.
Jiquan Yuan,Xinyan Cao,Changjin Li,Fanyi Yang,Jinlong Lin,Xixin Cao
http://arxiv.org/abs/2311.15556v1
Compressor summary: The text introduces a new database (PKU-I2IQA) and two benchmark models for evaluating the quality of AI-generated images in various scenarios.
Jiang Liu,Chen Wei,Yuxiang Guo,Heng Yu,Alan Yuille,Soheil Feizi,Chun Pong Lau,Rama Chellappa
http://arxiv.org/abs/2311.15551v1
Compressor summary: Instruct2Attack (I2A) is a language-guided semantic attack that uses latent diffusion models to generate natural and diverse adversarial examples based on image and text instructions, breaking state-of-the-art neural networks even under strong defenses.
Haoqiang Kang,Xiao-Yang Liu
http://arxiv.org/abs/2311.15548v1
Compressor summary: The paper investigates and proposes solutions for the problem of large language models hallucinating or making up information when performing financial tasks.
Yuxuan Duan,Jianfu Zhang,Liqing Zhang
http://arxiv.org/abs/2311.15547v1
Compressor summary: The paper proposes a new method for dataset distillation using latent space to address problems with time and space complexity and info-compactness, enabling better compression and performance.
Zeyang Zhang,Xingwang Li,Fei Teng,Ning Lin,Xueling Zhu,Xin Wang,Wenwu Zhu
http://arxiv.org/abs/2311.15545v1
Compressor summary: The paper presents DyG-HAP, a framework that uses dynamic graph regression and attention to capture invariant and variant patterns and accurately predict human albumin levels for critically ill ICU patients, along with a new dataset (ANIC) for evaluating albumin prediction methods.
Sue Lim,Ralf Schmälzle
http://arxiv.org/abs/2311.15544v1
Compressor summary: This paper explores how people's evaluation of AI-generated health prevention messages changes depending on whether they know the source is AI or human, and how their negative attitudes towards AI affect this preference.
Tong Zhang,Haoyang Liu,Peiyan Zhang,Yuxuan Cheng,Haohan Wang
http://arxiv.org/abs/2311.15543v1
Compressor summary: The text introduces Simple-SVG-Generation (S²VG²), a method that generates accurate and simple SVGs for images, improving readability and interpretability compared to previous methods.
Xiang Li,Long Lan,Husam Lahza,Shaowu Yang,Shuihua Wang,Wenjing Yang,Hengzhu Liu,Yudong Zhang
http://arxiv.org/abs/2311.15540v1
Compressor summary: EAFP-Med is a module that uses language models to adaptively process lesion features in different medical imaging technologies, improving disease detection performance.
Bin Xie,Jiale Cao,Jin Xie,Fahad Shahbaz Khan,Yanwei Pang
http://arxiv.org/abs/2311.15537v1
Compressor summary: The paper proposes SED, an encoder-decoder model for open-vocabulary semantic segmentation that uses a hierarchical backbone and category early rejection to improve efficiency and accuracy.
Jia Li,Yanyan Shen,Lei Chen,Charles Wang Wai NG
http://arxiv.org/abs/2311.15530v1
Compressor summary: SSIN is a novel self-supervised learning framework for rainfall spatial interpolation that uses SpaFormer, a Transformer-based model with random masking, to learn embeddings and model spatial correlations, outperforming state-of-the-art solutions on two real-world datasets and generalizing to traffic spatial interpolation.
Jianyang Gu,Saeed Vahidian,Vyacheslav Kungurtsev,Haonan Wang,Wei Jiang,Yang You,Yiran Chen
http://arxiv.org/abs/2311.15529v1
Compressor summary: The paper proposes a new method to reduce the storage and computational cost of training networks using generative diffusion techniques that enhance representativeness and diversity, achieving better performance with less distillation time compared to previous methods.
Mai-Vu Tran,Hoang-Quynh Le,Duy-Cat Can,Quoc-An Nguyen
http://arxiv.org/abs/2311.15525v1
Compressor summary: The paper describes the VLSP 2022 shared task on Vietnamese abstractive multi-document summarization (Abmusu) and presents a human-annotated dataset of Vietnamese news documents in 8 categories.
Shashidhar Reddy Javaji,Haoran Hu,Sai Sameer Vennam,Vijaya Gajanan Buddhavarapu
http://arxiv.org/abs/2311.15513v1
Compressor summary: Question-answer generation using NLP models is widely used in various applications, improving customer satisfaction and ease of use, though it can be affected by human errors.
Yonghao Dong,Le Wang,Sanpin Zhou,Gang Hua,Changyin Sun
http://arxiv.org/abs/2311.15512v1
Compressor summary: TSNet is a novel network for pedestrian trajectory prediction in autonomous driving that uses a sparse character graph to learn and remove harmful negative character information, achieving state-of-the-art performance.
Haidong Zhu,Tianyu Ding,Tianyi Chen,Ilya Zharkov,Ram Nevatia,Luming Liang
http://arxiv.org/abs/2311.15510v1
Compressor summary: CaesarNeRF is an end-to-end approach that combines scene-level and pixel-level representations to improve few-shot, generalizable neural rendering with a holistic understanding of scenes.
Hanjie Zhao,Jinge Xie,Yuchen Yan,Yuxiang Jia,Yawen Ye,Hongying Zan
http://arxiv.org/abs/2311.15509v1
Compressor summary: The authors build a large corpus of annotated named entities from different genres of Chinese novels and study the characteristics, genre differences, and challenges of named entity recognition in literature.
Elijah Rippeth,Marine Carpuat,Kevin Duh,Matt Post
http://arxiv.org/abs/2311.15507v1
Compressor summary: The authors propose a simple and scalable way to resolve translation ambiguity in neural machine translation using extra-sentential context without sense annotation or model changes, and evaluate their method on a new challenge set.
Wei Wang,Takashi Ishida,Yu-Jie Zhang,Gang Niu,Masashi Sugiyama
http://arxiv.org/abs/2311.15502v1
Compressor summary: The paper proposes a novel complementary-label learning method that doesn't need uniform distribution assumption or ordinary-label training set, uses negative-unlabeled binary classification, and has theoretical guarantees and experimental validation.
Patrick Hajali,Ignas Budvytis
http://arxiv.org/abs/2311.15500v1
Compressor summary: The authors present a technique for using user-provided code and generating modular sub-functions to aid LLMs in solving programming tasks, as well as introducing a new evaluation method for assessing their performance.
Gabriel De Araujo,Shanlin Sun,Xiaohui Xie
http://arxiv.org/abs/2311.15497v1
Compressor summary: The paper proposes a new image registration method that combines learning and optimization to improve accuracy, efficiency, and smoothness.
Xi Wang,Xianyao Ling,Tom Zhang,Xuecao Li,Shaolan Wang,Zhixing Li,Liang Zhang,Peng Gong
http://arxiv.org/abs/2311.15490v1
Compressor summary: The study uses ChatGLM to generate QA datasets for urban renewal, then fine-tunes it with Prefix and LoRA methods to improve performance in knowledge QA tasks.
Thomas Chen
http://arxiv.org/abs/2311.15487v1
Compressor summary: The paper introduces two modified versions of gradient descent flow for different levels of over- and under-parametrization in Deep Learning, with invariant geometric meanings and proven convergence properties.
Callie C. Liao,Duoduo Liao,Jesse Guessford
http://arxiv.org/abs/2311.15480v1
Compressor summary: The paper proposes a novel method using lyrics as input to generate time signatures for lyrical songs, discovering patterns and utilizing explainable machine learning models with high accuracy.
Divya Kothandaraman,Tianyi Zhou,Ming Lin,Dinesh Manocha
http://arxiv.org/abs/2311.15478v1
Compressor summary: AerialBooth is a new method that can generate aerial views from a single image based on its text description, using a pretrained model and mutual information guidance.
Kam Woh Ng,Xiatian Zhu,Yi-Zhe Song,Tao Xiang
http://arxiv.org/abs/2311.15477v1
Compressor summary: DreamCreature is a novel method that generates new hybrid creatures by extracting sub-concepts from unlabeled images and composing them in a text-to-image model.
Yawar Siddiqui,Antonio Alliegro,Alexey Artemov,Tatiana Tommasi,Daniele Sirigatti,Vladislav Rosov,Angela Dai,Matthias Nießner
http://arxiv.org/abs/2311.15475v1
Compressor summary: MeshGPT is a new method for generating compact triangle meshes using a sequence-based approach inspired by large language models, which improves upon existing methods with better shape coverage and lower FID scores.