arxiv compressed, 2024-01-25

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-25, generated by the compressor, my personal LLM-based project.


Algebraic methods for solving recognition problems with non-crossing classes

http://arxiv.org/abs/2401.13666v1

Compressor summary: The paper explores pattern recognition models using two operators and algebraic operations to create a family of algorithms with guaranteed completeness.


The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations

http://arxiv.org/abs/2401.13662v1

Compressor summary: This paper gives an overview of on-policy policy gradient algorithms in deep reinforcement learning, including their theory, implementation, and comparison on continuous control environments.


MambaByte: Token-free Selective State Space Model

http://arxiv.org/abs/2401.13660v1

Compressor summary: MambaByte is a token-free language model that performs well on byte sequences and competes with subword Transformers while offering faster inference.
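
To make "token-free" concrete, here is a minimal illustration (my own example, not code from the paper): a byte-level model's vocabulary is just the 256 possible byte values, so UTF-8 text maps directly to integer IDs without any subword tokenizer.

    # Byte-level "tokenization": the vocabulary is the 256 possible byte values.
    text = "MambaByte reads raw bytes."
    byte_ids = list(text.encode("utf-8"))    # integers in 0..255
    print(byte_ids[:8])                      # [77, 97, 109, 98, 97, 66, 121, 116]
    print(bytes(byte_ids).decode("utf-8"))   # lossless round trip back to the text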


Inadequacy of common stochastic neural networks for reliable clinical decision support

http://arxiv.org/abs/2401.13657v1

Compressor summary: This study examines stochastic deep learning methods for uncertainty estimation in mortality prediction for ICU patients and finds that they underestimate epistemic uncertainty and produce overconfident models, limiting their reliability for clinical decision support.
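
For context, a standard way to separate epistemic from aleatoric uncertainty in stochastic networks is to decompose the predictive entropy over multiple stochastic forward passes (e.g. MC dropout). The sketch below illustrates that generic decomposition under my own toy setup, not the study's exact models or data.

    import numpy as np

    def uncertainty_decomposition(probs):
        """probs: (T, C) class probabilities from T stochastic forward passes
        for one patient; returns total, aleatoric, and epistemic uncertainty."""
        eps = 1e-12
        mean_p = probs.mean(axis=0)                                        # predictive distribution
        total = -np.sum(mean_p * np.log(mean_p + eps))                     # predictive entropy
        aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))  # expected entropy
        epistemic = total - aleatoric                                      # mutual information
        return total, aleatoric, epistemic

    # Toy example: 10 confident, nearly identical passes -> low epistemic uncertainty.
    passes = np.tile([0.95, 0.05], (10, 1)) + np.random.uniform(-0.01, 0.01, (10, 2))
    passes /= passes.sum(axis=1, keepdims=True)
    print(uncertainty_decomposition(passes))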


Graph-Informed Neural Networks for Sparse Grid-Based Discontinuity Detectors

http://arxiv.org/abs/2401.13652v1

Compressor summary: The paper proposes a new method using Graph-Informed Neural Networks (GINNs) to find discontinuities in high-dimensional functions, which is efficient, accurate, and adaptable to different algorithms.


VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

http://arxiv.org/abs/2401.13649v1

Compressor summary: VisualWebArena is a benchmark to test multimodal web agents' abilities on realistic visually grounded tasks using image-text inputs, natural language instructions, and website actions.


How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability

http://arxiv.org/abs/2401.13641v1

Compressor summary: Key points:
- Large Language Models (LLMs) like GPT can perform various tasks without specific training
- ChatGPT is a conversational interface for LLMs that has many applications
- The study explores ChatGPT's ability for face biometrics tasks such as verification and estimation
- ChatGPT could increase explainability and transparency of automatic decisions in human scenarios
- Experiments show the potential of ChatGPT for face biometrics, especially to enhance explainability

Summary: The study evaluates ChatGPT, a conversational interface for LLMs, for face biometrics tasks, finding that it could improve explainability and performance in human scenarios.


Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

http://arxiv.org/abs/2401.13627v1

Compressor summary: SUPIR is a new image restoration method that uses generative models, scaling, and text prompts to improve the quality of images.


DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning

http://arxiv.org/abs/2401.13621v1

Compressor summary: The paper proposes a new sentence representation learning method that uses denoising from the intra-sentence perspective and shows its effectiveness in semantic textual similarity and transfer tasks.


Enhancing Image Retrieval : A Comprehensive Study on Photo Search using the CLIP Mode

http://arxiv.org/abs/2401.13613v1

Compressor summary: CLIP is a pre-trained model that learns to understand images and text, enabling effective image retrieval using natural language queries.
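
A minimal text-to-image retrieval sketch using the openai/CLIP package (the image paths and query are placeholders; the paper's exact pipeline may differ):

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    image_paths = ["photo1.jpg", "photo2.jpg", "photo3.jpg"]  # placeholder gallery
    images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)

    with torch.no_grad():
        image_features = model.encode_image(images)
        text_features = model.encode_text(clip.tokenize(["a dog playing on the beach"]).to(device))

    # Rank the gallery by cosine similarity to the natural language query.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.T).squeeze(1)
    for rank, idx in enumerate(scores.argsort(descending=True).tolist(), start=1):
        print(rank, image_paths[idx], float(scores[idx]))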


Stream-based perception for cognitive agents in mobile ecosystems

http://arxiv.org/abs/2401.13604v1

Compressor summary: The paper proposes a stream-based perception approach for cognitive agents that enables them to perceive meaningful situations from low-level sensor data and use them to guide auctions for collaboration in a crowdshipping case study.


MM-LLMs: Recent Advances in MultiModal Large Language Models

http://arxiv.org/abs/2401.13601v1

Compressor summary: This paper surveys existing and emerging MultiModal Large Language Models (MM-LLMs), their design, performance, and future directions.


Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction

http://arxiv.org/abs/2401.13598v1

Compressor summary: The paper proposes a method to generate labeled data for extracting relations from documents using large language models, which improves performance on the task.


Graph Guided Question Answer Generation for Procedural Question-Answering

http://arxiv.org/abs/2401.13594v1

Compressor summary: The paper introduces a novel method to generate exhaustive and high-quality question answering (QA) training data from procedural text using graph-based representations, which enables training compact and competitive task-specific QA models.


Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes

http://arxiv.org/abs/2401.13588v1

Compressor summary: The study evaluated three large language models in understanding clinical notes and found GPT-4 to be the best overall, while GPT-3.5 and text-davinci-003 performed better with specific prompting strategies.


Prompt Weight Experiments for LLM Instruction Fine-Tuning

http://arxiv.org/abs/2401.13586v1

Compressor summary: The study examines how weighting the loss on prompt tokens relative to completion tokens during instruction fine-tuning affects a language model's performance, with the effect depending on the length of the training examples.
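
A rough sketch of what prompt-token loss weighting can look like in practice (my own illustration with an assumed weight for prompt tokens; not the paper's exact implementation):

    import torch
    import torch.nn.functional as F

    def weighted_lm_loss(logits, labels, prompt_mask, prompt_weight=0.1):
        """logits: (B, T, V); labels: (B, T), already shifted/aligned;
        prompt_mask: (B, T) bool, True where the token belongs to the prompt.
        Down-weights the loss on prompt tokens relative to completion tokens."""
        per_token = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), labels.reshape(-1), reduction="none"
        ).reshape(labels.shape)
        weights = torch.where(prompt_mask,
                              torch.full_like(per_token, prompt_weight),
                              torch.ones_like(per_token))
        return (per_token * weights).sum() / weights.sum()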


Towards Efficient and Effective Deep Clustering with Dynamic Grouping and Prototype Aggregation

http://arxiv.org/abs/2401.13581v1

Compressor summary: The paper introduces DigPro, a novel end-to-end deep clustering framework that extends contrastive learning from instance-level to group-level and performs prototype aggregation in a spherical feature space for efficient and effective representation learning and clustering.


Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding

http://arxiv.org/abs/2401.13565v1

Compressor summary: The paper introduces a large Malaysian language model built on Mistral 7B with extended context lengths and shows its improved performance on Malay grammar tasks.


SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

http://arxiv.org/abs/2401.13560v1

Compressor summary: SegMamba is a fast and efficient 3D medical image segmentation model that uses a State Space Model to capture long-range dependencies in volume features.


Task structure and nonlinearity jointly determine learned representational geometry

http://arxiv.org/abs/2401.13558v1

Compressor summary: Tanh and ReLU activation functions in neural networks affect the representational geometry differently, influencing the disentanglement of inputs and outputs depending on the target output dimension.


Benchmarking the Fairness of Image Upsampling Methods

http://arxiv.org/abs/2401.13555v1

Compressor summary: The text introduces a framework for evaluating the fairness and diversity of conditional generative models, particularly in image upsampling, using UnfairFace as a benchmark dataset with balanced racial distribution.


PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition

http://arxiv.org/abs/2401.13554v1

Compressor summary: The PanAf20K dataset is a large and diverse video collection of chimpanzees and gorillas in Africa with annotations for AI tasks that help conserve endangered species.


Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection

http://arxiv.org/abs/2401.13551v1

Compressor summary: The paper proposes a new unsupervised video anomaly detection method that alternates training of one-class and weakly-supervised models, and uses weighted one-class classification and adaptive thresholding to improve performance.
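
As a generic illustration of adaptive thresholding over anomaly scores (a simple statistics-based cutoff, not necessarily the paper's specific scheme):

    import numpy as np

    def adaptive_threshold(scores, k=2.0):
        """Cutoff derived from the current score distribution; frames above it
        can be treated as pseudo-anomalies in the next training round."""
        scores = np.asarray(scores, dtype=float)
        return scores.mean() + k * scores.std()

    scores = np.concatenate([np.random.normal(0.2, 0.05, 950),   # mostly normal frames
                             np.random.normal(0.8, 0.05, 50)])   # a few anomalous frames
    thr = adaptive_threshold(scores)
    print(thr, int((scores > thr).sum()), "frames flagged as pseudo-anomalies")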


Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?

http://arxiv.org/abs/2401.13544v1

Compressor summary: The text introduces a method for making interpretable neural networks by performing concept-based interventions on already-trained models, which can improve their effectiveness and calibration.


QAGait: Revisit Gait Recognition from a Quality Perspective

http://arxiv.org/abs/2401.13531v1

Compressor summary: QAGait is a gait recognition approach that improves reliability and performance by using cost-effective quality assessment strategies to handle various challenging silhouettes in real-world scenarios.


Towards Understanding the Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space

http://arxiv.org/abs/2401.13530v1

Compressor summary: The paper explores new optimization methods on the Wasserstein space, a probability measure metric space, by extending standard stochastic methods to the Riemannian setting.


SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

http://arxiv.org/abs/2401.13527v1

Compressor summary: The text introduces Chain-of-Information Generation (CoIG), a method for efficient speech generation by decoupling semantic and perceptual information, and SpeechGPT-Gen, an 8-billion-parameter SLLM that uses CoIG to excel in various speech tasks.


Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

http://arxiv.org/abs/2401.13516v1

Compressor summary: Delocate is a novel Deepfake detection model that can recognize and locate unknown domain Deepfake videos by recovering real faces and guiding the localization process with supervision.


Can GPT-3.5 Generate and Code Discharge Summaries?

http://arxiv.org/abs/2401.13512v1

Compressor summary: The study used GPT-3.5 to generate medical documents with ICD-10 codes for data augmentation, which improved coding performance for the generated codes and their code families, but the generated texts lacked variety and authenticity.


Generative Human Motion Stylization in Latent Space

http://arxiv.org/abs/2401.13505v1

Compressor summary: The paper proposes a novel generative model for human motion stylization that uses latent space of pretrained autoencoders to extract and infuse style, allowing versatile and flexible stylization with content preservation and good performance in various applications.


Research about the Ability of LLM in the Tamper-Detection Area

http://arxiv.org/abs/2401.13504v1

Compressor summary: Large Language Models can detect basic tampering activities but struggle with identifying sophisticated forgeries and realistic AI-generated images.


Learning Representations for Clustering via Partial Information Discrimination and Cross-Level Interaction

http://arxiv.org/abs/2401.13503v1

Compressor summary: The paper introduces PICI, a new deep image clustering method that uses a Transformer encoder and three learning modules to achieve better results than existing methods.


LDCA: Local Descriptors with Contextual Augmentation for Few-Shot Learning

http://arxiv.org/abs/2401.13499v1

Compressor summary: The paper introduces LDCA, a novel approach for few-shot image classification that uses local descriptors with contextual augmentation from a visual transformer to improve global understanding and achieve state-of-the-art results.


SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

http://arxiv.org/abs/2401.13463v1

Compressor summary: SpeechDPR is a framework that uses unsupervised ASR and text dense retriever knowledge to find answers in spoken passages without manual transcription.


Multi-Agent Diagnostics for Robustness via Illuminated Diversity

http://arxiv.org/abs/2401.13460v1

Compressor summary: MADRID is a novel approach to generate diverse adversarial scenarios for testing robustness in multi-agent systems, which can expose strategic vulnerabilities in pre-trained policies.


Symbolic Equation Solving via Reinforcement Learning

http://arxiv.org/abs/2401.13447v1

Compressor summary: The authors show how to use reinforcement learning and deep neural networks to automate solving linear equations in symbolic form.


Clue-Guided Path Exploration: An Efficient Knowledge Base Question-Answering Framework with Low Computational Resource Consumption

http://arxiv.org/abs/2401.13444v1

Compressor summary: CGPE is a framework that efficiently combines knowledge graphs with large language models, using question clues to explore the required knowledge path, resulting in improved performance and reduced computational overhead.


Semi-Supervised Coupled Thin-Plate Spline Model for Rotation Correction and Beyond

http://arxiv.org/abs/2401.13432v1

Compressor summary: The paper proposes a novel method called CoupledTPS to improve single-image-based warping tasks by coupling multiple thin-plate splines with limited control points and using semi-supervised learning with unlabeled data.


Serial fusion of multi-modal biometric systems

http://arxiv.org/abs/2401.13418v1

Compressor summary: The paper presents a new framework to evaluate serial biometric fusion systems and shows their benefits over parallel ones using theoretical analysis and experiments.


GTAutoAct: An Automatic Datasets Generation Framework Based on Game Engine Redevelopment for Action Recognition

http://arxiv.org/abs/2401.13414v1

Compressor summary: GTAutoAct is a dataset generation framework that uses game engine technology to create large-scale, diverse, and high-quality action recognition datasets with various viewpoints and annotations.


Causal Perception

http://arxiv.org/abs/2401.13408v1

Compressor summary: The paper proposes a causal reasoning framework to formalize perception in automated decision-making systems and its implications for fairness and bias.


Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter

http://arxiv.org/abs/2401.13405v1

Compressor summary: The paper proposes a synthetic data generation method for object recognition and pose estimation in robotic grasping that reduces human intervention, costs, and improves segmentation and grasping performance.


Text Categorization Can Enhance Domain-Agnostic Stopword Extraction

http://arxiv.org/abs/2401.13398v1

Compressor summary: The paper shows that text categorization helps improve domain-agnostic stopword extraction in nine African languages and French, detecting stopwords with high success rates, though effectiveness varies across languages.


Beyond Accuracy-Fairness: Stop evaluating bias mitigation methods solely on between-group metrics

http://arxiv.org/abs/2401.13391v1

Compressor summary: The paper argues that current bias mitigation techniques in AI are not sufficient because they do not account for changes within subgroups and suggests focusing on ranking precision before ensuring fair representation.


UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion

http://arxiv.org/abs/2401.13388v1

Compressor summary: UNIMO-G is a diffusion model that generates images from multimodal inputs, improving image quality and fidelity to textual and visual descriptions.


Privacy-Preserving Face Recognition in Hybrid Frequency-Color Domain

http://arxiv.org/abs/2401.13386v1

Compressor summary: The paper proposes a hybrid approach for privacy-preserving face recognition using frequency and color fusion, identity-specific mapping, and secure multiparty computation, achieving higher accuracy than existing methods.


Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons

http://arxiv.org/abs/2401.13363v1

Compressor summary: The paper introduces a new task, dataset, and evaluation protocol for measuring the generalizability of human dance generation models and proposes a novel zero-shot framework, MultiDance-Zero, that can synthesize realistic videos with multiple persons and complex backgrounds.


Debiased Sample Selection for Combating Noisy Labels

http://arxiv.org/abs/2401.13360v1

Compressor summary: The paper proposes a noIse-Tolerant Expert Model (ITEM) to address data and training bias in sample selection for learning with noisy labels, using a robust network architecture and a mixed sampling strategy.


Linear Relative Pose Estimation Founded on Pose-only Imaging Geometry

http://arxiv.org/abs/2401.13357v1

Compressor summary: The paper presents a linear relative pose estimation algorithm for n point pairs that filters out outliers by reweighting and improves accuracy even with a high percentage of outliers.


EndoGaussians: Single View Dynamic Gaussian Splatting for Deformable Endoscopic Tissues Reconstruction

http://arxiv.org/abs/2401.13352v1

Compressor summary: EndoGaussians is a new method that uses Gaussian Splatting to accurately reconstruct 3D soft body tissues from endoscopic videos, improving medical applications like VR surgery and image analysis.


Explainable Bayesian Optimization

http://arxiv.org/abs/2401.13334v1

Compressor summary: TNTRules is a new method that explains Bayesian optimization solutions in a way that improves human-AI collaboration and trust in parameter tuning.


NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks

http://arxiv.org/abs/2401.13330v1

Compressor summary: This paper proposes NACHOS, a NAS framework for designing optimal EENNs that balance accuracy and MAC operations while satisfying hardware constraints.


Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval

http://arxiv.org/abs/2401.13329v1

Compressor summary: The text describes a method to simulate target domain videos for cross-domain video moment retrieval using generative video diffusion controlled by target sentences, addressing both generation and selection of high-quality simulation videos.


Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress Detection

http://arxiv.org/abs/2401.13327v1

Compressor summary: The authors propose a method to generate realistic and privacy-preserving synthetic smartwatch health data for stress detection using GANs and DP safeguards.


Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery

http://arxiv.org/abs/2401.13325v1

Compressor summary: MCDL is a semi-supervised learning method that uses memory banks to record historical predictions of unlabeled data, measuring their credibility, and dividing the data into consistent and inconsistent groups for better learning.


InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

http://arxiv.org/abs/2401.13313v1

Compressor summary: InstructDoc is a collection of 30 diverse visual document understanding datasets with human-written instructions, and InstructDr is a new model that connects document images, image encoders, and large language models to adapt to various VDU tasks.


ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models

http://arxiv.org/abs/2401.13311v1

Compressor summary: The paper introduces ConTextual, a benchmark to evaluate large multimodal models' ability to perform text-rich visual reasoning in diverse real-world scenarios, revealing significant performance gaps between current AI and human capabilities.


ChatterBox: Multi-round Multimodal Referring and Grounding

http://arxiv.org/abs/2401.13307v1

Compressor summary: The study introduces a new task (MRG) and a model (ChatterBox) for multimodal dialogues that handle complex spatial relationships and reasoning among multiple instances.


MaLA-500: Massive Language Adaptation of Large Language Models

http://arxiv.org/abs/2401.13303v1

Compressor summary: MaLA-500 is a new large language model that works for 534 languages and improves in-context learning.


Classification of Radiologically Isolated Syndrome and Clinically Isolated Syndrome with Machine-Learning Techniques

http://arxiv.org/abs/2401.13301v1

Compressor summary: The study used machine learning to analyze MRI data from patients with different stages of MS and found that specific brain features could help differentiate between early clinical expressions of the disease with 78% accuracy.


Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models

http://arxiv.org/abs/2401.13298v1

Compressor summary: The paper proposes an explainable method to detect harmful memes using multimodal debates between large language models and a fine-tuned small language model as a judge, providing better performance and explanations than existing methods.


Visual Objectification in Films: Towards a New AI Task for Video Interpretation

http://arxiv.org/abs/2401.13296v1

Compressor summary: The article presents a video-interpretation task for detecting objectification of characters in films using a novel dataset and evaluating existing vision models.


Small Object Tracking in LiDAR Point Cloud: Learning the Target-awareness Prototype and Fine-grained Search Region

http://arxiv.org/abs/2401.13285v1

Compressor summary: Key points:
- The paper proposes a Siamese network-based method for small object tracking in LiDAR point cloud
- The method consists of TAPM and RGS modules that learn prototypes and recover fine-grained features respectively
- The method improves the tracking performance of small targets without affecting normal-sized objects

Summary: The paper presents a Siamese network method that learns prototype features for accurate tracking of small objects in LiDAR point cloud, using TAPM and RGS modules.


RefreshNet: Learning Multiscale Dynamics through Hierarchical Refreshing

http://arxiv.org/abs/2401.13282v1

Compressor summary: RefreshNet is a multiscale framework that uses convolutional autoencoders and recurrent neural networks to capture latent dynamics of complex systems, enabling efficient long-term predictions with high accuracy.


DDI-CoCo: A Dataset For Understanding The Effect Of Color Contrast In Machine-Assisted Skin Disease Detection

http://arxiv.org/abs/2401.13280v1

Compressor summary: Color contrast between lesion area and skin affects malignancy detection in skin disease datasets, suggesting that dermatology AI models should consider both color difference and skin tone when evaluating skin conditions.
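
As a rough illustration of quantifying lesion-skin color contrast (my own example using a CIELAB color difference; not necessarily the dataset's measurement protocol):

    import numpy as np
    from skimage.color import rgb2lab

    def lesion_skin_contrast(image_rgb, lesion_mask):
        """CIELAB distance (approximate Delta E) between the mean lesion color
        and the mean surrounding-skin color of an RGB image in [0, 1]."""
        lab = rgb2lab(image_rgb)
        lesion_mean = lab[lesion_mask].mean(axis=0)
        skin_mean = lab[~lesion_mask].mean(axis=0)
        return float(np.linalg.norm(lesion_mean - skin_mean))

    image = np.random.rand(128, 128, 3)                  # placeholder image
    mask = np.zeros((128, 128), dtype=bool)
    mask[40:80, 40:80] = True                            # placeholder lesion mask
    print(lesion_skin_contrast(image, mask))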


Can AI Assistants Know What They Don't Know?

http://arxiv.org/abs/2401.13275v1

Compressor summary: The paper investigates if AI assistants can recognize and communicate their uncertainty in natural language, focusing on open-domain question answering tasks.


Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics

http://arxiv.org/abs/2401.13270v1

Compressor summary: The paper proposes a novel method for automatic image colorization using audio information, which improves the performance of color estimation by incorporating semantic information from both audio and video.


Dual-modal Dynamic Traceback Learning for Medical Report Generation

http://arxiv.org/abs/2401.13267v1

Compressor summary: The study proposes a new framework for generating medical reports from images, using dual-modal learning with dynamic traceback to capture both pathological semantics and morphological details, and perform well without text input during inference.


Enhancing cross-domain detection: adaptive class-aware contrastive transformer

http://arxiv.org/abs/2401.13264v1

Compressor summary: Key points:
- Detection transformer needs abundant training data but faces challenges in cross-domain adaptation
- Proposed method uses adversarial learning and mean-teacher framework to address issues of class imbalance and performance degradation
- Method introduces IoU-aware prediction branch, dynamic category threshold refinement, and instance-level class-aware contrastive learning module
- Method improves performance and alleviates class imbalance in diverse domain-adaptive scenarios

Summary: The paper presents a novel detection transformer method that uses adversarial learning and mean-teacher framework to handle class imbalance and performance issues in cross-domain adaptation, with three innovative components.


UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems

http://arxiv.org/abs/2401.13256v1

Compressor summary: The paper proposes UniMS-RAG, a system that uses multiple sources to generate personalized responses, by decomposing the task into three sub-tasks, unifying them in a sequence-to-sequence paradigm, and using a self-refinement mechanism.


SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning

http://arxiv.org/abs/2401.13246v1

Compressor summary: SEER is a novel method that improves question-answering systems by enabling structured reasoning and explanation using a structure-based return and a fine-grained reward function.


Adaptive Crowdsourcing Via Self-Supervised Learning

http://arxiv.org/abs/2401.13239v1

Compressor summary: Just-predict-others is a new crowdsourcing approach that uses self-supervised learning and adaptive weighting to produce more accurate group estimates when workers' skills vary or their estimates correlate.


From Random to Informed Data Selection: A Diversity-Based Approach to Optimize Human Annotation and Few-Shot Learning

http://arxiv.org/abs/2401.13229v1

Compressor summary: The paper proposes an automatic data selection method to build a small but diverse dataset for few-shot learning, addressing issues with random sampling and crowdsourcing for natural language processing tasks.
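
One common way to implement diversity-based selection over text embeddings is greedy farthest-point (k-center) sampling; the sketch below shows that generic idea, not necessarily the paper's exact algorithm:

    import numpy as np

    def greedy_diverse_selection(embeddings, k, seed=0):
        """Repeatedly pick the example farthest from everything already selected,
        yielding a small but diverse subset for human annotation."""
        rng = np.random.default_rng(seed)
        selected = [int(rng.integers(len(embeddings)))]
        dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
        while len(selected) < k:
            nxt = int(dist.argmax())
            selected.append(nxt)
            dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
        return selected

    embeddings = np.random.randn(1000, 384)       # e.g. sentence embeddings
    subset = greedy_diverse_selection(embeddings, k=32)
    print(len(subset), "examples chosen for annotation")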


Scalable Link Prediction on Large-Scale Heterogeneous Graphs with Large Language Models

http://arxiv.org/abs/2401.13227v1

Compressor summary: The paper introduces LPNL, a framework using natural language prompts and a T5 model to improve scalable link prediction on large heterogeneous graphs.


TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data

http://arxiv.org/abs/2401.13223v1

Compressor summary: The authors propose a step-wise pipeline for question answering over hybrid tabular and textual data, using large language models like GPT-4 initially but later developing a specialized smaller model (TAT-LLM) that outperforms existing methods on various benchmarks.


Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration

http://arxiv.org/abs/2401.13221v1

Compressor summary: The text introduces U-WADN, a novel image restoration method that adapts to different degradation types and levels using varying width sub-networks, achieving better performance and reducing computational resources.


ULTRA: Unleash LLMs' Potential for Event Argument Extraction through Hierarchical Modeling and Pair-wise Refinement

http://arxiv.org/abs/2401.13218v1

Compressor summary: The paper introduces ULTRA, a hierarchical framework that uses open source LLMs to extract event arguments from documents cost-effectively and without positional bias, and LEAFER to improve argument boundary locating.


AMANet: Advancing SAR Ship Detection with Adaptive Multi-Hierarchical Attention Network

http://arxiv.org/abs/2401.13214v1

Compressor summary: A new deep learning method called AMAM improves ship detection in coastal areas by learning multi-scale features and adaptively aggregating salient features from various layers, outperforming existing methods.


Common-Sense Bias Discovery and Mitigation for Classification Tasks

http://arxiv.org/abs/2401.13213v1

Compressor summary: The authors propose CSBD, a method to discover dataset feature correlations based on image descriptions, which can help mitigate model bias by adjusting image sampling weights.
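
A generic sketch of the mitigation side, reweighting samples so an over-represented feature pairing no longer dominates training (my own simplification with hypothetical group labels; the paper's discovery step from image descriptions is not shown):

    from collections import Counter
    import torch
    from torch.utils.data import WeightedRandomSampler

    # Hypothetical description-derived feature pairings, one heavily over-represented.
    joint_groups = [("snow", "wolf")] * 900 + [("grass", "wolf")] * 100
    counts = Counter(joint_groups)

    # Inverse-frequency weights give each pairing an equal expected share per epoch.
    weights = torch.tensor([1.0 / counts[g] for g in joint_groups], dtype=torch.double)
    sampler = WeightedRandomSampler(weights, num_samples=len(joint_groups), replacement=True)
    # Pass `sampler=sampler` to a DataLoader so minority pairings are drawn more often.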


Multitask Active Learning for Graph Anomaly Detection

http://arxiv.org/abs/2401.13210v1

Compressor summary: MITIGATE is a novel framework for graph anomaly detection that leverages multitask learning and node informativeness to improve performance.


Self-Improving Interference Management Based on Deep Learning With Uncertainty Quantification

http://arxiv.org/abs/2401.13206v1

Compressor summary: The paper proposes a self-improving interference management framework that combines deep learning and uncertainty quantification to enhance wireless communication performance and address limitations of data-driven models.


Boosting the Transferability of Adversarial Examples via Local Mixup and Adaptive Step Size

http://arxiv.org/abs/2401.13205v1

Compressor summary: The paper proposes a black-box adversarial generative framework that enhances input diversity and adapts step sizes for better transferable adversarial examples.


Style-Consistent 3D Indoor Scene Synthesis with Decoupled Objects

http://arxiv.org/abs/2401.13203v1

Compressor summary: The text describes a new pipeline for generating 3D indoor scenes with customizable styles and appearances, using professionally designed bounding boxes as guidance.


MLLMReID: Multimodal Large Language Model-based Person Re-identification

http://arxiv.org/abs/2401.13201v1

Compressor summary: This paper explores how to adapt multimodal large language models for person re-identification and proposes two methods, Common Instruction and DirectReID, to address the challenges in this task.


Topology-aware Embedding Memory for Learning on Expanding Graphs

http://arxiv.org/abs/2401.13200v1

Compressor summary: PDGNNs with TEM reduce memory complexity, utilize topological information for memory replay, and improve performance on expanding graphs in continual learning.


Catch-Up Mix: Catch-Up Class for Struggling Filters in CNN

http://arxiv.org/abs/2401.13193v1

Compressor summary: Catch-up Mix is a novel method that improves deep learning models' performance and robustness by mixing activation maps with lower norms to promote diverse representations and reduce reliance on specific filters.


Generative Design of Crystal Structures by Point Cloud Representations and Diffusion Model

http://arxiv.org/abs/2401.13192v1

Compressor summary: The framework presents a novel approach for generating new, stable crystal structures using point cloud representation and diffusion model, which can help in material design and synthesis.


Towards Multi-domain Face Landmark Detection with Synthetic Data from Diffusion model

http://arxiv.org/abs/2401.13191v1

Compressor summary: Key points:
- The text presents a two-stage training approach for face landmark detection in multiple domains using limited data and a pre-trained diffusion model.
- The first stage trains a landmark-conditioned face generation model on real faces, and the second stage fine-tunes it on synthetic pairs with text prompts.
- The method generates high-quality synthetic paired datasets from multiple domains and improves face landmark detection performance.

Summary: The authors propose a method that uses limited data and a pre-trained diffusion model to generate synthetic paired datasets for multi-domain face landmark detection, achieving better results than existing methods.


Shortcutting Cross-Validation: Efficiently Deriving Column-Wise Centered and Scaled Training Set $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ Without Full Recomputation of Matrix Products or Statistical Moments

http://arxiv.org/abs/2401.13185v1

Compressor summary: The authors present three efficient algorithms for computing matrices needed by predictive models like Kernel-Based Partial Least-Squares, which speed up cross-validation and do not leak data.
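
The core trick, sketched here for the uncentered case (the paper's algorithms additionally handle column-wise centering and scaling), is to precompute the full-data products once and subtract each held-out fold's contribution rather than recomputing over the training rows:

    import numpy as np

    X = np.random.randn(10_000, 50)
    Y = np.random.randn(10_000, 3)
    folds = np.array_split(np.arange(len(X)), 10)

    # Precompute once over all samples.
    XtX_full = X.T @ X
    XtY_full = X.T @ Y

    for val_idx in folds:
        Xv, Yv = X[val_idx], Y[val_idx]
        # Training-set products by subtraction -- no pass over the training rows.
        XtX_train = XtX_full - Xv.T @ Xv
        XtY_train = XtY_full - Xv.T @ Yv
        # ... fit e.g. a PLS or least-squares model from XtX_train and XtY_train ...

    # Sanity check for the last fold:
    train_idx = np.setdiff1d(np.arange(len(X)), val_idx)
    assert np.allclose(XtX_train, X[train_idx].T @ X[train_idx])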


AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents

http://arxiv.org/abs/2401.13178v1

Compressor summary: AgentBoard is an evaluation framework for large language models that provides insights into their capabilities and limitations through interactive visualization and fine-grained progress rate metrics.


Boundary and Relation Distillation for Semantic Segmentation

http://arxiv.org/abs/2401.13174v1

Compressor summary: The paper proposes a method to improve small semantic segmentation models by teaching them to preserve object boundaries and relations from larger models.


ADMap: Anti-disturbance framework for reconstructing online vectorized HD map

http://arxiv.org/abs/2401.13172v1

Compressor summary: This paper introduces ADMap, a framework for reconstructing HD maps in autonomous driving that reduces jitter and improves stability with multi-scale perception, interactive attention, and direction difference loss.


Compositional Generative Inverse Design

http://arxiv.org/abs/2401.13171v1

Compressor summary: The paper proposes a method for inverse design using learned diffusion models, which improves performance and allows compositional design of complex systems.


CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering

http://arxiv.org/abs/2401.13170v1

Compressor summary: The paper proposes a new evaluation method for question answering that uses expert rules and a lightweight classifier to better align with human judgments.


Misgendering and Assuming Gender in Machine Translation when Working with Low-Resource Languages

http://arxiv.org/abs/2401.13165v1

Compressor summary: The chapter explores gender-related errors in machine translation for low-resource languages like Bengali, highlighting the social and computational factors that create linguistic hierarchies and their impacts on representational harms and language preservation.


A Generalized Multiscale Bundle-Based Hyperspectral Sparse Unmixing Algorithm

http://arxiv.org/abs/2401.13161v1

Compressor summary: The paper presents a noise-robust hyperspectral sparse unmixing method using multiscale spatial regularization with group sparsity, which selects the most representative abundance estimation for robust and reproducible results.


SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

http://arxiv.org/abs/2401.13160v1

Compressor summary: SpacTor is a new training method for large language models that combines span corruption and token replacement detection, reducing pre-training time and computational costs while maintaining or improving downstream task performance.