arxiv compressed, 2024-02-29

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-29, generated by the compressor, my personal LLM-based summarization project.


UniMODE: Unified Monocular 3D Object Detection

http://arxiv.org/abs/2402.18573v1

Compressor summary: The paper proposes UniMODE, a bird's-eye-view detector that can handle diverse indoor and outdoor scenes in 3D object detection by using uneven grid, sparse feature projection, and domain alignment techniques.


Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

http://arxiv.org/abs/2402.18571v1

Compressor summary: The text introduces Directional Preference Alignment (DPA), a framework that uses multi-objective reward modeling to capture diverse user preferences for large language models.


Diffusion Language Models Are Versatile Protein Learners

http://arxiv.org/abs/2402.18567v1

Compressor summary: The paper presents DPLM, a protein language model that can generate diverse and structurally plausible protein sequences using a diffusion-based pre-training method and can be fine-tuned for various predictive tasks or conditioned on different inputs.


Approaching Human-Level Forecasting with Language Models

http://arxiv.org/abs/2402.18563v1

Compressor summary: The authors develop a system that combines language models with information retrieval to predict future events and compare its performance to human forecasters, achieving similar or better results in some cases.


Selection of appropriate multispectral camera exposure settings and radiometric calibration methods for applications in phenotyping and precision agriculture

http://arxiv.org/abs/2402.18553v1

Compressor summary: Using a fixed exposure time improves radiometric accuracy and precision in multispectral images for agricultural applications.


Implicit Bias of Next-Token Prediction

http://arxiv.org/abs/2402.18551v1

Compressor summary: This paper studies how gradient descent optimizes next-token prediction (NTP) models and finds conditions under which it reaches the optimal solution and when it favors certain structures.
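
To make the setting concrete, here is a minimal, illustrative sketch (not the paper's construction) of gradient descent on a linear next-token predictor trained with cross-entropy, the kind of model whose implicit bias such analyses characterize; all data, dimensions, and hyperparameters below are made up.

```python
# Minimal sketch: gradient descent on a linear next-token prediction model.
# All data and dimensions here are illustrative, not from the paper.
import torch

vocab, dim, n = 8, 16, 64
torch.manual_seed(0)
contexts = torch.randn(n, dim)                   # embedded contexts
next_tokens = torch.randint(0, vocab, (n,))      # observed next tokens

W = torch.zeros(dim, vocab, requires_grad=True)  # linear decoder: logits = x @ W
opt = torch.optim.SGD([W], lr=0.5)

for step in range(2000):
    logits = contexts @ W
    loss = torch.nn.functional.cross_entropy(logits, next_tokens)
    opt.zero_grad()
    loss.backward()
    opt.step()

# On separable data the loss keeps decaying while ||W|| grows without bound,
# so the object implicit-bias analyses characterize is the direction W / ||W||.
print(f"loss={loss.item():.4f}, ||W||={W.norm().item():.2f}")
```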


Generalizability Under Sensor Failure: Tokenization + Transformers Enable More Robust Latent Spaces

http://arxiv.org/abs/2402.18546v1

Compressor summary: The study compares two models for neural data representation, finding that the transformer-based TOTEM model performs better in handling variability and sensor failure in electroencephalography datasets.


Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

http://arxiv.org/abs/2402.18540v1

Compressor summary: The paper proposes the "Pure Tuning, Safe Testing" principle to mitigate unsafe behaviors in large language models fine-tuned on seemingly safe datasets.


Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

http://arxiv.org/abs/2402.18528v1

Compressor summary: The proposed method balances gradients and distills knowledge to improve class-incremental learning on non-uniform data with dual imbalance problems, reducing overfitting and forgetting.


Defect Detection in Tire X-Ray Images: Conventional Methods Meet Deep Structures

http://arxiv.org/abs/2402.18527v1

Compressor summary: The paper presents a robust method for detecting tire defects using traditional and advanced features and machine learning models, achieving high accuracy and reliability.


Log Neural Controlled Differential Equations: The Lie Brackets Make a Difference

http://arxiv.org/abs/2402.18512v1

Compressor summary: Log-NCDEs use the Log-ODE method from rough paths to improve training of neural differential equations for multivariate time series classification, achieving higher accuracy than other models.


RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

http://arxiv.org/abs/2402.18510v1

Compressor summary: The paper examines how Chain-of-Thought improves RNNs' performance on algorithmic problems but is not enough to match Transformers, and proposes techniques to enhance RNNs' in-context retrieval ability.


Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling

http://arxiv.org/abs/2402.18508v1

Compressor summary: Orchid is a new architecture that uses data-dependent convolution to improve sequence modeling efficiency and expressivity, outperforming traditional attention-based models like BERT and Vision Transformers.


Multimodal Learning To Improve Cardiac Late Mechanical Activation Detection From Cine MR Images

http://arxiv.org/abs/2402.18507v1

Compressor summary: The paper proposes a deep learning framework that uses advanced image techniques to improve the analysis of cardiac images for detecting late mechanical activation.


Evolving machine learning workflows through interactive AutoML

http://arxiv.org/abs/2402.18505v1

Compressor summary: The paper proposes an interactive genetic programming algorithm for automatic workflow composition in automated machine learning, which improves performance and reduces tuning time by allowing users to modify the grammar dynamically.


Detection of Micromobility Vehicles in Urban Traffic Videos

http://arxiv.org/abs/2402.18503v1

Compressor summary: This paper proposes a new object detection model that combines YOLOX with spatio-temporal features from consecutive video frames to better detect micromobility vehicles in urban traffic.


Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification

http://arxiv.org/abs/2402.18502v1

Compressor summary: This study proposes a framework for assessing and improving fairness in large language models using in-context learning and shows that GPT-4 performs better than other models in terms of accuracy and fairness.


Language Models Represent Beliefs of Self and Others

http://arxiv.org/abs/2402.18496v1

Compressor summary: The study reveals how Large Language Models represent their own and others' beliefs in their neural activations, and shows that manipulating these representations affects performance on various social reasoning tasks.


ROG$_{PL}$: Robust Open-Set Graph Learning via Region-Based Prototype Learning

http://arxiv.org/abs/2402.18495v1

Compressor summary: ROG$_{PL}$ is a framework that uses prototype learning to improve robust open-set node classification on noisy graph data.


Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection

http://arxiv.org/abs/2402.18493v1

Compressor summary: The paper proposes DRET, a rain simulation method, and SRKD, a knowledge distillation approach, to improve 3D object detection under various weather conditions.


Dynamical Regimes of Diffusion Models

http://arxiv.org/abs/2402.18491v1

Compressor summary: The study uses statistical physics methods to analyze generative diffusion models in large dimensions and datasets, identifying three distinct dynamical regimes and showing how they relate to phase transitions and the curse of dimensionality.


TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

http://arxiv.org/abs/2402.18490v1

Compressor summary: TAMM is a novel two-stage learning approach that uses three synergetic adapters to effectively leverage image and language modalities for pre-training 3D shape representations, improving performance on various tasks.


NewsQs: Multi-Source Question Generation for the Inquiring Mind

http://arxiv.org/abs/2402.18479v1

Compressor summary: NewsQs is a dataset containing question-answer pairs for multiple news articles created by fine-tuning a T5 model on FAQ-style news and filtering the data with QNLI.


Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes

http://arxiv.org/abs/2402.18477v1

Compressor summary: The paper proposes a new method to infer causal structures from stochastic dynamical systems using path-space data and signature kernels, which performs better than existing approaches.


IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding

http://arxiv.org/abs/2402.18476v1

Compressor summary: The paper proposes an image-biased decoding technique to reduce hallucinations in large vision-language models by contrasting predictions from conventional and image-biased models, improving the quality of generated responses.


Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation

http://arxiv.org/abs/2402.18467v1

Compressor summary: The authors propose SeCo, a method that separates co-occurring objects in image patches and enhances semantic representation with multi-granularity knowledge contrast to tackle the challenging co-occurrence problem in weakly supervised semantic segmentation.


Meta-Task Prompting Elicits Embedding from Large Language Models

http://arxiv.org/abs/2402.18458v1

Compressor summary: MetaEOL is an unsupervised method that uses meta-task prompts to generate high-quality sentence embeddings from LLMs without fine-tuning or task engineering.


HOP to the Next Tasks and Domains for Continual Learning in NLP

http://arxiv.org/abs/2402.18449v1

Compressor summary: HOP is a method that enables continual learning in NLP by adapting to new tasks and domains, preserving past knowledge, and distinguishing task-related statistics using high-order moments and auxiliary heads.


Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization

http://arxiv.org/abs/2402.18447v1

Compressor summary: The paper proposes a dynamic object-centric perception network using prompt learning to improve single-domain generalization for image classification and object detection tasks by adapting to image complexity variations.


LeMo-NADe: Multi-Parameter Neural Architecture Discovery with LLMs

http://arxiv.org/abs/2402.18443v1

Compressor summary: LeMo-NADe is a framework that uses an expert system, a large language model, and user preferences to automatically discover efficient neural network architectures for edge devices.


Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication

http://arxiv.org/abs/2402.18439v1

Compressor summary: The text explores how allowing Large Language Models to choose non-natural language formats for reasoning and communication can improve efficiency and effectiveness.


Graph Regularized Encoder Training for Extreme Classification

http://arxiv.org/abs/2402.18434v1

Compressor summary: Key points:
- XC trains an encoder and a classifier to tag data points with relevant labels from a large universe.
- GCNs are powerful but costly for XC applications that have tail labels with little training data.
- RAMEN is a new paradigm that uses graph data to regularize encoder training without GCNs, achieving higher accuracy and lower costs.
Summary: RAMEN improves extreme classification by using graph data to regularize encoder training instead of expensive graph convolutional networks.
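
The summary above does not give RAMEN's actual objective, but the general recipe of regularizing an encoder with graph structure instead of running a GCN can be sketched as follows; the loss form, weighting, and names here are illustrative assumptions, not the paper's method.

```python
# Illustrative only: a generic graph-smoothness regularizer added to an
# encoder's task loss, in the spirit of using graph data without a GCN.
import torch

def graph_regularized_loss(task_loss, embeddings, edges, lam=0.1):
    """embeddings: (n, d) encoder outputs; edges: (m, 2) long tensor of node pairs."""
    src, dst = embeddings[edges[:, 0]], embeddings[edges[:, 1]]
    smoothness = (src - dst).pow(2).sum(dim=1).mean()  # pull linked nodes together
    return task_loss + lam * smoothness

# usage with dummy data
emb = torch.randn(10, 32, requires_grad=True)
edges = torch.randint(0, 10, (20, 2))
loss = graph_regularized_loss(torch.tensor(1.0), emb, edges)
loss.backward()
```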


Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation

http://arxiv.org/abs/2402.18428v1

Compressor summary: DCMCL is a novel collaborative learning method that improves both AR and NAR models by leveraging bilateral contextual information from different types of generative models for Neural Machine Translation.


A Relational Inductive Bias for Dimensional Abstraction in Neural Networks

http://arxiv.org/abs/2402.18426v1

Compressor summary: The paper explores how the relational bottleneck, a mechanism that focuses on relations among inputs, improves neural networks' generalization, learning efficiency, and ability to form compositional representations like humans.


Emotion Classification in Low and Moderate Resource Languages

http://arxiv.org/abs/2402.18424v1

Compressor summary: Key points:
- The paper presents a cross-lingual emotion classifier that transfers learning from English to other languages.
- Two transfer approaches are compared: parallel projection and direct transfer.
- The results show that both approaches outperform random baselines, and direct transfer is better.
- Emotion-labeled resources are created for four languages.
Summary: The paper proposes a cross-lingual emotion classifier that learns from English and transfers to other languages, comparing two methods and creating labeled resources for some languages.


Can GPT Improve the State of Prior Authorization via Guideline Based Automated Question Answering?

http://arxiv.org/abs/2402.18419v1

Compressor summary: Key points:
- Prior authorization (PA) is a health plan cost-control process that requires clearance in advance for certain procedures.
- Validating PA requests is time-consuming and challenging for health insurance companies.
- The authors evaluate if GPT can help validate key factors using question answering from patient electronic health records.
- They experiment with different prompting techniques and report qualitative assessment by humans.
- Their method outperforms standard counterparts with a mean weighted F1 score of 0.61.
Summary: The authors use GPT to validate prior authorization requests for health plans using question answering from patient records, improving efficiency and accuracy over conventional methods.


Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

http://arxiv.org/abs/2402.18411v1

Compressor summary: ProtoOT is a novel Optimal Transport method for unsupervised cross-domain image retrieval that integrates intra-domain feature learning and cross-domain alignment, using K-means clustering and contrastive learning to improve performance.


A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models

http://arxiv.org/abs/2402.18409v1

Compressor summary: The paper introduces a new benchmark to test the high-level cognitive abilities of LVLMs using images with rich semantics, inspired by a human cognition task.


A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation

http://arxiv.org/abs/2402.18402v1

Compressor summary: The paper proposes SyMPIE, a modular system that enhances noisy input data for robust multimedia understanding tasks with minimal computational cost and without needing paired clean-corrupted data.


Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models

http://arxiv.org/abs/2402.18397v1

Compressor summary: The paper presents a new method to test how well large language models understand different languages by generating individual prompts for each token in a sentence and evaluates it on part-of-speech tagging tasks.


Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

http://arxiv.org/abs/2402.18393v1

Compressor summary: This paper proposes a method to test autonomous driving systems' decision-making quality by generating scenarios where they don't make optimal choices, and a new metamorphic relation to identify such scenarios.


Unveiling the Potential of Robustness in Evaluating Causal Inference Models

http://arxiv.org/abs/2402.18392v1

Compressor summary: The paper introduces a new method for selecting the best Conditional Average Treatment Effect (CATE) estimator using a Distributionally Robust Metric (DRM), which is effective and requires fewer auxiliary models.


The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA

http://arxiv.org/abs/2402.18385v1

Compressor summary: Key points:
- Conversational multi-doc QA is answering questions based on documents and contextual conversations.
- The paper introduces a winning approach using large language models (LLMs).
- The approach adapts LLMs to the task, uses in-domain unlabeled data, filters irrelevant documents, and ensembles models.
Summary: The paper presents a winning approach for conversational multi-doc QA that leverages large language models for adaptation, data usage, document filtering, and model ensemble.


Robust Quantification of Percent Emphysema on CT via Domain Attention: the Multi-Ethnic Study of Atherosclerosis (MESA) Lung Study

http://arxiv.org/abs/2402.18383v1

Compressor summary: The authors developed a deep learning framework that combines image features and scanner priors using a novel domain attention block to improve pulmonary emphysema segmentation on CT scans.


Large Language Models As Evolution Strategies

http://arxiv.org/abs/2402.18381v1

Compressor summary: Large language models can perform black-box optimization tasks without explicit training by using a novel prompting strategy and outperform baseline algorithms.


Out-of-Domain Generalization in Dynamical Systems Reconstruction

http://arxiv.org/abs/2402.18377v1

Compressor summary: The paper presents a framework to analyze and improve out-of-domain generalization in dynamical systems reconstruction using topological concepts and ergodic theory, showing that current black-box deep learning methods struggle with this challenge.


Tokenization Is More Than Compression

http://arxiv.org/abs/2402.18376v1

Compressor summary: The paper evaluates PathPiece, a new tokenizer that segments text into the minimum number of tokens for a given vocabulary, and tests its effectiveness compared to BPE; it also investigates various factors in tokenization and trains 64 language models with different tokenization methods.


VerifiNER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models

http://arxiv.org/abs/2402.18374v1

Compressor summary: VerifiNER is a verification framework that uses knowledge-grounded reasoning with large language models to identify and correct errors in biomedical named entity recognition.


Objective and Interpretable Breast Cosmesis Evaluation with Attention Guided Denoising Diffusion Anomaly Detection Model

http://arxiv.org/abs/2402.18362v1

Compressor summary: The study presents AG-DDAD, an automated method to assess breast cosmesis after surgery using attention-guided denoising diffusion and self-supervised Vision Transformer, which outperforms existing models and eliminates manual annotations.


LatentSwap: An Efficient Latent Code Mapping Framework for Face Swapping

http://arxiv.org/abs/2402.18351v1

Compressor summary: LatentSwap is a lightweight face swapping framework that uses latent codes to swap faces between images, producing realistic results with minimal data and fast training.


Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning

http://arxiv.org/abs/2402.18344v1

Compressor summary: RIDERS is a novel method to improve large language models' commonsense reasoning and mitigate information loss issues in Chain-of-Thought reasoning.


Probabilistic Bayesian optimal experimental design using conditional normalizing flows

http://arxiv.org/abs/2402.18337v1

Compressor summary: The paper proposes a new method for designing efficient and robust experiments using a conditional normalizing flow and a Bernoulli distribution.


Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation

http://arxiv.org/abs/2402.18334v1

Compressor summary: Bonito generates synthetic tasks for instruction tuning of large language models using unannotated text, improving their zero-shot performance on various domain tasks.


FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes

http://arxiv.org/abs/2402.18331v1

Compressor summary: FineDiffusion is a parameter-efficient strategy for large-scale fine-grained image generation using diffusion models, with a novel sampling method and achieving state-of-the-art results.


Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

http://arxiv.org/abs/2402.18330v1

Compressor summary: EgoTAP is a novel method that converts heatmaps to accurate 3D pose estimation using self-attention and skeletal information, improving performance over previous methods.


Location-guided Head Pose Estimation for Fisheye Image

http://arxiv.org/abs/2402.18320v1

Compressor summary: The paper proposes a new neural network approach to estimate head pose in fisheye images without rectification or calibration, achieving better performance than existing methods.


How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning

http://arxiv.org/abs/2402.18312v1

Compressor summary: This study explores how Large Language Models use multiple parallel pathways to generate Chain-of-Thought reasoning and reveals an internal phase shift between their initial and later layers.


Escaping Local Optima in Global Placement

http://arxiv.org/abs/2402.18311v1

Compressor summary: The paper presents a hybrid optimization method that improves placement in physical design by escaping local optima and outperforms existing methods on two benchmarks.


Enhancing Roadway Safety: LiDAR-based Tree Clearance Analysis

http://arxiv.org/abs/2402.18309v1

Compressor summary: The paper introduces a new algorithm that uses LiDAR point clouds to automatically detect trees blocking roadways and help municipalities manage them for safer streets.


Feature Denoising For Low-Light Instance Segmentation Using Weighted Non-Local Blocks

http://arxiv.org/abs/2402.18307v1

Compressor summary: The paper proposes a method to segment objects in low-light images using Mask R-CNN with weighted non-local blocks for feature denoising and improved performance.


EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

http://arxiv.org/abs/2402.18302v1

Compressor summary: The paper introduces AR-MOT, a challenging audio-based object tracking task for autonomous driving, and presents EchoTrack, an end-to-end framework using dual vision transformers and bidirectional audio-video fusion to address it.


Comparative Analysis of XGBoost and Minirocket Algortihms for Human Activity Recognition

http://arxiv.org/abs/2402.18296v1

Compressor summary: This study compares two machine learning algorithms (XGBoost and MiniRocket) for human activity recognition using smartphone sensor data, finding that both achieve high accuracy and efficiency, with XGBoost slightly outperforming MiniRocket.
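
For readers who want to reproduce this kind of comparison, a minimal sketch using sktime's MiniRocketMultivariate transformer and the xgboost package might look like the following; the data shapes, feature choices, and hyperparameters are placeholders, not the study's actual pipeline.

```python
# Sketch of a MiniRocket-vs-XGBoost comparison on multichannel sensor windows,
# assuming the sktime and xgboost packages; not the authors' exact setup.
import numpy as np
from sktime.transformations.panel.rocket import MiniRocketMultivariate
from sklearn.linear_model import RidgeClassifierCV
from xgboost import XGBClassifier

# toy stand-in for smartphone sensor windows: (instances, channels, timepoints)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3, 128)).astype(np.float32)
y = rng.integers(0, 4, size=100)          # four activity classes

# MiniRocket: random-convolution features + a simple linear classifier
mr = MiniRocketMultivariate(random_state=0).fit(X)
clf_mr = RidgeClassifierCV().fit(mr.transform(X), y)

# XGBoost: gradient-boosted trees on flattened (or hand-crafted) features
clf_xgb = XGBClassifier(n_estimators=200, max_depth=4).fit(X.reshape(len(X), -1), y)
```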


Grid-Based Continuous Normal Representation for Anomaly Detection

http://arxiv.org/abs/2402.18293v1

Compressor summary: GRAD is a new anomaly detection method that uses continuous grids to represent normal features, improving generalization and handling multiple classes of objects.


FSL Model can Score Higher as It Is

http://arxiv.org/abs/2402.18292v1

Compressor summary: Key points:
- The text describes a method to improve few-shot-learning classification by generating new samples of unseen classes through image-to-image translation.
- The method captures the style or shape of the test image and transfers it to train-class images.
- The method can achieve significant performance improvement with just one additional generated sample.
Summary: The authors propose a method that generates new samples of unseen classes for few-shot-learning classification by using image-to-image translation to match the style or shape of test images. This can improve the accuracy of trained models with minimal extra data.


Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform

http://arxiv.org/abs/2402.18287v1

Compressor summary: The paper proposes a new method for reconstructing indoor scenes from single images using a U-Former architecture with a Windowed-FourierMixer block, which performs better than existing methods in handling periodic structures and achieving realistic results.


Self-Supervised Learning in Electron Microscopy: Towards a Foundation Model for Advanced Image Analysis

http://arxiv.org/abs/2402.18286v1

Compressor summary: This paper shows how self-supervised learning from unlabeled electron microscopy data improves efficiency and performance for various tasks, such as segmentation and denoising.


PiShield: A NeSy Framework for Learning with Requirements

http://arxiv.org/abs/2402.18285v1

Compressor summary: PiShield is a framework that integrates safety requirements into neural networks' topology, ensuring compliance regardless of input, and can be used in various domains like functional genomics, autonomous driving, and tabular data generation.


Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization

http://arxiv.org/abs/2402.18284v1

Compressor summary: Key points:
- ChatGPT uses reinforcement learning from human feedback, which is costly and time-consuming.
- The paper proposes a self-supervised text ranking approach to fine-tune language models without human annotators.
- The method involves probabilistic sampling, TextRank, ISODATA, a reward model, and policy optimization.
- The experiments show that the proposed method improves metrics and matches human ranking results.
Summary: The paper presents a self-supervised text ranking approach to fine-tune language models like ChatGPT without human feedback, using probabilistic sampling, clustering, reward modeling, and policy optimization.


Towards Better Understanding of Contrastive Sentence Representation Learning: A Unified Paradigm for Gradient

http://arxiv.org/abs/2402.18281v1

Compressor summary: The paper investigates why contrastive self-supervised learning (SSL) works well for sentence representation learning (SRL), and proposes a unified paradigm that integrates four effective contrastive losses based on gradient dissipation, weight, and ratio, which improves non-contrastive SSL performance in SRL.


EAN-MapNet: Efficient Vectorized HD Map Construction with Anchor Neighborhoods

http://arxiv.org/abs/2402.18278v1

Compressor summary: EAN-MapNet is an efficient and accurate HD map construction system using anchor neighborhoods and grouped local self-attention.


Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing

http://arxiv.org/abs/2402.18277v1

Compressor summary: The authors propose a deep learning model that uses slot attention to separate multiple light sources and achieve state-of-the-art white balancing results while providing information on the number and color of light sources.


Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?

http://arxiv.org/abs/2402.18272v1

Compressor summary: The text discusses how single-agent LLMs with strong prompts can perform almost as well as multi-agent discussion on reasoning tasks, except when there's no demonstration in the prompt.


A Survey on Neural Question Generation: Methods, Applications, and Prospects

http://arxiv.org/abs/2402.18267v1

Compressor summary: The text surveys the advancements in Neural Question Generation (NQG), which uses neural networks to generate relevant questions from diverse inputs, and classifies NQG approaches into structured, unstructured, and hybrid categories.


Retrieval-based Full-length Wikipedia Generation for Emergent Events

http://arxiv.org/abs/2402.18264v1

Compressor summary: The paper introduces Wiki-GenBen, a benchmark for evaluating Large Language Models' ability to generate factual full-length Wikipedia articles from web sources for recently occurred events.


Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding

http://arxiv.org/abs/2402.18262v1

Compressor summary: WebLM is a multimodal pre-training network that improves webpage understanding by integrating the structure of document images and modeling interactions between the text and structure modalities.


Efficiently Computable Safety Bounds for Gaussian Processes in Active Learning

http://arxiv.org/abs/2402.18260v1

Compressor summary: Key points:
- Active learning of physical systems needs to consider safety constraints.
- Gaussian Processes (GPs) with uncertainty estimations are common tools for this task.
- Safety assessment along continuous trajectories is challenging due to Monte-Carlo sampling of high quantiles.
- Provable safety bounds based on an adaptively sampled median of the posterior GP are proposed.
- The method reduces samples, improves speed and accuracy, and works in simulations and a real engine example.
Summary: The paper proposes a safe active learning method for physical systems that uses adaptive sampling of the posterior GP to provide provable safety bounds and reduce Monte-Carlo sampling, achieving faster evaluation without sacrificing accuracy.
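
As a rough illustration of the kind of quantity involved, the following toy computes a conservative quantile-style safety bound from an exact GP posterior using the standard formulas; the paper's adaptively sampled median-based bounds refine this idea and are not reproduced here.

```python
# Toy quantile-style GP safety bound from standard exact-GP posterior formulas;
# illustrative only, not the paper's adaptive median-based method.
import numpy as np

def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

X = np.array([0.0, 0.4, 1.0])            # observed inputs
y = np.array([0.1, 0.3, -0.2])           # observed (safe) outputs
Xs = np.linspace(0, 1, 101)              # candidate trajectory points
noise = 1e-4

K = rbf(X, X) + noise * np.eye(len(X))
Ks = rbf(Xs, X)
mu = Ks @ np.linalg.solve(K, y)                               # posterior mean
var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)   # posterior variance
upper = mu + 2.0 * np.sqrt(np.maximum(var, 0))                # ~97.7% upper bound

safe = upper < 0.5                        # e.g., safety constraint f(x) < 0.5
print(f"{safe.mean():.0%} of candidate points certified safe")
```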


A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames

http://arxiv.org/abs/2402.18258v1

Compressor summary: The authors propose a Multi-Intent dataset for realistic in-Vehicle dialogue Systems, using a BiRGAT model to encode hierarchical ontology items and a 3-way pointer-generator decoder to tackle multi-intent cases.


Towards Generalist Prompting for Large Language Models by Mental Models

http://arxiv.org/abs/2402.18252v1

Compressor summary: The paper introduces generalist prompting, a method to make large language models perform well on various tasks without needing specialized prompts, and proposes MeMo, a simple but effective prompting technique that uses mental models for different tasks.


On the Accuracy of Edge Detectors in Number Plate Extraction

http://arxiv.org/abs/2402.18251v1

Compressor summary: The paper proposes an edge detection method for number plate extraction that works well in both noisy and clean environments, using pixel intensity changes and MATLAB 2017b.


Learning or Self-aligning? Rethinking Instruction Fine-tuning

http://arxiv.org/abs/2402.18243v1

Compressor summary: The paper investigates the factors behind instruction fine-tuning in language models, finding that learning additional world knowledge is not always beneficial and maintaining consistency is crucial for success.


Image2Flow: A hybrid image and graph convolutional neural network for rapid patient-specific pulmonary artery segmentation and CFD flow field calculation from 3D cardiac MRI data

http://arxiv.org/abs/2402.18236v1

Compressor summary: This study trained a deep learning model to generate patient-specific volume-meshes of the pulmonary artery from 3D cardiac MRI data and directly estimate CFD flow fields, achieving high accuracy and speed.


Zero-Shot Aerial Object Detection with Visual Description Regularization

http://arxiv.org/abs/2402.18233v1

Compressor summary: DescReg is a zero-shot method for aerial object detection that uses prior descriptions of visual appearance to improve semantic-visual correlation and outperforms state-of-the-art methods on three challenging datasets.


CogBench: a large language model walks into a psychology lab

http://arxiv.org/abs/2402.18225v1

Compressor summary: CogBench is a benchmark for evaluating large language models based on cognitive psychology experiments, revealing the impact of model size, RLHF, open-source vs proprietary models, and prompt-engineering techniques on their behavior.


Improving Open-Ended Text Generation via Adaptive Decoding

http://arxiv.org/abs/2402.18223v1

Compressor summary: Adaptive decoding is a mechanism that helps language models dynamically choose better candidates for the next token during text generation, improving quality and diversity in tasks like storytelling.
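
A hedged sketch of the general idea, dynamically sizing the candidate token set at each step, is given below; the confidence rule shown (an entropy-scaled cumulative-probability cutoff) is an illustrative stand-in, not necessarily the paper's exact criterion.

```python
# Illustrative adaptive candidate selection: keep the smallest token set whose
# mass clears a target that tightens when the distribution is confident.
import torch

def adaptive_candidates(logits, base=0.95, alpha=0.3):
    probs = torch.softmax(logits, dim=-1)
    ent = -(probs * probs.clamp_min(1e-12).log()).sum()
    max_ent = torch.log(torch.tensor(float(logits.numel())))
    target = base - alpha * (ent / max_ent)       # stricter when entropy is low
    sorted_p, idx = torch.sort(probs, descending=True)
    keep = torch.cumsum(sorted_p, dim=-1) < target
    keep[0] = True                                # always keep the top token
    cand, p = idx[keep], sorted_p[keep]
    return cand, p / p.sum()

logits = torch.randn(50257)                       # GPT-2-sized vocab, dummy logits
cand, p = adaptive_candidates(logits)
next_token = cand[torch.multinomial(p, 1)]
```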


Region-Aware Exposure Consistency Network for Mixed Exposure Correction

http://arxiv.org/abs/2402.18217v1

Compressor summary: RECNet is a network that can correct images with mixed exposure by adapting regional features into an exposure-invariant space and restoring local information using a mix-scale restoration unit, while maintaining global image quality through exposure contrastive regularization.


LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History

http://arxiv.org/abs/2402.18216v1

Compressor summary: The paper studies how task-switches in conversations affect the performance of large language models and finds that they can cause significant degradation.


Multi-objective Differentiable Neural Architecture Search

http://arxiv.org/abs/2402.18213v1

Compressor summary: Key points:
- The paper proposes a novel NAS algorithm that encodes user preferences for the trade-off between performance and hardware metrics across devices.
- The algorithm uses a hypernetwork to parameterize the joint architectural distribution via hardware features and preference vectors, enabling zero-shot transferability.
- The method is effective and scalable, outperforming existing MOO NAS methods on various search spaces and datasets.
Summary: The paper introduces a new NAS algorithm that leverages user preferences and a hypernetwork to find diverse and representative architectures across devices with performance-hardware trade-offs, achieving better results than previous MOO NAS methods.


Catastrophic Overfitting: A Potential Blessing in Disguise

http://arxiv.org/abs/2402.18211v1

Compressor summary: The text proposes using catastrophic overfitting as a way to improve adversarial robustness without sacrificing accuracy on clean data by manipulating feature activation differences with regularization terms and adding noise during evaluation.


DANSK and DaCy 2.6.0: Domain Generalization of Danish Named Entity Recognition

http://arxiv.org/abs/2402.18209v1

Compressor summary: The paper introduces a high-granularity named entity dataset (DANSK), a generalizable model (DaCy 2.6.0), and evaluates existing models' domain generalization in Danish NLP, addressing limitations and discussing annotation quality.


Balancing Act: Distribution-Guided Debiasing in Diffusion Models

http://arxiv.org/abs/2402.18206v1

Compressor summary: The paper proposes Distribution Guidance, a method to reduce bias in diffusion models' image generation by using Attribute Distribution Predictor (ADP) that guides fair generation based on latent features of denoising UNet.


Oil Spill Drone: A Dataset of Drone-Captured, Segmented RGB Images for Oil Spill Detection in Port Environments

http://arxiv.org/abs/2402.18202v1

Compressor summary: This paper introduces a new RGB image dataset for oil spill detection using drones and neural networks, which can help improve environmental protection in port areas.


Learning Invariant Inter-pixel Correlations for Superpixel Generation

http://arxiv.org/abs/2402.18201v1

Compressor summary: The CDS algorithm separates invariant inter-pixel correlations from statistical properties in images by using auxiliary modalities and mutual information minimization, improving superpixel grouping performance.


Automated Machine Learning for Multi-Label Classification

http://arxiv.org/abs/2402.18198v1

Compressor summary: The text discusses the challenges of applying automated machine learning (AutoML) to single-label (SLC) and multi-label classification (MLC) tasks, proposes a novel AutoML approach for SLC with limited algorithm complexity, and explores extending it to MLC with greater flexibility and efficiency.


NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images

http://arxiv.org/abs/2402.18196v1

Compressor summary: The text introduces a new dataset and method for generating top-view human pose estimation data using NeRF, which improves the performance of neural networks for this task.


Misalignment-Robust Frequency Distribution Loss for Image Transformation

http://arxiv.org/abs/2402.18192v1

Compressor summary: The paper proposes a Frequency Distribution Loss (FDL) to address the challenge of training deep learning-based image transformation methods on poorly aligned paired datasets, by measuring distribution distance in the frequency domain and improving performance on image enhancement, super-resolution, and style transfer tasks.


Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation

http://arxiv.org/abs/2402.18191v1

Compressor summary: CaR is a data selection method that efficiently selects high-quality instructional data for GPT models by ranking instruction pairs based on expert preferences and preserving dataset diversity.


CFDNet: A Generalizable Foggy Stereo Matching Network with Contrastive Feature Distillation

http://arxiv.org/abs/2402.18181v1

Compressor summary: The proposed framework uses contrastive feature distillation to improve stereo matching in foggy scenes by combining feature learning from both clean and foggy features, enhancing generalization across different environments.


Challenges in Pre-Training Graph Neural Networks for Context-Based Fake News Detection: An Evaluation of Current Strategies and Resource Limitations

http://arxiv.org/abs/2402.18179v1

Compressor summary: The paper explores using pre-trained graph neural networks for context-based fake news detection and finds that current pre-training strategies do not significantly improve performance over training from scratch.


Reflection Removal Using Recurrent Polarization-to-Polarization Network

http://arxiv.org/abs/2402.18178v1

Compressor summary: Key points:
- The paper proposes a polarization-to-polarization approach for reflection removal.
- It uses two sequential networks and a recurrent framework to exploit polarization information.
- It outperforms existing methods on a public dataset.
Summary: The paper presents a novel method for removing reflections from images by using polarized images as inputs and predicting polarized reflection and transmission images with two sequential networks and a recurrent framework.


Self-Supervised Spatially Variant PSF Estimation for Aberration-Aware Depth-from-Defocus

http://arxiv.org/abs/2402.18175v1

Compressor summary: The paper proposes a self-supervised learning method for estimating spatially variant point spread functions (PSFs) from real sharp and blurred images, which improves aberration-aware depth-from-defocus (DfD).


NiteDR: Nighttime Image De-Raining with Cross-View Sensor Cooperative Learning for Dynamic Driving Scenes

http://arxiv.org/abs/2402.18172v1

Compressor summary: The text describes a framework that uses cooperative learning between visible and infrared images to enhance image quality and visual perception for rainy nighttime driving scenes, addressing challenges faced by autonomous driving systems.


Digging Into Normal Incorporated Stereo Matching

http://arxiv.org/abs/2402.18171v1

Compressor summary: The paper proposes a normal map-based method to improve learning-based stereo matching in challenging regions by using non-local affinity matrix and local residual learning.


MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery

http://arxiv.org/abs/2402.18169v1

Compressor summary: MIKO is a framework that uses two large language models to understand users' intentions in social media posts by interpreting images, extracting text information, and generating intentions.


Autoencoder-based General Purpose Representation Learning for Customer Embedding

http://arxiv.org/abs/2402.18164v1

Compressor summary: The paper presents an autoencoder framework for embedding complex tabular data, showing simpler models perform better and improving reconstruction loss calculation for contractive autoencoders.


Ef-QuantFace: Streamlined Face Recognition with Small Data and Low-Bit Precision

http://arxiv.org/abs/2402.18163v1

Compressor summary: The paper introduces an efficient approach to compress face recognition models using much smaller datasets than previously required, achieving state-of-the-art results at low-bit precision.


Out-of-Distribution Detection using Neural Activation Prior

http://arxiv.org/abs/2402.18162v1

Compressor summary: The paper introduces a simple Neural Activation Prior (NAP) for out-of-distribution detection in neural networks, which uses strongly activated neurons before global pooling to detect patterns in input samples and achieves state-of-the-art performance on various image datasets.
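
Based only on the description above, an activation-statistic score of this flavor could be sketched as follows; treating the per-channel peak-to-mean ratio before global pooling as the score is an assumption for illustration, and the paper's exact rule may differ.

```python
# Hedged sketch of an activation-statistic OOD score: compare each channel's
# peak activation to its mean before global pooling. Interpretation assumed.
import torch

def nap_style_score(feature_map):
    """feature_map: (batch, channels, H, W) activations before global pooling."""
    flat = feature_map.flatten(2)                         # (B, C, H*W)
    ratio = flat.amax(dim=2) / (flat.mean(dim=2) + 1e-6)  # per-channel peak/mean
    return ratio.mean(dim=1)                              # one score per sample

feats = torch.relu(torch.randn(4, 512, 7, 7))             # e.g., ResNet stage output
print(nap_style_score(feats))
```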


Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation

http://arxiv.org/abs/2402.18159v1

Compressor summary: The paper introduces a general framework for risk-sensitive distributional reinforcement learning and proposes two novel meta-algorithms that achieve statistically efficient regret bounds.


Evaluating Quantized Large Language Models

http://arxiv.org/abs/2402.18158v1

Compressor summary: This paper evaluates post-training quantization (PTQ) techniques for large language models (LLMs), studying their impact on different tasks and model families, and providing recommendations for application.


From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs

http://arxiv.org/abs/2402.18157v1

Compressor summary: Sum2Act is a novel tool invocation pipeline that mimics the human problem-solving process, improving LLMs' ability to use and create tools for complex real-world tasks.


Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models

http://arxiv.org/abs/2402.18154v1

Compressor summary: The paper explores knowledge conflicts in language models due to external context and proposes PH3, a method to prune conflicting attention heads without updating parameters, improving performance on open-domain QA tasks.


Diffusion-based Neural Network Weights Generation

http://arxiv.org/abs/2402.18153v1

Compressor summary: Key points:
- Transfer learning improves deep learning performance on new tasks.
- Pretrained models are often suboptimal and blindly selected.
- The proposed method uses a latent diffusion model with a variational autoencoder to learn the distribution of pretrained weights for each dataset.
- This enables adaptive sampling of weights for faster convergence and competitive performance.
Summary: The paper proposes an efficient and adaptive transfer learning scheme that uses a latent diffusion model and a variational autoencoder to learn and sample pretrained weights conditioned on each dataset, achieving faster convergence and competitive performance.


Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation

http://arxiv.org/abs/2402.18150v1

Compressor summary: The paper proposes InFO-RAG, a low-cost method to train large language models as "Information Refiners" to improve retrieval-augmented generation by integrating knowledge from retrieved texts and model parameters.


Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation

http://arxiv.org/abs/2402.18149v1

Compressor summary: The paper proposes a new risk-sensitive reinforcement learning algorithm for partially observable environments with hindsight observations, and proves its low regret and efficiency.


3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

http://arxiv.org/abs/2402.18146v1

Compressor summary: The proposed auto-labelling approach generates 3D scene flow pseudo labels for real-world LiDAR point clouds using rigid motion decomposition and data augmentation, achieving superior performance on multiple datasets without manual labelling.


Learning Intrinsic Dimension via Information Bottleneck for Explainable Aspect-based Sentiment Analysis

http://arxiv.org/abs/2402.18145v1

Compressor summary: The paper proposes a new method to improve explanations for neural models in natural language processing, specifically for sentiment analysis, by refining word embeddings with an information bottleneck.


Random Silicon Sampling: Simulating Human Sub-Population Opinion Using a Large Language Model Based on Group-Level Demographic Information

http://arxiv.org/abs/2402.18144v1

Compressor summary: The authors propose "random silicon sampling" to generate opinions aligned with human subgroups based on their demographic data, showing that language models can mimic public opinion polls but are influenced by societal biases.


Cause and Effect: Can Large Language Models Truly Understand Causality?

http://arxiv.org/abs/2402.18139v1

Compressor summary: The paper proposes CARE CA, a framework that combines explicit and implicit causal reasoning using ConceptNet and large language models, enhancing causal understanding and interpretability.


Learning to Deblur Polarized Images

http://arxiv.org/abs/2402.18134v1

Compressor summary: The paper presents a polarized image deblurring pipeline that uses a neural network to handle motion blur caused by camera shakes in polarization-based vision applications.


Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

http://arxiv.org/abs/2402.18133v1

Compressor summary: The paper studies image recognition fairness, showing that it's a widespread issue due to problematic representations rather than biased classifiers, and suggests improving fairness can enhance performance.


On the Inductive Biases of Demographic Parity-based Fair Learning Algorithms

http://arxiv.org/abs/2402.18129v1

Compressor summary: This paper analyzes how demographic parity (DP) can lead to biased classifiers if the training data is imbalanced, and proposes a distributionally robust optimization method to improve fairness.


Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization

http://arxiv.org/abs/2402.18128v1

Compressor summary: MLO-MAE is a new method for self-supervised learning that uses feedback from downstream tasks to improve masking of image patches, leading to better visual representations.


Hierarchical Multi-Relational Graph Representation Learning for Large-Scale Prediction of Drug-Drug Interactions

http://arxiv.org/abs/2402.18127v1

Compressor summary: The paper introduces a hierarchical multi-relational graph representation learning (HMGRL) approach to predict drug-drug interactions (DDI) by capturing both explicit and implicit correlations between drugs using heterogeneous graphs, relational graph convolutional networks (RGCN), and multi-view differentiable spectral clustering (MVDSC).


G4G:A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment

http://arxiv.org/abs/2402.18122v1

Compressor summary: G4G is a novel framework for high fidelity talking face generation with fine-grained intra-modal alignment, which achieves better synchronization of lip movements and audio than existing methods.


Saving the legacy of Hero Ibash: Evaluating Four Language Models for Aminoacian

http://arxiv.org/abs/2402.18121v1

Compressor summary: The study evaluates four advanced language models in Aminoacian, a low-resourced language, to improve natural language processing and promote inclusivity.


Exploring Multilingual Human Value Concepts in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?

http://arxiv.org/abs/2402.18120v1

Compressor summary: The study examines how LLMs encode human values in different languages and suggests optimal data composition for pre-training multilingual models.


PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation

http://arxiv.org/abs/2402.18117v1

Compressor summary: The text proposes a robust framework for Semi-Supervised Semantic Segmentation using Probabilistic Representations, Global Distribution Prototypes, and Virtual Negatives to improve unsupervised training.


UniVS: Unified and Universal Video Segmentation with Prompts as Queries

http://arxiv.org/abs/2402.18115v1

Compressor summary: The paper introduces UniVS, a unified video segmentation framework that uses prompts as queries to handle different video segmentation tasks in a universal way.


Small But Funny: A Feedback-Driven Approach to Humor Distillation

http://arxiv.org/abs/2402.18113v1

Compressor summary: The paper explores how feedback from a large language model can improve the performance of small language models in creative and complex tasks like humor generation.


Dual-Context Aggregation for Universal Image Matting

http://arxiv.org/abs/2402.18109v1

Compressor summary: The paper proposes a universal image matting framework called DCAM that uses semantic features and dual-context aggregation to robustly estimate alpha matte with or without guidance.


Assessing the Efficacy of Grammar Error Correction: A Human Evaluation Approach in the Japanese Context

http://arxiv.org/abs/2402.18101v1

Compressor summary: The study tested a grammar error detection and correction model on Japanese students' writing samples, showing high accuracy but a conservative tendency in detecting errors.


Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models

http://arxiv.org/abs/2402.18099v1

Compressor summary: This paper introduces MedLaSA, a method to modify large language models for accurate medical knowledge using scalable adapters based on causal tracing, and evaluates its effectiveness with new benchmarks and metrics.


No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization

http://arxiv.org/abs/2402.18096v1

Compressor summary: The paper examines the negative effects of evicting key-value pairs from the cache in large language models and proposes MiKV, a compression method that balances context preservation and generation quality using mixed precision.
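
A toy sketch of the importance-aware mixed-precision idea, retaining high-importance tokens' KV pairs at full precision and quantizing the rest to 4 bits rather than evicting them, is shown below; the importance signal and quantizer here are illustrative placeholders.

```python
# Toy importance-aware mixed-precision KV cache: compress low-importance
# entries instead of evicting them. Importance signal is a stand-in.
import numpy as np

def quantize_4bit(x):
    scale = np.abs(x).max() / 7 + 1e-8
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale                               # dequantized 4-bit values

def mixed_precision_kv(kv, importance, keep_ratio=0.25):
    keep = set(np.argsort(importance)[-int(len(kv) * keep_ratio):].tolist())
    out = kv.copy()
    for i in range(len(kv)):
        if i not in keep:                          # low importance: compress, don't evict
            out[i] = quantize_4bit(kv[i])
    return out

kv = np.random.randn(128, 64).astype(np.float32)   # (cached tokens, head_dim)
importance = np.random.rand(128)                   # e.g., accumulated attention weight
kv_compressed = mixed_precision_kv(kv, importance)
```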


Context-aware Talking Face Video Generation

http://arxiv.org/abs/2402.18092v1

Compressor summary: The paper proposes a method to generate realistic face videos of multiple people talking, considering context like audience and surroundings, using facial landmarks as control signals.


Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

http://arxiv.org/abs/2402.18091v1

Compressor summary: Polos is a new automatic evaluation metric for image captioning models that uses contrastive learning to compute scores from multimodal inputs and is trained on human feedback from a large dataset.


Generalizable Two-Branch Framework for Image Class-Incremental Learning

http://arxiv.org/abs/2402.18086v1

Compressor summary: The paper proposes a two-branch continual learning framework that improves upon existing methods by combining a main branch model and a lightweight side branch network, leading to better performance on multiple image datasets.


Spannotation: Enhancing Semantic Segmentation for Autonomous Navigation with Efficient Image Annotation

http://arxiv.org/abs/2402.18084v1

Compressor summary: Spannotation is a fast and user-friendly tool for annotating images in autonomous navigation tasks that achieves high accuracy with a U-Net model trained on its segmentation masks.


Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

http://arxiv.org/abs/2402.18078v1

Compressor summary: The paper proposes a new image generation method called Coarse-to-Fine Latent Diffusion (CFLD) that improves pose-guided person image synthesis by using semantic understanding and multi-scale attention.


SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model

http://arxiv.org/abs/2402.18068v1

Compressor summary: Key points:
- Image synthesis faces the challenge of complex artifacts.
- A Vision-Language Model (VLM) is fine-tuned as an artifact classifier.
- The SynArtifact-1K dataset with artifact annotations is developed.
- The VLM outperforms the baseline by 25.66%.
- VLM feedback is used to improve generative model quality.
Summary: The authors propose an end-to-end artifact classification task and solution for image synthesis, using a fine-tuned VLM to identify and classify artifacts, create the SynArtifact-1K dataset, and refine the generative model.


Six-Point Method for Multi-Camera Systems with Reduced Solution Space

http://arxiv.org/abs/2402.18066v1

Compressor summary: The paper proposes several minimal solvers using six point correspondences to compute relative pose of multi-camera systems with improved accuracy and efficiency.


On the use of Silver Standard Data for Zero-shot Classification Tasks in Information Extraction

http://arxiv.org/abs/2402.18061v1

Compressor summary: The paper proposes Clean-LaVe, a framework that uses silver standard data to improve zero-shot information extraction performance by finetuning off-the-shelf models and achieves significant improvements on various datasets.


Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions

http://arxiv.org/abs/2402.18060v1

Compressor summary: The paper introduces two new challenging datasets for large language models to answer complex medical questions with explanations, showing that existing models struggle with consistency in explaining their answers.


Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

http://arxiv.org/abs/2402.18059v1

Compressor summary: The paper proposes a novel multi-objective optimization approach for watermarking large language models to distinguish AI-generated texts from human-written ones while maintaining their semantic coherence.
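
For context, the green-list family of watermarking schemes this work builds on can be sketched as follows, with a fixed splitting ratio gamma and bias delta; making these token-specific via multi-objective optimization is the paper's contribution and is not shown here.

```python
# Minimal green-list watermark sketch: a hash of the previous token seeds a
# "green" vocabulary subset whose logits receive a bias delta.
import torch

def watermark_logits(logits, prev_token, gamma=0.5, delta=2.0):
    vocab = logits.shape[-1]
    g = torch.Generator().manual_seed(int(prev_token))  # seed from context
    perm = torch.randperm(vocab, generator=g)
    green = perm[: int(gamma * vocab)]                  # green-list token ids
    biased = logits.clone()
    biased[green] += delta                              # nudge sampling toward green
    return biased

logits = torch.randn(50257)
next_logits = watermark_logits(logits, prev_token=4231)
```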


Contextualizing Generated Citation Texts

http://arxiv.org/abs/2402.18054v1

Compressor summary: The paper proposes a method to generate citations and context windows together, improving citation relevance and readability for humans.


MEGAnno+: A Human-LLM Collaborative Annotation System

http://arxiv.org/abs/2402.18050v1

Compressor summary: MEGAnno+ is a system that enables humans and large language models to work together for accurate and efficient data labeling in NLP tasks.


Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension

http://arxiv.org/abs/2402.18048v1

Compressor summary: The paper proposes using local intrinsic dimension (LID) of model activations to measure the truthfulness of texts generated by large language models (LLMs).
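
A minimal sketch of a maximum-likelihood LID estimate (in the Levina-Bickel style) over activation vectors is given below; the choice of estimator, layer, and neighborhood size are assumptions for illustration, not necessarily the paper's.

```python
# Levina-Bickel-style MLE of local intrinsic dimension over activations.
import numpy as np

def lid_mle(x, reference, k=20):
    """LID of point x relative to a set of reference activations."""
    d = np.linalg.norm(reference - x, axis=1)
    d = np.sort(d[d > 0])[:k]                    # k nearest non-identical neighbors
    return -1.0 / np.mean(np.log(d[:-1] / d[-1]))

acts = np.random.randn(1000, 768)                # stand-in for LLM activations
print(f"estimated LID: {lid_mle(acts[0], acts[1:]):.1f}")
```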


Data augmentation method for modeling health records with applications to clopidogrel treatment failure detection

http://arxiv.org/abs/2402.18046v1

Compressor summary: The paper proposes a new way to create more data for NLP models using EHRs by rearranging records within visits, which improves detection of clopidogrel treatment failure and works well with limited data.


Multi-FAct: Assessing Multilingual LLMs' Multi-Regional Knowledge using FActScore

http://arxiv.org/abs/2402.18045v1

Compressor summary: The paper evaluates the factual accuracy of multilingual large language models, finding that English outputs are the most factually accurate and that generations are geographically biased toward Western continents.


SFTformer: A Spatial-Frequency-Temporal Correlation-Decoupling Transformer for Radar Echo Extrapolation

http://arxiv.org/abs/2402.18044v1

Compressor summary: SFTformer is a model that decouples spatial and temporal features to effectively predict future weather radar echoes for precipitation nowcasting.


Crisis talk: analysis of the public debate around the energy crisis and cost of living

http://arxiv.org/abs/2402.18043v1

Compressor summary: The paper examines how the UK public debates the energy crisis, cost of living, and their interrelated issues using natural language processing and data visualisation techniques.


Datasets for Large Language Models: A Comprehensive Survey

http://arxiv.org/abs/2402.18041v1

Compressor summary: The paper surveys Large Language Model datasets, examining their roles, challenges, and trends across five perspectives and providing a comprehensive dataset resource list with statistics.


Automated Discovery of Integral with Deep Learning

http://arxiv.org/abs/2402.18040v1

Compressor summary: The study explores deep learning's potential for rediscovering mathematical concepts like integrals and demonstrates AI's ability to infer basic integrals using sequence-to-sequence models or by uncovering fundamental principles.


ResLoRA: Identity Residual Mapping in Low-Rank Adaption

http://arxiv.org/abs/2402.18039v1

Compressor summary: ResLoRA improves upon low-rank adaptation (LoRA) for fine-tuning large language models by adding residual paths and merging them during inference, leading to better results with fewer training steps and no extra cost.
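
A toy sketch of the core idea, a LoRA adapter that also receives an identity residual from a previous adapter, is shown below; ResLoRA's actual residual structures and its merging procedure for inference are more involved, and all names here are illustrative.

```python
# Illustrative LoRA layer with an identity residual path between adapters.
import torch
import torch.nn as nn

class ResLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base                        # frozen pretrained layer
        self.base.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x, prev_residual=None):
        lora_out = (x @ self.A.T @ self.B.T) * self.scale
        if prev_residual is not None:           # residual from an earlier adapter
            lora_out = lora_out + prev_residual
        return self.base(x) + lora_out, lora_out

layer = ResLoRALinear(nn.Linear(64, 64))
y, res = layer(torch.randn(2, 64))
y2, _ = layer(torch.randn(2, 64), prev_residual=res)
```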


Human Shape and Clothing Estimation

http://arxiv.org/abs/2402.18032v1

Compressor summary: The paper surveys recent approaches to human shape and clothing estimation in computer vision, discussing their strengths, limitations, and applications.


OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine

http://arxiv.org/abs/2402.18028v1

Compressor summary: OpenMEDLab is an open-source platform that accelerates the development of domain-specific foundation models for multi-modal medical applications using pre-trained models and transfer learning techniques.


Breaking the Black-Box: Confidence-Guided Model Inversion Attack for Distribution Shift

http://arxiv.org/abs/2402.18027v1

Compressor summary: This paper proposes a new black-box model inversion attack method that uses a pre-trained GAN as prior information and gradient-free optimizer, achieving high-quality and high-resolution image generation across diverse data distributions.


Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions

http://arxiv.org/abs/2402.18025v1

Compressor summary: LINGOLLM is a training-free approach that uses linguistic knowledge to enable large language models to process and translate endangered languages with few resources.


Do Large Language Models Mirror Cognitive Language Processing?

http://arxiv.org/abs/2402.18023v1

Compressor summary: The paper proposes a method to evaluate how well large language models simulate human cognition by measuring their alignment with fMRI signals of the brain and examines various factors affecting this alignment.


A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems

http://arxiv.org/abs/2402.18013v1

Compressor summary: The paper reviews existing large language models and their applications in open-domain and task-oriented multi-turn dialogue systems, highlighting challenges and future directions.


Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints

http://arxiv.org/abs/2402.18012v1

Compressor summary: The paper proposes a two-stage diffusion model framework to optimize problems without explicit objective functions or constraints, using sampling from the product of Boltzmann distributions defined by the objective and data distributions.


Representing 3D sparse map points and lines for camera relocalization

http://arxiv.org/abs/2402.18011v1

Compressor summary: The study presents a lightweight neural network that learns to represent 3D point and line features for visual localization and mapping, achieving leading pose accuracy and outperforming state-of-the-art methods in indoor and outdoor scenarios.


Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations

http://arxiv.org/abs/2402.18008v1

Compressor summary: The paper introduces two fast and interpretable methods, SKS and ACA, for decomposing 2D homographies using minimal points and polynomial parameterization, with ACA being efficient enough to be a plug-in module in feature-based or deep homography pipelines.


Mixer is more than just a model

http://arxiv.org/abs/2402.18007v1

Compressor summary: MLP-Mixer is a popular computer vision architecture that fuses channel and token information; the paper applies the idea to audio, proposing the Audio Spectrogram Mixer with Roll-Time and Hermit FFT (ASM-RH) for improved classification performance.
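
For reference, the token/channel fusion at the heart of MLP-Mixer can be sketched as a single block like the one below; ASM-RH's Roll-Time and Hermit FFT components are not reproduced here.

```python
# Minimal MLP-Mixer block: token mixing followed by channel mixing.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, tokens, channels, hidden=256):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(channels), nn.LayerNorm(channels)
        self.token_mlp = nn.Sequential(
            nn.Linear(tokens, hidden), nn.GELU(), nn.Linear(hidden, tokens))
        self.chan_mlp = nn.Sequential(
            nn.Linear(channels, hidden), nn.GELU(), nn.Linear(hidden, channels))

    def forward(self, x):                        # x: (batch, tokens, channels)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.chan_mlp(self.norm2(x))

block = MixerBlock(tokens=100, channels=128)
out = block(torch.randn(4, 100, 128))            # e.g., spectrogram patches
```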


Exploring Multi-Document Information Consolidation for Scientific Sentiment Summarization

http://arxiv.org/abs/2402.18005v1

Compressor summary: The authors propose a three-layer framework for summarizing scientific sentiments in meta-review generation and test its effectiveness with LLMs.