This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-29, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.18573v1
Compressor summary: The paper proposes UniMODE, a bird's-eye-view detector that can handle diverse indoor and outdoor scenes in 3D object detection by using uneven grid, sparse feature projection, and domain alignment techniques.
http://arxiv.org/abs/2402.18571v1
Compressor summary: The text introduces Directional Preference Alignment (DPA), a framework that uses multi-objective reward modeling to capture diverse user preferences for large language models.
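As a rough, paper-agnostic illustration of the idea behind directional preference alignment (multi-objective rewards scored along a user-specified direction), here is a minimal sketch; the reward dimensions, names, and normalization are my assumptions, not the paper's formulation:

```python
import numpy as np

# Hedged sketch: a user preference is a direction over reward dimensions
# (e.g., helpfulness, verbosity), and a response's scalar reward is the
# inner product of that direction with its multi-objective reward vector.

def directional_reward(reward_vec: np.ndarray, preference: np.ndarray) -> float:
    v = preference / np.linalg.norm(preference)  # unit-norm preference direction
    return float(v @ reward_vec)

rewards = np.array([0.8, 0.3])   # illustrative: [helpfulness, verbosity]
pref = np.array([1.0, -0.5])     # user values helpfulness, dislikes verbosity
print(directional_reward(rewards, pref))
```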
http://arxiv.org/abs/2402.18567v1
Compressor summary: The paper presents DPLM, a protein language model that can generate diverse and structurally plausible protein sequences using a diffusion-based pre-training method and can be fine-tuned for various predictive tasks or conditioned on different inputs.
http://arxiv.org/abs/2402.18563v1
Compressor summary: The authors develop a system that combines language models with information retrieval to predict future events and compare its performance to human forecasters, achieving similar or better results in some cases.
http://arxiv.org/abs/2402.18553v1
Compressor summary: Using a fixed exposure time improves radiometric accuracy and precision in multispectral images for agricultural applications.
http://arxiv.org/abs/2402.18551v1
Compressor summary: This paper studies how gradient descent optimizes next-token prediction (NTP) models and finds conditions under which it reaches the optimal solution and when it favors certain structures.
http://arxiv.org/abs/2402.18546v1
Compressor summary: The study compares two models for neural data representation, finding that the transformer-based TOTEM model performs better in handling variability and sensor failure in electroencephalography datasets.
http://arxiv.org/abs/2402.18540v1
Compressor summary: The paper proposes the "Pure Tuning, Safe Testing" principle to mitigate unsafe behaviors in large language models fine-tuned on seemingly safe datasets.
http://arxiv.org/abs/2402.18528v1
Compressor summary: Our method balances gradients and distills knowledge to improve class-incremental learning on non-uniform data with dual imbalance problems, reducing overfitting and forgetting.
http://arxiv.org/abs/2402.18527v1
Compressor summary: The paper presents a robust method for detecting tire defects using traditional and advanced features and machine learning models, achieving high accuracy and reliability.
http://arxiv.org/abs/2402.18512v1
Compressor summary: Log-NCDEs use the Log-ODE method from rough paths to improve training of neural differential equations for multivariate time series classification, achieving higher accuracy than other models.
http://arxiv.org/abs/2402.18510v1
Compressor summary: The paper examines how Chain-of-Thought improves RNNs' performance on algorithmic problems but is not enough to match Transformers, and proposes techniques to enhance RNNs' in-context retrieval ability.
http://arxiv.org/abs/2402.18508v1
Compressor summary: Orchid is a new architecture that uses data-dependent convolution to improve sequence modeling efficiency and expressivity, outperforming traditional attention-based models like BERT and Vision Transformers.
http://arxiv.org/abs/2402.18507v1
Compressor summary: The paper proposes a deep learning framework that uses advanced image techniques to improve the analysis of cardiac images for detecting late mechanical activation.
http://arxiv.org/abs/2402.18505v1
Compressor summary: The paper proposes an interactive genetic programming algorithm for automatic workflow composition in automated machine learning, which improves performance and reduces tuning time by allowing users to modify the grammar dynamically.
http://arxiv.org/abs/2402.18503v1
Compressor summary: This paper proposes a new object detection model that combines YOLOX with spatio-temporal features from consecutive video frames to better detect micromobility vehicles in urban traffic.
http://arxiv.org/abs/2402.18502v1
Compressor summary: This study proposes a framework for assessing and improving fairness in large language models using in-context learning and shows that GPT-4 performs better than other models in terms of accuracy and fairness.
http://arxiv.org/abs/2402.18496v1
Compressor summary: The study reveals how Large Language Models represent self and others' beliefs in their neural activations and manipulates them to understand mental states in various social reasoning tasks.
http://arxiv.org/abs/2402.18495v1
Compressor summary: ROG$_{PL}$ is a framework that uses prototype learning to improve robust open-set node classification on noisy graph data.
http://arxiv.org/abs/2402.18493v1
Compressor summary: The paper proposes DRET, a rain simulation method, and SRKD, a knowledge distillation approach, to improve 3D object detection under various weather conditions.
http://arxiv.org/abs/2402.18491v1
Compressor summary: The study uses statistical physics methods to analyze generative diffusion models in large dimensions and datasets, identifying three distinct dynamical regimes and showing how they relate to phase transitions and the curse of dimensionality.
http://arxiv.org/abs/2402.18490v1
Compressor summary: TAMM is a novel two-stage learning approach that uses three synergetic adapters to effectively leverage image and language modalities for pre-training 3D shape representations, improving performance on various tasks.
http://arxiv.org/abs/2402.18479v1
Compressor summary: NewsQs is a dataset containing question-answer pairs for multiple news articles created by fine-tuning a T5 model on FAQ-style news and filtering the data with QNLI.
http://arxiv.org/abs/2402.18477v1
Compressor summary: The paper proposes a new method to infer causal structures from stochastic dynamical systems using path-space data and signature kernels, which performs better than existing approaches.
http://arxiv.org/abs/2402.18476v1
Compressor summary: The paper proposes an image-biased decoding technique to reduce hallucinations in large vision-language models by contrasting predictions from conventional and image-biased models, improving the quality of generated responses.
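A minimal sketch of the general contrastive-decoding pattern the summary describes: amplify the effect of image conditioning by contrasting logits from an image-conditioned pass against a less image-grounded pass. The `alpha` weighting and both logit arrays are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

# Hedged sketch of image-biased contrastive decoding: the adjusted logits
# exaggerate the shift that image conditioning induces, which tends to
# suppress tokens supported only by the language prior (hallucinations).

def image_biased_logits(logits_with_image, logits_text_only, alpha=1.0):
    return (1 + alpha) * logits_with_image - alpha * logits_text_only

vocab = 5
with_img = np.random.randn(vocab)    # logits from the image-conditioned model
text_only = np.random.randn(vocab)   # logits from the conventional model
adjusted = image_biased_logits(with_img, text_only)
probs = np.exp(adjusted - adjusted.max())
probs /= probs.sum()
next_token = int(np.argmax(probs))
```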
http://arxiv.org/abs/2402.18467v1
Compressor summary: The authors propose SeCo, a method that separates co-occurring objects in image patches and enhances semantic representation with multi-granularity knowledge contrast to tackle the challenging co-occurrence problem in weakly supervised semantic segmentation.
http://arxiv.org/abs/2402.18458v1
Compressor summary: MetaEOL is an unsupervised method that uses meta-task prompts to generate high-quality sentence embeddings from LLMs without fine-tuning or task engineering.
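A minimal sketch of the meta-task-prompt idea, assuming embeddings are read off as the last-token hidden state of each prompted forward pass and then averaged; `last_token_hidden` is a hypothetical placeholder for a real LLM call, and the prompt templates are illustrative:

```python
import numpy as np

# Hedged sketch: wrap a sentence in several meta-task prompts, embed each
# prompted view, and average the views into one sentence embedding.

META_PROMPTS = [
    'In this sentence classification task, the sentence "{s}" means in one word:',
    'For sentiment analysis, the sentence "{s}" can be summarized as:',
]

def last_token_hidden(prompt: str) -> np.ndarray:
    # Placeholder: a real implementation would run an LLM forward pass and
    # return the final-layer hidden state of the last token.
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(16)

def meta_embedding(sentence: str) -> np.ndarray:
    vecs = [last_token_hidden(p.format(s=sentence)) for p in META_PROMPTS]
    return np.mean(vecs, axis=0)  # average across meta-task views
```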
http://arxiv.org/abs/2402.18449v1
Compressor summary: HOP is a method that enables continual learning in NLP by adapting to new tasks and domains, preserving past knowledge, and distinguishing task-related statistics using high-order moments and auxiliary heads.
http://arxiv.org/abs/2402.18447v1
Compressor summary: The paper proposes a dynamic object-centric perception network using prompt learning to improve single-domain generalization for image classification and object detection tasks by adapting to image complexity variations.
http://arxiv.org/abs/2402.18443v1
Compressor summary: LeMo-NADe is a framework that uses an expert system, a large language model, and user preferences to automatically discover efficient neural network architectures for edge devices.
http://arxiv.org/abs/2402.18439v1
Compressor summary: The text explores how allowing Large Language Models to choose non-natural language formats for reasoning and communication can improve efficiency and effectiveness.
http://arxiv.org/abs/2402.18434v1
Compressor summary: RAMEN improves extreme classification (tagging data points with relevant labels from a large universe) by using graph data to regularize encoder training instead of costly graph convolutional networks, achieving higher accuracy at lower cost, especially for tail labels with little training data.
http://arxiv.org/abs/2402.18428v1
Compressor summary: DCMCL is a novel collaborative learning method that improves both AR and NAR models by leveraging bilateral contextual information from different types of generative models for Neural Machine Translation.
http://arxiv.org/abs/2402.18426v1
Compressor summary: The paper explores how the relational bottleneck, a mechanism that focuses on relations among inputs, improves neural networks' generalization, learning efficiency, and ability to form compositional representations like humans.
http://arxiv.org/abs/2402.18424v1
Compressor summary: The paper presents a cross-lingual emotion classifier that transfers learning from English to other languages, showing that both parallel-projection and direct-transfer approaches outperform random baselines (with direct transfer performing better) and creating emotion-labeled resources for four languages.
http://arxiv.org/abs/2402.18419v1
Compressor summary: The authors use GPT with various prompting techniques to validate key factors of prior authorization requests via question answering over patient electronic health records, outperforming standard counterparts with a mean weighted F1 score of 0.61 and improving efficiency for health insurers.
http://arxiv.org/abs/2402.18411v1
Compressor summary: ProtoOT is a novel Optimal Transport method for unsupervised cross-domain image retrieval that integrates intra-domain feature learning and cross-domain alignment, using K-means clustering and contrastive learning to improve performance.
http://arxiv.org/abs/2402.18409v1
Compressor summary: The paper introduces a new benchmark to test the high-level cognitive abilities of LVLMs using images with rich semantics, inspired by a human cognition task.
http://arxiv.org/abs/2402.18402v1
Compressor summary: The paper proposes SyMPIE, a modular system that enhances noisy input data for robust multimedia understanding tasks with minimal computational cost and without needing paired clean-corrupted data.
http://arxiv.org/abs/2402.18397v1
Compressor summary: The paper presents a new method to test how well large language models understand different languages by generating individual prompts for each token in a sentence and evaluates it on part-of-speech tagging tasks.
http://arxiv.org/abs/2402.18393v1
Compressor summary: This paper proposes a method to test autonomous driving systems' decision-making quality by generating scenarios where they don't make optimal choices, and a new metamorphic relation to identify such scenarios.
http://arxiv.org/abs/2402.18392v1
Compressor summary: The paper introduces a new method for selecting the best Conditional Average Treatment Effect (CATE) estimator using a Distributionally Robust Metric (DRM), which is effective and requires fewer additional models.
http://arxiv.org/abs/2402.18385v1
Compressor summary: The paper presents a winning approach for conversational multi-doc QA (answering questions based on documents and contextual conversations) that adapts large language models to the task, exploits in-domain unlabeled data, filters irrelevant documents, and ensembles models.
http://arxiv.org/abs/2402.18383v1
Compressor summary: The authors developed a deep learning framework that combines image features and scanner priors using a novel domain attention block to improve pulmonary emphysema segmentation on CT scans.
http://arxiv.org/abs/2402.18381v1
Compressor summary: Large language models can perform black-box optimization tasks without explicit training by using a novel prompting strategy and outperform baseline algorithms.
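For intuition, a minimal sketch of the prompting loop such work typically uses: feed the evaluation history back to the model and ask it to propose the next candidate. `ask_llm` is a hypothetical placeholder and the prompt wording is my assumption, not the paper's strategy:

```python
# Hedged sketch of an LLM as a black-box optimizer over a 1-D objective.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real LLM call here")

def llm_optimize(objective, init_points, steps=10):
    history = [(x, objective(x)) for x in init_points]
    for _ in range(steps):
        lines = "\n".join(f"x={x}, f(x)={y:.3f}" for x, y in history)
        prompt = (
            "You are optimizing a black-box function. Past evaluations:\n"
            f"{lines}\nPropose one new x (a float) likely to increase f(x):"
        )
        x_new = float(ask_llm(prompt))          # parse the model's proposal
        history.append((x_new, objective(x_new)))
    return max(history, key=lambda t: t[1])     # best point found
```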
http://arxiv.org/abs/2402.18377v1
Compressor summary: The paper presents a framework to analyze and improve out-of-domain generalization in dynamical systems reconstruction using topological concepts and ergodic theory, showing that current black-box deep learning methods struggle with this challenge.
http://arxiv.org/abs/2402.18376v1
Compressor summary: The paper evaluates PathPiece, a new tokenizer that segments text into the minimum number of tokens for a given vocabulary, and tests its effectiveness compared to BPE; it also investigates various factors in tokenization and trains 64 language models with different tokenization methods.
http://arxiv.org/abs/2402.18374v1
Compressor summary: VerifiNER is a framework that uses knowledge to correct errors in biomedical named entity recognition.
http://arxiv.org/abs/2402.18362v1
Compressor summary: The study presents AG-DDAD, an automated method to assess breast cosmesis after surgery using attention-guided denoising diffusion and self-supervised Vision Transformer, which outperforms existing models and eliminates manual annotations.
http://arxiv.org/abs/2402.18351v1
Compressor summary: LatentSwap is a lightweight face swapping framework that uses latent codes to swap faces between images, producing realistic results with minimal data and fast training.
http://arxiv.org/abs/2402.18344v1
Compressor summary: RIDERS is a novel method to improve large language models' commonsense reasoning and mitigate information loss issues in Chain-of-Thought reasoning.
http://arxiv.org/abs/2402.18337v1
Compressor summary: The paper proposes a new method for designing efficient and robust experiments using a conditional normalizing flow and a Bernoulli distribution.
http://arxiv.org/abs/2402.18334v1
Compressor summary: Bonito generates synthetic tasks for instruction tuning of large language models using unannotated text, improving their zero-shot performance on various domain tasks.
http://arxiv.org/abs/2402.18331v1
Compressor summary: FineDiffusion is a parameter-efficient strategy for large-scale fine-grained image generation using diffusion models, with a novel sampling method and achieving state-of-the-art results.
http://arxiv.org/abs/2402.18330v1
Compressor summary: EgoTAP is a novel method that converts heatmaps to accurate 3D pose estimation using self-attention and skeletal information, improving performance over previous methods.
http://arxiv.org/abs/2402.18320v1
Compressor summary: The paper proposes a new neural network approach to estimate head pose in fisheye images without rectification or calibration, achieving better performance than existing methods.
http://arxiv.org/abs/2402.18312v1
Compressor summary: This study explores how Large Language Models use multiple parallel pathways to generate Chain-of-Thought reasoning and reveals an internal phase shift between their initial and later layers.
http://arxiv.org/abs/2402.18311v1
Compressor summary: The paper presents a hybrid optimization method that improves placement in physical design by escaping local optima and outperforms existing methods on two benchmarks.
http://arxiv.org/abs/2402.18309v1
Compressor summary: The paper introduces a new algorithm that uses LiDAR point clouds to automatically detect trees blocking roadways and help municipalities manage them for safer streets.
http://arxiv.org/abs/2402.18307v1
Compressor summary: The paper proposes a method to segment objects in low-light images using Mask R-CNN with weighted non-local blocks for feature denoising and improved performance.
http://arxiv.org/abs/2402.18302v1
Compressor summary: The paper introduces AR-MOT, a challenging audio-based object tracking task for autonomous driving, and presents EchoTrack, an end-to-end framework using dual vision transformers and bidirectional audio-video fusion to address it.
http://arxiv.org/abs/2402.18296v1
Compressor summary: This study compares two machine learning algorithms (XGBoost and MiniRocket) for human activity recognition using smartphone sensor data, finding that both achieve high accuracy and efficiency, with XGBoost slightly outperforming MiniRocket.
http://arxiv.org/abs/2402.18293v1
Compressor summary: GRAD is a new anomaly detection method that uses continuous grids to represent normal features, improving generalization and handling multiple classes of objects.
http://arxiv.org/abs/2402.18292v1
Compressor summary: The authors improve few-shot-learning classification by using image-to-image translation to transfer the style or shape of the test image onto train-class images, generating new samples of unseen classes and achieving significant gains with just one additional generated sample.
http://arxiv.org/abs/2402.18287v1
Compressor summary: The paper proposes a new method for reconstructing indoor scenes from single images using a U-Former architecture with a Windowed-FourierMixer block, which performs better than existing methods in handling periodic structures and achieving realistic results.
http://arxiv.org/abs/2402.18286v1
Compressor summary: This paper shows how self-supervised learning from unlabeled electron microscopy data improves efficiency and performance for various tasks, such as segmentation and denoising.
http://arxiv.org/abs/2402.18285v1
Compressor summary: PiShield is a framework that integrates safety requirements into neural networks' topology, ensuring compliance regardless of input, and can be used in various domains like functional genomics, autonomous driving, and tabular data generation.
http://arxiv.org/abs/2402.18284v1
Compressor summary: The paper presents a self-supervised text ranking approach that fine-tunes language models like ChatGPT without costly, time-consuming human annotation, combining probabilistic sampling, TextRank, ISODATA clustering, reward modeling, and policy optimization to improve metrics and match human ranking results.
http://arxiv.org/abs/2402.18281v1
Compressor summary: The paper investigates why contrastive self-supervised learning (SSL) works well for sentence representation learning (SRL), and proposes a unified paradigm that integrates four effective contrastive losses based on gradient dissipation, weight, and ratio, which improves non-contrastive SSL performance in SRL.
http://arxiv.org/abs/2402.18278v1
Compressor summary: EAN-MapNet is an efficient and accurate HD map construction system using anchor neighborhoods and grouped local self-attention.
http://arxiv.org/abs/2402.18277v1
Compressor summary: The authors propose a deep learning model that uses slot attention to separate multiple light sources and achieve state-of-the-art white balancing results while providing information on the number and color of light sources.
http://arxiv.org/abs/2402.18272v1
Compressor summary: The text discusses how single-agent LLMs with strong prompts can perform almost as well as multi-agent discussion on reasoning tasks, except when there's no demonstration in the prompt.
http://arxiv.org/abs/2402.18267v1
Compressor summary: The text surveys the advancements in Neural Question Generation (NQG), which uses neural networks to generate relevant questions from diverse inputs, and classifies NQG approaches into structured, unstructured, and hybrid categories.
http://arxiv.org/abs/2402.18264v1
Compressor summary: The paper introduces Wiki-GenBen, a benchmark for evaluating Large Language Models' ability to generate factual full-length Wikipedia articles from web sources for recently occurred events.
http://arxiv.org/abs/2402.18262v1
Compressor summary: WebLM is a multimodal pre-training network that improves webpage understanding by integrating document images' structure and interacting with text and structure modalities.
http://arxiv.org/abs/2402.18260v1
Compressor summary: The paper proposes a safe active learning method for physical systems that derives provable safety bounds along continuous trajectories from an adaptively sampled median of the posterior Gaussian Process, avoiding costly Monte-Carlo sampling of high quantiles and achieving faster evaluation without sacrificing accuracy in simulations and a real engine example.
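A minimal sketch of the underlying safety-check pattern, using a simple mean-minus-beta-std pessimistic bound with scikit-learn's GP as a stand-in for the paper's adaptively sampled posterior-median bounds:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hedged sketch: fit a GP to observed safety values, then only accept query
# points whose pessimistic bound stays above a safety threshold.
X = np.array([[0.0], [0.5], [1.0]])   # inputs already evaluated
z = np.array([1.2, 0.9, 0.4])         # observed safety values
gp = GaussianProcessRegressor().fit(X, z)

def is_safe(x, threshold=0.0, beta=2.0) -> bool:
    mean, std = gp.predict(np.atleast_2d(x), return_std=True)
    return bool(mean[0] - beta * std[0] > threshold)  # pessimistic bound

candidates = np.linspace(0.0, 1.5, 7).reshape(-1, 1)
safe = [x for x in candidates if is_safe(x)]
```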
http://arxiv.org/abs/2402.18258v1
Compressor summary: The authors propose a Multi-Intent dataset for realistic in-Vehicle dialogue Systems, using a BiRGAT model to encode hierarchical ontology items and a 3-way pointer-generator decoder to tackle multi-intent cases.
http://arxiv.org/abs/2402.18252v1
Compressor summary: The paper introduces generalist prompting, a method to make large language models perform well on various tasks without needing specialized prompts, and proposes MeMo, a simple but effective prompting technique that uses mental models for different tasks.
http://arxiv.org/abs/2402.18251v1
Compressor summary: The paper proposes an edge detection method for number plate extraction that works well in both noisy and clean environments, using pixel intensity changes and MATLAB 2017b.
http://arxiv.org/abs/2402.18243v1
Compressor summary: The paper investigates the factors behind instruction fine-tuning in language models, finding that learning additional world knowledge is not always beneficial and maintaining consistency is crucial for success.
http://arxiv.org/abs/2402.18236v1
Compressor summary: This study trained a deep learning model to generate patient-specific volume-meshes of the pulmonary artery from 3D cardiac MRI data and directly estimate CFD flow fields, achieving high accuracy and speed.
http://arxiv.org/abs/2402.18233v1
Compressor summary: DescReg is a zero-shot method for aerial object detection that uses prior descriptions of visual appearance to improve semantic-visual correlation and outperforms state-of-the-art methods on three challenging datasets.
http://arxiv.org/abs/2402.18225v1
Compressor summary: CogBench is a benchmark for evaluating large language models based on cognitive psychology experiments, revealing the impact of model size, RLHF, open-source vs proprietary models, and prompt-engineering techniques on their behavior.
http://arxiv.org/abs/2402.18223v1
Compressor summary: Adaptive decoding is a mechanism that helps language models dynamically choose better candidates for the next token during text generation, improving quality and diversity in tasks like storytelling.
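A minimal sketch of what an adaptive candidate set can look like, using a relative-probability cutoff so peaked distributions keep few tokens and flat ones keep many; the specific rule and its `rel_threshold` parameter are my assumptions, not the paper's mechanism:

```python
import numpy as np

# Hedged sketch of adaptive candidate selection: instead of a fixed top-k,
# keep the smallest set of tokens whose probabilities stay within a factor
# of the best token's probability.

def adaptive_candidates(logits: np.ndarray, rel_threshold: float = 0.1):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    cutoff = rel_threshold * probs.max()
    return np.flatnonzero(probs >= cutoff)  # token ids kept for sampling

logits = np.array([3.0, 2.9, 1.0, -2.0])
print(adaptive_candidates(logits))  # peaked -> few candidates; flat -> many
```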
http://arxiv.org/abs/2402.18217v1
Compressor summary: RECNet is a network that can correct images with mixed exposure by adapting regional features into an exposure-invariant space and restoring local information using a mix-scale restoration unit, while maintaining global image quality through exposure contrastive regularization.
http://arxiv.org/abs/2402.18216v1
Compressor summary: The paper studies how task-switches in conversations affect the performance of large language models and finds that they can cause significant degradation.
http://arxiv.org/abs/2402.18213v1
Compressor summary: The paper introduces a NAS algorithm that encodes user preferences for the performance-hardware trade-off, using a hypernetwork to parameterize the joint architectural distribution via hardware features and preference vectors for zero-shot transferability across devices, outperforming existing multi-objective NAS methods on various search spaces and datasets.
http://arxiv.org/abs/2402.18211v1
Compressor summary: The text proposes using catastrophic overfitting as a way to improve adversarial robustness without sacrificing accuracy on clean data by manipulating feature activation differences with regularization terms and adding noise during evaluation.
http://arxiv.org/abs/2402.18209v1
Compressor summary: The paper introduces a high-granularity named entity dataset (DANSK), a generalizable model (DaCy 2.6.0), and evaluates existing models' domain generalization in Danish NLP, addressing limitations and discussing annotation quality.
http://arxiv.org/abs/2402.18206v1
Compressor summary: The paper proposes Distribution Guidance, a method to reduce bias in diffusion models' image generation by using Attribute Distribution Predictor (ADP) that guides fair generation based on latent features of denoising UNet.
http://arxiv.org/abs/2402.18202v1
Compressor summary: This paper introduces a new RGB image dataset for oil spill detection using drones and neural networks, which can help improve environmental protection in port areas.
http://arxiv.org/abs/2402.18201v1
Compressor summary: The CDS algorithm separates invariant inter-pixel correlations from statistical properties in images by using auxiliary modalities and mutual information minimization, improving superpixel grouping performance.
http://arxiv.org/abs/2402.18198v1
Compressor summary: The text discusses the challenges of applying automated machine learning (AutoML) to single-label and multi-label classification, proposing a novel AutoML approach for single-label classification with limited algorithm complexity and exploring its extension to the multi-label setting with improved flexibility and efficiency.
http://arxiv.org/abs/2402.18196v1
Compressor summary: The text introduces a new dataset and method for generating top-view human pose estimation data using NeRF, which improves the performance of neural networks for this task.
http://arxiv.org/abs/2402.18192v1
Compressor summary: The paper proposes a Frequency Distribution Loss (FDL) to address the challenge of training deep learning-based image transformation methods on poorly aligned paired datasets, by measuring distribution distance in the frequency domain and improving performance on image enhancement, super-resolution, and style transfer tasks.
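A minimal sketch of a frequency-domain loss of this flavor, comparing amplitude spectra so that small spatial misalignments are penalized less than in pixel space; this is a stand-in, not the paper's exact FDL definition:

```python
import torch

# Hedged sketch: compare amplitude spectra of prediction and target. A
# circularly shifted target has the same amplitude spectrum, so this loss
# is insensitive to small misalignments, unlike a pixel-wise loss.

def frequency_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    fp = torch.fft.fft2(pred)    # complex spectrum of the prediction
    ft = torch.fft.fft2(target)  # complex spectrum of the target
    return (fp.abs() - ft.abs()).abs().mean()  # amplitude-only comparison

pred = torch.rand(1, 3, 32, 32, requires_grad=True)
target = torch.roll(pred.detach(), shifts=2, dims=-1)  # misaligned copy
loss = frequency_loss(pred, target)  # near zero despite the shift
loss.backward()
```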
http://arxiv.org/abs/2402.18191v1
Compressor summary: CaR is a data selection method that efficiently selects high-quality instructional data for GPT models by ranking instruction pairs based on expert preferences and preserving dataset diversity.
http://arxiv.org/abs/2402.18181v1
Compressor summary: The proposed framework uses contrastive feature distillation to improve stereo matching in foggy scenes by combining feature learning from both clean and foggy features, enhancing generalization across different environments.
http://arxiv.org/abs/2402.18179v1
Compressor summary: The paper explores using pre-trained graph neural networks for context-based fake news detection and finds that current pre-training strategies do not significantly improve performance over training from scratch.
http://arxiv.org/abs/2402.18178v1
Compressor summary: The paper presents a polarization-to-polarization reflection removal method that takes polarized images as input and predicts polarized reflection and transmission images with two sequential networks and a recurrent framework, outperforming existing methods on a public dataset.
http://arxiv.org/abs/2402.18175v1
Compressor summary: The paper proposes a self-supervised learning method for estimating spatially variant point spread functions (PSFs) from real sharp and blurred images, which improves aberration-aware depth-from-defocus (DfD).
http://arxiv.org/abs/2402.18172v1
Compressor summary: The text describes a framework that uses cooperative learning between visible and infrared images to enhance image quality and visual perception for rainy nighttime driving scenes, addressing challenges faced by autonomous driving systems.
http://arxiv.org/abs/2402.18171v1
Compressor summary: The paper proposes a normal map-based method to improve learning-based stereo matching in challenging regions by using non-local affinity matrix and local residual learning.
http://arxiv.org/abs/2402.18169v1
Compressor summary: MIKO is a framework that uses two large language models to understand users' intentions in social media posts by interpreting images, extracting text information, and generating intentions.
http://arxiv.org/abs/2402.18164v1
Compressor summary: The paper presents an autoencoder framework for embedding complex tabular data, showing simpler models perform better and improving reconstruction loss calculation for contractive autoencoders.
http://arxiv.org/abs/2402.18163v1
Compressor summary: The paper introduces a new efficient approach to compress face recognition models using much smaller datasets than previously required, achieving state-of-the-art results and transformative applications.
http://arxiv.org/abs/2402.18162v1
Compressor summary: The paper introduces a simple Neural Activation Prior (NAP) for out-of-distribution detection in neural networks, which uses strongly activated neurons before global pooling to detect patterns in input samples and achieves state-of-the-art performance on various image datasets.
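A minimal sketch of an activation-based OOD score in this spirit: compare the strongest pre-pooling activation per channel to the channel mean, on the intuition that in-distribution inputs excite some neurons strongly; the exact scoring function is my assumption, not the paper's formulation:

```python
import torch

# Hedged sketch of a Neural-Activation-Prior-style OOD score computed from
# feature maps taken before global average pooling.

def nap_score(feature_map: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # feature_map: (batch, channels, H, W), before global average pooling
    flat = feature_map.flatten(2)                        # (B, C, H*W)
    ratio = flat.amax(dim=2) / (flat.mean(dim=2) + eps)  # per-channel max/mean
    return ratio.mean(dim=1)  # higher -> more in-distribution-like

feats = torch.relu(torch.randn(4, 64, 7, 7))
print(nap_score(feats))
```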
http://arxiv.org/abs/2402.18159v1
Compressor summary: The paper introduces a general framework for risk-sensitive distributional reinforcement learning and proposes two novel meta-algorithms that achieve statistically efficient regret bounds.
http://arxiv.org/abs/2402.18158v1
Compressor summary: This paper evaluates post-training quantization (PTQ) techniques for large language models (LLMs), studying their impact on different tasks and model families, and providing recommendations for application.
http://arxiv.org/abs/2402.18157v1
Compressor summary: Sum2Act is a novel tool invocation pipeline that mimics the human problem-solving process, improving LLMs' ability to use and create tools for complex real-world tasks.
http://arxiv.org/abs/2402.18154v1
Compressor summary: The paper explores knowledge conflicts in language models due to external context and proposes PH3, a method to prune conflicting attention heads without updating parameters, improving performance on open-domain QA tasks.
http://arxiv.org/abs/2402.18153v1
Compressor summary: The paper proposes an efficient and adaptive transfer learning scheme that uses a latent diffusion model with a variational autoencoder to learn the distribution of pretrained weights conditioned on each dataset, enabling adaptive weight sampling for faster convergence and competitive performance over blindly selected pretrained models.
http://arxiv.org/abs/2402.18150v1
Compressor summary: The paper proposes InFO-RAG, a low-cost method to train large language models as "Information Refiners" to improve retrieval-augmented generation by integrating knowledge from retrieved texts and model parameters.
http://arxiv.org/abs/2402.18149v1
Compressor summary: The paper proposes a new risk-sensitive reinforcement learning algorithm for partially observable environments with hindsight observations, and proves its low regret and efficiency.
http://arxiv.org/abs/2402.18146v1
Compressor summary: Our novel auto-labelling approach generates 3D scene flow pseudo labels for real-world LiDAR point clouds using rigid motion decomposition and data augmentation, achieving superior performance on multiple datasets without manual labelling.
http://arxiv.org/abs/2402.18145v1
Compressor summary: The paper proposes a new method to improve explanations for neural models in natural language processing, specifically for sentiment analysis, by refining word embeddings with an information bottleneck.
http://arxiv.org/abs/2402.18144v1
Compressor summary: The authors propose "random silicon sampling" to generate opinions aligned with human subgroups based on their demographic data, showing that language models can mimic public opinion polls but are influenced by societal biases.
http://arxiv.org/abs/2402.18139v1
Compressor summary: The paper proposes CARE CA, a framework that combines explicit and implicit causal reasoning using ConceptNet and large language models, enhancing causal understanding and interpretability.
http://arxiv.org/abs/2402.18134v1
Compressor summary: The paper presents a polarized image deblurring pipeline that uses a neural network to handle motion blur caused by camera shakes in polarization-based vision applications.
http://arxiv.org/abs/2402.18133v1
Compressor summary: The paper studies image recognition fairness, showing that it's a widespread issue due to problematic representations rather than biased classifiers, and suggests improving fairness can enhance performance.
http://arxiv.org/abs/2402.18129v1
Compressor summary: This paper analyzes how demographic parity (DP) can lead to biased classifiers if the training data is imbalanced, and proposes a distributionally robust optimization method to improve fairness.
http://arxiv.org/abs/2402.18128v1
Compressor summary: MLO-MAE is a new method for self-supervised learning that uses feedback from downstream tasks to improve masking of image patches, leading to better visual representations.
http://arxiv.org/abs/2402.18127v1
Compressor summary: The paper introduces a hierarchical multi-relational graph representation learning (HMGRL) approach to predict drug-drug interactions (DDI) by capturing both explicit and implicit correlations between drugs using heterogeneous graphs, relational graph convolutional networks (RGCN), and multi-view differentiable spectral clustering (MVDSC).
http://arxiv.org/abs/2402.18122v1
Compressor summary: G4G is a novel framework for high fidelity talking face generation with fine-grained intra-modal alignment, which achieves better synchronization of lip movements and audio than existing methods.
http://arxiv.org/abs/2402.18121v1
Compressor summary: The study evaluates four advanced language models in Aminoacian, a low-resourced language, to improve natural language processing and promote inclusivity.
http://arxiv.org/abs/2402.18120v1
Compressor summary: The study examines how LLMs encode human values in different languages and suggests optimal data composition for pre-training multilingual models.
http://arxiv.org/abs/2402.18117v1
Compressor summary: The text proposes a robust framework for Semi-Supervised Semantic Segmentation using Probabilistic Representations, Global Distribution Prototypes, and Virtual Negatives to improve unsupervised training.
http://arxiv.org/abs/2402.18115v1
Compressor summary: The paper introduces UniVS, a unified video segmentation framework that uses prompts as queries to handle different video segmentation tasks in a universal way.
http://arxiv.org/abs/2402.18113v1
Compressor summary: The paper explores how feedback from a large language model can improve the performance of small language models in creative and complex tasks like humor generation.
http://arxiv.org/abs/2402.18109v1
Compressor summary: The paper proposes a universal image matting framework called DCAM that uses semantic features and dual-context aggregation to robustly estimate alpha matte with or without guidance.
http://arxiv.org/abs/2402.18101v1
Compressor summary: The study tested a grammar error detection and correction model on Japanese students' writing samples, finding high accuracy but conservative behavior in flagging errors.
http://arxiv.org/abs/2402.18099v1
Compressor summary: This paper introduces MedLaSA, a method to modify large language models for accurate medical knowledge using scalable adapters based on causal tracing, and evaluates its effectiveness with new benchmarks and metrics.
http://arxiv.org/abs/2402.18096v1
Compressor summary: The paper examines the negative effects of evicting key-value pairs from the cache in large language models and proposes MiKV, a compression method that balances context preservation and generation quality using mixed precision.
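A minimal sketch of a mixed-precision KV cache: keep important entries in full precision and quantize, rather than evict, the rest; the importance scores and the 8-bit scheme here are illustrative, not MiKV's exact design:

```python
import numpy as np

# Hedged sketch: low-importance key-value entries are retained in int8
# instead of being dropped, preserving context at reduced memory cost.

def quantize_int8(x: np.ndarray):
    scale = np.abs(x).max() / 127.0 + 1e-12
    return (x / scale).round().astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

kv = np.random.randn(16, 64).astype(np.float32)  # 16 cached entries
importance = np.random.rand(16)                  # e.g., accumulated attention
keep_full = importance >= np.median(importance)

cache = {}
for i in range(len(kv)):
    if keep_full[i]:
        cache[i] = ("fp32", kv[i])
    else:
        cache[i] = ("int8", *quantize_int8(kv[i]))  # retained, not evicted
```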
http://arxiv.org/abs/2402.18092v1
Compressor summary: The paper proposes a method to generate realistic face videos of multiple people talking, considering context like audience and surroundings, using facial landmarks as control signals.
http://arxiv.org/abs/2402.18091v1
Compressor summary: Polos is a new automatic evaluation metric for image captioning models that uses contrastive learning to compute scores from multimodal inputs and is trained on human feedback from a large dataset.
http://arxiv.org/abs/2402.18086v1
Compressor summary: The paper proposes a two-branch continual learning framework that improves upon existing methods by combining a main branch model and a lightweight side branch network, leading to better performance on multiple image datasets.
http://arxiv.org/abs/2402.18084v1
Compressor summary: Spannotation is a fast and user-friendly tool for annotating images in autonomous navigation tasks that achieves high accuracy with a U-Net model trained on its segmentation masks.
http://arxiv.org/abs/2402.18078v1
Compressor summary: The paper proposes a new image generation method called Coarse-to-Fine Latent Diffusion (CFLD) that improves pose-guided person image synthesis by using semantic understanding and multi-scale attention.
http://arxiv.org/abs/2402.18068v1
Compressor summary: The authors address complex artifacts in image synthesis by fine-tuning a Vision-Language Model as an end-to-end artifact classifier, building the SynArtifact-1K dataset of artifact annotations, outperforming the baseline by 25.66%, and using the classifier's feedback to improve generative model quality.
http://arxiv.org/abs/2402.18066v1
Compressor summary: The paper proposes several minimal solvers using six point correspondences to compute relative pose of multi-camera systems with improved accuracy and efficiency.
http://arxiv.org/abs/2402.18061v1
Compressor summary: The paper proposes Clean-LaVe, a framework that uses silver standard data to improve zero-shot information extraction performance by finetuning off-the-shelf models and achieves significant improvements on various datasets.
http://arxiv.org/abs/2402.18060v1
Compressor summary: The paper introduces two new challenging datasets for large language models to answer complex medical questions with explanations, showing that existing models struggle with consistency in explaining their answers.
http://arxiv.org/abs/2402.18059v1
Compressor summary: The paper proposes a novel multi-objective optimization approach for watermarking large language models to distinguish AI-generated texts from human-written ones while maintaining their semantic coherence.
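For background, a minimal sketch of a generic logit-bias ("green list") watermark; note that the paper's contribution is a multi-objective variant that also preserves semantic coherence, which this sketch does not implement:

```python
import numpy as np

# Hedged sketch of token-level watermarking: pseudo-randomly partition the
# vocabulary based on context and bias generation toward the "green" half.

def watermark_logits(logits: np.ndarray, prev_token: int, delta: float = 2.0):
    rng = np.random.default_rng(prev_token)    # seed the partition on context
    green = rng.random(logits.shape[0]) < 0.5  # pseudo-random green list
    return logits + delta * green, green       # boost green tokens

logits = np.random.randn(10)
biased, green = watermark_logits(logits, prev_token=42)
# A detector re-derives `green` from the same context and tests whether
# generated tokens fall in the green list more often than chance.
```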
http://arxiv.org/abs/2402.18054v1
Compressor summary: The paper proposes a method to generate citations and context windows together, improving citation relevance and readability for humans.
http://arxiv.org/abs/2402.18050v1
Compressor summary: MEGAnno+ is a system that enables humans and large language models to work together for accurate and efficient data labeling in NLP tasks.
http://arxiv.org/abs/2402.18048v1
Compressor summary: The paper proposes using local intrinsic dimension (LID) of model activations to measure the truthfulness of texts generated by large language models (LLMs).
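A minimal sketch of the classic maximum-likelihood LID estimator (Levina-Bickel style) applied to activation vectors; only the estimator itself is standard, while the layer choice and usage are my assumptions:

```python
import numpy as np

# Hedged sketch: estimate the local intrinsic dimension around one
# activation vector from the log-ratios of its nearest-neighbor distances.

def lid_mle(point: np.ndarray, neighbors: np.ndarray, k: int = 10) -> float:
    dists = np.sort(np.linalg.norm(neighbors - point, axis=1))[:k]
    # MLE: (k-1) / sum_{i<k} log(d_k / d_i)
    return float((k - 1) / np.sum(np.log(dists[-1] / dists[:-1])))

activations = np.random.randn(500, 32)  # stand-in for model activations
x = activations[0]
print(lid_mle(x, activations[1:], k=20))
```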
http://arxiv.org/abs/2402.18046v1
Compressor summary: The paper proposes a new way to create more data for NLP models using EHRs by rearranging records within visits, which improves detection of clopidogrel treatment failure and works well with limited data.
http://arxiv.org/abs/2402.18045v1
Compressor summary: The paper evaluates the factual accuracy of multilingual large language models, finding that they are more accurate in English and biased towards facts about Western continents.
http://arxiv.org/abs/2402.18044v1
Compressor summary: SFTformer is a model that decouples spatial and temporal features to effectively predict future weather radar echoes for precipitation nowcasting.
http://arxiv.org/abs/2402.18043v1
Compressor summary: The paper examines how the UK public debates the energy crisis, cost of living, and their interrelated issues using natural language processing and data visualisation techniques.
http://arxiv.org/abs/2402.18041v1
Compressor summary: The paper surveys Large Language Model datasets, examining their roles, challenges, and trends across five perspectives and providing a comprehensive dataset resource list with statistics.
http://arxiv.org/abs/2402.18040v1
Compressor summary: The study explores deep learning's potential for rediscovering mathematical concepts like integrals and demonstrates AI's ability to infer basic integrals using sequence-to-sequence models or by uncovering fundamental principles.
http://arxiv.org/abs/2402.18039v1
Compressor summary: ResLoRA improves upon low-rank adaptation (LoRA) for fine-tuning large language models by adding residual paths and merging them during inference, leading to better results with fewer training steps and no extra cost.
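A minimal sketch of a LoRA linear layer with an extra residual path feeding the low-rank update, in the spirit of ResLoRA; the wiring and merge details are my assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

# Hedged sketch: the frozen base projection plus a low-rank update computed
# from both the layer input and an earlier block's activation.

class ResLoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)  # stands in for pretrained weights
        for p in self.base.parameters():
            p.requires_grad_(False)          # base stays frozen
        self.A = nn.Linear(d_in, rank, bias=False)   # low-rank down-projection
        self.B = nn.Linear(rank, d_out, bias=False)  # low-rank up-projection
        nn.init.zeros_(self.B.weight)        # update starts as a no-op

    def forward(self, x, residual_input=None):
        out = self.base(x) + self.B(self.A(x))
        if residual_input is not None:       # extra residual path into LoRA
            out = out + self.B(self.A(residual_input))
        return out

layer = ResLoRALinear(16, 16)
out = layer(torch.randn(2, 16), residual_input=torch.randn(2, 16))
```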
http://arxiv.org/abs/2402.18032v1
Compressor summary: The paper surveys recent advances, strengths, limitations, and approaches in computer vision tasks related to estimating human shape and clothing for various applications.
http://arxiv.org/abs/2402.18028v1
Compressor summary: OpenMEDLab is an open-source platform that accelerates the development of domain-specific foundation models for multi-modal medical applications using pre-trained models and transfer learning techniques.
http://arxiv.org/abs/2402.18027v1
Compressor summary: This paper proposes a new black-box model inversion attack method that uses a pre-trained GAN as prior information and gradient-free optimizer, achieving high-quality and high-resolution image generation across diverse data distributions.
http://arxiv.org/abs/2402.18025v1
Compressor summary: LINGOLLM is a training-free approach that uses linguistic knowledge to enable large language models to process and translate endangered languages with few resources.
http://arxiv.org/abs/2402.18023v1
Compressor summary: The paper proposes a method to evaluate how well large language models simulate human cognition by measuring their alignment with fMRI signals of the brain and examines various factors affecting this alignment.
http://arxiv.org/abs/2402.18013v1
Compressor summary: The paper reviews existing large language models and their applications in open-domain and task-oriented multi-turn dialogue systems, highlighting challenges and future directions.
http://arxiv.org/abs/2402.18012v1
Compressor summary: The paper proposes a two-stage diffusion model framework to optimize problems without explicit objective functions or constraints, using sampling from the product of Boltzmann distributions defined by the objective and data distributions.
http://arxiv.org/abs/2402.18011v1
Compressor summary: The study presents a lightweight neural network that learns to represent 3D point and line features for visual localization and mapping, achieving leading pose accuracy and outperforming state-of-the-art methods in indoor and outdoor scenarios.
http://arxiv.org/abs/2402.18008v1
Compressor summary: The paper introduces two fast and interpretable methods, SKS and ACA, for decomposing 2D homographies using minimal points and polynomial parameterization, with ACA being efficient enough to be a plug-in module in feature-based or deep homography pipelines.
http://arxiv.org/abs/2402.18007v1
Compressor summary: MLP-Mixer is a popular neural network architecture for computer vision that fuses channel and token information, and a new model called Audio Spectrogram Mixer with Roll-Time and Hermit FFT (ASM-RH) applies this concept to audio data for improved classification performance.
http://arxiv.org/abs/2402.18005v1
Compressor summary: The authors propose a three-layer framework for summarizing scientific sentiments in meta-review generation and test its effectiveness with LLMs.