This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-24, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.16698v1
Compressor summary: The text proposes a new method to generate challenging images for single-image depth estimation by using text-to-image diffusion models with depth-aware control and self-distillation.
http://arxiv.org/abs/2407.16695v1
Compressor summary: Lifelong ICL introduces a problem setting to evaluate how well long-context language models can learn from multiple tasks in a sequence, while Task Haystack is an evaluation suite that tests their ability to understand and use context effectively.
http://arxiv.org/abs/2407.16693v1
Compressor summary: Contrary to what was previously suggested, explanation regularisation may not improve out-of-domain performance by making models rely more on plausible tokens, and its impact on model attributions needs further study.
http://arxiv.org/abs/2407.16686v1
Compressor summary: AutoJailbreak is a new technique that uses large language models and weak-to-strong prompts to automatically jailbreak GPT-4V, raising privacy concerns.
http://arxiv.org/abs/2407.16674v1
Compressor summary: The paper compares KAN and MLP models across various tasks, finding that MLP generally outperforms KAN except in symbolic formula representation, where B-spline activation improves MLP's performance.
http://arxiv.org/abs/2407.16670v1
Compressor summary: The authors propose a new method for detecting fake news in short videos by analyzing the creative process behind their production and using a model that captures material selection and editing preferences.
http://arxiv.org/abs/2407.16665v1
Compressor summary: Event cameras can accurately track rapid eye movements called saccades, enabling better understanding of neurological conditions and other applications.
http://arxiv.org/abs/2407.16664v1
Compressor summary: Pretraining for transfer learning enhances low-resource ASR models' performance and robustness across languages and domains, especially for rare words.
http://arxiv.org/abs/2407.16663v1
Compressor summary: This paper shows that any natural hypothesis class learnable by a computer must meet computability requirements under mild assumptions.
http://arxiv.org/abs/2407.16658v1
Compressor summary: The paper introduces EgoCVR, a new evaluation benchmark for Composed Video Retrieval using egocentric video datasets, and proposes a re-ranking framework to improve performance on this challenging task.
http://arxiv.org/abs/2407.16655v1
Compressor summary: MovieDreamer is a novel framework for generating long-duration videos with complex plots, high visual fidelity, and consistent character identities by combining autoregressive models and diffusion rendering.
http://arxiv.org/abs/2407.16641v1
Compressor summary: Hyperbolic embeddings suit tree-like graphs but are hard to learn for hierarchical data; the paper identifies three challenges and presents a geometry-aware algorithm with dilation and transitive closure regularization that improves performance on synthetic and real datasets and comes with theoretical support.
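For background on the geometry involved (standard machinery, not the paper's algorithm): distances in the Poincaré ball grow rapidly toward the boundary, which is what lets trees embed with low distortion. A minimal sketch of the distance function:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball (requires ||u||, ||v|| < 1),
    the standard model used for hyperbolic embeddings of hierarchies."""
    sq_diff = np.dot(u - v, u - v)
    denom = max((1 - np.dot(u, u)) * (1 - np.dot(v, v)), eps)
    return np.arccosh(1 + 2 * sq_diff / denom)
```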
http://arxiv.org/abs/2407.16637v1
Compressor summary: The paper studies how to improve large language models' ability to avoid generating harmful content by teaching them to correct their course quickly using a synthetic dataset and preference learning.
http://arxiv.org/abs/2407.16624v1
Compressor summary: This paper explores how large language models can help analyze three types of word meaning changes and improve computer applications like translation and chatbots.
http://arxiv.org/abs/2407.16615v1
Compressor summary: The study compares GPT-4 and Llama 3 models for legal text classification, finding that lightly fine-tuned Llama 3 outperforms GPT-4 and offers a viable alternative to commercial models.
http://arxiv.org/abs/2407.16611v1
Compressor summary: The text discusses continual learning, a problem where models update with new information while preserving past knowledge, and compares two approximation strategies for this problem.
http://arxiv.org/abs/2407.16607v1
Compressor summary: This paper introduces a method to infer the distribution of training data used by language models based on their byte-pair encoding tokenizers, revealing information about their multilingual and domain diversity.
http://arxiv.org/abs/2407.16604v1
Compressor summary: The paper introduces imaginary question answering to study model similarity among large language models, revealing a shared imagination space between them.
http://arxiv.org/abs/2407.16602v1
Compressor summary: The paper improves the Policy Mirror Descent family of algorithms in Reinforcement Learning by adding momentum and duality, making it independent of policy parametrization and suitable for large-scale optimization.
http://arxiv.org/abs/2407.16600v1
Compressor summary: The paper proposes a new method for synthesizing realistic novel views in driving scenes by decoupling and hybridizing road and non-road layers, using an implicit road representation with SDF, and adding auxiliary losses to improve quality.
http://arxiv.org/abs/2407.16593v1
Compressor summary: The text discusses how linguistic characteristics can help classify genuine patient voices from social media to bridge the gap between healthcare professionals' perceptions and patients' reality, improving healthcare standards.
http://arxiv.org/abs/2407.16575v1
Compressor summary: The paper studies how communication delay affects real-time 3D scene representations and proposes a method that uses Age of Information (AoI) to improve fidelity in such scenarios.
http://arxiv.org/abs/2407.16574v1
Compressor summary: TLCR uses a discriminator to assign continuous rewards to tokens based on human feedback, improving language model quality in RLHF.
http://arxiv.org/abs/2407.16565v1
Compressor summary: The authors propose pRAGe, a pipeline that combines retrieval and small language models to generate medically accurate paraphrases in French.
http://arxiv.org/abs/2407.16556v1
Compressor summary: This work analyzes the frequency behavior of ReLU activation functions in Convolutional Neural Networks, showing that it introduces higher oscillations and a constant DC component that helps feature extraction and convergence.
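A quick sanity check of the DC claim (a standard Fourier identity, not a derivation from the paper): rectifying a pure tone $f(t)=\sin t$ gives $\mathrm{ReLU}(f)=\tfrac{1}{2}\left(\sin t+|\sin t|\right)$, and since

$$|\sin t| = \frac{2}{\pi} - \frac{4}{\pi}\sum_{k\ge 1}\frac{\cos 2kt}{4k^2-1},$$

the rectified signal carries a constant (DC) component of $1/\pi$ plus higher even harmonics, matching the oscillation and DC behavior the summary describes.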
http://arxiv.org/abs/2407.16541v1
Compressor summary: This paper proposes a new pretraining framework based on masked image modeling that improves quality and aesthetics assessment of visual content.
http://arxiv.org/abs/2407.16539v1
Compressor summary: This paper proposes data augmentation techniques to improve internet traffic classification for encrypted data, addressing challenges such as limited data availability and varying transmission units.
http://arxiv.org/abs/2407.16537v1
Compressor summary: The text proposes a method to measure the impact of textual context and acoustics on speech recognition, revealing the strengths and weaknesses of different models and explaining poor performance on African-American English.
http://arxiv.org/abs/2407.16533v1
Compressor summary: HAPFI is a method that uses past information from different sources to improve an agent's ability to plan and execute long sequences of tasks.
http://arxiv.org/abs/2407.16526v1
Compressor summary: The text proposes a method to update vision encoders in VLMs locally and selectively, improving performance on data with previous errors and maintaining robustness during continual few-shot updates.
http://arxiv.org/abs/2407.16521v1
Compressor summary: The paper studies how large language models perform in a text-based version of the game Among Us, which involves identifying saboteurs on a spaceship, to understand their social reasoning and decision-making skills.
http://arxiv.org/abs/2407.16516v1
Compressor summary: The paper compares fine-tuned pre-trained encoder models and in-context learning for detecting topic-related content in webpages with few annotated data points and different features.
http://arxiv.org/abs/2407.16515v1
Compressor summary: Ebc-exstream is a novel model drift detector that uses explanations and human feedback to identify and correct spurious correlations, reducing annotation costs and improving performance.
http://arxiv.org/abs/2407.16514v1
Compressor summary: The paper proposes new techniques for efficient 3D convolutions using 2D/1D operations on 4D/3D tensors, improving efficiency and accuracy for real-time applications like robots.
http://arxiv.org/abs/2407.16511v1
Compressor summary: DreamVTON is a novel 3D virtual try-on model that optimizes person and clothes geometry and texture using a personalized diffusion model with multi-concept LoRA and template-based optimization.
http://arxiv.org/abs/2407.16508v1
Compressor summary: The ToDER pipeline uses a bi-directional adaptation architecture and a TNet module to accurately predict depth maps for reliable colonoscopy video reconstruction and diagnosis.
http://arxiv.org/abs/2407.16503v1
Compressor summary: HDRSplat is a fast method for 14-bit high dynamic range 3D scene reconstruction using 3D Gaussian Splatting, which works well in dark and bright scenes with low texture and high depth of field.
http://arxiv.org/abs/2407.16497v1
Compressor summary: Source-free object detection (SFOD) improves stability and performance by using the proposed Dynamic Retraining-Updating mechanism and Historical Student Loss, achieving results comparable to or better than advanced unsupervised domain adaptation methods.
http://arxiv.org/abs/2407.16485v1
Compressor summary: The paper proposes a positive-unlabeled learning approach to infer nonlinear constraints from expert demonstrations for real-world tasks, using an iterative framework with a memory buffer.
http://arxiv.org/abs/2407.16482v1
Compressor summary: Shapley Values are a way to explain AI models, but calculating them accurately is hard; BONES is a new tool that simplifies their estimation and evaluation for both tabular and image data.
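To make the "hard to calculate" point concrete: exact Shapley values sum over all $2^n$ feature subsets, so tools like BONES rely on estimators. A minimal sketch of the classic permutation-sampling estimator (my illustration, not BONES itself; `model` and `baseline` are assumed inputs):

```python
import random

def shapley_monte_carlo(model, x, baseline, n_samples=200):
    """Estimate Shapley values by permutation sampling: average each
    feature's marginal contribution over random feature orderings.
    `model` is a scalar-valued predictor; `baseline` supplies the
    values used for "absent" features."""
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = random.sample(range(n), n)
        z = list(baseline)                  # start with all features absent
        prev = model(z)
        for i in order:
            z[i] = x[i]                     # reveal feature i
            curr = model(z)
            phi[i] += curr - prev           # marginal contribution of i
            prev = curr
    return [p / n_samples for p in phi]
```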
http://arxiv.org/abs/2407.16477v1
Compressor summary: The qMRI Diffusor uses a deep generative model (DDPM) to estimate T1 parameters in the brain more accurately and precisely than other methods, while also allowing for uncertainty quantification.
http://arxiv.org/abs/2407.16470v1
Compressor summary: The paper evaluates how well large language models can detect hallucinations in machine translations across various languages and finds that they perform better for high-resource languages than low-resource ones.
http://arxiv.org/abs/2407.16468v1
Compressor summary: QRF-GNN is a novel algorithm that leverages Graph Neural Networks and static node features to solve combinatorial optimization problems with QUBO formulation, achieving high performance and scalability.
http://arxiv.org/abs/2407.16466v1
Compressor summary: The paper proposes a method to improve neural network training for computational mechanics using sensitivity information and residual weighting, leading to better convergence and error reduction in linear and nonlinear material models.
http://arxiv.org/abs/2407.16445v1
Compressor summary: The paper proposes a benchmark to evaluate and rank time series forecasting methods across various datasets and compares two prominent frameworks, AutoGluon-Timeseries and sktime, to inform method selection for optimal predictions.
http://arxiv.org/abs/2407.16444v1
Compressor summary: The paper introduces Psychomatics, a framework to compare and understand the differences between human and artificial language processing and cognition, aiming to improve AI systems' human-likeness.
http://arxiv.org/abs/2407.16434v1
Compressor summary: The paper introduces context structurization to improve large language models' cognition and performance on complex NLP tasks by organizing sentences into well-ordered and hierarchical structures.
http://arxiv.org/abs/2407.16431v1
Compressor summary: FairFlow is a method to automatically create parallel data for training language models that reduce harmful biases and stereotypes by balancing demographic attributes without relying on expensive, manual word substitutions.
http://arxiv.org/abs/2407.16430v1
Compressor summary: The paper introduces the ImOOD framework to address challenges in detecting out-of-distribution samples on imbalanced data and proposes a regularization technique that improves the performance of OOD detectors.
http://arxiv.org/abs/2407.16424v1
Compressor summary: The paper presents ESOD, a method that improves small object detection on high-resolution images with less computation and memory by reusing the detector's backbone for feature-level object seeking and patch slicing and adding a sparse detection head; the method is generic across CNN and ViT detectors and outperforms state-of-the-art detectors on several datasets.
http://arxiv.org/abs/2407.16406v1
Compressor summary: The paper introduces Emotion Forecasting, a new Deep Learning problem that predicts how people's emotions will change based on their interactions with others, and presents a new dataset (Hi-EF) to train and evaluate models for this task.
http://arxiv.org/abs/2407.16396v1
Compressor summary: The text introduces a novel data-driven differentiable renderer that uses neural networks to infer unbiased and scalable unsigned distance functions from RGB images.
http://arxiv.org/abs/2407.16394v1
Compressor summary: The text proposes a new sign language representation framework called SEDS, which uses Pose and RGB modalities to capture local and global information of sign videos and fuses them with Cross Gloss Attention Fusion for better performance.
http://arxiv.org/abs/2407.16388v1
Compressor summary: The text introduces Causal Discovery Algorithms (CDA) as a data-driven method for Root Cause Analysis (RCA) in modern production processes, and compares their suitability and runtime using data from an automotive assembly case study.
http://arxiv.org/abs/2407.16384v1
Compressor summary: The study presents a multitask deep learning model for hyperspectral imaging in remote sensing that combines a shared encoder, task-specific decoders, dense atrous pyramid pooling, and an attention network to perform multiple classification and regression tasks on 13 forest variables, outperforming state-of-the-art methods and staying robust across seeds and trials.
http://arxiv.org/abs/2407.16382v1
Compressor summary: The study introduces two new BERT models for Persian natural language understanding tasks and shows their superior performance compared to existing models.
http://arxiv.org/abs/2407.16370v1
Compressor summary: The paper proposes using alternative prompts for generative error correction in speech recognition and optimizing them with an evolutionary algorithm to improve performance.
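A minimal sketch of the kind of evolutionary loop described (my illustration under stated assumptions: `mutate` and `score` are assumed callables, e.g. an LLM rewriting a prompt and the error-rate improvement that prompt yields on a dev set):

```python
import random

def evolve_prompts(seed_prompts, mutate, score,
                   generations=10, pop_size=20, n_keep=5):
    """Elitist evolutionary search over prompt strings: keep the best
    n_keep prompts each generation and refill the population with
    mutated copies of them."""
    population = list(seed_prompts)
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[:n_keep]
        children = [mutate(random.choice(parents))
                    for _ in range(pop_size - n_keep)]
        population = parents + children
    return max(population, key=score)
```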
http://arxiv.org/abs/2407.16369v1
Compressor summary: FCNR is a fast compressive neural representation for large numbers of images that uses stereo context modules and joint context transfer modules to achieve high compression ratios and reconstruction quality, outperforming other neural compression methods.
http://arxiv.org/abs/2407.16364v1
Compressor summary: TextHarmony is a model that generates visual text by combining modality-specific and modality-agnostic experts and uses DetailedTextCaps-100K, a large image caption dataset, to improve performance.
http://arxiv.org/abs/2407.16361v1
Compressor summary: The paper proposes a method to adjust robots' ethical behavior according to their environment using virtue ethics and character tuning, and demonstrates it in an elder-care simulation.
http://arxiv.org/abs/2407.16355v1
Compressor summary: The paper studies how using best-action queries can help decision makers in online learning minimize their loss and achieve better performance with limited feedback.
http://arxiv.org/abs/2407.16347v1
Compressor summary: FACTTRACK is a novel method for tracking atomic facts and addressing factual contradictions in language models with time-aware validity intervals.
http://arxiv.org/abs/2407.16337v1
Compressor summary: The paper proposes a novel method, STATE, that uses the Student's t-distribution to estimate treatment effects in online controlled experiments with heavy-tailed metrics, achieving significant variance reduction and better data-driven decisions.
http://arxiv.org/abs/2407.16326v1
Compressor summary: The paper proposes a framework to compare reasoning abilities of KGE methods and introduces STransCoRe, an improved version of STransE.
http://arxiv.org/abs/2407.16318v1
Compressor summary: PrimeGuard is a novel method that uses structured control flow to route queries to different LM instantiations with varying instructions, improving safety and helpfulness without fine-tuning or compromising on either.
http://arxiv.org/abs/2407.16309v1
Compressor summary: The paper introduces a new visual quality metric for multidimensional projections based on human perception, which improves the evaluation of the Local Affine Multidimensional Projection method.
http://arxiv.org/abs/2407.16308v1
Compressor summary: The paper proposes a novel network (SAFNet) for HDR imaging that improves efficiency by selectively refining valuable areas and using a lightweight refine module, while achieving better results than previous methods.
http://arxiv.org/abs/2407.16302v1
Compressor summary: The paper proposes a two-level sequential planning approach for automated image distortion classification and rectification: the higher level detects the class of corruption and the lower level selects a specific rectification algorithm. It runs in a single forward pass during inference (and can be queried iteratively), improves object detection on a COCO dataset with a rich set of distortions, and supports dynamic reconfiguration and generalisation to unseen algorithms.
http://arxiv.org/abs/2407.16293v1
Compressor summary: The paper introduces a new method to project data onto the $\ell_{1,\infty}$ norm ball, reducing runtime by a factor of 2.5 and improving sparsity and accuracy in classification tasks.
http://arxiv.org/abs/2407.16291v1
Compressor summary: TAPTRv2 improves tracking any point task by introducing attention-based position update and removing cost-volume computation.
http://arxiv.org/abs/2407.16286v1
Compressor summary: This paper explores different metrics to prune large language models and shows that adaptive metrics can trade off performance across tasks, while self-attention layers are more amenable to pruning with recovery techniques.
http://arxiv.org/abs/2407.16280v1
Compressor summary: DECOR is an algorithm that quickly detects symmetries in factor graphs, enabling probabilistic inference that scales efficiently with domain sizes.
http://arxiv.org/abs/2407.16269v1
Compressor summary: The paper introduces HyTAS, a benchmark for searching optimal transformer architectures for hyperspectral imaging classification tasks, and evaluates 12 methods on 5 datasets.
http://arxiv.org/abs/2407.16268v1
Compressor summary: The paper proposes a new CNN architecture with Kolmogorov-Arnold Network and Fuzzy Pooling for interpretable and accurate image classification tasks, and shows its effectiveness in comparison to traditional models.
http://arxiv.org/abs/2407.16266v1
Compressor summary: This study introduces a new machine translation evaluation method that considers non-binary gender and uses Emotional Attitude Score to measure ambiguous attitude words, revealing significant bias in current models.
http://arxiv.org/abs/2407.16264v1
Compressor summary: The text proposes a two-step approach to improve medical contrastive learning by standardizing text reports, converting them into binary questions, and enhancing visual pre-training with a Meijering-based masking technique.
http://arxiv.org/abs/2407.16260v1
Compressor summary: DreamDissector is a text-to-3D method that generates multiple independent objects with plausible interactions by disentangling a NeRF and using category score distillation sampling.
http://arxiv.org/abs/2407.16255v1
Compressor summary: The text proposes a new learning framework that can generate non-Abelian gauge fields for studying condensed matter physics by using self-reasoning and continuous transformation of data.
http://arxiv.org/abs/2407.16252v1
Compressor summary: LawLuo is a novel legal dialogue framework using multiple LLM agents that collaborate to provide comprehensive legal consultations, overcoming limitations of existing Chinese legal LLMs.
http://arxiv.org/abs/2407.16248v1
Compressor summary: The paper proposes SGMN, a model that uses text guidance, spatiotemporal graphing, and multi-modal hard example mining to accurately identify products in livestreaming sales videos.
http://arxiv.org/abs/2407.16245v1
Compressor summary: This paper explores how to select intermediate tasks for transfer learning and compares four methods, finding that pairwise token similarity is the best predictor of transfer performance.
http://arxiv.org/abs/2407.16244v1
Compressor summary: HSVLT is a novel Transformer-based method for multi-label image classification that uses hierarchical multi-scale architecture and interactive visual-linguistic attention to improve performance and efficiency.
http://arxiv.org/abs/2407.16243v1
Compressor summary: Chameleon is a robust textual-visual multimodal learning method that works well even when some modalities are missing, unlike conventional multi-branch designs.
http://arxiv.org/abs/2407.16239v1
Compressor summary: The authors propose nonlinear ICA-based bandit algorithms that can learn latent variables from observational data and infer the optimal action for each patient, improving personalized decision-making in health applications.
http://arxiv.org/abs/2407.16234v1
Compressor summary: The paper presents MMCL-GCN, a facial age estimation method that combines a graph convolutional network with multi-view mask contrastive learning: in a first stage, an asymmetric siamese network learns latent structural and semantic representations and reconstructs missing information, and in a second stage extreme learning machines produce the age estimate, outperforming existing methods on benchmark datasets.
http://arxiv.org/abs/2407.16232v1
Compressor summary: The paper introduces a new attention method (CPAT) for super-resolution that expands windows along feature maps and a spatial-frequency interaction module (SFIM) that integrates information from both domains, achieving state-of-the-art results.
http://arxiv.org/abs/2407.16224v1
Compressor summary: OutfitAnyone is a two-stream conditional diffusion model that generates lifelike virtual clothing images by handling garment deformation and adapting to various factors like pose, body shape, and image types.
http://arxiv.org/abs/2407.16222v1
Compressor summary: PreAlign improves multilingual alignment in large language models during pretraining, leading to better cross-lingual performance.
http://arxiv.org/abs/2407.16221v1
Compressor summary: This paper explores Abstention Ability (AA), a crucial aspect of large language models' reliability, and proposes evaluation methods to improve their ability to refrain from answering when uncertain or when questions are unanswerable.
http://arxiv.org/abs/2407.16220v1
Compressor summary: The paper proposes Online Dynamic Goal Recognition (ODGR), a novel reinforcement learning approach that recognizes an agent's goals in real time, overcoming limitations of traditional goal recognition methods.
http://arxiv.org/abs/2407.16216v1
Compressor summary: The text discusses the progress and challenges of large language models in generating accurate and human-like responses, and reviews various methods to improve their performance.
http://arxiv.org/abs/2407.16214v1
Compressor summary: Diff-Shadow is a global-guided diffusion model that combines local and global information for high-quality shadow removal in images.
http://arxiv.org/abs/2407.16207v1
Compressor summary: The paper proposes Graph-structured Speculative Decoding (GSD), which generates multiple hypotheses and uses a directed acyclic graph to efficiently merge recurring token sequences, achieving significant speedup for inference of Large Language Models.
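For context, plain speculative decoding (the linear baseline that GSD generalizes to a graph of hypotheses) works roughly like the sketch below; `draft_lm.propose` and `target_lm.greedy_next` are hypothetical interfaces, and real implementations verify all draft positions in one batched target pass:

```python
def speculative_decode(target_lm, draft_lm, prompt_tokens, k=4, max_new=64):
    """Greedy speculative decoding: a cheap draft model proposes k tokens,
    the large target model checks them and keeps the longest agreeing
    prefix, then contributes one token of its own."""
    tokens = list(prompt_tokens)
    produced = 0
    while produced < max_new:
        draft = draft_lm.propose(tokens, k)            # k draft tokens
        verify = target_lm.greedy_next(tokens, draft)  # k+1 target choices
        n_ok = 0
        for d, t in zip(draft, verify):
            if d != t:
                break
            n_ok += 1
        tokens.extend(draft[:n_ok])   # accepted draft prefix
        tokens.append(verify[n_ok])   # target's correction / next token
        produced += n_ok + 1
    return tokens
```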
http://arxiv.org/abs/2407.16204v1
Compressor summary: The paper introduces visual-text inpainting, which restores damaged scene text images and completes their corresponding texts by leveraging complementary information from both modalities with a cross-modal predictive interaction model.
http://arxiv.org/abs/2407.16200v1
Compressor summary: The text describes how Monte Carlo Tree Search (MCTS) can be used to optimize haul-truck dispatch in mining by incorporating operational constraints as opportunity costs in the optimization problem.
http://arxiv.org/abs/2407.16198v1
Compressor summary: INF-LLaVA is a novel multimodal language model with innovative modules to process high-resolution images by capturing both local and global information.
http://arxiv.org/abs/2407.16193v1
Compressor summary: CloudFixer is a test-time input adaptation method for 3D point clouds that uses a pre-trained diffusion model and optimizes geometric transformation parameters to handle noisy points and improve recognition performance.
http://arxiv.org/abs/2407.16190v1
Compressor summary: The paper proposes a model of artificial agents based on their history, repertoire, and environment, argues that LLMs are not agents yet, but could become so with additional modules, and discusses challenges and future research directions.
http://arxiv.org/abs/2407.16189v1
Compressor summary: EIANet uses a novel attention mechanism with an ETF classifier to separate and focus on discriminative features for fine-grained visual categorization in source-free domain adaptation (SFDA).
http://arxiv.org/abs/2407.16181v1
Compressor summary: The paper introduces a new technique to improve neural grammar induction by focusing on relevant parse trees per sentence, reducing errors, variance, and simplicity bias.
http://arxiv.org/abs/2407.16177v1
Compressor summary: The paper proposes a new method to analyze datasets using logifolds, which can improve ensemble machine learning and identify fuzzy domains.
http://arxiv.org/abs/2407.16174v1
Compressor summary: Pixel embedding replaces float-valued input pixels with low-bit vectors, reducing quantization errors and increasing efficiency for deep neural networks.
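One plausible reading of the idea (my sketch under that assumption; the paper's exact scheme may differ): quantize each pixel to a few levels and look up its vector in a small learnable table, analogous to word embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
LEVELS, DIM = 16, 8   # 4-bit quantization and embedding width (illustrative)
table = rng.normal(size=(LEVELS, DIM)).astype(np.float32)  # learned in practice

def embed_pixels(img):
    """Map float pixels in [0, 1] to low-bit embedding vectors."""
    idx = np.clip((img * LEVELS).astype(int), 0, LEVELS - 1)
    return table[idx]  # output shape: img.shape + (DIM,)
```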
http://arxiv.org/abs/2407.16173v1
Compressor summary: The authors propose a hybrid method for 3D indoor scene reconstruction using meshes and 3D Gaussian Splatting, with Segment Anything Model to guide the selection, and an additional densification stage to improve image quality.
http://arxiv.org/abs/2407.16171v1
Compressor summary: The paper proposes a framework that improves audio-visual question answering (AVQA) robustness by using relation-aware models to handle missing modalities and enhance features across audio and visual inputs.
http://arxiv.org/abs/2407.16168v1
Compressor summary: The paper proposes a new method called PMF for aligning entities across different knowledge graphs by focusing on relevant features and enhancing multi-modal fusion with a cross-modal association loss.
http://arxiv.org/abs/2407.16166v1
Compressor summary: The study used NLP and large language models to generate synthetic patient notes with balanced privacy and utility, finding re-identified data more effective than de-identified data.
http://arxiv.org/abs/2407.16164v1
Compressor summary: The paper proposes a model-level solution, Saturn Ring Classifier Module (SRCM), to reduce privacy vulnerability in machine learning models by creating a confined representation space.
http://arxiv.org/abs/2407.16161v1
Compressor summary: The TransFeat-TPP model uses a Transformer network to better incorporate contextual data into event models, improving interpretability and prediction accuracy.
http://arxiv.org/abs/2407.16160v1
Compressor summary: UniMEL is a framework that uses large language models to link ambiguous mentions in multimodal contexts to entities in a knowledge base, improving performance and scalability.
http://arxiv.org/abs/2407.16154v1
Compressor summary: The paper introduces DDK, a framework that adjusts the distillation dataset composition to improve smaller LLMs' performance by transferring knowledge from larger LLMs in a stable and effective way.
http://arxiv.org/abs/2407.16153v1
Compressor summary: The paper studies how rank and number of heads in attention mechanisms affect their performance on different target functions and context lengths, and provides theoretical and empirical evidence for the trade-offs involved.
http://arxiv.org/abs/2407.16150v1
Compressor summary: The text uses deep learning networks to predict stock prices based on news articles and shows that combining different news categories improves accuracy.
http://arxiv.org/abs/2407.16148v1
Compressor summary: The authors explore the use of LLMs for generating hierarchical organizations of scientific studies to help with literature reviews, create a dataset (CHIME) for this task, and train a corrector model to improve study assignments based on human feedback.
http://arxiv.org/abs/2407.16145v1
Compressor summary: The paper proposes a few-shot learning method that uses multiple-choice questions to turn VQA models into stronger image classifiers, overcoming the data-distribution and category-name mismatches that hurt zero-shot VQA; it outperforms visual encoders and zero-shot VQA baselines on common few-shot tasks and works well on diverse visual attributes such as clothing features.
http://arxiv.org/abs/2407.16142v1
Compressor summary: The paper proposes Trajectory Diffuser, a method that speeds up diffusion models for reinforcement learning tasks by separating the generation and optimization of feasible trajectories.
http://arxiv.org/abs/2407.16134v1
Compressor summary: The paper explores how diffusion transformers can capture and leverage spatial-temporal dependencies in sequential data generation, using Gaussian process models as a case study.
http://arxiv.org/abs/2407.16133v1
Compressor summary: The text proposes new loss functions for biometric recognition that improve performance in open-set scenarios, where probe subjects may or may not be in the gallery, and also enhance closed-set performance.
http://arxiv.org/abs/2407.16129v1
Compressor summary: The paper proposes LMA, a novel multimodal object detector with shared backbone and adaptive rank allocation, achieving significant accuracy improvement and parameter reduction over existing methods.
http://arxiv.org/abs/2407.16128v1
Compressor summary: The PSPD framework uses adaptive curriculum learning to improve brain imaging analysis by adjusting training examples based on past and present models' performance.
http://arxiv.org/abs/2407.16127v1
Compressor summary: The paper proposes DIFT, a finetuning framework that leverages lightweight models and truncated sampling to improve KG completion with large language models without grounding errors.
http://arxiv.org/abs/2407.16126v1
Compressor summary: MxT is a new image inpainting method that combines Mamba and transformers to efficiently restore missing regions with high quality and contextual accuracy.
http://arxiv.org/abs/2407.16125v1
Compressor summary: The paper proposes a new method, DAVI, that uses diffusion models to solve inverse problems more efficiently and effectively than existing methods.
http://arxiv.org/abs/2407.16124v1
Compressor summary: The paper introduces FVMD, a metric for evaluating motion consistency in generated videos, which outperforms existing metrics and can improve video quality assessment (VQA) models.
http://arxiv.org/abs/2407.16123v1
Compressor summary: The text surveys research on multimodal spatio-temporal data fusion and forecasting in smart mobility scenarios, where the main challenges are insufficient data, complex transportation modes, and partial data loss, and the methods respond with knowledge transfer, feature distinction, and sparse representation fusion.
http://arxiv.org/abs/2407.16115v1
Compressor summary: The paper proposes a novel AIoT model called SEB-Transformer that predicts the battery range of shared e-bikes, enabling better route planning and user experience.
http://arxiv.org/abs/2407.16110v1
Compressor summary: The paper studies how the meanings of words change over time through analysis of sentences using ChatGPT, showing that word polysemy is an evolutionary consequence of modifying Semantic Cells.