arxiv compressed, 2024-07-24

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-24 generated by the compressor, my personal LLM-based project.


Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions

http://arxiv.org/abs/2407.16698v1

Compressor summary: The text proposes a new method to generate challenging images for single-image depth estimation by using text-to-image diffusion models with depth-aware control and self-distillation.


Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack

http://arxiv.org/abs/2407.16695v1

Compressor summary: Lifelong ICL introduces a problem setting to evaluate how well long-context language models can learn from multiple tasks in a sequence, while Task Haystack is an evaluation suite that tests their ability to understand and use context effectively.


Explanation Regularisation through the Lens of Attributions

http://arxiv.org/abs/2407.16693v1

Compressor summary: Contrary to previous suggestions, explanation regularisation may not improve out-of-domain performance by increasing reliance on plausible tokens, and its impact on model attributions needs further study.


Can Large Language Models Automatically Jailbreak GPT-4V?

http://arxiv.org/abs/2407.16686v1

Compressor summary: AutoJailbreak is a new technique that uses large language models and weak-to-strong prompts to automatically jailbreak GPT-4V, raising privacy concerns.


KAN or MLP: A Fairer Comparison

http://arxiv.org/abs/2407.16674v1

Compressor summary: The paper compares KAN and MLP models across various tasks, finding that MLP generally outperforms KAN except in symbolic formula representation, where B-spline activation improves MLP's performance.


FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process

http://arxiv.org/abs/2407.16670v1

Compressor summary: The authors propose a new method for detecting fake news in short videos by analyzing the creative process behind their production and using a model that captures material selection and editing preferences.


A Framework for Pupil Tracking with Event Cameras

http://arxiv.org/abs/2407.16665v1

Compressor summary: Event cameras can accurately track rapid eye movements called saccades, enabling better understanding of neurological conditions and other applications.


Towards scalable efficient on-device ASR with transfer learning

http://arxiv.org/abs/2407.16664v1

Compressor summary: Pretraining for transfer learning enhances low-resource ASR models' performance and robustness across languages and domains, especially for rare words.


Computable learning of natural hypothesis classes

http://arxiv.org/abs/2407.16663v1

Compressor summary: This paper shows that any natural hypothesis class learnable by a computer must meet computability requirements under mild assumptions.


EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval

http://arxiv.org/abs/2407.16658v1

Compressor summary: The paper introduces EgoCVR, a new evaluation benchmark for Composed Video Retrieval using egocentric video datasets, and proposes a re-ranking framework to improve performance on this challenging task.


MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

http://arxiv.org/abs/2407.16655v1

Compressor summary: MovieDreamer is a novel framework for generating long-duration videos with complex plots, high visual fidelity, and consistent character identities by combining autoregressive models and diffusion rendering.


A Geometry-Aware Algorithm to Learn Hierarchical Embeddings in Hyperbolic Space

http://arxiv.org/abs/2407.16641v1

Compressor summary: The paper presents a geometry-aware algorithm that tackles three challenges in learning hyperbolic embeddings for hierarchical data using dilation and transitive closure regularization, with theoretical support and improved performance on synthetic and real datasets.


Course-Correction: Safety Alignment Using Synthetic Preferences

http://arxiv.org/abs/2407.16637v1

Compressor summary: The paper studies how to improve large language models' ability to avoid generating harmful content by teaching them to correct their course quickly using a synthetic dataset and preference learning.


Semantic Change Characterization with LLMs using Rhetorics

http://arxiv.org/abs/2407.16624v1

Compressor summary: This paper explores how large language models can help analyze three types of word meaning changes and improve computer applications like translation and chatbots.


Lawma: The Power of Specialization for Legal Tasks

http://arxiv.org/abs/2407.16615v1

Compressor summary: The study compares GPT-4 and Llama 3 models for legal text classification, finding that lightly fine-tuned Llama 3 outperforms GPT-4 and offers a viable alternative to commercial models.


Local vs Global continual learning

http://arxiv.org/abs/2407.16611v1

Compressor summary: The text discusses continual learning, a problem where models update with new information while preserving past knowledge, and compares two approximation strategies for this problem.


Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

http://arxiv.org/abs/2407.16607v1

Compressor summary: This paper introduces a method to infer the distribution of training data used by language models based on their byte-pair encoding tokenizers, revealing information about their multilingual and domain diversity.
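The signal this paper exploits comes from how BPE training works: merge rules are learned greedily by pair frequency, so the order of merges encodes corpus statistics. A minimal illustrative sketch of BPE merge learning (not the paper's inference method; the toy corpus and function name are my own):

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn BPE merge rules from a toy corpus: repeatedly merge the
    most frequent adjacent symbol pair. The resulting merge order
    reflects the training data's distribution."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():  # apply the merge everywhere
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# A corpus dominated by English-like words yields English-like merges first.
corpus = ["the"] * 50 + ["then"] * 20 + ["zum"] * 5
print(learn_bpe_merges(corpus, 2))  # → [('t', 'h'), ('th', 'e')]
```

Because each merge is chosen by frequency, a released tokenizer's ordered merge list leaks information about the mixture of languages and domains it was trained on.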


Shared Imagination: LLMs Hallucinate Alike

http://arxiv.org/abs/2407.16604v1

Compressor summary: The paper introduces imaginary question answering to study model similarity among large language models, revealing a shared imagination space between them.


Functional Acceleration for Policy Mirror Descent

http://arxiv.org/abs/2407.16602v1

Compressor summary: We improve the Policy Mirror Descent family of algorithms in Reinforcement Learning by adding momentum and duality, making it independent of policy parametrization and suitable for large-scale optimization.


DHGS: Decoupled Hybrid Gaussian Splatting for Driving Scene

http://arxiv.org/abs/2407.16600v1

Compressor summary: The paper proposes a new method for synthesizing realistic novel views in driving scenes by decoupling and hybridizing road and non-road layers, using an implicit road representation with SDF, and adding auxiliary losses to improve quality.


A Comparative Study on Patient Language across Therapeutic Domains for Effective Patient Voice Classification in Online Health Discussions

http://arxiv.org/abs/2407.16593v1

Compressor summary: The text discusses how linguistic characteristics can help classify genuine patient voices from social media, bridging the gap between healthcare professionals' perceptions and patients' reality and improving healthcare standards.


Timeliness-Fidelity Tradeoff in 3D Scene Representations

http://arxiv.org/abs/2407.16575v1

Compressor summary: The paper studies how communication delay affects real-time 3D scene representations and proposes a method that uses Age of Information (AoI) to improve fidelity in such scenarios.


TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback

http://arxiv.org/abs/2407.16574v1

Compressor summary: TLCR uses a discriminator to assign continuous rewards to tokens based on human feedback, improving language model quality in RLHF.


Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models

http://arxiv.org/abs/2407.16565v1

Compressor summary: The authors propose pRAGe, a pipeline that combines retrieval and small language models to generate medically accurate paraphrases in French.


DC is all you need: describing ReLU from a signal processing standpoint

http://arxiv.org/abs/2407.16556v1

Compressor summary: This work analyzes the frequency behavior of ReLU activation functions in Convolutional Neural Networks, showing that it introduces higher oscillations and a constant DC component that helps feature extraction and convergence.
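The DC component referred to here is a standard signal-processing fact: half-wave rectifying a zero-mean signal (which is what ReLU does) shifts its mean above zero and introduces harmonics. A small illustrative sketch (my own demo, not the paper's experiment), using NumPy:

```python
import numpy as np

# ReLU half-wave-rectifies a zero-mean signal: the output gains a
# positive DC (zero-frequency) component plus higher harmonics.
t = np.linspace(0, 1, 1024, endpoint=False)
x = np.sin(2 * np.pi * 8 * t)          # zero-mean 8 Hz tone
y = np.maximum(x, 0.0)                 # ReLU

spec = np.abs(np.fft.rfft(y)) / len(y)
dc = spec[0]                           # DC bin equals the output mean
print(f"input mean: {x.mean():.4f}, ReLU output DC: {dc:.4f}")
# The DC of a half-rectified unit sine is about 0.3183 (= 1/pi).
```

The input's spectrum has a single tone and no DC; after ReLU, the constant offset appears at bin 0, which is the behavior the paper connects to feature extraction and convergence.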


QPT V2: Masked Image Modeling Advances Visual Scoring

http://arxiv.org/abs/2407.16541v1

Compressor summary: This paper proposes a new pretraining framework based on masked image modeling that improves quality and aesthetics assessment of visual content.


Enhancing Encrypted Internet Traffic Classification Through Advanced Data Augmentation Techniques

http://arxiv.org/abs/2407.16539v1

Compressor summary: This paper proposes data augmentation techniques to improve internet traffic classification for encrypted data, addressing challenges such as limited data availability and varying transmission units.


Quantifying the Role of Textual Predictability in Automatic Speech Recognition

http://arxiv.org/abs/2407.16537v1

Compressor summary: The text proposes a method to measure the impact of textual context and acoustics on speech recognition, revealing the strengths and weaknesses of different models and explaining poor performance on African-American English.


HAPFI: History-Aware Planning based on Fused Information

http://arxiv.org/abs/2407.16533v1

Compressor summary: HAPFI is a method that uses past information from different sources to improve an agent's ability to plan and execute long sequences of tasks.


Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models

http://arxiv.org/abs/2407.16526v1

Compressor summary: The text proposes a method to update vision encoders in VLMs locally and selectively, improving performance on data with previous errors and maintaining robustness during continual few-shot updates.


AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game

http://arxiv.org/abs/2407.16521v1

Compressor summary: The paper studies how large language models perform in a text-based version of the game Among Us, which involves identifying saboteurs on a spaceship, to understand their social reasoning and decision-making skills.


Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data

http://arxiv.org/abs/2407.16516v1

Compressor summary: The paper compares fine-tuned pre-trained encoder models and in-context learning for detecting topic-related content in webpages with few annotated data points and different features.


Spurious Correlations in Concept Drift: Can Explanatory Interaction Help?

http://arxiv.org/abs/2407.16515v1

Compressor summary: Ebc-exstream is a novel model drift detector that uses explanations and human feedback to identify and correct spurious correlations, reducing annotation costs and improving performance.


Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis?

http://arxiv.org/abs/2407.16514v1

Compressor summary: The paper proposes new techniques for efficient 3D convolutions using 2D/1D operations on 4D/3D tensors, improving efficiency and accuracy for real-time applications like robots.


DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models

http://arxiv.org/abs/2407.16511v1

Compressor summary: DreamVTON is a novel 3D virtual try-on model that optimizes person and clothes geometry and texture using a personalized diffusion model with multi-concept LoRA and template-based optimization.


ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation

http://arxiv.org/abs/2407.16508v1

Compressor summary: The ToDER pipeline uses a bi-directional adaptation architecture and a TNet module to accurately predict depth maps for reliable colonoscopy video reconstruction and diagnosis.


HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images

http://arxiv.org/abs/2407.16503v1

Compressor summary: HDRSplat is a fast method for 14-bit high dynamic range 3D scene reconstruction using 3D Gaussian Splatting, which works well in dark and bright scenes with low texture and high depth of field.


Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection

http://arxiv.org/abs/2407.16497v1

Compressor summary: The proposed Dynamic Retraining-Updating mechanism and Historical Student Loss improve the stability and performance of source-free object detection (SFOD), achieving results comparable to or better than advanced unsupervised domain adaptation methods.


Learning General Continuous Constraint from Demonstrations via Positive-Unlabeled Learning

http://arxiv.org/abs/2407.16485v1

Compressor summary: The paper proposes a positive-unlabeled learning approach to infer nonlinear constraints from expert demonstrations for real-world tasks, using an iterative framework with memory buffer.


BONES: a Benchmark fOr Neural Estimation of Shapley values

http://arxiv.org/abs/2407.16482v1

Compressor summary: Shapley Values are a way to explain AI models, but calculating them accurately is hard; BONES is a new tool that simplifies their estimation and evaluation for both tabular and image data.
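Why is calculating Shapley values hard? The exact definition averages a feature's marginal contribution over all coalitions, which is exponential in the number of players. A brute-force illustrative sketch of the definition (not BONES itself; the function and toy game are my own):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating all coalitions.
    Cost is O(2^n), which is why neural estimators are needed
    for realistic feature counts."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coal in combinations(others, k):
                # Weight of a coalition of size k in the Shapley formula.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(set(coal) | {p}) - value_fn(set(coal)))
        phi[p] = total
    return phi

# Toy additive game: a coalition's value is the sum of member worths,
# so each player's Shapley value equals its own worth.
worth = {"a": 1.0, "b": 2.0, "c": 3.0}
v = lambda coalition: sum(worth[p] for p in coalition)
print({p: round(val, 6) for p, val in shapley_values(list(worth), v).items()})
```

Neural estimators trade this exact enumeration for an approximation learned from sampled coalitions, which is the family of methods the benchmark evaluates.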


qMRI Diffusor: Quantitative T1 Mapping of the Brain using a Denoising Diffusion Probabilistic Model

http://arxiv.org/abs/2407.16477v1

Compressor summary: The qMRI Diffusor uses a deep generative model (DDPM) to estimate T1 parameters in the brain more accurately and precisely than other methods, while also allowing for uncertainty quantification.


Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models

http://arxiv.org/abs/2407.16470v1

Compressor summary: The paper evaluates how well large language models can detect hallucinations in machine translations across various languages and finds that they perform better for high-resource languages than low-resource ones.


Enhancing GNNs Performance on Combinatorial Optimization by Recurrent Feature Update

http://arxiv.org/abs/2407.16468v1

Compressor summary: QRF-GNN is a novel algorithm that leverages Graph Neural Networks and static node features to solve combinatorial optimization problems with QUBO formulation, achieving high performance and scalability.


Sobolev neural network with residual weighting as a surrogate in linear and non-linear mechanics

http://arxiv.org/abs/2407.16466v1

Compressor summary: The paper proposes a method to improve neural network training for computational mechanics using sensitivity information and residual weighting, leading to better convergence and error reduction in linear and nonlinear material models.


Can time series forecasting be automated? A benchmark and analysis

http://arxiv.org/abs/2407.16445v1

Compressor summary: The paper proposes a benchmark to evaluate and rank time series forecasting methods across various datasets and compares two prominent frameworks, AutoGluon-Timeseries and sktime, to inform method selection for optimal predictions.


Psychomatics -- A Multidisciplinary Framework for Understanding Artificial Minds

http://arxiv.org/abs/2407.16444v1

Compressor summary: The paper introduces Psychomatics, a framework to compare and understand the differences between human and artificial language processing and cognition, aiming to improve AI systems' human-likeness.


Enhancing LLM's Cognition via Structurization

http://arxiv.org/abs/2407.16434v1

Compressor summary: The paper introduces context structurization to improve large language models' cognition and performance on complex NLP tasks by organizing sentences into well-ordered and hierarchical structures.


FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

http://arxiv.org/abs/2407.16431v1

Compressor summary: FairFlow is a method to automatically create parallel data for training language models that reduce harmful biases and stereotypes by balancing demographic attributes without relying on expensive, manual word substitutions.


Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution

http://arxiv.org/abs/2407.16430v1

Compressor summary: The paper introduces ImOOD framework to address challenges in detecting out-of-distribution samples on imbalanced data and proposes a regularization technique that improves the performance of OOD detectors.


ESOD: Efficient Small Object Detection on High-Resolution Images

http://arxiv.org/abs/2407.16424v1

Compressor summary: The paper presents ESOD, a method that improves small object detection on high-resolution images with less computation and memory by reusing the detector's backbone for feature-level object-seeking and patch-slicing and adding a sparse detection head; it is generic to both CNN and ViT detectors and outperforms state-of-the-art detectors on several datasets.


Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

http://arxiv.org/abs/2407.16406v1

Compressor summary: The paper introduces Emotion Forecasting, a new Deep Learning problem that predicts how people's emotions will change based on their interactions with others, and presents a new dataset (Hi-EF) to train and evaluate models for this task.


Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

http://arxiv.org/abs/2407.16396v1

Compressor summary: The text introduces a novel data-driven differentiable renderer that uses neural networks to infer unbiased and scalable unsigned distance functions from RGB images.


SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval

http://arxiv.org/abs/2407.16394v1

Compressor summary: The text proposes a new sign language representation framework called SEDS, which uses Pose and RGB modalities to capture local and global information of sign videos and fuses them with Cross Gloss Attention Fusion for better performance.


Anwendung von Causal-Discovery-Algorithmen zur Root-Cause-Analyse in der Fahrzeugmontage

http://arxiv.org/abs/2407.16388v1

Compressor summary: The text introduces Causal Discovery Algorithms (CDA) as a data-driven method for Root Cause Analysis (RCA) in modern production processes, and compares their suitability and runtime using data from an automotive assembly case study.


A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset

http://arxiv.org/abs/2407.16384v1

Compressor summary: The study presents a multitask deep learning model for hyperspectral imaging in remote sensing that uses a shared encoder, task-specific decoders, dense atrous pyramid pooling, and an attention network to perform multiple classification and regression tasks on 13 forest variables, outperforming state-of-the-art methods and remaining robust across seeds/trials.


TookaBERT: A Step Forward for Persian NLU

http://arxiv.org/abs/2407.16382v1

Compressor summary: The study introduces two new BERT models for Persian natural language understanding tasks and shows their superior performance compared to existing models.


Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction

http://arxiv.org/abs/2407.16370v1

Compressor summary: The paper proposes using alternative prompts for generative error correction in speech recognition and optimizing them with an evolutionary algorithm to improve performance.


FCNR: Fast Compressive Neural Representation of Visualization Images

http://arxiv.org/abs/2407.16369v1

Compressor summary: FCNR is a fast compressive neural representation for large numbers of images that uses stereo context modules and joint context transfer modules to achieve high compression ratios and reconstruction quality, outperforming other neural compression methods.


Harmonizing Visual Text Comprehension and Generation

http://arxiv.org/abs/2407.16364v1

Compressor summary: TextHarmony is a model that generates visual text by combining modality-specific and modality-agnostic experts and uses DetailedTextCaps-100K, a large image caption dataset, to improve performance.


Virtue Ethics For Ethically Tunable Robotic Assistants

http://arxiv.org/abs/2407.16361v1

Compressor summary: The paper proposes a method to adjust robots' ethical behavior according to their environment using virtue ethics and character tuning, and demonstrates it in an elder-care simulation.


Online Learning with Sublinear Best-Action Queries

http://arxiv.org/abs/2407.16355v1

Compressor summary: The paper studies how using best-action queries can help decision makers in online learning minimize their loss and achieve better performance with limited feedback.


FACTTRACK: Time-Aware World State Tracking in Story Outlines

http://arxiv.org/abs/2407.16347v1

Compressor summary: FACTTRACK is a novel method for tracking atomic facts and addressing factual contradictions in language models with time-aware validity intervals.


STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments

http://arxiv.org/abs/2407.16337v1

Compressor summary: The paper proposes a novel method, STATE, that uses the Student's t-distribution to estimate treatment effects in online controlled experiments with heavy-tailed metrics, achieving significant variance reduction and better data-driven decisions.


On The Expressive Power of Knowledge Graph Embedding Methods

http://arxiv.org/abs/2407.16326v1

Compressor summary: The paper proposes a framework to compare reasoning abilities of KGE methods and introduces STransCoRe, an improved version of STransE.


PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing

http://arxiv.org/abs/2407.16318v1

Compressor summary: PrimeGuard is a novel method that uses structured control flow to route queries to different LM instantiations with varying instructions, improving safety and helpfulness without fine-tuning or compromising on either.


A new visual quality metric for Evaluating the performance of multidimensional projections

http://arxiv.org/abs/2407.16309v1

Compressor summary: The paper introduces a new visual quality metric for multidimensional projections based on human perception, which improves the evaluation of the Local Affine Multidimensional Projection method.


SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging

http://arxiv.org/abs/2407.16308v1

Compressor summary: The paper proposes a novel network (SAFNet) for HDR imaging that improves efficiency by selectively refining valuable areas and using a lightweight refine module, while achieving better results than previous methods.


DeepClean: Integrated Distortion Identification and Algorithm Selection for Rectifying Image Corruptions

http://arxiv.org/abs/2407.16302v1

Compressor summary: The paper proposes a two-level sequential planning approach that classifies image distortions at the higher level and selects a specific rectification algorithm at the lower level; it runs in a single forward pass during inference, can be queried iteratively, improves object detection on the COCO dataset under a rich set of distortions, and supports dynamic reconfiguration and generalisation to unseen algorithms.


A new Linear Time Bi-level $\ell_{1,\infty}$ projection ; Application to the sparsification of auto-encoders neural networks

http://arxiv.org/abs/2407.16293v1

Compressor summary: The paper introduces a new method to project data onto the $\ell_{1,\infty}$ ball, reducing the time complexity by a factor of 2.5 and improving sparsity and accuracy in classification tasks.
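For context, the $\ell_{1,\infty}$ mixed norm of a matrix is the sum over rows of each row's maximum absolute entry, so projecting onto a ball of this norm zeroes entire rows and yields structured sparsity. A minimal sketch of evaluating the norm (the projection algorithm itself is the paper's contribution and is not reproduced here):

```python
import numpy as np

def l1_inf_norm(X):
    """Mixed l_{1,inf} norm: sum over rows of the max absolute entry.
    Constraining this norm encourages whole rows to go to zero,
    which motivates its use for sparsifying auto-encoders."""
    return np.abs(X).max(axis=1).sum()

X = np.array([[1.0, -3.0],
              [0.5, 0.25]])
print(l1_inf_norm(X))  # row maxima 3.0 and 0.5 → 3.5
```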


TAPTRv2: Attention-based Position Update Improves Tracking Any Point

http://arxiv.org/abs/2407.16291v1

Compressor summary: TAPTRv2 improves tracking any point task by introducing attention-based position update and removing cost-volume computation.


A deeper look at depth pruning of LLMs

http://arxiv.org/abs/2407.16286v1

Compressor summary: This paper explores different metrics to prune large language models and shows that adaptive metrics can trade off performance across tasks, while self-attention layers are more amenable to pruning with recovery techniques.


Efficient Detection of Commutative Factors in Factor Graphs

http://arxiv.org/abs/2407.16280v1

Compressor summary: DECOR is an algorithm that quickly detects symmetries in factor graphs, enabling efficient probabilistic inference with respect to domain sizes.


HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis

http://arxiv.org/abs/2407.16269v1

Compressor summary: The paper introduces HyTAS, a benchmark for searching optimal transformer architectures for hyperspectral imaging classification tasks, and evaluates 12 methods on 5 datasets.


Image Classification using Fuzzy Pooling in Convolutional Kolmogorov-Arnold Networks

http://arxiv.org/abs/2407.16268v1

Compressor summary: The paper proposes a new CNN architecture with Kolmogorov-Arnold Network and Fuzzy Pooling for interpretable and accurate image classification tasks, and shows its effectiveness in comparison to traditional models.


Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words

http://arxiv.org/abs/2407.16266v1

Compressor summary: This study introduces a new machine translation evaluation method that considers non-binary gender and uses Emotional Attitude Score to measure ambiguous attitude words, revealing significant bias in current models.


Masks and Manuscripts: Advancing Medical Pre-training with End-to-End Masking and Narrative Structuring

http://arxiv.org/abs/2407.16264v1

Compressor summary: The text proposes a two-step approach to improve medical contrastive learning by standardizing text reports, converting them into binary questions, and enhancing visual pre-training with a Meijering-based masking technique.


DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors

http://arxiv.org/abs/2407.16260v1

Compressor summary: DreamDissector is a text-to-3D method that generates multiple independent objects with plausible interactions by disentangling a NeRF and using category score distillation sampling.


Self-Reasoning Assistant Learning for non-Abelian Gauge Fields Design

http://arxiv.org/abs/2407.16255v1

Compressor summary: The text proposes a new learning framework that can generate non-Abelian gauge fields for studying condensed matter physics by using self-reasoning and continuous transformation of data.


LawLuo: A Chinese Law Firm Co-run by LLM Agents

http://arxiv.org/abs/2407.16252v1

Compressor summary: LawLuo is a novel legal dialogue framework using multiple LLM agents that collaborate to provide comprehensive legal consultations, overcoming limitations of existing Chinese legal LLMs.


Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval

http://arxiv.org/abs/2407.16248v1

Compressor summary: The paper proposes SGMN, a model that uses text guidance, spatiotemporal graphing, and multi-modal hard example mining to accurately identify products in livestreaming sales videos.


Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning

http://arxiv.org/abs/2407.16245v1

Compressor summary: This paper explores how to select intermediate tasks for transfer learning and compares four methods, finding that pairwise token similarity is the best predictor of transfer performance.


HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification

http://arxiv.org/abs/2407.16244v1

Compressor summary: HSVLT is a novel Transformer-based method for multi-label image classification that uses hierarchical multi-scale architecture and interactive visual-linguistic attention to improve performance and efficiency.


Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities

http://arxiv.org/abs/2407.16243v1

Compressor summary: Chameleon is a robust textual-visual multimodal learning method that works well even when some modalities are missing, unlike conventional multi-branch designs.


Identifiable latent bandits: Combining observational data and exploration for personalized healthcare

http://arxiv.org/abs/2407.16239v1

Compressor summary: The authors propose nonlinear ICA-based bandit algorithms that can learn latent variables from observational data and infer the optimal action for each patient, improving personalized decision-making in health applications.


A Multi-view Mask Contrastive Learning Graph Convolutional Neural Network for Age Estimation

http://arxiv.org/abs/2407.16234v1

Compressor summary: The paper presents MMCL-GCN, a novel two-stage method for estimating age from facial features that combines a graph convolutional neural network with multi-view mask contrastive learning: feature extraction with an asymmetric siamese network that learns latent representations and reconstructs missing information, followed by age estimation with extreme learning machines, achieving better results than existing methods on benchmark datasets.


Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution

http://arxiv.org/abs/2407.16232v1

Compressor summary: The paper introduces a new attention method (CPAT) for super-resolution that expands windows along feature maps and a spatial-frequency interaction module (SFIM) that integrates information from both domains, achieving state-of-the-art results.


OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

http://arxiv.org/abs/2407.16224v1

Compressor summary: OutfitAnyone is a two-stream conditional diffusion model that generates lifelike virtual clothing images by handling garment deformation and adapting to various factors like pose, body shape, and image types.


PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment

http://arxiv.org/abs/2407.16222v1

Compressor summary: PreAlign improves multilingual alignment in large language models during pretraining, leading to better cross-lingual performance.


Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models

http://arxiv.org/abs/2407.16221v1

Compressor summary: This paper explores Abstention Ability (AA), a crucial aspect of large language models' reliability, and proposes evaluation methods to improve their ability to refrain from answering when uncertain or when questions are unanswerable.


ODGR: Online Dynamic Goal Recognition

http://arxiv.org/abs/2407.16220v1

Compressor summary: The paper proposes Online Dynamic Goal Recognition (ODGR), a novel reinforcement learning approach to recognize an agent's goals in real-time, overcoming limitations of traditional goal recognition methods.


A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More

http://arxiv.org/abs/2407.16216v1

Compressor summary: The text discusses the progress and challenges of large language models in generating accurate and human-like responses, and reviews various methods to improve their performance.


Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

http://arxiv.org/abs/2407.16214v1

Compressor summary: Diff-Shadow is a global-guided diffusion model that combines local and global information for high-quality shadow removal in images.


Graph-Structured Speculative Decoding

http://arxiv.org/abs/2407.16207v1

Compressor summary: The paper proposes Graph-structured Speculative Decoding (GSD), which generates multiple hypotheses and uses a directed acyclic graph to efficiently merge recurring token sequences, achieving significant speedup for inference of Large Language Models.
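The core trick of merging recurring token sequences, so that tokens shared across hypotheses are verified once instead of per hypothesis, can be illustrated with a toy prefix-merging sketch (function names and structure are illustrative, not the paper's code):

```python
def merge_hypotheses(hypotheses):
    """Merge draft token sequences into a prefix trie; shared prefixes become single nodes."""
    root = {}
    node_count = 0
    for seq in hypotheses:
        node = root
        for tok in seq:
            if tok not in node:
                node[tok] = {}
                node_count += 1
            node = node[tok]
    return root, node_count

drafts = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]
trie, nodes = merge_hypotheses(drafts)
total = sum(len(s) for s in drafts)  # 9 token verifications if done separately
# the shared prefixes "the" and "the cat" collapse, leaving only 6 unique nodes
```

Verifying 6 nodes instead of 9 tokens is the source of the speedup the summary refers to; a DAG generalizes the trie by also merging shared suffixes.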


CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction

http://arxiv.org/abs/2407.16204v1

Compressor summary: The paper introduces visual-text inpainting, which restores damaged scene text images and completes their corresponding texts by exploiting complementary information from both modalities through a cross-modal predictive interaction model.


MCTS Based Dispatch of Autonomous Vehicles under Operational Constraints for Continuous Transportation

http://arxiv.org/abs/2407.16200v1

Compressor summary: The text describes how Monte Carlo Tree Search (MCTS) can be used to optimize haul-truck dispatch in mining by incorporating operational constraints as opportunity costs in the optimization problem.


INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model

http://arxiv.org/abs/2407.16198v1

Compressor summary: INF-LLaVA is a novel multimodal language model with innovative modules to process high-resolution images by capturing both local and global information.


CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation

http://arxiv.org/abs/2407.16193v1

Compressor summary: CloudFixer is a test-time input adaptation method for 3D point clouds that uses a pre-trained diffusion model and optimizes geometric transformation parameters to handle noisy points and improve recognition performance.


Artificial Agency and Large Language Models

http://arxiv.org/abs/2407.16190v1

Compressor summary: The paper proposes a model of artificial agents based on their history, repertoire, and environment, argues that LLMs are not agents yet, but could become so with additional modules, and discusses challenges and future research directions.


EIANet: A Novel Domain Adaptation Approach to Maximize Class Distinction with Neural Collapse Principles

http://arxiv.org/abs/2407.16189v1

Compressor summary: EIANet uses a novel attention mechanism with an ETF classifier to separate and focus on discriminative features for fine-grained visual categorization in SFDA.


Structural Optimization Ambiguity and Simplicity Bias in Unsupervised Neural Grammar Induction

http://arxiv.org/abs/2407.16181v1

Compressor summary: The paper introduces a new technique to improve neural grammar induction by focusing on relevant parse trees per sentence, reducing errors, variance, and simplicity bias.


Logifold: A Geometrical Foundation of Ensemble Machine Learning

http://arxiv.org/abs/2407.16177v1

Compressor summary: The paper proposes a new method to analyze datasets using logifolds, which can improve ensemble machine learning and identify fuzzy domains.


Pixel Embedding: Fully Quantized Convolutional Neural Network with Differentiable Lookup Table

http://arxiv.org/abs/2407.16174v1

Compressor summary: Pixel embedding replaces float-valued input pixels with low-bit vectors, reducing quantization errors and increasing efficiency for deep neural networks.
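Replacing each input pixel with a low-bit vector amounts to indexing a lookup table with the pixel's integer value. A toy illustration of that mapping (random table values and a 2-bit level set are assumptions for the sketch; the paper's table is learned differentiably):

```python
import numpy as np

rng = np.random.default_rng(0)
# lookup table: one 4-dim embedding per possible 8-bit pixel value,
# with entries restricted to 2-bit quantization levels
levels = np.array([-1.0, -1/3, 1/3, 1.0])
table = levels[rng.integers(0, 4, size=(256, 4))]

def embed(image):
    """Replace each uint8 pixel with its low-bit embedding vector."""
    return table[image]  # (H, W) -> (H, W, 4)

img = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
emb = embed(img)
```

Since every embedded value comes from the small `levels` set, the network's first layer never sees full-precision floats, which is what removes the input-quantization error the summary mentions.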


Integrating Meshes and 3D Gaussians for Indoor Scene Reconstruction with SAM Mask Guidance

http://arxiv.org/abs/2407.16173v1

Compressor summary: The authors propose a hybrid method for 3D indoor scene reconstruction using meshes and 3D Gaussian Splatting, with Segment Anything Model to guide the selection, and an additional densification stage to improve image quality.


Learning Trimodal Relation for AVQA with Missing Modality

http://arxiv.org/abs/2407.16171v1

Compressor summary: The paper proposes a framework that improves audio-visual question answering (AVQA) robustness by using relation-aware models to handle missing modalities and enhance features across audio and visual inputs.


Progressively Modality Freezing for Multi-Modal Entity Alignment

http://arxiv.org/abs/2407.16168v1

Compressor summary: The paper proposes a new method called PMF for aligning entities across different knowledge graphs by focusing on relevant features and enhancing multi-modal fusion with a cross-modal association loss.


Robust Privacy Amidst Innovation with Large Language Models Through a Critical Assessment of the Risks

http://arxiv.org/abs/2407.16166v1

Compressor summary: The study used NLP and large language models to generate synthetic patient notes with balanced privacy and utility, finding re-identified data more effective than de-identified data.


Representation Magnitude has a Liability to Privacy Vulnerability

http://arxiv.org/abs/2407.16164v1

Compressor summary: The paper proposes a model-level solution, Saturn Ring Classifier Module (SRCM), to reduce privacy vulnerability in machine learning models by creating a confined representation space.


TransFeat-TPP: An Interpretable Deep Covariate Temporal Point Processes

http://arxiv.org/abs/2407.16161v1

Compressor summary: The TransFeat-TPP model uses a Transformer network to better incorporate contextual data into event models, improving interpretability and prediction accuracy.


UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models

http://arxiv.org/abs/2407.16160v1

Compressor summary: UniMEL is a framework that uses large language models to link ambiguous mentions in multimodal contexts to entities in a knowledge base, improving performance and scalability.


DDK: Distilling Domain Knowledge for Efficient Large Language Models

http://arxiv.org/abs/2407.16154v1

Compressor summary: The paper introduces DDK, a framework that adjusts the distillation dataset composition to improve smaller LLMs' performance by transferring knowledge from larger LLMs in a stable and effective way.


On the Benefits of Rank in Attention Layers

http://arxiv.org/abs/2407.16153v1

Compressor summary: The paper studies how rank and number of heads in attention mechanisms affect their performance on different target functions and context lengths, and provides theoretical and empirical evidence for the trade-offs involved.


Predicting Stock Prices with FinBERT-LSTM: Integrating News Sentiment Analysis

http://arxiv.org/abs/2407.16150v1

Compressor summary: The text uses deep learning networks to predict stock prices based on news articles and shows that combining different news categories improves accuracy.
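Combining news sentiment with price history typically means stacking both signals into one feature window per trading day before feeding a sequence model. A minimal sketch of that preprocessing step (the window length and layout are assumptions; the FinBERT sentiment scores and LSTM itself are not shown):

```python
import numpy as np

def make_windows(prices, sentiment, window=5):
    """Stack past prices with per-day news sentiment scores as model inputs."""
    X, y = [], []
    for t in range(window, len(prices)):
        feats = np.concatenate([prices[t - window:t], sentiment[t - window:t]])
        X.append(feats)
        y.append(prices[t])  # target: next day's price
    return np.array(X), np.array(y)

prices = np.linspace(100, 110, 20)      # toy price series
sentiment = np.zeros(20)                # toy sentiment scores in [-1, 1]
X, y = make_windows(prices, sentiment)
```

Each row of `X` then holds 5 past prices plus 5 sentiment scores, so the downstream model sees both modalities jointly.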


CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support

http://arxiv.org/abs/2407.16148v1

Compressor summary: The authors explore the use of LLMs for generating hierarchical organizations of scientific studies to help with literature reviews, create a dataset (CHIME) for this task, and train a corrector model to improve study assignments based on human feedback.


Improved Few-Shot Image Classification Through Multiple-Choice Questions

http://arxiv.org/abs/2407.16145v1

Compressor summary: The paper proposes a few-shot learning method that uses multiple-choice questions to boost VQA models' zero-shot image classification, which is often hurt by distribution shifts and unfamiliar category names; the method outperforms visual encoders and zero-shot VQA baselines on common few-shot tasks and handles diverse visual attributes such as clothing features.


Diffusion Models as Optimizers for Efficient Planning in Offline RL

http://arxiv.org/abs/2407.16142v1

Compressor summary: The paper proposes Trajectory Diffuser, a method that speeds up diffusion models for reinforcement learning tasks by separating the generation and optimization of feasible trajectories.


Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data

http://arxiv.org/abs/2407.16134v1

Compressor summary: The paper explores how diffusion transformers can capture and leverage spatial-temporal dependencies in sequential data generation, using Gaussian process models as a case study.


Open-Set Biometrics: Beyond Good Closed-Set Models

http://arxiv.org/abs/2407.16133v1

Compressor summary: The text proposes new loss functions for biometric recognition that improve performance in open-set scenarios, where probe subjects may or may not be in the gallery, and also enhance closed-set performance.


FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network

http://arxiv.org/abs/2407.16129v1

Compressor summary: The paper proposes LMA, a novel multimodal object detector with shared backbone and adaptive rank allocation, achieving significant accuracy improvement and parameter reduction over existing methods.


Advancing Brain Imaging Analysis Step-by-step via Progressive Self-paced Learning

http://arxiv.org/abs/2407.16128v1

Compressor summary: The PSPD framework uses adaptive curriculum learning to improve brain imaging analysis by adjusting training examples based on past and present models' performance.


Finetuning Generative Large Language Models with Discrimination Instructions for Knowledge Graph Completion

http://arxiv.org/abs/2407.16127v1

Compressor summary: The paper proposes DIFT, a finetuning framework that leverages lightweight models and truncated sampling to improve KG completion with large language models without grounding errors.


MxT: Mamba x Transformer for Image Inpainting

http://arxiv.org/abs/2407.16126v1

Compressor summary: MxT is a new image inpainting method that combines Mamba and transformers to efficiently restore missing regions with high quality and contextual accuracy.


Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems

http://arxiv.org/abs/2407.16125v1

Compressor summary: The paper proposes a new method, DAVI, that uses diffusion models to solve inverse problems more efficiently and effectively than existing methods.


Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos

http://arxiv.org/abs/2407.16124v1

Compressor summary: The paper introduces FVMD, a metric for evaluating motion consistency in generated videos, which outperforms existing metrics and can improve VQA models.
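FVMD follows the familiar Fréchet-distance recipe: fit Gaussians to motion features of real and generated videos and compare the two distributions. A generic sketch of that distance for the diagonal-covariance case (the motion feature extractor is the paper's contribution and is not shown):

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Squared Fréchet distance between two Gaussians with diagonal covariances."""
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2)))

mu1, var1 = np.zeros(3), np.ones(3)
mu2, var2 = np.ones(3), np.ones(3)
d = frechet_distance_diag(mu1, var1, mu2, var2)  # equal variances, shifted mean -> 3.0
```

With full covariances the variance term becomes Tr(S1 + S2 - 2(S1 S2)^{1/2}), as in FID; the diagonal case above keeps the sketch dependency-free.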


Towards Effective Fusion and Forecasting of Multimodal Spatio-temporal Data for Smart Mobility

http://arxiv.org/abs/2407.16123v1

Compressor summary: The text surveys methods for multimodal spatio-temporal data fusion and forecasting in smart mobility scenarios, where insufficient data, complex transportation modes, and partial data loss pose challenges addressed through knowledge transfer, feature distinction, and sparse representation fusion.


Transformer-based Graph Neural Networks for Battery Range Prediction in AIoT Battery-Swap Services

http://arxiv.org/abs/2407.16115v1

Compressor summary: The paper proposes a novel AIoT model called SEB-Transformer that predicts the battery range of shared e-bikes, enabling better route planning and user experience.


Analyzing the Polysemy Evolution using Semantic Cells

http://arxiv.org/abs/2407.16110v1

Compressor summary: The paper studies how word meanings change over time by analyzing sentences with ChatGPT, showing that polysemy is an evolutionary consequence of modifying Semantic Cells.