arxiv compressed, 2024-07-19

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-19 generated by the compressor, my personal LLM-based project.


Training-Free Model Merging for Multi-target Domain Adaptation

http://arxiv.org/abs/2407.13771v1

Compressor summary: The paper proposes a method to merge scene understanding models trained on different domains without accessing their training data, using linear model parameter merging and Gaussian prior-based buffer merging.
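Linear parameter merging in its simplest form is a weighted average of aligned weights. The sketch below illustrates only that baseline idea (the paper additionally uses Gaussian prior-based buffer merging); `merge_models` is a hypothetical helper, not the authors' code:

```python
import numpy as np

def merge_models(state_dicts, weights=None):
    """Weighted average of aligned parameter dictionaries."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two toy "models", each a dict of parameter tensors
model_a = {"layer.weight": np.array([1.0, 2.0])}
model_b = {"layer.weight": np.array([3.0, 4.0])}
merged = merge_models([model_a, model_b])
print(merged["layer.weight"])  # [2. 3.]
```

With equal weights this reduces to plain parameter averaging; unequal weights let one source domain dominate the merge.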


Addressing Imbalance for Class Incremental Learning in Medical Image Classification

http://arxiv.org/abs/2407.13768v1

Compressor summary: Key points:
- Class incremental learning (CIL) in the medical domain suffers from catastrophic forgetting when training on new classes.
- Two plug-in methods are introduced to mitigate the imbalance caused by class overlap and the distribution margin.
- Better performance than state-of-the-art methods is reported on three benchmark datasets.

Summary: The text presents two simple methods to improve CIL in the medical domain, which address the imbalance problem due to class overlap and distribution margin, and achieve better results than existing methods.


Visual Haystacks: Answering Harder Questions About Sets of Images

http://arxiv.org/abs/2407.13766v1

Compressor summary: The paper introduces Multi-Image Visual Question Answering (MIQA) as a new task, proposes a public benchmark called "Visual Haystacks," and presents MIRAGE, a novel framework that improves efficiency and accuracy for large multimodal models in this task.


Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data

http://arxiv.org/abs/2407.13765v1

Compressor summary: The authors propose a formal approach using structural causal models (SCM) to analyze and design probing experiments for language models, showing how it helps understand their unsupervised learning of latent causal concepts from text.


Shape of Motion: 4D Reconstruction from a Single Video

http://arxiv.org/abs/2407.13764v1

Compressor summary: Our method reconstructs dynamic scenes from monocular videos by exploiting low-dimensional structure of 3D motion and using data-driven priors to consolidate supervisory signals.


Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

http://arxiv.org/abs/2407.13759v1

Compressor summary: The text describes a method for generating realistic long sequences of city views based on language input and map data, using video diffusion and an autoregressive framework with temporal imputation to maintain quality and consistency.


Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

http://arxiv.org/abs/2407.13757v1

Compressor summary: The paper explores how retrieval-enhanced generative models are vulnerable to black-box attacks that manipulate ranking results and affect user cognition and decision-making.


Random Latent Exploration for Deep Reinforcement Learning

http://arxiv.org/abs/2407.13755v1

Compressor summary: Random Latent Exploration (RLE) is a new exploration technique for deep reinforcement learning that combines bonus-based and noise-based strategies, adding structured random rewards to encourage agent exploration in certain states.


Exploring Facial Biomarkers for Depression through Temporal Analysis of Action Units

http://arxiv.org/abs/2407.13753v1

Compressor summary: The study explores using facial expressions and emotions as objective biomarkers for depression by analyzing video data of people with or without the disorder.


LogoSticker: Inserting Logos into Diffusion Models for Customized Generation

http://arxiv.org/abs/2407.13752v1

Compressor summary: Key points:
- Text-to-image model customization can use new concepts with few examples.
- Logos are hard to learn for diffusion models due to unique patterns and textual elements.
- The logo insertion task aims to insert logos into diffusion models and synthesize them in various contexts.
- LogoSticker is a two-phase pipeline that pre-trains actor-critic relation and learns the decoupled identity of logos.
- LogoSticker outperforms customization methods and DALLE 3.

Summary: LogoSticker is a novel logo insertion method that uses a two-phase pipeline to train diffusion models to synthesize logos accurately and harmoniously in various contexts, surpassing existing methods.


General Geometry-aware Weakly Supervised 3D Object Detection

http://arxiv.org/abs/2407.13748v1

Compressor summary: Key points:
- The paper proposes a general approach for weakly supervised 3D object detection using RGB images and 2D boxes.
- The approach consists of three components: prior injection, 2D projection constraint, and 3D geometry constraint.
- The method achieves high-quality 3D bounding boxes with only 2D annotation on two datasets.

Summary: The paper presents a general method for weakly supervised 3D object detection that uses RGB images and 2D boxes, applying three components to obtain accurate 3D boxes without 3D annotations.


Multi-Label Learning with Stronger Consistency Guarantees

http://arxiv.org/abs/2407.13746v1

Compressor summary: The study proposes a new surrogate loss function for multi-label learning that accounts for label correlations and has optimal consistency guarantees, as well as adapting standard classification losses to multi-label settings.


MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References

http://arxiv.org/abs/2407.13745v1

Compressor summary: MaRINeR is a method that uses a nearby mapping image to improve the quality of novel views in Computer Vision and Robotics tasks by matching deep features and transferring details hierarchically.


LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation

http://arxiv.org/abs/2407.13744v1

Compressor summary: The paper discusses how natural language processing models are becoming more generalist and suggests evaluating their strengths and weaknesses based on their ability to approximate specialist functions from natural language specifications.


Optimistic Q-learning for average reward and episodic reinforcement learning

http://arxiv.org/abs/2407.13743v1

Compressor summary: The paper introduces an optimistic Q-learning algorithm for average-reward reinforcement learning under a relaxed assumption on the time to visit frequent states, and shows a regret bound of O(H^5 S√AT).


Scaling Granite Code Models to 128K Context

http://arxiv.org/abs/2407.13739v1

Compressor summary: The paper presents long-context Granite code models that extend context handling up to 128K tokens using continual pretraining and instruction tuning, with no performance drop on regular tasks.


Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review

http://arxiv.org/abs/2407.13734v1

Compressor summary: The tutorial covers RL-based methods for optimizing diffusion models to generate samples that maximize desired metrics in biology applications.


Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer

http://arxiv.org/abs/2407.13732v1

Compressor summary: The authors study different surrogate loss functions for learning to defer and show their consistency properties under various conditions and hypothesis sets.


Baba Is AI: Break the Rules to Beat the Benchmark

http://arxiv.org/abs/2407.13729v1

Compressor summary: The text describes a new benchmark for testing multi-modal language models based on the game Baba Is You, where agents have to manipulate both objects and rules to win.


Enhanced $H$-Consistency Bounds

http://arxiv.org/abs/2407.13722v1

Compressor summary: This paper proposes a general framework to derive better $H$-consistency bounds for surrogate losses by relaxing previous assumptions on the relationship between zero-one estimation error and surrogate loss estimation error.


HazeCLIP: Towards Language Guided Real-World Image Dehazing

http://arxiv.org/abs/2407.13719v1

Compressor summary: HazeCLIP improves real-world dehazing performance using a language-guided adaptation framework based on CLIP model's ability to distinguish between hazy and clean images.


Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning

http://arxiv.org/abs/2407.13715v1

Compressor summary: The study proposes an Open World Compositional Zero-Shot Learning model with self-attention and external knowledge to predict realistic compositions of attributes and objects.


FSP-Laplace: Function-Space Priors for the Laplace Approximation in Bayesian Deep Learning

http://arxiv.org/abs/2407.13711v1

Compressor summary: The paper proposes a method to improve uncertainty estimates in deep neural networks by placing a prior on function space instead of weight space, using structured and interpretable inductive biases.


Understanding Reference Policies in Direct Preference Optimization

http://arxiv.org/abs/2407.13709v1

Compressor summary: This paper explores how the choice of reference policy affects Direct Preference Optimization (DPO) for instruction fine-tuning of large language models, and provides guidance on optimal settings and similarities between reference policies and target models.


Are We Ready for Out-of-Distribution Detection in Digital Pathology?

http://arxiv.org/abs/2407.13708v1

Compressor summary: The paper evaluates OOD detection methods in digital pathology using proper protocols and exploring advanced ML settings.


ANHALTEN: Cross-Lingual Transfer for German Token-Level Reference-Free Hallucination Detection

http://arxiv.org/abs/2407.13702v1

Compressor summary: ANHALTEN is a new German dataset for cross-lingual transfer in reference-free hallucination detection that shows the benefits of few-shot learning with minimal annotations.


Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation

http://arxiv.org/abs/2407.13696v1

Compressor summary: The text discusses the importance of standardized benchmark agreement testing (BAT) for evaluating language models and introduces BenchBench, a python package to facilitate BAT and improve its robustness and validity.


HPix: Generating Vector Maps from Satellite Images

http://arxiv.org/abs/2407.13680v1

Compressor summary: HPix is a new method that uses GANs to create detailed vector maps from satellite images, overcoming limitations of existing techniques.


PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers

http://arxiv.org/abs/2407.13677v1

Compressor summary: PASTA is a transformer-based model that can generate realistic and diverse 3D objects by composing cuboidal primitives and synthesizing high quality meshes, using various inputs and manipulating object parts.


Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning

http://arxiv.org/abs/2407.13666v1

Compressor summary: The paper proposes a new data-driven approach for uncertainty quantification in regression that improves confidence intervals for both LASSO and neural network predictors by estimating bias terms from training data.


Decision Focused Causal Learning for Direct Counterfactual Marketing Optimization

http://arxiv.org/abs/2407.13664v1

Compressor summary: DFCL is a framework that integrates machine learning and operation research for optimal budget allocation in marketing, addressing technical challenges like uncertainty, counterfactual computation, and computational cost.


CogniVoice: Multimodal and Multilingual Fusion Networks for Mild Cognitive Impairment Assessment from Spontaneous Speech

http://arxiv.org/abs/2407.13660v1

Compressor summary: CogniVoice is a new framework that uses speech data and its textual transcriptions to detect mild cognitive impairment (MCI) and estimate mental state scores in multiple languages.


FuLG: 150B Romanian Corpus for Language Model Pretraining

http://arxiv.org/abs/2407.13657v1

Compressor summary: FuLG is a large Romanian corpus created from CommonCrawl, with a new methodology for filtering and comparing its quality to other Romanian corpora.


Weak-to-Strong Reasoning

http://arxiv.org/abs/2407.13647v1

Compressor summary: The paper proposes a progressive learning framework that improves the reasoning capabilities of large language models using weaker models without needing external supervision or human-annotated data.


Beyond Augmentation: Empowering Model Robustness under Extreme Capture Environments

http://arxiv.org/abs/2407.13640v1

Compressor summary: The paper proposes a multi-mode synchronization learning (MMSL) strategy to improve person re-identification accuracy under extreme capture conditions by applying diverse data augmentation techniques without altering the original image structure.


A Comparative Study on Automatic Coding of Medical Letters with Explainability

http://arxiv.org/abs/2407.13638v1

Compressor summary: The study applies NLP and ML to automate medical coding with explainability and light-weighted models using a public database and network models, achieving high accuracy in code prediction.


Data Alchemy: Mitigating Cross-Site Model Variability Through Test Time Data Calibration

http://arxiv.org/abs/2407.13632v1

Compressor summary: Data Alchemy is a method to improve cross-site analysis and tumor classification in histopathology images using stain normalization and template learning, without changing network weights or requiring site-specific fine-tuning.


Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

http://arxiv.org/abs/2407.13623v1

Compressor summary: The optimal vocabulary size depends on the compute budget and is often overlooked in language model scaling research, leading to under-fitting and suboptimal performance.


Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error

http://arxiv.org/abs/2407.13622v1

Compressor summary: The paper by Dong & Yang (2023) explores the sample complexity of obtaining an optimal policy for misspecified sparse linear bandits in reinforcement learning, showing that a novel elimination-based algorithm can achieve suboptimal guarantees with a polynomial number of samples.


Differential Privacy Mechanisms in Neural Tangent Kernel Regression

http://arxiv.org/abs/2407.13621v1

Compressor summary: The paper studies how differential privacy works in neural network learning and shows that it can protect user data while maintaining accurate predictions.


Training-free Composite Scene Generation for Layout-to-Image Synthesis

http://arxiv.org/abs/2407.13609v1

Compressor summary: This paper presents a training-free method to generate high-quality images from textual descriptions by resolving token conflicts and improving pixel relationships using attention redistribution.


dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features

http://arxiv.org/abs/2407.13608v1

Compressor summary: The paper describes the dzNLP team's approach to Multi-label Country-level Dialect Identification using various machine learning techniques and achieving high precision but low recall.
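The summary's ingredients — TF-IDF features feeding a weighted-voting classifier ensemble — map onto standard scikit-learn components. A minimal sketch with made-up toy dialect data; this is not the team's actual configuration, and the labels and weights are purely illustrative:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import VotingClassifier

# Hypothetical toy examples of dialect text with country labels
texts = ["shlonak khoya", "wesh rak", "shlonak habibi", "wesh kayn"]
labels = ["iraq", "algeria", "iraq", "algeria"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),  # character n-gram TF-IDF
    VotingClassifier(
        estimators=[("lr", LogisticRegression()), ("nb", MultinomialNB())],
        voting="soft",
        weights=[2, 1],  # weighted voting: trust LR twice as much as NB
    ),
)
clf.fit(texts, labels)
print(clf.predict(["wesh rak khoya"]))
```

Soft voting averages the per-class probabilities of the base classifiers using the given weights, which is one common way to realize "multi-classifier ensemble with weighted voting."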


Physics-guided Active Sample Reweighting for Urban Flow Prediction

http://arxiv.org/abs/2407.13605v1

Compressor summary: The paper proposes a physics-guided neural network and a data-aware framework to improve urban flow prediction by addressing the limitations of existing physics-guided machine learning methods.


dzStance at StanceEval2024: Arabic Stance Detection based on Sentence Transformers

http://arxiv.org/abs/2407.13603v1

Compressor summary: Team dzStance's results in a stance detection competition show that Sentence Transformers outperform TF-IDF features at detecting writers' stances on the COVID-19 vaccine, digital transformation, and women's empowerment.


PLANTS: A Novel Problem and Dataset for Summarization of Planning-Like (PL) Tasks

http://arxiv.org/abs/2407.13597v1

Compressor summary: The text introduces a new challenge in summarization called planning-like tasks that involve generating actions to achieve specific goals and proposes a dataset and a baseline method for this problem.


EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension

http://arxiv.org/abs/2407.13596v1

Compressor summary: EarthMarker is a novel visual prompting model that improves multi-granularity remote sensing imagery interpretation by leveraging natural and RS domain-specific knowledge, cross-domain phased learning, and a new dataset called RSVP.


Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach

http://arxiv.org/abs/2407.13594v1

Compressor summary: The paper proposes a formal framework for mechanistic interpretability of neural networks using abstract interpretation and demonstrates it on a Transformer-based model solving 2-SAT problems.


MeshFeat: Multi-Resolution Features for Neural Fields on Meshes

http://arxiv.org/abs/2407.13592v1

Compressor summary: MeshFeat is a new encoding technique for neural fields on meshes that uses multi-resolution feature grids and simplifies the mesh structure to speed up inference and maintain reconstruction quality for texture and BRDF representation.


Robust Calibration of Large Vision-Language Adapters

http://arxiv.org/abs/2407.13588v1

Compressor summary: The paper tackles miscalibration in CLIP-based model adaptation for out-of-distribution samples and proposes a simple, model-agnostic solution to scale logit ranges.
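The proposed fix operates on logit ranges. As a generic illustration only (not the paper's exact rule), rescaling each sample's logit range to a fixed target before the softmax shrinks overconfident predictions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def scale_logit_range(logits, target_range):
    """Rescale each sample's logit range to a fixed target (hypothetical scheme)."""
    rng = logits.max(axis=-1, keepdims=True) - logits.min(axis=-1, keepdims=True)
    return logits * (target_range / np.maximum(rng, 1e-8))

logits = np.array([[8.0, 1.0, 0.0]])  # overconfident raw logits
calibrated = softmax(scale_logit_range(logits, target_range=4.0))
print(calibrated.round(3))
```

Because scaling all logits by a constant is equivalent to temperature scaling per sample, the prediction (argmax) is unchanged while the confidence is tempered.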


Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation

http://arxiv.org/abs/2407.13584v1

Compressor summary: The paper proposes a framework to improve text-to-3D generation quality by analyzing and addressing issues like limited detail, low fidelity, and oversaturation in the generated 3D assets.


Towards Zero-Shot Multimodal Machine Translation

http://arxiv.org/abs/2407.13579v1

Compressor summary: Key points:
- The paper proposes a method (ZeroMMT) to train MMT systems with multimodal English data only.
- ZeroMMT adapts a text-only MT model using visually conditioned masked language modelling and KL divergence between MMT outputs.
- ZeroMMT achieves disambiguation performance comparable to state-of-the-art MMT models on the CoMMuTE benchmark and can be extended to other languages.
- ZeroMMT allows controlling the trade-off between disambiguation and translation fidelity without extra data.

Summary: ZeroMMT is a method to train multimodal machine translation systems using only English data, by adapting a text-only model with visual and diversity objectives. It performs well on disambiguation tasks across languages and can balance disambiguation and translation quality.


Large Language Models as Reliable Knowledge Bases?

http://arxiv.org/abs/2407.13578v1

Compressor summary: This study evaluates the reliability and effectiveness of large language models as knowledge bases using new metrics and finds that current models have significant limitations in factuality and consistency, regardless of model size or fine-tuning methods.


New Capability to Look Up an ASL Sign from a Video Example

http://arxiv.org/abs/2407.13571v1

Compressor summary: The system allows users to search for unknown ASL signs by submitting a video and receiving the most likely matches to improve ASL dictionary lookup and annotation efficiency.


dzFinNlp at AraFinNLP: Improving Intent Detection in Financial Conversational Agents

http://arxiv.org/abs/2407.13565v1

Compressor summary: The paper introduces dzFinNlp's intent detection system for financial chatbots, using various machine learning and deep learning models, achieving high scores on a benchmark dataset.


Research on Tibetan Tourism Viewpoints information generation system based on LLM

http://arxiv.org/abs/2407.13561v1

Compressor summary: The text discusses a study on using an AI system to improve tourism services in Tibet by generating more accurate information about the region's complex topography and historical sites.


Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

http://arxiv.org/abs/2407.13559v1

Compressor summary: Qalam is a novel foundation model for Arabic OCR and HWR that achieves high accuracy and handles diacritics and high-resolution inputs well.


PetFace: A Large-Scale Dataset and Benchmark for Animal Identification

http://arxiv.org/abs/2407.13555v1

Compressor summary: PetFace is a large dataset for animal face identification with detailed annotations and benchmarks, helping to advance automated animal recognition methods.


On the Discriminability of Self-Supervised Representation Learning

http://arxiv.org/abs/2407.13541v1

Compressor summary: The paper analyzes the crowding problem in self-supervised learning (SSL) features and proposes a learnable regulator called Dynamic Semantic Adjuster (DSA) to improve feature separation and aggregation for complex downstream tasks.


EnergyDiff: Universal Time-Series Energy Data Generation using Diffusion Models

http://arxiv.org/abs/2407.13538v1

Compressor summary: EnergyDiff is a generative AI framework for creating realistic time series data for energy systems, improving on temporal dependencies and marginal distributions.


Evaluating the performance-deviation of itemKNN in RecBole and LensKit

http://arxiv.org/abs/2407.13531v1

Compressor summary: A comparison of the itemKNN implementations in the RecBole and LensKit libraries on four data sets shows that RecBole performed better on most metrics until LensKit's similarity-matrix calculation was modified, after which performance was near-identical.
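For context, itemKNN's core step — the item-item similarity matrix whose computation the comparison hinged on — is typically column-wise cosine similarity over the user-item interaction matrix. A minimal numpy sketch, not either library's implementation:

```python
import numpy as np

def item_cosine_similarity(R):
    """Item-item cosine similarity from a user-item interaction matrix R (users x items)."""
    norms = np.linalg.norm(R, axis=0, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero for unrated items
    Rn = R / norms
    return Rn.T @ Rn

# 3 users x 3 items, binary interactions
R = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
S = item_cosine_similarity(R)
print(S.round(2))
```

Seemingly minor choices in this step (normalization, zero handling, neighborhood truncation) can produce exactly the kind of cross-library metric differences the paper reports.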


Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts

http://arxiv.org/abs/2407.13526v1

Compressor summary: The authors propose a sparse Mixture-of-Experts model with Logistic Regressors for interpretable outcome prediction from partial process traces, selecting input features automatically during training.


Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation

http://arxiv.org/abs/2407.13524v1

Compressor summary: The text introduces a new approach called Low-confidence Pseudo Label Distillation (LPLD) to improve source-free domain adaptive object detection by better utilizing low-confidence pseudo labels from Region Proposal Network (RPN).


INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages

http://arxiv.org/abs/2407.13522v1

Compressor summary: The paper introduces Indic-QA, a large context-grounded question-answering dataset for 11 Indian languages, to evaluate multilingual LLMs' performance in non-English QA tasks.


EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting

http://arxiv.org/abs/2407.13520v1

Compressor summary: EaDeblur-GS is a method that uses event cameras and Gaussian splatting to improve 3D reconstruction from blurry images with complex motion.


Model-based Policy Optimization using Symbolic World Model

http://arxiv.org/abs/2407.13518v1

Compressor summary: The authors propose using symbolic regression to generate transition dynamics models for robotics, which improves sample efficiency and extrapolation quality in model-based reinforcement learning.


Instance Selection for Dynamic Algorithm Configuration with Reinforcement Learning: Improving Generalization

http://arxiv.org/abs/2407.13513v1

Compressor summary: Key points:
- DAC poses the challenge of setting hyperparameters for different instances.
- Deep RL agents show limited generalization in DAC due to bias in the training instances.
- The paper proposes instance selection based on time series features to improve generalization.
- Empirical evaluations show the benefits of instance selection on DAC benchmarks.

Summary: The paper introduces a method for improving generalization of deep RL agents in dynamic algorithm configuration by selecting representative training instances using time series features.


Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks

http://arxiv.org/abs/2407.13511v1

Compressor summary: The paper compares the performance of commercial and open-source large language models in a natural language processing challenge, finding that open-source models are competitive in some settings but need more data for better results.


Enhancing Biomedical Knowledge Discovery for Diseases: An End-To-End Open-Source Framework

http://arxiv.org/abs/2407.13492v1

Compressor summary: The framework extracts disease-related knowledge from text, creating annotated datasets for Rett syndrome and Alzheimer's disease, while benchmarking and probing language models' semantic relation detection capabilities.


Combining Constraint Programming Reasoning with Large Language Model Predictions

http://arxiv.org/abs/2407.13490v1

Compressor summary: The paper combines Constraint Programming (CP) and Machine Learning for text generation: a Large Language Model (LLM) proposes meaningful words while CP enforces structural constraints, yielding faster and better results than standard NLP methods.


Similarity over Factuality: Are we making progress on multimodal out-of-context misinformation detection?

http://arxiv.org/abs/2407.13488v1

Compressor summary: The study introduces MUSE, a simple but robust baseline for detecting out-of-context misinformation by comparing image-text pairs with external evidence, and shows its effectiveness on two datasets.


Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation

http://arxiv.org/abs/2407.13481v1

Compressor summary: Large language models struggle to suggest missing elements in long lists due to attention overflow, which can be mitigated by iterative loops but at a cost of novelty loss.


Fixed and Adaptive Simultaneous Machine Translation Strategies Using Adapters

http://arxiv.org/abs/2407.13469v1

Compressor summary: The paper proposes using lightweight adapter modules in machine translation models to achieve multiple latency levels without training separate models, and demonstrates improved performance over existing methods.


End-To-End Clinical Trial Matching with Large Language Models

http://arxiv.org/abs/2407.13463v1

Compressor summary: The text describes an automated system using large language models that can match cancer patients to clinical trials more accurately and efficiently than human experts.


All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models

http://arxiv.org/abs/2407.13449v1

Compressor summary: The study measures how similar different image generation models are by creating linear maps between their latent spaces and finds that they learn similar representations, especially for gender in CelebA models.


BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models

http://arxiv.org/abs/2407.13442v1

Compressor summary: Key points:
- VLMs use a visual encoder and an LLM to perceive the world.
- VLMs are prone to hallucination, which reduces trustworthiness.
- The authors introduce the BEAF dataset and new metrics (TU, IG, SB, ID) to measure hallucination based on scene changes.
- The new metrics reveal aspects of VLM hallucination that have not been reported before.

Summary: The paper proposes a new benchmark and metrics for measuring hallucination in vision-language models (VLMs), which use a visual encoder and a large language model to perceive the world. The benchmark manipulates scene information by image editing and evaluates VLMs on their ability to detect changes.


Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies

http://arxiv.org/abs/2407.13435v1

Compressor summary: The paper proposes a cheap way to collect more TTS training data for low-resource languages like Hindi and Tamil by using volunteers instead of professional voice artists, which improves out-of-vocabulary word pronunciation without affecting voice quality or in-domain performance.


Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations

http://arxiv.org/abs/2407.13431v1

Compressor summary: The paper proposes a new OoD testing protocol for trajectory prediction that homogenizes datasets and tasks, introduces a polynomial-based algorithm yielding smaller, faster models with near state-of-the-art in-distribution performance and improved OoD robustness, and studies how two augmentation strategies affect model generalization.


Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information

http://arxiv.org/abs/2407.13429v1

Compressor summary: The paper proposes a method to train acquirers for multivariate time series using conditional mutual information to improve performance and reduce costs.
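For reference, the conditional mutual information the acquirer maximizes — here between a candidate feature $X$ and the label $Y$ given the already-acquired observations $Z$ — has the standard definition:

$$I(X;Y \mid Z) = \mathbb{E}_{p(x,y,z)}\left[\log \frac{p(x,y \mid z)}{p(x \mid z)\,p(y \mid z)}\right]$$

Intuitively, a feature with high conditional mutual information still tells us something new about the label after accounting for what has already been measured, which is why it is a natural acquisition criterion under a budget.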


WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration

http://arxiv.org/abs/2407.13426v1

Compressor summary: WiNet estimates scale-wise wavelet coefficients for displacement/velocity fields using the wavelet transform, enabling fast and explainable image registration with low memory usage.


CycleMix: Mixing Source Domains for Domain Generalization in Style-Dependent Data

http://arxiv.org/abs/2407.13421v1

Compressor summary: The paper proposes a method to improve deep learning-based image classification by using CycleGAN to learn and disregard image styles, enhancing generalization ability.


From Words to Worlds: Compositionality for Cognitive Architectures

http://arxiv.org/abs/2407.13419v1

Compressor summary: The study examines how different factors affect the compositionality of large language models and identifies challenges for improving their abilities to learn compositional strategies.


GDDS: A Single Domain Generalized Defect Detection Frame of Open World Scenario using Gather and Distribute Domain-shift Suppression Network

http://arxiv.org/abs/2407.13417v1

Compressor summary: The GDDS method detects surface defects on photovoltaic modules using a single domain generalized approach, improving accuracy and speed while addressing distribution shift and normalized Wasserstein distance for similarity measurement.


Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization

http://arxiv.org/abs/2407.13399v1

Compressor summary: The text introduces a new algorithm, $\chi^2$-Preference Optimization ($\chi$PO), which improves sample-efficiency in offline language model alignment by mitigating overoptimization using the $\chi^2$-divergence.
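For reference, the $\chi^2$-divergence used in place of the usual KL term has the standard form (this is the textbook definition, not the paper's full objective):

$$\chi^2(p \,\|\, q) = \mathbb{E}_{q}\!\left[\left(\frac{p(x)}{q(x)} - 1\right)^{\!2}\right] = \sum_x \frac{\big(p(x) - q(x)\big)^2}{q(x)}$$

It penalizes large density ratios quadratically rather than logarithmically as KL does, which makes the regularizer far more pessimistic about policies that drift from the reference distribution — consistent with the summary's claim about mitigating overoptimization.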


PICASSO: A Feed-Forward Framework for Parametric Inference of CAD Sketches via Rendering Self-Supervision

http://arxiv.org/abs/2407.13394v1

Compressor summary: PICASSO is a novel framework that can learn parametric CAD sketches from precise or hand-drawn images by using self-supervised rendering techniques and geometric cues.


GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields

http://arxiv.org/abs/2407.13390v1

Compressor summary: GeometrySticker is a method for embedding binary messages into the geometry components of NeRF models to protect copyright, with the watermark remaining robust to recolorization.


Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols

http://arxiv.org/abs/2407.13382v1

Compressor summary: The paper proposes a neuro-symbolic approach to find object configurations in images, using first-order logic and language-vision models, and shows its applicability to real-world scenarios.


Removing cloud shadows from ground-based solar imagery

http://arxiv.org/abs/2407.13379v1

Compressor summary: The study introduces a new method using U-Net architecture and conditional GAN to remove cloud shadows from solar images for better space weather prediction.


Linear-Complexity Self-Supervised Learning for Speech Processing

http://arxiv.org/abs/2407.13377v1

Compressor summary: This paper introduces SummaryMixing, a linear-complexity context encoder for self-supervised learning that reduces pre-training time and resources while maintaining or improving performance on downstream tasks.
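A simplified view of the linear-complexity idea (a hedged sketch, not the official SummaryMixing code): instead of pairwise self-attention over T frames (O(T^2)), each frame is mixed with a single summary vector, the mean of a learned projection over time, giving O(T) cost.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 50, 8                      # sequence length, feature dim
X = rng.normal(size=(T, d))       # hypothetical acoustic features
Wf = rng.normal(size=(d, d))      # "summary" branch projection
Wl = rng.normal(size=(d, d))      # "local" branch projection

# One global summary vector replaces all pairwise interactions.
summary = np.tanh(X @ Wf).mean(axis=0)   # shape (d,)
# Each frame combines its local transform with the broadcast summary.
mixed = np.tanh(X @ Wl) + summary        # shape (T, d)
assert mixed.shape == (T, d)
```

The per-frame cost no longer depends on T, which is where the pre-training speedup over quadratic self-attention comes from.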


Any Image Restoration with Efficient Automatic Degradation Adaptation

http://arxiv.org/abs/2407.13372v1

Compressor summary: The paper proposes a unified model that efficiently restores degraded images using joint embedding, gated reweighting, and contextualized attention.


Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction

http://arxiv.org/abs/2407.13368v1

Compressor summary: The paper proposes an improved method for robots to understand and interact with objects in open world settings by combining affordance representation, vision-language models, and human feedback.


Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction

http://arxiv.org/abs/2407.13364v1

Compressor summary: The Geometric Active Exploration (GAE) algorithm combines active exploration with MDP homomorphism abstractions, using geometric structure to explore dynamical systems' state spaces more efficiently for scientific discovery.


Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

http://arxiv.org/abs/2407.13362v1

Compressor summary: The paper introduces Geometry Guided Self-Distillation (GGSD), which improves open vocabulary 3D scene understanding by leveraging geometric priors from 2D data and enhancing representation learning with self-distillation.


Capturing Style in Author and Document Representation

http://arxiv.org/abs/2407.13358v1

Compressor summary: The paper introduces a new NLP model that learns embeddings for authors and documents with a focus on capturing their writing style, and shows its effectiveness on three datasets.


Learning-From-Mistakes Prompting for Indigenous Language Translation

http://arxiv.org/abs/2407.13343v1

Compressor summary: The paper proposes techniques to improve translation for indigenous languages with little data using large language models and specific prompting methods.


Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds

http://arxiv.org/abs/2407.13342v1

Compressor summary: The paper proposes a novel method to reconstruct surfaces with fine-grained details using neural signed distance functions and a non-linear implicit filter.


Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM

http://arxiv.org/abs/2407.13338v1

Compressor summary: The paper proposes a novel SLAM framework for dynamic environments that leverages continual learning, forgetting, and object identification to improve robustness.


Long-Term 3D Point Tracking By Cost Volume Fusion

http://arxiv.org/abs/2407.13337v1

Compressor summary: The paper presents a new deep learning framework for long-term 3D point tracking that generalizes well and outperforms prior methods without test-time fine-tuning.


OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction

http://arxiv.org/abs/2407.13335v1

Compressor summary: The Object-level Attention Transformer (OAT) is a model that predicts human visual search behavior by using object features and a new positional encoding to generate gaze scanpaths in cluttered scenes.


Reconstruct the Pruned Model without Any Retraining

http://arxiv.org/abs/2407.13331v1

Compressor summary: LIAR is an efficient and effective compression technique for large language models that preserves high accuracy by using linear interpolation to reconstruct pruned weights.


Why do you cite? An investigation on citation intents and decision-making classification processes

http://arxiv.org/abs/2407.13329v1

Compressor summary: The authors propose an approach to classify citation intents using advanced Ensemble Strategies with Language Models and Explainable AI techniques, showing that section titles improve performance, and provide a web application for this purpose.


Unsupervised Domain Adaptive Lane Detection via Contextual Contrast and Aggregation

http://arxiv.org/abs/2407.13328v1

Compressor summary: DACCA is a novel method for domain-adaptive lane detection that uses cross-domain contrastive loss and feature aggregation to improve feature learning and knowledge transfer across domains, achieving superior performance on six datasets.


RISC-V RVV efficiency for ANN algorithms

http://arxiv.org/abs/2407.13326v1

Compressor summary: This study applies RISC-V's vector extension RVV to optimize common ANN algorithms for high-performance computing using a parameterized vector block model.


Fully Test-Time rPPG Estimation via Synthetic Signal-Guided Feature Learning

http://arxiv.org/abs/2407.13322v1

Compressor summary: The paper proposes a novel Test-Time Adaptation framework for remote photoplethysmography estimation, which adapts to various domain information and heart rate distributions using synthetic signals and spectral-based entropy minimization.


Sortability of Time Series Data

http://arxiv.org/abs/2407.13313v1

Compressor summary: The paper studies how dataset characteristics, such as varsortability and R2-sortability, affect the performance of causal discovery algorithms for time-dependent processes using various types of real and simulated data.


Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering

http://arxiv.org/abs/2407.13309v1

Compressor summary: The paper proposes a new method to render high dynamic range videos from low dynamic range videos by completing missing exposure information and improving temporal consistency, resulting in state-of-the-art performance.


A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics

http://arxiv.org/abs/2407.13304v1

Compressor summary: The paper introduces a 3D shape completion dataset of sweet peppers in lab and greenhouse conditions, built from RGB-D frames and high-precision point clouds, to help robots estimate complete fruit shapes despite occlusions in cluttered agricultural environments; it also provides segmented RGB-D frames with camera intrinsics and a public challenge on a benchmark server for evaluating shape completion approaches.


Mean Teacher based SSL Framework for Indoor Localization Using Wi-Fi RSSI Fingerprinting

http://arxiv.org/abs/2407.13303v1

Compressor summary: The paper proposes a semi-supervised learning framework for neural networks that uses unlabeled Wi-Fi fingerprints to improve indoor localization performance and can handle hybrid databases and continual expansion.


CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

http://arxiv.org/abs/2407.13301v1

Compressor summary: The study proposes Chain-of-Diagnosis (CoD), a method to improve the interpretability and controllability of large language models for medical diagnosis by creating a diagnostic chain that resembles a physician's thought process.


Robust ASR Error Correction with Conservative Data Filtering

http://arxiv.org/abs/2407.13300v1

Compressor summary: Our method filters low-quality error correction data to prevent overcorrection and improve automatic speech recognition performance in out-of-domain settings using Japanese language models.


SpeciaLex: A Benchmark for In-Context Specialized Lexicon Learning

http://arxiv.org/abs/2407.13297v1

Compressor summary: SpeciaLex is a benchmark to assess language models' ability to follow specialized lexicon constraints for various tasks and audiences.


Hierarchical Stage-Wise Training of Linked Deep Neural Networks for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi RSSI Fingerprinting

http://arxiv.org/abs/2407.13288v1

Compressor summary: The paper proposes a new indoor localization method using linked neural networks trained in a hierarchical stage-wise way, achieving state-of-the-art accuracy on the UJIIndoorLoc database.


Collaborative real-time vision-based device for olive oil production monitoring

http://arxiv.org/abs/2407.13285v1

Compressor summary: The paper presents a computer-vision system that detects and warns about foreign objects in an olive grinder to prevent quality issues and machinery damage.


Auditing Local Explanations is Hard

http://arxiv.org/abs/2407.13281v1

Compressor summary: The paper proposes an auditing framework to check the consistency of machine learning algorithms' explanations, but shows that it requires a large number of queries and highlights the importance of locality in explainability.


Analyzing and Bridging the Gap between Maximizing Total Reward and Discounted Reward in Deep Reinforcement Learning

http://arxiv.org/abs/2407.13279v1

Compressor summary: The paper analyzes how using discounted reward as a proxy for total reward affects deep reinforcement learning and proposes conditions to align optimal policies of these two objectives.
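The gap being analyzed comes from two standard objectives (textbook RL definitions, not the paper's specific analysis): the total reward sum_t r_t versus the discounted return sum_t gamma^t r_t. On finite horizons the two coincide as gamma approaches 1, and any gamma < 1 strictly undervalues later rewards:

```python
# Discounted return: sum over t of gamma^t * r_t.
def discounted_return(rewards, gamma):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 2.0, 1.0]
total = sum(rewards)                                        # 4.0
# gamma = 1 recovers the total-reward objective exactly.
assert abs(discounted_return(rewards, 1.0) - total) < 1e-12
# Any gamma < 1 down-weights late rewards, so the proxy undershoots.
assert discounted_return(rewards, 0.9) < total
```

The interesting question the paper addresses is when the optimal policies of the two objectives nevertheless coincide.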


Deep Time Series Models: A Comprehensive Survey and Benchmark

http://arxiv.org/abs/2407.13278v1

Compressor summary: The paper surveys deep learning models for time series analysis across various tasks, introduces the Time Series Library (TSLib) as a benchmark covering 24 models, 30 datasets, and five tasks, and evaluates 12 representative deep time series models on those tasks.


Mixture of Experts based Multi-task Supervise Learning from Crowds

http://arxiv.org/abs/2407.13268v1

Compressor summary: This paper introduces a new multi-task learning approach for truth inference in crowdsourcing, which improves the accuracy and effectiveness of worker behavior models by focusing on item features rather than hidden ground truth variables.


Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models

http://arxiv.org/abs/2407.13252v1

Compressor summary: The text proposes a new method for protecting privacy in text-to-image models by detecting if an image was used to train them based on the preservation of image structures during the diffusion process.


Motif-Consistent Counterfactuals with Adversarial Refinement for Graph-Level Anomaly Detection

http://arxiv.org/abs/2407.13251v1

Compressor summary: MotifCAR is a novel graph anomaly detection method that uses motifs and GANs to create realistic, valid, proximal, and sparse counterfactual graphs for improved performance.


Are Large Language Models Capable of Generating Human-Level Narratives?

http://arxiv.org/abs/2407.13248v1

Compressor summary: The paper explores how LLMs struggle with storytelling, especially creating suspense and diversity, and proposes a framework to improve their narrative skills.


PM-LLM-Benchmark: Evaluating Large Language Models on Process Mining Tasks

http://arxiv.org/abs/2407.13244v1

Compressor summary: The paper introduces PM-LLM-Benchmark, a comprehensive benchmark for evaluating open-source large language models in process mining tasks, and discusses the challenges and limitations of such a benchmark.


NODER: Image Sequence Regression Based on Neural Ordinary Differential Equations

http://arxiv.org/abs/2407.13241v1

Compressor summary: The paper introduces NODER, a novel framework that uses neural ODEs to model complex dynamics in medical image sequences and achieves state-of-the-art 3D image regression performance with reduced computational cost and practical applicability.


Transformers with Stochastic Competition for Tabular Data Modelling

http://arxiv.org/abs/2407.13238v1

Compressor summary: A new Transformer-based deep learning model for tabular data uses stochastic competition to promote generalization capacity and outperforms gradient boosted decision trees on various datasets.


LLM-Empowered State Representation for Reinforcement Learning

http://arxiv.org/abs/2407.13237v1

Compressor summary: The paper proposes LESR, a method that uses large language models to generate task-related state representation codes for better reinforcement learning performance.


Evaluating Large Language Models for Anxiety and Depression Classification using Counseling and Psychotherapy Transcripts

http://arxiv.org/abs/2407.13228v1

Compressor summary:


Non-Contact Breath Rate Classification Using SVM Model and mmWave Radar Sensor Data

http://arxiv.org/abs/2407.13222v1

Compressor summary:


Multimodal Label Relevance Ranking via Reinforcement Learning

http://arxiv.org/abs/2407.13221v1

Compressor summary:


Multi-sentence Video Grounding for Long Video Generation

http://arxiv.org/abs/2407.13219v1

Compressor summary:


LiNR: Model Based Neural Retrieval on GPUs at LinkedIn

http://arxiv.org/abs/2407.13218v1

Compressor summary:


LIDIA: Precise Liver Tumor Diagnosis on Multi-Phase Contrast-Enhanced CT via Iterative Fusion and Asymmetric Contrastive Learning

http://arxiv.org/abs/2407.13217v1

Compressor summary:


TXL-PBC: a freely accessible labeled peripheral blood cell dataset

http://arxiv.org/abs/2407.13214v1

Compressor summary:


Research on Image Super-Resolution Reconstruction Mechanism based on Convolutional Neural Network

http://arxiv.org/abs/2407.13211v1

Compressor summary:


Transformer-based Single-Cell Language Model: A Survey

http://arxiv.org/abs/2407.13205v1

Compressor summary:


Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers

http://arxiv.org/abs/2407.13200v1

Compressor summary:


Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation

http://arxiv.org/abs/2407.13195v1

Compressor summary:


Robust Multivariate Time Series Forecasting against Intra- and Inter-Series Transitional Shift

http://arxiv.org/abs/2407.13194v1

Compressor summary:


Retrieval-Augmented Generation for Natural Language Processing: A Survey

http://arxiv.org/abs/2407.13193v1

Compressor summary:


Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking

http://arxiv.org/abs/2407.13188v1

Compressor summary:


KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter

http://arxiv.org/abs/2407.13185v1

Compressor summary:


HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition

http://arxiv.org/abs/2407.13184v1

Compressor summary:


SpaDiT: Diffusion Transformer for Spatial Gene Expression Prediction using scRNA-seq

http://arxiv.org/abs/2407.13182v1

Compressor summary:


Training-Free Large Model Priors for Multiple-in-One Image Restoration

http://arxiv.org/abs/2407.13181v1

Compressor summary:


The use of the symmetric finite difference in the local binary pattern (symmetric LBP)

http://arxiv.org/abs/2407.13178v1

Compressor summary:


Compressed models are NOT miniature versions of large models

http://arxiv.org/abs/2407.13174v1

Compressor summary:


SciCode: A Research Coding Benchmark Curated by Scientists

http://arxiv.org/abs/2407.13168v1

Compressor summary:


Translate-and-Revise: Boosting Large Language Models for Constrained Translation

http://arxiv.org/abs/2407.13164v1

Compressor summary:


Attenuation-Aware Weighted Optical Flow with Medium Transmission Map for Learning-based Visual Odometry in Underwater terrain

http://arxiv.org/abs/2407.13159v1

Compressor summary:


HHGT: Hierarchical Heterogeneous Graph Transformer for Heterogeneous Graph Representation Learning

http://arxiv.org/abs/2407.13158v1

Compressor summary:


Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems

http://arxiv.org/abs/2407.13153v1

Compressor summary:


DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection

http://arxiv.org/abs/2407.13147v1

Compressor summary:


PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods

http://arxiv.org/abs/2407.13146v1

Compressor summary:


Integrated Hardware Architecture and Device Placement Search

http://arxiv.org/abs/2407.13143v1

Compressor summary:


A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR

http://arxiv.org/abs/2407.13142v1

Compressor summary:


Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression

http://arxiv.org/abs/2407.13141v1

Compressor summary:


Image Inpainting Models are Effective Tools for Instruction-guided Image Editing

http://arxiv.org/abs/2407.13139v1

Compressor summary:


FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection

http://arxiv.org/abs/2407.13133v1

Compressor summary:


Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation

http://arxiv.org/abs/2407.13123v1

Compressor summary:


MO-EMT-NAS: Multi-Objective Continuous Transfer of Architectural Knowledge Between Tasks from Different Datasets

http://arxiv.org/abs/2407.13122v1

Compressor summary:


HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration

http://arxiv.org/abs/2407.13120v1

Compressor summary:


TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models

http://arxiv.org/abs/2407.13115v1

Compressor summary:


Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II

http://arxiv.org/abs/2407.13113v1

Compressor summary:


UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt

http://arxiv.org/abs/2407.13108v1

Compressor summary:


Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach

http://arxiv.org/abs/2407.13101v1

Compressor summary:


AlcLaM: Arabic Dialectal Language Model

http://arxiv.org/abs/2407.13097v1

Compressor summary:


Audio-visual Generalized Zero-shot Learning the Easy Way

http://arxiv.org/abs/2407.13095v1

Compressor summary:


Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data

http://arxiv.org/abs/2407.13094v1

Compressor summary:


On Causally Disentangled State Representation Learning for Reinforcement Learning based Recommender Systems

http://arxiv.org/abs/2407.13091v1

Compressor summary:


MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking

http://arxiv.org/abs/2407.13089v1

Compressor summary:


Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism

http://arxiv.org/abs/2407.13078v1

Compressor summary:


Dynamic Sentiment Analysis with Local Large Language Models using Majority Voting: A Study on Factors Affecting Restaurant Evaluation

http://arxiv.org/abs/2407.13069v1

Compressor summary:


Krait: A Backdoor Attack Against Graph Prompt Tuning

http://arxiv.org/abs/2407.13068v1

Compressor summary: