arxiv compressed, 2024-07-25

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-25, generated by the compressor, my personal LLM-based project.


SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

http://arxiv.org/abs/2407.17470v1

Compressor summary: SV4D is a latent video diffusion model that generates consistent novel views of dynamic 3D objects from a reference video and uses them to optimize an implicit 4D representation without score distillation sampling (SDS).


I Could've Asked That: Reformulating Unanswerable Questions

http://arxiv.org/abs/2407.17469v1

Compressor summary: The paper introduces CouldAsk, a benchmark for evaluating question reformulation, and shows that current language models struggle to improve unanswerable questions.


WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

http://arxiv.org/abs/2407.17468v1

Compressor summary: WildHallucinations is a benchmark that tests the factuality of LLMs by generating and fact-checking information about real-world entities from user-chatbot conversations, revealing hallucination patterns across domains.


CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models

http://arxiv.org/abs/2407.17467v1

Compressor summary: The paper proposes a critical mixture ratio (CMR) to balance the general and domain-specific knowledge of large language models during continual pre-training, improving their efficiency and effectiveness in specialized domains.


Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning

http://arxiv.org/abs/2407.17466v1

Compressor summary: The paper studies multi-objective reinforcement learning (MORL) with multiple reward functions, analyzes different optimization targets for finding Pareto optimal policies, and proposes efficient algorithms using Tchebycheff scalarization.


u-$\mu$P: The Unit-Scaled Maximal Update Parametrization

http://arxiv.org/abs/2407.17465v1

Compressor summary: u-$\mu$P combines $\mu$P and Unit Scaling to simplify and improve hyperparameter optimization for efficient training and low-precision computing.


Hidden or Inferred: Fair Learning-To-Rank with Unknown Demographics

http://arxiv.org/abs/2407.17459v1

Compressor summary: The paper investigates how demographic inference errors affect the fairness performance of various learning-to-rank models and suggests that fair re-ranking strategies are more robust to these errors than LTR-based methods.


EuroCropsML: A Time Series Benchmark Dataset For Few-Shot Crop Type Classification

http://arxiv.org/abs/2407.17458v1

Compressor summary: EuroCropsML is a new dataset for crop type classification in Europe using Sentinel-2 satellite images.


CSCPR: Cross-Source-Context Indoor RGB-D Place Recognition

http://arxiv.org/abs/2407.17457v1

Compressor summary: CSCPR is a new algorithm for RGB-D indoor place recognition that uses global retrieval, reranking, and context clusters to handle noisy data and achieve better performance than existing models.


Automated Explanation Selection for Scientific Discovery

http://arxiv.org/abs/2407.17454v1

Compressor summary: The paper proposes a machine learning-based cycle for generating and choosing explanations in Explainable AI, using a taxonomy of explanation selection criteria derived from sociology and cognitive science.


$VILA^2$: VILA Augmented VILA

http://arxiv.org/abs/2407.17453v1

Compressor summary: The authors propose VILA^2, a visual language model family that uses self-augmentation and specialist-augmentation to improve data quality and performance on various tasks.


Looking at Model Debiasing through the Lens of Anomaly Detection

http://arxiv.org/abs/2407.17449v1

Compressor summary: The paper proposes a new method to identify and reduce unwanted correlations in deep neural networks using anomaly detection and data augmentation.


Fluent Student-Teacher Redteaming

http://arxiv.org/abs/2407.17447v1

Compressor summary: The authors improve existing algorithms for jailbreaking safety-tuned language models by using a new distillation-based approach that makes the attacks more powerful and fluent, achieving high success rates on various models.


AHMF: Adaptive Hybrid-Memory-Fusion Model for Driver Attention Prediction

http://arxiv.org/abs/2407.17442v1

Compressor summary: The paper proposes an Adaptive Hybrid-Memory-Fusion (AHMF) model that incorporates working memory and long-term memory to predict driver attention in a more human-like way, using domain adaptation techniques for better performance.


HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

http://arxiv.org/abs/2407.17438v1

Compressor summary: Key points:
- Human image animation generates videos from a photo of a character
- Existing approaches face challenges with training data, 2D motion, and camera motions
- HumanVid is a new dataset that combines real-world and synthetic data with rich annotations
- CamAnimate is a baseline model that shows state-of-the-art performance on HumanVid
Summary: HumanVid introduces a large-scale dataset for human image animation, addressing the limitations of existing approaches by providing diverse and precise real-world and synthetic data with human and camera motion annotations.


Nerva: a Truly Sparse Implementation of Neural Networks

http://arxiv.org/abs/2407.17437v1

Compressor summary: Nerva is a fast C++ neural network library that uses sparse matrix operations to reduce training time and memory usage while maintaining accuracy.


3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities

http://arxiv.org/abs/2407.17418v1

Compressor summary: 3D Gaussian Splatting is a technique that converts multi-view images into 3D representations for real-time rendering, and this survey reviews existing works, challenges, and opportunities in the field.


Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

http://arxiv.org/abs/2407.17417v1

Compressor summary: This paper investigates how watermarking large language models can deter copyright violations, while also considering the impact on membership inference attacks and proposing an adaptive technique to improve detection.


(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork

http://arxiv.org/abs/2407.17412v1

Compressor summary: The paper proposes a novel algorithm, PASS, that leverages visual prompts to determine channel importance and achieve structural sparsity in neural networks, resulting in improved efficiency and performance.


Generation of Training Data from HD Maps in the Lanelet2 Framework

http://arxiv.org/abs/2407.17409v1

Compressor summary: The paper introduces lanelet2_ml_converter, an extension to the HD map framework Lanelet2 that supports machine learning tasks and training using map data.


Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models

http://arxiv.org/abs/2407.17406v1

Compressor summary: The paper introduces Dependency Transformer Grammars (DTGs), a new type of Transformer model that uses dependency structures to improve generalization, and shows that it outperforms previous models based on constituency trees.


Grammar-based Game Description Generation using Large Language Models

http://arxiv.org/abs/2407.17404v1

Compressor summary: The paper explores using large language models with game description grammar to improve automated game design with limited data.


Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising

http://arxiv.org/abs/2407.17399v1

Compressor summary: The paper proposes a method called Noise2VST that uses an off-the-shelf Gaussian denoiser and a variance-stabilizing transform to efficiently remove noise from real-world images without additional training data.


Systematic Reasoning About Relational Domains With Graph Neural Networks

http://arxiv.org/abs/2407.17396v1

Compressor summary: The paper proposes a GNN architecture that treats node embeddings as epistemic states and shows it can achieve state-of-the-art results in reasoning with relational domains.


Five reasons against assuming a data-generating distribution in Machine Learning

http://arxiv.org/abs/2407.17395v1

Compressor summary: The authors challenge the common assumption of data-generating probability distributions in machine learning and propose an alternative framework that focuses on finite populations for better modeling and theory.


CovScore: Evaluation of Multi-Document Abstractive Title Set Generation

http://arxiv.org/abs/2407.17390v1

Compressor summary: CovScore is a new method for evaluating extracted thematic title sets from documents using five metrics, which simplifies and speeds up the evaluation process and can be applied to relevant datasets like Holocaust survivor testimonies.


PERSONA: A Reproducible Testbed for Pluralistic Alignment

http://arxiv.org/abs/2407.17387v1

Compressor summary: PERSONA is a test bed that evaluates and improves language models' ability to align with diverse user values by generating synthetic personas and feedback.


A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance

http://arxiv.org/abs/2407.17383v1

Compressor summary: The research uses neural networks and BERT models to correct spelling errors in written text, especially in the Persian language.


MMRA: A Benchmark for Multi-granularity Multi-image Relational Association

http://arxiv.org/abs/2407.17379v1

Compressor summary: The paper introduces MMRA, a multi-granularity multi-image relational association benchmark that evaluates large visual language models' ability to perceive associative relations between multiple images and their details, finding that current models struggle with fine-grained tasks and spatial perception.


PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction

http://arxiv.org/abs/2407.17378v1

Compressor summary: The paper introduces PrevPredMap, a framework that uses previous predictions to create online vectorized HD maps, with two essential modules and a dual-mode strategy for robust performance.


Entropy Reweighted Conformal Classification

http://arxiv.org/abs/2407.17377v1

Compressor summary:


ViPer: Visual Personalization of Generative Models via Individual Preference Learning

http://arxiv.org/abs/2407.17365v1

Compressor summary:


MuST: Multi-Scale Transformers for Surgical Phase Recognition

http://arxiv.org/abs/2407.17361v1

Compressor summary:


Gradient-based inference of abstract task representations for generalization in neural networks

http://arxiv.org/abs/2407.17356v1

Compressor summary:


Scalify: scale propagation for efficient low-precision LLM training

http://arxiv.org/abs/2407.17353v1

Compressor summary:


Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching

http://arxiv.org/abs/2407.17349v1

Compressor summary:


Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition

http://arxiv.org/abs/2407.17344v1

Compressor summary:


Preliminary study on artificial intelligence methods for cybersecurity threat detection in computer networks based on raw data packets

http://arxiv.org/abs/2407.17339v1

Compressor summary:


Cascaded Light Propagation Volumes using Spherical Radial Basis Functions

http://arxiv.org/abs/2407.17336v1

Compressor summary:


Global and Local Confidence Based Fraud Detection Graph Neural Network

http://arxiv.org/abs/2407.17333v1

Compressor summary:


Multi-label Cluster Discrimination for Visual Representation Learning

http://arxiv.org/abs/2407.17331v1

Compressor summary:


DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture

http://arxiv.org/abs/2407.17328v1

Compressor summary:


LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering

http://arxiv.org/abs/2407.17310v1

Compressor summary:


MoveLight: Enhancing Traffic Signal Control through Movement-Centric Deep Reinforcement Learning

http://arxiv.org/abs/2407.17303v1

Compressor summary:


A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks

http://arxiv.org/abs/2407.17284v1

Compressor summary:


DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy

http://arxiv.org/abs/2407.17272v1

Compressor summary:


M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis

http://arxiv.org/abs/2407.17267v1

Compressor summary:


Channel-Aware Low-Rank Adaptation in Time Series Forecasting

http://arxiv.org/abs/2407.17246v1

Compressor summary:


Improving ICD coding using Chapter based Named Entities and Attentional Models

http://arxiv.org/abs/2407.17230v1

Compressor summary:


LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model

http://arxiv.org/abs/2407.17229v1

Compressor summary:


LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover

http://arxiv.org/abs/2407.17227v1

Compressor summary:


Sublinear Regret for An Actor-Critic Algorithm in Continuous-Time Linear-Quadratic Reinforcement Learning

http://arxiv.org/abs/2407.17226v1

Compressor summary:


Graph Neural Networks: A suitable Alternative to MLPs in Latent 3D Medical Image Classification?

http://arxiv.org/abs/2407.17219v1

Compressor summary:


Spectrum-Informed Multistage Neural Networks: Multiscale Function Approximators of Machine Precision

http://arxiv.org/abs/2407.17213v1

Compressor summary:


Nonverbal Immediacy Analysis in Education: A Multimodal Computational Model

http://arxiv.org/abs/2407.17209v1

Compressor summary:


Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization

http://arxiv.org/abs/2407.17206v1

Compressor summary:


ALPI: Auto-Labeller with Proxy Injection for 3D Object Detection using 2D Labels Only

http://arxiv.org/abs/2407.17197v1

Compressor summary:


Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model

http://arxiv.org/abs/2407.17193v1

Compressor summary:


Solving the Electrical Impedance Tomography Problem with a DeepONet Type Neural Network: Theory and Application

http://arxiv.org/abs/2407.17182v1

Compressor summary:


NarrationDep: Narratives on Social Media For Automatic Depression Detection

http://arxiv.org/abs/2407.17174v1

Compressor summary:


Domain Generalized Recaptured Screen Image Identification Using SWIN Transformer

http://arxiv.org/abs/2407.17170v1

Compressor summary:


Robust Deep Hawkes Process under Label Noise of Both Event and Occurrence

http://arxiv.org/abs/2407.17164v1

Compressor summary:


Explainable Artificial Intelligence Techniques for Irregular Temporal Classification of Multidrug Resistance Acquisition in Intensive Care Unit Patients

http://arxiv.org/abs/2407.17165v1

Compressor summary:


dlordinal: a Python package for deep ordinal classification

http://arxiv.org/abs/2407.17163v1

Compressor summary:


Context-aware Multi-task Learning for Pedestrian Intent and Trajectory Prediction

http://arxiv.org/abs/2407.17162v1

Compressor summary:


A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives

http://arxiv.org/abs/2407.17160v1

Compressor summary:


Establishing Truly Causal Relationship Between Whole Slide Image Predictions and Diagnostic Evidence Subregions in Deep Learning

http://arxiv.org/abs/2407.17157v1

Compressor summary:


Path Following and Stabilisation of a Bicycle Model using a Reinforcement Learning Approach

http://arxiv.org/abs/2407.17156v1

Compressor summary: