arxiv compressed, 2024-08-23

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-23, generated by the compressor, my personal LLM-based project.


DreamCinema: Cinematic Transfer with Free Camera and 3D Character

http://arxiv.org/abs/2408.12601v1

Compressor summary: DreamCinema is a framework that uses AI to create high-quality, user-friendly films with 3D characters and smooth cinematography.


Controllable Text Generation for Large Language Models: A Survey

http://arxiv.org/abs/2408.12599v1

Compressor summary: This paper reviews controllable text generation techniques for large language models, discussing different methods, applications, and challenges in meeting complex user needs.


ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

http://arxiv.org/abs/2408.12598v1

Compressor summary: ND-SDF is a novel technique that learns to adaptively use samples for accurate 3D surface reconstruction and rendering with improved quality and preservation of geometric details.


Non-Homophilic Graph Pre-Training and Prompt Learning

http://arxiv.org/abs/2408.12594v1

Compressor summary: ProNoG is a novel pre-training and prompt learning framework for non-homophilic graphs that considers node-specific characteristics and reduces labeling requirements.


Differentiable Logic Programming for Distant Supervision

http://arxiv.org/abs/2408.12591v1

Compressor summary: The paper presents a new NeSy method that learns with distant supervision by differentiably reasoning about logical implications using neural network outputs and logic programs embedded in matrices, achieving better accuracy and faster learning than existing methods.


xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

http://arxiv.org/abs/2408.12590v1

Compressor summary: xGen-VideoSyn-1 is a text-to-video model that uses latent diffusion, video variational autoencoder, and diffusion transformer to generate realistic scenes from textual descriptions.


Real-Time Video Generation with Pyramid Attention Broadcast

http://arxiv.org/abs/2408.12588v1

Compressor summary: PAB is a fast and simple method for real-time video generation with DiT models that broadcasts attention outputs across nearby diffusion steps to avoid redundant computation.


Identifying the Best Arm in the Presence of Global Environment Shifts

http://arxiv.org/abs/2408.12581v1

Compressor summary: The paper introduces new methods for identifying the best arm in non-stationary stochastic bandits with global environmental shifts, and shows they outperform existing solutions.


RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment

http://arxiv.org/abs/2408.12579v1

Compressor summary: The RuleAlign framework helps large language models become better at diagnosing patients by aligning them with specific diagnostic rules, using a medical dialogue dataset and preference learning.


A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language

http://arxiv.org/abs/2408.12578v1

Compressor summary: The authors propose a definition for "emergence" in neural networks as the sudden learning of specific capabilities due to acquiring certain structures from the data-generating process, and empirically show this phenomenon in a Transformer model using a context-sensitive formal language.


MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

http://arxiv.org/abs/2408.12574v1

Compressor summary: MuMA-ToM is a benchmark for evaluating AI's ability to reason about human mental states in complex social interactions using multiple sources of information.


Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

http://arxiv.org/abs/2408.12570v1

Compressor summary: Jamba-1.5 is a hybrid language model with high throughput and low memory usage, fine-tuned for conversation and instruction-following, and supported by ExpertsInt8 quantization technique.


Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers

http://arxiv.org/abs/2408.12568v1

Compressor summary: The paper proposes a method to prune large neural networks by optimizing attribution methods, achieving higher compression rates and performance on image classification tasks.


ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation

http://arxiv.org/abs/2408.12561v1

Compressor summary: Our energy-efficient convolution module uses channel-wise sparsity and gradient selection schedulers to reduce computations, improve model performance, and lower the carbon footprint of deep learning training.


Comparing YOLOv5 Variants for Vehicle Detection: A Performance Analysis

http://arxiv.org/abs/2408.12550v1

Compressor summary: The study compares five YOLOv5 variants for vehicle detection in various environments, evaluating their performance in detecting different types of vehicles under different conditions using precision, recall, F1-score, and mean Average Precision metrics.


Human-In-The-Loop Machine Learning for Safe and Ethical Autonomous Vehicles: Principles, Challenges, and Opportunities

http://arxiv.org/abs/2408.12548v1

Compressor summary: HITL-ML is a promising approach to enhance the performance and safety of Autonomous Vehicles (AVs) by leveraging human input in ML tasks such as training, optimization, annotation, and ethics; it encompasses Curriculum Learning, Human-In-The-Loop Reinforcement Learning, Active Learning, and ethical principles.


Towards Evaluating and Building Versatile Large Language Models for Medicine

http://arxiv.org/abs/2408.12547v1

Compressor summary: The study introduces MedS-Bench, a benchmark for evaluating large language models in clinical tasks, and MedS-Ins, a dataset for instruction tuning to improve their performance.


Dynamics of Meta-learning Representation in the Teacher-student Scenario

http://arxiv.org/abs/2408.12545v1

Compressor summary: This paper investigates the dynamics of meta-learning in non-linear two-layer neural networks using statistical physics and highlights the role of hyper-parameters in the formation of shared representations and generalization.


Deep Learning Improvements for Sparse Spatial Field Reconstruction

http://arxiv.org/abs/2408.12531v1

Compressor summary: The authors improve upon a previous machine learning method for reconstructing global spatial fields from sparse data, achieving better results in Earth Sciences and Fluid Dynamics simulations.


Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

http://arxiv.org/abs/2408.12528v1

Compressor summary: Show-o is a unified transformer that combines autoregressive and diffusion modeling for multimodal understanding and generation across various vision-language tasks.


Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services

http://arxiv.org/abs/2408.12526v1

Compressor summary: Academus is a system that uses student parallelism and distillation techniques to reduce online inference latency of BERT-like models without sacrificing accuracy.


PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators

http://arxiv.org/abs/2408.12525v1

Compressor summary: The paper introduces a new PCGRL framework that uses Jax to speed up training and improve scalability, adds randomized level sizes and pinpoints to enhance designer control, and evaluates the generalization ability of learned generators on large maps.


Advanced atom-level representations for protein flexibility prediction utilizing graph neural networks

http://arxiv.org/abs/2408.12519v1

Compressor summary: The authors propose using graph neural networks to learn atomic-level protein representations and predict protein flexibility from 3D structures, outperforming previous methods on a large test set.


The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design

http://arxiv.org/abs/2408.12503v1

Compressor summary: The paper presents a new Russian embedding model, compares it with existing models, and introduces a benchmark for Russian NLP tasks.


MEDCO: Medical Education Copilots Based on A Multi-Agent Framework

http://arxiv.org/abs/2408.12496v1

Compressor summary: MEDCO is a novel multi-agent-based copilot system for medical education that simulates real-world training environments and enhances student performance and learning behaviors.


GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

http://arxiv.org/abs/2408.12494v1

Compressor summary: GenderCARE is a framework to assess and reduce gender bias in large language models by introducing criteria, benchmarks, and debiasing techniques.


AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines

http://arxiv.org/abs/2408.12491v1

Compressor summary: The systematic review examines radiology-based AI methods for diagnosing and prognosing soft-tissue and bone tumours, finding that they perform poorly on current guidelines and need improvement in design, development, evaluation, and data reproducibility.


Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation

http://arxiv.org/abs/2408.12483v1

Compressor summary: The paper explores sample difficulty in dataset distillation, proposes a theoretical explanation for matching-based methods, and introduces the Sample Difficulty Correction approach to improve dataset quality.


Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

http://arxiv.org/abs/2408.12480v1

Compressor summary: The report introduces Vintern-1B, a multimodal large language model for Vietnamese tasks that combines Qwen2 and InternViT models, fine-tuned on a large dataset, and optimized for on-device applications.


Predicting Solar Energy Generation with Machine Learning based on AQI and Weather Features

http://arxiv.org/abs/2408.12476v1

Compressor summary: The paper presents a Machine Learning and Deep Learning based solar energy prediction model that considers Air Quality Index and weather features as influencing factors, uses power transform normalization and zero-inflated modeling, and achieves high accuracy and low error with a Conv2D Long Short-Term Memory model.


Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

http://arxiv.org/abs/2408.12469v1

Compressor summary: The paper proposes a new framework that combines abstract class semantics and concrete class entities from language models to improve few-shot learning by extracting semantic-aware visual patterns and refining class prototypes.


Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation

http://arxiv.org/abs/2408.12463v1

Compressor summary: The authors developed two new eye-tracking techniques for video-type visuals using deep learning models and optimized them for smartphones' resource constraints.


Finding Closure: A Closer Look at the Gestalt Law of Closure in Convolutional Neural Networks

http://arxiv.org/abs/2408.12460v1

Compressor summary: The study investigates whether neural networks use Closure, a human visual skill for filling in missing parts of objects, by testing various Convolutional Neural Networks (CNNs) on curated datasets, and finds mixed results: some CNNs exhibit the Closure effect while others do not.


Enhancing Multi-hop Reasoning through Knowledge Erasure in Large Language Model Editing

http://arxiv.org/abs/2408.12456v1

Compressor summary: The paper proposes KELE, a novel knowledge editing method that uses erasure and injection functions to improve large language models' multi-hop reasoning, addressing a limitation of current editing techniques that handle only single-hop reasoning.


Relaxed Rotational Equivariance via $G$-Biases in Vision

http://arxiv.org/abs/2408.12454v1

Compressor summary: RREConv is a method to handle rotational symmetry breaking in data by using learnable biases under the group order to relax strict group constraints.


A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures

http://arxiv.org/abs/2408.12443v1

Compressor summary: The paper presents a new approach to model, analyze, and generate tree-like 4D objects that change shape and structure over time by representing them as elastic trajectories in a square root velocity function tree space with a Riemannian metric and statistical models.


Adapting MIMO video restoration networks to low latency constraints

http://arxiv.org/abs/2408.12439v1

Compressor summary: The paper proposes solutions to improve MIMO video restoration by increasing temporal receptive field and smoothing discontinuities at stack transitions, achieving state-of-the-art low-latency performance on a new drone footage benchmark.


Positional Description for Numerical Normalization

http://arxiv.org/abs/2408.12430v1

Compressor summary: The Positional Description Scheme (PDS) improves language models' arithmetic processing by simplifying number normalization and reducing errors in text-to-speech and speech recognition tasks.


FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing

http://arxiv.org/abs/2408.12429v1

Compressor summary: FlexEdit is an image editing method that uses free-shape masks and language instructions to achieve state-of-the-art performance in LLM-based image editing.


Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification

http://arxiv.org/abs/2408.12426v1

Compressor summary: The text compares various image classification techniques for crop identification in agriculture, evaluating their accuracy, model size, prediction time, and explainability, with Xception emerging as the best performer.


Multi-Knowledge Fusion Network for Time Series Representation Learning

http://arxiv.org/abs/2408.12423v1

Compressor summary: The text proposes a hybrid method that combines prior knowledge with relational structure of multivariate time series data to improve forecast accuracy and uncertainty estimation.


Dataset | Mindset = Explainable AI | Interpretable AI

http://arxiv.org/abs/2408.12420v1

Compressor summary: XAI, which focuses on post-hoc analysis of a dataset, is a subset of IAI, a broader mindset of abstraction that encompasses both outward and inward reasons for interpreting AI.


CODE: Confident Ordinary Differential Editing

http://arxiv.org/abs/2408.12418v1

Compressor summary: CODE is a novel image editing method that uses diffusion models and ordinary differential equations to enhance images based on noisy or out-of-distribution guidance while maintaining realism and fidelity.


An Evaluation of Deep Learning Models for Stock Market Trend Prediction

http://arxiv.org/abs/2408.12408v1

Compressor summary: The study explores various deep learning models for short-term stock market trend prediction using daily and hourly prices, finding that xLSTM-TS performs best.


Multi-Source Knowledge-Based Hybrid Neural Framework for Time Series Representation Learning

http://arxiv.org/abs/2408.12409v1

Compressor summary: The paper proposes a hybrid method that combines domain knowledge and relational structure inference to improve forecasting of complex dynamical systems with high-dimensional multivariate time series data.


Multi-Style Facial Sketch Synthesis through Masked Generative Modeling

http://arxiv.org/abs/2408.12400v1

Compressor summary: The paper proposes a new model for generating high-quality multi-stylized sketch portraits from images using semi-supervised learning and feature extraction, achieving better performance than previous methods.


Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis

http://arxiv.org/abs/2408.12396v1

Compressor summary: The study adapts computer vision foundation models (FMs) to various geoscientific data analysis tasks by fine-tuning them for different geoscientific data types, with experiments demonstrating the feasibility and advantages of cross-domain FM adaptation.


Sampling Strategies based on Wisdom of Crowds for Amazon Deforestation Detection

http://arxiv.org/abs/2408.12381v1

Compressor summary: ForestEyes is a Citizen Science project that uses Machine Learning models to monitor deforestation; its effectiveness is improved by selecting training samples with an entropy-increasing strategy based on user responses.


UMERegRobust -- Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration

http://arxiv.org/abs/2408.12380v1

Compressor summary: The paper presents UMERegRobust, a robust registration pipeline that handles partial overlap and differently sampled point clouds using UME framework, and shows its superior performance on KITTI and RotKITTI benchmarks.


Cell-ontology guided transcriptome foundation model

http://arxiv.org/abs/2408.12373v1

Compressor summary: The text describes a new transcriptome foundation model (scCello) that leverages cell ontology information to learn gene co-expression patterns and improve biological tasks such as identifying novel cell types or predicting drug responses.


RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering

http://arxiv.org/abs/2408.12369v1

Compressor summary: The paper proposes a novel framework that uses full-text search to improve query accuracy and user experience when querying complex databases with natural language.


Robust Principal Component Analysis via Discriminant Sample Weight Learning

http://arxiv.org/abs/2408.12366v1

Compressor summary: The paper proposes a robust PCA method that learns discriminative sample weights to mitigate the impact of outliers on feature extraction.


CLEANANERCorp: Identifying and Correcting Incorrect Labels in the ANERcorp Dataset

http://arxiv.org/abs/2408.12362v1

Compressor summary: The paper investigates label errors in an Arabic Named Entity Recognition dataset, corrects them, and proposes a cleaner version called CLEANANERCorp for better model training and evaluation.


Class-balanced Open-set Semi-supervised Object Detection for Medical Images

http://arxiv.org/abs/2408.12355v1

Compressor summary: The paper proposes an open-set semi-supervised object detection method for medical images that handles class imbalance and utilizes out-of-distribution information to improve detections.


GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

http://arxiv.org/abs/2408.12352v1

Compressor summary: GarmentAligner is a text-to-garment diffusion model that uses retrieval augmentation, automatic component extraction, and multi-level correction losses to generate accurate and aligned garments from texts.


VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding

http://arxiv.org/abs/2408.12340v1

Compressor summary: VTON-HandFit is a method that uses hand priors to improve virtual try-on performance, especially for hand occlusion cases.


Fine-tuning Smaller Language Models for Question Answering over Financial Documents

http://arxiv.org/abs/2408.12337v1

Compressor summary: Smaller language models can learn financial reasoning by fine-tuning with larger teacher models and generating programs to encode calculations.


Enhanced Expressivity in Graph Neural Networks with Lanczos-Based Linear Constraints

http://arxiv.org/abs/2408.12334v1

Compressor summary: Our method improves link prediction tasks for graph neural networks by embedding subgraphs in the Laplacian matrix's eigenbasis using a novel Learnable Lanczos algorithm with Linear Constraints, achieving significant speedup and performance improvement with less training data.


Graph Retrieval Augmented Trustworthiness Reasoning

http://arxiv.org/abs/2408.12333v1

Compressor summary: GRATR is a framework that uses a dynamic trustworthiness graph and retrieval-augmented generation to improve trust reasoning in multiplayer games, outperforming baseline methods by 30% or more.


Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models

http://arxiv.org/abs/2408.12326v1

Compressor summary: DualChecker is a framework that improves knowledge distillation between teacher and student models using an interactive dynamic checker system to mitigate hallucinations and enhance performance in machine learning tasks.


Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

http://arxiv.org/abs/2408.12325v1

Compressor summary: The paper proposes a CDT framework to reduce unfaithful hallucinations in LLMs by comparing their responses to truthful ones using multi-task fine-tuning and mixture of experts strategies.


MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

http://arxiv.org/abs/2408.12321v1

Compressor summary: MaVEn is a framework that improves multimodal language models' ability to reason with multiple images by combining coarse-grained visual symbols with fine-grained features and using a dynamic reduction mechanism.


PolyRouter: A Multi-LLM Querying System

http://arxiv.org/abs/2408.12320v1

Compressor summary: PolyRouter is a multi-LLM querying system that routes each query among different large language models to answer it efficiently, cheaply, and with high quality.


Adapt CLIP as Aggregation Instructor for Image Dehazing

http://arxiv.org/abs/2408.12317v1

Compressor summary: CLIPHaze is a hybrid framework that combines Mamba and CLIP to improve dehazing by using parallel state space model and window-based self-attention, along with a novel aggregation module that adapts to different haze types.


Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement

http://arxiv.org/abs/2408.12316v1

Compressor summary: The paper proposes UDU-Net, a network that enhances low-light videos by decomposing the signal into spatial and temporal factors and updating them iteratively using expert knowledge and human feedback.


Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations

http://arxiv.org/abs/2408.12315v1

Compressor summary: The paper proposes SELF-TAUGHT, a framework that creates customized demonstrations for large language models to improve their performance in various domains, such as clinical diagnosis.


MakeupAttack: Feature Space Black-box Backdoor Attack on Face Recognition via Makeup Transfer

http://arxiv.org/abs/2408.12312v1

Compressor summary: MakeupAttack is a novel feature-space backdoor attack on face recognition via makeup transfer that, unlike existing simple and visible attacks, requires only model queries and uses subtle makeup features to manipulate models while bypassing defenses.


Deep Learning with CNNs: A Compact Holistic Tutorial with Focus on Supervised Regression (Preprint)

http://arxiv.org/abs/2408.12308v1

Compressor summary: This tutorial provides a comprehensive and accessible introduction to Deep Learning with CNNs and supervised regression, emphasizing the connections between learning theory, statistics, and machine learning.


Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning

http://arxiv.org/abs/2408.12307v1

Compressor summary: The paper proposes an algorithm that uses unlabelled data in offline reinforcement learning with kernel function approximation and proves its complexity properties.


Tipta uzmanlik sinavinda (TUS) büyük dil modelleri insanlardan daha mi başarili? [Are large language models more successful than humans on the Medical Specialty Exam (TUS)?]

http://arxiv.org/abs/2408.12305v1

Compressor summary: The study shows that artificial intelligence models can answer medical questions accurately and may improve medical education and assessment.


OPTDTALS: Approximate Logic Synthesis via Optimal Decision Trees Approach

http://arxiv.org/abs/2408.12304v1

Compressor summary: The paper proposes a new Approximate Logic Synthesis method using optimal decision trees to balance circuit complexity and accuracy, outperforming existing methods.