This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-23, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.12601v1
Compressor summary: DreamCinema is a framework that uses AI to create high-quality, user-friendly films with 3D characters and smooth cinematography.
http://arxiv.org/abs/2408.12599v1
Compressor summary: This paper reviews controllable text generation techniques for large language models, discussing different methods, applications, and challenges in meeting complex user needs.
http://arxiv.org/abs/2408.12598v1
Compressor summary: ND-SDF is a novel technique that learns to adaptively use samples for accurate 3D surface reconstruction and rendering with improved quality and preservation of geometric details.
http://arxiv.org/abs/2408.12594v1
Compressor summary: ProNoG is a novel pre-training and prompt learning framework for non-homophilic graphs that considers node-specific characteristics and reduces labeling requirements.
http://arxiv.org/abs/2408.12591v1
Compressor summary: The paper presents a new NeSy method that learns with distant supervision by differentiably reasoning about logical implications using neural network outputs and logic programs embedded in matrices, achieving better accuracy and faster learning than existing methods.
http://arxiv.org/abs/2408.12590v1
Compressor summary: xGen-VideoSyn-1 is a text-to-video model that uses latent diffusion, video variational autoencoder, and diffusion transformer to generate realistic scenes from textual descriptions.
http://arxiv.org/abs/2408.12588v1
Compressor summary: PAB is a fast and simple way to generate videos with DiT models by reusing attention outputs across diffusion steps.
http://arxiv.org/abs/2408.12581v1
Compressor summary: The paper introduces new methods for identifying the best arm in non-stationary stochastic bandits with global environmental shifts, and shows they outperform existing solutions.
http://arxiv.org/abs/2408.12579v1
Compressor summary: The RuleAlign framework helps large language models become better at diagnosing patients by aligning them with specific diagnostic rules, using a medical dialogue dataset and preference learning.
http://arxiv.org/abs/2408.12578v1
Compressor summary: The authors propose a definition for "emergence" in neural networks as the sudden learning of specific capabilities due to acquiring certain structures from the data-generating process, and empirically show this phenomenon in a Transformer model using a context-sensitive formal language.
http://arxiv.org/abs/2408.12574v1
Compressor summary: MuMA-ToM is a benchmark for evaluating AI's ability to reason about human mental states in complex social interactions using multiple sources of information.
http://arxiv.org/abs/2408.12570v1
Compressor summary: Jamba-1.5 is a hybrid language model with high throughput and low memory usage, fine-tuned for conversation and instruction-following, and supported by ExpertsInt8 quantization technique.
http://arxiv.org/abs/2408.12568v1
Compressor summary: The paper proposes a method to prune large neural networks by optimizing attribution methods, achieving higher compression rates and performance on image classification tasks.
http://arxiv.org/abs/2408.12561v1
Compressor summary: Our energy-efficient convolution module uses channel-wise sparsity and gradient selection schedulers to reduce computations, improve model performance, and lower the carbon footprint of deep learning training.
http://arxiv.org/abs/2408.12550v1
Compressor summary: The study compares five YOLOv5 variants for vehicle detection in various environments, evaluating their performance in detecting different types of vehicles under different conditions using precision, recall, F1-score, and mean Average Precision metrics.
http://arxiv.org/abs/2408.12548v1
Compressor summary: Key points:
- Machine Learning (ML) is important for Autonomous Vehicles (AVs) but faces challenges in complex scenarios
- Human-In-The-Loop Machine Learning (HITL-ML) integrates human capabilities to improve ML effectiveness and safety
- HITL-ML includes Curriculum Learning, Human-In-The-Loop Reinforcement Learning, Active Learning, and ethical principles
Summary: HITL-ML is a promising approach to enhance the performance and safety of AVs by leveraging human input in ML tasks such as training, optimization, annotation, and ethics.
http://arxiv.org/abs/2408.12547v1
Compressor summary: The study introduces MedS-Bench, a benchmark for evaluating large language models in clinical tasks, and MedS-Ins, a dataset for instruction tuning to improve their performance.
http://arxiv.org/abs/2408.12545v1
Compressor summary: This paper investigates the dynamics of meta-learning in non-linear two-layer neural networks using statistical physics and highlights the role of hyper-parameters in the formation of shared representations and generalization.
http://arxiv.org/abs/2408.12531v1
Compressor summary: The authors improve upon a previous machine learning method for reconstructing global spatial fields from sparse data, achieving better results in Earth Sciences and Fluid Dynamics simulations.
http://arxiv.org/abs/2408.12528v1
Compressor summary: Show-o is a unified transformer that combines autoregressive and diffusion modeling for multimodal understanding and generation across various vision-language tasks.
http://arxiv.org/abs/2408.12526v1
Compressor summary: Academus is a system that uses student parallelism and distillation techniques to reduce online inference latency of BERT-like models without sacrificing accuracy.
http://arxiv.org/abs/2408.12525v1
Compressor summary: The paper introduces a new PCGRL framework that uses Jax to speed up training and improve scalability, adds randomized level sizes and pinpoints to enhance designer control, and evaluates the generalization ability of learned generators on large maps.
http://arxiv.org/abs/2408.12519v1
Compressor summary: The authors propose using graph neural networks to learn atomic-level protein representations and predict protein flexibility from 3D structures, outperforming previous methods on a large test set.
http://arxiv.org/abs/2408.12503v1
Compressor summary: The paper presents a new Russian embedding model, compares it with existing models, and introduces a benchmark for Russian NLP tasks.
http://arxiv.org/abs/2408.12496v1
Compressor summary: MEDCO is a novel multi-agent-based copilot system for medical education that simulates real-world training environments and enhances student performance and learning behaviors.
http://arxiv.org/abs/2408.12494v1
Compressor summary: GenderCARE is a framework to assess and reduce gender bias in large language models by introducing criteria, benchmarks, and debiasing techniques.
http://arxiv.org/abs/2408.12491v1
Compressor summary: The systematic review examines radiology-based AI methods for diagnosing and prognosticating soft-tissue and bone tumours, finding that they comply poorly with current guidelines and need improvement in design, development, evaluation, and data reproducibility.
http://arxiv.org/abs/2408.12483v1
Compressor summary: The paper explores sample difficulty in dataset distillation, proposes a theoretical explanation for matching-based methods, and introduces the Sample Difficulty Correction approach to improve dataset quality.
http://arxiv.org/abs/2408.12480v1
Compressor summary: The report introduces Vintern-1B, a multimodal large language model for Vietnamese tasks that combines Qwen2 and InternViT models, fine-tuned on a large dataset, and optimized for on-device applications.
http://arxiv.org/abs/2408.12476v1
Compressor summary: Key points:
- Paper proposes a solar energy prediction model using Machine Learning and Deep Learning
- Model considers Air Quality Index and weather features as influencing factors
- Model uses power transform normalization and zero-inflated modeling
- Achieves high accuracy and low error with a Conv2D Long Short-Term Memory model
Summary: The paper presents a Machine Learning and Deep Learning based solar energy prediction model that accounts for Air Quality Index and weather features, uses power transform normalization and zero-inflated modeling, and achieves high accuracy with a Conv2D Long Short-Term Memory model.
http://arxiv.org/abs/2408.12469v1
Compressor summary: The paper proposes a new framework that combines abstract class semantics and concrete class entities from language models to improve few-shot learning by extracting semantic-aware visual patterns and refining class prototypes.
http://arxiv.org/abs/2408.12463v1
Compressor summary: The authors developed two new eye-tracking techniques for video-type visuals using deep learning models and optimized them for smartphones' resource constraints.
http://arxiv.org/abs/2408.12460v1
Compressor summary: The study investigates whether neural networks use Closure, a human visual skill for filling in missing parts of objects, by testing various Convolutional Neural Networks (CNNs) on curated datasets; the mixed results suggest that only some CNNs exhibit the Closure effect.
http://arxiv.org/abs/2408.12456v1
Compressor summary: Key points:
- Large language models have internal inaccuracies and outdated knowledge
- Current knowledge editing techniques are good for single-hop reasoning but not for multi-hop reasoning
- The proposed KELE method uses erasure and injection functions to improve multi-hop reasoning
Summary: The paper proposes a novel knowledge editing method, KELE, that addresses the limitations of current methods in improving large language models' multi-hop reasoning skills by using erasure and injection functions.
http://arxiv.org/abs/2408.12454v1
Compressor summary: RREConv is a method to handle rotational symmetry breaking in data by using learnable biases under the group order to relax strict group constraints.
http://arxiv.org/abs/2408.12443v1
Compressor summary: The paper presents a new approach to model, analyze, and generate tree-like 4D objects that change shape and structure over time by representing them as elastic trajectories in a square root velocity function tree space with a Riemannian metric and statistical models.
http://arxiv.org/abs/2408.12439v1
Compressor summary: The paper proposes solutions to improve MIMO video restoration by increasing temporal receptive field and smoothing discontinuities at stack transitions, achieving state-of-the-art low-latency performance on a new drone footage benchmark.
http://arxiv.org/abs/2408.12430v1
Compressor summary: The Positional Description Scheme (PDS) improves language models' arithmetic processing by simplifying number normalization and reducing errors in text-to-speech and speech recognition tasks.
http://arxiv.org/abs/2408.12429v1
Compressor summary: FlexEdit is an image editing method that uses free-shape masks and language instructions to achieve state-of-the-art performance in LLM-based image editing.
http://arxiv.org/abs/2408.12426v1
Compressor summary: The text discusses various image classification techniques for crop identification in agriculture, evaluating their accuracy, model size, prediction time, and explainability, with Xception emerging as the best performer.
http://arxiv.org/abs/2408.12423v1
Compressor summary: The text proposes a hybrid method that combines prior knowledge with relational structure of multivariate time series data to improve forecast accuracy and uncertainty estimation.
http://arxiv.org/abs/2408.12420v1
Compressor summary: XAI is a subset of IAI, which involves a mindset of abstraction and focuses on post-hoc analysis of a dataset, while IAI encompasses both outward and inward reasons for interpreting AI.
http://arxiv.org/abs/2408.12418v1
Compressor summary: CODE is a novel image editing method that uses diffusion models and ordinary differential equations to enhance images based on noisy or out-of-distribution guidance while maintaining realism and fidelity.
http://arxiv.org/abs/2408.12409v1
Compressor summary: The paper proposes a hybrid method that combines domain knowledge and relational structure inference to improve forecasting of complex dynamical systems with high-dimensional multivariate time series data.
http://arxiv.org/abs/2408.12408v1
Compressor summary: The study explores various deep learning models for short-term stock market trend prediction using daily and hourly prices, finding that xLSTM-TS performs best.
http://arxiv.org/abs/2408.12400v1
Compressor summary: The paper proposes a new model for generating high-quality multi-stylized sketch portraits from images using semi-supervised learning and feature extraction, achieving better performance than previous methods.
http://arxiv.org/abs/2408.12396v1
Compressor summary: Key points:
- The study explores using computer vision foundation models (FMs) for geoscience tasks
- The workflow fine-tunes existing FMs for different geoscientific data types
- The experiments show the effectiveness and advantages of cross-domain FM adaptation
Summary: The study adapts computer vision foundation models to various geoscientific data analysis tasks, demonstrating their feasibility and benefits.
http://arxiv.org/abs/2408.12381v1
Compressor summary: ForestEyes is a Citizen Science project that uses Machine Learning models to monitor deforestation, improving effectiveness by selecting optimal training samples with a user-entropy-increasing strategy.
http://arxiv.org/abs/2408.12380v1
Compressor summary: The paper presents UMERegRobust, a robust registration pipeline that handles partial overlap and differently sampled point clouds using UME framework, and shows its superior performance on KITTI and RotKITTI benchmarks.
http://arxiv.org/abs/2408.12373v1
Compressor summary: The text describes a new transcriptome foundation model (scCello) that leverages cell ontology information to learn gene co-expression patterns and improve biological tasks such as identifying novel cell types or predicting drug responses.
http://arxiv.org/abs/2408.12369v1
Compressor summary: The paper proposes a novel framework that uses full-text search to improve query accuracy and user experience when querying complex databases with natural language.
http://arxiv.org/abs/2408.12366v1
Compressor summary: The paper proposes a robust PCA method that learns discriminative sample weights to mitigate the impact of outliers on feature extraction.
http://arxiv.org/abs/2408.12362v1
Compressor summary: The paper investigates label errors in an Arabic Named Entity Recognition dataset, corrects them, and proposes a cleaner version called CLEANANERCorp for better model training and evaluation.
http://arxiv.org/abs/2408.12355v1
Compressor summary: The paper proposes an open-set semi-supervised object detection method for medical images that handles class imbalance and utilizes out-of-distribution information to improve detections.
http://arxiv.org/abs/2408.12352v1
Compressor summary: GarmentAligner is a text-to-garment diffusion model that uses retrieval augmentation, automatic component extraction, and multi-level correction losses to generate accurate and aligned garments from texts.
http://arxiv.org/abs/2408.12340v1
Compressor summary: VTON-HandFit is a method that uses hand priors to improve virtual try-on performance, especially for hand occlusion cases.
http://arxiv.org/abs/2408.12337v1
Compressor summary: Smaller language models can learn financial reasoning by fine-tuning with larger teacher models and generating programs to encode calculations.
http://arxiv.org/abs/2408.12334v1
Compressor summary: Our method improves link prediction tasks for graph neural networks by embedding subgraphs in the Laplacian matrix's eigenbasis using a novel Learnable Lanczos algorithm with Linear Constraints, achieving significant speedup and performance improvement with less training data.
http://arxiv.org/abs/2408.12333v1
Compressor summary: GRATR is a framework that uses a dynamic trustworthiness graph and retrieval-augmented generation to improve trust reasoning in multiplayer games, outperforming baseline methods by 30% or more.
http://arxiv.org/abs/2408.12326v1
Compressor summary: DualChecker is a framework that improves knowledge distillation between teacher and student models using an interactive dynamic checker system to mitigate hallucinations and enhance performance in machine learning tasks.
http://arxiv.org/abs/2408.12325v1
Compressor summary: The paper proposes a CDT framework to reduce unfaithful hallucinations in LLMs by comparing their responses to truthful ones using multi-task fine-tuning and mixture of experts strategies.
http://arxiv.org/abs/2408.12321v1
Compressor summary: MaVEn is a framework that improves multimodal language models' ability to reason with multiple images by combining coarse-grained visual symbols with fine-grained features and using a dynamic reduction mechanism.
http://arxiv.org/abs/2408.12320v1
Compressor summary: PolyRouter is a system that combines different large language models to answer queries efficiently, cheaply, and with high quality.
http://arxiv.org/abs/2408.12317v1
Compressor summary: CLIPHaze is a hybrid framework that combines Mamba and CLIP to improve dehazing by using parallel state space model and window-based self-attention, along with a novel aggregation module that adapts to different haze types.
http://arxiv.org/abs/2408.12316v1
Compressor summary: The paper proposes UDU-Net, a network that enhances low-light videos by decomposing the signal into spatial and temporal factors and updating them iteratively using expert knowledge and human feedback.
http://arxiv.org/abs/2408.12315v1
Compressor summary: The paper proposes SELF-TAUGHT, a framework that creates customized demonstrations for large language models to improve their performance in various domains, such as clinical diagnosis.
http://arxiv.org/abs/2408.12312v1
Compressor summary: Key points:
- Backdoor attacks threaten face recognition systems
- Existing attacks are simple and visible
- MakeupAttack is a novel feature space attack via makeup transfer
- It only requires model queries and can bypass defenses
Summary: MakeupAttack is a new backdoor attack on face recognition that uses subtle makeup features to manipulate models without full access or detection.
http://arxiv.org/abs/2408.12308v1
Compressor summary: This tutorial provides a comprehensive and accessible introduction to Deep Learning with CNNs and supervised regression, emphasizing the connections between learning theory, statistics, and machine learning.
http://arxiv.org/abs/2408.12307v1
Compressor summary: The paper proposes an algorithm that uses unlabelled data in offline reinforcement learning with kernel function approximation and proves its complexity properties.
http://arxiv.org/abs/2408.12305v1
Compressor summary: The study shows that artificial intelligence models can answer medical questions accurately and may improve medical education and assessment.
http://arxiv.org/abs/2408.12304v1
Compressor summary: The paper proposes a new Approximate Logic Synthesis method using optimal decision trees to balance circuit complexity and accuracy, outperforming existing methods.