This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-18, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2409.11406v1
Compressor summary: Phidias is a novel generative model that uses diffusion and reference-augmented 3D generation to improve quality, generalization, and controllability in 3D modeling.
http://arxiv.org/abs/2409.11404v1
Compressor summary: The paper introduces seven synthetic datasets for underrepresented Arabic dialects, a benchmark for evaluating LLMs on dialect comprehension and generation, and highlights challenges in capturing diverse Arabic dialects and cultural contexts.
http://arxiv.org/abs/2409.11402v1
Compressor summary: NVLM 1.0 is a state-of-the-art multimodal language model that outperforms leading models on vision-language tasks, with a novel architecture and curation of high-quality datasets.
http://arxiv.org/abs/2409.11390v1
Compressor summary: The paper evaluates Large Language Models' performance in identifying narrative focalization and shows their potential for studying literary texts.
http://arxiv.org/abs/2409.11389v1
Compressor summary: The text discusses feature normalization methods for data analysis and modeling, focusing on the comparison of uniform and proportional features, with examples of normalizations and similarity measures.
http://arxiv.org/abs/2409.11380v1
Compressor summary: The paper presents a novel method that uses adaptive beamforming and denoising diffusion to enhance ultrasound images by balancing contrast, resolution, and speckle preservation.
http://arxiv.org/abs/2409.11378v1
Compressor summary: This paper proposes a data selection method for fine-tuning large language models that focuses on diversity rather than quality, using k-means clustering and iterative refinement to improve performance across various tasks.
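The diversity-first selection idea is straightforward to sketch: embed the candidate pool, cluster with k-means, and keep the example nearest each centroid. A minimal sketch under those assumptions (the function name and embedding source are illustrative, not the paper's implementation):

```python
# Diversity-driven data selection via k-means: keep the candidate closest to
# each cluster centroid. Illustrative sketch, not the paper's exact pipeline.
import numpy as np
from sklearn.cluster import KMeans

def select_diverse(embeddings: np.ndarray, budget: int) -> np.ndarray:
    km = KMeans(n_clusters=budget, n_init="auto", random_state=0).fit(embeddings)
    chosen = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        chosen.append(members[np.argmin(dists)])
    return np.asarray(chosen)

# Usage (hypothetical embeddings array): select_diverse(embs, budget=1000)
```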
http://arxiv.org/abs/2409.11377v1
Compressor summary: The text summarizes the authors' study, which uses existing fMRI data to understand human cognition and behavior, evaluates current deep models for cognitive task recognition and disease diagnosis, and provides guidelines for selecting suitable machine learning backbones for new neuroimaging applications.
http://arxiv.org/abs/2409.11376v1
Compressor summary: The paper proposes a multi-modal time-series language model that extracts and reasons about time-series information using a lightweight encoder and chain-of-thought augmentation, achieving zero-shot performance across various domains.
http://arxiv.org/abs/2409.11375v1
Compressor summary: The authors propose a self-supervised framework using large language models and SwinV2 to improve retinal disease diagnosis from multi-modal data, enhancing generalization and performance on smaller datasets.
http://arxiv.org/abs/2409.11367v1
Compressor summary: The authors propose a two-stage training framework that combines consistency distillation with GAN training to accelerate video diffusion, leading to high-quality videos in one step and outperforming existing methods.
http://arxiv.org/abs/2409.11365v1
Compressor summary: The paper explores how to enhance the safety-awareness of multimodal language models against malicious image inputs using a technique called CoCA.
http://arxiv.org/abs/2409.11363v1
Compressor summary: CORE-Bench is a benchmark for measuring AI agents' accuracy in performing computational reproducibility tasks across three disciplines, aiming to improve scientific processes and agent development.
http://arxiv.org/abs/2409.11355v1
Compressor summary: The paper proposes a faster and more accurate monocular depth estimator by fixing an inference pipeline flaw, fine-tuning the model with task-specific losses, and applying the method to Stable Diffusion.
http://arxiv.org/abs/2409.11353v1
Compressor summary: THaMES is a framework for detecting and mitigating hallucinations in large language models using various strategies and automated test set generation.
http://arxiv.org/abs/2409.11338v1
Compressor summary: The text analyses how the CLIP model's high cosine similarity between paired and unpaired images affects few-shot classification and proposes a lightweight adapter to reduce this overlap, improving performance and robustness.
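The overlap in question is easy to measure once CLIP features are extracted: compare diagonal (paired) against off-diagonal (unpaired) cosine similarities. A minimal sketch with stand-in features (random vectors here; real CLIP embeddings crowd a narrow cone, so both values sit high and close together, which is the overlap the adapter targets):

```python
# Paired vs. unpaired cosine similarity; random vectors stand in for CLIP
# image/text embeddings, which in practice are far from isotropic.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

img = np.random.randn(100, 512)   # stand-in for CLIP image embeddings
txt = np.random.randn(100, 512)   # stand-in for the matching text embeddings
sims = cosine_sim(img, txt)
paired = np.diag(sims).mean()                      # image with its own caption
unpaired = sims[~np.eye(100, dtype=bool)].mean()   # image with other captions
print(f"paired={paired:.3f}, unpaired={unpaired:.3f}")
```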
http://arxiv.org/abs/2409.11329v1
Compressor summary: The paper proposes a self-distillation method to solve catastrophic forgetting in continual learning and improves it with a memory update technique that prioritizes storing misclassified samples.
http://arxiv.org/abs/2409.11325v1
Compressor summary: TopoMask is a new method that uses mask-based instances and attention-based transformers to improve centerline prediction from road images, achieving state-of-the-art performance on the OpenLane-V2 dataset.
http://arxiv.org/abs/2409.11323v1
Compressor summary: LPT++ is a framework for long-tailed classification that combines fine-tuning, model ensemble, and three core components to improve Vision Transformers' performance with minimal additional parameters.
http://arxiv.org/abs/2409.11321v1
Compressor summary: SOAP is a computationally efficient optimization algorithm that combines the benefits of Shampoo and Adam, reducing the number of iterations and wall clock time for large-scale language model pre-training.
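As I understand it, the core trick in SOAP is to run Adam's moment updates in the eigenbasis of Shampoo's Kronecker-factored preconditioner. Below is a toy, single-matrix sketch of that idea; hyperparameters are illustrative, and a real implementation refreshes the eigenbasis only every few steps (rotating the moments when it does) rather than every step as here:

```python
# Toy sketch of the SOAP idea: Adam in the eigenbasis of Shampoo's
# preconditioner. Not the paper's reference implementation.
import numpy as np

class SoapToy:
    def __init__(self, shape, lr=1e-3, beta=0.95, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.beta, self.b1, self.b2, self.eps = lr, beta, b1, b2, eps
        self.L = np.zeros((shape[0], shape[0]))   # left gradient covariance
        self.R = np.zeros((shape[1], shape[1]))   # right gradient covariance
        self.m = np.zeros(shape)                  # Adam moments, maintained
        self.v = np.zeros(shape)                  # in the rotated basis
        self.t = 0

    def step(self, W: np.ndarray, G: np.ndarray) -> np.ndarray:
        self.t += 1
        self.L = self.beta * self.L + (1 - self.beta) * G @ G.T
        self.R = self.beta * self.R + (1 - self.beta) * G.T @ G
        _, QL = np.linalg.eigh(self.L)            # in practice refreshed only
        _, QR = np.linalg.eigh(self.R)            # every k steps to save time
        Gr = QL.T @ G @ QR                        # rotate grad into eigenbasis
        self.m = self.b1 * self.m + (1 - self.b1) * Gr
        self.v = self.b2 * self.v + (1 - self.b2) * Gr**2
        mh = self.m / (1 - self.b1**self.t)
        vh = self.v / (1 - self.b2**self.t)
        return W - self.lr * (QL @ (mh / (np.sqrt(vh) + self.eps)) @ QR.T)
```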
http://arxiv.org/abs/2409.11315v1
Compressor summary: The paper introduces the fMRI-3D dataset, a collection of fMRI data and 3D object images with text captions, and presents MinD-3D, a framework to reconstruct 3D objects from fMRI signals using a generative transformer decoder.
http://arxiv.org/abs/2409.11308v1
Compressor summary: The text discusses the advances and challenges of speech generation technology, especially in detecting misinformation in synthetic spoken content, and introduces an open-source dataset (SpMis) to study this issue.
http://arxiv.org/abs/2409.11307v1
Compressor summary: GS-Net is a plug-and-play module that improves 3D Gaussian Splatting by densifying Gaussian ellipsoids from sparse point clouds, achieving better generalization and rendering quality on novel viewpoints using the CARLA-NVS dataset.
http://arxiv.org/abs/2409.11302v1
Compressor summary: The text discusses using Parameter-Efficient Fine-Tuning techniques for time series models in healthcare applications, particularly forecasting vital signs of sepsis patients, and shows that some methods outperform existing approaches while fine-tuning fewer parameters.
http://arxiv.org/abs/2409.11294v1
Compressor summary: The paper applies various process-mining techniques via pm4py to analyze road traffic fine management processes and discover their models, patterns, and limitations.
http://arxiv.org/abs/2409.11290v1
Compressor summary: The text discusses using neural networks as a new tool for optimizing vehicle routes, presenting a novel graph neural network model and demonstrating its efficiency through tests.
http://arxiv.org/abs/2409.11282v1
Compressor summary: The paper proposes a method to transfer knowledge from a proprietary LLM to a more accessible one, enabling better document understanding with limited resources.
http://arxiv.org/abs/2409.11277v1
Compressor summary: This text discusses how machine learning methods are influenced by the domain theories they are applied to, arguing that both theory-dependent and theory-independent perspectives are oversimplified.
http://arxiv.org/abs/2409.11274v1
Compressor summary: The authors propose an augmented task arithmetic method for expanding speech-text multimodal foundation models to new language pairs by using a language control model to prevent confusion and improve translation quality.
http://arxiv.org/abs/2409.11272v1
Compressor summary: LOLA is a large language model that works well across many languages by using expert routing and a sparse architecture.
http://arxiv.org/abs/2409.11270v1
Compressor summary: The paper proposes a neural network that optimizes the precoder and phase shifts for reconfigurable intelligent surface-assisted systems, achieving better performance in terms of rate, power consumption, and convergence speed.
http://arxiv.org/abs/2409.11261v1
Compressor summary: The paper presents an education tool that uses Generative AI to create interactive stories for children by combining narrative co-creation, text-to-speech, and text-to-video generation.
http://arxiv.org/abs/2409.11256v1
Compressor summary: TAP is an unsupervised video denoising method that uses a pre-trained image denoiser with tunable temporal modules to harness temporal information and improve denoising performance.
http://arxiv.org/abs/2409.11253v1
Compressor summary: The study analyzes how the norm and variance of contextualized embeddings vary by context and layer in Transformer models, finding a trade-off relationship and a decomposition into within-cluster and between-cluster variances.
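The within-/between-cluster split mentioned here is, in form, the standard law of total variance. As a sketch in my notation (not necessarily the paper's), for embeddings x_i grouped into clusters c with cluster means and a grand mean:

```latex
% Within-/between-cluster variance decomposition (law of total variance);
% notation is illustrative, not the paper's.
\[
\underbrace{\frac{1}{n}\sum_{i=1}^{n}\lVert x_i - \bar{x} \rVert^2}_{\text{total variance}}
=
\underbrace{\frac{1}{n}\sum_{c}\sum_{i \in c}\lVert x_i - \bar{x}_c \rVert^2}_{\text{within-cluster}}
+
\underbrace{\frac{1}{n}\sum_{c} n_c \lVert \bar{x}_c - \bar{x} \rVert^2}_{\text{between-cluster}}
\]
```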
http://arxiv.org/abs/2409.11252v1
Compressor summary: The paper evaluates different ASR models for Urdu, comparing their performance on read and conversational speech using WER and error analysis, and highlighting the challenges of developing robust ASR systems for low-resource languages.
http://arxiv.org/abs/2409.11250v1
Compressor summary: The paper evaluates a modified Transformer model with ALiBi, which simulates memory decay and improves its fit to human reading times and sentence processing difficulty.
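For reference, ALiBi's memory-decay flavor comes from adding a per-head linear penalty to attention scores as a function of token distance. A minimal sketch of the bias (standard ALiBi slopes for power-of-two head counts; not code from the paper):

```python
# ALiBi: additive linear biases for causal attention; closer tokens are
# penalized less, so attention decays with distance.
import numpy as np

def alibi_bias(seq_len: int, n_heads: int) -> np.ndarray:
    """Returns (n_heads, seq_len, seq_len) additive attention biases."""
    slopes = 2.0 ** (-8.0 * (np.arange(1, n_heads + 1) / n_heads))
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]         # j - i (negative for past keys)
    dist = np.where(dist > 0, -np.inf, dist)   # causal mask: no future tokens
    return slopes[:, None, None] * dist        # per-head penalty -m * (i - j)

# scores = q @ k.T / sqrt(d) + alibi_bias(T, H)   # added before the softmax
```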
http://arxiv.org/abs/2409.11242v1
Compressor summary: The authors introduce Trust-Score, a metric to evaluate the trustworthiness of LLMs in RAG systems, and propose Trust-Align, a framework to improve LLMs' performance on RAG tasks.
http://arxiv.org/abs/2409.11241v1
Compressor summary: SponSpeech is a new dataset for punctuation restoration in spontaneous speech, with a filtering pipeline to generate more data and a challenging test set.
http://arxiv.org/abs/2409.11239v1
Compressor summary: The paper analyzes automated evaluators' performance outside of English, finding that English skills transfer well but LLMs have issues with errors and unwanted language in non-English settings.
http://arxiv.org/abs/2409.11236v1
Compressor summary: The paper presents a decision-theoretic method for dimensionality reduction in structural asset management, balancing misclassification costs and preserving discriminatory information.
http://arxiv.org/abs/2409.11235v1
Compressor summary: The paper introduces SLAck, a unified framework that uses semantics, location, and appearance priors to improve open-vocabulary multiple object tracking performance.
http://arxiv.org/abs/2409.11234v1
Compressor summary: The proposed Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT) uses historical embedding features to improve target recognition and location in UAV videos, achieving state-of-the-art performance.
http://arxiv.org/abs/2409.11233v1
Compressor summary: This study evaluates compression methods for large language models, showing that SparseGPT and Wanda maintain perplexity but degrade downstream task performance, and introduces JS Divergence as a better metric while emphasizing the importance of task-specific calibration data.
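Jensen-Shannon divergence is a symmetric, bounded alternative to KL, which is what makes it attractive for comparing a compressed model's output distribution against the original's. A minimal sketch (assuming next-token probability vectors as inputs; this is the generic definition, not the paper's evaluation harness):

```python
# Jensen-Shannon divergence between two discrete distributions.
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    p, q = p / p.sum(), q / q.sum()            # ensure valid distributions
    m = 0.5 * (p + q)                          # midpoint distribution
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# js_divergence(base_probs, pruned_probs) -> 0 when the two models agree
```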
http://arxiv.org/abs/2409.11232v1
Compressor summary: The paper analyzes how well the OpenAI O1-preview model uses external solvers to solve random K-SAT problems, and investigates whether it shows any sign of intelligent behavior or just makes random guesses.
http://arxiv.org/abs/2409.11219v1
Compressor summary: This paper proposes Score Forgetting Distillation (SFD), an innovative machine unlearning method that promotes the forgetting of undesirable information in diffusion models by aligning the conditional scores of "unsafe" classes with those of "safe" ones, without requiring real data.
http://arxiv.org/abs/2409.11218v1
Compressor summary: The text discusses using ChatGPT, a large language model, for data augmentation in aspect-based sentiment analysis, improving performance with three strategies and contrastive learning.
http://arxiv.org/abs/2409.11212v1
Compressor summary: The UPO framework uses uncertainty estimation and reliable feedback sampling to improve large language models' self-evolution and response generation in iterative preference optimization.
http://arxiv.org/abs/2409.11211v1
Compressor summary: The paper proposes an optimization strategy to improve 3D Gaussian Splatting, a method for reconstructing 3D scenes from multi-view images, by modeling splat features as outputs of an implicit neural field.
http://arxiv.org/abs/2409.11164v1
Compressor summary: The study shows that using synthetic data can improve the performance of deep learning-based vision models for robotic mobility aids for blind and low-vision people, but also highlights their limitations compared to real-world data.
http://arxiv.org/abs/2409.11149v1
Compressor summary: SAGED is a benchmarking pipeline that detects and mitigates biases in large language models by using counterfactual branching and baseline calibration.
http://arxiv.org/abs/2409.11148v1
Compressor summary: This paper introduces BLIND-VALM, a visually-augmented LM that uses text representations from CLIP instead of images, achieving similar performance to existing methods with less computation and complexity.
http://arxiv.org/abs/2409.11147v1
Compressor summary: RGER is a novel method that uses graph kernels to select exemplars for in-context learning based on both semantic and structural similarity, improving the performance of large language models on reasoning tasks.
http://arxiv.org/abs/2409.11143v1
Compressor summary: The paper proposes Semformer, a new method for training Transformer language models that uses planning tokens to guide semantic representation prediction, reducing shortcut learning and improving performance on various tasks.
http://arxiv.org/abs/2409.11140v1
Compressor summary: The paper analyses scale generalisation in Gaussian derivative networks using new datasets, pooling methods, regularisation, and discrete approximations, demonstrating their strong performance and explainability.
http://arxiv.org/abs/2409.11138v1
Compressor summary: The text describes how physics-informed neural networks, especially Hamiltonian neural networks, can improve on standard neural networks by incorporating physical invariances and conserving energy, and proposes a method that uses symplectic integrators to reconstruct and conserve Hamiltonians for generalized non-separable systems.
http://arxiv.org/abs/2409.11129v1
Compressor summary: Graph reordering optimizes GNN training by improving memory access patterns, reducing training time across different systems and hyperparameter settings.
http://arxiv.org/abs/2409.11128v1
Compressor summary: The paper proposes a machine learning method to predict AMD susceptibility genes using fundus and OCT images, as well as medical records, achieving over 80% accuracy.
http://arxiv.org/abs/2409.11123v1
Compressor summary: DAX is a framework that generates saliency-based explanations for deep models without using gradients or model-specific information, and it performs better than existing methods in various settings.
http://arxiv.org/abs/2409.11114v1
Compressor summary: The study proposes a new fine-tuning framework for large language models to enhance intent classification for task-oriented dialogue systems, using semantic matching with prototypes derived from class names.
http://arxiv.org/abs/2409.11112v1
Compressor summary: The paper investigates how players strategize and learn in a word-guessing game over two years and tests large language models' abilities to understand and play the game in different languages.
http://arxiv.org/abs/2409.11110v1
Compressor summary: The paper compares the reliability of different models for classifying Whole Slide Images in pathology using three metrics and datasets, and finds the MEAN-POOL-INS model to be the most reliable.
http://arxiv.org/abs/2409.11104v1
Compressor summary: The paper proposes a method to estimate 3D human pose from single RGB images by hallucinating depth information using a heatmap-based estimator and Privileged Information.
http://arxiv.org/abs/2409.11100v1
Compressor summary: The text describes a method to improve naive Bayes classification by estimating variable weights and using sparse regularization.
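The underlying decision rule is easy to write down: each feature's log-likelihood enters the naive Bayes score with a weight, and a sparse (L1) penalty on the weights can zero out uninformative features. A minimal sketch of that rule (notation mine, not the paper's estimator):

```python
# Weighted naive Bayes scoring: standard NB is the special case w = ones(D);
# sparse regularization of w drives some feature weights to zero.
import numpy as np

def weighted_nb_log_posterior(log_prior: np.ndarray,
                              log_lik: np.ndarray,
                              w: np.ndarray) -> np.ndarray:
    """log_prior: (C,); log_lik: (C, D) per-class, per-feature log p(x_j|y=c);
    w: (D,) feature weights. Returns unnormalized log posterior per class."""
    return log_prior + log_lik @ w

# Predicted class: int(np.argmax(weighted_nb_log_posterior(lp, ll, w)))
```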
http://arxiv.org/abs/2409.11078v1
Compressor summary: MonoKAN is a novel ANN architecture that combines the interpretability of KAN with certified partial monotonicity using cubic Hermite splines and positive weights.
http://arxiv.org/abs/2409.11075v1
Compressor summary: ShapeAug++ is a method that enhances occlusion handling in DVS event data using random polygons and curved movements, leading to improved top-1 accuracy for DVS classification.
http://arxiv.org/abs/2409.11074v1
Compressor summary: RoMath is a Romanian mathematical reasoning benchmark suite that aims to improve non-English language models and promote multilingual AI development by covering various domains and difficulty levels in mathematics.
http://arxiv.org/abs/2409.11071v1
Compressor summary: The study found that using mixed precision and carefully chosen hyper-parameters can reduce power consumption in regression ML models, though the differences between techniques and dataset formats were not statistically significant.
http://arxiv.org/abs/2409.11068v1
Compressor summary: The project introduces a new RL environment for the MLIR compiler that uses Multi-Action Reinforcement Learning to optimize code performance, achieving results comparable to or better than TensorFlow.
http://arxiv.org/abs/2409.11064v1
Compressor summary: The paper proposes a novel Hybrid Multi-Factor framework using a Transformer encoder to predict intraoperative hypotension as a blood pressure forecasting task, addressing distribution shift and sequence dependencies.
http://arxiv.org/abs/2409.11059v1
Compressor summary: OneEncoder is a lightweight framework for cross-modal alignment learning that efficiently integrates information from image, text, audio, and video modalities using a Universal Projection module.
http://arxiv.org/abs/2409.11057v1
Compressor summary: KVPruner is a method that improves the efficiency and speed of large language models by pruning non-essential key-value channels based on global perplexity analysis, requiring only minimal recovery training.
http://arxiv.org/abs/2409.11056v1
Compressor summary: MLPrompt is a new method that helps LLMs understand and follow complex rules by translating them into different languages, leading to better performance than existing methods in various tasks.
http://arxiv.org/abs/2409.11055v1
Compressor summary: The paper evaluates how different quantization methods affect the performance of large language models on various tasks, finding that larger models tolerate quantization to a comparable size better than smaller ones, and that weight-only methods often yield better results in larger models.
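As a concrete flavor of the weight-only family such comparisons cover, here is a minimal symmetric per-channel int8 (absmax) quantizer; this is a generic baseline sketch, not any specific method from the study:

```python
# Symmetric per-output-channel int8 weight quantization (absmax scaling).
import numpy as np

def quantize_int8(W: np.ndarray):
    scale = np.maximum(np.abs(W).max(axis=1, keepdims=True) / 127.0, 1e-12)
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

W = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(W)
print(np.abs(W - dequantize(q, s)).max())   # small per-channel round-off error
```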
http://arxiv.org/abs/2409.11052v1
Compressor summary: The text discusses a method to evaluate binary classifiers by using axioms that ensure logical consistency and allow proving malfunctions with unlabeled data, which has applications in safe AI.
http://arxiv.org/abs/2409.11051v1
Compressor summary: Our method improves ultra-fine-grained image recognition accuracy using down-sampling inter-layer adapters, with fewer parameters, fewer floating-point operations, and a frozen backbone.
http://arxiv.org/abs/2409.11041v1
Compressor summary: The paper explores using large language models to create natural language instructions for collaborative robots in assembly tasks, finding that they can generate accurate first-order code but struggle with higher-order code.
http://arxiv.org/abs/2409.11032v1
Compressor summary: A hierarchical framework using large language models can reveal argumentative patterns in textual narratives, as demonstrated by analyzing public opinions on generative AI.
http://arxiv.org/abs/2409.11028v1
Compressor summary: The authors develop a computer vision pipeline to analyze natural images and find that numerosity perception follows a power law distribution and is correlated with other continuous magnitudes.
http://arxiv.org/abs/2409.11024v1
Compressor summary: D2Vformer is a novel model that uses date2vec to generate time position embeddings and an attention mechanism to make predictions on time series data, outperforming existing methods in various scenarios.
http://arxiv.org/abs/2409.11022v1
Compressor summary: The paper defines GEIC, a new task for LLM-based extraction and in-context classification, and proposes CascadeNER, a framework that cascades two small LLMs for few-shot and zero-shot NER, achieving state-of-the-art results on the new multilingual AnythingNER dataset, especially in low-resource and fine-grained scenarios.
http://arxiv.org/abs/2409.11010v1
Compressor summary: The paper introduces MM2Latent, a practical framework for multimodal image generation and editing that improves controllability and efficiency over existing methods.
http://arxiv.org/abs/2409.11008v1
Compressor summary: The paper introduces LMM-VAE, a scalable and interpretable model that combines linear mixed models and variational autoencoders for modelling longitudinal data.
http://arxiv.org/abs/2409.11007v1
Compressor summary: The paper introduces a new test (CAST) to evaluate vision-language models' consistency across visual and language inputs, which is important for their generalization abilities.
http://arxiv.org/abs/2409.10999v1
Compressor summary: This paper studies how to improve audio language models for low-resource languages like Thai, and proposes Typhoon-Audio, which performs better than existing models and is comparable to state-of-the-art Gemini-1.5-Pro in English and Thai.
http://arxiv.org/abs/2409.10997v1
Compressor summary: The paper introduces a dataset with different types of adversarial noises to test how well question-answering models perform under realistic distorted inputs.
http://arxiv.org/abs/2409.10996v1
Compressor summary: The paper presents a novel framework for interpreting temporal graph regression models using Information Bottleneck and prototype-based methods, which improves performance and interpretability on traffic datasets.
http://arxiv.org/abs/2409.10994v1
Compressor summary: TRIM is a new approach to improve the efficiency of Multimodal Large Language Models by reducing image tokens using the CLIP metric, achieving significant computational savings without sacrificing performance.
http://arxiv.org/abs/2409.10989v1
Compressor summary: The paper presents GOSt-MT, a Knowledge Graph that analyzes gender bias in machine translation across multiple languages by integrating labour data and textual corpora.
http://arxiv.org/abs/2409.10967v1
Compressor summary: The paper proposes two improvements to relative representations for zero-shot model stitching: normalization and topological densification, leading to better performance on a natural language task.
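For context, a relative representation encodes each sample by its cosine similarities to a fixed set of anchor points, which is what makes embeddings from independently trained encoders comparable; the normalization and topological densification proposed here sit on top of this. A minimal sketch of the base construction (names illustrative):

```python
# Relative representations: describe each sample by its cosine similarities
# to shared anchors, yielding coordinates comparable across encoders.
import numpy as np

def relative_representation(X: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T   # (n_samples, n_anchors)

# Two independently trained encoders produce different absolute embeddings but
# similar relative ones, which is what enables zero-shot model stitching.
```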
http://arxiv.org/abs/2409.10965v1
Compressor summary: This study compares monolingual and multilingual NLP models for cross-lingual transfer between Kinyarwanda and Kirundi, finding that multilingual models transfer better while monolingual models remain competitive.
http://arxiv.org/abs/2409.10956v1
Compressor summary: The paper proposes ICON, a framework for Versatile Incremental Learning (VIL) that tackles class and domain confusion using CAST regularization and an Incremental Classifier to avoid overwriting and accumulate new knowledge effectively.
http://arxiv.org/abs/2409.10955v1
Compressor summary: This study examines how memory strength and evidence presentation affect Large Language Models' context-faithfulness when incorporating external information into their responses.
http://arxiv.org/abs/2409.10951v1
Compressor summary: FairAD is a new anomaly detection method that ensures fairness in imbalanced scenarios by using contrastive learning and rebalancing autoencoder modules.
http://arxiv.org/abs/2409.10944v1
Compressor summary: The text introduces Contrasformer, a novel contrastive brain network Transformer that improves the identification of neurological disorders using prior-knowledge-enhanced contrast graphs and attention mechanisms.
http://arxiv.org/abs/2409.10942v1
Compressor summary: This paper shows how reducing data acquisition rates can improve the efficiency, energy consumption, and latency of TinyML models for time series classification on IoT devices with minimal accuracy loss.
http://arxiv.org/abs/2409.10927v1
Compressor summary: Propulsion is a novel method that efficiently fine-tunes large language models for specific tasks by selectively scaling pre-trained dimensions without modifying the model's parameters, reducing computational overhead and maintaining performance.
http://arxiv.org/abs/2409.10921v1
Compressor summary: The paper introduces KALE, a model that generates detailed and meaningful captions for fine-art paintings by using artwork metadata as additional knowledge.
http://arxiv.org/abs/2409.10917v1
Compressor summary: AMEGO enhances comprehension of long egocentric videos by constructing self-contained representations from them, and it is evaluated on the new Active Memories Benchmark.
http://arxiv.org/abs/2409.10910v1
Compressor summary: PINN is a framework that uses deep learning to solve physical problems involving moving boundaries, such as solidification of alloys, by integrating physics knowledge and constraints into neural networks.
http://arxiv.org/abs/2409.10907v1
Compressor summary: Attention-Seeker is an unsupervised keyphrase extraction method that uses self-attention maps from a Large Language Model to estimate the importance of phrases without manual tuning, achieving state-of-the-art results on four datasets.
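The raw ingredient here, self-attention maps, is easy to pull from any open model. A minimal sketch that ranks tokens by the attention they receive in the last layer (the model choice and aggregation are illustrative; the paper's estimation of which layers and heads matter is more involved):

```python
# Ranking tokens by received self-attention mass, in the spirit of
# attention-based keyphrase extraction. BERT is a stand-in model choice.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

text = "Attention maps can rank candidate keyphrases without manual tuning."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    attn = model(**inputs).attentions       # tuple of (batch, heads, seq, seq)

# Attention each token *receives*, averaged over the last layer's heads:
received = attn[-1][0].mean(dim=0).sum(dim=0)        # (seq_len,)
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
for t, s in sorted(zip(tokens, received.tolist()), key=lambda x: -x[1])[:5]:
    print(t, round(s, 3))
```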
http://arxiv.org/abs/2409.10898v1
Compressor summary: The paper presents a hybrid deep learning model that combines CNN and RNN to capture spatial and temporal patterns, forecasting Nepal's seasonal water quality from a small dataset more accurately than conventional methods and providing a reliable tool for water quality control.
http://arxiv.org/abs/2409.10897v1
Compressor summary: AutoSpec is a framework that automatically generates accurate specifications for neural networks in learning-augmented systems, overcoming the limitations of manual specification processes.
http://arxiv.org/abs/2409.10889v1
Compressor summary: SFake is a new real-time deepfake detection method that uses mechanical vibrations to identify face swapping in videos.
http://arxiv.org/abs/2409.10883v1
Compressor summary: CREAM is a novel framework that evaluates meeting summarizations without using references by combining chain-of-thought reasoning and key facts alignment with an ELO ranking system.
http://arxiv.org/abs/2409.10874v1
Compressor summary: The study compares the performance of Transformer and Seq2Seq models in sign language translation and tests the impact of adding ResidualLSTM to the Transformer model, finding that it reduces the BLEU score by 23.37%.
http://arxiv.org/abs/2409.10870v1
Compressor summary: The paper introduces adaptive computations for LLMs, using attention shortcuts to make architectures depth- and context-adaptive and improving performance on various datasets.
http://arxiv.org/abs/2409.10848v1
Compressor summary: 3DFacePolicy is a diffusion policy model for realistic 3D facial animation prediction using audio and vertex states to imitate human emotions.
http://arxiv.org/abs/2409.10847v1
Compressor summary: Bidirectional Autoregressive Diffusion (BAD) combines autoregressive and mask-based models to improve sequence modeling by using a permutation-based corruption technique that preserves structure and enforces causality, leading to better text-to-motion generation.
http://arxiv.org/abs/2409.10840v1
Compressor summary: This study evaluates the reasoning abilities of deep time series forecasting models and finds evidence of effective generalization beyond pattern memorization.
http://arxiv.org/abs/2409.10838v1
Compressor summary: The paper applies ML techniques to police dispatch call data from San Jose, CA, categorizing calls into priority levels based on time, place, and nature, and finds Random Forest models most effective at identifying dangerous situations with high accuracy and low false negatives, improving urban safety outcomes.
http://arxiv.org/abs/2409.10829v1
Compressor summary: ReXErr is a method that uses large language models to generate realistic errors in chest X-ray reports for improving radiology reporting quality and reliability.