arxiv compressed, 2024-09-18

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-18, generated by the compressor, my personal LLM-based project.


Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

http://arxiv.org/abs/2409.11406v1

Compressor summary: Phidias is a novel generative model that uses diffusion and reference-augmented 3D generation to improve quality, generalization, and controllability in 3D modeling.


AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs

http://arxiv.org/abs/2409.11404v1

Compressor summary: The paper introduces seven synthetic datasets for underrepresented Arabic dialects, a benchmark for evaluating LLMs on dialect comprehension and generation, and highlights challenges in capturing diverse Arabic dialects and cultural contexts.


NVLM: Open Frontier-Class Multimodal LLMs

http://arxiv.org/abs/2409.11402v1

Compressor summary: NVLM 1.0 is a state-of-the-art multimodal language model that outperforms leading models on vision-language tasks, with a novel architecture and curation of high-quality datasets.


Says Who? Effective Zero-Shot Annotation of Focalization

http://arxiv.org/abs/2409.11390v1

Compressor summary: The paper evaluates Large Language Models' performance in identifying narrative focalization and shows their potential for studying literary texts.


Normalization in Proportional Feature Spaces

http://arxiv.org/abs/2409.11389v1

Compressor summary: The text discusses feature normalization methods for data analysis and modeling, focusing on uniform and proportional features and how they compare, with examples of normalization and similarity measures.


Ultrasound Image Enhancement with the Variance of Diffusion Models

http://arxiv.org/abs/2409.11380v1

Compressor summary: The paper presents a novel method that uses adaptive beamforming and denoising diffusion to enhance ultrasound images by balancing contrast, resolution, and speckle preservation.


Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement

http://arxiv.org/abs/2409.11378v1

Compressor summary: This paper proposes a data selection method for fine-tuning large language models that focuses on diversity rather than quality, using k-means clustering and iterative refinement to improve performance across various tasks.
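
The summary names the two key ingredients, k-means clustering and iterative refinement. A minimal sketch of the clustering half, assuming instruction embeddings are already computed (the embedding model, cluster count, and round-robin sampling rule are illustrative choices, and the paper's iterative-refinement step is omitted):

```python
import numpy as np
from sklearn.cluster import KMeans

def diverse_select(embeddings: np.ndarray, budget: int, n_clusters: int = 100,
                   seed: int = 0) -> np.ndarray:
    """Pick `budget` indices spread across k-means clusters (budget <= len(embeddings))."""
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(embeddings)
    members = {c: np.where(km.labels_ == c)[0] for c in range(n_clusters)}
    # Within each cluster, rank members by distance to the centroid.
    rank = {c: members[c][np.argsort(np.linalg.norm(
                embeddings[members[c]] - km.cluster_centers_[c], axis=1))]
            for c in range(n_clusters)}
    selected, i = [], 0
    # Round-robin over clusters so every region of embedding space contributes.
    while len(selected) < budget:
        c, r = i % n_clusters, i // n_clusters
        if r < len(rank[c]):
            selected.append(rank[c][r])
        i += 1
    return np.asarray(selected)
```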


Machine Learning on Dynamic Functional Connectivity: Promise, Pitfalls, and Interpretations

http://arxiv.org/abs/2409.11377v1

Compressor summary: The text summarizes the authors' study, which uses existing fMRI data to understand human cognition and behavior, evaluates current deep models for cognitive task recognition and disease diagnosis, and provides guidelines for selecting suitable machine learning backbones for new neuroimaging applications.


Towards Time Series Reasoning with LLMs

http://arxiv.org/abs/2409.11376v1

Compressor summary: The paper proposes a multi-modal time-series language model that extracts and reasons about time-series information using a lightweight encoder and chain-of-thought augmentation, achieving zero-shot performance across various domains.


Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification

http://arxiv.org/abs/2409.11375v1

Compressor summary: The authors propose a self-supervised framework using large language models and SwinV2 to improve retinal disease diagnosis from multi-modal data, enhancing generalization and performance on smaller datasets.


OSV: One Step is Enough for High-Quality Image to Video Generation

http://arxiv.org/abs/2409.11367v1

Compressor summary: The authors propose a two-stage training framework that combines consistency distillation with GAN training to accelerate video diffusion, leading to high-quality videos in one step and outperforming existing methods.


CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration

http://arxiv.org/abs/2409.11365v1

Compressor summary: The paper explores how to enhance the safety-awareness of multimodal language models against malicious image inputs using a technique called CoCA.


CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

http://arxiv.org/abs/2409.11363v1

Compressor summary: CORE-Bench is a benchmark for measuring AI agents' accuracy in performing computational reproducibility tasks across three disciplines, aiming to improve scientific processes and agent development.


Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

http://arxiv.org/abs/2409.11355v1

Compressor summary: The paper proposes a faster and more accurate monocular depth estimator by fixing an inference pipeline flaw, fine-tuning the model with task-specific losses, and applying the method to Stable Diffusion.


THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models

http://arxiv.org/abs/2409.11353v1

Compressor summary: THaMES is a framework for detecting and mitigating hallucinations in large language models using various strategies and automated test set generation.


CLIP Adaptation by Intra-modal Overlap Reduction

http://arxiv.org/abs/2409.11338v1

Compressor summary: The text analyses how the CLIP model's high cosine similarity between paired and unpaired images affects few-shot classification and proposes a lightweight adapter that reduces this intra-modal overlap, improving performance and robustness.


Reducing Catastrophic Forgetting in Online Class Incremental Learning Using Self-Distillation

http://arxiv.org/abs/2409.11329v1

Compressor summary: The paper proposes a self-distillation method to solve catastrophic forgetting in continual learning and improves it with a memory update technique that prioritizes storing misclassified samples.


TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem

http://arxiv.org/abs/2409.11325v1

Compressor summary: TopoMask is a new method that uses mask-based instance representations and attention-based transformers to improve centerline prediction from road images, achieving state-of-the-art performance on the OpenLane-V2 dataset.


LPT++: Efficient Training on Mixture of Long-tailed Experts

http://arxiv.org/abs/2409.11323v1

Compressor summary: LPT++ is a framework for long-tailed classification that combines fine-tuning, model ensemble, and three core components to improve Vision Transformers' performance with minimal additional parameters.


SOAP: Improving and Stabilizing Shampoo using Adam

http://arxiv.org/abs/2409.11321v1

Compressor summary: SOAP is a computationally efficient optimization algorithm that combines the benefits of Shampoo and Adam, reducing the number of iterations and wall clock time for large-scale language model pre-training.


fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction

http://arxiv.org/abs/2409.11315v1

Compressor summary: The paper introduces the fMRI-3D dataset, a collection of fMRI data and 3D object images with text captions, and presents MinD-3D, a framework to reconstruct 3D objects from fMRI signals using a generative transformer decoder.


SpMis: An Investigation of Synthetic Spoken Misinformation Detection

http://arxiv.org/abs/2409.11308v1

Compressor summary: The text discusses the advances and challenges of speech generation technology, especially in detecting misinformation in synthetic spoken content, and introduces an open-source dataset (SpMis) to study this issue.


GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module

http://arxiv.org/abs/2409.11307v1

Compressor summary: GS-Net is a plug-and-play module that improves 3D Gaussian Splatting by densifying Gaussian ellipsoids from sparse point clouds, achieving better generalization and rendering quality on novel viewpoints using the CARLA-NVS dataset.


Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models

http://arxiv.org/abs/2409.11302v1

Compressor summary: The text discusses using Parameter-Efficient Fine-Tuning techniques for time series models in healthcare applications, particularly forecasting vital signs of sepsis patients, and shows that some methods outperform existing approaches while fine-tuning fewer parameters.


Navigating Process Mining: A Case study using pm4py

http://arxiv.org/abs/2409.11294v1

Compressor summary: The paper uses pm4py to analyze road traffic fine management processes and discover their models, patterns, and limitations using various process-mining techniques.


Neural Networks for Vehicle Routing Problem

http://arxiv.org/abs/2409.11290v1

Compressor summary: The text discusses using neural networks as a new tool for optimizing vehicle routes, presenting a novel graph neural network model and demonstrating its efficiency through tests.


Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5

http://arxiv.org/abs/2409.11282v1

Compressor summary: The paper proposes a method to transfer knowledge from a proprietary LLM to a more accessible one, enabling better document understanding with limited resources.


Machine Learning and Theory Ladenness -- A Phenomenological Account

http://arxiv.org/abs/2409.11277v1

Compressor summary: This text discusses how machine learning methods are influenced by the domain theories they are applied to, arguing that both theory-dependent and theory-independent perspectives are oversimplified.


Task Arithmetic for Language Expansion in Speech Translation

http://arxiv.org/abs/2409.11274v1

Compressor summary: The authors propose an augmented task arithmetic method for expanding speech-text multimodal foundation models to new language pairs by using a language control model to prevent confusion and improve translation quality.


LOLA -- An Open-Source Massively Multilingual Large Language Model

http://arxiv.org/abs/2409.11272v1

Compressor summary: LOLA is a large language model that works well across many languages by using a sparse, expert-routing architecture.


Geometry Aware Meta-Learning Neural Network for Joint Phase and Precoder Optimization in RIS

http://arxiv.org/abs/2409.11270v1

Compressor summary: The paper proposes a neural network that optimizes the precoder and phase shifts for reconfigurable intelligent surface-assisted systems, achieving better performance in terms of rate, power consumption, and convergence speed.


The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives

http://arxiv.org/abs/2409.11261v1

Compressor summary: The paper presents an education tool that uses Generative AI to create interactive stories for children by combining narrative co-creation, text-to-speech, and text-to-video generation.


Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers

http://arxiv.org/abs/2409.11256v1

Compressor summary: TAP is an unsupervised video denoising method that uses a pre-trained image denoiser with tunable temporal modules to harness temporal information and improve denoising performance.


Norm of Mean Contextualized Embeddings Determines their Variance

http://arxiv.org/abs/2409.11253v1

Compressor summary: The study analyzes how the norm and variance of contextualized embeddings vary by context and layer in Transformer models, finding a trade-off relationship and a decomposition into within-cluster and between-cluster variances.
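
The within-/between-cluster decomposition the summary mentions is an instance of the law of total variance; with embeddings x grouped into clusters c, it reads (notation here is illustrative, not necessarily the paper's):

```latex
\operatorname{Var}(x) =
\underbrace{\mathbb{E}_c\!\left[\operatorname{Var}(x \mid c)\right]}_{\text{within-cluster}}
+ \underbrace{\operatorname{Var}_c\!\left(\mathbb{E}[x \mid c]\right)}_{\text{between-cluster}}
```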


WER We Stand: Benchmarking Urdu ASR Models

http://arxiv.org/abs/2409.11252v1

Compressor summary: The paper evaluates different ASR models for Urdu, comparing their performance on read and conversational speech using WER and error analysis, and highlighting the challenges of developing robust ASR systems for low-resource languages.


Linear Recency Bias During Training Improves Transformers' Fit to Reading Times

http://arxiv.org/abs/2409.11250v1

Compressor summary: The paper evaluates a modified Transformer model with ALiBi, which simulates memory decay and improves its fit to human reading times and sentence processing difficulty.
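
ALiBi itself is a published mechanism: each head subtracts a head-specific slope times the query-key distance from the attention logits, acting as a linear recency bias. A minimal sketch of the bias term (slopes follow the original ALiBi recipe; the paper's reading-time regression setup is not shown):

```python
import torch

def alibi_bias(seq_len: int, n_heads: int) -> torch.Tensor:
    """Per-head linear distance penalty to add to causal attention logits."""
    # Geometric head slopes as in the ALiBi paper: 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0)  # distance to each past token
    return -slopes[:, None, None] * dist               # (heads, seq, seq)

# Usage inside attention: logits = q @ k.transpose(-1, -2) / d ** 0.5 + alibi_bias(T, H)
```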


Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

http://arxiv.org/abs/2409.11242v1

Compressor summary: The authors introduce Trust-Score, a metric to evaluate the trustworthiness of LLMs in RAG systems, and propose Trust-Align, a framework to improve LLMs' performance on RAG tasks.


Spontaneous Informal Speech Dataset for Punctuation Restoration

http://arxiv.org/abs/2409.11241v1

Compressor summary: SponSpeech is a new dataset for punctuation restoration in spontaneous speech, with a filtering pipeline to generate more data and a challenging test set.


LLM-as-a-Judge & Reward Model: What They Can and Cannot Do

http://arxiv.org/abs/2409.11239v1

Compressor summary: The paper analyzes automated evaluators' performance outside of English, finding that English skills transfer well but LLMs have issues with errors and unwanted language in non-English settings.


Cost-informed dimensionality reduction for structural digital twin technologies

http://arxiv.org/abs/2409.11236v1

Compressor summary: The paper presents a decision-theoretic method for dimensionality reduction in structural asset management, balancing misclassification costs and preserving discriminatory information.


SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking

http://arxiv.org/abs/2409.11235v1

Compressor summary: The paper introduces SLAck, a unified framework that uses semantics, location, and appearance priors to improve open-vocabulary multiple object tracking performance.


STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

http://arxiv.org/abs/2409.11234v1

Compressor summary: The proposed Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT) uses historical embedding features to improve target recognition and location in UAV videos, achieving state-of-the-art performance.


Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models

http://arxiv.org/abs/2409.11233v1

Compressor summary: This study evaluates compression methods for large language models, showing that SparseGPT and Wanda maintain perplexity but degrade downstream task performance, and introduces JS Divergence as a better metric while emphasizing the importance of task-specific calibration data.
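
Jensen-Shannon divergence between the original and compressed models' next-token distributions can be computed directly; a minimal sketch (how the paper aggregates it over positions and corpora is not given in the summary):

```python
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# e.g. compare next-token distributions of the original and compressed model:
# jsd = js_divergence(softmax(logits_original), softmax(logits_pruned))
```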


Fast Analysis of the OpenAI O1-Preview Model in Solving Random K-SAT Problem: Does the LLM Solve the Problem Itself or Call an External SAT Solver?

http://arxiv.org/abs/2409.11232v1

Compressor summary: The paper analyzes how the OpenAI O1-preview model performs on random K-SAT problems, asking whether it solves them itself or calls an external SAT solver, and whether it shows any sign of intelligent behavior or just makes random guesses.


Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models

http://arxiv.org/abs/2409.11219v1

Compressor summary: This paper proposes Score Forgetting Distillation (SFD), an innovative machine unlearning method that promotes the forgetting of undesirable information in diffusion models by aligning the conditional scores of "unsafe" classes with those of "safe" ones, without requiring real data.


Exploring ChatGPT-based Augmentation Strategies for Contrastive Aspect-based Sentiment Analysis

http://arxiv.org/abs/2409.11218v1

Compressor summary: The text discusses using ChatGPT, a large language model, for data augmentation in aspect-based sentiment analysis, improving performance with three strategies and contrastive learning.


Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization

http://arxiv.org/abs/2409.11212v1

Compressor summary: The UPO framework uses uncertainty estimation and reliable feedback sampling to improve large language models' self-evolution and response generation in iterative preference optimization.


SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

http://arxiv.org/abs/2409.11211v1

Compressor summary: The paper proposes an optimization strategy to improve 3D Gaussian Splatting, a method for reconstructing 3D scenes from multi-view images, by modeling splat features as outputs of an implicit neural field.


Synthetic data augmentation for robotic mobility aids to support blind and low vision people

http://arxiv.org/abs/2409.11164v1

Compressor summary: The study shows that using synthetic data can improve the performance of deep learning-based vision models for robotic mobility aids for blind and low-vision people, but also highlights their limitations compared to real-world data.


SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration

http://arxiv.org/abs/2409.11149v1

Compressor summary: SAGED is a benchmarking pipeline that detects and mitigates biases in large language models by using counterfactual branching and baseline calibration.


Improving the Efficiency of Visually Augmented Language Models

http://arxiv.org/abs/2409.11148v1

Compressor summary: This paper introduces BLIND-VALM, a visually-augmented LM that uses text representations from CLIP instead of images, achieving similar performance to existing methods with less computation and complexity.


Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning

http://arxiv.org/abs/2409.11147v1

Compressor summary: RGER is a novel method that uses graph kernels to select exemplars for in-context learning based on both semantic and structural similarity, improving the performance of large language models on reasoning tasks.


Semformer: Transformer Language Models with Semantic Planning

http://arxiv.org/abs/2409.11143v1

Compressor summary: The paper proposes Semformer, a new method for training Transformer language models that uses planning tokens to guide semantic representation prediction, reducing shortcut learning and improving performance on various tasks.


Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations

http://arxiv.org/abs/2409.11140v1

Compressor summary: The paper analyses scale generalisation in Gaussian derivative networks using new datasets, pooling methods, regularisation, and discrete approximations, demonstrating good performance and explainability.


Learning Generalized Hamiltonians using fully Symplectic Mappings

http://arxiv.org/abs/2409.11138v1

Compressor summary: The text describes how physics-informed neural networks, especially Hamiltonian neural networks, improve on standard neural networks by incorporating physical invariances and conserving energy, and proposes a method to reconstruct and conserve Hamiltonians for generalized non-separable systems using symplectic integrators.


Can Graph Reordering Speed Up Graph Neural Network Training? An Experimental Study

http://arxiv.org/abs/2409.11129v1

Compressor summary: Graph reordering optimizes GNN training by improving memory access patterns, reducing training time across different systems and hyperparameter settings.
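
One classic locality-improving reordering is reverse Cuthill-McKee, available in SciPy; a minimal sketch of relabeling a graph before GNN training (whether the study evaluates RCM specifically is not stated in the summary):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Toy adjacency matrix; in practice this is the training graph's sparse adjacency.
adj = csr_matrix(np.array([[0, 1, 0, 1],
                           [1, 0, 1, 0],
                           [0, 1, 0, 0],
                           [1, 0, 0, 0]], dtype=np.int8))

perm = reverse_cuthill_mckee(adj, symmetric_mode=True)  # new node ordering
adj = adj[perm][:, perm]  # relabel rows/cols; node features become features[perm]
```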


Genetic Information Analysis of Age-Related Macular Degeneration Fellow Eye Using Multi-Modal Selective ViT

http://arxiv.org/abs/2409.11128v1

Compressor summary: The paper proposes a machine learning method to predict AMD susceptibility genes using fundus and OCT images, as well as medical records, achieving over 80% accuracy.


Gradient-free Post-hoc Explainability Using Distillation Aided Learnable Approach

http://arxiv.org/abs/2409.11123v1

Compressor summary: DAX is a framework that generates saliency-based explanations for deep models without using gradients or model-specific information, and it performs better than existing methods in various settings.


Diversity-grounded Channel Prototypical Learning for Out-of-Distribution Intent Detection

http://arxiv.org/abs/2409.11114v1

Compressor summary: The study proposes a new fine-tuning framework for large language models to enhance intent classification for task-oriented dialogue systems, using semantic matching with prototypes derived from class names.


Strategic Insights in Human and Large Language Model Tactics at Word Guessing Games

http://arxiv.org/abs/2409.11112v1

Compressor summary: The paper investigates how players strategize and learn in a word-guessing game over two years and tests large language models' abilities to understand and play the game in different languages.


Quantitative Evaluation of MILs' Reliability For WSIs Classification

http://arxiv.org/abs/2409.11110v1

Compressor summary: The paper compares the reliability of different models for classifying Whole Slide Images in pathology using three metrics and datasets, and finds the MEAN-POOL-INS model to be the most reliable.


Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB

http://arxiv.org/abs/2409.11104v1

Compressor summary: The paper proposes a method to estimate 3D human pose from single RGB images by hallucinating depth information using a heatmap-based estimator and Privileged Information.


Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier

http://arxiv.org/abs/2409.11100v1

Compressor summary: The text describes a method to improve naive Bayes classification by estimating variable weights and using sparse regularization.
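
Generically, a weighted selective naive Bayes raises each feature likelihood to a continuous weight and penalizes the weights for sparsity; in hedged form (an illustrative objective, not necessarily the paper's exact formulation):

```latex
% Continuous weights w_i in [0,1] softly select features;
% lambda controls the sparsity regularizer Omega (illustrative form).
P(y \mid \mathbf{x}) \propto P(y) \prod_{i=1}^{d} p(x_i \mid y)^{w_i},
\qquad
\min_{\mathbf{w} \in [0,1]^d} \; -\log L(\mathbf{w}) + \lambda \, \Omega(\mathbf{w})
```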


MonoKAN: Certified Monotonic Kolmogorov-Arnold Network

http://arxiv.org/abs/2409.11078v1

Compressor summary: MonoKAN is a novel ANN architecture that combines the interpretability of KAN with certified partial monotonicity using cubic Hermite splines and positive weights.


ShapeAug++: More Realistic Shape Augmentation for Event Data

http://arxiv.org/abs/2409.11075v1

Compressor summary: ShapeAug++ is a method that enhances occlusion handling in DVS event data using random polygons and curved movements, leading to improved top-1 accuracy for DVS classification.


RoMath: A Mathematical Reasoning Benchmark in Romanian

http://arxiv.org/abs/2409.11074v1

Compressor summary: RoMath is a Romanian mathematical reasoning benchmark suite that aims to improve non-English language models and promote multilingual AI development by covering various domains and difficulty levels in mathematics.


Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression algorithms

http://arxiv.org/abs/2409.11071v1

Compressor summary: The study found that mixed precision and carefully chosen hyper-parameters can reduce power consumption in regression ML models, but observed no statistically significant difference between the techniques or dataset formats tested.
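
In PyTorch, mixed-precision training is a few lines with torch.cuda.amp; a minimal sketch for a regression model (the study's actual frameworks, models, and power-measurement setup are not specified in the summary):

```python
import torch

model = torch.nn.Linear(64, 1).cuda()
opt = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()  # rescales gradients to avoid fp16 underflow

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    opt.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass runs in reduced precision
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(opt)
    scaler.update()
    return loss.item()
```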


A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler

http://arxiv.org/abs/2409.11068v1

Compressor summary: The project introduces a new RL environment for the MLIR compiler that uses Multi-Action Reinforcement Learning to optimize code performance, achieving results comparable to or better than TensorFlow.


HMF: A Hybrid Multi-Factor Framework for Dynamic Intraoperative Hypotension Prediction

http://arxiv.org/abs/2409.11064v1

Compressor summary: The paper proposes a novel Hybrid Multi-Factor framework using a Transformer encoder to predict intraoperative hypotension as a blood pressure forecasting task, addressing distribution shift and sequence dependencies.


OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities

http://arxiv.org/abs/2409.11059v1

Compressor summary: OneEncoder is a lightweight framework for cross-modal alignment learning that efficiently integrates information from image, text, audio, and video modalities using a Universal Projection module.


KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models

http://arxiv.org/abs/2409.11057v1

Compressor summary: KVPruner is a method to improve efficiency and speed of large language models by pruning non-essential key-value channels using global perplexity analysis and requiring minimal recovery training.


Large Language Models are Good Multi-lingual Learners : When LLMs Meet Cross-lingual Prompts

http://arxiv.org/abs/2409.11056v1

Compressor summary: MLPrompt is a new method that helps LLMs understand and follow complex rules by translating them into different languages, leading to better performance than existing methods in various tasks.


A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B

http://arxiv.org/abs/2409.11055v1

Compressor summary: The paper evaluates how different quantization methods affect the performance of large language models on various tasks, finding that a quantized larger model generally outperforms a smaller full-precision model of similar size, and that weight-only methods often yield better results in larger models.


A logical alarm for misaligned binary classifiers

http://arxiv.org/abs/2409.11052v1

Compressor summary: The text discusses a method to evaluate binary classifiers by using axioms that ensure logical consistency and allow proving malfunctions with unlabeled data, which has applications in safe AI.


Down-Sampling Inter-Layer Adapter for Parameter and Computation Efficient Ultra-Fine-Grained Image Recognition

http://arxiv.org/abs/2409.11051v1

Compressor summary: Our method improves ultra-fine-grained image recognition accuracy with fewer parameters, fewer floating-point operations, and a frozen backbone, using down-sampling inter-layer adapters.
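
The paper's exact adapter design isn't given in the summary; below is a generic bottleneck adapter with token down-sampling, purely illustrative of how such a module can cut parameters and FLOPs while the backbone stays frozen:

```python
import torch
import torch.nn.functional as F

class DownsampleAdapter(torch.nn.Module):
    """Illustrative bottleneck adapter: pool tokens, project down/up, add residually."""
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = torch.nn.Linear(dim, bottleneck)
        self.up = torch.nn.Linear(bottleneck, dim)
        self.act = torch.nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, D) tokens
        # Halve the token count to cut FLOPs, then restore length for the residual.
        h = F.avg_pool1d(x.transpose(1, 2), kernel_size=2).transpose(1, 2)
        h = self.up(self.act(self.down(h)))
        h = F.interpolate(h.transpose(1, 2), size=x.shape[1]).transpose(1, 2)
        return x + h  # backbone weights stay frozen; only adapter parameters train
```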


Towards No-Code Programming of Cobots: Experiments with Code Synthesis by Large Code Models for Conversational Programming

http://arxiv.org/abs/2409.11041v1

Compressor summary: The paper explores using large language models to turn natural language instructions into code for collaborative robots in assembly tasks, finding that they can generate accurate first-order code but struggle with higher-order code.


Hierarchical Narrative Analysis: Unraveling Perceptions of Generative AI

http://arxiv.org/abs/2409.11032v1

Compressor summary: A hierarchical framework using large language models can reveal argumentative patterns in textual narratives, as demonstrated by analyzing public opinions on generative AI.


Estimating the distribution of numerosity and non-numerical visual magnitudes in natural scenes using computer vision

http://arxiv.org/abs/2409.11028v1

Compressor summary: The authors develop a computer vision pipeline to analyze natural images and find that numerosity perception follows a power law distribution and is correlated with other continuous magnitudes.


D2Vformer: A Flexible Time Series Prediction Model Based on Time Position Embedding

http://arxiv.org/abs/2409.11024v1

Compressor summary: D2Vformer is a novel model that uses date2vec to generate time position embeddings and an attention mechanism to make predictions on time series data, outperforming existing methods in various scenarios.
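
Date2Vec's details aren't given in the summary, but the closely related Time2Vec illustrates learnable time-position embeddings: one linear (trend) component plus periodic sinusoidal components. A hedged sketch:

```python
import torch

class Time2Vec(torch.nn.Module):
    """t2v(t)[0] = w0*t + b0 (trend); t2v(t)[i] = sin(wi*t + bi) (periodicity)."""
    def __init__(self, dim: int):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(dim))
        self.b = torch.nn.Parameter(torch.randn(dim))

    def forward(self, t: torch.Tensor) -> torch.Tensor:  # t: (..., 1)
        v = self.w * t + self.b
        return torch.cat([v[..., :1], torch.sin(v[..., 1:])], dim=-1)

emb = Time2Vec(16)(torch.arange(10.0).unsqueeze(-1))  # (10, 16) time embeddings
```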


GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models

http://arxiv.org/abs/2409.11022v1

Compressor summary: The paper introduces GEIC, a new task that uses LLMs for extraction and in-context classification in NER, and proposes CascadeNER, a framework that cascades two small LLMs for few-shot and zero-shot NER, achieving state-of-the-art results on the new multilingual AnythingNER dataset, especially in low-resource and fine-grained scenarios.


MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance

http://arxiv.org/abs/2409.11010v1

Compressor summary: The paper introduces MM2Latent, a practical framework for multimodal image generation and editing that improves controllability and efficiency over existing methods.


Latent mixed-effect models for high-dimensional longitudinal data

http://arxiv.org/abs/2409.11008v1

Compressor summary: The paper introduces LMM-VAE, a scalable and interpretable model that combines linear mixed models and variational autoencoders for modelling longitudinal data.


CAST: Cross-modal Alignment Similarity Test for Vision Language Models

http://arxiv.org/abs/2409.11007v1

Compressor summary: The paper introduces a new test (CAST) to evaluate vision language models' consistency across visual and language inputs, which is important for their generalization abilities.


Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models

http://arxiv.org/abs/2409.10999v1

Compressor summary: This paper studies how to improve audio language models for low-resource languages like Thai, and proposes Typhoon-Audio, which performs better than existing models and is comparable to state-of-the-art Gemini-1.5-Pro in English and Thai.


Contextual Breach: Assessing the Robustness of Transformer-based QA Models

http://arxiv.org/abs/2409.10997v1

Compressor summary: The paper introduces a dataset with different types of adversarial noises to test how well question-answering models perform under realistic distorted inputs.


GINTRIP: Interpretable Temporal Graph Regression using Information bottleneck and Prototype-based method

http://arxiv.org/abs/2409.10996v1

Compressor summary: The paper presents a novel framework for interpreting temporal graph regression models using Information Bottleneck and prototype-based methods, which improves performance and interpretability on traffic datasets.


Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs

http://arxiv.org/abs/2409.10994v1

Compressor summary: TRIM is a new approach that improves the efficiency of Multimodal Large Language Models by reducing image tokens using the CLIP metric, achieving significant computational savings without sacrificing performance.


GOSt-MT: A Knowledge Graph for Occupation-related Gender Biases in Machine Translation

http://arxiv.org/abs/2409.10989v1

Compressor summary: The paper presents GOSt-MT, a Knowledge Graph that analyzes gender bias in machine translation across multiple languages by integrating labour data and textual corpora.


Relative Representations: Topological and Geometric Perspectives

http://arxiv.org/abs/2409.10967v1

Compressor summary: The paper proposes two improvements to relative representations for zero-shot model stitching: normalization and topological densification, leading to better performance on a natural language task.


Cross-lingual transfer of multilingual models on low resource African Languages

http://arxiv.org/abs/2409.10965v1

Compressor summary: This study compares monolingual and multilingual NLP models for cross-lingual transfer between Kinyarwanda and Kirundi, finding that multilingual models transfer better while monolingual models remain competitive.


Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning

http://arxiv.org/abs/2409.10956v1

Compressor summary: The paper proposes ICON, a framework for Versatile Incremental Learning (VIL) that tackles class and domain confusion using CAST regularization and an Incremental Classifier to avoid overwriting and accumulate new knowledge effectively.


Investigating Context-Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style

http://arxiv.org/abs/2409.10955v1

Compressor summary: This study examines how memory strength and evidence presentation affect Large Language Models' context-faithfulness when incorporating external information into their responses.


Fair Anomaly Detection For Imbalanced Groups

http://arxiv.org/abs/2409.10951v1

Compressor summary: FairAD is a new anomaly detection method that ensures fairness in imbalanced scenarios by using contrastive learning and rebalancing autoencoder modules.


Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification

http://arxiv.org/abs/2409.10944v1

Compressor summary: The text introduces Contrasformer, a novel contrastive brain network Transformer that improves the identification of neurological disorders using prior-knowledge-enhanced contrast graphs and attention mechanisms.


Optimizing TinyML: The Impact of Reduced Data Acquisition Rates for Time Series Classification on Microcontrollers

http://arxiv.org/abs/2409.10942v1

Compressor summary: This paper shows how reducing data acquisition rates can improve the efficiency, energy consumption, and latency of TinyML models for time series classification on IoT devices with minimal accuracy loss.


Propulsion: Steering LLM with Tiny Fine-Tuning

http://arxiv.org/abs/2409.10927v1

Compressor summary: Propulsion is a novel method that efficiently fine-tunes large language models for specific tasks by selectively scaling pre-trained dimensions without modifying the model's parameters, reducing computational overhead and maintaining performance.


KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph

http://arxiv.org/abs/2409.10921v1

Compressor summary: The paper introduces KALE, a model that generates detailed and meaningful captions for fine-art paintings by using artwork metadata as additional knowledge.


AMEGO: Active Memory from long EGOcentric videos

http://arxiv.org/abs/2409.10917v1

Compressor summary: AMEGO enhances comprehension of long egocentric videos by constructing self-contained representations from them, and it is evaluated on the new Active Memories Benchmark.


A Physics Informed Neural Network (PINN) Methodology for Coupled Moving Boundary PDEs

http://arxiv.org/abs/2409.10910v1

Compressor summary: PINN is a framework that uses deep learning to solve physical problems involving moving boundaries, such as solidification of alloys, by integrating physics knowledge and constraints into neural networks.
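
The generic PINN recipe is to penalize the PDE residual, computed via automatic differentiation, alongside boundary and initial conditions. A minimal sketch for a toy 1D heat equation u_t = u_xx (the paper's coupled moving-boundary formulation adds interface conditions not shown here):

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

def pde_residual(xt: torch.Tensor) -> torch.Tensor:
    """Residual of u_t - u_xx at collocation points xt = (x, t)."""
    xt = xt.requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_t = grads[:, :1], grads[:, 1:]
    u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x), create_graph=True)[0][:, :1]
    return u_t - u_xx

xt = torch.rand(256, 2)                # interior collocation points
loss = pde_residual(xt).pow(2).mean()  # + boundary/initial-condition terms
loss.backward()
```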


Attention-Seeker: Dynamic Self-Attention Scoring for Unsupervised Keyphrase Extraction

http://arxiv.org/abs/2409.10907v1

Compressor summary: Attention-Seeker is an unsupervised keyphrase extraction method that uses self-attention maps from a Large Language Model to estimate the importance of phrases without manual tuning, achieving state-of-the-art results on four datasets.
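
Pulling the self-attention maps such a method relies on is straightforward with Hugging Face transformers; a minimal sketch that scores tokens by the attention they receive (BERT stands in for the paper's LLM here, and Attention-Seeker's actual map-selection procedure is not reproduced):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tok("Unsupervised keyphrase extraction via self-attention scoring.",
             return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions holds one (batch, heads, seq, seq) tensor per layer.
att = out.attentions[-1].mean(dim=1)[0]  # last layer, heads averaged: (seq, seq)
salience = att.mean(dim=0)               # column mean = attention each token receives
for token, score in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), salience):
    print(f"{token:12s} {score.item():.4f}")
```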


WaterQualityNeT: Prediction of Seasonal Water Quality of Nepal Using Hybrid Deep Learning Models

http://arxiv.org/abs/2409.10898v1

Compressor summary: The paper introduces a hybrid deep learning model combining CNN and RNN to forecast Nepal's seasonal water quality from a small dataset, capturing temporal and spatial patterns and achieving better accuracy than conventional methods, providing a reliable tool for water quality control.


AutoSpec: Automated Generation of Neural Network Specifications

http://arxiv.org/abs/2409.10897v1

Compressor summary: AutoSpec is a framework that automatically generates accurate specifications for neural networks in learning-augmented systems, overcoming the limitations of manual specification processes.


Shaking the Fake: Detecting Deepfake Videos in Real Time via Active Probes

http://arxiv.org/abs/2409.10889v1

Compressor summary: SFake is a new real-time deepfake detection method that uses mechanical vibrations to identify face swapping in videos.


CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation for Meeting Summarization

http://arxiv.org/abs/2409.10883v1

Compressor summary: CREAM is a novel framework that evaluates meeting summarizations without using references by combining chain-of-thought reasoning and key facts alignment with an ELO ranking system.


American Sign Language to Text Translation using Transformer and Seq2Seq with LSTM

http://arxiv.org/abs/2409.10874v1

Compressor summary: The study compares the performance of Transformer and Seq2Seq models in sign language translation and tests the impact of adding ResidualLSTM to the Transformer, finding that it reduces the BLEU score by 23.37%.


Adaptive Large Language Models By Layerwise Attention Shortcuts

http://arxiv.org/abs/2409.10870v1

Compressor summary: The paper introduces adaptive computations for LLMs, making architectures depth- and context-adaptive via attention shortcuts and improving performance on various datasets.


3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy

http://arxiv.org/abs/2409.10848v1

Compressor summary: 3DFacePolicy is a diffusion policy model for realistic 3D facial animation prediction using audio and vertex states to imitate human emotions.


BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation

http://arxiv.org/abs/2409.10847v1

Compressor summary: Bidirectional Autoregressive Diffusion (BAD) combines autoregressive and mask-based models to improve sequence modeling by using a permutation-based corruption technique that preserves structure and enforces causality, leading to better text-to-motion generation.


Implicit Reasoning in Deep Time Series Forecasting

http://arxiv.org/abs/2409.10840v1

Compressor summary: This study evaluates the reasoning abilities of deep time series forecasting models and finds evidence of effective generalization beyond pattern memorization.


Machine Learning for Public Good: Predicting Urban Crime Patterns to Enhance Community Safety

http://arxiv.org/abs/2409.10838v1

Compressor summary: The paper applies ML techniques, especially Random Forest models, to police dispatch call data from San Jose, CA, categorizing calls into priority levels based on time, place, and nature, and identifying dangerous situations with high accuracy and low false negatives to improve urban safety.


ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports

http://arxiv.org/abs/2409.10829v1

Compressor summary: ReXErr is a method that uses large language models to generate realistic errors in chest X-ray reports for improving radiology reporting quality and reliability.