arxiv compressed, 2024-09-25

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-25, generated by the compressor, my personal LLM-based project.

Self-Supervised Any-Point Tracking by Contrastive Random Walks

http://arxiv.org/abs/2409.16288v1

Compressor summary: A simple, self-supervised approach to tracking any point uses a global matching transformer with attention-based cycles to find consistent tracks through video via random walks on a space-time graph, achieving strong performance while avoiding the complexities of other methods.

MonoFormer: One Transformer for Both Diffusion and Autoregression

http://arxiv.org/abs/2409.16280v1

Compressor summary: The paper proposes a simple method that uses one transformer for both autoregressive text generation and diffusion-based visual generation, achieving performance comparable to existing methods.

AIM 2024 Challenge on UHD Blind Photo Quality Assessment

http://arxiv.org/abs/2409.16271v1

Compressor summary: The AIM 2024 UHD-IQA Challenge is a competition to develop No-Reference Image Quality Assessment models for high-resolution photos, with submissions ranked by prediction accuracy and computational efficiency.

CDChat: A Large Multimodal Model for Remote Sensing Change Description

http://arxiv.org/abs/2409.16261v1

Compressor summary: The paper introduces a new dataset and method to improve large multimodal models' ability to describe changes between bi-temporal remote sensing images.

Learning To Help: Training Models to Assist Legacy Devices

http://arxiv.org/abs/2409.16253v1

Compressor summary: The paper proposes a method to train experts for legacy devices in the context of learning with abstention, using Bayes-optimal rejection rules and surrogate losses.

A fast and sound tagging method for discontinuous named-entity recognition

http://arxiv.org/abs/2409.16243v1

Compressor summary: The authors propose a simple new tagging scheme for identifying discontinuous named entities, decoded with a weighted finite-state automaton that guarantees correct (well-formed) predictions.

LLM Echo Chamber: personalized and automated disinformation

http://arxiv.org/abs/2409.16241v1

Compressor summary: This study explores how large language models like GPT-4 can spread misinformation in simulated social media environments and highlights the need for ethical safeguards.

Label-Augmented Dataset Distillation

http://arxiv.org/abs/2409.16239v1

Compressor summary: LADD enhances dataset distillation by adding label augmentations to synthetic images, improving training efficiency and performance.

Efficiently Learning Probabilistic Logical Models by Cheaply Ranking Mined Rules

http://arxiv.org/abs/2409.16238v1

Compressor summary: SPECTRUM is a scalable framework for learning accurate and explainable logical models from relational data using rule utility and linear-time mining of recurrent structures.

EuroLLM: Multilingual Language Models for Europe

http://arxiv.org/abs/2409.16235v1

Compressor summary: The EuroLLM project develops open-weight multilingual LLMs for various European languages and releases initial models with good performance on multilingual tasks.

Predicting Deterioration in Mild Cognitive Impairment with Survival Transformers, Extreme Gradient Boosting and Cox Proportional Hazard Modelling

http://arxiv.org/abs/2409.16231v1

Compressor summary: The paper uses survival transformers, extreme gradient boosting, and Cox proportional hazards models to predict cognitive decline in MCI patients from metabolomics data, showing their potential for early detection and intervention in Alzheimer's disease and highlighting the value of non-invasive biomarkers.

Fine-Tuning is Fine, if Calibrated

http://arxiv.org/abs/2409.16223v1

Compressor summary: This paper investigates why fine-tuning a pre-trained model on a subset of classes hurts accuracy on classes not seen during fine-tuning, and shows that simple post-processing calibration restores that accuracy and reveals that the fine-tuned features improve for all classes.

Problem-oriented AutoML in Clustering

http://arxiv.org/abs/2409.16218v1

Compressor summary: PoAC is an AutoML framework that adapts to diverse clustering tasks by using a surrogate model trained on previous clustering solutions, enabling algorithm-agnostic and customizable clustering pipelines.

Deep Learning for Precision Agriculture: Post-Spraying Evaluation and Deposition Estimation

http://arxiv.org/abs/2409.16213v1

Compressor summary: The authors propose an XAI computer vision pipeline to evaluate precision spraying systems post-spraying using semantic segmentation and class-specific spray deposition estimation.

MaskBit: Embedding-free Image Generation via Bit Tokens

http://arxiv.org/abs/2409.16211v1

Compressor summary: The study introduces an improved VQGAN and an embedding-free image generation network using bit tokens, achieving high performance and state-of-the-art results on the ImageNet 256x256 benchmark.

LLMCount: Enhancing Stationary mmWave Detection with Multimodal-LLM

http://arxiv.org/abs/2409.16209v1

Compressor summary: LLMCount uses multimodal large language models to improve crowd detection in millimeter-wave sensing by compensating for signal power, achieving higher accuracy with lower latency.

CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data

http://arxiv.org/abs/2409.16202v1

Compressor summary: CJEval is a new benchmark for evaluating Large Language Models' performance in educational applications, built from Chinese junior high school exam questions with detailed annotations.

Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking

http://arxiv.org/abs/2409.16198v1

Compressor summary: The paper proposes AiRTran, a method for selecting the best pre-trained language model (PLM) for text ranking by estimating its expected rank, which captures a model's ranking performance better than previous methods.

Second Order Bounds for Contextual Bandits with Function Approximation

http://arxiv.org/abs/2409.16197v1

Compressor summary: The paper presents new algorithms for contextual bandits with function approximation whose regret scales with the sum of the reward variances (a second-order bound), improving on optimism-based algorithms such as optimistic least squares when the reward variances are unknown and vary over time.

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

http://arxiv.org/abs/2409.16191v1

Compressor summary: HelloBench is a benchmark for evaluating large language models' long text generation capabilities using five subtasks and a human-aligned evaluation method called HelloEval.

Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation

http://arxiv.org/abs/2409.16183v1

Compressor summary: RadFound is a vision-language foundation model tailored for radiology tasks that uses an extensive dataset to perform better than existing models on various multimodal interpretation and generation tasks.

SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image

http://arxiv.org/abs/2409.16178v1

Compressor summary: SDFit is a novel framework that uses learned signed-distance-function models as a strong shape prior to recover 3D object pose and shape from single images, refining estimates through an iterative render-and-compare process.

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

http://arxiv.org/abs/2409.16167v1

Compressor summary: The paper introduces Minimal Semantic Units (MSUs), the individual ranks of a Low-Rank Adaptation (LoRA), and the LoRA-LEGO framework, which merges multiple LoRAs by disassembling them into MSUs, clustering them, and reassembling them, improving the performance of the merged model.

EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges

http://arxiv.org/abs/2409.16165v1

Compressor summary: EnIGMA is a language model agent for solving Capture The Flag challenges with new Agent-Computer Interfaces and interactive command-line utilities, achieving state-of-the-art results on two benchmarks.

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

http://arxiv.org/abs/2409.16160v1

Compressor summary: MIMO is a novel framework that can generate realistic character videos with controllable attributes using spatial decomposition and encoded signals, achieving scalability, generality, and applicability in various scenarios.

ComiCap: A VLMs pipeline for dense captioning of Comic Panels

http://arxiv.org/abs/2409.16159v1

Compressor summary: The text describes a pipeline that uses Vision-Language Models to generate informative captions for comic panels, along with a new metric and test set to evaluate them.

Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework

http://arxiv.org/abs/2409.16146v1

Compressor summary: The paper proposes a counterfactual prompting framework that helps RAG models estimate their predictive uncertainty and control the risk of their predictions.

Seeing Faces in Things: A Model and Dataset for Pareidolia

http://arxiv.org/abs/2409.16143v1

Compressor summary: The paper presents a dataset of faces seen in random objects, studies human and machine face detection, and proposes a statistical model of pareidolia.

HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection

http://arxiv.org/abs/2409.16136v1

Compressor summary: The paper proposes a method that improves fine-grained attribute detection in mainstream open-vocabulary object detection (OVD) models by highlighting attribute words in the input text through explicit linear composition.

Implicit assessment of language learning during practice as accurate as explicit testing

http://arxiv.org/abs/2409.16133v1

Compressor summary: The text discusses using Item Response Theory in computer-aided language learning to assess learner proficiency efficiently and accurately through adaptive tests and exercise sessions.

VisioPhysioENet: Multimodal Engagement Detection using Visual and Physiological Signals

http://arxiv.org/abs/2409.16126v1

Compressor summary: VisioPhysioENet is a new system that uses both visual and physiological cues to accurately measure learner engagement in online education.

Analyzing Probabilistic Methods for Evaluating Agent Capabilities

http://arxiv.org/abs/2409.16125v1

Compressor summary: The text analyzes two methods for estimating AI agent capabilities on rare tasks and finds that both introduce bias and underestimate performance.

TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models

http://arxiv.org/abs/2409.16118v1

Compressor summary: TabEBM is a novel generative method that trains a distinct energy-based model (EBM) for each class to generate high-quality synthetic data for augmentation, improving classification performance.

Self-attention as an attractor network: transient memories without backpropagation

http://arxiv.org/abs/2409.16112v1

Compressor summary: This work shows how to interpret self-attention in transformer models as a derivative of local energy terms, enabling a recurrent model without backpropagation that can learn from train and test examples.

Neuromorphic Drone Detection: an Event-RGB Multimodal Approach

http://arxiv.org/abs/2409.16099v1

Compressor summary: The text discusses the challenges of drone detection using RGB cameras and proposes a new model that combines neuromorphic and RGB data to improve detection performance.

The Digital Transformation in Health: How AI Can Improve the Performance of Health Systems

http://arxiv.org/abs/2409.16098v1

Compressor summary: The paper proposes a platform that uses AI and Reinforcement Learning to optimize mobile health applications for use cases such as supply chain, patient management, and capacity building, which could benefit resource-poor settings and improve health-system efficiency more generally.

Exploring Hint Generation Approaches in Open-Domain Question Answering

http://arxiv.org/abs/2409.16096v1

Compressor summary: HINTQA is a novel method for preparing context for QA systems by using LLMs to generate hints about potential answers, outperforming retrieval-based and generation-based approaches.

From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing

http://arxiv.org/abs/2409.16089v1

Compressor summary: The study proposes a chatbot that explains how face recognition works by combining explainable AI and natural language processing techniques, enhancing its interpretability without sacrificing accuracy.

Assessing Simplification Levels in Neural Networks: The Impact of Hyperparameter Configurations on Complexity and Sensitivity

http://arxiv.org/abs/2409.16086v1

Compressor summary: The paper studies how different hyperparameter configurations affect the simplicity and stability of neural networks trained on the MNIST dataset.

MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

http://arxiv.org/abs/2409.16084v1

Compressor summary: The paper introduces MM-CamObj, a new dataset for camouflaged objects in vision-language tasks, and CamObj-Llava, a large vision-language model (LVLM) that uses curriculum learning to improve its performance on these tasks.

GS-Net: Global Self-Attention Guided CNN for Multi-Stage Glaucoma Classification

http://arxiv.org/abs/2409.16082v1

Compressor summary: GS-Net uses global self-attention to improve multi-stage glaucoma classification from retinal fundus images, outperforming existing methods.

Open-World Object Detection with Instance Representation Learning

http://arxiv.org/abs/2409.16073v1

Compressor summary: The paper proposes a method to train an object detector that can detect novel objects and extract rich features in open-world scenarios using Vision Foundation Models, improving robustness and generalizability for various applications.

Learning with Confidence: Training Better Classifiers from Soft Labels

http://arxiv.org/abs/2409.16071v1

Compressor summary: The text discusses soft label learning, which considers uncertainty in class labels, and shows its potential benefits for classification models, especially when dealing with noisy or limited data.

Machine learning approaches for automatic defect detection in photovoltaic systems

http://arxiv.org/abs/2409.16069v1

Compressor summary: The text reviews computer vision techniques for automatic, non-destructive, and cost-effective detection of defects in solar PV modules, which reduce efficiency and environmental benefit; it finds that most approaches rely on deep learning, especially convolutional neural networks, that interpretability analysis shows these models focus on darker image regions, and that gaps remain in the geometric, physics-based, and interpretability aspects of the models.

Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data

http://arxiv.org/abs/2409.16063v1

Compressor summary: The study presents a benchmark for assessing the robustness of endoscopic depth estimation models and introduces the Depth Estimation Robustness Score (DERS), with the aim of improving surgical precision and patient safety.

Generative 3D Cardiac Shape Modelling for In-Silico Trials

http://arxiv.org/abs/2409.16058v1

Compressor summary: The text proposes a deep learning method to model and generate synthetic aortic shapes using neural signed distance fields and trainable embedding vectors, trained on a dataset of aortic root meshes from CT images.

Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis

http://arxiv.org/abs/2409.16057v1

Compressor summary: The paper proposes a tailored framework to detect and remove backdoors in object detection models by analyzing inconsistencies between their modules.

LTNtorch: PyTorch Implementation of Logic Tensor Networks

http://arxiv.org/abs/2409.16045v1

Compressor summary: LTNtorch is a PyTorch implementation of Logic Tensor Networks (LTN), a framework that combines deep learning and logical reasoning using fuzzy logic, allowing neural models to be optimized by minimizing a loss function based on logical formulas.

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

http://arxiv.org/abs/2409.16040v1

Compressor summary: Time-MoE is a scalable and efficient deep learning architecture for time series forecasting that combines a mixture-of-experts design with large-scale pre-training on the Time-300B dataset, improving forecasting accuracy while reducing costs.

Unlocking Markets: A Multilingual Benchmark to Cross-Market Question Answering

http://arxiv.org/abs/2409.16025v1

Compressor summary: MCPQA is a new task that involves answering product questions from different markets and languages using information from other marketplaces, and a large dataset (McMarket) with over 7 million questions across 11 languages was created to evaluate this task.

Bridging Environments and Language with Rendering Functions and Vision-Language Models

http://arxiv.org/abs/2409.16024v1

Compressor summary: The paper proposes a novel method for building language-conditioned agents by first finding an environment configuration that matches the desired task description, then using a goal-conditioned policy to reach it, improving speed and quality with distilled models and multiple viewpoints.

AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment

http://arxiv.org/abs/2409.16022v1

Compressor summary: This text investigates whether large language models are affected by the threshold priming effect, a cognitive bias that influences relevance judgments, and suggests considering human-like biases when designing and evaluating these models.

Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs

http://arxiv.org/abs/2409.16005v1

Compressor summary: This paper proposes a novel training approach to enhance large language models' performance in automatic speech recognition by first pre-training them to generate Chinese characters from Pinyin embedding sequences and then fine-tuning them on speech.

Unleashing the Potential of Synthetic Images: A Study on Histopathology Image Classification

http://arxiv.org/abs/2409.16002v1

Compressor summary: The text discusses using generative models and image selection methods to create realistic synthetic histopathology image patches for enhancing image classification tasks.

Artificial Human Intelligence: The role of Humans in the Development of Next Generation AI

http://arxiv.org/abs/2409.16001v1

Compressor summary: This text discusses how human intelligence, evolved over time, now interacts with artificial intelligence, shaping its development and ethical considerations for future advancements in AI.

Improvements to SDXL in NovelAI Diffusion V3

http://arxiv.org/abs/2409.15997v1

Compressor summary: The report describes modifications to SDXL for creating high-quality anime images with NovelAI Diffusion V3.

Exploring the Impact of Outlier Variability on Anomaly Detection Evaluation Metrics

http://arxiv.org/abs/2409.15986v1

Compressor summary: This study examines how three common anomaly detection metrics behave under different conditions and challenges conventional understanding of their reliability and distinctiveness.

DataGpt-SQL-7B: An Open-Source Language Model for Text-to-SQL

http://arxiv.org/abs/2409.15985v1

Compressor summary: The paper presents a system that converts natural language queries into SQL commands using fine-tuned models and datasets, improving data access for non-experts and achieving high accuracy.

Leveraging Unsupervised Learning for Cost-Effective Visual Anomaly Detection

http://arxiv.org/abs/2409.15980v1

Compressor summary: The study presents a low-cost visual anomaly detection system using unsupervised learning models and Raspberry Pi hardware, which achieves high accuracy with minimal data.

Finetuning LLMs for Comparative Assessment Tasks

http://arxiv.org/abs/2409.15979v1

Compressor summary: The authors propose a finetuning method for large language models to improve their efficiency and accuracy in natural language generation assessment using comparative probabilities.

Edge-device Collaborative Computing for Multi-view Classification

http://arxiv.org/abs/2409.15973v1

Compressor summary: The text discusses challenges and solutions for running deep learning at the edge, focusing on selective collaborative schemes that reduce data redundancy and improve performance metrics.

Adversarial Backdoor Defense in CLIP

http://arxiv.org/abs/2409.15968v1

Compressor summary: Adversarial Backdoor Defense (ABD) is a novel data augmentation strategy that aligns features with adversarial examples to disrupt backdoor associations and provide robust defense against multimodal backdoor attacks targeting CLIP-like models.

Provably Efficient Exploration in Inverse Constrained Reinforcement Learning

http://arxiv.org/abs/2409.15963v1

Compressor summary: The paper introduces two efficient exploration algorithms for Inverse Constrained Reinforcement Learning that reduce error and strategically constrain the exploration policy, and shows their effectiveness in different environments.

Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting

http://arxiv.org/abs/2409.15953v1

Compressor summary: The text introduces the Prompt-Aware Counting (PrACo) benchmark to evaluate the ability of vision-and-language models to understand and count objects based on natural language prompts, addressing limitations in current evaluation protocols.

Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements

http://arxiv.org/abs/2409.15949v1

Compressor summary: The paper analyzes gender bias in English song lyrics using topic modeling, clustering, and word embedding techniques, finding thematic shifts over time and varying biases across topics and genres.

TSFeatLIME: An Online User Study in Enhancing Explainability in Univariate Time Series Forecasting

http://arxiv.org/abs/2409.15950v1

Compressor summary: TSFeatLIME is an explainable AI technique for time series forecasting that uses feature integration and Euclidean distances to improve surrogate model fidelity and user understanding, especially for non-experts.

Self-supervised Shape Completion via Involution and Implicit Correspondences

http://arxiv.org/abs/2409.15939v1

Compressor summary: The paper proposes a non-adversarial self-supervised method for 3D shape completion that leverages implicit correspondences and imposes an involutory constraint on the completion function.

Automated test generation to evaluate tool-augmented LLMs as conversational AI agents

http://arxiv.org/abs/2409.15934v1

Compressor summary: The paper introduces a test generation pipeline to evaluate large language models as conversational AI agents, using a new dataset for customer support scenarios.

SLIMER-IT: Zero-Shot NER on Italian Language

http://arxiv.org/abs/2409.15933v1

Compressor summary: The paper proposes a framework for zero-shot Named Entity Recognition in Italian using instruction-tuning and demonstrates its effectiveness compared to other models.

Automatic Registration of SHG and H&E Images with Feature-based Initial Alignment and Intensity-based Instance Optimization: Contribution to the COMULIS Challenge

http://arxiv.org/abs/2409.15931v1

Compressor summary: The paper proposes a method for registering non-invasive second-harmonic generation (SHG) microscopy images with hematoxylin and eosin (H&E) slides, using keypoint matching for initial alignment followed by deformable registration, and achieves good alignment with low registration error on the COMULIS challenge dataset.

Facing Asymmetry - Uncovering the Causal Link between Facial Symmetry and Expression Classifiers using Synthetic Interventions

http://arxiv.org/abs/2409.15927v1

Compressor summary: The text studies how facial symmetry affects the performance of black box models in recognizing expressions and shows that reduced symmetry lowers output activations for all investigated classifiers.

Multilingual Transfer and Domain Adaptation for Low-Resource Languages of Spain

http://arxiv.org/abs/2409.15924v1

Compressor summary: The article describes Huawei's participation and strategies for translating Spanish into three low-resource languages at WMT 2024.

Overcoming Reward Model Noise in Instruction-Guided Reinforcement Learning

http://arxiv.org/abs/2409.15922v1

Compressor summary: The paper shows that noise in VLM-based reward models, especially false-positive rewards, can undermine agents in sparse-reward environments, and proposes BiMI, a novel noise-resilient reward function that mitigates this and improves agent performance.

Learning Compact Channel Correlation Representation for LiDAR Place Recognition

http://arxiv.org/abs/2409.15919v1

Compressor summary: The paper proposes C3R, a method that learns compact channel correlation representation for LiDAR place recognition, reducing computation and improving accuracy.

Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts

http://arxiv.org/abs/2409.15915v1

Compressor summary: The text proposes a novel approach to improve natural language planning with large language models by generating multiple action schemas and ranking them without expert intervention.

Explaining word embeddings with perfect fidelity: Case study in research impact prediction

http://arxiv.org/abs/2409.15912v1

Compressor summary: SMER is a new feature importance method for word embedding-based models that provides perfect fidelity and better explanations than LIME for predicting impactful research articles.

A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation

http://arxiv.org/abs/2409.15911v1

Compressor summary: The paper proposes a new method called Modular Gradient Conflict Mitigation (MGCM) that improves simultaneous speech translation performance and reduces GPU memory consumption by resolving optimization conflicts at a modular level using gradient projection.

Enhancing IoT based Plant Health Monitoring through Advanced Human Plant Interaction using Large Language Models and Mobile Applications

http://arxiv.org/abs/2409.15910v1

Compressor summary: The paper describes a new app that lets plants "talk" to humans using soil sensors, AI language models, and real-time data to provide insights on their health and mood, improving plant care and promoting sustainability.

Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection

http://arxiv.org/abs/2409.15907v1

Compressor summary: The paper proposes a method to improve LLMs' performance in Text-to-SQL tasks by injecting domain-specific database knowledge, which reduces errors and shows generalizability.

Unimotion: Unifying 3D Human Motion Synthesis and Understanding

http://arxiv.org/abs/2409.15904v1

Compressor summary: Unimotion is a novel human motion model that enables flexible control of avatar motion with global or local text inputs and provides frame-level text and pose outputs for various applications.

Five questions and answers about artificial intelligence

http://arxiv.org/abs/2409.15903v1

Compressor summary: This paper answers five questions about Artificial Intelligence, covering its origins, its future, emotions, risks, and the singularity, against a backdrop of public controversy and fear.

Konstruktor: A Strong Baseline for Simple Knowledge Graph Question Answering

http://arxiv.org/abs/2409.15902v1

Compressor summary: Konstruktor is an approach that uses structured knowledge graphs to answer simple questions with complex entities, integrating language models and knowledge graphs for entity extraction, relation prediction, and querying the knowledge graph.

Unsupervised Attention Regularization Based Domain Adaptation for Oracle Character Recognition

http://arxiv.org/abs/2409.15893v1

Compressor summary: The paper introduces a novel unsupervised domain adaptation method for oracle character recognition that improves interpretability and performance by considering visual perceptual plausibility and enforcing attention consistency and separability, outperforming previous methods on the Oracle-241 dataset by 8.5%.

Symmetries and Expressive Requirements for Learning General Policies

http://arxiv.org/abs/2409.15892v1

Compressor summary: The paper explores how state symmetries affect planning and generalized planning, and evaluates the expressive requirements for learning general policies using different methods.

HLB: Benchmarking LLMs' Humanlikeness in Language Use

http://arxiv.org/abs/2409.15890v1

Compressor summary: This paper presents a benchmark to evaluate how well language models mimic human communication using 10 psycholinguistic experiments and human responses.

Self-Supervised Graph Embedding Clustering

http://arxiv.org/abs/2409.15887v1

Compressor summary: The proposed framework combines manifold learning with K-means to achieve one-step dimensionality reduction clustering without hyperparameters or class imbalance issues.

Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization

http://arxiv.org/abs/2409.15882v1

Compressor summary: The article proposes a new method to anonymize speech by disentangling its components and modifying speaker identity while keeping linguistic and emotional content intact, which works well for preserving emotions but needs improvement for other privacy tasks.

Machine Translation Advancements of Low-Resource Indian Languages by Transfer Learning

http://arxiv.org/abs/2409.15879v1

Compressor summary: In its submission to the WMT24 Indian Languages MT Shared Task, HW-TSC used two knowledge transfer strategies, fine-tuning and multilingual transfer learning, to improve machine translation for four low-resource Indian languages (Assamese, Manipuri, Khasi, and Mizo), achieving high BLEU scores.

Zero-Shot Detection of AI-Generated Images

http://arxiv.org/abs/2409.15875v1

Compressor summary: The ZED detector uses a lossless image encoder to measure the surprise of AI-generated images compared to a model of real images, without needing training data or knowledge of generative architectures.

Privacy Evaluation Benchmarks for NLP Models

http://arxiv.org/abs/2409.15868v1

Compressor summary: The paper presents a benchmark to assess privacy risks in NLP models, studies the impact of auxiliary data on attacks, and proposes an improved attack method using Knowledge Distillation and a chained framework for multiple attacks.

In-Context Ensemble Improves Video-Language Models for Low-Level Workflow Understanding from Human Demonstrations

http://arxiv.org/abs/2409.15867v1

Compressor summary: In-context ensemble learning improves video-language models' ability to generate Standard Operating Procedures from demonstration videos.

A Zero-Shot Open-Vocabulary Pipeline for Dialogue Understanding

http://arxiv.org/abs/2409.15861v1

Compressor summary: The authors propose a zero-shot, open-vocabulary system for dialogue state tracking that integrates domain classification and refines question-answering tasks, achieving better performance on the MultiWOZ 2.1 dataset with fewer LLM API requests.

iGAiVA: Integrated Generative AI and Visual Analytics in a Machine Learning Workflow for Text Classification

http://arxiv.org/abs/2409.15848v1

Compressor summary: The paper proposes using visual analytics to guide the creation of synthetic data for text classification, addressing data deficiencies and improving model accuracy.

From Passive Watching to Active Learning: Empowering Proactive Participation in Digital Classrooms with AI Video Assistant

http://arxiv.org/abs/2409.15843v1

Compressor summary: SAM is an online education platform that uses an AI video assistant to give learners personalized, context-specific help, with the aim of improving learning outcomes.

Deep Learning Techniques for Automatic Lateral X-ray Cephalometric Landmark Detection: Is the Problem Solved?

http://arxiv.org/abs/2409.15834v1

Compressor summary: The paper introduces a large dataset for cephalometric landmark detection and evaluates state-of-the-art deep learning methods on it, finding that they achieve high accuracy but still leave room for improvement.

Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability

http://arxiv.org/abs/2409.15827v1

Compressor summary: This study uses psycholinguistic paradigms to explore how large language models capture linguistic aspects and reveals that specific neurons correspond to different linguistic competencies.

Empirical Insights on Fine-Tuning Large Language Models for Question-Answering

http://arxiv.org/abs/2409.15825v1

Compressor summary: The paper explores how to effectively fine-tune large language models for question-answering tasks by categorizing and analyzing supervised fine-tuning data according to how well the model has already memorized the knowledge it contains.

Supervised Fine-Tuning: An Activation Pattern Optimization Process for Attention Heads

http://arxiv.org/abs/2409.15820v1

Compressor summary: This paper investigates how large language models (LLMs) adapt to complex tasks with scarce data and proposes a gradient-based method that uses attention-head activation patterns to make this adaptation more efficient and effective.

SwiftDossier: Tailored Automatic Dossier for Drug Discovery with LLMs and Agents

http://arxiv.org/abs/2409.15817v1

Compressor summary: The paper describes how artificial intelligence systems, especially Large Language Models (LLMs), can support drug discovery by improving the accuracy of their answers, incorporating external tools, and automatically generating target dossiers.

AsthmaBot: Multi-modal, Multi-Lingual Retrieval Augmented Generation For Asthma Patient Support

http://arxiv.org/abs/2409.15815v1

Compressor summary: AsthmaBot is a multi-lingual, multi-modal system that uses large language models and curated documents to provide accurate and interactive asthma support, especially in developing countries with limited medical care access.

Aided design of bridge aesthetics based on Stable Diffusion fine-tuning

http://arxiv.org/abs/2409.15812v1

Compressor summary: Stable Diffusion is fine-tuned using four methods to assist bridge-type innovation, generating new and inspiring designs for human designers.

Hyperbolic Image-and-Pointcloud Contrastive Learning for 3D Classification

http://arxiv.org/abs/2409.15810v1

Compressor summary: HyperIPC is a hyperbolic contrastive learning method that improves object classification and few-shot learning for multi-modal data by exploring intra-modal and cross-modal correlations in hyperbolic space.
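
The summary only says that correlations are explored "in hyperbolic space". As a minimal sketch, assuming the common Poincaré-ball model (the summary does not name the model), contrastive learning in hyperbolic space typically swaps cosine similarity for negative geodesic distance inside an InfoNCE-style loss. The function names and the temperature value are hypothetical, not taken from the paper.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    # Both points must lie inside the unit ball; eps guards against division by zero.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def contrastive_logits(anchor, candidates, temperature=0.1):
    """Similarity logits: negative hyperbolic distances scaled by a temperature."""
    return [-poincare_distance(anchor, c) / temperature for c in candidates]


def info_nce_loss(anchor, candidates, positive_index, temperature=0.1):
    """Cross-entropy of picking the positive candidate (numerically stable log-sum-exp)."""
    logits = contrastive_logits(anchor, candidates, temperature)
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[positive_index]
```

For a point at radius r from the origin the distance equals 2·artanh(r), which grows without bound as r approaches 1; this extra room near the boundary is the usual motivation for embedding hierarchical structure in hyperbolic rather than Euclidean space.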

CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation

http://arxiv.org/abs/2409.15806v1

Compressor summary: The paper proposes a contrastive language-state pre-training method for state representations that improves performance on reinforcement learning, navigation, and multimodal large language model tasks.

NER-Luxury: Named entity recognition for the fashion and luxury domain

http://arxiv.org/abs/2409.15804v1

Compressor summary: The study develops a named-entity recognition model for the fashion and luxury industry, addressing challenges such as entity disambiguation, French jargon, and diverse company structures.

3D-JEPA: A Joint Embedding Predictive Architecture for 3D Self-Supervised Representation Learning

http://arxiv.org/abs/2409.15803v1

Compressor summary: 3D-JEPA is a non-generative 3D self-supervised representation learning framework that uses multi-block sampling and context-aware decoding to improve semantic modeling and efficiency on downstream tasks.

Towards Universal Large-Scale Foundational Model for Natural Gas Demand Forecasting

http://arxiv.org/abs/2409.15794v1

Compressor summary: The paper proposes a foundation model for natural gas demand forecasting that leverages contrastive learning and industry-specific fine-tuning, outperforming existing methods in accuracy.

Small Language Models: Survey, Measurements, and Insights

http://arxiv.org/abs/2409.15790v1

Compressor summary: The paper surveys 59 small language models, analyzing their innovations and their performance on on-device tasks.

Training Data Attribution: Was Your Model Secretly Trained On Data Created By Mine?

http://arxiv.org/abs/2409.15781v1

Compressor summary: The proposed method detects whether a suspect model was trained without authorization on data generated by a given text-to-image model, by exploiting memorization and identifying behaviors the two models share consistently on specific samples.

Zero-shot forecasting of chaotic systems

http://arxiv.org/abs/2409.15771v1

Compressor summary: Foundation models can make competitive forecasts of chaotic systems without explicit re-training or fine-tuning, preserving their long-term behavior even when point forecasts fail.

CHBench: A Chinese Dataset for Evaluating Health in Large Language Models

http://arxiv.org/abs/2409.15766v1

Compressor summary: CHBench is a benchmark dataset to evaluate the performance of large language models on health-related topics in Chinese, revealing their limitations and potential for improvement.

Spatial-Temporal Mixture-of-Graph-Experts for Multi-Type Crime Prediction

http://arxiv.org/abs/2409.15764v1

Compressor summary: Key points:
- ST-MoGE framework for collective multiple-type crime prediction
- Attentive-gated MGEs to capture diverse and shared crime patterns
- CECL to reduce blending and redundancy among experts
- HALR to address imbalanced spatial distribution
Summary: The paper proposes a novel framework that uses attention, contrastive learning, and loss re-weighting to predict multiple types of crimes with diverse and shared patterns and balanced spatial distribution.

XTRUST: On the Multilingual Trustworthiness of Large Language Models

http://arxiv.org/abs/2409.15762v1

Compressor summary: XTRUST is a benchmark to evaluate the trustworthiness of large language models across 10 languages and various topics, revealing their strengths and weaknesses.

TFG: Unified Training-Free Guidance for Diffusion Models

http://arxiv.org/abs/2409.15761v1

Compressor summary: The paper presents a new algorithmic framework for training-free guidance in conditional generation, improving performance across different diffusion models and tasks.

Development and Validation of Heparin Dosing Policies Using an Offline Reinforcement Learning Algorithm

http://arxiv.org/abs/2409.15753v1

Compressor summary: The study proposes an AI system that optimizes heparin dosing for ICU patients based on their individual conditions using offline reinforcement learning, improving safety and efficacy.

The Roles of Generative Artificial Intelligence in Internet of Electric Vehicles

http://arxiv.org/abs/2409.15750v1

Compressor summary: The paper surveys generative artificial intelligence (GenAI) applications in the Internet of electric vehicles (IoEV), categorizing them into four layers and providing recommendations for future research.

Automated Assessment of Multimodal Answer Sheets in the STEM domain

http://arxiv.org/abs/2409.15749v1

Compressor summary: This paper presents an automated grading system using AI for STEM subjects that evaluates textual answers and diagrams, such as flowcharts, by converting them into textual representations and comparing them with sample answers.

Training Neural Networks for Modularity aids Interpretability

http://arxiv.org/abs/2409.15747v1

Compressor summary: The authors propose a method to make neural networks more interpretable by splitting them into non-interacting clusters that learn different parts of the task.

Real-Time Pedestrian Detection on IoT Edge Devices: A Lightweight Deep Learning Approach

http://arxiv.org/abs/2409.15740v1

Compressor summary: The paper describes using lightweight deep learning techniques on edge servers for real-time pedestrian detection in intelligent transportation systems.

Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint

http://arxiv.org/abs/2409.15739v1

Compressor summary: T3-DiffWeather is a novel pipeline that uses a prompt pool to adaptively handle intricate weather degradations and achieve state-of-the-art performance in adverse weather restoration.

Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens

http://arxiv.org/abs/2409.15732v1

Compressor summary: Key points:
- Multi-speaker challenges in real-world scenarios
- Attention-based encoder-decoder method with speaker cluster tokens
- Clustering hypotheses by AHC based on edit distance
- Effective method for complex 3-mix environments
Summary: The paper proposes an attention-based encoder-decoder method that uses speaker cluster tokens and hierarchical clustering to transcribe multi-speaker utterances in real-world scenarios, especially for complex 3-mix environments.
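
The step of clustering hypotheses "by AHC based on edit distance" can be illustrated with a generic sketch: token-level Levenshtein distance plus average-linkage agglomerative hierarchical clustering that stops merging once the closest pair of clusters is farther apart than a threshold. The linkage rule, the threshold and the function names are assumptions for illustration, not the paper's settings.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (token lists or strings)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        cur = [i]
        for j, tb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ta != tb))) # substitution or match
        prev = cur
    return prev[-1]


def ahc(hyps, threshold):
    """Average-linkage agglomerative clustering; returns lists of hypothesis indices."""
    clusters = [[i] for i in range(len(hyps))]

    def linkage(c1, c2):
        total = sum(edit_distance(hyps[i], hyps[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:  # closest pair is too far apart: stop merging
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters
```

For example, with the hypotheses "the cat sat", "the cat sat down", "hello world" and "hello there world" split into tokens and a threshold of 1.5, the two near-duplicate pairs end up in separate clusters, which is the kind of grouping one would then merge into one transcript per speaker.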

Applying Incremental Learning in Binary-Addition-Tree Algorithm for Dynamic Binary-State Network Reliability

http://arxiv.org/abs/2409.15721v1

Compressor summary: The paper proposes a new version of the Binary-Addition-Tree algorithm that uses incremental learning to adapt to dynamic and large-scale networks, improving efficiency and solution quality.

Disentangled Generation and Aggregation for Robust Radiance Fields

http://arxiv.org/abs/2409.15715v1

Compressor summary: The paper proposes a new method to improve triplane-based radiance fields for 3D scene disentanglement with better camera pose estimation and faster optimization.

Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation

http://arxiv.org/abs/2409.15699v1

Compressor summary: FlexRAG is a novel approach that compresses retrieved contexts into embeddings to enhance question-answering performance while reducing costs.

GraphGI:A GNN Explanation Method using Game Interaction

http://arxiv.org/abs/2409.15698v1

Compressor summary: GraphGI is a novel graph explanation technique that identifies and presents the subgraph with the highest interaction strength to explain GNN predictions using game-theoretic values.

A Survey of Stance Detection on Social Media: New Directions and Perspectives

http://arxiv.org/abs/2409.15690v1

Compressor summary: The paper surveys stance detection techniques that automatically detect users' opinions on contentious topics in social media, discussing their benefits and limitations for understanding public sentiment.

Plenoptic PNG: Real-Time Neural Radiance Fields in 150 KB

http://arxiv.org/abs/2409.15689v1

Compressor summary: The paper introduces a novel compact 3D representation called PPNG that encodes a scene from 2D images and allows real-time rendering on various platforms.

A Comprehensive Evaluation of Large Language Models on Mental Illnesses

http://arxiv.org/abs/2409.15687v1

Compressor summary: The study evaluates large language models' abilities in mental health tasks using social media data and finds that prompt engineering and few-shot learning improve their accuracy, while highlighting challenges such as dataset variability and ethical concerns.

Linear Contextual Bandits with Interference

http://arxiv.org/abs/2409.15682v1

Compressor summary: The paper introduces a framework to address interference in linear contextual bandits, providing algorithms with theoretical guarantees for sublinear regret and other properties.

Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret

http://arxiv.org/abs/2409.15680v1

Compressor summary: Key points:
- The paper studies online bandit optimization with nonconvex loss functions over a time-varying digraph
- It proposes a novel one-point residual feedback algorithm that estimates the gradient using residuals from two points and reduces the regret bound
- It uses dynamic regret to evaluate the algorithm's performance and shows it is comparable to existing algorithms
Summary: The paper introduces a new online bandit optimization algorithm that uses one-point residual feedback to estimate the gradient and minimize nonconvex loss functions over a time-varying digraph, with a reduced regret bound and performance similar to existing methods.
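
For readers unfamiliar with one-point residual feedback, below is a minimal single-agent sketch of the generic estimator, assuming the standard form from earlier zeroth-order optimization work: each round queries the loss once at a randomly perturbed point and uses the difference from the previous round's query as the residual. It omits the paper's distributed, time-varying digraph setting entirely; the step size `eta`, the smoothing radius `delta` and all function names are hypothetical.

```python
import math
import random


def sample_unit_sphere(d, rng):
    # Normalizing a standard Gaussian vector gives a uniformly random direction.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def residual_feedback_step(loss_t, x, prev_value, delta, eta, rng):
    """One round: a single loss query, gradient estimate from the residual, descent step.

    Returns the updated point and this round's queried value, which becomes
    prev_value in the next round.
    """
    d = len(x)
    u = sample_unit_sphere(d, rng)
    value = loss_t([xi + delta * ui for xi, ui in zip(x, u)])  # the only query this round
    scale = d / delta * (value - prev_value)  # residual w.r.t. the previous round's query
    grad_est = [scale * ui for ui in u]
    x_next = [xi - eta * gi for xi, gi in zip(x, grad_est)]
    return x_next, value


def run(losses, x0, delta=0.05, eta=0.01, seed=0):
    """Plays a sequence of loss functions, one per round."""
    rng = random.Random(seed)
    x = list(x0)
    # Initialize the residual with one query at a perturbed starting point.
    u0 = sample_unit_sphere(len(x), rng)
    prev_value = losses[0]([xi + delta * ui for xi, ui in zip(x, u0)])
    for loss_t in losses:
        x, prev_value = residual_feedback_step(loss_t, x, prev_value, delta, eta, rng)
    return x
```

The design point is the query budget: a classical two-point estimator needs two evaluations of the same loss per round, whereas the residual form needs only one, reusing the previous round's value, which suits bandit settings where the loss changes between rounds.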

PDT: Uav Target Detection Dataset for Pests and Diseases Tree

http://arxiv.org/abs/2409.15679v1

Compressor summary: The authors create new datasets for UAV-based pest and disease detection in crops and develop a high-precision object detection model called YOLO-Dense Pest.

Mitigating Semantic Leakage in Cross-lingual Embeddings via Orthogonality Constraint

http://arxiv.org/abs/2409.15664v1

Compressor summary: ORACLE is a new method to improve cross-lingual sentence embeddings by reducing semantic leakage and enhancing semantic alignment using a novel training objective.

Double-Path Adaptive-correlation Spatial-Temporal Inverted Transformer for Stock Time Series Forecasting

http://arxiv.org/abs/2409.15662v1

Compressor summary: DPA-STIFormer is a new STGNN that extracts dynamic spatial information from stock data using continuous feature changes as tokens and a novel double-path fusion mechanism, achieving state-of-the-art results in stock prediction tasks.

MMPT: Multimodal Prompt Tuning for Zero-shot Instruction Learning

http://arxiv.org/abs/2409.15657v1

Compressor summary: The paper proposes Multimodal Prompt Tuning, a method that finetunes pretrained models on multimodal tasks using visual and textual prompts, improving zero-shot generalization for unseen domains.

English offensive text detection using CNN based Bi-GRU model

http://arxiv.org/abs/2409.15652v1

Compressor summary: The authors propose a new model that uses both Bi-GRU and CNN to automatically identify and filter inappropriate content on social media platforms.

ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models

http://arxiv.org/abs/2409.15650v1

Compressor summary: ImPoster is an unsupervised algorithm that, given a source image of a subject and a driving image depicting an action, generates a target image of the source subject performing that action, using text descriptions, pretrained latent diffusion models, and image frequency guidance.

Looped Transformers for Length Generalization

http://arxiv.org/abs/2409.15647v1

Compressor summary: Looped Transformers with adaptive steps improve the ability of AI models to handle inputs of different lengths, especially for tasks requiring multiple iterations.

Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

http://arxiv.org/abs/2409.15637v1

Compressor summary: Synatra uses indirect knowledge, such as online tutorials, to create direct supervision for large language models, improving their performance on web-based tasks and reducing data costs.

Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI

http://arxiv.org/abs/2409.15631v1

Compressor summary: The article proposes a framework that augments sparse learning performance data using tensor factorization and generative AI, addressing data sparsity in adaptive learning systems.