arxiv compressed, 2024-07-04

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-04 generated by the compressor, my personal LLM-based project.


Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

http://arxiv.org/abs/2407.03321v1

Compressor summary: The paper introduces Planetarium, a new benchmark to evaluate language models' ability to generate PDDL code from natural language descriptions of planning tasks, addressing challenges in existing evaluation methods.


InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

http://arxiv.org/abs/2407.03320v1

Compressor summary: The text introduces IXC-2.5, a large vision-language model with long-context capabilities and improved vision-language comprehension for various applications, achieving GPT-4V level performance with fewer resources.


BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

http://arxiv.org/abs/2407.03314v1

Compressor summary: The paper introduces BACON, a graph structure that enhances vision language models' abilities in various tasks by simplifying complex visual scenes into basic elements.


Universal Length Generalization with Turing Programs

http://arxiv.org/abs/2407.03310v1

Compressor summary: Turing Programs enable length generalization on various algorithmic tasks by decomposing them into steps resembling a Turing Machine's computation.


A Review of the Applications of Deep Learning-Based Emergent Communication

http://arxiv.org/abs/2407.03302v1

Compressor summary: This paper reviews how emergent communication research can be applied to various fields like machine learning and linguistics by studying how language-like systems arise in multi-agent reinforcement learning environments.


DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

http://arxiv.org/abs/2407.03300v1

Compressor summary: DisCo-Diff models introduce discrete latent variables to simplify the learning process of diffusion models and improve their performance on various tasks, including image synthesis and molecular docking.


Improved Noise Schedule for Diffusion Training

http://arxiv.org/abs/2407.03297v1

Compressor summary: The paper proposes a new way to adjust the noise schedule of diffusion models during training to improve efficiency and performance.
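As background only, here is a minimal sketch of the widely used cosine noise schedule, a common baseline in this line of work; it is not the schedule proposed in the paper, and the function names are my own:

```python
import math

def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    # Cumulative signal-retention term alpha_bar(t), t in [0, 1],
    # following the standard cosine schedule (a generic baseline,
    # not the schedule proposed in the paper).
    return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

def make_betas(num_steps: int = 1000, max_beta: float = 0.999) -> list:
    # Per-step noise variances beta_i derived from alpha_bar;
    # clipping at max_beta avoids a singular final step.
    betas = []
    for i in range(num_steps):
        a1 = cosine_alpha_bar(i / num_steps)
        a2 = cosine_alpha_bar((i + 1) / num_steps)
        betas.append(min(1.0 - a2 / a1, max_beta))
    return betas
```

Improved schedules of the kind the paper studies typically reshape how this noise is distributed over training timesteps.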


Biomechanics-informed Non-rigid Medical Image Registration and its Inverse Material Property Estimation with Linear and Nonlinear Elasticity

http://arxiv.org/abs/2407.03292v1

Compressor summary: The paper presents a biomechanics-constrained non-rigid medical image registration algorithm using physics-informed neural networks (PINNs) that generalizes linear elasticity to nonlinear models and solves the inverse material property estimation problem within the PINN framework.


VCHAR: Variance-Driven Complex Human Activity Recognition framework with Generative Representation

http://arxiv.org/abs/2407.03291v1

Compressor summary: VCHAR is a novel framework for recognizing complex human activities in smart environments without precise labeling of atomic activities, providing video-based explanations for better user comprehension.


LLM Internal States Reveal Hallucination Risk Faced With a Query

http://arxiv.org/abs/2407.03282v1

Compressor summary: The paper explores how Large Language Models can estimate their own hallucination risk before generating responses and analyzes the internal mechanisms involved in this process.


Evaluating Automatic Metrics with Incremental Machine Translation Systems

http://arxiv.org/abs/2407.03277v1

Compressor summary: The text describes a dataset of commercial translations across 12 translation directions collected over six years, which can be used to evaluate MT metrics based on their preference for newer translations.


For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives

http://arxiv.org/abs/2407.03268v1

Compressor summary: FRESCO is a framework to study how images on social media platforms affect society by analyzing them at three levels and using a new metric called the FRESCO score.


A Unified Framework for 3D Scene Understanding

http://arxiv.org/abs/2407.03263v1

Compressor summary: UniSeg3D is a 3D segmentation framework that handles six tasks with one model, enhancing understanding of 3D scenes by sharing knowledge across tasks.


Magnetic Hysteresis Modeling with Neural Operators

http://arxiv.org/abs/2407.03261v1

Compressor summary: The paper proposes neural operators for modeling magnetic hysteresis and shows they outperform traditional methods and generalize well to novel input fields.


Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later

http://arxiv.org/abs/2407.03257v1

Compressor summary: The paper proposes ModernNCA, a modernized version of Neighborhood Components Analysis, which improves semantic similarity learning for tabular data and surpasses most deep tabular models in accuracy and efficiency.


STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data

http://arxiv.org/abs/2407.03253v1

Compressor summary: The text introduces Sentence Transformers Fine-tuning (STF), a system that applies pretrained sentence transformer language models with fine-tuning to classify tweet topics accurately, outperforming state-of-the-art approaches without requiring much labeled data.


ACTRESS: Active Retraining for Semi-supervised Visual Grounding

http://arxiv.org/abs/2407.03251v1

Compressor summary: ACTRESS is a new approach for semi-supervised visual grounding that uses active sampling and selective retraining to improve performance with sparse labeled data.


Visual Grounding with Attention-Driven Constraint Balancing

http://arxiv.org/abs/2407.03243v1

Compressor summary: The paper proposes Attention-Driven Constraint Balancing (AttBalance), a framework that improves visual grounding tasks using transformer-based models and attention mechanisms, achieving state-of-the-art results.


Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-View 3D Detection and Tracking

http://arxiv.org/abs/2407.03240v1

Compressor summary: The paper proposes a cyclic learning mechanism for multi-view 3D detection and tracking tasks that suppresses irrelevant regions in historical frames and improves object awareness, resulting in consistent performance gains over baselines.


CATT: Character-based Arabic Tashkeel Transformer

http://arxiv.org/abs/2407.03236v1

Compressor summary: The paper presents a new approach to train Arabic text diacritization (ATD) models, which improve Arabic text comprehension and processing by using finetuned transformers and the Noisy-Student method, achieving state-of-the-art results.


Self-Evaluation as a Defense Against Adversarial Attacks on LLMs

http://arxiv.org/abs/2407.03234v1

Compressor summary: The study proposes self-evaluation, in which a model assesses its own inputs and outputs, as a defense that reduces the success rate of adversarial attacks on LLMs.


Single Character Perturbations Break LLM Alignment

http://arxiv.org/abs/2407.03232v1

Compressor summary: A study shows that adding a space to input can trick LLMs into generating unsafe outputs, highlighting the need for better model alignment.


Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning

http://arxiv.org/abs/2407.03227v1

Compressor summary: The text proposes improving Text-to-SQL semantic parsing with Large Language Models and in-context learning, using AST-based ranking of few-shot examples and schema pruning.


MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI

http://arxiv.org/abs/2407.03217v1

Compressor summary: MHNet is a novel deep learning model that uses hierarchical and high-order features from multi-view brain functional networks derived from rs-fMRI data for neurodevelopmental disorder prediction, outperforming state-of-the-art methods.


Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers

http://arxiv.org/abs/2407.03216v1

Compressor summary: The paper proposes a novel object-centric model that learns disentangled representations and discovers blocks to predict dynamic visual states, achieving better accuracy and interpretability in various settings.


How Does Quantization Affect Multilingual LLMs?

http://arxiv.org/abs/2407.03211v1

Compressor summary: Quantization affects multilingual LLMs differently and negatively impacts non-Latin script languages and challenging tasks.


Combining AI Control Systems and Human Decision Support via Robustness and Criticality

http://arxiv.org/abs/2407.03210v1

Compressor summary: The text discusses using adversarial explanations and autoencoders to improve AI decision-making, robustness, and human oversight in real-world applications.


Category-Aware Dynamic Label Assignment with High-Quality Oriented Proposal

http://arxiv.org/abs/2407.03205v1

Compressor summary: The text introduces a method to detect oriented objects in aerial images using a complex plane OBB representation, a conformer RPN head, and a category-aware dynamic label assignment.


Expressive Gaussian Human Avatars from Monocular RGB Video

http://arxiv.org/abs/2407.03204v1

Compressor summary: EVA is a method to create more realistic and expressive digital human avatars from monocular video by combining a sculpted 3D model with SMPL-X and improving alignment, density control, and confidence prediction.


SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

http://arxiv.org/abs/2407.03200v1

Compressor summary: The paper proposes SegVG, a method that uses segmentation signals from box annotations for Visual Grounding and mitigates domain discrepancy with a Triple Alignment module, achieving SOTA performance.


DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

http://arxiv.org/abs/2407.03197v1

Compressor summary: The paper proposes a dynamic feature aggregation module for temporal action detection models that adapts kernel weights and receptive fields at different timestamps, improving performance on various benchmarks.


Prediction Instability in Machine Learning Ensembles

http://arxiv.org/abs/2407.03194v1

Compressor summary: The paper proves that ensembles have prediction instabilities and suggests balancing information use with risk management.


Multiple-Resolution Tokenization for Time Series Forecasting with an Application to Pricing

http://arxiv.org/abs/2407.03185v1

Compressor summary: The authors propose a transformer-based time series forecasting method that uses multiple resolutions, cross-series information, and novel modules to improve performance on a real-world pricing problem.


Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models

http://arxiv.org/abs/2407.03181v1

Compressor summary: The Divergent CoT (DCoT) method improves the performance of large language models by requiring them to generate multiple divergent reasoning chains in a single inference step, enabling self-correction.


Motion meets Attention: Video Motion Prompts

http://arxiv.org/abs/2407.03179v1

Compressor summary: The paper proposes a modified Sigmoid function with learnable parameters as an attention mechanism to highlight salient motion features in videos for action recognition tasks.
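A minimal sketch of the general idea: a sigmoid with learnable slope and shift used as a soft gate on motion magnitudes. The parameter names and formulation here are hypothetical, not the paper's exact design:

```python
import math

def motion_gate(diff: float, slope: float = 5.0, shift: float = 0.5) -> float:
    # Sigmoid attention weight for a motion magnitude `diff` in [0, 1].
    # `slope` and `shift` stand in for the learnable parameters; in a
    # network they would be trained with the rest of the model.
    # Hypothetical illustration, not the paper's exact formulation.
    return 1.0 / (1.0 + math.exp(-slope * (diff - shift)))

# Usage: weight each pixel of the current frame by the gated magnitude
# of its temporal difference, emphasizing salient motion:
# weighted = [motion_gate(abs(c - p)) * c for c, p in zip(curr, prev)]
```

Because the gate is differentiable, its parameters can be learned end-to-end alongside the action-recognition backbone.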


IMC 2024 Methods & Solutions Review

http://arxiv.org/abs/2407.03172v1

Compressor summary: The paper reviews the top-performing methods from Kaggle's Image Matching Challenge 2024 and presents the authors' advanced ensemble technique for 3D image reconstruction from 2D images.


Investigating Decoder-only Large Language Models for Speech-to-text Translation

http://arxiv.org/abs/2407.03169v1

Compressor summary: The paper proposes a decoder-only architecture for speech-to-text translation using large language models and shows its effectiveness on two benchmarks, while analyzing the impact of different fine-tuning techniques and task formulation.


LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

http://arxiv.org/abs/2407.03168v1

Compressor summary: LivePortrait is a video-driven portrait animation framework that uses implicit keypoints to create lifelike videos from single images, with improved efficiency and controllability compared to diffusion-based methods.


Consistent Point Orientation for Manifold Surfaces via Boundary Integration

http://arxiv.org/abs/2407.03165v1

Compressor summary: The paper presents a method to generate consistent normals for point clouds using a boundary energy derived from the Dirichlet energy of the generalized winding number field, which improves robustness to noise and complex structures.


Global Context Modeling in YOLOv8 for Pediatric Wrist Fracture Detection

http://arxiv.org/abs/2407.03163v1

Compressor summary: The paper proposes a YOLOv8 model with a Global Context block that improves fracture detection and reaches state-of-the-art performance on a wrist X-ray dataset.


Let the Code LLM Edit Itself When You Edit the Code

http://arxiv.org/abs/2407.03157v1

Compressor summary: The paper proposes Positional Integrity Encoding (PIE) for efficient and accurate code prediction in real-time editing scenarios, reducing computational overhead by over 85%.


Reinforcement Learning for Sequence Design Leveraging Protein Language Models

http://arxiv.org/abs/2407.03154v1

Compressor summary: The authors propose using protein language models as a reward function to generate new protein sequences with reinforcement learning, while periodically finetuning a proxy model.


Stereo Risk: A Continuous Modeling Approach to Stereo Matching

http://arxiv.org/abs/2407.03152v1

Compressor summary: Stereo Risk is a new deep-learning method for stereo matching that uses continuous risk minimization to estimate scene depth better than existing methods.


Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data

http://arxiv.org/abs/2407.03145v1

Compressor summary: The paper proposes a two-phase training method for language translation using large pre-trained models, which improves accuracy for aligned source and target sentence orders and spoken language, especially with added tags and interleaved sentences.


Machine Learning Models for Improved Tracking from Range-Doppler Map Images

http://arxiv.org/abs/2407.03140v1

Compressor summary: The authors propose machine learning models for target detection and uncertainty estimation in RDM images to improve GMTI radar tracking performance.


Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization

http://arxiv.org/abs/2407.03130v1

Compressor summary: ADClick is a novel interactive image segmentation algorithm that generates accurate anomaly masks for real defective images from only a few manual clicks per image, aided by innovative residual features and language prompts.


Social Bias Evaluation for Large Language Models Requires Prompt Variations

http://arxiv.org/abs/2407.03129v1

Compressor summary: The paper explores how different prompts affect large language models' social biases and task performance, showing that they are highly sensitive to prompts and have tradeoffs between them.


Foundations and Frontiers of Graph Learning Theory

http://arxiv.org/abs/2407.03125v1

Compressor summary: The article summarizes the theoretical foundations and recent advances in graph learning models, focusing on their expressiveness, generalization, optimization, and unique phenomena.


Can machine learning solve the challenge of adaptive learning and the individualization of learning paths? A field experiment in an online learning platform

http://arxiv.org/abs/2407.03118v1

Compressor summary: The authors test an algorithm for personalized learning paths using convolutional neural networks on a large digital self-learning platform, and find that it does not significantly improve learners' effort or performance compared to group-based or individual non-adaptive treatments.


$L_p$-norm Distortion-Efficient Adversarial Attack

http://arxiv.org/abs/2407.03115v1

Compressor summary: The paper proposes a new adversarial attack method that reduces $L_0$-norm distortion while maintaining low $L_2$-norm loss, making the perturbations sparse and imperceptible to humans.


How Reliable and Stable are Explanations of XAI Methods?

http://arxiv.org/abs/2407.03108v1

Compressor summary: This paper evaluates the reliability and stability of various explainable AI (XAI) methods using a diabetes dataset and four machine learning models, finding eXirt to be the most reliable XAI method, with all but one of the others sensitive to perturbations.


Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric

http://arxiv.org/abs/2407.03106v1

Compressor summary: The paper proposes a new loss function called Anti-Collapse Loss for deep metric learning that improves feature representation, avoids embedding space collapse, and enhances model performance.


On Generalization for Generative Flow Networks

http://arxiv.org/abs/2407.03105v1

Compressor summary: GFlowNets are a learning paradigm for sampling from an unnormalized probability distribution that can learn complex patterns; the paper investigates how they generalize to novel, longer trajectories.


KeyVideoLLM: Towards Large-scale Video Keyframe Selection

http://arxiv.org/abs/2407.03104v1

Compressor summary: KeyVideoLLM is a method to efficiently select keyframes from videos for large language models, improving data management, speed, and video question-answering performance.


Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

http://arxiv.org/abs/2407.03103v1

Compressor summary: Cactus is a realistic dialogue dataset for training open-source language models as psychological counselors using Cognitive Behavioral Therapy techniques.


Conformal Prediction for Causal Effects of Continuous Treatments

http://arxiv.org/abs/2407.03094v1

Compressor summary: The paper proposes a new method for predicting causal effects of continuous treatments using conformal prediction, accounting for uncertainty in propensity score estimation, and demonstrates its effectiveness on synthetic and real datasets.


Stable Heterogeneous Treatment Effect Estimation across Out-of-Distribution Populations

http://arxiv.org/abs/2407.03082v1

Compressor summary: The paper proposes a new framework (SBRL-HAP) for estimating treatment effects that works well both on in-distribution and out-of-distribution data, addressing selection bias and distribution shift issues.


Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios

http://arxiv.org/abs/2407.03080v1

Compressor summary: The paper proposes a novel way to generate realistic synthetic tabular data using Deep Generative Models with artificial inductive bias from transfer learning and meta-learning techniques, improving the quality of the synthetic data in limited real-data environments.


A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning

http://arxiv.org/abs/2407.03076v1

Compressor summary: The paper investigates multi-task learning for document-level neural machine translation to make the model sensitive to the choice of context and shows better performance in low-resource settings, but struggles with generating source from context.


Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes

http://arxiv.org/abs/2407.03065v1

Compressor summary: The paper proposes a policy optimization algorithm that eliminates a warm-up phase and achieves rate-optimal regret in different reinforcement learning settings.


ALTER: Augmentation for Large-Table-Based Reasoning

http://arxiv.org/abs/2407.03061v1

Compressor summary: ALTER is a framework that enhances large language models' table-based reasoning by augmenting NL questions and tables with relevant information.


FairJob: A Real-World Dataset for Fairness in Online Systems

http://arxiv.org/abs/2407.03059v1

Compressor summary: The text introduces a fairness-aware dataset for job recommendation in advertising, which preserves predictive power and addresses the challenge of balancing fairness and utility in high-impact domains.


Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

http://arxiv.org/abs/2407.03056v1

Compressor summary: KDPL is a novel approach to prompt learning based on unsupervised knowledge distillation from more powerful models; it eliminates the need for labeled examples during adaptation, improves the zero-shot generalization and transferability of learned prompts, and can transfer knowledge even without knowing training class names.


Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

http://arxiv.org/abs/2407.03051v1

Compressor summary: The paper proposes QDPO, a new technique that aligns quantized large language models with their full-precision versions, improving chatbot performance and efficiency.


Enhancements for Real-Time Monte-Carlo Tree Search in General Video Game Playing

http://arxiv.org/abs/2407.03049v1

Compressor summary: The paper proposes eight enhancements for Monte-Carlo Tree Search (MCTS) in General Video Game Playing (GVGP), which improve win percentages and approach competitive levels with existing agents.


SlerpFace: Face Template Protection via Spherical Linear Interpolation

http://arxiv.org/abs/2407.03043v1

Compressor summary: The paper proposes SlerpFace, a novel face template protection technique that rotates and drops out features of face templates to prevent identity-preserving synthetic face image attacks using diffusion models.


Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

http://arxiv.org/abs/2407.03040v1

Compressor summary: The R2S framework uses Chain-of-Dialogue (CoD) logic to guide LLMs in generating knowledge-intensive multi-turn dialogues for instruction tuning, covering diverse domains like Wikipedia (English), Science (Chinese), and Artifacts (Chinese).


SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning

http://arxiv.org/abs/2407.03036v1

Compressor summary: SAFT is a simple method that improves CLIP's performance on out-of-distribution data by only updating important parameters during fine-tuning.


ISWSST: Index-space-wave State Superposition Transformers for Multispectral Remotely Sensed Imagery Semantic Segmentation

http://arxiv.org/abs/2407.03033v1

Compressor summary: The text proposes a new method called ISWSST that uses quantum mechanics ideas to improve semantic segmentation of multispectral imagery, addressing several issues in the current approaches and achieving better accuracy.


Strategies for Arabic Readability Modeling

http://arxiv.org/abs/2407.03032v1

Compressor summary: The paper presents experimental results on Arabic readability assessment using various methods, achieving good scores by combining different techniques.


Exploiting Dialect Identification in Automatic Dialectal Text Normalization

http://arxiv.org/abs/2407.03020v1

Compressor summary: The paper introduces the task of CODAfication, which normalizes Dialectal Arabic into a standardized written form, and presents new models and methods to improve its performance.


An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis

http://arxiv.org/abs/2407.03018v1

Compressor summary: The paper introduces GeCA, a novel generative model inspired by biological evolution that improves retinal disease classification in Fundus and OCT images.


Context-Aware Video Instance Segmentation

http://arxiv.org/abs/2407.03010v1

Compressor summary: The paper presents CAVIS, a framework that uses contextual information to improve object tracking and instance matching in video segmentation tasks, achieving state-of-the-art results, especially on difficult videos.


Model Guidance via Explanations Turns Image Classifiers into Segmentation Models

http://arxiv.org/abs/2407.03009v1

Compressor summary: The text discusses how heatmaps from image classification networks can be used for weakly supervised segmentation and improved interpretability, and proposes a novel semi-supervised segmentation method using differentiable heatmap architectures.


Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering

http://arxiv.org/abs/2407.03008v1

Compressor summary: The paper proposes a model-agnostic framework for Video Question-Answering that enhances compositional reasoning by integrating video aligner and answer aggregator modules, and evaluates it on various datasets using new metrics and an automatic question decomposition pipeline.


What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

http://arxiv.org/abs/2407.03007v1

Compressor summary: This paper investigates how different factors affect the performance of tool learning methods in large language models and offers insights for improving their practical use.


Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation

http://arxiv.org/abs/2407.03006v1

Compressor summary: The paper presents FCDiffusion, a diffusion-based framework for text-guided image-to-image translation using Discrete Cosine Transform to filter latent features and control different aspects of the translation.


Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0

http://arxiv.org/abs/2407.03005v1

Compressor summary: The study examines how Wav2Vec2, a deep neural speech model, processes and resolves phonotactic constraints in ambiguous sounds and finds that this ability emerges early in the model's Transformer module.


SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research

http://arxiv.org/abs/2407.03004v1

Compressor summary: The SemioLLM study evaluates how well large language models can diagnose epilepsy from text descriptions of seizures, revealing both their strengths and limitations for clinical applications.


VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values

http://arxiv.org/abs/2407.03000v1

Compressor summary: VIVA is a benchmark that tests vision-language models' ability to use human values to make decisions in real-world situations, revealing their limitations and potential improvements.


Are Large Language Models Consistent over Value-laden Questions?

http://arxiv.org/abs/2407.02996v1

Compressor summary: The study examines value consistency of large language models across various scenarios and topics, finding them relatively consistent except on controversial topics.


Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation

http://arxiv.org/abs/2407.02990v1

Compressor summary: The paper proposes a new efficient method for 3D Human Pose Estimation using a Graph and Skipped Transformer architecture that exploits spatio-temporal information and achieves superior performance with reduced computational complexity.


YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision

http://arxiv.org/abs/2407.02988v1

Compressor summary: The paper reviews YOLO object detection algorithms, focusing on their improvements and suitability for edge deployment.


LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models

http://arxiv.org/abs/2407.02987v1

Compressor summary: LoRA-Guard is a method to adapt guardrails for content moderation on resource-constrained devices like mobile phones by sharing knowledge between LLMs and guardrail models.


Semantically Rich Local Dataset Generation for Explainable AI in Genomics

http://arxiv.org/abs/2407.02984v1

Compressor summary: The paper proposes using Genetic Programming to generate datasets with semantic variability for interpreting black box deep learning models in gene regulation, achieving good diversity and outperforming a random baseline.


Mast Kalandar at SemEval-2024 Task 8: On the Trail of Textual Origins: RoBERTa-BiLSTM Approach to Detect AI-Generated Text

http://arxiv.org/abs/2407.02978v1

Compressor summary: This paper presents a model to classify AI-generated or human text and evaluates its effectiveness, addressing concerns about machine-generated text misuse in various contexts.


Large Language Models as Evaluators for Scientific Synthesis

http://arxiv.org/abs/2407.02977v1

Compressor summary: The study tests how well LLMs like GPT-4 and Mistral can evaluate scientific summaries by comparing their judgments to human annotators, finding weak correlation between them.


Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization

http://arxiv.org/abs/2407.02968v1

Compressor summary: This paper proposes and tests lightweight multi-class anomaly detection models for visual inspection systems, showing that they can be deployed on edge devices with low latency and memory requirements.


FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering

http://arxiv.org/abs/2407.02964v1

Compressor summary: The proposed Finite State Machine prompting method enhances large language models' reasoning capabilities for complex tasks by iteratively decomposing questions into sub-questions and self-correcting, improving accuracy and trustworthiness.


Towards a Scalable Reference-Free Evaluation of Generative Models

http://arxiv.org/abs/2407.02961v1

Compressor summary: The paper proposes a fast and interpretable method (FKEA) to evaluate the diversity of generated data using random Fourier features and kernel approximation, which can handle large-scale generative models.


3D Multimodal Image Registration for Plant Phenotyping

http://arxiv.org/abs/2407.02946v1

Compressor summary: The text describes a novel 3D image registration method that uses depth information from a time-of-flight camera to accurately align images from different cameras, improving the assessment of plant phenotypes.


VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors

http://arxiv.org/abs/2407.02945v1

Compressor summary: The paper introduces a method for extrapolated view synthesis in urban scenes using LiDAR and prior knowledge, improving rendering quality for views outside the training camera distribution.


Probing the Feasibility of Multilingual Speaker Anonymization

http://arxiv.org/abs/2407.02937v1

Compressor summary: The study applies a multilingual anonymization system to nine languages, achieving robust results with varying quality of speech synthesis components.


GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models

http://arxiv.org/abs/2407.02936v1

Compressor summary: GraCoRe is a benchmark for evaluating large language models' graph comprehension and reasoning abilities across various types of graphs and tasks, revealing insights into their performance and limitations.


PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition

http://arxiv.org/abs/2407.02934v1

Compressor summary: PosMLP-Video is a lightweight MLP-like backbone for video recognition that uses relative positional encoding and spatio-temporal factorized positional MLP blocks to achieve competitive performance on video understanding benchmarks.


EgoFlowNet: Non-Rigid Scene Flow from Point Clouds with Ego-Motion Support

http://arxiv.org/abs/2407.02920v1

Compressor summary: EgoFlowNet is a point-level scene flow estimation network that predicts a binary mask and uses all input points for ego-motion and scene flow estimation, improving performance over existing methods on realistic KITTI scenes.


Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction

http://arxiv.org/abs/2407.02918v1

Compressor summary: The paper proposes a new method for real-time 3D reconstruction of surgical scenes without using Structure-from-Motion, by leveraging optical flow priors and scene consistency checks.


Towards Negotiative Dialogue for the Talkamatic Dialogue Manager

http://arxiv.org/abs/2407.02917v1

Compressor summary: The paper explores various aspects of negotiation-based conversations using a preliminary version of the Talkamatic Dialogue Manager.


The More the Merrier? Navigating Accuracy vs. Energy Efficiency Design Trade-Offs in Ensemble Learning Systems

http://arxiv.org/abs/2407.02914v1

Compressor summary: The paper analyzes how ensemble learning in machine learning affects accuracy and energy consumption, and suggests designing small ensembles with subset-based training, majority voting, and energy-efficient algorithms for a green AI approach.


SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

http://arxiv.org/abs/2407.02913v1

Compressor summary: SFC is a new algorithm that improves quantized convolution efficiency by extending the DFT with symbolic computing and introducing correction terms, achieving a 3.68x reduction in multiplications while maintaining accuracy.
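SFC's symbolic extension of the DFT and its correction terms are not detailed in the summary; the sketch below shows only the classical fact it builds on, namely that linear convolution becomes an elementwise product in the DFT domain, which is where the multiplication savings come from.

```python
import numpy as np

def fft_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    n = len(x) + len(w) - 1       # length of the full linear convolution
    X = np.fft.rfft(x, n)         # zero-pad to avoid circular wrap-around
    W = np.fft.rfft(w, n)
    return np.fft.irfft(X * W, n) # O(n log n) multiplies instead of O(n^2)

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, -1.0])
print(np.allclose(fft_conv(x, w), np.convolve(x, w)))  # True
```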


Domain-independent detection of known anomalies

http://arxiv.org/abs/2407.02910v1

Compressor summary: The paper introduces a hybrid task of domain generalization on sparse classes for industrial anomaly detection, presents three new datasets derived from MVTec AD, and proposes embedding-based approaches (SEMLP and Labeled PatchCore), of which SEMLP performs best with an average AUROC of 87.2%.


Single Image Rolling Shutter Removal with Diffusion Models

http://arxiv.org/abs/2407.02906v1

Compressor summary: RS-Diffusion is a new method to correct Rolling Shutter artifacts in single images using diffusion techniques and patch-attention, and introduces a new dataset with ground-truth data.


Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation

http://arxiv.org/abs/2407.02894v1

Compressor summary: Translatotron-V is an end-to-end image translation model that uses target text decoding and visual tokenization to reduce the language alignment burden and improve performance while preserving visual features.


An Uncertainty-guided Tiered Self-training Framework for Active Source-free Domain Adaptation in Prostate Segmentation

http://arxiv.org/abs/2407.02893v1

Compressor summary: The paper proposes a novel method called UGTST that selectively annotates a few target domain samples to improve prostate segmentation in cross-center medical images using deep learning models.


GPTQT: Quantize Large Language Models Twice to Push the Efficiency

http://arxiv.org/abs/2407.02891v1

Compressor summary: The paper presents a novel post-training quantization method, GPTQT, that reduces memory usage and speeds up processing in large language models by using progressive two-step linear and binary coding.
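The summary gives only the outline of GPTQT (a progressive two-step scheme combining linear and binary coding), so the following toy is an assumption-laden illustration of that shape: a coarse linear quantization followed by a greedy re-encoding of the result as a sum of scaled sign vectors. GPTQT's actual error compensation and grouping are not reproduced.

```python
import numpy as np

def linear_quant(w, bits=4):
    """Step 1: uniform (linear) quantization to a 2**bits-level grid."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def binary_code(w, terms=3):
    """Step 2: greedy residual binarization, w ~ sum_i alpha_i * b_i, b_i in {-1,+1}."""
    residual, approx = w.copy(), np.zeros_like(w)
    for _ in range(terms):
        b = np.sign(residual)
        b[b == 0] = 1
        alpha = np.abs(residual).mean()  # least-squares scale for a sign vector
        approx += alpha * b
        residual -= alpha * b
    return approx

weights = np.array([0.31, -0.12, 0.55, -0.44, 0.07])
approx = binary_code(linear_quant(weights), terms=3)
print(np.abs(weights - approx).max() < 0.2)  # coarse but bounded error
```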


Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion

http://arxiv.org/abs/2407.02887v1

Compressor summary: EGIInet is a novel framework that efficiently combines 2D and 3D information for point cloud completion using a unified encoding process and an explicitly guided information interaction strategy, achieving state-of-the-art results with fewer parameters.


ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation

http://arxiv.org/abs/2407.02881v1

Compressor summary: ShiftAddAug improves the accuracy of neural networks using multiplication-free operators by augmenting them with costly multiplication and a novel weight sharing method, achieving significant gains in image classification and semantic segmentation tasks.
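ShiftAddAug's training scheme aside, the multiplication-free primitive it targets can be illustrated simply: rounding a weight to a signed power of two turns multiplication into a bit shift. The helper below is a hypothetical toy (it assumes a nonzero weight and integer activations), not the paper's operator.

```python
import math

def shift_mul(x: int, w: float) -> int:
    """Approximate x * w by quantizing w to a signed power of two (w != 0)."""
    sign = -1 if w < 0 else 1
    k = round(math.log2(abs(w)))            # nearest power-of-two exponent
    shifted = x << k if k >= 0 else x >> -k # multiply by 2**k with a shift
    return sign * shifted

print(shift_mul(10, 3.7))  # 40: 3.7 is quantized to 2**2, so 10 << 2
print(shift_mul(8, 0.25))  # 2: 0.25 is exactly 2**-2, so 8 >> 2
```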


Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

http://arxiv.org/abs/2407.02880v1

Compressor summary: aTLAS algorithm combines pre-trained model components to enhance knowledge composition and transfer using linear combinations of parameter blocks with different learned coefficients.


Membership Inference Attacks Against Time-Series Models

http://arxiv.org/abs/2407.02870v1

Compressor summary: The authors propose new features for assessing privacy risks in time-series models using seasonality and trend components estimated from health data.


Fast maneuver recovery from aerial observation: trajectory clustering and outliers rejection

http://arxiv.org/abs/2407.02863v1

Compressor summary: The text describes a data-driven approach to model realistic and diverse behaviors of road users in multi-agent simulations, using clustering methods on raw data from different environments.


A Self-Supervised Task for Fault Detection in Satellite Multivariate Time Series

http://arxiv.org/abs/2407.02861v1

Compressor summary: The proposed method uses Physics-Informed Real NVP neural networks with self-supervised training to enhance fault detection in satellite multivariate time series, showing significant performance improvements and potential for other applications.


Early-Stage Anomaly Detection: A Study of Model Performance on Complete vs. Partial Flows

http://arxiv.org/abs/2407.02856v1

Compressor summary: Machine learning models perform worse on incomplete network data and need at least 7 packets in the test set for reliable anomaly detection.


Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production

http://arxiv.org/abs/2407.02854v1

Compressor summary: UniGloR is a self-supervised method for sign language translation and production that works without gloss annotations and achieves good results on multiple tasks.


Plant Doctor: A hybrid machine learning and image segmentation software to quantify plant damage in video footage

http://arxiv.org/abs/2407.02853v1

Compressor summary: Plant Doctor is an AI system that uses video footage to diagnose and track leaf damage in urban street plants, helping control disease spread in cities.


Multi-Task Domain Adaptation for Language Grounding with 3D Objects

http://arxiv.org/abs/2407.02846v1

Compressor summary: The text proposes a novel method called DA4LG for language grounding with 3D objects, which uses multi-task learning to align vision and language across domains.


MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

http://arxiv.org/abs/2407.02842v1

Compressor summary: MindBench is a new benchmark for structured document analysis that includes various tasks and challenges to improve current models' performance.


Comparing Feature-based and Context-aware Approaches to PII Generalization Level Prediction

http://arxiv.org/abs/2407.02837v1

Compressor summary: The paper proposes two methods to protect personal data in texts: a feature-based one and a context-aware one using Multilingual-BERT, which performs better and considers semantic relationships.


A Pairwise DomMix Attentive Adversarial Network for Unsupervised Domain Adaptive Object Detection

http://arxiv.org/abs/2407.02835v1

Compressor summary: The paper proposes a new unsupervised domain adaptation method for object detection that uses a pairwise attentive adversarial network with a Domain Mixup module to align features from different domains and improve adaptation.


Aspect-Based Sentiment Analysis Techniques: A Comparative Study

http://arxiv.org/abs/2407.02834v1

Compressor summary: The text discusses how Aspect-based Sentiment Analysis, a method that analyzes specific aspects of customer feedback, is important for businesses to understand market trends and improve their competitive edge.


Style Alignment based Dynamic Observation Method for UAV-View Geo-localization

http://arxiv.org/abs/2407.02832v1

Compressor summary: For UAV-view geo-localization (matching drone images with satellite images), the paper proposes a style alignment method and a dynamic observation module that align visual styles, control noise with a deconstruction loss, and achieve state-of-the-art performance on benchmark datasets.


A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes

http://arxiv.org/abs/2407.02830v1

Compressor summary: This paper presents an algorithm that removes reflection noise from TLS point clouds, improving 3D vision tasks in urban environments with reflective surfaces.


Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks

http://arxiv.org/abs/2407.02827v1

Compressor summary: The paper analyzes implicit gradient descent for training two-layer physics-informed neural networks and shows it converges faster and more reliably than common gradient descent.
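The stability advantage of the implicit update can be seen on a one-dimensional quadratic toy (far simpler than the paper's two-layer PINN setting): for L(t) = 0.5 * lam * t**2, the implicit step t_next = t - eta * grad(t_next) solves to t_next = t / (1 + eta * lam), which contracts for any learning rate, while the explicit step diverges once eta * lam > 2.

```python
lam, eta = 3.0, 1.0                   # curvature and a deliberately large step size
t_explicit = t_implicit = 1.0
for _ in range(10):
    t_explicit = t_explicit - eta * lam * t_explicit  # multiplies by (1 - eta*lam) = -2
    t_implicit = t_implicit / (1.0 + eta * lam)       # implicit step: multiplies by 1/4
print(abs(t_explicit), abs(t_implicit))  # 1024.0 vs ~9.5e-07: explicit diverges
```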


Representation learning with CGAN for casual inference

http://arxiv.org/abs/2407.02825v1

Compressor summary: This paper introduces a new method for finding representation learning functions using CGANs for causal inference when two distributions are balanced.


Effect of a Process Mining based Pre-processing Step in Prediction of the Critical Health Outcomes

http://arxiv.org/abs/2407.02821v1

Compressor summary: The concatenation pre-processing algorithm improves data quality, process model fit, and critical health outcome predictions by reducing dataset complexities in healthcare datasets.


Investigating the Contextualised Word Embedding Dimensions Responsible for Contextual and Temporal Semantic Changes

http://arxiv.org/abs/2407.02820v1

Compressor summary: The paper studies how sense-aware contextual word embeddings encode semantic changes in different contexts and dimensions, using fine-tuned language models and various analyses.


Efficient Training of Language Models with Compact and Consistent Next Token Distributions

http://arxiv.org/abs/2407.02819v1

Compressor summary: The paper proposes a faster way to train language models by pre-aggregating the corpus with a collapsed n-gram distribution, which improves model quality and convergence rate.
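The pre-aggregation idea can be sketched concretely: instead of one training example per token occurrence with a one-hot target, each distinct context appears once with its empirical next-token distribution as a soft target. The bigram helper below is a minimal illustration; how the paper plugs the collapsed distribution into the LM loss is not reproduced.

```python
from collections import Counter, defaultdict

def collapsed_bigram_targets(tokens):
    """Map each context token to its empirical next-token distribution."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        ctx: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
        for ctx, ctr in counts.items()
    }

corpus = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
targets = collapsed_bigram_targets(corpus)
print(targets["the"])  # {'cat': 0.666..., 'mat': 0.333...}
```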


Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective

http://arxiv.org/abs/2407.02814v1

Compressor summary: The paper proposes a framework that uses causal mediation analysis to understand and reduce biases in vision-language models by focusing on image features, which have the largest impact on bias.


Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design

http://arxiv.org/abs/2407.02813v1

Compressor summary: The text proposes a method called Dy-DCA that uses a dynamic deep neural network and content-aware data processing to improve video quality and efficiency, reducing model number and optimizing compilation for real-time performance on mobile devices.


SPLITZ: Certifiable Robustness via Split Lipschitz Randomized Smoothing

http://arxiv.org/abs/2407.02811v1

Compressor summary: The paper introduces SPLITZ, a novel method that splits classifiers into two parts, constrains the Lipschitz constant of one part and smooths the other, to improve robustness against adversarial examples.
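As a point of reference for the smoothing half, the snippet below sketches plain randomized smoothing (predict the class a base classifier returns most often under Gaussian input noise); SPLITZ's split and its Lipschitz-constrained first half are not modeled, and the toy classifier is an assumption.

```python
import random

def base_classifier(x: float) -> int:
    return 0 if x < 0.0 else 1  # toy 1-D threshold classifier

def smoothed_predict(x: float, sigma: float = 0.5, n: int = 1000, seed: int = 0) -> int:
    """Majority vote of the base classifier under Gaussian input noise."""
    rng = random.Random(seed)
    votes = sum(base_classifier(x + rng.gauss(0.0, sigma)) for _ in range(n))
    return 1 if votes > n // 2 else 0

print(smoothed_predict(1.0), smoothed_predict(-1.0))  # 1 0
```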


Euler's Elastica Based Cartoon-Smooth-Texture Image Decomposition

http://arxiv.org/abs/2407.02794v1

Compressor summary: The authors propose a new method to decompose grayscale images into three parts: structural, smooth, and oscillatory, using regularization terms and an efficient algorithm.


52B to 1T: Lessons Learned via Tele-FLM Series

http://arxiv.org/abs/2407.02783v1

Compressor summary: The report explores the potential of very large language models (>50B parameters) through supervised fine-tuning and progressive growth experiments, and shares an open-source 1T model checkpoint.


Croppable Knowledge Graph Embedding

http://arxiv.org/abs/2407.02779v1

Compressor summary: MED is a framework for training one KGE model that can serve multiple scenarios with different dimensional requirements by cropping sub-models without additional training, using mutual learning, evolutionary improvement, and dynamic loss weighting to enhance performance.


Foster Adaptivity and Balance in Learning with Noisy Labels

http://arxiv.org/abs/2407.02778v1

Compressor summary: Our proposed method, SED, handles label noise in a self-adaptive and class-balanced way by using a novel sample selection strategy, mean-teacher model, sample re-weighting mechanism, and consistency regularization.


A Framework for Quantum Finite-State Languages with Density Mapping

http://arxiv.org/abs/2407.02776v1

Compressor summary: The text introduces a framework for building and simulating quantum finite-state automata (QFAs) using predefined construction methods and improving accuracy on noisy quantum computers.


MLKD-BERT: Multi-level Knowledge Distillation for Pre-trained Language Models

http://arxiv.org/abs/2407.02775v1

Compressor summary: MLKD-BERT is a novel method that improves knowledge distillation by exploring multi-level knowledge and adjusting student attention heads, leading to better performance and faster inference on BERT models.


Automatic gradient descent with generalized Newton's method

http://arxiv.org/abs/2407.02772v1

Compressor summary: The generalized Newton's method (GeN) is a Hessian-informed optimizer that automatically selects the learning rate for faster convergence without tuning, and performs well on language and vision tasks.
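On a quadratic toy loss, the Hessian-informed learning rate GeN is built around can be written in closed form: along the gradient g, the quadratic model of the loss is minimized by eta* = (g.g) / (g.Hg). GeN estimates this curvature without materializing H; the sketch below uses the exact Hessian of a toy problem for clarity.

```python
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 2.0]])  # positive-definite quadratic loss 0.5 * x.H.x
x = np.array([3.0, -1.0])
for _ in range(20):
    g = H @ x                           # gradient of the quadratic loss
    if np.allclose(g, 0):
        break
    eta = (g @ g) / (g @ (H @ g))       # curvature-matched step size, no tuning
    x = x - eta * g
print(np.linalg.norm(x) < 1e-6)  # True: converges without a hand-picked eta
```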