arxiv compressed, 2024-08-30

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-30 generated by the compressor, my personal LLM-based project.


3D Whole-body Grasp Synthesis with Directional Controllability

http://arxiv.org/abs/2408.16770v1

Compressor summary: CWGrasp is a novel method for generating realistic 3D whole-body grasps that considers object geometry, hand positioning, and scene compatibility, improving controllability and efficiency over existing methods.


ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

http://arxiv.org/abs/2408.16767v1

Compressor summary: ReconX is a novel 3D scene reconstruction method that uses pre-trained video diffusion models to generate consistent, detailed videos from limited input views.


CSGO: Content-Style Composition in Text-to-Image Generation

http://arxiv.org/abs/2408.16766v1

Compressor summary: The authors present a data construction pipeline for creating image triplets with content, style, and stylized images, and propose a new end-to-end style transfer model called CSGO that can control various aspects of image generation.


A Score-Based Density Formula, with Applications in Diffusion Generative Models

http://arxiv.org/abs/2408.16765v1

Compressor summary: The paper provides a theoretical explanation for why optimizing the ELBO works well for training diffusion generative models like DDPMs.


UV-free Texture Generation with Denoising and Geodesic Heat Diffusions

http://arxiv.org/abs/2408.16762v1

Compressor summary: The paper proposes a new method to generate textures for 3D objects using point-clouds and heat diffusion, avoiding common UV-based texture issues.


OmniRe: Omni Urban Scene Reconstruction

http://arxiv.org/abs/2408.16760v1

Compressor summary: OmniRe is a novel framework for reconstructing high-fidelity dynamic urban scenes from on-device logs, accurately modeling various dynamic actors such as vehicles, pedestrians, and cyclists.


Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks

http://arxiv.org/abs/2408.16757v1

Compressor summary: The paper compares OOD detection and OSR methods, provides a new benchmark setting, and finds that score rules sensitive to deep feature magnitude perform well at scale.


How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models

http://arxiv.org/abs/2408.16756v1

Compressor summary: The text discusses the underrepresentation of Cantonese in natural language processing research and proposes new benchmarks and models to improve language model performance in Cantonese.


Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models

http://arxiv.org/abs/2408.16753v1

Compressor summary: The paper proposes a reinforcement learning framework for last-mile fine-tuning of language models, which improves performance in abstractive summarization and can handle more complex undesirable outputs.


A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models

http://arxiv.org/abs/2408.16751v1

Compressor summary: The paper compares different methods to improve language models by penalizing bad examples and shows that a combination of ExMATE and DPO outperforms MLE in terms of both statistics and generation.


Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge

http://arxiv.org/abs/2408.16749v1

Compressor summary: The study compares BERT and GPT models in detecting and classifying online domestic extremism using different prompts and finds that GPT models outperform BERT models, with more detailed prompts generally yielding better results.


Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models

http://arxiv.org/abs/2408.16740v1

Compressor summary: The paper discusses challenges in studying large language models, proposing a non-anthropomorphic approach to understand their texts' characteristics and explore their role in studying human culture.


Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

http://arxiv.org/abs/2408.16737v1

Compressor summary: This paper compares generating synthetic data from strong or weak language models and finds that using weak models is more efficient for improving reasoning performance of large language models.


VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

http://arxiv.org/abs/2408.16730v1

Compressor summary: VideoLLM-MoD is a novel method that reduces vision compute by skipping redundant vision tokens in transformer layers, achieving significant efficiency improvements while preserving or improving performance on various video tasks.


Prediction-Feedback DETR for Temporal Action Detection

http://arxiv.org/abs/2408.16729v1

Compressor summary: The paper introduces Pred-DETR, a new framework that mitigates the attention collapse problem in cross-attention within DETR-based Temporal Action Detection methods by using predictions to align cross- and self-attention.


Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

http://arxiv.org/abs/2408.16725v1

Compressor summary: The paper presents Mini-Omni, an end-to-end conversational model that can generate speech in real-time and interact with humans using the audio modality.


H-SGANet: Hybrid Sparse Graph Attention Network for Deformable Medical Image Registration

http://arxiv.org/abs/2408.16719v1

Compressor summary: The H-SGANet is a lightweight hybrid model that uses sparse graph attention to improve brain MRI volume registration accuracy and efficiency.


A GREAT Architecture for Edge-Based Graph Problems Like TSP

http://arxiv.org/abs/2408.16717v1

Compressor summary: GREAT is a novel edge-based neural model that can handle dense routing problems, sparsify TSP graphs, and achieve state-of-the-art results in Euclidean and non-Euclidean asymmetric TSP.


Enhanced forecasting of stock prices based on variational mode decomposition, PatchTST, and adaptive scale-weighted layer

http://arxiv.org/abs/2408.16707v1

Compressor summary: The text introduces a new method that combines variational mode decomposition, PatchTST, and an adaptive scale-weighted layer to improve the accuracy of stock index price forecasting using data from 2000 to 2024.


One-Shot Learning Meets Depth Diffusion in Multi-Object Videos

http://arxiv.org/abs/2408.16704v1

Compressor summary: The paper presents a method to generate editable videos with complex interactions between multiple objects in different artistic styles using a text-video pair and a depth-aware Text-to-Image model.


GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models

http://arxiv.org/abs/2408.16700v1

Compressor summary: The paper proposes a framework to identify and quantify biases in Text-to-Image generative models using captions, images, and Vision Question Answering, with two variations that detect different aspects of biases.


SympGNNs: Symplectic Graph Neural Networks for identifying high-dimensional Hamiltonian systems and node classification

http://arxiv.org/abs/2408.16698v1

Compressor summary: SympGNNs are novel neural network models that can learn high-dimensional Hamiltonian systems by combining symplectic maps with permutation equivariance, achieving accurate system identification and node classification.


CW-CNN & CW-AN: Convolutional Networks and Attention Networks for CW-Complexes

http://arxiv.org/abs/2408.16686v1

Compressor summary: The paper introduces a new neural network framework for learning on CW-complex structured data, which are useful for problems in chemistry.


PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification

http://arxiv.org/abs/2408.16684v1

Compressor summary: PartFormer improves object re-identification using a novel adaptation of ViT that enhances diverse representation and attention head diversity for robust feature learning.


Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

http://arxiv.org/abs/2408.16673v1

Compressor summary: GEM is a distribution matching method that improves large language models' performance on various tasks by reducing overfitting and increasing output diversity using the maximum entropy principle.


Iterative Graph Alignment

http://arxiv.org/abs/2408.16667v1

Compressor summary: IGA is a new annotation-free algorithm that improves LLMs' ability to align their responses with rules and generalize better in diverse scenarios.


Space3D-Bench: Spatial 3D Question Answering Benchmark

http://arxiv.org/abs/2408.16662v1

Compressor summary: Space3D-Bench is a diverse 3D Q&A dataset for evaluating and improving spatial reasoning in language and vision models.


Optimal Parallelization of Boosting

http://arxiv.org/abs/2408.16653v1

Compressor summary: This paper improves lower bounds and provides a matching parallel Boosting algorithm for weak-to-strong learners, closing the gap between theory and practice in the tradeoff between training rounds and parallel work per round.


Turbulence Strength $C_n^2$ Estimation from Video using Physics-based Deep Learning

http://arxiv.org/abs/2408.16623v1

Compressor summary: The paper compares classic and deep learning methods for estimating refractive-index structure constant $C_n^2$ from video images and presents a new physics-based network architecture that combines learned convolutional layers with an image gradient method for improved accuracy and generalization.


Towards Infusing Auxiliary Knowledge for Distracted Driver Detection

http://arxiv.org/abs/2408.16621v1

Compressor summary: KiD3 is a new method for detecting distracted drivers using scene graphs, driver pose information, and video frames to improve road safety.


Hyperdimensional Vector Tsetlin Machines with Applications to Sequence Learning and Generation

http://arxiv.org/abs/2408.16620v1

Compressor summary: The authors propose a fast and powerful model that combines hyperdimensional vector computing and Tsetlin machines for learning and generating sequential data, and apply it to forecasting and classification tasks.


Blending Low and High-Level Semantics of Time Series for Better Masked Time Series Generation

http://arxiv.org/abs/2408.16613v1

Compressor summary: NC-VQVAE is a novel framework that combines self-supervised learning with vector quantization to generate more realistic time series by capturing both low and high-level semantics.


Data Quality Monitoring through Transfer Learning on Anomaly Detection for the Hadron Calorimeters

http://arxiv.org/abs/2408.16612v1

Compressor summary: Transfer learning helps improve anomaly detection accuracy, data reconstruction, and model robustness for sensor data from complex systems like particle detectors at CERN.


sEMG-Driven Physics-Informed Gated Recurrent Networks for Modeling Upper Limb Multi-Joint Movement Dynamics

http://arxiv.org/abs/2408.16599v1

Compressor summary: The study introduces a new neural network model that predicts joint torques using surface electromyography data for exoskeleton and rehabilitation applications.


High-Dimensional Sparse Data Low-rank Representation via Accelerated Asynchronous Parallel Stochastic Gradient Descent

http://arxiv.org/abs/2408.16592v1

Compressor summary: The paper introduces A2PSGD, an accelerated asynchronous parallel stochastic gradient descent algorithm that learns low-rank representations of high-dimensional sparse network data faster and more accurately than existing methods, using parallelism, load balancing, and acceleration techniques, and can infer node interactions from real-world network data.


Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies

http://arxiv.org/abs/2408.16586v1

Compressor summary: The paper presents a large language model-based Werewolf Game AI that uses situation analysis and persuasion strategies to test LLMs' capabilities in complex interactive environments.


FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection

http://arxiv.org/abs/2408.16582v1

Compressor summary: The paper presents a lightweight two-stream architecture for real-time image manipulation detection, in which a cognitive branch with wavelet-guided Transformer blocks captures global frequency traces while an inspective branch with simple convolutions captures fine-grained traces, achieving competitive performance.


Seeking the Sufficiency and Necessity Causal Features in Multimodal Representation Learning

http://arxiv.org/abs/2408.16577v1

Compressor summary: The paper proposes a method for learning causal features in multimodal data that enhances deep learning models' performance by separating invariant and specific components and ensuring PNS identifiability.


An Adaptive Latent Factorization of Tensors Model for Embedding Dynamic Communication Network

http://arxiv.org/abs/2408.16573v1

Compressor summary: The paper introduces a new model (ATT) for analyzing high-dimensional sparse data from communication networks, which improves performance over existing methods.


Predictability maximization and the origins of word order harmony

http://arxiv.org/abs/2408.16570v1

Compressor summary: The text discusses how to optimally arrange a linguistic head and its dependents for maximum predictability, considering statistical independence and harmonic order.


MST-KD: Multiple Specialized Teachers Knowledge Distillation for Fair Face Recognition

http://arxiv.org/abs/2408.16563v1

Compressor summary: The text presents a face recognition framework that trains four ethnicity-specific teachers, learns a projection of them into a common space, and distills their knowledge into a student network, improving performance and reducing bias compared to training on balanced datasets.


Spurfies: Sparse Surface Reconstruction using Local Geometry Priors

http://arxiv.org/abs/2408.16544v1

Compressor summary: Spurfies is a new sparse-view reconstruction method that uses local geometry priors trained on synthetic data and neural point representation to improve surface quality and novel view synthesis.


SALSA: Speedy ASR-LLM Synchronous Aggregation

http://arxiv.org/abs/2408.16542v1

Compressor summary: SALSA is a method that improves ASR for low-resource languages by coupling decoder layers of both ASR and LLM while handling tokenizer mismatch.


SFR-GNN: Simple and Fast Robust GNNs against Structural Attacks

http://arxiv.org/abs/2408.16537v1

Compressor summary: The paper proposes an efficient defense method called SFR-GNN that uses contrastive learning and mutual information theory to improve the robustness of graph neural networks against adversarial structural attacks.


TinyTNAS: GPU-Free, Time-Bound, Hardware-Aware Neural Architecture Search for TinyML Time Series Classification

http://arxiv.org/abs/2408.16535v1

Compressor summary: TinyTNAS is a hardware-aware NAS tool that efficiently optimizes neural network architectures for TinyML time series classification on CPUs, reducing resource usage and latency while maintaining high accuracy.


Multitask learning for improved scour detection: A dynamic wave tank study

http://arxiv.org/abs/2408.16527v1

Compressor summary: The paper proposes using a Bayesian hierarchical model to infer foundation stiffness distribution in offshore wind farms, improving structural health monitoring by detecting anomalies like scour.


CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues

http://arxiv.org/abs/2408.16518v1

Compressor summary: The paper introduces CNIMA, a Chinese dialogue dataset with interactivity annotations, evaluates an existing framework for English on Chinese data, and proposes an automated evaluation system for second language assessment.


Adaptive Variational Continual Learning via Task-Heuristic Modelling

http://arxiv.org/abs/2408.16517v1

Compressor summary: AutoVCL is a continual learning model that adapts its hyperparameters to the task difficulty and similarity for better performance than standard GVCL.


Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation

http://arxiv.org/abs/2408.16506v1

Compressor summary: Our approach generates high-quality, consistent, and realistic video animations from static images using a training-free framework that aligns skeletal, motion, and pixel information.


Locally Grouped and Scale-Guided Attention for Dense Pest Counting

http://arxiv.org/abs/2408.16503v1

Compressor summary: The study proposes a new model that uses local attention and multiscale features to accurately count densely distributed pests in images captured by digital traps.


LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?

http://arxiv.org/abs/2408.16502v1

Compressor summary: The text compares the performance and cost-effectiveness of large language models for data augmentation tasks with other established methods.


UAV-Based Human Body Detector Selection and Fusion for Geolocated Saliency Map Generation

http://arxiv.org/abs/2408.16501v1

Compressor summary: This research develops a system that automatically selects and fuses detection results from teams of UAVs to accurately locate objects in search and rescue missions.


CogVLM2: Visual Language Models for Image and Video Understanding

http://arxiv.org/abs/2408.16500v1

Compressor summary: The authors introduce CogVLM2 family, a new generation of visual language models for image and video understanding, which achieves state-of-the-art results on various benchmarks and is open-sourced.


On-device AI: Quantization-aware Training of Transformers in Time-Series

http://arxiv.org/abs/2408.16495v1

Compressor summary: The research optimizes the Transformer AI model for time-series forecasting on resource-limited sensor devices using FPGAs and Quantization-aware Training.


Learning from Negative Samples in Generative Biomedical Entity Linking

http://arxiv.org/abs/2408.16493v1

Compressor summary: ANGEL is a framework that improves biomedical entity linking by training generative models with negative samples, leading to better accuracy on five benchmarks.


Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning

http://arxiv.org/abs/2408.16486v1

Compressor summary: The paper proposes a test-time prompt tuning approach for vision-language models that uses maximum concept matching scores as dynamic weights to improve generalization on open-set problems.


Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning

http://arxiv.org/abs/2408.16482v1

Compressor summary: The paper proposes a simple and cheap method using in-context learning and human survey data to align large language models better with cultural values, and shows it works across different languages and models.


An Exploratory Deep Learning Approach for Predicting Subsequent Suicidal Acts in Chinese Psychological Support Hotlines

http://arxiv.org/abs/2408.16463v1

Compressor summary: The study uses artificial intelligence and deep learning to improve the accuracy of suicide risk prediction in psychological support hotlines, outperforming traditional methods.


HYGENE: A Diffusion-based Hypergraph Generation Method

http://arxiv.org/abs/2408.16457v1

Compressor summary: HYGENE is a diffusion-based deep learning method, the first for hypergraph generation, that produces realistic and diverse hypergraphs by iteratively expanding a bipartite representation of nodes and hyperedges through a denoising diffusion process.


Weakly Supervised Object Detection for Automatic Tooth-marked Tongue Recognition

http://arxiv.org/abs/2408.16451v1

Compressor summary: The authors propose a novel automated method using Vision Transformer and Multiple instance learning to accurately detect tooth-marked tongues in Traditional Chinese Medicine, improving objectivity and clinical value.


What to Preserve and What to Transfer: Faithful, Identity-Preserving Diffusion-based Hairstyle Transfer

http://arxiv.org/abs/2408.16450v1

Compressor summary: The paper introduces HairFusion, a one-stage diffusion model that transfers hairstyles to face images while preserving other appearance features, using hair-agnostic representations and adaptive hair blending.


Enhancing Sound Source Localization via False Negative Elimination

http://arxiv.org/abs/2408.16448v1

Compressor summary: The proposed audio-visual learning framework combats false negatives in sound source localization using self-supervised predictive learning (SSPL) and semantic-aware contrastive learning (SACL), achieving superior performance and versatility in various tasks.


Is text normalization relevant for classifying medieval charters?

http://arxiv.org/abs/2408.16446v1

Compressor summary: The study compares different methods to classify medieval charters based on their text, finding that normalization may harm dating accuracy and that certain models perform better than others.


SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section

http://arxiv.org/abs/2408.16444v1

Compressor summary: The paper presents SurveySum, a new dataset for summarizing scientific articles into survey sections, and evaluates two pipelines for this task, showing that high-quality retrieval is crucial.


Integrating Features for Recognizing Human Activities through Optimized Parameters in Graph Convolutional Networks and Transformer Architectures

http://arxiv.org/abs/2408.16442v1

Compressor summary: The study shows that feature fusion improves activity recognition accuracy using deep learning models, with PO-GCN performing best among four datasets.


Instruction-tuned Large Language Models for Machine Translation in the Medical Domain

http://arxiv.org/abs/2408.16440v1

Compressor summary: Instruction-tuned large language models perform better than baseline ones in translating medical terminology.


Gradient-free variational learning with conditional mixture networks

http://arxiv.org/abs/2408.16429v1

Compressor summary: CAVI-CMN is a fast, gradient-free Bayesian method that uses conditional mixture networks to solve complex classification tasks with competitive accuracy and efficiency.


COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation

http://arxiv.org/abs/2408.16426v1

Compressor summary: COIN is a novel method that uses control-inpainting score distillation and human-scene relation loss to disentangle human and camera motions and achieve state-of-the-art results in global human motion estimation and camera motion estimation.


A Comparative Study of Hyperparameter Tuning Methods

http://arxiv.org/abs/2408.16425v1

Compressor summary: The paper evaluates three hyperparameter tuning algorithms and shows that nonlinear models with well-tuned hyperparameters perform better than linear ones, but the best algorithm depends on the task and model type.


Fourier Spectral Physics Informed Neural Network: An Efficient and Low-Memory PINN

http://arxiv.org/abs/2408.16414v1

Compressor summary: The paper introduces a spectral-based physics-informed neural network that replaces automatic differentiation with multiplication, reducing memory use and training time while improving accuracy through the exponential convergence of the spectral basis, and provides two strategies for training networks with spectral information.


DeepSPoC: A Deep Learning-Based PDE Solver Governed by Sequential Propagation of Chaos

http://arxiv.org/abs/2408.16403v1

Compressor summary: DeepSPoC combines sequential propagation of chaos with deep learning to solve nonlinear Fokker-Planck equations and uses various neural network architectures for high-dimensional problems.


IBO: Inpainting-Based Occlusion to Enhance Explainable Artificial Intelligence Evaluation in Histopathology

http://arxiv.org/abs/2408.16395v1

Compressor summary: The paper proposes a novel occlusion strategy called Inpainting-Based Occlusion (IBO) that uses a Denoising Diffusion Probabilistic Model to generate realistic, non-cancerous tissue in histopathological images, improving interpretability and trustworthiness of Explainable Artificial Intelligence techniques for cancer diagnosis.


Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization

http://arxiv.org/abs/2408.16393v1

Compressor summary: The paper explores the trade-off between diversity and quality in optimization problems and proposes a new approach based on subset selection and random sampling.


TempoKGAT: A Novel Graph Attention Network Approach for Temporal Graph Analysis

http://arxiv.org/abs/2408.16391v1

Compressor summary: TempoKGAT is a graph attention network that uses time-decaying weights, selective neighbor aggregation, and top-k neighbor selection to handle dynamic, temporal data and achieve superior performance on spatio-temporal datasets.


MQM-Chat: Multidimensional Quality Metrics for Chat Translation

http://arxiv.org/abs/2408.16390v1

Compressor summary: This study proposes a new evaluation metric (MQM-Chat) for machine translation models that handle chat translations, revealing different errors and highlighting the significance of preserving style and consistency.


Addressing Common Misinterpretations of KART and UAT in Neural Network Literature

http://arxiv.org/abs/2408.16389v1

Compressor summary: The note clarifies the KART and UAT, correcting common errors in neural network literature to improve comprehension.


Exploiting temporal information to detect conversational groups in videos and predict the next speaker

http://arxiv.org/abs/2408.16380v1

Compressor summary: The paper presents an approach that uses time and multimodal signals to detect F formations in videos and predict the next speaker in a conversation using a recursive neural network (LSTM).


TG-PhyNN: An Enhanced Physically-Aware Graph Neural Network framework for forecasting Spatio-Temporal Data

http://arxiv.org/abs/2408.16379v1

Compressor summary: TG-PhyNN is a new framework that combines Graph Neural Networks with physical constraints to improve forecasting of spatio-temporal data in domains like traffic and disease spread.


Law of Vision Representation in MLLMs

http://arxiv.org/abs/2408.16357v1

Compressor summary: The "Law of Vision Representation" shows how cross-modal alignment and correspondence affect multimodal language models' performance and helps optimize their vision representation with less computation.


The Unreasonable Ineffectiveness of Nucleus Sampling on Mitigating Text Memorization

http://arxiv.org/abs/2408.16345v1

Compressor summary: The study examines if nucleus sampling reduces text memorization in large language models and finds that it has limited effect, and that soft memorization can still occur.


Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

http://arxiv.org/abs/2408.16343v1

Compressor summary: The study introduces a multimodal classification model that uses clinical, cognitive, neuroimaging, and EEG data to enhance the accuracy of diagnosing Alzheimer's disease and differentiate it from other conditions.


Do Graph Neural Networks Work for High Entropy Alloys?

http://arxiv.org/abs/2408.16337v1

Compressor summary: LEGraphs and LESets are novel graph neural network models for high-entropy alloys that use local environment graphs to capture their complex structure and predict their mechanical properties.


GL-TSVM: A robust and smooth twin support vector machine with guardian loss function

http://arxiv.org/abs/2408.16336v1

Compressor summary: GL-TSVM is a robust and smooth classifier that uses a novel loss function to address TSVM's sensitivity to noise and has lower computational complexity than SVM.


Self-Improving Diffusion Models with Synthetic Data

http://arxiv.org/abs/2408.16333v1

Compressor summary: SIMS is a new training concept for diffusion models that uses self-synthesized data to improve generative AI without compromising quality or diversity, and can adjust the synthetic data distribution to match in-domain target distributions.


Guided Reasoning: A Non-Technical Introduction

http://arxiv.org/abs/2408.16331v1

Compressor summary: Guided Reasoning is a method where one agent helps other agents improve their reasoning quality, and Logikon is an example of this concept.


Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic

http://arxiv.org/abs/2408.16326v1

Compressor summary: Critic-CoT is a novel framework that goes beyond current simple self-critic approaches by enhancing an LLM's critic capability via step-wise chain-of-thought reasoning and distant-supervision data, achieving better task-solving and generation results.


P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising

http://arxiv.org/abs/2408.16325v1

Compressor summary: The paper proposes a novel method for point cloud denoising that learns an optimal transport plan between paired point clouds and improves over existing methods with or without additional features.


Minimising changes to audit when updating decision trees

http://arxiv.org/abs/2408.16321v1

Compressor summary: The paper proposes an algorithm for updating a decision tree with minimal human auditing, using a greedy approach and a customised objective function.


ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding

http://arxiv.org/abs/2408.16314v1

Compressor summary: The ResVG model improves visual grounding by enhancing semantics and handling spatial relations in images with multiple distractions using text-to-image generation and relation-sensitive data augmentation.


FA-YOLO: Research On Efficient Feature Selection YOLO Improved Algorithm Based On FMDS and AGMF Modules

http://arxiv.org/abs/2408.16313v1

Compressor summary: This paper introduces FA-YOLO, a novel object detection model that improves over YOLOv9 by enhancing feature selection, fusion, and detection accuracy for small, medium, and large targets in complex environments.


Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning

http://arxiv.org/abs/2408.16310v1

Compressor summary: SlotSAM enhances foundation models' object-level perception and generalization by reconstructing features and integrating them into the model using self-supervised learning.


Semantics-Oriented Multitask Learning for DeepFake Detection: A Joint Embedding Approach

http://arxiv.org/abs/2408.16305v1

Compressor summary: The paper proposes a semantics-oriented multitask learning approach for DeepFake detection using joint embedding of face images and textual descriptions, improving generalizability and interpretability.


Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models

http://arxiv.org/abs/2408.16296v1

Compressor summary: The paper proposes a new method for image retrieval using multi-modal large language models and data augmentation techniques, which outperforms conventional methods on several datasets.


Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

http://arxiv.org/abs/2408.16293v1

Compressor summary: The paper explores using error-correction data in pretraining language models to improve reasoning accuracy without multi-round prompting.


Convolutional Neural Network Compression Based on Low-Rank Decomposition

http://arxiv.org/abs/2408.16289v1

Compressor summary: The paper introduces a compression method for deep convolutional neural networks that combines variational Bayesian matrix factorization (VBMF) and orthogonal regularization to estimate the rank of each layer's weight tensor while preserving accuracy, and generalizes to other network architectures and tensor decomposition methods.
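The low-rank idea behind this kind of compression can be sketched with a plain truncated SVD of a flattened conv weight; the rank below is fixed by hand for illustration, whereas the paper selects it per layer with VBMF, and all names and shapes here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 3 * 3 * 32))  # (out_ch, in_ch*k*k), flattened conv kernel

U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 8                                      # rank; the paper estimates this via VBMF
W_low = (U[:, :r] * s[:r]) @ Vt[:r, :]     # rank-r approximation of the weight

# Storing the two factors instead of W shrinks the parameter count.
params_full = W.size
params_low = U[:, :r].size + r + Vt[:r, :].size
print(params_full, params_low)
```

Orthogonal regularization during fine-tuning would then keep the factors well-conditioned while accuracy is recovered.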


Measuring the Accuracy of Automatic Speech Recognition Solutions

http://arxiv.org/abs/2408.16287v1

Compressor summary: The paper evaluates the accuracy and reliability of automatic speech recognition (ASR) captioning for d/Deaf and hard of hearing people, finding wide variation between services and lower quality for streaming ASR.


Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

http://arxiv.org/abs/2408.16286v1

Compressor summary: The paper introduces the first algorithm to find near-optimal policies for robust constrained MDPs, addressing a limitation of existing methods by using the epigraph form and binary search.
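The epigraph-form reduction can be illustrated generically: binary-search over a target return and ask a feasibility oracle at each step whether some constraint-satisfying policy attains it. The oracle below is a toy stand-in, not the paper's robust constrained MDP solver:

```python
def epigraph_binary_search(feasible, lo, hi, tol=1e-6):
    """Binary search for the largest target value b with feasible(b) True."""
    while hi - lo > tol:
        b = (lo + hi) / 2.0
        if feasible(b):      # some constraint-satisfying policy attains return >= b
            lo = b
        else:
            hi = b
    return lo

# Toy oracle: pretend the best achievable constrained return is 0.37.
best = epigraph_binary_search(lambda b: b <= 0.37, 0.0, 1.0)
print(round(best, 4))
```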


ART: Actually Robust Training

http://arxiv.org/abs/2408.16285v1

Compressor summary: ART is a Python library that helps impose rules and standards for developing deep learning models, improving interpretability and robustness.


Enhancing Customer Churn Prediction in Telecommunications: An Adaptive Ensemble Learning Approach

http://arxiv.org/abs/2408.16284v1

Compressor summary: The paper presents an adaptive ensemble learning framework that combines multiple base models, meta-feature generation, and data preprocessing to predict customer churn in the telecom industry, achieving 99.28% accuracy on three datasets and outperforming existing methods.


Web Service QoS Prediction via Extended Canonical Polyadic-based Tensor Network

http://arxiv.org/abs/2408.16278v1

Compressor summary: The paper introduces an improved tensor network model (ECTN) for predicting web service Quality of Service (QoS), which considers user and service correlation in low-dimensional space and achieves better accuracy than existing models.


Enhancing AI-Driven Psychological Consultation: Layered Prompts with Large Language Models

http://arxiv.org/abs/2408.16276v1

Compressor summary: The authors propose using large language models with layered prompting systems to improve AI-driven psychological consultation services by enhancing their emotional intelligence and contextual understanding.


SAU: A Dual-Branch Network to Enhance Long-Tailed Recognition via Generative Models

http://arxiv.org/abs/2408.16273v1

Compressor summary: The text proposes a two-branch model that uses synthetic data to address the challenge of long-tailed image recognition and improve accuracy.


Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding

http://arxiv.org/abs/2408.16272v1

Compressor summary: SRAM is a novel network module for Video Temporal Grounding that uses cross-modal alignment and Deep Evidential Regression to estimate uncertainties and handle open-world challenges, such as noisy and out-of-distribution data.


UDD: Dataset Distillation via Mining Underutilized Regions

http://arxiv.org/abs/2408.16268v1

Compressor summary: UDD is a novel approach that identifies and exploits underutilized regions in synthetic datasets to improve model performance using response-based and data jittering-based policies, as well as a category-wise feature contrastive loss.


Improving Diffusion-based Data Augmentation with Inversion Spherical Interpolation

http://arxiv.org/abs/2408.16266v1

Compressor summary: The paper introduces Diff-II, a novel diffusion-based data augmentation method that balances faithfulness and diversity for improving image classification performance on various tasks.


Low Saturation Confidence Distribution-based Test-Time Adaptation for Cross-Domain Remote Sensing Image Classification

http://arxiv.org/abs/2408.16265v1

Compressor summary: LSCD-TTA is a novel test time adaptation method for cross-domain remote sensing image classification that considers distribution characteristics, achieving fast and accurate adaptations without much prior data or manual annotation.


LoraMap: Harnessing the Power of LoRA Connections

http://arxiv.org/abs/2408.16264v1

Compressor summary: The paper proposes LoraMap, a method to connect multiple Low-Rank Adaptation models for fact-checking tasks, which improves performance over existing methods with fewer parameters.
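The underlying LoRA mechanics can be sketched as low-rank additive updates to a frozen weight; LoraMap learns how to connect adapters, whereas this toy sketch simply sums two of them as a naive stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2
W = rng.standard_normal((d, d))            # frozen base weight

def lora_delta(rng, d, r, scale=0.5):
    A = rng.standard_normal((r, d)) * 0.1  # down-projection
    B = rng.standard_normal((d, r)) * 0.1  # up-projection
    return scale * (B @ A)                 # rank-r update

delta1 = lora_delta(rng, d, r)             # e.g. adapter tuned on task 1
delta2 = lora_delta(rng, d, r)             # e.g. adapter tuned on task 2
W_eff = W + delta1 + delta2                # model with both adapters merged
print(np.linalg.matrix_rank(delta1), W_eff.shape)
```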


On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes

http://arxiv.org/abs/2408.16262v1

Compressor summary: The paper studies Q-learning algorithms for large-state-space problems under the average-reward criterion, extending previous analyses from unichain to weakly communicating MDPs and characterizing their convergence sets.
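A simplified average-reward (relative) Q-learning loop on a toy two-state MDP sketches the setting studied here, with the gain tracked via the TD error; this is an illustrative variant, not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] transition probabilities
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])    # R[s, a] rewards

Q = np.zeros((2, 2))
rho = 0.0                                  # running estimate of the average reward
s = 0
alpha, beta, eps = 0.05, 0.01, 0.1
for t in range(20000):
    a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
    s_next = rng.choice(2, p=P[s, a])
    # Average-reward TD error: r - rho + max_a' Q(s', a') - Q(s, a)
    td = R[s, a] - rho + Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td
    rho += beta * td                       # gain tracked via the TD error
    s = s_next
print(rho, Q)
```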


Evaluating Time-Series Training Dataset through Lens of Spectrum in Deep State Space Models

http://arxiv.org/abs/2408.16261v1

Compressor summary: The study proposes a metric, the K-spectral metric, to estimate the performance of deep SSMs on time-series datasets early in the training process, reducing the cost of data collection for new tasks.


Coalitions of AI-based Methods Predict 15-Year Risks of Breast Cancer Metastasis Using Real-World Clinical Data with AUC up to 0.9

http://arxiv.org/abs/2408.16256v1

Compressor summary: The authors argue that machine learning algorithms can build more reliable breast cancer prognostics from existing routine clinical data, potentially sparing some women the unnecessary adjuvant therapies that current, less reliable metrics often prompt.


Iterated Energy-based Flow Matching for Sampling from Boltzmann Densities

http://arxiv.org/abs/2408.16249v1

Compressor summary: The paper proposes a new method to train probabilistic models from unnormalized densities using energy functions, which performs better than existing methods and can handle complex high-dimensional systems.


Anno-incomplete Multi-dataset Detection

http://arxiv.org/abs/2408.16247v1

Compressor summary: The paper proposes a new method for object detection that uses multiple incomplete datasets and improves performance on COCO and VOC benchmarks.


Large-Scale Multi-omic Biosequence Transformers for Modeling Peptide-Nucleotide Interactions

http://arxiv.org/abs/2408.16245v1

Compressor summary: The text describes the development and success of multi-omic nucleotide-peptide foundation models for predicting peptide-nucleotide interactions and identifying involved residues.


Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers

http://arxiv.org/abs/2408.16241v1

Compressor summary: The thesis proposes new methods for enhancing transformer language models, improving their capabilities for tasks like sequence-to-sequence generation and few-shot classification, and discusses the trade-off between model likelihood and output quality.


Neural Spectral Decomposition for Dataset Distillation

http://arxiv.org/abs/2408.16236v1

Compressor summary: Neural Spectrum Decomposition (NSD) is a dataset distillation framework that treats the entire dataset as a low-rank observation, learning spectrum tensors and transformation matrices whose products reconstruct the data distribution, and achieves state-of-the-art results on several image benchmarks via trajectory matching with real-distribution guidance.


LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement

http://arxiv.org/abs/2408.16235v1

Compressor summary: The paper proposes a semi-supervised method, LMT-GP, for low-light image enhancement that leverages both labeled and unlabeled data using latent mean-teacher and Gaussian process techniques to improve generalization ability and visual quality.


PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator

http://arxiv.org/abs/2408.16233v1

Compressor summary: PSE-Net is a novel parallel algorithm for efficient channel pruning of deep neural networks that simulates multiple subnets' training in one round and uses prior information to improve evolutionary search efficiency.
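Channel pruning in general can be sketched by scoring a conv layer's output channels and keeping only the strongest; the L1-norm criterion below is a common generic choice, not PSE-Net's parallel-subnets estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8, 3, 3))          # (out_ch, in_ch, k, k)

scores = np.abs(W).reshape(16, -1).sum(axis=1)  # L1 norm per output channel
k = 8
keep = np.sort(np.argsort(scores)[-k:])         # indices of the k strongest channels
W_pruned = W[keep]                              # pruned layer keeps half the channels
print(W_pruned.shape)
```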


Enhancing Conditional Image Generation with Explainable Latent Space Manipulation

http://arxiv.org/abs/2408.16232v1

Compressor summary: The paper presents a method that combines diffusion models with latent space manipulation and gradient-based selective attention to generate realistic images based on textual descriptions while preserving the reference image features.


Revisiting 360 Depth Estimation with PanoGabor: A New Fusion Perspective

http://arxiv.org/abs/2408.16227v1

Compressor summary: PGFuse is a framework that uses distortion-aware Gabor filters to estimate depth from monocular 360 images, improving feature extraction and reducing distortions.


LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models

http://arxiv.org/abs/2408.16224v1

Compressor summary: The paper proposes an SGE module for VLMs that extracts and expresses complex semantic information in images, improving their perception and performance in vision-language tasks.


Training-free Video Temporal Grounding using Large-scale Pre-trained Models

http://arxiv.org/abs/2408.16219v1

Compressor summary: The paper proposes a Training-Free Video Temporal Grounding approach that uses pre-trained large models to select the most relevant video segments for a given natural language query, addressing issues of temporal boundaries and dynamic transitions between events in videos.


Targeted Cause Discovery with Data-Driven Learning

http://arxiv.org/abs/2408.16218v1

Compressor summary: The proposed method uses a neural network to learn causal relationships between variables and a target variable, identifying both direct and indirect causes in large-scale systems.


M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

http://arxiv.org/abs/2408.16213v1

Compressor summary: M4CXR is a multi-modal LLM that enhances CXR interpretation by performing tasks such as medical report generation, visual grounding, and visual question answering with state-of-the-art clinical accuracy.


From cart to truck: meaning shift through words in English in the last two centuries

http://arxiv.org/abs/2408.16209v1

Compressor summary: The study analyzes how words representing the same concepts changed over time using historical data and word embeddings, revealing language-society connections and discussing methodological and ethical challenges.


ReXamine-Global: A Framework for Uncovering Inconsistencies in Radiology Report Generation Metrics

http://arxiv.org/abs/2408.16208v1

Compressor summary: ReXamine-Global is a framework that tests the generalization of radiology report quality metrics using a large language model and 240 reports from six hospitals, revealing gaps in existing metrics' robustness.


Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation

http://arxiv.org/abs/2408.16204v1

Compressor summary: Micro-batch clipping improves ASR model performance by reducing convergence time but introduces a constant bias dependent on factors like sample quality and domain diversity.
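The mechanics of micro-batch clipping can be sketched as clipping each micro-batch's mean gradient to a fixed norm before averaging; shapes and names here are illustrative, not from the paper:

```python
import numpy as np

def microbatch_clip(grads, microbatch_size, clip_norm):
    clipped = []
    for i in range(0, len(grads), microbatch_size):
        g = np.mean(grads[i:i + microbatch_size], axis=0)  # micro-batch gradient
        norm = np.linalg.norm(g)
        if norm > clip_norm:
            g = g * (clip_norm / norm)                     # rescale to the clip bound
        clipped.append(g)
    return np.mean(clipped, axis=0)                        # average of clipped gradients

rng = np.random.default_rng(0)
grads = rng.standard_normal((32, 10)) * 5.0   # 32 per-example gradients, dim 10
g = microbatch_clip(grads, microbatch_size=8, clip_norm=1.0)
print(np.linalg.norm(g))
```

By the triangle inequality the averaged update's norm never exceeds the clip bound, which is what bounds the influence of any one micro-batch.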


Short-Term Electricity-Load Forecasting by Deep Learning: A Comprehensive Survey

http://arxiv.org/abs/2408.16202v1

Compressor summary: This paper reviews deep learning applications in short-term electricity load forecasting, discussing the forecasting process, challenges, and future directions.


Uni-3DAD: GAN-Inversion Aided Universal 3D Anomaly Detection on Model-free Products

http://arxiv.org/abs/2408.16201v1

Compressor summary: The paper proposes a novel 3D anomaly detection method for manufacturing systems that can identify all defect types, including geometric and missing regions, on both model-free products and existing products.


DLM-VMTL:A Double Layer Mapper for heterogeneous data video Multi-task prompt learning

http://arxiv.org/abs/2408.16195v1

Compressor summary: The paper proposes a VMTL method for video understanding tasks that uses a Double-Layer Mapper to extract shareable knowledge from multiple tasks and improve performance.


Variational Mode-Driven Graph Convolutional Network for Spatiotemporal Traffic Forecasting

http://arxiv.org/abs/2408.16191v1

Compressor summary: The paper presents a variational mode graph convolutional network (VMGCN) that uses variational mode decomposition (VMD) to decompose spatio-temporal traffic data into modes for improved prediction.
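Implementing VMD itself is involved; as a crude stand-in, the mode-decomposition idea can be sketched with an FFT band split whose components sum back to the original signal (the cutoff and signal are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(256)
x = np.sin(2 * np.pi * t / 64) + 0.3 * np.sin(2 * np.pi * t / 8)  # slow + fast component

X = np.fft.rfft(x)
cutoff = 10                       # illustrative band boundary (frequency bins)
low, high = X.copy(), X.copy()
low[cutoff:] = 0                  # keep only slow frequencies
high[:cutoff] = 0                 # keep only fast frequencies
mode_slow = np.fft.irfft(low, n=len(x))
mode_fast = np.fft.irfft(high, n=len(x))
recon = mode_slow + mode_fast     # the modes sum back to the signal
print(np.max(np.abs(recon - x)))
```

Each extracted mode would then be fed through the graph convolutional network separately before recombining predictions.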


Real-Time Energy Pricing in New Zealand: An Evolving Stream Analysis

http://arxiv.org/abs/2408.16187v1

Compressor summary: The paper presents new datasets for streaming regression on energy prices in New Zealand and analyzes various aspects of their use and potential research directions.