This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-30, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.16770v1
Compressor summary: CWGrasp is a novel method for generating realistic 3D whole-body grasps that considers object geometry, hand positioning, and scene compatibility, improving controllability and efficiency over existing methods.
http://arxiv.org/abs/2408.16767v1
Compressor summary: ReconX is a novel 3D scene reconstruction method that uses pre-trained video diffusion models to generate consistent, detailed videos from limited input views.
http://arxiv.org/abs/2408.16766v1
Compressor summary: The authors present a data construction pipeline for creating image triplets with content, style, and stylized images, and propose a new end-to-end style transfer model called CSGO that can control various aspects of image generation.
http://arxiv.org/abs/2408.16765v1
Compressor summary: The paper provides a theoretical explanation for why optimizing the ELBO works well for training diffusion generative models like DDPMs.
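For readers unfamiliar with the objective being justified, the standard ELBO decomposition used to train DDPMs is (textbook form, not reproduced from this paper):

```latex
\log p_\theta(x_0) \;\ge\;
\mathbb{E}_q\!\left[\log p_\theta(x_0 \mid x_1)\right]
- D_{\mathrm{KL}}\!\left(q(x_T \mid x_0)\,\|\,p(x_T)\right)
- \sum_{t=2}^{T} \mathbb{E}_q\!\left[
    D_{\mathrm{KL}}\!\left(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\right)
  \right]
```

Maximizing this lower bound reduces, after reparameterization, to the familiar noise-prediction training loss.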
http://arxiv.org/abs/2408.16762v1
Compressor summary: The paper proposes a new method to generate textures for 3D objects using point-clouds and heat diffusion, avoiding common UV-based texture issues.
http://arxiv.org/abs/2408.16760v1
Compressor summary: OmniRe is a novel framework for reconstructing high-fidelity dynamic urban scenes from on-device logs, accurately modeling various dynamic actors such as vehicles, pedestrians, and cyclists.
http://arxiv.org/abs/2408.16757v1
Compressor summary: The paper compares OOD detection and OSR methods, provides a new benchmark setting, and finds that score rules sensitive to deep feature magnitude perform well at scale.
http://arxiv.org/abs/2408.16756v1
Compressor summary: The text discusses the underrepresentation of Cantonese in natural language processing research and proposes new benchmarks and models to improve language model performance for Cantonese.
http://arxiv.org/abs/2408.16753v1
Compressor summary: The paper proposes a reinforcement learning framework for last-mile fine-tuning of language models, which improves performance in abstractive summarization and can handle more complex undesirable outputs.
http://arxiv.org/abs/2408.16751v1
Compressor summary: The paper compares different methods to improve language models by penalizing bad examples and shows that a combination of ExMATE and DPO outperforms MLE in terms of both statistics and generation.
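As background on the DPO baseline mentioned above, here is a minimal sketch of the published DPO loss for a single preference pair (this illustrates standard DPO, not the paper's combined method; log-probabilities are sums over completion tokens):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """-log(sigmoid(beta * implicit-reward margin)) for one preference pair."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Preferring the chosen completion more than the reference model does
# (positive margin) lowers the loss; the reverse raises it.
low = dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=1.0)   # margin = +2
high = dpo_loss(-12.0, -10.0, -11.0, -11.0, beta=1.0)  # margin = -2
```

The loss thus penalizes the "bad" (rejected) example relative to the reference policy rather than via token-level MLE.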
http://arxiv.org/abs/2408.16749v1
Compressor summary: The study compares BERT and GPT models in detecting and classifying online domestic extremism using different prompts and finds that GPT models outperform BERT models, with more detailed prompts generally yielding better results.
http://arxiv.org/abs/2408.16740v1
Compressor summary: The paper discusses challenges in studying large language models, proposing a non-anthropomorphic approach to understand their texts' characteristics and explore their role in studying human culture.
http://arxiv.org/abs/2408.16737v1
Compressor summary: This paper compares generating synthetic data from strong or weak language models and finds that using weak models is more efficient for improving reasoning performance of large language models.
http://arxiv.org/abs/2408.16730v1
Compressor summary: VideoLLM-MoD is a novel method that reduces vision compute by skipping redundant vision tokens in transformer layers, achieving significant efficiency improvements while preserving or improving performance on various video tasks.
http://arxiv.org/abs/2408.16729v1
Compressor summary: The paper introduces Pred-DETR, a new framework that mitigates attention collapse in the cross-attention of DETR-based temporal action detection methods by using predictions to align cross- and self-attention.
http://arxiv.org/abs/2408.16725v1
Compressor summary: The paper presents Mini-Omni, an end-to-end conversational model that can generate speech in real-time and interact with humans using the audio modality.
http://arxiv.org/abs/2408.16719v1
Compressor summary: The H-SGANet is a lightweight hybrid model that uses sparse graph attention to improve brain MRI volume registration accuracy and efficiency.
http://arxiv.org/abs/2408.16717v1
Compressor summary: GREAT is a novel edge-based neural model that can handle dense routing problems, sparsify TSP graphs, and achieve state-of-the-art results in Euclidean and non-Euclidean asymmetric TSP.
http://arxiv.org/abs/2408.16707v1
Compressor summary: The text introduces a new method that combines three techniques to improve the accuracy of stock index price forecasting using data from 2000 to 2024.
http://arxiv.org/abs/2408.16704v1
Compressor summary: The paper presents a method to generate editable videos with complex interactions between multiple objects in different artistic styles using a text-video pair and a depth-aware Text-to-Image model.
http://arxiv.org/abs/2408.16700v1
Compressor summary: The paper proposes a framework to identify and quantify biases in Text-to-Image generative models using captions, images, and Vision Question Answering, with two variations that detect different aspects of biases.
http://arxiv.org/abs/2408.16698v1
Compressor summary: SympGNNs are novel neural network models that can learn high-dimensional Hamiltonian systems by combining symplectic maps with permutation equivariance, achieving accurate system identification and node classification.
http://arxiv.org/abs/2408.16686v1
Compressor summary: The paper introduces a new neural network framework for learning on CW-complex structured data, which are useful for problems in chemistry.
http://arxiv.org/abs/2408.16684v1
Compressor summary: PartFormer improves object re-identification using a novel adaptation of ViT that enhances diverse representation and attention head diversity for robust feature learning.
http://arxiv.org/abs/2408.16673v1
Compressor summary: GEM is a distribution matching method that improves large language models' performance on various tasks by reducing overfitting and increasing output diversity using the maximum entropy principle.
http://arxiv.org/abs/2408.16667v1
Compressor summary: IGA is a new annotation-free algorithm that improves LLMs' ability to align their responses with rules and generalize better in diverse scenarios.
http://arxiv.org/abs/2408.16662v1
Compressor summary: Space3D-Bench is a diverse 3D Q&A dataset for evaluating and improving spatial reasoning in language and vision models.
http://arxiv.org/abs/2408.16653v1
Compressor summary: This paper improves lower bounds and provides a matching parallel Boosting algorithm for weak-to-strong learners, closing the gap between theory and practice in the tradeoff between training rounds and parallel work per round.
http://arxiv.org/abs/2408.16623v1
Compressor summary: The paper compares classic and deep learning methods for estimating refractive-index structure constant $C_n^2$ from video images and presents a new physics-based network architecture that combines learned convolutional layers with an image gradient method for improved accuracy and generalization.
http://arxiv.org/abs/2408.16621v1
Compressor summary: KiD3 is a new method for detecting distracted drivers using scene graphs, driver pose information, and video frames to improve road safety.
http://arxiv.org/abs/2408.16620v1
Compressor summary: The authors propose a fast and powerful model that combines hyperdimensional vector computing and Tsetlin machines for learning and generating sequential data, and apply it to forecasting and classification tasks.
http://arxiv.org/abs/2408.16613v1
Compressor summary: NC-VQVAE is a novel framework that combines self-supervised learning with vector quantization to generate more realistic time series by capturing both low- and high-level semantics.
http://arxiv.org/abs/2408.16612v1
Compressor summary: Transfer learning helps improve anomaly detection accuracy, data reconstruction, and model robustness for sensor data from complex systems like particle detectors at CERN.
http://arxiv.org/abs/2408.16599v1
Compressor summary: The study introduces a new neural network model that predicts joint torques using surface electromyography data for exoskeleton and rehabilitation applications.
http://arxiv.org/abs/2408.16592v1
Compressor summary: The paper introduces A2PSGD, a novel algorithm for low-rank representation of high-dimensional sparse data that is faster and more accurate than existing methods thanks to parallelism, load balancing, and acceleration techniques, and that can infer node interactions from real-world network data.
http://arxiv.org/abs/2408.16586v1
Compressor summary: The paper presents a large language model-based Werewolf Game AI that uses situation analysis and persuasion strategies to test LLMs' capabilities in complex interactive environments.
http://arxiv.org/abs/2408.16582v1
Compressor summary: The paper presents a lightweight two-stream architecture for real-time image manipulation detection, in which a cognitive branch of wavelet-guided Transformer blocks captures global frequency traces and an interacting inspective branch of simple convolutions captures fine-grained traces, achieving competitive performance.
http://arxiv.org/abs/2408.16577v1
Compressor summary: The paper proposes a method for learning causal features in multimodal data that enhances deep learning models' performance by separating invariant and specific components and ensuring PNS identifiability.
http://arxiv.org/abs/2408.16573v1
Compressor summary: The paper introduces a new model (ATT) for analyzing high-dimensional sparse data from communication networks, which improves performance over existing methods.
http://arxiv.org/abs/2408.16570v1
Compressor summary: The text discusses how to optimally arrange a linguistic head and its dependents for maximum predictability, considering statistical independence and harmonic order.
http://arxiv.org/abs/2408.16563v1
Compressor summary: The text presents a face recognition framework in which four teachers, each trained on one specific ethnicity, are projected into a common space and distilled into a student network, improving performance and reducing bias compared to training on balanced datasets.
http://arxiv.org/abs/2408.16544v1
Compressor summary: Spurfies is a new sparse-view reconstruction method that uses local geometry priors trained on synthetic data and neural point representation to improve surface quality and novel view synthesis.
http://arxiv.org/abs/2408.16542v1
Compressor summary: SALSA is a method that improves ASR for low-resource languages by coupling decoder layers of both ASR and LLM while handling tokenizer mismatch.
http://arxiv.org/abs/2408.16537v1
Compressor summary: The paper proposes an efficient defense method called SFR-GNN that uses contrastive learning and mutual information theory to improve the robustness of graph neural networks against adversarial structural attacks.
http://arxiv.org/abs/2408.16535v1
Compressor summary: TinyTNAS is a hardware-aware NAS tool that efficiently optimizes neural network architectures for TinyML time series classification on CPUs, reducing resource usage and latency while maintaining high accuracy.
http://arxiv.org/abs/2408.16527v1
Compressor summary: The paper proposes using a Bayesian hierarchical model to infer foundation stiffness distribution in offshore wind farms, improving structural health monitoring by detecting anomalies like scour.
http://arxiv.org/abs/2408.16518v1
Compressor summary: The paper introduces CNIMA, a Chinese dialogue dataset with interactivity annotations, evaluates an existing framework for English on Chinese data, and proposes an automated evaluation system for second language assessment.
http://arxiv.org/abs/2408.16517v1
Compressor summary: AutoVCL is a continual learning model that adapts its hyperparameters to the task difficulty and similarity for better performance than standard GVCL.
http://arxiv.org/abs/2408.16506v1
Compressor summary: Our approach generates high-quality, consistent, and realistic video animations from static images using a training-free framework that aligns skeletal, motion, and pixel information.
http://arxiv.org/abs/2408.16503v1
Compressor summary: The study proposes a new model that uses local attention and multiscale features to accurately count densely distributed pests in images captured by digital traps.
http://arxiv.org/abs/2408.16502v1
Compressor summary: The text compares the performance and cost-effectiveness of large language models for data augmentation tasks with other established methods.
http://arxiv.org/abs/2408.16501v1
Compressor summary: This research develops a system that automatically selects and fuses detection results from teams of UAVs to accurately locate objects in search and rescue missions.
http://arxiv.org/abs/2408.16500v1
Compressor summary: The authors introduce the CogVLM2 family, a new generation of visual language models for image and video understanding, which achieves state-of-the-art results on various benchmarks and is open-sourced.
http://arxiv.org/abs/2408.16495v1
Compressor summary: The research optimizes the Transformer AI model for time-series forecasting on resource-limited sensor devices using FPGAs and Quantization-aware Training.
http://arxiv.org/abs/2408.16493v1
Compressor summary: ANGEL is a framework that improves biomedical entity linking by training generative models with negative samples, leading to better accuracy on five benchmarks.
http://arxiv.org/abs/2408.16486v1
Compressor summary: The paper proposes a test-time prompt tuning approach for vision-language models that uses maximum concept matching scores as dynamic weights to improve generalization on open-set problems.
http://arxiv.org/abs/2408.16482v1
Compressor summary: The paper proposes a simple and cheap method using in-context learning and human survey data to align large language models better with cultural values, and shows it works across different languages and models.
http://arxiv.org/abs/2408.16463v1
Compressor summary: The study uses artificial intelligence and deep learning to improve the accuracy of suicide risk prediction in psychological support hotlines, outperforming traditional methods.
http://arxiv.org/abs/2408.16457v1
Compressor summary: HYGENE, the first deep learning model for hypergraph generation, produces realistic and diverse hypergraphs by progressively expanding a bipartite representation of nodes and hyperedges through a denoising diffusion process, closely mimicking the properties of real hypergraphs.
http://arxiv.org/abs/2408.16451v1
Compressor summary: The authors propose a novel automated method using a Vision Transformer and multiple instance learning to accurately detect tooth-marked tongues in Traditional Chinese Medicine, improving objectivity and clinical value.
http://arxiv.org/abs/2408.16450v1
Compressor summary: The paper introduces HairFusion, a one-stage diffusion model that transfers hairstyles to face images while preserving other appearance features, using hair-agnostic representations and adaptive hair blending.
http://arxiv.org/abs/2408.16448v1
Compressor summary: The proposed audio-visual learning framework combats false negatives in sound source localization using self-supervised predictive learning (SSPL) and semantic-aware contrastive learning (SACL), achieving superior performance and versatility in various tasks.
http://arxiv.org/abs/2408.16446v1
Compressor summary: The study compares different methods to classify medieval charters based on their text, finding that normalization may harm dating accuracy and that certain models perform better than others.
http://arxiv.org/abs/2408.16444v1
Compressor summary: The paper presents SurveySum, a new dataset for summarizing scientific articles into survey sections, and evaluates two pipelines for this task, showing that high-quality retrieval is crucial.
http://arxiv.org/abs/2408.16442v1
Compressor summary: The study shows that feature fusion improves activity recognition accuracy using deep learning models, with PO-GCN performing best among four datasets.
http://arxiv.org/abs/2408.16440v1
Compressor summary: Instruction-tuned large language models perform better than baseline ones in translating medical terminology.
http://arxiv.org/abs/2408.16429v1
Compressor summary: CAVI-CMN is a fast, gradient-free Bayesian method that uses conditional mixture networks to solve complex classification tasks with competitive accuracy and efficiency.
http://arxiv.org/abs/2408.16426v1
Compressor summary: COIN is a novel method that uses control-inpainting score distillation and a human-scene relation loss to disentangle human and camera motions, achieving state-of-the-art results in global human and camera motion estimation.
http://arxiv.org/abs/2408.16425v1
Compressor summary: The paper evaluates three hyperparameter tuning algorithms and shows that nonlinear models with well-tuned hyperparameters perform better than linear ones, but the best algorithm depends on the task and model type.
http://arxiv.org/abs/2408.16414v1
Compressor summary: The paper introduces a spectral-based neural network that replaces automatic differentiation with multiplication when solving PDEs with PINNs, reducing memory and training time while improving accuracy thanks to the exponential convergence of the spectral basis, along with two strategies for training networks with spectral information.
http://arxiv.org/abs/2408.16403v1
Compressor summary: DeepSPoC combines sequential propagation of chaos with deep learning to solve nonlinear Fokker-Planck equations and uses various neural network architectures for high-dimensional problems.
http://arxiv.org/abs/2408.16395v1
Compressor summary: The paper proposes a novel occlusion strategy called Inpainting-Based Occlusion (IBO) that uses a Denoising Diffusion Probabilistic Model to generate realistic, non-cancerous tissue in histopathological images, improving interpretability and trustworthiness of Explainable Artificial Intelligence techniques for cancer diagnosis.
http://arxiv.org/abs/2408.16393v1
Compressor summary: The paper explores the trade-off between diversity and quality in optimization problems and proposes a new approach based on subset selection and random sampling.
http://arxiv.org/abs/2408.16391v1
Compressor summary: TempoKGAT is a graph attention network that uses time-decaying weights, selective neighbor aggregation, and top-k neighbor selection to handle dynamic, temporal data and achieve superior performance on spatio-temporal datasets.
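The time-decaying, top-k neighbor weighting idea can be sketched generically as follows (the function name and the exponential-decay form are illustrative assumptions, not TempoKGAT's exact formulation):

```python
import numpy as np

def time_decay_topk_weights(edge_ages, decay=0.5, k=3):
    """Exponentially down-weight older edges, keep only the k strongest
    (i.e. most recent) neighbors, and normalize into attention-style weights."""
    w = np.exp(-decay * np.asarray(edge_ages, dtype=float))
    keep = np.argsort(w)[::-1][:k]          # indices of the top-k weights
    mask = np.zeros_like(w)
    mask[keep] = w[keep]
    return mask / mask.sum()

# Only the two most recent neighbors (ages 0 and 1) receive nonzero weight.
weights = time_decay_topk_weights([0.0, 1.0, 2.0, 3.0], decay=1.0, k=2)
```

Restricting aggregation to a few recent, strongly weighted neighbors is what lets such models track dynamic graphs without being swamped by stale edges.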
http://arxiv.org/abs/2408.16390v1
Compressor summary: This study proposes a new evaluation metric (MQM-Chat) for machine translation models that handle chat translations, revealing different errors and highlighting the significance of preserving style and consistency.
http://arxiv.org/abs/2408.16389v1
Compressor summary: The note clarifies the Kolmogorov-Arnold representation theorem (KART) and the universal approximation theorem (UAT), correcting common errors in the neural network literature to improve comprehension.
http://arxiv.org/abs/2408.16380v1
Compressor summary: The paper presents an approach that uses temporal and multimodal signals to detect F-formations in videos and predict the next speaker in a conversation using a recurrent neural network (LSTM).
http://arxiv.org/abs/2408.16379v1
Compressor summary: TG-PhyNN is a new framework that combines Graph Neural Networks with physical constraints to improve forecasting of spatio-temporal data in domains like traffic and disease spread.
http://arxiv.org/abs/2408.16357v1
Compressor summary: The "Law of Vision Representation" shows how cross-modal alignment and correspondence affect multimodal language models' performance and helps optimize their vision representation with less computation.
http://arxiv.org/abs/2408.16345v1
Compressor summary: The study examines if nucleus sampling reduces text memorization in large language models and finds that it has limited effect, and that soft memorization can still occur.
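Nucleus (top-p) sampling itself is a standard decoding method; for reference, a self-contained generic sketch (not the study's code):

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Top-p sampling: sample from the smallest set of tokens whose
    cumulative probability reaches p, after renormalizing."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]        # tokens by descending probability
    cdf = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cdf, p)) + 1
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=renorm))

probs = np.array([0.5, 0.3, 0.1, 0.05, 0.05])
rng = np.random.default_rng(0)
# With p=0.75 the nucleus is tokens {0, 1}: the tail is truncated, but a
# high-probability (e.g. memorized) continuation like token 0 is untouched.
samples = {nucleus_sample(probs, p=0.75, rng=rng) for _ in range(200)}
```

The example makes the study's point concrete: truncating the tail removes rare continuations, but verbatim memorized text typically sits in the high-probability head, which nucleus sampling leaves intact.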
http://arxiv.org/abs/2408.16343v1
Compressor summary: The study introduces a multimodal classification model that uses clinical, cognitive, neuroimaging, and EEG data to enhance the accuracy of diagnosing Alzheimer's disease and differentiate it from other conditions.
http://arxiv.org/abs/2408.16337v1
Compressor summary: LEGraphs and LESets are novel graph neural network models for high-entropy alloys that use local environment graphs to capture their complex structure and predict their mechanical properties.
http://arxiv.org/abs/2408.16336v1
Compressor summary: GL-TSVM is a robust and smooth classifier that uses a novel loss function to address TSVM's sensitivity to noise and has lower computational complexity than SVM.
http://arxiv.org/abs/2408.16333v1
Compressor summary: SIMS is a new training concept for diffusion models that uses self-synthesized data to improve generative AI without compromising quality or diversity, and can adjust the synthetic data distribution to match in-domain target distributions.
http://arxiv.org/abs/2408.16331v1
Compressor summary: Guided Reasoning is a method where one agent helps other agents improve their reasoning quality, and Logikon is an example of this concept.
http://arxiv.org/abs/2408.16326v1
Compressor summary: Critic-CoT is a novel framework that enhances LLMs' critic capability via step-wise chain-of-thought reasoning and distant-supervision data, going beyond simple self-critic approaches to improve both task-solving performance and generation quality.
http://arxiv.org/abs/2408.16325v1
Compressor summary: The paper proposes a novel method for point cloud denoising that learns an optimal transport plan between paired point clouds and improves over existing methods with or without additional features.
http://arxiv.org/abs/2408.16321v1
Compressor summary: The paper proposes an algorithm for updating a decision tree with minimal human auditing, using a greedy approach and a customised objective function.
http://arxiv.org/abs/2408.16314v1
Compressor summary: The ResVG model improves visual grounding by enhancing semantics and handling spatial relations in images with multiple distractions using text-to-image generation and relation-sensitive data augmentation.
http://arxiv.org/abs/2408.16313v1
Compressor summary: This paper introduces FA-YOLO, a novel object detection model that improves over YOLOv9 by enhancing feature selection, fusion, and detection accuracy for small, medium, and large targets in complex environments.
http://arxiv.org/abs/2408.16310v1
Compressor summary: SlotSAM enhances foundation models' object-level perception and generalization by reconstructing features and integrating them into the model using self-supervised learning.
http://arxiv.org/abs/2408.16305v1
Compressor summary: The paper proposes a semantics-oriented multitask learning approach for DeepFake detection using joint embedding of face images and textual descriptions, improving generalizability and interpretability.
http://arxiv.org/abs/2408.16296v1
Compressor summary: The paper proposes a new method for image retrieval using multi-modal large language models and data augmentation techniques, which outperforms conventional methods on several datasets.
http://arxiv.org/abs/2408.16293v1
Compressor summary: The paper explores using error-correction data in pretraining language models to improve reasoning accuracy without multi-round prompting.
http://arxiv.org/abs/2408.16289v1
Compressor summary: The paper introduces a compression method for deep neural networks that combines VBMF with orthogonal regularization to estimate the rank of the weight tensor at each layer while preserving accuracy, and that generalizes to other convolutional neural networks and tensor decomposition methods.
http://arxiv.org/abs/2408.16287v1
Compressor summary: The paper evaluates the accuracy and reliability of automatic speech recognition (ASR) captioning for d/Deaf and hard of hearing people, finding wide variation between services and lower quality for streaming ASR.
http://arxiv.org/abs/2408.16286v1
Compressor summary: The paper introduces the first algorithm to find near-optimal policies for robust constrained MDPs, addressing a limitation of existing methods by using the epigraph form and binary search.
http://arxiv.org/abs/2408.16285v1
Compressor summary: Art is a Python library that helps impose rules and standards for developing deep learning models, improving interpretability and robustness.
http://arxiv.org/abs/2408.16284v1
Compressor summary: The paper presents an adaptive ensemble learning framework that combines multiple base models, meta-feature generation, and data preprocessing to predict customer churn in the telecom industry, achieving 99.28% accuracy on three datasets and outperforming existing methods.
http://arxiv.org/abs/2408.16278v1
Compressor summary: The paper introduces an improved tensor network model (ECTN) for predicting web service Quality of Service (QoS), which considers user and service correlation in low-dimensional space and achieves better accuracy than existing models.
http://arxiv.org/abs/2408.16276v1
Compressor summary: The authors propose using large language models with layered prompting systems to improve AI-driven psychological consultation services by enhancing their emotional intelligence and contextual understanding.
http://arxiv.org/abs/2408.16273v1
Compressor summary: The text proposes a two-branch model that uses synthetic data to address the challenge of long-tailed image recognition and improve accuracy.
http://arxiv.org/abs/2408.16272v1
Compressor summary: SRAM is a novel network module for Video Temporal Grounding that uses cross-modal alignment and Deep Evidential Regression to estimate uncertainties and handle open-world challenges, such as noisy and out-of-distribution data.
http://arxiv.org/abs/2408.16268v1
Compressor summary: UDD is a novel approach that identifies and exploits underutilized regions in synthetic datasets to improve model performance using response-based and data jittering-based policies, as well as a category-wise feature contrastive loss.
http://arxiv.org/abs/2408.16266v1
Compressor summary: The paper introduces Diff-II, a novel diffusion-based data augmentation method that balances faithfulness and diversity for improving image classification performance on various tasks.
http://arxiv.org/abs/2408.16265v1
Compressor summary: LSCD-TTA is a novel test time adaptation method for cross-domain remote sensing image classification that considers distribution characteristics, achieving fast and accurate adaptations without much prior data or manual annotation.
http://arxiv.org/abs/2408.16264v1
Compressor summary: The paper proposes LoraMap, a method to connect multiple Low-Rank Adaptation models for fact-checking tasks, which improves performance over existing methods with fewer parameters.
http://arxiv.org/abs/2408.16262v1
Compressor summary: The paper studies Q-learning algorithms for large state space problems under average-reward criterion, extending previous analysis from unichain to weakly communicating MDPs and characterizing their convergence sets.
http://arxiv.org/abs/2408.16261v1
Compressor summary: The study proposes a metric, the K-spectral metric, to estimate the performance of deep SSMs on time-series datasets early in the training process, reducing the cost of data collection for new tasks.
http://arxiv.org/abs/2408.16256v1
Compressor summary: Noting that current breast cancer prognostic metrics are unreliable and often lead to unnecessary adjuvant therapies, the authors propose using machine learning algorithms on existing routine data to develop better prognostics, potentially sparing some women unnecessary treatment.
http://arxiv.org/abs/2408.16249v1
Compressor summary: The paper proposes a new method to train probabilistic models from unnormalized densities using energy functions, which performs better than existing methods and can handle complex high-dimensional systems.
http://arxiv.org/abs/2408.16247v1
Compressor summary: The paper proposes a new method for object detection that uses multiple incomplete datasets and improves performance on COCO and VOC benchmarks.
http://arxiv.org/abs/2408.16245v1
Compressor summary: The text describes the development and success of multi-omic nucleotide-peptide foundation models for predicting peptide-nucleotide interactions and identifying involved residues.
http://arxiv.org/abs/2408.16241v1
Compressor summary: The thesis proposes new methods for enhancing transformer language models, improving their capabilities for tasks like sequence-to-sequence generation and few-shot classification, and discusses the trade-off between model likelihood and output quality.
http://arxiv.org/abs/2408.16236v1
Compressor summary: Neural Spectrum Decomposition (NSD) is a dataset distillation framework that treats the entire dataset as a low-rank observation, learning spectrum tensors and transformation matrices whose multiplication reconstructs the data distribution, and uses trajectory matching with real-distribution guidance to achieve state-of-the-art results on various benchmarks.
http://arxiv.org/abs/2408.16235v1
Compressor summary: The paper proposes a semi-supervised method, LMT-GP, for low-light image enhancement that leverages both labeled and unlabeled data using latent mean-teacher and Gaussian process techniques to improve generalization ability and visual quality.
http://arxiv.org/abs/2408.16233v1
Compressor summary: PSE-Net is a novel parallel algorithm for efficient channel pruning of deep neural networks that simulates multiple subnets' training in one round and uses prior information to improve evolutionary search efficiency.
http://arxiv.org/abs/2408.16232v1
Compressor summary: The paper presents a method that combines diffusion models with latent space manipulation and gradient-based selective attention to generate realistic images based on textual descriptions while preserving the reference image features.
http://arxiv.org/abs/2408.16227v1
Compressor summary: PGFuse is a framework that uses distortion-aware Gabor filters to estimate depth from monocular 360° images, improving feature extraction and reducing distortions.
http://arxiv.org/abs/2408.16224v1
Compressor summary: The paper proposes an SGE module for VLMs that extracts and expresses complex semantic information in images, improving their perception and performance in vision-language tasks.
http://arxiv.org/abs/2408.16219v1
Compressor summary: The paper proposes a Training-Free Video Temporal Grounding approach that uses pre-trained large models to select the most relevant video segments for a given natural language query, addressing issues of temporal boundaries and dynamic transitions between events in videos.
http://arxiv.org/abs/2408.16218v1
Compressor summary: The proposed method uses a neural network to learn causal relationships between variables and a target variable, identifying both direct and indirect causes in large-scale systems.
http://arxiv.org/abs/2408.16213v1
Compressor summary: M4CXR is a multi-modal LLM that enhances CXR interpretation by performing tasks such as medical report generation, visual grounding, and visual question answering with state-of-the-art clinical accuracy.
http://arxiv.org/abs/2408.16209v1
Compressor summary: The study analyzes how words representing the same concepts changed over time using historical data and word embeddings, revealing language-society connections and discussing methodological and ethical challenges.
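Diachronic studies like this one commonly quantify semantic drift by comparing a word's embeddings trained on different time periods. A toy sketch with hypothetical vectors (real pipelines first align the two embedding spaces, e.g., via orthogonal Procrustes, before comparing):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 3-d embeddings of the same word from two corpora/eras.
w_1900 = np.array([0.9, 0.1, 0.0])
w_2000 = np.array([0.1, 0.9, 0.2])

# Low cross-era similarity (high drift score) suggests the word's
# dominant usage has shifted between the two periods.
drift = 1.0 - cosine(w_1900, w_2000)
print(round(drift, 3))
```

In practice embeddings are hundreds of dimensions and drift scores are ranked across the whole vocabulary to surface the fastest-changing words.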
http://arxiv.org/abs/2408.16208v1
Compressor summary: ReXamine-Global is a framework that tests the generalization of radiology report quality metrics using a large language model and 240 reports from six hospitals, revealing gaps in existing metrics' robustness.
http://arxiv.org/abs/2408.16204v1
Compressor summary: Micro-batch clipping improves ASR model performance by reducing convergence time but introduces a constant bias dependent on factors like sample quality and domain diversity.
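Micro-batch clipping itself is a standard idea: bound each micro-batch gradient's norm before averaging. A minimal sketch (not the paper's exact recipe; the clip threshold and toy gradients are invented) that also hints at where the bias comes from:

```python
import numpy as np

def clip_micro_batch_grads(grads, clip_norm):
    """Clip each micro-batch gradient to a max L2 norm, then average.

    Generic micro-batch clipping sketch: outlier gradients are scaled
    down before aggregation, which stabilizes updates but biases the
    average toward the well-behaved micro-batches.
    """
    clipped = []
    for g in grads:
        norm = np.linalg.norm(g)
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append(g * scale)
    return np.mean(clipped, axis=0)

# One outlier micro-batch no longer dominates the averaged update.
grads = [np.array([1.0, 0.0]), np.array([100.0, 0.0])]
avg = clip_micro_batch_grads(grads, clip_norm=1.0)
print(avg)  # both gradients clipped to norm <= 1 before averaging
```

The bias the paper analyzes arises because clipping shrinks large gradients asymmetrically, so the clipped mean is no longer an unbiased estimate of the true mean gradient.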
http://arxiv.org/abs/2408.16202v1
Compressor summary: This paper reviews deep learning applications in short-term electricity load forecasting, discussing the forecasting process, challenges, and future directions.
http://arxiv.org/abs/2408.16201v1
Compressor summary: The paper proposes a novel 3D anomaly detection method for manufacturing systems that can identify all defect types, including geometric and missing regions, on both model-free products and existing products.
http://arxiv.org/abs/2408.16195v1
Compressor summary: The paper proposes a VMTL method for video understanding tasks, which uses a Double-Layers Mapper to extract shareable knowledge from multiple tasks and improve performance.
http://arxiv.org/abs/2408.16191v1
Compressor summary: The paper presents a variational mode graph convolutional network (VMGCN) that uses variational mode decomposition (VMD) to decompose spatio-temporal traffic data into modes for improved prediction.
http://arxiv.org/abs/2408.16187v1
Compressor summary: The paper presents new datasets for streaming regression on energy prices in New Zealand and analyzes various aspects of their use and potential research directions.