A Robust Nonparametric Framework for Repeated Spatial Clustering: Insights from Spatial Omics


abstract

  • In spatial clustering, it is often necessary to form contiguous clusters, because observations in proximity tend to be dependent. Another important aspect is identifying repeated spatial clusters, which are not necessarily in proximity. Both aspects are particularly relevant in spatial omics, where spatial clustering uses multivariate genomic features and spatial location information to find tumor microenvironments. These microenvironments are often contiguous within a region, but can also appear as spatially separated patches, which represent repeated spatial clusters. Various techniques can be used for spatial clustering, such as constrained hierarchical clustering, constrained partitioning, and density-based clustering, all of which ensure a certain degree of spatial contiguity in the clusters. However, they may struggle to detect repeated spatial clusters consistently across different degrees of spatial dependence. In particular, constrained clustering often overestimates the number of clusters relative to the ground truth when repeated spatial clusters are present. To address this limitation, we propose a post-clustering framework that can be combined with constrained clustering to identify repeated spatial clusters. The framework uses a nonparametric test to evaluate the clusters from the initial clustering and identify any repeated clusters, allowing the clusters to be re-partitioned. Specifically, in this thesis we use constrained agglomerative hierarchical clustering for the initial clustering. For the post-clustering framework, we use a nonparametric test based on Maximum Mean Discrepancy (MMD) and block permutation to assess whether the distributions of multivariate attributes within the clusters are similar, and re-partition the clusters where necessary.
To evaluate the proposed method, we conducted a simulation study under varying conditions, including different levels of spatial dependence, cluster shapes, numbers of multivariate attributes, and numbers of spatial locations. We also applied the framework to identify repeated tumor microenvironments in triple-negative breast cancer patients using spatial proteomics data. Both the simulation study and the application to the triple-negative breast cancer data demonstrated the effectiveness of the proposed framework in identifying repeated spatial clusters. These findings highlight the value of our framework as a powerful tool for analyzing spatial data.
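
The MMD-based block permutation test described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the Gaussian kernel, the fixed bandwidth, the biased (V-statistic) MMD estimate, and the way blocks are formed and shuffled are all assumptions made for illustration, and the names `gaussian_kernel`, `mmd2`, and `block_permutation_test` are hypothetical. Each cluster is assumed to be supplied as a list of blocks, where a block is a list of attribute vectors from neighbouring spatial locations. Whole blocks are shuffled between the two clusters so that within-block spatial dependence is preserved under the null hypothesis.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two attribute vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def flatten(blocks):
    """Concatenate a list of blocks into one list of attribute vectors."""
    return [vec for block in blocks for vec in block]


def block_permutation_test(blocks_a, blocks_b, n_perm=200, bandwidth=1.0, seed=0):
    """Permutation test of 'same distribution' that shuffles whole blocks.

    blocks_a, blocks_b: lists of blocks for the two clusters being compared.
    Returns the observed squared MMD and a permutation p-value.
    """
    rng = random.Random(seed)
    observed = mmd2(flatten(blocks_a), flatten(blocks_b), bandwidth)

    pooled = list(blocks_a) + list(blocks_b)
    n_a = len(blocks_a)
    exceed = 0
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # reassign entire blocks, never single points
        stat = mmd2(flatten(shuffled[:n_a]), flatten(shuffled[n_a:]), bandwidth)
        if stat >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Under these assumptions, a large p-value for a pair of clusters would be read as no evidence that their attribute distributions differ, making the pair a candidate for merging as a repeated spatial cluster; a small p-value would keep the clusters separate. The significance threshold and any multiple-testing adjustment across cluster pairs are left open here.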

publication date

  • 2024