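This vignette introduces the CoFAST workflow for the analysis of an NSCLC CosMx spatial transcriptomics dataset. In this vignette, the workflow of CoFAST consists of three steps: data preprocessing, coembedding dimension reduction, and downstream analysis (signature gene identification and visualization of cells and genes in the coembedding space).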
We demonstrate the use of CoFAST on the NSCLC data. The package, together with the subset of the NSCLC CosMx data that ships with it, can be loaded with the following commands:
set.seed(2024) # set a random seed for reproducibility.
library(ProFAST) # load the ProFAST package, which implements the FAST method
library(Seurat) # Idents() and DimPlot() used below come from Seurat
data(CosMx_subset) # load the CosMx subset (a Seurat object)
CosMx_subset
First, we normalize the data.
Then, we select the variable genes.
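The code for these two preprocessing steps is not shown here. As a hedged illustration only, the base-R sketch below mimics the common log-normalization scheme (each cell's counts divided by its library size, scaled to 10,000, then transformed with log1p, as in Seurat's default LogNormalize method) and then keeps the most variable genes ranked by plain variance. The helpers `log_normalize` and `top_variable_genes` are hypothetical, and ranking by raw variance is a simplification rather than the selection rule of the original workflow.

```r
# Hypothetical helper: log-normalize a genes x cells count matrix.
# Each cell's counts are divided by its library size, multiplied by
# `scale_factor`, and transformed with log1p.
log_normalize <- function(counts, scale_factor = 1e4) {
  lib_size <- colSums(counts)
  log1p(sweep(counts, 2, lib_size, "/") * scale_factor)
}

# Hypothetical helper: names of the `n_top` genes with the largest variance
# across cells (a simplified stand-in for variable-gene selection).
top_variable_genes <- function(norm_mat, n_top = 2000) {
  gene_var <- apply(norm_mat, 1, var)
  n_keep <- min(n_top, length(gene_var))
  names(sort(gene_var, decreasing = TRUE))[seq_len(n_keep)]
}

# Toy example: 100 genes x 50 cells of Poisson counts
set.seed(2024)
counts <- matrix(rpois(100 * 50, lambda = 3), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("cell", 1:50)))
norm_mat <- log_normalize(counts)
hvg <- top_variable_genes(norm_mat, n_top = 10)
print(hvg)
```

Undoing log1p with expm1 recovers counts that sum to the scale factor in every cell, which is a quick sanity check on the normalization.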
Next, we show how to use FAST to perform coembedding for this CosMx data. First, we determine the dimension of the coembeddings with diagnostic.cor.eigs(); the estimated dimension is stored in the q_est attribute of its output.
dat_cor <- diagnostic.cor.eigs(CosMx_subset)
q_est <- attr(dat_cor, "q_est")
cat("q_est = ", q_est, '\n')
Subsequently, we calculate the coembeddings with FAST and observe that the reductions slot of the Seurat object gains an additional component named fast.
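The internal criterion of diagnostic.cor.eigs() is not described in this vignette. As a loosely related, hypothetical illustration of choosing a dimension from eigenvalues, the sketch below applies a generic eigenvalue-ratio heuristic: compute the eigenvalues of the gene-gene correlation matrix and pick the q that maximizes lambda_q / lambda_(q+1). The helper `eigen_ratio_q` and its `q_max` bound are assumptions, and ProFAST may use a different rule.

```r
# Hypothetical helper: estimate the number of latent factors q as the
# position of the largest ratio between consecutive eigenvalues of the
# correlation matrix of X (cells x genes).
eigen_ratio_q <- function(X, q_max = 15) {
  eigs <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
  q_max <- min(q_max, length(eigs) - 1)
  ratios <- eigs[1:q_max] / eigs[2:(q_max + 1)]
  which.max(ratios)
}

# Toy data generated from exactly three latent factors plus small noise
set.seed(2024)
n_cells <- 200; n_genes <- 50; q_true <- 3
factors  <- matrix(rnorm(n_cells * q_true), n_cells, q_true)
loadings <- matrix(rnorm(n_genes * q_true, sd = 3), n_genes, q_true)
X <- factors %*% t(loadings) +
  matrix(rnorm(n_cells * n_genes, sd = 0.5), n_cells, n_genes)

q_hat <- eigen_ratio_q(X)
cat("q_hat = ", q_hat, "\n")
```

Because the signal eigenvalues are far larger than the noise eigenvalues, the sharpest drop occurs right after the last true factor.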
In the following, we show how to find signature genes based on the coembeddings. First, we calculate the distance matrix between genes and cells in the coembedding space.
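To make the idea of a gene-cell distance matrix concrete, the sketch below builds a toy coembedding with a truncated SVD. This is a swapped-in technique, not the FAST model: cells and genes are placed in the same q-dimensional space so that their inner products reproduce the best rank-q approximation of the centered expression matrix, and Euclidean distances are then computed between every gene and every cell. The SVD construction, the Euclidean metric and the helpers `svd_coembed` and `gene_cell_dist` are all assumptions for illustration.

```r
# Hypothetical helper: toy coembedding by truncated SVD of the centered
# cells x genes matrix. Cells get U * sqrt(d) and genes get V * sqrt(d),
# so cell %*% t(gene) is the best rank-q approximation of the centered data.
svd_coembed <- function(X, q) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  s <- svd(Xc, nu = q, nv = q)
  sq <- sqrt(s$d[1:q])
  list(cell = sweep(s$u, 2, sq, "*"),
       gene = sweep(s$v, 2, sq, "*"),
       centered = Xc)
}

# Hypothetical helper: Euclidean distances between every gene (rows)
# and every cell (columns) in the shared embedding space.
gene_cell_dist <- function(gene_emb, cell_emb) {
  g2 <- rowSums(gene_emb^2)
  c2 <- rowSums(cell_emb^2)
  d2 <- outer(g2, c2, "+") - 2 * gene_emb %*% t(cell_emb)
  sqrt(pmax(d2, 0))  # clip tiny negative values caused by rounding
}

# Toy data of rank 2: 60 cells x 20 genes
set.seed(2024)
n_cells <- 60; n_genes <- 20; q <- 2
X <- matrix(rnorm(n_cells * q), n_cells, q) %*% matrix(rnorm(q * n_genes), q, n_genes)
emb <- svd_coembed(X, q)
recon_err <- max(abs(emb$cell %*% t(emb$gene) - emb$centered))
D <- gene_cell_dist(emb$gene, emb$cell)
print(dim(D))
```

Since the toy data have rank at most q after centering, the reconstruction error is at the level of floating-point rounding, and D has one row per gene and one column per cell.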
Next, we find the signature genes for each cell type, after setting the cell types as the active identities of the object.
print(table(CosMx_subset$cell_type))
Idents(CosMx_subset) <- CosMx_subset$cell_type
df_sig_list <- find.signature.genes(CosMx_subset)
str(df_sig_list)
Then, we obtain the top five signature genes of each cell type and organize them into a data.frame, dat, which is used below. The column distance is the distance between a gene (e.g., MS4A1) and the cells of a specific cell type (e.g., B cells), calculated from the coembeddings of genes and cells in the coembedding space. The smaller the distance, the stronger the association between the gene and the cell type. The column expr.prop is the expression proportion of the gene (e.g., MS4A1) within the cell type (e.g., B cells). The column label gives the cell type, and the column gene gives the gene name. From this data.frame, we see that MS4A1 is one of the top signature genes of B cells.
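The construction of such a signature-gene table can be sketched in base R. Assuming a genes x cells distance matrix, a genes x cells count matrix and a vector of cell-type labels, the hypothetical helper `top_signature_table` below averages each gene's distance to the cells of each type, computes expr.prop as the fraction of those cells with non-zero counts, and keeps the `ntop` closest genes per type among those passing `expr_prop_cutoff`. Averaging distances and the cutoff rule are assumptions; ProFAST's own functions may summarize distances differently.

```r
# Hypothetical helper: build a data.frame with columns distance, expr.prop,
# label and gene, keeping the `ntop` genes closest to each cell type.
top_signature_table <- function(dist_mat, counts, cell_type, ntop = 5,
                                expr_prop_cutoff = 0.1) {
  types <- sort(unique(cell_type))
  res <- lapply(types, function(ct) {
    idx <- which(cell_type == ct)
    df <- data.frame(
      distance  = rowMeans(dist_mat[, idx, drop = FALSE]),    # mean gene-to-cell distance
      expr.prop = rowMeans(counts[, idx, drop = FALSE] > 0),  # fraction of cells expressing the gene
      label     = ct,
      gene      = rownames(dist_mat),
      stringsAsFactors = FALSE
    )
    df <- df[df$expr.prop >= expr_prop_cutoff, , drop = FALSE]
    df <- df[order(df$distance), , drop = FALSE]  # smaller distance = stronger association
    head(df, ntop)
  })
  out <- do.call(rbind, res)
  rownames(out) <- NULL
  out
}

# Toy example: 6 genes, 10 B cells and 10 T cells
set.seed(2024)
genes <- paste0("gene", 1:6)
cell_type <- rep(c("B cell", "T cell"), each = 10)
counts <- matrix(rpois(6 * 20, lambda = 2), nrow = 6, dimnames = list(genes, NULL))
dist_mat <- matrix(runif(6 * 20), nrow = 6, dimnames = list(genes, NULL))
dist_mat["gene1", cell_type == "B cell"] <- 0.01  # gene1 sits next to B cells
dist_mat["gene2", cell_type == "T cell"] <- 0.01  # gene2 sits next to T cells
sig <- top_signature_table(dist_mat, counts, cell_type, ntop = 2, expr_prop_cutoff = 0)
print(sig)
```

Ranking by the smallest mean distance mirrors the interpretation above: a gene embedded close to the cells of a type is treated as a candidate signature gene of that type.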
Next, we calculate the UMAP projections of coembeddings of cells and the selected signature genes.
CosMx_subset <- coembedding_umap(
  CosMx_subset, reduction = "fast", reduction.name = "UMAP",
  gene.set = unique(dat$gene))
Furthermore, we visualize the cells and the top two signature genes of tumor 5 in the UMAP space of the coembeddings. We observe that the UMAP projections of the two signature genes are close to B cells, which indicates that these genes are enriched in B cells.
## choose beautiful colors
cols_cluster <- c("black", PRECAST::chooseColors(palettes_name = "Blink 23", n_colors = 21, plot_colors = TRUE))
p1 <- coembed_plot(
   CosMx_subset, reduction = "UMAP",
   gene_txtdata = subset(dat, label=='tumor 5'),
   cols=cols_cluster, pt_text_size = 3)
p1
Then, we visualize the cells and the top two signature genes of all cell types in the UMAP space of the coembeddings. We observe that the UMAP projections of the signature genes are close to the corresponding cell types, which indicates that these genes are enriched in the corresponding cells.
p2 <- coembed_plot(
   CosMx_subset, reduction = "UMAP",
   gene_txtdata = dat, cols=cols_cluster, 
   pt_text_size = 3, alpha=0.2)
p2
In addition, we can take full advantage of the visualization functions in the Seurat package. The following example visualizes the cell types in the UMAP space.
cols_type <- cols_cluster[-1]
names(cols_type) <- sort(levels(Idents(CosMx_subset))) # assign one color per cell type
DimPlot(CosMx_subset, reduction = 'UMAP', cols = cols_type)
Next, we plot the first two signature genes of tumor 5 in the UMAP space and observe that they are highly expressed in B cells in contrast to other cell types.
sessionInfo()
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                               
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8   
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C                               
#> [5] LC_TIME=Chinese (Simplified)_China.utf8    
#> 
#> time zone: Asia/Shanghai
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.47        
#>  [5] cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1 rmarkdown_2.28   
#>  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
#> [13] compiler_4.4.1    rstudioapi_0.16.0 tools_4.4.1       evaluate_1.0.0   
#> [17] bslib_0.8.0       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9