Starfysh tutorial on simulated ST dataset

Azizi Lab

Siyu He, Yinuo Jin

02-02-2023

This is a tutorial on a simple simulated ST data (simulated_ST_data_1) generated from scRNA-seq data.

[1]:

import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    !pip3 install scanpy
    #!pip3 install histomicstk
    from google.colab import drive
    drive.mount('/content/drive')
    import sys
    sys.path.append('/content/drive/MyDrive/SpatialModelProject/model_test_colab/')

[2]:

import os
import numpy as np
import pandas as pd
import torch

[3]:

import matplotlib.pyplot as plt
import matplotlib.font_manager
from matplotlib import rcParams

font_list = []
fpaths = matplotlib.font_manager.findSystemFonts()
for i in fpaths:
    try:
        f = matplotlib.font_manager.get_font(i)
        font_list.append(f.family_name)
    except RuntimeError:
        pass

font_list = set(font_list)
plot_font = 'Helvetica' if 'Helvetica' in font_list else 'FreeSans'

rcParams['font.family'] = plot_font
rcParams.update({'font.size': 10})
rcParams.update({'figure.dpi': 300})
rcParams.update({'figure.figsize': (3,3)})
rcParams.update({'savefig.dpi': 500})

import warnings
warnings.filterwarnings('ignore')

Load starfysh

[4]:

from starfysh import (AA, utils, plot_utils, post_analysis)
from starfysh import starfysh as sf_model

(1). load data and marker genes:

File Input: - Spatial transcriptomics - Count matrix: adata - (Optional): Paired histology & spot coordinates: img, map_info

Annotated signatures (marker genes) for potential cell types: gene_sig

Starfysh is built upon scanpy and Anndata. The common ST/Visium data sample folder consists a expression count file (usually filtered_feature_bc_matrix.h5), and a subdirectory with corresponding H&E image and spatial information, as provided by Visium platform.

For example, our example ST data + marker genes has the following structure (If you’re running this tutorial locally, please download the simulation data and signature gene sets and save them accordingly):

├── ../data
    tnbc_signature.csv

    ├── simulated_ST_data_1:
        \__ counts.st_synth.csv

        ├── (Optional) spatial:
            \__ aligned_fiducials.jpg
                detected_tissue_image.jpg
                scalefactors_json.json
                tissue_hires_image.png
                tissue_lowres_image.png
                tissue_positions_list.csv

For data that doesn’t follow the common visium data structure (e.g. missing filtered_featyur_bc_matrix.h5 or the given .h5ad count matrix file lacks spatial metadata), please construct the data as Anndata synthesizing information as the example simulated data shows:

[5]:

# Loading expression counts and signature gene sets
data_path = '../data/'
sample_id = 'simulated_ST_data_1'
adata, adata_normed = utils.load_adata(data_folder=data_path,  # root data directory
                                       sample_id=sample_id,  # sample_id
                                       n_genes=2000  # number of highly variable genes to keep
                                      )


gene_sig = pd.read_csv(os.path.join('../data','tnbc_signature.csv'))
gene_sig = utils.filter_gene_sig(gene_sig,adata.to_df()) # filter out low-quality marker genes
gene_sig.head()

[2023-02-26 18:58:43] Preprocessing1: delete the mt and rp
[2023-02-26 18:58:44] Preprocessing2: Normalize
[2023-02-26 18:58:44] Preprocessing3: Logarithm
[2023-02-26 18:58:44] Preprocessing4: Find the variable genes

[5]:

	CAFs	Cancer Epithelial	Myeloid	Normal Epithelial	T-cells
0	DCN	SCGB2A2	C1QB	KRT14	CCL5
1	COL1A1	CD24	LYZ	KRT17	IL7R
2	LUM	MUCL1	C1QA	LTF	GNLY
3	COL1A2	KRT19	C1QC	KRT15	NKG7
4	SFRP2	SCGB1D2	TYROBP	PTN	CD3E

[6]:

adata.var.head()

[6]:

	highly_variable
MALAT1	True
EEF1A1	True
B2M	False
ACTB	False
TMSB4X	True

[7]:

adata

[7]:

AnnData object with n_obs × n_vars = 2551 × 7704
    obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt'
    var: 'highly_variable'

[8]:

# Load spatial information
# [Note]:
# For simulated data without spatial coordinates, we replace them with the umap location:
umap_df = utils.get_umap(adata, display=True)
map_info = utils.get_simu_map_info(umap_df)

# Create dummy dict. as img_metadata
img_metadata = {
    'img': None,
    'map_info': map_info,
    'scalefactor': None
}

../_images/notebooks_Starfysh_tutorial_simulation_10_0.png

(2). Preprocessing (finding anchor spots)

Identify anchor spot locations For simulated data without spatial coordinates, we replace them the umap location:

Instantiate parameters for Starfysh model training: - Raw & normalized counts after taking highly variable genes - filtered signature genes - library size & spatial smoothed library size (log-transformed) - Anchor spot indices (anchors_df) for each cell type & their signature means (sig_means)

[9]:

# Parameters for training
visium_args = utils.VisiumArguments(adata,
                                    adata_normed,
                                    gene_sig,
                                    img_metadata,
                                    n_anchors=60,
                                    window_size=3
                                   )

adata, adata_normed = visium_args.get_adata()

[2023-02-26 18:58:49] Subsetting highly variable & signature genes ...
[2023-02-26 18:58:49] Smoothing library size by taking averaging with neighbor spots...
[2023-02-26 18:58:50] Retrieving & normalizing signature gene expressions...
[2023-02-26 18:58:51] Identifying anchor spots (highly expression of specific cell-type signatures)...

Visualize the spatial data

[10]:

plot_utils.plot_spatial_feature(adata,
                                map_info,
                                visium_args.log_lib,
                                label='log library size'
                               )

../_images/notebooks_Starfysh_tutorial_simulation_15_0.png

[11]:

plot_utils.plot_spatial_feature(adata,
                                map_info,
                                visium_args.win_loglib,
                                label='windowed log library size',
                                )

../_images/notebooks_Starfysh_tutorial_simulation_16_0.png

plot raw gene expression:

[12]:

plot_utils.plot_spatial_gene(adata,
                             map_info,
                             gene_name='COL1A1'
                            )

../_images/notebooks_Starfysh_tutorial_simulation_18_0.png

Visuliaze anchor spots

[13]:

plot_utils.plot_anchor_spots(umap_df,
                             visium_args.pure_spots,
                             visium_args.sig_mean,
                             bbox_x=2
                            )

../_images/notebooks_Starfysh_tutorial_simulation_20_0.png

We observe that Cancer & Normal epithelial anchor spots are highly overlapped given input signatures.

(3). Optional: Archetypal Analysis

Overview: If users don’t provide annotated gene signature sets with cell types, Starfysh identifies candidates for cell types via archetypal analysis (AA). The underlying assumption is that the geometric “extremes” are identified as the purest cell types, whereas all other spots are mixture of the “archetypes”. If the users provide the gene signature sets, they can still optionally apply AA to refine marker genes and update anchor spots for known cell types. In addition, AA can identify & assign potential novel cell types / states. Here are the features provided by the optional archetypal analysis: - Finding archetypal spots & assign 1-1 mapping to their closest anchor spot neighbors - Finding archetypal marker genes & append them to marker genes of annotated cell types - Assigning novel cell type / cell states as the most distant archetypes

Overall, Starfysh provides the archetypal analysis as a complementary toolkit for automatic cell-type annotation & signature gene completion:

If signature genes aren’t provided: Archetypal analysis defines the geometric extrema of the data as major cell types with corresponding marker genes.
If full signature genes are known: Users can skip this section and use only the signature priors
If signature genes are incomplete or require refinement: Archetypal analysis can be applied to
1. Refine signatures of certain cell types
2. Find novel cell types / states that haven’t been provided from the input signature

If signature genes aren’t provided

Note: - Intrinsic Dimension (ID) estimator is implemented to estimate the lower-bound for the number of archetypes \(k\), followed by elbow method with iterations to identify the optimal \(k\). By default, a conditional number is set as 30; if you believe there are potentially more / fewer cell types, please increase / decrease cn accordingly.

Major cell types & corresponding markers are represented by the inferred archetypes:

aa_model = AA.ArchetypalAnalysis(adata_orig=adata_normed)
archetype, arche_dict, major_idx, evs = aa_model.compute_archetypes(r=40)

# (1). Find archetypal spots & archetypal clusters
arche_df = aa_model.find_archetypal_spots(major=True)

# (2). Define "signature genes" as marker genes associated with each archetypal cluster
gene_sig = aa_model.find_markers(n_markers=30, display=False)
gene_sig.head()

If full signature genes are known

Users can skip this section & run Starfysh

If signature genes are provided with potential incomplete annotations

In this tutorial, we show an example to re-define anchors & signatures with archetypal analysis: In the annotated gene signatures provided by Wu et al., we found that the marker from normal and cancer epithelials highly overlaps, thus creating confusions to separate those cell types. Here we re-define markers and update new anchor spots for Cancer Epithelials from the most distant archetype unmapped to any cell-type specific anchor spots:

[14]:

aa_model = AA.ArchetypalAnalysis(adata_orig=adata_normed)
archetype, arche_dict, major_idx, evs = aa_model.compute_archetypes(cn=10, converge=1e-2, display=False)
# (1). Find archetypal spots & archetypal clusters
arche_df = aa_model.find_archetypal_spots(major=True)

# (2). Find marker genes associated with each archetypal cluster
markers_df = aa_model.find_markers(n_markers=30, display=False)

# (3). Map archetypes to closest anchors (1-1 per cell type)
anchors_df = visium_args.get_anchors()
map_df, map_dict = aa_model.assign_archetypes(anchors_df)

# (4). Optional: Find the most distant archetypes that are not assigned to any annotated cell types
distant_arches = aa_model.find_distant_archetypes(anchors_df, n=3)

[2023-02-26 18:58:52] Computing intrinsic dimension to estimate k...

4 components are retained using conditional_number=10.00

[2023-02-26 18:58:53] Estimating lower bound of # archetype as 3...
[2023-02-26 18:59:02] Calculating UMAPs for counts + Archetypes...
[2023-02-26 18:59:09] Calculating UMAPs for counts + Archetypes...
[2023-02-26 18:59:14] 0.7838 variance explained by raw archetypes.
Merging raw archetypes within 20 NNs to get major archetypes
[2023-02-26 18:59:14] Finding 20 nearest neighbors for each archetype...
[2023-02-26 18:59:15] Finding 30 top marker genes for each archetype...

[15]:

plot_utils.plot_evs(evs, kmin=aa_model.kmin)

../_images/notebooks_Starfysh_tutorial_simulation_28_0.png

Visualize archetypes

[16]:

aa_model.plot_archetypes(do_3d=True, major=True, disp_cluster=True)

[16]:

(<Figure size 1200x800 with 1 Axes>,
 <Axes3DSubplot:xlabel='UMAP1', ylabel='UMAP2'>)

../_images/notebooks_Starfysh_tutorial_simulation_30_1.png

Visualize overlapping ratio between archetypal & anchor spots:

[17]:

aa_model.plot_mapping(map_df)

[17]:

<seaborn.matrix.ClusterGrid at 0x7f62d2e50f70>

../_images/notebooks_Starfysh_tutorial_simulation_32_1.png

Re-calculate anchor spots by replacing Cnacer Epithelial signautures with the most distant archetype:

[18]:

arche_to_repl = distant_arches[0]
visium_args.replace_factors(
    factors_to_repl='Cancer Epithelial',
    arche_markers=markers_df[arche_to_repl]
)

# Get updated adata & signatures
adata, adata_normed = visium_args.get_adata()
gene_sig = visium_args.gene_sig
cell_types = gene_sig.columns

plot_utils.plot_anchor_spots(umap_df,
                             visium_args.pure_spots,
                             visium_args.sig_mean,
                             bbox_x=2
                            )

[2023-02-26 18:59:17] Recalculating anchor spots (highly expression of specific cell-type signatures)...

../_images/notebooks_Starfysh_tutorial_simulation_34_1.png

[Note]: Compared with the previous visualization, it is observed that after replacing signatures of Cancer Epithelial with arch_4, Cancer & Normal epithelial anchors are now separable.

Run starfysh

We perform n_repeat random restarts and select the best model with lowest loss for parameter c (inferred cell-type proportions):

(1). Model parameters

[19]:

n_repeats = 3
epochs = 200
patience = 50
device = torch.device('cuda')

(2). Model training

[20]:

# Run models
model, loss = utils.run_starfysh(visium_args,
                                 n_repeats=n_repeats,
                                 epochs=epochs,
                                 patience=patience,
                                 poe=False,
                                 device=device
                                )

[2023-02-26 18:59:17] Running Starfysh with 3 restarts, choose the model with best parameters...
[2023-02-26 18:59:17]  ===  Restart Starfysh 1 ===

[2023-02-26 18:59:19] Initializing model parameters...
[2023-02-26 18:59:39] Epoch[10/200], train_loss: 2007.4214, train_reconst: 1847.7540, train_u: 2.8504,train_z: 17.5282,train_c: 117.0974,train_n: 25.0417
[2023-02-26 18:59:58] Epoch[20/200], train_loss: 1936.0011, train_reconst: 1788.0242, train_u: 3.9067,train_z: 20.7216,train_c: 102.4949,train_n: 24.7604
[2023-02-26 19:00:17] Epoch[30/200], train_loss: 1901.7924, train_reconst: 1776.1213, train_u: 6.5800,train_z: 20.9766,train_c: 80.2254,train_n: 24.4690
[2023-02-26 19:00:36] Epoch[40/200], train_loss: 1964.2989, train_reconst: 1809.2235, train_u: 9.0308,train_z: 24.6094,train_c: 106.0554,train_n: 24.4106
[2023-02-26 19:00:55] Epoch[50/200], train_loss: 1951.6084, train_reconst: 1755.0337, train_u: 13.1557,train_z: 24.5766,train_c: 147.7359,train_n: 24.2622
[2023-02-26 19:01:14] Epoch[60/200], train_loss: 1866.3766, train_reconst: 1729.1562, train_u: 17.3164,train_z: 23.5770,train_c: 89.5205,train_n: 24.1230
[2023-02-26 19:01:33] Epoch[70/200], train_loss: 1832.5643, train_reconst: 1705.9919, train_u: 20.3938,train_z: 22.2813,train_c: 80.2160,train_n: 24.0752
[2023-02-26 19:01:52] Epoch[80/200], train_loss: 1790.5258, train_reconst: 1671.8884, train_u: 23.2195,train_z: 17.6176,train_c: 76.9443,train_n: 24.0754
[2023-02-26 19:02:10] Epoch[90/200], train_loss: 1799.7392, train_reconst: 1677.9736, train_u: 24.0068,train_z: 18.0433,train_c: 79.7153,train_n: 24.0069
[2023-02-26 19:02:29] Epoch[100/200], train_loss: 1777.5381, train_reconst: 1660.1315, train_u: 27.9449,train_z: 20.0214,train_c: 73.3570,train_n: 24.0282
[2023-02-26 19:02:48] Epoch[110/200], train_loss: 1767.7843, train_reconst: 1649.0329, train_u: 32.4182,train_z: 17.2925,train_c: 77.5432,train_n: 23.9157
[2023-02-26 19:03:07] Epoch[120/200], train_loss: 1739.0430, train_reconst: 1630.8076, train_u: 30.9504,train_z: 15.4045,train_c: 68.9191,train_n: 23.9119
[2023-02-26 19:03:26] Epoch[130/200], train_loss: 1759.4370, train_reconst: 1647.0556, train_u: 32.7064,train_z: 20.8465,train_c: 67.4470,train_n: 24.0880
[2023-02-26 19:03:45] Epoch[140/200], train_loss: 1771.1122, train_reconst: 1664.5561, train_u: 37.0640,train_z: 20.4750,train_c: 62.0981,train_n: 23.9829
[2023-02-26 19:04:05] Epoch[150/200], train_loss: 1786.9986, train_reconst: 1661.2974, train_u: 42.3813,train_z: 21.5831,train_c: 80.1700,train_n: 23.9481
[2023-02-26 19:04:24] Epoch[160/200], train_loss: 1763.6897, train_reconst: 1646.7902, train_u: 49.5191,train_z: 19.5647,train_c: 73.4100,train_n: 23.9248
[2023-02-26 19:04:43] Epoch[170/200], train_loss: 1739.5695, train_reconst: 1638.8848, train_u: 54.8069,train_z: 17.9102,train_c: 58.9084,train_n: 23.8661
[2023-02-26 19:05:02] Epoch[180/200], train_loss: 1804.4797, train_reconst: 1667.8103, train_u: 58.6232,train_z: 26.3849,train_c: 86.4372,train_n: 23.8472
[2023-02-26 19:05:21] Epoch[190/200], train_loss: 1726.7753, train_reconst: 1627.4848, train_u: 65.4931,train_z: 17.4220,train_c: 58.0368,train_n: 23.8318
[2023-02-26 19:05:40] Epoch[200/200], train_loss: 1734.9239, train_reconst: 1634.0663, train_u: 66.5599,train_z: 17.6778,train_c: 59.2847,train_n: 23.8951
[2023-02-26 19:05:40]  === Finished training ===

[2023-02-26 19:05:40]  ===  Restart Starfysh 2 ===

[2023-02-26 19:05:40] Initializing model parameters...
[2023-02-26 19:05:59] Epoch[10/200], train_loss: 1995.6287, train_reconst: 1837.3655, train_u: 5.5579,train_z: 19.4181,train_c: 114.0334,train_n: 24.8117
[2023-02-26 19:06:18] Epoch[20/200], train_loss: 1905.0378, train_reconst: 1771.1047, train_u: 7.4276,train_z: 20.8725,train_c: 88.6755,train_n: 24.3851
[2023-02-26 19:06:37] Epoch[30/200], train_loss: 1882.0624, train_reconst: 1756.7080, train_u: 9.7762,train_z: 20.4063,train_c: 80.7915,train_n: 24.1564
[2023-02-26 19:06:56] Epoch[40/200], train_loss: 1833.0567, train_reconst: 1719.7439, train_u: 12.7793,train_z: 18.8701,train_c: 70.1789,train_n: 24.2638
[2023-02-26 19:07:15] Epoch[50/200], train_loss: 1781.8627, train_reconst: 1674.4554, train_u: 15.5697,train_z: 16.8745,train_c: 66.4713,train_n: 24.0615
[2023-02-26 19:07:34] Epoch[60/200], train_loss: 1818.5694, train_reconst: 1711.0119, train_u: 18.3740,train_z: 22.6398,train_c: 60.8798,train_n: 24.0380
[2023-02-26 19:07:54] Epoch[70/200], train_loss: 1788.1286, train_reconst: 1685.5714, train_u: 24.6083,train_z: 18.1727,train_c: 60.4125,train_n: 23.9719
[2023-02-26 19:08:14] Epoch[80/200], train_loss: 1747.2367, train_reconst: 1649.4645, train_u: 26.3997,train_z: 16.4421,train_c: 57.4785,train_n: 23.8517
[2023-02-26 19:08:33] Epoch[90/200], train_loss: 1759.1492, train_reconst: 1662.8173, train_u: 29.0645,train_z: 15.5998,train_c: 56.7371,train_n: 23.9951
[2023-02-26 19:08:52] Epoch[100/200], train_loss: 1741.1364, train_reconst: 1645.3145, train_u: 34.1272,train_z: 16.2355,train_c: 55.6609,train_n: 23.9254
[2023-02-26 19:09:11] Epoch[110/200], train_loss: 1714.4742, train_reconst: 1625.6590, train_u: 35.0826,train_z: 14.5555,train_c: 50.4014,train_n: 23.8583
[2023-02-26 19:09:30] Epoch[120/200], train_loss: 1763.7229, train_reconst: 1669.0588, train_u: 34.9643,train_z: 18.5195,train_c: 52.0978,train_n: 24.0468
[2023-02-26 19:09:49] Epoch[130/200], train_loss: 1734.6670, train_reconst: 1647.1458, train_u: 36.8845,train_z: 16.5062,train_c: 47.0256,train_n: 23.9894
[2023-02-26 19:10:08] Epoch[140/200], train_loss: 1720.3481, train_reconst: 1627.6302, train_u: 36.5827,train_z: 15.8767,train_c: 52.8854,train_n: 23.9558
[2023-02-26 19:10:27] Epoch[150/200], train_loss: 1750.2364, train_reconst: 1638.2808, train_u: 37.4025,train_z: 18.6396,train_c: 69.4796,train_n: 23.8364
[2023-02-26 19:10:46] Epoch[160/200], train_loss: 1699.4896, train_reconst: 1612.4391, train_u: 39.3915,train_z: 15.7478,train_c: 47.4444,train_n: 23.8583
[2023-02-26 19:11:05] Epoch[170/200], train_loss: 1672.0840, train_reconst: 1588.1737, train_u: 38.0813,train_z: 14.0079,train_c: 46.0943,train_n: 23.8080
[2023-02-26 19:11:24] Epoch[180/200], train_loss: 1755.8414, train_reconst: 1631.3099, train_u: 36.2186,train_z: 25.0040,train_c: 75.6701,train_n: 23.8573
[2023-02-26 19:11:43] Epoch[190/200], train_loss: 1679.4951, train_reconst: 1586.3100, train_u: 38.4945,train_z: 18.1959,train_c: 51.2051,train_n: 23.7841
[2023-02-26 19:12:02] Epoch[200/200], train_loss: 1667.7667, train_reconst: 1581.7074, train_u: 39.2724,train_z: 17.5819,train_c: 44.6048,train_n: 23.8727
[2023-02-26 19:12:02]  === Finished training ===

[2023-02-26 19:12:02]  ===  Restart Starfysh 3 ===

[2023-02-26 19:12:02] Initializing model parameters...
[2023-02-26 19:12:20] Epoch[10/200], train_loss: 2243.4246, train_reconst: 2047.7742, train_u: 4.0834,train_z: 25.8501,train_c: 144.4164,train_n: 25.3838
[2023-02-26 19:12:39] Epoch[20/200], train_loss: 2141.3988, train_reconst: 1982.1523, train_u: 5.6441,train_z: 23.2923,train_c: 111.2352,train_n: 24.7191
[2023-02-26 19:12:57] Epoch[30/200], train_loss: 2179.7912, train_reconst: 1989.2463, train_u: 8.3567,train_z: 22.7111,train_c: 143.1866,train_n: 24.6472
[2023-02-26 19:13:15] Epoch[40/200], train_loss: 2102.6703, train_reconst: 1932.6035, train_u: 11.3830,train_z: 21.8486,train_c: 123.8787,train_n: 24.3395
[2023-02-26 19:13:34] Epoch[50/200], train_loss: 2017.9727, train_reconst: 1886.6877, train_u: 13.6810,train_z: 19.6696,train_c: 87.4843,train_n: 24.1311
[2023-02-26 19:13:52] Epoch[60/200], train_loss: 1999.8159, train_reconst: 1868.3662, train_u: 16.2567,train_z: 18.8252,train_c: 88.7171,train_n: 23.9074
[2023-02-26 19:14:11] Epoch[70/200], train_loss: 1997.1496, train_reconst: 1875.5822, train_u: 21.1118,train_z: 18.9910,train_c: 78.4048,train_n: 24.1715
[2023-02-26 19:14:30] Epoch[80/200], train_loss: 1962.0342, train_reconst: 1849.1888, train_u: 25.1752,train_z: 16.3818,train_c: 72.4862,train_n: 23.9774
[2023-02-26 19:14:49] Epoch[90/200], train_loss: 1968.2629, train_reconst: 1857.5318, train_u: 27.0822,train_z: 15.2682,train_c: 71.4928,train_n: 23.9702
[2023-02-26 19:15:08] Epoch[100/200], train_loss: 1972.1483, train_reconst: 1857.3633, train_u: 30.1151,train_z: 20.8234,train_c: 69.8915,train_n: 24.0701
[2023-02-26 19:15:27] Epoch[110/200], train_loss: 1983.2419, train_reconst: 1869.9286, train_u: 35.6045,train_z: 19.8388,train_c: 69.4415,train_n: 24.0330
[2023-02-26 19:15:46] Epoch[120/200], train_loss: 2044.8325, train_reconst: 1900.3368, train_u: 36.0063,train_z: 21.5183,train_c: 99.0312,train_n: 23.9462
[2023-02-26 19:16:05] Epoch[130/200], train_loss: 1977.7661, train_reconst: 1862.6077, train_u: 40.7750,train_z: 18.5277,train_c: 72.6335,train_n: 23.9971
[2023-02-26 19:16:24] Epoch[140/200], train_loss: 1918.5328, train_reconst: 1817.9300, train_u: 45.1487,train_z: 15.6843,train_c: 61.1354,train_n: 23.7831
[2023-02-26 19:16:44] Epoch[150/200], train_loss: 1895.6381, train_reconst: 1803.6348, train_u: 43.4698,train_z: 13.6059,train_c: 54.6895,train_n: 23.7078
[2023-02-26 19:17:03] Epoch[160/200], train_loss: 1890.5867, train_reconst: 1796.4138, train_u: 40.7876,train_z: 13.0163,train_c: 57.3963,train_n: 23.7603
[2023-02-26 19:17:21] Epoch[170/200], train_loss: 1883.2295, train_reconst: 1787.3539, train_u: 42.1206,train_z: 15.8035,train_c: 56.3263,train_n: 23.7457
[2023-02-26 19:17:40] Epoch[180/200], train_loss: 1897.7456, train_reconst: 1793.6214, train_u: 46.2003,train_z: 15.8549,train_c: 64.3903,train_n: 23.8790
[2023-02-26 19:17:59] Epoch[190/200], train_loss: 2047.5401, train_reconst: 1820.5626, train_u: 49.1941,train_z: 19.5194,train_c: 183.5427,train_n: 23.9155
[2023-02-26 19:18:17] Epoch[200/200], train_loss: 1908.1888, train_reconst: 1800.0124, train_u: 54.2073,train_z: 18.5950,train_c: 65.6589,train_n: 23.9225
[2023-02-26 19:18:17]  === Finished training ===

(3). Downstream analysis

Parse Starfysh inference output

[21]:

inference_outputs, generative_outputs = sf_model.model_eval(model,
                                                            adata,
                                                            visium_args,
                                                            poe=False,
                                                            device=device)

Visualize starfysh deconvolution results

Gene sig mean vs. inferred prop

[22]:

for idx in range(len(cell_types)):
    post_analysis.gene_mean_vs_inferred_prop(inference_outputs,
                                             visium_args,
                                             idx=idx ## the order of the cell types
                                            )

../_images/notebooks_Starfysh_tutorial_simulation_46_0.png

../_images/notebooks_Starfysh_tutorial_simulation_46_1.png

../_images/notebooks_Starfysh_tutorial_simulation_46_2.png

../_images/notebooks_Starfysh_tutorial_simulation_46_3.png

../_images/notebooks_Starfysh_tutorial_simulation_46_4.png

Compare the deconvolution against the ground-truth proportions

Load simulation ground-truths:

[23]:

member = pd.read_csv(os.path.join('../data','simulated_ST_data_1','members.st_synth.csv'),index_col=0)
proportions = pd.read_csv(os.path.join('../data','simulated_ST_data_1','proportions.st_synth.csv'),index_col=0)

[24]:

# Plot the ground-truth proportions on the inferred Z-space
qz_u = post_analysis.get_z_umap(adata.obsm['qz_m'])
post_analysis.plot_type_all(inference_outputs, qz_u, proportions)

../_images/notebooks_Starfysh_tutorial_simulation_50_0.png

[25]:

post_analysis.get_corr_map(inference_outputs,proportions)

../_images/notebooks_Starfysh_tutorial_simulation_51_0.png

Spatial visualizations:

Inferred density

[26]:

plot_utils.pl_spatial_inf_feature(adata,
                                  feature='ql_m',
                                  spot_size=0.2,
                                  cmap='Blues')

../_images/notebooks_Starfysh_tutorial_simulation_54_0.png

Inferred proportions

Visualization on UMAP of data

[27]:

plot_utils.pl_spatial_inf_feature(adata,
                                  feature='qc_m',
                                  # To display for specific cell types:
                                  # factor = 'Cancer Epithelial', or factor = ['Cancer Epithelial, 'T-cells]
                                  spot_size=0.2,
                                  vmax=0.2)

../_images/notebooks_Starfysh_tutorial_simulation_57_0.png

Visualization on UMAP of inferred ``q(z)`` space

[28]:

plot_utils.pl_spatial_inf_feature(adata,
                                  feature='qz_m',
                                  # To display for specific cell types:
                                  # factor = 'Cancer Epithelial', or factor = ['Cancer Epithelial, 'T-cells]
                                  spot_size=0.2,
                                  vmax=0.2)

../_images/notebooks_Starfysh_tutorial_simulation_59_0.png

../_images/notebooks_Starfysh_tutorial_simulation_59_1.png

../_images/notebooks_Starfysh_tutorial_simulation_59_2.png

../_images/notebooks_Starfysh_tutorial_simulation_59_3.png

../_images/notebooks_Starfysh_tutorial_simulation_59_4.png

Inferred cell-type specific expressions for each spot

[29]:

pred_exprs = sf_model.model_ct_exp(model,
                                   adata,
                                   visium_args,
                                   poe=False,
                                   device=device)

The inferred predictions for each cell type is a S x G' matrix:

[30]:

pred_exprs['CAFs'].shape

[30]:

(2551, 2000)

E.g. Plot spot-level expression of CD69 from T-cells:

[31]:

plot_utils.pl_spatial_inf_gene(adata,
                               factor='T-cells',
                               feature='CD69',
                               spot_size=0.2)

../_images/notebooks_Starfysh_tutorial_simulation_65_0.png

Save model & inferred parameters

[32]:

# Specify output directory
outdir = './results/'
if not os.path.exists(outdir):
    os.mkdir(outdir)

# save the model
torch.save(model.state_dict(), os.path.join(outdir, 'starfysh_model.pt'))

# save `adata` object with inferred parameters
adata.write(os.path.join(outdir, 'st.h5ad'))