Unable to recover experimentally validated cell-type in Schistosoma mansoni single-cell RNASeq data after Seurat version changes #9820
Unanswered
FrancesBlow
asked this question in
Q&A
Hi Seurat Team,
I am trying to re-analyse single-cell data from multicellular Schistosoma mansoni parasites from this publication.
The published Seurat (v3) object can be downloaded from here.
Several parts of the single-cell analysis pipeline have been updated since these data were published in 2020, including the move from Seurat v3 to v5. When I re-analyse this dataset following the Seurat v5 integrated analysis vignette (adapted to use the non-default parameters specified in the published analysis, e.g. the number of PCs), I do not detect the oesophageal gland (og) cluster reported in the publication.

Please see the published reporting and ISH validation of the top marker gene for the og cluster in Fig. 1F from the paper:
Here is the code I used for the Seurat v5 analysis, after performing quality-filtering as outlined in the publication:
```r
library(Seurat)

adults <- readRDS('Adults_merged.rds')
adults <- JoinLayers(adults)
# Split the RNA assay into one layer per sample ("plex") for integration
adults[["RNA"]] <- split(adults[["RNA"]], f = adults$plex)
adults <- SCTransform(adults, variable.features.n = 2000)
adults <- RunPCA(adults, npcs = 100)
adults.integrated <- IntegrateLayers(
  object = adults,
  method = CCAIntegration,
  orig.reduction = "pca",
  new.reduction = "integrated.cca",
  normalization.method = "SCT",
  verbose = FALSE
)
adults.integrated <- FindNeighbors(adults.integrated, reduction = "integrated.cca", dims = 1:78)
adults.integrated <- FindClusters(adults.integrated, resolution = 5)
adults.integrated <- RunUMAP(adults.integrated, dims = 1:78, reduction = "integrated.cca")
DimPlot(adults.integrated, label = TRUE, shuffle = TRUE, repel = TRUE) + NoLegend()
FeaturePlot(adults.integrated, features = c("Smp-172180"), label = FALSE, order = TRUE)
```
Here is the resulting UMAP and the feature plot showing expression of top og marker gene Smp-172180.
Seurat_v5_Smp-172180.pdf
Seurat_v5_UMAP.pdf
When I perform the re-analysis using methods outlined in the paper (namely NormalizeData and ScaleData rather than SCTransform, and IntegrateData rather than IntegrateLayers), I do recover the og cluster.
Here is the code I used for this version of the re-analysis (which I refer to as “v4”):
```r
library(Seurat)

adults <- readRDS('Adults_merged.rds')
adults <- JoinLayers(adults)
adults.list <- SplitObject(adults, split.by = "plex")
# Normalize and select variable features independently for each sample
adults.list <- lapply(X = adults.list, FUN = function(x) {
  x <- NormalizeData(x, verbose = FALSE)
  x <- FindVariableFeatures(x, verbose = FALSE)
  x
})
features <- SelectIntegrationFeatures(object.list = adults.list, nfeatures = 2000)
adults.anchors <- FindIntegrationAnchors(object.list = adults.list, dims = 1:78, anchor.features = features)
adults.integrated <- IntegrateData(anchorset = adults.anchors, dims = 1:78)
adults.integrated <- ScaleData(adults.integrated)
adults.integrated <- RunPCA(adults.integrated, npcs = 100, verbose = FALSE)
adults.integrated <- RunUMAP(adults.integrated, reduction = "pca", dims = 1:78, n.neighbors = 40)
adults.integrated <- FindNeighbors(adults.integrated, reduction = "pca", dims = 1:78)
adults.integrated <- FindClusters(adults.integrated, resolution = 5)
DimPlot(adults.integrated, label = TRUE, shuffle = TRUE, repel = TRUE) + NoLegend()
FeaturePlot(adults.integrated, features = c("Smp-172180"), label = FALSE, order = TRUE)
```
Here is the resulting UMAP and the feature plot showing expression of top og marker gene Smp-172180 in cluster 88, where I have verified it is identified as a top marker.
Seurat_v4_Smp-172180.pdf
Seurat_v4_UMAP.pdf
I believe that differences in the selection of highly variable genes (HVGs) are driving whether or not the og cell type is detected. When I counted how many of the 27 published og marker genes appear among the 2,000 most highly variable genes in each analysis, I found fewer for v5 (7/27 og marker genes, no og cluster recovered) than for v4 (17/27 og marker genes, og cluster recovered). Here is a plot showing the relative rank of the 1,325 highly variable genes that are shared between the two analyses, and the position of the 7 og markers in those rankings.
Og_marker_HVGs_v4_v5.pdf
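For reference, the marker-overlap comparison described above can be sketched in base R roughly as follows. This is a minimal illustration, not the exact code behind the plot: the gene IDs are placeholders rather than real Smp identifiers or results, and `count_markers_in_hvgs` is a hypothetical helper. In the actual analysis the two HVG vectors would come from `VariableFeatures()` on each integrated object, and I am assuming each vector is ordered from most to least variable.

```r
# Placeholder inputs (hypothetical IDs, not real data or results).
og_markers <- c("gene_A", "gene_B", "gene_C", "gene_D")          # stand-in for the published og marker list
hvg_v4 <- c("gene_B", "gene_X", "gene_A", "gene_C", "gene_Y")    # stand-in for the v4 HVGs, assumed ranked
hvg_v5 <- c("gene_X", "gene_Z", "gene_B", "gene_Y", "gene_W")    # stand-in for the v5 HVGs, assumed ranked

# Hypothetical helper: number of marker genes present in an HVG set.
count_markers_in_hvgs <- function(markers, hvgs) {
  sum(markers %in% hvgs)
}

n_v4 <- count_markers_in_hvgs(og_markers, hvg_v4)
n_v5 <- count_markers_in_hvgs(og_markers, hvg_v5)

# Genes selected as HVGs in both analyses, with their rank in each list
# (rank = position in the vector, under the ordering assumption above).
shared <- intersect(hvg_v4, hvg_v5)
rank_table <- data.frame(
  gene = shared,
  rank_v4 = match(shared, hvg_v4),
  rank_v5 = match(shared, hvg_v5),
  is_og_marker = shared %in% og_markers,
  stringsAsFactors = FALSE
)

print(c(v4 = n_v4, v5 = n_v5))
print(rank_table)
```

Plotting `rank_v4` against `rank_v5` and highlighting the rows where `is_og_marker` is `TRUE` gives the kind of comparison shown in the attached PDF.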
I am struggling to understand which steps in the analysis pipeline cause this discrepancy in the highly variable features selected, and whether it fully explains the difference in detection of the og cluster between the two versions of my analysis. My main questions are:
• Which changes between the Seurat v3 and v5 workflows might be driving which genes are selected as VariableFeatures? I have been assuming it is the normalization and integration steps.
• Would the disparity in og markers included as VariableFeatures be sufficient to prevent detection of the og cluster?
• Or is there something I am doing in the v5 analysis that prevents detection of the og cluster?
Due to the nature of my query, I cannot recreate this issue with the pbmc dataset, but I am providing my code here and can provide Seurat objects if helpful. Any advice you have on why I do not detect this cluster with Seurat v5 default methods would be much appreciated.
Thanks,
Frances