Seurat V5的可能算不出那么多PC

在某个普普通通的一天,我使用Seurat处理数据,但是发现这个数据却有点不那么普通

library(Seurat)

expr_matrix <- Read10X("/data5/ofo_data/yc/filtered_feature_bc_matrix_yc4/")

# 如果你的数据是一个矩阵或数据帧
rownames(expr_matrix) <- gsub("_", "-", rownames(expr_matrix))

seu <- CreateSeuratObject(counts = expr_matrix, project = "example")

seu <- subset(seu, subset =  nFeature_RNA > 500 & nFeature_RNA <12000 & nCount_RNA >1000 & nCount_RNA <70000)
seu <- NormalizeData(seu)
seu <- FindVariableFeatures(seu, selection.method = "dispersion", nfeatures = 1000)
all.genes <- rownames(seu)
seu <- ScaleData(seu, features = all.genes)
seu <- RunPCA(seu, features = VariableFeatures(object = seu), npcs=50)

令人震惊的是,他提示了一个我从未见过的警告,我并没有在意,继续处理数据

Warning in irlba(A = t(x = object), nv = npcs, ...) :
  You're computing too large a percentage of total singular values, use a standard svd instead.
Warning: Number of dimensions changing from 50 to 10
Warning messages:
1: Requested number is larger than the number of available items (11). Setting to 11. 
2: Requested number is larger than the number of available items (11). Setting to 11. 
3: Requested number is larger than the number of available items (11). Setting to 11. 
4: Requested number is larger than the number of available items (11). Setting to 11. 
5: Requested number is larger than the number of available items (11). Setting to 11. 

然后我在构建SNN这一步,我就出错了,提示,给的维度太多了了。确实根据之前的提示信息,Seurat只算了10个PC的样子。

Error in FindNeighbors.Seurat(pbmc, dims = 1:50) : 
  More dimensions specified in dims than have been computed

于是我修改参数,把后续的维度改成了10

seu <- FindNeighbors(seu, dims = 1:10)
seu <- RunUMAP(seu, dims = 1:10)
seu <- FindClusters(seu, resolution = 0.5)
DimPlot(object = seu, reduction = "umap",label = TRUE,label.size = 7,repel = TRUE,pt.size = 0.3) +
  theme(panel.grid = element_blank())

我得到了一个非常抽象的结果,我都不知道该如何解读

9097d8461c5c49e2ff5e22fbddee124

我推测一定是PCA这一步出了问题,我于是打算用警告里面提到可以用svd的方法。但是RunPCA 并没有选择方法的参数,只能自己写一个,代码如下

pca_svd <- function(data, npcs) {
  svd_result <- svd(t(data))  # 转置数据矩阵并进行SVD
  # 提取前`npcs`个成分
  U <- svd_result$u[, 1:npcs]
  S <- diag(svd_result$d[1:npcs])
  V <- svd_result$v[, 1:npcs]
  PCs <- U %*% S  # 计算PC得分
  return(list("PC" = PCs, "rotation" = V))
}


# 提取Seurat对象中的归一化数据
normalized_data <- GetAssayData(seu, layer = "scale.data")

normalized_data <- normalized_data[VariableFeatures(seu), ]

# 指定要计算的主成分数
npcs <- 50  # 例如,计算前20个PC

# 运行SVD计算PCA
svd_pca_result <- pca_svd(normalized_data, npcs)

# 将结果
embedings <-   svd_pca_result$PC

colnames(embedings) <- paste0('PC_', 1:50)
rownames(embedings) <- colnames(seu)

loadings <- svd_pca_result$rotation
colnames(loadings) <- paste0('PC_', 1:50)
rownames(loadings) <- VariableFeatures(seu)


pbmc[["pca"]] <- CreateDimReducObject(
  embeddings = embedings,
  loadings = loadings,
  key = "PC_",
  assay = DefaultAssay(pbmc)
)

最后运行之前的SNN构建,UMAP降维和聚类,结果就正常多了。

78e8f6c3afa23cb596399f3dcea2d8e

不过,故事还没有结束,因为,我发现相同的代码在Seurat V4是正常的,同时用scanpy的流程也没问题,这个问题就仅仅在Seurat V5中发生。

我感觉,要么安心的留在Seurat V4,要么就大举迁移到Scanpy生态吧。

# 单细胞 

评论

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×