class: center, middle, inverse, title-slide #
Data infrastructure and import
##
Bioconductor
for single-cell transcriptomic data (
scRNA-seq
) –
CDSB2020
###
Leonardo Collado-Torres
### 2020-08-06

---

class: inverse

.center[

<a href="https://osca.bioconductor.org/"><img src="https://raw.githubusercontent.com/Bioconductor/OrchestratingSingleCellAnalysis-release/master/images/cover.png" style="width: 30%"/></a>

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

]

.footnote[Download the materials with `usethis::use_course('comunidadbioinfo/cdsb2020')` or browse them online at [**comunidadbioinfo.github.io/cdsb2020**](http://comunidadbioinfo.github.io/cdsb2020).]

<style type="text/css">
/* From https://github.com/yihui/xaringan/issues/147 */
.scroll-output {
  height: 80%;
  overflow-y: scroll;
}

/* https://stackoverflow.com/questions/50919104/horizontally-scrollable-output-on-xaringan-slides */
pre {
  max-width: 100%;
  overflow-x: scroll;
}

/* From https://github.com/yihui/xaringan/wiki/Font-Size */
.tiny{
  font-size: 40%
}

/* From https://github.com/yihui/xaringan/wiki/Title-slide */
.title-slide {
  background-image: url(https://raw.githubusercontent.com/Bioconductor/OrchestratingSingleCellAnalysis/master/images/Workflow.png);
  background-size: 33%;
  background-position: 0% 100%
}
</style>

---

# Peter Hickey's slides

See the slides [here](https://docs.google.com/presentation/d/1X9qP3wNlnn3BMUQhuZwAo4vCV76c33X_M-UnHxkPZpE/edit)

---

# R code

.scroll-output[

```r
library('scRNAseq')
sce.416b <- LunSpikeInData(which = "416b")
```

```
## snapshotDate(): 2020-04-27
```

```
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
```

```
## loading from cache
```

```
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
```

```
## loading from cache
```

```
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
```

```
## loading from cache
```

```
## snapshotDate(): 2020-04-27
```

```
## loading from cache
```

```r
# Load the SingleCellExperiment package
library('SingleCellExperiment')
# Extract the count matrix from the 416b dataset
counts.416b <- counts(sce.416b)
# Build a new SCE from the count matrix
sce <- SingleCellExperiment(assays = list(counts = counts.416b))

# Inspect the object we just created
sce
```

```
## class: SingleCellExperiment
## dim: 46604 192
## metadata(0):
## assays(1): counts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(0):
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(0):
## reducedDimNames(0):
## altExpNames(0):
```

```r
## How big is the R object?
pryr::object_size(sce)
```

```
## Registered S3 method overwritten by 'pryr':
##   method      from
##   print.bytes Rcpp
```

```
## 40.1 MB
```

```r
# Access the count matrix stored in the "assays" slot
# assay(sce, "counts")
# WARNING: printing the whole matrix can flood your R session!

# 1. The general accessor
assay(sce, "counts")[1:6, 1:3]
```

```
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
```

```r
# 2. The specific accessor for the "counts" matrix
counts(sce)[1:6, 1:3]
```

```
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
```

```r
sce <- scater::logNormCounts(sce)
# Inspect the object we just updated
sce
```

```
## class: SingleCellExperiment
## dim: 46604 192
## metadata(0):
## assays(2): counts logcounts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(0):
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(1): sizeFactor
## reducedDimNames(0):
## altExpNames(0):
```

```r
## How big is the R object?
pryr::object_size(sce)
```

```
## 112 MB
```

```r
# 1. The general accessor
assay(sce, "logcounts")[1:6, 1:3]
```

```
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
```

```r
# 2. The specific accessor for the transformed count matrix
# "logcounts"
logcounts(sce)[1:6, 1:3]
```

```
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693 0
## ENSMUSG00000064842 0
## ENSMUSG00000051951 0
## ENSMUSG00000102851 0
## ENSMUSG00000103377 0
## ENSMUSG00000104017 0
```

```r
# Assign a new matrix to the "assays" slot
assay(sce, "counts_100") <- assay(sce, "counts") + 100
# List the "assays" stored in the object
assays(sce)
```

```
## List of length 3
## names(3): counts logcounts counts_100
```

```r
assayNames(sce)
```

```
## [1] "counts" "logcounts" "counts_100"
```

```r
## How big is the R object?
pryr::object_size(sce)
```

```
## 183 MB
```

```r
# Extract the sample information (metadata) from the 416b dataset
colData.416b <- colData(sce.416b)
# Add some of that information to our SCE object
colData(sce) <- colData.416b[, c("phenotype", "block")]
# Inspect the object we just updated
sce
```

```
## class: SingleCellExperiment
## dim: 46604 192
## metadata(0):
## assays(3): counts logcounts counts_100
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(0):
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(2): phenotype block
## reducedDimNames(0):
## altExpNames(0):
```

```r
# Access the sample information (metadata) in our SCE
colData(sce)
```

```
## DataFrame with 192 rows and 2 columns
## phenotype
## <character>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 wild type phenotype
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 wild type phenotype
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 wild type phenotype
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 induced CBFB-MYH11 oncogene expression
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 induced CBFB-MYH11 oncogene expression
## ... ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 oncogene expression
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 oncogene expression
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 oncogene expression
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 oncogene expression
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 wild type phenotype
## block
## <integer>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 20160113
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 20160113
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 20160113
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 20160113
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 20160113
## ... ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 20160325
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 20160325
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 20160325
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 20160325
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 20160325
```

```r
# Access a specific column of the sample information (metadata)
table(sce$block)
```

```
##
## 20160113 20160325
## 96 96
```

```r
# Example of a function that adds new columns to the colData
# NOTE: this starts again from sce.416b, so it replaces the sce built above
sce <- scater::addPerCellQC(sce.416b)
# Access the sample information (metadata) in our updated SCE
colData(sce)
```

```
## DataFrame with 192 rows and 22 columns
## Source Name
## <character>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 SLX-9555.N701_S504.C89V9ANXX.s_1.r_1
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 SLX-9555.N701_S505.C89V9ANXX.s_1.r_1
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 SLX-9555.N701_S506.C89V9ANXX.s_1.r_1
## ... ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## cell line cell type
## <character> <character>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 416B embryonic stem cell
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 416B embryonic stem cell
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 416B embryonic stem cell
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 416B embryonic stem cell
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 416B embryonic stem cell
## ... ... ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 416B embryonic stem cell ## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 416B embryonic stem cell ## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 416B embryonic stem cell ## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 416B embryonic stem cell ## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 416B embryonic stem cell ## single cell well quality ## <character> ## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 OK ## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 OK ## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 OK ## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 OK ## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 OK ## ... ... ## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 OK ## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 OK ## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 OK ## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 OK ## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 OK ## genotype ## <character> ## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## ... ... 
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 Doxycycline-inducible CBFB-MYH11 oncogene ## phenotype ## <character> ## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 wild type phenotype ## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 wild type phenotype ## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 wild type phenotype ## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 induced CBFB-MYH11 oncogene expression ## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 induced CBFB-MYH11 oncogene expression ## ... ... ## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 oncogene expression ## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 oncogene expression ## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 oncogene expression ## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 oncogene expression ## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 wild type phenotype ## strain spike-in addition block ## <character> <character> <integer> ## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 B6D2F1-J ERCC+SIRV 20160113 ## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 B6D2F1-J ERCC+SIRV 20160113 ## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 B6D2F1-J ERCC+SIRV 20160113 ## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 B6D2F1-J ERCC+SIRV 20160113 ## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 B6D2F1-J ERCC+SIRV 20160113 ## ... ... ... ... 
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 B6D2F1-J Premixed 20160325 ## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 B6D2F1-J Premixed 20160325 ## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 B6D2F1-J Premixed 20160325 ## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 B6D2F1-J Premixed 20160325 ## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 B6D2F1-J Premixed 20160325 ## sum detected percent_top_50 ## <integer> <integer> <numeric> ## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 865936 7618 26.7218 ## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 1076277 7521 29.4043 ## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 1180138 8306 27.3454 ## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 1342593 8143 35.8092 ## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 1668311 7154 34.1198 ## ... ... ... ... ## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 776622 8174 45.9362 ## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 1299950 8956 38.0829 ## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 1800696 9530 30.6675 ## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 46731 6649 32.2998 ## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 1866692 10964 26.6632 ## percent_top_100 percent_top_200 ## <numeric> <numeric> ## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 32.2773 39.7208 ## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 35.0354 42.2581 ## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 32.4770 39.3296 ## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 40.2666 46.2460 ## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 39.0901 45.6660 ## ... ... ... 
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 49.7010 54.6101 ## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 42.8930 49.0622 ## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 35.5839 41.8550 ## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 37.9149 44.5999 ## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 31.2584 37.5608 ## percent_top_500 altexps_ERCC_sum ## <numeric> <integer> ## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 52.9038 65278 ## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 55.7454 74748 ## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 51.9337 60878 ## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 57.1210 60073 ## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 58.2004 136810 ## ... ... ... ## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 64.4249 61575 ## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 60.6675 94982 ## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 53.6781 113707 ## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 56.5235 7580 ## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 48.9489 48664 ## altexps_ERCC_detected ## <integer> ## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 39 ## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 40 ## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 42 ## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 42 ## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 44 ## ... ... ## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 39 ## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 41 ## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 40 ## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 44 ## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 39 ## altexps_ERCC_percent altexps_SIRV_sum ## <numeric> <integer> ## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 6.80658 27828 ## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 6.28030 39173 ## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 4.78949 30058 ## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 4.18567 32542 ## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 7.28887 71850 ## ... ... ... 
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 7.17620 19848
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 6.65764 31729
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 5.81467 41116
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 13.48898 1883
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 2.51930 16289
## altexps_SIRV_detected
## <integer>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 7
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 7
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 7
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 7
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 7
## ... ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 7
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 7
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 7
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 7
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 7
## altexps_SIRV_percent total
## <numeric> <integer>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 2.90165 959042
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 3.29130 1190198
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 2.36477 1271074
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 2.26741 1435208
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 3.82798 1876971
## ... ... ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 2.313165 858045
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 2.224004 1426661
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 2.102562 1955519
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 3.350892 56194
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 0.843271 1931645
```

```r
# Inspect the object we just updated
sce
```

```
## class: SingleCellExperiment
## dim: 46604 192
## metadata(0):
## assays(1): counts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(1): Length
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(22): Source Name cell line ...
##   altexps_SIRV_percent total
## reducedDimNames(0):
## altExpNames(2): ERCC SIRV
```

```r
## How big is the R object?
pryr::object_size(sce)
```

```
## 41.4 MB
```

```r
## Add the log-normalized counts (logcounts) again
sce <- scater::logNormCounts(sce)

## How big is the R object?
pryr::object_size(sce)
```

```
## 113 MB
```

```r
# Example: subset to the cells with the "wild type" phenotype
# Remember that cells are the columns of the SCE
sce[, sce$phenotype == "wild type phenotype"]
```

```
## class: SingleCellExperiment
## dim: 46604 96
## metadata(0):
## assays(2): counts logcounts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(1): Length
## colnames(96): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S504.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(23): Source Name cell line ... total sizeFactor
## reducedDimNames(0):
## altExpNames(2): ERCC SIRV
```

```r
# Access the gene information in our SCE
# At this point it only holds the gene length (Length)
rowData(sce)
```

```
## DataFrame with 46604 rows and 1 column
## Length
## <integer>
## ENSMUSG00000102693 1070
## ENSMUSG00000064842 110
## ENSMUSG00000051951 6094
## ENSMUSG00000102851 480
## ENSMUSG00000103377 2819
## ... ...
## ENSMUSG00000094621 121
## ENSMUSG00000098647 99
## ENSMUSG00000096730 3077
## ENSMUSG00000095742 243
## CBFB-MYH11-mcherry 2998
```

```r
# Example of a function that adds new fields to the rowData
sce <- scater::addPerFeatureQC(sce)
# Access the gene information in our updated SCE
rowData(sce)
```

```
## DataFrame with 46604 rows and 3 columns
## Length mean detected
## <integer> <numeric> <numeric>
## ENSMUSG00000102693 1070 0.0000000 0.000000
## ENSMUSG00000064842 110 0.0000000 0.000000
## ENSMUSG00000051951 6094 0.0000000 0.000000
## ENSMUSG00000102851 480 0.0000000 0.000000
## ENSMUSG00000103377 2819 0.0104167 0.520833
## ... ... ... ...
## ENSMUSG00000094621 121 0.0 0
## ENSMUSG00000098647 99 0.0 0
## ENSMUSG00000096730 3077 0.0 0
## ENSMUSG00000095742 243 0.0 0
## CBFB-MYH11-mcherry 2998 50375.7 100
```

```r
## How big is the R object?
pryr::object_size(sce)
```

```
## 113 MB
```

```r
# Download the matching Ensembl annotation files
# using the resources available through AnnotationHub
library('AnnotationHub')
ah <- AnnotationHub()
```

```
## snapshotDate(): 2020-04-27
```

```r
query(ah, c("Mus musculus", "Ensembl", "v97"))
```

```
## AnnotationHub with 1 record
## # snapshotDate(): 2020-04-27
## # names(): AH73905
## # $dataprovider: Ensembl
## # $species: Mus musculus
## # $rdataclass: EnsDb
## # $rdatadateadded: 2019-05-02
## # $title: Ensembl 97 EnsDb for Mus musculus
## # $description: Gene and protein annotations for Mus musculus based on Ensem...
## # $taxonomyid: 10090
## # $genome: GRCm38
## # $sourcetype: ensembl
## # $sourceurl: http://www.ensembl.org
## # $sourcesize: NA
## # $tags: c("97", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
## #   "Protein", "Transcript")
## # retrieve record with 'object[["AH73905"]]'
```

```r
# Get the chromosome of each gene
ensdb <- ah[["AH73905"]]
```

```
## loading from cache
```

```r
chromosome <- mapIds(ensdb,
    keys = rownames(sce),
    keytype = "GENEID",
    column = "SEQNAME")
```

```
## Warning: Unable to map 563 of 46604 requested IDs.
```

```r
rowData(sce)$chromosome <- chromosome

# Access the gene information in our updated SCE
rowData(sce)
```

```
## DataFrame with 46604 rows and 4 columns
## Length mean detected chromosome
## <integer> <numeric> <numeric> <character>
## ENSMUSG00000102693 1070 0.0000000 0.000000 1
## ENSMUSG00000064842 110 0.0000000 0.000000 1
## ENSMUSG00000051951 6094 0.0000000 0.000000 1
## ENSMUSG00000102851 480 0.0000000 0.000000 1
## ENSMUSG00000103377 2819 0.0104167 0.520833 1
## ... ... ... ... ...
## ENSMUSG00000094621 121 0.0 0 GL456372.1
## ENSMUSG00000098647 99 0.0 0 GL456381.1
## ENSMUSG00000096730 3077 0.0 0 JH584292.1
## ENSMUSG00000095742 243 0.0 0 JH584295.1
## CBFB-MYH11-mcherry 2998 50375.7 100 NA
```

```r
## How big is the R object?
pryr::object_size(sce)
```

```
## 114 MB
```

```r
# Example: subset the data to the genes located on chromosome 3
# NOTE: which() is needed to handle the chromosome names that are NA
sce[which(rowData(sce)$chromosome == "3"), ]
```

```
## class: SingleCellExperiment
## dim: 2876 192
## metadata(0):
## assays(2): counts logcounts
## rownames(2876): ENSMUSG00000098982 ENSMUSG00000098307 ...
##   ENSMUSG00000105990 ENSMUSG00000075903
## rowData names(4): Length mean detected chromosome
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(23): Source Name cell line ... total sizeFactor
## reducedDimNames(0):
## altExpNames(2): ERCC SIRV
```

```r
# Access the information about our experiment using metadata()
# It is currently empty!
metadata(sce)
```

```
## list()
```

```r
# The metadata() slot is like Vegas: anything goes
metadata(sce) <- list(favourite_genes = c("Shh", "Nck1", "Diablo"),
    analyst = c("Pete"))

# Access the information about our experiment using metadata()
# on our updated object
metadata(sce)
```

```
## $favourite_genes
## [1] "Shh" "Nck1" "Diablo"
##
## $analyst
## [1] "Pete"
```

```r
# Example: add the principal components (PCs) of the logcounts
# NOTE: we will learn more about principal component analysis (PCA) later
sce <- scater::runPCA(sce)
# Inspect the object we just updated
sce
```

```
## class: SingleCellExperiment
## dim: 46604 192
## metadata(2): favourite_genes analyst
## assays(2): counts logcounts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(4): Length mean detected chromosome
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(23): Source Name cell line ...
##   total sizeFactor
## reducedDimNames(1): PCA
## altExpNames(2): ERCC SIRV
```

```r
# Access the PCA matrix stored in the reducedDims slot
reducedDim(sce, "PCA")[1:6, 1:3]
```

```
## PC1 PC2 PC3
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 18.717668 -27.598132 5.939654
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 2.480705 -27.564583 4.916567
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 42.034018 -7.552435 12.126964
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 -8.494303 31.833727 15.760853
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 -49.737390 4.226795 6.123169
## SLX-9555.N701_S507.C89V9ANXX.s_1.r_1 -44.528081 -3.215503 10.384939
```

```r
# Example: add a t-SNE representation of the logcounts
# NOTE: we will learn more about t-SNE later
sce <- scater::runTSNE(sce)
# Inspect the object we just updated
sce
```

```
## class: SingleCellExperiment
## dim: 46604 192
## metadata(2): favourite_genes analyst
## assays(2): counts logcounts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(4): Length mean detected chromosome
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(23): Source Name cell line ...
##   total sizeFactor
## reducedDimNames(2): PCA TSNE
## altExpNames(2): ERCC SIRV
```

```r
# Access the t-SNE matrix stored in the reducedDims slot
head(reducedDim(sce, "TSNE"))
```

```
## [,1] [,2]
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 3.6325953 -3.1664239
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 0.7872262 -1.9856767
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 7.9233587 1.4333125
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 2.7673456 4.0043258
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 -8.7769468 0.4735364
## SLX-9555.N701_S507.C89V9ANXX.s_1.r_1 -8.8302294 2.1605382
```

```r
# Example: add a 'manual' UMAP representation of the logcounts
# NOTE: we will learn more about UMAP later, including a simpler way
# to compute it
u <- uwot::umap(t(logcounts(sce)), n_components = 2)
# Add the UMAP matrix to the reducedDims slot
reducedDim(sce, "UMAP") <- u
# Access the UMAP matrix stored in the reducedDims slot
head(reducedDim(sce, "UMAP"))
```

```
## [,1] [,2]
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 -3.08790954 -1.790366
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 -1.83529409 -1.405215
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 -3.18092569 -1.682470
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 -0.56800970 -1.440291
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 0.02311221 -1.839899
## SLX-9555.N701_S507.C89V9ANXX.s_1.r_1 -0.17816242 -1.628937
```

```r
# List the dimensionality reduction results stored in our SCE object
reducedDims(sce)
```

```
## List of length 3
## names(3): PCA TSNE UMAP
```

```r
# Extract the ERCC information from the SCE of the 416b dataset
ercc.sce.416b <- altExp(sce.416b, "ERCC")
# Inspect the SCE holding the ERCC data
ercc.sce.416b
```

```
## class: SingleCellExperiment
## dim: 92 192
## metadata(0):
## assays(1): counts
## rownames(92): ERCC-00002 ERCC-00003 ... ERCC-00170 ERCC-00171
## rowData names(1): Length
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(0):
## reducedDimNames(0):
## altExpNames(0):
```

```r
# Add the ERCC SCE as an alternative experiment of our SCE
altExp(sce, "ERCC") <- ercc.sce.416b
# Inspect the object we just updated
sce
```

```
## class: SingleCellExperiment
## dim: 46604 192
## metadata(2): favourite_genes analyst
## assays(2): counts logcounts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(4): Length mean detected chromosome
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(23): Source Name cell line ... total sizeFactor
## reducedDimNames(3): PCA TSNE UMAP
## altExpNames(2): ERCC SIRV
```

```r
## How big is the R object?
pryr::object_size(sce)
```

```
## 114 MB
```

```r
# List the alternative experiments stored in our object
altExps(sce)
```

```
## List of length 2
## names(2): ERCC SIRV
```

```r
# Subsetting the SCE by sample (cell) automatically
# subsets the alternative experiments as well
sce.subset <- sce[, 1:10]
ncol(sce.subset)
```

```
## [1] 10
```

```r
ncol(altExp(sce.subset))
```

```
## [1] 10
```

```r
## How big is the R object?
pryr::object_size(sce.subset)
```

```
## 12.6 MB
```

```r
# Extract the size factors
# These were added to our object when we ran
# scater::logNormCounts(sce)
head(sizeFactors(sce))
```

```
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## 0.7427411 0.9231573
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 SLX-9555.N701_S505.C89V9ANXX.s_1.r_1
## 1.0122422 1.1515851
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 SLX-9555.N701_S507.C89V9ANXX.s_1.r_1
## 1.4309639 0.8713409
```

```r
# "Automatically" replace the size factors
sce <- scran::computeSumFactors(sce)
head(sizeFactors(sce))
```

```
## [1] 0.6961756 0.8834223 0.9704247 0.9804890 1.2446699 0.7922620
```

```r
# "Manually" replace the size factors
sizeFactors(sce) <- scater::librarySizeFactors(sce)
head(sizeFactors(sce))
```

```
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1 SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## 0.7427411 0.9231573
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1 SLX-9555.N701_S505.C89V9ANXX.s_1.r_1
## 1.0122422 1.1515851
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 SLX-9555.N701_S507.C89V9ANXX.s_1.r_1
## 1.4309639 0.8713409
```

]

---

# Exercises with the `sce` object

--

* Which function defines the class of the `sce` object?

--

* Which types of tables must the `sce` object always contain?

--

* Where are the `colnames(sce)` used?

--

* Similarly, where are the `rownames(sce)` used?

--

* How many principal components did we compute?

--

* Which 3 chromosomes have the highest mean expression?

???

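The last answer in these notes relies on the `tapply()` idiom to average a gene-level column per chromosome. A minimal base R sketch of that idiom, assuming a made-up data frame (`toy_row_data`, with hypothetical values) in place of `rowData(sce)`; it is meant only to illustrate the grouping and sorting steps, not to reproduce the 416b result:

```r
# Hypothetical gene-level table standing in for rowData(sce):
# one row per gene, with its mean count and its chromosome
toy_row_data <- data.frame(
    mean = c(10, 2, 4, 8, 1, 5),
    chromosome = c("1", "1", "2", "3", "X", NA),
    stringsAsFactors = FALSE
)

# Average the per-gene means within each chromosome;
# tapply() drops the genes whose chromosome is NA
chr_means <- with(toy_row_data, tapply(mean, chromosome, mean))

# Sort from highest to lowest and keep the names of the top 3
chr_sorted <- sort(chr_means, decreasing = TRUE)
top3 <- names(chr_sorted)[1:3]
top3
```

On the real object, the same pattern is applied to the `mean` and `chromosome` columns that `scater::addPerFeatureQC()` and `mapIds()` added to the `rowData()`.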
* `SingleCellExperiment::SingleCellExperiment`
* `colData()`, `assays()` and `rowData()`, with `reducedDims()` being optional
* `rownames(colData())` and `colnames(assays())`
* `rownames(rowData())` and `rownames(assays())`
* `ncol(reducedDim(sce, 'PCA'))`
* `sort(with(rowData(sce), tapply(mean, chromosome, mean)), decreasing = TRUE)`

---

# Exercise with the ERCC data

--

* Make a plot for each cell showing the expected vs. the observed ERCC counts.

--

* [ERCC file with the expected counts](https://tools.thermofisher.com/content/sfs/manuals/cms_095046.txt)

---

# ERCC solution

```r
## Read the ERCC data from the web
ercc_info <-
    read.delim(
        'https://tools.thermofisher.com/content/sfs/manuals/cms_095046.txt',
        as.is = TRUE,
        row.names = 2,
        check.names = FALSE
    )

## Put the ERCC data in the same order as the ERCC rows of the SCE
m <- match(rownames(altExp(sce, "ERCC")), rownames(ercc_info))
ercc_info <- ercc_info[m, ]

## Normalize the ERCC counts
altExp(sce, "ERCC") <- scater::logNormCounts(altExp(sce, "ERCC"))
```

---

.scroll-output[

```r
for (i in seq_len(2)) {
    plot(
        log2(10 * ercc_info[, "concentration in Mix 1 (attomoles/ul)"] + 1) ~
            log2(counts(altExp(sce, "ERCC"))[, i] +
                1),
        xlab = "log norm counts",
        ylab = "Mix 1: log2(10 * Concentration + 1)",
        main = colnames(altExp(sce, "ERCC"))[i],
        xlim = c(min(logcounts(
            altExp(sce, "ERCC")
        )), max(logcounts(
            altExp(sce, "ERCC")
        )))
    )
    abline(0, 1, lty = 2, col = 'red')
}
```

![](02-data-infrastructure-and-import_files/figure-html/ercc_solution_plots-1.png)<!-- -->![](02-data-infrastructure-and-import_files/figure-html/ercc_solution_plots-2.png)<!-- -->

]

---

# Importing data

.scroll-output[

```r
# Download example data processed with CellRanger
# Aside: by using BiocFileCache we only have to download
# the data once.
library('BiocFileCache')
bfc <- BiocFileCache()
pbmc.url <-
    paste0(
        "http://cf.10xgenomics.com/samples/cell-vdj/",
        "3.1.0/vdj_v1_hs_pbmc3/",
        "vdj_v1_hs_pbmc3_filtered_feature_bc_matrix.tar.gz"
    )
pbmc.data <- bfcrpath(bfc, pbmc.url)

# Extract the files into a temporary directory
untar(pbmc.data, exdir = tempdir())

# List the files we downloaded and extracted
# These are the typical CellRanger output files
pbmc.dir <- file.path(tempdir(), "filtered_feature_bc_matrix")
list.files(pbmc.dir)
```

```
## [1] "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"
```

```r
# Import the data as a SingleCellExperiment object
library('DropletUtils')
sce.pbmc <- read10xCounts(pbmc.dir)
# Inspect the object we just built
sce.pbmc
```

```
## class: SingleCellExperiment
## dim: 33555 7231
## metadata(1): Samples
## assays(1): counts
## rownames(33555): ENSG00000243485 ENSG00000237613 ... CD127 CD15
## rowData names(3): ID Symbol Type
## colnames: NULL
## colData names(2): Sample Barcode
## reducedDimNames(0):
## altExpNames(0):
```

```r
## How big is the R object?
pryr::object_size(sce.pbmc)
```

```
## 130 MB
```

```r
# Store the CITE-seq information as an alternative experiment
sce.pbmc <- splitAltExps(sce.pbmc, rowData(sce.pbmc)$Type)
# Inspect the object we just updated
sce.pbmc
```

```
## class: SingleCellExperiment
## dim: 33538 7231
## metadata(1): Samples
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##   ENSG00000268674
## rowData names(3): ID Symbol Type
## colnames: NULL
## colData names(2): Sample Barcode
## reducedDimNames(0):
## altExpNames(1): Antibody Capture
```

```r
## How big is the R object?
pryr::object_size(sce.pbmc)
```

```
## 131 MB
```

```r
# Download example data processed with scPipe
library('BiocFileCache')
bfc <- BiocFileCache()
sis_seq.url <-
    "https://github.com/LuyiTian/SIS-seq_script/archive/master.zip"
sis_seq.data <- bfcrpath(bfc, sis_seq.url)

# Extract the files into a temporary directory
unzip(sis_seq.data, exdir = tempdir())

# List (some of) the files we downloaded and extracted
# These are the typical scPipe output files
sis_seq.dir <- file.path(tempdir(),
    "SIS-seq_script-master",
    "data",
    "BcorKO_scRNAseq",
    "RPI10")
list.files(sis_seq.dir)
```

```
## [1] "gene_count.csv" "stat"
```

```r
# Import the data as a SingleCellExperiment object
library('scPipe')
sce.sis_seq <- create_sce_by_dir(sis_seq.dir)
```

```
## organism/gene_id_type not provided. Make a guess: mmusculus_gene_ensembl / ensembl_gene_id
```

```r
# Inspect the object we just built
sce.sis_seq
```

```
## class: SingleCellExperiment
## dim: 19232 383
## metadata(2): scPipe Biomart
## assays(1): counts
## rownames(19232): ENSMUSG00000079140 ENSMUSG00000081587 ...
##   ENSMUSG00000036880 ENSMUSG00000106872
## rowData names(0):
## colnames(383): A1 A10 ... P8 P9
## colData names(7): unaligned aligned_unmapped ... mapped_to_ERCC
##   mapped_to_MT
## reducedDimNames(0):
## altExpNames(0):
```

```r
## How big is the R object?
pryr::object_size(sce.sis_seq)
```

```
## 31.3 MB
```

```r
# Download an example made up of a bunch of files
library('BiocFileCache')
bfc <- BiocFileCache()
lun_counts.url <-
    paste0(
        "https://www.ebi.ac.uk/arrayexpress/files/",
        "E-MTAB-5522/E-MTAB-5522.processed.1.zip"
    )
lun_counts.data <- bfcrpath(bfc, lun_counts.url)
lun_coldata.url <-
    paste0("https://www.ebi.ac.uk/arrayexpress/files/",
        "E-MTAB-5522/E-MTAB-5522.sdrf.txt")
lun_coldata.data <- bfcrpath(bfc, lun_coldata.url)

# Extract the files into a temporary directory
lun_counts.dir <- tempfile("lun_counts.")
unzip(lun_counts.data, exdir = lun_counts.dir)

# List the files we downloaded and extracted
list.files(lun_counts.dir)
```

```
## [1] "counts_Calero_20160113.tsv" "counts_Calero_20160325.tsv"
## [3] "counts_Liora_20160906.tsv"  "counts_Liora_20170201.tsv"
```

```r
# Read the count matrix (for one plate)
lun.counts <- read.delim(
    file.path(lun_counts.dir, "counts_Calero_20160113.tsv"),
    header = TRUE,
    row.names = 1,
    check.names = FALSE
)
# Store the gene length information for later
gene.lengths <- lun.counts$Length
# Convert the gene count data to a matrix (dropping the Length column)
lun.counts <- as.matrix(lun.counts[, -1])

# Read the sample (cell) information
lun.coldata <- read.delim(lun_coldata.data,
    check.names = FALSE,
    stringsAsFactors = FALSE)
library('S4Vectors')
lun.coldata <- as(lun.coldata, "DataFrame")

# Reorder the sample information so that it
# matches the column order of the count matrix
m <- match(colnames(lun.counts), lun.coldata$`Source Name`)
lun.coldata <- lun.coldata[m,]

# Build the table of gene information
lun.rowdata <- DataFrame(Length = gene.lengths)

# Build the SingleCellExperiment object
# The assay is named "counts" so that counts(lun.sce) works
lun.sce <- SingleCellExperiment(
    assays = list(counts = lun.counts),
    colData = lun.coldata,
    rowData = lun.rowdata
)
# Inspect the object we just built
lun.sce
```

```
## class: SingleCellExperiment
## dim: 46703 96
## metadata(0):
## assays(1): counts
## rownames(46703): ENSMUSG00000102693 ENSMUSG00000064842 ... SIRV7
##   CBFB-MYH11-mcherry
## rowData names(1): Length
## colnames(96): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-9555.N712_S508.C89V9ANXX.s_1.r_1
##   SLX-9555.N712_S517.C89V9ANXX.s_1.r_1
## colData names(50): Source Name Comment[ENA_SAMPLE] ... Factor
##   Value[phenotype] Factor Value[block]
## reducedDimNames(0):
## altExpNames(0):
```

```r
## How big is the R object?
pryr::object_size(lun.sce)
```

```
## 22.5 MB
```

]

---
class: middle

.center[

# Thank you!

The slides were made with the R package [**xaringan**](https://github.com/yihui/xaringan) and configured with [**xaringanthemer**](https://github.com/gadenbuie/xaringanthemer).

This course is based on the book [**Orchestrating Single Cell Analysis with Bioconductor**](https://osca.bioconductor.org/) by [Aaron Lun](https://www.linkedin.com/in/aaron-lun-869b5894/), [Robert Amezquita](https://robertamezquita.github.io/), [Stephanie Hicks](https://www.stephaniehicks.com/) and [Raphael Gottardo](http://rglab.org), as well as the [**WEHI scRNA-seq course**](https://drive.google.com/drive/folders/1cn5d-Ey7-kkMiex8-74qxvxtCQT6o72h) created by [Peter Hickey](https://www.peterhickey.org/).

You can find the files for this workshop at [comunidadbioinfo/cdsb2020](https://github.com/comunidadbioinfo/cdsb2020).

Instructor: [**Leonardo Collado-Torres**](http://lcolladotor.github.io/).

<a href="https://www.libd.org"><img src="img/LIBD_logo.jpg" style="width: 20%" /></a>

]

.footnote[Download the materials with `usethis::use_course('comunidadbioinfo/cdsb2020')` or browse them online via [**comunidadbioinfo.github.io/cdsb2020**](http://comunidadbioinfo.github.io/cdsb2020).]

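---

# Sketch: aligning metadata with `match()`

When a `SingleCellExperiment` is assembled from separate files, the sample table has to follow the column order of the count matrix. Here is a minimal base R sketch of that reordering step on made-up data. The objects `toy.counts` and `toy.coldata` and their values are hypothetical and not part of the workshop data; the `anyNA()` check is an extra safeguard added for illustration.

.scroll-output[

```r
# Made-up count matrix: 2 genes x 3 cells
toy.counts <- matrix(
    c(5, 0, 3, 2, 8, 1),
    nrow = 2,
    dimnames = list(c("geneA", "geneB"), c("cell3", "cell1", "cell2"))
)

# Made-up sample table, listed in a different order than the matrix columns
toy.coldata <- data.frame(
    name = c("cell1", "cell2", "cell3"),
    plate = c("p1", "p1", "p2"),
    stringsAsFactors = FALSE
)

# For each column of the matrix, find the matching row of the sample table
m <- match(colnames(toy.counts), toy.coldata$name)

# match() returns NA for cells without metadata, so stop early if that happens
stopifnot(!anyNA(m))

# Reorder the sample table to follow the matrix columns
toy.coldata <- toy.coldata[m, ]

# Check that both now agree element by element
identical(toy.coldata$name, colnames(toy.counts))
```

]

Checking for `NA` before subsetting matters because indexing a table with `NA` silently produces rows full of missing values instead of an error.
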
---

# R session details

.scroll-output[
.tiny[

```r
options(width = 120)
sessioninfo::session_info()
```

```
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.0.2 (2020-06-22)
##  os       macOS Catalina 10.15.5
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2020-08-05
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package                * version  date       lib source
##  AnnotationDbi          * 1.50.3   2020-07-25 [1] Bioconductor
##  AnnotationFilter       * 1.12.0   2020-04-27 [1] Bioconductor
##  AnnotationHub          * 2.20.0   2020-04-27 [1] Bioconductor
##  askpass                  1.1      2019-01-13 [1] CRAN (R 4.0.0)
##  assertthat               0.2.1    2019-03-21 [1] CRAN (R 4.0.0)
##  beeswarm                 0.2.3    2016-04-25 [1] CRAN (R 4.0.0)
##  Biobase                * 2.48.0   2020-04-27 [1] Bioconductor
##  BiocFileCache          * 1.12.0   2020-04-27 [1] Bioconductor
##  BiocGenerics           * 0.34.0   2020-04-27 [1] Bioconductor
##  BiocManager              1.30.10  2019-11-16 [1] CRAN (R 4.0.0)
##  BiocNeighbors            1.6.0    2020-04-27 [1] Bioconductor
##  BiocParallel             1.22.0   2020-04-27 [1] Bioconductor
##  BiocSingular             1.4.0    2020-04-27 [1] Bioconductor
##  BiocVersion              3.11.1   2020-04-07 [1] Bioconductor
##  biomaRt                  2.44.1   2020-06-17 [1] Bioconductor
##  Biostrings               2.56.0   2020-04-27 [1] Bioconductor
##  bit                      4.0.3    2020-07-30 [1] CRAN (R 4.0.2)
##  bit64                    4.0.2    2020-07-30 [1] CRAN (R 4.0.2)
##  bitops                   1.0-6    2013-08-17 [1] CRAN (R 4.0.0)
##  blob                     1.2.1    2020-01-20 [1] CRAN (R 4.0.0)
##  cli                      2.0.2    2020-02-28 [1] CRAN (R 4.0.0)
##  codetools                0.2-16   2018-12-24 [1] CRAN (R 4.0.2)
##  colorout               * 1.2-2    2020-03-16 [1] Github (jalvesaq/colorout@726d681)
##  colorspace               1.4-1    2019-03-18 [1] CRAN (R 4.0.0)
##  crayon                   1.3.4    2017-09-16 [1] CRAN (R 4.0.0)
##  curl                     4.3      2019-12-02 [1] CRAN (R 4.0.0)
##  DBI                      1.1.0    2019-12-15 [1] CRAN (R 4.0.0)
##  dbplyr                 * 1.4.4    2020-05-27 [1] CRAN (R 4.0.2)
##  DelayedArray           * 0.14.1   2020-07-14 [1] Bioconductor
##  DelayedMatrixStats       1.10.1   2020-07-03 [1] Bioconductor
##  DEoptimR                 1.0-8    2016-11-19 [1] CRAN (R 4.0.0)
##  digest                   0.6.25   2020-02-23 [1] CRAN (R 4.0.0)
##  dplyr                    1.0.1    2020-07-31 [1] CRAN (R 4.0.2)
##  dqrng                    0.2.1    2019-05-17 [1] CRAN (R 4.0.0)
##  DropletUtils           * 1.8.0    2020-04-27 [1] Bioconductor
##  edgeR                    3.30.3   2020-06-02 [1] Bioconductor
##  ellipsis                 0.3.1    2020-05-15 [1] CRAN (R 4.0.0)
##  ensembldb              * 2.12.1   2020-05-06 [1] Bioconductor
##  evaluate                 0.14     2019-05-28 [1] CRAN (R 4.0.0)
##  ExperimentHub            1.14.0   2020-04-27 [1] Bioconductor
##  fansi                    0.4.1    2020-01-08 [1] CRAN (R 4.0.0)
##  fastmap                  1.0.1    2019-10-08 [1] CRAN (R 4.0.0)
##  FNN                      1.1.3    2019-02-15 [1] CRAN (R 4.0.0)
##  generics                 0.0.2    2018-11-29 [1] CRAN (R 4.0.0)
##  GenomeInfoDb           * 1.24.2   2020-06-15 [1] Bioconductor
##  GenomeInfoDbData         1.2.3    2020-04-16 [1] Bioconductor
##  GenomicAlignments        1.24.0   2020-04-27 [1] Bioconductor
##  GenomicFeatures        * 1.40.1   2020-07-14 [1] Bioconductor
##  GenomicRanges          * 1.40.0   2020-04-27 [1] Bioconductor
##  GGally                   2.0.0    2020-06-06 [1] CRAN (R 4.0.2)
##  ggbeeswarm               0.6.0    2017-08-07 [1] CRAN (R 4.0.0)
##  ggplot2                * 3.3.2    2020-06-19 [1] CRAN (R 4.0.2)
##  glue                     1.4.1    2020-05-13 [1] CRAN (R 4.0.0)
##  gridExtra                2.3      2017-09-09 [1] CRAN (R 4.0.0)
##  gtable                   0.3.0    2019-03-25 [1] CRAN (R 4.0.0)
##  HDF5Array                1.16.1   2020-06-16 [1] Bioconductor
##  hms                      0.5.3    2020-01-08 [1] CRAN (R 4.0.0)
##  htmltools                0.5.0    2020-06-16 [1] CRAN (R 4.0.2)
##  httpuv                   1.5.4    2020-06-06 [1] CRAN (R 4.0.2)
##  httr                     1.4.2    2020-07-20 [1] CRAN (R 4.0.2)
##  igraph                   1.2.5    2020-03-19 [1] CRAN (R 4.0.0)
##  interactiveDisplayBase   1.26.3   2020-06-02 [1] Bioconductor
##  IRanges                * 2.22.2   2020-05-21 [1] Bioconductor
##  irlba                    2.3.3    2019-02-05 [1] CRAN (R 4.0.0)
##  knitr                    1.29     2020-06-23 [1] CRAN (R 4.0.0)
##  later                    1.1.0.1  2020-06-05 [1] CRAN (R 4.0.2)
##  lattice                  0.20-41  2020-04-02 [1] CRAN (R 4.0.2)
##  lazyeval                 0.2.2    2019-03-15 [1] CRAN (R 4.0.0)
##  lifecycle                0.2.0    2020-03-06 [1] CRAN (R 4.0.0)
##  limma                    3.44.3   2020-06-12 [1] Bioconductor
##  locfit                   1.5-9.4  2020-03-25 [1] CRAN (R 4.0.0)
##  magrittr                 1.5      2014-11-22 [1] CRAN (R 4.0.0)
##  Matrix                   1.2-18   2019-11-27 [1] CRAN (R 4.0.2)
##  matrixStats            * 0.56.0   2020-03-13 [1] CRAN (R 4.0.0)
##  mclust                   5.4.6    2020-04-11 [1] CRAN (R 4.0.0)
##  memoise                  1.1.0    2017-04-21 [1] CRAN (R 4.0.0)
##  mime                     0.9      2020-02-04 [1] CRAN (R 4.0.0)
##  munsell                  0.5.0    2018-06-12 [1] CRAN (R 4.0.0)
##  openssl                  1.4.2    2020-06-27 [1] CRAN (R 4.0.1)
##  org.Hs.eg.db             3.11.4   2020-07-27 [1] Bioconductor
##  org.Mm.eg.db             3.11.4   2020-08-04 [1] Bioconductor
##  pillar                   1.4.6    2020-07-10 [1] CRAN (R 4.0.2)
##  pkgconfig                2.0.3    2019-09-22 [1] CRAN (R 4.0.0)
##  plyr                     1.8.6    2020-03-03 [1] CRAN (R 4.0.0)
##  prettyunits              1.1.1    2020-01-24 [1] CRAN (R 4.0.0)
##  progress                 1.2.2    2019-05-16 [1] CRAN (R 4.0.0)
##  promises                 1.1.1    2020-06-09 [1] CRAN (R 4.0.2)
##  ProtGenerics             1.20.0   2020-04-27 [1] Bioconductor
##  pryr                     0.1.4    2018-02-18 [1] CRAN (R 4.0.2)
##  purrr                    0.3.4    2020-04-17 [1] CRAN (R 4.0.0)
##  R.methodsS3              1.8.0    2020-02-14 [1] CRAN (R 4.0.0)
##  R.oo                     1.23.0   2019-11-03 [1] CRAN (R 4.0.0)
##  R.utils                  2.9.2    2019-12-08 [1] CRAN (R 4.0.0)
##  R6                       2.4.1    2019-11-12 [1] CRAN (R 4.0.0)
##  rappdirs                 0.3.1    2016-03-28 [1] CRAN (R 4.0.0)
##  RColorBrewer             1.1-2    2014-12-07 [1] CRAN (R 4.0.2)
##  Rcpp                     1.0.5    2020-07-06 [1] CRAN (R 4.0.2)
##  RCurl                    1.98-1.2 2020-04-18 [1] CRAN (R 4.0.0)
##  reshape                  0.8.8    2018-10-23 [1] CRAN (R 4.0.0)
##  rhdf5                    2.32.2   2020-07-03 [1] Bioconductor
##  Rhdf5lib                 1.10.1   2020-07-09 [1] Bioconductor
##  Rhtslib                  1.20.0   2020-04-27 [1] Bioconductor
##  rlang                    0.4.7    2020-07-09 [1] CRAN (R 4.0.2)
##  rmarkdown                2.3      2020-06-18 [1] CRAN (R 4.0.0)
##  robustbase               0.93-6   2020-03-23 [1] CRAN (R 4.0.0)
##  Rsamtools                2.4.0    2020-04-27 [1] Bioconductor
##  RSpectra                 0.16-0   2019-12-01 [1] CRAN (R 4.0.0)
##  RSQLite                  2.2.0    2020-01-07 [1] CRAN (R 4.0.0)
##  rstudioapi               0.11     2020-02-07 [1] CRAN (R 4.0.0)
##  rsvd                     1.0.3    2020-02-17 [1] CRAN (R 4.0.0)
##  rtracklayer              1.48.0   2020-04-27 [1] Bioconductor
##  Rtsne                    0.15     2018-11-10 [1] CRAN (R 4.0.2)
##  S4Vectors              * 0.26.1   2020-05-16 [1] Bioconductor
##  scales                   1.1.1    2020-05-11 [1] CRAN (R 4.0.0)
##  scater                   1.16.2   2020-06-26 [1] Bioconductor
##  scPipe                 * 1.10.0   2020-04-27 [1] Bioconductor
##  scran                    1.16.0   2020-04-27 [1] Bioconductor
##  scRNAseq               * 2.2.0    2020-05-07 [1] Bioconductor
##  sessioninfo              1.1.1    2018-11-05 [1] CRAN (R 4.0.2)
##  shiny                    1.5.0    2020-06-23 [1] CRAN (R 4.0.2)
##  showtext                 0.8-1    2020-05-25 [1] CRAN (R 4.0.2)
##  showtextdb               3.0      2020-06-04 [1] CRAN (R 4.0.2)
##  SingleCellExperiment   * 1.10.1   2020-04-28 [1] Bioconductor
##  statmod                  1.4.34   2020-02-17 [1] CRAN (R 4.0.0)
##  stringi                  1.4.6    2020-02-17 [1] CRAN (R 4.0.0)
##  stringr                  1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
##  SummarizedExperiment   * 1.18.2   2020-07-14 [1] Bioconductor
##  sysfonts                 0.8.1    2020-05-08 [1] CRAN (R 4.0.0)
##  tibble                   3.0.3    2020-07-10 [1] CRAN (R 4.0.2)
##  tidyselect               1.1.0    2020-05-11 [1] CRAN (R 4.0.2)
##  uwot                     0.1.8    2020-03-16 [1] CRAN (R 4.0.2)
##  vctrs                    0.3.2    2020-07-15 [1] CRAN (R 4.0.2)
##  vipor                    0.4.5    2017-03-22 [1] CRAN (R 4.0.0)
##  viridis                  0.5.1    2018-03-29 [1] CRAN (R 4.0.0)
##  viridisLite              0.3.0    2018-02-01 [1] CRAN (R 4.0.0)
##  whisker                  0.4      2019-08-28 [1] CRAN (R 4.0.0)
##  withr                    2.2.0    2020-04-20 [1] CRAN (R 4.0.0)
##  xaringan                 0.16     2020-03-31 [1] CRAN (R 4.0.0)
##  xaringanthemer         * 0.3.0    2020-05-04 [1] CRAN (R 4.0.0)
##  xfun                     0.16     2020-07-24 [1] CRAN (R 4.0.2)
##  XML                      3.99-0.5 2020-07-23 [1] CRAN (R 4.0.2)
##  xtable                   1.8-4    2019-04-21 [1] CRAN (R 4.0.0)
##  XVector                  0.28.0   2020-04-27 [1] Bioconductor
##  yaml                     2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
##  zlibbioc                 1.34.0   2020-04-27 [1] Bioconductor
##
## [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
## [2] /Library/Frameworks/R.framework/Versions/4.0branch/Resources/library
```

]]
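
---

# Sketch: library size factors by hand

The slides replace the size factors "manually" with `scater::librarySizeFactors()`. As a rough illustration of the idea behind that call, here is a base R sketch that takes the total counts per cell and scales them so that they average 1. The matrix `toy.counts` is made up for this example, and the scaling is written out under the assumption that it matches the default behavior of `librarySizeFactors()`; it is not the package's own code.

.scroll-output[

```r
# Made-up count matrix: 3 genes x 4 cells (hypothetical values)
toy.counts <- matrix(
    c(10, 0, 5,
      20, 2, 8,
      5, 1, 4,
      30, 3, 12),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Library size: total counts observed in each cell
lib.sizes <- colSums(toy.counts)

# Scale the library sizes so that the size factors average 1 across cells
toy.sf <- lib.sizes / mean(lib.sizes)
toy.sf

# Cells with more total counts get a size factor above 1
mean(toy.sf)
```

]

Centering the factors at 1 keeps normalized values on roughly the same scale as the original counts.
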