class: center, middle, inverse, title-slide #
Normalización de datos
##
Bioconductor
para datos transcriptómicos de célula única (
scRNA-seq
) –
CDSB2020
###
Leonardo Collado-Torres
### 2020-08-06 --- class: inverse .center[ <a href="https://osca.bioconductor.org/"><img src="https://raw.githubusercontent.com/Bioconductor/OrchestratingSingleCellAnalysis-release/master/images/cover.png" style="width: 30%"/></a> <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. <a href='https://clustrmaps.com/site/1b5pl' title='Visit tracker'><img src='//clustrmaps.com/map_v2.png?cl=ffffff&w=150&t=n&d=rP3KLyAMuzVNcJFL-_C-B0XnLNVy8Sp6a8HDaKEnSzc'/></a> ] .footnote[Descarga los materiales con `usethis::use_course('comunidadbioinfo/cdsb2020')` o revisalos en línea vía [**comunidadbioinfo.github.io/cdsb2020**](http://comunidadbioinfo.github.io/cdsb2020).] <style type="text/css"> /* From https://github.com/yihui/xaringan/issues/147 */ .scroll-output { height: 80%; overflow-y: scroll; } /* https://stackoverflow.com/questions/50919104/horizontally-scrollable-output-on-xaringan-slides */ pre { max-width: 100%; overflow-x: scroll; } /* From https://github.com/yihui/xaringan/wiki/Font-Size */ .tiny{ font-size: 40% } /* From https://github.com/yihui/xaringan/wiki/Title-slide */ .title-slide { background-image: url(https://raw.githubusercontent.com/Bioconductor/OrchestratingSingleCellAnalysis/master/images/Workflow.png); background-size: 33%; background-position: 0% 100% } </style> --- # Diapositivas de Peter Hickey Ve las diapositivas [aquí](https://docs.google.com/presentation/d/1_tCNLiEsQ_TgsqHHf9_1lzXSaM_LunEHxBq3k130dQI/edit#slide=id.g7cc450648d_0_118) --- # Código de R .scroll-output[ ```r library('scRNAseq') ``` ``` ## Loading required package: SingleCellExperiment ``` ``` ## Loading required package: SummarizedExperiment ``` ``` ## Loading required package: GenomicRanges ``` ``` ## Loading required package: stats4 ``` ``` ## Loading required package: BiocGenerics ``` ``` ## Loading required package: parallel ``` ``` ## ## Attaching package: 'BiocGenerics' ``` ``` ## The following objects are masked from 'package:parallel': ## ## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, ## clusterExport, clusterMap, parApply, parCapply, parLapply, ## parLapplyLB, parRapply, parSapply, parSapplyLB ``` ``` ## The following objects are masked from 'package:stats': ## ## IQR, mad, sd, var, xtabs ``` ``` ## The following objects are masked from 'package:base': ## ## anyDuplicated, append, as.data.frame, basename, cbind, colnames, ## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, ## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, ## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, ## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, ## union, unique, unsplit, which, which.max, which.min ``` ``` ## Loading required package: S4Vectors ``` ``` ## ## Attaching package: 'S4Vectors' ``` ``` ## The following object is masked from 'package:base': ## ## expand.grid ``` ``` ## Loading required package: IRanges ``` ``` ## Loading required package: GenomeInfoDb ``` ``` ## Loading required package: Biobase ``` ``` ## Welcome to Bioconductor ## ## Vignettes contain introductory material; view with ## 'browseVignettes()'. To cite Bioconductor, see ## 'citation("Biobase")', and for packages 'citation("pkgname")'. ``` ``` ## Loading required package: DelayedArray ``` ``` ## Loading required package: matrixStats ``` ``` ## ## Attaching package: 'matrixStats' ``` ``` ## The following objects are masked from 'package:Biobase': ## ## anyMissing, rowMedians ``` ``` ## ## Attaching package: 'DelayedArray' ``` ``` ## The following objects are masked from 'package:matrixStats': ## ## colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges ``` ``` ## The following objects are masked from 'package:base': ## ## aperm, apply, rowsum ``` ```r sce.zeisel <- ZeiselBrainData(ensembl = TRUE) ``` ``` ## snapshotDate(): 2020-04-27 ``` ``` ## see ?scRNAseq and browseVignettes('scRNAseq') for documentation ``` ``` ## loading from cache ``` ``` ## see ?scRNAseq and browseVignettes('scRNAseq') for documentation ``` ``` ## loading from cache ``` ``` ## see ?scRNAseq and browseVignettes('scRNAseq') for documentation ``` ``` ## loading from cache ``` ``` ## snapshotDate(): 2020-04-27 ``` ``` ## see ?scRNAseq and browseVignettes('scRNAseq') for documentation ``` ``` ## downloading 1 resources ``` ``` ## retrieving 1 resource ``` ``` ## loading from cache ``` ``` ## snapshotDate(): 2020-04-27 ``` ``` ## loading from cache ``` ``` ## require("ensembldb") ``` ``` ## Warning: Unable to map 1565 of 20006 requested IDs. ``` ```r # Control de calidad library('scater') ``` ``` ## Loading required package: ggplot2 ``` ```r is.mito <- which(rowData(sce.zeisel)$featureType == "mito") stats <- perCellQCMetrics(sce.zeisel, subsets = list(Mt = is.mito)) qc <- quickPerCellQC(stats, percent_subsets = c("altexps_ERCC_percent", "subsets_Mt_percent")) sce.zeisel <- sce.zeisel[, !qc$discard] ``` ] --- .scroll-output[ ```r # Estimar tamaños de librerías lib.sf.zeisel <- librarySizeFactors(sce.zeisel) # Examina la distribución de los tamaños de librerías # que acabamos de estimar summary(lib.sf.zeisel) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.1754 0.5682 0.8669 1.0000 1.2758 4.0651 ``` ```r hist(log10(lib.sf.zeisel), xlab = "Log10[Size factor]", col = "grey80") ``` ![](04-normalization_files/figure-html/all_code2-1.png)<!-- --> ```r ls.zeisel <- colSums(counts(sce.zeisel)) plot( ls.zeisel, lib.sf.zeisel, log = "xy", xlab = "Library size", ylab = "Size factor" ) ``` ![](04-normalization_files/figure-html/all_code2-2.png)<!-- --> ] --- # Ejercicio -- * ¿Son idénticos `ls.zeisel` y `lib.sf.zeisel`? -- * ¿Son proporcionales? -- * Calcula `lib.sf.zeisel` de forma manual --- # Soluciones -- * Revisa los detalles (**Details**) en `?scater::librarySizeFactors` -- * Calcula los tamaños de librería manualmente ```r ## Primero calcula las sumas zeisel_sums <- colSums(counts(sce.zeisel)) identical(zeisel_sums, ls.zeisel) ``` ``` ## [1] TRUE ``` ```r ## Ahora asegurate que su media sea 1 (unity mean) zeisel_size_factors <- zeisel_sums/mean(zeisel_sums) identical(zeisel_size_factors, lib.sf.zeisel) ``` ``` ## [1] TRUE ``` -- * Checa el [código fuente](https://github.com/LTLA/scuttle/blob/master/R/librarySizeFactors.R) --- .scroll-output[ ```r # Normalización por decircunvolución (deconvolution) library('scran') # Pre-clustering set.seed(100) clust.zeisel <- quickCluster(sce.zeisel) # Calcula factores de tamaño para la decircunvolución (deconvolution) deconv.sf.zeisel <- calculateSumFactors(sce.zeisel, clusters = clust.zeisel, min.mean = 0.1) # Examina la distribución de los factores de tamaño summary(deconv.sf.zeisel) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.1282 0.4859 0.8248 1.0000 1.3194 4.6521 ``` ```r hist(log10(deconv.sf.zeisel), xlab = "Log10[Factor de tamaño]", col = "grey80") ``` ![](04-normalization_files/figure-html/all_code3-1.png)<!-- --> ```r plot( ls.zeisel, deconv.sf.zeisel, log = "xy", xlab = "Factores tamaño de librería", ylab = "Factores tamaño deconv" ) ``` ![](04-normalization_files/figure-html/all_code3-2.png)<!-- --> ] --- # Ejercicios -- * ¿Cúantos clusters rápidos obtuvimos? -- * ¿Cúantas células por cluster obtuvimos? -- * ¿Cúantos clusters rápidos obtendríamos si cambiamos el tamaño mínimo a 200? Usa 100 como la semilla (seed). -- * ¿Cúantas líneas ves en la gráfica? ??? * 12 * Desde 113 to 325, `sort(table(clust.zeisel))` * 10 `set.seed(100); sort(table(quickCluster(sce.zeisel, min.size = 200)))` * Veo varias líneas cerca de la diagonal. Muy probablemente son 7 `table(factor(sce.zeisel$level1class))` --- .scroll-output[ ```r # Factores de tamaño de librería vs # factor de tamaño vía circunvolución (convolution) # Usa colores para los típos celulares que nos dieron plot( lib.sf.zeisel, deconv.sf.zeisel, xlab = "Factores tamaño de librería", ylab = "Factores tamaño deconv", log = 'xy', pch = 16, col = as.integer(factor(sce.zeisel$level1class)) ) abline(a = 0, b = 1, col = "red") ``` ![](04-normalization_files/figure-html/all_code4-1.png)<!-- --> ] --- class: middle .center[ # ¡Gracias! Las diapositivias fueron hechas con el paquete de R [**xaringan**](https://github.com/yihui/xaringan) y configuradas con [**xaringanthemer**](https://github.com/gadenbuie/xaringanthemer). Este curso está basado en el libro [**Orchestrating Single Cell Analysis with Bioconductor**](https://osca.bioconductor.org/) de [Aaron Lun](https://www.linkedin.com/in/aaron-lun-869b5894/), [Robert Amezquita](https://robertamezquita.github.io/), [Stephanie Hicks](https://www.stephaniehicks.com/) y [Raphael Gottardo](http://rglab.org), además del [**curso de scRNA-seq para WEHI**](https://drive.google.com/drive/folders/1cn5d-Ey7-kkMiex8-74qxvxtCQT6o72h) creado por [Peter Hickey](https://www.peterhickey.org/). Puedes encontrar los archivos para este taller en [comunidadbioinfo/cdsb2020](https://github.com/comunidadbioinfo/cdsb2020). Instructor: [**Leonardo Collado-Torres**](http://lcolladotor.github.io/). <a href="https://www.libd.org"><img src="img/LIBD_logo.jpg" style="width: 20%" /></a> ] .footnote[Descarga los materiales con `usethis::use_course('comunidadbioinfo/cdsb2020')` o revisalos en línea vía [**comunidadbioinfo.github.io/cdsb2020**](http://comunidadbioinfo.github.io/cdsb2020).] --- # Detalles de la sesión de R .scroll-output[ .tiny[ ```r options(width = 120) sessioninfo::session_info() ``` ``` ## ─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os macOS Catalina 10.15.5 ## system x86_64, darwin17.0 ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz America/New_York ## date 2020-08-05 ## ## ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────── ## package * version date lib source ## AnnotationDbi * 1.50.3 2020-07-25 [1] Bioconductor ## AnnotationFilter * 1.12.0 2020-04-27 [1] Bioconductor ## AnnotationHub 2.20.0 2020-04-27 [1] Bioconductor ## askpass 1.1 2019-01-13 [1] CRAN (R 4.0.0) ## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0) ## beeswarm 0.2.3 2016-04-25 [1] CRAN (R 4.0.0) ## Biobase * 2.48.0 2020-04-27 [1] Bioconductor ## BiocFileCache 1.12.0 2020-04-27 [1] Bioconductor ## BiocGenerics * 0.34.0 2020-04-27 [1] Bioconductor ## BiocManager 1.30.10 2019-11-16 [1] CRAN (R 4.0.0) ## BiocNeighbors 1.6.0 2020-04-27 [1] Bioconductor ## BiocParallel 1.22.0 2020-04-27 [1] Bioconductor ## BiocSingular 1.4.0 2020-04-27 [1] Bioconductor ## BiocVersion 3.11.1 2020-04-07 [1] Bioconductor ## biomaRt 2.44.1 2020-06-17 [1] Bioconductor ## Biostrings 2.56.0 2020-04-27 [1] Bioconductor ## bit 4.0.3 2020-07-30 [1] CRAN (R 4.0.2) ## bit64 4.0.2 2020-07-30 [1] CRAN (R 4.0.2) ## bitops 1.0-6 2013-08-17 [1] CRAN (R 4.0.0) ## blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.0) ## cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0) ## codetools 0.2-16 2018-12-24 [1] CRAN (R 4.0.2) ## colorout * 1.2-2 2020-03-16 [1] Github (jalvesaq/colorout@726d681) ## colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.0) ## crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0) ## curl 4.3 2019-12-02 [1] CRAN (R 4.0.0) ## DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.0) ## dbplyr 1.4.4 2020-05-27 [1] CRAN (R 4.0.2) ## DelayedArray * 0.14.1 2020-07-14 [1] Bioconductor ## DelayedMatrixStats 1.10.1 2020-07-03 [1] Bioconductor ## digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0) ## dplyr 1.0.1 2020-07-31 [1] CRAN (R 4.0.2) ## dqrng 0.2.1 2019-05-17 [1] CRAN (R 4.0.0) ## edgeR 3.30.3 2020-06-02 [1] Bioconductor ## ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0) ## ensembldb * 2.12.1 2020-05-06 [1] Bioconductor ## evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0) ## ExperimentHub 1.14.0 2020-04-27 [1] Bioconductor ## fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0) ## fastmap 1.0.1 2019-10-08 [1] CRAN (R 4.0.0) ## generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0) ## GenomeInfoDb * 1.24.2 2020-06-15 [1] Bioconductor ## GenomeInfoDbData 1.2.3 2020-04-16 [1] Bioconductor ## GenomicAlignments 1.24.0 2020-04-27 [1] Bioconductor ## GenomicFeatures * 1.40.1 2020-07-14 [1] Bioconductor ## GenomicRanges * 1.40.0 2020-04-27 [1] Bioconductor ## ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.0.0) ## ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.2) ## glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.0) ## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0) ## hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.0) ## htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2) ## httpuv 1.5.4 2020-06-06 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2) ## igraph 1.2.5 2020-03-19 [1] CRAN (R 4.0.0) ## interactiveDisplayBase 1.26.3 2020-06-02 [1] Bioconductor ## IRanges * 2.22.2 2020-05-21 [1] Bioconductor ## irlba 2.3.3 2019-02-05 [1] CRAN (R 4.0.0) ## knitr 1.29 2020-06-23 [1] CRAN (R 4.0.0) ## later 1.1.0.1 2020-06-05 [1] CRAN (R 4.0.2) ## lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.2) ## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.0) ## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0) ## limma 3.44.3 2020-06-12 [1] Bioconductor ## locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.0.0) ## magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0) ## Matrix 1.2-18 2019-11-27 [1] CRAN (R 4.0.2) ## matrixStats * 0.56.0 2020-03-13 [1] CRAN (R 4.0.0) ## memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0) ## mime 0.9 2020-02-04 [1] CRAN (R 4.0.0) ## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0) ## openssl 1.4.2 2020-06-27 [1] CRAN (R 4.0.1) ## pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0) ## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0) ## progress 1.2.2 2019-05-16 [1] CRAN (R 4.0.0) ## promises 1.1.1 2020-06-09 [1] CRAN (R 4.0.2) ## ProtGenerics 1.20.0 2020-04-27 [1] Bioconductor ## purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0) ## R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0) ## rappdirs 0.3.1 2016-03-28 [1] CRAN (R 4.0.0) ## Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2) ## RCurl 1.98-1.2 2020-04-18 [1] CRAN (R 4.0.0) ## rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2) ## rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.0) ## Rsamtools 2.4.0 2020-04-27 [1] Bioconductor ## RSQLite 2.2.0 2020-01-07 [1] CRAN (R 4.0.0) ## rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.0) ## rsvd 1.0.3 2020-02-17 [1] CRAN (R 4.0.0) ## rtracklayer 1.48.0 2020-04-27 [1] Bioconductor ## S4Vectors * 0.26.1 2020-05-16 [1] Bioconductor ## scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0) ## scater * 1.16.2 2020-06-26 [1] Bioconductor ## scran * 1.16.0 2020-04-27 [1] Bioconductor ## scRNAseq * 2.2.0 2020-05-07 [1] Bioconductor ## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2) ## shiny 1.5.0 2020-06-23 [1] CRAN (R 4.0.2) ## showtext 0.8-1 2020-05-25 [1] CRAN (R 4.0.2) ## showtextdb 3.0 2020-06-04 [1] CRAN (R 4.0.2) ## SingleCellExperiment * 1.10.1 2020-04-28 [1] Bioconductor ## statmod 1.4.34 2020-02-17 [1] CRAN (R 4.0.0) ## stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0) ## stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0) ## SummarizedExperiment * 1.18.2 2020-07-14 [1] Bioconductor ## sysfonts 0.8.1 2020-05-08 [1] CRAN (R 4.0.0) ## tibble 3.0.3 2020-07-10 [1] CRAN (R 4.0.2) ## tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2) ## vctrs 0.3.2 2020-07-15 [1] CRAN (R 4.0.2) ## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.0.0) ## viridis 0.5.1 2018-03-29 [1] CRAN (R 4.0.0) ## viridisLite 0.3.0 2018-02-01 [1] CRAN (R 4.0.0) ## whisker 0.4 2019-08-28 [1] CRAN (R 4.0.0) ## withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0) ## xaringan 0.16 2020-03-31 [1] CRAN (R 4.0.0) ## xaringanthemer * 0.3.0 2020-05-04 [1] CRAN (R 4.0.0) ## xfun 0.16 2020-07-24 [1] CRAN (R 4.0.2) ## XML 3.99-0.5 2020-07-23 [1] CRAN (R 4.0.2) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.0) ## XVector 0.28.0 2020-04-27 [1] Bioconductor ## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0) ## zlibbioc 1.34.0 2020-04-27 [1] Bioconductor ## ## [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library ## [2] /Library/Frameworks/R.framework/Versions/4.0branch/Resources/library ``` ]]