CDSB Workshop 2020: Building workflows with RStudio and Bioconductor for single cell RNA-seq analysis
This workshop is part of the Mexican Bioinformatics Encounter (EBM in Spanish) 2020, organized by CDSB with TIB2020, RMB, NNB and CCG-UNAM.
Community of Bioinformatics Software Developers (CDSB)
- Level: intermediate - advanced
- Language: Spanish
- When: August 3 - August 7, 2020 (9 am to 5:30 pm, Friday until 2:30 pm)
- Where: Online through the Zoom platform. Hours are based on Mexico’s central time zone.
- Twitter: @CDSBMexico
- Facebook: @CDSBMexico
Summary
Join us for our 2020 workshop! This year we’ll teach you how to improve your skills with the R programming language through diverse strategies for organizing your code and projects. This will help you document your analyses so that they are easy to reproduce and to share with your collaborators (from academia to industry). As a use case, we will learn the statistical tools needed for analyzing single cell transcriptomics (scRNA-seq) data using Bioconductor. Completing this workshop will help you in all your R projects and your analyses of biological data: all your analyses will benefit from the organization skills, and the ideas behind scRNA-seq are used in many bioinformatics projects.
Requirements
Prior knowledge requirements
- Participants should have basic to intermediate knowledge of the R programming language: variable assignment; reading files (read.csv); data structures (matrix, data.frame, list); data types (character, numeric, factor, logical, etc.); installation and use of packages.
- Know how to use RStudio.
- Be interested in learning good practices for organizing your work and sharing your work with others.
- Be interested in learning how to analyze biological data using R/Bioconductor packages.
Technical requirements
- Personal computer. Minimum 8GB RAM, a mouse and sufficient disk space for text files and image files. Administrator privileges to install and run utilities such as RStudio.
Overview
In recent years, R has become one of the most used programming languages for data science. The explosion in data available in many fields has increased the demand for data analysts, which is the case in Bioinformatics. R users start by learning how to use the tools others have openly shared with the international community. These R users acquire skills as they continue to analyze data and might even start to interact with R software developers through community websites such as RStudio Community, Bioconductor Support or via Twitter using the #rstats hashtag. Eventually some R users will want to write their own functions and organize their code across several projects. It is at this point that it’s useful to learn how to organize your code in order to make your life as an R programmer easier, such that you spend more time on your projects instead of remembering where your code is or what you did a few weeks ago. In order to practice these concepts, we will review the most recent methods for analyzing single cell RNA sequencing (scRNA-seq) data using R packages specialized for this goal that are freely available through Bioconductor.
The instructors of this workshop have participated in CDSB since its founding and have attended conferences such as BioC2019 and rstudio::conf(2020), among others. In recent years we taught how to make R and Bioconductor packages, which are of great use for sharing code with others. Recently, CDSB alumni submitted their first R package to Bioconductor, which represented a large percentage increase in Latin American representation in the Bioconductor developer community, demonstrating that participating in a CDSB workshop has an impact beyond the one-week workshop. For 2020 we will have an applied focus while maintaining our goals at CDSB, which are:
- Turn (bioinformatics) software users into (bioinformatics) software developers.
- Foster the exchange of expertise and establish multidisciplinary collaborations.
- Create a community of Latin American scientists committed to the development of software and computational pipelines for (biological) data analysis.
- Help train users that can become local instructors and continue to grow their local communities.
The scRNA-seq portion of the workshop will be based on the book Orchestrating Single Cell Analysis with Bioconductor, which was published in Nature Methods (DOI) and is among the most publicized papers of 2020.
This workshop is part of a long-term project to create a community of developers from Latin America. We hope to hold regular meetings in the future (similar to BioC, EuroBioc and BioCAsia) where attendees present their own software contributions. To provide a welcoming environment, please follow our code of conduct.
Program
9 am to 5:30 pm in Mexico’s central time zone (on Friday we end at 2:30 pm), with breaks and time to eat. The detailed schedule and the Zoom links will be provided through the private CDSB Google Calendar to participants who register for the event. For more schedule details, check the CDSB2020 workshop GitHub repository.
Every day we will have a help session from 8 to 9 am for those that need help installing the required software for the workshop.
Day 1
- EBM2020 inauguration
- Welcome to CDSB
- Participant self-introductions
- Workflow around RStudio projects:
  - Introduction to the project-oriented workflow.
  - Working with projects versus scripts.
  - Creating a project.
  - Using safe paths.
  - How should I name my file?
Day 2
- Using Git and GitHub.
- Modifying the R startup files.
- Writing and documenting functions.
- Debugging R code.
Day 3
- Good practices for configuring and maintaining workspaces.
- Remote picture / video.
- Installing R packages from source.
- General overview of single cell RNA-seq (scRNA-seq) data processing
- Community-building activities
- Overview of the scRNA-seq material
Day 4
- Introduction to scRNA-seq
- Introduction to scRNA-seq with Bioconductor
- Data infrastructure and data import
- Quality control
- Data normalization
Day 5
- Feature selection
- Dimension reduction
- Clustering and differential gene expression analysis
- spatialLIBD: analyzing data from the Visium assay by 10x Genomics
- Workshop evaluation
- Closing ceremony and CDSB reminders
Instructors
- Alejandra Medina-Rivera (International Laboratory for Human Genome Research, Juriquilla, Querétaro, Mexico). Alejandra recently presented a keynote at the Women in Data Science event in Mexico City.
- Alejandro Reyes (Data Scientist, Novartis, Basel, Switzerland). Alejandro recently worked with CDSB alumni on the regutools project, which was published in Oxford Bioinformatics.
- Joselyn Chávez (IBT-UNAM, Cuernavaca, Morelos, Mexico). Joselyn presented her work at BioC2019 thanks to a travel award, attended rstudio::conf(2020) thanks to a diversity scholarship where she took the What They Forgot to Teach You about R workshop, and recently was part of the first R package submission to Bioconductor by CDSB alumni. She was initially a CDSB2018 student.
- Leonardo Collado-Torres (Lieber Institute for Brain Development, Baltimore, MD, USA). Leonardo recently published a pre-print on spatial transcriptomics using 10xGenomics Visium data.
- Marcel Ramos Perez (Roswell Park Comprehensive Cancer Center and CUNY School of Public Health, USA). Marcel is part of the Bioconductor Core Team and is the author of many Bioconductor packages including MultiAssayExperiment.
- Maria Teresa Ortiz (CONABIO and ITAM, Mexico). Teresa presented her work on fast counting algorithms for Mexican presidential elections at rstudio::conf(2020).
Organizing committee
- Alejandra Medina-Rivera (International Laboratory for Human Genome Research, Juriquilla, Querétaro, Mexico)
- Alejandro Reyes (Data Scientist, Novartis, Basel, Switzerland)
- Joselyn Chávez (IBT-UNAM, Cuernavaca, Morelos, Mexico)
- Leonardo Collado-Torres (Lieber Institute for Brain Development, Baltimore, MD, USA)
Code of Conduct
Organizers
CDSB is a node of the Mexican Bioinformatics Network (RMB in Spanish) and jointly organizes the yearly workshop with the National Node of Bioinformatics (NNB).
With support from: