CDSB Workshop 2019: How to Build and Create Tidy Tools
Community of Bioinformatics Software Developers
- Workshop webpage: https://comunidadbioinfo.github.io/post/building-tidy-tools-CDSB-runconf-2019/
- Level: intermediate to advanced
- Language: Spanish (if you speak English but not Spanish, please let us know so we can plan accordingly)
- When: July 29 - August 2, 2019
- Where: Auditorium of the Center for Genomic Sciences, Cuernavaca, Mexico
- Twitter: #CDSBMexico
- Facebook: @CDSBMexico
- GitHub: https://github.com/ComunidadBioInfo/tidy-tools-CDSB-runconf-2019
Requirements
Prior knowledge requirements
- Participants should have basic to intermediate knowledge of the R programming language: variable assignment; reading files (read.csv, read.delim, read.table); data structures (matrix, data frame, list); data types (character, numeric, factor, logical, etc.); and installing and using packages.
- R: know how to install packages.
- RStudio: know how to use it.
Technical requirements
- A personal computer with at least 8 GB of RAM, a mouse, and sufficient disk space for text and image files. Administrator privileges to install and run utilities such as RStudio.
Overview
In recent years, R has become one of the most widely used programming languages for data science. The explosion of data available in many fields, including bioinformatics, has increased the demand for data analysts. R users start by learning how to use the tools that others have openly shared with the international community. As they continue to analyze data, these R users acquire skills and might even start to interact with R software developers through community websites such as RStudio Community and Bioconductor Support, or via Twitter using the #rstats hashtag. Eventually, some R users will want to write their own functions and share them online with others. They can do so by creating their own R packages and sharing them via repositories like CRAN and Bioconductor, or simply via GitHub. In this workshop, participants will work through the Building Tidy Tools rstudio::conf 2019 workshop and then build their own R packages by collaborating with each other, similar to what was done at rOpenSci unconf18.
This workshop is aimed at students and researchers interested in data analysis who have experience using R. We encourage applications from experts in diverse disciplines, including but not limited to biologists, bioinformaticians, data scientists, software engineers, programmers, and R users at large. The main goals of the workshop are:
- Teach participants the principles of reproducible data science through the development of R/Bioconductor packages.
- Turn (bioinformatics) software users into (bioinformatics) software developers.
- Foster the exchange of expertise and establish multidisciplinary collaborations.
- Create a community of Latin American scientists committed to the development of software and computational pipelines for (biological) data analysis.
- Help train users who can become local instructors and continue to grow their local communities.
This workshop is part of a long-term project to create a community of developers from Latin America. We hope to hold regular meetings in the future (similar to BioC, EuroBioc and BioCAsia) where attendees present their own software contributions. To help provide a welcoming environment, please follow our code of conduct.
Program
Overall TIB2019 program (registration, sessions and breaks) is available here.
Day 1: July 29, 2019

Time | Session | Instructor(s) |
---|---|---|
08:00-09:00 | Registration | |
09:00-09:30 | TIB2019 Inauguration | |
09:30-10:00 | Welcome to CDSB: the history of our community and where we are going. Code of conduct review | Leonardo Collado-Torres |
10:00-10:30 | Why make an R package? | Alejandro Reyes |
10:30-11:00 | Preliminaries | Leonardo Collado-Torres |
11:00-11:30 | Coffee break | |
11:30-12:30 | Packages | Leonardo Collado-Torres |
12:30-14:00 | Testing | Leonardo Collado-Torres |
14:00-15:30 | Break | |
15:30-17:30 | API Design | Leonardo Collado-Torres |
17:30-18:30 | Welcome cocktail | |

Day 2: July 30, 2019

Time | Session | Instructor(s) |
---|---|---|
09:00-11:00 | Functional Programming | Leonardo Collado-Torres |
11:00-11:30 | Coffee break | |
11:30-14:00 | Errors | Leonardo Collado-Torres |
14:00-15:30 | Break | |
15:30-17:30 | Object Oriented Programming | Alejandro Reyes |

Day 3: July 31, 2019

Time | Session | Instructor(s) |
---|---|---|
09:00-11:00 | Tidy Evaluation | Leonardo Collado-Torres |
11:00-11:30 | Group picture | |
11:30-12:00 | Coffee break | |
12:00-14:00 | Document and Share | Leonardo Collado-Torres |
14:00-15:30 | Break | |
15:30-17:30 | Introduction to GitHub | Alejandra Medina-Rivera |

Day 4: August 1, 2019

Time | Session | Instructor(s) |
---|---|---|
09:00-09:30 | Introduction to runconf | Leonardo Collado-Torres |
09:30-11:00 | Community building activities | |
11:00-11:30 | Coffee break | |
11:30-12:00 | Voting and selection of projects to work on | |
12:00-14:00 | Working on a Collaborative Project | Alejandra Medina-Rivera, Alejandro Reyes, Leonardo Collado-Torres and Maria Teresa Ortiz |
14:00-15:30 | Break | |
15:30-17:30 | Working on a Collaborative Project | Alejandra Medina-Rivera, Alejandro Reyes, Leonardo Collado-Torres and Maria Teresa Ortiz |

Day 5: August 2, 2019

Time | Session | Instructor(s) |
---|---|---|
09:00-09:15 | (Optional) Change projects | |
09:15-11:00 | Working on a Collaborative Project | Alejandra Medina-Rivera, Alejandro Reyes, Leonardo Collado-Torres and Maria Teresa Ortiz |
11:00-11:30 | Coffee break | |
11:30-12:00 | Workshop evaluation | |
12:00-12:30 | Prepare presentations of the collaborative projects | |
12:30-13:00 | Presentations of collaborative projects | |
13:00-13:30 | Closing remarks and community participation opportunities | |
13:30-14:00 | TIB2019 Closing Ceremony | |
Building Tidy Tools
Charlotte Wickham and Hadley Wickham were the instructors of the original version of this workshop at rstudio::conf 2019. They have kindly shared their materials with us, which we will translate into Spanish.
This is a two-day hands-on workshop for those who have embraced the tidyverse and now want to expand it to meet their own needs. We'll discuss API design, functional programming tools, the basics of object design in S3, and the tidy eval system for non-standard evaluation (NSE).
Learn efficient workflows for developing high-quality R functions, using the set of conventions codified by a package. You'll also learn workflows for unit testing, which helps ensure that your functions do exactly what you think they do. Master the art of writing functions that do one thing well and can be fluently combined to solve more complex problems. We'll cover common function-writing pitfalls and how to avoid them.
Learn how to write collections of functions that work well together, and adhere to existing conventions so they’re easy to pick up for newcomers.
You should take this workshop if you have experience programming in R and want to learn how to tackle larger scale problems. You’ll get the most from it if you’re already familiar with functions and are comfortable with R’s basic data structures (vectors, matrices, arrays, lists, and data frames).
Instead of two days, we will teach this workshop over two and a half days so that we can give you more time to digest the material and work through an introduction to Git and GitHub.
CDSB runconf
R unconference events (runconf) typically involve two stages and an optional third one. First, weeks ahead of the runconf, participants create issues where they share ideas that they could work on in collaboration with others during the two days of the runconf. In the second stage, participants meet and get to know each other through icebreakers. They then work together for two days on the projects that received the most votes. This is where participants learn skills from each other and get a taste of developing open source software in a collaborative environment. At the end, everyone shares their work with the whole group. In the third and final stage, runconf attendees might continue to interact with each other to further polish the packages they worked on during the runconf, start new collaborations, and/or write blog posts about their experience.
There are many blog posts about rOpenSci unconf events. A recent runconf was chirunconf, which you can read more about in this blog post by Sharla Gelfand.
If you are attending our workshop, we will ask you to propose at least one R package idea by creating a GitHub Issue in our GitHub repository for the workshop. If you want some inspiration, take a look at rOpenSci unconf18's GitHub repository. You will need to familiarize yourself with all the proposals before the workshop starts.
Instructors:
- Alejandra Medina-Rivera (International Laboratory for Human Genome Research, Juriquilla, Mexico)
- Alejandro Reyes (Dana-Farber Cancer Institute, Boston, USA)
- Leonardo Collado-Torres (Lieber Institute for Brain Development, Baltimore, USA). Leonardo attended rstudio::conf 2019 thanks to a diversity scholarship, and took the original Building Tidy Tools workshop there. He also attended rOpenSci unconf 2018.
- Maria Teresa Ortiz (CONABIO and ITAM, Mexico)
Organizing committee:
- Alejandra Medina-Rivera (International Laboratory for Human Genome Research, Juriquilla, Mexico)
- Alejandro Reyes (Dana-Farber Cancer Institute, Boston, USA)
- Delfino García (Center for Genomic Sciences, Cuernavaca, Mexico)
- Heladia Salgado (Center for Genomic Sciences, Cuernavaca, Mexico)
- Leonardo Collado-Torres (Lieber Institute for Brain Development, Baltimore, USA)