CDSB Workshop 2019: How to Build and Create Tidy Tools

Community of Bioinformatics Software Developers

Requirements

Prior knowledge requirements

  • Participants should have basic to intermediate knowledge of the R programming language: variable assignment; reading files with read.csv, read.delim and read.table; data structures (matrix, data frame, list); data types (character, numeric, factor, logical, etc.); and installing and using packages.
  • R: know how to install packages.
  • RStudio: know how to use it.

Technical requirements

  • Personal computer with at least 8 GB of RAM, a mouse, and sufficient disk space for text and image files. Administrator privileges are needed to install and run utilities such as RStudio.

Overview

In recent years, R has become one of the most widely used programming languages for data science. The explosion of data available in many fields has increased the demand for data analysts, and bioinformatics is no exception. R users start by learning how to use the tools others have openly shared with the international community. These users acquire skills as they continue to analyze data and might even start to interact with R software developers through community websites such as RStudio Community and Bioconductor Support, or via Twitter using the #rstats hashtag. Eventually some R users will want to write their own functions and share them online with others. They can do so by creating their own R packages and sharing them through repositories like CRAN and Bioconductor, or simply via GitHub. In this workshop participants will work through the Building Tidy Tools rstudio::conf 2019 workshop and then build their own R packages by collaborating with each other, similar to what was done at rOpenSci unconf18.

This workshop is aimed at students and researchers interested in data analysis who have experience using R. We encourage applications from experts in diverse disciplines, including but not limited to biologists, bioinformaticians, data scientists, software engineers, programmers and R users at large. The main goals of the workshop are:

  1. Teach participants the principles of reproducible data science through the development of R/Bioconductor packages.

  2. Turn (bioinformatics) software users into (bioinformatics) software developers.

  3. Foster the exchange of expertise and establish multidisciplinary collaborations.

  4. Create a community of Latin American scientists committed to the development of software and computational pipelines for (biological) data analysis.

  5. Help train users who can become local instructors and continue to grow their local communities.

This workshop is part of a long-term project to create a community of developers from Latin America. We hope to hold regular meetings in the future (similar to BioC, EuroBioc and BioCAsia) where attendees present their own software contributions. To provide a welcoming environment, please follow our code of conduct.

Program

The overall TIB2019 program (registration, sessions and breaks) is available here.

Day 1: July 29, 2019
08:00-09:00 Registration
09:00-09:30 TIB2019 Inauguration
09:30-10:00 Welcome to CDSB: the history of our community and where we are going. Code of conduct review (Leonardo Collado-Torres)
10:00-10:30 Why make an R package? (Alejandro Reyes)
10:30-11:00 Preliminaries (Leonardo Collado-Torres)
11:00-11:30 Coffee break
11:30-12:30 Packages (Leonardo Collado-Torres)
12:30-14:00 Testing (Leonardo Collado-Torres)
14:00-15:30 Break
15:30-17:30 API Design (Leonardo Collado-Torres)
17:30-18:30 Welcome cocktail

Day 2: July 30, 2019
09:00-11:00 Functional Programming (Leonardo Collado-Torres)
11:00-11:30 Coffee break
11:30-14:00 Errors (Leonardo Collado-Torres)
14:00-15:30 Break
15:30-17:30 Object Oriented Programming (Alejandro Reyes)

Day 3: July 31, 2019
09:00-11:00 Tidy Evaluation (Leonardo Collado-Torres)
11:00-11:30 Group picture
11:30-12:00 Coffee break
12:00-14:00 Document and Share (Leonardo Collado-Torres)
14:00-15:30 Break
15:30-17:30 Introduction to GitHub (Alejandra Medina-Rivera)

Day 4: August 1, 2019
09:00-09:30 Introduction to runconf (Leonardo Collado-Torres)
09:30-11:00 Community building activities
11:00-11:30 Coffee break
11:30-12:00 Voting and selection of projects to work on
12:00-14:00 Working on a Collaborative Project (Alejandra Medina-Rivera, Alejandro Reyes, Leonardo Collado-Torres and Maria Teresa Ortiz)
14:00-15:30 Break
15:30-17:30 Working on a Collaborative Project (Alejandra Medina-Rivera, Alejandro Reyes, Leonardo Collado-Torres and Maria Teresa Ortiz)

Day 5: August 2, 2019
09:00-09:15 Change project (optional)
09:15-11:00 Working on a Collaborative Project (Alejandra Medina-Rivera, Alejandro Reyes, Leonardo Collado-Torres and Maria Teresa Ortiz)
11:00-11:30 Coffee break
11:30-12:00 Workshop evaluation
12:00-12:30 Prepare presentations of the collaborative project
12:30-13:00 Presentations of collaborative projects
13:00-13:30 Closing remarks and community participation opportunities
13:30-14:00 TIB2019 Closing Ceremony

Building Tidy Tools

Charlotte Wickham and Hadley Wickham were the instructors of the original version of this workshop at rstudio::conf 2019. They have kindly shared their materials with us, which we will translate into Spanish.

This is a two-day hands-on workshop for those who have embraced the tidyverse and now want to expand it to meet their own needs. We’ll discuss API design, functional programming tools, the basics of object design in S3, and the tidy eval system for non-standard evaluation (NSE).

Learn efficient workflows for developing high-quality R functions, using the set of conventions codified by a package. You’ll also learn workflows for unit testing, which helps ensure that your functions do exactly what you think they do. Master the art of writing functions that do one thing well and can be fluently combined to solve more complex problems. We’ll cover common function writing pitfalls and how to avoid them.
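As a rough illustration of functions that each do one thing well and combine fluently, here is a minimal base R sketch. The helper names rescale01 and fill_missing are hypothetical and are not taken from the workshop materials. The final check uses stopifnot() only to keep the example free of dependencies; inside a package, a check like this would more typically live in a testthat test using expect_equal().

```r
# Hypothetical helpers for illustration only; not part of the workshop materials.

# Rescale a numeric vector to the [0, 1] range, ignoring missing values
# when computing the minimum and maximum.
rescale01 <- function(x) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Replace missing values with a chosen value.
fill_missing <- function(x, value = 0) {
  x[is.na(x)] <- value
  x
}

# Because each function has a single job, they compose cleanly.
scores <- c(2, NA, 6, 10)
result <- fill_missing(rescale01(scores), value = 0)
print(result)

# A minimal check in the spirit of a unit test.
stopifnot(isTRUE(all.equal(result, c(0, 0, 0.5, 1))))
```

Keeping the rescaling and the handling of missing values in separate functions means each one can be tested on its own and reused in other combinations.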

Learn how to write collections of functions that work well together, and adhere to existing conventions so they’re easy to pick up for newcomers.

You should take this workshop if you have experience programming in R and want to learn how to tackle larger scale problems. You’ll get the most from it if you’re already familiar with functions and are comfortable with R’s basic data structures (vectors, matrices, arrays, lists, and data frames).

Instead of two days, we will teach this workshop over two and a half days so that you have more time to digest the material and work through an introduction to Git and GitHub.

CDSB runconf

R unconference events (runconf) typically involve two stages and an optional third one. First, participants create issues weeks ahead of the runconf where they share ideas that they can work through in collaboration with others during the two days of the runconf. In the second stage, participants meet each other and get to know each other through some icebreakers. They then work together for two days on the projects that received the most votes. This is where participants get to learn skills from each other and get a taste of developing open source software in a collaborative environment. At the end, everyone shares their work with the whole group. In the third and final stage, runconf attendees might continue to interact with each other to further polish the packages they worked on during the runconf event, start new collaborations and/or write blog posts about their experience.

There are many blog posts about rOpenSci unconf events. A recent runconf was the chirunconf, which you can read more about in this blog post by Sharla Gelfand.

If you are attending our workshop, we will ask you to propose at least one R package idea by creating a GitHub Issue in our GitHub repository for the workshop. If you want some inspiration, take a look at rOpenSci unconf18’s GitHub repository. You will need to familiarize yourself with all the proposals before the workshop starts.

Instructors:

Organizing committee:

Code of Conduct

Sponsors

Become our sponsor

Centro de Ciencias Genómicas UNAM

We want to help you acquire the skills to contribute open source Bioinformatics software using R