Tidyverse Cheat Sheet



This tidyverse cheat sheet starts with some general information on the tidyverse and then covers useful functions, loading your data, manipulating it with dplyr, and visualizing it with ggplot2. In short, it covers everything you need to kickstart your data science learning with R.

Think of cheat sheets as a quick reference, with the emphasis on quick. Here's an analogy: a cheat sheet is more like a well-organized computer menu bar that leads you to a command than like a manual that documents each command. Everything about a cheat sheet should be designed to lead users to essential information quickly.

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures. It includes dplyr, tidyr, and ggplot2, which are among the most popular R packages. Install the complete tidyverse with install.packages("tidyverse"). Some other useful packages, such as readxl, are installed along with the tidyverse package but are not attached by library(tidyverse), so you have to load them explicitly. forcats and stringr are attached by library(tidyverse) in recent tidyverse releases.

R uses factors to handle categorical variables, that is, variables that have a fixed and known set of possible values. Factors are also helpful for reordering character vectors to improve display. The goal of the forcats package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values.
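As a rough illustration of the factor handling mentioned above, the sketch below reorders the levels of a small factor with forcats. The `answers` vector is made-up data, and the example assumes forcats is installed.

```r
library(forcats)

# Hypothetical survey answers stored as a plain character vector
answers <- c("low", "high", "medium", "low", "high", "low")

# factor() sorts the levels alphabetically by default
f <- factor(answers)
levels(f)

# fct_relevel() moves the named levels to the front, in the order given
f_ordered <- fct_relevel(f, "low", "medium", "high")
levels(f_ordered)

# fct_infreq() orders the levels by how often each value occurs
f_by_count <- fct_infreq(f)
levels(f_by_count)
```

Ordering the levels this way changes how the categories are sorted in tables and plots without changing the underlying values.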

Tidyverse

Usage

readr is part of the core tidyverse, so load it with library(tidyverse).

To accurately read a rectangular dataset with readr you combine two pieces: a function that parses the overall file, and a column specification. The column specification describes how each column should be converted from a character vector to the most appropriate data type, and in most cases it’s not necessary because readr will guess it for you automatically.
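The two pieces described above, a parsing function and a column specification, might be combined as in the following sketch. The inline data and its column names are invented for illustration, and passing literal data with I() assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical inline data; I() marks the string as literal data, not a path
raw <- I("id,joined,score\n1,2021-03-04,9.5\n2,2021-07-19,7.25\n")

# Piece 1 only: let readr guess the column types
guessed <- read_csv(raw, show_col_types = FALSE)

# Pieces 1 and 2: the same parse with an explicit column specification
explicit <- read_csv(
  raw,
  col_types = cols(
    id = col_integer(),
    joined = col_date(format = "%Y-%m-%d"),
    score = col_double()
  )
)

# spec() shows the column specification that was used for each tibble
spec(guessed)
spec(explicit)
```

Here the explicit specification stores `id` as an integer, whereas guessing treats it as a double.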

readr provides a read_ function for each of the following file formats:

  • read_csv(): comma separated (CSV) files
  • read_tsv(): tab separated files
  • read_delim(): general delimited files
  • read_fwf(): fixed width files
  • read_table(): tabular files where columns are separated by whitespace
  • read_log(): web log files
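To show how the functions in this list differ mainly in the delimiter they expect, here is a small sketch that reads the same made-up table once as tab-separated and once as pipe-delimited text. The data are hypothetical, and the use of I() for literal input assumes readr 2.0 or later.

```r
library(readr)

# The same hypothetical table with two different delimiters
tab_data  <- I("name\tqty\napple\t3\npear\t5\n")
pipe_data <- I("name|qty\napple|3\npear|5\n")

# read_tsv() assumes tabs; read_delim() takes the delimiter as an argument
fruit_tsv   <- read_tsv(tab_data, show_col_types = FALSE)
fruit_delim <- read_delim(pipe_data, delim = "|", show_col_types = FALSE)

# Both calls parse the same column values
identical(fruit_tsv$name, fruit_delim$name)
identical(fruit_tsv$qty, fruit_delim$qty)
```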

In many cases, these functions will just work: you supply the path to a file and you get a tibble back. The following example loads a sample file bundled with readr:
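A minimal sketch of such a call is shown here. The text does not name the bundled file, so `mtcars.csv` is an assumption, chosen because it is one of the sample files shipped with readr, and `mtcars_tbl` is an arbitrary variable name.

```r
library(readr)

# readr_example() returns the path to a sample file shipped with readr
path <- readr_example("mtcars.csv")

# read_csv() parses the file, prints the guessed column specification,
# and returns a tibble
mtcars_tbl <- read_csv(path)
mtcars_tbl
```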

Note that readr prints the column specification. This is useful because it allows you to check that the columns have been read in as you expect, and if they haven’t, you can easily copy and paste into a new call:
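One way this copy-and-paste step could look is sketched below, again assuming the bundled `mtcars.csv` sample file. The specification mirrors what readr prints for that file, with `cyl` and `gear` changed to integers purely as an illustration of overriding a guess.

```r
library(readr)

path <- readr_example("mtcars.csv")

# Column specification pasted into a new call; cyl and gear are switched
# from the guessed col_double() to col_integer() as an example override
mtcars_tbl <- read_csv(
  path,
  col_types = cols(
    mpg = col_double(),
    cyl = col_integer(),
    disp = col_double(),
    hp = col_double(),
    drat = col_double(),
    wt = col_double(),
    qsec = col_double(),
    vs = col_double(),
    am = col_double(),
    gear = col_integer(),
    carb = col_double()
  )
)
```

Because the column types are now stated explicitly, readr no longer needs to guess them or print the specification.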


vignette('readr') gives more detail on how readr guesses the column types, how you can override the defaults, and provides some useful tools for debugging parsing problems.