Let's have a look at a specific example, the `summary()` function.

---

# The `summary()` function

.panelset[
.panel[.panel-name[numeric vector]

```r
my_numbers <- rnorm(20)
summary(my_numbers)
```

```
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## -1.21768 -0.09469  0.54184  0.61429  1.44219  2.70522
```

]
.panel[.panel-name[character vector]

```r
my_text <- c("Hello", "World")
summary(my_text)
```

```
##    Length     Class      Mode
##         2 character character
```

]
.panel[.panel-name[data.frame]
.scroll-panel[

```r
summary(penguins)
```

```
##       species          island    bill_length_mm  bill_depth_mm
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30
##                                  Mean   :43.92   Mean   :17.15
##                                  3rd Qu.:48.50   3rd Qu.:18.70
##                                  Max.   :59.60   Max.   :21.50
##                                  NA's   :2       NA's   :2
##  flipper_length_mm  body_mass_g       sex           year
##  Min.   :172.0     Min.   :2700   female:165   Min.   :2007
##  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007
##  Median :197.0     Median :4050   NA's  : 11   Median :2008
##  Mean   :200.9     Mean   :4202                Mean   :2008
##  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009
##  Max.   :231.0     Max.   :6300                Max.   :2009
##  NA's   :2         NA's   :2
```

]
]
.panel[.panel-name[linear model]
.scroll-panel[

```r
model <- lm(data = drop_na(penguins), body_mass_g ~ flipper_length_mm)
summary(model)
```

```
##
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm, data = drop_na(penguins))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1057.33  -259.79   -12.24   242.97  1293.89
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)       -5872.09     310.29  -18.93   <2e-16 ***
## flipper_length_mm    50.15       1.54   32.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1
##
## Residual standard error: 393.3 on 331 degrees of freedom
## Multiple R-squared:  0.7621, Adjusted R-squared:  0.7614
## F-statistic:  1060 on 1 and 331 DF,  p-value: < 2.2e-16
```

]
]
]

---

class: middle, center

# How does `summary()` know how to do different things for different types of data?

---

# A big long list of if's and else's?

``` r
# hypothetical: how summary() might look without S3 (this is not the real code)
summary <- function(object, ...) {
  if (is.numeric(object)) {
    summary.numeric(object, ...)
  } else if (is.character(object)) {
    summary.character(object, ...)
  } else if (is.data.frame(object)) {
    summary.data.frame(object, ...)
  } else if (inherits(object, "lm")) {
    summary.lm(object, ...)
  } else {
    # ...and so on, for every other type of data
  }
}
```

---

class: middle, center

---

# Is the previous code maintainable?

- `summary()` would need an `if` branch for every single type of data that comes with base R

--

- and every single type of data that other users have created

--

- and every single type of data that may be created before the next R release

---

class: middle, center

---

# So how does `summary()` work?

```r
get("summary")
```

```
## function (object, ...)
## UseMethod("summary")
## <bytecode: 0x55d23687f0e8>
## <environment: namespace:base>
```

--

``` r
?UseMethod
```

> R possesses a simple generic function mechanism which can be used for an object-oriented style of programming. Method
> dispatch takes place based on the class(es) of the first argument to the generic function or of the object supplied as
> an argument to UseMethod or NextMethod.

---

# What is a "class"? (in S3)

.panelset[
.panel[.panel-name[numeric vector]

```r
class(my_numbers)
```

```
## [1] "numeric"
```

]
.panel[.panel-name[character vector]

```r
class(my_text)
```

```
## [1] "character"
```

]
.panel[.panel-name[data.frame]

```r
class(penguins)
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```

]
.panel[.panel-name[linear model]

```r
class(model)
```

```
## [1] "lm"
```

]
]

Every value (or, "object") has one or more classes.
A function like `summary()` is called a generic function: it takes an object, looks at the object's class(es), and then looks for a function that matches the first class, e.g. `summary.lm()`. If it doesn't find a function for the first class, it moves on to the second class, and so on. If it doesn't find a matching function at all, it tries `summary.default()`.

---

# What other generic functions are there?

.pull-left[

The big ones that you will no doubt be using day after day are:

- `c()`
- `plot()`
- `print()` (called any time you type a variable into the console in R<sup>1</sup>)
- `ggplot()`
- most of the `{dplyr}` verbs (e.g. `mutate()`, `select()`, `filter()`)

.footnote[
[1] (sort of)
]

]

.pull-right[

It's also possible to create new generic functions. First, create the "generic" function:

``` r
my_generic <- function(x, ...) UseMethod("my_generic")
```

Then, create implementations of this generic for the different types of data you want to support.

``` r
# one method per class, named [generic].[class], e.g. for data frames and linear models
my_generic.data.frame <- function(x, ...) do_stuff()
my_generic.lm <- function(x, ...) do_other_stuff()
```

]

---

# Can you create your own class of data?

Yes! And it's super easy!

.panelset[
.panel[.panel-name[using a list]

```r
my_data <- structure(
  list(
    name = "Tom",
    works_for = "The Strategy Unit",
    favourite_food = "π"
  ),
  class = "about_me"
)

print.about_me <- function(x, ...) {
  cat("My name is ", x$name, ", and I work for ", x$works_for, ". ",
      "My favourite food is ", x$favourite_food, "\n", sep = "")
}

print(my_data)
```

```
## My name is Tom, and I work for The Strategy Unit. My favourite food is π
```

]
.panel[.panel-name[using a dataframe]
.scroll-panel[

```r
my_penguins <- penguins # penguins is a tibble, which extends data.frame

# prepend our class to the existing classes
class(my_penguins) <- c("my_penguins", class(my_penguins))

print.my_penguins <- function(x, ...) {
  cat("Penguins are awesome! 🐧\n")
  NextMethod() # now call the normal tibble print method
}

print(my_penguins)
```

```
## Penguins are awesome!
🐧
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
```

]
]
]

---

# In Summary:

- S3 is a way to write generic code that works with different types of data but produces different results for each
- S3 works by having a "generic" function which will "dispatch" to the right function based on the "class" of the data
- you can inspect the class of an object using `class(my_obj)`, or set it using `class(my_obj) <- "class"`
- you can create new generic functions by creating a function that uses `UseMethod()`
- you can create implementations of a generic by creating a function named `[generic].[class]()`

---

# What next:

- Go read the OOP chapters in [Advanced-R][advr-oop]
- There are several types of OOP in R
  - S3
  - S4
  - RC/R6
  - ggproto
- I would advise learning S3 and R6 (R6 is far more similar to the OOP found in other languages like C++, Java, C#, Python)
- ggproto is only useful if you want to create your own ggplot extensions/work on ggplot
- learn another programming language!
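
---

# Appendix: a minimal S3 dispatch sketch

The points in the summary can be sketched in a few lines. This is only an illustration, assuming a made-up generic `describe()` and a made-up class `my_df`; neither comes from base R or from any package used in these slides.

```r
# hypothetical generic, invented for this sketch
describe <- function(x, ...) UseMethod("describe")

# fallback, used when no class-specific method is found
describe.default <- function(x, ...) "something else"

# method for data frames
describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows")
}

# method for the hypothetical class `my_df`, which builds on the data.frame method
describe.my_df <- function(x, ...) {
  inherited <- NextMethod() # move on to the next class, here "data.frame"
  paste("my_df, which is", inherited)
}

df <- data.frame(a = 1:3)

# prepend our class to the existing classes
my_df <- structure(df, class = c("my_df", class(df)))

describe(1:10)  # no integer or numeric method exists, so describe.default() is used
describe(df)    # dispatches to describe.data.frame()
describe(my_df) # dispatches to describe.my_df(), then on to describe.data.frame()
```

`describe(my_df)` reaches `describe.my_df()` first because `"my_df"` is the first entry in the class vector; `NextMethod()` then hands over to the method for the next class, which is the same mechanism `print.my_penguins()` relies on.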