
Functional Programming with Purrr


Birmingham R

Tom Jemmett | Senior Healthcare Analyst

January 2020


About The Strategy Unit / Me

"Leading research, analysis and change from within the NHS"

The Strategy Unit is a specialist NHS team, based in Midlands and Lancashire CSU. We focus on the application of high-quality, multi-disciplinary analytical work.

Our team comes from diverse backgrounds. Our academic qualifications include maths, economics, history, natural sciences, medicine, sociology, business and management, psychology and political science. Our career and personal histories are just as varied.

Our staff are NHS employees, animated by NHS values. The Strategy Unit covers all its costs through project funding. But this is driven by need, not what we can sell. Any surplus is recycled for public benefit.


Tom Jemmett

Senior Healthcare Analyst

thomas.jemmett@nhs.net

  • 10+ years experience within the NHS as a data analyst
  • BSc Computer Science and Pure Mathematics (Open University)
  • MBCS/AMIMA
  • Active member of NHS-R community
  • Senior Fellow of NHS-R academy
  • AphA member, West Midlands branch champion


Purrr

purrr is a package that aims to provide "A complete and consistent functional programming toolkit for R."

Functional programming is a declarative programming paradigm that treats computation as the evaluation of mathematical functions.

From a Mango blog post

FP is the process of writing code in a structured way and through functions remove code duplications and redundancies. In effect, computations or evaluations are treated as mathematical functions and the output of a function only depends on the values of its inputs – known as arguments. FP ensures that any side-effects such as changes in state do not affect the expected output such that if you call the same function twice with the same arguments the function returns the same output.
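As a rough illustration of the "same arguments, same output" idea, here is a minimal sketch in base R. The function names (add_tax, add_and_count) are made up for this example and are not part of purrr.

```r
# A pure function: the result depends only on its arguments.
add_tax <- function(price, rate) {
  price * (1 + rate)
}

# An impure function: the result depends on, and changes, outside state.
counter <- 0
add_and_count <- function(price) {
  counter <<- counter + 1   # side effect: modifies a global variable
  price + counter
}

# Calling the pure function twice with the same arguments gives the same answer.
identical(add_tax(100, 0.2), add_tax(100, 0.2))

# The impure function gives a different answer on each call.
first_call  <- add_and_count(100)
second_call <- add_and_count(100)
first_call == second_call
```

The first comparison is TRUE and the second is FALSE, because the hidden counter changed between the two calls.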

As of version 0.3.3 there are some 177 functions exported by purrr!

Fortunately, most of these functions are variants of one another, and some 65 of these functions all fall under the same family of "map" functions.


map

map is a function that takes a vector (or list) and another function, and calls that function once for each item in the input vector (or list).

map <- function(.x, .f, ...)

map returns a list - but you can use the map_ variants if you want to return a specific type of data as a vector:

  • map_chr for a character vector
  • map_dbl for a double (numeric) vector
  • map_lgl for a logical (TRUE/FALSE) vector
  • map_int for an integer vector
  • map_raw for a raw vector
  • map_dfr if your function returns dataframes and you want to bind by rows (rbind)
  • map_dfc if your function returns dataframes and you want to bind by columns (cbind)
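A small sketch of how the typed variants differ from plain map. The words vector is made-up example data.

```r
library(purrr)

words <- c("functional", "programming", "with", "purrr")

as_list    <- map(words, nchar)                 # a list of four integers
lengths    <- map_int(words, nchar)             # integer vector: 10 11 4 5
upper      <- map_chr(words, toupper)           # character vector
long_words <- map_lgl(words, ~ nchar(.x) > 5)   # TRUE TRUE FALSE FALSE

# If the function returns the wrong type, the typed variant raises an error
# rather than silently converting the result.
wrong_type <- tryCatch(map_int(words, toupper), error = function(e) "error")
```

Picking the variant up front means you always know what type you will get back, which is handy when the result feeds into another step.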

[Figure from Advanced R. Source: Wickham, H. (2020). 9 Functionals | Advanced R.]

map is very similar to the apply family of functions in base R, but provides a much simpler and more consistent programming interface.
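One example of what "more consistent" can mean in practice: sapply changes its return type when the input is empty, whereas map_dbl does not. This is a sketch; square is a made-up helper.

```r
library(purrr)

square <- function(x) x^2

sapply_full  <- sapply(1:3, square)          # numeric vector: 1 4 9
sapply_empty <- sapply(integer(0), square)   # an empty list, not a numeric vector

map_full  <- map_dbl(1:3, square)            # numeric vector: 1 4 9
map_empty <- map_dbl(integer(0), square)     # numeric(0): still a numeric vector
```

Code that works on the sapply result may break when the input happens to be empty. With map_dbl the type is the same either way.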


simple map examples

Let's say we have a function

fn <- function(x, a) { x^2 + a }

We can use map to iterate over the numbers from 1 to 5 and return the results as a list.

map(1:5, fn, 3)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 7
##
## [[3]]
## [1] 12
##
## [[4]]
## [1] 19
##
## [[5]]
## [1] 28

If we want to simplify the results, we can use the map_dbl function, which will give us the results as a numeric vector.

We can also define functions inside of map. We can either write out the full function syntax, or we can use a formula syntax like:

map_dbl(1:5, ~.x^2 + 3)
## [1] 4 7 12 19 28

In the formula syntax, the argument is called .x.

Of course, if your function is vectorised you could just call fn(1:5, 3)...
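To put the options side by side, here is a short sketch showing the same calculation written four ways. They should all give the same numeric vector.

```r
library(purrr)

fn <- function(x, a) { x^2 + a }

by_name    <- map_dbl(1:5, fn, 3)                 # named function, extra argument passed via ...
by_anon    <- map_dbl(1:5, function(x) x^2 + 3)   # full anonymous function syntax
by_formula <- map_dbl(1:5, ~ .x^2 + 3)            # formula shorthand, argument is .x
vectorised <- fn(1:5, 3)                          # no map needed: ^ and + are vectorised
```

The formula shorthand is the most compact, but the full function syntax can be easier to read once the body grows beyond a single expression.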


reading in CSVs from a folder

library(tidyverse)

files <- dir(here::here("data", "ae_attendances"),
             "^\\d{4}-\\d{2}-\\d{2}\\.csv$",
             full.names = TRUE)
# set the name of each item in the vector to be the file name (the date plus ".csv")
names(files) <- files %>% str_extract("\\d{4}-\\d{2}-\\d{2}\\.csv$")
# map over the vector of files, read each csv, and add a column called "file"
# holding the name of the item each row came from.
ae_attendances <- map_dfr(files, read_csv, col_types = "Dccddd", .id = "file")
knitr::kable(head(ae_attendances, 5), format="html")
file            period      org_code  type   attendances  breaches  admissions
2016-04-01.csv  2016-04-01  RF4       1      18788        4082      4074
2016-04-01.csv  2016-04-01  RF4       2      561          5         0
2016-04-01.csv  2016-04-01  RF4       other  2685         17        0
2016-04-01.csv  2016-04-01  R1H       1      27396        5099      6424
2016-04-01.csv  2016-04-01  R1H       2      700          5         0
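If you don't have the A&E data to hand, the same pattern can be tried with a couple of throwaway files. This is a sketch only: the folder name, file contents and column names below are invented for the example, and map_dfr needs dplyr to be installed.

```r
library(purrr)
library(readr)

# Hypothetical sample data, written to a temporary folder.
data_dir <- file.path(tempdir(), "ae_example")
dir.create(data_dir, showWarnings = FALSE)
write_csv(data.frame(org_code = c("RF4", "R1H"), attendances = c(100, 200)),
          file.path(data_dir, "2016-04-01.csv"))
write_csv(data.frame(org_code = c("RF4", "R1H"), attendances = c(110, 190)),
          file.path(data_dir, "2016-05-01.csv"))

# Find the files and name each path after its file name.
files <- dir(data_dir, "^\\d{4}-\\d{2}-\\d{2}\\.csv$", full.names = TRUE)
names(files) <- basename(files)

# Read every file and stack the rows; .id puts each item's name in a "file" column.
combined <- map_dfr(files, read_csv, col_types = "cd", .id = "file")
```

Naming the vector before mapping is what makes .id useful: without names, the "file" column would just contain the position of each file in the vector.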

map2: map over two lists

Let's generate a table with random binomially distributed values:

df <- tibble(i = 1:20) %>%
  mutate(N = rpois(max(i), 100)) %>%
  mutate(p = map_dbl(N, rbinom,
                     n = 1, prob = 0.75))
knitr::kable(head(df, 5), format="html")
i  N    p
1  102  79
2  89   67
3  97   75
4  113  88
5  103  84

We can use map2 to iterate over the pairs of numbers in the N and p columns, and use the BinomCI function (from DescTools) to calculate confidence intervals.

df %>%
  mutate(confint = map2(p, N, BinomCI) %>%
           map(as_tibble)) %>%
  unnest(cols = "confint") %>%
  head(5) %>%
  knitr::kable(format = "html")
i  N    p   est        lwr.ci     upr.ci
1  102  79  0.7745098  0.6843102  0.8447831
2  89   67  0.7528090  0.6539796  0.8307176
3  97   75  0.7731959  0.6803956  0.8451819
4  113  88  0.7787611  0.6937769  0.8454152
5  103  84  0.8155340  0.7297734  0.8786046
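For a version of the pairing idea that doesn't need DescTools, here is a minimal sketch using the first three rows of the table. The variable names (successes, trials) are just for this example.

```r
library(purrr)

successes <- c(79, 67, 75)
trials    <- c(102, 89, 97)

# map2 calls the function with the first item of each vector, then the second, and so on.
# In the formula syntax the two arguments are .x and .y.
rates <- map2_dbl(successes, trials, ~ .x / .y)

# Each call can also return something richer than a single number.
summaries <- map2(successes, trials, function(x, n) {
  list(rate = x / n, failures = n - x)
})
```

The first rate is 79 / 102, which matches the est column for row 1 above.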

Other map functions

pmap

If you have more than 2 items you need to map over, then you want pmap.

Your input must be a list, and each item in the list must contain the same number of items.

list(x = 1:3, y = 4:6, z = 7:9) %>%
pmap_dbl(function(x, y, z) x^y+z)
## [1] 8 40 738

If you pass in a dataframe then it will run a function for each row.

When the items in the list are named (as the columns of a dataframe always are), they are matched to the function's arguments by name rather than by position, so the argument names need to match.
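A short sketch of pmap over the rows of a dataframe, showing the matching by name. The params dataframe is made up for the example.

```r
library(purrr)

params <- data.frame(x = 1:3, y = 4:6, z = 7:9)

# One call per row; each column is passed to the argument of the same name.
by_row <- pmap_dbl(params, function(x, y, z) x^y + z)

# Because matching is by name, the order of the arguments does not matter.
reordered <- pmap_dbl(params, function(z, y, x) x^y + z)

# If the function doesn't use every column, add ... to absorb the rest.
two_cols <- pmap_dbl(params, function(x, y, ...) x^y)
```

Without the ... in the last call, pmap would pass z to a function that has no z argument and you would get an "unused argument" error.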

imap

imap works just like map, except it also passes an "index" as a second argument (the item's name, or its position if the input has no names), so your function will look like function(.x, .i).
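A quick sketch of imap with named and unnamed input. The scores vector is invented for the example. In the formula syntax the index is available as .y.

```r
library(purrr)

# Named input: the index is the name of each item.
scores <- c(alice = 90, bob = 72)
labelled <- imap_chr(scores, ~ paste0(.y, ": ", .x))

# Unnamed input: the index is the position of each item.
numbered <- imap_chr(c("a", "b"), function(.x, .i) paste(.i, .x))
```

Here labelled contains "alice: 90" and "bob: 72", and numbered contains "1 a" and "2 b".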

walk

walk/walk2/pwalk/iwalk call the function for its side effects and return the original input (invisibly). Useful for things like calling rmarkdown::render.
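As a lightweight stand-in for rendering reports, this sketch uses walk to write one text file per item. The folder and report names are made up for the example.

```r
library(purrr)

out_dir <- file.path(tempdir(), "walk_example")
dir.create(out_dir, showWarnings = FALSE)

reports <- c("north", "south")

# The function is called only for its side effect (writing a file).
# walk hands back its input unchanged, so it can sit in the middle of a pipe.
result <- walk(reports, function(name) {
  writeLines(paste("Report for", name),
             file.path(out_dir, paste0(name, ".txt")))
})
```

Using map here would also work, but it would build up a list of return values that nobody wants.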

modify

modify is for when your output is going to be the same type as your input. It can be awkward, as some functions subtly change the data type (e.g. integers being cast to doubles).
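A small sketch of the difference between map and the modify family on a dataframe. The df object is made up for the example, and modify_if is used so that only the character column is touched.

```r
library(purrr)

df <- data.frame(a = c(1.5, 2.5), b = c("x", "y"), stringsAsFactors = FALSE)

# map always returns a list, so the dataframe structure is lost.
mapped <- map(df, toupper)

# modify_if keeps the container type: the result is still a dataframe,
# with only the columns that pass the predicate changed.
modified <- modify_if(df, is.character, toupper)
```

Here mapped is a plain list of two character vectors, while modified is a dataframe whose b column is now "X", "Y" and whose a column is untouched.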


Other useful functions from purrr

Composing multiple functions together

You can chain multiple maps together, for example:

1:5 %>% map_dbl(~.x^2) %>% map_dbl(~.x+3)
## [1] 4 7 12 19 28

The compose function allows us to combine functions together.

map_dbl(1:5, compose(~.x^2, ~.x+3,
.dir="forward"))
## [1] 4 7 12 19 28

By default, compose works backwards, like the mathematical "dot" composition. You can change this by setting .dir="forward".
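To see the effect of the direction, here is a sketch with two made-up helper functions, square and add_3.

```r
library(purrr)

square <- function(x) x^2
add_3  <- function(x) x + 3

# Default (backward): the last function is applied first, i.e. square(add_3(x)).
backward <- compose(square, add_3)

# Forward: functions are applied in the order written, i.e. add_3(square(x)).
forward <- compose(square, add_3, .dir = "forward")

backward(2)   # (2 + 3)^2 = 25
forward(2)    # 2^2 + 3 = 7
```

The forward direction reads in the same order as a pipe, which is probably why it feels more natural alongside %>%.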

Partial application

We can partially apply a function's arguments to create a new, simpler function.

mean_na <- partial(mean, na.rm = TRUE)
mean_na(c(1:5, NA))
## [1] 3

A good example of when partial can be useful is in combination with compose: if we want to pass additional arguments to the second or third function in the chain, then we need to set these arguments with partial during the compose call.

compose(function_1,
        partial(function_2, arg = value),
        function_3)
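A concrete version of that template might look like the sketch below. format_pct is a made-up helper for this example, and .dir = "forward" is used so the functions run in the order written.

```r
library(purrr)

# Hypothetical helper: turn a proportion into a percentage label.
format_pct <- function(x, digits) {
  paste0(round(100 * x, digits), "%")
}

# Each step gets its extra argument fixed by partial, so the composed
# function only needs the data.
mean_as_pct <- compose(
  partial(mean, na.rm = TRUE),
  partial(format_pct, digits = 1),
  .dir = "forward"
)

mean_as_pct(c(0.25, 0.5, NA))
```

The mean of 0.25 and 0.5 (ignoring the NA) is 0.375, so this returns "37.5%".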

Using map to calculate confidence intervals in a tibble

Below we take the ae_attendances data and sum to get just the monthly performance values. We use the BinomCI function from the DescTools package to calculate the proportion of attendances that "breached" the 4 hour target, along with the 95% confidence intervals, and then subtract these from 1 to give the proportion seen within the target.

We wouldn't normally be able to call this function inside a tidy pipe because it's expecting a single value for the x and n arguments. Using map2 we run the function once for each pair of values. We compose with as_tibble to convert the results into a table - this helps us in the next step to unnest this column.

ae_attendances %>%
  group_by(period) %>%
  summarise_at(vars(attendances, breaches),
               sum) %>%
  mutate(ci = map2(breaches,
                   attendances,
                   compose(as_tibble,
                           BinomCI))) %>%
  unnest(cols = c(ci)) %>%
  select_if(negate(is.list)) %>%
  mutate_at(vars(est:upr.ci), ~1-.x) %>%
  # subtracting from 1 reverses the interval, so swap the bound names back
  select(period, est, lwr.ci = upr.ci, upr.ci = lwr.ci) %>%
  head(3) %>%
  knitr::kable(format = "html")
period      est        lwr.ci     upr.ci
2016-04-01  0.9003513  0.8999209  0.9007800
2016-05-01  0.9027556  0.9023512  0.9031584
2016-06-01  0.9055994  0.9051892  0.9060081

Further Reading

  • Advanced R contains an excellent section on Functional Programming
  • R4DS functions section is well worth reading also if you haven't created many of your own functions
  • Herding Cats with List Columns and Purrr shows some more advanced use cases for using purrr
  • To Purrr or not to purrr goes into a bit more detail about why you should care about functional programming
  • furrr is a package that provides future-compatible versions of the map functions that allow you to quickly parallelise your code