"Leading research, analysis and change from within the NHS"
The Strategy Unit is a specialist NHS team, based in Midlands and Lancashire CSU. We focus on the application of high-quality, multi-disciplinary analytical work.
Our team comes from diverse backgrounds. Our academic qualifications include maths, economics, history, natural sciences, medicine, sociology, business and management, psychology and political science. Our career and personal histories are just as varied.
Our staff are NHS employees, animated by NHS values. The Strategy Unit covers all its costs through project funding. But this is driven by need, not what we can sell. Any surplus is recycled for public benefit.
Tom Jemmett
Senior Healthcare Analyst
Purrr is a package that aims to provide "A complete and consistent functional programming toolkit for R."
Functional Programming is a Declarative Programming Paradigm that treats computation as the evaluation of Mathematical Functions.
From a Mango blog post
FP is the process of writing code in a structured way and through functions remove code duplications and redundancies. In effect, computations or evaluations are treated as mathematical functions and the output of a function only depends on the values of its inputs – known as arguments. FP ensures that any side-effects such as changes in state do not affect the expected output such that if you call the same function twice with the same arguments the function returns the same output.
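The "same arguments, same output" idea can be sketched with two small functions. Both add_pure and add_impure are hypothetical names made up for this illustration; they are not part of purrr.

```r
# A pure function: the result depends only on its arguments.
add_pure <- function(x, a) {
  x + a
}

# An impure function: the result depends on, and changes, outside state.
counter <- 0
add_impure <- function(x) {
  counter <<- counter + 1  # side effect: modifies a variable outside the function
  x + counter
}

pure_first  <- add_pure(1, 2)
pure_second <- add_pure(1, 2)    # same arguments, same result

impure_first  <- add_impure(1)
impure_second <- add_impure(1)   # same argument, different result
```

Calling add_pure twice gives the same answer, whereas add_impure drifts because of the hidden counter.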
As of version 0.3.3 there are some 177 functions exported by purrr!
Fortunately, most of these functions are variants of one another, and some 65 of these functions all fall under the same family of "map" functions.
map is a function that takes a vector (or list) and another function, and calls that function once for each item in the input vector (or list).
map <- function(.x, .f, ...)
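map returns a list, but you can use the map_ variants if you want to return a specific type of data as a vector:
map_lgl returns a logical (TRUE/FALSE) vector
map_int returns an integer vector
map_dbl returns a double vector
map_chr returns a character vector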
source: Wickham, H. (2020). 9 Functionals | Advanced R.
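A short sketch of the typed variants, using made-up input values. map_lgl, map_int, map_dbl and map_chr are the purrr variants for logical, integer, double and character vectors.

```r
library(purrr)

x <- c(1.5, -2, 3)

is_positive <- map_lgl(x, ~ .x > 0)                  # logical vector
lengths_int <- map_int(list(1:3, 1:5), length)       # integer vector
squares     <- map_dbl(x, ~ .x^2)                    # double vector
labels      <- map_chr(x, ~ paste0("value_", .x))    # character vector
```

Each variant raises an error if the function returns something that does not fit the requested type, which is usually what you want.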
map is very similar to the apply family of functions in Base R, but provides a much simpler and more consistent programming interface.
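One way to see the difference in consistency is to compare return types, sketched here with a hypothetical square function. The comparison with sapply and vapply is an illustration chosen for this example, not taken from the slides.

```r
library(purrr)

square <- function(x) x^2

# Base R: sapply guesses the return type from the results.
base_guess <- sapply(1:3, square)

# Base R: vapply is type-safe but needs a template value.
base_safe <- vapply(1:3, square, FUN.VALUE = numeric(1))

# purrr: the suffix states the return type.
purrr_dbl <- map_dbl(1:3, square)

# With empty input sapply returns a list, while map_dbl still
# returns a double vector.
empty_base  <- sapply(integer(0), square)
empty_purrr <- map_dbl(integer(0), square)
```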
Let's say we have a function
fn <- function(x, a) { x^2 + a }
We can use map to iterate over the numbers from 1 to 5 and return the results as a list.
map(1:5, fn, 3)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 7
##
## [[3]]
## [1] 12
##
## [[4]]
## [1] 19
##
## [[5]]
## [1] 28
If we want to simplify the results, we can use the map_dbl function, which will give us the results as a numeric vector.
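A minimal sketch of that call, repeating the definition of fn so the block runs on its own:

```r
library(purrr)

fn <- function(x, a) {
  x^2 + a
}

# Same call as with map(), but the result is a double vector
# (4, 7, 12, 19, 28) rather than a list.
result <- map_dbl(1:5, fn, 3)
```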
We can also define functions inside of map. We can either write out the full function syntax, or we can use a formula syntax like:
map_dbl(1:5, ~.x^2 + 3)
## [1] 4 7 12 19 28
In the formula syntax the argument is called .x.
Of course, if your function is vectorised you could just call fn(1:5, 3)...
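The three approaches mentioned on this slide can be put side by side. This is a sketch that repeats fn so it is self-contained; the variable names are only for illustration.

```r
library(purrr)

fn <- function(x, a) {
  x^2 + a
}

# Full anonymous function syntax.
with_function <- map_dbl(1:5, function(x) x^2 + 3)

# Formula shorthand: the argument is referred to as .x.
with_formula <- map_dbl(1:5, ~ .x^2 + 3)

# fn is vectorised (^ and + work element-wise), so no map is needed.
vectorised <- fn(1:5, 3)
```

All three give the same numeric vector, so map is most useful when the function is not vectorised.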
files <- dir(
  here::here("data", "ae_attendances"),
  "^\\d{4}-\\d{2}-\\d{2}.csv$",
  full.names = TRUE
)
# set the name of each item in the vector to be the date part of the file
names(files) <- files %>% str_extract("\\d{4}-\\d{2}-\\d{2}.csv")
# map over the list of files, read each csv, add a column called "file" with the
# value of the name of the item.
ae_attendances <- map_dfr(files, read_csv, col_types = "Dccddd", .id = "file")
knitr::kable(head(ae_attendances, 5), format="html")
file | period | org_code | type | attendances | breaches | admissions |
---|---|---|---|---|---|---|
2016-04-01.csv | 2016-04-01 | RF4 | 1 | 18788 | 4082 | 4074 |
2016-04-01.csv | 2016-04-01 | RF4 | 2 | 561 | 5 | 0 |
2016-04-01.csv | 2016-04-01 | RF4 | other | 2685 | 17 | 0 |
2016-04-01.csv | 2016-04-01 | R1H | 1 | 27396 | 5099 | 6424 |
2016-04-01.csv | 2016-04-01 | R1H | 2 | 700 | 5 | 0 |
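The same pattern can be tried without the A&E data. The sketch below writes two small CSV files of made-up values to a temporary folder and reads them back. It uses base read.csv instead of read_csv to keep dependencies down, and assumes dplyr is installed, which map_dfr needs for row-binding.

```r
library(purrr)

# Write two small CSV files to a temporary folder (made-up data).
folder <- file.path(tempdir(), "map_dfr_example")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(org_code = c("A", "B"), attendances = c(10, 20)),
          file.path(folder, "2016-04-01.csv"), row.names = FALSE)
write.csv(data.frame(org_code = c("A", "B"), attendances = c(30, 40)),
          file.path(folder, "2016-05-01.csv"), row.names = FALSE)

# List the files and name each path after its file name.
files <- dir(folder, "\\.csv$", full.names = TRUE)
names(files) <- basename(files)

# Read each file and row-bind the results; .id stores the item name
# in a column called "file".
combined <- map_dfr(files, read.csv, stringsAsFactors = FALSE, .id = "file")
```

Naming the vector first is what makes the .id column useful: without names it would contain positions rather than file names.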
Let's generate a table with random binomially distributed values:
df <- tibble(i = 1:20) %>%
  mutate(N = rpois(max(i), 100)) %>%
  mutate(p = map_dbl(N, rbinom, n = 1, prob = 0.75))
knitr::kable(head(df, 5), format="html")
i | N | p |
---|---|---|
1 | 102 | 79 |
2 | 89 | 67 |
3 | 97 | 75 |
4 | 113 | 88 |
5 | 103 | 84 |
We can use map2 to iterate over the pairs of numbers in the N and p columns, and use the BinomCI function (from DescTools) to calculate confidence intervals.
df %>%
  mutate(confint = map2(p, N, BinomCI) %>% map(as_tibble)) %>%
  unnest(cols = "confint") %>%
  head(5) %>%
  knitr::kable(format = "html")
i | N | p | est | lwr.ci | upr.ci |
---|---|---|---|---|---|
1 | 102 | 79 | 0.7745098 | 0.6843102 | 0.8447831 |
2 | 89 | 67 | 0.7528090 | 0.6539796 | 0.8307176 |
3 | 97 | 75 | 0.7731959 | 0.6803956 | 0.8451819 |
4 | 113 | 88 | 0.7787611 | 0.6937769 | 0.8454152 |
5 | 103 | 84 | 0.8155340 | 0.7297734 | 0.8786046 |
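For a version that runs without DescTools, here is a sketch using binom.test from base R's stats package as a stand-in. It computes exact (Clopper-Pearson) intervals, so the bounds will differ slightly from the BinomCI output above. The inputs are the first three rows of the table.

```r
library(purrr)

successes <- c(79, 67, 75)
trials    <- c(102, 89, 97)

# map2_dbl walks along both vectors in parallel:
# .x comes from the first vector, .y from the second.
proportions <- map2_dbl(successes, trials, ~ .x / .y)

# map2 returns a list, here one confidence interval (lower, upper) per pair.
intervals <- map2(successes, trials,
                  ~ as.numeric(binom.test(.x, .y)$conf.int))
```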
If you have more than 2 items you need to map over, then you want pmap:
Your input must be a list, and each item in the list must contain the same number of items.
list(x = 1:3, y = 4:6, z = 7:9) %>% pmap_dbl(function(x, y, z) x^y+z)
## [1] 8 40 738
If you pass in a dataframe then it will run a function for each row.
When the items in the list (or the columns in the dataframe) are named, they are matched to the function's arguments by name rather than by position, so the argument names need to match.
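A small sketch of the row-wise behaviour, reusing the x, y, z values from the list example in a data frame:

```r
library(purrr)

df <- data.frame(x = 1:3, y = 4:6, z = 7:9)

# One call per row; columns are matched to arguments by name.
by_row <- pmap_dbl(df, function(x, y, z) x^y + z)

# The argument order in the function does not matter, only the names.
reordered <- pmap_dbl(df, function(z, x, y) x^y + z)
```

If the data frame has columns the function does not use, add ... to the function's arguments so the extra columns are absorbed instead of causing an error.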
imap works just like map, except it also passes an "index" value (the item's name if the input is named, otherwise its position), so your function call will be function(.x, .i).
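A minimal sketch of imap with named and unnamed input. The data is made up; note that in the formula shorthand the index is available as .y.

```r
library(purrr)

scores <- c(alice = 90, bob = 85)

# Named input: the index is the name (.y in formula syntax).
named <- imap_chr(scores, ~ paste0(.y, ": ", .x))

# Unnamed input: the index is the position.
unnamed <- imap_chr(c("a", "b"), function(.x, .i) paste0(.i, "-", .x))
```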
walk/walk2/pwalk/iwalk call the function for its side effects, but return the original input. Useful for things like calling rmarkdown::render.
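A sketch of walk, using a made-up side effect (appending to a character vector) in place of something heavier like rendering a report:

```r
library(purrr)

messages <- character(0)

# walk calls the function for its side effect (here, appending to a vector)
# and returns its input invisibly, so it can sit in the middle of a pipe.
result <- walk(1:3, function(i) {
  messages <<- c(messages, paste("processing item", i))
})
```

Here result is just 1:3 again, while messages holds the three strings produced along the way.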
modify is for when your output is going to be the same type as your input. It can be awkward, as some functions subtly change the data type (e.g. integers cast to numerics).
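Assuming this slide describes purrr's modify function (the slide title did not survive extraction), the difference from map can be sketched with a small made-up data frame:

```r
library(purrr)

df <- data.frame(a = c(1, 2), b = c(3, 4))

# map always returns a list...
mapped <- map(df, ~ .x * 2)

# ...whereas modify keeps the type of its input, here a data frame.
modified <- modify(df, ~ .x * 2)
```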
You can chain multiple maps together, for example:
1:5 %>% map_dbl(~.x^2) %>% map_dbl(~.x+3)
## [1] 4 7 12 19 28
the compose function allows us to combine functions together.
map_dbl(1:5, compose(~.x^2, ~.x+3, .dir="forward"))
## [1] 4 7 12 19 28
By default, compose works backwards, like the mathematical "dot" composition; you can change this by setting .dir="forward".
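The effect of the direction can be sketched with two hypothetical helper functions, add_one and times_two:

```r
library(purrr)

add_one   <- function(x) x + 1
times_two <- function(x) x * 2

# Default (.dir = "backward"): the last function is applied first,
# i.e. add_one(times_two(x)).
backward <- compose(add_one, times_two)

# .dir = "forward": functions are applied left to right,
# i.e. times_two(add_one(x)).
forward <- compose(add_one, times_two, .dir = "forward")

backward_result <- backward(3)  # add_one(times_two(3)) = 7
forward_result  <- forward(3)   # times_two(add_one(3)) = 8
```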
We can partially apply a function's arguments to create a new, simpler function.
mean_na <- partial(mean, na.rm = TRUE)
mean_na(c(1:5, NA))
## [1] 3
A good example of when partial can be useful is in combination with compose: if we want to pass additional arguments to the second or third function in the chain, then we need to set these arguments with partial during the compose call:
compose(function_1, partial(function_2, arg = value), function_3)
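A concrete version of that pattern, as a sketch: mean_label is a hypothetical helper that takes the mean while ignoring missing values and then formats the result to two decimal places. Both extra arguments are fixed with partial.

```r
library(purrr)

# Hypothetical pipeline: mean ignoring NAs, then format with two decimals.
mean_label <- compose(
  partial(mean, na.rm = TRUE),
  partial(format, nsmall = 2),
  .dir = "forward"
)

label <- mean_label(c(1, 2, NA))  # "1.50"
```

Without partial there would be no way to hand nsmall to format, because compose only passes the previous result along the chain.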
Below we take the ae_attendances data and sum to get just the monthly performance values. We use the BinomCI function from the DescTools package to calculate the percentage of attendances that "breached" the 4 hour target, along with the 95% confidence intervals, and then subtract these values from 1 to get the percentage seen within the target. Subtracting from 1 reverses the interval, so the lower and upper bounds swap over.
We wouldn't normally be able to call this function inside a tidy pipe because it's expecting a single value for the x and n arguments. Using map2 we run the function once for each pair of values. We compose with as_tibble to convert the results into a table, which helps us in the next step to unnest this column.
ae_attendances %>%
  group_by(period) %>%
  summarise_at(vars(attendances, breaches), sum) %>%
  mutate(ci = map2(breaches, attendances, compose(as_tibble, BinomCI))) %>%
  unnest(cols = c(ci)) %>%
  select_if(negate(is.list)) %>%
  # 1 - x reverses the interval, so the bounds swap over
  mutate(est = 1 - est, lwr = 1 - upr.ci, upr = 1 - lwr.ci) %>%
  select(period, est, lwr.ci = lwr, upr.ci = upr) %>%
  head(3) %>%
  knitr::kable(format = "html")
period | est | lwr.ci | upr.ci |
---|---|---|---|
2016-04-01 | 0.9003513 | 0.8999209 | 0.9007800 |
2016-05-01 | 0.9027556 | 0.9023512 | 0.9031584 |
2016-06-01 | 0.9055994 | 0.9051892 | 0.9060081 |
"Leading research, analysis and change from within the NHS"
"Leading research, analysis and change from within the NHS"