library(palmerpenguins)
penguins
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <fct>, year <int>
This blog post is a basic overview of function composition, with implementation and examples in the programming language R. We will see:
- Some basic mathematical notation for composition.
- R code to create generic function compositions.
- How to implement composition with custom binary (infix) operators.
- Some concrete examples.
Leading with an example.
Consider the palmerpenguins data. Actually, just consider the year column.
head(penguins$year)
## [1] 2007 2007 2007 2007 2007 2007
How many unique years are there?
length(unique(penguins$year))
## [1] 3
How did we do that? First we removed all duplicates from the year vector.
(n = unique(penguins$year))
## [1] 2007 2008 2009
And then we counted the number of elements in that result.
length(n)
## [1] 3
So that was two functions, one after the other: unique and length.
Function composition is combining two or more functions into one function. Here is the “typical” way we could do that in R.
n_unique = function(x) length(unique(x))
n_unique(penguins$year)
## [1] 3
But this post will cover atypical ways of doing it. Obviously the atypical ways are less common, but they open up new interfaces and stylistic possibilities for our code. Let’s explore.
A barely-formal definition of composition.
In the abstract, what we saw above is \(g(f(x))\). We have two functions, \(g\) and \(f\). We evaluate \(f\) on \(x\), and then evaluate \(g\) on the result. We could define a new function \(h(x) = g(f(x))\) that means the same thing. This \(h\) is a function composition similar to n_unique above.
This is easy enough.
But here is another way to write function composition that better resembles what is to come. We can define \(h\) without reference to the input \(x\) at all. We can use the notation \(h = g \cdot f\), where the dot \(\cdot\) is an operator representing function composition. We can read it as “do \(g\) after \(f\)”. The resulting function is the same: \(h(x) = (g \cdot f)(x)\). Just like \(h\), the composition \(g \cdot f\) is a new function, and it can take an argument \(x\), but we do not actually need to refer to \(x\) when defining the composition.
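To make the notation concrete, here is a small sketch with two made-up functions, f (add one) and g (double). They are illustrative assumptions, not part of this post's running example. The sketch spells out "g after f" by hand and shows that swapping the order generally gives a different function.

```r
# Two toy functions (hypothetical, for illustration only)
f = function(x) x + 1   # add one
g = function(x) x * 2   # double

# h = g . f, written out explicitly: "do g after f"
h = function(x) g(f(x))

h(3)     # g(f(3)) = g(4) = 8
f(g(3))  # f(g(3)) = f(6) = 7, so the order of composition matters
```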
Implementing abstract function composition in R
I say “abstract” because, unlike our definition of n_unique above, we will write a tool to compose any functions, without reference to specific functions or specific arguments.
Here it is:
compose_left = function(g, f) {
  function(...) g(f(...))
}
compose_left takes two arguments, g and f, which are functions, and internally defines a new function that applies f and then g on some unspecified arguments (the dots).1
Here is how we would use this tool to define n_unique.
n_unique = compose_left(length, unique)
That’s it.
And it works, too.
n_unique(penguins$year)
## [1] 3
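The dots matter as soon as the inner function takes more than one argument. Below is a minimal sketch of that, along with the "partial function" workaround for the outer function. The names shout, round_1, and rounded_mean are hypothetical helpers invented for this illustration.

```r
compose_left = function(g, f) {
  function(...) g(f(...))
}

# The inner function f receives every argument passed to the composition.
shout = compose_left(toupper, paste)
shout("hello", "world")
# "HELLO WORLD"

# g only sees f's return value. To fix extra arguments for g,
# wrap it in a small function first (a "partial" function).
round_1 = function(x) round(x, digits = 1)
rounded_mean = compose_left(round_1, mean)

rounded_mean(c(1.23, 4.56, 7.89))
# 4.6

# Named arguments also flow through the dots to f (here, mean)
rounded_mean(c(1.23, NA, 7.89), na.rm = TRUE)
# 4.6
```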
Interface possibilities
The compose_left function we wrote takes functions as arguments. Functions that take functions as arguments are sometimes called higher-order functions, and they can have dramatic effects on the way we write code. You may have experienced this effect with other higher-order functions in R like lapply or purrr::map.
We will now discuss how to modify our interface to function composition, so we can invoke it in more convenient and ergonomic ways.
Rightward composition and its relationship to the pipe operator
You may be thinking that this is a little like the pipe operator. The pipe operator |> (or %>%, if you’re old school) also lets us combine functions with convenient syntax. But piping and composing are different. Composition lets you define a function ahead of time and pass the data whenever you want, or not at all. Piping forces you to pass the data now. This is extremely important because you can do many more wacky things with an unevaluated function object. You can use a composed function as an ingredient in other function compositions. You can “lift” this function with other higher-order procedures like *apply, Map/purrr::map, Reduce, and so on. But with the pipe, it’s all-or-nothing.2
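Here is a rough sketch of that difference. The vector x and the list called columns are made-up data for illustration, and the native pipe assumes R 4.1 or later.

```r
compose_left = function(g, f) {
  function(...) g(f(...))
}

x = c(2007, 2007, 2008, 2009)

# Piping evaluates immediately: the data must be supplied now.
# (The native pipe |> requires R 4.1 or later.)
x |> unique() |> length()
# 3

# Composition yields a function object we can hold on to...
n_unique = compose_left(length, unique)

# ...and hand to other higher-order functions later.
columns = list(a = c(1, 1, 2), b = c("x", "x", "x"), c = 1:5)
sapply(columns, n_unique)
# a b c
# 2 1 5
```

The piped expression is already a value by the time it finishes, while n_unique stays reusable for any input we meet later.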
A less important difference is that the pipe reads from left to right, while composition reads from right to left. \(g \cdot f\) means \(g\) after \(f\). I call this difference “less important” because we can simply write a rightward composition function…
# note the order of g and f
compose_right = function(f, g) {
  function(...) g(f(...))
}

n_unique = compose_right(unique, length)
n_unique(penguins$year)
## [1] 3
…to achieve the same orientation. There is no functional difference between compose_left(b, a) and compose_right(a, b).3
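As a quick sanity check of that claim, here is a small sketch on a made-up vector x. It only compares outputs on one input, so treat it as a spot check, not a proof.

```r
compose_left  = function(g, f) function(...) g(f(...))
compose_right = function(f, g) function(...) g(f(...))

x = c(3, 1, 3, 2, 1)

left  = compose_left(length, unique)   # length after unique
right = compose_right(unique, length)  # unique, then length

identical(left(x), right(x))
# TRUE
```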
Infix operators for function composition.
An infix operator is an operator that goes between its arguments, like + (add), <- (assign), and so on. Infix operators are nothing but syntax on top of a function that takes two arguments.4
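We can see this directly in base R: wrapping an operator in backticks lets us call it, or pass it around, like any other two-argument function. A minimal sketch:

```r
# Backticks let us call an infix operator like any other function.
2 + 3
`+`(2, 3)   # same result: 5

# Because operators are functions, they can be passed to higher-order functions.
sapply(1:3, `*`, 10)   # 10 20 30
Reduce(`+`, 1:4)       # 10
```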
We can declare our own infix operators in R by assigning a function to a special character sequence. This sequence must begin and end with %, and you must use backticks when the operator is defined.
# left: evokes g . f
`%.%` = compose_left
# right: evokes f ; g
`%;%` = compose_right
We could then use these operators to define n_unique like so:
# left
n_unique = length %.% unique
n_unique(penguins$year)
## [1] 3
# right
n_unique = unique %;% length
n_unique(penguins$year)
## [1] 3
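These operators also chain. User-defined %op% operators in R all share the same precedence and group from left to right, so a chain of three functions needs no parentheses. A small sketch, using a made-up vector x in place of the penguins data:

```r
compose_left  = function(g, f) function(...) g(f(...))
compose_right = function(f, g) function(...) g(f(...))
`%.%` = compose_left
`%;%` = compose_right

x = c(2007, 2007, 2008, 2009)

# Reads right to left: unique, then length, then as.character
chr_left = as.character %.% length %.% unique
chr_left(x)
# "3"

# Reads left to right: the same three steps in pipeline order
chr_right = unique %;% length %;% as.character
chr_right(x)
# "3"
```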
n-ary composition
The compose_left and compose_right functions each take two arguments. But we can compose as many functions as we want. Composition is associative, so \((a \cdot b) \cdot c\) is the same as \(a \cdot (b \cdot c)\). The result is the same as long as the order of function application doesn’t change.
So let’s write a compose function that can handle as many functions as we can give it.
compose = function(...) {
  fns = c(...)
  Reduce(compose_left, fns, init = identity)
}
We will introduce a third function for our example, as.character.
chr_n_unique = compose(as.character, length, unique)
chr_n_unique(penguins$year)
## [1] "3"
We implement this n-ary composition with an underlying call to Reduce. A reduction is a programming technique where a binary operation is repeatedly applied over a sequence of values until one output value remains. So our function compose(as.character, length, unique) is created in the following way:
- compose identity and as.character (any function f composed with identity is just f)
- compose the result up to now with length
- compose the result up to now with unique
So it creates essentially compose_left(compose_left(compose_left(identity, as.character), length), unique).
chr_n_unique = compose_left(compose_left(compose_left(identity, as.character), length), unique)
chr_n_unique(penguins$year)
## [1] "3"
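If Reduce is unfamiliar, it may help to watch it work on plain values before functions. The sketch below uses base R only. Setting accumulate = TRUE exposes each intermediate result, and init plays the same starting-value role that identity plays in compose.

```r
# Reduce folds a binary function over a sequence, left to right.
Reduce(`+`, 1:4)
# ((1 + 2) + 3) + 4 = 10

# accumulate = TRUE keeps every intermediate result.
Reduce(`+`, 1:4, accumulate = TRUE)
# 1 3 6 10

# init supplies a starting value that is combined with the first element.
Reduce(function(acc, s) paste0(acc, s), c("b", "c"), init = "a", accumulate = TRUE)
# "a" "ab" "abc"
```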
One clever thing about this design is that we can pass arguments as dots ... or as a vector c(...). This is useful because we can organize our functions with normal R data containers ahead of time: the functions are data! It also lets the syntax directly mimic the associative property of function composition in mathematics, as function compositions are equivalent regardless of the prior “grouping” of composed units.
yr = penguins$year
compose(c(as.character, length, unique))(yr)
## [1] "3"
compose(c(as.character, length), unique)(yr)
## [1] "3"
compose(as.character, c(length, unique))(yr)
## [1] "3"
compose(compose(as.character, length), unique)(yr)
## [1] "3"
compose(as.character, compose(length, unique))(yr)
## [1] "3"
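To lean on the "functions are data" idea a little more, here is a sketch that stores the steps in an ordinary list, extends that list, and checks the edge case of composing nothing at all. The names steps and more_steps and the small input vector are made up for illustration.

```r
compose_left = function(g, f) function(...) g(f(...))
compose = function(...) {
  fns = c(...)
  Reduce(compose_left, fns, init = identity)
}

# Store the steps in an ordinary list (outermost function first)
steps = list(as.character, length, unique)
compose(steps)(c(2007, 2007, 2008))
# "2"

# Lists of functions can be extended like any other data
more_steps = c(nchar, steps)
compose(more_steps)(c(2007, 2007, 2008))
# 1

# With no functions at all, Reduce returns init, so we get identity back
compose()(42)
# 42
```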
Footnotes
1. Why does this function take dots ... instead of one argument x? This lets us evaluate our first function (f) on potentially multiple arguments. The second function g doesn’t have the same flexibility (it has to work with the simple returned value from f), but if we needed to provide other arguments to g, we could insert a partial function for g.
2. I actually find the eager behavior of the pipe operator quite inconvenient in some cases. So much so that I am working on tools that let the user write “pipe-like” expressions without providing data and save those expressions as functions. And because these “unevaluated pipe expressions” are just functions, they can be composed with other functions, lifted with higher-order functions, and so on. This would improve the composability and reusability of data science code in R broadly. I wrote about the effort here.
3. Composing left seems more conventional in mathematics (from my limited point of view), but “postfix” notation also exists: \(g \cdot f\) can be written as \(f ; g\).
4. I am writing a different package called prefix that provides prefix function bindings for R’s built-in infix operators. This makes it easier to write a + b as add(a, b), or, more importantly, a |> add(b).