This blog post is a basic overview of function composition with implementation and examples in the programming language R. We will see:

Some basic mathematical notation for composition.
R code to create generic function compositions.
How to implement composition with custom binary (infix) operators.
Some concrete examples.

Leading with an example.

Consider the palmerpenguins data.

library(palmerpenguins)

penguins
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <fct>, year <int>

Actually, just consider the year column.

head(penguins$year)
## [1] 2007 2007 2007 2007 2007 2007

How many unique years are there?

length(unique(penguins$year))
## [1] 3

How did we do that? First we removed all duplicates from the year vector.

(n = unique(penguins$year))
## [1] 2007 2008 2009

And then we counted the number of elements in that result.

length(n)
## [1] 3

So that was two functions, one after the other: unique and length.

Function composition is combining two or more functions into one function. Here is the “typical” way we could do that in R.

n_unique = function(x) length(unique(x))

n_unique(penguins$year)
## [1] 3

But this post will cover atypical ways of doing it. They are less common, but they open up new interfaces and stylistic possibilities for our code. Let’s explore.

A barely-formal definition of composition.

In the abstract, what we saw above is \(g(f(x))\). We have two functions, \(g\) and \(f\). We evaluate \(f\) on \(x\), and then evaluate \(g\) on the result. We could define a new function \(h(x) = g(f(x))\) that means the same thing. This \(h\) is a function composition similar to n_unique above.

This is easy enough.

But here is another way to write function composition that better resembles what is to come. We can define \(h\) without reference to the input \(x\) at all. We can use the notation \(h = g \cdot f\), where the dot \(\cdot\) is an operator representing function composition. We can read it as “do \(g\) after \(f\)”. The resulting function is the same: \(h(x) = (g \cdot f)(x)\). Just like \(h\), the composition \(g \cdot f\) is a new function, and it can take an argument \(x\), but we do not actually need to refer to \(x\) when defining the composition.

Implementing abstract function composition in R

I say “abstract” because, unlike our definition of n_unique above, we will write a tool to compose any functions, without reference to specific functions or specific arguments.

Here it is:

compose_left = function(g, f) {
    function(...) g(f(...))
}

compose_left takes two arguments g and f, which are functions, and returns a new function that applies f to some unspecified arguments ... and then applies g to the result.¹
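
To make the role of the dots concrete, here is a small sketch with base R functions. The inner function can take several arguments, while the outer one only sees the single result. The helper name round_1 is made up for this illustration; it stands in for any "partial" version of the outer function.

```r
# Same definition as above, repeated so this block runs on its own.
compose_left = function(g, f) {
    function(...) g(f(...))
}

# f = sum accepts several arguments; g = sqrt receives sum's single result.
root_of_sum = compose_left(sqrt, sum)
root_of_sum(9, 16)
## [1] 5

# g needs an extra argument (digits), so we wrap it in a small function first.
# round_1 is a hypothetical helper used only in this sketch.
round_1 = function(x) round(x, digits = 1)
rounded_mean = compose_left(round_1, mean)
rounded_mean(c(1, 2, 4))
## [1] 2.3
```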

Here is how we would use this tool to define n_unique.

n_unique = compose_left(length, unique)

That’s it.

And it works, too.

n_unique(penguins$year)
## [1] 3

Interface possibilities

The compose_left function we wrote takes functions as arguments. Functions that take functions as arguments are sometimes called higher-order functions, and they can have dramatic effects on the way we write code. You may have experienced this effect with other higher-order functions in R like lapply or purrr::map.

We will now discuss how to modify our interface to function composition, so we can invoke it in more convenient and ergonomic ways.

Rightward composition and its relationship to the pipe operator

You may be thinking that this is a little like the pipe operator. The pipe operator |> (or %>%, if you’re old school) also lets us combine functions with convenient syntax. But piping and composing are different. Composition lets you define a function ahead of time and pass the data whenever you want, or not at all. Piping forces you to pass the data now. This is extremely important because you can do many more wacky things with an unevaluated function object. You can use a composed function as an ingredient in other function compositions. You can “lift” this function with other higher-order procedures like *apply, Map/purrr::map, Reduce, and so on. But with the pipe, it’s all-or-nothing.²
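
As a rough illustration of what an unevaluated function object buys us, the sketch below builds n_unique once and then hands it to sapply and to another composition. It uses the built-in mtcars data (my choice, so the block runs without extra packages) rather than the penguins.

```r
# Same definition as above, repeated so this block runs on its own.
compose_left = function(g, f) {
    function(...) g(f(...))
}

# Define the composed function now; no data is involved yet.
n_unique = compose_left(length, unique)

# Later, "lift" it over several columns with a higher-order function.
counts = sapply(mtcars[c("cyl", "gear", "carb")], n_unique)
counts

# Or use it as an ingredient in another composition.
chr_n_unique = compose_left(as.character, n_unique)
chr_n_unique(mtcars$cyl)
## [1] "3"
```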

A less important difference is that the pipe reads from left to right, while composition reads from right to left. \(g \cdot f\) means \(g\) after \(f\). I call this difference “less important” because we can simply write a rightward composition function…

# note the order of g and f
compose_right = function(f, g) {
    function(...) g(f(...))
}

n_unique = compose_right(unique, length)
n_unique(penguins$year)
## [1] 3

…to achieve the same orientation. There is no functional difference between compose_left(b, a) and compose_right(a, b).³
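
One way to convince ourselves of this equivalence is to define the rightward version directly in terms of the leftward one by flipping the arguments. This is an alternative sketch, not a replacement for the definition above, and the small year vector is made up for the example.

```r
compose_left = function(g, f) {
    function(...) g(f(...))
}

# Alternative definition: rightward composition is leftward composition
# with the arguments swapped.
compose_right = function(f, g) compose_left(g, f)

x = c(2007, 2007, 2008, 2009, 2009)

compose_left(length, unique)(x)
## [1] 3
compose_right(unique, length)(x)
## [1] 3
```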

Infix operators for function composition.

An infix operator is an operator that goes between its arguments, like + (add), <- (assign), and so on. Infix operators are nothing but syntax on top of a function that takes two arguments.⁴
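
A quick way to see this in base R is to call an operator by its backticked name, like any other function. The sketch below only uses built-in operators.

```r
# Infix form and ordinary function-call form give the same result.
1 + 2
## [1] 3
`+`(1, 2)
## [1] 3

# Because `+` is just a two-argument function, it can be handed to
# higher-order functions.
mapply(`+`, 1:3, 4:6)
## [1] 5 7 9
```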

We can declare our own infix operators in R by assigning a function to a special character sequence. This sequence must begin and end with %, and you must use backticks when the operator is defined.

# left: evokes g . f
`%.%` = compose_left

# right: evokes f ; g
`%;%` = compose_right

We could then use these operators to define n_unique like so:

# left
n_unique = length %.% unique
n_unique(penguins$year)
## [1] 3

# right
n_unique = unique %;% length 
n_unique(penguins$year)
## [1] 3
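
These operators can also be chained. R parses custom %...% operators from left to right, so the sketch below first builds as.character %.% length and then composes that with unique. The small year vector is made up for the example.

```r
compose_left = function(g, f) {
    function(...) g(f(...))
}
`%.%` = compose_left

# Parsed as (as.character %.% length) %.% unique
chr_n_unique = as.character %.% length %.% unique
chr_n_unique(c(2007, 2007, 2008, 2009))
## [1] "3"
```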

n-ary composition

The compose_left and compose_right functions each take two arguments. But we can compose as many functions as we want. Composition is associative, so \((a \cdot b) \cdot c\) is the same as \(a \cdot (b \cdot c)\). The result is the same as long as the order of function application doesn’t change.

So let’s write a compose function that can handle as many functions as we can give it.

compose = function(...) {
    fns = c(...)
    Reduce(compose_left, fns, init = identity)
}

We will introduce a third function for our example, as.character.

chr_n_unique = compose(as.character, length, unique)

chr_n_unique(penguins$year)
## [1] "3"

We implement this n-ary composition with an underlying call to Reduce. A reduction is a programming technique where a binary operation is repeatedly applied over a sequence of values until one output value remains. So our function compose(as.character, length, unique) is created in the following way:

compose identity and as.character (any function f composed with identity is just f)
compose the result up to now with length
compose the result up to now with unique

So it essentially creates compose_left(compose_left(compose_left(identity, as.character), length), unique).

chr_n_unique = compose_left(compose_left(compose_left(identity, as.character), length), unique)
chr_n_unique(penguins$year)
## [1] "3"
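
If Reduce is new to you, it may help to watch it work on plain values before functions. This is a small base R sketch; accumulate = TRUE keeps every intermediate result, and init supplies the starting value, just as identity does in compose.

```r
# Fold a binary operation over a sequence: ((1 + 2) + 3) + 4
Reduce(`+`, 1:4)
## [1] 10

# Keep the intermediate results to see each step of the reduction.
Reduce(`+`, 1:4, accumulate = TRUE)
## [1]  1  3  6 10

# With a starting value: paste0(paste0(paste0(">", "a"), "b"), "c")
Reduce(paste0, c("a", "b", "c"), init = ">")
## [1] ">abc"
```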

One clever thing about this design is that we can pass arguments as dots ... or as a vector c(...). This is useful because we can organize our functions with normal R data containers ahead of time: the functions are data! It also lets the syntax directly mimic the associative property of function composition in mathematics, as function compositions are equivalent regardless of the prior “grouping” of composed units.

yr = penguins$year

compose(c(as.character, length, unique))(yr)
## [1] "3"
compose(c(as.character, length), unique)(yr)
## [1] "3"
compose(as.character, c(length, unique))(yr)
## [1] "3"
compose(compose(as.character, length), unique)(yr)
## [1] "3"
compose(as.character, compose(length, unique))(yr)
## [1] "3"
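
Here is a sketch of the "functions are data" idea: the steps live in an ordinary list (the name steps is mine) and are composed later. It also shows a side effect of starting the reduction from identity, which is that composing nothing gives back identity. The year vector is made up so the block runs on its own.

```r
compose_left = function(g, f) {
    function(...) g(f(...))
}

compose = function(...) {
    fns = c(...)
    Reduce(compose_left, fns, init = identity)
}

# Store the steps in a list first (hypothetical name), compose afterwards.
steps = list(as.character, length, unique)
chr_n_unique = compose(steps)
chr_n_unique(c(2007, 2007, 2008, 2009))
## [1] "3"

# With no functions at all, Reduce returns its init value: identity.
compose()(42)
## [1] 42
```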

Footnotes

1. Why does this function take dots ... instead of one argument x? This lets us evaluate our first function (f) on potentially multiple arguments. The second function g doesn’t have the same flexibility (it has to work with the simple returned value from f), but if we needed to provide other arguments to g, we could insert a partial function for g.
2. I actually find the eager behavior of the pipe operator quite inconvenient in some cases. So much so that I am working on tools that let the user write “pipe-like” expressions without providing data and save those expressions as functions. And because these “unevaluated pipe expressions” are just functions, they can be composed with other functions, lifted with higher order functions, and so on. This would improve the composability and reusability of data science code in R broadly. I wrote about the effort here.
3. Composing left seems more conventional in mathematics (from my limited point of view), but “postfix” notation also exists: \(g \cdot f\) can be written as \(f ; g\).
4. I am writing a different package called prefix that provides prefix function bindings for R’s built-in infix operators. This makes it easier to write a + b as add(a, b), or, more importantly, a |> add(b).