Normal Forms

Posted on Mar 22, 2023

This semester, my advisor is teaching a PL theory course that is being offered at Columbia for the first time ever. The course content is super exciting to me: if I could redo college and extend my education with just one class, this would be it. I’m not officially involved with the course (as a TA or a student), but I’m unofficially and happily a guinea pig for the new lecture material.

As I’m following along the course, I’m planning on writing several posts discussing the course material. In particular, I’m going to focus on topics that I wished were more accessibly written about online. (Of course, accessibility is a totally subjective metric that depends on your prior background, and I’m not promising what I write will be any more accessible.)


We’re currently discussing the untyped lambda calculus, whose syntax is shown below:

\[ e \quad \coloncolonequals\quad x\quad|\quad \lambda x .\ e \quad|\quad e\ e \]

where \(e\) represents expressions and \(x\) represents variable names. The \(x\) inside a lambda abstraction is called its binder. Juxaposition (which denotes function application) are left-associative, so \(f\ a\ b\) means \((f\ a)\ b\), not \(f\ (a\ b)\).

A redex is an expression of the form:

\[ (\lambda x .\ e_1)\ e_2 \]

where a lambda abstraction appears on the left of an application.

We can reduce such an expression via a capture-avoiding substitution of \(e_2\) for every instance of \(x\) in \(e_1\):

\[ \frac{}{\quad (\lambda x .\ e_1)\ e_2 \quad \longrightarrow_\beta \quad e_1[x \colonequals e_2] \quad}\ \mathsf{beta} \]

This relation is called beta-reduction (also written as \(\beta\)-reduction in other places), hence the subscript under the reduction relation arrow, \(\longrightarrow_\beta\); this is a relation in the sense that \(\longrightarrow_\beta\) relates the pair of expressions surrounding it.

NOTE I specify this reduction relation as a rule of inference, with a long horizontal line separating zero or more premises above it from the conclusion below it; this notation is standard in PL literature.

According to the small-step operational semantics of untyped lambda calculus, computation is performed by repeatedly transforming the program (a lambda calculus expression) using the \(\mathsf{beta}\) rule. (There are other kinds of semantics which I won’t discuss here.)

Normal Forms

While the \(\mathsf{beta}\) rule tells us how to reduce a redex, it does not tell us where to look for redexes, nor does it specify which redex to reduce when there are multiple. That is the role of a reduction strategy, which I will talk about in a later blog post.

What you need to know about reduction strategies for the purposes of this post is that some reduction strategies do not care about certain redexes within an expression. When a reduction strategy cannot further reduce an expression (either because there are no redexes left, or because the strategy does not care about remaining redexes), that expression is said to be in a normal form.

I say “a normal form” here because different reduction strategies care about different sets of redexes, and thus lead to different kinds of normal forms. The normal form for one strategy may contain redexes that are reducible by another strategy.

Another way of looking at normal forms is that they describe what it means for a lambda calculus expression to have terminated. The “goal” of a reduction strategy is thus to compute a normal form of some expression (if one exists).

NOTE This blog post is only concerned with beta-reduction, so all the normal forms I discuss here are beta normal forms. In particular, I am avoiding the topic of eta-reduction, whereby expressions of the form \(\lambda x .\ e\ x\) can be reduced to \(e\); most real-world programming languages do not perform this kind of reduction.

So later in this post, when I write “weak head normal form”, I specifically mean “beta weak head normal form”, not “beta-eta weak head normal form”. This technicality is also why my specification of normal forms differ from those on Wikipedia.

Beta Normal Form (BNF)

The first normal form we’ll look at is beta normal form, which just describes an expression without any redexes. Anything in beta normal form cannot be further beta-reduced by any reduction strategy, so when you hear folks refer to “the normal form”, they’re probably referring this one.

Like reduction orders, normal forms tell you where to look for redexes (if any), but unlike reduction orders, they do not need to tell you what to do with each redex. Since normal forms need only describe the “shape” of an expression, we can specify a normal form as a grammar. Here’s how we specify beta normal form (written \(E_{BNF}\)):

\[ E_{BNF} \quad \coloncolonequals\quad \lambda x .\ E_{BNF} \quad|\quad x\ E_{BNF}^{\ *} \]

By Squinting A Little™, you should be able to see that this grammar describes a subset of untyped lambda calculus expressions. However, this grammar combines the production rules for variables and applications into a single rule, \(x\ E_{BNF}^{\ *}\), which means a variable \(x\) followed by zero or more beta normal forms (the superscript Kleene star means “zero or more repetitions”).

So why does the grammar for \(E_{BNF}\) eliminate redexes? Recall that juxtaposition is left-associative, so \(x\ E_{BNF_1}\ \ldots\ E_{BNF_n}\) means \(((x\ E_{BNF_1})\ \ldots)\ E_{BNF_n}\). By insisting that the left-most operand of any curried application is a variable, we preclude lambda abstractions from appearing as the left operand any application, thereby eliminating any possible redexes.

NOTE Instead of using a Kleene star, I could have written that last production rule as \(x\ E_{BNF_1}\ \ldots\ E_{BNF_n}\) but that seemed less clear to me. To me, the meaning of “\(\ldots\)” is imprecise, and the notation does not obviously encompass the case where \(n = 0\).

Weak Normal Form (WNF)

Beta normal form precludes redexes anywhere in an expression, but that form may be too restrictive. When we relax beta normal form to allow redexes that appear within the bodies of lambda abstractions, we get weak normal form:

\[ E_{WNF} \quad \coloncolonequals\quad \lambda x .\ e \quad|\quad x\ E_{WNF}^{\ *} \]

The grammar of \(E_{WNF}\) is quite similar to that of \(E_{BNF}\), except inside lambda bodies, we use the non-terminal \(e\) instead of \(E_{WNF}\), allowing any lambda calculus expression to appear there (including those containing redexes).

The fact that we don’t care about redexes “under lambdas” is what makes this normal form “weak”. This notion of weakness is meant to mirror our intuitions about conventional programming languages, where execution does not take place inside a function until all its actual arguments are applied.

Head Normal Form (HNF)

Head normal form is another variation on beta normal form that instead permits redexes in the arguments of a curried application. The term “head” refers to the fact that we only care about redexes in the “head” position of a curried application:

\[ E_{HNF} \quad \coloncolonequals\quad \lambda x .\ E_{HNF} \quad|\quad x\ e^* \]

Like with weak normal form, we use \(e\) to indicate the positions within an expression where we don’t care whether there are redexes. Note that head normal form is orthogonal to weak normal form; head normal form still precludes redexes within lambda abstraction bodies.

As far as I’m aware, head normal form isn’t an especially useful notion for real-world implementations of programming languages, because it still requires an absence of redexes under lambda abstractions. I include it in this discussion for the sake of completeness, and also to illustrate the meaning of “head” in “weak head normal form” (which we will look at next).

Weak Head Normal Form (WHNF)

As its name suggests, weak head normal form permits redexes in arguments and lambda bodies:

\[ E_{WHNF} \quad \coloncolonequals\quad \lambda x .\ e \quad|\quad x\ e^* \]

This form comes up a lot in discussions about Haskell and other lazy languages, which usually reduce terms to weak head normal form. In fact, the term seems to have been coined by Simon Peyton Jones in The Implementation of Functional Programming Languages to justify when to stop the reduction for a lazy functional language.

Categorizing Normal Forms

We’ve now seen a few normal forms, which can be categorized according to whether they allow redexes under lambdas (“weak”) in arguments (“head”):

Redexes in argumentsRedexes under lambdas
yesno ("weak")
yesbeta normal formweak normal form
no ("head")head normal formweak head normal form

This table (along with the nomenclature in this blog post) are adapted from Peter Sestoft’s Demonstrating Lambda Calculus Reduction.

Here are some facts about the relationship between these forms:

  • Any weak normal form is also in weak head normal form (\(E_{WNF} \subset E_{WHNF}\))
  • Any head normal form is in weak head normal form (\(E_{HNF} \subset E_{WHNF}\))
  • Weak head normal forms are either in weak normal form or in head normal form (\(E_{WHNF} = E_{WNF} \cup E_{HNF}\))
  • Weak and head are orthogonal properties (\(E_{WNF} \not\subset E_{HNF}\) and \(E_{HNF} \not\subset E_{WNF}\))
  • Any beta normal form is in weak, head, and weak head normal form (\(E_{BNF} \subset E_{WNF}\), \(E_{BNF} \subset E_{HNF}\), and \(E_{BNF} \subset E_{WHNF}\))


I want to mention one more “normal form” that I haven’t seen discussed in relation to the other normal forms: values. In conventional programming languages, values are what expressions evaluate to, so they function as the normal form of those languages.

Exactly what constitutes a value depends on the primitives available in that language. For instance, if a language contains Booleans (\(\mathtt{true}\) and \(\mathtt{false}\)) and conditional expressions (\(\mathtt{if}\ e\ \mathtt{then}\ e\ \mathtt{else}\ e\)), those Booleans are typically considered values, whereas the conditional is not. However, for functional languages based on the lambda calculus, values typically include at least lambda abstractions and variables:

\[ v \quad \coloncolonequals\quad \lambda x .\ e \quad|\quad x \]

Notice the similarity between the specification for values \(v\) and weak normal forms \(E_{WNF}\) and \(E_{WHNF}\). The difference is that the latter production rule for \(v\) precludes any applications. Thus, values are a subset of weak normal forms (\(v \subset E_{WNF}\)).

Values will later become relevant to typed lambda calculi, whose type systems enforce that well-typed expressions eventually reduce to a value. In other words, those type systems preclude expressions that may get stuck reducing ill-typed applications like \(\mathtt{true}\ \mathtt{false}\).

Examples Expressions

Here some examples of lambda calculus expressions that satisfy various normal forms discussed in this blog post:

\[(\lambda x .\ x\ x)\ (\lambda x .\ x\ x)\]
\[\lambda x .\ x\ x\]
\[f\ (\lambda x .\ x\ x)\ (\lambda x .\ x\ x)\]
\[f\ ((\lambda x .\ x\ x)\ (\lambda x .\ x\ x))\]
\[\lambda y .\ (\lambda x .\ x\ x)\ (\lambda x .\ x\ x)\]
\[\lambda y .\ y\ ((\lambda x .\ x\ x)\ (\lambda x .\ x\ x))\]

NOTE In these examples, the variable \(f\) is a free variable, meaning it is not introduced by the binder of an enclosing lambda abstraction.


In this post, I looked at a handful of normal forms, compared them, and gave a few examples of expressions that satisfy some normal forms but not others. I hope this brief survey helps at least some readers form a better intuition about what these terms mean.

What I haven’t done in this post is show where these normal forms are relevant, in the context of reduction strategies and type systems. I hope to do so in upcoming blog posts, so stay tuned.

I’ve also deliberately avoided grounding this discussion to Haskell, since there is plenty of other writing that already does so.

Please don’t hesitate to reach out via email or Twitter if there are any technical oversights in my writing, or if something could have been explained more accurately or clearly.



Online Citations

Further Reading