A structured approach to writing code for software and analyses

Luke Johnston

2025-10-23

Outline

Simple introduction to category theory
Category theory in computers: Functional programming
Benefits of functional programming
Example use cases in data analysis
Example use cases in software development

Category theory

Lots of math in category theory, but we’ll only cover the basics.

What is category theory?

Branch of mathematics.
Deals with abstract objects and connection between.
Used in many fields of math and programming.

Based on abstractions and compositions

Abstraction

Collection of things (“objects”).
Actions (“morphisms”) that transform the things.

Composition

Actions can combine into a new action.
Combined action produces the same result as sequence of them.

Similar to Graph Theory

Graphs have nodes and edges.
Categories have objects and arrows (called “morphisms”).

A full graph is a category

Category = composition of all objects and arrows

graph LR
  A(A) --> B(B)
  B --> C(C)
  A --> C

Objects can be anything

An abstraction of anything: a number, a string, a function.

Objects can contain other objects

A vector: Contains sub-objects of numbers or strings.
A data frame: Contains sub-objects of vectors.
A list/collection: Contains sub-objects of any type.

Types (of an object) are math objects with specific properties

Necessary for correct mathematical composability.
You can create any type you want.

Types are sets of values or other types

A boolean type can only have two values:

True
False

An integer type can have infinite values:

Negative infinity to positive infinity

Express types with `object: type` notation

A: Integer
B: String
D: Function

Can be empty (()):

y: ()

Product/sum type:

C: List(A, B)

Algebraic data types: Bigger container types

Holds other types or values.
Two kinds: Sum types and product types.
Can compose together (e.g. sum in product).

Algebraic data types: Sum types

Number of possible types/values is the sum of the types/values inside.

Colours: Red | Green | Blue
Bool: True | False

3 + 2 = 5 possible types/values

Algebraic data types: Product types

Number of possible types is the product of the types inside.

Colours: Red | Green | Blue
Bool: True | False

3 * 2 = 6 possible types

Types are necessary for actions

Actions take (input) a type and output a type.

Actions are math transformations

Transform one object to another object.
Action will always produce same type.
Don’t need to know how, only the input and output types.

Simplified syntax for actions

Actual math syntax is more complex.

f(A: Integer) -> B: String

g(B: String) -> C: Boolean

Actions are composable

. = composition.

g(f(A: Integer)) -> C: Boolean

h = f . g

h(A: Integer) -> C: Boolean

“f composed into g” or “f followed by g”

Composing via piping

Can “chain” or “pipe” functions via composition.

g(f(A)) = f(A) . g()

Piping helps with readability.

Graphically demonstrated with:

graph LR
  A("A: Integer") -->|"f()"| B(B: String) -->|"g()"| C(C: Boolean)
  A -->|"h()"| C

Functors: A type of algebraic structure

Type that allows an action to be applied to types inside.
But keeps the overall structure.
E.g. lists are usually functors.

Functor example: Map over a list

A: Integer
B: Boolean

map f(List(A, A)) -> List(B, B)

From product type to product type, but different types inside.

Graphically demonstrated with:

graph LR
    subgraph cat1 [List 1]
      A1(A: Integer)
      A2(A: Integer)
    end
    subgraph cat2 [List 2]
      B1(B: Boolean)
      B2(B: Boolean)
    end

    cat1:::functor -->|"map F"| cat2:::functor

    A1 -->|"f()"| B1
    A2 -->|"f()"| B2
    classDef functor fill:white;

F here is the type “function” that map uses as input.

Category theory in computers: Functional programming

Programming: Art of managing the complexity of solving a problem

Complex is only a problem for our limited human minds. Solved with:

(De-)Composition
Abstraction
Predictability

(De-)composition by breaking down a problem

Take a bigger problem
Decompose (break it down) into smaller problems/pieces
Solve each smaller piece
Compose (combine) those pieces together
Solve the bigger problem
Abstract away details of smaller pieces
Focus on general relationships between pieces

Abstraction: Hide complexity and details

Not have to imperatively state how exactly to do something, every time.

Immutability is an implicit assumption in category theory

Existing objects don’t change when doing an action.
Otherwise, the math won’t work.

No side effects allowed in “pure” functions

E.g. changing values of an existing type.
Can only output one type.

Explicit types described in function input and output

f(A: Integer) -> B: String

Same input always equals same output

Predictable and testable.

Functions are also types of objects

Can use them in other functions.

Higher order functions (e.g. for functors) that take a function as an input

Tools like map, filter, reduce.

Benefits of functional programming

Power comes from it’s mathematical foundation

Can construct mathematically provable programs.

Declarative (vs imperative) programming

Humans naturally think declaratively.
Computers operate imperatively.

“What do you want your program to produce?” vs “How do you want your program to produce it?”

Low-level languages abstract away the imperative steps

So that we can work declaratively (e.g. we don’t write in assembly or machine code).

Declarative allows us to focus on what we want to solve and the goal

Closer match between mental model of the design of solution to problem and the implementation.
Action on the object, e.g. “buy groceries” or “pick up the kids”.

Declarative tends to need less code, is more readable

Less code = easier to maintain and read.

Functional design pattern that can be used in many languages

Predictability and testability

Same output from same input.

Caching and speed

Functions can be memoized (cached when same input is used)

Easier parallel processing with pure functions and maps

graph LR
    subgraph cat1 [List 1]
      A1(A)
      A2(A)
    end
    subgraph cat2 [List 2]
      B1(B)
      B2(B)
    end

    cat1:::functor -->|"map F"| cat2:::functor

    A1 -->|"f() in core 1"| B1
    A2 -->|"f() in core 2"| B2

    classDef functor fill:white

Distributed/asynchronous computing: Futures and promises

Abstract containers (types):

Future = A value that will be available later
Promise = How to get that value later

Use cases in data analysis

Functional programming is very common in data analysis, though often not explicitly recognized.

Foundation of SQL is functional and declarative

Design of databases and the queries are functional.

Excel spreadsheet formulas are functional

Typical data analysis workflow

Decompose into small pieces/steps:

Clean data
Apply transformations
Model data
Extract results
Visualize results

Keep decomposing, e.g. within cleaning data

Convert empty string to missing.
Remove missing values.

Cleaning stage would be a “category”

Objects (or types) are “data at each step”.
Arrows are functions that transform data.

In R, there are few, simple objects/types

Most common type of objects are: data.frame, vector, list.
Are algebraic data types (e.g. data.frame and list are product types).
Are functors (can apply function to internal types).
Are (mostly) immutable.

In Python, it’s more complicated

Object types depend on the package you’re use.
Objects (classes) are mutable.
Functional programming features are mostly an afterthought.

Graphically demonstrated with:

graph TB
  A(Data: Raw) -->|"empty to missing"| B(Data: Empty as missing)
  B -->|"remove missing"| C(Data: Cleaned)

Zoom in: Convert to missingness is a functor (map) step

graph LR
    subgraph df1 [Data frame]
      A1(Column 1)
      A2(Column 2)
    end
    subgraph df2 [Data frame]
      B1(Modified<br>column 1)
      B2(Modified<br>column 2)
    end

    df1:::functor -->|"map F"| df2:::functor

    A1 -->|"f()"| B1
    A2 -->|"f()"| B2

    classDef functor fill:white;

Applying to other cases: Mapping a model function over a list of formula

graph LR
    subgraph cat1 [List 1]
      A1(Model<br>formula 1)
      A2(Model<br>formula 2)
    end
    subgraph cat2 [List 2]
      B1(Results 1)
      B2(Results 2)
    end

    cat1:::functor -->|"map F(formula, data)"| cat2:::functor

    A1 -->|"f()"| B1
    A2 -->|"f()"| B2

    classDef functor fill:white;

Functional piping is (now) common in R—makes more readable code

result1 = data |> clean() |> transform()
result2 = transform(clean(data))
identical(result1, result2)

Python doesn’t have piping.

Data cleaning example in R

removing_missing <- function(data) { ... }
standardize_column_names <- function(data) { ... }
clean_data <- function(data) {
  data |>
    remove_missing() |>
    standardize_column_names()
}
cleaned_data <- clean_data(raw_data)

`targets` R package: Manage complex analysis pipelines

Requires writing functionally.

File: _targets.R

library(targets)
# ...
list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, get_data(file)),
  tar_target(model, fit_model(data)),
  tar_target(plot, plot_model(model, data))
)

Code from targets documentation.

Easy parallel processing with furrr R package

Simple because of functional programming design:

library(furrr)
plan(multisession)
result <- future_map(
  list_of_data,
  transform_function
)

See furrr documentation.

Parallel processing isn’t as easy in Python

Need functional programming in Python to do (multiprocessing).

Use cases in software development

Design stage: Focus on objects and their types

(Not objects in object-oriented programming.)

Then build up actions around those objects.
Helps build the foundation of the software.

Better testing and predictability

No state changes, no side effects.

Make objects with actions, compose together

Decompose objects with clear actions on them.

Put objects into containers as functors, use map

Build containers for objects to act as functors.
Apply functions to all objects via functors.

Big benefit: Emphasis on objects’ types

Strong typing helps enforce correct types and compositions

Rust

fn add(a: i32, b: i32) -> i32 {
    a + b
}

Python

def add(a: int, b: int) -> int:
    return a + b

Python types are not enforced. R doesn’t have these type annotations.

Only one output type per function

Keeps it simple and predictable. So this isn’t functional in Python:

def keep_num_gt_zero(x: int) -> int | None:
    if x > 0:
        return x
    return None

Solve multiple output types with railway-oriented programming

graph LR
    In(Input) -->|f1| fn{internal}
    fn --> A1(Type A)
    fn --> B1(Type B)
    A1 & B1 -->|return| Out(Sum type)
    Out -->|"f2"| A2(Result A)
    Out -->|"f3"| B2(Result B)

Example using sum types: Enums in Rust

enum Direction {
    North,
    South,
    East,
    West
}
Direction::North

Direction can only have one value and there are four possible values.

Example using sum types: Enums in Rust

enum GameResult {
    \\ Integer
    Score(i8),
    \\ Lost
    Lost,
}
GameResult::Score(10)
GameResult::Lost

Either score or lost, but not both.

Sum types: Enums in Python

from enum import Enum
class Direction(Enum):
    North = "North"
    South = "South"
    East = "East"
    West = "West"

Need to add a value to the enum in Python.

Monads: A way to handle side effects

Essentially is a sum or product type as object.
Must implement a flat map function (e.g. “unwraps” a container).

Illustration of a monad and flat map

graph LR
    subgraph M [Sum type]
      A(Type A)
      B(Type B)
    end

    M::monad -->|M f| Out(Type A)
    A -->|f| Out

    classDef monad fill:white;

Example monad: Option type in Rust

enum Option<T> {
    Some(T),
    None,
}

Example monad: Using `Option` type in Rust

fn divide(n1: i32, n2: i32) -> Option<i32> {
    if n2 == 0 {
        None
    } else {
        Some(n1 / n2)
    }
}
fn add_one(n: Option<i32>) -> i32 { ... }

Example: Option type in Python with `Optional`

from typing import Optional
def divide(n1: int, n2: int) -> Optional[int]:
    if n2 == 0:
        return None
    return n1 // n2

Or use `Maybe` from `returns` package

from returns.maybe import Maybe, Some, Nothing
def divide(n1: int, n2: int) -> Maybe[int]:
    if n2 == 0:
        return Nothing
    return Some(n1 // n2)

returns package docs.

Example of handling errors: Result type

At least two possible values: Success and its type or Error

enum Result<T, E> {
    Ok(T), // Successful result
    Err(E), // Why error occurred
}

Easy to see errors by looking at function signature with `Result`

fn read_file(path: String) -> Result<String, String> {
    // ... Pseudo-code
    if file_contents {
        Ok(file_content)
    } else {
        Err("File not found".to_string())
    }
}

Use `Result` as “railway” intersection

Need to convert Result to String before using it.

fn process(data: String) -> String { ... }
read_file("file.txt")
    .expect("Didn't read file.")
    .process()
read_file("file.txt")?.process()

Explicitly handle `Result` with `match`

let file = read_file("file.txt")
match file {
    Ok(content) => process(content),
    // Crash program
    Err(error) => panic!("{}", error),
}

Can use `Results` from `returns` package in Python

Summarising

Category theory

Objects and actions
Types as sets of values
Sum types and product types
Composition of objects and actions
Functors: Functions used in a container (e.g. map)
Monads: Unwrap a container (e.g. flat map)

Functional programming

Declarative programming
Design focus via composition, abstraction, flow
Same input = same output, no side effects
Functions are objects
Objects are immutable
Higher order functions (functors like map)
Monads for side effects (e.g. Option, Result)

Resources

https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-preface/
https://kindafunctional.com/index.html
https://en.wikipedia.org/wiki/Functional_programming
https://en.wikipedia.org/wiki/Category_theory
https://en.wikipedia.org/wiki/Monad_(functional_programming)
https://algocademy.com/blog/understanding-monads-in-functional-programming-a-comprehensive-guide/
https://www.turingtaco.com/functors-the-key-to-scalable-functional-code/

A structured approach to writing code for software and analyses

Outline

Category theory

What is category theory?

Based on abstractions and compositions

Abstraction

Composition

Similar to Graph Theory

A full graph is a category

Objects can be anything

Objects can contain other objects

Types (of an object) are math objects with specific properties

Types are sets of values or other types

Express types with object: type notation

Algebraic data types: Bigger container types

Algebraic data types: Sum types

Algebraic data types: Product types

Types are necessary for actions

Actions are math transformations

Simplified syntax for actions

Actions are composable

Composing via piping

Graphically demonstrated with:

Functors: A type of algebraic structure

Functor example: Map over a list

Graphically demonstrated with:

Category theory in computers: Functional programming

Programming: Art of managing the complexity of solving a problem

(De-)composition by breaking down a problem

Abstraction: Hide complexity and details

Immutability is an implicit assumption in category theory

No side effects allowed in “pure” functions

Explicit types described in function input and output

Same input always equals same output

Functions are also types of objects

Higher order functions (e.g. for functors) that take a function as an input

Benefits of functional programming

Power comes from it’s mathematical foundation

Declarative (vs imperative) programming

Low-level languages abstract away the imperative steps

Declarative allows us to focus on what we want to solve and the goal

Declarative tends to need less code, is more readable

Functional design pattern that can be used in many languages

Predictability and testability

Caching and speed

Easier parallel processing with pure functions and maps

Distributed/asynchronous computing: Futures and promises

Use cases in data analysis

Foundation of SQL is functional and declarative

Excel spreadsheet formulas are functional

Typical data analysis workflow

Keep decomposing, e.g. within cleaning data

Cleaning stage would be a “category”

In R, there are few, simple objects/types

In Python, it’s more complicated

Graphically demonstrated with:

Zoom in: Convert to missingness is a functor (map) step

Applying to other cases: Mapping a model function over a list of formula

Functional piping is (now) common in R—makes more readable code

Data cleaning example in R

targets R package: Manage complex analysis pipelines

Easy parallel processing with furrr R package

Parallel processing isn’t as easy in Python

Use cases in software development

Design stage: Focus on objects and their types

Better testing and predictability

Make objects with actions, compose together

Put objects into containers as functors, use map

Big benefit: Emphasis on objects’ types

Strong typing helps enforce correct types and compositions

Rust

Python

Only one output type per function

Solve multiple output types with railway-oriented programming

Example using sum types: Enums in Rust

Example using sum types: Enums in Rust

Sum types: Enums in Python

Monads: A way to handle side effects

Illustration of a monad and flat map

Example monad: Option type in Rust

Express types with `object: type` notation

`targets` R package: Manage complex analysis pipelines

Example monad: Using `Option` type in Rust

Example: Option type in Python with `Optional`

Or use `Maybe` from `returns` package

Easy to see errors by looking at function signature with `Result`

Use `Result` as “railway” intersection

Explicitly handle `Result` with `match`

Can use `Results` from `returns` package in Python