A structured approach to writing code for software and analyses

Luke Johnston

2025-10-23

Outline

  • Simple introduction to category theory

  • Category theory in computers: Functional programming

  • Benefits of functional programming

  • Example use cases in data analysis

  • Example use cases in software development

Category theory

Lots of math in category theory, but we’ll only cover the basics.

What is category theory?

  • Branch of mathematics.
  • Deals with abstract objects and connection between.
  • Used in many fields of math and programming.

Based on abstractions and compositions

Abstraction

  • Collection of things (“objects”).
  • Actions (“morphisms”) that transform the things.

Composition

  • Actions can combine into a new action.
  • Combined action produces the same result as sequence of them.

Similar to Graph Theory

  • Graphs have nodes and edges.

  • Categories have objects and arrows (called “morphisms”).

A full graph is a category

Category = composition of all objects and arrows

graph LR
  A(A) --> B(B)
  B --> C(C)
  A --> C

Objects can be anything

An abstraction of anything: a number, a string, a function.

Objects can contain other objects

  • A vector: Contains sub-objects of numbers or strings.

  • A data frame: Contains sub-objects of vectors.

  • A list/collection: Contains sub-objects of any type.

Types (of an object) are math objects with specific properties

  • Necessary for correct mathematical composability.

  • You can create any type you want.

Types are sets of values or other types

A boolean type can only have two values:

  • True
  • False

An integer type can have infinite values:

  • Negative infinity to positive infinity

Express types with object: type notation

A: Integer
B: String
D: Function

Can be empty (()):

y: ()

Product/sum type:

C: List(A, B)

Algebraic data types: Bigger container types

  • Holds other types or values.

  • Two kinds: Sum types and product types.

  • Can compose together (e.g. sum in product).

Algebraic data types: Sum types

Number of possible types/values is the sum of the types/values inside.

Colours: Red | Green | Blue
Bool: True | False

3 + 2 = 5 possible types/values

Algebraic data types: Product types

Number of possible types is the product of the types inside.

Colours: Red | Green | Blue
Bool: True | False

3 * 2 = 6 possible types

Types are necessary for actions

Actions take (input) a type and output a type.

Actions are math transformations

  • Transform one object to another object.

  • Action will always produce same type.

  • Don’t need to know how, only the input and output types.

Simplified syntax for actions

Actual math syntax is more complex.

f(A: Integer) -> B: String

g(B: String) -> C: Boolean

Actions are composable

. = composition.

g(f(A: Integer)) -> C: Boolean

h = f . g

h(A: Integer) -> C: Boolean

“f composed into g” or “f followed by g”

Composing via piping

Can “chain” or “pipe” functions via composition.

g(f(A)) = f(A) . g()

Piping helps with readability.

Graphically demonstrated with:

graph LR
  A("A: Integer") -->|"f()"| B(B: String) -->|"g()"| C(C: Boolean)
  A -->|"h()"| C

Functors: A type of algebraic structure

  • Type that allows an action to be applied to types inside.

  • But keeps the overall structure.

  • E.g. lists are usually functors.

Functor example: Map over a list

A: Integer
B: Boolean

map f(List(A, A)) -> List(B, B)

From product type to product type, but different types inside.

Graphically demonstrated with:

graph LR
    subgraph cat1 [List 1]
      A1(A: Integer)
      A2(A: Integer)
    end
    subgraph cat2 [List 2]
      B1(B: Boolean)
      B2(B: Boolean)
    end

    cat1:::functor -->|"map F"| cat2:::functor

    A1 -->|"f()"| B1
    A2 -->|"f()"| B2
    classDef functor fill:white;

F here is the type “function” that map uses as input.

Category theory in computers: Functional programming

Programming: Art of managing the complexity of solving a problem

Complex is only a problem for our limited human minds. Solved with:

  • (De-)Composition
  • Abstraction
  • Predictability

(De-)composition by breaking down a problem

  • Take a bigger problem
  • Decompose (break it down) into smaller problems/pieces
  • Solve each smaller piece
  • Compose (combine) those pieces together
  • Solve the bigger problem
  • Abstract away details of smaller pieces
  • Focus on general relationships between pieces

Abstraction: Hide complexity and details

Not have to imperatively state how exactly to do something, every time.

Immutability is an implicit assumption in category theory

  • Existing objects don’t change when doing an action.

  • Otherwise, the math won’t work.

No side effects allowed in “pure” functions

  • E.g. changing values of an existing type.

  • Can only output one type.

Explicit types described in function input and output

f(A: Integer) -> B: String

Same input always equals same output

Predictable and testable.

Functions are also types of objects

Can use them in other functions.

Higher order functions (e.g. for functors) that take a function as an input

Tools like map, filter, reduce.

Benefits of functional programming

Power comes from it’s mathematical foundation

Can construct mathematically provable programs.

Declarative (vs imperative) programming

  • Humans naturally think declaratively.

  • Computers operate imperatively.

“What do you want your program to produce?” vs “How do you want your program to produce it?”

Low-level languages abstract away the imperative steps

So that we can work declaratively (e.g. we don’t write in assembly or machine code).

Declarative allows us to focus on what we want to solve and the goal

  • Closer match between mental model of the design of solution to problem and the implementation.

  • Action on the object, e.g. “buy groceries” or “pick up the kids”.

Declarative tends to need less code, is more readable

Less code = easier to maintain and read.

Functional design pattern that can be used in many languages

Predictability and testability

Same output from same input.

Caching and speed

Functions can be memoized (cached when same input is used)

Easier parallel processing with pure functions and maps

graph LR
    subgraph cat1 [List 1]
      A1(A)
      A2(A)
    end
    subgraph cat2 [List 2]
      B1(B)
      B2(B)
    end

    cat1:::functor -->|"map F"| cat2:::functor

    A1 -->|"f() in core 1"| B1
    A2 -->|"f() in core 2"| B2

    classDef functor fill:white

Distributed/asynchronous computing: Futures and promises

Abstract containers (types):

  • Future = A value that will be available later

  • Promise = How to get that value later

Use cases in data analysis

Functional programming is very common in data analysis, though often not explicitly recognized.

Foundation of SQL is functional and declarative

Design of databases and the queries are functional.

Excel spreadsheet formulas are functional

Typical data analysis workflow

Decompose into small pieces/steps:

  • Clean data
  • Apply transformations
  • Model data
  • Extract results
  • Visualize results

Keep decomposing, e.g. within cleaning data

  • Convert empty string to missing.
  • Remove missing values.

Cleaning stage would be a “category”

  • Objects (or types) are “data at each step”.
  • Arrows are functions that transform data.

In R, there are few, simple objects/types

  • Most common type of objects are: data.frame, vector, list.

  • Are algebraic data types (e.g. data.frame and list are product types).

  • Are functors (can apply function to internal types).

  • Are (mostly) immutable.

In Python, it’s more complicated

  • Object types depend on the package you’re use.

  • Objects (classes) are mutable.

  • Functional programming features are mostly an afterthought.

Graphically demonstrated with:

graph TB
  A(Data: Raw) -->|"empty to missing"| B(Data: Empty as missing)
  B -->|"remove missing"| C(Data: Cleaned)

Zoom in: Convert to missingness is a functor (map) step

graph LR
    subgraph df1 [Data frame]
      A1(Column 1)
      A2(Column 2)
    end
    subgraph df2 [Data frame]
      B1(Modified<br>column 1)
      B2(Modified<br>column 2)
    end

    df1:::functor -->|"map F"| df2:::functor

    A1 -->|"f()"| B1
    A2 -->|"f()"| B2

    classDef functor fill:white;

Applying to other cases: Mapping a model function over a list of formula

graph LR
    subgraph cat1 [List 1]
      A1(Model<br>formula 1)
      A2(Model<br>formula 2)
    end
    subgraph cat2 [List 2]
      B1(Results 1)
      B2(Results 2)
    end

    cat1:::functor -->|"map F(formula, data)"| cat2:::functor

    A1 -->|"f()"| B1
    A2 -->|"f()"| B2

    classDef functor fill:white;

Functional piping is (now) common in R—makes more readable code

result1 = data |> clean() |> transform()
result2 = transform(clean(data))
identical(result1, result2)

Python doesn’t have piping.

Data cleaning example in R

removing_missing <- function(data) { ... }
standardize_column_names <- function(data) { ... }
clean_data <- function(data) {
  data |>
    remove_missing() |>
    standardize_column_names()
}
cleaned_data <- clean_data(raw_data)

targets R package: Manage complex analysis pipelines

Requires writing functionally.

File: _targets.R
library(targets)
# ...
list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, get_data(file)),
  tar_target(model, fit_model(data)),
  tar_target(plot, plot_model(model, data))
)

Code from targets documentation.

Easy parallel processing with furrr R package

Simple because of functional programming design:

library(furrr)
plan(multisession)
result <- future_map(
  list_of_data,
  transform_function
)

See furrr documentation.

Parallel processing isn’t as easy in Python

Need functional programming in Python to do (multiprocessing).

Use cases in software development

Design stage: Focus on objects and their types

(Not objects in object-oriented programming.)

  • Then build up actions around those objects.

  • Helps build the foundation of the software.

Better testing and predictability

  • No state changes, no side effects.

Make objects with actions, compose together

  • Decompose objects with clear actions on them.

Put objects into containers as functors, use map

  • Build containers for objects to act as functors.

  • Apply functions to all objects via functors.

Big benefit: Emphasis on objects’ types

Strong typing helps enforce correct types and compositions

Rust

fn add(a: i32, b: i32) -> i32 {
    a + b
}

Python

def add(a: int, b: int) -> int:
    return a + b

Python types are not enforced. R doesn’t have these type annotations.

Only one output type per function

Keeps it simple and predictable. So this isn’t functional in Python:

def keep_num_gt_zero(x: int) -> int | None:
    if x > 0:
        return x
    return None

Solve multiple output types with railway-oriented programming

graph LR
    In(Input) -->|f1| fn{internal}
    fn --> A1(Type A)
    fn --> B1(Type B)
    A1 & B1 -->|return| Out(Sum type)
    Out -->|"f2"| A2(Result A)
    Out -->|"f3"| B2(Result B)

Example using sum types: Enums in Rust

enum Direction {
    North,
    South,
    East,
    West
}
Direction::North

Direction can only have one value and there are four possible values.

Example using sum types: Enums in Rust

enum GameResult {
    \\ Integer
    Score(i8),
    \\ Lost
    Lost,
}
GameResult::Score(10)
GameResult::Lost

Either score or lost, but not both.

Sum types: Enums in Python

from enum import Enum
class Direction(Enum):
    North = "North"
    South = "South"
    East = "East"
    West = "West"

Need to add a value to the enum in Python.

Monads: A way to handle side effects

  • Essentially is a sum or product type as object.

  • Must implement a flat map function (e.g. “unwraps” a container).

Illustration of a monad and flat map

graph LR
    subgraph M [Sum type]
      A(Type A)
      B(Type B)
    end

    M::monad -->|M f| Out(Type A)
    A -->|f| Out

    classDef monad fill:white;

Example monad: Option type in Rust

enum Option<T> {
    Some(T),
    None,
}

Example monad: Using Option type in Rust

fn divide(n1: i32, n2: i32) -> Option<i32> {
    if n2 == 0 {
        None
    } else {
        Some(n1 / n2)
    }
}
fn add_one(n: Option<i32>) -> i32 { ... }

Example: Option type in Python with Optional

from typing import Optional
def divide(n1: int, n2: int) -> Optional[int]:
    if n2 == 0:
        return None
    return n1 // n2

Or use Maybe from returns package

from returns.maybe import Maybe, Some, Nothing
def divide(n1: int, n2: int) -> Maybe[int]:
    if n2 == 0:
        return Nothing
    return Some(n1 // n2)

returns package docs.

Example of handling errors: Result type

At least two possible values: Success and its type or Error

enum Result<T, E> {
    Ok(T), // Successful result
    Err(E), // Why error occurred
}

Easy to see errors by looking at function signature with Result

fn read_file(path: String) -> Result<String, String> {
    // ... Pseudo-code
    if file_contents {
        Ok(file_content)
    } else {
        Err("File not found".to_string())
    }
}

Use Result as “railway” intersection

Need to convert Result to String before using it.

fn process(data: String) -> String { ... }
read_file("file.txt")
    .expect("Didn't read file.")
    .process()
read_file("file.txt")?.process()

Explicitly handle Result with match

let file = read_file("file.txt")
match file {
    Ok(content) => process(content),
    // Crash program
    Err(error) => panic!("{}", error),
}

Can use Results from returns package in Python

Summarising

Category theory

  • Objects and actions
  • Types as sets of values
  • Sum types and product types
  • Composition of objects and actions
  • Functors: Functions used in a container (e.g. map)
  • Monads: Unwrap a container (e.g. flat map)

Functional programming

  • Declarative programming
  • Design focus via composition, abstraction, flow
  • Same input = same output, no side effects
  • Functions are objects
  • Objects are immutable
  • Higher order functions (functors like map)
  • Monads for side effects (e.g. Option, Result)

Resources

  • https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-preface/
  • https://kindafunctional.com/index.html
  • https://en.wikipedia.org/wiki/Functional_programming
  • https://en.wikipedia.org/wiki/Category_theory
  • https://en.wikipedia.org/wiki/Monad_(functional_programming)
  • https://algocademy.com/blog/understanding-monads-in-functional-programming-a-comprehensive-guide/
  • https://www.turingtaco.com/functors-the-key-to-scalable-functional-code/