TypeScript on LLVM: Monomorphization and Native Codegen

Why LLVM for TypeScript?

An ahead-of-time compiler lives in a different regime than a JIT. A JIT compiles while the user waits, so compile latency is the constraint. An AOT compiler like Perry compiles once — on the developer's machine or in CI — and the binary is executed millions of times afterwards. That asymmetry is exactly where a heavyweight optimizer pays for itself.

LLVM brings two decades of middle-end work: loop vectorization, loop-invariant code motion, global value numbering, sparse conditional constant propagation, aggressive inlining, alias analysis. Perry's job is to hand that machinery IR it can actually optimize — which is where TypeScript's type information comes in.

The lowering pipeline

Source is parsed with SWC, then lowered to a typed high-level IR (HIR) where the interesting decisions happen before LLVM ever sees the code:

Monomorphization. Generic functions and classes are specialized per concrete instantiation, the same strategy Rust and C++ use. Stack<number> and Stack<string> become two independent, fully typed specializations, so the optimizer works with concrete types instead of a generic dispatch blob, and generics cost nothing at run time.
Static dispatch. Where the receiver type is known at compile time, method calls compile to direct calls that LLVM can inline, not hash-table lookups.
Direct field access. Object fields resolve to compile-time indices, so a property read is a fixed-offset load — not a dictionary lookup.
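To make the first and third items concrete, here is a minimal hand-written sketch of what specialization and fixed-slot field access amount to. It is an illustration only, not Perry's actual output: the names StackOfNumber, StackOfString, POINT_X, POINT_Y, makePoint, and lengthSquared are hypothetical, and a real compiler performs these steps on its IR rather than in source.

```typescript
// What the programmer writes: one generic class.
class Stack<T> {
  private items: T[] = [];
  push(x: T): void { this.items.push(x); }
  pop(): T | undefined { return this.items.pop(); }
}

// Conceptually what monomorphization yields: one concrete copy per
// instantiation. These class names are hypothetical.
class StackOfNumber {
  private items: number[] = [];
  push(x: number): void { this.items.push(x); }
  pop(): number | undefined { return this.items.pop(); }
}

class StackOfString {
  private items: string[] = [];
  push(x: string): void { this.items.push(x); }
  pop(): string | undefined { return this.items.pop(); }
}

// Direct field access, modeled by hand: each field name becomes a
// compile-time slot index, so a read is an indexed load, not a lookup by name.
const POINT_X = 0;
const POINT_Y = 1;

function makePoint(x: number, y: number): Float64Array {
  const p = new Float64Array(2);
  p[POINT_X] = x;
  p[POINT_Y] = y;
  return p;
}

function lengthSquared(p: Float64Array): number {
  return p[POINT_X] * p[POINT_X] + p[POINT_Y] * p[POINT_Y];
}
```

The generic and specialized stacks behave identically. The difference is that each specialized copy mentions only one element type, which is what lets a backend pick concrete loads, stores, and call targets.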

NaN-boxing and inline lowerings

Where values are dynamic, Perry uses NaN-boxing: every value is a 64-bit word. Doubles are stored directly; objects, strings, booleans, null, and undefined are encoded into the unused bit patterns of an IEEE 754 quiet NaN. Numbers therefore cost nothing extra: arithmetic needs no boxing and no allocation.
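The encoding idea can be sketched in a few lines of TypeScript. Everything specific here is an assumption made for illustration: the tag values, the choice to keep the tag in the high 32 bits and the payload in the low 32 bits, and all function names are hypothetical, and Perry's real bit layout may differ. The sketch also allocates a DataView per value purely because TypeScript has no raw 64-bit word type; a native compiler keeps the word in a register.

```typescript
// Hypothetical layout: high word = quiet-NaN prefix | tag, low word = payload.
const QNAN_HI = 0x7ff80000;   // high 32 bits of a positive quiet NaN
const QNAN_MASK = 0xfff80000; // sign bit + exponent + quiet bit
const TAG_MASK = 0x0007ffff;  // remaining high-word mantissa bits hold the tag

const TAG_NULL = 1;
const TAG_UNDEFINED = 2;
const TAG_BOOL = 3;

// Eight raw bytes stand in for the 64-bit word.
type Boxed = DataView;

function boxNumber(n: number): Boxed {
  const word = new DataView(new ArrayBuffer(8));
  if (n !== n) {
    // Canonicalize NaN (tag 0) so a real NaN never looks like a tagged value.
    word.setUint32(0, QNAN_HI);
  } else {
    word.setFloat64(0, n); // the double's own bits, stored big-endian
  }
  return word;
}

function boxTagged(tag: number, payload: number): Boxed {
  const word = new DataView(new ArrayBuffer(8));
  word.setUint32(0, QNAN_HI | tag);
  word.setUint32(4, payload);
  return word;
}

// Returns 0 for plain numbers, otherwise the tag of the boxed value.
function tagOf(word: Boxed): number {
  const hi = word.getUint32(0);
  const hasNanPrefix = ((hi & QNAN_MASK) >>> 0) === QNAN_HI;
  return hasNanPrefix ? hi & TAG_MASK : 0;
}

function unboxNumber(word: Boxed): number {
  return word.getFloat64(0);
}

function unboxBool(word: Boxed): boolean {
  return word.getUint32(4) !== 0;
}

// Number fast path: a tag check, then ordinary floating-point addition.
function addNumbers(a: Boxed, b: Boxed): Boxed {
  if (tagOf(a) !== 0 || tagOf(b) !== 0) {
    throw new TypeError("non-number operands are not modeled in this sketch");
  }
  return boxNumber(unboxNumber(a) + unboxNumber(b));
}
```

In this model a number is recognized by the absence of a tag, so the arithmetic path is a check followed by a plain add. Every other kind of value pays for a mask and a compare before its payload can be used.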

The catch is that operations on non-number values need unpack-operate-repack bit sequences. If those sequences live as calls into a separately-compiled runtime, LLVM sees opaque black boxes and can't optimize across them. So Perry emits hot operations — property loads, method dispatch, object allocation — as inline LLVM IR that the optimizer can fuse and simplify. Object allocation, for example, compiles down to an inline thread-local bump allocation:

LLVM IR — inline bump allocation

%off_ptr = getelementptr i8, ptr %state, i64 8
%offset  = load i64, ptr %off_ptr        ; current bump offset
%new_off = add i64 %offset, 96           ; headers + 8 fields
%sz_ptr  = getelementptr i8, ptr %state, i64 16
%size    = load i64, ptr %sz_ptr         ; block capacity
%fits    = icmp ule i64 %new_off, %size
br i1 %fits, label %fast, label %slow
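The same check reads more easily as a small TypeScript model. This is a sketch under assumptions: the listing stops at the branch, so the fast-path body (advance the offset, return the old one) is the standard bump-allocator behavior rather than something shown in the IR, and the slow path is left unmodeled. The names ArenaState, OBJECT_BYTES, and bumpAlloc are hypothetical.

```typescript
// Mirrors the two fields the IR loads from %state.
interface ArenaState {
  offset: number; // current bump offset (loaded from %state + 8)
  size: number;   // block capacity (loaded from %state + 16)
}

const OBJECT_BYTES = 96; // same constant as the IR: headers + 8 fields

// Returns the start offset of the new object, or null when the block is
// full and the (unmodeled) slow path would have to take over.
function bumpAlloc(state: ArenaState, bytes: number): number | null {
  const newOff = state.offset + bytes; // %new_off
  if (newOff <= state.size) {          // %fits
    const start = state.offset;        // assumed fast path: object starts here
    state.offset = newOff;             // publish the advanced offset
    return start;
  }
  return null;                         // %slow: not modeled in this sketch
}
```

With a 200-byte block, two 96-byte objects fit at offsets 0 and 96. A third request fails the comparison and falls to the slow path without moving the offset.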

Why not Cranelift?

Perry's first backend was Cranelift — the codegen behind wasmtime, built for fast, predictable compilation. It was the right starting point, and it remains an excellent choice for JITs and sandboxed runtimes. Two things forced the switch:

The optimizer ceiling. Cranelift is deliberately a fast single-tier compiler: “decent code quickly,” which is the right trade for a JIT and the wrong one for an AOT compiler whose selling point is peak native performance.
arm64_32. Apple Watch uses an ABI (64-bit instructions, 32-bit pointers) that Cranelift doesn't support. For watchOS to exist as a target, LLVM was required — and maintaining two backends meant two sets of bugs, tests, and performance baselines.

The migration was not free: the first LLVM-only release regressed some benchmarks by up to 70x because hot operations initially went through opaque runtime helper calls. The recovery work (inline lowerings, the bump allocator above, better inlining boundaries) took the backend past Cranelift's numbers. By the time it settled, Perry matched or beat Node.js on every benchmark in its suite: two ties, and wins of 1.7x to 24.6x on the rest (April 2026). The full post-mortem is worth reading: "From Cranelift to LLVM".

Going deeper

The compiler internals page covers NaN-boxing, monomorphization, and static dispatch in more detail. On the blog, "Optimizing Everything" walks through the optimization work release by release, and "Gen GC, lazy JSON, and defensible benchmarks" explains how the benchmark methodology works (RUNS=11, median + p95). For the bigger picture, start at the TypeScript native compiler overview.
