The JIT of PHP 8
PHP has never run your code. It has always run a translation of it.
When you write $a + $b, the CPU never sees an addition. It sees a long loop that, on every turn, reads an intermediate instruction, looks up who knows how to handle it, executes it on data structures wrapped in metadata, and starts again. This layer of mediation is what makes PHP comfortable to write and slow to run. It is the interpreter tax.
The JIT (Just-In-Time compiler), introduced in PHP 8.0 in November 2020, is the first mechanism in the language capable of removing that tax: it translates PHP byte-code directly into native machine code while the program is running. The CPU stops interpreting and starts executing.
In this article we look at what really happens under the hood — from the compilation pipeline to the Zend VM loop, from the two JIT strategies to the CRTO configuration — with a benchmark that was run and verified, not imagined.
The journey of a PHP script
Before understanding what the JIT accelerates, you need to understand what slows things down. A PHP script goes through four stages before producing a result.
| Stage | Input | Output | Component |
|---|---|---|---|
| Lexing | Source code | Tokens | Lexer |
| Parsing | Tokens | AST | Parser |
| Compilation | AST | Opcodes | Compiler |
| Execution | Opcodes | Result | Zend VM |
The first three stages turn the text you write into opcodes — the low-level instructions of the Zend Engine, PHP's equivalent of byte-code. A line like $c = $a + $b is not a single operation: it compiles into something like ADD $a, $b -> ~tmp followed by ASSIGN $c, ~tmp. Each opcode has its own handler, a C function that knows how to execute it.
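The lexing stage can be observed from PHP itself. The following is a minimal sketch, assuming PHP 8.0 or later with the tokenizer extension enabled; `tokens_of` is a hypothetical helper name, not part of PHP. It only shows the first row of the table: parsing and opcode generation are not exposed this way.

```php
<?php
// Lexing stage only: list the tokens the lexer produces for one statement.
// tokens_of() is a hypothetical helper written for this illustration.
function tokens_of(string $source): array
{
    $tokens = [];
    foreach (PhpToken::tokenize($source) as $token) {
        if ($token->is(T_WHITESPACE)) {
            continue; // whitespace tokens add nothing to the picture
        }
        // For single-character tokens such as "=" the name is the character itself.
        $tokens[] = ['name' => $token->getTokenName(), 'text' => $token->text];
    }
    return $tokens;
}

foreach (tokens_of('<?php $c = $a + $b;') as $t) {
    printf("%-12s %s\n", $t['name'], trim($t['text']));
}
```

The parser then arranges these tokens into an AST, and the compiler turns that AST into the ADD and ASSIGN opcodes described above.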
The last stage is the one that matters. The Zend Virtual Machine is, in essence, a loop that does one single thing, millions of times:
while (opcode = next_opcode()) {
handler = handlers[opcode->type];
handler(opcode);
}
This is the heart of the interpreter. On each iteration the VM must decide which handler to call based on the opcode type — a dispatch that, repeated billions of times, becomes a measurable cost.
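To make the loop concrete, here is a toy model of it written in PHP. It is a sketch under stated assumptions: the opcode layout, `toy_handlers` and `toy_run` are invented for illustration and do not reflect the Zend Engine's real data structures, which are written in C.

```php
<?php
// Toy model of an opcode dispatch loop. Everything here is hypothetical
// and only mirrors the pseudocode above, not Zend Engine internals.

function toy_handlers(): array
{
    return [
        // ADD op1, op2 -> result
        'ADD' => function (array $op, array &$vars): void {
            $vars[$op['result']] = $vars[$op['op1']] + $vars[$op['op2']];
        },
        // ASSIGN result, op1
        'ASSIGN' => function (array $op, array &$vars): void {
            $vars[$op['result']] = $vars[$op['op1']];
        },
    ];
}

function toy_run(array $opcodes, array $vars): array
{
    $handlers = toy_handlers();
    foreach ($opcodes as $op) {
        $handler = $handlers[$op['type']]; // dispatch: find the handler
        $handler($op, $vars);              // execute it on the variable table
    }
    return $vars;
}

// $c = $a + $b, compiled by hand into two toy opcodes.
$program = [
    ['type' => 'ADD',    'op1' => 'a',    'op2' => 'b', 'result' => '~tmp'],
    ['type' => 'ASSIGN', 'op1' => '~tmp', 'result' => 'c'],
];

echo toy_run($program, ['a' => 2, 'b' => 3])['c'], "\n";
```

Even in this tiny model, one PHP-level addition costs two table lookups and two indirect calls before any arithmetic happens.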
Why it is slow
The problem is not just the dispatch. It is what the handlers do.
PHP has dynamic types. The same variable can hold an integer, then a string, then an array. To manage this flexibility, every value is wrapped in a structure called a zval, which carries the value together with a type tag; values that live on the heap, such as strings, arrays, and objects, also carry a reference count.
When the ADD handler executes $a + $b, it does not simply sum two numbers. It has to: read the zval of $a, read the zval of $b, check their types at runtime, decide whether it is an integer addition, a float addition, an addition on numeric strings that must first be converted, an array union, or an overloaded operator on an internal object, perform the right operation, and finally write a zval for the result.
All of this machinery fires even when $a and $b are always two integers, and they are so on every iteration of a loop that runs a million times.
The interpreter does not know this. It cannot know it: its job is to be ready for anything. And paying to be ready for anything, on every single opcode, is exactly what the JIT eliminates.
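The range of cases a single + has to cover is easy to see from userland. The sketch below is only an illustration of the language semantics the handler must respect; `add_result_type` is a hypothetical helper name.

```php
<?php
// One operator, several behaviours: the result type depends on the
// runtime types of the operands. add_result_type() is a hypothetical helper.
function add_result_type(mixed $a, mixed $b): string
{
    return get_debug_type($a + $b);
}

$cases = [
    'int + int'         => [1, 2],
    'int + float'       => [1, 2.5],
    'numeric string'    => ['3', 4],
    'array union'       => [[1], [1 => 2]],
    'integer overflow'  => [PHP_INT_MAX, 1],
];

foreach ($cases as $label => [$a, $b]) {
    printf("%-18s -> %s\n", $label, add_result_type($a, $b));
}
```

Each of these paths has to stay reachable from the generic handler, which is why it cannot skip the type checks.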
OPcache, the prerequisite
The JIT is not a separate entity. It lives inside OPcache, the extension PHP has used for years to avoid recompiling the same script on every request.
Without OPcache, every HTTP request redoes everything from scratch: lexing, parsing, compilation. For a script that does not change, it is pure waste. OPcache solves this by storing the compiled opcodes in shared memory. From the second request onwards, PHP jumps straight to execution.
Without OPcache: source -> tokens -> AST -> opcodes -> EXECUTION (every request)
With OPcache: [opcodes cached] -> opcodes -> EXECUTION (from 2nd request)
With JIT: [opcodes cached] -> [native code cached] -> CPU
OPcache eliminates recompilation. But execution stays interpreted: the opcodes still pass through the Zend VM loop. The JIT adds one more level to the cache — no longer just shared opcodes, but shared native code. This is why it is enabled through OPcache, and why it does not work if OPcache is switched off.
What the JIT actually does
The JIT takes the opcodes and, instead of feeding them to the VM loop, translates them into machine instructions the CPU runs directly. Two things disappear: the dispatch (there is no longer a loop deciding which handler to call) and, where possible, the zval machinery.
The second point is where the magic happens. It is called type inference.
If the JIT compiler manages to prove that, at a certain point in the code, a variable is always a float, then it does not need to emit the generic code that checks the type, handles zvals, and contemplates every possible case. It can emit a single machine instruction — an addsd on an SSE register for a float addition — and nothing else.
It is the difference between:
fetch zval $a; check type; fetch zval $b; check type;
branch on int/float/string/object; add; alloc result zval; write
and:
addsd xmm0, xmm1
One operation against a dozen. Multiplied by every iteration of a tight numeric loop, this is where the gain comes from.
Function JIT versus Tracing JIT
PHP does not have a single JIT, but two strategies for deciding what and when to compile. The choice changes the behaviour radically.
The Function JIT compiles entire functions into native code. When a function comes into play, it is translated in full, from start to finish, regardless of which branches will actually be taken. It is simple and predictable, but it has two flaws: it compiles cold code too, the kind that rarely runs, wasting time and memory; and it optimises each function in isolation, with limited knowledge of the types that actually flow through it.
The Tracing JIT follows the approach of tracing compilers such as those in LuaJIT and PyPy. It first lets the code run in the interpreter and profiles it, counting how many times each portion is executed. When a loop or a path becomes hot, it records a trace: the linear sequence of opcodes actually executed, following function calls and crossing their boundaries too. It observes the real types flowing through that path, and compiles the trace into highly specialised native code.
To stay correct, the Tracing JIT inserts guards: checks that verify the assumptions it made (this is an integer, that branch is taken) are still true. If a guard fails (the types change, the path deviates), the code exits the trace and falls back to the interpreter. By compiling only hot code and specialising it on the observed types, the Tracing JIT usually generates less native code than the Function JIT and tends to be the faster of the two on real applications.
It is the recommended mode, and in PHP 8.0 through 8.3 it is the default value of opcache.jit.
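A loop whose types change halfway through is the kind of code where a guard would be expected to fail. The sketch below shows only the PHP-level behaviour; `accumulate` is a hypothetical function, and what the JIT does at the switch point (side exit, new trace, or fallback) is an assumption that this snippet does not measure.

```php
<?php
// $acc starts as an int and becomes a float in the second half of the loop.
// A trace recorded during the first half would have assumed "int".
function accumulate(int $n): int|float
{
    $acc = 0;
    $half = intdiv($n, 2);
    for ($i = 0; $i < $n; $i++) {
        // First half adds an int, second half adds a float.
        $acc += ($i < $half) ? 1 : 0.5;
    }
    return $acc;
}

var_dump(accumulate(0));
var_dump(accumulate(4));
```

Keeping a hot variable on one type for the whole loop avoids giving the compiler a reason to leave the specialised path.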
The CRTO configuration
The JIT is controlled by the opcache.jit directive, which accepts a four-digit value in the CRTO format. Each digit governs a different aspect of the compiler.
| Position | Meaning | Values |
|---|---|---|
| C | CPU-specific optimisations | 0 none · 1 use AVX if available |
| R | Register allocation | 0 none · 1 block-local · 2 global |
| T | Trigger (when to compile) | 0 everything on load · 1 on first execution · 2 profile the first request · 3 profile on the fly · 4 (unused) · 5 tracing |
| O | Optimisation level | 0 no JIT · 1 minimal (call standard VM handlers) · 2 inline selected handlers · 3 type inference on a single function · 4 + call graph · 5 whole-script optimisation |
Two combinations have an alias, and they are the only ones you will use in practice:
opcache.jit=tracing equals 1254: Tracing JIT, global register allocation, AVX, type inference over the call tree.

opcache.jit=function equals 1205: Function JIT, global register allocation, AVX, every function compiled on script load, maximum optimisation level.
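Reading a CRTO value is just a matter of splitting four digits. The helper below is a hypothetical sketch, not something PHP provides: it handles only the four-digit form and the two aliases discussed here, and returns the raw digits rather than their descriptions.

```php
<?php
// Split an opcache.jit value into its four CRTO digits.
// decode_crto() is a hypothetical helper; PHP itself accepts other
// spellings that this sketch deliberately ignores.
function decode_crto(string $value): array
{
    $aliases = ['tracing' => '1254', 'function' => '1205'];
    $digits = $aliases[$value] ?? $value;

    if (!preg_match('/^[0-9]{4}$/', $digits)) {
        throw new InvalidArgumentException("Not a four-digit CRTO value: $value");
    }

    [$c, $r, $t, $o] = array_map('intval', str_split($digits));

    return ['cpu' => $c, 'registers' => $r, 'trigger' => $t, 'level' => $o];
}

print_r(decode_crto('tracing'));
print_r(decode_crto('1205'));
```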
There is one non-negotiable detail: the JIT does not activate until you reserve memory for it. The opcache.jit_buffer_size directive must be greater than zero — it is effectively the master switch. A typical configuration in php.ini:
opcache.enable=1
opcache.enable_cli=1
opcache.jit_buffer_size=128M
opcache.jit=tracing
Without jit_buffer_size, any value of opcache.jit is ignored and the JIT stays off.
The benchmark
The theory says the JIT transforms CPU-bound code and leaves the rest indifferent. Let us verify it on a real machine: an Intel Xeon at 2.80GHz, PHP 8.3.6 with Zend OPcache.
The ideal case for the JIT is pure numeric computation. The Mandelbrot set is perfect: a double loop that, for each pixel, iterates an operation on floats until a value diverges. No arrays, no strings, no I/O — just floating-point arithmetic inside a tight loop.
<?php
function mandelbrot(int $width, int $height, int $maxIter): int {
$sum = 0;
for ($py = 0; $py < $height; $py++) {
$y0 = ($py / $height) * 2.0 - 1.0;
for ($px = 0; $px < $width; $px++) {
$x0 = ($px / $width) * 3.0 - 2.0;
$x = 0.0;
$y = 0.0;
$iter = 0;
while ($x * $x + $y * $y <= 4.0 && $iter < $maxIter) {
$xtemp = $x * $x - $y * $y + $x0;
$y = 2.0 * $x * $y + $y0;
$x = $xtemp;
$iter++;
}
$sum += $iter;
}
}
return $sum;
}
$start = hrtime(true);
$result = mandelbrot(1000, 1000, 1000);
$end = hrtime(true);
printf("checksum: %d\n", $result);
printf("time: %.1f ms\n", ($end - $start) / 1e6);
Run without JIT, then with Tracing JIT and with Function JIT, on the same script:
=== NO JIT ===
checksum: 257988727
time: 8677.5 ms
=== JIT tracing (1254) ===
checksum: 257988727
time: 1154.0 ms
=== JIT function (1205) ===
checksum: 257988727
time: 1187.1 ms
The identical checksum in every case confirms the result is the same: the JIT does not change what the code does, only how fast it does it. And it does it about 7.5 times faster. Eight and a half seconds become just over one. This is the tight numeric loop collapsing, iteration after iteration, into direct SSE instructions instead of VM handlers on zvals.
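The script above times a single run. For anyone repeating the measurement, a common way to reduce noise is to take the best of several runs. The helper below is a sketch of that idea and not the harness used for the numbers above; `best_of` is a hypothetical name.

```php
<?php
// Run a callable several times and keep the fastest wall-clock time in ms.
// best_of() is a hypothetical helper, not the article's original harness.
function best_of(callable $fn, int $runs = 5): float
{
    if ($runs < 1) {
        throw new InvalidArgumentException('runs must be at least 1');
    }
    $best = INF;
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);
        $fn();
        $elapsedMs = (hrtime(true) - $start) / 1e6;
        $best = min($best, $elapsedMs);
    }
    return $best;
}

// Example with a cheap stand-in workload; substitute the function under test.
printf("best of 3: %.3f ms\n", best_of(fn () => array_sum(range(1, 100000)), 3));
```

With the tracing JIT the first run includes profiling and compilation, so the best-of figure is closer to steady-state speed than a single cold run.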
The counterexample
Now the opposite case. A typical workload in a real web application is not float arithmetic: it is populating arrays, reading string keys, navigating hash tables.
<?php
function arraywork(int $n): int {
$arr = [];
for ($i = 0; $i < $n; $i++) {
$arr["key_$i"] = $i * 2;
}
$sum = 0;
foreach ($arr as $k => $v) {
if (isset($arr[$k])) { $sum += $v; }
}
return $sum;
}
$r = 0;
for ($j = 0; $j < 200; $j++) { $r = arraywork(50000); }
=== Array-heavy (hash table, string keys) ===
[no jit] time: 895.9 ms
[tracing] time: 882.2 ms
A 1.5% gain. Statistical noise.
The reason is the same one we saw in the Zend VM: accessing $arr["key_$i"] is not a machine instruction, it is a hash table lookup — computing the hash, finding the bucket, handling the zval. That cost lives in the data structure, not in the interpreter dispatch. The JIT can remove the interpreter tax, but it cannot rewrite the nature of a dynamic hash table.
This is why applications like WordPress, made mostly of array accesses, string manipulation, and I/O to a database, see improvements from the JIT in the region of 5%, while a numeric computation flies. The JIT does not make PHP fast everywhere. It makes fast the code that was slow because of the interpreter, not the code that is slow for other reasons.
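The ratios quoted for the two benchmarks can be recomputed from the raw timings. The two function names are hypothetical; the numbers are the ones printed above.

```php
<?php
// Recompute the speedup and gain figures from the measured times (ms).
function speedup(float $baselineMs, float $candidateMs): float
{
    return $baselineMs / $candidateMs;
}

function gain_percent(float $baselineMs, float $candidateMs): float
{
    return ($baselineMs - $candidateMs) / $baselineMs * 100.0;
}

printf("mandelbrot, tracing:  %.2fx\n", speedup(8677.5, 1154.0));
printf("mandelbrot, function: %.2fx\n", speedup(8677.5, 1187.1));
printf("array-heavy, tracing: %.1f%%\n", gain_percent(895.9, 882.2));
```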
Verifying that the JIT is active
Enabling the JIT and actually using it are two different things. opcache_get_status() exposes the real state of the compiler:
<?php
$jit = opcache_get_status(false)['jit'];
printf("on: %s\n", $jit['on'] ? 'true' : 'false');
printf("opt_level: %d\n", $jit['opt_level']);
printf("used: %d bytes\n", $jit['buffer_size'] - $jit['buffer_free']);
on: true
opt_level: 4
used: 2448 bytes
The on field set to true confirms the JIT is operational. The fact that used is greater than zero — here 2448 bytes — proves it has actually compiled something: if the buffer stays untouched, the JIT is on but not working, a sign your code offers nothing hot enough to compile.
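The same check can be made defensive, since opcache_get_status() returns false when OPcache is not enabled and the function does not exist at all when the extension is missing. This is a sketch: `jit_summary` is a hypothetical helper that relies only on the on, buffer_size and buffer_free fields used above.

```php
<?php
// Summarise the JIT state without assuming OPcache is loaded or enabled.
// jit_summary() is a hypothetical helper written for this illustration.
function jit_summary(array|false $status): array
{
    $jit = is_array($status) ? ($status['jit'] ?? null) : null;
    if (!is_array($jit)) {
        return ['available' => false, 'on' => false, 'used_bytes' => 0];
    }
    $used = ($jit['buffer_size'] ?? 0) - ($jit['buffer_free'] ?? 0);
    return [
        'available'  => true,
        'on'         => (bool) ($jit['on'] ?? false),
        'used_bytes' => max(0, $used),
    ];
}

$status = function_exists('opcache_get_status') ? opcache_get_status(false) : false;
print_r(jit_summary($status));
```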
From DynASM to the IR framework
The way the JIT generates native code changed deeply between the versions of PHP 8, and it is a story worth knowing.
PHP 8.0 and its successors up to 8.3 — the one in the benchmark above — generate machine code with DynASM, the same dynamic assembler born for the LuaJIT project. With DynASM, the assembly is written by hand, in separate template files for each architecture: zend_jit_x86.dasc for Intel/AMD, and from 8.1 (2021) zend_jit_arm64.dasc for AArch64 — the backend that makes the JIT work on Apple Silicon. It is effective, but fragile: each architecture is separate code to maintain, and the room for sophisticated optimisations is limited.
PHP 8.4, in November 2024, replaced all of this with a new JIT based on Dmitry Stogov's IR framework. The idea is to introduce an intermediate representation — absent until then — between the opcodes and the native code. This IR is inspired by the Sea-of-Nodes model, the same one used by Java HotSpot's server compiler, by V8's TurboFan, and by Graal: data and control dependencies unified into a single graph, in a form close to SSA (Static Single Assignment), where each value is assigned only once.
The advantage is twofold. On one side, maintainability: instead of writing assembly by hand for every CPU, developers build the IR graph once, and the framework takes care of optimisation, register allocation, and code generation for each architecture. On the other, a real IR opens the door to optimisations that were impractical with DynASM. Stogov estimates a further 5-10% of performance and a smaller memory footprint for the compiled code.
The important point for those who write code: from the outside, nothing changes. The CRTO configuration, the tracing and function aliases, the behaviour — it all stays identical. It is an internal replacement of the engine, invisible above the line of the opcache.jit directive.
Conclusion
The JIT does not turn PHP into C, nor should it. It does not make everything fast, and promising that would be dishonest: we measured it, on a hash table the gain is noise.
What it does is more surgical. For years, every operation in PHP paid the same toll — the VM loop, the dispatch, the zval machinery — regardless of whether it was needed or not. For code waiting on a database, that toll is invisible, lost in the noise of I/O. But for a numeric loop that runs a million times, it is the dominant line item. The JIT pinpoints exactly that code, proves what is superfluous, and removes it.
The distance between PHP and compiled languages has never, at bottom, been a distance of syntax. It was the interpreter layer standing in between. On the code that deserves it, the JIT takes that layer out of the way, and the seven and a half seconds that vanish from the benchmark are the precise measure of how much it weighed.