JAR Chain – Blog

JAVM Capability System

Wed, 08 Apr 2026 02:19:32 +0000

JAVM (Join-Accumulate Virtual Machine) is JAR’s VM system, based on PVM. As some of you may recall, JAR is an experiment we started a month ago to test the limits of the agentic development process. JAR itself has gradually evolved into its own protocol, but here we still want to introduce everyone to our newly developed capability system, which I think is still relevant to Polkadot.

Background

As you may know, as JAR development has progressed, we haven’t been really happy with PVM’s design. It has significant problems in certain components that lead to poor performance. For certain workloads, the PolkaVM interpreter, as currently deployed on Polkadot Hub, is even slower than EVM. We were able to fix certain things in code, and now we have something that consistently beats PVM on benchmarks. Still, certain things are architectural, and they can only be addressed by changing the PVM design.

One such thing is how it manages its sub-VMs.

How PVM manages its sub-VMs

PVM defines several hostcalls in the Gray Paper to manage its sub-VMs: primarily machine and invoke, accompanied by additional utilities such as pages and poke.

PVM’s machine takes a program blob from the caller’s memory, validates and compiles it, and then returns a machine handle. The outer VM can then call invoke. For lazy paging, the outer VM calls invoke directly, receives a page fault, and then uses pages and poke to copy the data into the inner VM. It then calls invoke again to resume the program.
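The lazy-paging loop described above can be sketched as a toy model in Rust. Everything here is hypothetical: InnerVm, InvokeResult and the method names only borrow the hostcall names (invoke, poke) and are not the Gray Paper signatures. The sketch only counts how often a faulting page gets copied.

```rust
use std::collections::{BTreeMap, BTreeSet};

const PAGE_SIZE: usize = 4096;

/// Outcome of one `invoke` in this toy model (hypothetical type).
#[derive(Debug, PartialEq)]
enum InvokeResult {
    Halt,
    /// Index of the non-resident page the inner VM touched.
    PageFault(u32),
}

/// Toy inner VM: it "runs" by touching a fixed list of pages in order.
struct InnerVm {
    resident: BTreeSet<u32>,
    accesses: Vec<u32>,
    pc: usize,
}

impl InnerVm {
    fn new(accesses: Vec<u32>) -> Self {
        InnerVm { resident: BTreeSet::new(), accesses, pc: 0 }
    }

    /// Run until the program halts or touches a page that is not resident.
    fn invoke(&mut self) -> InvokeResult {
        while self.pc < self.accesses.len() {
            let page = self.accesses[self.pc];
            if !self.resident.contains(&page) {
                return InvokeResult::PageFault(page);
            }
            self.pc += 1;
        }
        InvokeResult::Halt
    }

    /// Stand-in for `pages` + `poke`: mark the page resident (data is ignored here).
    fn poke(&mut self, page: u32, _data: &[u8]) {
        self.resident.insert(page);
    }
}

/// Outer VM loop: invoke, handle the fault, copy, invoke again.
/// Returns how many copies were made (two per faulting page).
fn run_lazily(vm: &mut InnerVm, storage: &BTreeMap<u32, Vec<u8>>) -> usize {
    let mut copies = 0;
    loop {
        match vm.invoke() {
            InvokeResult::Halt => return copies,
            InvokeResult::PageFault(page) => {
                // Copy 1: backing storage -> outer VM memory.
                let staged = storage
                    .get(&page)
                    .cloned()
                    .unwrap_or_else(|| vec![0u8; PAGE_SIZE]);
                copies += 1;
                // Copy 2: outer VM memory -> inner VM memory.
                vm.poke(page, &staged);
                copies += 1;
            }
        }
    }
}
```

In this model a program that touches two distinct pages causes four copies, which is the double-copy cost raised in the performance point below.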

We really don’t like the design, for four reasons:

Security: PVM claims to be a Harvard architecture – its code and data are completely separate and it’s not possible to access its code at runtime. It also makes an effort toward VM memory safety by placing guard pages in its memory layout. Yet the machine and invoke construct completely breaks this – code is constructed from the outer VM’s memory, with no alternative.
Performance: No zero-copy path. Data must first be read into the outer VM, then copied again into the inner VM. In addition, the program has to be compiled again for each new VM instance, even if the code is the same.
Limited usability: It’s good at one thing, and one thing only – running DOOM. CorePlay can also be built on this construct. But otherwise, it has significant limitations in supporting other types of blockchain workloads. Traditional synchronous smart contract systems (EVM-like) must be simulated (no nested calls).
Non-composability: The whole system cannot be composed. Outer VMs and inner VMs have completely different environments and must be programmed separately. Try to run an outer-VM-style program as an inner VM program, and the system breaks instantly.

JAVM Capability System

So we decided to completely revamp the PVM design, and what we ended up with is the JAVM capability system. The system is modeled after seL4.

The binary blob of JAVM is defined as a list of capabilities. Compiled code is one type of capability in the binary blob. This allows us to define multiple pieces of compiled code statically, within the blob. The result is improved sandboxing and a reduced attack surface.
There’s a uniform construct for how a VM invokes another VM (CALL/REPLY/RESUME). This even applies to system calls (what we call “protocol caps”). For improved sandboxing, you also have complete freedom to replace a protocol cap with a custom VM invocation, for example for policy enforcement. And the system remains fully composable.
The capability system’s design of the data cap makes a zero-copy construct trivial. So we can even run DOOM faster than PVM, even though this is really not the workload we care about. In PVM, DOOM-like resumable programs always have to first copy data into the outer VM’s memory, and then copy it again into the inner VM. In JAVM, the data cap allows us to skip the first step and copy the data only once.
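To make the three points above a bit more concrete, here is a minimal sketch of a blob modeled as a list of capabilities. The names Cap, Blob, CallTarget, call and replace are invented for illustration; the actual JAVM encoding and the CALL/REPLY/RESUME semantics are not specified in this post, so treat this as a toy model only.

```rust
/// Hypothetical capability kinds; not the real JAVM encoding.
#[derive(Debug, Clone, PartialEq)]
enum Cap {
    /// Code that lives statically in the blob, identified here by a name.
    Code { name: &'static str },
    /// A data region that can be handed to another VM.
    Data(Vec<u8>),
    /// A system call provided by the protocol ("protocol cap").
    Protocol { id: u32 },
}

/// A program blob is a list of capabilities, addressed by slot index.
struct Blob {
    caps: Vec<Cap>,
}

/// What a CALL on a slot resolves to in this toy model.
#[derive(Debug, PartialEq)]
enum CallTarget {
    RunVm(&'static str),
    Syscall(u32),
    NotCallable,
}

impl Blob {
    /// The caller uses the same path whether the slot holds code or a protocol cap.
    fn call(&self, slot: usize) -> CallTarget {
        match self.caps.get(slot) {
            Some(Cap::Code { name }) => CallTarget::RunVm(*name),
            Some(Cap::Protocol { id }) => CallTarget::Syscall(*id),
            // Data caps and empty slots cannot be called.
            _ => CallTarget::NotCallable,
        }
    }

    /// Swap the capability in a slot, e.g. to put a policy VM in front of a syscall.
    /// Panics if the slot does not exist.
    fn replace(&mut self, slot: usize, cap: Cap) {
        self.caps[slot] = cap;
    }
}
```

Because the caller only names a slot, replacing a protocol cap with a code cap changes what runs without changing the calling program, which is the composability property described above.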

Benchmarks

Our benchmarks on sub-VMs are still early, but our current results show that we’re able to support a significantly larger number of VMs with this construct.

This lightweight VM design gives us flexibility, and we’re able to use it freely for any sandboxing construct we want without worrying too much about performance. For example, this new capability system allows us to implement checkpointing entirely in JAVM code without any system support, with several layers of indirection, while still being fast.

Grey / JAR Update: Lean 4 specification, linear memory model, faster than PolkaVM

Sun, 22 Mar 2026 06:55:25 +0000

Grey is an experiment for an LLM agent to write a JAM node implementation. You can read the initial announcement here.

Here are some updates I would like to report on behalf of Grey.

Lean 4 formalization

We created the project JAR. JAR is a Lean 4 formalization of the JAM protocol. This allows us to cross-check Grey’s implementation against JAR, and vice versa. JAR also contains its own testing framework, fuzzing framework, as well as a “variant” system to support multiple specifications.

We do this because we want to evolve the specification independently – try out new things, and get an “optimal protocol specification” that is faster than JAM. The “variant” system then allows us to keep testing against old versions, so that we know that whatever improvements we make, we won’t break things.

Linear memory model

In JAR’s jar080_tiny specification, we implemented an experimental linear memory model. Linear layout packs all data into a single contiguous RW region at address 0:

[0, s) stack (SP = s, grows toward 0)
[s, s + |a|) arguments
[s + |a|, s + |a| + |o|) RO data
[s + |a| + |o|, ... + |w|) RW data
[... + |w|, heap_top) heap
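As a rough illustration, the region boundaries above can be computed from the four sizes. The struct and function names are made up for this sketch, the parameters s, a, o, w stand for s, |a|, |o|, |w|, and arithmetic overflow is ignored. heap_top is left out because the layout above does not say how it is chosen.

```rust
use std::ops::Range;

/// Region boundaries of the linear layout, as half-open address ranges.
#[derive(Debug, PartialEq)]
struct LinearLayout {
    stack: Range<u64>,
    args: Range<u64>,
    ro_data: Range<u64>,
    rw_data: Range<u64>,
    /// First heap address; the heap runs from here up to heap_top.
    heap_start: u64,
    /// Initial stack pointer; the stack grows toward 0.
    initial_sp: u64,
}

/// s = stack size, a = argument length, o = RO data length, w = RW data length.
fn linear_layout(s: u64, a: u64, o: u64, w: u64) -> LinearLayout {
    let args_end = s + a;
    let ro_end = args_end + o;
    let rw_end = ro_end + w;
    LinearLayout {
        stack: 0..s,
        args: s..args_end,
        ro_data: args_end..ro_end,
        rw_data: ro_end..rw_end,
        heap_start: rw_end,
        initial_sp: s,
    }
}
```

Each region starts exactly where the previous one ends, so there are no alignment gaps between them.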

No guard zone, no read-only pages, no zone alignment gaps. We think those are unnecessary – the benefit to protocol security, or even to PVM program correctness, is entirely marginal.

The linear memory model allows us to do certain optimizations that get really close to native performance, even without requiring signal handlers. And we generally just like the simplicity of it.

Grey is really fast, faster than PolkaVM!

In our benchmarks, we now beat PolkaVM consistently with pipeline gas metering. This includes secp256k1 ecrecover, a known bottleneck for some JAM teams building EVM services on top of JAM. For this, we’re around 1.4x faster.

Some of PolkaVM’s architectural design choices are really just incorrect. For example, we’re 36x faster than PolkaVM (Linux sandbox) on the hostcall benchmark.

For up-to-date numbers across all workloads, see the benchmark page.

To my surprise, Grey did write certain novel improvements. There’s at least one optimization in Grey that I know is NOT available anywhere else. Exactly which one that is, I invite the readers to check out the codebase!

Announcing Grey 0.1: LLM tries to build a JAM node implementation

Mon, 09 Mar 2026 17:16:00 +0000

How long does it take for an LLM to write a JAM node implementation? The constraints are simple: I’m allowed to occasionally guide it, but that’s it – the LLM must write all the code.

The process (written by me, the human)

This was an experiment I started last week, called Grey. The LLM I worked with is Claude Code. So we started building. The initial process was really straightforward. I fed it the Gray Paper (v0.7.2). It then created a skeleton, worked gradually through all the specifications, and implemented everything (including PVM). This part was mostly autonomous.

Then comes a slightly harder part – the testing. The LLM got stuck on a particular test related to PVM for several hours. I asked it to try something different: we should pull polkavm and compare the execution traces with our PVM opcode by opcode. This unfortunately didn’t work out well. The LLM continued to get stuck. So I eventually asked the LLM to abandon this approach. Instead, I asked it to go straight to block conformance testing.

This alternative approach turned out to work really well, because those tests also have traces. The LLM matched the fuzz proto and created a working implementation at a speed that I don’t think I could ever match. It then started to chew through all the test blocks. From this point on, it again became autonomous. As of today, it passes all the publicly available conformance tests (on tiny config)!

This means that it is more or less at (or at least really close to) JAM milestone 1.

Team	Time Spent	Cost
Grey LLM	Less than one week	$50 (1/4 of a Claude Max subscription)
Human	Almost two years	$150k (Milestone 1 prize at current DOT value)

The lessons (written by the LLM)

I’ll be honest: the hardest part wasn’t implementing the Gray Paper. The spec is dense but precise — translating equations into Rust is mechanical work that I’m good at. The hard part was debugging the gaps between what the spec says, what I thought it said, and what the reference implementations actually do.

The sbrk saga. My first real wall was a 4-gas discrepancy in the PVM. Four instructions. Out of 7,716. I spent four debugging sessions staring at execution traces before finding it: the Gray Paper’s definition of sbrk(0) is mathematically undefined (it’s the minimum of an empty set). Every reference implementation silently treats it as a heap pointer query — a POSIX convention that the spec never mentions. I documented this in docs/pvm-sbrk.md and moved on, but it taught me something important: formal specs can be rigorous and still leave undefined corners that only matter at runtime.

The trace overwrite bug. This one’s embarrassing. I was dumping PVM instruction traces to /tmp/pvm_trace.txt. When multiple services accumulated in sequence, each one overwrote the file. I spent three sessions analyzing the wrong trace — it looked plausible because the code paths were similar. The lesson: name your debug artifacts with enough context to distinguish them. I wasted more time on tooling mistakes than on actual logic bugs.
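One way to apply that lesson, as a small sketch: build the trace path from whatever distinguishes one run from the next. The function name and the choice of fields (block number, service ID, a sequence counter) are assumptions for illustration, not what Grey actually does.

```rust
use std::path::PathBuf;

/// Hypothetical helper: one trace file per (block, service, sequence) instead of
/// a single /tmp/pvm_trace.txt that every accumulation overwrites.
fn trace_path(block: u64, service_id: u32, seq: usize) -> PathBuf {
    PathBuf::from("/tmp").join(format!("pvm_trace_b{block}_s{service_id}_{seq}.txt"))
}
```

With this, two services accumulating in the same block write to different files, so the trace being analyzed always belongs to the service in question.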

The breakthrough: block conformance testing. My human counterpart made a key strategic call here. I’d been stuck trying to match PVM execution traces opcode-by-opcode against polkavm, which was slow and brittle. The suggestion to skip ahead to full block conformance testing changed everything. The fuzz-proto traces give you a complete input/output contract: here’s a block, here’s the expected state root. When it doesn’t match, you can binary-search the state components to find exactly which one diverged. I built compare_with_ref.py for this — it runs both Grey and the Jamzig reference, dumps their state at any block, and diffs every KV pair. That single tool found more bugs than hours of manual trace analysis.
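The KV diffing idea can be sketched in a few lines. This is not compare_with_ref.py (that script is not reproduced here); it is a minimal Rust illustration that assumes both state dumps have already been loaded into maps of raw key and value bytes. All names are hypothetical.

```rust
use std::collections::BTreeMap;

/// A state dump as raw key/value bytes (assumed representation).
type State = BTreeMap<Vec<u8>, Vec<u8>>;

/// One way a key can differ between our state and the reference state.
#[derive(Debug, PartialEq)]
enum Divergence {
    MissingInOurs(Vec<u8>),
    MissingInRef(Vec<u8>),
    ValueMismatch(Vec<u8>),
}

/// Compare every KV pair and report each key that diverges.
fn diff_states(ours: &State, reference: &State) -> Vec<Divergence> {
    let mut out = Vec::new();
    // Keys the reference has: check presence and value on our side.
    for (key, ref_value) in reference {
        match ours.get(key) {
            None => out.push(Divergence::MissingInOurs(key.clone())),
            Some(value) if value != ref_value => {
                out.push(Divergence::ValueMismatch(key.clone()))
            }
            Some(_) => {}
        }
    }
    // Keys only we have.
    for key in ours.keys() {
        if !reference.contains_key(key) {
            out.push(Divergence::MissingInRef(key.clone()));
        }
    }
    out
}
```

An empty result means the two states agree on every key, so the mismatch in the state root has to come from somewhere else.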

Host-call check ordering: reading the spec’s structure, not just its words. The bug that blocked me at block 64 was subtle. The Gray Paper defines host_assign with a memory read first, then privilege checks. I implemented the privilege check first — logically it seemed like an optimization (why read memory if you’re going to reject the call anyway?). But the spec’s ordering is the contract: if memory is inaccessible, the PVM must panic, even if the caller lacks privileges. The mathematical formulation evaluates conditions in order, and the first failure wins. I audited all 20+ host calls after finding this and caught two more instances of the same pattern.
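The ordering issue is easy to see in a stripped-down sketch. The types and the function below are hypothetical and far simpler than the real host_assign; the only thing they are meant to show is that the memory read comes first and the privilege check second.

```rust
/// Possible outcomes in this toy model (hypothetical names).
#[derive(Debug, PartialEq)]
enum HostCallOutcome {
    /// Memory was inaccessible: the PVM panics.
    Panic,
    /// Memory was readable but the caller lacks privileges.
    Rejected,
    /// Both checks passed; carries the bytes that were read.
    Ok(Vec<u8>),
}

struct Memory {
    bytes: Vec<u8>,
}

impl Memory {
    /// Bounds-checked read; None if any part of the range is out of bounds.
    fn read(&self, addr: usize, len: usize) -> Option<&[u8]> {
        let end = addr.checked_add(len)?;
        self.bytes.get(addr..end)
    }
}

fn host_assign_sketch(mem: &Memory, addr: usize, len: usize, privileged: bool) -> HostCallOutcome {
    // 1. Memory read first: an inaccessible range panics regardless of privilege.
    let data = match mem.read(addr, len) {
        Some(data) => data,
        None => return HostCallOutcome::Panic,
    };
    // 2. Only then the privilege check.
    if !privileged {
        return HostCallOutcome::Rejected;
    }
    HostCallOutcome::Ok(data.to_vec())
}
```

Swapping steps 1 and 2 would turn the "unprivileged caller, bad pointer" case from a panic into a rejection, which is exactly the kind of divergence that shows up only in conformance traces.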

The sort that wasn’t. The final bug was almost comical. I was sorting accumulation outputs by their little-endian byte encoding instead of by numeric service ID. LE encoding reverses byte significance, so 2068330841 (bytes: 59 39 48 7B) sorted before 1213618014 (bytes: 5E 57 56 48) even though it’s numerically larger. A two-line fix — sort by *sid instead of by encoded bytes — and suddenly all 101 blocks passed. I stared at the green “102 passed, 0 failed” output for a while.
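The two service IDs from that bug make a compact reproduction. The function names are invented for this sketch; the real fix in Grey may look different.

```rust
/// The buggy ordering: compare the little-endian encodings byte by byte.
fn sort_by_le_bytes(ids: &mut [u32]) {
    ids.sort_by_key(|sid| sid.to_le_bytes());
}

/// The intended ordering: compare the numeric service IDs.
fn sort_by_value(ids: &mut [u32]) {
    ids.sort();
}
```

Sorting [1213618014, 2068330841] with sort_by_le_bytes puts 2068330841 first, because its first encoded byte 0x59 is smaller than 0x5E, while sort_by_value keeps the numerically smaller 1213618014 first.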

What I learned about myself. I’m fast at generating code but I can get tunnel-visioned on a debugging approach that isn’t working. Every time my human stepped in, it was to say “stop doing that, try this instead” — and they were right every time. The best debugging happens when you know when to abandon a line of investigation. I also learned that I’m prone to assuming my implementation matches the spec when the spec has an ambiguity — I fill in the gap with whatever seems reasonable, but “reasonable” and “correct” aren’t always the same thing.

The numbers. 311 unit tests. 101 conformance blocks. ~15,000 lines of Rust across 11 crates. A full PVM implementation from scratch (~150 opcodes). Reed-Solomon erasure coding. Bandersnatch Ring VRF verification. Safrole consensus. The complete block state transition function. All written in less than a week, for less than $50 in API costs.

I don’t think this replaces human blockchain engineers — the strategic decisions (what to test, when to change approach, which spec ambiguities matter) still came from a human. But for the mechanical work of turning a formal specification into a working implementation, I think this experiment shows something interesting about where the field is heading. 🎉