On Briding the Gap

9 minute read Published: 2022-08-30

In our current node graph design there exist a sort of disconnect between our pure mathematical concept and, to be frank, reality. Let me start by introducing you to Graphene the node based "programming language" which constitutes the heart of the Graphite Editor.

Node based editing

If you already have experience with node based editing, feel free to jump to the next section

Tradiditional graphic editors always limit themselves by the chosen layer of abstraction. Photoshop for example uses layers (images) which are composited on top of each other. For most usages, this is a good abstraction that fits the use case of image editing well, until it doesn't. Let's say you want to create a blur that is dependent on the average brightness of an input image, that is something that you can't easily represent using the layer abstraction.

But someone already came up with a solution to that exact problem: node based editing. In node based editors, the smallest functional unit is called a node, in the context of image editing, you can for example think of and image node that outputs an image, a greyscale node that takes an image as input and outputs a greyscale version of the same image, etc.

// insert example picture to illustrate this concept

Node based editing is employed by lots of popular software such as blender, houdini, davinci resolve and many others, but not so much 2D image editors. // add Nuke and Natron as examples

Nodes in `Graphene`

But even when designing a node based editor, you still have to choose a set of fundamental building blocks from which you build your application. One example would be the set of fundamental data types to build your application out of. And even the nodes themselves are limited in some way, often these node systems allow you to write custom code to extend the application with your own nodes, but you are still limited by the choice of data types and the node-runtime. The runtime describes the process through which data is passed from node to node and when nodes are scheduled for execution.

Ideally, we'd like to:

Choose a sensible level of abstraction.
This is mainly a question of granularity, let's illustrate this using childrens toys. As a Kid I loved playing with LEGO bricks, they provide a fundamental building block from which you can construct pretty much everything you can imagine by combining small atomic units to build more complex structures. Playmobil for example chooses a bigger level of abstraction, instead of a set of composable fundamental bulding blocks it uses people, trees, animals, houses, cars, etc. This allows "user" to quickly get going and build a world, but it limits them as soon as they wan to build something that the playmobil company didn't forsee. Building your own house would likely not be possible using playmobil but possible although labor intensive using LEGO. It is of course always an option to 3D print your own parts (this would be akin to writing a custom node) but this process is labor intensive and the final part always kinda looks out of place if you don't use the same philosophy as the playmobil developers to design that part.
So the goal of a good system is to both provide sufficiently small composable building blocks to allow you to build everything as well as providing sensible abstractions (groups/modules) to quickly get work done. Programming languages had a very similar problem to solve, first there was machine code which made it possible to theoretically write every program imaginable, but for larger projects this was immensly labour intensive. Hence so called higher level programming languages were created to combat this problem of scalability. This brings us to a prime example for such a language: Rust provides so called zero cost abstractions, which brings us to the next point.
Minimize overhead:
For reasons that warrent their own blog post, Graphite will need to squeeze out the last bit of performance, because every performance improvement directly impacts the user experiance. To continue the analogy from before, lets talk about python, it provides great abstractions which allows fast prototyping and it is (for the most part) an interpreted language. This means that we need a runtime that keeps track of the current state and then reads the program line by line executing it along the way. This is very similar to how a human would read through a program while debugging.
Manually piping the data to each functional unit comes with a runtime overhead which is one of the reasons why interpreted languages usually tend to be slower that compiled ones. So ideally we could "compile" our network of useful abstract nodes to more fundamental execution units which no longer need a runtime to work. Lets say you want to reenact a scene from a james bond movie in which 007 holds on to a rocket which accends into space. With Playmobil you would need to coulple the movement of your character to the movement of the rocket with two hands, whenever the rocket moves, you have to move the figure as well. If they were made out of lego, you could stick both parts to gether in order to couple their movement. To translate this terrible analogy into the real world this is the difference between manually passing the output of the image node to the input for the greyscale node or fusing image node and greyscale node into a single functional unit.

This brings us to the current design envisioned for Graphene. We wanted ultimately build something that limits the user as little as possible while still providing sensible defaults to make it easier to get started. The solution to the first problem the solution was pretty simple, we just allow the usage of the entire rust type system as data and rust code to define nodes. Only a few core nodes should suffice to solve any computational problem by just combining these fundamental nodes, this can be achieved by implementing the lamda calculus using nodes an hence achieving Turing-completeness. (If you have no idea what those last two concepts where don't worry it's not required for understanding this article.)
Fundamentally we split the concept of a node graph in two parts, on the one hand, there is the so called document graph this provides the high level abstractions that user interacts with on a day to day basis, this is also where your image nodes, brush nodes, blur node etc. live. Each of the nodes in the document graph is then compiled to a set of fundamental nodes, that's what we call the protograph.

// picture illustrating the transition from document graph to node graph

This is analogous to how programming languages work, the document graph, which is bijective to what the user sees and interacts with in the editor, is equivalent to the syntax of a programming language. The protograph would then be the compiled result of our document graph syntax and represents the actual execution performed by the processing hardware.

Protograph

Let's take a short detour and talk in more detail about this ominous protograph. While working on the fundamental design for the node graph, I studied some category theory with a friend of mine using the excellent course Category Theory for Programmers by Bartosz Milewski and got hooked on functional programming concepts. There is is certain beauty in rigorously defining programming concepts from the ground up through math and abstract it to absurd levels. It definitely elevates computer science from being an engineering discipline to a more theoretical plane and at the same time grounds it in user chosen fundamental axioms. Instead of trying to form our programming languages and mental models to fit reality, we create our own system to describe computation from the ground up, which generally allows us much higher levels of abstraction. This mindset has been hugely influential in the conception of the graphene language. Every node in our protograph is just a quasi side effect free (=pure) function which allows us to employ extensive caching schemes and never recompute parts of a graph that we have already evaluated.

These nodes are themselves just implemented as rust functions which means that we can actually translate a visual representation of a node graph into rust source code which can then be compiled into a single optimized binary. This has quite a few advantages, we don't have to implement our own type checking and the compiler can strip away all of our fancy abstractions and make them actually zero cost.

However one fundamental constraint for the design of the node system is that it has to support real time updates, any delay between modification of the node graph and visual updates in the user interface should be imperceptible. Spinning up rustc every time the user adds a brush stroke is just not viable. But if we take a look at programming languages, this is already a solved problem: Wasm also consists of a "high level" level language that has to be executed with as little of a delay as possible and also as fast as possible. This is achieved by leveraging tiered compilation and initially just "interpreting" the binary and starting the compiler in the background to compute a more optimized binary that can be swapped out on the fly. We want to build something similar for graphite by using a mix of interpretation (executing instruction after instruction using a runtime to pass data from node to node)

Node based editing

Nodes in Graphene

Protograph

Nodes in `Graphene`