Rebuilding the TVM

Over at Hackaday, there was a Retrotechtacular post on the Transputer that generated a fair thread of comments and conversation. I put out a call for help to revitalize the occam toolchain a bit.

For the colleagues who replied, here’s a short history/roadmap.

I Will Sum Up

The full story is long. We have a codebase that includes the original occ21 occam compiler from Inmos Ltd., which has been extended over the years to include not just the core occam2.1 language, but also extensions for process mobility; this language is called occam-π.

Multiple dissertations grew out of the codebase over the years. It is safe to say that the language lives at all because of the dedication, over many years, of Prof Peter Welch (and others) at the University of Kent.

The Repository

The main repository includes a toolchain for a language, its standard libraries, and multiple runtimes.

  • Under tools, you will find the compiler, the linker, the documentation builder, and other tools that hopefully won’t need to factor in right now.
  • The runtime directory includes the Intel native runtime as well as the TVM, or Transterpreter Virtual Machine, which is the portable runtime. It is implemented as an ANSI C library that is intended to be statically compiled and linked into a wrapper for any given embedded target. In a word, it is an overgrown bytecode interpreter and scheduler.
  • The tvm directory contains the wrappers for the VM. When building on big platforms, the POSIX wrapper is used; “big” generally means “the platform casually builds most programs against a POSIX API.” The Arduino wrapper provides the hardware-to-VM bindings for the ATmega series of processors.

The TVM

Because of extensive tests built into the occam compiler toolchain, we are confident in the VM. Yes, you can write code that crashes; a division by zero is still bad. However, we are extremely confident in the scheduler and runtime; given good input (e.g. a compiled occam program), the VM works.

For a momentary dive into the wrappers, you might begin by looking at the TVM runloop in tvm.c. It fetches the next bytecode instruction, decodes it, executes it, gives the scheduler a chance to run any processes that have become ready, and does that forever.
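To make that concrete, here is a minimal, self-contained sketch of the shape of that loop. The names in it (vm_state, vm_dispatch, the status codes) are illustrative stand-ins, not the actual libtvm API.

    /* A sketch of the shape of the TVM run loop; all names are stand-ins. */
    #include <stdio.h>

    enum { VM_RUNNING, VM_FINISHED, VM_ERROR };

    struct vm_state {
        int remaining;  /* stand-in for "instructions left to execute" */
    };

    /* Stand-in for one interpreter step: decode and execute a single
     * bytecode instruction, letting the scheduler switch between occam
     * processes as they block on channels or timers. */
    static int vm_dispatch(struct vm_state *vm)
    {
        if (vm->remaining-- > 0)
            return VM_RUNNING;
        return VM_FINISHED;  /* or VM_ERROR on, say, a division by zero */
    }

    int main(void)
    {
        struct vm_state vm = { 1000 };
        int status = VM_RUNNING;

        /* The run loop itself: step the interpreter until the program
         * finishes, deadlocks, or faults. */
        while (status == VM_RUNNING)
            status = vm_dispatch(&vm);

        printf("VM stopped with status %d\n", status);
        return 0;
    }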

The wrapper declares constants that lay out the size and location of RAM (which we provide an overlay for; the VM runs fine on both big- and little-endian hardware), along with hooks for getting the time. We provide an FFI table for linking occam programs to native code (a tricky dance), and an interrupt mapping table. The interrupt mapping lifts hardware interrupts into the occam runtime; while this does mean that we are lifting a real-time event into a soft real-time context (a “drawback” for some developers), it also means that we are lifting random, unpredictable behaviors into a well-reasoned framework for parallel and concurrent programming. If you are an engineer who needs hard real-time software, this is not it. If you are an engineer who needs correct, concurrent execution of code on embedded devices, and can live with soft real-time, then I have found occam on the TVM to be a joy to develop with.
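A rough sketch of those wrapper-side pieces is below. Every name in it is hypothetical (the real wrappers under tvm define their own equivalents), but it shows the shape of what a new target has to provide.

    /* Hypothetical wrapper-side declarations; the real wrappers define
     * their own equivalents of all of these. */
    #include <stdint.h>

    /* Memory the wrapper hands to the VM; the wrapper decides where it
     * lives and how big it is. */
    #define VM_MEM_SIZE (2 * 1024)
    static uint8_t vm_memory[VM_MEM_SIZE];

    /* Time hook: the VM asks the wrapper for the current time so that
     * occam timers work on this platform. */
    static uint32_t wrapper_get_time(void)
    {
        return 0;  /* read a hardware timer or tick counter here */
    }

    /* FFI table: maps indices used by compiled occam code onto native C
     * functions the wrapper chooses to expose. */
    typedef void (*ffi_func_t)(uint8_t *workspace);
    static void native_blink_led(uint8_t *workspace) { (void) workspace; }

    static const ffi_func_t ffi_table[] = {
        native_blink_led
    };

    /* Interrupt map: lifts hardware interrupt numbers into channels the
     * occam program can wait on, turning asynchronous hardware events
     * into ordinary channel communications. */
    static const struct { int irq; int channel; } interrupt_map[] = {
        { 1, 0 }  /* e.g. a pin-change interrupt surfaces as channel 0 */
    };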

(The curious can see some slides that cover this work, as well as a paper.)

The VM was never well integrated into the build for cross-compilation targets. We typically drove much of this from one or more programming environments (e.g. a plugin or similar for an IDE), and we would simply plunk the compiled VM down and ship bytecode to it. As a result, the build for embedded targets was never a first-class citizen of the project.

The Build

The build currently builds everything: the toolchain, the libraries, and all of the runtimes.

For embedded work, we really only need:

  • The TVM runtime, cross-compiled for the target architecture.
  • A wrapper, cross-compiled, and linked against libtvm.a.

There’s often a bit of a dance to then get things right for the platform. For example, we had a complex way of doing bytecode upload to the ATmega series of processors; you could upload the VM once, and then upload just the bytecode repeatedly. I personally think that it is now easier just to:

  1. Compile occam programs to bytecode.
  2. Let the linker spit out a .h file containing the bytecode in an array.
  3. Link the code into the wrapper/TVM.

This gives you a single binary for shipping to the target platform. Doing anything fancier would need to be justified by some use case. Given that the VM is around 20K when compiled for the 328p, and that bytecode is tiny (it is Huffman encoded), the result is far, far smaller than many of the 500K monstrosities being shipped as part of Python and Javascript runtimes for ARM cores (e.g. the Circuit Playground Express, or the BBC micro:bit).
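To make the three steps above concrete, here is a hedged sketch of what the generated header and the wrapper that consumes it might look like. The header contents, array name, and vm_* calls are placeholders, not the real linker output or libtvm API.

    /* Sketch of steps 2 and 3: a bytecode array emitted into a header,
     * compiled straight into the wrapper. All names are placeholders. */
    #include <stdio.h>
    #include <stddef.h>

    /* --- what a generated header (step 2) might contain --- */
    static const unsigned char program_bytecode[] = {
        0x54, 0x45, 0x58, 0x00  /* ...Huffman-encoded bytecode bytes... */
    };

    /* --- placeholder entry points, standing in for libtvm.a --- */
    static void vm_load(const unsigned char *code, size_t len)
    {
        (void) code;
        printf("loaded %lu bytes of bytecode\n", (unsigned long) len);
    }

    static int vm_run(void)
    {
        return 0;  /* run the interpreter loop until the program ends */
    }

    /* --- the wrapper (step 3): one binary, nothing to upload separately --- */
    int main(void)
    {
        vm_load(program_bytecode, sizeof program_bytecode);
        return vm_run();
    }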

The compiler does not need to be cross-compiled. I have (in the past) had it running behind a web service, so that the end-user can write their code locally, the code is compiled “in the cloud,” and a .hex is shipped back. I would, today, do this in a Docker container, so it could be run locally (by an expert developer), or be used remotely (by students, casual users). I am happy to bring this back to life.

So, in short:

  1. Toolchain. GCC has moved on since the project was under active development. Some cleanup may be necessary, but there should be no reason for things to have completely gone to hell in a handbasket. Or, perhaps bitrot can be that bad.

  2. VM. The VM’s build needs first-class support for adding cross-compilation targets.

  3. Wrappers. At the least, a wrapper for a modern target needs to be developed. This is not hard, as evidenced by the ATmega wrapper, which probably serves as a good roadmap. It also probably had to jump through more hoops than other platforms will require: the libc for the AVRs is limited, and the mixed memory architecture of the device made some things more complex than necessary.

I personally think there must be a way to pull out the necessary pieces for embedded development, so as to simplify the process. By “must be a way,” I mean “a way of reorganizing the repository so it is not monolithic.” Perhaps this means “I think we should use git submodules or similar to reorganize the repository.”

I’m happy to discuss any branching/reorganization of the repositories that gets us to a goal of easily building occam and the TVM for more embedded targets. I really do prefer this language for environmental sensor work over any variant of C—especially when it comes to teaching students to reason about concurrent systems.

Native vs. VM

The “native” runtime involves 100+ KLOC of C that, to get fast context-switch times, plays games with the Intel processor’s stack in assembler. It is, in my opinion, the wrong idea to ever touch this code, or to imagine targeting any other processor. Doing so would be slow and error-prone over time, as each new architecture may require subtle work in the runtime.

The TVM “just works” anywhere you can compile ANSI C. It’s slower, but honestly, I care about correctness and ease of portability/maintenance. At roughly 10KLOC, it’s a lot more manageable.

Notes about Targets / Random Ideas

We have targeted builds for the TVM at devices like the LEGO NXT in the past. To do so, we first targeted a runtime; this meant we built on top of a small OS. If an embedded target has FreeRTOS, it is completely reasonable to target a wrapper for the TVM that builds on top of FreeRTOS (sketched below). Loading bytecode can then be more easily accomplished on that platform, because you’re no longer working on the bare metal, but instead have an API between the VM and the hardware. It would also mean that builds to new architectures that support (say) FreeRTOS would be one-and-done, as we’ve targeted an abstraction (as opposed to having to develop a new wrapper for each new target).
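As a sketch of that idea (and only a sketch): the FreeRTOS calls below are real, but the tvm_* names are placeholder stubs standing in for whatever libtvm actually exposes.

    /* The VM as an ordinary FreeRTOS task: the wrapper talks to the RTOS
     * API rather than to bare metal. Only the FreeRTOS calls are real;
     * the tvm_* names are placeholder stubs for libtvm. */
    #include "FreeRTOS.h"
    #include "task.h"

    static void tvm_init(void) { }                /* placeholder */
    static int  tvm_dispatch(void) { return 1; }  /* placeholder: program ends at once */

    static void tvm_task(void *params)
    {
        (void) params;
        tvm_init();

        /* The interpreter loop runs inside one task; other RTOS tasks
         * (networking, logging, ...) run alongside it. */
        for (;;) {
            if (tvm_dispatch() != 0)
                break;  /* program finished or faulted */
        }

        vTaskDelete(NULL);
    }

    void start_tvm_wrapper(void)
    {
        xTaskCreate(tvm_task, "tvm", 1024, NULL, tskIDLE_PRIORITY + 1, NULL);
        vTaskStartScheduler();  /* does not return while the scheduler runs */
    }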

This is what we do for the POSIX interface. It sometimes complicates the interrupt handling; for example, we had to do a fair bit of work to make screen/keyboard handling work on POSIX platforms. For embedded targets (even with the lift of something like QNX or FreeRTOS), the wrapper will likely look more like the Arduino wrapper than anything else.

I have also wondered about PlatformIO. Would it be good to “port” the build of the VM there? Would this provide some “lift” for targeting future embedded targets? The library should “just work” across multiple platforms, and each platform would need a wrapper. Once that was done, it would become a matter of tying in a call to compile the occam code.

Conclusion

That was long, but hopefully it provides a roadmap. The project has had many hands over many, many years, going back to the Inmos engineers who did the original work on the Transputer and the language itself. I believe the toolchain has value, especially for teaching students how to reason about concurrency and parallelism on engaging, real hardware. The transition (in my experience) from occam to languages like Go or Erlang, to reasoning about concurrency in Javascript (everything is a callback), or even to architecting concurrent and parallel code using semaphores and threads: all of those things are easier when you have a coherent model for reasoning about concurrent systems.

Being a college professor and helping launch a new department takes time. However, if there are people who are keen to help bring this project back to life, that will give me the motivation to support them and help make it a reality.