Wednesday, October 30, 2019

Virtual Machine Oriented Development

Most computing devices that we have today---desktop, laptop or phone---are capable of computing anything that can be computed. There's a bit of equivocation there. What is meant is that anything a human can calculate by hand via a rote, mechanical process can also be calculated by a computer. This is the Church-Turing Thesis.

I'd never really stopped to think about it, but what would a non-universal computing machine look like?



Several years back I suffered a bout of jealousy. I thought about engineers in other fields who build roads or buildings or even cars. A civil engineer can imagine something they've built standing still amidst the blur of 100 years passing by. An automotive engineer can imagine a car they've designed still driving the roads in 20 years.

I don't think a single line of code that I wrote 3 or more years ago is still in production, and that's ignoring all the code that I wrote that never made it into production.

These are the kinds of things you start to ponder as you reach the ripe, old, programmer retirement age of 33.

But then a funny thing happened. I played Sam & Max Hit the Road ... on my Android phone.

Here was a game released in 1995 and I was playing it on my Android device in 2014. How did that happen? Well, when LucasArts designed the game "Maniac Mansion," they decided to create a scripting language, SCUMM (Script Creation Utility for Maniac Mansion), and write the game in that language. They went on to use that scripting language for many of the games they made, Sam & Max among them. I have several original LucasArts games on CD. Some are PC versions, some are Mac versions.

Over the years, whenever the nostalgia hits me, I grab the game files from the CD and download ScummVM, an interpreter that runs those original game scripts, for whatever platform I'm on at the time. This time I copied the game files to my phone and downloaded ScummVM from the Play Store. That's how it happened.


Data is Code

I had been exposed to Lisp, and had even written a lot of Lisp, before I finally had my enlightenment about macros and metacircular interpreters. I remember vividly reading Structure and Interpretation of Computer Programs and seeing Scheme put to use creating simple yet powerful abstract interpreters. The authors start "simply" with interpreters that add new programming paradigms to Scheme. Then they proceed to simulate a register machine and write an assembler and compiler for it. This happens in the last chapter, in the space of ~100 pages.

It is a Divine joke that Structure and Interpretation of Computer Programs and The Art of Computer Programming had their titles swapped, because---while I don't wish to denigrate TAOCP, which is a trove of amazing riches---SICP is about art, and in a metacircular way it is art.

It is all too easy as a Lisper to understand the world this way, but data is always code. In Forth, 5 is not a number; it is an instruction to push the value 5 onto the top of the data stack. Your program receives a program as input. It receives files, network packets, key presses, and mouse clicks. It interprets this program and produces output.
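To make the Forth point concrete, here is a minimal sketch of a toy stack machine in Python. It is not real Forth; the function name `run` and the three words it knows are my own invention for illustration. The thing to notice is that the loop has no notion of "data": every token, including `5`, is something to execute.

```python
# A toy stack machine in the spirit of Forth (not real Forth).
# Every token of the input is an instruction, including number literals.

def run(source):
    """Interpret a whitespace-separated program and return the final stack."""
    stack = []

    def binary(op):
        # Build a word that pops two values and pushes op(a, b).
        def word():
            b = stack.pop()
            a = stack.pop()
            stack.append(op(a, b))
        return word

    def dup():
        # Push a copy of the top of the stack.
        stack.append(stack[-1])

    # The dictionary of known words (names chosen for illustration).
    words = {
        "+": binary(lambda a, b: a + b),
        "*": binary(lambda a, b: a * b),
        "dup": dup,
    }

    for token in source.split():
        if token in words:
            words[token]()            # execute a named word
        else:
            stack.append(int(token))  # "5" means: push the value 5
    return stack


print(run("5"))            # [5]
print(run("2 3 + dup *"))  # [25]
```

The input string is the program, and the stack left behind is the output.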

A PDF file can cause a buffer overrun in a PDF reader because each byte is literally an instruction to your program-as-interpreter to "write a value at the current location and move to the next location" (or at least it can be if your program-as-interpreter has flawed semantics).
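Here is a rough simulation of that idea, assuming a made-up memory layout: an 8-byte buffer followed by a single cell the program depends on. Python itself is memory safe, so this only models the behavior; `read_field`, `BUFFER_SIZE`, and the layout are hypothetical and do not describe any real PDF reader.

```python
# Toy model of a reader with flawed semantics. The layout is hypothetical:
# an 8-byte buffer followed by one cell the program relies on.

BUFFER_SIZE = 8

def read_field(data, check_bounds):
    """Copy input bytes into the buffer and return the whole memory."""
    memory = bytearray(BUFFER_SIZE + 1)
    memory[BUFFER_SIZE] = 0x2A  # stands in for a return address or a flag
    cursor = 0
    for byte in data:
        if check_bounds and cursor >= BUFFER_SIZE:
            raise ValueError("input longer than buffer")
        # Each input byte acts as the instruction "store me, then advance".
        memory[cursor] = byte
        cursor += 1
    return memory


ok = read_field(b"hello", check_bounds=False)
print(ok[BUFFER_SIZE])       # 42: the neighboring cell is untouched

smashed = read_field(b"A" * 8 + b"\xff", check_bounds=False)
print(smashed[BUFFER_SIZE])  # 255: the ninth "instruction" overwrote it
```

With `check_bounds=True` the same nine-byte input is rejected instead. The only difference between the two machines is which programs they agree to run.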

This is not a property of Lisp; it is a property of the stored program computer, Universal Turing Machine, von Neumann architecture. Code and data are made of the same stuff and stored in the same memory.


The Non-Divine Joke

In his talk "The Birth & Death of JavaScript," the 2014 version of Gary Bernhardt extrapolates where JavaScript and asm.js will take the world in 2035 (after an apocalyptic global war, of course). The punch line is that instead of JavaScript running on computers, computers run on JavaScript. This happens through a comical stack of emulators emulating emulators that emulate. Actually, I think it's compilers transpiling compilers that transpile transpilers, but... same difference.

But as with every joke, there's a bit of truth to it.

Paul Graham writes about "Programming Bottom-Up," where you build the language "up" toward your program to the point that actually expressing your program becomes somewhat trivial. You're building a domain-specific language to solve exactly the problem you have. Again, this is all too natural for Lispers, but everyone does it.

The act of programming is to turn a universal computing machine into a limited computing machine. You build out data types and operations to focus the abilities of the computer into a specific domain. Programmers instinctively understand this, which is why we find it so funny that---in a twist of irony---a universal computing machine emulates a universal computing machine emulating a universal computing machine.


Virtual Machine Oriented Development

I started thinking about Virtual Machine Oriented Development because I was concerned about the transience of my legacy. I noticed that there were software products that were still around 20 years after they were written. I started seeing a VM underneath them.

But having thought about it more, I don't think that Virtual Machine Oriented Development is just about legacy. I think it might clarify the design process to be explicit about the fact that we're designing a limited computing machine that analyzes sales data. What are the data types? What are the operations? If you have power users, maybe they'd even like a scripting language that can describe which data to import and then how to analyze it?
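As a sketch of what that might look like, here is a tiny, entirely hypothetical machine for sales data in Python. The command names (`import`, `where`, `sum`), the `SOURCES` table, and the sample rows are all invented for illustration. The data types are a table of rows and a number; the operations are the three commands.

```python
# A hypothetical limited machine for sales analysis.
# Data types: a table (list of row dicts) and a number.
# Operations: import, where, sum. All names are invented for this sketch.

SOURCES = {
    "sales": [
        {"region": "west", "amount": 120},
        {"region": "east", "amount": 80},
        {"region": "west", "amount": 30},
    ],
}

def run_script(script, sources=SOURCES):
    """Run one command per line; return the value left by the last command."""
    current = None  # the machine's single register: a table or a number
    for line in script.strip().splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op, args = parts[0], parts[1:]
        if op == "import":   # import <source>
            current = list(sources[args[0]])
        elif op == "where":  # where <column> <value>
            column, value = args
            current = [row for row in current if str(row[column]) == value]
        elif op == "sum":    # sum <column>
            current = sum(row[args[0]] for row in current)
        else:
            raise ValueError(f"unknown operation: {op}")
    return current


script = """
import sales
where region west
sum amount
"""
print(run_script(script))  # 150
```

A user's script depends only on the three command names and their meaning, not on `run_script` itself, so the interpreter underneath could be replaced without touching the script.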

You might find then that you've abstracted your problem into a computation model that will become valuable for years. Maybe you'll end up rewriting the interpreter for this language several times, and all the while users can keep using their existing scripts.



What does a non-universal computing machine look like? It looks like every program you've ever written.