The Voodoo Programming Language

The Voodoo Programming Language

Robbert Haarman

2009-10-03


Introduction

Voodoo is a programming language designed to be a thin abstraction layer over CPUs' native instruction sets and operating systems' calling conventions. It provides functionality to read and write memory, perform arithmetic, define and call functions, and not much more. In particular, Voodoo has no concept of types, let alone type safety. Voodoo is as dangerous as it is powerful.


Rationale

When implementing programming languages, one of the hurdles to overcome is code generation. Ultimately, the program must be translated from the source language into machine code for the machine that the program is to run on. Generally, the code to be generated differs from machine architecture to machine architecture and from operating system to operating system. Implementing efficient code generators for all combinations of hardware and operating system requires a lot of work and expertise.

To reduce the effort required for code generation, many programming language implementations compile to higher level languages, instead of to machine code. An existing implementation of the higher level language can then be used to execute the program. Unfortunately, many higher level languages provide models of computation that may hinder efficient implementation of the source language.

Voodoo aims to provide a level of abstraction high enough to reduce the effort of generating code for numerous target platforms, while allowing any construct to be expressed efficiently.


Language Overview

A Voodoo program consists of a data, function definitions, and code. All of these are introduced by magic words such as "function", "call", and "string". Any part of the program may be preceded by a label, and labels can be referred to from the code. Finally, a program is divided into a data section and a code section.

Hello World in Voodoo


section data
greeting:
string "Hello, world!\x00"

section code
import puts
export main

main:
function argc argv
    call puts greeting
    return 0
end function

Tokens

Integers

Integers consist of an optional plus or minus sign, followed by one or more digits. Examples:


0
+12
-1

Strings

Strings consist of zero or more characters enclosed in double quotes. Examples:


""
"Hello, world!"

Symbols

Symbols consist of letters, digits, underscores, and hyphens, and are not allowed to start with a digit or a hyphen. Examples:


foo
some-symbol

Escape Sequences

To facilitate entering special characters and to inhibit the special meaning certain characters normally have, an escape mechanism is provided. Escape sequences start with a backslash, and have the following meanings:

\\	  An actual backslash character
\"	  A double quote character
\n	  A newline
\r	  A carriage return character
\t	  A tab character
\xXX	  The character with hexadecimal value XX
\<space>  A space character
\<newline> Line continuation character. The newline
	   and any leading whitespace on the next line
	   is ignored.

Examples of the usage of the escape characters:

"Hello world\n"	- A string with a newline in it
"\\\""		- A string consisting of a backslash
		  followed by a double quote
call foo \
 bar		- Same as: call foo bar

Labels

Places in a program can be marked with labels, so that the place can be referred to from the code by mentioning the label. A label consists of a symbol followed by a colon. When referring to a label, only the symbol is used.

For example:


foo:
word 12

Declares a word with the value 12, and a label 'foo' that refers to the place where this value is stored. For example, if the value happens to be loaded at the address 71234, writing 'foo' in the program will have the same effect as writing 71234 in the same place.

Values

Values are the smallest unit of data in a Voodoo program. Examples of values are:

12     - The integer 12
x      - The value of x (may be a label or a local variable)
@12    - The word stored at address 12
@x     - The word stored at address x

In the rest of this document, '<x>', '<y>' and '<z>' will be used to indicate values.

Actions

Voodoo code consists of actions. Actions conist of a magic word, usually followed by a number of values. This section lists all actions supported by Voodoo. In the list below, '<x>', '<y>', and '<z>' denote values, <symbol> denotes a symbol, and '<expr>' denotes an expression.

call <x> <y> <z> ...
Calls the function <x> with the arguments <y> <z> .... There may be zero or more arguments.
goto <x>
Continues the program at location <x>, rather than at the next action after the goto.
let <symbol> <expr>
Introduces a local variable <symbol>, and initializes it to the result of evaluating <expr>. The variable exists until the next 'end function'.
return <expr>
Evaluate <expr> and return the result from the current function.
set <x> <expr>
Evaluates <expr> and assigns the result to <x>. <x> may not be a label, because labels cannot be assigned to.
set-byte <base> <offset> <x>
Sets the byte at <base> + <offset> to <x>. <offset> is given as a number of bytes.
set-word <base> <offset> <x>
Sets the word at <base> + <offset> to <x>. <offset> is given as a number of words.
tail-call <x> <y> <z> ...
This is like 'return call <x> <y> <z> ...', but re-uses the call frame of the current function. This saves stack space, so that chains of tail calls can go arbitrarily deep, whereas normal calls would run out of stack space.

Expressions

Certain actions have the ability to evaluate expressions. Expressions can be simple values, but expressions can also perform computations on values. The following are valid expressions:

add <x> <y>
The result of adding <y> to <x>.
and <x> <y>
The bitwise and of <x> and <y>.
call <x> <y> <z> ...
Similar to the action call, this calls the function <x> with the arguments <y> <z> ... (there may be zero or more arguments). The result of this expression is the value returned from the function.
div <x> <y>
The (integer) result of dividing <x> by <y>.
get-byte <x> <y>
The value of the byte at address <x> + <y>.
get-word <x> <y>
The value of the word at address <x> + (WORDSIZE * <y>).
mul <x> <y>
The result of multiplying <x> by <y>.
not <x>
The ones' complement of <x>; i.e. all the bits in <x> reversed.
or <x> <y>
The bitwise or of <x> and <y>.
sub <x> <y>
The result of subtracting <y> from <x>.
xor <x> <y>
The bitwise exclusive or of <x> and <y>.

Conditionals

Conditionals in Voodoo take the following form:

if<test>
  ... some code here ...
else if<test>
  ... other code here ...
... more "else if" parts ...
else
  ... some code ...
end if

There can be any number of else if parts, and the final else clause is optional. The tests that are provided are the following:

ifeq <x> <y>
Tests if <x> is equal to <y>.
ifge <x> <y>
Tests if <x> is greater than or equal to <y>.
ifgt <x> <y>
Tests if <x> is strictly greater than <y>.
ifle <x> <y>
Tests if <x> is less than or equal to <y>.
iflt <x> <y>
Tests if <x> is strictly less than <y>.
ifne <x> <y>
Tests if <x> is different from <y>.

Function Definitions

A function definition looks like:


function x y z
  <code>
end function

Here, the function being defined takes 3 arguments, which can be referred to as x, y, and z from the code inside the function body. A function may have zero or more arguments and is practically always preceded by a label, so that the function can be referenced from code.


Implementation

The Voodoo programming language is implemented by the Voodoo compiler. This implementation compiles Voodoo code to x86 assembly code, suitable for processing with The Netwide Assembler. The implementation is written in Ruby. It is available under the terms of the MIT license and can be downloaded from the Voodoo compiler page.


Related Work

This section provides links to some projects that are related to the Voodoo programming language.

C--

C-- is a language with very similar aims to Voodoo's. Like Voodoo, C-- aims to be a target language for programming language implementations, providing a thin abstraction layer over the target platform without getting in the way of efficient implementation of high level constructs. Compared to Voodoo, C-- seems more complete. However, development on C-- implementations seems to have stagnated.

LLVM

LLVM, the Low-Level Virtual Machine, provides a compiler framework that has similar aims to Voodoo. Compared to Voodoo, LLVM provides many more features. This comes at the cost of more complexity and a much heavier implementation.

The Common Language Infrastructure

The Common Language Infrastructure (CLI), designed by Microsoft and published as ECMA-335, is a specification that allows programs written in a variety of high-level languages to compile to a common byte code format and share data and code with one another. Compared to Voodoo, it is much more extensive, including such things as a type system and garbage collection. Where Voodoo is lightweight and aims to be language-agnostic, the CLI is heavier and more geared towards statically typed languages with class-instance object systems.

TurboVM

TurboVM is a virtual machine that exposes a RISC instruction set, designed to be simple and efficient to generate, parse, and execute. Like Voodoo, TurboVM is intended as back end for programming implementations. Also like Voodoo, it aims to be simple and lightweight. Voodoo could be compiled to TurboVM byte code, and, conversely, TurboVM byte code could be compiled to Voodoo, either as a possible intermediate step to native code generation.

Alchemist

The Alchemist code generation library is a library that can be used to generate machine code. A possible future direction for Voodoo implementations is to use Alchemist to generate (and possibly execute) machine code on the fly.