The Voodoo Programming Language
2021-05-28
Introduction
Voodoo is a programming language designed to be a thin abstraction layer over CPUs' native instruction sets and operating systems' calling conventions. Operations provided by Voodoo closely correspond to those of common CPU instruction sets, allowing programs to be expressed with minimal overhead. At the same time, Voodoo is not tied to a specific instruction set, and support for new CPUs and operating systems can easily be added.
Rationale
When implementing programming languages, one of the hurdles to overcome is code generation. Ultimately, the program must be translated from the source language into machine code for the machine that the program is to run on. Generally, the code to be generated differs from machine architecture to machine architecture and from operating system to operating system. Implementing efficient code generators for all combinations of hardware and operating system requires a lot of work and expertise.
To reduce the effort required for code generation, many programming language implementations compile to higher level languages, instead of to machine code. An existing implementation of the higher level language can then be used to execute the program. Unfortunately, many higher level languages provide models of computation that may hinder efficient implementation of the source language. Also, implementations of higher level languages are not always easy to port to new platforms.
Voodoo aims to reduce the effort required to generate reasonably efficient code for multiple target platforms. It provides low-level operations similar to those found in many CPU instruction sets, allowing for efficiency close to machine code. At the same time, it supports a variety of instruction sets and operating systems, and support for new combinations can easily be added. Using Voodoo, compiler writers can get reasonable efficiency on a variety of target platforms at the cost of writing a single code generator.
Language Overview
This document describes version 1.1 of the Voodoo programming language. Definitions in this document have been marked with the version of the language in which they have been introduced. Definitions first introduced by this document have been marked with 1.1.
Hello World in Voodoo
To quickly get a feeling for a programming language, it is often instructive to look at a simple yet complete program. The following is an implementation of the traditional Hello World program in Voodoo:
# Hello world in Voodoo
section data
greeting:
string "Hello, world!\x00"
section functions
import puts
export main
main:
function argc argv
call puts greeting
return 0
end function
When compiled and linked with a library that provides
an appropriate implementation of the puts
function, this
program will print Hello, world!
.
What follows is a more formal description of the Voodoo programming language.
Overview
The source code of a Voodoo program consists of
incantations (e.g. set x 42
) that define the code
and data of the program. Each incantation starts with a magic
word (e.g. set
) and may have zero or
more parameters. Any incantation may be preceded by
a label (e.g. foo:
), which can then be used in
other parts of the program to refer to the address at which the
incantation starts.
The basic units of data in Voodoo are bytes and words. The range and size of bytes and words is implementation-defined, which means that the definition is allowed to differ from one Voodoo implementation to another, and even in ways defined by a specific implementation, e.g. the same Voodoo implementation may use different word sizes for different target platforms.
Most incantations in Voodoo perform very simple operations, such as adding two numbers. In addition to these simple operations, Voodoo supports blocks and functions. A block provides a scope for local variables and automatically managed memory. Variables and memory allocated inside a block will automatically be de-allocated after the end of that block. Functions, like blocks, provide a scope for local variables and automatically managed memory. In addition, functions may accept parameters, which are local variables whose values are supplied by the caller of the function.
At run-time, entering a function or block creates a
frame and makes it the new active frame. These
frames form a chain which logically runs from the top-level
frame at the top to the active frame at the bottom. The active
frame determines what local variables are accessible.
The magic word goto
can be used to continue execution
of the program at a given incantation, provided that incantation
is in the scope corresponding to the active frame.
This document does not define the correct behavior of every possible Voodoo program. In places where the correct behavior is not defined (either through omission, or explicitly documented as being undefined), a Voodoo implementation is not required to implement any specific behavior. In particular, the behavior of a program that invokes undefined behavior at any point does not have to be repeatable or desirable. Later versions of the Voodoo language may define correct behavior for programs whose behavior is undefined under the current language version.
Features
1.1 To aid authors and programs in generating code that is compatible with a particular Voodoo implementation, implementations should provide means to query their features. This document specifies a number of features as key-value pairs, where the key is the feature name, and the value is a string that provides additional information about the feature (as described in the specification of the feature).
Implementations may also advertise features not specified in this document. To prevent conflicts with future versions of Voodoo, these features should have names starting in a prefix that identifies the implementation that introduced the feature. Implementations are encouraged to use this mechanism to advertise support for extensions to the language specified by this document.
For language extensions advertised using the feature mechanism, it is recommended that the value of the feature be a version number, with higher versions being backward compatible with lower versions. Incompatible changes can be indicated by a different feature name, e.g. through a numeric suffix at the end of the feature name.
Name | Description |
---|---|
bits-per-word | The value indicates the number of bits per word for the given
implementation. E.g. 32. |
byte-order | Order of bytes in a word. The following values are defined:
|
bytes-per-word | The number of bytes required to store a word. E.g. 4. |
voodoo | Version of the Voodoo language supported by the implementation.
The version described in this document is 1.1. |
Tokens
Comments
1.0 Comments start with a hash mark (#) and run until the end of the line. The following is an example comment:
# This is an example comment
Integers
1.0 Integers consist of an optional plus or minus sign, followed by one or more digits. Examples of valid integers:
0
+12
-1
Strings
1.0 Strings consist of zero or more bytes enclosed in double quotes. An escape mechanism is provided to allow bytes of any value to be included in strings (see Escape Sequences for details). Examples of valid strings:
""
"Hello, world!"
"some \"escape\x00 sequences"
Symbols
1.0 Symbols consist of letters, digits, underscores, hyphens, and escape sequences. Only a letter, underscore, or escape sequence may be used for the first byte of a symbol. Although this definition allows any byte to be part of a symbol, not all symbol names may be compatible with all target platforms. This document does not specify how implementations should represent symbols, and thus does not require that all syntactically valid symbols be usable on every implementation on every target platform. Examples of valid symbols:
foo
some_symbol
Substitute Tokens
1.1 To aid in the writing of programs that are portable to multiple target platforms and Voodoo implementations, Voodoo defines a number of tokens that implementations must substitute appropriate values for.
A symbol prefixed by a percent character (%) must be treated as equivalent to an integer value. The following table shows the percent substitutes that must be recognized by Voodoo implementations:
Symbol | Meaning |
---|---|
%saved-frame-size |
The number of bytes required to save a frame |
any feature that maps to an integer | the integer value of the feature |
The effect of using a substitute token whose meaning is not defined above is undefined.
Examples of valid substitute tokens:
%bytes-per-word |
number of bytes per word |
Escape Sequences
1.0 To facilitate entering bytes that may be treated differently by different systems and to inhibit the special meaning certain bytes normally have, an escape mechanism is provided. For example, the escape mechanism can be used to create strings that contain double quotes, or to spread a single incantation over multiple lines. Escape sequences start with a backslash, and have the following meanings:
\\ |
An actual backslash |
\" |
A double quote |
\n |
A newline |
\r |
A carriage return |
\t |
A tab |
\xXX |
The byte with hexadecimal value XX |
\<space> |
A space |
\<newline> |
Line continuation. The newline and any leading whitespace on the next line is ignored. |
Examples of the usage of escape sequences:
"Hello world\n" |
A string with a newline in it |
"\\\"" |
A string consisting of a backslash followed by a double quote |
call foo \ |
Same as: call foo bar |
Labels
1.0 Places in a program can be marked with labels, so that the place can be referred to from the code by mentioning the label. A label consists of a symbol followed by a colon. When referring to a label, only the symbol is used, without the colon.
For example:
foo:
word 12
declares a word with the value 12, and a label 'foo' that refers to the place where this value is stored. For example, if the value happens to be loaded at the address 71234, writing 'foo' in the program will have the same effect as writing 71234 in the same place.
Values
1.0 Values are the smallest unit of data in a Voodoo program. Examples of values are:
12 |
The integer 12 |
x |
The value of x (may be a label or a local variable) |
At-Expressions
1.0 The character @ can be used to denote the value at some address. An address prefixed with @ denotes the word stored at that address, rather than the address itself. The address may be an integer, a local variable, or a label. For example:
@12 |
The word stored at address 12 |
@x |
The word stored at address x |
Syntactically, at-expressions are values. Therefore, in any place where a value can be used, an at-expression can also be used.
Sections
1.0 A Voodoo program is divided in a number of sections.
In source code, these are introduced by an incantation of the form
section <identifier>
, where <identifier> is an
identifier for the section.
The following section identifiers are valid:
Identifier | Meaning |
---|---|
code |
This section contains executable code |
data |
This section contains data |
functions |
This section contains function definitions |
Example usage of the section
incantation:
section data
# define data here ...
section functions
# define functions here ...
Sections may appear in the source code in any order. The same section name may be used multiple times, in which case the contents of the section will be the concatenation of the contents of the multiple places in which it is defined.
Data Definitions
1.0 Data can be inserted into a Voodoo
program by means of the magic
words byte
, word
, and
string
, followed by the value of the byte, word, or
string to be inserted. For example:
# A byte with value 42
byte 42
# A word with value -1
word -1
# The string "hello"
string "hello"
Groups
1.1 Groups are a mechanism to group multiple data definitions into a single unit. For example, to define a data structure consisting of a word and two bytes:
example:
group
word 42
byte 0
byte 1
end group
Import and Export
1.0 To allow Voodoo programs to refer to definitions in
other files, and to allow other files to refer to definitions in
Voodoo programs, the magic words import
and export
are used.
Using import
, definitions from elsewhere are made
available to a Voodoo program:
section data
import stderr
section functions
import fputs
# We can now refer to stderr and fputs
Data and functions defined in a Voodoo program can be made
available to other programs using export
:
section data
export answer
answer:
word 42
section functions
export foo
foo:
function
# some code here
end function
export
exports the first definition that follows it.
Groups can be used to group together data that must be exported as a
single unit.
If symbols are imported or exported, the import
or export
must occur before the first use of the symbol
in any section. To ensure portability, implementations should reject
programs that do not obey this constraint.
Alignment
1.0 Many architectures require that data and/or code
observe certain alignment restrictions. For example, an architecture may
require that a word of data be at an address that is a multiple of the
word size. Voodoo provides the magic word align
, which
specifies the alignment for the next program element.
Without any parameters, align
inserts filler bytes
into the current section, so that the next element added to the section
will respect the default alignment for the section. The default
alignment for each section is implementation-defined, but must ensure
that the alignment restrictions of the target platform are observed.
When written as align <n>
, where <n> is an
integer, the incantation will insert filler bytes as necessary to align
the next element to be added to the section on a multiple of <n>
bytes.
The filler bytes inserted by align
are unspecified.
In particular, they are not guaranteed to be valid code.
Example uses of the align
incantation:
section data
x:
byte 1
# Ensure that y is aligned according to
# the target platform's alignment restrictions
align
y:
word 42
section functions
# Ensure that foo is aligned according to
# the target platform's alignment restrictions
align
foo:
function n
# some code here
end function
Actions
Voodoo code consists of actions. Actions consist of a magic word, usually followed by a number of values. This section lists all actions supported by Voodoo. In the list below, '<x>', '<y>', and '<z>' denote values, '<symbol>' denotes a symbol, and '<expr>' denotes an expression. (expressions are discussed further on). Ellipsis (…) is used to denote that more items may follow, and square brackets ([ and ]) are used to denote that an item is optional.
- 1.0
call <x> <y> <z> …
Calls the function <x> with the parameters <y> <z> …. There may be zero or more parameters.
- 1.0
goto <x>
Continues the program at location <x>, rather than at the next action after the goto.
Any value can be used as the location to go to. However, the consequences of using
goto
to jump to a label outside the active frame are undefined. Since frames are introduced by blocks and functions, this means thatgoto
should not jump to a label in a different block or function, unless the frame of that block or function has been restored usingrestore-frame
.- 1.0
let <symbol> <expr>
Introduces a local variable <symbol>, and initializes it to the result of evaluating <expr>.
This action is only allowed inside functions or blocks, that is, between
function
and the correspondingend function
, or betweenblock
and the correspondingend block
.The scope of a variable introduced by
let
includes every statement after thelet
action and before theend function
orend block
that ends the function or block in which the variable is introduced.- 1.1
restore-frame <x>
Restores a frame that was saved at address <x>. After this action, any frames below the restored frame are no longer valid. The behavior of
restore-frame
is undefined if <x> does not contain valid frame information. The values of local variables are undefined afterrestore-frame
is performed.- 1.1
restore-locals <x> [<locals>]
Sets local variables to values that have been stored at address <x>. If one or more local variables are specified, restores the specified local variables. Else, restores all local variables in scope. The behavior is undefined if the information saved at <x> does not contain a value for any of the variables to be restored.
- 1.0
return [<expr>]
Returns from the current function. If <expr> is specified, it is evaluated and the function returns the result of the evaluation. Otherwise, the return value is unspecified.
- 1.1
save-frame <x>
Saves information about the current frame to a block of memory at address <x>. The block of memory must be large enough to hold the required information. Allocating
%saved-frame-size
bytes ensures a block of memory large enough to hold a saved frame.- 1.1
save-frame-and-locals <x> [<locals>]
Saves information about the current frame and local variables to a block of memory at address <x>. If one or more local variables are specified, at least the named local variables are saved. If no local variables are specified, all local variables are saved.
The block of memory starting at <x> must be large enough to hold the required information. Allocating
%saved-frame-size
bytes ensures a block of memory large enough to hold a saved frame plus local variables.- 1.1
save-locals <x> [<locals>]
Saves local variables to a block of memory at address <x>. If one or more local variables are specified, at least the named local variables are saved. If no local variables are specified, all local variables are saved.
The block of memory starting at <x> must be large enough to hold the required information. Allocating
%saved-frame-size
bytes ensures a block of memory that is large enough.This action does not invalidate frame information or local variables previously stored at <x>, except for updating the stored values for the local variables it saves to the values these local variables currently have.
- 1.0
set <symbol> <expr>
Evaluates <expr> and assigns the result to <symbol>. <symbol> may not be a label, because labels cannot be assigned to.
- 1.1
set <at-expr> <expr>
Evaluates <expr> and stores the result at the address specified by the at-expression <at-expr>.
- 1.0
set-byte <base> <offset> <x>
Sets the byte at <base> + <offset> to <x>. <offset> is given as a number of bytes.
- 1.0
set-word <base> <offset> <x>
Sets the word at <base> + WORDSIZE * <offset> to <x>.
The address computed by <base> + WORDSIZE * <offset> is expected to be a multiple of the word size. The behavior of
set-word
is undefined if this condition is not satisfied.- 1.0
tail-call <x> <y> <z> ...
Performs a tail call to the function <x> with parameters <y> <z> …. This has an effect similar to 'return call <x> <y> <z> ...', but re-uses the call frame of the current function. This means that if <x> takes fewer or at most as many parameters as the current function, the tail call requires no extra space.
Expressions
Certain actions have the ability to evaluate expressions. Expressions can be simple values, but expressions can also perform computations on values. The following are valid expressions:
- 1.0
add <x> <y>
The result of adding <y> to <x>.
If the result of the addition cannot be represented in a single word, the result of
add
is undefined.- 1.0
and <x> <y>
The bitwise and of <x> and <y>.
- 1.0
asr <x> <y>
Performs an arithmetic right shift. The bits in <x> are shifted <y> positions to the right. The sign of <x> is preserved, so that the result has the same sign as <x>. The result is undefined if <y> is negative.
- 1.1
auto-bytes <x>
Allocates <x> bytes of memory. The allocated bytes will automatically be released after the current frame ends. The result is the address of the allocated bytes.
- 1.1
auto-words <x>
Allocates <x> words of memory. The allocated words will automatically be released after the current frame ends. The result is the address of the allocated words.
- 1.0
bsr <x> <y>
Performs a bitwise right shift. The bits in <x> are shifted <y> positions to the right. <y> zero-valued bits are shifted in from the left. The result does not necessarily have the same sign as <x>. The result is undefined if <y> is negative.
- 1.0
call <x> <y> <z> ...
Similar to the action call, this calls the function <x> with the parameters <y> <z> ... (there may be zero or more parameters). The result of this expression is the value returned from the function.
- 1.0
div <x> <y>
The (integer) result of dividing <x> by <y>.
If <x> ≥ 0 and <y> > 0, the result is the largest integer equal to or less than the algebraic quotient of <x> and <y>.
If either <x> or <y> is negative, the result is implementation-defined.
If <y> is zero, or if the quotient cannot be represented in a single machine word, the result is undefined.
- 1.0
get-byte <base> <offset>
The value of the byte at address <base> + <offset>.
- 1.0
get-word <base> <offset>
The value of the word at address <base> + (WORDSIZE * <offset>).
The address computed as <base> + (WORDSIZE * <offset>) is expected to be a multiple of the word size. If this condition is not met, the behavior of
get-word
is undefined.- 1.0
mod <x> <y>
For <x> ≥ 0 and <y> > 0, returns <x> modulo <y>.
If either <x> or <y> is negative, the result is implementation-defined.
If <y> is zero, the result is undefined.
- 1.0
mul <x> <y>
The result of multiplying <x> by <y>.
If the algebraic result of <x> * <y> cannot be represented in a single word, the result of
mul <x> <y>
contains only the low-order bits of the full result.- 1.0
not <x>
The ones' complement of <x>; i.e. all the bits in <x> inverted.
- 1.0
or <x> <y>
The bitwise or of <x> and <y>.
- 1.0
rol <x> <y>
Rotates the bits in <x> to the left by <y> positions. Bits that are rotated off the left are inserted on the right. The result is undefined if <y> is negative.
- 1.0
ror <x> <y>
Rotates the bits in <x> to the right by <y> positions. Bits that are rotated off the right are inserted on the left. The result is undefined if <y> is negative.
- 1.0
shl <x> <y>
Performs a left shift. The bits in <x> are shifted <y> positions to the left. The result is undefined if <y> is negative.
- 1.0
shr <x> <y>
Performs a right shift. The bits in <x> are shifted <y> positions to the right. It is implementation-defined whether this operation preserves the sign of <x> (for operations with specific sign-preservation properties, use asr or bsr). The result is undefined if <y> is negative.
- 1.0
sub <x> <y>
The result of subtracting <y> from <x>.
If the result of the subtraction cannot be represented in a single word, the result of
sub
is undefined.- 1.0
xor <x> <y>
The bitwise exclusive or of <x> and <y>.
Conditionals
1.0 Conditionals in Voodoo take the following form:
if<test>
... some code here ...
else if<test>
... other code here ...
... more "else if" parts ...
else
... some code ...
end if
There can be any number of else if
parts, and
the final else
clause is optional. The tests that are provided
are the following:
ifeq <x> <y>
Tests if <x> is equal to <y>.
ifge <x> <y>
Tests if <x> is greater than or equal to <y>.
ifgt <x> <y>
Tests if <x> is strictly greater than <y>.
ifle <x> <y>
Tests if <x> is less than or equal to <y>.
iflt <x> <y>
Tests if <x> is strictly less than <y>.
ifne <x> <y>
Tests if <x> is different from <y>.
Function Definitions
1.0 A function definition looks like:
function x y z
<code>
end function
Here, the function being defined takes 3 parameters,
which can be referred to as x
, y
, and
z
from the code inside the function body. A function may
have zero or more parameters and is practically always preceded by a
label, so that the function can be referenced from code.
A function should only be entered using call
,
tail-call
, or entering a prior invocation of the function
using restore-frame
and goto
. When calling
a function, the number of supplied parameters should equal the number
of parameters the function was declared with.
A function should only be left through return
,
tail-call
, or a valid combination
of restore-frame
and goto
.
Failing to meet these requirements results in undefined behavior.
Blocks
1.0 Blocks can be used to define a scope in which local
variables (introduced with let
) can be referred to. A
block looks like:
block
<code>
end block
Inside a block, variables may be introduced using
let
. Such a variable is in scope (can be referred to)
from the first statement after the let
until the
end block
that terminates the block.
Blocks can be placed anywhere an action can be placed: at top-level, inside functions, inside blocks, and inside conditionals.
Implementation
The Voodoo programming language is implemented by the Voodoo compiler. This implementation compiles Voodoo code to assembly code or object files for x86, AMD64, ARM, or MIPS. The implementation is written in Ruby. It is available under the terms of the LGPLv2 license and can be downloaded from the Voodoo compiler page.
Related Work
This section provides links to some projects that are related to the Voodoo programming language.
C--
C-- is a language with very similar aims to Voodoo's. Like Voodoo, C-- aims to be a target language for programming language implementations, providing a thin abstraction layer over the target platform without getting in the way of efficient implementation of high level constructs. Compared to Voodoo, C-- seems more complete. However, development on C-- implementations seems to have stagnated.
LLVM
LLVM, the Low-Level Virtual Machine, provides a compiler framework that has similar aims to Voodoo. Compared to Voodoo, LLVM provides many more features. This comes at the cost of more complexity and a much heavier implementation.
The Common Language Infrastructure
The Common Language Infrastructure (CLI), designed by Microsoft and published as ECMA-335, is a specification that allows programs written in a variety of high-level languages to compile to a common byte code format and share data and code with one another. Compared to Voodoo, it is much more extensive, including such things as a type system and garbage collection. Where Voodoo is lightweight and aims to be language-agnostic, the CLI is heavier and more geared towards statically typed languages with class-instance object systems.
TurboVM
TurboVM is a virtual machine that exposes a RISC instruction set, designed to be simple and efficient to generate, parse, and execute. Like Voodoo, TurboVM is intended as back end for programming implementations. Also like Voodoo, it aims to be simple and lightweight. Voodoo could be compiled to TurboVM byte code, and, conversely, TurboVM byte code could be compiled to Voodoo, either as a possible intermediate step to native code generation.
Alchemist
The Alchemist code generation library is a library that can be used to generate machine code. A possible future direction for Voodoo implementations is to use Alchemist to generate (and possibly execute) machine code on the fly.