# Lang2 Architecture

Lang2 is a programming language. Its implementation is split into two major parts:
the compiler and the virtual machine. There are also shared components.
## I/O

The lang2 implementation uses a simple but effective I/O model defined in
[include/lang2/io.h](https://git.mort.coffee/mort/lang2/src/branch/master/include/lang2/io.h).
There are two basic "interface" types: the reader `l2_io_reader`
and the writer `l2_io_writer`.

A reader is something which data can be read from. A reader must implement
the `size_t read(struct l2_io_reader *self, void *buf, size_t len)` function,
which fills `buf` with up to `len` bytes and returns the number of bytes it
placed in `buf`. The reader should block until something is read into `buf`.
A return value of 0 should be interpreted as EOF.
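As a rough illustration of the reader contract, here is a minimal sketch of a custom reader. The layout of `struct l2_io_reader` (a single `read` function pointer) is an assumption made for this example, and `str_reader`, `str_read` and `count_bytes` are hypothetical names; the real definitions are in `io.h` and may differ.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout for this sketch; see include/lang2/io.h for the real one. */
struct l2_io_reader {
	size_t (*read)(struct l2_io_reader *self, void *buf, size_t len);
};

/* Hypothetical reader that serves bytes from a NUL-terminated string. */
struct str_reader {
	struct l2_io_reader r; /* first member, so the cast in str_read is valid */
	const char *str;
	size_t pos;
};

static size_t str_read(struct l2_io_reader *self, void *buf, size_t len) {
	struct str_reader *sr = (struct str_reader *)self;
	size_t left = strlen(sr->str) - sr->pos;
	if (len > left) len = left;
	memcpy(buf, sr->str + sr->pos, len);
	sr->pos += len;
	return len; /* 0 once the string is exhausted, which callers treat as EOF */
}

/* Drains a reader until EOF and returns the total number of bytes seen. */
static size_t count_bytes(struct l2_io_reader *r) {
	char buf[4];
	size_t total = 0, n;
	while ((n = r->read(r, buf, sizeof(buf))) > 0) total += n;
	return total;
}
```

Note that `count_bytes` only depends on the interface, so it would work unchanged with any other reader.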
A writer is something which data can be written to. A writer must implement
the `void write(struct l2_io_writer *self, void *buf, size_t len)` function,
which writes `len` bytes from `buf`. The writer should block until
all bytes are written.
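A writer can be sketched in the same way. The example below assumes that `struct l2_io_writer` holds a single `write` function pointer, which is a guess made for illustration; `grow_writer` and `grow_write` are hypothetical names and are not part of lang2.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout for this sketch; see include/lang2/io.h for the real one. */
struct l2_io_writer {
	void (*write)(struct l2_io_writer *self, void *buf, size_t len);
};

/* Hypothetical writer that appends everything to a growing heap buffer. */
struct grow_writer {
	struct l2_io_writer w; /* first member, so the cast in grow_write is valid */
	char *data;
	size_t len, cap;
};

static void grow_write(struct l2_io_writer *self, void *buf, size_t len) {
	struct grow_writer *gw = (struct grow_writer *)self;
	if (len == 0) return;
	if (gw->len + len > gw->cap) {
		size_t cap = gw->cap ? gw->cap : 16;
		while (cap < gw->len + len) cap *= 2;
		char *data = realloc(gw->data, cap);
		if (data == NULL) abort(); /* write returns void, so failure cannot be reported */
		gw->data = data;
		gw->cap = cap;
	}
	/* All len bytes are stored before returning, as the contract requires. */
	memcpy(gw->data + gw->len, buf, len);
	gw->len += len;
}
```

Because `write` returns `void`, a writer has no way to signal a short write, which is why this sketch aborts on allocation failure rather than truncating.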
For example writers, look at `l2_io_mem_writer` (with its `write` function
`l2_io_mem_write`) and `l2_io_file_writer` (with its `write` function `l2_io_file_write`).
For example readers, look at `l2_io_mem_reader` (with its `read` function
`l2_io_mem_read`) and `l2_io_file_reader` (with its `read` function `l2_io_file_read`).

Readers and writers aren't necessarily meant to be fast. If you're doing
a lot of reading or writing (especially short reads and writes), you should
use the types `l2_bufio_reader` and `l2_bufio_writer` instead. These implement
buffered reading/writing, only calling the underlying `read` function when the
buffer runs empty, or the underlying `write` function when the buffer gets full.
The `l2_bufio_reader` can also peek ahead by some number of bytes
(less than `L2_IO_BUFSIZ`).
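To show why buffering helps with short writes, here is a minimal sketch of a buffered writer. It is not the actual `l2_bufio_writer`: the names `buf_writer`, `buf_put`, `buf_flush` and `count_writer` are hypothetical, the `struct l2_io_writer` layout is assumed, and `SKETCH_BUFSIZ` is a stand-in because the real value of `L2_IO_BUFSIZ` is not stated here.

```c
#include <stddef.h>

#define SKETCH_BUFSIZ 8 /* stand-in for L2_IO_BUFSIZ; the real value is not assumed */

/* Assumed layout for this sketch; see include/lang2/io.h for the real one. */
struct l2_io_writer {
	void (*write)(struct l2_io_writer *self, void *buf, size_t len);
};

/* Hypothetical buffered writer wrapping any underlying writer. */
struct buf_writer {
	struct l2_io_writer *w;
	char buf[SKETCH_BUFSIZ];
	size_t len;
};

/* Pushes any buffered bytes to the underlying writer. */
static void buf_flush(struct buf_writer *b) {
	if (b->len == 0) return;
	b->w->write(b->w, b->buf, b->len);
	b->len = 0;
}

/* Appends one byte, touching the underlying writer only when the buffer is full. */
static void buf_put(struct buf_writer *b, char ch) {
	if (b->len == SKETCH_BUFSIZ) buf_flush(b);
	b->buf[b->len++] = ch;
}

/* Underlying writer that only counts calls and bytes, for demonstration. */
struct count_writer {
	struct l2_io_writer w;
	size_t calls, bytes;
};

static void count_write(struct l2_io_writer *self, void *buf, size_t len) {
	struct count_writer *cw = (struct count_writer *)self;
	(void)buf;
	cw->calls += 1;
	cw->bytes += len;
}
```

With an 8-byte buffer, writing 20 single bytes results in three underlying `write` calls (two triggered by a full buffer, one by the final flush) instead of 20.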
## Atoms

In lang2, every identifier is an "atom" (name borrowed from Erlang). An atom is just
a number which represents the name. Every identifier gets its own ID, and every
occurrence of a particular identifier name will be associated with the same ID.
You can think of it like a hash, except that there are no collisions.

Every namespace (be that the namespace of local variables in a function, the
namespace of global functions, or a namespace variable) is just a map from
an integer ID (the atom) to a variable reference.

Atom literals such as `'foo` will evaluate to an atom variable which has the
same numeric ID as the identifier `foo`. This makes it possible to pass
names around at runtime.
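The idea of assigning one ID per distinct name can be sketched with a small interning table. This is only an illustration of the concept: `atom_table` and `atom_intern` are hypothetical names, the linear search is chosen for brevity, and nothing here is taken from how lang2 actually stores or numbers its atoms.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical atom table; the index of a name in `names` is its ID. */
struct atom_table {
	char **names;
	size_t len, cap;
};

/* Returns the ID for name, assigning the next free ID the first time it is seen. */
static size_t atom_intern(struct atom_table *t, const char *name) {
	for (size_t i = 0; i < t->len; i++)
		if (strcmp(t->names[i], name) == 0) return i;

	if (t->len == t->cap) {
		size_t cap = t->cap ? t->cap * 2 : 8;
		char **names = realloc(t->names, cap * sizeof(*names));
		if (names == NULL) abort();
		t->names = names;
		t->cap = cap;
	}

	/* Keep a private copy so the caller's string may be freed or reused. */
	size_t n = strlen(name) + 1;
	char *copy = malloc(n);
	if (copy == NULL) abort();
	memcpy(copy, name, n);
	t->names[t->len] = copy;
	return t->len++;
}
```

Since IDs are handed out sequentially rather than computed from the name, two different names can never end up with the same ID, which is the "hash without collisions" property described above.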
## Compiler

The compiler consists of the lexer, the parser and the code generator.

The lexer reads from a `reader` and produces tokens as it reads.
A token carries information about what kind of token it is (identifier, number,
open paren, etc). In addition, the token might carry extra information:
number tokens contain a number, dot-number tokens contain an integer,
and identifiers and strings contain a string.
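A token with a kind and an optional payload is commonly modelled as a tagged union. The sketch below shows that shape with a toy lexer; the type and function names are hypothetical, it covers only a few token kinds, and it reads from a string rather than from a `reader` to stay short. It does not reproduce lang2's actual token set or lexing rules.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum tok_kind { TOK_IDENT, TOK_NUMBER, TOK_OPEN_PAREN, TOK_CLOSE_PAREN, TOK_EOF, TOK_ERROR };

struct token {
	enum tok_kind kind;
	union {
		double num;     /* valid when kind == TOK_NUMBER */
		char ident[32]; /* valid when kind == TOK_IDENT */
	} v;
};

/* Reads one token from *src and advances *src past it. */
static struct token next_token(const char **src) {
	struct token tok = { TOK_EOF, { 0 } };
	const char *s = *src;
	while (isspace((unsigned char)*s)) s++;

	if (*s == '\0') {
		tok.kind = TOK_EOF;
	} else if (*s == '(') {
		tok.kind = TOK_OPEN_PAREN;
		s++;
	} else if (*s == ')') {
		tok.kind = TOK_CLOSE_PAREN;
		s++;
	} else if (isdigit((unsigned char)*s)) {
		char *end;
		tok.kind = TOK_NUMBER;
		tok.v.num = strtod(s, &end);
		s = end;
	} else if (isalpha((unsigned char)*s) || *s == '_') {
		/* Identifiers longer than the payload buffer are cut off in this sketch. */
		size_t n = 0;
		tok.kind = TOK_IDENT;
		while ((isalnum((unsigned char)*s) || *s == '_') && n < sizeof(tok.v.ident) - 1)
			tok.v.ident[n++] = *s++;
		tok.v.ident[n] = '\0';
	} else {
		tok.kind = TOK_ERROR;
		s++;
	}

	*src = s;
	return tok;
}
```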
The parser reads tokens from the lexer and uses the code generator to generate
bytecode. Unlike the parsers of most other language implementations, it doesn't
produce a syntax tree; it just emits bytecode as it goes. This is possible thanks
to careful syntax design combined with careful bytecode design.
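To make "emits bytecode as it goes" concrete, here is a sketch of a single-pass recursive-descent parser for a toy grammar of single digits, `+` and `*`. The grammar, the opcodes and all names are invented for this example and are unrelated to lang2's real syntax or bytecode; the point is only that each parse function emits instructions directly instead of returning a tree node. Input is assumed to be well-formed.

```c
#include <stddef.h>

enum op { OP_PUSH, OP_ADD, OP_MUL }; /* hypothetical stack-machine opcodes */

struct emitter {
	int code[64];
	size_t len;
};

static void emit(struct emitter *e, int word) {
	if (e->len < 64) e->code[e->len++] = word;
}

/* factor := digit */
static void parse_factor(const char **s, struct emitter *e) {
	emit(e, OP_PUSH);
	emit(e, **s - '0');
	(*s)++;
}

/* term := factor ('*' factor)* */
static void parse_term(const char **s, struct emitter *e) {
	parse_factor(s, e);
	while (**s == '*') {
		(*s)++;
		parse_factor(s, e);
		emit(e, OP_MUL); /* both operands are already on the stack */
	}
}

/* expr := term ('+' term)* */
static void parse_expr(const char **s, struct emitter *e) {
	parse_term(s, e);
	while (**s == '+') {
		(*s)++;
		parse_term(s, e);
		emit(e, OP_ADD);
	}
}
```

For the input `1+2*3` this emits `PUSH 1, PUSH 2, PUSH 3, MUL, ADD`: operator precedence is encoded in the call structure, so no intermediate tree is needed.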
The code generator contains, for the most part, thin wrappers around bytecode
instructions. For example, the `l2_gen_halt` function just contains one line
to write an `L2_OP_HALT` instruction. However, the code generator is also
responsible for keeping track of the atoms (so that the name `foo` always gets
the same ID everywhere), and for keeping track of the string literals (so that
multiple string literals with the same content only show up once in the bytecode).
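The two roles described above, thin instruction wrappers and string deduplication, might look roughly like the following. This is a sketch under assumptions: `struct gen`, `gen_halt`, `gen_string` and the opcode values are hypothetical, and the side table of strings is a simplification that may not match how lang2 actually lays out string data in its bytecode.

```c
#include <stddef.h>
#include <string.h>

enum { OP_HALT = 0, OP_LOAD_STR = 1 }; /* hypothetical opcode values */

struct gen {
	unsigned char code[64];
	size_t code_len;
	const char *strings[16]; /* distinct string literals seen so far */
	size_t string_count;
};

static void gen_byte(struct gen *g, unsigned char b) {
	if (g->code_len < sizeof(g->code)) g->code[g->code_len++] = b;
}

/* Thin wrapper: one call writes one instruction. */
static void gen_halt(struct gen *g) {
	gen_byte(g, OP_HALT);
}

/* Emits a string load, reusing the existing slot if the content was seen before.
 * The pointer is stored without copying, so the caller must keep str alive. */
static void gen_string(struct gen *g, const char *str) {
	size_t i;
	for (i = 0; i < g->string_count; i++)
		if (strcmp(g->strings[i], str) == 0) break;

	if (i == g->string_count) {
		if (g->string_count == 16) return; /* sketch only: table overflow is ignored */
		g->strings[g->string_count++] = str;
	}

	gen_byte(g, OP_LOAD_STR);
	gen_byte(g, (unsigned char)i);
}
```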
## Virtual Machine

The virtual machine executes bytecode.
The central piece of the VM is the `l2_vm_step` function, which looks at the
code at the instruction pointer and executes it with a giant switch statement.
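The general shape of such a step function can be sketched as follows. This is a toy stack machine with three invented opcodes, not the real `l2_vm_step`; the struct fields, opcode names and `vm_run` helper are all hypothetical, and bounds checks are omitted for brevity.

```c
#include <stddef.h>

enum { OP_HALT, OP_PUSH, OP_ADD }; /* hypothetical opcodes */

struct vm {
	const int *code;
	size_t iptr; /* instruction pointer: index of the next word to execute */
	int stack[16];
	size_t sptr; /* number of values currently on the stack */
	int halted;
};

/* Executes exactly one instruction. */
static void vm_step(struct vm *vm) {
	int op = vm->code[vm->iptr++];
	switch (op) {
	case OP_PUSH:
		/* The operand is stored inline, right after the opcode. */
		vm->stack[vm->sptr++] = vm->code[vm->iptr++];
		break;
	case OP_ADD:
		vm->stack[vm->sptr - 2] += vm->stack[vm->sptr - 1];
		vm->sptr -= 1;
		break;
	case OP_HALT:
	default:
		/* Unknown opcodes stop the machine in this sketch. */
		vm->halted = 1;
		break;
	}
}

/* Steps until the machine halts and returns the top of the stack (0 if empty). */
static int vm_run(struct vm *vm) {
	while (!vm->halted) vm_step(vm);
	return vm->sptr > 0 ? vm->stack[vm->sptr - 1] : 0;
}
```

Keeping `vm_step` separate from the run loop means a caller can also single-step the machine, for example to inspect state between instructions.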