| @@ -0,0 +1,77 @@ | |||
| # Lang2 Architecture | |||
| Lang2 is a programming language. Its implementation is split into two major parts: | |||
| the compiler and the virtual machine. There are also shared components. | |||
| ## I/O | |||
| The lang2 implementation uses a simple but effective I/O model defined in | |||
| [include/lang2/io.h](https://git.mort.coffee/mort/lang2/src/branch/master/include/lang2/io.h). | |||
| There are two basic "interface" types: the reader `l2_io_reader` | |||
| and the writer `l2_io_writer`. | |||
| A reader is something which data can be read from. A reader must implement | |||
| the `size_t read(struct l2_io_reader *self, void *buf, size_t len)` function, | |||
| which fills `buf` with up to `len` bytes and returns the number of bytes read. | |||
| The reader should block until something is read into `buf`. A return value of | |||
| 0 should be interpreted as EOF. | |||
| A writer is something which data can be written to. A writer must implement | |||
| the `void write(struct l2_io_writer *self, void *buf, size_t len)` function, | |||
| which writes `len` bytes from `buf`. The writer should block until | |||
| all bytes are written. | |||
| For example writers, look at `l2_io_mem_writer` (with its `write` function | |||
| `l2_io_mem_write`) and `l2_io_file_writer` (with its `write` function `l2_io_file_write`). | |||
| For example readers, look at `l2_io_mem_reader` (with its `read` function | |||
| `l2_io_mem_read`) and `l2_io_file_reader` (with its `read` function `l2_io_file_read`). | |||
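
To make the two interfaces concrete, here is a minimal sketch of a memory-backed reader and writer plus a copy loop that treats a return value of 0 as EOF. The `ex_` names and struct layouts are hypothetical stand-ins written for this example, not the definitions from `include/lang2/io.h`; the sketch also takes `const void *` in `write`, which is an assumption rather than the documented signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the interface types; the real definitions
 * live in include/lang2/io.h and may differ. */
struct ex_reader {
    size_t (*read)(struct ex_reader *self, void *buf, size_t len);
};

struct ex_writer {
    void (*write)(struct ex_writer *self, const void *buf, size_t len);
};

/* A reader over a fixed memory region. The interface struct comes first,
 * so a pointer to ex_mem_reader can be cast to a pointer to ex_reader. */
struct ex_mem_reader {
    struct ex_reader r;
    const char *mem;
    size_t len;
    size_t pos;
};

static size_t ex_mem_read(struct ex_reader *self, void *buf, size_t len) {
    struct ex_mem_reader *mr = (struct ex_mem_reader *)self;
    size_t left = mr->len - mr->pos;
    if (len > left) len = left;   /* return fewer bytes near the end */
    memcpy(buf, mr->mem + mr->pos, len);
    mr->pos += len;
    return len;                   /* 0 signals EOF */
}

/* A writer that appends into a fixed-capacity buffer. */
struct ex_mem_writer {
    struct ex_writer w;
    char mem[64];
    size_t len;
};

static void ex_mem_write(struct ex_writer *self, const void *buf, size_t len) {
    struct ex_mem_writer *mw = (struct ex_mem_writer *)self;
    assert(mw->len + len <= sizeof(mw->mem));  /* sketch: no growth */
    memcpy(mw->mem + mw->len, buf, len);
    mw->len += len;
}

/* Copy everything from a reader to a writer; returns the bytes copied. */
static size_t ex_copy(struct ex_reader *r, struct ex_writer *w) {
    char chunk[4];
    size_t total = 0, n;
    while ((n = r->read(r, chunk, sizeof(chunk))) > 0) {
        w->write(w, chunk, n);
        total += n;
    }
    return total;
}
```

Because `ex_copy` only sees the two function pointers, the same loop would work unchanged with a file-backed reader or writer.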
| Readers and writers aren't necessarily meant to be fast. Instead, if you're doing | |||
| a lot of reading or writing (especially short reads and writes), you should be | |||
| using the types `l2_bufio_reader` and `l2_bufio_writer`. These implement buffered | |||
| reading/writing, only calling the underlying `read` function when the buffer runs | |||
| empty, or the underlying `write` function when it gets full. The `l2_bufio_reader` | |||
| can also peek some number of bytes (less than `L2_IO_BUFSIZ`) ahead. | |||
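
The effect of buffering on the number of underlying calls can be sketched as follows. Everything here is an assumption made for illustration: `EX_BUFSIZ` is a deliberately tiny stand-in for `L2_IO_BUFSIZ`, and `ex_sink` and `ex_bufio_writer` are hypothetical types, not the real `l2_bufio_writer`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_BUFSIZ 8  /* tiny on purpose; stands in for L2_IO_BUFSIZ */

/* Hypothetical unbuffered sink that counts how often it is called. */
struct ex_sink {
    char data[64];
    size_t len;
    int calls;
};

static void ex_sink_write(struct ex_sink *s, const void *buf, size_t len) {
    assert(s->len + len <= sizeof(s->data));
    memcpy(s->data + s->len, buf, len);
    s->len += len;
    s->calls += 1;
}

/* Buffered front end: bytes collect in buf until it is full. */
struct ex_bufio_writer {
    struct ex_sink *sink;
    char buf[EX_BUFSIZ];
    size_t idx;
};

/* Hand any pending bytes to the sink in a single call. */
static void ex_bufio_flush(struct ex_bufio_writer *b) {
    if (b->idx == 0) return;
    ex_sink_write(b->sink, b->buf, b->idx);
    b->idx = 0;
}

/* Append one byte, flushing first only if the buffer is already full. */
static void ex_bufio_put(struct ex_bufio_writer *b, char ch) {
    if (b->idx == EX_BUFSIZ) ex_bufio_flush(b);
    b->buf[b->idx++] = ch;
}
```

With an 8-byte buffer, twenty single-byte puts reach the sink in two calls, and a final explicit flush delivers the remaining four bytes in a third.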
| ## Atoms | |||
| In lang2, every identifier is an "atom" (name borrowed from Erlang). An atom is just | |||
| a number which represents the name. Every identifier gets its own ID, and every | |||
| occurrence of a particular identifier name will be associated with the same ID. | |||
| You can think of it like a hash, except that there are no collisions. | |||
| Every namespace (be that the namespace of local variables in a function, the | |||
| namespace of global functions, or a namespace variable) is just a map from | |||
| an integer ID (the atom) to a variable reference. | |||
| Atom literals such as `'foo` will evaluate to an atom variable which has the | |||
| same numeric ID as the identifier `foo`. This makes it possible to pass | |||
| names around at runtime. | |||
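
One simple way to get "same name, same ID, no collisions" is an interning table that hands out the next free index the first time it sees a name. The sketch below assumes a fixed-size table with a linear scan; `ex_atoms` and `ex_atom_intern` are hypothetical names, and the real implementation may use a different data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_MAX_ATOMS 32

/* Hypothetical atom table; a linear scan keeps the sketch short. */
struct ex_atoms {
    const char *names[EX_MAX_ATOMS];
    size_t count;
};

/* Return the ID for name, assigning the next free ID on first use. */
static size_t ex_atom_intern(struct ex_atoms *t, const char *name) {
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0) return i;
    }
    assert(t->count < EX_MAX_ATOMS);
    t->names[t->count] = name;  /* caller keeps the string alive */
    return t->count++;
}
```

Since names are compared by content, two separate occurrences of `foo` in the source map to the same integer, which is what lets a namespace be a plain map keyed by atom ID.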
| ## Compiler | |||
| The compiler consists of the lexer, the parser and the code generator. | |||
| The lexer reads from a `reader` and produces tokens as it reads. | |||
| A token carries information about what kind of token it is (identifier, number, | |||
| open paren, etc). In addition, the token might carry extra information; | |||
| number tokens contain a number, dot-number tokens contain an integer, | |||
| and identifiers and strings contain a string. | |||
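
A token of this shape is commonly modelled as a tagged union: a kind plus a payload whose active member depends on the kind. The following is a sketch under that assumption; the `ex_` type and its fields are hypothetical and not taken from the lang2 headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds covering the cases named in the text. */
enum ex_token_kind {
    EX_TOK_OPEN_PAREN,
    EX_TOK_NUMBER,
    EX_TOK_DOT_NUMBER,
    EX_TOK_IDENT,
    EX_TOK_STRING
};

struct ex_token {
    enum ex_token_kind kind;
    union {
        double num;       /* EX_TOK_NUMBER */
        int integer;      /* EX_TOK_DOT_NUMBER */
        const char *str;  /* EX_TOK_IDENT and EX_TOK_STRING */
    } v;
};

static struct ex_token ex_tok_number(double num) {
    struct ex_token t;
    t.kind = EX_TOK_NUMBER;
    t.v.num = num;
    return t;
}

static struct ex_token ex_tok_ident(const char *name) {
    struct ex_token t;
    t.kind = EX_TOK_IDENT;
    t.v.str = name;
    return t;
}

/* Only identifier and string tokens carry a string payload. */
static int ex_tok_has_string(const struct ex_token *t) {
    return t->kind == EX_TOK_IDENT || t->kind == EX_TOK_STRING;
}

/* Checked accessor: reading the wrong union member is a bug. */
static const char *ex_tok_str(const struct ex_token *t) {
    assert(ex_tok_has_string(t));
    return t->v.str;
}
```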
| The parser reads tokens from the lexer and uses the code generator to generate | |||
| bytecode. Unlike most language implementations, the parser doesn't produce a syntax tree; | |||
| it just emits bytecode as it goes. This is possible thanks to careful syntax | |||
| design combined with careful bytecode design. | |||
| The code generator contains, for the most part, thin wrappers around bytecode | |||
| instructions. For example, the `l2_gen_halt` function just contains one line | |||
| to write an `L2_OP_HALT` instruction. However, it also has the responsibility | |||
| of keeping track of the atoms (so that the name `foo` always gets the same ID | |||
| everywhere), and of keeping track of the string literals (so that multiple | |||
| string literals with the same content only show up once in the bytecode). | |||
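
The two roles described above, thin instruction wrappers and string deduplication, can be sketched together. The opcodes, the one-byte operand encoding, and the `ex_gen` struct are all assumptions made for this example; the real bytecode format is not specified in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { EX_OP_HALT = 0, EX_OP_PUSH_STRING = 1 };  /* hypothetical opcodes */

/* Hypothetical generator state: output bytes plus a string table. */
struct ex_gen {
    unsigned char code[64];
    size_t code_len;
    const char *strings[16];
    size_t string_count;
};

static void ex_emit(struct ex_gen *g, unsigned char byte) {
    assert(g->code_len < sizeof(g->code));
    g->code[g->code_len++] = byte;
}

/* Thin wrapper: one instruction, one line. */
static void ex_gen_halt(struct ex_gen *g) {
    ex_emit(g, EX_OP_HALT);
}

/* Look up or add a string, then emit a push that refers to its index. */
static void ex_gen_string(struct ex_gen *g, const char *str) {
    size_t i;
    for (i = 0; i < g->string_count; i++) {
        if (strcmp(g->strings[i], str) == 0) break;
    }
    if (i == g->string_count) {
        assert(g->string_count < 16);
        g->strings[g->string_count++] = str;
    }
    ex_emit(g, EX_OP_PUSH_STRING);
    ex_emit(g, (unsigned char)i);
}
```

Generating the literal `"hi"` twice adds only one entry to the string table; both push instructions carry the same index.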
| ## Virtual Machine | |||
| The virtual machine executes bytecode. | |||
| The central piece of the VM is the `l2_vm_step` function, which looks at the | |||
| code at the instruction pointer and executes it with a giant switch statement. | |||
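
The fetch-and-switch pattern looks roughly like the sketch below. It assumes a small stack machine with three made-up opcodes; the `ex_vm` struct and its instruction set are hypothetical and far simpler than whatever `l2_vm_step` actually handles.

```c
#include <assert.h>
#include <stddef.h>

enum { EX_OP_HALT, EX_OP_PUSH, EX_OP_ADD };  /* hypothetical opcodes */

struct ex_vm {
    const unsigned char *code;
    size_t iptr;     /* instruction pointer */
    int stack[16];
    size_t sptr;     /* number of values on the stack */
    int halted;
};

/* Execute exactly one instruction at the instruction pointer. */
static void ex_vm_step(struct ex_vm *vm) {
    unsigned char op = vm->code[vm->iptr++];
    switch (op) {
    case EX_OP_PUSH:
        /* The operand is the byte that follows the opcode. */
        assert(vm->sptr < 16);
        vm->stack[vm->sptr++] = vm->code[vm->iptr++];
        break;
    case EX_OP_ADD:
        /* Pop two values, push their sum. */
        assert(vm->sptr >= 2);
        vm->stack[vm->sptr - 2] += vm->stack[vm->sptr - 1];
        vm->sptr -= 1;
        break;
    case EX_OP_HALT:
    default:
        /* Unknown opcodes stop the machine in this sketch. */
        vm->halted = 1;
        break;
    }
}

/* Step until halted; returns the top of the stack, or 0 if it is empty. */
static int ex_vm_run(struct ex_vm *vm) {
    while (!vm->halted) ex_vm_step(vm);
    return vm->sptr > 0 ? vm->stack[vm->sptr - 1] : 0;
}
```

Keeping `step` separate from `run` means a caller can also drive the machine one instruction at a time, for example from a debugger or a test.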