- # Lang2 Architecture
-
- Lang2 is a programming language. Its implementation is split into two major parts:
- the compiler and the virtual machine. There are also shared components.
-
- ## I/O
-
- The lang2 implementation uses a simple but effective I/O model defined in
- [include/lang2/io.h](https://git.mort.coffee/mort/lang2/src/branch/master/include/lang2/io.h).
- There are two basic "interface" types: the reader `l2_io_reader`
- and the writer `l2_io_writer`.
-
- A reader is something which data can be read from. A reader must implement
- the `size_t read(struct l2_io_reader *self, void *buf, size_t len)` function,
- which fills `buf` with up to `len` bytes and returns the number of bytes read into `buf`.
- The reader should block until something is read into `buf`. A return value of
- 0 should be interpreted as EOF.
-
- A writer is something which data can be written to. A writer must implement
- the `void write(struct l2_io_writer *self, void *buf, size_t len)` function,
- which writes `len` bytes from `buf`. The writer should block until
- all bytes are written.
-
- For example writers, look at `l2_io_mem_writer` (with its `write` function
- `l2_io_mem_write`) and `l2_io_file_writer` (with its `write` function `l2_io_file_write`).
- For example readers, look at `l2_io_mem_reader` (with its `read` function
- `l2_io_mem_read`) and `l2_io_file_reader` (with its `read` function `l2_io_file_read`).
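
The reader and writer interfaces described above can be sketched as structs that carry a function pointer. This is only an illustration and not the contents of `io.h`: the single-function-pointer layout is an assumption, and every `example_*` name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Interface types; the one-function-pointer layout is an assumption. */
struct l2_io_reader {
    size_t (*read)(struct l2_io_reader *self, void *buf, size_t len);
};

struct l2_io_writer {
    void (*write)(struct l2_io_writer *self, void *buf, size_t len);
};

/* Hypothetical reader that serves bytes from a fixed memory region. */
struct example_mem_reader {
    struct l2_io_reader r; /* first member, so the cast in read is valid */
    const char *data;
    size_t len;
    size_t pos;
};

size_t example_mem_read(struct l2_io_reader *self, void *buf, size_t len) {
    struct example_mem_reader *mr = (struct example_mem_reader *)self;
    size_t remaining = mr->len - mr->pos;
    if (len > remaining) len = remaining;
    memcpy(buf, mr->data + mr->pos, len);
    mr->pos += len;
    return len; /* 0 once the data is exhausted, which signals EOF */
}

/* Hypothetical writer that appends into a fixed-size buffer. */
struct example_mem_writer {
    struct l2_io_writer w;
    char data[64];
    size_t len;
};

void example_mem_write(struct l2_io_writer *self, void *buf, size_t len) {
    struct example_mem_writer *mw = (struct example_mem_writer *)self;
    size_t space = sizeof(mw->data) - mw->len;
    if (len > space) len = space; /* this sketch truncates instead of growing */
    memcpy(mw->data + mw->len, buf, len);
    mw->len += len;
}

/* Copy from a reader to a writer until EOF; returns the bytes copied. */
size_t example_copy(struct l2_io_reader *r, struct l2_io_writer *w) {
    char buf[4];
    size_t total = 0, n;
    while ((n = r->read(r, buf, sizeof(buf))) > 0) {
        w->write(w, buf, n);
        total += n;
    }
    return total;
}
```

Code such as `example_copy` only sees the two interface structs, so it works with any pair of reader and writer.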
-
- Readers and writers aren't necessarily meant to be fast. Instead, if you're doing
- a lot of reading or writing (especially short reads and writes), you should be
- using the types `l2_bufio_reader` and `l2_bufio_writer`. These implement buffered
- reading/writing, only calling the underlying `read` function when the buffer
- runs empty and the underlying `write` function when the buffer gets full.
- The `l2_bufio_reader` can also peek some number of bytes (less than
- `L2_IO_BUFSIZ`) ahead.
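
To illustrate the idea of buffered reading with peeking, here is a minimal sketch. It is not the `l2_bufio_reader` implementation: all `example_*` names are hypothetical, and `EXAMPLE_BUFSIZ` is a small stand-in for `L2_IO_BUFSIZ`, whose real value is not stated here.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_BUFSIZ 8 /* stand-in for L2_IO_BUFSIZ */

struct example_reader {
    size_t (*read)(struct example_reader *self, void *buf, size_t len);
};

/* Hypothetical underlying reader over a string; counts calls to read. */
struct example_str_reader {
    struct example_reader r;
    const char *data;
    size_t len, pos;
    int calls;
};

size_t example_str_read(struct example_reader *self, void *buf, size_t len) {
    struct example_str_reader *sr = (struct example_str_reader *)self;
    size_t remaining = sr->len - sr->pos;
    if (len > remaining) len = remaining;
    memcpy(buf, sr->data + sr->pos, len);
    sr->pos += len;
    sr->calls += 1;
    return len;
}

struct example_bufio_reader {
    struct example_reader *r; /* underlying unbuffered reader */
    char buf[EXAMPLE_BUFSIZ];
    size_t len; /* number of valid bytes in buf */
    size_t idx; /* index of the next unread byte */
};

/* Move unread bytes to the front, then read until `want` bytes or EOF. */
void example_bufio_fill(struct example_bufio_reader *b, size_t want) {
    memmove(b->buf, b->buf + b->idx, b->len - b->idx);
    b->len -= b->idx;
    b->idx = 0;
    while (b->len < want) {
        size_t n = b->r->read(b->r, b->buf + b->len, EXAMPLE_BUFSIZ - b->len);
        if (n == 0) break; /* EOF */
        b->len += n;
    }
}

/* Look `count` bytes ahead (1 is the next byte) without consuming; -1 on EOF. */
int example_bufio_peek(struct example_bufio_reader *b, size_t count) {
    if (count == 0 || count > EXAMPLE_BUFSIZ) return -1;
    if (b->idx + count > b->len) example_bufio_fill(b, count);
    if (b->idx + count > b->len) return -1;
    return (unsigned char)b->buf[b->idx + count - 1];
}

/* Consume and return the next byte, or -1 on EOF. */
int example_bufio_get(struct example_bufio_reader *b) {
    int ch = example_bufio_peek(b, 1);
    if (ch >= 0) b->idx += 1;
    return ch;
}
```

In this sketch, many single-byte `example_bufio_get` calls turn into one underlying `read` per buffer of data, which is the point of the buffered types.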
-
- ## Atoms
-
- In lang2, every identifier is an "atom" (name borrowed from Erlang). An atom is just
- a number which represents the name. Every identifier gets its own ID, and every
- occurrence of a particular identifier name will be associated with the same ID.
- You can think of it like a hash, except that there are no collisions.
-
- Every namespace (be that the namespace of local variables in a function, the
- namespace of global functions, or a namespace variable) is just a map from
- an integer ID (the atom) to a variable reference.
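
A namespace as a map from atom ID to variable reference could look roughly like the sketch below. The fixed-capacity pair array, the use of plain `unsigned` values for references, and all `example_*` names are assumptions made for illustration; the actual data structure may well differ.

```c
#include <stddef.h>

#define EXAMPLE_NS_CAP 16

/* Hypothetical namespace: parallel arrays of atom IDs and references. */
struct example_ns {
    unsigned atoms[EXAMPLE_NS_CAP];
    unsigned refs[EXAMPLE_NS_CAP];
    size_t count;
};

/* Bind `atom` to `ref`, replacing any old binding; -1 if the table is full. */
int example_ns_set(struct example_ns *ns, unsigned atom, unsigned ref) {
    for (size_t i = 0; i < ns->count; i++) {
        if (ns->atoms[i] == atom) {
            ns->refs[i] = ref;
            return 0;
        }
    }
    if (ns->count == EXAMPLE_NS_CAP) return -1;
    ns->atoms[ns->count] = atom;
    ns->refs[ns->count] = ref;
    ns->count += 1;
    return 0;
}

/* Look up `atom`; returns 1 and stores the reference in *ref if bound. */
int example_ns_get(const struct example_ns *ns, unsigned atom, unsigned *ref) {
    for (size_t i = 0; i < ns->count; i++) {
        if (ns->atoms[i] == atom) {
            *ref = ns->refs[i];
            return 1;
        }
    }
    return 0;
}
```

Because keys are integers, a lookup never compares strings at runtime.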
-
- Atom literals such as `'foo` will evaluate to an atom variable which has the
- same numeric ID as the identifier `foo`. This makes it possible to pass
- names around at runtime.
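
The "same name, same ID" property can be sketched with a small intern table. This is an assumption-laden illustration, not lang2's code: the linear search, the fixed capacity, and the `example_*` names are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_MAX_ATOMS 32

/* Hypothetical intern table: a name at index i has atom ID i + 1. */
struct example_atoms {
    const char *names[EXAMPLE_MAX_ATOMS];
    size_t count;
};

/* Return the ID for `name`, assigning a fresh one on first sight; 0 if full. */
unsigned example_atom_intern(struct example_atoms *atoms, const char *name) {
    for (size_t i = 0; i < atoms->count; i++) {
        if (strcmp(atoms->names[i], name) == 0) return (unsigned)(i + 1);
    }
    if (atoms->count == EXAMPLE_MAX_ATOMS) return 0;
    atoms->names[atoms->count++] = name; /* caller keeps the string alive */
    return (unsigned)atoms->count;
}
```

Under this scheme, the identifier `foo` and the atom literal `'foo` would both go through the same table and therefore receive the same number.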
-
- ## Compiler
-
- The compiler consists of the lexer, the parser and the code generator.
-
- The lexer reads from a `reader` and produces tokens as it reads.
- A token carries information about what kind of token it is (identifier, number,
- open paren, etc). In addition, the token might carry extra information;
- number tokens contain a number, dot-number tokens contain an integer,
- and identifiers and strings contain a string.
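
A token with an optional payload is commonly modelled as a tagged union. The sketch below shows that shape under assumptions: the `example_*` names are hypothetical, only a few token kinds are listed, and representing number tokens as `double` is a guess.

```c
/* Hypothetical subset of token kinds. */
enum example_token_kind {
    EXAMPLE_TOK_OPEN_PAREN,
    EXAMPLE_TOK_NUMBER,
    EXAMPLE_TOK_DOT_NUMBER,
    EXAMPLE_TOK_IDENT,
    EXAMPLE_TOK_STRING
};

struct example_token {
    enum example_token_kind kind;
    union {
        double num;      /* valid for NUMBER (type is an assumption) */
        int integer;     /* valid for DOT_NUMBER */
        const char *str; /* valid for IDENT and STRING */
    } v;
};

/* Build a number token. */
struct example_token example_token_number(double num) {
    struct example_token tok;
    tok.kind = EXAMPLE_TOK_NUMBER;
    tok.v.num = num;
    return tok;
}

/* Build an identifier token; the token borrows `name`. */
struct example_token example_token_ident(const char *name) {
    struct example_token tok;
    tok.kind = EXAMPLE_TOK_IDENT;
    tok.v.str = name;
    return tok;
}

/* True if the token's payload is a string. */
int example_token_has_string(const struct example_token *tok) {
    return tok->kind == EXAMPLE_TOK_IDENT || tok->kind == EXAMPLE_TOK_STRING;
}
```

Consumers check `kind` first and only then read the matching union member.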
-
- The parser reads tokens from the lexer and uses the code generator to generate
- bytecode. Unlike many language implementations, the parser doesn't produce a syntax tree;
- it just emits bytecode as it goes. This is possible thanks to careful syntax
- design combined with careful bytecode design.
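
To show what emitting bytecode without a syntax tree can look like, here is a sketch for a toy grammar of single digits joined by `+`. The grammar, the stack-style opcodes, and all `example_*` names are invented for this illustration and are not lang2's syntax or instruction set.

```c
#include <stddef.h>

/* Hypothetical opcodes for a stack machine. */
enum { EXAMPLE_OP_PUSH = 1, EXAMPLE_OP_ADD = 2 };

struct example_gen {
    unsigned char code[64];
    size_t len;
};

void example_emit(struct example_gen *gen, unsigned char byte) {
    if (gen->len < sizeof(gen->code)) gen->code[gen->len++] = byte;
}

/* Parse "digit ('+' digit)*", emitting code as soon as each piece is seen.
 * Returns 0 on success and -1 on a syntax error. */
int example_parse_sum(const char *src, struct example_gen *gen) {
    if (*src < '0' || *src > '9') return -1;
    example_emit(gen, EXAMPLE_OP_PUSH);
    example_emit(gen, (unsigned char)(*src++ - '0'));
    while (*src == '+') {
        src++;
        if (*src < '0' || *src > '9') return -1;
        example_emit(gen, EXAMPLE_OP_PUSH);
        example_emit(gen, (unsigned char)(*src++ - '0'));
        /* Both operands are already on the stack, so ADD can be emitted now. */
        example_emit(gen, EXAMPLE_OP_ADD);
    }
    return *src == '\0' ? 0 : -1;
}
```

The parser never stores the expression: each operand is emitted when it is read, and the operator follows once its right-hand side has been emitted.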
-
- The code generator contains, for the most part, thin wrappers around bytecode
- instructions. For example, the `l2_gen_halt` function just contains one line
- to write an `L2_OP_HALT` instruction. However, it also has the responsibility
- of keeping track of the atoms (so that the name `foo` always gets the same ID
- everywhere), and of keeping track of the string literals (so that multiple
- string literals with the same content only show up once in the bytecode).
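
The two roles of the code generator, thin instruction wrappers and string deduplication, can be sketched as follows. This is a hypothetical model: the opcode values, the one-byte string index, and the `example_*` names are assumptions, and the real `l2_gen_halt` is only known here to write an `L2_OP_HALT` instruction.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes. */
enum { EXAMPLE_OP_HALT = 1, EXAMPLE_OP_PUSH_STRING = 2 };

#define EXAMPLE_MAX_STRINGS 16

struct example_gen {
    unsigned char code[64];
    size_t code_len;
    const char *strings[EXAMPLE_MAX_STRINGS]; /* literals seen so far */
    size_t string_count;
};

void example_gen_byte(struct example_gen *gen, unsigned char byte) {
    if (gen->code_len < sizeof(gen->code)) gen->code[gen->code_len++] = byte;
}

/* Thin wrapper: one call, one instruction. */
void example_gen_halt(struct example_gen *gen) {
    example_gen_byte(gen, EXAMPLE_OP_HALT);
}

/* Emit a string push. Literals with equal content share one table slot.
 * Returns the slot index, or -1 if the table is full. */
int example_gen_string(struct example_gen *gen, const char *str) {
    size_t idx;
    for (idx = 0; idx < gen->string_count; idx++) {
        if (strcmp(gen->strings[idx], str) == 0) break;
    }
    if (idx == gen->string_count) {
        if (idx == EXAMPLE_MAX_STRINGS) return -1;
        gen->strings[gen->string_count++] = str; /* caller keeps it alive */
    }
    example_gen_byte(gen, EXAMPLE_OP_PUSH_STRING);
    example_gen_byte(gen, (unsigned char)idx);
    return (int)idx;
}
```

Emitting the same literal twice produces two push instructions that refer to the same table entry.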
-
- ## Virtual Machine
-
- The virtual machine executes bytecode.
- The central piece of the VM is the `l2_vm_step` function, which looks at the
- code at the instruction pointer and executes it with a giant switch statement.
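
A step function built around a switch statement might look like the sketch below. It is not `l2_vm_step`: the three opcodes, the integer stack, and the `example_*` names are assumptions chosen to keep the example small.

```c
#include <stddef.h>

/* Hypothetical opcodes. */
enum { EXAMPLE_OP_HALT = 0, EXAMPLE_OP_PUSH = 1, EXAMPLE_OP_ADD = 2 };

#define EXAMPLE_STACK_SIZE 16

struct example_vm {
    const unsigned char *code;
    size_t iptr; /* instruction pointer */
    int stack[EXAMPLE_STACK_SIZE];
    size_t sp;   /* number of values on the stack */
    int halted;
};

/* Decode the instruction at the instruction pointer and execute it. */
void example_vm_step(struct example_vm *vm) {
    unsigned char op = vm->code[vm->iptr++];
    switch (op) {
    case EXAMPLE_OP_PUSH:
        if (vm->sp == EXAMPLE_STACK_SIZE) { vm->halted = 1; break; }
        vm->stack[vm->sp++] = vm->code[vm->iptr++]; /* operand byte */
        break;
    case EXAMPLE_OP_ADD:
        if (vm->sp < 2) { vm->halted = 1; break; }
        vm->sp -= 1;
        vm->stack[vm->sp - 1] += vm->stack[vm->sp];
        break;
    case EXAMPLE_OP_HALT:
    default: /* unknown opcodes stop the machine in this sketch */
        vm->halted = 1;
        break;
    }
}

/* Step until halted; returns the top of the stack, or 0 if it is empty. */
int example_vm_run(struct example_vm *vm) {
    while (!vm->halted) example_vm_step(vm);
    return vm->sp > 0 ? vm->stack[vm->sp - 1] : 0;
}
```

The run loop stays trivial because all instruction semantics live in the switch inside the step function.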