
# Lang2 Architecture

Lang2 is a programming language. Its implementation is split into two major parts:
the compiler and the virtual machine. There are also shared components.

## I/O

The lang2 implementation uses a simple but effective I/O model defined in
[include/lang2/io.h](https://git.mort.coffee/mort/lang2/src/branch/master/include/lang2/io.h).
There are two basic "interface" types: the reader `l2_io_reader`
and the writer `l2_io_writer`.

A reader is something which data can be read from. A reader must implement
the `size_t read(struct l2_io_reader *self, void *buf, size_t len)` function,
which fills `buf` with up to `len` bytes and returns the number of bytes read.
The reader should block until something is read into `buf`. A return value of
0 should be interpreted as EOF.
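
As a rough illustration of the reader contract, here is a minimal sketch of a custom reader. The struct layout (a `read` function pointer as the only member) is an assumption made for this example, and `str_reader`, `str_read` and `count_bytes` are hypothetical names; the real definitions live in `include/lang2/io.h` and may look different.

```c
#include <stddef.h>
#include <string.h>

/* Assumed shape of the interface: a struct holding a `read` function pointer.
 * The real definition in include/lang2/io.h may differ. */
struct l2_io_reader {
    size_t (*read)(struct l2_io_reader *self, void *buf, size_t len);
};

/* Hypothetical reader which yields the bytes of a C string. */
struct str_reader {
    struct l2_io_reader r; /* first member, so `self` can be cast back */
    const char *str;
    size_t pos;
};

static size_t str_read(struct l2_io_reader *self, void *buf, size_t len) {
    struct str_reader *sr = (struct str_reader *)self;
    size_t remaining = strlen(sr->str) - sr->pos;
    if (len > remaining) {
        len = remaining;
    }
    memcpy(buf, sr->str + sr->pos, len);
    sr->pos += len;
    return len; /* 0 once the string is exhausted, which signals EOF */
}

/* Drain a reader until EOF, returning the total number of bytes seen. */
static size_t count_bytes(struct l2_io_reader *r) {
    char buf[4];
    size_t total = 0, n;
    while ((n = r->read(r, buf, sizeof(buf))) > 0) {
        total += n;
    }
    return total;
}
```

Note that the caller loops until `read` returns 0, since a single call is allowed to return fewer than `len` bytes.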

A writer is something which data can be written to. A writer must implement
the `void write(struct l2_io_writer *self, void *buf, size_t len)` function,
which writes `len` bytes from `buf`. The writer should block until
all bytes are written.
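
A writer can be sketched the same way. Again, the struct layout is an assumption for the sake of the example, and `fixed_writer` and `fixed_write` are hypothetical names which don't exist in lang2.

```c
#include <stddef.h>
#include <string.h>

/* Assumed shape of the interface; see include/lang2/io.h for the real one. */
struct l2_io_writer {
    void (*write)(struct l2_io_writer *self, void *buf, size_t len);
};

/* Hypothetical writer which appends to a fixed-size array. */
struct fixed_writer {
    struct l2_io_writer w; /* first member, so `self` can be cast back */
    char data[64];
    size_t len;
};

static void fixed_write(struct l2_io_writer *self, void *buf, size_t len) {
    struct fixed_writer *fw = (struct fixed_writer *)self;
    size_t space = sizeof(fw->data) - fw->len;
    if (len > space) {
        /* The signature returns void, so this sketch silently truncates
         * instead of reporting an error. */
        len = space;
    }
    memcpy(fw->data + fw->len, buf, len);
    fw->len += len;
}
```

Because `write` returns `void`, a writer has no way to report a short write through this interface; that is why the contract says it should block until everything is written.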

For example writers, look at `l2_io_mem_writer` (with its `write` function
`l2_io_mem_write`) and `l2_io_file_writer` (with its `write` function `l2_io_file_write`).
For example readers, look at `l2_io_mem_reader` (with its `read` function
`l2_io_mem_read`) and `l2_io_file_reader` (with its `read` function `l2_io_file_read`).

Readers and writers aren't necessarily meant to be fast. Instead, if you're doing
a lot of reading or writing (especially short reads and writes), you should be
using the types `l2_bufio_reader` and `l2_bufio_writer`. These implement buffered
reading/writing, only calling the underlying `read` or `write` function when
the buffer runs empty (for reads) or gets full (for writes). The `l2_bufio_reader`
also has the ability to peek some number of bytes (less than `L2_IO_BUFSIZ`) ahead.
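
To show why buffering helps, here is a minimal sketch of a buffered writer which only touches the underlying writer when its buffer is full. Everything here is hypothetical: `bufio_writer`, `bufio_put`, `bufio_flush` and `counting_writer` are names invented for the example, and `BUFSIZ_SKETCH` is a tiny stand-in for `L2_IO_BUFSIZ`. The real `l2_bufio_writer` may work differently.

```c
#include <stddef.h>

#define BUFSIZ_SKETCH 8 /* stand-in for L2_IO_BUFSIZ, kept tiny for illustration */

/* Assumed shape of the writer interface. */
struct l2_io_writer {
    void (*write)(struct l2_io_writer *self, void *buf, size_t len);
};

/* Hypothetical buffered writer wrapping an underlying writer. */
struct bufio_writer {
    struct l2_io_writer *w;
    char buf[BUFSIZ_SKETCH];
    size_t idx;
};

/* Hand everything buffered so far to the underlying writer. */
static void bufio_flush(struct bufio_writer *b) {
    if (b->idx > 0) {
        b->w->write(b->w, b->buf, b->idx);
        b->idx = 0;
    }
}

/* Buffer one byte, flushing first if the buffer is already full. */
static void bufio_put(struct bufio_writer *b, char ch) {
    if (b->idx == sizeof(b->buf)) {
        bufio_flush(b);
    }
    b->buf[b->idx++] = ch;
}

/* Underlying writer which just counts how often it is called. */
struct counting_writer {
    struct l2_io_writer w;
    size_t calls;
    size_t bytes;
};

static void counting_write(struct l2_io_writer *self, void *buf, size_t len) {
    struct counting_writer *cw = (struct counting_writer *)self;
    (void)buf;
    cw->calls += 1;
    cw->bytes += len;
}
```

With an 8-byte buffer, 20 single-byte puts result in only two calls to the underlying `write`, plus one more for the final explicit flush.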

## Atoms

In lang2, every identifier is an "atom" (name borrowed from Erlang). An atom is just
a number which represents the name. Every identifier gets its own ID, and every
occurrence of a particular identifier name will be associated with the same ID.
You can think of it like a hash, except that there are no collisions.
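
The idea of giving every name a stable, collision-free ID can be sketched as a small interning table. This is only an illustration: `atom_table` and `atom_intern` are hypothetical names, and the linear scan is chosen for brevity, not because lang2 does it this way.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ATOMS 64

/* Hypothetical atom table: the index of a name in `names` is its atom ID. */
struct atom_table {
    const char *names[MAX_ATOMS];
    size_t count;
};

/* Return the ID for `name`, assigning a fresh one the first time it's seen.
 * Returns (size_t)-1 if the table is full. */
static size_t atom_intern(struct atom_table *t, const char *name) {
    for (size_t i = 0; i < t->count; ++i) {
        if (strcmp(t->names[i], name) == 0) {
            return i; /* seen before: same name, same ID */
        }
    }
    if (t->count == MAX_ATOMS) {
        return (size_t)-1;
    }
    t->names[t->count] = name; /* the caller keeps `name` alive */
    return t->count++;
}
```

Since IDs are handed out sequentially instead of being computed from the name, two different names can never end up with the same ID.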

Every namespace (be that the namespace of local variables in a function, the
namespace of global functions, or a namespace variable) is just a map from
an integer ID (the atom) to a variable reference.
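
A namespace in this sense could be sketched as a tiny map keyed by atom ID. The types `atom_id` and `var_ref`, the fixed capacity, and the convention that reference 0 means "not found" are all assumptions made for this example.

```c
#include <stddef.h>

typedef unsigned int atom_id; /* hypothetical; the real atom type may differ */
typedef unsigned int var_ref; /* hypothetical variable handle, 0 meaning "none" */

#define NS_MAX 16

/* Hypothetical namespace: parallel arrays of atoms and variable references. */
struct ns {
    atom_id keys[NS_MAX];
    var_ref vals[NS_MAX];
    size_t len;
};

/* Look up an atom; returns 0 if the namespace has no such entry. */
static var_ref ns_get(const struct ns *ns, atom_id key) {
    for (size_t i = 0; i < ns->len; ++i) {
        if (ns->keys[i] == key) {
            return ns->vals[i];
        }
    }
    return 0;
}

/* Bind an atom to a variable reference, overwriting any existing binding.
 * Returns 0 on success, -1 if the namespace is full. */
static int ns_set(struct ns *ns, atom_id key, var_ref val) {
    for (size_t i = 0; i < ns->len; ++i) {
        if (ns->keys[i] == key) {
            ns->vals[i] = val;
            return 0;
        }
    }
    if (ns->len == NS_MAX) {
        return -1;
    }
    ns->keys[ns->len] = key;
    ns->vals[ns->len] = val;
    ns->len += 1;
    return 0;
}
```

Lookups compare integers only; no string comparison is needed once names have been turned into atoms.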

Atom literals such as `'foo` will evaluate to an atom variable which has the
same numeric ID as the identifier `foo`. This makes it possible to pass
names around at runtime.

## Compiler

The compiler consists of the lexer, the parser and the code generator.

The lexer reads from a `reader` and produces tokens as it reads.
A token carries information about what kind of token it is (identifier, number,
open paren, etc). In addition, the token might carry extra information;
number tokens contain a number, dot-number tokens contain an integer,
and identifiers and strings contain a string.
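
A token like that is commonly represented as a tagged union. The sketch below is a guess at the general shape, not lang2's actual token type; the enum values, field names and the choice of `double` for numbers are all assumptions.

```c
#include <stddef.h>

/* Hypothetical token kinds; the real lexer has its own list. */
enum tok_kind {
    TOK_IDENT,
    TOK_NUMBER,
    TOK_DOT_NUMBER,
    TOK_STRING,
    TOK_OPEN_PAREN,
    TOK_CLOSE_PAREN,
    TOK_EOF
};

/* The kind says which member of the union (if any) is meaningful. */
struct token {
    enum tok_kind kind;
    union {
        double num;      /* TOK_NUMBER */
        int integer;     /* TOK_DOT_NUMBER */
        const char *str; /* TOK_IDENT and TOK_STRING */
    } v;
};

/* Whether the token carries a string payload. */
static int token_has_string(const struct token *t) {
    return t->kind == TOK_IDENT || t->kind == TOK_STRING;
}

/* Human-readable name of a token kind, handy for error messages. */
static const char *token_kind_name(enum tok_kind kind) {
    switch (kind) {
    case TOK_IDENT: return "ident";
    case TOK_NUMBER: return "number";
    case TOK_DOT_NUMBER: return "dot-number";
    case TOK_STRING: return "string";
    case TOK_OPEN_PAREN: return "open-paren";
    case TOK_CLOSE_PAREN: return "close-paren";
    case TOK_EOF: return "eof";
    }
    return "unknown";
}
```

Tokens such as the open paren carry no payload at all, so only `kind` is set for them.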

The parser reads tokens from the lexer and uses the code generator to generate
bytecode. Unlike the parsers of many other language implementations, it doesn't
produce a syntax tree; it just emits bytecode as it goes. This is possible thanks
to careful syntax design combined with careful bytecode design.
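
To make "emits bytecode as it goes" concrete, here is a minimal sketch of a single-pass parser for a toy grammar (single digits, `+` and `*`). The grammar, the opcodes and every name in it are made up for illustration and have nothing to do with lang2's real syntax or instruction set. The point is only that each operator's instruction is emitted right after its operands have been parsed, so no tree is ever built.

```c
#include <stddef.h>

/* Hypothetical stack-machine opcodes for the toy grammar. */
enum { OP_PUSH, OP_ADD, OP_MUL };

struct parser {
    const char *src;        /* remaining input */
    unsigned char code[64]; /* bytecode emitted so far */
    size_t len;
};

static void emit(struct parser *p, unsigned char byte) {
    if (p->len < sizeof(p->code)) {
        p->code[p->len++] = byte;
    }
}

static void skip_spaces(struct parser *p) {
    while (*p->src == ' ') {
        p->src++;
    }
}

/* factor = digit */
static void parse_factor(struct parser *p) {
    skip_spaces(p);
    if (*p->src >= '0' && *p->src <= '9') {
        emit(p, OP_PUSH);
        emit(p, (unsigned char)(*p->src - '0'));
        p->src++;
    }
    skip_spaces(p);
}

/* term = factor ('*' factor)* */
static void parse_term(struct parser *p) {
    parse_factor(p);
    while (*p->src == '*') {
        p->src++;
        parse_factor(p);
        emit(p, OP_MUL); /* both operands are already on the stack */
    }
}

/* expr = term ('+' term)* */
static void parse_expr(struct parser *p) {
    parse_term(p);
    while (*p->src == '+') {
        p->src++;
        parse_term(p);
        emit(p, OP_ADD);
    }
}
```

Parsing `1 + 2 * 3` with this sketch emits `PUSH 1, PUSH 2, PUSH 3, MUL, ADD`: operator precedence falls out of the call structure, and the stack-based bytecode is what lets the operands be emitted before the operator is known to the VM.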

The code generator contains, for the most part, thin wrappers around bytecode
instructions. For example, the `l2_gen_halt` function just contains one line
to write an `L2_OP_HALT` instruction. However, it also has the responsibility
of keeping track of the atoms (so that the name `foo` always gets the same ID
everywhere), and of keeping track of the string literals (so that multiple
string literals with the same content only show up once in the bytecode).
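
Both responsibilities can be sketched in a few lines. The names below (`gen`, `gen_halt`, `gen_push_string`, the opcode values, the separate string table) are hypothetical and only loosely modelled on the description above; the real generator's data layout and function signatures may differ.

```c
#include <stddef.h>
#include <string.h>

enum { OP_HALT = 0, OP_PUSH_STRING = 1 }; /* hypothetical opcode values */

/* Hypothetical generator state: emitted code plus a table of string literals. */
struct gen {
    unsigned char code[64];
    size_t code_len;
    const char *strings[16];
    size_t string_count;
};

static void gen_byte(struct gen *g, unsigned char b) {
    if (g->code_len < sizeof(g->code)) {
        g->code[g->code_len++] = b;
    }
}

/* Thin wrapper: one instruction, one line. */
static void gen_halt(struct gen *g) {
    gen_byte(g, OP_HALT);
}

/* Index of `s` in the string table, adding it only if it isn't there yet.
 * Returns (size_t)-1 if the table is full. */
static size_t gen_string_index(struct gen *g, const char *s) {
    for (size_t i = 0; i < g->string_count; ++i) {
        if (strcmp(g->strings[i], s) == 0) {
            return i;
        }
    }
    if (g->string_count == sizeof(g->strings) / sizeof(g->strings[0])) {
        return (size_t)-1;
    }
    g->strings[g->string_count] = s;
    return g->string_count++;
}

/* Emit an instruction which refers to a (deduplicated) string literal.
 * Returns 0 on success, -1 if the string table is full. */
static int gen_push_string(struct gen *g, const char *s) {
    size_t idx = gen_string_index(g, s);
    if (idx == (size_t)-1) {
        return -1;
    }
    gen_byte(g, OP_PUSH_STRING);
    gen_byte(g, (unsigned char)idx);
    return 0;
}
```

Emitting the literal `"hi"` twice produces two instructions which refer to the same table entry, so the string's content is stored once.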

## Virtual Machine

The virtual machine executes bytecode.
The central piece of the VM is the `l2_vm_step` function, which looks at the
code at the instruction pointer and executes it with a giant switch statement.
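
The "fetch an instruction, dispatch on it with a switch" structure looks roughly like this. The sketch is a toy stack machine with made-up opcodes and hypothetical names (`vm`, `vm_step`, `vm_run`), not lang2's actual VM, and it skips all bounds checking for brevity.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_HALT, OP_PUSH, OP_ADD, OP_MUL };

struct vm {
    const unsigned char *code;
    size_t iptr; /* instruction pointer */
    int stack[32];
    size_t sptr; /* number of values on the stack */
    int halted;
};

/* Execute exactly one instruction. No bounds checks: this is a sketch. */
static void vm_step(struct vm *vm) {
    unsigned char op = vm->code[vm->iptr++];
    switch (op) {
    case OP_PUSH:
        /* The operand is the byte following the opcode. */
        vm->stack[vm->sptr++] = vm->code[vm->iptr++];
        break;
    case OP_ADD:
        vm->sptr -= 1;
        vm->stack[vm->sptr - 1] += vm->stack[vm->sptr];
        break;
    case OP_MUL:
        vm->sptr -= 1;
        vm->stack[vm->sptr - 1] *= vm->stack[vm->sptr];
        break;
    case OP_HALT:
    default:
        vm->halted = 1;
        break;
    }
}

/* Step until halted, then return the top of the stack (or 0 if it's empty). */
static int vm_run(struct vm *vm) {
    while (!vm->halted) {
        vm_step(vm);
    }
    return vm->sptr > 0 ? vm->stack[vm->sptr - 1] : 0;
}
```

Keeping all of the dispatch in a single step function means the run loop stays trivial, and a caller can also execute one instruction at a time.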