6.16 The Text Interpreter

The text interpreter(27) is an endless loop that processes input from the current input device. It is also called the outer interpreter, in contrast to the inner interpreter (see Engine) which executes the compiled Forth code on interpretive implementations.

The text interpreter operates in one of two states: interpret state and compile state. The current state is defined by the aptly-named variable state.
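As a small illustration, a definition can read state to find out which mode the text interpreter is in. This is only a sketch: the names compiling? and show-state are hypothetical, and show-state is made immediate so that it runs even while another definition is being compiled.

```forth
\ Hypothetical helpers, not part of Gforth.
: compiling? ( -- f )   \ true if the text interpreter is in compile state
  state @ 0<> ;

: show-state ( -- )     \ report the current state
  compiling? if ." compiling" else ." interpreting" then ; immediate

show-state              \ prints: interpreting
: demo show-state ;     \ prints: compiling (while demo is being defined)
```

Note that demo itself contains nothing: show-state ran during compilation instead of being compiled into it.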

This section starts by describing how the text interpreter behaves when it is in interpret state, processing input from the user input device – the keyboard. This is the mode that a Forth system is in after it starts up.

The text interpreter works from an area of memory called the input buffer(28), which stores your keyboard input when you press the RET key. Starting at the beginning of the input buffer, it skips leading spaces (called delimiters) then parses a string (a sequence of non-space characters) until it reaches either a space character or the end of the buffer. Having parsed a string, it makes two attempts to process it:

If both attempts fail, the text interpreter discards the remainder of the input buffer, issues an error message and waits for more input. If one of the attempts succeeds, the text interpreter repeats the parsing process until the whole of the input buffer has been processed, at which point it prints the status message “ ok” and waits for more input.
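The parse-then-try-twice behaviour can be imitated from within Forth. The sketch below assumes the conventional order of the two attempts (dictionary lookup first, then number conversion) and relies on parse-name together with the Gforth words find-name and s>number?. The word token-kind is hypothetical; unlike the real text interpreter, it only classifies the next string and does not execute or push anything else.

```forth
\ Hypothetical word: classify the next string in the parse area.
\ Returns 1 for a dictionary word, 2 for a number, 0 if both attempts fail.
: token-kind ( "name" -- n )
  parse-name 2dup find-name if   \ first attempt: dictionary lookup
    2drop 1
  else
    s>number? nip nip            \ second attempt: number conversion
    if 2 else 0 then
  then ;

token-kind dup .                 \ prints 1
token-kind 42 .                  \ prints 2
token-kind no-such-word-here .   \ prints 0
```

The third case corresponds to the situation in which the text interpreter would issue an error message and discard the rest of the input buffer.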

The text interpreter keeps track of its position in the input buffer by updating a variable called >IN (pronounced “to-in”). The value of >IN starts out as 0, indicating an offset of 0 from the start of the input buffer. The region from offset >IN @ to the end of the input buffer is called the parse area(29). This example shows how >IN changes as the text interpreter parses the input buffer:

: remaining ( -- )        \ print what is left of the parse area
  source >in @ /string    \ drop the part that has already been parsed
  cr ." ->" type ." <-" ; immediate

1 2 3 remaining + remaining . 

: foo 1 2 3 remaining swap remaining ;

The result is:

->+ remaining .<-
->.<-5  ok

->swap remaining ;<-
->;<-  ok

The value of >IN can also be modified by a word in the input buffer that is executed by the text interpreter. This means that a word can “trick” the text interpreter into either skipping a section of the input buffer(30) or into parsing a section twice. For example:

: lat ." <<foo>>" ;
: flat ." <<bar>>" >IN DUP @ 3 - SWAP ! ;

When flat is executed, this output is produced(31):

<<bar>><<foo>>

This technique can be used to work around some of the interoperability problems of parsing words. Of course, it’s better to avoid parsing words where possible.
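The flat example moves >IN backwards so that part of the buffer is parsed twice. The opposite case, skipping input, can be sketched in the same way: the hypothetical word skip-line below sets >IN to the length of the input buffer, which leaves the parse area empty, so the text interpreter never sees the rest of the line.

```forth
\ Hypothetical word: make the text interpreter ignore the rest of the line.
: skip-line ( -- )
  source nip    \ length of the input buffer
  >in ! ;       \ parse area is now empty

1 2 + skip-line 99 + this text is never interpreted
.               \ prints 3
```

The 99 + and the words after it are neither executed nor reported as errors, because they are skipped before the text interpreter parses them.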

Two important notes about the behaviour of the text interpreter:

When the text interpreter is in compile state, its behaviour changes in these ways:

When the text interpreter is using an input device other than the keyboard, its behaviour changes in these ways:

You can read about this in more detail in Input Sources.

>in ( – addr  ) core “to-in”

uvar variable – a-addr is the address of a cell containing the char offset from the start of the input buffer to the start of the parse area.

source ( – addr u  ) core “source”

Return address addr and length u of the current input buffer.

tib ( – addr  ) core-ext-obsolescent “t-i-b”
#tib ( – addr  ) core-ext-obsolescent “number-t-i-b”

uvar variable – a-addr is the address of a cell containing the number of characters in the terminal input buffer. OBSOLESCENT: source supersedes the function of this word.

interpret ( ... – ...  ) gforth-0.2 “interpret”

Footnotes

(27)

This is an expanded version of the material in Introducing the Text Interpreter.

(28)

When the text interpreter is processing input from the keyboard, this area of memory is called the terminal input buffer (TIB) and is addressed by the (obsolescent) words TIB and #TIB.

(29)

In other words, the text interpreter processes the contents of the input buffer by parsing strings from the parse area until the parse area is empty.

(30)

This is how parsing words work.

(31)

Exercise for the reader: what would happen if the 3 were replaced with 4?