6.17.5.3 Defining recognizers

A recognizer is a Forth word with the stack effect ( c-addr u -- translation ). c-addr u describes the string to be recognized. The returned translation is an abstract but transparent data type: on the top of the stack there is a single-cell translation token. If the recognizer does not recognize the string, it returns the translation token translate-none. If it does recognize the string, it returns a translation with a different translation token. The translation token also identifies how many other stack items the translation contains and how the translation will be processed later.

E.g., when you perform

"5" rec-number

it pushes 5 translate-cell on the stack, which is a translation with the translation token translate-cell.
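As a minimal sketch of what can be done with such a translation: assuming the translation-action word interpreting ( ... translation -- ... ) is available (it belongs to Performing translation actions and is not defined in this section), passing the translation to it performs the interpreting run-time, which for translate-cell pushes the cell.

```forth
\ Sketch only; assumes a Gforth development snapshot that provides
\ rec-number and the translation-action word interpreting.
"5" rec-number     \ stack: 5 translate-cell
interpreting       \ consumes the translation; the interpreting run-time leaves 5
.                  \ print the recognized number
```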

You typically write a recognizer as an ordinary colon definition that examines the string in some way, and then pushes the appropriate translation. E.g., a simple variant of rec-tick can be implemented as follows:

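: rec-tick ( addr u -- translation )
    2dup "`" string-prefix? if          \ does the string start with a backquote?
        1 /string find-name dup if      \ strip the prefix, look up the rest
            name>interpret translate-cell exit then \ found: xt translate-cell
        drop translate-none exit then   \ prefix present, but no such word
    rec-none ;                          \ no prefix: not recognized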

The only appropriate use of a translation is to pass it to one of the words for performing translation actions (see Performing translation actions).

A number of translation tokens already exist in Gforth and can be used in a recognizer you write. If none of them is appropriate for your recognizer, read the next section about defining your own translation tokens.

The system-defined translation token words are documented as removing some stack items and pushing a complete translation on the stack, e.g., translate-cell ( x -- translation ). This makes the documentation uniform and avoids cumbersome descriptions. In the current implementation, however, a translation token word just pushes a cell-sized translation token on the stack (for translate-cell: ( -- translate-cell )); together with the stack items already present (for translate-cell: ( x -- x translate-cell )), this forms a translation (for translate-cell: ( x -- translation )).

The text interpreter passes the output of the recognizer to a translation action (see Performing translation actions). Every translation action removes the translation from the stack, then may perform additional parsing, and finally performs the interpreting run-time of the translation token, or the compiling run-time, or the postponing run-time.

For each system-defined translation token we specify the interpreting run-time explicitly. Unless otherwise specified, the compiling run-time compiles the interpreting run-time. Unless otherwise specified, the postponing run-time compiles the compiling run-time.

In the rec-tick example above, if the recognizer recognizes, say, `dup, it returns xt-dup translate-cell. If the text interpreter then performs the compiling action, that action first removes this translation (these two cells) from the stack, and then compiles code that pushes xt-dup.
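The difference between the interpreting and the compiling run-time can be sketched with translate-cell. This assumes that the translation-action words are named interpreting and compiling, each with the stack effect ( ... translation -- ... ), as described in Performing translation actions; the word answer is a hypothetical name used only for illustration.

```forth
\ Sketch only; interpreting and compiling are assumed translation-action
\ words, answer is a hypothetical example word.
42 translate-cell interpreting .      \ interpreting run-time: push 42, then print it

: answer ( -- n )
    [ 42 translate-cell compiling ] ; \ compiling run-time: compile code that pushes 42

answer .                              \ the compiled code pushes 42 when answer runs
```

In both cases the action removes the two cells of the translation; only the compiling action defers pushing the value to the run time of the definition being compiled.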

translate-name ( nt -- translation ) gforth-experimental

Interpreting run-time: ( ... -- ... )
Perform the interpretation semantics of nt.
Compiling run-time: ( ... -- ... )
Perform the compilation semantics of nt.

translate-cell ( x -- translation ) gforth-experimental

Interpreting run-time: ( -- x )

translate-dcell ( xd -- translation ) gforth-experimental

Interpreting run-time: ( -- xd )

translate-float ( r -- translation ) gforth-experimental

Interpreting run-time: ( -- r )

translate-complex ( r1 r2 -- translation ) gforth-experimental

Interpreting run-time: ( -- r1 r2 )

translate-string ( c-addr1 u1 -- translation ) gforth-experimental

Interpreting run-time: ( -- c-addr2 u2 )
c-addr2 u2 is the result of translating the \-escapes in c-addr1 u1.

scan-translate-string ( c-addr1 u1 ’ccc"’ -- translation ) gforth-experimental

Every translation action also parses until the first non-escaped ". The string c-addr1 u1 and the parsed input are concatenated, then the \-escapes are translated, giving c-addr2 u2.
Interpreting run-time: ( -- c-addr2 u2 )

translate-env ( c-addr1 u1 -- translation ) gforth-experimental

Interpreting run-time: ( -- c-addr2 u2 )
c-addr2 u2 is the content of the environment variable with name c-addr1 u1.

translate-to ( n xt -- translation ) gforth-experimental

xt belongs to a value-flavoured (or defer-flavoured) word, n is the index into the to-table: for xt (see Words with user-defined to etc.).
Interpreting run-time: ( ... -- ... )
Perform the to-action with index n in the to-table: of xt. Additional stack effects depend on n and xt.

One way to write a recognizer r is to call a recognizer (for the whole input of r or a substring) that recognizes more strings (e.g., rec-forth), and then look at the result to see if something was recognized that r actually deals with.

E.g., the actual implementation of rec-tick passes its input without the prefix ‘`’ to rec-forth and checks whether the resulting translation is nt translate-name; if so, it converts nt to xt and replaces translate-name with translate-cell. The benefit of this approach compared to our example implementation above is that, e.g., `environment:max-n works, where rec-scope recognizes environment:max-n.

The specific check for an nt used in rec-tick is rec-forth-nt?; it is implemented on top of the more general rec-filter.

rec-filter ( c-addr u xt: filter xt: rec -- translation ) gforth-experimental

Execute rec ( c-addr u -- translation1 ); translation1 is then examined with filter ( translation1 -- translation1 f ). If f is non-zero, translation is translation1, otherwise translation is translate-none.
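A minimal sketch of rec-filter in use: a recognizer that accepts only those strings for which rec-number produces a single-cell translation. The names cell-translation? and rec-cell-only are hypothetical. The filter compares translation tokens with =, which relies on the current implementation, where a translation token word such as translate-cell pushes a cell-sized token.

```forth
\ Sketch only; cell-translation? and rec-cell-only are hypothetical names.
: cell-translation? ( translation -- translation f )
    dup translate-cell = ;      \ inspect only the token on top, keep the translation

: rec-cell-only ( c-addr u -- translation )
    \ filter xt first, recognizer xt on top, as in the stack effect of rec-filter
    ['] cell-translation? ['] rec-number rec-filter ;
```

With these definitions, "5" rec-cell-only should return 5 translate-cell, while a string that rec-number does not recognize, or recognizes with a different translation token, yields translate-none.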

rec-forth-nt? ( c-addr u -- nt | 0 ) gforth-experimental “rec-forth-nt-question”

If rec-forth produces a result nt translate-name, return nt, otherwise 0.
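The approach described above for the actual rec-tick can be sketched on top of rec-forth-nt?. This is an illustration under the stack effects documented in this section, not the Gforth source; the name my-rec-tick is hypothetical and chosen to avoid redefining the system's rec-tick.

```forth
\ Sketch only; my-rec-tick is a hypothetical name.
: my-rec-tick ( c-addr u -- translation )
    2dup "`" string-prefix? 0= if
        rec-none exit then              \ no backquote prefix: not recognized
    1 /string rec-forth-nt? dup if      \ let rec-forth handle the rest of the string
        name>interpret translate-cell exit then \ nt found: return xt translate-cell
    drop translate-none ;               \ rec-forth did not produce nt translate-name
```

Because the lookup goes through rec-forth rather than find-name, any string that rec-forth turns into nt translate-name is accepted after the prefix, including scoped names such as environment:max-n.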