String representations (Gforth Manual)


6.9.1 String representations

Forth commonly represents a string as a cell pair c-addr u on the stack: u is the length of the string in bytes (aka chars), and c-addr is the address of the first byte of the string. Note that a code point may be represented by a sequence of several chars in the string (and a user-perceived character may consist of several code points). See String words.
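As a small illustration of the c-addr u representation, the following sketch defines a word that walks over the bytes of a string. The name count-spaces is made up for this example; bounds is a Gforth word, not a standard one.

```forth
\ count-spaces is a hypothetical example word, not part of Gforth.
\ It consumes a string given as c-addr u and returns the number of spaces.
: count-spaces ( c-addr u -- n )
  0 -rot                  \ running count goes below the string
  bounds ?do              \ loop index runs over the byte addresses
    i c@ bl = if 1+ then  \ fetch one byte, compare with a space
  loop ;

s" a b c" type cr         \ s" pushes c-addr u; type consumes it
s" a b c" nip .           \ the length u in bytes: 5
s" a b c" count-spaces .  \ 2
```

Because u counts bytes, a string containing non-ASCII code points in UTF-8 has a length larger than its number of code points.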

Another string representation is used by the string library, whose words contain $ in their names (see $tring words). These words take the address of a cell-sized string handle when the allocation of the string plays a role, e.g., when appending to the string; this corresponds to owned strings in Rust. When only the content of the string is of interest, these words use the c-addr u representation, too. Such a c-addr u pair is valid only until the underlying string is modified or freed; this corresponds to string slices in Rust.
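The following sketch shows how the two views of such a string might be used together. It assumes the words $!, $+!, $@, $@len and $free as described in $tring words (older Gforth versions use $off instead of $free); the variable name greeting is made up for this example.

```forth
variable greeting          \ cell-sized string handle; Gforth initializes it to 0 (empty)

s" Hello" greeting $!      \ store a copy of the string in allocated memory
s" , world" greeting $+!   \ append; takes the handle, because the buffer may be reallocated
greeting $@ type cr        \ $@ yields c-addr u for the content: Hello, world
greeting $@len .           \ 12
greeting $free             \ release the memory; earlier c-addr u pairs are now invalid
```

A c-addr u pair obtained with $@ before the $+! should not be used after it, since the append may have moved the string.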

Counted strings are a legacy string representation; a counted string is represented on the stack by c-addr. The char addressed by c-addr contains the character count n of the string, and the string occupies the subsequent n char addresses in memory. Because the count is stored in a single char, counted strings are limited to 255 bytes in length. While counted strings may look attractive because they need only one stack item, we recommend avoiding them because of their limitations, especially as input parameters of words. See Counted string words.
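If you have to deal with a counted string, you will usually want to convert it to c-addr u right away. The following sketch does this with the standard word count. The buffer name buf is made up for this example, and place is a Gforth word, not a standard one.

```forth
create buf 256 allot     \ one count byte plus up to 255 chars

s" hello" buf place      \ store the string at buf as a counted string
buf c@ .                 \ the count byte: 5
buf count type cr        \ count ( c-addr1 -- c-addr2 u ) yields the usual pair: hello
```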