Forth supports chars (aka bytes), used by words such as c@; these can be used to represent an ASCII character.
Forth also supports extended characters, which may be represented by a sequence of several bytes (i.e., several chars). A common character encoding is the UTF-8 representation of Unicode.
In general, most code does not have to worry about extended
characters: In the string representation it does not matter whether a
byte is a part of an extended character, or it is a character by
itself, and words that consume chars (like emit) also work when
the extended character is transferred as a sequence of chars. Forth
still provides words for dealing with extended characters
(see Xchars and Unicode).
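As a minimal sketch of this byte transparency, the hypothetical word type-bytes below prints a string one char at a time using c@ and emit. It assumes a UTF-8 source file, a terminal that decodes its input as UTF-8, and Gforth's bounds word; under those assumptions the multi-byte characters should display intact, just as they do with type.

```forth
\ Hypothetical helper: print a string char by char (byte by byte).
: type-bytes ( c-addr u -- )
  bounds ?do  i c@ emit  loop ;

s" Grüße" type-bytes cr   \ each byte of a UTF-8 sequence is emitted separately
s" Grüße" type cr         \ the usual way to print the same string
```

Neither word needs to know where one extended character ends and the next begins; the terminal reassembles the byte sequences.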
In Unicode terms, chars are code units, whereas extended characters are code points. Note that a Unicode abstract character can consist of a sequence of code points, but Forth (like other programming languages) has no data type for individual abstract characters; of course, they can be represented as strings.
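The difference between code units and code points can be made visible with a short sketch. The word xchars below is a hypothetical helper, not part of Gforth; it assumes the standard extended-character word +x/string is available and that the string is well-formed in the current encoding.

```forth
\ Hypothetical helper: count the extended characters (code points) in a string.
: xchars ( c-addr u -- n )
  0 >r                     \ keep the running count on the return stack
  begin  dup 0>  while
    +x/string              \ step over one extended character
    r> 1+ >r
  repeat
  2drop r> ;

s" aä€" nip .      \ number of chars (code units) in the string
s" aä€" xchars .   \ number of extended characters (code points)
```

With UTF-8 source text in which ä is stored as the single code point U+00E4, the first line should report 6 (1+2+3 bytes) and the second 3. If ä were instead stored as a followed by a combining diaeresis, xchars would count one more code point even though a reader still sees three abstract characters.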
You can use the usual integer words on chars and Xchars on the stack, but Gforth also has some words for dealing with chars on the stack:
toupper
( c1 – c2 ) gforth-0.2 “toupper”
If c1 is a lower-case ASCII character, c2 is the equivalent upper-case character; otherwise c2 is c1.
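The following sketch shows toupper next to ordinary integer arithmetic on a char. The word upcase! is a hypothetical helper introduced only for illustration; it relies on Gforth's bounds and save-mem, and on toupper behaving as described above.

```forth
char a toupper emit cr     \ a lower-case ASCII letter is converted
char 7 toupper emit cr     \ other chars pass through unchanged
char a 1+ emit cr          \ ordinary integer arithmetic works on chars

\ Hypothetical helper: upper-case the ASCII letters of a string in place.
: upcase! ( c-addr u -- )
  bounds ?do  i c@ toupper i c!  loop ;

s" hello, world" save-mem  \ writable copy on the heap (not freed in this sketch)
2dup upcase! type cr
```

The first three lines should print A, 7 and b, and the last one HELLO, WORLD. Because toupper leaves everything except lower-case ASCII letters alone, the bytes of multi-byte extended characters in a UTF-8 string pass through upcase! unchanged; non-ASCII letters are therefore not upper-cased.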