6.8.4 String words

Words for memory blocks are also useful for strings; for words that move, copy, compare, and search strings, see Memory Blocks. For words that display characters and strings, see Displaying characters and strings.

The following words work on previously existing strings:

compare ( c-addr1 u1 c-addr2 u2 – n ) string “compare”

Compare two strings lexicographically, based on the values of the bytes in the strings (i.e., case-sensitive and without locale-specific collation order). If they are equal, n is 0; if the string c-addr1 u1 is smaller, n is -1; if it is larger, n is 1.
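The three possible results of compare can be illustrated with a few interpreted lines. This is only a sketch; the sample strings are arbitrary.

```forth
\ compare returns -1, 0 or 1, based on byte values.
s" apple" s" apply" compare .   \ -1  ('e' is smaller than 'y')
s" apple" s" apple" compare .   \ 0   (identical strings)
s" apply" s" apple" compare .   \ 1   (first string is larger)
s" Apple" s" apple" compare .   \ -1  ('A' is 65, 'a' is 97)
s" app"   s" apple" compare .   \ -1  (a proper prefix is smaller)
```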

str= ( c-addr1 u1 c-addr2 u2 – f  ) gforth-0.6 “str-equals”

Bytewise equality: f is true if the two strings have the same length and the same contents.

str< ( c-addr1 u1 c-addr2 u2 – f  ) gforth-0.6 “str-less-than”

Bytewise lexicographic comparison: f is true if c-addr1 u1 is smaller than c-addr2 u2.

string-prefix? ( c-addr1 u1 c-addr2 u2 – f  ) gforth-0.6 “string-prefix-question”

Is c-addr2 u2 a prefix of c-addr1 u1?

string-suffix? ( c-addr1 u1 c-addr2 u2 – f  ) gforth-1.0 “string-suffix-question”

Is c-addr2 u2 a suffix of c-addr1 u1?
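As a sketch of how the flag-returning words above might be used, the following defines two small tests on file names and URLs. The names http-url? and fs-source? are hypothetical helpers, not Gforth words, and string-suffix? requires Gforth 1.0.

```forth
\ Hypothetical helpers; the string to test comes first,
\ the prefix or suffix second.
: http-url?  ( c-addr u -- f )  s" http://" string-prefix? ;
: fs-source? ( c-addr u -- f )  s" .fs"     string-suffix? ;

s" http://example.org" http-url?  .   \ -1 (true)
s" strings.fs"         fs-source? .   \ -1 (true)
s" strings.c"          fs-source? .   \ 0  (false)

s" abc" s" abc" str= .                \ -1 (true)
s" abc" s" abd" str< .                \ -1 (true)
```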

search ( c-addr1 u1 c-addr2 u2 – c-addr3 u3 flag  ) string “search”

Search the string specified by c-addr1, u1 for the string specified by c-addr2, u2. If flag is true: match was found at c-addr3 with u3 characters remaining. If flag is false: no match was found; c-addr3, u3 are equal to c-addr1, u1.
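A common use of search is to test whether a string contains another, or to find the offset of the match. The words contains? and index-of below are hypothetical helpers written for illustration; they are not part of Gforth.

```forth
\ f is true if c-addr2 u2 occurs in c-addr1 u1.
: contains? ( c-addr1 u1 c-addr2 u2 -- f )
    search nip nip ;

\ n is the offset of the first match, or -1 if there is none.
: index-of ( c-addr1 u1 c-addr2 u2 -- n )
    2>r over swap 2r> search   ( c-addr1 c-addr3 u3 flag )
    if    drop swap -          \ c-addr3 minus c-addr1
    else  2drop drop -1
    then ;

s" hello world" s" world" contains? .   \ -1 (true)
s" hello world" s" world" index-of .    \ 6
s" hello world" s" xyz"   index-of .    \ -1 (not found)
```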

scan ( c-addr1 u1 c – c-addr2 u2 ) gforth-0.2 “scan”

Skip all characters not equal to c. The result starts with c or is empty. Scan is limited to single-byte (ASCII) characters. Use search to search for multi-byte characters.

scan-back ( c-addr u1 c – c-addr u2  ) gforth-0.7 “scan-back”

The last occurrence of c in c-addr u1 is at c-addr+u2−1; if it does not occur, u2=0.
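Because scan-back keeps everything up to and including the last occurrence of c, it lends itself to splitting a path at its last slash. The following is a minimal sketch; dir-part and file-part are hypothetical names chosen for this example.

```forth
\ Everything up to and including the last '/'.
: dir-part  ( c-addr u -- c-addr u1 )
    [char] / scan-back ;

\ Everything after the last '/'; the whole string if there is none.
: file-part ( c-addr u -- c-addr1 u1 )
    2dup [char] / scan-back nip /string ;

s" lib/forth/strings.fs" dir-part  type cr   \ lib/forth/
s" lib/forth/strings.fs" file-part type cr   \ strings.fs
```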

skip ( c-addr1 u1 c – c-addr2 u2 ) gforth-0.2 “skip”

Skip all characters equal to c. The result starts with the first non-c character, or it is empty. Skip is limited to single-byte (ASCII) characters.
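Skip and scan are often combined: skip removes leading separators, and scan finds the end of the next token. The word first-word below is a hypothetical helper that sketches this idiom for blank-separated text.

```forth
\ Return the first blank-delimited word of a string.
: first-word ( c-addr u -- c-addr1 u1 )
    bl skip            \ drop leading blanks
    2dup bl scan       \ rest of the string, starting at the next blank
    nip - ;            \ length of the word = length minus rest

s"    hello world" first-word type cr   \ hello
```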

$split ( c-addr u char – c-addr u1 c-addr2 u2  ) gforth-0.7 “string-split”

Divides a string c-addr u into two, with char as separator. u1 is the length of the string up to, but excluding, the first occurrence of the separator; c-addr2 u2 is the part of the input string after the separator. If the separator does not occur in the string, u1=u, u2=0 and c-addr2=c-addr+u.

nosplit? ( addr1 u1 addr2 u2 –  addr1 u1 addr2 u2 flag  ) gforth-experimental “nosplit?”

Used on the result of $split, flag is true iff the separator does not occur in the input string of $split.
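As an illustration of $split, the following sketch takes apart a setting of the form key=value. The names setting-key and setting-value are hypothetical and exist only for this example.

```forth
\ Part before the first '='.
: setting-key   ( c-addr u -- c-addr1 u1 )
    [char] = $split 2drop ;

\ Part after the first '='; empty if there is no '='.
: setting-value ( c-addr u -- c-addr2 u2 )
    [char] = $split 2nip ;

s" width=80" setting-key   type cr   \ width
s" width=80" setting-value type cr   \ 80
```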

-trailing ( c-addr u1 – c-addr u2  ) string “dash-trailing”

Adjust the string specified by c-addr, u1 to remove all trailing spaces. u2 is the length of the modified string.

/string ( c-addr1 u1 n – c-addr2 u2 ) string “slash-string”

Adjust the string specified by c-addr1, u1 to remove n characters from the start of the string.

safe/string ( c-addr1 u1 n – c-addr2 u2 ) gforth-1.0 “safe-slash-string”

Adjust the string specified by c-addr1, u1 to remove n characters from the start of the string. Unlike /string, safe/string removes at least 0 and at most u1 characters.
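The words that shorten a string from either end can be sketched as follows. The word trim is a hypothetical helper, and the last line assumes Gforth 1.0, where safe/string is available.

```forth
\ Remove leading and trailing spaces.
: trim ( c-addr u -- c-addr1 u1 )
    bl skip -trailing ;

s"   padded  " trim type cr          \ padded
s" hello world" 6 /string type cr    \ world

\ safe/string never removes more characters than the string has,
\ so asking for 10 characters of a 3-character string leaves length 0.
s" abc" 10 safe/string nip .         \ 0
```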

insert ( c-addr1 u1 c-addr2 u2 –  ) gforth-0.7 “insert”

Move the contents of the buffer c-addr2 u2 towards higher addresses by u1 chars, and copy the string c-addr1 u1 into the first u1 chars of the buffer.

delete ( c-addr u u1 –  ) gforth-0.7 “delete”

In the memory block c-addr u, delete the first u1 chars by copying the contents of the block starting at c-addr+u1 there; fill the u1 characters at the end of the block with blanks.
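Insert and delete work on a fixed-size buffer, so characters shifted past the end of the buffer are lost. The following sketch uses a 16-character buffer; buf and buf$ are hypothetical names for this example.

```forth
create buf 16 chars allot
\ The buffer contents without the trailing blanks.
: buf$ ( -- c-addr u )  buf 16 -trailing ;

buf 16 blank                 \ fill the buffer with spaces
s" world" buf swap move      \ buffer now starts with "world"

s" hello " buf 16 insert     \ shift contents by 6 chars, put "hello " in front
buf$ type cr                 \ hello world

buf 16 6 delete              \ remove the first 6 chars, pad the end with blanks
buf$ type cr                 \ world
```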

cstring>sstring ( c-addr – c-addr u  ) gforth-0.2 “cstring-to-sstring”

C-addr is the start address of a zero-terminated string, u is its length.
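Zero-terminated strings usually come from C library calls. To keep this sketch self-contained, it builds one by hand in the dictionary; greeting-z is a hypothetical name.

```forth
\ A zero-terminated string "hi", laid out byte by byte.
create greeting-z  char h c,  char i c,  0 c,

greeting-z cstring>sstring type cr   \ hi
```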

The following words compare ASCII characters case-insensitively, but non-ASCII characters case-sensitively (as in wordlist lookup).

capscompare ( c-addr1 u1 c-addr2 u2 – n ) gforth-0.7 “capscompare”

Compare two strings lexicographically, based on the values of the bytes in the strings, but comparing ASCII characters case-insensitively, and non-ASCII characters case-sensitively and without locale-specific collation order. If they are equal, n is 0; if the first string is smaller, n is -1; if the first string is larger, n is 1.

capsstring-prefix? ( c-addr1 u1 c-addr2 u2 – f  ) gforth-1.0 “capsstring-prefix?”

Like string-prefix?, but case-insensitive for ASCII characters: Is c-addr2 u2 a prefix of c-addr1 u1?

capssearch ( c-addr1 u1 c-addr2 u2 – c-addr3 u3 flag  ) gforth-1.0 “capssearch”

Like search, but case-insensitive for ASCII characters: Search for c-addr2 u2 in c-addr1 u1; flag is true if found.
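The difference between the case-sensitive words and their caps counterparts can be seen in a short sketch. The word mentions? is a hypothetical helper, and capssearch assumes Gforth 1.0.

```forth
s" Hello" s" HELLO" compare     .   \ 1  ('e' is larger than 'E')
s" Hello" s" HELLO" capscompare .   \ 0  (equal when ASCII case is ignored)

\ f is true if c-addr2 u2 occurs in c-addr1 u1, ignoring ASCII case.
: mentions? ( c-addr1 u1 c-addr2 u2 -- f )
    capssearch nip nip ;

s" Hello World" s" WORLD" mentions? .   \ -1 (true)
```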

The following words create or extend strings on the heap:

s+ ( c-addr1 u1 c-addr2 u2 – c-addr u  ) gforth-0.7 “s-plus”

c-addr u is a newly allocated string that contains the concatenation of c-addr1 u1 (first) and c-addr2 u2 (second).

append ( c-addr1 u1 c-addr2 u2 – c-addr u  ) gforth-0.7 “append”

C-addr u is the concatenation of c-addr1 u1 (first) and c-addr2 u2 (second). c-addr1 u1 is an allocated string, and append resizes it (possibly moving it to a new address) to accommodate u characters.
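The following sketch combines s+ and append. Since s+ returns an allocated string, its result may be passed to append as the first string. The word greeting is hypothetical, and the example assumes that the caller releases the result with free when it is no longer needed.

```forth
\ Build "Hello, <name>!" on the heap.
: greeting ( c-addr u -- c-addr1 u1 )
    s" Hello, " 2swap s+    \ new heap string "Hello, <name>"
    s" !" append ;          \ grow that heap string (it may move)

s" world" greeting          ( c-addr1 u1 )
2dup type cr                \ Hello, world!
drop free throw             \ release the heap memory
```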

>string-execute ( ... xt – ... c-addr u  ) gforth-1.0 “>string-execute”

Execute xt while the standard output (type, emit, and everything that uses them) is redirected to a string. The resulting string is c-addr u, which is in heap memory; it is the responsibility of the caller of >string-execute to free this string.

One could define s+ using >string-execute, as follows:

: s+ ( c-addr1 u1 c-addr2 u2 -- c-addr u ) [: 2swap type type ;] >string-execute ;

For concatenating just two strings, >string-execute is inefficient, but for concatenating many strings, >string-execute can be more efficient.
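As a sketch of the many-pieces case, the following word formats a pair of numbers into a single heap string. Everything the quotation prints ends up in the result. The name coords>string is hypothetical, and the example assumes Gforth 1.0 (for quotations and >string-execute) and decimal base.

```forth
\ Format x and y as "(x, y)" in a freshly allocated string.
: coords>string ( x y -- c-addr u )
    [: swap ." (" 0 .r ." , " 0 .r ." )" ;] >string-execute ;

3 4 coords>string           ( c-addr u )
2dup type cr                \ (3, 4)
drop free throw             \ the caller frees the string
```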