Utilities

Blocks 180 to 199 of strongForth's default block file forth.blk contain a collection of samples and utilities. Some of them demonstrate specific features of strongForth, while others point out certain aspects of data typing and may help overcoming difficult situations.

Block 180: Upper/Lower Case Conversion

Applying UPPER or LOWER to characters converts letters to upper or lower case, respectively. Non-letter characters remain unchanged. To be able to apply the same conversions on character strings, overloaded definitions of UPPER and LOWER are provided:

180 LOAD
 OK
CHAR @ LOWER .
@ OK
CHAR A LOWER .
a OK
PARSE-WORD STRONGFORTH OVER OVER LOWER TYPE
strongforth OK

Blocks 181 to 182: ASCII Control Characters

The ANS Forth definitions CHAR and [CHAR] are very handy when it comes to producing single character literals. However, these words can only produce printable ASCII characters. Whenever ASCII control characters are required, it might be useful to have a set of corresponding constants available.

CTRL is a variation of CHAR. It accepts only letters in lower or upper case and returns the corresponding control character:

180 182 THRU
 OK
CTRL G .
beep! OK

All control characters that can be produced by using CTRL are defined as constants in block 182. The remaining 6 control characters (including <NUL>)are generated using explicit type casts.

Block 183: Temporary Base Switch

Sometimes it is necessary to temporarily switch the number conversion radix BASE to a different value, for example when creating a single hexadecimal number or displaying one number in binary, while all other number input and output is done in decimal. This is rather clumsy in Forth:

BASE @ >R HEX 03FF R> BASE !
BASE @ >R 2 BASE ! . R> BASE !

In these cases, a word is useful that saves BASE, assigns it a new value, executes an arbitrary word and then immediately restores BASE to the original value. Here is how such a word can be used:

183 LOAD
 OK
X' 03FF .
1023  OK
33333 B' .
1000001000110101  OK

In block 183, five of these words are defined:

This is a good opportunity to show how to use defining words in strongForth. :BASE actually looks almost the same as it would in ANS Forth, except for the stack diagram after DOES>. This stack diagram is mandatory. It tells the compiler that the runtime code of a new definition starts with an address of data type CONST -> UNSIGNED on the data stack. This is the address of the definition's data field, which contains the temporary number conversion radix as a constant. To make sure that the temporary value of BASE is stored in the constant data space, CONST-SPACE must be executed before creating any definitions with :BASE. Of course, this is not necessary if CONST, is used instead of , within :BASE. Remember that a definition's data field is always located in the constant data space.

Block 184: Shortcut For IF ... THEN

In many applications of conditional clauses, only one single word is to be compiled between IF and THEN. ?? provides a shortcut for those cases. Typical words ?? is applied to are EXIT, LEAVE, NEGATE and CR. For example, instead of writing

: ABS ( SIGNED -- 1ST )
  DUP 0< IF NEGATE THEN ;

you can alternatively write

: ABS ( SIGNED -- 1ST )
  DUP 0< ?? NEGATE ;

The compiled code is exactly the same.

Block 185: Replacement for ?DUP

StrongForth requires that all words have a well-defined stack diagram. It is not possible to define ANS Forth words like ?DUP, FIND, PICK and ROLL, whose stack effects depend on conditions that are generally not known at compile time. Some of these words, like FIND, can easily be modified to have unambiguous stack diagrams. Most other words with ambiguous stack diagrams are dispensible. The necessity for PICK and ROLL, for example, arises only in badly factored code and can simply be avoided by using local variables.

Nevertheless, many Forth programmers will miss ?DUP, because this word helps in many situations to keep the code short. Since it is almost always used to get rid of the ELSE branch of a conditional structure, a replacement for this typical application has been found. ?IF replaces the ANS Forth sequence ?DUP IF. Actually, ?IF is just a macro that explicitely compiles an ELSE branch to drop the zero value.

?DUP IF ... THEN

is the same as

DUP 0= IF DROP ELSE ... THEN

which in turn can be abbreviated by using ?IF:

?IF ... THEN

Block 186: Locals In Reverse Order

The ANS Forth word LOCALS| is unintuitive in the sense that the names of the locals apparently have to be provided in the wrong order. For example, in

: TEST1 ( FLAG CHARACTER UNSIGNED -- )
  LOCALS| U C F | U . C . SPACE F . ;
 OK
TRUE CHAR A 17 TEST1
17 A TRUE  OK

the first input parameter gets assigned to the last local and vice versa. The reason is that LOCALS| processes the names in it's list one by one, assigning the top of the stack to the first name and so on. LOCALS( ... ) is an alternative to LOCALS| ... | that assignes the items on the stack to the locals in the correct order:

: TEST2 ( FLAG CHARACTER UNSIGNED -- )
  LOCALS( F C U ) F . C . SPACE U . ;
 OK
TRUE CHAR A 17 TEST2
TRUE A 17  OK

LOCALS| uses a recursive algorithm instead of an iterative algorithm, which allows to elegantly reverse the order of defining the locals.

Block 187: Specifying Machine Code

All words that define new words directly or indirectly use END-CODE to finish the definition. END-CODE stores the address of the machine code in the word's code field. Of course, the machine code has to be compiled before END-CODE can be executed. This is exactly what happens during compilation of a code definition:

CODE DROP ( SINGLE -- )
  AX POP, NEXT, END-CODE

In order to create the machine code within a defining word, you would typically compile a phrase like this:

 ... [ CODE-HERE machine-code NEXT, ] LITERAL SWAP END-CODE ...
LITERAL compiles the address of the machine code as a literal. But this looks a little bit awkward. To emphasize what is actually being done here, you may alternatively use the phrase

 ... CODE[ machine-code NEXT, ]END-CODE ...
which does exactly the same. Examples of defining words that take advantage of CODE[ and ]END-CODE are given in blocks 188 to 190.

Blocks 188 to 189: Forward References

As a general rule, every Forth word has to be defined before it can be either compiled or executed. However, there are cases in which it is desireable to compile a word before it is actually defined. Compiling a word that has not yet been defined is called a forward reference. A typical application of forward references are recursions that extend over more than one word. If word A uses word B and word B in turn uses word A, none of these two words can be defined first:

: A ( ... -- ... ) ... B ... ;
: B ( ... -- ... ) ... A ... ;

Using execution tokens is a possible solution for this problem:

( ... -- ... )PROCREATES T
NULL T VALUE (B)
: A ( ... -- ... ) ... (B) EXECUTE ... ;
: B ( ... -- ... ) ... A ... ;
DT T >TOKEN B TO (B)

But there's a more elegant solution:

: B ( ... -- ... )DEFER;
: A ( ... -- ... ) ... B ... ;
: (B) ( ... -- ... ) ... A ... ;
LATEST IS B

The definition of B is deferred until after A has been defined. )DEFER; creates a word whose machine code executes a yet to be defined colon definition (B). IS calculates the parameter field address of the colon definition (B) and stores it in the parameter field of the deferred definition.B. Of course, IS checks that

The third one of these checks is done by the word ?IDENTICAL, which expects two items of data type DEFINITION on the stack. Here's a simple example about how to use )DEFER; and IS:

: FLIP  ( UNSIGNED -- 1ST )DEFER;
 OK
: FLOP  ( UNSIGNED -- 1ST )
  ." FLOP " DUP 1 > IF 1- FLIP THEN ;
 OK
:NONAME ( UNSIGNED -- 1ST )
  ." FLIP " FLOP ; IS FLIP
 OK
3 FLIP
FLIP FLOP FLIP FLOP FLIP FLOP  OK
3 FLOP
FLOP FLIP FLOP FLIP FLOP  OK

Block 190: Replacement For CREATE

The ANS Forth standard specifies that the default semantics of a word created by CREATE is to return the address of its data field. In strongForth, this behaviour does not make much sense, because an item of data type ADDRESS almost always needs to be casted to a more specific data type before it can be used in some meaningful way. In order to overcome this issue, the new word )ADDRESS; has been defined. It's typical usage is as follows:

: name ( -- CONST -> ... )ADDRESS;
: name ( -- CCONST -> ... )ADDRESS;

name is the name of a new definition. When name is executed, it returns its data field address as an item of data type CONST -> ... or CCONST -> .... Here are two examples:

: NUMBER-OF-DAYS ( -- CONST -> UNSIGNED )ADDRESS; CONST-SPACE
 OK
31 , 28 , 31 , 30 , 31 , 30 , 31 , 31 , 30 , 31 , 30 , 31 ,
 OK
NUMBER-OF-DAYS .S
CONST -> UNSIGNED  OK
5 + @ .
30  OK
: ABC ( -- CCONST -> CHARACTER )ADDRESS;
 OK
CONST-SPACE HEX 41 C, 42 C, 43 C, ALIGN
 OK
ABC 3 TYPE
ABC OK

Remember that the data field is always allocated in the constant data space. )ADDRESS; performs some internal checks to make sure that the stack diagram has exactly one output parameter and no input parameter.

Block 191: Interpreting Overloaded Versions

Operator overloading allows defining multiple words with the same name but different stack diagrams. In ANS Forth, redefining a word means that the old versions is hidden to the interpreter. In strongForth, executing and compiling the old versions is still possible, provided that the stack diagrams of all overloaded versions can be clearly distinguished. But since the interpreter only considers the name and the input parameters when searching the dictionary for a match, it obviously cannot distinguish overloaded versions that differ only in the output parameters. In those cases, the interpreter never finds the older version.

A typical example is the word BL:

BL ( -- CHARACTER )

Once the 8086 assembler is loaded, an overloaded version of BL shows up in the dictionary, because BL is also a register name:

BL ( -- MODE )

Since both versions have no input parameters, INTERPRET has no means to distinguish them and will always find the latest one:

BL .S .
MODE 29556760  OK

With )INTERPRET, the inaccessible version of BL can be made available. )INTERPRET allows specifying the complete stack diagram, including the output parameters, in the dictionary search. Only words whose stack diagrams exactly match the given one are actually found. )INTERPRET is a state-smart word that works both in interpretation and in compilation state:

( -- CHARACTER )INTERPRET BL .S .
CHARACTER   OK
: TEST [CHAR] 0 ( -- CHARACTER )INTERPRET BL DO I . LOOP ;
 OK
TEST
 !"#$%&'()*+,-./ OK
( -- INTEGER )INTERPRET BL

( -- INTEGER )INTERPRET BL ? undefined word

Blocks 192 to 193: Compiling EVALUATE

In most cases, EVALUATE is executed by an immediate word in compilation state. Any stack effects of the words EVALUATEd are directly applied to the compiler data type heap. A typical example is the application of EVALUATE in POSTPONE. The most important aspect of this use case is the fact that the context of the string being evaluated is not the word EVALUATE is being compiled into, but the environment that actually executes the word containing EVALUATE. With respect to POSTPONE, the context of the string to be evaluated is the compile time of the word executing POSTPONE. But this use case is not restricted to compilation. In the following example, the context of the evaluated string is the interpreter:

: TEST ( -- )
  " SPACES" EVALUATE ;
 OK
5 TEST
      OK

TEST does not have a stack diagram, but since SPACES expects an item of data types SIGNED or UNSIGNED on the data stack, the following usage fails:

TRUE TEST

SPACES ? undefined word
FLAG

Now, what can we do if the context of the string to be evaluated shall be the word in which EVALUATE is embedded in? In ANS Forth, this is no problem at all, but strongForth's data type system does not like it at all:

: TEST ( -- )
  5 " SPACES" EVALUATE ;

  5 " SPACES" EVALUATE ; ? data types not congruent
UNSIGNED

We have to face the problem that the stack effect of the character string being evaluated at runtime is not known to the compiler in general. What is required is an extended version of EVALUATE which informs the compiler about the expected stack effect of the evaluated string. Furthermore, the extended version has to validate at runtime that the evaluated string really has the stack effect the compiler assumed it would have. The previous example simply has to be rewritten as follows:

: TEST ( -- )
  5 " SPACES" ( UNSIGNED -- )EVALUATE ;
 OK
TEST
      OK

This is what )EVALUATE does: First of all, it creates a noname definition with the given stack diagram and the same execution token as NOOP. This means, the definition has no semantics. Next, )EVALUATE compiles the runtime code, which consists of the noname definition as a literal of data type DEFINITION and the word (EVALUATE). (EVALUATE) expects the character string and the noname definition as input parameters on the stack. Finally, )EVALUATE compiles the noname definition in order to apply its stack effect to the compiler data type heap. Because the noname definition has no semantics, no virtual code is being compiled:

SEE TEST
: TEST ( -- )
5 " SPACES" 477652010 (EVALUATE) ;  OK

Now, what does (EVALUATE) do? As for EVALUATE, there are two overloaded versions for strings in the DATA and CONST memory areas:

(EVALUATE) ( CDATA -> CHARACTER UNSIGNED DEFINITION -- )
(EVALUATE) ( CCONST -> CHARACTER UNSIGNED DEFINITION -- )

The definitions of these two versions look exactly the same, because the only difference is the overloaded version of EVALUATE they execute. Their semantics is pretty simple. They put the input parameters of the noname definition onto the current data type heap, evaluate the character string with EVALUATE and then check whether the resulting contents of the data type heap match the output parameters of the noname definition.

Block 194: Allocating Transient Areas

Character strings are often stored in the CONST memory area. Typical examples are character strings that are compiled by ", or character strings located in the data field of a definition. Unfortunately, it is not allowed to modify the contents of the CONST memory area at runtime of an application, because this memory area might be a read-only memory. Only character strings in the DATA memory area may be modified. <TRANSIENT expects a character string in the CONST memory area, and returns a copy of it in the DATA memory area. It actually allocates a transient area in the local name space. After string processing is done, TRANSIENT> deallocates the transient area. Here's a simple example:

: FIND-INDEX ( UNSIGNED -- DEFINITION SIGNED )
  [CHAR] 0 SWAP + >R " DUMMY_"
  <TRANSIENT OVER OVER + 1- R> SWAP !
  OVER SWAP 0 0 FIND ROT TRANSIENT> ;
 OK
: DUMMY1 ;
 OK
: DUMMY2 ;
 OK
: DUMMY3 ;
 OK
1 FIND-INDEX . .
-1 DUMMY1 ( -- )  OK
2 FIND-INDEX . .
-1 DUMMY2 ( -- )  OK
3 FIND-INDEX . .
-1 DUMMY3 ( -- )  OK

FIND-INDEX searches the dictionary for definitions with the name DUMMYn, where n is a numerical index between 0 and 9. Note that TRANSIENT> just expects the address of the string's first character on the data stack, because this information is sufficient in order to deallocate the string. <TRANSIENT and TRANSIENT> may be nested.

Block 195: High And Low

HIGH and LOW return the higher and the lower half of a single-cell or a double-cell item, respectively. If the size of a single cell is 16 bits, half a cell is a byte:

HEX A71D DUP HIGH .
A7  OK
LOW .
1D  OK

If HIGH and LOW are applied to double-cell items, the result is a single-cell item:

HEX 700CB57E. DUP HIGH .
700C  OK
LOW .
B57E  OK

Both words are overloaded for single-cell and double-cell items. Note that the output parameters of the single-cell versions have the same data types as the input parameters, whereas the double-cell versions alway return items of data type SINGLE:

HIGH ( SINGLE -- 1ST )
HIGH ( DOUBLE -- SINGLE )
LOW ( SINGLE -- 1ST )
LOW ( DOUBLE -- SINGLE )

As an alternative, HIGH and LOW can also be implemented as colon definitions. However, it's clear that machine code implementations are faster.

Block 196: Square Root Calculation

The two sample words perform the calculation of the square root of an unsigned single or double number, respectively. Both words are almost identical, thus showing how easy it is in strongForth to adapt an existing word to a new data type. SQRT is an overloaded word being applied to unsigned single and to unsigned double numbers:

70 LOAD
 OK
1000 SQRT .
31  OK
200000000. SQRT .
14142  OK

SQRT is an implemention of an iterative algorithm:

x(n+1) = ( x(n) + a/x(n) ) / 2

The iteration starts with x(0) = 255 for single numbers and x(0) = 65535 for double numbers. It stops when the difference between x(n+1) and x(n) is less than 2.

Because of a numeric overflow during the calculation, the results of SQRT are not correct if the radicand is equal to MAX-U (65535) or MAX-UD (4294967295).

Blocks 197 to 198: Permutation

PERMUTATION calculates a permutation of an array of single cells with a given length. If n is the number of single cells, n! different permutations exist. The last input parameter of PERMUTATION specifies the index of the generated permutation. By iterating n from 0 to n!-1, all possible permutations can be produced:

197 LOAD 198 LOAD
 OK
DATA-SPACE 1 VARIABLE M 2 , 3 , 4 ,
 OK
: RESET ( DATA -> SINGLE UNSIGNED -- )
  0 DO I 1+ CAST SINGLE OVER I + ! LOOP DROP ;
 OK
: DISPLAY ( DATA -> SINGLE UNSIGNED -- )
  0 DO DUP I + @ . LOOP DROP CR ;
 OK
: PERMUTATION-ALL ( DATA -> SINGLE UNSIGNED -- )
  DUP FAK 0
  DO OVER OVER RESET
     OVER OVER I PERMUTATION DISPLAY
  LOOP DROP DROP ;
 OK
M 4 PERMUTATION-ALL
1 2 3 4
1 2 4 3
1 3 2 4
1 3 4 2
1 4 3 2
1 4 2 3
2 1 3 4
2 1 4 3
2 3 1 4
2 3 4 1
2 4 3 1
2 4 1 3
3 2 1 4
3 2 4 1
3 1 2 4
3 1 4 2
3 4 1 2
3 4 2 1
4 2 3 1
4 2 1 3
4 3 2 1
4 3 1 2
4 1 3 2
4 1 2 3
 OK

MSWAP and FAK are words used by PERMUTATION. Using local variables in MSWAP is of course not necessary, but it makes things more clear.

It is easy to see that overloaded definitions of MSWAP and PERMUTATION for arrays of double cells or of characters can be produced without having to change anything but the stack diagrams. The consistent use of overloaded definitions for words being applied to different operands makes this possible.

Block 199: Size Of Data Types

SIZEOF is an overloaded definition that returns the size in address units of the item on top of the stack. This is nothing spectacular, but it demonstrates that both interpreter and compiler always know about the size of the items on the stack. They will always chose the right version of SIZEOF:

199 LOAD
 OK
FALSE SIZEOF .
2 OK
DT FLAG SIZEOF .
4 OK

Dr. Stephan Becher - December 27th, 2005