Chapter 16: The System And Its Environment

The Endless Interpreter

Upon start-up, strongForth enters the interpreter loop, which repeatedly executes the following steps:
  1. Refill the input buffer from the current input source.
  2. Interpret the contents of the input buffer.
  3. Display the OK prompt if the system is in interpretation state.

This is what QUIT does. The semantics and the implementation of QUIT do not differ from those of any other Forth system. Actually, QUIT performs some initialization work before entering the interpreter loop, as specified in the ANS Forth standard:
  1. Empty the return stack.
  2. Store zero in SOURCE-ID, which makes the user input device the input source.
  3. Enter interpretation state.

The implementation of QUIT is straightforward:

: QUIT ( -- )
  RP0 RP!
  NULL DATA HANDLER !
  POSTPONE [ LOCALS!
  +0 TO SOURCE-ID
  BEGIN REFILL
  WHILE INTERPRET STATE @ INVERT IF ."  OK" CR THEN
  REPEAT BYE ;

You have certainly noticed that QUIT performs two further initialization steps, which have not yet been mentioned. First, it stores a null pointer into a variable called HANDLER. HANDLER is actually a pointer to the exception frame on the return stack. Details about exception handling can be found in the next section of this chapter. The second additional step is the initialization of the local name space.

Once QUIT cannot refill the input buffer, it simply exits strongForth by executing BYE. The semantics of BYE is identical to that of the corresponding ANS Forth word:

BYE ( -- )

Since input provided by the user input device typically never ceases, QUIT is actually an endless loop. But it is not only used upon system startup. QUIT may also be used as a simple means for suppressing the OK prompt, and for error recovery. The OK prompt is always displayed after successful interpretation of the input buffer. However, if QUIT is the last word to be interpreted, a new interpreter loop is started before the current one can be finished:

." Stay in the same interpreter loop."
Stay in the same interpreter loop. OK
." Start a new interpreter loop." QUIT
Start a new interpreter loop.
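
The control flow behind this transcript can be modelled in a few lines. The following is a Python sketch, not strongForth code. The names quit_loop and Quit are made up for illustration, every word other than QUIT is simply echoed, and compilation state is ignored.

```python
class Quit(Exception):
    """Raised by the word QUIT to abandon the current interpreter loop."""


def quit_loop(lines):
    """Toy model of QUIT: return one output string per input line."""
    source = iter(lines)              # stands in for the input source
    output = []
    while True:                       # each pass starts a fresh interpreter loop
        try:
            for line in source:       # REFILL; the loop ends when input runs out
                shown = []
                output.append(shown)
                for word in line.split():   # INTERPRET
                    if word == "QUIT":
                        raise Quit()
                    shown.append(word)      # any other word is just echoed
                shown.append("OK")          # reached only if the line completed
            break                     # no more input: BYE
        except Quit:
            continue                  # new loop; the pending OK is never shown
    return [" ".join(words) for words in output]


print(quit_loop(["Stay", "Start QUIT"]))   # ['Stay OK', 'Start']
```

Because the OK prompt is appended only after a line has been interpreted completely, leaving the loop through QUIT skips it, which is the behaviour shown in the transcript.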

ABORT extends the semantics of QUIT by emptying the data stack as well:

: ABORT ( -- )
  POSTPONE [ SP0 SP! DTP! CR QUIT ;

At this point, we can see a small difference from other Forth systems. In strongForth, emptying the data stack requires emptying the data type heap as well. This is done by DTP!. Executing [ before DTP! ensures that DTP! is applied to the interpreter data type heap.

Whenever an ambiguous condition is recognized, strongForth throws an exception by executing THROW with an appropriate error code. If an exception handler exists, as indicated by the system variable HANDLER pointing to an exception frame on the return stack, the exception is handled by the exception handler. If no exception handler is present, THROW simply executes ERROR. ERROR is not part of the ANS Forth specification, although it is available in many Forth systems. ERROR displays an error message and then executes ABORT:

NULL CCONST -> CHARACTER VARIABLE ERROR-ADDR
NULL UNSIGNED VARIABLE ERROR-COUNT

: ERROR ( SIGNED -- )
  CASE
     -1 OF ENDOF
     -2 OF ERROR-ADDR @ ERROR-COUNT @ TYPE ENDOF
     CR SOURCE DROP >IN @ DECIMAL -TRAILING TYPE
     ."  ? ERROR " DUP . CR POSTPONE .S
  ENDCASE ABORT ;

ERROR expects an error code of data type SIGNED on the stack. -1 means that no message is displayed. -2 means that the string specified by the system variables ERROR-ADDR and ERROR-COUNT is displayed. For all other values of the error code, the error message consists of three parts:
  1. The contents of the input buffer up to the current parse position, with trailing spaces removed.
  2. The text "? ERROR", followed by the error code in decimal.
  3. On a new line, the contents of the data type heap as displayed by .S.

The most common error message you will see is probably the one that indicates an undefined word:

STATE BASE - .

STATE BASE - ? ERROR -13
DATA -> FLAG DATA -> UNSIGNED
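
The way ERROR assembles this message can be mimicked with a small Python sketch. This is a model only: error_report and its parameters are hypothetical names, and the value 13 for >IN (just past the delimiter after the word -) is an assumption about where parsing stopped.

```python
def error_report(code, source="", to_in=0, abort_text="", type_heap=()):
    """Return the text ERROR would display for the given error code."""
    if code == -1:
        return ""                         # -1: no message at all
    if code == -2:
        return abort_text                 # -2: the string stored by ABORT"
    parsed = source[:to_in].rstrip()      # SOURCE DROP >IN @ -TRAILING
    types = " ".join(type_heap)           # the data types that .S lists
    return f"{parsed} ? ERROR {code}\n{types}"


print(error_report(-13, "STATE BASE - .", 13,
                   type_heap=["DATA -> FLAG", "DATA -> UNSIGNED"]))
```

The call reproduces the two lines of the message above: the parsed part of the input buffer with the error code, followed by the data types on the data type heap.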

In chapter 17, a version of ERROR is presented that uses blocks to display a more descriptive error message. With this version, the same error condition would produce the following error message:

STATE BASE - .

STATE BASE - ? undefined word
DATA -> FLAG DATA -> UNSIGNED

The reason why you'll see this specific error message more often than in an ANS Forth system is simply the fact that finding a word in strongForth's dictionary is subject to an additional condition. In an ANS Forth system, the only condition for finding a word is that its name matches the parsed name. In strongForth, a word will only be found if its name matches and its input parameters match the contents of the data type heap. In the above example, none of the overloaded versions of - fits:

- ( CADDRESS 1ST -- INTEGER )
- ( ADDRESS -> DOUBLE 1ST -- INTEGER )
- ( ADDRESS -> SINGLE 1ST -- INTEGER )
- ( ADDRESS 1ST -- INTEGER )
- ( CFAR-ADDRESS INTEGER -- 1ST )
- ( FAR-ADDRESS -> DOUBLE INTEGER -- 1ST )
- ( FAR-ADDRESS -> SINGLE INTEGER -- 1ST )
- ( FAR-ADDRESS INTEGER -- 1ST )
- ( INTEGER-DOUBLE SIGNED -- 1ST )
- ( INTEGER-DOUBLE UNSIGNED -- 1ST )
- ( INTEGER-DOUBLE INTEGER-DOUBLE -- 1ST )
- ( CADDRESS INTEGER -- 1ST )
- ( ADDRESS -> DOUBLE INTEGER -- 1ST )
- ( ADDRESS -> SINGLE INTEGER -- 1ST )
- ( ADDRESS INTEGER -- 1ST )
- ( INTEGER INTEGER -- 1ST )

Even the third version in this list fails to match, because the second address does not have the same data type as the first one.
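
To make the matching rule more tangible, here is a simplified Python model of the dictionary search. It is a sketch under several assumptions: the subtype hierarchy is reduced to a handful of types, only four of the versions of - are listed, a compound data type is compared component by component, and 1ST is taken to require compatibility with the actual data type of the first parameter. The real matching rules of strongForth are more detailed.

```python
# Reduced subtype hierarchy (child -> parent); an assumption for this sketch.
PARENT = {
    "DATA": "ADDRESS", "ADDRESS": "SINGLE",
    "FLAG": "LOGICAL", "LOGICAL": "SINGLE",
    "UNSIGNED": "INTEGER", "INTEGER": "SINGLE",
}

# A few of the overloaded versions of -, in search order.
# Compound types are tuples: ("ADDRESS", "SINGLE") means ADDRESS -> SINGLE.
VERSIONS = [
    ("- ( ADDRESS -> SINGLE 1ST -- INTEGER )", [("ADDRESS", "SINGLE"), "1ST"]),
    ("- ( ADDRESS 1ST -- INTEGER )", [("ADDRESS",), "1ST"]),
    ("- ( ADDRESS INTEGER -- 1ST )", [("ADDRESS",), ("INTEGER",)]),
    ("- ( INTEGER INTEGER -- 1ST )", [("INTEGER",), ("INTEGER",)]),
]


def is_subtype(actual, formal):
    """True if actual equals formal or is one of its descendants."""
    while actual is not None:
        if actual == formal:
            return True
        actual = PARENT.get(actual)
    return False


def compatible(actual, formal):
    """Component-wise check; a formal type may be shorter than the actual one."""
    return len(formal) <= len(actual) and all(
        is_subtype(a, f) for a, f in zip(actual, formal))


def matches(heap, params):
    """Check the topmost entries of the data type heap against the parameters."""
    if len(heap) < len(params):
        return False
    actuals = heap[len(heap) - len(params):]
    for actual, formal in zip(actuals, params):
        if formal == "1ST":
            formal = actuals[0]       # 1ST refers to the first actual parameter
        if not compatible(actual, formal):
            return False
    return True


def find(heap):
    """Return the first version of - that matches, or None (undefined word)."""
    for name, params in VERSIONS:
        if matches(heap, params):
            return name
    return None


print(find([("DATA", "FLAG"), ("DATA", "UNSIGNED")]))   # None
print(find([("DATA", "FLAG"), ("DATA", "FLAG")]))
```

The first call corresponds to STATE BASE - and finds nothing. The second call, with two addresses of the same data type, is accepted by the version with the stack diagram ( ADDRESS -> SINGLE 1ST -- INTEGER ).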

Exception Handling

StrongForth supports ANS Forth's Exception and Exception Extension word sets. Exception handling in ANS Forth is based on the two words CATCH and THROW. CATCH creates an exception frame and executes a token. If no exception was thrown during execution of the token, CATCH removes the exception frame and continues execution. If, on the other hand, an exception is thrown before the execution of the token is finished, the exception frame is removed by THROW. THROW ensures that the flow of execution is continued at the same point where CATCH would have returned if the execution of the token had terminated normally.

Specialized CATCH

The ANS Forth stack diagram of CATCH looks quite similar to the one of EXECUTE, because both words execute a token:

EXECUTE ( i*x xt -- j*x )
CATCH   ( i*x xt -- j*x 0 | i*x n )

In order to keep the consistency of strongForth's data type system, CATCH needs to consider the stack effect of executing the token. But the actual value of the token at runtime is not yet known at compile time. This is the same problem as with EXECUTE, and consequently, it is resolved in the same way. If the token to be provided to CATCH is a qualified token, the compiler knows its stack effect. Does this mean that we need to define a separate version of CATCH for each qualified token, just as with EXECUTE? No, this is not necessary. You will see later in this section how this problem is resolved.

But there's another problem. CATCH as specified by ANS Forth does not have unique output parameters. If no exception is thrown during execution of the token, CATCH has the same stack effect as if the token had been executed by EXECUTE, plus an additional output parameter of data type SIGNED. But if an exception is thrown, the depth of the data stack is supposed to remain unchanged. For example, if the stack effect of the token is

( addr u -- flag )

the stack effect of CATCH would be

( addr u token -- flag 0 | addr u n )

with output parameters addr u having undefined values. Since strongForth cannot handle ambiguous stack diagrams, CATCH needs to have the same stack effect in both cases, i. e.,

( addr u token -- flag n )

This is an important deviation from the ANS Forth standard.

The implementation of one universal version of CATCH that can be applied to all qualified tokens is pretty complicated. Therefore, let's start the easy way by defining a version of CATCH for one specific qualified token:

NULL DATA VARIABLE HANDLER
( LOGICAL UNSIGNED -- 1ST )PROCREATES (LU--1)

: CATCH ( LOGICAL UNSIGNED (LU--1) -- 1ST SIGNED )
  >IN @ >R
  SOURCE-ID >R
  SP@ -> SINGLE 1+ >R
  HANDLER @ >R
  RP@ HANDLER !
  EXECUTE
  R> HANDLER !
  R> DROP
  R> DROP
  R> DROP
  +0 ;

CATCH creates an exception frame on the return stack, whose memory image looks like this (from high to low addresses):

value of >IN
value of SOURCE-ID
data stack pointer after CATCH
pointer to previous exception frame

The top two entries represent the input source specification at the point immediately before the execution of CATCH. Storing the input source specification is necessary, because it needs to be restored if an exception is thrown during execution of the token. The next entry is the value of the data stack pointer after CATCH is done. Because the depth of the data stack after executing CATCH is one cell less than immediately before executing CATCH, the size of one cell has to be added to the present value of the data stack pointer. Of course, the net effect on the data stack depth depends on the stack effect of the qualified token. For (LU--1), the net effect is "one cell less". The final entry in the exception frame is the present value of the system variable HANDLER, i. e., a pointer to the previous exception frame. CATCH stores a pointer to the current exception frame in HANDLER, where it can be obtained by THROW. Saving the old value of variable HANDLER in the exception frame ensures that CATCH can be nested.

After creating the exception frame, CATCH executes the token. If no exception is thrown, CATCH simply removes the exception frame from the return stack and pushes zero as SIGNED on the data stack to indicate that execution terminated normally.

Now, what happens if somewhere during the execution of the token, an exception is thrown? An exception is thrown by executing THROW with an appropriate error code. THROW uses the data in the exception frame to restore data and return stack and continue execution at exactly the same point as CATCH would have done if the execution had terminated normally. Here's the definition of THROW:

: THROW ( SIGNED -- )
  DUP
  IF HANDLER @ 0=
     IF ERROR
     ELSE HANDLER @ RP!
        RP@ -> DATA @ HANDLER !
        RP@ -> SIGNED !
        RP@ -> DATA -> SIGNED 1+ @ 1+ SP!
        RP@ -> SIGNED @ ( SIGNED -- )CAST
        RP@ 2 CELLS + -> SIGNED @ TO SOURCE-ID
        RP@ 3 CELLS + -> UNSIGNED @ >IN !
       (RDROP) (RDROP) (RDROP) (RDROP)
     THEN
  ELSE DROP
  THEN ;

THROW does nothing if the error code of data type SIGNED is zero. Otherwise, it checks the value of HANDLER. If no exception frame exists, HANDLER still contains the null pointer it has been initialized with by QUIT. In this case, the exception handling is actually done by ERROR. In the other case, i. e., if HANDLER contains a valid pointer to an exception frame, THROW starts with cleaning up the return stack by making the return stack pointer point to the latest exception frame. The first entry in the exception frame is a pointer to the previous exception frame, or a null pointer if no previous exception frame exists. The semantics of

RP@ -> DATA @

is actually the same as the one of R@ in ANS Forth. It pushes a copy of the top of the return stack to the data stack. In strongForth, R@ is not available at this point, because R@ is a local variable created by >R. Since THROW accesses a cell on the return stack that has been placed there by a different word (CATCH), a low-level phrase with a type cast has to be used instead.

After the previous value of variable HANDLER has been restored, THROW reuses the first cell of the exception frame to store a temporary copy of the error code. This is necessary, because THROW's next action, restoring the data stack pointer, will make the error code unavailable. The second entry of the exception frame is the calculated value of the data stack pointer if CATCH had returned normally. This value has to be corrected by an offset of one cell for the error code. After restoring the data stack pointer, THROW retrieves the error code from the exception frame and uses a type cast in order to forget about it at compile time. According to its stack diagram, THROW may not leave anything on the data stack at all. Instead, THROW is actually returning the error code in the name of CATCH.

The contents of the next two cells of the exception frame are used to restore the input source specification to the state immediately before the corresponding CATCH was executed. The last thing THROW has to do is to remove the exception frame from the return stack. Again, it has to use low-level words for this purpose, because R> would only work if THROW had created the exception frame itself.
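
The interplay of CATCH, THROW and the exception frame can be traced with a small Python model. It is a sketch, not a translation of the strongForth code. The class Machine and the exceptions _Unwind and Abort are invented for this purpose, every stack item occupies one list entry, and a Python exception stands in for the jump from THROW back to the point where CATCH returns. The depth is counted in list entries, so a shrinking data stack is expressed by a negative net effect, whereas the strongForth code adds cells to the data stack pointer.

```python
class _Unwind(Exception):
    """Python stand-in for THROW's jump back into CATCH."""


class Abort(Exception):
    """Raised when THROW finds no exception frame (the ERROR path)."""


class Machine:
    def __init__(self):
        self.data = []        # data stack
        self.rstack = []      # return stack
        self.handler = None   # position of the latest exception frame
        self.to_in = 0        # >IN
        self.source_id = 0    # SOURCE-ID

    def catch(self, net_effect):
        """The token is on top of the data stack, as in the text."""
        depth_after = len(self.data) + net_effect   # depth once CATCH is done
        token = self.data.pop()
        # Exception frame: >IN, SOURCE-ID, data stack depth, previous frame.
        self.rstack += [self.to_in, self.source_id, depth_after, self.handler]
        self.handler = len(self.rstack)
        try:
            token(self)
        except _Unwind:
            return            # THROW has removed the frame and left the code
        self.handler = self.rstack.pop()
        del self.rstack[-3:]
        self.data.append(0)   # execution terminated normally

    def throw(self, code):
        if code == 0:
            return
        if self.handler is None:
            raise Abort(code)
        del self.rstack[self.handler:]       # make the frame the top of stack
        self.handler = self.rstack.pop()     # restore the previous frame
        depth_after = self.rstack.pop()
        self.source_id = self.rstack.pop()
        self.to_in = self.rstack.pop()
        target = depth_after - 1             # leave room for the error code
        del self.data[target:]
        self.data += [None] * (target - len(self.data))   # undefined values
        self.data.append(code)
        raise _Unwind()


def div(m):
    """( a b -- q ), throwing -10 if the divisor is zero."""
    if m.data[-1] == 0:
        m.data.pop()
        m.throw(-10)
    b = m.data.pop()
    a = m.data.pop()
    m.data.append(a // b)


m = Machine()
m.data = [605686950, 825, div]
m.catch(-1)
print(m.data)    # [734166, 0]
m.data = [605686950, 0, div]
m.catch(-1)
print(m.data)    # [605686950, -10]
```

In both cases the data stack ends up with the same depth, and the return stack is empty again. In the second case the value below the error code is merely what happened to be left in that position.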

CATCH As CATCH Can

Now, let's get back to the problem of implementing a universal version of CATCH. Remember that the simplified version, which was presented at the beginning of this section, only applies to one specific qualified token. An equivalent version for a different qualified token would only differ in two places:

  1. The stack diagram, which is the same as the one for the corresponding version of EXECUTE, plus an additional output parameter of data type SIGNED.
  2. The net effect on the data stack depth, which is incorporated into the above version of CATCH as 1+.

In strongForth, different stack diagrams cannot be assigned to one word. But it is possible to define a universal version of CATCH as a state-smart immediate word, which takes care of the stack effect and calculates the net effect on the data stack depth. This word may then compile or execute a low-level word with a generic stack effect and the net effect on the data stack depth as a parameter. Here's the definition of this low-level word:

: (CATCH) ( TOKEN INTEGER -- SIGNED )
  >IN @ >R SOURCE-ID >R
  SP@ -> SINGLE SWAP + >R HANDLER @ >R RP@ HANDLER !
  (EXECUTE)
  R> HANDLER ! R> DROP R> DROP R> DROP +0 ;

It looks very similar to the previous version of CATCH for qualified tokens of data type (LU--1). The stack effect of executing the token has been removed, and EXECUTE has been replaced by (EXECUTE). An additional input parameter of data type INTEGER contains the net effect on the data stack depth. (CATCH) is compiled or executed by the state-smart, immediate word CATCH:

: CATCH ( -- )
  " EXECUTE" TRANSIENT 0 4 FIND
  IF DEPTH-SP SWAP STATE @ DT>DT DROP DEPTH-SP -
     STATE @
     IF [DT] TOKEN >DT [LITERAL] POSTPONE (CATCH)
     ELSE ( UNSIGNED -- TOKEN INTEGER )CAST (CATCH)
        ( SIGNED -- )CAST [DT] SIGNED >DT
     THEN
  ELSE DROP -13 THROW
  THEN ; IMMEDIATE

CATCH does not have an explicit stack diagram, because it calculates its stack effect dynamically. It first tries to find a version of EXECUTE that matches the contents of the data type heap. DT>DT applies EXECUTE's stack diagram to the interpreter or compiler data type heap, depending on the value of system variable STATE. The net effect of EXECUTE's stack diagram on the data stack depth is calculated by subtracting the results of DEPTH-SP before and after applying the stack diagram. Remember that DEPTH-SP returns the depth of the data stack in cells based on the contents of the data type heap.
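
The calculation of the net effect can be illustrated with a short Python sketch. The helper names are hypothetical, the size table only knows that UNSIGNED-DOUBLE occupies two cells, and the output parameters are passed with 1ST already resolved. No type matching is performed here.

```python
CELLS = {"UNSIGNED-DOUBLE": 2}    # every other type is assumed to take one cell


def depth_sp(heap):
    """Data stack depth in cells implied by a list of data types."""
    return sum(CELLS.get(dt, 1) for dt in heap)


def catch_net_effect(heap, inputs, outputs):
    """Depth before minus depth after applying EXECUTE's stack diagram."""
    before = depth_sp(heap)
    remaining = heap[:len(heap) - len(inputs)]   # the inputs are consumed ...
    after = depth_sp(remaining + outputs)        # ... and the outputs are added
    return before - after


lu = ["LOGICAL", "UNSIGNED", "(LU--1)"]
print(catch_net_effect(lu, lu, ["LOGICAL"]))              # 2
udu = ["UNSIGNED-DOUBLE", "UNSIGNED", "(UDU--1)"]
print(catch_net_effect(udu, udu, ["UNSIGNED-DOUBLE"]))    # 2
```

For a qualified token of data type (LU--1), the depth is three cells before and one cell after applying the stack diagram, so the value handed to (CATCH) is 2.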

From now on, CATCH has to distinguish between interpretation and compilation state. In order to compile (CATCH), two parameters of data types TOKEN and INTEGER have to be provided. Data type TOKEN is just added to the compiler data type heap, because the qualified token is still on top of the data stack. [LITERAL] compiles the calculated net effect on the data stack depth as a literal of data type UNSIGNED, which is a subtype of INTEGER. Finally, (CATCH) is compiled.

Messing around with the data type heap might look somewhat confusing to you. To make this easier to understand, here are the contents of the compiler data type heap during the execution of CATCH with a qualified token of data type (LU--1):

Immediately before DT>DT:  LOGICAL UNSIGNED (LU--1)
Immediately after DT>DT:  LOGICAL
Immediately before POSTPONE:  LOGICAL TOKEN UNSIGNED
Immediately after POSTPONE:  LOGICAL SIGNED

If CATCH is used in interpretation state, the data type manipulations are different. Because CATCH does not have a stack diagram, a type cast is required to make the qualified token visible. For the same reason, another type cast has to remove data type SIGNED afterwards. But since (CATCH) actually returns an item of data type SIGNED, data type SIGNED is then manually pushed to the interpreter data type heap.

To summarize, the two type casts before and after (CATCH) just correct the obvious mistake that CATCH is defined without a stack diagram. Since state-smart words generally have different stack effects in interpretation and compilation state, their implementation in strongForth is often difficult. The necessary data type manipulations can make state-smart words pretty complicated.

An Example

This section contains a small example about how exception handling may be used in strongForth. First, we define a new version of / that throws an exception if the divisor is zero:

: / ( UNSIGNED-DOUBLE UNSIGNED -- 1ST )
  DUP 0= IF DROP -10 THROW ELSE / THEN ;
 OK

To be able to provide CATCH with the token of this version of /, we need to create a suitable qualified token:

( UNSIGNED-DOUBLE UNSIGNED -- 1ST )PROCREATES (UDU--1)
 OK

We can now try to catch exceptions thrown by /:

605686950. 825 DT (UDU--1) >TOKEN / CATCH .S . .
UNSIGNED-DOUBLE SIGNED 0 734166  OK
605686950. 0 DT (UDU--1) >TOKEN / CATCH .S . .
UNSIGNED-DOUBLE SIGNED -10 605686950  OK

In the first case, / does not throw an exception, because the divisor is positive. CATCH returns 0 to indicate that the operation terminated normally. In the second case, the divisor is zero, and CATCH returns -10 as the error code. The result of the division is undefined, because the operation was not completed. Remember that strongForth's version of CATCH has a unique stack diagram, no matter whether the operation terminated normally or not.

The Exception Extension Word Set

StrongForth supports the ANS Forth Exception Extension word set, which consists of the words ABORT and ABORT". ABORT is defined exactly as suggested in the standard:

: ABORT ( -- )
  -1 THROW ;

If ABORT executes without an exception frame being present, THROW simply executes ERROR, which in turn does nothing but execute the version of ABORT from the Core word set. The version from the Core word set is only used by ERROR. In the strongForth dictionary, it is hidden by the version from the Exception Extension word set.

ABORT" is an immediate word that parses a string at compile time. Its runtime semantics is performed by the internal word (ABORT"):

: (ABORT") ( SINGLE CCONST -> CHARACTER UNSIGNED -- )
  ROT
  IF ERROR-COUNT ! ERROR-ADDR ! -2 THROW
  ELSE DROP DROP
  THEN ;

: ABORT" ( -- )
  POSTPONE " POSTPONE (ABORT") ; IMMEDIATE

" is itself an immediate word that parses a string terminated by " (double-quote), and compiles it as a string literal to be used by (ABORT"). During runtime, (ABORT") checks the value of the parameter of data type SINGLE. If this parameter is non-zero, (ABORT") stores the address and the length of the string in variables ERROR-ADDR and ERROR-COUNT, respectively, and then throws an exception with error code -2. Exception handling is then responsible for processing the string. For example, ERROR displays this string before it executes ABORT. Here's a simple example:

: MAKE-UNSIGNED ( SIGNED -- UNSIGNED )
  DUP 0< ABORT" Negative value!" CAST UNSIGNED ;
 OK
+56 MAKE-UNSIGNED .
56  OK
-56 MAKE-UNSIGNED .
Negative value!
SEE MAKE-UNSIGNED
: MAKE-UNSIGNED ( SIGNED -- UNSIGNED )
DUP 0< " Negative value!" (ABORT") ;  OK

Environment Queries

Although strongForth is definitely not fully ANS Forth compliant, a basic implementation of environment queries with ENVIRONMENT? is included. But again, ENVIRONMENT? is one of those words with ambiguous stack diagrams. ANS Forth specifies

( c-addr u -- false | i*x true )

as the stack diagram of ENVIRONMENT?. In strongForth, the first approximation is

ENVIRONMENT? ( CDATA -> CHARACTER UNSIGNED -- DOUBLE FLAG )

Only one attribute of data type DOUBLE is returned for all environment queries. If necessary, the attribute has to be cast to a more appropriate data type, like in the following example:

PARSE-WORD /PAD ENVIRONMENT? . CAST UNSIGNED .
TRUE 84  OK

Furthermore, strongForth's version of ENVIRONMENT? returns a dummy attribute even if the keyword is unknown. The dummy attribute is always zero:

PARSE-WORD GARBAGE ENVIRONMENT? .S
DOUBLE FLAG  OK
. .
FALSE 0  OK
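
The fixed shape of the result can be expressed as a tiny Python model. This is only a sketch: environment_query and ATTRIBUTES are hypothetical names, and the table contains nothing but the /PAD value shown above.

```python
ATTRIBUTES = {"/PAD": 84}    # the only value taken from the text above


def environment_query(keyword):
    """Always return (attribute, flag); the attribute is 0 if unknown."""
    if keyword in ATTRIBUTES:
        return ATTRIBUTES[keyword], True
    return 0, False


print(environment_query("/PAD"))       # (84, True)
print(environment_query("GARBAGE"))    # (0, False)
```

Returning a dummy attribute for unknown keywords is what keeps the stack diagram unique, in the same way as CATCH always returns a result below the error code.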

In the present version of strongForth, ENVIRONMENT? is static. That is, the attributes are constant, and it is not possible to add new attributes. It is intended to change this behaviour in future versions.

If ENVIRONMENT? is used within a colon definition, you have to consider that the string is expected in the DATA memory area. Strings compiled by " always reside in the CONST memory area, because they are constants. In order to compile environment queries, you have to copy the string to a transient area:

: TEST ( -- )
  ." StrongForth " " LOCALS" ENVIRONMENT?

  ." StrongForth " " LOCALS" ENVIRONMENT? ? undefined word
CCONST -> CHARACTER UNSIGNED
: TEST ( -- )
  ." StrongForth " " LOCALS" TRANSIENT ENVIRONMENT?
  SWAP CAST FLAG AND
  IF ." supports "
  ELSE ." does not support "
  THEN ." the LOCALS word set." ;
 OK
TEST
StrongForth supports the LOCALS word set. OK

Dr. Stephan Becher - November 30th, 2005