Defining words are words that define other words. Here's a simple example in ANS Forth syntax:
: CONST+ ( n -- ) CREATE , DOES> @ + ; OK 3 CONST+ 3+ OK 4 3+ . 7 OK
You probably see the difficulties stongforth's data type system will encounter. In ANS Forth, the runtime code of DOES> leaves the data field address on the stack, which is then used by @ for fetching the constant 3. In strongForth, the data field address needs to be more specific, because @ needs to know the data type of the item stored at the data field address. But even if this problem was solved by a type cast, the next problem is already waiting for us: + expects two operands on the stack.
The solution is obvious. Since the runtime code of the defining word may be considered as a completely separate word, we may provide it with it's own stack diagram:
CONST-SPACE OK : CONST+ ( INTEGER -- ) CREATE , DOES> ( INTEGER CONST -> INTEGER -- 1ST ) @ + ; OK 3 CONST+ 3+ OK 4 3+ . 7 OK
And it works! The only differences to the ANS Forth version are the stack diagrams. There's only one question: Where does the stack diagram of 3+ come from?
LATEST . 3+ ( INTEGER -- 1ST ) OK3+ obviously has the correct stack diagram, because the data field address with data type CONST -> INTEGER is not visible outside 3+. Again, the solution is pretty obvious. When the defining word CONST+ defines 3+, it simply copies the stack diagram of its own runtime part, except for the last input parameter, which is CONST -> INTEGER in this example. Quite simple, isn't it?
You might wonder why we have to switch to the constant data space at the beginning of the example. The reason is that CONST+ uses , for compiling the constant 3 into the data field of 3+. Since , applies to the current memory space, and a word's data field is always in the constant data space, we have to make the constant data space the current memory space before executing CONST+. If the parameters are accidentally compiled into a different memory space, like for example the code space, the data field remains empty and @ inside 3+ will fetch an undefined value.
The implementation of DOES> is not trivial. Generally, we have to distinguish between three different kinds of words:
The compiling part of the defining word executes the immediate word DOES>, which performs 4 tasks:
Here's the definition of DOES>, plus the memory image of the virtual code generated by it:
: DOES> ( COLON-DIAGRAM -- 1ST ) ?COMPILE CODE-HERE -> DEFINITION [LITERAL] POSTPONE (DOES) POSTPONE ; SPACE@ CODE-SPACE :NONAME SWAP , CONST-HERE DOES, SWAP SPACE! ; IMMEDIATE
LIT |
code address parameter |
(DOES) |
(EXIT) |
The value of the code address parameter is calculated from the current code space pointer, because the data structure mentioned in task 4 starts at this location. The first component of this set is a pointer to the noname definition created by :NONAME, which will become the runtime part of the defining word. The second component is the machine code shared by all words defined by the defining word. It is completely compiled by DOES,. DOES, expects the data field address of the runtime part as a parameter, because this address is part of the machine code:
DOES, ( CONST -- )
Here's the memory image of the data structure that is compiled into the code space:
DEFINITION |
machine code |
We need dedicated machine code for the words created by a defining word, because each defining word has its specific runtime part. The machine code initiates the inner interpreter to execute the runtime part. This requires that a pointer to the data field of the runtime part is compiled into the machine code.
Note that the input parameter of DOES> is an item of data type COLON-DIAGRAM that belongs the to compiling part of the defining word. It is consumed by ;. The output parameter of DOES> is also an item of data type COLON-DIAGRAM, but this one is created by :NONAME and thus belongs to the runtime part of the defining word.
Just like any other word in the strongForth dctionary, the memory image of the defined words is distributed over the name space, the constant data space, and the code space. Because the part in the code space is shared among all words defined by one specific defining word, it is compiled only once during compilation of the defining word. While executing the defining word, CREATE generates the name field, the link field, the attribute field, and the token field. The words between CREATE and DOES> compile the data field. What's still missing are the input and output parameter fields, and the code field. These fields are filled by (DOES):
: (DOES) ( CODE -> DEFINITION -- ) SPACE@ NAME-SPACE SWAP DUP 1+ SWAP @ 0 ENCLOSE-PARAMS >R NULL STACK-DIAGRAM SWAP 0 PARAM@, OVER #PARAMS R> PARAM@, NIP DUP END-DIAGRAM END-CODE SPACE! ;
The input parameter of (DOES) is the address of the data structure that has been compiled into the code space. From this structure, (DOES) obtains the definition of the defining word's runtime part, whose parameter list is the model for the defined words. ENCLOSE-PARAMS returns the parameter indexes of the last (compound) input parameter and of the first output parameter. Starting with a null stack diagram, two instances of PARAM@, compile the input parameter list and the output parameter list of the defined word. END-DIAGRAM compiles the attribute field from the stack diagram built by the two instances of PARAM@,. Finally, END-CODE resets the smudge attribute and compiles the code field as a pointer to the common machine code.
PARAM@, is a simple loop that compiles all input and output parameters between two index positions from an existing definition into a new definition. The index positions refer to the basic data types in the parameter list, starting with 0:
: PARAM@, ( DEFINITION STACK-DIAGRAM UNSIGNED 3RD -- 1ST 2ND ) ?DO OVER I PARAM@ PARAM, LOOP ;
The index positions are partly provided by ENCLOSE-PARAMS:
: ENCLOSE-PARAMS ( DEFINITION UNSIGNED -- 1ST 2ND 2ND ) BEGIN OVER OVER +PARAM OVER #PARAMS OVER > WHILE OVER OVER PARAM@ DT-INPUT ATTRIBUTE? WHILE NIP NIP REPEAT THEN NIP ;
Starting at a given parameter index, ENCLOSE-PARAMS scans the compound data types in the parameter list of a definition, until the fist output parameter or the end of the parameter list is encountered. In general, the parameter list of a definition consist of n input parameters and m output parameters:
Data Type Index | Parameter |
---|---|
0 | Input Parameter 1 |
si(1) | Input Parameter 2 |
... |
... |
si(1) + ... + si(n-2) | Input Parameter n-1 |
si(1) + ... + si(n-1) | Input Parameter n |
si(1) + ... + si(n) | Output Parameter 1 |
si(1) + ... + si(n) + so(1) |
Output Parameter 2 |
... |
... |
si(1) + ... + si(n) + so(1) + ... + so(m-1) |
Output Parameter m |
si(1) + ... + si(n) + so(1) + ... + so(m) |
si(k) and so(k) denote the number of basic data types input parameter k or output parameter k consist of. For example, the index position of the first output parameter is equal to the total number of basic data types in the input parameter list. ENCLOSE-PARAMS steps through the parameter list until reaching the first output parameter or, if the definition has no output parameters, until the end of the parameter list. It then returns the index of the last input parameter (si(1) + ... + si(n-1)), and the index of the position after the last input parameter (si(1) + ... + si(n)). +PARAM is used for moving from one parameter to the next, skipping basic data types until the end of a compound data type:
: +PARAM ( DEFINITION UNSIGNED -- 1ST 2ND ) BEGIN OVER OVER PARAM@ DT-PREFIX ATTRIBUTE? WHILE 1+ REPEAT 1+ ;
The example of CONST+ at the beginning of this chapter is actually a very simple defining word. Another example is PROCREATES, which was already presented in chapter 7. A more unusual application of CREATE ... DOES> is the definition of MARKER:
: MARKER ( -- ) NULL DATA -> SINGLE DUP 8 + SWAP DO I @ CONST, LOOP CREATE DOES> ( CONST -> SINGLE -- ) 9 - NULL DATA -> SINGLE 8 MOVE ;
The compiling part of MARKER copies the first 8 cells of the CONST memory area. These 8 cells contain strongForth's memory space pointers, the current memory space, and the content of LATEST. The runtime part simply restores these pointers and system variables. As a result, all memory allocations, including those for new definitions, are being undone.
What's different with respect to other defining words is the fact that MARKER compiles the data field before creating a new word. Strictly speaking, the 8 cells are located outside of the new word instead of within its data field. The data field itself is empty. The reason for doing it this way is that the pointers and system variables shall be saved before they are modified by the creation of the new word. If the new word is being executed later, it restores the pointers and system variables to the state before its own creation, i. e., the new word deletes its own definition. Even the block of 8 cells in the CONST memory area is de-allocated, because the CONST memory space pointer is contained in the first one of the 8 cells.
Dr. Stephan Becher - November 18th, 2005