From anton Mon Dec 31 18:11:13 2018
X-newsreader: xrn 10.00-beta-3
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Path: mips.complang.tuwien.ac.at!anton
Newsgroups: comp.lang.forth
Subject: FORWARD in Gforth
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Date: Mon, 31 Dec 2018 15:17:43 GMT
Message-ID: <2018Dec31.161743@mips.complang.tuwien.ac.at>

This posting discusses the implementation of FORWARD in Gforth, which
may be an unnecessary feature, but the implementation is still
interesting, as is shown by the fact that we came up with four
different implementations in the span of a few days.

Why FORWARD may be unnecessary:

We have DEFERred words, which are the good-enough solution for
implementing indirect recursion (which is the point of having
FORWARD).  DEFER causes a little overhead, which IMO is not enough to
justify FORWARD, but others may disagree.  In particular, the
instruction sequence executed in gforth-fast on AMD64 is

with DEFER                           with FORWARD:
lit-perform: mov     rax,[rbx]       call: mov     rax,rbx      
             add     rbx,$08               mov     rbx,[rbx]    
             mov     rcx,[rax]             mov     rdx,r13      
             mov     rdx,[rcx]             add     rax,$08      
             mov     rax,rdx               lea     r13,-$08[r13]
             jmp     eax                   mov     -$08[rdx],rax
docol:       mov     rax,r13               add     rbx,$08      
             mov     rdx,rbx               mov     rdx,-$08[rbx]
             lea     r13,-$08[r13]         mov     rax,rdx      
             lea     rbx,$18[rcx]          jmp     eax          
             mov     -$08[rax],rdx
             mov     rdx,-$08[rbx]
             mov     rax,rdx
             jmp     eax

The other issue is the syntax:

defer foo
: bar ... foo ... ;
:noname ... bar ... bar ... ; is foo

There is at least one complaint about this syntax:
<http://forth-standard.org/standard/core/DEFER#contribution-22>


What if we add FORWARD anyway?

The FORWARD syntax we use is:

FORWARD foo
: bar ... foo ... ;
: foo ... bar ... bar ... ;

I.e., the definition of foo automatically resolves the earlier forward
definition (in the same wordlist). Let's look at an implementation
that implements just this syntax, but otherwise is the same as the
DEFER solution:

s" unresolved forward definition" exception constant unresolved-forward
s" forward must be resolved with :" exception constant forward-needs-:

: unresolved-forward-error ( -- )
    unresolved-forward throw ;

: forward ( "name" -- )
    \g Defines a forward reference to a colon definition.  Defining a
    \g colon definition with the same name in the same wordlist
    \g resolves the forward references.  Use @code{.unresolved} to
    \g check whether any forwards are unresolved.
    defer ['] unresolved-forward-error lastxt defer! ;

: is-forward? ( xt -- f )
    \ f is true if xt is an unresolved forward definition
    dup >code-address dodefer: = if
        defer@ ['] unresolved-forward-error = exit then
    drop false ;

: auto-resolve ( addr u wid -- )
    \G auto-resolve the forward reference in check-shadow
    dup 2over rot find-name-in dup if
	dup is-forward? if
	    latestxt >code-address docol: <> forward-needs-: and throw
            latestxt swap defer! 2drop drop exit then then
    drop defers check-shadow ;

' auto-resolve is check-shadow

EXCEPTION ( c-addr u -- n ) produces a new exception number, that,
when thrown, reports the string as error.

FIND-NAME-IN ( c-addr u wid -- nt|0 ) searches for the name c-addr u
in the wordlist wid.  This produces an nt, but it is used as xt here
(this is possible in Gforth, as long as no ALIASes or somesuch are
involved).

>CODE-ADDRESS ( xt -- addr ) produces essentially a token that
identifies how the xt was defined.  E.g., if it was defined with :, it
produces the value produced by DOCOL: ( -- addr ).  While it is not
necessary to resolve the forward word with a colon definition in this
implementation, this check prepares the implementation for the
stricter requirements of the following versions:

LATESTXT produces the xt of the latest definition (in this case, the
definition that resolves the forward word).

CHECK-SHADOW ( addr u wid -- ) is a deferred word for checking whether
a new definition shadows an old one.  It is called whenever a word is
REVEALed.  DEFERS CHECK-SHADOW produces a call to the previous
behaviour of CHECK-SHADOW.  AUTO-RESOLVE is then plugged into
CHECK-SHADOW, such that forward words are resolved (by changing the
deferred word) if they would be shadowed, but for non-forward words,
the normal CHECK-SHADOW behaviour is performed.

Let's add the code for eliminating the defer overhead in most cases:

: unfixed-forward ( ... -- ... )
    \ the compiled code for a forward child branches to this word,
    \ which patches the call to the forward word; if the word has not
    \ been resolved, this produces an error after patching the call to
    \ report an error.
    r@ cell- {: target :}
    target @ cell- @ dup >body target !
    execute-exit ;

: forward ( "name" -- )
    \g Defines a forward reference to a colon definition.  Defining a
    \g colon definition with the same name in the same wordlist
    \g resolves the forward references.  Use @code{.unresolved} to
    \g check whether any forwards are unresolved.
    defer ['] unresolved-forward-error lastxt defer!
    ['] branch peephole-compile, ['] unfixed-forward >body ,
    [: ['] call peephole-compile, >body cell+ , ;] set-optimizer ;

This makes use of Gforth's threaded-code basis.  A forward word FOO
now looks as follows:

        FOO header
  xt -> dodefer
        unused
body -> xt (of unresolved-forward-error or of the resolution)
  tc -> branch (threaded code fragment)
        unfixed-forward (body)

The middle line of FORWARD produces the branch and its target.
PEEPHOLE-COMPILE, ( xt -- ) is the code that COMPILE, calls for
primitives.

The last line of FORWARD means that, when FOO is COMPILE,d, this
compilation produces the following code:

call
foo-tc

SET-OPTIMIZER ( xt1 -- ) changes the last defined word (our forward
word in this case) to call xt1 ( xt2 -- ) when this word is COMPILE,d,
with xt2 being the xt passed to COMPILE,.  In this case we compute
foo-tc from it with >BODY CELL+.

When, at a later time (and usually after the forward is resolved), the
call is run, it calls UNFIXED-FORWARD, which uses the return address
to find the call, and the forward word FOO, and the xt stored there.
It patches the call to directly call the code of the xt.  And it also
runs the xt with EXECUTE-EXIT, which does what it says, except that it
deals with the return address before performing the EXECUTE part.

In order to find unresolved words, we also have:

: .unresolved ( -- )
    \G print all unresolved forward references
    [: [:   replace-sourceview >r dup name>view @ to replace-sourceview
	    dup is-forward? [: dup .name ." is unresolved" cr ;] ?warning
	    r> to replace-sourceview
            drop true ;] swap traverse-wordlist ;] map-vocs ;

I won't go into the details of this word, but note the use of
quotations and TRAVERSE-WORDLIST.  These are useful features.


Why did we need three earlier versions?

The first version
<http://git.savannah.gnu.org/cgit/gforth.git/tree/forward.fs?id=c57e74c24306e92854a282729d8881eccbefe46e>
defined forward words as CREATEd words with a special SET-OPTIMIZER
behaviour.  This is not a good idea, because what COMPILE, compiled
was different from the behaviour of the word when EXECUTEd.  And
consequently, this did not work for gforth-itc with "," as COMPILE,
(which ignores what SET-OPTIMIZER says).

Another difference in the first version was that the calls were
eagerly resolved, not lazily as above: The calls were organized as a
linked list, and once the target became known, they were all patched.

The second version
<http://git.savannah.gnu.org/cgit/gforth.git/tree/forward.fs> defines
FOO as an immediate create...does> child, again with eager resolution.
This worked with the intelligent compile, as well as with ",", but did
not work when the user did something like "['] foo compile," (because
that sequence ignores the immediacy).

The third version
<http://git.savannah.gnu.org/cgit/gforth.git/tree/forward1.fs> defines
FOO as a colon definition followed by a cell that contains 0 or the
resolution.  The colon definition calls (a differently-defined)
UNFIXED-FORWARD, which performs lazy patching.  The downside of this
approach was that every non-compiled use (e.g., through EXECUTE or a
DEFERred word) would check every time whether it can patch the call,
and find that it cannot; but next time it would incur that overhead
again.

So we decided to implement the fourth version described above, i.e.,
with DEFER, SET-OPTIMIZER and lazy patching.  You can find it in
<http://git.savannah.gnu.org/cgit/gforth.git/tree/forward2.fs>.  Note
that this is a proper use of SET-OPTIMIZER (unlike in the first
version): You can leave the SET-OPTIMIZER part away (e.g., by using
"," as COMPILE,), and it will still work as intended (and the overhead
is not too bad, as discussed in the beginning).

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: http://www.forth200x.org/forth200x.html
   EuroForth 2018: http://www.euroforth.org/ef18/