Some Forth header structure ideas

From anton Tue Sep 14 11:24:02 2004
X-newsreader: xrn 9.03-beta-14
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Path: mips.complang.tuwien.ac.at!anton
Newsgroups: comp.lang.forth
Subject: header structure for intelligent COMPILE, and compilation token
Distribution: 
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Keywords: 
Date: Tue, 14 Sep 2004 07:42:54 GMT
Message-ID: <2004Sep14.094254@mips.complang.tuwien.ac.at>

The recent threads "Dual to triple XTs" and "Refactoring the Forth
interpreter" have made me think about how the header structure of a
system with an intelligent COMPILE, and a compilation token would best
look like.  Here's the result:

A) Nameless words and xt structure (for the intelligent COMPILE,):

This has already been mostly discussed before, but for completeness I
cover it here, too.  The data structure of a nameless word would look
like:

comp field
code field             <-- xt is the code field address (cfa)
parameter field (body)

The xt points to the cfa and the fields are arranged to minimize
breakage of code that assumes traditional implementation practices.

The comp field is one cell and contains the xt of a word that COMPILE,
uses to compile the word.  The definition of COMPILE, is:

: >comp ( cfa -- compfa )
  1 cells - ;

: compile, ( xt -- )
  dup >comp @ execute ;

In a traditional indirect threaded implementation, one can simply put
the xt of "," in every comp field.  But the point of this field is
that one can use different comp field contents.  E.g., for a constant,
one might want to put the xt of the following word in the comp field:

: const-compile, ( xt -- )
  execute            \ get the value of the constant
  POSTPONE literal ; \ and compile it

This compiles the value of the constant instead of its xt.

The code field and parameter fields can be organized in whatever way
is convenient (direct or indirect threaded, or something else), just
as in a system without intelligent COMPILE,.

B) Named words and compilation tokens:

This stuff has often been called dual-xt (by me and others), but in
practice it turns out to be unpractical to represent the compilation
semantics with a single-cell xt.  Instead, it's more practical to
represent the compilation semantics with a double-cell Compilation
Token (ct); so a more accurate name would be "xt/ct words".

Anyway, the internal structure of a ct is "xt1 xt2", and when you EXECUTE
the ct, the compilation semantics of the word is performed.

The data structure for a named word would look like:

link field
name field
xt2 field
comp field
code field             <-- cfa (xt1)
parameter field (body)

The new field here is the xt2 field (1 cell).  It contains the xt
that consitutes the second cell of the compilation token.  The cfa of
the word consitutes the first cell of the compilation token.

For a normal word (default compilation semantics) the xt2 field
contains the xt of COMPILE,.  So, the compilation token of, e.g., +
would look like

' + ' COMPILE,

and EXECUTEing it would result in compiling +.

For an immediate word the xt2 field contains the xt of EXECUTE.  So,
the compilation token of, e.g., "\" would look like:

' \ ' EXECUTE

and EXECUTEing would result in executing "\", as it should.

For a word with other compilation semantics (e.g., S"), the xt2 field
is for a word that performs these compilation semantics (what this
word does with the xt1 part is up to the word; it can just drop it, or
it can use it for, e.g., accessing the body.  E.g., for S" the xt2
would contain the xt of the word:

: drop-s"-comp ( compilation: xt1 "ccc<">" -- )
               ( run-time: -- c-addr u )
  drop
  [char] " parse
  postpone sliteral ;

The ordinary part of the word would be for implementing the
interpretation semantics.  A syntax for that might look like:

: s" ( interpretation: "ccc<">" -- c-addr u )
  [char] " parse copy-to-buffer ;
' drop-s"-comp compilation!

Words without interpretation semantics would look like this:

: if-comp drop ... compilation semantics ... ;
: if -14 throw ; \ Interpreting a compile-only word
' if-comp compilation!

The INTERPRET/COMPILE: syntax would be unnatural for this
implementation, but it is still implementable.

The [COMPILE] problem discussed earlier (how to know if a word has
default compilation semantics) would be dealt with by fiat: Every word
with default compilation semantics must use ' COMPILE, as its xt2;
then [COMPILE] can decide what to do by checking xt2.  For doing
optimizations, just use COMPILE, in xt2 and do the optimization
through the intelligent COMPILE, (i.e., through the comp field).

The reason that a double-cell ct is more practical, is that the vast
majority of words is normal or immediate, so it is not necessary to
define separate words for their compilation semantics.  In contrast, a
single-cell ct would have required a separate compilation semantics
definition for every named word (and to avoid endless recursion, these
words should be nameless).  


These data structures are not as symmetrical as the concepts of the
separate-semantics words and the intelligent COMPILE, would suggest
(i.e., the first idea was to have 1 named word -> 2 xts, and 1 xt -> 2
code fields), but that mirrors the asymetry of usage.

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
EuroForth 2004: http://www.complang.tuwien.ac.at/anton/euroforth2004/
  Deadline (refereed): August 28, 2004; Conference: November 19-21, 2004


From anton Wed Sep 15 11:38:47 2004
X-newsreader: xrn 9.03-beta-14
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Subject: Re: header structure for intelligent COMPILE, and compilation token
Path: mips.complang.tuwien.ac.at!anton
Newsgroups: comp.lang.forth
Distribution: 
References: <2004Sep14.094254@mips.complang.tuwien.ac.at> <b57b10b6.0409141358.480f6b03@posting.google.com>
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Keywords: 
Date: Wed, 15 Sep 2004 08:30:43 GMT
Message-ID: <2004Sep15.103043@mips.complang.tuwien.ac.at>

alex_mcd@btopenworld.com (Alex McDonald) writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote in message news:<2004Sep14.094254@mips.complang.tuwien.ac.at>...
>
>===snipped
>
>> 
>> B) Named words and compilation tokens:
>> 
>> This stuff has often been called dual-xt (by me and others), but in
>> practice it turns out to be unpractical to represent the compilation
>> semantics with a single-cell xt.  Instead, it's more practical to
>> represent the compilation semantics with a double-cell Compilation
>> Token (ct); so a more accurate name would be "xt/ct words".
>> 
>> Anyway, the internal structure of a ct is "xt1 xt2", and when you EXECUTE
>> the ct, the compilation semantics of the word is performed.
>> 
>> The data structure for a named word would look like:
>> 
>> link field
>> name field
>> xt2 field
>> comp field
>> code field             <-- cfa (xt1)
>> parameter field (body)
>> 
>> The new field here is the xt2 field (1 cell).  It contains the xt
>> that consitutes the second cell of the compilation token.  The cfa of
>> the word consitutes the first cell of the compilation token.
>
>I think this would need to be modified for systems like Win32Forth
>where the data structure is split accross two distinct areas.

Certainly, this was just an example; you can adapt it to your system
in any way that is convenient.

>I've not checked, but 
>
>link field
>name field
>xt2 field
>comp field
>ptr-to-xt field
>
>code field  <-- cfa (xt1)
>parameter field (body)
>
>might still work (with an overhead of 2 cells per named word). I need
>to check.

You should put the comp field with the code field (if you want to have
a comp field); it is needed for unnamed words, too.

This gives us:

link field
name field
xt2 field
xt1 field

comp field
code field  <-- cfa (xt1)
parameter field (body)

Swapping the xt1 and xt2 fields may increase the backwards
compatibility; OTOH, the present arrangement allows you to fetch the
compilation token with 2@.

Another alternative would be to arrange the fields like this:

link field
name field
xt1 field

xt2 field
comp field
code field  <-- cfa (xt1)
parameter field (body)

The difference would be in the possibilities for aliases (this would
only allow total aliases (i.e., where both interpretation and
compilation semantics are the same as for the original word), and in
the space consumption of aliases.

I would prefer the other arrangement (xt2 with the name), though: It
gives more flexibility for aliasing, and the space separation follows
the functionality separation between named/unnamed words.

>> Words without interpretation semantics would look like this:
>> 
>> : if-comp drop ... compilation semantics ... ;
>
>No IMMEDIATE required? I must admit to seeing it as a good indicator
>of compilation semantics.

IF-COMP is a normal word (not a compile-only word).  It's used as a
helper for defining IF.  Since it is used like ' IF-COMP, its
immediacy does not matter.

>> : if -14 throw ; \ Interpreting a compile-only word
>> ' if-comp compilation!
>
>In trying out your previous "dual-xt" I much preferred a parsing word
>for your compilation! (which presumably uses some side-effect
>last-whatever variable from : ) as in
>
>: compiles ' >xt2 ! ;
>
>Having them separate and much later in the source made it much easier
>to control when the xt2 is used. However, given the split headers and
>this method, it's problematic finding xt2 given the xt.

Well, with separate names you should use a name token (instead of the
xt) for identifying named words anyway.  Even with in-line names,
using a name token for named words is a good idea (e.g., getting from
the code field to the link field/name field is not guaranteed to work
in Gforth, even if the word has an inline name (and many words
don't)).

So, sure, you could do things like:

: if -14 throw ;
...
' if-comp s" if" find-name name>xt2 !

Then your COMPILES would be defined as:

: compiles ( xt "name" -- )
  parse-word find-name name>xt2 ! ;

And other syntaxes are possible as well, e.g.:

: if ( ... -- ... )
  -14 throw
compilation>
  drop ... compilation semantics ... ;

or

:compilation if
  drop ... compilation semantics ... ;

:compilation S"
  ... compilation semantics
interpretation>
  ... interpretation semantics ;

>> The [COMPILE] problem discussed earlier (how to know if a word has
>> default compilation semantics) would be dealt with by fiat: Every word
>> with default compilation semantics must use ' COMPILE, as its xt2;
>> then [COMPILE] can decide what to do by checking xt2.  For doing
>> optimizations, just use COMPILE, in xt2 and do the optimization
>> through the intelligent COMPILE, (i.e., through the comp field).
>
>You'll need to expand on this for me. I find [COMPILE] one of the most
>confusing words in the Forth lexicon.

[COMPILE] was originally designed for a classical system that has just
an immediate flag.  It's supposed to compile a word as if it was not
immediate (whether it was immediate or not).  In the ANS Forth
standard they described this concept as:

|6.2.2530 [COMPILE]
|bracket-compile CORE EXT
|
|	 Intrepretation: Interpretation semantics for this word are undefined.
|
|	 Compilation: ( "<spaces>name" -- )
|
|Skip leading space delimiters. Parse name delimited by a space. Find
|name. If name has other than default compilation semantics, append
|them to the current definition; otherwise append the execution
|semantics of name. An ambiguous condition exists if name is not found.

For words with default compilation semantics, and for immediate words,
this does what it has always done.  For other words, e.g., IF, it
basically treats them as immediate, which also fits with the
traditional behaviour (because traditionally, these words were
immediate).

The problem here is: If you have user-defined "other" words (e.g.,
with the implementation above, words that have neither EXECUTE nor
"COMPILE," in the xt2 field), how do you know whether words have the
default compilation semantics?

If the xt2 field of a word X contains the xt of "COMPILE,", you are
certain that X has default compilation semantics.  If it contains some
other xt, that xt might still implement the default compilation
semantics (this would typically happen if someone tried to implement
an optimization through a custom xt2 field).

My solution is to simply forbid putting xts (other than the xt of
"COMPILE,") in the xt2 field that implement the default compilation
semantics; if someone wants to do an optimization, they should use the
comp field (with the added benefit that the optimization will be
applied in more cases).

With this requirement in place, [COMPILE] could be implemented like
this:

: postpone, ( ct -- )
  swap postpone literal compile, ;

: [compile] ( Compilation: "name" -- )
  comp' ( xt1 xt2 )
  dup ['] compile, = if
    execute   \ append (compile,) the execution semantics
  else
    postpone, \ append (postpone,) the compilation semantics
  endif ;

One potential problem I see with this is if EXIT is not implemented as
a normal word on your system (it is not on Gforth; even a traditional
immediacy-based system and a traditional [COMPILE] implementation
would do a non-standard thing in that case).  You have to special-case
EXIT (and other such words, if they exist) in your [COMPILE] then.
This is only a minor issue, because nobody does a [COMPILE] EXIT, and
if they do, they probably expect the non-standard behaviour.

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
EuroForth 2004: http://www.complang.tuwien.ac.at/anton/euroforth2004/
  Deadline (refereed): August 28, 2004; Conference: November 19-21, 2004


From anton Thu Sep 16 18:18:20 2004
X-newsreader: xrn 9.03-beta-14
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Subject: Re: header structure for intelligent COMPILE, and compilation token
Path: mips.complang.tuwien.ac.at!anton
Newsgroups: comp.lang.forth
Distribution: 
References: <2004Sep14.094254@mips.complang.tuwien.ac.at> <b5ecdf44.0409151040.3785df77@posting.google.com>
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Keywords: 
Date: Thu, 16 Sep 2004 13:00:24 GMT
Message-ID: <2004Sep16.150024@mips.complang.tuwien.ac.at>

m_l_g3@yahoo.com (Michael L Gassanenko) writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote in message news:<2004Sep14.094254@mips.complang.tuwien.ac.at>...
>> The data structure for a named word would look like:
>> 
>> link field
>> name field
>> xt2 field
>> comp field
>> code field             <-- cfa (xt1)
>> parameter field (body)
>>
>
>I do not think both xt2 and comp fields are needed.
>Even if you care about optimizations.

Assuming that you want the xt2 field for implementing S" etc., here
are some reasons why you may still want the comp field:

- If your system does not have a simple, uniform way to perform
  "COMPILE," for arbitrary words.  E.g., gforth-0.6 requires
  primitive-centric threaded code; simply using "," as "COMPILE," does
  not work (except for gforth-itc).  This leads to the current
  COMPILE, looking like:

  : compile-to-prims, ( xt -- )
    \G compile xt to use primitives (and their peephole optimization)
    \G instead of ","-ing the xt.
    \ !! all POSTPONEs here postpone primitives; this can be optimized
    dup >does-code if
	POSTPONE does-exec , EXIT
	\ dup >body POSTPONE literal POSTPONE call >does-code , EXIT
    then
    dup >code-address CASE
	docon:   OF >body POSTPONE lit@ , EXIT ENDOF
	\ docon:   OF >body POSTPONE literal POSTPONE @ EXIT ENDOF
	\ docon is also used by VALUEs, so don't @ at compile time
	docol:   OF >body POSTPONE call , EXIT ENDOF
	dovar:   OF >body POSTPONE literal EXIT ENDOF
	douser:  OF >body @ POSTPONE useraddr , EXIT ENDOF
	dodefer: OF >body POSTPONE lit-perform , EXIT ENDOF
	dofield: OF >body @ POSTPONE lit+ , EXIT ENDOF
	\ dofield: OF >body @ POSTPONE literal POSTPONE + EXIT ENDOF
	\ code words and ;code-defined words (code words could be optimized):
	dup in-dictionary? IF drop POSTPONE literal POSTPONE execute EXIT THEN
    ENDCASE
    peephole-compile, ;

  Certainly having a comp field and the following definition of
  COMPILE, would be much better:

  : compile, ( xt -- )
    dup >comp @ execute ;

- If you want to deal with optimization through the xt2 field, you
  have to either abandon [COMPILE], or find a way to make it behave
  correctly; the latter option may be more complicated than having a
  comp field.

- If you optimize through the comp field, "COMPILE," produces the same
  code as text interpretation.  When optimizing through xt2, the code
  will be different (the COMPILE,d code will not be optimized).  This
  may surprise users, both wrt performance characteristics, and if
  they come upon a compiler bug.  So you may want to optimize through
  the comp field to avoid such surprises.

  In addition, using "COMPILE," is often cleaner than alternative
  approaches (e.g., EVALUATE).  You should not punish the users for
  writing clean code by making that code slower.

>The problem with this proposal is that it obsoletes both FIND and
>other words like S$SEARCH-ORDER ( addr len -- xt flag flag )
>that *must return only one xt*.
>Therefore, FIND will not be able to return a single xt for 
>words with untrivial compilation/execution semantics pair.
>The xt on top requires a parameter and the other xt 
>is interpretation-only.

Good point (I'll ignore S$SEARCH-ORDER, as it is no standard word, and
I have no idea what it's supposed to do).  I see the following
alternative approaches for this problem:

- Just let FIND return the xt1 or somesuch.  Sure, text interpreters
  written with FIND will not work as expected, but I think that's a
  rare usage of Forth anyway, and I don't think that the standard
  guarantees that such an interpreter should work, although that was
  probably intended (I asked RFI 8 about the meaning of FIND, but have
  not yet received an answer).

  This is probably the most sensible approach as the benefits of the
  other approaches don't pay for the additional complexity.

- Restrict xt2 to contain words of a specific structure, e.g.

  :noname drop <word> ;

  from which you can easily extract a single xt (that of <word>) that
  performs the compilation semantics.  This restriction could be
  implemented by offering a syntax (e.g., INTERPRET/COMPILE:) that
  takes such a single xt and builds the xt2 out of it.

- It is possible to convert a double-cell ct into a single-cell xt, as
  follows:

  : ct>xt ( xt1 xt2 -- xt )
    >r >r :noname r> POSTPONE literal r> compile, POSTPONE ; ;

  I think that was the idea you had here:

>>tmpcomp 2>R :NONAME [ 2R> ] LITERAL EXECUTE ;  tmpcomp> 

  but I think you got the execution timing wrong.

  Anyway, how an implementation might use that: Whenever we create a
  word with a non-trivial ct, we also create the single-xt ct for it
  with ct>xt, and store it in a lookup table that translates cts to
  their corresponding xts.  When FIND is called in compile state, and
  sees a non-trivial ct, it looks up the corresponding xt and returns
  that, along with the "word-is-immediate" flag value.


- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
EuroForth 2004: http://www.complang.tuwien.ac.at/anton/euroforth2004/
  Deadline (refereed): August 28, 2004; Conference: November 19-21, 2004
Anton Ertl