@chapter{Foreign Language Interface}

Nothing described in this chapter has been implemented. The chapter just
contains thoughts about how the interface to other programming languages
might look and how it might be implemented (in case you want to
implement it).

Foreign language interfaces make libraries written for other programming
languages available to Forth programs and are therefore extremely useful
and important. Most important are the interfaces to C (many general and
special purpose libraries: windowing, networking, databases, system
calls for many OSs, interfaces to other languages etc.) and FORTRAN
(sophisticated numerical libraries). C++ seems to be gaining importance,
too (e.g., InterViews).

While some general principles can be applied to all language interfaces,
the concrete interface depends on the concrete programming language. In
the following I will explain my ideas for the interface to C.

While adding primitives is relatively convenient in gforth, this
approach is not suited for providing the C language interface, for
several reasons: As convenient as it may be, adding hundreds of
primitives for library functions is still very tedious; and every time a
primitive is added, the whole engine has to be compiled again, which
costs much time and needs much memory (indeed, with thousands of
primitives to compile, @code{gcc} would run out of space sooner or
later). Also, to add primitives, gforth has to be exited, relinked, and
invoked again; this loses the advantage of interactivity and gives the
user the old unresponsive compile-cycle feeling.

Therefore C functions should be made available in a less tedious and
more responsive manner and should be called using a general C calling
primitive (see the implementation section).

The C library should be dynamically linked on systems that support
dynamic linking (with a fallback to static linking on systems that
don't). A special tool should translate C header files containing ANSI C
prototypes and type declarations into corresponding Forth definitions.
After linking the library and loading these definitions the user can
call the C functions as if they were Forth words and can access C
structures.

Calling a C function from Forth should be as similar to calling a Forth
word as possible. E.g., we would like to call the C function

@example
double sin(double x);
@end example

like a Forth word

@format@code{sin}       @i{r1 -- r2}@end format

I.e., the parameters for the function call are taken from the data and
FP stack and the return value is put on the data or the FP stack. In
order to make it easy to use the original documentation, the parameters
should be passed in the same order as in C, i.e., the last parameter
should be on the top of the appropriate stack. E.g., the call

@example
memcmp(s1, s2, n);
@end example

would be written in Forth as

@example
s1 s2 n memcmp
@end example

(assuming that @code{s1}, @code{s2} and @code{n} are locals or values.)

The name of the word should be exactly as in C. Conflicts with Forth
names can be avoided by putting the C stuff in a separate word list.

There is just one problem: C functions expect and return @code{int}s,
@code{char *}s etc., so we have to convert the parameters to these
types. To make the interface as convenient as possible, this conversion
should be done automatically wherever it can be; in some cases,
however, it cannot.

Two cases have to be distinguished: Data that is passed directly as a
parameter and data that is passed via memory (e.g., by passing a pointer
to the memory area or by storing data in a variable before the call).

@section{Data in memory}

If data is passed via memory, there is not much choice: We have to
create or access the data as C would. For this purpose we have the
following words:

@example
c-char-!
c-unsigned-char-!
c-short-!
c-unsigned-short-!
c-int-!
c-unsigned-int-!
c-long-!
c-unsigned-long-!
c-long-long-!
c-unsigned-long-long-!
c-ptr-!
c-float-!
c-double-!
c-long-double-!
c-char-@
c-unsigned-char-@
c-short-@
c-unsigned-short-@
c-int-@
c-unsigned-int-@
c-long-@
c-unsigned-long-@
c-long-long-@
c-unsigned-long-long-@
c-ptr-@
c-float-@
c-double-@
c-long-double-@
@end example

While @code{gforth} guarantees that @code{c-double-!} does the same as
@code{f!}, we add it in case other systems want to have the same
interface.
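
What such words would have to do can be sketched in C. The following is only an illustration under stated assumptions: the names @code{Cell}, @code{c_short_store}, @code{c_short_fetch} and @code{c_unsigned_short_fetch} are made up for this sketch, and a cell is assumed to be as wide as a C pointer. The store narrows a cell to the C type; the fetches widen the C type back to a cell, with sign extension for the signed type and zero extension for the unsigned one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef intptr_t  Cell;   /* assumption: a cell is as wide as a C pointer */
typedef uintptr_t UCell;

/* c-short-! ( n addr -- ): narrow a cell to a C short and store it. */
static void c_short_store(Cell n, void *addr)
{
    short v = (short)n;           /* value is assumed to fit into a short */
    assert(addr != NULL);
    memcpy(addr, &v, sizeof v);   /* memcpy makes no alignment assumptions */
}

/* c-short-@ ( addr -- n ): fetch a C short and sign-extend it to a cell. */
static Cell c_short_fetch(const void *addr)
{
    short v;
    memcpy(&v, addr, sizeof v);
    return (Cell)v;
}

/* c-unsigned-short-@ ( addr -- u ): fetch and zero-extend to a cell. */
static UCell c_unsigned_short_fetch(const void *addr)
{
    unsigned short v;
    memcpy(&v, addr, sizeof v);
    return (UCell)v;
}
```

The other integer words would follow the same pattern with a different C type; the floating point words would move values between memory and the FP stack instead.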

For structures and unions the C header file processor creates access
words. E.g., for the structure

@example
struct foo {
   int flip;
   double flop;
};
@end example

it creates the words

@example
foo-flip ( addr1 -- addr2 )
foo-flip@ ( addr -- n )
foo-flip! ( n addr -- )
foo-flop ( addr1 -- addr2 )
foo-flop@ ( addr -- r )
foo-flop! ( r addr -- )
size-foo ( -- n )
align-foo ( addr1 -- addr2 )
@end example

For members that are structures or unions themselves, the @code{@} and
@code{!} words are not produced; for bit-fields only the @code{@} and
@code{!} words are produced.
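
The information needed for these access words is available to any C compiler. The following sketch shows, for the @code{struct foo} above, what the address and fetch words amount to in C and how a header file processor might print Forth definitions. The function names and the exact shape of the emitted Forth text are assumptions made for this illustration, not a description of an existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct foo {
    int flip;
    double flop;
};

/* foo-flop ( addr1 -- addr2 ): address of the member inside the struct. */
static void *foo_flop(void *addr)
{
    return (char *)addr + offsetof(struct foo, flop);
}

/* foo-flop@ ( addr -- r ): fetch the member as a C double. */
static double foo_flop_fetch(void *addr)
{
    double r;
    memcpy(&r, foo_flop(addr), sizeof r);
    return r;
}

/* Write hypothetical Forth definitions for struct foo into buf.
   Returns the number of characters written (excluding the final NUL). */
static int emit_foo_words(char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     ": foo-flip ( addr1 -- addr2 ) %zu + ;\n"
                     ": foo-flop ( addr1 -- addr2 ) %zu + ;\n"
                     "%zu constant size-foo\n",
                     offsetof(struct foo, flip),
                     offsetof(struct foo, flop),
                     sizeof(struct foo));
    assert(n >= 0 && (size_t)n < len);   /* output must not be truncated */
    return n;
}
```

The offsets and the size come from the compiler for the target machine, so the emitted definitions are machine-specific even though the header file is not.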

(What about typedefs and anonymous structs?)

C variables behave just like their Forth counterparts, i.e., they
produce an address.

@section{Parameters/Data on the stacks}

For data passed over the stacks we need automatic conversion, as the
stacks can only contain (signed or unsigned) cells, double cells, and
floats. Since we want to call given C routines (in contrast to calling C
routines written for a given piece of Forth code), the Forth type
corresponding to a C type should contain all the values of the C type. A
Forth cell should have the same size as a C pointer, so these types
correspond naturally. @code{char}s, @code{short}s, and @code{int}s have
at most the same size as a pointer in any C implementation I have heard
of, and this will stay so in the foreseeable future. Therefore they can be
represented by a cell. C's @code{long} types seem to have been inspired
by the need for big counters on 16-bit machines. On 32-bit and larger
machines, they need at most as much space as a pointer. I don't think
Forth on small machines (embedded controllers) will want to call C
functions (comments?), so a cell should also suffice for
@code{long}s. @code{long long}s are defined to be twice as long as longs
(@pxref{Long Long, , Double-Word Integers, gcc.info, GNU C Manual}), so
they correspond to double cells. The unsigned C types correspond to
unsigned cells and double cells.

Since Forth has only one floating point type (on the stack), it
corresponds to all of C's floating point types: @code{float},
@code{double} and @code{long double}.
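
The size assumptions behind this correspondence could be checked when the system is configured. The following is a minimal sketch of such a check; @code{Cell}, @code{stack_types_fit} and @code{require_stack_types} are names invented for this illustration, and a cell is assumed to be an integer as wide as a C pointer.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Cell;   /* assumption: a cell is as wide as a C pointer */

/* Returns 1 if every C type discussed above fits into the Forth type
   that is meant to represent it on the stack, 0 otherwise. */
static int stack_types_fit(void)
{
    return sizeof(void *)    == sizeof(Cell)
        && sizeof(char)      <= sizeof(Cell)
        && sizeof(short)     <= sizeof(Cell)
        && sizeof(int)       <= sizeof(Cell)
        && sizeof(long)      <= sizeof(Cell)
        && sizeof(long long) <= 2 * sizeof(Cell);   /* double cell */
}

/* Abort early if the platform violates the assumptions. */
static void require_stack_types(void)
{
    assert(stack_types_fit());
}
```

A platform that fails this check would need a different mapping for the offending type, e.g., a double cell for @code{long}.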

@section{Misc. problems}

Some functions expect C function pointers. Like other pointers, these
occupy one cell. They can be extracted from the Forth word for the C
function @code{foo} by using either @code{c-func-ptr foo} or @code{' foo
>c-func-ptr}. Currently C function pointers cannot be created for normal
Forth words (the user would need to state the C function type that the
Forth word should emulate. In the implementation, the pointed-to
function would push values on the stacks, call the interpreter with
certain parameters (GNU C's ``nested functions'' extension might be
helpful), and pop the return value from a stack; a @code{THROW} might be
caught by a @code{CATCH} in an outer instance of the interpreter).
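
The push, call, pop scheme for a pointed-to function can be sketched for one fixed C function type, the comparison function expected by @code{qsort}. Everything here is a stand-in chosen for illustration: the tiny data stack, the type @code{Xt}, and the "Forth word" @code{forth_compare_ints}, which is simply a C function working on that stack. A real implementation would call the Forth interpreter instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t Cell;   /* assumption: a cell is as wide as a C pointer */

/* Tiny stand-in for the Forth data stack. */
static Cell stack[16];
static int sp = 0;
static void push(Cell x) { assert(sp < 16); stack[sp++] = x; }
static Cell pop(void)    { assert(sp > 0);  return stack[--sp]; }

/* Stand-in for a Forth word ( addr1 addr2 -- n ) comparing two ints. */
static void forth_compare_ints(void)
{
    const int *b = (const int *)pop();   /* last parameter is on top */
    const int *a = (const int *)pop();
    push((*a > *b) - (*a < *b));
}

/* Stand-in for "call the interpreter": a word is just a C function here. */
typedef void (*Xt)(void);
static Xt callback_xt = forth_compare_ints;

/* The pointed-to function: it has the C type that qsort expects, pushes
   its parameters, runs the word, and pops the return value. */
static int compare_trampoline(const void *a, const void *b)
{
    push((Cell)a);
    push((Cell)b);
    callback_xt();
    return (int)pop();
}

/* Sort an array of ints using the word behind callback_xt. */
static void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof v[0], compare_trampoline);
}
```

Because the word to call is held in a global variable, this sketch supports only one callback at a time; producing a distinct C function pointer per Forth word is the hard part that the text above leaves open.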

The most common conversion trouble will be between Forth and C
representations of strings.

A few C functions take a variable number of arguments, e.g.,
@code{printf}. If you want to use these functions, you have to specify
the actual types of the parameters for each call (How?).

@section{Implementation}

Concerning the call: The Forth system can know (from the C prototypes)
what parameters are passed, but it does not know the parameter passing
convention of the machine (which may be quite complex, e.g., take the
MIPS). A possible solution for that is to generate for every C
prototype a C function that takes the parameters from the stacks and
calls the C function of the prototype with the appropriate parameter
list. A slight variation of this is to combine these interface
functions for all C functions with the same parameter list and call
the C function proper through a function pointer. Both of these
variants make it necessary to call the C compiler whenever such an
interface function is added, and then to link the function in
(preferably dynamically). A general call primitive would call the
interface functions.
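
As an illustration of the first variant, here is what generated interface functions for two prototypes might look like. The stacks, the names @code{if_memcmp}, @code{if_atof} and @code{call_c}, and the use of @code{atof} as an example with a floating point result are assumptions made for this sketch; a cell is assumed to be as wide as a C pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef intptr_t Cell;   /* assumption: a cell is as wide as a C pointer */

/* Stand-ins for the Forth data stack and FP stack. */
static Cell   ds[32]; static int dsp = 0;
static double fs[32]; static int fsp = 0;
static void   push(Cell x)    { assert(dsp < 32); ds[dsp++] = x; }
static Cell   pop(void)       { assert(dsp > 0);  return ds[--dsp]; }
static void   fpush(double r) { assert(fsp < 32); fs[fsp++] = r; }
static double fpop(void)      { assert(fsp > 0);  return fs[--fsp]; }

/* Interface function for
   int memcmp(const void *s1, const void *s2, size_t n); */
static void if_memcmp(void)
{
    size_t n       = (size_t)pop();        /* last parameter is on top */
    const void *s2 = (const void *)pop();
    const void *s1 = (const void *)pop();
    push((Cell)memcmp(s1, s2, n));
}

/* Interface function for double atof(const char *nptr);
   the parameter comes from the data stack, the result goes to the
   FP stack. */
static void if_atof(void)
{
    const char *s = (const char *)pop();
    fpush(atof(s));
}

/* The general call primitive only needs to know one function type. */
typedef void (*InterfaceFn)(void);
static void call_c(InterfaceFn f) { f(); }
```

The C compiler takes care of the parameter passing convention inside each interface function, which is the point of this variant; the price is that the compiler has to be run whenever a new prototype is added.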

Another variant is to create at configure time primitives (or C
functions) for all the different parameter passing conventions available
on the machine, for up to, say, 20 parameters. Typically the first few
parameters are passed in (possibly wild, as on the MIPS) combinations of
registers and the rest is passed on the stack, so the number of distinct
combinations is not too big. These primitives take the parameters from a
sequential area in memory, so they need not treat FP values differently
from integers if they land on the C stack. So the call of a C function
consists of moving the parameters to this area (converting Forth to C
types), then calling the appropriate primitive.
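
A much reduced sketch of such a primitive follows. It handles only functions whose parameters and result are all cell-sized integers, and only up to three parameters; the names @code{call_int_args}, @code{AnyFn} and @code{sum3} are invented for this illustration. In portable C the called function must really have the type it is called through, so the sketch uses a test function with @code{Cell} parameters; calling existing library functions with other parameter types this way would rely on knowledge of the machine's calling convention, which is exactly what the configure-time step would have to supply.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Cell;         /* assumption: a cell is as wide as a pointer */
typedef void (*AnyFn)(void);   /* generic function pointer for storage */

/* Call fn with nargs cell-sized integer parameters taken from a
   sequential area in memory; area[0] is the first C parameter. */
static Cell call_int_args(AnyFn fn, const Cell *area, int nargs)
{
    switch (nargs) {
    case 0: return ((Cell (*)(void))fn)();
    case 1: return ((Cell (*)(Cell))fn)(area[0]);
    case 2: return ((Cell (*)(Cell, Cell))fn)(area[0], area[1]);
    case 3: return ((Cell (*)(Cell, Cell, Cell))fn)(area[0], area[1], area[2]);
    }
    assert(!"unsupported number of parameters");
    return 0;
}

/* Example target function with exactly the type used in case 3. */
static Cell sum3(Cell a, Cell b, Cell c)
{
    return a + b + c;
}
```

A full version would need one such case per distinct combination of integer and FP parameters that the machine's convention distinguishes.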

If anyone with knowledge of C calling conventions has a better idea, I
would like to know about it.

The header file processor is best derived from a tool that already
processes C sources instead of writing a new one. Candidates that come
to my mind are @code{gcc}, @code{lcc} and @code{unprotoize} (but doesn't
unprotoize ignore structures? BTW, it is derived from
@code{gcc}). Note that there is no need to track the development of the
original tool, unless we want to keep up with e.g., new C extensions
provided by @code{gcc}. Using @code{gcc} is probably more difficult but
has the advantage that the tool could then grok the GNU C extensions
syntax and that the machine descriptions contain info about every system
where gforth runs (but don't they all use the same scheme for computing
struct field offsets? Is there anything else the system needs to know
about the machine?).

The problem with processing the header file is that we lose the
macros. We can get at least some of the constant declarations by
processing the header files with @code{gcc -E -dM}.

@section{Related work}

The proceedings of the 1985 Rochester Forth Conference contain a paper
by William L. Sebok on the problem and a report from the Forth and Unix
working group that discusses it, too. JForth on the Amiga uses a C
header file processor.