@chapter{Foreign Language Interface}

The stuff in this chapter has not been implemented. It just contains thoughts about how the interface to other programming languages might look and how it might be implemented (in case you want to implement it).

Foreign language interfaces make libraries written for other programming languages available to Forth programs and are therefore extremely useful and important. The most important are the interfaces to C (many general and special-purpose libraries: windowing, networking, databases, system calls for many OSs, interfaces to other languages etc.) and FORTRAN (sophisticated numerical libraries). C++ seems to be gaining importance, too (e.g., Interviews).

While some general principles can be applied to all language interfaces, the concrete interface depends on the concrete programming language. In the following I will explain my ideas for the interface to C.

While adding primitives is relatively convenient in gforth, it is not suited to providing the C language interface, for several reasons. As convenient as it may be, adding hundreds of primitives for library functions is still very tedious. Every time a primitive is added, the whole engine has to be compiled again, which costs much time and needs much memory (indeed, with thousands of primitives to compile, @code{gcc} would run out of space sooner or later). Also, to add primitives, gforth has to be exited, relinked, and invoked again; this loses the advantage of interactivity and gives the user the old unresponsive compile-cycle feeling.

Therefore C functions should be made available in a less tedious and more responsive manner and should be called using a general C calling primitive (see the implementation section). The C library should be dynamically linked on systems that support dynamic linking (with a fallback to static linking on systems that don't).
A special tool should translate C header files containing ANSI C prototypes and type declarations into corresponding Forth definitions. After linking the library and loading these definitions, the user can call the C functions as if they were Forth words and can access C structures.

Calling a C function from Forth should be as similar to calling a Forth word as possible. E.g., we would like to call the C function

@example
double sin(double x);
@end example

like a Forth word

@format
@code{sin} @i{r1 -- r2}
@end format

I.e., the parameters for the function call are taken from the data and FP stacks and the return value is put on the data or the FP stack. In order to make it easy to use the original documentation, the parameters should be passed in the same order as in C, i.e., the last parameter should be on the top of the appropriate stack. E.g., the C call

@example
memcmp(s1, s2, n);
@end example

would be written in Forth as

@example
s1 s2 n memcmp
@end example

(assuming that @code{s1}, @code{s2} and @code{n} are locals or values). The name of the word should be exactly as in C. Conflicts with Forth names can be avoided by putting the C stuff in a separate word list.

There is just one problem: C functions expect and return @code{int}s, @code{char *}s etc., so we have to convert the parameters to these types. To make the interface as convenient as possible, this conversion is done automatically wherever it can be. However, automatic conversion is not always possible. Two cases have to be distinguished: data that is passed directly as a parameter, and data that is passed via memory (e.g., by passing a pointer to the memory area or by storing data in a variable before the call).

@section{Data in memory}

If data is passed via memory, there is not much choice: we have to create or access the data as C would. For this purpose we have the following words:

c-char-! c-unsigned-char-! c-short-! c-unsigned-short-! c-int-! c-unsigned-int-! c-long-! c-unsigned-long-! c-long-long-! c-unsigned-long-long-!
c-ptr-! c-float-! c-double-! c-long-double-!
c-char-@ c-unsigned-char-@ c-short-@ c-unsigned-short-@ c-int-@ c-unsigned-int-@ c-long-@ c-unsigned-long-@ c-long-long-@ c-unsigned-long-long-@ c-ptr-@ c-float-@ c-double-@ c-long-double-@

While @code{gforth} guarantees that @code{c-double-!} does the same as @code{f!}, we add it in case other systems want to have the same interface.

For structures and unions the C header file processor creates access words. E.g., for the structure

@example
struct foo {
  int flip;
  double flop;
};
@end example

it creates the words

@example
foo-flip ( addr1 -- addr2 )
foo-flip@ ( addr -- n )
foo-flip! ( n addr -- )
foo-flop ( addr1 -- addr2 )
foo-flop@ ( addr -- r )
foo-flop! ( r addr -- )
size-foo ( -- n )
align-foo ( addr1 -- addr2 )
@end example

For members that are structures or unions themselves, the @code{@} and @code{!} words are not produced; for bit-fields only the @code{@} and @code{!} words are produced. (What about typedefs and anonymous structs?)

C variables behave just like their Forth counterparts, i.e., they produce an address.

@section{Parameters/Data on the stacks}

For data passed over the stacks we need automatic conversion, as the stacks can only contain (signed or unsigned) cells, double cells, and floats. Since we want to call given C routines (in contrast to calling C routines written for a given piece of Forth code), the Forth type corresponding to a C type should contain all the values of the C type.

A Forth cell should have the same size as a C pointer, so these types correspond naturally. @code{char}s, @code{short}s, and @code{int}s are at most as large as a pointer in every C implementation I have heard of, and that will stay so for the foreseeable future. Therefore they can be represented by a cell.

C's @code{long} types seem to have been inspired by the need for big counters on 16-bit machines. On 32-bit and larger machines, they need at most as much space as a pointer.
I don't think Forth on small machines (embedded controllers) will want to call C functions (comments?), so a cell should also suffice for @code{long}s. @code{long long}s are defined to be twice as long as @code{long}s (@pxref{Long Long, , Double-Word Integers, gcc.info, GNU C Manual}), so they correspond to double cells. The unsigned C types correspond to unsigned cells and double cells.

Since Forth has only one floating-point type (on the stack), it corresponds to all of C's floating-point types: @code{float}, @code{double} and @code{long double}.

@section{Misc. problems}

Some functions expect C function pointers. Like other pointers, these occupy one cell. They can be extracted from the Forth word for the C function @code{foo} by using either @code{c-func-ptr foo} or @code{' foo >c-func-ptr}. Currently, C function pointers cannot be created for normal Forth words. (The user would need to state the C function type that the Forth word should emulate. In the implementation, the pointed-to function would push values on the stacks, call the interpreter with certain parameters (GNU C's ``nested functions'' extension might be helpful), and pop the return value from a stack; a @code{THROW} might be caught by a @code{CATCH} in an outer instance of the interpreter.)

The most common conversion trouble will be between the Forth and C representations of strings.

A few C functions take a variable number of arguments, e.g., @code{printf}. If you want to use these functions, you have to specify the actual types of the parameters for each call (How?).

@section{Implementation}

Concerning the call: The Forth system can know (from the C prototypes) what parameters are passed, but it does not know the parameter passing convention of the machine (which may be quite complex; take the MIPS, for example). A possible solution is to generate, for every C prototype, a C function that takes the parameters from the stacks and calls the C function of the prototype with the appropriate parameter list.
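
The generated interface functions described above might look roughly like the following sketch. Everything about the stack model is an assumption made for illustration: the names @code{Cell}, @code{ForthStacks}, @code{iface_memcmp}, @code{iface_atof} and @code{call_c} are hypothetical and do not come from gforth, and both stacks are assumed to grow towards lower addresses with the stack pointer pointing at the top item. @code{atof} is used instead of @code{sin} only so that the sketch needs nothing beyond the standard C library.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stack model (not gforth's): a cell is pointer-sized. */
typedef intptr_t Cell;

typedef struct {
    Cell   *sp;  /* data stack pointer: points at the top item, grows down */
    double *fp;  /* FP stack pointer: same convention */
} ForthStacks;

/* Interface function for
 *     int memcmp(const void *s1, const void *s2, size_t n);
 * Forth stack effect: s1 s2 n -- n2
 * The last C parameter is on top of the data stack. */
void iface_memcmp(ForthStacks *s)
{
    size_t      n  = (size_t)s->sp[0];
    const void *s2 = (const void *)s->sp[1];
    const void *s1 = (const void *)s->sp[2];

    s->sp += 2;                          /* three items consumed, one produced */
    s->sp[0] = (Cell)memcmp(s1, s2, n);  /* int result widened to a cell */
}

/* Interface function for
 *     double atof(const char *nptr);
 * Forth stack effect: c-addr --   F: -- r
 * The pointer comes from the data stack, the result goes to the FP stack. */
void iface_atof(ForthStacks *s)
{
    const char *nptr = (const char *)s->sp[0];

    s->sp += 1;              /* pop the pointer */
    s->fp -= 1;              /* make room on the FP stack */
    s->fp[0] = atof(nptr);   /* push the result */
}

/* A general call primitive only needs to know the interface function. */
typedef void (*InterfaceFn)(ForthStacks *);

void call_c(InterfaceFn f, ForthStacks *s)
{
    f(s);
}
```

Under these assumptions the C compiler, not the Forth system, deals with the machine's parameter passing convention; the price is that each new prototype needs a compile and link step.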
A slight variation of this is to combine these interface functions for all C functions with the same parameter list and call the C function proper through a function pointer. Both of these variants make it necessary to call the C compiler whenever such an interface function is added, and then to link the function in (preferably dynamically). A general call primitive would call the interface functions.

Another variant is to create, at configure time, primitives (or C functions) for all the different parameter passing conventions available on the machine, for up to, say, 20 parameters. (Typically the first few parameters are passed in registers, possibly in wild combinations (MIPS), and the rest are passed on the stack, so the number of distinct combinations is not too big.) These primitives take the parameters from a sequential area in memory, so they need not treat FP values differently from integers if they land on the C stack. So the call of a C function consists of moving the parameters to this area (converting Forth to C types), then calling the appropriate primitive.

If anyone with knowledge of C calling conventions has a better idea, I would like to know about it.

The header file processor is best derived from a tool that already processes C sources, rather than written from scratch. Candidates that come to my mind are @code{gcc}, @code{lcc} and @code{unprotoize} (but doesn't @code{unprotoize} ignore structures? BTW, it is derived from @code{gcc}). Note that there is no need to track the development of the original tool, unless we want to keep up with, e.g., new C extensions provided by @code{gcc}. Using @code{gcc} is probably more difficult, but has the advantage that the tool could then grok the syntax of the GNU C extensions and that the machine descriptions contain info about every system where gforth runs (but don't they all use the same scheme for computing struct field offsets? Is there anything else the system needs to know about the machine?).
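
On the question of struct field offsets: one way to avoid depending on machine descriptions at all is to let the C compiler of the target system compute the layout and print it as Forth source. The following is only a sketch of that idea, not something the text above prescribes. The function name @code{emit_foo_words} is hypothetical, the structure is the @code{struct foo} from the section on data in memory, and only the address words and @code{size-foo} are emitted; the fetch, store and alignment words are left out.

```c
#include <stddef.h>
#include <stdio.h>

struct foo {
    int flip;
    double flop;
};

/* Writes Forth definitions for some access words of struct foo into buf.
 * The offsets and the size are whatever this C compiler uses on this
 * machine. Returns the value of snprintf: the number of characters the
 * complete text needs, not counting the terminating NUL. */
int emit_foo_words(char *buf, size_t size)
{
    return snprintf(buf, size,
                    ": foo-flip ( addr1 -- addr2 ) %zu + ;\n"
                    ": foo-flop ( addr1 -- addr2 ) %zu + ;\n"
                    "%zu constant size-foo\n",
                    offsetof(struct foo, flip),
                    offsetof(struct foo, flop),
                    sizeof(struct foo));
}
```

A header file processor built this way would generate one such function per structure, compile and run the result once, and load the printed text as Forth source. The obvious cost is the same as for generated interface functions: a C compiler has to be available when the definitions are created.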
The problem with processing the header file is that we lose the macros. We can get at least some of the constant declarations by processing the header files with @code{gcc -E -dM}.

@section{Related work}

The proceedings of the 1985 Rochester Forth Conference contain a paper by William L. Sebok on the problem, and a report from the Forth and Unix working group that discusses it, too. JForth on the Amiga uses a C header file processor.