How do BASIC P-code compilers work?

Started by xlar54, August 17, 2009, 02:31 PM



xlar54

Anyone know how things like Blitz and BASIC-64/128 worked?  I understand pure ML generation, but the P-code they talk about confuses me somewhat.  Is it really just more of a BASIC optimizer?

BigDumbDinosaur

A lot of the P-code optimization involves lookups.  For example, if you code GOSUB 8000 in your program, the BASIC interpreter has to convert the PETSCII string 8000 to an unsigned 16-bit integer ($1F40 in this case) at runtime, a slow process in itself, and then sequentially search the program until a statement with that same 16-bit line number is found.  The compiled version would not scan the program for the statement; instead, it would refer to a lookup table of GOSUB targets.
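To make that concrete, here is a rough C sketch of what the interpreter goes through on every GOSUB, assuming a simplified model of the tokenized program; the struct layout and names are illustrative, not CBM BASIC's actual in-memory format:

#include <stdint.h>
#include <stddef.h>

/* Simplified model of a tokenized BASIC line: a linked list of
   (next-line pointer, 16-bit line number, tokenized text). */
typedef struct Line {
    struct Line   *next;   /* link to the following program line */
    uint16_t       number; /* line number as a 16-bit integer    */
    const uint8_t *text;   /* tokenized statement text           */
} Line;

/* The interpreter re-converts the PETSCII digits "8000" to the
   integer 8000 every single time the GOSUB executes... */
uint16_t petscii_to_u16(const uint8_t *s)
{
    uint16_t n = 0;
    while (*s >= '0' && *s <= '9')
        n = n * 10 + (uint16_t)(*s++ - '0');
    return n;
}

/* ...and then walks the program until the target line number
   matches: an O(n) scan on every jump. */
const Line *find_line_sequential(const Line *program, uint16_t target)
{
    for (const Line *l = program; l != NULL; l = l->next)
        if (l->number == target)
            return l;
    return NULL;  /* UNDEF'D STATEMENT ERROR */
}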

The GOSUB statement itself would be modified so the line number reference is in 16-bit integer form as well, reducing the code from (in the above example) a minimum of 5 bytes to 3 (the GOSUB token and the 16-bit line number integer).  When the GOSUB is executed, the P-code engine attached to the compiled program would use the GOSUB lookup table to find the target statement rather than sequentially search the entire program.  A simple and fast binary search would identify the statement's address in the program, avoiding much of the monkey-motion of interpreted BASIC.
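A minimal sketch of the compiled side, assuming the compiler emits a table of GOSUB/GOTO targets sorted by line number (the names here are invented for illustration):

#include <stdint.h>
#include <stddef.h>

/* Compile-time table mapping each target line number to the address
   of the corresponding statement, sorted by line number. */
typedef struct {
    uint16_t line;     /* 16-bit target line number                  */
    uint16_t address;  /* offset of the statement in the program image */
} TargetEntry;

/* At runtime the p-code engine binary-searches the table: O(log n)
   instead of the interpreter's O(n) scan, with the line number
   already stored as an integer in the p-code stream. */
int find_target(const TargetEntry *tab, size_t n, uint16_t line)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].line < line)      lo = mid + 1;
        else if (tab[mid].line > line) hi = mid;
        else                           return (int)tab[mid].address;
    }
    return -1;  /* target line does not exist */
}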

Other optimizations include converting all embedded numeric constants (e.g., the 47 in PRINT X*47) to floating point at compile time, thus avoiding having to do the conversion at runtime.  Static strings would be referred to through another lookup table, hastening their use at runtime.  During compilation, syntax and a few other errors would be detected and trapped, reducing the amount of error checking required at runtime and further accelerating the program's performance.
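A sketch of how such a constant pool might look, again with invented names; real CBM BASIC stores numbers in a 5-byte floating-point format, for which a C float stands in here:

#include <stdint.h>

/* Constant pools built once, at compile time. */
typedef struct { float value; } NumConst;               /* numeric */
typedef struct { uint16_t length, address; } StrConst;  /* string  */

/* For PRINT X*47, the compiler converts the digits "47" exactly
   once and stores the result in the pool... */
static const NumConst num_pool[] = { { 47.0f } };

/* ...so the p-code operand is just an index into the pool (e.g.
   PUSH_CONST 0, "push numeric constant #0"), and no digit parsing
   ever happens at runtime.  Static strings are handled the same way
   through a (length, address) table. */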

The P-code engine, which is attached to the compiled program, contains machine language routines that do everything needed to execute the compiled program, either through more efficient interaction with the kernel or by replacing BASIC subroutines known to be bottlenecks.  Much of the program's performance depends on how well the P-code engine has been designed.
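For a feel of what such an engine looks like, here is a toy fetch-decode-execute loop in C; the opcodes and encoding are invented for illustration and are far simpler than any real compiler's:

#include <stdint.h>
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

void run(const uint8_t *pc)
{
    int32_t stack[64];
    int     sp = 0;

    for (;;) {
        switch (*pc++) {
        case OP_PUSH:            /* push next byte as a value */
            stack[sp++] = *pc++;
            break;
        case OP_ADD:             /* pop two values, push their sum */
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_PRINT:           /* pop and print */
            printf("%d\n", stack[--sp]);
            break;
        case OP_HALT:
            return;
        }
    }
}

int main(void)
{
    /* Equivalent of PRINT 2+3 in this toy encoding. */
    const uint8_t code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(code);
    return 0;
}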

The subject of language compilation is a complicated one, to say the least, far beyond anything that would be published on a forum.
x86?  We ain't got no x86.  We don't need no stinking x86!

xlar54

Very interesting and good information, thank you.  Languages have always been where my interest lies.  Understanding the voodoo behind these P-code engines is pretty interesting stuff.  I'd say just the speed from the lookups of GOSUBs would increase performance considerably, due to the way, as you said, the program must scan through over and over just to locate the target line.

BigDumbDinosaur

In Micro$oft BASIC, GOTOs (as well as IF...THEN constructs) are also handled by scanning the program for the target line.  The only difference with GOSUBs is the need to store the return location on a stack, which is why in-line code always runs faster (don't code a subroutine unless it will be used in more than one location).  Otherwise, the mechanism is similar.
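In sketch form, the difference is just one extra push, assuming the same toy engine as above (names illustrative):

#include <stdint.h>

/* GOTO just replaces the program counter; GOSUB must first save the
   return address on a stack so RETURN can pop it later. */
#define GOSUB_DEPTH 32               /* nesting limit, as in real BASICs */

static uint16_t return_stack[GOSUB_DEPTH];
static int      rsp = 0;

uint16_t do_goto(uint16_t target)
{
    return target;                   /* new program counter, nothing else */
}

uint16_t do_gosub(uint16_t return_pc, uint16_t target)
{
    return_stack[rsp++] = return_pc; /* the extra work GOTO avoids */
    return target;
}

uint16_t do_return(void)
{
    return return_stack[--rsp];      /* resume after the calling GOSUB */
}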

You should know that the timesharing BASICs developed in the early 1970s for use with minicomputers possess an internal structure similar to what I described above.  Current generations of many timesharing BASICs implement statement labels, making it possible to code something like GOSUB GETRECORD instead of GOSUB 8000.  That is, the program structure can be written independently of line numbering (professional business BASIC programmers never use line numbers as branch and subroutine targets).  Since a label can be resolved to an address when the program is compiled or loaded, no digit conversion or line search is needed at all; testing conclusively demonstrated that statement label references were faster than line number references in all cases.
x86?  We ain't got no x86.  We don't need no stinking x86!