csc thread on BASIC speed - C64 faster than C128

Started by Blacklord, August 19, 2007, 07:20 PM

Blacklord

Recently there has been a thread on comp.sys.cbm (COMMODORE BASIC rules). There have been comments that the C64 is faster to execute BASIC code than the C128 is.

This got me thinking: years ago, BYTE Magazine published a short BASIC program for benchmarking various machines.

The code is here:

10 PRINT TIME$
20 A=2.71826
30 B=3.14159
40 C=1
50 FOR I=1 TO 5000
60 C=C*A
70 C=C*B
80 C=C/A
90 C=C/B
100 NEXT I
110 PRINT TIME$
120 END
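
For readers who don't have a BASIC machine handy, here is a rough C rendering of what the loop in lines 50-100 does. It is only a sketch: the function name `byte_loop` is made up for illustration, and it uses the host's `double` type rather than the floating-point format of any of the BASICs discussed here, so it says nothing about their speed. It does show that the benchmark is four floating-point operations per pass whose effects cancel out.

```c
#include <assert.h>

/* C rendering of the BASIC loop above: four floating-point operations
   per pass, with the multiplies and divides cancelling each other. */
double byte_loop(int passes)
{
    const double a = 2.71826;   /* line 20 */
    const double b = 3.14159;   /* line 30 */
    double c = 1.0;             /* line 40 */
    int i;

    assert(passes >= 0);
    for (i = 1; i <= passes; ++i) {   /* lines 50-100 */
        c = c * a;
        c = c * b;
        c = c / a;
        c = c / b;
    }
    return c;   /* stays close to 1.0, apart from rounding error */
}
```

Calling `byte_loop(5000)` corresponds to the 5000 passes in the listing.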

So, how do the various Commodore machines compare? I used the machines I have readily at hand and got these execution results (slowest to fastest):

C128 (in 1MHz mode) - 206
Plus4 - 183
PET 4032 - 178
C64 - 141
VIC20 - 123
C128 (in 2MHz mode) - 101
Amiga 1000 (AmigaBasic) - 23

Clearly the C64 can execute generic code faster than a 1MHz C128; however, when you take into account BASIC 7.0's better graphics & sound commands, I don't think you can compare.

Some interesting points:

1) A VIC20 blitzes a C64
2) A C128 in FAST mode executes the code more than twice as fast as a C128 in SLOW mode

Notes: The Amiga, 128, 64, VIC & Plus4 are all PAL machines; results will differ for NTSC. There was no difference in speed between 40 and 80 column mode on the 128.

cheers,

Lance

Guest

I did some tests of my own on NTSC machines using equivalent code to what you provided:

NTSC C128 Fast: 100
NTSC C128 Slow: 208
NTSC C64: 180
NTSC Atari 1200XL: 283.55
NTSC CoCo 3: 131.82

I then decided to add string manipulation to the test:

25 A$="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
35 B$="1234567890"
65 C$=LEFT$(A$,INT(A))
75 C$=LEFT$(B$,INT(B))
85 C$=RIGHT$(A$,INT(A))
95 C$=RIGHT$(B$,INT(B))

This had some interesting, and very telling results:

NTSC C128 Fast: 220
NTSC C128 Slow: 496
NTSC C64: 333
NTSC Atari 1200XL: 471

I didn't run the string test on the CoCo3 because I had already turned it off and didn't save the original and was too tired/lazy to go type it in again.

For string manipulation, the C64 trounces the 128 at the same clock speed, but once again FAST mode shows its superiority, not just in clock speed but also in disabling the VIC-II.  The Atari was actually faster than the C128 in Slow mode.  This is because Atari string manipulation is done on char arrays (like in C) instead of variable-length string variables.  The victory for the Atari is rather hollow, though, considering that its CPU runs at 1.79 MHz compared to 1.0 MHz for the C128 in Slow mode.  It also gives you an idea of just how much the C128 spanks the Atari overall, especially in Fast mode.
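
To make the "char arrays (like in C)" point concrete, here is a small C sketch of LEFT$ and RIGHT$ working on caller-owned fixed buffers. The names `left_str` and `right_str` are hypothetical, and this is not a description of how Atari BASIC is actually implemented. It only illustrates the general idea that when the result is written into an array that already exists, nothing is allocated and nothing is left behind to be garbage collected.

```c
#include <assert.h>
#include <string.h>

/* Copy the leftmost n characters of src into dst (like LEFT$).
   dst is a caller-owned array with room for at least n + 1 bytes. */
void left_str(char *dst, const char *src, size_t n)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (n > len) n = len;          /* asking for too much returns it all */
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Copy the rightmost n characters of src into dst (like RIGHT$). */
void right_str(char *dst, const char *src, size_t n)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (n > len) n = len;
    memcpy(dst, src + (len - n), n);
    dst[n] = '\0';
}
```

With the values from the test, INT(A) is 2 and INT(B) is 3, so lines 65 to 95 correspond to `left_str(c, a, 2)`, `left_str(c, b, 3)`, `right_str(c, a, 2)` and `right_str(c, b, 3)` on a single buffer `c`.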

I also intend to run these tests for the Apple //e Enhanced and Coleco Adam, but I need to figure out how to get programmatic timers on these platforms as they are not part of their respective BASICs.

Blacklord

What someone needs to do now is create a graphics speed test for the two machines to do a true comparison as well as some "real world" code that performs something useful....


Perhaps tonight when I get home from work.....

Lance

Guest

I have some more results:

Original, Floating point test:

NTSC Commodore plus/4 (6502): 200 seconds
NTSC Apple IIe Enhanced (65c02): 91 seconds
NTSC Coleco Adam (Z80): 117 seconds

String & float test:

NTSC Commodore plus/4 (6502):
NTSC CoCo3 (6809): 279.63 seconds
NTSC Apple IIe Enhanced (65c02): 192 seconds
NTSC Coleco Adam (Z80): 182 seconds

Conclusions:

I've attached a PDF file with a comparison of the systems I tested.  Surprisingly, the Apple //e is both the fastest and also the most efficient.  The Color Computer 3 was the second most efficient and the C= 64 was third most efficient.  The Adam was terribly inefficient despite a very fast clock speed (3.58 MHz) and the Atari 1200 XL was by far the slowest and most inefficient, sporting a 1.79 MHz clock.

http://www.paytonbyrd.com/files/8-bitEfficiency.pdf
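
The thread does not say how the PDF defines "efficient". One plausible reading, and it is only an assumption, is the number of CPU cycles a machine burns to finish the test: run time in seconds multiplied by clock speed in MHz. The helper name `megacycles` below is made up for this sketch.

```c
#include <assert.h>

/* Millions of CPU cycles spent = run time in seconds x clock in MHz.
   Under this (assumed) measure, lower means more work per cycle. */
double megacycles(double seconds, double clock_mhz)
{
    assert(seconds >= 0.0 && clock_mhz > 0.0);
    return seconds * clock_mhz;
}
```

Using the floating point figures posted in this thread, the Adam comes out at 117 x 3.58, roughly 419 million cycles, and the Atari 1200XL at 283.55 x 1.79, roughly 508 million cycles.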

Blacklord

I'll run these as well (tonite) on the PET, VIC20 & A1000 & post 'em back here.

Blacklord

These are the results of a simple garbage collection test on both the C64 & the C128.

10 T1$=TIME$
20 DIM A$(1000)
30 FOR J=1 TO 1000
40 A$(J)=CHR$(65)
50 NEXT J
60 PRINT "STAMP 1"
70 PRINT FRE(0)
80 PRINT "STAMP 2"
90 T2$=TIME$
100 PRINT T1$, T2$

C128 (1MHz mode) 11
C128 (2MHz mode) 9
C64 132

As you can see, the C128 has a massive lead over the C64 in this department.

Interestingly though there isn't much of a difference between the fast & slow modes of the 128.

Tests were run on PAL versions of both machines. I didn't test my VIC20 with this as it's unexpanded & can't count to 1000, it stops at 419 :)

cheers,

Lance

Mangelore

Here are some results when using a SuperCPU128

Quote from: admin
10 PRINT TIME$
20 A=2.71826
30 B=3.14159
40 C=1
50 FOR I=1 TO 5000
60 C=C*A
70 C=C*B
80 C=C/A
90 C=C/B
100 NEXT I
110 PRINT TIME$
120 END
PAL SuperCPU128 128 mode: 7
PAL SuperCPU128 64 mode: 5



Quote from: plbyrd
25 A$="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
35 B$="1234567890"
65 C$=LEFT$(A$,INT(A))
75 C$=LEFT$(B$,INT(B))
85 C$=RIGHT$(A$,INT(A))
95 C$=RIGHT$(B$,INT(B))
PAL SuperCPU128 128 mode: 16 or 17
PAL SuperCPU128 64 mode: 10



Quote from: admin
10 T1$=TIME$
20 DIM A$(1000)
30 FOR J=1 TO 1000
40 A$(J)=CHR$(65)
50 NEXT J
60 PRINT "STAMP 1"
70 PRINT FRE(0)
80 PRINT "STAMP 2"
90 T2$=TIME$
100 PRINT T1$, T2$
PAL SuperCPU128 128 mode: 1
PAL SuperCPU128 64 mode: 4

Mark Smith

If someone has cc65 up and running would they care to run this through it :

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  register double width, sum;
  register int intervals, i;

  /* get the number of intervals */
  intervals = atoi(argv[1]);
  width = 1.0 / intervals;

  /* do the computation */
  sum = 0;
  for (i=0; i<intervals; ++i) {
    register double x = (i + 0.5) * width;
    sum += 4.0 / (1.0 + x * x);
  }
  sum *= width;

  printf("Estimation of pi is %f\n", sum);

  return(0);
}


It's from the Linux Parallel Processing HOWTO. I was reading it earlier and thought it might give a giggle to try it on a Commodore, but then I read this thread and thought it might make a good benchmark.

I'm too tired to do it myself now, but while I sleep maybe someone else can do it :-)

Oh yes, change the main() to get rid of it requiring arguments being passed to it, and hardcode the variable intervals to a number (integer).
cc65 might still whinge about the declarations of type register though ...
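
A sketch of that change, for anyone who wants to try it: the computation is moved into a function that takes the interval count directly, so nothing is read from the command line, and the `register` qualifiers are dropped. The function name `estimate_pi` is invented here, and the code assumes a compiler that supports floating point.

```c
#include <assert.h>

/* Midpoint-rule estimate of pi: integrate 4/(1+x^2) over [0,1]
   using the given number of equal-width intervals. */
double estimate_pi(int intervals)
{
    double width, sum = 0.0;
    int i;

    assert(intervals > 0);
    width = 1.0 / intervals;
    for (i = 0; i < intervals; ++i) {
        double x = (i + 0.5) * width;   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * width;
}
```

A main() that hardcodes the count would then just call something like `estimate_pi(1000)` and print the result with the same printf() as the original.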

If it doesn't work, anyone got some torturous maths to put the old machines through ?

I'm off to bed ... have fun!

Mark


PS. If you are interested in the Parallel Processing Linux stuff, go look at http://www.linux.org/docs/ldp/howto/Parallel-Processing-HOWTO-1.html

nikoniko

Quote from: admin
C128 (in 1MHz mode) - 206
C64 - 141
Since my timings in VICE seem to match yours on a real machine, here are the results of an additional experiment:

If you can afford a hit to readability and put the entire FOR loop on one line, you get down to 1:59 on the 1 MHz C128 and 1:38 on the C64.
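
A note on reading these times: on Commodore BASIC, TIME$ resolves to the TI$ clock, which reports six digits in HHMMSS form. The C sketch below converts such a reading to seconds; the name `ti_to_seconds` is made up, and it assumes the clock was at zero when the run started. Whether figures such as 206 and 141 were read straight off the clock (which would make them 2:06 and 1:41) is an assumption about how they were recorded, not something the posts state.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Convert a six-digit "HHMMSS" clock string to seconds.
   Returns -1 if the string is not exactly six digits. */
long ti_to_seconds(const char *ti)
{
    int i;
    long h, m, s;

    assert(ti != NULL);
    if (strlen(ti) != 6) return -1;
    for (i = 0; i < 6; ++i) {
        if (!isdigit((unsigned char)ti[i])) return -1;
    }
    h = (ti[0] - '0') * 10 + (ti[1] - '0');
    m = (ti[2] - '0') * 10 + (ti[3] - '0');
    s = (ti[4] - '0') * 10 + (ti[5] - '0');
    return h * 3600 + m * 60 + s;
}
```

For example, a reading of 000159 is 119 seconds and 000138 is 98 seconds.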

nikoniko

Quote from: strandedinnz
If someone has cc65 up and running would they care to run this through it :
I would if cc65 had floating point support. It does however come with the classic Sieve of Eratosthenes(sp?) that is or was often used as a compiler and system benchmark. If I have time this afternoon I'll compile for a variety of targets and post them here.

Guest

Quote from: nikoniko
Quote from: strandedinnz
If someone has cc65 up and running would they care to run this through it :
I would if cc65 had floating point support. It does however come with the classic Sieve of Eratosthenes(sp?) that is or was often used as a compiler and system benchmark. If I have time this afternoon I'll compile for a variety of targets and post them here.
I should imagine this would level the playing field quite a bit and focus more on the overall hardware of each platform rather than standard OS implementations since cc65's runtime takes over the host computer to a certain extent.

Guest

Okie dokie,

A bug was found in the CBM based benchmarks and the numbers have been re-run as a result; and the CBM machines were helped a whole bunch in the process.

You'll find an updated PDF file at:

http://www.paytonbyrd.com/files/8-bitEfficiency.pdf

nikoniko

Here's the Sieve of Eratosthenes test, which calculates all primes up to a given number -- in this case, 8192. It's compiled for all platforms that cc65 supports that A) have enough RAM, B) have a timer function in cc65's libraries and C) support screen output through cc65's printf() function. One further platform (CBM510) was dropped because of a header error that I didn't bother investigating and fixing. Unfortunately, this excluded consoles like the Lynx, Supervision and NES, and the Apple II build failed, complaining of a missing timer. Most disappointing of all was not being able to do GEOS, as it complained of not supporting screen writes. I had wondered whether GEOS might have enough interrupt servicing going on to measurably slow it down, but now I guess I'll never know.

cc65-sieve.zip

So, that leaves us with the C64, C128, C16, Plus/4, PET, CBM610, Atari 8-bits and Oric Atmos.

All in all, this test would be more useful for comparing C compilers to one another rather than systems, but just in case something unexpected would turn up I decided to do it anyway.

I don't have any real machines to test on, but VICE gave results as expected in most cases (minus a couple crashes), though the difference between the 128 and 64 surprised me. Can someone confirm whether this exists on real machines?

C128 1 MHz - 91-92 ticks
C128 2 MHz - 43-45 ticks (new in VICE 1.22)

C64 mode - 88-89 ticks
C64 - 88-89 ticks
PET - 87-88 ticks
CBM610 - CPU jam
Plus/4 - crash to monitor

By the way, these are all PAL except where I couldn't find an option to select it. So I don't know whether VICE's PET/CBM emulations are PAL or NTSC.

nikoniko

Oh, and here's cc65's Sieve code in case someone wants to try a different compiler. Much like comparing BASIC implementations, I'd find it interesting if we could compare different compilers to one another -- though perhaps that should go in a different thread.

/*
 * Calculate all primes up to a specific number.
 */



#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <time.h>
#include <conio.h>



/*****************************************************************************/
/*                     Data     */
/*****************************************************************************/



#define COUNT 8192 /* Up to what number? */
#define SQRT_COUNT 91 /* Sqrt of COUNT */

static unsigned char Sieve[COUNT];



/*****************************************************************************/
/*                     Code               */
/*****************************************************************************/



#pragma staticlocals(1);



int main (void)
{
    /* Clock variable */
    clock_t Ticks;

    /* This is an example where register variables make sense */
    register unsigned char* S;
    register unsigned    I;
    register unsigned    J;

    /* Output a header */
    printf ("Sieve benchmark - calculating primes\n");
    printf ("between 2 and %u\n", COUNT);
    printf ("Please wait patiently ...\n");

    /* Read the clock */
    Ticks = clock();

    /* Execute the sieve */
    I = 2;
    while (I < SQRT_COUNT) {
        if (Sieve[I] == 0) {
            /* Prime number - mark multiples */
            S = &Sieve[J = I*2];
            while (J < COUNT) {
                *S = 1;
                S += I;
                J += I;
            }
        }
        ++I;
    }

    /* Calculate the time used */
    Ticks = clock() - Ticks;

    /* Print the time used */
    printf ("Time used: %lu ticks\n", Ticks);
    printf ("Press Q to quit, any other key for list\n");

    /* Wait for a key and print the list if not 'Q' */
    if (toupper (cgetc()) != 'Q') {
      /* Print the result */
      for (I = 2; I < COUNT; ++I) {
         if (Sieve[I] == 0) {
            printf ("%4d\n", I);
         }
         if (kbhit() && toupper (cgetc()) == 'Q') {
            break;
         }
      }
    }

    return EXIT_SUCCESS;
}
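
When trying the sieve with a different compiler, it helps to confirm the result as well as the time. The following is a sketch of the same algorithm as a plain function that returns how many primes it found, with no timer or conio calls so it can also be built on a desktop compiler. The name `count_primes` and the lower-case `sieve` array are made up for this example and are not part of cc65's sample.

```c
#include <assert.h>

#define COUNT 8192   /* same limit as the listing above */

static unsigned char sieve[COUNT];

/* Run the sieve and return how many primes lie in 2..limit-1. */
unsigned count_primes(unsigned limit)
{
    unsigned i, j, n = 0;

    assert(limit <= COUNT);
    for (i = 0; i < limit; ++i) {
        sieve[i] = 0;                 /* 0 means "not yet crossed out" */
    }
    for (i = 2; i * i < limit; ++i) {
        if (sieve[i] == 0) {
            /* Prime number - mark multiples */
            for (j = i * 2; j < limit; j += i) {
                sieve[j] = 1;
            }
        }
    }
    for (i = 2; i < limit; ++i) {
        if (sieve[i] == 0) {
            ++n;
        }
    }
    return n;
}
```

For the limit of 8192 used in the benchmark, the count should be 1028.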

nikoniko

Okay -- my last C post.

Adapting the fannkuch code from the Computer Language Benchmarks Game, and passing the value of 8 to the function, I get the following in VICE for 128 and 64:

C64 - 50415
C128 (1 MHz) - 52533

Given that the compiled code is identical except for parsing the "command line" arguments (via a REM statement) and passing the exit code from main back in ST, it would appear that the 128 has more to do during its service interrupt and that over time that adds up. So keep that in mind if you're going to be doing any long-running protein folding calculations or such. :D
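
To put a number on "adds up", the gap can be expressed as a percentage. This is a trivial C helper, named `percent_slower` purely for illustration; it works in whatever unit the two figures share.

```c
#include <assert.h>

/* How much longer `other` took than `base`, as a percentage of `base`. */
double percent_slower(double base, double other)
{
    assert(base > 0.0);
    return (other - base) / base * 100.0;
}
```

For the two fannkuch figures, (52533 - 50415) / 50415 comes to roughly 4.2%.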

hydrophilic

Protein folding :D :D :D

Aside from the new commands of BASIC 7, it is pretty much the same as the 64.  However it is slower on all counts (1MHz mode) due to bank switching required to fetch every token and write every variable.

String handling is a special case.  Despite reading MANY books on the 128, NONE of them pointed out the new method used for string handling.  I can't explain it properly, but it amounts to a small amount of extra code to add a 'tag' to strings.  Apparently old / unreferenced strings have a tag of $FF (at the end of the string) while new / valid strings have another value -- this allows for rapid garbage collection as 'garbage' strings are easy to identify.  If anyone has more info on this, I'd like to know!

Also something I discovered while examining BASIC code: the PAINT command performs a FRE() then uses all available memory in BANK 1 as a 'stack' for paint-filling.

Edit
A new feature slows down the old: ELSE.  On the 64, when an IF/THEN condition is false, the CPU immediately jumps to the next line via a link -- no scanning is involved.  But on the 128, when IF/THEN is false, the CPU must scan through the remainder of the line looking for ELSE (which may not exist).  This slows things down considerably...
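
Here is a rough C sketch of the kind of scan described above, to show why it costs time even when there is no ELSE on the line. It is not the ROM routine: the token value, the function name `find_else` and the byte-count output are all assumptions made for illustration, and real BASIC 7.0 has more cases to handle than this.

```c
#include <assert.h>
#include <stddef.h>

#define TOK_ELSE 0xD5   /* assumed token value, for illustration only */

/* Walk a tokenized line (terminated by a 0 byte) looking for ELSE.
   Quoted text is skipped so a byte inside a string literal is not
   mistaken for the token. Returns the offset of ELSE, or -1 if it is
   absent; *examined receives the number of bytes inspected. */
long find_else(const unsigned char *line, size_t *examined)
{
    size_t i = 0;
    int in_quotes = 0;

    assert(line != NULL && examined != NULL);
    while (line[i] != 0) {
        if (line[i] == '"') {
            in_quotes = !in_quotes;
        } else if (!in_quotes && line[i] == TOK_ELSE) {
            *examined = i + 1;
            return (long)i;
        }
        ++i;
    }
    *examined = i;      /* no ELSE: every byte on the line was looked at */
    return -1;
}
```

When the line has no ELSE, every remaining byte is inspected before the interpreter can move on, whereas following a next-line link takes the same time however long the line is.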

FOR/NEXT, GOSUB/RETURN on the 64 use the CPU's hardware stack (page 1) but on the 128 these (along with DO/LOOP) use a software stack (pages 8 and 9).  A software stack is noticeably slower than a hardware one.  Note that math operations still use the CPU stack.

DO/LOOP is very similar in concept to FOR/NEXT, however DO/LOOP pushes far less data on the stack so it might be preferred if speed is a concern.

nikoniko

Quote from: hydrophilic
Apparently old / unreferenced strings have a tag of $FF (at the end of the string) while new / valid strings have another value -- this allows for rapid garbage collection as 'garbage' strings are easy to identify.
Yep. Active strings have two bytes at the end which point back to their entry in the variable table, while garbage strings have the second of those bytes overwritten with $FF.
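
A toy C model of that scheme may make it clearer why collection gets cheap. This is a sketch of the idea only, not the ROM code. Every name is invented, and several simplifications are assumptions: the back-link is a one-byte variable index rather than a two-byte pointer, live strings carry a tag of 0, and an abandoned string stores its own length next to the $FF tag so the collector can step over it.

```c
#include <assert.h>
#include <string.h>

#define HEAP_SIZE 64
#define MAX_VARS  8
#define LIVE      0x00   /* assumed tag for a string still in use */
#define GARBAGE   0xFF   /* tag for an abandoned string */

/* Hypothetical string descriptor: length and heap offset of the text. */
typedef struct { unsigned char len; unsigned char pos; } StrVar;

static unsigned char heap[HEAP_SIZE];
static StrVar vars[MAX_VARS];
static unsigned heap_bottom = HEAP_SIZE;   /* strings grow downward */

/* Store s in variable v. Each record is the text plus two trailer bytes:
   { variable index, LIVE } while in use, { length, GARBAGE } afterwards. */
void assign(unsigned v, const char *s)
{
    unsigned len = (unsigned)strlen(s);

    assert(v < MAX_VARS && len > 0 && len + 2 <= heap_bottom);
    if (vars[v].len != 0) {                 /* abandon the old text */
        unsigned end = vars[v].pos + vars[v].len;
        heap[end]     = vars[v].len;
        heap[end + 1] = GARBAGE;
    }
    heap_bottom -= len + 2;
    memcpy(&heap[heap_bottom], s, len);
    heap[heap_bottom + len]     = (unsigned char)v;
    heap[heap_bottom + len + 1] = LIVE;
    vars[v].len = (unsigned char)len;
    vars[v].pos = (unsigned char)heap_bottom;
}

/* One pass from the top of the heap down: step over garbage, slide live
   records up, and fix each owner's descriptor through the back-link.
   Returns the number of free bytes afterwards. */
unsigned collect(void)
{
    unsigned p = HEAP_SIZE, dst = HEAP_SIZE;

    while (p > heap_bottom) {
        unsigned char tag  = heap[p - 1];
        unsigned char info = heap[p - 2];
        if (tag == GARBAGE) {
            p -= info + 2u;                 /* info is the stored length */
        } else {
            unsigned size = vars[info].len + 2u;   /* info is the owner */
            p   -= size;
            dst -= size;
            memmove(&heap[dst], &heap[p], size);
            vars[info].pos = (unsigned char)dst;
        }
    }
    heap_bottom = dst;
    return heap_bottom;
}
```

Because each record says who owns it, the collector touches every string once. Without such a link, a collector has to go looking through the variable table to find out which strings are still in use, which is a plausible explanation for the 132 versus 11 result earlier in the thread.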

JamesD

Try adding the following lines when running it on the CoCo3:

0 POKE 65497,0
10 TIMER=0
110 PRINT TIMER/60

The TIMER logic will give you the runtime in minutes with decimal.
The POKE enables the high speed mode and yields very different results than what you got.  A CoCo3 with a 6309 running in native mode will cut that by about 21% even without patches.  Enabling the fast keyboard routines with a few more POKEs will cut about 2% more and some more basic patches for the 6309 speeds things up further.  Since BASIC runs from RAM on the CoCo3, patches are easy.  If that's not fast enough, BASIC09 (compiled & running under OS-9) will stomp on the result you end up with from standard CoCo BASIC.

There's another benchmark thread over on the AtariAge forums.  It's based on a very simple benchmark listed in Creative Computing.  We are still missing results for the C64 and C128.  I'll be linking them back here.
http://www.atariage.com/forums/index.php?showtopic=116886&st=0&gopid=1419099&#entry1419099

I could run this on an Apple IIc Plus (4MHz 6502), Panasonic JR-200 (did well on Ahl's benchmarks) and an NEC Trek.
It would be interesting to see what an MSX Turbo R does with this benchmark.

Steve Gray

I tested on my NTSC CBM-II machines just for fun:

B128 (2MHz) - 73.8
P500 (1MHz) - 151.5

That makes the B128 faster than the C128 ;-)

Steve

JamesD

Quote from: JamesD
Try adding the following lines when running it on the CoCo3:

0 POKE 65497,0
10 TIMER=0
110 PRINT TIMER/60

The TIMER logic will give you the runtime in minutes with decimal.
TIMER/60 isn't minutes, it's seconds.  TIMER is ticks which is 60/sec.  Oops... it's been a while.

I finally had time to run the BASIC benchmarks.  I used the VCC emulator since my CoCo 3 is in another state at the moment.  Timing isn't exact, it's actually slightly slower than the real thing.
For the first benchmark...
At standard CoCo speed Vcc came up with 135.333333 rather than 131.82 of the real thing.
In high speed CoCo3 mode, Vcc runs it in 67.3333334 so the real CoCo 3 should run it in about 65 seconds if the speed difference is consistent.

With string operations...
Slow: 283.733334
Fast: 141.6


Those numbers are without enabling the 6309 native mode, which is about 20% faster.  A 20% reduction would give around 113.28 seconds in Fast mode for the long benchmark... almost half the time of the C128 in fast mode, and that's without 6309 optimizations to the ROM.  I should mention the 6309 has differences in the interrupt stack in native mode that will cause many unpatched programs to crash, but that's to accommodate the new registers.
A BASIC09 version would run it in about 1/10th the time of the CoCo3 interpreted BASIC benchmarks if speedup is consistent with other benchmarks.

BTW, the 64180 (or later Z180) runs Z80 code and in native mode does it about 20% faster than the Z80, so Z80 machines could see similar speedups with an upgrade.  It also has some instructions that will speed up patched code.  The later Z180 is a 64180 equivalent with a more Z80 compatible bus interface.  Sadly the chip isn't a drop in replacement like the 6309.  The 6309 and 64180 were both designed by Hitachi.