Z80 program I've written

Started by Christian Johansson, January 06, 2007, 05:23 AM


Christian Johansson

Actually, it was a mistake to use OUT. Yesterday evening when I had turned off my computer I thought: "Oh, I happened to use OUT instead of LD. Somebody is surely going to comment about that." and I was right about that ;) . Oh well, it shows that OUT works as well even though it shouldn't be used in this case. That might be good to know. It also shows that there are people reading what I post, which is also good :) .

I just changed the code so that it uses LD in the following way. I have tested that it works.

50 REM START PROGRAM WITH BANK0:SYS65488 AFTER ASSEMBLING IT, THE NEXT LINE STARTS THE ASSEMBLER
100 SYS4000
150 .BANK 0 ;ASSEMBLE TO BANK 0
200 .ORG $3000 ;PUT THE PROGRAM AT $3000
250 .MEM ;ASSEMBLE TO MEMORY (NOT TO FILE)
300 LD SP,$2000 ;SET STACK POINTER, NOTE THAT THE Z80 HAS A 16-BIT STACK POINTER
420 LD A,195 ;OPCODE FOR THE JP INSTRUCTION
422 LD IX,$38 ;THE Z80 STARTS AT $38 AT IRQ IN INTERRUPT MODE 1
424 LD (IX+0),A
425 LD A,<IRQ'ROUTINE
426 LD (IX+1),A
427 LD A,>IRQ'ROUTINE
431 LD (IX+2),A
450 LD A,0
500 LD BC,$D01A
550 OUT (C),A ;DISABLE VIC II IRQs
600 LD A,$25 ;LO BYTE OF 1/60 S FOR PAL, USE $95 INSTEAD FOR NTSC
650 LD BC,$DC04
700 OUT (C),A ;SET LO BYTE OF CIA #1 TIMER A
750 LD A,$40 ;HI BYTE OF 1/60 S FOR PAL, USE $42 INSTEAD FOR NTSC
800 INC C
850 OUT (C),A ;SET HI BYTE OF CIA #1 TIMER A
900 LD A,$81
950 LD BC,$DC0D
1000 OUT (C),A ;ENABLE CIA #1 TIMER A IRQ
1050 LD A,$01
1100 INC C
1150 OUT (C),A ;START CIA #1 TIMER A IN CONTINUOUS MODE
1170 IM 1 ;SET INTERRUPT MODE 1
1200 EI ;ENABLE Z80 INTERRUPTS
1250 ETERNAL'LOOP JP ETERNAL'LOOP
1350 IRQ'ROUTINE LD BC,$DC0D
1400 IN A,(C) ;CLEAR CIA TIMER A INTERRUPT
1420 EI ;Z80 INTERRUPTS HAVE TO BE ENABLED AGAIN IN THE IRQ ROUTINE, ELSE NO MORE INTERRUPTS WILL OCCUR
1450 LD BC,$D020
1500 IN A,(C) ;READ BORDER COLOR
1550 INC A ;INCREASE BORDER COLOR BY 1
1600 OUT (C),A ;WRITE NEW BORDER COLOR
1650 RET ;RETURN FROM IRQ
1700 * =$FFD2
1750 .BYT $BE ;CHANGE BANK 0 TO BANK 2 IN 8502->Z80 ROUTINE TO AVOID Z80 BOOT ROM AT ADR 0
1800 * =$FFEE
1850 JP $3000 ;THE Z80 RESUMES AT $FFEE, LET'S PUT A JUMP INSTRUCTION TO OUR CODE THERE
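
The timer bytes used above follow from the system clock: the timer value is roughly the number of clock cycles in 1/60 s. Here is a minimal Python sketch of that arithmetic. The clock rates (985248 Hz for PAL, 1022727 Hz for NTSC) are the usual nominal figures and are an assumption, as is the helper name timer_latch; neither comes from the listing.

```python
# Nominal system clock rates in Hz (assumed values, not taken from the listing).
PAL_HZ = 985248
NTSC_HZ = 1022727

def timer_latch(clock_hz, irqs_per_second=60):
    """Return the (lo, hi) bytes of a timer value giving irqs_per_second."""
    cycles = round(clock_hz / irqs_per_second)
    return cycles & 0xFF, cycles >> 8

for name, hz in (("PAL", PAL_HZ), ("NTSC", NTSC_HZ)):
    lo, hi = timer_latch(hz)
    print(f"{name}: lo=${lo:02X} hi=${hi:02X}")
```

With those clock rates the sketch reproduces the $25/$40 (PAL) and $95/$42 (NTSC) pairs from the comments in the listing.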

Btw, I read that there are variants of the LD instructions that I think should make it possible to change the following code:

420 LD A,195 ;OPCODE FOR THE JP INSTRUCTION
422 LD IX,$38 ;THE Z80 STARTS AT $38 AT IRQ IN INTERRUPT MODE 1
424 LD (IX+0),A
425 LD A,<IRQ'ROUTINE
426 LD (IX+1),A
427 LD A,>IRQ'ROUTINE
431 LD (IX+2),A

into:

422 LD IX,$38 ;THE Z80 STARTS AT $38 AT IRQ IN INTERRUPT MODE 1
424 LD (IX+0),195
426 LD (IX+1),<IRQ'ROUTINE
431 LD (IX+2),>IRQ'ROUTINE

However, when I tried that, the code compiled without errors but it didn't work.

hydrophilic

After assembling, you might enter the ML Monitor and issue:

M 3000

That shows the code generated by the assembler.  Assemblers (and other programs) are known to fail silently, so this lets us see what code it made and check that it is correct.
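
To have something to compare the monitor dump against, here is a small Python sketch that builds the bytes one would expect for the LD (IX+d),n variant tried earlier in the thread. It uses the standard Z80 encodings (LD IX,nn = DD 21 lo hi and LD (IX+d),n = DD 36 d n). The address given to IRQ_ROUTINE is hypothetical, since the real value of the label is not shown in the thread, and the helper names are made up for illustration.

```python
def ld_ix_imm16(nn):
    """Encode LD IX,nn as DD 21 lo hi."""
    return bytes([0xDD, 0x21, nn & 0xFF, nn >> 8])

def ld_ix_d_imm8(d, n):
    """Encode LD (IX+d),n as DD 36 d n."""
    return bytes([0xDD, 0x36, d & 0xFF, n & 0xFF])

IRQ_ROUTINE = 0x3040  # hypothetical address; substitute the real label value

code = (ld_ix_imm16(0x0038)
        + ld_ix_d_imm8(0, 0xC3)                  # JP opcode (195)
        + ld_ix_d_imm8(1, IRQ_ROUTINE & 0xFF)    # low byte of target
        + ld_ix_d_imm8(2, IRQ_ROUTINE >> 8))     # high byte of target

print(code.hex(" ").upper())
```

If the bytes shown by the monitor differ from this pattern (apart from the two address bytes), the assembler is the likely culprit rather than the Z80.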

Also, you used LD (IX+d),A instead of LD (HL),A.  I would recommend against that unless there is a _really_ good reason because it is sooo much slower.  On the 8502, STA (zp),Y is pretty fast (6 cycles) and this compares favorably with LD (HL),A (7 cycles), but LD (IX+d),A takes 19 cycles!!  Also, LD (HL),A is 1 byte while the IX version is 3.

What I'm trying to say is the 8502 is really good at indexing, either directly or through zero page, but the Z80 sucks.  On the other hand, the primary register pairs (BC, DE, and especially HL) are pretty fast and should generally be used for 'indexing'.

But it doesn't hurt to experiment... unless you're programming a nuclear reactor or a missile or something :)

Christian Johansson

Thank you for the information :) . I thought that instead of doing "INC L; LD (HL),A;" it must be shorter to write "LD (IX+1),A;" but apparently I was wrong.
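
To put numbers on that comparison, here is a small Python tally of size and time for the two ways of storing to the next address. The byte and T-state figures are the standard Z80 values and are an assumption in the sense that they are not quoted in full in the thread (only the 7 and 19 cycle figures are); the COST table and the total helper are made up for illustration.

```python
# (bytes, T-states) per instruction, from standard Z80 timing tables.
COST = {
    "LD (IX+d),A": (3, 19),
    "LD (HL),A": (1, 7),
    "INC L": (1, 4),
}

def total(sequence):
    """Sum (bytes, T-states) over a list of instruction names."""
    size = sum(COST[name][0] for name in sequence)
    time = sum(COST[name][1] for name in sequence)
    return size, time

ix_version = total(["LD (IX+d),A"])
hl_version = total(["INC L", "LD (HL),A"])

print("IX version (bytes, T-states):", ix_version)
print("HL version (bytes, T-states):", hl_version)
```

So the source text for the IX version is shorter, but the assembled code is both longer and slower than INC L followed by LD (HL),A.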

Christian Johansson

Now I've written a VIC text scroller for the Z80 :D so here is the next lesson in "the Z80 school" or "how to win over Spectrum fans to the bright side" ;) . I've transcribed all source code listings in this thread from the screen of my C128 so it is a lot of work.

20 REM START PROGRAM WITH BANK0:SYS65488 AFTER ASSEMBLING IT, THE NEXT LINE STARTS THE ASSEMBLER
50 SYS4000
150 .BANK 0 ;ASSEMBLE TO BANK 0
250 .ORG $3000 ;PUT THE PROGRAM AT $3000
350 .MEM ;ASSEMBLE TO MEMORY (NOT TO FILE)
450 LD SP,$2000 ;SET STACK POINTER, NOTE THAT THE Z80 HAS A 16-BIT STACK POINTER
550 LD BC, $0038 ;THE Z80 STARTS AT $38 AT IRQ IN INTERRUPT MODE 1
650 LD A,195 ;OPCODE FOR THE JP INSTRUCTION
750 LD (BC),A
850 LD A,<IRQ'ROUTINE
950 INC C
1050 LD (BC),A
1150 LD A,>IRQ'ROUTINE
1250 INC C
1350 LD (BC),A
1655 LD BC,$0400 ;START OF SCREEN MATRIX
1660 LD DE,1000   ;MAX NUMBER OF CHARACTERS ON SCREEN
1665 LD A,32 ;SCREEN CODE FOR SPACE
1670 - LD (BC),A ;START OF CLEAR SCREEN LOOP
1675 INC BC
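1680 DEC DE
1681 LD A,D
1682 OR E ;16-BIT DEC DOES NOT AFFECT THE FLAGS, SO TEST DE FOR ZERO EXPLICITLY
1683 LD A,32 ;RESTORE SPACE CODE (LD DOES NOT AFFECT THE FLAGS)
1685 JR NZ,- ;END OF CLEAR SCREEN LOOP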
1690 LD BC,$1000 ;START OF COLOR RAM WHEN Z80 IS ENABLED
1695 LD D,40 ;MAX NUMBER OF CHARACTERS ON FIRST ROW
1700 LD A,7 ;YELLOW COLOR
1705 - OUT (C),A ;START OF SET FIRST ROW TO YELLOW LOOP
1710 INC C
1715 DEC D
1720 JR NZ,- ;END OF SET FIRST ROW TO YELLOW LOOP
1725 XOR A ;TRICK TO SET ACCUMULATOR TO 0 WITH ONE-BYTE INSTRUCTION
1727 LD BC,53280 ;BORDER COLOR REGISTER
1730 OUT (C),A ;SET BORDER COLOR TO BLACK
1732 INC C ;INCREASE BC TO 53281 (SCREEN COLOR REGISTER)
1735 OUT (C),A ;SET SCREEN COLOR TO BLACK
1737 LD BC,$D016 ;HORIZONTAL SCROLL REGISTER
1740 OUT (C),A ;SET 38-COLUMN MODE
1750 LD A,(ACTUAL'SCROLL'TXT)
1850 LD ($0427),A ;INITIALIZE LAST CHARACTER ON FIRST ROW TO FIRST CHARACTER OF SCROLL-TEXT
2250 LD HL,ACTUAL'SCROLL'TXT ;INITIALIZE CHARACTER POINTER
2300 ;THE KERNAL HAS ALREADY SET UP RASTER IRQ IN THE VIC CHIP SO WE DON'T DO IT AGAIN
2650 IM 1 ;SET INTERRUPT MODE 1
2750 EI ;ENABLE Z80 INTERRUPTS
2850 ETERNAL'LOOP JP ETERNAL'LOOP
2950 IRQ'ROUTINE LD BC,$D019
3050 IN A,(C)
3150 OUT (C),A ;CLEAR VIC RASTER IRQ
3250 EI ;Z80 INTERRUPTS HAVE TO BE ENABLED AGAIN IN THE IRQ ROUTINE, ELSE NO MORE INTERRUPTS WILL OCCUR
3350 LD BC,$D016
3450 IN A,(C) ;READ HORIZONTAL SCROLL REGISTER IN VIC
3550 AND 7
3650 CALL Z,SCROLL'ONE'CHAR ;JUMP IF IT'S TIME TO SCROLL A WHOLE CHARACTER
3750 DEC A
3950 OUT (C),A ;WRITE NEW HORIZONTAL SCROLL VALUE TO VIC
4050 RET ;RETURN FROM IRQ
4150 SCROLL'ONE'CHAR EXX ;SWITCH TO ALTERNATE REGISTER SET TO NOT OVERWRITE CHAR POINTER IN HL
4160 LD HL,$0401 ;SOURCE ADDRESS
4250 LD DE,$0400 ;DESTINATION ADDRESS
4350 LD BC,39 ;NUMBER OF CHARACTERS TO COPY
4450 LDIR ;BLOCK-COPY INSTRUCTION TO SCROLL FIRST ROW ONE CHARACTER TO THE LEFT
4550 EXX ;SWITCH BACK TO NORMAL REGISTER SET
4850 INC HL ;INCREASE CHARACTER POINTER
4950 LD A,(HL) ;READ CHARACTER THAT POINTER POINTS TO
4960 CP @"@" ;COMPARE WITH "@" (INDICATES END OF SCROLL TEXT), THE FIRST @ INDICATES SCREEN CODE IN POWER ASSEMBLER
4965 CALL Z,RESTART'SCROLL ;JUMP IF END OF SCROLL TEXT REACHED
5050 LD ($0427),A ;WRITE NEW CHARACTER AT LAST POSITION ON FIRST ROW
5250 LD A,8 ;A DEC A INSTRUCTION AFTERWARDS RESETS THE $D016 REGISTER TO 7
5350 RET ;RETURN TO LINE NUMBER 3750
5450 RESTART'SCROLL LD HL,SCROLL'TXT ;RESET TEXT POINTER TO START OF SPACES BEFORE SCROLL-TEXT
5550 LD A,(HL) ;READ FIRST SPACE
5650 RET ;RETURN TO LINE NUMBER 5050
5750 SCROLL'TXT .SCR "                                      " ;38 SPACES, .SCR MEANS SCREEN CODES IN POWER ASSEMBLER (NOT PETSCII)
5850 ACTUAL'SCROLL'TXT .SCR "THE Z80 HAS WOKEN UP AND IS GIVING YOU THIS SCROLLY MESSAGE.@"
6050 * =$FFD2
6150 .BYT $BE ;CHANGE BANK 0 TO BANK 2 IN 8502->Z80 ROUTINE TO AVOID Z80 BOOT ROM AT ADR 0
6250 * =$FFEE
6350 JP $3000 ;THE Z80 RESUMES AT $FFEE, LET'S PUT A JUMP INSTRUCTION TO OUR CODE THERE

hydrophilic

Cool

Only thing that confused me is  LD A,(ACTUAL'SCROLL'TXT).  Looking at it, that seems to be loading the accumulator from memory specified by a label? I've never seen a label with apostrophes before!

One optimization you could do is in the clear screen routine by using LDIR
LD HL,$0400 ;SOURCE ADDRESS
LD (HL),$20 ;fill first character with space
LD DE,$0401 ;DESTINATION ADDRESS
LD BC,999 ;NUMBER OF (remaining) CHARACTERS TO COPY
LDIR
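
The trick works because the source and destination overlap by one byte, so each byte LDIR writes becomes the source of the next copy. A minimal Python model of that behaviour follows; the ldir function is a simplified sketch of the instruction (no timing, no flags) and the junk value $55 is just an arbitrary filler.

```python
def ldir(mem, hl, de, bc):
    """Model of Z80 LDIR: copy (HL) to (DE), step both up, repeat BC times."""
    while True:
        mem[de] = mem[hl]
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return hl, de, bc

SCREEN = 0x0400
mem = bytearray(0x10000)
mem[SCREEN:SCREEN + 1001] = b"\x55" * 1001   # junk, plus one byte past the screen

mem[SCREEN] = 0x20                           # LD (HL),$20
ldir(mem, SCREEN, SCREEN + 1, 999)           # LD DE,$0401 / LD BC,999 / LDIR

print(all(b == 0x20 for b in mem[SCREEN:SCREEN + 1000]))  # whole screen is spaces
print(hex(mem[SCREEN + 1000]))                            # byte after it is untouched
```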

I know that won't work for color memory at $1000 because you have to use OUT, but I always wondered if you could use LD (HL),A if HL is the 'normal' address = $D800?

Christian Johansson

Yes, in Power Assembler (Buddy) it's possible to have labels containing apostrophes. Thank you for the optimization tip!

cpu

Hi, all, first

Quote from: David Murray on January 17, 2007, 04:24 AM
I know the topic has been brought up before and I've seen people argue over it.  But I still would like to see how fast a program written for the 128 for operating on the Z80 would actually run.  I mean something like a demo or a GUI or something like that.   I realize the CPU has a few bottleneck issues, but I bet it could do some hardcore math better than the 6502 code.

In the C128 architecture the Z80 does not perform well. From Wikipedia:

"The Z80 machine cycles are sequenced by an internal state machine which builds each M-cycle out of 3,4,5 or 6 discrete steps (i.e. clock cycles) depending on context. This avoids cumbersome asynchronous logic and makes the control signals behave consistently at a wide range of clock frequencies. Naturally, it also means that a higher frequency crystal must be used than without this subdivision of machine cycles (approximately 2-3 times higher). It does not imply tighter requirements on memory access times however, as a high resolution clock allows more precise control of memory timings and memory therefore can be active in parallel with the CPU to a greater extent (i.e. sitting less idle), allowing more efficient use of available memory performance. For instruction execution, the Z80 combines two full clock cycles into a long memory access period (the M1-signal) which would typically last only a fraction of a (longer) clock cycle in a more asynchronous design (such as the 6800, or similar)."


What does that mean? That running at 4 MHz is as 'normal' for a Z80 as running at 1 MHz is for a 6502.
Comparing the performance of the two processors is not an easy task: their architectures differ too much. What can be said is that in the majority of situations a Z80 at 4 MHz performs on average 10-30% better than a 65xx or 85xx running at 1 MHz. (We are talking about the common clock speeds of the old days; today these CPUs can run at several hundred MHz, but that is of no use here.)

Of course, because on the C128 the Z80 wastes half of its time doing nothing, you end up with a very slow CPU compared to the 8502, especially in fast mode.

So IMHO there is no real reason to do Z80 asm programming on the C128, even if you can do 16-bit math a bit more easily...
(That's my point of view, of course)


airship

I know, bumping an old thread.  ;/

Quote from: Christian Johansson on January 31, 2007, 05:18 AM
With the Z80, you can do some things more easily. You can for example do 16-bit arithmetic and you can copy a block of data with just one instruction.

So this raises the question: Would it make sense to use the Z80 to do 16-bit math and move blocks of data? Or is it faster to go ahead and use 6502 routines for these tasks?
Serving up content-free posts on the Interwebs since 1983.
History of INFO Magazine

hydrophilic

Quote from: airship
Or is it faster to go ahead and use 6502 routines for these tasks?
You have an REU (jealous) so let it do the moving if raw speed is the concern!  Seriously,  the LDIR instruction takes 21 cycles/byte.  These are (effectively) 2MHz cycles so about 10.5us / byte.  The generic 6502 code is like

loop LDA (s),Y
STA (d),Y
INY
BNE loop

That's (5+6+2+3) = 16 cycles/byte.  At 1MHz (VIC screen) this is about 16us/byte so the Z80 wins.  But at 2MHz (VDC only) this is only 8us/byte so the 6502 wins!
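
The conversion from cycles to microseconds can be sketched in a few lines of Python. The cycle counts and the "effectively 2MHz" figure for the Z80 are taken from the post above; the function name us_per_byte is made up for illustration.

```python
def us_per_byte(cycles, clock_mhz):
    """Time per byte in microseconds for a loop costing `cycles` per byte."""
    return cycles / clock_mhz

z80_ldir = us_per_byte(21, 2.0)                  # LDIR at an effective 2 MHz
m6502_slow = us_per_byte(5 + 6 + 2 + 3, 1.0)     # generic loop at 1 MHz
m6502_fast = us_per_byte(5 + 6 + 2 + 3, 2.0)     # generic loop at 2 MHz

print(f"Z80 LDIR    : {z80_ldir} us/byte")
print(f"6502 @ 1 MHz: {m6502_slow} us/byte")
print(f"6502 @ 2 MHz: {m6502_fast} us/byte")
```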

Of course there are other factors to consider, like page-crossing and less-than-a-page situations.  However, that was the "generic" 6502 version.  You can save 2 cycles/byte by using absolute indexed addressing (which requires static addresses or self-modifying code).  You can save an additional cycle/byte with zero-page relocation or 2 cycles/byte with stack-page relocation.  Imagine:

loop LDA abs,Y
PHA
INY
BNE loop

Assuming 'abs' is page-aligned, that's 12us / byte at 1MHz, close to the Z80, or 6us / byte at 2MHz, smoking the Z80.  Of course you must consider the overhead of manipulating the MMU in this case.  Then again, unless your program is entirely Z80 code, you must consider the Z80 calling overhead...

It's really a design consideration.  If you're moving a lot of data frequently, the Z80 sounds like a good choice, especially if using the VIC.

I wonder about 16-bit math.  With fast multiply routines, it seems like 6502 would beat or tie the Z80 since the Z80 is a bit slow at reading tables.  Otherwise I think the Z80 would win by a fraction...


LD HL,(a1)
EX DE,HL
LD HL,(a2)
ADD HL,DE
LD (a3),HL

That's 16+4+16+11+16 = 63 cycles or about 31.5us.  Compare with

CLC
LDA <a1
ADC <a2
STA <a3
LDA >a1
ADC >a2
STA >a3

That's 2+6*3 = 20 cycles (20us at 1MHz) if all zero-page addresses or 2+6*4 = 26 cycles (26us) if all non-zero page addresses.  I was wrong!  The 6502 smokes the Z80!  (we won't even mention the fact the 6502 can do this at double speed if needed).
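
Both snippets compute the same 16-bit sum; the 6502 just does it one byte at a time and passes the carry along. A minimal Python model of the two approaches follows. The function names are made up for illustration, and the model ignores every flag except carry.

```python
def adc8(a, b, carry):
    """8-bit add with carry in; returns (result, carry out), like 6502 ADC."""
    total = a + b + carry
    return total & 0xFF, total >> 8

def add16_bytewise(x, y):
    """CLC, then ADC on the low bytes, then ADC on the high bytes."""
    lo, carry = adc8(x & 0xFF, y & 0xFF, 0)
    hi, carry = adc8(x >> 8, y >> 8, carry)
    return (hi << 8) | lo, carry

def add16_pair(x, y):
    """One 16-bit add, like Z80 ADD HL,DE; returns (result, carry out)."""
    total = x + y
    return total & 0xFFFF, total >> 16

for x, y in ((0x12FF, 0x0001), (0xFFFF, 0x0001), (0x1234, 0x4321)):
    print(hex(x), hex(y), add16_bytewise(x, y) == add16_pair(x, y))
```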

Now Z80 fans might cry foul because my example uses absolute addressing while most Z80 progs are written with register addressing which is generally faster.  On the other hand, the 6502 was using simple (non-indexed) addressing.  Only a few extra cycles are needed by the 6502 to add indexing while the Z80 would need several extra cycles.

I worry I'm short-changing the Z80 so let's try multiply by 10.

LD HL,(s)
LD D,H
LD E,L ;DE=HL
ADD HL,HL ;*2
ADD HL,HL ;*4
ADD HL,DE ;*5
ADD HL,HL ;*10
LD (d),HL

That's 16+4*2+11*4+16 = 84 cycles or about 42us.  Compare with

LDA >s
STA >t
STA >d
LDA <s
STA <t ;t = s
ASL A
ROL >d ;*2
ASL A
ROL >d ;*4
ADC <t
PHA
LDA >d
ADC >t
STA >d
PLA ;*5
ASL A
STA <d
ROL >d ;*10

Oh boy, that's 5*3+7*2+4*3+7+3+7 = 58 cycles if all zero-page addresses.  Or 5*4+8*2+4*4+7+4+8 = 71 cycles if all not zero-page addresses.  So in the worst case (1MHz, all non-zp) the 6502 takes about 71us so the Z80 wins by 69%.  In the best case (2MHz, all zp) the 6502 takes 58/2 = 29us so the 6502 wins by 45%.  So here, again, it's going to be a design consideration.
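
Both routines use the same shift-and-add scheme: 10x = ((x*2*2) + x) * 2. Here is a minimal Python model of the register steps in the Z80 version, with 16-bit wrap-around; the function name times10 is made up for illustration.

```python
def times10(x):
    """Multiply by 10 using only 16-bit adds, mirroring the Z80 listing."""
    hl = x & 0xFFFF
    de = hl                    # LD D,H / LD E,L   (DE = HL)
    hl = (hl + hl) & 0xFFFF    # ADD HL,HL         (*2)
    hl = (hl + hl) & 0xFFFF    # ADD HL,HL         (*4)
    hl = (hl + de) & 0xFFFF    # ADD HL,DE         (*5)
    hl = (hl + hl) & 0xFFFF    # ADD HL,HL         (*10)
    return hl

for value in (7, 1234, 6553, 7000):
    print(value, "->", times10(value))
```

Inputs above 6553 overflow 16 bits and wrap, just as they would in either assembly version.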

Based on these examples, I conclude you should seriously consider the Z80 if using 1MHz mode.  Otherwise stick with the 6502.

Of course there is also the issue of code size.  Many bytes for the 6502 code but only about 1/2 or 1/3 as many with Z80 code. 

So many factors in software design!

I'm wondering how well the Z80 would do at floating-point math...

airship

Thanks, HP. I'm always skeptical when I hear an assertion without backup, so 'With the Z80, you can do some things more easily. You can for example do 16-bit arithmetic and you can copy a block of data with just one instruction.' set off alarms in my head. But I wasn't smart enough to do the math. I'm glad you are. :)

And yes, my REU will do data transfers quite nicely, thank you. But I'm always curious as to how the C128's 'natural resources' might be used for more power and capability. The idea of somehow leveraging the Z80, or a 1541, or whatever as a math co-processor has appeal, but I don't think it generally has legs. Still, for some applications, maybe...

BigDumbDinosaur

Something to consider: while the Z80 is busy doing its thing, what do *you* plan to do about interrupts?
x86?  We ain't got no x86.  We don't need no stinking x86!

cpu



Quote from: hydrophilic
LD HL,(a1)
EX DE,HL
LD HL,(a2)
ADD HL,DE
LD (a3),HL

That's 16+4+16+11+16 = 63 cycles or about 31.5us.

The Z80 code could be written to run faster:
LD HL,(a1)
LD DE,(a2)
ADD HL,DE
LD (a3),HL

without the EX DE,HL, which eats considerable time.
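
As a sanity check on the two variants, here is a quick Python tally using the standard Z80 T-state figures. The 16, 4 and 11 cycle values match those quoted earlier in the thread; the 20 T-states for the ED-prefixed LD DE,(nn) is taken from the usual Z80 timing tables and is not stated in the thread. The table and helper names are made up for illustration.

```python
# T-states per instruction, from standard Z80 timing tables.
T_STATES = {
    "LD HL,(nn)": 16,
    "LD DE,(nn)": 20,   # ED-prefixed form, one byte longer than LD HL,(nn)
    "EX DE,HL": 4,
    "ADD HL,DE": 11,
    "LD (nn),HL": 16,
}

def cost(sequence):
    """Total T-states for a list of instruction names."""
    return sum(T_STATES[name] for name in sequence)

with_ex = ["LD HL,(nn)", "EX DE,HL", "LD HL,(nn)", "ADD HL,DE", "LD (nn),HL"]
without_ex = ["LD HL,(nn)", "LD DE,(nn)", "ADD HL,DE", "LD (nn),HL"]

print("with EX DE,HL   :", cost(with_ex), "T-states")
print("without EX DE,HL:", cost(without_ex), "T-states")
```

With these figures the two sequences come out level, so the gain from dropping EX DE,HL may be smaller than it looks; it is worth measuring on real hardware before relying on either version.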

About multiplication, please consider that using tables eats memory. For a generic multiply routine it's not a suitable solution.
It's interesting to see that most Z80 systems normally run at 3-4 MHz. At that speed (the common one) the results are very different.
But unfortunately the design of the C128 did not allow Commodore's engineers to integrate the Z80 any better.