C128 PHA memory-fill with relocatable stack

Started by XmikeX, December 15, 2010, 10:43 AM

XmikeX

A little cold spell in my area kept me from doing anything but minimal work and reading C128 programmer's guides online.

...and just then (!) I ran into the $D509 MMU register on the C128.  It handles stack relocation, which I had never tried before.  Out of fear, perhaps?
Sure, I had a vague, distant memory of this bit of MMU magic, but today I finally decided to do something about it. =)

The following code fills the VIC-II 40x25 screen with a single byte value.

    SEI                 - set the interrupt-disable flag: an IRQ would push onto the relocated stack and mess up our fill
    LDY #$05            - countdown for the Y loop below: 5 pages, $04xx-$08xx, fills the screen without garbage bytes (see below **)

STACKBLAH
    LDA #$SCREENPAGE    - set #$SCREENPAGE to #$04 if $0400 is the start of your VIC-II screen memory (the default); the self-modifying increment below changes this operand as the program runs
    STA $D509           - point the MMU's page 1 (stack) pointer at that page
    LDA #$XX            - XX = whatever character you want, e.g. #$53 = hearts
    LDX #$FF            - X counter: the inner loop pushes 255 ($FF) bytes, one short of a full page...
    PHA                 - ...so here's the extra one: 256 pushes in total

BLAH2
    DEX                 - decrement X
    PHA                 - push the #$XX character onto the relocated stack (PHA leaves the flags alone, so BNE still sees the DEX result)
    BNE BLAH2           - is X at 0?  if not, keep LOOPING!
    LDA $SCREENPAGE     - load the #$SCREENPAGE value from the memory location where it sits (the operand byte of the LDA above)
    CLC                 - clear carry first, otherwise ADC may add 2 instead of 1
    ADC #$01            - add one to that value (this selects the next page of screen memory)
    STA $SCREENPAGE     - self-modifying code: store the new value back where the old #$SCREENPAGE sits
EDIT     ^^^^ this ADC business is ugly. Replace it with INC $SCREENPAGE.  I forgot about the 6502's handy INC $xxxx. =)

    DEY                 - decrement Y from the LDY above.  Filling screen memory needs at least 4 pages (see below **); the LDY/DEY loop wraps the rest of the program and runs it as many times as we need (within the 8-bit counter limit)
    BNE STACKBLAH       - is Y at 0?  if not, keep LOOPING!
    CLI                 - clear the interrupt-disable flag: allow IRQs again
    BRK                 - exit with BRK, which is handy for testing code from the ML monitor

** LDY #$05 above lets the whole screen be overwritten without garbage from normal stack activity showing up in the visible area.  (In other words, the final relocation leaves the stack in page $08, the memory page just after the visible screen.)  Change it to LDY #$04 to watch the active stack at the bottom of the screen.
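To see why 256 pushes cover a whole page no matter where the stack pointer starts, and where the stack ends up afterwards, here is a rough Python model of the two loops. It is only a sketch, not C128 code: the function name `stack_fill`, the flat 64K `bytearray` and the starting SP of $F6 are assumptions made up for illustration.

```python
def stack_fill(first_page=0x04, pages=5, fill=0x53, sp=0xF6):
    """Model the outer/inner loops; returns (memory, final stack page, final SP)."""
    mem = bytearray(0x10000)          # 64K of zeroed RAM, nothing C128-specific
    stack_page = 0x01                 # normal stack page before any relocation
    page = first_page                 # the self-modified LDA operand
    for _ in range(pages):            # LDY #pages ... DEY / BNE
        stack_page = page             # STA $D509
        for _ in range(256):          # one PHA up front + 255 in the DEX/PHA/BNE loop
            mem[(stack_page << 8) | sp] = fill
            sp = (sp - 1) & 0xFF      # SP is 8 bits, so it wraps inside the page
        page += 1                     # INC SCREENPAGE
    return mem, stack_page, sp

mem, last_page, sp = stack_fill()
print(hex(last_page), hex(sp))        # final stack page and SP
```

With pages=5 the model leaves the stack in page $08 with SP back at its starting value; with pages=4 it ends in page $07, which is the visible bottom of the screen.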

FOR THOSE WITHOUT AN ASSEMBLER : RAW OUTPUT FROM ML MONITOR (enter this in "Bank 15" or "F" configuration)
--
.F3000 SEI
.F3001 LDY #$05
.F3003 LDA #$04
.F3005 STA $D509
.F3008 LDA #$53
.F300A LDX #$FF
.F300C PHA
.F300D DEX
.F300E PHA
.F300F BNE $300D
.F3011 LDA $3004   --
.F3014 CLC         --  or replace these four with INC $3004
.F3015 ADC #$01    --
.F3017 STA $3004   --
.F301A DEY
.F301B BNE $3003
.F301D CLI
.F301E BRK
--
Use G F3000 to execute from ML MONITOR

Remember, PHA (PusH Accumulator) takes 3 cycles to execute vs. 4 for an absolute STA and 5 for STA abs,X (zero-page STA is 3).  MMU "tricks" like relocating zero page or the stack for a memory fill (as seen above) may be useful in certain circumstances.
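For a feel of what those counts mean per page once loop overhead is included, here is some back-of-the-envelope arithmetic in Python. The cycle counts are the documented 6502 ones; the helper name `loop_cycles` is made up, and the sketch assumes no branch crosses a page boundary and ignores cycles stolen by the VIC-II.

```python
# Documented 6502/8502 cycle counts (VIC-II cycle stealing ignored).
CYCLES = {
    "PHA": 3, "STA abs": 4, "STA abs,X": 5, "STA zp": 3, "STA zp,X": 4,
    "STA (zp),Y": 6, "DEX": 2, "INX": 2, "BNE taken": 3, "BNE not taken": 2,
}

def loop_cycles(body, passes):
    """Cycles for `passes` runs of body + BNE; the last BNE falls through."""
    per_pass = sum(CYCLES[op] for op in body) + CYCLES["BNE taken"]
    return passes * per_pass - (CYCLES["BNE taken"] - CYCLES["BNE not taken"])

# Stack fill: one PHA up front, then 255 passes of DEX / PHA / BNE.
pha_page = CYCLES["PHA"] + loop_cycles(["DEX", "PHA"], 255)

# Conventional fill: 256 passes of STA abs,X / INX / BNE.
sta_page = loop_cycles(["STA abs,X", "INX"], 256)

print(pha_page, sta_page)  # cycles per 256-byte page
```

Under those assumptions the stack fill comes to 2042 cycles per page against 2559 for the indexed STA loop, so the saving is real but smaller than the raw 3-vs-5 opcode timing suggests.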

MMU tricks are very old news to some of you and so I'll apologize in advance for any inefficiencies or errors which may have occurred during the course of this text. =)

Cheers,
XmX

XmikeX

#1
A more polished version of the above, with some added bells and whistles, is attached here.

XmX

2,133 byte, zipped .d64 follows.

Hydrophilic

Yeah, it's useful for sequential access if you can live without interrupts for a moment.  It's a real problem otherwise.

For random access, page 0 relocation is useful because STA z,X takes 4 cycles, unlike STA n,X [5] or STA (z),Y [6].  Other opcodes are available too.  Of course, it has the problem of no $00 or $01 access.

In either case, I thought I should point out that both tricks also work well in Bank 1 but only if you disable Common RAM at the bottom of memory.
I'm kupo for kupo nuts!

XmikeX

#3
In case anyone else is wondering, the ZP fill that Hydrophilic mentioned (STA $zp,x) can work as follows:

...for illustrative purposes on the 40x25 screen; remember, the typical VIC-IIe screen memory range is $0400-$07FF...

Paste into the ML monitor:
.f3000 ldy #$04         <--- set Y to the number of 256-byte pages you want to fill
.f3002 ldx #$02         <--- set X to #$02 to avoid $00/01 in sta $00,x below
.
.For sanity's sake, we're avoiding the CPU's built-in I/O ports at $00/01.  They are always visible while 8502 is active and therefore all references to/from $00/01 in Zero Page or Relocated Zero Page will affect these I/O ports.
.
.f3004 lda #$04         <--- we need to set up a start page for ZP relocation, so we load A with #$04 and ..
.f3006 sta $d507       <--- relocate zero page to start at $0400 here.
.f3009 lda #$00         <--- load A with our fill-byte, which is #$00, representing "@"
.f300b sta $00,x        <--- start at $02 ($00+02) and work up to $FF .. avoid $00/01
.f300d inx                   <--- increment X so that we can work up to $FF in the same page
.f300e bne $300b      <---  this checks if X=0, since X will overflow to #$00, and then we can leave the loop
.f3010 inc $3005        <--- increment the memory location holding #$04 above, so that we can score a new page to fill (#$04 becomes #$05.. page pointer sets to $0500..etc)
.f3013 dey                 <--- decrement Y from above.. Y keeps track of how many pages we want to fill.
.f3014 bne $3002      <--- this checks if Y=0, once Y hits #$00, we can leave this loop
.f3016 lda #$00         <--- we're done with our loops, so we load A with #$00
.f3018 sta $d507       <--- and stash A here to set zero page to its normal spot
.
Are we done?  not yet...
.
Since we've just safely relocated zero page to its usual spot, we can fill in these missing $00/01 areas very simply :
.
.f301b sta $0400       <--- patch in the $00/01 bytes of each page that we avoided earlier
.f301e sta $0401       <--- these sta's reuse the #$00 we just loaded into A, because #$00 is also our fill-byte
.f3021 sta $0500       <--- with any other fill-byte, load it into A before applying these sta's
.f3024 sta $0501
.f3027 sta $0600
.f302a sta $0601
.f302d sta $0700
.f3030 sta $0701
.f3033 rts                   <--- return from subroutine

j f3000        to execute within monitor
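As a sanity check on which bytes the relocated-ZP loop misses, here is a small Python model of the routine above. It is a sketch only: `zp_fill` and the $FF-filled 64K `bytearray` are hypothetical stand-ins, and the real $00/$01 I/O port behaviour is reduced to "the loop never writes offsets 0 and 1".

```python
def zp_fill(first_page=0x04, pages=4, fill=0x00):
    """Model the STA $00,X loop per page, then patch the bytes it skipped."""
    mem = bytearray(b"\xff" * 0x10000)        # $FF marks "not written yet"
    for page in range(first_page, first_page + pages):   # INC $3005 / DEY / BNE
        for x in range(0x02, 0x100):          # LDX #$02 ... INX / BNE
            mem[(page << 8) | x] = fill       # STA $00,X lands in the relocated page
    lo, hi = first_page << 8, (first_page + pages) << 8
    skipped = [addr for addr in range(lo, hi) if mem[addr] != fill]
    for addr in skipped:                      # the absolute STA patches at the end
        mem[addr] = fill
    return mem, skipped

mem, skipped = zp_fill()
print([hex(a) for a in skipped])              # the addresses that need patching
```

The skipped list comes out as the same eight addresses the listing patches with absolute STA instructions ($0400/$0401 through $0700/$0701).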
--

Benefits of ZP relocation and STA $zp,X vs. STACK relocation and PHA, as noted by Hydrophilic:
    - IRQs can still happen while we are messing with ZP, since the stack stays put (a handler that uses zero-page variables will see the relocated page, though)
    - we can do sequential or random access at 4 cycles per store
--

Unfortunately, MMU STACK relocation and PHA is still a very appealing 3 cycles per byte.

Fortunately, STA $zp (no index) is also 3 cycles, so we can trade memory for speed.
For example, we can set up a "table" of unrolled STA $zp instructions, instead of looping over STA $zp,X.  We can then jump to this "table" wherever and whenever needed.

...jump here at start of "table" or anywhere within the "table"
...With respect to sta $zp, your "table" can be organized beforehand to suit your needs.  It's not necessary to have sta $02..03..04.. as I've laid it out.
sta $02      <-- 3 cycles per sta $zp
sta $03   
sta $04
sta $05
[...]
sta $fc
sta $fd
sta $fe
sta $ff
...jump back to wherever, or return to whatever called the "table"

We now have the benefit of 3 cycles per byte and no cycles lost to loop-activity.  Setting up a "table" for PHA works as well, but the speed gains for PHA will only come from being outside a loop since each PHA remains at 3 cycles.

The slight disadvantages of "tables" are that they must be set up beforehand, and they do eat up some memory.
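To put numbers on that trade-off, here is a sketch in Python that builds the raw bytes of such a "table" and compares its cost with the STA $zp,X loop. The opcodes ($85 = STA zero page, $60 = RTS) are standard 6502; `build_table` and `entry_offset` are made-up helper names, and ending the table with RTS assumes it is called with JSR.

```python
STA_ZP, RTS = 0x85, 0x60          # 6502 opcodes: STA zero page, RTS

def build_table(first=0x02, last=0xFF):
    """Machine code for STA first .. STA last, followed by RTS."""
    code = bytearray()
    for zp in range(first, last + 1):
        code += bytes([STA_ZP, zp])
    code.append(RTS)
    return bytes(code)

def entry_offset(zp, first=0x02):
    """Byte offset into the table where the STA for address `zp` starts."""
    return 2 * (zp - first)

table = build_table()
stores = (len(table) - 1) // 2            # number of STA instructions
unrolled_cycles = stores * 3              # 3 cycles per STA $zp
looped_cycles = stores * (4 + 2 + 3) - 1  # STA $zp,X + INX + BNE, last BNE not taken

print(len(table), unrolled_cycles, looped_cycles)
```

For $02-$FF that is 509 bytes of code for 762 cycles of stores (plus the JSR/RTS), against 2285 cycles for the loop. Jumping in at `entry_offset(zp)` starts the fill partway through the page.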

XmX

PS: No one pointed out that ugly ADC business (and other trifles) in my original post.  It's been a while since I've messed with this stuff, and I momentarily forgot about some things .... like INC $xxxx. =)

PPS: Thanks to iAN_CooG for reminding me of certain properties of ADC (CLC, ADC).

Hydrophilic

I like your idea of
STA $02
STA $03
STA $04
...

I've used ZP redirect with indexing for bitmap graphics and was doing something like this
STA $02,X
STA $0A,X
STA $12,X
STA $1A,X
...
To fill a horizontal line (X is the raster within the cell, 0~7).  If I ever get back to that project, I'll have to try 8 sets of your idea!