Challenge - make this routine faster/shorter

Started by LokalHorst, July 24, 2010, 01:47 AM

Previous topic - Next topic

0 Members and 3 Guests are viewing this topic.

LokalHorst

Inspired by the VDC questions lately, I've hacked in a small routine which multiplies the value of A by 80 decimal. The version following below uses exactly 32 bytes.
I know that a table lookup would be faster, but it consumes n times 2 bytes memory (50 bytes for a 25line lookup), aswell as n times add 80d to the result in a loop would be shorter, but definitely slower.
Anyone who can explain "how does this multiply with right-shifts = division by 2 ?" receives a rep.-point  ;D

;ZP-adresses used
ResLo = $FC
ResHi = $FD

;on entry A is the multiplicator, the result is returned in ResLo = lo byte and ResHi = hi byte (ResHi also in A on return)
;X and Y are used to hold intermediate results

Mul80:
ldx #$00
stx ResLo
lsr
ror ResLo
lsr
ror ResLo
ldy ResLo
sta ResHi
lsr
ror ResLo
lsr
ror ResLo
tax
tya
adc ResLo
sta ResLo
txa
adc ResHi
sta ResHi
RTS


BigDumbDinosaur

For a more generalized approach that can multiply any two 8 bit numbers, how about this one:

;DOUBLE-PRECISION MULTIPLICATION -- PRODUCT IN FAC1
;
;    -----------------------
;    .X = 8 bit multiplicand
;    .Y = 8 bit multiplier
;    -----------------------
;    .A = used
;    .X = used
;    .Y = entry value
;    -----------------------
;
dpmult   lda #0
         sta fac1+1            ;clear product MSB
         stx fac1              ;save operands
         sty fac2
         ldx #8                ;bits to multiply
;
dpmult01 asl a                 ;shift partial product LSB
         rol fac1+1            ;rotate partial product MSB
         asl fac2              ;shift multiplier
         bcc dpmult02          ;skip add if next bit is zero
;
         clc                   ;add partial product to...
         adc fac1              ;multiplicand
         bcc dpmult02
;
         inc fac1+1            ;bump partial product MSB
;
dpmult02 dex
         bne dpmult01          ;next bit
;
         sta fac1              ;save product LSB
         rts
;


In the above, FAC1 (two bytes) and FAC2 (one byte) are located on zero page.  The routine itself is exactly 30 bytes.

LokalHorst's routine does have the advantage of faster execution.  Incidentally, direction of shift doesn't matter if the total shifts end up being the same.  The routine could be made to work with left shifts, but there's no advantage one way or the other.
x86?  We ain't got no x86.  We don't need no stinking x86!

LokalHorst

#2
Quote from: BigDumbDinosaur on July 24, 2010, 02:32 AM
...
Incidentally, direction of shift doesn't matter if the total shifts end up being the same.  The routine could be made to work with left shifts, but there's no advantage one way or the other.
Yes, if the # of shifts is the same, but in my specific case the version with asl rol would need 6 shifts, two more (+6 bytes and 14 Tcycles) than the lsr ror variant. Anyway, I made a small change to the original version to free the use of the Y-register, which lengthens the execution time from 57 to 60 TCycles (w/o RTS) but this is more than worth it, because the calling program doesn't need to preserve the register.

;ZP-adresses used
ResLo = $FC
ResHi = $FD

;on entry A is the multiplicator, the result is returned in ResLo = lo byte and ResHi = hi byte (ResHi also in A on return)
;X and A used to hold intermediate results

Mul80:
ldx #$00
stx ResLo
lsr
ror ResLo
lsr
ror ResLo
ldx ResLo
sta ResHi
lsr
ror ResLo
lsr
ror ResLo
pha
txa

adc ResLo
sta ResLo
pla
adc ResHi
sta ResHi
RTS

(as no one had explained the little trick inside, I'll give a tip - the summing at the end adds A*64 + A*16)


The generic 8x8 bit multiply from above can also be modified to not use the Y-register:

;DOUBLE-PRECISION MULTIPLICATION -- PRODUCT IN FAC1
;
;    -----------------------
;    .A = 8 bit multiplicand
;    .X = 8 bit multiplier
;    -----------------------
;    .A = used
;    .X = used
;    -----------------------
;
dpmult   sta fac1
         stx fac2              ;save operands
         lda #0
         sta fac1+1            ;clear product MSB
         ldx #8                ;bits to multiply
;
dpmult01 asl a                 ;shift partial product LSB
         rol fac1+1            ;rotate partial product MSB
         asl fac2              ;shift multiplier
         bcc dpmult02          ;skip add if next bit is zero
;
         clc                   ;add partial product to...
         adc fac1              ;multiplicand
         bcc dpmult02
;
         inc fac1+1            ;bump partial product MSB
;
dpmult02 dex
         bne dpmult01          ;next bit
;
         sta fac1              ;save product LSB
         rts
;



LokalHorst

#3
Although there seem to be not much interest, here's another short'n'quick:

This lil' routine converts a BCD number in A to it's binary equivalent, returned in A

tmp = $FB

bcd2bin:
PHA
AND #$F0
LSR
STA tmp
LSR
LSR
ADC tmp
STA tmp
PLA
AND #$0F
ADC tmp
RTS

Hydrophilic

Keep up the good work, LokalHorst!

Althogh I have no improvement on your "multiply by 80" or "BCD to binary" routines, these things are welcomed by me.  For example, in my "3d engine" I often have to multiply by 128, 160, or 320 (depending on "video mode").  I have written several versions, and I don't think any are perfect.  When I get a chance, I will do a comparison... your "multiply by 80" seems VERY similar to my "multiply by 160", for example.
I'm kupo for kupo nuts!