Challenge - make this routine faster/shorter

LokalHorst · July 24, 2010, 01:47 AM

Inspired by the VDC questions lately, I've hacked in a small routine which multiplies the value of A by 80 decimal. The version following below uses exactly 32 bytes.
I know that a table lookup would be faster, but it consumes n times 2 bytes memory (50 bytes for a 25line lookup), aswell as n times add 80d to the result in a loop would be shorter, but definitely slower.
Anyone who can explain "how does this multiply with right-shifts = division by 2 ?" receives a rep.-point

;ZP-adresses used
ResLo = $FC
ResHi = $FD

;on entry A is the multiplicator, the result is returned in ResLo = lo byte and ResHi = hi byte (ResHi also in A on return)
;X and Y are used to hold intermediate results

Mul80:
ldx #$00
stx ResLo
lsr
ror ResLo
lsr
ror ResLo
ldy ResLo
sta ResHi
lsr
ror ResLo
lsr
ror ResLo
tax
tya
adc ResLo
sta ResLo
txa
adc ResHi
sta ResHi
RTS

BigDumbDinosaur · July 24, 2010, 02:32 AM

For a more generalized approach that can multiply any two 8 bit numbers, how about this one:

;DOUBLE-PRECISION MULTIPLICATION -- PRODUCT IN FAC1
;
; -----------------------
; .X = 8 bit multiplicand
; .Y = 8 bit multiplier
; -----------------------
; .A = used
; .X = used
; .Y = entry value
; -----------------------
;
dpmult lda #0
sta fac1+1 ;clear product MSB
stx fac1 ;save operands
sty fac2
ldx #8 ;bits to multiply
;
dpmult01 asl a ;shift partial product LSB
rol fac1+1 ;rotate partial product MSB
asl fac2 ;shift multiplier
bcc dpmult02 ;skip add if next bit is zero
;
clc ;add partial product to...
adc fac1 ;multiplicand
bcc dpmult02
;
inc fac1+1 ;bump partial product MSB
;
dpmult02 dex
bne dpmult01 ;next bit
;
sta fac1 ;save product LSB
rts
;

In the above, FAC1 (two bytes) and FAC2 (one byte) are located on zero page. The routine itself is exactly 30 bytes.

LokalHorst's routine does have the advantage of faster execution. Incidentally, direction of shift doesn't matter if the total shifts end up being the same. The routine could be made to work with left shifts, but there's no advantage one way or the other.

LokalHorst · August 10, 2010, 04:27 AM

Quote from: BigDumbDinosaur on July 24, 2010, 02:32 AM
...
Incidentally, direction of shift doesn't matter if the total shifts end up being the same. The routine could be made to work with left shifts, but there's no advantage one way or the other.

Yes, if the # of shifts is the same, but in my specific case the version with asl rol would need 6 shifts, two more (+6 bytes and 14 Tcycles) than the lsr ror variant. Anyway, I made a small change to the original version to free the use of the Y-register, which lengthens the execution time from 57 to 60 TCycles (w/o RTS) but this is more than worth it, because the calling program doesn't need to preserve the register.

;ZP-adresses used
ResLo = $FC
ResHi = $FD

;on entry A is the multiplicator, the result is returned in ResLo = lo byte and ResHi = hi byte (ResHi also in A on return)
;X and A used to hold intermediate results

Mul80:
ldx #$00
stx ResLo
lsr
ror ResLo
lsr
ror ResLo
ldx ResLo
sta ResHi
lsr
ror ResLo
lsr
ror ResLo
pha
txa
adc ResLo
sta ResLo
pla
adc ResHi
sta ResHi
RTS

(as no one had explained the little trick inside, I'll give a tip - the summing at the end adds A*64 + A*16)

The generic 8x8 bit multiply from above can also be modified to not use the Y-register:

;DOUBLE-PRECISION MULTIPLICATION -- PRODUCT IN FAC1
;
; -----------------------
; .A = 8 bit multiplicand
; .X = 8 bit multiplier
; -----------------------
; .A = used
; .X = used
; -----------------------
;
dpmult sta fac1
stx fac2 ;save operands
lda #0
sta fac1+1 ;clear product MSB
ldx #8 ;bits to multiply
;
dpmult01 asl a ;shift partial product LSB
rol fac1+1 ;rotate partial product MSB
asl fac2 ;shift multiplier
bcc dpmult02 ;skip add if next bit is zero
;
clc ;add partial product to...
adc fac1 ;multiplicand
bcc dpmult02
;
inc fac1+1 ;bump partial product MSB
;
dpmult02 dex
bne dpmult01 ;next bit
;
sta fac1 ;save product LSB
rts
;

LokalHorst · August 10, 2010, 04:33 AM

Although there seem to be not much interest, here's another short'n'quick:

This lil' routine converts a BCD number in A to it's binary equivalent, returned in A

tmp = $FB

bcd2bin:
PHA
AND #$F0
LSR
STA tmp
LSR
LSR
ADC tmp
STA tmp
PLA
AND #$0F
ADC tmp
RTS

Hydrophilic · August 16, 2010, 10:52 PM

Keep up the good work, LokalHorst!

Althogh I have no improvement on your "multiply by 80" or "BCD to binary" routines, these things are welcomed by me. For example, in my "3d engine" I often have to multiply by 128, 160, or 320 (depending on "video mode"). I have written several versions, and I don't think any are perfect. When I get a chance, I will do a comparison... your "multiply by 80" seems VERY similar to my "multiply by 160", for example.

Challenge - make this routine faster/shorter

LokalHorst

BigDumbDinosaur

LokalHorst

LokalHorst

Hydrophilic