rAvsz}@@@ rtp` MsQ1e@9$XunBV`g4+" ` Mu >UgG@C\`j(Ce@ )$        !#%')+-/13579;=? A!C"E#G$I%K&M'O(Q)S*U+W,Y-[.]/_0a1c2e3g4i5k6m7o8q9s:u;w<y={>}?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abc ` ; FSCOPY - Fast String Copy ; A - Source BP ; B - Dest BP ; C - char count ; Note: Don't set KA/KLWINC less than 10 or things may screw up. ; actually range is > 5 for lsh'ing and > 9 for blt'ing. ; If running on a KL, call FSKLCV once before using FSCOPY. ; Various declarations and storage kawinc==12. ; # chars at which hairy word move starts winning over bp loop klwinc==18. ; # for KL flpac==6 ; # of first AC used by fast loop, which takes 7 ACs. $stent==1 ; offset from beg of loop for entry to STORE phase $gent==4 ; offset from beg of loop for entry to GET phase ; KL loop hacked here. This must be in impure storage. fscpkl: 0 ; a LSHC A, for SHIFT-IN goes here lsh a,1 movem a,(c) ; Address of dest stored in RH here 0 ; a LSHC A, for SHIFT-OUT goes here move b,(c) ; Address of source stored in RH here aobjn c,fscpkl jrst @fentrm(d) ; Space to save ACs in. savacs: block 14-1 ; accs 1 thru 13 sava14: 0 ;--------------------------------------------------------------------- ; Convert FSCOPY to speedier setup for KL (don't run in ACs) ; Do this before purifying. fsklcv: push p,[cail c,klwinc] pop p,fscopy push p,[jrst fscpy4] pop p,fscpy3 push p,[lshc a,@fscpkl] pop p,fscptl popj p, ;----------------------------------------------------------------------- ; Here we go.... fscopy: cail c,kawinc ; Less than break-even point? jrst fscpy2 ; No, use hairy word copy. pushae p,[a,b,c,d] ; Less than break-even, faster to use ildb d,a ; simple byte-by-byte copying. idpb d,b ; (actually, it would have been better to sojg c,.-2 ; not have called FSCOPY at all, popae p,[d,c,b,a] ; mainly due to this save-ac overhead!) popj p, ; Wheee, using hairy word copying! fscpy2: movem 14,sava14 ; This is a pain, but faster than using PDL, move 14,[1,,savacs] ; and we need to do something blt 14,sava14-1 ; since ACs will be massacred. ldb e,[360300,,a] ; get low 3 bits of P field for source skipge e,fschtb(e) ; Get resulting # chars, skip if addr ok movei a,1(a) ; P= 01, must bump address. movei 12,1(a) ; anyway, get addr+1 into 12. ldb d,[360300,,b] ; Repeat procedure for dest skipge d,fscht2(d) ; using slightly different table aosa 10,b ; and addr goes into 10 movei 10,(b) ; and isn't normally bumped. ; Now get index for shift values, and count for words subi c,(e) ; Get # chars minus those in 1st src wd. addi e,-6(d) ; Get E index - d*5+s, zero based. idivi c,5 ; find # words to loop through, rem in d. move b,-1(12) ; and get 1st word of source. jrst @fpath(e) ; Must now pick a path... ; BLT possible! Jump to fsblt0 if no shifting needed for setup. fsblt0: movem b,(10) ; store source word directly jrst fsblt4 fsblt: lsh b,@shasl(e) ; Shift source up against left move a,(10) ; get 1st wd of dest. lsh a,@shadr(e) ; right-adjust it lshc a,@shfix(e) ; and get everything into A. lsh a,1 ; need one more bit's worth. movem a,(10) ; Store 1st wd of dest... ; Now settle down to serious BLT'ing. fsblt4: movei 6,(c) ; transfer word count addi 6,(10) ; find addr of last dest word movei 10,1(10) ; Now get 1st dest addr, hrli 10,(12) ; and put 1st source addr in LH. blt 10,(6) ; Zoom!! jumpe d,fscpy9 ; if no remainder, super win - done! addi 12,(c) ; Hmm, must get last source word. move b,(12) ; like so. move a,fbmsk(d) ; and a word mask for chars and b,a ; clear unused bits from source, andcam a,1(6) ; and zap target bits in dest. iorm b,1(6) ; and stick last chars in. jrst fscpy9 ; OK, all done... ; Can't do BLT. Well, get A and B set up for magical shift loop. shskp2: lsh b,@shasl(e) ; Here, only need to adjust source, jrst shskp5 ; since dest will be totally clobbered. fsshft: lsh b,@shasl(e) ; Here, both src and dest must be integrated. shskp1: move a,(10) ; Here, only need adjust dest; src wd is full. lsh a,@shadr(e) shskp5: lshc a,@shfix(e) ; Stuff as many chars as possible into A. caie d,0 ; I `z f any remainder, movei c,1(c) ; add 1 more word. movni c,(c) ; Make AOBJN pointer. movsi c,(c) ; Now make another index for termination wrapup purposes. add d,ffindx(e) ; Make new index using # chs left in last wd. ; Now set things up for loop, and enter it. fscpy3: move 6,fshint(e) ; Get shift amount for shift-in move 7,[lsh a,1] ; This takes care of last bit without clobbering following chars hrli 10,(movem a,(c)) ; This stores word into dest move 11,fshout(e) ; Get shift amount for shift-out hrli 12,(move b,(c)) ; Get new word from source move 13,[aobjn c,flpac] ; loop for count of words. move 14,[jrst @fentrm(d)] ; then return to right wrapup rtn. jumpge d,flpac+$stent ; Now enter loop at either the store soja flpac+2,flpac+$gent ; or the get phase. fscpy4: hrli 10,(movem a,(c)) movem 10,fscpkl+2 ; Address for MOVEM hrrm 12,fscpkl+4 ; Address for MOVE move 12,fshint(e) ; Get LSH for shift-in movem 12,fscpkl move 12,fshout(e) ; and shift-out movem 12,fscpkl+3 jumpge d,fscpkl+$stent ; Depending on flag in D, enter loop at store sos 10,fscpkl+2 jrst fscpkl+$gent ; or at get. ;--------------------------------------------------------------------------- ; Come here when loop finished. The last word of the source string ; will be in B. It may have 1 to 5 chars left for moving, but will ; never have 0. ; Long wrapup. fscptl: lshc a,(6) ; Perform a shift-in lsh a,1 movem a,@10 ; Store full word. movei c,1(c) ; increment address index ; and drop through to Medium wrapup. ; Medium wrapup. fscptm: lshc a,@flout(d) ; Shift rest of source word into A move b,@10 ; Get dest word it will be stored into lsh b,@fladj(d) ; left-adjust chars to preserve. ; and drop thru to Short wrapup. ; Short wrapup. fscpts: lshc a,@fflout(d) ; Do final, last, shift-out. andcmi a,1 movem a,@10 ; and store last dest word. ; Done!! Just restore regs and return. fscpy9: move 14,[savacs,,1] blt 14,14 popj p, ; Indexed by low 3 bits of P field, returns # chars ; existing to right of loc BP points to. Hence value ; ranges from 5 to 1; if P = 01, SETZ indicates that ; bp address needs incrementing. fschtb: 1 ; P=10 setz 5 ; P=01, increment addr 0 ? 0 ; randomness 5 ; P=44, full word 4 ; P=35, 4 chars to go 3 ; P=26 2 ; P=17 ; This table is just like FSCHTB except values are pre-multiplied ; by 5 for easy addition into E. fscht2: 1*5 ; P=10 setz 5*5 ; P=01, increment addr 0 ? 0 ; random 5*5 ? 4*5 ? 3*5 ? 2*5 ; This table is indexed by D when it has # chars remaining from ; dividing # chars (in C) by 5. Provides mask for these chars. fbmsk: 0 ; Nothing here. .byte 7 177 ? 0 ? 0 ? 0 ? 0 177 ? 177 ? 0 ? 0 ? 0 177 ? 177 ? 177 ? 0 ? 0 177 ? 177 ? 177 ? 177 ? 0 .byte ; FPATH table vectors off to BLT and other minor stuff as ; soon as all the basic computations are made. ; Indexed by E. fpath: fsblt ? fsshft ? fsshft ? fsshft ? shskp1 fsshft ? fsblt ? fsshft ? fsshft ? shskp1 fsshft ? fsshft ? fsblt ? fsshft ? shskp1 fsshft ? fsshft ? fsshft ? fsblt ? shskp1 shskp2 ? shskp2 ? shskp2 ? shskp2 ? fsblt0 ; SHASL table, contains # bits to shift first source wd left so ; as to left-adjust it in B. Indexed by E. shasl: repeat 5, repeat 5,<4-.rpcnt>*7 ; ent 4,3,2,1,0 ; ent 4,3,2,1,0 ; ent 4,3,2,1,0 ; ent 4,3,2,1,0 ; ent 4,3,2,1,0 ; SHADR table, contains # bits to shift first dest wd right so ; as to right-adjust it in A. Indexed by E. shadr: repeat 5, %%%cnt==.rpcnt+1 ? repeat 5,[ ? 0,,-<%%%cnt*7+1>] ; ent -1,-1,-1,-1,-1 ; ent -2,-2,-2,-2,-2 ; ent -3,-3,-3,-3,-3 ; ent -4,-4,-4,-4,-4 ; ent -5,-5,-5,-5,-5 ; macro to make randomness more bearable. define ent a,b,c,d,e a*7 ? b*7 ? c*7 ? d*7 ? e*7 termin ; SHFIX table, contains # bits to left-shift A and B combined so ; as to move as many characters out of B as possible. Indexed ; by E. MIN(d,e) (d and e after fschtb) shfix: repeat 5,[%%%cnt==.rpcnt repeat 5,[ifle .rpcnt-%%%cnt, <1+.rpcnt>*7 .else <1+%%%cnt>*7 ]] ; ent 1,1,1,1,1 ; ent 1,2,2,2,2 ; ent 1,2,3,3,3 ; ent 1,2,3,4,4 ; ent 1,2,3,4,5 ; FSHINT table, containing appropriate LSHC instructions for shifting ; in the first chars of a fresh source word. Indexed by E. fshint: repeat 5,[%%%cnt==.rpcnt repeat 5,[ifle %%%cnt, %%%cnt==5 LSHC A,%%%cnt*7 %%%cnt==%%%cnt-1 ]] ; ent 5,4,3,2,1 ; ent 1,5,4,3,2 ; ent 2,1,5,4,3 ; ent 3,2,1,5,4 ; ent 4,3,2,1,5 ; FSHOUT table, containing appropriate LSHC instructions for shifting ; out the last chars of an old source word, to make room for a ; new one. Indexed by E. fshout: repeat 5,[%%%cnt==5-.rpcnt repeat 5,[ifge %%%cnt-5, %%%cnt==0 LSHC A,%%%cnt*7 %%%cnt==%%%cnt+1 ]] ; ent 0,1,2,3,4 ; ent 4,0,1,2,3 ; ent 3,4,0,1,2 ; ent 2,3,4,0,1 ; ent 1,2,3,4,0 ; FFINDX table, contains part of D index for fast add-in. ; Indexed by E. Similar to FSHOUT. Sign bit also indicates ; whether entry point is $STENT (pos) or $GENT (ne% g). ffindx: repeat 5,[%%%cnt==5-.rpcnt ? %%%cn2==.rpcnt repeat 5,[ifge %%%cnt-5, %%%cnt==0 ifle %%%cn2-.rpcnt,%%%cnt*5 .else setz %%%cnt*5 %%%cnt==%%%cnt+1 ]] ; ent5 0,1,2,3,4 ; ent5 4,0,1,2,3 ; ent5 3,4,0,1,2 ; ent5 2,3,4,0,1 ; ent5 1,2,3,4,0 ; ent s,s,s,s,s ; entry point flag (sign bit) ; ent g,s,s,s,s ; ent g,g,s,s,s ; ent g,g,g,s,s ; ent g,g,g,g,s define entx a,b,c,d,e ; Last item (5) is actually first (0) 7*e ? 7*a ? 7*b ? 7*c ? 7*d termin ; FENTRM table, dispatching to appropriate wrapup routine when fast AC ; loop is finished. Indexed by D. fentrm: define entxj a,b,c,d,e irp l,,[e,a,b,c,d] fscpt!l termin termin entxj m,m,m,m,s entxj m,m,m,s,l entxj m,m,s,l,l entxj m,s,l,l,l entxj s,l,l,l,l ; FLOUT table, for use by Medium wrapup routine; pushes out remaining ; source chars in B, making room for incoming dest word. ; Indexed by D. flout: entx 1,2,3,4,0 entx 1,2,3,0,1 entx 1,2,0,1,2 entx 1,0,1,2,3 entx 0,1,2,3,4 ; FLADJ table, also for Medium wrapup routine; adjusts dest word in ; B to left-adjust chars to be preserved. fladj: entx 1,2,3,4,5 entx 2,3,4,5,1 entx 3,4,5,1,2 entx 4,5,1,2,3 entx 5,1,2,3,4 ; FFLOUT table, for Short wrapup routine. Final Last shift-out of ; chars in B, so that the last dest word can be stored from A. ; Indexed by D. Adds 1 extra bit since MOVEM A, done right after it, ; and nothing to preserve in B. fflout: define entx1 a,b,c,d,e e*7+1 ? a*7+1 ? b*7+1 ? c*7+1 ? d*7+1 termin entx1 4,3,2,1,5 entx1 3,2,1,4,4 entx1 2,1,3,4,3 entx1 1,2,4,3,2 entx1 1,4,3,2,1 % !l SUBTTL Sqaure Root Procedure ; This is AGB's super fast SQRT routine, probably impossible to ;improve on... clobbers B,C. SQRT: SKIPG B,A POPJ P, ASH A,-1 ADD A,[262370613] ; 0.292893/0.840186 B8. ; or 0.414214/0.594101 B9. TLON A,400 JRST SQRT2 IFN $ITS, FMPRI A,301460 ; 0.594101^101 IFE $ITS, FMPR A,[301460,,0] ; PDP-6 doesn't have FMPRI !! JRST SQRT3 SQRT2: IFN $ITS, FMPRI A,300656 ; 0.840186^100 IFE $ITS, FMPR A,[300656,,0] SQRT3: MOVE C,B FDV B,A FAD A,B ? FSC A,-1 FDVR C,A FAD A,C ? FSC A,-1 ; MORE EXACT THAN FADR POPJ P, !l vThis archive directory is intended to hold various small hacks, primarily PDP-10 assembler routines for such things as byte pointer hacking and the like. For example, FSCOPY is a fast string mover; SQRT is a super fast SQRT routine. Some day various HAKMEM hacks may be reincarnated here.  v$; "Safe BLT" ; Takes U3 and U2 as args to a BLT moving core upwards, i.e. ; U3= ,, and U2=+ ; Problem is to avoid overwritting of source while transferring ; (paradigm is having dest addr=source addr+1 to clear core!) SAFBLT: PUSHAE P,[U1,U2,U3,U4] HRRZ U1,U3 ;get dest addr in u1 HLRZS U3 ;source addr in u3 MOVE U4,U2 SUBI U4,(U1) ;get length of block in u4 JUMPE U4,SAFBL8 ADDI U4,(U3) ;now u1= dest addr, u2= dest addr+ blocklen ; u3= src addr, u4= src addr + blocklen MOVEM U1,DSTSAV' MOVEM U3,SRCSAV' SUBI U1,(U3) ;get step size MOVEM U1,STPSIZ' ;save ;now lasts = u4 and lastd = u2, find news and newd MOVE U3,U4 MOVE U1,U2 SAFBL3: SUB U3,STPSIZ ;news = lasts-stpsiz SUB U1,STPSIZ ;newd = lastd-stpsiz CAMGE U3,SRCSAV ;compare news with original source addr JRST SAFBL4 ;aha, must truncate stpsiz for this last one. PUSH P,U1 HRL U1,U3 ;get news,,newd BLT U1,-1(U2) ; xfer from news to newd until reach lastd-1 POP P,U1 MOVE U2,U1 ;lastd = newd MOVE U4,U3 ;lasts = news JRST SAFBL3 ;loop til branch out SAFBL4: HRRZ U1,DSTSAV HRL U1,SRCSAV ;get original source,,dest BLT U1,-1(U2) ;xfer remaining stuffs SAFBL8: POPAE P,[U4,U3,U2,U1] POPJ P, $CrCr