A fast (written in assembly) library for continued fractions
Thomas Baruchel
tbaruchel at free.fr
Fri Nov 3 23:17:29 CET 2006
I wrote for my own purposes the following X86 assembly code, intended
to be called from GFORTRAN. Such a thing should only work if you have
a similar configuration to mine, but anyway, it may interest you.
I wanted a fast library for computing the length in the continued
fraction expansion of sqrt(n) for any integer n that can fit in
a REAL*8 gfortran type for statistical purposes. I also included
a faster function that tells if this length is odd or even. Though
I am not an assembly expert, there should be a significant speed
improvement if you use this code rather than your own fortran
functions, since the whole process is performed on the FPU stack
with no load/store useless travels.
It may help you, it may not. You know about these kinds of software...
Instructions are at the beginning of the file, as comments.
Read them carefully. Code is below.
Regards.
------------------ FILE quad.s BEGINS BELOW --------------------
/*
Compile your piece of code with something like
gfortran -fno-underscoring mycode.f90 quad.s
and add freely your own flags. Enjoy !
Of course, if you don't want to use the -fno-underscoring
option for some reason, you can add an underscore at the end
of the global functions below, like that:
.globl initfpu_
initfpu_:
it should work.
Remember this piece of code has the following status:
Seems to work for me, maybe it will also seem for you,
maybe your computer will explode, maybe it will make the
earth explode, maybe God will choose not to have created
the world, think about it before using it.
(Anyway, if you see some kind of bug, please let me know ;-)
Remember that
a) this is x86 assembly code;
b) it uses the GNU AS (gas) syntax;
c) it uses the gfortran calling conventions;
(now you understand why you can't expect it will works ;-)
Usage. In your program (or subroutine or whatever), declare
the function you need like:
logical :: oddperiod
integer*8 :: period
(the two subroutines don't need to be declared).
Then use
integer*4 :: oldfpustatus
call initfpu(oldfpustatus)
before any computation; if you don't do it, it won't work.
When you don't need any more the library,
call restorefpu(oldfpustatus)
will restore the control register of the FPU.
During all that time, you shouldn't be disturbed too much,
for your own mathematical operations, but
remember the rounding mode of the FPU has been set to
"round down" (rather than the usual "nearest").
How does it work? The whole computation is performed on the
stack of the FPU; only integer values are needed for these
tasks, but they are stored as REAL*10, 80 bits reals. They
are not converted to anything during the whole process, thus
it should work for great integers stored in your own REAL*8,
64 bits reals. Of course, it may happen (I think it will never,
but didn't check) that an intermediate result can not fit in
such a REAL*10 register; in that case, read above what is
written about the end of the world...
*/
/*
call initfpu(integer*4)
=====================
This subroutine should be called before any other call to the
library; its purpose is to set the rounding control in the FPU
control register to the "round down" mode.
The subroutine puts in the integer variable the initial value
of the FPU control register.
If you want to use often this subroutine (which you shouldn't:
speed being an issue, this subroutine is intended to be used
once, then the functions billions of times, not the contrary ;-),
maybe removing the "fnclex" can improve the speed, since I don't
think it plays a great role here.
*/
.globl initfpu
initfpu:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
fnstcw (%eax)
fwait
movw (%eax), %dx
andb $0xf3, %dh
orb $0x4, %dh
movw %dx, 2(%eax)
fnclex
fldcw 2(%eax)
fwait
movw $0, 2(%eax)
leave
ret
/*
call restorefpu(integer*4)
========================
This subroutine should be used for restoring the initial state of
the FPU control register; the integer variable must contain the
right value (which should have been previously set by the
subroutine initfpu).
*/
.globl restorefpu
restorefpu:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
fclex
fldcw (%eax)
leave
ret
/*
sqrtint( real double precision : REAL*8 )
=========================================
This function returns the integer part of the square root
of the given real. The return value has itself the real type.
*/
.global sqrtint
sqrtint:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
fldl (%eax)
fsqrt
frndint
leave
ret
/*
oddperiod( real double precision : REAL*8 )
===========================================
Despite the type of the real argument, the variable has to
be some integer value stored as a double precision.
The function has the logical type. It returns FALSE if
the argument is a perfect square or if the continued
fraction expansion of its square root has an even period
length; it returns TRUE in the remaining case.
oddperiod(dble(25)) --> FALSE
oddperiod(dble(13)) --> TRUE
oddperiod(dble(12)) --> FALSE
If it may help, you can declare the function as an integer*4
type rather than a logical one (output will be 0 or 1).
Though it seems to work with integer*8, I am not absolutely
sure it *always* will (unless you add the following instruction
movl $0, %edx
at the beginning of the code (for instance between the current
second and third lines of code), but really why would you
do it? Maybe something in the calling conventions is such that
the register EDX already contains 0, but, again, who cares?
If you read the instructions for the period() function, you
will learn about some 2^32 limitations, but you don't have to
care about that here: the length is not kept in any register
during the computation and the code should work as long as the
values in the FPU stack are the right one.
*/
.global oddperiod
oddperiod:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
fldln2
fldl (%eax)
fld %st
fsqrt
frndint
fld %st
fld %st
fmul %st, %st
fld %st(3)
fsubp %st, %st(1)
fcom %st(4)
fstsw %ax
fwait
sahf
ja oddperiod0
ffree %st
ffree %st(1)
ffree %st(2)
ffree %st(3)
ffree %st(4)
oddperiod2:
movl $0, %eax
leave
ret
oddperiod0:
fld %st(1)
fadd %st(3), %st
fdiv %st(1), %st
frndint
fld %st(2)
fld %st(1)
fmul %st(3), %st
fsubp %st, %st(1)
fxch %st(3)
fsub %st(3), %st
fabs
fcomp %st(6)
fstsw %ax
fwait
sahf
ja oddperiod1
fld %st(3)
fadd %st, %st
fsubp %st, %st(1)
fcomp %st(5)
fstsw %ax
fwait
ffree %st
ffree %st(1)
ffree %st(2)
ffree %st(3)
ffree %st(4)
sahf
ja oddperiod2
movl $1, %eax
leave
ret
oddperiod1:
ffree %st
fincstp
fld %st(1)
fmul %st, %st
fld %st(4)
fsubp %st, %st(1)
fdivp %st, %st(1)
frndint
jmp oddperiod0
/*
period( real double precision : REAL*8 )
========================================
Despite the type of the real argument, the variable has to
be some integer value stored as a double precision.
The function has integer*8 type, though the output
can't be greater than 2^32-1 (this choice has been made
for avoiding sign problem when the result is greater
(or equal) then 2^31 and smaller then 2^32.
The result is the length of period in the continued fraction
expansion of the square root of the given integer.
If speed is really for you an obsession, remember that it
is highly unprobable that you use that library for periods
between the two fatidic values. Maybe it will increase a little
the speed to use integer*4 which allows to delete twice the line
movl $0,%edx
(not the first time it appears, but the second and third one).
If you try such a thing and see some speed improvement, please
let me know.
*/
.global period
period:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
movl $0, %edx
fldln2
fldl (%eax)
fld %st
fsqrt
frndint
fld %st
fld %st
fmul %st, %st
fld %st(3)
fsubp %st, %st(1)
fcom %st(4)
fstsw %ax
fwait
sahf
ja period0
ffree %st
ffree %st(1)
ffree %st(2)
ffree %st(3)
ffree %st(4)
movl %edx, %eax
movl $0, %edx
leave
ret
period0:
inc %edx
fld %st(1)
fadd %st(3), %st
fdiv %st(1), %st
frndint
fld %st(2)
fld %st(1)
fmul %st(3), %st
fsubp %st, %st(1)
fxch %st(3)
fsub %st(3), %st
fabs
fcomp %st(6)
fstsw %ax
fwait
sahf
ja period1
fld %st(3)
fadd %st, %st
fsubp %st, %st(1)
fcomp %st(5)
fstsw %ax
fwait
ffree %st
ffree %st(1)
ffree %st(2)
ffree %st(3)
ffree %st(4)
sahf
jb period2
add %edx, %edx
period2:
movl %edx, %eax
movl $0,%edx
leave
ret
period1:
ffree %st
fincstp
fld %st(1)
fmul %st, %st
fld %st(4)
fsubp %st, %st(1)
fdivp %st, %st(1)
frndint
jmp period0
More information about the SeqFan
mailing list