Node:Extended GCD, Next:Jacobi Symbol, Previous:Accelerated GCD, Up:Greatest Common Divisor Algorithms
The extended GCD calculates gcd(a,b) and also cofactors x and
y satisfying a*x+b*y=gcd(a,b). Lehmer's
multi-step improvement of the extended Euclidean algorithm is used. See Knuth
section 4.5.2 algorithm L, and mpn/generic/gcdext.c
. This is an
O(N^2) algorithm.
The multipliers at each step are found using single limb calculations for
sizes up to GCDEXT_THRESHOLD
, or double limb calculations above that.
The single limb code is faster but doesn't produce full-limb multipliers,
hence not making full use of the mpn_addmul_1
calls.
When a CPU has a data-dependent multiplier, meaning one which is faster on
operands with fewer bits, the extra work in the double-limb calculation might
only save some looping overheads, leading to a large GCDEXT_THRESHOLD
.
Currently the single limb calculation doesn't optimize for the small quotients
that often occur, and this can lead to unusually low values of
GCDEXT_THRESHOLD
, depending on the CPU.
An analysis of double-limb calculations can be found in "A Double-Digit Lehmer-Euclid Algorithm" by Jebelean (see References). The code in GMP was developed independently.
It should be noted that when a double limb calculation is used, it's used for the whole of that GCD, it doesn't fall back to single limb part way through. This is because as the algorithm proceeds, the inputs a and b are reduced, but the cofactors x and y grow, so the multipliers at each step are applied to a roughly constant total number of limbs.