GNU MP 4.1

Node:Extended GCD, Next:Jacobi Symbol, Previous:Accelerated GCD, Up:Greatest Common Divisor Algorithms

Extended GCD

The extended GCD calculates gcd(a,b) and also cofactors x and y satisfying a*x+b*y=gcd(a,b). Lehmer's multi-step improvement of the extended Euclidean algorithm is used. See Knuth section 4.5.2 algorithm L, and mpn/generic/gcdext.c. This is an O(N^2) algorithm.

The multipliers at each step are found using single limb calculations for sizes up to GCDEXT_THRESHOLD, or double limb calculations above that. The single limb code is faster but doesn't produce full-limb multipliers, hence not making full use of the mpn_addmul_1 calls.

When a CPU has a data-dependent multiplier, meaning one which is faster on operands with fewer bits, the extra work in the double-limb calculation might only save some looping overheads, leading to a large GCDEXT_THRESHOLD.

Currently the single limb calculation doesn't optimize for the small quotients that often occur, and this can lead to unusually low values of GCDEXT_THRESHOLD, depending on the CPU.

An analysis of double-limb calculations can be found in "A Double-Digit Lehmer-Euclid Algorithm" by Jebelean (see References). The code in GMP was developed independently.

It should be noted that when a double limb calculation is used, it's used for the whole of that GCD, it doesn't fall back to single limb part way through. This is because as the algorithm proceeds, the inputs a and b are reduced, but the cofactors x and y grow, so the multipliers at each step are applied to a roughly constant total number of limbs.