brock

Saturday, July 7, 2012

BCD Arithmatics


Introduction

Binary coded decimal arithmetic has fallen out of favor among modern hardware designers, and it is poorly supported in most modern programming languages, yet it is still occasionally appropriate. For example, when a numeric field of a record in a text file must be incremented, the methods presented here will be significantly faster than converting the textual value to a binary integer, incrementing, and then converting back to text.
For programmers with a devious sense of humor, these methods can be a very effective way to obfuscate code in languages such as C. These techniques are also historically significant. The DEC (now Compaq) Alpha architecture and the more recent Intel MMX extensions to the 80x86 architecture both supporting manipulation of vectors of small objects such as pixels packed into 64-bit words; the motive for this can be found in the much older ideas described here.
In the following sections, all arithmetic is assumed to be done in 32 bit registers (the algorithms are easily adapted to other word lengths), and neither carry out nor overflow are deemed significant. The code written for clarity, using C notation for operators and constants, with named temporary variables whenever a subexpression in the computation is cited in the text. Practical implementations will very likely use far fewer temporary variables.
All of the number representations discussed here are unsigned, although the subtraction operation rests on the use of ten's complement arithmetic, and this, in turn, suggests the use of the ten's complement operation for negation. In that case, number with a most significant digit greater than 5 can be interpreted as negative numbers.
For unsigned decimal numbers (or for 10's complement numbers with the same sign), the conventional unsigned binary comparison operators all remain applicable with these decimal represenations. Thus, if A and B are unsigned decimal numbers, and if > is the conventional unsigned greater-than operator for binary numbers, A>B will yield the correct results for comparing these numbers.

BCD Represenations
Binary coded decimal numbers may be represented in a number of ways. The most obvious is packed BCD, where each decimal digit is represented by a 4 bit field, and the digits are packed consecutively into a computer word. This scheme allows 4 digits to be packed into a 16 bit word, or 8 digits into a 32 bit word, as shown below.
    B31                             B15                            B0
    |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|
    |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
    |   D7  |   D6  |   D5  |   D4  |   D3  |   D2  |   D1  |   D0  |
In addition to its efficiency, this format is useful because it is compatable with the use of hexidecimal I/O routines. Traditionally, packed BCD was viewed as requiring the use of special hardware to do BCD arithmetic, but in fact, it is possible to add two packed BCD numbers using a short sequence of conventional binary and logical operators.
Many early computers used 6-bit BCD codes, where each BCD digit was padded to 6 bits. This was compatable with the 36, 48 and 60 bit words used by many computers in the 1950's and 1960's, and on both the 36-bit IBM 704-709 series of machines (up through the 7044 and 7094) and 60-bit CDC 6600 series, this format was used to do BCD arithmetic without any special hardware support. On a modern 16 or 32 bit machine, this allows 3 or a bit more than 5 decimal digits per word, respectively.
    B31                             B15                            B0
    |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|
    |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
    | D5|0 0|   D4  |0 0|   D3  |0 0|   D2  |0 0|   D1  |0 0|   D0  |
In the 32 bit format, productive use can be made of the fractional digit D5 for carry detection and propagation in extended precision arithmetic operations.
It should be noted that in most historically important 6-bit BCD collating sequences, all numerals other than zero were represented as shown above, while zero was sometimes represented by the binary code 1010, allowing the 6-bit code 000000 to be used for space. Ones in the two most significant bits of each 6-bit byte were used in the codes for letters and punctuation marks. See material on punched card codes for more detail on this subject.
The methods discussed here can be extended to other padding systems in obvious ways. For example, 6 5-bit fields can be packed into a 32-bit word, with one bit of padding between BCD digits. Alternately, 3 bits of padding can be added between digits, so that a 32-bit word holds 4 complete 7-bit fields, plus 4 bits that can be used for a 5th BCD digit.
Padding each BCD digit to fill an 8 bit byte is quite practical, but this inefficient representation is not particularly interesting except in the special case where each digit is directly represented by its ASCII code, as follows:
    B31                             B15                            B0
    |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|
    |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
    |0 0 1 1|   D3  |0 0 1 1|   D2  |0 0 1 1|   D1  |0 0 1 1|   D0  |
Of course, the same approach can be used for EBCDIC, using 0xF for a pad digit instead of 0x3. It is quite practical to do arithmetic operations directly on all of these representations!
Packed BCD Arithmetic
Validity Checking
The least significant 7 digits of a packed BCD number may be tested for validity as follows:
    valid(a)
       t1 = a + 0x06666666
       t2 = a ^ 0x06666666
       t3 = t1 ^ t2
       t4 = t3 & 0x11111110
       if t4 nonzero, a is invalid
This code deliberately ignores D7, the most significant digit, because the arithmetic operators presented here allow this digit to hold values up to 15. To understand this code, note that binary addition of 6 to each digit, to produce t1, will cause any digit with a value greater than 9 to generate a carry into the next higher digit.
t2 is similar to t1, in the sense that exclusive-oring is similar to adding. The two will be equal in bits where there was no carry in to that position, but they will differ in bits where there was a carry in. Therefore, t3 holds a one wherever there was a carry into a bit of t1, and t4 holds a 1 wherever there was a carry into a BCD digit of t1. Since the addition used to produce t1 should not produce such carries for a valid BCD number, t4 should be zero!
    valid(a)
       t1 = a + 0x06666666
       t2 = t1 ^ a
       t3 = t2 & 0x11111110
       if t3 nonzero, a is invalid
The above improvement was suggested by John Mertus at Brown University; Here, note that adding 6 to a 4-bit nibble of a produces a carry out of that nibble if the nibble is invalid, but also, adding 6 does not change the least significant bit. By exclusive oring the result in t1 with a to make t2, the least significant bit of each nibble is compared with the original. Finally, we ignore all the other bits and demand that all the least significant bits be zero, indicating no carry from the previous bit positions.
Antoine Schweitzer-Chaput suggested an alternative validity check to me. Note that all BCD digits of the form 0xxx are valid, as are digits of the form x00x. Therefore, if d is a BCD digit, we can test it for validity with:
   valid(d)
     if (d & 0x8) is zero or (d & 0x6) is zero
Extending this to packed BCD is a fun challenge.
Addition
The code to add two BCD numbers must solve the problem of propagating carries from one BCD digit to the next, and the logic for doing so rests on many of the mechanisms introduced above.
    add(a,b)
       t1 = a + 0x06666666
       t2 = t1 + b
       t3 = t1 ^ b
       t4 = t2 ^ t3
       t5 = ~t4 & 0x11111110
       t6 = (t5 >> 2) | (t5 >> 3)
       return t2 - t6
Here, the addition used to form t1 should not produce any carries, since we assume that t1 is valid. The addition used to form t2 will produce a carry exactly when the decimal addition of two digits in the operands would produce a carry, and these are the carries we are concerned with.
In t2, the digits in each position that produced a carry will have the correct value, but the digits in positions that did not produce a carry will contain an excess 6. The problem, then, is to remove the extra 6 from each digit that did not generate a carry.
Note that t2 and t3 differ wherever there was a carry, so t4 records all carries into positions in the sum. Therefore, t5 holds the inverse of each of the 7 carries between successive decimal digits of the sum. This bit pattern is used to generate t6, where each digit is either 6 where the corresponding digits of the sum did not generate a carry or 0 where they did. Subtracting this from t2 produces the correct BCD sum.
Note that the most significant digit of the sum will exceed 9 if there should have been a carry out of that position. Furthermore, there is no easy way to detect this carry! Out of range values in the high digit may be of some limited use, but in general, the top BCD digit should only be used to accumulate carrys out of 7-digit BCD operands. If its use is limited in this way, the numbers will always be positive and will never produce a 2's complement overflow.
Subtraction
To subtract two numbers, 10's complement addition should be used. To compute the 10's complement of a number, subtract it digit-by-digit from 99999999 (decimal) and then add 1, as follows:
    tencomp(a)
       t1 = 0xF9999999 - a
       return add( t1, 0x00000001 )
Here, t1 is (mostly) the 9's complement of the argument, except that the topmost digit is the 15's complement because of the redundancy allowed in that digit position. There is room for optimization in this code! As a first step, the code for addition has been substituted into place (with appropriate variable renaming).
    tencomp(a)
       t1 = 0xF9999999 - a
       t2 = t1 + 0x06666666
       t3 = t2 + 0x00000001
       t4 = t2 ^ 0x00000001
       t5 = t3 ^ t4
       t6 = ~t5 & 0x11111110
       t7 = (t6 >> 2) | (t6 >> 3)
       return t3 - t7
This reduces to the following obscure bit of code, with variables renamed again:
    tencomp(a)
       t1 = 0xFFFFFFFF - a
       t2 = - a
       t3 = t1 ^ 0x00000001
       t4 = t2 ^ t3
       t5 = ~t4 & 0x11111110
       t6 = (t5 >> 2) | (t5 >> 3)
       return t2 - t6
6-bit BCD Arithmetic
Validity Checking
All 5 full digits of a 6-bit BCD number may be tested for validity as follows:
    valid(a)
       t1 = a + 006666666666
       t2 = t1 & 006060606060
       if t2 = 006060606060, a is valid
To understand this code, note that binary addition of 066 to each digit, to produce t1, will cause any digit with a value greater than 9 to generate a carry into the next higher digit. The lower 6 forces a carry out of any digit greater than 9, while the higher 6 sets the pad bits and allows the carry to propagate up to the next higher digit position.
t2 holds just the pad bits in the result from t1. Where no carry was propagated in the computation of t1, these will be set, while where a carry was propagated through the padding, they will be reset. A valid 6-bit BCD number should not have generated any carry, so all of the pad bits should be set!
Addition
The code to add two 5-digit 6-bit BCD numbers must solve the problem of propagating carries from one BCD digit to the next, and the logic for doing so rests on the mechanisms just introduced:
    add(a,b)
       t1 = a + b + 006666666666
       t2 = t1 & 006060606060
       return t1 - (t2 >> 3)
Here, the addition used to form t1 adds an extra 6 to each BCD digit, forcing a carry out of that digit if the sum exceeds 9, and the same constant also includes ones in each pad bit in order to force this cary to propagate up to the next higher digit.
In t1, the digits in each position that produced a carry will have the correct value, but the digits in positions that did not produce a carry will contain an excess 6. The problem, is to remove this 6 from each digit that did not generate a carry.
t2 contains just the pad bits from the sum. For those digits that did not produce a carry, the pad bits will still be set to one. For those digits that produced a carry, the pad bits will be set to zero by the propagagion of the carry.
Shifting t2 3 places right produces a 6 in each digit position that did not generate a carry and zero elsewhere. Subtracting this from t1 produces the required correction!
A carry out from the 5-digit BCD sum will appear in partial digit, D5, that occupies the topmost 2 bits of the 32 bit number.
Subtraction
To subtract two numbers, 10's complement addition should be used. To compute the 10's complement of 5-digit 6-bit BCD number, subtract it digit-by-digit from 99999 (decimal) and then add 1, as follows:
    tencomp(a)
       t1 = 001111111111 - a
       return add( t1, 000000000001 )
Here, t1 is the 9's complement of the argument (note that 11 octal is 9 decimal). There is considerable room for optimization in this! Substituting the definition of add and renaming the variables to avoid collision, we get:
    tencomp(a)
       t1 = 001111111111 - a
       t2 = t1 + 000000000001 + 006666666666
       t3 = t2 & 006060606060
       return t2 - (t3 >> 3)
Rearranging terms and renaming temporaries gives:
    tencomp(a)
       t1 = (001111111111 - a)
           + 000000000001
           + 006666666666
       t2 = t1 & 006060606060
       return t1 - (t2 >> 3)
Which simplifies to:
    tencomp(a)
       t1 = 010000000000 - a
       t2 = t1 & 006060606060
       return t1 - (t2 >> 3)
ASCII Arithmetic
Validity Checking
To verify that a 4-digit ASCII string contains only valid BCD codes, techniques similar to those used for packed BCD and 6-bit BCD can be used:
    valid(a)
       t1 = a + 0xC6C6C6C6
       t2 = t1 & 0xF0F0F0F0
       if t2 = 0xF0F0F0F0, a is valid
To understand this code, note that binary addition of 0xC6 to each digit, to produce t1, will cause any digit with a value greater than 9 to generate a carry into the next higher digit. The lower 6 forces a carry out of any digit greater than 9, while the higher 0xC sets those pad bits which were not already set in the ASCII representation of the digit, thus allowing the carry to propagate up to the next higher digit position.
t2 holds just the pad bits from t1. Where no carry was propagated in the computation of t1, these will be set, while where a carry was propagated, they will be reset. A valid 6-bit BCD number should not have generated any carry, so all pad bits should be set!
Validity checking is more complex if blank and zero are to be interchangable, and more complex still if only leading blanks are allowed. On the other hand, oring 0x10101010 with a potential number converts all blanks to zeros, while also converting certain punctuation marks to other numerals. In certain applications, this may be a reasonable way to deal with blanks.
Addition
The code to add two 4-digit ASCII numbers must solve the problem of propagating carries from one BCD digit to the next, and the logic for doing so rests on the mechanisms just introduced:
    add(a,b)
       t1 = a + b + 0x96969696
       t2 = t1 & 0x30303030
       t3 = t1 - (t2 >> 3)
       return (t3 & 0x0F0F0F0F) | 0x30303030
Here, the addition used to form t1 adds an extra 6 to each BCD digit, forcing a carry out of that digit if the sum exceeds 9. In addition, it sets those pad bits which would not already be set by the padding included in ASCII for each character. This forces carry propagation up to the next higher digit.
In t1, the digits in each position that produced a carry will have the correct value, but the digits in positions that did not produce a carry will contain an excess 6.
t2 contains the two least significant bits of the padding from the sum; these bits will be 1 where no carry was propagated, so they can be shifted to subtract the excess 6 from the adjacent digit of the sum. This corrected sum, in t3 still contains the incorrect values in the pad fields. No matter what we do, it seems to take two extra binary operations to restore the pad fields to their proper values.
The carry out from the 4-digit ASCII sum can be extracted from the pad bits of any of the temporary variables used above, so no digits are wasted in extended precision ASCII arithmetic.
By all means, if your purpose is to write obfuscated code, and if your programming language allows it, substitute ASCII literals for the hex constants wherever possible, as follows:
    add(a,b)
       t1 = a + b + 0x96969696
       t2 = t1 & '0000'
       t3 = t1 - (t2 >> 3)
       return (t3 & 0x0F0F0F0F) | '0000'
Subtraction
To subtract two numbers, 10's complement addition should be used. To compute the 10's complement of a number, subtract it digit-by-digit from 9999 (decimal) and then add 1, as follows:
    tencomp(a)
       t1 = 0x69696969 - a
       return add( t1, 0x30303031 )
Here, t1 is the 9's complement of the argument; the tricky aspect in this case is that the pad bits of the argument are nonzero, so the constant 9999 must be padded strangely to create the correct results. In any case, there is considerable room for optimization in this code! Substituting the code for addition and renaming temporaries gives:
    tencomp(a)
       t1 = 0x69696969 - a
       t2 = t1 + 0x30303031 + 0x96969696
       t3 = t2 & 0x30303030
       t4 = t2 - (t3 >> 3)
       return (t4 & 0x0F0F0F0F) | 0x30303030
Combining terms, renaming, and simplifing the result gives the following strange bit of code:
    tencomp(a)
       t1 = 0x20202020 - a
       t2 = t1 & 0x30303030
       t3 = t1 - (t2 >> 3)
       return (t3 & 0x0F0F0F0F) | 0x30303030
Of course, if your purpose is to write obfuscated code, by all means substitute ASCII literals for the hex constants wherever possible, creating:
    tencomp(a)
       t1 = '    ' - a
       t2 = t1 & '0000'
       t3 = t1 - (t2 >> 3)
       return (t3 & 0x0F0F0F0F) | '0000'
Much of the complexity of the above code is imposed by computations that determine the carry out of each digit of the BCD sum. Hardware that allows the carry propagation from bit to bit of the adder to be interrupted at selected points can greatly speed these computations. Both the DEC (later Compaq) Alpha and the Intel MMX extensions to the 80x86 family provide for this, allowing 64 bit words to be divided into many subfields, with carry propagation from field to field interrupted, and with the carries captured in a form allowing their manipulation. These facilities are not exposed to the high level language programmer, but they are used intensively in carefully handcrafted assembly-language routines for common vector operations such as those involved in pixel manipulation of graphics images.