Binary coded decimal arithmetic has fallen out of favor
among modern hardware designers, and it is poorly supported in most modern
programming languages, yet it is still occasionally appropriate. For example,
when a numeric field of a record in a text file must be incremented, the
methods presented here will be significantly faster than converting the textual
value to a binary integer, incrementing, and then converting back to text.
For programmers with a devious sense of humor, these methods
can be a very effective way to obfuscate code in languages such as C. These
techniques are also historically significant. The DEC (now Compaq) Alpha
architecture and the more recent Intel MMX extensions to the 80x86 architecture
both supporting manipulation of vectors of small objects such as pixels packed
into 64-bit words; the motive for this can be found in the much older ideas
described here.
In the following sections, all arithmetic is assumed to be
done in 32 bit registers (the algorithms are easily adapted to other word
lengths), and neither carry out nor overflow are deemed significant. The code
written for clarity, using C notation for operators and constants, with named
temporary variables whenever a subexpression in the computation is cited in the
text. Practical implementations will very likely use far fewer temporary
variables.
All of the number representations discussed here are
unsigned, although the subtraction operation rests on the use of ten's
complement arithmetic, and this, in turn, suggests the use of the ten's
complement operation for negation. In that case, number with a most significant
digit greater than 5 can be interpreted as negative numbers.
For unsigned decimal numbers (or for 10's complement numbers
with the same sign), the conventional unsigned binary comparison operators all
remain applicable with these decimal represenations. Thus, if A and B are
unsigned decimal numbers, and if > is the conventional
unsigned greater-than operator for binary numbers, A>B will
yield the correct results for comparing these numbers.
BCD Represenations
Binary coded decimal numbers may be represented in a number
of ways. The most obvious is packed BCD, where each decimal digit is
represented by a 4 bit field, and the digits are packed consecutively into a
computer word. This scheme allows 4 digits to be packed into a 16 bit word, or
8 digits into a 32 bit word, as shown below.
B31
B15
B0
|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _|
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
| D7 |
D6 | D5 | D4 | D3 |
D2 | D1 | D0 |
In addition to its efficiency, this format is useful because
it is compatable with the use of hexidecimal I/O routines. Traditionally,
packed BCD was viewed as requiring the use of special hardware to do BCD
arithmetic, but in fact, it is possible to add two packed BCD numbers using a
short sequence of conventional binary and logical operators.
Many early computers used 6-bit BCD codes, where
each BCD digit was padded to 6 bits. This was compatable with the 36, 48 and 60
bit words used by many computers in the 1950's and 1960's, and on both the
36-bit IBM 704-709 series of machines (up through the 7044 and 7094) and 60-bit
CDC 6600 series, this format was used to do BCD arithmetic without any special
hardware support. On a modern 16 or 32 bit machine, this allows 3 or a bit more
than 5 decimal digits per word, respectively.
B31
B15
B0
|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _|
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
| D5|0 0| D4 |0
0| D3 |0 0| D2 |0 0| D1
|0 0| D0 |
In the 32 bit format, productive use can be made of the
fractional digit D5 for carry detection and propagation in extended precision
arithmetic operations.
It should be noted that in most historically important 6-bit
BCD collating sequences, all numerals other than zero were represented as shown
above, while zero was sometimes represented by the binary code 1010, allowing
the 6-bit code 000000 to be used for space. Ones in the two most significant
bits of each 6-bit byte were used in the codes for letters and punctuation
marks. See material on punched card codes for more detail on this
subject.
The methods discussed here can be extended to other padding
systems in obvious ways. For example, 6 5-bit fields can be packed into a
32-bit word, with one bit of padding between BCD digits. Alternately, 3 bits of
padding can be added between digits, so that a 32-bit word holds 4 complete
7-bit fields, plus 4 bits that can be used for a 5th BCD digit.
Padding each BCD digit to fill an 8 bit byte is quite
practical, but this inefficient representation is not particularly interesting
except in the special case where each digit is directly represented by its ASCII
code, as follows:
B31
B15
B0
|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _|
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
|0 0 1 1| D3 |0 0 1
1| D2 |0 0 1 1| D1 |0 0 1 1|
D0 |
Of course, the same approach can be used for EBCDIC, using 0xF for
a pad digit instead of 0x3. It is quite practical to do arithmetic
operations directly on all of these representations!
Packed BCD Arithmetic
Validity Checking
The least significant 7 digits of a packed BCD number may be
tested for validity as follows:
valid(a)
t1 = a + 0x06666666
t2 = a ^ 0x06666666
t3 = t1 ^ t2
t4 = t3 &
0x11111110
if t4 nonzero, a is
invalid
This code deliberately ignores D7, the most significant
digit, because the arithmetic operators presented here allow this digit to hold
values up to 15. To understand this code, note that binary addition of 6 to
each digit, to produce t1, will cause any digit with a value greater than
9 to generate a carry into the next higher digit.
t2 is similar to t1, in the sense that
exclusive-oring is similar to adding. The two will be equal in bits where there
was no carry in to that position, but they will differ in bits where there was
a carry in. Therefore, t3 holds a one wherever there was a carry into
a bit of t1, and t4 holds a 1 wherever there was a carry into a
BCD digit of t1. Since the addition used to produce t1 should
not produce such carries for a valid BCD number, t4 should be zero!
valid(a)
t1 = a + 0x06666666
t2 = t1 ^ a
t3 = t2 &
0x11111110
if t3 nonzero, a is
invalid
The above improvement was suggested by John Mertus at Brown
University; Here, note that adding 6 to a 4-bit nibble of a produces
a carry out of that nibble if the nibble is invalid, but also, adding 6 does
not change the least significant bit. By exclusive oring the result in t1 with a to
make t2, the least significant bit of each nibble is compared with the
original. Finally, we ignore all the other bits and demand that all the least
significant bits be zero, indicating no carry from the previous bit positions.
Antoine Schweitzer-Chaput suggested an alternative validity
check to me. Note that all BCD digits of the form 0xxx are valid, as are digits
of the form x00x. Therefore, if d is a BCD digit, we can test it for
validity with:
valid(d)
if (d & 0x8) is zero or (d
& 0x6) is zero
Extending this to packed BCD is a fun challenge.
Addition
The code to add two BCD numbers must solve the problem of
propagating carries from one BCD digit to the next, and the logic for doing so
rests on many of the mechanisms introduced above.
add(a,b)
t1 = a + 0x06666666
t2 = t1 + b
t3 = t1 ^ b
t4 = t2 ^ t3
t5 = ~t4 &
0x11111110
t6 = (t5
>> 2) | (t5 >> 3)
return t2 - t6
Here, the addition used to form t1 should not
produce any carries, since we assume that t1 is valid. The addition
used to form t2 will produce a carry exactly when the decimal
addition of two digits in the operands would produce a carry, and these are the
carries we are concerned with.
In t2, the digits in each position that produced a
carry will have the correct value, but the digits in positions that did not
produce a carry will contain an excess 6. The problem, then, is to remove the
extra 6 from each digit that did not generate a carry.
Note that t2 and t3 differ wherever
there was a carry, so t4 records all carries into positions in the
sum. Therefore, t5 holds the inverse of each of the 7 carries between
successive decimal digits of the sum. This bit pattern is used to generate t6,
where each digit is either 6 where the corresponding digits of the sum did not
generate a carry or 0 where they did. Subtracting this from t2 produces
the correct BCD sum.
Note that the most significant digit of the sum will exceed
9 if there should have been a carry out of that position. Furthermore, there is
no easy way to detect this carry! Out of range values in the high digit may be
of some limited use, but in general, the top BCD digit should only be used to
accumulate carrys out of 7-digit BCD operands. If its use is limited in this
way, the numbers will always be positive and will never produce a 2's
complement overflow.
Subtraction
To subtract two numbers, 10's complement addition should be
used. To compute the 10's complement of a number, subtract it digit-by-digit
from 99999999 (decimal) and then add 1, as follows:
tencomp(a)
t1 = 0xF9999999 - a
return add( t1,
0x00000001 )
Here, t1 is (mostly) the 9's complement of the
argument, except that the topmost digit is the 15's complement because of the
redundancy allowed in that digit position. There is room for optimization in
this code! As a first step, the code for addition has been substituted into place
(with appropriate variable renaming).
tencomp(a)
t1 = 0xF9999999 - a
t2 = t1 + 0x06666666
t3 = t2 + 0x00000001
t4 = t2 ^ 0x00000001
t5 = t3 ^ t4
t6 = ~t5 &
0x11111110
t7 = (t6
>> 2) | (t6 >> 3)
return t3 - t7
This reduces to the following obscure bit of code, with
variables renamed again:
tencomp(a)
t1 = 0xFFFFFFFF - a
t2 = - a
t3 = t1 ^ 0x00000001
t4 = t2 ^ t3
t5 = ~t4 &
0x11111110
t6 = (t5
>> 2) | (t5 >> 3)
return t2 - t6
6-bit BCD Arithmetic
Validity Checking
All 5 full digits of a 6-bit BCD number may be tested for
validity as follows:
valid(a)
t1 = a + 006666666666
t2 = t1 &
006060606060
if t2 = 006060606060, a
is valid
To understand this code, note that binary addition of 066 to
each digit, to produce t1, will cause any digit with a value greater than
9 to generate a carry into the next higher digit. The lower 6 forces a carry
out of any digit greater than 9, while the higher 6 sets the pad bits and
allows the carry to propagate up to the next higher digit position.
t2 holds just the pad bits in the result from t1.
Where no carry was propagated in the computation of t1, these will be set,
while where a carry was propagated through the padding, they will be reset. A
valid 6-bit BCD number should not have generated any carry, so all of the pad
bits should be set!
Addition
The code to add two 5-digit 6-bit BCD numbers must solve the
problem of propagating carries from one BCD digit to the next, and the logic
for doing so rests on the mechanisms just introduced:
add(a,b)
t1 = a + b +
006666666666
t2 = t1 &
006060606060
return t1 - (t2
>> 3)
Here, the addition used to form t1 adds an extra 6
to each BCD digit, forcing a carry out of that digit if the sum exceeds 9, and
the same constant also includes ones in each pad bit in order to force this
cary to propagate up to the next higher digit.
In t1, the digits in each position that produced a
carry will have the correct value, but the digits in positions that did not
produce a carry will contain an excess 6. The problem, is to remove this 6 from
each digit that did not generate a carry.
t2 contains just the pad bits from the sum. For those
digits that did not produce a carry, the pad bits will still be set to one. For
those digits that produced a carry, the pad bits will be set to zero by the
propagagion of the carry.
Shifting t2 3 places right produces a 6 in each
digit position that did not generate a carry and zero elsewhere. Subtracting
this from t1 produces the required correction!
A carry out from the 5-digit BCD sum will appear in partial
digit, D5, that occupies the topmost 2 bits of the 32 bit number.
Subtraction
To subtract two numbers, 10's complement addition should be
used. To compute the 10's complement of 5-digit 6-bit BCD number, subtract it
digit-by-digit from 99999 (decimal) and then add 1, as follows:
tencomp(a)
t1 = 001111111111 - a
return add( t1,
000000000001 )
Here, t1 is the 9's complement of the argument
(note that 11 octal is 9 decimal). There is considerable room for optimization
in this! Substituting the definition of add and renaming the variables to avoid
collision, we get:
tencomp(a)
t1 = 001111111111 - a
t2 = t1 + 000000000001
+ 006666666666
t3 = t2 &
006060606060
return t2 - (t3
>> 3)
Rearranging terms and renaming temporaries gives:
tencomp(a)
t1 = (001111111111 - a)
+ 000000000001
+ 006666666666
t2 = t1 &
006060606060
return t1 - (t2
>> 3)
Which simplifies to:
tencomp(a)
t1 = 010000000000 - a
t2 = t1 &
006060606060
return t1 - (t2
>> 3)
ASCII Arithmetic
Validity Checking
To verify that a 4-digit ASCII string contains only valid
BCD codes, techniques similar to those used for packed BCD and 6-bit BCD can be
used:
valid(a)
t1 = a + 0xC6C6C6C6
t2 = t1 &
0xF0F0F0F0
if t2 = 0xF0F0F0F0, a
is valid
To understand this code, note that binary addition of 0xC6 to
each digit, to produce t1, will cause any digit with a value greater than
9 to generate a carry into the next higher digit. The lower 6 forces a carry
out of any digit greater than 9, while the higher 0xC sets those pad bits which
were not already set in the ASCII representation of the digit, thus allowing
the carry to propagate up to the next higher digit position.
t2 holds just the pad bits from t1. Where no carry
was propagated in the computation of t1, these will be set, while where a
carry was propagated, they will be reset. A valid 6-bit BCD number should not
have generated any carry, so all pad bits should be set!
Validity checking is more complex if blank and zero are to
be interchangable, and more complex still if only leading blanks are allowed.
On the other hand, oring 0x10101010 with a potential number converts
all blanks to zeros, while also converting certain punctuation marks to other
numerals. In certain applications, this may be a reasonable way to deal with
blanks.
Addition
The code to add two 4-digit ASCII numbers must solve the
problem of propagating carries from one BCD digit to the next, and the logic
for doing so rests on the mechanisms just introduced:
add(a,b)
t1 = a + b + 0x96969696
t2 = t1 &
0x30303030
t3 = t1 - (t2
>> 3)
return (t3 &
0x0F0F0F0F) | 0x30303030
Here, the addition used to form t1 adds an extra 6
to each BCD digit, forcing a carry out of that digit if the sum exceeds 9. In
addition, it sets those pad bits which would not already be set by the padding
included in ASCII for each character. This forces carry propagation up to the
next higher digit.
In t1, the digits in each position that produced a
carry will have the correct value, but the digits in positions that did not
produce a carry will contain an excess 6.
t2 contains the two least significant bits of the
padding from the sum; these bits will be 1 where no carry was propagated, so
they can be shifted to subtract the excess 6 from the adjacent digit of the
sum. This corrected sum, in t3 still contains the incorrect values in
the pad fields. No matter what we do, it seems to take two extra binary operations
to restore the pad fields to their proper values.
The carry out from the 4-digit ASCII sum can be extracted
from the pad bits of any of the temporary variables used above, so no digits
are wasted in extended precision ASCII arithmetic.
By all means, if your purpose is to write obfuscated code,
and if your programming language allows it, substitute ASCII literals for the
hex constants wherever possible, as follows:
add(a,b)
t1 = a + b + 0x96969696
t2 = t1 &
'0000'
t3 = t1 - (t2
>> 3)
return (t3 &
0x0F0F0F0F) | '0000'
Subtraction
To subtract two numbers, 10's complement addition should be
used. To compute the 10's complement of a number, subtract it digit-by-digit
from 9999 (decimal) and then add 1, as follows:
tencomp(a)
t1 = 0x69696969 - a
return add( t1,
0x30303031 )
Here, t1 is the 9's complement of the argument;
the tricky aspect in this case is that the pad bits of the argument are
nonzero, so the constant 9999 must be padded strangely to create the correct
results. In any case, there is considerable room for optimization in this code!
Substituting the code for addition and renaming temporaries gives:
tencomp(a)
t1 = 0x69696969 - a
t2 = t1 + 0x30303031 +
0x96969696
t3 = t2 &
0x30303030
t4 = t2 - (t3
>> 3)
return (t4 &
0x0F0F0F0F) | 0x30303030
Combining terms, renaming, and simplifing the result gives
the following strange bit of code:
tencomp(a)
t1 = 0x20202020 - a
t2 = t1 &
0x30303030
t3 = t1 - (t2
>> 3)
return (t3 &
0x0F0F0F0F) | 0x30303030
Of course, if your purpose is to write obfuscated code, by
all means substitute ASCII literals for the hex constants wherever possible,
creating:
tencomp(a)
t1 =
' ' - a
t2 = t1 &
'0000'
t3 = t1 - (t2
>> 3)
return (t3 &
0x0F0F0F0F) | '0000'
Much of the complexity of the above code is imposed by
computations that determine the carry out of each digit of the BCD sum.
Hardware that allows the carry propagation from bit to bit of the adder to be
interrupted at selected points can greatly speed these computations. Both the
DEC (later Compaq) Alpha and the Intel MMX extensions to the 80x86 family
provide for this, allowing 64 bit words to be divided into many subfields, with
carry propagation from field to field interrupted, and with the carries captured
in a form allowing their manipulation. These facilities are not exposed to the
high level language programmer, but they are used intensively in carefully
handcrafted assembly-language routines for common vector operations such as
those involved in pixel manipulation of graphics images.