[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [oc] Binary Divides for uP core



Correct me if I am wrong but don't FPGA use a LUT (Look Up Table)
to emulate the low level logic elements (AND, OR, ...). A device like the
Altera APEX 20K has 6 inputs (data1-data4, Carry-In, Cascade-In)
and arguably a 7th input (one of the clears). For output it has conceivably
4 with register packing: Register, Register Bypass, Carry-Out, Cascade-Out.
With careful investigation and crafty handiwork you might be able to
construct the LE to perform two bits at a time. I haven't looked
at this close enough to see if this can be done.

Placing that aside and working 1 bit at a time for performance you
would not look at the problem from an abstract pseudo C programmers
point of view (restricting yourself to operations performed in C) but
look at the problem from the perspective of what the Logic Element
does well. For the purposes of division I would try to construct
a barrel shifter comparitor subtractor. I don't think you could squeeze
in subtraction. That lacking then create a barrel shifter comparitor while
using a different set of LEs to do the conditional subtraction.

The point is to pack two (or three) abstract operations (pseudo C)
into one LE and perform these operations in one cycle. This
accomplishes several things. Reduction in LE's, reduction in
clock counts to propagate the solution. This produces a faster
core while using less resources (and power).

Jim Dempsey
Newbee to FPGA programming
Old hand at optimization techniques

----- Original Message -----
From: "Richard Herveille" <richard@asics.ws>
To: <cores@opencores.org>
Sent: Wednesday, January 16, 2002 3:05 AM
Subject: Re: [oc] Binary Divides for uP core


>
> Correct, these are called high-radix dividers (comparable to high-radix
> multipliers).
> If you are targeting ASICS and have time to/want to generate the tables
> this works great and is faster than a radix-2 (1-bit at a time) divider.
> If you are (also) targeting FPGAS, this is a NONO. Because it will make
> the divider slower. This is caused by the fact that the ripple-carry
> structure in FPGAS is much faster than the other structures. So your
> critical path is not the adder/subtractor structure, but the
lookup-tables.
>
> Richard
>
> >You can also multiply using a combination of look-up table and adder.
> >Instead of doing a shift and add one bit at a time you can process two,
> >three, four, ... bits at a time. Some of the CRC32 generators used to
> >do this (way back).
> >
> >Also (way back), some implimentations of division generated 1/n then
> >performed a multiply.
> >
> >     m / n = m * (1 / n)
> >
> >The accuracy of the remainder is not the same.
> >
> >Jim Dempsey
> >
> >
> >----- Original Message -----
> >From: "Richard Herveille" <richard@asics.ws>
> >To: <cores@opencores.org>
> >Sent: Tuesday, January 15, 2002 2:30 AM
> >Subject: Re: [oc] Binary Divides for uP core
> >
> >
> > >
> > > As everybody knows (or should know), a hardware multiplier is simply a
> > > bunch of adders.
> > > The proces is called "multiplication by repeated addition".
> > > Each adder adds a shifted-and-anded part to the final result, see
example.
> > >
> > > A=10, B=2 A*B=
> > >
> > >   1010
> > >     10
> > > -----
> > >   0000    ((1010 && 0) << 0)
> > > 10100    ((1010 && 1) <<1)
> > > ----- +
> > > 10100
> > >
> > > The final result always fits into Abits + Bbits (4+2=6 bits in the
> > > example), i.e.
> > > the result of a k*k bit multiplication always fits into 2k bits.
> > > Also there is no (real) difference between unsigned and signed
> >multiplication.
> > >
> > >
> > > Now for the difficult part division:
> > >
> > > Differences from the multiplication
> > > 1) The most common implementation is a bunch of subtractors.
> > >     This process is called "Division by repeated subtraction".
> > >     Other implementations use BSD numbers and/or adder-subtractor
> >structures.
> > > 2) In a divider bits become available one at a time, starting at the
MSB.
> > >     intuitive: Try to divide 1189/4 by hand, you also start at the
MSD.
> > > 3) Because of point-2 dividers are inherently slower than multipliers
> > >     Also single clock division is (almost) impossible.
> > >     note: Single cycle IS possible (pipeline your design).
> > > 4) A 2K by K division does NOT always fit into K bits
> > >     example 0100/01
> > > 5) There's a big difference between signed and unsigned division.
> > >
> > > Here are some common implementations you should try to find, because
these
> > > are the simplest to implement:
> > > "Restoring Unsigned Divider"
> > > "Non-Restoring Unsigned Divider"
> > > "Non-Restoring Signed Divider"
> > > "Division by repeated multiplication"
> > > The last method works great, if a hardware multiplier is available.
> > > There are also some structures which combine multipliers and dividers.
> > >
> > >
> > > Richard
> > >
> > >
> > > >Hi,
> > > >
> > > >Does anybody know the routine for doing a binary division in HDL?
> > > >I'm developing a uP core and would like to do divisions in one
> > > >clock cycle if possible? The multiple instruction was easy to
> > > >figure out but I'm still stuck on the division one.
> > > >
> > > >Paul
> > > >
> > > >
> > > >--
> > > >To unsubscribe from cores mailing list please visit
> > > >http://www.opencores.org/mailinglists.shtml
> > >
> > > --
> > > To unsubscribe from cores mailing list please visit
> >http://www.opencores.org/mailinglists.shtml
> > >
> >
> >--
> >To unsubscribe from cores mailing list please visit
> >http://www.opencores.org/mailinglists.shtml
>
> --
> To unsubscribe from cores mailing list please visit
http://www.opencores.org/mailinglists.shtml
>

--
To unsubscribe from cores mailing list please visit http://www.opencores.org/mailinglists.shtml