
[oc] wide crc_32 doesn't need to be slow.



Allan:

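I looked into making wider CRCs run faster.

At first glance, the XOR logic just keeps growing with the input
width, so wider CRCs would seem to run slower due to logic depth.

But I found that the dependence on the internal CRC state does
NOT grow more and more complicated.  The state is only 32 bits wide,
so each next-state bit can depend on at most 32 state bits, no matter
how wide the data input is.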

What this means is that you can do a calculation based on the
new data in, and combine it with a calculation based on the
internal state.
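
A minimal sketch of that split, in Python rather than HDL. It assumes
the plain MSB-first CRC-32 register update with polynomial 0x04C11DB7
and no initial or final inversion; the names crc_step, state_part and
data_part are made up for this illustration and are not taken from
crc_32.lib. Because the update is linear over GF(2), the full result
equals the XOR of a term computed from the state alone and a term
computed from the data alone.

```python
POLY = 0x04C11DB7  # CRC-32 generator polynomial, MSB-first form (assumed)

def crc_step(state, data, width):
    """Advance a 32-bit CRC register by `width` data bits, MSB first."""
    for i in reversed(range(width)):
        bit = (data >> i) & 1
        feedback = ((state >> 31) & 1) ^ bit
        state = (state << 1) & 0xFFFFFFFF
        if feedback:
            state ^= POLY
    return state

def state_part(state, width):
    """Contribution of the old CRC state only (data forced to zero)."""
    return crc_step(state, 0, width)

def data_part(data, width):
    """Contribution of the new data only (state forced to zero).
    In hardware this is the term that could be pipelined."""
    return crc_step(0, data, width)

# Spot check on arbitrary values: the two parts XOR to the full update.
s, d = 0xDEADBEEF, 0x0123456789ABCDEF
assert crc_step(s, d, 64) == state_part(s, 64) ^ data_part(d, 64)
```

Only state_part sits in the feedback loop, so only it has to settle
within one clock.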

The advantage of this is that you can pipeline the complicated
calculation based on input data.  It can take more than 1 clock.

I think that one should be able to make the CRC calculation for
any width wider than 32 bits take roughly the same time as
a 32-bit calculation, at the cost of a pipeline depth which grows
with the input width, plus lots of extra pipeline flops.

I included a 64-bit calculation in the crc_32.lib which demonstrates this.
I would expect it and wider CRCs to run in about the time it takes
to cascade 5 XORs into a flop.
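
A rough way to check the fan-in argument, again as a Python sketch
under the same assumptions (MSB-first CRC-32 update, polynomial
0x04C11DB7, no inversions; the function names are hypothetical). It
counts, for each next-state bit, how many state bits and how many data
bits feed it. The state count can never exceed 32, and a tree of
2-input XORs over 32 inputs is 5 levels deep, which fits the estimate
above. The exact counts depend on the polynomial and the width, so the
script prints them rather than quoting numbers here.

```python
POLY = 0x04C11DB7  # CRC-32 generator polynomial, MSB-first form (assumed)

def crc_step(state, data, width):
    """Advance a 32-bit CRC register by `width` data bits, MSB first."""
    for i in reversed(range(width)):
        bit = (data >> i) & 1
        feedback = ((state >> 31) & 1) ^ bit
        state = (state << 1) & 0xFFFFFFFF
        if feedback:
            state ^= POLY
    return state

def state_fan_in(width):
    """Per next-state bit: number of current-state bits it depends on."""
    # Response of the register to each single state bit, with zero data.
    columns = [crc_step(1 << j, 0, width) for j in range(32)]
    return [sum((col >> i) & 1 for col in columns) for i in range(32)]

def data_fan_in(width):
    """Per next-state bit: number of input data bits it depends on."""
    # Response of the register to each single data bit, with zero state.
    columns = [crc_step(0, 1 << j, width) for j in range(width)]
    return [sum((col >> i) & 1 for col in columns) for i in range(32)]

for width in (32, 64, 128):
    print(width, "state max:", max(state_fan_in(width)),
          "data max:", max(data_fan_in(width)))
```

The data fan-in is the part that keeps growing with the bus width,
and it is the part that can be spread over several clocks.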

I found a paper on the web by a person from IBM which uses a
different tactic.  It describes a way to calculate a much wider CRC
with simple terms, and then collapse it to the final crc_32 at the end.

As you say, it looks like a 64-bit CRC should be able to run at
10 Gbit/s in an FPGA.  I have no idea what IBM needs to checksum
that runs substantially faster than that.  Curious.

blue beaver


> You reach a point where making the bus wider doesn't make the CRC
> faster, in terms of bits per second.  The number of inputs to each XOR
> grows with the bus width.
>
> I was doing some parallel PRBS experiments recently, and I found in some
> cases that the 128 bit bus produced lower bit rates than a 64 bit
> design.


