
Re: [oc] Beyond Transmeta...



> I suppose it would be nice if the network were optimized like normal
> circuits.
> But I am still concerned about speed - a normal adder would probably add
> much faster.

It would add much faster in the case you gave, but if you are adding a lot 
of numbers in parallel, and you had enough 1-bit processors to handle it, 
then you could gain considerably more adds within those 32 cycles.

> I think calculations for 4 processors are meaningless. You should calculate
> the numbers for larger processor counts, like 32, 128, ... (all for 32b operands)

With the add network I created, it can only use 4 processors simultaneously. 
In the first pass it adds the first 2 bits and gets their carries (2 adds + 
2 carries = 4 instructions). In the second pass it adds the carry from the 
previous pass, combines those carries into another carry, and adds and 
retrieves the carry from the next 2 bits. It keeps repeating the second step 
until it gets to the end, so each pass depends on the previous pass's 
carries. There is 1 pass per bit, and each pass is about 4 instructions 
(except the last 2, which are 3 and 1), so you can only use 4 1-bit 
processors.
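
To make that concrete, here is a rough Python sketch of that pass structure. 
The function, the variable names, and the exact grouping of instructions into 
passes are just my illustration (my grouping makes the final pass 2 
instructions rather than 3 and 1), not the actual network:

def ripple_add_passes(a, b, width=32):
    """Add two width-bit numbers as 1-bit operations grouped into
    passes, where each pass is a set of independent instructions."""
    abits = [(a >> i) & 1 for i in range(width)]
    bbits = [(b >> i) & 1 for i in range(width)]
    s = [0] * width      # partial sum bits
    g = [0] * width      # per-bit carry (generate) bits
    pass_sizes = []

    # First pass: half-add the first 2 bit positions
    # (2 adds + 2 carries = 4 instructions).
    s[0], g[0] = abits[0] ^ bbits[0], abits[0] & bbits[0]
    s[1], g[1] = abits[1] ^ bbits[1], abits[1] & bbits[1]
    pass_sizes.append(4)

    carry = g[0]
    for i in range(1, width):
        # Each later pass: add the previous pass's carry into bit i and
        # combine the carries (2 instructions), plus half-add the next
        # bit position if there is one (2 more instructions).
        new_carry = g[i] | (s[i] & carry)
        s[i] ^= carry
        carry = new_carry
        if i + 1 < width:
            s[i + 1] = abits[i + 1] ^ bbits[i + 1]
            g[i + 1] = abits[i + 1] & bbits[i + 1]
            pass_sizes.append(4)
        else:
            pass_sizes.append(2)

    result = sum(bit << i for i, bit in enumerate(s)) | (carry << width)
    return result, pass_sizes

A quick check of the shape:

result, passes = ripple_add_passes(123456789, 987654321)
assert result == 123456789 + 987654321
assert len(passes) == 32 and max(passes) == 4   # 1 pass per bit, 4 ops max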

If you have 32 1-bit processors, then you will be able to do 8 adds in 32 
cycles (32 / 4 = 8). The adds themselves are not any faster, but you can get 
more adds done in the same period of time.
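
Just to spell out that arithmetic with the numbers above:

processors = 32
ops_per_pass = 4        # one add network keeps at most 4 one-bit processors busy
passes_per_add = 32     # roughly one pass per bit of the operands

concurrent_adds = processors // ops_per_pass
print(concurrent_adds, "adds complete every", passes_per_add, "cycles")  # 8 / 32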

> I don't see how back propagation could be used here. But you can have
> connections like neural networks - in layers. But I would still suggest
> using some sort of uniform basic topology with 4 or 6 connections.
> Otherwise performance would degrade too much.

> I don't see how - passing parameters to a function network is like passing
> registers to a normal function. Parallelism is limited because you must
> wait for the function result.
> How do you gain free memory?

Well, making it a persistent network means that every time a function is 
used, a copy of its network is laid down at that point. For example, every 
place in the source code of a program where 2 numbers are added together, a 
network is created that adds them. So you have a lot of redundant networks, 
but each can be evaluated independently of the others and only changes if 
something it depends on changes. It's like converting an application into a 
huge piece of silicon. So you gain free memory by replacing a network with a 
symbolic instruction (like a RISC instruction). That saves a lot of memory, 
but loses parallelism, which also means a loss of performance for the 1-bit 
processors.
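
A toy illustration of that trade-off (the counts are placeholders I made up, 
not measurements):

OPS_PER_ADD_NETWORK = 32 * 4   # ~4 one-bit instructions per pass, ~32 passes
ADD_SITES = 1000               # hypothetical number of adds in the source code

expanded = ADD_SITES * OPS_PER_ADD_NETWORK   # every add site gets its own network
symbolic = ADD_SITES * 1                     # one RISC-like instruction per site

print("expanded networks:", expanded, "one-bit instructions")  # 128000
print("symbolic form:    ", symbolic, "instructions")          # 1000
# The symbolic form is far smaller, but the adds no longer all run
# independently, so the 1-bit processors lose parallelism (and performance).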

> Yes, but for the final result you should know that. I suppose that some
> operations could take variable time (like loops). You must calculate the
> clocks then.

I'm getting a little confused here. Do you mean final results as in timing 
the system's performance for a particular application (like benchmarks, etc.)?

> It would be interesting if the network weren't synchronised - no global
> clock (BTW: with a larger network you would have serious clock problems).
> That means longer routing takes more time, but when the signal arrives and
> stabilises you should have your result ready. But there are problems with
> calculating it. Of course this timing could (and probably must) be
> calculated by the compiler.

Hmm, that is an interesting idea. Each application could have its own timing 
network, but those timing networks would consume 1-bit processors, and if 
you exceed the number of processors then passes will need to be done in more 
than 1 clock, which would throw the timer off (a pass being a set of 
instructions that can be done in parallel). An external clock would not have 
this problem, although there is still the issue of interrupts from the input 
bits, which may accumulate. Another way is to use trigger-based input, so 
that when the time is requested a bit is set and the clock bits are then 
changed.
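
A quick way to see how the timer would drift (the pass sizes and processor 
count here are just example values):

from math import ceil

def clocks_needed(pass_sizes, processors):
    """Total clocks when each pass runs as many of its instructions in
    parallel as there are processors, spilling the rest into extra clocks."""
    return sum(ceil(size / processors) for size in pass_sizes)

one_add  = [4] * 32   # one add network: 4-instruction passes
two_adds = [8] * 32   # two networks forced to share the same 4 processors

print(clocks_needed(one_add, 4))    # 32 - matches the timing network's assumption
print(clocks_needed(two_adds, 4))   # 64 - the compiler-predicted timing is now off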

Leyland Needham