
Re: [oc] Beyond Transmeta...



> > I don't believe that you could reach more performance by simplifying
> > instructions... I suppose you have a computer for desktop
> > applications in mind (like Transmeta). Did you calculate how
> > many cycles your add takes (the most common instruction in GP
> > applications)? Clock ratio would be much lower.
>
> Well, it depends on the kind of performance you are looking for. Serial
> performance is fine with CISC-style processors, because internally they
> do a number of small things in parallel, and with the new SIMD
> instructions they reach further into doing more parallel work. But with
> a simpler processor like the one I describe you can achieve better
> parallel performance. I calculate that 4 1-bit processors can add 1 bit
> per clock, so a 32-bit number takes 32 clocks. But when you take into
> account that there are only 4 1-bit processors versus one 32-bit
> processor, you can see that the 1-bit processors should be more
> scalable, with less heat and fewer complications, and more than likely
> higher stability at higher clock speeds, so you can run them faster.
> Or, if you have 32 1-bit processors, they can do 8 32-bit additions
> simultaneously in 32 clocks, which averages out to 4 clocks per add.
> With 128 1-bit processors you get 1 clock per add, and going beyond 128
> gets you less than 1 clock per add. That is probably the nature of a
> network: each thing gets done more slowly, but many things get done
> simultaneously.
Yes, I know that. As I said, your computer is pretty suitable for DSP
applications (like filters, etc.), especially if you add some floating
point support. But latency (response time) is also important for DSPs. I
would estimate that your computer would make additions about 5 times
slower than normal ones. The time needed for sequential programs is thus
too large.
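
The throughput figures quoted above, and the latency objection raised against them, can be illustrated with a small sketch. It assumes, as the quoted text states, that a group of 4 one-bit processors forms one adder that handles one bit position per clock. The function names and the `processors_per_adder` parameter are hypothetical, chosen only for this illustration.

```python
def bit_serial_add(a, b, width=32):
    """Add two unsigned integers one bit position per clock (result wraps at `width` bits)."""
    carry = 0
    result = 0
    clocks = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i          # sum bit for this position
        carry = (x & y) | (carry & (x ^ y))     # carry into the next position
        clocks += 1                             # one bit position per clock
    return result, clocks


def average_clocks_per_add(num_processors, width=32, processors_per_adder=4):
    """Average clocks per addition when all adders run independent additions in parallel."""
    adders = num_processors // processors_per_adder
    return width / adders


if __name__ == "__main__":
    print(bit_serial_add(5, 7))                 # a single add always takes `width` clocks
    for n in (4, 32, 128, 256):
        print(n, average_clocks_per_add(n))
```

Adding processors lowers the average cost per addition only when enough independent additions are available. The latency of any single addition stays at 32 clocks, which is the concern for sequential programs.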

> > I suppose programs of that kind would be pretty large. And you haven't
> > got any addressing here. How would you know where to get operands
> > and where to put them?
>
> There are different ways of possibly handling connections. But one
> thing I have mentioned is the ability for self-modification, which
> makes things even more difficult: the networks must have access to
> their own connections.
Yes, but if you have access to your own connections you must address them
somehow, right?

> One way to handle connections could be a multidimensional space (mapped
> from flat memory space) for the bits to exist in, with the rule that
> bits can connect with bits a certain distance away from themselves. A
> 3D or 4D space may be the best solution. One of the dimensions could
> represent the instructions, so you have layers of networks that modify
> one another. These modifications could probably be used to dynamically
> change the nature of a network as needed, so that one network can
> change another from an add operation to a subtraction operation. But
> then there is the issue of how to know which bit uses what instruction;
> it's definitely very complicated. But I think it can be figured out,
> and more than likely it will be figured out by looking at things
> differently; right now we are used to doing things in a serial kind of
> way.
For placing and optimizing, look at the OpenRisc200 project, which is in a
way similar.
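
The idea in the quoted paragraph, bits living in a multidimensional space that is mapped from flat memory and connecting only to bits within a certain distance, can be sketched as follows. This is only one possible reading: the row-major layout, the Manhattan distance, the 4x4x4 shape, and all function names are assumptions made for the illustration, not something the proposal specifies.

```python
from itertools import product


def to_coords(index, shape):
    """Convert a flat memory index to coordinates (row-major, last axis varies fastest)."""
    coords = []
    for size in reversed(shape):
        coords.append(index % size)
        index //= size
    return tuple(reversed(coords))


def to_index(coords, shape):
    """Convert coordinates back to the flat memory index."""
    index = 0
    for c, size in zip(coords, shape):
        index = index * size + c
    return index


def neighbours(index, shape, max_dist=1):
    """Flat indices of all bits within `max_dist` (Manhattan distance) of the given bit."""
    here = to_coords(index, shape)
    found = []
    for offset in product(range(-max_dist, max_dist + 1), repeat=len(shape)):
        dist = sum(abs(o) for o in offset)
        if dist == 0 or dist > max_dist:
            continue                             # skip the bit itself and far-away bits
        there = tuple(h + o for h, o in zip(here, offset))
        if all(0 <= t < s for t, s in zip(there, shape)):
            found.append(to_index(there, shape))  # keep only positions inside the space
    return sorted(found)


if __name__ == "__main__":
    shape = (4, 4, 4)                            # hypothetical 3D space of 64 bits
    print(to_coords(21, shape))
    print(neighbours(21, shape))
```

With such a mapping, a connection can be named by a small relative offset instead of a full address, which is one way the addressing question could be approached.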

> Bandwidth is definitely an issue, but if the 1-bit processor is small
> enough it may be incorporated into the memory, and bandwidth would be
> made up for by having it run at the same clock rate.
Do you intend to keep the program in this memory or in a separate one?

> Or, on the other hand, memory size may be made up for by transferring
> higher-level instructions... Like in our own mind, we attach words,
> names and labels to complex ideas to help us understand things
> better... This may very well be what the network would do: do complex
> logic in certain parts of the network, like how a certain part of the
> human brain does math while another does words. In this way it creates
> a processing network, like a CISC CPU turned into or emulated by a
> network, but with this method you can create multiple processing
> networks that can be localized to be the equivalent of having multiple
> CISC processors. Like having one processor per pixel, to do real-time
> ray tracing...
Do you think such large programs would fit into your network? If they
won't, you would have to change most of the network every cycle. That's
the bandwidth problem I was talking about. In general applications, any
particular procedure is not used often. Generally you can execute just a
few of them simultaneously, so your network would be mostly idle?

> Not one particular... but I have a few ideas of how it would work. The
> biggest factor is the organization of networks. I've done experiments
> with treating mathematical equations as networks, and using certain
> properties to shift the network around so that you can get different
> equations. If done right you can optimize a network to be more
> parallel, to take advantage of a certain number of 1-bit processors.
> The other aspect is how things are mapped into the network. There are
> many different ways for separate networks to connect with each other;
> by using the same shifting, you can achieve various changes to the
> network, which allow for different kinds of connections that were not
> possible before. These changes can be either helpful for a particular
> situation or bad: if a change is good it is kept, if it is bad it is
> shifted back. So you shift around the network to achieve various
> effects. This is still partly theory, mostly from my networked variable
> experiments; I have yet to see how to shift around binary networks.
Yes, it would be nice if you could use some transformations instead of
reloading it all over.

Marko