
Re: [oc] Beyond Transmeta...

To: <cores@opencores.org>
Subject: Re: [oc] Beyond Transmeta...
From: "Marko Mlinar" <Marko.Mlinar@campus.fri.uni-lj.si>
Date: Fri, 9 Jun 2000 15:16:39 +0200
References: <64.382a97e.2671f6fe@aol.com>
Reply-To: cores@opencores.org
Sender: owner-cores@opencores.org

> > I suppose it would be nice if the network were optimized like normal
> > circuits. But I am still concerned about speed - a normal adder would
> > probably add much faster.
>
> It would add much faster in the case you gave, but if you are adding a lot
> of numbers in parallel, and you had enough 1-bit processors to handle it,
> then you could get considerably more adds done within those 32 cycles.
Yes, of course, but what I am trying to tell you is:
1. You still need ~32 cycles per add, while RISCs need 1 cycle, so the
   sequential execution time is a lot larger.
2. You cannot have more than 32 adds running in parallel, so your network is
   only (let's say a few times) faster than a RISC - just because of
   frequency, and only if you have that much parallelism.
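
The two points above can be put into numbers. This is only a rough sketch:
it assumes ~32 cycles per 32-bit add and 4 one-bit processors per add (the
figure from the quoted message further down), and the function names are
made up for illustration.

```python
BITS = 32            # word width discussed in the thread
PROCS_PER_ADD = 4    # 1-bit processors one add can keep busy (assumed from the quote)

def network_adds_per_cycle(n_procs, bits=BITS, procs_per_add=PROCS_PER_ADD):
    """Average number of adds the 1-bit network finishes per cycle."""
    parallel_adds = n_procs // procs_per_add   # adds in flight at the same time
    return parallel_adds / bits                # each add takes ~bits cycles

def break_even_procs(bits=BITS, procs_per_add=PROCS_PER_ADD):
    """1-bit processors needed to match a RISC retiring 1 add per cycle."""
    return bits * procs_per_add

print(network_adds_per_cycle(32))    # 8 adds in flight, 32 cycles each
print(network_adds_per_cycle(128))   # 32 adds in flight, 32 cycles each
print(break_even_procs())
```

Under these assumptions the network only reaches 1 add per cycle with 32
adds in flight, so any gain beyond that has to come from clock frequency.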

> With the add network I created, it can only use 4 processors simultaneously.
> In the first pass it adds the first 2 bits and gets their carries (2 adds,
> 2 carries = 4 instructions). In the second pass it adds the carry from the
> previous pass, then combines those carries into another carry, and adds
> and retrieves the carry from the next 2 bits to add. It keeps repeating
> the second step until it gets to the end. So each pass depends on the last
> pass's carries. There is 1 pass per bit, and each pass is about 4
> instructions (except the last 2, which are 3 and 1), so you can only use 4
> 1-bit processors.
>
> If you have 32 1-bit processors, then you will be able to do 8 adds in 32
> cycles (32 / 4 = 8). The adds themselves are not any faster, but you can get
> more adds done in the same period of time.
I suppose you know that your network can only be slower than a normal adder
network consisting only of gates. I think that a 32b+32b adder needs 23 layers
of logic. For the same implementation you would need many more gate layers!
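
For reference, the one-pass-per-bit scheme from the quoted message can be
sketched roughly as below. This is not the exact network described there
(its instruction scheduling is not given in the thread), just a plain
bit-serial add built from 1-bit operations, with a pass counter to show the
dependency on the previous carry. The function name is hypothetical.

```python
def bit_serial_add(a, b, bits=32):
    """Add two unsigned integers one bit per pass, using only 1-bit operations.

    Returns (sum modulo 2**bits, carry out, number of passes).
    """
    carry = 0
    result = 0
    passes = 0
    for i in range(bits):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                      # sum bit for this position
        carry = (x & y) | (carry & (x ^ y))    # carry into the next position
        result |= s << i
        passes += 1                            # each pass waits for the previous carry
    return result, carry, passes

total, carry_out, passes = bit_serial_add(123456789, 987654321)
print(total, carry_out, passes)
```

Because every pass needs the carry from the one before it, the 32 passes
cannot overlap within a single add, which is why extra processors only help
by running other adds alongside.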

> > I don't see how - passing parameters to a function-network is like
> > passing registers to a normal function. Parallelism is limited because
> > you must wait for the function result.
> > How do you gain free memory?
>
> Well, making it a persistent network means that every time a function is
> used, its network is placed all over the place. For example, every time
> the source code of a program adds 2 numbers together, a network is
> created that adds them together. So you have a lot of redundant networks, but
> each can be evaluated independently of the others and only changes
> if something that it depends on changes. It's like converting an application
> into a huge piece of silicon processor. So you gain free memory by replacing
> a network with a symbolic instruction (like a RISC instruction). That saves
> a lot of memory, but loses parallelism, which also results in a loss of
> performance for the 1-bit processors.
Uhh... this has to be done in HW, which really complicates the whole thing.
It would be better to route things statically (by the compiler), and then
use free instructions/functions (if none is free => wait). Only small
instructions should be inlined.
BTW: you can pipeline a function (if it doesn't contain loops), so you can
execute 1 function/cycle, regardless of function length.
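
A small simulation of the pipelining remark, as a sketch only: it assumes
the loop-free function can be cut into a fixed chain of stages with a
register after each one, and that one new input is accepted per cycle.
`run_pipeline` and the example stages are hypothetical names.

```python
def run_pipeline(stages, inputs):
    """Push one input per cycle through a loop-free chain of stages.

    Returns a list of (cycle, result) pairs. None marks an empty pipeline
    slot, so inputs and intermediate values must not be None.
    """
    regs = [None] * len(stages)    # one register after each stage
    pending = list(inputs)
    finished = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        # Advance from the back so each stage reads last cycle's value.
        for i in range(len(stages) - 1, 0, -1):
            regs[i] = stages[i](regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = stages[0](pending.pop(0)) if pending else None
        cycle += 1
        if regs[-1] is not None:
            finished.append((cycle, regs[-1]))
    return finished

# Hypothetical 3-stage function: ((v + 1) * 2) - 3
stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
for cycle, value in run_pipeline(stages, range(5)):
    print(cycle, value)
```

The first result appears after 3 cycles (the function length), and after
that one result completes every cycle, so 5 inputs finish at cycle 7.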

> > Yes, but for the final result you should know that. I suppose that some
> > operations could take variable time (like loops). You must calculate
> > clocks then.
>
> I'm getting a little confused here. Do you mean final results as in timing
> the system performance for a particular application (like benchmarks etc.)?
Yes, exactly. Besides, I don't see another way.

> > It would be interesting if the network weren't synchronised - no global
> > clock (BTW: with a larger network you would have serious clock problems).
> > That means longer routing takes more time, but when the signal arrives and
> > stabilises you should have your result ready. But there are problems with
> > calculating the timing. Of course this timing could (and probably must) be
> > calculated by the compiler.
I just remembered - you cannot easily pipeline such things (the idea I
described in the BTW above)...

> Hmm, that is an interesting idea. Each application could have its own timing
> network, but each of those timing networks would consume 1-bit processors, and
> if you exceed the number of processors then passes will need to be done in
> more than 1 clock, which would throw the timer off (a pass being a set of
> instructions that can be done in parallel). An external clock would not have
> this problem, although there is still an issue of interrupts from the input
> bits: they may accumulate. Another way is to use trigger-based input, so
> that when a time is requested a bit is set and the clock bits are then
> changed.
Sorry, I didn't quite get your idea.
I think there could be problems with an external timer and clocks.

You could also have two bits for each signal:
1. data (same meaning as before)
2. a 'clock' bit - changes only when the data hasn't changed
Using this principle, the network can be totally unsynchronized.
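
One possible reading of this two-bit scheme, as a sketch: for every new
value exactly one of the two wires changes, so a receiver can detect a new
value without any shared clock. The starting state (both wires at 0) and
the function names are assumptions made for the example.

```python
def encode(bits):
    """Return the (data, clock) wire states for a stream of bits.

    Assumption: both wires start at 0. For each new value the data wire
    changes if the value differs from the previous one; otherwise the
    'clock' wire toggles instead.
    """
    data, clock = 0, 0
    states = []
    for b in bits:
        if b != data:
            data = b          # data changed: leave the clock wire alone
        else:
            clock ^= 1        # data unchanged: toggle the clock wire
        states.append((data, clock))
    return states

def decode(samples):
    """Recover the bit stream by watching for a change on either wire."""
    prev = (0, 0)             # must match the sender's starting state
    bits = []
    for state in samples:
        if state != prev:     # any transition marks a new value
            bits.append(state[0])
            prev = state
    return bits

message = [1, 1, 0, 0, 0, 1]
states = encode(message)
# The receiver may sample each state any number of times (no shared clock).
oversampled = [s for s in states for _ in range(3)]
print(decode(oversampled) == message)
```

The receiver here still gets every state at least once; how long a state
must be held for that to be true in a real network is a timing question
this sketch does not address.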

regards,
    Marko

References:
- Re: [oc] Beyond Transmeta...
  - From: Suboner@aol.com
