
Re: [oc] Beyond Transmeta...



 
----- Original Message -----
From: "Marko Mlinar" <markom@opencores.org>
To: <cores@opencores.org>
Sent: Tuesday, February 11, 2003 1:53 AM
Subject: Re: [oc] Beyond Transmeta...
<snip>

> In practice, you would find it hard to make a multiplier that would fit your
> purpose, also your logic would switch many times, consuming more power than
> standard circuits, and not to speak of multi-phase clock issues.
 
There is not much standard about the circuit. So you invent a new serial multiplier
 
abcd x efgh
 
Becomes: (set typeface to courier)
 
             abcd(h)+
            abcd(g)+
          pppppp+
          abcd(f)+
        ppppppp+
        abcd(e)+
      pppppppp
 
Where (x) indicates a conditional pull of the bit stream through an adder
(else a pull of 0's). The product is fully complete in 11 clocks, but available
for use after 1 clock. Note that the lsb of the product is immutable after
1 clock, the 2nd lsb is immutable after 2 clocks, ... i.e. each bit of the
product is available for additional operations as it emerges. Therefore,
if you were to incorporate the multiply above into a multiply and
accumulate operation (MAC), i.e.
 
    result = (abcd x efgh) + ijkl
 
Then the addition of ijkl can begin after only 1 clock tick of the bitstream.
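
A rough software sketch of the idea, not a hardware description. It
assumes LSB-first streams, one 1-bit full adder with a carry flip-flop per
multiplier bit, and one more adder for the accumulate. The name serial_mac
and the 2 x width + 1 clock count are choices made for this illustration
only, and the whole adder chain is evaluated within one clock, which real
hardware might pipeline instead.

```python
def serial_mac(a, b, c, width=4):
    """Simulate LSB-first bit-serial a*b + c with a chain of 1-bit adders."""
    n_clocks = 2 * width + 1          # enough clocks for every bit of a*b + c
    carries = [0] * (width + 1)       # one carry flip-flop per serial adder
    result = 0
    for t in range(n_clocks):
        stream = 0                    # partial-sum bit entering the first adder
        for k in range(width):
            # Stage k sees the multiplicand stream delayed by k clocks,
            # gated by multiplier bit k (else a stream of zeros).
            a_bit = (a >> (t - k)) & 1 if t >= k else 0
            gated = a_bit & ((b >> k) & 1)
            total = stream + gated + carries[k]
            stream, carries[k] = total & 1, total >> 1
        # Final adder folds in the addend stream (the accumulate step).
        total = stream + ((c >> t) & 1) + carries[width]
        out_bit, carries[width] = total & 1, total >> 1
        result |= out_bit << t        # bit t never changes once it has emerged
    return result
```

For example, serial_mac(0b1101, 0b1011, 0b0101) returns 148, i.e.
13 x 11 + 5. The output bit at clock t depends only on input bits 0..t,
which is why the addend can be folded in from the first clock.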
 
Re: power. Could be much less than conventional means.
The multiply requires 4 1-bit serial adders, each performing 4 additions,
which is 16 1-bit cell operations and no latch operations.
The routing logic is not illustrated above, so that would increase power
consumption.
 
The traditional multiply would require perhaps 4 4-bit adder operations,
4 4-bit latch operations, and 4 9-bit shift register operations (plus
additional operations): at least 68 1-bit cell operations. This indicates
bitstream could consume about 1/4 the power of conventional means (at least
for this example).
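
The tally behind that estimate can be written out as below. This only
restates the 4-bit counts given above; the variable names are made up for
the illustration, and counting 1-bit cell operations is at best a crude
proxy for power.

```python
# Bitstream: 4 serial adders, each doing 4 one-bit additions.
serial_ops = 4 * 4                     # 16

# Conventional: 4 adds of 4 bits, 4 latches of 4 bits, 4 shifts of 9 bits.
parallel_ops = 4 * 4 + 4 * 4 + 4 * 9   # 68

ratio = serial_ops / parallel_ops      # about 0.235, i.e. roughly 1/4
print(serial_ops, parallel_ops, round(ratio, 3))
```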
 
Assuming the bitstream can be clocked at word-width times the rate of
the parallel implementation, the traditional method computes the MAC in
((4 adds + 4 shift/latch) + 1 add) x 4 = 36 clock times of the bitstream
method. Not as good as the 50x shown earlier.
 
Also note, as you go wider in word width the parallel method must slow
down for carry propagation whereas the bitstream does not.
 
There are a lot of unknowns here so don't be so quick to assume anything
about power consumption. A general rule of thumb though is if you can
generate the same result with less work you will consume less power.

> But even when leaving aside the implementation issues, you have will problems
> with loops, function calls and sw model, especially with PLD idea.
 
Why think in terms of loops and function calls? Go out of the box.
Start with a clean sheet of paper.

> There is also problem of debugging.
 
Initial debugging would be done through emulation. Not unlike what you do
now (synthesis). When the routing is proven, it would be incorporated
into the larger project and tested again.
 

Jim Dempsey