
Re: [oc] MISCs and partially desinchronized networks



Marko,

your "MISC Matrix" design is very interesting. Since you mention
storing programs in the local memories, I assume that you are
talking about a MIMD (multiple instructions, multiple data) design
rather than a SIMD design like the early Connection Machines. If that
is the case, then the lack of control instructions is strange.

Perhaps a bit of the history of highly parallel computers might help
you:

Researchers noticed in the mid-1980s that the communication network
between the processors needed at least as many dimensions as the
problem you were trying to solve. So if you were trying to calculate
a 3D fluid flow, for example, then your 2D mesh network might become
a performance bottleneck. Having more dimensions in the network was
never a problem since they could simply be ignored. So everybody
started to build "hypercube" machines, which seemed to scale up
better than the alternatives.
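
To make the "extra dimensions can be ignored" point concrete, here is
a minimal Python sketch. The function names are my own, and the Gray
code mapping is just one common way to lay a 2D mesh onto a hypercube,
not necessarily what any particular machine did:

```python
def hypercube_neighbors(node, dims):
    """Nodes one hop away: flip each of the `dims` address bits in turn."""
    return [node ^ (1 << d) for d in range(dims)]


def hypercube_hops(src, dst):
    """Minimum hop count equals the number of differing address bits."""
    return bin(src ^ dst).count("1")


def gray(value):
    """Reflected Gray code: consecutive integers differ in exactly one bit."""
    return value ^ (value >> 1)


def mesh_to_hypercube(x, y, bits_per_axis):
    """Map a 2D mesh coordinate to a hypercube node address.

    Each axis gets its own Gray-coded bit field, so mesh neighbours
    stay exactly one hop apart in the hypercube.
    """
    return (gray(x) << bits_per_axis) | gray(y)


print(hypercube_neighbors(5, 3))      # [4, 7, 1]
print(hypercube_hops(0b000, 0b111))   # 3
```

A 4x4 mesh fits in a 4-dimensional hypercube this way, and a 2D
problem simply never uses the links that the mapping leaves out.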

By the early 1990s we had made the network circuits smarter so that
packets could flow through a node without interrupting the calculation
in that node (unless that happened to be the packet's destination).
With this change, the network started to perform as if it had infinite
dimensions (or, to be more exact, one dimension for each node) no
matter how it was organized physically. So the simpler 2D and 3D meshes
and toruses became popular. The worry then became scalable total
bandwidth and limiting latency (see the Scalable Coherent
Interface, SCI, for ideas about this - http://www.SCIzzL.com/).
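
A rough way to see why the physical layout stopped mattering is the
usual textbook latency model, sketched below in Python. It is my own
simplification: it ignores contention and per-router switching delay,
and all parameter names and numbers are made up for illustration.

```python
def store_and_forward_latency(hops, packet_bytes, link_bytes_per_s):
    """Every intermediate node receives the whole packet before resending it."""
    return hops * packet_bytes / link_bytes_per_s


def cut_through_latency(hops, packet_bytes, header_bytes, link_bytes_per_s):
    """Only the header is delayed at each hop; the body streams behind it."""
    return (hops * header_bytes + packet_bytes) / link_bytes_per_s


# Hypothetical numbers: 1024-byte packet, 8-byte header, 1 byte per time unit.
for hops in (1, 10):
    print(hops,
          store_and_forward_latency(hops, 1024, 1.0),
          cut_through_latency(hops, 1024, 8, 1.0))
```

With these assumed numbers, going from 1 hop to 10 hops multiplies the
store-and-forward time by ten but adds well under 10% to the
cut-through time, so a distant node looks almost like a direct
neighbour.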

Then the problem was that we made the network too smart - it started to
look like a LAN, with layered protocols and buffers and so on. So a
packet was calculated by the application and copied to a buffer in
memory. The OS was invoked and copied the packet to a system buffer.
Then the hardware was started up and a DMA circuit copied the system
buffer to the network itself. At the destination the reverse process
took place.

Enter active messages - the application pushes the packet directly to
the network hardware as it is being calculated (avoiding four trips
through the memory bus!!). At the destination, the arrival of the
packet header invokes the application callback function that directly
consumes the incoming bytes.
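
The dispatch idea can be sketched in a few lines of Python. This is
only an illustration under my own assumptions: the header layout
(handler index plus payload length), the Node class and the send()
helper are all hypothetical, and a byte string stands in for the
network hardware.

```python
import struct

# Assumed header layout: 16-bit handler index, 16-bit payload length.
HEADER = struct.Struct("<HH")


class Node:
    """Receiving side: a table of application callbacks, indexed by header."""

    def __init__(self):
        self.handlers = []

    def register(self, callback):
        """Add a callback and return the index senders put in the header."""
        self.handlers.append(callback)
        return len(self.handlers) - 1

    def receive(self, wire):
        """The header selects the callback, which reads the payload in place."""
        handler_id, length = HEADER.unpack_from(wire, 0)
        # memoryview slicing avoids copying the payload into another buffer.
        payload = memoryview(wire)[HEADER.size:HEADER.size + length]
        self.handlers[handler_id](payload)


def send(handler_id, payload):
    """Sending side: prepend the header that names the remote callback."""
    return HEADER.pack(handler_id, len(payload)) + payload


def demo():
    node = Node()
    totals = [0]

    def accumulate(payload):
        # Consume the incoming 32-bit integers straight from the message.
        for (value,) in struct.iter_unpack("<i", payload):
            totals[0] += value

    handler_id = node.register(accumulate)
    node.receive(send(handler_id, struct.pack("<3i", 1, 2, 3)))
    return totals[0]


print(demo())  # 6
```

The point is that no OS call or intermediate system buffer sits
between the arrival of the header and the application code that uses
the data.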

Your system is more like SCI, so you are on the right track as long as
you can keep it simple.

I hope this helps.
-- Jecel
P.S.: I will try to finish the pages describing my CPU later this week
and then I will post the URL here. In addition, it might be interesting
for me to explain why I am keeping this as a separate project even
though the overlap with Open Cores is so great (I have USB, Firewire,
Ethernet MAC, and so on).