1 May 2008 tampe   » (Apprentice)

The power of the brain is not all about parallel execution; it's about parallel lookup

One of the mysteries of today (for me) is that we don't have parallelized memory lookups. The idea is for a thread to send a single request for N memory units, dereferenced, at memory locations Addr + 0 ... Addr + N - 1. Please increase bandwidth, and don't worry so much about latency and caches. And while we are at it, a memory unit could contain either data or a reference. Use that!
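To make the request concrete, here is a minimal software sketch of what such a lookup might return. Everything in it is hypothetical: the names `Ref` and `gather` are made up for illustration, and following chains of references (rather than a single level) is an assumption, not something stated above. The loop runs sequentially here; the point is that each of the N lookups is independent, so hardware could serve them in parallel.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Ref:
    """A memory unit that holds a reference to another address (hypothetical)."""
    addr: int


Cell = Union[float, Ref]


def gather(memory: List[Cell], addr: int, n: int) -> List[float]:
    """Return the dereferenced contents of memory[addr : addr + n].

    Software stand-in for the imagined hardware request. Each lookup is
    independent of the others, which is what would allow parallel service.
    """
    out = []
    for cell in memory[addr:addr + n]:
        # Assumption: follow references until a plain data value is reached.
        while isinstance(cell, Ref):
            cell = memory[cell.addr]
        out.append(cell)
    return out


# Cells 0-2 hold data; cells 3-5 hold references.
memory = [1.5, 2.5, 4.0, Ref(0), Ref(2), Ref(3)]
result = gather(memory, 3, 3)
print(result)  # [1.5, 4.0, 1.5]
```

One request for addresses 3 to 5 comes back as plain data, even though every one of those cells pointed somewhere else.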

For example, one application that's dear to me is CFD simulation, where I guess you need to be able to multiply a sparse matrix by a vector very quickly. The idea is to compute all the scalar products by doing just what I describe: ask for the dereferenced memory from a memory region. The system batches these requests and sends them to the memory unit, which distributes them and fetches them in parallel. Just store the memory locations randomly to make sure parallel paths will be taken. Such a system could also be very effective for simulating very large neural networks.
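As a rough illustration of where the dereferencing shows up, here is a sketch of a sparse matrix times vector product. It assumes the common compressed sparse row (CSR) layout, which is my choice for the example and not something the idea depends on; the function name `spmv_csr` is hypothetical. The line marked "gather" is the indirect lookup that the proposed memory unit would batch and fetch in parallel.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # Gather: fetch x at the scattered column positions of this row.
        # These lookups are independent and could be served in parallel.
        gathered = [x[j] for j in col_idx[start:end]]
        # Scalar product of the row's nonzeros with the gathered entries.
        y.append(sum((v * g for v, g in zip(values[start:end], gathered)), 0.0))
    return y


# The 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [5.0, 6.0, 19.0]
```

The arithmetic per nonzero is trivial (one multiply, one add); nearly all the work is fetching `x[j]` from unpredictable places, which is why bandwidth for scattered lookups matters more here than cache locality.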

For example, if an execution unit can execute 5 Gflop/s, we would need 50 GB/s of bandwidth. So by increasing current bandwidth by a fairly small factor and attaching a hypothetical parallel memory unit, we could (it seems) match the theoretical flops of today's CPUs for any sparse matrix multiplication or similar algorithm. And hey, add a bunch of redundant, small, simple cores, and don't be so fixated on saturating them: you will surely be able to saturate the bandwidth at no extra cost, because many simple independent cores with small caches are cheaper than larger cores with larger caches.
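The post does not say how the 50 GB/s figure is derived. One accounting that happens to reproduce it, offered purely as an assumption, is a CSR-style product in double precision: each nonzero costs an 8-byte matrix value, a 4-byte column index, and an 8-byte gathered vector entry, and yields two flops (a multiply and an add). The byte sizes and the helper name `required_bandwidth_gb_s` are mine.

```python
# Assumed traffic per stored nonzero (not stated in the post).
BYTES_VALUE = 8        # double-precision matrix entry
BYTES_INDEX = 4        # 32-bit column index
BYTES_X = 8            # double-precision vector entry fetched by the gather
FLOPS_PER_NONZERO = 2  # one multiply and one add

bytes_per_flop = (BYTES_VALUE + BYTES_INDEX + BYTES_X) / FLOPS_PER_NONZERO


def required_bandwidth_gb_s(gflops: float) -> float:
    """Memory bandwidth (GB/s) needed to sustain the given Gflop/s rate."""
    return gflops * bytes_per_flop


print(bytes_per_flop)                # 10.0
print(required_bandwidth_gb_s(5.0))  # 50.0
```

Under these assumptions the ratio is 10 bytes per flop, so 5 Gflop/s needs 50 GB/s. Writing the result vector and reading row pointers are ignored; they add little when rows have many nonzeros.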

The programmer would have a simpler memory model, because we would not need to be so fixated on storing related data close together. And the cost of dereferencing goes down if you can do it in parallel.

I just see too much potential in this idea, and I have not seen it tried. Maybe there is a hitch, but I do not see it. Any ideas?

If you invented such a memory, we would use big hash tables much more than we do today. Think about that. In short, if this were possible, a lot of cool ideas would scale to large systems without an exponential increase in the complexity of the underlying software.

I don't care, I'm just having fun. If this means dollars for you, cool. Cheers.
