I always thought a scan was something you went for if you had stomach pains.
That was, until I posted a question on macresearch.org about how to do cumulative addition.
Seems easy enough. You have five numbers and you want to add them in a cumulative fashion. The result will also be five numbers, with each being the sum inclusive of it's position.
For example, let's say you have 1, 3, 2, 4, 1.
The result will be 1, 1 + 3, 1 + 3 + 2, 1 + 3 + 2 + 4, and 1 + 3 + 2 + 4 + 1.
Not too shabby. In good 'ol fashioned procedural code you could code that faster than a blindfolded fanatical cave dwelling warlord could re-assemble a Kalashnikov. But not so with parallel computing. For that, you need a scan.
But which scan? Inclusive or exclusive? Do you use the Hillis and Steele algorithm (thanks Geoffrey), or do you go for something even faster, as documented in the NVIDIA scan.pdf document? Not only that, but in your SDK folder you'll find about 4 different scan kernel examples? Apple have some too.
Some are naive, others are optimised. Some will handle big arrays, others small.
It's a bit like your wife asking for a cosmetic set for Christmas. You find yourself in a strange aisle of the city centre pharmacy store surrounded by a multitude of threatening and alien concepts. You start sweating. Women begin eyeing you strangely. In store cameras rotate and are now trained on you. Panic sets in.
What can you do? What can save you from all this, and more importantly, what can save you from yourself? Stay tuned for the next exciting episode of....The Joy of OpenCL.
Friday, 12 March 2010
Subscribe to:
Post Comments (Atom)
It is the reason we have just create the OpenCL Parallel Primitive library. You can find it at : http://code.google.com/p/clpp/
ReplyDeleteThe goal is to provide you the best (as a black blox) "scan" primitive (or sorting algorithm), depending of your parameters, context (hardware, sdk, ...)