Re: The trouble with Parallelism

A few months ago, my colleague Mike Barker tackled a word-split algorithm (see here) and showed that the parallel approach is not always the right one.
In some cases, a simple 20-line, single-threaded "naive" solution can outperform your carefully thought-out, complex multi-threaded monster.

Still – the problem intrigued me, because on the face of it this is an ideal problem for a parallel algorithm. It is a classic "data parallelism" problem, borrowed from Guy Steele's examples of
parallel computing using Fortress.

So – in the interest of science – I devised an experiment.

I created three different solutions to the word-split problem – all written in Java.

  1. Brute Force – the same simple 20-line, single-threaded solution: iterate, split, done. (source)
  2. Simple Divide and Conquer – a trivial implementation of the divide-and-conquer algorithm. The text is divided between a fixed number of worker threads. Each thread splits its section of the text into words and possibly chunks (the two character sequences at the edges of its section, which may be part of a bigger word). Those chunks are stitched together in the merge phase. One thread is dedicated to blocking and waiting until all the splitting is done by all threads – it really is that trivial/dumb. After all worker threads are done, that single thread merges the results. (source)
  3. Flexible Divide – no merge at all. Every thread gets a section of the text and decides for itself whether the first characters are a chunk of a bigger word (by looking at the character just before its section). If they are, it skips them. It then splits all the words in its section, carrying on, perhaps even beyond the border with the next thread's section, until it has split the last word that started in its section. Notice: no contention, and no need for threads to combine anything between them. (Have you spotted the weakness in this algorithm? See conclusion no. 4.) (source)
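To make the flexible-divide idea concrete, here is a minimal single-threaded sketch of the per-section logic – my own illustration under assumed names (`FlexibleDivideSketch`, `splitSection`), not the original source code. Each section skips a leading word fragment owned by the previous section, and finishes any word that starts inside it, even past its own boundary:

```java
import java.util.ArrayList;
import java.util.List;

public class FlexibleDivideSketch {

    static boolean isDelimiter(char c) {
        return Character.isWhitespace(c);
    }

    /** Splits the words "owned" by the section [start, end) of text. */
    static List<String> splitSection(String text, int start, int end) {
        List<String> words = new ArrayList<>();
        int i = start;
        // If the section starts mid-word, the previous section owns that word:
        // skip forward to the first delimiter.
        if (start > 0 && !isDelimiter(text.charAt(start - 1))) {
            while (i < text.length() && !isDelimiter(text.charAt(i))) {
                i++;
            }
        }
        while (i < end) {
            while (i < end && isDelimiter(text.charAt(i))) {
                i++;
            }
            if (i >= end) break;
            int wordStart = i;
            // A word that starts in this section is finished here, even if it
            // runs past the section boundary.
            while (i < text.length() && !isDelimiter(text.charAt(i))) {
                i++;
            }
            words.add(text.substring(wordStart, i));
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "curiouser and curiouser cried alice";
        int mid = text.length() / 2; // boundary falls inside a word
        List<String> all = new ArrayList<>(splitSection(text, 0, mid));
        all.addAll(splitSection(text, mid, text.length()));
        System.out.println(all); // prints [curiouser, and, curiouser, cried, alice]
    }
}
```

Because every word is produced by exactly one section, the per-section results can simply be concatenated in order – there is nothing to merge.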

I ran those algorithms on a 16-core machine, using 16 worker threads for the parallel algorithms.
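As a rough illustration of the divide step, the following is a hedged sketch (again my own, not the original benchmark code) of farming sections of the text out to a fixed pool of worker threads. Note how a word cut at a section boundary comes back as two fragments – exactly the "chunks" that the simple divide and conquer's merge phase has to stitch together:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSplitHarness {

    /** Naive whitespace split of one section, ignoring section boundaries. */
    public static List<String> splitWords(String section) {
        List<String> words = new ArrayList<>();
        for (String w : section.split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    /** Divides the text evenly among workers; collects results in section order. */
    public static List<List<String>> run(String text, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (text.length() + workers - 1) / workers;
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                int start = Math.min(i * chunk, text.length());
                int end = Math.min(start + chunk, text.length());
                String section = text.substring(start, end);
                futures.add(pool.submit(() -> splitWords(section)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get()); // blocks, preserving section order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // The boundary cuts "wonderland" into "w" + "onderland" – two chunks
        // that a merge phase would have to join back into one word.
        System.out.println(run("alice in wonderland", 2));
        // prints [[alice, in, w], [onderland]]
    }
}
```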

The text was Alice in Wonderland, but to test the algorithms at scale I ran them several times, each time doubling the size of the text by concatenating another copy of the book.

And the results:

Conclusions:

  1. The "weak" proof stands (obviously) – parallel algorithms are not always better. They carry a cost.
  2. Contention between threads is the big cost of parallel algorithms. Contention can be over either data or resources; in this case, merging the results creates contention. The best way to tackle it is to find an algorithm that does not have it.
  3. Depending on the algorithm – if the cost of parallelism is constant, then there will be an input data set big enough for the benefit of parallelism to outweigh the cost. If the cost grows with the size of the input, then you are really in trouble.
  4. Many times, an algorithm that sacrifices performance in certain cases may gain you huge performance improvements in others. This is the case with the flexible divide. If the whole text were one long word with no spaces at all, the flexible divide's performance would be awful: one thread would do work similar to Brute Force while all the others just kept skipping characters, i.e. wasting CPU cycles. However, if we ignore that edge case and agree that our input contains words whose average size is far less than the 100K characters the book takes, then it outperforms the others.
  5. My final conclusion: there are no free lunches. You can't have a high-performing parallel algorithm without thinking about it; you can't always ignore parallel solutions; and you can't expect to follow simple rules blindly.
