Today I read the paper "MapReduce and Parallel DBMSs: Friends or Foes?", which compares two technologies: MapReduce (MR) and parallel DBMSs. The following points are extracted from the paper.
1. ETL and "read once" data sets: MR wins.
2. Complex analytics: MR wins.
3. Semi-structured data: MR wins.
4. Quick-and-dirty analyses: MR wins.
5. Limited-budget operations: MR wins.
6. Query-intensive workloads (performance): parallel DBMS wins. (Although a parallel DBMS outperforms MR at query time, loading data into it takes much more time.)
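The trade-off in item 6 (slow load, fast queries) can be made concrete with a little arithmetic. The sketch below is my own illustration, not from the paper: it assumes MR has no separate load step, every query costs the same each time, and the timings are hypothetical inputs. The DBMS pays off once load_dbms + n * query_dbms <= n * query_mr, i.e. once n >= load_dbms / (query_mr - query_dbms).

```python
import math


def break_even_queries(load_dbms: float, query_dbms: float, query_mr: float) -> int:
    """Smallest number of queries n for which the DBMS total time
    (load_dbms + n * query_dbms) is no greater than the MR total time
    (n * query_mr).

    Assumptions (hypothetical model): MR has no load step, and each
    query takes the same time every time it runs.
    """
    if query_mr <= query_dbms:
        # If the DBMS is not faster per query, the load cost is never recovered.
        raise ValueError("DBMS per-query time must be lower than MR's")
    return math.ceil(load_dbms / (query_mr - query_dbms))


# Hypothetical timings in seconds: 100 s load, 10 s per DBMS query, 30 s per MR query.
print(break_even_queries(100.0, 10.0, 30.0))  # → 5
```

With these made-up numbers, MR is ahead for up to four queries and the DBMS catches up at the fifth (100 + 5 × 10 = 150 = 5 × 30), which matches the intuition that "read once" jobs favor MR while repeatedly queried data favors the DBMS.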
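The reasons why parallel DBMSs outperform MR:
1. Repetitive record parsing. (By default, MR stores data as text and parses each record every time it is read, while a parallel DBMS parses records once, at load time.)
2. Compression. (Parallel DBMSs use compression to reduce I/O; the paper reports that MR gained much less from compression.)
3. Pipelining. (A parallel DBMS pushes intermediate data directly to the target node instead of writing it to local disk, whereas MR writes intermediate data to disk.)
4. Scheduling. (In a parallel DBMS, each node's work is fixed by the query plan before execution, whereas MR schedules tasks at runtime, one storage block at a time.)
5. Column-oriented storage. (A column store reads only the columns a query needs, while MR, like a row store, reads whole records.)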
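Items 1 and 5 above can be illustrated with a toy in-memory sketch. This is an assumption-laden illustration, not how Hadoop or any real DBMS is implemented: the record layout (url, visits, revenue) and all function names are hypothetical, and Python lists stand in for on-disk files. The "MR-style" query re-parses every field of every text record on each run; the "column store" parses once at load time and then touches only the column the query needs.

```python
from typing import Dict, List

# Hypothetical raw input: comma-separated text records (url, visits, revenue).
RAW: List[str] = [
    "a.com,10,1.5",
    "b.com,20,2.0",
    "c.com,30,4.5",
]


def mr_style_total_revenue(lines: List[str]) -> float:
    """Re-parses every field of every record on each query (repetitive
    parsing), and reads whole rows even though only one column is needed."""
    total = 0.0
    for line in lines:
        _url, _visits, revenue = line.split(",")  # full parse, every query
        total += float(revenue)
    return total


def load_columns(lines: List[str]) -> Dict[str, list]:
    """One-time load step: parse each record once and store it column by column."""
    cols: Dict[str, list] = {"url": [], "visits": [], "revenue": []}
    for line in lines:
        url, visits, revenue = line.split(",")
        cols["url"].append(url)
        cols["visits"].append(int(visits))
        cols["revenue"].append(float(revenue))
    return cols


def column_store_total_revenue(cols: Dict[str, list]) -> float:
    """Reads only the already-parsed 'revenue' column; no text parsing at query time."""
    return sum(cols["revenue"])


cols = load_columns(RAW)
print(mr_style_total_revenue(RAW), column_store_total_revenue(cols))  # → 8.0 8.0
```

Both queries return the same answer; the difference is where the parsing cost lands. The column layout pays it once inside load_columns, which mirrors the longer load time noted in item 6 of the first list.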