HAWQ versus Hive on HDP (Part 2)

Part 1 covered the comparison of HAWQ versus Hive using Map-Reduce instead of Tez. Now in Part 2, I will execute the same scripts but with Tez enabled.

Enabling Tez is pretty easy to do with HDP You simply change this in the General section under Hive and restart Hive:

hive.execution.engine = tez

Next, I re-ran the same loads and queries. Overall, it is roughly 25% faster with Tez enabled but it is still much slower than HAWQ.

Here is the graph of results:
HAWQ vs Hive 2

– Loading was slower with Tez enabled but this is probably because I’m testing with a VM.
– Every query was about 40% faster in Hive with Tez versus without.
– HAWQ was about 30 times faster than Hive with Tez and about 57 times faster than Hive with Map-Reduce.
– The execute time reported by Hive was incorrect for each query. For example, I used “time hive -f icd9.sql” to execute a the icd9.sql query and capture the timings. Time reported:

real    0m25.060s

but Hive reported:

Time taken: 13.17 seconds, Fetched: 19 row(s)

So another test but the same result. Hive is much, much slower than HAWQ and even with Tez enabled. If you want Enterprise level SQL for Hadoop, HAWQ is the best solution.

1 thought on “HAWQ versus Hive on HDP (Part 2)

  1. Jon Post author

    I have received some questions on my tests about the Hive Cost Based Optimizer. It was turned on during the tests as well as the auto stats.

    I’ve also been looking at some other Hive benchmarks out there. It seems a common trick is to partition data at different grains between vendors to skew the results….


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.