Pivotal Greenplum Version 5.x

Version 3.x
I’ve been using Greenplum since version 3.0 back in 2007. That version was actually the first commercially available version too. The versions were released about once every 6 to 12 months and got as high as 3.3.

Version 4.x
Once the version got to 4.3, the number just seemed to get stuck. 4.3.0 was a pretty big release which changed “Append Only” tables to “Append Optimized”, which simply meant you could start updating and deleting rows in tables stored in the append-only format. More enhancements kept coming to 4.3, but the version number never exceeded 4.3.

Labs Methodology
Major enhancements came to version 4.3.x and the innovation came at a faster pace but you may not have noticed this if you were just looking at the version number. Pivotal’s engineering team embraced the Pivotal Labs methodologies of pair programming and quick iterations. There was a huge transformation happening in this group.

Version 5.x
When 5.0 came out, it was a big deal. Greenplum had always been a fork of PostgreSQL 8.2, but with 5.0 it was rebased to 8.3. A lot of work went into this change, and it also means that moving from 4.3.x to 5.0 requires a migration.

Now this next change, I didn’t expect. During 4.3.x development, the version number would only change to 4.4 or 5.0 when the database required a migration to upgrade. With 5.x, the version numbers are coming fast and don’t require migrations to upgrade. It is just a simple binary swap.
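The upgrade rule described above can be written down as a small check. This is only a sketch of the rule as stated in this post, not anything shipped with Greenplum: the function names `parse_version` and `upgrade_path` are hypothetical, and the assumption that a change in the first version number always means a migration is a simplification of what is described here.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '5.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def upgrade_path(current: str, target: str) -> str:
    """Classify an upgrade using the rule described in this post.

    Assumption (illustrative only): a change in the major version number
    means a database migration, anything else is a binary swap.
    """
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        raise ValueError("target version must be newer than current version")
    if cur[0] != tgt[0]:
        return "migration"
    return "binary swap"


print(upgrade_path("4.3.0", "5.0"))  # major version changes
print(upgrade_path("5.0", "5.2"))    # same major version
```

Always check the release notes for the actual upgrade procedure of a given release.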

The result has been the release of 5.0 on 2017-09-14, 5.1 on 2017-10-20, and 5.2 on 2017-11-18. Do you see the pattern? Monthly point releases! All of these releases so far have been simple binary upgrades but have a ton of improvements each time.
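To make the cadence concrete, here is a quick sketch that computes the number of days between the three release dates listed above. The dates come from this post; the names `RELEASES` and `release_gaps` are just illustrative.

```python
from datetime import date

# Release dates as listed in this post.
RELEASES = {
    "5.0": date(2017, 9, 14),
    "5.1": date(2017, 10, 20),
    "5.2": date(2017, 11, 18),
}


def release_gaps(releases):
    """Return (previous, next, days_between) for consecutive releases."""
    ordered = sorted(releases.items(), key=lambda item: item[1])
    return [
        (prev_version, next_version, (next_date - prev_date).days)
        for (prev_version, prev_date), (next_version, next_date)
        in zip(ordered, ordered[1:])
    ]


for prev_version, next_version, days in release_gaps(RELEASES):
    print(f"{prev_version} -> {next_version}: {days} days")
```

The gaps work out to 36 and 29 days, so roughly one point release per month.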

Version 6.0?
Version 6 is in sight. Seriously. I expect it to be the next rebase of Greenplum, to PostgreSQL 8.4, and to require a database migration to upgrade. It is amazing how the Labs culture and open source software development have enhanced the development of Greenplum.

3 thoughts on “Pivotal Greenplum Version 5.x”

  1. Mark

    Hello

    Thanks for the great info you keep posting,

    In regards 4.x to 5.x have you seen any performance difference?
    I have a 4.x install that works OK and I do not need features from the 5.x releases, but if there is any performance enhancement I would be happy to go through the pain of upgrading.

    Also do you know what’s happening on the codegen side?
    I tried DeepGreen and it flies, about 3 times faster than Greenplum 4.x.
    I could not compile 5.x with codegen enabled, so I can’t compare DeepGreen vs. Greenplum with codegen. Do you know if compiling with codegen makes any difference?

    Last but not least, as we are talking about compiling, ubuntu opensource binaries VS Pivotal binaries, do you know if there is any difference?

    Sorry for all the questions!

    Thanks again!
    Mark

    1. Jon Post author

      > In regards 4.x to 5.x have you seen any performance difference?

      Yes, 5.x is faster for many reasons. First, ANALYZE is significantly faster. I’ve observed analyzedb running on 3TB of data taking just 1 to 2 minutes in 5.x where it took up to an hour in 4.3.x.

      Query performance is better because of improvements to the optimizer. Different types of queries are handled better and there is a new GUC that handles the number of iterations the optimizer takes to find the best plan. This has improved simple queries considerably.

      > Also do you know what’s happening on the codegen side?

      I’ve never used codegen. That sounds like a good question for the Greenplum Users group.

      > I tried DeepGreen and it flies, about 3 times faster than Greenplum 4.x

      I had never heard of DeepGreen until now and it looks interesting. It is an AWS Marketplace offering which states it is 100% compatible with Greenplum but I don’t see anything that suggests they are making improvements beyond what is in the open source code.

      Their AWS product is also just a single node, so mirroring is not enabled, and running without mirrors helps performance. As I said above, 5.x is faster than 4.3.x so there is improvement there. How they configure the AWS resources is very important to optimizing performance. For example, the disk format and mount options are critical and can make Greenplum slow or fast.

      The number of segments per host is also a huge factor. They are using instance types with lots of memory so they are probably increasing this. Without mirrors, they can run even more segments per host. Lastly, you may not be worried about concurrent query performance and they may not have tested that either. So, they could run an even greater number of segments for single user performance.

      Pivotal also has AWS Marketplace products and they include Greenplum v5.x. I am the author of that code and did the performance optimization and testing. There is a single node option there too, or you can go up to 18 nodes (2 masters and 16 segment hosts). There are other instance types that are less expensive if you just want to try it out, or you can run the same instance type you see in DeepGreen.

      > Last but not least, as we are talking about compiling, ubuntu opensource binaries VS Pivotal binaries, do you know if there is any difference?

      Yes, there will be a difference. There are some components not included like QuickLZ because of licensing issues. A full list can be seen here: http://gpdb.docs.pivotal.io/530/relnotes/GPDB_530_README.html#topic_uvz_dfc_cz

  2. Mark

    Thanks, I will definitely do some testing then, and report back here.

    I came across DeepGreen by chance while looking for multi-threaded Postgres. They have 30-day trial binaries on their website. I tested locally on an identical hardware setup (1 master and 3 data nodes with 4 segments), because the Greenplum database is 100% compatible with DeepGreen. However, the binaries are not a drop-in replacement, so you need to do the install the same way as for Greenplum.

    With all conditions the same, DeepGreen is truly 3-5 times faster. They claim it is because of the following:
    better join and aggregation algorithms
    new subsystem to handle spills
    advanced techniques that maximize CPU performance through JIT-compiled query execution, vectorized scans, and data-path optimization.

    This is why I was interested in codegen.

    Unfortunately, they are not open source. We need to use open source, so despite the good performance we are not using it.

    Thanks again!

    Mark
