Pivotal recently released this new version without any major announcements but customers should definitely check it out! There are three main enhancements that I will cover in this blog post.
1. Append Optimized Tables
2. WAL Standby Replication
3. Orca Optimizer
Greenplum database has had “Append-Only” tables for several versions. It only allowed for inserting new data to a table which has a savings of about 20 bytes per row because it doesn’t need the overhead of per-row visibility. Append-Only tables allows you to use compression or change the orientation to column so it is used for larger tables in the database.
With Greenplum Database 4.3, Append-Only tables are now Append-Optimized. You still get the benefits of an Append-Only table but now you will be able to UPDATE and DELETE from these tables as well.
Here is an example with a compressed and column oriented table that is Append-Optimized:
gpdb=# CREATE TABLE bar (a int, b text) gpdb-# WITH (appendonly=true, orientation=column, compresstype=quicklz) gpdb-# DISTRIBUTED BY (a); CREATE TABLE --Notice how it is still using "appendonly" in the syntax
INSERT some data and SELECT from this new table.
gpdb=# INSERT INTO bar VALUES (1, 'jon'); INSERT 0 1 gpdb=# INSERT INTO bar VALUES (2, 'roberts'); INSERT 0 1 gpdb=# SELECT * FROM bar; a | b ---+--------- 2 | roberts 1 | jon (2 rows)
gpdb=# UPDATE bar SET b = UPPER(b); UPDATE 2 gpdb=# SELECT * FROM bar; a | b ---+--------- 1 | JON 2 | ROBERTS (2 rows)
gpdb=# DELETE FROM bar WHERE a = 1; DELETE 1 gpdb=# SELECT * FROM bar; a | b ---+--------- 2 | ROBERTS (1 row)
This looks just like any other table but the difference is in the row versioning. Instead of using MVCC (which was borrowed from PostgreSQL), Append-Optimized tables have a Visibility Map which is a bitmap of what rows are no longer visible due to a DELETE or UPDATE statement on the table.
Because it isn’t using MVCC anymore, serializable transactions with UPDATE and DELETE to an Append-Optimized table are not allowed. A serializable transaction is a strict transaction isolation in which transactions execute as if they run one after another rather than concurrently. Serializable is almost never used in a big data platform so it isn’t a big deal.
Another change is how VACUUM works. Instead of marking stale rows as deleted so that row can be overwritten (MVCC), VACUUM compacts the physical files and resets the visibility map.
When should use use a Heap table (default) or an Append-Optimized table? If you have any OLTP work in Greenplum, Heap is better. If you have frequent UPDATE or DELETE statements, heap is likely better because VACUUM should run faster. Query performance may also be a bit better for smaller tables that are Heap. So for larger tables that have no or infrequent UPDATE or DELETE statements, use Append-Optimized. If you want to compress the table or make it column oriented, you must make it Append-Optimized.
WAL Standby Replication
Greenplum database has a Master server where users connect and execute queries. The Master server manages sessions, security, metadata, and query plans. Queries are executed on the Segment servers in the Greenplum cluster. To provide high availability, the Master server has a Standby-Master server. Prior to 4.3, Greenplum used block level replication to keep the data on the Master in sync with Standby-Master.
With 4.3, the WAL (Write-Ahead Log) is now streamed to the Standby-Master that is in a standby mode. This provides faster promotion of the Standby-Master in the event where the Master fails. It also has an improvement in performance for the Master server.
Engineering has refactored the query optimizer to make it easier to maintain and enhance. They also took this opportunity to enhance this optimizer to handle certain edge cases where the optimizer picked a poor plan. It does a better job at handling certain scenarios with partitioned tables, subqueries, and common table expressions.
Orca should not be used in Production in release 4.3. To use it, you need to be part of the Early Access Program (EAP). It is a very simple command to enable it but we want customers to provide feedback before you make Orca the default optimizer. Contact your Pivotal Field Engineer to get into this program and if you don’t know who this is, contact me and I’ll help you.