Rolling partitions with Polymorhpic Storage

Greenplum’s Polymorhpic storage is a very useful feature in Greenplum database where tables can be created and altered with a mixture of columnar, compressed, and row storage.

Polymorphic Storage

Here is an example of creating one of these tables and then managing this over time.

CREATE TABLE edw.sales_fact
(
  store_id integer,
  sales_person_id integer,
  product_id integer,
  sale_date date,
  sale_price numeric,
  product_cost numeric,
  spiff numeric
)
DISTRIBUTED BY (store_id, sales_person_id, product_id, sale_date)
PARTITION BY RANGE(sale_date) 
        (
        START ('2011-01-01'::timestamp) INCLUSIVE END ('2011-02-01'::timestamp) EXCLUSIVE WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5),
        START ('2011-02-01'::timestamp) INCLUSIVE END ('2011-03-01'::timestamp) EXCLUSIVE WITH (appendonly=true, orientation=column, compresstype=quicklz),
        START ('2011-03-01'::timestamp) INCLUSIVE END ('2011-04-01'::timestamp) EXCLUSIVE
        )
;

As you can see, this table has three partitions with each having different storage techniques.

  • January 2011 is columnar and compressed with ZLib
  • February 2011 is columnar and compressed with QuickLZ
  • March 2011 is stored as a row store and is not compressed

As time goes on, the DBA may want to make March 2011 compressed and then add a partition for April 2011. To do this, we first need to move the March 2011 data to a new table and exchange the partition. The last step is to add the new April 2011 partition.

CREATE TABLE edw.sales_fact_march
(
  store_id integer,
  sales_person_id integer,
  product_id integer,
  sale_date date,
  sale_price numeric,
  product_cost numeric,
  spiff numeric
)
WITH (appendonly=true, orientation=column, compresstype=quicklz)
DISTRIBUTED BY (store_id, sales_person_id, product_id, sale_date);

INSERT INTO edw.sales_fact_march 
SELECT * FROM edw.sales_fact 
WHERE sale_date >= '2011-03-01' 
AND sale_date < '2011-04-01';

ANALYZE edw.sales_fact_march;

This next step is optional but you can now TRUNCATE the March 2011 partition because all of the data is in the sales_fact_march table. Plus, we need to exchange the two and it is faster to exchange with a truncated table.

TRUNCATE TABLE edw.sales_fact_1_prt_3;

Note: I got the partition table name from pg_partitions.

Now exchange my row stored partition for the columnar and compressed partition.

ALTER TABLE edw.sales_fact 
EXCHANGE PARTITION FOR (RANK(3)) 
WITH TABLE edw.sales_fact_march 
WITH VALIDATION;

Note: I got the RANK from pg_partitions.

We no longer need the temporary March 2011 data so we can drop this table.

DROP TABLE edw.sales_fact_march;

Lastly, add an empty April 2011 partition.

ALTER TABLE edw.sales_fact 
ADD PARTITION START ('2011-04-01'::timestamp) INCLUSIVE 
END ('2011-05-01'::timestamp) EXCLUSIVE;

The new DDL for the table looks like this:

CREATE TABLE edw.sales_fact
(
  store_id integer,
  sales_person_id integer,
  product_id integer,
  sale_date date,
  sale_price numeric,
  product_cost numeric,
  spiff numeric
)
WITH (
  OIDS=FALSE
)
DISTRIBUTED BY (store_id, sales_person_id, product_id, sale_date)
PARTITION BY RANGE(sale_date) 
          (
          START ('2011-01-01'::date) END ('2011-02-01'::date) WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5) , 
          START ('2011-02-01'::date) END ('2011-03-01'::date) WITH (appendonly=true, orientation=column, compresstype=quicklz), 
          START ('2011-03-01'::date) END ('2011-04-01'::date) WITH (appendonly=true, orientation=column, compresstype=quicklz), 
          START ('2011-04-01'::date) END ('2011-05-01'::date)
          )
;

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.