Greenplum Database Incremental Backup

Greenplum Database has a feature called “incremental” backup but really it is a incremental differential backup. In this post, I will explain how it works and give you a working example.

AO Tables
In version 4.2 and earlier, AO tables are Append Only but starting in 4.3, the tables are Append Optimized. Everyone using 4.3 should be using AO tables as it uses less space, supports this incremental feature, and executes faster. Incremental backup requires AO tables.

Example:

CREATE TABLE foo (id integer, bar text) WITH (appendonly=true) DISTRIBUTED BY (id);

Using the incremental option on a regular heap table will not make any difference. It will backup all heap tables with every backup.

Demo

First create a database.

gpdb=# drop database poc;
DROP DATABASE
gpdb=# CREATE DATABASE poc;
CREATE DATABASE
gpdb=# \c poc
You are now connected to database "poc" as user "gpadmin".

Now create an AO table and insert some data.

poc=# CREATE TABLE foo (id integer, bar text) WITH (appendonly=true) DISTRIBUTED BY (id);
CREATE TABLE
poc=# INSERT INTO foo VALUES (1, 'bar_1');
INSERT 0 1
poc=# INSERT INTO foo VALUES (2, 'bar_2');
INSERT 0 1
poc=# 

I’m next going to execute a full backup. For the demo, I’m going to turn off compression so it is easier to view the backup files.

I’m using gpcrondump with these switches:
-x specifies the database to backup
-a means automatic and don’t prompt
-z turns off compression

bigmac:~ gpadmin$ gpcrondump -x poc -a -z
20140909:13:39:19:006156 gpcrondump:bigmac:gpadmin-[INFO]:-Starting gpcrondump with args: -x poc -a -z
...

Just to see what is in the backup file, I’m going to cat one of the segment’s backup files. It shows one of the two records in this tiny table.

bigmac:~ gpadmin$ cat /data/primary/gpseg0/db_dumps/20140909/gp_dump_0_2_20140909133919 
--
-- Greenplum Database database dump
--

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

SET search_path = public, pg_catalog;

SET default_with_oids = false;

--
-- Data for Name: foo; Type: TABLE DATA; Schema: public; Owner: gpadmin
--

COPY foo (id, bar) FROM stdin;
1	bar_1
\.


--
-- Greenplum Database database dump complete
--

Now I immediately run an incremental backup. Nothing changed in my database so it should be an empty backup set.

bigmac:~ gpadmin$ gpcrondump -x poc -a -z --incremental
20140909:13:41:19:006395 gpcrondump:bigmac:gpadmin-[INFO]:-Starting gpcrondump with args: -x poc -a -z --incremental
...

To verify:

bigmac:~ gpadmin$ cat /data/primary/gpseg0/db_dumps/20140909/gp_dump_0_2_20140909134119 
--
-- Greenplum Database database dump
--

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

--
-- Greenplum Database database dump complete
--

So my incremental backup doesn’t have any data in it. If I had deleted or inserted any data to my one table, the entire table would be part of the backup set. That is why it is an incremental differential backup. If there is a difference in the AO table (or partition), the entire table/partition will be included in the incremental backup.

Restore
So now that I have one full backup and one incremental, let’s drop the table and then restore it.

bigmac:~ gpadmin$ psql poc
psql (8.2.15)
Type "help" for help.

poc=# DROP TABLE foo;
DROP TABLE
poc=# \q

And now the restore using gpdbrestore. I’m using the following switches:
-a means automatic and don’t prompt
-e will drop the target database and then recreate it
-s uses the latest backup set

bigmac:~ gpadmin$ gpdbrestore -a -e -s poc
20140909:13:42:58:006720 gpdbrestore:bigmac:gpadmin-[INFO]:-Starting gpdbrestore with args: -a -e -s poc
...

Verify the restore worked:

bigmac:~ gpadmin$ psql poc
psql (8.2.15)
Type "help" for help.

poc=# SELECT * FROM foo;
 id |  bar  
----+-------
  1 | bar_1
  2 | bar_2
(2 rows)

poc=# 

Conclusion
Both commands used here, gpcrondump and gpdbrestore, have lots of options to backup and restore respectively. Using the incremental switch on gpcrondump will save you disk space and execution time while still giving you the ability to do a full system restore or even a selective table restore.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.