S3 is Amazon’s Simple Storage Service, an inexpensive cloud storage offering that has quickly become a popular solution for cold data and backups. Greenplum now has External Tables that can read and write data in S3, so you can leverage this popular storage service with Greenplum. Here is how you do it!
1. You will need an Amazon account with your Access Key ID and Secret Access Key. If you have the aws CLI installed and configured, just cat ~/.aws/credentials
2. Get your default region. This is in ~/.aws/config
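For reference, the two files normally look something like this (the key values shown are AWS’s documented placeholder examples; yours will differ):

```shell
$ cat ~/.aws/credentials
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

$ cat ~/.aws/config
[default]
region = us-east-1
```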
3. You’ll need a bucket in this region; this can be created with the AWS web interface.
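If you would rather stay on the command line, the bucket used in this demo can also be created with the aws CLI (assuming your credentials are already configured):

```shell
# Bucket names are global, so "pivotalguru" may already be taken;
# substitute your own bucket name.
aws s3 mb s3://pivotalguru --region us-east-1
```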
4. You’ll need a configuration file. Here is an example (be sure to change the secret and accessid).
[default]
secret = <secret>
accessid = <accessid>
threadnum = 4
chunksize = 67108864
low_speed_limit = 10240
low_speed_time = 60
encryption = true
5. Copy the configuration to every Segment Host in your cluster.
for i in $(cat segment_hosts.txt); do scp s3_demo.conf $i:/home/gpadmin; done
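Before shipping the file out, a quick sanity check that the expected keys are present can save a round trip. This is just a hypothetical helper, not part of Greenplum; it recreates the example config under /tmp so it is self-contained:

```shell
# Recreate the example s3 configuration file for the check
cat > /tmp/s3_demo.conf <<'EOF'
[default]
secret = <secret>
accessid = <accessid>
threadnum = 4
chunksize = 67108864
low_speed_limit = 10240
low_speed_time = 60
encryption = true
EOF

# Confirm the keys from the example config are all present
for key in secret accessid threadnum chunksize; do
    grep -q "^$key = " /tmp/s3_demo.conf || echo "missing: $key"
done
echo "check complete"
```

Run the same grep loop against your real s3_demo.conf before the scp step above.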
Writing to S3
1. Create the Writable External Table using the s3 protocol, an AWS URL with the correct region, and the configuration file that is found on every Segment Host.
CREATE WRITABLE EXTERNAL TABLE public.demo_write
(id int, fname text, lname text)
LOCATION ('s3://s3-us-east-1.amazonaws.com/pivotalguru/demo config=/home/gpadmin/s3_demo.conf')
FORMAT 'TEXT' (DELIMITER '|' NULL AS '');
2. Execute an INSERT statement:
INSERT INTO demo_write SELECT i, 'Jon_' || i, 'Roberts_' || i FROM generate_series(1,10000) as i;
Note: Each Segment will create a file in your S3 bucket with the prefix you specify in the LOCATION of the Writable External Table. In this demo, each file is prefixed with “demo”; an example filename is “demo9767abbb3.txt”.
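If you have the aws CLI handy, you can confirm the Segments’ uploads from outside Greenplum (bucket and prefix as in the demo):

```shell
# List the files the Segments wrote under the "demo" prefix
aws s3 ls s3://pivotalguru/ | grep demo
```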
Reading from S3
1. Create the External Table with the same location and configuration as before.
CREATE EXTERNAL TABLE public.demo_read
(id int, fname text, lname text)
LOCATION ('s3://s3-us-east-1.amazonaws.com/pivotalguru/demo config=/home/gpadmin/s3_demo.conf')
FORMAT 'TEXT' (DELIMITER '|' NULL AS '');
2. Select the data.
gpadmin=# select * from demo_read limit 10;
 id  |  fname  |    lname
-----+---------+-------------
  58 | Jon_58  | Roberts_58
  90 | Jon_90  | Roberts_90
 122 | Jon_122 | Roberts_122
 191 | Jon_191 | Roberts_191
 207 | Jon_207 | Roberts_207
 239 | Jon_239 | Roberts_239
 271 | Jon_271 | Roberts_271
 319 | Jon_319 | Roberts_319
 335 | Jon_335 | Roberts_335
 351 | Jon_351 | Roberts_351
(10 rows)

Time: 1139.538 ms
Tips
1. An S3 External Table that references a single file will use only a single Segment to read the data. Instead, try to have at least one file per Segment for an S3 External Table.
2. S3 External Tables support gzip compression only.
3. Use S3 External Tables for cold storage or to create a backup of a table or query using a Writable External Table.
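As a sketch of that backup use case, suppose you had a table named public.customers (a made-up name for illustration); the same pattern from the demo applies:

```sql
-- Writable External Table pointing at a backup prefix in the same bucket
CREATE WRITABLE EXTERNAL TABLE public.customers_backup (LIKE public.customers)
LOCATION ('s3://s3-us-east-1.amazonaws.com/pivotalguru/customers_backup config=/home/gpadmin/s3_demo.conf')
FORMAT 'TEXT' (DELIMITER '|' NULL AS '');

-- Each Segment writes its slice of the table to S3 in parallel
INSERT INTO public.customers_backup SELECT * FROM public.customers;
```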