Outsourcer 5.1.5 has the following changes:
– Fixed the os.queue column_name being incorrectly set to a value after a refresh job completes; the value should have been left blank.
– The installer script now creates tables with hash distribution for Greenplum and HAWQ 1.3 and with random distribution for HAWQ 2.0.
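For reference, the two distribution styles differ only in the table's distribution clause. A sketch of the DDL (the table and column names here are illustrative, not what the installer actually generates):

```sql
-- Greenplum and HAWQ 1.3: hash distribution on a key column
CREATE TABLE os.example (id int, payload text)
DISTRIBUTED BY (id);

-- HAWQ 2.0: random distribution
CREATE TABLE os.example (id int, payload text)
DISTRIBUTED RANDOMLY;
```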
5.1.1 enhances Append jobs to support Big Integer in addition to Integer data types. Additionally, Timestamp data types can now be used.
Be sure to always use an ordered sequence in Oracle and an ordered identity in SQL Server when using an Append job. Timestamp is useful when you are using the system timestamp in Oracle or SQL Server to append new data.
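As a sketch, the ordered key column on the source side of an Append job could be defined like this (table, column, and sequence names are hypothetical):

```sql
-- Oracle: the ORDER option guarantees sequence values are issued in request order
CREATE SEQUENCE my_append_seq ORDER;
-- use my_append_seq.NEXTVAL when inserting into the source table

-- SQL Server: an identity column increments in insert order
CREATE TABLE dbo.my_source (
    id BIGINT IDENTITY(1,1) PRIMARY KEY,
    created_at DATETIME2 DEFAULT SYSDATETIME(),
    payload VARCHAR(100)
);
```

An ordered key matters because an Append job only fetches rows with values greater than the last one it loaded; if values can arrive out of order, rows would be skipped.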
5.0.9 adds support for HAWQ 2.0.
I’m looking for feedback on how best to handle table distribution for tables created in HAWQ 2.0. Outsourcer automatically sets distribution keys based on the source primary keys, so it always uses hash distribution when a primary key is found. HAWQ 2.0 supports hash and random distribution as before, but random distribution allows a cluster to be resized without having to redistribute the data.
– Should I keep the code as-is?
– Should I set a global environment variable to allow you to set all tables to be created random or not?
– Should I update nearly every UI screen as well as the job and queue tables to have a random boolean that is only used for HAWQ 2.0?
5.0.8 enhances table storage for HAWQ tables. Outsourcer no longer creates column-oriented tables in HAWQ and instead uses Parquet. Additionally, when HAWQ tables use both Compression and Parquet, the compression algorithm is now Snappy rather than Quicklz. Compressed row-oriented tables still use Quicklz.
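These storage choices map to standard HAWQ storage options in the WITH clause. A sketch with an illustrative table (not the exact DDL Outsourcer emits):

```sql
-- Compression enabled: Parquet orientation with Snappy
CREATE TABLE os.example_parquet (id int, payload text)
WITH (appendonly=true, orientation=parquet, compresstype=snappy);

-- Compressed row-oriented tables keep Quicklz
CREATE TABLE os.example_row (id int, payload text)
WITH (appendonly=true, orientation=row, compresstype=quicklz, compresslevel=1);
```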
I have updated GPLink to handle timestamp and date format problems. This has only been a problem with Oracle’s JDBC driver, but the fix may be needed with other databases as well.
I created a new project that simplifies the process of creating Greenplum or HAWQ external tables that use gpfdist to stream data from any valid JDBC source. It is like pointing gpfdist at Sqoop to pull data without landing a file, but gplink ensures that the data is cleansed first so that it will be readable by Greenplum or HAWQ.
This will work with PostgreSQL (yeah!), MySQL (yuck), DB2, Informix, etc. You will have to download all third party JDBC drivers separately.
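Conceptually, what gplink produces is a standard gpfdist external table. A hypothetical example (the host, port, location name, and columns are made up for illustration):

```sql
-- External table reading the cleansed JDBC stream served through gpfdist
CREATE EXTERNAL TABLE ext_orders (
    order_id   bigint,
    order_date timestamp,
    amount     numeric
)
LOCATION ('gpfdist://gplink_host:8000/orders_query')
FORMAT 'TEXT' (DELIMITER '|' NULL '');
```

Selecting from the external table then streams rows from the source database on demand, with no intermediate file written to disk.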
This is a new project so I’m looking forward to any feedback you can provide.