Monthly Archives: June 2012

Outsourcer 2.7

I have a new version that contains only a small bug fix for Oracle dates. If you are storing dates in Oracle with a year earlier than 100, you will need this new version. If you aren’t, you don’t need to upgrade.

Details:
Oracle can store dates from January 1, 4712 BCE through December 31, 9999 CE, and I convert each date to a string to load it. I was relying on the locale to do this conversion, but Java apparently converts the date 0099-05-01 to 99-05-01. That two-digit year doesn’t follow the ISO date standard, so the value fails to load.

My fix is simply to format Oracle DATE columns with a four-digit year.
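
The difference between the two formats can be sketched in Java as follows. This is only an illustration of the idea, not the actual Outsourcer code: the class name FourDigitYear and its two helper methods are hypothetical, and it assumes the date is available as a java.util.Date. The pattern is passed with Locale.ROOT so the result does not depend on the machine's locale.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper for illustration; not taken from the Outsourcer source.
class FourDigitYear {

    // "yyyy" zero-pads the year to at least four digits, so year 99 becomes "0099".
    static String isoDate(Date d) {
        return new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT).format(d);
    }

    // "yy" truncates the year to two digits, which reproduces the failing value.
    static String twoDigitYear(Date d) {
        return new SimpleDateFormat("yy-MM-dd", Locale.ROOT).format(d);
    }

    public static void main(String[] args) {
        // May 1 of the year 99 (Calendar months are zero-based, hence the constant).
        Date d = new GregorianCalendar(99, Calendar.MAY, 1).getTime();
        System.out.println(twoDigitYear(d)); // 99-05-01
        System.out.println(isoDate(d));      // 0099-05-01
    }
}
```

An explicit pattern such as "yyyy-MM-dd" keeps the string in ISO form no matter which locale the JVM runs under.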

Link:
Outsourcer 2.7

Reference:
Oracle Documentation

Chorus

I recently dug into Greenplum Chorus to better understand the collaboration tool.  I installed it on my Mac with my Greenplum Database, Greenplum Hadoop, and Greenplum Command Center.  In this post, I will review how Chorus was designed and how it facilitates collaboration in an organization.
Greenplum Chorus

Chorus Features
First off, the image above shows Greenplum Chorus, Database, and Hadoop. I created the image to better understand the product so this isn’t a marketing slide given to me.

Starting with Chorus, you have these main features.

  • Scheduler: Used to refresh data in Sandbox (more on this below)
  • Data Definitions: Connections to Greenplum databases and Hadoop. You can browse the data in both databases and Hadoop and then link this to Workspaces.
  • Insights: Comments made in Chorus that are so important to the business that they are shared with everyone! No longer will revelations about data be isolated to a single person or a small group. Now you can share your insights with others!

Workspace Features

Next we have Workspaces.  In my image, I have a “Sales” Workspace as well as other Workspaces like HR, Marketing, Analytics, and Accounting.  This is where people work together around data and, as you can see, the Chorus Administrator can create as many Workspaces as needed for your organization.  Each Workspace has these features:

  • Linked Data: Data from Greenplum databases and Hadoop that is linked to the workspace. This makes it simple to query this data in Chorus without needing to make a copy of the data.
  • Sandbox Data: This is either data copied from Greenplum databases using the scheduler or completely new tables derived using SQL you write. This is very powerful and goes a long way toward giving business users the ability to find new value in data.
  • Chorus View: This is similar to a database view but the definition resides in Chorus.
  • Files: This is basically metadata (Text, Images, other) plus code (SQL) that is stored in the workspace. Versioning is done automatically too. You can execute the SQL directly in Chorus. Very powerful stuff.
  • Notes: This is more metadata about anything in your workspace.  Notes are also what can be promoted to an Insight for the entire organization.  You are building a living and breathing data dictionary with Chorus!

Visualization

Workspaces can also visualize data, using graphing similar to a BI tool.  Chorus isn’t meant to replace reporting tools.  Instead, the aim is to quickly understand the data and then take action on it, whether that means writing a Note or Insight or investigating further with additional queries to Greenplum database and Hadoop.

Security

Logging in to Chorus can be handled by LDAP/Active Directory if you like.  Hadoop and Database connections can be made public to all Chorus users, or you can require users to log in to the data sources themselves so that security is handled by the Database and Hadoop.

Summary

Chorus is a great collaboration tool for Greenplum.  I am very impressed with the tool and expect more great things from the product in the future.