Creating a Date Dimension

Here is another technique worth mentioning in regards to building a date dimension.  One way to do this is to create a loop and iterate over this to insert the records.  This is slow in virtually every database and we have a way to do this with a single INSERT statement.

First create the table:

DROP TABLE if exists hr.date_dim;

CREATE TABLE hr.date_dim
(
  date_key timestamp without time zone NOT NULL,
  date_week timestamp without time zone NOT NULL,
  date_month timestamp without time zone NOT NULL,
  date_quarter timestamp without time zone NOT NULL,
  date_year timestamp without time zone NOT NULL
)
DISTRIBUTED BY (date_key);

And now the magic:

INSERT INTO hr.date_dim
SELECT  mydate AS date_key,
        date_trunc('week', mydate) AS date_week,
        date_trunc('month', mydate) AS date_month,
        date_trunc('quarter', mydate) AS date_quarter,
        date_trunc('year', mydate) AS date_year
FROM    (
        SELECT '1995-01-01'::timestamp + interval '1 day' * (generate_series(0, (EXTRACT('days' FROM '2007-04-16'::timestamp - '1995-01-01'::timestamp)::int))) AS mydate
        ) AS sub;

My example builds the date dimension with every day populated between January 1, 1995 through April 16, 2007.  I did this as it was the minimum and maximum dates in my HR table with job history information.

This solution is similar to the other post I made about using generate_series to avoid a nested loop because I’m using generate_series again.  It is a very powerful and easy way to dynamically create a dataset without using a loop to do so.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.