Monday, April 27, 2015

Migrating Oracle Table to Redshift – Generating the table DDL

In my current work we are migrating some of our workload to Redshift. And that includes migrating many of tables from Oracle to Amazon Redshift. We know these two are very different platforms and we cant simply copy the Oracle’s DDL and create a table on Redshift.  Redshift doesn’t(yet) enforce primary keys, the data types are different; the partitioning concept is not (yet) there on Redshift. Also, there is something new in form of compression encoding that we need to take care of while building  tables on Redshift.

I will skip the topic of compression encoding in this post as I think Redshift Documentation already explains it very well.  I am going to focus on other strategies like finding an alternative of Oracle’s Primary Key and Partitions in Amazon Redshift and choosing the right data types for columns.
Primary Keys Alternative:
In our practice, we have seen the most frequently used keys are great options for distribution keys and many a time those keys turn out to be Primary Keys. So while designing Redshift table, if you define your distribution keys columns as primary key columns of Oracle, it will be a good beginning.
Primary Keys Alternative:
Sort Keys are a good replacement for Oracle Partitions.
Column Data Types:
Below is the query which take a Oracle table’s DDL and generates the corresponding column Definition at Redshift. Again its only an estimated data type selection that I have come up with, which should give you a head start.

1 comment: