Thursday, February 25, 2010

Adventures in Oracle Parallel Index Creation

Adventures in Oracle Parallel Processing

So currently I'm working on a large database rebuild. We are moving from a single node instance to a 3-node RAC System on Oracle 10g. The database is roughly 3 terabytes in size and we keep 18 months of rolling history. I wanted to repartition the tables during this move so instead of just doing a data file move I am re-creating the tables and then using datapump import to get the data into them. Once the table and data is there then I will build all the indexes manually. The tables will be partitioned by month and then sub partitioned by hash (a sequence number field so it is unique). As I said, we keep an 18 month rolling window of partitions available and once a month ends that partition's data is compressed and is essentially read only, although we don't actually make it so there is nothing that will ever write to it again. It's used strictly for historical reporting purposes.

The first time I ran the process I ran into two problems.

Problem #1:

Unless you specify pctfree 0 in the index create scripts it will use the tablespace default which in my case was 10%. Because these partitions exceed 100gig each and sometimes more I was basically wasting 10+ gig per partition and after 18 partition loads it was over 200gig of wasted space.

Problem #2:

I had specified parallel in the index create scripts but not specified degree or instance relying instead on Oracle to make the right choices with regard to both of these parameters. I also was multitasking the index creation and running three of them simultaneously, thinking that with a 3 Node RAC system this would not be a problem. Upon running them though I was finding that the average index creation time was 8-10 hours each! This was certainly unacceptable as I have over 50 indexes to create in the database.

Solution:

I reworked the scripts and dropped the tablespaces. I added pctfree 0 to the scripts and kept the parallel at the default level but decided to run them one at a time. To my surprise the indexes were each completing in about 1 to 1.5 hours each! When I looked at the session this time oracle had created 32 parallel processes per node and was churning away using all the resources that it could and it was fast! I decided to post on this because I attended the Southeast Oracle Users's conference today and while listening in on one of the presentations the presenter suggested that we should ALWAYS specify the degree of parallelism and instantiation. In this case it is not true. Hope you enjoyed reading and until next time.

Bill Schoonmaker
Database Guy