Friday, November 12, 2010

Using advantage of partition elimination

[The problem]

    Application team asked for help:
  • Daily financial report delayed near one month because job running very slow while DB to shutdown everyday for cold backup. 
  • My estimation is about 30 hours for the job to completed. 
  • Vendor not able to provide solution even tried changing code a few times.



[Diagnostic]
    This range partitioning table is about 140Gb big, with 500+ partitions, even partition stores 5 values of job_id.
    From execution plan, partition is performed but no parallelism regardless PARALLEL server enabled, and no performing full table scan, how ever generating lots of I/O , consist gets requires , physical reads etc.
    Total consist gets is about 5 times of partitions needed for scan.
    The statement is like this:
     select ... from p_table where part_key_col in (select distinct job_id from job_table...);

    The execution plan shows NEST LOOP for each value return from sub-query, caused redundantly access to partitions.
    ie.
     for each job_id returned from sub-query (110 distinct job_id)
       do
         full partition scan  ( one of total 22 partitions )
       done.
  Hence, each partition is scanned 5 times in worst situation. Total times: 110
  While parallelized scan can't happen in single partition, which only occurs for simultaneoustly access to multiple partitions.

   
[Solution]
    Rewrite the code , to make partition elimination happen.
    select  ... from p_table where part_key_col between (select min(distinct job_id) from job_table ...) and (select max(distinct job_id) from job_table ...);
    Not that the logic slightly changed, but applicable in this case.

     After make this change according to my suggestion, job finished within 30 minutes, while observing 12 parallel processes running happily to scan 22 partitions once only.
    Cheers!

[Update on 17-Nov]

One more think,  SQL logic should not be changed.  Studied more about partition pruning from data warehousing guide, found USE_HASH hint achieved same effects without rewrite SQL.