Running Jobs in Parallel Using DBMS_SCHEDULER (part 1)

DBMS_SCHEDULER is a fairly nice feature of the Oracle database that lets you schedule and run jobs in the background. It also allows the clever DBA to run multiple procedures at the same time.

The method I describe here can be used for most types of database processing. In the examples in this post and in part two, I will demonstrate methods for running some standard database tasks in parallel: index rebuilds and manual statistics jobs.

We have run into certain very large tables in our database that require more attention than is typical when it comes to maintaining statistics. Rather than spend the time to tweak the percentage of change required to mark the stats as stale, we determined that it is simpler to just add those tables requiring special attention to our special stats job, which runs weekly to gather statistics on those tables. Below is the layout of the driving table, STATS_GROUPS:

OWNER		VARCHAR2(60)
TABLE_NAME		VARCHAR2(60)
GROUP_NUMBER		NUMBER
PARALLEL_DEGREE		NUMBER
ESTIMATE_PERCENT		NUMBER
LAST_RUN_DATE		DATE
RUNNING_NOW		CHAR(1)
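
For reference, here is a minimal sketch of the supporting objects the procedures below assume: the STATS_GROUPS table with the layout above, plus the SPECIAL_TAB_STATS statistics table that the DBMS_STATS.EXPORT_TABLE_STATS calls write into. The column names and types come from the layout; the primary key is my own assumption.

-- Sketch only: columns taken from the layout above, primary key assumed
create table stats_groups (
  owner            varchar2(60),
  table_name       varchar2(60),
  group_number     number,
  parallel_degree  number,
  estimate_percent number,
  last_run_date    date,
  running_now      char(1),
  constraint stats_groups_pk primary key (owner, table_name)
);

-- SPECIAL_TAB_STATS must exist before the export calls below; it is created once
-- with the standard DBMS_STATS API
begin
  dbms_stats.create_stat_table(ownname => user, stattab => 'SPECIAL_TAB_STATS');
end;
/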

The intent of this design is to allow the tables to be prioritized by group, with group one being the most important. Tables in group one are analyzed one at a time, each with a high degree of parallelism, to ensure that they always finish. After completing all of the tables in group one in the calling session, the code then proceeds through the other groups and analyzes the tables in those groups.

Group numbers greater than one allow the program to work through the remaining groups until all are done. Right now we have only two groups, but the configuration allows for as many groups as desired, as the example rows below illustrate.
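
For illustration, rows like the following would place two critical tables in group one and a third in group two. The table names here are hypothetical, and an ESTIMATE_PERCENT of zero tells the code to use the automatic sample size.

-- Hypothetical example rows; adjust owner, table names, and settings as needed
insert into stats_groups (owner, table_name, group_number, parallel_degree, estimate_percent, running_now)
  values (user, 'BIG_ORDERS',      1, 8, 0,  'F');   -- group 1: auto sample size
insert into stats_groups (owner, table_name, group_number, parallel_degree, estimate_percent, running_now)
  values (user, 'BIG_ORDER_LINES', 1, 8, 0,  'F');
insert into stats_groups (owner, table_name, group_number, parallel_degree, estimate_percent, running_now)
  values (user, 'INVOICE_HISTORY', 2, 4, 10, 'F');   -- group 2: 10 percent sample
commit;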

Next we have the driver program: a stored procedure whose single argument is the maximum number of detached jobs allowed to run at once. The procedure ANALYZE_BY_GROUP, below, is that driver.

create or replace procedure analyze_by_group(num_simul in number) authid current_user is
/* This stored procedure is used to analyze only those tables whose statistics need special attention
   outside the normal gather_stats_job. Do not place all tables in the stats_groups table; use this
   only for tables requiring special attention. */
  num_running number;
  job_exists  number;
  newstatid   varchar2(30) := to_char(sysdate,'yymmddhh24mi');  -- not currently used; statid is built inline below
begin
  dbms_output.enable(100000);

  -- Group one is processed synchronously in this session, table by table.
  for analyze_cursor in
    (select owner, table_name, parallel_degree, estimate_percent
       from stats_groups
      where group_number = 1
        and owner = (select user from dual))
  loop
    -- Save the current stats before regathering, in case they need to be restored.
    dbms_stats.export_table_stats(statown => analyze_cursor.owner,
                                  stattab => 'SPECIAL_TAB_STATS',
                                  ownname => analyze_cursor.owner,
                                  tabname => analyze_cursor.table_name,
                                  statid  => 'SPECIAL'||to_char(sysdate,'yyyymmddhh24mi'));
    if analyze_cursor.estimate_percent = 0 then
      update stats_groups set running_now = 'T'
       where owner = analyze_cursor.owner and table_name = analyze_cursor.table_name;
      commit;
      -- dbms_output.put_line(analyze_cursor.owner||'.'||analyze_cursor.table_name);
      dbms_stats.gather_table_stats(ownname          => analyze_cursor.owner,
                                    tabname          => analyze_cursor.table_name,
                                    degree           => analyze_cursor.parallel_degree,
                                    estimate_percent => dbms_stats.auto_sample_size,
                                    cascade          => true);
    else
      update stats_groups set running_now = 'T'
       where owner = analyze_cursor.owner and table_name = analyze_cursor.table_name;
      commit;
      -- dbms_output.put_line(analyze_cursor.owner||'.'||analyze_cursor.table_name);
      dbms_stats.gather_table_stats(ownname          => analyze_cursor.owner,
                                    tabname          => analyze_cursor.table_name,
                                    degree           => analyze_cursor.parallel_degree,
                                    estimate_percent => analyze_cursor.estimate_percent,
                                    cascade          => true);
    end if;
    update stats_groups set running_now = 'F', last_run_date = sysdate
     where owner = analyze_cursor.owner and table_name = analyze_cursor.table_name;
    commit;
  end loop;

  -- The remaining groups are handed off to detached scheduler jobs, one job per group.
  for analyze_cursor2 in
    (select distinct group_number
       from stats_groups
      where group_number > 1
        and owner = (select user from dual))
  loop
    -- sleep(10);
    select count(1) into num_running
      from all_scheduler_jobs
     where job_name like 'ASYNC_ANALYZE%';
    -- dbms_output.put_line(to_char(num_running));
    if (num_running < num_simul) then
      -- dbms_output.put_line('submitting job '||to_char(analyze_cursor2.group_number));
      select count(1) into job_exists
        from all_scheduler_jobs
       where job_name = 'ASYNC_ANALYZE_'||to_char(analyze_cursor2.group_number);
      if job_exists > 0 then
        dbms_scheduler.drop_job(job_name => 'ASYNC_ANALYZE_'||to_char(analyze_cursor2.group_number), force => true);
      end if;
      dbms_scheduler.create_job(
        job_name   => 'ASYNC_ANALYZE_'||to_char(analyze_cursor2.group_number),
        job_type   => 'PLSQL_BLOCK',
        job_action => 'BEGIN async_analyze_group('||analyze_cursor2.group_number||'); END;',
        comments   => 'Async analyze job for tables requiring special attention. Argument is the group number from the stats_groups table',
        auto_drop  => true,  -- set this to false for debugging; the job then remains in the DBA_SCHEDULER_JOBS view until manually removed
        enabled    => true);
    end if;
  end loop;
end;
/

Note that the code above simply submits the stored procedure ASYNC_ANALYZE_GROUP through DBMS_SCHEDULER, which is what allows us to run multiple detached jobs at once. Below is the ASYNC_ANALYZE_GROUP stored procedure:

create or replace procedure async_analyze_group(in_group_number in number) authid current_user is
  newstatid varchar2(30) := to_char(sysdate,'yyyy-mm-dd hh24mi');  -- not currently used; statid is built inline below
begin
  dbms_output.enable(100000);
  for analyze_cursor in
    (select owner, table_name, parallel_degree, estimate_percent
       from stats_groups
      where group_number = in_group_number
        and owner = (select user from dual))
  loop
    update stats_groups set running_now = 'T'
     where owner = analyze_cursor.owner and table_name = analyze_cursor.table_name;
    commit;
    -- Save the current stats before regathering, in case they need to be restored.
    dbms_stats.export_table_stats(statown => analyze_cursor.owner,
                                  stattab => 'SPECIAL_TAB_STATS',
                                  ownname => analyze_cursor.owner,
                                  tabname => analyze_cursor.table_name,
                                  statid  => 'SPECIAL'||to_char(sysdate,'yymmddhh24mi'));
    if analyze_cursor.estimate_percent = 0 then
      dbms_stats.gather_table_stats(ownname          => analyze_cursor.owner,
                                    tabname          => analyze_cursor.table_name,
                                    degree           => analyze_cursor.parallel_degree,
                                    estimate_percent => dbms_stats.auto_sample_size,
                                    cascade          => true);
      dbms_output.put_line(analyze_cursor.owner||'.'||analyze_cursor.table_name||'    0');
    else
      dbms_stats.gather_table_stats(ownname          => analyze_cursor.owner,
                                    tabname          => analyze_cursor.table_name,
                                    degree           => analyze_cursor.parallel_degree,
                                    estimate_percent => analyze_cursor.estimate_percent,
                                    cascade          => true);
    end if;
    update stats_groups set running_now = 'F', last_run_date = sysdate
     where owner = analyze_cursor.owner and table_name = analyze_cursor.table_name;
    commit;
  end loop;
  -- Clear the running flag for the whole group in case anything was left marked 'T'.
  update stats_groups set running_now = 'F' where group_number = in_group_number;
  commit;
end;
/

The async analyze job runs asynchronously through the Oracle job scheduler. Because it is asynchronous, we can start additional jobs to run at the same time. Note that the logic of this code divides the tables into groups and then gathers stats group by group. Because of this layout, we can also put each table in its own group and run multiple tables simultaneously. Either method is legitimate, and the code was written to allow for both possibilities.
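
To tie it together, a call like the one below kicks off the driver (allowing, for example, four detached jobs), and the in-flight jobs can be watched through the standard scheduler views. The weekly job at the end is only a sketch of one possible way to automate it; the job name WEEKLY_SPECIAL_STATS and the schedule are my own choices.

-- Run the driver manually, allowing up to four detached ASYNC_ANALYZE jobs at once
begin
  analyze_by_group(4);
end;
/

-- Watch the detached jobs while they run
select job_name, state
  from user_scheduler_jobs
 where job_name like 'ASYNC_ANALYZE%';

select job_name, status, actual_start_date, run_duration
  from user_scheduler_job_run_details
 where job_name like 'ASYNC_ANALYZE%'
 order by actual_start_date desc;

-- One possible way to run the whole process weekly (Sunday at 2 AM), itself via DBMS_SCHEDULER
begin
  dbms_scheduler.create_job(
    job_name        => 'WEEKLY_SPECIAL_STATS',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN analyze_by_group(4); END;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=WEEKLY; BYDAY=SUN; BYHOUR=2',
    enabled         => true);
end;
/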

The procedures above are a fairly simple way to run large processes in parallel. Your comments are appreciated, as always.
