Sunday, May 18, 2014

RMAN Backup Performance

Finishing an RMAN backup quickly is something every DBA looks for. In this post I discuss my experience with several RMAN backup scenarios that I tested for speed and performance. The environment details are as follows.

  • Backups were tested for a 3-node RAC (10.2.0.4) database on Windows 2003.
  • The database had more than 250 datafiles (file sizes ranging from 5G to 10G), with a total datafile size of around 3.4 Terabytes and a total segment size (SUM(bytes) FROM dba_segments) of around 3.1 Terabytes.
  • Backups were taken on an NFS-mounted NAS disk.
  • Time was recorded for backing up 250G (input size) of data, and the time for the whole 3.1T was then extrapolated.
Backup script (slight modifications were made to it for each scenario)
################################################################

connect target /;
allocate channel for maintenance type disk ;
delete noprompt force obsolete ;
run
{
allocate channel disk10 type disk connect /@instance1;
allocate channel disk20 type disk connect /@instance2 ;
allocate channel disk30 type disk connect /@instance3;

backup as compressed backupset format '\\<IP_Address>\backup_folder\ora_%U.rbf' incremental level 0 database filesperset 1;
sql "alter system archive log current";
sql "alter system switch logfile";

backup format '\\<IP_Address>\backup_folder\arc_%U.rbf' archivelog all  delete input;
backup current controlfile format '\\<IP_Address>\backup_folder\ctl_%U.rbf';
}
allocate channel for maintenance type disk ;
delete noprompt force obsolete ;
delete noprompt archivelog all backed up 1 times to disk;

########################################################################

Some Points Worth Discussing (from the Oracle Documentation)
  1. Parallelism (multiple channels) has little or no benefit if all channels write to a single disk or tape drive (see the conclusions on local-disk backups below).
  2. If your data is striped across several disks, you do not need to multiplex at all and can therefore set MAXOPENFILES to 1.
  3. FILESPERSET defaults to 64 or the number of datafiles divided by the number of channels, whichever is lower.
  4. MAXOPENFILES defaults to 8, which means RMAN opens up to 8 files at a time and multiplexes them. Setting it to 1 means no multiplexing: all files are read in sequence and written into a single backup set.
  5. The degree of multiplexing is determined by the lesser of the FILESPERSET and MAXOPENFILES values.
  6. MAXSETSIZE and MAXPIECESIZE are used to restrict the size of backup sets and backup pieces when there are limitations on disk/tape sizes.
  7. The FILESPERSET parameter determines how many datafiles are included in each backup set, while MAXOPENFILES defines how many datafiles RMAN can read from simultaneously and multiplex together.
Assume that you are backing up six datafiles with one RMAN channel. If FILESPERSET is 6 and MAXOPENFILES is 1, the channel includes all 6 datafiles in one backup set but does not multiplex them, because RMAN is not reading from more than one file simultaneously. The channel reads one file at a time and writes to the backup piece; in this case, the degree of multiplexing is 1.

Now, assume that FILESPERSET is 6 and MAXOPENFILES is 3. In this case, the channel can read and write in the following order:
Read from datafiles 1, 2, and 3 simultaneously and write to the backup piece 
Read from datafiles 4, 5 and 6 simultaneously and write to the backup piece 
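The FILESPERSET 6 / MAXOPENFILES 3 scenario above would look roughly like this in a backup script (the datafile numbers and destination path are illustrative, not from a real system):

```
run
{
  # read from up to 3 datafiles simultaneously (degree of multiplexing = 3)
  allocate channel disk1 type disk maxopenfiles 3;
  # put all 6 datafiles into a single backup set
  backup datafile 1,2,3,4,5,6 filesperset 6
    format '\\<IP_Address>\backup_folder\df_%U.rbf';
}
```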

Note that multiplexing too many files can decrease restore performance. If possible, group files that will be restored together into the same backup set. Assume that RMAN backs up seventeen files with FILESPERSET = 64 and MAXOPENFILES = 16, and you later decide to restore data17.dbf (datafile 17): RMAN must read through the multiplexed data of the first sixteen files before it starts reading the data for data17.dbf. In this case, seeking to the beginning of the backup of data17.dbf may take more time than the restore itself.
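Point 6 above (MAXSETSIZE/MAXPIECESIZE) translates into RMAN syntax roughly as follows; the sizes and path here are purely illustrative, chosen only to show where each clause goes:

```
run
{
  # MAXPIECESIZE is a channel option: limit each backup piece to 10G
  allocate channel disk1 type disk maxpiecesize 10G;
  # MAXSETSIZE is a backup option: limit each backup set to 32G
  backup maxsetsize 32G database
    format '\\<IP_Address>\backup_folder\ora_%U.rbf';
}
```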

Test Cases (Backup Time is in HH:MI:SS Format)
Each RAC node has the NFS shared disk mounted on it.
  1. 3 channels (1 channel per instance) with filesperset 1 and without maxopenfiles (maxopenfiles would be automatically 1 in this case)
    250G took 00:50:00 to complete, expected time for 3.1T is approximately 10:32:00
  2. 6 channels (2 channels per instance) with filesperset 1 and without maxopenfiles (maxopenfiles would be automatically 1 in this case)
    250G took 00:33:00 to complete, expected time for 3.1T is approximately 06:40:00
  3. 9 channels (3 channels per instance) with filesperset 1 and without maxopenfiles (maxopenfiles would be automatically 1 in this case)
    250G took 00:25:40 to complete, expected time for 3.1T is approximately 05:25:00
  4. 12 channels (4 channels per instance) with filesperset 1 and without maxopenfiles (maxopenfiles would be automatically 1 in this case)
    250G took 00:18:00 to complete, expected time for 3.1T is approximately 03:35:00
  5. 15 channels (5 channels per instance) with filesperset 1 and without maxopenfiles (maxopenfiles would be automatically 1 in this case)
    250G took 00:16:40 to complete, expected time for 3.1T is approximately 03:30:00
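For scenario 5, the run block of the script above grows to 15 channels, 5 per instance. A sketch, using the same connect strings as the original script (the remaining channels follow the same pattern):

```
run
{
  # 5 channels on instance1
  allocate channel disk10 type disk connect /@instance1;
  allocate channel disk11 type disk connect /@instance1;
  allocate channel disk12 type disk connect /@instance1;
  allocate channel disk13 type disk connect /@instance1;
  allocate channel disk14 type disk connect /@instance1;
  # ... plus 5 channels each on instance2 (disk20-disk24)
  # and instance3 (disk30-disk34), in the same pattern
  backup as compressed backupset
    format '\\<IP_Address>\backup_folder\ora_%U.rbf'
    incremental level 0 database filesperset 1;
}
```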
Conclusions
  1. On a single physical server (whether a RAC node or a standalone server), if multiple channels are opened to write to a single local backup destination disk, there is no significant improvement in backup completion time.
  2. On a single physical server, if the backup is taken on an NFS mount, you can get better performance by increasing the number of channels. NFS is usually slow because data has to travel over the network to be written to the mount, and more channels let RMAN fully utilize the storage speed.
    As a personal experience: for a compressed full backup of a 1.3 Terabyte database to NFS, the backup completed in 10 hours with 4 channels and in 2.5 hours with 16 channels (the server had 24 CPUs).
  3. On an NFS-mounted shared disk (visible to all RAC nodes), backup time reduces as the number of channels (across all RAC nodes) increases. The reason is the same as in the previous point.
  4. For a single server or RAC, the same applies to backups on tape: performance can be improved by having multiple channels writing in parallel to multiple tape drives.
Point to Note
While specifying multiple channels, always keep in mind the number of CPUs on your server, because opening too many channels may saturate them (increasing the load average or run-queue length). Analyze current CPU utilization and allocate channels accordingly.


4 comments:

  1. Can you share the script for allocating 15 channels wrt tape

    ReplyDelete
  2. Hello,
    It is very simple, replace "disk" with 'sbt_tape' to open channels for tape.
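    For example, scenario 1 with tape channels would look like this (channel names are illustrative; tape media-management parameters depend on your MML setup):

    ```
    run
    {
    allocate channel tape1 type 'sbt_tape' connect /@instance1;
    allocate channel tape2 type 'sbt_tape' connect /@instance2;
    allocate channel tape3 type 'sbt_tape' connect /@instance3;
    backup as compressed backupset incremental level 0 database filesperset 1;
    }
    ```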

    ReplyDelete
  3. I see that you connected to each node in the script.

    run
    {
    allocate channel disk10 type disk connect /@instance1;
    allocate channel disk20 type disk connect /@instance2 ;
    allocate channel disk30 type disk connect /@instance3;

    Can we do it dynamically also ??

    ReplyDelete