Saturday, July 10, 2021

Monitoring RMAN Restore and Recovery Progress

 While we start a restore and recovery process for database or datafiles, next step is to monitor the progress of restore and recovery process. This is particularly important when end user or customer is keen to learn the approximate completion time of recovery when a critical system is down because of database/datafile corruption. In this article, I would explain how we monitor the recovery of database, and approximate time of completion for the recovery process.

I will discuss a real time scenario that I recorded when I was performing a recovery of a datafile that needed hundreds of archived redo logs to be restored form the backup and applying them during recovery.

RMAN> RECOVER DATAFILE 207;


After recovery had started, I queried different views to monitor the progress. I will discuss each of them one by one.

Querying these views is worthwhile if you are recovering datafiles or entire database.

SQL> select last_time,checkpoint_change#,last_change# from v$datafile where file#=207;

LAST_TIME                CHECKPOINT_CHANGE#     LAST_CHANGE#
---------------- ------------------ ----------------------------------------------
03-05-2018 06:26         4871194091                               4871323936

Above query shows that my datafile 207 has last change number 4871323936 when it was available for writing. Checkpoint has not moved forward because file is no longer available (OFFLINE).

SQL> select distinct first_time,inst_id,thread#,sequence# from gv$archived_log where first_change#<=4871323936 and next_change#>=4871323936 order by thread#

FIRST_TIME          INST_ID        THREAD#     SEQUENCE#
---------------- ---------- ---------- -----------------------------------------
03-05-2018 06:16          1                              1         44063
03-05-2018 06:16          2                              1         44063
03-05-2018 06:15          1                              2         43155
03-05-2018 06:15          2                              2         43155

Above query shows that until what (archived) redo log sequence number this datafile needs to be recovered. We simply tried to find the sequence numbers based on the last change number of the datafile.

select recover,CHECKPOINT_CHANGE#,CHECKPOINT_TIME from v$datafile_header where file#=207;

REC        CHECKPOINT_CHANGE#      CHECKPOINT_TIME
--- ------------------ ------------------------------------------------------
YES         48707439880                          3-05-2018 05:46

After the start of recovery, above query output shows the current recovery progress of the datafile. This shows that datafiles has been recovered until change number and time returned by this query. Based on the output, if we query V$ARCHIVED_LOG, we can see until what sequence number the archived logs have been applied. This will also help us to know how many more archived logs application is left.

Based on the checkpoint_change# value returned bu the above query, if we query V$ARCHIVED_LOG, we can see until what sequence number the archived logs have been applied. This will also help us to know how many more archived logs application is left. See below output.

select distinct first_time,inst_id,thread#,sequence# from gv$archived_log where first_change#<=4870743988 and next_change#>=4870743988 order by thread#;

 FIRST_TIME              INST_ID       THREAD#  SEQUENCE#
---------------- ---------- ---------- ---------------------------------------
03-05-2018 05:31          1                   1                  44060
03-05-2018 05:46          1                   1                  44061
03-05-2018 05:31          2                   1                  44060
03-05-2018 05:46          2                   1                  44061
03-05-2018 05:46          1                   2                  43153
03-05-2018 05:46          2                   2                  43153


V$RMAN_STATUS

This view can be queried to check the status of RMAN jobs currently running. You can explore other columns of the view as well. This view can also be used to monitor backup jobs progress. This view returned me SID of the session currently doing the recovery/restore. In case of multiple channels, you may find multiple SIDs (sessions) doing the recovery.

select sid,recid,start_time,end_time,status,operation,mbytes_processed/1024 GB_PROCESSED,input_bytes/1024/1024/1024 INPUT_GB,output_bytes/1024/1024/1024 OUTPUT_GB,object_type from v$rman_status where operation like '%R%' and status='RUNNING' order by 1;

SID      RECID START_TIME       END_TIME         STATUS                  OPERATION                         GB_PROCESSED   INPUT_GB  OUTPUT_GB OBJECT_TYPE
---------- ---------- ---------------- ---------------- ----------------------- --------------------------------- ------------ ---------- ---------- -------------
 508       4557 03-05-2018 10:30 04-05-2018 00:50 RUNNING                 RESTORING AND APPLYING LOGS         260.326409 168.926084 260.326409 DATAFILE FULL
 508       4553 03-05-2018 10:29 04-05-2018 00:50 RUNNING                 RECOVER                                      0          0          0 DATAFILE FULL
 508       4552 03-05-2018 10:29 04-05-2018 00:50 RUNNING                 RMAN                                         0          0          0
 

 V$SESSION_LONGOPS

This view is very handy in monitoring the long running processes. First we can find oud the SID of the sessions currently connected with RMAN, and then query v$sesison_longops to find out the restore/recovery process.

SQL> select message,sofar,totalwork,time_remaining from v$session_longops where sid=508;

MESSAGE                                                                SOFAR  TOTALWORK TIME_REMAINING
----------------------------------------------------------------- ---------- ---------- --------------
Media Recovery: Elapsed Time : 54107 out of 0 Seconds done             54107          0
Media Recovery: Last Applied Redo : 1 out of 1 SCN+Time done               1          1              0
Media Recovery: Average Apply Rate : 5619 out of 0 KB/sec done          5619          0
Media Recovery: Log Files : 342 out of 662 Files done                    342        662          50626
Media Recovery: Maximum Apply Rate : 120111 out of 0 KB/sec done      120111          0
Media Recovery: Active Apply Rate : 75362 out of 0 KB/sec done         75362          0
Media Recovery: Active Time : 3190 out of 0 Seconds done                3190          0
Media Recovery: Redo Applied : 296919 out of 0 Megabytes done         296919          0


V$RMAN_OUTPUT

This view shows us the output of the rman job that we are currently monitoring. Output is same as you see on your command line where you have initiated the command. Use SID and RECID columns returned by V$RMAN_STATUS query to filter the output to see outputs of the command you have executed.

SQL> select output from v$rman_output where sid=508 and session_recid=4552 order by recid ;

channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=43917
channel ORA_DISK_1: restoring archived log
archived log thread=2 sequence=43010
channel ORA_DISK_1: restoring archived log

archived log thread=1 sequence=43918
channel ORA_DISK_1: restoring archived log
archived log thread=2 sequence=43011
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=43919|
channel ORA_DISK_1: restoring archived log
archived log thread=2 sequence=43012
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=43920
channel ORA_DISK_1: restoring archived log
archived log thread=2 sequence=43013
channel ORA_DISK_1: reading from backup piece /u05/rman_backup/MTU01MYM/MTU01MYM_LEVEL_1_20180502_2lt1rtkk_1_1

 

 

select output from v$rman_output where sid=508 and session_recid=4552 order by recid desc;

archived log for thread 1 with sequence 43755 is already on disk as file /u03/MTU01MYMP/archivelog/2018_05_03/o1_mf_1_43755__152
archived log for thread 1 with sequence 43754 is already on disk as file /u03/MTU01MYMP/archivelog/2018_05_03/o1_mf_1_43754__152
archived log for thread 1 with sequence 43753 is already on disk as file /u03/MTU01MYMP/archivelog/2018_05_03/o1_mf_1_43753__152
archived log for thread 1 with sequence 43752 is already on disk as file /u03/MTU01MYMP/archivelog/2018_05_03/o1_mf_1_43752__152
archived log for thread 1 with sequence 43751 is already on disk as file /u03/MTU01MYMP/archivelog/2018_05_03/o1_mf_1_43751__152
archived log for thread 1 with sequence 43750 is already on disk as file /u03/MTU01MYMP/archivelog/2018_05_03/o1_mf_1_43750__152

starting media recovery 

channel ORA_DISK_1: SID=491 instance=MTU01MYM1 device type=DISK
allocated channel: ORA_DISK_1
using target database control file instead of recovery catalog
Starting recover at 03-MAY-18


 


No comments:

Post a Comment

Popular Posts - All Times