Friday, June 26, 2015

Heartbeat failed to connect to standby. Error is 16009

On one of my 11.2.0.4 Data Guard setups on 64-bit Windows, the following error was being reported every minute in the alert log file of the standby database:
PING[ARC2]: Heartbeat failed to connect to standby 'proddb'. Error is 16009.
Thu Jun 25 07:01:10 2015
PING[ARC2]: Heartbeat failed to connect to standby 'proddb'. Error is 16009.
Thu Jun 25 07:02:11 2015
PING[ARC2]: Heartbeat failed to connect to standby 'proddb'. Error is 16009.
Thu Jun 25 07:03:11 2015
PING[ARC2]: Heartbeat failed to connect to standby 'proddb'. Error is 16009.
Thu Jun 25 07:04:11 2015
PING[ARC2]: Heartbeat failed to connect to standby 'proddb'. Error is 16009.
Thu Jun 25 07:05:11 2015
PING[ARC2]: Heartbeat failed to connect to standby 'proddb'. Error is 16009.
Thu Jun 25 07:06:11 2015
PING[ARC2]: Heartbeat failed to connect to standby 'proddb'. Error is 16009.
Thu Jun 25 07:07:12 2015
PING[ARC2]: Heartbeat failed to connect to standby 'proddb'. Error is 16009.

Meanwhile, the following messages were appearing in the alert log file of the primary database:
Thu Jun 25 07:01:10 2015
RFS[3838]: Assigned to RFS process 6296
RFS[3838]: Database mount ID mismatch [0x99658056:0x9965e5d7] (2573566038:2573592023)
RFS[3838]: Client instance is standby database instead of primary
RFS[3838]: Not using real application clusters
Thu Jun 25 07:02:10 2015
RFS[3839]: Assigned to RFS process 5928
RFS[3839]: Database mount ID mismatch [0x99658056:0x9965e5d7] (2573566038:2573592023)
RFS[3839]: Client instance is standby database instead of primary
RFS[3839]: Not using real application clusters
Thu Jun 25 07:03:11 2015
RFS[3840]: Assigned to RFS process 7452
RFS[3840]: Database mount ID mismatch [0x99658056:0x9965e5d7] (2573566038:2573592023)
RFS[3840]: Client instance is standby database instead of primary
RFS[3840]: Not using real application clusters

Error 16009 in the standby database's alert log hinted that the standby was probably trying to ship archives to the primary. After investigating, I found that the log_archive_dest_2 parameter on the standby (meant to ship archives to proddb after a role switch, when this standby becomes the primary) was not set properly. It had the following value:
SQL> show parameter log_archive_dest_2

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string      service=proddb lgwr async

As we can see, the VALID_FOR attribute is missing from this parameter. The default value of VALID_FOR in log_archive_dest_n is (ALL_LOGFILES, ALL_ROLES), which means that archives are shipped to this destination even when the current database role is standby.
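One way to confirm how the instance has interpreted the destination is to query V$DATABASE and V$ARCHIVE_DEST on the standby. This is only a minimal sketch, assuming the destination of interest is dest_id 2 as in this setup; it was not part of the original troubleshooting session.

```sql
-- Current role of the database this session is connected to
SELECT database_role FROM v$database;

-- How destination 2 is configured, and whether it is valid in the current role
SELECT dest_id, valid_type, valid_role, valid_now
FROM   v$archive_dest
WHERE  dest_id = 2;
```

With VALID_FOR left at its default, VALID_TYPE and VALID_ROLE would be expected to show ALL_LOGFILES and ALL_ROLES, which is why the destination stays active on the standby.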
So I modified this parameter as follows:
SQL> ALTER SYSTEM SET log_archive_dest_2='service=proddb lgwr async valid_for=(online_logfiles, primary_role)' SCOPE=both;

Setting VALID_FOR to (ONLINE_LOGFILES, PRIMARY_ROLE) means that redo is sent to this destination only when the current database is running in the primary role and the archives are generated from online redo log files.
After this modification, the messages stopped appearing in the alert logs of both the primary and the standby database.
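Besides watching the alert logs, the change can also be checked from the destination's status and last recorded error. Again, this is a sketch that assumes dest_id 2 and is run on the standby:

```sql
-- Status, validity in the current role, and last error for destination 2
SELECT dest_id, status, valid_now, error
FROM   v$archive_dest
WHERE  dest_id = 2;
```

While the database remains in the standby role, VALID_NOW would be expected to stop reporting YES for this destination.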

