I have written two articles (AWR 1, AWR 2) about disk latency and how to read
AWR reports to investigate IO slowness. In this article I will explain how to
check disk IO performance on Linux systems, specifically the performance of the
disks where our datafiles and redo log files are stored. These disks could be
simple Linux mount points or ASM disks. In the case of ASM, we first need to
find out which disks are part of the ASM diskgroups so that we can check the
performance of those disks. For example, the following is one way to find the
disks that are part of an ASM diskgroup.
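As a minimal sketch of that lookup, assuming a Python DB-API (PEP 249) cursor that is already connected to the ASM instance: the query joins the v$asm_diskgroup and v$asm_disk views on group_number, and a small helper groups the returned paths by diskgroup. The function names and the cursor are hypothetical and not part of the original article. Depending on the setup (for example with ASMLib), the path column may show a disk label rather than a /dev device name.

```python
from collections import defaultdict

# Lists every disk path together with the diskgroup it belongs to.
ASM_DISK_QUERY = """
SELECT g.name AS diskgroup, d.path
FROM   v$asm_diskgroup g
JOIN   v$asm_disk d ON d.group_number = g.group_number
ORDER  BY g.name, d.path
"""


def group_disks(rows):
    """Group (diskgroup, path) rows into {diskgroup: [path, ...]}."""
    groups = defaultdict(list)
    for diskgroup, path in rows:
        groups[diskgroup].append(path)
    return dict(groups)


def asm_disks(cursor):
    """Run the query on a caller-supplied DB-API cursor (an assumption)."""
    cursor.execute(ASM_DISK_QUERY)
    return group_disks(cursor.fetchall())
```

Calling asm_disks(cursor) would return a dictionary keyed by diskgroup name, which gives the list of devices whose IO performance needs to be checked.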
Oracle Installation guides, Linux Administration tips for DBAs, Performance Tuning tips, Disaster Recovery, RMAN, Dataguard and ORA errors solutions.
No contents from my website can be published anywhere else without my permission. Test every solution before implementing in the production environment.
Monday, March 26, 2018
Monday, March 19, 2018
Reading and Understanding AWR Report for IO or Disk latency - 2
This is the second article about investigating IO latency issues using AWR. The first article can be found here. In this article I will further explain how to check IO latencies at the OS level in Linux.
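One way to look at latency at the OS level is to sample the kernel's per-device counters twice and divide the time spent on IO by the number of IOs completed in between, which is roughly what iostat reports as r_await and w_await. The following is only a sketch under that assumption: it takes two snapshots of /proc/diskstats as text, and the function names are illustrative rather than taken from the original article.

```python
def parse_diskstats(text):
    """Parse /proc/diskstats text into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or unexpected lines
        stats[parts[2]] = {
            "reads": int(parts[3]),      # reads completed
            "read_ms": int(parts[6]),    # milliseconds spent reading
            "writes": int(parts[7]),     # writes completed
            "write_ms": int(parts[10]),  # milliseconds spent writing
        }
    return stats


def avg_latency_ms(before, after, device):
    """Return (read, write) average latency in ms between two snapshots."""
    b, a = before[device], after[device]

    def per_io(ms_key, count_key):
        ios = a[count_key] - b[count_key]
        return (a[ms_key] - b[ms_key]) / ios if ios else 0.0

    return per_io("read_ms", "reads"), per_io("write_ms", "writes")
```

In practice the two snapshots would be read from /proc/diskstats a few seconds apart, and the device name would be one of the disks that hold the datafiles or redo logs.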
Log file sync
We had a production
server running on a virtual machine (VMware), and after a downtime we started
receiving complaints about a slow database. The AWR report showed that the “log
file sync” wait event, which comes under the Commit wait class, was at the top,
and the database was spending more than 30% of its time on the log file sync
wait. The log file sync wait event can be observed in a very busy OLTP
database, but it should not consume as much time as we were seeing, and
should be found near the bottom of the list of top wait events.
Monday, March 12, 2018
Reading and Understanding AWR Report for IO or Disk latency - 1
Recently I performed a
failover of my Oracle database (running on Linux) to my standby database, and
after the switchover, the application team started complaining about extreme
slowness. I was using OEM Cloud Control, and the graph was showing high waits
for “free buffer waits” while the alert log started showing “Checkpoint not
complete”. Since I had never seen these waits on my previous primary server
(now standby), the first thing that came to my mind was that the disks on the
standby server (now primary) were probably very slow, because the hardware of
my servers was very old. The servers also had internal disks (not SAN or NAS).
I generated one AWR report for a period when the database was running fine
without any performance issue, and another based on the latest snapshots, to
see what was going wrong with the IO.
Tuesday, March 6, 2018
ORA-00742: Log read detects lost write in thread %d sequence %d block %
During real-time apply
on one of my physical standby RAC databases, the managed recovery process
crashed with this error message. Following is the entry in the alert log file.
CORRUPTION DETECTED: In redo blocks starting at block 169592count 142 for thread 4 sequence 157
Sat Jul 02 19:12:25 2016
MRP0: Background Media Recovery terminated with error 742