Thursday, July 23, 2015

Finding Sessions with High CPU Usage

If database server CPU usage is showing 100%, or high 90%, DBA needs to find out which session is hogging the CPU(s) and take appropriate action. I will explain in the following that how can we find out the session(s) which are excessively using CPU.

v$statname and v$sesstat dictionary view contains different metrics values, and in this example we will be checking values of 2 metrics highlighted in red bellow. These metrics would help to find out CPU usage by the sessions.
SQL> col name format a70
SQL> select statistic#,name from v$statname where upper(name) like '%CPU%';

STATISTIC# NAME
---------- ----------------------------------------------------------------------
         0 OS CPU Qt wait time
        10 recursive cpu usage
        16 CPU used when call started
        17 CPU used by this session
        61 IPC CPU used by this session
        64 global enqueue CPU used by this session
       229 gc CPU used by this session
       248 cell physical IO bytes sent directly to DB node to balance CPU
       581 parse time cpu

9 rows selected.

If we want to check which session is top consumer of the CPU currently, we can use following script to find it out.
Script for 11g and bellow
alter session set nls_date_format='Dd-MON-YY HH24:MI:SS';
set lines 250
set pages 2000
col name format a26
col username format a15
col program format a40
col SESS_CPU_SECS wra format 999,999,999.99
col LAST_CPU_SECS wra format 999,999,999.99
col logon_secs  wra format 999,999,999
col Percent  wra format 999.99

select sess_cpu.sid, NVL(sess_cpu.username, 'Oracle Process') username, sess_cpu.status, sess_cpu.logon_time,  round((sysdate - sess_cpu.logon_time)*1440*60) logon_SECS, sess_cpu.value/100 SESS_CPU_SECS, (sess_cpu.value - call_cpu.value)/100 LAST_CPU_SECS, round ((sess_cpu.value/100)/round((sysdate - sess_cpu.logon_time)*1440*60)*100,2) Percent, sess_cpu.sql_id          
from
(select se.sql_id,ss.statistic#,se.sid, se.username, se.status, se.program, se.logon_time, sn.name, ss.value from v$session se, v$sesstat ss,
v$statname sn
where se.sid=ss.sid
and sn.statistic#=ss.statistic#
and sn.name in ('CPU used by this session') ) sess_cpu,
(select ss.statistic#,se.sid, ss.value, value/100 seconds from v$session se, v$sesstat ss, v$statname sn
where se.sid=ss.sid
and sn.statistic#=ss.statistic#
and sn.name in ('CPU used when call started') ) call_cpu
where sess_cpu.sid=call_cpu.sid
order by SESS_CPU_SECS ;


Script for 12c and above
Column CON_ID is added in the script to list the container id where sessions are connected. If CON_ID is 0, it would mean that this is an internal process/session. If value is 1, it would mean that this session is from container database. Any value other than 0 or 1 would represent a pluggable database sessions. Check v$containers to match container ID with your pluggable database.
alter session set nls_date_format='Dd-MON-YY HH24:MI:SS';
set lines 250
set pages 2000
col name format a26
col username format a15
col program format a40
col SESS_CPU_SECS wra format 999,999,999.99
col LAST_CPU_SECS wra format 999,999,999.99
col logon_secs  wra format 999,999,999
col Percent  wra format 999.99

select sess_cpu.con_id, sess_cpu.sid, NVL(sess_cpu.username, 'Oracle Process') username, sess_cpu.status, sess_cpu.logon_time,  round ((sysdate-sess_cpu.logon_time)*1440*60) logon_SECS, sess_cpu.value/100 SESS_CPU_SECS, (sess_cpu.value - call_cpu.value)/100 LAST_CPU_SECS,
round((sess_cpu.value/100)/round((sysdate - sess_cpu.logon_time)*1440*60)*100,2) Percent, sess_cpu.sql_id          
from
(select se.con_id,se.sql_id,ss.statistic#,se.sid, se.username, se.status, se.program, se.logon_time, sn.name, ss.value from v$session se, v$sesstat ss, v$statname sn
where se.sid=ss.sid
and sn.statistic#=ss.statistic#
and sn.name in ('CPU used by this session') ) sess_cpu,
(select se.con_id, ss.statistic#,se.sid, ss.value, value/100 seconds from v$session se, v$sesstat ss, v$statname sn
where se.sid=ss.sid
and sn.statistic#=ss.statistic#
and sn.name in ('CPU used when call started') ) call_cpu
where sess_cpu.sid=call_cpu.sid
and sess_cpu.con_id=call_cpu.con_id
order by SESS_CPU_SECS ;

LOGON_SECS
column shows total time in seconds this session is connected to the database.
SESS_CPU_SECS columns shows number of seconds this session has spent using CPU.
LAST_CPU_SECS column shows the CPU taken by last execution call – a single SQL may call CPU several times during its execution.
PERCENT column shows the percent of time this session has spent using CPU since this session has connect to the database.
  
Points to note
While on CPU, LAST_CPU_SECS would show zero (session status would be ACTIVE). During execution of SQL(s), session may be switching between CPU and IO - and hence we can see value changing under SESS_CPU_SECS and LAST_CPU_SECS columns as soon as session switches from CPU to IO.
If a session is on CPU, status would be ACTIVE and SESS_CPU_SECS would be showing last value captured for this session, and LAST_CPU_DECS would be 0.
If a session is on IO, status would be ACTIVE and SESS_CPU_SECS would be showing total seconds of CPU taken by the session until now.
LAST_CPU_SECS would show CPU seconds taken during last call to the CPU. I have also seen value of zero under this column even if session is not ACTIVE. I can't figure it out why, but most important thing is total time of CPU taken by the session which is under SESS_CPU_SECS.
See the following example where I have used ORDER BY SESS_CPU_SECS to display sessions which are at the top CPU consumption since log in.

If you see session bellow highlighted in red, it is connected to the database for 799 seconds and out of which it has spent 260 seconds on CPU which accounts to 32.55 percent of total session time.
SID
USERNAME
STATUS
LOGON_TIME
LOGON_SECS
SESS_CPU_SECS
LAST_CPU_SECS
PERCENT
SQL_ID
262
Oracle Process
ACTIVE
23-JUL-15 09:44:48
22,804
.08
.08
.00
 
2
Oracle Process
ACTIVE
23-JUL-15 09:44:14
22,838
.10
.10
.00
 
504
Oracle Process
ACTIVE
23-JUL-15 09:44:18
22,834
.22
.22
.00
 
508
Oracle Process
ACTIVE
23-JUL-15 09:44:27
22,825
.26
.26
.00
 
11
SYS
ACTIVE
23-JUL-15 15:47:58
1,014
.48
.00
.05
0r86pjqdx9r5m
18
SYS
INACTIVE
23-JUL-15 14:58:11
4,001
1.90
.02
.05
 
509
Oracle Process
ACTIVE
23-JUL-15 09:44:41
22,811
12.07
12.07
.05
 
256
Oracle Process
ACTIVE
23-JUL-15 09:44:19
22,833
16.94
16.94
.07
 
5
Oracle Process
ACTIVE
23-JUL-15 09:44:19
22,833
30.57
30.57
.13
 
764
C##SALMAN
INACTIVE
23-JUL-15 15:51:33
799
260.07
.00
32.55
c0yavgxt1nxqc

After some time, percentage of CPU usage increases to 35.15 percent. 
SID
USERNAME
STATUS
LOGON_TIME
LOGON_SECS
SESS_CPU_SECS
LAST_CPU_SECS
PERCENT
SQL_ID
262
Oracle Process
ACTIVE
23-JUL-15 09:44:48
23,341
.08
.08
.00
 
2
Oracle Process
ACTIVE
23-JUL-15 09:44:14
23,375
.10
.10
.00
 
504
Oracle Process
ACTIVE
23-JUL-15 09:44:18
23,371
.22
.22
.00
 
508
Oracle Process
ACTIVE
23-JUL-15 09:44:27
23,362
.26
.26
.00
 
11
SYS
ACTIVE
23-JUL-15 15:47:58
1,551
.67
.00
.04
0r86pjqdx9r5m
18
SYS
INACTIVE
23-JUL-15 14:58:11
4,538
2.04
.00
.04
 
509
Oracle Process
ACTIVE
23-JUL-15 09:44:41
23,348
12.13
12.13
.05
 
256
Oracle Process
ACTIVE
23-JUL-15 09:44:19
23,370
17.12
17.12
.07
 
5
Oracle Process
ACTIVE
23-JUL-15 09:44:19
23,370
30.91
30.91
.13
 
764
C##SALMAN
INACTIVE
23-JUL-15 15:51:33
1,336
469.58
.00
35.15
c0yavgxt1nxqc

In real time scenario, if you see a session spending huge amount time on CPU, have a look at the SQL(s) it is executing (last column showing SQL_ID) and investigate further.
You can modify this script to “ORDER BY PERCENT” to see which session has spent most of its connected time on CPU.

10 comments:

  1. Assalamu Alaikkum Warah..

    Dear Salman,

    I am getting the beloew error while executing the above script.

    ((sess_cpu.value/100)/round((sysdate - sess_cpu.logon_time)*1440*60)*100,2) Percent, sess_cpu.sql_id
    *
    ERROR at line 2:
    ORA-01476: divisor is equal to zero

    ReplyDelete
    Replies
    1. Wa alaikumus salam.
      Can you please with for sometime and re-execute the query? At my side if I execute this query within a second after a session connects with the database, I see this error. So either you can wait for a few seconds after a session connects with the DB. If you DB has a lot of sessions connecting and disconnecting simultaneously, you have more chances to see this error. In that case, you can remove column "round((sess_cpu.value/100)/round((sysdate - sess_cpu.logon_time)*1440*60)*100,2) Percent, " from this query and I hope this will resolve your problem

      Delete
  2. Hi Rajesh,
    If you see outputs that I have pasted in this post, you will notice that sql_id column does not show any value for some sessions, so this is an expected behavior. For internal sessions, you may not see any SQL_ID listed. There could be some other scenarios for some sessions where you might not see any SQL_ID.

    ReplyDelete
  3. can you post RAC specific query for the same

    ReplyDelete
    Replies
    1. Hi VIshith,
      I do not have RAC specific query as I prefer to monitor CPU usage individually on each node, if this is a RAC.

      Delete
    2. For some reports, I have needed to adapt this script for RAC. Also, I have added some columns for my needed. Here is:

      ---RAC
      set lines 250
      set pages 2000
      col "SID,SERIAL@INST_ID" for a20
      col name format a26
      col username format a40
      col program format a40
      col SESS_CPU_SECS wra format 99999999999
      col LAST_CPU_SECS wra format 99999999999
      col LOGON_MIN wra format 999999999
      col "CPU USAGE %" wra format 999.99
      col service_name for a20;
      SELECT * FROM (select sess_cpu.spid,sess_cpu.sid||','||sess_cpu.serial#||'@'||call_cpu.inst_id as "SID,SERIAL@INST_ID", NVL(sess_cpu.username, program) username,sess_cpu.service_name,
      sess_cpu.status,sess_cpu.LAST_CALL_ET, to_char(sess_cpu.logon_time,'dd/mm/yyyy hh24:mi:ss') "LOGON TIME",round((sysdate - sess_cpu.logon_time)*1440*60/60) LOGON_MIN,
      sess_cpu.value/100 SESS_CPU_SECS, (sess_cpu.value - call_cpu.value)/100 LAST_CPU_SECS, round ((sess_cpu.value/100)/round((sysdate - sess_cpu.logon_time)*1440*60)*100,2) "CPU USAGE %",
      sess_cpu.sql_id from
      (select se.inst_id,se.sql_id,ss.statistic#,pr.spid,se.sid,se.serial#, se.username,se.service_name,se.status,se.last_call_et, se.program, se.logon_time, sn.name, ss.value from gv$session se,
      gv$process pr, gv$sesstat ss,v$statname sn
      where se.sid=ss.sid
      and se.inst_id=ss.inst_id
      and pr.inst_id=se.inst_id
      and sn.statistic#=ss.statistic#
      and pr.addr=se.paddr
      and sn.name in ('CPU used by this session') ) sess_cpu,
      (select se.inst_id,ss.statistic#,se.sid, ss.value, value/100 seconds from gv$session se, gv$sesstat ss, v$statname sn
      where se.sid=ss.sid
      and se.inst_id=ss.inst_id
      and sn.statistic#=ss.statistic#
      and sn.name in ('CPU used when call started') ) call_cpu
      where sess_cpu.sid=call_cpu.sid
      and sess_cpu.inst_id=call_cpu.inst_id
      and sess_cpu.status = 'ACTIVE'
      order by "CPU USAGE %" DESC) WHERE ROWNUM <=15;

      Delete
  4. am getting the beloew error while executing the above script.

    ((sess_cpu.value/100)/round((sysdate - sess_cpu.logon_time)*1440*60)*100,2) Percent, sess_cpu.sql_id
    *
    ERROR at line 2:
    ORA-01476: divisor is equal to zero

    ReplyDelete
    Replies
    1. Hi Prashant,
      I responded to this problem in one of my previous replies. Can you please see my reply to the first comment above.

      Delete

Popular Posts - All Times