Experience: is what you get soon after you need it.

Experience: is what you get soon after you need it.

Rasul Allah (sal Allahu alaihi wa sallam) said: "Restore the trusts of those who trust you, and deal not falsely with him who deals falsely with you." [Abu Dawud, Tirmidhi]

Search This Blog

Wednesday, September 21, 2011

Oracle I/O

IO db file sequential read db file scattered read io i/o

Types of IO

Direct path read:

Direct path reads are generally used by Oracle when reading directly into PGA memory (as opposed to into the buffer cache). If asynchronous IO is supported (and in use) then Oracle can submit IO requests and continue processing. It can then pick up the results of the IO request later and will wait on "direct path read" until the required IO completes.

If asynchronous IO is not being used then the IO requests block until completed but these do not show as waits at the time the IO is issued. The session returns later to pick up the completed IO data but can then show a wait on "direct path read" even though this wait will return immediately.

Hence this wait event is very misleading as:

  • The total number of waits does not reflect the number of IO requests
  • The total time spent in "direct path read" does not always reflect the true wait time.

This style of read request is typically used for:

  • Sort IO (when a sort does not fit in memory)
  • Parallel Query slaves
  • Readahead (where a process may issue an IO request for a block it expects to need in the near future)
The average response time is directly related to the type of IO:
  • Read or Write

  • Single Block or MultiBlock

    Single block IO, as suggested by its name, reads only one at a time.
    For example, when a session waits for a single-block IO it may typically wait on the event "db file sequential read" to indicate that it is waiting for that block to be delivered.

    Multi-Block operations read more than one block at a time ranging from 2 to 128 Oracle blocks depending on the size of the block and operating system settings. Usually a multi-block request had a limit of 1Mb in size.
    For example when a session waits for a multi-block IO it may typically wait on the event "db file scattered read" to indicate that it is waiting for those blocks to be delivered.
  • Synchronous or Asynchronous.

    Synchronous (Blocking) operations wait for the hardware to complete the physical I/O, so that they can be informed of and manage appropriately the success or failure of the operation (and to receive the results in the case of a successful read). The execution of the process is blocked while it waits for the results of the system call.

    With Asynchronous (Non-Blocking) operations the system call will return immediately once the I/O request has been passed down to the hardware or queued in the operating system (typically before the physical I/O operation has even begun). The execution of the process is not blocked, because it does not need to wait for the results of the system call. Instead, it can continue executing and then receive the results of the I/O operation later, once they are available.

Asynchronous operations should be equally as fast as or faster than synchronous.
Single Block operations should be as fast as or faster than multi-block

'log file parallel write', 'control file write' and 'direct path writes' should be no more than 15ms.

Identifying IO Response Time

Oracle records the response time of IO operations as the "Elapsed Time" indicated in specific wait events and statistics."Response time" and "elapsed time" are synonymous and interchangeable terms in this context.

Below is a list of some of the more popular wait events and their typical acceptable wait times (not an exhaustive list)
Wait Event R/W Synchronous
Elapsed Time
(with 1000+ waits per hour)
control file parallel write


Asynchronous Multi < 15ms
control file sequential read


Synchronous Single < 20 ms
db file parallel read Read Asynchronous Multi < 20 ms
db file scattered read Read Synchronous Multi < 20 ms
db file sequential read Read Synchronous Single < 20 ms
direct path read Read Asynchronous Multi < 20 ms
direct path read temp Read Asynchronous Multi < 20 ms
direct path write Write Asynchronous Multi < 15 ms
direct path write temp Write Asynchronous Multi < 15 ms
log file parallel write Write Asynchronous Multi < 15 ms

Sources in Oracle identifying Response Time

    10046 Trace File

    When level 8 or 12 is specified in the 10046 trace, wait events are included.
    The response time is specified in ela field.
    From 9i onwards the value is specified in microseconds. In 8i and before the time is in is 1/100th second (10ms).

    WAIT #5: nam='cell single block physical read' ela= 672 cellhash#=2520626383 diskhash#=1377492511 bytes=16384 obj#=63 tim=1280416903276618

    672 microseconds = 0.672 ms

    WAIT #5: nam='db file sequential read' ela= 1018 file#=2 block#=558091 blocks=1 obj#=0 tim=10191852599110

    1018 microseconds => 1.018 ms

    System State Dump

    For each process in a system state, the wait information is included among the other process information. This will either show an active wait: "waiting for" or a case where waiting is completed and the process is on CPU : "waited for" / "last wait for".
    • "waiting for"
      This means that the process is currently in a wait state.
      Prior to 11g the field to look at is "seconds since wait started" which shows how long the process has been waiting on this wait event
      Starting 11gR1 the field to look at is "total" which is the total time elapsed on this wait.

      If a process is indicated to be "waiting for" an IO related operation and "seconds since wait started" > 0 most likely this means that the IO got "lost" and the session can be considered to be hanging. (since we have mentioned previously that an average acceptable wait would be 20ms , any IO wait of duration > 1 second is a cause for concern)

    • "last wait for" is relevant in versions prior to 11g and indicates a process that is no longer waiting (ie it is on CPU). The last wait is recorded and its wait time indicated in "wait_time" field. (In 11g, "last wait for" is replaced by "not in wait")

      last wait for 'db file sequential read' blocking sess=0x0 seq=100 wait_time=2264 seconds since wait started=0
      file#=45, block#=17a57, blocks=1

      2264 microseconds => 2.264 ms

    • "waited for" means the session is no longer waiting. This is used in systemstate trace starting from 11gR1.The field to look at is "total" indicating the total time waited.

      0: waited for 'db file sequential read' file#=9, block#=46526, blocks=1
      wait_id=179 seq_num=180 snap_id=1
      wait times: snap=0.007039 sec, exc=0.007039 sec, total=0.007039 sec
      wait times: max=infinite
      wait counts: calls=0 os=0

      0.007039 sec => 7.039 ms

Statspack and AWR reports

Foreground and Background Wait Events

These reports show sections detailing waits by both foreground and background operations separately. The following is an example of such a section:

%Time Total Wait wait Waits % DB
Event Waits -outs Time (s) (ms) /txn time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file sequential read 2,354,428 0 8,256 4 2.6 21.2
db file scattered read 23,614 0 48 2 0.0 .1

In this report the average Response Time is shown by the Av Rd (ms) column (Average Read in Milliseconds)

Tablespace IO Stats

The Tablespace section of these reports also gives useful information from the tablespace perspective:

Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------- ------- ------------ -------- ---------- -------
1,606,553 446 2.2 8.3 75,575 21 60,542 0.9

Again the Read Response Time is indicated by the Av Rd (ms) column (Average Read in Milliseconds). The write IO time is indicated by the amount of time it is having to wait to service buffer writes which is indicated in the report by the Av Buf Wt(ms) column (Average time to write a buffer in Milliseconds)

Wait Event Histogram

The Wait event histogram section can provide useful information regarding the spread of the write times that makes up the average. It can show if the average is made up of many writes close to the average or if it is being skewed by a few very large or small values:

                                                  % of Waits              
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
db file parallel read 4139 .2 .5 2.5 26.4 23.5 15.0 31.9 .1
db file parallel write 329K 88.5 4.0 2.1 1.9 2.3 1.1 .3 .0
db file scattered read 14.4K 54.3 8.5 6.1 16.6 11.5 2.6 .4

Each column indicates the percentage of wait events waited up to the time specified between each bucket. For example the waits indicated under "<16ms" are those that are greater than "8ms" but less than "16ms".
As long as the greatest percentage of waits are in the buckets from <1ms up to 16ms then the IO performance is generally acceptable.

  SELECT sid, total_waits, time_waited     FROM v$session_event    WHERE event='db file sequential read'      and total_waits>0    ORDER BY 3,2   ; 

Tracing your own Session

To start tracing your own session simply execute the PL/SQL call:

Tracing another Session

Once a session has been isolated then you need to know the SID of the target session (and possibly the SERIAL#). To start tracing the target session simply call:

If it is important to capture an exact session you should specify both the SID and SERIAL# . Eg:


This PL/SQL procedure will ask the target session to start writing trace output. The trace may NOT start immediately as there are certain points in the Oracle code where a session checks to see if it has been asked to do something (like writing trace output) - the target session only starts tracing once it has seen the request.

Stopping Trace

To stop tracing in your own session call:

To stop tracing another session call:


One can also look at:

  • Statements with high DISK_READS in <>
  • Sessions with high "physical reads" in <>
to list the file# and block# being waited for.
  Eg: In Unix one could process the waits using a command like:    grep 'buffer busy waits' trace_file | \   awk ' { print substr($0,9); } ' | \   sort > sorted_buffer_wait_list 
The actual waits may then be mapped back to an object using SQL like:
  SELECT segment_owner, segment_name     FROM dba_extents     WHERE file_id=&FILE_ID       AND &BLOCK_ID between block_id and block_id+blocks-1   ; 
One more thing:

A query can wait on 'db file sequential read' for a long time even if the execution plan appears to be optimal. This usually happens when the result set of index scan is large.

For example,

WHERE A = 1253
AND B in ('CA', 'CO')
AND C > 210 ;

Rows Row Source Operation
------- ---------------------------------------------------
215431 TABLE ACCESS BY INDEX ROWID BIG_TABLE (cr=880191 pr=430780 pw=0 time=2293667056 us) <<<<<
3582275 INDEX RANGE SCAN BIG_TABLE_IDX (cr=664748 pr=218595 pw=0 time=352506821 us)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 14363 0.00 0.02
db file sequential read 461688 1.15 2254.55 <<<<<
SQL*Net message from client 14363 0.01 9.77
SQL*Net break/reset to client 1 0.00 0.00


In most cases like this, the execution of the query waits on "TABLE ACCESS BY INDEX ROWID" much longer than on INDEX SCAN. It is because the random access to the table rows is much more expensive than index scan.


We can try one or more of the following.

1. Check if there is a better index or plan.

2. Try table full scan.

WHERE A = 1253
AND B in ('CA', 'CO')
AND C > 210 ;

3. Create a concatenated index to avoid the table access if the query has only a few columns of the table in SELECT and WHERE clauses. Be sure to put the most selective predicate column in the first place. For example,


NOTE : This can only be applied to SELECT for the table. If the query updates the table, this will not help.

4. Move the table into the tablespace with a larger block size. And it will be a little helpful to reorganize the table so that the index may have smaller clustering factor

5. Consider increasing buffer cache so that more table blocks can be cached. It is a good idea to use keep buffer pool if the table is frequently accessed.

Note 76374.1 Multiple Buffer Pools

6. Consider using IOT if the index is defined on a unique key.

7. Consider using parallel execution with high degree.

8. Improve the I/O of devices where the table resides. This requires the help from a system administrator.

This is not my own but adapted from Note 1275596.1 , This is for my quick reference only

No comments: