Monday, December 26, 2016

Redo log in depth

What is RBA?
Called as Redo Byte Address. Consuming approximate 10 bytes. It represents/identifies start of redo
record. RBA contains Log Sequence Number (4 bytes), Block Number within Redo Log (4 bytes) and Byte number within block (2 bytes) in started point.
There are different types of RBA available in SGA those are following.
Low RBA
Dirty buffer contains first redo change address called Low RBA. From x$bh we can check low RBA.
High RBA
Dirty buffer contains last and most recent redo changes address called High RBA. From x$bh we can
check High RBA.
Checkpoint RBA
DBWR has written buffers from checkpoint queue are pointing to checkpoint RBA while incremental
checkpoint is enabled. This RBA copies in to control file’s checkpoint progress record. When instance
recovery occurs that time it starts from checkpointing RBA from control file. We can check this RBA from
x$targetrba (sometimes from x$kccrt).
Target RBA
It is representing RBA which DBWR want to done advance checkpoint for instance recovery (if enabled).
We can check from x$targetrba.
On-disk RBA
That RBA which was flushed in to online Redo Log File on disk. This RBA recorded in to control file record

section. We can check from x$kcccp for on-disk RBA (sometimes from x$targetrba).

Checkpoint in oracle database






ORACLE CHECKPOINTS
In this post, I will explain about checkpoints – their purpose and different types of checkpoints.
CHECKPOINTS BENEFITS
   Database blocks are temporarily stored in Database buffer cache. As blocks are read, they are stored in DB buffer cache so that if any user accesses them later, they are available in memory and need not be read from the disk. When we update any row, the buffer in DB buffer cache corresponding to the block containing that row is updated in memory. Record of the change made is kept in redo log buffer. On commit, the changes we made are written to the disk thereby making them permanent. But where are those changes written? To the data files containing data blocks? No!!! The changes are recorded in online redo log files by flushing the contents of redo log buffer to them. This is called write ahead logging.  If the instance crashed right now, the DB buffer cache will be wiped out but on restarting the database, Oracle will apply the changes recorded in redo log files to the datafiles.
    Why doesn’t Oracle write the changes to datafiles right away when we commit the transaction? The reason is simple. If it chose to write directly to the datafiles, it will have to physically locate the data block in the datafile first and then update it which means that after committing, user has to wait until DBWR searches for the block and then writes it before he can issue next command. Moreover, writing to datafiles is performed in  units of Oracle data blocks. Each block may contain multiple rows. Modifying even one column in one row of a block will necessitate writing whole of the block. This will bring down the performance drastically. That is where the role of redo logs comes in. The writes to the redo logs are sequential writes – LGWR just dumps the info in redologs to log files sequentially and synchronously so that the user does not have to wait for long. Moreover, in contrast to DBWR which writes data blocks, LGWR will write only the changes vectors . Hence,  write ahead logging also improves performance by reducing the amount of data written synchronously. When will the changes be applied to the datablocks in datafiles? The data blocks in the datafiles will be updated by the DBWR asynchronously in response to certain triggers. These triggers are called checkpoints.
  Checkpoint is a synchronization event at a specific point in time which causes some / all dirty blocks to be written to disk thereby guaranteeing that blocks dirtied prior to that point in time get written.
  Whenever dirty blocks are written to datafiles, it allows oracle
- to reuse a redo log : A redo log can’t be reused until DBWR writes all the dirty blocks protected by that logfile to disk. If we attempt to reuse it before DBWR has finished its checkpoint, we get the following message in alert log : Checkpoint not complete.
to reduce instance recovery time : As the memory available to a database instance increases, it is possible to have database buffer caches as large as several million buffers. It requires that the database checkpoint advance frequently to limit recovery time, since infrequent checkpoints and large buffer caches can exacerbate crash recovery times significantly.
to free buffers for reads : Dirtied blocks can’t be used to read new data into them until they are written to disk. Thus DBWrR writes dirty blocks from the buffer cache, to make room in the cache.
Various types of checkpoints  in Oracle :
– Full checkpoint
– Thread checkpoint
- File checkpoint
- Parallel Query checkpoint
- Object checkpoint
- Log switch checkpoint
_ Incremental checkpoint
Whenever a checkpoint is triggered :
- DBWR writes some /all dirty blocks to datafiles
- CKPT process updates the control file and datafile headers
                      FULL CHECKPOINT
- Writes block images to  the database for all dirty buffers from all instances.
- Statistics updated
  . DBWR checkpoints
  . DBWR checkpoint buffers written
  . DBWR thread checkpoint buffers written
- Caused by :
  . Alter system checkpoint [global]
  . ALter database begin backup
  . ALter database close
  . Shutdown [immediate]
- Controlfile and datafile headers are updated
  . Checkpoint_change#
                   THREAD CHECKPOINT
 – Writes block images to the database for all dirty buffers from one instance
- Statistics updated
  . DBWR checkpoints
  . DBWR checkpoint buffers written
  . DBWR thread checkpoint buffers written
- Caused by :
  . Alter system checkpoint local
- Controlfile and datafile headers are updated
  . Checkpoint_change#
                   FILE CHECKPOINT
   When a tablespace is put into backup mode or take it offline, Oracle writes all the dirty blocks from the tablespace to disk before changing the state of the tablespace.
- Writes block images to the database for all dirty buffers for all files of a tablespace from all instances
- Statistics updated
  . DBWR checkpoints
  . DBWR tablespace checkpoint buffers written
  . DBWR checkpoint buffers written
- Caused by :
  . Alter tablespace xxx offline
  . Alter tablespace xxx begin backup
  . Alter tablespace xxx read only
- Controlfile and datafile headers are updated
  . Checkpoint_change#
                  PARALLEL QUERY CHECKPOINT
   Parallel query often results in direct path reads (Full tablescan or index fast full scan). This means that blocks are read straight into the session’s PGA, bypassing the data cache; but that means if there are dirty buffers in the data cache, the session won’t see the most recent versions of the blocks unless they are copied to disk before the query starts – so parallel queries start with a checkpoint.
- Writes block images to the database for all dirty buffers belonging to objects accessed by the query from all instances.
- Statistics updated
  . DBWR checkpoints
  . DBWR checkpoint buffers written
- Caused by :
  . Parallel Query
  . Parallel Query component of Parallel DML (PDML) or Parallel DDL (PDDL)
- Mandatory for consistency
- Controlfile and datafile headers are updated
  . Checkpoint_change#
                 OBJECT CHECKPOINT
   When an object is dropped/truncated, the session initiates an object checkpoint telling DBWR to copy any dirty buffers for that object to disk and the state of those buffers is changed to free.
- Writes block images to the database for all dirty buffers belonging to an object from all instances.
- Statistics updated
  . DBWR checkpoints
  . DBWR object drop buffers written
- Caused by dropping or truncating a segment:
  . Drop table XXX
  . Drop table XXX Purge
  . Truncate table xxx
  . Drop index xxx
- Mandatory for media recovery purposes
- Controlfile and datafile headers are updated
  . Checkpoint_change#
                LOG SWITCH CHECKPOINT
- Writes the contents of the  dirty buffers whose information is protected by a redo log to the database .
- Statistics updated
  . DBWR checkpoints
  . DBWR checkpoint buffers written
  . background checkpoints started
  . background checkpoints completed
- Caused by log switch
 – Controlfile and datafile headers are updated
  . Checkpoint_change#
                  INCREMENTAL CHECKPOINT
   Prior to Oracle 8i, only well known checkpoint was log switch checkpoint. Whenever LGWR filled an online logfile, DBWR would go into a frenzy writing data blocks to disks, and when it had finished, Oracle would update each data file header block with the SCN to show that file was updated up to that point in time.
   Oracle 8i introduced incremental checkpointing which triggered DBWR to write some dirty blocks from time to time so as to advance the checkpoint and reduce the instance recovery time.
Incremental checkpointing has been implemented using two algorithms :
 – Ageing algorithm
- LRU/TCH algorithm
                                               AGEING ALGORITHM
This strategy involves writing changed blocks that have been dirty for the longest time and  is called aging writes. This algorithm relies on the CKPT Q running thru the cache and buffers being linked to the end of this list the first time they are made dirty.
   .The LRU list contains all the buffers – free / pinned / dirty. Whenever a buffer in LRU list is dirtied, it is placed in CKPT Q as well i.e. a buffer can  simultaneously have pointers in both LRU list and CKPT Q but the buffers in CKPT Q are arranged in the order in which they were dirtied.Thus,  checkpoint queue contains dirty blocks in the order of SCN# in which they were dirtied
  Every 3 secs DBWR wakes up and checks if there are those many  dirty buffers in CKPT Q which need to br written so as to satisfy instance recovery requirement..
If those many or more dirty buffers are not found,
   DBWR goes to sleep
else (dirty buffers found)
  .CKPT target RBA is calculated based on
   – The most recent RBA
   – log_checkpoint_interval
   – log_checkpoint_timeout
   – fast_start_mttr_target
   – fast_start_io_target
   – 90% of the size of the smallest redo log file
   . DBWR walks the CKPT Q from the low end (dirtied earliest) of the redo log file collecting buffers for writing to disk until it reaches the buffer that is more recent than the target RBA. These buffers are placed in write list-main.
  . DBWR walks the write list-main and checks all the buffers
    – If changes made to the buffer have already been written to redo log files
        . Move those buffers to write-aux list
      else
        . Trigger LGWR to write changes to those buffers to redo logs
        . Move those buffers to write-aux list
  . Write buffers from write-aux list to disk
  . Update checkpoint RBA in SGA
  . Delink those buffers from CKPT Q
  . Delink those buffers from write-aux list
- Statistics Updated :
   . DBWR checkpoint buffers written
- Controlfile updated every 3 secs by CKPT
   . Checkpoint progress record
   As sessions link buffers to one end of the list, DBWR can effectively unlink buffers from the other end and copy them to disk. To reduce contention between DBWR and foreground sessions, there are two linked lists in each working set so that foreground sessions can link buffers to one while DBWR is unlinking them from the other.
                                                          LRU/TCH ALGORITHM
 LRU/TCH algorithm writes the cold dirty blocks to disk that are on the point of being pushed out of cache.
   As per ageing algorithm, DBWR will wake up every 3 seconds to flush dirty blocks to disk. But if blocks get dirtied at a fast pace during those 3 seconds and a server process needs some free buffers, some buffers need to be flushed to the disk to make room. That’s when LRU/TCH algorithm is used to write those dirty buffers which are on the cold end of the LRU list.
    Whenever a server process needs some free buffers to read data, it scans the LRU list from its cold end to look for free buffers.
While searching
  If unused buffers found
    Read blocks from disk into the buffers and link them to the corresponding hash bucket
   if it finds some clean buffers (contain data but not dirtied or dirtied and have been flushed to disk),
      if they are the candidates to be aged out (low touch count)
          Read blocks from disk into the buffers and link them to the corresponding hash bucket
      else (have been accessed recently and should not be aged out)
         Move them to MRU end depending upon its touch count.
   If it finds dirty buffers (they are already in CKPT Q),
     Delink them from LRU list
     Link them  to the write-main list (Now these buffers are in CKPT Q and write-main list)
   The server process scans a threshold no. of buffers (_db_block_max_scan_pct = 40(default)). If it does not find required no. of free buffers,
    It triggers DBWR to dirty blocks in write-mainlist to disk
 . DBWR walks the write list-main and checks all the buffers
    – If changes made to the buffer have already been written to redo log files
           . Move those buffers to write-aux list
       else
          . Trigger LGWR to write changes to those buffers to redo logs
          . Move those buffers to write-aux list
  . Write buffers from write-aux list to disk
  . Delink those buffers from CKPT Q and w rite-aux list
  . Link those buffers to LRU list as free buffers
 Note that
- In this algorithm, the dirty blocks are delinked from LRU list before linking them to write-main list in contrast to ageing algorithm where the blocks can be simultaneously be in both CKPT Q and LRU list.
 – In this algorithm, checkpoint is not advanced because it may be possible that the dirty blocks on the LRU end may actually not be the ones which were dirtied earliest. They may be there because the server process did not move them to the MRU end earlier. There might be blocks present in CKPT Q which were dirtied earlier than the blocks in question.
I hope the information was useful. Thanks for your time.

Step by Step YugabyteDB 2.11 (Open Source) Distributed DB - Multi-node Cluster Setup on RHEL

Scope - ·        Infrastructure planning and requirement for installation of the multi-node cluster database ·        Prerequisites Software...