Monday, January 01, 2018

ora-00600 related docs for block corruption

Mark some doc ID here for recent incident. A  Happy New Year ahead to all!

ORA-600 [6006] ORA-600 [6856] During Startup Instance, Followed by Termination by SMON (Doc ID 549000.1)

Identify the Corruption Extension for Block Corruption, Table/Index Inconsistency, Data Dictionary and Lost Writes (Doc ID 836658.1)

RMAN - Identify Datafile Block Corruptions

  • To identify both Physical and Logical Block Corruptions use the "CHECK LOGICAL" option. The next command checks the complete database for both corruptions without actually doing a backup:
$ rman target /
RMAN>  backup check logical validate database;
The next command checks the complete database for both corruptions in a backup:
$ rman target /
RMAN> backup check logical database
  • Check the view V$DATABASE_BLOCK_CORRUPTION to identify the block corruptions detected by RMAN. 
  • Use Doc ID 472231.1 (section "Step 2: Identify the corrupt segments") to identify all the Corrupted Objects in the Database reported by RMAN.
  • The above command can use PARALLELISM using multiple channels to make the validation faster.  Reference Doc ID 472231.1 for examples of PARALLELISM.
  • By Default RMAN backups, without the CHECK LOGICAL option, only detect Physical Block Corruptions.


Identify corruption caused by LOST WRITES

A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage.  The result is that the block in the database is a stale/old copy which is not logical or physical corrupt; block internal structures are correct. Reference Doc ID 840978.1 for more information about Physical and Logical block corruption.

A block with lost changes may produce several errors when compared with another context like ORA-600 [kdsgrp1] (table/index inconsistency or invalid chained row pointer), ORA-8103 (old object id), etc; or the next errors during media recovery (like in a physical standby): ORA-600 [3020], ORA-752 (if db_lost_write_protect is enabled).

  • DBV/RMAN are not intended to identify inconsistency caused by LOST Write:
Identifying the corruption extension by lost IO is not straight forward as dbverify/rman run intra-block checks (blocks are not compared with another context). The block itself is healthy as structures are valid (not garbage). However, in very rare cases a block can be indirectly exposed to logical corruption especially in the space management area if there is a lost write. Example is that the block was being marked as full in the metadata but that change was lost. Subsequent inserts may logically corrupt the block.
  • Media RECOVERY / Physical Standby
The best option is to have a media recovery in place like a standby database or restore/recover the database in another system.  Media recovery performs checks to identify if the block content is the one expected as the redo structure keeps track of block previous version (expected scn) and compare it with the current block scn.  If there is a mismatch then ORA-600 [3020] or ORA-752 are produced.




  • EXPORT
Running export may help to identify if a chained row has an invalid pointer which may cause ORA-00600 [25027] or ORA-00600 [kdsgrp1] but export does not identify an old block version or if there is a table/index mismatch.  In UNIX systems it can be done to /dev/null:
exp system/manager full=y log=exp_validation.log file=/dev/null volsize=0



How to identify all the Corrupted Objects in the Database with RMAN (Doc ID 472231.1)


To make it faster,  RMAN can be configured to use PARALLELISM with multiple channels:

RMAN> configure device type disk parallelism 4;
RMAN> backup validate check logical database;

OR

RMAN> run {
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup validate check logical database;
}
Output
V$DATABASE_BLOCK_CORRUPTION is updated with the corrupt blocks.


In 12c the NOLOGGING blocks identified by rman validate are in new view v$nonlogged_block:

 RMAN keeps corruption information in the control file (v$database_block_corruption, v$backup_corruption)


1) validate all database files and archived redo log files for physical and logical corruption:

   BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;



Physical and Logical Block Corruptions. All you wanted to know about it. (Doc ID 840978.1)

  
 ORA-1499. Table/Index row count mismatch (Doc ID 563070.1)

Trial Recovery - Recover database Test (Doc ID 283262.1)


How to use TEST option for any RECOVER command ?

For example, you can start SQL*Plus and then issue any of the following commands:
RECOVER DATABASE TEST
RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL TEST
RECOVER TABLESPACE TEST
RECOVER DATABASE UNTIL CANCEL TEST
 
By default, trial recovery always attempts to corrupt blocks in memory if this action allows trial recovery to proceed. In other words, trial recovery by default can corrupt an unlimited number of data blocks. You can specify the ALLOW n CORRUPTION clause on the RECOVER ... TEST statement to limit the number of data blocks trial recovery can corrupt in memory. For an example
SQL> RECOVER DATABASE TEST ALLOW n CORRUPTION;


-- where n is the number of blocks.
 
 
 RMAN : Block-Level Media Recovery - Concept & Example (Doc ID 144911.1)
 
RMAN> run {BACKUP VALIDATE DATABASE;}
 
SQL> select * from V$backup_corruption;
 
  Alternatively, you can use Data Recovery Advisor (DRA):
RMAN> list failure;
RMAN> repair failure preview;
RMAN> repair failure noprompt;
  
 
1) validate all database files and archived redo log files for physical and logical corruption:

   BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;

2) to check individual data blocks, as shown in the following example:

   VALIDATE DATAFILE 4 BLOCK 10 TO 13;

3) validate backup sets:

   VALIDATE BACKUPSET 3;
 
 
The following RMAN command recovers the corrupted blocks:


1) recover all corrupted blocks reported in v$database_block_corruption

   RMAN> RECOVER CORRUPTION LIST;

2) recover individual blocks, see eg:

   RMAN> RECOVER DATAFILE 1 BLOCK 233, 235 DATAFILE 2 BLOCK 100 TO 200;

ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING - Error explanation and solution (Doc ID 794505.1)
 
 
 The
 "VALIDATE" RMAN command is used to identify NOLOGGING blocks and 
populates the view v$database_block_corruption (versions lower than 12c)
 and v$nonlogged_block (12c and greater). 


In version 12.2 the new command "validate .. nonlogged block" is available to validate NOLOGGING Blocks.  In the next example datafiles 5 and 6 have nologged blocks:
RMAN> validate database nonlogged block;



Monitoring NOLOGGING Operations


The RMAN command "REPORT UNRECOVERABLE" reports when a data file has been changed by a NOLOGGING operation and the datafile has not been backed up since then. Example:
RMAN> report unrecoverable;


In 12c there is the option to use the RMAN command: RECOVER NONLOGGED BLOCK with DATAFILE,TABLESPACE,DATABASE granularity.  An example for DATABASE is:
RMAN> RECOVER DATABASE NONLOGGED BLOCK;
To avoid the problem from being introduced, force logging in the PRIMARY database with:
alter database force logging;
 
 


 DBMS_REPAIR SCRIPT (Doc ID 556733.1)

ORA-1578 ORA-26040 in a LOB segment - Script to solve the errors (Doc ID 293515.1)