Thursday, January 24, 2013

Amazing Automatic Block Media Recovery by Active Data Guard



Oracle Database 11g Release 2 now has the capability to automatically repair corrupt data blocks in your production database as soon as the corruption is detected by using your Active Data Guard standby database to retrieve good copies of the corrupted blocks. Automatic Block Media Recovery will also automatically repair corrupted blocks that are discovered in your physical standby databases. This feature reduces the amount of time that data is inaccessible due to block corruption and will avoid returning errors to your application.

This reduces block recovery time by using up-to-date good blocks in real-time, as opposed to retrieving blocks from disk or tape backups, or from Flashback logs. The process is as follows:

If a corrupt data block is discovered on a Primary database.


If you have a physical standby database open read only with Active Data Guard, then the standby repairs corrupt data blocks in the primary database automatically. As result, any corrupt data block encountered when a primary database is accessed is automatically replaced with an uncorrupted copy of that block from the Active Data Guard physical standby database.



Below steps are my test.


1. Found a block to be to be corrupted.


SYS@DG> conn liqy/liqyliqy

Connected.

LIQY@DG> select * from t1;


F1

----------

1


SYS@DG> select segment_name , header_file , header_block

2 from dba_segments

3 where segment_name = 'T1'

4 and owner = 'LIQY';


SEGMENT_NAME HEADER_FILE HEADER_BLOCK

----------------------------------------------------------- ----------- ------------

T1 4 130


The header of the table is in block 130, so the only row should be in block 131, and we will get an error when we try to select data from the corrupted block.


SYS@DG> show parameter block


NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

db_block_buffers integer 0

db_block_checking string FALSE

db_block_checksum string TYPICAL

db_block_size integer 8192

db_file_multiblock_read_count integer 118


2.To show how this feature works we need to close our physical standby so that Automatic Block Media Recovery cannot happen , and we do see the error because it cannot automatically recover the block.


In standby node, ensure data is fine before our corruption.

orarac2poc:DGS:/home/oracle> sqlplus / as sysdba


SQL*Plus: Release 11.2.0.2.0 Production on Thu Jan 24 14:56:51 2013


Copyright (c) 1982, 2010, Oracle. All rights reserved.



Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options


SYS@DGS> conn liqy/liqyliqy

Connected.

LIQY@DGS> select * from t1;


F1

----------

1


orarac2poc:DGS:/home/oracle> dgmgrl

DGMGRL for Linux: Version 11.2.0.2.0 - 64bit Production


Copyright (c) 2000, 2009, Oracle. All rights reserved.


Welcome to DGMGRL, type "help" for information.

DGMGRL> connect sys/oracle123

Connected.

DGMGRL> shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

DGMGRL> startup

ORACLE instance started.

Database mounted.

Database opened.

DGMGRL> exit




3. Corrupt block 131 using the "dd" command in Linux.


orarac1poc:DG:/u01/oradata/DG> ls -lrt

total 2896464

...

-rw-r----- 1 oracle asmadmin 5251072 Jan 24 10:13 users01.dbf


orarac1poc:DG:/u01/oradata/DG> date

Thu Jan 24 14:53:26 SGT 2013

orarac1poc:DG:/u01/oradata/DG> cp -p users01.dbf users01.dbf.201301241454

orarac1poc:DG:/u01/oradata/DG> dd of=users01.dbf bs=8192 conv=notrunc seek=131 <> make it corrupted.
> EOF
0+1 records in
0+1 records out
19 bytes (19 B) copied, 7.4e-05 seconds, 257 kB/s
orarac1poc:DG:/u01/oradata/DG> ls -l users01.dbf
-rw-r----- 1 oracle asmadmin 5251072 Jan 24 15:01 users01.dbf

4. In primary node, access the table again.

orarac1poc:DG:/u01/oradata/DG> sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Thu Jan 24 15:01:28 2013

Copyright (c) 1982, 2010, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SYS@DG> conn liqy/liqyliqy
Connected.
LIQY@DG> select * from t1;

F1
----------
1

LIQY@DG> exit

#still accessible, looks we forget one thing.
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
orarac1poc:DG:/u01/oradata/DG> sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Thu Jan 24 15:01:54 2013

Copyright (c) 1982, 2010, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SYS@DG> alter system flush buffer_cache;

System altered.

-- in alert log
Thu Jan 24 15:02:24 2013
ALTER SYSTEM: Flushing buffer cache


SYS@DG> select * from liqy.t1;
select * from liqy.t1
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 131)
ORA-01110: data file 4: '/u01/oradata/DG/users01.dbf'


-- information in alert.log

Thu Jan 24 15:02:26 2013


***********************************************************************

Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orarac2poc)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dgs_DGB.POC)(CID=(PROGRAM=oracle)(HOST=orarac1poc)(USER=oracle))))

VERSION INFORMATION:
TNS for Linux: Version 11.2.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
Time: 24-JAN-2013 15:02:26
Tracing not turned on.
Tns error struct:
ns main err code: 12564

TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0


***********************************************************************

Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orarac2poc)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dgs_DGB.POC)(CID=(PROGRAM=oracle)(HOST=orarac1poc)(USER=oracle))))

VERSION INFORMATION:
TNS for Linux: Version 11.2.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
Time: 24-JAN-2013 15:02:26
Tracing not turned on.
Tns error struct:
ns main err code: 12564

TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0


***********************************************************************

Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orarac2poc)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dgs_DGB.POC)(CID=(PROGRAM=oracle)(HOST=orarac1poc)(USER=oracle))))

VERSION INFORMATION:
TNS for Linux: Version 11.2.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
Time: 24-JAN-2013 15:02:26
Tracing not turned on.
Tns error struct:
ns main err code: 12564

TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0


***********************************************************************

Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orarac2poc)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=DGS.POC)(CID=(PROGRAM=oracle)(HOST=orarac1poc)(USER=oracle))))

VERSION INFORMATION:
TNS for Linux: Version 11.2.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
Time: 24-JAN-2013 15:02:26
Tracing not turned on.
Tns error struct:
ns main err code: 12564

TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0


***********************************************************************

Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orarac2poc)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=DGS.POC)(CID=(PROGRAM=oracle)(HOST=orarac1poc)(USER=oracle))))

VERSION INFORMATION:
TNS for Linux: Version 11.2.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
Time: 24-JAN-2013 15:02:26
Tracing not turned on.
Tns error struct:
ns main err code: 12564

TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0


***********************************************************************

Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orarac2poc)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=DGS.POC)(CID=(PROGRAM=oracle)(HOST=orarac1poc)(USER=oracle))))

VERSION INFORMATION:
TNS for Linux: Version 11.2.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
Time: 24-JAN-2013 15:02:26
Tracing not turned on.
Tns error struct:
ns main err code: 12564

TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Hex dump of (file 4, block 131) in trace file /u01/app/oracle/diag/rdbms/dg/DG/trace/DG_ora_30281.trc
Corrupt block relative dba: 0x01000083 (file 4, block 131)
Bad header found during multiblock buffer read
Data in bad block:
type: 109 format: 1 rdba: 0x20746920
last change scn: 0x7075.72726f63 seq: 0x74 flg: 0x65
spare1: 0x6b spare2: 0x65 spare3: 0xa
consistency value in tail: 0x381f0601
check value in block header: 0x2e64
computed block checksum: 0x6675
Reading datafile '/u01/oradata/DG/users01.dbf' for corruption at rdba: 0x01000083 (file 4, block 131)
Reread (file 4, block 131) found same corrupt data (no logical check)
Starting background process ABMR
Thu Jan 24 15:02:31 2013
ABMR started with pid=28, OS id=30316
Automatic block media recovery service is active.
Automatic block media recovery requested for (file# 4, block# 131)
Thu Jan 24 15:02:31 2013


***********************************************************************

Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orarac2poc)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=DGS.POC)(CID=(PROGRAM=oracle)(HOST=orarac1poc)(USER=oracle))))

VERSION INFORMATION:
TNS for Linux: Version 11.2.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
Time: 24-JAN-2013 15:02:31
Tracing not turned on.
Tns error struct:
ns main err code: 12564

TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Errors in file /u01/app/oracle/diag/rdbms/dg/DG/trace/DG_bmr0_30318.trc:
ORA-17627: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-17629: Cannot connect to the remote database server <
Automatic block media recovery failed for (file# 4, block# 131)
[No standby available]
Thu Jan 24 15:02:32 2013
Corrupt Block Found
TSN = 4, TSNAME = USERS
RFN = 4, BLK = 131, RDBA = 16777347
OBJN = 61354, OBJD = 61354, OBJECT = T1, SUBOBJECT =
SEGMENT OWNER = LIQY, SEGMENT TYPE = Table Segment
Errors in file /u01/app/oracle/diag/rdbms/dg/DG/trace/DG_ora_30281.trc (incident=61473):
ORA-01578: ORACLE data block corrupted (file # 4, block # 131)
ORA-01110: data file 4: '/u01/oradata/DG/users01.dbf'
Incident details in: /u01/app/oracle/diag/rdbms/dg/DG/incident/incdir_61473/DG_ora_30281_i61473.trc
Thu Jan 24 15:02:36 2013
Dumping diagnostic data in directory=[cdmp_20130124150236], requested by (instance=1, osid=30281), summary=[incident=61473].
Thu Jan 24 15:02:36 2013
Sweep [inc][61473]: completed
Hex dump of (file 4, block 131) in trace file /u01/app/oracle/diag/rdbms/dg/DG/incident/incdir_61473/DG_m000_30326_i61473_a.trc
Corrupt block relative dba: 0x01000083 (file 4, block 131)
Bad header found during validation
Data in bad block:
type: 109 format: 1 rdba: 0x20746920
last change scn: 0x7075.72726f63 seq: 0x74 flg: 0x65
spare1: 0x6b spare2: 0x65 spare3: 0xa
consistency value in tail: 0x381f0601
check value in block header: 0x2e64
computed block checksum: 0x6675
Reread of blocknum=131, file=/u01/oradata/DG/users01.dbf. found same corrupt data
Reread of blocknum=131, file=/u01/oradata/DG/users01.dbf. found same corrupt data
Reread of blocknum=131, file=/u01/oradata/DG/users01.dbf. found same corrupt data
Reread of blocknum=131, file=/u01/oradata/DG/users01.dbf. found same corrupt data
Reread of blocknum=131, file=/u01/oradata/DG/users01.dbf. found same corrupt data
Checker run found 1 new persistent data failures
Thu Jan 24 15:02:37 2013
Sweep [inc2][61473]: completed


5. Now let us start active physical standby and access the table to see what happens.

DGMGRL> startup
ORACLE instance started.
Database mounted.
Database opened.
DGMGRL> exit
orarac2poc:DGS:/home/oracle> sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Thu Jan 24 15:03:44 2013

Copyright (c) 1982, 2010, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SYS@DGS> select * from liqy.t1;

F1
----------
1

-- in standby alert log, no info related ABMR.


Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE

Media Recovery Log /u01/oradata/DG/FRA/DGS/archivelog/2013_01_24/o1_mf_1_52_8j1q5vpm_.arc

Media Recovery Waiting for thread 1 sequence 53 (in transit)

Recovery of Online Redo Log: Thread 1 Group 5 Seq 53 Reading mem 0

Mem# 0: /u01/oradata/DG/stbredo05.rdo



6. but in physical database now.


SYS@DG> select * from liqy.t1;


F1

----------

1


-- in primary db's alert log.


******************************************************************

LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2

******************************************************************

Thu Jan 24 15:03:23 2013

ARCd: Standby redo logfile selected for thread 1 sequence 52 for destination LOG_ARCHIVE_DEST_2

LNS: Standby redo logfile selected for thread 1 sequence 53 for destination LOG_ARCHIVE_DEST_2

Thu Jan 24 15:04:14 2013
Hex dump of (file 4, block 131) in trace file /u01/app/oracle/diag/rdbms/dg/DG/trace/DG_ora_30281.trc
Corrupt block relative dba: 0x01000083 (file 4, block 131)
Bad header found during multiblock buffer read

Data in bad block:

type: 109 format: 1 rdba: 0x20746920

last change scn: 0x7075.72726f63 seq: 0x74 flg: 0x65

spare1: 0x6b spare2: 0x65 spare3: 0xa

consistency value in tail: 0x381f0601

check value in block header: 0x2e64

computed block checksum: 0x6675

Reading datafile '/u01/oradata/DG/users01.dbf' for corruption at rdba: 0x01000083 (file 4, block 131)
Reread (file 4, block 131) found same corrupt data (no logical check)
Automatic block media recovery requested for (file# 4, block# 131)
Thu Jan 24 15:04:16 2013
Automatic block media recovery successful for (file# 4, block# 131)
Automatic block media recovery successful for (file# 4, block# 131)
WARNING: AutoBMR fixed mismatched on-disk block 20746920 with in-mem rdba 1000083.



Amazing!!! That is only what I can say. It is very help for media recovery!



References:


Fix physically corrupt data blocks using PLSQL - DBMS_REPAIR

Oracle's "Active Data Guard Hands On Lab.pdf"