Tuesday, March 19, 2013

Key points of ADR and Clusterware log housekeeping

Due to space pressure on my test servers, I learned the following key points about housekeeping the various log files in an environment with RAC, Data Guard, and ASM instances.

1. Change the default ADR retention policy.

LONGP_POLICY (long term) defaults to 365 days (8760 hours) and relates to things like incidents and Health Monitor warnings.
SHORTP_POLICY (short term) defaults to 30 days (720 hours) and relates to things like trace and core dump files.

Here are the commands I used in adrci to set the retention policy. The values are in hours, so 360 means 15 days and 720 means 30 days.

 show control;
 set control (SHORTP_POLICY =360);
 set control (LONGP_POLICY =720);
 show control;
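
Because the policy values are hours, it is easy to mistype a number of days. The following is a minimal sketch, not a tested production script, that converts days to hours and prints the matching adrci command line using the exec= option. The function names are my own, and the ADR home used in the call is only an example.

```shell
#!/bin/sh
# Convert a retention period in days to the hours that ADRCI expects.
days_to_hours() {
    echo $(( $1 * 24 ))
}

# Build (but do not run) the adrci command line for one ADR home.
# Arguments: ADR home, short-term days, long-term days.
build_policy_cmd() {
    home=$1
    shortp=$(days_to_hours "$2")
    longp=$(days_to_hours "$3")
    printf 'adrci exec="set home %s; set control (SHORTP_POLICY = %s); set control (LONGP_POLICY = %s)"\n' \
        "$home" "$shortp" "$longp"
}

# Example home; replace it with one listed by "show homes".
build_policy_cmd "diag/asm/+asm/+ASM1" 15 30
```

Printing the command instead of running it leaves room to review it before it touches a real ADR home.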

Note that the files purged based on LONGP_POLICY include the following:
-files in the alert directory
-files in the incident/incdir_ directory and metadata (in the directories metadata, ir, lck) in the incident schema
-files in the sweep directory
-files in the stage directory
-files in the hm directory and metadata in the HM schema

And the files purged based on SHORTP_POLICY are:
-files in the trace directory
-files in the cdump directory
-files in the trace/cdmp_ directories
-files in the incpkg directory and metadata in the IPS schema

2. MMON doesn't clean up ADR log files immediately. Even after I restarted the database and waited for one day, nothing happened.

As per My Oracle Support (Metalink) document 975448.1, the purge happens once every 7 days.

3. The directory from which you start adrci affects which homes you see at the adrci prompt.

e.g.
orarac1poc:DG:/u01/app/oracle> cd "/u01/app/grid"
orarac1poc:DG:/u01/app/grid> adrci

ADRCI: Release 11.2.0.2.0 - Production on Tue Mar 19 14:19:32 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/grid"
adrci> show homes
ADR Homes:
diag/asm/+asm/+ASM1
diag/tnslsnr/orarac1poc/listener_scan1
diag/tnslsnr/orarac1poc/listener
diag/asmtool/user_grid/host_2737638882_80

In my case, I need to change the default retention policy for each home of ASM, Clusterware, and database (single instance, RAC, and Data Guard).

If you go to a lower level, then only one home is shown.

[grid@orarac1poc listener_scan1]$ pwd
/u01/app/grid/diag/tnslsnr/orarac1poc/listener_scan1
[grid@orarac1poc listener_scan1]$ adrci

ADRCI: Release 11.2.0.2.0 - Production on Tue Mar 19 17:38:19 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/grid"
adrci> show homes
ADR Homes:
diag/tnslsnr/orarac1poc/listener_scan1
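
Since the policy has to be set per home, a loop over the "show homes" output saves typing. The following is a rough sketch under a few assumptions: the function names and the DRY_RUN variable are my own, homes are recognised by their leading diag/ as in the listings above, and by default the commands are only printed rather than executed.

```shell
#!/bin/sh
# Extract ADR home paths from "show homes" output read on stdin.
# Keeps only lines that start with "diag/", which drops the header line.
list_adr_homes() {
    grep '^diag/'
}

# Apply the same retention policy (in hours) to every home read on stdin.
# Set DRY_RUN=0 to really call adrci; otherwise the commands are only printed.
apply_policy_to_all_homes() {
    shortp=$1
    longp=$2
    while read -r home; do
        cmd="set home $home; set control (SHORTP_POLICY = $shortp); set control (LONGP_POLICY = $longp)"
        if [ "${DRY_RUN:-1}" = "0" ]; then
            # Redirect stdin so adrci cannot consume the list of homes.
            adrci exec="$cmd" </dev/null
        else
            echo "adrci exec=\"$cmd\""
        fi
    done
}

# Typical use, started from the ADR base directory:
#   adrci exec="show homes" | list_adr_homes | apply_policy_to_all_homes 360 720
```

The same pipeline has to be run once per ADR base (for example as the grid owner and again as the database owner), because each adrci session only sees the homes under its own base.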

4. "Oracle Clusterware logs and RDBMS logs rotation policy [ID 1368695.1]" explains how Clusterware logs and the like are rotated. It is worth reading more than once.

Note that some logs are not managed automatically, especially the /rdbms/audit directory. I had hundreds of thousands of *.aud files there, so I use a cron job to purge them.

find "${GRID_HOME}/rdbms/audit" -type f -name '*.aud' -mtime +15 -exec rm -f {} \;
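
Before letting cron delete files unattended, it helps to see what would be removed. The following is a small sketch, with a function name and argument order of my own choosing, that lists matching files by default and only deletes them when asked to.

```shell
#!/bin/sh
# List (default) or delete regular files that match a name pattern and are
# older than a given number of days.
# Arguments: directory, name pattern, age in days, optional "delete".
purge_old_files() {
    dir=$1
    pattern=$2
    days=$3
    action=${4:-list}

    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }

    if [ "$action" = "delete" ]; then
        # "+" batches many files per rm call, which matters with
        # hundreds of thousands of audit files.
        find "$dir" -type f -name "$pattern" -mtime +"$days" -exec rm -f {} +
    else
        find "$dir" -type f -name "$pattern" -mtime +"$days" -print
    fi
}

# Preview first, then delete:
#   purge_old_files "${GRID_HOME}/rdbms/audit" '*.aud' 15
#   purge_old_files "${GRID_HOME}/rdbms/audit" '*.aud' 15 delete
```

Once the preview looks right, the delete form can be placed in a script that cron runs daily.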


5. The purge command of adrci.

I noticed that some scripts call purge without any arguments. This purges all diagnostic data in the current ADR home based on the default purging policies.
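
purge also accepts an explicit age and content type, which gives more control than relying on the policies. Note that -age is given in minutes, not hours. The following is a minimal sketch with helper names of my own that prints such a command for one home; the home in the call is just an example.

```shell
#!/bin/sh
# "purge -age" takes minutes, so convert days first.
days_to_minutes() {
    echo $(( $1 * 24 * 60 ))
}

# Print the adrci command that purges one content type older than N days
# from one ADR home. Arguments: ADR home, type (e.g. TRACE, ALERT), days.
build_purge_cmd() {
    printf 'adrci exec="set home %s; purge -age %s -type %s"\n' \
        "$1" "$(days_to_minutes "$3")" "$2"
}

# Example home; replace it with one listed by "show homes".
build_purge_cmd "diag/tnslsnr/orarac1poc/listener_scan1" ALERT 15
```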
 
 
Updates on 25-Mar.

One week after the above changes:
1. I noticed that old XML files in alert and trace were purged as expected.
2. But not the XML files of Grid Infrastructure, e.g. the SCAN listener's log, which requires manual housekeeping.

 
/u01/app/11.2.0/grid/log/diag/tnslsnr/orarac1poc/listener_scan1/alert
[grid@orarac1poc alert]$ ls -lt
total 489044
-rw-r----- 1 grid oinstall  6972932 Mar 19 14:46 log.xml
-rw-r----- 1 grid oinstall 10485778 Mar 16 13:13 log_47.xml
-rw-r----- 1 grid oinstall 10485817 Mar 11 22:36 log_46.xml
-rw-r----- 1 grid oinstall 10485801 Mar  7 08:00 log_45.xml
...
-rw-r----- 1 grid oinstall 10485946 Dec 22  2011 log_3.xml
-rw-r----- 1 grid oinstall 10486146 Dec 22  2011 log_2.xml
-rw-r----- 1 grid oinstall 10485989 Dec 22  2011 log_1.xml

As a quick solution, I use the find command to do the job.

#purge XML listener log
find /u01/app/grid/diag/tnslsnr/orarac1poc/listener/alert -type f -name 'log*.xml' -mtime +15 -exec rm -f {} \;

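Personally, I would even like to turn the listener's ADR logging off. In sqlnet.ora the setting is shown below; for the listener itself, Oracle Net documents the per-listener parameter DIAG_ADR_ENABLED_listener_name in listener.ora.
DIAG_ADR_ENABLED = OFF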

Updated on 11-Apr.

After monitoring for a few weeks, I found that the SHORTP/LONGP policies do not purge the ASM alert and trace subdirectories in my 11.2.0.2 environment.
