Friday, May 30, 2008

Merge Join caused Cartesian Product

--Merge Join caused Cartesian Product

1591 RPTOPR CREATE TABLE tmp_aio_acct_tp1 as SELECT distinct(a.customer_id),
1591 RPTOPR decode((c.bld_stat_id),1,'Blacklisted') blist_status, d.bdr_evs
1591 RPTOPR dt start_blist, d.bdr_blist_resn_type_id blist_reason FROM tmp_a
1591 RPTOPR io_acct_tb a, blist_cust b, blist_dtls c, blist_dtls_resn_code d
1591 RPTOPR -- WHERE rtrim(a.customer_id) = rtrim(b.blc_cust_id) and WHERE
1591 RPTOPR rtrim(a.customer_id) = b.blc_cust_id and b.bld_blist_id=c.bld_bl
1591 RPTOPR ist_id and b.bld_blist_id=d.bld_blist_id and d.bdr_evedt is NULL
1591 RPTOPR


--this query couldn't finish even after 10 hours;
--it was expected to take 16 hours to complete.
--notice that the number of executions is extremely high.

SQL ordered by Executions DB/Inst: PRXP/PRXRP Snaps: 581-587
-> Total Executions: 96,444,779
-> Captured SQL account for 100.0% of Total

CPU per Elap per
Executions Rows Processed Rows per Exec Exec (s) Exec (s) SQL Id
------------ --------------- -------------- ---------- ----------- -------------
96,183,306 0 0.0 0.00 0.00 b9ag273wzuhbx
Module: oracleODSP@ods01 (TNS V1-V3)
SELECT "BLC_CUST_ID","BLD_BLIST_ID" FROM "DBO"."BLIST_CUST" "B" WHERE "BLC_CUST_
ID"=:1 AND "BLD_BLIST_ID"=:2


Elapsed CPU Elap per % Total
Time (s) Time (s) Executions Exec (s) DB Time SQL Id
---------- ---------- ------------ ---------- ------- -------------
2,900 2,898 96,183,306 0.0 14.4 b9ag273wzuhbx
Module: oracleODSP@ods01 (TNS V1-V3)
SELECT "BLC_CUST_ID","BLD_BLIST_ID" FROM "DBO"."BLIST_CUST" "B" WHERE "BLC_CUST_
ID"=:1 AND "BLD_BLIST_ID"=:2


SQL ordered by Gets DB/Inst: PRXP/PRXRP Snaps: 581-587
-> Resources reported for PL/SQL code includes the resources used by all SQL
statements called by the code.
-> Total Buffer Gets: 719,230,530
-> Captured SQL account for 99.9% of Total

Gets CPU Elapsed
Buffer Gets Executions per Exec %Total Time (s) Time (s) SQL Id
-------------- ------------ ------------ ------ -------- --------- -------------
319,369,588 96,183,306 3.3 44.4 2897.99 2900.45 b9ag273wzuhbx
Module: oracleODSP@ods01 (TNS V1-V3)
SELECT "BLC_CUST_ID","BLD_BLIST_ID" FROM "DBO"."BLIST_CUST" "B" WHERE "BLC_CUST_ID"=:1 AND "BLD_BLIST_ID"=:2



1591 RPTOPR 1 0 1 LOAD AS SELECT
1591 RPTOPR 2 1 1 SORT
1591 RPTOPR 3 2 1 NESTED LOOPS
1591 RPTOPR 4 3 1 NESTED LOOPS
1591 RPTOPR 5 4 1 MERGE JOIN
1591 RPTOPR 6 5 1 REMOTE BLIST_DTLS_RESN_CODE
1591 RPTOPR 7 5 2 BUFFER
1591 RPTOPR 8 7 1 TABLE ACCESS TMP_AIO_ACCT_TB
1591 RPTOPR 9 4 2 REMOTE BLIST_CUST
1591 RPTOPR 10 3 2 REMOTE BLIST_DTLS


-- the row counts of these two tables explain the high number of executions:
-- 3.2 million rows in BLIST_DTLS_RESN_CODE, of which a quarter meet "d.bdr_evedt is NULL", and 7k rows in TMP_AIO_ACCT_TB
-- 3200k/4 * 7k = 800k * 7k = 5,600 million



SELECT distinct(a.customer_id),
decode((c.bld_stat_id),1,'Blacklisted') blist_status, d.bdr_evsdt start_blist, d.bdr_blist_resn_type_id blist_reason
FROM t1 a, blist_cust b, blist_dtls c, blist_dtls_resn_code d
WHERE rtrim(a.customer_id) = b.blc_cust_id and b.bld_blist_id=c.bld_blist_id and b.bld_blist_id=d.bld_blist_id and d.bdr_evedt is NULL;

SELECT /*+ ordered use_nl(b c d) */
distinct(a.customer_id),
decode((c.bld_stat_id),1,'Blacklisted') blist_status, d.bdr_evsdt start_blist, d.bdr_blist_resn_type_id blist_reason
FROM t1 a, blist_cust b, blist_dtls c, blist_dtls_resn_code d
WHERE rtrim(a.customer_id) = b.blc_cust_id and b.bld_blist_id=c.bld_blist_id
and b.bld_blist_id=d.bld_blist_id and d.bdr_evedt is NULL;

1601 RPTOPR 0 17168 SELECT STATEMENT
1601 RPTOPR 1 0 1 SORT
1601 RPTOPR 2 1 1 NESTED LOOPS
1601 RPTOPR 3 2 1 NESTED LOOPS
1601 RPTOPR 4 3 1 NESTED LOOPS
1601 RPTOPR 5 4 1 TABLE ACCESS T1
1601 RPTOPR 6 4 2 REMOTE BLIST_CUST
1601 RPTOPR 7 3 2 REMOTE BLIST_DTLS
1601 RPTOPR 8 2 2 REMOTE BLIST_DTLS_RESN_CODE



-- finished within 1 minute



--try this hint (bad plan, same as original)
SELECT /*+ use_nl(b c d) */
distinct(a.customer_id),
decode((c.bld_stat_id),1,'Blacklisted') blist_status, d.bdr_evsdt start_blist, d.bdr_blist_resn_type_id blist_reason
FROM t1 a, blist_cust b, blist_dtls c, blist_dtls_resn_code d
WHERE rtrim(a.customer_id) = b.blc_cust_id and b.bld_blist_id=c.bld_blist_id
and b.bld_blist_id=d.bld_blist_id and d.bdr_evedt is NULL;


1601 RPTOPR 0 4506 SELECT STATEMENT
1601 RPTOPR 1 0 1 SORT
1601 RPTOPR 2 1 1 NESTED LOOPS
1601 RPTOPR 3 2 1 NESTED LOOPS
1601 RPTOPR 4 3 1 MERGE JOIN
1601 RPTOPR 5 4 1 REMOTE BLIST_DTLS_RESN_CODE
1601 RPTOPR 6 4 2 BUFFER
1601 RPTOPR 7 6 1 TABLE ACCESS T1
1601 RPTOPR 8 3 2 REMOTE BLIST_CUST
1601 RPTOPR 9 2 2 REMOTE BLIST_DTLS

--tried ORDERED only

SELECT /*+ ordered */
distinct(a.customer_id),
decode((c.bld_stat_id),1,'Blacklisted') blist_status, d.bdr_evsdt start_blist, d.bdr_blist_resn_type_id blist_reason
FROM t1 a, blist_cust b, blist_dtls c, blist_dtls_resn_code d
WHERE rtrim(a.customer_id) = b.blc_cust_id and b.bld_blist_id=c.bld_blist_id
and b.bld_blist_id=d.bld_blist_id and d.bdr_evedt is NULL;

1601 RPTOPR 0 9458 SELECT STATEMENT
1601 RPTOPR 1 0 1 SORT
1601 RPTOPR 2 1 1 HASH JOIN
1601 RPTOPR 3 2 1 REMOTE BLIST_DTLS_RESN_CODE
1601 RPTOPR 4 2 2 HASH JOIN
1601 RPTOPR 5 4 1 TABLE ACCESS T1
1601 RPTOPR 6 4 2 REMOTE


--okay



--I think driving_site should also work, but I had no chance to test it again.
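For the record, a sketch of what that untested DRIVING_SITE variant might look like. The hint itself is standard Oracle; picking remote table b as the driving site is my assumption:

```sql
SELECT /*+ driving_site(b) */
       distinct(a.customer_id),
       decode((c.bld_stat_id),1,'Blacklisted') blist_status,
       d.bdr_evsdt start_blist, d.bdr_blist_resn_type_id blist_reason
FROM   t1 a, blist_cust b, blist_dtls c, blist_dtls_resn_code d
WHERE  rtrim(a.customer_id) = b.blc_cust_id
AND    b.bld_blist_id = c.bld_blist_id
AND    b.bld_blist_id = d.bld_blist_id
AND    d.bdr_evedt is NULL;
-- DRIVING_SITE asks the optimizer to execute the join at the site of the
-- named table, so the three remote BLIST_* tables would be joined on the
-- remote database and only the result shipped back over the link.
```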


The join-operation group of hints controls how joined tables are combined. A join
hint may direct the optimizer to choose the best path for retrieving all rows of a query
(throughput) or for retrieving the first row (response time).
ORDERED tells the optimizer to join the tables in the order they appear
in the FROM clause, using the first table listed as the driving table (accessed first).

--tuned queries from hours to seconds using this method


However, as to the root cause, the optimizer is still a black box to us!

structure of redo.log is exactly the same as its archived log

What happens when you issue the command below in the wrong database?

recover database using backup controlfile until cancel;

This can happen on the Windows platform if you unfortunately have the following settings:

1. The setting below in sqlnet.ora on both the client and the server node:
SQLNET.AUTHENTICATION_SERVICES = (NTS) #allow local login as sysdba
2. A tnsnames.ora entry pointing to the remote database.

Since REMOTE_OS_AUTHENT has no effect on Windows Native Authentication, the same credentials are passed
regardless of this setting.
-- bad control caused the disaster


Hence, after you press Enter, the database will likely be put into recovery mode. (You may not know it immediately ...)

--I think the control file status was changed

Finally you will know ...


When you try to recover it:

--you can't use the real backed-up control file without restoring all backed-up datafiles
--startup mount
--RECOVER database using backup controlfile until cancel; (this control file is the current one, which was just corrupted by the command)


it complains that it cannot find the last archived log file, e.g. TEST10P_001_0627748383_17328.ARC;
actually, that is the CURRENT redo log.


So what I did was copy the redo log to the archive directory and rename it to TEST10P_001_0627748383_17328.ARC,


then tried the command again


ORA-279 signalled during: ALTER DATABASE RECOVER database using backup controlfile until cancel ...
Fri May 23 16:16:58 2008
ALTER DATABASE RECOVER CONTINUE DEFAULT
Fri May 23 16:16:58 2008
Media Recovery Log G:\TESTDB\ORAARCH2\TEST10P\TEST10P_001_0627748383_17328.ARC


RECOVER database using backup controlfile until cancel ;

Cheers! I saw the message: Media Recovered.

Finally, "alter database open resetlogs;"


We are saved!

My heart rate dropped from 200 to 65.



From this case, I guess the file structure of redo.log is exactly the same as its archived log.

DBMS package for tracing

--using dbms_system
execute dbms_system.set_sql_trace_in_session(9,190,TRUE);

--You can also initiate a TRACE for the session that you are in using the DBMS_SESSION package.

execute DBMS_SESSION.SET_SQL_TRACE (TRUE);

--10g

DBMS_MONITOR
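A minimal sketch of the 10g DBMS_MONITOR equivalent; the sid/serial# pair (9, 190) is carried over from the dbms_system example above:

```sql
-- enable SQL trace for another session, with wait events but without binds
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 9, serial_num => 190, waits => TRUE, binds => FALSE);

-- ... reproduce the workload, then switch tracing off
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 9, serial_num => 190);
```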

Sunday, May 25, 2008

White Fox (白狐)

Having crossed the vast sea, no other water compares; except at Wushan, no clouds deserve the name.
Passing through the flowers, I am too lazy to look back; half for the Way I cultivate, half for you.

Friday, May 23, 2008

the after images

Today my boss asked me whether the log file holds the before image or the after image.

I was confused, as the term "after image" ...

I googled it, got the following, and rang my boss back:

Oracle Log Buffer

Oracle creates redo logs for all update operations. In case of disk failure, the redo logs are used to roll forward, since they contain the after image of all row changes.

First, the after images are written to the log buffer area of RAM. The LGWR background process then transfers the images to Oracle online redo log files. In the last step, the ARCH background process writes the online redo log files to the archived redo log file system, where they are available to recover the database in the event of disk failure.

However, some shops avoid this overhead by using triple-mirrored disks and running their databases in NOARCHIVELOG mode. These shops believe the high redundancy is sufficient protection from a disk crash, and they deliberately reject the ability to roll the database forward, in return for faster performance.

Thursday, May 22, 2008

redo log size

Check V$MTTR_TARGET_ADVICE and v$log_history for log-switch frequency,
and the optimal log size, which is influenced by the MTTR target.

The view V$INSTANCE_RECOVERY contains a new column, OPTIMAL_LOGFILE_SIZE,
which recommends a minimum size for the redo log files:
SQL> select optimal_logfile_size from v$instance_recovery;
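A sketch of the v$log_history check mentioned above, counting log switches per hour; the column names are standard, the format string is my choice:

```sql
-- log switches per hour over the retained history
select to_char(first_time, 'YYYY-MM-DD HH24') hour, count(*) switches
from   v$log_history
group  by to_char(first_time, 'YYYY-MM-DD HH24')
order  by 1;
```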

Wednesday, May 21, 2008

Determining the Bad Source of statistics

Basic analysis:
1. Is the rows estimate from the explain plan correct?
2. Is the rows estimate correct in all_[tab|part]_col_statistics?
3. Is the sample size large enough to give accurate row estimates?
4. Obtain accurate stats and recheck the plan.

Monday, May 19, 2008

About Histogram

Histograms generally help with skewed data.
Column-level statistics (including index statistics and histograms).
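A minimal sketch of gathering a histogram on a skewed column with DBMS_STATS. The table and column names (T, STATUS) are hypothetical; SIZE 254 is the maximum bucket count in this release:

```sql
-- gather column-level stats with a histogram on the skewed column
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => user, tabname => 'T', method_opt => 'FOR COLUMNS STATUS SIZE 254', cascade => TRUE);
```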

Saturday, May 17, 2008

Abandon Notepad++

Finally I uninstalled it, after it had been with me for more than 3 years.

The reason is that it is boycotting the Beijing 2008 Olympics. I have no reason to appreciate such a programmer.

There are quite a few replacements, listed here:

--gvim
--Textpad
--Programmer's Notepad

Cheers. I no longer feel disgusted whenever I use my text editor.

block's life of Full Table Scan

When a block needs to be read into the buffer cache, Oracle must first find a free buffer in
which to store the block. The process searches the LRU list, beginning at the least recently used
end. It will search until it finds a free buffer. If it cannot find any, it will signal the DBW0 process
to flush any dirty blocks back to disk in order to make room. If no dirty blocks exist, the least
recently used block will be aged out to make room. The block will then be written to the buffer,
and the buffer moves to the most recently used end of the LRU list.

The exception to this rule is when a full table scan operation is performed. Blocks retrieved
from a full table scan are added to the least recently used end of the LRU list, so they will be aged
out quickly. This is because full table scan blocks are generally scanned quickly and are no longer
needed. This functionality can cause problems for small tables that are accessed frequently, such
as lookup tables. Small tables are meant to be accessed via full table scans. This is because a full
table scan will outperform an index lookup on a very small table.
Because Oracle 10g will put these blocks at the least recently used end of the LRU list, they
will age out quickly. As a result, the next time that the table is accessed, Oracle 10g may likely
have to perform PIO to retrieve the blocks again. In this situation, you can add the CACHE clause
to the table (either via ALTER TABLE or in the CREATE TABLE statements) to circumvent this
behavior. You can also choose to pin the table in memory.
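A minimal sketch of the CACHE clause described above; the table name lookup_tb is hypothetical:

```sql
-- keep full-scanned blocks of a small lookup table at the MRU end of the LRU list
ALTER TABLE lookup_tb CACHE;

-- or specify it at creation time
CREATE TABLE lookup_tb (code NUMBER, descr VARCHAR2(30)) CACHE;
```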

remove external procedures

By default, Oracle creates a service for external procedures in the listener. For the majority
of users, this service is never used. Therefore, the service should be removed from the listener.

Database checkpoint

Database checkpoints are closely tied to redo log file switches. A checkpoint is an event that
flushes the modified data from the buffer cache to disk and updates the control file and datafiles.
The CKPT process updates the headers of the datafiles and control files; the actual blocks are
written to the files by the DBWn process. A checkpoint is initiated:
- When a redo log file fills and a log switch occurs.
- When the instance is shut down with NORMAL, TRANSACTIONAL, or IMMEDIATE.
- When a tablespace status is changed to read-only or put into BACKUP mode.
- When other values specified by certain parameters (discussed later in this section) are reached.
You can force a checkpoint if needed, as shown here:
ALTER SYSTEM CHECKPOINT;
Forcing a checkpoint ensures that all changes to the database buffers are written to the datafiles
on disk.
Another way to force a checkpoint is by forcing a log file switch:
ALTER SYSTEM SWITCH LOGFILE;

Automatic Checkpoint Tuning
===========================
Oracle Database 10g supports automatic checkpoint tuning. It is enabled if
fast_start_mttr_target is explicitly set to a non-zero value, or
if fast_start_mttr_target is not set at all.
It is an advancement over the MTTR related parameter introduced in
earlier versions. The idea is to use the periods of low I/O usage
to advance checkpoints and therefore improve availability.

How it works
============
Enabling fast-start checkpointing increases the average number of writes
per transaction that DBWn issues for a given workload. However, if the
system is not already near or at its maximum I/O capacity, then
fast-start checkpointing has a negligible impact on performance

How to monitor
==============
View V$MTTR_TARGET_ADVICE will give information on the additional I/O's
on various values of FAST_START_MTTR_TARGET.

PROs and CONs:
=============
- FAST_START_MTTR_TARGET is set to a low value :
Fast-start checkpointing is more aggressive.
The average number of writes per transaction that DBWn issues
is higher in order to keep the thread checkpoint sufficiently
advanced to meet the requested MTTR.
- FAST_START_MTTR_TARGET is set to a high value:
Fast-start checkpointing is less aggressive, and the average
number of writes per transaction that DBWn issues is lower.

rows returned order

In heap-organized tables and traditional hash clusters, the order in which
rows are returned is not under user control and depends on internal algorithms and the relative
physical location of data blocks on disk.
For each hash cluster key, Oracle maintains a list of
rows sorted by one or more sort columns.

Thursday, May 15, 2008

object's precedence

There is an order of precedence with regards to the use of synonyms and local objects. This is:

1. Local objects will always be accessed first.

2. If a local object does not exist, the object with a private synonym will be accessed.

3. If a private synonym does not exist or the object does not exist, then the public synonym will be used.

Note that a synonym can be created for an object that does not exist, and an object with an associated synonym can be dropped without removing the synonym. This can cause all sorts of interesting problems for DBAs, so be careful.
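A small sketch of the precedence rules above; the object names are hypothetical, and the statements belong to different schemas rather than one session:

```sql
-- 3rd choice: a public synonym, used only when nothing closer exists
CREATE PUBLIC SYNONYM emp FOR hr.employees;

-- 2nd choice: a private synonym in the querying schema shadows the public one
-- (a private synonym and a table share one namespace, so a schema holds one
-- or the other under a given name; a local table EMP would shadow both)
CREATE SYNONYM emp FOR scott.emp_copy;

SELECT count(*) FROM emp;  -- resolves per the precedence above
```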

Monday, May 12, 2008

Object Namespace

Tables and views share a single namespace, and the database has separate
namespaces for each of the following:
Indexes
Constraints
Clusters
Database triggers
Private database links
Dimensions
Roles
Public synonyms
Public database links
Tablespaces
Profiles
Parameter files (PFILEs)

reconfig DB Console HTTP port number

cat $ORACLE_HOME/install/portlist.ini

Run emca with the arguments below to reconfigure the DB Control and Agent port numbers:

emca -reconfig ports -DBCONTROL_HTTP_PORT 5504 -AGENT_PORT 1834

STARTED EMCA at May 12, 2008 2:08:16 PM
EM Configuration Assistant, Version 10.2.0.1.0 Production
Copyright (c) 2003, 2005, Oracle. All rights reserved.

Enter the following information:
Database SID: MNPP

Do you wish to continue? [yes(Y)/no(N)]: Y
May 12, 2008 2:08:24 PM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at /software/oramnp/product/10.2.0/cfgtoollogs/emca/MNPP/emca_2008-05-12_02-08-16-PM.log.
May 12, 2008 2:08:24 PM oracle.sysman.emcp.util.DBControlUtil stopOMS
INFO: Stopping Database Control (this may take a while) ...
May 12, 2008 2:08:38 PM oracle.sysman.emcp.util.DBControlUtil startOMS
INFO: Starting Database Control (this may take a while) ...
May 12, 2008 2:10:20 PM oracle.sysman.emcp.EMDBPostConfig performReconfiguration
INFO: >>>>>>>>>>> The Database Control URL is http://mnp:5504/em <<<<<<<<<<<
Enterprise Manager configuration completed successfully
FINISHED EMCA at May 12, 2008 2:10:20 PM

example of index suppressed

The conversion function TO_NUMBER() prevents the existing index from being chosen.

Use the TO_CHAR function on the right side instead, i.e.

WHERE CDT_ACCT_NBR = to_char(:1)


SQL> @chk_sqltext

933 DBLN_RPTP SELECT "CDT_ACCT_NBR","CDT_COMPANY_NAME" FROM "DBO"."CUST_DTLS"
933 DBLN_RPTP "B" WHERE TO_NUMBER("CDT_ACCT_NBR")=:1

SQL> @chk_sqlplan

933 DBLN_RPTP 0 3699 SELECT STATEMENT
933 DBLN_RPTP 0 3699 SELECT STATEMENT
933 DBLN_RPTP 1 0 1 TABLE ACCESS CUST_DTLS
933 DBLN_RPTP 1 0 1 TABLE ACCESS CUST_DTLS

SQL> @chk_indcol
Enter value for tbl: CUST_DTLS
old 1: select index_name,column_name,column_position from dba_ind_columns where table_name='&tbl'
new 1: select index_name,column_name,column_position from dba_ind_columns where table_name='CUST_DTLS'

PK_CUST_DTLS
CDT_ACCT_NBR
1
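If the SQL cannot be changed, a function-based index is another option. A hedged sketch: the index name is hypothetical, and it assumes every CDT_ACCT_NBR value is numeric (otherwise TO_NUMBER raises ORA-01722 during the build):

```sql
-- index the converted value so TO_NUMBER("CDT_ACCT_NBR") = :1 can use an index
CREATE INDEX fbi_cust_dtls_acct ON cust_dtls (TO_NUMBER(cdt_acct_nbr));
```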


Sunday, May 11, 2008

ORA-16014: log 2 sequence# 12 not archived, no available destinations

Mon May 12 16:43:32 2008

ORA-16014: log 2 sequence# 12 not archived, no available destinations

ORA-00312: online log 2 thread 1: '/apps/oracle/oradata/OCP10G/redo02.log'

Mon May 12 16:43:32 2008

Errors in file /apps/oracle/admin/OCP10G/bdump/ocp10g_arc0_10273.trc:

ORA-16014: log 2 sequence# 12 not archived, no available destinations

ORA-00312: online log 2 thread 1: '/apps/oracle/oradata/OCP10G/redo02.log'

Mon May 12 16:43:51 2008

Read of flashback database logfile 33, block 685 found corrupted data.

See trace file for more information.

Reread of flashback database logfile 33, block 685 found same corrupted data

Read of flashback database logfile 33, block 685 found corrupted data.

See trace file for more information.
Reread of flashback database logfile 33, block 685 found same corrupted data



The above errors were observed in my test database; I verified that the disk still has enough free space.

-- only 3 groups of redo log;
--the 2nd group looks corrupted and can't be reused.

alter system archive log all to '/home/oracle/';


SQL> alter database clear unarchived logfile group 2;

Database altered.

SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS
---------- ---------- ---------- ---------- ---------- --- ----------------
FIRST_CHANGE# FIRST_TIM
------------- ---------
1 1 18 52428800 1 NO CURRENT
1402127 12-MAY-08

2 1 0 52428800 1 YES UNUSED
1311896 29-APR-08

3 1 17 52428800 1 YES INACTIVE
1401025 12-MAY-08


SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS
---------- ---------- ---------- ---------- ---------- --- ----------------
FIRST_CHANGE# FIRST_TIM
------------- ---------
1 1 21 52428800 1 NO CURRENT
1405111 12-MAY-08

2 1 19 52428800 1 YES ACTIVE
1405094 12-MAY-08

3 1 20 52428800 1 YES ACTIVE
1405099 12-MAY-08


SQL> alter system switch logfile;

System altered.

SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS
---------- ---------- ---------- ---------- ---------- --- ----------------
FIRST_CHANGE# FIRST_TIM
------------- ---------
1 1 27 52428800 1 YES ACTIVE
1405140 12-MAY-08

2 1 28 52428800 1 NO CURRENT
1406092 12-MAY-08

3 1 26 52428800 1 YES INACTIVE
1405138 12-MAY-08



REF: http://www.dbforums.com/archive/index.php/t-1119443.html

http://www.databasedesign-resource.com/oracle-log-files.html



Review: This is a rarely encountered scenario of redo log corruption.

Clear a log file if it has become corrupt and avoid archiving it:

ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2;

Another relevant command: this statement overcomes two situations where dropping redo logs is not possible, namely when there are only two log groups, or when the corrupt redo log file belongs to the current group:

ALTER DATABASE CLEAR LOGFILE GROUP 4;


Clearing Online Redo Log Files
Under certain circumstances, a redo log group member (or all members of a log group) can
become corrupted. To solve this problem, you can drop and re-add the log file group or group
member. It is much easier, however, to use the ALTER DATABASE CLEAR LOGFILE command. The
following example clears the contents of redo log group 3 in the database:
ALTER DATABASE CLEAR LOGFILE GROUP 3;
Another distinct advantage of this command is that you can clear a log group even if the database
has only two log groups and only one member in each group. You can also clear a log
group member even if it has not been archived by using the UNARCHIVED keyword. In this case,
it is advisable to do a full database backup at the earliest convenience, because the unarchived
redo log file is no longer usable for database recovery.

Saturday, May 10, 2008

Useful v$active_session_history

SQL> desc v$active_session_history;
Name Null? Type
----------------------------------------- -------- ----------------------------
SAMPLE_ID NUMBER
SAMPLE_TIME TIMESTAMP(3)
SESSION_ID NUMBER
SESSION_SERIAL# NUMBER
USER_ID NUMBER
SQL_ID VARCHAR2(13)
SQL_CHILD_NUMBER NUMBER
SQL_PLAN_HASH_VALUE NUMBER
FORCE_MATCHING_SIGNATURE NUMBER
SQL_OPCODE NUMBER
SERVICE_HASH NUMBER
SESSION_TYPE VARCHAR2(10)
SESSION_STATE VARCHAR2(7)
QC_SESSION_ID NUMBER
QC_INSTANCE_ID NUMBER
BLOCKING_SESSION NUMBER
BLOCKING_SESSION_STATUS VARCHAR2(11)
BLOCKING_SESSION_SERIAL# NUMBER
EVENT VARCHAR2(64)
EVENT_ID NUMBER
EVENT# NUMBER
SEQ# NUMBER
P1TEXT VARCHAR2(64)
P1 NUMBER
P2TEXT VARCHAR2(64)
P2 NUMBER
P3TEXT VARCHAR2(64)
P3 NUMBER
WAIT_CLASS VARCHAR2(64)
WAIT_CLASS_ID NUMBER
WAIT_TIME NUMBER
TIME_WAITED NUMBER
XID RAW(8)
CURRENT_OBJ# NUMBER
CURRENT_FILE# NUMBER
CURRENT_BLOCK# NUMBER
PROGRAM VARCHAR2(64)
MODULE VARCHAR2(48)
ACTION VARCHAR2(32)
CLIENT_ID VARCHAR2(64)


A good usage is to find out session info (e.g. username, machine, program) based on sql_id:

1. get the sql_id from v$sqlarea
2. join v$session and v$active_session_history to get the relevant info

select username, osuser, module, machine from v$session
where sid in (select session_id from v$active_session_history where sql_id='gf75mx4wpah0s');

-- more useful table should be

DBA_HIST_ACTIVE_SESS_HISTORY

which has user_id already.

SESSION_ID NUMBER Session identifier
SESSION_SERIAL# NUMBER Session serial number (used to uniquely identify a session's objects)
USER_ID NUMBER Oracle user identifier
SQL_ID VARCHAR2(13) SQL identifier of the SQL statement that is currently being executed



A limitation of v$active_session_history is that its retention is bounded by the ASH buffer size.
-- select * from v$sgastat where name like '%ASH%';

Create DB Control

Problem 1: Running EMCA fails to accept the SYSMAN password

EMCA on RAC returns invalid username/password for the SYS, DBSNMP, and SYSMAN users when running "$ ./emca -config dbcontrol db -repos create".

Workaround from Metalink:
When the emca command prompts for the sysman password, provide the password in double quotes.

I think the steps below to re-create DB Control can also fix it.

There are many funny problems with DB Console (if you don't want to buy a license for Grid Control and want to run many instances on the same server; even more complex when they are under different Oracle owner ids, e.g. oracle1, oracle2).

You may face the following problems:
-- port number conflicts created together with dbua
-- database looks hung when re-creating the repository
-- can't see the graphs of active sessions and CPU info on the first page, with the Java error message java.lang.Exception: No Such Target


The way to avoid the above funny problems is to re-create DB Control:

1. Re-config ports if conflicts exist.
-- verify this by checking portlist.ini under $ORACLE_HOME/install
-- type the magic command below; it appears to take care of all ports at the Unix group level (oracle1 and oracle2 belong to the same group, dba).
> emca -reconfig ports

STARTED EMCA at May 9, 2008 10:10:12 AM
EM Configuration Assistant, Version 10.2.0.1.0 Production
Copyright (c) 2003, 2005, Oracle. All rights reserved.

Enter the following information:
Database SID
...
INFO: >>>>>>>>>>> The Database Control URL is http://testdb:5503/em <<<<<<<<<<<

2. Re-create DB Control and its repository:
> export ORACLE_HOSTNAME (for using a virtual hostname)
> emca -deconfig dbcontrol db
> emca -config dbcontrol db -repos create

To indicate the port numbers for the HTTP port and the agent port, use the two parameters below with emca:

  • -DBCONTROL_HTTP_PORT <port_number>

    This port number is used in the Database Control Console URL. For example, if you set this port to 5570, you can then display the Database Control Console using the following URL:

    http://host.domain:5570/em
  • -AGENT_PORT <port_number>

    This port is used by the Database Control Management Agent, which is monitoring and administering the database for the Database Control.


REF:Oracle® Enterprise Manager Advanced Configuration
10g Release 2 (10.2)

My view of the Oracle DBA profession

A career you can stay in for a long time; if you are strong enough, it can carry you to retirement.
I feel it is a job with prospects (and money)!
The key is to have solid, strong skills.
A job thrilling enough,
a job important enough to show your value,
a job full of passion, in which you stay coolly analytical amid the excitement.

Usually a fairly relaxed job, but one that demands continuous learning and improvement;
a job that needs theory and practice in equal measure: whatever idea you come up with must be verified hands-on, working with both hands and brain.
And at the critical moment, you win with a single strike. :-)

Effect of creating missing indexes

From the AWR report of one OLTP database, I identified three missing indexes based on the top buffer gets and physical reads. After eagerly creating them, the outcome was tremendous to me: almost a 100% improvement.

Comparing period: 3pm ~ 6pm of 8-May and 9-May (in seconds)

DB Time (= response time?): 39317s --> 20526s (database became less busy)
Service Time: 23709s --> 10767s (less CPU time)
Wait Time: 15608s --> 9759s (DB Time - Service Time)
Buffer Hit Ratio: 75% --> 89% (some more to tune)



TIP
Tuning the top 25 buffer get and top 25 physical get queries has
yielded system performance gains of anywhere from 5 percent to
5000+ percent in my tuning. The SQL section of the STATSPACK
report tells you which queries to consider tuning first. The top 10 SQL
statements should not be substantially more than 10 percent of your
buffer gets or disk reads.

Thursday, May 08, 2008

large pool for shared server + sga_target

Tue May 6 11:43:54 2008
Errors in file /software/oraprx/admin/PRXP/bdump/prxp_s001_26546.trc:
ORA-00600: internal error code, arguments: [17183], [0xC0000000651B8AF8], [], [], [], [], [], []
ORA-02071: error initializing capabilities for remote database CATP.WORLD
ORA-04031: unable to allocate 27168 bytes of shared memory ("large pool","SELECT ACCOUNT_NO FROM IVOC_...","session heap","oper caps")

Tue May 6 13:10:35 2008
Errors in file /software/oraprx/admin/PRXP/bdump/prxp_s003_26550.trc:
ORA-00600: internal error code, arguments: [17285], [0xC000000065C3DE20], [1], [0xC0000000A78B0EC8], [], [], [], []
ORA-04031: unable to allocate 2520 bytes of shared memory ("large pool","unknown object","session heap","koh-kghu session heap")


- Set SORT_AREA_SIZE, SORT_AREA_RETAINED_SIZE and HASH_AREA_SIZE to much lower
values, say, 64K - 1MB, to resolve this problem. The smaller values for these
parameters will be OK for an OLTP system.
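A sketch of that change, recorded in the spfile so it survives a restart; 1 MB is just the example ceiling from the advice above, not a recommendation for every system:

```sql
-- record the lower work-area values; they apply to sessions after restart
ALTER SYSTEM SET sort_area_size = 1048576 SCOPE=SPFILE;
ALTER SYSTEM SET sort_area_retained_size = 1048576 SCOPE=SPFILE;
ALTER SYSTEM SET hash_area_size = 1048576 SCOPE=SPFILE;
```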

When we use shared servers (MTS), memory for the sort and hash areas is
allocated from the large pool for MTS connections, even if you configure
the automatic PGA feature by setting PGA_AGGREGATE_TARGET and WORKAREA_SIZE_POLICY.

If you set the PGA_AGGREGATE_TARGET parameter to a non-zero value and
WORKAREA_SIZE_POLICY to AUTO, they will be used by dedicated server connections
for sort and hash area memory, but not by MTS connections.

Wednesday, May 07, 2008

Check current sql execution plan and historical plan

get sql_id from v$sqlarea

alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';

--for current execution plan
select operation,object_name,id,parent_id, position,cost,timestamp from v$sql_plan where sql_id='6y9n667q3wgb6';

--for historical plan
select operation,object_name,id,parent_id, position,cost,timestamp from dba_hist_sql_plan where sql_id ='01y6hmraf8kz6' order by timestamp, id,parent_id,position;

--if count(distinct(timestamp)) > 1, you should be alerted that the execution plan has changed.

alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';

select sql_id, count(distinct(timestamp)) from dba_hist_sql_plan where object_owner not in ('SYS','SYSMAN','MGMT_BSLN','DBSNMP') group by sql_id having count(distinct(timestamp))>1;

Import statistics failed due to object out of sync

Elapsed: 00:00:05.57
SQL> EXEC DBMS_STATS.IMPORT_SCHEMA_STATS('DBO','STAT_TABLE',statid=>'run_new',statown=>'DBO');
BEGIN DBMS_STATS.IMPORT_SCHEMA_STATS('DBO','STAT_TABLE',statid=>'run_new',statown=>'DBO'); END;

*
ERROR at line 1:
ORA-20000: partition "CCE_OCT2004" of table "DBO"."CUSTOMER_CONTACT_EVENT" does not exist or insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 6712
ORA-06512: at "SYS.DBMS_STATS", line 6738
ORA-06512: at "SYS.DBMS_STATS", line 7270
ORA-06512: at line 1


Elapsed: 00:00:12.92
SQL>
SQL> spool off

select * FROM dbo.stat_table where c5 = 'DBO' and c2='CCE_OCT2004';


SQL> delete FROM dbo.stat_table where c5 = 'DBO' and c2='CCE_OCT2004';

29 rows deleted.

Elapsed: 00:00:00.05

SQL> EXEC DBMS_STATS.IMPORT_SCHEMA_STATS('DBO','STAT_TABLE',statid=>'run_new',statown=>'DBO');
SQL> commit;

sga_target and its components

Oracle has made it easier to dynamically manage the SGA components: shared pool, large pool, buffer cache, and java pool.

However, based on my bitter experience, it is better to estimate and set a minimal value for them, especially for:
1. the shared pool
2. the large pool, if you are using shared servers.

My recommendation: monitor their historical sizes and set at least 85% of the maximum size as the minimum.

Too small a shared pool may result in execution plans changing.

Too small a large pool may cause ORA-4031 errors related to the large pool (shared server connections).

The two statements below help.
--check the breakdown of SGA usage
set linesize 132 pagesize 5000
col snap_time format a30
select s2.begin_interval_time snap_time,pool sga_component,sum(bytes) bytes from
dba_hist_sgastat s1, dba_hist_snapshot s2
where s1.pool is not null
and s1.snap_id=s2.snap_id
group by s2.begin_interval_time,pool
union
select s2.begin_interval_time snap_time,name sga_component, bytes from dba_hist_sgastat s1, dba_hist_snapshot s2
where name ='buffer_cache'
and s1.snap_id=s2.snap_id
ORDER BY 2,1
/

select component,current_size from v$sga_dynamic_components;

COMPONENT CURRENT_SIZE
---------------------------------------------------------------- ------------
shared pool 671088640
large pool 536870912
java pool 16777216
streams pool 0
DEFAULT buffer cache 1493172224
KEEP buffer cache 0
RECYCLE buffer cache 0
DEFAULT 2K buffer cache 0
DEFAULT 4K buffer cache 0
DEFAULT 8K buffer cache 0
DEFAULT 16K buffer cache 0
DEFAULT 32K buffer cache 0
ASM Buffer Cache 0

13 rows selected.

SQL> alter system set shared_pool_size=750M;

System altered.

SQL> set pages 1000
SQL> select component,current_size from v$sga_dynamic_components;

COMPONENT CURRENT_SIZE
---------------------------------------------------------------- ------------
shared pool 788529152
large pool 536870912
java pool 16777216
streams pool 0
DEFAULT buffer cache 1375731712
KEEP buffer cache 0
RECYCLE buffer cache 0
DEFAULT 2K buffer cache 0
DEFAULT 4K buffer cache 0
DEFAULT 8K buffer cache 0
DEFAULT 16K buffer cache 0
DEFAULT 32K buffer cache 0
ASM Buffer Cache 0

13 rows selected.

Friday, May 02, 2008

RHEL/CentOS5 suspend

Still in the process of recovery my centos 5's disaster. Still feel funny with RHEL &CentOS, as my current hardware setting works fine with ubuntu and suse linux.

Today I did some study and am listing the URLs below. Hope they will help.

http://mhensler.de/swsusp/
http://tuxonice.net/
http://kbase.redhat.com/faq/FAQ_43_7121.shtm
http://www.thinkwiki.org/wiki/Problems_with_ACPI_suspend-to-ram
http://www.thinkwiki.org/wiki/How_to_make_ACPI_work
http://www.thinkwiki.org/wiki/Problem_with_display_remaining_black_after_resume


Please note that Red Hat Support will not answer any questions about this package and cannot officially support it. Red Hat Support is not responsible for any problems that may arise out of using this package. However, these instructions plus the comments in the package should be very helpful to most users.

The package can be downloaded here: http://people.redhat.com/vanhoof/acpi-suspend/RPM/

Execute the command rpm -ivh acpi-sleep-*. It contains the following files, which should be placed in the directories indicated:

/etc/acpi/actions/sleep.sh
/etc/acpi/events/lid.conf
/etc/acpi/events/sleep.conf
/etc/acpi/events/sample.conf

The only file that needs to be modified is /etc/acpi/actions/sleep.sh. Read through the comments in that file to determine which items you need to enable and which you can comment out. It might take some experimentation to get the settings right for your system.

On some systems, it may also be necessary to boot with a special kernel parameter. If the system does not properly suspend to ram after making the above changes (or if it does not properly recover from sleep mode), try adding acpi_sleep=s3_bios to your kernel line in /boot/grub/grub.conf. An example of this change is below:

title Red Hat Enterprise Linux AS (2.6.9-22.0.1.ELsmp)
root (hd0,0)
kernel /vmlinuz-2.6.9-22.0.1.ELsmp ro root=/dev/VolGroup00/LogVol00 rhgb quiet acpi_sleep=s3_bios
initrd /initrd-2.6.9-22.0.1.ELsmp.img

Another factor to consider is the BIOS settings for your system. This
is something that is largely system dependent, making it hard to offer
specific advice. Generally speaking, though, you may want to disable
extraneous features in the BIOS to rule them out as factors in the
suspend-to-ram functionality. Again, experimenting with different
settings may be necessary.

Thursday, May 01, 2008

centos 5 can't suspend

This is the 2nd disaster: I accidentally clicked the suspend icon in the CentOS 5 menu. The machine went to suspend, but when it woke up, the ext3 file system on the partition was damaged.

No choice but to lose the data; even after attaching the disk to another Linux box and running fsck, it could not be fixed.

I had to reinstall it from the beginning.

No wonder, all because TODAY is Labor Day.

Crying ....

VMware-player-2.0.2-59824.exe caused system keeping restart

The symptoms looked like the msblast virus, appearing immediately after I installed the VMware Player version mentioned above.

The error message: lsass.exe terminated with return code 1073741819.

Solution:
Upon logon, you need to finish the following within 1 minute.
1. Open cmd and run "shutdown -a" to abort the pending reboot.
2. Uninstall VMware.
3. The rest you can do after recovering from the shock:
3.1 Run Windows Update.
3.2 Run all the anti-virus/spyware software you have.


BIG SHOCK TO ME ON THIS LABOR DAY!

How to check datatype length

SQL> desc birthdates;
Name Null? Type
----------------------------------------- -------- ----------------------------
CLIENT_ID NUMBER
BIRTHDATE DATE

SQL> select * from birthdates;

CLIENT_ID BIRTHDATE
---------- ---------
1 01-MAY-08
2 01-MAY-08

The DUMP function shows the datatype, length (number of bytes), and the actual byte values
for a particular element.

SQL> select dump(client_id) from birthdates;

DUMP(CLIENT_ID)
--------------------------------------------------------------------------------
Typ=2 Len=2: 193,2
Typ=2 Len=2: 193,3

SQL> select dump(birthdate) from birthdates;

DUMP(BIRTHDATE)
--------------------------------------------------------------------------------
Typ=12 Len=7: 120,108,5,1,14,3,2
Typ=12 Len=7: 120,108,5,1,14,3,30
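The raw bytes can be decoded by hand. A sketch in Python, assuming the documented internal formats (NUMBER stores a base-100 exponent byte followed by base-100 digits, each offset by 1; DATE stores century+100, year+100, month, day, then hour/minute/second each offset by 1). The decode_number sketch handles positive values only; negatives use a different encoding:

```python
from datetime import datetime

def decode_number(raw):
    """Decode a positive Oracle NUMBER from its DUMP() bytes (Typ=2)."""
    exponent = raw[0] - 193                 # base-100 exponent
    value = 0
    for i, digit in enumerate(raw[1:]):
        value += (digit - 1) * 100 ** (exponent - i)
    return value

def decode_date(raw):
    """Decode an Oracle DATE from its DUMP() bytes (Typ=12, Len=7)."""
    century, year, month, day, hour, minute, second = raw
    return datetime((century - 100) * 100 + (year - 100),
                    month, day, hour - 1, minute - 1, second - 1)

print(decode_number([193, 2]))                  # -> 1  (CLIENT_ID above)
print(decode_date([120, 108, 5, 1, 14, 3, 2]))  # -> 2008-05-01 13:02:01
```

So the two DATE rows above differ only in their seconds byte: the rows were inserted 28 seconds apart on 01-MAY-2008.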