2016-09-15 | Adam Boliński

ORA-600 [ktbdchk1: bad dscn] in DG/ADG

A few weeks ago, while working at a client site, I ran into ORA-600 [ktbdchk1: bad dscn] for the first time, and it caused problems for the applications. Last week I hit the same error again, so I decided to dig deeper: reproduce the bug and try to resolve it more successfully than the Oracle patch does.

So let's start investigating this ORA-600. The bug is well known to Oracle as:

Bug 22241601  ORA-600 [kdsgrp1] ORA-1555 / ORA-600 [ktbdchk1: bad dscn] due to Invalid Commit SCN in INDEX block

This bug is related to an invalid ITL commit SCN in INDEX blocks (object_type=INDEX), which causes dependent SCN violations.

There is NO DATA CORRUPTION in the INDEX block.

The symptoms include these errors:

ORA-1555
ORA-600 [2663]
ORA-600 [kdsgrp1]
ORA-600 [ktbdchk1: bad dscn]

The error is raised in a Data Guard configuration when you switch roles between the sites and use Active Data Guard (ADG).

This is the bug we were looking for, but how can we resolve it?

Affects:

Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions >= 11.1 but BELOW 12.2
Versions confirmed as being affected
Platforms affected Generic (all / most platforms affected)

Fixed:

The fix for 22241601 is first included in

As you can see, the fix for this bug will be included in 12.2, but how can we resolve it now? I found Patch 22241601 for the client's RDBMS version, 11.2.0.4.8. What is very interesting is this note in the patch information:

Install the fix to prevent this issue from being introduced; installing the fix also by default tries to repair existent invalid ITL commit scn (healing). There is not need to set any parameter for it. Databases already having init.ora parameter _ktb_debug_flags=8 can remove the parameter after the fix is installed as _ktb_debug_flags=8 is now the default so the healing is enabled by default. Note that this fix is not disabled if _ktb_debug_flags is set to 0; a value not including 8 will only disable the healing for already affected ITL SCN but the fix still solves the problem of the invalid ITL SCN in the INDEX to be introduced as it does not depend of any parameter. Sometimes the fix may not repair the block for an already existent invalid SCN on disk
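The wording "a value not including 8" suggests that the parameter is read as a bitmask. That is my reading of the note, not something Oracle documents, so treat the following as a minimal sketch under that assumption. The helper name healing_enabled is hypothetical.

```python
# Assumption: _ktb_debug_flags is a bitmask and bit value 8 controls healing,
# as the phrase "a value not including 8" in the patch note suggests.
HEALING_BIT = 8


def healing_enabled(ktb_debug_flags: int) -> bool:
    """Return True if the healing bit (8) is present in the flag value."""
    return bool(ktb_debug_flags & HEALING_BIT)


if __name__ == "__main__":
    for value in (0, 4, 8, 12):
        print(value, healing_enabled(value))
```

Under this reading, a value such as 12 would still include the healing bit, while 0 or 4 would not.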

To be precise, _ktb_debug_flags is designed to heal blocks that have invalid dependent SCNs after switchover operations. I know from experience that this patch did not help at a few of my customers, so to investigate, let's try to reproduce the error in my test environment.

The bug shows up in a Data Guard configuration after switchover/failover, so I prepared a Data Guard configuration with 5 TB of data and ran a lot of DML operations on it. After many role switches between standby and production, I finally got it:

ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], [] Use ADRCI or Support Workbench to package the incident.

Let's install patch 22241601 on both production and standby in my test environment and see what happens. A few minutes after the patch installation I got the same error message when I tried to manipulate data in a table with an affected index.

ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], [] Use ADRCI or Support Workbench to package the incident.

The best way to investigate cases like this is to use dbverify. Its output looks as follows:

itl[<itl_id>] has higher commit scn(aaa.bbb) than block scn (xx.yy)
Page <Block#> failed with check code 6056
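With many bad pages across many files it helps to collect these messages automatically. Below is a minimal sketch that scans saved dbverify output for the two line shapes shown above. The sample text is made up for illustration, and I assume the SCN is printed as hexadecimal wrap.base, so check that against your own dbv logs before relying on it.

```python
import re

# Patterns follow the two message shapes quoted above.
ITL_RE = re.compile(
    r"itl\[(?P<itl>\d+)\] has higher commit scn\s*\((?P<commit>[^)]+)\)"
    r" than block scn\s*\((?P<block>[^)]+)\)"
)
PAGE_RE = re.compile(r"Page (?P<page>\d+) failed with check code (?P<code>\d+)")


def scn_to_int(text):
    """Convert 'wrap.base' to one integer (assumes both parts are hex)."""
    wrap, base = text.strip().split(".")
    return (int(wrap, 16) << 32) | int(base, 16)


def find_bad_pages(dbv_output):
    """Return one dict per ITL complaint on a page that failed with code 6056."""
    findings = []
    pending = []  # ITL complaints seen since the last "Page ..." line
    for line in dbv_output.splitlines():
        m = ITL_RE.search(line)
        if m:
            pending.append(
                (int(m["itl"]), scn_to_int(m["commit"]), scn_to_int(m["block"]))
            )
            continue
        m = PAGE_RE.search(line)
        if m:
            if m["code"] == "6056":
                for itl, commit, block in pending:
                    findings.append(
                        {
                            "page": int(m["page"]),
                            "itl": itl,
                            "commit_scn": commit,
                            "block_scn": block,
                        }
                    )
            pending = []
    return findings


# Made-up sample, shaped like the template above.
SAMPLE = """\
itl[2] has higher commit scn(0x0001.4ebde1ea) than block scn (0x0001.4ebde17d)
Page 66412 failed with check code 6056
"""

if __name__ == "__main__":
    for finding in find_bad_pages(SAMPLE):
        print(finding)
```

The page numbers collected this way are the block numbers to map back to index segments.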

We take the hex block ID of the corrupted block and translate it to an object ID, which points to an index object.
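The translation can be sketched as follows, assuming the hex value is a relative data block address (RDBA) in a smallfile tablespace, where the upper 10 bits hold the relative file number and the lower 22 bits hold the block number. The function name decode_rdba and the example value are only illustrative, and the query text is the usual DBA_EXTENTS lookup, not necessarily the exact one I ran.

```python
def decode_rdba(rdba):
    """Split an RDBA into (relative file#, block#).

    Assumes a smallfile tablespace: 10 bits of file#, 22 bits of block#.
    Accepts an int or a hex string such as '0x01000083'.
    """
    if isinstance(rdba, str):
        rdba = int(rdba, 16)
    return rdba >> 22, rdba & 0x3FFFFF


# Typical lookup of the owning segment, with the decoded values bound
# to :fno and :blk. It can be slow on large databases.
SEGMENT_LOOKUP_SQL = """
SELECT owner, segment_name, segment_type
  FROM dba_extents
 WHERE relative_fno = :fno
   AND :blk BETWEEN block_id AND block_id + blocks - 1
"""

if __name__ == "__main__":
    fno, blk = decode_rdba("0x01000083")  # illustrative value only
    print(fno, blk)
```

The same file and block numbers can be passed to DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK inside the database if you prefer to decode there.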

In my 5 TB test I got a lot of bad pages in many files, so I decided, as MOS advises, to rebuild the affected index objects. After recreating 8 indexes I discovered that not all of them were in a correct state: 3 of them (very large indexes, with little free space in the data files) showed the same bug as before. I assume the recreated objects were written into the corrupted blocks again, but I did not trace this behavior any deeper. Of course this can be resolved by recreating the objects in a new tablespace, but sometimes clients do not have that much free space on their hardware.

My last idea was to see whether the error could be reproduced with db_block_checking set to FULL. I went through the time-consuming reproduction of the bug again and repeated it 5 times, and I must say that I did not hit this bug with db_block_checking set to FULL.

I will investigate this bug in more detail in the near future, but let me know if my tests are helpful to anyone facing the same bug.