Redundancy

I have been thinking about our DR process, and how to improve it. More on that in a later post. However we recently had a couple of planned server room outages, where my previous planning has been beneficial.

Production is redundant and resilient across our server rooms. However, the other environments are not because redundancy is expensive and only really needed for production systems.

So what we do is to have half of the non-production systems at one site. and the other half at the other. This means that we can shut a machine room down, and half the development environments will continue to run.

Rman Fail

Here is another issue we had with RMAN. This one has been bugging me for years.

We were doing a duplicate using rman. For some reason the recovery catalogue didn’t contain the archive log we needed to do the recovery, so the restore completed, and finished with the familiar error:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
...
executing command: SET until clause

Starting recover at 2018-02-02 16:03:13

starting media recovery

unable to find archived log
archived log thread=1 sequence=307
Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: 'system.dbf'

The normal thing to do here is to correct the error, and redo the backup. But we already had the redo logs on the disc, and wanted to be able to apply them.

More Crash Consistent Recovery

Different Scenarios

In my previous post on this topic, I noted that you could use the snapshot time on the recover database command to recover the database from a SAN snapshot.

I realise that there are different possible scenarios and my write up wasn’t clear on which approach is applicable when. Also, the test I did was unrealistic as I used the logs after the snapshot was taken, and the whole point of using SAN snapshots is that they contain everything required for a crash consistent recovery.

Recovery Manager Problems

Lots of people seem to like Oracles Recovery manager. I am not one of them. I think this is because of a lack of understanding on my part of how it works. It is a complex beast, and at the same time has some annoying limitations.

I like to automate things. I have a number of scripts to call RMAN to do backups and restores in common situations. These fail far too often for my liking. I feel I should look into why. Maybe I can learn to love RMAN? We shall see.

High Water Mark

We are running a data conversion and got a wait event I don’t normally see: Oracle enterprise manager showing configuration waits The brown is identified as Configuration. It I drill down, I can see more detail. Oracle enterprise manager showing High Watermark waits Here light purple is HW Contention (i.e. High Watermark Contention). Darker purple is Write Complete waits, and yellow is buffer busy waits.

We have deferred segment creation switched on for the database. This means that the segment needs to be created before data can be written. The high watermark can only be moved by one process at a time.

Exporting Statistics

This was surprisingly more difficult than I expected. We know that we can export stats from the dictionary to a table, and from the table to a file, and that file can be copied and imported to another database for the stats to be imported. Easy right?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Dec 19 08:55:39 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

SQL> exec DBMS_STATS.CREATE_STAT_TABLE('SCHEMA','MYSTATS','USERS');

PL/SQL procedure successfully completed.

SQL> @export_stats
BEGIN
DBMS_STATS.EXPORT_TABLE_STATS ( 'SCHEMA', 'MYTABLE', NULL, stattab => 'MYSTATS');
END;

*
ERROR at line 1:
ORA-20002: Version of statistics table "SCHEMA"."MYSTATS" is too old.  Please
try upgrading it with dbms_stats.upgrade_stat_table
ORA-06512: at "SYS.DBMS_STATS", line 18000
ORA-06512: at line 1

Hang on, I just created it! How can it be too old? Still, I will upgrade it if it insists.

Parsing

We are running a data conversion. The powers that be decided to use APIs to convert the data as they contain error checking. The problem is that they are generally designed for interactive use updating one row at a time, so they are very slow to update large batches of data.

This was tuned and is getting much faster, however we noticed that there are a lot of waits on cursor: pin: S wait on X. From experience I know that this is caused by excessive hard parsing. However, I couldn’t find any statements with a lot of hard parses. The most was about 40, so nothing like the thousands which were a cause of this problem early in the release cycle of 12c, where cursors weren’t being shared properly (Bug 20476175 fixed by the patch of the same number).

Crash Consistent Recovery

Problem

Since Oracle 12c, you can recover a crash consistent snapshot. Oracle support note 604683.1 says how to do this.

We had an issue where the recovery wanted to effectively run to the end of time, and wouldn’t ever finish. No matter how many logs were applied, it said:

1
2
3
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/CS_SR/system/system01.dbf'

Solution

The solution is the snapshot time of the recovery clause. To demonstrate I have a database being recovered from a snapshot, and here are some archived redo logs from the source database.