More Recovery Manager Problems

A nice feature of RMAN is that if a restore fails and you rerun it, it realises that it doesn't have to redo the work it has already done. At least, that is what usually happens. Today we got:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 03/29/2018 12:54:29
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-06004: ORACLE error from recovery catalog database:
RMAN-20003: target database incarnation not found in recovery catalog

As documented in Oracle Support note 2036644.

Redundancy

I have been thinking about our DR process and how to improve it; more on that in a later post. However, we recently had a couple of planned server room outages where my earlier planning paid off. Production is redundant and resilient across our server rooms. The other environments are not, because redundancy is expensive and only really needed for production systems. So what we do is keep half of the non-production systems at one site.

RMAN Fail

Here is another issue we had with RMAN, one that has been bugging me for years. We were running a duplicate with RMAN. For some reason the recovery catalogue didn't contain the archived log we needed for the recovery, so although the restore completed, the duplicate finished with the familiar error:

...
executing command: SET until clause
Starting recover at 2018-02-02 16:03:13
starting media recovery
unable to find archived log
archived log thread=1 sequence=307
Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: 'system.

More Crash Consistent Recovery

Different Scenarios
In my previous post on this topic, I noted that you could use the SNAPSHOT TIME clause of the RECOVER DATABASE command to recover the database from a SAN snapshot. I realise that there are several possible scenarios, and my write-up wasn't clear about which approach applies when. Also, the test I did was unrealistic, as I applied redo logs from after the snapshot was taken, and the whole point of using SAN snapshots is that they contain everything required for a crash consistent recovery.

Recovery Manager Problems

Lots of people seem to like Oracle's Recovery Manager. I am not one of them. I think this is because of my own lack of understanding of how it works. It is a complex beast, and at the same time it has some annoying limitations. I like to automate things, and I have a number of scripts that call RMAN to do backups and restores in common situations. These fail far too often for my liking.

High Water Mark

We are running a data conversion and got a wait event I don't normally see. In the activity chart, the brown area is identified as the Configuration wait class. If I drill down, I can see more detail: light purple is HW contention (i.e. high water mark contention, the enq: HW - contention event), darker purple is write complete waits, and yellow is buffer busy waits. We have deferred segment creation switched on for the database. This means that the segment needs to be created before data can be written.
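A minimal sketch for confirming the same picture from SQL rather than from the chart. It assumes you are licensed for the Diagnostics Pack (querying Active Session History requires it), uses an arbitrary one-hour window, and uses APP_OWNER as a hypothetical stand-in for the schema being converted. It counts ASH samples for the three events mentioned above, then checks whether deferred segment creation is on and which tables still have no segment.

```sql
-- Count recent ASH samples for the events seen in the chart.
-- Requires the Diagnostics Pack licence; the one-hour window is arbitrary.
SELECT event, wait_class, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  event IN ('enq: HW - contention',
                 'write complete waits',
                 'buffer busy waits')
AND    sample_time > SYSTIMESTAMP - INTERVAL '1' HOUR
GROUP  BY event, wait_class
ORDER  BY samples DESC;

-- Is deferred segment creation switched on for this instance?
SELECT value
FROM   v$parameter
WHERE  name = 'deferred_segment_creation';

-- Tables in the (hypothetical) APP_OWNER schema whose segment
-- has not been created yet, i.e. no row has been inserted so far.
SELECT table_name, segment_created
FROM   dba_tables
WHERE  owner = 'APP_OWNER'
AND    segment_created = 'NO';
```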

Exporting Statistics

This was more difficult than I expected. We know that we can export statistics from the data dictionary to a table, export that table to a file, copy the file to another database, import it there, and then import the statistics from the table. Easy, right?

$ sqlplus / as sysdba
SQL*Plus: Release 12.
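For reference, the documented route described above looks roughly like this with the standard DBMS_STATS procedures. It is a sketch only: APP_OWNER, MY_STATS and my_stats.dmp are hypothetical names, the schema is assumed to have the same name on both databases, and this is the textbook path the post starts from, not whatever the full post ends up doing.

```sql
-- On the source database: create a statistics table and copy the
-- schema's dictionary statistics into it (names are hypothetical).
BEGIN
  DBMS_STATS.CREATE_STAT_TABLE(ownname => 'APP_OWNER', stattab => 'MY_STATS');
  DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'APP_OWNER', stattab => 'MY_STATS');
END;
/

-- From the OS shell, write the table to a file with Data Pump, copy the
-- file to the target server and import it there, for example:
--   expdp system directory=DATA_PUMP_DIR dumpfile=my_stats.dmp tables=APP_OWNER.MY_STATS
--   impdp system directory=DATA_PUMP_DIR dumpfile=my_stats.dmp tables=APP_OWNER.MY_STATS

-- On the target database: load the statistics from the table
-- into the data dictionary.
BEGIN
  DBMS_STATS.IMPORT_SCHEMA_STATS(ownname => 'APP_OWNER', stattab => 'MY_STATS');
END;
/
```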

Parsing

We are running a data conversion. The powers that be decided to convert the data through the APIs, as they contain error checking. The problem is that the APIs are generally designed for interactive use, updating one row at a time, so they are very slow when updating large batches of data. The conversion has been tuned and is getting much faster; however, we noticed a lot of waits on cursor: pin S wait on X.
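One generic way to see which statements are behind the cursor: pin S wait on X waits is to group ASH samples by SQL_ID and then compare parse calls with executions for the worst offenders. This is a diagnostic sketch, not the query used in the post; it assumes a Diagnostics Pack licence for ASH, 12c or later for FETCH FIRST, and SQL*Plus for the &sql_id substitution variable.

```sql
-- Which statements wait on cursor: pin S wait on X most often?
SELECT sql_id, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  event = 'cursor: pin S wait on X'
GROUP  BY sql_id
ORDER  BY samples DESC
FETCH FIRST 10 ROWS ONLY;

-- For a statement from the list above, compare parses with executions.
-- parse_calls close to executions suggests the code parses the cursor
-- on every call instead of keeping it open and reusing it.
SELECT sql_id, parse_calls, executions, version_count, sql_text
FROM   v$sqlarea
WHERE  sql_id = '&sql_id';
```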