How I Test My Backups
I like to test my backups. It helps me sleep to know I could get my data back if
the worst happened and it was scrambled by ransomware, or a bug in our code.
My sleep was rendered less peaceful when the restores suddenly started failing
for no reason that I could understand. We use RMAN to backup and restore
the data, and the script
is fairly simple - it effectively says to restore the database as it was at
noon yesterday. Something like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
|
export NB_ORA_CLIENT=proddbserver
sqlplus / as sysdba <<-!
shutdown abort
create spfile from pfile;
startup nomount
!
rman <<-!
connect auxiliary /
connect catalog catuser@CATDB
run {
allocate auxiliary channel d1 type 'SBT_TAPE';
allocate auxiliary channel d2 type 'SBT_TAPE';
allocate auxiliary channel d3 type 'SBT_TAPE';
duplicate database 'PRODDB' to 'RESTDB' until
time "to_date('2024-05-09 12:00','YYYY-MM-DD HH24:MI')";
}
!
|
Assuming that today is the 10th May, yesterday would have been the 9th. There is a
slight complication in that I have another database on this server called PRODDB,
so I have to alter the path names, but its relatively simple to set the relevant
parameters in the pfile:
1
2
|
*.DB_FILE_NAME_CONVERT = 'PRODDB','RESTDB'
*.LOG_FILE_NAME_CONVERT = 'PRODDB','RESTDB'
|
What Went Wrong
This worked
fine for years, but suddenly it started erroring as follows. Sorry, this is rather
a lot of output, it restores the control files OK, but then falls over because it
doesn’t know what to call the datafiles. Scroll down for more…
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
|
Recovery Manager: Release 19.0.0.0.0
- Production on Wed May 8 12:03:06 2024 Version 19.23.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
connected to recovery catalog database
PL/SQL package CATUSER.DBMS_RCVCAT version 19.21.00.00.
in RCVCAT database is not current
PL/SQL package CATUSER.DBMS_RCVMAN version 19.21.00.00
in RCVCAT database is not current
connected to auxiliary database (not started)
RMAN>
Oracle instance started
Total System Global Area 10737417432 bytes
Fixed Size 13683928 bytes
Variable Size 5301600256 bytes
Database Buffers 5402263552 bytes
Redo Buffers 19869696 bytes
RMAN> 2> 3> 4> 5> 6> 7>
allocated channel: d1
channel d1: SID=849 device type=SBT_TAPE channel d1:
Veritas NetBackup for Oracle - Release 8.3.0.1 (2020081919)
allocated channel: d2
channel d2: SID=970 device type=SBT_TAPE channel d2:
Veritas NetBackup for Oracle - Release 8.3.0.1 (2020081919)
allocated channel: d3
channel d3: SID=1091 device type=SBT_TAPE channel d3:
Veritas NetBackup for Oracle - Release 8.3.0.1 (2020081919)
Starting Duplicate Db at 2024-May-08 12:03:26
contents of Memory Script:
{
set until scn 41106151363386;
sql clone "alter system set db_name = ''PRODDB''
comment= ''Modified by RMAN duplicate'' scope=spfile";
sql clone "alter system set db_unique_name = ''RESTDB''
comment= ''Modified by RMAN duplicate'' scope=spfile";
shutdown clone immediate;
startup clone force nomount
restore clone primary controlfile;
alter clone database mount;
}
executing Memory Script
executing command: SET until clause
sql statement: alter system set db_name = ''PRODDB''
comment= ''Modified by RMAN duplicate'' scope=spfile
sql statement: alter system set db_unique_name = ''RESTDB''
comment= ''Modified by RMAN duplicate'' scope=spfile
Oracle instance shut down
Oracle instance started
Total System Global Area 10737417432 bytes
Fixed Size 13683928 bytes
Variable Size 5301600256 bytes
Database Buffers 5402263552 bytes
Redo Buffers 19869696 bytes
allocated channel: d1
channel d1: SID=849 device type=SBT_TAPE channel d1:
Veritas NetBackup for Oracle - Release 8.3.0.1 (2020081919)
allocated channel: d2
channel d2: SID=970 device type=SBT_TAPE channel d2:
Veritas NetBackup for Oracle - Release 8.3.0.1 (2020081919)
allocated channel: d3
channel d3: SID=1091 device type=SBT_TAPE channel d3:
Veritas NetBackup for Oracle - Release 8.3.0.1 (2020081919)
Starting restore at 2024-May-08 12:04:01
channel d1: starting datafile backup set restore channel d1:
restoring control file
channel d1: reading from backup piece c-799314582-20240507-01
channel d1: piece handle=c-799314582-20240507-01 tag=TAG20240507T100515
channel d1: restored backup piece 1
channel d1: restore complete, elapsed time: 00:00:15
output file name=/RESTDB/control01/control1.ctl
output file name=/RESTDB/control02/control2.ctl
output file name=/RESTDB/control03/control3.ctl
output file name=/RESTDB/control04/control4.ctl
Finished restore at 2024-May-08 12:04:17
database mounted
Oracle instance started
Total System Global Area 10737417432 bytes
Fixed Size 13683928 bytes
Variable Size 5301600256 bytes
Database Buffers 5402263552 bytes
Redo Buffers 19869696 bytes
contents of Memory Script:
{
sql clone "alter system set db_name = ''RESTDB''
comment= ''Reset to original value by RMAN'' scope=spfile";
sql clone "alter system reset db_unique_name scope=spfile";
shutdown clone immediate;
}
executing Memory Script
sql statement: alter system set db_name = ''RESTDB''
comment= ''Reset to original value by RMAN'' scope=spfile
sql statement: alter system reset db_unique_name scope=spfile
Oracle instance shut down
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 05/08/2024 12:05:02
RMAN-05501: aborting duplication of target database
RMAN-05624: data file name not found in the repository
for data file number=222
RMAN-05624: data file name not found in the repository
for data file number=221
RMAN-05624: data file name not found in the repository
for data file number=220
|
Sorry that was so long! OK, so RMAN has gone from being
able to take the file names of the production database server to not being
able to, and for no reason.
Oracle Service Request
We have Oracle Support! They will be able to help! I raised a service
request with
Oracle, and they pointed me at
Doc ID 2704529.1,
which basically says the script I have successfully been running
for years is impossible. You have to specify a
set newname
command in the run block. So my RMAN command becomes:
1
2
3
4
5
6
7
8
9
10
11
12
13
|
rman <<-!
connect auxiliary /
connect catalog catuser@CATDB
run {
allocate auxiliary channel d1 type 'SBT_TAPE';
allocate auxiliary channel d2 type 'SBT_TAPE';
allocate auxiliary channel d3 type 'SBT_TAPE';
set newname for database to '/RESTDB/data/%f_%b';
duplicate database 'PRODDB' to 'RESTDB' until
time "to_date('2024-05-09 12:00','YYYY-MM-DD HH24:MI')";
}
!
|
The problem with this is that it doesn’t work! I will spare you the
entire output this time, after restoring the controlfile it errors
again.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
|
RMAN-08161: contents of Memory Script:
{
sql clone "alter system set db_name =
''CS_HESA'' comment=
''Reset to original value by RMAN'' scope=spfile";
sql clone "alter system reset db_unique_name scope=spfile";
shutdown clone immediate;
}
RMAN-08162: executing Memory Script
RMAN-06162: sql statement: alter system set db_name = ''CS_HESA''
comment= ''Reset to original value by RMAN'' scope=spfile
RMAN-06162: sql statement: alter system reset db_unique_name scope=spfile
RMAN-06402: Oracle instance shut down
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 05/09/2024 14:04:24
RMAN-05501: aborting duplication of target database
RMAN-06136: Oracle error from auxiliary database:
ORA-19715: invalid format b for generated name
ORA-27302: failure occurred at: slgpn
|
I can restore with just the file number using set newname '/RESTDB/data/%f'
,
and that does restore, but all my database files
end up in the same directory, and they are all named with numbers.
So they don’t have my lovely naming convention where I group them
into folders and name them based on the table space name.
I like being able to restore the database. This is great! But I
also like seeing what my files are for!
The Real Problem - And The Solution
Looking at the RMAN repository I can see that there is some weirdness:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
|
$ rman target /
Recovery Manager: Release 19.0.0.0.0
- Production on Thu May 9 17:33:13 2024
Version 19.23.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
connected to target database: PRODDB (DBID=799314582)
RMAN> connect catalog catuser@CATDB
recovery catalog database Password:
connected to recovery catalog database
PL/SQL package CATUSER.DBMS_RCVCAT version 19.21.00.00.
in RCVCAT database is not current
PL/SQL package CATUSER.DBMS_RCVMAN version 19.21.00.00
in RCVCAT database is not current
RMAN> list db_unique_name of database;
List of Databases
DB Key DB Name DB ID Database Role Db_unique_name
------- ------- ----------------- --------------- ------------------
56747359 PRODDB 799314582 PRIMARY PRODDB_PRODHOST
56747359 PRODDB 799314582 STANDBY RESTDB
|
Well, this is weird, my test restore database has been registered with the same DBID as
production. RMAN seems to think it is a standby, which it is not. How did this happen?
No idea. Let’s try unregistering it:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
RMAN> unregister DB_UNIQUE_NAME RESTDB;
database db_unique_name is "RESTDB", db_name is "PRODDB" and DBID is 799314582
Want to unregister the database with target db_unique_name (enter YES or NO)? YES
database with db_unique_name RESTDB unregistered from the recovery catalog
RMAN> list db_unique_name of database;
List of Databases
DB Key DB Name DB ID Database Role Db_unique_name
------- ------- ----------------- --------------- ------------------
56747359 PRODDB 799314582 PRIMARY PRODDB_PRODHOST
RMAN>
|
That looks much better. And a test restore works fine. I have not been able to
find this documented anywhere. There is nothing in the Oracle Support knowledge
base. So I thought I would put this out there in the hope it helps someone.
Especially since that someone might be me in the future!