Managing Windows with Ansible

After a lot of trial and error, I finally found a way to make it work. Here is what I did.

As noted before, rather than trying to run an installer from a share, I simply copied it onto the local disc of the VM.

Before Ansible can manage a Windows host, WinRM has to be configured on it, for example using the ConfigureRemotingForAnsible.ps1 script. I took the defaults. I didn’t bother setting up any of the advanced options like CredSSP, as in practice there didn’t seem to be any benefit.

Getting Windows and Ansible to play nicely

The challenge is to run an installer from a share, then run some extra configuration. We do this in Unix (well, GNU/Linux) all the time, and it is really, really easy. We even have the software on a Windows share, so we just need to mount that and run it!

It turns out that Windows is different from Unix. A mounted share doesn’t belong to the system as it does in Linux; it belongs to a session. A Windows session is what is created when a user logs in to Windows. It contains the desktop, the windows, and the various attributes of the connected user, such as their permissions.

Weird PeopleTools Install Issue On Windows

I am not sure how this situation arises, but I discovered that the Puppet-based installer on Windows sometimes gets into a state where PeopleTools isn’t installed, but the installer thinks it is. The install then does nothing, yet it generates messages as if it had worked.

Under c:\psft\pt\ps_home8.55.23 there is only the jre directory. Running the install again appears to work, but doesn’t do anything.

The solution is to run the uninstall first; after that, the install works.

PowerShell Parameters

I got an error from PowerShell:

PS C:\> Start-Process -NoNewWindow -Verb RunAs
Start-Process : Parameter set cannot be resolved using the specified named
parameters.
At line:1 char:1
+ start-process -NoNewWindow -verb runas
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Start-Process], ParameterB
   indingException
    + FullyQualifiedErrorId : AmbiguousParameterSet,Microsoft.PowerShell.Comma
   nds.StartProcessCommand

This is because the Start-Process command has two parameter sets. Looking at the page linked, the parameters are listed in two groups. Parameters that appear in only one group can’t be combined with parameters that appear only in the other; you have to pick one group or the other. So if you want to use -Verb RunAs, you can’t use -NoNewWindow.
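To make the rule concrete, here is a small Python model of how a command with two parameter sets rejects a mixed call. It is only a sketch: the parameter lists are a partial subset of what Start-Process accepts, the set names are as I recall them from the documentation, and the helper names matching_sets and resolve are made up for illustration.

```python
# Partial, illustrative subset of Start-Process parameters.
# This is NOT the full cmdlet definition; the set names are an assumption.
COMMON = {"FilePath", "ArgumentList", "WorkingDirectory", "PassThru", "Wait"}
PARAMETER_SETS = {
    "Default": COMMON | {"Credential", "NoNewWindow", "RedirectStandardOutput"},
    "UseShellExecute": COMMON | {"Verb"},
}


def matching_sets(named_parameters):
    """Return the names of the parameter sets that accept every given parameter."""
    wanted = set(named_parameters)
    return sorted(name for name, allowed in PARAMETER_SETS.items() if wanted <= allowed)


def resolve(named_parameters):
    """Raise if no single parameter set covers all the named parameters."""
    candidates = matching_sets(named_parameters)
    if not candidates:
        raise ValueError(
            "Parameter set cannot be resolved using the specified named parameters."
        )
    return candidates


if __name__ == "__main__":
    # -Verb on its own side of the fence is fine.
    print(resolve(["FilePath", "Verb", "Wait"]))
    # Mixing one parameter from each group fails, as in the transcript above.
    try:
        resolve(["NoNewWindow", "Verb"])
    except ValueError as err:
        print(err)
```

Parameters in COMMON belong to both groups, which is why something like -Wait can be used with either -Verb or -NoNewWindow.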

More Recovery Manager Problems

A nice feature of RMAN is that if a restore fails, and you rerun it, it realises that it doesn’t have to redo all the work it has already done.

At least, that is usually what happens.

Today we got:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 03/29/2018 12:54:29
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-06004: ORACLE error from recovery catalog database:
    RMAN-20003: target database incarnation not found in recovery catalog

As documented in Oracle Support note 2036644.1, this is caused by Oracle bug 14683854, and the workaround is to remove

Redundancy

I have been thinking about our DR process, and how to improve it. More on that in a later post. However, we recently had a couple of planned server room outages where my earlier planning proved beneficial.

Production is redundant and resilient across our server rooms. However, the other environments are not, because redundancy is expensive and only really needed for production systems.

So what we do is keep half of the non-production systems at one site, and the other half at the other. This means that we can shut a machine room down, and half the development environments will continue to run.
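As a rough sketch of the idea, the split can be thought of as a round-robin placement. The environment and site names below are hypothetical, and in practice the placement is a planning decision rather than something a script does for us.

```python
def split_across_sites(environments, sites=("site_a", "site_b")):
    """Assign environments to sites round-robin so each site hosts about half."""
    placement = {}
    for index, env in enumerate(sorted(environments)):
        placement[env] = sites[index % len(sites)]
    return placement


def survivors(placement, site_down):
    """Return the environments still running when one site is shut down."""
    return sorted(env for env, site in placement.items() if site != site_down)


if __name__ == "__main__":
    # Hypothetical non-production environments.
    envs = ["dev1", "dev2", "test1", "test2"]
    placement = split_across_sites(envs)
    print(placement)
    print(survivors(placement, "site_a"))
```

Whichever room is shut down, roughly half of the non-production environments stay up.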

RMAN Fail

Here is another issue we had with RMAN. This one has been bugging me for years.

We were doing a duplicate using RMAN. For some reason the recovery catalogue didn’t contain the archive log we needed to do the recovery, so the restore completed and then finished with the familiar error:

...
executing command: SET until clause

Starting recover at 2018-02-02 16:03:13

starting media recovery

unable to find archived log
archived log thread=1 sequence=307
Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: 'system.dbf'

The normal thing to do here is to correct the error and redo the backup. But we already had the redo logs on disc, and wanted to be able to apply them.

More Crash Consistent Recovery

Different Scenarios

In my previous post on this topic, I noted that you could use the SNAPSHOT TIME clause of the RECOVER DATABASE command to recover the database from a SAN snapshot.

I realise that there are different possible scenarios, and my write-up wasn’t clear on which approach applies when. Also, the test I did was unrealistic, because I used logs from after the snapshot was taken, and the whole point of using SAN snapshots is that they contain everything required for a crash consistent recovery.