Django on RedHat 7

Django on RHEL7

Here is how I chose to set up Django on RHEL7. This is somewhat painful because the default Python in RHEL7 is still version 2. Writing new programs against Python 2 seems foolish, since the Python project no longer supports it, so I really have to use Python 3.

Set up Linux

We need Python 3, Django, Apache, and PostgreSQL. See the instructions linked in the next section if the PostgreSQL packages aren’t found. The PostgreSQL version that comes with RHEL7 is too old for the version of Django we want to install.

Gnu PG Cache Time To Live

As discussed previously, we use regpg to manage Ansible secrets. This has been really useful. One annoyance, though, is that some tasks can take up to 6 hours to run, but the gpg agent only caches the passphrase for 10 minutes or so. I end up having to type the passphrase several times during a run. I occasionally kick off a run before I leave for the day, and it would be a shame if it stalled overnight waiting for a passphrase.
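As a rough sketch of the arithmetic involved, assuming the fix is to raise the gpg-agent cache lifetimes: `default-cache-ttl` and `max-cache-ttl` are gpg-agent options that take a value in seconds. The helper name and the one-hour safety margin below are illustrative assumptions, not settings taken from this post.

```python
from typing import List


def gpg_agent_cache_lines(run_hours: float, margin_hours: float = 1.0) -> List[str]:
    """Return gpg-agent.conf lines that keep a passphrase cached for a whole run.

    run_hours is the longest expected run; margin_hours is an assumed safety margin.
    """
    ttl_seconds = int((run_hours + margin_hours) * 3600)
    return [
        # How long an entry stays cached after it was last used.
        f"default-cache-ttl {ttl_seconds}",
        # Upper limit on the cache lifetime, however recently the entry was used.
        f"max-cache-ttl {ttl_seconds}",
    ]


if __name__ == "__main__":
    # 6 hours is the longest run mentioned above.
    print("\n".join(gpg_agent_cache_lines(6)))
```

The printed lines would normally go into `~/.gnupg/gpg-agent.conf`, and the agent has to be reloaded before they take effect. Both values are raised together because the maximum lifetime caps the per-use lifetime.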

Ansible trick - Retry an intermittent error

We have recently been having issues with mounting Windows shares. The mount occasionally fails, and we don’t have the access needed to fix the underlying cause. When it fails, the playbook fails at that step. This is really annoying, because if it tried again it would probably work fine and the playbook could complete. It turns out that Ansible does have a way to retry. I got the idea from this Stack Overflow question.

The solution is the Ansible retries keyword. Here is a test of a command that intermittently fails:
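The retry pattern itself is easy to see outside Ansible. The Python below is not the playbook from this post. It is a minimal sketch of the same idea: run a task a bounded number of times, pause between attempts, and stop on the first success. In a playbook, `retries` is normally paired with the `until` and `delay` keywords. The function and parameter names here are hypothetical.

```python
import time
from typing import Callable, Optional


def run_with_retries(
    task: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call task until it returns True.

    Returns the number of the attempt that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if task():
            return attempt
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts:
            sleep(delay_seconds)
    return None


if __name__ == "__main__":
    import random

    # Stand-in for the intermittent mount: succeeds roughly one time in three.
    result = run_with_retries(lambda: random.random() < 0.33, attempts=5, delay_seconds=0.1)
    print("succeeded on attempt", result)
```

Bounding the number of attempts matters: a step that never succeeds should still fail the run eventually instead of looping forever.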

Auditing Options in the Database

Oracle Auditing

Oracle have a number of different types of auditing, and in recently created databases they all coexist. I looked at this recently and thought I had better make some notes before I forget.

There are three types of auditing, plus a mixed mode that combines two of them:

  • Traditional auditing. This is the way things worked before 12.1
  • Unified auditing. This is the new rewrite of auditing, but needs some effort to get working.
  • Fine grained auditing. This stays the same between traditional and unified auditing.
  • Mixed mode auditing. The Database Security Guide mentions that newly created databases can use mixed mode auditing, which allows us to use the functionality of both traditional and unified auditing.

Auditing is one tool that can help secure the system, but it is only one tool: a particular risk is that an attacker updates or removes the audit trail. Unfortunately, it is one of those things that doesn’t look shiny or help users do their work, so it tends not to get enough resources to work properly.

Fixing Logical Standby Go Slow

Logical Standby Databases

We use a logical standby database. The logical standby database is inherently fragile, because it mines the primary’s redo logs and attempts to rebuild the SQL to apply on the logical standby.

This rebuilt SQL is fine most of the time, but if an application release has altered a table, or updated most of the rows in a large table, this generated SQL often performs very poorly.

Logical Standby Failure Modes

Typically there are two failure modes for the logical standby database. Either it will fall over because a SQL statement it has generated doesn’t work, or the generated SQL statement will perform really badly and take forever. The first is easy to deal with: find the error message, then either fix the error or ignore the SQL and continue. The second is more difficult, as there is no error message to investigate. Typically we will notice that the logical standby is building up a backlog of redo to apply, and we need to investigate what is taking so long.

SQL Performance Issues

The initial investigation of this issue is written in the Production Emergency post. You might want to read that first if you haven’t already.

Addressing the underlying problem

We had identified the source of the problem. The SQL text can be extracted from the database as follows:

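-- Widen the output and drop headings so the statement text is not wrapped or cut short
set linesize 300
set pagesize 0
set long 30000
set longchunksize 300

-- Look up the statement by the sql_id found during the investigation
select sql_text from v$sql where sql_id = '67bqun92ngrsj';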

This displays the SQL text. If the application populates the module, program and client_id of v$session using dbms_application_info, then there is enough information to find out what code is causing the problem and even who was running the program that caused it.

What to do when production locks up

Recently our PeopleSoft system locked up. Nobody could do anything; they just got a blank page in the browser.

The System Model

The approach to use in this situation is to consider how the application works. In our case a user’s web browser will connect to the load balancer, which will connect to a web server. The web server will pass the query to an available application server out of the pool, which will then pass the query to the database. Then the results go back up the chain.
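One way to make this model concrete is to write the request path down as an ordered list and check each hop in turn. The sketch below assumes that some yes/no health check exists for each tier. The check functions are hypothetical placeholders, and this is not necessarily how the incident was actually diagnosed.

```python
from typing import Callable, List, Optional, Tuple

# Tiers in the order a request crosses them, as described above.
REQUEST_PATH = ["load balancer", "web server", "application server", "database"]


def first_failing_tier(checks: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run each tier's health check in request order.

    Returns the name of the first tier whose check fails, or None if all pass.
    """
    for name, is_healthy in checks:
        if not is_healthy():
            return name
    return None


if __name__ == "__main__":
    # Placeholder checks: pretend everything answers except the application server.
    status = {
        "load balancer": True,
        "web server": True,
        "application server": False,
        "database": True,
    }
    checks = [(tier, lambda tier=tier: status[tier]) for tier in REQUEST_PATH]
    print(first_failing_tier(checks))
```

Walking the path in request order narrows the search to the first hop that stops answering. A tier can also look unhealthy only because the one behind it is stuck, so the result is a starting point for investigation and not a verdict.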

Triggering an Action in Another Repository

GitLab Organization

When specifying the repositories, it seemed reasonable to have three (at first):

  • One to hold the Ansible roles to build the VMs
  • One to hold the custom code that the above repository will deploy
  • One to define what the environments look like, things like
    • Memory
    • Number of VMs at each tier
    • The names of the VMs
    • Passwords
    • And so on.

Problem

The problem is that when a developer commits code to their repository, they would like it to be deployed to an environment so they can test it. It makes sense to do this automatically, since we have all the information about the environment in the environment repository. However, this is a different repository from the one where the code is being checked in. It would be nice if we could call across to the environment repository and trigger the automated deploy.
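GitLab offers a pipeline trigger endpoint that can be used for this kind of cross-repository call: a POST to `/api/v4/projects/:id/trigger/pipeline` with a trigger `token` and a `ref`. The sketch below only builds the request. The host name, project path, token, and the `DEPLOY_ENV` variable are hypothetical placeholders, and this is not necessarily the mechanism the post settles on.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_trigger_request(
    gitlab_url: str,
    project: str,
    trigger_token: str,
    ref: str,
    variables: Optional[Dict[str, str]] = None,
) -> Request:
    """Build a POST request that starts a pipeline in another GitLab project."""
    fields = {"token": trigger_token, "ref": ref}
    # Pipeline variables are passed as form fields named variables[KEY].
    for key, value in (variables or {}).items():
        fields[f"variables[{key}]"] = value
    # The project may be a numeric id or a URL-encoded path such as group%2Fname.
    project_id = quote(str(project), safe="")
    url = f"{gitlab_url.rstrip('/')}/api/v4/projects/{project_id}/trigger/pipeline"
    return Request(url, data=urlencode(fields).encode("utf-8"), method="POST")


if __name__ == "__main__":
    # All values here are placeholders.
    req = build_trigger_request(
        "https://gitlab.example.com", "ops/environments", "TOKEN", "main",
        {"DEPLOY_ENV": "test"},
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. In practice this call would sit in a CI job of the code repository, with the trigger token stored as a protected CI variable and not in the repository itself.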