Controlling Memory and Swap Usage in Linux with Systemd
The Problem
We have an application that uses a lot of memory when a user takes particular actions. I feel the application should take steps to protect itself: it shouldn’t allow a user action that will cause issues. I wasn’t able to influence the application, so we lived with occasional out-of-memory issues and times when processes could not be forked due to lack of memory.
This came to a head during a recent OS upgrade which delivered a new Linux kernel. The new kernel seemed a lot slower to invoke the out-of-memory killer, to the point that the VM locked up completely for about an hour. During this time the application was unresponsive, and we could not even log in to the VM as root to sort things out. Clearly this is unacceptable, and since we could not fix the application, the problem needed to be mitigated in some way.
The upgrade was:
| Component | Old Version | New Version |
|---|---|---|
| RHEL [1] | 7 | 8 |
| Linux | 3.10 | 4.18 |
| Systemd | 219 | 239 |
Unsuccessful Attempts
Restoring the old behaviour
There may be a way to tune the kernel to bring back the old behaviour. If so, I wasn’t able to find it. Please email me if you know how to do this! My email address is currently at the bottom of the page: click on the envelope icon.
Using Ulimit
Setting a soft limit on the resident set size with ulimit -m doesn’t work: the limit is silently ignored and the process can still use more memory. Trying to set a hard limit fails with an error.
The bash builtins manual for ulimit does in fact note that:
-m The maximum resident set size (many systems do not honor this limit)
It looks like Linux is one of these many systems.
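For reference, here is a minimal sketch of the kind of experiment described above. The 100 MiB value (102400 KiB) is an arbitrary illustration rather than the value used in the original test, and the function name is hypothetical. The limit is set inside a subshell so that it doesn’t leak into the calling shell.

```shell
# Set a soft resident-set-size limit and report what the shell now believes
# the limit is. Runs in a subshell so the caller's limits are untouched.
# Assumption: the value passed in (in KiB) is illustrative only.
show_soft_rss_limit() {
    (
        ulimit -S -m "$1" || exit 1
        ulimit -S -m
    )
}

show_soft_rss_limit 102400
```

The shell accepts the limit and reports it back, but current Linux kernels do not enforce the resident set size limit, so a process started from that shell can still grow past it.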
Changing the Number of Malloc Arenas
The C library provides a function called malloc to allocate memory. The
glibc version of malloc creates a number of pools of memory that it calls
arenas. Each thread can only use a particular arena. The PeopleSoft
application server reuses allocated memory, but if a thread tries to reuse
memory previously allocated by another thread in another arena it won’t be
able to, so it allocates more memory itself. The number of arenas allowed is
controlled by the environment variable MALLOC_ARENA_MAX, or by
GLIBC_TUNABLES. Setting it to 1 makes all threads use the same arena, so all
allocated memory can be reused by all threads. Since PeopleSoft only uses one
thread at a time in each process this doesn’t impact performance, and should
reduce memory usage. My testing showed that this approach didn’t help with
the issue we experienced, because the memory was being grabbed by a single
thread.
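As an illustration of the setting described above, the sketch below starts a command with malloc restricted to a single arena. The wrapper name is hypothetical; MALLOC_ARENA_MAX and the glibc.malloc.arena_max tunable are the two glibc knobs mentioned in the text, and both are set here on the assumption that it is harmless to supply both.

```shell
# Run a command with glibc malloc restricted to a single arena.
# MALLOC_ARENA_MAX is the older environment variable; GLIBC_TUNABLES is the
# newer tunables interface. Both are exported to the child process only.
run_with_one_arena() {
    env MALLOC_ARENA_MAX=1 \
        GLIBC_TUNABLES=glibc.malloc.arena_max=1 \
        "$@"
}

# Example: show the values the child process actually receives.
run_with_one_arena sh -c 'echo "$MALLOC_ARENA_MAX $GLIBC_TUNABLES"'
```

Because the variables are passed through env, they affect only the command being started and not the calling shell.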
What Did Work?
Control Groups
Linux has a feature called control groups (often shortened to cgroups). These can be used to control the access of a group of processes to a resource. While systemd isn’t required to make use of control groups, it is a useful interface to them, and it can be controlled by Ansible. Red Hat has some documentation on control groups and using systemd to control them, which is useful as I am working on a RHEL system. There is also some documentation in the systemd section of freedesktop.org.
Changes To Make
First we had to alter the systemd unit file. The delivered one just calls
the legacy init.d scripts with a service type of oneshot. This means the
processes escape from the cgroup. To enable them to be caught by systemd we
needed to write a unit file from scratch. These processes are started from a
control process called psadmin, which then exits but starts a daemon that
starts and stops processes. This corresponds to a service type of forking.
Systemd can also switch to the correct user. So the new unit file,
/usr/lib/systemd/system/psft-appserver-APPDOM.service, sets both the service
type and the user.
We had to increase the start timeout above the default, because the application takes longer than that to start.
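A unit file along the lines described above might be generated like this. It is a sketch under stated assumptions, not the original unit file: the user name, the psadmin path, the 300 second timeout and the exact psadmin arguments are all hypothetical placeholders, and any environment setup the application needs is left out. Only Type=forking, the user switch, and the raised start timeout come from the description.

```shell
# Print a unit file for one application server domain.
# Assumptions (all hypothetical, not taken from the original unit file):
#   $1 domain name, $2 user to run as, $3 full path to psadmin,
#   $4 start timeout in seconds.
render_appserver_unit() {
    domain=$1 user=$2 psadmin=$3 timeout=$4
    cat <<EOF
[Unit]
Description=PeopleSoft application server domain ${domain}
After=network.target

[Service]
Type=forking
User=${user}
ExecStart=${psadmin} -c boot -d ${domain}
ExecStop=${psadmin} -c shutdown -d ${domain}
TimeoutStartSec=${timeout}

[Install]
WantedBy=multi-user.target
EOF
}

# Example with placeholder values; redirect the output to the unit file path.
render_appserver_unit APPDOM psoft /path/to/psadmin 300
```

After installing or changing a unit file, systemctl daemon-reload makes systemd pick up the new definition.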
Once this is done, the processes are captured by the control group that is created for us by systemd. Next we need to limit the control group’s access to memory and swap, using systemctl set-property.
The properties we set switch on memory accounting, which is needed to be able to control memory use, limit the memory to 85% of the memory on the server, and forbid the group from using swap.
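The three settings described above could be applied with a command like the one this sketch prints. The property names are an assumption about what was used: MemoryMax= and MemorySwapMax= are the cgroup v2 names, and on a cgroup v1 host the memory limit property is MemoryLimit= instead. The helper only prints the command so it can be reviewed before being run as root.

```shell
# Print (rather than run) a systemctl command that enables memory accounting,
# caps memory at a percentage of physical RAM, and forbids swap.
# Assumptions: the helper name is hypothetical; the property names are the
# cgroup v2 ones (use MemoryLimit= in place of MemoryMax= on cgroup v1).
memory_limit_command() {
    unit=$1 percent=$2
    printf 'systemctl set-property %s MemoryAccounting=yes MemoryMax=%s%% MemorySwapMax=0\n' \
        "$unit" "$percent"
}

memory_limit_command psft-appserver-APPDOM.service 85
```

By default set-property stores the change persistently; adding --runtime applies it only until the next reboot.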
It Works!
Testing reveals that when the application uses a lot of memory, it no longer swaps, and is killed by the out of memory killer. One advantage of this approach is that if we do run out of memory, it is less likely innocent bystanders will be killed.
Limitations
Does Not Fix the Underlying Problem.
The real problem here is that the application is using too much memory. We have got the system back to the state where it is killed by the out of memory killer rather than waiting for an hour in an unusable state, but the real solution is to write the application so that it won’t allow the user to take actions that use all available memory.
We Must Always Use Systemd
We used to stop and start the application using the command in the
unit file. But if we do this now, systemd won’t capture the processes in
the control group, so the memory won’t be controlled. We must always use
systemctl to stop and start the application now.
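One way to confirm that a running process really was started through systemd is to check which control group it landed in. The helper below is a hypothetical sketch: it reads a /proc/&lt;PID&gt;/cgroup file and succeeds if any line places the process in a control group named after the unit.

```shell
# Succeed if the given cgroup file (default: this shell's own) places the
# process inside the named unit's control group.
# Assumption: the helper name and the unit name are illustrative.
in_unit_cgroup() {
    unit=$1
    cgroup_file=${2:-/proc/self/cgroup}
    # Each line looks like "hierarchy-ID:controllers:/path/in/hierarchy".
    while IFS=: read -r _id _controllers path; do
        case $path in
            */"$unit"|*/"$unit"/*) return 0 ;;
        esac
    done < "$cgroup_file"
    return 1
}

# Example: check the current shell (use /proc/<PID>/cgroup for another process).
if in_unit_cgroup psft-appserver-APPDOM.service "/proc/$$/cgroup"; then
    echo "inside the unit's control group"
else
    echo "outside the unit's control group"
fi
```

A process started by running the command by hand would show up in the login session’s control group instead, and so would not be subject to the memory limits.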
Prevents Processes from Being Forked
By design forked processes (i.e. programs called by the application) are still captured by the control group. Thus if all allowed memory is in use, the control group will prevent processes from being forked as there isn’t enough memory for them. In our case this means Cobol programs called by the application won’t start. Also when the virus checker is called as part of a file upload, it won’t start. The application interprets this as an infected file, and informs the user their file is infected with a virus, when it is not.
The Application Can Still be Swapped
Our monitoring software flagged that 100% of swap was being used on the VM hosting the application. What could be using it? The application is the only thing running on the VM (apart from some monitoring software and the operating system).
Running top, pressing f to change the fields displayed, including SWAP in
the list by arrowing down and selecting it with space, then pressing escape
shows that it is our application using swap. Yet when we take the process ID
(PID) of one of these processes from top and look up its control group,
systemd claims the control group isn’t using any swap. How can this be?
It turns out that control groups are arranged in a hierarchy, so the control group we created sits beneath the root control group. In effect this means that if the root control group is feeling memory pressure, the operating system is free to swap out pages belonging to the processes in our child control group. In practice, this doesn’t seem to matter.
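The per-process swap figure that top displays can be cross-checked against the kernel’s own numbers. The sketch below is an assumption about how such a check could be done, not the command used in the original investigation, and the helper name is hypothetical. It reads the VmSwap field from a /proc/&lt;PID&gt;/status file.

```shell
# Print the swap used by a process, in kB, from a /proc/<pid>/status file.
# The file path is a parameter so the helper can be pointed at any process.
# Prints nothing if the file has no VmSwap line.
swap_kb_from_status() {
    awk '/^VmSwap:/ { print $2 }' "$1"
}

# Example: inspect the current shell; substitute a PID taken from top.
swap_kb_from_status "/proc/$$/status"
```

A non-zero value here for a process whose control group reports no swap is consistent with the explanation above: the pages were swapped out because of memory pressure higher up the hierarchy.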
[1] Red Hat Enterprise Linux, which is what we use on our VMs at present.