Hunting Down Dirty Memory Pages

Investigating and reducing memory consumption due to dirty memory pages


Today's post is about an issue you have probably never encountered or even considered. It is only relevant for shared library developers, and even then, not always. However, I think it is beneficial for everyone to be familiar with how things work at a lower level, so I decided to write this post.

A few weeks ago I got a report about an increase in private dirty pages from our libraries that essentially caused increased memory consumption for every application linking to the EFL. The main culprit was the object system (Eo), which I maintain, so I decided to take a look.

As the first step I manually reviewed the code. This led me to a mistake in related code which I eventually fixed. My fixes improved the situation a bit, but the main issue was still there. So I started investigating...

Note: unless specifically mentioned otherwise, this post assumes Linux on Intel hardware. The details may vary elsewhere, but the concepts should apply almost everywhere.

Introduction to Memory Pages in Linux

First, if you are not familiar with the concept, read about the topic on Wikipedia. Pages are essentially blocks of virtual memory, and are the smallest unit of memory managed by the OS, in our case Linux. The page size is usually 4KiB, which is the case here.
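To make the page arithmetic used later in this post concrete, here is a minimal sketch in C. The helper names `page_size` and `pages_needed` are made up for this example; `sysconf(_SC_PAGESIZE)` is the standard POSIX way to ask for the page size at runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Page size as reported by the OS (usually 4096 on x86 Linux). */
static size_t page_size(void)
{
   long sz = sysconf(_SC_PAGESIZE);
   assert(sz > 0);
   return (size_t) sz;
}

/* Hypothetical helper: number of whole pages needed to hold `bytes`. */
static size_t pages_needed(size_t bytes, size_t page)
{
   assert(page > 0);
   return (bytes + page - 1) / page;
}
```

With 4KiB pages, `pages_needed(4096 * 1000, 4096)` is 1000, and a single extra byte already costs one more page.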

When an executable is compiled, everything in it is placed into different sections depending on usage. For example, with clang (this may vary between compilers):

static const int a = 5;

will be mapped to .rodata, that is, "read-only data". Another example is executable code (the actual instructions), which is mapped to .text. The linker then decides how to map all of this into actual memory, and thus into pages.

Pages have permissions associated with them: read (R), write (W) and execute (X). For example, for security reasons the stack is marked RW: you want to be able to read and write to it, but it should not be executable, which protects against a certain class of attacks. The actual executable code is marked RX, that is, you can read and execute it, but not modify it.

A nice feature of non-writeable pages is that they never change (duh...), so the OS can share them and thus save memory. For example, if your executable is 2MiB in size, it is loaded into memory once rather than once per instance of the application. The OS is smart enough to share these pages between processes.
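As a rough illustration of the section mapping, here is a small sketch in C. The names are invented for the example, and the section each symbol lands in is only what compilers typically do; an optimising compiler may fold a constant away entirely, so check with nm if it matters.

```c
#include <assert.h>

static const int step = 5;  /* read-only data: typically .rodata */
static int counter = 1;     /* initialised and writable: typically .data */
static int calls;           /* zero-initialised and writable: typically .bss */

/* The machine code of this function typically ends up in .text. */
static int next_value(void)
{
   calls++;
   counter += step;
   assert(counter > 0);
   return counter;
}
```

Only `counter` and `calls` need writable pages here; `step` and the code of `next_value` can live in pages that are shared between processes.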

As a side note, Linux also implements Copy-on-Write for RW pages, so even if a page is RW, it may be shared across different instances assuming the data hasn't been written to. Pages that can be shared are called clean, and ones that have been written to are called dirty.
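The visible effect of Copy-on-Write can be sketched with fork(). This is only an illustration under my own assumptions (the names `shared_value` and `child_write_stays_private` are made up): it shows that a write in one process does not affect the other, not how the kernel accounts for the pages.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 1; /* lives in a writable (RW) page */

/* Returns 1 if a write in the child left the parent's copy untouched. */
static int child_write_stays_private(void)
{
   pid_t pid = fork();
   assert(pid >= 0);
   if (pid == 0)
     {
        /* This write makes the kernel give the child its own copy of the page. */
        shared_value = 2;
        _exit(shared_value == 2 ? 0 : 1);
     }

   int status = 0;
   if (waitpid(pid, &status, 0) != pid) return 0;
   return WIFEXITED(status) && (WEXITSTATUS(status) == 0) && (shared_value == 1);
}
```

After the fork both processes refer to the same physical page; only the write in the child forces a private, dirty copy.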

There is more to be said about pages, but we've covered all we need in order to investigate the issue, so we will stop here.

The Issue Reported

Now that we know a bit more about pages, we can discuss the reported issue more intelligently. The problem was that the EFL in general, and heavy users of Eo in particular, all of a sudden had a lot of private dirty pages. That is, a lot of pages that are mapped from the library itself (part of the binary, not allocated at runtime) were being written to, and thus can't be shared and have to be duplicated for each running process. That is a big issue for heavily used libraries.
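On Linux, the amount of private dirty memory per mapping is exposed in /proc/<pid>/smaps as "Private_Dirty:" lines. The following is a minimal sketch of how one could total them; the function name `sum_smaps_field_kb` is my own, and it takes an already opened stream so it can be tried on any smaps-style text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum every "<field> <n> kB" line of an smaps-style stream, in KiB. */
static long sum_smaps_field_kb(FILE *f, const char *field)
{
   char line[512];
   long total = 0;
   size_t len = strlen(field);

   assert(f != NULL);
   while (fgets(line, sizeof(line), f))
     {
        long kb;
        /* Only lines starting with the field name are counted. */
        if ((strncmp(line, field, len) == 0) &&
            (sscanf(line + len, "%ld", &kb) == 1))
          total += kb;
     }
   return total;
}
```

To inspect the current process one would open "/proc/self/smaps" with fopen and pass "Private_Dirty:" as the field. Note that this gives totals per mapping, not per symbol.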

RW pages exist for a reason, so it could just be that these were legitimate usages, though judging by the amount of pages, this seemed unlikely. The first step was to find what is mapped to these pages, so I started there.

Finding the Memory Users

Unfortunately, while nm is a very useful command for mapping symbols to memory regions, and can tell me which symbols are (most likely) mapped to RW pages, it doesn't (and can't) indicate which symbols map to dirty pages. Even more unfortunate is that I am not aware of any tool that provides that information (please let me know if you know of one!). I was just about to write one when I saw that all of the relevant RW pages in my test case were dirty, so any memory in those would be relevant.

I already knew which structures were the largest in Eo, so I decided to guess and see if they were mapped to RW pages or RO. My guess was spot on, and I found a few symbols that should be RO, but were actually RW. For example:

 static const Efl_Event_Description *_event_desc[] = {
      // SNIP ...

This is a common mistake due to the confusing syntax C uses for const. This is an array of pointers to const Efl_Event_Description. This may look correct at first glance, until you realise the array itself is not constant. It should be:

 static const Efl_Event_Description * const _event_desc[] = {
      // SNIP ...
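To see the difference between the two declarations in isolation, here is a small sketch that uses plain strings instead of Efl_Event_Description. The names are invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Array of pointers to const char: the strings are const, the array is not. */
static const char *names_rw[] = { "add", "del" };

/* Const array of pointers to const char: neither can be modified. */
static const char * const names_ro[] = { "add", "del" };

/* Compiles fine, because the slots of names_rw are writable. */
static void rename_first(const char *name)
{
   assert(name != NULL);
   names_rw[0] = name;
   /* names_ro[0] = name;  <- would be rejected by the compiler */
}
```

Since `names_rw` can legally be modified at runtime, the compiler has no choice but to place it in writable memory.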

This change saved us a few pages in the more event heavy areas, which is a first step, but the problem was still there, so the search recommenced.

I then stumbled upon

static const Efl_Class_Description _class_desc = {
    // SNIP ...

This looks innocent. The Efl_Class_Description type is a struct, and const was correctly applied. This should definitely have been RO, but for some reason it was put in a RW page. Seeing this, and other similar structures, I knew I had found what I was looking for; now I just needed to figure out why it was happening.

After thinking about it for a bit, and considering a few different ideas, I suspected it was related to the fact that while these structures were constant, some of the fields were referring to other symbols. In some cases, due to relocation, the address of such a symbol has to be filled in at runtime, so the pages can't simply be marked as RO. This could have been easily checked with nm, though I didn't think of it at the time, so I went on investigating by other means. I ended up writing a small self-contained example, so I would know whether I was right or not.

I built my example with both gcc and clang. Unfortunately gcc gave me some less than optimal results, so I will use clang in my examples.

Checking my Hypothesis

In order to check my hypothesis I wrote a small program (issue.c):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 4096
#define ALLOC_SIZE (PAGE_SIZE * 1000)

typedef struct
{
   const void *invalidater;
   const char data[ALLOC_SIZE];
} Invalid;

static const char ro[ALLOC_SIZE];
static const Invalid rw = { NULL, { 0 }};

int main(void)
{
   printf("%d\n", (int) getpid()); // The PID to pass to pmap
   printf("%p %p\n", (const void *) ro, (const void *) &rw); // So they are not optimised out
   scanf("\n"); // Keep the program running
   return 0;
}
This program attempts to allocate two variables:

  • ro: 1000 pages of read only memory.
  • rw: 1000 pages (and one pointer) of read only memory that I suspected was going to be RW.

Upon running it prints its PID and then waits.

Now I can run it and inspect exactly what's going on. For that I will use pmap (and redact some of the non-relevant output).

Let's compile and run our program:

$ clang issue.c
$ ./a.out
0x4006a0 0x7e86a0

And then in another terminal:

$ pmap 11835
11835:   ./a.out
0000000000400000   8004K r-x-- a.out
0000000000dd0000      4K rw--- a.out

As you can see, both variables have been mapped to RO pages (the first mapping). This is what we expected (which wasn't the case with GCC), because we don't rely on anything that isn't known at compile time. This was just a test to see that everything works.
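pmap gets its information from /proc/<pid>/maps, so the same check can also be sketched in C. The helper names below are my own, and the parser only looks at the address range and the permission field of a line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the start, end and permissions of one /proc/<pid>/maps line.
 * Returns 1 on success, 0 if the line does not match. */
static int parse_maps_line(const char *line, unsigned long *start,
                           unsigned long *end, char perms[5])
{
   assert(line && start && end && perms);
   return sscanf(line, "%lx-%lx %4s", start, end, perms) == 3;
}

/* A mapping is writable if the second permission character is 'w'. */
static int is_writable(const char perms[5])
{
   return (strlen(perms) >= 2) && (perms[1] == 'w');
}
```

Dividing `end - start` by 1024 gives the size in KiB that pmap prints in its second column.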

Now we are going to change the program to check my hypothesis. We will change the NULL in the declaration of rw to some symbol which may be relocated, for example strlen, and then compile and run again:

$ clang issue.c
$ ./a.out
0x4006e0 0x7e86e0

And then in another terminal:

$ pmap 11941
11941:   ./a.out
0000000000400000   8004K r-x-- a.out
0000000000dd0000      4K rw--- a.out

And... It still works. At this point I started to question myself: maybe I was wrong and something else was going on.

Then I realised there is still one thing that is different between my test case and libraries that exhibit the issue that may be related. They are libraries, and thus have position independent code, so I tested once more, this time with PIC enabled:

$ clang issue.c -fPIC
$ ./a.out
0x4006d0 0x9e8818

And then in another terminal:

$ pmap 12002
12002:   ./a.out
0000000000400000   4004K r-x-- a.out
00000000009e8000   4004K rw--- a.out

Voila! We managed to replicate the issue.
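For reference, the modified declaration used in the last two runs can be sketched as follows. I have folded the page size into ALLOC_SIZE and added an explicit cast to keep the snippet self-contained; converting a function pointer to `const void *` like this is a common compiler extension rather than strict ISO C.

```c
#include <assert.h>
#include <string.h>

#define ALLOC_SIZE (4096 * 1000)

typedef struct
{
   const void *invalidater;
   const char data[ALLOC_SIZE];
} Invalid;

/* The only change from issue.c: the pointer now refers to a symbol that
 * typically lives in another shared object (libc), so in PIC code its
 * address is only known at load time. */
static const Invalid rw = { (const void *) strlen, { 0 }};
```

A single pointer that needs relocation is enough to move the whole 4MB structure out of .rodata.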

Verifying with nm

As I mentioned before, this would have been easy to verify with nm, so I'll also show that for completeness. Though even with nm, I would have needed to enable PIC to trigger the issue.

Relevant nm output for the RO (issue not present) case:

$ nm -f sysv ./a.out
rw |07e86e0| r | OBJECT|03e8008| |.rodata

As you can see, rw is put into the .rodata section, that is read-only data.

Relevant nm output for the RW (issue present) case:

$ nm -f sysv ./a.out
rw |09e8818| d | OBJECT|03e8008| |.data.rel.ro

Here, rw is put into the .data.rel.ro section, which holds data that is read-only after relocation, which means it is not read-only at load time.

My Pages are RO and not RW

I got reports from two different people (thanks Daniel Hirt and Mark Mossberg!) that their pmap output looked something like this:

$ pmap 10214
10214:   ./a.out
0000000000400000   4004K r-x-- a.out
00000000009e8000   4004K r---- a.out
0000000000dd1000      4K rw--- a.out

The reason for that is most likely linker differences.

One way to verify this is indeed the case:

$ strace ./a.out
... SNIP ...
mprotect(0x9e8000, 4100096, PROT_READ)  = 0
... SNIP ...

As you can see, the program calls mprotect to change the pages (look at the address) to RO. If you read the previous section about nm, you probably saw that rw was put into the "read-only after relocation" section, which means the dynamic linker is allowed, and encouraged, to mark the pages read-only once it has finished applying the relocations. The pages were still written to during relocation, though, so they remain private and dirty even when they show up as RO.
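The write-then-protect sequence can be imitated with an anonymous mapping. This is only a sketch of the idea under my own assumptions (the function name `write_then_protect` is made up, and the memset stands in for the relocation writes); it is not what the dynamic linker literally does.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write to a fresh page (making it dirty), then mark it read-only.
 * Returns 0 on success, -1 on failure. */
static int write_then_protect(void)
{
   long page = sysconf(_SC_PAGESIZE);
   if (page <= 0) return -1;

   char *mem = mmap(NULL, (size_t) page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if (mem == MAP_FAILED) return -1;

   memset(mem, 0x2a, (size_t) page);                   /* stands in for relocation */
   int ret = mprotect(mem, (size_t) page, PROT_READ);  /* now RO, but already written */
   if ((ret == 0) && (mem[0] != 0x2a)) ret = -1;       /* reading still works */

   munmap(mem, (size_t) page);
   return ret;
}
```

After the mprotect call the mapping is listed as read-only, but the page has already been written to, so it cannot be shared with other processes.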

As mentioned above, I took a short-cut checking for RW pages instead of private dirty pages because I had complete overlap between the two. This short-cut may not work for your case, though the nm output should still give you the information you need.

Solving the Issue

Two solutions come to mind. The first is to separate the "truly constant" values from the relocatable values. For example, if we had a struct like:

struct {
     int ro;
     void *symbol;
};

we would split it into two separate structs.
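A minimal sketch of such a split, with type and variable names invented for the example, could look like this. The expectation is that only the small pointer-holding struct needs relocation in PIC code; check the result with nm on your own toolchain.

```c
#include <assert.h>
#include <string.h>

/* Truly constant values: nothing here needs an address filled in. */
typedef struct
{
   int  version;
   char name[16];          /* stored inline, so no pointer to relocate */
} Desc_Const;

/* Relocatable values: kept apart so only this small part is affected. */
typedef struct
{
   const void *symbol;     /* needs a load-time relocation in PIC code */
} Desc_Reloc;

static const Desc_Const desc_const = { 1, "Example" };
static const Desc_Reloc desc_reloc = { (const void *) strlen };
```

The large, plain data stays shareable, and the part that gets written at load time shrinks to the size of the pointers.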

Alternatively, reconsider parts of the design: maybe storing the pointers in the struct is not even needed and it's enough to pass them to a function. This way the structures won't be mapped to RW memory; the pointers would only be stored temporarily on the stack before the function invocation, reducing the memory usage.
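The function-based approach could be sketched like this. All names here (`Class_Ops`, `class_register`, `example_class_init`) are hypothetical and are not the actual Eo API; the point is only that the table of pointers is built on the stack at call time instead of living in a static structure.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*Op_Func)(int);

typedef struct
{
   const char *name;
   Op_Func     op;
} Class_Ops;

static Op_Func registered_op; /* runtime state, filled in on registration */

static int double_it(int x) { return 2 * x; }

/* Hypothetical registration function: copies what it needs from `ops`. */
static void class_register(const Class_Ops *ops)
{
   assert(ops != NULL);
   registered_op = ops->op;
}

static void example_class_init(void)
{
   /* Built on the stack at call time, so no static table of pointers. */
   Class_Ops ops = { "Example", double_it };
   class_register(&ops);
}
```

The runtime still has to keep whatever it needs after registration, but that is state it would have held anyway.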

I have already reduced some memory in Eo users by using the second method. I will soon complement that with the first method to reduce it even further. Preliminary tests show significant reduction in memory usage, so a big win.

Lessons Learned

While I knew the theory behind it, I was surprised to see how data I assumed would be mapped to RO pages ended up in RW pages that get dirty immediately. More specifically, I learned that in position independent code any constant structure that holds a pointer to anything, even to a constant string, like:

static const char * const rw = "test";

would end up in RW pages. Review your libraries and make sure you are not wasting memory, and remember that heap memory is not the only memory that can be wasted.
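For strings specifically, a common way around this is to declare an array instead of a pointer. The sketch below uses made-up names, and where each symbol ends up should be verified with nm, since optimising compilers may remove unused or trivially foldable constants.

```c
#include <assert.h>
#include <string.h>

/* A const pointer to a string: the pointer itself is a piece of data
 * holding an address, which PIC code has to fix up at load time. */
static const char * const msg_ptr = "test";

/* A const array: the characters are stored in place, no address involved. */
static const char msg_arr[] = "test";
```

Both can be used wherever a `const char *` is expected, but only the array form avoids storing an address in the data.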

Please let me know if you spotted any mistakes or have any suggestions, and follow me on Twitter or RSS for updates.
