Today's post is about an issue you have probably never encountered or even considered. It is only relevant for shared library developers, and even then, not always. However, I think it is beneficial for everyone to be familiar with how things work at a lower level, so I decided to write this post.
A few weeks ago I got a report about an increase in private dirty pages in our libraries, which essentially increased memory consumption for every application linking to the EFL. The main culprit was the object system (Eo), which I maintain, so I decided to take a look.
As the first step I manually reviewed the code. This led me to a mistake in related code which I eventually fixed. My fixes improved the situation a bit, but the main issue was still there. So I started investigating...
Note: unless specifically mentioned otherwise, this post assumes Linux on Intel hardware. The details may vary elsewhere, but the concepts should apply almost everywhere.
Introduction to Memory Pages in Linux
First, if you are not familiar with the concept, read about the topic on Wikipedia. Pages are essentially blocks of virtual memory, and are the smallest unit of data managed by the OS, in our case Linux. The page size is usually 4KiB, which is the case here.
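If you would rather not assume the 4KiB figure, the page size can be queried at runtime. The following is a minimal sketch using the POSIX sysconf() call; the helper names page_size_bytes and pages_needed are made up for this illustration.

```c
#include <unistd.h>

/* Returns the system page size in bytes, or -1 if it cannot be determined. */
long page_size_bytes(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Number of pages needed to hold `bytes` bytes, rounded up.
 * (Hypothetical helper, only here to make the arithmetic explicit.) */
long pages_needed(long bytes, long page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

With a 4KiB page, a single extra byte past a page boundary costs a whole additional page, which is why the numbers later in this post are always multiples of 4K.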
When an executable is compiled, everything in it is placed into different sections depending on usage. For example, with clang (this may vary between compilers):
static const int a = 5;
will be mapped to .rodata, that is, "read-only data". Another example is executable code (the actual instructions), which is mapped to .text. The linker then decides how to map all of this into actual memory, and thus into pages.
Pages have permissions associated with them: read (R), write (W) and execute (X). For example, for security reasons, the stack is marked as RW: you want to be able to read and write to the stack, but it should not be executable, in order to protect against a certain class of attacks. The actual executable code is marked as RX, that is, you can read it and execute it, but not modify it. A nice feature of non-writable pages is that they never change (duh...), so the OS can share them between processes and thus save memory. For example, if your executable size is 2MiB, it will be loaded into memory once rather than once per instance of the application. The OS is smart enough to share these pages.
As a side note, Linux also implements Copy-on-Write for RW pages, so even if a page is RW, it may be shared across different instances assuming the data hasn't been written to. Pages that can be shared are called clean, and ones that have been written to are called dirty.
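To get a feel for how many dirty pages a process has, Linux exposes per-mapping counters in /proc/<pid>/smaps, including a "Private_Dirty:" line for each mapping. The sketch below sums those lines for the current process. It is only an illustration, and the function names are made up for this post.

```c
#include <stdio.h>

/* Parses one smaps line. Returns 1 and stores the value (in KiB) in `kb`
 * if the line is a "Private_Dirty:" line, otherwise returns 0. */
int parse_private_dirty(const char *line, long *kb)
{
    return sscanf(line, "Private_Dirty: %ld kB", kb) == 1;
}

/* Sums the Private_Dirty counters of every mapping of this process.
 * Returns the total in KiB, or -1 if /proc/self/smaps cannot be opened
 * (for example on a non-Linux system). */
long private_dirty_kb(void)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    char line[1024];
    long total = 0;

    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        long kb;
        if (parse_private_dirty(line, &kb))
            total += kb;
    }

    fclose(f);
    return total;
}
```

Calling private_dirty_kb() before and after writing to a large buffer should show the total growing by roughly the size of the buffer.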
There is more to be said about pages, but we've covered all we need in order to investigate the issue, so we will stop here.
The Issue Reported
Now that we know a bit more about pages, we can discuss the reported issue more intelligently. The problem was that the EFL in general, and heavy users of Eo in particular, all of a sudden had a lot of private dirty pages. That is, a lot of pages that are mapped from the library itself (in the executable, not allocated at runtime) were being written to, and thus couldn't be shared and had to be duplicated for each running process; a big issue for heavily used libraries.
RW pages exist for a reason, so it could just be that these were legitimate usages, though judging by the number of pages, this seemed unlikely. The first step was to find what is mapped to these pages, so I started there.
Finding the Memory Users
Unfortunately, while nm is a very useful command for mapping symbols to memory regions, and so tells me which symbols are (most likely) mapped to RW pages, it doesn't (and can't) indicate which symbols map to dirty pages. Even more unfortunately, I am not aware of any tool that provides that information (please let me know if you know of one!). I was just about to write one when I saw that all of the relevant RW pages in my test case were dirty, so any memory in those would be relevant.
Note: in order to check which variables got mapped to RW pages I used pmap. Using this tool I was able to see the address range of each mapping, and using some debug output I was able to get the addresses of the symbols in question, and so I could tell which ones were stored in RW pages. Using pmap is very easy; more on that in the next section.
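The same lookup can also be done from inside the process, since pmap essentially presents what the kernel lists in /proc/<pid>/maps. The following is a rough sketch, not the debug code I actually used: it finds the mapping that contains a given address and reports its permissions. The function names are made up for this example.

```c
#include <stdio.h>
#include <stdint.h>

/* Parses the start of a /proc/<pid>/maps line: "start-end perms ...".
 * Returns 1 on success, 0 if the line does not have that shape. */
int parse_maps_line(const char *line, unsigned long *start,
                    unsigned long *end, char perms[5])
{
    return sscanf(line, "%lx-%lx %4s", start, end, perms) == 3;
}

/* Copies the permission string (e.g. "r--p" or "rw-p") of the mapping
 * containing `addr` into `perms`. Returns 0 on success, -1 if the address
 * was not found or /proc/self/maps could not be opened. */
int mapping_perms(const void *addr, char perms[5])
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[1024];
    uintptr_t target = (uintptr_t) addr;
    int found = -1;

    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end;
        char p[5];

        if (!parse_maps_line(line, &start, &end, p))
            continue;
        if (target >= start && target < end) {
            snprintf(perms, 5, "%s", p);
            found = 0;
            break;
        }
    }

    fclose(f);
    return found;
}
```

Passing the address of a suspicious symbol, for example mapping_perms(&some_symbol, perms), should tell you whether it landed in a writable mapping: the second character of perms is 'w' for writable mappings and '-' otherwise.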
I already knew which structures were the largest in Eo, so I decided to guess and see if they were mapped to RW pages or RO. My guess was spot on, and I found a few symbols that should be RO, but were actually RW. For example:
static const Efl_Event_Description *_event_desc[] = {
// SNIP ...
};
This is a common mistake due to the confusing syntax C uses for const. This is an array of pointers to const Efl_Event_Description. It may look correct at first glance, until you realise the array itself is not constant. It should be:
static const Efl_Event_Description * const _event_desc[] = {
// SNIP ...
};
This change saved us a few pages in the more event-heavy areas, which is a first step, but the problem was still there, so the search resumed.
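To make the difference between the two declarations concrete, here is a small sketch. Event_Desc is a made-up stand-in for Efl_Event_Description, and all of the names are hypothetical.

```c
/* Hypothetical stand-in for Efl_Event_Description. */
typedef struct {
    const char *name;
} Event_Desc;

static const Event_Desc ev_a = { "a" };
static const Event_Desc ev_b = { "b" };

/* Pointers to const data, but the array slots themselves are writable. */
static const Event_Desc *writable_table[] = { &ev_a, &ev_b };

/* Pointers to const data, stored in slots that are const as well. */
static const Event_Desc *const const_table[] = { &ev_a, &ev_b };

/* Compiles fine: only the pointed-to structs are const, not the slots. */
void retarget_first_slot(void)
{
    writable_table[0] = &ev_b;
}

/* The equivalent assignment to const_table[0] is rejected by the
 * compiler, because the slots themselves are const. */
const char *first_name(int use_const_table)
{
    return use_const_table ? const_table[0]->name
                           : writable_table[0]->name;
}
```

Because the first table can legally be modified, the compiler has no choice but to place it in writable memory, regardless of whether anything ever writes to it.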
I then stumbled upon:
static const Efl_Class_Description _class_desc = {
// SNIP ...
};
This looks innocent. The Efl_Class_Description type is a struct, and const was correctly applied. This should definitely have been RO, but for some reason it was put in an RW page. Seeing this, and other similar structures, I knew I had found what I was looking for; now I just needed to figure out why it was happening.
After thinking about it for a bit, and considering a few different ideas, I suspected it was related to the fact that while these structures were constant, some of their fields referred to other symbols. In some cases, due to relocation, the address of such a symbol is only known at load time, so the dynamic linker has to write it into the structure at runtime, and thus the pages can't be marked RO. This could have been easily checked with nm, though I hadn't thought of it at the time, so I went on investigating by other means. I ended up writing a small self-contained example to find out whether I was right.
I compiled my example with both gcc and clang. Unfortunately gcc gave me some less than optimal results, so I will use clang in my examples.
Checking my Hypothesis
In order to check my hypothesis I wrote a small program (issue.c):
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 4096
#define ALLOC_SIZE (PAGE_SIZE * 1000)

typedef struct
{
    const void *invalidater;
    const char data[ALLOC_SIZE];
} Invalid;

static const char ro[ALLOC_SIZE];
static const Invalid rw = { NULL, { 0 }};

int main(void)
{
    printf("%d\n", (int) getpid()); // Print the PID so we can pass it to pmap
    printf("%p %p\n", (const void *) ro, (const void *) &rw); // So they are not optimised out
    scanf("\n"); // Keep the program running
    return 0;
}
This program attempts to allocate two variables:
ro: 1000 pages of read-only memory.
rw: 1000 pages (and one pointer) of read-only memory that I suspected was going to be RW.
Upon running it prints its PID and then waits.
Now I can run it and inspect exactly what's going on. For that I will use pmap (and redact some of the irrelevant output).
Let's compile and run our program:
$ clang issue.c
$ ./a.out
11835
0x4006a0 0x7e86a0
And then in another terminal:
$ pmap 11835
11835: ./a.out
0000000000400000 8004K r-x-- a.out
0000000000dd0000 4K rw--- a.out
As you can see, both variables have been mapped to RO pages (the first mapping). This is what we expected (though it wasn't the case with GCC), because we don't rely on anything that isn't known at compile time. This was just a test to see that everything works.
Now we are going to change the program to check my hypothesis. We will change the NULL in the declaration of rw to some symbol which may be relocated, for example strlen, and then compile and run again:
$ clang issue.c
$ ./a.out
11941
0x4006e0 0x7e86e0
And then in another terminal:
$ pmap 11941
11941: ./a.out
0000000000400000 8004K r-x-- a.out
0000000000dd0000 4K rw--- a.out
And... it still works. At this point I started to question myself: maybe I was wrong and something else was going on.
Then I realised there was still one difference between my test case and the libraries exhibiting the issue that might be relevant: they are libraries, and thus are built as position independent code. So I tested once more, this time with PIC enabled:
$ clang issue.c -fPIC
$ ./a.out
12002
0x4006d0 0x9e8818
And then in another terminal:
$ pmap 12002
12002: ./a.out
0000000000400000 4004K r-x-- a.out
00000000009e8000 4004K rw--- a.out
Voila! We managed to replicate the issue.
Verifying with nm
As I mentioned before, this would have been easy to verify with nm, so I'll also show that for completeness. Even with nm, though, I would have needed to enable PIC to trigger the issue.
Relevant nm output for the RO (issue not present) case:
$ nm -f sysv ./a.out
rw |07e86e0| r | OBJECT|03e8008| |.rodata
As you can see, rw is put into the .rodata section, that is, read-only data.
Relevant nm output for the RW (issue present) case:
$ nm -f sysv ./a.out
rw |09e8818| d | OBJECT|03e8008| |.data.rel.ro
Here, rw is put into the .data.rel.ro section, a section that is read-only after relocation, which means it is not truly read-only: it has to be writable while the relocations are applied.
My Pages are RO and not RW
I got reports from two different people (thanks Daniel Hirt and Mark Mossberg!) that their pmap output looked something like this:
$ pmap 10214
10214: ./a.out
0000000000400000 4004K r-x-- a.out
00000000009e8000 4004K r---- a.out
0000000000dd1000 4K rw--- a.out
The reason for that is most likely linker differences.
One way to verify this is indeed the case:
$ strace ./a.out
... SNIP ...
mprotect(0x9e8000, 4100096, PROT_READ) = 0
... SNIP ...
As you can see, the program is calling mprotect to change the pages (look at the address) to be RO. If you read the previous section about nm, you probably saw that rw was put into the "read-only after relocation" section, which means the dynamic linker is allowed, and encouraged, to mark the pages read-only once it has finished applying the relocations.
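The pattern of "write first, then drop write permission" can be imitated by hand. The sketch below is not what the dynamic linker actually runs; it is a simplified illustration using mmap and mprotect on a single anonymous page, and make_read_only_copy is a made-up name.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict compiler modes */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocates one page, copies `msg` into it while the page is writable,
 * then drops write permission with mprotect(). Returns NULL on failure.
 * The page is intentionally never unmapped in this sketch. */
const char *make_read_only_copy(const char *msg)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = strlen(msg) + 1;
    char *mem;

    if (page <= 0 || len > (size_t) page)
        return NULL;

    mem = mmap(NULL, (size_t) page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    /* Stands in for the relocation step: this write dirties the page. */
    memcpy(mem, msg, len);

    if (mprotect(mem, (size_t) page, PROT_READ) != 0) {
        munmap(mem, (size_t) page);
        return NULL;
    }

    return mem; /* readable, but no longer writable */
}
```

Note that the page was written to before it became read-only, so it is still a private page of this process. Marking it RO afterwards protects it from modification, but does not make it shareable again.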
As mentioned above, I took a shortcut and checked for RW pages instead of private dirty pages, because the two overlapped completely in my case. This shortcut may not work for you, though the nm output should still give you the information you need.
Solving the Issue
Two solutions come to mind. The first is to separate the "truly constant" values from the relocatable values. For example, if we had a struct like:
struct {
    int ro;
    void *symbol;
};
we could split it into two separate structs.
Alternatively, reconsider parts of the design: maybe storing the pointers in the struct is not even needed, and it's enough to pass them to a function. This way the structures won't be mapped to RW memory; instead, the values are temporarily stored on the stack before the function invocation, reducing memory usage.
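As a rough illustration of the first approach, a description type could be split along these lines. All of the type, field and function names here are invented for the example and are not the actual Eo structures.

```c
#include <stddef.h>

/* Part 1: plain values only. Nothing here stores an address, so no
 * relocations are needed for objects of this type. */
typedef struct {
    int version;
    unsigned int flags;
    size_t data_size;
} Desc_Plain;

/* Part 2: the fields that hold addresses of other symbols. */
typedef struct {
    const char *name;
    void (*constructor)(void *obj);
} Desc_Pointers;

static void noop_constructor(void *obj) { (void) obj; }

/* Should be able to stay in .rodata, even in position independent code. */
static const Desc_Plain my_class_plain = { 2, 0x1u, 64 };

/* Still needs relocations, but it is now only two pointers in size. */
static const Desc_Pointers my_class_pointers = { "My_Class", noop_constructor };

const Desc_Plain *my_class_plain_get(void) { return &my_class_plain; }
const Desc_Pointers *my_class_pointers_get(void) { return &my_class_pointers; }
```

The split does not remove the relocations; it only shrinks the amount of data that has to live next to them. For large descriptions made mostly of plain values, that can be the bulk of the structure.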
I have already reduced some memory in Eo users by using the second method. I will soon complement that with the first method to reduce it even further. Preliminary tests show significant reduction in memory usage, so a big win.
Lessons Learned
While I knew the theory behind it, I was surprised to see how code I assumed would be mapped to RO pages ended up in RW pages that get dirty immediately. More specifically, any structure that has a pointer to anything, even to a constant string, like:
static const char * const rw = "test";
would end up in RW pages. Review your libraries and make sure you are not wasting memory, and remember that heap memory is not the only memory that can be wasted.
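For the string case specifically, one commonly suggested alternative is to declare an array instead of a pointer, so that no address needs to be stored at all. A minimal sketch follows, with made-up names; whether the compiler keeps or optimises away the pointer variable may vary.

```c
/* A const pointer to a string literal: the variable stores an address,
 * so in position independent code it needs a relocation at load time. */
static const char *const name_ptr = "test";

/* A const array: the characters are stored in the object itself.
 * No address is stored, so no relocation is needed. */
static const char name_arr[] = "test";

const char *name_via_pointer(void) { return name_ptr; }
const char *name_via_array(void) { return name_arr; }

/* Size of the array object, including the terminating NUL byte. */
unsigned long name_array_size(void) { return sizeof(name_arr); }
```

Both variables can be used the same way in most code. The main visible difference is that sizeof on the array gives the string's size rather than the size of a pointer.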
Please let me know if you spotted any mistakes or have any suggestions, and follow me on Twitter or RSS for updates.