Recovering Data From A Corrupt tar Archive

My backup tar archive got corrupted. This is how I saved my data…

A few days ago, a bug in Orgzly, the task list app I use on my phone, wiped all of my tasks. This was terribly annoying, but I wasn't too concerned because I knew I had backups. Or at least I thought I did.

I intend to write a post about my phone's backup setup in the future, but in short, it works like this: I turn on "backup to file" in apps that support it, use syncthing to copy the files to my home server, and then use duplicity to securely back everything up to a remote location. I have yet to find a simpler, secure, and privacy-respecting backup solution for Android. If you have any suggestions, please let me know.

After realising that my task app had wiped everything, I quickly checked the local file backup on my phone. Unfortunately, it had already been updated and was wiped as well.

Syncthing on my phone is configured to sync only while charging, so I checked my home server next. Unfortunately, I was too late: it had already synced, so the copy there was empty as well.

At least, I thought to myself, I still had my duplicity backups and could just restore from those. I opened duplicity only to find that it was not configured to back up the syncthing directory, so there was no backup there either! I occasionally verify that my backups actually work (that I can restore from them), but I had never noticed that this important directory was missing from them.

At this point I was starting to worry that I'd lose all of my (many!) tasks, but then I remembered that I had upgraded my phone's firmware a few days earlier and had a full NANDroid backup from then. However, the files I was looking for were not showing up in the archive, and tar was reporting an error. This is where our story begins.

The Issue

As I said above, the files I was looking for weren't showing up (though some other files were), and tar was reporting the following error:

tar: Malformed extended header: missing equal sign

I searched the internet for this error message but found no solutions, only another user reporting what seems to be the same issue. After reading a bit more, it looks like TWRP (the Android recovery I made the backup with) had a bug that corrupted backups in some cases. In my case it only affected some of the files, but the ones I cared about were among them.
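The error message itself hints at what broke. As far as I can tell, it comes from tar's handling of pax extended headers, which carry extra metadata such as long names and precise timestamps. The payload of such a header is a sequence of records of the form `LENGTH keyword=value` followed by a newline, where LENGTH is the decimal length of the whole record, including the length field itself and the newline. A record with no `=` is what "missing equal sign" complains about. Below is a minimal sketch, using a hypothetical function name, `check_pax_records`. It checks a payload you have already cut out of the archive and assumes no value contains an embedded newline.

```shell
# check_pax_records FILE
# Report malformed records in a pax extended-header payload.
# Simplification: assumes one record per line (no newlines inside values).
check_pax_records() {
    # Drop the NUL padding that fills the header's last 512-byte block, then
    # check each record; LC_ALL=C makes awk's length() count bytes.
    tr -d '\000' < "$1" | LC_ALL=C awk '
        !/^[0-9]+ [^=]+=/          { print NR ": missing equal sign: " $0; next }
        length($0) + 1 != $1 + 0   { print NR ": length mismatch: " $0 }
    '
}
```

A record whose length field is wrong also tends to end up as "missing equal sign" in practice, because the parser then starts reading the next record from the wrong place.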

I then searched the web for "corrupted tar recovery" and similar terms. I found some suggestions and ideas on how to solve it, but nothing worked. I tried opening the archive with GNU tar, bsdtar, busybox and cpio, to no avail. At this point I got a bit desperate and decided to check whether the data was even there. To do that, I ran grep on my 2GB backup file (data.tar).

$ grep /data/data/com.orgzly data.tar
Binary file data.tar matches

I was excited to see that the files seemed to be there. To verify that they actually contained my data, I opened the archive with less, searched for the above string and looked around. My data was there!

The Solution

I remembered that the tar archive format is quite simple, so I initially planned to write a small Python program that would parse the archive and help me extract my file. However, I then realised that this wasn't necessary: I could do everything I needed from the shell.
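For reference, here is roughly what parsing involves. A ustar header is a 512-byte block with the file name in its first 100 bytes and the file size as an octal ASCII number in the 12 bytes at offset 124. The file's data follows in 512-byte blocks, with the last one zero-padded. This is a shell sketch with hypothetical helper names, `tar_field` and `tar_entry_info`. It ignores the GNU and pax extensions, such as long names, the name prefix field, and base-256 sizes.

```shell
# tar_field FILE OFFSET LENGTH: raw header field with NUL padding removed
tar_field() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\000'
}

# tar_entry_info ARCHIVE OFFSET
# Print "NAME SIZE NEXT" for the ustar header at byte OFFSET, where NEXT is
# the offset of the following header.
tar_entry_info() {
    name=$(tar_field "$1" "$2" 100)
    size_octal=$(tar_field "$1" $(( $2 + 124 )) 12 | tr -d ' ')
    size=$(printf '%d' "0$size_octal")   # leading 0 => parsed as octal
    # Data starts right after the 512-byte header and is padded to 512 bytes.
    next=$(( $2 + 512 + (size + 511) / 512 * 512 ))
    echo "$name $size $next"
}
```

For example, `tar_entry_info data.tar 0` would describe the first entry. In a corrupted archive you would start from a known-good header offset instead.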

Let's start by finding my data in the file. As you may remember, I used less earlier to check whether my data was there. While searching there, I noticed that my file (/data/data/com.orgzly/databases/orgzly.db) was the second match, and that the next match came just after what seemed to be the end of my file. Based on these observations, I ran grep to find the offsets and got the following:

$ grep --byte-offset --only-matching --text /data/data/com.orgzly/databases/orgzly.db data.tar
... snip ...

Based on this, I concluded that the relevant tar entry starts at byte 2095004675 and ends at byte 2095349251, which makes it 344576 bytes long. I then used dd to extract this chunk to make the file easier to work with.
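Reading the offsets off grep's output by hand works, but it is easy to automate. Here is a sketch with a hypothetical `nth_offset` helper, plus the arithmetic behind dd's `count=`. I added `--fixed-strings` so the dots in the path aren't treated as regex wildcards.

```shell
# nth_offset FILE STRING N: byte offset of the Nth occurrence of STRING in FILE
# (hypothetical helper; relies on grep's -b/-o/-a/-F options).
nth_offset() {
    grep --byte-offset --only-matching --text --fixed-strings -- "$2" "$1" \
        | sed -n "${3}p" | cut -d: -f1
}

# The numbers from my archive: the entry starts at the second match of the
# path, and the next match is just past the end of the file's data.
start=2095004675
end=2095349251
count=$(( end - start ))
echo "$count"   # 344576
```

With the helper, the values could be computed as `start=$(nth_offset data.tar /data/data/com.orgzly/databases/orgzly.db 2)` and `end=$(nth_offset data.tar /data/data/com.orgzly/databases/orgzly.db 3)`. This assumes the match layout is the same as in my archive.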

$ dd skip=2095004675 count=344576 bs=1 if=data.tar of=orgzly.db
344576+0 records in
344576+0 records out
344576 bytes (345 kB, 336 KiB) copied, 0.393915 s, 875 kB/s
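With `bs=1`, dd reads and writes a single byte at a time, which is why it only managed around 875 kB/s. That is fine for a few hundred kilobytes but slow for bigger chunks. GNU dd can do the same job with a large block size via `iflag=skip_bytes,count_bytes`. A more portable sketch uses tail and head instead. The function name is hypothetical, and `head -c` isn't strictly POSIX, though GNU, BSD and busybox all support it.

```shell
# extract_range FILE START COUNT: copy COUNT bytes starting at byte START
# (0-based, like dd's skip=) to stdout.
extract_range() {
    # tail -c +N counts from 1, hence the +1.
    tail -c +"$(( $2 + 1 ))" "$1" | head -c "$3"
}
```

Usage for the chunk above would be `extract_range data.tar 2095004675 344576 > orgzly.db`.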

I now needed to remove the tar headers. One way to do this is to locate the tar header, parse it, extract the file's length and use that. However, I had a simpler idea: I could just find the beginning of the file using its magic number, since every SQLite database starts with the string "SQLite format 3" followed by a NUL byte.

$ grep --byte-offset --only-matching --text SQLite orgzly.db
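Searching for the full "SQLite format 3" string is a bit more specific than "SQLite", which could also appear inside other data. There is also a cheap sanity check on the result. tar stores each file's data in 512-byte blocks counted from the start of the archive, so the database's absolute offset should be a multiple of 512. In my case the header was 509 bytes into the chunk, and 2095004675 + 509 = 2095005184 = 4091807 × 512. Here is a sketch, with `sqlite_offset` and `block_aligned` as hypothetical names.

```shell
# sqlite_offset FILE: byte offset of the first SQLite header in FILE
sqlite_offset() {
    grep --byte-offset --only-matching --text --fixed-strings \
        'SQLite format 3' "$1" | head -n 1 | cut -d: -f1
}

# block_aligned OFFSET: succeed if OFFSET falls on a 512-byte tar block
block_aligned() {
    [ $(( $1 % 512 )) -eq 0 ]
}

# My numbers: the chunk started at 2095004675 in data.tar and the header
# was found 509 bytes into it.
if block_aligned $(( 2095004675 + 509 )); then
    echo "aligned: looks like the start of the file's data"
else
    echo "not aligned: probably a false match" >&2
fi
```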

The match was at offset 509, so I trimmed the file accordingly and tried opening it with sqlite:

$ dd skip=509 bs=1 if=orgzly.db of=orgzly.fixed.db
344067+0 records in
344067+0 records out
344067 bytes (344 kB, 336 KiB) copied, 0.397661 s, 865 kB/s

$ sqlite3 orgzly.fixed.db
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> .tables
android_metadata         note_properties          repos
book_links               notes                    rook_urls
book_syncs               notes_view               rooks
books                    org_ranges               searches
books_view               org_timestamps           times_view
current_versioned_rooks  properties               versioned_rooks
db_repos                 property_names
note_ancestors           property_values

And it worked!
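Because the chunk ran up to the next grep match, the file I opened most likely still had a few stray bytes after the real database. 344067 is 84 × 4096 + 3, so with the common 4096-byte page size there would be 3 extra bytes. SQLite opened it anyway. If you want an exact copy, though, the database header records the real size. Per the SQLite file format, the page size is a 2-byte big-endian integer at offset 16 (where the value 1 means 65536), and the database size in pages is a 4-byte big-endian integer at offset 28. SQLite only trusts that page count when it is non-zero and the change counter at offset 24 matches the number at offset 92, so this sketch checks the same. The names `be_uint` and `sqlite_expected_size` are hypothetical.

```shell
# be_uint FILE OFFSET NBYTES: read a big-endian unsigned integer
be_uint() {
    n=0
    for b in $(od -An -tu1 -j "$2" -N "$3" "$1"); do
        n=$(( n * 256 + b ))
    done
    echo "$n"
}

# sqlite_expected_size DBFILE: database size in bytes according to its header
sqlite_expected_size() {
    page_size=$(be_uint "$1" 16 2)
    if [ "$page_size" -eq 1 ]; then page_size=65536; fi
    pages=$(be_uint "$1" 28 4)
    # Only use the in-header page count when SQLite itself would trust it.
    if [ "$pages" -eq 0 ] || [ "$(be_uint "$1" 24 4)" -ne "$(be_uint "$1" 92 4)" ]; then
        echo "in-header database size not usable" >&2
        return 1
    fi
    echo $(( page_size * pages ))
}
```

Usage would be something like `size=$(sqlite_expected_size orgzly.fixed.db) && head -c "$size" orgzly.fixed.db > orgzly.trimmed.db`.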

Having recovered the database, I copied it to my phone, fixed its permissions and SELinux attributes, and started Orgzly. Everything worked, and my tasks were saved!

Please let me know if you spotted any mistakes or have any suggestions, and follow me on Twitter or RSS for updates.
