Retaining History When Moving Files Across Repositories in Git

Splitting or merging repositories does not have to cause loss of history

As my friends and colleagues know, I think the history of a project is very important for development. Being able to bisect, blame or read the log has proven very useful for finding bugs and understanding why a piece of code was written the way it was. Therefore, it makes sense that I would do whatever I can to make sure history is preserved when moving files across repositories.

Luckily for us, git has made it extremely easy.

Merging repositories

Merging a repository (bar) into another repository (foo) is easy.

$ cd /path/to/foo
$ # To use a local copy, replace the URL with: file:///path/to/bar/.git
$ git remote add bar https://git.domain.com/bar.git
$ git fetch bar
$ git merge --allow-unrelated-histories bar/master

That is it. It is very simple and retains all of the history from bar while keeping the same commit hashes! This means that, for example, daed567e will point to the same commit in both foo and bar.
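If you want to see this for yourself before touching a real project, here is a minimal sketch that builds two throwaway repositories in a temporary directory, merges one into the other, and checks that the commit from bar keeps its hash. The file names, commit messages and demo identity are all made up for illustration, and it fetches from a local path instead of a URL.

```shell
#!/bin/bash
# Throwaway demo: every path, file name and commit message here is made up.
set -e
work=$(mktemp -d)
cd "$work"

# Fixed identity so commits work even on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Create two unrelated repositories, each with a single commit on master.
for name in foo bar; do
    git init -q "$name"
    git -C "$name" symbolic-ref HEAD refs/heads/master
    echo "$name" > "$name/$name.txt"
    git -C "$name" add "$name.txt"
    git -C "$name" commit -q -m "Initial commit of $name"
done

# Remember the hash of bar's commit before merging.
bar_hash=$(git -C bar rev-parse HEAD)

# Merge bar into foo as described above, using a local path as the remote.
cd foo
git remote add bar "$work/bar"
git fetch -q bar
git merge -q --allow-unrelated-histories -m "Merge bar into foo" bar/master

# The commit from bar should now be reachable from foo's master, hash unchanged.
if git merge-base --is-ancestor "$bar_hash" HEAD; then
    echo "bar commit $bar_hash is part of foo"
fi
```

Running git log --graph --oneline in foo afterwards shows both histories joined by the merge commit.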

Unfortunately it is not always that simple. Sometimes you may face conflicts: if, for example, both repositories have a README file, the merge operation will fail. Luckily, this is also easy to solve.

First, abort the failed merge (if you already tried to merge):

$ git merge --abort

Now switch to a temporary branch that holds bar:

$ git checkout -b barmaster bar/master

Now you can deal with the conflicting files by removing them, by moving all of bar into a directory such as bar_directory, or by renaming them individually. Commit the result on this branch before moving on.
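For the "move everything into a directory" option, something along these lines should do. This is only a sketch: move_into_subdir is a helper name I made up, it assumes you run it from the top of the repository while on the temporary branch, and it does not try to handle file names containing newlines or other unusual characters.

```shell
#!/bin/bash
# Hypothetical helper: move every tracked top-level entry of the current
# branch into the directory named by $1, then commit the result.
move_into_subdir() {
    mkdir -p "$1"
    # List the top-level tracked entries (files and directories) of HEAD.
    git ls-tree --name-only HEAD |
        while IFS= read -r entry; do
            # Skip the target itself in case it is already tracked.
            if [ "$entry" = "$1" ]; then
                continue
            fi
            git mv -- "$entry" "$1/"
        done
    git commit -q -m "Move all files into $1/"
}
```

On the barmaster branch you would call it as move_into_subdir bar_directory. Since git mv records the moves as renames, git log --follow can usually still trace an individual file back through them.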

We can finally switch back to master and merge our branch again:

$ git checkout master
$ git merge --allow-unrelated-histories barmaster

We are done. Do not forget to push your changes.

Update: Starting from version 2.9, git requires --allow-unrelated-histories for the above merges. Thanks a lot to Jeff Evans for the correction.

Splitting repositories

Splitting repositories is slightly more involved compared to merging them, because in this case we would like to remove all of the unrelated files and commits from history so our new repository is clean.

There are two approaches for this stage: the whitelist (we keep only a given list of files) and the blacklist (we keep everything except a given list of files). I prefer the whitelist approach, so I will only cover that one.

For this example we will split bar out of foobar.

Let us start by switching to a temporary branch we can work on.

$ git checkout -b tmp

Now we need to decide which files we would like to preserve. In this example those are README.md and the src directory.

Next we create a script that moves these files into a new temporary directory, and run it on every commit in our repository's history.

#!/bin/bash

mkdir -p newroot/

# Redirect stderr to silence "No such file" errors in commits where a file does not exist yet.
mv README.md newroot/ 2>/dev/null
mv src newroot/ 2>/dev/null

true

Now run the script on our history:

$ git filter-branch -f --prune-empty --tree-filter /path/to/script HEAD

After that, our tmp branch should contain a directory called newroot that holds all of the files we wish to preserve. If we spot an issue, we can just reset the branch to its initial state (git reset --hard master) and try again. Otherwise, we can move on to the next step: filtering the repository down to only this directory.

$ git filter-branch --prune-empty -f --subdirectory-filter newroot
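To check that the two passes do what you expect before running them on real history, you can try them on a toy repository first. The sketch below is just that: the repository layout, file contents and commit messages are invented, the filter script is written to a temporary keep.sh, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning and delay that newer versions of git print before filter-branch runs.

```shell
#!/bin/bash
# Throwaway demo of the two filter-branch passes; all file names are made up.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/foobar"
cd "$work/foobar"
git symbolic-ref HEAD refs/heads/master

# Two commits: one touches files we keep, the other only files we drop.
mkdir src docs
echo "readme" > README.md
echo "int main(void) { return 0; }" > src/main.c
git add README.md src
git commit -q -m "Add bar sources"
echo "unrelated" > docs/notes.txt
git add docs
git commit -q -m "Add unrelated docs"

# The whitelist script from above, written to a temporary location.
cat > "$work/keep.sh" <<'EOF'
#!/bin/bash
mkdir -p newroot/
mv README.md newroot/ 2>/dev/null
mv src newroot/ 2>/dev/null
true
EOF
chmod +x "$work/keep.sh"

# Run both passes on a scratch branch, leaving master untouched.
git checkout -q -b tmp
git filter-branch -f --prune-empty --tree-filter "$work/keep.sh" HEAD >/dev/null
git filter-branch -f --prune-empty --subdirectory-filter newroot >/dev/null

# Inspect the result: the files that remain and the commits that survived.
git ls-files
git log --format=%s
```

In this toy case only README.md and src/main.c should be listed, and the commit that touched nothing but docs should be gone from tmp while master still has both commits.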

Assuming everything is correct we can go on and push it to our new repository as master.

$ git remote add bar git+ssh://git@git.domain.com/bar.git
$ git push bar tmp:master

That is it. You have now split bar out of foobar. The last thing left to do is to delete the bar-related files from our foobar repository and commit the change.
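That last cleanup step could look roughly like this. cleanup_after_split is a hypothetical helper name, and it assumes the scratch branch is called tmp, the main branch is master, and that you pass it the paths you no longer want in foobar.

```shell
#!/bin/bash
# Hypothetical helper: run from the top of foobar once bar has been pushed.
# Arguments are the paths that should no longer live in foobar.
cleanup_after_split() {
    # Go back to the untouched branch and drop the rewritten scratch branch.
    git checkout -q master
    git branch -q -D tmp
    # Remove the files that now live in bar and record the change.
    git rm -r -q -- "$@"
    git commit -q -m "Remove files that moved to bar"
}
```

For the example above that would be cleanup_after_split src, plus README.md if foobar should not keep its own copy. Note that this only removes the files going forward; foobar's existing history still contains them, which is usually what you want.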

Moving arbitrary files between repositories

Moving arbitrary files is very easy once you consider that it is just a split from one repository followed by a merge into another. For this reason I will not elaborate further; just follow the two sections above.

Finishing notes

This is a very simple guide. In more complex cases you will probably have to write more elaborate scripts or use some optimisation techniques. I suggest you also take a look at my slides from a talk I gave about migrating the Enlightenment project from SVN to git. They contain some useful tips and tricks, especially if you have a big project with a very rich history.

Please let me know if you encountered any issues or have any suggestions.
