A little of my Git background. I've been using Git, the free and open source distributed version control system, since at least 2007. It started out as an easier way to get a copy of the Linux kernel, but it wasn't until I worked on improving the one wire system in the kernel that I really started to see the advantage of Git over any revision control software I had previously used. The one wire system gives access to the one wire bus, which talks to some tiny temperature sensors. The driver could deadlock, crash, and eat up the CPU, and fixing it took some 30 individual bug fixes and feature improvements.

I've always seen and used revision control software (configuration management software, source control software, whatever you want to call it) as a way to record what you did, get back what it used to look like, and show the diffs between the two. In developing the one wire changes I found out that Git can improve the source code not just by adding more changes on top of what is already there; it also lets you go back and edit the history. Editing the history is a really important feature for open source software because it lets you freely experiment and add debugging routines without worrying about the history you are building up. Once things are working, you can go back and clean it up. That cleanup can mean removing those debug routines, logically reordering the changes, and whatever else it takes to make the history a pristine set of logical bug fixes and features when you submit them for inclusion back to the owner. It's also sometimes easier to go back and fix a bug a few revisions earlier than it is to add the fix on top. I know it isn't just the history editing feature that makes Git my preferred revision control software; overall, Git just fits into my workflow better.

Git grafting one history onto a commit of another history.
Being a distributed version control system means Git is designed for a repository to be cloned, worked on in isolation, and later merged back as easily as possible. The problem is that this doesn't work at all when one repository wasn't cloned from the other, that is, when they don't share a common parent commit. Unfortunately, in the corporate world and with any non-distributed version control system, that's often how things work. Someone checks out a copy of the source from one system and imports those files into another repository (because they don't have write access to the first one, because it isn't within the corporate network, or because the project is hosted in Subversion and they just want to use Git). Whatever the reason, later you want to merge the two back together. For the Subversion case that's no longer a problem, because git-svn extracts the entire history and leaves you with a Git repository that can incrementally fetch new revisions later to keep the two up to date. But what about that older fork of the source code and its history? Git, like some other systems, has a way to dump the entire history of a repository to a file and restore it into another repository, but since the two histories won't have a common initial parent, they will be two completely disconnected trees that just happen to live in one repository.
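
A minimal sketch of that problem, using two throwaway repositories with the made-up names target and fork, created in a temporary directory with a dummy identity. The fork's branch is renamed to forked (also made up) so it doesn't collide with the target's default branch. Even though both initial commits contain the same file, a plain export and import leaves the target with two root commits:

```shell
#!/bin/sh
# Sketch: two repositories that never shared a commit stay disconnected
# after a plain fast-export / fast-import. "target", "fork" and "forked"
# are made-up names; everything happens in a throwaway directory.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The repository everything should end up in.
git init -q target
echo hello > target/file.txt
git -C target add file.txt
git -C target commit -q -m "target: initial import"

# A "fork" made by copying the files, not by cloning.
git init -q fork
cp target/file.txt fork/
git -C fork add file.txt
git -C fork commit -q -m "fork: initial import"
git -C fork branch -M forked   # avoid clashing with the target's branch name

# Dump the fork's branch and load it into the target unmodified.
git -C fork fast-export forked > dump.txt
git -C target fast-import --quiet < dump.txt

# Each history still has its own root commit (no parents): prints 2.
git -C target rev-list --max-parents=0 --all | wc -l
```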

Logically, one history is just a branch off some point in the other. Identifying that point alone doesn't help, unless the fork has no branches and no merges, in which case git-format-patch and git-am will import it. But for something more complicated, with branches and merges that you actually want to preserve, this is how I did it. I'll call it grafting one history onto another.
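
For the simple linear case, here is a sketch of the git-format-patch and git-am route on throwaway repositories. The names target and fork are made up, and the fork's root commit is assumed to be the plain copy of the target's files, so it is skipped and only the commits after it are replayed:

```shell
#!/bin/sh
# Sketch: replay a linear fork onto the target with format-patch and am.
# "target" and "fork" are made-up throwaway repositories; the fork's root
# commit is assumed to be the plain copy of the target's files.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q target
echo v1 > target/file.txt
git -C target add file.txt
git -C target commit -q -m "import v1"

# Fork started by copying the files, then two fixes on top.
git init -q fork
cp target/file.txt fork/
git -C fork add file.txt
git -C fork commit -q -m "fork: initial copy"
echo "fix 1" >> fork/file.txt
git -C fork commit -q -am "fix 1"
echo "fix 2" >> fork/file.txt
git -C fork commit -q -am "fix 2"

# One patch per commit after the initial copy...
base=$(git -C fork rev-list --max-parents=0 HEAD)
git -C fork format-patch -q -o "$work/patches" "$base..HEAD"

# ...replayed on the target; am keeps each author, author date and message.
git -C target am -q "$work"/patches/*.patch
git -C target log --oneline
```

Because the patches carry no parent information, this only works when the history is a single straight line; any branches or merges in the fork would be flattened or lost.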

The commands are git-fast-export and git-fast-import, but simply dumping and importing will create a disconnected tree. The first step is to identify where the fork happened, which requires knowing where the repository to be grafted branched off from. Set up a branch at that location in the target repository whose files exactly match the revision the fork started from. All is not lost if they don't exactly match. In that case, check out the initial revision of the repository to be grafted, copy all of its files over (removing any the fork doesn't have), and make a commit in the target so that the two match.
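
A sketch of setting up that branch when the files don't match, again on hypothetical throwaway repositories. The fork point is assumed to be the target's current HEAD, the branch name graft-base is made up, and git-archive stands in for "check out and copy" so that files the fork lacks are removed as well. Matching tree ids at the end confirm the two snapshots are identical:

```shell
#!/bin/sh
# Sketch: create graft-base in the target and make its files match the
# fork's first revision exactly. Repository and branch names are made up;
# the fork point is assumed to be the target's current HEAD.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q target
printf 'a\nb\n' > target/file.txt
echo old > target/gone.txt
git -C target add .
git -C target commit -q -m "upstream release"

# The fork's first revision differs: one edited file, one file dropped.
git init -q fork
printf 'a\nb\nlocal tweak\n' > fork/file.txt
git -C fork add .
git -C fork commit -q -m "fork: initial import"
first=$(git -C fork rev-list --max-parents=0 HEAD)

# Branch at the fork point, then replace its contents with the fork's.
git -C target checkout -q -b graft-base HEAD
git -C target rm -r -q .                  # also drops files the fork lacks
git -C fork archive "$first" | tar -xf - -C target
git -C target add -A
git -C target commit -q -m "match the fork's initial revision"

# Identical tree ids mean identical snapshots.
git -C target rev-parse 'graft-base^{tree}'
git -C fork rev-parse "$first^{tree}"
```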

From the repository to be grafted,

git fast-export --all > dump.txt
$EDITOR dump.txt

Remove the initial commits that aren't wanted, so that the last commit removed has exactly the same set of files as the target branch. For each commit to remove, find the line that starts with the word commit and delete it along with every line up to, but not including, the next line that starts with blob (or the next commit line, if no blob comes first). Note that it's easier to leave the initial blob entries in place than to sort out which ones are used only by the removed commits. Then, in the first commit you keep, modify the line that starts with from. It will currently be a colon followed by an integer, the mark of a commit that was removed. If the commit you keep was the first one in the dump, it has no from line, and one will need to be added right after its commit message (look at a later commit for the syntax). Replace the current value (including the colon) with the SHA-1 commit id of the commit in the target branch that the rest of the dump will be grafted onto. For example, here is the line before and after the change:

from :2
from e4af540ede26d707e5d67ba7a5ed58b3bcac234a
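
If you'd rather not hand-edit that line, a small shell function can make the substitution. This is a sketch: the function name graft_from is hypothetical, it also rewrites merge lines that point at the same mark, and it rewrites every matching line rather than only the first, which is what you want if more than one branch started from the removed commit. References to any other removed marks still need fixing by hand.

```shell
#!/bin/sh
# graft_from FILE OLD NEW  (hypothetical helper)
# Rewrite every "from OLD" (first parent) and "merge OLD" (other parent)
# line in a fast-export dump so it points at commit NEW instead.
# OLD is the mark of the last removed commit, e.g. ":2";
# NEW is the SHA-1 of the graft point in the target repository.
graft_from() {
    file=$1 old=$2 new=$3
    # Anchor both ends so ":2" does not also match ":22".
    sed -e "s/^from $old\$/from $new/" \
        -e "s/^merge $old\$/merge $new/" \
        "$file" > "$file.new" &&
    mv "$file.new" "$file"
}
```

Usage for the example above:

graft_from dump.txt :2 e4af540ede26d707e5d67ba7a5ed58b3bcac234a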

Then in the target repository,

git fast-import < dump.txt

and look at your now unified repository. In the future, just use Git for everything and always clone a repository, but if you don't, it only takes a little manual tweaking for Git to handle that as well.
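
An end-to-end sketch of the whole graft on throwaway repositories. It is a variation, not the exact manual steps: instead of deleting the fork's initial commit from dump.txt by hand, it excludes that commit on the git-fast-export command line and uses the --reference-excluded-parents option (Git 2.21 or later), which writes the excluded parent as a full SHA-1 on the from line; sed then swaps in the graft point. The repository names, the graft-base and forked branches, and the dummy identity are all made up, and only the forked branch is exported rather than --all.

```shell
#!/bin/sh
# End-to-end sketch of the graft on throwaway repositories. Variation on
# the manual edit: the fork's initial commit is excluded on the command
# line, and --reference-excluded-parents (Git 2.21+) makes fast-export
# write that commit's SHA-1 on the next commit's "from" line.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Target: upstream history; graft-base marks the point the fork copied.
git init -q target
echo v1 > target/file.txt
git -C target add file.txt
git -C target commit -q -m "upstream v1"
git -C target branch graft-base

# Fork: an unrelated repository that started from a copy of v1.
git init -q fork
cp target/file.txt fork/
git -C fork add file.txt
git -C fork commit -q -m "fork: copy of v1"
git -C fork branch -M forked
echo "fork fix" >> fork/file.txt
git -C fork commit -q -am "fork fix"

first=$(git -C fork rev-list --max-parents=0 forked)
base=$(git -C target rev-parse graft-base)

# Export all of forked except its initial copy, then point the dangling
# parent at the graft base instead.
git -C fork fast-export --reference-excluded-parents "^$first" forked |
    sed "s/^from $first\$/from $base/" > dump.txt
git -C target fast-import --quiet < dump.txt

# forked now continues directly from graft-base.
git -C target log --oneline --graph forked
```

Because the target's graft-base has exactly the files of the excluded commit, the file changes in the dump apply cleanly on top of it and the imported branch ends with the same tree as in the fork.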