Some day you or a collaborator may accidentally commit sensitive data, such as a password or SSH key, into a Git repository. Although you can remove the file from the latest commit with git rm, the file will still exist in the repository's history. Fortunately, there are other tools that can entirely remove unwanted files from a repository's history. This article will explain how to use two of them: git filter-branch and the BFG Repo-Cleaner.
Danger: Once you have pushed a commit to GitHub, you should consider any data it contains to be compromised. If you committed a password, change it! If you committed a key, generate a new one.
This article tells you how to make commits with sensitive data unreachable from any branches or tags in your GitHub repository. However, it's important to note that those commits may still be accessible in any clones or forks of your repository, directly via their SHA-1 hashes in cached views on GitHub, and through any pull requests that reference them. You can't do anything about existing clones or forks of your repository, but you can permanently remove all of your repository's cached views and pull requests on GitHub by contacting GitHub support.
To illustrate how git filter-branch works, we'll show you how to remove the Rakefile from the history of the GitHub gem repository (and add it to .gitignore to ensure that it is not accidentally re-committed).
Clone the GitHub gem repository.
git clone https://github.com/defunkt/github-gem.git
# Initialized empty Git repository in /Users/tekkub/tmp/github-gem/.git/
# remote: Counting objects: 1301, done.
# remote: Compressing objects: 100% (769/769), done.
# remote: Total 1301 (delta 724), reused 910 (delta 522)
# Receiving objects: 100% (1301/1301), 164.39 KiB, done.
# Resolving deltas: 100% (724/724), done.
Navigate to the repository's working directory.
cd github-gem
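Run git filter-branch with --force, which makes it proceed even if a backup from an earlier rewrite already exists. The --index-filter option makes Git rewrite each commit's index without checking out the files, -- --all applies the rewrite to the entire history of every branch and tag, and --tag-name-filter cat keeps the existing tag names while pointing them at the rewritten commits. The filter command 'git rm --cached --ignore-unmatch Rakefile' removes the specified file, and --prune-empty drops any commits left empty as a result. Note that you need to specify the path to the file you want to remove, not just its filename.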
Be careful! This will overwrite your existing tags.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch Rakefile' \
  --prune-empty --tag-name-filter cat -- --all
# Rewrite 48dc599c80e20527ed902928085e7861e6b3cbe6 (266/266)
# Ref 'refs/heads/master' was rewritten
If the file used to exist at any other paths (because it was moved or renamed), you must run this command on those paths, as well.
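One way to find those other paths is to ask Git for every path in history whose name matches the file. The following is a minimal sketch that runs against a throwaway repository; the scratch directory, the tasks/ directory, and the commit identity are made up for illustration, and the pattern '*Rakefile*' is an assumption you would adapt to your own file.

```shell
#!/bin/sh
# Sketch: list every path at which a file has ever existed.
# The scratch repository and the tasks/ directory are hypothetical.
set -e

# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

scratch=$(mktemp -d)
cd "$scratch"
git init -q .

# Commit the file, then move it, so that it exists at two paths in history.
echo "task :default" > Rakefile
git add Rakefile
git commit -q -m "Add Rakefile"
mkdir tasks
git mv Rakefile tasks/Rakefile
git commit -q -m "Move Rakefile into tasks/"

# Print each distinct path that ever matched the pattern, on any ref.
paths=$(git log --all --pretty=format: --name-only -- '*Rakefile*' \
  | sed '/^$/d' | sort -u)
echo "$paths"
```

In this scratch repository the listing should contain both Rakefile and tasks/Rakefile. Because git rm accepts several paths, all of them can be passed to a single filter-branch run instead of rewriting history once per path. A file that was renamed to something the pattern does not match will not show up, so widen the pattern if in doubt.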
Add the Rakefile to .gitignore to ensure that you don't accidentally commit it again.
echo "Rakefile" >> .gitignore
git add .gitignore
git commit -m "Add Rakefile to .gitignore"
# [master 051452f] Add Rakefile to .gitignore
# 1 files changed, 1 insertions(+), 0 deletions(-)
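To confirm that the new rule really matches the file, git check-ignore can report which pattern applies to a given path. This is a small sketch in a throwaway repository; the scratch directory and the tasks/ subdirectory are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that the new .gitignore rule actually matches the file.
# The scratch repository and the tasks/ directory are hypothetical.
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q .

echo "Rakefile" >> .gitignore
echo "task :default" > Rakefile
mkdir tasks
echo "task :default" > tasks/Rakefile

# -v prints the .gitignore file, line number, and pattern that matched.
git check-ignore -v Rakefile tasks/Rakefile

# With -q the command only sets its exit status: 0 means the path is ignored.
if git check-ignore -q Rakefile; then ignored=yes; else ignored=no; fi
echo "Rakefile ignored: $ignored"
```

A pattern without a slash, such as Rakefile, matches at any depth, which is why tasks/Rakefile is ignored as well. Note that check-ignore only reports untracked paths by default, so it is most useful once the file is no longer in the index.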
Double-check that you've removed everything you wanted to from your repository's history, and that all of your branches are checked out.
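One possible check is to ask git log for any commit on a branch or tag that still touches the file; an empty result suggests the rewrite worked. The sketch below builds a throwaway repository (hypothetical file contents, tag name, and commit identity), runs the same filter-branch command, and counts matching commits before and after.

```shell
#!/bin/sh
# Sketch: check that no commit on any branch or tag still touches the file.
# The scratch repository below is hypothetical and exists only for the demo.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Silences the advisory notice (and pause) that newer Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

scratch=$(mktemp -d)
cd "$scratch"
git init -q .
echo "hello" > README
echo "task :default" > Rakefile
git add README Rakefile
git commit -q -m "Initial commit"
echo "more" >> README
git commit -q -a -m "Update README"
git tag v1.0

before=$(git log --branches --tags --oneline -- Rakefile | wc -l | tr -d ' ')

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch Rakefile' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Use --branches --tags rather than --all: until refs/original is deleted,
# --all would also walk the backup refs that still point at the old history.
after=$(git log --branches --tags --oneline -- Rakefile | wc -l | tr -d ' ')
backup=$(git log --all --oneline -- Rakefile | wc -l | tr -d ' ')

echo "commits touching Rakefile: before=$before after=$after via-backup=$backup"
```

The via-backup count stays above zero because git filter-branch keeps the pre-rewrite refs under refs/original until you delete them, so a check that uses --all at this stage would report the old commits even though your branches and tags are clean.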
Once you're happy with the state of your repository, force-push your local changes to overwrite your GitHub repository, as well as all the branches you've pushed up:
git push origin --force --all
# Counting objects: 1074, done.
# Delta compression using 2 threads.
# Compressing objects: 100% (677/677), done.
# Writing objects: 100% (1058/1058), 148.85 KiB, done.
# Total 1058 (delta 590), reused 602 (delta 378)
# To https://github.com/defunkt/github-gem.git
# + 48dc599...051452f master -> master (forced update)
In order to remove the sensitive file from your tagged releases, you'll also need to force-push against your Git tags:
git push origin --force --tags
# Counting objects: 321, done.
# Delta compression using up to 8 threads.
# Compressing objects: 100% (166/166), done.
# Writing objects: 100% (321/321), 331.74 KiB | 0 bytes/s, done.
# Total 321 (delta 124), reused 269 (delta 108)
# To https://github.com/defunkt/github-gem.git
# + 48dc599...051452f master -> master (forced update)
Tell your collaborators to rebase, not merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging.
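For a collaborator, one way to do this is git rebase --onto, which replays only their own commits on top of the rewritten branch. The sketch below simulates the whole exchange locally; the repository names, the topic branch, and the file names are all hypothetical, and a real branch may need conflict resolution that this toy example does not.

```shell
#!/bin/sh
# Sketch: a collaborator replays a local branch onto the rewritten history.
# Repository names, the "topic" branch, and file names are hypothetical.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

scratch=$(mktemp -d)
cd "$scratch"

# Owner publishes a history that contains the sensitive file.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master
git init -q owner
cd owner
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
echo "task :default" > Rakefile
git add README Rakefile
git commit -q -m "Initial commit"
git remote add origin "$scratch/origin.git"
git push -q origin master

# Collaborator clones and starts a branch off the tainted history.
cd "$scratch"
git clone -q origin.git collaborator
cd collaborator
git checkout -q -b topic
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Owner purges the file and force-pushes.
cd "$scratch/owner"
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch Rakefile' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1
git push -q origin --force --all

# Collaborator: remember the old upstream tip, fetch, then rebase (not merge).
cd "$scratch/collaborator"
old_tip=$(git rev-parse origin/master)
git fetch -q origin
git rebase -q --onto origin/master "$old_tip" topic

tainted=$(git log --oneline topic -- Rakefile | wc -l | tr -d ' ')
echo "commits on topic touching Rakefile: $tainted"
```

Recording the old upstream tip before fetching matters because the old and new histories share no commits, so Git cannot work out afterwards which commits belong to the collaborator. Any other local branch that still points at the old history, including the collaborator's own master, needs the same treatment or a reset to the fetched branch.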
After some time has passed and you're confident that git filter-branch had no unintended side effects, you can force all objects in your local repository to be dereferenced and garbage collected with the following commands (using Git 1.8.5 or newer):
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
# Counting objects: 2437, done.
# Delta compression using up to 4 threads.
# Compressing objects: 100% (1378/1378), done.
# Writing objects: 100% (2437/2437), done.
# Total 2437 (delta 1461), reused 1802 (delta 1048)
Note that you can also achieve this by pushing your filtered history to a new or empty repository and then making a fresh clone from GitHub.
The BFG Repo-Cleaner is a faster, simpler alternative to git filter-branch for removing unwanted data. For example, to remove any file named 'Rakefile' (and leave your latest commit untouched), run:
bfg --delete-files Rakefile
To replace all text listed in passwords.txt wherever it can be found in your repository's history, run:
bfg --replace-text passwords.txt
See the BFG Repo-Cleaner's documentation for full usage and download instructions.
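As a rough illustration of what passwords.txt might contain, the sketch below writes a small replacement list. The secrets are made-up placeholders, the REPO variable is hypothetical, and the line syntax (bare strings, "==>" replacements, and the "regex:" prefix) reflects our reading of the BFG documentation, so check it there before relying on it.

```shell
#!/bin/sh
# Sketch: build a replacement list for `bfg --replace-text`.
# The secrets below are made-up placeholders, not real credentials.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# One expression per line. As described in the BFG documentation (an
# assumption to verify there): a bare string is replaced with ***REMOVED***,
# "==>" supplies a custom replacement, and "regex:" marks a regular expression.
cat > passwords.txt <<'EOF'
hunter2
s3cr3t-token==>REDACTED
regex:api_key=\w+==>api_key=
EOF

entries=$(wc -l < passwords.txt | tr -d ' ')
echo "passwords.txt lists $entries expressions"

# Only attempt the rewrite if a `bfg` command is installed (for example a
# wrapper around `java -jar bfg.jar`) and REPO, a hypothetical variable,
# names the repository to clean.
if command -v bfg >/dev/null 2>&1 && [ -n "${REPO:-}" ]; then
  bfg --replace-text "$workdir/passwords.txt" "$REPO"
fi
```

Keep the replacement list outside the repository you are cleaning, since it contains the very secrets you are trying to remove.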
There are a few simple tricks to avoid committing things you don't want committed:
- Use a visual program like GitHub Desktop or gitk to commit changes. Visual programs generally make it easier to see exactly which files will be added, deleted, and modified with each commit.
- Avoid the catch-all commands git add . and git commit -a on the command line; use git add filename and git rm filename to individually stage files instead.
- Use git add --interactive to individually review and stage changes within each file.
- Use git diff --cached to review the changes that you have staged for commit. This is the exact diff that git commit will produce as long as you don't use the