ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.1 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |

| Property | Value |
|---|---|
| URL | https://rtyley.github.io/bfg-repo-cleaner/ |
| Last Crawled | 2026-04-13 16:18:51 (2 days ago) |
| First Indexed | 2014-05-31 03:00:40 (11 years ago) |
| HTTP Status Code | 200 |
| Meta Title | BFG Repo-Cleaner by rtyley |
| Meta Description | A simpler, faster alternative to git-filter-branch for deleting big files and removing passwords from Git history. |
| Meta Canonical | null |
| Boilerpipe Text | $ bfg --strip-blobs-bigger-than 100M --replace-text banned.txt repo.git
The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history: Removing Crazy Big Files Removing Passwords, Credentials & other Private data The git-filter-branch command is enormously powerful and can do things that the BFG can't - but the BFG is much better for the tasks above, because: Faster : 10 - 720x faster Simpler : The BFG isn't particularly clever, but is focused on making the above tasks easy Beautiful : If you need to, you can use the beautiful Scala language to customise the BFG. Which has got to be better than Bash scripting at least some of the time. First clone a fresh copy of your repo, using the --mirror flag: $ git clone --mirror git://example.com/some-big-repo.git
This is a bare
repo, which means your normal files won't be visible, but it is a full copy of the Git database of your repository, and at this point
you should make a backup of it to ensure you don't lose anything. Now you can run the BFG to clean your repository up: $ java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git
The BFG will update your commits and all branches and tags so they are clean, but it doesn't physically delete the unwanted stuff. Examine the repo to make sure your history has been updated, and then use the standard git gc command to strip out the unwanted dirty data, which Git will now recognise as surplus to requirements: $ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Finally, once you're happy with the updated state of your repo, push it back up (note that because your clone command used the --mirror flag, this push will update all refs on your remote server) : $ git push At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you don't want to risk pushing back into your newly cleaned repo.
In all these examples bfg is an alias for java -jar bfg.jar . Delete all files named 'id_rsa' or 'id_dsa' : $ bfg --delete-files id_{dsa,rsa} my-repo.git Remove all blobs bigger than 50 megabytes : $ bfg --strip-blobs-bigger-than 50M my-repo.git Replace all passwords listed in a file (prefix lines 'regex:' or 'glob:' if required) with ***REMOVED*** wherever they occur in your repository : $ bfg --replace-text passwords.txt my-repo.git Remove all folders or files named '.git' - a reserved filename in Git. These often
become a problem when migrating to
Git from other source-control systems like Mercurial : $ bfg --delete-folders .git --delete-files .git --no-blob-protection my-repo.git
For further command-line options, you can run the BFG without any arguments,
which will output text like this .
By default the BFG doesn't modify the contents of your latest commit on your
master (or ' HEAD ') branch, even though it will clean
all the commits before it.
That's because your latest commit is likely to be the one
that you deploy to production, and a simple deletion of a private credential or a big
file is quite likely to result in broken code that no longer has the hard-coded data it
expects - you need to fix that yourself; the BFG can't do it for you. Once you've committed your
changes - and your latest commit is clean with none of the undesired data in it -
you can run the BFG to perform its simple deletion operations over all your historical
commits.
Note: Cleaning Git repos is about completely eradicating bad stuff from history.
If something 'bad' (like a 10MB file, when you're specifying
--strip-blobs-bigger-than 5M) is in a protected commit, it won't be
deleted - it'll persist in your repository,
even if the BFG deletes it from earlier commits. If you want the BFG to delete
something, you need to make sure your current commits are clean.
Note that although the files in those protected commits won't be changed, when those commits follow on from earlier dirty commits, their commit
ids will change, to reflect the changed history - only the SHA-1 id of the filesystem-tree will remain the same.
If you want to turn off the protection (in general, not recommended) you can
use the --no-blob-protection flag:
$ bfg --strip-biggest-blobs 100 --no-blob-protection repo.git The BFG is 10 - 720x faster
than git-filter-branch , turning an overnight job into one that takes less than ten minutes . BFG's performance advantage is due to these factors: The approach of git-filter-branch is to step through every commit in your repository, examining the complete file-hierarchy of each one. For the intended use-cases of The BFG this is wasteful, as we don't care where in a file structure a 'bad' file exists - we just want it dealt with. Inherent in the nature of Git is that every file and folder is represented precisely once (and given a unique SHA-1 hash-id). The BFG takes advantage of this to process each and every file & folder exactly once - no need for extra work. Taking advantage of the great support for parallelism in Scala and the JVM, the BFG does multi-core processing by default - the work of cleaning your Git repository is spread over every single core in your machine and typically consumes 100% of capacity for a substantial portion of the run. All action takes place in a single process (the process of the JVM), so doesn't require the frequent fork-and-exec-ing needed by git-filter-branch 's mix of Bash and C code.
I tried deleting using several "how to" blog entries for git filter-branch, but wasn't successful. Then tried The BFG; worked like a champ - very cool tool!
-- Bill Hunt, CTO at OptTown
I found The BFG Repo-Cleaner and ran it to clean up some large files, and was amazed by the performance.
-- Jason Frey, Software Engineer at Red Hat
I was able to shrink the current repository down to ~500 megabytes in about 10 minutes when using this tool. My hand crafted scripts clock in at 615 megabytes in 3 days time for comparison.
-- Elliot Glaysher, Google Software Engineer on Google Chrome
The BFG was simple to set up and so fast that I had to ask Roberto, "Is that it?" and check for myself... it worked exactly as intended.
-- Nicholas Tollervey, Developer at The Guardian
Roberto's creations (Agit and The BFG) are both very cool ;-)
-- Junio C Hamano, Maintainer of Git
Also see more feedback on Twitter...
That's it - the Scala library and all other dependencies are folded into the downloadable jar. The BFG is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version. The BFG is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details. |
| Markdown | # BFG Repo-Cleaner
## Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
[View project on GitHub](https://github.com/rtyley/bfg-repo-cleaner)
```
$ bfg --strip-blobs-bigger-than 100M --replace-text banned.txt repo.git
```
# an alternative to git-filter-branch
The BFG is a simpler, faster alternative to [`git-filter-branch`](https://git-scm.com/docs/git-filter-branch) for cleansing bad data out of your Git repository history:
- Removing **Crazy Big Files**
- Removing **Passwords**, **Credentials** & other **Private data**
The `git-filter-branch` command is enormously powerful and can do things that the BFG can't - but the BFG is *much* better for the tasks above, because:
- [Faster](https://rtyley.github.io/bfg-repo-cleaner/#speed) : **10 - 720x** faster
- [Simpler](https://rtyley.github.io/bfg-repo-cleaner/#examples) : The BFG isn't particularly clever, but *is* focused on making the above tasks easy
- Beautiful : If you need to, you can use the beautiful Scala language to customise the BFG. Which has got to be better than Bash scripting at least some of the time.
# Usage
First clone a fresh copy of your repo, using the [`--mirror`](https://stackoverflow.com/q/3959924/438886) flag:
```
$ git clone --mirror git://example.com/some-big-repo.git
```
This is a [bare](https://git-scm.com/docs/gitglossary.html#def_bare_repository) repo, which means your normal files won't be visible, but it is a *full* copy of the Git database of your repository, and at this point you should **make a backup of it** to ensure you don't lose anything.
Now you can run the BFG to clean your repository up:
```
$ java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git
```
The BFG will update your commits and all branches and tags so they are clean, but it doesn't physically delete the unwanted stuff. Examine the repo to make sure your history has been updated, and then use the standard [`git gc`](https://git-scm.com/docs/git-gc) command to strip out the unwanted dirty data, which Git will now recognise as surplus to requirements:
```
$ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
```
Finally, once you're happy with the updated state of your repo, push it back up *(note that because your clone command used the `--mirror` flag, this push will update **all** refs on your remote server)*:
```
$ git push
```
At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you *don't* want to risk pushing back into your newly cleaned repo.
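The steps above can be strung together in a small script. What follows is a minimal sketch rather than anything shipped with the BFG: the function name `clean_big_blobs`, the `.bak` backup directory, and the assumption that `bfg.jar` sits in the current directory are all illustrative. It deliberately stops before `git push` so the rewritten history can be inspected first.

```shell
# Hypothetical helper: mirror-clone, back up, run the BFG, then expire and gc.
# Assumes git and java are on PATH and bfg.jar is in the current directory.
# The body runs in a subshell, so its variables and `cd` do not leak out.
clean_big_blobs() (
  repo_url="$1"
  size="${2:-100M}"                     # passed to --strip-blobs-bigger-than
  repo_dir="$(basename "$repo_url")"    # e.g. some-big-repo.git

  git clone --mirror "$repo_url" "$repo_dir" || exit 1
  cp -R "$repo_dir" "$repo_dir.bak" || exit 1   # backup before rewriting history
  java -jar bfg.jar --strip-blobs-bigger-than "$size" "$repo_dir" || exit 1

  cd "$repo_dir" || exit 1
  git reflog expire --expire=now --all && git gc --prune=now --aggressive || exit 1
  echo "Cleaned $repo_dir; inspect it, then run 'git push' from inside it."
)

# Usage (placeholder URL from the example above):
#   clean_big_blobs git://example.com/some-big-repo.git 100M
```

Keeping the push as a separate manual step matches the advice above to examine the repo before publishing the rewritten refs.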
# Examples
In all these examples `bfg` is an alias for `java -jar bfg.jar`.
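One way to get that `bfg` command is a small shell function, sketched below. A function is used instead of a literal `alias` because aliases are usually not expanded in non-interactive scripts. The `BFG_JAR` variable is a hypothetical name introduced here for pointing at the downloaded jar; it is not something the BFG itself reads.

```shell
# Wrapper so that `bfg ...` runs the jar.
# BFG_JAR is a made-up variable for this sketch; it defaults to ./bfg.jar.
bfg() {
  java -jar "${BFG_JAR:-bfg.jar}" "$@"
}
```

With this in a shell profile, `bfg --strip-blobs-bigger-than 50M my-repo.git` expands to the corresponding `java -jar` invocation.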
Delete all files named 'id\_rsa' or 'id\_dsa' :
```
$ bfg --delete-files id_{dsa,rsa} my-repo.git
```
Remove all blobs bigger than 50 megabytes :
```
$ bfg --strip-blobs-bigger-than 50M my-repo.git
```
Replace all passwords listed in a file *(prefix lines 'regex:' or 'glob:' if required)* with `***REMOVED***` wherever they occur in your repository :
```
$ bfg --replace-text passwords.txt my-repo.git
```
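As a rough illustration of what such a file might contain, the snippet below writes a three-line `passwords.txt`. Every value in it is made up. The `==>` separator for supplying a custom replacement is taken from my reading of the BFG's own usage text, so treat it as an assumption and confirm it against the output of running the BFG without arguments.

```shell
# Write an illustrative expressions file, one expression per line.
# All values are invented for this sketch.
cat > passwords.txt <<'EOF'
hunter2
glob:api-key-*
regex:password=\w+==>password=
EOF
```

The first line is a literal that would be replaced with `***REMOVED***`, the second uses the `glob:` prefix, and the third uses the `regex:` prefix together with the assumed `==>` replacement syntax.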
Remove all folders or files named '.git' - a [reserved filename](https://github.com/git/git/blob/d29e9c89d/fsck.c#L228-L229) in Git. These often [become a problem](https://stackoverflow.com/q/16821649/438886) when migrating to Git from other source-control systems like Mercurial :
```
$ bfg --delete-folders .git --delete-files .git --no-blob-protection my-repo.git
```
For further command-line options, you can run the BFG without any arguments, which will output [text like this](https://repository.sonatype.org/service/local/artifact/maven/redirect?r=central-proxy&g=com.madgag&a=bfg&v=LATEST&e=txt).
# Your *current* files are sacred...
By default the BFG doesn't modify the contents of your *latest* commit on your `master` (or '`HEAD`') branch, even though it *will* clean all the commits before it.
That's because your latest commit is likely to be the one that you deploy to production, and a simple deletion of a private credential or a big file is quite likely to result in broken code that no longer has the hard-coded data it expects - you need to fix that yourself; the BFG can't do it for you. Once you've committed your changes - and your latest commit is *clean* with none of the undesired data in it - you can run the BFG to perform its simple deletion operations over all your historical commits.
Note:
- Cleaning Git repos is about *completely* eradicating bad stuff from history. If something 'bad' (like a 10MB file, when you're specifying `--strip-blobs-bigger-than 5M`) is in a protected commit, it *won't* be deleted - it'll persist in your repository, [even if the BFG deletes it from earlier commits](https://github.com/rtyley/bfg-repo-cleaner/issues/53#issuecomment-50088997). If you want the BFG to delete something, **you need to make sure your current commits are *clean***.
- Although the files in those protected commits won't be changed, when those commits follow on from earlier dirty commits, their commit ids **will** change, to reflect the changed history - only the SHA-1 id of the filesystem-tree will remain the same.
If you want to turn off the protection (in general, not recommended) you can use the `--no-blob-protection` flag:
```
$ bfg --strip-biggest-blobs 100 --no-blob-protection repo.git
```
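Because anything still present in the protected commit survives the clean, it can help to check the latest commit for oversized files before running the BFG. The helper below is a sketch, not a BFG feature: `blobs_over` is a hypothetical name, and it simply filters the output of `git ls-tree -r -l`, whose lines have the form `<mode> <type> <object> <size><TAB><path>`.

```shell
# Hypothetical helper: reads `git ls-tree -r -l` output on stdin and prints
# the paths of blobs whose size in bytes exceeds the number given as $1.
blobs_over() {
  awk -F'\t' -v max="$1" '{
    split($1, meta, " ")      # meta[1..4]: mode, type, object id, size
    if (meta[2] == "blob" && meta[4] + 0 > max + 0) print $2
  }'
}

# Usage: list files over 5 MiB that are still present in the latest commit.
#   git ls-tree -r -l HEAD | blobs_over $((5 * 1024 * 1024))
```

The byte threshold in the usage line assumes that `5M` means 5 × 1024 × 1024 bytes; adjust it if the BFG's size suffixes turn out to be interpreted differently.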
# Faster...
The BFG is [10 - 720x](https://docs.google.com/spreadsheet/ccc?key=0AsR1d5Zpes8HdER3VGU1a3dOcmVHMmtzT2dsS2xNenc) faster than `git-filter-branch`, turning an *overnight* job into one that takes *less than ten minutes*.
BFG's performance advantage is due to these factors:
- The approach of `git-filter-branch` is to step through every commit in your repository, examining the complete file-hierarchy of each one. For the intended use-cases of The BFG this is wasteful, as we don't care *where* in a file structure a 'bad' file exists - we just want it dealt with. Inherent in the nature of Git is that *every* file and folder is represented precisely once (and given a unique [SHA-1](https://en.wikipedia.org/wiki/SHA-1) hash-id). The BFG takes advantage of this to process each and every file & folder exactly **once** - no need for extra work.
- Taking advantage of the great support for parallelism in [Scala](https://docs.scala-lang.org/overviews/parallel-collections/overview.html) and the JVM, the BFG does multi-core processing by default - the work of cleaning your Git repository is spread over every single core in your machine and typically consumes 100% of capacity for a substantial portion of the run.
- All action takes place in a single process (the process of the JVM), so doesn't require the frequent fork-and-exec-ing needed by `git-filter-branch`'s mix of Bash and C code.
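The first point can be made concrete by comparing how many tree entries a commit-by-commit walk touches with how many distinct objects actually exist. The helper below is only an illustration of that claim and says nothing about how the BFG is implemented internally; `count_tree_entries` is a hypothetical name, and it expects `git ls-tree -r` lines (`<mode> <type> <object><TAB><path>`) on stdin.

```shell
# Hypothetical helper: counts all tree entries seen on stdin versus the number
# of distinct object ids among them (third whitespace-separated field).
count_tree_entries() {
  awk '{ total++; if (!seen[$3]++) unique++ }
       END { printf "%d entries, %d unique objects\n", total, unique }'
}

# Usage: walk every commit the slow way and see how much of it is repeat work.
#   git rev-list --all | while read -r c; do git ls-tree -r "$c"; done | count_tree_entries
```

On a repository with a long history, the first number is typically far larger than the second, which is the redundancy that processing each object once avoids.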
# Feedback
> I tried deleting using several "how to" blog entries for git filter-branch, but wasn't successful. Then tried The BFG; worked like a champ - very cool tool!
>
> -- [Bill Hunt](https://linkedin.com/in/billh), CTO at OptTown

> I found The BFG Repo-Cleaner and ran it to clean up some large files, and was amazed by the performance.
>
> -- [Jason Frey](https://github.com/Fryguy), Software Engineer at [Red Hat](https://www.redhat.com/)

> I was able to shrink the current repository down to ~500 megabytes in about 10 minutes when using this tool. My hand crafted scripts clock in at 615 megabytes in 3 days time for comparison.
>
> -- [Elliot Glaysher](https://github.com/eglaysher), Google Software Engineer on [Google Chrome](https://code.google.com/p/chromium/issues/detail?id=111570#c29)

> The BFG was simple to set up and so fast that I had to ask Roberto, *"Is that it?"* and check for myself... it worked exactly as intended.
>
> -- [Nicholas Tollervey](https://github.com/ntoll), Developer at [The Guardian](https://www.guardian.co.uk/)

> Roberto's creations ([Agit](https://play.google.com/store/apps/details?id=com.madgag.agit) and The BFG) are both very cool ;-)
>
> -- [Junio C Hamano](https://git-blame.blogspot.com/), [Maintainer of Git](https://github.blog/2020-04-07-celebrating-15-years-of-git-an-interview-with-git-maintainer-junio-hamano/)

**Also see more feedback on** [**Twitter...**](https://twitter.com/rtyley/timelines/464727264345993216)
# Requirements
- The [Java Runtime Environment](https://www.oracle.com/java/technologies/downloads/) (**Java 11** or above - BFG [v1.14.0](https://repo1.maven.org/maven2/com/madgag/bfg/1.14.0/bfg-1.14.0.jar) was the last version to support Java 8)
That's it - the Scala library and all other dependencies are folded into the [downloadable jar](https://repo1.maven.org/maven2/com/madgag/bfg/1.15.0/bfg-1.15.0.jar).
# Links...
- [Rewriting Git project history with The BFG](https://www.guardian.co.uk/info/developer-blog/2013/apr/29/rewrite-git-history-with-the-bfg) - a blogpost for The Guardian
- [GitMinutes](https://episodes.gitminutes.com/2013/04/gitminutes-06-roberto-tyley-on.html) podcast interview
- [Git Going Faster... with Scala](https://www.parleys.com/play/53a7d2d1e4b0543940d9e56b) - talk for ScalaDays 2014, later Parleys *[Presentation of the Day](https://twitter.com/Parleys/status/517319848331083776)*
- [InfoQ interview](https://www.infoq.com/articles/git-Cleaner)
- [Questions tagged `git-rewrite-history`](https://stackoverflow.com/questions/tagged/git-rewrite-history) on Stack Overflow
# License
The BFG is free software: you can redistribute it and/or modify it under the terms of the [GNU General Public License](https://www.gnu.org/licenses/gpl.html) as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
The BFG is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
[Download v1.15.0](https://repo1.maven.org/maven2/com/madgag/bfg/1.15.0/bfg-1.15.0.jar)
[The BFG](https://github.com/rtyley/bfg-repo-cleaner) is by [Roberto Tyley](https://github.com/rtyley), the author of [Prout](https://github.com/guardian/prout), [gu:who](https://github.com/guardian/gu-who), [Agit](https://github.com/rtyley/agit) and the packager of [Spongy Castle](https://rtyley.github.io/spongycastle/). [Twitter](https://twitter.com/rtyley) [PGP](https://rtyley.github.io/bfg-repo-cleaner/rtyley.gpg)
This page was generated by [GitHub Pages](https://pages.github.com/) using the Architect theme by [Jason Long](https://bsky.app/profile/jasonlong.me). |
| Readable Markdown | null |
| Shard | 143 (laksa) |
| Root Hash | 2566890010099092343 |
| Unparsed URL | io,github!rtyley,/bfg-repo-cleaner/ s443 |