# Reducing the repository size using Git

A GitLab Enterprise Edition administrator can set a [repository size limit][admin-repo-size]
which will prevent you from exceeding it.

When a project has reached its size limit, you will not be able to push to it,
create a new merge request, or merge existing ones. Uploading LFS objects will
also be denied. You will still be able to create new issues and clone the
project.

If you exceed the repository size limit, your first thought might be to remove
some data, make a new commit and push back to the repository. Perhaps you can
move some blobs to LFS, or remove some old dependency updates from history.
Unfortunately, it's not so easy and that workflow won't work. Deleting files in
a commit doesn't actually reduce the size of the repo since the earlier commits
and blobs are still around. What you need to do is rewrite history with Git's
[`filter-branch` command][gitscm], or a tool like the [BFG Repo-Cleaner][bfg].
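
To see that earlier blobs really are still around after a file is deleted in a new commit, you can list the largest blobs reachable from any ref. The following is a minimal sketch built on standard Git plumbing commands. The function name `largest_blobs` is hypothetical and is not part of Git or GitLab:

```shell
# List the N largest blobs reachable from any ref, biggest first.
# Output columns: size in bytes, object ID, path.
# `largest_blobs` is an illustrative helper name, not a Git command.
largest_blobs() {
  count="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    grep '^blob ' |
    cut -d' ' -f2- |
    sort -n -r |
    head -n "$count"
}
```

Running `largest_blobs 5` inside a repository after `git rm` and a new commit should still show the deleted file, because the blob remains reachable from the earlier commit.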

Note that even with that method, until `git gc` runs on the GitLab side, the
"removed" commits and blobs will still be around. You also need to be able to
push the rewritten history to GitLab, which may be impossible if you've already
exceeded the maximum size limit.

In order to lift these restrictions, the administrator of the GitLab instance
needs to increase the limit on the particular project that exceeded it, so it's
always better to spot that you're approaching the limit and act proactively to
stay underneath it. If you hit the limit, and your admin can't - or won't -
temporarily increase it for you, your only option is to prune all the unneeded
stuff locally, and then create a new project on GitLab and start using that
instead.
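
One way to spot that you are approaching the limit is to check the size of your local clone from time to time. The sketch below sums the loose and packed object sizes reported by `git count-objects -v`. The helper names and the default threshold are assumptions for illustration, and the local figure only approximates what GitLab measures on the server:

```shell
# Print the size of the local object store in KiB (loose objects plus
# packfiles), as reported by `git count-objects -v`.
repo_size_kib() {
  git count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Return non-zero when the local size reaches a threshold given in KiB.
# The 921600 KiB (900 MiB) default is a placeholder, not a GitLab value.
check_repo_size() {
  limit_kib="${1:-921600}"
  size_kib="$(repo_size_kib)"
  if [ "$size_kib" -ge "$limit_kib" ]; then
    echo "warning: repository is ${size_kib} KiB (threshold ${limit_kib} KiB)" >&2
    return 1
  fi
  echo "ok: repository is ${size_kib} KiB"
}
```

Pass your own threshold, chosen somewhat below the limit your administrator has set, for example `check_repo_size 500000`.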

If you can continue to use the original project, we recommend [using the
BFG Repo-Cleaner](#using-the-bfg-repo-cleaner). It's faster and simpler than
`git filter-branch`, and GitLab can use its account of what has changed to clean
up its own internal state, maximizing the space saved.

> **Warning:**
> Make sure to first make a copy of your repository since rewriting history will
> purge the files and information you are about to delete. Also make sure to
> inform any collaborators to not use `pull` after your changes, but use `rebase`.
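
A copy of the repository can be made with a mirror clone, which includes all branches and tags. This is a minimal sketch; the function name `backup_repo` and both paths are placeholders:

```shell
# Create a bare mirror clone of a repository (all branches and tags)
# before rewriting its history.
# Usage: backup_repo <source repository> <backup destination>
backup_repo() {
  src="$1"
  dest="$2"
  git clone --quiet --mirror "$src" "$dest"
}
```

For example, `backup_repo my_repository/ ../my_repository_backup.git` writes the copy next to the working repository.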

> **Warning:**
> This process is not suitable for removing sensitive data like passwords or keys
> from your repository. Information about commits, including file content, is
> cached in the database, and will remain visible even after it has been
> removed from the repository.

## Using the BFG Repo-Cleaner

> [Introduced](https://gitlab.com/gitlab-org/gitlab-ce/issues/19376) in GitLab 11.6.

1. [Install BFG](https://rtyley.github.io/bfg-repo-cleaner/).

1. Navigate to your repository:

    ```
    cd my_repository/
    ```

1. Change to the branch you want to remove the big file from:

    ```
    git checkout master
    ```

1. Create a commit removing the large file from the branch, if it still exists:

    ```
    git rm path/to/big_file.mpg
    git commit -m 'Remove unneeded large file'
    ```

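1. Rewrite history:

    ```
    bfg --delete-files big_file.mpg
    ```

    BFG matches on the file name only, not on the path within the repository,
    so pass `big_file.mpg` rather than `path/to/big_file.mpg`. Every file with
    that name is removed from history, wherever it lives in the tree.

    An object map file will be written to `object-id-map.old-new.txt`. Keep it
    around - you'll need it for the final step!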

1. Force-push the changes to GitLab:

    ```
    git push --force-with-lease origin master
    ```

    If this step fails, someone has changed the `master` branch while you were
    rewriting history. You could restore the branch and re-run BFG to preserve
    their changes, or use `git push --force` to overwrite their changes.

1. Navigate to **Project > Settings > Repository > Repository Cleanup**:

    ![Repository settings cleanup form](img/repository_cleanup.png)

    Upload the `object-id-map.old-new.txt` file and press **Start cleanup**.
    This will remove any internal Git references to the old commits, and run
    `git gc` against the repository. You will receive an email once it has
    completed.

## Using `git filter-branch`

1. Navigate to your repository:

    ```
    cd my_repository/
    ```

1. Change to the branch you want to remove the big file from:

    ```
    git checkout master
    ```

1. Use `filter-branch` to remove the big file:

    ```
    git filter-branch --force --tree-filter 'rm -f path/to/big_file.mpg' HEAD
    ```

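1. Remove the backup refs that `filter-branch` keeps under `refs/original/`,
    then instruct Git to purge the unwanted data:

    ```
    git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
    git reflog expire --expire=now --all && git gc --prune=now --aggressive
    ```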

1. Lastly, force push to the repository:

    ```
    git push --force origin master
    ```

Your repository should now be below the size limit.
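
Before force-pushing, it can be worth confirming that the file is really gone from the branch's history. The following is a minimal sketch using `git log`; the function name `path_in_history` is hypothetical:

```shell
# Exit 0 if the given path appears anywhere in the history of the given
# revision (default: HEAD), exit 1 otherwise.
# `path_in_history` is an illustrative helper name, not a Git command.
path_in_history() {
  path="$1"
  rev="${2:-HEAD}"
  [ -n "$(git log --format=%H "$rev" -- "$path" | head -n 1)" ]
}
```

For example, `path_in_history path/to/big_file.mpg master` should fail after a successful rewrite. A plain `git rm` followed by a commit is not enough to make it fail, because the earlier commits still reference the file.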

[admin-repo-size]: https://docs.gitlab.com/ee/user/admin_area/settings/account_and_limit_settings.html#repository-size-limit
[bfg]: https://rtyley.github.io/bfg-repo-cleaner/
[gitscm]: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#The-Nuclear-Option:-filter-branch