Sometimes a teammate commits unwanted files to a Git repository, and we later delete them from the repo. The files remain in the Git history, though, so every clone of the repository still fetches them, which costs time, bandwidth, and disk space.
Let’s look at a way to clean deleted files out of a Git repository’s history.
“Make sure you take a backup copy of your local repository first, in case anything goes wrong.”
git filter-branch
Use the git filter-branch command to remove a file from all commits (filename is a placeholder for the path of the unwanted file):
git filter-branch --prune-empty -d /dev/shm/scratch \
  --index-filter "git rm --cached -f --ignore-unmatch filename" \
  --tag-name-filter cat -- --all
git filter-branch options used:
- --prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
- -d names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a path under /dev/shm (an in-memory filesystem) will result in faster execution.
- --index-filter is the main event and runs against the index at each step in the history. You want to remove filename wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch filename removes the file from the index when it is present and does not fail otherwise.
- --tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository may not have any tags, but the option is included for full generality.
- -- specifies the end of options to git filter-branch.
- --all following -- is shorthand for all refs. Your repository may have only one ref (master), but the option is included for full generality.
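To see the effect without risking real data, the command can be tried in a throwaway repository first. The script below is a minimal sketch under stated assumptions, not a definitive procedure: the file name oops.bin, the demo identity, and the temporary directory are illustrative, and the -d scratch directory is left out so the script also runs on systems without /dev/shm.

```shell
#!/bin/sh
set -eu

# Hypothetical demo: work in a throwaway repository so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"

# Commit 1: a legitimate file.
echo "hello" > README
git add README
git commit -q -m "Add README"

# Commit 2: the unwanted file (oops.bin is an illustrative name).
echo "large binary stand-in" > oops.bin
git add oops.bin
git commit -q -m "Add oops.bin by mistake"

# Commit 3: delete it from the tree; it still lives in the history.
git rm -q oops.bin
git commit -q -m "Remove oops.bin"

# Newer Git versions print a notice and pause before filter-branch runs;
# this variable suppresses that.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Same command as above, minus the optional -d scratch directory.
git filter-branch --prune-empty \
  --index-filter "git rm --cached -f --ignore-unmatch oops.bin" \
  --tag-name-filter cat -- --all >/dev/null 2>&1

remaining_commits=$(git rev-list --count HEAD)
commits_touching_file=$(git log --oneline -- oops.bin | wc -l | tr -d ' ')
echo "commits left: $remaining_commits, commits touching oops.bin: $commits_touching_file"
```

In this sketch the two commits that touched only oops.bin become empty and are pruned, so a single commit should remain on the branch.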
You can also remove a whole directory by adding -r to git rm (directory_name is a placeholder):
git filter-branch --prune-empty -d /dev/shm/scratch \
  --index-filter "git rm -r --cached -f --ignore-unmatch directory_name" \
  --tag-name-filter cat -- --all
You can check that commits which included the file have been modified, and that commits containing only that file have been removed from the log. Check using gitk or git log.
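The git log check can be scripted. The helper below is a small sketch; the function name check_purged is hypothetical. It assumes you only care about branches and tags, and it deliberately avoids --all, because that would also walk the backup refs that git filter-branch keeps under refs/original, which still point at the old commits.

```shell
#!/bin/sh
# check_purged PATH: succeed if no commit reachable from any branch or tag
# touches PATH; otherwise list the offending commits and return 1.
check_purged() {
  path=$1
  # --branches --tags skips refs/original/*, which still hold the old history.
  hits=$(git log --branches --tags --oneline -- "$path")
  if [ -z "$hits" ]; then
    echo "clean: no commit touches $path"
  else
    echo "still present in:"
    echo "$hits"
    return 1
  fi
}
```

Run it from inside the rewritten repository, for example check_purged filename.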
Shrink the repository
We used git-filter-branch to get rid of files from commits. People expect the resulting repository to be smaller than the original, but you need a few more steps to actually make it smaller because Git tries hard not to lose your objects until you tell it to.
- Remove the original refs backed up by git-filter-branch (do this for all branches):
git update-ref -d refs/original/refs/heads/master
- Expire all reflogs with:
git reflog expire --expire=now --all
- Garbage collect all unreferenced objects with:
git gc --prune=now
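The three steps above can be combined into one helper that also covers the "do this for all branches" part by looping over every ref under refs/original. This is a sketch rather than a definitive implementation: the function names repo_size_kib and shrink_repo are hypothetical, and the size figure is simply the sum of the size and size-pack values (in KiB) reported by git count-objects -v.

```shell
#!/bin/sh
# repo_size_kib: total size of loose and packed objects, in KiB,
# as reported by `git count-objects -v` for the current repository.
repo_size_kib() {
  git count-objects -v | awk '
    $1 == "size:" || $1 == "size-pack:" { total += $2 }
    END { print total + 0 }'
}

# shrink_repo: run the three cleanup steps in order and report the effect.
shrink_repo() {
  before=$(repo_size_kib)

  # 1. Delete every backup ref that filter-branch left under refs/original/.
  git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do
    git update-ref -d "$ref"
  done

  # 2. Expire all reflog entries so they no longer keep old commits alive.
  git reflog expire --expire=now --all

  # 3. Repack and drop every object that is now unreachable.
  git gc --prune=now --quiet

  after=$(repo_size_kib)
  echo "object store: ${before} KiB before, ${after} KiB after"
}
```

Call shrink_repo from inside the rewritten repository. Comparing the two numbers gives a rough idea of how much space the removed files were taking.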
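You are ready to push now. Because the history has been rewritten, a plain git push is rejected as a non-fast-forward update, so the push has to be forced:
git push --force --all
git push --force --tags
This pushes the rewritten branches and tags to the remote repository. Make sure you have sufficient rights to do so.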
Amit K
- Amit Kansagara is a seasoned ERP solution expert with over 15 years of experience in multiple industries. He has spent more than a decade in Australia, Malaysia, and the United States providing custom software solutions. He specializes in automation, enabling firms to focus on key activities through the use of effective ERP systems. He currently works as an ERP Consultant and specializes in designing and implementing solutions for large-scale organizations, with a focus on RFID-based inventory systems, AI integration, and process automation. Amit is committed to assisting enterprises in optimizing their operations and achieving long-term success through innovative technological solutions.