About git gc: How often should you use git-gc?


How often should you use git-gc?


The manual page only says:

Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

Are there commands that report object counts, so I can tell whether it is time to gc?


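It depends mostly on how heavily the repository is used. With one user checking in once a day and a branch/merge/etc. operation once a week, you probably don't need to run it more than once a year.

With several dozen developers working on several dozen projects, each checking in 2-3 times a day, you might want to run it nightly.

It won't hurt to run it more often than needed, though.

What I'd do is run it now, then a week from now measure disk utilization, run it again, and measure disk utilization again. If the size drops by 5%, run it once a week. If it drops by more, run it more frequently. If it drops by less, run it less frequently.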
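The measurement described above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, sizes come from `du -sk` on the repository's git directory, and the 5% cut-off is simply the rule of thumb from this answer.

```shell
#!/bin/sh
# Sketch: compare the git directory size before and after `git gc` and map
# the shrinkage to a rough schedule. All function names are hypothetical.

# Print the size of the current repository's git directory in KiB.
git_dir_kb() {
    du -sk "$(git rev-parse --git-dir)" | cut -f1
}

# Print the integer percentage (rounded down) by which $1 shrank to $2.
shrink_percent() {
    before=$1
    after=$2
    if [ "$before" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( (before - after) * 100 / before ))
}

# Map a shrink percentage to the advice given in the answer.
suggest_schedule() {
    pct=$1
    if [ "$pct" -gt 5 ]; then
        echo "more often than weekly"
    elif [ "$pct" -eq 5 ]; then
        echo "weekly"
    else
        echo "less often than weekly"
    fi
}

# Run gc in the current repository and report the suggestion.
measure_gc() {
    before=$(git_dir_kb)
    git gc --quiet
    after=$(git_dir_kb)
    pct=$(shrink_percent "$before" "$after")
    echo "shrank by ${pct}%: run gc $(suggest_schedule "$pct")"
}
```

Calling `measure_gc` inside a repository a week after the previous gc gives the comparison the answer suggests.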


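Note that the downside of garbage-collecting your repository is that, well, the garbage gets collected. As we all know as computer users, files we consider garbage right now might turn out to be very valuable three days from now. The fact that git keeps most of its debris around has saved my bacon several times: by browsing all the dangling commits, I have recovered a lot of work that I had accidentally canned.

So don't be too much of a neat freak in your private clones. There's little need for it.

OTOH, the value of data recoverability is questionable for repos used mainly as remotes, e.g. the place all the devs push to and/or pull from. There, it might be sensible to kick off a GC run and a repack frequently.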
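For browsing dangling commits before they are collected, something like the following might help. It is a minimal sketch: the function names are made up, and it relies on `git fsck --no-reflogs`, which also reports commits that are only kept alive by a reflog entry.

```shell
# Sketch: list commits that no branch or tag points to, so they can be
# inspected before a gc removes them. Function names are hypothetical.

# Print the hash of every dangling commit, one per line.
list_dangling_commits() {
    git fsck --no-reflogs 2>/dev/null |
        awk '$1 == "dangling" && $2 == "commit" { print $3 }'
}

# Print a one-line summary (short hash, date, subject) for each of them.
show_dangling_commits() {
    list_dangling_commits | while read -r sha; do
        git log -1 --format='%h %ad %s' --date=short "$sha"
    done
}
```

A commit worth keeping can then be given a name again, for example with `git branch recovered <hash>`.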


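Recent versions of git run gc automatically when required, so you shouldn't have to do anything. See the Options section of man git-gc(1): "Some git commands run git gc --auto after performing operations that could create many loose objects."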


If you're using Git-Gui, it tells you when you should worry:

This repository currently has approximately 1500 loose objects.

The following command will give you a similar number:

$ git count-objects

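Except that, judging from its source, git-gui does the math by itself, actually counting something in the .git/objects folder and probably arriving at an approximation (I don't know Tcl well enough to read that properly!).

In any case, it seems to give the warning based on an arbitrary number around 300 loose objects.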
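To answer the original question about object counts from the command line, `git count-objects -v` prints a `count:` field with the number of loose objects. The sketch below compares that number with the `gc.auto` setting (6700 unless configured otherwise); the function names are hypothetical, and the comparison is only a rough stand-in for whatever estimate `git gc --auto` makes internally.

```shell
# Sketch: read the loose-object count and compare it with gc.auto.
# Function names are hypothetical.

# Extract the "count:" field from `git count-objects -v` output on stdin.
loose_count() {
    awk -F': ' '$1 == "count" { print $2 }'
}

# Succeed when the loose-object count $1 exceeds the threshold $2.
needs_gc() {
    [ "$1" -gt "$2" ]
}

# Report on the current repository. Note: gc.auto=0 disables automatic gc;
# this sketch does not special-case that value.
check_repo() {
    count=$(git count-objects -v | loose_count)
    threshold=$(git config --get gc.auto || echo 6700)
    if needs_gc "$count" "$threshold"; then
        echo "$count loose objects: consider running git gc"
    else
        echo "$count loose objects: no gc needed yet"
    fi
}
```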


You can do it without any interruption, thanks to the new (Git 2.0, Q2 2014) setting gc.autodetach.

See commits 4c4ac4d and 9f673f9 (Nguyễn Thái Ngọc Duy, aka pclouds):

gc --auto takes time and can block the user temporarily (but not any less annoyingly).
Make it run in background on systems that support it.
The only thing lost with running in background is printouts. But gc output is not really interesting.
You can keep it in foreground by changing gc.autodetach.
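If you would rather see the output, the setting mentioned in the commit message can be switched off per repository. A small sketch, with a hypothetical helper name:

```shell
# Sketch: keep automatic gc in the foreground for the current repository.
# The helper name is hypothetical; gc.autoDetach defaults to true.
set_gc_foreground() {
    git config gc.autoDetach false
}

# Print the current value (prints nothing when the setting is unset).
show_gc_autodetach() {
    git config --get gc.autoDetach
}
```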

Since that 2.0 release there was a bug, though: git 2.7 (Q4 2015) will make sure the error message is not lost.
See commit 329e6e8 (19 Sep 2015) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 076c827, 15 Oct 2015)

gc: save log from daemonized gc --auto and print it next time

While commit 9f673f9 (gc: config option for running --auto in background - 2014-02-08) helps reduce some complaints about 'gc --auto' hogging the terminal, it creates another set of problems.

The latest in this set is, as the result of daemonizing, stderr is closed and all warnings are lost. This warning at the end of cmd_gc() is particularly important because it tells the user how to avoid "gc --auto" running repeatedly.
Because stderr is closed, the user does not know, naturally they complain about 'gc --auto' wasting CPU.

Daemonized gc now saves stderr to $GIT_DIR/gc.log.
Following gc --auto will not run and gc.log printed out until the user removes gc.log.
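In practice this means that if automatic gc seems to have stopped running, the saved log is worth a look. A minimal sketch with hypothetical helper names, assuming the log lives at `gc.log` inside the git directory as the commit message states:

```shell
# Sketch: inspect and clear the log left behind by a detached `gc --auto`.
# Helper names are hypothetical.

# Print the saved log and return 0; return 1 when there is nothing to report.
show_gc_log() {
    log="$(git rev-parse --git-dir)/gc.log"
    if [ -s "$log" ]; then
        cat "$log"
        return 0
    fi
    return 1
}

# Remove the log once the problem is dealt with, so gc --auto can run again.
clear_gc_log() {
    rm -f "$(git rev-parse --git-dir)/gc.log"
}
```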


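I use git gc after I do a big checkout and have a lot of new objects. It can save space. E.g. if you check out a big SVN project using git-svn and then run git gc, it typically saves a lot of space.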


Drop it into a cron job that runs every night (afternoon?).
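A nightly job along those lines might look like the sketch below. Everything specific in it is a placeholder: the script path, the `/srv/git` directory, the 03:00 schedule, and the assumption that the repositories are bare `*.git` directories directly under one root.

```shell
#!/bin/sh
# Hypothetical nightly script, e.g. saved as /usr/local/bin/nightly-gc.sh and
# scheduled with a crontab entry such as:
#   0 3 * * * /usr/local/bin/nightly-gc.sh /srv/git
# The paths and the schedule are placeholders.

# Run `git gc --quiet` in every bare repository (*.git) directly under $1.
gc_all_repos() {
    root=$1
    for repo in "$root"/*.git; do
        # Skip anything that is not a directory (including an unmatched glob).
        [ -d "$repo" ] || continue
        git --git-dir="$repo" gc --quiet || echo "gc failed in $repo" >&2
    done
}

# Only act when a root directory was passed on the command line.
if [ "$#" -gt 0 ]; then
    gc_all_repos "$1"
fi
```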


This quote is taken from:
Version Control with Git

Git runs garbage collection automatically:

• If there are too many loose objects in the repository

• When a push to a remote repository happens

• After some commands that might introduce many loose objects

• When some commands such as git reflog expire explicitly request it

And finally, garbage collection occurs when you explicitly request it
using the git gc command. But when should that be? There’s no solid
answer to this question, but there is some good advice and best
practice.

You should consider running git gc manually in a few
situations:

• If you have just completed a git filter-branch. Recall that
filter-branch rewrites many commits, introduces new ones, and leaves
the old ones on a ref that should be removed when you are satisfied
with the results. All those dead objects (that are no longer
referenced since you just removed the one ref pointing to them)
should be removed via garbage collection.

• After some commands that might introduce many loose objects. This
might be a large rebase effort, for example.
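For the filter-branch case, the cleanup could be sketched as below. It assumes the backup refs live under `refs/original/`, where filter-branch normally puts them; the function name is hypothetical. It is destructive: once it has run, the pre-rewrite history can no longer be recovered from this repository.

```shell
# Sketch: drop filter-branch's backup refs, expire all reflogs, and prune
# unreachable objects immediately. The function name is hypothetical.
purge_rewritten_history() {
    # Delete every backup ref that filter-branch left under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # Without this, reflog entries would keep the old commits alive.
    git reflog expire --expire=now --all
    # Collect garbage and prune unreachable objects regardless of their age.
    git gc --prune=now --quiet
}
```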

And on the flip side,
when should you be wary of garbage collection?

• If there are orphaned refs that you might want to recover

• In the context of git rerere and you do not need to save the
resolutions forever

• In the context of only tags and branches being sufficient to cause
Git to retain a commit permanently

• In the context of FETCH_HEAD retrievals (URL-direct retrievals via
git fetch) because they are immediately subject to garbage collection



I use it when I make a big commit, above all when I remove a lot of files from the repository. After that, commits are faster.


You don't have to use git gc very often, because git gc (garbage collection) runs automatically on several frequently used commands:

git pull
git merge
git rebase
git commit

Source: git gc best practices and FAQs

