How can git log --cherry-pick be parallelized?


I've inherited some code that uses git log --no-merges --right-only --cherry-pick --since='2 months ago' some_tag..origin/master -- path1, path2, ... as an initial step in determining which commits are missing from some_tag. The main problem is that it is slow and gives no progress output while it runs.

I can use git log --since='2 months ago' origin/master -- path1, path2, ... to get all the commits added for those paths in the specified time, which is fast. I'd then like to spawn multiple threads to check the commits individually, but I'm not sure what the equivalent check would be for a single commit. Perhaps generating a patch file and using git apply --check and git apply --reverse --check would work, but I'm not sure that would be equivalent.

Or perhaps there is a more direct way to do it?


There are 2 best solutions below

1
LeGEC On

You can call git patch-id (in parallel) on the patch for each of these commits, and then compare which commits have the same patch-id on both sides.

It is not entirely clear whether you are looking to port complete commits (in which case you should compute the patch-id on the complete diff) or only the changes to that specific subset of files (in which case you could restrict the input diff to that set of paths).
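
The patch-id approach could be sketched roughly as follows in Python. This is only an illustration under assumptions, not a drop-in equivalent of --cherry-pick: the helper names (list_commits, patch_id, patch_ids, missing_on_left), the worker count, and the use of git rev-list and git diff-tree -p to produce the input for git patch-id are all choices made for this sketch. Passing paths restricts each diff to those files; leaving it empty uses the complete diff.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def list_commits(rev_range, paths=(), since="2 months ago"):
    """Return non-merge commit hashes in rev_range, optionally limited to paths."""
    cmd = ["git", "rev-list", "--no-merges", f"--since={since}", rev_range, "--", *paths]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.split()


def patch_id(commit, paths=()):
    """Return the patch-id of one commit, or None if it has no diff for paths."""
    # Keep the diff as bytes: patches may contain non-UTF-8 content.
    diff = subprocess.run(
        ["git", "diff-tree", "-p", commit, "--", *paths],
        check=True, capture_output=True,
    ).stdout
    # git patch-id reads a patch on stdin and prints "<patch-id> <commit-id>".
    out = subprocess.run(
        ["git", "patch-id"], input=diff, check=True, capture_output=True,
    ).stdout.split()
    return out[0].decode() if out else None


def patch_ids(commits, paths=(), workers=8):
    """Map commit -> patch-id, running the git calls concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        ids = pool.map(lambda c: patch_id(c, paths), commits)
        return dict(zip(commits, ids))


def missing_on_left(left_ids, right_ids):
    """Commits from right_ids whose patch-id does not occur in left_ids."""
    seen = {pid for pid in left_ids.values() if pid is not None}
    return [c for c, pid in right_ids.items() if pid is not None and pid not in seen]
```

A possible use, run inside the repository: compute left = patch_ids(list_commits("origin/master..some_tag")) and right = patch_ids(list_commits("some_tag..origin/master", paths)), then call missing_on_left(left, right). Threads are enough here because the work happens in the git subprocesses, and a progress message can be printed as each result arrives.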

0
aviso On

Per this conversation, the file paths are what make this a long-running operation. The time can be reduced significantly by taking the output of git log --no-merges --right-only --cherry-pick --since='2 months ago' some_tag..origin/master and filtering it with the output of git log --since='2 months ago' origin/master -- path1, path2, ... In my case this produces the same result in seconds. It is not clear whether this will work in all cases, or whether this is something that could be optimized internally.
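
A minimal sketch of that filtering step, assuming Python and hypothetical helper names (git_lines, keep_touching_paths, missing_commits). It runs the two git log commands described above with --format=%H so that only commit hashes are printed, then keeps the commits that appear in both outputs. As noted, whether this always matches the single path-limited command is not established.

```python
import subprocess


def git_lines(*args):
    """Run a git command and return its stdout as a list of non-empty lines."""
    out = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout
    return [line for line in out.splitlines() if line]


def keep_touching_paths(candidates, path_commits):
    """Keep candidates, in their original order, that also appear in path_commits."""
    wanted = set(path_commits)
    return [c for c in candidates if c in wanted]


def missing_commits(rev_range, branch, paths, since="2 months ago"):
    """Cherry-pick-filtered commits in rev_range that touch any of paths."""
    # Cherry-pick filtering without the pathspec (the fast variant described above).
    candidates = git_lines(
        "log", "--format=%H", "--no-merges", "--right-only", "--cherry-pick",
        f"--since={since}", rev_range,
    )
    # Path-limited log over the same time window.
    touching = git_lines(
        "log", "--format=%H", f"--since={since}", branch, "--", *paths
    )
    return keep_touching_paths(candidates, touching)
```

For the commands in this answer, the call would be missing_commits("some_tag..origin/master", "origin/master", paths), where paths is a list of the individual paths.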