scalar: work around hang in git-fetch(1) with fsmonitor (!148) · Merge requests · GitLab.org / Git · GitLab

For a long time, we have seen CI jobs on macOS fail both in GitHub Workflows and in GitLab CI. After some painful debugging, we found that the offending test suites are t9210 and t9211. The common symptom is a git-fetch(1) process that hangs while seemingly doing nothing, alongside a bunch of fsmonitor processes. Killing the fsmonitor processes unsticks git-fetch(1), and the test continues to run.

This issue can only be reproduced when the system is under high load. The most reliable way to trigger it is to run both of these test suites in parallel with --stress. Eventually, tests start to get stuck and progress grinds to a halt.
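The reproduction recipe above could be scripted roughly as follows. This is only a sketch: `run_stress_pair` and the `DRY_RUN` variable are hypothetical names introduced here, and the script paths are passed in by the caller rather than assumed.

```shell
# Hypothetical helper: start every given test script under --stress in the
# background, then wait for all of them. With DRY_RUN set, the commands are
# printed instead of executed.
run_stress_pair () {
	for script in "$@"
	do
		if test -n "$DRY_RUN"
		then
			echo "sh $script --stress &"
		else
			sh "$script" --stress &
		fi
	done
	# Stress runs keep going until a test fails, so this may block for a while.
	test -n "$DRY_RUN" || wait
}
```

From the `t/` directory of a built Git checkout, this might be invoked as `run_stress_pair ./t9210-*.sh ./t9211-*.sh`; the globs avoid hard-coding the full script names.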

All of this smells like a race condition somewhere deep in the fsmonitor logic. The most likely scenario is that some events in the FSEventStream used on macOS to listen for filesystem events get lost. I cannot really tell though, and I do not know enough about macOS internals to debug this properly. It does not help that the race only triggers sometimes, and only under high load.

Instead of fixing the underlying issue, I have found a workaround that makes the symptom go away: we can start the fsmonitor daemon manually before we execute git-fetch(1). This means that git-fetch(1) no longer has to spawn the daemon itself, and that is seemingly sufficient to avoid the underlying race. At least CI seems to be happy, and running the two tests with --stress for ~30 minutes no longer surfaced any hanging tests.
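To illustrate the ordering only, here is a shell-level sketch of "start the daemon first, then fetch". It is not the patch itself, which changes scalar directly. The function name `fetch_with_prestarted_fsmonitor` is hypothetical, and the sketch assumes a Git build with fsmonitor support and a repository that has the fsmonitor enabled.

```shell
# Hypothetical wrapper: make sure an fsmonitor daemon is already running for
# the repository before fetching, so that git-fetch(1) does not have to spawn
# the daemon itself.
fetch_with_prestarted_fsmonitor () {
	repo=$1
	shift
	# "status" exits with zero only if a daemon is already watching the repo.
	if ! git -C "$repo" fsmonitor--daemon status >/dev/null 2>&1
	then
		git -C "$repo" fsmonitor--daemon start || return 1
	fi
	git -C "$repo" fetch "$@"
}
```

A call might look like `fetch_with_prestarted_fsmonitor path/to/repo origin`. Whether this shell-level ordering avoids the hang in the same way as the change in scalar has not been verified here.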

While it feels bad to paper over the issue without fully understanding it, it does at least solve an actual bug. It shouldn't be a regression in functionality either, as we would eventually spawn the fsmonitor even without this change -- either via git-fetch(1), or via a later call to start_fsmonitor_daemon() via register_dir() that we execute after the checkout.

Edited May 24, 2024 by Patrick Steinhardt

Activity

Patrick Steinhardt assigned to @pks-gitlab May 10, 2024

Patrick Steinhardt added 1 commit May 10, 2024

f0cbdf12 - x

Patrick Steinhardt closed May 10, 2024

Patrick Steinhardt reopened May 15, 2024

Patrick Steinhardt restored source branch pks-ci-macos-hang May 15, 2024

Patrick Steinhardt added 3 commits May 15, 2024

1dfe3736 - x

08432461 - increaselikelihood

ca1436fa - nproc

Patrick Steinhardt added 1 commit May 15, 2024

4125bb8b - sync

Patrick Steinhardt added 1 commit May 15, 2024

6088a92e - more tests

Patrick Steinhardt added 1 commit May 15, 2024

ae9af6be - moar

Patrick Steinhardt added 2 commits May 15, 2024

8fdaeaa5 - ci slimdown

90580e19 - timebox tests

Patrick Steinhardt added 1 commit May 15, 2024

13b9c9da - timebox tests

Patrick Steinhardt added 1 commit May 15, 2024

298c81a1 - timebox tests

Patrick Steinhardt added 1 commit May 15, 2024

7b13dd1b - ps output

Patrick Steinhardt added 1 commit May 15, 2024

a36f7f4b - ps output

Patrick Steinhardt added 1 commit May 15, 2024

f27aee52 - ps output

Patrick Steinhardt added 1 commit May 15, 2024

e7f964be - posix ps

Patrick Steinhardt added 1 commit May 15, 2024

20240889 - moredebug

Patrick Steinhardt added 1 commit May 15, 2024

c4bcf468 - x

Patrick Steinhardt added 1 commit May 15, 2024

6bfc9de6 - x

Patrick Steinhardt added 1 commit May 15, 2024

df2120fd - perforce update