-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce number of ZooCache instances and Watchers #5183
base: main
Are you sure you want to change the base?
Conversation
This commit removes most of the places where ZooCache instances were being created in favor of re-using the ZooCache from the ClientContext. Additionally, this commit does not place a Watcher on each node that is cached and instead places a single persistent recursive Watcher at the paths in which the caching is taking place. This change roughly reduces the Watchers reported in WatchTheWatchCountIT by 50%. While reducing the number of Watchers, this commit could reduce ZooKeeper server performance in two ways: 1. There is a note in the ZooKeeper javadoc for the AddWatchMode enum that states there is a small performance decrease when using recursive watchers as all of the segments of ZNode paths need to be checked for watch triggering. 2. Because a Watcher is not set on each node this commit modified the ZooCache.ZCacheWatcher to remove the parent of the triggered node, the triggered node, and all of its siblings from the cache. This overmatching may mean increased lookups in ZooKeeper. Related to apache#5134 Closes apache#5154, apache#5157
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how much this conflicts with #4809 and #5124, I'd like to hold off on some of these, and would prefer to go with a much more narrow change that just reuses the existing ZooCache on the context. The prerequisite branch I made to help @meatballspaghetti get started on the chroot stuff for #4809 and the work I have already nearly wrapped up for #5124 are heavily conflicted with these changes, and I'd prefer to get those in first.
Additionally, this PR seems to be broken on its reliance on a newer ZooKeeper version, newer than the one we have been testing as a supported. Do you know which ZK version introduced these features? If the versions required to support this change are too narrow, it might really limit the deployment options for users, if that range of versions is buggy in some way. In the past, we've generally supported a pretty large range of ZooKeeper versions, allowing quite a lot of flexibility for users when making upgrade decisions. I'd like to not narrow the versions of ZK we support too much, if this was only introduced in 3.9 or maybe even 3.8. But if this was introduced in 3.6 or 3.7, it's probably fine.
I also have questions about how these new watchers work, and whether it'd actually perform better. It seems there were some open questions about that.
// Get the instanceID using ZooKeeper, not ZooCache as we will need | ||
// the instanceID to create the ZooKeeper root path when creating | ||
// ZooCache. If the instanceId cannot be found, this is a good | ||
// time to find out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chroot stuff that @meatballspaghetti is working on addresses this a different way (very similar, actually, but as a prerequisite to the chroot stuff, I've moved some of this code into ZooUtil, out of the ClientContext and am further working to help improve that situation by simplifying the bootstrapping of contexts by improving the way ClientInfo and ServerInfo interact with ZooKeeper to initialize it).
I'm also getting rid of ZooCacheFactory and changing ZooCache to decorate a ZooKeeper as part of my work to remove ZooSession in #5124, and these changes conflict heavily with that.
I think this would be easier and simpler to do if this was held off until at least #5124 is done, but possibly until some of the chroot stuff is done, too, because it could have a significant impact on it.
Right now, I don't think we have an excessive number of watchers like we've had in the past, so it's not much of a problem, and I think the benefits of simplifying ZooSession and chroot our ZK interactions are going to be much more beneficial.
Re-using the ZooCache on the context is actually the larger change code-wise. Modifying ZooCache to use the persistent recursive watchers is pretty much self contained within ZooCache except for some tests.
I have no issue with waiting for #4809 and #5124 to be merged first, assuming they are done relatively soon.
Persistent Recursive Watchers were added in 3.6.0, so it's a very minor bump in the minimum required version.
I agree, which is why I targeted this for 4.0, to give us time to test it.
I don't know what excessive means in this context. The WatchTheWatchCountIT is failing due to the increased watchers added in a recent PR for caching the non-existent nodes in ZooCache. |
In the past, we had watchers scale as |
This commit removes most of the places where ZooCache instances were being created in favor of re-using the ZooCache from the ClientContext. Additionally, this commit does not place a Watcher on each node that is cached and instead places a single persistent recursive Watcher at the paths in which the caching is taking place.
This change roughly reduces the Watchers reported in WatchTheWatchCountIT by 50%. While reducing the number of Watchers, this commit could reduce ZooKeeper server performance in two ways:
There is a note in the ZooKeeper javadoc for the AddWatchMode enum that states there is a small performance decrease when using recursive watchers as all of the segments of ZNode paths need to be checked for watch triggering.
Because a Watcher is not set on each node this commit modified the ZooCache.ZCacheWatcher to remove the parent of the triggered node, the triggered node, and all of its siblings from the cache. This overmatching may mean increased lookups in ZooKeeper.
Related to #5134
Closes #5154, #5157