Failure to start worker crashes pool #122
Comments
@juhlig did you find any way around this in the meantime? I'm having the same issue, and I'm actually trying to do what ecto does with its connections: it just logs errors, with some delay between failures, and it doesn't crash.
@alex88 Well, yes and no =^^= That was why I submitted #124... One option is to implement your workers in such a way that the crash-prone part is not part of the starting procedure. In other words, let them always start, and crash later. This of course is only feasible if you have control over the worker implementation. What I really ended up with was my own pool project.
@juhlig thanks for the detailed answer. I've currently implemented the crashing part after the init, with a retry; the problem is that in my case (grpc connection) poolboy will check out the worker even if it's not connected yet.
Hm, yes, but I would argue that this is not poolboy's responsibility. It just gives you a worker; it doesn't say anything about that worker being ready.

I don't know the intricacies of your use case, so the following is based on some guesswork. It looks like what you want is to establish a connection as part of the worker startup routine, i.e. when poolboy gives you a worker, it is guaranteed to be in a connected state. But if it can't connect... what do you expect to happen then? Since you asked a question in this very issue, I guess you expect poolboy to give you "nothing" (that is, raise an exception of sorts in the process calling checkout) because the worker couldn't start. In a sense, this is just what poolboy does, only it crashes, and thereby takes down all the other workers in the same pool along the way.

You also mentioned retries, so I think what your startup procedure is supposed to do is try to connect, maybe retry a few times, and either complete when it finally gets connected, or fail. Even if poolboy did handle the failing part gracefully, baking retries into the start procedure drags it out: poolboy starts workers in a synchronous fashion, and it will become unresponsive until the startup has completed.

I would suggest starting the worker and doing the connecting in a second step, as you already do. The worker should block and not accept any work until it has either reached an operational state or given up (and crashes), completing the second step one way or the other. That way the pool is always responsive; the individual users of a worker may have to wait, but other users may still check workers out from the pool at the same time.
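Below is a minimal Elixir sketch of this two-step approach, under stated assumptions: `FakeConn` is a hypothetical stand-in for the actual client library, and the retry/backoff numbers are arbitrary. `init/1` always succeeds, the connection is attempted afterwards, callers are parked until the worker is operational, and if the worker gives up it crashes on its own without taking the pool down.

```elixir
defmodule PoolWorker do
  use GenServer

  # FakeConn is a hypothetical client module; replace with the real library.
  @max_attempts 5

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(args) do
    # Never do the risky part here: kick it off asynchronously instead,
    # so poolboy's synchronous startup always succeeds.
    send(self(), {:connect, 1})
    {:ok, %{conn: nil, waiting: [], args: args}}
  end

  @impl true
  def handle_call(request, from, %{conn: nil} = state) do
    # Not operational yet: park the caller instead of answering.
    {:noreply, %{state | waiting: [{from, request} | state.waiting]}}
  end

  def handle_call(request, _from, %{conn: conn} = state) do
    {:reply, FakeConn.call(conn, request), state}
  end

  @impl true
  def handle_info({:connect, attempt}, state) do
    case FakeConn.connect(state.args) do
      {:ok, conn} ->
        # Answer everyone who queued up while we were connecting.
        for {from, request} <- Enum.reverse(state.waiting) do
          GenServer.reply(from, FakeConn.call(conn, request))
        end

        {:noreply, %{state | conn: conn, waiting: []}}

      {:error, _reason} when attempt < @max_attempts ->
        # Simple linear backoff before the next attempt.
        Process.send_after(self(), {:connect, attempt + 1}, attempt * 1_000)
        {:noreply, state}

      {:error, reason} ->
        # Give up: this crashes the worker, not the pool.
        {:stop, {:connect_failed, reason}, state}
    end
  end
end
```

With a worker shaped like this, `:poolboy.checkout/1` itself returns immediately; only the eventual `GenServer.call/3` to the worker waits for the connection to come up.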
@juhlig sorry for the delay. I agree it's not poolboy's responsibility to know which worker is connected and which one is not. My idea of something that would work: if the init fails, poolboy should just restart the worker (even better, with some backoff), and if no workers are available, the checkout function will time out and the user can handle the timeout (see the checkout-timeout sketch after this comment).

In my case the grpc client throws an error if the first connection fails but automatically handles reconnections, so if it fails on init you get nothing, while if it fails later you can still check out a client which will return a connection error but will work again after the reconnection.

Yes, that was my first try, and on top of that not only did poolboy not start but the whole app didn't either, because poolboy was included in the main app supervisor.

Thanks for the suggestion; maybe some better error handling can also be done on the client side to make the integration with poolboy easier and more reliable. Thank you!
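The "handle the timeout" half of this is already possible with poolboy's existing `checkout/3`, which takes a timeout and makes the caller exit when no worker becomes available in time. A minimal sketch of catching that exit (the pool name `:grpc_pool` is made up for illustration):

```elixir
defmodule PoolClient do
  # :grpc_pool is a hypothetical pool name; checkout/3 and checkin/2 are
  # poolboy's existing API.
  def with_worker(fun, timeout \\ 1_000) do
    worker =
      try do
        :poolboy.checkout(:grpc_pool, true, timeout)
      catch
        # When no worker becomes available in time, checkout/3 exits with a
        # gen_server call timeout; turn that into a value we can handle.
        :exit, {:timeout, _} -> nil
      end

    case worker do
      nil ->
        {:error, :pool_timeout}

      worker ->
        try do
          fun.(worker)
        after
          # Always return the worker to the pool, even if fun/1 raises.
          :ok = :poolboy.checkin(:grpc_pool, worker)
        end
    end
  end
end
```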
@alex88 I was having a similar problem while maintaining a connection pool. I found a solution which is not perfect but does the job: you can initialize a worker without making the grpc connection. When the first call is made to the worker, you make the connection, store it in the worker's state, and use that connection for the rest of the calls. Another solution to avoid crashing the worker on init is to use {continue, Term} and handle_continue/2 in your worker. I don't know if this is even helpful since I'm replying after such a long time.
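A sketch of the `{:continue, term}` variant in Elixir syntax, again with a hypothetical `FakeConn` standing in for the real client: `init/1` returns before connecting, so poolboy's synchronous startup never blocks or crashes, and the connection is made in `handle_continue/2` before any other message is handled.

```elixir
defmodule LazyWorker do
  use GenServer

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(args) do
    # Return immediately; the actual connection happens in handle_continue/2,
    # after poolboy has already accepted this worker.
    {:ok, %{conn: nil, args: args}, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, state) do
    case FakeConn.connect(state.args) do
      {:ok, conn} ->
        {:noreply, %{state | conn: conn}}

      {:error, reason} ->
        # Failing here stops only this worker; the pool's own init has
        # already completed, so the pool is not dragged down with it.
        {:stop, {:connect_failed, reason}, state}
    end
  end

  @impl true
  def handle_call(request, _from, %{conn: conn} = state) do
    {:reply, FakeConn.call(conn, request), state}
  end
end
```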
Thanks for checking in. We've ended up using a single connection instead of a pool and that works fine for now; we'll definitely check out your solution if we decide to give it a try again.
Or use
In the case that a worker cannot be started (for example, because a database's connection limit has been reached), the entire pool will crash. This is especially bad with overflow workers, where requesting an overflow worker crashes the pool and all the linked existing workers.
In my case, I'm working in a tight environment where other things running on the same machine may simply exhaust the local ports. A new connection to the database is then not possible, which is somewhat ok; I can always try later. However, I won't know that until starting an overflow worker is attempted, and at that point all the existing connections in the pool crash.
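For reference, a minimal setup that matches this scenario; `:db_pool`, `DbWorker`, and the worker args are placeholders, while `child_spec/3`, `size`, and `max_overflow` are poolboy's normal configuration.

```elixir
# :db_pool and DbWorker are placeholder names for illustration.
pool_config = [
  name: {:local, :db_pool},
  worker_module: DbWorker,
  size: 5,
  max_overflow: 5
]

children = [
  :poolboy.child_spec(:db_pool, pool_config, [hostname: "localhost"])
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# With all 5 permanent workers checked out, the next checkout makes poolboy
# start an overflow DbWorker on the spot. If local ports are exhausted and
# DbWorker.start_link/1 fails, the pool process itself crashes, taking the
# 5 healthy, linked workers down with it.
```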