Timeout for stuck workers #65
Comments
Are you restarting the goworker application? We see this behaviour when the application is hard-stopped and doesn't have time to clean up the records in Redis (…).
@mingan Yes, sometimes we redeploy the Docker containers while jobs are still running; that might be the cause. I think we should definitely clean up when starting up again.
Edit:
    // scheduler and worker are created elsewhere (presumably node-resque instances).
    const shutdown = async () => {
      await scheduler.end(); // stop the scheduler first
      await worker.end();    // then stop the worker
      process.exit();
    };
    process.on('SIGTERM', shutdown);
    process.on('SIGINT', shutdown);
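A rough Go counterpart of the Node snippet above, as a minimal sketch rather than goworker's actual code: it waits for SIGTERM or SIGINT and gives cleanup a bounded amount of time. The names `shutdownOnSignal`, `simulateStop` and `errGraceExceeded` are hypothetical, and the cleanup function is a placeholder. The 8-second grace period is an assumption chosen to stay under the 10 seconds that `docker stop` waits by default before sending SIGKILL.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// errGraceExceeded reports that cleanup did not finish before the deadline.
var errGraceExceeded = errors.New("cleanup exceeded grace period")

// shutdownOnSignal blocks until a signal arrives on sigs, then runs cleanup
// and waits at most grace for it to finish.
func shutdownOnSignal(sigs <-chan os.Signal, grace time.Duration, cleanup func() error) error {
	<-sigs

	done := make(chan error, 1) // buffered so a late cleanup never blocks
	go func() { done <- cleanup() }()

	select {
	case err := <-done:
		return err
	case <-time.After(grace):
		return errGraceExceeded
	}
}

// simulateStop feeds a SIGTERM into shutdownOnSignal without involving the
// OS, so the sketch can be exercised without killing the process.
func simulateStop(grace time.Duration, cleanup func() error) error {
	sigs := make(chan os.Signal, 1)
	sigs <- syscall.SIGTERM
	return shutdownOnSignal(sigs, grace, cleanup)
}

func main() {
	// Real wiring: have the runtime deliver SIGTERM and SIGINT to sigs.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	defer signal.Stop(sigs)

	// Hypothetical cleanup: a real worker would stop polling and remove
	// its own records from Redis here.
	cleanup := func() error {
		fmt.Println("unregistering worker")
		return nil
	}

	// A real program would call shutdownOnSignal(sigs, grace, cleanup) and
	// block; the demo simulates the signal so it terminates on its own.
	if err := simulateStop(8*time.Second, cleanup); err != nil {
		fmt.Println("shutdown incomplete:", err)
		os.Exit(1)
	}
}
```

This only helps when the process actually receives the signal and has time to react. A hard stop (SIGKILL) cannot be caught, so records can still be left behind in that case.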
Yeah, the problem is to figure out a safe mechanism to do so and keep it compatible with the Resque gem.
Apart from the extra memory used in Redis, those stuck jobs don't cause us any problems, right? So concurrency still works as expected and the stuck jobs are not being considered by …?
If it's the same issue we have experienced, there are extra values in the set of workers, and the dead workers still appear to be working in the UI (there are records under the given prefix). I'm not sure whether the jobs themselves are failed or abandoned; that might be an issue. There's similar code in goworker, https://github.com/benmanns/goworker/blob/master/signals.go, which stops polling and stops idle workers. I don't remember it exactly and don't have time to look it up at the moment, but I think it doesn't force a running worker to stop, so unless the job finishes normally, it might hang.
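To make the "extra values in the set of workers" concrete, here is a minimal sketch of a startup cleanup that removes records left by dead workers on the same host. It is not goworker's or Resque's code. The `store` interface, `memStore`, `parseWorkerID` and `pruneDeadWorkers` are hypothetical names, and the key layout (`resque:workers`, `resque:worker:<id>`, `resque:worker:<id>:started`) and the worker ID format (`host:pid:queues` or `host:pid-n:queues`) are assumptions that should be checked against the real data. An in-memory map stands in for Redis so the example runs on its own.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// store is the small subset of Redis set/key operations the sketch needs.
// It is a hypothetical interface, not part of goworker.
type store interface {
	SMembers(key string) []string
	SRem(key, member string)
	Del(keys ...string)
}

// memStore is an in-memory stand-in for Redis, used only for the demo.
type memStore struct {
	sets map[string]map[string]bool
	keys map[string]string
}

func (m *memStore) SMembers(key string) []string {
	out := make([]string, 0, len(m.sets[key]))
	for member := range m.sets[key] {
		out = append(out, member)
	}
	sort.Strings(out) // deterministic order for the demo
	return out
}

func (m *memStore) SRem(key, member string) { delete(m.sets[key], member) }

func (m *memStore) Del(keys ...string) {
	for _, k := range keys {
		delete(m.keys, k)
	}
}

// parseWorkerID splits an ID of the assumed form "host:pid:queues" or
// "host:pid-n:queues" into host and pid.
func parseWorkerID(id string) (string, int, bool) {
	parts := strings.SplitN(id, ":", 3)
	if len(parts) != 3 {
		return "", 0, false
	}
	pid, err := strconv.Atoi(strings.SplitN(parts[1], "-", 2)[0])
	if err != nil {
		return "", 0, false
	}
	return parts[0], pid, true
}

// pruneDeadWorkers removes the records of workers registered from host
// whose process is no longer alive, and returns the IDs it removed.
// It does not touch the jobs those workers were running.
func pruneDeadWorkers(s store, ns, host string, alive func(pid int) bool) []string {
	var pruned []string
	for _, id := range s.SMembers(ns + "workers") {
		h, pid, ok := parseWorkerID(id)
		if !ok || h != host || alive(pid) {
			continue // unparsable, another host, or still running
		}
		s.SRem(ns+"workers", id)
		s.Del(ns+"worker:"+id, ns+"worker:"+id+":started")
		pruned = append(pruned, id)
	}
	return pruned
}

func main() {
	s := &memStore{
		sets: map[string]map[string]bool{"resque:workers": {
			"web1:100-0:default": true, // dead process on this host
			"web1:200-0:default": true, // still running
			"web2:100-0:default": true, // different host, left alone
		}},
		keys: map[string]string{"resque:worker:web1:100-0:default": "job payload"},
	}
	alive := func(pid int) bool { return pid == 200 }

	fmt.Println(pruneDeadWorkers(s, "resque:", "web1", alive)) // [web1:100-0:default]
	fmt.Println(s.SMembers("resque:workers"))                  // [web1:200-0:default web2:100-0:default]
}
```

One caveat for the Docker case discussed above: a container's hostname defaults to its container ID, so a redeployed container would usually not match the hostname in its predecessor's records, and a host-based check like this one would skip them. Some other liveness signal would be needed there.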
This logic should be added the way it is in the "main" Resque: https://github.com/resque/resque/blob/master/lib/resque/worker.rb#L599, which basically consists of having a … I'll try to work on this and add it to the lib. Would it be merged if implemented? (cc @benmanns)
It can happen that workers get stuck silently. We were using the node-resque worker before, which handled this scenario very well. With goworker, jobs just keep being shown as running in resque-web. Also, the worker count in resque-web keeps increasing (it should be 4).