Use (or generalize) cloud_ilastik architecture for my implementations of scalable algorithms #70

Open
constantinpape opened this issue Apr 27, 2020 · 0 comments

To really get the discussion going that we started a few weeks ago:
It would be beneficial to re-use the cloud_ilastik architecture for running jobs on different target systems (local, slurm, etc.) for my implementations of scalable (3D) segmentation and image analysis algorithms, currently available here https://github.com/constantinpape/cluster_tools.

Briefly, my current implementation has three issues:

  1. To implement a task for a given target, I use a mixin pattern. E.g. to implement an ilastik slurm prediction task, this would look like class IlastikPredictionSlurm(IlastikPredictionBase, SlurmTask), see this for details. This approach has the drawback that it does not scale well to new computation targets, because every existing task needs a new mixin subclass for each new target.
  2. Monitoring and logging are convoluted (it's fine for me, because I know what's happening, but it's not easily usable for anyone else). This is not really tied to 1, but it would be great to implement a clean solution once and re-use it.
  3. Re-running a partially failed job is very cumbersome and it's usually easier to delete the (intermediate) result and rerun the whole job.
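
To illustrate point 1: with mixins, N tasks and M targets need N × M subclasses. A minimal sketch of one alternative is to pass the target into the task instead, so a new target is a single new class. All names here (`Executor`, `LocalExecutor`, `DryRunExecutor`, `Task`, `predict_block`) are hypothetical and are not taken from cloud_ilastik or cluster_tools; a real slurm target would submit jobs where `DryRunExecutor` only records them.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class Executor(Protocol):
    """Hypothetical target interface: runs one callable per block id."""

    def run(self, func: Callable[[int], str], block_ids: List[int]) -> List[str]: ...


class LocalExecutor:
    """Runs the blocks one after another in the current process."""

    def run(self, func, block_ids):
        return [func(block_id) for block_id in block_ids]


class DryRunExecutor:
    """Stand-in for a second target (e.g. a cluster); only records what it would submit."""

    def run(self, func, block_ids):
        return [f"would submit {func.__name__}({block_id})" for block_id in block_ids]


@dataclass
class Task:
    """Holds only the algorithm; the target is injected at run time."""

    func: Callable[[int], str]
    block_ids: List[int]

    def run(self, executor: Executor) -> List[str]:
        return executor.run(self.func, self.block_ids)


def predict_block(block_id: int) -> str:
    # Placeholder for the real per-block work (e.g. a prediction on one chunk).
    return f"predicted block {block_id}"


task = Task(predict_block, [0, 1, 2])
local_results = task.run(LocalExecutor())
dry_results = task.run(DryRunExecutor())
```

The same `Task` instance runs on both targets without any task-specific subclass.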

The advantage of using the cloud_ilastik implementation is that issue 1 is already solved more elegantly there.
I don't know how/if you have tackled 2 and 3 already, but at least moving to a common code base would reduce redundant work. Also, this would allow cloud_ilastik to use the scalable algorithms I have already implemented.

This came up in the context of our more recent project for processing high-throughput screening data, where @Tomaz-Vieira had a closer look at the implementation: sciai-lab/batchlib#5. Since then, I have simplified the design, because we don't really need a multi-target solution there. But in general this issue is relevant for batch processing of 2D images as well. Also, for this project I have implemented a solution for issue 3 that works well for images and could probably be extended to nD chunked data, see this for details.
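
As a rough illustration of what re-running only the failed part of a job (issue 3) can look like, here is a minimal sketch that records finished items in a JSON status file and skips them on the next run. It is not the batchlib implementation; `run_resumable`, the status file layout, and the image names are assumptions made for this example. For chunked nD data the items would be block ids instead of image names.

```python
import json
import os
import tempfile
from typing import Callable, Iterable, List


def run_resumable(items: Iterable[str], process: Callable[[str], None], status_path: str) -> List[str]:
    """Process items, skipping those already recorded as done; return the items that failed."""
    done = set()
    if os.path.exists(status_path):
        with open(status_path) as f:
            done = set(json.load(f))

    failed = []
    for item in items:
        if item in done:
            continue  # finished in a previous run
        try:
            process(item)
        except Exception:
            failed.append(item)
            continue
        done.add(item)
        # Persist after every success so a crash loses at most the item in progress.
        with open(status_path, "w") as f:
            json.dump(sorted(done), f)
    return failed


# Usage: the first run fails on one image, the second run only retries that image.
calls = []
broken = {"img_b"}  # simulates a transient failure


def process(name: str) -> None:
    calls.append(name)
    if name in broken:
        raise RuntimeError(f"failed on {name}")


images = ["img_a", "img_b", "img_c"]
status_path = os.path.join(tempfile.mkdtemp(), "status.json")
first_failed = run_resumable(images, process, status_path)
broken.clear()
second_failed = run_resumable(images, process, status_path)
```

The intermediate results of the successful items stay untouched between the two runs, so nothing has to be deleted before rerunning.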

More concretely, the questions I would like to explore:

  • How can we integrate cloud_ilastik and the algorithms in cluster_tools? Can I just use cloud_ilastik as is or is it better to implement a common parent library?
  • Are there existing solutions / libraries we can offload some work to? (I will open a follow-up issue on this soon.)