Re-implement Orso file writing #6
Next step:
Enumerating all the information in detail here makes little sense because there is a lot; see https://orsopy.readthedocs.io/en/latest/modules.html. But here is a high-level overview. (As far as ORSO is concerned, everything is optional.)
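For orientation, a minimal sketch of how the orsopy classes nest, following the orsopy docs linked above (exact signatures may differ between orsopy versions, so treat this as an approximation rather than a reference):

```python
import datetime

from orsopy import fileio

# Leaf objects; see the linked orsopy docs for the full field lists.
owner = fileio.base.Person('Jane Doe', 'ESS')
experiment = fileio.data_source.Experiment(
    'Test experiment', 'Amor', datetime.datetime.now(), 'neutron'
)
sample = fileio.data_source.Sample('Ni/Ti Multilayer')
measurement = fileio.data_source.Measurement(
    fileio.data_source.InstrumentSettings(
        incident_angle=fileio.base.ValueRange(0.5, 2.5, 'deg'),
        wavelength=fileio.base.ValueRange(4.0, 12.0, 'angstrom'),
    ),
    data_files=[],
)
reduction = fileio.reduction.Reduction(
    software=fileio.reduction.Software('essreflectometry'),
    corrections=['footprint correction'],
)

# The top-level header nests everything above.
header = fileio.orso.Orso(
    data_source=fileio.data_source.DataSource(owner, experiment, sample, measurement),
    reduction=reduction,
    columns=[fileio.base.Column('Qz', '1/angstrom'), fileio.base.Column('R')],
    data_set=0,
)
```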
I think the only technical challenge is the very last part.
To clarify what I mean by tagging providers, here is an example:

```python
from typing import NewType

import sciline

RawData = NewType('RawData', int)
CorrectedData = NewType('CorrectedData', int)
ReducedData = NewType('ReducedData', int)


def tag(*tags):
    def impl(func):
        func.__tags__ = tags
        return func

    return impl


def load() -> RawData:
    return RawData(3)


@tag({"correction": "correct something"})
def correct(raw: RawData) -> CorrectedData:
    return CorrectedData(raw // 2)


def reduce(corrected: CorrectedData) -> ReducedData:
    return ReducedData(corrected + 3)


pl = sciline.Pipeline((load, correct, reduce))
tg = pl.get(ReducedData)
g = tg._graph
for p, *_ in g.values():
    print(p, getattr(p, '__tags__', ()))
```

I.e., the tags are stored as an attribute on the provider function itself. This is pretty generic and doesn't interfere with Sciline. But we may need a better name than `__tags__`. Without the tag, we couldn't tell from the pipeline alone what to put into the Orso `corrections` list.
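For illustration, the tags collected this way could then be turned into the list of correction descriptions for the Orso header (a sketch building on `g` from the example above):

```python
# Collect correction descriptions from the tagged providers in the
# task graph; each tag is a dict as in the @tag(...) call above.
corrections = []
for p, *_ in g.values():
    for t in getattr(p, '__tags__', ()):
        if 'correction' in t:
            corrections.append(t['correction'])
print(corrections)  # ['correct something']
```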
To clarify: I meant to enumerate everything that was implemented in the old workflow.
In the notebook:

```python
owner = fileio.base.Person(
    'Jochen Stahn', 'Paul Scherrer Institut', '[email protected]'
)
sample = fileio.data_source.Sample(
    'Ni/Ti Multilayer', 'gas/solid', 'air | (Ni | Ti) * 5 | Si'
)
creator = fileio.base.Person(
    'Andrew R. McCluskey', 'European Spallation Source', '[email protected]'
)
orso = make_orso(
    owner=owner,
    sample=sample,
    creator=creator,
    reduction_script='https://github.com/scipp/ess/blob/main/docs/instruments/amor/amor_reduction.ipynb',
)
```

In the Amor module:
And it automatically sets corrections and wavelength and scattering angle ranges during the workflow.
'It' being the workflow, 'setting' things in the Orso? Which corrections?
The various functions used in the workflow set the Orso fields directly as they run.
In the old implementation, yes. I think the plan here was to reconsider that. That is why we would like a list of things that are written by the old workflow.
👇
Plus these in the various functions in the Amor module:
Plus these in the amor notebook:
Plus these in the offspec reduction notebook:
And, of course, the actual data.
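For illustration, a hypothetical reconstruction of the old side-effect pattern being discussed (not the actual Amor code): each step received a mutable Orso object and filled in fields as it processed the data:

```python
from orsopy import fileio


def apply_wavelength_band(data, orso: fileio.orso.Orso):
    # Hypothetical: in the real workflow the limits would be
    # computed from the data, not hard-coded.
    wmin, wmax = 2.4, 16.0
    orso.data_source.measurement.instrument_settings.wavelength = (
        fileio.base.ValueRange(wmin, wmax, 'angstrom')
    )
    # Assumes `corrections` was initialized as a list.
    orso.reduction.corrections.append('wavelength band selection')
    return data
```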
For the corrections, how about adding a provider with optional dependencies, which "detects" the presence of certain input parameters or intermediates:

```python
def detect_correction(
    footprint: Optional[FootprintCorrectedData[Sample]],
) -> OrsoCorrections:
    if footprint is not None:
        ...
```

Now, technically, this detects only the presence of a `FootprintCorrectedData` value.
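A self-contained version of that idea, simplified to plain NewTypes without the sciline generics:

```python
from typing import NewType, Optional

FootprintCorrectedData = NewType('FootprintCorrectedData', float)
OrsoCorrections = NewType('OrsoCorrections', list)


def detect_corrections(
    footprint: Optional[FootprintCorrectedData],
) -> OrsoCorrections:
    # Sciline passes None for an optional dependency that no
    # provider in the pipeline can compute.
    corrections = []
    if footprint is not None:
        corrections.append('footprint correction')
    return OrsoCorrections(corrections)
```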
The most important argument against that is that this approach checks values, not providers. So, let's say, the user disables the footprint correction. They would still have the `FootprintCorrectedData` key in the graph, so the correction would be reported even though it never ran. Less importantly, I wanted to avoid having a central place that defines what counts as a correction. If we are willing to have such a place, we don't need a special provider. We can just walk the graph and check for each provider whether it is in the list of corrections. This would be simpler than having to check for the special detection provider.
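A minimal sketch of that graph-walking alternative (the registry contents and provider name are made up; `_graph` is a sciline internal, as in the tagging example above):

```python
def footprint_correction(data):  # stand-in for the real provider
    ...


# Central registry: being listed here is what makes a provider
# count as a correction for the Orso header.
CORRECTIONS = {footprint_correction: 'footprint correction'}


def corrections_from(task_graph) -> list:
    g = task_graph._graph  # sciline internal, as in the tagging example
    return [CORRECTIONS[p] for p, *_ in g.values() if p in CORRECTIONS]
```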
I don't see what you are saying. If you have a provider that claims to do a footprint correction but does not, there is nothing you can do about that, even if you look at the task graph. In the old approach, where the Orso is passed along with the data, if a provider that does not do a footprint correction still writes that to the Orso, you have the same problem. We should avoid providers claiming to return something that is a lie.
Agreed, we should analyze the task graph, and combine that with the remaining Orso bits that were made using Sciline. |
Unfortunately, in the current system, what a provider claims to do (via its return type) is dictated by the surrounding pipeline. Let's say, again, that we want to disable the footprint correction. One option is to replace the footprint correction provider with one that does nothing but still returns a `FootprintCorrectedData`. An analysis of the keys in the graph cannot detect this case. Only an analysis of the providers can, that is, the provider name, id, or other metadata, but not its return annotation. Hence my suggestion of tagging providers.
Correct. But then, the provider author explicitly specified that the provider counts as a correction. And this is not tied to the environment the provider is used in (the pipeline keys). With your proposal, it is not the author of the provider but the author of the package or pipeline that specifies this. I much prefer an explicit solution that is tied directly to the provider.
My claim is that this is a bad idea.
That sounds a bit hypothetical. At least currently it seems that our team is writing 90+% of the workflows and providers.
I don't see how that doesn't have the same (or a very similar) problem. If you look for a provider in a certain module to "tell" whether a correction has been applied, someone can still come and modify the provider.
This won't scale though. We cannot control all workflows that will be used. In particular, I'm concerned with modifications to workflows, not the base workflows provided by ESS. People will change things with no regard for recommendations or established best practices. And if the metadata doesn't reflect those changes, it is useless.
Let's say we have

```python
@provider(metadata={'correction': 'footprint correction'})
def footprint_correction(data: Data) -> FootprintCorrectedData:
    # do work
    return ...
```

If someone comes in and modifies this function to not do a footprint correction but leaves the metadata as it is, this is entirely on them. Everything they need to know and modify is right there. Again, this is about being explicit, right at the place where it matters, that this provider is counted as a correction.
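A minimal sketch of what such a `provider` decorator could look like, in the spirit of the `tag` helper earlier in this thread (the attribute name is made up, and this is not an existing Sciline API):

```python
def provider(*, metadata=None):
    # Attach the metadata to the function so it can be read back
    # later when walking the task graph, analogous to `tag` above.
    def impl(func):
        func.__provider_metadata__ = dict(metadata or {})
        return func

    return impl
```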
Since this might be a general issue after all: NMX reduction has some hard-coded arbitrary manipulations, so the current solution was to:
It might be clearer to see the actual code. In summary, I would like to:
However, these are mostly about documenting potentially surprising manipulations, not about storing them in a structured way like Orso.
The remainder of this issue is not about storage but only about accessing the relevant information, so your comment fits well. But why can't you split the providers into smaller providers, where some of them are the steps you want to track? And magic numbers can be turned into parameters, possibly default parameters implemented in ess.nmx (a sketch follows below).
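To illustrate that suggestion (all type names and values here are hypothetical, not the actual NMX code): a hard-coded scale factor becomes a pipeline parameter with a default set in ess.nmx:

```python
from typing import NewType

import sciline

McStasWeightScale = NewType('McStasWeightScale', float)
RawEventWeights = NewType('RawEventWeights', float)
ScaledEventWeights = NewType('ScaledEventWeights', float)

# Default would live in ess.nmx; users can override it but rarely need to.
DEFAULT_MCSTAS_WEIGHT_SCALE = McStasWeightScale(1.0)


def scale_weights(
    weights: RawEventWeights, scale: McStasWeightScale
) -> ScaledEventWeights:
    # The former magic number is now a visible, overridable parameter.
    return ScaledEventWeights(weights * scale)


pl = sciline.Pipeline(
    [scale_weights], params={McStasWeightScale: DEFAULT_MCSTAS_WEIGHT_SCALE}
)
pl[RawEventWeights] = RawEventWeights(2.0)
result = pl.compute(ScaledEventWeights)
```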
Because most of those numbers (scale factors) are only for McStas, and we didn't want to expose them as part of the workflow.
I came up with a solution for the cases where smaller steps are called internally!
Blocked until the next release of Sciline.
The Sciline version of the workflow does not do the Orso file writing that was done previously.
An Orso-file provider should be added to allow users to request an Orso file describing the steps of the workflow.
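A rough sketch of what that could look like from the user's side (all type names hypothetical):

```python
from typing import NewType

import sciline

# Hypothetical keys; the real workflow would define its own.
ReducedData = NewType('ReducedData', object)
OrsoHeader = NewType('OrsoHeader', object)
OrsoFile = NewType('OrsoFile', str)


def write_orso(data: ReducedData, header: OrsoHeader) -> OrsoFile:
    # Assemble header and data and write a .ort file,
    # returning the file's path.
    ...
    return OrsoFile('reduced.ort')


# Users would then request the file like any other workflow result:
# pl = sciline.Pipeline([..., write_orso])
# pl.compute(OrsoFile)
```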