-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Model Builder API #23223
base: main
Are you sure you want to change the base?
Model Builder API #23223
Conversation
Supports creating a model programmatically using the ORT C or C++ API. Supports augmenting an existing model to add nodes.
// FUTURE: This will also allow CopyTensors to utilize the IDataTransfer objects | ||
// "0": Disabled. [DEFAULT] | ||
// "1": Enable Model Builder Session | ||
static const char* const kOrtSessionOptionsEnableModelBuilder = "session.model_builder_session"; |
Check notice
Code scanning / CodeQL
Unused static variable Note
#include "core/framework/error_code_helper.h" | ||
#include "core/framework/execution_provider.h" | ||
#include "core/session/abi_session_options_impl.h" | ||
// #include "core/session/environment.h" |
Check notice
Code scanning / CodeQL
Commented-out code Note
// FUTURE: This will also allow CopyTensors to utilize the IDataTransfer objects | ||
// "0": Disabled. [DEFAULT] | ||
// "1": Enable Model Builder Session | ||
static const char* const kOrtSessionOptionsEnableModelBuilder = "session.model_builder_session"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't currently used. Initially I was thinking it would enable copying inputs/initializers to the correct device AOT, but...
a) that requires knowing where the value would be used, which is dependent on partitioning that happens later and depends on which EPs are enabled so easy to get wrong;
and
b) would be counter-productive if an optimizer wanted to update the initializer as we'd have to copy it back to CPU to do that.
TBD if needed.
Is it possible to save out the model from the builder via the C API? It'll be a nice alternative to building ONNX models with raw protobufs in languages which don't have a native ONNX library. |
You can use the SessionOption that's typically used to save the optimized ONNX model. onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h Lines 909 to 910 in 6e76179
Caveat is that does not currently support saving tensors created with CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue, but could be updated to do so if required. |
api.ReleaseTensorTypeAndShapeInfo(tensor_type_info); // input_type_info took a copy | ||
|
||
// create ValueInfo and release the type info as CreateValueInfo takes a copy. | ||
OrtValueInfo* input_value_info = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where to release OrtValueInfo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems SetGraphInputs()
(and SetGraphOutputs()
) would take ownership of OrtValueInfo
s. Resolving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ORT_CLASS_RELEASE macro defines ReleaseValueInfo if needed.
Is the optimized model one which has had op fusion and other passes done so it's no longer using ONNX standard ops everywhere, or is that a different process? |
You can specify the optimization level. If you keep it to level 1 (GraphOptimizationLevel.ORT_ENABLE_BASIC) it will only use standard ONNX ops. onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h Lines 343 to 345 in a3833a5
onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h Lines 1020 to 1029 in a3833a5
https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html |
* Pass ORT_API_VERSION to `OrtApiBase::GetApi()` Also removes the inclusion of onnx.pb.h header. * Add third_party/onnxruntime_headers Import https://github.com/microsoft/onnxruntime/tree/main/include Commit is based on microsoft/onnxruntime#23223 * Use ORT Model Builder API * Refactor scoped ORT type ptr 1. Rename to ScopedOrtTypePtr 2. Use macros 3. Introduce `operator T*()` 4. Introduce `Release()` method 5. Rename `get_ptr()` to `Get()` 6. Rename `get_pptr()` to `GetAddressOf()` * Remove ONNX Runtime headers from third_party/microsoft_dxheaders
if (attributes != nullptr) { | ||
n->attributes.reserve(attribs_len); | ||
for (size_t i = 0; i < attribs_len; ++i) { | ||
n->attributes.push_back(*reinterpret_cast<const ONNX_NAMESPACE::AttributeProto*>(attributes[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call ReleaseOpAttr after it's copied into the node so the user doesn't have to? Would be more consistent with the rest of the API to 'take ownership' of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same question applies to the CreateXXXTypeInfo/CreateValueInfo calls.
Those felt a little more re-usable (e.g. if you were constructing a model with KV cache you're going to be using the same TypeInfo for multiple inputs/outputs.) but maybe it's better overall to have a consistent pattern of ownership transferring when you add to a containing class instead of taking a copy in some places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those felt a little more re-usable (e.g. if you were constructing a model with KV cache you're going to be using the same TypeInfo for multiple inputs/outputs.)
OpAttr
might also be re-used?
but maybe it's better overall to have a consistent pattern of ownership transferring when you add to a containing class instead of taking a copy in some places.
I feel the consistency is for those AddXxxToXxx()
methods who do ownership transferring? SetGraphInputs()
/SetGrapOutputs()
also take ownership (I was not aware of it when I initially used the API), should they be renamed to AddInputsToGraph()
/AddOutputsToGraph()
?
* Pre-existing memory: | ||
* Use CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue to create an OrtValue | ||
* with a tensor that contains a pointer to the existing data. | ||
* User must keep pointer valid for lifetime of the inference session. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true if using CreateTensorWithDataAndDeleterAsOrtValue()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ownership transfers to ORT so the pointer must remain valid, but in that case the user shouldn't be freeing the memory at any point. Will update the comment to clarify.
ORT_API(const OrtModelBuilderApi*, GetModelBuilderApi); | ||
|
||
ORT_API_STATUS_IMPL(CreateTensorWithDataAndDeleterAsOrtValue, _In_ OrtAllocator* deleter, | ||
_In_ void* p_data, size_t p_data_len, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would p_data
be written by ORT? Or should it take const void*
if it is only read by ORT? A similar question is for CreateTensorWithDataAsOrtValue()
which takes _Inout_ void* p_data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ORT wouldn't write to p_data
itself, but a user could use update the data in the OrtValue returned using the ORT API (e.g. call GetTensorMutableData and make changes).
@@ -27,6 +27,7 @@ | |||
#include "core/common/span_utils.h" | |||
#include "core/common/status.h" | |||
#include "core/common/logging/logging.h" | |||
#include "core/framework/ort_value.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It introduces a circular dependency between onnxruntime_graph and onnxruntime_framework. ort_value is a core concept in onnxruntime_framework, which also depends on MemoryInfo, Allocators, etc. It means that the lifetime of a graph will be bound to an allocator. Furthermore, people may ask if the OrtValue can be put on GPU devices, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That dependency already existed with InjectExternalInitializedTensors in this file. And there are lots of places in the graph code where we use types from the framework library. If we want to fix that we might need to limit the graph library to fairly pure ONNX related types, and have ORT things built on top of those in the framework library. e.g. you'd have an ONNX Graph class as well as an onnxruntime Graph class, and things like OrtValue usage would be in the latter.
Long term I think it would be better to convert initializers to OrtValue when loading from the ONNX model so we detach from the protobuf types asap. There are many reasons for doing so. Having to add things like InjectExternalInitializedTensors to efficiently manage memory is a good sign the current setup isn't working well.
Can you elaborate on how the lifetime of the Graph is bound to an allocator? The OrtValue instances internally have a Tensor instance where the deleter is in a shared_ptr, so I would have thought the Graph instance can go away at any time, and the shared_ptr for the allocator in the Tensor deleter would also keep the allocator alive for as long as needed.
The problem I'm trying to address is that there's pre-existing memory where we want to transfer ownership to ORT. e.g. to free CPU based memory if we copy it to GPU. Because we have protobuf based initializers there's no way to attach the deleter to them, and the ORT API deals in OrtValue. So this was the best option I could find to essentially pass through that OrtValue to session state finalization.
The OrtValue could theoretically be on GPU. If you did that you could avoid a copy (if you knew for sure the value would be used on GPU) but you'd break the current setup with optimizers as they expect initializer data to be on CPU. Not clear we want to allow that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A tensor cannot live longer than the allocator that allocated the buffer.
An allocator cannot live longer than the corresponding EP(e.g. CUDA EP). Because the EP needs to manage a lot of handlers, and the allocator may need to use a device handler to do malloc/free. All such handlers get destroyed when the EP is destroyed.
That could make things complicated. For example, in InferenceSession class, we have:
std::shared_ptr<onnxruntime::Model> model_;
// The file path of where the model was loaded. e.g. /tmp/test_squeezenet/model.onnx
PathString model_location_;
// The list of execution providers.
ExecutionProviders execution_providers_;
//...
std::unique_ptr<SessionState> session_state_;
The model_
variable contains a graph, which contains OrtValues, which should be deleted before the execution_providers_ . But they are not ordered in that way. We had similar issues with "execution_providers_" and "session_state_". So, this is very subtle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. So whilst the Tensor has a shared_ptr for the allocator, if the allocator depends on internals of the EP, and the EP goes away, it breaks due to that?
And if we add OrtValue to Graph, which is in InferenceSession::model_, which will be released after execution_providers_ it may break?
Should execution_providers_ therefore be declared prior to model_ in InferenceSession?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so.
However, the code is ok as for now if the graph's OrtValues only use CPU allocators which are relatively simple.
* If using CreateTensorWithDataAsOrtValue you must keep the pointer valid for lifetime of the inference session. | ||
* Set `data_is_external` to true. | ||
* | ||
* Allocated memory: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to our testing, some ops seem to require shape-inference-related initializers being in allocated memory, including:
Reshape
'sshape
Reduce
'saxes
Expand
'sshape
Slice
'sstarts
,ends
andsteps
If using pre-existing memory, there will be shape inference error, e.g.
[ShapeInferenceError] Cannot parse data from external tensors. Please load external data into raw data for tensor: x
If that is the case, it would be helpful to be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general approach, there's about 60 bytes or so of overhead to use the external memory structure for pre-allocated memory, so if the value is less than say 128 bytes you're probably better off using allocated memory.
I think doing so could almost guaranteed that shape inferencing isn't going to break as I can't think of an input that shape inferencing would read that would have 128 bytes of data (16x int64_t dimension or indices values).
If it seems reasonable to do, we could enforce that pre-allocated data is a minimum size of 128 to reduce the chance of a user hitting a shape inferencing error, and document any edge cases we find in the ONNX ops where shape inferencing fails as that would be a much smaller set of operators if any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it seems reasonable to do, we could enforce that pre-allocated data is a minimum size of 128 to reduce the chance of a user hitting a shape inferencing error, and document any edge cases we find in the ONNX ops where shape inferencing fails as that would be a much smaller set of operators if any.
SGTM, that would be very helpful.
Description
Supports creating a model programmatically using the ORT C or C++ API.
Supports augmenting an existing model to add nodes.
TODO: Validation API is feature complete and additional tests.
Motivation and Context