do not pipe non-json data through json.dumps #910
I'm not sure if this makes sense here, but instead of
I don't even know how I would end up with plain JSON data in my notebooks, so I'd rather not add any complexity that I can't judge myself.
AFAICT, Jupyter widgets use JSON to store their data in the notebook file. They use MIME types like An example notebook: https://github.com/jupyter-widgets/ipywidgets/blob/master/docs/source/examples/Widget%20List.ipynb I don't know if those output data should be ignored by the If not, they should be added
At least in this example, I'd wager it should explicitly not be extracted. (What would the user want with the model_id, which seems to be for communication with the widget handler?)
To be honest, I'm quite confused about why a byte string ended up in the notebook data. I thought that, according to nbformat, everything is a JSON data type, so either a string or some other JSON value. In that case the fix would be to ensure that the data is read as a string instead. The problem with the spec is that it's a bit ambiguous: JSON output may consist of a single string. That's why the logic I applied to determine the output type is to check whether the data is not a string, in which case it must be JSON, or a string, in which case it may be JSON and we have to check the MIME type. If for some reason we do expect byte arrays to appear in cell outputs, I believe a better fix would be to include those in the check In addition to ipywidgets storing their outputs as JSON, so does plotly (its MIME type is application/vnd.plotly.v1+json) and
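The type check described in this comment can be sketched roughly as follows. This is only an illustration of the reasoning, not the code from the PR: the helper name `is_json_payload` and the `+json` suffix test are assumptions made for the example.

```python
def is_json_payload(mime_type, data):
    """Decide whether an output value should be treated as JSON.

    Hypothetical helper for illustration; not part of nbconvert.
    """
    if not isinstance(data, str):
        # Anything that is not a string (dict, list, number, ...) can only
        # have come from JSON, so treat it as JSON.
        return True
    # A string is ambiguous: it may be plain text, base64 data, or a JSON
    # document consisting of a single string. Only the MIME type can tell.
    # Assumption: JSON MIME types are "application/json" or end in "+json".
    return mime_type == "application/json" or mime_type.endswith("+json")
```

Note that under this logic a `bytes` value would also land in the JSON branch, which is why the comment suggests including byte arrays in the check if they can legitimately appear.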
I guess you're right about the specs. The file itself even says that it expects images to be enclosed in a string:
Still, I'm not sold on your solution.
The list of MIME types accepted by default is a separate issue from extracting the output correctly, and therefore I did not touch it. That traitlet isn't a complete list of MIME types that the extract output preprocessor should ever handle, but merely what it handles by default. Several preprocessors override that list. We therefore may not assume that we are never going to encounter other MIME types. Specifically, your PR would break nbconvert for those who want to extract widget information, as pointed out by @mgeier, or a plotly plot, which is also JSON.
That is certainly true, but this bit of code is not about the meaning of the data (represented as mime type), but rather about the storage format of the data. Unfortunately right now the storage format is not explicitly specified in the metadata (see also #858). Roughly speaking, there are 3 options currently:
The second and third options would become strings at this point in the pipeline, so the logic I am applying is the following:
That's my understanding of what is happening in the preprocessor, largely based on discussions with @minrk. What I really do not understand is: why do we ever encounter a byte string? This seems like a bug: JSON doesn't have a way to specify the type of string data, SVG may contain arbitrary Unicode symbols, and nothing in the spec says anything about byte strings. Instead we're base64 encoding anything that isn't Unicode.
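A small standard-library sketch of why this matters: `json.dumps` changes plain text by quoting it and rejects byte strings outright, while binary data can only live in a JSON file as base64 text. The payloads and the helper `dumps_or_error` are hypothetical and not taken from nbconvert or a real notebook.

```python
import base64
import json


def dumps_or_error(value):
    """Return json.dumps(value), or the name of the exception it raises."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return type(exc).__name__


# Hypothetical payloads, chosen only for illustration.
svg_text = "<svg/>"
png_bytes = b"\x89PNG\r\n"

# Plain text is altered: json.dumps wraps it in double quotes.
quoted_svg = dumps_or_error(svg_text)

# Byte strings are not JSON serializable at all.
bytes_result = dumps_or_error(png_bytes)

# Binary data survives a JSON file only as base64-encoded text.
stored = base64.b64encode(png_bytes).decode("ascii")
restored = base64.b64decode(stored)
```

Here `quoted_svg` no longer equals the original SVG text, and `bytes_result` records the `TypeError`, which is consistent with only passing genuinely JSON-typed data through `json.dumps`.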
Ah, totally didn't catch that usage, but fair point.
Ok, I now understand your assumptions. But still, is this the right place to (implicitly) introduce strong typing? I have two issues with this:
Some remarks:
Related to this topic, I ended up tackling the
Fixes #904 and possible similar errors by treating JSON data explicitly