Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use content filtering instead of the image editor #144

Closed

Conversation

jaspermdegroot
Copy link
Member

See #83

Update: This PR has been replaced by #170

@jaspermdegroot
Copy link
Member Author

I used an anonymous function in preg_replace_callback which isn't supported in PHP 5.2. That's why the tests fail. Will rewrite the code.

if ( $size ) {
$size = $size[1];

// Otherwise create an array with the values from the width and height attributes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there's no wp class or width / height attribute in the image html?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have to double check the source code but I believe the wp-image-* class and width and height attributes are always present. Of course we could implement a check and do nothing if they are not there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaspermdegroot there are often cases where the developer chooses to remove the WordPress classes and w/h attributes. Would we be able to retrieve the image ID if those weren't present? Or does this function fire with a higher priority, so that our calculations happen before theme customizations kick in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we be able to retrieve the image ID if those weren't present?

No, I don't think so.

Or does this function fire with a higher priority ...

No, actually the priority is currently very low (999).

We can consider to make the priority very high instead of low, but we probably should add a check anyway to see if the image id and size could be retrieved from attributes and bail if not.

ryelle added a commit to ryelle/wp-tevko-responsive-images that referenced this pull request Jul 23, 2015
Uses DOMDocument to find images, then pulls the image ID from the `wp-image-*` class (could switch to whatever content filtering is used for ResponsiveImagesCG#144, so we’re not reliant on something that might be disabled). Generate the srcset, add it to the image tag, and update the post.
ryelle added a commit to ryelle/wp-tevko-responsive-images that referenced this pull request Jul 23, 2015
Uses DOMDocument to find images, then pulls the image ID from the `wp-image-*` class (could switch to whatever content filtering is used for ResponsiveImagesCG#144, so we’re not reliant on something that might be disabled). Generate the srcset, add it to the image tag, and update the post.
ryelle added a commit to ryelle/wp-tevko-responsive-images that referenced this pull request Jul 23, 2015
Uses DOMDocument to find images, then pulls the image ID from the `wp-image-*` class (could switch to whatever content filtering is used for ResponsiveImagesCG#144, so we’re not reliant on something that might be disabled). Generate the srcset, add it to the image tag, and update the post.
@joemcgill joemcgill force-pushed the dev branch 2 times, most recently from 51ce34b to 8b9bf7e Compare August 26, 2015 13:38
@jaspermdegroot
Copy link
Member Author

I did an A/B performance test.

Server:

  • Apache
  • PHP version 5.6.10
  • MySQL version 5.5.42

Tool:

  • ApacheBench
  • concurrency level 1
  • 100 requests

WordPress:

  • WordPress 4.3
  • theme TwentyFifteen
  • RICG responsive images plugin version 2.4.0
  • no other plugins

WordPress installation A uses the plugin as it is; using the image editor to insert the srcset and sizes attributes in the markup.
WordPress installation B has the changes from this PR applied to the plugin; uses content filtering to insert the srcset and sizes attributes in the markup.

All images in this test have 3 sources available for the srcset (medium, large, and full).
Images are placed between text paragraphs (except for test post "Heavy" that almost only contains images).

The results, based on 3 runs:

Test post "Light" with 1271 words and 5 images:

  • A: 233 ms (233.149 ms | 233.129 ms | 233.250 ms)
  • B: 250 ms (250.602 ms | 250.324 ms | 250.606 ms)
  • B: 250 ms (250.420 ms | 250.536 ms | 250.870 ms)

Difference: 17 ms 17 ms
Average: 3.4 ms / image 3.4 ms / image

Test post "Heavy" with 18 words and 25 images:

  • A: 231 ms (231.971 ms | 231.653 ms | 231.710 ms)
  • B: 306 ms (306.277 ms | 306.485 ms | 305.591 ms)
  • B: 308 ms (308.088 ms | 308.056 ms | 307.233 ms)

Difference: 75 ms 77 ms
Average: 3.0 ms / image 3.08 ms / image

Test post "Very heavy" with 5020 words and 50 images:

  • A: 261 ms (260.643 ms | 261.079 ms | 261.135 ms)
  • B: 409 ms (409.819 ms | 408.795 ms | 407.859 ms)
  • B: 412 ms (411.727 ms | 412.712 ms | 410.793 ms)

Difference: 148 ms 151 ms
Average: 2.96 ms / image 3.02 ms / image

Update: I ran the test again. The old (strikethrough) results for B are based on using preg_match to get the ID and image size name from classes. The new results are based on using attachment_url_to_postid to get the ID and preg_match to get the width and height from the url.

@jaspermdegroot
Copy link
Member Author

A few things that might improve performance is if we could grab the wp-image-{id} and size-{size} classnames in a single preg_match() instead of doing three (one for the class attribute and separate ones for each classname)

That would be nice indeed, but I would need help with that because I don't really see how we can do that.
We also have to keep in mind that the "image size class" might not be there.

I also think we could bail early if a srcset attribute is already present.

Agreed

If we're looking for other ways to bail early, we might be able to check to see if the file path doesn't match wp_uploads()

Good idea

I also think we want to lower the priority value in the add_filter() function so the code executes before someone has potentially stripped off the classnames we're looking for. I'm not sure if we should go all the way to zero (lower numbers execute first) or if we should leave room for someone to override by choosing something like 5.

A priority of 5 sounds good

@joemcgill
Copy link
Member

That would be nice indeed, but I would need help with that because I don't really see how we can do that. We also have to keep in mind that the "image size class" might not be there.

If we did a single preg_match that looks for all those matches and come back with none, we can safely return early since we must have all of those pieces to execute the filter, correct?

@jaspermdegroot
Copy link
Member Author

we must have all of those pieces to execute the filter, correct?

No, we don't need the image size class because we can grab the values from the height and width attributes

@joemcgill
Copy link
Member

Good point. So we could slim it down to 2 regex matches but that's about it.

@jaspermdegroot
Copy link
Member Author

Yes, we can reduce it to 2

@tevko
Copy link
Member

tevko commented Sep 2, 2015

what happens if the height and width attributes are missing?

@jaspermdegroot
Copy link
Member Author

what happens if the height and width attributes are missing?

Then we bail

@jaspermdegroot
Copy link
Member Author

Per discussion on Slack I changed the code to use attachment_url_to_postid to get the image ID and preg_match to get the width and height from the url.

@jaspermdegroot
Copy link
Member Author

@joemcgill

If we're looking for other ways to bail early, we might be able to check to see if the file path doesn't match wp_uploads().

The attachment_url_to_postid core function also uses wp_upload_dir() to strip the baseurl from the path. So if we really want to optimize things we should use a customized attachment_url_to_postid function which bails if the path doesn't start with the baseurl.

Not sure if that's a good idea in regards to merging into core. What do you think?

@tevko
Copy link
Member

tevko commented Sep 3, 2015

@jaspermdegroot would this cause any issues for multisite installations?

@joemcgill
Copy link
Member

@jaspermdegroot I think it would be smart to do a sanity check on the file path before passing a url to attachment_url_to_postid(). Honestly, I'm surprised that logic isn't present in the function itself. It may be worth filing a ticket to enhance that function in core, because right now you could pass any url you want and WP will run a query trying to find the post id that matches.

Another thing I noticed is that an edited image may have a filename that doesn't match the value of __wp_attached_file even taking the size pattern matching into account. Not sure if we need to handle those cases or not.

@peterwilsoncc
Copy link

Are you able to include a filter to get child attachments with the original selects on WP_Query or WP_Post. On my reading it looks like this will add a DB request for each image and avoiding this where possible would sure be dandy.

Some benchmarks would probably be in order, my instinct is it would be quicker but data would be better :).

@joemcgill
Copy link
Member

@peterwilsoncc That's a really interesting approach. I may be mistaken, but don't attachments have to be uploaded from that particular post's edit screen in order to show up as an attachment? In other words, if someone uploaded an image directly to the media library and then inserted it into a post, would it still show up in the list of attachments for that post?

@peterwilsoncc
Copy link

@joemcgill - that's correct, it would only save requests if the author's workflow was to upload the image from the post screen.

At present it looks like there are three SQL calls per image:

# attachment_url_to_postid
SELECT post_id FROM wp_postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = '2014/07/4000.png'

# tevkori_get_srcset_array
SELECT * FROM wp_posts WHERE ID = 16 LIMIT 1
SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (16) ORDER BY meta_id ASC

Getting attachments as part of the original call should remove the second one for most images, possibly the third - I'll need to do some Nasty Hacks to check.

It could also be mitigated by updating attachment_url_to_postid to accept multiple URLs and batching the requests for each post.

I'm slowly coming around to the idea of dynamically adding the srcset, avoiding extra SQL requests on each image/loop would probably convince me.

@chriscct7
Copy link

I think while its a good idea to do a performance test on 5.6, I think it'd be really prudent to also do one for 5.2. Some of the performance benefits you might see running it on 5.6, might not exist (and in fact a patch could easily be a performance detractor on 5.2). While that's unlikely, confirming that's not the case is probably a good idea.

@jaspermdegroot
Copy link
Member Author

@tevko

would this cause any issues for multisite installations?

Is there any particular part in the code that makes you think this could cause issues with multisite installations?
Anyways, we should test it. I currently don't have multisite installation to test this on though.

Update: Our unit tests do test multisite and test pass so it shouldn't be a problem.

@jaspermdegroot
Copy link
Member Author

@joemcgill

I think it would be smart to do a sanity check on the file path before passing a url to attachment_url_to_postid(). Honestly, I'm surprised that logic isn't present in the function itself. It may be worth filing a ticket to enhance that function in core, because right now you could pass any url you want and WP will run a query trying to find the post id that matches.

Ok

Another thing I noticed is that an edited image may have a filename that doesn't match the value of __wp_attached_file even taking the size pattern matching into account. Not sure if we need to handle those cases or not.

I think we should try to handle those cases, but it doesn't feel rock solid to simply strip any "-e" followed by 13 digits from the end of the file name since it's perfectly valid to upload an image with that pattern in the filename. Probably an edge case but still.

@jaspermdegroot
Copy link
Member Author

The extra SQL request was also my concern when we discussed using the url to get the ID instead of grabbing it from the class name and tests do show that is a bit slower, but I agree with @tevko and @joemcgill that it is more reliable.

If we can avoid the extra request that would be great. Thanks for your input @peterwilsoncc . I will look into your approach.

@jaspermdegroot
Copy link
Member Author

@chriscct7

Good point. I will test performance against PHP 5.2 as well.

@peterwilsoncc
Copy link

I've done some work going directly from the image URLs to getting the attachment objects. It has less DB calls but whether it is more efficient overall will need further benchmarking.

I've been working on the branch peterwilsoncc:content-filtering-pwcc [diff with jaspermdegroot:content-filtering]

Callback is now in a class which requires a separate instance for each post, there's some shuffling around to make sure all image sizes return an ID, all of this leads me to question how efficient it is.

All in all, not sure how much help this has been for y'all.

@jaspermdegroot jaspermdegroot mentioned this pull request Sep 14, 2015
@jaspermdegroot
Copy link
Member Author

This PR has been replaced by #170

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants