Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EGL contexts in GrDirectContext.MakeEGL #294

Open
wants to merge 44 commits into
base: main
Choose a base branch
from

Conversation

HinTak
Copy link
Collaborator

@HinTak HinTak commented Dec 25, 2024

This is a slightly different way of addressing #287 . I am posting this just as a reference for @Swarzox , and if he wants to try the CI builds to see if it works in his usage. Personally, I am against merging this in its current form, for 3 reasons:

  • it adds library dependencies (EGL) which are extremely rarely used by most Linux users
  • to properly utilise accelerated headless EGL, one should compile and link against the GPU vendor's EGL, rather than mesa's (as in our release build); I am not sure it is a good idea to build against one but run against the other, or that it works without segfaulting, at all. For the intended usage in Support for EGL contexts in GrDirectContext.MakeGL #287, it is best to build on the platform against the GPU vendor's EGL library, so it is a bit meaningless to have this in our wheels.
  • the surrounding GL-relates code upstream is likely to change in the near future, as it did since m87. Carrying and updating a patch which is not upstreamable will be a continual effort.

Fixes #287

Cc @Swarzox this pull add two new ways of getting a GL context:
skia.GrDirectContext.MakeGL(skia.GrGLInterface.MakeEGL()) and skia.GrDirectContext.MakeGL(skia.GrGLInterface.MakeGLX()), to explicitly initialize GL contexts in those two ways. skia.GrDirectContext.MakeGL() is still via GLX. As I commented in #287 , if you set skia_use_egl=true when building skia, the default switches to via EGL.

I'd be interested in two things:

  • whether the mesa build works at all against vendor's EGL.
  • if you are not using vendor's GL driver but just using mesa in a headless context, I'd still like to know if headless EGL with mesa has any performance advantage over Xvfb. You seems to have some rendering performance tests in mind?

Depending on the answer to the 2nd question, we might still think about merging this, if there is a good reason...

@HinTak
Copy link
Collaborator Author

HinTak commented Dec 25, 2024

This pull is complete as is (with tests etc), but I am against merging, for reasons of doubting its being generally useful and long-term effort of maintaining the addition.

@HinTak
Copy link
Collaborator Author

HinTak commented Dec 25, 2024

The mac 13 Intel build unfortunately is expected to fail because of actions/runner-images#11241 . If that's the only failed job, re-run when that issue is addressed.

@HinTak
Copy link
Collaborator Author

HinTak commented Dec 25, 2024

I need to test this here, it is certainly an oversight: the egl api works in X with real hardware, and work headless with real hardware. I have never tried doing EGL with Xvfb - it looks like the EGL API segfaults in software-only X! This isn't a surprise on hindsight: the whole point of EGL is to use real GPU without X, so I have never tried it with software-only Xvfb.

So the EGL API needs to detect when is Xvfb and just bail, rather than segfault.

Cannot reproduce the problem locally, but the most likely source of segfault
is that Xvfb is not capable of EGL in CI. This change allows EGL context
to fail if under "some sort" of X.
…keGL

Another go at fixing EGL segfault - maybe the libEGL's are too old:
On EL7, EGL/egl.h is in mesa-libEGL-devel instead of libglvnd-devel (later Redhat)
Missing GLES2/gl2.h - in mesa-libGLES-devel
@HinTak
Copy link
Collaborator Author

HinTak commented Dec 26, 2024

Looking more carefully, the segfault in CI seems to be on load, rather than at the specific tests. And with the other changes (the EGL headers are in different packages compared to current redhat/fedora - my own system), really suggests that the segfault is a side-effect of #175 - building against an old libEGL and running on new system can segfault; hence my latest addition to this pull. This adds to my reservation against merging this - the EGL libraries seems to be not backward ABI compatible, hence not expected to be compatie across vendor either.

I.e. @Swarzox you should take this pull and build it on your platform against the system's own EGL, whatever it is shipped with. Don't use binary wheels from somebody else.

The current gn code still has this problem, so upgrading gn isn't useful.
Hence a custom patch.
This reverts commit 2fa072e.

Current gn has this fixed:

commit 0725d7827575b239594fbc8fd5192873a1d62f44
Author: David 'Digit' Turner <[email protected]>
Date:   Tue Jan 18 12:14:01 2022 +0100

    Remove misc GCC-related compiler warnings.
@HinTak
Copy link
Collaborator Author

HinTak commented Dec 27, 2024

Current gn does not like being compiled with older g++:

FAILED: src/gn/operators.o 
  g++ -MMD -MF src/gn/operators.o.d -I../src -I. -Wno-deprecated-copy -DNDEBUG -O3 -fdata-sections -ffunction-sections -Werror -D_FILE_OFFSET_BITS=64 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -pthread -pipe -fno-exceptions -fno-rtti -fdiagnostics-color -Wall -Wextra -Wno-unused-parameter -Wextra-semi -Wundef -std=c++20 -Wno-redundant-move -Wno-restrict -Wno-deprecated-copy -Wno-implicit-fallthrough -Wno-redundant-move -Wno-unused-variable -Wno-format -Wno-strict-aliasing -Wno-cast-function-type -c ../src/gn/operators.cc -o src/gn/operators.o
  In file included from /opt/rh/gcc-toolset-12/root/usr/include/c++/12/string:40,
                   from ../src/base/strings/string_number_conversions.h:11,
                   from ../src/gn/operators.cc:10:
  In static member function ‘static constexpr void std::char_traits<char>::assign(char_type&, const char_type&)’,
      inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/basic_string.h:421:23,
      inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/basic_string.tcc:532:22,
      inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::assign(const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/basic_string.h:1647:19,
      inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::operator=(const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/basic_string.h:815:28,
      inlined from ‘Err {anonymous}::MakeOverwriteError(const BinaryOpNode*, const Value&)’ at ../src/gn/operators.cc:202:17:
  /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/char_traits.h:354:16: error: ‘((const std::char_traits<char>::char_type*)((char*)&empty_def + offsetof(std::__cxx11::string, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::<unnamed>)))[2]’ may be used uninitialized [-Werror=maybe-uninitialized]
    354 |         __c1 = __c2;
        |                ^~~~
  ../src/gn/operators.cc: In function ‘Err {anonymous}::MakeOverwriteError(const BinaryOpNode*, const Value&)’:
  ../src/gn/operators.cc:198:15: note: ‘empty_def’ declared here
    198 |   std::string empty_def;
        |               ^~~~~~~~~
  In static member function ‘static constexpr void std::char_traits<char>::assign(char_type&, const char_type&)’,
      inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/basic_string.h:421:23,
      inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/basic_string.tcc:532:22,
      inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::assign(const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/basic_string.h:1647:19,
      inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::operator=(const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/basic_string.h:815:28,
      inlined from ‘Err {anonymous}::MakeOverwriteError(const BinaryOpNode*, const Value&)’ at ../src/gn/operators.cc:205:17:
  /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/char_traits.h:354:16: error: ‘((const std::char_traits<char>::char_type*)((char*)&empty_def + offsetof(std::__cxx11::string, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::<unnamed>)))[2]’ may be used uninitialized [-Werror=maybe-uninitialized]
    354 |         __c1 = __c2;
        |                ^~~~
  ../src/gn/operators.cc: In function ‘Err {anonymous}::MakeOverwriteError(const BinaryOpNode*, const Value&)’:
  ../src/gn/operators.cc:198:15: note: ‘empty_def’ declared here
    198 |   std::string empty_def;
        |               ^~~~~~~~~
  cc1plus: all warnings being treated as errors
  [141/199] CXX src/gn/path_output.o
  [142/199] CXX src/gn/pool.o
  [143/199] CXX src/gn/pattern.o
  [144/199] CXX src/gn/parser.o
  [145/199] CXX src/gn/parse_tree.o
  ninja: build stopped: subcommand failed.

While older gn doesn't like being compiled with newer g++ (though we had a patch for this earlier)

See
python/cpython#128161
"Defining iterator in a separate class no longer works in 3.13"

We have iterator for SkTextBlob defined by SkTextBlob::Iter(textblob),
which is the c++/pybind11 equivalent of the same situation.
Following the suggestion:
python/cpython#128161 (comment)

Also see actions/runner-images#11241

Fixes kyamagu#295
@HinTak
Copy link
Collaborator Author

HinTak commented Dec 31, 2024

I am going to close this, tidy up and redo. The bottom line is that the bulk of the changes are correct and work as intended, if you build and test directly on the platform - I have replicated my desktop's success in github CI:
https://github.com/HinTak/skia-m1xx-python/blob/6dc10735a0fc8e97f989a96be95b174d2a87d9b4/.github/workflows/ci.yml#L256
It looks like it is cibuildwheel 's library bundling system which is causing the segfault (I.e. #175 ) . I think it is because you must use libEGL and libGL from the same source/same version, but cibuildwheel bundles libEGL but uses libGL on the runtime host. To test that idea, I'll need to suppress the segfault on CI to let it output the wheels and get the wheels myself to have a look.

This basically adds to the opinion against merging this - it is okay as an enhancement if you build skia-python yourself on the platform you intend to run egl headless, but binary wheels built with this patch on one linux isn't compatible with another Linux. @Swarzox .

@HinTak
Copy link
Collaborator Author

HinTak commented Jan 1, 2025

Filed pypa/cibuildwheel#2118 for the segfault.

The bundled libEGL is what causes the segfault.

If I disable the tests and download the untested wheel, run patchelf --replace-needed ... in reverse for libEGL, then it works. The instruction is added to the top of the EGL patch.

What happens, I think, is this: libEGL is a stub/ trampoline library from the libglvnd project to load libEGL_* from GPU vendors or mesa on a per screen basis. It must be next to the vendor's to be useful. Loading a relocated copy means it can't even find the mesa one for typical fallback, therefore it crashes on load.

@HinTak
Copy link
Collaborator Author

HinTak commented Jan 1, 2025

The instruction to fix the wheel is:

patchelf --replace-needed libEGL-e42ac8e3.so.1.1.0  libEGL.so.1.1.0  ~/.local/lib/python3.13/site-packages/skia.*.so

For the python 3.13 wheel, after installed in user location.

@HinTak
Copy link
Collaborator Author

HinTak commented Jan 1, 2025

I have added a stanza in the ci to build skia-python and test egl on fedora the conventional way, and it passes. @Swarzox

@HinTak
Copy link
Collaborator Author

HinTak commented Jan 1, 2025

Small doc update after retesting the patchelf instructions (the 8 hex digits I assume are checksums of the libraries; they changed as I switched the build hosts back to 2_28). You can download the build artefact, install and run patchelf to correct the wheels @Swarzox . I'll be interested to know if you can run the patchelf'ed wheel directly against your headless system.

@HinTak
Copy link
Collaborator Author

HinTak commented Jan 1, 2025

Some of this (doing a direct fedora build and testing on it) is useful, so i probably will copy them over to m134, but there isn't much point of tidying - most of the mess is in the testing code, the fundamental is in the skia patch and src/skia/ and they are very small and very clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for EGL contexts in GrDirectContext.MakeGL
1 participant