KAFKA-18211: Override class loaders for class graph scanning in connect. #18403

Open
wants to merge 7 commits into
base: trunk
Changes from 3 commits
@@ -29,7 +29,9 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;
import java.util.SortedSet;
import java.util.TreeSet;
@@ -76,8 +78,21 @@ private static <T> String versionFor(Class<? extends T> pluginKlass) throws Refl

@Override
protected PluginScanResult scanPlugins(PluginSource source) {
// By default, Java and ClassGraph use parent-first classloading, hence if a plugin is loaded by the classpath
// loader and then by an isolated plugin loader, the default precedence will always load the classpath version.
// This breaks isolation, which is why Connect uses isolated plugin loaders that are child-first classloaders.
// Therefore, we override the classloader order to be child-first, so that the isolated plugin loader is used first.
// In addition, we need to explicitly specify the full classloader order, as ClassGraph only scans the classes available
// in the given classloaders and not the entire parent chain. For this reason, a plugin that extends a class present
// in the classpath/application will not be able to find the parent class unless we explicitly specify the classloader order.
List<ClassLoader> classLoaderOrder = new ArrayList<>();
ClassLoader cl = source.loader();
while (cl != null) {
classLoaderOrder.add(cl);
cl = cl.getParent();
}
Contributor

Is this true? What is the ignoreParentClassLoaders method doing then?

Contributor Author

I did some more analysis on this. My statement is somewhat incorrect. ClassGraph does try to find and scan the classes from the parent loaders, but it fails to do so. To elaborate:

ClassGraph scanning works by computing a list of class path URLs from the provided class loaders and then manually scanning the .class files under those URLs to retrieve information about the classes. Unless we request classloading explicitly (which we do here), the classes are not loaded using Java's classloading. If the desired class is not found during scanning, our code never tries to load it. The list of class path URLs is ordered based on the ordering of the provided classloaders. All of this logic is in the constructor of ClasspathFinder.
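As a rough illustration of the idea described above (not ClassGraph's actual code), the sketch below walks a loader's parent chain child-first and collects whatever URLs each loader exposes through the public `URLClassLoader#getURLs` API. The class and method names (`LoaderUrls`, `chain`, `urlsOf`) are hypothetical and exist only for this example.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Connect or ClassGraph.
final class LoaderUrls {

    // Walk from the given loader up through its parents, child first.
    static List<ClassLoader> chain(ClassLoader start) {
        List<ClassLoader> order = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            order.add(cl);
        }
        return order;
    }

    // Only URLClassLoader exposes its search path through a public API.
    // Any other loader type yields an empty list in this sketch.
    static List<URL> urlsOf(ClassLoader cl) {
        if (cl instanceof URLClassLoader) {
            return Arrays.asList(((URLClassLoader) cl).getURLs());
        }
        return new ArrayList<>();
    }
}
```

Printing `urlsOf(cl)` for each entry of `chain(someLoader)` shows which loaders in the chain contribute URLs through this public route and which do not.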

ClasspathFinder uses various ClassLoaderHandlers to obtain the set of URLs for a classloader. Connect's PluginClassLoader is handled by URLClassLoaderHandler, which works fine and returns the list of jars in a plugin path. But for the application classloader and platform classloader, which hold the URLs for the classpath, it uses JPMSClassLoaderHelper and tries to get the URLs through an illegal reflective access on a private field, which fails on modern Java (it throws an IllegalAccessException; this can be mitigated with --illegal-access=permit, but that option is not present since Java 17 and is not really recommended). Even though the classloader chain is computed, the URLs in the classpath are not obtained because of this. This is why some of the tests were failing: SubclassOfClasspathConverter, which extends ByteArrayConverter, was not computed to be an implementation of a converter, since ByteArrayConverter is in the classpath.
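The general failure mode described here can be probed with plain JDK reflection. The sketch below is not ClassGraph's code, and `PrivateFieldProbe` and `Sample` are hypothetical names. It tries to open a private field and reports whether the runtime allowed it. Which exception surfaces in practice depends on how the field is accessed, so the sketch treats any refusal as `false`.

```java
import java.lang.reflect.Field;

// Hypothetical probe for illustration; not part of Connect or ClassGraph.
final class PrivateFieldProbe {

    // A class in the caller's own module, whose private field can be opened.
    static final class Sample {
        private int secret = 42;
    }

    // Returns true if the private field could be made accessible,
    // false if it does not exist or the runtime refused access.
    static boolean canOpen(Class<?> owner, String fieldName) {
        try {
            Field f = owner.getDeclaredField(fieldName);
            // With strong encapsulation, this call can throw a RuntimeException
            // when the owning module does not open the package to the caller.
            f.setAccessible(true);
            return true;
        } catch (NoSuchFieldException | RuntimeException e) {
            return false;
        }
    }
}
```

Calling `canOpen` on a field of a JDK-internal class gives a result that depends on the Java version and launch flags, which is the version sensitivity the comment above points at.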

To force classpath URL scanning explicitly, we need ClasspathFinder to execute this part of code, which is only possible with classloader overrides if one of the provided classloaders is the application/platform classloader. Passing the classloader chain for the PluginClassLoader achieves this. ignoreParentClassLoaders is tied to the overrideClassloader == null check, hence it is of no use with classloader overrides.

Contributor

Okay, thanks for the explanation, I see that ClassGraph is going outside the classloader hierarchy for scanning, and just uses a list of jars. Here's what I was seeing:

Trunk

    .addClassLoader(source.loader())
  • Gradle wrapper
  • Plugin
  • Gradle worker
  • build/ class directories
  • connect-runtime
  • build/ lib jars
  • Gradle caches (dependencies)

Single Loader

    .overrideClassLoaders(source.loader())
  • Plugin

Loader and parents

    List<ClassLoader> classLoaderOrder = new ArrayList<>();
    ClassLoader cl = source.loader();
    while (cl != null) {
        classLoaderOrder.add(cl);
        cl = cl.getParent();
    }
    ...
        .overrideClassLoaders(classLoaderOrder.toArray(new ClassLoader[0]))
  • Runtime modules
  • Plugin
  • Gradle worker
  • build/ class directories
  • connect-runtime
  • build/ lib jars
  • Gradle caches (dependencies)

It doesn't appear that the order in which we specify the classloaders has an effect on the order in which scanning actually takes place. And I think that makes sense given the ClassLoaderHandler stuff: it's applying a deterministic ordering to generate these lists of jars.

Look at this: https://github.com/classgraph/classgraph/blob/6f9012f2a193ebfefe4a4384e7642820e7aab0f5/src/main/java/nonapi/io/github/classgraph/classloaderhandler/README

Note that URLClassLoader subclasses do not need a custom ClassLoaderHandler (unless they need to override the delegation order, as with SpringBootRestartClassLoaderHandler), URLClassLoader subclasses are handled automatically by ClassGraph.

This sounds like it applies to us, because we have a child-first/parent-last delegation order, but we don't have a special handler telling ClassGraph about it. Maybe we can pursue upstreaming a handler to make the ordering work properly.
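For readers unfamiliar with the term, a child-first (parent-last) delegation order can be sketched as follows. This is a minimal illustration with a hypothetical class name, not Connect's PluginClassLoader, which has its own rules about which classes are loaded in isolation.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical child-first loader for illustration only.
final class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Look in this loader's own URLs first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Fall back to the usual parent-first path only when
                    // the class is not available locally.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

The default `URLClassLoader` does the opposite: it asks the parent first and only searches its own URLs when the parent cannot find the class. A tool that assumes this default ordering would need to be told about the reversed delegation.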

Contributor

But I can see that your reproduction case fails with the old code and succeeds with the new code, why is that?

I can see after the initial ClassGraph#scan() call, it actually always finds the PluginClassLoader isolated copy of ByteArrayConverter, even with the trunk implementation. The difference happens inside of ClassInfoList#loadClasses and ClassGraphClassLoader#loadClass. Here's what I was seeing there:

Trunk

environmentClassLoaderDelegationOrder

  • PlatformClassLoader
  • AppClassLoader
  • FilteringClassLoader (?)
  • URLClassLoader (?)
  • PluginClassLoader

Loader and parents

overrideClassLoaders

  • PluginClassLoader
  • AppClassLoader
  • PlatformClassLoader

So the order of the specified classloaders does change the classloading order of the plugins, even though it doesn't change the scanning behavior.

I think the current implementation is satisfactory as a workaround for the behavior, but we should follow up with ClassGraph and try to use their ClassLoaderHandler infrastructure to get the right classloader sorting. I explored circumventing the ClassGraphClassLoader entirely and calling Class.forName(..., ..., source.loader()), but it's a bit clunky because we also have to include a bunch of error handling.
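To give a sense of the error handling that the direct `Class.forName` route involves, here is a minimal sketch. The helper name `DirectLoad.tryLoad` and its choice to return an empty `Optional` on any failure are assumptions made for illustration. This is not the code that was explored in the PR.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of Connect.
final class DirectLoad {

    // Load className through the given loader without initializing it, and
    // return it only if it is a subtype of expectedType.
    static <T> Optional<Class<? extends T>> tryLoad(
            String className, ClassLoader loader, Class<T> expectedType) {
        try {
            Class<?> klass = Class.forName(className, false, loader);
            if (!expectedType.isAssignableFrom(klass)) {
                return Optional.empty();
            }
            return Optional.of(klass.asSubclass(expectedType));
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError and similar failures
            // that can occur when a dependency of the class is missing.
            return Optional.empty();
        }
    }
}
```

A real scanner would likely want to log or collect these failures rather than drop them silently, which adds to the clunkiness mentioned above.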

ClassGraph classGraphBuilder = new ClassGraph()
- .addClassLoader(source.loader())
+ .overrideClassLoaders(classLoaderOrder.toArray(new ClassLoader[0]))
.enableExternalClasses()
.enableClassInfo();
try (ScanResult classGraph = classGraphBuilder.scan()) {
@@ -17,6 +17,7 @@

package org.apache.kafka.connect.runtime.isolation;

import org.apache.kafka.connect.json.JsonConverter;
import org.junit.jupiter.api.io.TempDir;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.MethodSource;
@@ -31,6 +32,7 @@

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertInstanceOf;
import static org.junit.jupiter.api.Assertions.assertTrue;


@@ -137,6 +139,17 @@ public void testVersionedPluginsHasVersion(PluginScanner scanner) {
versionedPluginResult.forEach(pluginDesc -> assertEquals("1.0.0", pluginDesc.version()));
}

@ParameterizedTest
@MethodSource("parameters")
public void testClasspathPluginIsAlsoLoadedInIsolation(PluginScanner scanner) {
// json converter is part of the classpath by default
String jsonConverterLocation = JsonConverter.class.getProtectionDomain().getCodeSource().getLocation().getPath();
Contributor

This feels very brittle to me.

Rather than using a real class, can this use a TestPlugin with a similar name to a classpath plugin?

Or the last time we needed to set up a "classpath" plugin, it was by injecting a new loader in the hierarchy in PluginsTest#assertClassLoaderReadsVersionFromResource.
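The comment does not spell out why the lookup is brittle. One plausible reason (an assumption here) is that `getCodeSource()` can return null and `URL#getPath()` returns a URL-encoded string rather than a filesystem path. The sketch below, with the hypothetical helper `CodeLocation.locationOf`, shows a more defensive way to resolve a class's location.

```java
import java.net.URISyntaxException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the Connect test suite.
final class CodeLocation {

    // Resolve the jar or directory a class was loaded from, if it is known.
    static Optional<Path> locationOf(Class<?> klass) {
        CodeSource source = klass.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the bootstrap loader have no code source.
            return Optional.empty();
        }
        try {
            // Going through a URI decodes escaped characters such as %20.
            return Optional.of(Paths.get(source.getLocation().toURI()));
        } catch (URISyntaxException | RuntimeException e) {
            // Non-file locations cannot be mapped to a Path.
            return Optional.empty();
        }
    }
}
```

Even with these guards the result depends on how the build lays out the classpath, which is in line with the suggestion to use a dedicated TestPlugin instead of a real class.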

Contributor Author

Updated the test to use a TestPlugin. Setting up a test plugin the way we do in assertClassLoaderReadsVersionFromResource did not catch this regression. I think it's because ClassGraph with addClassLoader sets up its own classloader chain rather than the one we use.

PluginScanResult result = scan(scanner, Collections.singleton(Path.of(jsonConverterLocation)));
assertFalse(result.converters().isEmpty());
result.converters().stream().filter(pluginDesc -> pluginDesc.className().equals(JsonConverter.class.getName()))
.forEach(pluginDesc -> assertInstanceOf(PluginClassLoader.class, pluginDesc.loader()));
}

private PluginScanResult scan(PluginScanner scanner, Set<Path> pluginLocations) {
ClassLoaderFactory factory = new ClassLoaderFactory();
Set<PluginSource> pluginSources = PluginUtils.pluginSources(pluginLocations, PluginScannerTest.class.getClassLoader(), factory);