Bulk import times scale with the number of tablets in a table. #5201

Open
keith-turner opened this issue Dec 19, 2024 · 1 comment
Labels
bug This issue has been verified to be a bug.

Comments

@keith-turner
Contributor

Describe the bug

When bulk importing into N tablets, the bulk import v2 code scans every tablet in the metadata table between the minimum and maximum tablet being imported into. For example, when importing into 10 tablets of a table with 100K tablets, it is possible that the bulk import scans all 100K tablets; it depends on where the minimum and maximum of those 10 tablets fall within the 100K.
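To make the arithmetic concrete, here is a minimal sketch of the scan cost under a simplified model. It is not Accumulo code: the class `BulkScanEstimate` and its methods are hypothetical, and tablets are represented only by their 0-based position in the table's sorted metadata entries.

```java
import java.util.TreeSet;

// Hypothetical model, not Accumulo code: a tablet is identified by its
// 0-based position among the table's sorted metadata entries.
final class BulkScanEstimate {

  // Entries read when scanning contiguously from the lowest to the
  // highest tablet that receives files (both ends inclusive).
  static long tabletsScanned(TreeSet<Integer> importedTablets) {
    if (importedTablets.isEmpty()) {
      return 0;
    }
    return (long) importedTablets.last() - importedTablets.first() + 1;
  }

  // Entries read that belong to tablets receiving no files.
  static long tabletsSkipped(TreeSet<Integer> importedTablets) {
    return tabletsScanned(importedTablets) - importedTablets.size();
  }

  public static void main(String[] args) {
    // 10 tablets spread evenly over a 100K-tablet table: positions 0 .. 99,999.
    TreeSet<Integer> spread = new TreeSet<>();
    for (int i = 0; i < 10; i++) {
      spread.add(i * 11_111);
    }
    System.out.println("spread:   scanned=" + tabletsScanned(spread)
        + " skipped=" + tabletsSkipped(spread));

    // 10 adjacent tablets: the scan covers only what is imported.
    TreeSet<Integer> adjacent = new TreeSet<>();
    for (int i = 500; i < 510; i++) {
      adjacent.add(i);
    }
    System.out.println("adjacent: scanned=" + tabletsScanned(adjacent)
        + " skipped=" + tabletsSkipped(adjacent));
  }
}
```

In this model the spread case reads 100,000 entries for 10 useful ones, while the adjacent case reads exactly 10.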

Expected behavior

Ideally the amount of scanning done would be directly related to the number of tablets being bulk imported and not the number of tablets in the table. This would be a large change to the way the code works. A good first step would be to add some logging to the current code that captures how much time this behavior is wasting. Then further decisions could be made about improving the code based on that.
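As a rough illustration of what such logging could capture, the sketch below counts tablets seen versus tablets that actually receive files, and reports elapsed time. The class `BulkScanStats` and its method names are hypothetical and do not correspond to anything in the Accumulo code base; where the calls would go in the bulk import v2 code is left open.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Accumulo: accumulates counters while
// the bulk import iterates over metadata entries.
final class BulkScanStats {
  private final long startNanos = System.nanoTime();
  private long scanned;
  private long loaded;

  // Call once per tablet read from the metadata table.
  void tabletSeen(boolean receivesFiles) {
    scanned++;
    if (receivesFiles) {
      loaded++;
    }
  }

  long scanned() {
    return scanned;
  }

  long loaded() {
    return loaded;
  }

  // Tablets that were read but had no files to load.
  long skipped() {
    return scanned - loaded;
  }

  // One-line summary suitable for a debug log message. The elapsed time
  // covers the whole scan, not only the skipped tablets.
  String summary() {
    long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    return String.format(
        "bulk import scanned %d tablets, loaded into %d, skipped %d, elapsed %d ms",
        scanned, loaded, skipped(), ms);
  }
}
```

A high ratio of skipped to loaded tablets in the summary would indicate that the scan range, rather than the import itself, dominates the cost.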

@keith-turner keith-turner added the bug This issue has been verified to be a bug. label Dec 19, 2024
@keith-turner
Contributor Author

This applies to the bulk v2 code; not sure if it applies to the bulk v1 code.
