branch-2.1: [opt](cache) enhance cache key computation by removing comments and trimming SQL input #46099 #46472

Merged
merged 1 commit into branch-2.1 from auto-pick-46099-branch-2.1
Jan 7, 2025

@github-actions github-actions bot commented Jan 6, 2025

Cherry-picked from #46099

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Jan 6, 2025
@dataroaring dataroaring reopened this Jan 6, 2025
@hello-stephen

run buildall

@924060929

run buildall

…rimming SQL input (#46099)

- Currently, the SQL cache in Doris can miss because semantically
identical queries are treated as different when they contain:
  - Extra whitespace characters in the SQL query
  - SQL comments that don't affect the query execution
- For example, these queries are semantically identical but would
generate different cache keys:
  ```sql
  SELECT * FROM table;
  -- Same query with comments and extra spaces
  /* Comment */  SELECT   *   FROM   table  ;
  ```
- This PR improves the SQL cache hit rate by:
  - Trimming whitespace from SQL queries
  - Removing SQL comments before calculating the cache key MD5
- As a result, queries that are semantically identical and differ
only in whitespace or comments hit the same cache entry, which
improves cache efficiency and avoids unnecessary query executions.
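
The normalization described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Doris implementation (which lives in the Java frontend); the function names `strip_sql_comments` and `cache_key` are hypothetical. It assumes `--` line comments and `/* */` block comments, skips anything inside quoted literals, and treats every block comment as removable, which would be too aggressive for hint comments that affect execution. It only trims leading and trailing whitespace and does not collapse repeated spaces inside the statement.

```python
import hashlib


def strip_sql_comments(sql: str) -> str:
    """Remove -- line comments and /* */ block comments outside quoted text."""
    out = []
    i, n = 0, len(sql)
    quote = None  # the open quote character while inside a literal, else None
    while i < n:
        ch = sql[i]
        if quote is not None:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Keep a backslash-escaped character verbatim.
                out.append(sql[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
        elif ch in ("'", '"', "`"):
            quote = ch
            out.append(ch)
            i += 1
        elif sql.startswith("--", i):
            # Drop everything up to (not including) the end of the line.
            end = sql.find("\n", i)
            i = n if end == -1 else end
        elif sql.startswith("/*", i):
            # Replace the comment with one space so adjacent tokens stay apart.
            end = sql.find("*/", i + 2)
            out.append(" ")
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def cache_key(sql: str) -> str:
    """MD5 hex digest of the comment-free, trimmed SQL text (hypothetical helper)."""
    normalized = strip_sql_comments(sql).strip()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

With this sketch, `cache_key("SELECT * FROM t;")` and `cache_key("/* Comment */  SELECT * FROM t;  -- trailing")` produce the same digest, while a `--` sequence inside a string literal is left untouched.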
@yiguolei yiguolei force-pushed the auto-pick-46099-branch-2.1 branch from dd98532 to 009683e Compare January 6, 2025 23:55
@yiguolei

yiguolei commented Jan 6, 2025

run buildall

@yiguolei yiguolei merged commit 52455ed into branch-2.1 Jan 7, 2025
21 of 23 checks passed
@github-actions github-actions bot deleted the auto-pick-46099-branch-2.1 branch January 7, 2025 05:39