[opt](nereids) infer in-predicate from or-predicate #46468

Open
wants to merge 3 commits into
base: master

Contributor

@englefly englefly commented Jan 6, 2025

What problem does this PR solve?

The previous version had the following drawbacks:

  1. It inferred some in-predicates that cannot be pushed down to the storage layer.
  2. It could easily lead to an infinite loop, because other expression rewrite rules may remove its flag from the expression's mutableState.

To solve these issues, we implemented a new version.
First, it is a plan-node-level rule, which avoids applying it to the same expression repeatedly.
Second, we define a replace mode and an extract mode. In replace mode, the original expression must be equivalent to the inferred in-predicate; this mode is used for the expressions of all plan nodes except filter. Extract mode is used for filter conditions: the inferred in-predicate only has to be implied by the original expression, and it is added in front of it.

For example, orig = "(a=1 and b=1) or (a=2 and c=2)" is equivalent to "a in (1, 2) and ((a=1 and b=1) or (a=2 and c=2))".
If orig is a filter condition, "a in (1, 2)" can be pushed down to the storage layer, so this inference is useful. But if orig is an other-join condition, the inference is useless. So in extract mode "a in (1, 2)" is inferred, but in replace mode it is not.
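
The difference between the two modes can be illustrated with a minimal Python sketch. It is not the Doris implementation: the data model (an OR of ANDs of `column = literal` equalities) and the function names `extract_in_predicates` and `replace_with_in_predicate` are assumptions made only for this illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified model (not Doris classes):
Equality = Tuple[str, int]        # (column, literal) meaning "column = literal"
Conjunction = List[Equality]      # equalities joined by AND
Disjunction = List[Conjunction]   # conjunctions joined by OR


def extract_in_predicates(disjuncts: Disjunction) -> Dict[str, List[int]]:
    """Extract mode: in-predicates implied by the OR, to be ANDed with it."""
    if not disjuncts:
        return {}
    # A column qualifies only if every disjunct constrains it with an equality.
    common = {col for col, _ in disjuncts[0]}
    for conj in disjuncts[1:]:
        common &= {col for col, _ in conj}
    inferred: Dict[str, List[int]] = {}
    for col in sorted(common):
        values: List[int] = []
        for conj in disjuncts:
            for c, v in conj:
                if c == col and v not in values:
                    values.append(v)
        inferred[col] = values
    return inferred


def replace_with_in_predicate(
    disjuncts: Disjunction,
) -> Optional[Tuple[str, List[int]]]:
    """Replace mode: return an in-predicate only if it is equivalent to the OR."""
    # Equivalence holds here only when each disjunct is one equality
    # and all disjuncts refer to the same column.
    if not disjuncts or any(len(conj) != 1 for conj in disjuncts):
        return None
    columns = {conj[0][0] for conj in disjuncts}
    if len(columns) != 1:
        return None
    values: List[int] = []
    for conj in disjuncts:
        if conj[0][1] not in values:
            values.append(conj[0][1])
    return columns.pop(), values


# (a=1 and b=1) or (a=2 and c=2)
orig = [[("a", 1), ("b", 1)], [("a", 2), ("c", 2)]]
extracted = extract_in_predicates(orig)      # a in (1, 2) can be added to a filter
replaced = replace_with_in_predicate(orig)   # no equivalent in-predicate exists
simple = replace_with_in_predicate([[("a", 1)], [("a", 2)]])  # a=1 or a=2
```

In this sketch, extract mode yields `a in (1, 2)` for orig, while replace mode leaves orig untouched and only rewrites the plain `a=1 or a=2` case.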

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly
Contributor Author

englefly commented Jan 6, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32462 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f67c7b303ba2da58f7f5534733f595b8bdc9362b, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17586	6882	6051	6051
q2	2049	293	164	164
q3	10442	1237	698	698
q4	10210	845	425	425
q5	7512	2142	1964	1964
q6	200	177	151	151
q7	898	735	608	608
q8	9214	1317	1098	1098
q9	5103	4881	4899	4881
q10	6730	2290	1843	1843
q11	489	275	280	275
q12	352	392	225	225
q13	17764	3627	3048	3048
q14	238	232	218	218
q15	563	505	495	495
q16	629	593	610	593
q17	554	838	323	323
q18	7063	6534	6365	6365
q19	1204	936	528	528
q20	301	321	191	191
q21	2876	2198	2004	2004
q22	359	338	314	314
Total cold run time: 102336 ms
Total hot run time: 32462 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	6287	6208	6237	6208
q2	233	322	229	229
q3	2268	2680	2275	2275
q4	1415	1868	1363	1363
q5	4348	4722	4773	4722
q6	188	180	142	142
q7	2044	1982	1871	1871
q8	2610	2735	2626	2626
q9	7192	7171	7186	7171
q10	3039	3340	2796	2796
q11	592	509	486	486
q12	705	758	620	620
q13	3501	3912	3207	3207
q14	304	302	288	288
q15	577	531	518	518
q16	659	711	637	637
q17	1189	1721	1239	1239
q18	7770	7351	7165	7165
q19	754	979	1054	979
q20	1916	1952	1796	1796
q21	5391	5067	4754	4754
q22	624	604	573	573
Total cold run time: 53606 ms
Total hot run time: 51665 ms

@doris-robot

TPC-DS: Total hot run time: 190911 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f67c7b303ba2da58f7f5534733f595b8bdc9362b, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	984	378	364	364
query2	6518	2286	2349	2286
query3	6707	214	215	214
query4	34058	23937	23458	23458
query5	4379	620	492	492
query6	287	200	175	175
query7	4621	488	309	309
query8	303	248	241	241
query9	9536	2667	2654	2654
query10	486	308	248	248
query11	18098	15433	15269	15269
query12	161	112	113	112
query13	1636	524	390	390
query14	10330	7451	7458	7451
query15	248	204	182	182
query16	8221	590	433	433
query17	1598	783	569	569
query18	2094	409	338	338
query19	206	185	145	145
query20	116	120	109	109
query21	211	124	103	103
query22	4411	4338	4276	4276
query23	34153	33289	34967	33289
query24	7125	2367	2277	2277
query25	477	450	390	390
query26	1120	274	151	151
query27	2033	446	338	338
query28	5100	2454	2423	2423
query29	555	534	419	419
query30	227	177	152	152
query31	1008	884	838	838
query32	80	59	58	58
query33	502	347	328	328
query34	742	824	496	496
query35	788	833	743	743
query36	1025	1059	960	960
query37	124	106	72	72
query38	4229	4120	4238	4120
query39	1504	1431	1430	1430
query40	208	120	98	98
query41	48	45	47	45
query42	118	108	103	103
query43	510	523	473	473
query44	1281	806	795	795
query45	181	179	168	168
query46	858	1029	659	659
query47	1900	1900	1843	1843
query48	368	392	319	319
query49	800	497	387	387
query50	636	660	388	388
query51	7228	7050	6930	6930
query52	104	103	93	93
query53	225	252	184	184
query54	464	488	409	409
query55	89	84	78	78
query56	255	275	255	255
query57	1207	1194	1137	1137
query58	241	228	236	228
query59	2949	3058	2981	2981
query60	273	265	262	262
query61	112	107	113	107
query62	844	808	738	738
query63	240	187	189	187
query64	4377	1018	673	673
query65	3272	3213	3260	3213
query66	857	438	298	298
query67	16034	15725	15508	15508
query68	8393	701	520	520
query69	446	294	256	256
query70	1220	1146	1067	1067
query71	441	286	269	269
query72	6167	3826	3885	3826
query73	658	748	356	356
query74	10367	8995	9160	8995
query75	4209	3158	2665	2665
query76	3740	1171	783	783
query77	772	377	278	278
query78	10285	10096	9647	9647
query79	4045	783	590	590
query80	728	539	441	441
query81	494	272	238	238
query82	581	155	121	121
query83	194	182	153	153
query84	288	97	139	97
query85	773	369	309	309
query86	347	316	305	305
query87	4726	4650	4496	4496
query88	4260	2174	2145	2145
query89	415	328	304	304
query90	1901	249	188	188
query91	136	144	105	105
query92	69	56	59	56
query93	1948	818	528	528
query94	648	395	306	306
query95	334	265	261	261
query96	485	611	274	274
query97	2836	3029	2817	2817
query98	233	203	212	203
query99	1684	1585	1451	1451
Total cold run time: 295503 ms
Total hot run time: 190911 ms

@doris-robot

ClickBench: Total hot run time: 31.09 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f67c7b303ba2da58f7f5534733f595b8bdc9362b, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.23	0.07	0.07
query4	1.60	0.11	0.10
query5	0.43	0.41	0.41
query6	1.18	0.65	0.66
query7	0.02	0.01	0.02
query8	0.04	0.03	0.04
query9	0.58	0.51	0.51
query10	0.56	0.57	0.55
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.59	0.59
query14	2.70	2.69	2.75
query15	0.90	0.83	0.82
query16	0.38	0.38	0.38
query17	1.07	1.08	1.07
query18	0.23	0.20	0.22
query19	1.88	1.85	1.98
query20	0.02	0.01	0.01
query21	15.39	0.90	0.59
query22	0.75	0.87	0.64
query23	15.25	1.46	0.51
query24	3.62	0.85	0.67
query25	0.10	0.10	0.19
query26	0.42	0.16	0.13
query27	0.06	0.04	0.05
query28	12.75	1.62	1.05
query29	12.58	4.00	3.25
query30	0.25	0.09	0.06
query31	2.82	0.63	0.38
query32	3.23	0.55	0.47
query33	3.08	3.14	3.17
query34	16.91	5.16	4.53
query35	4.56	4.52	4.52
query36	0.76	0.49	0.48
query37	0.10	0.06	0.07
query38	0.05	0.04	0.04
query39	0.04	0.02	0.02
query40	0.17	0.13	0.14
query41	0.08	0.03	0.02
query42	0.03	0.02	0.03
query43	0.03	0.03	0.03
Total cold run time: 105.87 s
Total hot run time: 31.09 s

@englefly
Contributor Author

englefly commented Jan 7, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32809 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 46a6653bec41960ffa892436e3bf15f2032cf72c, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17584	6140	6054	6054
q2	2046	317	177	177
q3	10404	1277	752	752
q4	10204	879	437	437
q5	7881	2242	2023	2023
q6	205	186	152	152
q7	904	752	610	610
q8	9231	1404	1198	1198
q9	5286	5006	5014	5006
q10	6794	2334	1870	1870
q11	473	286	270	270
q12	346	356	222	222
q13	17754	3661	3011	3011
q14	241	224	215	215
q15	568	528	493	493
q16	637	630	596	596
q17	597	871	345	345
q18	7230	6316	6482	6316
q19	1762	947	534	534
q20	315	325	200	200
q21	2876	2221	2015	2015
q22	362	344	313	313
Total cold run time: 103700 ms
Total hot run time: 32809 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	6324	6264	6203	6203
q2	243	328	230	230
q3	2243	2643	2282	2282
q4	1460	1850	1375	1375
q5	4347	4847	4848	4847
q6	181	176	142	142
q7	2049	1937	1781	1781
q8	2630	2849	2731	2731
q9	7312	7318	7364	7318
q10	3088	3364	2871	2871
q11	588	511	489	489
q12	664	727	578	578
q13	3540	3931	3258	3258
q14	275	294	290	290
q15	584	544	503	503
q16	641	694	654	654
q17	1249	1749	1247	1247
q18	7705	7561	7269	7269
q19	784	947	1171	947
q20	1994	2039	1926	1926
q21	5717	5313	4891	4891
q22	653	628	619	619
Total cold run time: 54271 ms
Total hot run time: 52451 ms

@doris-robot

TPC-DS: Total hot run time: 197193 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 46a6653bec41960ffa892436e3bf15f2032cf72c, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1313	969	926	926
query2	6474	2392	2297	2297
query3	10996	4585	4635	4585
query4	33178	23902	23997	23902
query5	4716	628	465	465
query6	296	204	192	192
query7	3990	487	305	305
query8	310	247	244	244
query9	9489	2777	2764	2764
query10	481	306	243	243
query11	18304	15594	15232	15232
query12	158	113	98	98
query13	1532	519	382	382
query14	10120	6819	6975	6819
query15	239	215	191	191
query16	7998	591	428	428
query17	1493	743	588	588
query18	2129	427	316	316
query19	205	214	184	184
query20	136	126	118	118
query21	229	134	115	115
query22	4832	4651	4502	4502
query23	35193	33821	34588	33821
query24	7177	2445	2320	2320
query25	517	490	438	438
query26	756	273	154	154
query27	2125	476	339	339
query28	5389	2535	2481	2481
query29	545	542	420	420
query30	216	183	147	147
query31	987	910	858	858
query32	74	55	57	55
query33	483	352	296	296
query34	759	877	518	518
query35	819	874	741	741
query36	1054	1037	942	942
query37	118	105	72	72
query38	4324	4357	4095	4095
query39	1555	1483	1467	1467
query40	211	121	113	113
query41	45	46	44	44
query42	122	105	105	105
query43	527	515	496	496
query44	1423	866	832	832
query45	189	176	166	166
query46	903	1103	664	664
query47	2040	2021	1924	1924
query48	389	413	332	332
query49	710	488	403	403
query50	647	663	414	414
query51	7164	7141	7017	7017
query52	103	97	91	91
query53	225	258	191	191
query54	475	502	412	412
query55	88	76	80	76
query56	251	266	250	250
query57	1248	1225	1197	1197
query58	235	236	229	229
query59	3238	3314	3116	3116
query60	281	275	259	259
query61	110	144	105	105
query62	862	845	763	763
query63	235	191	191	191
query64	3485	1056	643	643
query65	3343	3297	3256	3256
query66	785	416	313	313
query67	16671	15894	15609	15609
query68	8722	708	512	512
query69	474	282	258	258
query70	1216	1145	1146	1145
query71	427	282	265	265
query72	6424	3905	3875	3875
query73	649	764	379	379
query74	10380	9168	8846	8846
query75	3968	3241	2670	2670
query76	3640	1191	785	785
query77	762	352	289	289
query78	10260	10129	9708	9708
query79	3524	796	584	584
query80	739	533	420	420
query81	485	281	237	237
query82	574	150	119	119
query83	186	162	145	145
query84	279	92	86	86
query85	772	350	289	289
query86	351	320	303	303
query87	4777	4457	4396	4396
query88	4112	2197	2187	2187
query89	434	329	303	303
query90	1834	187	185	185
query91	133	139	103	103
query92	67	55	51	51
query93	1231	867	529	529
query94	641	381	281	281
query95	324	264	257	257
query96	478	612	275	275
query97	2901	2968	2801	2801
query98	244	208	197	197
query99	1754	1558	1442	1442
Total cold run time: 299476 ms
Total hot run time: 197193 ms

@doris-robot

ClickBench: Total hot run time: 31.16 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 46a6653bec41960ffa892436e3bf15f2032cf72c, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.07
query4	1.62	0.10	0.10
query5	0.43	0.43	0.43
query6	1.13	0.64	0.65
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.61	0.49	0.50
query10	0.56	0.56	0.56
query11	0.15	0.10	0.10
query12	0.14	0.11	0.10
query13	0.60	0.60	0.60
query14	2.84	2.74	2.75
query15	0.89	0.84	0.81
query16	0.38	0.36	0.38
query17	1.07	1.09	1.05
query18	0.22	0.20	0.21
query19	1.94	1.86	1.99
query20	0.02	0.00	0.01
query21	15.38	0.89	0.56
query22	0.76	0.83	0.65
query23	15.30	1.48	0.51
query24	3.25	1.72	0.96
query25	0.27	0.13	0.08
query26	0.27	0.14	0.14
query27	0.06	0.05	0.05
query28	13.48	1.51	1.04
query29	12.63	4.01	3.34
query30	0.26	0.09	0.06
query31	2.83	0.59	0.38
query32	3.23	0.54	0.46
query33	3.18	3.05	3.08
query34	16.78	5.15	4.44
query35	4.48	4.43	4.50
query36	0.83	0.48	0.48
query37	0.10	0.07	0.06
query38	0.04	0.03	0.03
query39	0.04	0.02	0.02
query40	0.17	0.13	0.12
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 106.49 s
Total hot run time: 31.16 s

@englefly
Contributor Author

englefly commented Jan 8, 2025

run p0

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Jan 8, 2025
Contributor

github-actions bot commented Jan 8, 2025

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Jan 8, 2025

PR approved by anyone and no changes requested.

Labels
approved (Indicates a PR has been approved by one committer.), reviewed

4 participants