Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-128375: Better instrument for FOR_ITER #128445

Merged
merged 9 commits into from
Jan 6, 2025

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Jan 3, 2025

This PR:

  • Changes the POP_TOP and the end of a for loop to POP_ITER (it pops the iterator)
  • Removes the NOT_TAKEN following FOR_ITER and instruments the FOR_ITER and POP_ITER instead
  • Fixes handling of EXTENDED_ARGS in co_branches().

The last one isn't strictly related to the issue, but came up while fixing the main issue.

Changing the instrumentation pattern has two effects:

  • For uninstrumented code, there is one less instruction dispatch (the NOT_TAKEN) per iteration, at the cost of one additional dispatch when the loop is exited.
  • For instrumented code, once the non-exit branch is DISABLEd, the FOR_ITER can be specialized and jitted.

Performance is neutral on the standard benchmarks for tier1, as expected.

Comment on lines 908 to 910
code_offset_to_line(code, src)-base,
code_offset_to_line(code, left)-base,
code_offset_to_line(code, right)-base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
code_offset_to_line(code, src)-base,
code_offset_to_line(code, left)-base,
code_offset_to_line(code, right)-base
code_offset_to_line(code, src) - base,
code_offset_to_line(code, left) - base,
code_offset_to_line(code, right) - base,


tier1 inst(INSTRUMENTED_END_FOR, (receiver, value -- receiver)) {
/* Need to create a fake StopIteration error here,
* to conform to PEP 380 */
// * to conform to PEP 380 ED_NO*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

* This has the benign side effect that if value is
* finalized it will see the location as the FOR_ITER's.
*/
frame->instr_ptr = prev_instr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a precedent for this? Seems like the kind of thing that can cause confusion/problems later on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tier 2 code generator doesn't like it either. I think I'll need to add a new annotation "no_saved_ip" to tell the code generator not to update frame->instr_ptr.

Any suggestions for a better name than "no_saved_ip"?

@iritkatriel
Copy link
Member

I was wondering if we couldn't, instead of putting in NOT_TAKEN instructions, have a static table on the code object with the fallthrough locations. They are only needed for instrumentation, so could be processed only when instrumentation is enabled.

@nedbat
Copy link
Member

nedbat commented Jan 3, 2025

Trying this PR compared to fa985be, more of my tests fail (on a work-in-progress branch to account for BRANCH_LEFT etc). I'll debug my code, but I wish there were a way to get a concrete plan to work from.

@markshannon
Copy link
Member Author

I was wondering if we couldn't, instead of putting in NOT_TAKEN instructions, have a static table on the code object with the fallthrough locations. They are only needed for instrumentation, so could be processed only when instrumentation is enabled.

I think that would be worse overall, for a few reasons:

  • We need to be able to disable the left and right events separately. Without a NOT_TAKEN instruction, each POP_JUMP... would need POP_JUMP...INSTRUMENTED, POP_JUMP...INSTRUMENTED_LEFT_ONLY, and POP_JUMP...INSTRUMENTED_RIGHT_ONLY.
    With the current 4 POP_JUMP variants that would be 12 instructions instead of the 6 we need with NOT_TAKEN.
  • Such a table would be a lot bigger than extra space taken by the NOT_TAKEN instructions
  • It wouldn't be significantly faster when not instrumented. The cost of NOT_TAKEN is very low as it normally skipped and is eliminated entirely in the jit.
  • It would be quite a lot slower when instrumented thanks to the table lookups.

@nedbat
Copy link
Member

nedbat commented Jan 3, 2025

Is this intended?

continue.py:

for i in range(10):
    a = i
    continue    # 3
    a = 99
assert a == 9   # 5

Disassembled with this branch:

% python3 -m dis -O continue.py
  0          0       RESUME                   0

  1          2       LOAD_NAME                0 (range)
             4       PUSH_NULL
             6       LOAD_SMALL_INT          10
             8       CALL                     1
            16       GET_ITER
      L1:   18       FOR_ITER                 5 (to L2)
            22       STORE_NAME               1 (i)

  2         24       LOAD_NAME                1 (i)
            26       STORE_NAME               2 (a)

  3         28       JUMP_BACKWARD            7 (to L1)

  1   L2:   32       END_FOR
            34       POP_ITER

  5         36       LOAD_NAME                2 (a)
            38       LOAD_SMALL_INT           9
            40       COMPARE_OP              88 (bool(==))
            44       POP_JUMP_IF_TRUE         3 (to L3)
            48       NOT_TAKEN
            50       LOAD_COMMON_CONSTANT     0 (AssertionError)
            52       RAISE_VARARGS            1
      L3:   54       LOAD_CONST               0 (None)
            56       RETURN_VALUE

sys.monitoring events using run_sysmon.py:

3.14.0a3+ (heads/pr/128445:511733c91b3, Jan  3 2025, 10:10:48) [Clang 16.0.0 (clang-1600.0.26.6)]
PY_START: continue.py@0 #0
LINE: continue.py #1
BRANCH_LEFT: continue.py@18->22 #1->1
LINE: continue.py #2
LINE: continue.py #3
JUMP: continue.py@28->18 #3->1
LINE: continue.py #1
BRANCH_RIGHT: continue.py@34->36 #1->5                  ******
LINE: continue.py #5
BRANCH_RIGHT: continue.py@44->54 #5->5
PY_RETURN: continue.py@56 #5

Is it right that the starred line has a BRANCH_RIGHT event from a POP_ITER instruction?

@markshannon
Copy link
Member Author

Is it right that the starred line has a BRANCH_RIGHT event from a POP_ITER instruction?

No. It should be from the FOR_ITER. The problem was that the POP_ITER was being instrumented with INSTRUMENT_LINE which lost the previous instruction offset. I think I have that fixed now.

@nedbat
Copy link
Member

nedbat commented Jan 4, 2025

Two more examples of branch event surprises. run_sysmon.py is used to see the sys.monitoring events. These are using commit cbe6c7c from PR #128445.

ifdebug.py
for value in [True, False]:
    if value:
        if __debug__:
            x = 4
    else:
        x = 6

Disassembled:

% python3 -m dis -O ifdebug.py
  0          0       RESUME                   0

  1          2       LOAD_CONST               0 ((True, False))
             4       GET_ITER
      L1:    6       FOR_ITER                18 (to L3)
            10       STORE_NAME               0 (value)

  2         12       LOAD_NAME                0 (value)
            14       TO_BOOL
            22       POP_JUMP_IF_FALSE        6 (to L2)
            26       NOT_TAKEN

  3         28       NOT_TAKEN

  4         30       LOAD_SMALL_INT           4
            32       STORE_NAME               1 (x)
            34       JUMP_BACKWARD           16 (to L1)

  6   L2:   38       LOAD_SMALL_INT           6
            40       STORE_NAME               1 (x)
            42       JUMP_BACKWARD           20 (to L1)

  1   L3:   46       END_FOR
            48       POP_ITER
            50       LOAD_CONST               1 (None)
            52       RETURN_VALUE

Events:

3.14.0a3+ (heads/pr/128445:cbe6c7c085c, Jan  4 2025, 05:39:38) [Clang 16.0.0 (clang-1600.0.26.6)]
PY_START: ifdebug.py@0 #0
LINE: ifdebug.py #1
BRANCH_LEFT: ifdebug.py@6->10 #1->1
LINE: ifdebug.py #2
BRANCH_LEFT: ifdebug.py@22->28 #2->3
LINE: ifdebug.py #3
BRANCH_LEFT: ifdebug.py@28->30 #3->4        ****
LINE: ifdebug.py #4
JUMP: ifdebug.py@34->6 #4->1
LINE: ifdebug.py #1
BRANCH_RIGHT: ifdebug.py@22->38 #2->6
LINE: ifdebug.py #6
JUMP: ifdebug.py@42->6 #6->1
BRANCH_RIGHT: ifdebug.py@6->50 #1->1
PY_RETURN: ifdebug.py@52 #1

The starred line is a BRANCH event from a NOT_TAKEN instruction.

1176_async.py
import asyncio

async def async_gen():
    yield 4

async def async_test():
    async for i in async_gen():
        print(i + 8)

asyncio.run(async_test())

Disassembled:

% python3 -m dis -O 1176_async.py
  0          0       RESUME                   0

  1          2       LOAD_SMALL_INT           0
             4       LOAD_CONST               0 (None)
             6       IMPORT_NAME              0 (asyncio)
             8       STORE_NAME               0 (asyncio)

  3         10       LOAD_CONST               1 (<code object async_gen at 0x103781c50, file "1176_async.py", line 3>)
            12       MAKE_FUNCTION
            14       STORE_NAME               1 (async_gen)

  6         16       LOAD_CONST               2 (<code object async_test at 0x1037c5b80, file "1176_async.py", line 6>)
            18       MAKE_FUNCTION
            20       STORE_NAME               2 (async_test)

 10         22       LOAD_NAME                0 (asyncio)
            24       LOAD_ATTR                6 (run)
            44       PUSH_NULL
            46       LOAD_NAME                2 (async_test)
            48       PUSH_NULL
            50       CALL                     0
            58       CALL                     1
            66       POP_TOP
            68       LOAD_CONST               0 (None)
            70       RETURN_VALUE

Disassembly of <code object async_gen at 0x103781c50, file "1176_async.py", line 3>:
   3          0       RETURN_GENERATOR
              2       POP_TOP
       L1:    4       RESUME                   0

   4          6       LOAD_SMALL_INT           4
              8       CALL_INTRINSIC_1         4 (INTRINSIC_ASYNC_GEN_WRAP)
             10       YIELD_VALUE              0
             12       RESUME                   5
             14       POP_TOP
             16       LOAD_CONST               0 (None)
             18       RETURN_VALUE

  --   L2:   20       CALL_INTRINSIC_1         3 (INTRINSIC_STOPITERATION_ERROR)
             22       RERAISE                  1
ExceptionTable:
  L1 to L2 -> L2 [0] lasti

Disassembly of <code object async_test at 0x1037c5b80, file "1176_async.py", line 6>:
   6           0       RETURN_GENERATOR
               2       POP_TOP
        L1:    4       RESUME                   0

   7           6       LOAD_GLOBAL              1 (async_gen + NULL)
              16       CALL                     0
              24       GET_AITER
        L2:   26       GET_ANEXT
              28       LOAD_CONST               0 (None)
        L3:   30       SEND                     3 (to L6)
        L4:   34       YIELD_VALUE              1
        L5:   36       RESUME                   3
              38       JUMP_BACKWARD_NO_INTERRUPT 5 (to L3)
        L6:   40       END_SEND
        L7:   42       STORE_FAST               0 (i)

   8          44       LOAD_GLOBAL              3 (print + NULL)
              54       LOAD_FAST                0 (i)
              56       LOAD_SMALL_INT           8
              58       BINARY_OP                0 (+)
              62       CALL                     1
              70       POP_TOP
              72       JUMP_BACKWARD           25 (to L2)

   7    L8:   76       CLEANUP_THROW
        L9:   78       JUMP_BACKWARD_NO_INTERRUPT 20 (to L6)
       L10:   80       END_ASYNC_FOR
              82       LOAD_CONST               0 (None)
              84       RETURN_VALUE

  --   L11:   86       CALL_INTRINSIC_1         3 (INTRINSIC_STOPITERATION_ERROR)
              88       RERAISE                  1
ExceptionTable:
  L1 to L2 -> L11 [0] lasti
  L2 to L4 -> L10 [1]
  L4 to L5 -> L8 [3]
  L5 to L7 -> L10 [1]
  L7 to L8 -> L11 [0] lasti
  L8 to L9 -> L10 [1]
  L10 to L11 -> L11 [0] lasti

Events:

3.14.0a3+ (heads/pr/128445:cbe6c7c085c, Jan  4 2025, 05:39:38) [Clang 16.0.0 (clang-1600.0.26.6)]
PY_START: 1176_async.py@0 #0
LINE: 1176_async.py #1
LINE: 1176_async.py #3
LINE: 1176_async.py #6
LINE: 1176_async.py #10
PY_START: 1176_async.py@4 #6
LINE: 1176_async.py #7
PY_START: 1176_async.py@4 #3
LINE: 1176_async.py #4
LINE: 1176_async.py #8
12
JUMP: 1176_async.py@72->26 #8->7
LINE: 1176_async.py #7
PY_RESUME: 1176_async.py@12 #4
PY_RETURN: 1176_async.py@18 #4
PY_RETURN: 1176_async.py@84 #7
PY_RETURN: 1176_async.py@70 #10

Here there are no BRANCH events at all. I would expect some for the async for line, analogous to how synchronous for is evented.

@markshannon
Copy link
Member Author

These examples are really helpful, but I don't think they are relevant to this PR.
Could you add these examples to the issue, otherwise they get lost.

The ifdebug.py example is a case of the NOT_TAKEN not being removed when the corresponding JUMP is removed. The 1176_async.py is a result of the weird way that async for is currently implemented. I've added tasks for both.

* the FOR_ITER as the previous instruction.
* This has the benign side effect that if value is
* finalized it will see the location as the FOR_ITER's.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just teach POP_ITER to look at prev_instr-1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prev_instr is the previously executed instruction, which is probably not the previous instruction in the code.
There is no way for POP_ITER to tell where the jump came from if we overwrite frame->instr_ptr

@markshannon markshannon merged commit f826bec into python:main Jan 6, 2025
67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants