Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ILASM] Add support for deterministic builds and PDB checksums #109091

Open
wants to merge 73 commits into
base: main
Choose a base branch
from

Conversation

amanasifkhalid
Copy link
Member

@amanasifkhalid amanasifkhalid commented Oct 21, 2024

Follow-up to #85344. Fixes #8293. Fixes #62484.

This PR simply pulls in the SHA-256 wrappers added in #109559 into the prototype in #85344.

TIHan added 30 commits April 25, 2023 11:33
@amanasifkhalid
Copy link
Member Author

Aside from some determinism failures on Linux arm32 (run) that I'm unable to reproduce when cross-compiling, ILAsm roundtrip tests are passing. I'm unfamiliar with most of the code from #85344, so I suppose I ought to open this up for review sooner rather than later, in case I'm missing something obvious.

cc @dotnet/jit-contrib, not sure who's the right person to request for this, but feel free to loop in other reviewers. Thanks!

if (createFlags & ICEE_CREATE_FILE_DET)
{
// A timestamp of 0 triggers asserts in the VM
m_ntHeaders->FileHeader.TimeDateStamp = VAL32(1);
Copy link
Member

@markples markples Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md and the behavior of csc -deterministic use a hash of the file in this field. I see other hashing going on in this PR, but I think it applies to other parts of the file (?)

It's not clear to me if this matters. I'm not sure what the difference is between this field and the value can be stored in the REPRO debug directory.

I wondered if the runtime is asserting because it wants (mostly?) uniqueness of these values, but if

// for dynamic modules use 0 as the time stamp
ULONG ulTimeStamp = 0;
if (!pModule->IsReflectionEmit())
{
ulTimeStamp = pModule->GetPEAssembly()->GetPEImageTimeDateStamp();
_ASSERTE(ulTimeStamp != 0);
is the assertion that you're referring to, then it only seems to be about the value zero, which is unfortunate since https://learn.microsoft.com/en-us/windows/win32/debug/pe-format seems to indicate that 0 and FFFFFFFF are valid. (Perhaps FFFFFFFF would be a better choice if producing a hash is problematic for some reason?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the assert I was hitting. The deterministic PE/COFF documentation suggests the actual timestamp value doesn't matter, though I suppose we ought to match csc's behavior and use the file hash here too, rather than some magic value we have to explain. Either way, I agree that the assertion in the VM is probably overzealous.

@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr ilasm

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid amanasifkhalid changed the title [ILASM] Add support for deterministic compilation [ILASM] Add support for deterministic builds and PDB checksums Jan 6, 2025
@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Jan 6, 2025

The parent PR's PDB checksum implementation wasn't null-terminating the hash algorithm prefix ("SHA256", etc). With this fixed, dumpbin seems to be able to read the new debug directory entries correctly. For example, here's the debug directory dump for a binary built with /PDB /DET specified:

Debug Directories

        Time Type        Size      RVA  Pointer
    -------- ------- -------- -------- --------
    00000000 cv            42 00002160      360    Format: RSDS, {E4632B33-1CEC-8C2E-513D-F9C4DF5B2A1E}, 1, C:\wk\runtime2\IL-RT\trythrowexcept_d.pdb
    00000000 pdbhash       27 000021A2      3A2    SHA256: 33 2B 63 E4 EC 1C 2E 8C 51 3D F9 C4 DF 5B 2A 1E 2B FF 5A 00 2B A7 C0 AD 6E 7B 34 30 34 59 64 BA
    00000000 repro          0 00000000        0

@amanasifkhalid
Copy link
Member Author

Determinism tests are passing in CI, too. cc @dotnet/jit-contrib, @jkotas could you PTAL? Thanks!

@@ -614,6 +614,12 @@ HRESULT PEWriter::Init(PESectionMan *pFrom, DWORD createFlags)
m_ntHeaders->FileHeader.Characteristics |= VAL16(IMAGE_FILE_RELOCS_STRIPPED);
}

if (createFlags & ICEE_CREATE_FILE_DET)
{
// We don't need a meaningful date/time stamp -- just a consistent one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dtto

#if !defined(_WIN32) && !defined(__APPLE__)
if (!IsOpenSslAvailable())
{
fprintf(stderr, "\nWarning: OpenSSL not available. Disabling build determinism.\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be an error.

proc.kill()
sys.exit(1)

if hash != hash_file(inputAssemblyName):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to compare the content of the whole file instead of just a hash?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems simpler; I'll move this over, too.

@amanasifkhalid
Copy link
Member Author

Timestamps in the debug directory dumps are no longer zero, but are taken from the bytes after the GUID in the PDB checksum:

  Debug Directories

        Time Type        Size      RVA  Pointer
    -------- ------- -------- -------- --------
    005AFF2B cv            42 00002160      360    Format: RSDS, {E4632B33-1CEC-8C2E-513D-F9C4DF5B2A1E}, 1, C:\wk\runtime2\IL-RT\trythrowexcept_d.pdb
    005AFF2B pdbhash       27 000021A2      3A2    SHA256: 33 2B 63 E4 EC 1C 2E 8C 51 3D F9 C4 DF 5B 2A 1E 2B FF 5A 00 2B A7 C0 AD 6E 7B 34 30 34 59 64 BA
    00000000 repro          0 00000000        0

And the header timestamp matches:

FILE HEADER VALUES
             14C machine (x86)
               2 number of sections
          5AFF2B time date stamp
               0 file pointer to symbol table
               0 number of symbols
              E0 size of optional header
             102 characteristics
                   Executable
                   32 bit word machine

@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr ilasm

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

// on the IMDInternalEmit interface.
//
// When the CLSID_CorMetaDataDispenser instance above has activated against a current
// clr.dll (which is the only supported configuration for the determinism feature), it is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ilasm in .NET Core has statically linked copy of metadata implementation. This comment that talks about clr.dll is not relevant here. Was it copied from .NET Framework port?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: I do not think you would be able to use IMDInternalEmit interface for the .NET Framework port. It would create versioning problems. I think we will need a proper new interface defined for .NET Framework port.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it copied from .NET Framework port?

I believe so -- this comment is from the initial commit from the parent PR.


ULONG timestamp;
memcpy_s(&timestamp, sizeof(ULONG), pdbChecksum + sizeof(GUID), sizeof(ULONG));
m_pPortablePdbWriter->SetTimestamp(timestamp);
Copy link
Member

@jkotas jkotas Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to generate deterministic GUID and the timestamp from the content hash even when we are not generating the PDBs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems better than leaving the GUID/timestamp as zero. Are you suggesting we derive them from something like the hash we use for the MVID below, if the PDB stream isn't available?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I believe that generating .dll without .pdbs is fairly common.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best to avoid depending on PDB stream to generate the GUID/timestamp in all cases, even when we are genering the PDBs.

@amanasifkhalid
Copy link
Member Author

The determinism tests were failing because I was using different assembly names as inputs into ilasm, which would change the CodeView debug directory entry. This should be fixed

@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr ilasm

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants