[rust] add crates.io enrichment option for rust audit binary, json schema and spdx license updates. #3554

Open
wants to merge 7 commits into
base: main

@jimmystewpot jimmystewpot commented Dec 31, 2024

Description

This pull request supports remotely enriching Rust auditable binaries using crates.io. When enabled, it adds the license, supplier, originator, description, and other fields to the manifest.

This information is unavailable in the cargo lock and binary; if approved, I will add this capability to the other rust cataloger.

Type of change

  • New feature (non-breaking change which adds functionality)

I've also updated the SPDX license list, as that was failing the make test, and updated the JSON schema version to support the new crates-enriched metadata. There are still some missing unit tests, specifically the mocks for the crates.io lookup and caching functionality. I wanted to submit a PR early, seek guidance, and ensure this would benefit the community before investing more time in standardising it across the Rust catalogers.

The new feature adds a rust key to the configuration that allows the feature to be turned on/off and some settings tuned for site-specific needs.

Checklist:

  • I have added unit tests that partially cover the changed behaviour
  • I have tested my code in common scenarios and confirmed there are no regressions
  • I have added comments to my code, particularly in hard-to-understand sections

@github-actions github-actions bot added the json-schema (Changes the json schema) label Dec 31, 2024

@kzantow kzantow left a comment


Hi @jimmystewpot, thanks for this enhancement; overall this looks great, and it's much appreciated that you've followed conventions really well. I left a few specific comments, but the biggest takeaways, I think are:

  • "duplicate" the configuration struct (but it won't be completely duplicated -- for the multilevel configuration, you'll use *bool whereas the rust.CatalogerConfig would have a bool, for example)
  • we probably don't want to choose between one metadata type or the other, but rather add a way to keep both (though the suggestions I have are only suggestions and I'd like to run these by the team when we start to introduce new patterns for things). maybe the best thing is just to add the fields to the existing structs and not worry about having to support multiple metadata types just yet
  • you'll need to sign off your commit(s); see contributing.md

"github.com/anchore/syft/syft/pkg/cataloger/rust"
)

type rustConfig rust.CatalogerConfig

This should not be a typedef of the rust struct but should be a declaration of all fields the CLI allows to be configured -- the idea is that these options define the CLI interface and can evolve separately from the internal configuration structs.
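
A minimal sketch of that separation, assuming a CLI-facing struct with pointer fields (so "unset" is distinguishable from "false") that is resolved into the internal struct. The field subset and the toCatalogerConfig helper are hypothetical, not Syft's actual API:

```go
package main

import "fmt"

// CatalogerConfig stands in for the internal rust.CatalogerConfig.
// Only two of the fields from the PR are reproduced here.
type CatalogerConfig struct {
	InsecureSkipTLSVerify bool
	UseCratesEnrichment   bool
}

// rustConfig is a hypothetical CLI-facing options struct. It declares its
// own fields instead of aliasing the internal type, and uses *bool so that
// an option left unset at one configuration level does not override another.
type rustConfig struct {
	InsecureSkipTLSVerify *bool `yaml:"insecure-skip-tls-verify" json:"insecure-skip-tls-verify" mapstructure:"insecure-skip-tls-verify"`
	UseCratesEnrichment   *bool `yaml:"use-crates-enrichment" json:"use-crates-enrichment" mapstructure:"use-crates-enrichment"`
}

// toCatalogerConfig (hypothetical helper) overlays explicitly set CLI
// options on top of the supplied defaults.
func (c rustConfig) toCatalogerConfig(defaults CatalogerConfig) CatalogerConfig {
	out := defaults
	if c.InsecureSkipTLSVerify != nil {
		out.InsecureSkipTLSVerify = *c.InsecureSkipTLSVerify
	}
	if c.UseCratesEnrichment != nil {
		out.UseCratesEnrichment = *c.UseCratesEnrichment
	}
	return out
}

func main() {
	enabled := true
	cli := rustConfig{UseCratesEnrichment: &enabled}
	fmt.Printf("%+v\n", cli.toCatalogerConfig(CatalogerConfig{}))
}
```

With this shape the CLI options can gain, rename, or drop fields without touching the internal struct, and vice versa.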


// NewCargoLockCataloger returns a new Rust Cargo lock file cataloger object.
func NewCargoLockCataloger() pkg.Cataloger {
return generic.NewCataloger("rust-cargo-lock-cataloger").
func NewCargoLockCataloger(opts CatalogerConfig) pkg.Cataloger {

It looks like this opts parameter is not used?

type CatalogerConfig struct {
InsecureSkipTLSVerify bool `yaml:"insecure-skip-tls-verify" json:"insecure-skip-tls-verify" mapstructure:"insecure-skip-tls-verify"`
UseCratesEnrichment bool `json:"use-crates-enrichment" yaml:"use-crates-enrichment" mapstructure:"use-crates-enrichment"`
Proxy string `yaml:"proxy,omitempty" json:"proxy,omitempty" mapstructure:"proxy"`

Is this equivalent to the http_proxy environment variable that Go honors? I don't think we need a special config for this; we can just advise users to set the environment variable so that it applies to all HTTP calls instead of configuring each one individually. If there really is some reason we need configuration beyond the environment variable, we should figure out how to set it globally for all HTTP requests.
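
For reference, a small sketch of a client that relies on the environment rather than a dedicated proxy option. http.ProxyFromEnvironment is part of the Go standard library and reads HTTP_PROXY, HTTPS_PROXY and NO_PROXY (or their lowercase forms); the newHTTPClient helper and defaultTimeout constant are assumptions made for illustration:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// defaultTimeout is an illustrative value, not one taken from the PR.
const defaultTimeout = 30 * time.Second

// newHTTPClient (hypothetical helper) builds a client whose proxy settings
// come from the process environment, so no per-cataloger proxy option is needed.
func newHTTPClient(timeout time.Duration, insecureSkipTLSVerify bool) *http.Client {
	transport := &http.Transport{
		// Same proxy behavior as http.DefaultTransport.
		Proxy: http.ProxyFromEnvironment,
		TLSClientConfig: &tls.Config{
			InsecureSkipVerify: insecureSkipTLSVerify, //nolint:gosec // opt-in only
		},
	}
	return &http.Client{Timeout: timeout, Transport: transport}
}

func main() {
	client := newHTTPClient(defaultTimeout, false)
	fmt.Println("client timeout:", client.Timeout)
}
```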

"github.com/anchore/syft/internal/mimetype"
"github.com/anchore/syft/syft/pkg"
"github.com/anchore/syft/syft/pkg/cataloger/generic"
)

const cargoAuditBinaryCatalogerName = "cargo-auditable-binary-cataloger"
const (
toolName = "syft" // used for the user-agent string.

We shouldn't add a string here for the user-agent. Ideally this would come from configuration passed from the API user. The app name is initially passed in here, this just needs to get passed through to the appropriate configuration. Maybe we should make this more convenient somehow. But we really don't want this hardcoded as "syft", since a number of apps use the Syft API and are not, in fact, Syft.
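
One way this could look, sketched under the assumption that the caller's name and version reach the resolver through configuration. The Identification type, both helper functions, and the placeholder URL are hypothetical:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// Identification is a hypothetical stand-in for the name/version pair
// supplied by whichever application is using the library.
type Identification struct {
	Name    string
	Version string
}

// userAgent derives the User-Agent value from the caller-supplied identity
// instead of a hardcoded tool name.
func userAgent(id Identification) string {
	if id.Version == "" {
		return id.Name
	}
	return fmt.Sprintf("%s/%s", id.Name, id.Version)
}

// newCrateRequest builds a GET request carrying the caller's identity.
func newCrateRequest(ctx context.Context, id Identification, url string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, fmt.Errorf("unable to build request: %w", err)
	}
	req.Header.Set("User-Agent", userAgent(id))
	return req, nil
}

func main() {
	id := Identification{Name: "my-scanner", Version: "0.1.0"}
	// Placeholder URL for illustration only.
	req, err := newCrateRequest(context.Background(), id, "https://example.com/crates/serde/1.0.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get("User-Agent"))
}
```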

const cargoAuditBinaryCatalogerName = "cargo-auditable-binary-cataloger"
const (
toolName = "syft" // used for the user-agent string.
cargoAuditBinaryCatalogerName = "rust-cargo-auditable-binary-cataloger"

This looks like the right change, but I think this would technically be a breaking change if anyone was using the cargo-auditable-binary-cataloger string. This should probably be reverted to the previous value but add an issue to update it, so we can make sure to do so in Syft 2.0 or whenever appropriate.

func newCratesResolver(name string, opts CatalogerConfig) *rustCratesResolver {
base, err := url.Parse(opts.CratesBaseURL)
if err != nil {
panic(err)

We should never panic; return an error instead.
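
A sketch of the constructor with an error return. The struct fields shown are a reduced, assumed subset; only the function name, the CratesBaseURL option, and the url.Parse call come from the diff above:

```go
package main

import (
	"fmt"
	"net/url"
)

// CatalogerConfig is reduced to the one field this sketch needs.
type CatalogerConfig struct {
	CratesBaseURL string
}

// rustCratesResolver fields are assumed for illustration.
type rustCratesResolver struct {
	name    string
	baseURL *url.URL
}

// newCratesResolver returns an error instead of panicking when the
// configured base URL cannot be parsed.
func newCratesResolver(name string, opts CatalogerConfig) (*rustCratesResolver, error) {
	base, err := url.Parse(opts.CratesBaseURL)
	if err != nil {
		return nil, fmt.Errorf("unable to parse crates base URL %q: %w", opts.CratesBaseURL, err)
	}
	return &rustCratesResolver{name: name, baseURL: base}, nil
}

func main() {
	if _, err := newCratesResolver("example", CatalogerConfig{CratesBaseURL: "://bad"}); err != nil {
		fmt.Println("error:", err)
	}
}
```

The caller can then decide whether a bad URL disables enrichment or fails the whole cataloger.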

Comment on lines +55 to +57
if r := recover(); r != nil {
fmt.Fprintf(os.Stderr, "recovered from panic while resolving license at: \n%s", string(debug.Stack()))
}

there shouldn't be a need for panic recovery here -- what is panicking?

// cratesRemoteMetadata represents the remote metadata for a crate
// as fetched from crates.io via an API request.
// This is used for deserialization of the response from crates.io
type cratesRemoteMetadata struct {

nit: this type should probably be defined above or below methods on rustCratesResolver rather than in the middle

Comment on lines +96 to +110
switch c.opts.UseCratesEnrichment {
case true:
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(c.opts.CratesTimeout))
defer cancel()
cratesEnrichment, err := c.cratesResolver.ResolveCrate(ctx, dep.Name, dep.Version)
if err != nil {
log.Tracef("rust cataloger: failed to resolve crate %s/%s using crates.io: %v", dep.Name, dep.Version, err)
// fallback to not using the crates enriched package information.
p = newPackageFromAudit(&dep, location.WithAnnotation(pkg.EvidenceAnnotationKey, pkg.PrimaryEvidenceAnnotation))
continue
}
p = newPackageWithEnrichment(&dep, cratesEnrichment, location.WithAnnotation(pkg.EvidenceAnnotationKey, pkg.PrimaryEvidenceAnnotation))
case false:
p = newPackageFromAudit(&dep, location.WithAnnotation(pkg.EvidenceAnnotationKey, pkg.PrimaryEvidenceAnnotation))
}

nit: use if/else for a boolean rather than a switch
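
As a rough illustration, the branch can be written as a plain boolean if with early returns in place of the switch. Everything here is a simplified stand-in: the dependency, enrichment and result types, the resolveFunc signature, and the two stub resolvers are hypothetical, and the branching is pulled into a helper so the fallback path returns a package rather than using continue:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Simplified, hypothetical stand-ins for the types in the diff above.
type dependency struct{ Name, Version string }
type enrichment struct{ License string }
type result struct {
	Name, Version, License string
	Enriched               bool
}
type resolveFunc func(ctx context.Context, name, version string) (*enrichment, error)

const stubTimeout = 5 * time.Second

// stubResolve and failingResolve simulate a successful and a failed lookup.
func stubResolve(_ context.Context, _, _ string) (*enrichment, error) {
	return &enrichment{License: "MIT"}, nil
}

func failingResolve(_ context.Context, _, _ string) (*enrichment, error) {
	return nil, errors.New("lookup failed")
}

// buildPackage picks the plain or enriched package with a boolean check.
func buildPackage(dep dependency, useEnrichment bool, timeout time.Duration, resolve resolveFunc) result {
	plain := result{Name: dep.Name, Version: dep.Version}
	if !useEnrichment {
		return plain
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	e, err := resolve(ctx, dep.Name, dep.Version)
	if err != nil {
		// Fall back to the package without remote metadata.
		return plain
	}
	return result{Name: dep.Name, Version: dep.Version, License: e.License, Enriched: true}
}

func main() {
	dep := dependency{Name: "serde", Version: "1.0.0"}
	fmt.Printf("%+v\n", buildPackage(dep, false, stubTimeout, stubResolve))
	fmt.Printf("%+v\n", buildPackage(dep, true, stubTimeout, stubResolve))
	fmt.Printf("%+v\n", buildPackage(dep, true, stubTimeout, failingResolve))
}
```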

p = newPackageFromAudit(&dep, location.WithAnnotation(pkg.EvidenceAnnotationKey, pkg.PrimaryEvidenceAnnotation))
continue
}
p = newPackageWithEnrichment(&dep, cratesEnrichment, location.WithAnnotation(pkg.EvidenceAnnotationKey, pkg.PrimaryEvidenceAnnotation))

This is probably the thing that we need to discuss a bit as a team how best to handle: in the case enrichment is enabled, there is no pkg.RustBinaryAuditEntry metadata created, instead populating a richer, but different, metadata struct type. I've long been a proponent of allowing multiple metadata types, but we don't really have a standard way of doing this yet. I don't think we should have less data when enriching, but we would end up in a situation that potentially something is checking for the pkg.RustBinaryAuditEntry type and it's not found in this case.

I've talked with @wagoodman about this, but I don't think we came to a concrete solution, although since metadata types are arbitrary we could easily add a []any or something similar, and maybe have a helper function to find and return metadata. I don't know if we need this yet, but it definitely looks like some of the fields are being read when outputting different formats from the new enriched data.

If it were me, and given the restrictions we have today, I might consider adding a helper function to the syft/pkg package, something like:

func GetMetadata[T any](p *Package) *T {
  if t, ok := p.Metadata.(T); ok {
    return &t
  }
  if t, ok := p.Metadata.(*T); ok {
    return t
  }
  if metadatas, ok := p.Metadata.([]any); ok {
    for _, m := range metadatas {
      if t, ok := m.(T); ok {
        return &t
      }
      if t, ok := m.(*T); ok {
        return t
      }
    }
  }
  return nil
}

... or something of the sort, which would let us use it fairly simply where we need it, like:

if m := pkg.GetMetadata[pkg.RustBinaryAuditEntry](p); m != nil {
  // do something with the metadata
}

... and we could then set metadata to []any{ RustBinaryAuditEntry{...}, RustCargoMetadata{...} }. And, though it's not directly applicable here, if we migrated usage of the metadata types to this function instead of the direct type assertions we have, we could then also support merging packages more completely without losing certain metadata, etc.
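
To make the idea concrete, here is a self-contained sketch of the helper run against stub types. Package, RustBinaryAuditEntry and RustCargoMetadata are reduced stand-ins with assumed fields, not the real definitions in syft/pkg:

```go
package main

import "fmt"

// Reduced stand-ins for the real types; fields are assumed for illustration.
type Package struct {
	Name     string
	Metadata any
}

type RustBinaryAuditEntry struct{ Name, Version string }
type RustCargoMetadata struct{ License string }

// GetMetadata returns the first metadata value of type T (or *T) found on
// the package, whether Metadata holds a single value or a []any of values.
func GetMetadata[T any](p *Package) *T {
	if t, ok := p.Metadata.(T); ok {
		return &t
	}
	if t, ok := p.Metadata.(*T); ok {
		return t
	}
	if metadatas, ok := p.Metadata.([]any); ok {
		for _, m := range metadatas {
			if t, ok := m.(T); ok {
				return &t
			}
			if t, ok := m.(*T); ok {
				return t
			}
		}
	}
	return nil
}

func main() {
	p := &Package{
		Name: "serde",
		Metadata: []any{
			RustBinaryAuditEntry{Name: "serde", Version: "1.0.0"},
			&RustCargoMetadata{License: "MIT OR Apache-2.0"},
		},
	}
	if m := GetMetadata[RustBinaryAuditEntry](p); m != nil {
		fmt.Println("audit entry:", m.Name, m.Version)
	}
	if m := GetMetadata[RustCargoMetadata](p); m != nil {
		fmt.Println("license:", m.License)
	}
}
```

Note that when a value (rather than a pointer) is stored, the helper returns a pointer to a copy, so mutations through it would not reach the package.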

Sorry for the long-winded comment here, just noting this for discussion along with some background.
