Deprecating the use of static IAM User credentials for S3 support in Kubeflow Pipelines #704

Open
surajkota opened this issue Apr 24, 2023 · 1 comment

surajkota commented Apr 24, 2023

What Has Worked

Kubeflow on AWS allows using Amazon S3 as the backend for pipeline artifact storage in Kubeflow Pipelines. As described in the documentation, this integration provides the following advantages:

  • Removes the need to host, monitor, and scale MinIO, the default in-cluster object store provided by open source Kubeflow.
  • Amazon S3 offers industry-leading scalability, data availability, security, and performance, and can be used to meet your compliance requirements.
  • Decouples artifact storage from the Kubeflow deployment, so artifacts persist beyond the lifetime of the deployment and can be used by applications and users inside and outside of Kubeflow.

What Has Not Worked

Up to and including Kubeflow 1.6, using S3 for artifact storage in Kubeflow Pipelines required users to provide permanent (static) IAM credentials. Static credentials pose a security risk, and users have raised concerns about using this feature in its current form. Further, these credentials are made available to the pipeline pods through a Kubernetes secret, which means they cannot be rotated without disruption, for the following reasons:

  • Changes to secrets are not automatically picked up by deployments. The deployments must be restarted, which can impact production workloads.
  • The secrets are copied to each user namespace, and changes to the secrets in the Kubeflow namespace are not reflected in the copies created in user profiles. This adds operational overhead and downtime whenever the credentials need to be rotated.

What Is Changing

Starting with Kubeflow 1.7, IAM Roles for Service Accounts (IRSA) can be used to configure Amazon S3 as the artifact store for pipelines. IRSA lets pods use temporary credentials to make AWS API requests and lets you scope permissions at the pod level through Kubernetes service accounts. For more information on IRSA and its advantages, refer to the Amazon EKS documentation. Support for IRSA as a pipeline S3 credential option has been added to all supported deployment options, such as RDS-S3 and Cognito-RDS-S3. With this feature, you no longer need static (permanent) credentials to use Amazon S3 with Kubeflow.
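To make the pod-level scoping concrete, here is a minimal sketch of the kind of IAM trust policy that IRSA relies on: the role can only be assumed through the cluster's OIDC provider, and only by one service account in one namespace. The helper name `irsa_trust_policy`, the OIDC provider value, and the `default-editor` service account name are assumptions made for illustration and are not taken from this issue; check the Amazon EKS documentation and your deployment guide for the exact values.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Build a trust policy that lets a single Kubernetes service account assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The cluster's OIDC identity provider, registered in IAM.
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        # Restricts the role to one service account in one namespace.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        account_id="123456789012",
        oidc_provider="oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",  # hypothetical value
        namespace="kubeflow-user-example-com",
        service_account="default-editor",  # assumed service account name
    )
    print(json.dumps(policy, indent=2))
```

Because the credentials are issued per service account and expire on their own, there is no long-lived secret to copy between namespaces or rotate by hand.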

Note: IRSA is only supported in KFPv1. If you plan to use KFPv2, choose the IAM User option. IRSA support for KFPv2 will be added in upcoming releases.

Notable changes

Deprecating the use of IAM User/static credentials

We are deprecating the use of IAM User (static) credentials for this functionality. IAM User credentials are still supported in the Kubeflow 1.7 release so that users are not blocked from migrating to Kubeflow 1.7 and have the flexibility to evaluate and move to the new functionality. Support for configuring S3 via IAM User credentials will be removed in a future release.

Attaching an IAM role to Kubeflow profiles

In a multi-user Kubeflow environment, the pods created by pipeline workflows and the pipelines frontend services run in the user's profile namespace. The service account used by these pods needs permission to read and write artifacts in the S3 bucket used by pipelines. When using the S3 integration with Kubeflow, users need to provide an IAM role as input for every profile through the AwsIamForServiceAccount plugin for Profiles. We are working on additional documentation for this. In the meantime, please refer to the instructions for creating the profile role (kf-pipeline-profile-role) in the RDS-S3 deployment guide.
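As a rough illustration of the read and write access described above, the following sketch builds an IAM permissions policy for a single artifact bucket. The helper name `s3_artifact_policy`, the bucket name, and the exact list of actions are assumptions; the RDS-S3 deployment guide may grant a different set of permissions, so treat this as a starting point rather than the official policy.

```python
import json


def s3_artifact_policy(bucket: str) -> dict:
    """Build an IAM policy granting read/write access to one artifact bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions: apply to keys under the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-kubeflow-artifacts" is a hypothetical bucket name.
    print(json.dumps(s3_artifact_policy("my-kubeflow-artifacts"), indent=2))
```

Scoping the resources to one bucket keeps each profile role from reaching other buckets in the account.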

Instead of creating a profile with only a user email, you will need to provide an IAM role configured with S3 access to the bucket:

apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: kubeflow-user-example-com
spec:
  owner:
    kind: User
    name: user@example.com
  plugins:
  - kind: AwsIamForServiceAccount
    spec:
      awsIamRole: arn:aws:iam::123456789012:role/profile-role-user-example-com
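Since every profile now needs its own role, generating the manifests from a list of users may be less error-prone than editing them by hand. The sketch below is one possible way to do that; the helper names `profile_manifest` and `profile_list`, the second user, and both role ARNs are hypothetical. It emits JSON, which `kubectl apply -f` accepts in place of YAML.

```python
import json


def profile_manifest(profile_name: str, owner_email: str, role_arn: str) -> dict:
    """Build a Profile object with the AwsIamForServiceAccount plugin attached."""
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Profile",
        "metadata": {"name": profile_name},
        "spec": {
            "owner": {"kind": "User", "name": owner_email},
            "plugins": [
                {
                    "kind": "AwsIamForServiceAccount",
                    "spec": {"awsIamRole": role_arn},
                }
            ],
        },
    }


def profile_list(users: list[tuple[str, str, str]]) -> dict:
    """Wrap several Profile objects in a single Kubernetes List document."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [profile_manifest(*user) for user in users],
    }


if __name__ == "__main__":
    # Hypothetical users and role ARNs, for illustration only.
    users = [
        ("kubeflow-user-example-com", "user@example.com",
         "arn:aws:iam::123456789012:role/profile-role-user-example-com"),
        ("kubeflow-alice-example-com", "alice@example.com",
         "arn:aws:iam::123456789012:role/profile-role-alice-example-com"),
    ]
    print(json.dumps(profile_list(users), indent=2))
```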

Call for Action

We highly recommend that all customers using the S3 integration with Kubeflow upgrade to the latest version of Kubeflow and try S3 with Kubeflow Pipelines via IRSA.

@surajkota changed the title from "Deprecating the use of static IAM User credentials for S3 support in Kubeflow pipelines" to "Deprecating the use of static IAM User credentials for S3 support in Kubeflow Pipelines" on Apr 24, 2023
@thesuperzapper

Hey all, I just wanted to share that deployKF uses S3 directly (no MinIO gateway), which also means that it comes with support for IRSA!

It's going to be documented better soon, but these are the values you need to configure to use S3. I'm happy to help anyone who has trouble getting it working.

If you want, you can also read a bit more about deployKF in this comment: kubeflow/manifests#2451 (comment)
