Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ansible cluster installation #3

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

ventifus
Copy link

@ventifus ventifus commented Sep 27, 2024

Migrating Azure/ARO-RP#3843 to aro-e2e. Refactored to move the roles into an Ansible collection azureredhatopenshift.cluster.

What this PR does / why we need it:

This PR leverages Ansible to automate cluster creation in predefined configurations. These configurations are intended to exercise ARO features in a repeatable manner. For example, we want to make sure that various complex cluster configurations continue to work in production:

  • UserDefinedRouting with blackhole route
  • Disk encryption at host
  • Disk encryption with bring-your-own-key
  • Bring-your-own-NSG (TODO)

Also implemented is the ability to upgrade through a sequence of versions. See ansible/inventory/upgrades.yaml and ansible/inventory/upgrades-candidate.yamlfor examples.

Refer to ansible/README.md for details how to run the Ansible playbooks via make ansible-image and make cluster targets.

For development, this also adds lint-ansible and test-ansible targets. Lint requires ansible-lint to be installed. Linting also runs as part of the container image build. test-ansible runs inside an aro-ansible container and does not need ansible installed.

@ventifus ventifus changed the title Initial Ansible commit. Ansible cluster installation Sep 27, 2024
@ventifus ventifus force-pushed the ansible-cluster-automation branch 3 times, most recently from 68d8215 to 8673260 Compare September 28, 2024 00:10
@ventifus ventifus force-pushed the ansible-cluster-automation branch from 8673260 to 56f7782 Compare September 30, 2024 15:55
@lranjbar
Copy link
Contributor

lranjbar commented Oct 3, 2024

lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 3, 2024
@ventifus
Copy link
Author

ventifus commented Oct 3, 2024

/assign sudobrendan

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2024
@ventifus ventifus force-pushed the ansible-cluster-automation branch from 6da33d3 to 6402b2c Compare October 7, 2024 23:18
@ventifus ventifus force-pushed the ansible-cluster-automation branch from 3b8b3be to cf4915f Compare October 24, 2024 16:29
@lranjbar
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2024
@ventifus ventifus force-pushed the ansible-cluster-automation branch from cf4915f to 8fbdedd Compare October 30, 2024 15:33
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2024
@lranjbar
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2024
@ventifus ventifus force-pushed the ansible-cluster-automation branch from 8fbdedd to c1328e7 Compare October 30, 2024 16:18
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2024
Copy link

@s-fairchild s-fairchild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left some questions. Namely just asking about code that's commented out.
Can we clean this up by removing the unused code? If it's intended for implementation later, I recommend creating a new branch based on this branch, and deleting it from here.

Comment on lines 6 to 52
# Content that Ansible needs to load from another location or that has
# been deprecated/removed
# plugin_routing:
# action:
# redirected_plugin_name:
# redirect: ns.col.new_location
# deprecated_plugin_name:
# deprecation:
# removal_version: "4.0.0"
# warning_text: |
# See the porting guide on how to update your playbook to
# use ns.col.another_plugin instead.
# removed_plugin_name:
# tombstone:
# removal_version: "2.0.0"
# warning_text: |
# See the porting guide on how to update your playbook to
# use ns.col.another_plugin instead.
# become:
# cache:
# callback:
# cliconf:
# connection:
# doc_fragments:
# filter:
# httpapi:
# inventory:
# lookup:
# module_utils:
# modules:
# netconf:
# shell:
# strategy:
# terminal:
# test:
# vars:

# Python import statements that Ansible needs to load from another location
# import_redirection:
# ansible_collections.ns.col.plugins.module_utils.old_location:
# redirect: ansible_collections.ns.col.plugins.module_utils.new_location

# Groups of actions/modules that take a common set of options
# action_groups:
# group_name:
# - module1
# - module2

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came from the ansible project template, I've deleted it

Comment on lines 1 to 42
# oc project default (this test assumes we are using the default ns)
# oc new-app --image=nginx && oc expose svc/nginx
# oc get deployment,pod,route,svc
# curl -I --silent http://<hostname you get from the route>

# "Apply the following yml file:
# apiVersion: v1
# kind: PersistentVolumeClaim
# metadata:
# name: azurefile-csi
# spec:
# storageClassName: azurefile-csi
# accessModes:
# - ReadWriteOnce
# resources:
# requests:
# storage: 3Gi
# ---
# apiVersion: v1
# kind: PersistentVolumeClaim
# metadata:
# name: managed-csi
# spec:
# storageClassName: managed-csi
# accessModes:
# - ReadWriteOnce
# resources:
# requests:
# storage: 3Gi
# ---
# apiVersion: v1
# kind: PersistentVolumeClaim
# metadata:
# name: managed-csi-encrypted-cmk
# spec:
# storageClassName: managed-csi-encrypted-cmk
# accessModes:
# - ReadWriteOnce
# resources:
# requests:
# storage: 3Gi
# "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, and several similar comments were describing the manual steps from the smoketest spreadsheet that was getting automated.

# storage: 3Gi
# "

# TODO: Push the required images to the cluster's internal registry so it will work on disconnected clusters.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a TODO for later, or in this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later, it's not needed immediately

Comment on lines 36 to 50
# - name: create_aro_cluster | Download azaroext
# ansible.builtin.get_url:
# url: https://arosvc.blob.core.windows.net/azext/aro-{{ AZAROEXT_VERSION }}-py2.py3-none-any.whl
# dest: /tmp/aro-{{ AZAROEXT_VERSION }}-py2.py3-none-any.whl
# mode: "0644"
# delegate: "{{ delegation }}"
# register: download_aroazext
# - name: create_aro_cluster | Install azaroext
# when: download_aroazext is changed
# ansible.builtin.command:
# argv:
# - az
# - extension
# - add
# - -s="/tmp/aro-{{ AZAROEXT_VERSION }}-py2.py3-none-any.whl"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

# - add
# - -s="/tmp/aro-{{ AZAROEXT_VERSION }}-py2.py3-none-any.whl"

# TODO: Create clusters with azure_rm_openshiftmanagedcluster if possible

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for this PR, or later?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later; the ansible module azure_rm_openshiftmanagedcluster isn't at parity with our aro cli, we should work with the Azure Ansible team to get it updated.

Comment on lines 99 to 100
# private_link_service_network_policies:
# "{% if outbound_type is defined and outbound_type == 'UserDefinedRouting' %}Disabled{% else %}{{ omit }}{% endif %}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Comment on lines 57 to 59
# azure.azcollection.azure_rm_adserviceprincipal:
# app_id: "{{ aro_ad_sp_info.id }}"
# state: absent

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a todo for later

Comment on lines 71 to 73
# azure.azcollection.azure_rm_adapplication:
# app_id: "{{ aro_ad_sp_info.appId }}"
# state: absent

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a todo for later

Comment on lines +2 to +6
# - name: upgrade_cluster | Begin upgrade process
# loop: "{{ upgrade }}"
# when: upgrade is defined
# ansible.builtin.include_tasks:
# file: ../../tasks/upgrade_cluster.yaml

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is documentation on how to include this particular task file.

endif

NO_CACHE ?= true
PODMAN_REMOTE_ARGS ?=

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this populated?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are optional environment variables, Loki's RP-in-containers efforts made use of them so I included them here for completeness

…isn't supported by the Azure Ansible module at this time
@ventifus ventifus force-pushed the ansible-cluster-automation branch from c1328e7 to 83aa5f6 Compare October 30, 2024 16:42
@ventifus ventifus force-pushed the ansible-cluster-automation branch from fd7c762 to 4af9d52 Compare October 30, 2024 17:29
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2024
@s-fairchild
Copy link

/retest

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 13, 2024
@ventifus ventifus force-pushed the ansible-cluster-automation branch 5 times, most recently from 58273c0 to 7880c72 Compare November 14, 2024 21:27
@ventifus ventifus force-pushed the ansible-cluster-automation branch from 7880c72 to ae9af46 Compare November 16, 2024 00:51
@ventifus
Copy link
Author

Added code in the latest commit for https://issues.redhat.com/browse/ARO-12055, enabling MIWI support and local RP support via rp_mode=development

Copy link

@shubhadapaithankar shubhadapaithankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2024
Copy link

openshift-ci bot commented Dec 12, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: s-fairchild, shubhadapaithankar, tsatam, ventifus
Once this PR has been reviewed and has the lgtm label, please ask for approval from sudobrendan. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 21, 2024
Copy link

openshift-ci bot commented Dec 21, 2024

New changes are detected. LGTM label has been removed.

Copy link

openshift-ci bot commented Dec 21, 2024

@ventifus: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e ad10a6a link false /test e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants