Bosse writes.
Technical · Howto

Automating installation of Red Hat OpenShift 4 on vSphere

March 4, 2023 6 min read By Bosse Klykken
My write-up of a howto for installing OCP on VMware. I'd avoid the whole exercise unless you have to.

Prerequisites

  • Infrastructure set up and available
    • VLANs
    • Load balancers (API + apps)
    • Remote access (to bastion/jumphost)
    • Firewalls
  • VMware vSphere set up and ready
  • Bastion host installed and configured

Download auth file from Red Hat and modify it to disable telemetry

🎯 BASTION

Log on to https://cloud.redhat.com with a Red Hat account with the proper subscriptions attached, or get it from someone with such access. The auth file is a JSON file containing tokens that provide access to the vendor registries.

The JSON file contains an entry for cloud.openshift.com. This needs to be removed in order to disable OpenShift’s telemetry option, where an unacceptable amount of data about the cluster is shared with the vendor.

jq -c 'del(.["auths"]["cloud.openshift.com"])' authfile.json > new-authfile.json

The cluster type to be created is user-provisioned infrastructure (UPI).

Download openshift-installer and openshift client (oc)

🎯 BASTION

Log on to https://cloud.redhat.com with a Red Hat account with the proper subscriptions attached, or get it from someone with such access in order to get updated download URLs.
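As a sketch, both binaries can also be fetched from the public mirror; the URLs below assume OpenShift 4.9 and a Linux bastion, so verify them against the download links from the portal:

```shell
# Download installer and client tarballs (version path is an assumption)
curl -LO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.9/openshift-install-linux.tar.gz
curl -LO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.9/openshift-client-linux.tar.gz

# Extract and place the binaries on the PATH
tar -xzf openshift-install-linux.tar.gz openshift-install
tar -xzf openshift-client-linux.tar.gz oc kubectl
sudo mv openshift-install oc kubectl /usr/local/bin/
```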

Download Red Hat Core OS OVA template

🎯 BASTION

Fetch the latest version (https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/) of the OVA template. This template must be added to the vCenter Content Library. The OVA template is then used to create a VM, which is used as the source when creating new VMs later in these instructions. After cloning it to a VM, mark the VM as a template.

The template should be placed in vc01.example.com > ocp > templates. Name it accordingly, e.g.: rhcos-4.9

Do not start the VM after it is generated from the OVA template. Doing so will taint the disk, and it cannot be used as a source for new VMs. This is why you mark the VM as a template after it is cloned from the OVA template.
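If you prefer the CLI over the vCenter GUI, the content library steps can be scripted with VMware's govc tool; a sketch, where the library name, OVA filename and the GOVC_* environment variables are assumptions for your environment:

```shell
# Assumes GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD are exported
# and point at vc01.example.com
govc library.create ocp                                # create a content library named "ocp"
govc library.import ocp rhcos-4.9-vmware.x86_64.ova    # upload the OVA into it
```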

Generate an ED25519 SSH key

🎯 BASTION

ssh-keygen -t ed25519 -f <destination>

Create the install-config.yaml file

🎯 BASTION

Fill out the install-config.yaml with required information

  • The file must be named install-config.yaml
  • Be aware that this file will be deleted by the create manifests operation later, so keep a copy around
  • Review the IP ranges, as these cannot be changed after installation. They must not be in use elsewhere in the organization (other than in other Kubernetes clusters)
  • Verify that the noProxy IPs include the LB VIPs and (importantly) the vCenter IP
  • Verify that the noProxy IPs include 169.254.0.0/16
  • networkType can be either OpenShiftSDN or OVNKubernetes, but there have been varying issues with installing with OVN. It will be the default SDN in OpenShift 4.8, so a migration path is already documented by the vendor, but due to the installation issues, consider avoiding it for now.
  • PS: noProxy auto-adds several entries automagically: node, cluster and service CIDRs, localhost, the internal API URL (e.g. api-int.ocp.example.com), .cluster.local and .svc. (Source: link1 link2)
apiVersion: v1
baseDomain: lab.example.com
proxy:
  httpProxy: http://proxy.example.com:3128
  httpsProxy: http://proxy.example.com:3128
  noProxy: <bastion>,.example.com,10.,192.168.,10.20.30.40,10.20.30.0/24
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp1
networking:
  clusterNetwork:
  - cidr: 10.229.0.0/16
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  vsphere:
    vcenter: address.to.vcenter.api
    username: username@vsphere.local
    password: password
    datacenter: DCxx
    defaultDatastore: datastorename
fips: false
pullSecret: '<auth json from above>'
sshKey: '<ssh pubkey from above>'

Create manifests

🎯 BASTION

Create a manifest directory and copy the install-config.yaml into it, then generate the manifests:

mkdir /path/to/manifest
cp install-config.yaml /path/to/manifest
openshift-install create manifests --dir=/path/to/manifest

Remove master Machine and worker MachineSet files

Delete the master Machine and worker MachineSet files. These are generated because the installer assumes that we’re using installer-provisioned infrastructure (IPI), while we’re actually using user-provisioned infrastructure (UPI).

cd /path/to/manifest
rm openshift/99_openshift-cluster-api_master-machines-* openshift/99_openshift-cluster-api_worker-machineset-0.yaml

Modify master schedulability

Modify the cluster-scheduler-02-config.yml file so that the masters won’t schedule normal workloads when the cluster starts up.

cd /path/to/manifest
sed -i 's,mastersSchedulable: true,mastersSchedulable: false,g' manifests/cluster-scheduler-02-config.yml

Add NTP configuration as MachineConfig

🎯 BASTION

The NTP configuration needs to be applied as MachineConfigs, with the contents of /etc/chrony.conf as a base64 string. The same base64 string can be applied to both files.

echo """pool 10.123.30.25
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony""" | base64 -w0

cG9vbCAxMC4xMjMuMzAuMjUKZHJpZnRmaWxlIC92YXIvbGliL2Nocm9ueS9kcmlmdAptYWtlc3RlcCAxLjAgMwpydGNzeW5jCmxvZ2RpciAvdmFyL2xvZy9jaHJvbnkK

Place the following files under: /path/to/manifest/openshift

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-masters-chrony-configuration
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 3.2.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,cG9vbCAxMC4xMjMuMzAuMjUKZHJpZnRmaWxlIC92YXIvbGliL2Nocm9ueS9kcmlmdAptYWtlc3RlcCAxLjAgMwpydGNzeW5jCmxvZ2RpciAvdmFyL2xvZy9jaHJvbnk=
        mode: 420
        overwrite: true
        path: /etc/chrony.conf
  osImageURL: ""
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-workers-chrony-configuration
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 3.2.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,cG9vbCAxMC4xMjMuMzAuMjUKZHJpZnRmaWxlIC92YXIvbGliL2Nocm9ueS9kcmlmdAptYWtlc3RlcCAxLjAgMwpydGNzeW5jCmxvZ2RpciAvdmFyL2xvZy9jaHJvbnk=
        mode: 420
        overwrite: true
        path: /etc/chrony.conf
  osImageURL: ""

Create ignition files

🎯 BASTION

The ignition files are what we will provide to the nodes to allow them to configure themselves as part of the bootstrapping process. This will also generate authentication files for the cluster.
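The generation step can be sketched with the installer, using the same --dir convention as the manifest creation above (note that this consumes the manifest files in the directory):

```shell
openshift-install create ignition-configs --dir=/path/to/manifest
```

This produces bootstrap.ign, master.ign and worker.ign, plus the auth/ directory containing the kubeconfig and the kubeadmin password.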

Configure the cluster identifier in vSphere directory

🎯 VCENTER

When the ignition files are created, a unique identifier is generated in the metadata.json file. This identifier is used to name a directory in vSphere and is referenced in the common config in the Ansible playbook.

jq .infraID /path/to/manifests/metadata.json

"ocp1-f9lkm"

Convert the ignition files to base64

🎯 BASTION

The ignition files need to be applied as base64-encoded variables in vCenter on the VMs cloned from the Red Hat CoreOS template.

cd /path/to/manifests
for foo in master worker bootstrap; do base64 -w0 ${foo}.ign > ${foo}.64; done

Create DNS A and PTR records for all required nodes and load balancer VIPs

🎯 NAMESERVER

In this implementation, DNS is handled by a private DNS server running dnsmasq. It can also be handled by the organization's regular DNS if that is feasible.

In /etc/dnsmasq.conf on the nameserver, every node and VIP has an address and a ptr-record entry.

local=/.ocp2.lab.example.com/
local=/.30.20.10.in-addr.arpa./
listen-address=127.0.0.1,10.20.30.40
...
address=/worker6.ocp2.lab.example.com/10.20.30.10
ptr-record=10.30.20.10.in-addr.arpa.,"worker6.ocp2.lab.example.com"
...
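With many nodes, writing forward and PTR entries by hand is error-prone. A minimal sketch that generates both from a list of "name ip" pairs, assuming the ocp2.lab.example.com domain from the example above:

```shell
# Generate dnsmasq address= and ptr-record= lines from "name ip" pairs
while read -r name ip; do
  # forward entry
  echo "address=/${name}.ocp2.lab.example.com/${ip}"
  # PTR records use the IP octets in reverse order
  rev_ip=$(echo "$ip" | awk -F. '{print $4"."$3"."$2"."$1}')
  echo "ptr-record=${rev_ip}.in-addr.arpa.,\"${name}.ocp2.lab.example.com\""
done <<'EOF'
worker6 10.20.30.10
EOF
```

Append the output to /etc/dnsmasq.conf and restart dnsmasq.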

Create Ansible playbook for provisioning nodes

🎯 BASTION

Create an ansible subdirectory and create the playbook file playbook.yaml in it.

- hosts: localhost
  vars_files:
    - vars/bootstrap.yaml
    - vars/master.yaml
    - vars/worker.yaml
  vars:
    vsphere:
      vcenter_hostname: vcenter.address
      vcenter_username: username@vsphere.local
      vcenter_password: password
      datacenter_name: DCXX
      cluster_name: vcenter-clustername
      target_folder: 'ocp1-xxxxx'
      template_name: 'rhcos-4.9'
  tasks:
    - name: 'Create a virtual machine from the template'
      vmware_guest:
        hostname: '{{ vsphere.vcenter_hostname }}'
        username: '{{ vsphere.vcenter_username }}'
        password: '{{ vsphere.vcenter_password }}'
        datacenter: '{{ vsphere.datacenter_name }}'
        cluster: '{{ vsphere.cluster_name }}'
        folder: '{{ vsphere.target_folder }}'
        name: '{{ item.name }}'
        template: '{{ vsphere.template_name }}'
        advanced_settings:
          - key: guestinfo.ignition.config.data.encoding
            value: base64
          - key: guestinfo.ignition.config.data
            value: '{{ item.ignite }}'
          - key: disk.EnableUUID
            value: 'TRUE'
          - key: 'guestinfo.afterburn.initrd.network-kargs'
            value: 'ip={{ item.ip }}::{{ item.gw }}:{{ item.mask }}:::none nameserver=<nameserver ip>'
        disk:
          - size_gb: "{{ item.disk }}"
            type: thin
            datastore: datastore01
        hardware:
          memory_mb: "{{ item.mem }}"
          num_cpus: "{{ item.cpu }}"
          scsi: paravirtual
          hotadd_cpu: False
          hotremove_cpu: False
          hotadd_memory: False
          version: 19
        networks:
          - name: '{{ item.network }}'
            start_connected: yes
            type: static
        validate_certs: False
        #state: poweredon
        state: present
        #state: absent
      with_items:
        - { name: bootstrap.ocp2.domain.name, ip: 10.20.6.102, gw: 10.20.6.97, mask: 255.255.255.240, disk: 120, mem: 65536, cpu: 16, network: 'VLAN-1234-XXXX-MASTER2', ignite: "{{ b_ignite }}" }
        - { name: master0.ocp2.domain.name, ip: 10.20.6.98, gw: 10.20.6.97, mask: 255.255.255.240, disk: 120, mem: 16384, cpu: 8, network: 'VLAN-1234-XXXX-MASTER2', ignite: "{{ m_ignite }}" }
        - { name: master1.ocp2.domain.name, ip: 10.20.6.99, gw: 10.20.6.97, mask: 255.255.255.240, disk: 120, mem: 16384, cpu: 8, network: 'VLAN-1234-XXXX-MASTER2', ignite: "{{ m_ignite }}" }
        - { name: master2.ocp2.domain.name, ip: 10.20.6.100, gw: 10.20.6.97, mask: 255.255.255.240, disk: 120, mem: 16384, cpu: 8, network: 'VLAN-1234-XXXX-MASTER2', ignite: "{{ m_ignite }}" }
        - { name: infra0.ocp2.domain.name, ip: 10.20.6.66, gw: 10.20.6.65, mask: 255.255.255.240, disk: 64, mem: 65536, cpu: 16, network: 'VLAN-1111-XXXX-INFRA2', ignite: "{{ w_ignite }}" }
        - { name: infra1.ocp2.domain.name, ip: 10.20.6.67, gw: 10.20.6.65, mask: 255.255.255.240, disk: 64, mem: 65536, cpu: 16, network: 'VLAN-1111-XXXX-INFRA2', ignite: "{{ w_ignite }}" }
        - { name: worker0.ocp2.domain.name, ip: 10.20.7.4, gw: 10.20.7.1, mask: 255.255.255.0, disk: 64, mem: 65536, cpu: 16, network: 'VLAN-2222-XXXX-WORKER2', ignite: "{{ w_ignite }}" }
        - { name: worker1.ocp2.domain.name, ip: 10.20.7.5, gw: 10.20.7.1, mask: 255.255.255.0, disk: 64, mem: 65536, cpu: 16, network: 'VLAN-2222-XXXX-WORKER2', ignite: "{{ w_ignite }}" }

Create Ansible variable files based on the base64 ignite configs

🎯 BASTION

These ignition configs will be available to Ansible, depending on the role defined by the ignite parameter in the with_items list above.

echo "b_ignite: '$(cat /path/to/manifests/bootstrap.64)'" > /path/to/ansible/vars/bootstrap.yaml
echo "m_ignite: '$(cat /path/to/manifests/master.64)'" > /path/to/ansible/vars/master.yaml
echo "w_ignite: '$(cat /path/to/manifests/worker.64)'" > /path/to/ansible/vars/worker.yaml

Install required Python prerequisites for vSphere integration in Ansible

🎯 BASTION

yum install python3
pip3 install pyvmomi

Run playbook to create nodes

🎯 BASTION

cd /path/to/ansible
ansible-playbook playbook.yaml

Start bootstrap and master node and verify that it has started up properly

🎯 VCENTER LOADBALANCER

First, start the bootstrap node in vCenter, and go into the VM console to monitor the boot process. If it arrives at the login prompt with correct IP address specified, you can assume it has booted properly. Verify on the load balancer that it has entered the pool. Only then, start the master nodes in vCenter and go into the VM console to monitor the boot process in the same way as with the bootstrap node.

Wait for the bootstrap process to complete

🎯 BASTION

Use the openshift-install utility to get feedback when the cluster has been bootstrapped:

openshift-install wait-for bootstrap-complete --dir=/path/to/manifests

When the bootstrap is finished, set the KUBECONFIG environment variable to point to the generated admin kubeconfig:

export KUBECONFIG=/path/to/manifests/auth/kubeconfig

Run oc get nodes to verify that the API can be accessed and that the masters are in a “Ready” state.

Stop and delete the bootstrap node

🎯 VCENTER

Right-click the bootstrap node in vCenter and forcibly stop it, then delete it. It is no longer needed.

Start worker/infra nodes and authorize them to join the cluster

🎯 VCENTER BASTION

Start the nodes in vCenter, then check for CSRs that need to be approved

oc get csr

When there are certificates pending, approve them all with

oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

There are several certificates that need to be approved for each node. Rerun the approve command above to approve them all.

Verify that the nodes eventually join the cluster and go into a “Ready” state (or new certificates pop in) with:

watch -n1 'oc get nodes; oc get csr'

Label worker nodes to allow them to handle egress traffic

🎯 BASTION

When using EgressIP, we need a set of nodes to handle routing of traffic out of the cluster. All outgoing traffic must go through the worker nodes.

The following command will label all worker nodes as able to host egress traffic.

oc label nodes -l node-role.kubernetes.io/worker="" k8s.ovn.org/egress-assignable=""

Configure build proxy settings

🎯 BASTION

By default, the build containers in OpenShift do not use proxy settings, which will in most cases make the build process fail, as they have no route to external resources/dependencies.

Update the cluster-wide build.config object to use the same proxy settings set on the cluster level. Restrict builds to worker nodes.

OCP_HTTP_PROXY=$(oc get proxy cluster -o json | jq '.spec.httpProxy')
OCP_HTTPS_PROXY=$(oc get proxy cluster -o json | jq '.spec.httpsProxy')
OCP_NO_PROXY=$(oc get proxy cluster -o json | jq '.status.noProxy')

oc patch build.config cluster --type merge --patch "
{\"spec\":{
    \"buildDefaults\": {
        \"defaultProxy\": {
            \"httpProxy\": $OCP_HTTP_PROXY ,
            \"httpsProxy\": $OCP_HTTPS_PROXY ,
            \"noProxy\": $OCP_NO_PROXY
            }
        },
        \"buildOverrides\": {
            \"nodeSelector\": {
                \"node-role.kubernetes.io/worker\": \"\"
            }
        }
    }}"

Note that noProxy uses the status entry, not the spec entry. The reason is that OpenShift adds platform-known CIDRs and hostnames to the status.

Log in to the cluster web GUI with the kubeadmin password

🎯 BROWSER

Find the URL with

oc get route -n openshift-console

Log in with user kubeadmin. Password is in the /path/to/manifests/auth/kubeadmin-password file.

Check cluster operator status

🎯 BROWSER

Go to Administration > Cluster Settings > ClusterOperators.

Sort list by status and monitor operators not in “Available” status.

Configure authentication

🎯 BROWSER BASTION

Go to Administration > Cluster settings > Global configuration and select OAuth.

Define an authentication provider based on requirements (OIDC, LDAP etc.). Temporary access can be provided through the htpasswd provider.

htpasswd

Install httpd-tools on the bastion, and create an htpasswd file with credentials

yum -y install httpd-tools
htpasswd -c -B /path/to/htpasswd username

You can then reference this file in the authentication provider mentioned above. If the htpasswd provider is already defined and you need to edit it (add/modify/delete users), you need to update the htpasswd secret in the openshift-config namespace.
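A common pattern for updating that secret in place (the secret name htpass-secret is an assumption; use whatever name the OAuth config actually references):

```shell
# Regenerate the secret from the updated htpasswd file and replace it
oc create secret generic htpass-secret \
  --from-file=htpasswd=/path/to/htpasswd \
  --dry-run=client -o yaml -n openshift-config \
  | oc replace -f -
```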

LDAP

For manual configuration, use the GUI through Administration > Cluster Settings > Global Configuration > OAuth and add an LDAP provider, or add an LDAP entry to spec.identityProviders of the OAuth.config.openshift.io object

apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
    - ldap:
        attributes:
          email:
            - mail
          id:
            - sAMAccountName
          name:
            - cn
          preferredUsername:
            - sAMAccountName
        bindDN: '...'
        bindPassword:
          name: ldap-bind-password-kxp5s
        ca:
          name: ldap-ca-kjxnk
        insecure: false
        url: 'ldaps://ldap.example.com:636/OU=UsersOU,DC=dc1,DC=example,DC=com?sAMAccountName?sub'
      mappingMethod: claim
      name: LDAP
      type: LDAP
  • The bindPassword secret is in the openshift-config namespace and has the key bindPassword
  • The LDAP CA configmap is in the openshift-config namespace and has the key ca.crt
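Both objects can be created from the bastion; a sketch, where the generated-looking names above (with random suffixes), the bind password and the CA path are placeholders for your environment:

```shell
# Secret holding the LDAP bind password, key must be bindPassword
oc create secret generic ldap-bind-password-kxp5s \
  --from-literal=bindPassword='<bind password>' -n openshift-config

# ConfigMap holding the LDAP CA certificate, key must be ca.crt
oc create configmap ldap-ca-kjxnk \
  --from-file=ca.crt=/path/to/ldap-ca.crt -n openshift-config
```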

Remove the kubeadmin user

🎯 BASTION

When one or more authentication providers have been configured, the kubeadmin access can be removed

oc delete secrets kubeadmin -n kube-system

Docs: https://docs.openshift.com/container-platform/4.8/authentication/remove-kubeadmin.html

Verify that the installer has configured vSphere storage properly

🎯 BROWSER

  • Go to Storage > PersistentVolumeClaims > Create PersistentVolumeClaim
  • Create a 1GB or so test volume
  • Verify that it gets provisioned (bound)
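The same check can be done from the bastion with oc; a sketch, where the PVC name, namespace and size are arbitrary:

```shell
# Create a small test PVC and watch it bind
echo 'apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: storage-test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi' | oc create -n default -f -

oc get pvc storage-test -n default

# Clean up afterwards
oc delete pvc storage-test -n default
```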

Enable the internal image registry

🎯 BASTION

When installing UPI on platforms without shared (ReadWriteMany) storage available, the internal registry will be removed. After shared storage capabilities have been added to the cluster, the registry can be re-enabled if desired. If external registries like Harbor are in use as part of a strengthened container supply chain, the internal registry may be omitted.

echo 'apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: image-registry-storage
  namespace: openshift-image-registry
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 250Gi
  storageClassName: nfs
  volumeMode: Filesystem' | oc create -f -

oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed","storage":{"pvc":{"claim":"image-registry-storage"}}}}'

Optional: Disable IPv6 platform wide

🎯 BASTION

If IPv6 is not desired on the nodes, it must be disabled explicitly by modifying the CoreOS kernel parameters. This is done through MachineConfig definitions.

echo '---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-openshift-machineconfig-worker-kargs
spec:
  kernelArguments:
    - ipv6.disable=1
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-openshift-machineconfig-master-kargs
spec:
  kernelArguments:
    - ipv6.disable=1
' | oc apply -f -