Automating installation of Red Hat OpenShift 4 on vSphere
Prerequisites
- Infrastructure set up and available
- VLANs
- Load balancers (API + apps)
- Remote access (to bastion/jumphost)
- Firewalls
- VMware vSphere set up and ready
- Bastion host installed and configured
Download auth file from Red Hat and modify it to disable telemetry
🎯 BASTION
Log on to https://cloud.redhat.com with a Red Hat account that has the proper subscriptions attached, or get the file from someone with such access. The auth file is a JSON file containing tokens that provide access to the vendor registries.
The JSON file contains an entry for cloud.openshift.com. This entry needs to be removed in order to disable OpenShift’s telemetry option, where an unacceptable amount of data about the cluster is shared with the vendor.
jq -c 'del(.["auths"]["cloud.openshift.com"])' authfile.json > new-authfile.json
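Before using the new file, it can be worth confirming that the telemetry entry is really gone while the other registries are still present. The following is a minimal sketch, not part of the vendor tooling: the function name has_registry_auth is hypothetical, and the check is a plain text match on the quoted key rather than a full JSON parse.

```shell
# Return 0 if the pull secret file mentions the given registry as a quoted key.
# Hypothetical helper: a fixed-string match, not a JSON parser.
has_registry_auth() {
    # $1 = pull secret file, $2 = registry hostname
    grep -qF "\"$2\"" "$1"
}

# Example usage on the bastion (file name taken from the jq command above):
# if has_registry_auth new-authfile.json cloud.openshift.com; then
#     echo "telemetry entry still present" >&2
# fi
```

If the check still finds cloud.openshift.com, the jq command was probably run against the wrong file or its output was not saved.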
The cluster type to be created is bare metal user-provisioned infrastructure (UPI).
Download openshift-installer and openshift client (oc)
🎯 BASTION
Log on to https://cloud.redhat.com with a Red Hat account that has the proper subscriptions attached to get updated download URLs, or ask someone with such access for them.
- https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-install-linux.tar.gz
- https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
Download Red Hat Core OS OVA template
🎯 BASTION
Fetch the latest version (https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/) of the OVA template. This template must be added to the vCenter Content Library. The OVA template is then used to create a VM, which is used as the source when creating new VMs later in these instructions. After cloning it to a VM, mark the VM as a template.
The template should be placed in vc01.example.com > ocp > templates. Name it accordingly, e.g.: rhcos-4.9
Do not start the VM after it is generated from the OVA template. Starting it will taint the disk, and it can then no longer be used as a source for new VMs. This is why you mark the VM as a template right after it is cloned from the OVA template.
Generate an ED25519 SSH key
🎯 BASTION
ssh-keygen -t ed25519 -f <destination>
Create the install-config.yaml file
🎯 BASTION
Fill out the install-config.yaml with required information
- The file must be named install-config.yaml
- Be aware that this file will be deleted by the create manifests operation later, so keep a copy around
- Review the IP ranges, as these cannot be changed after installation. They must not be in use elsewhere in the organization (other than by other Kubernetes clusters)
- Verify that the noProxy IPs include the LB VIPs and (importantly) the vCenter IP
- Verify that the noProxy IPs include 169.254.0.0/16
- networkType can be either OpenShiftSDN or OVNKubernetes, but there have been various issues with installing with OVN. It is expected to become the default SDN in a later OpenShift release, so a migration path is already documented by the vendor, but due to the installation issues, avoid installing it for now.
- PS: noProxy automatically adds several entries: node, cluster and service CIDRs, localhost, the internal API URL (e.g. api-int.ocp.example.com), .cluster.local and .svc. (Source: link1 link2)
apiVersion: v1
baseDomain: lab.example.com
proxy:
  httpProxy: http://proxy.example.com:3128
  httpsProxy: http://proxy.example.com:3128
  noProxy: <bastion>,.example.com,10.,192.168.,10.20.30.40,10.20.30.0/24
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp1
networking:
  clusterNetwork:
  - cidr: 10.229.0.0/16
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  vsphere:
    vcenter: address.to.vcenter.api
    username: username@vsphere.local
    password: password
    datacenter: DCxx
    defaultDatastore: datastorename
fips: false
pullSecret: '<auth json from above>'
sshKey: '<ssh pubkey from above>'
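The noProxy checks in the list above are easy to get wrong by eye, so they can be scripted. The following is a minimal sketch under a few assumptions: the helper name missing_no_proxy is hypothetical, and it expects noProxy to be a single comma-separated line as in the example file.

```shell
# Print every required entry that is missing from the noProxy line of an
# install-config.yaml. Exit status is 0 only when nothing is missing.
# Hypothetical helper; assumes noProxy is one comma-separated line.
missing_no_proxy() (
    file=$1; shift
    # Extract the value after "noProxy:" and wrap it in commas for exact matching.
    current=",$(sed -n 's/^[[:space:]]*noProxy:[[:space:]]*//p' "$file" | tr -d '[:space:]'),"
    rc=0
    for entry in "$@"; do
        case "$current" in
            *",$entry,"*) ;;
            *) echo "$entry"; rc=1 ;;
        esac
    done
    exit $rc
)

# Example usage (the addresses are placeholders for the LB VIPs and vCenter IP):
# missing_no_proxy install-config.yaml 169.254.0.0/16 10.20.30.40
```

Entries are compared literally, so a VIP that is only covered by a listed CIDR is still reported as missing.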
Create manifests
🎯 BASTION
Copy install-config.yaml into the manifest directory, then create the manifests:
cp install-config.yaml /path/to/manifest
openshift-install create manifests --dir=/path/to/manifest
Remove master Machine and worker MachineSet files
Delete the master Machine and worker MachineSet files - these are generated because the installer assumes that we’re using installer-provisioned infrastructure (IPI) while we’re actually using user-provisioned infrastructure (UPI)
cd /path/to/manifest
rm openshift/99_openshift-cluster-api_master-machines-* openshift/99_openshift-cluster-api_worker-machineset-0.yaml
Modify master schedulability
Modify the cluster-scheduler-02-config.yml file so that the masters won’t schedule normal workloads when the cluster starts up.
cd /path/to/manifest
sed -i 's,mastersSchedulable: true,mastersSchedulable: false,g' manifests/cluster-scheduler-02-config.yml
Add NTP configuration as MachineConfig
🎯 BASTION
The NTP configuration needs to be applied as MachineConfigs, with the /etc/chrony.conf as a base64 string. The same base64-string can be applied to both files.
echo """pool 10.123.30.25
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony""" | base64 -w0
cG9vbCAxMC4xMjMuMzAuMjUKZHJpZnRmaWxlIC92YXIvbGliL2Nocm9ueS9kcmlmdAptYWtlc3RlcCAxLjAgMwpydGNzeW5jCmxvZ2RpciAvdmFyL2xvZy9jaHJvbnkK
Place the following files under: /path/to/manifest/openshift
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-masters-chrony-configuration
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 3.2.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,cG9vbCAxMC4xMjMuMzAuMjUKZHJpZnRmaWxlIC92YXIvbGliL2Nocm9ueS9kcmlmdAptYWtlc3RlcCAxLjAgMwpydGNzeW5jCmxvZ2RpciAvdmFyL2xvZy9jaHJvbnkK
        mode: 420
        overwrite: true
        path: /etc/chrony.conf
  osImageURL: ""

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-workers-chrony-configuration
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 3.2.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,cG9vbCAxMC4xMjMuMzAuMjUKZHJpZnRmaWxlIC92YXIvbGliL2Nocm9ueS9kcmlmdAptYWtlc3RlcCAxLjAgMwpydGNzeW5jCmxvZ2RpciAvdmFyL2xvZy9jaHJvbnkK
        mode: 420
        overwrite: true
        path: /etc/chrony.conf
  osImageURL: ""
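Since the two files differ only in the role, they can be generated from a single chrony.conf instead of being edited by hand. The following is a rough sketch under stated assumptions: the helper name and the output file names are hypothetical, and the empty Ignition fields from the manifests above are left out for brevity, so add them back if your version requires them.

```shell
# Write one chrony MachineConfig per role from a single chrony.conf.
# Hypothetical helper: output file names are an assumption; object names
# follow the manifests above, empty Ignition fields are omitted.
write_chrony_machineconfigs() (
    conf=$1      # path to the chrony.conf to embed
    outdir=$2    # e.g. /path/to/manifest/openshift
    b64=$(base64 -w0 "$conf")
    for role in master worker; do
        cat > "$outdir/99-${role}s-chrony-configuration.yaml" <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${role}
  name: 99-${role}s-chrony-configuration
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,${b64}
        mode: 420
        overwrite: true
        path: /etc/chrony.conf
EOF
    done
)

# Example usage:
# write_chrony_machineconfigs ./chrony.conf /path/to/manifest/openshift
```

Generating both files from one source keeps the master and worker NTP settings from drifting apart.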
Create ignition files
🎯 BASTION
The ignition files are what we will provide to the nodes to allow them to configure themselves as part of the bootstrapping process. This will also generate authentication files for the cluster.
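The ignition files are generated with the create ignition-configs subcommand of openshift-install, run against the same directory that holds the manifests. A minimal sketch, with hypothetical helper names, that runs the command and then checks that the expected output files exist:

```shell
# Confirm that the files produced by "create ignition-configs" are present
# and non-empty in the given directory.
ignition_outputs_present() (
    dir=$1
    rc=0
    for f in bootstrap.ign master.ign worker.ign metadata.json \
             auth/kubeconfig auth/kubeadmin-password; do
        [ -s "$dir/$f" ] || { echo "missing: $dir/$f" >&2; rc=1; }
    done
    exit $rc
)

# Generate the ignition configs, then verify the result.
create_ignition_configs() {
    openshift-install create ignition-configs --dir="$1" && ignition_outputs_present "$1"
}

# Example usage:
# create_ignition_configs /path/to/manifest
```

The manifests in the directory are consumed by this step, so keep copies of anything you may want to inspect later.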
Configure the cluster identifier in vSphere directory
🎯 VCENTER
When the ignition files are created, a unique identifier is generated in the metadata.json file. This identifier must be used as the name of a folder in vSphere and as the target_folder value in the Ansible playbook.
jq .infraID /path/to/manifests/metadata.json
"ocp1-f9lkm"
Convert the ignition files to base64
🎯 BASTION
The ignition files need to be applied as base64-formatted variables in vCenter to the Red Hat CoreOS template.
cd /path/to/manifests
for role in master worker bootstrap; do base64 -w0 "${role}.ign" > "${role}.64"; done
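A truncated or mangled base64 file only shows up much later, as a node that fails to boot, so a quick round-trip check can save time. A minimal sketch, assuming the .ign and .64 files sit side by side in one directory; the function name is hypothetical:

```shell
# Check that each .64 file decodes back to the ignition file it came from.
verify_ignition_b64() (
    dir=$1
    rc=0
    for role in master worker bootstrap; do
        if base64 -d "$dir/$role.64" | cmp -s - "$dir/$role.ign"; then
            echo "$role: ok"
        else
            echo "$role: mismatch" >&2
            rc=1
        fi
    done
    exit $rc
)

# Example usage:
# verify_ignition_b64 /path/to/manifests
```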
Create DNS A and PTR records for all required nodes and load balancer VIPs
🎯 NAMESERVER
In this implementation, the DNS will be handled by a private DNS server with dnsmasq. It can also be handled by the generic DNS if that is feasible.
In /etc/dnsmasq.conf on the nameserver, every node and VIP has an address and a ptr-record entry.
local=/.ocp2.lab.example.com/
local=/.30.20.10.in-addr.arpa./
listen-address=127.0.0.1,10.20.30.40
...
address=/worker6.ocp2.lab.example.com/10.20.30.10
ptr-record=10.30.20.10.in-addr.arpa.,"worker6.ocp2.lab.example.com"
...
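Writing the reversed PTR names by hand is error-prone, so the address and ptr-record pair for each host can be generated. A small sketch, IPv4 only, with a hypothetical function name:

```shell
# Print the dnsmasq address and ptr-record lines for one host.
# Hypothetical helper; IPv4 only.
dnsmasq_records() (
    fqdn=$1
    ip=$2
    # Split the address into octets so they can be reversed for in-addr.arpa.
    IFS=. read -r a b c d <<EOF
$ip
EOF
    echo "address=/$fqdn/$ip"
    echo "ptr-record=$d.$c.$b.$a.in-addr.arpa.,\"$fqdn\""
)

# Example usage:
# dnsmasq_records worker6.ocp2.lab.example.com 10.20.30.10 >> /etc/dnsmasq.conf
```

For the worker6 host above, this prints the same two lines as shown in the dnsmasq.conf excerpt.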
Create Ansible playbook for provisioning nodes
🎯 BASTION
Create an ansible subdirectory and create the playbook file playbook.yaml in it.
- hosts: localhost
  vars_files:
    - vars/bootstrap.yaml
    - vars/master.yaml
    - vars/worker.yaml
  vars:
    vsphere:
      vcenter_hostname: vcenter.address
      vcenter_username: username@vsphere.local
      vcenter_password: password
      datacenter_name: DCXX
      cluster_name: vcenter-clustername
      target_folder: 'ocp1-xxxxx'
      template_name: 'rhcos-4.9'
  tasks:
    - name: 'Create a virtual machine from the template'
      vmware_guest:
        hostname: '{{ vsphere.vcenter_hostname }}'
        username: '{{ vsphere.vcenter_username }}'
        password: '{{ vsphere.vcenter_password }}'
        datacenter: '{{ vsphere.datacenter_name }}'
        cluster: '{{ vsphere.cluster_name }}'
        folder: '{{ vsphere.target_folder }}'
        name: '{{ item.name }}'
        template: '{{ vsphere.template_name }}'
        advanced_settings:
          - key: guestinfo.ignition.config.data.encoding
            value: base64
          - key: guestinfo.ignition.config.data
            value: '{{ item.ignite }}'
          - key: disk.EnableUUID
            value: 'TRUE'
          - key: 'guestinfo.afterburn.initrd.network-kargs'
            value: 'ip={{ item.ip }}::{{ item.gw }}:{{ item.mask }}:::none nameserver=<nameserver ip>'
        disk:
          - size_gb: "{{ item.disk }}"
            type: thin
            datastore: datastore01
        hardware:
          memory_mb: "{{ item.mem }}"
          num_cpus: "{{ item.cpu }}"
          scsi: paravirtual
          hotadd_cpu: False
          hotremove_cpu: False
          hotadd_memory: False
          version: 19
        networks:
          - name: '{{ item.network }}'
            start_connected: yes
            type: static
        validate_certs: False
        #state: poweredon
        state: present
        #state: absent
      with_items:
        - { name: bootstrap.ocp2.domain.name, ip: 10.20.6.102, gw: 10.20.6.97, mask: 255.255.255.240, disk: 120, mem: 65536, cpu: 16, network: 'VLAN-1234-XXXX-MASTER2', ignite: "{{ b_ignite }}" }
        - { name: master0.ocp2.domain.name, ip: 10.20.6.98, gw: 10.20.6.97, mask: 255.255.255.240, disk: 120, mem: 16384, cpu: 8, network: 'VLAN-1234-XXXX-MASTER2', ignite: "{{ m_ignite }}" }
        - { name: master1.ocp2.domain.name, ip: 10.20.6.99, gw: 10.20.6.97, mask: 255.255.255.240, disk: 120, mem: 16384, cpu: 8, network: 'VLAN-1234-XXXX-MASTER2', ignite: "{{ m_ignite }}" }
        - { name: master2.ocp2.domain.name, ip: 10.20.6.100, gw: 10.20.6.97, mask: 255.255.255.240, disk: 120, mem: 16384, cpu: 8, network: 'VLAN-1234-XXXX-MASTER2', ignite: "{{ m_ignite }}" }
        - { name: infra0.ocp2.domain.name, ip: 10.20.6.66, gw: 10.20.6.65, mask: 255.255.255.240, disk: 64, mem: 65536, cpu: 16, network: 'VLAN-1111-XXXX-INFRA2', ignite: "{{ w_ignite }}" }
        - { name: infra1.ocp2.domain.name, ip: 10.20.6.67, gw: 10.20.6.65, mask: 255.255.255.240, disk: 64, mem: 65536, cpu: 16, network: 'VLAN-1111-XXXX-INFRA2', ignite: "{{ w_ignite }}" }
        - { name: worker0.ocp2.domain.name, ip: 10.20.7.4, gw: 10.20.7.1, mask: 255.255.255.0, disk: 64, mem: 65536, cpu: 16, network: 'VLAN-2222-XXXX-WORKER2', ignite: "{{ w_ignite }}" }
        - { name: worker1.ocp2.domain.name, ip: 10.20.7.5, gw: 10.20.7.1, mask: 255.255.255.0, disk: 64, mem: 65536, cpu: 16, network: 'VLAN-2222-XXXX-WORKER2', ignite: "{{ w_ignite }}" }
Create Ansible variable files based on the base64 ignite configs
🎯 BASTION
These ignition configs will be available for Ansible, depending on the role defined by the ignite-parameter in the with_items list above.
echo "b_ignite: '$(cat /path/to/manifests/bootstrap.64)'" > /path/to/ansible/vars/bootstrap.yaml
echo "m_ignite: '$(cat /path/to/manifests/master.64)'" > /path/to/ansible/vars/master.yaml
echo "w_ignite: '$(cat /path/to/manifests/worker.64)'" > /path/to/ansible/vars/worker.yaml
Install required Python prerequisites for vSphere integration in Ansible
🎯 BASTION
yum install python3
pip3 install pyvmomi
Run playbook to create nodes
🎯 BASTION
cd /path/to/ansible
ansible-playbook playbook.yaml
Start the bootstrap and master nodes and verify that they have started up properly
🎯 VCENTER LOADBALANCER
First, start the bootstrap node in vCenter, and go into the VM console to monitor the boot process. If it arrives at the login prompt with correct IP address specified, you can assume it has booted properly. Verify on the load balancer that it has entered the pool. Only then, start the master nodes in vCenter and go into the VM console to monitor the boot process in the same way as with the bootstrap node.
Wait for the bootstrap process to complete
🎯 BASTION
Use the openshift-install utility to get feedback when the cluster has been bootstrapped.
openshift-install wait-for bootstrap-complete --dir=/path/to/manifests
When the bootstrap is finished, set the KUBECONFIG environment variable to point to the cluster admin kubeconfig:
export KUBECONFIG=/path/to/manifests/auth/kubeconfig
Run oc get nodes to verify that the API can be accessed and that the masters are in a “Ready” state.
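The Ready check can also be scripted, which is handy when waiting on several nodes. A minimal sketch with a hypothetical function name; it only looks at the STATUS column of the oc get nodes output:

```shell
# Read "oc get nodes --no-headers" output on stdin and print every node whose
# STATUS column is not exactly "Ready". Exit 0 only when all nodes are Ready.
not_ready_nodes() {
    awk '$2 != "Ready" { print $1; bad = 1 } END { exit bad }'
}

# Example usage:
# oc get nodes --no-headers | not_ready_nodes && echo "all nodes Ready"
```

Nodes with a combined status such as Ready,SchedulingDisabled are reported too, since the comparison is exact.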
Stop and delete the bootstrap node
🎯 VCENTER
Right click the bootstrap node in vCenter and forcibly stop it, then delete it. It is not needed anymore.
Start worker/infra nodes and authorize them to join the cluster
🎯 VCENTER BASTION
Start the nodes in vCenter, then check for certificate signing requests (CSRs) that need to be approved
oc get csr
When there are certificates pending, approve them all with
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
There are several certificates that need to be approved for each node, and they do not all appear at once. Rerun the approve command above until none are pending.
Watch the nodes join the cluster and reach the “Ready” state, and watch for new certificates that need approval, with:
watch -n1 'oc get nodes; oc get csr'
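Because new requests keep appearing while nodes join, the approval can be repeated in rounds instead of being rerun by hand. The following is a rough sketch, not a vendor-provided procedure: the function names, the default of six rounds and the 10-second pause are all arbitrary assumptions. It selects requests whose last column in the oc get csr output is Pending.

```shell
# Read "oc get csr --no-headers" output on stdin and print the names of
# requests whose last column (CONDITION) is "Pending".
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# Approve pending requests in several rounds, pausing between rounds so that
# follow-up requests from the same node have time to appear.
approve_csr_rounds() (
    rounds=${1:-6}       # number of rounds (arbitrary default)
    interval=${2:-10}    # seconds between rounds (arbitrary default)
    i=0
    while [ "$i" -lt "$rounds" ]; do
        oc get csr --no-headers | pending_csrs |
            xargs --no-run-if-empty oc adm certificate approve
        i=$((i + 1))
        sleep "$interval"
    done
)

# Example usage:
# approve_csr_rounds 6 10
```

Only run this while you are adding nodes you expect, since it approves every pending request without inspecting it.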
Label worker nodes to allow them to handle egress traffic
🎯 BASTION
When using EgressIP, we need a set of nodes to handle routing of traffic out from the cluster. All outgoing traffic must be from the worker nodes.
The following command will label all worker nodes as able to host egress traffic.
oc label nodes -l node-role.kubernetes.io/worker="" k8s.ovn.org/egress-assignable=""
Configure build proxy settings
🎯 BASTION
By default, the build containers in OpenShift do not use proxy settings, which will in most cases make the build process fail, as they don’t have a route to fetch external resources/dependencies.
Update the cluster-wide build.config object to use the same proxy settings set on the cluster level. Restrict builds to worker nodes.
OCP_HTTP_PROXY=$(oc get proxy cluster -o json | jq '.spec.httpProxy')
OCP_HTTPS_PROXY=$(oc get proxy cluster -o json | jq '.spec.httpsProxy')
OCP_NO_PROXY=$(oc get proxy cluster -o json | jq '.status.noProxy')
oc patch build.config cluster --type merge --patch "
{\"spec\":{
\"buildDefaults\": {
\"defaultProxy\": {
\"httpProxy\": $OCP_HTTP_PROXY ,
\"httpsProxy\": $OCP_HTTPS_PROXY ,
\"noProxy\": $OCP_NO_PROXY
}
},
\"buildOverrides\": {
\"nodeSelector\": {
\"node-role.kubernetes.io/worker\": \"\"
}
}
}}"
Note that NO_PROXY uses the status entry, not the spec entry. The reason is that OpenShift adds platform-known CIDRs and hostnames to the status entry.
Log in to the cluster web GUI with the kubeadmin password
🎯 BROWSER
Find the URL with
oc get route -n openshift-console
Log in with user kubeadmin. Password is in the /path/to/manifests/auth/kubeadmin-password file.
Check cluster operator status
🎯 BROWSER
Go to Administration > Cluster Settings > ClusterOperators.
Sort list by status and monitor operators not in “Available” status.
Configure authentication
🎯 BROWSER BASTION
Go to Administration > Cluster Settings > Global Configuration and select OAuth.
Define an identity provider based on requirements (OIDC, LDAP etc). Temporary access can be provided through the htpasswd provider.
htpasswd
Install httpd-tools on the bastion, and create an htpasswd file with credentials (-c creates the file; omit it when adding users to an existing file)
yum -y install httpd-tools
htpasswd -c -B /path/to/htpasswd username
You can upload this file to the htpasswd identity provider mentioned above. If the htpasswd provider is already defined and you need to edit it (add/modify/delete users), you need to update the htpasswd secret in the openshift-config namespace.
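Updating the htpasswd secret can be done from the bastion by regenerating the secret from the edited file and replacing the existing object. A minimal sketch: the function name and the default secret name htpass-secret are assumptions, so use the secret name that your htpasswd identity provider actually references.

```shell
# Replace the htpasswd secret in openshift-config with the current file.
# The secret name is an assumption: use the name referenced by the
# htpasswd identity provider in the OAuth configuration.
update_htpasswd_secret() {
    # $1 = htpasswd file, $2 = secret name (hypothetical default: htpass-secret)
    oc create secret generic "${2:-htpass-secret}" \
        --from-file=htpasswd="$1" \
        --dry-run=client -o yaml -n openshift-config | oc replace -f -
}

# Example usage:
# update_htpasswd_secret /path/to/htpasswd
```

The first oc command only renders the secret as YAML; nothing changes in the cluster until oc replace runs.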
LDAP
For manual configuration, use the GUI through Administration > Cluster Settings > Global Configuration > OAuth and add an LDAP provider, or add an LDAP entry to the spec.identityProviders list of the OAuth.config.openshift.io object named cluster.
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - ldap:
      attributes:
        email:
        - mail
        id:
        - sAMAccountName
        name:
        - cn
        preferredUsername:
        - sAMAccountName
      bindDN: '...'
      bindPassword:
        name: ldap-bind-password-kxp5s
      ca:
        name: ldap-ca-kjxnk
      insecure: false
      url: 'ldaps://ldap.example.com:636/OU=UsersOU,DC=dc1,DC=example,DC=com?sAMAccountName?sub'
    mappingMethod: claim
    name: LDAP
    type: LDAP
- The bindPassword secret is in the openshift-config namespace and has the key bindPassword
- The LDAP CA configmap is in the openshift-config namespace and has the key ca.crt
Remove the kubeadmin user
🎯 BASTION
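When one or more authentication providers have been configured, the kubeadmin access can be removed. Make sure that at least one user from those providers has the cluster-admin role first, otherwise administrative access to the cluster is lost.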
oc delete secrets kubeadmin -n kube-system
Docs: https://docs.openshift.com/container-platform/4.8/authentication/remove-kubeadmin.html
Verify that the installer has configured vSphere storage properly
🎯 BROWSER
- Go to Storage > PersistentVolumeClaims > Create PersistentVolumeClaim
- Create a 1GB or so test volume
- Verify that it gets provisioned (bound)
Enable the internal image registry
🎯 BASTION
When installing UPI on platforms without available shared (ReadWriteMany) storage, the internal registry will be removed. After shared storage capabilities have been added to the cluster, the registry can be re-enabled if desired. If external registries like Harbor are in use as part of a strengthened container supply chain, the internal registry may be omitted.
echo 'apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: image-registry-storage
  namespace: openshift-image-registry
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 250Gi
  storageClassName: nfs
  volumeMode: Filesystem' | oc create -f -
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed","storage":{"pvc":{"claim":"image-registry-storage"}}}}'
Optional: Disable IPv6 platform wide
🎯 BASTION
If IPv6 should not run on the nodes, it must be disabled explicitly by modifying the CoreOS kernel parameters. This is done through a MachineConfig definition.
echo '---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-openshift-machineconfig-worker-kargs
spec:
  kernelArguments:
    - ipv6.disable=1
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-openshift-machineconfig-master-kargs
spec:
  kernelArguments:
    - ipv6.disable=1
' | oc apply -f -