What is Magnum?
Magnum is an OpenStack project that provides Container Orchestration Engines as a service. EHK v3.0 uses Magnum to deploy and manage Kubernetes clusters.
How Does EHK v3.0 differ from EHK 2.0?
EHK v3.0 is architecturally completely different and provides a much smoother user experience. Here are some of the headline changes -
- Users are completely self-service as they deploy and manage their own Kubernetes nodes
- Worker node auto-scaling works out of the box based on load
- LoadBalancer-type service works like public Cloud providers
- GPU usage supported (hardware allowing)
- Ingress fully integrated with Octavia (OpenStack load balancing service)
- Containerd used as the container runtime out of the box (the Docker runtime is being deprecated)
- Cluster expansions are only limited by OpenStack project quotas
- Users have access to the OpenStack project layer
Security
Warning!
By default, a new cluster is open to the world. You need to secure it as soon as it is deployed - see Securing your Cluster below.
Quick Start
Here is a quick start video demonstrating Magnum cluster creation and autoscaling of pods and nodes
How to Size Your Cluster
This table gives some examples of the total resources you may need to request for the Embassy project quota for different configurations. It includes the flavour size to select when creating a Magnum cluster.
Note
You can have more than one Magnum cluster per OpenStack project.
To size multiple clusters, add the totals together. For example, for two clusters (one small and one medium) your quota request in the ServiceNow form would be 32 vCPU, 32 GB memory and 500 GB Cinder storage.
Note
The floating IP (public IP) column is the minimum required per cluster for Ingress, Monitoring and API. The minimum allowed per OpenStack project is 10 floating IPs, to cover LoadBalancer-type services and multiple clusters.
Don’t forget to add storage GB for any persistent volumes you may need, as the storage below only covers Kubernetes nodes' OS and logs.
| Type | vCPU | Memory (GB) | Node Disk (GB) | Flavour | Masters | Minions | Floating IP | Use case |
|---|---|---|---|---|---|---|---|---|
| extra-small | 8 | 8 | 120 | 4c4m60d | 1 | 1 | 3 | dev |
| small | 12 | 12 | 180 | 4c4m60d | 1 | 2 | 3 | dev |
| medium | 20 | 20 | 320 | 4c4m60d | 3 | 2 | 3 | prod |
| large | 32 | 32 | 480 | 4c4m60d | 3 | 5 | 3 | prod |
| XL | 48 | 96 | 360 | 8c16m60d | 3 | 3 | 3 | prod |
| XXL | 64 | 64 | 780 | 4c4m60d | 3 | 10 | 3 | prod |
Creating an EHK v3.0 Cluster
A Kubernetes cluster needs at least one master node and one worker node. The master nodes are the entry point for the Kubernetes API, which you use to spin up containers on the worker nodes.
Cluster Templates
The Cloud team has prepared a cluster template for you to use to create your Kubernetes cluster. This can be done on the command line with the OpenStack Magnum client, or in the Horizon web GUI.
Horizon GUI Cluster Creation
A Magnum cluster can be created with the web interface. However, the kubeconfig credentials have to be downloaded via the command line.
Log in to your project in Embassy
https://uk1.embassy.ebi.ac.uk/
Navigate to Container Infra/Clusters and select Create Cluster
Create New Cluster
In this dialogue box, use these guidelines:
- Cluster Name - Name of your choice
- Cluster template - ehk-basic-template-v3.2
- Availability Zone - nova
- KeyPair - select a keypair in your OpenStack Project. (You can create a new keypair under Compute/Key Pairs)
- Number of Master Nodes - 1 to 3 (use 1 for a small dev cluster)
- Flavour of Master Nodes - select magnum-ssd-masters
- Number of Worker Nodes - 1 to x
- Flavour of Worker Nodes - Use How to Size Your Cluster for flavour reference
- Select Auto-scale Worker Nodes
- Select Create new Network
- Select Enable Load Balancer for Master Nodes
- Cluster API - Important to leave this as Accessible on a private network only. See External Cluster Access Methods
- Ingress Controller - leave unset; the NGINX Ingress controller is installed automatically by the template
- unselect Automatically Repair Unhealthy Nodes
Secure Cluster
Command Line Cluster Creation
First, you will need the OpenStack Magnum client. Please see Using the OpenStack CLI.
Check available templates
openstack coe cluster template list -f value | grep ehk
792819b6-5eb8-4740-b7d3-95295399bdb8 ehk-basic-template-v1.0
Deploy cluster
openstack coe cluster create test-k8s-cluster \
--cluster-template ehk-basic-template-v1.0 \
--node-count 3 \
--keypair [your-keypair] \
--master-count 1
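Cluster creation takes a few minutes. You can check progress on the command line (using the cluster name from the example above) until the status reaches CREATE_COMPLETE:
openstack coe cluster list
openstack coe cluster show test-k8s-cluster -c status -c health_status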
Minimum node counts
You will soon be left with only 1 worker node (minion), as the auto-scaler scales the cluster down when the worker nodes are idle. To work around this you can set min_node_count (or disable auto-scaling) in your own custom template and redeploy, or you can alter the cluster afterwards as follows -
openstack coe nodegroup list test-k8s-cluster
openstack coe nodegroup show test-k8s-cluster default-worker --fit-width
openstack coe cluster resize test-k8s-cluster --nodegroup default-worker 3
openstack coe nodegroup update test-k8s-cluster default-worker replace min_node_count=3
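As a quick check (using the cluster and nodegroup names from the example above), confirm that the node count and scaling floor now match what you expect:
openstack coe nodegroup show test-k8s-cluster default-worker -c node_count -c min_node_count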
Secure Cluster
Terraform Cluster Creation
Note
You need to use an unrestricted application credential for this use case.
Please have a look at how to use Terraform before creating a Magnum cluster.
resource "openstack_containerinfra_cluster_v1" "test-cluster" {
provider = openstack
name = "test-cluster"
cluster_template_id = "04d9e633-ef89-435b-ae9d-8cffa2aae330"
keypair = "my-keypair"
node_count = 1
flavor = "1c1m20d"
master_count = 1
master_flavor = "1c2m20d"
fixed_network = "my-network"
fixed_subnet = "my-subnet"
}
# Export resulting kubeconfig file
resource "local_file" "kubconfig_file" {
depends_on = [openstack_containerinfra_cluster_v1.my-k8s]
filename = "./kubeconfig"
file_permission = 600
content = openstack_containerinfra_cluster_v1.my-k8s.kubeconfig.raw_config
}
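The usual Terraform workflow then applies; this sketch assumes the resource names above and that your OpenStack application credential is already configured for the provider:
terraform init
terraform plan
terraform apply
# The local_file resource above writes the kubeconfig next to your Terraform files
export KUBECONFIG=$PWD/kubeconfig
kubectl get nodes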
Fetch Kubernetes Credentials
Credentials are fetched via the OpenStack CLI (Magnum client). By default the credentials provide system:masters access.
grep 'client-certificate-data' ./config | awk '{print $2}' | base64 -d | openssl x509 -text | grep O=
Subject: CN=admin, O=system:masters
These credentials should only be used for initial access to your cluster, not for routine admin tasks, due to the security implications described here: https://blog.aquasec.com/kubernetes-authorization. Users should create new roles and credentials for day-to-day work, as outlined in the section Organising cluster access using Kubeconfig files.
Command to fetch the initial credentials -
openstack coe cluster config [cluster name] --dir ./config/
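For example, for the cluster created earlier, fetching the credentials into a local ./config directory and testing access looks like this (the cluster name is illustrative):
openstack coe cluster config test-k8s-cluster --dir ./config/
export KUBECONFIG=./config/config
kubectl get nodes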
Kubernetes Support Policy
The Virtualisation and On-Premises Cloud team is dedicated to facilitating the deployment of Kubernetes clusters through our Magnum service while ensuring optimal performance and reliability at the infrastructure level. To streamline our support efforts and provide the best service to our users, we have established the following Kubernetes Support Policy.
- Supported Versions:
- Upgrading Embassy OpenStack to a higher version is usually followed by the introduction of a new Kubernetes version; this typically happens twice a year.
- We provide support for the latest stable release of Kubernetes available on Embassy Hosted Kubernetes (Magnum).
- Additionally, we offer support for one version older than the latest stable release, but only if it is still officially supported by the Kubernetes community.
- Support Scope:
- Our support scope encompasses the provisioning, management, and maintenance of the underlying infrastructure required for Kubernetes cluster deployment.
- We ensure that the infrastructure components, such as networking, storage, and compute resources, are properly configured and optimised for Kubernetes workloads.
- End-of-Life Versions:
- We will ensure timely communication regarding any upcoming modifications to new Kubernetes templates and supported versions. Kubernetes versions that have reached their end-of-life (EOL) and for which higher versions are available in Embassy, will not be supported.
- We strongly recommend that users upgrade to the latest version available in Embassy to ensure ongoing security updates and access to new features and improvements.
- New Kubernetes versions will undergo testing and subsequent announcement, followed by the addition of new templates to Embassy. However, for users who have not upgraded to a supported version, functionality for their clusters cannot be guaranteed.
- Upgrade Recommendations:
- We encourage users to regularly upgrade their Kubernetes clusters to the latest supported version by Embassy to take advantage of performance enhancements, security patches, and new features.
- Our team is available to assist with upgrade planning to minimise downtime and ensure a smooth transition.
- Getting Support:
- For assistance with infrastructure-related aspects of Kubernetes deployment service, users can contact our support team through ServiceNow.
- When reaching out for support, please include relevant details about your setup and any specific issues encountered to facilitate a timely resolution.
Securing Your Cluster
To secure the cluster you will need to restrict it to allowed IP addresses/subnets. To do this you cannot use OpenStack security groups. You must use the OpenStack Octavia Load Balancer listener to select allowed networks. Never alter the instance (node) security groups as this will break connectivity with the LB.
There are two different methods required. The first one secures the kube api server listener on 6443, and the second secures listeners that are created when you create a service in Kubernetes.
Warning!
By default, all exposed services are open to the world with no restriction
Securing Load Balancers that are created by services in Kubernetes
Simply add the required internal and external network ranges to the service. This allows access from EBI internal networks and VPN to Kubernetes nodes running on the internal network 10.0.0.0/24:
apiVersion: v1
kind: Service
metadata:
  name: test
  namespace: default
spec:
  type: LoadBalancer
  # selector and ports shown for completeness; match them to your application
  selector:
    app: test
  ports:
  - port: 80
    targetPort: 80
  loadBalancerSourceRanges:
  - 193.62.192.0/21
  - 10.0.0.0/24
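After applying the manifest you can confirm that Octavia has picked up the restriction by checking allowed_cidrs on the listener created for the service (the file name, listener name and port here are illustrative):
kubectl apply -f test-service.yaml
kubectl get svc test   # wait for an EXTERNAL-IP to appear
openstack loadbalancer listener list -f json | jq '.[]|select(.protocol_port==80)'
openstack loadbalancer listener show [listener name] -f json | jq .allowed_cidrs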
Securing API Server on port 6443
This secures the Kubernetes API on port 6443 to internal EBI networks (including VPN).
Check I have a cluster available -
openstack coe cluster list -f json
[
{
"status": "CREATE_COMPLETE",
"uuid": "fc8e21eb-34bf-47a4-8170-faed729f2da9",
"health_status": "HEALTHY",
"master_count": 1,
"keypair": "magnum-test",
"node_count": 1,
"name": "vac-test"
}
]
Look for my LB listener for the API -
openstack loadbalancer listener list -f json | jq '.[]|select(.protocol_port==6443)'
{
"protocol_port": 6443,
"protocol": "TCP",
"name": "vac-test-dgb3dcidunvz-api_lb-3afx5fzx6lkd-listener-h3xt3lecsond",
"admin_state_up": true,
"default_pool_id": "feb8f1a0-3513-4c44-8eaa-2fbf91ce7d91",
"project_id": "deb06b667e454f8a9e005bf334b6e65e",
"id": "0a08414c-4f68-4d59-9386-9beb116017f4"
}
Get the local subnet CIDR for my Magnum network -
openstack subnet show 505151c1-09b9-4c9c-a349-fd348420188a -f json | jq .cidr
"10.0.0.0/24"
Set the allowed CIDR to EBI networks only (including VPN) -
openstack loadbalancer listener set --allowed-cidr 193.62.192.0/21 --allowed-cidr 10.0.0.0/24 vac-test-dgb3dcidunvz-api_lb-3afx5fzx6lkd-listener-h3xt3lecsond
openstack loadbalancer listener show vac-test-dgb3dcidunvz-api_lb-3afx5fzx6lkd-listener-h3xt3lecsond -f json | jq .allowed_cidrs
"10.0.0.0/24\n193.62.192.0/21"
Now test I have access from the VPN. First, get the API LB public address -
openstack loadbalancer show vac-test-dgb3dcidunvz-api_lb-3afx5fzx6lkd-loadbalancer-wct5fhalbuid -c vip_port_id -f value
bcae96b3-7d22-49fa-bfc7-f53acec0efd2
openstack floating ip list -f json | jq '.[]|select(.Port=="bcae96b3-7d22-49fa-bfc7-f53acec0efd2")|."Floating IP Address"'
"45.88.80.193"
Test that we have connectivity -
nc -z -v 45.88.80.193 6443
Connection to 45.88.80.193 6443 port [tcp/*] succeeded!
OpenStack Security Groups
These can be found in Horizon web GUI https://uk1.embassy.ebi.ac.uk/
If you have enabled SSH access you also need to change the port 22 SSH rules in both master and minion groups
Note
For more detail on security please see Security Best Practices
Storage
To choose the type of storage backend you need to work out the access mode for your application
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
ReadWriteOnce
This means the volume can be mounted as read-write by a single node.
When you create a cluster a default ReadWriteOnce storage class is installed. This provides dynamic Persistent Volumes using Cinder (backed by Ceph).
kubectl get sc
NAME                       PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
default-cinder (default)   cinder.csi.openstack.org   Delete          Immediate           true                   67m
ReadWriteMany
It is possible to achieve this by installing an NFS server in your cluster backed by our default-cinder PV above.
The process below enables you to build the image and push it to your EBI Gitlab registry (if you have one).
Note
We have also provided a prebuilt image here quay.io/embassycloud/nfs-provisioner which does not require a secret to pull and simplifies the process.
git clone https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner
docker login https://dockerhub.ebi.ac.uk -u [gitlab user] -p [gitlab personal access token]
cd nfs-ganesha-server-and-external-provisioner
make build
make container
docker tag nfs-provisioner:latest dockerhub.ebi.ac.uk/[your gitlab project]/[your gitlab repo]/nfs-provisioner
docker push dockerhub.ebi.ac.uk/[your gitlab project]/[your gitlab repo]/nfs-provisioner
kubectl create secret docker-registry regcred --docker-server=dockerhub.ebi.ac.uk --docker-username=[your gitlab user] --docker-password=[gitlab personal access token]
Create a new PVC
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfsclaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: default-cinder
Alter the file deploy/kubernetes/deployment.yaml
image: dockerhub.ebi.ac.uk/[your gitlab project]/[your gitlab repo]/nfs-provisioner
imagePullSecrets:
- name: regcred
volumes:
- name: export-volume
  persistentVolumeClaim:
    claimName: nfsclaim
Alter the file deploy/kubernetes/class.yaml to set the reclaim policy to “Retain”, to prevent the volume data being deleted when the claim is removed
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: example-nfs
provisioner: example.com/nfs
reclaimPolicy: Retain
mountOptions:
- vers=4.1
Then create the NFS server, the new storage class and a test PVC
kubectl create -f deploy/kubernetes/deployment.yaml
kubectl create -f deploy/kubernetes/rbac.yaml
kubectl create -f deploy/kubernetes/class.yaml
kubectl create -f deploy/kubernetes/claim.yaml
You will now have two storage classes and two PVCs with different backends: the Cinder-backed claim used by the NFS server, and a new test claim on the NFS storage class
kubectl get sc
NAME                       PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
default-cinder (default)   cinder.csi.openstack.org   Delete          Immediate           true                   6h49m
example-nfs                example.com/nfs            Delete          Immediate           false                  73m

kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
NFSClaim   Bound    pvc-69e1c626-6628-4ec6-b738-415658cbde2f   2Gi        RWO            default-cinder   5h28m
nfs        Bound    pvc-33ca8e4c-9c97-4a03-b91f-f953a31721a8   1Mi        RWX            example-nfs      65m
Persistent Volume Expansion
PersistentVolume expansion allows Kubernetes users to simply edit their PersistentVolumeClaim object and specify a new size in the PVC spec. Kubernetes will automatically expand the volume on the storage backend and, where possible, also expand the underlying filesystem in use by the Pod without requiring any downtime.
The following steps guide you with the expansion of Persistent Volumes (PVs) on your Magnum Kubernetes cluster.
Note
The allowVolumeExpansion property of a StorageClass determines whether or not the StorageClass supports the ability to resize volumes after they are created. It is set to true by default in the default-cinder StorageClass (the default StorageClass of Magnum Kubernetes clusters).
Expand the PVC with the new size. Make sure you only edit the value of resources.requests.storage under the PVC spec.
kubectl edit pvc my-claim
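If you prefer a non-interactive change (for example in a script), the same edit can be made with kubectl patch; the claim name and target size here are placeholders:
kubectl patch pvc my-claim -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'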
Verify that the volume has been expanded. You should be able to see the increased PV capacity.
kubectl get pv
Also, you can verify that the Cinder volume has been expanded
openstack volume show 8218e1d7-cea9-4b78-8e51-281ff95904b7 -c size
When Kubernetes starts expanding the volume it will add a Resizing condition to the PVC, which will be removed once the expansion completes.
kubectl get pvc
External Cluster Access Methods
The templates we provide do not apply any floating IPs (public IP addresses) on the cluster nodes as this is considered insecure and resource-heavy. If you are using the Horizon GUI to install your cluster please DO NOT select Accessible on the Public Internet for the Cluster API as this will override our template settings and try and attach a floating IP to every node in the cluster, causing both quota and security issues.
External access is via an OpenStack Load Balancer (Octavia) which is automatically provisioned with a floating IP on cluster creation. The kubeconfig file will point to this IP for API access on port 6443. This will need securing, please see Securing your Cluster for details. All applications within Kubernetes will be accessed via OpenStack Load Balancers.
The recommended method is Ingress, as one OpenStack load balancer and floating IP can accommodate many applications. Conversely, type: LoadBalancer provisions a new OpenStack load balancer and floating IP for each service, making it resource-heavy.
Here is how various access methods work in detail
LoadBalancer Type
When you specify type: LoadBalancer in a service manifest, an OpenStack load balancer is automatically created with a floating IP. This passes external traffic on the designated port through to a NodePort on the cluster nodes. The NodePort will need to be secured in the OpenStack security groups; please see Securing your Cluster for details
Ingress
The recommended Ingress is the NGINX Ingress controller (https://kubernetes.github.io/ingress-nginx/). This is automatically installed as part of a Magnum post-install manifest. The Ingress controller service is a LoadBalancer service that automatically creates an OpenStack load balancer listening on ports 80 and 443
kubectl get svc ingress-nginx-controller -n ingress-nginx
NAME                       TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
ingress-nginx-controller   LoadBalancer   10.254.247.42   45.88.80.167   80:31666/TCP,443:31920/TCP   19m
We can see the public IP for our services using Ingress under EXTERNAL-IP. This is the floating IP bound to the load balancer.
You can also restrict access to your ingress service by adding the whitelist-source-range annotation to your ingress object:
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myservice-ingress
  namespace: myservice
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/whitelist-source-range: 193.62.0.0/16
spec:
  tls:
  - hosts:
    - myservice.ebi.ac.uk
    secretName: myservice.ebi.ac.uk
  rules:
  - host: myservice.ebi.ac.uk
    http:
      paths:
      - backend:
          serviceName: myservice-service
          servicePort: 8000
        path: /
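Once the Ingress object exists you can test routing through the controller from an allowed network; the external IP here is the one shown by kubectl get svc above and the hostname matches the Ingress rule (both illustrative):
# With DNS pointing at the controller IP
curl -k https://myservice.ebi.ac.uk/
# Without DNS, resolve the hostname to the controller IP explicitly
curl -k --resolve myservice.ebi.ac.uk:443:45.88.80.167 https://myservice.ebi.ac.uk/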
NodePort
type: NodePort is not recommended to expose services as this will not work in an automated way. No external access will be possible until a new OpenStack Octavia Load Balancer, Listener and Pool have been created manually for the specific NodePort. See Exposing type: NodePort for details.
Note
The NodePort will need to be secured in the OpenStack Security Groups, please see Securing your Cluster for details
SSH access to Kubernetes Nodes
This can be achieved by creating a bastion host in your OpenStack project with SSH key access only and a floating IP attached (recommended).
It is possible to create a new OpenStack Octavia load balancer listening on port 22 using a similar workflow to Exposing type: NodePort, but this may time out frequently and is not recommended.
Note
Before doing this the OpenStack project must be secured for SSH access with OpenStack Security Groups, see Securing your Cluster. To SSH to EHK nodes you must use the private key used to deploy the cluster. The user is core for clusters created with EHK templates
Monitoring
The cluster is automatically installed with kube-prometheus and exposes a LoadBalancer service frontend for Grafana that provides cluster metrics
kubectl get svc -A | grep grafana-pub
kube-system   magnum-grafana-public   LoadBalancer   10.254.153.121   45.88.81.21   80:32134/TCP
So to view this you would simply navigate to http://45.88.81.21 (the EXTERNAL-IP above) and log in
Note
The default username and password are both admin. You will be asked to change the password on first login. Please see Securing your Cluster to limit access to this service
Monitoring your own application
We can use the pre-installed kube-prometheus stack to achieve this as per this workflow
Exposing type: NodePort
This workflow shows how to manually configure an OpenStack Load Balancer for NodePort external access.
Note
You will need the Magnum OpenStack Client and a secured EHK 3.0 cluster. See Securing your Cluster and Using the OpenStack CLI
Nodeport Workflow
Create test Nginx deployment with nodeport 31389
kubectl create deployment nginx --image=nginx
kubectl create service nodeport nginx --tcp=80:80 --node-port 31389
kubectl get svc
NAME    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
nginx   NodePort   10.254.137.50   <none>        80:31389/TCP
List subnets
openstack subnet list -c ID -c Name -f value
6882959c-0db5-4f50-ad2d-d05fbfeaf654 test-cems-bcntwzytb3so-network-ul7yucjqajfr-private_subnet-6tpafqbzhn44
Create a loadbalancer
openstack loadbalancer create --vip-subnet-id 6882959c-0db5-4f50-ad2d-d05fbfeaf654 --name test-cems-lb
Create a listener
openstack loadbalancer listener create --protocol-port 31389 --name test-cems-listener --protocol TCP test-cems-lb
Create a pool
openstack loadbalancer pool create --listener test-cems-listener --protocol TCP --lb-algorithm SOURCE_IP --name cems-test-pool
List available nodes
openstack server list
+--------------------------------------+---------------------------------+--------+---------------------+----------------+-----------+
| ID                                   | Name                            | Status | Networks            | Image          | Flavor    |
+--------------------------------------+---------------------------------+--------+---------------------+----------------+-----------+
| 35ffe989-e3a9-4e3e-89e0-5715bf868938 | test-cems-bcntwzytb3so-node-0   | ACTIVE | test-cems=10.0.0.77 | FedoraCoreOS32 | m1.medium |
| 041a5c6a-0290-4deb-95be-070ef617a232 | test-cems-bcntwzytb3so-master-0 | ACTIVE | test-cems=10.0.0.32 | FedoraCoreOS32 | m1.medium |
+--------------------------------------+---------------------------------+--------+---------------------+----------------+-----------+
Create pool members
openstack loadbalancer member create --address 10.0.0.77 --protocol-port 31389 cems-test-pool
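Optionally, add a TCP health monitor to the pool so Octavia stops sending traffic to a member whose NodePort stops responding (names reused from the steps above; add one member per worker node if you have several):
openstack loadbalancer healthmonitor create --name cems-test-hm --delay 5 --timeout 5 --max-retries 3 --type TCP cems-test-pool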
Get a floating IP
openstack floating ip create public
Get LB port to associate
openstack loadbalancer show -c vip_port_id test-cems-lb
+-------------+--------------------------------------+
| Field       | Value                                |
+-------------+--------------------------------------+
| vip_port_id | 703014b9-3872-44ea-8f3a-477cd2569250 |
+-------------+--------------------------------------+
Associate the floating IP with the LB. The first UUID is the LB vip_port_id. The second is the floating IP UUID, which you can see with the command openstack floating ip list
openstack floating ip set --port 703014b9-3872-44ea-8f3a-477cd2569250 f59f98b2-9b80-458d-ad12-048af49294e4
Test external access
curl [floating IP address]:31389
...
<h1>Welcome to nginx!</h1>
...
Resizing Your Cluster
OpenStack Magnum Resize Documentation
For example, to remove a specific node you need to find out the cluster name, the node group name and the server UUID
openstack coe cluster resize [cluster name] --nodegroup [node group name] 5 --nodes-to-remove [server UUID]
Bulk Copying Data to a Persistent Volume
Using parallel rsync to copy data into a PV from a source outside the cluster.
Here we are going to create a PV with the default storage class. This will be used to create a new storage class hosted by the NFS provisioner.
Note
Why Do We Need a New Storage Class? - we are launching multiple jobs (pods) in parallel. This means parallel writes to the same Persistent Volume. The default storage class does not allow this as it is ReadWriteOnce. By creating a new storage class with the NFS provisioner we can have a ReadWriteMany Persistent Volume which allows parallel writes from multiple pods (Kubernetes jobs)
We then create a PVC and this is mounted by our Kubernetes job. The job then uses rsync to copy files from a source server (which must have rsync installed) to the new PV in ReadWriteMany mode. The new PV will be retained on job deletion and can then be mounted by your application.
Workflow
First, create a new storage class with the NFS Provisioner by following the section ReadWriteMany above.
Make sure the PVC you create is large enough to hold your data.
Create a secret for the SSH private key for your source server (this server must have a public IP address so it is accessible from Embassy)
kubectl create secret generic my-ssh-key --from-file=id_rsa=/home/cems/rsynctestk8s/test.pem
Create a PVC from the new Storage Class you created
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwmdata
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: example-nfs
We now create multiple Kubernetes jobs to run in parallel, which use an Alpine image we provide to rsync to the destination PV
First, create the yaml template and call it “job-tmpl.yaml”
apiVersion: batch/v1
kind: Job
metadata:
  name: rsync-$ITEM
spec:
  template:
    spec:
      containers:
      - name: rsync-$ITEM
        image: quay.io/embassycloud/alpinersync:latest
        imagePullPolicy: Always
        command: ["rsync", "-a","-v","-e", "ssh -i /mykey/id_rsa -o StrictHostKeyChecking=no", "centos@[ip address]:/home/centos/test-transfer/$ITEM", "/data"]
        volumeMounts:
        - mountPath: /data
          name: data
        - mountPath: "/mykey"
          readOnly: true
          name: ssh-key-volume
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: rwmdata
      - name: ssh-key-volume
        secret:
          secretName: my-ssh-key
          defaultMode: 256
      restartPolicy: Never
  backoffLimit: 1
Then create the manifests, substituting $ITEM in the template with each path to rsync -
mkdir ./jobs
for i in path1 path2 path3
do
  cat job-tmpl.yaml | sed "s/\$ITEM/$i/" > ./jobs/job-$i.yaml
done
Then run the jobs -
kubectl create -f ./jobs
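You can follow progress and inspect the transfer log of an individual job; the job name below assumes one of the example paths used above:
kubectl get jobs
kubectl logs job/rsync-path1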
Once the jobs have finished successfully you can delete them
kubectl delete -f ./jobs
Based on this https://kubernetes.io/docs/tasks/job/parallel-processing-expansion/
There are other built-in ways to do job parallel processing using Kubernetes jobs, but these depend on a message queue (Redis, RabbitMQ) which adds much more complexity and overhead to this process
cert-manager
cert-manager self-checks do not work with proxy protocol
https://github.com/jetstack/cert-manager/issues/863
This is an issue because our default NGINX ingress controller configuration has proxy protocol enabled.
https://github.com/kubernetes/cloud-provider-openstack/issues/1287
If you do not need proxy protocol then the cleanest solution is to remove the ingress controller and reinstall it with the latest compatible Helm chart, with proxy protocol disabled.
Organising cluster access using Kubeconfig files
Suppose you want to provide your users with different ways to authenticate. For example, administrators might have sets of certificates that they provide to individual users, giving them cluster-admin access, namespace admin access or a custom role with a subset of privileges (e.g. deployment-manager).
Note
The cert-manager API is only fully working in cluster template >=ehk-basic-template-v2.3
Create user credentials and a new kubeconfig file
Step 1: Create the user credentials
Kubernetes does not have API objects for user accounts, so we will use OpenSSL certificates. You need to create a private key and a Certificate Signing Request (CSR) using this private key. Make sure you specify your username and group in the -subj section (CN is for the username and O is for the group). This command will create both files in your current directory:
openssl req -new -newkey rsa:4096 -nodes -keyout myuser.key -out myuser.csr \
-subj "/CN=myuser/O=myproject"
Then, you need to submit the Certificate Signing Request to the Kubernetes certificates API in your cluster:
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: myuser-myproject
spec:
  groups:
  - system:authenticated
  request: $(cat myuser.csr | base64 | tr -d '\n')
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth
EOF
certificatesigningrequest.certificates.k8s.io/myuser-myproject created
# Check status
kubectl get csr
NAME AGE REQUESTOR CONDITION
myuser-myproject 3s kube-admin Pending
A new Certificate Signing Request should be waiting in your cluster to be Approved and Issued. Run the following command to approve this request:
kubectl certificate approve myuser-myproject
certificatesigningrequest.certificates.k8s.io/myuser-myproject approved

# Check current status
kubectl get csr
NAME               AGE   REQUESTOR    CONDITION
myuser-myproject   27s   kube-admin   Approved,Issued
You can download the issued certificate and save it to a myuser-myproject.crt file by running the following:
kubectl get csr myuser-myproject -o jsonpath='{.status.certificate}' \
| base64 --decode > myuser-myproject.crt
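Before building the kubeconfig you can confirm that the issued certificate carries the expected username and group (output formatting varies with the OpenSSL version):
openssl x509 -in myuser-myproject.crt -noout -subject
# subject=CN = myuser, O = myproject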
Now you can use myuser-myproject.crt and myuser.key as the keypair to access your cluster resources. Save both crt and key files in a safe location.
The last requirement to set up a new kubeconfig file is the cluster CA certificate. That is easy to get as we already have it in our existing kubeconfig file. To retrieve it, we can use the following command to save it into a file named ‘k8s-ca.crt’:
kubectl config view -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' --raw \
| base64 --decode - > k8s-ca.crt
Step 2: Create a namespace
Execute the kubectl create command to create the namespace (as the admin user):
kubectl create namespace myproject
namespace/myproject created
Step 3: Create the new kubeconfig file
Now you can create a new kubeconfig file. Again, a kubeconfig file consists of a cluster configuration (Name, URL, CA cert), a user configuration (name, key, cert) and a context configuration. A context specifies the cluster, the user and the namespace that kubectl will use when making calls to the API server.
# Create new kubeconfig file with given credentials and a new context
CLUSTER=$(kubectl config view -o jsonpath='{.clusters[0].name}')
SERVER=$(kubectl config view -o jsonpath='{.clusters[0].cluster.server}')
kubectl config set-cluster $CLUSTER --server=$SERVER --certificate-authority=k8s-ca.crt --kubeconfig=myuser-config --embed-certs
Cluster "local" set.
kubectl config set-credentials myuser --cluster=$CLUSTER --client-certificate=myuser-myproject.crt --client-key=myuser.key --kubeconfig=myuser-config
User "myuser" set.
kubectl config set-context myuser-myproject-context --cluster=$CLUSTER --namespace=myproject --user=myuser --kubeconfig=myuser-config
Context "myuser-myproject-context" created.
kubectl config use-context myuser-myproject-context --kubeconfig=myuser-config
Switched to context "myuser-myproject-context".
# Check context
kubectl config get-contexts --kubeconfig=myuser-config
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* myuser-myproject-context local myuser myproject
Finally, select one of the following use cases to choose the required level of access for this particular user.
Use case 1: Create a user with cluster-admin privileges
Kubernetes uses roles to determine if a user is authorised to make a call. Roles are scoped to either the entire cluster via a ClusterRole object or to a particular namespace via a Role object.
Bind the role to the user
To give this user full cluster-admin privileges, bind the cluster-admin ClusterRole to them:
# This user doesn't have access to resource nodes
kubectl get nodes --kubeconfig=myuser-config
Error from server (Forbidden): nodes is forbidden: User "myuser" cannot list resource "nodes" in API group "" at the cluster scope

# Give the user cluster admin access
kubectl create clusterrolebinding myuser-cluster-admin --clusterrole=cluster-admin --user=myuser

# List Cluster Role Binding
kubectl get clusterrolebinding myuser-cluster-admin
NAME                   AGE
myuser-cluster-admin   12s

# Test you can get node status using this kubeconfig file
kubectl get nodes --kubeconfig=myuser-config
NAME               STATUS   ROLES               AGE     VERSION
myuser-master-00   Ready    controlplane,etcd   6d19h   v1.14.6
myuser-worker-00   Ready    worker              6d19h   v1.14.6
How to revoke access to your cluster
As Kubernetes currently does not support certificate revocation lists (CRL), to revoke user access to the cluster, you can simply delete the ClusterRoleBinding created above:
kubectl delete clusterrolebinding myuser-cluster-admin
clusterrolebinding.rbac.authorization.k8s.io/myuser-cluster-admin deleted
Use case 2: User with admin privileges at the namespace level
Kubernetes uses roles to determine if a user is authorised to make a call. Roles are scoped to either the entire cluster via a ClusterRole object or to a particular namespace via a Role object.
Bind the role to the user
We need to give the user a role that gives them complete freedom within this namespace, but prevents any call outside of it.
# Give admin access to this namespace
kubectl create rolebinding myuser-admin --namespace=myproject --clusterrole=admin --user=myuser
rolebinding.rbac.authorization.k8s.io/myuser-admin created

# Deploy a simple image
kubectl config use-context myuser-myproject-context --kubeconfig=myuser-config
kubectl run --image bitnami/dokuwiki mydokuwiki --kubeconfig=myuser-config
kubectl get pods --kubeconfig=myuser-config
NAME                         READY   STATUS              RESTARTS   AGE
mydokuwiki-66c5dd668-w9m4g   0/1     ContainerCreating   0          12s

# Access to node level is restricted
kubectl get nodes --kubeconfig=myuser-config
Error from server (Forbidden): nodes is forbidden: User "myuser" cannot list resource "nodes" in API group "" at the cluster scope
How to revoke access to your cluster
As Kubernetes currently does not support certificate revocation lists (CRL), to revoke user access to the cluster, you can simply delete the RoleBinding created above:
kubectl delete rolebinding myuser-admin --namespace=myproject
rolebinding.rbac.authorization.k8s.io/myuser-admin deleted

# Access to pods is now restricted
kubectl get pods --kubeconfig=myuser-config
Error from server (Forbidden): pods is forbidden: User "myuser" cannot list resource "pods" in API group "" in the namespace "myproject"
Use case 3: Create a user with limited privileges within the namespace
Create the role for managing deployments
Create a role-deployment-manager.yaml file with the following contents:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: myproject
  name: deployment-manager
rules:
- apiGroups: [ "", "extensions", "apps" ]
  resources: [ "deployments", "replicasets", "pods" ]
  verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]
  # You can also use ["*"]
Create the Role in the cluster using the kubectl create command:
kubectl create -f role-deployment-manager.yaml
role.rbac.authorization.k8s.io/deployment-manager created
Bind the role to the user
Create a rolebinding-deployment-manager.yaml file with the content below.
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: deployment-manager-binding
  namespace: myproject
subjects:
- kind: User
  name: myuser
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: deployment-manager
  apiGroup: rbac.authorization.k8s.io
In this file, we are binding the deployment-manager Role to the User Account myuser inside the myproject namespace. Deploy the RoleBinding by running the kubectl create command:
kubectl create -f rolebinding-deployment-manager.yaml

# You can access this namespace again
kubectl get pods --kubeconfig=myuser-config
NAME                         READY   STATUS    RESTARTS   AGE
mydokuwiki-66c5dd668-w9m4g   1/1     Running   0          7m27s

# .. and run some operations in this namespace
kubectl delete pods mydokuwiki-66c5dd668-w9m4g --kubeconfig=myuser-config
pod "mydokuwiki-66c5dd668-w9m4g" deleted
How to revoke access to your cluster
kubectl delete rolebinding deployment-manager-binding --namespace=myproject
rolebinding.rbac.authorization.k8s.io/deployment-manager-binding deleted
Troubleshooting Magnum Cluster Access
Check all instances in your project are running
openstack server list
Start any shut-down instances that are hosting your Magnum cluster
openstack server start my-magnum-node
Check the console of the instances to make sure they are up
openstack console url show my-magnum-node
If the instance will not start contact the Embassy Cloud Team
Check your loadbalancers status
openstack loadbalancer list
Check the API load balancer connectivity. First, get the LB IP -
i=$(openstack loadbalancer show fg060cbc-b259-459a-80bc-e7f0ac883ff -f json | jq -r '.vip_port_id')
openstack floating ip list -f json | jq -r --arg i "$i" '.[]|select(.Port==$i)|."Floating IP Address"'
45.88.81.225
Then check with netcat
nc -z -v 45.88.81.225 6443
If the connection fails, ask the Cloud Team to fail over the LB and wait a few minutes for it to become active again
Check the LB status with
openstack loadbalancer show fg060cbc-b259-459a-80bc-e7f0ac883ff
Once happy try and connect again
nc -z -v 45.88.81.225 6443
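It can also help to compare what Magnum itself reports for the cluster health (the cluster name here is an example):
openstack coe cluster show vac-test -c health_status -c health_status_reason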
Changing Cluster Template Versions
New cluster templates are made available to provide an optimal environment for running Magnum clusters. They are named with incremental version numbers, for example ehk-basic-template-v3.2. We recommend that all users migrate to the latest templates, as they can contain updated Kubernetes versions, improved storage backends, bug/security fixes and so on.
This section covers migrating applications to new templates.
Automation and the Ephemeral Nature of Kubernetes Clusters
We expect all Magnum users to be aware that Kubernetes is a continuously moving target and should therefore make sure all applications making use of Magnum clusters are fully portable. This means automation wherever possible. Be prepared for the possibility that we will have to retire older templates due to security or functional issues. Making provision to move clusters to the latest template should be part of the application workflow, and will ensure a safe stable environment.
Cluster Migration
There is no way to simply update your current cluster to a new template at the moment. The best strategy is to apply for increased temporary resources to deploy a smaller cluster with a new template and deploy the application there. Once you are happy that the application is running successfully delete the old cluster.
Reusing Floating IP Addresses
This explains how to migrate a public IP to a new cluster
Procedure
First, disassociate the floating IPs you want to preserve from the relevant load balancers in OpenStack, then delete the Magnum cluster. This will ensure the floating IPs will not be released from your OpenStack project.
Create your new cluster, then for each exposed service manually remove the automatically allocated floating IP from the related Octavia load balancer in OpenStack and assign the preserved one.
Then alter the service to specify the reused IP
spec:
  type: LoadBalancer
  loadBalancerIP: [old FIP address]
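For example, a complete Service manifest pinning the preserved address might look like this sketch (the name, selector, ports and IP are placeholders):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  type: LoadBalancer
  # Preserved floating IP from the old cluster (placeholder value)
  loadBalancerIP: 45.88.80.200
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080
EOF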
Note
This works with the LoadBalancer type, so it also works with the ingress-nginx-controller
Reusing Persistent Volumes
This demonstrates how you can reuse PVs in a new Magnum Cluster. This saves time-consuming storage migrations.
Note
Make sure you specify the same filesystem used in the original volume or you may lose data
OpenStack Magnum Persistent Storage Documentation
Procedure
First, we create a new cluster
openstack coe cluster create magnum-test-cluster \
--cluster-template ehk-basic-template-v2.1 \
--node-count 1 \
--keypair magnum-test \
--master-count 1 \
--flavor m1.medium
openstack coe cluster list -f json
[
{
"status": "CREATE_COMPLETE",
"uuid": "ac80f2d4-e9f0-47af-be3a-f799e68145d6",
"health_status": "HEALTHY",
"master_count": 1,
"keypair": "magnum-test",
"node_count": 1,
"name": "magnum-test-cluster"
}
]
openstack coe cluster config ac80f2d4-e9f0-47af-be3a-f799e68145d6 --dir ./
export KUBECONFIG=/home/myhome/testing-pv/config
Then we create a PVC which will dynamically create a PV (Cinder Volume)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: default-cinder
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
claim1 Bound pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0 1Gi RWO default-cinder 11s
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0 1Gi RWO Delete Bound default/claim1 default-cinder 17
Then we launch our pod referencing the PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-pv-pod
spec:
  volumes:
  - name: test-pv-storage
    persistentVolumeClaim:
      claimName: claim1
  containers:
  - name: test-pv-container
    image: nginx
    ports:
    - containerPort: 80
      name: "http-server"
    volumeMounts:
    - mountPath: "/usr/share/nginx/html"
      name: test-pv-storage
EOF
kubectl get po
NAME READY STATUS RESTARTS AGE
test-pv-pod 1/1 Running 0 5m44s
We populate the PV with some data
kubectl exec -it test-pv-pod -- bash -c "echo 'this is a test' > /usr/share/nginx/html/index.html"
kubectl exec -it test-pv-pod -- /bin/bash
root@test-pv-pod:/# curl http://localhost
this is a test
We need to delete and recreate our Magnum cluster so we flag this volume as ‘Retain’ so it will not be deleted
kubectl patch pv pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
persistentvolume/pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0 patched
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0 1Gi RWO Retain Bound default/claim1 default-cinder 30m
kubectl delete pvc claim1
persistentvolumeclaim "claim1" deleted
kubectl delete pv pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0
persistentvolume "pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0" deleted
kubectl get pv
No resources found
kubectl get pvc
No resources found in default namespace.
We check that the Cinder volume still exists in OpenStack
openstack volume list -f json | jq '.[]|select(.Name=="pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0")'
{
"Status": "available",
"Size": 1,
"Attached to": [],
"ID": "854b4eb4-aedc-4826-92fa-f3c4ee0c013c",
"Name": "pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0"
}
We delete the Magnum cluster and create a new one in the same way as before, making sure we fetch the kubeconfig again
openstack coe cluster delete ac80f2d4-e9f0-47af-be3a-f799e68145d6
Request to delete cluster ac80f2d4-e9f0-47af-be3a-f799e68145d6 has been accepted.
Here is the new cluster
openstack coe cluster list -f json
[
{
"status": "CREATE_COMPLETE",
"uuid": "aa6e8041-04aa-4107-8fe4-335cd300655c",
"health_status": "HEALTHY",
"master_count": 1,
"keypair": "magnum-test",
"node_count": 1,
"name": "magnum-test-cluster"
}
]
We fetch the Cinder volume ID for the previous PV and create a new pod directly referencing this volume
export VOLID="$(openstack volume list -f json | jq -r '.[]|select(.Name=="pvc-5c466e50-1a0d-4886-8169-a6e6c3a922f0")|.ID')"
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-pv-pod
spec:
  volumes:
  - name: test-pv-storage
    cinder:
      # Enter the volume ID below
      volumeID: $VOLID
      fsType: ext4
  containers:
  - name: test-pv-container
    image: nginx
    ports:
    - containerPort: 80
      name: "http-server"
    volumeMounts:
    - mountPath: "/usr/share/nginx/html"
      name: test-pv-storage
EOF
We can see the data is preserved
kubectl exec -it test-pv-pod -- /bin/bash
root@test-pv-pod:/# curl http://localhost
this is a test
If we would rather create a PV from the existing volume, we can do so by specifying the volume ID. We can then reference the claim in the pod as before
cat <<EOF | kubectl apply -f -
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: "test-pv"
spec:
  capacity:
    storage: "1Gi"
  accessModes:
  - "ReadWriteOnce"
  cinder:
    fsType: "ext4"
    volumeID: $VOLID
  storageClassName: default-cinder
EOF
cat <<EOF | kubectl apply -f -
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
  name: "test-claim"
spec:
  accessModes:
  - "ReadWriteOnce"
  resources:
    requests:
      storage: "1Gi"
  volumeName: "test-pv"
EOF
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test-claim Bound test-pv 1Gi RWO default-cinder 5s
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
test-pv 1Gi RWO Retain Bound default/test-claim default-cinder 20s
Reusing Networks
There may be times when you will want to reuse an existing OpenStack network for your Magnum cluster. For example, the floating IP on the router gateway may have controlled access to internal resources, and changing it would break that access.
The GUI appears to let you select your own network, but this does not work.
A workaround is to use the following options when deploying the cluster using the CLI or Terraform:
--fixed-network --fixed-subnet
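For example, on the CLI this might look like the following sketch, where the cluster, keypair, network and subnet names are placeholders for your own values:
openstack coe cluster create my-reuse-cluster \
  --cluster-template ehk-basic-template-v3.2 \
  --keypair my-keypair \
  --master-count 1 \
  --node-count 2 \
  --fixed-network my-existing-network \
  --fixed-subnet my-existing-subnet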