Intro
In my previous blog series I stepped through how to build an Enterprise PKS (referred to as PKS from here on) and NSX-T nested home lab environment from the ground up, which was well received and referenced by many. In this series I will guide you through how to troubleshoot PKS. First up is using the BOSH CLI.
BOSH Overview
You may, or may not, know that BOSH is what makes PKS awesome under the covers for day 1 and day 2 operations. If you've read my other blog posts you will know I'm not one to repeat content already created by others, so in the same vein, find some great YouTube videos below on BOSH by Merlin Glynn and James Watters.
- Why Bosh? Part 1: BOSH Unique Capabilities
- Why Bosh? Part 2: Platforms Running on BOSH
- Kubernetes on BOSH for the Enterprise
BOSH CLI
To interact with the BOSH Director, we use the BOSH CLI! The BOSH CLI can be downloaded from here and is available for Windows, Mac, and Linux. The BOSH CLI comes pre-installed on the PCF Ops Manager VM, which is very useful if you need to quickly perform some troubleshooting. Installing the BOSH CLI is as simple as granting it execute permissions and copying/moving it to your PATH. For more detailed steps, see here.
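For example, on Linux or Mac the install typically looks like the following (adjust the downloaded filename to match the version you grabbed):
$ chmod +x ./bosh
$ sudo mv ./bosh /usr/local/bin/bosh
$ bosh --version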
BOSH Credentials
To interact with BOSH Director using the BOSH CLI, naturally we need credentials. We can retrieve these from PCF Ops Manager, aka Ops Man. Open and log into the Ops Man web UI > on the Installation Dashboard click the BOSH Director tile > click the Credentials tab > towards the bottom of the list of credentials, click Link to Credential next to Bosh Commandline Credentials.
This opens a page with the credentials for BOSH Director as a single string to be executed from the Ops Man VM.
This command line string can be used each time by taking everything from "BOSH_CLIENT=" through "bosh" and appending a BOSH command, e.g. vms
ubuntu@opsmanager-2-4:~$ BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=io2Rd9uaa6SoOnSefHXlR_gBHyj6fbu1 BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=172.14.0.3 bosh vms
Alternatively, for convenience, export the variables for the session (dropping the trailing bosh) and then just use bosh <command>
ubuntu@opsmanager-2-4:~$ export BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=io2Rd9uaa6SoOnSefHXlR_gBHyj6fbu1 BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=172.14.0.3
ubuntu@opsmanager-2-4:~$ bosh vms
Or, add the export command to a file and source it. Don't forget to delete the file once finished, as it's not best practice to leave a file with credentials lying around!
ubuntu@opsmanager-2-4:~$ vi pks-env
ubuntu@opsmanager-2-4:~$ source pks-env
ubuntu@opsmanager-2-4:~$ bosh vms
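For reference, the pks-env file contains nothing more than the export line from Ops Man, e.g.:
export BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=io2Rd9uaa6SoOnSefHXlR_gBHyj6fbu1 BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=172.14.0.3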
The above applies when issuing BOSH commands from the Ops Man VM. If using a jump host and managing several foundations/environments, you can configure a BOSH alias for each using bosh alias-env. Note, use --ca-cert to specify the path to the BOSH Director cert; if your certs are trusted via system-installed CA certs, there is no need to provide the --ca-cert option.
root@ubuntu-jump:~# bosh alias-env pks -e 10.0.80.3
root@ubuntu-jump:~# bosh alias-env pas -e 10.1.80.3
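If the Director cert is not trusted by the jump host, the same commands would include the cert path (the path below is just illustrative):
root@ubuntu-jump:~# bosh alias-env pks -e 10.0.80.3 --ca-cert /root/pks-root-ca.crt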
With the alias set, we need to log in. The credentials are retrieved from Ops Man > BOSH Director tile > Credentials tab > Director Credentials.
To log in, we use bosh -e <alias> login and the retrieved credentials.
root@ubuntu-jump:~# bosh -e pks login
Using environment '10.10.0.3'
Email (): director
Password ():
Successfully authenticated with UAA
Succeeded
When issuing bosh commands, you then specify the environment alias, e.g. bosh -e <alias> <command>
root@ubuntu-jump:~# bosh -e pks vms
Now that we can issue commands, let's have a look at a few I typically use when troubleshooting.
bosh cli options
bosh cli has the following options for each command. The one I use most often is -d / --deployment
Application Options:
  -v, --version            Show CLI version
      --config=            Config file path (default: ~/.bosh/config) [$BOSH_CONFIG]
  -e, --environment=       Director environment name or URL [$BOSH_ENVIRONMENT]
      --ca-cert=           Director CA certificate path or value [$BOSH_CA_CERT]
      --sha2               Use SHA256 checksums [$BOSH_SHA2]
      --parallel=          The max number of parallel operations (default: 5)
      --client=            Override username or UAA client [$BOSH_CLIENT]
      --client-secret=     Override password or UAA client secret [$BOSH_CLIENT_SECRET]
  -d, --deployment=        Deployment name [$BOSH_DEPLOYMENT]
      --column=            Filter to show only given column(s)
      --json               Output as JSON
      --tty                Force TTY-like output
      --no-color           Toggle colorized output
  -n, --non-interactive    Don't ask for user input [$BOSH_NON_INTERACTIVE]
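As a quick illustration, the --column option can trim the output down to just the columns you care about, e.g.:
$ bosh vms --column=Instance --column=IPs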
bosh deployments
https://bosh.io/docs/cli-v2#deployments
Lists deployments tracked by the Director. Shows their names, used releases and stemcells.
bosh [OPTIONS] deployments
$ bosh deployments
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Name Release(s) Stemcell(s) Team(s)
pivotal-container-service-8757185288bf36d3580b backup-and-restore-sdk/1.8.0 bosh-vsphere-esxi-ubuntu-xenial-go_agent/170.24 -
bosh-dns/1.10.0
bpm/0.13.0
cf-mysql/36.14.0
cfcr-etcd/1.8.0
docker/33.0.2
kubo/0.25.8
kubo-service-adapter/1.3.0-build.154
nsx-cf-cni/2.3.1.10693410
on-demand-service-broker/0.24.0
pks-api/1.3.0-build.154
pks-helpers/50.0.0
pks-nsx-t/1.19.0
pks-telemetry/2.0.0-build.113
pks-vrli/0.7.0
sink-resources-release/0.1.15
syslog/11.4.0
uaa/64.0
wavefront-proxy/0.9.0
service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5 bosh-dns/1.10.0 bosh-vsphere-esxi-ubuntu-xenial-go_agent/170.24 pivotal-container-service-8757185288bf36d3580b
bpm/0.13.0
cfcr-etcd/1.8.0
docker/33.0.2
kubo/0.25.8
nsx-cf-cni/2.3.1.10693410
pks-helpers/50.0.0
pks-nsx-t/1.19.0
pks-telemetry/2.0.0-build.113
pks-vrli/0.7.0
sink-resources-release/0.1.15
syslog/11.4.0
wavefront-proxy/0.9.0
2 deployments
Succeeded
Here we see "pivotal-container-service-8757185288bf36d3580b", which is the PKS API, and "service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5", which is a PKS K8s cluster. These are the deployment names that will be used in other BOSH commands.
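Tip: rather than typing -d with those long names every time, the deployment can also be exported for the session via the BOSH_DEPLOYMENT environment variable shown in the options list above:
$ export BOSH_DEPLOYMENT=service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5
$ bosh vms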
bosh vms
https://bosh.io/docs/cli-v2#vms
Lists all VMs managed by the Director. Shows instance names, IPs and VM CIDs.
bosh [OPTIONS] vms [vms-OPTIONS]
[vms command options]
      --dns                Show DNS A records
      --vitals             Show vitals
      --cloud-properties   Show cloud properties
$ bosh vms
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Task 160
Task 161. Done
Task 160 done
Deployment 'pivotal-container-service-8757185288bf36d3580b'
Instance Process State AZ IPs VM CID VM Type Active
pivotal-container-service/9b1a76d7-0fd6-4b21-a0ad-bb2fba1d25e5 running Mgmt-AZ 172.14.0.4 vm-adc19244-85bb-4e26-812a-76da6dc20783 large true
1 vms
Deployment 'service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5'
Instance Process State AZ IPs VM CID VM Type Active
master/529ba6a2-2cdf-491c-942b-2df671c9be2e running AZ1 172.15.0.2 vm-810ef6e9-a4a5-4d1c-9d6c-ded285dd52e8 medium.disk true
worker/2f59b6dc-bd11-4ce2-8018-49f061bb8d99 running AZ1 172.15.0.3 vm-556623ca-d18f-4a06-a6a7-322cf7c2775f medium.disk true
worker/4b53f264-43b5-48dc-87a6-a2379a285b72 running AZ2 172.15.0.4 vm-b96a690d-29de-472f-8596-c7f018e36f83 medium.disk true
worker/f7683ff7-da8c-44f2-bbf2-05bd1382f42d running AZ3 172.15.0.5 vm-7c8c4fc8-92e7-4c5f-8e65-ed453cabfdeb medium.disk true
4 vms
Succeeded
Lists all VMs managed by the Director in a single deployment. Shows instance names, IPs and VM CIDs.
bosh -d deployment vms
$ bosh -d service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5 vms
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Task 159. Done
Deployment 'service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5'
Instance Process State AZ IPs VM CID VM Type Active
master/529ba6a2-2cdf-491c-942b-2df671c9be2e running AZ1 172.15.0.2 vm-810ef6e9-a4a5-4d1c-9d6c-ded285dd52e8 medium.disk true
worker/2f59b6dc-bd11-4ce2-8018-49f061bb8d99 running AZ1 172.15.0.3 vm-556623ca-d18f-4a06-a6a7-322cf7c2775f medium.disk true
worker/4b53f264-43b5-48dc-87a6-a2379a285b72 running AZ2 172.15.0.4 vm-b96a690d-29de-472f-8596-c7f018e36f83 medium.disk true
worker/f7683ff7-da8c-44f2-bbf2-05bd1382f42d running AZ3 172.15.0.5 vm-7c8c4fc8-92e7-4c5f-8e65-ed453cabfdeb medium.disk true
4 vms
Succeeded
Load, CPU, memory, and disk stats per instance (VM).
bosh -d deployment vms --vitals
$ bosh -d service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5 vms --vitals
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Task 182. Done
Deployment 'service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5'
Instance Process State AZ IPs VM CID VM Type Active VM Created At Uptime Load CPU CPU CPU CPU Memory Swap System Ephemeral Persistent
(1m, 5m, 15m) Total User Sys Wait Usage Usage Disk Usage Disk Usage Disk Usage
master/529ba6a2-2cdf-491c-942b-2df671c9be2e running AZ1 172.15.0.2 vm-810ef6e9-a4a5-4d1c-9d6c-ded285dd52e8 medium.disk true Wed Mar 20 23:12:41 UTC 2019 8d 20h 35m 18s 0.69, 0.57, 0.45 - 4.1% 2.9% 0.2% 28% (1.1 GB) 0% (268 kB) 46% (33i%) 8% (2i%) 4% (0i%)
worker/2f59b6dc-bd11-4ce2-8018-49f061bb8d99 running AZ1 172.15.0.3 vm-556623ca-d18f-4a06-a6a7-322cf7c2775f medium.disk true Wed Mar 20 23:12:40 UTC 2019 8d 20h 35m 19s 0.24, 0.20, 0.13 - 3.4% 2.2% 0.1% 30% (1.2 GB) 0% (0 B) 46% (33i%) 12% (2i%) 19% (7i%)
worker/4b53f264-43b5-48dc-87a6-a2379a285b72 running AZ2 172.15.0.4 vm-b96a690d-29de-472f-8596-c7f018e36f83 medium.disk true Wed Mar 20 23:12:40 UTC 2019 8d 20h 35m 14s 0.09, 0.09, 0.08 - 5.2% 1.8% 0.1% 30% (1.2 GB) 0% (0 B) 46% (33i%) 12% (2i%) 19% (7i%)
worker/f7683ff7-da8c-44f2-bbf2-05bd1382f42d running AZ3 172.15.0.5 vm-7c8c4fc8-92e7-4c5f-8e65-ed453cabfdeb medium.disk true Wed Mar 20 23:12:41 UTC 2019 8d 20h 35m 12s 0.00, 0.00, 0.00 - 0.3% 0.3% 0.0% 10% (397 MB) 0% (0 B) 46% (33i%) 9% (1i%) -
4 vms
Succeeded
Show cloud properties for each instance.
bosh -d deployment vms --cloud-properties
$ bosh -d service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5 vms --cloud-properties
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Task 183. Done
Deployment 'service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5'
Instance Process State AZ IPs VM CID VM Type Active Cloud Properties
master/529ba6a2-2cdf-491c-942b-2df671c9be2e running AZ1 172.15.0.2 vm-810ef6e9-a4a5-4d1c-9d6c-ded285dd52e8 medium.disk true cpu: 2
datacenters:
- clusters:
- Compute-cluster:
resource_pool: AZ1
name: AZ1
disk: 32768
ram: 4096
worker/2f59b6dc-bd11-4ce2-8018-49f061bb8d99 running AZ1 172.15.0.3 vm-556623ca-d18f-4a06-a6a7-322cf7c2775f medium.disk true cpu: 2
datacenters:
- clusters:
- Compute-cluster:
resource_pool: AZ1
name: AZ1
disk: 32768
ram: 4096
vmx_options:
disk.enableUUID: "1"
worker/4b53f264-43b5-48dc-87a6-a2379a285b72 running AZ2 172.15.0.4 vm-b96a690d-29de-472f-8596-c7f018e36f83 medium.disk true cpu: 2
datacenters:
- clusters:
- Compute-cluster:
resource_pool: AZ2
name: AZ2
disk: 32768
ram: 4096
vmx_options:
disk.enableUUID: "1"
worker/f7683ff7-da8c-44f2-bbf2-05bd1382f42d running AZ3 172.15.0.5 vm-7c8c4fc8-92e7-4c5f-8e65-ed453cabfdeb medium.disk true cpu: 2
datacenters:
- clusters:
- Compute-cluster:
resource_pool: AZ3
name: AZ3
disk: 32768
ram: 4096
vmx_options:
disk.enableUUID: "1"
4 vms
Succeeded
bosh tasks
https://bosh.io/docs/cli-v2#tasks
Lists active and previously run tasks.
bosh [OPTIONS] tasks [tasks-OPTIONS]
[tasks command options]
  -r, --recent=            Number of tasks to show
  -a, --all                Include all task types (ssh, logs, vms, etc)
ubuntu@opsmanager-2-4:~$ bosh tasks
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
ID State Started At Last Activity At User Deployment Description Result
195 processing Fri Mar 29 23:13:37 UTC 2019 Fri Mar 29 23:13:37 UTC 2019 pivotal-container-service-8757185288bf36d3580b service-instance_6b0fd642-0d22-4711-a9cf-2b98364ee1df create deployment -
1 tasks
Succeeded
List the 30 most recent tasks
bosh tasks --recent
$ bosh tasks --recent
Using environment '10.10.0.3' as client 'ops_manager'
ID State Started At Last Activity At User Deployment Description Result
166 done Wed Mar 27 16:21:18 UTC 2019 Wed Mar 27 16:25:21 UTC 2019 pivotal-container-service-5c365f595dc7e6c897d6 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 run errand telemetry-agent from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 1 succeeded, 0 errored, 0 canceled
165 done Wed Mar 27 16:16:17 UTC 2019 Wed Mar 27 16:20:43 UTC 2019 pivotal-container-service-5c365f595dc7e6c897d6 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 run errand apply-addons from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 1 succeeded, 0 errored, 0 canceled
160 done Wed Mar 27 16:03:14 UTC 2019 Wed Mar 27 16:15:36 UTC 2019 pivotal-container-service-5c365f595dc7e6c897d6 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 create deployment /deployments/service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258
159 done Wed Mar 27 16:03:09 UTC 2019 Wed Mar 27 16:26:21 UTC 2019 ops_manager pivotal-container-service-5c365f595dc7e6c897d6 run errand upgrade-all-service-instances from deployment pivotal-container-service-5c365f595dc7e6c897d6 1 succeeded, 0 errored, 0 canceled
158 done Wed Mar 27 16:02:45 UTC 2019 Wed Mar 27 16:03:07 UTC 2019 ops_manager pivotal-container-service-5c365f595dc7e6c897d6 run errand pks-nsx-t-precheck from deployment pivotal-container-service-5c365f595dc7e6c897d6 1 succeeded, 0 errored, 0 canceled
157 done Wed Mar 27 15:59:23 UTC 2019 Wed Mar 27 16:02:43 UTC 2019 ops_manager pivotal-container-service-5c365f595dc7e6c897d6 create deployment /deployments/pivotal-container-service-5c365f595dc7e6c897d6
156 done Wed Mar 27 15:59:19 UTC 2019 Wed Mar 27 15:59:19 UTC 2019 ops_manager - create release Created release 'wavefront-proxy/0.9.0'
155 done Wed Mar 27 15:58:54 UTC 2019 Wed Mar 27 15:58:54 UTC 2019 ops_manager - create release Created release 'backup-and-restore-sdk/1.8.0'
154 done Wed Mar 27 15:58:17 UTC 2019 Wed Mar 27 15:58:17 UTC 2019 ops_manager - create release Created release 'bpm/0.13.0'
153 done Wed Mar 27 15:58:07 UTC 2019 Wed Mar 27 15:58:07 UTC 2019 ops_manager - create release Created release 'uaa/64.0'
152 done Wed Mar 27 15:57:48 UTC 2019 Wed Mar 27 15:57:48 UTC 2019 ops_manager - create release Created release 'pks-telemetry/2.0.0-build.140'
151 done Wed Mar 27 15:57:32 UTC 2019 Wed Mar 27 15:57:33 UTC 2019 ops_manager - create release Created release 'sink-resources-release/0.1.15.1'
150 done Wed Mar 27 15:57:18 UTC 2019 Wed Mar 27 15:57:18 UTC 2019 ops_manager - create release Created release 'syslog/11.4.0'
149 done Wed Mar 27 15:57:09 UTC 2019 Wed Mar 27 15:57:09 UTC 2019 ops_manager - create release Created release 'pks-vrli/0.7.0'
148 done Wed Mar 27 15:57:02 UTC 2019 Wed Mar 27 15:57:04 UTC 2019 ops_manager - create release Created release 'nsx-cf-cni/2.3.2.11695762'
147 done Wed Mar 27 15:56:50 UTC 2019 Wed Mar 27 15:56:51 UTC 2019 ops_manager - create release Created release 'pks-nsx-t/1.21.0'
146 done Wed Mar 27 15:56:41 UTC 2019 Wed Mar 27 15:56:41 UTC 2019 ops_manager - create release Created release 'pks-helpers/50.0.0'
145 done Wed Mar 27 15:56:40 UTC 2019 Wed Mar 27 15:56:40 UTC 2019 ops_manager - create release Created release 'pks-api/1.3.3-build.1'
144 done Wed Mar 27 15:56:26 UTC 2019 Wed Mar 27 15:56:26 UTC 2019 ops_manager - create release Created release 'on-demand-service-broker/0.24.0'
143 done Wed Mar 27 15:56:16 UTC 2019 Wed Mar 27 15:56:16 UTC 2019 ops_manager - create release Created release 'kubo-service-adapter/1.3.3-build.1'
142 done Wed Mar 27 15:56:10 UTC 2019 Wed Mar 27 15:56:10 UTC 2019 ops_manager - create release Created release 'cfcr-etcd/1.8.0'
141 done Wed Mar 27 15:55:59 UTC 2019 Wed Mar 27 15:55:59 UTC 2019 ops_manager - create release Created release 'kubo/0.25.11'
140 done Wed Mar 27 15:55:36 UTC 2019 Wed Mar 27 15:55:36 UTC 2019 ops_manager - create release Created release 'docker/35.1.0'
139 done Wed Mar 27 15:55:24 UTC 2019 Wed Mar 27 15:55:26 UTC 2019 ops_manager - create release Created release 'cf-mysql/36.14.0'
138 done Wed Mar 27 15:54:39 UTC 2019 Wed Mar 27 15:54:40 UTC 2019 ops_manager - create release Created release 'pks-helpers/50.0.0'
137 done Wed Mar 27 15:54:32 UTC 2019 Wed Mar 27 15:54:32 UTC 2019 ops_manager - create release Created release 'syslog/11.4.0'
136 done Wed Mar 27 15:54:25 UTC 2019 Wed Mar 27 15:54:25 UTC 2019 ops_manager - create release Created release 'bosh-dns/1.10.0'
135 done Wed Mar 27 15:42:30 UTC 2019 Wed Mar 27 15:46:26 UTC 2019 pivotal-container-service-5c365f595dc7e6c897d6 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 run errand telemetry-agent from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 1 succeeded, 0 errored, 0 canceled
134 done Wed Mar 27 15:37:38 UTC 2019 Wed Mar 27 15:42:28 UTC 2019 pivotal-container-service-5c365f595dc7e6c897d6 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 run errand apply-addons from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 1 succeeded, 0 errored, 0 canceled
30 tasks
Succeeded
List a given number of recent tasks (ts is the built-in alias for tasks)
bosh tasks --recent=<num>
$ bosh ts -r=5
Using environment '10.10.0.3' as client 'ops_manager'
ID State Started At Last Activity At User Deployment Description Result
166 done Wed Mar 27 16:21:18 UTC 2019 Wed Mar 27 16:25:21 UTC 2019 pivotal-container-service-5c365f595dc7e6c897d6 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 run errand telemetry-agent from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 1 succeeded, 0 errored, 0 canceled
165 done Wed Mar 27 16:16:17 UTC 2019 Wed Mar 27 16:20:43 UTC 2019 pivotal-container-service-5c365f595dc7e6c897d6 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 run errand apply-addons from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 1 succeeded, 0 errored, 0 canceled
160 done Wed Mar 27 16:03:14 UTC 2019 Wed Mar 27 16:15:36 UTC 2019 pivotal-container-service-5c365f595dc7e6c897d6 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 create deployment /deployments/service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258
159 done Wed Mar 27 16:03:09 UTC 2019 Wed Mar 27 16:26:21 UTC 2019 ops_manager pivotal-container-service-5c365f595dc7e6c897d6 run errand upgrade-all-service-instances from deployment pivotal-container-service-5c365f595dc7e6c897d6 1 succeeded, 0 errored, 0 canceled
158 done Wed Mar 27 16:02:45 UTC 2019 Wed Mar 27 16:03:07 UTC 2019 ops_manager pivotal-container-service-5c365f595dc7e6c897d6 run errand pks-nsx-t-precheck from deployment pivotal-container-service-5c365f595dc7e6c897d6 1 succeeded, 0 errored, 0 canceled
5 tasks
Succeeded
bosh task
https://bosh.io/docs/cli-v2#task
Show details of a single task. Continues to follow task if it is still running.
bosh [OPTIONS] task [task-OPTIONS] [ID]
[task command options]
      --event              Track event log
      --cpi                Track CPI log
      --debug              Track debug log
      --result             Track result log
  -a, --all                Include all task types (ssh, logs, vms, etc)
$ bosh task 128
Using environment '10.10.0.3' as client 'ops_manager'
Task 128
Task 128 | 15:09:19 | Preparing deployment: Preparing deployment
Task 128 | 15:09:21 | Warning: DNS address not available for the link provider instance: pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f
Task 128 | 15:09:22 | Warning: DNS address not available for the link provider instance: pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f
Task 128 | 15:09:22 | Warning: DNS address not available for the link provider instance: pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f
Task 128 | 15:09:38 | Preparing deployment: Preparing deployment (00:00:19)
Task 128 | 15:09:51 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 128 | 15:09:51 | Creating missing vms: master/320749e3-aa1e-4710-a200-7beb67762c8a (0)
Task 128 | 15:09:51 | Creating missing vms: worker/820e841a-e121-437a-823b-1642468e4ae9 (0)
Task 128 | 15:09:51 | Creating missing vms: worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1)
Task 128 | 15:09:51 | Creating missing vms: worker/50384028-2098-4579-bb6e-5dfa5f36053c (2)
Task 128 | 15:11:59 | Creating missing vms: master/320749e3-aa1e-4710-a200-7beb67762c8a (0) (00:02:08)
Task 128 | 15:12:00 | Creating missing vms: worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1) (00:02:09)
Task 128 | 15:12:02 | Creating missing vms: worker/820e841a-e121-437a-823b-1642468e4ae9 (0) (00:02:11)
Task 128 | 15:12:09 | Creating missing vms: worker/50384028-2098-4579-bb6e-5dfa5f36053c (2) (00:02:18)
Task 128 | 15:12:11 | Updating instance master: master/320749e3-aa1e-4710-a200-7beb67762c8a (0) (canary) (00:02:48)
Task 128 | 15:14:59 | Updating instance worker: worker/820e841a-e121-437a-823b-1642468e4ae9 (0) (canary) (00:05:54)
Task 128 | 15:20:53 | Updating instance worker: worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1) (00:08:31)
Task 128 | 15:29:24 | Updating instance worker: worker/50384028-2098-4579-bb6e-5dfa5f36053c (2) (00:08:04)
Task 128 Started Wed Mar 27 15:09:19 UTC 2019
Task 128 Finished Wed Mar 27 15:37:28 UTC 2019
Task 128 Duration 00:28:09
Task 128 done
Succeeded
Retrieve a task's debug logs.
bosh task <ID> --debug
I tend to redirect this to a file as it can be quite large, e.g. 2 MB+ (t is the alias for task)
$ bosh t 157 --debug > task_157_debug.txt
Below is an excerpt of a task debug log
I, [2019-03-27T15:59:22.451689 #25] [0x2ab99f506418] INFO -- TaskHelper: Director Version: 268.2.2
I, [2019-03-27T15:59:22.451757 #25] [0x2ab99f506418] INFO -- TaskHelper: Enqueuing task: 157
I, [2019-03-27T15:59:23.341512 #32742] [] INFO -- DirectorJobRunner: Looking for task with task id 157
D, [2019-03-27T15:59:23.343843 #32742] [] DEBUG -- DirectorJobRunner: (0.000370s) (conn: 47254693400660) SELECT * FROM "tasks" WHERE "id" = 157
I, [2019-03-27T15:59:23.346548 #32742] [] INFO -- DirectorJobRunner: Found task #<Bosh::Director::Models::Task @values={:id=>157, :state=>"processing", :timestamp=
>2019-03-27 15:59:22 UTC, :description=>"create deployment", :result=>nil, :output=>"/var/vcap/store/director/tasks/157", :checkpoint_time=>2019-03-27 15:59:22 UTC,
:type=>"update_deployment", :username=>"ops_manager", :deployment_name=>"pivotal-container-service-5c365f595dc7e6c897d6", :started_at=>nil, :event_output=>"", :res
ult_output=>"", :context_id=>""}>
I, [2019-03-27T15:59:23.346696 #32742] [] INFO -- DirectorJobRunner: Running from worker 'worker_3' on director/b40c1629-8572-46ea-7680-af70833300c4 (127.0.0.1)
I, [2019-03-27T15:59:23.346769 #32742] [] INFO -- DirectorJobRunner: Starting task: 157
I, [2019-03-27T15:59:23.346995 #32742] [task:157] INFO -- DirectorJobRunner: Creating job
D, [2019-03-27T15:59:23.349534 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.000405s) (conn: 47254693400660) SELECT * FROM "tasks" WHERE "id" = 157
D, [2019-03-27T15:59:23.351056 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.000213s) (conn: 47254693400660) SELECT * FROM "tasks" WHERE "id" = 157
I, [2019-03-27T15:59:23.353690 #32742] [task:157] INFO -- DirectorJobRunner: Performing task: #<Bosh::Director::Models::Task @values={:id=>157, :state=>"processing
", :timestamp=>2019-03-27 15:59:22 UTC, :description=>"create deployment", :result=>nil, :output=>"/var/vcap/store/director/tasks/157", :checkpoint_time=>2019-03-27
15:59:22 UTC, :type=>"update_deployment", :username=>"ops_manager", :deployment_name=>"pivotal-container-service-5c365f595dc7e6c897d6", :started_at=>nil, :event_ou
tput=>"", :result_output=>"", :context_id=>""}>
D, [2019-03-27T15:59:23.356326 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.000095s) (conn: 47254693400660) BEGIN
D, [2019-03-27T15:59:23.360350 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.000758s) (conn: 47254693400660) UPDATE "tasks" SET "state" = 'processing', "timesta
mp" = '2019-03-27 15:59:23.353831+0000', "description" = 'create deployment', "result" = NULL, "output" = '/var/vcap/store/director/tasks/157', "checkpoint_time" =
'2019-03-27 15:59:23.354810+0000', "type" = 'update_deployment', "username" = 'ops_manager', "deployment_name" = 'pivotal-container-service-5c365f595dc7e6c897d6', "
started_at" = '2019-03-27 15:59:23.354518+0000', "event_output" = '', "result_output" = '', "context_id" = '' WHERE ("id" = 157)
D, [2019-03-27T15:59:23.366628 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.005804s) (conn: 47254693400660) COMMIT
I, [2019-03-27T15:59:23.366932 #32742] [task:157] INFO -- DirectorJobRunner: Reading deployment manifest
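With files that size, I usually start by grepping for errors rather than reading top to bottom, e.g.:
$ grep -iE 'error|fail' task_157_debug.txt | less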
Likewise for CPI (Cloud Provider Interface) logs.
bosh task <ID> --cpi
Again, I redirect them to a file.
$ bosh t 26 --cpi > task_26_cpi.txt
Below is an excerpt of a task CPI log
I, [2019-03-15T14:40:26.108211 #15220] INFO -- [req_id cpi-909205]: Starting create_vm...
D, [2019-03-15T14:40:26.110953 #15220] DEBUG -- [req_id cpi-909205]: Running method 'RetrieveServiceContent'...
D, [2019-03-15T14:40:26.196337 #15220] DEBUG -- [req_id cpi-909205]: Running method 'Login'...
D, [2019-03-15T14:40:26.255184 #15220] DEBUG -- [req_id cpi-909205]: Running method 'FindByInventoryPath'...
D, [2019-03-15T14:40:26.265972 #15220] DEBUG -- [req_id cpi-909205]: Running method 'CreateContainerView'...
D, [2019-03-15T14:40:26.305712 #15220] DEBUG -- [req_id cpi-909205]: All clusters provided: {"SA-Management-Edge"=>#<VSphereCloud::ClusterConfig:0x000055bfa94b7a30
@name="SA-Management-Edge", @config={"resource_pool"=>"AZ-MGMT"}>, "SA-Compute-01"=>#<VSphereCloud::ClusterConfig:0x000055bfa94b3d18 @name="SA-Compute-01", @config=
{"resource_pool"=>"AZ-COMP-3"}>}
D, [2019-03-15T14:40:26.305879 #15220] DEBUG -- [req_id cpi-909205]: Running method 'FindByInventoryPath'...
I, [2019-03-15T14:40:26.365699 #15220] INFO -- [req_id cpi-909205]: Using global ephemeral disk datastore pattern: ^(SA\-Shared\-02\-Remote)$
I, [2019-03-15T14:40:26.379263 #15220] INFO -- [req_id cpi-909205]: Initiating VM Allocator with VM Placement Criteria: Disk Config: [Disk ==> Disk CID: Size:16384
Ephemeral:true Target-DS:^(SA\-Shared\-02\-Remote)$ Existing-DS:] Req Memory: 4096 Mem Headroom: 128
I, [2019-03-15T14:40:26.379427 #15220] INFO -- [req_id cpi-909205]: Gathering vm placement resources for vm placement allocator pipeline
D, [2019-03-15T14:40:26.434012 #15220] DEBUG -- [req_id cpi-909205]: Found requested resource pool: <[Vim.ResourcePool] resgroup-401>
D, [2019-03-15T14:40:26.451823 #15220] DEBUG -- [req_id cpi-909205]: Filter VM Placement Cluster: SA-Management-Edge Memory: 59442 for free memory required: 4224
D, [2019-03-15T14:40:26.452066 #15220] DEBUG -- [req_id cpi-909205]: Filter VM Placement Cluster : SA-Management-Edge hosts: datastores: [<Datastore: / SA-ESXi-01-
Local>, <Datastore: / SA-Shared-02-Remote>, <Datastore: / SA-ESXi-02-Local>, <Datastore: / SA-ESXi-03-Local>, <Datastore: / SA-Shared-01-Remote>] for combination of
DS satisfying disk configurations
D, [2019-03-15T14:40:26.452192 #15220] DEBUG -- [req_id cpi-909205]: Trying to find placement for Disk ==> Disk CID: Size:16384 Ephemeral:true Target-DS:^(SA\-Share
d\-02\-Remote)$ Existing-DS:
I, [2019-03-15T14:40:26.452420 #15220] INFO -- [req_id cpi-909205]: Initiating Disk Allocator with Storage Criteria: Space Req: 16384 Target Pattern: ^(SA\-Shared\
-02\-Remote)$
Existing-DS Pattern:
D, [2019-03-15T14:40:26.555554 #15220] DEBUG -- [req_id cpi-909205]: Found <Datastore: / SA-Shared-02-Remote> for Disk ==> Disk CID: Size:16384 Ephemeral:true Targe
t-DS:^(SA\-Shared\-02\-Remote)$ Existing-DS:
D, [2019-03-15T14:40:26.555788 #15220] DEBUG -- [req_id cpi-909205]: Initial Storage Options for creating ephemeral disk from pattern: ["SA-Shared-02-Remote"]
D, [2019-03-15T14:40:26.555872 #15220] DEBUG -- [req_id cpi-909205]: Storage Options for creating ephemeral disk are: ["SA-Shared-02-Remote"]
I, [2019-03-15T14:40:26.556014 #15220] INFO -- [req_id cpi-909205]: Checking if ip '10.10.0.5' is in use
D, [2019-03-15T14:40:26.556117 #15220] DEBUG -- [req_id cpi-909205]: Running method 'FindByIp'...
I, [2019-03-15T14:40:26.564608 #15220] INFO -- [req_id cpi-909205]: No IP conflicts detected
I, [2019-03-15T14:40:26.565046 #15220] INFO -- [req_id cpi-909205]: Creating vm: vm-884de320-42fa-4af5-ad61-f6fa19a554f9 on <[Vim.ClusterComputeResource] domain-c2
6> stored in <[Vim.Datastore] datastore-421>
bosh instances
https://bosh.io/docs/cli-v2#instances
List all instances in a deployment.
bosh [OPTIONS] instances [instances-OPTIONS]
[instances command options]
  -i, --details            Show details including VM CID, persistent disk CID, etc.
      --dns                Show DNS A records
      --vitals             Show vitals
  -p, --ps                 Show processes
  -f, --failing            Only show failing instances
The instance command options --details and --ps are the ones I typically use.
bosh instances --details
$ bosh instances --details
Using environment '10.10.0.3' as client 'ops_manager'
Task 186
Task 185
Task 186 done
Task 185 done
Deployment 'pivotal-container-service-5c365f595dc7e6c897d6'
Instance Process State AZ IPs State VM CID VM Type Disk CIDs Agent ID Index Resurrection Bootstrap Ignore
Paused
pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f running AZ-MGMT 10.10.0.4 started vm-9ab49a58-6f4e-4cb7-9b30-19d5cc5e5ada large disk-d0ed65d5-ed25-4bbf-adf9-baf5fe3d92d9 fe77c7f7-dfb1-4c54-91a0-33376dc1d6c2 0 false true false
1 instances
Deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'
Instance Process State AZ IPs State VM CID VM Type Disk CIDs Agent ID Index Resurrection Bootstrap Ignore
Paused
apply-addons/8cd3b226-5327-46ee-9e98-0fa2a900c857 - AZ-COMP-1 - started - micro - - 0 false true false
master/320749e3-aa1e-4710-a200-7beb67762c8a running AZ-COMP-1 10.20.0.2 started vm-02ded866-05fe-46ae-a543-0d92ce07a953 medium.disk disk-0f0d3195-e26a-45ba-ac2c-605fefa682ec f5451f4d-b044-4bc5-bf1f-fe5ca553997a 0 false true false
worker/50384028-2098-4579-bb6e-5dfa5f36053c running AZ-COMP-3 10.20.0.5 started vm-b72f5307-4c4d-48ca-b4ba-f4c8ee8b20cf medium.disk disk-95291f7c-435d-456f-a27e-9c0e105bd003 1d649278-46e0-4786-83e8-745f47128224 2 false false false
worker/820e841a-e121-437a-823b-1642468e4ae9 running AZ-COMP-1 10.20.0.3 started vm-0cab1c50-6bca-43c0-a92a-8ab89bfb49fd medium.disk disk-9eb21d95-ca43-43e2-b23b-95a206924741 5abc24ef-d611-4573-a6c4-d453df7463d1 0 false true false
worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 running AZ-COMP-2 10.20.0.4 started vm-d95600f8-bb70-4e05-b02f-b0af042368ff medium.disk disk-b404dd6f-25d5-4019-9aac-ba3e1cfc2d02 b187b7d8-0ef3-45f6-911b-a7c11d72f950 1 false false false
5 instances
Succeeded
bosh instances --ps
Shows all the processes for each instance. This information can also be retrieved directly on the instance using the monit summary command (covered later).
$ bosh instances --ps
Using environment '10.10.0.3' as client 'ops_manager'
Task 190
Task 189
Task 190 done
Task 189 done
Deployment 'pivotal-container-service-5c365f595dc7e6c897d6'
Instance Process Process State AZ IPs
pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f - running AZ-MGMT 10.10.0.4
~ bosh-dns running - -
~ bosh-dns-healthcheck running - -
~ bosh-dns-resolvconf running - -
~ broker running - -
~ cluster_health_logger running - -
~ galera-healthcheck running - -
~ gra-log-purger-executable running - -
~ mariadb_ctrl running - -
~ pks-api running - -
~ pks-nsx-t-osb-proxy running - -
~ telemetry-server running - -
~ uaa running - -
1 instances
Deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'
Instance Process Process State AZ IPs
apply-addons/8cd3b226-5327-46ee-9e98-0fa2a900c857 - - AZ-COMP-1 -
master/320749e3-aa1e-4710-a200-7beb67762c8a - running AZ-COMP-1 10.20.0.2
~ blackbox running - -
~ bosh-dns running - -
~ bosh-dns-healthcheck running - -
~ bosh-dns-resolvconf running - -
~ etcd running - -
~ kube-apiserver running - -
~ kube-controller-manager running - -
~ kube-scheduler running - -
~ ncp running - -
~ pks-helpers-bosh-dns-resolvconf running - -
worker/50384028-2098-4579-bb6e-5dfa5f36053c - running AZ-COMP-3 10.20.0.5
~ blackbox running - -
~ bosh-dns running - -
~ bosh-dns-healthcheck running - -
~ bosh-dns-resolvconf running - -
~ docker running - -
~ kube-proxy running - -
~ kubelet running - -
~ nsx-kube-proxy running - -
~ nsx-node-agent running - -
~ ovs-vswitchd running - -
~ ovsdb-server running - -
~ pks-helpers-bosh-dns-resolvconf running - -
worker/820e841a-e121-437a-823b-1642468e4ae9 - running AZ-COMP-1 10.20.0.3
~ blackbox running - -
~ bosh-dns running - -
~ bosh-dns-healthcheck running - -
~ bosh-dns-resolvconf running - -
~ docker running - -
~ kube-proxy running - -
~ kubelet running - -
~ nsx-kube-proxy running - -
~ nsx-node-agent running - -
~ ovs-vswitchd running - -
~ ovsdb-server running - -
~ pks-helpers-bosh-dns-resolvconf running - -
worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 - running AZ-COMP-2 10.20.0.4
~ blackbox running - -
~ bosh-dns running - -
~ bosh-dns-healthcheck running - -
~ bosh-dns-resolvconf running - -
~ docker running - -
~ kube-proxy running - -
~ kubelet running - -
~ nsx-kube-proxy running - -
~ nsx-node-agent running - -
~ ovs-vswitchd running - -
~ ovsdb-server running - -
~ pks-helpers-bosh-dns-resolvconf running - -
5 instances
Succeeded
bosh cloud-check
https://bosh.io/docs/cli-v2#cloud-check
https://bosh.io/docs/cck/
BOSH provides the Cloud Check CLI command (a.k.a. cck) to repair IaaS resources used by a specific deployment. It is not commonly used during normal operations; however, it becomes essential when some IaaS operations fail and the Director cannot resolve problems without a human decision or when the Resurrector is not enabled.
The Resurrector will only try to recover any VMs that are missing from the IaaS or that have non-responsive agents. The cck tool is similar to the Resurrector in that it also looks for those two conditions; however, instead of automatically trying to resolve these problems, it provides several options to the operator.
In addition to looking for those two types of problems, cck also checks correct attachment and presence of persistent disks for each deployment job instance.
bosh [OPTIONS] cloud-check [cloud-check-OPTIONS]
[cloud-check command options]
  -a, --auto               Resolve problems automatically
      --resolution=        Apply resolution of given type
  -r, --report             Only generate report; don't attempt to resolve problems
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 cck
Using environment '10.10.0.3' as client 'ops_manager'
Using deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'
Task 203
Task 203 | 18:44:14 | Scanning 4 VMs: Checking VM states (00:00:07)
Task 203 | 18:44:21 | Scanning 4 VMs: 4 OK, 0 unresponsive, 0 missing, 0 unbound (00:00:00)
Task 203 | 18:44:21 | Scanning 4 persistent disks: Looking for inactive disks (00:00:35)
Task 203 | 18:44:56 | Scanning 4 persistent disks: 4 OK, 0 missing, 0 inactive, 0 mount-info mismatch (00:00:00)
Task 203 Started Sun Mar 31 18:44:14 UTC 2019
Task 203 Finished Sun Mar 31 18:44:56 UTC 2019
Task 203 Duration 00:00:42
Task 203 done
# Type Description
0 problems
Succeeded
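When cck does find problems, it prompts for a resolution per problem. You can also just generate a report without resolving anything, or let it apply resolutions automatically:
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 cck --report
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 cck --auto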
bosh events
https://bosh.io/docs/cli-v2#events
In addition to keeping a historical list of Director tasks for debugging, the Director keeps a detailed list of actions the user and the system took during its operation.
bosh [OPTIONS] events [events-OPTIONS]
[events command options]
      --before-id=         Show events with ID less than the given ID
      --before=            Show events before the given timestamp (ex: 2016-05-08 17:26:32)
      --after=             Show events after the given timestamp (ex: 2016-05-08 17:26:32)
      --task=              Show events with the given task ID
      --instance=          Show events with given instance
      --event-user=        Show events with given user
      --action=            Show events with given action
      --object-type=       Show events with given object type
      --object-name=       Show events with given object name
Below is an excerpt, as by default 200 events are returned.
$ bosh events
1220 Wed Mar 27 16:26:21 UTC 2019 ops_manager release lock lock:deployment:pivotal-container-service-5c365f595dc7e6c897d6 159 pivotal-container-service-5c365f595dc7e6c897d6 - - -
1219 <- 1047 Wed Mar 27 16:26:21 UTC 2019 ops_manager run errand upgrade-all-service-instances 159 pivotal-container-service-5c365f595dc7e6c897d6 pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f (0) exit_code: 0 -
1218 Wed Mar 27 16:26:06 UTC 2019 health_monitor release lock lock:deployment:service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 167 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 - - -
1217 Wed Mar 27 16:26:05 UTC 2019 health_monitor acquire lock lock:deployment:service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 167 service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 - - -
1216 Wed Mar 27 16:26:05 UTC 2019 health_monitor create alert dd88d138-1a48-45b5-a02a-2ad54e54cc2c - service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 - message: 'Recreating unresponsive VM. Alert @ 2019-03-27 16:26:05 UTC, severity 4: -
Notifying Director to recreate instance: ''apply-addons/8cd3b226-5327-46ee-9e98-0fa2a900c857'';
deployment: ''service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258''; 1 of 5 agents
are unhealthy (20.0%)'
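The filter options are handy for narrowing things down, for example all events for a given task or instance:
$ bosh events --task=159
$ bosh events --instance=master/320749e3-aa1e-4710-a200-7beb67762c8a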
bosh ssh
https://bosh.io/docs/cli-v2#ssh
SSH into instance(s).
bosh [OPTIONS] ssh [ssh-OPTIONS] [INSTANCE-GROUP[/INSTANCE-ID]]
[ssh command options]
  -c, --command=           Command
      --opts=              Options to pass through to SSH
  -r, --results            Collect results into a table instead of streaming
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 ssh master
Using environment '10.10.0.3' as client 'ops_manager'
Using deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'
Task 213. Done
Unauthorized use is strictly prohibited. All access and activity
is subject to logging and monitoring.
Welcome to Ubuntu 16.04.5 LTS (GNU/Linux 4.15.0-43-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
Last login: Sun Mar 31 19:12:39 2019 from 10.10.0.2
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
master/320749e3-aa1e-4710-a200-7beb67762c8a:~$
monit
Monit is not a bosh command, but it's best mentioned here as it's typically run when bosh ssh'd onto an instance. On any BOSH-managed VM, you can access Monit status for release jobs' processes via the Monit CLI. Before you can run the command, you have to become the root user (via sudo su), since the Monit executable is only available to root users.
Each enabled release job has its own directory under /var/vcap/jobs/. Each release job directory contains a monit file (e.g. /var/vcap/jobs/redis-server/monit) with the final monit configuration for that release job. This is how you can tell which processes belong to which release job. Most release jobs only start a single process.
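For example, on a master node you can list the release jobs and inspect a monit file (kube-apiserver shown as an illustrative job name):
master/320749e3-aa1e-4710-a200-7beb67762c8a:~# ls /var/vcap/jobs/
master/320749e3-aa1e-4710-a200-7beb67762c8a:~# cat /var/vcap/jobs/kube-apiserver/monit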
You can also get more detailed information about individual processes via monit status.
While debugging why a certain process is failing, it is usually useful to tell Monit to stop restarting the failing process. You can do so via the monit stop <process-name> command. To start it back up, use monit start <process-name>.
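For example, to stop Monit restarting kube-apiserver while you investigate, and then resume it:
master/320749e3-aa1e-4710-a200-7beb67762c8a:~# monit stop kube-apiserver
master/320749e3-aa1e-4710-a200-7beb67762c8a:~# monit start kube-apiserver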
monit summary
master/320749e3-aa1e-4710-a200-7beb67762c8a:~$ sudo su
master/320749e3-aa1e-4710-a200-7beb67762c8a:/var/vcap/bosh_ssh/bosh_94f00b3b4a94422# monit summary
The Monit daemon 5.2.5 uptime: 4d 4h 3m
Process 'kube-apiserver' running
Process 'kube-controller-manager' running
Process 'kube-scheduler' running
Process 'etcd' running
Process 'blackbox' running
Process 'ncp' running
Process 'bosh-dns' running
Process 'bosh-dns-resolvconf' running
Process 'bosh-dns-healthcheck' running
Process 'pks-helpers-bosh-dns-resolvconf' running
System 'system_localhost' running
monit status
master/320749e3-aa1e-4710-a200-7beb67762c8a:/var/vcap/bosh_ssh/bosh_94f00b3b4a94422# monit status
The Monit daemon 5.2.5 uptime: 4d 4h 5m
Process 'kube-apiserver'
status running
monitoring status monitored
pid 9836
parent pid 1
uptime 4d 4h 5m
children 0
memory kilobytes 548300
memory kilobytes total 548300
memory percent 13.5%
memory percent total 13.5%
cpu percent 2.4%
cpu percent total 2.4%
data collected Sun Mar 31 19:19:59 2019
Process 'kube-controller-manager'
status running
monitoring status monitored
pid 9902
parent pid 1
uptime 4d 4h 5m
children 0
memory kilobytes 106600
memory kilobytes total 106600
memory percent 2.6%
memory percent total 2.6%
cpu percent 2.4%
cpu percent total 2.4%
data collected Sun Mar 31 19:19:59 2019
Process 'kube-scheduler'
status running
monitoring status monitored
pid 9958
parent pid 1
uptime 4d 4h 5m
children 0
memory kilobytes 36072
memory kilobytes total 36072
memory percent 0.8%
memory percent total 0.8%
cpu percent 0.4%
cpu percent total 0.4%
data collected Sun Mar 31 19:19:59 2019
Process 'etcd'
status running
monitoring status monitored
pid 10016
parent pid 1
uptime 4d 4h 5m
children 1
memory kilobytes 3212
memory kilobytes total 202712
memory percent 0.0%
memory percent total 5.0%
cpu percent 0.0%
cpu percent total 0.9%
data collected Sun Mar 31 19:19:59 2019
Process 'blackbox'
status running
monitoring status monitored
pid 10051
parent pid 1
uptime 4d 4h 5m
children 0
memory kilobytes 14028
memory kilobytes total 14028
memory percent 0.3%
memory percent total 0.3%
cpu percent 0.4%
cpu percent total 0.4%
data collected Sun Mar 31 19:19:59 2019
Process 'ncp'
status running
monitoring status monitored
pid 10129
parent pid 1
uptime 4d 4h 5m
children 2
memory kilobytes 2776
memory kilobytes total 73748
memory percent 0.0%
memory percent total 1.8%
cpu percent 0.0%
cpu percent total 0.9%
data collected Sun Mar 31 19:19:59 2019
Process 'bosh-dns'
status running
monitoring status monitored
pid 9489
parent pid 1
uptime 4d 4h 5m
children 0
memory kilobytes 24636
memory kilobytes total 24636
memory percent 0.6%
memory percent total 0.6%
cpu percent 0.4%
cpu percent total 0.4%
data collected Sun Mar 31 19:19:59 2019
Process 'bosh-dns-resolvconf'
status running
monitoring status monitored
pid 9384
parent pid 1
uptime 4d 4h 5m
children 0
memory kilobytes 3760
memory kilobytes total 3760
memory percent 0.0%
memory percent total 0.0%
cpu percent 0.0%
cpu percent total 0.0%
data collected Sun Mar 31 19:19:59 2019
Process 'bosh-dns-healthcheck'
status running
monitoring status monitored
pid 9405
parent pid 1
uptime 4d 4h 5m
children 0
memory kilobytes 9740
memory kilobytes total 9740
memory percent 0.2%
memory percent total 0.2%
cpu percent 0.0%
cpu percent total 0.0%
data collected Sun Mar 31 19:19:59 2019
Process 'pks-helpers-bosh-dns-resolvconf'
status running
monitoring status monitored
pid 9384
parent pid 1
uptime 4d 4h 5m
children 0
memory kilobytes 3760
memory kilobytes total 3760
memory percent 0.0%
memory percent total 0.0%
cpu percent 0.0%
cpu percent total 0.0%
data collected Sun Mar 31 19:19:59 2019
System 'system_localhost'
status running
monitoring status monitored
load average [0.27] [0.44] [0.43]
cpu 3.9%us 4.4%sy 0.7%wa
memory usage 1107804 kB [27.4%]
swap usage 524 kB [0.0%]
data collected Sun Mar 31 19:19:59 2019
bosh clean-up
https://bosh.io/docs/cli-v2#clean-up
Removes all unused releases, stemcells, etc. Without --all, the most recent resources are kept.
bosh [OPTIONS] clean-up --all
$ bosh clean-up --all
Using environment '10.10.0.3' as client 'ops_manager'
Continue? [yN]: y
Task 215
Task 215 | 19:25:50 | Deleting releases: docker/35.0.0
Task 215 | 19:25:50 | Deleting releases: pks-nsx-t/1.19.0
Task 215 | 19:25:50 | Deleting releases: nsx-cf-cni/2.3.1.10693410
Task 215 | 19:25:50 | Deleting releases: pks-telemetry/2.0.0-build.113
Task 215 | 19:25:50 | Deleting releases: sink-resources-release/0.1.15
Task 215 | 19:25:50 | Deleting releases: kubo/0.25.9
Task 215 | 19:25:50 | Deleting packages: pks-nsx-t-cli/f8541c3ab3a3ddd8720ae4337cfda9d72fd9e9f2
Task 215 | 19:25:50 | Deleting jobs: docker/9106f2911a920980d955bda7bcdd42996e6a9b4c48374c293dff034caf2a1892
Task 215 | 19:25:50 | Deleting packages: sink-agent/d093297b1d5a3cadefaea61eee38ef924a9cf429
Task 215 | 19:25:50 | Deleting packages: cni/5502f221857b523ce73c29c1de24135f6d883b18
Task 215 | 19:25:50 | Deleting packages: billing-recorder/c4a73cafaedb4598eeaa302be97eb2c8f763bee7
Task 215 | 19:25:50 | Deleting jobs: docker/9106f2911a920980d955bda7bcdd42996e6a9b4c48374c293dff034caf2a1892 (00:00:00)
Task 215 | 19:25:50 | Deleting packages: pks-nsx-t-cli/f8541c3ab3a3ddd8720ae4337cfda9d72fd9e9f2 (00:00:00)
Task 215 | 19:25:50 | Deleting packages: cni/5502f221857b523ce73c29c1de24135f6d883b18 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: pks-nsx-t-ncp-cleanup/901e0de4fdb24bcf248b60a8bc8df8a8aed36ca5
Task 215 | 19:25:51 | Deleting packages: sink-agent/d093297b1d5a3cadefaea61eee38ef924a9cf429 (00:00:01)
Task 215 | 19:25:51 | Deleting packages: ncp_rootfs/1cf38e69b6a232870627e9b237df1521d27ff630
Task 215 | 19:25:51 | Deleting packages: billing-recorder/c4a73cafaedb4598eeaa302be97eb2c8f763bee7 (00:00:01)
Task 215 | 19:25:51 | Deleting releases: sink-resources-release/0.1.15 (00:00:01)
Task 215 | 19:25:51 | Deleting packages: kubernetes/6d07c3c855858410b4d50cde2dcb216f460387e9aa8258780bdd961934b2b257
Task 215 | 19:25:51 | Deleting releases: docker/35.0.0 (00:00:01)
Task 215 | 19:25:51 | Deleting packages: send-crd/cef4ac64394b103b8512ccc6396178d8f55a5108
Task 215 | 19:25:51 | Deleting packages: pks-nsx-t-ncp-cleanup/901e0de4fdb24bcf248b60a8bc8df8a8aed36ca5 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: pks-nsx-t-osb-proxy/b881c15160571d6b22e30df88729307904af3ffe
Task 215 | 19:25:51 | Deleting packages: ncp_rootfs/1cf38e69b6a232870627e9b237df1521d27ff630 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: nsx-cni/b2d11aeb068c1f8b11093746041503521a4f5645
Task 215 | 19:25:51 | Deleting packages: kubernetes/6d07c3c855858410b4d50cde2dcb216f460387e9aa8258780bdd961934b2b257 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: send-crd/cef4ac64394b103b8512ccc6396178d8f55a5108 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: send-to-vac/075751c4f6e1e9de76ad12558743d681300008c9
Task 215 | 19:25:51 | Deleting packages: pks-nsx-t-osb-proxy/b881c15160571d6b22e30df88729307904af3ffe (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: kube-apiserver/792816434466dceb68402952aea817f09b489f85e56365b6e01fcf8cbb42981e
Task 215 | 19:25:51 | Deleting packages: nsx-cni/b2d11aeb068c1f8b11093746041503521a4f5645 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: nsx-cni-common/bf71c6d46900c360ddd6e0ec5c25ea7f4149ca0d
Task 215 | 19:25:51 | Deleting jobs: pks-nsx-t-osb-proxy/868d40563f38b7c34f25f0975032faeb7925213a
Task 215 | 19:25:51 | Deleting packages: send-to-vac/075751c4f6e1e9de76ad12558743d681300008c9 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: telemetry-agent-image/e852395d1dba4700757f6917fe8c794372fe6548
Task 215 | 19:25:51 | Deleting jobs: kube-apiserver/792816434466dceb68402952aea817f09b489f85e56365b6e01fcf8cbb42981e (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: kube-controller-manager/bedc72c949209b3baf286d17604f0d90307f32d285481946f5ff28a0fd49db24
Task 215 | 19:25:51 | Deleting packages: nsx-cni-common/bf71c6d46900c360ddd6e0ec5c25ea7f4149ca0d (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: nsx-kube-proxy/1203863af9df1a2088b9080968009000ac0fb228
Task 215 | 19:25:51 | Deleting jobs: pks-nsx-t-osb-proxy/868d40563f38b7c34f25f0975032faeb7925213a (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: pks-nsx-t-ops-files/0e188d77bd76af84cbebe883a6744ca21fec71f7
Task 215 | 19:25:51 | Deleting jobs: kube-controller-manager/bedc72c949209b3baf286d17604f0d90307f32d285481946f5ff28a0fd49db24 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: kubelet/c949e7cfa12980e770765b6b6b9d83d007cef7f1e4721487f730e044e1ec6f3e
Task 215 | 19:25:51 | Deleting packages: telemetry-agent-image/e852395d1dba4700757f6917fe8c794372fe6548 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: nsx-kube-proxy/1203863af9df1a2088b9080968009000ac0fb228 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: ncp/e015499c0e150d82d1f4d6cc18131665c6086cf0
Task 215 | 19:25:51 | Deleting jobs: telemetry-server/373c0ae8ade1198bfc7782e0d95aa98fdd48b3ce
Task 215 | 19:25:51 | Deleting jobs: pks-nsx-t-ops-files/0e188d77bd76af84cbebe883a6744ca21fec71f7 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: kubelet/c949e7cfa12980e770765b6b6b9d83d007cef7f1e4721487f730e044e1ec6f3e (00:00:00)
Task 215 | 19:25:51 | Deleting releases: pks-nsx-t/1.19.0 (00:00:01)
Task 215 | 19:25:51 | Deleting jobs: ncp/e015499c0e150d82d1f4d6cc18131665c6086cf0 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: nsx-node-agent/44920bf6488385cc9e276b183b088e74d47bbe57
Task 215 | 19:25:51 | Deleting jobs: telemetry-server/373c0ae8ade1198bfc7782e0d95aa98fdd48b3ce (00:00:00)
Task 215 | 19:25:51 | Deleting releases: kubo/0.25.9 (00:00:01)
Task 215 | 19:25:51 | Deleting releases: pks-telemetry/2.0.0-build.113 (00:00:01)
Task 215 | 19:25:51 | Deleting jobs: nsx-node-agent/44920bf6488385cc9e276b183b088e74d47bbe57 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: openvswitch/07485c61b5aaf8872db4b96e5e67341cb3ad6867 (00:00:00)
Task 215 | 19:25:51 | Deleting releases: nsx-cf-cni/2.3.1.10693410 (00:00:01)
Task 215 | 19:25:51 | Deleting orphaned disks: disk-25e22704-32cc-4b7d-8a32-8e2cc6a7d720
Task 215 | 19:25:51 | Deleting orphaned disks: disk-123f922e-b311-4dc6-9352-2c8568c25ee8 (00:00:20)
Task 215 | 19:26:15 | Deleting orphaned disks: disk-25e22704-32cc-4b7d-8a32-8e2cc6a7d720 (00:00:24)
Task 215 | 19:26:15 | Deleting dns blobs: DNS blobs (00:00:00)
Task 215 Started Sun Mar 31 19:25:50 UTC 2019
Task 215 Finished Sun Mar 31 19:26:15 UTC 2019
Task 215 Duration 00:00:25
Task 215 done
Succeeded
bosh update-resurrection
https://bosh.io/docs/cli-v2#update-resurrection
Enables or disables resurrection globally. Note, this state is not reflected in the bosh instances command’s Resurrection column.
Quick recap on the Resurrector. The Resurrector is a plugin to the Health Monitor. It's responsible for automatically recreating VMs that become inaccessible. The Resurrector continuously cross-references VMs expected to be running against the VMs that are sending heartbeats. When the Resurrector does not receive heartbeats for a VM for a certain period of time, it will kick off a task on the Director (a scan and fix task) to try to "resurrect" that VM. The Director may do one of two things: 1) create a new VM if the old VM is missing from the IaaS, or 2) replace the VM if the Agent on it is not responding to commands.
bosh [OPTIONS] update-resurrection on|off
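For example, to disable resurrection while performing IaaS maintenance and re-enable it afterwards:
$ bosh update-resurrection off
$ bosh update-resurrection on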
bosh locks
https://bosh.io/docs/cli-v2/#locks
Lists current locks.
bosh [OPTIONS] locks
$ bosh locks
Using environment '10.10.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Type Resource Task ID Expires at
deployment service-instance_c655c028-d671-4a81-be5b-5bc23652426a 510 Thu Feb 21 00:01:14 UTC 2019
1 locks
bosh errands
https://bosh.io/docs/cli-v2#errands
https://bosh.io/docs/errands/
Lists all errands defined by the deployment.
bosh [OPTIONS] errands
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 errands
Using environment '10.10.0.3' as client 'ops_manager'
Using deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'
Name
apply-addons
apply-specs
drain-cluster
smoke-tests
telemetry-agent
wavefront-proxy-errand
6 errands
Succeeded
bosh run-errand
https://bosh.io/docs/cli-v2#run-errand
https://bosh.io/docs/errands/
Runs errand job by name
bosh [OPTIONS] run-errand [run-errand-OPTIONS] NAME
[run-errand command options]
      --instance=INSTANCE-GROUP[/INSTANCE-ID]   Instance or group the errand should run on (must specify errand by release job name)
      --keep-alive         Use existing VM to run an errand and keep it after completion
      --when-changed       Run errand only if errand configuration has changed or if the previous run was unsuccessful
      --download-logs      Download logs
      --logs-dir=          Destination directory for logs (default: .)
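For example, running the smoke-tests errand listed earlier against a cluster deployment:
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 run-errand smoke-tests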
bosh logs
https://bosh.io/docs/cli-v2#logs
Download logs
bosh [OPTIONS] logs [logs-OPTIONS] [INSTANCE-GROUP[/INSTANCE-ID]]
[logs command options]
      --dir=               Destination directory (default: .)
  -f, --follow             Follow logs via SSH
      --num=               Last number of lines
  -q, --quiet              Suppresses printing of headers when multiple files are being examined
      --job=               Limit to only specific jobs
      --only=              Filter logs (comma-separated)
      --agent              Include only agent logs
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 logs
Using environment '10.10.0.3' as client 'ops_manager'
Using deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'
Task 220
Task 220 | 20:39:28 | Fetching logs for worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1): Finding and packing log files
Task 220 | 20:39:28 | Fetching logs for worker/820e841a-e121-437a-823b-1642468e4ae9 (0): Finding and packing log files
Task 220 | 20:39:28 | Fetching logs for worker/50384028-2098-4579-bb6e-5dfa5f36053c (2): Finding and packing log files
Task 220 | 20:39:28 | Fetching logs for master/320749e3-aa1e-4710-a200-7beb67762c8a (0): Finding and packing log files
Task 220 | 20:39:33 | Fetching logs for worker/820e841a-e121-437a-823b-1642468e4ae9 (0): Finding and packing log files (00:00:05)
Task 220 | 20:39:33 | Fetching logs for worker/50384028-2098-4579-bb6e-5dfa5f36053c (2): Finding and packing log files (00:00:05)
Task 220 | 20:39:34 | Fetching logs for worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1): Finding and packing log files (00:00:06)
Task 220 | 20:40:06 | Fetching logs for master/320749e3-aa1e-4710-a200-7beb67762c8a (0): Finding and packing log files (00:00:38)
Task 220 | 20:40:07 | Fetching group of logs: Packing log files together
Task 220 Started Sun Mar 31 20:39:28 UTC 2019
Task 220 Finished Sun Mar 31 20:40:07 UTC 2019
Task 220 Duration 00:00:39
Task 220 done
Downloading resource '5fb2e8de-584c-49e9-8b25-0a7a22ce6d03' to '/home/ubuntu/service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258-20190331-203823-446588011.tgz'...
########################################################## 96.52% 24.17 MiB/s
Succeeded
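To tail logs from a single instance live instead of downloading a bundle, --follow and --num can be combined, e.g.:
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 logs master/320749e3-aa1e-4710-a200-7beb67762c8a --follow --num=50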
There are many other bosh commands, but the above are the ones I use most often. And that wraps up part 1 of my troubleshooting series.