PKS Troubleshooting – Part 1: Using BOSH CLI

Intro

In my previous blog series I stepped through how to build an Enterprise PKS (referred to as PKS from here on) and NSX-T nested home lab environment from the ground up, which was well received and referenced by many. In this series I will guide you through how to troubleshoot PKS. First up is using the BOSH CLI.

BOSH Overview

You may, or may not, know that BOSH is what makes PKS awesome under the covers for day 1 and day 2 operations. If you have read my other blog posts you will know I'm not one to repeat content already created by others, so in the same vein, find some great YouTube videos below on BOSH by Merlin Glynn and James Watters.

BOSH CLI

To interact with the BOSH Director, we use the BOSH CLI! The BOSH CLI can be downloaded from here and is available for Windows, Mac, and Linux. The BOSH CLI comes pre-installed on the PCF Ops Manager VM, which is very useful if you need to quickly perform some troubleshooting. Installing the BOSH CLI is as simple as granting it execute permissions and copying/moving it to your PATH (quick sketch below). For more detailed steps, see here.
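For example, on a Linux host the install might look like the following (a minimal sketch; the downloaded filename and version are assumptions, so adjust to match your download):

$ chmod +x bosh-cli-5.4.0-linux-amd64
$ sudo mv bosh-cli-5.4.0-linux-amd64 /usr/local/bin/bosh
$ bosh --version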

BOSH Credentials

To interact with the BOSH Director using the BOSH CLI, naturally we need credentials. We can retrieve these credentials from PCF Ops Manager, aka Ops Man. Open and log into the Ops Man web UI > on the Installation Dashboard click the BOSH Director tile > click the Credentials tab > towards the bottom of the list of credentials, click Link to Credential for Bosh Commandline Credentials.

This will open a page with the credentials for the BOSH Director as a single command string to be executed from the Ops Man VM.

This command string can be reused each time by taking everything from “BOSH_CLIENT=” through “bosh” and appending a BOSH command, e.g. vms:

ubuntu@opsmanager-2-4:~$ BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=io2Rd9uaa6SoOnSefHXlR_gBHyj6fbu1 BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=172.14.0.3 bosh vms

 

Alternatively, for convenience, you can export the environment variables for the session and then just use bosh <command>:

ubuntu@opsmanager-2-4:~$ export BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=io2Rd9uaa6SoOnSefHXlR_gBHyj6fbu1 BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=172.14.0.3
ubuntu@opsmanager-2-4:~$ bosh vms

 

Or, add the export command to a file and source it. Don't forget to delete the file once finished, as it's not best practice to leave a file containing credentials lying around!

ubuntu@opsmanager-2-4:~$ vi pks-env
ubuntu@opsmanager-2-4:~$ source pks-env
ubuntu@opsmanager-2-4:~$ bosh vms
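For reference, the pks-env file contains just the export line from above:

export BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=io2Rd9uaa6SoOnSefHXlR_gBHyj6fbu1 BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=172.14.0.3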

 

The above applies when issuing BOSH commands from the Ops Man VM. If you are using a jump host and managing several foundations/environments, you can configure a BOSH alias for each using bosh alias-env. Note: use --ca-cert to specify the path to the BOSH Director certificate; if your certs are trusted via system-installed CA certs, there is no need to provide the --ca-cert option.

root@ubuntu-jump:~# bosh alias-env pks -e 10.0.80.3
root@ubuntu-jump:~# bosh alias-env pas -e 10.1.80.3
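If the Director's certificate is not trusted via the system CA store, add --ca-cert to each alias. The certificate path below is an assumption; point it at wherever you saved the Director's root CA certificate:

root@ubuntu-jump:~# bosh alias-env pks -e 10.0.80.3 --ca-cert /root/pks_root_ca.crt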

 

With the alias set, we need to log in. The credentials are retrieved from Ops Man > BOSH Director tile > Credentials tab > Director Credentials.

To log in, we use bosh -e <alias> login with the retrieved credentials.

root@ubuntu-jump:~# bosh -e pks login
Using environment '10.10.0.3'

Email (): director
Password ():

Successfully authenticated with UAA

Succeeded

 

When issuing BOSH commands, you then specify the environment alias, e.g. bosh -e <alias> <command>

root@ubuntu-jump:~# bosh -e pks vms

 

Now that we can issue commands, let's have a look at a few I typically use when troubleshooting.

 

bosh cli options

The BOSH CLI has the following global options, available with every command. The one I use most often is -d / --deployment; see the combined example after the list below.

Application Options:
  -v, --version          Show CLI version
      --config=          Config file path (default: ~/.bosh/config) [$BOSH_CONFIG]
  -e, --environment=     Director environment name or URL [$BOSH_ENVIRONMENT]
      --ca-cert=         Director CA certificate path or value [$BOSH_CA_CERT]
      --sha2             Use SHA256 checksums [$BOSH_SHA2]
      --parallel=        The max number of parallel operations (default: 5)
      --client=          Override username or UAA client [$BOSH_CLIENT]
      --client-secret=   Override password or UAA client secret [$BOSH_CLIENT_SECRET]
  -d, --deployment=      Deployment name [$BOSH_DEPLOYMENT]
      --column=          Filter to show only given column(s)
      --json             Output as JSON
      --tty              Force TTY-like output
      --no-color         Toggle colorized output
  -n, --non-interactive  Don't ask for user input [$BOSH_NON_INTERACTIVE]
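As an example of combining these, the following lists the VMs of a single deployment as JSON (deployment name taken from the bosh deployments output further below):

$ bosh -d service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5 vms --json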

 

bosh deployments

https://bosh.io/docs/cli-v2#deployments

Lists deployments tracked by the Director. Shows their names, used releases and stemcells.

 bosh [OPTIONS] deployments
$ bosh deployments
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Name                                                   Release(s)                            Stemcell(s)                                      Team(s)
pivotal-container-service-8757185288bf36d3580b         backup-and-restore-sdk/1.8.0          bosh-vsphere-esxi-ubuntu-xenial-go_agent/170.24  -
                                                       bosh-dns/1.10.0
                                                       bpm/0.13.0
                                                       cf-mysql/36.14.0
                                                       cfcr-etcd/1.8.0
                                                       docker/33.0.2
                                                       kubo/0.25.8
                                                       kubo-service-adapter/1.3.0-build.154
                                                       nsx-cf-cni/2.3.1.10693410
                                                       on-demand-service-broker/0.24.0
                                                       pks-api/1.3.0-build.154
                                                       pks-helpers/50.0.0
                                                       pks-nsx-t/1.19.0
                                                       pks-telemetry/2.0.0-build.113
                                                       pks-vrli/0.7.0
                                                       sink-resources-release/0.1.15
                                                       syslog/11.4.0
                                                       uaa/64.0
                                                       wavefront-proxy/0.9.0
service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5  bosh-dns/1.10.0                       bosh-vsphere-esxi-ubuntu-xenial-go_agent/170.24  pivotal-container-service-8757185288bf36d3580b
                                                       bpm/0.13.0
                                                       cfcr-etcd/1.8.0
                                                       docker/33.0.2
                                                       kubo/0.25.8
                                                       nsx-cf-cni/2.3.1.10693410
                                                       pks-helpers/50.0.0
                                                       pks-nsx-t/1.19.0
                                                       pks-telemetry/2.0.0-build.113
                                                       pks-vrli/0.7.0
                                                       sink-resources-release/0.1.15
                                                       syslog/11.4.0
                                                       wavefront-proxy/0.9.0

2 deployments

Succeeded

Here we see “pivotal-container-service-8757185288bf36d3580b”, which is the PKS API, and “service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5”, which is a PKS K8s cluster. Both of these are deployment names that will be used in other BOSH commands.

 

bosh vms

https://bosh.io/docs/cli-v2#vms

Lists all VMs managed by the Director. Shows instance names, IPs, and VM CIDs.

 bosh [OPTIONS] vms [vms-OPTIONS]
[vms command options]
          --dns               Show DNS A records
          --vitals            Show vitals
          --cloud-properties  Show cloud properties
$ bosh vms
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Task 160
Task 161. Done

Task 160 done

Deployment 'pivotal-container-service-8757185288bf36d3580b'

Instance                                                        Process State  AZ       IPs         VM CID                                   VM Type  Active
pivotal-container-service/9b1a76d7-0fd6-4b21-a0ad-bb2fba1d25e5  running        Mgmt-AZ  172.14.0.4  vm-adc19244-85bb-4e26-812a-76da6dc20783  large    true

1 vms

Deployment 'service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5'

Instance                                     Process State  AZ   IPs         VM CID                                   VM Type      Active
master/529ba6a2-2cdf-491c-942b-2df671c9be2e  running        AZ1  172.15.0.2  vm-810ef6e9-a4a5-4d1c-9d6c-ded285dd52e8  medium.disk  true
worker/2f59b6dc-bd11-4ce2-8018-49f061bb8d99  running        AZ1  172.15.0.3  vm-556623ca-d18f-4a06-a6a7-322cf7c2775f  medium.disk  true
worker/4b53f264-43b5-48dc-87a6-a2379a285b72  running        AZ2  172.15.0.4  vm-b96a690d-29de-472f-8596-c7f018e36f83  medium.disk  true
worker/f7683ff7-da8c-44f2-bbf2-05bd1382f42d  running        AZ3  172.15.0.5  vm-7c8c4fc8-92e7-4c5f-8e65-ed453cabfdeb  medium.disk  true

4 vms

Succeeded

 

Lists all VMs managed by the Director in a single deployment. Shows instance names, IPs, and VM CIDs.

bosh -d deployment vms
$ bosh -d service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5 vms
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Task 159. Done

Deployment 'service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5'

Instance                                     Process State  AZ   IPs         VM CID                                   VM Type      Active
master/529ba6a2-2cdf-491c-942b-2df671c9be2e  running        AZ1  172.15.0.2  vm-810ef6e9-a4a5-4d1c-9d6c-ded285dd52e8  medium.disk  true
worker/2f59b6dc-bd11-4ce2-8018-49f061bb8d99  running        AZ1  172.15.0.3  vm-556623ca-d18f-4a06-a6a7-322cf7c2775f  medium.disk  true
worker/4b53f264-43b5-48dc-87a6-a2379a285b72  running        AZ2  172.15.0.4  vm-b96a690d-29de-472f-8596-c7f018e36f83  medium.disk  true
worker/f7683ff7-da8c-44f2-bbf2-05bd1382f42d  running        AZ3  172.15.0.5  vm-7c8c4fc8-92e7-4c5f-8e65-ed453cabfdeb  medium.disk  true

4 vms

Succeeded

 

Load, CPU, memory, and disk stats per instance (VM).

bosh -d deployment vms --vitals
$ bosh -d service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5 vms --vitals
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Task 182. Done

Deployment 'service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5'

Instance                                     Process State  AZ   IPs         VM CID                                   VM Type      Active  VM Created At                 Uptime          Load              CPU    CPU   CPU   CPU   Memory        Swap         System      Ephemeral   Persistent
                                                                                                                                                                                         (1m, 5m, 15m)     Total  User  Sys   Wait  Usage         Usage        Disk Usage  Disk Usage  Disk Usage
master/529ba6a2-2cdf-491c-942b-2df671c9be2e  running        AZ1  172.15.0.2  vm-810ef6e9-a4a5-4d1c-9d6c-ded285dd52e8  medium.disk  true    Wed Mar 20 23:12:41 UTC 2019  8d 20h 35m 18s  0.69, 0.57, 0.45  -      4.1%  2.9%  0.2%  28% (1.1 GB)  0% (268 kB)  46% (33i%)  8% (2i%)    4% (0i%)
worker/2f59b6dc-bd11-4ce2-8018-49f061bb8d99  running        AZ1  172.15.0.3  vm-556623ca-d18f-4a06-a6a7-322cf7c2775f  medium.disk  true    Wed Mar 20 23:12:40 UTC 2019  8d 20h 35m 19s  0.24, 0.20, 0.13  -      3.4%  2.2%  0.1%  30% (1.2 GB)  0% (0 B)     46% (33i%)  12% (2i%)   19% (7i%)
worker/4b53f264-43b5-48dc-87a6-a2379a285b72  running        AZ2  172.15.0.4  vm-b96a690d-29de-472f-8596-c7f018e36f83  medium.disk  true    Wed Mar 20 23:12:40 UTC 2019  8d 20h 35m 14s  0.09, 0.09, 0.08  -      5.2%  1.8%  0.1%  30% (1.2 GB)  0% (0 B)     46% (33i%)  12% (2i%)   19% (7i%)
worker/f7683ff7-da8c-44f2-bbf2-05bd1382f42d  running        AZ3  172.15.0.5  vm-7c8c4fc8-92e7-4c5f-8e65-ed453cabfdeb  medium.disk  true    Wed Mar 20 23:12:41 UTC 2019  8d 20h 35m 12s  0.00, 0.00, 0.00  -      0.3%  0.3%  0.0%  10% (397 MB)  0% (0 B)     46% (33i%)  9% (1i%)    -

4 vms

Succeeded

 

Shows cloud properties for each instance.

bosh -d deployment vms --cloud-properties
$ bosh -d service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5 vms --cloud-properties
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Task 183. Done

Deployment 'service-instance_b6a4d6ab-db52-41d3-a910-57b013cff3a5'

Instance                                     Process State  AZ   IPs         VM CID                                   VM Type      Active  Cloud Properties
master/529ba6a2-2cdf-491c-942b-2df671c9be2e  running        AZ1  172.15.0.2  vm-810ef6e9-a4a5-4d1c-9d6c-ded285dd52e8  medium.disk  true    cpu: 2
                                                                                                                                           datacenters:
                                                                                                                                           - clusters:
                                                                                                                                             - Compute-cluster:
                                                                                                                                                 resource_pool: AZ1
                                                                                                                                             name: AZ1
                                                                                                                                           disk: 32768
                                                                                                                                           ram: 4096
worker/2f59b6dc-bd11-4ce2-8018-49f061bb8d99  running        AZ1  172.15.0.3  vm-556623ca-d18f-4a06-a6a7-322cf7c2775f  medium.disk  true    cpu: 2
                                                                                                                                           datacenters:
                                                                                                                                           - clusters:
                                                                                                                                             - Compute-cluster:
                                                                                                                                                 resource_pool: AZ1
                                                                                                                                             name: AZ1
                                                                                                                                           disk: 32768
                                                                                                                                           ram: 4096
                                                                                                                                           vmx_options:
                                                                                                                                             disk.enableUUID: "1"
worker/4b53f264-43b5-48dc-87a6-a2379a285b72  running        AZ2  172.15.0.4  vm-b96a690d-29de-472f-8596-c7f018e36f83  medium.disk  true    cpu: 2
                                                                                                                                           datacenters:
                                                                                                                                           - clusters:
                                                                                                                                             - Compute-cluster:
                                                                                                                                                 resource_pool: AZ2
                                                                                                                                             name: AZ2
                                                                                                                                           disk: 32768
                                                                                                                                           ram: 4096
                                                                                                                                           vmx_options:
                                                                                                                                             disk.enableUUID: "1"
worker/f7683ff7-da8c-44f2-bbf2-05bd1382f42d  running        AZ3  172.15.0.5  vm-7c8c4fc8-92e7-4c5f-8e65-ed453cabfdeb  medium.disk  true    cpu: 2
                                                                                                                                           datacenters:
                                                                                                                                           - clusters:
                                                                                                                                             - Compute-cluster:
                                                                                                                                                 resource_pool: AZ3
                                                                                                                                             name: AZ3
                                                                                                                                           disk: 32768
                                                                                                                                           ram: 4096
                                                                                                                                           vmx_options:
                                                                                                                                             disk.enableUUID: "1"

4 vms

Succeeded

 

bosh tasks

https://bosh.io/docs/cli-v2#tasks

Lists active and previously run tasks.

 bosh [OPTIONS] tasks [tasks-OPTIONS]
[tasks command options]
      -r, --recent=      Number of tasks to show
      -a, --all          Include all task types (ssh, logs, vms, etc)
ubuntu@opsmanager-2-4:~$ bosh tasks
Using environment '172.14.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

ID   State       Started At                    Last Activity At              User                                            Deployment                                             Description        Result
195  processing  Fri Mar 29 23:13:37 UTC 2019  Fri Mar 29 23:13:37 UTC 2019  pivotal-container-service-8757185288bf36d3580b  service-instance_6b0fd642-0d22-4711-a9cf-2b98364ee1df  create deployment  -

1 tasks

Succeeded

 

List the 30 most recent tasks

bosh tasks --recent
$ bosh tasks --recent
Using environment '10.10.0.3' as client 'ops_manager'

ID   State  Started At                    Last Activity At              User                                            Deployment                                             Description                                                                                              Result
166  done   Wed Mar 27 16:21:18 UTC 2019  Wed Mar 27 16:25:21 UTC 2019  pivotal-container-service-5c365f595dc7e6c897d6  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  run errand telemetry-agent from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258         1 succeeded, 0 errored, 0 canceled
165  done   Wed Mar 27 16:16:17 UTC 2019  Wed Mar 27 16:20:43 UTC 2019  pivotal-container-service-5c365f595dc7e6c897d6  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  run errand apply-addons from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258            1 succeeded, 0 errored, 0 canceled
160  done   Wed Mar 27 16:03:14 UTC 2019  Wed Mar 27 16:15:36 UTC 2019  pivotal-container-service-5c365f595dc7e6c897d6  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  create deployment                                                                                        /deployments/service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258
159  done   Wed Mar 27 16:03:09 UTC 2019  Wed Mar 27 16:26:21 UTC 2019  ops_manager                                     pivotal-container-service-5c365f595dc7e6c897d6         run errand upgrade-all-service-instances from deployment pivotal-container-service-5c365f595dc7e6c897d6  1 succeeded, 0 errored, 0 canceled
158  done   Wed Mar 27 16:02:45 UTC 2019  Wed Mar 27 16:03:07 UTC 2019  ops_manager                                     pivotal-container-service-5c365f595dc7e6c897d6         run errand pks-nsx-t-precheck from deployment pivotal-container-service-5c365f595dc7e6c897d6             1 succeeded, 0 errored, 0 canceled
157  done   Wed Mar 27 15:59:23 UTC 2019  Wed Mar 27 16:02:43 UTC 2019  ops_manager                                     pivotal-container-service-5c365f595dc7e6c897d6         create deployment                                                                                        /deployments/pivotal-container-service-5c365f595dc7e6c897d6
156  done   Wed Mar 27 15:59:19 UTC 2019  Wed Mar 27 15:59:19 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'wavefront-proxy/0.9.0'
155  done   Wed Mar 27 15:58:54 UTC 2019  Wed Mar 27 15:58:54 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'backup-and-restore-sdk/1.8.0'
154  done   Wed Mar 27 15:58:17 UTC 2019  Wed Mar 27 15:58:17 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'bpm/0.13.0'
153  done   Wed Mar 27 15:58:07 UTC 2019  Wed Mar 27 15:58:07 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'uaa/64.0'
152  done   Wed Mar 27 15:57:48 UTC 2019  Wed Mar 27 15:57:48 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'pks-telemetry/2.0.0-build.140' 
151  done   Wed Mar 27 15:57:32 UTC 2019  Wed Mar 27 15:57:33 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'sink-resources-release/0.1.15.1'
150  done   Wed Mar 27 15:57:18 UTC 2019  Wed Mar 27 15:57:18 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'syslog/11.4.0'
149  done   Wed Mar 27 15:57:09 UTC 2019  Wed Mar 27 15:57:09 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'pks-vrli/0.7.0'
148  done   Wed Mar 27 15:57:02 UTC 2019  Wed Mar 27 15:57:04 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'nsx-cf-cni/2.3.2.11695762'
147  done   Wed Mar 27 15:56:50 UTC 2019  Wed Mar 27 15:56:51 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'pks-nsx-t/1.21.0'
146  done   Wed Mar 27 15:56:41 UTC 2019  Wed Mar 27 15:56:41 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'pks-helpers/50.0.0'
145  done   Wed Mar 27 15:56:40 UTC 2019  Wed Mar 27 15:56:40 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'pks-api/1.3.3-build.1'
144  done   Wed Mar 27 15:56:26 UTC 2019  Wed Mar 27 15:56:26 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'on-demand-service-broker/0.24.0'
143  done   Wed Mar 27 15:56:16 UTC 2019  Wed Mar 27 15:56:16 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'kubo-service-adapter/1.3.3-build.1'
142  done   Wed Mar 27 15:56:10 UTC 2019  Wed Mar 27 15:56:10 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'cfcr-etcd/1.8.0'
141  done   Wed Mar 27 15:55:59 UTC 2019  Wed Mar 27 15:55:59 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'kubo/0.25.11'
140  done   Wed Mar 27 15:55:36 UTC 2019  Wed Mar 27 15:55:36 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'docker/35.1.0'
139  done   Wed Mar 27 15:55:24 UTC 2019  Wed Mar 27 15:55:26 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'cf-mysql/36.14.0'
138  done   Wed Mar 27 15:54:39 UTC 2019  Wed Mar 27 15:54:40 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'pks-helpers/50.0.0'
137  done   Wed Mar 27 15:54:32 UTC 2019  Wed Mar 27 15:54:32 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'syslog/11.4.0'
136  done   Wed Mar 27 15:54:25 UTC 2019  Wed Mar 27 15:54:25 UTC 2019  ops_manager                                     -                                                      create release                                                                                           Created release 'bosh-dns/1.10.0'
135  done   Wed Mar 27 15:42:30 UTC 2019  Wed Mar 27 15:46:26 UTC 2019  pivotal-container-service-5c365f595dc7e6c897d6  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  run errand telemetry-agent from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258         1 succeeded, 0 errored, 0 canceled
134  done   Wed Mar 27 15:37:38 UTC 2019  Wed Mar 27 15:42:28 UTC 2019  pivotal-container-service-5c365f595dc7e6c897d6  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  run errand apply-addons from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258            1 succeeded, 0 errored, 0 canceled

30 tasks

Succeeded

 

List a given number of recent tasks (ts below is the short alias for tasks).

bosh tasks --recent=<num>
$ bosh ts -r=5
Using environment '10.10.0.3' as client 'ops_manager'

ID   State  Started At                    Last Activity At              User                                            Deployment                                             Description                                                                                              Result
166  done   Wed Mar 27 16:21:18 UTC 2019  Wed Mar 27 16:25:21 UTC 2019  pivotal-container-service-5c365f595dc7e6c897d6  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  run errand telemetry-agent from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258         1 succeeded, 0 errored, 0 canceled
165  done   Wed Mar 27 16:16:17 UTC 2019  Wed Mar 27 16:20:43 UTC 2019  pivotal-container-service-5c365f595dc7e6c897d6  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  run errand apply-addons from deployment service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258            1 succeeded, 0 errored, 0 canceled
160  done   Wed Mar 27 16:03:14 UTC 2019  Wed Mar 27 16:15:36 UTC 2019  pivotal-container-service-5c365f595dc7e6c897d6  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  create deployment                                                                                        /deployments/service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258
159  done   Wed Mar 27 16:03:09 UTC 2019  Wed Mar 27 16:26:21 UTC 2019  ops_manager                                     pivotal-container-service-5c365f595dc7e6c897d6         run errand upgrade-all-service-instances from deployment pivotal-container-service-5c365f595dc7e6c897d6  1 succeeded, 0 errored, 0 canceled
158  done   Wed Mar 27 16:02:45 UTC 2019  Wed Mar 27 16:03:07 UTC 2019  ops_manager                                     pivotal-container-service-5c365f595dc7e6c897d6         run errand pks-nsx-t-precheck from deployment pivotal-container-service-5c365f595dc7e6c897d6             1 succeeded, 0 errored, 0 canceled

5 tasks

Succeeded

 

bosh task

https://bosh.io/docs/cli-v2#task

Shows details of a single task. Continues to follow the task if it is still running.

 bosh [OPTIONS] task [task-OPTIONS] [ID]
[task command options]
          --event        Track event log
          --cpi          Track CPI log
          --debug        Track debug log
          --result       Track result log
      -a, --all          Include all task types (ssh, logs, vms, etc)
$ bosh task 128
Using environment '10.10.0.3' as client 'ops_manager'

Task 128

Task 128 | 15:09:19 | Preparing deployment: Preparing deployment
Task 128 | 15:09:21 | Warning: DNS address not available for the link provider instance: pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f
Task 128 | 15:09:22 | Warning: DNS address not available for the link provider instance: pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f
Task 128 | 15:09:22 | Warning: DNS address not available for the link provider instance: pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f
Task 128 | 15:09:38 | Preparing deployment: Preparing deployment (00:00:19)
Task 128 | 15:09:51 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 128 | 15:09:51 | Creating missing vms: master/320749e3-aa1e-4710-a200-7beb67762c8a (0)
Task 128 | 15:09:51 | Creating missing vms: worker/820e841a-e121-437a-823b-1642468e4ae9 (0)
Task 128 | 15:09:51 | Creating missing vms: worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1)
Task 128 | 15:09:51 | Creating missing vms: worker/50384028-2098-4579-bb6e-5dfa5f36053c (2)
Task 128 | 15:11:59 | Creating missing vms: master/320749e3-aa1e-4710-a200-7beb67762c8a (0) (00:02:08)
Task 128 | 15:12:00 | Creating missing vms: worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1) (00:02:09)
Task 128 | 15:12:02 | Creating missing vms: worker/820e841a-e121-437a-823b-1642468e4ae9 (0) (00:02:11)
Task 128 | 15:12:09 | Creating missing vms: worker/50384028-2098-4579-bb6e-5dfa5f36053c (2) (00:02:18)
Task 128 | 15:12:11 | Updating instance master: master/320749e3-aa1e-4710-a200-7beb67762c8a (0) (canary) (00:02:48)
Task 128 | 15:14:59 | Updating instance worker: worker/820e841a-e121-437a-823b-1642468e4ae9 (0) (canary) (00:05:54)
Task 128 | 15:20:53 | Updating instance worker: worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1) (00:08:31)
Task 128 | 15:29:24 | Updating instance worker: worker/50384028-2098-4579-bb6e-5dfa5f36053c (2) (00:08:04)

Task 128 Started  Wed Mar 27 15:09:19 UTC 2019
Task 128 Finished Wed Mar 27 15:37:28 UTC 2019
Task 128 Duration 00:28:09
Task 128 done

Succeeded

 

Retrieve a task's debug log.

bosh task <ID> --debug

I tend to redirect this to a file as it can be quite large, e.g. 2 MB+.

$ bosh t 157 --debug  > task_157_debug.txt

Below is an excerpt of a task debug log:

I, [2019-03-27T15:59:22.451689 #25] [0x2ab99f506418]  INFO -- TaskHelper: Director Version: 268.2.2
I, [2019-03-27T15:59:22.451757 #25] [0x2ab99f506418]  INFO -- TaskHelper: Enqueuing task: 157
I, [2019-03-27T15:59:23.341512 #32742] []  INFO -- DirectorJobRunner: Looking for task with task id 157
D, [2019-03-27T15:59:23.343843 #32742] [] DEBUG -- DirectorJobRunner: (0.000370s) (conn: 47254693400660) SELECT * FROM "tasks" WHERE "id" = 157
I, [2019-03-27T15:59:23.346548 #32742] []  INFO -- DirectorJobRunner: Found task #<Bosh::Director::Models::Task @values={:id=>157, :state=>"processing", :timestamp=
>2019-03-27 15:59:22 UTC, :description=>"create deployment", :result=>nil, :output=>"/var/vcap/store/director/tasks/157", :checkpoint_time=>2019-03-27 15:59:22 UTC,
 :type=>"update_deployment", :username=>"ops_manager", :deployment_name=>"pivotal-container-service-5c365f595dc7e6c897d6", :started_at=>nil, :event_output=>"", :res
ult_output=>"", :context_id=>""}>
I, [2019-03-27T15:59:23.346696 #32742] []  INFO -- DirectorJobRunner: Running from worker 'worker_3' on director/b40c1629-8572-46ea-7680-af70833300c4 (127.0.0.1)
I, [2019-03-27T15:59:23.346769 #32742] []  INFO -- DirectorJobRunner: Starting task: 157
I, [2019-03-27T15:59:23.346995 #32742] [task:157]  INFO -- DirectorJobRunner: Creating job
D, [2019-03-27T15:59:23.349534 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.000405s) (conn: 47254693400660) SELECT * FROM "tasks" WHERE "id" = 157
D, [2019-03-27T15:59:23.351056 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.000213s) (conn: 47254693400660) SELECT * FROM "tasks" WHERE "id" = 157
I, [2019-03-27T15:59:23.353690 #32742] [task:157]  INFO -- DirectorJobRunner: Performing task: #<Bosh::Director::Models::Task @values={:id=>157, :state=>"processing
", :timestamp=>2019-03-27 15:59:22 UTC, :description=>"create deployment", :result=>nil, :output=>"/var/vcap/store/director/tasks/157", :checkpoint_time=>2019-03-27
 15:59:22 UTC, :type=>"update_deployment", :username=>"ops_manager", :deployment_name=>"pivotal-container-service-5c365f595dc7e6c897d6", :started_at=>nil, :event_ou
tput=>"", :result_output=>"", :context_id=>""}>
D, [2019-03-27T15:59:23.356326 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.000095s) (conn: 47254693400660) BEGIN
D, [2019-03-27T15:59:23.360350 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.000758s) (conn: 47254693400660) UPDATE "tasks" SET "state" = 'processing', "timesta
mp" = '2019-03-27 15:59:23.353831+0000', "description" = 'create deployment', "result" = NULL, "output" = '/var/vcap/store/director/tasks/157', "checkpoint_time" =
'2019-03-27 15:59:23.354810+0000', "type" = 'update_deployment', "username" = 'ops_manager', "deployment_name" = 'pivotal-container-service-5c365f595dc7e6c897d6', "
started_at" = '2019-03-27 15:59:23.354518+0000', "event_output" = '', "result_output" = '', "context_id" = '' WHERE ("id" = 157)
D, [2019-03-27T15:59:23.366628 #32742] [task:157] DEBUG -- DirectorJobRunner: (0.005804s) (conn: 47254693400660) COMMIT
I, [2019-03-27T15:59:23.366932 #32742] [task:157]  INFO -- DirectorJobRunner: Reading deployment manifest

 

Likewise for CPI (Cloud Provider Interface) logs.

bosh task <ID> --cpi

Again, I redirect the output to a file.

$ bosh t 26 --cpi  > task_26_cpi.txt

Below is an excerpt of a task CPI log:

I, [2019-03-15T14:40:26.108211 #15220]  INFO -- [req_id cpi-909205]: Starting create_vm...
D, [2019-03-15T14:40:26.110953 #15220] DEBUG -- [req_id cpi-909205]: Running method 'RetrieveServiceContent'...
D, [2019-03-15T14:40:26.196337 #15220] DEBUG -- [req_id cpi-909205]: Running method 'Login'...
D, [2019-03-15T14:40:26.255184 #15220] DEBUG -- [req_id cpi-909205]: Running method 'FindByInventoryPath'...
D, [2019-03-15T14:40:26.265972 #15220] DEBUG -- [req_id cpi-909205]: Running method 'CreateContainerView'...
D, [2019-03-15T14:40:26.305712 #15220] DEBUG -- [req_id cpi-909205]: All clusters provided: {"SA-Management-Edge"=>#<VSphereCloud::ClusterConfig:0x000055bfa94b7a30
@name="SA-Management-Edge", @config={"resource_pool"=>"AZ-MGMT"}>, "SA-Compute-01"=>#<VSphereCloud::ClusterConfig:0x000055bfa94b3d18 @name="SA-Compute-01", @config=
{"resource_pool"=>"AZ-COMP-3"}>}
D, [2019-03-15T14:40:26.305879 #15220] DEBUG -- [req_id cpi-909205]: Running method 'FindByInventoryPath'...
I, [2019-03-15T14:40:26.365699 #15220]  INFO -- [req_id cpi-909205]: Using global ephemeral disk datastore pattern: ^(SA\-Shared\-02\-Remote)$
I, [2019-03-15T14:40:26.379263 #15220]  INFO -- [req_id cpi-909205]: Initiating VM Allocator with VM Placement Criteria: Disk Config: [Disk ==> Disk CID: Size:16384
 Ephemeral:true Target-DS:^(SA\-Shared\-02\-Remote)$ Existing-DS:]  Req Memory: 4096  Mem Headroom: 128
I, [2019-03-15T14:40:26.379427 #15220]  INFO -- [req_id cpi-909205]: Gathering vm placement resources for vm placement allocator pipeline
D, [2019-03-15T14:40:26.434012 #15220] DEBUG -- [req_id cpi-909205]: Found requested resource pool: <[Vim.ResourcePool] resgroup-401>
D, [2019-03-15T14:40:26.451823 #15220] DEBUG -- [req_id cpi-909205]: Filter VM Placement Cluster: SA-Management-Edge Memory: 59442 for free memory required: 4224
D, [2019-03-15T14:40:26.452066 #15220] DEBUG -- [req_id cpi-909205]: Filter VM Placement Cluster : SA-Management-Edge hosts:  datastores: [<Datastore: / SA-ESXi-01-
Local>, <Datastore: / SA-Shared-02-Remote>, <Datastore: / SA-ESXi-02-Local>, <Datastore: / SA-ESXi-03-Local>, <Datastore: / SA-Shared-01-Remote>] for combination of
 DS satisfying disk configurations
D, [2019-03-15T14:40:26.452192 #15220] DEBUG -- [req_id cpi-909205]: Trying to find placement for Disk ==> Disk CID: Size:16384 Ephemeral:true Target-DS:^(SA\-Share
d\-02\-Remote)$ Existing-DS:
I, [2019-03-15T14:40:26.452420 #15220]  INFO -- [req_id cpi-909205]: Initiating Disk Allocator with Storage Criteria: Space Req: 16384 Target Pattern: ^(SA\-Shared\
-02\-Remote)$
        Existing-DS Pattern:
D, [2019-03-15T14:40:26.555554 #15220] DEBUG -- [req_id cpi-909205]: Found <Datastore: / SA-Shared-02-Remote> for Disk ==> Disk CID: Size:16384 Ephemeral:true Targe
t-DS:^(SA\-Shared\-02\-Remote)$ Existing-DS:
D, [2019-03-15T14:40:26.555788 #15220] DEBUG -- [req_id cpi-909205]: Initial Storage Options for creating ephemeral disk from pattern: ["SA-Shared-02-Remote"]
D, [2019-03-15T14:40:26.555872 #15220] DEBUG -- [req_id cpi-909205]: Storage Options for creating ephemeral disk are: ["SA-Shared-02-Remote"]
I, [2019-03-15T14:40:26.556014 #15220]  INFO -- [req_id cpi-909205]: Checking if ip '10.10.0.5' is in use
D, [2019-03-15T14:40:26.556117 #15220] DEBUG -- [req_id cpi-909205]: Running method 'FindByIp'...
I, [2019-03-15T14:40:26.564608 #15220]  INFO -- [req_id cpi-909205]: No IP conflicts detected
I, [2019-03-15T14:40:26.565046 #15220]  INFO -- [req_id cpi-909205]: Creating vm: vm-884de320-42fa-4af5-ad61-f6fa19a554f9 on <[Vim.ClusterComputeResource] domain-c2
6> stored in <[Vim.Datastore] datastore-421>

 

bosh instances

https://bosh.io/docs/cli-v2#instances

List all instances in a deployment.

bosh [OPTIONS] instances [instances-OPTIONS]
[instances command options]
      -i, --details      Show details including VM CID, persistent disk CID, etc.
          --dns          Show DNS A records
          --vitals       Show vitals
      -p, --ps           Show processes
      -f, --failing      Only show failing instances

The instance command options --details and --ps are the ones I typically use most often.

bosh instances --details
$ bosh instances --details
Using environment '10.10.0.3' as client 'ops_manager'

Task 186
Task 185
Task 186 done

Task 185 done

Deployment 'pivotal-container-service-5c365f595dc7e6c897d6'

Instance                                                        Process State  AZ       IPs        State    VM CID                                   VM Type  Disk CIDs                                  Agent ID                              Index  Resurrection  Bootstrap  Ignore
                                                                                                                                                                                                                                                      Paused
pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f  running        AZ-MGMT  10.10.0.4  started  vm-9ab49a58-6f4e-4cb7-9b30-19d5cc5e5ada  large    disk-d0ed65d5-ed25-4bbf-adf9-baf5fe3d92d9  fe77c7f7-dfb1-4c54-91a0-33376dc1d6c2  0      false         true       false

1 instances

Deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'

Instance                                           Process State  AZ         IPs        State    VM CID                                   VM Type      Disk CIDs                                  Agent ID                              Index  Resurrection  Bootstrap  Ignore
                                                                                                                                                                                                                                               Paused
apply-addons/8cd3b226-5327-46ee-9e98-0fa2a900c857  -              AZ-COMP-1  -          started  -                                        micro        -                                          -                                     0      false         true       false
master/320749e3-aa1e-4710-a200-7beb67762c8a        running        AZ-COMP-1  10.20.0.2  started  vm-02ded866-05fe-46ae-a543-0d92ce07a953  medium.disk  disk-0f0d3195-e26a-45ba-ac2c-605fefa682ec  f5451f4d-b044-4bc5-bf1f-fe5ca553997a  0      false         true       false
worker/50384028-2098-4579-bb6e-5dfa5f36053c        running        AZ-COMP-3  10.20.0.5  started  vm-b72f5307-4c4d-48ca-b4ba-f4c8ee8b20cf  medium.disk  disk-95291f7c-435d-456f-a27e-9c0e105bd003  1d649278-46e0-4786-83e8-745f47128224  2      false         false      false
worker/820e841a-e121-437a-823b-1642468e4ae9        running        AZ-COMP-1  10.20.0.3  started  vm-0cab1c50-6bca-43c0-a92a-8ab89bfb49fd  medium.disk  disk-9eb21d95-ca43-43e2-b23b-95a206924741  5abc24ef-d611-4573-a6c4-d453df7463d1  0      false         true       false
worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47        running        AZ-COMP-2  10.20.0.4  started  vm-d95600f8-bb70-4e05-b02f-b0af042368ff  medium.disk  disk-b404dd6f-25d5-4019-9aac-ba3e1cfc2d02  b187b7d8-0ef3-45f6-911b-a7c11d72f950  1      false         false      false

5 instances

Succeeded

 

bosh instances --ps

Shows all the processes for each instance. This information can also be retrieved directly on the instance using the monit summary command (see the monit section at the end of this post).

$ bosh instances --ps
Using environment '10.10.0.3' as client 'ops_manager'

Task 190
Task 189
Task 190 done

Task 189 done

Deployment 'pivotal-container-service-5c365f595dc7e6c897d6'

Instance                                                        Process                    Process State  AZ       IPs
pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f  -                          running        AZ-MGMT  10.10.0.4
~                                                               bosh-dns                   running        -        -
~                                                               bosh-dns-healthcheck       running        -        -
~                                                               bosh-dns-resolvconf        running        -        -
~                                                               broker                     running        -        -
~                                                               cluster_health_logger      running        -        -
~                                                               galera-healthcheck         running        -        -
~                                                               gra-log-purger-executable  running        -        -
~                                                               mariadb_ctrl               running        -        -
~                                                               pks-api                    running        -        -
~                                                               pks-nsx-t-osb-proxy        running        -        -
~                                                               telemetry-server           running        -        -
~                                                               uaa                        running        -        -

1 instances

Deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'

Instance                                           Process                          Process State  AZ         IPs
apply-addons/8cd3b226-5327-46ee-9e98-0fa2a900c857  -                                -              AZ-COMP-1  -
master/320749e3-aa1e-4710-a200-7beb67762c8a        -                                running        AZ-COMP-1  10.20.0.2
~                                                  blackbox                         running        -          -
~                                                  bosh-dns                         running        -          -
~                                                  bosh-dns-healthcheck             running        -          -
~                                                  bosh-dns-resolvconf              running        -          -
~                                                  etcd                             running        -          -
~                                                  kube-apiserver                   running        -          -
~                                                  kube-controller-manager          running        -          -
~                                                  kube-scheduler                   running        -          -
~                                                  ncp                              running        -          -
~                                                  pks-helpers-bosh-dns-resolvconf  running        -          -
worker/50384028-2098-4579-bb6e-5dfa5f36053c        -                                running        AZ-COMP-3  10.20.0.5
~                                                  blackbox                         running        -          -
~                                                  bosh-dns                         running        -          -
~                                                  bosh-dns-healthcheck             running        -          -
~                                                  bosh-dns-resolvconf              running        -          -
~                                                  docker                           running        -          -
~                                                  kube-proxy                       running        -          -
~                                                  kubelet                          running        -          -
~                                                  nsx-kube-proxy                   running        -          -
~                                                  nsx-node-agent                   running        -          -
~                                                  ovs-vswitchd                     running        -          -
~                                                  ovsdb-server                     running        -          -
~                                                  pks-helpers-bosh-dns-resolvconf  running        -          -
worker/820e841a-e121-437a-823b-1642468e4ae9        -                                running        AZ-COMP-1  10.20.0.3
~                                                  blackbox                         running        -          -
~                                                  bosh-dns                         running        -          -
~                                                  bosh-dns-healthcheck             running        -          -
~                                                  bosh-dns-resolvconf              running        -          -
~                                                  docker                           running        -          -
~                                                  kube-proxy                       running        -          -
~                                                  kubelet                          running        -          -
~                                                  nsx-kube-proxy                   running        -          -
~                                                  nsx-node-agent                   running        -          -
~                                                  ovs-vswitchd                     running        -          -
~                                                  ovsdb-server                     running        -          -
~                                                  pks-helpers-bosh-dns-resolvconf  running        -          -
worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47        -                                running        AZ-COMP-2  10.20.0.4
~                                                  blackbox                         running        -          -
~                                                  bosh-dns                         running        -          -
~                                                  bosh-dns-healthcheck             running        -          -
~                                                  bosh-dns-resolvconf              running        -          -
~                                                  docker                           running        -          -
~                                                  kube-proxy                       running        -          -
~                                                  kubelet                          running        -          -
~                                                  nsx-kube-proxy                   running        -          -
~                                                  nsx-node-agent                   running        -          -
~                                                  ovs-vswitchd                     running        -          -
~                                                  ovsdb-server                     running        -          -
~                                                  pks-helpers-bosh-dns-resolvconf  running        -          -

5 instances

Succeeded

 

bosh cloud-check

https://bosh.io/docs/cli-v2#cloud-check
https://bosh.io/docs/cck/

BOSH provides the Cloud Check CLI command (a.k.a. cck) to repair IaaS resources used by a specific deployment. It is not commonly used during normal operations; however, it becomes essential when some IaaS operations fail and the Director cannot resolve problems without a human decision, or when the Resurrector is not enabled.

The Resurrector will only try to recover any VMs that are missing from the IaaS or that have non-responsive agents. The cck tool is similar to the Resurrector in that it also looks for those two conditions; however, instead of automatically trying to resolve these problems, it provides several options to the operator.

In addition to looking for those two types of problems, cck also checks correct attachment and presence of persistent disks for each deployment job instance.

 bosh [OPTIONS] cloud-check [cloud-check-OPTIONS]
[cloud-check command options]
      -a, --auto         Resolve problems automatically
          --resolution=  Apply resolution of given type
      -r, --report       Only generate report; don't attempt to resolve problems
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 cck
Using environment '10.10.0.3' as client 'ops_manager'

Using deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'

Task 203

Task 203 | 18:44:14 | Scanning 4 VMs: Checking VM states (00:00:07)
Task 203 | 18:44:21 | Scanning 4 VMs: 4 OK, 0 unresponsive, 0 missing, 0 unbound (00:00:00)
Task 203 | 18:44:21 | Scanning 4 persistent disks: Looking for inactive disks (00:00:35)
Task 203 | 18:44:56 | Scanning 4 persistent disks: 4 OK, 0 missing, 0 inactive, 0 mount-info mismatch (00:00:00)

Task 203 Started  Sun Mar 31 18:44:14 UTC 2019
Task 203 Finished Sun Mar 31 18:44:56 UTC 2019
Task 203 Duration 00:00:42
Task 203 done

#  Type  Description

0 problems

Succeeded
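If cck does find problems, it prompts interactively for a resolution per problem. Using the options listed above, you can instead run a report-only scan, or let BOSH attempt to resolve everything automatically; a quick sketch:

$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 cck --report
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 cck --auto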

 

bosh events

https://bosh.io/docs/cli-v2#events

In addition to keeping a historical list of Director tasks for debugging, the Director keeps a detailed list of actions the user and the system took during its operation.

 bosh [OPTIONS] events [events-OPTIONS]
[events command options]
          --before-id=   Show events with ID less than the given ID
          --before=      Show events before the given timestamp (ex: 2016-05-08 17:26:32)
          --after=       Show events after the given timestamp (ex: 2016-05-08 17:26:32)
          --task=        Show events with the given task ID
          --instance=    Show events with given instance
          --event-user=  Show events with given user
          --action=      Show events with given action
          --object-type= Show events with given object type
          --object-name= Show events with given object name

Below is an excerpt, as by default 200 events are returned.

$ bosh events
1220          Wed Mar 27 16:26:21 UTC 2019  ops_manager     release  lock    lock:deployment:pivotal-container-service-5c365f595dc7e6c897d6          159  pivotal-container-service-5c365f595dc7e6c897d6         -                                                                   -             -
1219 <- 1047  Wed Mar 27 16:26:21 UTC 2019  ops_manager     run      errand  upgrade-all-service-instances                                           159  pivotal-container-service-5c365f595dc7e6c897d6         pivotal-container-service/4f44daf6-44d8-4d45-bef3-b0b71461705f (0)  exit_code: 0  -
1218          Wed Mar 27 16:26:06 UTC 2019  health_monitor  release  lock    lock:deployment:service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  167  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  -                                                                   -             -
1217          Wed Mar 27 16:26:05 UTC 2019  health_monitor  acquire  lock    lock:deployment:service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  167  service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  -                                                                   -             -
1216          Wed Mar 27 16:26:05 UTC 2019  health_monitor  create   alert   dd88d138-1a48-45b5-a02a-2ad54e54cc2c                                    -    service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258  -                                                                   message: 'Recreating unresponsive VM. Alert @ 2019-03-27 16:26:05 UTC, severity 4: Notifying Director to recreate instance: ''apply-addons/8cd3b226-5327-46ee-9e98-0fa2a900c857''; deployment: ''service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258''; 1 of 5 agents are unhealthy (20.0%)'  -
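
The filter options make this far more manageable than scrolling through 200 rows. For example, to list only the alerts the Health Monitor raised, or everything recorded against a single task (task 159 appears in the excerpt above):

$ bosh events --event-user=health_monitor --object-type=alert
$ bosh events --task=159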

 

bosh ssh

https://bosh.io/docs/cli-v2#ssh

SSH into instance(s)

bosh [OPTIONS] ssh [ssh-OPTIONS] [INSTANCE-GROUP[/INSTANCE-ID]]
[ssh command options]
      -c, --command=                      Command
          --opts=                         Options to pass through to SSH
      -r, --results                       Collect results into a table instead of streaming
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 ssh master
Using environment '10.10.0.3' as client 'ops_manager'

Using deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'

Task 213. Done
Unauthorized use is strictly prohibited. All access and activity
is subject to logging and monitoring.
Welcome to Ubuntu 16.04.5 LTS (GNU/Linux 4.15.0-43-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

Last login: Sun Mar 31 19:12:39 2019 from 10.10.0.2
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

master/320749e3-aa1e-4710-a200-7beb67762c8a:~$
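
You don't always need an interactive shell; the -c flag runs a one-off command on an instance (or every instance in a group) and -r collects the output into a table. A sketch reusing the same deployment (monit's full path is given because the non-interactive session won't have root's PATH; on BOSH VMs the executable lives at /var/vcap/bosh/bin/monit):

$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 ssh worker -c 'sudo /var/vcap/bosh/bin/monit summary' -r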

 

monit

Monit is not a bosh command, but it is best mentioned here as it is typically run after bosh ssh'ing onto an instance. On any BOSH-managed VM, you can check the status of release jobs' processes via the Monit CLI. Before you can run the command you have to switch to the root user (via sudo su), since the Monit executable is only available to root.

Each enabled release job has its own directory under /var/vcap/jobs/. Each release job directory contains a monit file (e.g. /var/vcap/jobs/redis-server/monit) with the final Monit configuration for that release job. This is how you can tell which processes belong to which release job. Most release jobs only start a single process.
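
For example, on the master VM from the session above you can list the enabled release jobs and peek at one job's monit file (kube-apiserver is one of the jobs shown in the monit summary below):

master/320749e3-aa1e-4710-a200-7beb67762c8a:~$ ls /var/vcap/jobs/
master/320749e3-aa1e-4710-a200-7beb67762c8a:~$ cat /var/vcap/jobs/kube-apiserver/monit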

You can also get more detailed information about individual processes via monit status.

While debugging why a certain process is failing, it is usually useful to tell Monit to stop restarting it. You can do so with monit stop <process-name>; to start it back up, use monit start <process-name>.
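
For example, to stop Monit from restarting kube-apiserver while you dig into it, and to start it again once done (as root):

master/320749e3-aa1e-4710-a200-7beb67762c8a:~# monit stop kube-apiserver
master/320749e3-aa1e-4710-a200-7beb67762c8a:~# monit start kube-apiserver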

monit summary
master/320749e3-aa1e-4710-a200-7beb67762c8a:~$ sudo su
master/320749e3-aa1e-4710-a200-7beb67762c8a:/var/vcap/bosh_ssh/bosh_94f00b3b4a94422# monit summary
The Monit daemon 5.2.5 uptime: 4d 4h 3m

Process 'kube-apiserver'            running
Process 'kube-controller-manager'   running
Process 'kube-scheduler'            running
Process 'etcd'                      running
Process 'blackbox'                  running
Process 'ncp'                       running
Process 'bosh-dns'                  running
Process 'bosh-dns-resolvconf'       running
Process 'bosh-dns-healthcheck'      running
Process 'pks-helpers-bosh-dns-resolvconf' running
System 'system_localhost'           running
monit status
master/320749e3-aa1e-4710-a200-7beb67762c8a:/var/vcap/bosh_ssh/bosh_94f00b3b4a94422# monit status
The Monit daemon 5.2.5 uptime: 4d 4h 5m

Process 'kube-apiserver'
  status                            running
  monitoring status                 monitored
  pid                               9836
  parent pid                        1
  uptime                            4d 4h 5m
  children                          0
  memory kilobytes                  548300
  memory kilobytes total            548300
  memory percent                    13.5%
  memory percent total              13.5%
  cpu percent                       2.4%
  cpu percent total                 2.4%
  data collected                    Sun Mar 31 19:19:59 2019

Process 'kube-controller-manager'
  status                            running
  monitoring status                 monitored
  pid                               9902
  parent pid                        1
  uptime                            4d 4h 5m
  children                          0
  memory kilobytes                  106600
  memory kilobytes total            106600
  memory percent                    2.6%
  memory percent total              2.6%
  cpu percent                       2.4%
  cpu percent total                 2.4%
  data collected                    Sun Mar 31 19:19:59 2019

Process 'kube-scheduler'
  status                            running
  monitoring status                 monitored
  pid                               9958
  parent pid                        1
  uptime                            4d 4h 5m
  children                          0
  memory kilobytes                  36072
  memory kilobytes total            36072
  memory percent                    0.8%
  memory percent total              0.8%
  cpu percent                       0.4%
  cpu percent total                 0.4%
  data collected                    Sun Mar 31 19:19:59 2019

Process 'etcd'
  status                            running
  monitoring status                 monitored
  pid                               10016
  parent pid                        1
  uptime                            4d 4h 5m
  children                          1
  memory kilobytes                  3212
  memory kilobytes total            202712
  memory percent                    0.0%
  memory percent total              5.0%
  cpu percent                       0.0%
  cpu percent total                 0.9%
  data collected                    Sun Mar 31 19:19:59 2019

Process 'blackbox'
  status                            running
  monitoring status                 monitored
  pid                               10051
  parent pid                        1
  uptime                            4d 4h 5m
  children                          0
  memory kilobytes                  14028
  memory kilobytes total            14028
  memory percent                    0.3%
  memory percent total              0.3%
  cpu percent                       0.4%
  cpu percent total                 0.4%
  data collected                    Sun Mar 31 19:19:59 2019

Process 'ncp'
  status                            running
  monitoring status                 monitored
  pid                               10129
  parent pid                        1
  uptime                            4d 4h 5m
  children                          2
  memory kilobytes                  2776
  memory kilobytes total            73748
  memory percent                    0.0%
  memory percent total              1.8%
  cpu percent                       0.0%
  cpu percent total                 0.9%
  data collected                    Sun Mar 31 19:19:59 2019

Process 'bosh-dns'
  status                            running
  monitoring status                 monitored
  pid                               9489
  parent pid                        1
  uptime                            4d 4h 5m
  children                          0
  memory kilobytes                  24636
  memory kilobytes total            24636
  memory percent                    0.6%
  memory percent total              0.6%
  cpu percent                       0.4%
  cpu percent total                 0.4%
  data collected                    Sun Mar 31 19:19:59 2019

Process 'bosh-dns-resolvconf'
  status                            running
  monitoring status                 monitored
  pid                               9384
  parent pid                        1
  uptime                            4d 4h 5m
  children                          0
  memory kilobytes                  3760
  memory kilobytes total            3760
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sun Mar 31 19:19:59 2019

Process 'bosh-dns-healthcheck'
  status                            running
  monitoring status                 monitored
  pid                               9405
  parent pid                        1
  uptime                            4d 4h 5m
  children                          0
  memory kilobytes                  9740
  memory kilobytes total            9740
  memory percent                    0.2%
  memory percent total              0.2%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sun Mar 31 19:19:59 2019

Process 'pks-helpers-bosh-dns-resolvconf'
  status                            running
  monitoring status                 monitored
  pid                               9384
  parent pid                        1
  uptime                            4d 4h 5m
  children                          0
  memory kilobytes                  3760
  memory kilobytes total            3760
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sun Mar 31 19:19:59 2019

System 'system_localhost'
  status                            running
  monitoring status                 monitored
  load average                      [0.27] [0.44] [0.43]
  cpu                               3.9%us 4.4%sy 0.7%wa
  memory usage                      1107804 kB [27.4%]
  swap usage                        524 kB [0.0%]
  data collected                    Sun Mar 31 19:19:59 2019
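
If a process shows up here as failing or not monitored, its logs are the next stop. BOSH release jobs write their logs under /var/vcap/sys/log/<job-name>/ on the VM, so something like the following usually tells you why a process won't stay up (exact file names vary per job, hence the ls first):

master/320749e3-aa1e-4710-a200-7beb67762c8a:~# ls /var/vcap/sys/log/kube-apiserver/
master/320749e3-aa1e-4710-a200-7beb67762c8a:~# tail -f /var/vcap/sys/log/kube-apiserver/*.log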

 

bosh clean-up

https://bosh.io/docs/cli-v2#clean-up

Removes unused releases, stemcells, orphaned disks, etc. With the --all flag everything unused is removed; without it, the most recent versions of resources are kept.

 bosh [OPTIONS] clean-up --all
$ bosh clean-up --all
Using environment '10.10.0.3' as client 'ops_manager'

Continue? [yN]: y

Task 215

Task 215 | 19:25:50 | Deleting releases: docker/35.0.0
Task 215 | 19:25:50 | Deleting releases: pks-nsx-t/1.19.0
Task 215 | 19:25:50 | Deleting releases: nsx-cf-cni/2.3.1.10693410
Task 215 | 19:25:50 | Deleting releases: pks-telemetry/2.0.0-build.113
Task 215 | 19:25:50 | Deleting releases: sink-resources-release/0.1.15
Task 215 | 19:25:50 | Deleting releases: kubo/0.25.9
Task 215 | 19:25:50 | Deleting packages: pks-nsx-t-cli/f8541c3ab3a3ddd8720ae4337cfda9d72fd9e9f2
Task 215 | 19:25:50 | Deleting jobs: docker/9106f2911a920980d955bda7bcdd42996e6a9b4c48374c293dff034caf2a1892
Task 215 | 19:25:50 | Deleting packages: sink-agent/d093297b1d5a3cadefaea61eee38ef924a9cf429
Task 215 | 19:25:50 | Deleting packages: cni/5502f221857b523ce73c29c1de24135f6d883b18
Task 215 | 19:25:50 | Deleting packages: billing-recorder/c4a73cafaedb4598eeaa302be97eb2c8f763bee7
Task 215 | 19:25:50 | Deleting jobs: docker/9106f2911a920980d955bda7bcdd42996e6a9b4c48374c293dff034caf2a1892 (00:00:00)
Task 215 | 19:25:50 | Deleting packages: pks-nsx-t-cli/f8541c3ab3a3ddd8720ae4337cfda9d72fd9e9f2 (00:00:00)
Task 215 | 19:25:50 | Deleting packages: cni/5502f221857b523ce73c29c1de24135f6d883b18 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: pks-nsx-t-ncp-cleanup/901e0de4fdb24bcf248b60a8bc8df8a8aed36ca5
Task 215 | 19:25:51 | Deleting packages: sink-agent/d093297b1d5a3cadefaea61eee38ef924a9cf429 (00:00:01)
Task 215 | 19:25:51 | Deleting packages: ncp_rootfs/1cf38e69b6a232870627e9b237df1521d27ff630
Task 215 | 19:25:51 | Deleting packages: billing-recorder/c4a73cafaedb4598eeaa302be97eb2c8f763bee7 (00:00:01)
Task 215 | 19:25:51 | Deleting releases: sink-resources-release/0.1.15 (00:00:01)
Task 215 | 19:25:51 | Deleting packages: kubernetes/6d07c3c855858410b4d50cde2dcb216f460387e9aa8258780bdd961934b2b257
Task 215 | 19:25:51 | Deleting releases: docker/35.0.0 (00:00:01)
Task 215 | 19:25:51 | Deleting packages: send-crd/cef4ac64394b103b8512ccc6396178d8f55a5108
Task 215 | 19:25:51 | Deleting packages: pks-nsx-t-ncp-cleanup/901e0de4fdb24bcf248b60a8bc8df8a8aed36ca5 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: pks-nsx-t-osb-proxy/b881c15160571d6b22e30df88729307904af3ffe
Task 215 | 19:25:51 | Deleting packages: ncp_rootfs/1cf38e69b6a232870627e9b237df1521d27ff630 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: nsx-cni/b2d11aeb068c1f8b11093746041503521a4f5645
Task 215 | 19:25:51 | Deleting packages: kubernetes/6d07c3c855858410b4d50cde2dcb216f460387e9aa8258780bdd961934b2b257 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: send-crd/cef4ac64394b103b8512ccc6396178d8f55a5108 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: send-to-vac/075751c4f6e1e9de76ad12558743d681300008c9
Task 215 | 19:25:51 | Deleting packages: pks-nsx-t-osb-proxy/b881c15160571d6b22e30df88729307904af3ffe (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: kube-apiserver/792816434466dceb68402952aea817f09b489f85e56365b6e01fcf8cbb42981e
Task 215 | 19:25:51 | Deleting packages: nsx-cni/b2d11aeb068c1f8b11093746041503521a4f5645 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: nsx-cni-common/bf71c6d46900c360ddd6e0ec5c25ea7f4149ca0d
Task 215 | 19:25:51 | Deleting jobs: pks-nsx-t-osb-proxy/868d40563f38b7c34f25f0975032faeb7925213a
Task 215 | 19:25:51 | Deleting packages: send-to-vac/075751c4f6e1e9de76ad12558743d681300008c9 (00:00:00)
Task 215 | 19:25:51 | Deleting packages: telemetry-agent-image/e852395d1dba4700757f6917fe8c794372fe6548
Task 215 | 19:25:51 | Deleting jobs: kube-apiserver/792816434466dceb68402952aea817f09b489f85e56365b6e01fcf8cbb42981e (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: kube-controller-manager/bedc72c949209b3baf286d17604f0d90307f32d285481946f5ff28a0fd49db24
Task 215 | 19:25:51 | Deleting packages: nsx-cni-common/bf71c6d46900c360ddd6e0ec5c25ea7f4149ca0d (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: nsx-kube-proxy/1203863af9df1a2088b9080968009000ac0fb228
Task 215 | 19:25:51 | Deleting jobs: pks-nsx-t-osb-proxy/868d40563f38b7c34f25f0975032faeb7925213a (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: pks-nsx-t-ops-files/0e188d77bd76af84cbebe883a6744ca21fec71f7
Task 215 | 19:25:51 | Deleting jobs: kube-controller-manager/bedc72c949209b3baf286d17604f0d90307f32d285481946f5ff28a0fd49db24 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: kubelet/c949e7cfa12980e770765b6b6b9d83d007cef7f1e4721487f730e044e1ec6f3e
Task 215 | 19:25:51 | Deleting packages: telemetry-agent-image/e852395d1dba4700757f6917fe8c794372fe6548 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: nsx-kube-proxy/1203863af9df1a2088b9080968009000ac0fb228 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: ncp/e015499c0e150d82d1f4d6cc18131665c6086cf0
Task 215 | 19:25:51 | Deleting jobs: telemetry-server/373c0ae8ade1198bfc7782e0d95aa98fdd48b3ce
Task 215 | 19:25:51 | Deleting jobs: pks-nsx-t-ops-files/0e188d77bd76af84cbebe883a6744ca21fec71f7 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: kubelet/c949e7cfa12980e770765b6b6b9d83d007cef7f1e4721487f730e044e1ec6f3e (00:00:00)
Task 215 | 19:25:51 | Deleting releases: pks-nsx-t/1.19.0 (00:00:01)
Task 215 | 19:25:51 | Deleting jobs: ncp/e015499c0e150d82d1f4d6cc18131665c6086cf0 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: nsx-node-agent/44920bf6488385cc9e276b183b088e74d47bbe57
Task 215 | 19:25:51 | Deleting jobs: telemetry-server/373c0ae8ade1198bfc7782e0d95aa98fdd48b3ce (00:00:00)
Task 215 | 19:25:51 | Deleting releases: kubo/0.25.9 (00:00:01)
Task 215 | 19:25:51 | Deleting releases: pks-telemetry/2.0.0-build.113 (00:00:01)
Task 215 | 19:25:51 | Deleting jobs: nsx-node-agent/44920bf6488385cc9e276b183b088e74d47bbe57 (00:00:00)
Task 215 | 19:25:51 | Deleting jobs: openvswitch/07485c61b5aaf8872db4b96e5e67341cb3ad6867 (00:00:00)
Task 215 | 19:25:51 | Deleting releases: nsx-cf-cni/2.3.1.10693410 (00:00:01)
Task 215 | 19:25:51 | Deleting orphaned disks: disk-25e22704-32cc-4b7d-8a32-8e2cc6a7d720
Task 215 | 19:25:51 | Deleting orphaned disks: disk-123f922e-b311-4dc6-9352-2c8568c25ee8 (00:00:20)
Task 215 | 19:26:15 | Deleting orphaned disks: disk-25e22704-32cc-4b7d-8a32-8e2cc6a7d720 (00:00:24)
Task 215 | 19:26:15 | Deleting dns blobs: DNS blobs (00:00:00)

Task 215 Started  Sun Mar 31 19:25:50 UTC 2019
Task 215 Finished Sun Mar 31 19:26:15 UTC 2019
Task 215 Duration 00:00:25
Task 215 done

Succeeded

 

bosh update-resurrection

https://bosh.io/docs/cli-v2#update-resurrection

Enables or disables resurrection globally. Note, this state is not reflected in the bosh instances command’s Resurrection column.

A quick recap on the Resurrector: it is a plugin to the Health Monitor and is responsible for automatically recreating VMs that become inaccessible. The Resurrector continuously cross-references the VMs expected to be running against the VMs that are sending heartbeats. When it does not receive heartbeats for a VM for a certain period of time, it kicks off a task on the Director (a scan-and-fix task) to try to "resurrect" that VM. The Director may then do one of two things: 1) create a new VM if the old VM is missing from the IaaS, or 2) replace the VM if the Agent on it is not responding to commands.

 bosh [OPTIONS] update-resurrection on|off
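
A typical use is switching resurrection off before planned IaaS maintenance, so the Resurrector doesn't recreate VMs you took down on purpose, and switching it back on afterwards:

$ bosh update-resurrection off
$ bosh update-resurrection on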

 

bosh locks

https://bosh.io/docs/cli-v2/#locks

Lists current locks

bosh [OPTIONS] locks
$ bosh locks
Using environment '10.10.0.3' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Type        Resource                                               Task ID  Expires at
deployment  service-instance_c655c028-d671-4a81-be5b-5bc23652426a  510      Thu Feb 21 00:01:14 UTC 2019

1 locks
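
Locks are normally released when the task holding them finishes; if one is stuck behind a hung task, cancelling that task (its ID is shown in the Task ID column above) usually frees it:

$ bosh cancel-task 510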

 

bosh errands

https://bosh.io/docs/cli-v2#errands
https://bosh.io/docs/errands/

Lists all errands defined by the deployment.

 bosh [OPTIONS] errands
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 errands
Using environment '10.10.0.3' as client 'ops_manager'

Using deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'

Name
apply-addons
apply-specs
drain-cluster
smoke-tests
telemetry-agent
wavefront-proxy-errand

6 errands

Succeeded

 

bosh run-errand

https://bosh.io/docs/cli-v2#run-errand
https://bosh.io/docs/errands/

Runs an errand job by name

bosh [OPTIONS] run-errand [run-errand-OPTIONS] NAME
[run-errand command options]
          --instance=INSTANCE-GROUP[/INSTANCE-ID]    Instance or group the errand should run on (must specify errand by release job name)
          --keep-alive                               Use existing VM to run an errand and keep it after completion
          --when-changed                             Run errand only if errand configuration has changed or if the previous run was unsuccessful
          --download-logs                            Download logs
          --logs-dir=                                Destination directory for logs (default: .)
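
For example, to run the smoke-tests errand listed earlier against a cluster deployment and download its logs to the current directory:

$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 run-errand smoke-tests --download-logs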

 

bosh logs

https://bosh.io/docs/cli-v2#logs

Download logs

 bosh [OPTIONS] logs [logs-OPTIONS] [INSTANCE-GROUP[/INSTANCE-ID]]
[logs command options]
          --dir=                          Destination directory (default: .)
      -f, --follow                        Follow logs via SSH
          --num=                          Last number of lines
      -q, --quiet                         Suppresses printing of headers when multiple files are being examined
          --job=                          Limit to only specific jobs
          --only=                         Filter logs (comma-separated)
          --agent                         Include only agent logs
$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 logs
Using environment '10.10.0.3' as client 'ops_manager'

Using deployment 'service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258'

Task 220

Task 220 | 20:39:28 | Fetching logs for worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1): Finding and packing log files
Task 220 | 20:39:28 | Fetching logs for worker/820e841a-e121-437a-823b-1642468e4ae9 (0): Finding and packing log files
Task 220 | 20:39:28 | Fetching logs for worker/50384028-2098-4579-bb6e-5dfa5f36053c (2): Finding and packing log files
Task 220 | 20:39:28 | Fetching logs for master/320749e3-aa1e-4710-a200-7beb67762c8a (0): Finding and packing log files
Task 220 | 20:39:33 | Fetching logs for worker/820e841a-e121-437a-823b-1642468e4ae9 (0): Finding and packing log files (00:00:05)
Task 220 | 20:39:33 | Fetching logs for worker/50384028-2098-4579-bb6e-5dfa5f36053c (2): Finding and packing log files (00:00:05)
Task 220 | 20:39:34 | Fetching logs for worker/cc4be208-4de3-4aac-85ca-4f6aaf38ee47 (1): Finding and packing log files (00:00:06)
Task 220 | 20:40:06 | Fetching logs for master/320749e3-aa1e-4710-a200-7beb67762c8a (0): Finding and packing log files (00:00:38)
Task 220 | 20:40:07 | Fetching group of logs: Packing log files together

Task 220 Started  Sun Mar 31 20:39:28 UTC 2019
Task 220 Finished Sun Mar 31 20:40:07 UTC 2019
Task 220 Duration 00:00:39
Task 220 done

Downloading resource '5fb2e8de-584c-49e9-8b25-0a7a22ce6d03' to '/home/ubuntu/service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258-20190331-203823-446588011.tgz'...

##########################################################    96.52% 24.17 MiB/s
Succeeded
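
When you want to watch logs rather than collect them, -f streams them over SSH instead of downloading a tarball; combined with --num it behaves much like tail -f. A sketch against the master instance from this deployment:

$ bosh -d service-instance_695d3952-3e1e-47d7-8354-0c6ca35ae258 logs master --follow --num=100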

 

There are many other bosh commands, but the above are the ones I use most often. And so that wraps up part 1 of my troubleshooting series.
