Can a Bosh deployment be aborted when drain script returns non-zero exit code?


Carol Morneau
 

Hi Everybody,

Can someone explain what BOSH does when a drain script returns a non-zero exit code? From my testing, it appears to carry on with the deployment no matter what the exit code is.

Here's a drain-script I'd hope would abort a deployment:
#!/bin/bash
echo 0
exit 1

I would like to be able to abort a deployment when the drain-script returns a non-zero code. Is there a way I could achieve that?

Thanks for your help,
Carol


Dmitriy Kalinin
 

Currently, BOSH relies only on the script's stdout to determine success. Given this historic behaviour, it's a bit hard to migrate to the exit-code paradigm (as used by our other scripts) without unintentionally breaking a few drain scripts. It's definitely something we want to get done at some point, though.
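
To make the stdout-only rule concrete, here is a rough bash sketch. The function name interpret_drain_output is made up for illustration and is not part of BOSH; the real check lives in the agent's Go code, and this only approximates "stdout must be a signed integer, exit code ignored".

```shell
#!/bin/bash
# Hypothetical helper, not part of BOSH: mimics the rule described above,
# where only the drain script's stdout decides the outcome.
interpret_drain_output() {
  local out="$1"
  # Accept an optionally signed integer; anything else counts as a failure.
  if [[ "$out" =~ ^[+-]?[0-9]+$ ]]; then
    echo "ok"
  else
    echo "failed"
  fi
}

# The exit code of the inner script is deliberately ignored in both cases.
interpret_drain_output "$(bash -c 'echo 0; exit 1')"   # stdout is "0"
interpret_drain_output "$(bash -c 'exit 1')"           # stdout is empty
```

Under this approximation the first script counts as a success despite its exit code of 1, while a script that prints nothing counts as a failure.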


Carol Morneau
 

Thanks Dmitriy for your input.

Here's an example of a drain script that does abort the deployment:

#!/bin/bash
set -e
false


Dmitriy Kalinin
 

It most likely fails because of this:
https://github.com/cloudfoundry/bosh-agent/blob/master/agent/script/drain/concrete_script.go#L157


Carol Morneau
 

You're exactly right.

From /var/vcap/bosh/log:

2017-10-19_20:09:16.76704 [Cmd Runner] 2017/10/19 20:09:16 DEBUG - Stderr:
2017-10-19_20:09:16.76704 [Cmd Runner] 2017/10/19 20:09:16 DEBUG - Successful: false (1)
2017-10-19_20:09:16.76705 [ParallelScript] 2017/10/19 20:09:16 ERROR - '/var/vcap/jobs/containers/bin/drain' script has failed with error: Script did not return a signed integer: strconv.ParseInt: parsing "": invalid syntax
2017-10-19_20:09:16.76705 [Drain Action] 2017/10/19 20:09:16 DEBUG - Got a result
2017-10-19_20:09:16.76706 [Task Service] 2017/10/19 20:09:16 ERROR - Failed processing task #01b91a43-4f6e-4b52-5c65-0d7fd18d0b4d got: 1 of 1 drain scripts failed. Failed Jobs: containers.

Would you recommend a better way to fail a deployment?
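
For reference, one possible pattern, going only by the behaviour in the log above: write the reason to stderr and leave stdout empty, so the agent's integer parse fails. This is a sketch rather than a recommended BOSH practice. The ready_to_drain check and the DRAIN_BLOCK_FILE variable are invented for the example.

```shell
#!/bin/bash
# Sketch only: relies on the behaviour seen in the log above, where empty
# stdout makes the agent report the drain script as failed.

# Hypothetical pre-check; replace with whatever must hold before draining.
ready_to_drain() {
  [ ! -e "${DRAIN_BLOCK_FILE:-/tmp/do-not-drain}" ]
}

run_drain() {
  if ! ready_to_drain; then
    # The reason goes to stderr so that stdout stays empty.
    echo "drain refused: block file present" >&2
    return 1
  fi
  # An integer on stdout is what the agent parses; 0 matches the first
  # script in this thread.
  echo 0
}

run_drain
```

Compared with a bare "set -e; false", this at least leaves an explanation on stderr, which the agent log above shows being captured.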