Re: cf v231: Issue with new webdav blobstore job


Marco Nicosia
 

On Tue, Mar 22, 2016 at 2:07 PM, Rich Wohlstadter <lethwin(a)gmail.com> wrote:

Hi,

We recently upgraded to cf v231 and switched from NFS to the new WebDAV
nginx service. We have one environment where the blobstore is very large.
The monit startup script for the blobstore job includes a recursive chown
of the blobstore disk (chown -R vcap:vcap $RUN_DIR $LOG_DIR $DATA).
Depending on the speed of our storage, that chown can take long enough
that monit decides the start has stalled and tries to start the job again.
The first start does finish, but the second one that monit launched in the
meantime fails to bind to port 80, the logs start showing bind errors, and
monit eventually gives up with "execution failed". Does that recursive
chown need to be there? I compared the blobstore job to the old debian nfs
job, and the nfs job only did a chown on the top-level
/var/vcap/store/shared directory. This causes us problems in this
environment whenever we need to update or restart that VM.

We solved a similar problem in MySQL by doing that work in a pre-start
<https://bosh.io/docs/pre-start.html> script. Timeouts don't apply to pre-
and post-start phases, so you can do lengthy transformations there.

Would that be a reasonable solution for the WebDAV transformation?

One caveat: those phases have no timeouts at all, so the release author is
responsible for any failure that results in an infinite hang.
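
To make the suggestion concrete, here is a rough sketch of what such a
pre-start script might look like. It is not the actual cf-release blobstore
job: the chown_with_limit helper, the STORE_DIR and CHOWN_LIMIT variables,
the one-hour default, and the use of /var/vcap/store/shared as the data
directory are all assumptions for illustration. Because pre-start has no
timeout of its own, the sketch wraps the chown in GNU timeout so a stuck
filesystem fails the script instead of hanging it.

```shell
#!/bin/bash
# Sketch of a pre-start script. Names, paths and limits are assumptions,
# not the actual cf-release blobstore job.

# Recursively chown the given directories, giving up after a time limit so
# a stuck filesystem cannot hang the pre-start phase forever.
# Usage: chown_with_limit OWNER SECONDS DIR...
chown_with_limit() {
  local owner="$1" limit="$2"
  shift 2
  # GNU timeout exits with status 124 if the limit is reached.
  timeout "$limit" chown -R "$owner" "$@"
}

# Assumed defaults; override through the environment.
STORE_DIR="${STORE_DIR:-/var/vcap/store/shared}"
CHOWN_LIMIT="${CHOWN_LIMIT:-3600}"

# Only act when the data directory is actually present on this VM.
if [ -d "$STORE_DIR" ]; then
  chown_with_limit vcap:vcap "$CHOWN_LIMIT" "$STORE_DIR" || exit 1
fi
```

With something like this in place, the recursive chown would be dropped
from the monit startup script, so monit only has to wait for nginx itself
to come up. The script would be shipped as a job template that the job spec
maps to bin/pre-start.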

--
Marco Nicosia
Product Manager
Pivotal Software, Inc.



Rich
