App deployment hangs in legacy CF installation
John Wong
Hi.
We are running an extremely old version of CF (we are in the process of building a new installation based on the latest release), so I know there may be very little the community can do to help. But regardless, let me give it a try.

In my debug session, I tried to deploy a hello world app, and the deployment stopped at "STARTED" and eventually timed out. The full log: https://gist.githubusercontent.com/yeukhon/666fa1936ef3473c6de6/raw/1f662b86e806ab1fff230f5558f4942d9785c584/gistfile1.txt

I can easily reproduce this when I do two concurrent pushes. Sometimes they go through, sometimes they don't. We have looked at every log in CF and we don't have any leads. I ran bosh restart JOB hoping it was caused by a slow process, but that did not help. I found ntp was not installed on some of the components (we have now installed ntp on all of the DEAs), and I found the clocks were not synced, so I synced them, but that did not help either.

Any idea where I should look? I thought about our EC2 instance health, but all of the instances seem to be healthy. I am considering relaunching (bosh recreate) one component at a time. The one thing I did notice is that I am constantly deploying to the same couple of DEAs. I will look into them, but I am not sure...

Any ideas will be appreciated.

Thanks.
John
James Bayer
Once you get to this line where you make the app started [1], the next step is that the Cloud Controller should send a NATS message targeted at the particular DEA selected to run the app. So you could monitor:
* NATS, to see if the CC sends the NATS message
* the DEA logs, to see if the DEA receives the message
* the DEA logs, to see if the DEA is able to react to the message once it receives it

We have had issues in the past where NATS client/server communication problems were addressed by restarting clients and servers, but it's been quite a while. Letting us know which cf-release you are using could help.

[1] https://gist.github.com/yeukhon/666fa1936ef3473c6de6#file-gistfile1-txt-L534
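For the first monitoring step, one way to watch NATS traffic without installing a client library is to speak the NATS text protocol directly over a socket. The following is only a minimal sketch under stated assumptions: the function names `parse_msg_header` and `watch` are made up for illustration, the host, port, and credentials have to come from your own deployment manifest, and it subscribes to the `>` wildcard (all subjects) because this thread does not confirm which subjects your CF version uses.

```python
import json
import socket


def parse_msg_header(line):
    """Parse a NATS 'MSG <subject> <sid> [reply-to] <#bytes>' control line."""
    parts = line.split()
    if len(parts) not in (4, 5) or parts[0].upper() != "MSG":
        raise ValueError("not a MSG line: %r" % line)
    subject, sid = parts[1], parts[2]
    reply_to = parts[3] if len(parts) == 5 else None
    return subject, sid, reply_to, int(parts[-1])


def watch(host, port=4222, user=None, password=None, subject=">"):
    """Print every message published on `subject` ('>' matches all subjects).

    Hypothetical helper: host, port, and credentials are assumptions that
    must be taken from your own NATS job configuration.
    """
    options = {"verbose": False, "pedantic": False}
    if user is not None:
        options["user"] = user
        options["pass"] = password
    with socket.create_connection((host, port)) as sock:
        stream = sock.makefile("rwb")
        stream.readline()  # the server greets each client with an INFO line
        stream.write(b"CONNECT " + json.dumps(options).encode() + b"\r\n")
        stream.write(b"SUB " + subject.encode() + b" 1\r\n")
        stream.flush()
        while True:
            line = stream.readline()
            if not line:
                break  # server closed the connection
            text = line.decode().strip()
            if text == "PING":
                # Answer keepalives so the server does not drop us.
                stream.write(b"PONG\r\n")
                stream.flush()
            elif text.upper().startswith("MSG "):
                subj, _sid, _reply, nbytes = parse_msg_header(text)
                # The payload is followed by a trailing CRLF.
                payload = stream.read(nbytes + 2)[:nbytes]
                print(subj, payload.decode(errors="replace"))
```

Running something like `watch("NATS_HOST", user="USER", password="PASSWORD")` while doing a push should show whether a start request for a DEA appears at all. If it does, the DEA logs are the next place to look; if it does not, the problem is more likely on the Cloud Controller or NATS side.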
On Mon, Jun 29, 2015 at 7:20 AM, John Wong <gokoproject(a)gmail.com> wrote:
> Hi.

--
Thank you,
James Bayer
John Wong
Hi James
Thanks for the info. My team and I greatly appreciate your time here. I believe we are running v153 (or close to that), which is very old. I will have a closer look at those components.

Another symptom we observe is that sometimes, even when an app deploys successfully, it crashes within a few minutes without any activity. What we see is a "socket closed on read" error (which IMO indicates the container was killed and the logger could not contact it).

John
On Mon, Jun 29, 2015 at 1:35 PM, James Bayer <jbayer(a)pivotal.io> wrote:
once you get to this line where you make the app started [1], then the