Re: App running even after delete. Pointers on finding it and debugging?


Tom Sherrod <tom.sherrod@...>
 

Thank you, Eric.
Your question about the basic_auth_username and basic_auth_password
properties prompted me to review the manifest. Those values were not
correct, and there was also a typo in the cc_uploader.cc.base_url property.
I've made the corrections.
My Diego and CF manifests likely got out of sync between versions. I only
use generate_manifest occasionally; I'll need to run it again to get the
versions back in sync.

Thanks,
Tom

On Fri, Apr 8, 2016 at 9:29 PM, Eric Malm <emalm(a)pivotal.io> wrote:

Thanks, Tom. The errors about the 401 response code make me suspect that
the nsync-bulker doesn't have the correct basic-auth credentials for the
internal app-enumeration endpoint it queries on CC. Could you check whether
the diego.nsync.cc.basic_auth_username
and diego.nsync.cc.basic_auth_password properties in your Diego manifest
are the same as the cc.internal_api_user and cc.internal_api_password
properties in your CF manifest? There was also a previous pair of CF/Diego
release versions where those properties had different defaults for the user
names in the job specs, but I believe they match in CF v230 and Diego
v0.1450.0.
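A minimal sketch of the check Eric describes. It assumes the `properties` sections of both manifests have already been parsed into Python dicts (for example with a YAML library, not shown) and that the values sit at the dotted paths named above. `lookup` and `credential_mismatches` are hypothetical helpers, not part of any Cloud Foundry tooling.

```python
from collections.abc import Mapping
from typing import Any

# Pairs of dotted property paths that should hold identical values:
# (Diego manifest path, CF manifest path), as named in this thread.
CREDENTIAL_PAIRS = [
    ("diego.nsync.cc.basic_auth_username", "cc.internal_api_user"),
    ("diego.nsync.cc.basic_auth_password", "cc.internal_api_password"),
]


def lookup(properties: Mapping[str, Any], dotted_path: str) -> Any:
    """Walk a nested dict along a dotted path; return None if a key is missing."""
    node: Any = properties
    for key in dotted_path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def credential_mismatches(diego_props: Mapping[str, Any],
                          cf_props: Mapping[str, Any]) -> list[str]:
    """Describe every credential pair that is missing or differs.

    Only property names are reported, so secrets never end up in output.
    """
    problems = []
    for diego_path, cf_path in CREDENTIAL_PAIRS:
        diego_value = lookup(diego_props, diego_path)
        cf_value = lookup(cf_props, cf_path)
        if diego_value is None or cf_value is None:
            problems.append(f"missing: {diego_path} or {cf_path}")
        elif diego_value != cf_value:
            problems.append(f"mismatch: {diego_path} != {cf_path}")
    return problems
```

An empty list means the nsync-bulker should be presenting the same basic-auth credentials that CC expects; any entry points at the kind of manifest drift that would produce 401 responses.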

Best,
Eric

On Fri, Apr 8, 2016 at 5:33 PM, Tom Sherrod <tom.sherrod(a)gmail.com> wrote:

Yes, the logs are there.
I grepped the logs for "error" and see a lot of lines like these:

{"timestamp":"1460161846.254659176","source":"nsync-bulker","message":"nsync-bulker.sync.not-bumping-freshness-because-of","log_level":2,"data":{"error":"invalid response code 401","session":"6713"}}
{"timestamp":"1460161876.286883593","source":"nsync-bulker","message":"nsync-bulker.sync.not-bumping-freshness-because-of","log_level":2,"data":{"error":"invalid response code 401","session":"6714"}}
{"timestamp":"1460161906.315121412","source":"nsync-bulker","message":"nsync-bulker.sync.not-bumping-freshness-because-of","log_level":2,"data":{"error":"invalid response code 401","session":"6715"}}
{"timestamp":"1460161936.352133274","source":"nsync-bulker","message":"nsync-bulker.sync.not-bumping-freshness-because-of","log_level":2,"data":{"error":"invalid response code 401","session":"6716"}}
{"timestamp":"1460161966.383990765","source":"nsync-bulker","message":"nsync-bulker.sync.not-bumping-freshness-because-of","log_level":2,"data":{"error":"invalid response code 401","session":"6717"}}
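A quick way to summarize such logs, sketched in Python under the assumption that each log record is a single JSON object on its own line. `summarize_bulker_errors` is a hypothetical helper; pass it any iterable of lines, such as an open log file.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def summarize_bulker_errors(lines):
    """Tally the "error" field of nsync-bulker JSON log lines.

    Returns (counts, first_seen, last_seen); the times are UTC datetimes,
    or None when no error records were found.
    """
    counts = Counter()
    stamps = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and anything that is not a JSON record
        record = json.loads(line)
        error = record.get("data", {}).get("error")
        if error is None:
            continue  # a record that carries no error
        counts[error] += 1
        # The timestamp is a string holding Unix epoch seconds.
        stamps.append(float(record["timestamp"]))
    if not stamps:
        return counts, None, None
    first = datetime.fromtimestamp(min(stamps), tz=timezone.utc)
    last = datetime.fromtimestamp(max(stamps), tz=timezone.utc)
    return counts, first, last
```

In the excerpt above, consecutive entries are about 30 seconds apart and all carry the same 401 error, which suggests every sync pass was failing rather than an occasional glitch.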


Let me know if there's anything specific you'd like me to look for.


Tom

On Fri, Apr 8, 2016 at 11:47 AM, Eric Malm <emalm(a)pivotal.io> wrote:

Thanks, Tom, glad you were able to use veritas to find and remove the
stray apps. I'd like to know how they remained present in the first place.
Do you have logs from the nsync-bulker jobs on the cc_bridge VMs in your
deployment? That BOSH job has the responsibility of updating the Diego
DesiredLRPs to match the current set of CF apps, so if there are
synchronization errors they should be present in those logs.
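To make the failure mode concrete, here is a simplified, hypothetical model of one bulk-sync pass; it is not nsync's actual code, and `CCFetchError`, `SyncPlan`, and `plan_sync` are illustrative names. It decides which DesiredLRPs to create and delete by set difference, and when the fetch from CC fails it plans no changes, which is consistent with the `not-bumping-freshness-because-of` errors in the bulker logs and with deleted apps staying up.

```python
from dataclasses import dataclass, field


class CCFetchError(Exception):
    """Hypothetical error raised when the app list cannot be read from CC."""


@dataclass
class SyncPlan:
    to_create: set[str] = field(default_factory=set)
    to_delete: set[str] = field(default_factory=set)
    bump_freshness: bool = False


def plan_sync(fetch_cc_process_guids, bbs_process_guids):
    """Compare CC's desired apps with existing DesiredLRPs (simplified model).

    fetch_cc_process_guids: zero-argument callable returning the set of
        process GUIDs CC wants running; raises CCFetchError on failure.
    bbs_process_guids: set of process GUIDs that already have DesiredLRPs.
    """
    try:
        desired = fetch_cc_process_guids()
    except CCFetchError:
        # No authoritative list from CC: we cannot tell which DesiredLRPs
        # are stale, so plan no changes and do not mark the data as fresh.
        return SyncPlan()
    return SyncPlan(
        to_create=desired - bbs_process_guids,
        to_delete=bbs_process_guids - desired,
        bump_freshness=True,
    )
```

The model illustrates failing safe: deleting every DesiredLRP whenever CC cannot be read would take down healthy apps, so a stale entry for a deleted app lingers instead.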

Thanks,
Eric, CF Runtime Diego PM

On Fri, Apr 8, 2016 at 8:09 AM, Kris Hicks <khicks(a)pivotal.io> wrote:

It would be nice to figure out the root cause here.

Does having two crashed and two running apps have some significance as to
why the delete failed even though it appeared successful?

On Friday, April 8, 2016, Tom Sherrod <tom.sherrod(a)gmail.com> wrote:

Thank you.

Veritas is quite informative. I found 2 apps running and 2 crashed.
I deleted them and all appears well.


On Mon, Apr 4, 2016 at 7:03 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

OK, I would use veritas to look at the Diego BBS and confirm that it
still thinks the app is there. You can also go onto the router and query
its HTTP endpoint to confirm that the route you're seeing is also still
there: https://github.com/cloudfoundry/gorouter#instrumentation.
Lastly, I would connect to the CCDB and confirm that the app and route are
*not* there. This will narrow the problem down to figuring out why Diego
isn't being told that the deleted app is no longer desired.
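A minimal sketch of the router check, assuming the `/routes` payload described in the gorouter instrumentation docs: a JSON object mapping each route to a list of backends. The exact backend entry shape has varied between releases (plain `host:port` strings or objects with an `address` key), so both are handled. The status URL, port, and credentials passed to `fetch_routes` are deployment-specific assumptions, and the hostnames in the test are hypothetical.

```python
import base64
import json
import urllib.request


def route_backends(routes_json, hostname):
    """Return the backend addresses the router holds for `hostname`.

    Assumes the payload maps each route to a list of endpoints, where an
    endpoint is either a "host:port" string or an object with "address".
    """
    table = json.loads(routes_json)
    backends = []
    for endpoint in table.get(hostname, []):
        if isinstance(endpoint, dict):
            backends.append(endpoint.get("address"))
        else:
            backends.append(endpoint)
    return backends


def fetch_routes(status_url, user, password):
    """Fetch the raw routing table using HTTP basic auth (values are yours)."""
    request = urllib.request.Request(status_url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode()
```

A non-empty result for the deleted app's hostname, combined with no matching row in the CCDB, points at stale Diego data rather than at CC.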

On Mon, Apr 4, 2016 at 3:47 PM, Tom Sherrod <tom.sherrod(a)gmail.com>
wrote:

The route still exists. I was reluctant to delete it while the "app"
was still running; I wanted some way to track it down. Keeping it hasn't
helped, though, other than confirming that the app is still running.

I pushed the app again under a different name/host with no problems,
and it runs as it should.

On Mon, Apr 4, 2016 at 6:17 PM, Amit Gupta <agupta(a)pivotal.io>
wrote:

Tom,

So you're saying that none of the org/spaces shows the app or the
route, but the app continues to run and be routeable?

I could imagine this happening if some CC Bridge components are not
able to talk to either CC or Diego BBS, leaving the data in the Diego BBS
stale. In the case of stale info, Diego may not know that the LRP is no
longer desired, and it will do the safe thing of keeping it around, and
emitting its route to the gorouter, which just does what it's told (it
doesn't check whether CC knows about the route or not).

Are you able to push new apps or delete other apps with the Diego
backend?

Amit

On Fri, Apr 1, 2016 at 1:00 PM, Tom Sherrod <tom.sherrod(a)gmail.com>
wrote:

JT,

Thanks for responding.

This is a small test runtime. I checked all orgs and spaces and found
no routes matching the app.

I found the route information; here is the result:

{
  "total_results": 0,
  "total_pages": 1,
  "prev_url": null,
  "next_url": null,
  "resources": []
}

To learn what the output looks like, I checked existing routes both with
and without apps bound. The output matches that of a route with no app,
as if the app had been deleted.

Even now, the app URL still returns a page from the app, even
though the app has been deleted.

Thanks,

Tom

On Fri, Apr 1, 2016 at 1:52 PM, JT Archie <jarchie(a)pivotal.io>
wrote:

Tom,

Are you sure the route isn't bound to another application in
another org/space?

When you run `cf routes`, it only shows routes for the current
space. You can, however, hit specific API endpoints to get all the apps
for a route.

For example, `cf curl /v2/routes/89fc2a5e-3a9b-4a88-a360-e405cdbd6f87/apps`
will show all the apps for a particular route. Replace the route ID with
the correct one; to find it, I recommend running `CF_TRACE=true cf routes`
and grabbing the ID from the trace output.

Let's see if you can hunt it down that way.
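A small sketch for reading that response, assuming the standard CC v2 list shape seen in Tom's output (`resources`, `next_url`) and that each resource carries the app name under `entity.name`. The function name is hypothetical; feed it the body printed by `cf curl /v2/routes/<route-guid>/apps`.

```python
import json


def app_names_from_route_apps(response_body):
    """Return (app_names, next_url) from a CC v2 /v2/routes/<guid>/apps body.

    Only this page is read; a non-null next_url means more pages exist and
    would have to be fetched with another `cf curl` call.
    """
    page = json.loads(response_body)
    names = [resource["entity"]["name"] for resource in page["resources"]]
    return names, page.get("next_url")
```

An empty list with no `next_url`, as in Tom's result, means CC has no app bound to the route in any org or space, which points the search away from CC and toward Diego.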

Kind Regards,

JT

On Fri, Apr 1, 2016 at 8:51 AM, Tom Sherrod <tom.sherrod(a)gmail.com> wrote:

cf 230, diego 0.1450.0, etcd 27, garden-linux 0.330.0
default_to_diego_backend: true (apps run on Diego by default).

A developer deployed a Java application and then deleted it with
`cf delete <app>`. No errors were reported.
The app still responds; the only thing left is the route.
I've not encountered this before. Delete has always meant delete, and
even if the route remains, the router returns "404 Not Found: Requested
route ('<hostname.domain>') does not exist."

Any pointers on tracking this down would be appreciated.

Tom

Join cf-dev@lists.cloudfoundry.org to automatically receive all group messages.