Re: HM9000 gets stuck in bad state


Amit Kumar Gupta
 

This is not a known issue. Can you copy full log lines (including
timestamps, other metadata, error message, etc.) and paste them into a Gist
or Pastebin, and share the link?

Also, can you get a session on one of your etcd nodes (so that etcd is
reachable via localhost) and share the output of the following queries?

curl http://localhost:4001/v2/keys/hm/v4/actual-fresh
curl http://localhost:4001/v2/keys/hm/v4/desired-fresh

Note, you might need a different version than v4. You can figure out the
correct version by querying:

curl http://localhost:4001/v2/keys/hm

When I do so, I get the output:

{"action":"get","node":{"key":"/hm","dir":true,"nodes":[{"key":"/hm/locks","dir":true,"modifiedIndex":5,"createdIndex":5},{"key":"/hm/v4","dir":true,"modifiedIndex":15,"createdIndex":15}],"modifiedIndex":5,"createdIndex":5}}

Note the part that says "key":"/hm/v4"; that's how you can determine
whether you need to query the v4 keys or those of some other version.
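
If it helps, the version lookup and the two freshness queries can be combined
into a small script. This is only a rough sketch: the function names
hm_store_version and hm_freshness are made up for illustration, and it assumes
etcd is listening on localhost:4001 and that the listing looks like the output
above (a single /hm/vN directory).

```shell
# Extract the store version (e.g. "v4") from an etcd v2 listing of /hm
# read on stdin. Assumes the listing contains a key of the form /hm/vN.
hm_store_version() {
  grep -o '"key":"/hm/v[0-9]*"' | head -n 1 | sed 's|.*"/hm/\(v[0-9]*\)"|\1|'
}

# Query both freshness keys for whichever version the store reports.
# Assumes etcd is reachable on localhost:4001, as in the queries above.
hm_freshness() {
  base="http://localhost:4001/v2/keys/hm"
  version=$(curl -s "$base" | hm_store_version)
  if [ -z "$version" ]; then
    echo "no /hm/vN key found under $base" >&2
    return 1
  fi
  curl -s "$base/$version/actual-fresh"; echo
  curl -s "$base/$version/desired-fresh"; echo
}
```

Running hm_freshness from a session on an etcd node should print the two
responses one after the other, which you can then paste into the Gist.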

On Wed, Oct 21, 2015 at 9:16 AM, kyle havlovitz <kylehav(a)gmail.com> wrote:

I'm having an issue where after a day or two of running, the health
manager is getting stuck in a bad state and doesn't display the state of
apps correctly. The logs show messages like this: "Store is not fresh -
Error:Actual and desired state are not fresh" and "Daemon returned an
error. Continuining... - Error:Actual and desired state are not fresh".

Restarting the process fixes the issue, but I'm wondering how to avoid
this problem altogether. Is this a known issue?
