John Wong

Hi Dieu

Thank you for the answers. They are very helpful.

Regarding #4, you are right. I believe that when I do a CF deployment I get
short-lived VMs that compile the different CF jobs.

Regarding #5, I think it is doppler in our latest deployment (v193, which I
know is still behind the most current version). I think syslog only applies
to very old CF versions (the documentation mentions a syslog loggregator).

So we probably don't need to worry about syslog then.

It seems these are the ones we can run with >= 2 instances:
Log traffic controller
NFS (use s3 in our case)
Postgres (use RDS in our case)

These are the ones not to run with > 1 instance:

Not sure:
stats server (metron agent?)
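
To check that I have read the answers quoted below correctly, here is a rough Python sketch of the instance-count guidance as I understand it. The job names are only illustrative labels (they may not match the names in a real manifest), and the check_instances helper is hypothetical, not part of BOSH or CF.

```python
# Instance-count guidance as I read it from the answers in this thread.
# Job names are illustrative labels, not necessarily real manifest names.
GUIDANCE = {
    "collector": "single",  # not recommended to run more than 1
    "clock": "single",      # only a single instance is necessary
    "hm9000": "multiple",   # can have multiple active instances
}


def check_instances(planned):
    """Return warnings for jobs whose planned count conflicts with GUIDANCE.

    `planned` maps a job label to the number of instances I intend to run.
    Jobs that are not in GUIDANCE are ignored, since the thread gives no
    instance-count rule for them.
    """
    warnings = []
    for job, count in planned.items():
        if GUIDANCE.get(job) == "single" and count > 1:
            warnings.append(f"{job}: {count} instances planned, guidance is 1")
    return warnings


if __name__ == "__main__":
    for line in check_instances({"collector": 2, "clock": 1, "hm9000": 2}):
        print(line)
```

With the counts above, only the collector would be flagged; api_workers and the "not sure" items are left out on purpose because no count is given for them.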



On Wed, May 6, 2015 at 2:27 AM, Dieu Cao <dcao(a)> wrote:

1) I'll ask our doc team to clarify the title of the section.
It's not recommended to run more than 1 collector. This component
collects metrics from system components. We use it in combination with
Datadog to monitor the many components of Cloud Foundry. This component is
not strictly required for an HA system.

2) HM9000 can have multiple active instances. No need for a standby mode.

3) The Cloud Controller clock periodically schedules Cloud Controller
clean up tasks for app usage events, audit events, failed jobs, and more.
Only a single instance of this job is necessary.

4) Likely the job called api_workers is actually the Cloud Controller
workers. These are not compilation VMs.
Cloud Controller workers process background tasks submitted via clients
of the API.

5) I'm not sure what you mean by this. Do you mean loggregator or doppler?

Is up to

1) Why is the collector listed with 1 instance but in the scalable processes table?

2) How do you run a second Health Manager in standby mode if only 1 can
run at any time?

3) Do we still need the clock job? Is it also 1 instance?

4) I notice I have a job called api_workers, and I believe that's a
compilation machine. I run two of these 24x7; is that necessary? The doc
says it is active when we need to compile things (say, deploying a new
release). Is that all? I don't think they handle application code.

5) What about syslog? Can it have 2? I understand we have to choose what
to make HA and what not to. I am not sure "the BOSH resurrector will recover
the VM if it becomes non-responsive" convinces me, because all of these jobs
are deployed with BOSH, and if BOSH is down I am facing an outage. I know
Dr. Nic has an article about HA BOSH.

Correct me if I am wrong.



