Re: How to tune ETCD performance for bbs job


Jason Huang
 

Do you know what was used as the Diego store in the test reported below? Was
it etcd or a relational database? If the latter, which one?
https://content.pivotal.io/blog/250k-containers-in-production-a-real-test-for-the-real-world
It is reported that 250k containers were running in one environment and
performed well.

Thanks,

Jason

On Fri, Apr 28, 2017 at 7:38 AM, Eric Malm <emalm(a)pivotal.io> wrote:

Hi, Maggie,

You may be hitting some performance issues with regard to the load on the
Diego etcd versus the capabilities of the disks it's writing to. Adding
more etcd nodes doesn't improve performance because etcd is a consistent
system, so all of the nodes are active and writing changes to disk. Instead
of tuning the etcd resource configuration, I would recommend you migrate
your Diego deployment to a relational store (MySQL or Postgres): you're
already on Diego v1.0.0, which supports it, and Diego v1.2.0 and later
require you to migrate.
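
To make the point about cluster size concrete, here is a minimal sketch of the majority arithmetic behind it. It assumes only that etcd replicates through Raft, where a write is committed once a majority of members have persisted it. The helper names are hypothetical and are not part of etcd or Diego.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still available."""
    return members - quorum(members)


if __name__ == "__main__":
    for members in (1, 3, 5):
        # Every committed write must be persisted by at least quorum(members)
        # nodes, so adding members adds disk writes rather than removing them.
        print(members, quorum(members), tolerated_failures(members))
```

Going from one member to three buys tolerance of a single failure, but each write now waits on at least two disks instead of one, so slow disks stay the bottleneck.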

Best,
Eric, CF Diego PM

On Thu, Apr 27, 2017 at 7:39 PM, Meng, Xiangyi <Xiangyi.Meng(a)dell.com>
wrote:

Hi, Juan



Yes, I mean from DEA to Diego. Sorry for the typo.



Thanks,

Maggie



*From:* Juan Pablo Genovese [mailto:juanpgenovese(a)gmail.com]
*Sent:* Thursday, April 27, 2017 11:59 PM
*To:* Discussions about Cloud Foundry projects and the system overall. <
cf-dev(a)lists.cloudfoundry.org>
*Subject:* [cf-dev] Re: How to tune ETCD performance for bbs job



Maggie,



I'm not sure, but, do you mean migrating *from* DEA to Diego?



Thank you!!



2017-04-27 4:03 GMT-03:00 Meng, Xiangyi <Xiangyi.Meng(a)dell.com>:

Hi,



We are migrating our application from the Diego backend to the DEA backend.
Recently, however, we have experienced intermittent failures when pushing
applications or fetching application status. We found the following errors in
bbs.stdout.log:



*{"timestamp":"1493197381.010582685","source":"bbs","message":"bbs.request.tasks.tasks.etcd-error.unknown-error","log_level":2,"data":{"error":"501: All the given peers are not reachable (failed to propose on members [https://etcd.service.cf.internal:4001] twice [last error: Unexpected HTTP status code]) [0]","method":"POST","request":"/v1/tasks/list.r1","revision":0,"session":"1280311.1.1.1"}}*
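
For anyone triaging similar logs, the following is a rough sketch of how such a line could be parsed to pull out the failing request and the etcd error text. It assumes each line of bbs.stdout.log is a single JSON object shaped like the one above, with the timestamp given as epoch seconds in a string. The function names are hypothetical.

```python
import json
from datetime import datetime, timezone

# The log line quoted above, joined back into a single JSON object.
SAMPLE = (
    '{"timestamp":"1493197381.010582685","source":"bbs",'
    '"message":"bbs.request.tasks.tasks.etcd-error.unknown-error",'
    '"log_level":2,"data":{"error":"501: All the given peers are not '
    'reachable (failed to propose on members '
    '[https://etcd.service.cf.internal:4001] twice '
    '[last error: Unexpected HTTP status code]) [0]",'
    '"method":"POST","request":"/v1/tasks/list.r1","revision":0,'
    '"session":"1280311.1.1.1"}}'
)


def is_etcd_error(line: str) -> bool:
    """True if the line is valid JSON whose message mentions an etcd error."""
    try:
        entry = json.loads(line)
    except ValueError:
        return False  # not a JSON log line, ignore it
    return isinstance(entry, dict) and "etcd-error" in entry.get("message", "")


def summarize(line: str) -> dict:
    """Reduce one log line to the fields that matter for triage."""
    entry = json.loads(line)
    when = datetime.fromtimestamp(float(entry["timestamp"]), tz=timezone.utc)
    data = entry.get("data", {})
    return {
        "time": when.isoformat(),
        "message": entry["message"],
        "request": data.get("request"),
        "error": data.get("error"),
    }


if __name__ == "__main__":
    if is_etcd_error(SAMPLE):
        print(summarize(SAMPLE))
```

Running the same filter over the whole file and grouping by request would show whether the failures cluster around particular BBS endpoints or are spread evenly.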



There are also quite a lot of errors in etcd.stderr.log:



*etcdhttp: got unexpected response error (etcdserver: request timed out)*



So I added two more database jobs to form an etcd cluster. But I still
see the same error messages in the bbs log and the etcd log on the first
database job.



My understanding is that all etcd nodes should accept reads and writes, but
that only one bbs node accepts requests. Is that right? And why do the errors
appear only on the first database job?



My question is: what is the suggested configuration for the etcd cluster
and the bbs nodes? Do we have to use a relational database such as MySQL
instead of etcd?



Our env is CF 249 + Diego 1.0.0 + Etcd 86.



Any help would be appreciated.



Thanks,

Maggie





--

Mis mejores deseos,
Best wishes,
Meilleurs vœux,

Juan Pablo
------------------------------------------------------
http://www.jpgenovese.com
