Re: 3 etcd nodes don't work well in single zone
Tony
Hi Amit,
Here are the latest logs I collected from etcd and hm9k immediately after finishing the test. (I used scp instead of bosh logs to avoid missing anything.)

The zip file also contains a test folder. test-etcd.sh is a simple script I use: it runs cf app dora every second and records the responses in status.log; whenever the instance count changes, it records the change in variation.log.

In variation.log, you can see the instance count flip between 2/2 and ?/2 eight times within about 10 minutes:

Thu Jul 23 08:39:29 UTC 2015 instances: 2/2
Thu Jul 23 08:42:59 UTC 2015 instances: ?/2
Thu Jul 23 08:43:36 UTC 2015 instances: 2/2
Thu Jul 23 08:44:55 UTC 2015 instances: ?/2
Thu Jul 23 08:45:32 UTC 2015 instances: 2/2
Thu Jul 23 08:48:31 UTC 2015 instances: ?/2
Thu Jul 23 08:49:02 UTC 2015 instances: 2/2
Thu Jul 23 08:50:05 UTC 2015 instances: ?/2
Thu Jul 23 08:50:41 UTC 2015 instances: 2/2

The test started at "Thu Jul 23 08:39:29 UTC 2015", which is around "timestamp":1437640773, so I deleted most of the content before 143763… to keep the logs clear. I didn't delete any log entries after 1437640773. If the last line of a file (e.g. hm9000_sender.log) is before 1437640773, that just means the component hasn't logged anything since then.

I also found that at the moments the count changes, no error is recorded in the etcd log. So the problem seems to be in HM, though I'm not sure.

Regards,
Tony

From: Amit Gupta [via CF Dev] [mailto:ml-node+s70369n810h86(a)n6.nabble.com]
Sent: Wednesday, 22 July 2015 10:09 AM
To: Li, Tony
Subject: Re: [cf-dev] 3 etcd nodes don't work well in single zone

Hi Tony,

The logs you've retrieved only go back to Jul 21, which I can't correlate with the "?/2" issues you were seeing. If you could possibly record again a bunch of occurrences of flapping between "2/2" and "?/2" for an app (along with datetime stamps), and then immediately get logs from *all* the HM and etcd nodes (`bosh logs` only gets logs from one node at a time), I can try to dig in more.
It's important to get the logs from the HM and etcd VMs soon after recording the "?/2" events, otherwise BOSH may rotate/archive the logs and then make them harder to obtain.

Best,
Amit

On Tue, Jul 21, 2015 at 4:53 PM, Amit Gupta <[hidden email]> wrote:

You should definitely not run etcd with 2 instances. You can read more about recommended cluster sizes in the etcd docs:
https://github.com/coreos/etcd/blob/740187f199a12652ca1b7bddb7b3489160103d84/Documentation/admin_guide.md#fault-tolerance-table

I will look at the attached logs and get back to you, but wanted to make sure to advise you to run either 1 or 3 nodes. With 2, you can wedge the system, because it will need all nodes to be up to achieve quorum. If you roll one of the two nodes, it will not be able to rejoin the cluster, and the service will be stuck in an unavailable state.

-----
Amit, CF OSS Release Integration PM
Pivotal Software, Inc.
--
View this message in context: http://cf-dev.70369.x6.nabble.com/cf-dev-3-etcd-nodes-don-t-work-well-in-single-zone-tp746p809.html
Sent from the CF Dev mailing list archive at Nabble.com.
_______________________________________________
cf-dev mailing list
[hidden email]
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev
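For reference, the quorum arithmetic behind the advice to run 1 or 3 etcd nodes rather than 2 can be sketched as below. This is only an illustration of majority quorum as described in the linked fault-tolerance table; the function names are hypothetical and not part of etcd.

```python
def quorum(cluster_size: int) -> int:
    """Number of members that must be up for a majority (hypothetical helper)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def failures_tolerated(cluster_size: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return cluster_size - quorum(cluster_size)


# Compare the cluster sizes discussed in this thread.
for size in (1, 2, 3):
    print(f"{size} node(s): quorum={quorum(size)}, "
          f"failures tolerated={failures_tolerated(size)}")
```

With 2 nodes the majority is 2, so losing either node loses quorum; 3 nodes is the smallest size that tolerates one failure.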
logs.zip (103K) <http://cf-dev.70369.x6.nabble.com/attachment/847/0/logs.zip>
--
View this message in context: http://cf-dev.70369.x6.nabble.com/cf-dev-3-etcd-nodes-don-t-work-well-in-single-zone-tp746p847.html
Sent from the CF Dev mailing list archive at Nabble.com.
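For readers without the attachment, the polling approach described for test-etcd.sh at the top of this message could look roughly like the sketch below. It is written in Python rather than shell and is not the script from logs.zip. The function names, the regular expression, and the assumption that the `cf app` output contains a line of the form "instances: 2/2" are all illustrative; only the app name dora and the file names status.log and variation.log come from the message.

```python
import re
import subprocess
import time
from datetime import datetime, timezone

# Assumed output shape: a line such as "instances: 2/2" or "instances: ?/2".
INSTANCES_RE = re.compile(r"instances:\s*(\S+)")


def extract_instances(cf_output):
    """Return the instance field (e.g. '2/2' or '?/2'), or None if absent."""
    match = INSTANCES_RE.search(cf_output)
    return match.group(1) if match else None


def variations(samples):
    """Keep only the (timestamp, value) samples whose value differs from the previous one."""
    changes = []
    previous = object()  # sentinel: the first sample is always recorded
    for stamp, value in samples:
        if value != previous:
            changes.append((stamp, value))
            previous = value
    return changes


def poll(app="dora", interval=1.0,
         status_path="status.log", variation_path="variation.log"):
    """Run `cf app <app>` every `interval` seconds and log changes (runs until interrupted)."""
    previous = None
    while True:
        stamp = datetime.now(timezone.utc).strftime("%a %b %d %H:%M:%S UTC %Y")
        result = subprocess.run(["cf", "app", app],
                                capture_output=True, text=True)
        current = extract_instances(result.stdout)
        with open(status_path, "a") as status_log:
            status_log.write(f"{stamp}\n{result.stdout}\n")
        if current != previous:
            with open(variation_path, "a") as variation_log:
                variation_log.write(f"{stamp} instances: {current}\n")
            previous = current
        time.sleep(interval)
```

Calling `poll()` starts the loop; the two helper functions can be used on their own to post-process an existing status.log.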