Installing Diego feedback


Mike Heath
 

On Wed, Jul 1, 2015 at 2:46 PM Eric Malm <emalm(a)pivotal.io> wrote:

Hi, Mike,

Thanks for the feedback! Responses inline below.

On Tue, Jun 30, 2015 at 5:05 PM, Mike Heath <elcapo(a)gmail.com> wrote:

I just got Diego successfully integrated and deployed in my Cloud Foundry
dev environment. Here's a bit of feedback.

One of the really nice features of BOSH is that you can set a property
once and any job that needs that property can consume it. Unfortunately,
the Diego release takes this beautiful feature and throws it out the
window. The per-job namespaced properties suck. Sure, this would be easier
if I were using Spiff, but our existing deployments don't use Spiff. Unless
Spiff is the only supported option for using the Diego BOSH release, the
Diego release properties need to be fixed to avoid the mass duplication, and
properties that match up with properties in cf-release should be renamed. I
spent more time matching up duplicate properties than anything else, which
is unfortunate since BOSH should have relieved me of this pain.

We intentionally decided to namespace these component properties very
early on in the development of diego-release: initially everything was
collapsed, as it is in cf-release, and then when we integrated against
cf-release deployments and their manifests, we ended up with some property
collisions, especially with etcd. Consequently, we took the opposite tack
and scoped all those properties to the individual diego components to keep
them decoupled. I've generally found it helpful to think of them as 'input
slots' to each specific job, with the authoritative input value coming from
some other source (often a cf-release property), but as you point out that
can be painful and error-prone without another tool such as spiff to
propagate the values. As we explore how we might reorganize parts of
cf-release and diego-release into more granular releases designed for
composition, and as BOSH links emerge to give us richer semantics about how
to flow property information between jobs, we'll iterate on these patterns.
As an immediate workaround, you could also use YAML anchors and aliases to
propagate those values in your hand-crafted manifest.

So, I certainly like the idea of namespacing Diego-specific properties. The
job level granularity is excessive though. cf-release is also very old so a
lot of its properties could be rethought/reorganized. Just warn us when you
make changes. :)

And yeah, I'm already using anchors and aliases all over.




SSH Proxy doesn't support 2048 bit RSA keys. I get this error:

{"timestamp":"1435189129.986424685","source":"ssh-proxy","message":"ssh-proxy.failed-to-parse-host-key","log_level":3,"data":{"error":"crypto/rsa: invalid exponents","trace":"goroutine 1 [running]:\ngithub.com/pivotal-golang/lager.(*logger).Fatal(0xc2080640c0, 0x8eba10, 0x18, 0x7fa802383b00, 0xc20802ad80, 0x0, 0x0, 0x0)\n\t/var/vcap/packages/ssh_proxy/src/github.com/pivotal-golang/lager/logger.go:131 +0xc8\nmain.configure(0x7fa8023886e0, 0xc2080640c0, 0x7fa8023886e0, 0x0, 0x0)\n\t/var/vcap/packages/ssh_proxy/src/github.com/cloudfoundry-incubator/diego-ssh/cmd/ssh-proxy/main.go:167 +0xacb\nmain.main()\n\t/var/vcap/packages/ssh_proxy/src/github.com/cloudfoundry-incubator/diego-ssh/cmd/ssh-proxy/main.go:75 +0xb4\n"}}

1024-bit keys work just fine.
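
For context on the error text: to the best of our understanding, Go's crypto/rsa reports "invalid exponents" from its private-key validation when the product of the public and private exponents is not congruent to 1 modulo p-1 for some prime factor p. That condition does not depend on key size, so it usually points at a damaged or mis-encoded key rather than at 2048-bit keys as such. A minimal Python sketch of that check follows; the helper name and the toy numbers are illustrative assumptions and are not taken from the failing key.

```python
def exponents_valid(primes, e, d):
    """Return True when (d * e) % (p - 1) == 1 holds for every prime factor p."""
    return all((d * e) % (p - 1) == 1 for p in primes)


# Toy key with textbook-sized primes (illustrative only, far too small for real use).
p, q = 61, 53
e = 17
d = 2753  # inverse of e modulo (p - 1) * (q - 1) = 3120

good = exponents_valid([p, q], e, d)
bad = exponents_valid([p, q], e, d + 1)  # simulates a corrupted private exponent
print(good, bad)
```

A key whose stored exponent or primes were altered, for example by a copy-and-paste or YAML quoting mistake in the manifest, would fail the same test regardless of its bit length.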

The *.cc.external_port properties should have a default value (9022) just
like cc.external_port does in the cloud_controller_ng job in cf-release.

In the receptor job, there's a property diego.receptor.nats.username but
every other job (in cf-release and diego-release) uses nats.user rather
than nats.username.

We could standardize on nats.user everywhere (the route-emitter needs
these NATS properties, too, and it also currently uses nats.username). I
also think it makes sense to supply that default CC port in the job specs
and to make sure our spiff templates supply overrides from the cf manifest
correctly. I'll add a story to straighten these out.


Rather than deploy two etcd jobs, I'm just using the etcd job provided by
cf-release. Is there a reason not to do this? Everything appears to be
working fine. I haven't yet run the DATs though.

I agree with Matt: these two etcd clusters will soon become operationally
distinct as we secure access to Diego's internal etcd. I don't believe
anything will currently collide in the keyspace, but we also can't make
strong guarantees about that.

Thanks for the clarification. If anything's colliding in the keyspace, I
haven't found it yet. :) I'll fix my deployment.




Consul is great and all but in my dev environment the Consul server
crashed a couple of times and it took a while to discover that the reason
CF crapped out was because Consul DNS lookups were broken. Is Consul a
strategic solution or is it just a stopgap until BOSH Links are ready? (I
would prefer removing Consul in favor of BOSH links, for the record.)

So far, Consul has provided us with a level of dynamic DNS-based service
discovery beyond what it sounds like BOSH links can: for example, if one of
the receptors is down for some reason, it's removed from the
consul-provided DNS entries in a matter of seconds. That said, we're also
exploring other options to provide that type of service discovery, such as
etcd-backed SkyDNS.

Yeah, that makes sense. I suppose I'm used to everything in cf-release
going through the Gorouter for automatic fail-over. Thanks for the response.



Thanks,
Eric
_______________________________________________
cf-dev mailing list
cf-dev(a)lists.cloudfoundry.org
https://lists.cloudfoundry.org/mailman/listinfo/cf-dev


Eric Malm <emalm@...>
 

Hi, Mike,

Thanks for the feedback! Responses inline below.

On Tue, Jun 30, 2015 at 5:05 PM, Mike Heath <elcapo(a)gmail.com> wrote:

I just got Diego successfully integrated and deployed in my Cloud Foundry
dev environment. Here's a bit of feedback.

One of the really nice features of BOSH is that you can set a property
once and any job that needs that property can consume it. Unfortunately,
the Diego release takes this beautiful feature and throws it out the
window. The per-job namespaced properties suck. Sure, this would be easier
if I were using Spiff, but our existing deployments don't use Spiff. Unless
Spiff is the only supported option for using the Diego BOSH release, the
Diego release properties need to be fixed to avoid the mass duplication, and
properties that match up with properties in cf-release should be renamed. I
spent more time matching up duplicate properties than anything else, which
is unfortunate since BOSH should have relieved me of this pain.

We intentionally decided to namespace these component properties very early
on in the development of diego-release: initially everything was collapsed,
as it is in cf-release, and then when we integrated against cf-release
deployments and their manifests, we ended up with some property collisions,
especially with etcd. Consequently, we took the opposite tack and scoped
all those properties to the individual diego components to keep them
decoupled. I've generally found it helpful to think of them as 'input
slots' to each specific job, with the authoritative input value coming from
some other source (often a cf-release property), but as you point out that
can be painful and error-prone without another tool such as spiff to
propagate the values. As we explore how we might reorganize parts of
cf-release and diego-release into more granular releases designed for
composition, and as BOSH links emerge to give us richer semantics about how
to flow property information between jobs, we'll iterate on these patterns.
As an immediate workaround, you could also use YAML anchors and aliases to
propagate those values in your hand-crafted manifest.
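
The "input slots" idea can be sketched in a few lines of Python. This only illustrates the propagation step that spiff or YAML anchors would perform; it is not how diego-release or spiff is implemented. The `fan_out` helper, the sample values, and the `diego.example_job` namespace are hypothetical, while `diego.receptor.nats.username`, `nats.user`, and `cc.external_port` (default 9022) are the names mentioned in this thread.

```python
from copy import deepcopy


def get_path(tree, dotted):
    """Read tree['a']['b'] for dotted == 'a.b'."""
    node = tree
    for key in dotted.split("."):
        node = node[key]
    return node


def set_path(tree, dotted, value):
    """Write tree['a']['b'] = value for dotted == 'a.b', creating dicts as needed."""
    keys = dotted.split(".")
    node = tree
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def fan_out(properties, slots):
    """Return a copy of properties with every namespaced slot filled from its source."""
    result = deepcopy(properties)
    for slot, source in slots.items():
        set_path(result, slot, get_path(properties, source))
    return result


# Authoritative values, set once (sample values are made up).
shared = {"nats": {"user": "nats-user"}, "cc": {"external_port": 9022}}

# Per-job input slot -> authoritative source property.
SLOTS = {
    "diego.receptor.nats.username": "nats.user",
    "diego.example_job.cc.external_port": "cc.external_port",  # hypothetical job
}

manifest_properties = fan_out(shared, SLOTS)
print(manifest_properties["diego"]["receptor"]["nats"]["username"])
```

Keeping the slot-to-source mapping in one table makes the duplication explicit and reviewable, which is roughly the role anchors and aliases play in a hand-written manifest.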


SSH Proxy doesn't support 2048 bit RSA keys. I get this error:

{"timestamp":"1435189129.986424685","source":"ssh-proxy","message":"ssh-proxy.failed-to-parse-host-key","log_level":3,"data":{"error":"crypto/rsa: invalid exponents","trace":"goroutine 1 [running]:\ngithub.com/pivotal-golang/lager.(*logger).Fatal(0xc2080640c0, 0x8eba10, 0x18, 0x7fa802383b00, 0xc20802ad80, 0x0, 0x0, 0x0)\n\t/var/vcap/packages/ssh_proxy/src/github.com/pivotal-golang/lager/logger.go:131 +0xc8\nmain.configure(0x7fa8023886e0, 0xc2080640c0, 0x7fa8023886e0, 0x0, 0x0)\n\t/var/vcap/packages/ssh_proxy/src/github.com/cloudfoundry-incubator/diego-ssh/cmd/ssh-proxy/main.go:167 +0xacb\nmain.main()\n\t/var/vcap/packages/ssh_proxy/src/github.com/cloudfoundry-incubator/diego-ssh/cmd/ssh-proxy/main.go:75 +0xb4\n"}}

1024-bit keys work just fine.

The *.cc.external_port properties should have a default value (9022) just
like cc.external_port does in the cloud_controller_ng job in cf-release.

In the receptor job, there's a property diego.receptor.nats.username but
every other job (in cf-release and diego-release) uses nats.user rather
than nats.username.

We could standardize on nats.user everywhere (the route-emitter needs these
NATS properties, too, and it also currently uses nats.username). I also
think it makes sense to supply that default CC port in the job specs and to
make sure our spiff templates supply overrides from the cf manifest
correctly. I'll add a story to straighten these out.


Rather than deploy two etcd jobs, I'm just using the etcd job provided by
cf-release. Is there a reason not to do this? Everything appears to be
working fine. I haven't yet run the DATs though.

I agree with Matt: these two etcd clusters will soon become operationally
distinct as we secure access to Diego's internal etcd. I don't believe
anything will currently collide in the keyspace, but we also can't make
strong guarantees about that.


Consul is great and all but in my dev environment the Consul server
crashed a couple of times and it took a while to discover that the reason
CF crapped out was because Consul DNS lookups were broken. Is Consul a
strategic solution or is it just a stopgap until BOSH Links are ready? (I
would prefer removing Consul in favor of BOSH links, for the record.)

So far, Consul has provided us with a level of dynamic DNS-based service
discovery beyond what it sounds like BOSH links can: for example, if one of
the receptors is down for some reason, it's removed from the
consul-provided DNS entries in a matter of seconds. That said, we're also
exploring other options to provide that type of service discovery, such as
etcd-backed SkyDNS.

Thanks,
Eric


Mike Heath
 

I have created and used quite a few BOSH releases and the property
namespacing feels very odd to me. I'm very curious to understand the
reasoning behind it.

I can't reproduce my 2048-bit key problem. It must have been some odd fluke
on my end.

Thanks for the response!

-Mike

On Tue, Jun 30, 2015 at 8:37 PM Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:

Thanks for the feedback. I'll let others comment on the bosh aspects other
than to say that we are expecting people to use spiff to generate the
manifests and that the decision to namespace properties was intentional.

For the SSH proxy, it absolutely does support 2048 bit RSA keys so I'm not
sure why you ran into a problem. Our bosh-lite template uses a 2014 bit key
and we have tests that use 1024 and 2048 bit keys in CI. If you want to dig
into that, please open an issue.

As for consul, it's TBD whether or not it becomes a strategic solution but
it offers capabilities above and beyond bosh links. We kicked off some work
today to look at recreating the health checks and DNS resolution with a
SkyDNS + etcd solution. If that looks promising, we'll probably go in that
direction.

On the etcd side, it's probably best not to share the two for now. Diego
is in the process of enabling mutual auth over SSL - something that
probably won't be done in cf-release any time soon.



Matthew Sykes <matthew.sykes@...>
 

Thanks for the feedback. I'll let others comment on the bosh aspects other
than to say that we are expecting people to use spiff to generate the
manifests and that the decision to namespace properties was intentional.

For the SSH proxy, it absolutely does support 2048 bit RSA keys so I'm not
sure why you ran into a problem. Our bosh-lite template uses a 2014 bit key
and we have tests that use 1024 and 2048 bit keys in CI. If you want to dig
into that, please open an issue.

As for consul, it's TBD whether or not it becomes a strategic solution but
it offers capabilities above and beyond bosh links. We kicked off some work
today to look at recreating the health checks and DNS resolution with a
SkyDNS + etcd solution. If that looks promising, we'll probably go in that
direction.

On the etcd side, it's probably best not to share the two for now. Diego is
in the process of enabling mutual auth over SSL - something that probably
won't be done in cf-release any time soon.

--
Matthew Sykes
matthew.sykes(a)gmail.com


Mike Heath
 

I just got Diego successfully integrated and deployed in my Cloud Foundry
dev environment. Here's a bit of feedback.

One of the really nice features of BOSH is that you can set a property once
and any job that needs that property can consume it. Unfortunately, the
Diego release takes this beautiful feature and throws it out the window.
The per-job namespaced properties suck. Sure, this would be easier if I
were using Spiff, but our existing deployments don't use Spiff. Unless Spiff
is the only supported option for using the Diego BOSH release, the Diego
release properties need to be fixed to avoid the mass duplication, and
properties that match up with properties in cf-release should be renamed. I
spent more time matching up duplicate properties than anything else, which
is unfortunate since BOSH should have relieved me of this pain.

SSH Proxy doesn't support 2048 bit RSA keys. I get this error:

{"timestamp":"1435189129.986424685","source":"ssh-proxy","message":"ssh-proxy.failed-to-parse-host-key","log_level":3,"data":{"error":"crypto/rsa: invalid exponents","trace":"goroutine 1 [running]:\ngithub.com/pivotal-golang/lager.(*logger).Fatal(0xc2080640c0, 0x8eba10, 0x18, 0x7fa802383b00, 0xc20802ad80, 0x0, 0x0, 0x0)\n\t/var/vcap/packages/ssh_proxy/src/github.com/pivotal-golang/lager/logger.go:131 +0xc8\nmain.configure(0x7fa8023886e0, 0xc2080640c0, 0x7fa8023886e0, 0x0, 0x0)\n\t/var/vcap/packages/ssh_proxy/src/github.com/cloudfoundry-incubator/diego-ssh/cmd/ssh-proxy/main.go:167 +0xacb\nmain.main()\n\t/var/vcap/packages/ssh_proxy/src/github.com/cloudfoundry-incubator/diego-ssh/cmd/ssh-proxy/main.go:75 +0xb4\n"}}

1024-bit keys work just fine.

The *.cc.external_port properties should have a default value (9022) just
like cc.external_port does in the cloud_controller_ng job in cf-release.

In the receptor job, there's a property diego.receptor.nats.username but
every other job (in cf-release and diego-release) uses nats.user rather
than nats.username.

Rather than deploy two etcd jobs, I'm just using the etcd job provided by
cf-release. Is there a reason not to do this? Everything appears to be
working fine. I haven't yet run the DATs though.

Consul is great and all but in my dev environment the Consul server crashed
a couple of times and it took a while to discover that the reason CF
crapped out was because Consul DNS lookups were broken. Is Consul a
strategic solution or is it just a stopgap until BOSH Links are ready? (I
would prefer removing Consul in favor of BOSH links, for the record.)

-Mike