Date   

Problem deploying basic Apps on PWS

Juan Antonio Breña Moral <bren at juanantonio.info...>
 

Hi,

I am trying to deploy an Application on PWS and I think that the problem is something related with Diego.
In previous weeks, I deployed without any problem to PWS but today, I receive the following message from CLI:

2015-10-29T15:32:18.28+0100 [HEALTH/0] OUT healthcheck failed
2015-10-29T15:32:18.29+0100 [HEALTH/0] OUT Exit status 1
2015-10-29T15:32:19.33+0100 [HEALTH/0] OUT healthcheck failed
2015-10-29T15:32:19.33+0100 [HEALTH/0] OUT Exit status 1
2015-10-29T15:32:19.33+0100 [CELL/0] ERR Timed out after 1m0s: health check
never passed.
2015-10-29T15:32:19.34+0100 [CELL/0] OUT Exit status 0
2015-10-29T15:32:19.34+0100 [APP/0] OUT Exit status 255
2015-10-29T15:32:19.42+0100 [API/2] OUT App instance exited with guid f658
33a-03d8-4191-a463-ddb9e35b7215 payload: {"instance"=>"56f0ebf2-62a0-4bf1-5bda-
b790d1af0f7", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"2 error(s)
ccurred:\n\n* Exited with status 1\n* 2 error(s) occurred:\n\n* cancelled\n* ca
celled", "crash_count"=>4, "crash_timestamp"=>1446129139379123812, "version"=>"
5bbb951-ebf7-4a58-a50f-e92e834d46ca"}

Current manifest is:

---
applications:
- name: psldeploymanager4
memory: 512MB
disk_quota: 512MB
instances: 1
command: node ./index.js
host: psldeploymanager4
path: .
buildpack: https://github.com/cloudfoundry/nodejs-buildpack

I am not sure if Diego needs some additional attribute. In local, the app runs nice and it was tested on a local instance and bluemix.

Any clue?

Many thanks in advance.

Juan Antonio


Re: Presentation of BOSH on OpenStack Tokyo Summit 2015

Michael Maximilien
 

About 66 responded to survey in the one week I kept it open.




Best,




Max




Sent from Mailbox

On Thu, Oct 29, 2015 at 7:07 PM, Aleksey Zalesov
<aleksey.zalesov(a)altoros.com> wrote:

Cool! Thank you for sharing
How many people took survey?
Alex Zalesov


Re: Presentation of BOSH on OpenStack Tokyo Summit 2015

Aleksey Zalesov
 

Cool! Thank you for sharing

How many people took survey?

Alex Zalesov


Re: Trouble enabling diego ssh in cf-release:222 diego:0.1437

Matthew Sykes <matthew.sykes@...>
 

I did the work on the cli plugin but not on the integration into the cli.
Based on your first error, it looked like we were having a problem getting
the one time code, not authenticating with the ssh proxy. The fact that
you're able to get the code from the UAA manually implies that piece is
working correctly.

The authorization code message could be related but, if it were, I'd expect
some evidence of that in the ssh proxy logs as well.

You can try to manually ssh using the instructions in the diego-ssh repo
[1] and see if you can isolate if the problem is on the cli side or the
server side.

[1]:
https://github.com/cloudfoundry-incubator/diego-ssh#cloud-foundry-via-cloud-controller-and-uaa

On Wed, Oct 28, 2015 at 9:57 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

I think I'm getting closer. In UAA I now get the error:

TokenEndpoint: Handling error: InvalidGrantException, Invalid
authorization code: ad1o9o

This must be someone trying to redeem the auth code.

Mike

On Wed, Oct 28, 2015 at 7:41 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

That curl command returns what appears to be the correct response:

curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" 'https://uaa.
{redacted}/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy'
{trim}
GET
/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy
HTTP/1.1
User-Agent: curl/7.38.0
Host: uaa.cf1-dev.lds.org
Accept: */*
Authorization: bearer {redacted}
< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Cache-Control: no-store
< Content-Language: en-US
< Content-Length: 0
< Date: Thu, 29 Oct 2015 01:32:08 GMT
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Location: http://uaa.{redacted}/login?code=huQG3t
< Pragma: no-cache
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< X-Cf-Requestid: 36f6b88e-f8a9-49f1-5f90-ef2b868c266d
< X-Content-Type-Options: nosniff
< X-Frame-Options: DENY
< X-Xss-Protection: 1; mode=block
< Content-Type: text/plain; charset=utf-8

I never see a call like this in my CF_TRACE.

Mike

On Wed, Oct 28, 2015 at 7:09 PM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:

That's not the request that the plugin is making to get the token. We're
using the API that was created for us [1].

If you use straight curl with something like this, what does the flow
really look like? Are there any errors in the uaa's logs?

$ curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" '
https://uaa.bosh-lite.com/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy
'

The UAA should respond with a 302 and a Location header that includes a
code parameter. If not, can you use a jwt decoder against your bearer token
and verify that there's a `uaa.user` scope in the token?

[1]:
https://github.com/cloudfoundry/uaa/blob/master/docs/UAA-APIs.rst#api-authorization-requests-code-get-oauth-authorize-non-standard-oauth-authorize

On Wed, Oct 28, 2015 at 8:56 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

In case it helps this is the CF_TRACE of the UAA call that the ssh
plugin is expecting to be a redirect.

REQUEST: [2015-10-28T17:25:11-06:00]
POST /oauth/token HTTP/1.1
Host: uaa.{redacted}
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/x-www-form-urlencoded
User-Agent: go-cli 6.12.3-5364935 / linux

grant_type=refresh_token&refresh_token={token redacted}&scope=

RESPONSE: [2015-10-28T17:25:12-06:00]
HTTP/1.1 200 OK
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Wed, 28 Oct 2015 23:25:12 GMT
Expires: 0
Pragma: no-cache
Pragma: no-cache
Server: Apache-Coyote/1.1
X-Cf-Requestid: 4a6ad262-07e6-48a8-4640-271996e9bf64
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Xss-Protection: 1; mode=block

a20
{"access_token":"[PRIVATE DATA
HIDDEN]","token_type":"bearer","refresh_token":"[PRIVATE DATA
HIDDEN]","expires_in":3599,"scope":"scim.userids cloud_controller.read
password.write cloud_controller.write openid doppler.firehose scim.read
cloud_controller.admin","jti":"00e07ad7-5090-42e9-8096-a542dffd6026"}
0

This is the ssh-proxy client info returned from a call to uaac clients:

ssh-proxy
scope: cloud_controller.read cloud_controller.write openid
resource_ids: none
authorized_grant_types: authorization_code refresh_token
redirect_uri: /login
autoapprove: true
action: none
authorities: uaa.none
lastmodified: 1446074693000


Mike

On Wed, Oct 28, 2015 at 6:47 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Yes /v2/info contains "app_ssh_oauth_client: "ssh-proxy"".

Though I didn't set it. It appears CC sets it by default now.


https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/spec#L81

Any other ideas?

Mike
On Oct 28, 2015 6:16 PM, "Matthew Sykes" <matthew.sykes(a)gmail.com>
wrote:

Does /v2/info contain the `app_ssh_auth_client` key? If not, it
should be set to the client ID of the ssh proxy. If it's not set, I think
that's one of the symptom.


https://github.com/cloudfoundry-incubator/diego-release/blob/develop/stubs-for-cf-release/enable_diego_ssh_in_cf.yml#L4

On Wed, Oct 28, 2015 at 7:36 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I'm working on upgrading to latest cf-release+diego and I'm having
trouble getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
authorized-grant-types: authorization_code
autoapprove: true
override: true
redirect-uri: /login
scope: openid,cloud_controller.read,cloud_controller.write
secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a 200.
It appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea why
UAA wouldn't be sending a redirect response from /oauth/token when the
plugin is expecting it?

Mike


--
Matthew Sykes
matthew.sykes(a)gmail.com

--
Matthew Sykes
matthew.sykes(a)gmail.com
--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: Trouble enabling diego ssh in cf-release:222 diego:0.1437

Mike Youngstrom
 

I think I'm getting closer. In UAA I now get the error:

TokenEndpoint: Handling error: InvalidGrantException, Invalid authorization
code: ad1o9o

This must be someone trying to redeem the auth code.

Mike

On Wed, Oct 28, 2015 at 7:41 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

That curl command returns what appears to be the correct response:

curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" 'https://uaa.
{redacted}/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy'
{trim}
GET
/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy
HTTP/1.1
User-Agent: curl/7.38.0
Host: uaa.cf1-dev.lds.org
Accept: */*
Authorization: bearer {redacted}
< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Cache-Control: no-store
< Content-Language: en-US
< Content-Length: 0
< Date: Thu, 29 Oct 2015 01:32:08 GMT
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Location: http://uaa.{redacted}/login?code=huQG3t
< Pragma: no-cache
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< X-Cf-Requestid: 36f6b88e-f8a9-49f1-5f90-ef2b868c266d
< X-Content-Type-Options: nosniff
< X-Frame-Options: DENY
< X-Xss-Protection: 1; mode=block
< Content-Type: text/plain; charset=utf-8

I never see a call like this in my CF_TRACE.

Mike

On Wed, Oct 28, 2015 at 7:09 PM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:

That's not the request that the plugin is making to get the token. We're
using the API that was created for us [1].

If you use straight curl with something like this, what does the flow
really look like? Are there any errors in the uaa's logs?

$ curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" '
https://uaa.bosh-lite.com/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy
'

The UAA should respond with a 302 and a Location header that includes a
code parameter. If not, can you use a jwt decoder against your bearer token
and verify that there's a `uaa.user` scope in the token?

[1]:
https://github.com/cloudfoundry/uaa/blob/master/docs/UAA-APIs.rst#api-authorization-requests-code-get-oauth-authorize-non-standard-oauth-authorize

On Wed, Oct 28, 2015 at 8:56 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

In case it helps this is the CF_TRACE of the UAA call that the ssh
plugin is expecting to be a redirect.

REQUEST: [2015-10-28T17:25:11-06:00]
POST /oauth/token HTTP/1.1
Host: uaa.{redacted}
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/x-www-form-urlencoded
User-Agent: go-cli 6.12.3-5364935 / linux

grant_type=refresh_token&refresh_token={token redacted}&scope=

RESPONSE: [2015-10-28T17:25:12-06:00]
HTTP/1.1 200 OK
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Wed, 28 Oct 2015 23:25:12 GMT
Expires: 0
Pragma: no-cache
Pragma: no-cache
Server: Apache-Coyote/1.1
X-Cf-Requestid: 4a6ad262-07e6-48a8-4640-271996e9bf64
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Xss-Protection: 1; mode=block

a20
{"access_token":"[PRIVATE DATA
HIDDEN]","token_type":"bearer","refresh_token":"[PRIVATE DATA
HIDDEN]","expires_in":3599,"scope":"scim.userids cloud_controller.read
password.write cloud_controller.write openid doppler.firehose scim.read
cloud_controller.admin","jti":"00e07ad7-5090-42e9-8096-a542dffd6026"}
0

This is the ssh-proxy client info returned from a call to uaac clients:

ssh-proxy
scope: cloud_controller.read cloud_controller.write openid
resource_ids: none
authorized_grant_types: authorization_code refresh_token
redirect_uri: /login
autoapprove: true
action: none
authorities: uaa.none
lastmodified: 1446074693000


Mike

On Wed, Oct 28, 2015 at 6:47 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Yes /v2/info contains "app_ssh_oauth_client: "ssh-proxy"".

Though I didn't set it. It appears CC sets it by default now.


https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/spec#L81

Any other ideas?

Mike
On Oct 28, 2015 6:16 PM, "Matthew Sykes" <matthew.sykes(a)gmail.com>
wrote:

Does /v2/info contain the `app_ssh_auth_client` key? If not, it
should be set to the client ID of the ssh proxy. If it's not set, I think
that's one of the symptom.


https://github.com/cloudfoundry-incubator/diego-release/blob/develop/stubs-for-cf-release/enable_diego_ssh_in_cf.yml#L4

On Wed, Oct 28, 2015 at 7:36 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I'm working on upgrading to latest cf-release+diego and I'm having
trouble getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
authorized-grant-types: authorization_code
autoapprove: true
override: true
redirect-uri: /login
scope: openid,cloud_controller.read,cloud_controller.write
secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a 200.
It appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea why
UAA wouldn't be sending a redirect response from /oauth/token when the
plugin is expecting it?

Mike


--
Matthew Sykes
matthew.sykes(a)gmail.com

--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: Trouble enabling diego ssh in cf-release:222 diego:0.1437

Mike Youngstrom
 

That curl command returns what appears to be the correct response:

curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" 'https://uaa.
{redacted}/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy'
{trim}
GET
/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy
HTTP/1.1
User-Agent: curl/7.38.0
Host: uaa.cf1-dev.lds.org
Accept: */*
Authorization: bearer {redacted}
< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Cache-Control: no-store
< Content-Language: en-US
< Content-Length: 0
< Date: Thu, 29 Oct 2015 01:32:08 GMT
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Location: http://uaa.{redacted}/login?code=huQG3t
< Pragma: no-cache
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< X-Cf-Requestid: 36f6b88e-f8a9-49f1-5f90-ef2b868c266d
< X-Content-Type-Options: nosniff
< X-Frame-Options: DENY
< X-Xss-Protection: 1; mode=block
< Content-Type: text/plain; charset=utf-8

I never see a call like this in my CF_TRACE.

Mike

On Wed, Oct 28, 2015 at 7:09 PM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:

That's not the request that the plugin is making to get the token. We're
using the API that was created for us [1].

If you use straight curl with something like this, what does the flow
really look like? Are there any errors in the uaa's logs?

$ curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" '
https://uaa.bosh-lite.com/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy
'

The UAA should respond with a 302 and a Location header that includes a
code parameter. If not, can you use a jwt decoder against your bearer token
and verify that there's a `uaa.user` scope in the token?

[1]:
https://github.com/cloudfoundry/uaa/blob/master/docs/UAA-APIs.rst#api-authorization-requests-code-get-oauth-authorize-non-standard-oauth-authorize

On Wed, Oct 28, 2015 at 8:56 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

In case it helps this is the CF_TRACE of the UAA call that the ssh plugin
is expecting to be a redirect.

REQUEST: [2015-10-28T17:25:11-06:00]
POST /oauth/token HTTP/1.1
Host: uaa.{redacted}
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/x-www-form-urlencoded
User-Agent: go-cli 6.12.3-5364935 / linux

grant_type=refresh_token&refresh_token={token redacted}&scope=

RESPONSE: [2015-10-28T17:25:12-06:00]
HTTP/1.1 200 OK
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Wed, 28 Oct 2015 23:25:12 GMT
Expires: 0
Pragma: no-cache
Pragma: no-cache
Server: Apache-Coyote/1.1
X-Cf-Requestid: 4a6ad262-07e6-48a8-4640-271996e9bf64
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Xss-Protection: 1; mode=block

a20
{"access_token":"[PRIVATE DATA
HIDDEN]","token_type":"bearer","refresh_token":"[PRIVATE DATA
HIDDEN]","expires_in":3599,"scope":"scim.userids cloud_controller.read
password.write cloud_controller.write openid doppler.firehose scim.read
cloud_controller.admin","jti":"00e07ad7-5090-42e9-8096-a542dffd6026"}
0

This is the ssh-proxy client info returned from a call to uaac clients:

ssh-proxy
scope: cloud_controller.read cloud_controller.write openid
resource_ids: none
authorized_grant_types: authorization_code refresh_token
redirect_uri: /login
autoapprove: true
action: none
authorities: uaa.none
lastmodified: 1446074693000


Mike

On Wed, Oct 28, 2015 at 6:47 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Yes /v2/info contains "app_ssh_oauth_client: "ssh-proxy"".

Though I didn't set it. It appears CC sets it by default now.


https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/spec#L81

Any other ideas?

Mike
On Oct 28, 2015 6:16 PM, "Matthew Sykes" <matthew.sykes(a)gmail.com>
wrote:

Does /v2/info contain the `app_ssh_auth_client` key? If not, it
should be set to the client ID of the ssh proxy. If it's not set, I think
that's one of the symptom.


https://github.com/cloudfoundry-incubator/diego-release/blob/develop/stubs-for-cf-release/enable_diego_ssh_in_cf.yml#L4

On Wed, Oct 28, 2015 at 7:36 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I'm working on upgrading to latest cf-release+diego and I'm having
trouble getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
authorized-grant-types: authorization_code
autoapprove: true
override: true
redirect-uri: /login
scope: openid,cloud_controller.read,cloud_controller.write
secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a 200.
It appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea why UAA
wouldn't be sending a redirect response from /oauth/token when the plugin
is expecting it?

Mike


--
Matthew Sykes
matthew.sykes(a)gmail.com

--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: cloud_controller_ng performance degrades slowly over time

Matt Cholick
 

Thanks for taking a look, fingers crossed you can see it happen as well.

Our 217 install is on stemcell 3026 and our 212 install is on 2989.

IaaS is CenturyLink Cloud.

-Matt

On Wed, Oct 28, 2015 at 6:08 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

I got up to 10k on an AWS deployment of HEAD of cf-release with ruby 2.2,
then started another loop on the same box with ruby 2.1. In the end, they
got up to 40-50k without showing any signs of change. I had to switch to
resolving the UAA endpoint, eventually google started responding with 302s.

I'm going to try with a cf-release 212 deployment on my bosh lite, but
eventually I want to try on the same stemcell as you're using.

On Wed, Oct 28, 2015 at 5:01 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Thanks Matt, this is awesome.

I'm trying to reproduce this with your script, up at 10k with no change.
I'm also shelling out to curl in the script, to see if both curl and ruby
get affected, and so, if they're affected at the same time.

What IaaS and stemcell are you using?

Thanks,
Amit

On Wed, Oct 28, 2015 at 2:54 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

You might try moving the nameserver entry for the consul_agent in
/etc/resolv.conf on the cloud controller to the end to see if that helps.

-Dieu

On Wed, Oct 28, 2015 at 12:55 PM, Matt Cholick <cholick(a)gmail.com>
wrote:

Looks like you're right and we're experiencing the same issue as you
are Amit. We're suffering slow DNS lookups. The code is spending all of its
time here:
/var/vcap/packages/ruby-2.1.6/lib/ruby/2.1.0/net/http.rb.initialize :879

I've experimented some with the environment and, after narrowing things
down to DNS, here's some minimal demonstrating the problem:

require "net/http"
require "uri"

# uri = URI.parse("http://uaa.example.com/info")
uri = URI.parse("https://www.google.com")

i = 0
while true do
beginning_time = Time.now
response = Net::HTTP.get_response(uri)

end_time = Time.now
i+=1
puts "#{"%04d" % i} Response: [#{response.code}], Elapsed: #{((end_time - beginning_time)*1000).round} ms"
end


I see the issue hitting both UAA and just hitting Google. At some
point, requests start taking 5 second longer, which I assume is a timeout.
One run:

0349 Response: [200], Elapsed: 157 ms
0350 Response: [200], Elapsed: 169 ms
0351 Response: [200], Elapsed: 148 ms
0352 Response: [200], Elapsed: 151 ms
0353 Response: [200], Elapsed: 151 ms
0354 Response: [200], Elapsed: 152 ms
0355 Response: [200], Elapsed: 153 ms
0356 Response: [200], Elapsed: 6166 ms
0357 Response: [200], Elapsed: 5156 ms
0358 Response: [200], Elapsed: 5158 ms
0359 Response: [200], Elapsed: 5156 ms
0360 Response: [200], Elapsed: 5156 ms
0361 Response: [200], Elapsed: 5160 ms
0362 Response: [200], Elapsed: 5172 ms
0363 Response: [200], Elapsed: 5157 ms
0364 Response: [200], Elapsed: 5165 ms
0365 Response: [200], Elapsed: 5157 ms
0366 Response: [200], Elapsed: 5155 ms
0367 Response: [200], Elapsed: 5157 ms

Other runs are the same. How many requests it takes before things time
out varies considerably (one run started in the 10s and another took 20k
requests), but it always happens. After that, lookups take an additional 5
second and never recover to their initial speed. This is why restarting the
cloud controller fixes the issue (temporarily).

The really slow cli calls (in the 1+min range) are simply due to the
amount of paging that a fetching data for a large org does, as that 5
seconds is multiplied out over several calls. Every user is feeling this
delay, it's just that it's only unworkable pulling the large datasets from
UAA.

I was not able to reproduce timeouts using a script calling "dig"
against localhost, only inside a ruby code.

The re-iterate our setup: we're running 212 without a consul server,
just the agents. I also successfully reproduce this problem in completely
different 217 install in a different datacenter. This setup also didn't
have an actual consul server, just the agent. I don't see anything in the
release notes past 217 indicating that this is fixed.

Anyone have thoughts? This is definitely creating some real headaches
for user management in our larger orgs. Amit: is there a bug we can follow?

-Matt


On Fri, Oct 9, 2015 at 10:52 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

You may not be running any consul servers, but you may have a consul
agent colocated on your CC VM and running there.

On Thu, Oct 8, 2015 at 5:59 PM, Matt Cholick <cholick(a)gmail.com>
wrote:

Zack & Swetha,
Thanks for the suggestion, will gather netstat info there next time.

Amit,
1:20 delay is due to paging. The total call length for each page is
closer to 10s. Just included those two calls with paging by the cf command
line included numbers to demonstrate the dramatic difference after a
restart. Delays disappear after a restart. We're not running consul yet, so
it wouldn't be that.

-Matt



On Thu, Oct 8, 2015 at 10:03 AM, Amit Gupta <agupta(a)pivotal.io>
wrote:

We've seen issues on some environments where requests to cc that
involve cc making a request to uaa or hm9k have a 5s delay while the local
consul agent fails to resolves the DNS for uaa/hm9k, before moving on to a
different resolver.

The expected behavior observed in almost all environments is that
the DNS request to consul agent fails fast and moves on to the next
resolver, we haven't figured out why a couple envs exhibit different
behavior. The impact is a 5 or 10s delay (5 or 10, not 5 to 10). It doesn't
explain your 1:20 delay though. Are you always seeing delays that long?

Amit


On Thursday, October 8, 2015, Zach Robinson <zrobinson(a)pivotal.io>
wrote:

Hey Matt,

I'm trying to think of other things that would affect only the
endpoints that interact with UAA and would be fixed after a CC restart.
I'm wondering if it's possible there are a large number of connections
being kept-alive, or stuck in a wait state or something. Could you take a
look at the netstat information on the CC and UAA next time this happens?

-Zach and Swetha


Re: Trouble enabling diego ssh in cf-release:222 diego:0.1437

Matthew Sykes <matthew.sykes@...>
 

That's not the request that the plugin is making to get the token. We're
using the API that was created for us [1].

If you use straight curl with something like this, what does the flow
really look like? Are there any errors in the uaa's logs?

$ curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" '
https://uaa.bosh-lite.com/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy
'

The UAA should respond with a 302 and a Location header that includes a
code parameter. If not, can you use a jwt decoder against your bearer token
and verify that there's a `uaa.user` scope in the token?

[1]:
https://github.com/cloudfoundry/uaa/blob/master/docs/UAA-APIs.rst#api-authorization-requests-code-get-oauth-authorize-non-standard-oauth-authorize

On Wed, Oct 28, 2015 at 8:56 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

In case it helps this is the CF_TRACE of the UAA call that the ssh plugin
is expecting to be a redirect.

REQUEST: [2015-10-28T17:25:11-06:00]
POST /oauth/token HTTP/1.1
Host: uaa.{redacted}
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/x-www-form-urlencoded
User-Agent: go-cli 6.12.3-5364935 / linux

grant_type=refresh_token&refresh_token={token redacted}&scope=

RESPONSE: [2015-10-28T17:25:12-06:00]
HTTP/1.1 200 OK
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Wed, 28 Oct 2015 23:25:12 GMT
Expires: 0
Pragma: no-cache
Pragma: no-cache
Server: Apache-Coyote/1.1
X-Cf-Requestid: 4a6ad262-07e6-48a8-4640-271996e9bf64
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Xss-Protection: 1; mode=block

a20
{"access_token":"[PRIVATE DATA
HIDDEN]","token_type":"bearer","refresh_token":"[PRIVATE DATA
HIDDEN]","expires_in":3599,"scope":"scim.userids cloud_controller.read
password.write cloud_controller.write openid doppler.firehose scim.read
cloud_controller.admin","jti":"00e07ad7-5090-42e9-8096-a542dffd6026"}
0

This is the ssh-proxy client info returned from a call to uaac clients:

ssh-proxy
scope: cloud_controller.read cloud_controller.write openid
resource_ids: none
authorized_grant_types: authorization_code refresh_token
redirect_uri: /login
autoapprove: true
action: none
authorities: uaa.none
lastmodified: 1446074693000


Mike

On Wed, Oct 28, 2015 at 6:47 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Yes /v2/info contains "app_ssh_oauth_client: "ssh-proxy"".

Though I didn't set it. It appears CC sets it by default now.


https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/spec#L81

Any other ideas?

Mike
On Oct 28, 2015 6:16 PM, "Matthew Sykes" <matthew.sykes(a)gmail.com> wrote:

Does /v2/info contain the `app_ssh_auth_client` key? If not, it should
be set to the client ID of the ssh proxy. If it's not set, I think that's
one of the symptom.


https://github.com/cloudfoundry-incubator/diego-release/blob/develop/stubs-for-cf-release/enable_diego_ssh_in_cf.yml#L4

On Wed, Oct 28, 2015 at 7:36 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I'm working on upgrading to latest cf-release+diego and I'm having
trouble getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
authorized-grant-types: authorization_code
autoapprove: true
override: true
redirect-uri: /login
scope: openid,cloud_controller.read,cloud_controller.write
secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a 200. It
appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea why UAA
wouldn't be sending a redirect response from /oauth/token when the plugin
is expecting it?

Mike


--
Matthew Sykes
matthew.sykes(a)gmail.com
--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: cloud_controller_ng performance degrades slowly over time

Amit Kumar Gupta
 

I got up to 10k on an AWS deployment of HEAD of cf-release with ruby 2.2,
then started another loop on the same box with ruby 2.1. In the end, they
got up to 40-50k without showing any signs of change. I had to switch to
resolving the UAA endpoint, eventually google started responding with 302s.

I'm going to try with a cf-release 212 deployment on my bosh lite, but
eventually I want to try on the same stemcell as you're using.

On Wed, Oct 28, 2015 at 5:01 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Thanks Matt, this is awesome.

I'm trying to reproduce this with your script, up at 10k with no change.
I'm also shelling out to curl in the script, to see if both curl and ruby
get affected, and so, if they're affected at the same time.

What IaaS and stemcell are you using?

Thanks,
Amit

On Wed, Oct 28, 2015 at 2:54 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

You might try moving the nameserver entry for the consul_agent in
/etc/resolv.conf on the cloud controller to the end to see if that helps.

-Dieu

On Wed, Oct 28, 2015 at 12:55 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Looks like you're right and we're experiencing the same issue as you are
Amit. We're suffering slow DNS lookups. The code is spending all of its
time here:
/var/vcap/packages/ruby-2.1.6/lib/ruby/2.1.0/net/http.rb.initialize :879

I've experimented some with the environment and, after narrowing things
down to DNS, here's some minimal demonstrating the problem:

require "net/http"
require "uri"

# uri = URI.parse("http://uaa.example.com/info")
uri = URI.parse("https://www.google.com")

i = 0
while true do
beginning_time = Time.now
response = Net::HTTP.get_response(uri)

end_time = Time.now
i+=1
puts "#{"%04d" % i} Response: [#{response.code}], Elapsed: #{((end_time - beginning_time)*1000).round} ms"
end


I see the issue hitting both UAA and just hitting Google. At some point,
requests start taking 5 second longer, which I assume is a timeout. One run:

0349 Response: [200], Elapsed: 157 ms
0350 Response: [200], Elapsed: 169 ms
0351 Response: [200], Elapsed: 148 ms
0352 Response: [200], Elapsed: 151 ms
0353 Response: [200], Elapsed: 151 ms
0354 Response: [200], Elapsed: 152 ms
0355 Response: [200], Elapsed: 153 ms
0356 Response: [200], Elapsed: 6166 ms
0357 Response: [200], Elapsed: 5156 ms
0358 Response: [200], Elapsed: 5158 ms
0359 Response: [200], Elapsed: 5156 ms
0360 Response: [200], Elapsed: 5156 ms
0361 Response: [200], Elapsed: 5160 ms
0362 Response: [200], Elapsed: 5172 ms
0363 Response: [200], Elapsed: 5157 ms
0364 Response: [200], Elapsed: 5165 ms
0365 Response: [200], Elapsed: 5157 ms
0366 Response: [200], Elapsed: 5155 ms
0367 Response: [200], Elapsed: 5157 ms

Other runs are the same. How many requests it takes before things time
out varies considerably (one run started in the 10s and another took 20k
requests), but it always happens. After that, lookups take an additional 5
second and never recover to their initial speed. This is why restarting the
cloud controller fixes the issue (temporarily).

The really slow cli calls (in the 1+min range) are simply due to the
amount of paging that a fetching data for a large org does, as that 5
seconds is multiplied out over several calls. Every user is feeling this
delay, it's just that it's only unworkable pulling the large datasets from
UAA.

I was not able to reproduce timeouts using a script calling "dig"
against localhost, only inside a ruby code.

The re-iterate our setup: we're running 212 without a consul server,
just the agents. I also successfully reproduce this problem in completely
different 217 install in a different datacenter. This setup also didn't
have an actual consul server, just the agent. I don't see anything in the
release notes past 217 indicating that this is fixed.

Anyone have thoughts? This is definitely creating some real headaches
for user management in our larger orgs. Amit: is there a bug we can follow?

-Matt


On Fri, Oct 9, 2015 at 10:52 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

You may not be running any consul servers, but you may have a consul
agent colocated on your CC VM and running there.

On Thu, Oct 8, 2015 at 5:59 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Zack & Swetha,
Thanks for the suggestion, will gather netstat info there next time.

Amit,
1:20 delay is due to paging. The total call length for each page is
closer to 10s. Just included those two calls with paging by the cf command
line included numbers to demonstrate the dramatic difference after a
restart. Delays disappear after a restart. We're not running consul yet, so
it wouldn't be that.

-Matt



On Thu, Oct 8, 2015 at 10:03 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

We've seen issues on some environments where requests to cc that
involve cc making a request to uaa or hm9k have a 5s delay while the local
consul agent fails to resolves the DNS for uaa/hm9k, before moving on to a
different resolver.

The expected behavior observed in almost all environments is that the
DNS request to consul agent fails fast and moves on to the next resolver,
we haven't figured out why a couple envs exhibit different behavior. The
impact is a 5 or 10s delay (5 or 10, not 5 to 10). It doesn't explain your
1:20 delay though. Are you always seeing delays that long?

Amit


On Thursday, October 8, 2015, Zach Robinson <zrobinson(a)pivotal.io>
wrote:

Hey Matt,

I'm trying to think of other things that would affect only the
endpoints that interact with UAA and would be fixed after a CC restart.
I'm wondering if it's possible there are a large number of connections
being kept-alive, or stuck in a wait state or something. Could you take a
look at the netstat information on the CC and UAA next time this happens?

-Zach and Swetha


Re: Trouble enabling diego ssh in cf-release:222 diego:0.1437

Mike Youngstrom
 

In case it helps this is the CF_TRACE of the UAA call that the ssh plugin
is expecting to be a redirect.

REQUEST: [2015-10-28T17:25:11-06:00]
POST /oauth/token HTTP/1.1
Host: uaa.{redacted}
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/x-www-form-urlencoded
User-Agent: go-cli 6.12.3-5364935 / linux

grant_type=refresh_token&refresh_token={token redacted}&scope=

RESPONSE: [2015-10-28T17:25:12-06:00]
HTTP/1.1 200 OK
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Wed, 28 Oct 2015 23:25:12 GMT
Expires: 0
Pragma: no-cache
Pragma: no-cache
Server: Apache-Coyote/1.1
X-Cf-Requestid: 4a6ad262-07e6-48a8-4640-271996e9bf64
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Xss-Protection: 1; mode=block

a20
{"access_token":"[PRIVATE DATA
HIDDEN]","token_type":"bearer","refresh_token":"[PRIVATE DATA
HIDDEN]","expires_in":3599,"scope":"scim.userids cloud_controller.read
password.write cloud_controller.write openid doppler.firehose scim.read
cloud_controller.admin","jti":"00e07ad7-5090-42e9-8096-a542dffd6026"}
0

This is the ssh-proxy client info returned from a call to uaac clients:

ssh-proxy
scope: cloud_controller.read cloud_controller.write openid
resource_ids: none
authorized_grant_types: authorization_code refresh_token
redirect_uri: /login
autoapprove: true
action: none
authorities: uaa.none
lastmodified: 1446074693000


Mike

On Wed, Oct 28, 2015 at 6:47 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Yes /v2/info contains "app_ssh_oauth_client: "ssh-proxy"".

Though I didn't set it. It appears CC sets it by default now.


https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/spec#L81

Any other ideas?

Mike
On Oct 28, 2015 6:16 PM, "Matthew Sykes" <matthew.sykes(a)gmail.com> wrote:

Does /v2/info contain the `app_ssh_auth_client` key? If not, it should
be set to the client ID of the ssh proxy. If it's not set, I think that's
one of the symptom.


https://github.com/cloudfoundry-incubator/diego-release/blob/develop/stubs-for-cf-release/enable_diego_ssh_in_cf.yml#L4

On Wed, Oct 28, 2015 at 7:36 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I'm working on upgrading to latest cf-release+diego and I'm having
trouble getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
authorized-grant-types: authorization_code
autoapprove: true
override: true
redirect-uri: /login
scope: openid,cloud_controller.read,cloud_controller.write
secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a 200. It
appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea why UAA
wouldn't be sending a redirect response from /oauth/token when the plugin
is expecting it?

Mike


--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: Trouble enabling diego ssh in cf-release:222 diego:0.1437

Mike Youngstrom
 

Yes /v2/info contains "app_ssh_oauth_client: "ssh-proxy"".

Though I didn't set it. It appears CC sets it by default now.

https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/spec#L81

Any other ideas?

Mike

On Oct 28, 2015 6:16 PM, "Matthew Sykes" <matthew.sykes(a)gmail.com> wrote:

Does /v2/info contain the `app_ssh_auth_client` key? If not, it should
be set to the client ID of the ssh proxy. If it's not set, I think that's
one of the symptom.


https://github.com/cloudfoundry-incubator/diego-release/blob/develop/stubs-for-cf-release/enable_diego_ssh_in_cf.yml#L4

On Wed, Oct 28, 2015 at 7:36 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

I'm working on upgrading to latest cf-release+diego and I'm having
trouble getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
authorized-grant-types: authorization_code
autoapprove: true
override: true
redirect-uri: /login
scope: openid,cloud_controller.read,cloud_controller.write
secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a 200. It
appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea why UAA
wouldn't be sending a redirect response from /oauth/token when the plugin
is expecting it?

Mike


--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: Trouble enabling diego ssh in cf-release:222 diego:0.1437

Matthew Sykes <matthew.sykes@...>
 

Does /v2/info contain the `app_ssh_auth_client` key? If not, it should be
set to the client ID of the ssh proxy. If it's not set, I think that's one
of the symptom.

https://github.com/cloudfoundry-incubator/diego-release/blob/develop/stubs-for-cf-release/enable_diego_ssh_in_cf.yml#L4

On Wed, Oct 28, 2015 at 7:36 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

I'm working on upgrading to latest cf-release+diego and I'm having trouble
getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
authorized-grant-types: authorization_code
autoapprove: true
override: true
redirect-uri: /login
scope: openid,cloud_controller.read,cloud_controller.write
secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a 200. It
appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea why UAA
wouldn't be sending a redirect response from /oauth/token when the plugin
is expecting it?

Mike
--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: cloud_controller_ng performance degrades slowly over time

Amit Kumar Gupta
 

Thanks Matt, this is awesome.

I'm trying to reproduce this with your script, up at 10k with no change.
I'm also shelling out to curl in the script, to see if both curl and ruby
get affected, and so, if they're affected at the same time.

What IaaS and stemcell are you using?

Thanks,
Amit

On Wed, Oct 28, 2015 at 2:54 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

You might try moving the nameserver entry for the consul_agent in
/etc/resolv.conf on the cloud controller to the end to see if that helps.

-Dieu

On Wed, Oct 28, 2015 at 12:55 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Looks like you're right and we're experiencing the same issue as you are
Amit. We're suffering slow DNS lookups. The code is spending all of its
time here:
/var/vcap/packages/ruby-2.1.6/lib/ruby/2.1.0/net/http.rb.initialize :879

I've experimented some with the environment and, after narrowing things
down to DNS, here's some minimal demonstrating the problem:

require "net/http"
require "uri"

# uri = URI.parse("http://uaa.example.com/info")
uri = URI.parse("https://www.google.com")

i = 0
while true do
beginning_time = Time.now
response = Net::HTTP.get_response(uri)

end_time = Time.now
i+=1
puts "#{"%04d" % i} Response: [#{response.code}], Elapsed: #{((end_time - beginning_time)*1000).round} ms"
end


I see the issue hitting both UAA and just hitting Google. At some point,
requests start taking 5 second longer, which I assume is a timeout. One run:

0349 Response: [200], Elapsed: 157 ms
0350 Response: [200], Elapsed: 169 ms
0351 Response: [200], Elapsed: 148 ms
0352 Response: [200], Elapsed: 151 ms
0353 Response: [200], Elapsed: 151 ms
0354 Response: [200], Elapsed: 152 ms
0355 Response: [200], Elapsed: 153 ms
0356 Response: [200], Elapsed: 6166 ms
0357 Response: [200], Elapsed: 5156 ms
0358 Response: [200], Elapsed: 5158 ms
0359 Response: [200], Elapsed: 5156 ms
0360 Response: [200], Elapsed: 5156 ms
0361 Response: [200], Elapsed: 5160 ms
0362 Response: [200], Elapsed: 5172 ms
0363 Response: [200], Elapsed: 5157 ms
0364 Response: [200], Elapsed: 5165 ms
0365 Response: [200], Elapsed: 5157 ms
0366 Response: [200], Elapsed: 5155 ms
0367 Response: [200], Elapsed: 5157 ms

Other runs are the same. How many requests it takes before things time
out varies considerably (one run started in the 10s and another took 20k
requests), but it always happens. After that, lookups take an additional 5
second and never recover to their initial speed. This is why restarting the
cloud controller fixes the issue (temporarily).

The really slow cli calls (in the 1+min range) are simply due to the
amount of paging that a fetching data for a large org does, as that 5
seconds is multiplied out over several calls. Every user is feeling this
delay, it's just that it's only unworkable pulling the large datasets from
UAA.

I was not able to reproduce timeouts using a script calling "dig" against
localhost, only inside a ruby code.

The re-iterate our setup: we're running 212 without a consul server, just
the agents. I also successfully reproduce this problem in completely
different 217 install in a different datacenter. This setup also didn't
have an actual consul server, just the agent. I don't see anything in the
release notes past 217 indicating that this is fixed.

Anyone have thoughts? This is definitely creating some real headaches for
user management in our larger orgs. Amit: is there a bug we can follow?

-Matt


On Fri, Oct 9, 2015 at 10:52 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

You may not be running any consul servers, but you may have a consul
agent colocated on your CC VM and running there.

On Thu, Oct 8, 2015 at 5:59 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Zack & Swetha,
Thanks for the suggestion, will gather netstat info there next time.

Amit,
1:20 delay is due to paging. The total call length for each page is
closer to 10s. Just included those two calls with paging by the cf command
line included numbers to demonstrate the dramatic difference after a
restart. Delays disappear after a restart. We're not running consul yet, so
it wouldn't be that.

-Matt



On Thu, Oct 8, 2015 at 10:03 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

We've seen issues on some environments where requests to cc that
involve cc making a request to uaa or hm9k have a 5s delay while the local
consul agent fails to resolves the DNS for uaa/hm9k, before moving on to a
different resolver.

The expected behavior observed in almost all environments is that the
DNS request to consul agent fails fast and moves on to the next resolver,
we haven't figured out why a couple envs exhibit different behavior. The
impact is a 5 or 10s delay (5 or 10, not 5 to 10). It doesn't explain your
1:20 delay though. Are you always seeing delays that long?

Amit


On Thursday, October 8, 2015, Zach Robinson <zrobinson(a)pivotal.io>
wrote:

Hey Matt,

I'm trying to think of other things that would affect only the
endpoints that interact with UAA and would be fixed after a CC restart.
I'm wondering if it's possible there are a large number of connections
being kept-alive, or stuck in a wait state or something. Could you take a
look at the netstat information on the CC and UAA next time this happens?

-Zach and Swetha


Trouble enabling diego ssh in cf-release:222 diego:0.1437

Mike Youngstrom
 

I'm working on upgrading to latest cf-release+diego and I'm having trouble
getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
authorized-grant-types: authorization_code
autoapprove: true
override: true
redirect-uri: /login
scope: openid,cloud_controller.read,cloud_controller.write
secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a 200. It
appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea why UAA
wouldn't be sending a redirect response from /oauth/token when the plugin
is expecting it?

Mike


Re: cloud_controller_ng performance degrades slowly over time

Dieu Cao <dcao@...>
 

You might try moving the nameserver entry for the consul_agent in
/etc/resolv.conf on the cloud controller to the end to see if that helps.

-Dieu

On Wed, Oct 28, 2015 at 12:55 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Looks like you're right and we're experiencing the same issue as you are
Amit. We're suffering slow DNS lookups. The code is spending all of its
time here:
/var/vcap/packages/ruby-2.1.6/lib/ruby/2.1.0/net/http.rb.initialize :879

I've experimented some with the environment and, after narrowing things
down to DNS, here's some minimal demonstrating the problem:

require "net/http"
require "uri"

# uri = URI.parse("http://uaa.example.com/info")
uri = URI.parse("https://www.google.com")

i = 0
while true do
beginning_time = Time.now
response = Net::HTTP.get_response(uri)

end_time = Time.now
i+=1
puts "#{"%04d" % i} Response: [#{response.code}], Elapsed: #{((end_time - beginning_time)*1000).round} ms"
end


I see the issue hitting both UAA and just hitting Google. At some point,
requests start taking 5 second longer, which I assume is a timeout. One run:

0349 Response: [200], Elapsed: 157 ms
0350 Response: [200], Elapsed: 169 ms
0351 Response: [200], Elapsed: 148 ms
0352 Response: [200], Elapsed: 151 ms
0353 Response: [200], Elapsed: 151 ms
0354 Response: [200], Elapsed: 152 ms
0355 Response: [200], Elapsed: 153 ms
0356 Response: [200], Elapsed: 6166 ms
0357 Response: [200], Elapsed: 5156 ms
0358 Response: [200], Elapsed: 5158 ms
0359 Response: [200], Elapsed: 5156 ms
0360 Response: [200], Elapsed: 5156 ms
0361 Response: [200], Elapsed: 5160 ms
0362 Response: [200], Elapsed: 5172 ms
0363 Response: [200], Elapsed: 5157 ms
0364 Response: [200], Elapsed: 5165 ms
0365 Response: [200], Elapsed: 5157 ms
0366 Response: [200], Elapsed: 5155 ms
0367 Response: [200], Elapsed: 5157 ms

Other runs are the same. How many requests it takes before things time out
varies considerably (one run started in the 10s and another took 20k
requests), but it always happens. After that, lookups take an additional 5
second and never recover to their initial speed. This is why restarting the
cloud controller fixes the issue (temporarily).

The really slow cli calls (in the 1+min range) are simply due to the
amount of paging that a fetching data for a large org does, as that 5
seconds is multiplied out over several calls. Every user is feeling this
delay, it's just that it's only unworkable pulling the large datasets from
UAA.

I was not able to reproduce timeouts using a script calling "dig" against
localhost, only inside a ruby code.

The re-iterate our setup: we're running 212 without a consul server, just
the agents. I also successfully reproduce this problem in completely
different 217 install in a different datacenter. This setup also didn't
have an actual consul server, just the agent. I don't see anything in the
release notes past 217 indicating that this is fixed.

Anyone have thoughts? This is definitely creating some real headaches for
user management in our larger orgs. Amit: is there a bug we can follow?

-Matt


On Fri, Oct 9, 2015 at 10:52 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

You may not be running any consul servers, but you may have a consul
agent colocated on your CC VM and running there.

On Thu, Oct 8, 2015 at 5:59 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Zack & Swetha,
Thanks for the suggestion, will gather netstat info there next time.

Amit,
1:20 delay is due to paging. The total call length for each page is
closer to 10s. Just included those two calls with paging by the cf command
line included numbers to demonstrate the dramatic difference after a
restart. Delays disappear after a restart. We're not running consul yet, so
it wouldn't be that.

-Matt



On Thu, Oct 8, 2015 at 10:03 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

We've seen issues on some environments where requests to cc that
involve cc making a request to uaa or hm9k have a 5s delay while the local
consul agent fails to resolves the DNS for uaa/hm9k, before moving on to a
different resolver.

The expected behavior observed in almost all environments is that the
DNS request to consul agent fails fast and moves on to the next resolver,
we haven't figured out why a couple envs exhibit different behavior. The
impact is a 5 or 10s delay (5 or 10, not 5 to 10). It doesn't explain your
1:20 delay though. Are you always seeing delays that long?

Amit


On Thursday, October 8, 2015, Zach Robinson <zrobinson(a)pivotal.io>
wrote:

Hey Matt,

I'm trying to think of other things that would affect only the
endpoints that interact with UAA and would be fixed after a CC restart.
I'm wondering if it's possible there are a large number of connections
being kept-alive, or stuck in a wait state or something. Could you take a
look at the netstat information on the CC and UAA next time this happens?

-Zach and Swetha


Re: cloud_controller_ng performance degrades slowly over time

Matt Cholick
 

Looks like you're right and we're experiencing the same issue as you are
Amit. We're suffering slow DNS lookups. The code is spending all of its
time here:
/var/vcap/packages/ruby-2.1.6/lib/ruby/2.1.0/net/http.rb.initialize :879

I've experimented some with the environment and, after narrowing things
down to DNS, here's some minimal demonstrating the problem:

require "net/http"
require "uri"

# uri = URI.parse("http://uaa.example.com/info")
uri = URI.parse("https://www.google.com")

i = 0
while true do
beginning_time = Time.now
response = Net::HTTP.get_response(uri)

end_time = Time.now
i+=1
puts "#{"%04d" % i} Response: [#{response.code}], Elapsed:
#{((end_time - beginning_time)*1000).round} ms"
end


I see the issue hitting both UAA and just hitting Google. At some point,
requests start taking 5 second longer, which I assume is a timeout. One run:

0349 Response: [200], Elapsed: 157 ms
0350 Response: [200], Elapsed: 169 ms
0351 Response: [200], Elapsed: 148 ms
0352 Response: [200], Elapsed: 151 ms
0353 Response: [200], Elapsed: 151 ms
0354 Response: [200], Elapsed: 152 ms
0355 Response: [200], Elapsed: 153 ms
0356 Response: [200], Elapsed: 6166 ms
0357 Response: [200], Elapsed: 5156 ms
0358 Response: [200], Elapsed: 5158 ms
0359 Response: [200], Elapsed: 5156 ms
0360 Response: [200], Elapsed: 5156 ms
0361 Response: [200], Elapsed: 5160 ms
0362 Response: [200], Elapsed: 5172 ms
0363 Response: [200], Elapsed: 5157 ms
0364 Response: [200], Elapsed: 5165 ms
0365 Response: [200], Elapsed: 5157 ms
0366 Response: [200], Elapsed: 5155 ms
0367 Response: [200], Elapsed: 5157 ms

Other runs are the same. How many requests it takes before things time out
varies considerably (one run started in the 10s and another took 20k
requests), but it always happens. After that, lookups take an additional 5
second and never recover to their initial speed. This is why restarting the
cloud controller fixes the issue (temporarily).

The really slow cli calls (in the 1+min range) are simply due to the amount
of paging that a fetching data for a large org does, as that 5 seconds is
multiplied out over several calls. Every user is feeling this delay, it's
just that it's only unworkable pulling the large datasets from UAA.

I was not able to reproduce timeouts using a script calling "dig" against
localhost, only inside a ruby code.

The re-iterate our setup: we're running 212 without a consul server, just
the agents. I also successfully reproduce this problem in completely
different 217 install in a different datacenter. This setup also didn't
have an actual consul server, just the agent. I don't see anything in the
release notes past 217 indicating that this is fixed.

Anyone have thoughts? This is definitely creating some real headaches for
user management in our larger orgs. Amit: is there a bug we can follow?

-Matt

On Fri, Oct 9, 2015 at 10:52 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

You may not be running any consul servers, but you may have a consul agent
colocated on your CC VM and running there.

On Thu, Oct 8, 2015 at 5:59 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Zack & Swetha,
Thanks for the suggestion, will gather netstat info there next time.

Amit,
1:20 delay is due to paging. The total call length for each page is
closer to 10s. Just included those two calls with paging by the cf command
line included numbers to demonstrate the dramatic difference after a
restart. Delays disappear after a restart. We're not running consul yet, so
it wouldn't be that.

-Matt



On Thu, Oct 8, 2015 at 10:03 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

We've seen issues on some environments where requests to cc that involve
cc making a request to uaa or hm9k have a 5s delay while the local consul
agent fails to resolves the DNS for uaa/hm9k, before moving on to a
different resolver.

The expected behavior observed in almost all environments is that the
DNS request to consul agent fails fast and moves on to the next resolver,
we haven't figured out why a couple envs exhibit different behavior. The
impact is a 5 or 10s delay (5 or 10, not 5 to 10). It doesn't explain your
1:20 delay though. Are you always seeing delays that long?

Amit


On Thursday, October 8, 2015, Zach Robinson <zrobinson(a)pivotal.io>
wrote:

Hey Matt,

I'm trying to think of other things that would affect only the
endpoints that interact with UAA and would be fixed after a CC restart.
I'm wondering if it's possible there are a large number of connections
being kept-alive, or stuck in a wait state or something. Could you take a
look at the netstat information on the CC and UAA next time this happens?

-Zach and Swetha


Re: Ability to move a space between orgs

Mike Youngstrom
 

You're right. I can see why the ability to move a space is not in any
nearish term plans.

We'll probably look at creating a clone/copy solution and consider it a
long term investment.

Thanks,
Mike

On Tue, Oct 27, 2015 at 6:32 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

Hi Mike,

Yes, moving gets stickier very quickly and precisely what one team wants
to have preserved across an org may differ depending on use case.
Service bindings, environment variables, routes, membership etc are tied
to particular spaces and orgs.
There are many implications to "moving" service instances, apps, etc to be
considered.

Have you thought about "cloning" a space?
I could imagine a plugin that could clone apps (names, bits, environment
variables) from one space to another.
Perhaps even moving routes.
Even creation of service instances and binding to similarly named apps
could be reasoned over.

-Dieu
CF CAPI PM


On Tue, Oct 27, 2015 at 9:23 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

We occasionally need to move spaces between orgs when our business
reorganizes. It would be great if we could atomically move spaces between
orgs.

It seems not difficult but when you look deeper things get stickier:
* Private Domains are owned by orgs
* Service access may be different between orgs
* New Organization scoped brokers may cause issues.

Thoughts on supporting moving a space between orgs? Could perhaps error
out if an issue like the ones above is detected?

Mike


Error to make a Request to update password in UAA

Juan Antonio Breña Moral <bren at juanantonio.info...>
 

Hi,

Using UAA API, it is possible to create users without password. Later if you need to update the password what is the right request to make the process? Current documentation is not very clear:

https://github.com/cloudfoundry/uaa/blob/master/docs/UAA-APIs.rst#create-a-user-post-users

The document has al ink for a section to udpate password but it was removed:
http://www.simplecloud.info/specs/draft-scim-api-01.html#change-password

Using the documentation, the request throws an Error:

uaa_options = {
"schemas":["urn:scim:schemas:core:1.0"],
"password": "abc123456",
"oldPassword": "oldpassword"
}

return CloudFoundryUsersUAA.updatePassword(token_type, access_token, uaa_guid, uaa_options);

UsersUAA.prototype.updatePassword = function (token_type, access_token, uaa_guid, uaa_options) {
"use strict";

var url = this.UAA_API_URL + "/Users/" + uaa_guid + "/password";
var options = {
method: 'PUT',
url: url,
headers: {
Accept: 'application/json',
Authorization: token_type + ' ' + access_token
},
json: uaa_options
};

return this.REST.request(options, "200", false);
};

Error:

Error: the string "<html><head><title>Apache Tomcat/7.0.55 - Error report</
title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-
color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:whi
te;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-s
erif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tah
oma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,
Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Ar
ial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.
name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Stat
us 400 - </h1><HR size=\"1\" noshade=\"noshade\"><p><b>type</b> Status report</p
<p><b>message</b> <u></u></p><p><b>description</b> <u>The request sent by the c
lient was syntactically incorrect.</u></p><HR size=\"1\" noshade=\"noshade\"><h3
Apache Tomcat/7.0.55</h3></body></html>" was thrown, throw an Error :)
Note: it is possible to create an user in UAA with a password in the first operation, but the documenation is not clear in this point.

var uaa_options = {
"schemas":["urn:scim:schemas:core:1.0"],
"userName":username,
"emails":[
{
"value":"demo(a)example.com",
"type":"work"
}
],
"password": "123456",
};

Usage with CF CLI: cf login -a https://apiMY_IP.xip.io -u userXXX -p 123456 --skip-ssl-validation

Any help to update passwords?

Juan Antonio


Presentation of BOSH on OpenStack Tokyo Summit 2015

Hua ZZ Zhang <zhuadl@...>
 

Hi CF users and developers,
 
Today in the OpenStack Tokyo Summit 2015, we presented BOSH to the OpenStack community and shared the results of the survey we did a couple days ago. We also recorded a video and demonstrated how to use BOSH to deploy Cloud Foundry on OpenStack and shared the experiences with audience. You can find the document and video links below. Any comment is welcomed! Thanks!
 
Presentationhttps://goo.gl/QVUAZn
 
Best regards,
 
-Edward, Dr. Max, Tom (BOSH contributors)


Re: Logs, Timestamps

Bharath
 

Hi daniel ,


The logs formats are basically using log library called steno
https://github.com/cloudfoundry/steno

It was mentioned that we can configure to modify the logs to a human
readable format. I never did that . I think you can have a look into it

you can also look at firehouse and nozzles project . I think that is also
use ful

http://docs.cloudfoundry.org/loggregator/architecture.html#firehose

http://docs.cloudfoundry.org/loggregator/architecture.html#nozzles

tutorials on nozzle
http://docs.cloudfoundry.org/loggregator/nozzle-tutorial.html


regards
Bharath

On Wed, Oct 28, 2015 at 2:06 PM, Daniel Jones <
daniel.jones(a)engineerbetter.com> wrote:

Hi all,

Why are logs like those of the CloudController dated by machine-readable
timestamps? Are there any tools to parse CF logs in-situ?

I find it a constant source of frustration trying to debug an issue
reported by a human with a human-parseable date, putting that into a Unix
timestamp converter, searching through logs for that timestamp, then
scrolling up and down having to occasionally copy/paste a timestamp back
into a converter to see if I've gone too far away from my rough target time.

Computers are rather good at converting data formats - my brain, not so
much. Wouldn't it make more sense to have the logs human-readable by
default, and if an automated system needs to ingest those logs, let *it* do
the parsing?

Regards,

Daniel Jones
EngineerBetter.com

6921 - 6940 of 9425