
Is there a REST call to detect Diego on any environment?

Juan Antonio Breña Moral <bren at juanantonio.info...>
 

Hi,

I would like to know if there is a REST API to detect whether a platform has Diego installed.
Reading the Cloud Controller API, I didn't find any method that identifies Diego in the CF architecture.
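
One way to probe this against the CC v2 API (a sketch; APP_GUID is a
placeholder for an app you already know) is to check the per-app "diego" flag:

cf curl /v2/apps/APP_GUID
# the response's "entity" contains "diego": true when the app runs on Diego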

Is it possible?

Many thanks in advance

Juan Antonio


Re: Problem deploying basic Apps on PWS

Juan Antonio Breña Moral <bren at juanantonio.info...>
 

Hi Charles,

That was the clue!!!
Yesterday I updated the code and was able to deploy on PWS.

In environments without Diego, the way to get the port in a Node app is:

var localPort = process.env.VCAP_APP_PORT || 5000;

With Diego, the way is:

var localPort = process.env.PORT || 5000;

If the developer uses the Go CLI, it is necessary to indicate that the application uses Diego:

cf push APP_XXX --no-start
cf enable-diego APP_XXX
cf start APP_XXX
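
A portable sketch that works in both environments is to check both variables:

var localPort = process.env.PORT || process.env.VCAP_APP_PORT || 5000;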


Re: Source IP ACLs

Gwenn Etourneau
 

Oh right, I misread it. I thought it was about preventing the application
from connecting to certain IPs...

On Fri, Oct 30, 2015 at 4:46 PM, ronak banka <ronakbanka.cse(a)gmail.com>
wrote:

Gwenn,

If I'm not wrong, application security rules are for restricting outbound
traffic from the application side, no?

Ronak

On Fri, Oct 30, 2015, 16:38 Gwenn Etourneau <getourneau(a)pivotal.io> wrote:

What about
https://docs.pivotal.io/pivotalcf/adminguide/app-sec-groups.html ?

On Fri, Oct 30, 2015 at 4:21 PM, Carlo Alberto Ferraris <
carlo.ferraris(a)rakuten.com> wrote:

Is there any provision for restricting the source IPs that are allowed
to access a certain application (or route)? Or is the only way to do this
to place a reverse proxy in front of the gorouter?
In case the reverse proxy is the only way to go, would there be interest
in having something like this implemented inside the gorouter itself? (we're
willing to contribute)


Re: Source IP ACLs

Ronak Banka
 

Gwenn,

If I'm not wrong, application security rules are for restricting outbound
traffic from the application side, no?

Ronak

On Fri, Oct 30, 2015, 16:38 Gwenn Etourneau <getourneau(a)pivotal.io> wrote:

What about
https://docs.pivotal.io/pivotalcf/adminguide/app-sec-groups.html ?

On Fri, Oct 30, 2015 at 4:21 PM, Carlo Alberto Ferraris <
carlo.ferraris(a)rakuten.com> wrote:

Is there any provision for restricting the source IPs that are allowed to
access a certain application (or route)? Or is the only way to do this to
place a reverse proxy in front of the gorouter?
In case the reverse proxy is the only way to go, would there be interest
in having something like this implemented inside the gorouter itself? (we're
willing to contribute)


Re: Source IP ACLs

Gwenn Etourneau
 

What about https://docs.pivotal.io/pivotalcf/adminguide/app-sec-groups.html
?

On Fri, Oct 30, 2015 at 4:21 PM, Carlo Alberto Ferraris <
carlo.ferraris(a)rakuten.com> wrote:

Is there any provision for restricting the source IPs that are allowed to
access a certain application (or route)? Or is the only way to do this to
place a reverse proxy in front of the gorouter?
In case the reverse proxy is the only way to go, would there be interest
in having something like this implemented inside the gorouter itself? (we're
willing to contribute)


Source IP ACLs

Carlo Alberto Ferraris
 

Is there any provision for restricting the source IPs that are allowed to access a certain application (or route)? Or is the only way to do this to place a reverse proxy in front of the gorouter?
In case the reverse proxy is the only way to go, would there be interest in having something like this implemented inside the gorouter itself? (we're willing to contribute)
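
For context, a minimal sketch of the reverse-proxy approach, assuming nginx
sits in front of the gorouter (the allowed range and upstream address are
hypothetical):

server {
    listen 80;

    location / {
        allow 203.0.113.0/24;            # permitted source range (example)
        deny  all;                       # everyone else gets 403
        proxy_pass http://10.0.0.10:80;  # gorouter address (example)
        proxy_set_header Host $host;
    }
}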


Re: cloud_controller_ng performance degrades slowly over time

Amit Kumar Gupta
 

Matt, that's awesome, thanks! Mind trying this?

require 'uri'
require 'net/http'
require 'logger'

SYSTEM_DOMAIN = '--CHANGE-ME--'

u = URI.parse('http://uaa.' + SYSTEM_DOMAIN + '/login')
h = Net::HTTP.new(u.host, u.port)
l = Logger.new('/var/vcap/data/tmp/slow-dns.log')
h.set_debug_output(l) # log Net::HTTP internals so we can see where the time goes

1.step do |i| # count up from 1, forever
  l.info('Request number: %04d' % i)
  s = Time.now
  r = h.head(u.path)
  d = Time.now - s
  l.info('Duration: %dms' % (d * 1000).round)
  l.info('Response code: %d' % r.code)
  l.error('!!! SLOW !!!') if d > 5
end

I'd want to know what we see in /var/vcap/data/tmp/slow-dns.log before and
after the DNS slowdown. By having the http object take a debug logger, we
can narrow down what Ruby is doing that's making it uniquely slow.

On Thu, Oct 29, 2015 at 7:39 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Amit,
Here's a run with the problem manifesting:

...
00248 [200]: ruby 26ms | curl 33ms | nslookup 21ms
00249 [200]: ruby 20ms | curl 32ms | nslookup 14ms
00250 [200]: ruby 18ms | curl 30ms | nslookup 17ms
00251 [200]: ruby 22ms | curl 31ms | nslookup 16ms
00252 [200]: ruby 23ms | curl 30ms | nslookup 16ms
00253 [200]: ruby 26ms | curl 40ms | nslookup 16ms
00254 [200]: ruby 20ms | curl 40ms | nslookup 14ms
00255 [200]: ruby 20ms | curl 35ms | nslookup 20ms
00256 [200]: ruby 17ms | curl 32ms | nslookup 14ms
00257 [200]: ruby 20ms | curl 37ms | nslookup 14ms
00258 [200]: ruby 25ms | curl 1038ms | nslookup 14ms
00259 [200]: ruby 27ms | curl 37ms | nslookup 13ms
00260 [200]: ruby 4020ms | curl 32ms | nslookup 16ms
00261 [200]: ruby 5032ms | curl 45ms | nslookup 14ms
00262 [200]: ruby 5021ms | curl 30ms | nslookup 14ms
00263 [200]: ruby 5027ms | curl 32ms | nslookup 16ms
00264 [200]: ruby 5025ms | curl 34ms | nslookup 15ms
00265 [200]: ruby 5029ms | curl 31ms | nslookup 14ms
00266 [200]: ruby 5030ms | curl 37ms | nslookup 18ms
00267 [200]: ruby 5022ms | curl 43ms | nslookup 14ms
00268 [200]: ruby 5026ms | curl 31ms | nslookup 17ms
00269 [200]: ruby 5027ms | curl 33ms | nslookup 14ms
00270 [200]: ruby 5025ms | curl 32ms | nslookup 14ms
00271 [200]: ruby 5022ms | curl 36ms | nslookup 15ms
00272 [200]: ruby 5030ms | curl 32ms | nslookup 13ms
00273 [200]: ruby 5024ms | curl 32ms | nslookup 13ms
00274 [200]: ruby 5028ms | curl 34ms | nslookup 14ms
00275 [200]: ruby 5048ms | curl 30ms | nslookup 14ms


It's definitely interesting that Ruby is the only one to manifest the
problem.

And here's the consul output:
https://gist.github.com/cholick/f7e91fb58891cc0d8f5a


On Thu, Oct 29, 2015 at 4:27 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Hey Matt,

Dieu's suggestion will fix your problem (you'll have to make the change
on all CC's), although it'll get undone on each redeploy. We do want to
find the root cause, but have not been able to reproduce it in our own
environments. If you're up for some investigation, may I suggest the
following:

* Run the following variation of your script on one of the CCs:

require 'uri'
require 'net/http'

SYSTEM_DOMAIN = '--CHANGE-ME--'

uaa_domain = "uaa.#{SYSTEM_DOMAIN}"
login_url = "https://#{uaa_domain}/login"

curl_command="curl -f #{login_url} 2>&1"
nslookup_command="nslookup #{uaa_domain} 2>&1"

puts 'STARTING SANITY CHECK'
curl_output = `#{curl_command}`
raise "'#{curl_command}' failed with output:\n#{curl_output}" unless
$?.to_i.zero?
puts 'SANITY CHECK PASSED'

def duration_string(start)
"#{((Time.now - start) * 1000).round}ms"
end

puts 'STARTING TEST'
1.step do |i|
uri = URI.parse(login_url)
ruby_start = Time.now
ruby_response = Net::HTTP.get_response(uri)
ruby_duration = duration_string(ruby_start)

curl_start = Time.now
`#{curl_command}`
curl_duration = duration_string(curl_start)

nslookup_start = Time.now
`#{nslookup_command}`
nslookup_duration = duration_string(nslookup_start)

puts "#{"%05d" % i} [#{ruby_response.code}]: ruby #{ruby_duration} |
curl #{curl_duration} | nslookup #{nslookup_duration}"
end

* Send a kill -QUIT <consul_agent_pid> to the consul agent process once
you see the slow DNS manifest itself; you will get a dump of all the
goroutines running in the consul agent process in
/var/vcap/sys/log/consul_agent/consul_agent.stderr.log. I would be curious
to see what it spits out.
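
A sketch of that step (the pidfile path is an assumption based on the usual
BOSH job layout; adjust if yours differs):

# SIGQUIT makes a Go process dump all goroutine stacks to stderr
kill -QUIT "$(cat /var/vcap/sys/run/consul_agent/consul_agent.pid)"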

Amit


On Wed, Oct 28, 2015 at 6:10 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Thanks for taking a look, fingers crossed you can see it happen as well.

Our 217 install is on stemcell 3026 and our 212 install is on 2989.

IaaS is CenturyLink Cloud.

-Matt

On Wed, Oct 28, 2015 at 6:08 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

I got up to 10k on an AWS deployment of HEAD of cf-release with ruby
2.2, then started another loop on the same box with ruby 2.1. In the end,
they got up to 40-50k without showing any signs of change. I had to switch
to resolving the UAA endpoint; eventually Google started responding with
302s.

I'm going to try with a cf-release 212 deployment on my bosh lite, but
eventually I want to try on the same stemcell as you're using.

On Wed, Oct 28, 2015 at 5:01 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Thanks Matt, this is awesome.

I'm trying to reproduce this with your script, up at 10k with no
change. I'm also shelling out to curl in the script, to see if both curl
and ruby get affected, and if so, whether they're affected at the same time.

What IaaS and stemcell are you using?

Thanks,
Amit

On Wed, Oct 28, 2015 at 2:54 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

You might try moving the nameserver entry for the consul_agent in
/etc/resolv.conf on the cloud controller to the end to see if that helps.
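
That is, roughly (a sketch: 127.0.0.1 is where the consul agent answers DNS;
the second nameserver is just an example upstream):

# before
nameserver 127.0.0.1
nameserver 10.0.0.2

# after
nameserver 10.0.0.2
nameserver 127.0.0.1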

-Dieu

On Wed, Oct 28, 2015 at 12:55 PM, Matt Cholick <cholick(a)gmail.com>
wrote:

Looks like you're right and we're experiencing the same issue as you
are, Amit. We're suffering slow DNS lookups. The code is spending all of its
time here:
/var/vcap/packages/ruby-2.1.6/lib/ruby/2.1.0/net/http.rb, initialize:879

I've experimented some with the environment and, after narrowing
things down to DNS, here's some minimal code demonstrating the problem:

require "net/http"
require "uri"

# uri = URI.parse("http://uaa.example.com/info")
uri = URI.parse("https://www.google.com")

i = 0
while true do
  beginning_time = Time.now
  response = Net::HTTP.get_response(uri)

  end_time = Time.now
  i += 1
  puts "#{"%04d" % i} Response: [#{response.code}], Elapsed: #{((end_time - beginning_time)*1000).round} ms"
end


I see the issue hitting both UAA and just hitting Google. At some
point, requests start taking 5 seconds longer, which I assume is a timeout.
One run:

0349 Response: [200], Elapsed: 157 ms
0350 Response: [200], Elapsed: 169 ms
0351 Response: [200], Elapsed: 148 ms
0352 Response: [200], Elapsed: 151 ms
0353 Response: [200], Elapsed: 151 ms
0354 Response: [200], Elapsed: 152 ms
0355 Response: [200], Elapsed: 153 ms
0356 Response: [200], Elapsed: 6166 ms
0357 Response: [200], Elapsed: 5156 ms
0358 Response: [200], Elapsed: 5158 ms
0359 Response: [200], Elapsed: 5156 ms
0360 Response: [200], Elapsed: 5156 ms
0361 Response: [200], Elapsed: 5160 ms
0362 Response: [200], Elapsed: 5172 ms
0363 Response: [200], Elapsed: 5157 ms
0364 Response: [200], Elapsed: 5165 ms
0365 Response: [200], Elapsed: 5157 ms
0366 Response: [200], Elapsed: 5155 ms
0367 Response: [200], Elapsed: 5157 ms

Other runs are the same. How many requests it takes before things
time out varies considerably (one run started in the 10s and another took
20k requests), but it always happens. After that, lookups take an
additional 5 seconds and never recover to their initial speed. This is why
restarting the cloud controller fixes the issue (temporarily).

The really slow cli calls (in the 1+ min range) are simply due to the
amount of paging that fetching data for a large org does, as that 5
seconds is multiplied out over several calls. Every user is feeling this
delay; it only becomes unworkable when pulling the large datasets from
UAA.

I was not able to reproduce the timeouts using a script calling "dig"
against localhost, only from Ruby code.
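
For reference, the shape of that check (a sketch; the domain is a
placeholder):

while true; do
  start=$(date +%s%N)
  dig @127.0.0.1 uaa.example.com > /dev/null
  echo "$(( ($(date +%s%N) - start) / 1000000 )) ms"
done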

To reiterate our setup: we're running 212 without a consul server,
just the agents. I also reproduced this problem in a completely
different 217 install in a different datacenter. That setup also didn't
have an actual consul server, just the agent. I don't see anything in the
release notes past 217 indicating that this is fixed.

Anyone have thoughts? This is definitely creating some real
headaches for user management in our larger orgs. Amit: is there a bug we
can follow?

-Matt


On Fri, Oct 9, 2015 at 10:52 AM, Amit Gupta <agupta(a)pivotal.io>
wrote:

You may not be running any consul servers, but you may have a
consul agent colocated on your CC VM and running there.

On Thu, Oct 8, 2015 at 5:59 PM, Matt Cholick <cholick(a)gmail.com>
wrote:

Zack & Swetha,
Thanks for the suggestion, will gather netstat info there next
time.

Amit,
The 1:20 delay is due to paging. The total call length for each page
is closer to 10s. I just included those two calls, with the paging the
cf command line does, to demonstrate the dramatic difference after
a restart. Delays disappear after a restart. We're not running consul yet,
so it wouldn't be that.

-Matt



On Thu, Oct 8, 2015 at 10:03 AM, Amit Gupta <agupta(a)pivotal.io>
wrote:

We've seen issues on some environments where requests to cc that
involve cc making a request to uaa or hm9k have a 5s delay while the local
consul agent fails to resolve the DNS for uaa/hm9k, before moving on to a
different resolver.

The expected behavior, observed in almost all environments, is that
the DNS request to the consul agent fails fast and moves on to the next
resolver; we haven't figured out why a couple of envs exhibit different
behavior. The impact is a 5 or 10s delay (5 or 10, not 5 to 10). It doesn't
explain your 1:20 delay though. Are you always seeing delays that long?
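
The 5s figure matches the default resolver timeout, so a 5 or 10s delay
corresponds to one or two timed-out attempts. For illustration, the relevant
knobs live in /etc/resolv.conf (a sketch, not a recommendation; addresses
are examples):

options timeout:1 attempts:2
nameserver 127.0.0.1
nameserver 10.0.0.2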

Amit


On Thursday, October 8, 2015, Zach Robinson <zrobinson(a)pivotal.io>
wrote:

Hey Matt,

I'm trying to think of other things that would affect only the
endpoints that interact with UAA and would be fixed after a CC restart.
I'm wondering if it's possible there are a large number of connections
being kept-alive, or stuck in a wait state or something. Could you take a
look at the netstat information on the CC and UAA next time this happens?

-Zach and Swetha


Re: SSL Mutual Auth

James Bayer
 

once the tcp routing work is done with the haproxy approach, you should be
able to try mutual ssl using an IP/port. you should be able to test tcp
routing with lattice now.

however, for web traffic using a cf web route (FQDN and optional path) that
goes through a load balancer like F5/ELB and the CF gorouters, the SSL
request is terminated at least once before it reaches the application.

On Thu, Oct 29, 2015 at 9:22 PM, Anthony Lee <lee.apc(a)gmail.com> wrote:

Does anyone have any experience using SSL mutual authentication for an
app running on CF?

Thanks!
Anthony
--
Thank you,

James Bayer


SSL Mutual Auth

Anthony
 

Does anyone have any experience using SSL mutual authentication for an app running on CF?

Thanks!
Anthony


Re: cloud_controller_ng performance degrades slowly over time

Matt Cholick
 

Amit,
Here's a run with the problem manifesting:

...
00248 [200]: ruby 26ms | curl 33ms | nslookup 21ms
00249 [200]: ruby 20ms | curl 32ms | nslookup 14ms
00250 [200]: ruby 18ms | curl 30ms | nslookup 17ms
00251 [200]: ruby 22ms | curl 31ms | nslookup 16ms
00252 [200]: ruby 23ms | curl 30ms | nslookup 16ms
00253 [200]: ruby 26ms | curl 40ms | nslookup 16ms
00254 [200]: ruby 20ms | curl 40ms | nslookup 14ms
00255 [200]: ruby 20ms | curl 35ms | nslookup 20ms
00256 [200]: ruby 17ms | curl 32ms | nslookup 14ms
00257 [200]: ruby 20ms | curl 37ms | nslookup 14ms
00258 [200]: ruby 25ms | curl 1038ms | nslookup 14ms
00259 [200]: ruby 27ms | curl 37ms | nslookup 13ms
00260 [200]: ruby 4020ms | curl 32ms | nslookup 16ms
00261 [200]: ruby 5032ms | curl 45ms | nslookup 14ms
00262 [200]: ruby 5021ms | curl 30ms | nslookup 14ms
00263 [200]: ruby 5027ms | curl 32ms | nslookup 16ms
00264 [200]: ruby 5025ms | curl 34ms | nslookup 15ms
00265 [200]: ruby 5029ms | curl 31ms | nslookup 14ms
00266 [200]: ruby 5030ms | curl 37ms | nslookup 18ms
00267 [200]: ruby 5022ms | curl 43ms | nslookup 14ms
00268 [200]: ruby 5026ms | curl 31ms | nslookup 17ms
00269 [200]: ruby 5027ms | curl 33ms | nslookup 14ms
00270 [200]: ruby 5025ms | curl 32ms | nslookup 14ms
00271 [200]: ruby 5022ms | curl 36ms | nslookup 15ms
00272 [200]: ruby 5030ms | curl 32ms | nslookup 13ms
00273 [200]: ruby 5024ms | curl 32ms | nslookup 13ms
00274 [200]: ruby 5028ms | curl 34ms | nslookup 14ms
00275 [200]: ruby 5048ms | curl 30ms | nslookup 14ms


It's definitely interesting that Ruby is the only one to manifest the
problem.

And here's the consul output:
https://gist.github.com/cholick/f7e91fb58891cc0d8f5a

On Thu, Oct 29, 2015 at 4:27 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Hey Matt,

Dieu's suggestion will fix your problem (you'll have to make the change on
all CC's), although it'll get undone on each redeploy. We do want to find
the root cause, but have not been able to reproduce it in our own
environments. If you're up for some investigation, may I suggest the
following:

* Run the following variation of your script on one of the CCs:

require 'uri'
require 'net/http'

SYSTEM_DOMAIN = '--CHANGE-ME--'

uaa_domain = "uaa.#{SYSTEM_DOMAIN}"
login_url = "https://#{uaa_domain}/login"

curl_command="curl -f #{login_url} 2>&1"
nslookup_command="nslookup #{uaa_domain} 2>&1"

puts 'STARTING SANITY CHECK'
curl_output = `#{curl_command}`
raise "'#{curl_command}' failed with output:\n#{curl_output}" unless
$?.to_i.zero?
puts 'SANITY CHECK PASSED'

def duration_string(start)
"#{((Time.now - start) * 1000).round}ms"
end

puts 'STARTING TEST'
1.step do |i|
uri = URI.parse(login_url)
ruby_start = Time.now
ruby_response = Net::HTTP.get_response(uri)
ruby_duration = duration_string(ruby_start)

curl_start = Time.now
`#{curl_command}`
curl_duration = duration_string(curl_start)

nslookup_start = Time.now
`#{nslookup_command}`
nslookup_duration = duration_string(nslookup_start)

puts "#{"%05d" % i} [#{ruby_response.code}]: ruby #{ruby_duration} |
curl #{curl_duration} | nslookup #{nslookup_duration}"
end

* Send a kill -QUIT <consul_agent_pid> to the consul agent process once
you see the slow DNS manifest itself; you will get a dump of all the
goroutines running in the consul agent process in
/var/vcap/sys/log/consul_agent/consul_agent.stderr.log. I would be curious
to see what it spits out.

Amit


On Wed, Oct 28, 2015 at 6:10 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Thanks for taking a look, fingers crossed you can see it happen as well.

Our 217 install is on stemcell 3026 and our 212 install is on 2989.

IaaS is CenturyLink Cloud.

-Matt

On Wed, Oct 28, 2015 at 6:08 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

I got up to 10k on an AWS deployment of HEAD of cf-release with ruby
2.2, then started another loop on the same box with ruby 2.1. In the end,
they got up to 40-50k without showing any signs of change. I had to switch
to resolving the UAA endpoint; eventually Google started responding with
302s.

I'm going to try with a cf-release 212 deployment on my bosh lite, but
eventually I want to try on the same stemcell as you're using.

On Wed, Oct 28, 2015 at 5:01 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Thanks Matt, this is awesome.

I'm trying to reproduce this with your script, up at 10k with no
change. I'm also shelling out to curl in the script, to see if both curl
and ruby get affected, and if so, whether they're affected at the same time.

What IaaS and stemcell are you using?

Thanks,
Amit

On Wed, Oct 28, 2015 at 2:54 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

You might try moving the nameserver entry for the consul_agent in
/etc/resolv.conf on the cloud controller to the end to see if that helps.

-Dieu

On Wed, Oct 28, 2015 at 12:55 PM, Matt Cholick <cholick(a)gmail.com>
wrote:

Looks like you're right and we're experiencing the same issue as you
are, Amit. We're suffering slow DNS lookups. The code is spending all of its
time here:
/var/vcap/packages/ruby-2.1.6/lib/ruby/2.1.0/net/http.rb, initialize:879

I've experimented some with the environment and, after narrowing
things down to DNS, here's some minimal code demonstrating the problem:

require "net/http"
require "uri"

# uri = URI.parse("http://uaa.example.com/info")
uri = URI.parse("https://www.google.com")

i = 0
while true do
  beginning_time = Time.now
  response = Net::HTTP.get_response(uri)

  end_time = Time.now
  i += 1
  puts "#{"%04d" % i} Response: [#{response.code}], Elapsed: #{((end_time - beginning_time)*1000).round} ms"
end


I see the issue hitting both UAA and just hitting Google. At some
point, requests start taking 5 seconds longer, which I assume is a timeout.
One run:

0349 Response: [200], Elapsed: 157 ms
0350 Response: [200], Elapsed: 169 ms
0351 Response: [200], Elapsed: 148 ms
0352 Response: [200], Elapsed: 151 ms
0353 Response: [200], Elapsed: 151 ms
0354 Response: [200], Elapsed: 152 ms
0355 Response: [200], Elapsed: 153 ms
0356 Response: [200], Elapsed: 6166 ms
0357 Response: [200], Elapsed: 5156 ms
0358 Response: [200], Elapsed: 5158 ms
0359 Response: [200], Elapsed: 5156 ms
0360 Response: [200], Elapsed: 5156 ms
0361 Response: [200], Elapsed: 5160 ms
0362 Response: [200], Elapsed: 5172 ms
0363 Response: [200], Elapsed: 5157 ms
0364 Response: [200], Elapsed: 5165 ms
0365 Response: [200], Elapsed: 5157 ms
0366 Response: [200], Elapsed: 5155 ms
0367 Response: [200], Elapsed: 5157 ms

Other runs are the same. How many requests it takes before things
time out varies considerably (one run started in the 10s and another took
20k requests), but it always happens. After that, lookups take an
additional 5 seconds and never recover to their initial speed. This is why
restarting the cloud controller fixes the issue (temporarily).

The really slow cli calls (in the 1+ min range) are simply due to the
amount of paging that fetching data for a large org does, as that 5
seconds is multiplied out over several calls. Every user is feeling this
delay; it only becomes unworkable when pulling the large datasets from
UAA.

I was not able to reproduce the timeouts using a script calling "dig"
against localhost, only from Ruby code.

To reiterate our setup: we're running 212 without a consul server,
just the agents. I also reproduced this problem in a completely
different 217 install in a different datacenter. That setup also didn't
have an actual consul server, just the agent. I don't see anything in the
release notes past 217 indicating that this is fixed.

Anyone have thoughts? This is definitely creating some real headaches
for user management in our larger orgs. Amit: is there a bug we can follow?

-Matt


On Fri, Oct 9, 2015 at 10:52 AM, Amit Gupta <agupta(a)pivotal.io>
wrote:

You may not be running any consul servers, but you may have a consul
agent colocated on your CC VM and running there.

On Thu, Oct 8, 2015 at 5:59 PM, Matt Cholick <cholick(a)gmail.com>
wrote:

Zack & Swetha,
Thanks for the suggestion, will gather netstat info there next time.

Amit,
The 1:20 delay is due to paging. The total call length for each page is
closer to 10s. I just included those two calls, with the paging the cf
command line does, to demonstrate the dramatic difference after a
restart. Delays disappear after a restart. We're not running consul yet, so
it wouldn't be that.

-Matt



On Thu, Oct 8, 2015 at 10:03 AM, Amit Gupta <agupta(a)pivotal.io>
wrote:

We've seen issues on some environments where requests to cc that
involve cc making a request to uaa or hm9k have a 5s delay while the local
consul agent fails to resolve the DNS for uaa/hm9k, before moving on to a
different resolver.

The expected behavior, observed in almost all environments, is that
the DNS request to the consul agent fails fast and moves on to the next
resolver; we haven't figured out why a couple of envs exhibit different
behavior. The impact is a 5 or 10s delay (5 or 10, not 5 to 10). It doesn't
explain your 1:20 delay though. Are you always seeing delays that long?

Amit


On Thursday, October 8, 2015, Zach Robinson <zrobinson(a)pivotal.io>
wrote:

Hey Matt,

I'm trying to think of other things that would affect only the
endpoints that interact with UAA and would be fixed after a CC restart.
I'm wondering if it's possible there are a large number of connections
being kept-alive, or stuck in a wait state or something. Could you take a
look at the netstat information on the CC and UAA next time this happens?

-Zach and Swetha


Re: Trouble enabling diego ssh in cf-release:222 diego:0.1437

Filip Hanik
 

best way around it, same as in the story:

set the time zone of the UAA VM to match the DB VM

On Thursday, October 29, 2015, Mike Youngstrom <youngm(a)gmail.com> wrote:

It appears my issue was caused by this uaa issue:
https://github.com/cloudfoundry/uaa/issues/223

Now to figure out the best way to work around it.

Thanks for your help Matt.

Mike

On Thu, Oct 29, 2015 at 10:07 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

This is what I'm now seeing in the logs:

ssh-proxy:
{"timestamp":"1446134601.007453442","source":"ssh-proxy","message":"ssh-proxy.authentication-failed","log_level":2,"data":{"error":"no
auth passed yet","user":"cf:52489e92-11b3-447f-813a-322353996d4a/0"}}

{"timestamp":"1446134601.011086702","source":"ssh-proxy","message":"ssh-proxy.cf-authenticate.authenticate-starting","log_level":1,"data":{"session":"23548"}}
{"timestamp":"1446134601.077978134","source":"ssh-proxy","message":"ssh-proxy.cf-authenticate.exchange-access-code-for-token.response-status-not-ok","log_level":2,"data":{"error":"Authentication
failed","session":"23548.1","status-code":400}}

{"timestamp":"1446134601.078182459","source":"ssh-proxy","message":"ssh-proxy.cf-authenticate.authenticate-finished","log_level":1,"data":{"session":"23548"}}
{"timestamp":"1446134601.078270674","source":"ssh-proxy","message":"ssh-proxy.authentication-failed","log_level":2,"data":{"error":"Authentication
failed","user":"cf:52489e92-11b3-447f-813a-322353996d4a/0"}}

uaa:
[2015-10-29T16:03:20.422Z] uaa - 6067 [http-bio-8080-exec-7] .... INFO
--- Audit: TokenIssuedEvent
('["cloud_controller.read","password.write","cloud_controller.write","openid","doppler.firehose","scim.read","cloud_controller.admin","uaa.user"]'):
principal=af373f0b-a193-4434-85ca-692c89e8feab, origin=[caller=cf,
details=(type=UaaAuthenticationDetails)], identityZoneId=[uaa]
[2015-10-29T16:03:21.073Z] uaa - 6067 [http-bio-8080-exec-27] .... INFO
--- TokenEndpoint: Handling error: InvalidGrantException, Invalid
authorization code: nZJfFg

CLI:
ssh: handshake failed: ssh: unable to authenticate, attempted methods
[none password], no supported methods remain

I'll try manual ssh. I'm also going to debug into UAA more to see if I can
figure out why it isn't validating the authorization code being sent to it,
presumably from ssh-proxy.

Mike

On Thu, Oct 29, 2015 at 3:12 AM, Matthew Sykes <matthew.sykes(a)gmail.com> wrote:

I did the work on the cli plugin but not on the integration into the
cli. Based on your first error, it looked like we were having a problem
getting the one time code, not authenticating with the ssh proxy. The fact
that you're able to get the code from the UAA manually implies that piece
is working correctly.

The authorization code message could be related but, if it were, I'd
expect some evidence of that in the ssh proxy logs as well.

You can try to manually ssh using the instructions in the diego-ssh repo
[1] and see if you can isolate if the problem is on the cli side or the
server side.

[1]:
https://github.com/cloudfoundry-incubator/diego-ssh#cloud-foundry-via-cloud-controller-and-uaa

On Wed, Oct 28, 2015 at 9:57 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

I think I'm getting closer. In UAA I now get the error:

TokenEndpoint: Handling error: InvalidGrantException, Invalid
authorization code: ad1o9o

This must be someone trying to redeem the auth code.

Mike

On Wed, Oct 28, 2015 at 7:41 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

That curl command returns what appears to be the correct response:

curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" 'https://uaa.{redacted}/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy'
{trim}
GET /oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy HTTP/1.1
User-Agent: curl/7.38.0
Host: uaa.cf1-dev.lds.org
Accept: */*
Authorization: bearer {redacted}
< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Cache-Control: no-store
< Content-Language: en-US
< Content-Length: 0
< Date: Thu, 29 Oct 2015 01:32:08 GMT
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Location: http://uaa.{redacted}/login?code=huQG3t
< Pragma: no-cache
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< X-Cf-Requestid: 36f6b88e-f8a9-49f1-5f90-ef2b868c266d
< X-Content-Type-Options: nosniff
< X-Frame-Options: DENY
< X-Xss-Protection: 1; mode=block
< Content-Type: text/plain; charset=utf-8

I never see a call like this in my CF_TRACE.

Mike

On Wed, Oct 28, 2015 at 7:09 PM, Matthew Sykes <matthew.sykes(a)gmail.com> wrote:

That's not the request that the plugin is making to get the token.
We're using the API that was created for us [1].

If you use straight curl with something like this, what does the flow
really look like? Are there any errors in the uaa's logs?

$ curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" 'https://uaa.bosh-lite.com/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy'

The UAA should respond with a 302 and a Location header that includes
a code parameter. If not, can you use a jwt decoder against your bearer
token and verify that there's a `uaa.user` scope in the token?

[1]:
https://github.com/cloudfoundry/uaa/blob/master/docs/UAA-APIs.rst#api-authorization-requests-code-get-oauth-authorize-non-standard-oauth-authorize
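
For example, a quick sketch for checking the scopes (assuming a standard
three-part JWT; the tr call converts base64url to base64, and
Base64.decode64 tolerates the missing padding):

TOKEN=$(cf oauth-token | tail -1 | awk '{print $2}')
ruby -rbase64 -rjson -e 'payload = ARGV[0].split(".")[1].tr("-_", "+/")
puts JSON.parse(Base64.decode64(payload))["scope"]' "$TOKEN"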

On Wed, Oct 28, 2015 at 8:56 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

In case it helps this is the CF_TRACE of the UAA call that the ssh
plugin is expecting to be a redirect.

REQUEST: [2015-10-28T17:25:11-06:00]
POST /oauth/token HTTP/1.1
Host: uaa.{redacted}
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/x-www-form-urlencoded
User-Agent: go-cli 6.12.3-5364935 / linux

grant_type=refresh_token&refresh_token={token redacted}&scope=

RESPONSE: [2015-10-28T17:25:12-06:00]
HTTP/1.1 200 OK
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Wed, 28 Oct 2015 23:25:12 GMT
Expires: 0
Pragma: no-cache
Pragma: no-cache
Server: Apache-Coyote/1.1
X-Cf-Requestid: 4a6ad262-07e6-48a8-4640-271996e9bf64
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Xss-Protection: 1; mode=block

a20
{"access_token":"[PRIVATE DATA
HIDDEN]","token_type":"bearer","refresh_token":"[PRIVATE DATA
HIDDEN]","expires_in":3599,"scope":"scim.userids cloud_controller.read
password.write cloud_controller.write openid doppler.firehose scim.read
cloud_controller.admin","jti":"00e07ad7-5090-42e9-8096-a542dffd6026"}
0

This is the ssh-proxy client info returned from a call to uaac
clients:

ssh-proxy
  scope: cloud_controller.read cloud_controller.write openid
  resource_ids: none
  authorized_grant_types: authorization_code refresh_token
  redirect_uri: /login
  autoapprove: true
  action: none
  authorities: uaa.none
  lastmodified: 1446074693000


Mike

On Wed, Oct 28, 2015 at 6:47 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Yes /v2/info contains "app_ssh_oauth_client: "ssh-proxy"".

Though I didn't set it. It appears CC sets it by default now.


https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/spec#L81

Any other ideas?

Mike
On Oct 28, 2015 6:16 PM, "Matthew Sykes" <matthew.sykes(a)gmail.com> wrote:

Does /v2/info contain the `app_ssh_oauth_client` key? If not, it
should be set to the client ID of the ssh proxy. If it's not set, I think
that's one of the symptoms.


https://github.com/cloudfoundry-incubator/diego-release/blob/develop/stubs-for-cf-release/enable_diego_ssh_in_cf.yml#L4

On Wed, Oct 28, 2015 at 7:36 PM, Mike Youngstrom <youngm(a)gmail.com> wrote:

I'm working on upgrading to latest cf-release+diego and I'm
having trouble getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
  authorized-grant-types: authorization_code
  autoapprove: true
  override: true
  redirect-uri: /login
  scope: openid,cloud_controller.read,cloud_controller.write
  secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a
200. It appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea
why UAA wouldn't be sending a redirect response from /oauth/token when the
plugin is expecting it?

Mike


--
Matthew Sykes
matthew.sykes(a)gmail.com
<javascript:_e(%7B%7D,'cvml','matthew.sykes(a)gmail.com');>

--
Matthew Sykes
matthew.sykes(a)gmail.com
<javascript:_e(%7B%7D,'cvml','matthew.sykes(a)gmail.com');>

--
Matthew Sykes
matthew.sykes(a)gmail.com
<javascript:_e(%7B%7D,'cvml','matthew.sykes(a)gmail.com');>


Re: xip.io IO errors

Dan Wendorf
 

If you're using the common bosh-lite IP of 10.244.0.34, you can also use
the more reliable *.bosh-lite.com.
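
For example (a sketch, assuming the default bosh-lite deployment):

cf api https://api.bosh-lite.com --skip-ssl-validation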

On Thu, Oct 29, 2015, 5:40 PM Amit Gupta <agupta(a)pivotal.io> wrote:

Yes, xip.io is flaky. I would recommend setting up real DNS if you want
to avoid the 2% failure rate and want a longer-term, reliable solution.

On Thu, Oct 29, 2015 at 4:55 PM, Sumanth Yamala <Sumanth.Yamala(a)sas.com>
wrote:

Hi,



Currently we see IO errors when trying to contact xip.io – this happens
around 2% of the time. To apply a temporary fix, I updated the /etc/hosts
of the ha_proxy VM to map the IP to the service routes of each app
deployed on the system.



After the above change I still see the packet drops, so I wanted to know
if this needs to be done on all the runners as well. Any thoughts?

Thanks,

Sumanth


Re: xip.io IO errors

Amit Kumar Gupta
 

Yes, xip.io is flaky. I would recommend setting up real DNS if you want to
avoid the 2% failure rate and want a longer-term, reliable solution.

On Thu, Oct 29, 2015 at 4:55 PM, Sumanth Yamala <Sumanth.Yamala(a)sas.com>
wrote:

Hi,



Currently we see IO errors when trying to contact xip.io – this happens
around 2% of the time. To apply a temporary fix, I updated the /etc/hosts
of the ha_proxy VM to map the IP to the service routes of each app
deployed on the system.



After the above change I still see the packet drops, so I wanted to know if
this needs to be done on all the runners as well. Any thoughts?

Thanks,

Sumanth


xip.io IO errors

Sumanth Yamala
 

Hi,

Currently we see IO errors when trying to contact xip.io - this happens around 2% of the time. To apply a temporary fix, I updated the /etc/hosts of the ha_proxy VM to map the IP to the service routes of each app deployed on the system.

After the above change I still see the packet drops, so I wanted to know if this needs to be done on all the runners as well. Any thoughts?
Thanks,
Sumanth


Re: cloud_controller_ng performance degrades slowly over time

Amit Kumar Gupta
 

Hey Matt,

Dieu's suggestion will fix your problem (you'll have to make the change on
all CC's), although it'll get undone on each redeploy. We do want to find
the root cause, but have not been able to reproduce it in our own
environments. If you're up for some investigation, may I suggest the
following:

* Run the following variation of your script on one of the CCs:

require 'uri'
require 'net/http'

SYSTEM_DOMAIN = '--CHANGE-ME--'

uaa_domain = "uaa.#{SYSTEM_DOMAIN}"
login_url = "https://#{uaa_domain}/login"

curl_command="curl -f #{login_url} 2>&1"
nslookup_command="nslookup #{uaa_domain} 2>&1"

puts 'STARTING SANITY CHECK'
curl_output = `#{curl_command}`
raise "'#{curl_command}' failed with output:\n#{curl_output}" unless
$?.to_i.zero?
puts 'SANITY CHECK PASSED'

def duration_string(start)
"#{((Time.now - start) * 1000).round}ms"
end

puts 'STARTING TEST'
1.step do |i|
uri = URI.parse(login_url)
ruby_start = Time.now
ruby_response = Net::HTTP.get_response(uri)
ruby_duration = duration_string(ruby_start)

curl_start = Time.now
`#{curl_command}`
curl_duration = duration_string(curl_start)

nslookup_start = Time.now
`#{nslookup_command}`
nslookup_duration = duration_string(nslookup_start)

puts "#{"%05d" % i} [#{ruby_response.code}]: ruby #{ruby_duration} |
curl #{curl_duration} | nslookup #{nslookup_duration}"
end

* Send a kill -QUIT <consul_agent_pid> to the consul agent process once you
see the slow DNS manifest itself; you will get a dump of all the goroutines
running in the consul agent process in
/var/vcap/sys/log/consul_agent/consul_agent.stderr.log. I would be curious
to see what it spits out.

Amit

On Wed, Oct 28, 2015 at 6:10 PM, Matt Cholick <cholick(a)gmail.com> wrote:

Thanks for taking a look, fingers crossed you can see it happen as well.

Our 217 install is on stemcell 3026 and our 212 install is on 2989.

IaaS is CenturyLink Cloud.

-Matt

On Wed, Oct 28, 2015 at 6:08 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

I got up to 10k on an AWS deployment of HEAD of cf-release with ruby 2.2,
then started another loop on the same box with ruby 2.1. In the end, they
got up to 40-50k without showing any signs of change. I had to switch to
resolving the UAA endpoint; eventually Google started responding with 302s.

I'm going to try with a cf-release 212 deployment on my bosh lite, but
eventually I want to try on the same stemcell as you're using.

On Wed, Oct 28, 2015 at 5:01 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Thanks Matt, this is awesome.

I'm trying to reproduce this with your script, up at 10k with no
change. I'm also shelling out to curl in the script, to see if both curl
and ruby get affected, and if so, whether they're affected at the same time.

What IaaS and stemcell are you using?

Thanks,
Amit

On Wed, Oct 28, 2015 at 2:54 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

You might try moving the nameserver entry for the consul_agent in
/etc/resolv.conf on the cloud controller to the end to see if that helps.

-Dieu

On Wed, Oct 28, 2015 at 12:55 PM, Matt Cholick <cholick(a)gmail.com>
wrote:

Looks like you're right and we're experiencing the same issue as you
are, Amit. We're suffering slow DNS lookups. The code is spending all of its
time here:
/var/vcap/packages/ruby-2.1.6/lib/ruby/2.1.0/net/http.rb, initialize:879

I've experimented some with the environment and, after narrowing
things down to DNS, here's some minimal code demonstrating the problem:

require "net/http"
require "uri"

# uri = URI.parse("http://uaa.example.com/info")
uri = URI.parse("https://www.google.com")

i = 0
while true do
  beginning_time = Time.now
  response = Net::HTTP.get_response(uri)

  end_time = Time.now
  i += 1
  puts "#{"%04d" % i} Response: [#{response.code}], Elapsed: #{((end_time - beginning_time)*1000).round} ms"
end


I see the issue hitting both UAA and just hitting Google. At some
point, requests start taking 5 seconds longer, which I assume is a timeout.
One run:

0349 Response: [200], Elapsed: 157 ms
0350 Response: [200], Elapsed: 169 ms
0351 Response: [200], Elapsed: 148 ms
0352 Response: [200], Elapsed: 151 ms
0353 Response: [200], Elapsed: 151 ms
0354 Response: [200], Elapsed: 152 ms
0355 Response: [200], Elapsed: 153 ms
0356 Response: [200], Elapsed: 6166 ms
0357 Response: [200], Elapsed: 5156 ms
0358 Response: [200], Elapsed: 5158 ms
0359 Response: [200], Elapsed: 5156 ms
0360 Response: [200], Elapsed: 5156 ms
0361 Response: [200], Elapsed: 5160 ms
0362 Response: [200], Elapsed: 5172 ms
0363 Response: [200], Elapsed: 5157 ms
0364 Response: [200], Elapsed: 5165 ms
0365 Response: [200], Elapsed: 5157 ms
0366 Response: [200], Elapsed: 5155 ms
0367 Response: [200], Elapsed: 5157 ms

Other runs are the same. How many requests it takes before things time
out varies considerably (one run started in the 10s and another took 20k
requests), but it always happens. After that, lookups take an additional 5
seconds and never recover to their initial speed. This is why restarting the
cloud controller fixes the issue (temporarily).

The really slow cli calls (in the 1+ min range) are simply due to the
amount of paging that fetching data for a large org does, as that 5
seconds is multiplied out over several calls. Every user is feeling this
delay; it only becomes unworkable when pulling the large datasets from
UAA.

I was not able to reproduce the timeouts using a script calling "dig"
against localhost, only from Ruby code.

To reiterate our setup: we're running 212 without a consul server,
just the agents. I also reproduced this problem in a completely
different 217 install in a different datacenter. That setup also didn't
have an actual consul server, just the agent. I don't see anything in the
release notes past 217 indicating that this is fixed.

Anyone have thoughts? This is definitely creating some real headaches
for user management in our larger orgs. Amit: is there a bug we can follow?

-Matt


On Fri, Oct 9, 2015 at 10:52 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

You may not be running any consul servers, but you may have a consul
agent colocated on your CC VM and running there.

On Thu, Oct 8, 2015 at 5:59 PM, Matt Cholick <cholick(a)gmail.com>
wrote:

Zack & Swetha,
Thanks for the suggestion, will gather netstat info there next time.

Amit,
The 1:20 delay is due to paging. The total call length for each page is
closer to 10s. I just included those two calls, with the paging the cf
command line does, to demonstrate the dramatic difference after a
restart. Delays disappear after a restart. We're not running consul yet, so
it wouldn't be that.

-Matt



On Thu, Oct 8, 2015 at 10:03 AM, Amit Gupta <agupta(a)pivotal.io>
wrote:

We've seen issues on some environments where requests to cc that
involve cc making a request to uaa or hm9k have a 5s delay while the local
consul agent fails to resolve the DNS for uaa/hm9k, before moving on to a
different resolver.

The expected behavior, observed in almost all environments, is that
the DNS request to the consul agent fails fast and moves on to the next
resolver; we haven't figured out why a couple of envs exhibit different
behavior. The impact is a 5 or 10s delay (5 or 10, not 5 to 10). It doesn't
explain your 1:20 delay though. Are you always seeing delays that long?

Amit


On Thursday, October 8, 2015, Zach Robinson <zrobinson(a)pivotal.io>
wrote:

Hey Matt,

I'm trying to think of other things that would affect only the
endpoints that interact with UAA and would be fixed after a CC restart.
I'm wondering if it's possible there are a large number of connections
being kept-alive, or stuck in a wait state or something. Could you take a
look at the netstat information on the CC and UAA next time this happens?

-Zach and Swetha


Re: Disable HTTP transport

Krzysztof Wilk
 

Thanks for the authoritative answer. My application is a Java (Spring Framework) one, hence the solution with Spring Security is just fine for me.


Re: Multiple ldap backend in UAA

Sree Tummidi
 

Yep, this is not supported. Our recommendation is to do consolidation on
the LDAP side.

-Sree

On Thu, Oct 29, 2015 at 10:08 AM, Jakub Witkowski <cuba888(a)wp.pl> wrote:

Thank you very much for the info.
For now I'm using the slapd-meta backend (
http://linux.die.net/man/5/slapd-meta), which allows me to merge two trees
into one tree.
The AD I'm using is a Samba 4 solution.

I was looking for a solution that would allow me to directly use multiple
LDAP backends from UAA without a proxy.
Looking at the solution you recommend, I assume that is not possible?




Re: Trouble enabling diego ssh in cf-release:222 diego:0.1437

Mike Youngstrom <youngm@...>
 

It appears my issue was caused by this uaa issue:
https://github.com/cloudfoundry/uaa/issues/223

Now to figure out the best way to work around it.

Thanks for your help Matt.

Mike

On Thu, Oct 29, 2015 at 10:07 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

This is what I'm now seeing in the logs:

ssh-proxy:
{"timestamp":"1446134601.007453442","source":"ssh-proxy","message":"ssh-proxy.authentication-failed","log_level":2,"data":{"error":"no
auth passed yet","user":"cf:52489e92-11b3-447f-813a-322353996d4a/0"}}

{"timestamp":"1446134601.011086702","source":"ssh-proxy","message":"ssh-proxy.cf-authenticate.authenticate-starting","log_level":1,"data":{"session":"23548"}}
{"timestamp":"1446134601.077978134","source":"ssh-proxy","message":"ssh-proxy.cf-authenticate.exchange-access-code-for-token.response-status-not-ok","log_level":2,"data":{"error":"Authentication
failed","session":"23548.1","status-code":400}}

{"timestamp":"1446134601.078182459","source":"ssh-proxy","message":"ssh-proxy.cf-authenticate.authenticate-finished","log_level":1,"data":{"session":"23548"}}
{"timestamp":"1446134601.078270674","source":"ssh-proxy","message":"ssh-proxy.authentication-failed","log_level":2,"data":{"error":"Authentication
failed","user":"cf:52489e92-11b3-447f-813a-322353996d4a/0"}}

uaa:
[2015-10-29T16:03:20.422Z] uaa - 6067 [http-bio-8080-exec-7] .... INFO
--- Audit: TokenIssuedEvent
('["cloud_controller.read","password.write","cloud_controller.write","openid","doppler.firehose","scim.read","cloud_controller.admin","uaa.user"]'):
principal=af373f0b-a193-4434-85ca-692c89e8feab, origin=[caller=cf,
details=(type=UaaAuthenticationDetails)], identityZoneId=[uaa]
[2015-10-29T16:03:21.073Z] uaa - 6067 [http-bio-8080-exec-27] .... INFO
--- TokenEndpoint: Handling error: InvalidGrantException, Invalid
authorization code: nZJfFg

CLI:
ssh: handshake failed: ssh: unable to authenticate, attempted methods
[none password], no supported methods remain

I'll try manual ssh. I'm also going to debug into UAA more to see if I can
figure out why it isn't validating the authorization code being sent to it,
presumably from ssh-proxy.

Mike

On Thu, Oct 29, 2015 at 3:12 AM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:

I did the work on the cli plugin but not on the integration into the cli.
Based on your first error, it looked like we were having a problem getting
the one time code, not authenticating with the ssh proxy. The fact that
you're able to get the code from the UAA manually implies that piece is
working correctly.

The authorization code message could be related but, if it were, I'd
expect some evidence of that in the ssh proxy logs as well.

You can try to manually ssh using the instructions in the diego-ssh repo
[1] and see if you can isolate if the problem is on the cli side or the
server side.

[1]:
https://github.com/cloudfoundry-incubator/diego-ssh#cloud-foundry-via-cloud-controller-and-uaa

On Wed, Oct 28, 2015 at 9:57 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I think I'm getting closer. In UAA I now get the error:

TokenEndpoint: Handling error: InvalidGrantException, Invalid
authorization code: ad1o9o

This must be someone trying to redeem the auth code.

Mike

On Wed, Oct 28, 2015 at 7:41 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

That curl command returns what appears to be the correct response:

curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" 'https://uaa.{redacted}/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy'
{trim}
GET /oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy HTTP/1.1
User-Agent: curl/7.38.0
Host: uaa.cf1-dev.lds.org
Accept: */*
Authorization: bearer {redacted}
< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Cache-Control: no-store
< Content-Language: en-US
< Content-Length: 0
< Date: Thu, 29 Oct 2015 01:32:08 GMT
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Location: http://uaa.{redacted}/login?code=huQG3t
< Pragma: no-cache
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< X-Cf-Requestid: 36f6b88e-f8a9-49f1-5f90-ef2b868c266d
< X-Content-Type-Options: nosniff
< X-Frame-Options: DENY
< X-Xss-Protection: 1; mode=block
< Content-Type: text/plain; charset=utf-8

I never see a call like this in my CF_TRACE.

Mike

On Wed, Oct 28, 2015 at 7:09 PM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:
That's not the request that the plugin is making to get the token.
We're using the API that was created for us [1].

If you use straight curl with something like this, what does the flow
really look like? Are there any errors in the uaa's logs?

$ curl -v -k -H "Authorization: $(cf oauth-token | tail -1)" 'https://uaa.bosh-lite.com/oauth/authorize?response_type=code&grant_type=authorization_code&client_id=ssh-proxy'

The UAA should respond with a 302 and a Location header that includes
a code parameter. If not, can you use a jwt decoder against your bearer
token and verify that there's a `uaa.user` scope in the token?

[1]:
https://github.com/cloudfoundry/uaa/blob/master/docs/UAA-APIs.rst#api-authorization-requests-code-get-oauth-authorize-non-standard-oauth-authorize

On Wed, Oct 28, 2015 at 8:56 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

In case it helps this is the CF_TRACE of the UAA call that the ssh
plugin is expecting to be a redirect.

REQUEST: [2015-10-28T17:25:11-06:00]
POST /oauth/token HTTP/1.1
Host: uaa.{redacted}
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/x-www-form-urlencoded
User-Agent: go-cli 6.12.3-5364935 / linux

grant_type=refresh_token&refresh_token={token redacted}&scope=

RESPONSE: [2015-10-28T17:25:12-06:00]
HTTP/1.1 200 OK
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Wed, 28 Oct 2015 23:25:12 GMT
Expires: 0
Pragma: no-cache
Pragma: no-cache
Server: Apache-Coyote/1.1
X-Cf-Requestid: 4a6ad262-07e6-48a8-4640-271996e9bf64
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Xss-Protection: 1; mode=block

a20
{"access_token":"[PRIVATE DATA
HIDDEN]","token_type":"bearer","refresh_token":"[PRIVATE DATA
HIDDEN]","expires_in":3599,"scope":"scim.userids cloud_controller.read
password.write cloud_controller.write openid doppler.firehose scim.read
cloud_controller.admin","jti":"00e07ad7-5090-42e9-8096-a542dffd6026"}
0

This is the ssh-proxy client info returned from a call to uaac
clients:

ssh-proxy
  scope: cloud_controller.read cloud_controller.write openid
  resource_ids: none
  authorized_grant_types: authorization_code refresh_token
  redirect_uri: /login
  autoapprove: true
  action: none
  authorities: uaa.none
  lastmodified: 1446074693000


Mike

On Wed, Oct 28, 2015 at 6:47 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Yes /v2/info contains "app_ssh_oauth_client: "ssh-proxy"".

Though I didn't set it. It appears CC sets it by default now.


https://github.com/cloudfoundry/cf-release/blob/master/jobs/cloud_controller_ng/spec#L81

Any other ideas?

Mike
On Oct 28, 2015 6:16 PM, "Matthew Sykes" <matthew.sykes(a)gmail.com>
wrote:

Does /v2/info contain the `app_ssh_oauth_client` key? If not, it
should be set to the client ID of the ssh proxy. If it's not set, I think
that's one of the symptoms.


https://github.com/cloudfoundry-incubator/diego-release/blob/develop/stubs-for-cf-release/enable_diego_ssh_in_cf.yml#L4

On Wed, Oct 28, 2015 at 7:36 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

I'm working on upgrading to latest cf-release+diego and I'm having
trouble getting ssh working.

When attempting to ssh with the latest cli I get the error:

"Authorization server did not redirect with one time code"

The relevant config is:

ssh_proxy.uaa_token_url=https://{uaa server}/oauth/token

uaa.clients.ssh-proxy:
  authorized-grant-types: authorization_code
  autoapprove: true
  override: true
  redirect-uri: /login
  scope: openid,cloud_controller.read,cloud_controller.write
  secret: secret

When tracing the CLI I see a call to "POST /oauth/token" and a
200. It appears that the CLI is expecting a redirect and not a 200.

Is "oauth/token" the correct uaa_token_url endpoint? Any idea why
UAA wouldn't be sending a redirect response from /oauth/token when the
plugin is expecting it?

Mike


--
Matthew Sykes
matthew.sykes(a)gmail.com

--
Matthew Sykes
matthew.sykes(a)gmail.com

--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: Problem deploying basic Apps on PWS

Charles Wu
 

We have seen this with some node apps. Does your app or buildpack reference
either of the following env variables: VCAP_APP_HOST or VCAP_APP_PORT?

http://support.run.pivotal.io/entries/105844873-Migrating-Applications-from-DEAs-to-Diego

If your app references them, you can adjust with the following:


- *VCAP_APP_HOST* and *VCAP_APP_PORT* environment variables have been
removed. *VCAP_APP_HOST* was always *0.0.0.0*, so you can safely use
that in place of the environment variable. *VCAP_APP_PORT* is replaced
by *PORT* and is set to *8080* by default.
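
In a Node app, a minimal sketch of that adjustment:

var port = process.env.PORT || 8080; // PORT is set by Diego; 8080 is the default
require('http').createServer(function (req, res) {
  res.end('ok');
}).listen(port, '0.0.0.0'); // 0.0.0.0 replaces VCAP_APP_HOST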



On Thu, Oct 29, 2015 at 8:21 AM, Juan Antonio Breña Moral <
bren(a)juanantonio.info> wrote:

Hi Nicholas,

many thanks for the reply.

I continue here:

http://support.run.pivotal.io/entries/106222933-Problem-deploying-basic-Apps-on-PWS


Re: Error to make a Request to update password in UAA

Filip Hanik
 

If your `access_token` value is

1. a client_credentials grant - the `oldPassword` field is not evaluated
and can be omitted
OR
2. a user token with the `uaa.admin` scope, where the admin is trying
to change the password for another user - the `oldPassword` field is not
evaluated and can be omitted

We will update the documentation to reflect these two use cases.
https://github.com/cloudfoundry/uaa/blob/feature/fix_saml_metadata_validation/scim/src/main/java/org/cloudfoundry/identity/uaa/password/PasswordChangeEndpoint.java#L124-L158
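
For reference, a minimal sketch of the request itself (URL, GUID, and token
are placeholders; oldPassword is included only when required, per the cases
above):

curl -X PUT 'https://uaa.example.com/Users/USER_GUID/password' \
  -H 'Authorization: bearer ACCESS_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"oldPassword": "oldpassword", "password": "newpassword"}'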


On Wed, Oct 28, 2015 at 9:43 AM, Juan Antonio Breña Moral <
bren(a)juanantonio.info> wrote:

Hi,

Using the UAA API, it is possible to create users without a password. Later,
if you need to update the password, what is the right request to use? The
current documentation is not very clear:


https://github.com/cloudfoundry/uaa/blob/master/docs/UAA-APIs.rst#create-a-user-post-users

The document has a link to a section on updating the password, but that
section was removed:
http://www.simplecloud.info/specs/draft-scim-api-01.html#change-password

Using the documentation, the request throws an Error:

uaa_options = {
  "schemas": ["urn:scim:schemas:core:1.0"],
  "password": "abc123456",
  "oldPassword": "oldpassword"
}

return CloudFoundryUsersUAA.updatePassword(token_type, access_token, uaa_guid, uaa_options);

UsersUAA.prototype.updatePassword = function (token_type, access_token, uaa_guid, uaa_options) {
    "use strict";

    var url = this.UAA_API_URL + "/Users/" + uaa_guid + "/password";
    var options = {
        method: 'PUT',
        url: url,
        headers: {
            Accept: 'application/json',
            Authorization: token_type + ' ' + access_token
        },
        json: uaa_options
    };

    return this.REST.request(options, "200", false);
};

Error:

Error: the string "<html><head><title>Apache Tomcat/7.0.55 - Error
report</title>...</head><body><h1>HTTP Status 400 - </h1><p><b>type</b>
Status report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The
request sent by the client was syntactically incorrect.</u></p><h3>Apache
Tomcat/7.0.55</h3></body></html>" was thrown, throw an Error :)
Note: it is possible to create a user in UAA with a password in the first
operation, but the documentation is not clear on this point.

var uaa_options = {
  "schemas": ["urn:scim:schemas:core:1.0"],
  "userName": username,
  "emails": [
    {
      "value": "demo(a)example.com",
      "type": "work"
    }
  ],
  "password": "123456"
};

Usage with CF CLI: cf login -a https://apiMY_IP.xip.io -u userXXX -p
123456 --skip-ssl-validation

Any help with updating passwords?

Juan Antonio