cf login fails with "Error finding available orgs"


Neil Watson
 

'CF_TRACE=true cf login -u admin' produces the output below. What might the problem be?

REQUEST: [2016-08-30T13:07:02-04:00]
GET /v2/organizations?order-by=name HTTP/1.1
Host: api.example.com
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Connection: close
Content-Type: application/json
User-Agent: go-cli 6.21.1+6fd3c9f / linux

RESPONSE: [2016-08-30T13:07:02-04:00]
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Length: 21
Content-Type: text/plain
Date: Tue, 30 Aug 2016 16:07:17 GMT
Server: nginx
X-Vcap-Request-Id: 04164af8-e82a-47aa-50be-3bef1b25ab0c

Internal server error

API endpoint: https://api.example.com (API version: 2.59.0)
User: admin
No org or space targeted, use 'cf target -o ORG -s SPACE'
FAILED
Error finding available orgs


Daniel Mikusa
 

Look at your cloud controller logs. You should see some reason for the 500
error. That should give you more clues as to why it's failing.

Dan

On Tue, Aug 30, 2016 at 12:24 PM, Neil Watson <neil(a)watson-wilson.ca> wrote:

'CF_TRACE=true cf login -u admin' produces the output below. What might
the problem be?

REQUEST: [2016-08-30T13:07:02-04:00]
GET /v2/organizations?order-by=name HTTP/1.1
Host: api.example.com
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Connection: close
Content-Type: application/json
User-Agent: go-cli 6.21.1+6fd3c9f / linux

RESPONSE: [2016-08-30T13:07:02-04:00]
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Length: 21
Content-Type: text/plain
Date: Tue, 30 Aug 2016 16:07:17 GMT
Server: nginx
X-Vcap-Request-Id: 04164af8-e82a-47aa-50be-3bef1b25ab0c

Internal server error

API endpoint: https://api.example.com (API version: 2.59.0)
User: admin
No org or space targeted, use 'cf target -o ORG -s SPACE'
FAILED
Error finding available orgs


Neil Watson
 

If you mean cloud_controller_ng.log on the api server, that file is unfortunately in unreadable json format. I see nothing in it returning a 500 error.


Daniel Mikusa
 

That would be the file. It should have more details. You can try
searching on the vcap-request-id or you can look at the access log that
nginx writes, look for the request to /v2/organizations that's returning
500, take the timestamp and then look around that time in the log CC log.

I would agree that the files are not pretty to look at, but grep can be
fairly helpful to search them. It's also pretty easily machine parseable
so you can script something to help make them more readable. I run `!%
python -m json.tool` in vim a lot which pretty prints the json, `jq` also
helps or python / ruby can be used to parse through with some custom script
pretty easily.

Hope that helps!

Dan

On Tue, Aug 30, 2016 at 12:58 PM, Neil Watson <neil(a)watson-wilson.ca> wrote:

If you mean cloud_controller_ng.log on the api server, that file is
unfortunately in unreadable json format. I see nothing in it returning a
500 error.


Neil Watson
 

There is nothing in that log for any organizations API query:

pwd
/var/vcap/sys/log/cloud_controller_ng

ls -ltr
total 16760
-rw-r----- 1 vcap vcap 239706 Aug 30 13:15 pre-start.stdout.log
-rw-r----- 1 vcap vcap 0 Aug 30 13:16 post-start.stdout.log
drwxr-xr-x 2 vcap vcap 4096 Aug 30 15:52 drain
-rw-r----- 1 vcap vcap 11378 Aug 30 15:53 pre-start.stderr.log
-rw-r--r-- 1 vcap vcap 240 Aug 30 15:53 cloud_controller_ng_ctl.err.log
-rw-r--r-- 1 vcap vcap 605 Aug 30 15:53 cloud_controller_ng_ctl.log
-rw-r--r-- 1 vcap vcap 496 Aug 30 15:53 cloud_controller_worker_ctl.err.log
-rw-r----- 1 vcap vcap 3618 Aug 30 15:53 post-start.stderr.log
-rw-r--r-- 1 vcap vcap 6577 Aug 30 15:54 cloud_controller_worker_ctl.log
-rw-r--r-- 1 vcap vcap 16877018 Aug 30 17:43 cloud_controller_ng.log

grep 'v2/org' cloud_controller_ng.log; echo $?
1


Neil Watson
 

And no ID in the file:

grep 1730392b-7537-4a7c-7940-85e51faa1b7b cloud_controller_ng.log ; echo $?
1


Amit Kumar Gupta
 

Do you have more than one api_zN instance running? Your logs may be on
another VM.

On Tue, Aug 30, 2016 at 10:50 AM, Neil Watson <neil(a)watson-wilson.ca> wrote:

And no ID in the file:

grep 1730392b-7537-4a7c-7940-85e51faa1b7b cloud_controller_ng.log ; echo
$?
1


Neil Watson
 

Not just the one:

[root(a)np-cloud-found01 ~]# bosh vms|grep api
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Acting as user 'admin' on 'my-bosh'
| api_worker_z1/0 (ec4fbebf-03dd-4fe2-a44a-acf237b404a9) | running | n/a | small_z1 | 10.120.7.37 |
| api_z1/0 (4ea752a7-caa5-4340-a7f5-8e5873376296) | running | n/a | large_z1 | 10.120.7.35 |


Daniel Mikusa
 

Did you recently `bosh recreate`, `bosh deploy` or has the resurrector
recently recreated the VM (maybe check the uptime if you're unsure)? These
are all things that can cause the VM to be rebuilt and you to lose logs.
Also, perhaps try running `tail -f` on the CC log file and replicating the
problem again? If the CC is having a 500 error, it will log it here.

Dan

On Tue, Aug 30, 2016 at 1:57 PM, Neil Watson <neil(a)watson-wilson.ca> wrote:

Not just the one:

[root(a)np-cloud-found01 ~]# bosh vms|grep api
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Acting as user 'admin' on 'my-bosh'
| api_worker_z1/0 (ec4fbebf-03dd-4fe2-a44a-acf237b404a9)
| running | n/a | small_z1 | 10.120.7.37 |
| api_z1/0 (4ea752a7-caa5-4340-a7f5-8e5873376296)
| running | n/a | large_z1 | 10.120.7.35 |


Daniel Mikusa
 

Look at the access log for Nginx. It's up a directory,
`nginx_cc/access.log` (I think). Make sure that you see the request listed
in the access log. The access log contains all incoming requests so if
it's not listed there, then it is not making it to this CC instance.

Dan

On Tue, Aug 30, 2016 at 1:44 PM, Neil Watson <neil(a)watson-wilson.ca> wrote:

There is nothing in that log for any organizations API query:

pwd
/var/vcap/sys/log/cloud_controller_ng

ls -ltr
total 16760
-rw-r----- 1 vcap vcap 239706 Aug 30 13:15 pre-start.stdout.log
-rw-r----- 1 vcap vcap 0 Aug 30 13:16 post-start.stdout.log
drwxr-xr-x 2 vcap vcap 4096 Aug 30 15:52 drain
-rw-r----- 1 vcap vcap 11378 Aug 30 15:53 pre-start.stderr.log
-rw-r--r-- 1 vcap vcap 240 Aug 30 15:53 cloud_controller_ng_ctl.err.
log
-rw-r--r-- 1 vcap vcap 605 Aug 30 15:53 cloud_controller_ng_ctl.log
-rw-r--r-- 1 vcap vcap 496 Aug 30 15:53 cloud_controller_worker_ctl.
err.log
-rw-r----- 1 vcap vcap 3618 Aug 30 15:53 post-start.stderr.log
-rw-r--r-- 1 vcap vcap 6577 Aug 30 15:54 cloud_controller_worker_ctl.
log
-rw-r--r-- 1 vcap vcap 16877018 Aug 30 17:43 cloud_controller_ng.log

grep 'v2/org' cloud_controller_ng.log; echo $?
1


Neil Watson
 

There has been no recreation of any VMs during this testing. The time span of the log file is within my tests. There is no 500 error and no matching ID. There is this issue that is similar:

https://github.com/cloudfoundry/cloud_controller_ng/issues/633

And given the trouble we've had understanding (guessing) how to generate keys I think it sounds possible. This is a very frustrating installation process.


Neil Watson
 

yes, I see related access requests in nginx_cc/nginx.access.log.


Neil Watson
 

Yes, I see related requests in nginx_cc/nginx.access.log.


Neil Watson
 

Example:

api.example.com - [30/Aug/2016:19:39:20 +0000] "GET /v2/organizations?order-by=name HTTP/1.1" 500 195 "-" "go-cli 6.21.1+6fd3c9f / linux" 10.120.8.132, 10.120.7.14, 10.120.7.16 vcap_request_id:- response_time:0.003


Neil Watson
 

I think the problem was indeed related to issue 633. When I used another method to generate the rsa key pair for jwt (sadly the docs do not specify how this should be done) and deployed a new release with this change, the login worked.