
Re: Running the app test suite within the CATs, and the admin_buildpack_lifecycle_test is failing

Jordan Collier
 

I was unclear about what I am asking; the real question is as follows:

What is the best way to run the apps test suite within the CATS on an older version of cloud foundry? (for example I am running these tests on version 208)


Re: Error 400007: `stats_z1/0' is not running after update

Amit Kumar Gupta
 

Okay, please let me know if you are able to fix your security group
settings and whether the original problem gets resolved.

Amit

On Wed, Sep 23, 2015 at 7:03 PM, Guangcai Wang <guangcai.wang(a)gmail.com>
wrote:

That did help. It showed us the real error.

==> metron_agent/metron_agent.stdout.log <==
{"timestamp":1443054247.927488327,"process_id":23472,"source":"metron","log_level":"warn","message":"Failed to create client: Could not connect to NATS: dial tcp 192.168.110.202:4222: i/o timeout","data":null,"file":"/var/vcap/data/compile/metron_agent/loggregator/src/github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/collectorregistrar/collector_registrar.go","line":51,"method":"github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/collectorregistrar.(*CollectorRegistrar).Run"}

I checked the security rule. It seems to have some problems.

On Thu, Sep 24, 2015 at 2:47 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

I often take the following approach to debugging issues like this:

* Open two shell sessions to your failing VM using bosh ssh, and switch
to superuser
* In one session, `watch monit summary`. You might see collector going
back and forth between initializing and not monitored, but please report
anything else of interest you see here
* In the other session, `cd /var/vcap/sys/log` and then `watch
--differences=cumulative ls -altr **/*` to see which files are being
written to while the startup processes are thrashing. Then `tail -f FILE_1
FILE_2 ...` listing all the files that were being written to, and seem
relevant to the thrashing process(es) in monit
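A condensed sketch of the steps above; the two `watch` sessions only make sense on the failing VM, so they are shown as comments, followed by a small local illustration (with a made-up temp directory) of the "which log is being written?" idea:

```shell
#!/bin/sh
# On the failing VM, after `bosh ssh` and switching to root, the two sessions run:
#   watch monit summary                              # session 1: process states
#   cd /var/vcap/sys/log
#   watch --differences=cumulative ls -altr **/*     # session 2: spot busy log files
#   tail -f collector/collector.log metron_agent/metron_agent.stdout.log
#
# Local illustration: list log files oldest-first, so the most recently
# written file (the one a thrashing process is logging to) appears last.
logdir=$(mktemp -d)
mkdir -p "$logdir/collector"
touch -d '2020-01-01' "$logdir/stale.log"          # an old, quiet log
echo "err" > "$logdir/collector/collector.log"     # freshly written log
find "$logdir" -type f -printf '%T@ %p\n' | sort -n | sed 's/^[^ ]* //'
```

The last path printed is the freshly written `collector/collector.log`, which is the file you would then `tail -f`.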


On Wed, Sep 23, 2015 at 12:21 AM, Guangcai Wang <guangcai.wang(a)gmail.com>
wrote:

It frequently logs the message below, which does not seem helpful.


{"timestamp":1442987404.9433253,"message":"collector.started","log_level":"info","source":"collector","data":{},"thread_id":70132569199380,"fiber_id":70132570371720,"process_id":19392,"file":"/var/vcap/packages/collector/lib/collector/config.rb","lineno":45,"method":"setup_logging"}

the only possible error message from the bosh debug log is
"ntp":{"message":"bad ntp server"}

But I don't think it is related to the stats_z1 update failure.

I, [2015-09-23 04:55:59 #2392] [canary_update(stats_z1/0)] INFO --
DirectorJobRunner: Checking if stats_z1/0 has been updated after
63.333333333333336 seconds
D, [2015-09-23 04:55:59 #2392] [canary_update(stats_z1/0)] DEBUG --
DirectorJobRunner: SENT: agent.7d3452bd-679e-4a97-8514-63a373a54ffd
{"method":"get_state","arguments":[],"reply_to":"director.c5b97fc1-b972-47ec-9412-a83ad240823b.473fda64-6ac3-4a53-9ebc-321fc7eabd7a"}
D, [2015-09-23 04:55:59 #2392] [] DEBUG -- DirectorJobRunner: RECEIVED:
director.c5b97fc1-b972-47ec-9412-a83ad240823b.473fda64-6ac3-4a53-9ebc-321fc7eabd7a
{"value":{"properties":{"logging":{"max_log_file_size":""}},"job":{"name":"stats_z1","release":"","template":"fluentd","version":"4c71c87bbf0144428afacd470e2a5e32b91932fc","sha1":"b141c6037d429d732bf3d67f7b79f8d7d80aac5d","blobstore_id":"d8451d63-2e4f-4664-93a8-a77e5419621d","templates":[{"name":"fluentd","version":"4c71c87bbf0144428afacd470e2a5e32b91932fc","sha1":"b141c6037d429d732bf3d67f7b79f8d7d80aac5d","blobstore_id":"d8451d63-2e4f-4664-93a8-a77e5419621d"},{"name":"collector","version":"889b187e2f6adc453c61fd8f706525b60e4b85ed","sha1":"f5ae15a8fa2417bf984513e5c4269f8407a274dc","blobstore_id":"3eeb0166-a75c-49fb-9f28-c29788dbf64d"},{"name":"metron_agent","version":"e6df4c316b71af68dfc4ca476c8d1a4885e82f5b","sha1":"42b6d84ad9368eba0508015d780922a43a86047d","blobstore_id":"e578bfb0-9726-4754-87ae-b54c8940e41a"},{"name":"apaas_collector","version":"8808f0ae627a54706896a784dba47570c92e0c8b","sha1":"b9a63da925b40910445d592c70abcf4d23ffe84d","blobstore_id":"3e6fa71a-07f7-446a-96f4-3caceea02f2f"}]},"packages":{"apaas_collector":{"name":"apaas_collector","version":"f294704d51d4517e4df3d8417a3d7c71699bc04d.1","sha1":"5af77ceb01b7995926dbd4ad7481dcb7c3d94faf","blobstore_id":"fa0e96b9-71a6-4828-416e-dde3427a73a9"},"collector":{"name":"collector","version":"ba47450ce83b8f2249b75c79b38397db249df48b.1","sha1":"0bf8ee0d69b3f21cf1878a43a9616cb7e14f6f25","blobstore_id":"722a5455-f7f7-427d-7e8d-e562552857bc"},"common":{"name":"common","version":"99c756b71550530632e393f5189220f170a69647.1","sha1":"90159de912c9bfc71740324f431ddce1a5fede00","blobstore_id":"37be6f28-c340-4899-7fd3-3517606491bb"},"fluentd-0.12.13":{"name":"fluentd-0.12.13","version":"71d8decbba6c863bff6c325f1f8df621a91eb45f.1","sha1":"2bd32b3d3de59e5dbdd77021417359bb5754b1cf","blobstore_id":"7bc81ac6-7c24-4a94-74d1-bb9930b07751"},"metron_agent":{"name":"metron_agent","version":"997d87534f57cad148d56c5b8362b72e726424e4.1","sha1":"a21404c50562de75000d285a02cd43bf098bfdb9","blobstore_id":"6c7cf72c-9ace-40a1-4632-c27946bf6
31e"},"ruby-2.1.6":{"name":"ruby-2.1.6","version":"41d0100ffa4b21267bceef055bc84dc37527fa35.1","sha1":"8a9867197682cabf2bc784f71c4d904bc479c898","blobstore_id":"536bc527-3225-43f6-7aad-71f36addec80"}},"configuration_hash":"a73c7d06b0257746e95aaa2ca994c11629cbd324","networks":{"private_cf_subnet":{"cloud_properties":{"name":"random","net_id":"1e1c9aca-0b5a-4a8f-836a-54c18c21c9b9","security_groups":["az1_cf_management_secgroup_bosh_cf_ssh_cf2","az1_cf_management_secgroup_cf_private_cf2","az1_cf_management_secgroup_cf_public_cf2"]},"default":["dns","gateway"],"dns":["192.168.110.8","133.162.193.10","133.162.193.9","192.168.110.10"],"dns_record_name":"0.stats-z1.private-cf-subnet.cf-apaas.microbosh","gateway":"192.168.110.11","ip":"192.168.110.204","netmask":"255.255.255.0"}},"resource_pool":{"cloud_properties":{"instance_type":"S-1"},"name":"small_z1","stemcell":{"name":"bosh-openstack-kvm-ubuntu-trusty-go_agent","version":"2989"}},"deployment":"cf-apaas","index":0,"persistent_disk":0,"persistent_disk_pool":null,"rendered_templates_archive":{"sha1":"0ffd89fa41e02888c9f9b09c6af52ea58265a8ec","blobstore_id":"4bd01ae7-a69a-4fe5-932b-d98137585a3b"},"agent_id":"7d3452bd-679e-4a97-8514-63a373a54ffd","bosh_protocol":"1","job_state":"failing","vm":{"name":"vm-12d45510-096d-4b8b-9547-73ea5fda00c2"},"ntp":{"message":"bad
ntp server"}}}


On Wed, Sep 23, 2015 at 5:13 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Please check the file collector/collector.log, it's in a subdirectory
of the unpacked log tarball.

On Wed, Sep 23, 2015 at 12:01 AM, Guangcai Wang <
guangcai.wang(a)gmail.com> wrote:

Actually, I checked the two files on the stats_z1 job VM. I did not find
any clues. Attached for reference.

On Wed, Sep 23, 2015 at 4:54 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

If you do "bosh logs stats_z1 0 --job" you will get a tarball of all
the logs for the relevant processes running on the stats_z1/0 VM. You will
likely find some error messages in the collector's stdout or stderr logs.

On Tue, Sep 22, 2015 at 11:30 PM, Guangcai Wang <
guangcai.wang(a)gmail.com> wrote:

It does not help.

I always see the "collector" process bouncing between "running" and
"Does not exist" when I use "monit summary" in a while loop.

Does anyone know how to get the real error when the "collector" process
has not failed outright? Thanks.

On Wed, Sep 23, 2015 at 4:11 PM, Tony <Tonyl(a)fast.au.fujitsu.com>
wrote:

My approach is to log in to the stats VM and sudo, then
run "monit status" and restart the failed processes, or simply
restart all
processes by running "monit restart all".

Wait for a while (5-10 minutes at most).
If there is still a failed process, e.g. collector,
then run ps -ef | grep collector
and kill the processes in the list (you may need to run kill -9
sometimes),

then "monit restart all".

Normally, this will fix the issue "Failed: `XXX' is not running after
update"



--
View this message in context:
http://cf-dev.70369.x6.nabble.com/cf-dev-Error-400007-stats-z1-0-is-not-running-after-update-tp1901p1902.html
Sent from the CF Dev mailing list archive at Nabble.com.


Re: Environment variables with special characters not handled correctly?

Dieu Cao <dcao@...>
 

Hi Jonas,

You'll need to escape the special characters like $.
See this tracker story for some background:
https://www.pivotaltracker.com/story/show/76655240

-Dieu

On Thu, Sep 24, 2015 at 11:14 AM, Daniel Mikusa <dmikusa(a)pivotal.io> wrote:

It's possible that your shell is interpolating the characters, like the '$'.

Try `cf set-env appname WORDPRESS_BEARER
'1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd'`.
Note the single quotes around the value of the environment variable. Or
set the environment variable in a manifest.yml file.

Also, run `cf env <app-name>` to confirm the value is being set correctly.

Thanks,

Dan


On Thu, Sep 24, 2015 at 1:57 PM, Jonas Rosland <jonas.rosland(a)emc.com>
wrote:

Hi all,

I am having an issue with an environment variable containing special
characters that doesn't seem to be picked up correctly by CF.

I run `cf set-env appname WORDPRESS_BEARER
1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd`
(obviously not my currently correct key) and then use it in this Ruby app:
https://gist.github.com/jonasrosland/08b5758eaa9098a81cf8

When I check the output the app complains about the API key being
incorrect, when it is, in fact, correct. If I set it manually in the
application it works, but that is of course not a good practice. I've also
verified that the environment variable does get picked up by the
application by adding some logging output to show the API key, but it still
won't work. I'm wondering if this is because of the special characters in
the environment variable?

Thanks in advance,
Jonas Rosland
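
The quoting behavior Dan describes can be demonstrated with any value containing `$`; the value below is a made-up, shortened stand-in for the real token:

```shell
#!/bin/sh
# Single quotes pass every character through literally.
secret='1NNKhb5&Nfw$F(wqbqW'
echo "$secret"

# Double quotes let the shell expand $F (unset/empty here), silently
# mangling the value before cf ever sees it.
F=''
mangled="1NNKhb5&Nfw$F(wqbqW"
echo "$mangled"
```

With no quotes at all it is worse still: the unescaped `&` would background the `cf set-env` command mid-value, which is why the single-quoted form is the safe one.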


Re: Environment variables with special characters not handled correctly?

Daniel Mikusa
 

It's possible that your shell is interpolating the characters, like the '$'.

Try `cf set-env appname WORDPRESS_BEARER
'1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd'`. Note
the single quotes around the value of the environment variable. Or set the
environment variable in a manifest.yml file.

Also, run `cf env <app-name>` to confirm the value is being set correctly.

Thanks,

Dan


On Thu, Sep 24, 2015 at 1:57 PM, Jonas Rosland <jonas.rosland(a)emc.com>
wrote:

Hi all,

I am having an issue with an environment variable containing special
characters that doesn't seem to be picked up correctly by CF.

I run `cf set-env appname WORDPRESS_BEARER
1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd`
(obviously not my currently correct key) and then use it in this Ruby app:
https://gist.github.com/jonasrosland/08b5758eaa9098a81cf8

When I check the output the app complains about the API key being
incorrect, when it is, in fact, correct. If I set it manually in the
application it works, but that is of course not a good practice. I've also
verified that the environment variable does get picked up by the
application by adding some logging output to show the API key, but it still
won't work. I'm wondering if this is because of the special characters in
the environment variable?

Thanks in advance,
Jonas Rosland


Environment variables with special characters not handled correctly?

Jonas Rosland
 

Hi all,

I am having an issue with an environment variable containing special characters that doesn't seem to be picked up correctly by CF.

I run `cf set-env appname WORDPRESS_BEARER 1NNKhb5&Nfw$F(wqbqW&9nSeoonwAYz7#j2M1KKY!QU(Wbs(a)8xwjr6Q$hg(IPqcd` (obviously not my currently correct key) and then use it in this Ruby app: https://gist.github.com/jonasrosland/08b5758eaa9098a81cf8

When I check the output the app complains about the API key being incorrect, when it is, in fact, correct. If I set it manually in the application it works, but that is of course not a good practice. I've also verified that the environment variable does get picked up by the application by adding some logging output to show the API key, but it still won't work. I'm wondering if this is because of the special characters in the environment variable?

Thanks in advance,
Jonas Rosland


Jordan Collier email for mailing list

Jordan Collier
 

jordanicollier(a)gmail.com


Running the app test suite within the CATs, and the admin_buildpack_lifecycle_test is failing

Jordan Collier
 

`[2015-09-24 15:06:25.24 (UTC)]> cf logout
Logging out...
OK
• Failure [22.809 seconds]
Admin Buildpacks
/Users/localadmin/github.com/cloudfoundry/src/github.com/cloudfoundry/cf-acceptance-tests/apps/admin_buildpack_lifecycle_test.go:172
when the buildpack fails to detect
/Users/localadmin/github.com/cloudfoundry/src/github.com/cloudfoundry/cf-acceptance-tests/apps/admin_buildpack_lifecycle_test.go:129
fails to stage [It]
/Users/localadmin/github.com/cloudfoundry/src/github.com/cloudfoundry/cf-acceptance-tests/apps/admin_buildpack_lifecycle_test.go:128

Got stuck at:
Creating app CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974 in org CATS-ORG-1-2015_09_24-08h01m29.448s / space CATS-SPACE-1-2015_09_24-08h01m29.448s as CATS-USER-1-2015_09_24-08h01m29.448s...
OK

Creating route cats-app-5c7775a6-1753-4e2d-4415-7f6abe01a974.switchollie.allstate.com...
OK

Binding cats-app-5c7775a6-1753-4e2d-4415-7f6abe01a974.switchollie.allstate.com to CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974...
OK

Uploading CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974...
Uploading app files from: /var/folders/ph/tg82ppzd6kngwm_g2tbzpccc0000gn/T/matching-app824262495
Uploading 132, 1 files
Done uploading
OK

Starting app CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974 in org CATS-ORG-1-2015_09_24-08h01m29.448s / space CATS-SPACE-1-2015_09_24-08h01m29.448s as CATS-USER-1-2015_09_24-08h01m29.448s...
-----> Downloaded app package (4.0K)
Staging failed: An application could not be detected by any available buildpack


FAILED
Server error, status code: 400, error code: 170003, message: An app was not successfully detected by any available buildpack

TIP: use 'cf logs CATS-APP-5c7775a6-1753-4e2d-4415-7f6abe01a974 --recent' for more information

Waiting for:
NoAppDetectedError

/Users/localadmin/github.com/cloudfoundry/src/github.com/cloudfoundry/cf-acceptance-tests/apps/admin_buildpack_lifecycle_test.go:127`

It looks as if it is failing for the correct reason; is there something I am missing?


Re: Security group rules to allow HTTP communication between 2 apps deployed on CF

CF Runtime
 

Containers have a default iptables rule that REJECTs all traffic. If there is
no security group configured to allow the traffic to the destination,
you'll get a connection refused.

Security groups can only be created and configured by admin users.

Your only option is probably to have one app connect to the other using the
public route bound to that app.

Joseph
CF Release Integration Team

On Wed, Sep 23, 2015 at 3:54 AM, Denilson Nastacio <dnastacio(a)gmail.com>
wrote:

The message indicates this problem is unrelated to security groups. You
would get something like "host not found" instead of "connection refused".

Which version of CF are you using?
Can you curl a url from app2 at all?

On Wed, Sep 23, 2015, 3:27 AM Naveen Asapu <asapu.naveen(a)gmail.com> wrote:

Hi Matthew Sykes,

Actually, I'm trying to monitor app usage in Bluemix. For that I'm
using cf-abacus, and this command is also in the example steps.

Can you suggest how to monitor app usage using curl and Cloud Foundry?

--
Thanks
Naveen Asapu
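
For reference, when an admin is available, app-to-app traffic can be opened with a security group. A hypothetical sketch (the group name, CIDR, and port are illustrative, not from this thread):

```shell
#!/bin/sh
# Write a rules file allowing TCP to the backend app's subnet on port 8080.
cat > app-to-app.json <<'EOF'
[{"protocol":"tcp","destination":"10.10.2.0/24","ports":"8080"}]
EOF

# Admin-only commands (commented out; they require a CF admin login):
#   cf create-security-group app-to-app app-to-app.json
#   cf bind-security-group app-to-app ORG SPACE
#   cf restart app1        # rules take effect when containers restart

cat app-to-app.json
```

Without admin access, as Joseph notes, connecting via the public route bound to the other app remains the only option.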


Re: DEA/Warden staging error

kyle havlovitz <kylehav@...>
 

OK, after more investigating, the problem was that NetworkManager was
running on the machine and trying to take control of new network
interfaces after they came up, which caused problems with the
interface that Warden created for the container. With NetworkManager
disabled I can push the app and everything is fine.

Thanks for your help everyone.

On Wed, Sep 23, 2015 at 10:45 AM, kyle havlovitz <kylehav(a)gmail.com> wrote:

Here's the output from those commands:
https://gist.github.com/MrEnzyme/36592831b1c46d44f007
Soon after running those I noticed that the container loses its IPv4
address shortly after coming up and ifconfig looks like this:

root(a)cf-build:/home/cloud-user/test# ifconfig -a
docker0 Link encap:Ethernet HWaddr 56:84:7a:fe:97:99
inet addr:172.17.42.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth0 Link encap:Ethernet HWaddr fa:16:3e:cd:f3:0a
inet addr:172.25.1.52 Bcast:172.25.1.127 Mask:255.255.255.128
inet6 addr: fe80::f816:3eff:fecd:f30a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:515749 errors:0 dropped:0 overruns:0 frame:0
TX packets:295471 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1162366659 (1.1 GB) TX bytes:59056756 (59.0 MB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:45057315 errors:0 dropped:0 overruns:0 frame:0
TX packets:45057315 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:18042315375 (18.0 GB) TX bytes:18042315375 (18.0 GB)
w-190db6c54la-0 Link encap:Ethernet HWaddr 12:dc:ba:da:38:5b
inet6 addr: fe80::10dc:baff:feda:385b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1454 Metric:1
RX packets:12 errors:0 dropped:0 overruns:0 frame:0
TX packets:227 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:872 (872.0 B) TX bytes:35618 (35.6 KB)

Any idea what would be causing that?


On Tue, Sep 22, 2015 at 10:31 PM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:

Based on your description, it doesn't sound like warden networking or the
warden iptables chains are your problem. Are you able to share all of your
routes and chains via a gist?

route -n
ifconfig -a
iptables -L -n -v -t filter
iptables -L -n -v -t nat
iptables -L -n -v -t mangle

Any kernel messages that look relevant in the message buffer (dmesg)?

Have you tried doing a network capture to verify the packets look the
way you expect? Are you sure your host routing rules are good? Do the
warden subnets overlap with any network accessible to the host?

Based on previous notes, it doesn't sound like this is a standard
deployment so it's hard to say what could be impacting you.

On Tue, Sep 22, 2015 at 1:08 PM, Kyle Havlovitz (kyhavlov) <
kyhavlov(a)cisco.com> wrote:

I didn’t; I’m still having this problem. Even adding this lenient
security group didn’t let me get any traffic out of the VM:

[{"name":"allow_all","rules":[{"protocol":"all","destination":"0.0.0.0/0"},{"protocol":"tcp","destination":"0.0.0.0/0","ports":"1-65535"},{"protocol":"udp","destination":"0.0.0.0/0","ports":"1-65535"}]}]

The only way I was able to get traffic out was by manually removing the
reject/drop iptables rules that warden set up, and even with that the
container still lost all connectivity after 30 seconds.

From: CF Runtime <cfruntime(a)gmail.com>
Reply-To: "Discussions about Cloud Foundry projects and the system
overall." <cf-dev(a)lists.cloudfoundry.org>
Date: Tuesday, September 22, 2015 at 12:50 PM
To: "Discussions about Cloud Foundry projects and the system overall." <
cf-dev(a)lists.cloudfoundry.org>
Subject: [cf-dev] Re: Re: Re: Re: Re: Re: Re: Re: DEA/Warden staging
error

Hey Kyle,

Did you make any progress?

Zak & Mikhail
CF Release Integration Team

On Thu, Sep 17, 2015 at 10:28 AM, CF Runtime <cfruntime(a)gmail.com>
wrote:

It certainly could be. By default the containers reject all egress
traffic. CC security groups configure iptables rules that allow traffic
out.

One of the default security groups in the BOSH templates allows access
on port 53. If you have no security groups, the containers will not be able
to make any outgoing requests.

Joseph & Natalie
CF Release Integration Team

On Thu, Sep 17, 2015 at 8:44 AM, Kyle Havlovitz (kyhavlov) <
kyhavlov(a)cisco.com> wrote:

On running git clone inside the container via the warden shell, I get:
"Cloning into 'staticfile-buildpack'...
fatal: unable to access '
https://github.com/cloudfoundry/staticfile-buildpack/': Could not
resolve host: github.com".
So the container can't get to anything outside of it (I also tried
pinging some external IPs to make sure it wasn't a DNS thing). Would this
be caused by cloud controller security group settings?

--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: Removing support for v1 service brokers

Mike Youngstrom
 

My vote is to wait a couple more months. I guess we'll see if anyone else
would like more time.

Mike

On Sep 23, 2015 11:52 PM, "Dieu Cao" <dcao(a)pivotal.io> wrote:

Thanks Mike. Totally understandable.


On Wed, Sep 23, 2015 at 9:23 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Thanks Dieu, honestly I was just trying to find an angle to bargain for a
bit more time. :) Three months is generous. But six months would be
glorious. :)

After the CAB call this month we got started converting our brokers over
but our migration is more difficult because we use Service instance
credentials quite a bit and those don't appear to be handled well when
doing "migrate-service-instances". I think we can do 3 months but we'll be
putting our users through a bit of a fire drill.

That said, I'll understand if you stick to 3 months, since we should have
started this conversion long ago.

Mike

On Wed, Sep 23, 2015 at 1:22 AM, Dieu Cao <dcao(a)pivotal.io> wrote:

We've found NATS to be unstable under certain conditions, such as temporary
network interruptions or other network instability, particularly around the
client reconnection logic.
We've seen that it could take anywhere from a few seconds to half an
hour to reconnect properly. We spent a fair amount of time investigating
ways to improve the reconnection logic and have made some improvements but
believe that it's best to work towards not having this dependency.
You can find more about this in the stories in this epic [1].

Mike, in addition to removing the NATS dependency, this will remove the
burden on the team, almost a weekly fight, in terms of maintaining
backwards compatibility for the v1 broker spec any time we work on adding
functionality to the service broker api.
I'll work with the team in the next couple of weeks on specific stories
and I'll link to it here.

[1] https://www.pivotaltracker.com/epic/show/1440790


On Tue, Sep 22, 2015 at 10:07 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Thanks for the announcement.

To be clear is this announcement to cease support for the old v1
brokers or is this to eliminate support for the v1 api in the CC? Does the
v1 CC code depend on NATS? None of my custom v1 brokers depend on NATS.

Mike

On Tue, Sep 22, 2015 at 6:01 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

Hello all,

We plan to remove support for v1 service brokers in about 3 months, in
a cf-release following 12/31/2015.
We are working towards removing CF's dependency on NATS and the v1
service brokers are still dependent on NATS.
Please let me know if you have questions/concerns about this timeline.

I'll be working on verifying a set of steps that you can find here [1]
that document how to migrate your service broker from v1 to v2 and what is
required in order to persist user data and will get that posted to the
service broker api docs officially.

-Dieu
CF CAPI PM

[1]
https://docs.google.com/document/d/1Pl1o7mxtn3Iayq2STcMArT1cJsKkvi4Ey1-d3TB_Nhs/edit?usp=sharing




Re: How to deploy a Web application using HTTPs

Juan Antonio Breña Moral <bren at juanantonio.info...>
 

Hi Dieu,

many thanks for the technical info.

I will consider this factor to add this restriction in the development.

Juan Antonio


Re: Removing support for v1 service brokers

Dieu Cao <dcao@...>
 

Thanks Mike. Totally understandable.

On Wed, Sep 23, 2015 at 9:23 AM, Mike Youngstrom <youngm(a)gmail.com> wrote:

Thanks Dieu, honestly I was just trying to find an angle to bargain for a
bit more time. :) Three months is generous. But six months would be
glorious. :)

After the CAB call this month we got started converting our brokers over
but our migration is more difficult because we use Service instance
credentials quite a bit and those don't appear to be handled well when
doing "migrate-service-instances". I think we can do 3 months but we'll be
putting our users through a bit of a fire drill.

That said, I'll understand if you stick to 3 months, since we should have
started this conversion long ago.

Mike

On Wed, Sep 23, 2015 at 1:22 AM, Dieu Cao <dcao(a)pivotal.io> wrote:

We've found NATS to be unstable under certain conditions, such as temporary
network interruptions or other network instability, particularly around the
client reconnection logic.
We've seen that it could take anywhere from a few seconds to half an hour
to reconnect properly. We spent a fair amount of time investigating ways to
improve the reconnection logic and have made some improvements but believe
that it's best to work towards not having this dependency.
You can find more about this in the stories in this epic [1].

Mike, in addition to removing the NATS dependency, this will remove the
burden on the team, almost a weekly fight, in terms of maintaining
backwards compatibility for the v1 broker spec any time we work on adding
functionality to the service broker api.
I'll work with the team in the next couple of weeks on specific stories
and I'll link to it here.

[1] https://www.pivotaltracker.com/epic/show/1440790


On Tue, Sep 22, 2015 at 10:07 PM, Mike Youngstrom <youngm(a)gmail.com>
wrote:

Thanks for the announcement.

To be clear is this announcement to cease support for the old v1 brokers
or is this to eliminate support for the v1 api in the CC? Does the v1 CC
code depend on NATS? None of my custom v1 brokers depend on NATS.

Mike

On Tue, Sep 22, 2015 at 6:01 PM, Dieu Cao <dcao(a)pivotal.io> wrote:

Hello all,

We plan to remove support for v1 service brokers in about 3 months, in
a cf-release following 12/31/2015.
We are working towards removing CF's dependency on NATS and the v1
service brokers are still dependent on NATS.
Please let me know if you have questions/concerns about this timeline.

I'll be working on verifying a set of steps that you can find here [1]
that document how to migrate your service broker from v1 to v2 and what is
required in order to persist user data and will get that posted to the
service broker api docs officially.

-Dieu
CF CAPI PM

[1]
https://docs.google.com/document/d/1Pl1o7mxtn3Iayq2STcMArT1cJsKkvi4Ey1-d3TB_Nhs/edit?usp=sharing




Re: How to deploy a Web application using HTTPs

Dieu Cao <dcao@...>
 

Your edge load balancer should be configured to add x-forwarded-for and
x-forwarded-proto headers.

On Wed, Sep 23, 2015 at 4:24 AM, Juan Antonio Breña Moral <
bren(a)juanantonio.info> wrote:

@James,

who adds the headers?

"x-forwarded-for":"CLIENT_REAL_IP, CLOUD_FOUNDRY_IP",
"x-forwarded-proto":"https"

the load balancer or the GoRouter?
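
To make Dieu's answer concrete, a request that has passed through the edge load balancer and GoRouter would reach the app carrying headers along these lines (values illustrative):

```
GET / HTTP/1.1
Host: app.example.com
X-Forwarded-For: 203.0.113.7, 10.0.2.15
X-Forwarded-Proto: https
```

The app can then inspect X-Forwarded-Proto to decide whether the original client connection used HTTPS, since TLS is terminated before the request reaches the container.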


Re: Introducing CF-Swagger

Dieu Cao <dcao@...>
 

Separate from this proposal, the CAPI team has stories for spiking on a few
different api documentation options for the cloud controller api [1].
Swagger is one of the options we are looking into, but it is not the only
one.

[1] https://www.pivotaltracker.com/epic/show/2093796



On Wed, Sep 23, 2015 at 11:00 AM, Deepak Vij (A) <deepak.vij(a)huawei.com>
wrote:

Hi Mohamed and Dr. Max, I fully support this effort. Having Swagger-based
“Application Interface” capability as part of the overall CF PaaS
platform would be very useful for the CF community as a whole. As a
matter of fact, I also initiated a similar thread a few months ago on the cf-dev
alias (see email text below). Your work exactly matches up with what our
current thinking is.



Having a “Swagger”-based “Application Interface” is a very good start
along those lines. This opens up lots of other possibilities such as
building out “Deployment Governance” capabilities not merely for Cloud
Foundry API or Services assets but for the whole Application landscape
built & deployed within CF PaaS environment and subsequently exposed as
APIs to end consumers.



As described in my email below, sent out earlier, “Deployment
Governance” as part of overall API Management is what we are striving
towards in order to expose comprehensive telecom API Management
capabilities within the public cloud environment.



Dr. Max, as I mentioned to you during our brief discussion a few days ago,
the “Heroku” folks also have a similar initiative ongoing. They have gone
lightweight “JSON” schema route versus Swagger/WADL/RAML etc.



In any case, I am fully in support of your proposal. Thanks.



Regards,

Deepak Vij



=============================

Hi folks, I would like to start a thread on the need for machine-readable “*Application
Interface*” supported at the platform level. Essentially, this interface
describes details such as available methods/operations, inputs/outputs data
types (schema), application dependencies etc. Any standard specifications
language can be used for this purpose, as long as it clearly describes the
schema of the requests and responses – one can use Web Application
Description Language (WADL), Swagger, RESTful API Modeling Language (RAML),
JSON Schema (something like *JSON Schema for Heroku Platform APIs*) or
any other language that provides similar functionality. These
specifications are to be automatically derived from the code and are
typically part of the application development process (e.g. generated by
the build system).



Such functionality can have lots of usage scenarios:

1. First and foremost, Deployment Governance for API Management (our
main vested interest) – API Versioning & Backward Compatibility,
Dependency Management and many more as part of the comprehensive telecom
API Management capabilities which we are currently in the process of
building out.

2. Auto-creating client libraries for your favorite programming
language.

3. Automatic generation of up-to-date documentation.

4. Writing automatic acceptance and integration tests etc.



From historical perspective, in the early 2000s when SOA started out, the
mindset was to author the application contract-first (application interface
using WSDL at that time) and subsequently generate and author code from the
application interface. With the advent of RESTful services, REST community
initially took a stand against such metadata for applications. Although, a
number of metadata standards have none-the-less emerged over the last
couple of years, mainly fueled by the use case scenarios described earlier.



Based on my knowledge, none of this currently exists within Cloud Foundry
at the platform level. It would be highly desirable to have a standard
common “*application interface*” definition at the platform level,
agnostic of the underlying application development frameworks.



I hope this all makes sense. I think this is something that could be very
relevant to the “Utilities” PMC. I will also copy&paste this text under
“Utilities” PMC-notes on the github.



I would love to hear from the community on this. Thanks.



Regards,

Deepak Vij



*From:* Michael Maximilien [mailto:maxim(a)us.ibm.com]
*Sent:* Friday, September 18, 2015 4:52 PM
*To:* cf-dev(a)lists.cloudfoundry.org
*Cc:* Heiko Ludwig; Mohamed Mohamed; Alex Tarpinian; Christopher B Ferris
*Subject:* [cf-dev] Introducing CF-Swagger



Hi, all,



This email serves two purposes: 1) introduce CF-Swagger, and 2) share the
results of the CF service broker compliance survey I sent out a couple of
weeks ago.



------

My IBM Research colleague, Mohamed (on cc:), and I have been working on
creating Swagger descriptions for some CF APIs.



Our main goal was to explore what useful tools or utilities we could build
with these Swagger descriptions once created.



The initial result of this exploratory research is CF-Swagger, which is
presented in the following:



See presentation here: https://goo.gl/Y16plT

Video demo here: http://goo.gl/C8Nz5p

Temp repo here: https://github.com/maximilien/cf-swagger



The gist of our work and results is:



1. We created a full Swagger description of the CF service broker

2. Using this description, you can use the Swagger editor to create neat
API docs that are browsable and even callable

3. Using the description you can create client and server stubs for
service brokers in a variety of languages, e.g., JS, Java, Ruby, etc.

4. We've extended go-swagger to generate workable client and server stubs
for service brokers in Golang. We plan to submit all changes to go-swagger
back to that project

5. We've extended go-swagger to generate prototypes of working Ginkgo
tests for service brokers

6. We've extended go-swagger to generate a CF service broker Ginkgo Test
Compliance Kit (TCK) that anyone could use to validate their broker's
compliance with any Swagger-described version of the spec

7. We've created a custom Ginkgo reporter that, when run with the TCK, will
give you a summary of your compliance, e.g., 100% compliant with v2.5 but
90% compliant with v2.6 due to failing tests X, Y, Z... (in Ginkgo fashion)

8. The survey results (all included in the presentation) indicate that
over 50% of respondents believe TCK tests for service brokers would be
valuable to them. Many (over 50%) are using custom proprietary tests, and
this project may be a way to get everyone to converge on a common set of
tests we could all use and improve...



------

We plan to propose this work to become a CF incubator at the next CAB and
PMC calls, especially the TCK part for service brokers. The overall
approach and project could be useful for other parts of the CF APIs but we
will start with CF Service Brokers.



The actual Swagger descriptions should ideally come from the teams who own
the APIs; for service brokers, that is the CAPI team. We are engaging them,
as they have also been looking at improving API docs and descriptions.
Maybe there is potential for synergy, and at a minimum we want to make sure
what we generate ends up being useful to their pipelines.



Finally, while the repo is temporary and will change, I welcome you to
take a look at the presentation, video, and code and let us know your
thoughts and feedback.



Thanks for your time and interest.



Mohamed and Max

IBM


Re: Error 400007: `stats_z1/0' is not running after update

iamflying
 

That did help. It showed us the real error.

==> metron_agent/metron_agent.stdout.log <==
{"timestamp":1443054247.927488327,"process_id":23472,"source":"metron","log_level":"warn","message":"Failed
to create client: Could not connect to NATS: dial tcp 192.168.110.202:4222:
i/o
timeout","data":null,"file":"/var/vcap/data/compile/metron_agent/loggregator/src/
github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/collectorregistrar/collector_registrar.go
","line":51,"method":"
github.com/cloudfoundry/loggregatorlib/cfcomponent/registrars/collectorregistrar.(*CollectorRegistrar).Run
"}

I checked the security rule. It seems to have some problems.
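The failed NATS dial in the log above can be checked directly from the VM. Below is a minimal reachability probe, as a sketch: the IP and port are taken from the error message, `probe_nats` is a hypothetical helper name, and it assumes a stemcell bash built with `/dev/tcp` support.

```shell
# Probe TCP reachability of the NATS endpoint from the failing VM.
# Uses bash's /dev/tcp redirection, so no extra tooling is needed.
probe_nats() {
  local host=$1 port=$2
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable - check security group / firewall rules"
  fi
}
probe_nats 192.168.110.202 4222
```

If the probe reports unreachable while NATS itself is up, the security group rules (as suspected above) are the likely culprit.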

On Thu, Sep 24, 2015 at 2:47 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

I often take the following approach to debugging issues like this:

* Open two shell sessions to your failing VM using bosh ssh, and switch to
superuser
* In one session, `watch monit summary`. You might see collector going
back and forth between initializing and not monitored, but please report
anything else of interest you see here
* In the other session, `cd /var/vcap/sys/log` and then `watch
--differences=cumulative ls -altr **/*` to see which files are being
written to while the startup processes are thrashing. Then `tail -f FILE_1
FILE_2 ...` listing all the files that were being written to, and seem
relevant to the thrashing process(es) in monit
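The log-triage step above can be scripted. This is a small sketch, assuming the standard /var/vcap/sys/log layout; `recently_written` is a hypothetical helper that lists files written in the last few minutes, so you know exactly what to tail:

```shell
# List files under a log directory modified in the last N minutes (default 1),
# so you can tail only the logs the thrashing process is actually writing.
recently_written() {
  local dir=$1 mins=${2:-1}
  find "$dir" -type f -mmin "-$mins" | sort
}
# On the failing VM (as root), alongside `watch monit summary`:
#   recently_written /var/vcap/sys/log 1 | xargs tail -f
```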


On Wed, Sep 23, 2015 at 12:21 AM, Guangcai Wang <guangcai.wang(a)gmail.com>
wrote:

It frequently logs the message below, which doesn't seem helpful.


{"timestamp":1442987404.9433253,"message":"collector.started","log_level":"info","source":"collector","data":{},"thread_id":70132569199380,"fiber_id":70132570371720,"process_id":19392,"file":"/var/vcap/packages/collector/lib/collector/config.rb","lineno":45,"method":"setup_logging"}

the only possible error message from the bosh debug log is
"ntp":{"message":"bad ntp server"}

But I don't think it is related to the failure of the stats_z1 update.

I, [2015-09-23 04:55:59 #2392] [canary_update(stats_z1/0)] INFO --
DirectorJobRunner: Checking if stats_z1/0 has been updated after
63.333333333333336 seconds
D, [2015-09-23 04:55:59 #2392] [canary_update(stats_z1/0)] DEBUG --
DirectorJobRunner: SENT: agent.7d3452bd-679e-4a97-8514-63a373a54ffd
{"method":"get_state","arguments":[],"reply_to":"director.c5b97fc1-b972-47ec-9412-a83ad240823b.473fda64-6ac3-4a53-9ebc-321fc7eabd7a"}
D, [2015-09-23 04:55:59 #2392] [] DEBUG -- DirectorJobRunner: RECEIVED:
director.c5b97fc1-b972-47ec-9412-a83ad240823b.473fda64-6ac3-4a53-9ebc-321fc7eabd7a
{"value":{"properties":{"logging":{"max_log_file_size":""}},"job":{"name":"stats_z1","release":"","template":"fluentd","version":"4c71c87bbf0144428afacd470e2a5e32b91932fc","sha1":"b141c6037d429d732bf3d67f7b79f8d7d80aac5d","blobstore_id":"d8451d63-2e4f-4664-93a8-a77e5419621d","templates":[{"name":"fluentd","version":"4c71c87bbf0144428afacd470e2a5e32b91932fc","sha1":"b141c6037d429d732bf3d67f7b79f8d7d80aac5d","blobstore_id":"d8451d63-2e4f-4664-93a8-a77e5419621d"},{"name":"collector","version":"889b187e2f6adc453c61fd8f706525b60e4b85ed","sha1":"f5ae15a8fa2417bf984513e5c4269f8407a274dc","blobstore_id":"3eeb0166-a75c-49fb-9f28-c29788dbf64d"},{"name":"metron_agent","version":"e6df4c316b71af68dfc4ca476c8d1a4885e82f5b","sha1":"42b6d84ad9368eba0508015d780922a43a86047d","blobstore_id":"e578bfb0-9726-4754-87ae-b54c8940e41a"},{"name":"apaas_collector","version":"8808f0ae627a54706896a784dba47570c92e0c8b","sha1":"b9a63da925b40910445d592c70abcf4d23ffe84d","blobstore_id":"3e6fa71a-07f7-446a-96f4-3caceea02f2f"}]},"packages":{"apaas_collector":{"name":"apaas_collector","version":"f294704d51d4517e4df3d8417a3d7c71699bc04d.1","sha1":"5af77ceb01b7995926dbd4ad7481dcb7c3d94faf","blobstore_id":"fa0e96b9-71a6-4828-416e-dde3427a73a9"},"collector":{"name":"collector","version":"ba47450ce83b8f2249b75c79b38397db249df48b.1","sha1":"0bf8ee0d69b3f21cf1878a43a9616cb7e14f6f25","blobstore_id":"722a5455-f7f7-427d-7e8d-e562552857bc"},"common":{"name":"common","version":"99c756b71550530632e393f5189220f170a69647.1","sha1":"90159de912c9bfc71740324f431ddce1a5fede00","blobstore_id":"37be6f28-c340-4899-7fd3-3517606491bb"},"fluentd-0.12.13":{"name":"fluentd-0.12.13","version":"71d8decbba6c863bff6c325f1f8df621a91eb45f.1","sha1":"2bd32b3d3de59e5dbdd77021417359bb5754b1cf","blobstore_id":"7bc81ac6-7c24-4a94-74d1-bb9930b07751"},"metron_agent":{"name":"metron_agent","version":"997d87534f57cad148d56c5b8362b72e726424e4.1","sha1":"a21404c50562de75000d285a02cd43bf098bfdb9","blobstore_id":"6c7cf72c-9ace-40a1-4632-c27946bf6
31e"},"ruby-2.1.6":{"name":"ruby-2.1.6","version":"41d0100ffa4b21267bceef055bc84dc37527fa35.1","sha1":"8a9867197682cabf2bc784f71c4d904bc479c898","blobstore_id":"536bc527-3225-43f6-7aad-71f36addec80"}},"configuration_hash":"a73c7d06b0257746e95aaa2ca994c11629cbd324","networks":{"private_cf_subnet":{"cloud_properties":{"name":"random","net_id":"1e1c9aca-0b5a-4a8f-836a-54c18c21c9b9","security_groups":["az1_cf_management_secgroup_bosh_cf_ssh_cf2","az1_cf_management_secgroup_cf_private_cf2","az1_cf_management_secgroup_cf_public_cf2"]},"default":["dns","gateway"],"dns":["192.168.110.8","133.162.193.10","133.162.193.9","192.168.110.10"],"dns_record_name":"0.stats-z1.private-cf-subnet.cf-apaas.microbosh","gateway":"192.168.110.11","ip":"192.168.110.204","netmask":"255.255.255.0"}},"resource_pool":{"cloud_properties":{"instance_type":"S-1"},"name":"small_z1","stemcell":{"name":"bosh-openstack-kvm-ubuntu-trusty-go_agent","version":"2989"}},"deployment":"cf-apaas","index":0,"persistent_disk":0,"persistent_disk_pool":null,"rendered_templates_archive":{"sha1":"0ffd89fa41e02888c9f9b09c6af52ea58265a8ec","blobstore_id":"4bd01ae7-a69a-4fe5-932b-d98137585a3b"},"agent_id":"7d3452bd-679e-4a97-8514-63a373a54ffd","bosh_protocol":"1","job_state":"failing","vm":{"name":"vm-12d45510-096d-4b8b-9547-73ea5fda00c2"},"ntp":{"message":"bad
ntp server"}}}


On Wed, Sep 23, 2015 at 5:13 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Please check the file collector/collector.log; it's in a subdirectory of
the unpacked log tarball.

On Wed, Sep 23, 2015 at 12:01 AM, Guangcai Wang <guangcai.wang(a)gmail.com>
wrote:
Actually, I checked the two files in status_z1 job VM. I did not find
any clues. Attached for reference.

On Wed, Sep 23, 2015 at 4:54 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

If you do "bosh logs stats_z1 0 --job" you will get a tarball of all
the logs for the relevant processes running on the stats_z1/0 VM. You will
likely find some error messages in the collectors stdout or stderr logs.

On Tue, Sep 22, 2015 at 11:30 PM, Guangcai Wang <
guangcai.wang(a)gmail.com> wrote:

It does not help.

I always see the "collector" process bouncing between "running" and
"does not exist" when I use "monit summary" in a while loop.

Does anyone know how to get the real error when the "collector" process
has not failed outright? Thanks.

On Wed, Sep 23, 2015 at 4:11 PM, Tony <Tonyl(a)fast.au.fujitsu.com>
wrote:

My approach is to log in to the stats VM and sudo, then
run "monit status" and restart the failed processes, or simply restart all
processes by running "monit restart all".

Wait for a while (5-10 minutes at most).
If there is still a failed process, e.g. collector,
then run `ps -ef | grep collector`
and kill the processes in the list (you may need to run `kill -9`
sometimes).

then "monit restart all"

Normally, it will fix the issue "Failed: `XXX' is not running after
update"
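The sequence above can be sketched as pasteable shell (run as root on the stats VM). `pids_matching` is a hypothetical helper name, and the `[c]ollector` pattern is the usual trick to keep the matcher from catching its own process in the `ps` listing:

```shell
# Filter a `ps -ef` listing down to the PIDs whose line matches a pattern.
pids_matching() {
  awk -v pat="$1" '$0 ~ pat {print $2}'
}
# The restart sequence described above:
#   monit restart all
#   sleep 300                                  # wait 5-10 minutes
#   ps -ef | pids_matching '[c]ollector' | xargs -r kill -9
#   monit restart all
```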



--
View this message in context:
http://cf-dev.70369.x6.nabble.com/cf-dev-Error-400007-stats-z1-0-is-not-running-after-update-tp1901p1902.html
Sent from the CF Dev mailing list archive at Nabble.com.


Re: Loggregator/Doppler Syslog Drain Missing Logs

Michael Schwartz
 

That does make sense. Especially since I'm seeing a consistent percentage. At one point, one app was getting 100% throughput and another was only getting 75%. So maybe one of the dopplers didn't have the drain binding.


Re: Loggregator/Doppler Syslog Drain Missing Logs

Michael Schwartz
 

We are running with 2 zones, 2 loggregators in each zone. I thought the same thing. Stopping all but one loggregator showed the same results.

If it helps, yesterday I was seeing 75% of the logs make it through with 4 loggregators and about 90% when I bumped the node count to 8. So it isn't always a 50/50.

Also, after shutting down or restarting a node, I see almost 100% of the logs come through at first. Then it slowly degrades back to 50% after a few minutes.


Re: Loggregator/Doppler Syslog Drain Missing Logs

Matthew Sykes <matthew.sykes@...>
 

v210 has quite a few bugs in this area. One fairly major one is a
connection leak [1] in the syslog_drain_binder component. When this
happens, changes to the syslog drain bindings do not make their way into
the doppler servers.

I'd strongly recommend you try to move to a newer release.

[1]:
https://github.com/cloudfoundry/loggregator/commit/b8d14b7fdc65b9d0d4a11cffa6b6f855e4d640ae

On Wed, Sep 23, 2015 at 2:48 PM, Michael Schwartz <mschwartz1411(a)gmail.com>
wrote:

The system is currently running ~200 apps and they all bind to an external
syslog drain.


--
Matthew Sykes
matthew.sykes(a)gmail.com


Re: Loggregator/Doppler Syslog Drain Missing Logs

Erik Jasiak
 

Hi Michael

First question that springs to mind when I see ~50% - how many zones are
you running as part of your setup? ("every other log" sounds like a
round-robin to something dead or misconfigured.)
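The round-robin hypothesis is easy to sanity-check numerically: with a pool of N drain endpoints and one dead member, a strict round-robin loses exactly 1/N of the messages. A toy sketch (the pool sizes and message counts are made up for illustration):

```shell
# Simulate strict round-robin delivery: `total` messages spread across
# `pool` members, with member `dead` dropping everything sent to it.
simulate_rr() {
  local total=$1 pool=$2 dead=$3 delivered=0 i
  for ((i = 0; i < total; i++)); do
    if (( i % pool != dead )); then
      delivered=$((delivered + 1))
    fi
  done
  echo "$delivered"
}
simulate_rr 100 2 0   # 1 of 2 dead -> prints 50
simulate_rr 100 4 0   # 1 of 4 dead -> prints 75
```

This matches the observed numbers: ~50% loss with 2 effective endpoints, ~75% throughput with 4.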

Have to run but will follow up more soon,
Erik

Michael Schwartz wrote:


The system is currently running ~200 apps and they all bind to an
external syslog drain.


Re: Loggregator/Doppler Syslog Drain Missing Logs

Michael Schwartz
 

The system is currently running ~200 apps and they all bind to an external syslog drain.
