Monit issue
Stephen Knight <sknight@...>
Hi guys,
I have a BOSH release which includes the Dante proxy server. The release builds, and the code compiles, installs and runs on the hosts. However, I have an issue with Monit. When I initially run a deploy, the machines build and the correct ports end up listening (Dante starts on port 1081), but BOSH first reports this error during the deploy:

"""
Failed updating job socks > socks/0: `socks/0' is not running after update (00:01:36)
Failed updating job socks (00:01:36)
Error 400007: `socks/1' is not running after update
"""

When I log on to the stemcell and run "monit summary":

"""
-bash-4.2# monit summary
The Monit daemon 5.2.4 uptime: 1m

Process 'socksd'                    Execution failed
Process 'stunnel'                   not monitored
System 'system_61693bef-3e13-4be5-bbde-90154639f452'  running
"""

However, even with these errors, if I run lsof I see the application running:

"""
-bash-4.2# lsof -i:1081
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
sockd   11124 root    7u  IPv4  24814      0t0  TCP 61693bef-3e13-4be5-bbde-90154639f452:pvuniwien (LISTEN)
"""

I can't figure out what the issue might be. One suspicious thing is appearing in the logs; this message keeps recurring in /var/vcap/sys/log/socksd/*.err.log:

"""
Oct 22 10:47:23 (1445510843.376364) sockd[11314]: error: serverinit(): failed to bind internal addresses: Address already in use
Oct 22 10:47:23 (1445510843.376375) sockd[11314]: alert: mother[1/1]: shutting down
Oct 22 10:48:33 (1445510913.391384) sockd[11364]: warning: checkconfig(): setting the unprivileged uid to 0 is not recommended for security reasons
Oct 22 10:48:33 (1445510913.391490) sockd[11364]: warning: bindinternal(): bind of address 192.168.100.111.1081 (address #1/1) for server to listen on failed: Address already in use
Oct 22 10:48:33 (1445510913.391501) sockd[11364]: error: serverinit(): failed to bind internal addresses: Address already in use
Oct 22 10:48:33 (1445510913.391516) sockd[11364]: alert: mother[1/1]: shutting down
"""

The strange thing is that nothing else is configured to run on this port; I checked ctl_setup.sh and the socks startup script. I suspect it might be flapping somehow, but Monit is not giving any debug info that would help me resolve the issue. Hoping for some advice - tried on both the latest Ubuntu and CentOS stemcells.
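One way to check whether two copies of sockd are fighting over the port is to compare the process monit is watching against the one actually bound to it. A minimal sketch; the pid file path here is an assumption, the real one is whatever the job's monit file names:

"""
# every sockd process, and the one actually holding the port
ps -ef | grep '[s]ockd'
lsof -i :1081

# the pid monit is tracking (path assumed; see the job's monit file)
cat /var/vcap/sys/run/socksd/socksd.pid
"""

If the pid in that file does not match the listener, monit keeps launching fresh copies, and each new copy dies with "Address already in use" exactly as in the log above.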
- *bosh gem versions*

[root(a)bosh-cli-01 socksd-boshrelease]# bosh -v
BOSH 1.3104.0

- *bosh director info*

[root(a)bosh-cli-01 socksd-boshrelease]# bosh status
Config     /root/.bosh_config

Director
  Name       my-bosh
  URL        https://192.168.100.205:25555
  Version    1.3104.0 (00000000)
  User       admin
  UUID       eef3d294-5790-41eb-81e2-296cf3883c07
  CPI        cpi
  dns        disabled
  compiled_package_cache disabled
  snapshots  disabled

Deployment
  Manifest   /root/bosh-workspace/deployments/socksd-boshrelease/socksd.yml

- *stemcell version(s)*

bosh-vsphere-esxi-ubuntu-trusty-go_agent / 3104
bosh-vsphere-esxi-centos-7-go_agent / 3104

- *deployment manifest*

<%
director_uuid = 'eef3d294-5790-41eb-81e2-296cf3883c07'
deployment_name = 'socksd'
%>
---
name: <%= deployment_name %>
director_uuid: <%= director_uuid %>

releases:
- name: socksd
  version: latest

compilation:
  workers: 2
  network: default
  reuse_compilation_vms: false
  cloud_properties:
    preemptible: true
    cpu: 2
    ram: 2_048
    disk: 10_240

update:
  canaries: 0
  canary_watch_time: 30000-60000
  update_watch_time: 30000-60000
  max_in_flight: 32
  serial: false

networks:
- name: default
  type: manual
  subnets:
  - range: 192.168.100.0/24
    reserved: [192.168.100.2 - 192.168.100.110]
    gateway: 192.168.100.1
    cloud_properties:
      name: VLAN100
      tags:
      - bosh
      - socksd
    dns:
    - 192.168.100.1
    - 8.8.8.8

resource_pools:
- name: default
  network: default
  stemcell:
    name: bosh-vsphere-esxi-centos-7-go_agent
    version: latest
  cloud_properties:
    cpu: 1
    ram: 1_024
    disk: 10_240

jobs:
- name: socks
  templates:
  - name: socksd
  - name: stunnel
  instances: 2
  resource_pool: default
  persistent_disk: 10240
  networks:
  - name: default
    default: [dns, gateway]

properties:
  vsphere:
    host: xxx
    user: xxx
    password: xxx
    datacenters:
    - name: LAB
      vm_folder: bosh
      template_folder: Templates
      disk_path: bosh_disks
      datastore_pattern: '\AISCSI_SSD\z'
      persistent_datastore_pattern: '\AISCSI_SSD\z'
      clusters:
      - LAB: {resource_pool: LAB}
  director:
    max_threads: 3
  hm:
    resurrector_enabled: true
    resurrector:
      minimum_down_jobs: 5
      percent_threshold: 0.2
      time_threshold: 600
  ntp:
  - 0.asia.pool.ntp.org
  - 1.asia.pool.ntp.org

I suspect the issue is deep in Monit somewhere, so any advice on diagnosing this with self-made releases would help a lot.

Stephen
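On the diagnosing point: monit on a BOSH stemcell keeps its own log, which usually shows the start/stop loop that "monit summary" hides. A minimal sketch, assuming the standard stemcell layout (paths may differ by stemcell version):

"""
# monit's own log: every start attempt, timeout and restart
tail -f /var/vcap/monit/monit.log

# the monit definitions the agent actually loaded for the job
cat /var/vcap/monit/job/*.monitrc
"""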
Dr Nic Williams
If I recall, one possibility is that your monit file is referencing a pid file in a different location to where your ctl script/app is dropping the pid file.
So your monit starts the app, and you drop a pid file. Monit can't see the pid file it expects, so it starts your app again. And now you have two processes vying for the same port/shared resource.
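A minimal sketch of the pattern that avoids this, with the pid file spelled the same in both places (the file paths and sockd flags here are assumptions, not taken from Stephen's release):

"""
# jobs/socksd/monit
check process socksd
  with pidfile /var/vcap/sys/run/socksd/socksd.pid
  start program "/var/vcap/jobs/socksd/bin/ctl start"
  stop program "/var/vcap/jobs/socksd/bin/ctl stop"
  group vcap
"""

"""
# jobs/socksd/templates/ctl.erb -- start branch
case $1 in
  start)
    mkdir -p /var/vcap/sys/run/socksd /var/vcap/sys/log/socksd
    # write the pid first, then exec: the shell is replaced by sockd,
    # so the pid in the file stays valid for monit
    echo $$ > /var/vcap/sys/run/socksd/socksd.pid
    exec /var/vcap/packages/dante/sbin/sockd \
      -f /var/vcap/jobs/socksd/config/sockd.conf \
      >> /var/vcap/sys/log/socksd/socksd.log 2>&1
    ;;
esac
"""

The important part is that sockd stays in the foreground (no detach/daemonize flag): if it forks itself into the background, the pid monit has on record dies, monit declares the start failed, and the restart loop begins.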
Gwenn Etourneau
Maybe you can share the source code of the BOSH release somewhere, especially the monit file and the startup script?

My guess is the same as Dr Nic's: monit checks a pid file which is different from the one your application/script is writing, so Monit tries to restart your program again and again.
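A quick way to spot that mismatch on the VM, assuming the job follows the usual layout with the rendered monit file and ctl script under /var/vcap/jobs/socksd:

"""
# the pid file monit expects
grep -i pidfile /var/vcap/jobs/socksd/monit

# the pid file (if any) the startup script writes
grep -n pid /var/vcap/jobs/socksd/bin/ctl
"""

If the two paths differ, or the script writes the pid of a shell that then exits, that's the loop.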