#cf Proposal to use modern and more efficient VM sizes on all IaaSes #cf


David Stevenson
 

Hi CF community! I have a proposal to use modern & more efficient VM types on CF, more suitable for testing and CI workloads, and I'd love feedback on it:
https://github.com/cloudfoundry/community/pull/290

Some impacts to highlight, if this rolls out as-is:
  • New bbl up runs will switch to the new VM sizes automatically for CF-D, Jumpbox-D, and BOSH-D
    • Existing envs that do not re-run bbl up will not change
    • If you want the old machine sizes, you'll have to run an older bbl version or customize your cloud config yourself (we expect production instances of CF to tune their VM sizes anyway!)
  • A full CF/BOSH/Jumpbox deployment is about 70% cheaper! Using some IaaS commitments, you can now run a default deployment for ~$420/mo!
  • If you have purchased Reserved/Committed Instances (and re-run bbl up or create a new env), you'll end up on new VM types that those commitments may not cover, so the commitments might go unused
  • I've standardized the meaning of the handful of VM types used in BOSH-D and CF-D, and tried to provide a similar amount of resources on all 3 IaaSes.
    • Azure used to be 56GB diego cells, GCP was 26GB 🙄
  • We'd end up using a lot of burstable VM types on GCP, AWS, and Azure. This is intentional and has been validated to work.
I've been trying to do infrastructure budgeting for running more CF testing and CI using CF Foundation GCP and AWS accounts, and I realized we need to reduce that spend before much of it could reasonably fit in the CFF budget. I'm going to move ahead with this sometime next week if there's no interesting feedback to address, so that we can do CFF IT budgeting effectively.

Thanks!
-David Stevenson
CFF TOC member / VMWare Tech Lead


David Stevenson
 

This change has been successfully rolled out, and we're saving 50-70% on our default deployments, yay! Running "bbl up" with newer bbl versions will change your default VM types catalog. If you run CATS, read on...

Newer versions of BBL, starting with 8.4.91, deploy more modern and efficient VM sizes by default. Along the way we found that the default configuration passes CATS less reliably, which makes sense because there's a lot less hardware deployed. Changing the CATS setting from timeout_scale: 1 to timeout_scale: 2 fixes the issue: the CATS still pass when run in parallel with 12 nodes, and are only about 25% slower (which is acceptable). We plan to eventually fix the CATS timeout defaults so that this timeout scaling option is no longer required.
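For anyone scripting their CATS runs, here is a minimal sketch of the timeout_scale change in Python. It assumes your CATS integration config is a JSON object with a top-level timeout_scale key; the helper name and the example values are hypothetical and not part of bbl or CATS.

```python
import json


def scale_cats_timeouts(config: dict, factor: int = 2) -> dict:
    """Return a copy of a CATS integration config with timeout_scale set to factor."""
    updated = dict(config)  # shallow copy so the caller's dict is left untouched
    updated["timeout_scale"] = factor
    return updated


# Hypothetical minimal config; a real one carries many more keys.
example = {"api": "api.example.com", "timeout_scale": 1}
patched = scale_cats_timeouts(example)
print(json.dumps(patched, indent=2))
```

In practice you would load your existing config file with json.load, patch it, and write it back before starting the CATS run.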

With the 1AZ cf-deployment ops-file, the CATS will now always fail when run in parallel, because the standardized diego cell size is smaller. To work around this, BBL v8.4.93 standardizes the "medium" (4 cores, 16GB) and "large" (8 cores, 32GB) VM sizes and introduces a "medium-highmem" (4 cores, 32GB) size. These give operators and CF authors more standard sizes to choose from for diego cells; one example is this newly introduced ops file, designed to make parallel CATS testing fast and reliable in a single-AZ minimal configuration.
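To make the three standardized sizes concrete, here is a small Python sketch that records them and picks the smallest one meeting a given requirement. The core and memory figures come from the message above; the dictionary layout and the pick_vm_type helper are illustrative assumptions, not anything bbl provides.

```python
# Standardized VM sizes named in this thread: (cores, memory in GB).
VM_SIZES = {
    "medium": (4, 16),
    "medium-highmem": (4, 32),
    "large": (8, 32),
}


def pick_vm_type(min_cores: int, min_mem_gb: int) -> str:
    """Return the smallest listed size that satisfies both minimums."""
    candidates = [
        (cores, mem, name)
        for name, (cores, mem) in VM_SIZES.items()
        if cores >= min_cores and mem >= min_mem_gb
    ]
    if not candidates:
        raise ValueError("no standardized size is large enough")
    # Prefer fewer cores first, then less memory.
    return min(candidates)[2]


# A diego cell that needs 32GB but only 4 cores fits medium-highmem.
print(pick_vm_type(4, 32))
```

The ordering (cores before memory) is an arbitrary choice for the sketch; a real decision would weigh per-IaaS pricing as well.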


minseok kim
 

hi 
it is awesome work!
thanks

On Wed, Jul 13, 2022 at 9:25 AM, David Stevenson via lists.cloudfoundry.org <stevensonda=vmware.com@...> wrote:

--
김민석 
+82-10-3266-8040