Problem extracting stemcell archive


Ulrik Sandberg
 

Following the instructions at https://github.com/cloudfoundry/bosh-lite, I ran provision_cf and the stemcell download failed (probably a temporary network problem). Running it again with a successful download didn't resolve the issue.

$ ./bin/provision_cf
...
+ curl --progress-bar http://bosh-jenkins-artifacts.s3.amazonaws.com/bosh-stemcell/warden/latest-bosh-stemcell-warden.tgz
###################################################################### 98.3%
curl: (56) Recv failure: Operation timed out

I ran it again, thinking it would re-download, but it didn't. Instead I got this:

$ ./bin/provision_cf
...
Read tarball OK
/Users/ulrik/.rvm/gems/ruby-2.1.5/gems/minitar-0.5.4/lib/archive/tar/minitar.rb:630:in `read': unexpected end of file (Zlib::GzipFile::Error)
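
The second run apparently reused the partial file left behind by the interrupted download. A minimal sketch of an integrity check that could be run before reusing a cached archive, assuming gzip is installed; the helper name stemcell_is_intact is hypothetical and not part of bosh-lite:

```shell
# Hypothetical helper (not part of bosh-lite): succeeds only if the file
# exists and gzip can read it through to the end of the compressed stream.
stemcell_is_intact() {
  [ -f "$1" ] && gzip -t "$1" 2>/dev/null
}

STEMCELL_FILE=latest-bosh-stemcell-warden.tgz

# Remove a cached download only when it is present but fails the check,
# so that the next provision_cf run fetches a fresh copy.
if [ -f "$STEMCELL_FILE" ] && ! stemcell_is_intact "$STEMCELL_FILE"; then
  echo "Removing truncated download: $STEMCELL_FILE"
  rm -f "$STEMCELL_FILE"
fi
```

This only detects truncation or corruption of the gzip stream; it says nothing about whether the contents are a usable stemcell.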

I then removed the broken archive and ran provision_cf again:

$ rm latest-bosh-stemcell-warden.tgz
$ ./bin/provision_cf
...
+ curl --progress-bar http://bosh-jenkins-artifacts.s3.amazonaws.com/bosh-stemcell/warden/latest-bosh-stemcell-warden.tgz
######################################################################## 100.0%
...
Uploading stemcell...

latest-bosh-s: 100% |oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo| 431.4MB 39.2MB/s Time: 00:00:10

Director task 1
Started update stemcell
Started update stemcell > Extracting stemcell archive. Failed: Extracting stemcell archive failed. Check task debug log for details. (00:00:04)

Checking the debug log, I see this:

$ bosh task 1 --debug
...
E, [2015-11-13 07:23:53 #2798] [task:1] ERROR -- DirectorJobRunner: Extracting stemcell archive failed in dir /var/vcap/data/tmp/director/stemcell20151113-2798-rg19uo, tar returned 2, output:
gzip: stdin: unexpected end of file

I tried just the upload, making sure the flag --skip-if-exists was not used, but I get the same error:

$ bosh -n -u admin -p admin upload stemcell latest-bosh-stemcell-warden.tgz
...
Uploading stemcell...

latest-bosh-s: 100% |oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo| 431.4MB 40.3MB/s Time: 00:00:10

Director task 2
Started update stemcell
Started update stemcell > Extracting stemcell archive. Failed: Extracting stemcell archive failed. Check task debug log for details. (00:00:03)

How do I recover from this?


bharath P
 

Hi Ulrik,

Can you run the command

$ bosh verify stemcell <stemcell-name>

to see whether the stemcell is valid?

Also check the available disk space. Please provide the logs if it
fails.

regards
Bharath




On Fri, Nov 13, 2015 at 1:20 PM, Ulrik Sandberg <ulrik.sandberg(a)jayway.com>
wrote:



Ulrik Sandberg
 

Thanks for your reply.

$ bosh verify stemcell latest-bosh-stemcell-warden.tgz

Verifying stemcell...
File exists and readable OK
Verifying tarball...
Read tarball OK
Manifest exists OK
Stemcell image file OK
Stemcell properties OK

Stemcell info
-------------
Name: bosh-warden-boshlite-ubuntu-trusty-go_agent
Version: 389

`latest-bosh-stemcell-warden.tgz' is a valid stemcell


I'm assuming you mean the vagrant disk, which seems to have plenty of space on the root partition:

vagrant(a)bosh-lite:~$ df -h /var/vcap/data/tmp/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-root 78G 1.8G 72G 3% /


Anything else to check? Should I destroy the vagrant and start over?


bharath P
 

Hi Ulrik,

Please run the command again. This time, in another terminal, watch
the debug log with the command below:

bosh task <task_id> --debug

I think the first attempt failed because the file might have been corrupted. Try it
once more and let me know the status. Please provide the logs if it fails again.

regards
Bharath

On Mon, Nov 16, 2015 at 1:58 PM, Ulrik Sandberg <ulrik.sandberg(a)jayway.com>
wrote:



Ulrik Sandberg
 

Interestingly, I destroyed the vagrant, removed the downloaded stemcell file, started a new vagrant, ran provision_cf again, and got the same error.


Hristo Iliev
 

Do you happen to use a proxy or firewall? Sometimes in my setup corrupted
downloads are cached and I have to select another mirror. Not that there
are mirrors for the stemcells.
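
A rough way to spot a truncated or stale cached download is to compare the local file size with the Content-Length the server reports. This is only a sketch, assuming curl, awk, and wc are available; the helper names remote_size and size_matches are hypothetical:

```shell
# Hypothetical helper: print the Content-Length reported for a URL.
# With --location there may be several header blocks; keep the last value.
remote_size() {
  curl --silent --head --location "$1" | tr -d '\r' |
    awk 'tolower($1) == "content-length:" { size = $2 } END { print size }'
}

# Hypothetical helper: succeed if the file at $1 is exactly $2 bytes long.
size_matches() {
  actual=$(wc -c < "$1" | tr -d ' ')
  [ "$actual" = "$2" ]
}

# Usage:
#   url=http://bosh-jenkins-artifacts.s3.amazonaws.com/bosh-stemcell/warden/latest-bosh-stemcell-warden.tgz
#   size_matches latest-bosh-stemcell-warden.tgz "$(remote_size "$url")" || echo "size mismatch"
```

A matching size does not prove the bytes are identical, and a caching proxy could report its own length, so this is a first check rather than a guarantee.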

2015-11-16 15:39 GMT+02:00 Ulrik Sandberg <ulrik.sandberg(a)jayway.com>:



Ulrik Sandberg
 

All downloads after the first failed one have unarchived without problems on my local machine. 'bosh verify stemcell' says it's OK. It's only after the file has been uploaded to the vagrant that unarchiving fails.
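
One way to narrow this down would be to repeat an extraction by hand on both the host and the VM. The sketch below assumes tar and mktemp are available; try_extract is a hypothetical helper, not a BOSH command, and it only approximates whatever the director actually runs:

```shell
# Hypothetical helper: extract a gzipped tarball into a scratch directory
# and return tar's exit status (0 on success, non-zero on failure).
try_extract() {
  scratch=$(mktemp -d) || return 1
  if tar -xzf "$1" -C "$scratch" 2>/dev/null; then
    status=0
  else
    status=$?   # exit status of the failed tar invocation
  fi
  rm -rf "$scratch"
  return "$status"
}

# Usage:
#   try_extract latest-bosh-stemcell-warden.tgz && echo "extracts cleanly"
```

Running cksum on the archive on the host and again inside the VM (after vagrant ssh, assuming the bosh-lite directory is visible there under /vagrant) would show whether both sides see the same bytes before try_extract is used on each.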

I checked out a previous version of bosh-lite, 8dca74b, since the box version was changed after that. No difference.

Could someone verify that it actually works with these versions?

bosh-lite: 295b76d
bosh: BOSH 1.3130.0
vagrant: Vagrant 1.7.4
VirtualBox: 5.0.10 (tried 4.3.34 too)
ruby: 2.1.5p273
MacOSX: 10.11.1


Bharath
 

Hi Ulrik,

Try using the stemcell at the link below:

https://d26ekeud912fhb.cloudfront.net/bosh-stemcell/warden/bosh-stemcell-360-warden-boshlite-ubuntu-trusty-go_agent.tgz

regards
Bharath

On Tue, Nov 17, 2015 at 4:51 AM, Ulrik Sandberg <ulrik.sandberg(a)jayway.com>
wrote:



Ulrik Sandberg
 

When I run bin/provision_cf, it downloads this file:

+ STEMCELL_SOURCE=http://bosh-jenkins-artifacts.s3.amazonaws.com/bosh-stemcell/warden
+ STEMCELL_FILE=latest-bosh-stemcell-warden.tgz

The version of this "latest" file is 389:

Stemcell info
-------------
Name: bosh-warden-boshlite-ubuntu-trusty-go_agent
Version: 389

Looking at https://bosh.io/stemcells, version 389 seems to be over a year old (and apparently broken). The most current bosh-warden-boshlite-ubuntu-trusty-go_agent is 3126. If I download version 3126 and rename it latest-bosh-stemcell-warden.tgz, it unpacks correctly when I run bin/provision_cf:

Stemcell info
-------------
Name: bosh-warden-boshlite-ubuntu-trusty-go_agent
Version: 3126
...
Started update stemcell > Extracting stemcell archive. Done (00:00:03)
Started update stemcell > Verifying stemcell manifest. Done (00:00:00)
Started update stemcell > Checking if this stemcell already exists. Done (00:00:00)
Started update stemcell > Uploading stemcell bosh-warden-boshlite-ubuntu-trusty-go_agent/3126 to the cloud. Done (00:00:12)
Started update stemcell > Save stemcell bosh-warden-boshlite-ubuntu-trusty-go_agent/3126 (0e357757-4f75-4ccb-77d4-b228c69b771a). Done (00:00:00)
Done update stemcell (00:00:15)
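
The workaround above (fetch a pinned version and save it under the file name provision_cf expects) could be scripted roughly as follows. The bosh.io download URL pattern is an assumption and should be checked against https://bosh.io/stemcells; stemcell_url and fetch_stemcell are hypothetical helper names:

```shell
STEMCELL_NAME=bosh-warden-boshlite-ubuntu-trusty-go_agent

# Assumed URL pattern for bosh.io stemcell downloads (verify before relying on it).
stemcell_url() {
  echo "https://bosh.io/d/stemcells/${STEMCELL_NAME}?v=$1"
}

# Download to a temporary name and rename only on success, so an interrupted
# transfer never leaves a truncated latest-bosh-stemcell-warden.tgz behind.
# --fail makes curl exit non-zero on HTTP errors instead of saving an error page.
fetch_stemcell() {
  target=latest-bosh-stemcell-warden.tgz
  curl --fail --location --progress-bar \
    --output "$target.part" "$(stemcell_url "$1")" &&
    mv "$target.part" "$target"
}

# Usage, from the bosh-lite directory:
#   fetch_stemcell 3126
```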

Now the script trips up on a missing ../cf-release directory. I'll probably need to clone another Git repo.

I guess just running bin/provision_cf is not enough to get started, even though the documentation suggests so.


Ulrik Sandberg
 

I cloned cf-release beside bosh-lite, installed spiff, and ran bin/provision_cf again. Now it goes all the way to deployment, but fails with this:

...
Started preparing deployment > Binding unallocated VMs. Done (00:00:00)
Started preparing deployment > Binding instance networks. Done (00:00:00)

Started preparing package compilation > Finding packages to compile. Failed: File exists @ dir_s_mkdir - /vagrant (00:00:00)

Error 100: File exists @ dir_s_mkdir - /vagrant
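
The two prerequisites found so far (a cf-release checkout beside bosh-lite and spiff on the PATH) could be checked before running the script. A minimal sketch; missing_prereqs is a hypothetical helper, and the list may well be incomplete:

```shell
# Hypothetical preflight check: print whatever is missing, or an empty line.
# $1 is the expected release directory, $2 the required command-line tool.
missing_prereqs() {
  release_dir=${1:-../cf-release}
  tool=${2:-spiff}
  missing=""
  [ -d "$release_dir" ] || missing="$missing $release_dir"
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  echo "$missing"
}

# Usage, from the bosh-lite directory:
#   [ -z "$(missing_prereqs)" ] && ./bin/provision_cf
```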


Ulrik Sandberg
 

This was apparently a known issue [https://github.com/cloudfoundry/bosh-lite/issues/109], probably related to moving a local directory after it had been shared with vagrant. I did move the bosh-lite directory into a subfolder before cloning cf-release, to keep them together, so it was likely caused by that move. 'vagrant reload' solved it.

It's looking pretty good, compiling packages now.