Packaging CF app as bosh-release


Paul Bakare
 

Hi,

Does it make any sense to package a CF app as boshrelease?


Amit Gupta
 

Can you say a bit more about what you're trying to do?


Ronak Banka
 

Can possibly package anything but that way application will run inside a vm
rather than on platform.

On Wed, Sep 16, 2015 at 12:02 AM, Kayode Odeyemi <dreyemi(a)gmail.com> wrote:

Hi,

Does it make any sense to package a CF app as boshrelease?


Paul Bakare
 

On Wed, Sep 16, 2015 at 5:15 AM, Amit Gupta <amitkgupta84(a)gmail.com> wrote:

Can you say a bit more about what you're trying to do?

I'm working on an experimental analytics project that leverages logsearch +
Apache Spark.

So instead of having the Spark jobs as apps, I'm thinking of building a
bosh release job for it.

Can possibly package anything but that way application will run inside a vm
rather than on platform.

What are the cons of apps running on VMs instead of warden?


Amit Kumar Gupta
 

Are the spark jobs tasks that you expect to end, or apps that you expect to
run forever?
Do your jobs need to write to the file system, or do they access a
shared/distributed file system somehow?
Do you need things like a static IP allocated to your jobs?
Are your spark jobs serving any web traffic?

On Wed, Sep 16, 2015 at 1:32 AM, Kayode Odeyemi <dreyemi(a)gmail.com> wrote:


On Wed, Sep 16, 2015 at 5:15 AM, Amit Gupta <amitkgupta84(a)gmail.com>
wrote:

Can you say a bit more about what you're trying to do?

I'm working on an experimental analytics project that leverages logsearch
+ Apache Spark.

So instead of having the Spark jobs as apps, I'm thinking of building a
bosh release job for it.

Can possibly package anything but that way application will run inside a
vm rather than on platform.

What are the cons of apps running on VMs instead of warden?


Paul Bakare
 

On Wed, Sep 16, 2015 at 3:44 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Are the spark jobs tasks that you expect to end, or apps that you expect
to run forever?
They are tasks that run forever. The jobs are subscribers to RabbitMQ
queues that process
messages in batches.


Do your jobs need to write to the file system, or do they access a
shared/distributed file system somehow?
The jobs write to shared filesystem.


Do you need things like a static IP allocated to your jobs?
No.


Are your spark jobs serving any web traffic?
No.


Amit Kumar Gupta
 

The shared file system aspect is an interesting wrinkle to the problem.
Unless you use some network layer to how you write to the shared file
system, e.g. SSHFS, I think apps will not work because they get isolated to
run in a container, they're given a chroot "jail" for their file system,
and it gets blown away whenever the app is stopped or restarted (which will
commonly happen, e.g. during a rolling deploy of the container-runner VMs).

Do you have something that currently works? How do your VMs currently
access this shared FS? I'm not sure BOSH has the abstractions for choosing
a shared, already-existing "persistent disk" to be attached to multiple
VMs. I also don't know what happens when you scale your VMs down, because
BOSH would generally destroy the associated persistent disk, but you don't
want to destroy the shared data.

Dmitriy, any idea how BOSH can work with a shared filesystem (e.g. HDFS)?

Amit

On Wed, Sep 16, 2015 at 6:54 AM, Kayode Odeyemi <dreyemi(a)gmail.com> wrote:


On Wed, Sep 16, 2015 at 3:44 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Are the spark jobs tasks that you expect to end, or apps that you expect
to run forever?
They are tasks that run forever. The jobs are subscribers to RabbitMQ
queues that process
messages in batches.


Do your jobs need to write to the file system, or do they access a
shared/distributed file system somehow?
The jobs write to shared filesystem.


Do you need things like a static IP allocated to your jobs?
No.


Are your spark jobs serving any web traffic?
No.




Paul Bakare
 

On Wed, Sep 16, 2015 at 4:09 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

How do your VMs currently access this shared FS?
This is the big ???? . What I planned on using is HDFS over http or
SWIFTFS. Would have to see what this gives.

I also don't know what happens when you scale your VMs down, because BOSH
would generally destroy the associated persistent disk
Not until you mentioned this, my understanding is this wasn't going to be
an issue for BOSH.

I don't have something useful yet since I'm still experimenting. Would push
it somewhere once I'm able to package it as a boshrelease.


Jim Park
 

IDK if BOSH has the abstractions for shared filesystems, but a helper monit
job could make sure that the filesystem is mounted and use
deployment-manifest supplied parameters to drive the ctl script. Monit also
has file_exists? status checker as well. The job could wait on this as a
monit-level dependency.

On Wed, Sep 16, 2015 at 8:11 AM Kayode Odeyemi <dreyemi(a)gmail.com> wrote:

On Wed, Sep 16, 2015 at 4:09 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

How do your VMs currently access this shared FS?
This is the big ???? . What I planned on using is HDFS over http or
SWIFTFS. Would have to see what this gives.

I also don't know what happens when you scale your VMs down, because BOSH
would generally destroy the associated persistent disk
Not until you mentioned this, my understanding is this wasn't going to be
an issue for BOSH.

I don't have something useful yet since I'm still experimenting. Would
push it somewhere once I'm able to package it as a boshrelease.


Dmitriy Kalinin <dkalinin@...>
 

BOSH allocates a persistent disk per instance. It never shares persistent
disks between multiple instances at the same time.

If you need a shared file system, you will have to use some kind of a
release for it. It's not any different from what people do with nfs
server/client.

On Wed, Sep 16, 2015 at 7:09 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

The shared file system aspect is an interesting wrinkle to the problem.
Unless you use some network layer to how you write to the shared file
system, e.g. SSHFS, I think apps will not work because they get isolated to
run in a container, they're given a chroot "jail" for their file system,
and it gets blown away whenever the app is stopped or restarted (which will
commonly happen, e.g. during a rolling deploy of the container-runner VMs).

Do you have something that currently works? How do your VMs currently
access this shared FS? I'm not sure BOSH has the abstractions for choosing
a shared, already-existing "persistent disk" to be attached to multiple
VMs. I also don't know what happens when you scale your VMs down, because
BOSH would generally destroy the associated persistent disk, but you don't
want to destroy the shared data.

Dmitriy, any idea how BOSH can work with a shared filesystem (e.g. HDFS)?

Amit

On Wed, Sep 16, 2015 at 6:54 AM, Kayode Odeyemi <dreyemi(a)gmail.com> wrote:


On Wed, Sep 16, 2015 at 3:44 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Are the spark jobs tasks that you expect to end, or apps that you expect
to run forever?
They are tasks that run forever. The jobs are subscribers to RabbitMQ
queues that process
messages in batches.


Do your jobs need to write to the file system, or do they access a
shared/distributed file system somehow?
The jobs write to shared filesystem.


Do you need things like a static IP allocated to your jobs?
No.


Are your spark jobs serving any web traffic?
No.




Paul Bakare
 

Thanks Dmitriy,

Just for clarity, are you saying multiple instances of a VM cannot share a
single shared filesystem?

On Wed, Sep 16, 2015 at 6:59 PM, Dmitriy Kalinin <dkalinin(a)pivotal.io>
wrote:

BOSH allocates a persistent disk per instance. It never shares persistent
disks between multiple instances at the same time.

If you need a shared file system, you will have to use some kind of a
release for it. It's not any different from what people do with nfs
server/client.

On Wed, Sep 16, 2015 at 7:09 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

The shared file system aspect is an interesting wrinkle to the problem.
Unless you use some network layer to how you write to the shared file
system, e.g. SSHFS, I think apps will not work because they get isolated to
run in a container, they're given a chroot "jail" for their file system,
and it gets blown away whenever the app is stopped or restarted (which will
commonly happen, e.g. during a rolling deploy of the container-runner VMs).

Do you have something that currently works? How do your VMs currently
access this shared FS? I'm not sure BOSH has the abstractions for choosing
a shared, already-existing "persistent disk" to be attached to multiple
VMs. I also don't know what happens when you scale your VMs down, because
BOSH would generally destroy the associated persistent disk, but you don't
want to destroy the shared data.

Dmitriy, any idea how BOSH can work with a shared filesystem (e.g. HDFS)?

Amit

On Wed, Sep 16, 2015 at 6:54 AM, Kayode Odeyemi <dreyemi(a)gmail.com>
wrote:


On Wed, Sep 16, 2015 at 3:44 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Are the spark jobs tasks that you expect to end, or apps that you
expect to run forever?
They are tasks that run forever. The jobs are subscribers to RabbitMQ
queues that process
messages in batches.


Do your jobs need to write to the file system, or do they access a
shared/distributed file system somehow?
The jobs write to shared filesystem.


Do you need things like a static IP allocated to your jobs?
No.


Are your spark jobs serving any web traffic?
No.




Amit Kumar Gupta
 

My very limited understanding is that NFS writes to the actual filesystem,
and achieves persistence by having centralized NFS servers where it writes
to a real mounted device, whereas the clients write to an ephemeral
nfs-mount.

My very limited understanding of HDFS is that it's all userland FS, does
not write to the actual filesystem, and relies on replication to other
nodes in the HDFS cluster. Being a userland FS, you don't have to worry
about the data being wiped when a container is shut down, if you were to
run it as an app.

I think one main issue is going to be ensuring that you never lose too many
instances (whether they are containers or VMs), since you might then lose
all replicas of a given data shard. Whether you go with apps or BOSH VMs
doesn't make a big difference here.

Deploying as an app may be a better way to go, it's simpler right now to
configure and deploy and app, than to configure and deploy a full BOSH
release. It's also likely to be a more efficient use of resources, since a
BOSH VM can only run one of these spark-job-processors, but a CF
container-runner can run lots of other things. That actually brings up a
different question: is your compute environment a multi-tenant one that
will be running multiple different workloads? E.g. could someone also use
the CF to push their own apps? Or is the whole thing just for your spark
jobs, in which case you might only be running one container per VM anyways?

Assuming you can make use of the VMs for other workloads, I think this
would be an ideal use case for Diego. You probably don't need all the
extra logic around apps, like staging and routing, you just need Diego to
efficiently schedule containers for you.

On Wed, Sep 16, 2015 at 1:13 PM, Kayode Odeyemi <dreyemi(a)gmail.com> wrote:

Thanks Dmitriy,

Just for clarity, are you saying multiple instances of a VM cannot share a
single shared filesystem?

On Wed, Sep 16, 2015 at 6:59 PM, Dmitriy Kalinin <dkalinin(a)pivotal.io>
wrote:

BOSH allocates a persistent disk per instance. It never shares persistent
disks between multiple instances at the same time.

If you need a shared file system, you will have to use some kind of a
release for it. It's not any different from what people do with nfs
server/client.

On Wed, Sep 16, 2015 at 7:09 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

The shared file system aspect is an interesting wrinkle to the problem.
Unless you use some network layer to how you write to the shared file
system, e.g. SSHFS, I think apps will not work because they get isolated to
run in a container, they're given a chroot "jail" for their file system,
and it gets blown away whenever the app is stopped or restarted (which will
commonly happen, e.g. during a rolling deploy of the container-runner VMs).

Do you have something that currently works? How do your VMs currently
access this shared FS? I'm not sure BOSH has the abstractions for choosing
a shared, already-existing "persistent disk" to be attached to multiple
VMs. I also don't know what happens when you scale your VMs down, because
BOSH would generally destroy the associated persistent disk, but you don't
want to destroy the shared data.

Dmitriy, any idea how BOSH can work with a shared filesystem (e.g. HDFS)?

Amit

On Wed, Sep 16, 2015 at 6:54 AM, Kayode Odeyemi <dreyemi(a)gmail.com>
wrote:


On Wed, Sep 16, 2015 at 3:44 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Are the spark jobs tasks that you expect to end, or apps that you
expect to run forever?
They are tasks that run forever. The jobs are subscribers to RabbitMQ
queues that process
messages in batches.


Do your jobs need to write to the file system, or do they access a
shared/distributed file system somehow?
The jobs write to shared filesystem.


Do you need things like a static IP allocated to your jobs?
No.


Are your spark jobs serving any web traffic?
No.




Amit Kumar Gupta
 

Hey Kayode,

Were you able to make any progress with the deployments you were trying to
do?

Best,
Amit

On Wed, Sep 16, 2015 at 12:48 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

My very limited understanding is that NFS writes to the actual filesystem,
and achieves persistence by having centralized NFS servers where it writes
to a real mounted device, whereas the clients write to an ephemeral
nfs-mount.

My very limited understanding of HDFS is that it's all userland FS, does
not write to the actual filesystem, and relies on replication to other
nodes in the HDFS cluster. Being a userland FS, you don't have to worry
about the data being wiped when a container is shut down, if you were to
run it as an app.

I think one main issue is going to be ensuring that you never lose too
many instances (whether they are containers or VMs), since you might then
lose all replicas of a given data shard. Whether you go with apps or BOSH
VMs doesn't make a big difference here.

Deploying as an app may be a better way to go, it's simpler right now to
configure and deploy and app, than to configure and deploy a full BOSH
release. It's also likely to be a more efficient use of resources, since a
BOSH VM can only run one of these spark-job-processors, but a CF
container-runner can run lots of other things. That actually brings up a
different question: is your compute environment a multi-tenant one that
will be running multiple different workloads? E.g. could someone also use
the CF to push their own apps? Or is the whole thing just for your spark
jobs, in which case you might only be running one container per VM anyways?

Assuming you can make use of the VMs for other workloads, I think this
would be an ideal use case for Diego. You probably don't need all the
extra logic around apps, like staging and routing, you just need Diego to
efficiently schedule containers for you.

On Wed, Sep 16, 2015 at 1:13 PM, Kayode Odeyemi <dreyemi(a)gmail.com> wrote:

Thanks Dmitriy,

Just for clarity, are you saying multiple instances of a VM cannot share
a single shared filesystem?

On Wed, Sep 16, 2015 at 6:59 PM, Dmitriy Kalinin <dkalinin(a)pivotal.io>
wrote:

BOSH allocates a persistent disk per instance. It never shares
persistent disks between multiple instances at the same time.

If you need a shared file system, you will have to use some kind of a
release for it. It's not any different from what people do with nfs
server/client.

On Wed, Sep 16, 2015 at 7:09 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

The shared file system aspect is an interesting wrinkle to the
problem. Unless you use some network layer to how you write to the shared
file system, e.g. SSHFS, I think apps will not work because they get
isolated to run in a container, they're given a chroot "jail" for their
file system, and it gets blown away whenever the app is stopped or
restarted (which will commonly happen, e.g. during a rolling deploy of the
container-runner VMs).

Do you have something that currently works? How do your VMs currently
access this shared FS? I'm not sure BOSH has the abstractions for choosing
a shared, already-existing "persistent disk" to be attached to multiple
VMs. I also don't know what happens when you scale your VMs down, because
BOSH would generally destroy the associated persistent disk, but you don't
want to destroy the shared data.

Dmitriy, any idea how BOSH can work with a shared filesystem (e.g.
HDFS)?

Amit

On Wed, Sep 16, 2015 at 6:54 AM, Kayode Odeyemi <dreyemi(a)gmail.com>
wrote:


On Wed, Sep 16, 2015 at 3:44 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

Are the spark jobs tasks that you expect to end, or apps that you
expect to run forever?
They are tasks that run forever. The jobs are subscribers to RabbitMQ
queues that process
messages in batches.


Do your jobs need to write to the file system, or do they access a
shared/distributed file system somehow?
The jobs write to shared filesystem.


Do you need things like a static IP allocated to your jobs?
No.


Are your spark jobs serving any web traffic?
No.




Paul Bakare
 

Yes Amit. Thanks

I'm trying the 2 approaches since the both have their pros and cons.

is your compute environment a multi-tenant one that will be running
multiple different workloads?
Yes. dev can push their own spark-based apps and non-spark apps. The
spark-based apps would rely on the existing Spark cluster.

it's also likely to be a more efficient use of resources, since a BOSH VM
can only run one of these spark-job-processors,
I think a Spark cluster(using YARN) of BOSH VMs should be able to run
multiple spark jobs concurrently.

With the app deployment approach, I did setup a UPS for the Spark cluster
and I've been able to submit Spark jobs to the cluster programmatically
through the Spark API. I'll stay with app deployment for now until I get a
stronger use case for a boshrelease.

On Tue, Sep 22, 2015 at 12:21 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

Hey Kayode,

Were you able to make any progress with the deployments you were trying to
do?

Best,
Amit

On Wed, Sep 16, 2015 at 12:48 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

My very limited understanding is that NFS writes to the actual
filesystem, and achieves persistence by having centralized NFS servers
where it writes to a real mounted device, whereas the clients write to an
ephemeral nfs-mount.

My very limited understanding of HDFS is that it's all userland FS, does
not write to the actual filesystem, and relies on replication to other
nodes in the HDFS cluster. Being a userland FS, you don't have to worry
about the data being wiped when a container is shut down, if you were to
run it as an app.

I think one main issue is going to be ensuring that you never lose too
many instances (whether they are containers or VMs), since you might then
lose all replicas of a given data shard. Whether you go with apps or BOSH
VMs doesn't make a big difference here.

Deploying as an app may be a better way to go, it's simpler right now to
configure and deploy and app, than to configure and deploy a full BOSH
release. It's also likely to be a more efficient use of resources, since a
BOSH VM can only run one of these spark-job-processors, but a CF
container-runner can run lots of other things. That actually brings up a
different question: is your compute environment a multi-tenant one that
will be running multiple different workloads? E.g. could someone also use
the CF to push their own apps? Or is the whole thing just for your spark
jobs, in which case you might only be running one container per VM anyways?

Assuming you can make use of the VMs for other workloads, I think this
would be an ideal use case for Diego. You probably don't need all the
extra logic around apps, like staging and routing, you just need Diego to
efficiently schedule containers for you.

On Wed, Sep 16, 2015 at 1:13 PM, Kayode Odeyemi <dreyemi(a)gmail.com>
wrote:

Thanks Dmitriy,

Just for clarity, are you saying multiple instances of a VM cannot share
a single shared filesystem?

On Wed, Sep 16, 2015 at 6:59 PM, Dmitriy Kalinin <dkalinin(a)pivotal.io>
wrote:

BOSH allocates a persistent disk per instance. It never shares
persistent disks between multiple instances at the same time.

If you need a shared file system, you will have to use some kind of a
release for it. It's not any different from what people do with nfs
server/client.

On Wed, Sep 16, 2015 at 7:09 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

The shared file system aspect is an interesting wrinkle to the
problem. Unless you use some network layer to how you write to the shared
file system, e.g. SSHFS, I think apps will not work because they get
isolated to run in a container, they're given a chroot "jail" for their
file system, and it gets blown away whenever the app is stopped or
restarted (which will commonly happen, e.g. during a rolling deploy of the
container-runner VMs).

Do you have something that currently works? How do your VMs currently
access this shared FS? I'm not sure BOSH has the abstractions for choosing
a shared, already-existing "persistent disk" to be attached to multiple
VMs. I also don't know what happens when you scale your VMs down, because
BOSH would generally destroy the associated persistent disk, but you don't
want to destroy the shared data.

Dmitriy, any idea how BOSH can work with a shared filesystem (e.g.
HDFS)?

Amit

On Wed, Sep 16, 2015 at 6:54 AM, Kayode Odeyemi <dreyemi(a)gmail.com>
wrote:


On Wed, Sep 16, 2015 at 3:44 PM, Amit Gupta <agupta(a)pivotal.io>
wrote:

Are the spark jobs tasks that you expect to end, or apps that you
expect to run forever?
They are tasks that run forever. The jobs are subscribers to RabbitMQ
queues that process
messages in batches.


Do your jobs need to write to the file system, or do they access a
shared/distributed file system somehow?
The jobs write to shared filesystem.


Do you need things like a static IP allocated to your jobs?
No.


Are your spark jobs serving any web traffic?
No.




Amit Kumar Gupta
 

Great. Let us know if you have further questions, or share if you end up
getting something cool deployed and working, since I haven't seen Spark on
BOSH or CF yet.

Best,
Amit

On Mon, Sep 21, 2015 at 11:33 PM, Kayode Odeyemi <dreyemi(a)gmail.com> wrote:

Yes Amit. Thanks

I'm trying the 2 approaches since the both have their pros and cons.

is your compute environment a multi-tenant one that will be running
multiple different workloads?
Yes. dev can push their own spark-based apps and non-spark apps. The
spark-based apps would rely on the existing Spark cluster.

it's also likely to be a more efficient use of resources, since a BOSH VM
can only run one of these spark-job-processors,
I think a Spark cluster(using YARN) of BOSH VMs should be able to run
multiple spark jobs concurrently.

With the app deployment approach, I did setup a UPS for the Spark cluster
and I've been able to submit Spark jobs to the cluster programmatically
through the Spark API. I'll stay with app deployment for now until I get a
stronger use case for a boshrelease.

On Tue, Sep 22, 2015 at 12:21 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

Hey Kayode,

Were you able to make any progress with the deployments you were trying
to do?

Best,
Amit

On Wed, Sep 16, 2015 at 12:48 PM, Amit Gupta <agupta(a)pivotal.io> wrote:

My very limited understanding is that NFS writes to the actual
filesystem, and achieves persistence by having centralized NFS servers
where it writes to a real mounted device, whereas the clients write to an
ephemeral nfs-mount.

My very limited understanding of HDFS is that it's all userland FS, does
not write to the actual filesystem, and relies on replication to other
nodes in the HDFS cluster. Being a userland FS, you don't have to worry
about the data being wiped when a container is shut down, if you were to
run it as an app.

I think one main issue is going to be ensuring that you never lose too
many instances (whether they are containers or VMs), since you might then
lose all replicas of a given data shard. Whether you go with apps or BOSH
VMs doesn't make a big difference here.

Deploying as an app may be a better way to go, it's simpler right now to
configure and deploy and app, than to configure and deploy a full BOSH
release. It's also likely to be a more efficient use of resources, since a
BOSH VM can only run one of these spark-job-processors, but a CF
container-runner can run lots of other things. That actually brings up a
different question: is your compute environment a multi-tenant one that
will be running multiple different workloads? E.g. could someone also use
the CF to push their own apps? Or is the whole thing just for your spark
jobs, in which case you might only be running one container per VM anyways?

Assuming you can make use of the VMs for other workloads, I think this
would be an ideal use case for Diego. You probably don't need all the
extra logic around apps, like staging and routing, you just need Diego to
efficiently schedule containers for you.

On Wed, Sep 16, 2015 at 1:13 PM, Kayode Odeyemi <dreyemi(a)gmail.com>
wrote:

Thanks Dmitriy,

Just for clarity, are you saying multiple instances of a VM cannot
share a single shared filesystem?

On Wed, Sep 16, 2015 at 6:59 PM, Dmitriy Kalinin <dkalinin(a)pivotal.io>
wrote:

BOSH allocates a persistent disk per instance. It never shares
persistent disks between multiple instances at the same time.

If you need a shared file system, you will have to use some kind of a
release for it. It's not any different from what people do with nfs
server/client.

On Wed, Sep 16, 2015 at 7:09 AM, Amit Gupta <agupta(a)pivotal.io> wrote:

The shared file system aspect is an interesting wrinkle to the
problem. Unless you use some network layer to how you write to the shared
file system, e.g. SSHFS, I think apps will not work because they get
isolated to run in a container, they're given a chroot "jail" for their
file system, and it gets blown away whenever the app is stopped or
restarted (which will commonly happen, e.g. during a rolling deploy of the
container-runner VMs).

Do you have something that currently works? How do your VMs
currently access this shared FS? I'm not sure BOSH has the abstractions
for choosing a shared, already-existing "persistent disk" to be attached to
multiple VMs. I also don't know what happens when you scale your VMs down,
because BOSH would generally destroy the associated persistent disk, but
you don't want to destroy the shared data.

Dmitriy, any idea how BOSH can work with a shared filesystem (e.g.
HDFS)?

Amit

On Wed, Sep 16, 2015 at 6:54 AM, Kayode Odeyemi <dreyemi(a)gmail.com>
wrote:


On Wed, Sep 16, 2015 at 3:44 PM, Amit Gupta <agupta(a)pivotal.io>
wrote:

Are the spark jobs tasks that you expect to end, or apps that you
expect to run forever?
They are tasks that run forever. The jobs are subscribers to
RabbitMQ queues that process
messages in batches.


Do your jobs need to write to the file system, or do they access a
shared/distributed file system somehow?
The jobs write to shared filesystem.


Do you need things like a static IP allocated to your jobs?
No.


Are your spark jobs serving any web traffic?
No.