Import large dataset to Postgres instance in CF


Noburou TANIGUCHI
 

I'm afraid I don't understand your situation, but can't you use User-Provided
Service [1]?

If you can build a PostgreSQL instance that you can reach with psql and
that a CF app instance can also reach, you can use it as a User-Provided
Service.
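
As a sketch, registering and binding such an instance might look like the
following (service/app names and credentials here are placeholders, not
anything from your deployment):

```shell
# Register the externally-managed PostgreSQL as a user-provided service.
cf create-user-provided-service my-external-pg \
  -p '{"uri":"postgres://dbuser:dbpass@10.0.1.5:5432/mydb"}'

# Bind it and restage so the credentials show up in VCAP_SERVICES.
cf bind-service my-app my-external-pg
cf restage my-app
```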

- Pros:
-- you can use it as an ordinary PostgreSQL instance

- Cons:
-- you have to manage it on your own
-- you may have to ask your CF administrator to open Application Security
Groups [2].

If you aren't allowed to create a User-Provided Service, please ignore
this post.

[1] https://docs.cloudfoundry.org/devguide/services/user-provided.html
[2] https://docs.cloudfoundry.org/adminguide/app-sec-groups.html



-----
I'm not a ...
noburou taniguchi


Siva Balan <mailsiva@...>
 

Thanks. We are not running Diego, so writing an app seems to be the most
viable option.

On Fri, Dec 11, 2015 at 3:44 AM, Matthew Sykes <matthew.sykes(a)gmail.com>
wrote:



--
http://www.twitter.com/sivabalans


Matthew Sykes <matthew.sykes@...>
 

Regarding Nic's ssh comment, if you're running Diego, I'd recommend using
the port forwarding feature instead of copying the data. It was actually
one of the scenarios that drove the implementation of that feature.

Once the port forwarding is set up, you should be able to target the local
endpoint with your database tools and have everything forwarded over the
tunnel to the database.
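
A sketch of that flow, assuming a Diego-backed app named `my-app` with ssh
enabled; the database host, credentials, and table are placeholders:

```shell
# Forward local port 5432 through the app container to the database host.
cf ssh my-app -N -L 5432:pg.internal.example:5432 &

# The local endpoint now tunnels to the database; any native tool works,
# e.g. a bulk load with psql's client-side \copy:
psql "postgres://dbuser:dbpass@localhost:5432/mydb" \
  -c "\copy records FROM 'data.csv' WITH (FORMAT csv)"
```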

On Thu, Dec 10, 2015 at 12:35 AM, Nicholas Calugar <ncalugar(a)pivotal.io>
wrote:


--
Matthew Sykes
matthew.sykes(a)gmail.com


Guillaume Berche
 

Hi Siva,

We've been working at Orange on a solution which dumps an existing db to
an S3-compatible endpoint and then reimports from the S3 bucket into a db
instance (see the mailing list announcement in [1] and the specs in [2]).
The implementation at [3] is still at an early stage and currently lacks
documentation beyond the specs. We'd be happy to get feedback from the
community. While this does not directly address your issue, it might
provide some ideas:

a) within the corp network, manually prepare the data set (e.g. a pg
dump) and upload it to S3 using S3 CLIs (e.g. the riakcs service). Then
ssh into one of your CF app instances, download the dump from S3, and
stream it into a pg client to import it into a CF-reachable instance (so
as to avoid hitting the ephemeral FS limit)
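
Option (a) might look roughly like this, with hypothetical bucket and app
names, and a pre-signed URL standing in for whatever S3 credentials the
riakcs service hands out:

```shell
# From inside the corp network: dump and upload the data set.
pg_dump --no-owner corpdb | gzip > dump.sql.gz
aws s3 cp dump.sql.gz s3://my-bucket/dump.sql.gz

# From inside a CF app container (cf ssh my-app): stream the dump straight
# into psql so it never lands on the container's ephemeral filesystem.
curl -s "$DUMP_PRESIGNED_URL" | gunzip | psql "$DATABASE_URL"
```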

b) If this process is recurrent and needs automation, then the
service-db-dumper could potentially help.

I'll think about extending the service-db-dumper to accept a remote S3
bucket as the source of a dump (currently it accepts a db URL to perform a
dump from, and soon a service instance name/guid).

If this service-db-dumper improvement were available, you could
instantiate a service-db-dumper within your private CF instance, then
instantiate a dump service instance from the S3 bucket where you would
have uploaded the dump, and finally use the service-db-dumper to
restore/import this dump into your pg instance accessible within CF.

Hope this helps,

Guillaume.

[1]
http://cf-dev.70369.x6.nabble.com/cf-dev-Data-services-import-export-tp1717.html
[2]
https://docs.google.com/document/d/1Y5vwWjvaUIwHI76XU63cAS8xEOJvN69-cNoCQRqLPqU/edit
[3] https://github.com/Orange-OpenSource/service-db-dumper

On Thu, Dec 10, 2015 at 6:35 AM, Nicholas Calugar <ncalugar(a)pivotal.io>
wrote:



Nicholas Calugar
 

Hi Siva,

1. If you run the PostgreSQL server yourself, you likely want to
temporarily open the firewall to load data, or get on a jump box of some
sort that can access the database. It's not really a CF issue at this
point; it's a general issue of seeding a database out-of-band from the
application server.
2. If the above isn't an option and your CF is running Diego, you could
use SSH to get onto an app container after SCPing the data to that
container.
3. The only other option I can think of is writing a simple app that you
can push to CF to do the import.
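
For option 3, the pushed app can be little more than a loop that reads the
source file in chunks and issues batched inserts, so millions of rows never
sit in memory at once. A minimal, DB-agnostic sketch of the batching logic
(the `records` table and the psycopg2 calls in the comment are illustrative
assumptions, not part of any CF API):

```python
import itertools

def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable,
    so an arbitrarily large input can be flushed chunk by chunk."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

# Inside the app, each chunk would be flushed with one round trip, e.g.
# (hypothetical table and a connection built from VCAP_SERVICES creds):
#   for chunk in batches(csv_reader, 10_000):
#       cur.executemany("INSERT INTO records (a, b) VALUES (%s, %s)", chunk)
#       conn.commit()
```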

Hope that helps,

Nick

On Wed, Dec 9, 2015 at 3:08 PM Siva Balan <mailsiva(a)gmail.com> wrote:



Siva Balan <mailsiva@...>
 

Hi Nick,
Your Option 1 (using the psql CLI) is not possible, since there is a
firewall that only allows connections from CF apps to the postgres DB.
Clients like the psql CLI that run outside of CF have no access to the
postgres DB.
I just wanted to get some thoughts from this community, since I presume
many have faced a similar circumstance: importing large sets of data into
a DB which is behind a firewall and accessible only through CF apps.

Thanks
Siva

On Wed, Dec 9, 2015 at 2:27 PM, Nicholas Calugar <ncalugar(a)pivotal.io>
wrote:


--
http://www.twitter.com/sivabalans


Nicholas Calugar
 

Hi Siva,

You'll have to tell us more about how your PostgreSQL and CF were
deployed, but you might be able to connect to it from your local machine
using the psql CLI and the credentials for one of your bound apps. This
takes CF out of the equation, other than the service binding providing the
credentials.

If this doesn't work, there are a number of things that could be in the
way, e.g. a firewall that only allows connections from CF, or the
PostgreSQL server being on a different subnet. You could then try using
some machine as a jump box that allows access to the PostgreSQL server.

Nick

On Wed, Dec 9, 2015 at 9:40 AM Siva Balan <mailsiva(a)gmail.com> wrote:



Siva Balan <mailsiva@...>
 

Hello,
Below is my requirement:
I have a postgres instance deployed on our corporate CF deployment. I have
created a service instance of this postgres and bound my app to it. Now I
need to import a very large dataset (millions of records) into this
postgres instance.
As a CF user, I do not have access to any ports on CF other than 80 and
443, so I am not able to use any of the native postgresql tools to import
the data. I can view and run simple SQL commands on this postgres instance
using the phppgadmin app that is also bound to my postgres service
instance.
Now, what is the best way for me to import this large dataset into my
postgres service instance?
All thoughts and suggestions welcome.

Thanks
Siva Balan

--
http://www.twitter.com/sivabalans