How to automate a Python script on Google Cloud

Being able to automate your Python scripts is essential to get the most out of them. After all, many processes are repeated periodically, and it makes no sense to execute the same code by hand over and over again. This applies to everything from ETL processes to putting models into production (in batch).

One of the most typical ways to automate a script is to do it locally… but that has a problem: your computer must be on for it to work.

Fortunately, that is not the only way to go. In fact, today we are going to learn how to automate a Python script on Google Cloud, so that your automation keeps working even while you are on vacation. Sounds interesting, right? Let's get to it!


Creating our Python script to automate

First of all, to automate a script we must have a script. In this case, we are going to do something quite common in ETL processes: extract data from an API and upload it to a database.

For this, I have created a free database on Heroku. It is a Postgres database and, for the purposes of this automation, the table is already created; we just have to add new data to it.

In any case, the idea is as follows: we are going to extract the position (latitude, longitude, etc.) of all the airplanes currently in flight using the OpenSky API, and we will upload it to our database. This is something we would have to do, for example, if we wanted to visualize how the positions of the planes evolve over time.

The script we will automate is the following:

    import requests
    import pandas as pd
    from sqlalchemy import create_engine

    # Extract the current state of all airplanes from the OpenSky API
    url = "https://opensky-network.org/api/states/all"
    response = requests.get(url)
    data = response.json()
    df = pd.DataFrame(data['states'])

    columns = [
        'icao24', 'callsign', 'origin_country', 'time_position', 'last_contact',
        'longitude', 'latitude', 'baro_altitude', 'on_ground', 'velocity', 'true_track',
        'vertical_rate', 'sensors', 'geo_altitude', 'squawk', 'spi', 'position_source'
    ]
    df.columns = columns

    # I keep only 20 rows so as not to exceed the Heroku free-tier limit
    df = df.iloc[0:20, :]

    host = 'XXX-XX-XXX-XXX-XXX.eu-west-1.compute.amazonaws.com'
    db = "xxxxxxxxxxxxxxx"
    user = "xxxxxxxxxxxxxxx"
    pasw = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

    # Load the data into the Postgres database
    engine = create_engine(f'postgresql://{user}:{pasw}@{host}:5432/{db}')
    df.to_sql('flights', engine, if_exists='append')

With our script ready, we can now see how to automate it on Google Cloud. Let's do it!

Setting up our function in Cloud Functions

Cloud Functions is a Google Cloud service that allows us to run Python functions in the cloud without setting up a server.

Besides, Cloud Functions offers two very interesting things:

  1. We can choose the region where the function will be deployed. Choosing a server location close to your users or your data gives you a shorter response time.
  2. The trigger of the function. We have different triggers available, from an HTTP endpoint to a Pub/Sub topic.

In our case, we will choose a Pub/Sub topic as the trigger, as it makes the automation easier and more secure. Let me explain: if we choose HTTP calls, we must decide whether the call will require credentials or not. Ideally, it should require credentials; otherwise, anyone could invoke it, and we do not want that.

The point is that, if we set the credentials as required, in order to call that endpoint we will have to download the credentials of a service account, pass them in the script… Although this is not difficult (in fact, we have done it in many posts: example1, example2), it is a bit tedious.

Instead, if we subscribe our function to a Pub/Sub topic, the authentication is handled internally, so the credentials are not necessary. Of course, when automating the function with Cloud Scheduler, we will have to send messages to that topic instead of making POST requests. Anyway, this is something we will see later.

In any case, this is how we should configure our function in Cloud Functions:

How to configure our function in Cloud Functions

Also, we should avoid exposing sensitive data in the script, such as the hostname, username, or password. To do this, Cloud Functions offers the possibility of creating environment variables, so that these values are not exposed in the code.

Besides, if any of these values changes, we will not have to redeploy the function; we will only have to update the variables.

How to add environment variables to a Cloud Functions function
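
For example, inside the function we could then read these values with os.environ instead of hardcoding them. Here is a minimal sketch (the variable names DB_HOST, DB_NAME, DB_USER, and DB_PASS are hypothetical; they must match whatever names you define in the configuration):

    import os

    from sqlalchemy import create_engine

    # Read the connection details from the environment variables defined
    # in the Cloud Functions configuration (the names here are hypothetical)
    host = os.environ['DB_HOST']
    db = os.environ['DB_NAME']
    user = os.environ['DB_USER']
    pasw = os.environ['DB_PASS']

    engine = create_engine(f'postgresql://{user}:{pasw}@{host}:5432/{db}')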

With this, Cloud Functions would already be set up. Now, let's see how to upload our Python function to Cloud Functions!

Uploading our Python script to Cloud Functions

Next, we will find a panel like the following:

Cloud Functions panel

First of all, we must choose the programming language that we are going to use. In our case, Python 3.7.

After that, the main.py file will appear. In this file, we must define the function that we want to execute. Here, it is important to highlight the following:

  • Pub/Sub will pass two arguments to the function: the event payload and the event metadata. Therefore, our function must accept two parameters, even if they are not used later.
  • For Cloud Functions to work properly, the entry point and the name of the function must match.

Also, to debug and see what has gone wrong, I would recommend returning status and error messages. In my case, I simply return an 'Ok' to check that everything went fine, as in the sketch below.
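
Putting it all together, a minimal main.py could look like this (a sketch: the extraction and upload code from the script above is elided, and main is simply the name I chose, which must match the entry point):

    def main(event, context):
        """Entry point triggered by a Pub/Sub message.

        Args:
            event: the Pub/Sub message payload (not used here).
            context: metadata about the triggering event (not used here).
        """
        try:
            # ... extraction and upload code from the script above ...
            return 'Ok'
        except Exception as e:
            return f'Error: {e}'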

Finally, we have to define the requirements of our script in the requirements.txt file. Here we must indicate which packages, and in which versions, our script needs in order to work.

In my case, the requirements are as follows:

    requests==2.22.0
    pandas==0.24.2
    sqlalchemy==1.3.5
    psycopg2==2.8.6

Once we have configured the function and its requirements, we simply have to deploy it. If the deployment succeeds, we will see a green check next to our function. However, even if the green check appears, it does not mean that your function is working correctly. To check that, you have to go to 'Testing' and click on 'Test the function'.
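
By the way, for a Pub/Sub-triggered function, the triggering event you paste in 'Testing' carries its data base64-encoded. If you ever need to build such a dummy event, here is a quick sketch (illustrative only, not part of the original setup):

    import base64
    import json

    # Build a dummy Pub/Sub-style event whose 'data' field is base64-encoded
    event = {'data': base64.b64encode(b'run').decode()}
    print(json.dumps(event))  # {"data": "cnVu"}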

If the output is what you expected… congratulations! You have just taken an important step in learning how to automate your Python script on Google Cloud. Let's now see how to automate it!

How to automate a Cloud Function with Cloud Scheduler

To automate the function, we must first create a job in Cloud Scheduler. To do so, we have to define how often the function will be called. This is defined in Unix cron format, which I already talked about in this post.

Basically, you have to indicate at which minute, hour, day of the month, month, and day of the week we want our function to run. For example, if we want it to run every minute, we should indicate * * * * *, while if we want it to run every hour, on the hour, we should indicate 0 * * * *.
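
For reference, a few common patterns:

    * * * * *       every minute
    0 * * * *       every hour, on the hour
    0 9 * * *       every day at 09:00
    0 9 * * 1       every Monday at 09:00
    */15 * * * *    every 15 minutes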

At first, it might seem a little "difficult" to fully understand how it works, so I would recommend using tools that let you check whether the format is correct or not. I, for example, use Crontabguru.

In my case, I will automate it to run every hour. Also, as I mentioned before, instead of making an HTTP call we will send a message to Pub/Sub. To do this, we must choose the topic to which our function is subscribed, which in my case was flights.

In addition, we will have to indicate the payload, that is, the content of the message. In our case, the function does not use it, so we can put whatever we want in it.

How to configure Cloud Scheduler to automate Cloud Functions

We save and… we have already learned to automate our Python script on Google Cloud!
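
By the way, under the hood Cloud Scheduler simply publishes a message to the topic, which is something we could also do ourselves from Python to test the whole pipeline without waiting for the schedule. Here is a sketch with the google-cloud-pubsub client (my-project is a placeholder for your Google Cloud project ID):

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    # 'my-project' is a placeholder; 'flights' is the topic our function is subscribed to
    topic_path = publisher.topic_path('my-project', 'flights')

    # The payload must be bytes; our function ignores its content anyway
    future = publisher.publish(topic_path, b'run')
    print(future.result())  # prints the ID of the published message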

Conclusions on automating our Python script on Google Cloud

As you can see, automating our Python script on Google Cloud is quite simple (much simpler than automating an R script on Google Cloud), but very, very useful.

In addition, in terms of costs, it is not expensive: Pub/Sub offers 10 GB of messages per month for free, Cloud Functions includes 2 million invocations per month, and the first three jobs in Cloud Scheduler are free. In short, for simple processes you will most likely not need to pay anything.

And despite being free (or very cheap), what you have learned today lets you take your Python scripts to the next level, whether you need to automate ETL processes or want to put an algorithm into production!

In any case, I hope that this post is useful in your day-to-day work and that, thanks to it, you can help your company achieve better results.

As always, if you have any suggestions for future posts, do not hesitate to answer the survey or write to me on LinkedIn. See you in the next one!