Serving a Python Container in Production

Build, deploy and predict

Jonathan Yarkoni
Apr 2, 2021 · 4 min read

The code for this example is available on GitHub.

Goals

The world of ML is vast. There are many different tools to combine if you want to maximize value, and going from research to production is not trivial: there are many things to consider and many decisions to make. I often come across customers who use different frameworks such as scikit-learn, PyTorch and TensorFlow. Several customers use multiple frameworks, and some even want to serve legacy Python models.

In this post we will go through all of the steps necessary to build and deploy containers to AI Platform’s “models” product. This blog post aims to provide a starting point for someone with a basic model who is looking to set up a framework-agnostic serving layer that is secure, scalable and backed by an SLA. We will be making use of AI Platform’s ability to wrap any code and serve it.

Non-Goals

Containers are an incredible tool, but I won’t be covering them in depth. For the purposes of this tutorial the main benefits are ease of deployment and the framework agnosticism they provide.

Why use AI Platform endpoints?

An organization should focus on its core business, and it is rarely the case that creating a robust, scalable serving layer is the business objective. You could design and build a capable serving layer from custom components such as a load balancer and an instance group of VMs. GCP’s AI Platform allows us to accelerate development by making use of ancillary services such as data preparation, a metadata layer, a feature store, etc. It can significantly shorten your deployment cycle and reduce maintenance overhead.

Advantages

  • Framework agnostic
  • Infrastructure
  • GPUs
  • Scalability
  • Security
  • SLA
  • Real-time / batch requests
  • Ease of development
  • Time to deployment
  • gRPC

Architecture

Working with custom containers

Our end goal is to have a service available behind an endpoint that we can use to make real-time and batch requests. Even though AI Platform is capable of natively handling serving for frameworks such as TensorFlow, PyTorch and scikit-learn, containers have become the de facto standard at every step of the MLOps pipeline. You can choose from a wide list of prebuilt images available on AI Platform.

Preparing the container

For us to work with AI Platform’s serving layer we must upload a container to an Artifact Registry repository hosted in the same project. Make sure that the service account used has the proper permissions to read the container (learn more about service accounts). When building the container we need to follow AI Platform’s requirements.
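As a sketch, granting that read permission could look something like the command below. The service agent address and role are assumptions based on the default AI Platform service account, so adjust them to whatever account actually pulls the image in your project:

```sh
# Hypothetical example: grant Artifact Registry read access to the AI Platform
# service agent (replace PROJECT_ID / PROJECT_NUMBER with your own values)
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:service-PROJECT_NUMBER@cloud-ml.google.com.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.reader"
```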

First we create a Dockerfile. There is no need to reinvent the wheel: there are many base images we can build on top of. Since we’re using Python and need a running server, I chose to create a basic Flask image.

Create a Dockerfile:
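A minimal sketch, assuming the serving code lives in app.py and listens on port 8080 (both are placeholders we define below):

```dockerfile
# Small Python base image
FROM python:3.8-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the serving code
COPY app.py .

# The port our Flask server listens on
EXPOSE 8080

CMD ["python", "app.py"]
```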

From the Dockerfile we can see that we need to add a requirements.txt file containing a single line with the Flask dependency:
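```text
flask
```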

And finally app.py, which can be replaced with any other serving code, such as a TensorFlow model:
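A minimal sketch of such an app. The /predict and /health routes and port 8080 are choices made here and wired up again when the version is created, and the prediction logic is just a placeholder echo:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/health", methods=["GET"])
def health():
    # Health-check route so the platform can verify the container is serving
    return jsonify({"status": "healthy"})


@app.route("/predict", methods=["POST"])
def predict():
    # Prediction requests arrive as {"instances": [...]}
    instances = request.get_json(force=True).get("instances", [])
    # Placeholder "model": echo each instance back as the prediction
    predictions = [{"echo": instance} for instance in instances]
    return jsonify({"predictions": predictions})


if __name__ == "__main__":
    # Listen on all interfaces, on the port exposed in the Dockerfile
    app.run(host="0.0.0.0", port=8080)
```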

Steps

Now we will use the container. We will run the steps from an AI Platform managed notebook.

1. Build the Docker image:
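For example, with placeholder values for the region, project and repository:

```sh
# Placeholder image URI -- replace the region, project and repository with your own
export IMAGE_URI=us-central1-docker.pkg.dev/PROJECT_ID/my-repo/flask-server:v1

docker build -t $IMAGE_URI .
```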

2. Push the image to the Artifact Registry repository:
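Assuming the same placeholder image URI from step 1:

```sh
# One-time setup: let Docker authenticate against Artifact Registry in this region
gcloud auth configure-docker us-central1-docker.pkg.dev

docker push $IMAGE_URI
```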

3. Create a model:
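Something like the following, where the model name and region are placeholders (the beta command track is used here since custom containers and regional endpoints were in beta at the time of writing):

```sh
gcloud beta ai-platform models create flask_model \
  --region=us-central1
```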

4. Upload a new version to the model from step 3:
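A sketch of the version creation call. The port and route flags must match what the container actually serves, and the machine type is just an example:

```sh
gcloud beta ai-platform versions create v1 \
  --model=flask_model \
  --region=us-central1 \
  --image=$IMAGE_URI \
  --ports=8080 \
  --predict-route=/predict \
  --health-route=/health \
  --machine-type=n1-standard-2
```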

5. Set the version as default:
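Using the same placeholder names:

```sh
gcloud beta ai-platform versions set-default v1 \
  --model=flask_model \
  --region=us-central1
```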

6. Test the endpoint:
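One way to test it is to call the regional prediction endpoint directly. The URL below assumes the placeholder project, model and version names used above:

```sh
# request.json holds the payload, e.g. {"instances": [[1, 2, 3]]}
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @request.json \
  "https://us-central1-ml.googleapis.com/v1/projects/PROJECT_ID/models/flask_model/versions/v1:predict"
```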

7. Rinse and repeat.

Conclusion

Unless there are constraints that force you to deploy on the edge, the advantages of AI Platform and the ease of use of containers outweigh the benefits of building out a custom serving layer.

Next steps

Follow this guide to serve PyTorch models.
