Serving a Python Container in Production
Build, deploy and predict
The code for this example is available on GitHub.
Goals
The world of ML is vast, and there are many tools to bring together if you want to maximize value. Going from research to production is not trivial: there are many things to consider and many decisions to make. I often come across customers who use different frameworks such as scikit-learn, PyTorch, and TensorFlow; several use more than one, and some even want to serve legacy Python models.
In this post we will go through the steps necessary to build and deploy containers to AI Platform’s models product. It aims to provide a starting point for someone with a basic model who wants to set up a framework-agnostic serving layer that is secure, scalable, and backed by an SLA. We will make use of AI Platform’s ability to wrap arbitrary code and serve it.
Non-Goals
Containers are an incredible tool, but I won’t be covering them in depth. For the purposes of this tutorial their main benefits are ease of deployment and the framework independence they provide.
Why use AI Platform endpoints?
An organization should focus on its core business, and building a robust, scalable serving layer is rarely the business objective. You could design and build a capable serving layer from custom components such as a load balancer and an instance group of VMs, but GCP’s AI Platform lets us accelerate development by drawing on ancillary services such as data preparation, a metadata layer, and a feature store. It can significantly shorten your deployment cycle and reduce maintenance overhead.
Advantages
- Framework agnostic — serve models from any framework, or plain Python code
- Managed infrastructure
- GPU support
- Scalability through autoscaling
- Security — access is controlled through IAM
- Backed by an SLA
- Real-time and batch requests
- Ease of development
- Short time to deployment
- gRPC support
Architecture
Working with custom containers
Our end goal is to have a service behind an endpoint that we can use for real-time and batch requests. Even though AI Platform can natively serve frameworks such as TensorFlow, PyTorch, and scikit-learn, containers have become the de facto standard at every step of the MLOps pipeline. You can choose from a wide list of prebuilt images available on AI Platform.
Preparing the container
For us to work with AI Platform’s serving layer we must upload a container image to an Artifact Registry repository hosted in the same project. Make sure that the service account used has the proper permissions to read the container (learn more about service accounts). When building the container we need to follow AI Platform’s requirements.
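If you haven’t set up a repository yet, a rough sketch of the setup looks like this (the repository name, region, and the AI Platform service-agent address are assumptions; adapt them to your project):

```sh
# Create a Docker repository in Artifact Registry (name and region are placeholders).
gcloud artifacts repositories create my-models \
  --repository-format=docker \
  --location=us-central1

# Let the AI Platform service agent pull images from the registry.
# The service-agent address below is an assumption; look up the exact
# account under "IAM & Admin" in your project.
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-${PROJECT_NUMBER}@cloud-ml.google.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.reader"
```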
First we create a Dockerfile. There is no need to reinvent the wheel: there are many base images we can build on top of. Since we’re using Python and need a running server, I chose to create a basic Flask image.
Create a Dockerfile:
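A minimal sketch, assuming Python 3.9 and port 8080 (both arbitrary choices); the requirements.txt and app.py files are shown in the next two snippets:

```Dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the serving code.
COPY app.py .

# AI Platform will send prediction traffic to this port.
EXPOSE 8080
CMD ["python", "app.py"]
```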
And from here we see that we need to add a requirements.txt file containing a single line with the Flask dependency:
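```
flask
```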
And the app.py, which can be replaced with any other code, such as serving a TensorFlow model:
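A minimal sketch of app.py, assuming AI Platform’s custom-container contract of a health route plus a predict route that exchanges JSON under an `instances` key; the routes and port must match what the version is configured with later:

```python
# app.py -- minimal Flask server exposing a health check and a predict route.
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/health", methods=["GET"])
def health():
    # AI Platform probes this route to decide whether the container is ready.
    return jsonify({"status": "healthy"})


@app.route("/predict", methods=["POST"])
def predict():
    # Requests arrive as JSON of the form {"instances": [...]}.
    instances = request.get_json(force=True).get("instances", [])
    # Replace this echo with real inference, e.g. a call into a TensorFlow model.
    predictions = [{"echo": instance} for instance in instances]
    return jsonify({"predictions": predictions})


if __name__ == "__main__":
    # Listen on all interfaces on the port exposed in the Dockerfile.
    app.run(host="0.0.0.0", port=8080)
```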
Steps
Now we will put the container to use. We will run the following steps from an AI Platform managed notebook.
1. Build the Docker image:
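For example (the image URI is a placeholder built from an assumed project, repository, and tag):

```sh
# Tag the image for Artifact Registry so it can be pushed in the next step.
export IMAGE_URI=us-central1-docker.pkg.dev/${PROJECT_ID}/my-models/flask-server:v1
docker build -t $IMAGE_URI .
```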
2. Push the image to an Artifact Registry repository:
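Something along these lines, assuming the same region as above:

```sh
# Let Docker authenticate against Artifact Registry, then push the image.
gcloud auth configure-docker us-central1-docker.pkg.dev
docker push $IMAGE_URI
```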
3. Create a model:
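A sketch using an assumed model name and region (custom containers are served from regional endpoints):

```sh
gcloud ai-platform models create flask_model \
  --region=us-central1
```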
4. Upload a new version to the model from step #3:
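Roughly as follows; the flags mirror AI Platform’s custom-container prediction guide, but this surface was in beta, so verify them against your gcloud release:

```sh
gcloud beta ai-platform versions create v1 \
  --model=flask_model \
  --region=us-central1 \
  --image=$IMAGE_URI \
  --ports=8080 \
  --health-route=/health \
  --predict-route=/predict \
  --machine-type=n1-standard-4
```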
5. Set the version as default:
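For example:

```sh
gcloud ai-platform versions set-default v1 \
  --model=flask_model \
  --region=us-central1
```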
6. Test the endpoint:
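A quick smoke test with an assumed request file (instances.json containing a JSON body of the form {"instances": [...]}):

```sh
# Send a JSON request through the endpoint and print the predictions.
gcloud ai-platform predict \
  --model=flask_model \
  --version=v1 \
  --region=us-central1 \
  --json-request=instances.json
```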
7. Rinse and repeat.
Conclusion
Unless there are constraints that force you to deploy at the edge, the advantages of AI Platform and the ease of use of containers outweigh the benefits of building a custom serving layer.
Next steps
Follow this guide to serve PyTorch models.