TensorFlow Serving with Docker on AWS#
In this tutorial, we will deploy a TensorFlow model on AWS and serve it with TensorFlow Serving in a Docker container.
The source code files for this tutorial are located in examples/4-tensorflow-serving-docker-aws/.
Deploying on Amazon EC2#
In this tutorial, we will take the containerised example TensorFlow model from the previous tutorial Serving TensorFlow Models in Docker and deploy it on Amazon EC2. To achieve this, we will
create an AWS Identity & Access Management (IAM) user with the required permission policies
install the AWS Command Line Interface (CLI)
upload our Docker image to Amazon Elastic Container Registry (ECR)
create a virtual server based on one of the Deep Learning AMIs on Amazon Elastic Cloud Compute (EC2)
pull the Docker image from ECR to the server instance and run it there
expose the server instance to the internet for answering HTTP prediction requests via TensorFlow Serving’s REST API
The Deep Learning AMIs are Amazon Machine Images with a range of machine learning frameworks and tools preinstalled, including TensorFlow, PyTorch and Apache MXNet.
In this tutorial, we will issue commands both from our host machine terminal and from the terminal of virtual servers running on EC2 in the Amazon cloud. To mark the difference, the commands for the local machine are prepended with $, while the commands to be run on the virtual server are prepended with (ec2) $.
Links & Resources
Amazon ECR: Using Amazon ECR with the AWS CLI
AWS Deep Learning Containers
Preparations#
Storage Encryption by Default#
In the EC2 management console, go to EC2 Dashboard > Account attributes > EBS encryption and set Always encrypt new EBS volumes to Enabled. If the alias aws/ebs does not work, go to the AWS Key Management Service (KMS) management console and copy over the proper ARN.
Creating an IAM User#
Either extend an existing user’s permissions or create a new IAM user with the following permission policies attached:
AmazonEC2ContainerRegistryFullAccess
AmazonECS_FullAccess
If it doesn’t yet exist, create an access key and note down the access key ID and the secret access key.
In the following, I will assume a user with the name tf-fmnist-ec2.
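If you prefer the command line, the same user can also be set up with the AWS CLI - a minimal sketch, assuming an already configured profile with administrator permissions:
# create the IAM user and attach the two permission policies
$ aws iam create-user --user-name tf-fmnist-ec2
$ aws iam attach-user-policy --user-name tf-fmnist-ec2 --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
$ aws iam attach-user-policy --user-name tf-fmnist-ec2 --policy-arn arn:aws:iam::aws:policy/AmazonECS_FullAccess
# create an access key; note down AccessKeyId and SecretAccessKey from the output
$ aws iam create-access-key --user-name tf-fmnist-ec2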
Installing the AWS CLI#
The installation instructions are given in the user guide in the AWS CLI documentation. Download the *.zip file, uncompress it, install it with
$ ./aws/install --install-dir /path/to/aws-installdir --bin-dir /path/to/aws-bin-dir
and add /path/to/aws-bin-dir to the PATH environment variable in .bashrc:
$ export PATH=$PATH:/path/to/aws-bin-dir
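For reference, the *.zip file for Linux x86_64 can be downloaded and uncompressed like this (the bundle URL is the one given in the AWS CLI user guide; other platforms use different bundles):
$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip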
In case the AWS CLI is already installed, check that it is up to date and if not, install the latest version as described here.
Run aws configure and add the access key created earlier as described here.
To add the user as the default profile, run
$ aws configure
Here, however, we will add the user tf-fmnist-ec2 under a specific profile with the same name:
$ aws configure --profile tf-fmnist-ec2
After adding the access key ID and the secret access key, you will be asked for a region (<AWS_REGION>) and a default output format - in my case eu-central-1 and json.
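The interactive prompts then look like this (with placeholder values for the keys):
$ aws configure --profile tf-fmnist-ec2
AWS Access Key ID [None]: <ACCESS_KEY_ID>
AWS Secret Access Key [None]: <SECRET_ACCESS_KEY>
Default region name [None]: eu-central-1
Default output format [None]: json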
A list of the regional endpoints and their names can be found here. Profiles with a specific name, i.e. the non-default profiles, can be used in aws commands like this:
$ aws s3 ls --profile <PROFILE>
The profile information is stored in ~/.aws/credentials and ~/.aws/config. More information on named profiles is given here.
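For illustration, the corresponding entry in ~/.aws/credentials uses the standard INI format (placeholder values):
[tf-fmnist-ec2]
aws_access_key_id = <ACCESS_KEY_ID>
aws_secret_access_key = <SECRET_ACCESS_KEY>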
Uploading the Docker Image to ECR#
On ECR, create a private or public repository named tf-serving-fmnist and enable Scan on push. The commands needed to push images to this repository can be displayed in ECR by selecting the repository and clicking on View push commands.
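Alternatively, the repository can be created from the command line - a sketch, assuming a private repository and the profile set up above:
$ aws ecr create-repository --repository-name tf-serving-fmnist --image-scanning-configuration scanOnPush=true --profile tf-fmnist-ec2 --region <AWS_REGION>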
Next, we have to authenticate Docker on our local machine to this repository:
$ aws ecr get-login-password --profile tf-fmnist-ec2 --region <AWS_REGION> | docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com
Tag the image and push it to the tf-serving-fmnist repository on ECR:
# list the images
$ docker images
Out:
REPOSITORY           TAG      IMAGE ID       CREATED        SIZE
tf-serving-fmnist    latest   f34fefc2ee4c   27 hours ago   411MB
tensorflow/serving   latest   e874bf5e4700   5 weeks ago    406MB
# tag the image
$ docker tag tf-serving-fmnist:latest <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tf-serving-fmnist:latest
# push the image
$ docker push <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tf-serving-fmnist:latest
Out:
The push refers to repository [<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tf-serving-fmnist]
75d124fb0170: Pushed
bb4423850a27: Pushed
b60ba33781cd: Pushed
547f89523b17: Pushed
bd91f28d5f3c: Pushed
8cafc6d2db45: Pushed
a5d4bacb0351: Pushed
5153e1acaabc: Pushed
latest: digest: sha256:8807f835d9beadfa22679630bcd75f9555695272245b11201bc39f9c8c55d6e0 size: 1991
The latest image from this ECR repository can be pulled with
$ docker pull <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tf-serving-fmnist:latest
as described here.
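As a quick check that the upload succeeded, the images in the repository can also be listed from the CLI:
$ aws ecr describe-images --repository-name tf-serving-fmnist --profile tf-fmnist-ec2 --region <AWS_REGION>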
Serving the Model on EC2#
Now, we are going to create an EC2 instance based on the Deep Learning Base AMI. In the EC2 management console, go to Network & Security > Key Pairs, create a key pair called tf-fmnist-ec2 and download it as a *.pem file. Then, go to Instances and click on Launch instances. In this tutorial, we are using the AWS Deep Learning AMI (Ubuntu 18.04) from the AWS Marketplace.
Next, we have to choose an instance type. I have been running on a t2.medium, which comes with 2 vCPUs, 4 GiB of memory and ‘low to moderate’ network performance, which is more than enough for testing. Also, it has storage on EBS, which can be configured to persist after an EC2 instance is terminated. To achieve this, when launching an EC2 instance, we have to deselect Delete on Termination in Step 4: Add Storage.
In Step 5: Add Tags, you can add a tag with
key: Name
value: tf-fmnist-ec2
In Step 6: Configure Security Group, create a new security group with the name tf-fmnist-ec2 and description SSH and TCP:8501 HTTP REST access and add the following rule:
type: custom TCP
protocol: TCP
port range: 8501
source: anywhere
description: tensorflow-serving predict REST API
As the description suggests, opening port 8501 is necessary to allow internet traffic requesting a prediction from the model via TensorFlow Serving’s REST API to reach the EC2 instance.
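For reference, the same ingress rule could also be added via the CLI - a hypothetical sketch, assuming the security group already exists in the default VPC and the configured profile has the necessary EC2 permissions:
$ aws ec2 authorize-security-group-ingress --group-name tf-fmnist-ec2 --protocol tcp --port 8501 --cidr 0.0.0.0/0 --region <AWS_REGION>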
After confirming the configuration in Step 7: Review Instance Launch by clicking on Launch, a dialog window will ask you to either select an existing key pair or create a new one. Here we can choose the existing key pair tf-fmnist-ec2. This concludes the instance configuration and will spin up the requested server.
However, as long as no internet gateway is configured in VPC, the instance cannot be reached via SSH or otherwise. Therefore, in the VPC management console go to Virtual Private Cloud > Internet Gateways, click on Create internet gateway and set the name tag to tf-fmnist-ec2.
Once it is created, attach it to the existing VPC that is used by the EC2 instance: Actions > Attach to VPC. Next, in the VPC management console, go to Virtual Private Cloud > Route Tables, select the appropriate one and check that in the Routes tab, there is a route with the newly created internet gateway as a target. If this is not the case, click on Edit routes and add a route with the following configuration:
destination: 0.0.0.0/0
target: the newly created internet gateway
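These gateway and routing steps can be sketched with the CLI as well (the IDs are placeholders for the values returned by the respective commands or shown in the consoles):
# create the internet gateway and attach it to the VPC of the EC2 instance
$ aws ec2 create-internet-gateway --region <AWS_REGION>
$ aws ec2 attach-internet-gateway --internet-gateway-id <IGW_ID> --vpc-id <VPC_ID> --region <AWS_REGION>
# add the default route pointing at the new gateway
$ aws ec2 create-route --route-table-id <ROUTE_TABLE_ID> --destination-cidr-block 0.0.0.0/0 --gateway-id <IGW_ID> --region <AWS_REGION>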
Now, if the instance has started and is marked as running, we can connect to it from the terminal via SSH as described here:
$ ssh -i /path/to/tf-fmnist-ec2.pem ubuntu@<Public_IPv4_DNS>
The command with the adequate IP address can be displayed in the EC2 management console by selecting the instance and clicking on Actions > Connect > SSH client.
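If SSH refuses to connect because the private key file is publicly viewable, restrict its permissions first:
$ chmod 400 /path/to/tf-fmnist-ec2.pem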
Note
In case of connectivity issues, be it reaching the instance via SSH or querying the deployed model via TensorFlow Serving, the VPC Reachability Analyzer is a very helpful tool. In the VPC management console, go to Reachability > Reachability Analyzer and click on Create and analyze path to configure an analysis. If the connection fails, the error message will give hints as to what should be fixed or set up differently.
When connected to the EC2 instance, update the AWS CLI as described here and then run aws configure
(ec2) $ aws configure --profile tf-fmnist-ec2
and specify the credentials of the tf-fmnist-ec2 IAM user.
We can then pull the Docker image from ECR as described here. But first, we have to authenticate to the ECR repository - otherwise we will get a no basic auth credentials error from Docker. Since we are not on a tty console, the aws ecr ... | docker login ... command from before will not work. Instead, we can do
(ec2) $ docker login -u AWS -p $(aws ecr get-login-password --profile tf-fmnist-ec2 --region <AWS_REGION>) <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com
as suggested here.
Now we can pull the image with
(ec2) $ docker pull <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tf-serving-fmnist:latest
Out:
latest: Pulling from tf-serving-fmnist
01bf7da0a88c: Pull complete
f3b4a5f15c7a: Pull complete
57ffbe87baa1: Pull complete
e72e6208e893: Pull complete
6ea3f464ef73: Pull complete
01e9bf86544b: Pull complete
68f6bba3dc50: Pull complete
dafe84328936: Pull complete
Digest: sha256:8807f835d9beadfa22679630bcd75f9555695272245b11201bc39f9c8c55d6e0
Status: Downloaded newer image for <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tf-serving-fmnist:latest
<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tf-serving-fmnist:latest
show the list of all images with
(ec2) $ docker images
Out:
REPOSITORY TAG IMAGE ID CREATED SIZE
<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tf-serving-fmnist latest f34fefc2ee4c 2 days ago 411MB
and run it with the same command already used in the previous tutorial Serving TensorFlow Models in Docker:
(ec2) $ docker run -p 8501:8501 -e MODEL_NAME=fmnist-model -t --name tf-serving-fmnist <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tf-serving-fmnist
Out:
[...]
2021-06-26 09:12:09.808270: I tensorflow_serving/model_servers/server.cc:414] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
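Before querying from outside, it is worth checking on the instance itself that the server answers - for example via the model status endpoint of TensorFlow Serving’s REST API (from a second SSH session, since the container runs in the foreground):
(ec2) $ curl http://localhost:8501/v1/models/fmnist-model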
Now we can query the deployed model with an HTTP request to the REST API like in the last tutorial /examples/3-tensorflow-serving-docker/code. For this, the script 1-request.py is provided in examples/4-tf-serving-docker-aws/. In it, set the IPv4 variable to the appropriate address for your EC2 instance. Download a few example images of different items of clothing from the internet and include their filenames in the script.
import json
from typing import List, Union

import numpy as np
import requests
import tensorflow as tf

def load_images(images: Union[str, List[str]]):
    if isinstance(images, str):
        images = [images]
    array = np.empty(shape=(len(images), 28, 28, 1))
    for i, image in enumerate(images):
        img = tf.keras.preprocessing.image.load_img(
            path=image,
            color_mode='grayscale',
            target_size=(28, 28))
        array[i] = tf.keras.preprocessing.image.img_to_array(img=img)
    # invert and rescale to [0, 1] to match the Fashion-MNIST input format
    return (255 - array) / 255

tshirts = [
    'images/test/tshirt-1.jpg', 'images/test/tshirt-2.png',
    'images/test/tshirt-3.jpg', 'images/test/tshirt-4.png'
]
sandals = [f'images/test/sandal-{i}.jpg' for i in range(1, 5)]
sneakers = [f'images/test/sneaker-{i}.jpg' for i in range(1, 5)]

# test_images = load_images(tshirts)
test_images = load_images(sandals)
# test_images = load_images(sneakers)

IPv4 = 'ec2-54-93-96-215.eu-central-1.compute.amazonaws.com'
# IPv4 = '54.93.96.215'

data = json.dumps({
    'signature_name': 'serving_default',
    'instances': test_images.tolist()  # *either* 'instances'
    # 'inputs': test_images.tolist()   # *or* 'inputs'
})
headers = {'content-type': 'application/json'}
json_response = requests.post(
    f'http://{IPv4}:8501/v1/models/fmnist-model:predict',
    headers=headers,
    data=data
)
predictions = json.loads(json_response.text)['predictions']  # for 'instances'
# predictions = json.loads(json_response.text)['outputs']    # for 'inputs'

for i, prediction in enumerate(predictions):
    print('prediction {}: {} -> predicted class: {}'.format(
        i, prediction, np.argmax(prediction)))
Running the script on a few example pictures of sandals yields
$ python 1-request.py
Out:
prediction 0: [6.30343275e-05, 1.85037115e-05, 7.5565149e-06, 2.83105837e-05, 1.74145697e-07, 0.998879, 3.21875377e-05, 2.88182633e-07, 2.93241665e-05, 0.000941563747] -> predicted class: 5
prediction 1: [0.00130418374, 2.53213402e-05, 3.95829147e-06, 7.95430169e-05, 1.08047243e-06, 0.975131869, 0.000146661841, 0.0195651725, 2.74548784e-05, 0.00371479546] -> predicted class: 5
prediction 2: [0.000241088521, 3.73763069e-05, 1.56952246e-05, 0.000210506681, 5.54936341e-05, 0.992369056, 9.06738e-05, 0.00564925186, 0.00128137472, 4.94648739e-05] -> predicted class: 5
prediction 3: [9.74363502e-06, 7.06914216e-06, 6.15942408e-05, 0.00042412043, 0.000195012341, 0.910350859, 1.11019081e-05, 0.0710703135, 0.0176780988, 0.000191997562] -> predicted class: 5
so all example images were correctly classified as sandals.