Lab Assignment (Fall 2024)




This lab covers the deployment of a microservice application in a Kubernetes cluster, as well as the management of this application and of the cluster. It includes a set of mandatory steps as well as additional steps that allow you to extend the work in different directions.


Important information


Collaboration and plagiarism

You are encouraged to discuss ideas and problems related to this project with the other students. You can also look for additional resources on the Internet. However, we take cheating and plagiarism very seriously. Hence, if any part of your final submission reflects influences from external sources, you must cite these sources in your report. Also, every part of your design, your implementation, and your report should come from you and not from other students. In case of plagiarism, your submission will not be graded and appropriate actions will be taken.

If you store your work in a Git repository on a platform such as GitHub or GitLab, your files must not be publicly accessible.


Your submission

How to submit? See here

Before the deadline of this lab assignment, you should submit in a single email:

The title of the email must be: [M2 Mosig Cloud] lab submission. The body of the email must contain the name(s) of the student(s).

Please find below the main instructions regarding each part of the submission.

The report

The report must be in Markdown (md) or PDF format. (Other formats will be rejected.)

It must include the following information:

Your report can also include a description of the steps that you tried but did not manage to complete. In this case, describe:

Your code

To submit your code, you can:

Your code should include minimal documentation (a README) describing:

Although the documentation can be short, it should be sufficiently clear/detailed to allow the teachers to reproduce the tests/experiments that you have done.



Overview

Main instructions

The lab is divided into 3 parts:

Doing at least the base steps is required to obtain a passing grade. Doing the base and advanced steps is required to get a good grade. To get a very good grade, you need to have investigated at least some bonus steps.

Recommendations


The “Online Boutique” demo application

This lab work targets a microservice demo application developed by Google and named “Online Boutique”. As the name suggests, this is a mock application that simulates an online shop. It is composed of 11 microservices. For most of these services, the implementation is very simplified (compared to the expected features of a realistic application). Nonetheless, it is sufficient to illustrate the structure and operation of a reasonably complex cloud-native application.

The demo application is available from its GitHub repository. The main documentation can be found in the top-level README file and in the docs folder.

Note: This demo application used to be named “Hipster Shop”. Some documents and code/configuration files still use this name.



Base steps [Mandatory]

Deploying the original application in GKE

Follow the guidelines from the documentation to deploy the demo application on GKE. Check that the application is working correctly via the following means:

Note the following points:


Analyzing the provided configuration

To better understand how the deployed application is configured, we would like you to analyze the configuration file that is used for deployment.

In this step, we ask you to select one service (the choice is yours) and to explain in your report the purpose of the different parameters that are used to configure this service in the file kubernetes-manifests.yaml.
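
For reference, each service in kubernetes-manifests.yaml is described by a Deployment (how to run the pods) and a Service (how to reach them). The excerpt below is an illustrative sketch of the kinds of parameters you will have to explain; names and values are simplified, not copied verbatim from the file:

```yaml
apiVersion: apps/v1
kind: Deployment               # manages a set of identical pods
metadata:
  name: currencyservice
spec:
  replicas: 1                  # number of pod replicas to run
  selector:
    matchLabels:
      app: currencyservice     # which pods this Deployment manages
  template:
    metadata:
      labels:
        app: currencyservice
    spec:
      containers:
      - name: server
        image: currencyservice:v1   # container image to run
        ports:
        - containerPort: 7000       # port the process listens on
        env:
        - name: PORT                # configuration passed via
          value: "7000"             # environment variables
        resources:
          requests:                 # resources reserved for the pod
            cpu: 100m
            memory: 64Mi
          limits:                   # hard caps enforced by Kubernetes
            cpu: 200m
            memory: 128Mi
        readinessProbe:             # when the pod may receive traffic
          grpc:
            port: 7000
---
apiVersion: v1
kind: Service                  # stable virtual IP for the pods
metadata:
  name: currencyservice
spec:
  type: ClusterIP              # reachable only from inside the cluster
  selector:
    app: currencyservice       # traffic goes to pods with this label
  ports:
  - port: 7000
    targetPort: 7000
```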


Deploying the load generator on a local machine

To test your application, it would be better to deploy the load generator on a machine that is outside of the Kubernetes cluster (so that the load generator does not consume the resources of the cluster).

As an intermediate step, we ask you to manually deploy the load generator on a local machine – either:

To do so, the best solution is to deploy the load generator as a Docker container. Several resources can help you in this task:

If you are unfamiliar with Docker, we suggest that you start by learning the basics. Here are some resources that might help you (many more resources are available on the Web):
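
Once you are comfortable with the basics, running the load generator locally boils down to building and starting its container. The sketch below assumes the image and environment variables used by the demo repository (FRONTEND_ADDR, USERS); check the repository's README in case they have changed:

```bash
# Build the load generator image from the demo repository
cd microservices-demo/src/loadgenerator
docker build -t loadgenerator .

# Run it against the external IP of the frontend,
# simulating 10 concurrent users
docker run --rm \
  -e FRONTEND_ADDR=<FRONTEND_EXTERNAL_IP>:80 \
  -e USERS=10 \
  loadgenerator
```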


Deploying the load generator automatically in Google Cloud

For a more convenient and realistic way of testing your application, it would be better to deploy the load generator in the cloud but outside of the GKE cluster where the online boutique runs.

In this step, we ask you to deploy the load generator in a virtual machine (VM) that you will have reserved and configured on Google Cloud (the VM service of Google Cloud is named Google Compute Engine - GCE). To simplify the deployment of the load generator, you should re-use the work done in the previous step and take advantage of Docker.

Reserving and configuring a VM manually is a first option for deploying the load generator. However, this is an operation that you will probably have to execute multiple times to run tests. Being able to do it automatically would be a better option.

Hence, we ask you to adopt an infrastructure-as-code approach, where you write code that performs all these operations automatically instead of executing them manually. Different tools can be used for this purpose; some of the most popular are:

Note that Terraform can also be used for resource configuration.

Here are some simple examples that can help you in the design of the solution for this step:
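
For instance, with Terraform, reserving a VM that starts the load generator at boot can look like the following sketch (the project, zone, image, and registry path are placeholders to adapt; this is one possible design, not the required one):

```hcl
provider "google" {
  project = "<your-gcp-project>"
  region  = "europe-west1"
}

resource "google_compute_instance" "loadgenerator" {
  name         = "loadgenerator-vm"
  machine_type = "e2-small"
  zone         = "europe-west1-b"

  boot_disk {
    initialize_params {
      # An image with Docker preinstalled simplifies the setup
      image = "cos-cloud/cos-stable"
    }
  }

  network_interface {
    network = "default"
    access_config {}   # gives the VM an external IP
  }

  # Start the load generator container at boot
  metadata_startup_script = <<-EOT
    docker run -d \
      -e FRONTEND_ADDR=<FRONTEND_EXTERNAL_IP>:80 \
      -e USERS=10 \
      <your-registry>/loadgenerator
  EOT
}
```

With this in place, `terraform apply` creates the VM and `terraform destroy` releases it, which makes repeated test campaigns cheap to set up and tear down.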



Advanced steps [Mandatory]

Monitoring the application and the infrastructure

When you deploy a GKE cluster, some monitoring tools are already deployed for you, and you can already observe some monitoring data through dashboards accessible in the Google Cloud console.

Still, it can be important to deploy your own monitoring infrastructure. Hence, in this step, we ask you to deploy your own monitoring stack in your Kubernetes cluster. The stack should be based on:

- Prometheus, for collecting and storing the metrics
- Grafana, for visualizing them

Both components should be deployed inside your Kubernetes cluster.

Your monitoring infrastructure should collect at least the following information:

You should provide already configured Grafana dashboards that allow visualizing these data.

Note that collecting information at the node and at the pod levels requires deploying exporters. More specifically, in this case, the node exporter and cAdvisor can be good options.
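
One convenient starting point (a suggestion on our part, not an imposed method) is the community kube-prometheus-stack Helm chart, which bundles Prometheus, Grafana, the node exporter, and default scraping of the kubelet/cAdvisor metrics:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Installs Prometheus, Grafana, exporters, and default dashboards
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```

You can also deploy and configure Prometheus and Grafana by hand with your own manifests; either way, make sure you understand what is scraped and where each metric comes from.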

Suggestions to further extend the work on monitoring are described here.


Performance evaluation

The load generator that comes with the application allows you to evaluate the performance of your application. While injecting load into the application, Locust collects metrics such as the response time and the number of failed requests for different types of requests.
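
As an illustration, when invoking Locust directly, a test run can be parameterized and its statistics exported as follows (a sketch: the locustfile is the one shipped with the demo's load generator, and the host address is a placeholder):

```bash
# 5-minute headless run with 50 simulated users, ramping up 5 users/s;
# --csv writes the collected statistics to CSV files for later analysis
locust -f locustfile.py --headless \
  --host http://<FRONTEND_EXTERNAL_IP> \
  -u 50 -r 5 --run-time 5m --csv results/run_u50
```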

In this part, we ask you to conduct a performance evaluation of your application running in your Kubernetes cluster.

Here are our main recommendations:

Suggestions to further extend the work on performance evaluation are described here.


Canary releases

In this step, the goal is to support canary releases of the microservices, a useful technique for achieving continuous deployment. More precisely, we want to test and deploy a new version v2 of a microservice while still using the current (stable) version v1 of the service for a majority of the production requests. For simplicity, in this lab, we will apply this technique to only a single microservice within our software architecture, and we will only consider this canary technique in the case of a stateless microservice. The choice of the microservice to consider for this exercise is left to you.

Some of the links below might be helpful for this part:

For this part, the first thing you have to do is modify the code of the microservice to create a new version (v2). For simplicity, you can keep the code very similar to that of the first version and simply modify a text string or value displayed to the end user (so that the change of version can be easily noticed when testing).

Then, you are asked to implement and test the following steps:
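
Whatever the exact steps, a classic way to realize the traffic split with plain Kubernetes objects is sketched below (assuming, purely as an example, that the frontend service was chosen): two Deployments run side by side, and the Service selects on the app label only, so requests are balanced across both versions in proportion to their replica counts (here roughly 90/10).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-v1
spec:
  replicas: 9                  # ~90% of the traffic
  selector:
    matchLabels: {app: frontend, version: v1}
  template:
    metadata:
      labels: {app: frontend, version: v1}
    spec:
      containers:
      - name: server
        image: frontend:v1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-v2
spec:
  replicas: 1                  # ~10% of the traffic (the canary)
  selector:
    matchLabels: {app: frontend, version: v2}
  template:
    metadata:
      labels: {app: frontend, version: v2}
    spec:
      containers:
      - name: server
        image: frontend:v2
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend              # matches both v1 and v2 pods
  ports:
  - port: 80
    targetPort: 8080
```

Note that finer-grained splits (e.g., 99/1) are impractical with replica counts alone; they require weighted routing, as offered by a service mesh or some ingress controllers.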

Suggestions to further extend the work on canary releases are described here.



Bonus steps

Monitoring the application and the infrastructure [Bonus]

These steps extend the advanced task: Monitoring the application and the infrastructure.

Collecting more specific metrics

In addition to general metrics about resource consumption, dedicated exporters allow you to collect more specific metrics related to some components of your application. For instance, one could:

You could even envision writing your own exporter to collect application-specific metrics.

In this part, we suggest that you dig into this topic.
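
As a concrete example, the demo application uses Redis as the cart service's backing store, so a dedicated Redis exporter can be deployed next to it. The sketch below assumes the commonly used oliver006/redis_exporter image and the demo's redis-cart service name, plus annotation-based scraping on the Prometheus side:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
spec:
  replicas: 1
  selector:
    matchLabels: {app: redis-exporter}
  template:
    metadata:
      labels: {app: redis-exporter}
      annotations:
        prometheus.io/scrape: "true"   # only honored if Prometheus is
        prometheus.io/port: "9121"     # set up for annotation-based discovery
    spec:
      containers:
      - name: exporter
        image: oliver006/redis_exporter
        args: ["--redis.addr=redis://redis-cart:6379"]
        ports:
        - containerPort: 9121
```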

Raising alerts

Prometheus allows users to configure alerts on some events, and to send these alerts through different means.

We suggest that you look at how this works, and that you configure some alerts for your deployed application.
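
As an illustration, an alerting rule is a small piece of YAML evaluated by Prometheus; in the sketch below, the expression and threshold are arbitrary examples:

```yaml
groups:
- name: demo-alerts
  rules:
  - alert: NodeHighCpu
    # Fires when average CPU usage on a node exceeds 90%
    # for more than 5 minutes
    expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
```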


Performance evaluation [Bonus]

This step extends the advanced task: Performance evaluation.

Identifying bottlenecks

When evaluating the performance of your application, you should be able to identify the maximum performance of your application for a default deployment configuration. The maximum performance is reached when adding more clients does not translate into processing more requests or, put differently, when adding more clients results in significantly higher response times.

When the maximum performance is reached, it means that there are one or several bottlenecks. A bottleneck is a hardware or software component that is saturated and prevents the performance from going higher.

We suggest that you take advantage of the monitoring infrastructure you deployed (Step Monitoring the application and the infrastructure) to identify the bottleneck(s) in your deployed application.
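
As an example of what to look at, queries along the following lines (assuming node exporter and cAdvisor metrics are scraped, as in the monitoring step) expose CPU saturation at the node and pod levels:

```
# Per-node CPU utilization (1 = fully saturated)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Per-pod CPU usage, to spot the saturated microservice
sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))
```

Correlating such graphs with the Locust results (the load level at which response times degrade) is usually enough to point at the saturated component.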


Autoscaling [Bonus]

Autoscaling techniques are used to adapt the amount of resources allocated to an application/service/cluster to the workload. In Kubernetes, different scaling mechanisms are available:

We suggest that you try to apply autoscaling strategies to improve the performance of your application. This task involves:
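
For instance, horizontal pod autoscaling for one of the microservices can be declared as follows (a sketch; the target deployment and thresholds are arbitrary examples):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas above 70% average CPU
```

Keep in mind that the HPA computes utilization relative to the pods' CPU requests, so requests must be set in the manifests (the provided ones do set them).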


Optimizing the cost of your deployment [Bonus]

Any resource used on a public cloud infrastructure has a cost. The fewer resources you use, the less you pay. In this step, we suggest that you study how to minimize resource usage for your deployed application. Different aspects can be considered here. Here are some directions you could explore:

Resource consumption should always be analyzed with respect to performance to draw meaningful conclusions. Consuming a large amount of resources to process a high number of requests per second can make sense, while consuming the same amount of resources to process a few requests per second might not be satisfactory.

Performance may also depend on which resources are allocated (with the same budget, I could allocate several small VMs or a few large VMs), and to which part of the system the resources are allocated (I could allocate more resources to one service or to another).

Run tests and present the results in your report, as well as the conclusions you draw from these results. Consider cost metrics to analyze your results (e.g., dollars per hour, dollars per 1000 requests, etc.).
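
To make this concrete with made-up numbers: a deployment costing $0.30 per hour that sustains 50 requests per second processes 180,000 requests per hour, i.e., about $0.0017 per 1,000 requests; a cheaper configuration at $0.20 per hour that only sustains 20 requests per second actually costs more per request (about $0.0028 per 1,000 requests).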


Canary releases [Bonus]

This step extends the advanced task: Canary releases.

We now consider a version v3 of the microservice, and we assume that this version is defective. The defect could be a bug or a performance issue. Here, for simplicity, v3 will be identical to v2 except that it introduces a significant additional (artificial) delay in the processing of each request (e.g., 3 seconds per request).

Set up a configuration that automatically tests a canary release and rolls it back (removes it) when it is found to be defective.
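
One simple way to structure this is a script that observes the canary for a while and removes it when a health criterion is violated. The sketch below is built entirely on assumptions to adapt: the Prometheus address, the metric names, the 1-second threshold, and the deployment name are all placeholders.

```bash
#!/bin/bash
# Watch the canary for 5 minutes; roll it back if the average
# response time observed by Prometheus exceeds 1 second.
PROM=http://<PROMETHEUS_ADDR>:9090
QUERY='avg(rate(request_duration_seconds_sum{version="v3"}[1m]) / rate(request_duration_seconds_count{version="v3"}[1m]))'

for i in $(seq 1 10); do
  sleep 30
  latency=$(curl -s "$PROM/api/v1/query" --data-urlencode "query=$QUERY" \
            | jq -r '.data.result[0].value[1]')
  if (( $(echo "$latency > 1.0" | bc -l) )); then
    echo "Canary unhealthy (latency=${latency}s), rolling back"
    kubectl delete deployment frontend-v3
    exit 1
  fi
done
echo "Canary looks healthy"
```

More integrated solutions exist (e.g., progressive delivery controllers such as Argo Rollouts or Flagger), which you are welcome to explore instead.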


Managing a storage backend for logging orders [Bonus]

The goal of this step is to extend the initial architecture of the “online boutique” web application: we will add a service that will store a log of all the orders placed by the customers of the shop.

Introducing the log

To introduce the log service and integrate it into the application, there are several things to do:

Notes:

Using a storage backend for the log

Now the goal is to plug a real storage backend behind the OrderLog microservice in order to store the log: in this design, OrderLog is a stateless entry point that hides the implementation of the storage backend and forwards the requests to the (stateful) backend.

You are free to choose the type of storage backend and the corresponding implementation. We recommend using a database and we suggest some typical examples:

In any case, we advise you to use a simple data model for storing the order details (advanced data management is outside the scope of this course). Similarly, you are not required to deal with database transactional guarantees.

Another important design question is the management of the storage backend. There are three main ways to architect the storage backend.

You are free to choose any of the three approaches listed above. Note however that:

Important note: Checking the correct operation of your log will require the ability to query the backend. You are not obliged to extend the implementation of the application for this purpose (i.e., to introduce a graphical/web interface to display the contents of the log); this would of course be more realistic but also time consuming. Instead, you can simply write a small script or connect to a web interface of the database to query it manually.
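
For example, with a PostgreSQL backend running in the cluster, a manual check can be as simple as the following (the deployment, database, and table names are assumptions matching a hypothetical setup):

```bash
# Open a psql session inside the database pod and list recent orders
kubectl exec -it deploy/orderlog-db -- \
  psql -U postgres -d orders \
  -c "SELECT order_id, user_id, total FROM order_log ORDER BY created_at DESC LIMIT 10;"
```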

Making the log persistent

If this aspect has not already been addressed in the previous steps, the goal is to achieve persistence for your order log. This means that the online boutique application must be able to retrieve the contents of the log even across shutdowns and restarts (of the application, of the GKE cluster, of the GCE virtual machines, etc.).
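
In Kubernetes, this is typically obtained by mounting a PersistentVolumeClaim into the backend's pod, so that the data lives on a persistent disk rather than in the pod's ephemeral filesystem. A minimal sketch (name, size, and storage class to adapt to your cluster):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orderlog-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```

The claim is then referenced from the backend's pod template (a volume of type persistentVolumeClaim mounted on the database's data directory). Surviving the deletion of the whole cluster additionally requires that the underlying disk, or an export of its contents, outlives the cluster.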


Deploying your own Kubernetes infrastructure [Bonus]

The goal of this step is to replace the managed Kubernetes infrastructure operated by the cloud provider (in this case, GKE operated by GCP) with a Kubernetes infrastructure under your control. This involves the following requirements:

Some recommendations:
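
For reference, with kubeadm the core of the installation boils down to a few commands on the VMs, once a container runtime and the kubelet are installed (versions and the choice of network plugin are left to you; Flannel is used below purely as an example):

```bash
# On the control-plane VM
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Install a pod network plugin (here Flannel, as an example)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# On each worker VM, using the values printed by 'kubeadm init'
sudo kubeadm join <CONTROL_PLANE_IP>:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH>
```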


Review of recent publications [Bonus]

During the course, we have read different research papers (we refer here to the “Mandatory Readings”). We suggest that you select the article you enjoyed the most and write a review of it. Reviewing an article involves doing some of the following things:

Warning: A simple summary of an article through copy/paste of existing content or minimal rephrasing will not earn points.