How we saved over 80% of requests with Cloudflare

January 10, 2020

A few years ago, we migrated one of our legacy web applications to a SPA with a RESTful API. Along the way we faced many unique challenges and worked through them one by one. One of them was that our web servers were serving the same content over and over again for multiple URLs across millions of requests. One of the constraints was that certain requests had to be redirected to other in-house applications and third-party services. All requests therefore had to go through our web servers so that the right back-end service could be chosen. We couldn’t change that behavior because it was one of the basic operational requirements.

Hmm, we had a real challenge! It reminds me of the following quote, which I have found to be true many times in my life. I hope you have too.

Problems are like washing machines. They twist us, spin us and knock us around, but in the end we come out cleaner, brighter and better than before.

In this article, I share how this challenge twisted and knocked us around, and how we ultimately stopped over 80% of requests from reaching our web servers.

TL;DR; – summary

Cloudflare is a proxy and more than a CDN; it specializes in front-line security and performance.
Cloudflare has over 160 data centers worldwide, which makes it effectively region-less. A user connects to the nearest Cloudflare data center rather than to the place where the origin servers are located, which makes it easy for us to safely keep our content very close to our users. We highly recommend putting Cloudflare in front of your servers to improve security and performance.
Page rules are effective in most cases; unfortunately, they did not work for one of our use cases.
A Cloudflare worker is a serverless function that can intercept any request at a Cloudflare edge and run near the user’s location.
Workers helped us solve our challenge by giving us control over requests through JavaScript code at the Cloudflare edge.
When we first started using workers, we did not have the development tools that are available today. We created a small SDK to simulate the worker environment, and we debugged the workers through logs and unit tests.
Even though workers suit a large number of use cases, please be mindful when implementing them. They might break existing development, deployment, operations and DR processes, and they add one more place to manage our apps. Again, it is a trade-off between performance and maintenance.
Keep in mind that workers have their own limitations and are not designed for long-running requests. Refer to this link for more information.
Monitoring is an important part of operations; it is important to validate the effects of the changes made at each stage.

The Crux Of The Problem

Let us take a closer look at the context of the problem. Normally, a SPA has a single point of entry, or in certain cases a few different points of entry. In our case, we have over a thousand URLs that lead to the same application. This caused our web servers to repeatedly serve the same SPA shell content for all SPA-related requests. Every day, our servers served the same content over and over again for millions of requests. As a result, our servers were consuming about 80-90% of our hardware resources (CPU and memory). In addition, since we hosted this app on Azure, we had to pay a surcharge for the additional resources consumed. Another side effect was that website performance took a hit (time to first byte increased).

The Heuristics

Initially, we planned to use a server-side cache to serve the SPA shell content, since it is built dynamically for each request. But it did not help, since all the requests were still served from the web servers. Next, we planned to introduce a proxy in front of our web servers and decided to go with Cloudflare. Cloudflare is more than a CDN. If you are not familiar with Cloudflare, refer to this link. Cloudflare provides lots of features, but we mainly use it to enhance security and performance.

Cloudflare provides an option to cache dynamic content such as .aspx, .php, etc. This feature is called Page Rules. We configured a couple of page rules to reduce the number of requests to our web servers by adding caching at the Cloudflare edge.

Though we saw a reduction in the number of requests to the web servers, it was only about 11% in a month. There was still room to reduce more requests.

Why weren’t Cloudflare page rules effective in our case?

Based on the above stats, you may be wondering why page rules were unable to achieve the expected results. The answer boils down to how page rules work in Cloudflare.

To store and retrieve a cached response, Cloudflare needs a Cache Key and a value. The value is the actual response from the origin server for a specific request. The Cache Key is constructed from the request by combining aspects such as the Cloudflare Zone ID, scheme, hostname, some headers, and the path with its query string. When a request is received at an edge, Cloudflare forms the Cache Key and checks the cache server for it. The cache server then returns the cached response when the Cache Key matches.
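
The lookup described above can be pictured with a toy simulation. This is only a sketch and not Cloudflare's implementation: the names `buildCacheKey` and `EdgeCache`, the zone ID `zone-123`, and the key layout are assumptions made for illustration, and real cache keys include more inputs (such as some headers).

```javascript
// Illustrative only: a toy model of an edge cache lookup.
// The key layout is an assumption, not Cloudflare's actual format.
function buildCacheKey(zoneId, rawUrl) {
  const url = new URL(rawUrl);
  // Zone, scheme, hostname, path and query string all contribute to the key.
  return [zoneId, url.protocol, url.hostname, url.pathname + url.search].join("|");
}

class EdgeCache {
  constructor(zoneId, fetchFromOrigin) {
    this.zoneId = zoneId;
    this.fetchFromOrigin = fetchFromOrigin; // called only on a cache miss
    this.store = new Map();
    this.originHits = 0;
  }

  get(rawUrl) {
    const key = buildCacheKey(this.zoneId, rawUrl);
    if (this.store.has(key)) {
      return this.store.get(key); // cache hit: the origin is not contacted
    }
    this.originHits += 1; // cache miss: go to the origin and remember the answer
    const body = this.fetchFromOrigin(rawUrl);
    this.store.set(key, body);
    return body;
  }
}

const cache = new EdgeCache("zone-123", () => "<html>SPA shell</html>");
cache.get("https://example.com/app/home");
cache.get("https://example.com/app/home"); // same key, served from the cache
console.log(cache.originHits); // one origin fetch so far
```

Any difference in the URL that changes the key, however small, results in a separate cache entry and another trip to the origin.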

However, the following rules have a major influence on cache retrieval:

Only GET requests can be cached in Cloudflare through page rules.
When the order of the query string parameters changes, Cloudflare treats it as a different request.
The Cache Key is case sensitive, so when any segment or character of the URI has a different case (lower case, upper case or mixed), Cloudflare treats it as a different request.

For these reasons, page rules turned out to be ineffective in our use case.
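
To see why the query order and case rules hurt, consider a sketch that compares raw URLs with a normalized form. The `normalizeUrl` helper and the example URLs are hypothetical; the sketch assumes the application treats paths case-insensitively and ignores query parameter order, which will not hold for every site.

```javascript
// Hypothetical helper: collapse URL variants that differ only in path case
// or in query parameter order into one canonical string.
function normalizeUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.pathname = url.pathname.toLowerCase(); // assumes case-insensitive paths
  url.searchParams.sort();                   // stable sort by parameter name
  url.hash = "";                             // fragments never reach the server
  return url.toString();
}

// Three ways of asking for the same page.
const variants = [
  "https://example.com/App/Orders?page=1&sort=asc",
  "https://example.com/app/orders?sort=asc&page=1",
  "https://example.com/APP/ORDERS?page=1&sort=asc",
];

const rawKeys = new Set(variants);
const normalizedKeys = new Set(variants.map(normalizeUrl));

// Raw URLs yield three distinct keys; normalized URLs yield one.
console.log(rawKeys.size, normalizedKeys.size);
```

With raw URLs as keys, each variant is cached separately, so the origin is still contacted once per variant.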

So, what now? We nearly gave up, until Cloudflare Workers was released.

Cloudflare Worker to the rescue

A worker is a serverless function that intercepts requests between Cloudflare edge servers and origin servers. It was strongly influenced by browser-based service workers, hence the name “worker”. A worker requires a URL route and custom JavaScript code; it intercepts requests and executes the custom code when the configured route matches. Fundamentally, a worker gives us more control over the selected requests through custom code at a Cloudflare edge. This is precisely what we needed to resolve the challenge. To find out more about workers, refer to this link.
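
As a rough illustration of the idea, here is a minimal sketch of a worker that serves one cached SPA shell for many URLs. It is not our production code. The path prefixes, the `SHELL_PATH` value, and the `isSpaRequest` and `handleRequest` names are assumptions made for the example, and it assumes the shell is identical for every user and is returned with cacheable headers. The cache and the origin fetch are passed in as parameters so the logic can be exercised outside the Workers runtime.

```javascript
// Hypothetical path prefixes that all render the same SPA shell.
const SPA_PREFIXES = ["/app/", "/dashboard/"];
// Hypothetical origin path whose response is the SPA shell.
const SHELL_PATH = "/app/";

function isSpaRequest(url) {
  const path = url.pathname.toLowerCase();
  return SPA_PREFIXES.some((prefix) => path.startsWith(prefix));
}

// cache: object with match(key) and put(key, response), like the Workers Cache API.
// fetchOrigin: function that forwards a request (or URL) to the origin.
async function handleRequest(request, cache, fetchOrigin) {
  const url = new URL(request.url);

  // Anything that is not a GET for the SPA goes to the origin untouched,
  // so routing to other back-end services keeps working.
  if (request.method !== "GET" || !isSpaRequest(url)) {
    return fetchOrigin(request);
  }

  // Every SPA URL on a host maps to a single cache entry.
  const shellUrl = new URL(SHELL_PATH, url.origin).toString();
  const cached = await cache.match(shellUrl);
  if (cached) return cached;

  // Cache miss at this edge: fetch the shell once and store a copy.
  const response = await fetchOrigin(shellUrl);
  if (response.ok) {
    await cache.put(shellUrl, response.clone());
  }
  return response;
}

// Wire-up for the Workers runtime; skipped where these globals do not exist.
if (typeof addEventListener === "function" && typeof caches !== "undefined" && caches.default) {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request, caches.default, (req) => fetch(req)));
  });
}
```

Keeping the decision logic in a plain function with injected dependencies also makes it straightforward to unit test with a fake cache and a fake origin.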

The following diagram depicts the architecture of a Cloudflare Worker.

The following diagram explains how we exactly implemented the worker for our use case.

The flow highlighted in purple is the one doing the magic: it stops over 80% of SPA-related requests from reaching our origin web servers. Cloudflare has more than 160 data centers across the globe. Users connect to a nearby data center, and all SPA-related requests are now served from that data center’s cache (the first request goes to the origin server when the cache entry does not yet exist in that particular data center).

Here are some stats about SPA and Cloudflare cached requests before and after the worker implementation.

SPA Requests to origin server:

Before worker: ~300K requests per day

After worker: ~50K requests per day

The flow of requests to the origin server was reduced by ~84%.

Other observations on the system:

Request queuing was reduced
Web servers have been quieter since traffic went down from 400 rpm to 80 rpm
Ingress and egress of data were reduced
Time to first byte was decreased

Conclusions

Cloudflare Worker helped us overcome a challenge. Cloudflare is well known for security and web performance. I strongly recommend using Cloudflare for your web apps to improve security and performance. Cloudflare has a free plan that suits personal blogs and other sites. I hope you found something useful in this post. Thank you for reading, and see you in another post.

Happy problem solving!