Root Cause Analysis for Quick Base Production Incident on 2020-Jun-09

By Quick Base Team posted 06-11-2020 13:53

  

Incident Overview

Between 4:33 PM and 5:23 PM Eastern US Time (50 minutes) on Tuesday, June 9, 2020, the Quick Base platform was unavailable due to the accidental deletion of the public SSL certificate issued on our behalf by our CDN and DDoS prevention service, Cloudflare. An SSL certificate (more accurately called a TLS certificate) is necessary for a website to use HTTPS encryption, which secures the connection between our customers and the Quick Base platform. The deletion occurred during a configuration change process initiated by Cloudflare. The Quick Base team was not given advance notice of this change, and Cloudflare could not roll it back. Cloudflare did not intend for the configuration changes to be disruptive but, in hindsight and with further review, they acknowledge that they did not account for all the possible consequences of those changes. Cloudflare has acknowledged the mistake and takes full responsibility. They are continuing to investigate the specific details that led to the mistake, and we will make those details available once we receive them from Cloudflare, which is expected to occur by June 17. Meanwhile, we are continuing to investigate and define the consequences of the Cloudflare configuration changes and to assist impacted customers. We will update this post as more information becomes available.

We understand that you rely on us for your important business processes each day, and we strive to provide a world-class level of security, availability and performance for the Quick Base platform. Cloudflare is an essential partner in that effort. Although Cloudflare did not live up to our collective expectations with respect to their change management practices in this case, we are confident they will respond in a manner that prevents such an issue from occurring again, and that they will continue to deliver the excellent security and availability on which their own business has been built.

NOTE: This incident did NOT and does NOT result in any security risk for Quick Base customers or their data. During this incident, Quick Base customers were unable to access the platform because we require a secure connection to be established before any data is sent by your browser or API scripts to the platform. Since the secure connection could not be established, no data ever left a customer’s network, i.e., their browser sessions or API scripts.

If you believe you are still negatively impacted by this incident, we encourage you to read the remainder of this document for possible solutions. If those do not resolve your issue, please submit a support case at https://www.quickbase.com/support.

Incident Details

We believe in a culture of transparency. To that end, we have included further details for those interested, or for customers that may still be experiencing issues.

The changes that occurred as part of the unannounced configuration change process initiated by Cloudflare on Tuesday, June 9, 2020, were as follows (all times are Eastern US Time):

  1. The public wildcard SSL certificate, i.e., *.quickbase.com, provisioned on behalf of Quick Base by Cloudflare, was accidentally deleted at 4:33 PM Eastern US Time, resulting in the Quick Base platform being unavailable to customers. This is the certificate used to establish a trusted and encrypted connection between our customers and the Quick Base platform each time customers use the platform. The common terms used to describe the establishment of this trusted connection are "handshake" or "SSL handshake". Our monitoring systems detected a problem with the platform immediately at 4:33 PM. We triaged the issue and determined that the SSL certificate had been deleted. We called Cloudflare at 4:43 PM, and they provisioned a new wildcard SSL certificate, restoring access to the platform at 5:23 PM Eastern US Time.

  2. The SSL certificate chain was changed, including the public wildcard certificate and other certificates in the chain. A certificate chain is an ordered list of certificates, containing an SSL certificate and Certificate Authority (CA) certificates, that enables your browser or API script to trust the *.quickbase.com certificate because that certificate is itself trusted by other certificate authorities in the chain that your browser or API script already trusts. The chain terminates with a Root CA Certificate, which is always signed by the CA itself.

  3. SNI (Server Name Indication), which is an extension of the TLS (SSL) protocol, was enabled as a requirement for every connection made between our customers and the Quick Base platform. SNI is an industry standard supported by 99.96% of hosts on the Internet (study conducted by Akamai). Only very old software and operating systems do not support it (mostly pre-2015 versions of software). Although the Quick Base platform itself does not require the use of SNI, Cloudflare made it a requirement for establishing the trusted connection, i.e., the SSL handshake. As of 7:15 PM Eastern US Time on Thursday, June 11, Cloudflare has disabled SNI as a requirement.

  4. The public IP addresses to which the Quick Base platform host names resolve were changed, i.e., the IP addresses to which <yourcompany>.quickbase.com resolves were changed.
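
For readers who want to observe changes 2 through 4 from the client side, the following is a minimal Python sketch, not an official Quick Base or Cloudflare tool. It assumes Python 3.7 or later. The function names build_client_context and describe_handshake are illustrative only, and companyname.quickbase.com stands in for your own realm host name. Passing server_hostname is what causes the client to send the SNI extension during the handshake.

```python
import socket
import ssl


def build_client_context() -> ssl.SSLContext:
    """Build a TLS client context that verifies the server's certificate
    chain against the system trust store and refuses anything older than
    TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def describe_handshake(hostname: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Connect to hostname, complete the TLS handshake, and report what was
    negotiated. Raises ssl.SSLError if the handshake fails."""
    context = build_client_context()
    with socket.create_connection((hostname, port), timeout=timeout) as raw_sock:
        # server_hostname places the host name in the SNI extension and is
        # also the name the certificate is checked against.
        with context.wrap_socket(raw_sock, server_hostname=hostname) as tls_sock:
            cert = tls_sock.getpeercert()
            return {
                "peer_ip": tls_sock.getpeername()[0],  # address actually connected to
                "tls_version": tls_sock.version(),     # negotiated protocol version
                "subject": cert.get("subject"),        # who the certificate was issued to
                "issuer": cert.get("issuer"),          # CA that signed the certificate
                "not_after": cert.get("notAfter"),     # certificate expiry
            }
```

Calling describe_handshake("companyname.quickbase.com") from a machine with Internet access returns the IP address that was connected to, the negotiated TLS version, and the subject and issuer of the server's certificate. A client that cannot trust the certificate chain would typically fail inside wrap_socket with an ssl.SSLError instead of returning a result.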

Ongoing Incident Issues

For 99.9% of Quick Base customers, once Cloudflare provisioned a new SSL certificate for Quick Base, which occurred at 5:23 PM Eastern US Time on June 9, access to the Quick Base platform was restored and those customers were able to use the platform without further issues.

However, some customers continued to report issues after the new certificate was provisioned. The following is a list of the issues we have seen and the potential solutions or status of each.

  1. Customers using a browser or API script to access Quick Base who do not have Quick Base specific IP address routing rules or restrictions in place on their local network. A browser session or API script that has been open since before the start of the incident may try to get to the Quick Base platform using the public IP addresses used by Quick Base prior to the incident. Potential Solutions, listed first for a browser and then for an API script (in order of most probable solution):

    1. If you are using a browser to access Quick Base, try the following steps, in order:

      1. Close/open the browser session (that means close ALL currently open browser windows).

      2. Clear the browser cache and close/open the browser session (that means close ALL currently open browser windows after clearing the browser cache in one of them).

      3. Use a different browser. For example, if you are having trouble accessing Quick Base using the Chrome browser, try using another browser you may have installed on your computer, e.g., Firefox.

      4. Reboot your computer.

      5. Contact your IT department and ask them to remove <yourcompany>.quickbase.com from the cache of the DNS server used by your computer, or to restart the DNS server used by your computer. If this step alone does not resolve your issue, repeat the prior steps after it has been completed. Please note, the reference to <yourcompany>.quickbase.com should be replaced with whatever name you use in the URL field of your browser to access Quick Base, e.g., https://companyname.quickbase.com.

    2. If you are running an API script to access Quick Base, try the following steps, in order:

      1. Restart the API script or the service/program that runs the API script.

      2. Reboot the computer on which the API script is running.

      3. Contact your IT department and ask them to remove <yourcompany>.quickbase.com from the cache of the DNS server used by the computer on which the API script is running, or to restart the DNS server used by the computer on which the API script is running. Please note, the reference to <yourcompany>.quickbase.com should be replaced with whatever name you use in the API requests, e.g., https://companyname.quickbase.com.

  2. Customers using a browser or API script to access Quick Base who do have Quick Base specific IP address rules or restrictions in place on their local network. The practice of using a website's public IP addresses as part of local network rules or restrictions is known as “IP whitelisting”. Some examples of IP whitelisting are 1) only allowing use of the Internet for designated sites, such as Quick Base, by specifying in the customer’s local network rules the public IP addresses to which the Quick Base platform host names resolve, and 2) requiring all traffic destined for Quick Base (as defined by our public IP addresses) to use a specific VPN connection on the customer’s local network. The VPN use case is often paired with the use of Quick Base's “IP Address Filtering” feature which allows access to a Quick Base application, or an entire realm, to be limited to only certain client IP addresses. By forcing all traffic destined for Quick Base to come from the IP addresses used by the customer's VPN, the “IP Address Filtering” list only needs to include the IP address or address range used by the customer's VPN. Potential Solutions (in order of most probable solution):

    1. Quick Base does not support, and highly discourages, the use of IP address whitelisting as part of a customer’s local network rules or restrictions. The public IP addresses to which the Quick Base platform resolves can and do change without notice. Relying on IP address whitelisting on your local network can result in disruption to your access to the Quick Base platform if Quick Base’s public IP addresses change.

    2. If you are unable to discontinue use of IP whitelisting on your local network at this time, you may, at your own risk, use the following public IP addresses in your IP whitelist: 104.18.100.30 and 104.18.101.30. Any DNS lookup on a <realm>.quickbase.com host name will resolve to one of these two IP addresses which, as noted, are subject to change without notice. For example, if you do a DNS lookup on <companyname>.quickbase.com where <companyname> is the name of your realm in Quick Base, it will resolve to one of the two aforementioned IP addresses.

  3. Customers accessing Quick Base using a browser or API script that uses a locally installed copy of the public Quick Base SSL certificate, or SSL certificate chain. For example, customers may configure browsers, computer operating systems, or API scripts to refer to locally stored copies of public SSL certificates or SSL certificate chains instead of referencing the SSL certificates or SSL certificate chains directly on a destination website such as the Quick Base platform. Some common reasons for this practice are 1) the customer’s compute environment does not trust one or more of the SSL certificates in the SSL certificate chain, or 2) the customer wants to compare the locally installed SSL certificate or SSL certificate chain with the version from the website such as the Quick Base platform. Potential Solutions (in order of most probable solution):

    1. Quick Base does not support, and highly discourages, the use of locally installed SSL certificates or SSL certificate chains. Quick Base and Cloudflare use widely trusted SSL certificates, SSL certificate chains, and Certificate Authorities. SSL certificates and SSL certificate chains used by the Quick Base platform can and do change without notice. Relying on locally installed SSL certificates or SSL certificate chains can result in disruption to your access to the Quick Base platform if Quick Base’s public SSL certificate or SSL certificate chain changes.

    2. If you are unable to discontinue use of locally installed SSL certificates or SSL certificate chains at this time, you may, at your own risk, download and install the public Quick Base SSL certificate and/or SSL certificate chain.

  4. Customers accessing Quick Base using a browser, computer operating system, or API script that does not support the SNI (Server Name Indication) TLS protocol extension either because the browser, computer operating system, or software used to run the API script is too old (typically a version from prior to 2015), or the software is specifically configured to NOT support SNI.

    1. As noted earlier in this document, although the Quick Base platform does not require the use of SNI, Cloudflare made it a requirement for establishing the trusted connection, i.e., the SSL handshake. Quick Base worked with Cloudflare to determine if it is possible to disable the required use of SNI and as of 7:15 PM Eastern US Time on Thursday, June 11, Cloudflare has disabled SNI as a requirement.

    2. NOTE: It is our intention to re-enable the required use of SNI at a date no later than September 30, 2020. Therefore, customers who experienced ongoing issues due to lack of support for SNI should plan to upgrade their browsers, computer operating systems, or API scripts to a version that supports SNI and to configure support for SNI no later than September 30, 2020.

  5. Customers accessing Quick Base using a browser, computer operating system, or API script that does not support TLS 1.2 or above. Although neither Cloudflare nor Quick Base has changed its TLS support recently, some customers have asked, as part of support cases opened about this incident, which versions of TLS we support. Quick Base only supports encrypted requests using TLS v1.2 or v1.3. You can find more information about our TLS support in this Quick Base Community post.
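
Several of the checks above can be approximated from the machine that runs an API script. The following is a minimal Python sketch under stated assumptions, not an official diagnostic: it assumes Python 3.7 or later, the function names are hypothetical, and the allowlist passed in is whatever list your own network team maintains. It reports whether the local TLS library supports SNI, TLS 1.2 and TLS 1.3, and compares the addresses a host name currently resolves to against a locally maintained IP whitelist.

```python
import socket
import ssl


def client_tls_capabilities() -> dict:
    """Report what the TLS library behind this Python build supports."""
    return {
        "openssl": ssl.OPENSSL_VERSION,  # version string of the linked TLS library
        "sni": ssl.HAS_SNI,              # can this client send Server Name Indication?
        "tls_1_2": ssl.HAS_TLSv1_2,
        "tls_1_3": ssl.HAS_TLSv1_3,
    }


def resolve_ipv4(hostname: str) -> set:
    """Ask the operating system's resolver for the host's current IPv4
    addresses. A stale DNS cache will show up here as outdated addresses."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def missing_from_allowlist(resolved, allowlist) -> set:
    """Return the resolved addresses that a local IP whitelist would block."""
    return set(resolved) - set(allowlist)
```

For example, missing_from_allowlist(resolve_ipv4("companyname.quickbase.com"), ["104.18.100.30", "104.18.101.30"]) returns an empty set when every address the host name currently resolves to is already in the whitelist. A non-empty result suggests the whitelist is out of date, which is one reason the document discourages relying on IP whitelisting at all.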
