Uptime, Availability, and SLAs

There are two separate questions about uptime. The first is what Medplum commits to in a contract. The second is what Medplum has actually delivered in production. These are different things, and this page covers them separately.

Medplum commits to 99.99% uptime in its service level agreement (SLA). In production, Medplum has historically operated above that level.

Availability percentages

Availability is usually expressed as a percentage. The table below shows the time allowed at each level.

Availability	Per year	Per 30-day month
99.9% (three nines)	about 8.8 hours	about 43 minutes
99.99% (four nines)	about 53 minutes	about 4.3 minutes
99.999% (five nines)	about 5.3 minutes	about 26 seconds
99.9999% (six nines)	about 32 seconds	about 2.6 seconds

The amount of allowed downtime depends on the measurement period. At 99.99%, the service can be unavailable for about 53 minutes per year, or about 4 minutes per 30-day month. SLAs are usually measured per month.

What Medplum commits to

Medplum offers a 99.99% uptime SLA for the production hosted platform. The SLA is measured per month. It covers authentication, the FHIR API, subscriptions and webhooks, WebSocket connections, and Bot execution. If uptime falls below 99.99% in a month, the customer receives service credits as described in the agreement. Medplum sets the SLA at a level it can sustain under normal and adverse conditions.

IMPORTANT

The exact percentages, measurement methods, credit amounts, and exclusions are defined in the customer agreement. This page is documentation, not a contract or legal advice.

How Medplum achieves high availability

The platform is designed so that no component requires downtime to operate, scale, or upgrade.

The platform runs across multiple AWS Availability Zones. If one zone fails, the service continues to run in the others. This is the default configuration, so customers do not need to set anything up to get it.

Compute runs on Amazon ECS. ECS detects unhealthy hosts and replaces them automatically. Load balancers are configured for fast failover. The platform retries failed requests automatically where appropriate. Deployments do not require downtime. This includes major-version upgrades of the primary database, which run without a maintenance window.

Historical performance

This section describes measured performance. It is not a guarantee and is not part of any contract.

Medplum has historically operated between five and six nines of availability in production. Downtime in a given period has ranged from zero to a few minutes.

Past performance does not predict future performance. Medplum publishes these figures so customers can compare actual results against the contractual commitment.

What counts as downtime

Availability is the percentage of minutes in a measurement period during which the service can accept and respond to valid requests. The exact formula and the definition of a qualifying outage are set in the customer agreement.

The following are generally not counted as downtime, subject to the agreement:

Problems caused by customer configuration, customer code, or customer networks.
Failures of third-party systems outside Medplum's control.
Force majeure events.

Self-hosted deployments run on customer infrastructure. They are not covered by the hosted SLA.

Status and incidents

Medplum publishes a status page with real-time information about uptime, performance, and incidents. The status page includes historical uptime data and a record of past incidents, including root cause analyses and remediation steps.

You can also monitor uptime through third-party services that check the availability of the Medplum API endpoints. Some of these services include:

Enterprise customers with shared Slack channels receive real-time notifications of incidents and maintenance events. Customers can also subscribe to email notifications for the status page.

Availability percentages​

What Medplum commits to​

How Medplum achieves high availability​

Historical performance​

What counts as downtime​

Status and incidents​