Self-Hosting Best Practices
You've decided to self-host Medplum — great. This guide covers operational practices that will help you run a stable, maintainable deployment and set yourself up for long-term success.
These recommendations come from patterns we've seen across self-hosted deployments, both what works well and what causes pain down the road.
Infrastructure Organization
Use a Dedicated Cloud Account
Deploy your Medplum infrastructure in a dedicated cloud account (e.g., an AWS child account under your organization). This provides several benefits:
- Smaller blast radius: Medplum resources are isolated from your other workloads
- Simplified access control: Easier to grant scoped permissions for audits, compliance reviews, or vendor support
- Cost visibility: Clear attribution of Medplum-related infrastructure costs
- Support readiness: If you ever need hands-on assistance from the Medplum team, granting temporary access to a dedicated account is straightforward and doesn't expose your broader infrastructure
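If your organization uses AWS Organizations, creating the dedicated child account can itself be scripted. The sketch below uses the AWS SDK for JavaScript; the account name and email are placeholders, and your organization may instead provision accounts through its own landing-zone tooling.

```typescript
import {
  OrganizationsClient,
  CreateAccountCommand,
  DescribeCreateAccountStatusCommand,
} from '@aws-sdk/client-organizations';

// Creates a dedicated child account for Medplum under the current organization.
// The account name and email are placeholders; use values from your own org.
async function createMedplumAccount(): Promise<void> {
  const client = new OrganizationsClient({});

  const { CreateAccountStatus } = await client.send(
    new CreateAccountCommand({
      AccountName: 'medplum-production',
      Email: 'aws-medplum@example.com',
    })
  );

  // Account creation is asynchronous; poll until it completes.
  let status = CreateAccountStatus;
  while (status && status.State === 'IN_PROGRESS') {
    await new Promise((resolve) => setTimeout(resolve, 5000));
    const result = await client.send(
      new DescribeCreateAccountStatusCommand({ CreateAccountRequestId: status.Id })
    );
    status = result.CreateAccountStatus;
  }

  console.log(`Account creation finished: ${status?.State} (${status?.AccountId ?? 'no account id'})`);
}

createMedplumAccount().catch(console.error);
```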
Infrastructure as Code
Manage all infrastructure through code (Terraform, CDK, CloudFormation, etc.) rather than manual console changes. This gives you:
- An audit trail of what changed and when
- Reproducible environments for staging and disaster recovery testing
- Easier onboarding for new team members
- The ability to roll back problematic changes
Medplum provides CDK templates as a starting point. If you diverge from these templates, document your changes and the reasoning behind them.
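For illustration only (this is not the Medplum CDK template itself), a minimal AWS CDK stack shows the pattern: network and database defined in TypeScript, ready to review, diff, and roll back. Resource names and sizes here are assumptions.

```typescript
import * as cdk from 'aws-cdk-lib';
import { aws_ec2 as ec2, aws_rds as rds } from 'aws-cdk-lib';

// Illustrative stack only: Medplum's published CDK templates define the full
// application stack. This sketch just shows the infrastructure-as-code pattern.
class ExampleMedplumInfraStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Private networking for backend services
    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2 });

    // Multi-AZ PostgreSQL in private subnets, never publicly accessible
    new rds.DatabaseInstance(this, 'Database', {
      engine: rds.DatabaseInstanceEngine.postgres({
        version: rds.PostgresEngineVersion.VER_16,
      }),
      vpc,
      vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      multiAz: true,
      allocatedStorage: 100,
    });
  }
}

const app = new cdk.App();
new ExampleMedplumInfraStack(app, 'ExampleMedplumInfra');
```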
Database Management
Treat the Database as Sacred
The Medplum database schema is managed by the application. Manual changes to PostgreSQL, such as adding tables, columns, or indexes, or modifying data directly, can cause upgrade failures and unexpected behavior, and they make troubleshooting significantly harder.
If you need to make direct database changes:
- Document everything: Keep a log of what was changed, when, why, and by whom
- Understand the risks: Schema modifications may conflict with future Medplum upgrades
- Inform support: If you later engage Medplum for assistance, disclose any manual changes upfront — this information is critical for diagnosis
We strongly recommend treating direct database manipulation as a last resort, not a routine operational tool.
Plan for PostgreSQL Major Version Upgrades
PostgreSQL major version upgrades (e.g., 14 → 15 → 16) require manual intervention and planning. AWS does not automatically perform major version upgrades for RDS.
- Schedule regular upgrades: Don't let your PostgreSQL version fall too far behind. Running an unsupported version limits your options during incidents.
- Test in staging first: Major version upgrades can surface compatibility issues. Always test your upgrade path in a non-production environment.
- Budget the time: A major version upgrade, including preparation, testing, and execution, typically requires several days of focused effort.
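Once the upgrade path has been validated in staging, the production upgrade can be initiated through the RDS API rather than the console, which keeps it scriptable and reviewable. A sketch using the AWS SDK for JavaScript; the instance identifier and target engine version are placeholders.

```typescript
import { RDSClient, ModifyDBInstanceCommand } from '@aws-sdk/client-rds';

// Requests a PostgreSQL major version upgrade on an RDS instance.
// Take a manual snapshot first and schedule this for a maintenance window;
// the identifier and target version below are placeholders.
async function upgradePostgres(): Promise<void> {
  const rds = new RDSClient({});
  const result = await rds.send(
    new ModifyDBInstanceCommand({
      DBInstanceIdentifier: 'medplum-production-db',
      EngineVersion: '16.4',
      AllowMajorVersionUpgrade: true, // required for major version jumps
      ApplyImmediately: false,        // apply during the next maintenance window
    })
  );
  console.log(`Upgrade requested: ${result.DBInstance?.DBInstanceStatus}`);
}

upgradePostgres().catch(console.error);
```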
Observability and Monitoring
Medplum's infrastructure-as-code templates provide the core application components but are intentionally unopinionated about observability. Every organization has different tooling preferences and compliance requirements, so we leave this configuration to you.
That said, having observability is non-negotiable for production deployments.
Minimum Viable Monitoring
At a minimum, set up monitoring for:
| Component | Key Metrics |
|---|---|
| Application servers | CPU, memory, disk, network |
| PostgreSQL (RDS) | CPU, memory, connections, replication lag, disk IOPS |
| Redis | Memory usage, connections, evictions |
| Load balancer | Request count, error rates, latency percentiles |
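As a concrete starting point, a single CloudWatch alarm on RDS CPU created via the AWS SDK might look like the following; the instance identifier, threshold, and SNS topic ARN are placeholders.

```typescript
import { CloudWatchClient, PutMetricAlarmCommand } from '@aws-sdk/client-cloudwatch';

// Alarms when RDS CPU stays above 70% for 15 minutes.
// The instance identifier and SNS topic ARN are placeholders.
async function createDbCpuAlarm(): Promise<void> {
  const cloudwatch = new CloudWatchClient({});
  await cloudwatch.send(
    new PutMetricAlarmCommand({
      AlarmName: 'medplum-db-cpu-high',
      Namespace: 'AWS/RDS',
      MetricName: 'CPUUtilization',
      Dimensions: [{ Name: 'DBInstanceIdentifier', Value: 'medplum-production-db' }],
      Statistic: 'Average',
      Period: 300,          // 5-minute datapoints
      EvaluationPeriods: 3, // 3 consecutive datapoints = 15 minutes
      Threshold: 70,
      ComparisonOperator: 'GreaterThanThreshold',
      AlarmActions: ['arn:aws:sns:us-east-1:123456789012:medplum-ops-alerts'],
    })
  );
}

createDbCpuAlarm().catch(console.error);
```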
Recommended Additions
Once basics are covered, consider adding:
- Application-level metrics: Requests per second, response time distributions, error rates by endpoint
- Alerting thresholds: Alerts that fire before you hit critical limits, not after
- Log aggregation: Centralized, searchable logs from all components
- Distributed tracing: Particularly valuable for debugging slow requests or integration issues
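For the application-level metrics item above, one common approach is a small Express middleware with prom-client that records per-endpoint latency and exposes it for Prometheus to scrape. The metric name, buckets, and port in this sketch are arbitrary choices.

```typescript
import express from 'express';
import { Histogram, register } from 'prom-client';

// Records request latency per method, path, and status code.
// Using the raw path as a label is a simplification for this sketch;
// high-cardinality routes should be normalized in practice.
const httpDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

const app = express();

app.use((req, res, next) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    end({ method: req.method, route: req.path, status: String(res.statusCode) });
  });
  next();
});

// Expose metrics for a Prometheus scraper
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

app.listen(3000);
```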
Tooling Options
Common choices we've seen work well:
- AWS-native: CloudWatch metrics, alarms, and dashboards
- Open source: Grafana + Prometheus/Loki/Tempo (LGTM stack)
- Commercial: Datadog, New Relic, Sumo Logic
We're less concerned about which tool you choose and more concerned that something exists. Flying blind in production is how small issues become outages.
Capacity Planning
Don't Run Hot
There's a natural temptation to maximize utilization — running servers at 70-80% CPU feels like you're getting good value. The problem is that this leaves no headroom for traffic spikes, garbage collection pauses, or unexpected load.
Our recommendations:
- Target 40-50% average CPU utilization on application servers during normal operation
- Set alerts at 60-70% to trigger scaling or investigation before you're in trouble
- Monitor memory pressure, not just usage — watch for swap activity and OOM kills
- Right-size your database: RDS instance types can be changed, but doing so during an incident is stressful
A system with headroom absorbs surprises gracefully. A system running hot turns small spikes into cascading failures.
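If your application tier runs on ECS, one way to hold average utilization in that range is a target tracking scaling policy. The sketch below uses the AWS SDK; the cluster name, service name, and capacity bounds are placeholders.

```typescript
import {
  ApplicationAutoScalingClient,
  RegisterScalableTargetCommand,
  PutScalingPolicyCommand,
} from '@aws-sdk/client-application-auto-scaling';

// Scales an ECS service to hold average CPU near 50%.
// Cluster and service names are placeholders.
async function configureScaling(): Promise<void> {
  const autoscaling = new ApplicationAutoScalingClient({});
  const resourceId = 'service/medplum-cluster/medplum-server';

  await autoscaling.send(
    new RegisterScalableTargetCommand({
      ServiceNamespace: 'ecs',
      ResourceId: resourceId,
      ScalableDimension: 'ecs:service:DesiredCount',
      MinCapacity: 2,
      MaxCapacity: 10,
    })
  );

  await autoscaling.send(
    new PutScalingPolicyCommand({
      PolicyName: 'medplum-server-cpu-target-50',
      ServiceNamespace: 'ecs',
      ResourceId: resourceId,
      ScalableDimension: 'ecs:service:DesiredCount',
      PolicyType: 'TargetTrackingScaling',
      TargetTrackingScalingPolicyConfiguration: {
        TargetValue: 50,
        PredefinedMetricSpecification: {
          PredefinedMetricType: 'ECSServiceAverageCPUUtilization',
        },
      },
    })
  );
}

configureScaling().catch(console.error);
```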
Connection Limits
Pay attention to database connection limits, especially as you scale horizontally. Each application server maintains a connection pool, and it's easy to exhaust RDS connection limits without realizing it.
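The arithmetic is worth writing down: total connections is roughly the number of servers times the pool size per server, plus replicas, bots, and admin sessions, and that total must stay comfortably below the RDS max_connections limit. A sketch with node-postgres showing an explicit per-server cap; the numbers and hostname are illustrative.

```typescript
import { Pool } from 'pg';

// Example: 6 app servers x 20 connections each = 120 connections,
// which must stay comfortably below the RDS max_connections limit
// (leaving room for replicas, migrations, and admin sessions).
const pool = new Pool({
  host: 'medplum-production-db.example.internal', // placeholder hostname
  database: 'medplum',
  max: 20,                       // hard cap per application server
  idleTimeoutMillis: 30_000,     // release idle connections promptly
  connectionTimeoutMillis: 5_000,
});

// Quick check of how many connections the database currently holds.
export async function checkConnectionCount(): Promise<void> {
  const result = await pool.query('SELECT count(*) AS connections FROM pg_stat_activity');
  console.log(`Current PostgreSQL connections: ${result.rows[0].connections}`);
}
```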
Backup and Disaster Recovery
Verify Your Backups
Having automated RDS snapshots enabled is necessary but not sufficient. You should:
- Test restores periodically: Actually spin up a database from a snapshot and verify data integrity
- Document your RTO/RPO: Know how long recovery takes and how much data loss is acceptable
- Understand point-in-time recovery: RDS supports PITR, but you need to know how to use it before an incident
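A restore test is easier to run regularly if it is scripted. The sketch below restores a snapshot into a throwaway instance with the AWS SDK and waits for it to become available; the snapshot and instance identifiers are placeholders, and integrity checks should run against the restored copy, never against production.

```typescript
import {
  RDSClient,
  RestoreDBInstanceFromDBSnapshotCommand,
  waitUntilDBInstanceAvailable,
} from '@aws-sdk/client-rds';

// Restores a snapshot into a temporary instance for verification.
// Snapshot and instance identifiers are placeholders.
async function testRestore(): Promise<void> {
  const rds = new RDSClient({});

  await rds.send(
    new RestoreDBInstanceFromDBSnapshotCommand({
      DBInstanceIdentifier: 'medplum-restore-test',
      DBSnapshotIdentifier: 'medplum-production-db-2024-06-01',
      DBInstanceClass: 'db.t4g.medium',
      PubliclyAccessible: false,
    })
  );

  // Wait until the restored instance is ready before running integrity checks.
  await waitUntilDBInstanceAvailable(
    { client: rds, maxWaitTime: 3600 },
    { DBInstanceIdentifier: 'medplum-restore-test' }
  );

  console.log('Restore complete: run row counts and smoke tests, then delete the instance.');
}

testRestore().catch(console.error);
```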
Multi-AZ and Redundancy
For production deployments, ensure:
- RDS is configured for Multi-AZ deployment
- Application servers span multiple availability zones
- You understand what happens during an AZ failure
Security Hygiene
Network Architecture
- Database and Redis instances should not be publicly accessible
- Use private subnets for backend services
- Administrative access (SSH, database connections) should go through a bastion host or VPN
- Review security group rules periodically — they tend to accumulate exceptions over time
Secrets Management
- Store credentials in a secrets manager (AWS Secrets Manager, Parameter Store, HashiCorp Vault) rather than environment variables or config files
- Implement a rotation strategy for database credentials and API keys
- Audit who has access to production secrets
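For example, fetching database credentials from AWS Secrets Manager at startup keeps them out of config files and makes rotation a secret update rather than a redeploy. The secret name in this sketch is a placeholder.

```typescript
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';

// Fetches database credentials at startup; the secret name is a placeholder.
// Rotation then only requires updating the secret, not redeploying config files.
async function getDatabaseCredentials(): Promise<{ username: string; password: string }> {
  const client = new SecretsManagerClient({});
  const result = await client.send(
    new GetSecretValueCommand({ SecretId: 'medplum/production/database' })
  );
  if (!result.SecretString) {
    throw new Error('Secret has no string value');
  }
  return JSON.parse(result.SecretString);
}

getDatabaseCredentials()
  .then(() => console.log('Loaded database credentials from Secrets Manager'))
  .catch(console.error);
```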
Patching Cadence
Establish a regular schedule for:
- OS-level security patches on EC2 instances
- Container base image updates if using ECS/Fargate
- Dependency updates in any custom code
Upgrade Strategy
Stay Reasonably Current
Drifting too far behind Medplum releases creates compounding problems:
- Security patches don't reach you
- Bug fixes you might need aren't available
- Catching up eventually requires jumping multiple versions at once, which is riskier
We recommend staying within 2-3 minor versions of the latest release.
Maintain a Staging Environment
Before applying any Medplum upgrade to production:
- Apply the upgrade to a staging environment that mirrors production
- Run your test suite and smoke tests
- Verify integrations still work as expected
- Only then proceed to production
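A smoke test does not need to be elaborate. The sketch below assumes your deployment exposes Medplum's /healthcheck endpoint and that a staging base URL and access token are available in environment variables; it checks that the server reports healthy and that an authenticated FHIR read still works after the upgrade.

```typescript
// Minimal post-upgrade smoke test for a staging environment.
// Assumes the server exposes /healthcheck and that STAGING_URL and
// STAGING_TOKEN are set for your environment (both are placeholders).
const baseUrl = process.env.STAGING_URL ?? 'https://api.staging.example.com';
const token = process.env.STAGING_TOKEN ?? '';

async function smokeTest(): Promise<void> {
  // 1. Server is up and reports healthy
  const health = await fetch(`${baseUrl}/healthcheck`);
  if (!health.ok) {
    throw new Error(`Healthcheck failed: ${health.status}`);
  }

  // 2. FHIR API responds to an authenticated read
  const patients = await fetch(`${baseUrl}/fhir/R4/Patient?_count=1`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!patients.ok) {
    throw new Error(`Patient search failed: ${patients.status}`);
  }

  console.log('Smoke test passed');
}

smokeTest().catch((err) => {
  console.error(err);
  process.exit(1);
});
```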
Subscribe to Release Notifications
Watch the Medplum GitHub releases or join the Discord to stay informed about new versions, security patches, and breaking changes.
Document Your Customizations
If you've made modifications beyond the standard Medplum deployment — custom bot dependencies, forked components, unusual configurations, additional services — document them thoroughly.
Future you, your teammates, and anyone providing support will need to understand:
- What was changed from the defaults
- Why the change was made
- Any operational implications
Getting Help
Even well-run self-hosted deployments occasionally need assistance. To make support engagements productive:
- Maintain access paths: The dedicated cloud account structure mentioned earlier pays off here
- Keep logs accessible: When something goes wrong, having recent logs available accelerates diagnosis
- Disclose customizations: Any manual database changes, forked code, or unusual configurations are relevant
Questions? Reach out to the Medplum team at support@medplum.com or on Discord.