When is a good time for a healthcare system to consider cloud infrastructure?
A growing client, with a portfolio of 800+ applications, discovered the answer to this question when they needed to better standardize and automate system management. The teams managing the applications, servers, and underlying components could not scale to handle the variation across systems, while staffing and budgets continued to be cut.
They needed a better way to consistently manage it all.
Various applications within the client’s portfolio were in mixed states of complexity and maturity. Because of this, they could not immediately be consolidated without impacting both the acute and ambulatory sites. Designing a streamlined, methodical transition was imperative to our success.
Applications were hosted in data centers and MDFs in various regional locations. Migrating these applications to cloud hosting, while technically feasible, had to be done in a way that would ensure that vendor support and management did not impact the clinical experience.
Multiple applications could not easily be moved to remote or cloud hosting, due to latency sensitivity and various other technical reasons. Migration to alternate management systems needed to avoid introducing downtime, wherever possible.
Finally, an application catalog existed, but it had last been updated years earlier. It lacked relevant application metadata and was, at best, about 50% current.
Our Approach & Methodology
First and foremost, we needed to refresh the application catalog.
Rather than limiting ourselves to standardized app catalog metadata, we expanded data collection to support a new strategy. With that data, we could qualify and categorize applications based on technical, process, and business criticality/dependency.
Multiple hosting locations would continue to be a reality for the foreseeable future, and cloud hosting was not (yet) a viable model. So, we implemented hyper-converged platforms to host applications at each regional location. This would ensure that physical distance and/or latency impact would not be a factor.
We identified that >65% of the applications in the portfolio were ideal candidates for automated management, based on a number of criteria. In migrating production systems to the local, hyper-converged platforms, we were also able to create a snapshot-based model that would perform instant, local P2V or V2V migration. It would also allow us to facilitate local DR failover validation, perform UAT with the clinical staff, and then cut over to the converged platform.
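The qualification step above can be sketched in code. This is a minimal, hypothetical illustration: the actual criteria were client-specific, so the field names and the qualification rule here are assumptions, not the client's real model.

```python
from dataclasses import dataclass

# Hypothetical catalog record: the fields below stand in for the
# technical, process, and business criteria gathered during the
# catalog refresh. They are illustrative, not the client's schema.
@dataclass
class AppRecord:
    name: str
    latency_sensitive: bool   # technical: can it tolerate remote hosting?
    standard_os_build: bool   # technical: runs on a standard server build?
    documented_process: bool  # process: runbooks / change process exist?
    business_criticality: int # business: 1 (low) .. 5 (mission-critical)

def automation_candidate(app: AppRecord) -> bool:
    """Assumed qualification rule: latency-sensitive apps stay on local
    platforms but can still be automated; apps on non-standard builds or
    with no documented process need remediation first."""
    return app.standard_os_build and app.documented_process

def triage(catalog: list[AppRecord]) -> dict[str, list[str]]:
    """Split the refreshed catalog into automation candidates and holdouts."""
    buckets: dict[str, list[str]] = {"automate": [], "remediate": []}
    for app in catalog:
        key = "automate" if automation_candidate(app) else "remediate"
        buckets[key].append(app.name)
    return buckets
```

A triage like this is only as good as the catalog behind it, which is why the metadata refresh had to come first.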
We ran each legacy system and its new, hyper-converged counterpart in parallel for 2+ weeks, to ensure that any impact could be identified and mitigated. Then, we decommissioned the legacy system.
Success & Value Realized
By performing the transitions in this way, we made a number of valuable discoveries.
- By hosting in hyper-converged platforms locally, we significantly reduced the physical footprint. Server and storage hardware shrank by 90%+ in some cases, freeing floor space that was slated for conversion to usable office space.
- Environmental costs (cooling, electricity, etc.) dropped with the denser, more efficient hyper-converged devices.
- Many hundreds of thousands of dollars in annual hardware maintenance expenses were effectively eliminated.
- Refreshing the application catalog with valuable metadata let us create automation scripts tightly aligned with business needs. RTO/RPO models closely matched the needs of the clinical departments.
- Snapshot-based recovery and replication were now automation-based and push-button simple. Each app was snapshotted on its business need, then replicated. Recovery was just as simple from any snapshot.
- An entirely new Disaster Recovery (DR) model was created, with snapshot-based replication to cloud hosting. Virtual networking was also implemented, to allow for transparent IP and network-based redirection. That way, in any disaster scenario, diverting user activity to the cloud-hosted DR images did not require physically touching each and every endpoint device.
- Using the cloud as a DR platform significantly reduced the initial cost of cloud hosting. It also allowed the client to begin testing the long-term feasibility of cloud hosting production systems on a case-by-case basis.
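The per-app snapshot scheduling described above can be sketched as follows. This is an illustrative sketch only: the interval rule and the function names are assumptions, and a real platform would drive vendor-specific snapshot/replication APIs rather than pure Python.

```python
from datetime import timedelta

def snapshot_interval(rpo: timedelta) -> timedelta:
    """Assumed rule of thumb: snapshot at least twice per RPO window,
    so the newest snapshot is never older than the RPO, even mid-cycle."""
    return rpo / 2

def build_schedule(app_rpos: dict[str, timedelta]) -> dict[str, timedelta]:
    """Map each app to a snapshot interval derived from its business-need
    RPO, as recorded in the refreshed application catalog."""
    return {name: snapshot_interval(rpo) for name, rpo in app_rpos.items()}
```

Driving the schedule from catalog metadata is what made recovery "push-button simple": the schedule follows the business need automatically, instead of being hand-tuned per system.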
We also learned quite a few lessons in the process.
- A majority of applications had been updated and maintained very inconsistently. By standardizing automated system management, we brought >85% of systems into compliance with patching and updates, without any major staffing changes.
- Automation management required a new skill set for staff. Those who had Unix backgrounds and/or scripting expertise (SQL, Perl/Python, etc.) adapted quickly and were re-energized by the work.
- Standardizing virtualization and automation enabled us to reduce manual and physical intervention by staff by >90%.
- Critical applications were moved to a significantly shorter snapshot window (one hour), allowing near-immediate rollback in any impact scenario. In many instances, this guaranteed recovery in less than one hour.
- Once we were able to have a comprehensive overview of the entire server inventory (40K+), we found that there were only 12 common server builds. Further scrutiny allowed us to standardize on three server builds. Automating 40k+ servers would have been impossible. Automating 12 server builds would be difficult, but feasible. Automating three server builds, however, became a simple success.
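The build-consolidation finding above is essentially a fingerprinting exercise: group servers by the attributes that define a "build" and count the distinct groups. The sketch below illustrates the idea; the field names are hypothetical, since the client's actual inventory schema is not described in this write-up.

```python
from collections import Counter

def build_fingerprint(server: dict) -> tuple:
    """Reduce a server record to the attributes that define its build.
    The chosen fields (os, version, role) are illustrative assumptions."""
    return (server["os"], server["os_version"], server["role"])

def distinct_builds(inventory: list[dict]) -> Counter:
    """Count servers per distinct build fingerprint, revealing how many
    unique builds actually exist across a large inventory."""
    return Counter(build_fingerprint(s) for s in inventory)
```

Run against a full inventory, a count like this is what surfaces the gap between tens of thousands of servers and a handful of true builds, and it identifies the outliers to fold into the standard set.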