The European Centre for Medium-Range Weather Forecasts (ECMWF) is both an international research institute and a 24/7 operational service that provides numerical weather predictions and manages one of the largest climate datasets in the world. ECMWF operates a large IT infrastructure, including one of the top supercomputers worldwide and a data archive of several hundred petabytes.
ECMWF was entrusted by the European Commission to build and operate the Copernicus Climate Data Store (CDS) - a distributed system that provides access to Essential Climate Variables (ECVs), climate analyses, projections, and indicators at temporal and spatial scales relevant to adaptation and mitigation strategies for various sectoral and societal benefit areas. The major technical challenge of this project was to provide a high-performance, scalable service that would efficiently retrieve time-series-based climate data from a 180 PB tape archive and make it available for analysis, aggregation, processing, and delivery to a fast-growing community of users.
In line with the European Cloud Computing Strategy adopted by the EC, and given the huge amounts of data already stored on ECMWF premises, the Centre sought a hybrid cloud solution composed of a fully managed private cloud located on the Centre's premises and a public cloud service that would allow the hybrid infrastructure to scale easily in step with CDS usage growth.
The cloud solution was also intended to host the CDS-Toolbox, allowing for its reliable and flexible operation, growth, and scaling across multiple environments (production, development, testing, and integration). The CDS-Toolbox is under continuous development and requires the underlying cloud to be flexible enough to scale and adapt to possible changes.
To meet those needs, and following a competitive procurement process, ECMWF selected CloudFerro to supply a hybrid cloud hosting service for CDS. The private cloud hardware is physically installed on the customer's premises to enable broadband communication with the local infrastructure and to meet extended security requirements, but it is provided in a pay-per-use 'as a service' model and is fully operated and monitored by CloudFerro.
The CDS Hybrid Cloud consists of two components: an on-premises private cloud, directly connected to the ECMWF storage infrastructure to provide fast, high-bandwidth access to the Centre's huge data stores, and CloudFerro's remote public cloud, which can be used to accommodate peak loads. To provide a seamless user experience, both infrastructures are managed by a common cloud system based on OpenStack - the leading open-source software for building cloud systems.
OpenStack provides rich functionality, high maturity, and a vibrant developer community. The CDS Hybrid Cloud offers all the important OpenStack services, including authentication, compute, storage, networking, orchestration, and application catalog. The OpenStack cloud is complemented by a fully integrated, unified distributed storage system based on Ceph.
The storage system offers block and object storage services in normal (HDD) and high-performance (SSD) tiers, with transparent triple data replication, thin provisioning, online scaling, and high availability. Its main application is to act as a fast-access cache tier for data retrieved from the tape archive. Such an inexpensive disk-based cache provides tremendous performance gains over a tape archive, which is particularly inefficient for time-series retrieval.
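The idea of a disk cache in front of a slow archive can be illustrated with a minimal read-through cache. This is only a conceptual sketch, not the mechanism used by Ceph or by the CDS itself: the class name `ReadThroughCache`, the `fetch_from_archive` callback, and the least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy LRU read-through cache standing in for a disk tier in front of tape."""

    def __init__(self, fetch_from_archive: Callable[[str], bytes], capacity_bytes: int):
        self._fetch = fetch_from_archive  # slow path (hypothetical tape retrieval)
        self._capacity = capacity_bytes   # total bytes the cache may hold
        self._used = 0                    # bytes currently held
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        """Return the data for key, going to the archive only on a miss."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)           # expensive archive read
        self._store(key, data)
        return data

    def _store(self, key: str, data: bytes) -> None:
        if len(data) > self._capacity:
            return                        # too large to cache; serve it uncached
        # Evict least recently used entries until the new item fits.
        while self._used + len(data) > self._capacity:
            _, evicted = self._items.popitem(last=False)
            self._used -= len(evicted)
        self._items[key] = data
        self._used += len(data)
```

Repeated requests for the same key are then served from the cache, and only the first request (or a request made after eviction) reaches the archive. The `hits` and `misses` counters make that ratio observable.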
Advantages of the CDS Hybrid Cloud
The CDS cloud started with 840 vCores, 6.78 TB of RAM, and 1 PB of disk storage and has grown to 1,680 vCores, 8.51 TB of RAM, and 3.89 PB of disk storage within a few years. It produces over 40 TB of climate data daily, directly serving a community of over 30,000 users and helping to improve the future of 7.8 billion others.
- Scalable and cost-effective hybrid cloud solution.
- Flexible, pay-per-use purchase model with dedicated private infrastructure installed on-site.
- Open, vendor-neutral solution based on industry-leading cloud frameworks OpenStack and Ceph.
- Fast, successful, cost-effective on-demand scaling.
- Efficient resolution of the tape-archive performance bottleneck.
- Rich functionality available over standard APIs and a Web Portal.
- Seamless integration with local customer infrastructure.
- Fully separated virtual environments for production, tests, development, and other purposes.
- Flexibility in terms of integration and functionality (open source), on-premises and remote cloud scaling, and the configuration of environments.
- Professional services supporting the deployment of the CDS application in the cloud.
- Highly optimized hardware setup, including fast connectivity, a fast SSD-based storage tier, and performant solutions for high-bandwidth processing of Earth Observation data.