Case studies - Copernicus Climate Data Store

Cloud solution for climate research support

How ECMWF is using a hybrid cloud to access, analyze and distribute one of the largest climate datasets worldwide.

The customer

The European Centre for Medium-Range Weather Forecasts (ECMWF) is both an international research institute and a 24/7 operational service that provides numerical weather predictions and manages one of the largest climate datasets in the world. ECMWF operates a large IT infrastructure, including one of the top supercomputers worldwide and a data archive of several hundred petabytes.

The challenge

ECMWF was entrusted by the European Commission to build and operate the Copernicus Climate Data Store (CDS) - a distributed system that provides access to Essential Climate Variables (ECVs), climate analyses, projections, and indicators at temporal and spatial scales relevant to adaptation and mitigation strategies for various sectoral and societal benefit areas. The major technical challenge of this project was to provide a high-performance, scalable service that would efficiently retrieve time-series-based climate data from a 180 PB tape archive and make it available for analysis, aggregation, processing, and delivery to a fast-growing community of users.

In line with the European Cloud Computing Strategy adopted by the EC, and considering the huge amounts of data already stored on ECMWF premises, the Center was looking for a hybrid cloud solution composed of a fully managed private cloud located on the Center's premises and a public cloud service that would allow the hybrid infrastructure to scale easily in step with CDS usage growth.

The cloud solution was also intended to host the CDS-Toolbox, allowing for its reliable and flexible operation, growth, and scaling across multiple environments (production, development, testing, and integration). The CDS-Toolbox is under continuous development, so the underlying cloud must be flexible enough to scale and adapt to possible changes.

The solution

To meet those needs, ECMWF selected CloudFerro, after a competitive procurement process, to supply a hybrid cloud hosting service for CDS. The private cloud hardware is physically installed on the customer's premises to enable high-bandwidth communication with the local infrastructure and to meet extended security requirements, but it is provided in a pay-per-use 'as a service' model and is fully operated and monitored by CloudFerro.

The CDS Hybrid Cloud is made of two components: an on-premises Private Cloud, directly connected to the ECMWF storage infrastructure to provide fast, high-bandwidth access to the Center's huge data stores, and CloudFerro's remote Public Cloud, which can be used to accommodate peak loads. To provide a seamless user experience, both infrastructures are managed by a common cloud system based on OpenStack - the leading open-source software for building cloud systems.

OpenStack provides rich functionality, high maturity, and a vibrant developer community. The CDS Hybrid Cloud offers all the important services of OpenStack, including authentication, compute, storage, networking, orchestration, and application catalog. The OpenStack cloud is complemented by a fully integrated, unified distributed storage system based on Ceph.

The storage system offers block and object storage services in normal (HDD) and high-performance (SSD) tiers, with transparent triple data replication, thin provisioning, online scaling, and high availability. Its main application is to act as a fast-access cache tier for data retrieved from the tape archive. Such an inexpensive disk-based cache provides tremendous performance gains over a tape archive, which is particularly inefficient for time-series retrieval.
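The caching idea can be illustrated with a minimal read-through cache. This is only a sketch of the general pattern, not a description of how the Ceph-based tier is actually built: the class name, the fetch callback, and the least-recently-used eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Illustrative LRU read-through cache in front of a slow archive."""

    def __init__(self, fetch_from_archive: Callable[[str], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch_from_archive      # hypothetical slow archive read
        self._capacity = capacity             # maximum number of cached items
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            # Served from the fast tier; mark the entry as recently used.
            self.hits += 1
            self._entries.move_to_end(key)
            return self._entries[key]
        # Not cached: go to the slow archive once, then keep a copy.
        self.misses += 1
        data = self._fetch(key)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data


# Stand-in for a tape read; records every call that reaches the archive.
archive_calls = []

def fake_tape_fetch(key: str) -> bytes:
    archive_calls.append(key)
    return f"field:{key}".encode()


cache = ReadThroughCache(fake_tape_fetch, capacity=2)
for day in ["2020-01-01", "2020-01-02", "2020-01-01"]:
    cache.get(day)
print(cache.hits, cache.misses)  # 1 2
```

The repeated request for the first day never reaches the archive, which is the effect that matters for time-series retrieval, where the same fields tend to be read many times.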

Advantages of the CDS Hybrid Cloud

The CDS Hybrid Cloud comes fully configured with predefined Virtual Machine flavors, various operating system images, predefined external network configuration, accounting, and monitoring. The CDS Hybrid Cloud has been connected to the ECMWF LAN infrastructure using 40 Gbps interfaces in conformance with the access rules defined by the Customer.

The CDS Hybrid Cloud provides flexible mechanisms to provision resources and manage access rules in different virtual environments that share the underlying hardware infrastructure. On the logical level, these environments are completely separate, with independent IP address spaces, networks, users, and other namespaces. The access management system allows configuring flexible access rules to the different environments. Integrated accounting mechanisms can be used to manage resource usage.

The CDS Hybrid Cloud provides high availability of services by using redundancy and replication in all the important system components, including cloud controllers, networks, and storage servers. All the cloud functionalities are available from a web portal (Horizon Dashboard) and via standard, documented APIs, allowing for easy automation and integration with other systems.

CDS Hybrid Cloud resources are organized in two zones - on-premises and remote. Both zones are administered using a single interface allowing users to select the location of their resources. In order to reflect the hybrid nature of the CDS Hybrid Cloud, we provide separate billing and scaling methods for both zones. The On-premises zone is billed based on the number of resources installed on-premises. Its resources may be scaled within weeks by installing extra hardware. The remote zone uses the CloudFerro DC infrastructure and is billed per virtual resource usage.
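The difference between the two billing models can be sketched as follows. The function names, units, and rates are hypothetical placeholders chosen for illustration; they do not reflect actual CloudFerro pricing.

```python
def on_premises_charge(installed_units: int, rate_per_installed_unit: float) -> float:
    """On-premises zone: charge follows installed capacity, not actual use."""
    return installed_units * rate_per_installed_unit


def remote_charge(unit_hours_used: float, rate_per_unit_hour: float) -> float:
    """Remote zone: charge follows the virtual resources actually consumed."""
    return unit_hours_used * rate_per_unit_hour


# Hypothetical figures: the same installed capacity in both months,
# with remote resources used only during the busy month.
quiet_month = on_premises_charge(100, 2.0) + remote_charge(0, 0.5)
busy_month = on_premises_charge(100, 2.0) + remote_charge(400, 0.5)
print(quiet_month, busy_month)  # 200.0 400.0
```

Under these assumptions the on-premises part of the bill stays constant, while the remote part appears only when peak load is pushed to the public cloud.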

The CDS Hybrid Cloud is fully managed by CloudFerro staff. Systems administration, monitoring, and management operations are handled remotely from CloudFerro offices using secure VPN connectivity. Tasks requiring physical intervention on the on-premises cloud, such as hardware upgrades or maintenance, are handled by our remote hands, deployed on the customer's premises. Throughout the project, CloudFerro has also provided extensive professional services, working with the Customer to ensure smooth and efficient deployment of CDS applications in the cloud.

Summary

The CDS cloud started with 840 vCores, 6.78 TB of RAM, and 1 PB of disk storage and has successfully grown to 1680 vCores, 8.51 TB of RAM, and 3.89 PB of disk storage within a few years. It produces over 40 TB of climate data daily, directly serving a community of over 30,000 users and helping to improve the future of 7.8 billion others.
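For reference, the growth factors implied by these figures can be computed directly. The dictionary keys are labels chosen for this example; the numbers are the ones quoted above.

```python
# Initial and current capacity as quoted in the summary.
initial = {"vCores": 840, "RAM (TB)": 6.78, "disk (PB)": 1.0}
current = {"vCores": 1680, "RAM (TB)": 8.51, "disk (PB)": 3.89}

# Growth factor per resource: current capacity divided by initial capacity.
growth = {name: current[name] / initial[name] for name in initial}

for name, factor in growth.items():
    print(f"{name}: x{factor:.2f}")
# vCores: x2.00
# RAM (TB): x1.26
# disk (PB): x3.89
```

Compute capacity doubled, while disk storage grew almost fourfold, consistent with the storage system's role as a cache for the tape archive.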

Customer benefits

  • Scalable and cost-effective hybrid cloud solution.
  • Flexible, pay-per-use purchase model with dedicated private infrastructure installed on-site.
  • Open, vendor-neutral solution based on industry-leading cloud frameworks OpenStack and Ceph.
  • Fast, successful, cost-effective on-demand scaling.
  • Efficient resolution of tape archive-related performance bottleneck.
  • Rich functionality available over standard APIs and Web Portal.
  • Seamless integration with local customer infrastructure.
  • Fully separated virtual environments for production, tests, development, and other purposes.
  • Flexibility in terms of integration and functionality (open source), on-premises and remote cloud scaling, and the configuration of environments.
  • Professional services supporting the deployment of the CDS application in the cloud.
  • Highly optimized hardware setup, including fast connectivity, a fast SSD-based storage tier, and high-performance solutions for high-bandwidth processing of Earth Observation data.
