Case studies - Data Lake

A massive data repository for the Destination Earth initiative


Data Lake is a massive cloud-based data repository that provides the foundation for Destination Earth, the European Union's flagship initiative to develop a highly accurate digital model of Earth and contribute to the European digital and green transformation.

The digital replica of our planet will make it possible to monitor the complex interactions between Earth's ecosystems and human activities with unprecedented accuracy. This will help to better predict weather phenomena and to develop more effective strategies for counteracting the effects of climate change. The initiative contributes to the European digital and green transformation, building a more sustainable future for all European citizens.

Data Lake is delivered and operated by CloudFerro, which implements the project on behalf of EUMETSAT, providing and operating a complex cloud infrastructure distributed across several locations in Europe. The services provided for this multi-cloud Data Lake include the design, establishment, testing, operation and provision of an online inventory of massive amounts of data.

The project goals and assumptions

Rapid climate change is imposing an increasing burden on Europe's health, society and economy. To combat these environmental challenges, the European Commission, together with its partner organisations (the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), the European Space Agency (ESA), and the European Centre for Medium-Range Weather Forecasts (ECMWF)), has launched a ground-breaking initiative: Destination Earth (DestinE).

DestinE's long-term goal is to develop a digital replica of the Earth that models all of the Earth's ecosystems with high accuracy, in order to help monitor natural and human activities, predict weather and climate phenomena, and test scenarios for more sustainable development. It is supported by Horizon Europe and other relevant European and national research and innovation initiatives.

The most important benefits:

Scalability

A scalable cloud infrastructure architecture that can handle growing numbers of datasets and users in the future.

Proximity to the data

Destination Earth users can work close to where the data is stored.

Processing near data

Ensuring greater efficiency by processing only valuable data.

Distributed location

Infrastructure of data bridges built around supercomputers to ensure effective data exchange and near data processing.

Open for other providers

The system can accept data of all types, shapes and file sizes, both those stored today and those that will be stored in the future.

Harmonised Data Access

Easy access to Digital Twin data and to data from federated data providers.


Destination Earth system

The key elements of the Destination Earth system are:

  • Core Service Platform operated by ESA,
  • DestinE Data Lake managed by EUMETSAT,
  • Digital Twins managed by ECMWF.

Fig. 1. Destination Earth system. Source: EUMETSAT.

The data

The data stored in the Data Lake come from:

  • EUMETSAT’s own Earth observation satellite systems
  • ESA missions
  • Copernicus Sentinel satellites
  • ECMWF
  • other relevant European data providers.
Fig. 2. Categories of data in the Data Lake repository. Source: EUMETSAT.

Description of services provided

Data Lake is built on three services that together form the core foundation of Destination Earth (DestinE):

  • Data Access and Discovery
  • Big Data Processing Services
  • Support Service
Fig. 3. Data Lake services. Source: EUMETSAT.

The Data Lake environment that CloudFerro develops and operates is designed to hold several dozen PB of data in private clouds located in data centres across Europe. These data centres also host powerful computing facilities built under the EuroHPC programme that take part in digital modelling. The HPC supercomputers will run the models created within the Digital Twin component, and the results of the analyses will be stored in the Data Lake repository.

The data centres for Data Lake infrastructure are located in:

  • Warsaw, Poland – central site
  • Kajaani, Finland
  • Bologna, Italy
  • Darmstadt, Germany
  • Barcelona, Spain.

Fig. 4. Data Lake distributed infrastructure

The project's infrastructure consists of independent sites that are networked together over a WAN:

  • Internet access: 100G per site
  • Virtual private line between sites: 10G connection
  • Dedicated connection link to the HPC systems
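
These link capacities help explain why the Data Lake emphasises processing near the data. The short Python sketch below estimates how long it would take to move one petabyte over each link. It is a back-of-the-envelope illustration only: it assumes that "100G" and "10G" mean 100 Gbit/s and 10 Gbit/s, that a petabyte is 10^15 bytes, and that a link runs at full line rate with no protocol overhead. None of these assumptions, nor the function name, come from the project itself.

```python
def transfer_time_days(data_petabytes: float, link_gbit_per_s: float) -> float:
    """Idealised time in days to move data over a link (no overhead assumed)."""
    bits = data_petabytes * 1e15 * 8           # decimal petabytes -> bits
    seconds = bits / (link_gbit_per_s * 1e9)   # Gbit/s -> bit/s
    return seconds / 86_400                    # seconds -> days


# Assumed link speeds, read from the "10G" and "100G" figures above.
links = [
    ("virtual private line between sites (10 Gbit/s)", 10),
    ("internet access per site (100 Gbit/s)", 100),
]

for label, gbit_per_s in links:
    days = transfer_time_days(1, gbit_per_s)
    print(f"1 PB over {label}: about {days:.1f} days")
```

Under these idealised assumptions, one petabyte takes roughly 9.3 days over a 10 Gbit/s link and a little under a day over a 100 Gbit/s link, so at the scale of tens of petabytes it is far more practical to bring the computation to the data than the other way round.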

As the prime contractor for EUMETSAT, CloudFerro is responsible for:

  • Coordinating the Data Lake project
  • Delivery of cloud infrastructure (built as Infrastructure-as-a-Service and Platform-as-a-Service):
    • More than 60 petabytes of storage available for Destination Earth
    • More than 23 500 CPUs installed
  • Provision of Big Data Processing Services
  • Service operations and maintenance.

CloudFerro collaborates with two European partners: CS Group, which is responsible for discovery and data access, and EODC, which is in charge of Big Data processing.

Fig. 5. Services provided by CloudFerro and its partners in the Data Lake project.

Data Lake is a self-standing component:

  • Built from geographically distributed physical elements
  • Distributed services – with seamless access

It provides discovery and data access via Harmonised Data Access (HDA), which simplifies discovering and accessing:

  • Initially two Digital Twins (ECMWF):
    • Extreme Weather
    • Climate Change Adaptation
  • External federated data spaces, which make it possible to leverage many public data sources supported by the EU
  • Destination Earth user-generated data

Big Data Processing:

  • Processing near data including distributed computing & workflows

Data Lake is designed to provide seamless access to all data specified in the Destination Earth data portfolio. The repository will be accessible from a large number of external data spaces, Digital Twins and applications residing on the DestinE Core Service Platform, regardless of data type and location.

In general, the Data Lake service is expected to establish a scalable service framework that enables federation with different data spaces, as well as the provision of computing and storage services close to DestinE's data.

The system will be fully implemented within 7-10 years and will serve scientists, researchers, the private sector and the general public.

