How to use SPOT instances?
by Mateusz Ślaski, Sales Support Engineer, CloudFerro
What are SPOTS?
SPOTS are cloud computing resources (virtual machines) available at discount prices. They run using unused provider capacity and may be stopped when the demand for standard-priced resources increases.
CloudFerro implementation of SPOT instances
Typical SPOTS implementations provide floating prices depending on resources availability. CloudFerro cloud offers a simplified SPOTs pricing model. SPOT instances are available for high-memory, GPU-equipped and ARM CPU-based instances. The prices are flat and not related to changes in availability of resources.
At the time of writing this article, SPOT instances prices were:
- 2 times cheaper for GPU-equipped instances
- 3 times cheaper for high-memory instances
- 1.8 times cheaper for ARM-based instances
If cloud resources are available, SPOT instances may be launched.
The general documentation about SPOT instances usage is “Spot instances on CREODIAS” https://docs.cloudferro.com/en/latest/cloud/Spot-instances-on-CloudFerro-Cloud.html. It covers important technical details. However, it does not provide more hints on when and how to use SPOT instances effectively.
Usage examples
SPOT instances are a great choice for:
- Stateless services.
- Short life-span entities.
- Workloads that can quickly dump current data and status when an interruption signal comes.
- Jobs that can be divided into smaller steps executed separately.
Or any combination of the above.
More specific examples are described below.
Test environments with short lifetimes
QA teams often create test environments using IaaS tools, automatically deploy software to be tested execute manual or automated tests, and finally destroy the complete environment. Such test environments have short lifetimes measured in minutes or, at most, hours.
The situation where such tests might be sometimes cancelled due to instance eviction can be acceptable when tests environments costs significantly less.
Additionally, the environment provisioner can be configured so that if SPOTs are not available, on-demand instances will be used for this single test occurrence.
Web services
Most web services can be based on stateless services.
However, you need to plan resources for stable service operation for low and medium load as in case of load spikes. You can configure auto-scaling tools to run on demand instances for low and medium load and attempt running the services necessary for spike handling using SPOT instances if they are available. The cost of spike handling is significantly lower. Since service instances dedicated to handling load spikes typically have a short lifespan, the chance of eviction is relatively low.
Another scenario in this area is to go further with this economical approach and configure scaling in a way that on-demand instances are used only for a low constant load range. All instances required to fulfill medium and high load demand would be run on SPOT instances. However, with this approach monitoring and eventually replacing evicted SPOTS instances with on-demand instances would be crucial to assure your service quality.
Batch processing
You can use SPOT instances as processing workers if your batch process can be divided into smaller tasks or steps, and the processing workflow is configured in such a way that the entire workflow does not pause or stop in case of stopping or failure of a single step.
The general architecture for such a solution should consist of:
- Persistent storage
This will be the place for all input data, temporary results saved after each calculation step and results. Technically it can be:- An on-demand instance with:
- An NFS server with block storage volume attached.
- A dedicated service or database.
- A bucket in object storage.
- An on-demand instance with:
- Workers on SPOT instances.
They will process the work in possibly small sub-batches saving output to persistent storage. - Persistent VM acting as a control node or other work scheduling/coordination solution.
All workers should check the status of tasks to be executed here and report which tasks they are taking to avoid repeated tasks or worse, falling into a deadlock.
This part is not a must. You may design the entire process and prepare workers software in such a way that coordination is performed by them - usually by saving work status on storage. However, a dedicated coordination node would contain additional logic and simplify monitoring status and progress.
SPOTS Implementation tips
Resource availability and allocation order
As mentioned at the beginning of this article, SPOT instances may be provisioned if cloud contains enough resources for the selected flavor.
If many users started allocating SPOT resources at the same time, all of them would be treated with the same priority. Instances would be created in an order that requests to the OpenStack API are received.
Speed up instances provisioning by launching them from dedicated image
In the case of SPOT instances, fast VM activation is needed. The best practice to achieve this is to
- Install all needed software on a standard VM with the same or smaller flavor than the expected SPOT flavor.
- Test it.
- Create a snapshot image from the VM.
- Use this image as the source of final SPOT instances.
If you do not have much experience with virtual machine images maintenance, you may find it beneficial to learn from available resources and develop your own procedure and tools. I strongly recommend using automation scripts, dedicated tools such as OpenStack Diskimage-Builder, or automation tools like Ansible. Infrastructure as a code tools, as for example Terraform, should be considered.
Below, you will find a simple example of how you can prepare such image using a user-data option when creating the instance.
Please create a new VM using the following command:
openstack server create --image MY_WORKER_IMAGE --flavor spot.hma.large --user-data MY_INSTALL_SCRIPT.sh VM_INSTANCE_NAME
Where MY_INSTALL_SCRIPT.sh contains for example:
#!/bin/sh
apt-get update
apt-get install -y python3 python3-virtualenv
sudo -u eouser bash -c \
'cd /home/eouser && \
virtualenv app_venv && \
source ./app_venv/bin/activate && \
git clone https://github.com/hossainchisty/Photo-Album-App.git && \
cd Photo-Album-App && \
pip install -r requirements.txt'
A VM created in this way will have:
- Python installed.
- An example application.
- Created virtual environment.
- All application requirements installed within the virtual environment.
You can then create an image from this machine by executing the following commands:
openstack server stop VM_INSTANCE_NAME
openstack server image create VM_INSTANCE_NAME --name IMAGE_NAME
The image IMAGE_NAME may be used as a source to create SPOT instances with the same or higher resource requirement flavor.
Reacting on an instance termination attempt
The documentation “Spot instances on CREODIAS” https://docs.cloudferro.com/en/latest/cloud/Spot-instances-on-CloudFerro-Cloud.html mentions that instances can be tagged during their creation or later with callback_url:<url value> tag.
This allows us to react when the OpenStack scheduler tries to delete our SPOT instance.
The first reaction option is to convert the instance to an on-demand instance. You need to prepare a persistent service able to receive message about the attempt to delete the instance. Then include the dedicated endpoint’s full address in the tag. When the service receives the instance UIID in message, it should execute script containing the following OpenStack client commands:
openstack server resize --flavor NON_SPOT_FLAVOR VM_INSTANCE_UIID
When output of command:
openstack server list -f value -c ID -c Status | grep VM_INSTANCE_UIID
shows the status VM_INSTANCE_UIID VERIFY_RESIZE,
then the resize may be confirmed with a command:
openstack server resize confirm VM_INSTANCE_UIID
However, we need to consider the following:
- The resize process forces instance to reboot, so if your workload is not prepared, you may lose your data.
- It is not possible to resize this instance back to a SPOT flavor.
Manual resizing from a SPOT flavor to on-demand is also possible from the Horizon GUI, but it should be treated as a planned preventive action because humans usually would not have any time to react with the resize after receiving the notification.
If your processing software can save the current state or temporary result, the second reaction option is to place the service on the instance and activate it on a system boot. Then, you need to tag the instance with its own IP address in URL in “callback_url” tag.
When the service on the instance receives a message with information about the instance delete attempt, it may trigger the interruption of the workload and save the current state or temporary result on the persistent storage. In this way, the job can be continued by another still active SPOT instance or on an on-demand instance (already active or provisioned if selected number of workers is requested).
Flexible SPOT instances
When creating SPOT instances, you may need to hardcode some configuration data, such as the addresses of input data servers or storage destinations for the results. This can be done using the solution provided in the previous section. However, if you wish to run many different projects, customizing the instance becomes necessary.
A simple functional solution is to:
- Prepare your software to use configuration provided in environment variables
- Inject those variables values thru “cloud-init” when the instance is created
For example, you can create an instance in this way:
openstack server create --image MY_WORKER_IMAGE --flavor spot.hma.large --user-data MY_CONFIG_FILE.yaml VM_INSTANCE_NAME
If your program retrieves data and saves results to REST API endpoints, MY_CONFIG_FILE.yaml could contain:
#cloud-config
runcmd:
- echo "INPUT_DATA_API='https://192.168.1.11/input/'" >> /etc/profile
- echo "OUTPUT_DATA_API='https://192.168.1.11/output/'" >> /etc/profile
- echo "DATA_USER='username'" >> /etc/profile
This file can be generated from a template manually or by a tool/script.
If you are using Terraform, the configuration can be applied using Terraform variables in this way:
#cloud-config
runcmd:
- echo "INPUT_DATA_API='${var.input_data_api}'" >> /etc/profile
- echo "OUTPUT_DATA_API='${var.input_data_api}'" >> /etc/profile
- echo "DATA_USER='username'" >> /etc/profile
Releasing SPOT instances resources
Using SPOTs is related to the demand for lowering the costs.
If you wish to release resources when the workload is done to minimize the costs even further, then you can consider the following:
- Install and configure a load monitoring tool on the image used to create instances. Check periodically for idle instances, then manually delete idle instances.
- Design and develop notifications about finished tasks into your workload software. Then manually delete the idle instance after receiving a notification.
- Upgrade the previous solution with notifications sent to a software agent working on the persistent VM which can automatically delete idle instances. This solution would be beneficial when the savings from deleting idle instances exceed the cost of maintaining a persistent VM based on a cheap, low-resource flavor.
- Finally, you can build the capability for our software to self-destruct the instance. This can be done in this manner:
- Configure the image with an OpenStack client and credentials.
To do this, follow the steps provided in the documents:
“How to install OpenStackClient for Linux on CREODIAS” https://docs.cloudferro.com/en/latest/openstackcli/How-to-install-OpenStackClient-for-Linux-on-CloudFerro-Cloud.html
and
“How to generate or use Application Credentials via CLI on CREODIAS” https://docs.cloudferro.com/en/latest/cloud/How-to-generate-or-use-Application-Credentials-via-CLI-on-CloudFerro-Cloud.html
and information given in the previous chapter “Speed up instances provisioning by launching them from dedicated image”
- Building into your application procedure accessing "http://169.254.169.254/openstack/latest/meta_data.json" to retrieve the instance UUID and then executing self-destruct with command:
openstack server delete INSTANCE_UUID
Kubernetes Autoscaling
We can avoid many of the previous traps by moving to the next level of abstraction: using a Kubernetes cluster created with OpenStack Magnum, featuring autoscaling on SPOT instances. With this solution, SPOT instances will be used to create Kubernetes cluster nodes when necessary. When the demand for resources decreases, nodes will be automatically removed after a configured autoscaling delay.
As autoscaling and eventual node eviction will be performed on the separated and dedicated node group, then core services on the default-workers node group will not be affected. This solution is cost-effective if the cost of maintaining a working cluster is lower than keeping idle SPOT instances. You need to:
- Create a production cluster with 3 control nodes and a minimum one work node. More details can be found in one of the CREODIAS mini-guides.
- Read documentation about node groups https://docs.cloudferro.com/en/latest/kubernetes/Creating-Additional-Nodegroups-in-Kubernetes-Cluster-on-CloudFerro-Cloud-OpenStack-Magnum.html
- Create a new node-group using one of spot.* flavors.
For example:
openstack coe nodegroup create --min-nodes 1 --max-nodes 3 --node-count 1 --flavor spot.hma.medium --role worker CLUSTER_NAME spot-workers
This command crates a node group named “spot-workers” with a single node activated just after the group creation, and maximum 3 nodes.
Note that explicit definition of all options from the command above is necessary to activate autoscaling on the new group.
Define the affinity of all your persistent services to the node group default-worker.
Follow the example of Kubernetes deployment definition:
apiVersion: v1
kind: Pod
metadata:
name: ubuntu-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: magnum.openstack.org/nodegroup
operator: In
values:
- default-worker
containers:
- name: ubuntu
image: ubuntu:latest
command: ["sleep", "3600"]
- Define the affinity of worker pods to the node group based on spot instance nodes. Follow the example of Kubernetes deployment definition. Remember to replace “spot-workers” name with your node-group name if you use another one:
apiVersion: apps/v1
kind: Deployment
metadata:
name: counter-log-deployment
labels:
app: counter-log
spec:
replicas: 4
selector:
matchLabels:
app: counter-log
template:
metadata:
labels:
app: counter-log
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: magnum.openstack.org/nodegroup
operator: In
values:
- spot-workers
containers:
- name: counter-log
image: busybox:1.28
args: [/bin/sh, -c, 'i=0; while true; do echo "$i: $(date) $(hostname) $(uuidgen)-$(uuidgen)"; i=$((i+1)); sleep 1; done']
resources:
requests:
cpu: 200m
Final Thoughts
Using SPOT instances certainly brings significant cost savings, but it also provides several other benefits:
- You gain a better understanding of how cloud virtual machine instances work and how to efficiently create, configure, and initialize them.
- IaaC. If you do not use this already, with staring using SPOTS you have opportunity to learn and start to use Infrastructure as a code tools as Terraform.
- Failure (SPOT eviction) is not an accidental behavior but a feature. Fault tolerance must be a core part of the system design.
- In the case of long-running processes:
- You need to better understand the workflow to split it into smaller chunks that can be processed by SPOT instances. This may provide an opportunity to enable or improve parallel data processing.
- Data structures for checkpoints and temporary data should be analyzed, which may lead to data size optimization. - Adding components dedicated to coordination gives an opportunity to implement detailed monitoring of the entire system’s processing status and progress.
Key benefits of using public cloud computing platform based on open source technologies
By Marcin Kowalski, Product Manager - CloudFerro
Open source technologies are the foundation of many modern IT environments. They can also act as the basis for cloud computing infrastructure, including platforms operating on a great scale.
Open source technologies i.e. based on an open code, allow their users to become independent of many aspects, including geopolitical factors. Such solutions also make it tremendously easier to ensure superior scalability. In this article we present the benefits that open source can bring, and find out why some cloud service providers are betting on open source technologies.
Let us take as an example our cloud computing services, where we build and operate clouds for collecting, storing, sharing, and processing various satellite data on behalf of institutions such as the European Space Agency, the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT), and the European Centre for Medium-Range Weather Forecasts (ECMWF), among others. Every day, we provide thousands of users around the world with access to PB-counted volumes of data delivered by the Sentinel constellation’s European satellites, and a cloud environment designed specifically for their processing.
The nature of the data processed defines significantly the cloud computing architecture. With regards to satellite data, it is crucial to ensure the highest possible availability of both historical and current data (satellite data is available to the user within 15 minutes after it is generated), as well as the scalability and stability of long-term development. This is possible through the use of open source technology and the competences of the provider's team.
Open source as the foundation for cloud computing platform
CloudFerro's cloud platform has been built from scratch using open source technologies as a multitenant environment. We address our services to large European public institutions, and the vast majority of them prefer open-code solutions. From their perspective, open source means the ability to guarantee the continuity of IT environments in the long term.
The ability to scale infrastructure horizontally, i.e. the fact that at some point we will not face a limitation in the ability to expand resources or the ability to support another group of users, is of great importance to the essence of the public cloud model.
From the perspectives of both the provider and the users of cloud environments based on open source solutions, this translates into a number of benefits. Let's summarize the most significant ones.
1. Avoiding the risk of vendor lock-in
In practice, it is important for the supplier, among other things, to ensure the independence of the hardware and application layers. This, in turn, eliminates possible risk of excessive attachment to a single vendor. Over-reliance on one vendor's solutions naturally poses a risk not only to business continuity, but also to the ability to freely set the directions of platform development.
Clients and providers need to make sure that the software we use as a basis for our services, and very often also for our business model, will be available, supported and developed in a long-term perspective. For instance, let’s look at a well-known case of VMware company who has changed its license model, ending the sale of perpetual licenses for their products, and transitioning to a subscription model. This has put many customers in a difficult situation as they have to take into account additional workload and expenses.
The modularity of this technology, that is quite natural for the open source approach, also meets these needs - both in the hardware and software layers. For instance, any component, or even the entire cloud computing technology stack, can be replaced at any time. This gives organizations using such solutions have, among other things, a degree of versatility and the ability of changing easily their cloud service provider. Workloads or data based on standard open source solutions can be directly migrated to another environment providing the same mechanisms. This also translates into new opportunities for ensuring business continuity.
2. High level of technological and operational security
A separate issue is the ability to fine-tune the environments based on open source solutions in line with customers’ or users’ needs. It is of great value that anyone, if they only wish, can thoroughly analyze the application code to fully understand how the entire technology stack works. Openness and technological transparency have impact on the level of data security and user burden.
Statistically, open code technologies do not stand out with a higher scale of vulnerabilities than closed solutions. There are hundreds or thousands of people working on particular solutions within the open source community, so there is a good chance that potential vulnerabilities will be quickly detected and neutralized. The specificity of CloudFerro's cloud services also requires ensuring high resilience and business continuity. This, in turn, is reflected in the framework of internal procedures and regulations implemented by the operator of the cloud.
Last but not least, understanding how the cloud platform components function on a low level, paves the way for implementing unique modifications to tailor its functioning to customers' specific needs.
3. Multi-user accessibility and data isolation of individual users
One thing that differentiates various solutions based on open source technologies is the ability to separate the data and processing used by different users of public cloud environments. This translates into non-obvious technology decisions.
A critical aspect that determines which technology stack will be selected to build public cloud environment is the ability to ensure full multitenancy and complete isolation of individual user's data in a transparent manner. There are a number of technologies that perform well in a single-tenant architecture, but far less well in a multi-tenant architecture. This is especially true in terms of data security issues.
At the same time, as our team’s experience shows, it is possible to build a multi-tenant cloud computing platform that provides full isolation of customer data based entirely on open source technologies. Ensuring end-to-end data separation of individual users of a single cloud environment - including users representing different organizations - is crucial in terms of security, as well as the ability to comply with regulations in certain industries.
4. Cloud platform functional development and adaptation to particular needs
The ability to customize or change the solutions that make up a cloud platform are also beneficial to customers as they can have a much greater impact on setting the development priorities than in case of closed-code solutions.
In the case of closed solutions, their users have a fairly limited influence on the directions of technology or feature development. With open source solutions, the possibilities for determining the vectors of such changes are generally very broad. However, they pose business and technical challenges.
The base for such plans is, obviously, the dialogue between the client and the provider. For many institutions, the very possibility of negotiating the cloud platform development roadmap means greater freedom in defining strategic business plans.
However, in terms of customizing standard capabilities of the open source technology stack, the experience and competence of the team are of great importance, as well as the scale of operations that affects the effectiveness of customization projects.
Naturally, it is unlikely that a single organization is able to implement all ideas on its own, but it can envision at least the direction of the necessary changes. In large-scale organizations like CloudFerro, such a discussion mainly comes down to setting priorities and gradually implementing planned changes. In this sense, open source is much safer and more flexible than closed-code solutions.
5. Freedom to switch suppliers
The greater flexibility of open source technology also means hardware independence. From the vendor's perspective, it is easier to change various elements of the technology stack. For users, it provides greater freedom to migrate workloads to another platform, whether cloud-based or on-premise.
The technology standardization is key here. As the solutions used are technologically similar, they can be more clearly priced, which in effect makes it possible to compare offers of different suppliers. At the same time, technological consistency ensures that the possibility to change supplier is technically feasible with relatively little effort.
In this context, standardization at the level of the technology stack, regardless of possible functional or technological adjustments, leads to into greater universality and versatility in the use of cloud platforms in the face of dynamically changing needs.
6. Price, quality and functional competitiveness
The example of CloudFerro platform shows that the software layer can play a significant role in terms of differentiating cloud platforms in the market. From the providers' perspective, independence in composing the technology stack and freedom of choosing hardware create unique opportunities to compete on price, quality and functional scope of the cloud environment. In turn, from the customers' point of view, these attributes are a significant differentiator between cloud environments based end-to-end on open source solutions and other cloud computing services, particularly those using closed-code technologies. Greater flexibility in terms of cost, quality and functionality, among other things, means the ability to fine-tune the way the entire cloud platform operates to suit unique needs. Significant differences also exist between environments based on open source technologies.
Our customers expect open source solutions. They don't want closed technologies because they are aware of the potential risks associated with them. Instead, they know that open and closed technologies can provide the same values.
Enhancing Earth Observation capabilities - satellite data applications and implications
By Maciej Myśliwiec, Planet Partners
The view from space offers an expanded perspective on Earth. It is remarkable how data can be obtained and comprehended through satellite-based Earth observation. Sixty-five years after the launch of the first artificial satellite, space technology continues to provide an immense volume of data utilized by analysts and scientists worldwide, enabling the development of solutions across various disciplines. Let us explore the breadth of observations that can be done from space!
Currently, over 4,500 artificial satellites orbit our planet. Among them, there are the satellites of the European Union’s Copernicus programme – the leading provider of satellite data in Europe. Each day Copernicus delivers over 12 terabytes of data that is freely accessible to all users. With a simple internet connection, one can access the extensive collection of satellite images gathered since the programme's inception in 2014, with the launch of the first Sentinel-1A satellite.
Managing and processing such vast amounts of data necessitates substantial computational and storage resources. Earth Observation (EO) platforms like CREODIAS provide users with the means to transform EO data into valuable information. CREODIAS currently houses a repository of over 34 petabytes of data and an integrated cloud computing environment, enabling efficient processing of EO data.
Presently, half a million users of the Copernicus programme process more than 16 terabytes of data daily. To put this into perspective, 1 terabyte equates to approximately 250,000 photos captured by a 12-megapixel camera or 500 hours of high-definition video footage. Hence, we speak of millions of images and substantial quantities of other data (e.g., from satellite sensors) generated within the European programme every single day. The easy accessibility of satellite data fuels innovation across various industries and scientific domains. The current market turnover for Earth observation data is estimated at €2.8 billion, with over 41% of the European industry relying on data obtained from Earth observation programmes.
In recent times, several initiatives have emerged to facilitate data access and processing for scientists, researchers, public administrations, the private sector, and the general public. One of these initiatives is the Copernicus Data Space Ecosystem, which aims to provide comprehensive spatial and temporal coverage of Copernicus EO data, immediately accessible to all users free of charge, along with cloud computing resources for further data processing. Another notable initiative is Destination Earth, the flagship project of the European Commission that strives to develop an advanced digital model of the Earth powered by European supercomputers (HPC) and state-of-the-art artificial intelligence technologies.
The convenient access to current and historical Earth Observation data enhances planning and decision-making across diverse fields. How can we leverage these vast resources of satellite data?
Environmental research and climate change
One of the primary applications that come to my mind is the study of climate change and meteorology. EO data plays a crucial role in developing meteorological models for weather forecasting, preparing for and mitigating natural disasters, as well as monitoring and addressing climate change.
Accurate knowledge of current land cover and land use is pivotal for effective planning and management by local and regional authorities. Areas under the jurisdiction of public administration institutions, such as national parks, wetlands, lakes, riverbeds, and coastlines, benefit immensely from open satellite data captured by Sentinel-1 and Sentinel-2 satellites. These data offer extensive possibilities for managing such areas with minimal effort. Even 10-meter multispectral data is sufficient for a range of environmentally significant applications, including drought monitoring, flood management, freshwater ecosystem health assessment, forest cover analysis, biodiversity monitoring, air quality monitoring, and land surface temperature analysis.
Sentinel satellite missions, part of the European Copernicus programme, have revolutionized Earth research since 2014. The ubiquitous availability of freely accessible satellite data is complemented by specialized tools that enable their processing and analysis, converting raw data into actionable information.
Illustrative examples of climate change analysis tools include:
- "Global temperature trend monitor": an application that monitors global air temperature changes on Earth's surface over recent decades.
- "Climate monitoring and volcanic eruptions": an application studying volcanic eruptions, which can cause short-term climate effects on a global and regional scale, contributing to natural climate variability.
- "European temperature statistics derived from climate projections": an application providing European temperature statistics for historical periods (1976-2005) and projected 30-year periods (2031-2060 and 2071-2100) under different climate change scenarios.
These applications represent a fraction of the vast array of possibilities available to users of satellite data programs, tailored to address their specific research inquiries.
Forestry
Most forests are managed by state-owned enterprises for timber production, which significantly contributes to national budgets. Effective forestry management requires comprehensive knowledge of soil characteristics, water dynamics, climate conditions, and microclimates across expansive and sometimes challenging-to-access areas. Satellite data offers valuable insights into forest dynamics and their influence on climate patterns. Real-time spaceborne data serves as an invaluable tool for locating forest fires, monitoring plant diseases, and assessing the impact of drought on tree health.
Agriculture
Earth Observation supports various agricultural processes. At the local level, data assists in identifying optimal locations for cultivation, predicting crop yields, and maintaining records of agricultural parcels. Advanced remote sensing techniques, such as Synthetic Aperture Radar (SAR) and hyperspectral analysis, are increasingly being applied in precision agriculture. At regional or national scales, EO data facilitates land cover classification and supports the implementation of programs aimed at agricultural development and direct subsidies for farmers.
Spatial planning
Satellite data plays a vital role in spatial planning for both urban and rural landscapes. Leveraging EO data, one can conduct detailed surveys and complex analyses of areas of interest. Currently, imagery serves as a primary source of information for formulating up-to-date spatial development plans, considering changes in land use and cover, identifying new areas for investment, and detecting wetland areas. Copernicus Sentinel-2 data, for instance, provides rapid and comprehensive insights into current and historical phenomena. Investors seeking to expand their activities in remote regions greatly benefit from information on the status of potential investment areas.
Urban heat islands
Satellite-based Earth observation data also contributes to urban management, including the monitoring and mitigation of urban heat islands, characterized by significantly higher air temperatures within urban areas compared to adjacent rural regions. Unlike traditional point-based air temperature measurements from monitoring stations, satellite data enables the measurement of surface temperatures at any location within a city. This capability facilitates the development of spatial patterns critical for understanding the phenomenon and aids local authorities in taking necessary measures to improve residents' quality of life.
Maritime transport and logistics
Satellite imagery plays a crucial role in the monitoring and detection of vessel traffic on seas and oceans. It also supports the management of maritime economies and ensures the safe transportation of valuable commodities such as liquefied chemicals and crude oil. In the event of accidents, EO images provide timely assistance in mapping the consequences. Combining EO data with Automatic Identification System (AIS) information yields a powerful tool for monitoring marine objects and phenomena, including coastal monitoring and analysis of changing water depths.
Crisis management
Natural disasters like floods, wildfires, and storms often have a wide spatial extent. Open satellite data forms the foundation for the detection and monitoring of such events. Specialized methods, relying on environmental and atmospheric satellite data, are employed to detect favourable conditions and early warning signs of potential natural hazards like droughts or algal blooms in lakes or seas. Synthetic Aperture Radar (SAR) image processing is recommended for monitoring land movement, particularly for applications such as landslide risk detection, vital for mining and mountain tourism. Very high-resolution data plays a crucial role in assessing building disasters, managing mass events, and ensuring the security of government and military facilities.
Natural disasters like floods, wildfires, and storms often have a wide spatial extent. Open satellite data forms the foundation for the detection and monitoring of such events. Specialized methods, relying on environmental and atmospheric satellite data, are employed to detect favourable conditions and early warning signs of potential natural hazards like droughts or algal blooms in lakes or seas. Synthetic Aperture Radar (SAR) image processing is recommended for monitoring land movement, particularly for applications such as landslide risk detection, vital for mining and mountain tourism. Very high-resolution data plays a crucial role in assessing building disasters, managing mass events, and ensuring the security of government and military facilities.
Satellite imagery provides an insightful perspective on the locations and magnitudes of flooding events. These visual data enable the identification of inundated regions, assessment of soil and vegetation moisture levels, and the prediction of areas potentially vulnerable to flash floods. Earth Observation platforms like CREODIAS house an extensive array of such images, inclusive of superior-resolution datasets from esteemed programs such as Copernicus. Users have the opportunity to peruse diverse images, categorize them chronologically or geographically, and thus acquire a more profound comprehension of flood dynamics.
When combined with ground-level measurements, satellite-based remote sensing offers an in-depth perspective on global drought trends. Meteorological satellites play a crucial role in monitoring key environmental indicators such as humidity, temperature, and wind, which help predict periods of limited rainfall that can subsequently lead to dry soil conditions and stressed vegetation, commonly referred to as agricultural droughts. To support this analysis, the Copernicus programme provides an array of tools tailored to drought assessment. When it comes to hydrological droughts, which signify water shortages within certain regions, advanced satellite imagery becomes indispensable. One of the standout techniques in this domain is SAR – Synthetic Aperture Radar, which is a type of radar imaging that uses radio waves to capture detailed images of the Earth's surface, making it possible to detect even subtle changes in water levels and providing valuable insights into drought conditions.
Assessment of war-related environmental damage
Satellite data proves invaluable during wars and armed conflicts. However, its utility extends beyond military applications. By processing data, it becomes possible to estimate the scale of environmental destruction or predict actions necessary for post-war reconstruction. An example of such an initiative is EO4UA (Earth Observation for Ukraine), which aims to support Ukrainian institutions and international organizations in assessing the environmental damage caused by war activities within Ukraine. EO4UA activities are conducted in a cloud computing environment integrated with a substantial repository of Earth observation data, encompassing various datasets, such as satellite imagery, crop classifications, forest fire extents, and more, necessary for comprehensive environmental analysis.
The above-mentioned areas provide only a glimpse into the diverse applications of EO data. As we witness rapid technological advancements in space technologies, we can anticipate gaining unprecedented insights into the Earth's ecosystems. We believe that the future will bring a deeper understanding of our planet, facilitated by rapidly evolving satellite data technologies. Equipped with these advancements, we will be better prepared to address environmental challenges on our planet, fostering a more optimistic outlook for the future of Earth.
Enhancing Earth Observation capabilities - satellite data applications and implications
By Maciej Myśliwiec, Planet Partners
The view from space offers an expanded perspective on Earth. It is remarkable how data can be obtained and comprehended through satellite-based Earth observation. Sixty-five years after the launch of the first artificial satellite, space technology continues to provide an immense volume of data utilized by analysts and scientists worldwide, enabling the development of solutions across various disciplines. Let us explore the breadth of observations that can be done from space!
Currently, over 4,500 artificial satellites orbit our planet. Among them, there are the satellites of the European Union’s Copernicus programme – the leading provider of satellite data in Europe. Each day Copernicus delivers over 12 terabytes of data that is freely accessible to all users. With a simple internet connection, one can access the extensive collection of satellite images gathered since the programme's inception in 2014, with the launch of the first Sentinel-1A satellite.
Managing and processing such vast amounts of data necessitates substantial computational and storage resources. Earth Observation (EO) platforms like CREODIAS provide users with the means to transform EO data into valuable information. CREODIAS currently houses a repository of over 34 petabytes of data and an integrated cloud computing environment, enabling efficient processing of EO data.
Presently, half a million users of the Copernicus programme process more than 16 terabytes of data daily. To put this into perspective, 1 terabyte equates to approximately 250,000 photos captured by a 12-megapixel camera or 500 hours of high-definition video footage. Hence, we speak of millions of images and substantial quantities of other data (e.g., from satellite sensors) generated within the European programme every single day. The easy accessibility of satellite data fuels innovation across various industries and scientific domains. The current market turnover for Earth observation data is estimated at €2.8 billion, with over 41% of the European industry relying on data obtained from Earth observation programmes.
In recent times, several initiatives have emerged to facilitate data access and processing for scientists, researchers, public administrations, the private sector, and the general public. One of these initiatives is the Copernicus Data Space Ecosystem, which aims to provide comprehensive spatial and temporal coverage of Copernicus EO data, immediately accessible to all users free of charge, along with cloud computing resources for further data processing. Another notable initiative is Destination Earth, the flagship project of the European Commission that strives to develop an advanced digital model of the Earth powered by European supercomputers (HPC) and state-of-the-art artificial intelligence technologies.
The convenient access to current and historical Earth Observation data enhances planning and decision-making across diverse fields. How can we leverage these vast resources of satellite data?
Environmental research and climate change
One of the primary applications that come to my mind is the study of climate change and meteorology. EO data plays a crucial role in developing meteorological models for weather forecasting, preparing for and mitigating natural disasters, as well as monitoring and addressing climate change.
Accurate knowledge of current land cover and land use is pivotal for effective planning and management by local and regional authorities. Areas under the jurisdiction of public administration institutions, such as national parks, wetlands, lakes, riverbeds, and coastlines, benefit immensely from open satellite data captured by Sentinel-1 and Sentinel-2 satellites. These data offer extensive possibilities for managing such areas with minimal effort. Even 10-meter multispectral data is sufficient for a range of environmentally significant applications, including drought monitoring, flood management, freshwater ecosystem health assessment, forest cover analysis, biodiversity monitoring, air quality monitoring, and land surface temperature analysis.
Sentinel satellite missions, part of the European Copernicus programme, have revolutionized Earth research since 2014. The ubiquitous availability of freely accessible satellite data is complemented by specialized tools that enable their processing and analysis, converting raw data into actionable information.
Illustrative examples of climate change analysis tools include:
- "Global temperature trend monitor": an application that monitors global air temperature changes on Earth's surface over recent decades.
- "Climate monitoring and volcanic eruptions": an application studying volcanic eruptions, which can cause short-term climate effects on a global and regional scale, contributing to natural climate variability.
- "European temperature statistics derived from climate projections": an application providing European temperature statistics for historical periods (1976-2005) and projected 30-year periods (2031-2060 and 2071-2100) under different climate change scenarios.
These applications represent a fraction of the vast array of possibilities available to users of satellite data programs, tailored to address their specific research inquiries.
Forestry
Most forests are managed by state-owned enterprises for timber production, which significantly contributes to national budgets. Effective forestry management requires comprehensive knowledge of soil characteristics, water dynamics, climate conditions, and microclimates across expansive and sometimes challenging-to-access areas. Satellite data offers valuable insights into forest dynamics and their influence on climate patterns. Real-time spaceborne data serves as an invaluable tool for locating forest fires, monitoring plant diseases, and assessing the impact of drought on tree health.
Agriculture
Earth Observation supports various agricultural processes. At the local level, data assists in identifying optimal locations for cultivation, predicting crop yields, and maintaining records of agricultural parcels. Advanced remote sensing techniques, such as Synthetic Aperture Radar (SAR) and hyperspectral analysis, are increasingly being applied in precision agriculture. At regional or national scales, EO data facilitates land cover classification and supports the implementation of programs aimed at agricultural development and direct subsidies for farmers.
Spatial planning
Satellite data plays a vital role in spatial planning for both urban and rural landscapes. Leveraging EO data, one can conduct detailed surveys and complex analyses of areas of interest. Currently, imagery serves as a primary source of information for formulating up-to-date spatial development plans, considering changes in land use and cover, identifying new areas for investment, and detecting wetland areas. Copernicus Sentinel-2 data, for instance, provides rapid and comprehensive insights into current and historical phenomena. Investors seeking to expand their activities in remote regions greatly benefit from information on the status of potential investment areas.
Urban heat islands
Satellite-based Earth observation data also contributes to urban management, including the monitoring and mitigation of urban heat islands, characterized by significantly higher air temperatures within urban areas compared to adjacent rural regions. Unlike traditional point-based air temperature measurements from monitoring stations, satellite data enables the measurement of surface temperatures at any location within a city. This capability facilitates the development of spatial patterns critical for understanding the phenomenon and aids local authorities in taking necessary measures to improve residents' quality of life.
Maritime transport and logistics
Satellite imagery plays a crucial role in the monitoring and detection of vessel traffic on seas and oceans. It also supports the management of maritime economies and ensures the safe transportation of valuable commodities such as liquefied chemicals and crude oil. In the event of accidents, EO images provide timely assistance in mapping the consequences. Combining EO data with Automatic Identification System (AIS) information yields a powerful tool for monitoring marine objects and phenomena, including coastal monitoring and analysis of changing water depths.
Crisis management
Natural disasters like floods, wildfires, and storms often have a wide spatial extent. Open satellite data forms the foundation for the detection and monitoring of such events. Specialized methods, relying on environmental and atmospheric satellite data, are employed to detect favourable conditions and early warning signs of potential natural hazards like droughts or algal blooms in lakes or seas. Synthetic Aperture Radar (SAR) image processing is recommended for monitoring land movement, particularly for applications such as landslide risk detection, vital for mining and mountain tourism. Very high-resolution data plays a crucial role in assessing building disasters, managing mass events, and ensuring the security of government and military facilities.
Natural disasters like floods, wildfires, and storms often have a wide spatial extent. Open satellite data forms the foundation for the detection and monitoring of such events. Specialized methods, relying on environmental and atmospheric satellite data, are employed to detect favourable conditions and early warning signs of potential natural hazards like droughts or algal blooms in lakes or seas. Synthetic Aperture Radar (SAR) image processing is recommended for monitoring land movement, particularly for applications such as landslide risk detection, vital for mining and mountain tourism. Very high-resolution data plays a crucial role in assessing building disasters, managing mass events, and ensuring the security of government and military facilities.
Satellite imagery provides an insightful perspective on the locations and magnitudes of flooding events. These visual data enable the identification of inundated regions, assessment of soil and vegetation moisture levels, and the prediction of areas potentially vulnerable to flash floods. Earth Observation platforms like CREODIAS house an extensive array of such images, inclusive of superior-resolution datasets from esteemed programs such as Copernicus. Users have the opportunity to peruse diverse images, categorize them chronologically or geographically, and thus acquire a more profound comprehension of flood dynamics.
When combined with ground-level measurements, satellite-based remote sensing offers an in-depth perspective on global drought trends. Meteorological satellites play a crucial role in monitoring key environmental indicators such as humidity, temperature, and wind, which help predict periods of limited rainfall that can subsequently lead to dry soil conditions and stressed vegetation, commonly referred to as agricultural droughts. To support this analysis, the Copernicus programme provides an array of tools tailored to drought assessment. When it comes to hydrological droughts, which signify water shortages within certain regions, advanced satellite imagery becomes indispensable. One of the standout techniques in this domain is SAR – Synthetic Aperture Radar, which is a type of radar imaging that uses radio waves to capture detailed images of the Earth's surface, making it possible to detect even subtle changes in water levels and providing valuable insights into drought conditions.
Assessment of war-related environmental damage
Satellite data proves invaluable during wars and armed conflicts. However, its utility extends beyond military applications. By processing data, it becomes possible to estimate the scale of environmental destruction or predict actions necessary for post-war reconstruction. An example of such an initiative is EO4UA (Earth Observation for Ukraine), which aims to support Ukrainian institutions and international organizations in assessing the environmental damage caused by war activities within Ukraine. EO4UA activities are conducted in a cloud computing environment integrated with a substantial repository of Earth observation data, encompassing various datasets, such as satellite imagery, crop classifications, forest fire extents, and more, necessary for comprehensive environmental analysis.
The above-mentioned areas provide only a glimpse into the diverse applications of EO data. As we witness rapid technological advancements in space technologies, we can anticipate gaining unprecedented insights into the Earth's ecosystems. We believe that the future will bring a deeper understanding of our planet, facilitated by rapidly evolving satellite data technologies. Equipped with these advancements, we will be better prepared to address environmental challenges on our planet, fostering a more optimistic outlook for the future of Earth.
Lessons learned from my Kubernetes Journey
By Paweł Turkowski, Product Manager at CloudFerro
At CloudFerro, one of our guiding principles is Technological Courage. This value has strongly resonated with me from the start of my journey at the company. It is a simple yet powerful statement that gives our team a clear guideline not to be afraid to operate outside our comfort zone.
Kubernetes, by today’s standards, is not a new technology anymore. It has graduated to being an often primary go-to model of application deployment. CloudFerro was quick to jump on the Kubernetes bandwagon. We have multiple applications running on Kubernetes, as well as a massive amount of K8S jobs running daily to process data in our satellite image repositories.
For me, personally, it has been an exciting challenge to be a Product Manager for the Kubernetes product offering. Apart from Kubernetes, there is also a large ecosystem of other technologies required to navigate in the DevOps space. I have worked in the Product Management field for several years, managing a variety of different products. The generic understanding of operating principles is a very useful and transferable skill. Yet I think that taking time to understand the domain well is one of the best investments one can make as a PM.
Some of our clients have been running Kubernetes clusters on CloudFerro clouds for likely as long as this technology has lasted. However, in the past couple of years, we have made some significant commitments to grow our Kubernetes offering as a natively available solution on our clouds.
With OpenStack Magnum, we enable our customers to deploy a functional Kubernetes cluster in literally a couple of minutes. It is a matter of selecting a few options from the Horizon UI. In fact, the only key decision before deploying pods/applications/workloads is to choose your cluster size and underlying Virtual Machine flavors. This setting can also be changed after cluster creation and one can opt for an auto-scalable number of nodes.
Aside from making sure the Kubernetes clusters are quick to set up and reliable to run, we also constantly scan the interaction with various integrations used by our partners, to make sure they work seamlessly with Magnum.
Working with Kubernetes is an exciting journey, where you never stop learning. Some of the lessons I can share from my own perspective:
- Browsing through the documentation for hours might not be unusual for a developer, but it is in fact quite the same for a PM working on a highly technical product. These days there is a lot of high-quality, free supportive content that helps a lot. Recently, Chat GPT is also enabling us all to make some significant shortcuts in various domains, and Kubernetes is no exception.
- Learning from others is valuable, however, it helps to do your own research before asking a question (this refers to colleagues, but also e.g., to online forums). You might be able to answer your own question faster, or, at minimum, narrow down the question to the actual core.
- Speaking to clients is of tremendous value. It helps to get a proper understanding of what features we think might help, and which ones actually move the needle for our partners.
It has been a rewarding experience to see the growth of Magnum and the trust of our clients to run their Kubernetes clusters on the CloudFerro infrastructure. I would like to take the opportunity to invite readers of this post to watch my Kubernetes webinar. It can provide some help to quickly get started with Magnum and Kubernetes overall, with some examples from the Earth Observation domain.
Paweł Turkowski
Paweł has over 15 years of experience in product management. At CloudFerro, he is responsible for the infrastructure offering, currently focusing on the Kubernetes platform.
Copernicus Data Space Ecosystem explained
CloudFerro, together with its partners, has been selected by ESA as an industrial team to implement and operate Copernicus Data Space Ecosystem. The goal is to provide full spatial and temporal Copernicus EO data coverage available immediately and free for the users.
On Tuesday, 24th January 2022, the new Copernicus Data Space Ecosystem was launched. The service is taking the European Union Copernicus programme to the next level, ensuring the data make the greatest possible impact on users, all the citizens, and – ultimately – on our planet. The new Copernicus Data Space Ecosystem builds on existing data distribution services (incl. CREODIAS and others), ensuring their continuity and introducing significant improvements.
Evolution of Copernicus data access
The European Union Copernicus programme is among the biggest providers of Earth Observation data in the world and its launch was one of the most important events in the remote sensing community over the past decade. With systematic monitoring over large areas, good quality of data, resolution fitting its objectives, ensured the longevity of the program, and, most importantly, a clear and simple open data policy it took the world by storm. This, however, is not something to be taken for granted. When Copernicus started, remote sensing was a niche field occupying mostly researchers and the intelligence community. There was no specific reason to expect that it will be any different with Sentinel data. But it happened.
The data was picked by enthusiasts, then by companies and institutions. Fast forward a couple of years and suddenly there are hundreds of applications helping farmers to better manage their fields, financial markets forecast prices of corn, journalists regularly use it to observe or validate news, the European Common Agriculture Policy relies on its data to monitor practically all agriculture fields for sustainable agriculture practices, researchers are building digital twins of the environment, security organizations predict migration patterns... There are many more examples.
Nobody expected Copernicus data to start such a revolution in this field. It was (and still is) a tremendous validation of ESA's operational capacity, but at the same time also significant pressure on the data distribution systems as they were not expecting such uptake. Still, the Copernicus Data Hubs have already distributed an order of magnitude more data than is published. With user uptake growing further, as well as the recognized importance of Earth Observation data for the monitoring of climate change, the EU decided to invest in the next level of user data processing and distribution infrastructure. The new Copernicus Data Space Ecosystem was born.
The future of European user data processing and distribution is powered by experienced players - T-Systems and CloudFerro, with their well-used cloud infrastructure, Sinergise and VITO with Sentinel Hub and OpenEO data discovery and processing tools, and DLR, ACRI-ST, and RHEA taking care of on-demand processing and Copernicus Contributing Missions.
What does it mean to CREODIAS users?
This is CloudFerro long-term commitment to grow and maintain the Copernicus EO data repository available in CREODIAS. Within the next few years, one of the largest online repositories worldwide will grow from 35PB to over 85 PB of free and immediately available EO data. Data from all over the world is available directly from the cloud through efficient, cloud-native access mechanisms (both API and GUI). Data is available for current users, new users, and federated platforms.
Copernicus Data Space Ecosystem - what’s in it for users?
- The data offer available for the user will increase. The data repository includes full up-to-date Copernicus data as well as complimentary commercial data offering (EO data+)
- Data that can be processed directly in a cloud environment
- Expanded search and visualisation of the data with Jupyter notebooks for testing and development
- Federation with other EO services – Sentinel Hub, OpenEO, JupyterHub, and others
- Serverless, on-demand processing services – for Sentinel products and other data
- Many services will be free of charge with easy access to a payable but easily scalable extension
Data offer
First and foremost, it's about the data we guarantee - instant data availability. Just about all the data ever acquired by Sentinel satellites, with minor exceptions, will be available online, instantaneously including e.g. Sentinel-1 SLC and GRD and L2 OCN, Sentinel-2 L1C, and L2A (reprocessed to Collection 1, as reprocessed data becomes gradually available), Sentinel-3 and Sentinel-5P L1 and L2 data as well as Copernicus Contributing Missions data. Full archive, and always up-to-date.
There is currently no infrastructure where all of this would be available in one place. The new Copernicus Data Space Ecosystem will be it. Data will be available through various interfaces – from old-fashioned download, direct S3 access with STAC items, and cloud-optimized formats to streamlined access APIs, which are able not just to fetch the data but also to process it.
EO services
There is a web-based application built on top of a very popular EO Browser technology to allow for data visualisation and as a user interface to access the data. Several on-demand processors are capable of building non-default formats and derived products, such as Sentinel-1 coherence and CARD4L products, MAJA-powered atmospheric processing, and similar. The data can be accessed via the openEO and JupyterHub for serverless processing. Last but not least, there will be a special focus put on traceability. For all data managed within the Copernicus Data Space Ecosystem, it will be possible to trace where it came from and what operations were performed on it.
Very important – the vast majority of these capabilities will be available free for use, funded by the European Union. The quotas designated for users should be more than sufficient for the individual's use - personal, research, or commercial. For those interested in the larger-scale operation, there will be practically unlimited resources available under commercial terms in CREODIAS allowing them to build applications on a world scale. Furthermore, there will be significant credits made available, in the form of extra free resources, to be used for research and pre-commercial exploitation opportunities.
In addition to that, users will be able to add and disseminate their collections and processors. What is more, there will be a set of additional services available for the CREODIAS users, such as streamlined access to data, and on-demand processing.
It is all about timing, responsibilities, and commitments
The initiative has quite an intense phase-in plan, in order to allow hundreds of thousands of existing users of the Copernicus Data Hubs to migrate their workflows to the new service. First, a limited but stable, roll-out has just happened (24th January 2023) with continuous upgrades over the upcoming months until full service will be made available by the end of June 2023.
This is a very challenging yet super exciting opportunity for everyone in the remote sensing community - from beginners to experts, from researchers, companies, and institutions, as well as individuals. Just about anyone can benefit from being aware of what is happening with our planet. And we should all take interest in it, and act. For the consortium partners, however, this is especially important - we were given an opportunity to build a new ecosystem for everyone to use. An ecosystem that strives to provide a significant upgrade over existing tools and services and is open to welcoming new partners and service providers in the future! This comes with huge responsibility towards everyone - not just to the ESA and European Commission, but for the worldwide EO community. We are confident we can execute it, so stay tuned for further information.
Discover Copernicus Data Space Ecosystem.
Read more in our news.
How can satellite remote sensing help analyse forests?
Forests cover 31 percent of the world's land surface, that's over 4 billion hectares[1]. According to the research published in Nature Climate Change in 2021, forests are a natural ’carbon sink’ that absorbs a net 7.6 billion metric tonnes of CO2 per year. The degradation of timberland around the world has prompted scientists not only to seek ways to halt these processes but also to develop objective diagnostic methods to assess the status and changes occurring in forests. After all, there is a lot at stake – as the United Nation's Global Forest Goals Report 2021 estimates, 1.6 billion people worldwide depend directly on forests for food, shelter, energy, medicines, and income[2].
The development of technology and the integration of satellite data with environmental models have given us the ability to observe the Earth on a large scale. One of the tools used to monitor the state of the environment, especially complex ecosystems such as forests, is satellite remote sensing.
How satellites for remote sensing work
Satellite remote sensing includes a whole set of procedures involving the acquisition, processing, and interpretation of satellite images, from which spatial information is extracted. Images of a given area can be taken regularly with successive satellite overpasses, which makes it possible to observe changes that have occurred in each area even over a long period of time. These methods allow us to observe changes in land cover, analyse the condition of vegetation over large areas, or monitor the extent of natural disasters, such as droughts, floods, and others. Moreover, remote sensing methods effectively reduce costs and research time. The CREODIAS platform, for example, has more than 30 PB Copernicus EO data available free of charge and off-the-shelf. This platform and similar platforms allow for data to be processed in the cloud without the need for time-consuming downloads to local computers or arrays. Moreover, the public cloud for their processing that CloudFerro provides is a much more cost-effective solution than having infrastructure on one's own.
Observing forests from space
Applying remote sensing techniques to monitor such complex ecosystems as forests can be a laborious undertaking. The complexity lies mainly in the multi-layered structure of forests, their species composition, and spatial variability. Therefore, when analysing images from several months, we can see completely different objects, despite the fact that the characteristics of the forest have remained unchanged.
However, this can bring some advantages to the process. For instance, if we dispose of multi-temporal data, we can better classify the species because their evolution during the growing season is specific. The growing popularity of remote sensing can also be attributed to its detail and comprehensiveness. Today, satellites are able to capture high-resolution images, making it possible to detect even individual trees.
The process of forest area analysis
Today, remote sensing is mainly used for ongoing monitoring and forest inventories. A satellite image is a unique way to do it on a global scale. Even at regional, national, or local scales, remote sensing is often the only way to obtain data on forests located in hard-to-reach places or dangerous regions of the world. Such data can be used by local forest management institutions and global environmental agencies to monitor the condition of the area, and manage environmental emergencies.
The first stage of forest monitoring is determining its spatial extent, where different remote sensing data, are used. This information, combined with the documentation on land use, allows us to create accurate maps, but also to indicate areas of succession.
Knowing the extent of forests, satellite data can also be used for forest analysis to:
- determine the size (volume) of the forest, i.e., the amount of biomass present in a given area. This allows us to estimate the amount of wood that can be harvested, as well as determine the amount of carbon in the ecosystem
- rapidly and accurately identify changes on the forest surface caused by wind, fires, floods, or sustained or human activities
- to map all the impacts of human activities, even in the most remote areas of the planet.
Processing forest data in a cloud environment
One of the most advanced forest analyses performed on a global scale is Forest Digital Twin Earth Precursor developed by the European Space Agency, as part of the Digital Twin Earth Precursor (DTEP) initiative. Its aim is to create a digital replica of all the forest ecosystems, which involves integrating complex environmental models with a vast amount of Earth observation data combined with artificial intelligence solutions. CloudFerro, the operator of the CREODIAS platform, was a partner in the Precursor phase of the Digital Twin Earth project, delivering technical expertise and resources.
Large-scale research projects require advanced competencies and vast IT resources because they need to collect, store and process large volumes of data in an easy, cost-effective, and timely manner. CREODIAS’ powerful cloud infrastructure and its large repository of current and historic Earth observation data greatly facilitate conducting this type of research helping meet computation and storage requirements for processing growing volumes of data.
For more on species classification of forests, go to an article by a forest expert published on CREODIAS.
[1] FAO. The State of the World’s Forests 2022. Forest pathways for green recovery and building inclusive, resilient and sustainable economies. Rome, FAO (2022). https://doi.org/10.4060/cb9360en
[2] United Nations Department of Economic and Social Affairs, United Nations Forum on Forests Secretariat. The Global Forest Goals Report (2021). https://doi.org/10.18356/9789214030515
Computing Cloud as a perfect environment for Big EO data repositories
Copernicus as a flagship European space programme
Copernicus is the European Union's Earth Observation (EO) programme managed by the European Commission. The main goal of the programme is to observe our planet and its environment to benefit all European citizens. The programme has revolutionized the European space industry by making access to sattelite data is easier than never before. The enormous amount of EO data available as a public good for anyone at no cost.Huge amount of archival data and the continuous growth of large EO data sets made it necessary not only to create a cloud environment, but also to develop the usability and exploit the potential of EO data - the DIAS platforms (Data and Information Access Services). For this reason, EO platforms were launched, thanks to which we have gained a powerful tool for EO data access and scalable processing.
DIAS and EO platforms
EO platforms are an essential element to enhance the EO data usability, making it possible to change data into information. The platforms provide users with a computing power that enables fast and efficient processing and allow users to access, process and analyse Copernicus products in the cloud directly connected to an EO archive. The availability of data, storage, and processing in a single place makes it easier to develop and distribute scalable digital services based on the EO data. Below you can see consumption model for DIAS and EO platforms.
We rely on an open- source technology
We have built and now operate several EO platforms based on a hybrid cloud in an Infrastructure-as-a-service model. Our services consist of open-source components, such as Open-Stack for cloud orchestration, Ceph for storage management, Prometheus and Graphana for monitoring, K8s for processing, and others.
EO platform as a service
Taking as an example CREODIAS, WEkEO, CODE-DE and other platforms built and operated by CloudFerro, we have created the main building blocks and architecture of entirely European cloud platforms based on open-source software.
We have managed to create a new service - an EO platform as a service.
Federation and multi- platform approach
Federation and a multi-platform approach allows users and operators to gain benefits from the synergy between different platforms, such as a shared multi PB data repository available from various platforms and elasticity cloud components for additional processing needs. At the same time, it generates a technical challenge that we need to address. We have met challenges focusing on three main topics:
- Interoperability of the environments: how to support easy migration between different clouds and provide direct access to the repositories available from different computing clouds, with multi-cloud processing enabled and dynamically adaptable usage and payment models.
- Data access and dissemination: the multi-petabyte, scalable EO data repositories should provide fast, instantaneous access to many heterogeneous users and algorithms with different needs. At the same time, the data needs to be discoverable and ready for analysis in a context of constantly evolving data offers (new collections, commercial data, products), and users should be supported in data dissemination.
- Cloud efficiency: how to optimize data access, storage, and processing costs and energy usage both for individual users and for the entire ecosystem.
Why is it so important to use EO platform
Using the EO Platform is connected with numerous benefits. The most significant are:
How satellite data can be used to improve comfort of city residents
The use of Earth Observation (EO) data contributes to smart city management helping to reduce harmful effects of urban heat island phenomenon and improve the quality of life.
The density of buildings and urban infrastructure causes changes in the microclimate of urban areas. This has a direct impact on the health and quality of life of the inhabitants. The solution may be Copernicus satellite data, which will allow for a better understanding of spatial phenomena occurring in densely populated places.
What is urban heat island phenomenon and how does it affect cities
Heatwaves boosted by urban heat island phenomenon are the most dangerous weather extremes, which not only damage our health, but also lead to the increase of energy consumption. To reduce or prevent their detrimental effects we need to track urban climate. Urban surface heat island (USHI) is an obvious consequence of urban processes and it is associated with the increase of temperature of urbanized areas in relation to undeveloped rural areas.
The phenomenon results from city morphology* and it is influenced by many factors. Even a single building can increase the ambient temperature. The accumulation of concrete buildings and urban infrastructure causes the temperature in the city to rise from 2 to even 8ºC, and in extreme cases, even above 12ºC. The strongest temperature amplitudes are observed on summer nights.
Urban Heat Island affects human health and energy consumption and that's why the issue of the urban heat island is extremely important in the management and planning of a space comfortable for residents to live.
The most important factors contributing to the urban heat island phenomenon are:
- The environmental context of city
- City size
- City topography
- Water resources
- Urban green areas
- Anthropogenic heat sources (e.g. air conditioning, cars)
* City morphology - structure of the inner and outer urban space and the origin of its parts. The spatial conditions of the city influence the scale of the urban heat island phenomenon. The amount of heat accumulated is determined by factors such as the city's environmental context, the number of inhabitants, the amount of greenery, the height of buildings, width of streets, the predominant color of the surface (albedo), the presence of ventilation corridors or even the roughness of materials used in the city.
The most dangerous weather extreme
Heatwaves are the most dangerous weather extremes that cause more casualties than hurricanes and floods combined. On the average, 1,500 American city dwellers die each year because of the heat, which is many more deaths than those caused by all other natural disasters combined. The compounding effect of heatwaves and UHI could prove to be even deadlier to urban populations, and are extremely dangerous especially for people who suffer from chronic diseases (such as asthma, heart condition), elderly people, manual workers, children and athletes.
Threats to cities caused by the urban heat island phenomenon:
- Health hazard for the elders and risk groups
- Higher energy consumption
- Intensification of convective processes
- Increased probability of heavy rainfall and greater exposure of cities to local flooding.
Land Surface Temperature Monitoring on CREODIAS
The CREODIAS platform allows users to search and download satellite scenes of the Sentinel, Landsat and MODIS programmes at no cost (repository of ready-to-use 30 PB of current and archive Earth observation data) which allows you to analyse the data in order to find the areas that are most endangered by heat waves and the urban heat island phenomenon.
Cloud workflow with EO data repository allows researchers to automate processes and calculate large amounts of data and changes over time. These operations combined with data repository are crucial in identifying critical areas for the microclimatic functioning of cities. Analysts can find spatial patterns to better understand the phenomena occurring in urban space, and consequently help local governments to take necessary steps to improve the quality of life of the residents. As we can see, spatial analyses are crucial for a better understanding of the urban climate.
The best solution - urban green areas
The easiest and most effective way to cool down urban climate is to introduce urban green areas. Vegetation contributes to the formation of shade, reflects solar radiation, and thanks to evapotranspiration processes, it effectively reduces the temperature in cities. Urban vegetation also contributes to the reduction of air and water pollution, reduces noise and accumulates large amounts of water, creating opportunities to reduce energy consumption.
Copernicus data can be applied to analyse urban climate and improve urban greenery in the most neglected areas by new investments in green infrastructure, for instance:
- Planting street greenery (stormwater street design)
- Protecting ecological urban areas
- Green rooftops
- Relocation of industrial areas outside the city
The extensive usage of EO data provides faster and more sustainable economic growth through better use of resources. This is key to achieving sustainable development goals driven by space innovations.
Questions?
Should you have any questions, please contact us at communication@cloudferro.com