Global AI embeddings in Earth Observation
CloudFerro and the European Space Agency (ESA) Φ-lab have published the first global embeddings dataset for Earth Observation. The dataset applies current AI models to satellite imagery, enabling more precise and scalable analysis of satellite data.
What are embeddings?
Embeddings are high-dimensional vectors that transform complex data into numerical representations, capturing relationships and meanings.
By mapping words, images, or entire documents into this space, embeddings maintain the semantic properties of the original data, allowing AI models to understand and process context with accuracy.
This enables machines to identify patterns, similarities, and connections that might otherwise be challenging to detect.
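As a minimal sketch of what "capturing relationships" means in practice, the snippet below compares embedding vectors with cosine similarity, a common measure for this purpose. The three vectors are hypothetical toy values chosen for illustration, not values from the published dataset, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional toy embeddings (assumed values, for illustration only).
forest_a = [0.9, 0.1, 0.0, 0.2]
forest_b = [0.8, 0.2, 0.1, 0.3]
city = [0.1, 0.9, 0.7, 0.0]

sim_forest = cosine_similarity(forest_a, forest_b)  # two similar scenes
sim_city = cosine_similarity(forest_a, city)        # two dissimilar scenes

print(f"forest vs forest: {sim_forest:.3f}")
print(f"forest vs city:   {sim_city:.3f}")
```

With these toy values the two "forest" vectors score much closer to 1 than the "forest" and "city" pair, which is the behaviour a model relies on when it groups similar content.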
Embeddings insights
First global embeddings dataset for EO
CloudFerro and the European Space Agency (ESA) Φ-lab have introduced the first global embeddings dataset for Earth Observation. It was computed on GPU-accelerated instances provided by CloudFerro, which not only supplied the infrastructure but also played a key role in preparing the embeddings, alongside the expertise of ESA Φ-lab.
At CloudFerro, we have created a high-performance computational environment capable of processing vast amounts of Earth Observation data. The global embeddings were computed on the CREODIAS cloud service platform, powered by GPU-accelerated instances provided by CloudFerro.
Value for Earth Observation data
Embeddings are increasingly valuable in the field of Earth Observation (EO), offering a range of applications for professionals across this sector.
Global embeddings for satellite images
CloudFerro, in collaboration with ESA Φ-lab, has calculated global embeddings for Sentinel-2 and Sentinel-1 imagery at a 10-meter resolution, using general-purpose vision models such as SigLIP and DINOv2 along with SSL4EO models designed for Earth Observation. This global run is a significant step up in the scale and scope of Earth Observation data processing.
Over 170 million embeddings from trillions of pixels
Over 170 million embeddings were generated from more than 62 TB of raw data, distilling insights from 9.368 trillion pixels of source data. This comprehensive analysis involved processing more than 8 million Sentinel-1 and Sentinel-2 images from the Major TOM dataset.
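The figures above can be combined into a rough sense of scale. The following back-of-envelope sketch uses only the numbers quoted in the text and treats them as exact, although the text gives them as "over" and "more than". It assumes decimal terabytes and takes a plain average across all sensors and models, so the results are indicative ratios and not properties of any single embedding.

```python
import math

# Figures quoted in the text, treated as exact for this estimate.
NUM_EMBEDDINGS = 170e6      # "over 170 million embeddings"
NUM_PIXELS = 9.368e12       # "9.368 trillion pixels"
RAW_BYTES = 62e12           # "more than 62 TB" (assumption: 1 TB = 10**12 bytes)
PIXEL_SIZE_M = 10           # 10-meter resolution

# Average number of source pixels summarised by one embedding.
pixels_per_embedding = NUM_PIXELS / NUM_EMBEDDINGS

# Side length if that many pixels formed one square patch (an assumption).
equivalent_square_side = math.sqrt(pixels_per_embedding)

# Ground area of that many 10 m x 10 m pixels, in square kilometres.
area_km2 = pixels_per_embedding * PIXEL_SIZE_M ** 2 / 1e6

# Average raw storage per source pixel.
bytes_per_pixel = RAW_BYTES / NUM_PIXELS

print(f"pixels per embedding:   {pixels_per_embedding:,.0f}")
print(f"equivalent square side: {equivalent_square_side:.0f} px")
print(f"ground area:            {area_km2:.2f} km^2")
print(f"raw bytes per pixel:    {bytes_per_pixel:.2f}")
```

On these assumptions, each embedding stands in for roughly 55,000 source pixels on average, which is the compression that makes global-scale search and analysis tractable.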
CloudFerro and ESA Φ-lab have created an efficient data representation that captures key relationships in the imagery, enabling faster processing, easier analysis, and more actionable data for decision-making. This work is part of an expanded standard for releasing Major TOM embeddings expansions as open datasets on HuggingFace.
Enhancing trend discovery and applications in agriculture, land management, and image restoration through AI-driven embeddings.
AI embeddings enable a broad range of advanced capabilities for more intelligent and efficient analyses of satellite imagery.
Embeddings can enhance satellite image browsers, allowing users to quickly and accurately explore vast datasets with improved navigation and context. With similarity search, patterns and trends from satellite images can be identified at a global scale, unlocking insights into climate patterns, urbanization, and natural events. Additionally, AI-driven embeddings help model crop yields, improving agricultural predictions, while aiding in image restoration for clearer, more precise visuals. For land cover classification, embeddings enable more accurate mapping of various terrains and crop types, supporting better land management and resource allocation.
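The similarity search mentioned above can be sketched as a brute-force nearest-neighbour lookup. This is an illustrative sketch and not the method used by CloudFerro or ESA Φ-lab: the tile identifiers, the 3-dimensional vectors, and the `top_k_similar` helper are all hypothetical.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def top_k_similar(query, index, k=3):
    """Return the k entries of `index` most similar to `query`.

    `index` maps an identifier to its embedding vector. The result is a list
    of (identifier, similarity) pairs, most similar first.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), key)
        for key, vec in index.items()
    )
    best = heapq.nlargest(k, scored)
    return [(key, score) for score, key in best]


# Hypothetical toy index: identifiers and values are invented for illustration.
index = {
    "tile_forest_1": [0.9, 0.1, 0.0],
    "tile_forest_2": [0.8, 0.2, 0.1],
    "tile_urban_1": [0.1, 0.9, 0.3],
    "tile_water_1": [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]  # embedding of the image the user selected
results = top_k_similar(query, index, k=2)

for tile_id, score in results:
    print(f"{tile_id}: {score:.4f}")
```

A linear scan like this is fine for a few thousand vectors. At the scale of the full dataset, an approximate nearest-neighbour index would normally replace it, though the ranking principle stays the same.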
Evaluating embeddings across EO tasks, integrating foundation models, and expanding resource availability through the CREODIAS repository.
Next steps
The next steps involve testing and evaluating the computed embeddings across a range of Earth Observation (EO) tasks to assess their performance and applicability in real-world scenarios. Further testing will also be conducted with other EO foundation models, including MMEarth and DeCUR, to examine how they integrate with the existing embeddings and to optimize their capabilities. Finally, the Major TOM dataset with embeddings will be integrated into the CREODIAS repository, making these resources available to the broader EO research community.