Large-scale data storage and access technology to power European and global research and industry

Large research projects require collecting, storing, processing and analysing huge amounts of data. Now, thanks to advances in large-scale data storage and access technologies, it is possible to deliver large volumes of data almost instantly. CloudFerro has carried out tests showing that its CREODIAS platform can now serve over 2 PB of data daily.

Scientific exploration: a new paradigm

The Fourth Paradigm of discovery is a concept holding that science and scientific exploration will be redefined by open or increased access to data and by new methods of sharing that data. In today’s world, where we are overwhelmed by vast amounts of information, the classic scientific paradigms of observation, theory and simulation need to be supplemented with a fourth one: large-scale data exploration. This fourth paradigm unifies observation, theory and simulation in one extensive system.

Recent research projects and initiatives highlight the importance of making data, tools, technologies and platforms more accessible and easier to use. Initiatives such as Digital Twin Earth, the Human Brain Project, Digital Twin Manufacturing and many others are frameworks for understanding, modelling and forecasting the behaviour of extremely complex systems. For such frameworks to work, it is indispensable to store and manage huge amounts of heterogeneous data and to make it available to multiple user communities through unified, flexible, streamlined interfaces.

Storing and disseminating petabytes or exabytes of heterogeneous data in an open and flexible manner poses a serious technical challenge. Object storage provides a solution that is cost-effective, easily scalable and accessible. Because it has no hierarchical directory structure, it can hold unstructured data as well as data with extremely diverse structures. Instead of a directory path, object storage uses a unique identifier for each object. This convention and "flat" architecture also allow massive, dynamic scaling of the storage, to a nearly unlimited size. The challenge is to employ it in a real-life, commercially viable scenario, which requires advanced cataloguing and network services that together guarantee automated, fast and easy access to data stored online for immediate use.
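
To make the "flat namespace with unique identifiers" idea concrete, here is a minimal in-memory sketch in Python. It is a toy illustration under our own assumptions, not how Ceph or CREODIAS is implemented: the class name, method names and example keys are hypothetical. It shows that objects are addressed only by an identifier, that metadata travels with each object, and that "folders" are merely a naming convention resolved by a prefix filter over flat keys.

```python
from __future__ import annotations

import uuid


class FlatObjectStore:
    """Toy object store: one flat namespace mapping id -> (data, metadata).

    Hypothetical illustration only; production systems such as Ceph add
    replication, data placement and an HTTP API on top of this basic idea.
    """

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, dict[str, str]]] = {}

    def put(self, data: bytes, metadata: dict[str, str] | None = None,
            key: str | None = None) -> str:
        # Each object gets a unique identifier; no directory tree is created.
        object_id = key if key is not None else str(uuid.uuid4())
        self._objects[object_id] = (data, dict(metadata or {}))
        return object_id

    def get(self, object_id: str) -> bytes:
        # Lookup is a single key access, independent of any "path depth".
        return self._objects[object_id][0]

    def metadata(self, object_id: str) -> dict[str, str]:
        # Return a copy so callers cannot mutate the stored metadata.
        return dict(self._objects[object_id][1])

    def list_prefix(self, prefix: str) -> list[str]:
        # "Folders" are only a naming convention: a prefix filter over flat keys.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

For example, storing an object under the hypothetical key `S2/2023/tile.jp2` and then calling `list_prefix("S2/")` returns that key, even though no `S2` directory was ever created. Scaling such a design out is largely a matter of distributing the key space across more nodes, which is why flat object storage grows so easily.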

The solution

CREODIAS, operated by CloudFerro, currently stores almost 21 PB of Earth Observation (EO) data. On average, it ingests 25 TB of data daily and disseminates it to more than 6,000 registered users of the portal and many more unregistered ones.
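
A quick back-of-the-envelope calculation puts the 25 TB daily ingest in perspective. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), a perfectly even ingest rate over 24 hours and a constant daily volume over a year; real traffic is burstier, so peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal terabyte (assumption)
PB = 10**15  # decimal petabyte (assumption)


def daily_volume_to_rates(bytes_per_day: float) -> tuple[float, float]:
    """Return (GB/s, Gbit/s) for a daily volume spread evenly over 24 hours."""
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    return bytes_per_second / 1e9, bytes_per_second * 8 / 1e9


ingest_per_day = 25 * TB
gb_s, gbit_s = daily_volume_to_rates(ingest_per_day)
# Yearly growth if the average daily ingest stayed constant (assumption).
yearly_pb = ingest_per_day * 365 / PB

print(f"{gb_s:.2f} GB/s, {gbit_s:.2f} Gbit/s, {yearly_pb:.3f} PB/year")
# prints "0.29 GB/s, 2.31 Gbit/s, 9.125 PB/year"
```

Under these assumptions, ingest corresponds to a sustained input of roughly 2.3 Gbit/s and about 9 PB of new data per year.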


CREODIAS builds its storage on the open-source Ceph software, chosen for its ability to build and manage object storage for OpenStack, and combines it with an advanced cataloguing solution. This lets the platform serve data inside and outside its cloud through a graphical application (CREODIAS Finder) and a variety of access interfaces. This machine-to-machine interfacing enables stakeholders to use the data in their processing chains in an automated manner, both on the CREODIAS cloud and on any other infrastructure of their choice.
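
The sketch below illustrates what a machine-to-machine step in a processing chain might look like: build a catalogue search request, then extract the storage locations of matching products from the response. Everything specific here is hypothetical: the endpoint `https://catalogue.example.com/search`, the parameter names (`collection`, `bbox`, `start`, `end`, `limit`) and the assumption that the response is a GeoJSON-style FeatureCollection with each product path in `properties["path"]`. Consult the platform's own API documentation for the real interface.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Hypothetical catalogue endpoint, for illustration only.
CATALOGUE_URL = "https://catalogue.example.com/search"


def build_search_url(collection: str,
                     bbox: tuple[float, float, float, float],
                     start: str, end: str, limit: int = 100) -> str:
    """Compose a search request for a collection, a lon/lat box and a date range.

    Parameter names are assumptions, not a documented CREODIAS API.
    """
    params = {
        "collection": collection,
        "bbox": ",".join(str(v) for v in bbox),  # min_lon,min_lat,max_lon,max_lat
        "start": start,
        "end": end,
        "limit": limit,
    }
    # urlencode percent-escapes reserved characters such as the bbox commas.
    return f"{CATALOGUE_URL}?{urlencode(params)}"


def product_paths(response: dict) -> list[str]:
    """Extract object-storage paths from a GeoJSON-style FeatureCollection.

    Assumes each feature stores its location in properties["path"]
    (a hypothetical field name).
    """
    return [f["properties"]["path"] for f in response.get("features", [])]
```

In an automated chain, a scheduler would fetch the URL returned by `build_search_url`, decode the JSON body into a dictionary and pass it to `product_paths`, then read each product directly from object storage. Keeping query construction and response parsing in separate functions makes each step easy to test without network access.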

Why CloudFerro's cloud?

CloudFerro builds on the expertise and lessons learned from our previous projects. This is why we are able to ingest, store, index and disseminate massive amounts of EO data, on the scale of tens to hundreds of petabytes.

  • CloudFerro has recently conducted tests proving the ability to serve 2 PB of data daily from our repositories. We are even able to double this rate.

  • CloudFerro’s experience and technology have delivered excellent performance in benchmarks and tests.

  • We currently store over 21 PB of EO data in our cloud, and this can grow to 50 PB in the near future if necessary.

  • CloudFerro has gained broad experience in building and operating numerous large cloud platforms, including Climate Data Store, CODE-DE, WEkEO and EO IPT, with a combined storage capacity of over 100 PB.
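
To translate the 2 PB/day test result (and the doubled rate) into network terms, the sketch below computes the average sustained throughput and the minimum number of links needed to carry it. It assumes decimal units (1 PB = 10^15 bytes), traffic spread evenly over 24 hours, and a hypothetical 100 Gbit/s link size at full utilisation; actual peaks, protocol overhead and headroom requirements would push real capacity needs higher.

```python
import math

SECONDS_PER_DAY = 86_400
PB = 10**15  # decimal petabyte (assumption)


def sustained_gbit_per_s(bytes_per_day: float) -> float:
    """Average line rate in Gbit/s if the daily volume is spread evenly over 24 h."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


def links_needed(gbit_per_s: float, link_gbit_per_s: float = 100.0) -> int:
    """Minimum number of fully utilised links of a (hypothetical) given capacity."""
    return math.ceil(gbit_per_s / link_gbit_per_s)


for label, volume in [("tested", 2 * PB), ("doubled", 4 * PB)]:
    rate = sustained_gbit_per_s(volume)
    print(f"{label}: {rate:.1f} Gbit/s, >= {links_needed(rate)} x 100 Gbit/s links")
# prints:
# tested: 185.2 Gbit/s, >= 2 x 100 Gbit/s links
# doubled: 370.4 Gbit/s, >= 4 x 100 Gbit/s links
```

Serving 2 PB per day therefore means an average of roughly 185 Gbit/s leaving the repositories around the clock.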

All of the above allows us to operate at the scale required by complex projects such as Destination Earth.
With our services, clients get easy, remote, high-bandwidth and scalable access to granular online data in a cost-efficient manner. These capabilities are vital in the era of the fourth paradigm.