Ceph

Distributed storage system that allows us to build flexible storage tiers


What is Ceph?

Ceph is an open-source distributed storage system that can scale to thousands of nodes, offering petabytes of high-performance, secure, highly available storage.

What is the role of Ceph?

Ceph cuts user data into small chunks that are distributed across multiple servers, with a policy-regulated number of copies. Ceph clients use a hashing algorithm to determine the location of data and fetch it directly from the relevant server, without going through a proxy or gateway that would become a throughput bottleneck or a single point of failure.
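The idea of hash-based placement can be illustrated with a minimal sketch. It uses rendezvous (highest-random-weight) hashing as a stand-in and is not Ceph's actual placement algorithm; the function name, server names, and object name are all hypothetical.

```python
import hashlib

def place_object(object_name, servers, copies=3):
    """Return the servers that should hold `object_name`, best score first."""
    def score(server):
        # Hash the (server, object) pair; a higher value means a better match.
        digest = hashlib.sha256(f"{server}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    # Every client ranks the servers identically, so no central lookup is needed.
    return sorted(servers, key=score, reverse=True)[:copies]

# Hypothetical cluster and object name, for illustration only.
servers = ["server-a", "server-b", "server-c", "server-d", "server-e"]
print(place_object("vm-disk-0001.chunk-42", servers))
```

Because the result depends only on the object name and the list of servers, any client can compute it locally and contact the storage servers directly.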

Although Ceph is internally an object storage system, it can provide storage services in three forms: blocks (or volumes) that can be mounted in a virtual machine, object storage that can be accessed through an HTTP RESTful S3 or Swift interface, and a network filesystem share that can be accessed from multiple servers.

Ceph advantages

Flexible storage tiers

Ceph allows us to build extremely flexible storage tiers that combine HDD, SSD, and NVMe disks and exploit the advantages of each media type. It also allows flexible configuration of additional features such as redundancy type and level (replication or erasure coding, and the number of copies), caching, data distribution strategy (e.g. anti-affinity at the server, rack, or floor level), compression, and data encryption.
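Rack-level anti-affinity can be sketched as follows: each copy of an object must land in a different rack, so losing one rack never removes more than one copy. This is a simplified illustration, not Ceph's placement logic; the function name and the rack and server names are hypothetical.

```python
import hashlib

def place_across_racks(object_name, servers_by_rack, copies=3):
    """Pick `copies` servers for an object, each one in a different rack."""
    def score(name):
        digest = hashlib.sha256(f"{name}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    if copies > len(servers_by_rack):
        raise ValueError("not enough racks for the requested number of copies")
    # First choose distinct racks, then one server inside each chosen rack.
    racks = sorted(servers_by_rack, key=score, reverse=True)[:copies]
    return [(rack, max(servers_by_rack[rack], key=score)) for rack in racks]

# Hypothetical layout: four racks with two servers each.
layout = {
    "rack-1": ["srv-11", "srv-12"],
    "rack-2": ["srv-21", "srv-22"],
    "rack-3": ["srv-31", "srv-32"],
    "rack-4": ["srv-41", "srv-42"],
}
print(place_across_racks("backup-2024.chunk-7", layout))
```

The same pattern applies at the server or floor level by changing what the outer grouping represents.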

These capabilities combined with the appropriate choice of hardware (disk, server, and network sizing) allow us to build storage services perfectly adapted to customer needs – from high-performance replicated SSD-based volume storage to inexpensive, petabyte-scale erasure-coded HDD object storage.

Security and reliability

The system is fully resilient to hardware failures, including failures of disks, servers, and network components. If a disk or server failure results in loss of data, Ceph automatically restores the defined redundancy level by re-replicating the missing data from a surviving copy. In addition, Ceph keeps object-level CRCs and performs continuous data scrubbing to detect and proactively repair broken data objects.
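The scrubbing idea can be shown with a small sketch: compare each copy's checksum against the stored value and overwrite any damaged copy from an intact one. This only illustrates the principle, assuming a CRC-32 checksum and in-memory copies; the function and variable names are hypothetical and do not reflect Ceph's internals.

```python
import zlib

def scrub(replicas, expected_crc):
    """Return (repaired_replicas, indexes_of_bad_copies)."""
    # Find one copy whose checksum still matches the stored value.
    good = next((r for r in replicas if zlib.crc32(r) == expected_crc), None)
    if good is None:
        raise ValueError("no intact copy left; cannot repair")
    bad = [i for i, r in enumerate(replicas) if zlib.crc32(r) != expected_crc]
    # Overwrite every damaged copy with the intact one.
    repaired = [good if i in bad else r for i, r in enumerate(replicas)]
    return repaired, bad

data = b"customer data chunk"
stored_crc = zlib.crc32(data)
# The second copy has one corrupted byte.
replicas = [data, b"customer dat\x00 chunk", data]
repaired, bad = scrub(replicas, stored_crc)
print(bad)  # [1]
```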

The network components are fully redundant with a spine-leaf topology and double connections to every server. Ceph also provides several security-related features such as in-transit and at-rest data encryption, access rights management, and strong user authentication.

Rich functionality

Ceph provides a rich set of functionalities for each of the volume, object, and filesystem services it offers to users. The volume service provides users with block storage volumes for use in VMs, either as boot volumes or as additional volumes that can be attached to a running VM. Ceph provides features such as thin provisioning, instant snapshots, background backups, and volume resizing.

Thanks to the data distribution mechanism, volume sizes are virtually unlimited and their performance grows with size.

Scalable to tens of Petabytes

Ceph clusters are designed to provide storage services at a petabyte scale. Clusters may be scaled horizontally by the addition of extra nodes. When extra nodes are added, Ceph automatically rebalances the cluster to equalize storage space usage and load. Rebalancing happens in the background without interrupting cluster operations. Additional nodes increase not only the cluster’s capacity but also its total throughput.
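Why adding a node need not reshuffle the whole cluster can be seen in a small simulation. It uses rendezvous hashing as a simplified stand-in for Ceph's placement algorithm and a single copy per object; node and object names are hypothetical.

```python
import hashlib

def owner(object_name, nodes):
    """Return the node with the highest hash score for this object."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

objects = [f"object-{i}" for i in range(10_000)]
old_nodes = ["node-1", "node-2", "node-3", "node-4"]
new_nodes = old_nodes + ["node-5"]

# An object moves only if the new node outscores its previous owner.
moved = [o for o in objects if owner(o, old_nodes) != owner(o, new_nodes)]
print(f"{len(moved) / len(objects):.1%} of objects moved")
```

In this model, every object that moves goes to the new node, and roughly one object in five moves when a fifth node joins. The existing nodes do not exchange data among themselves.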

Ceph is also equipped with automated error detection, data scrubbing, and repair mechanisms that provide data resiliency at scale.

Fully integrated with OpenStack

Ceph is the storage backend of choice for OpenStack-based clouds. It provides block storage for VMs, enabling functionalities such as thin provisioning, snapshotting, volume extension, and live VM migration. The Image Service also uses Ceph to store the system images used by running VMs, and Ceph provides the S3 and Swift Object Storage Services in OpenStack.

With careful tuning and design of the data pools used by the different OpenStack storage services, Ceph delivers performance and functionality adapted to the different usage scenarios cost-effectively.

Cost-effective and green

Ceph was designed from the ground up to build large data stores cost-effectively. Based on standard, off-the-shelf hardware components, it makes the best use of modern storage technologies such as NVMe, SSD, and HDD, leveraging their specific capabilities. Expensive NVMe drives are used to store small but frequently used data items such as indexes and metadata, allowing for fast operation of the whole cluster.

SSDs deliver IOPS-oriented performance, while inexpensive HDDs are used for cost-sensitive applications requiring large data volumes. The erasure-coding and compression features allow for cost and energy savings by reducing the overhead needed for data redundancy by up to 66%, allowing for green and cost-efficient data storage.
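The arithmetic behind erasure-coding savings can be sketched as follows. The comparison baseline (three full copies) and the k+m profiles are assumptions chosen for illustration; the text does not state which configurations its 66% figure refers to, and compression is not modelled here.

```python
def raw_per_usable(k, m):
    """Raw bytes stored per usable byte for a k+m erasure-coded pool."""
    return (k + m) / k

REPLICA_COUNT = 3  # assumed baseline: three full copies of every object

# Hypothetical profiles: k data chunks plus m coding chunks.
for k, m in [(4, 2), (8, 3)]:
    ec = raw_per_usable(k, m)
    saving = 1 - ec / REPLICA_COUNT
    print(f"EC {k}+{m}: {ec:.2f}x raw vs {REPLICA_COUNT}x, saving {saving:.0%}")
```

With a 4+2 profile, each usable byte costs 1.5 raw bytes instead of 3, which halves the raw capacity needed. Against this baseline the saving approaches two thirds as the share of coding chunks shrinks.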
