Architecture
Apache Ozone is a scalable, distributed, and highly available object store designed to handle billions of objects of any size. This document provides an overview of Ozone's architecture, including its core components, data organization, and operational concepts.
Ozone Namespace
Ozone organizes data in a hierarchical namespace consisting of three levels:
Volumes
Volumes are the top-level entities in the Ozone namespace, conceptually similar to filesystems in traditional storage systems. They typically represent:
- Multi-tenant boundaries
- Organizational divisions
- Project groupings
Volumes provide isolation and can have their own admins and quota limits. For more information, see Volumes Overview.
Buckets
Buckets exist within volumes and act as containers for objects (keys). Each bucket can be configured with specific properties:
- Storage type and replication factor
- Encryption settings
- Access control policies
- Quota limits
Buckets are analogous to directories in a filesystem or buckets in cloud object stores. For more information, see Buckets Overview.
Keys (Objects)
Keys are the actual data objects stored in buckets. They can be:
- Files of any size
- Binary data
- Named using a path-like structure depending on bucket layout
For more details about Ozone's namespace, see Namespace Overview.
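As a rough illustration of the three-level hierarchy, the sketch below splits a path-like name into its volume, bucket, and key parts. The `OzonePath` type and the `parse_ozone_path` helper are hypothetical names invented for this example; they are not part of any Ozone client library.

```python
from typing import NamedTuple


class OzonePath(NamedTuple):
    """Hypothetical holder for the three namespace levels."""
    volume: str
    bucket: str
    key: str


def parse_ozone_path(path: str) -> OzonePath:
    """Split '/volume/bucket/key...' into its three namespace levels.

    Everything after the bucket is treated as the key, so the key may
    itself contain '/' characters (path-like naming).
    """
    parts = path.strip("/").split("/", 2)
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected /volume/bucket/key, got {path!r}")
    return OzonePath(*parts)


print(parse_ozone_path("/sales/reports/2024/q1/summary.csv"))
```

Only the first two separators are significant in this model; the rest of the string is carried through as the key name.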
Ozone Bucket Types
Ozone supports two distinct bucket layouts, each optimized for different use cases:
Object Store Layout
The Object Store layout (OBS) works like traditional object stores with a flat namespace:
- Objects are stored with their full key path
- No concept of directories (though path-like naming is supported)
- Optimized for object storage workloads
- Compatible with S3-style access patterns
File System Optimized Layout
The File System Optimized layout (FSO) provides a hierarchical directory structure:
- Directories are first-class entities
- Supports efficient directory operations (listing, renaming)
- Includes filesystem features like trash
- Better performance for filesystem-style workloads
- Default layout type
The bucket layout determines how data is organized and accessed within a bucket. For more information, see Bucket Layouts.
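One way to see why directory operations differ between the two layouts is a toy model: a flat dictionary keyed by full path versus a nested dictionary with directories as real nodes. This is only a sketch under that assumption; it does not reflect Ozone's actual metadata schema, and both function names are made up for the example.

```python
def rename_prefix_flat(keys: dict, old: str, new: str) -> int:
    """Flat (OBS-style) model: every key under the prefix is rewritten."""
    touched = [k for k in keys if k.startswith(old + "/")]
    for k in touched:
        keys[new + k[len(old):]] = keys.pop(k)
    return len(touched)


def rename_dir_tree(root: dict, old: str, new: str) -> int:
    """Tree (FSO-style) model: only the parent's directory entry changes.

    For simplicity this handles a rename within the same parent directory.
    """
    *parents, name = old.split("/")
    node = root
    for p in parents:
        node = node[p]
    node[new.split("/")[-1]] = node.pop(name)
    return 1


flat = {"logs/a.txt": b"1", "logs/b.txt": b"2", "logs/x/c.txt": b"3", "other.txt": b"4"}
tree = {"logs": {"a.txt": b"1", "b.txt": b"2", "x": {"c.txt": b"3"}}, "other.txt": b"4"}

flat_touched = rename_prefix_flat(flat, "logs", "archive")
tree_touched = rename_dir_tree(tree, "logs", "archive")
print(flat_touched, tree_touched)
```

In the flat model the cost of a rename grows with the number of keys under the prefix, while in the tree model it stays at a single entry regardless of how much data sits below the directory.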
Core Components
Ozone has a modular architecture with several key components that work together to provide a scalable and reliable storage system.
Ozone Manager (OM)
The Ozone Manager is the metadata server that manages the namespace:
- Handles all volume, bucket, and key operations
- Maintains metadata in RocksDB
- Requests block allocations from SCM for data storage
- Manages access control
- Supports high-availability (HA) deployment using Ratis, an implementation of the Raft consensus protocol
The OM is the entry point for all namespace operations. It tracks which objects exist and where they are stored. For more information, see Ozone Manager Details.
Storage Container Manager (SCM)
The Storage Container Manager orchestrates the container lifecycle and coordinates datanodes:
- Manages container creation and allocation
- Tracks datanode status and health
- Handles container replication and erasure coding (EC)
- Serves block allocation requests from the Ozone Manager
- Coordinates container balancing
- Supports HA deployment with Ratis
SCM is the control plane for container management. For more information, see Storage Container Manager Details.
Datanode
Datanodes are the workhorses that store the actual data:
- Store data in containers on local disks
- Serve read and write requests
- Report container statistics to SCM
- Participate in replication pipelines
- Handle data integrity checks
Each datanode manages a set of containers and serves read/write requests from clients. For more information, see Datanode Details.
Recon
Recon is the analytics and monitoring component:
- Collects and aggregates metrics
- Provides a web UI for monitoring
- Offers a consolidated view of the cluster
- Helps identify issues and bottlenecks
- Syncs data from OM, SCM, and Datanodes
Recon is an optional but recommended component for operational visibility. For more information, see Recon Details.
S3 Gateway
The S3 Gateway provides S3-compatible API access:
- Implements S3 REST API
- Translates S3 operations to Ozone operations
- Supports most S3 features and SDKs
- Provides authentication and authorization
The S3 Gateway enables applications built for S3 to work with Ozone. For more information, see S3 Gateway Details.
HttpFS
HttpFS provides a REST interface compatible with WebHDFS:
- Enables HTTP access to Ozone
- Compatible with WebHDFS clients
- Supports read/write operations
- Facilitates integration with web applications
HttpFS allows web applications to interact with Ozone using familiar APIs. For more information, see HttpFS Details.
Ozone Client
The Ozone client is the software component that enables applications to interact with the Ozone storage system:
- Provides Java libraries for programmatic access
- Handles communication with OM for namespace operations
- Manages direct data transfer with datanodes
- Implements client-side caching for improved performance
- Offers pluggable interfaces for different protocols (S3, OFS)
- Handles authentication and token management
The client library abstracts away the complexity of the distributed system, providing applications with a simple, consistent interface to Ozone storage. For more information, see Client Details.
Component Interactions
The components of Ozone interact in well-defined patterns for different operations:
Write Path Sequence
The typical write sequence follows these steps:
- Namespace Operations: The client contacts the Ozone Manager to create or locate the key in the namespace
- Block Allocation: The Ozone Manager requests blocks from the Storage Container Manager and returns their locations to the client
- Data Transfer: The client directly writes data to the selected Datanodes according to the replication pipeline
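The steps above can be sketched with three toy classes. Everything here (`ToyDatanode`, `ToySCM`, `ToyOM`, `write_key`, and their methods) is hypothetical and exists only to show who talks to whom; it is not the Ozone client API. The loop over the pipeline stands in for whatever replication the real pipeline performs, and the final commit call is included so the example has a namespace entry to inspect afterwards.

```python
import itertools


class ToyDatanode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write(self, block_id, data):
        self.blocks[block_id] = data


class ToySCM:
    """Hands out block IDs together with the datanodes that will hold them."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self._ids = itertools.count(1)

    def allocate_block(self):
        return next(self._ids), self.datanodes[: self.replication]


class ToyOM:
    """Owns the namespace and asks SCM for blocks on the client's behalf."""

    def __init__(self, scm):
        self.scm = scm
        self.keys = {}  # (volume, bucket, key) -> (block_id, pipeline)

    def open_key(self, volume, bucket, key):
        # Namespace operation followed by block allocation via SCM.
        return self.scm.allocate_block()

    def commit_key(self, volume, bucket, key, block_id, pipeline):
        self.keys[(volume, bucket, key)] = (block_id, pipeline)


def write_key(om, volume, bucket, key, data):
    block_id, pipeline = om.open_key(volume, bucket, key)
    for dn in pipeline:  # data goes to the datanodes, not through the OM
        dn.write(block_id, data)
    om.commit_key(volume, bucket, key, block_id, pipeline)
    return block_id


nodes = [ToyDatanode(f"dn{i}") for i in range(4)]
om = ToyOM(ToySCM(nodes))
block_id = write_key(om, "vol1", "bucket1", "hello.txt", b"hello")
print(block_id, [dn.name for dn in nodes if block_id in dn.blocks])
```

The point of the sketch is the separation of paths: the toy OM only ever sees metadata, while the payload bytes are handed to datanodes directly.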
Read Path Sequence
For reads, the process is simpler:
- The client requests key location information from the Ozone Manager
- Using the block location information, the client reads data directly from Datanodes
- In case of failures, the client retries with alternative replicas
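The retry behaviour in the last step can be illustrated with a minimal failover loop. This is a sketch only: replicas are modelled as plain callables, and `ReplicaUnavailable`, `read_block`, `down`, and `healthy` are names invented for the example rather than anything from the Ozone client.

```python
class ReplicaUnavailable(Exception):
    """Raised by a toy replica that cannot serve the block."""


def read_block(block_id, replicas):
    """Try each replica in order and return the first successful read.

    `replicas` is a list of callables taking a block ID and returning bytes;
    they stand in for datanode connections.
    """
    errors = []
    for replica in replicas:
        try:
            return replica(block_id)
        except ReplicaUnavailable as exc:
            errors.append(exc)  # remember the failure and move on
    raise ReplicaUnavailable(
        f"all {len(errors)} replicas failed for block {block_id}"
    )


def down(block_id):
    raise ReplicaUnavailable("datanode down")


def healthy(block_id):
    return b"payload-for-%d" % block_id


data = read_block(7, [down, healthy])
print(data)
```

The read only fails when every replica in the list has been tried, which mirrors the idea that a single datanode failure should not surface to the application.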
Monitoring and Management
The Recon service continuously:
- Collects metrics from the Ozone Manager, Storage Container Manager, and Datanodes
- Provides consolidated views of system health and performance
- Facilitates troubleshooting and management
Ozone Internals
Understanding Ozone's internal structure helps to grasp how data is organized and protected.
Containers
Containers are the fundamental storage units in Ozone:
- Fixed-size (typically 5GB) units of storage
- Managed by the Storage Container Manager (SCM)
- Replicated or erasure-coded across datanodes
- Contain multiple blocks
- Include metadata and chunk files
Containers are self-contained units that include all necessary metadata and data. They are the unit of replication in Ozone.
Blocks
Blocks are logical units of data within containers:
- Represent portions of objects/keys
- Created when clients write data
- Referenced by object metadata
- Allocated by SCM at the request of the Ozone Manager
- Secured with block tokens
When a client writes data, the OM obtains blocks from SCM, and the client writes data to these blocks through datanodes.
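To make the relationship between keys, blocks, and containers concrete, here is a small arithmetic sketch. The 5 GB container size comes from the Containers section above; the 256 MiB block size is an assumption chosen for this example, and the helper names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

CONTAINER_SIZE = 5 * GIB   # "typically 5GB" per the Containers section
BLOCK_SIZE = 256 * MIB     # assumption for this example only


def blocks_for_key(key_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold key_size bytes (ceiling division)."""
    return -(-key_size // block_size)


def block_lengths(key_size: int, block_size: int = BLOCK_SIZE) -> list:
    """Length of each block: full blocks followed by one partial tail block."""
    full, tail = divmod(key_size, block_size)
    return [block_size] * full + ([tail] if tail else [])


print(blocks_for_key(1 * GIB))                          # blocks for a 1 GiB key
print([n // MIB for n in block_lengths(600 * MIB)])     # block lengths in MiB
print(CONTAINER_SIZE // BLOCK_SIZE)                     # full blocks per container
```

Under these assumed sizes, a large key spans several blocks, and a single container has room for many blocks that may belong to different keys.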
Chunks
Chunks are the physical data units stored on disk:
- Fixed-size portions of blocks
- Written sequentially in container data files
- Checksummed for data integrity
- Optimized for disk I/O
Chunks are the smallest units of data stored on disk and include checksums for integrity verification.
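The idea of fixed-size, checksummed chunks can be shown in a few lines. This sketch uses CRC32 from Python's standard `zlib` module as a stand-in checksum and a deliberately tiny chunk size; both choices are assumptions for illustration and say nothing about the checksum type or chunk size a given cluster is configured with.

```python
import zlib


def split_into_chunks(data: bytes, chunk_size: int):
    """Yield (offset, chunk_bytes, crc32) for consecutive fixed-size chunks."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield offset, chunk, zlib.crc32(chunk)


def verify_chunk(chunk: bytes, expected_crc: int) -> bool:
    """Recompute the checksum and compare it with the stored value."""
    return zlib.crc32(chunk) == expected_crc


block = b"abcdefghij" * 10            # 100 bytes of toy block data
chunks = list(split_into_chunks(block, chunk_size=32))
offset, chunk, crc = chunks[0]
corrupted = b"X" + chunk[1:]          # flip the first byte
print(len(chunks), verify_chunk(chunk, crc), verify_chunk(corrupted, crc))
```

Storing the checksum alongside each chunk lets a reader detect corruption in one chunk without rereading the whole block.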
Replicated Containers
Ozone provides durability through container replication:
- Default replication factor is 3
- Uses the Raft consensus protocol, implemented by Ratis
- Synchronously replicates data across datanodes
- Provides strong consistency guarantees
- Handles node failures transparently
Replicated containers ensure data durability by storing multiple copies of each container across different datanodes.
Erasure Coded Containers
Erasure coding provides space-efficient durability:
- Splits data across multiple datanodes with parity
- Supports various coding schemes (e.g., RS-3-2-1024k)
- Reduces storage overhead compared to replication
- Trades some performance for storage efficiency
- Suitable for cold data storage
Erasure coding allows for data durability with less storage overhead than full replication.
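The storage trade-off can be put into numbers. The sketch below reads a scheme name such as RS-3-2-1024k as codec, data units, parity units, and cell size, and compares raw bytes stored per byte of user data against replication factor 3. The function names are hypothetical helpers for this example.

```python
def replication_overhead(factor: int) -> float:
    """Raw bytes stored per byte of user data with full replication."""
    return float(factor)


def parse_ec_scheme(scheme: str):
    """Parse a scheme string such as 'RS-3-2-1024k' into its parts."""
    codec, data, parity, cell = scheme.split("-")
    return codec, int(data), int(parity), cell


def ec_overhead(scheme: str) -> float:
    """Raw bytes stored per byte of user data: (data + parity) / data."""
    _, data, parity, _ = parse_ec_scheme(scheme)
    return (data + parity) / data


print(replication_overhead(3))                   # 3.0
print(round(ec_overhead("RS-3-2-1024k"), 2))     # 1.67
```

With three data units and two parity units, each stripe can lose any two units and still be reconstructed, the same number of losses that three replicas survive, while storing roughly 1.67 bytes instead of 3 for every byte of user data.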
Pipelines
Pipelines are groups of datanodes that work together to store data:
- Managed by SCM
- Consist of multiple datanodes
- Handle write operations as a unit
- Support different replication strategies
For detailed information, see Write Pipelines.
Ratis Replicated
Ratis pipelines use the Raft consensus protocol:
- Typically three datanodes per pipeline
- One leader and multiple followers
- Synchronous replication
- Strong consistency guarantees
- Automatic leader election on failure
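The reason three datanodes is a common pipeline size follows from Raft's majority rule, which can be written as simple arithmetic. The helper names below are made up for this sketch.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a Raft majority among n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority is still available."""
    return n - majority(n)


for n in (1, 3, 5):
    print(n, majority(n), tolerated_failures(n))
```

A three-node group needs two nodes to agree, so it keeps accepting writes with one node down; a single node provides no failure tolerance at all.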
Erasure Coded
Erasure coded pipelines distribute data and parity:
- Datanodes store data or parity chunks
- EC pipeline size depends on the coding scheme
- Handles reconstruction on node failure
- Optimized for storage efficiency
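A rough picture of how data spreads across an erasure coded pipeline: assuming a conventional striped layout in which data is cut into fixed-size cells and dealt round-robin to the data units, the sketch below maps a byte offset to its stripe, data unit, and position within the cell. The layout is an assumption for illustration, and `locate_in_stripe` is a hypothetical helper; parity placement is not modelled.

```python
CELL_SIZE = 1024 * 1024   # the "1024k" in a scheme name such as RS-3-2-1024k
DATA_UNITS = 3            # the "3" in the same scheme name
PARITY_UNITS = 2          # the "2" in the same scheme name


def locate_in_stripe(offset: int, data_units: int = DATA_UNITS,
                     cell_size: int = CELL_SIZE):
    """Return (stripe index, data unit index, offset inside the cell)."""
    stripe_width = data_units * cell_size
    stripe, within = divmod(offset, stripe_width)
    unit, cell_offset = divmod(within, cell_size)
    return stripe, unit, cell_offset


print(DATA_UNITS + PARITY_UNITS)            # datanodes in the pipeline
print(locate_in_stripe(0))
print(locate_in_stripe(CELL_SIZE))
print(locate_in_stripe(3 * CELL_SIZE + 5))
```

Under this model the pipeline size is simply data units plus parity units, which is why it varies with the coding scheme.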
Multi-Protocol Access
Ozone supports multiple access protocols, making it versatile for different applications:
Native API
- Command-line interface and Java API
- Full feature access
- Highest performance
Ozone File System (OFS)
- Hadoop-compatible filesystem interface
- Works with applications built on the Hadoop FileSystem API
- Path format:
ofs://om-host/volume/bucket/key
S3 Compatible
- REST API compatible with Amazon S3
- Works with S3 clients and SDKs
- Path format:
http://s3g-host/bucket/key
HttpFS
- REST API compatible with WebHDFS
- Enables web applications to access Ozone
- Path format:
http://httpfs-host/webhdfs/v1/volume/bucket/key
The multi-protocol architecture allows for flexible integration with existing applications and workflows. For more information, see Client Interfaces.
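The three path formats above address the same object in different ways. The following sketch builds each form from its parts; the host names are the placeholders used in the path formats, and the function names are hypothetical. Keys are percent-encoded with the standard library's `urllib.parse.quote`, which leaves `/` intact.

```python
from urllib.parse import quote


def ofs_uri(om_host: str, volume: str, bucket: str, key: str) -> str:
    """Hadoop-compatible form: ofs://om-host/volume/bucket/key"""
    return f"ofs://{om_host}/{volume}/{bucket}/{quote(key)}"


def s3_url(s3g_host: str, bucket: str, key: str) -> str:
    """S3 form: http://s3g-host/bucket/key (no volume component)."""
    return f"http://{s3g_host}/{bucket}/{quote(key)}"


def httpfs_url(httpfs_host: str, volume: str, bucket: str, key: str) -> str:
    """WebHDFS-style form: http://httpfs-host/webhdfs/v1/volume/bucket/key"""
    return f"http://{httpfs_host}/webhdfs/v1/{volume}/{bucket}/{quote(key)}"


print(ofs_uri("om-host", "vol1", "bucket1", "dir/file.txt"))
print(s3_url("s3g-host", "bucket1", "dir/file.txt"))
print(httpfs_url("httpfs-host", "vol1", "bucket1", "dir/file.txt"))
```

Note that the S3 form carries only a bucket and a key, whereas the OFS and HttpFS forms spell out the volume as well.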
Summary
Apache Ozone's architecture provides:
- Scalability through separation of metadata and data paths
- Reliability via replication and erasure coding
- Flexibility with multiple access protocols
- Performance through direct data transfer between clients and datanodes
- Manageability with comprehensive monitoring
This architecture enables Ozone to serve as both an object store and a filesystem, making it suitable for a wide range of applications from big data analytics to cloud-native workloads.
For more detailed information about Ozone's components and internals, see the System Internals section.