
Table 2 Partitioning, replication, consistency, and concurrency control capabilities

From: Data management in cloud environments: NoSQL and NewSQL data stores

| Category | Data store | Partitioning | Replication | Consistency | Concurrency control |
| --- | --- | --- | --- | --- | --- |
| Key-value stores | Redis | Not available (planned for the Redis Cluster release). It can be implemented by a client or a proxy. | Master–slave, asynchronous replication. | Eventual consistency. Strong consistency if slave replicas are used solely for failover. | Application can implement optimistic (using the WATCH command) or pessimistic concurrency control. |
| Key-value stores | Memcached | Clients’ responsibility. Most clients support consistent hashing. | No replication. Repcached can be added to memcached for replication. | Strong consistency (single instance). | Application can implement optimistic (using CAS with version stamps) or pessimistic concurrency control. |
| Key-value stores | BerkeleyDB | Key-range partitioning and custom partitioning functions. Not supported by the C# and Java APIs at this time. | Master–slave. | Configurable. | Readers–writer locks. |
| Key-value stores | Voldemort | Consistent hashing. | Masterless, asynchronous replication. Replicas are located on the first R nodes moving over the partitioning ring in a clockwise direction. | Configurable, based on quorum read and write requests. | MVCC with vector clocks. |
| Key-value stores | Riak | Consistent hashing. | Masterless, asynchronous replication. Built-in functions determine how replicas distribute the data evenly. | Configurable, based on quorum read and write requests. | MVCC with vector clocks. |
| Column family stores | Cassandra | Consistent hashing and range partitioning (known as order-preserving partitioning in Cassandra terminology); range partitioning is not recommended because of possible hot spots and load-balancing issues. | Masterless, asynchronous replication. Two strategies for placing replicas: replicas are placed on the next R nodes along the ring; or replica 2 is placed on the first node along the ring that belongs to another data centre, with the remaining replicas on nodes along the ring in the same rack as the first. | Configurable, based on quorum read and write requests. | Client-provided timestamps are used to determine the most recent update to a column. The latest timestamp always wins and eventually persists. |
| Column family stores | HBase | Range partitioning. | Master–slave or multi-master, asynchronous replication. Does not support read load balancing (a row is served by exactly one server); replicas are used only for failover. | Strong consistency. | MVCC. |
| Column family stores | DynamoDB | Consistent hashing. | Three-way, synchronous replication across multiple zones in a region. | Configurable. | Application can implement optimistic (using incrementing version numbers) or pessimistic concurrency control. |
| Column family stores | Amazon SimpleDB | Partitioning is achieved at the DB design stage by manually adding additional domains (tables). Cannot query across domains. | Replicas within a chosen region. | Configurable. | Application can implement optimistic concurrency control by maintaining a version number (or timestamp) attribute and performing a conditional put/delete based on its value. |
| Document stores | MongoDB | Range partitioning based on a shard key (one or more fields that exist in every document in the collection). Hashed shard keys can also be used to partition data. | Master–slave, asynchronous replication. | Configurable. Two methods to achieve strong consistency: set the connection to read only from the primary, or set the write concern parameter to “Replica Acknowledged”. | Readers–writer locks. |
| Document stores | CouchDB | Consistent hashing. | Multi-master, asynchronous replication. Designed for off-line operation: multiple replicas can maintain their own copies of the same data and synchronize them at a later time. | Eventual consistency. | MVCC. In case of conflicts, the winning revision is chosen, but the losing revision is saved as a previous version. |
| Document stores | Couchbase Server | A hashing function determines to which bucket a document belongs; a table is then consulted to look up the server that hosts that bucket. | Multi-master. | Within a cluster: strong consistency. Across clusters: eventual consistency. | Application can implement optimistic (using CAS) or pessimistic concurrency control. |
| Graph databases | Neo4j | No partitioning (cache sharding only). | Master–slave, but can handle write requests on all server nodes; write requests to slaves are synchronously propagated to the master. | Eventual consistency. | Write locks are acquired on nodes and relationships and held until commit. |
| Graph databases | HyperGraphDB | Graph parts can reside on different P2P nodes. Builds on autonomous agent technologies; agent-style communication is based on the Extensible Messaging and Presence Protocol (XMPP). | Multi-master, asynchronous replication. | Eventual consistency. | MVCC. |
| Graph databases | AllegroGraph | No partitioning (the federation concept, which aims to integrate graph databases, is abstract at the moment). | Master–slave. | Eventual consistency. | It is unclear how locking is implemented; the vendor claims “100% Read Concurrency, Near Full Write Concurrency”. |
| NewSQL | VoltDB | Consistent hashing. Users define whether stored procedures should run on a single server or on all servers. | Updates are executed on all replicas at the same time. | Strong consistency. | Single-threaded execution model (no concurrency control). |
| NewSQL | Spanner | Data are partitioned into tablets; complex policies determine in which tablet the data should reside. | Global ordering of updates across all replicas (Paxos state machine algorithm). | Strong consistency. | Pessimistic locking in read-write transactions. Read-only transactions are lock-free (versioned reads). |
| NewSQL | Clustrix | Consistent hashing. Table indices are partitioned using the same approach. | Updates are executed on all replicas at the same time. | Strong consistency. | MVCC. |
| NewSQL | NuoDB | No partitioning. The underlying key-value store can partition the data, but this is not visible to the user. | Multi-master (distributed object replication), asynchronous. | Eventual consistency. | MVCC. |
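
The sketches below illustrate, in Python, several of the mechanisms named in the table; they are simplified illustrations, not the stores' actual implementations. Several rows above (Memcached client libraries, Voldemort, Riak, Cassandra, DynamoDB, CouchDB, VoltDB, Clustrix) rely on consistent hashing to assign keys to nodes, and Voldemort and Cassandra place replicas on the next R nodes around the ring. A minimal sketch of both ideas (node names and the number of virtual nodes are arbitrary):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring: a key maps to the first node clockwise from its hash."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []                          # sorted list of (hash, node) points
        for node in nodes:
            for i in range(vnodes):              # virtual nodes smooth the key distribution
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Primary node for a key: first ring point at or after the key's hash, wrapping."""
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

    def preference_list(self, key, r=3):
        """First R distinct nodes clockwise from the key, as in the replica-placement
        strategies described for Voldemort and Cassandra above."""
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        nodes = []
        for i in range(len(self._ring)):
            node = self._ring[(idx + i) % len(self._ring)][1]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == r:
                break
        return nodes

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))          # only ~1/N of keys move when a node is added
print(ring.preference_list("user:42"))   # primary plus the next replicas around the ring
```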
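
The Redis row notes that applications can build optimistic concurrency control on the WATCH command. A minimal sketch using the redis-py client (the key name `balance` is illustrative); the transaction is retried whenever the watched key changes between the read and EXEC:

```python
import redis

r = redis.Redis()

def add_credit(amount):
    """Optimistic check-and-set: retry if 'balance' changes between the read and EXEC."""
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch("balance")                 # abort the transaction if the key changes
                current = int(pipe.get("balance") or 0)
                pipe.multi()                          # queue the update to run atomically
                pipe.set("balance", current + amount)
                pipe.execute()                        # raises WatchError on conflict
                return
            except redis.WatchError:
                continue                              # another client won the race; retry
```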
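
The Memcached and Couchbase Server rows mention optimistic concurrency based on CAS (check-and-set) version stamps: a write is accepted only if the item has not changed since it was read. A sketch with the pymemcache client, assuming a counter stored under an arbitrary key; any client exposing gets/cas works the same way:

```python
from pymemcache.client.base import Client

client = Client(("localhost", 11211))

def increment_counter(key):
    """Retry loop around gets/cas: the write is rejected if another client updated the item."""
    while True:
        value, cas_token = client.gets(key)           # value plus its current version stamp
        if cas_token is None:                         # key not set yet
            client.add(key, b"0")                     # harmless if another client adds it first
            continue
        new_value = int(value) + 1
        if client.cas(key, str(new_value).encode(), cas_token):
            return new_value                          # CAS matched: our write was applied
        # CAS mismatch: someone else wrote first; re-read and try again
```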
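
Voldemort and Riak implement MVCC with vector clocks: each replica advances its own counter on a write, and two versions conflict when neither clock dominates the other. A small, store-agnostic sketch of the bookkeeping:

```python
def increment(clock, node):
    """Return a copy of the vector clock with the given node's counter advanced."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def descends(a, b):
    """True if clock a has seen every event in clock b (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"                    # conflicting siblings: the client must reconcile

v1 = increment({}, "replica-1")            # {'replica-1': 1}
v2 = increment(v1, "replica-2")            # descends from v1
v3 = increment(v1, "replica-3")            # also descends from v1, but not from v2
print(compare(v2, v1))                     # 'a supersedes b'
print(compare(v2, v3))                     # 'concurrent' -> conflict to resolve
```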
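
For Voldemort, Riak, and Cassandra, consistency is listed as configurable through quorum reads and writes. With Cassandra, for example, the consistency level is set per request; a sketch with the DataStax Python driver (keyspace, table, and column names are illustrative):

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")          # illustrative keyspace

# QUORUM write followed by QUORUM read: R + W > N, so the read observes the write,
# at the cost of higher latency than consistency level ONE.
write = SimpleStatement(
    "INSERT INTO orders (id, status) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (42, "paid"))

read = SimpleStatement(
    "SELECT status FROM orders WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(read, (42,)).one()
```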
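
The MongoDB row gives two ways to obtain strong consistency: read only from the primary, and use a write concern that waits for replica acknowledgement. A sketch with PyMongo (connection string, database, and collection names are assumptions; `w=2` stands in for the "Replica Acknowledged" setting):

```python
from pymongo import MongoClient, ReadPreference, WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

orders = client.shop.get_collection(
    "orders",
    read_preference=ReadPreference.PRIMARY,      # never serve reads from a lagging secondary
    write_concern=WriteConcern(w=2),             # wait until at least one replica acknowledges
)

orders.insert_one({"_id": 1, "status": "new"})   # returns once two members hold the write
print(orders.find_one({"_id": 1}))               # served by the primary
```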
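
The DynamoDB and Amazon SimpleDB rows describe optimistic concurrency built on conditional writes against a version (or timestamp) attribute. A sketch with boto3 for DynamoDB (table and attribute names are illustrative):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Accounts")    # illustrative table

def deposit(account_id, amount):
    """Optimistic update: succeeds only if the stored version is the one we read."""
    item = table.get_item(Key={"account_id": account_id})["Item"]
    try:
        table.update_item(
            Key={"account_id": account_id},
            UpdateExpression="SET balance = :b, version = :new",
            ConditionExpression="version = :old",        # reject if someone wrote in between
            ExpressionAttributeValues={
                ":b": item["balance"] + amount,
                ":old": item["version"],
                ":new": item["version"] + 1,
            },
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return deposit(account_id, amount)           # lost the race: re-read and retry
        raise
```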