From: Data management in cloud environments: NoSQL and NewSQL data stores (Journal of Cloud Computing: Advances, Systems and Applications)
Category | Data store | Partitioning | Replication | Consistency | Concurrency control
---|---|---|---|---|---
Key-value stores | Redis | Not available (planned for the Redis Cluster release); can be implemented by a client or a proxy. | Master–slave, asynchronous replication. | Eventual consistency. Strong consistency if slave replicas are used solely for failover. | Application can implement optimistic (using the WATCH command) or pessimistic concurrency control.
| Memcached | Clients' responsibility; most clients support consistent hashing. | No replication; repcached can be added to memcached for replication. | Strong consistency (single instance). | Application can implement optimistic (using CAS with version stamps) or pessimistic concurrency control.
| BerkeleyDB | Key-range partitioning and custom partitioning functions. Not supported by the C# and Java APIs at this time. | Master–slave. | Configurable. | Readers–writer locks.
| Voldemort | Consistent hashing. | Masterless, asynchronous replication. Replicas are located on the first R nodes moving clockwise over the partitioning ring. | Configurable, based on quorum read and write requests. | MVCC with vector clocks.
| Riak | Consistent hashing. | Masterless, asynchronous replication. Built-in functions determine how replicas are distributed evenly. | Configurable, based on quorum read and write requests. | MVCC with vector clocks.
Column family stores | Cassandra | Consistent hashing. Range partitioning (known as order-preserving partitioning in Cassandra terminology) is not recommended due to the possibility of hot spots and load-balancing issues. | Masterless, asynchronous replication. Two strategies for placing replicas: replicas are placed on the next R nodes along the ring; or, replica 2 is placed on the first node along the ring that belongs to another data centre, with the remaining replicas on nodes along the ring in the same rack as the first. | Configurable, based on quorum read and write requests. | Client-provided timestamps determine the most recent update to a column; the latest timestamp always wins and eventually persists.
| HBase | Range partitioning. | Master–slave or multi-master, asynchronous replication. Does not support read load balancing (a row is served by exactly one server); replicas are used only for failover. | Strong consistency. | MVCC.
| DynamoDB | Consistent hashing. | Three-way synchronous replication across multiple zones in a region. | Configurable. | Application can implement optimistic (using incrementing version numbers) or pessimistic concurrency control.
| Amazon SimpleDB | Partitioning is achieved at the DB design stage by manually adding additional domains (tables); queries cannot span domains. | Replicas within a chosen region. | Configurable. | Application can implement optimistic concurrency control by maintaining a version number (or timestamp) attribute and performing a conditional put/delete based on its value.
Document stores | MongoDB | Range partitioning based on a shard key (one or more fields that exist in every document in the collection); hashed shard keys can also be used to partition data. | Master–slave, asynchronous replication. | Configurable. Two methods achieve strong consistency: set the connection to read only from the primary, or set the write concern parameter to "Replica Acknowledged". | Readers–writer locks.
| CouchDB | Consistent hashing. | Multi-master, asynchronous replication. Designed for off-line operation: multiple replicas can maintain their own copies of the same data and synchronize them at a later time. | Eventual consistency. | MVCC. In case of conflict, a winning revision is chosen, but the losing revision is saved as a previous version.
| Couchbase Server | A hashing function determines which bucket a document belongs to; a table is then consulted to look up the server hosting that bucket. | Multi-master. | Strong consistency within a cluster; eventual consistency across clusters. | Application can implement optimistic (using CAS) or pessimistic concurrency control.
Graph databases | Neo4j | No partitioning (cache sharding only). | Master–slave, but can handle write requests on all server nodes; write requests to slaves must synchronously propagate to the master. | Eventual consistency. | Write locks are acquired on nodes and relationships until commit.
| HyperGraphDB | Graph parts can reside on different P2P nodes; builds on autonomous agent technologies, with agent-style communication based on the Extensible Messaging and Presence Protocol (XMPP). | Multi-master, asynchronous replication. | Eventual consistency. | MVCC.
| AllegroGraph | No partitioning (the federation concept, which aims to integrate graph databases, is abstract at the moment). | Master–slave. | Eventual consistency. | Unclear how locking is implemented: "100% Read Concurrency, Near Full Write Concurrency".
NewSQL | VoltDB | Consistent hashing. Users define whether stored procedures should run on a single server or on all servers. | Updates are executed on all replicas at the same time. | Strong consistency. | Single-threaded model (no concurrency control).
| Spanner | Data are partitioned into tablets; complex policies determine in which tablet the data should reside. | Global ordering across all replicas (Paxos state-machine algorithm). | Strong consistency. | Pessimistic locking in read–write transactions; read-only transactions are lock-free (versioned reads).
| Clustrix | Consistent hashing; table indices are partitioned using the same approach. | Updates are executed on all replicas at the same time. | Strong consistency. | MVCC.
| NuoDB | No partitioning. The underlying key-value store can partition the data, but this is not visible to the user. | Multi-master (distributed object replication), asynchronous. | Eventual consistency. | MVCC.
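Several of the stores above (Voldemort, Riak, Cassandra, Clustrix, VoltDB) place keys with consistent hashing: nodes and keys are hashed onto a ring, and a key belongs to the first node at or after its position. A minimal sketch, assuming illustrative node names and a virtual-node count chosen for demonstration (this is not any store's actual implementation):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring: a key maps to the first node position
    at or after the key's hash, wrapping around at the end."""

    def __init__(self, nodes=(), vnodes=8):
        self._ring = []        # sorted list of (position, node)
        self.vnodes = vnodes   # virtual nodes smooth key distribution
        for n in nodes:
            self.add_node(n)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node appears at several ring positions.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        pos = self._hash(key)
        # First ring entry with position >= pos; wrap to 0 if past the end.
        idx = bisect.bisect_left(self._ring, (pos, ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.get_node("user:42")
```

The appeal of this scheme, and the reason the masterless stores use it, is that adding or removing a node relocates only the keys on the adjacent arc of the ring rather than rehashing everything.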
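"Configurable, based on quorum read and write requests" (Voldemort, Riak, Cassandra) refers to tuning how many of the N replicas must answer a read (R) or acknowledge a write (W). The governing arithmetic fits in a few lines; the function name is illustrative:

```python
def quorum_is_strong(n, r, w):
    """With n replicas, read quorum r, and write quorum w, the condition
    r + w > n guarantees every read set overlaps the latest write set,
    yielding strong consistency; otherwise reads may return stale data."""
    return r + w > n

# Common Dynamo-style setting: n=3, r=2, w=2 -> quorums always overlap.
strong = quorum_is_strong(3, 2, 2)
# n=3, r=1, w=1 trades consistency for latency (eventual consistency).
weak = quorum_is_strong(3, 1, 1)
```

Lowering R and W reduces request latency and raises availability, which is exactly the knob these stores expose per request or per bucket.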
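The "MVCC with vector clocks" entries for Voldemort and Riak mean each version of a value carries a vector clock, and two versions conflict exactly when neither clock dominates the other. A sketch of the comparison rule, with clocks represented as plain dicts for illustration:

```python
def compare(vc_a, vc_b):
    """Compare two vector clocks (dicts mapping node id -> counter).
    Returns 'equal', 'before', 'after', or 'concurrent'; the last case
    is a conflict the client must reconcile, as in Voldemort/Riak."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # vc_a happened before vc_b; vc_b supersedes it
    if b_le_a:
        return "after"
    return "concurrent"      # neither dominates: siblings/conflict
```

When `compare` returns `"concurrent"`, these stores keep both versions and hand them back to the application (or a configured resolver) rather than silently picking a winner, which is the key behavioural difference from Cassandra's last-timestamp-wins rule in the table.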
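Several rows say the application "can implement optimistic concurrency control using CAS with version stamps" (memcached, Couchbase, DynamoDB with version numbers). The pattern is check-and-set: read a value together with its version, then write back only if the version has not changed. An in-memory sketch of the idea (the class and method names are illustrative, loosely mirroring memcached's `gets`/`cas` commands, not a real client API):

```python
class VersionedStore:
    """In-memory sketch of CAS-based optimistic concurrency: every value
    carries a version stamp, and a write succeeds only if the caller's
    stamp still matches the stored one."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def gets(self, key):
        """Return (value, version); (None, 0) if the key is absent."""
        return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        """Check-and-set: store value only if version is unchanged."""
        _, current = self._data.get(key, (None, 0))
        if current != version:
            return False  # another writer got there first; caller retries
        self._data[key] = (value, current + 1)
        return True

store = VersionedStore()
_, v = store.gets("counter")
store.cas("counter", 1, v)  # succeeds: version still matches
```

A failed `cas` is the optimistic-control signal: the application re-reads, reapplies its change, and retries, instead of holding locks across the read-modify-write cycle as a pessimistic scheme would.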