Relational DB and SQL
ISO standards of SQL
SQL combines DML and DDL. It was first standardized by ANSI in 1986 and adopted by ISO in 1987.
- ISO SQL (SQL-86/87)
- SQL-92
- SQL:1999
- SQL:2003
- SQL:2011, with temporal and bitemporal tables for time-versioning
Transaction and ACID properties
A transaction is 'a transformation of state' that has the ACID properties (Atomicity, Consistency, Isolation, Durability),
a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc.
- Atomic means 'all or nothing'; that is, every
update within the transaction must succeed in order for the transaction as a whole to be called successful.
There is no partial failure where one update was successful and another related
update failed. The common example here is with monetary transfer.
- Consistent means that data moves from one correct state to another correct
state, with no possibility that readers could view different values that don’t
make sense together.
- Isolated means that transactions executing concurrently will not become entangled
with each other; they each execute in their own space. That is, if two
different transactions attempt to modify the same data at the same time, then
one of them will have to wait for the other to complete.
- Durable means that once a transaction has succeeded, the changes will not be lost. This doesn’t
imply another transaction won’t later modify the same data; it just means that
writers can be confident that the changes are available for the next transaction.
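The atomicity property above can be sketched with Python's built-in sqlite3 module; the monetary-transfer example is the classic illustration. Table and account names here are invented for the demo.

```python
# Sketch: atomicity ('all or nothing') using sqlite3's transaction support.
# The account data is illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates commit together or not at all."""
    try:
        with conn:  # opens a transaction: commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            raise RuntimeError("simulated failure after both updates")
    except RuntimeError:
        pass  # the rollback has already undone both updates

transfer(conn, "alice", "bob", 40)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
# Neither update survived the simulated failure: alice still has 100, bob still 0
```

There is no state in which alice has been debited but bob not credited, which is exactly the "no partial failure" guarantee described above.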
Scalability and performance
- Two phase commit - blocking - ref Gregor Hohpe's blog entry “Starbucks Does Not Use Two-Phase Commit”.
- Compensation - the operation is immediately committed, and then in the event that some error is reported,
a new operation is invoked to restore proper state
- ORM - Object-Relational Mapping, such as Hibernate
- Sharding - split the data so that instead of hosting all of
it on a single server or replicating all of the data on all of the servers in a cluster,
divide up portions of the data horizontally and host them each separately.
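The horizontal split described above is commonly done by hashing each row's key to pick a shard. A minimal sketch, with an invented shard count and key names:

```python
# Sketch of hash-based sharding: route each row to one of several shards
# by hashing its key, so no single server hosts all of the data.
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Stable hash so the same key always lands on the same shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each dict stands in for the portion of data hosted on one server
shards = {i: {} for i in range(NUM_SHARDS)}
for user_id in ("u1", "u2", "u3", "u4", "u5"):
    shards[shard_for(user_id)][user_id] = {"id": user_id}
```

Note that a naive modulo scheme like this reshuffles most keys when NUM_SHARDS changes; production systems typically use consistent hashing to limit that movement.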
Refer to semantic web.
Databases & Object Persistence
Refer also to crypto-applications and blockchain.
- OpenGroup - Distributed Transaction Processing: The XA Specification
- The goal of XA is to guarantee atomicity in "global transactions" that are executed across heterogeneous components.
A transaction is a unit of work such as transferring money from one person to another.
Distributed transactions update multiple data stores (such as databases, application servers, message queues, transactional caches, etc.)
To guarantee integrity, XA uses a two-phase commit to ensure that all of a transaction's changes either take effect (commit) or do not (roll back), i.e., atomically.
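The two-phase commit used by XA can be sketched as a toy in-memory protocol; the Participant API here is invented purely for illustration.

```python
# Toy sketch of two-phase commit: a coordinator first asks every participant
# to prepare (vote), and only if all vote yes does it commit; any 'no' vote
# aborts the transaction everywhere. Not a real XA implementation.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "idle"
    def prepare(self):
        # Phase 1: durably get ready to commit, then vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit
    def commit(self):
        self.state = "committed"
    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):   # phase 1: voting
        for p in participants:                   # phase 2: all yes -> commit
            p.commit()
        return True
    for p in participants:                       # any no -> roll back everywhere
        p.rollback()
    return False

db, queue = Participant("db"), Participant("queue", can_commit=False)
ok = two_phase_commit([db, queue])  # queue votes no, so both roll back
```

The blocking problem noted above arises because a prepared participant must hold its locks until the coordinator's decision arrives; if the coordinator fails in between, participants are stuck.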
The CAP theorem (also named Brewer's theorem) states that it is impossible for a distributed data store to simultaneously provide more than two of the following three guarantees: consistency, availability, and partition tolerance.
In particular, the CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability.
Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions.
No distributed system is safe from network failures, thus network partitioning generally has to be tolerated.
In the presence of a partition, one is then left with two options: consistency or availability.
When choosing consistency over availability, the system will return an error or a time-out if particular information cannot be guaranteed to be up to date due to network partitioning.
When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.
In the absence of network failure – that is, when the distributed system is running normally – both availability and consistency can be satisfied.
CAP is frequently misunderstood as if one has to choose to abandon one of the three guarantees at all times. In fact, the choice is really between consistency and availability only when a network partition or failure happens; at all other times, no trade-off has to be made.
Database systems designed with traditional ACID guarantees in mind such as RDBMS choose consistency over availability, whereas systems designed around the BASE (Basically Available, Soft state, Eventual consistency) philosophy,
common in the NoSQL movement for example, choose availability over consistency.
Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
Eventual consistency, also called optimistic replication, is widely deployed in distributed systems, and has origins in early mobile computing projects.
A system that has achieved eventual consistency is often said to have converged, or achieved replica convergence.
Eventual consistency is a weak guarantee – most stronger models, like linearizability are trivially eventually consistent, but a system that is merely eventually consistent does not usually fulfill these stronger constraints.
Eventually-consistent services are often classified as providing BASE semantics, in contrast to traditional ACID (Atomicity, Consistency, Isolation, Durability) guarantees.
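One common way replicas converge is last-writer-wins (LWW) conflict resolution: each replica keeps a (timestamp, value) pair, and merging in any order settles on the latest write. A purely illustrative sketch:

```python
# Sketch of replica convergence via last-writer-wins (LWW) registers.
# Timestamps and values are invented for the demo.

def merge(a, b):
    """Keep the write with the later timestamp (LWW conflict resolution)."""
    return a if a[0] >= b[0] else b

# Three replicas saw writes at different times / in different orders
replica1 = (1, "draft")
replica2 = (3, "published")
replica3 = (2, "review")

# Anti-entropy: replicas gossip and merge pairwise; once updates stop,
# every merge order yields the same result, i.e. the replicas converge
converged = merge(merge(replica1, replica2), replica3)
```

Because merge is commutative and associative here, all replicas reach the same value regardless of delivery order, which is the "replica convergence" property described above.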
The CAP theorem was formally proved to be true by Seth Gilbert and Nancy Lynch of MIT in 2002.
In distributed systems, however, it is very likely that there will be network partitioning, and that at some point,
machines will fail and cause others to become unreachable. Networking issues such as packet loss or
high latency are nearly inevitable and have the potential to cause temporary partitions.
This leads to the conclusion that a distributed system must do its best to
continue operating in the face of network partitions (i.e. be partition tolerant),
leaving only two real options to compromise on: availability and consistency.
In 2012 Brewer provided an update of the CAP perspective, describing the 'pick two' formulation as somewhat misleading,
due to advances in techniques for handling partitions.
- Consistency: Every read receives the most recent write or an error
- Availability: Every request receives a (non-error) response – without the guarantee that it contains the most recent write
- Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) are different
from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve.
Good intro at http://www.nosql-database.org/.
Key-value stores, column stores, document stores, graph databases, object databases, XML databases, multi-model databases
- Key-value stores: data items are keys that have a set of attributes. All
data relevant to a key is stored with the key; data is frequently duplicated.
Popular key-value stores include Amazon's DynamoDB, Riak, and
Voldemort. Additionally, many popular caching technologies act as key-value
stores, including Oracle Coherence, Redis, and Memcached.
- Column stores such as Google's Bigtable served as the inspiration for implementations including Cassandra,
Accumulo, and Apache Hadoop’s HBase.
- Document stores use the complete document as the basic unit of storage, often in a format such as JSON, XML, or YAML.
Popular document stores include MongoDB and CouchDB.
- Graph databases represent data as a graph—a network of nodes and edges
that connect the nodes. Both nodes and edges can have properties. Because
they give heightened importance to relationships, graph databases such as
Neo4J, JanusGraph, and DataStax Graph have proven popular for building
social networking and semantic web applications.
- Object databases store data in terms of objects as understood from the discipline of object-oriented
programming. This makes it straightforward to use these databases from
object-oriented applications. Object databases such as db4o and InterSystems
Caché allow you to avoid techniques like stored procedures and
object-relational mapping (ORM) tools.
- XML databases are a special form of document databases, optimized for working with data in the eXtensible Markup Language
(XML). They include BaseX and eXist.
- Multi-model databases support more than one of the above styles, based on a primary underlying
database (most often a relational, key-value or column store) and exposing additional
models as APIs on top of that underlying database. Examples include Microsoft Azure Cosmos DB,
which exposes document, wide column and graph APIs on top of a key-value store, and DataStax Enterprise,
which offers a graph API on top of Cassandra’s wide column model.
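To make the styles above concrete, the same small fact ("ada knows grace") can be modeled in three of them, using plain Python structures as stand-ins for the actual databases:

```python
# Illustration only: the same data in key-value, document, and graph shapes.

# Key-value: everything relevant to a key stored (and duplicated) under it
kv = {"user:ada": {"name": "ada", "knows": ["grace"]}}

# Document: a self-contained JSON-like document per entity
doc = {"_id": "ada", "name": "ada", "knows": [{"_id": "grace"}]}

# Graph: explicit nodes and edges, both able to carry properties
nodes = {"ada": {"name": "ada"}, "grace": {"name": "grace"}}
edges = [("ada", "knows", "grace", {"since": 2020})]
```

The key-value shape makes point lookups trivial but relationship traversal expensive; the graph shape is the reverse, which is why relationship-heavy applications favour graph databases as noted above.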
Sharding and shared-nothing architecture
Sharding could be termed a kind of shared-nothing architecture that’s specific to
databases. A shared-nothing architecture is one in which there is no centralized
(shared) state, but each node in a distributed system is independent, so there is no
client contention for shared resources. The term was coined by Michael Stonebraker at the
University of California, Berkeley in his 1986 paper “The Case for Shared Nothing.”
Shared-nothing architecture was popularised by Google's Bigtable database and its MapReduce implementation
that do not share state, and are therefore capable of near-infinite scaling.
Lucene and Solr
- Lucene - for full text indexing and searching
- Wikipedia - Lucene
- Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking,
hit highlighting and analysis/tokenization capabilities
- Also recognized for its utility in the implementation of Internet search engines and local, single-site searching
- In 2010, the Apache Solr search server joined as a Lucene sub-project, merging the developer communities.
- Solr - search server built using Lucene Core
- LevelDB - fast key-value storage library written at Google that provides an ordered mapping from string keys to string values
- CouchDB - Apache - document oriented
- MongoDB - document oriented
- Stores documents as BSON (binary JSON), which adds types beyond JSON such as int, long, date, floating point, and decimal128
- Mongo’s 'replica set' is a primary/secondary scheme in which a failed primary is replaced via automated leader election
- A fundamental difference from relational databases is that the MongoDB query model is implemented as
methods or functions within the API of a specific programming language, rather than as a completely separate language like SQL
- MongoDB Compass is a GUI
- Cloud version is Mongo DB Atlas
- Ships with four storage engines, all of which can coexist within a single MongoDB replica set
- default WiredTiger storage engine
- encrypted storage engine, requires MongoDB Enterprise Advanced
- in-memory storage engine for real time analytics, requires MongoDB Enterprise Advanced
- MMAPv1 engine, an improved version of the storage engine used in pre-3.x MongoDB releases
- for authentication, MongoDB offers integration with LDAP, Windows Active Directory, Kerberos, and x.509 certificates
- for authorisation, roles can be defined in MongoDB, or centrally within an LDAP server
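The query-model point above - filters as plain data structures passed to API methods rather than a separate query language - can be illustrated with a toy in-memory matcher. This only mimics the shape of MongoDB-style filters; it is not the pymongo driver.

```python
# Toy illustration of the document-store query style: the filter is plain
# data ({"age": {"$gt": 40}}) handed to a function, not a string of SQL.

def matches(doc: dict, query: dict) -> bool:
    """Return True if the document satisfies every condition in the query."""
    for field, cond in query.items():
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 40}
            for op, operand in cond.items():
                if op == "$gt" and not doc.get(field, float("-inf")) > operand:
                    return False
                if op == "$lt" and not doc.get(field, float("inf")) < operand:
                    return False
        elif doc.get(field) != cond:  # bare value means simple equality
            return False
    return True

people = [
    {"name": "ada", "age": 36},
    {"name": "grace", "age": 45},
]
adults_over_40 = [d for d in people if matches(d, {"age": {"$gt": 40}})]
```

In the real driver the same filter document would be passed to a collection method such as find(), so queries compose naturally with the host language's data structures.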
The Cassandra database is a shared-nothing architecture, as it has no central controller
and no notion of primary/secondary replicas; all of its nodes are the same.
Its data model can be described as a partitioned row store, in which
data is stored in sparse multidimensional hashtables.
Somewhat confusingly, this type of data model is also frequently referred to as a wide column store.
'Cassandra: A Decentralized Structured Storage System'
by Facebook’s Lakshman and Malik was a central paper on Cassandra. It was initially developed at Facebook to power
the Facebook inbox search feature.
DataStax: in terms of the CAP theorem, Cassandra can be tuned to act more like a CP (consistent and partition tolerant)
or AP (highly available and partition tolerant) system. Note that it is not possible to "tune" Cassandra into
a completely CA system.
Many of the early production deployments of Cassandra involved storing user activity
updates, social network usage, recommendations/reviews, and application
statistics. These are strong use cases for Cassandra because they involve lots of
writing with less predictable read operations, and because updates can occur unevenly
with sudden spikes. The ability to handle application workloads
that require high performance at significant write volumes with many concurrent
client threads is one of the primary features of Cassandra.
- Sparse means that for any given row you can have one or more columns, but each row doesn’t need to have
all the same columns as other rows like it (as in a relational model).
- Partitioned means that each row has a unique key which makes its data accessible, and the
keys are used to distribute the rows across multiple data stores.
- Distributed, wide column store, NoSQL system designed to handle large amounts of data across commodity servers
- All of the nodes in a Cassandra cluster function exactly the same
- Offers support for clusters spanning multiple datacenters, with asynchronous masterless replication
- Open source, distributed, decentralized, elastically scalable,
highly available, fault-tolerant, tuneably consistent, row-oriented database.
Cassandra bases its distribution design on Amazon’s Dynamo and its data model
on Google’s Bigtable, with a query language similar to SQL.
- Uses a peer-to-peer architecture and gossip protocol to maintain and keep in sync a list of nodes
that are alive or dead.
- Clusters, keyspaces and consistency:
- A cluster is a container for keyspaces. A keyspace is the outermost container for
data in Cassandra, corresponding closely to a database in the relational model.
A keyspace is a container for tables.
- Companies that use both microservice architectures and Cassandra at large scale,
such as Netflix, are known to use dedicated clusters per service
- A table is a container for an ordered collection of rows, each of which is itself an
ordered collection of columns. Rows are organized in partitions and assigned to
nodes in a Cassandra cluster according to the column(s) designated as the partition
key. The ordering of data within a partition is determined by the clustering columns.
- When you write data to a table in Cassandra, you specify values for one or more
columns. That collection of values is called a row. You must specify a value for
each of the columns contained in the primary key as those columns taken together
will uniquely identify the row.
- Uses a special type of primary key called a composite key to represent groups of related rows,
also called partitions. The composite key consists of a partition key, plus an optional set of clustering columns.
The partition key determines the nodes on which rows are stored and
can itself consist of multiple columns. The clustering columns are used to control
how data is sorted for storage within a partition.
- Also supports static columns for storing data that is not part of the primary key but is shared by
every row in a partition.
- Trades some consistency in order to achieve total availability.
Cassandra is more accurately termed 'tuneably consistent', which means it allows you
to decide the level of consistency required, in balance with the level of availability.
- Dynamo and Cassandra choose to be always writable, opting to defer the complexity
of reconciliation to read operations, and realize tremendous performance
gains. The alternative is to reject updates amidst network and server failures.
- The replication factor is set per keyspace and lets you decide how much you want to pay in performance
to gain more consistency. You set the replication factor to the number of nodes in
the cluster you want the updates to propagate to
- The consistency level is set by the client on every query and determines how many replicas in the
cluster must acknowledge a write operation or respond to a read operation for it to be considered successful.
- Cassandra provides the ability to achieve strong consistency by specifying sufficiently high consistency levels on
writes and reads. However, strong consistency is not enough to prevent race conditions
in cases where clients need to read, then write data.
Therefore, since Cassandra 2.0 a lightweight transaction (LWT) mechanism is included, based on Paxos
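The "sufficiently high consistency levels" condition above boils down to simple arithmetic: with replication factor N, write consistency W, and read consistency R, any read overlaps the latest write whenever R + W > N. A minimal sketch:

```python
# Sketch of Cassandra-style tunable consistency: reads are guaranteed to
# overlap the most recent write whenever R + W > N, because the set of
# replicas read must then intersect the set of replicas written.

def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """N = replication factor, W = write acks required, R = read responses required."""
    return r + w > n

# RF=3 with QUORUM (2) on both reads and writes: 2 + 2 > 3, strong consistency.
# RF=3 with ONE (1) on both: 1 + 1 <= 3, only eventual consistency.
```

Lowering W or R buys lower latency and higher availability at the cost of possibly stale reads, which is the consistency/availability trade the surrounding notes describe.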
Cache management with atomicity: see Redis, https://en.wikipedia.org/wiki/Redis.
In 2012 two key papers were produced regarding new forms of SQL databases.
These are the Calvin transaction protocol paper from Yale and the Google Spanner paper. FaunaDB is an example of the first,
Google Cloud Spanner, CockroachDB, and YugabyteDB are examples of the second.