Database
Relational DB and SQL
Training
ISO standards of SQL
SQL originated at IBM in the 1970s and combines DML (data manipulation) and DDL (data definition). It was first standardized by ANSI in 1986 and by ISO in 1987, with major revisions since:
- SQL-86/SQL-87, the first ANSI/ISO standards
- SQL-92
- SQL:1999
- SQL:2003
- SQL:2011 with temporal and bitemporal tables for time-versioning
Transaction and ACID properties
A transaction is 'a transformation of state' that has the ACID properties (Atomicity, Consistency, Isolation, Durability),
a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc.
- Atomic means 'all or nothing': when a transaction executes, every
update within it must succeed for the transaction as a whole to succeed.
There is no partial failure where one update succeeds and another related
update fails. The classic example is a monetary transfer between accounts.
- Consistent means that data moves from one correct state to another correct
state, with no possibility that readers could view different values that don’t
make sense together.
- Isolated means that transactions executing concurrently will not become entangled
with each other; they each execute in their own space. That is, if two
different transactions attempt to modify the same data at the same time, then
one of them will have to wait for the other to complete.
- Durable means that once a transaction has succeeded, the changes will not be lost. This doesn’t
imply another transaction won’t later modify the same data; it just means that
writers can be confident that the changes are available for the next transaction.
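The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module; the table, account names, and overdraft rule below are illustrative, not from the source.

```python
# A minimal sketch of atomicity: the monetary transfer either fully
# commits or fully rolls back, never leaving one account half-updated.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; roll back on any failure."""
    try:
        with conn:  # the `with` block wraps the statements in one transaction
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            # enforce a business rule: no overdrafts
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False  # the `with` block already rolled back the partial update

transfer(conn, "alice", "bob", 30)   # succeeds: balances become 70 / 80
transfer(conn, "alice", "bob", 500)  # fails: the debit is rolled back
```

Note how the failed second transfer leaves both balances exactly as the first transfer left them; the debit that had already been applied is undone, which is the 'no partial failure' guarantee.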
Scalability and performance
- Two-phase commit - blocking; see Gregor Hohpe's blog entry “Starbucks Does Not Use Two-Phase Commit”.
- Compensation - the operation is committed immediately, and if an error is later reported,
a new operation is invoked to restore the proper state
- ORM - Object-Relational Mapping, such as Hibernate
- Sharding - split the data horizontally so that, instead of hosting all of
it on a single server or replicating all of it to every server in a cluster,
each portion of the data is hosted separately.
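A minimal sketch of hash-based sharding, with in-memory dicts standing in for separate servers (the shard count and key names are illustrative):

```python
# Route each key to one of N shards so no single server holds all the data.
import hashlib

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]  # stand-ins for separate servers

def shard_for(key: str) -> int:
    # Use a stable hash (not Python's randomized hash()) so routing is
    # consistent across processes and restarts.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

for user_id in ("u1", "u2", "u3", "u4", "u5"):
    put(user_id, {"name": user_id})
```

One caveat worth noting: with naive mod-N routing, changing the number of shards remaps most keys, which is why production systems often use consistent hashing or token rings instead.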
Graph databases
Refer to semantic web.
Databases & Object Persistence
Distributed databases
- OpenGroup - Distributed Transaction Processing: The XA Specification
- The goal of XA is to guarantee atomicity in "global transactions" that are executed across heterogeneous components.
A transaction is a unit of work such as transferring money from one person to another.
Distributed transactions update multiple data stores (such as databases, application servers, message queues, transactional caches, etc.)
To guarantee integrity, XA uses a two-phase commit to ensure that all of a transaction's changes either take effect (commit) or do not (roll back), i.e., atomically.
Refer also to crypto-applications and blockchain.
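The two-phase protocol XA relies on can be sketched as a toy coordinator (this is the shape of the protocol, not the real XA API; class and method names are illustrative):

```python
# Phase 1 asks every participant to prepare (vote); phase 2 commits only if
# all voted yes, otherwise every participant rolls back. Participants hold
# locks while prepared, which is why 2PC is described as blocking.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.state = "initial"

    def prepare(self):            # phase 1: vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):             # phase 2a: make the change durable
        self.state = "committed"

    def rollback(self):           # phase 2b: undo the prepared work
        self.state = "aborted"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"

ok = two_phase_commit([Participant("db"), Participant("queue")])
bad = two_phase_commit([Participant("db"), Participant("queue", can_commit=False)])
```

Either all participants end up committed or none do, which is exactly the global atomicity guarantee described above.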
CAP theorem
The CAP theorem (also named Brewer's theorem) states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
- Consistency: Every read receives the most recent write or an error
- Availability: Every request receives a (non-error) response – without the guarantee that it contains the most recent write
- Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
In particular, the CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability.
Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions.
No distributed system is safe from network failures, thus network partitioning generally has to be tolerated.
In the presence of a partition, one is then left with two options: consistency or availability.
When choosing consistency over availability, the system will return an error or a time-out if particular information cannot be guaranteed to be up to date due to network partitioning.
When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.
In the absence of network failure – that is, when the distributed system is running normally – both availability and consistency can be satisfied.
CAP is frequently misunderstood as if one has to choose to abandon one of the three guarantees at all times. In fact, the choice is really between consistency and availability only when a network partition or failure happens; at all other times, no trade-off has to be made.
Database systems designed with traditional ACID guarantees in mind such as RDBMS choose consistency over availability, whereas systems designed around the BASE (Basically Available, Soft state, Eventual consistency) philosophy,
common in the NoSQL movement for example, choose availability over consistency.
Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
Eventual consistency, also called optimistic replication, is widely deployed in distributed systems, and has origins in early mobile computing projects.
A system that has achieved eventual consistency is often said to have converged, or achieved replica convergence.
Eventual consistency is a weak guarantee: most stronger models, such as linearizability, are trivially eventually consistent, but a system that is merely eventually consistent does not usually fulfill these stronger constraints.
Eventually-consistent services are often classified as providing BASE semantics, in contrast to traditional ACID (Atomicity, Consistency, Isolation, Durability) guarantees.
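The informal guarantee can be sketched with a toy last-writer-wins replica pair; real systems also need anti-entropy scheduling, tie-breaking, and tombstones, all omitted here:

```python
# Replicas accept writes independently and converge once they exchange
# their latest (timestamp, value) pairs.
class Replica:
    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, ts):
        current = self.store.get(key, (-1, None))
        if ts > current[0]:           # last writer wins
            self.store[key] = (ts, value)

    def sync_from(self, other):
        for key, (ts, value) in other.store.items():
            self.write(key, value, ts)

a, b = Replica(), Replica()
a.write("x", "v1", ts=1)   # a client writes to replica a
b.write("x", "v2", ts=2)   # a later write lands on replica b

# Before anti-entropy, the replicas disagree ...
divergent = a.store["x"] != b.store["x"]

# ... after exchanging state in both directions, they converge on the
# last updated value: replica convergence.
a.sync_from(b)
b.sync_from(a)
converged = a.store["x"] == b.store["x"] == (2, "v2")
```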
The CAP theorem was formally proved by Seth Gilbert and Nancy Lynch of MIT in 2002.
In distributed systems, however, it is very likely that there will be network partitioning, and that at some point,
machines will fail and cause others to become unreachable. Networking issues such as packet loss or
high latency are nearly inevitable and have the potential to cause temporary partitions.
This leads to the conclusion that a distributed system must do its best to
continue operating in the face of network partitions (i.e. be partition tolerant),
leaving only two real options to compromise on: availability and consistency.
In 2012 Brewer provided an update of the CAP perspective, describing the 'pick two' formulation as somewhat misleading,
due to advances in techniques for handling and recovering from partitions.
NoSQL
The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) are different
from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve.
Good intro at http://www.nosql-database.org/.
Key-value stores, column stores, document stores, graph databases, object databases, XML databases, multi-model databases
- Key-value stores: data items are keys that have a set of attributes. All data relevant to a key is stored with the key; data is frequently duplicated.
Popular key-value stores include Amazon's DynamoDB, Riak, and Voldemort. Additionally, many popular caching technologies act as key-value
stores, including Oracle Coherence, Redis, and Memcached.
- Column stores such as Google’s Bigtable served as the inspiration for implementations including Cassandra, Accumulo, and Apache Hadoop’s HBase.
- Document stores use the complete document as the basic unit of storage, often in a format such as JSON, XML, or YAML. Popular document stores include MongoDB and CouchDB.
- Graph databases represent data as a graph—a network of nodes and edges that connect the nodes. Both nodes and edges can have properties. Because they give heightened importance to relationships, graph databases such as Neo4J, JanusGraph, and DataStax Graph have proven popular for building social networking and semantic web applications.
- Object databases store data in terms of objects as understood from the discipline of object-oriented programming. This makes it straightforward to use these databases from object-oriented applications. Object databases such as db4o and InterSystems Caché allow you to avoid techniques like stored procedures and object-relational mapping (ORM) tools.
- XML databases are a special form of document databases, optimized for working with data in the eXtensible Markup Language
(XML). They include BaseX and eXist.
- Multi-model databases support more than one of the above styles, based on a primary underlying database (most often a relational, key-value or column store) and exposing additional models as APIs on top of that underlying database. Examples include Microsoft Azure Cosmos DB,
which exposes document, wide column and graph APIs on top of a key-value store, and DataStax Enterprise, which offers a graph API on top of Cassandra’s wide column model.
Sharding and shared-nothing architecture
Sharding could be termed a kind of shared-nothing architecture that’s specific to databases. A shared-nothing architecture is one in which there is no centralized (shared) state, but each node in a distributed system is independent, so there is no client contention for shared resources. The term was coined by Michael Stonebraker at the University of California at Berkeley in his 1986 paper “The Case for Shared Nothing.”
Shared-nothing architecture was popularised by Google's Bigtable database and its MapReduce implementation that do not share state, and are therefore capable of near-infinite scaling.
Lucene and Solr
- Lucene - for full text indexing and searching
- Wikipedia - Lucene
- Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking,
hit highlighting and analysis/tokenization capabilities
- Also recognized for its utility in the implementation of Internet search engines and local, single-site searching
- In 2010, the Apache Solr search server joined as a Lucene sub-project, merging the developer communities.
- Solr - search server built using Lucene Core
Google's LevelDB
- LevelDB - fast key-value storage library written at Google that provides an ordered mapping from string keys to string values
Apache CouchDB
Couch is an acronym for 'Cluster Of Unreliable Commodity Hardware'. It was first released in 2005 and became an Apache Software Foundation project in 2008.
Apache CouchDB is an open-source document-oriented NoSQL database, implemented in Erlang. It uses JSON to store data, JavaScript as its query language using MapReduce, and HTTP for an API.
CouchDB differs from many other stores by accepting eventual consistency, rather than putting absolute consistency ahead of raw availability as RDBMSs or Paxos-based systems do.
Unlike a relational database, a CouchDB database does not store data and relationships in tables. Instead, each database is a collection of independent documents. Each document maintains its own data and self-contained schema. An application may access multiple databases, such as one stored on a user's mobile phone and another on a server. Document metadata contains revision information, making it possible to merge any differences that may have occurred while the databases were disconnected.
At the heart of CouchDB is a powerful B-tree storage engine. A B-tree is a sorted data structure that allows for searches, insertions, and deletions in logarithmic time. CouchDB uses MapReduce to compute the results of a view. MapReduce makes use of two functions, “map” and “reduce”, which are applied to each document in isolation. Being able to isolate these operations means that view computation lends itself to parallel and incremental computation. More important, because these functions produce key/value pairs, CouchDB is able to insert them into the B-tree storage engine, sorted by key.
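A toy version of such a view computation, in the CouchDB spirit (the function and field names are illustrative, not CouchDB's actual API):

```python
# A user-supplied map function is applied to each document in isolation,
# emitting key/value pairs; the pairs are sorted by key and fed through a
# reduce function, mirroring how view results land in the B-tree.
docs = [
    {"_id": "1", "type": "post", "author": "ann", "words": 120},
    {"_id": "2", "type": "post", "author": "bob", "words": 80},
    {"_id": "3", "type": "post", "author": "ann", "words": 200},
]

def map_fn(doc):
    # emit one (key, value) pair per document, in isolation
    yield (doc["author"], doc["words"])

def reduce_fn(values):
    return sum(values)

def build_view(docs, map_fn, reduce_fn):
    pairs = sorted(kv for doc in docs for kv in map_fn(doc))  # sorted by key
    view = {}
    for key, value in pairs:
        view.setdefault(key, []).append(value)
    return {key: reduce_fn(vals) for key, vals in view.items()}

view = build_view(docs, map_fn, reduce_fn)  # total words per author
```

Because each document is mapped in isolation, only changed documents need to be re-mapped, which is what makes the computation incremental and parallelizable.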
Documents in CouchDB are versioned, and if you want to change a value in a document, you create a new version of that document so you end up with two versions of the same document, one old and one new. How does this offer an improvement over locks? Consider a set of requests wanting to access a document. The first request reads the document. While this is being processed, a second request changes the document. Since the second request includes a completely new version of the document, CouchDB can simply append it to the database without having to wait for the read request to finish. When a third request wants to read the same document, CouchDB will point it to the new version that has just been written. During this whole process, the first request could still be reading the original version. A read request will always see the most recent snapshot of your database at the time of the beginning of the request.
CouchDB implements a form of multiversion concurrency control (MVCC) so it does not lock the database file during writes. Conflicts are left to the application to resolve. Resolving a conflict generally involves first merging data into one of the documents, then deleting the stale one.
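The append-only versioning just described can be sketched as a toy (this is not CouchDB's storage format; class and method names are illustrative):

```python
# Writes append a new revision instead of updating in place, so a reader
# keeps seeing the snapshot that was current when its request began.
class VersionedDoc:
    def __init__(self, value):
        self.revisions = [value]      # revision history, oldest first

    def latest_rev(self):
        return len(self.revisions) - 1

    def read(self, rev=None):
        # a reader pins a revision number at the start of its request
        return self.revisions[self.latest_rev() if rev is None else rev]

    def write(self, value):
        self.revisions.append(value)  # append-only: never blocks readers

doc = VersionedDoc({"title": "draft"})
snapshot = doc.latest_rev()           # request 1 starts reading rev 0
doc.write({"title": "final"})         # request 2 writes a new revision
old = doc.read(snapshot)              # request 1 still sees the draft
new = doc.read()                      # request 3 sees the new version
```

The write never had to wait for the in-flight read, which is the MVCC advantage over locking.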
Other features include document-level ACID semantics with eventual consistency, (incremental) MapReduce, and (incremental) replication. One of CouchDB's distinguishing features is multi-master replication, which allows it to scale across machines to build high-performance systems.
What happens when you change the same document in two different databases and want to synchronize these with each other? CouchDB’s replication system comes with automatic conflict detection and resolution. When CouchDB detects that a document has been changed in both databases, it flags this document as being in conflict. When two versions of a document conflict during replication, the winning version is saved as the most recent version in the document’s history. Instead of throwing the losing version away, CouchDB saves this as a previous version in the document’s history, so that you can access it if you need to. This happens automatically and consistently, so both databases will make exactly the same choice. It is up to you to handle conflicts in a way that makes sense for your application. You can leave the chosen document versions in place, revert to the older version, or try to merge the two versions and save the result.
- CouchDB - Apache - document oriented
MongoDB
- MongoDB - document oriented
- Stores data in a BSON (Binary JSON) representation which extends JSON (JavaScript Object Notation) to include additional types
such as int, long, date, floating point, and decimal128
- Mongo’s 'replica set' is a primary/secondary scheme in which the primary can be replaced by an automated leader
election process
- A fundamental difference from relational databases is that the MongoDB query model is implemented as methods or functions within the API of a specific programming language, as opposed to a completely separate language like SQL
- Mongo shell is a rich, interactive JavaScript shell included with all MongoDB distributions
- MongoDB Compass is a GUI
- Cloud version is Mongo DB Atlas
- Ships with four storage engines, all of which can coexist within a single MongoDB replica set
- the default WiredTiger storage engine
- the encrypted storage engine, which requires MongoDB Enterprise Advanced
- the in-memory storage engine for real-time analytics, which requires MongoDB Enterprise Advanced
- the MMAPv1 engine, an improved version of the storage engine used in pre-3.x MongoDB releases
- for authentication MongoDB offers integration with LDAP, Windows Active Directory, Kerberos, and x.509 certificates
- for authorisation roles can be defined in MongoDB, or centrally within an LDAP server
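To illustrate the query-model point above, a toy in-memory collection exposing a MongoDB-style find() method with a filter document (this is a stand-in written for this note, not the real pymongo API):

```python
# Instead of composing a query string (as in SQL), the client calls a
# method and passes a filter expressed as a plain data structure.
class Collection:
    def __init__(self, docs):
        self.docs = docs

    def find(self, filter_doc):
        # match documents whose fields equal every key/value in the filter
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in filter_doc.items())]

users = Collection([
    {"name": "ann", "city": "Oslo"},
    {"name": "bob", "city": "Bergen"},
    {"name": "cat", "city": "Oslo"},
])

# SQL equivalent: SELECT * FROM users WHERE city = 'Oslo'
oslo = users.find({"city": "Oslo"})
```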
Apache Cassandra
Cassandra basics
The Cassandra database is a shared-nothing architecture, as it has no central controller
and no notion of primary/secondary replicas; all of its nodes are the same.
Its data model can be described as a partitioned row store, in which
data is stored in sparse multidimensional hashtables.
- Sparse means that for any given row you can have one or more columns, but each row doesn’t need to have
all the same columns as other rows like it (as in a relational model).
- Partitioned means that each row has a unique key which makes its data accessible, and the
keys are used to distribute the rows across multiple data stores.
Somewhat confusingly, this type of data model is also frequently referred to as a wide column store.
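The sparse, partitioned model above can be sketched with plain dictionaries (the keys and column names are illustrative):

```python
# Rows live in partitions keyed by a partition key; each row is a sparse
# set of columns, and rows within a partition are ordered by a clustering
# key, mirroring the partitioned row store described above.
from collections import defaultdict

table = defaultdict(dict)  # partition key -> {clustering key -> columns}

def insert(partition_key, clustering_key, columns):
    table[partition_key][clustering_key] = columns

# Rows in the same partition may have different columns (sparseness):
insert("user:1", "2024-01-01", {"action": "login"})
insert("user:1", "2024-01-02", {"action": "post", "length": 140})
insert("user:2", "2024-01-01", {"action": "login", "device": "mobile"})

def read_partition(partition_key):
    # within a partition, rows come back ordered by the clustering key
    part = table[partition_key]
    return [(ck, part[ck]) for ck in sorted(part)]

rows = read_partition("user:1")
```

In a real cluster the partition key would also determine which nodes store each partition; here everything sits in one process.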
'Cassandra: A Decentralized Structured Storage System'
by Facebook’s Lakshman and Malik was the seminal paper on Cassandra. It was initially developed at Facebook to power
the Facebook inbox search feature.
DataStax: in terms of the CAP theorem, Cassandra can be tuned to act more like a CP (consistent and partition tolerant)
or AP (highly available and partition tolerant) system. Note that it is not possible to "tune" Cassandra into
a completely CA system.
Many of the early production deployments of Cassandra involved storing user activity
updates, social network usage, recommendations/reviews, and application
statistics. These are strong use cases for Cassandra because they involve lots of
writing with less predictable read operations, and because updates can occur unevenly
with sudden spikes. The ability to handle application workloads
that require high performance at significant write volumes with many concurrent
client threads is one of the primary features of Cassandra.
Cassandra technicalities
- Distributed, wide column store, NoSQL system designed to handle large amounts of data across commodity servers
- All of the nodes in a Cassandra cluster function exactly the same
- Offers support for clusters spanning multiple datacenters, with asynchronous masterless replication
- Open source, distributed, decentralized, elastically scalable,
highly available, fault-tolerant, tuneably consistent, row-oriented database.
Cassandra bases its distribution design on Amazon’s Dynamo and its data model
on Google’s Bigtable, with a query language similar to SQL.
- Uses a peer-to-peer architecture and gossip protocol to maintain and keep in sync a list of nodes
that are alive or dead.
- Clusters, keyspaces, and consistency:
- A cluster is a container for keyspaces. A keyspace is the outermost container for
data in Cassandra, corresponding closely to a database in the relational model.
A keyspace is a container for tables.
- Companies that use both microservice architectures and Cassandra at large scale,
such as Netflix, are known to use dedicated clusters per service
- A table is a container for an ordered collection of rows, each of which is itself an
ordered collection of columns. Rows are organized in partitions and assigned to
nodes in a Cassandra cluster according to the column(s) designated as the partition
key. The ordering of data within a partition is determined by the clustering
columns.
- When you write data to a table in Cassandra, you specify values for one or more
columns. That collection of values is called a row. You must specify a value for
each of the columns contained in the primary key as those columns taken together
will uniquely identify the row.
- Uses a special type of primary key called a composite key to represent groups of related rows,
also called partitions. The composite key consists of a partition key, plus an optional set of clustering columns.
The partition key determines the nodes on which rows are stored and
can itself consist of multiple columns. The clustering columns are used to control
how data is sorted for storage within a partition.
- Also supports static columns for storing data that is not part of the primary key but is shared by
every row in a partition.
- Trades some consistency in order to achieve total availability.
Cassandra is more accurately termed 'tuneably consistent', which means it allows you
to decide the level of consistency you require, in balance with the level of availability.
- Dynamo and Cassandra choose to be always writable, opting to defer the complexity
of reconciliation to read operations, and realize tremendous performance
gains. The alternative is to reject updates amidst network and server failures.
- The replication factor is set per keyspace and lets you decide how much you want to pay in performance
to gain more consistency. You set the replication factor to the number of nodes in
the cluster you want updates to propagate to.
- The consistency level is set by the client on every query and lets the client decide how many replicas in the
cluster must acknowledge a write operation or respond to a read operation for it to be considered successful.
- Cassandra provides the ability to achieve strong consistency by specifying sufficiently high consistency levels on
writes and reads. However, strong consistency is not enough to prevent race conditions
in cases where clients need to read, then write data.
Therefore, since Cassandra 2.0 a lightweight transaction (LWT) mechanism has been included, based on Paxos.
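The strong-consistency condition mentioned above can be stated numerically: with replication factor RF, a write acknowledged by W replicas and a read answered by R replicas must overlap on at least one up-to-date replica, which holds whenever R + W > RF. A minimal sketch (the function name is illustrative):

```python
# Check whether a given write/read consistency-level pair guarantees that
# reads overlap the most recent write: the quorum condition R + W > RF.
def is_strongly_consistent(rf: int, write_level: int, read_level: int) -> bool:
    return read_level + write_level > rf

RF = 3
quorum = RF // 2 + 1   # 2 of 3 replicas

# QUORUM writes + QUORUM reads overlap, so reads see the latest write:
strong = is_strongly_consistent(RF, quorum, quorum)
# ONE + ONE trades that guarantee away for lower latency and availability:
weak = is_strongly_consistent(RF, 1, 1)
```

Note that, as the surrounding text says, even this overlap guarantee does not prevent read-then-write race conditions, which is what LWTs address.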
Further
MarkLogic
Cache management with atomicity: see Redis, https://en.wikipedia.org/wiki/Redis.
ElasticSearch
Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java.
NewSQL
In 2012 two key papers were published on new forms of distributed SQL databases:
the Calvin transaction protocol paper from Yale and the Google Spanner paper. FaunaDB is an example of the first approach;
Google Cloud Spanner, CockroachDB, and YugabyteDB are examples of the second.