Concepts

What is a Datadot

Datadots, commits, subdots, branches, pushing & pulling.

Want to try the examples here without installing Dotmesh? Try running the commands in our online learning environment.

Datadots

A Datadot allows you to capture your application’s state and treat it like a git repo.

A simple example is to start a PostgreSQL container using a Datadot called myapp:

docker run -d --volume-driver dm \
    -v myapp:/var/lib/postgresql/data --name postgres postgres:9.6.6

This command does four things: it creates a datadot called myapp, creates the writeable filesystem for that dot’s default master branch, mounts that filesystem at /var/lib/postgresql/data in the postgres container, and starts the postgres container, like this:

myapp dot with master branch and postgres container's /data volume attached

Next, switch to the dot. Like cd’ing into a git repo, this makes it the “current” dot, the one that later dm commands operate on by default:

dm switch myapp

You can then list your dots:

dm list
  DOT      BRANCH  SERVER   CONTAINERS  SIZE       COMMITS  DIRTY
* myapp    master  a1b2c3d  /postgres   40.82 MiB  0        40.82 MiB

The current branch is shown in the BRANCH column and the current dot is marked with a * in the dm list output.

For more information on what all of the columns mean, see the CLI Reference.

Commits

You can commit a datadot by running:

dm commit -m "empty state"

This creates a commit: a point-in-time snapshot of the state of the filesystem on the current branch for the current dot.

Suppose PostgreSQL then writes some data to the docker volume. You can then capture this new state in another commit with:

dm commit -m "some data"

There are now two commits on the master branch. Each is a frozen, point-in-time snapshot of the branch’s state at the moment the commit was made:

two commits on the master branch

You can confirm this in the output of:

dm log
commit 7f8c7cb6-c925-44b4-5a65-bcbf05a1da39
Author: admin
Date: 1517055060834886217

    empty state

commit 435a520f-d01e-4bda-70e0-fc42e2043634
Author: admin
Date: 1517055069443640226

    some data
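
The Date values in this output look like Unix timestamps in nanoseconds. That reading is an assumption based on their magnitude, not something stated here. Under that assumption, a small helper (hypothetical, and relying on GNU date) makes them readable:

```shell
# Convert a nanosecond Unix timestamp to a UTC date string.
# Assumption: the Date field counts nanoseconds since the Unix epoch.
# Requires GNU date, which accepts "@<seconds>" as an argument to -d.
ns_to_utc() {
    secs=$(( $1 / 1000000000 ))   # drop the nanosecond remainder
    date -u -d "@${secs}" '+%Y-%m-%dT%H:%M:%SZ'
}

ns_to_utc 1517055060834886217   # the "empty state" commit
ns_to_utc 1517055069443640226   # the "some data" commit
```

With the two values from the log above, this prints 2018-01-27T12:11:00Z and 2018-01-27T12:11:09Z, so the second commit was made nine seconds after the first.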

Consistency

Commits are made immediately and atomically: they are “consistent snapshots” in the sense used in the PostgreSQL documentation.

It’s safe to create a commit while a database is running as long as the database supports recovering from a power outage.

Rollback

Given the example above, you can roll back to the first commit with:

dm reset --hard HEAD^

HEAD^ means “one commit before the latest commit on the current branch”. You can also run dm log and refer to commits by ID.

Note that rolling back stops the containers using a branch before the rollback, and starts them again afterwards. Otherwise, the database would be confused by its data directory changing “under its feet”.

Note also that a rollback is destructive – the commits after the commit that is rolled back to are irretrievably destroyed.
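
To make the effect of HEAD^ and a hard reset concrete, here is a toy model in plain shell. It does not call dm and says nothing about how Dotmesh stores commits. It treats a branch as an ordered list of (shortened) commit IDs, and the function names are made up for illustration:

```shell
# Toy model only: a branch's history as a space-separated list, oldest first.
log="7f8c7cb6 435a520f"

# Print the commit just before the newest one (what HEAD^ refers to).
head_parent() {
    prev=""; last=""
    for c in $1; do prev="$last"; last="$c"; done
    echo "$prev"
}

# Model a hard reset: keep commits up to and including the target, drop the rest.
reset_hard() {
    kept=""
    for c in $1; do
        kept="${kept:+$kept }$c"
        [ "$c" = "$2" ] && break
    done
    echo "$kept"
}

target=$(head_parent "$log")
log=$(reset_hard "$log" "$target")
echo "HEAD^ = $target; log after reset: $log"
```

In this model the “some data” commit is gone from the list after the reset, which mirrors the destructive behaviour described above.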

Subdots

Microservices applications often have more than one stateful component, e.g. databases, caches and queues. A datadot can capture all of those states in a single, atomic and consistent commit.

A datadot should be named after your application, for example myapp.

Assume that your app has an orders service with an orders-db, and a catalog service with a catalog-db.

In this case, good names for your subdots would be myapp.orders-db and myapp.catalog-db.

The . character is used to separate the dot name from the subdot name.
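
As a rough illustration of the naming convention, the sketch below splits a volume name at the first . into its dot and subdot parts. The helper names are hypothetical, and this is not necessarily how Dotmesh itself parses names:

```shell
# Illustrative only: split a volume name at the first "." into dot and subdot.
dot_of() {
    echo "${1%%.*}"              # text before the first "."
}

subdot_of() {
    case "$1" in
        *.*) echo "${1#*.}" ;;   # text after the first "."
        *)   echo "" ;;          # no subdot given in the name
    esac
}

for name in myapp.orders-db myapp.catalog-db myapp; do
    echo "dot=$(dot_of "$name") subdot=$(subdot_of "$name")"
done
```

Both myapp.orders-db and myapp.catalog-db resolve to the same dot, myapp, which is why a single commit on that dot covers both databases.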

Example Docker Compose syntax would be:

version: '3'
services:
  orders-db:
    image: mongo:3.4.10
    hostname: orders-db
    volumes:
     - myapp.orders-db:/data/db
  catalog-db:
    image: mysql:5.6.39
    environment:
     - MYSQL_ROOT_PASSWORD=secret
    hostname: catalog-db
    volumes:
     - myapp.catalog-db:/var/lib/mysql

volumes:
  myapp.orders-db:
    driver: dm
  myapp.catalog-db:
    driver: dm

For more information on using Docker Compose with Dotmesh, see the Docker Compose docs.

Starting the above Docker Compose file would create a dot with the following structure:

a dot with an orders-db and catalog-db subdots

You can think of subdots as different “partitions” of the master branch’s writeable filesystem, in the sense that they divide it up, so that different containers can use different independent parts of it.

Commits and branches of a datadot apply to the entire datadot, not specific subdots. This means that your datadot commits can represent snapshots of the state of your entire application, not the individual data services, like this:

dm switch myapp
dm commit -m "two empty dbs"

Once the app has written some data to both databases, you can capture them together atomically:

dm commit -m "data in two dbs"

The resulting dot structure is:

a dot with an orders-db and catalog-db subdots showing two commits which capture the entire dot, not the individual subdot - so the commits are of multiple databases simultaneously

See the subdots tutorial for a more complete example.

Branches

Just as in git, you can make a branch from a commit on a datadot. You can create a branch and check it out in one step with:

dm checkout -b bug-16637

Then suppose you make some changes to the current dot by interacting with the app, which modifies its databases. You can then capture these changes:

dm commit -m "Reproducer for bug 16637"

Finally, you can go back to the original master branch:

dm checkout master

When switching branches on a dot, containers that are using the dot are stopped, the branch is switched out underneath them, and then the containers are started again.

If you want to disable the container stopping and starting behavior, you can pin a branch for a mount by specifying dot@branch rather than just dot when specifying the dot name.
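
As a sketch of what pinning might look like, the helper below builds the -v argument in either form and prints a docker run command modelled on the earlier PostgreSQL example. It only prints the command and does not run it. The helper and the container name postgres-pinned are hypothetical:

```shell
# Build the -v argument for a mount, optionally pinned to a branch.
# The dot@branch form comes from the text above; this helper is hypothetical.
volume_arg() {
    dot="$1"; mountpoint="$2"; branch="${3:-}"
    if [ -n "$branch" ]; then
        echo "${dot}@${branch}:${mountpoint}"   # pinned to one branch
    else
        echo "${dot}:${mountpoint}"             # follows the current branch
    fi
}

# Print (rather than run) the docker command, so this is safe to try anywhere.
echo docker run -d --volume-driver dm \
    -v "$(volume_arg myapp /var/lib/postgresql/data newbranch)" \
    --name postgres-pinned postgres:9.6.6
```

The printed command mounts myapp@newbranch instead of myapp, which is the only difference from the unpinned form.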

The following commands:

dm commit -m "A"
dm commit -m "B"
dm checkout -b newbranch
dm commit -m "C"
dm checkout master
dm commit -m "D"
dm checkout newbranch
dm commit -m "E"

Would create the following dot structure:

a dot with commits A and B on master, a branch newbranch from B going to C and E, and a later commit (on the other side of the fork) D on master. two writeable filesystems, and the postgres container using the writeable filesystem of newbranch

Note that the postgres container in this example is using the writeable filesystem of newbranch – that is because at the end of the commands newbranch was the current branch, the latest one that was checked out. Running a further dm checkout master would switch the postgres container over to the master branch.
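
The structure above can be modelled as a map from each commit to its parent. The following sketch is a toy model in plain shell, not Dotmesh’s implementation. It walks that map to list the history behind each branch’s latest commit:

```shell
# Toy model of the commit graph above: each commit's parent.
parent_of() {
    case "$1" in
        B)   echo A ;;
        C|D) echo B ;;   # the fork: C starts newbranch, D continues master
        E)   echo C ;;
        *)   echo "" ;;  # A has no parent
    esac
}

# Walk parents from a branch's latest commit back to the root; print oldest first.
branch_history() {
    commit="$1"; chain=""
    while [ -n "$commit" ]; do
        chain="$commit${chain:+ $chain}"
        commit=$(parent_of "$commit")
    done
    echo "$chain"
}

echo "master:    $(branch_history D)"
echo "newbranch: $(branch_history E)"
```

This prints A B D for master and A B C E for newbranch: the two branches share A and B and diverge after the fork at B.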

Branches work just fine with subdots too:

the same branching structure as above, but this time with subdots - each writeable filesystem and commit now has two databases in it

In that case, each writeable filesystem and each commit simply contains multiple data stores.

When multiple data stores are captured in a commit, the commit is atomic across all of them.

Pushing

You can get more out of dotmesh by sharing your dots with others – either other users, or systems like a CI system. In order to facilitate this sharing, you can push the commits on a branch to a hub.

To do this, we suggest installing dotmesh on a server that everyone who needs the dots can reach, and using that server as the hub.

If your username is alice and you want to make commit E on newbranch from the example above available to others, first log into the hub:

dm remote add hub alice@<server-hostname>

Then push the branch to the hub:

dm push hub myapp newbranch

This will push all the commits (including commits on branches that a non-master branch depends on) necessary to get the latest commit on newbranch up to the hub:

pushing newbranch to a hub, showing that commits A, B, C and E are transferred to the hub

B is the base commit for branch newbranch, so, first the commits on the master branch up to and including B are pushed, then commits C and E are transferred to get the hub up to date with the latest commit on newbranch.

Only the differences between one commit and the next are sent in a push or a clone/pull, which keeps transfers small.
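
The selection of commits for a push can be sketched with the same kind of toy model: walk back from the branch’s latest commit and stop at the first commit the hub already holds. This is a simplified illustration under that assumption, not a description of Dotmesh’s transfer protocol, and the function names are hypothetical:

```shell
# Toy model of the commit graph from the branching example: each commit's parent.
parent_of() {
    case "$1" in
        B)   echo A ;;
        C|D) echo B ;;
        E)   echo C ;;
        *)   echo "" ;;  # A has no parent
    esac
}

# List the commits leading to $1 (oldest first) that are missing from $2,
# a space-separated list of commits the hub already holds.
commits_to_send() {
    commit="$1"; have=" $2 "; send=""
    while [ -n "$commit" ]; do
        case "$have" in
            *" $commit "*) break ;;   # hub has this commit and its ancestors
        esac
        send="$commit${send:+ $send}"
        commit=$(parent_of "$commit")
    done
    echo "$send"
}

echo "first push of newbranch: $(commits_to_send E "")"
echo "hub already has master:  $(commits_to_send E "A B D")"
```

An empty hub needs A B C E, matching the diagram above. If the hub already had master (A, B and D), only C and E would be sent.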

Cloning & pulling

The opposite of pushing is cloning & pulling. Cloning is for the first time you pull down a dot. Pulling is for updating it later with more commits.

If you, bob, have collaborator access to a colleague alice’s dot myapp, you can clone it with:

dm clone hub alice/myapp newbranch

Later, if alice pushes new commits, you can pull them into your local myapp dot with:

dm pull hub myapp

Note that when pulling, you give the local name myapp. You can see how the default upstream dot is configured by running:

dm dot show myapp

For more details, see the CLI reference.

Further reading