What is a Datadot
Datadots, commits, subdots, branches, pushing & pulling.
Want to try the examples here without installing Dotmesh? Try running the commands in our online learning environment.
Datadots
A Datadot allows you to capture your application’s state and treat it like a git repo.
A simple example is to start a PostgreSQL container using a Datadot called myapp:
docker run -d --volume-driver dm \
-v myapp:/var/lib/postgresql/data --name postgres postgres:9.6.6
This creates a datadot called myapp, creates the writeable filesystem for the default master branch in that datadot, mounts that filesystem into /var/lib/postgresql/data in the postgres container, and starts the postgres container.
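To make the link between the dot name and the container's data directory explicit, here is a small Python sketch that assembles the same command as an argument list. The helper name docker_run_with_dot is hypothetical; it only rearranges the flags from the command above and never invokes Docker.

```python
import shlex


def docker_run_with_dot(dot, mount_path, container_name, image):
    """Build the argv for starting a container whose data lives in a dot.

    Hypothetical helper: it only assembles the flags used in the example
    above and does not run anything.
    """
    return [
        "docker", "run", "-d",
        "--volume-driver", "dm",        # volume driver name from the example
        "-v", f"{dot}:{mount_path}",    # <dot name>:<path inside the container>
        "--name", container_name,
        image,
    ]


cmd = docker_run_with_dot(
    "myapp", "/var/lib/postgresql/data", "postgres", "postgres:9.6.6"
)
print(shlex.join(cmd))
```

The part before the colon in the -v argument is the dot name, and the part after it is where the master branch's writeable filesystem appears inside the container.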
To work with the new dot, first switch to it. Like cd'ing into a git repo, this makes it the “current” dot: the dot that later dm commands will operate on by default:
dm switch myapp
You can then inspect the dot in the dm list output:
dm list
  DOT    BRANCH  SERVER   CONTAINERS  SIZE       COMMITS  DIRTY
* myapp  master  a1b2c3d  /postgres   40.82 MiB  0        40.82 MiB
The current branch is shown in the BRANCH column, and the current dot is marked with a * in the dm list output.
For more information on what all of the columns mean, see the CLI Reference.
Commits
You can commit a datadot by running:
dm commit -m "empty state"
This creates a commit: a point-in-time snapshot of the state of the filesystem on the current branch for the current dot.
Suppose PostgreSQL then writes some data to the docker volume. You can then capture this new state in another commit with:
dm commit -m "some data"
There are now two commits: frozen point-in-time snapshots of the master branch, each capturing its state at the moment the commit was made.
You can confirm this in the output of:
dm log
commit 7f8c7cb6-c925-44b4-5a65-bcbf05a1da39
Author: admin
Date: 1517055060834886217
empty state
commit 435a520f-d01e-4bda-70e0-fc42e2043634
Author: admin
Date: 1517055069443640226
some data
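If you want to process this output in a script, the sketch below parses text shaped like the sample above. It rests on two assumptions that the text here does not confirm: that the Date value is nanoseconds since the Unix epoch, and that every line other than the commit, Author and Date lines belongs to the commit message. The function name parse_dm_log is hypothetical.

```python
from datetime import datetime, timezone

# The sample output shown above, used here as input.
SAMPLE_LOG = """\
commit 7f8c7cb6-c925-44b4-5a65-bcbf05a1da39
Author: admin
Date: 1517055060834886217
empty state
commit 435a520f-d01e-4bda-70e0-fc42e2043634
Author: admin
Date: 1517055069443640226
some data
"""


def parse_dm_log(text):
    """Parse log text shaped like the sample above into a list of dicts."""
    commits = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("commit "):
            commits.append({"id": line.split()[1], "message": ""})
        elif not commits:
            continue  # ignore anything before the first commit header
        elif line.startswith("Author:"):
            commits[-1]["author"] = line.split(":", 1)[1].strip()
        elif line.startswith("Date:"):
            # Assumption: the value is nanoseconds since the Unix epoch.
            nanos = int(line.split(":", 1)[1])
            commits[-1]["date"] = datetime.fromtimestamp(
                nanos // 1_000_000_000, tz=timezone.utc
            )
        else:
            # Any other line is treated as part of the commit message.
            previous = commits[-1]["message"]
            commits[-1]["message"] = f"{previous}\n{line}" if previous else line
    return commits


for entry in parse_dm_log(SAMPLE_LOG):
    print(entry["id"][:8], entry["date"].isoformat(), entry["message"])
```

Under the nanosecond assumption, the two sample commits fall nine seconds apart on the same day.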
Consistency
Commits are made immediately and atomically: they are “consistent snapshots” in the sense used in the PostgreSQL documentation.
It’s safe to create a commit while a database is running as long as the database supports recovering from a power outage.
Rollback
Given the example above, you can roll back to the first commit with:
dm reset --hard HEAD^
HEAD^ means “one commit before the latest commit on the current branch”.
You can also run dm log and refer to commits by id.
Note that rolling back stops the containers using a branch before the rollback, and starts them again afterwards. Otherwise, the database would be confused by its data directory changing “under its feet”.
Note also that a rollback is destructive – the commits after the commit that is rolled back to are irretrievably destroyed.
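The rollback semantics described above can be illustrated with a toy in-memory model. This is only a sketch of the behavior, not how Dotmesh is implemented: the class ToyBranch, its commit ids (c1, c2, ...) and the dictionary standing in for the filesystem are all invented for illustration.

```python
class ToyBranch:
    """Toy in-memory model of one branch of a dot (not Dotmesh's implementation)."""

    def __init__(self):
        self.state = {}      # stands in for the branch's writeable filesystem
        self.commits = []    # oldest first
        self._next_id = 1

    def commit(self, message):
        """Freeze a copy of the current state as a new commit."""
        commit_id = f"c{self._next_id}"
        self._next_id += 1
        self.commits.append(
            {"id": commit_id, "message": message, "snapshot": dict(self.state)}
        )
        return commit_id

    def resolve(self, ref):
        """Map 'HEAD^' or a commit id to a position in self.commits."""
        if ref == "HEAD^":
            index = len(self.commits) - 2   # one before the latest commit
        else:
            ids = [c["id"] for c in self.commits]
            index = ids.index(ref) if ref in ids else -1
        if index < 0:
            raise ValueError(f"cannot resolve {ref!r}")
        return index

    def reset_hard(self, ref):
        """Roll back to ref; every later commit is discarded for good."""
        index = self.resolve(ref)
        del self.commits[index + 1:]
        self.state = dict(self.commits[index]["snapshot"])


branch = ToyBranch()
branch.commit("empty state")
branch.state["rows"] = 3          # the database writes some data
branch.commit("some data")
branch.reset_hard("HEAD^")
print([c["message"] for c in branch.commits], branch.state)
```

After the reset, only the "empty state" commit remains and the working state matches it; the "some data" commit cannot be recovered, which mirrors the destructive behavior noted above.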
Subdots
Microservices applications often have more than one stateful component, e.g. databases, caches and queues. A datadot can capture all of those states in a single, atomic and consistent commit.
A datadot should be named after your application, for example myapp.
Assume that your app has an orders service with an orders-db, and a catalog service with a catalog-db.
In this case, good names for your subdots would be myapp.orders-db and myapp.catalog-db.
The . character is used to separate the dot name from the subdot name.
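The naming rule can be expressed as a short Python helper. The function name split_volume_name is hypothetical, and splitting at the first . is an assumption: the text above does not say whether dot or subdot names may themselves contain further . characters.

```python
def split_volume_name(name):
    """Split 'dot.subdot' into (dot, subdot); subdot is None when absent.

    Assumption: the first '.' is the separator.
    """
    dot, sep, subdot = name.partition(".")
    if not dot or (sep and not subdot):
        raise ValueError(f"malformed volume name: {name!r}")
    return dot, (subdot if sep else None)


for volume in ("myapp.orders-db", "myapp.catalog-db", "myapp"):
    print(volume, "->", split_volume_name(volume))
```

A name without a separator, such as the plain myapp used in the PostgreSQL example, is reported here with no subdot.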
Example Docker Compose syntax would be:
version: '3'
services:
  orders-db:
    image: mongo:3.4.10
    hostname: orders-db
    volumes:
      - myapp.orders-db:/data/db
  catalog-db:
    image: mysql:5.6.39
    environment:
      - MYSQL_ROOT_PASSWORD=secret
    hostname: catalog-db
    volumes:
      - myapp.catalog-db:/var/lib/mysql
volumes:
  myapp.orders-db:
    driver: dm
  myapp.catalog-db:
    driver: dm
For more information on using Docker Compose with Dotmesh, see the Docker Compose docs.
Starting the above Docker Compose file would create a dot called myapp containing two subdots, orders-db and catalog-db.
You can think of subdots as different “partitions” of the master branch’s writeable filesystem, in the sense that they divide it up, so that different containers can use different independent parts of it.
Commits and branches of a datadot apply to the entire datadot, not specific subdots. This means that your datadot commits can represent snapshots of the state of your entire application, not the individual data services, like this:
dm switch myapp
dm commit -m "two empty dbs"
After the app has written some data to both databases, you can capture them together atomically:
dm commit -m "data in two dbs"
The resulting dot has two commits on the master branch, each capturing both subdots.
See the subdots tutorial for a more complete example.
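As a rough mental model of one commit covering every subdot, consider the toy Python sketch below. The class ToyDot and the dictionaries standing in for database contents are invented for illustration and say nothing about how Dotmesh stores data.

```python
import copy


class ToyDot:
    """Toy model of a dot whose commits snapshot all subdots together."""

    def __init__(self, name, subdot_names):
        self.name = name
        # One working state per subdot, e.g. one per database.
        self.subdots = {subdot: {} for subdot in subdot_names}
        self.commits = []

    def commit(self, message):
        # A single commit captures every subdot at the same moment.
        self.commits.append(
            {"message": message, "snapshot": copy.deepcopy(self.subdots)}
        )


dot = ToyDot("myapp", ["orders-db", "catalog-db"])
dot.commit("two empty dbs")

# The app writes to both databases ...
dot.subdots["orders-db"]["order-1"] = "pending"
dot.subdots["catalog-db"]["sku-9"] = "socks"

# ... and one commit records both changes together.
dot.commit("data in two dbs")

for entry in dot.commits:
    print(entry["message"], entry["snapshot"])
```

There is no per-subdot commit method in this model on purpose: the unit of history is the whole dot.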
Branches
Just as in git, you can make a branch from a commit on a datadot.
You can create a branch and check it out at the same time with:
dm checkout -b bug-16637
Then suppose you make some changes to the current dot by interacting with the app, which modifies its databases. You can then capture these changes:
dm commit -m "Reproducer for bug 16637"
Finally, you can go back to the original master branch:
dm checkout master
When switching branches on a dot, containers that are using the dot are stopped, the branch is switched out underneath them, and then the containers are started again.
If you want to disable the container stopping and starting behavior, you can pin a branch for a mount by giving the dot name as dot@branch rather than just dot.
The following commands:
dm commit -m "A"
dm commit -m "B"
dm checkout -b newbranch
dm commit -m "C"
dm checkout master
dm commit -m "D"
dm checkout newbranch
dm commit -m "E"
Would create a dot in which master holds commits A, B and D, while newbranch starts from B and adds commits C and E.
Note that the postgres container in this example is using the writeable filesystem of newbranch, because newbranch was the last branch checked out and is therefore the current branch.
Running a further dm checkout master would switch the postgres container over to the master branch.
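The history produced by the command sequence above can be replayed with a toy model. This is a sketch under stated assumptions: the class ToyHistory and its methods are invented for illustration, commits are identified by their messages, and a new branch is assumed to start from the latest commit of the branch that was current when it was created.

```python
class ToyHistory:
    """Toy model of commits and branches on a single dot."""

    def __init__(self):
        self.parent = {}                # commit -> parent commit (None for the first)
        self.heads = {"master": None}   # branch -> latest commit on that branch
        self.current = "master"

    def commit(self, name):
        self.parent[name] = self.heads[self.current]
        self.heads[self.current] = name

    def checkout(self, branch, create=False):
        if create:
            # Assumption: the new branch starts at the current branch's head.
            self.heads[branch] = self.heads[self.current]
        elif branch not in self.heads:
            raise KeyError(f"no such branch: {branch}")
        self.current = branch

    def history(self, branch):
        """Commits reachable from the branch head, oldest first."""
        chain, commit = [], self.heads[branch]
        while commit is not None:
            chain.append(commit)
            commit = self.parent[commit]
        return chain[::-1]


dot = ToyHistory()
dot.commit("A")
dot.commit("B")
dot.checkout("newbranch", create=True)
dot.commit("C")
dot.checkout("master")
dot.commit("D")
dot.checkout("newbranch")
dot.commit("E")

print("master:   ", dot.history("master"))
print("newbranch:", dot.history("newbranch"))
print("current:  ", dot.current)
```

In this model the current branch ends up as newbranch, matching the note above about which writeable filesystem the postgres container is using.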
Branches work just fine with subdots too. In that case, each writeable filesystem and each commit simply contains multiple data stores.
When multiple data stores are captured in a commit, the commit is atomic across all of them.
Pushing
You can get more out of Dotmesh by sharing your dots with others, whether other users or systems such as CI. To facilitate this sharing, you can push the commits on a branch to a hub.
To do this, we suggest installing Dotmesh on a server that is accessible to everyone who needs it, and using that server as the hub.
If your username is alice and you want to make commit E on newbranch from the example above available to others, first log in to the hub:
dm remote add hub alice@<server-hostname>
Then push the branch to the hub:
dm push hub myapp newbranch
This will push all the commits necessary to get the latest commit on newbranch up to the hub, including commits on branches that a non-master branch depends on.
B is the base commit for branch newbranch, so the commits on the master branch up to and including B are pushed first; then commits C and E are transferred to bring the hub up to date with the latest commit on newbranch.
Only the differences between one commit and the next are sent in a push or a clone/pull – so it’s as efficient as possible.
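Which commits a push has to send can be worked out by walking back from the branch head. The Python sketch below does this for the A to E example. The PARENT mapping encodes that example's history, the function name commits_to_push is hypothetical, and the already_on_hub parameter reflects an assumption that commits the hub already holds are not sent again.

```python
# Parent of each commit in the example: master is A, B, D; newbranch is B, C, E.
PARENT = {"A": None, "B": "A", "C": "B", "D": "B", "E": "C"}


def commits_to_push(head, parent, already_on_hub=frozenset()):
    """Return the commits needed to bring the hub up to `head`, oldest first."""
    missing = []
    commit = head
    # Walk towards the root until we reach something the hub already has.
    while commit is not None and commit not in already_on_hub:
        missing.append(commit)
        commit = parent[commit]
    missing.reverse()
    return missing


# First push of newbranch to an empty hub.
print(commits_to_push("E", PARENT))

# Assumed case: master up to B was pushed earlier, so only C and E remain.
print(commits_to_push("E", PARENT, already_on_hub={"A", "B"}))
```

Commit D never appears in either result, because it sits on master after the point where newbranch was created.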
Cloning & pulling
The opposite of pushing is cloning & pulling. Cloning is for the first time you pull down a dot. Pulling is for updating it later with more commits.
If you, bob, have collaborator access to a colleague alice’s dot myapp, you can clone it with:
dm clone hub alice/myapp newbranch
Later, if alice pushes new commits, you can pull them into your local myapp dot with:
dm pull hub myapp
Note that when pulling, you give the local name myapp.
You can see how the default upstream dot is configured by running:
dm dot show myapp
For more details, see the CLI reference.
Further reading
- See also: Hello Dotmesh Tutorial
- See also: Docker Compose Guide