
Command Line Reference

Drive Dotmesh from your shell

Let’s take a look at the dm command-line tool in more detail. This is a reference guide - a tutorial is where to go if you want a quick start guide. We’re also going to assume you’re familiar with Dotmesh concepts here. As a reference, it’s written so you can dive straight into the heading for the command you want - but it’s also been laid out in a suitable order for reading top-to-bottom, should you want to become a Dotmesh command-line expert and impress your friends.

How to read the examples.

In the examples given in this guide, anything in CAPITALS is something you need to replace with your own text. For instance, if you’re told to type dm commit -m 'MESSAGE', that means you need to put your own message in instead of MESSAGE. Anything in [square brackets] is optional; the text will describe the consequences of missing it out. Lists of things separated|by|vertical|pipes indicate that you get to choose one of them.

Examples look like this:

$ cat hello.txt # Text YOU type looks like this
Hello! Text the computer echoes back to you looks like this.

If any bits of the output need calling out, we’ll point them out in the text after the example.

Basics.

If you just type dm on its own, it will give you basic command-line help. dm uses the “subcommands” pattern, where one command provides lots of different functions through different subcommands given on the command line. For instance, dm version will print out the versions of Dotmesh components. Some subcommands have subcommands of their own (for instance, dm dot delete to delete a Dot).

You can get further information about any command by typing dm COMMAND --help.

dm.

$ dm
dotmesh (dm) is like git for your data in Docker.

This is the client. Configure it to talk to a dotmesh cluster with 'dm remote
add'. Create a dotmesh cluster with 'dm cluster init'.

Usage:
  dm [command]

Available Commands:
  branch      List branches
  checkout    Switch or make branches
  clone       Make a complete copy of a remote dot
  cluster     Install a dotmesh server on a docker host, creating or joining a cluster
  commit      Record changes to a dot
  debug       Make API calls
  dot         Manage dots
  init        Create an empty dot
  list        Enumerate dots on the current remote
  log         Show commit logs
  pull        Pull new commits from a remote dot to a local copy of that dot
  push        Push new commits from the current dot and branch to a remote dot (creating it if necessary)
  remote      List remote clusters. Use dm remote -v to see remotes
  reset       Reset current HEAD to the specified state
  s3          Commands that handle S3 connections
  switch      Change which dot is active
  version     Show the Dotmesh version information

Flags:
  -c, --config string   Config file to use (default "~/.dotmesh/config")
      --verbose         Display details of RPC requests and responses to the dotmesh server

Use "dm [command] --help" for more information about a command.

dm version --help.

$ dm version --help
Show the Dotmesh version information

Usage:
  dm version [flags]

Global Flags:
  -c, --config string   Config file to use (default "~/.dotmesh/config")
      --verbose         Display details of RPC requests and responses to the dotmesh server

The configuration file.

dm stores some local state in a config file. You never need to edit this file; dm will manage it for you. By default, it’s located at $HOME/.dotmesh/config, but all dm subcommands accept a -c PATH or --config PATH flag to make dm use a different config file.
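
If you keep remotes for two separate environments, a small wrapper can pin dm to a second config file. This is a minimal sketch: the file name staging-config and the function name dm_staging are hypothetical examples, not anything dm creates for you.

```shell
# Hypothetical wrapper (not part of dm): run dm against a second config file,
# so remotes added through it are kept apart from the default config.
DM_STAGING_CONFIG="$HOME/.dotmesh/staging-config"

dm_staging() {
    dm --config "$DM_STAGING_CONFIG" "$@"
}

# Usage:
#   dm_staging remote -v
```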

Verbose mode

You can view the contents of the RPC requests between the dm client and the dotmesh server by using the --verbose flag. It will print the JSON request and response bodies to standard output.

Connecting to clusters.

dm communicates with Dotmesh clusters using the Dotmesh API. In order to do anything interesting, it needs a username, a hostname to connect to, and an API key. These login details for a cluster are called a “remote”, and a list of remotes is stored in the configuration file.

One of the remotes in the config file is marked as the “current remote”. That’s the one dm will use, until told otherwise.

If you create a local cluster using dm cluster init, then the username will always be admin; the admin user is created by dm cluster init and automatically saved under a remote called local which is current to begin with, so dm commands will just work out of the box. But if you need to connect to an existing cluster, or you want to use the dm command directly against the Hub, you’re going to need to add a remote yourself.

The following commands are for managing the list of remotes stored in your local configuration file.

Add a new remote: dm remote add NAME USER@HOSTNAME[:PORT].

$ dm remote add test [email protected]
API key: Paste your API key here, it won't be echoed!

Remote added.

You can optionally specify a port if your remote cluster is not running on the default port (`32607`). You can get your Hub API key by browsing to the [Settings/API Key](https://saas.dotmesh.io/ui/settings/apikey) page on the Hub. To get the admin API key for a Kubernetes Dotmesh cluster, get the `dotmesh-api-key.txt` key from the `dotmesh` secret in the `dotmesh` namespace, then base64-decode it:

$ kubectl get secret dotmesh -n dotmesh -o yaml
apiVersion: v1
data:
  dotmesh-admin-password.txt: Y29ycmVjdGhvcnNlYmF0dGVyeXN0YXBsZQo=
  dotmesh-api-key.txt: VlZLR1lDQzNHNEs1RzJRTTNHTElWVEVDVlNCV1dKWkQK
kind: Secret
metadata:
  creationTimestamp: 2018-01-17T15:03:11Z
  name: dotmesh
  namespace: dotmesh
  resourceVersion: "418"
  selfLink: /api/v1/namespaces/dotmesh/secrets/dotmesh
  uid: 88c31d8b-fb97-11e7-b1fe-0242cd52be10
type: Opaque
$ echo VlZLR1lDQzNHNEs1RzJRTTNHTElWVEVDVlNCV1dKWkQK | base64 -d
VVKGYCC3G4K5G2QM3GLIVTECVSBWWJZD
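
The lookup and the decoding can be combined into one step. This sketch assumes kubectl’s go-template output format and the secret and key names shown above; the function name dotmesh_k8s_api_key is hypothetical.

```shell
# Hypothetical helper: print the admin API key from the dotmesh secret in the
# dotmesh namespace. The key name contains a dot, so the Go template "index"
# function is used to look it up rather than a dotted path.
dotmesh_k8s_api_key() {
    kubectl get secret dotmesh -n dotmesh \
        -o go-template='{{index .data "dotmesh-api-key.txt"}}' | base64 -d
}

# Usage:
#   dotmesh_k8s_api_key
```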

The admin API key for an existing Docker-based cluster created with dm cluster init can be found in the Dotmesh config file on the machine where dm cluster init was run, using the following command:

$ cat ~/.dotmesh/config | jq -r .Remotes.local.ApiKey
VVKGYCC3G4K5G2QM3GLIVTECVSBWWJZD

For S3 remotes, please see dm s3 remote add.

List remotes: dm remote -v

$ dm remote -v
  hub   alaric@otherserver
  test  [email protected]
* local [email protected]

All the remotes in the config file are listed, one per line. Each line has the name of the remote, followed by the username and the hostname in USER@HOSTNAME form. The API keys are not printed out.

Note that the current remote is marked with a * at the start of the line.
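
Because the current remote is the line marked with *, a script can pick it out of the listing. A minimal sketch, assuming the output layout shown above stays stable; current_remote is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of the current remote, i.e. the
# second field of the `dm remote -v` line that starts with "*".
current_remote() {
    dm remote -v | awk '$1 == "*" { print $2 }'
}
```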

Remove a remote: dm remote rm NAME.

$ dm remote -v
  hub   alaric@otherserver
  test  [email protected]
* local [email protected]
$ dm remote rm test
$ dm remote -v
  hub   alaric@otherserver
* local [email protected]

Select the current remote: dm remote switch NAME.

$ dm remote -v
  hub   alaric@otherserver
* local    [email protected]
$ dm remote switch hub
$ dm remote -v
* hub  alaric@otherserver
  local [email protected]

Comparing client and remote versions: dm version.

$ dm version
Current remote: local (use 'dm remote -v' to list and 'dm remote switch' to switch)

Client:
    Version: release-0.1.0
Server:
    Version: release-0.1.0

Dot management.

The current dot.

You often need to perform lots of operations on a single dot, so rather than specifying the name of the dot in every command, each remote in the config file has a “current dot”. That means that if you switch remotes, the current dot will change, and will change back if you return to the original remote. The current dot for each remote is stored in the configuration file.

List the available dots: dm list [-H|--scripting].

$ dm list
Current remote: local (use 'dm remote -v' to list and 'dm remote switch' to switch)

  DOT             BRANCH  SERVER            CONTAINERS  SIZE       COMMITS  DIRTY
* important_data  master  504954d09db78174              19.00 kiB  0        19.00 kiB
  test_data       master  504954d09db78174              19.00 kiB  0        19.00 kiB

Note that the current dot is marked with a * at the start of the line. The fields are:

  • The dot name.
  • The currently selected branch on that dot.
  • The ID of the server that’s currently managing that dot.
  • The names of any containers currently using this dot.
  • The size of the dot.
  • How many commits have been made on this branch of the dot.
  • How much data has been generated or modified since the last commit.

If you’re writing a script, you can also obtain this information in a more parseable format (without headings or prettification of numbers, and with a single tab between each field) using dm list -H or dm list --scripting - but it might be easier to use the API if you’re doing anything more complicated.

$ dm list -H
important_data  master  504954d09db78174        19456   0   19456
test_data   master  504954d09db78174        19456   0   19456
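
For example, a script could list every dot with uncommitted changes. This sketch assumes, as in the example above, that the fields are tab-separated and that the dirty byte count is the last field on each line; dirty_dots is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of dots with dirty (uncommitted) data.
# Splitting on tabs keeps an empty CONTAINERS field from shifting the columns;
# $NF is the last field, which holds the dirty byte count.
dirty_dots() {
    dm list -H | awk -F '\t' '$NF > 0 { print $1 }'
}
```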

Select a different current dot: dm switch DOT.

Remember, each remote has a different list of dots - so the current dot is particular to each remote.

$ dm list
Current remote: local (use 'dm remote -v' to list and 'dm remote switch' to switch)

  DOT             BRANCH  SERVER            CONTAINERS  SIZE       COMMITS  DIRTY
* important_data  master  504954d09db78174              19.00 kiB  0        19.00 kiB
  test_data       master  504954d09db78174              19.00 kiB  0        19.00 kiB
$ dm switch test_data
$ dm list
Current remote: local (use 'dm remote -v' to list and 'dm remote switch' to switch)

  DOT             BRANCH  SERVER            CONTAINERS  SIZE       COMMITS  DIRTY
  important_data  master  504954d09db78174              19.00 kiB  0        19.00 kiB
* test_data       master  504954d09db78174              19.00 kiB  0        19.00 kiB

Create an empty dot: dm init DOT.

$ dm init staging_data
$ dm list
Current remote: local (use 'dm remote -v' to list and 'dm remote switch' to switch)

  DOT             BRANCH  SERVER            CONTAINERS  SIZE       COMMITS  DIRTY
  important_data  master  504954d09db78174              19.00 kiB  0        19.00 kiB  
* staging_data    master  504954d09db78174              19.00 kiB  0        19.00 kiB
  test_data       master  504954d09db78174              19.00 kiB  0        19.00 kiB  

A newly created dot has no subdots, but it starts off with a small amount of “dirty” data because basic filesystem metadata has been created.

Delete a dot: dm dot delete [-f|--force] DOT.

You will be prompted for confirmation, unless you specify the -f or --force flag.

$ dm dot delete staging_data
Please confirm that you really want to delete the dot staging_data, including all
branches and commits? (enter Y to continue): Y

Examine a dot: dm dot show [-H|--scripting] DOT.

$ dm dot show test_data
Dot admin/test_data:
Master branch ID: e05cf6bf-46b9-4e34-6e08-01bc9f323a72
Dot is current.
Dot size: 19.00 kiB (19.00 kiB dirty)
Branches:
* master
Tracks dot alaric/test_data on remote hub

The results show:

  • The full name of the dot, including a namespace.
  • The master branch ID, which isn’t something you generally need when using the command line, but is useful for debugging your API apps.
  • If this dot is the current dot, it will display Dot is current.
  • The size of the dot, and the amount of generated/modified “dirty” data since the last snapshot.
  • The list of all the branches of the dot, with the current branch marked with a *.
  • The default upstream dot on each remote that has one configured for this dot.

You can get all that data in a form more amenable to scripting with the -H or --scripting option:

$ dm dot show --scripting test_data
namespace   admin
name    test_data
masterBranchId  e05cf6bf-46b9-4e34-6e08-01bc9f323a72
current
size    19456
dirty   19456
currentBranch   master
branch  master
defaultUpstreamDot  hub alaric/test_data
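
Since each line starts with a field name, individual values are easy to extract. A sketch, assuming the layout shown above; dot_field and dot_is_current are hypothetical helper names.

```shell
# Hypothetical helper: print the value of FIELD from `dm dot show -H DOT`.
# Repeated fields (such as branch) give one line each; for fields with
# several values (such as defaultUpstreamDot) only the first is printed.
dot_field() {
    dm dot show -H "$1" | awk -v key="$2" '$1 == key { print $2 }'
}

# Hypothetical helper: succeed if DOT is the current dot, which the
# scripting output indicates with a bare "current" line.
dot_is_current() {
    dm dot show -H "$1" | grep -qx current
}

# Usage:
#   dot_field test_data size
```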

Transferring dots.

When you clone a dot from the Hub or another cluster, dm stores the association between your local dot and the original remote dot in the configuration file.

Likewise, if you push a dot to another cluster, or pull updates to it from another cluster, dm will remember that association if it didn’t already have one for that remote.

Each dot may have a “default upstream dot” for each remote in your configuration. There can’t be two default upstreams of a dot on any remote, but there might be none!

The list of upstream dots for a dot can be viewed with dm dot show DOT. Upstream dots may be assigned or re-assigned with dm dot set-upstream [DOT] REMOTE REMOTE-DOT.

These commands can be a little confusing, because they involve two remotes at once. There is always a current remote selected with dm remote switch that is the “target” of your commands; that’s the “local cluster” from the perspective of these commands. The command line for transfer commands always names a second remote, which is the “remote cluster” we are transferring dots to and from.

Clone: dm clone [--local-name LOCAL-DOT] [--stash-on-divergence] REMOTE DOT [BRANCH]

In this example, we’ll clone the dot alice/testing_data from the Hub, and call it new_data locally.

$ dm clone --local-name new_data hub alice/testing_data
Pulling admin/new_data from hub:alice/testing_data
Calculating...
finished 9.50 KB / 9.50 KB [==========================] 100.00% 0.43 MiB/s (1/1)
Done!

If we run dm dot show on new_data, we’ll see that alice/testing_data is the default upstream dot for it on hub:

$ dm dot show new_data
Dot admin/new_data:
Master branch ID: c78bb46e-0d52-43e9-70bc-f2b78ace0f9d
Dot size: 19.00 kiB (all clean)
Branches:
* master
Tracks dot alice/testing_data on remote hub

If you omit the --local-name LOCAL-DOT part, then the dot will just have the same name as the remote one - in this case, testing_data.

If you use --stash-on-divergence, then:

  • If your local history contains all of the remote cluster’s commits plus extra ones, no error will occur, but nothing will change.
  • If both clusters contain additional commits, the additional local commits will be put on a branch and the remote commits will be pulled down.
  • If you have dirty data, but the history is otherwise the same on both ends, a commit will be generated locally.

If you are cloning an S3 bucket and only want to select a subset of the files, please see dm s3 clone-subset.

Pull: dm pull REMOTE [DOT [BRANCH]] [--remote-name REMOTE-DOT] [--stash-on-divergence]

This command pulls new commits and branches from a remote dot into your local cluster.

If you only specify a REMOTE name, then it will attempt to pull updates to all branches of the current dot from that remote. If you have specified --remote-name REMOTE-DOT, it will pull from REMOTE-DOT on the remote cluster. If not, and there is a default upstream dot for that remote, it will pull from that dot. Otherwise, it will pull from a dot with the same name as the current dot on your local cluster, in the namespace corresponding to your username on the remote cluster (e.g., your Hub username).

If you specify a REMOTE name and a DOT name, then it will perform the same steps, but with the local dot being the one named rather than the current dot.

If you specify a REMOTE name, a DOT name and a BRANCH, then it will only pull new commits on the named branch, rather than trying to pull commits for every branch.

$ dm pull hub
Pulling admin/new_data from hub:alice/testing_data
Calculating...
finished 9.50 KB / 9.50 KB [==========================] 100.00% 0.43 MiB/s (1/1)
Done!

If you use --stash-on-divergence, then:

  • If the local cluster contains all of the remote cluster’s commits plus additional commits, no error will occur, but nothing will change.
  • If both clusters contain additional commits, the additional commits on the local cluster will be put on a branch and the additional remote commits will be pulled down.
  • If you have dirty data, but the history is otherwise the same on both ends, a commit will be generated locally.
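
Clone and pull can be combined into one idempotent sync step: clone the dot if it doesn’t exist locally yet, otherwise pull into it. A sketch, assuming dot names appear in the first column of dm list -H; sync_dot is a hypothetical helper name.

```shell
# Hypothetical helper: sync_dot REMOTE REMOTE-DOT LOCAL-DOT
# Clones REMOTE-DOT as LOCAL-DOT if LOCAL-DOT is missing, otherwise pulls.
sync_dot() {
    remote=$1 remote_dot=$2 local_dot=$3
    # awk exits 0 only if some line's first field equals the local dot name.
    if dm list -H | awk -F '\t' -v d="$local_dot" \
        '$1 == d { found = 1 } END { exit !found }'; then
        dm pull "$remote" "$local_dot" --remote-name "$remote_dot"
    else
        dm clone --local-name "$local_dot" "$remote" "$remote_dot"
    fi
}

# Usage:
#   sync_dot hub alice/testing_data new_data
```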

Push: dm push REMOTE [--remote-name DOT] [--stash-on-divergence]

This command pushes the current branch of the current dot to the specified REMOTE. If the destination dot already exists, local commits that aren’t present in the destination will be pushed up, bringing it up to date. If the destination does not already exist, it will be created and all the commits on the current branch (and other branches that the current branch depends upon) will be pushed up.

If --remote-name is specified, then that is the name of the destination dot on the remote cluster. Otherwise, if the current dot has a default upstream dot for that remote, that will be the destination dot. If not, the destination dot name will be the same as the current dot’s name, but in your user’s namespace on the remote.

$ dm push hub
Pushing admin/new_data to hub:alice/testing_data
Calculating...
finished 9.50 KB / 9.50 KB [==========================] 100.00% 0.38 MiB/s (1/1)
Done!

If you use --stash-on-divergence, then:

  • If the remote cluster contains all of the local cluster’s commits plus extra ones, no error will occur, but nothing will change.
  • If both clusters contain additional commits, the additional commits on the remote cluster will be put on a branch and the local commits will be pushed.
  • If you have dirty data, but the history is otherwise the same on both ends, a commit will be generated locally.

Set the upstream dot: dm dot set-upstream [DOT] REMOTE REMOTE-DOT

You can set the upstream dot for any given remote using this command. If you omit the DOT, then the current dot is used.

$ dm dot set-upstream new_data production bob/test_data
$ dm dot show new_data
Dot admin/new_data:
Master branch ID: c78bb46e-0d52-43e9-70bc-f2b78ace0f9d
Dot size: 19.00 kiB (all clean)
Branches:
* master
Tracks dot alice/testing_data on remote hub
Tracks dot bob/test_data on remote production

Using dots.

These commands deal with the contents of a dot: branches and commits.

Commit: dm commit -m 'MESSAGE' [--metadata fieldname=value].

This command takes the “dirty” changes to the current dot since the last commit (or the creation of the dot), and makes them into a new commit with the given MESSAGE.

$ dm commit -m "A well-written commit message"

You can also pass extra metadata fields that are added to the commit by using the --metadata flag. You can pass multiple metadata fields, each using the format: --metadata fieldname=value:

$ dm commit -m "A well-written commit message" --metadata fruit=apples --metadata color=red
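
When the metadata comes from a script, the repeated --metadata flags can be built from a list of FIELD=VALUE pairs. A POSIX sh sketch; commit_with_metadata is a hypothetical helper name.

```shell
# Hypothetical helper: commit_with_metadata MESSAGE [FIELD=VALUE ...]
commit_with_metadata() {
    msg=$1
    shift
    # Rewrite the positional parameters in place: each pair is appended as
    # "--metadata PAIR" and the original pair is shifted off the front.
    for pair in "$@"; do
        set -- "$@" --metadata "$pair"
        shift
    done
    dm commit -m "$msg" "$@"
}

# Usage:
#   commit_with_metadata "A well-written commit message" fruit=apples color=red
```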

List commits: dm log.

This command lists the commits on the current branch.

$ dm log
commit c96eefda-6940-499a-411c-22521f4a3452
author: admin
date: 1516898188388491967
fruit: apples
color: red

    A well-written commit message

commit e568407c-5ea3-42bc-48e8-6e375c121d2b
author: admin
date: 1516898511693726664
fruit: apples
color: red

    A poorly-written commit message


Note the commit IDs (the values following commit at the start of each entry) - they are needed to do a dm reset.
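
A script can collect these IDs from the log. This sketch relies on the layout above, where commit headers start at the beginning of a line and messages are indented; commit_ids is a hypothetical helper name.

```shell
# Hypothetical helper: print the commit IDs from `dm log`, one per line.
# Only unindented "commit " lines match, so an indented commit message that
# happens to begin with the word "commit" is not picked up.
commit_ids() {
    dm log | awk '/^commit / { print $2 }'
}
```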

List the branches: dm branch.

This command lists the branches in the current dot.

$ dm branch
  version_1
* master

Note how the current branch is indicated with a leading *.

Create a branch: dm checkout -b BRANCH.

This command creates a new branch starting from the current branch, and makes the new branch current.

$ dm checkout -b version_2

Switch branches: dm checkout BRANCH.

This command makes a different branch current. If there are running containers using this dot that haven’t been pinned to a specific branch, they will be stopped before the change and restarted afterwards, using the new branch.

$ dm checkout version_1
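
Creating a branch and switching back afterwards can be wrapped around any command, which is handy for trying something out without disturbing the branch you were on. A sketch, assuming the dm branch output format shown above; with_branch is a hypothetical helper name.

```shell
# Hypothetical helper: with_branch BRANCH COMMAND [ARGS...]
# Creates BRANCH from the current branch, runs the command there, then
# checks the original branch out again and returns the command's status.
with_branch() {
    orig_branch=$(dm branch | awk '$1 == "*" { print $2 }')
    dm checkout -b "$1" || return
    shift
    "$@"
    rc=$?
    dm checkout "$orig_branch"
    return "$rc"
}

# Usage:
#   with_branch experiment dm commit -m "Trying something out"
```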

Roll back commits: dm reset [--hard] COMMIT.

This command rolls back the state of the current branch to a given commit ID (which must be from this branch!). To get the commit IDs, use dm log.

The command won’t let you roll back if there are uncommitted changes, unless you specify --hard to override it.

$ dm reset c96eefda-6940-499a-411c-22521f4a3452
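
In a script, you may want to check that a commit ID really is on the current branch before calling dm reset, so a typo produces a clear message. A sketch, assuming the dm log layout shown earlier; safe_reset is a hypothetical helper name, and it doesn’t pass --hard, so uncommitted changes still block the reset.

```shell
# Hypothetical helper: reset to COMMIT only if it appears in `dm log`.
safe_reset() {
    if dm log | awk -v id="$1" '/^commit / && $2 == id { found = 1 }
                                END { exit !found }'; then
        dm reset "$1"
    else
        echo "commit $1 not found on the current branch" >&2
        return 1
    fi
}
```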

Cluster management.

These commands are for managing a Dotmesh cluster built using Docker. If you’re using Kubernetes, you don’t need these commands - the Dotmesh Kubernetes integration handles all of this for you!

Create a cluster: dm cluster init [--port PORTNUM].

This command creates a new single-node Dotmesh cluster. You can force the cluster to be exposed on a specific port by specifying the --port flag.

If a ZFS pool called pool already exists, it will be used for Dot storage. Otherwise, Dotmesh will default to creating a pool backed by a file in /var/lib/dotmesh. The file will be ten gibibytes in size.

The newly created cluster will be automatically configured as a remote called local in your dm configuration file.

$ dm cluster init
Checking suitable Docker is installed... assuming post-semver Docker client is sufficient.
assuming post-semver Docker server is sufficient.
Checking dotmesh isn't running... done.
Pulling dotmesh-server docker image... done.
Registering new cluster... got URL:
https://discovery.dotmesh.io/da045bfb125bb69f7f55902ed0409494
Generating PKI assets... done.
If you want more than one node in your cluster, run this on other nodes:

    dm cluster join https://discovery.dotmesh.io/da045bfb125bb69f7f55902ed0409494:DYJNVRS2PNJVBTQ44P3KVAC7LWKV325X

This is the last time this secret will be printed, so keep it safe!

Guessing docker host's IPv4 address (should be routable from other cluster nodes)... got: 192.168.1.34,192.168.1.33,10.192.0.1,172.18.0.1,172.19.0.1.
Guessing unique name for docker host (using hostname, must be unique wrt other cluster nodes)... got: nixos.
Starting etcd... done.
Succeeded setting initial admin password to 'UMY5XI6WFMHKAMNO2HGWGN3MHQ74KMUH' - writing it to /home/alaric/.dotmesh/admin-password.txt
Configuring dm CLI to authenticate to dotmesh server /home/alaric/.dotmesh/config... done.
Starting dotmesh server... done.
Waiting for dotmesh server to come up...
done.

Note the dm cluster join command printed in the example above. Keep a copy of it - you can’t get it again, and you’ll need it if you want to add any more nodes to your cluster.

Join a cluster: dm cluster join DISCOVERY-URL.

This command sets up a Dotmesh node on your computer, and joins it into an existing cluster using the DISCOVERY-URL printed out when the original cluster was created.

If you specify a pool PATH, then files will be created in the directory pointed at by PATH to store the actual dots.

If, instead, you specify a ZFSPOOL, then the dots will be stored in the ZFS pool with that name, which you must have created yourself. Use this option if you have dedicated disk partitions for Dotmesh to use.

If you specify neither, then Dotmesh will default to creating a pool directory in /var/lib/dotmesh.

The cluster will be automatically configured as a remote called local in your dm configuration file.

$ dm cluster join https://discovery.dotmesh.io/1e52c023dfaa2f9e812ec7835bdd0540:OWSWZGRMUCBT5FFFD5NJIVCP5QQSQXVH
Checking suitable Docker is installed... yes, got 1.12.6.
Checking dotmesh isn't running... done.
Pulling dotmesh-server docker image... done.
Downloading PKI assets... done!
Guessing docker host's IPv4 address (should be routable from other cluster nodes)... got: 10.192.0.2.
Guessing unique name for docker host (using hostname, must be unique wrt other cluster nodes)... got: cluster-1516891762883170057-0-node-0.
Starting etcd... done.
Succeeded getting initial admin API key 'E3M6NJBGEEIWEKSPH7E4XLQAKQBQPBAB'
Configuring dm CLI to authenticate to dotmesh server /root/.dotmesh/config... done.
Starting dotmesh server... done.
Waiting for dotmesh server to come up....
done.

Upgrade your node: dm cluster upgrade.

This command stops the Dotmesh server on the current node, downloads the Dotmesh server Docker image corresponding to the version of the dm client you’re using, and starts it up. You would normally upgrade Dotmesh on your node by downloading a new dm client binary and running dm cluster upgrade with it. You can use dm version to check the client and server versions (make sure you’ve selected the local remote!).

$ dm cluster upgrade
Pulling dotmesh-server docker image... done.
Stopping dotmesh-server...done.
Stopping dotmesh-server-inner...done.
Starting dotmesh server... done.
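
The version check can be scripted, so that an upgrade only runs when the client and server disagree. A sketch, assuming the dm version output layout shown earlier; versions_match is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if `dm version` reports the same client and
# server versions. Each "Version:" line is attributed to the most recent
# "Client:" or "Server:" heading.
versions_match() {
    dm version | awk '
        /^Client:/ { section = "client" }
        /^Server:/ { section = "server" }
        $1 == "Version:" { v[section] = $2 }
        END { exit !(v["client"] != "" && v["client"] == v["server"]) }'
}

# Usage (on the node itself, with the local remote selected):
#   dm remote switch local
#   versions_match || dm cluster upgrade
```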

Remove Dotmesh from your node: dm cluster reset.

This command stops the Dotmesh server on the current node, then deletes its resources. It doesn’t delete the Dot data itself, but it does destroy the local copy of the Dot metadata!

$ dm cluster reset
Destroying all dotmesh data... done.
Deleting dotmesh-etcd container... done.
Deleting dotmesh-server containers... done.
Deleting dotmesh-server-inner containers... done.
Deleting dotmesh socket... done.
Deleting dotmesh-etcd-data local volume... done.
Deleting dotmesh-kernel-modules local volume... done.
Deleting 'local' remote... done.
Deleting cached PKI assets... done.

S3 management.

Add a new S3 remote: dm s3 remote add NAME ACCESS_KEY:SECRET_KEY[@HOST:PORT].

$ dm s3 remote add test access_key:secret

S3 remote added.

Invoking this command will check that Dotmesh is able to list buckets using the access key and secret supplied - if it cannot connect it will fail with an appropriate error.

You can then manage S3 buckets using clone, push and pull as if they were Dotmesh servers, but you will not be able to make an S3 remote your current remote. You can also clone a subset of an S3 bucket using dm s3 clone-subset.

It is recommended that you enable versioning on your S3 bucket in order for Dotmesh to be able to discern changes easily.
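
Versioning is a bucket setting, not something dm configures. One way to turn it on is with the AWS CLI; this sketch assumes the CLI is installed and configured with credentials allowed to change bucket versioning, and enable_bucket_versioning is a hypothetical helper name.

```shell
# Hypothetical helper: enable versioning on BUCKET with the AWS CLI.
# The credentials used need permission to change bucket versioning, which
# the Dotmesh access policy in this section does not grant.
enable_bucket_versioning() {
    aws s3api put-bucket-versioning --bucket "$1" \
        --versioning-configuration Status=Enabled
}

# Usage:
#   enable_bucket_versioning test
```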

We recommend you create a new user credential in AWS IAM (AWS console -> IAM -> Users -> Add User) in order to limit the privileges granted. The access type should be “programmatic”.

You should then create a policy to apply to all credential pairs you wish to use (Policies -> Create Policy). The following policy should allow dotmesh to work:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucketVersions",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:GetBucketLocation",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::your bucket name here",
                "arn:aws:s3:::your bucket name here/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:HeadBucket",
            "Resource": "*"
        }
    ]
}
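
If you save the policy above as a template, the bucket name can be filled in from a script. A sketch, assuming the template keeps the literal text your bucket name here; render_policy and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: read the policy template on standard input and print
# it with the bucket name substituted. S3 bucket names cannot contain "|"
# or "&", so they are safe inside this sed replacement.
render_policy() {
    sed "s|your bucket name here|$1|g"
}

# Usage (file names are examples):
#   render_policy my-bucket < dotmesh-policy-template.json > dotmesh-policy.json
#   aws iam create-policy --policy-name dotmesh-s3 \
#       --policy-document file://dotmesh-policy.json
```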

Clone a section of an S3 bucket: dm s3 clone-subset REMOTE BUCKET PREFIXES [--local-name LOCAL-DOT].

This command will clone only a selection of files from an S3 bucket, as dictated by PREFIXES.

$ dm s3 clone-subset --local-name new_data s3 test directory_1/
Pulling admin/new_data from s3:/test
Calculating...
finished 9.50 KB / 9.50 KB [==========================] 100.00% 0.43 MiB/s (1/1)
Done!

You can also use multiple prefixes, separating them by a comma:

$ dm s3 clone-subset --local-name new_data s3 test directory_1/,hello-
Pulling admin/new_data from s3:/test
Calculating...
finished 9.50 KB / 9.50 KB [==========================] 100.00% 0.43 MiB/s (2/2)
Done!

When pulling or pushing a volume cloned in this way, only files which begin with these prefixes will be updated.
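
When the prefixes come from a script, they need joining into the single comma-separated argument. A POSIX sh sketch; join_prefixes is a hypothetical helper name.

```shell
# Hypothetical helper: join the arguments with commas, as clone-subset
# expects. "$*" joins with the first character of IFS; the subshell keeps
# the IFS change from leaking into the caller.
join_prefixes() {
    (IFS=,; printf '%s\n' "$*")
}

# Usage:
#   dm s3 clone-subset --local-name new_data s3 test "$(join_prefixes directory_1/ hello-)"
```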