Overview

The Task Service provides direct access to cloud data in a secure compute environment, running your own code inside Docker containers.

The LifeOmic CLI (Command Line Interface) offers users a familiar interface for developing and running their code.

Users need an account set up and must be able to log in at https://apps.us.lifeomic.com/phc/login before using the CLI to run tasks.

General concept

Every resource in PHC is identified by a unique ID that looks like this: 4a113171-9f4a-48e2-82be-5682f476cc76. In general, use the ID rather than the resource name with the CLI. When working with the task service and files, it is very often necessary to specify the project or dataset id to work under. Project and dataset are used interchangeably to mean the same thing.

Listing of files in project

The following command lists the datasets/projects under the account; here there is only one, with datasetId = 4a113171-9f4a-48e2-82be-5682f476cc76:

>> lo projects list
items:
  -
    id:          4a113171-9f4a-48e2-82be-5682f476cc76
    name:        analytics-testing
    description: Analytics Testing Project
    lrn:         lrn:lo:dev:lifeomic:project:4a113171-9f4a-48e2-82be-5682f476cc76

To list the files in a dataset/project, provide the datasetId (e447d01a-ae17-48d0-8cd8-86c9a65f779b in the example below). The command returns a list of 2 files:

>> lo files list e447d01a-ae17-48d0-8cd8-86c9a65f779b
items:
  -
    id:           89cac4d4-e1a4-4d9b-912b-1bcc4908b9b7
    name:         mmrf.rgel
    datasetId:    e447d01a-ae17-48d0-8cd8-86c9a65f779b
    size:         1223998865
    contentType:  application/octet-stream
    lastModified: 2018-06-15T19:01:58.365Z
    lrn:          lrn:lo:dev:lifeomic:file:89cac4d4-e1a4-4d9b-912b-1bcc4908b9b7
  -
    id:           988d4158-6aac-47a2-99d5-e5de8fa5acb1
    name:         mmrf.rgel.executor.0.stderr.txt
    datasetId:    e447d01a-ae17-48d0-8cd8-86c9a65f779b
    size:         82
    contentType:  text/plain
    lastModified: 2018-06-15T19:22:12.438Z
    lrn:          lrn:lo:dev:lifeomic:file:988d4158-6aac-47a2-99d5-e5de8fa5acb1

Upload of file

To upload a single file:

lo files upload ./myfile.txt <datasetId>

To upload a whole directory of files, provide the path to the local directory instead of a file name. Note that the files will be prefixed with the directory name <localDir> when uploaded to PHC.

lo files upload <localDir> <datasetId>
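For example, uploading a local directory named data containing sample1.csv and sample2.csv (illustrative file names) results in the project files data/sample1.csv and data/sample2.csv:

lo files upload ./data <datasetId>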

Task Service with the CLI

To submit a task job using the CLI, create the json job definition. A task json file contains the <name> and <datasetId> declarations, followed by four main sections: inputs, outputs, resources and executors. The inputs and outputs sections list the files and directories to copy in before execution and to write out afterwards. The resources section specifies the CPU cores and RAM (in GB) needed. The executors section defines a list of docker images to execute serially, in the order of declaration.
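As a point of reference, the overall shape of a task definition (with placeholder values; see the complete examples later in this document) is:

{
    "name": "<task name>",
    "datasetId": "<datasetId>",
    "inputs": [ ... ],
    "outputs": [ ... ],
    "resources": {
        "cpu_cores": 1,
        "ram_gb": 1
    },
    "executors": [ ... ]
}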

Here is an example of a file specification as input. Provide the file's <url>, using its unique id, which can be obtained by listing the files in the dataset. The <path> is where the file will be copied and made available to the docker images during execution.

{
    "path": "/tmp/input.txt",
    "url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
    "type": "FILE"
}

For a directory as input, provide the <url> with the project/dataset id. The <prefix> specifies a path prefix; all files in the project with that prefix will be copied under the <path>.

{
    "path": "/tmp/in",
    "url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
    "prefix": "data/",
    "type": "DIRECTORY"
}

To save results from the execution, declare each file or directory to be copied out, with the <url> pointing to the project/dataset. In the following example, the file "/out/result.txt" and the directory "/outDir" in the container will be copied out upon successful completion of the task.

[
    {
        "path": "/out/result.txt",
        "url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
        "type": "FILE"
    },
    {
        "path": "/outDir",
        "url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
        "type": "DIRECTORY"
    }
]

Scheduling

Tasks can be scheduled to run at a future time, or be configured to run on a recurring schedule.

To schedule a task to run at a specific date and time, add the following to the task definition:

{
    "scheduleDate": "2019-07-26T16:35:31Z"
}

To schedule a task to run on a recurring schedule, use a cron expression:

{
    "scheduleExpression": "0 0 * * *"
}
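The expression uses standard cron syntax (minute, hour, day of month, month, day of week), so "0 0 * * *" runs the task daily at midnight. A weekly run on Sunday at midnight, as used in the scheduled task example later in this document, would be:

{
    "scheduleExpression": "0 0 * * 0"
}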

To stop a recurring task, use the cancel task API.

Email Notifications

Email notifications can be sent when a task completes or fails. To use this feature, add the following to the task definition:

{
    "email": {
        "sendFailedTo": "user@company.com",
        "sendCompletedTo": "user@company.com
    }
}

In the example above, an email will be sent whether the task completes or fails.
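The two addresses do not have to match; for example, failures could go to an on-call alias while completion notices go to a reporting address (addresses here are illustrative):

{
    "email": {
        "sendFailedTo": "oncall@company.com",
        "sendCompletedTo": "reports@company.com"
    }
}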

Using a non-public image

The Task Service can pull images from any public docker registry such as Docker Hub. A non-public image can be used by uploading an export of it to PHC and then specifying it as an input to the task.

Use docker save to create a gzipped TAR file of the image, then use the CLI to upload it to a project.

docker build -t my_image --rm .
docker save my_image | gzip > myimage.tar.gz
lo files upload ./myimage.tar.gz 0ec93203-febb-4c85-9aac-229703b6fa58

Specify the uploaded docker image as an input to a task. The task service will fetch the gzipped TAR file and load the docker image.

{
    "url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
    "path": "/tmp/myimage.tar.gz",
    "type": "DOCKER_IMAGE"
}

Use the image in an executor within the task. Note: use the image tag name, not the name of the gzipped TAR file.

"executors": [
    {
        "workdir": "/tmp",
        "image": "my_image",
        "command": [
            "echo",
            "hello world"
        ],
        "stderr": "/out/stderr.txt",
        "stdout": "/out/stdout.txt"
    }
]
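Putting these pieces together, a complete task definition using the uploaded image might look like the following sketch (the task name is arbitrary, and the file and project ids are the illustrative values from above):

{
    "name": "Custom Image Task",
    "datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
    "inputs": [
        {
            "url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
            "path": "/tmp/myimage.tar.gz",
            "type": "DOCKER_IMAGE"
        }
    ],
    "outputs": [
        {
            "path": "/out",
            "url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
            "type": "DIRECTORY"
        }
    ],
    "resources": {
        "cpu_cores": 1,
        "ram_gb": 1
    },
    "executors": [
        {
            "workdir": "/tmp",
            "image": "my_image",
            "command": [
                "echo",
                "hello world"
            ],
            "stderr": "/out/stderr.txt",
            "stdout": "/out/stdout.txt"
        }
    ]
}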

Hello world json example

To run the following example, save the definition as a json file (e.g. hello.json) and replace the <datasetId> and the output <url> with your own datasetId. This task uses the "busybox" image from https://hub.docker.com/_/busybox/ and executes the linux command "echo hello world", whose standard output is saved to the file "out/stdout.txt". You will find the file "out/stdout.txt" with the content "hello world" in the file listing in the UI. Note that no inputs are declared; only an output directory is returned.

To submit the job, run:

cat hello.json | lo tasks create
{
    "name": "Hello World Task",
    "datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
    "inputs": [
    ],
    "outputs": [
        {
            "path": "/out",
            "url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
            "type": "DIRECTORY"
        }
    ],
    "resources": {
        "cpu_cores": 1,
        "ram_gb": 1
    },
    "executors": [
        {
        "workdir": "/tmp",
            "image": "busybox",
            "command": [
                "echo",
                "hello world"
            ],
            "stderr": "/out/stderr.txt",
            "stdout": "/out/stdout.txt"
        }
    ]
}

List files json example

This example lists the files in a directory and saves the result to a file. Assume the project contains files with the prefix "data/" and a bash script "run.sh" with file id = 32c94154-1910-4e77-ab98-c6c8f2163060. The bash script "run.sh" has 2 lines:

#!/bin/bash
ls -al $1 > $2
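If run.sh only exists locally, upload it to the project first and read its file id from the file listing (using the commands shown earlier):

lo files upload ./run.sh 0ec93203-febb-4c85-9aac-229703b6fa58
lo files list 0ec93203-febb-4c85-9aac-229703b6fa58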

The result of the listing of directory "/tmp/in" is saved to the file "/out/result.txt".

{
    "name": "Task Service Test",
    "datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
    "inputs": [
        {
            "path": "/tmp/in",
            "url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
            "prefix": "data/",
            "type": "DIRECTORY"
        },
        {
            "path": "/tmp/run.sh",
            "url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
            "type": "FILE"
        }
    ],
    "outputs": [
        {
            "path": "/out",
            "url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
            "type": "DIRECTORY"
        }
    ],
    "resources": {
        "cpu_cores": 1,
        "ram_gb": 1
    },
    "executors": [
        {
            "workdir": "/tmp",
            "image": "busybox",
            "command": [
                "sh",
                "-l",
                "/tmp/run.sh",
                "/tmp/in",
                "/out/result.txt"
            ],
            "stderr": "/out/stderr.txt",
            "stdout": "/out/stdout.txt"
        }
    ]
}

A complete task json example

This is a more comprehensive example that takes a variant "vcf" file and passes it through a series of processing steps. It uses LifeOmic GNOSIS data resources as reference data inputs. The various GNOSIS resources available for use with the Task Service are an advanced topic discussed separately. This example demonstrates the practical usage of the Task Service to perform a complete series of tasks.

{
    "name": "NantOmics Test",
    "datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
    "inputs": [
        {
            "path": "/tmp/nantomics.vcf",
            "url": "https://api.us.lifeomic.com/v1/files/5a61bfdb-5db0-4264-8573-ee9945383cf7",
            "type": "FILE"
        },
        {
            "path": "/tmp/genome",
            "name": "GRCh37",
            "genome": "GRCh37",
            "type": "GNOSIS"
        },
        {
            "path": "/tmp/clinvar",
            "name": "ClinVar",
            "genome": "GRCh37",
            "type": "GNOSIS"
        },
        {
            "path": "/tmp/cosmic",
            "name": "COSMIC",
            "genome": "GRCh37",
            "type": "GNOSIS"
        },
        {
            "path": "/tmp/dbsnp",
            "name": "dbSNP",
            "genome": "GRCh37",
            "type": "GNOSIS"
        }
    ],
    "outputs": [
        {
            "path": "/out",
            "url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
            "type": "DIRECTORY"
        },
        {
            "path": "/log",
            "url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
            "type": "DIRECTORY"
        }
    ],
    "resources": {
        "cpu_cores": 1,
        "ram_gb": 4
    },
    "executors": [
        {
            "workdir": "/tmp",
            "image": "lifeomic/kopis-task-nant-et",
            "command": [
                "extract",
                "-i",
                "/tmp/nantomics.vcf",
                "-v",
                "/out/nantomics.var.vcf.gz",
                "-s",
                "/out/nantomics.sv.vcf.gz",
                "-c",
                "/out/nantomics.cnv.vcf.gz",
                "-e",
                "/out/nantomics.exp.vcf.gz"
            ],
            "stderr": "/log/stderr1.txt",
            "stdout": "/log/stdout1.txt"
        },
        {
            "workdir": "/tmp",
            "image": "lifeomic/kopis-task-nant-et",
            "command": [
                "var-transform",
                "-i",
                "/out/nantomics.var.vcf.gz",
                "-o",
                "/out/nantomics.var.std.vcf.gz"
            ],
            "stderr": "/log/stderr2.txt",
            "stdout": "/log/stdout2.txt"
        },
        {
            "workdir": "/tmp",
            "image": "lifeomic/kopis-task-nant-et",
            "command": [
                "exp-transform",
                "-i",
                "/out/nantomics.exp.vcf.gz",
                "-g",
                "/out/nantomics.exp.gene.txt.gz",
                "-s",
                "/out/nantomics.exp.iso.txt.gz"
            ],
            "stderr": "/log/stderr3.txt",
            "stdout": "/log/stdout3.txt"
        },
        {
            "workdir": "/tmp",
            "image": "lifeomic/kopis-task-vtools",
            "command": [
                "vt-combo",
                "-r",
                "/tmp/genome/GRCh37.fa.gz",
                "-i",
                "/out/nantomics.var.std.vcf.gz",
                "-o",
                "/out/nantomics.var.nrm.vcf.gz"
            ],
            "stderr": "/log/stderr4.txt",
            "stdout": "/log/stdout4.txt"
        },
        {
            "workdir": "/tmp",
            "image": "lifeomic/kopis-task-snpeff-grch37",
            "command": [
                "snpeff",
                "-m",
                "Refseq",
                "-i",
                "/out/nantomics.var.nrm.vcf.gz",
                "-o",
                "/out/nantomics.var.fnc.vcf.gz"
            ],
            "stderr": "/log/stderr5.txt",
            "stdout": "/log/stdout5.txt"
        },
        {
            "workdir": "/tmp",
            "image": "lifeomic/kopis-task-snpeff-grch37",
            "command": [
                "snpsift-annotate",
                "-p",
                "CLN_",
                "-n",
                "/tmp/clinvar/clinvar-GRCh37.vcf.gz",
                "-i",
                "/out/nantomics.var.fnc.vcf.gz",
                "-o",
                "/out/nantomics.var.cln.vcf.gz"
            ],
            "stderr": "/log/stderr6.txt",
            "stdout": "/log/stdout6.txt"
        },
        {
            "workdir": "/tmp",
            "image": "lifeomic/kopis-task-snpeff-grch37",
            "command": [
                "snpsift-annotate",
                "-p",
                "CMC_",
                "-n",
                "/tmp/cosmic/cosmic-GRCh37.vcf.gz",
                "-i",
                "/out/nantomics.var.cln.vcf.gz",
                "-o",
                "/out/nantomics.var.cmc.vcf.gz"
            ],
            "stderr": "/log/stderr7.txt",
            "stdout": "/log/stdout7.txt"
        },
        {
            "workdir": "/tmp",
            "image": "lifeomic/kopis-task-snpeff-grch37",
            "command": [
                "snpsift-annotate",
                "-p",
                "DBS_",
                "-n",
                "/tmp/dbsnp/dbsnp-GRCh37.vcf.gz",
                "-i",
                "/out/nantomics.var.cmc.vcf.gz",
                "-o",
                "/out/nantomics.var.dbs.vcf.gz"
            ],
            "stderr": "/log/stderr8.txt",
            "stdout": "/log/stdout8.txt"
        }
    ]
}

FHIR resource ingest

This example shows how to ingest FHIR resources from a file using a task. In this example there are no executors, because the FHIR resources are taken as-is from the file with no transformation.

{
    "name": "FHIR ingest",
    "datasetId": "643efe57-430f-4b06-b1b0-3e565c62a64c",
    "inputs": [
        {
            "path": "/tmp/fhir.json",
            "url": "https://api.us.lifeomic.com/v1/files/146e0679-0e03-4e05-af46-d930cfaec761",
            "type": "FILE"
        }
    ],
    "outputs": [
        {
            "path": "/tmp/fhir.json",
            "url": "https://api.us.lifeomic.com/v1/projects/643efe57-430f-4b06-b1b0-3e565c62a64c",
            "type": "FHIR"
        }
    ],
    "resources": {
        "cpu_cores": 1,
        "ram_gb": 1
    },
    "executors": []
}

The file of FHIR resources (fhir.json in this example) should be in JSON Lines format (aka newline-delimited JSON). For example:

{"resourceType":"Patient","name":[{"family":"Zieme","given":["Mina"]}],"gender":"female","id":"024f2316-265a-46e8-965a-837e308ae678","birthDate":"1977-06-21"}
{"status":"final","code":{"coding":[{"code":"11142-7","system":"http://loinc.org","display":"Glucose"}]},"resourceType":"Observation","id":"62f3ccbf-c51b-48ed-ad1d-0420ea196af6","subject":{"reference":"Patient/024f2316-265a-46e8-965a-837e308ae678"},"effectiveDateTime":"1999-09-09T23:20:53Z","valueQuantity":{"value":10,"unit":"mg/DL","system":"http://unitsofmeasure.org/","code":"mg/DL"}}
{"status":"final","code":{"coding":[{"code":"11142-7","system":"http://loinc.org","display":"Glucose"}]},"resourceType":"Observation","id":"b452acd6-00c9-4fab-847c-31177e14e412","subject":{"reference":"Patient/024f2316-265a-46e8-965a-837e308ae678"},"effectiveDateTime":"1999-05-05T15:11:18Z","valueQuantity":{"value":0,"unit":"mg/DL","system":"http://unitsofmeasure.org/","code":"mg/DL"}}

FHIR resource listing and cohort creation

This example queries 100,000 FHIR Observations and writes them in JSON Lines format to a file in the task, and then runs a container to compute some statistics and make a cohort out of the outliers.

{
    "name": "FHIR Analytics",
    "datasetId": "cccdf419-ac83-4b7e-aa2d-70702d43297c",
    "inputs": [
        {
            "resourceType": "Observation",
            "limit": 100000,
            "path": "/fhir/Observation.json",
            "type": "FHIR"
        }
    ],
    "outputs": [
        {
            "path": "/output/",
            "url": "https://api.us.lifeomic.com/v1/projects/cccdf419-ac83-4b7e-aa2d-70702d43297c",
            "type": "DIRECTORY"
        },
        {
            "path": "/cohorts/cohort.csv",
            "url": "https://api.us.lifeomic.com/v1/projects/cccdf419-ac83-4b7e-aa2d-70702d43297c",
            "type": "COHORT"
        }
    ],
    "resources": {
        "cpu_cores": 1,
        "ram_gb": 1
    },
    "executors": [
        {
            "image": "aroach/task-sandbox:6",
            "command": ["python", "stats.py"],
            "stderr": "/output/stderr.txt",
            "stdout": "/output/stdout.txt"
        }
    ]
}
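Because the Observations are written one per line (JSON Lines), standard line-oriented tools work on them inside the container. For example, assuming a shell is available in the image, a quick sanity check on the number of records fetched could be:

wc -l /fhir/Observation.json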

Getting updated FHIR resources with a Scheduled Task

This example creates a Scheduled Task that runs weekly and gets only the FHIR Patients that have been updated since the last time it ran. It does this by querying the _lastUpdated field of the FHIR records, using the special variables startTime and lastSuccessfulStartTime to limit the results to records updated since the last successful run, and stopping at the current start time (this is important to prevent overlap on the next run). The curly braces are a special syntax representing a placeholder: the braces, and everything inside them, are replaced with the named variable's value before the input is downloaded to the container.

{
    "name": "Get Updated Patients",
    "datasetId": "8913220a-6e22-4747-9f00-8477c475b1ec",
    "scheduleExpression": "0 0 * * 0",
    "inputs": [
        {
            "path": "/fhir/patients.json",
            "type": "FHIR",
            "resourceType": "Patient",
            "limit": 100000,
            "query": "_lastUpdated=gt{{lastSuccessfulStartTime}}&_lastUpdated=le{{startTime}}"
        }
    ],
    "outputs": [
        {
            "path": "/fhir",
            "url": "https://api.dev.lifeomic.com/v1/projects/8913220a-6e22-4747-9f00-8477c475b1ec",
            "type": "DIRECTORY"
        }
    ],
    "resources": {
        "cpu_cores": 1,
        "ram_gb": 1
    },
    "executors": []
}
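To illustrate the substitution, on a run starting at 2019-07-26T00:00:00Z whose previous successful run started at 2019-07-19T00:00:00Z (illustrative timestamps), the query above would expand to something like:

_lastUpdated=gt2019-07-19T00:00:00Z&_lastUpdated=le2019-07-26T00:00:00Z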

NOTE: The first time a Scheduled Task runs, lastSuccessfulStartTime will be set to the Unix Epoch (Jan. 1st 1970 at midnight, UTC time). See: https://en.wikipedia.org/wiki/Unix_time

Final Note

The task and its execution status can be seen in the PHC UI. Note that the task listing is only retained for a limited period of time.

Last update: February 1, 2020