
Overview

The workflow service runs workflows written in Common Workflow Language (CWL), an open standard for describing analysis workflows and tools. It uses Cromwell as the orchestration layer to manage the steps within a workflow. A custom plugin hands each step execution to the Task Service, which runs it in a highly secure environment. CWL executions may be initiated either through the LifeOmic CLI (Command Line Interface) or through the Add Workflow page.

Users need an account and must be able to log in at https://apps.us.lifeomic.com/phc/login before using the CLI or the UI to run workflows.

General concept

The workflow service requires that all CWL resource and dependency files exist within the PHC File Service. Once all resources are in place, a run of the CWL may be executed through the CLI or UI. The Automation page lists all workflows and their current states, start times, and run times.

Automation Landing Page

From this page, an individual execution may be selected. The resulting view displays a graph of the workflow, rendered with the open source Rabix CWL-SVG library, and lists the individual steps of the workflow as run by the Task Service.

View Workflow

NOTE: The folder icon at the top right of this view will open up a new view that will show you the original CWL files, your inputs saved as a JSON file, and all outputs generated by the workflow, each in a directory named the same as the CWL step that generated them.

View Workflow File Outputs

Running a basic workflow

The following steps demonstrate how to run a simple workflow that generates an index for a BAM file using the CLI. For a demonstration of running a workflow through the UI, see Add Workflow. CWL has a fairly broad syntax, and the example below is deliberately simple; see Common Workflow Language for more general information. The workflow service implements only a subset of the full CWL feature set; see Workflow Service limitations for the current limitations.

Generate and upload the CWL resources

Here is a sample master CWL

This file describes:

  • Two inputs, a bamfile and a filename for the index of the BAM file

  • One output, the index file that will have the name provided by the input above

  • One step, named index_bam, together with the CWL file bamindex.cwl that executes it

Generate this file, then upload it using the CLI, e.g. lo files upload ./bam_master.cwl <datasetId>

cwlVersion: v1.0
class: Workflow

inputs:
    bamfile: File
    bamindexfilename: string

outputs:
    bamindexout:
        type: File
        outputSource: index_bam/bamindexout

steps:
    index_bam:
        run: bamindex.cwl
        in:
            bamfile: bamfile
            bamindexfilename: bamindexfilename
        out: [bamindexout]

Here is a sample CWL file for the step.

This file describes:

  • The type of tool used, in this case CommandLineTool

  • The Docker container that will run the step. NOTE: The workflow service requires that all steps use a Docker container for execution, which allows secure execution within the Task Service

  • Two inputs, the BAM file and the index filename

NOTE: In this example, the two inputs are also passed as arguments to the baseCommand; notice the inputBinding and position values.

  • One output; in this case the bamindexfilename input is reused to name the output file

  • The command that the container runs

Generate this file, then upload it using the CLI, e.g. lo files upload ./bamindex.cwl <datasetId>

cwlVersion: v1.0
class: CommandLineTool
hints:
    DockerRequirement:
        dockerPull: genomicpariscentre/samtools

inputs:
    bamfile:
        type: File
        inputBinding:
            position: 1
    bamindexfilename:
        type: string
        inputBinding:
            position: 2
outputs:
    bamindexout:
        type: File
        outputBinding:
            glob: $(inputs.bamindexfilename)

baseCommand: ['samtools', 'index']

Next, a JSON file provides the inputs.

This file describes:

  • A file input, using the class File and the id of the file

  • The name of the desired output file

{
    "bamfile": {
        "class": "File",
        "fileId": "805209e1-35cb-49f3-a5cc-327a93d1f72d"
    },
    "bamindexfilename": "HG00463.bam.bai"
}

Generate this file, then upload it using the CLI, e.g. lo files upload ./bam_inputs.json <datasetId>
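When the file id and index name change from run to run, the inputs document can be generated rather than edited by hand. The following is a minimal Python sketch that reproduces the JSON shown above; the function name build_inputs is hypothetical and is not part of the LifeOmic CLI.

```python
import json


def build_inputs(bam_file_id: str, index_name: str) -> dict:
    """Return the inputs document for the BAM indexing example."""
    return {
        # File inputs are identified by class "File" and the id of the file.
        "bamfile": {"class": "File", "fileId": bam_file_id},
        # Plain string input: the name of the desired output file.
        "bamindexfilename": index_name,
    }


if __name__ == "__main__":
    inputs = build_inputs(
        "805209e1-35cb-49f3-a5cc-327a93d1f72d", "HG00463.bam.bai"
    )
    # Redirect this output to bam_inputs.json before uploading it.
    print(json.dumps(inputs, indent=4))
```

The keys of the document must match the input names declared in the master CWL file (bamfile and bamindexfilename in this example).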

Finally, we are ready to run the workflow using the CLI:

lo workflows create <datasetId> -n "BAM Indexing" -w <masterCwlFileId> -f <inputsFileId> -d <cwlDependenciesFileId>
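For scripted runs, the same command can be assembled programmatically. The sketch below only builds the argument list from the flags shown above (-n, -w, -f, -d); the helper name workflow_create_command is hypothetical, and passing a single dependency file id is an assumption based on the command as written.

```python
import shlex


def workflow_create_command(dataset_id, name, master_cwl_id, inputs_id, dependency_id):
    """Build the argument list for `lo workflows create` as shown above."""
    return [
        "lo", "workflows", "create", dataset_id,
        "-n", name,            # display name of the workflow
        "-w", master_cwl_id,   # file id of the master CWL
        "-f", inputs_id,       # file id of the inputs JSON
        "-d", dependency_id,   # file id of the step CWL dependency
    ]


if __name__ == "__main__":
    # Placeholder ids; substitute the ids returned by `lo files upload`.
    command = workflow_create_command(
        "<datasetId>", "BAM Indexing", "<masterCwlFileId>",
        "<inputsFileId>", "<cwlDependenciesFileId>",
    )
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    print(shlex.join(command))
```

Keeping the command as a list avoids quoting mistakes if it is later handed to subprocess.run.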

And that's it, the workflow is running. You can go to the Automation View to see the list of workflows and select the one you've started to look at in detail.

Using a non-public image

The Task Service can use non-public Docker images; see Using a non public image. To use one in the workflow service:

In the required DockerRequirement, prefix the name of the private container with lifeomic_private/. This tells the workflow service to handle the image as a non-public image. Then add a file input type to the CWL master and step files, treating it as any other file input.

requirements:
    DockerRequirement:
        dockerPull: lifeomic_private/my_private_image

Glob Pattern Handling

The supported glob syntax for output files has a limitation when the pattern includes multiple unknown directories. The following examples explain this limitation in detail.

  • The pattern /tmp/**/*.txt will look for a *.txt file within any one sub directory; this is the best-supported use case.
    • For example, pattern /tmp/**/*.txt would find output.txt given location /tmp/foo/output.txt
  • Ideally, the pattern /tmp/**/*.txt would also find *.txt files nested under multiple sub directories, but matching is currently limited to one sub directory.
    • For example, pattern /tmp/**/*.txt would not find output.txt given location /tmp/foo/bar/output.txt
  • If the number of sub directories is known, the limitation can be worked around by including one /** for each directory.
    • For example, pattern /tmp/**/**/*.txt will find output.txt given location /tmp/foo/bar/output.txt
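The behavior described in the examples above can be modeled in a few lines of Python, which is handy for checking a pattern before submitting a workflow. This is only a sketch of the documented behavior, not the service's actual matcher: it assumes that each ** stands for exactly one directory level and that * never crosses a / boundary. The function name matches_output is hypothetical.

```python
import re


def matches_output(pattern: str, path: str) -> bool:
    """Model the documented glob limitation: each ** is one directory level."""
    parts = []
    for segment in pattern.split("/"):
        if segment == "**":
            # Assumption: ** matches exactly one sub directory name.
            parts.append("[^/]+")
        else:
            # Escape literal text; * matches any run of non-slash characters.
            parts.append("[^/]*".join(re.escape(p) for p in segment.split("*")))
    return re.fullmatch("/".join(parts), path) is not None


if __name__ == "__main__":
    for pattern, path in [
        ("/tmp/**/*.txt", "/tmp/foo/output.txt"),
        ("/tmp/**/*.txt", "/tmp/foo/bar/output.txt"),
        ("/tmp/**/**/*.txt", "/tmp/foo/bar/output.txt"),
    ]:
        print(pattern, path, matches_output(pattern, path))
```

Under these assumptions the three cases agree with the list above: one /** finds a file one directory down, misses a file two directories down, and a second /** recovers it.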

Workflow Service limitations

The full CWL syntax is not currently supported. Some CWL requirements are mandatory, e.g. DockerRequirement, while others most likely will not be supported due to security concerns, e.g. InlineJavascriptRequirement. As support for other requirements is added, they will be listed here. Because the file service handles files explicitly, CWL secondary files are also not supported: each file must be listed explicitly as a file input, with its id provided in the inputs.

Supported Requirements

  • DockerRequirement (also a required value)
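These limitations can be checked before uploading a step file. The sketch below works on a CWL document that has already been parsed into a Python dict (for example with a YAML parser) and reports a missing DockerRequirement, entries under requirements other than DockerRequirement, and any use of secondaryFiles. The names check_tool and SUPPORTED_REQUIREMENTS are hypothetical, and treating only the requirements section (not hints) as a source of unsupported entries is an assumption, not documented service behavior.

```python
# Assumption: mirrors the "Supported Requirements" list above.
SUPPORTED_REQUIREMENTS = {"DockerRequirement"}


def requirement_classes(section):
    """Return the class names found in a CWL hints/requirements section.

    CWL allows a map keyed by class name or a list of {"class": ...}
    entries, so both shapes are handled.
    """
    if isinstance(section, dict):
        return set(section)
    if isinstance(section, list):
        return {e["class"] for e in section if isinstance(e, dict) and "class" in e}
    return set()


def check_tool(tool):
    """Return a list of problems found; an empty list means none."""
    problems = []
    hints = requirement_classes(tool.get("hints"))
    requirements = requirement_classes(tool.get("requirements"))

    if "DockerRequirement" not in hints | requirements:
        problems.append("missing DockerRequirement")

    for name in sorted(requirements - SUPPORTED_REQUIREMENTS):
        problems.append(f"unsupported requirement: {name}")

    for section in ("inputs", "outputs"):
        params = tool.get(section)
        if not isinstance(params, dict):
            continue  # list-form parameters are not inspected in this sketch
        for param_id, spec in params.items():
            if isinstance(spec, dict) and "secondaryFiles" in spec:
                problems.append(f"{section}.{param_id} uses secondaryFiles")
    return problems
```

For the bamindex.cwl example earlier on this page, which declares DockerRequirement under hints and no secondary files, the check returns an empty list.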

Reference


Last update: June 19, 2020