Overview
The Workflow Service runs workflows written in the Common Workflow Language (CWL), an open standard for describing analysis workflows and tools. It uses Cromwell as the orchestration layer to manage the steps within a workflow. A custom plugin hands each step execution to the Task Service, which performs it in a highly secure environment. You can start CWL executions through the LifeOmic CLI (command-line interface), the Add Workflow page of the PHC web console, or the PHC SDK for Python.
Before using the CLI or the PHC web console to run workflows, log in to your PHC account at https://apps.us.lifeomic.com/phc/login.
General concept
The Workflow Service requires all CWL resource and dependency files to be stored in the PHC File Service. Once all resources are in place, start a run of the CWL through the CLI or the PHC web console. The Automation page lists all workflows along with their current state, start time, and run time.
From this page, select an individual execution. This view displays a graph of the workflow, generated with the open source Rabix CWL-SVG visualization library. It also lists the individual steps of the workflow as run by the Task Service.
Note: Click the folder icon at the top right of this view to display the workflow files. These include all of the outputs generated by the workflow; each output is saved in a directory named after the CWL step that generated it.
Use the CLI to run a basic workflow
The following steps demonstrate how to run a simple workflow that generates an index for a BAM file. Note that CWL has a fairly broad syntax and this is only a simple example; see Common Workflow Language for more general information. The Workflow Service implements only a subset of the full CWL feature set; see Workflow Service limitations for the current limitations.
To use the PHC web console to run a workflow, see Add a Workflow.
Generate and upload the CWL resources
Here is a sample master CWL
This file describes:
- Two inputs: a BAM file and a filename for the index of the BAM file
- One output: the index file, which is named with the filename provided by the input above
- One step: it gives the step the name index_bam and names the CWL file, bamindex.cwl, that executes the step
Generate this file, then upload it using the CLI, for example: lo files upload ./bam_master.cwl <datasetId>
cwlVersion: v1.0
class: Workflow
inputs:
  bamfile: File
  bamindexfilename: string
outputs:
  bamindexout:
    type: File
    outputSource: index_bam/bamindexout
steps:
  index_bam:
    run: bamindex.cwl
    in:
      bamfile: bamfile
      bamindexfilename: bamindexfilename
    out: [bamindexout]
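The master file wires data through the workflow by name: each entry under a step's `in` refers to a workflow input, and `outputSource` refers to a step output as `<step>/<output>`. The following minimal Python sketch, which is not part of the Workflow Service, mirrors the sample master CWL as a plain dictionary (to avoid a YAML dependency) and checks that every reference resolves. The function name `check_wiring` is hypothetical, and the sketch only handles the simple wiring used in this example.

```python
# The sample master CWL transcribed as a Python dict (hypothetical; a real
# tool would parse bam_master.cwl with a YAML parser instead).
MASTER_CWL = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "inputs": {"bamfile": "File", "bamindexfilename": "string"},
    "outputs": {
        "bamindexout": {"type": "File", "outputSource": "index_bam/bamindexout"},
    },
    "steps": {
        "index_bam": {
            "run": "bamindex.cwl",
            "in": {"bamfile": "bamfile", "bamindexfilename": "bamindexfilename"},
            "out": ["bamindexout"],
        },
    },
}


def check_wiring(workflow: dict) -> list[str]:
    """Return unresolved references; an empty list means the wiring is consistent."""
    problems = []
    inputs = set(workflow["inputs"])
    steps = workflow["steps"]
    for step_name, step in steps.items():
        for param, source in step["in"].items():
            # Simplification: step inputs may only come from workflow inputs here
            # (full CWL also allows "<step>/<output>" sources).
            if source not in inputs:
                problems.append(f"{step_name}.{param}: unknown source {source!r}")
    for out_name, out in workflow["outputs"].items():
        # outputSource has the form "<step>/<output>".
        step_name, _, step_out = out["outputSource"].partition("/")
        if step_name not in steps or step_out not in steps[step_name]["out"]:
            problems.append(f"{out_name}: unknown outputSource {out['outputSource']!r}")
    return problems


print(check_wiring(MASTER_CWL))  # prints []
```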
Here is a sample CWL for the step
This file describes:
- The type of tool used, in this case CommandLineTool
- The Docker container that runs the step
  Note: The Workflow Service requires every step to use a Docker container for execution. This allows the step to run securely within the Task Service.
- Two inputs: the BAM file and the index filename
  Note: In this example, the two inputs are also passed as arguments to the baseCommand; notice the inputBinding and position values.
- One output; in this case, the input filename is reused to name the output file
- The command that the container runs
Generate this file, then upload it using the CLI, for example: lo files upload ./bamindex.cwl <datasetId>
cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: genomicpariscentre/samtools
inputs:
  bamfile:
    type: File
    inputBinding:
      position: 1
  bamindexfilename:
    type: string
    inputBinding:
      position: 2
outputs:
  bamindexout:
    type: File
    outputBinding:
      glob: $(inputs.bamindexfilename)
baseCommand: ['samtools', 'index']
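To see how `inputBinding.position` shapes the command, here is a minimal Python sketch of the ordering rule: bound inputs are appended to `baseCommand` in ascending `position` order. It assumes the BAM file is staged in the container at the hypothetical path `/data/HG00463.bam`; the real staging location is chosen at run time and is not documented here. The helper `build_command` is illustrative only and covers just the features used in this step.

```python
def build_command(base_command, input_bindings, values):
    """Order bound inputs by position and append them to baseCommand.

    input_bindings maps input name -> inputBinding position;
    values maps input name -> the string placed on the command line
    (for a File input, its staged path inside the container).
    """
    ordered = sorted(input_bindings, key=lambda name: input_bindings[name])
    return list(base_command) + [values[name] for name in ordered]


cmd = build_command(
    ["samtools", "index"],
    {"bamfile": 1, "bamindexfilename": 2},
    {
        "bamfile": "/data/HG00463.bam",  # hypothetical staged path
        "bamindexfilename": "HG00463.bam.bai",
    },
)
print(" ".join(cmd))  # prints: samtools index /data/HG00463.bam HG00463.bam.bai
```

Because the output `glob` is `$(inputs.bamindexfilename)`, the index file that samtools writes under that name is the file collected as `bamindexout`.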
Next, a JSON file provides the inputs
This file describes:
- A file input, using the class File and the ID of the file
- The name of the desired output file
{
  "bamfile": {
    "class": "File",
    "fileId": "805209e1-35cb-49f3-a5cc-327a93d1f72d"
  },
  "bamindexfilename": "HG00463.bam.bai"
}
Generate this file, then upload it using the CLI, for example: lo files upload ./bam_inputs.json <datasetId>
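If you prepare inputs for many samples, generating the JSON can be less error-prone than editing it by hand. This minimal Python sketch writes the same structure shown above with the standard `json` module. Note the `fileId` key, which this guide uses to reference a PHC File Service file in place of the `path` or `location` fields of standard CWL. The function name `make_inputs` and the convention of appending `.bai` to the BAM name are assumptions for illustration.

```python
import json


def make_inputs(bam_file_id: str, bam_name: str) -> dict:
    """Build the inputs document for the BAM indexing workflow."""
    return {
        # A File input is referenced by its File Service ID.
        "bamfile": {"class": "File", "fileId": bam_file_id},
        # Name the index after the BAM file, e.g. HG00463.bam -> HG00463.bam.bai
        "bamindexfilename": f"{bam_name}.bai",
    }


inputs = make_inputs("805209e1-35cb-49f3-a5cc-327a93d1f72d", "HG00463.bam")
with open("bam_inputs.json", "w") as fh:
    json.dump(inputs, fh, indent=2)
```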
Finally, run the workflow using the CLI
lo workflows create <datasetId> -n "BAM Indexing" -w <masterCwlFileId> -f <inputsFileId> -d <cwlDependenciesFileId>
Here, <masterCwlFileId>, <inputsFileId>, and <cwlDependenciesFileId> are the File Service IDs of bam_master.cwl, bam_inputs.json, and bamindex.cwl, respectively.
The workflow is now running. Open the Automation page to see the list of workflows, and select the one you started to view its details.
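For scripted runs, you might assemble this command programmatically. The following minimal Python sketch builds the argument list for `lo workflows create` using only the flags shown above (`-n`, `-w`, `-f`, `-d`), assuming `-d` takes the file ID of the step CWL. It prints the command instead of running it, and the IDs are placeholders; the function name `workflow_create_argv` is hypothetical.

```python
import shlex


def workflow_create_argv(dataset_id, name, master_cwl_id, inputs_id, dependency_id):
    """Return argv for `lo workflows create` using the flags from this guide."""
    return [
        "lo", "workflows", "create", dataset_id,
        "-n", name,            # display name of the run
        "-w", master_cwl_id,   # file ID of the master CWL (bam_master.cwl)
        "-f", inputs_id,       # file ID of the inputs JSON (bam_inputs.json)
        "-d", dependency_id,   # file ID of the step CWL (bamindex.cwl), assumed
    ]


argv = workflow_create_argv(
    "<datasetId>", "BAM Indexing", "<masterCwlFileId>", "<inputsFileId>", "<cwlDependenciesFileId>"
)
print(shlex.join(argv))
```

Once the LifeOmic CLI is installed and you are logged in, the same list can be passed to `subprocess.run(argv, check=True)` with real IDs.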
Using a non-public image
The Task Service can use non-public Docker images; see Using a non-public image. To use one in the Workflow Service:
In the required DockerRequirement, prefix the name of the private image with lifeomic_private/. This tells the Workflow Service to handle the image as non-public. Then add a file input to the master and step CWL files, treating it like any other file input.
requirements:
  DockerRequirement:
    dockerPull: lifeomic_private/my_private_image
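The naming convention can be expressed as a small check: a `dockerPull` value that starts with `lifeomic_private/` is treated as non-public, and the remainder is the image name. This minimal Python sketch mirrors only the rule described above, not Workflow Service internals; the helper name `classify_image` is hypothetical.

```python
PRIVATE_PREFIX = "lifeomic_private/"


def classify_image(docker_pull: str) -> tuple[bool, str]:
    """Return (is_private, image_name) for a dockerPull value."""
    if docker_pull.startswith(PRIVATE_PREFIX):
        # Strip the marker prefix to recover the private image name.
        return True, docker_pull[len(PRIVATE_PREFIX):]
    return False, docker_pull


print(classify_image("lifeomic_private/my_private_image"))  # (True, 'my_private_image')
print(classify_image("genomicpariscentre/samtools"))        # (False, 'genomicpariscentre/samtools')
```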
Glob Pattern Handling
The supported syntax for glob patterns in output files has a limitation when a pattern spans multiple unknown directories. The following examples explain this limitation in detail.
- The pattern /tmp/**/*.txt looks for a *.txt file within any single subdirectory; this is the best-supported case.
  - For example, the pattern /tmp/**/*.txt finds output.txt at location /tmp/foo/output.txt.
- The pattern /tmp/**/*.txt should also find *.txt files nested under multiple subdirectories, but it is currently limited to one subdirectory.
  - For example, the pattern /tmp/**/*.txt does not find output.txt at location /tmp/foo/bar/output.txt.
- If the number of subdirectories is known, you can work around this limitation by including /** for each directory.
  - For example, the pattern /tmp/**/**/*.txt finds output.txt at location /tmp/foo/bar/output.txt.
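To make the current behavior concrete, here is a minimal Python model of the rule described above, in which each `**` matches exactly one directory level rather than any number of levels. It is an approximation written for this guide, not the service's implementation; it compares the pattern and the path segment by segment using the standard `fnmatch` module. The function name `service_glob_match` is hypothetical.

```python
from fnmatch import fnmatchcase


def service_glob_match(pattern: str, path: str) -> bool:
    """Model of the current limitation: '**' matches exactly one directory."""
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    # Each pattern segment consumes exactly one path segment, so depths must agree.
    if len(pattern_parts) != len(path_parts):
        return False
    for pat, part in zip(pattern_parts, path_parts):
        # Treat '**' like '*': any single directory name.
        if pat == "**":
            pat = "*"
        if not fnmatchcase(part, pat):
            return False
    return True


print(service_glob_match("/tmp/**/*.txt", "/tmp/foo/output.txt"))         # True
print(service_glob_match("/tmp/**/*.txt", "/tmp/foo/bar/output.txt"))     # False
print(service_glob_match("/tmp/**/**/*.txt", "/tmp/foo/bar/output.txt"))  # True
```

In this model, each additional `/**` adds exactly one directory level, so a pattern matches only files at the depth it encodes.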
Workflow Service limitations
The full CWL syntax is not currently supported. Some CWL requirements are mandatory, such as DockerRequirement, while others will most likely not be supported due to security concerns, such as InlineJavascriptRequirement. As support for other requirements is added, they will be listed here. Because the File Service handles files explicitly, CWL secondary files are also not supported: each file must be listed explicitly as a file input, with its ID provided in the inputs file.
Supported Requirements
- DockerRequirement (also a required value)
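Before uploading, you may want to catch unsupported constructs locally. This is a minimal Python sketch of such a pre-flight check, assuming the CWL document has already been parsed into a dictionary (for example with a YAML parser); the function names and messages are hypothetical. It applies the rules listed in this section: a CommandLineTool must declare DockerRequirement, no other requirement is supported, and `secondaryFiles` must not appear. As a conservative assumption, it checks `hints` the same way as `requirements`, because the examples in this guide place DockerRequirement under both.

```python
SUPPORTED_REQUIREMENTS = {"DockerRequirement"}


def requirement_classes(section) -> set[str]:
    """Collect requirement class names from the map form or the list-of-dicts form."""
    if isinstance(section, dict):
        return set(section)
    if isinstance(section, list):
        return {item["class"] for item in section}
    return set()


def contains_key(node, key: str) -> bool:
    """Recursively look for a key anywhere in a nested dict/list structure."""
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(v, key) for v in node)
    return False


def preflight(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means no known limitation is hit."""
    problems = []
    declared = requirement_classes(doc.get("requirements")) | requirement_classes(doc.get("hints"))
    if doc.get("class") == "CommandLineTool" and "DockerRequirement" not in declared:
        problems.append("CommandLineTool steps must declare DockerRequirement")
    for name in sorted(declared - SUPPORTED_REQUIREMENTS):
        problems.append(f"unsupported requirement: {name}")
    if contains_key(doc, "secondaryFiles"):
        problems.append("secondaryFiles are not supported; list each file as an input")
    return problems


step = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "hints": {"DockerRequirement": {"dockerPull": "genomicpariscentre/samtools"}},
    "baseCommand": ["samtools", "index"],
}
print(preflight(step))  # []
```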
Reference
- BAM - https://en.wikipedia.org/wiki/Binary_Alignment_Map
- Common Workflow Language - https://www.commonwl.org/
- Cromwell - https://cromwell.readthedocs.io/en/stable/
- Docker Overview - https://docs.docker.com/engine/docker-overview/
- Rabix CWL-SVG - rabix/cwl-svg
- Registry of Docker based tools and workflows defined in CWL or WDL for the sciences - https://dockstore.org