Overview

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. The LifeOmic PHC provides an environment for running notebooks.

Requirements

In order to run a notebook on the PHC, you must belong to at least one ENTERPRISE-level account.

Getting Started

To launch a notebook server, click the My Notebooks option in the left side nav menu.

Notebooks Nav Menu

Notebook Runtime Environments

The PHC provides the following notebook runtime environments:

  • Data Science Notebook

    • LifeOmic CLI
    • Pandoc and TeX Live for notebook document conversion
    • git, emacs, jed, nano, tzdata, and unzip
    • The R interpreter and base environment
      • IRKernel to support R code in Jupyter notebooks
      • tidyverse packages, including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, lubridate, and broom from conda-forge
      • plyr, devtools, shiny, rmarkdown, forecast, rsqlite, reshape2, nycflights13, caret, rcurl, and randomforest packages from conda-forge
    • Python interpreter and base environment
      • PHC SDK for Python
      • pandas, numexpr, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh, sqlalchemy, hdf5, vincent, beautifulsoup, protobuf, and xlrd packages
      • ipywidgets and plotly for interactive visualizations in Python notebooks
      • Facets for visualizing machine learning datasets
    • The Julia compiler and base environment
      • IJulia to support Julia code in Jupyter notebooks
      • HDF5, Gadfly, and RDatasets packages
    • The base container powering this Notebook type is: https://hub.docker.com/r/jupyter/datascience-notebook
  • Deep Learning Notebook - Provides access to 1 GPU resource
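
Since the exact package set can change as environments are updated, a quick way to confirm what is installed in a running kernel is to probe for the packages you need. A minimal sketch using only the standard library (the package list below is illustrative; adjust it to what your notebook actually requires):

```python
import importlib.util

# A few of the Python packages listed above; edit this list to match
# the dependencies of your own notebook.
packages = ["pandas", "matplotlib", "scipy", "sklearn", "sympy"]

for name in packages:
    spec = importlib.util.find_spec(name)
    status = "available" if spec is not None else "missing"
    print(f"{name}: {status}")
```
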

After selecting one of the options above, the notebook server is started. Depending on current platform load, this can take some time; Deep Learning notebook servers take longer to start because GPU resources must be provisioned. Once the server starts, the JupyterLab interface is presented and you can begin to use the notebook environment. You get a personal storage workspace for any notebook and data files that you use within JupyterLab, and this workspace persists between notebook sessions. If you leave a notebook session running, it is automatically shut down after 24 hours of idle time.

PHC Integration

When the notebook session is started, your PHC access tokens are injected as environment variables, PHC_ACCESS_TOKEN and PHC_REFRESH_TOKEN. The PHC CLI and Python SDK are both designed to use access tokens when present in these environment variables. This means that you can start to use both within a running notebook without any further authentication required. You can use the CLI and SDK to fetch and store data from the PHC to your personal notebook storage space.
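
The CLI and SDK pick these variables up automatically, but you can also inspect them yourself, for example to confirm they are present before starting a long-running job. A minimal sketch using only the standard library (the helper function name is hypothetical; the variable names come from the text above):

```python
import os

def phc_tokens():
    """Return the PHC access and refresh tokens injected into the
    notebook environment, or None for any that are missing."""
    return (
        os.environ.get("PHC_ACCESS_TOKEN"),
        os.environ.get("PHC_REFRESH_TOKEN"),
    )

access, refresh = phc_tokens()
if access is None:
    print("PHC_ACCESS_TOKEN is not set; are you running inside a PHC notebook?")
else:
    print("PHC access token found; CLI and SDK calls should authenticate automatically.")
```
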

If you wish to share the results of a notebook, you can upload the notebook file back to a PHC Project. From the PHC Files web console, you can click on the notebook file and the PHC will render the notebook. From this view, you can create a shareable link to the notebook file, which you can share with others who have access to the same PHC project.

Notebooks Inline

Running TensorBoard

With the Deep Learning Notebook, you can run TensorBoard. From the JupyterLab interface, bring up the Launcher view with File --> New Launcher. This view shows an option to start a TensorBoard session. Once started, you will see TensorBoard running in a new tab. Log files are configured to be stored in ~/tf-logs.
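
For TensorBoard to show anything, your training code must write its event files under ~/tf-logs. A small sketch that builds and creates a per-run log directory there (the timestamped run-naming scheme is an assumption for illustration, not a PHC convention):

```python
import os
import time

def make_run_dir(base="~/tf-logs"):
    """Create and return a timestamped run directory under the
    TensorBoard log root used by the Deep Learning Notebook."""
    run_dir = os.path.join(
        os.path.expanduser(base),
        time.strftime("run-%Y%m%d-%H%M%S"),
    )
    os.makedirs(run_dir, exist_ok=True)
    return run_dir

log_dir = make_run_dir()
# Pass log_dir to e.g. tf.summary.create_file_writer(log_dir) or a
# Keras TensorBoard callback so the TensorBoard session can find it.
print(log_dir)
```
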

Notebooks TensorBoard

Shutting Down

Notebook servers will automatically be stopped after 24 hours of idle time. Occasionally, updates to the notebook environments are deployed. To pick up these updates or to just shut down your notebook server, do the following:

  1. From the JupyterLab interface, go to File --> Hub Control Panel. This opens a new browser tab.
  2. Click on the Stop My Server button. It will take a few seconds for this operation to complete.
  3. When your server has stopped, the view will refresh and you will see options to start a new notebook server.

Known Issues

  • As noted above, PHC access tokens are stored in environment variables of the running notebook server. These tokens allow you to access the PHC using the CLI or the Python SDK, but they expire after 24 hours. If you keep your notebook server running for more than 24 hours, the tokens may expire and you will see errors when trying to use the CLI or the SDK. To resolve this, follow the instructions in the section above to restart your notebook server; the restarted server will have fresh access tokens. Alternatively, you can use a PHC API Key within your notebook to access PHC resources.
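
If the access token is a JWT (an assumption to verify for your deployment; many OAuth-style access tokens are), you can decode its unverified payload to see when it expires, rather than waiting for a call to fail. A sketch using only the standard library:

```python
import base64
import json
import time

def jwt_expiry(token):
    """Decode the (unverified) payload of a JWT and return its 'exp'
    claim as a Unix timestamp, or None if the claim is absent."""
    payload_b64 = token.split(".")[1]
    # Restore the base64 padding that JWT encoding strips.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")

# Demonstrate with a locally built dummy token (NOT a real PHC token):
claims = {"exp": int(time.time()) + 3600}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
dummy = "header." + payload + ".signature"
print(jwt_expiry(dummy) - int(time.time()), "seconds until expiry")
```
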

Last update: June 26, 2020