PHC SDK for Python

Source location - lifeomic/phc-sdk-py

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

(NOTE: All examples use fictitious data or freely available data sets.)

0.17.1 - 2020-09-17

Added

  • Paging requests with all_results=True now automatically retry against the server with an exponentially smaller batch size on error (pow(limit, 0.85)). The SDK can't tell what the error is, but it can retry with a smaller page size.
  • Added page_size to the easy modules for a custom batch size
  • Added max_pages to the easy modules for capping the number of pages returned
  • Added pretty print to FHIR Search Service queries when passing log=True
  • Warn and convert out of range date times (e.g. 0217-01-01) to NaT
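The shrinking-batch retry described in the first entry above can be sketched as follows. This is only an illustration of the pow(limit, 0.85) rule, not the SDK's implementation: fetch_with_shrinking_pages, next_page_size, fetch_page, and max_attempts are hypothetical names, and rounding down with a floor of one record is an assumption.

```python
import math


def next_page_size(limit):
    """Shrink the page size using the rule from the entry above: pow(limit, 0.85)."""
    # Assumption: round down and never drop below one record per page.
    return max(1, math.floor(pow(limit, 0.85)))


def fetch_with_shrinking_pages(fetch_page, limit, max_attempts=5):
    """Call fetch_page(limit), retrying with a smaller limit after each failure.

    fetch_page is a hypothetical callable standing in for one paged request.
    Returns the fetched rows together with the limit that finally worked.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch_page(limit), limit
        except Exception as error:  # the cause is unknown, so retry on any error
            last_error = error
            limit = next_page_size(limit)
    raise last_error
```

Starting from a limit of 1000, successive retries would use 354, then 146, and so on, so a handful of attempts is enough to reach a small page.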

Fixed

  • Properly parse date columns with positive time zones into the local time and time zone
  • Resolved a KeyError issue with coding where the valueCodeableConcept didn't have a system or url
  • Passing patient_id / patient_ids with a FHIR Search Service query that contains a must clause now works as expected

Changed

[BREAKING] The expanded column names have changed to better reflect the location of the value. All systems and URLs are separated by __ and prefixed with either url or system. Here is an example:

input_dict = [
    {
        "url": "http://hl7.org/fhir/StructureDefinition/us-core-race",
        "valueCodeableConcept": {
            "text": "race",
            "coding": [
                {
                    "code": "2106-3",
                    "system": "http://hl7.org/fhir/v3/Race",
                    "display": "white",
                }
            ],
        },
    },
    {
        "url": "http://hl7.org/fhir/StructureDefinition/us-core-ethnicity",
        "valueCodeableConcept": {
            "text": "ethnicity",
            "coding": [
                {
                    "code": "2186-5",
                    "system": "http://hl7.org/fhir/v3/Ethnicity",
                    "display": "not hispanic or latino",
                }
            ],
        },
    },
]

assert generic_codeable_to_dict(input_dict) == {
    "url__hl7.org/fhir/StructureDefinition/us-core-race__valueCodeableConcept_text": "race",
    "url__hl7.org/fhir/StructureDefinition/us-core-race__valueCodeableConcept_coding_system__hl7.org/fhir/v3/Race__code": "2106-3",
    "url__hl7.org/fhir/StructureDefinition/us-core-race__valueCodeableConcept_coding_system__hl7.org/fhir/v3/Race__display": "white",
    "url__hl7.org/fhir/StructureDefinition/us-core-ethnicity__valueCodeableConcept_text": "ethnicity",
    "url__hl7.org/fhir/StructureDefinition/us-core-ethnicity__valueCodeableConcept_coding_system__hl7.org/fhir/v3/Ethnicity__code": "2186-5",
    "url__hl7.org/fhir/StructureDefinition/us-core-ethnicity__valueCodeableConcept_coding_system__hl7.org/fhir/v3/Ethnicity__display": "not hispanic or latino",
}
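To make the naming rule concrete, here is a minimal sketch that reproduces the keys shown above for this kind of input. It is not the SDK's generic_codeable_to_dict: expand_codeable_extensions and strip_scheme are hypothetical helpers, and the sketch assumes every item has a url and a valueCodeableConcept whose codings each carry a system.

```python
def strip_scheme(url):
    """Drop the 'http://' or 'https://' prefix, as the example keys above do."""
    return url.split("://", 1)[-1]


def expand_codeable_extensions(items):
    """Flatten extension items into column names (hypothetical helper).

    URLs and systems are wrapped in '__' and prefixed with 'url' or 'system'.
    """
    columns = {}
    for item in items:
        concept = item["valueCodeableConcept"]
        prefix = "url__" + strip_scheme(item["url"]) + "__valueCodeableConcept_"
        if "text" in concept:
            columns[prefix + "text"] = concept["text"]
        for coding in concept.get("coding", []):
            coding_prefix = (
                prefix + "coding_system__" + strip_scheme(coding["system"]) + "__"
            )
            for key, value in coding.items():
                if key != "system":  # the system is already part of the name
                    columns[coding_prefix + key] = value
    return columns
```

Called with the input_dict above, this sketch yields the same six keys as the assertion, which shows how each key encodes the path from the extension URL down to the coded value.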

0.16.0 - 2020-08-27

Added

  • Added most of the remaining FSS entities:
      • AuditEvent
      • CarePlan
      • DiagnosticReport
      • DocumentReference
      • Encounter
      • ImagingStudy
      • Immunization
      • Media
      • MedicationAdministration
      • MedicationDispense
      • MedicationRequest
      • MedicationStatement
      • Person
      • Practitioner
      • Procedure
      • ProcedureRequest
      • Provenance
      • ReferralRequest
      • Sequence
      • Specimen
  • Add abstract Item class for entities that don't relate to a patient (e.g. Organization and Practitioner)

Changed

All date columns now return two columns--one for the local time (with time zone removed) and one for the time zone offset in hours. Consider the onsetDateTime column from BRCA's Condition table:

   onsetDateTime.tz       onsetDateTime.local
0               0.0 1998-01-01 00:00:00+00:00
1               0.0 2010-01-01 00:00:00+00:00
2               0.0 2008-01-01 00:00:00+00:00
3               0.0 1994-01-01 00:00:00+00:00
4               0.0 2008-01-01 00:00:00+00:00
5               0.0 2012-01-01 00:00:00+00:00
6               0.0 2017-06-27 04:00:00+00:00
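The split into a local time and an hour offset can be illustrated with the standard library alone. This is a sketch of the idea rather than the SDK's parsing code: split_local_and_offset is a hypothetical helper, and it assumes ISO 8601 strings with an explicit numeric offset such as +02:00.

```python
from datetime import datetime


def split_local_and_offset(value):
    """Return (local time without a time zone, offset from UTC in hours).

    Hypothetical helper; the offset is None when the input has no time zone.
    """
    parsed = datetime.fromisoformat(value)
    offset = parsed.utcoffset()
    tz_hours = offset.total_seconds() / 3600 if offset is not None else None
    return parsed.replace(tzinfo=None), tz_hours


local, tz = split_local_and_offset("2017-06-27T04:00:00+02:00")
# local keeps the wall-clock time as recorded; tz holds the offset in hours
```

Keeping the two values apart means the recorded wall-clock time survives even when rows come from different time zones.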

0.15.0 - 2020-08-05

Includes more work on the easy modules (imported via import phc.easy as phc).

Added

  • Added phc.easy.Query.execute_ga4gh that auto-scrolls GA4GH results
  • Added phc.easy.Sequence as another entity module
  • Added generic methods on phc.easy.Query:
      • get_count_by_field
      • get_codes
      • execute_composite_aggregations (used by get_count_by_field and get_codes)
  • Added phc.easy.PatientItem.get_count_by_patient (Observation, Procedure, Specimen, etc.)
# Example: Get number of procedures by patient
phc.Procedure.get_count_by_patient()

#                                      doc_count
# subject.reference
# 518eb55d-adbf-42c3-8aed-68176d0ed4b7        334
# 67233488-ddd6-46e1-88cc-a93140b86c02       2088
# b41f8107-85e1-42c3-b36e-400085799ab5        176
  • Added phc.easy.PatientItem.get_count_by_field (Observation, Procedure, Specimen, etc.)
# Example: Get count of unique procedure display codes
phc.Procedure.get_count_by_field("code.coding.display")

#                      code.coding.display  doc_count
# 0                             lumpectomy        247
# 1            modified radical mastectomy        322
# 2                                  other        272
# 3                      simple mastectomy        200
  • Added phc.easy.PatientItem.get_codes (Observation, Procedure, Specimen, etc.)
# Example: Get observation codes for specific patients
phc.Observation.get_codes(patient_ids=[
    "e296f292-230f-444c-887f-0b213bde90fa",
    "78adf262-c77e-4cb3-8435-034bd9e73b64"
])

#    doc_count            system     code                                     display        field
# 0        1.0  http://loinc.org  21893-3  Regional lymph nodes positive [#] Specimen  code.coding
# 1        2.0  http://loinc.org  21975-8                        Date of Last Contact  code.coding
# 2        1.0  http://loinc.org  21981-6                 Date of Disease Progression  code.coding
# 3        2.0  http://loinc.org  49683-6                    HER2/neu receptor status  code.coding
# 4        2.0  http://loinc.org  63931-0                           Date of Diagnosis  code.coding
# 5        2.0  http://loinc.org  85337-4                    Estrogen Receptor Status  code.coding

Changed

  • Passing log to any PatientItem entities now logs the FSS query being run
  • For aggregations, phc.Query.execute_fhir_dsl now returns a FhirAggregation if an aggregation is specified in the query
  • phc.Query.execute_fhir_dsl_with_options now caches aggregation queries in JSON format
  • Specifying patient_id and/or patient_ids is now properly supported with a custom FHIR query.
# Example: Get observations tagged with loinc for a specific patient
phc.Observation.get_data_frame(patient_id="<id>", query_overrides={
    "where": {
        "type": "elasticsearch",
        "query": {
            "term": {
                "code.coding.system.keyword": "http://loinc.org"
            }
        }
    }
})

Fixed

  • Fix phc.easy.Procedure not inheriting new phc.easy.PatientItem behavior

0.14.1 - 2020-07-15

Fixed

  • Fixed missing trust_env args in created client objects

0.14.0 - 2020-07-14

Added

  • All-new easy module for faster analysis! Simply import phc.easy as phc.
  • Add Auth for shared authentication details (account, project, and token)
  • Add Query for scrolling through FHIR Search Service (FSS) data
  • Add Frame for expanding columns that contain FHIR data and parsing dates
  • Add APICache for auto-caching results from easy modules
  • Add CSVWriter for writing results to disk in batches, so that memory use stays O(1) instead of growing with the result size
  • Includes Project, Patient, Observation, Procedure, Condition, Goal, and Specimen
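The batch-writing idea behind CSVWriter can be sketched with the standard csv module. This is not the SDK's CSVWriter: append_batch is a hypothetical function, and it assumes every batch is a list of dicts sharing the same field names.

```python
import csv
import os


def append_batch(path, rows, fieldnames):
    """Append one batch of dict rows to a CSV file, writing the header once.

    Only the current batch is held in memory, so memory use does not grow
    with the total number of rows written.
    """
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling append_batch once per page of results lets each page be discarded after it is written.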

0.13.0 - 2020-04-17

Added

  • Switched the build over to GitHub Actions

0.12.3 - 2020-04-13

Added

  • Adds dsl and sql methods to phc.services.Fhir

Changed

  • Deprecates execute_sql and execute_es methods in phc.services.Fhir

0.12.2 - 2020-03-25

Fixed

  • Added retries to file download requests

0.12.1 - 2020-03-25

Fixed

  • Fixed retry logic to include OS-level errors.

0.12.0 - 2020-03-23

Added

  • Added retry support for failed API requests.

0.11.0 - 2020-03-17

Added

  • Added the trust_env parameter to all service classes to enable HTTP proxy support.

0.10.0 - 2020-03-10

Added

  • Added execute_sql to phc.services.Analytics.

0.9.2 - 2020-02-19

Added

  • Added scroll support to phc.services.Fhir via the scroll param.

0.9.1 - 2019-12-17

Changed

  • Fixed phc.services.Genomics.Status enum.

0.9.0 - 2019-12-16

Changed

  • Added phc.services.Genomics for performing genomic-related operations.

0.8.1 - 2019-11-27

Changed

  • Analytics.load_data_lake_result_to_dataframe now waits longer for a results file before giving up.

0.8.0 - 2019-11-25

Added

  • Added Analytics.list_data_lake_schemas to fetch the schemas of each data lake table.
  • Added Analytics.get_data_lake_schema to fetch the schema of a single data lake table.
  • Added Analytics.execute_data_lake_query_to_dataframe to execute a data lake query and load the results to a Pandas dataframe.
  • Added Analytics.load_data_lake_result_to_dataframe to load the results of a previously executed data lake query to a Pandas dataframe.
  • Added Files.exists to check if a file exists.

0.7.1 - 2019-11-21

Fixed

  • Fixed Files.download so that it creates target directories if they do not exist.

0.7.0 - 2019-11-20

Added

  • Added optional pandas setup install
  • Added ApiResponse.get_as_dataframe to return a response item as a Pandas DataFrame.

0.6.0 - 2019-11-01

Added

  • Added the phc.services.Files submodule that provides actions for files in PHC projects.
  • Added the phc.services.Cohorts submodule that provides actions for files in PHC cohorts.

Last update: 2020-09-25