PHC SDK for Python¶
Source location - lifeomic/phc-sdk-py
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
(NOTE: All examples use fictitious data or freely available data sets.)
0.23.2 - 2021-09-27¶
Fixed¶
- Terms and any list parameters (e.g. ids/patient_ids) with more than 30K values get auto-chunked into multiple queries (No need to iterate!)
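The auto-chunking described above can be pictured with a minimal sketch. It is not the SDK's implementation: the helper name chunked and the default of 30,000 for max_terms are assumptions made for illustration, and each yielded chunk stands in for one query.

```python
from typing import Iterator, List, Sequence


def chunked(values: Sequence[str], max_terms: int = 30_000) -> Iterator[List[str]]:
    """Yield successive slices of `values`, each holding at most `max_terms` items."""
    if max_terms < 1:
        raise ValueError("max_terms must be at least 1")
    for start in range(0, len(values), max_terms):
        yield list(values[start:start + max_terms])


# Hypothetical usage: 70,000 ids would be spread across three queries.
patient_ids = [f"id-{n}" for n in range(70_000)]
batch_sizes = [len(batch) for batch in chunked(patient_ids)]
print(batch_sizes)  # [30000, 30000, 10000]
```

The caller still passes one list; the splitting and the merging of results would happen behind the call.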
Added¶
- Added support for single/multiple values with the term or terms (New!) parameters
- Added configurable term limit for sending multiple queries (param max_terms)
- Added phc.DataLake for sending SQL queries to the data lake
Example
phc.DataLake.get_data_frame(
"SELECT * FROM my_table",
extension="parquet",
transform=lambda df: df.drop(["id"], axis=1)
)
# => Loading from "~/Downloads/phc/api-cache/data_lake_my_table_f84bab09.parquet"
0.23.1 - 2021-07-26¶
Fixed¶
- Fix import error because of missing __init__ file for summary API folder
0.23.0 - 2021-07-23¶
Fixed¶
- phc.Project now operates properly when one of the accounts only has limited access
- phc.Observation, phc.Condition, and phc.Procedures's get_codes method now uses the new summary APIs to return better results.
Added¶
- Summary APIs (PR #150)
  - phc.SummaryClinicalCounts - Retrieve all clinical counts (across tables like observation, conditions, procedures, and medications)
  - phc.SummaryOmicsCounts - Retrieve summaries across genomic data (counts of clinvar_significance, gene_variant, sequence, test, etc)
  - phc.SummaryCounts - Retrieve all summaries (across omics and clinical)
  - phc.SummaryItemCounts - Retrieve counts for a specific table (e.g. condition, procedure)
Example
phc.SummaryClinicalCounts.get_data_frame(match="fuzzy", system=["snomed.info", "loinc.org"])
# summary code display patient_count system count media_type media_type_count
# 0 procedure 406505007 modified radical mastectomy 322.0 http://snomed.info/sct 322.0 NaN NaN
# 1 procedure 392090004 other 272.0 http://snomed.info/sct 272.0 NaN NaN
# 2 observation 21975-8 Date of Last Contact 1094.0 http://loinc.org 1094.0 NaN NaN
# 3 medication 387420009 cytoxan 514.0 http://snomed.info/sct 523.0 NaN NaN
# 4 medication 372817009 doxorubicin+cyclophosphamid 364.0 http://snomed.info/sct 371.0 NaN NaN
# 5 condition 254837009 None 1086.0 http://snomed.info/sct 1086.0 NaN NaN
# 6 condition 82711006 Infiltrating duct carcinoma, NOS 778.0 http://snomed.info/sct 778.0 NaN NaN
0.22.2 - 2021-04-30¶
Fixed¶
- Make token optional when using a custom adapter that doesn't support refreshing the token
0.22.1 - 2021-04-27¶
Added¶
- Added ability to use custom adapter for sending/receiving underlying data (e.g. for tests)
0.22.0 - 2021-03-24¶
Fixed¶
- Bug that caused error related to printing progress
Added¶
Added lots of Ocr functionality and a Composition module in the easy namespace.
- phc.Ocr.Config - Create and update PrecisionOCR config within a project
- phc.Ocr.Document - Retrieve PrecisionOCR documents
- phc.Ocr.DocumentComposition - Retrieve metadata by page for PrecisionOCR documents
- phc.Ocr.Block - Retrieve the text and layout metadata from a PrecisionOCR document
- phc.Ocr.Suggestion - Retrieve all permutations of PrecisionOCR medical suggestions
- phc.Composition - Base FHIR class for retrieving Composition resources from the FHIR Search Service
Added the ability to create, read, update, and delete using the FHIR DSTU3 API by appending .DSTU3 to any easy module that supports it.
phc.Patient.DSTU3.create(...)
phc.Patient.DSTU3.get(...)
phc.Patient.DSTU3.update(...)
phc.Patient.DSTU3.put(...)
phc.Patient.DSTU3.delete(...)
0.21.1 - 2020-12-17¶
Fixed¶
- Updated API calls to the data-lake to use correct endpoints.
0.21.0 - 2020-12-10¶
Added¶
- Tools - A service to manage resources in the tool registry service
  - tools.create - Adds a tool to the registry
  - tools.download - Downloads a tool
  - tools.get - Gets the default version or a specific version of a tool
  - tools.add_version - Adds a version to an existing tool
  - tools.delete - Deletes the tool or a specific version of a tool
  - tools.get_list - Returns tools from the registry and allows for optional filters
- Workflows - A service to manage workflows
  - workflows.run - Runs a workflow using a provided tool from the registry
  - workflows.get - Gets a workflow run
  - workflows.get_list - Returns all workflows for a project
  - workflows.describe - Returns a list of the inputs and types the workflow requires to run a tool
- Added filtering by id for all phc.easy modules
# By single ID
phc.Patient.get_data_frame(id="<value>")
# Or by multiple IDs
phc.Observation.get_data_frame(ids=["<value1>", "<value2>"])
- Added getting all pages of results for phc.Project
Fixed¶
- Genomics modules (phc.easy) now handle an out of range date via a warning (and auto-conversion to NaT)
- Setting and retrieving projects now works properly again (Previously, projects were inaccurate or sometimes missing.)
0.20.0 - 2020-11-19¶
Added¶
- Auto-retrieve GenomicTests for each type of variant (short, copy number, structural, and expression) if no variant_set_ids are passed
# Specify the specific sets within a test
phc.GenomicShortVariant.get_data_frame(variant_set_ids)
# ...or have it auto-fetch the relevant tests (uses a sample if executed with
# no arguments)
phc.GenomicShortVariant.get_data_frame()
- Added GenomicExpression
phc.GenomicExpression.get_data_frame(
expression=">=4000",
gene=["B2M", "MIR663B", "MT-CYB"],
order_by="expression:desc",
in_ckb=True,
all_results=True,
log=True
)
- Added GenomicCopyNumberVariant
phc.GenomicCopyNumberVariant.get_data_frame(
effect=[phc.Option.CopyNumberStatus.AMPLIFICATION],
in_ckb=True
)
- Added GenomicStructuralVariant
phc.GenomicStructuralVariant.get_data_frame(
patient_id="2c8660b4-1e63-403e-b52b-55c290072a66",
effect=[phc.Option.StructuralType.TRANSLOCATION],
gene=["TNRC6B", "CTD-2616J11.4"],
max_pages=2,
page_size=100
)
- Added Gene and GeneClass from the knowledge APIs
phc.Gene.get_data_frame()
phc.GeneSet.get_data_frame()
- Added abstract class GenomicVariant from which these specific classes inherit
- Added a whole host of options for these variant/expression classes
  - phc.Option.Chromosome
  - phc.Option.ClinVarReview
  - phc.Option.ClinvarSignificance
  - phc.Option.CodingEffect
  - phc.Option.Common
  - phc.Option.CopyNumberStatus
  - phc.Option.GeneClass
  - phc.Option.Zygosity
- Added run-time validation of variant/expression options using these classes
  - phc.easy.omics.option.genomic_copy_number_variant.GenomicCopyNumberVariant
  - phc.easy.omics.option.genomic_expression.GenomicExpression
  - phc.easy.omics.option.genomic_short_variant.GenomicShortVariant
  - phc.easy.omics.option.genomic_structural_variant.GenomicStructuralVariant
  - phc.easy.omics.option.genomic_test.GenomicTest
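Run-time validation against enumerated options can be sketched roughly as follows. This is a generic illustration, not the SDK's code: the Zygosity class here is a stand-in whose member values are assumed, and validate_options is a hypothetical helper.

```python
from enum import Enum
from typing import Iterable, List, Type


class Zygosity(Enum):
    """Illustrative stand-in for an option class; the member values are assumed."""
    HETEROZYGOUS = "heterozygous"
    HOMOZYGOUS = "homozygous"


def validate_options(values: Iterable[str], option: Type[Enum]) -> List[str]:
    """Return the values as a list if every one is a member value, else raise ValueError."""
    allowed = {member.value for member in option}
    checked = list(values)
    invalid = [value for value in checked if value not in allowed]
    if invalid:
        raise ValueError(f"Invalid {option.__name__} value(s): {invalid}")
    return checked


checked = validate_options([Zygosity.HETEROZYGOUS.value], Zygosity)

try:
    validate_options(["hetero"], Zygosity)
except ValueError as error:
    message = str(error)  # names the rejected value before any request is sent
```

Failing early like this surfaces a misspelled option on the client instead of as an opaque server error.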
Changed¶
- Updated options for GenomicShortVariant
phc.GenomicShortVariant.get_data_frame(
patient_id="2c8660b4-1e63-403e-b52b-55c290072a66",
chromosome=[phc.Option.Chromosome.CHR_19],
gene_class=[phc.Option.GeneClass.PROTEIN_CODING],
zygosity=[phc.Option.Zygosity.HETEROZYGOUS],
rs_id=["rs11324363", "rs36247", "rs77134098"],
min_allele_frequency="0.2-1",
log=True,
all_results=True
)
0.19.0 - 2020-10-23¶
Added¶
- Added GenomicTest and GenomicShortVariant
# Get genomic tests and the associated sets
set_ids = phc.GenomicTest.get_data_frame(
patient_id="8cb82aa0-7f2c-4fdb-bf91-0ed1b315392c",
status="ACTIVE",
test_type="shortVariant",
all_results=True,
).id.values.tolist()
# Pull first 1000 short variants on chr1
phc.GenomicShortVariant.get_data_frame(
variant_set_ids=set_ids,
chromosome=["chr1"],
page_size=1000,
log=True
)
- Added filtering by exact code, system, and/or display
phc.Condition.get_data_frame(code=["25910003", "30156004"], system="http://snomed.info/sct")
Fixed¶
- Sped up finding projects - phc.Project.set_current()
Changed¶
- Overhauled get_codes to make results more accurate and allow searching by display (See #93 for full discussion)
# 1. Get display values and number of records they occur in
phc.Observation.get_codes()
# => doc_count display field
# => 0 1094.0 Date of Last Contact code.coding
# => ...
# 2. Get full code values that match this text (case-insensitive)
phc.Observation.get_codes(display_query="date of")
# ...
# Retrieved 2332/2394 results
#
# => field system code display doc_count
# => 0 code.coding http://loinc.org 63931-0 Date of Diagnosis 1094.0
# => 1 code.coding http://loinc.org 21975-8 Date of Last Contact 1094.0
# => 2 code.coding http://loinc.org 21981-6 Date of Disease Progression 144.0
# 3. Get full code values but restrict number of records to find the associated system and code
phc.Observation.get_codes("status", sample_size=10)
# ...
# Retrieved 10/3017 results
# Records with missing system/code values were not retrieved.
# => field code system display doc_count
# => 0 code.coding 85337-4 http://loinc.org Estrogen Receptor Status 1048.0
# => 1 code.coding 85339-0 http://loinc.org Progesterone Receptor Status 1047.0
# => 2 NaN NaN NaN HER2/neu receptor status 919.0
# => 3 NaN NaN NaN TMB Status 3.0
0.18.1 - 2020-10-09¶
Fixed¶
- Fixed Genomics.update_set use of readgroupsets API
0.18.0 - 2020-10-09¶
Added¶
- Added Genomics.update_set method for updating genomic sets
0.17.1 - 2020-09-17¶
Added¶
- Paging requests with all_results=True now automatically retry against the server with an exponentially smaller batch size on error (pow(limit, 0.85)). We can't tell what the error is, but we can retry with a smaller page size.
- Added page_size to the easy modules for a custom batch size
- Added max_pages to the easy modules for capping the number of pages returned
- Added pretty print to FHIR Search Service queries when passing log=True
- Warn and convert out of range date times (e.g. 0217-01-01) to NaT
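To get a feel for how quickly pow(limit, 0.85) shrinks the page size, here is a small sketch. Only the formula comes from the entry above; the function name, the rounding down, and the floor of 1 are assumptions of this sketch.

```python
import math
from typing import List


def shrinking_page_sizes(limit: int, attempts: int, exponent: float = 0.85) -> List[int]:
    """Return the page size used on each attempt, applying pow(size, exponent) after each failure."""
    sizes = []
    current = limit
    for _ in range(attempts):
        sizes.append(current)
        # Rounding down and never going below 1 are choices made for this sketch.
        current = max(1, math.floor(pow(current, exponent)))
    return sizes


sizes = shrinking_page_sizes(limit=1000, attempts=4)
print(sizes)  # starts at 1000, then 354, and keeps shrinking
```

Because the exponent is applied to the current size, each retry cuts large pages sharply while small pages shrink more gently.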
Fixed¶
- Properly parse date columns with positive time zones into the local time and time zone
- Resolved a KeyError issue with coding where the valueCodeableConcept didn't have a system or url
- Passing patient_id/patient_ids with a must FHIR Search Service query now works as expected
Changed¶
[BREAKING] The expanded columns have changed to better reflect the location of the value. All systems and URLs are separated by __ and prefixed with either url or system. Here is an example:
input_dict = [
{
"url": "http://hl7.org/fhir/StructureDefinition/us-core-race",
"valueCodeableConcept": {
"text": "race",
"coding": [
{
"code": "2106-3",
"system": "http://hl7.org/fhir/v3/Race",
"display": "white",
}
],
},
},
{
"url": "http://hl7.org/fhir/StructureDefinition/us-core-ethnicity",
"valueCodeableConcept": {
"text": "ethnicity",
"coding": [
{
"code": "2186-5",
"system": "http://hl7.org/fhir/v3/Ethnicity",
"display": "not hispanic or latino",
}
],
},
},
]
assert generic_codeable_to_dict(input_dict) == {
"url__hl7.org/fhir/StructureDefinition/us-core-race__valueCodeableConcept_text": "race",
"url__hl7.org/fhir/StructureDefinition/us-core-race__valueCodeableConcept_coding_system__hl7.org/fhir/v3/Race__code": "2106-3",
"url__hl7.org/fhir/StructureDefinition/us-core-race__valueCodeableConcept_coding_system__hl7.org/fhir/v3/Race__display": "white",
"url__hl7.org/fhir/StructureDefinition/us-core-ethnicity__valueCodeableConcept_text": "ethnicity",
"url__hl7.org/fhir/StructureDefinition/us-core-ethnicity__valueCodeableConcept_coding_system__hl7.org/fhir/v3/Ethnicity__code": "2186-5",
"url__hl7.org/fhir/StructureDefinition/us-core-ethnicity__valueCodeableConcept_coding_system__hl7.org/fhir/v3/Ethnicity__display": "not hispanic or latino",
}
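The naming scheme in that assertion can be reproduced with a short standalone sketch. This is not the SDK's generic_codeable_to_dict: flatten_codeable and strip_scheme are hypothetical helpers written only to match the key shapes shown above, and the handling of a coding without a system is an assumption.

```python
from typing import Any, Dict, List


def strip_scheme(uri: str) -> str:
    """Drop a leading http:// or https:// so the value can be used in a column name."""
    for scheme in ("http://", "https://"):
        if uri.startswith(scheme):
            return uri[len(scheme):]
    return uri


def flatten_codeable(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten url/valueCodeableConcept entries into one dict with url__/system__ keys."""
    flat: Dict[str, Any] = {}
    for item in items:
        prefix = f"url__{strip_scheme(item['url'])}__valueCodeableConcept"
        concept = item.get("valueCodeableConcept", {})
        if "text" in concept:
            flat[f"{prefix}_text"] = concept["text"]
        for coding in concept.get("coding", []):
            system = coding.get("system")
            if system is None:
                # Assumption: a coding without a system gets a shorter key.
                key_prefix = f"{prefix}_coding_"
            else:
                key_prefix = f"{prefix}_coding_system__{strip_scheme(system)}__"
            for key, value in coding.items():
                if key != "system":
                    flat[key_prefix + key] = value
    return flat


race = [{
    "url": "http://hl7.org/fhir/StructureDefinition/us-core-race",
    "valueCodeableConcept": {
        "text": "race",
        "coding": [{"code": "2106-3", "system": "http://hl7.org/fhir/v3/Race", "display": "white"}],
    },
}]
flattened = flatten_codeable(race)
```

Each key reads left to right as the path to the value: the extension URL, then the concept, then the coding system, then the field.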
0.16.0 - 2020-08-27¶
Added¶
- Added most of remaining FSS entities:
- AuditEvent
- CarePlan
- DiagnosticReport
- DocumentReference
- Encounter
- ImagingStudy
- Immunization
- Media
- MedicationAdministration
- MedicationDispense
- MedicationRequest
- MedicationStatement
- Person
- Practitioner
- Procedure
- ProcedureRequest
- Provenance
- ReferralRequest
- Sequence
- Specimen
- Add abstract Item class for entities that don't relate to a patient (e.g. Organization and Practitioner)
Changed¶
All date columns now return two columns: one for the local time (with time zone removed) and one for the time zone offset in hours. Consider the onsetDateTime column from BRCA's Condition table:
onsetDateTime.tz onsetDateTime.local
0 0.0 1998-01-01 00:00:00+00:00
1 0.0 2010-01-01 00:00:00+00:00
2 0.0 2008-01-01 00:00:00+00:00
3 0.0 1994-01-01 00:00:00+00:00
4 0.0 2008-01-01 00:00:00+00:00
5 0.0 2012-01-01 00:00:00+00:00
6 0.0 2017-06-27 04:00:00+00:00
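The split into a local time and an offset in hours can be illustrated with the standard library alone. This is a sketch of the idea, not of how the SDK derives its .local and .tz columns; the function name and the sample timestamp are made up.

```python
from datetime import datetime
from typing import Optional, Tuple


def split_local_and_offset(value: str) -> Tuple[datetime, Optional[float]]:
    """Split an ISO 8601 timestamp into a naive local time and a UTC offset in hours."""
    parsed = datetime.fromisoformat(value)
    offset = parsed.utcoffset()
    hours = None if offset is None else offset.total_seconds() / 3600
    # Dropping tzinfo keeps the wall-clock time exactly as it was recorded.
    return parsed.replace(tzinfo=None), hours


local, tz_hours = split_local_and_offset("2017-06-27T00:00:00-04:00")
print(local, tz_hours)  # 2017-06-27 00:00:00 -4.0
```

Keeping the offset in its own column means the wall-clock time can be analysed directly while the original instant remains recoverable.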
0.15.0 - 2020-08-05¶
Includes more work on the easy modules (imported via import phc.easy as phc).
Added¶
- Added phc.easy.Query.execute_ga4gh that auto-scrolls GA4GH results
- Added phc.easy.Sequence as another entity module
- Added generic methods on phc.easy.Query
  - get_count_by_field
  - get_codes
  - execute_composite_aggregations (used by get_count_by_field and get_codes)
- Added phc.easy.PatientItem.get_count_by_patient (Observation, Procedure, Specimen, etc.)
# Example: Get number of procedures by patient
phc.Procedure.get_count_by_patient()
# doc_count
# subject.reference
# 518eb55d-adbf-42c3-8aed-68176d0ed4b7 334
# 67233488-ddd6-46e1-88cc-a93140b86c02 2088
# b41f8107-85e1-42c3-b36e-400085799ab5 176
- Added phc.easy.PatientItem.get_count_by_field (Observation, Procedure, Specimen, etc.)
# Example: Get count of unique procedure display codes
phc.Procedure.get_count_by_field("code.coding.display")
# code.coding.display doc_count
# 0 lumpectomy 247
# 1 modified radical mastectomy 322
# 2 other 272
# 3 simple mastectomy 200
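What a doc_count per field value means can be shown with an in-memory sketch. The SDK obtains these counts from server-side aggregations; the code below only mimics the result on a handful of made-up records, and both helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def values_at(record: Any, path: List[str]) -> Iterable[Any]:
    """Yield every value found by following `path`, descending into lists along the way."""
    if isinstance(record, list):
        for item in record:
            yield from values_at(item, path)
    elif not path:
        yield record
    elif isinstance(record, dict) and path[0] in record:
        yield from values_at(record[path[0]], path[1:])


def count_by_field(records: Iterable[Dict[str, Any]], field: str) -> Counter:
    """Count how many records contain each distinct value at the dotted `field` path."""
    counts: Counter = Counter()
    for record in records:
        # A set ensures a record counts once per value, even if the value repeats in it.
        counts.update(set(values_at(record, field.split("."))))
    return counts


procedures = [
    {"code": {"coding": [{"display": "lumpectomy"}]}},
    {"code": {"coding": [{"display": "lumpectomy"}, {"display": "lumpectomy"}]}},
    {"code": {"coding": [{"display": "other"}]}},
]
counts = count_by_field(procedures, "code.coding.display")
print(dict(counts))
```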
- Added phc.easy.PatientItem.get_codes (Observation, Procedure, Specimen, etc.)
# Example: Get observation codes for specific patients
phc.Observation.get_codes(patient_ids=[
"e296f292-230f-444c-887f-0b213bde90fa",
"78adf262-c77e-4cb3-8435-034bd9e73b64"
])
# doc_count system code display field
# 0 1.0 http://loinc.org 21893-3 Regional lymph nodes positive [#] Specimen code.coding
# 1 2.0 http://loinc.org 21975-8 Date of Last Contact code.coding
# 2 1.0 http://loinc.org 21981-6 Date of Disease Progression code.coding
# 3 2.0 http://loinc.org 49683-6 HER2/neu receptor status code.coding
# 4 2.0 http://loinc.org 63931-0 Date of Diagnosis code.coding
# 5 2.0 http://loinc.org 85337-4 Estrogen Receptor Status code.coding
Changed¶
- Passing log to any PatientItem entities now logs the FSS query being run
- For aggregations, phc.Query.execute_fhir_dsl now returns a FhirAggregation if an aggregation is specified in the query
- phc.Query.execute_fhir_dsl_with_options now caches aggregation queries in JSON format
- Specifying patient_id and/or patient_ids is now properly supported with a custom FHIR query.
# Example: Get observations tagged with loinc for a specific patient
phc.Observation.get_data_frame(patient_id="<id>", query_overrides={
"where": {
"type": "elasticsearch",
"query": {
"term": {
"code.coding.system.keyword": "http://loinc.org"
}
}
}
})
Fixed¶
- Fix phc.easy.Procedure not inheriting new phc.easy.PatientItem behavior
0.14.1 - 2020-07-15¶
Fixed¶
- Fixed missing trust_env args in created client objects
0.14.0 - 2020-07-14¶
Added¶
- All-new easy module for faster analysis! Simply import phc.easy as phc.
- Add Auth for shared authentication details (account, project, and token)
- Add Query for scrolling through FHIR Search Service (FSS) data
- Add Frame for expanding columns that contain FHIR data and parsing dates
- Add APICache for auto-caching results from easy modules
- Add CSVWriter for intelligently writing batches with O(1) memory, so memory does not grow as batches accumulate
- Includes Project, Patient, Observation, Procedure, Condition, Goal, and Specimen
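The idea behind writing batches without growing memory is to append each batch to disk and then let it go. The sketch below shows that pattern with the standard csv module; it is not the SDK's CSVWriter, append_batch is a hypothetical helper, and it assumes the columns stay the same across batches.

```python
import csv
import os
import tempfile
from typing import Dict, List


def append_batch(path: str, rows: List[Dict[str, object]], fieldnames: List[str]) -> None:
    """Append one batch of rows to `path`, writing the header only for a new or empty file."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)


# Only the current batch is held in memory; earlier batches already live on disk.
path = os.path.join(tempfile.mkdtemp(), "observations.csv")
for batch in ([{"id": "a", "value": 1}], [{"id": "b", "value": 2}]):
    append_batch(path, batch, fieldnames=["id", "value"])

with open(path, newline="") as handle:
    rows = list(csv.DictReader(handle))
```

Peak memory then depends on the batch size rather than on the total number of rows retrieved.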
0.13.0 - 2020-04-17¶
Added¶
- Switched build over to GitHub Actions
0.12.3 - 2020-04-13¶
Added¶
- Adds dsl and sql methods to phc.services.Fhir
Changed¶
- Deprecates execute_sql and execute_es methods in phc.services.Fhir
0.12.2 - 2020-03-25¶
Fixed¶
- Added retries to file download requests
0.12.1 - 2020-03-25¶
Fixed¶
- Fixed retry logic to include OS level errors.
0.12.0 - 2020-03-23¶
Added¶
- Added retry support for failed API requests.
0.11.0 - 2020-03-17¶
Added¶
- Added the trust_env parameter to all service classes to enable http proxy support.
0.10.0 - 2020-03-10¶
Added¶
- Added execute_sql to phc.services.Analytics.
0.9.2 - 2020-02-19¶
Added¶
- Added scroll support to phc.services.Fhir via the scroll param.
0.9.1 - 2019-12-17¶
Changed¶
- Fixed phc.services.Genomics.Status enum.
0.9.0 - 2019-12-16¶
Changed¶
- Added phc.services.Genomics for performing genomic-related operations.
0.8.1 - 2019-11-27¶
Changed¶
- In Analytics.load_data_lake_result_to_dataframe, increased the amount of time to wait for a results file.
0.8.0 - 2019-11-25¶
Added¶
- Added Analytics.list_data_lake_schemas to fetch the schemas of each data lake table.
- Added Analytics.get_data_lake_schema to fetch the schema of a single data lake table.
- Added Analytics.execute_data_lake_query_to_dataframe to execute a data lake query and load the results to a Pandas dataframe.
- Added Analytics.load_data_lake_result_to_dataframe to load the results of a previously executed data lake query to a Pandas dataframe.
- Added Files.exists to check if a file exists.
0.7.1 - 2019-11-21¶
Fixed¶
- Fixed issue with Files.download to create target directories if they do not exist.
0.7.0 - 2019-11-20¶
Added¶
- Added optional pandas setup install
- Added ApiResponse.get_as_dataframe to return a response item as a Pandas DataFrame.
0.6.0 - 2019-11-01¶
Added¶
- Added the phc.services.Files submodule that provides actions for files in PHC projects.
- Added the phc.services.Cohorts submodule that provides actions for files in PHC cohorts.
Created: 2022-05-19