
Reconciling cell types

In this showcase, we'll try to reconcile some cell-type data from the Tabula Muris dataset.

First off, let's load the necessary packages and read in the data using pandas. For speed and simplicity, I'll only be reconciling the unique tissue/cell_ontology_class pairs.

Pre-processing

from reconciler import reconcile
import pandas as pd

data_url = "https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/10039264/annotations_droplets.csv"

cell_table = pd.read_csv(data_url)
cell_table.head()
cell tissue cell_ontology_class cell_ontology_term_iri cell_ontology_id
10X_P4_3_AAAGTAGAGATGCCAG Bladder mesenchymal cell http://purl.obolibrary.org/obo/CL_0008019 CL:0008019
10X_P4_3_AACCGCGTCCAACCAA Bladder mesenchymal cell http://purl.obolibrary.org/obo/CL_0008019 CL:0008019
10X_P4_3_AACTCCCGTCGGGTCT Bladder mesenchymal cell http://purl.obolibrary.org/obo/CL_0008019 CL:0008019
10X_P4_3_AACTCTTAGTTGCAGG Bladder bladder cell http://purl.obolibrary.org/obo/CL_1001319 CL:1001319
10X_P4_3_AACTCTTTCATAACCG Bladder mesenchymal cell http://purl.obolibrary.org/obo/CL_0008019 CL:0008019

Filtering only the unique pairs:

unique_cells = cell_table.drop_duplicates(subset=['tissue', 'cell_ontology_class'])
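To make the effect of the filter concrete, here is a minimal sketch on a hand-built table. The three rows are copied from the head() output above (only three of the columns), and the names `toy` and `toy_unique` are illustrative, not part of the original walkthrough.

```python
import pandas as pd

# Three rows copied from the head() output above; the real table is much larger.
toy = pd.DataFrame(
    {
        "cell": [
            "10X_P4_3_AAAGTAGAGATGCCAG",
            "10X_P4_3_AACCGCGTCCAACCAA",
            "10X_P4_3_AACTCTTAGTTGCAGG",
        ],
        "tissue": ["Bladder", "Bladder", "Bladder"],
        "cell_ontology_class": [
            "mesenchymal cell",
            "mesenchymal cell",
            "bladder cell",
        ],
    }
)

# Keep the first row seen for each (tissue, cell_ontology_class) pair.
toy_unique = toy.drop_duplicates(subset=["tissue", "cell_ontology_class"])
print(toy_unique[["tissue", "cell_ontology_class"]])
```

By default, drop_duplicates keeps the first occurrence of each pair, so the remaining columns (such as cell) come from whichever row appeared first.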

Reconciliation

This step will take a while to complete, depending on your upload speed. Here it took around a minute.

Reconciling against cell type (Q189118) and returning the first 2 matches for each item:

reconciled = reconcile(unique_cells['cell_ontology_class'], type_id="Q189118", top_res=2)
reconciled.head(10)

The output I got:

id match name score type type_id input_value
Q1922379 True mesenchymal stem cells 100 cell type Q189118 mesenchymal cell
Q66563456 False epithelial cell of gall bladder 28 [] nan bladder cell
Q66568549 False urothelial cell of trigone of urinary bladder 21 [] nan bladder cell
Q11394395 False endothelial cells 50 [] nan endothelial cell
Q68620792 False human sinusoidal endothelial cell 32.5 [] nan endothelial cell
Q66590632 False basal cell of urothelium 50 [] nan basal cell of urothelium
Q66590636 False basal cell layer of urothelium 44.5 [] nan basal cell of urothelium
Q223143 False granulocyte 67 cell type Q189118 leukocyte
Q1775422 False agranulocyte 60 cell type Q189118 leukocyte
Q463418 True fibroblast 100 cell type Q189118 fibroblast

Now if you look at the object, you will see the matches retrieved for each item.
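One way to inspect the result is to keep only the top-scoring candidate for each input value. The sketch below does this on a hand-copied subset of the output shown above, so it runs without calling the reconciliation service. The names `toy_reconciled`, `best` and `confident` are illustrative, and the column names are assumed to match the output I got.

```python
import pandas as pd

# A few rows copied by hand from the output above (subset of columns).
toy_reconciled = pd.DataFrame(
    {
        "id": ["Q1922379", "Q66563456", "Q66568549", "Q223143", "Q1775422", "Q463418"],
        "match": [True, False, False, False, False, True],
        "name": [
            "mesenchymal stem cells",
            "epithelial cell of gall bladder",
            "urothelial cell of trigone of urinary bladder",
            "granulocyte",
            "agranulocyte",
            "fibroblast",
        ],
        "score": [100, 28, 21, 67, 60, 100],
        "input_value": [
            "mesenchymal cell",
            "bladder cell",
            "bladder cell",
            "leukocyte",
            "leukocyte",
            "fibroblast",
        ],
    }
)

# Sort by score (highest first), then keep the first row per input value.
best = (
    toy_reconciled.sort_values("score", ascending=False)
    .drop_duplicates(subset="input_value")
    .set_index("input_value")
)

# Rows the service flagged as a match.
confident = best[best["match"]]
print(best[["id", "name", "score"]])
```

Note that a high score is not the same as a correct match: "leukocyte" gets "granulocyte" as its best candidate here, so the unflagged rows still need a manual check.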

Interestingly, as of 30 Aug. 2020, there isn't much cell type data in Wikidata: many of the matches didn't even return a "type" value. That means those items don't even have an 'instance of' property. It could be very interesting to look into this and add the missing information.
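To list the candidates that came back without a type, one option is to filter on a missing type_id. This is a minimal sketch on a hand-copied subset of the output above. It assumes a missing type shows up as NaN in the type_id column, as in the output I got, and the names `toy_types` and `untyped` are illustrative.

```python
import pandas as pd

# Rows copied by hand from the output above; NaN marks a missing type_id.
toy_types = pd.DataFrame(
    {
        "id": ["Q1922379", "Q66563456", "Q11394395", "Q463418"],
        "name": [
            "mesenchymal stem cells",
            "epithelial cell of gall bladder",
            "endothelial cells",
            "fibroblast",
        ],
        "type_id": ["Q189118", float("nan"), float("nan"), "Q189118"],
    }
)

# Candidates with no type information at all.
untyped = toy_types[toy_types["type_id"].isna()]
share_untyped = len(untyped) / len(toy_types)

print(untyped[["id", "name"]])
print(f"{share_untyped:.0%} of these candidates have no type")
```

The resulting ids are a starting list of Wikidata items to review before adding any statements.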