reconciler.utils

get_query_dict()

Convert a pandas DataFrame column to a query dictionary

The reconciliation API expects a JSON request with a specific structure. This function takes a DataFrame column and reformats its unique values into that structure.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df_column | Series | A pandas Series to reconcile. | required |
| type_id | str | A string specifying the item type to reconcile against; in Wikidata this corresponds to the 'instance of' property of an item. | required |
| property_mapping | dict | Property-column mapping of the items you want to reconcile against. For example, {"P17": df['country']} reconciles against items whose country property (P17) equals the values in the column country. Pass None to send no properties. | required |

Returns:

| Type | Description |
| --- | --- |
| tuple | A tuple containing the unique original values sent to reconciliation and a dictionary with those values reformatted as queries. |

Source code in reconciler/utils.py
from collections import defaultdict


def get_query_dict(df_column, type_id, property_mapping):
    """
    Convert a pandas DataFrame column to a query dictionary

    The reconciliation API requires a json request formatted in a
    very particular way. This function takes in a DataFrame column
    and reformats it.

    Args:
        df_column (Series): A pandas Series to reconcile.
        type_id (str): A string specifying the item type to reconcile against;
            in Wikidata this corresponds to the 'instance of' property of an item.
        property_mapping (dict): Property-column mapping of the items you want to
            reconcile against. For example, {"P17": df['country']} reconciles
            against items whose country property (P17) equals the values in
            the column country. Pass None to send no properties.
    Returns:
        tuple: A tuple containing the unique original values sent to
            reconciliation and a dictionary with those values
            reformatted as queries.
    """
    input_keys = df_column.unique()
    reformatted = defaultdict(dict)

    for idx, value in enumerate(input_keys):

        reformatted[idx]["query"] = value

        if type_id is not None:
            reformatted[idx]["type"] = type_id
        if property_mapping is not None:
            reformatted[idx]["properties"] = create_property_array(
                df_column, property_mapping, value
            )

    return input_keys, reformatted
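
To make the returned structure concrete, the sketch below mirrors the loop above using only the standard library. It is an illustration under stated assumptions, not the library's code: a plain list stands in for the pandas Series, `dict.fromkeys` stands in for `Series.unique()` (both keep first-seen order), property handling is left out, and the helper name `build_query_dict` and the sample values are hypothetical. "Q515" is the Wikidata item for city.

```python
import json
from collections import defaultdict


def build_query_dict(values, type_id=None):
    """Hypothetical stand-in for get_query_dict, without pandas or properties."""
    # De-duplicate while keeping first-seen order, as Series.unique() does.
    input_keys = list(dict.fromkeys(values))
    reformatted = defaultdict(dict)
    for idx, value in enumerate(input_keys):
        reformatted[idx]["query"] = value
        if type_id is not None:
            reformatted[idx]["type"] = type_id
    return input_keys, reformatted


input_keys, queries = build_query_dict(["Rome", "Paris", "Rome"], type_id="Q515")
print(input_keys)
# json.dumps turns the integer keys into the strings "0" and "1".
print(json.dumps(queries, indent=2))
```

Each unique value becomes one numbered query, so a value that appears several times in the column is sent only once.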

create_property_array()

Create a query JSON 'properties' array

Creates the properties array needed when property_mapping is defined.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df_column | Series | A pandas Series to reconcile. | required |
| property_mapping | dict | The property-column mapping dictionary. | required |
| current_value | str | The value from input_keys currently being processed. | required |

Returns:

| Type | Description |
| --- | --- |
| list | A list of dictionaries corresponding to the properties. |

Source code in reconciler/utils.py
def create_property_array(df_column, property_mapping, current_value):
    """
    Create a query JSON 'properties' array

    Creates the properties array needed when property_mapping is defined.

    Args:
        df_column (Series): A pandas Series to reconcile.
        property_mapping (dict): The property-column mapping dictionary.
        current_value (str): The value from input_keys currently being processed.

    Returns:
        list: A list of dictionaries corresponding to the properties.
    """

    prop_mapping_list = []
    for key, value in property_mapping.items():

        prop_value = (
            value.loc[df_column == current_value].to_string(index=False).strip()
        )

        prop_mapping_list.append({"pid": key, "v": prop_value})

    return prop_mapping_list
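
The shape of the returned list can be illustrated without pandas. In the sketch below, plain lists stand in for the Series, and the hypothetical helper `property_array_for` picks the first row whose name matches. The real function instead renders every matching row with `to_string`, so the two may differ when a value appears in several rows.

```python
def property_array_for(names, property_mapping, current_value):
    """Plain-list stand-in for create_property_array (hypothetical helper)."""
    row = names.index(current_value)  # first row holding the current value
    return [
        {"pid": pid, "v": str(column[row])}
        for pid, column in property_mapping.items()
    ]


names = ["Rome", "Paris"]
countries = ["Italy", "France"]

# One entry per mapped property: "pid" is the property ID, "v" its value.
props = property_array_for(names, {"P17": countries}, "Paris")
print(props)
```

Every key in the mapping contributes one `{"pid": ..., "v": ...}` entry, so mapping two properties would give a two-element list for each query.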

chunk_dictionary()

Split a large dictionary into smaller dictionaries of a fixed maximum size

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | dict | The dictionary to be split. | required |
| size | int | The maximum number of items in each smaller dictionary. | 10 |

Returns:

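| Type | Description |
| --- | --- |
| dict | Yielded one at a time: a subdivision of the larger dictionary holding up to size items. The function is a generator, so iterate over it or wrap it in list(). |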

Source code in reconciler/utils.py
from itertools import islice


def chunk_dictionary(data, size=10):
    """
    Split a large dictionary into smaller dictionaries of a fixed maximum size

    Args:
        data (dict): The dictionary to be split.
        size (int): The maximum number of items in each smaller dictionary.

    Yields:
        dict: A subdivision of the larger dictionary holding up to `size`
            items; only the last one may be smaller.
    """
    # https://stackoverflow.com/questions/22878743/how-to-split-dictionary-into-multiple-dictionaries-fast
    it = iter(data)
    for _ in range(0, len(data), size):
        yield {k: data[k] for k in islice(it, size)}
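
A short usage sketch may help show how the chunks come out. The function body is repeated from the listing above so the example runs on its own; the 25 sample queries are made up for illustration.

```python
from itertools import islice


def chunk_dictionary(data, size=10):
    # Same logic as the source listing, repeated so the example is standalone.
    it = iter(data)
    for _ in range(0, len(data), size):
        yield {k: data[k] for k in islice(it, size)}


# 25 hypothetical queries, keyed by index as get_query_dict would produce.
queries = {idx: {"query": f"item {idx}"} for idx in range(25)}

# The function is a generator, so collect the chunks with list().
chunks = list(chunk_dictionary(queries, size=10))
sizes = [len(chunk) for chunk in chunks]
print(sizes)  # [10, 10, 5]
```

Because the chunks are produced lazily, a caller can also loop over `chunk_dictionary(queries)` directly and send each batch as it is yielded, without building the full list first.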