OCDS Kingfisher Colab 0.6.0¶

A set of utility functions for Google Colab notebooks using OCDS data.

If you are viewing this on GitHub, open the full documentation for additional details.

Troubleshooting¶

Using Jupyter Notebook¶

If you are using Kingfisher Colab in a Jupyter Notebook (not on Google Colab), you need to:

Install the google-colab package:
```
pip install google-colab
```
Upgrade the ipykernel package:
```
pip install --upgrade ipykernel
```

Using JSON operators with the %sql magic¶

When using the ipython-sql %sql line magic, you must avoid spaces around JSON operators.

E.g. data->'ocid' not data -> 'ocid'

API¶

exception ocdskingfishercolab.MissingFieldsError[source]¶: Raised when no fields are provided to a function.

exception ocdskingfishercolab.OCDSKingfisherColabError[source]¶: Base class for exceptions from within this package.

exception ocdskingfishercolab.UnknownPackageTypeError[source]¶: Raised when the provided package type is unknown.

ocdskingfishercolab.authenticate_gspread()[source]¶

Authenticate the current user and give the notebook permission to connect to Google Spreadsheets.

Returns:: a Google Sheets Client instance
Return type:: gspread.Client

ocdskingfishercolab.authenticate_pydrive()[source]¶

Authenticate the current user and give the notebook permission to connect to Google Drive.

Returns:: a GoogleDrive instance
Return type:: pydrive2.drive.GoogleDrive

ocdskingfishercolab.calculate_coverage(fields, scope=None, *, print_sql=True, return_sql=False)[source]¶

Calculate the coverage of one or more fields using the summary tables produced by Kingfisher Summarize’s --field-lists option. Return the coverage of each field and the co-occurrence coverage of all fields.

scope is the Kingfisher Summarize table to measure coverage against, e.g. "awards_summary". Coverage is calculated using the number of rows in this table as the denominator.

If scope is not set, it defaults to the parent table of the first field.

fields is a list of fields to measure the coverage of, specified using JSON Pointer.

If a field isn’t a child of the scope table, use an absolute pointer:

calculate_coverage(["tender/procurementMethod"], "awards_summary")

If a field is a child of the scope table, use either an absolute pointer:

calculate_coverage(["awards/value/amount"], "awards_summary")

Or a relative pointer (prepend with ":"):

calculate_coverage([":value/amount"], "awards_summary")

If a field is within an array, it counts if it appears in any object in the array.

calculate_coverage([":items/description"], "awards_summary")

To require a field to appear in all objects in the array, prepend with "ALL ":

calculate_coverage(["ALL :items/description"], "awards_summary")

Note

Nested arrays, like the "awards/items/description" field with a "release_summary" scope, will yield inaccurate results, unless the initial arrays are present and one-to-one with the scope table (i.e. there is always exactly one award for each release).

If scope is "awards_summary", you can specify fields on related contracts by prepending ":contracts/":

calculate_coverage([":value/amount", ":contracts/period"], "awards_summary")

If scope is "contracts_summary", you can specify fields on related awards by prepending ":awards/":

calculate_coverage([":value/amount", ":awards/date"], "contracts_summary")

Parameters:

fields (list) – the fields to measure coverage of
scope (str) – the table to measure coverage against
print_sql (bool) – print the SQL query
return_sql (bool) – return the SQL query instead of executing the SQL query and returning the results

Returns:

the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether %config SqlMagic.autopandas is True or False respectively. This is the same behaviour as ipython-sql’s %sql magic.

Return type:

pandas.DataFrame or sql.run.ResultSet

ocdskingfishercolab.download_data_as_json(data, filename)[source]¶

Dump the data to a JSON file, and invoke a browser download of the CSV file to your local computer.

Parameters:

data – JSON-serializable data
filename (str) – a file name

ocdskingfishercolab.download_dataframe_as_csv(dataframe, filename)[source]¶

Convert the data frame to a CSV file, and invoke a browser download of the CSV file to your local computer.

Parameters:

dataframe (pandas.DataFrame) – a data frame
filename (str) – a file name

ocdskingfishercolab.download_package_from_ocid(collection_id, ocid, package_type)[source]¶

Select all releases with the given ocid from the given collection, and invoke a browser download of the packaged releases to your local computer.

Parameters:

collection_id (int) – a collection’s ID
ocid (str) – an OCID
package_type (str) – “release” or “record”

Raises:

UnknownPackageTypeError – when the provided package type is unknown

ocdskingfishercolab.download_package_from_query(sql, package_type=None)[source]¶

Execute a SQL statement that SELECTs only the data column of the data table, and invoke a browser download of the packaged data to your local computer.

Parameters:

sql (str) – a SQL statement
package_type (str) – “release” or “record”

Raises:

UnknownPackageTypeError – when the provided package type is unknown

ocdskingfishercolab.format_thousands(axis, locale='en_US')[source]¶: Set the thousands separator on the given axis for the given locale, e.g. en_US.

ocdskingfishercolab.get_ipython_sql_resultset_from_query(sql, _collection_id=None, _ocid=None)[source]¶

Execute a SQL statement and return a ResultSet.

Parameters are taken from the scope this function is called from (same behaviour as ipython-sql’s %sql magic).

Parameters:: sql (str) – a SQL statement
Returns:: the results as a ResultSet
Return type:: sql.run.ResultSet

ocdskingfishercolab.list_collections(source_id=None)[source]¶

Return, as a ResultSet or DataFrame, a list of collections with the given source ID.

Parameters:: source_id (str) – a source ID
Returns:: the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether %config SqlMagic.autopandas is True or False respectively. This is the same behaviour as ipython-sql’s %sql magic.
Return type:: pandas.DataFrame or sql.run.ResultSet

ocdskingfishercolab.list_source_ids(pattern='')[source]¶

Return, as a ResultSet or DataFrame, a list of source IDs matching the given pattern.

Parameters:: pattern (str) – a substring, like “paraguay”
Returns:: the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether %config SqlMagic.autopandas is True or False respectively. This is the same behaviour as ipython-sql’s %sql magic.
Return type:: pandas.DataFrame or sql.run.ResultSet

ocdskingfishercolab.render_json(json_string)[source]¶

Render JSON into collapsible HTML.

Parameters:: json_string – JSON-deserializable string

ocdskingfishercolab.save_dataframe_to_sheet(spreadsheet_name, dataframe, sheetname, *, prompt=True)[source]¶

Save a data frame to a worksheet in Google Sheets, after asking the user for confirmation.

Parameters:

spreadsheet_name (str) – the name of the spreadsheet
dataframe (pandas.DataFrame) – a data frame
sheetname (str) – the name of the sheet to add
prompt (bool) – whether to prompt the user

ocdskingfishercolab.save_dataframe_to_spreadsheet(dataframe, name)[source]¶

Dump the release_package column of a data frame to a JSON file, convert the JSON file to an Excel file, and upload the Excel file to Google Drive.

Parameters:

dataframe (pandas.DataFrame) – a data frame
name (str) – the basename of the Excel file to write

ocdskingfishercolab.set_dark_mode()[source]¶: Set the Seaborn theme to match Google Colab’s dark mode.

ocdskingfishercolab.set_light_mode()[source]¶: Set the Seaborn theme to light mode, for exporting plots.

ocdskingfishercolab.set_search_path(schema_name)[source]¶

Set the search_path to the given schema, followed by the public schema.

Parameters:: schema_name (str) – a schema name

ocdskingfishercolab.write_data_as_json(data, filename)[source]¶

Dump the data to a JSON file.

Parameters:

data – JSON-serializable data
filename (str) – a file name