OCDS Kingfisher Colab 0.3.9#

PyPI Version Build Status Coverage Status Python Version

A set of utility functions for Google Colaboratory notebooks using OCDS data.

If you are viewing this on GitHub, open the full documentation for additional details.

Troubleshooting#

Using Jupyter Notebook#

If you are using Kingfisher Colab in a Jupyter Notebook (not on Google Colaboratory), you need to:

  1. Install the google-colab package:

    pip install google-colab
    
  2. Upgrade the ipykernel package:

    pip install --upgrade ipykernel
    

Using JSON operators with the %sql magic#

When using the ipython-sql %sql line magic, you must avoid spaces around JSON operators.

E.g. data->'ocid' not data -> 'ocid'

API#

ocdskingfishercolab.authenticate_gspread()[source]#

Authenticates the current user and gives the notebook permission to connect to Google Spreadsheets.

Returns

a Google Sheets Client instance

Return type

gspread.Client

ocdskingfishercolab.authenticate_pydrive()[source]#

Authenticates the current user and gives the notebook permission to connect to Google Drive.

Returns

a GoogleDrive instance

Return type

pydrive.drive.GoogleDrive

ocdskingfishercolab.set_spreadsheet_name(name)[source]#

Sets the name of the spreadsheet to which to save.

Used by ocdskingfishercolab.save_dataframe_to_sheet().

Parameters

name (str) – a spreadsheet name

ocdskingfishercolab.list_source_ids(pattern='')[source]#

Returns, as a ResultSet or DataFrame, a list of source IDs matching the given pattern.

Parameters

pattern (str) – a substring, like “paraguay”

Returns

the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether %config SqlMagic.autopandas is True or False respectively. This is the same behaviour as ipython-sql’s %sql magic.

Return type

pandas.DataFrame or sql.run.ResultSet

ocdskingfishercolab.list_collections(source_id=None)[source]#

Returns, as a ResultSet or DataFrame, a list of collections with the given source ID.

Parameters

source_id (str) – a source ID

Returns

the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether %config SqlMagic.autopandas is True or False respectively. This is the same behaviour as ipython-sql’s %sql magic.

Return type

pandas.DataFrame or sql.run.ResultSet

ocdskingfishercolab.set_search_path(schema_name)[source]#

Sets the search_path to the given schema, followed by the public schema.

Parameters

schema_name (str) – a schema name

ocdskingfishercolab.save_dataframe_to_sheet(dataframe, sheetname, prompt=True)[source]#

Saves a data frame to a worksheet in Google Sheets, after asking the user for confirmation.

Use ocdskingfishercolab.set_spreadsheet_name() to set the spreadsheet name.

Parameters
  • dataframe (pandas.DataFrame) – a data frame

  • sheetname (str) – a sheet name

  • prompt (bool) – whether to prompt the user

ocdskingfishercolab.save_dataframe_to_spreadsheet(dataframe, name)[source]#

Dumps the release_package column of a data frame to a JSON file, converts the JSON file to an Excel file, and uploads the Excel file to Google Drive.

Parameters
  • dataframe (pandas.DataFrame) – a data frame

  • name (str) – the basename of the Excel file to write

ocdskingfishercolab.download_dataframe_as_csv(dataframe, filename)[source]#

Converts the data frame to a CSV file, and invokes a browser download of the CSV file to your local computer.

Parameters
  • dataframe (pandas.DataFrame) – a data frame

  • filename (str) – a file name

ocdskingfishercolab.download_data_as_json(data, filename)[source]#

Dumps the data to a JSON file, and invokes a browser download of the CSV file to your local computer.

Parameters
  • data – JSON-serializable data

  • filename (str) – a file name

ocdskingfishercolab.get_ipython_sql_resultset_from_query(sql, _collection_id=None, _ocid=None)[source]#

Executes a SQL statement and returns a ResultSet.

Parameters are taken from the scope this function is called from (same behaviour as ipython-sql’s %sql magic).

Parameters

sql (str) – a SQL statement

Returns

the results as a ResultSet

Return type

sql.run.ResultSet

ocdskingfishercolab.download_package_from_query(sql, package_type=None)[source]#

Executes a SQL statement that SELECTs only the data column of the data table, and invokes a browser download of the packaged data to your local computer.

Parameters
  • sql (str) – a SQL statement

  • package_type (str) – “record” or “release”

Raises

UnknownPackageTypeError – when the provided package type is unknown

ocdskingfishercolab.download_package_from_ocid(collection_id, ocid, package_type)[source]#

Selects all releases with the given ocid from the given collection, and invokes a browser download of the packaged releases to your local computer.

Parameters
  • collection_id (int) – a collection’s ID

  • ocid (str) – an OCID

  • package_type (str) – “record” or “release”

Raises

UnknownPackageTypeError – when the provided package type is unknown

ocdskingfishercolab.write_data_as_json(data, filename)[source]#

Dumps the data to a JSON file.

Parameters
  • data – JSON-serializable data

  • filename (str) – a file name

ocdskingfishercolab.render_json(json_string)[source]#

Renders JSON into collapsible HTML.

Parameters

json_string – JSON-deserializable string

ocdskingfishercolab.calculate_coverage(fields, scope=None, print_sql=True, return_sql=False)[source]#

Calculates the coverage of one or more fields using the summary tables produced by Kingfisher Summarize’s --field-lists option. Returns the coverage of each field and the co-occurrence coverage of all fields.

scope is the Kingfisher Summarize table to measure coverage against, e.g. "awards_summary". Coverage is calculated using the number of rows in this table as the denominator.

If scope is not set, it defaults to the parent table of the first field.

fields is a list of fields to measure the coverage of, specified using JSON Pointer.

If a field isn’t a child of the scope table, use an absolute pointer:

calculate_coverage(["tender/procurementMethod"], "awards_summary")

If a field is a child of the scope table, use either an absolute pointer:

calculate_coverage(["awards/value/amount"], "awards_summary")

Or a relative pointer (prepend with ":"):

calculate_coverage([":value/amount"], "awards_summary")

If a field is within an array, it counts if it appears in any object in the array.

calculate_coverage([":items/description"], "awards_summary")

To require a field to appear in all objects in the array, prepend with "ALL ":

calculate_coverage(["ALL :items/description"], "awards_summary")

Note

Nested arrays, like the "awards/items/description" field with a "release_summary" scope, will yield inaccurate results, unless the initial arrays are present and one-to-one with the scope table (i.e. there is always exactly one award for each release).

If scope is "awards_summary", you can specify fields on related contracts by prepending ":contracts/":

calculate_coverage([":value/amount", ":contracts/period"], "awards_summary")

If scope is "contracts_summary", you can specify fields on related awards by prepending ":awards/":

calculate_coverage([":value/amount", ":awards/date"], "contracts_summary")
Parameters
  • fields (list) – the fields to measure coverage of

  • scope (str) – the table to measure coverage against

  • print_sql (bool) – print the SQL query

  • return_sql (bool) – return the SQL query instead of executing the SQL query and returning the results

Returns

the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether %config SqlMagic.autopandas is True or False respectively. This is the same behaviour as ipython-sql’s %sql magic.

Return type

pandas.DataFrame or sql.run.ResultSet

exception ocdskingfishercolab.OCDSKingfisherColabError[source]#

Base class for exceptions from within this package

exception ocdskingfishercolab.UnknownPackageTypeError[source]#

Raised when the provided package type is unknown

exception ocdskingfishercolab.MissingFieldsError[source]#

Raised when no fields are provided to a function