OCDS Kingfisher Colab 0.6.0
A set of utility functions for Google Colab notebooks using OCDS data.
If you are viewing this on GitHub, open the full documentation for additional details.
Troubleshooting
Using Jupyter Notebook
If you are using Kingfisher Colab in a Jupyter Notebook (not on Google Colab), you need to:
Install the google-colab package: pip install google-colab
Upgrade the ipykernel package: pip install --upgrade ipykernel
Using JSON operators with the %sql magic
When using the ipython-sql %sql line magic, do not put spaces around JSON operators.
For example, write data->'ocid', not data -> 'ocid'.
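As a rough aid for the rule above, the sketch below strips whitespace around the -> and ->> operators in a query string before it is used with %sql. The helper name tighten_json_operators is hypothetical and not part of Kingfisher Colab. It is a plain regular-expression substitution, so it would also alter operators that appear inside string literals; treat it as an illustration only.

```python
import re


# Hypothetical helper, not part of ocdskingfishercolab.
def tighten_json_operators(sql):
    """Remove whitespace around the -> and ->> JSON operators."""
    return re.sub(r"\s*(->>?)\s*", r"\1", sql)


query = "SELECT data -> 'tender' ->> 'id' FROM data"
tightened = tighten_json_operators(query)
print(tightened)
```

The tightened string can then be pasted after the %sql line magic.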
API
- exception ocdskingfishercolab.MissingFieldsError
Raised when no fields are provided to a function.
- exception ocdskingfishercolab.OCDSKingfisherColabError
Base class for exceptions from within this package.
- exception ocdskingfishercolab.UnknownPackageTypeError
Raised when the provided package type is unknown.
- ocdskingfishercolab.authenticate_gspread()
Authenticate the current user and give the notebook permission to connect to Google Spreadsheets.
- Returns:
a Google Sheets Client instance
- Return type:
- ocdskingfishercolab.authenticate_pydrive()
Authenticate the current user and give the notebook permission to connect to Google Drive.
- Returns:
a GoogleDrive instance
- Return type:
- ocdskingfishercolab.calculate_coverage(fields, scope=None, *, print_sql=True, return_sql=False)
Calculate the coverage of one or more fields using the summary tables produced by Kingfisher Summarize’s --field-lists option. Return the coverage of each field and the co-occurrence coverage of all fields.
scope is the Kingfisher Summarize table to measure coverage against, e.g. "awards_summary". Coverage is calculated using the number of rows in this table as the denominator.
If scope is not set, it defaults to the parent table of the first field.
fields is a list of fields to measure the coverage of, specified using JSON Pointer.
If a field isn’t a child of the scope table, use an absolute pointer:
calculate_coverage(["tender/procurementMethod"], "awards_summary")
If a field is a child of the scope table, use either an absolute pointer:
calculate_coverage(["awards/value/amount"], "awards_summary")
Or a relative pointer (prepend with ":"):
calculate_coverage([":value/amount"], "awards_summary")
If a field is within an array, it counts if it appears in any object in the array:
calculate_coverage([":items/description"], "awards_summary")
To require a field to appear in all objects in the array, prepend with "ALL ":
calculate_coverage(["ALL :items/description"], "awards_summary")
Note
Nested arrays, like the "awards/items/description" field with a "release_summary" scope, will yield inaccurate results, unless the initial arrays are present and one-to-one with the scope table (i.e. there is always exactly one award for each release).
If scope is "awards_summary", you can specify fields on related contracts by prepending ":contracts/":
calculate_coverage([":value/amount", ":contracts/period"], "awards_summary")
If scope is "contracts_summary", you can specify fields on related awards by prepending ":awards/":
calculate_coverage([":value/amount", ":awards/date"], "contracts_summary")
- Parameters:
- Returns:
the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether %config SqlMagic.autopandas is True or False respectively. This is the same behaviour as ipython-sql’s %sql magic.
- Return type:
pandas.DataFrame or sql.run.ResultSet
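The coverage rules described for calculate_coverage (row count of the scope table as the denominator, "any object" versus "ALL " for arrays, and co-occurrence across all fields) can be illustrated in plain Python. This is only a sketch of the semantics over made-up in-memory rows; it is not the SQL that the library runs against the summary tables, and the helper names has_field and coverage are hypothetical. It handles relative pointers only and assumes at least one row.

```python
def has_field(obj, pointer, require_all=False):
    """Return True if the slash-separated pointer resolves to a non-null value."""
    head, _, rest = pointer.partition("/")
    if not isinstance(obj, dict) or obj.get(head) is None:
        return False
    value = obj[head]
    if not rest:
        return True
    if isinstance(value, list):
        if not value:
            return False  # an empty array contains the field in no object
        check = all if require_all else any
        return check(has_field(item, rest, require_all) for item in value)
    return has_field(value, rest, require_all)


def coverage(rows, fields):
    """Return per-field and co-occurrence coverage as fractions of len(rows)."""
    total = len(rows)  # the scope table's row count is the denominator
    cooccurring = [True] * total
    result = {}
    for field in fields:
        require_all = field.startswith("ALL ")
        pointer = field[4:] if require_all else field
        pointer = pointer.lstrip(":")  # relative pointers start with ":"
        present = [has_field(row, pointer, require_all) for row in rows]
        result[field] = sum(present) / total
        cooccurring = [a and b for a, b in zip(cooccurring, present)]
    result["co-occurrence"] = sum(cooccurring) / total
    return result


# Made-up rows standing in for an awards summary table.
awards = [
    {"value": {"amount": 10}, "items": [{"description": "Pens"}, {"description": "Paper"}]},
    {"value": {"amount": 5}, "items": [{"description": "Ink"}, {"id": "2"}]},
    {"items": []},
    {"value": {"amount": 1}},
]
result = coverage(awards, [":value/amount", ":items/description", "ALL :items/description"])
print(result)
```

In this sample, the second award has one item without a description, so it counts towards ":items/description" but not towards "ALL :items/description".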
- ocdskingfishercolab.download_data_as_json(data, filename)
Dump the data to a JSON file, and invoke a browser download of the JSON file to your local computer.
- Parameters:
data – JSON-serializable data
filename (str) – a file name
- ocdskingfishercolab.download_dataframe_as_csv(dataframe, filename)
Convert the data frame to a CSV file, and invoke a browser download of the CSV file to your local computer.
- Parameters:
dataframe (pandas.DataFrame) – a data frame
filename (str) – a file name
- ocdskingfishercolab.download_package_from_ocid(collection_id, ocid, package_type)
Select all releases with the given ocid from the given collection, and invoke a browser download of the packaged releases to your local computer.
- Parameters:
- Raises:
UnknownPackageTypeError – when the provided package type is unknown
- ocdskingfishercolab.download_package_from_query(sql, package_type=None)
Execute a SQL statement that SELECTs only the data column of the data table, and invoke a browser download of the packaged data to your local computer.
- Parameters:
- Raises:
UnknownPackageTypeError – when the provided package type is unknown
- ocdskingfishercolab.format_thousands(axis, locale='en_US')
Set the thousands separator on the given axis for the given locale, e.g. en_US.
- ocdskingfishercolab.get_ipython_sql_resultset_from_query(sql, _collection_id=None, _ocid=None)
Execute a SQL statement and return a ResultSet.
Parameters are taken from the scope this function is called from (same behaviour as ipython-sql’s %sql magic).
- ocdskingfishercolab.list_collections(source_id=None)
Return, as a ResultSet or DataFrame, a list of collections with the given source ID.
- Parameters:
source_id (str) – a source ID
- Returns:
the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether %config SqlMagic.autopandas is True or False respectively. This is the same behaviour as ipython-sql’s %sql magic.
- Return type:
pandas.DataFrame or sql.run.ResultSet
- ocdskingfishercolab.list_source_ids(pattern='')
Return, as a ResultSet or DataFrame, a list of source IDs matching the given pattern.
- Parameters:
pattern (str) – a substring, like “paraguay”
- Returns:
the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether %config SqlMagic.autopandas is True or False respectively. This is the same behaviour as ipython-sql’s %sql magic.
- Return type:
pandas.DataFrame or sql.run.ResultSet
- ocdskingfishercolab.render_json(json_string)
Render JSON into collapsible HTML.
- Parameters:
json_string – JSON-deserializable string
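To give a sense of what "collapsible HTML" can mean for JSON, here is a minimal sketch that wraps each object and array in an HTML details element, which browsers render as an expandable node. It is an assumption about one possible approach and not how render_json is implemented; the function name to_collapsible_html and the sample JSON string are hypothetical.

```python
import html
import json


# Hypothetical helper, not part of ocdskingfishercolab.
def to_collapsible_html(value, label="root"):
    """Wrap each object or array in a <details> element; show scalars inline."""
    safe_label = html.escape(str(label))
    if isinstance(value, dict):
        children = "".join(to_collapsible_html(v, k) for k, v in value.items())
    elif isinstance(value, list):
        children = "".join(to_collapsible_html(v, i) for i, v in enumerate(value))
    else:
        # Scalars are shown as escaped JSON literals.
        return f"<div>{safe_label}: {html.escape(json.dumps(value))}</div>"
    return f"<details><summary>{safe_label}</summary>{children}</details>"


json_string = '{"ocid": "ocds-213czf-1", "tender": {"id": "1"}}'
markup = to_collapsible_html(json.loads(json_string))
print(markup)
```

In a notebook, markup of this kind could be displayed with IPython.display.HTML.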
- ocdskingfishercolab.save_dataframe_to_sheet(spreadsheet_name, dataframe, sheetname, *, prompt=True)
Save a data frame to a worksheet in Google Sheets, after asking the user for confirmation.
- Parameters:
spreadsheet_name (str) – the name of the spreadsheet
dataframe (pandas.DataFrame) – a data frame
sheetname (str) – the name of the sheet to add
prompt (bool) – whether to prompt the user
- ocdskingfishercolab.save_dataframe_to_spreadsheet(dataframe, name)
Dump the release_package column of a data frame to a JSON file, convert the JSON file to an Excel file, and upload the Excel file to Google Drive.
- Parameters:
dataframe (pandas.DataFrame) – a data frame
name (str) – the basename of the Excel file to write
- ocdskingfishercolab.set_dark_mode()
Set the Seaborn theme to match Google Colab’s dark mode.
- ocdskingfishercolab.set_light_mode()
Set the Seaborn theme to light mode, for exporting plots.
- ocdskingfishercolab.set_search_path(schema_name)
Set the search_path to the given schema, followed by the public schema.
- Parameters:
schema_name (str) – a schema name
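For readers unfamiliar with PostgreSQL's search_path, the sketch below builds the kind of statement that puts a given schema first and the public schema second. It is an assumption about what the description of set_search_path amounts to, not the library's code; the helper search_path_statement and the schema name are hypothetical.

```python
# Hypothetical helper, not part of ocdskingfishercolab.
def search_path_statement(schema_name):
    """Build a SET search_path statement with the schema first, then public."""
    # Double any embedded double quotes so the name is a valid quoted identifier.
    quoted = '"' + schema_name.replace('"', '""') + '"'
    return f"SET search_path = {quoted}, public"


statement = search_path_statement("summary_collection_1")
print(statement)
```

With a search path like this, unqualified table names resolve against the named schema first, and fall back to public.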