OCDS Kingfisher Colab 0.3.13#
A set of utility functions for Google Colaboratory notebooks using OCDS data.
If you are viewing this on GitHub, open the full documentation for additional details.
Troubleshooting#
Using Jupyter Notebook#
If you are using Kingfisher Colab in a Jupyter Notebook (not on Google Colaboratory), you need to:
Install the
google-colab
package:pip install google-colab
Upgrade the
ipykernel
package:pip install --upgrade ipykernel
Using JSON operators with the %sql magic#
When using the ipython-sql %sql
line magic, you must avoid spaces around JSON operators.
E.g. data->'ocid'
not data -> 'ocid'
API#
- ocdskingfishercolab.authenticate_gspread()[source]#
Authenticates the current user and gives the notebook permission to connect to Google Spreadsheets.
- Returns:
a Google Sheets Client instance
- Return type:
gspread.Client
- ocdskingfishercolab.authenticate_pydrive()[source]#
Authenticates the current user and gives the notebook permission to connect to Google Drive.
- Returns:
a GoogleDrive instance
- Return type:
pydrive.drive.GoogleDrive
- ocdskingfishercolab.set_spreadsheet_name(name)[source]#
Sets the name of the spreadsheet to which to save.
Used by
ocdskingfishercolab.save_dataframe_to_sheet()
.- Parameters:
name (str) – a spreadsheet name
- ocdskingfishercolab.list_source_ids(pattern='')[source]#
Returns, as a ResultSet or DataFrame, a list of source IDs matching the given pattern.
- Parameters:
pattern (str) – a substring, like “paraguay”
- Returns:
the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether
%config SqlMagic.autopandas
isTrue
orFalse
respectively. This is the same behaviour as ipython-sql’s%sql
magic.- Return type:
pandas.DataFrame or sql.run.ResultSet
- ocdskingfishercolab.list_collections(source_id=None)[source]#
Returns, as a ResultSet or DataFrame, a list of collections with the given source ID.
- Parameters:
source_id (str) – a source ID
- Returns:
the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether
%config SqlMagic.autopandas
isTrue
orFalse
respectively. This is the same behaviour as ipython-sql’s%sql
magic.- Return type:
pandas.DataFrame or sql.run.ResultSet
- ocdskingfishercolab.set_search_path(schema_name)[source]#
Sets the search_path to the given schema, followed by the
public
schema.- Parameters:
schema_name (str) – a schema name
- ocdskingfishercolab.save_dataframe_to_sheet(dataframe, sheetname, prompt=True)[source]#
Saves a data frame to a worksheet in Google Sheets, after asking the user for confirmation.
Use
ocdskingfishercolab.set_spreadsheet_name()
to set the spreadsheet name.
- ocdskingfishercolab.save_dataframe_to_spreadsheet(dataframe, name)[source]#
Dumps the
release_package
column of a data frame to a JSON file, converts the JSON file to an Excel file, and uploads the Excel file to Google Drive.- Parameters:
dataframe (pandas.DataFrame) – a data frame
name (str) – the basename of the Excel file to write
- ocdskingfishercolab.download_dataframe_as_csv(dataframe, filename)[source]#
Converts the data frame to a CSV file, and invokes a browser download of the CSV file to your local computer.
- Parameters:
dataframe (pandas.DataFrame) – a data frame
filename (str) – a file name
- ocdskingfishercolab.download_data_as_json(data, filename)[source]#
Dumps the data to a JSON file, and invokes a browser download of the CSV file to your local computer.
- Parameters:
data – JSON-serializable data
filename (str) – a file name
- ocdskingfishercolab.get_ipython_sql_resultset_from_query(sql, _collection_id=None, _ocid=None)[source]#
Executes a SQL statement and returns a ResultSet.
Parameters are taken from the scope this function is called from (same behaviour as ipython-sql’s
%sql
magic).
- ocdskingfishercolab.download_package_from_query(sql, package_type=None)[source]#
Executes a SQL statement that SELECTs only the
data
column of thedata
table, and invokes a browser download of the packaged data to your local computer.- Parameters:
- Raises:
UnknownPackageTypeError – when the provided package type is unknown
- ocdskingfishercolab.download_package_from_ocid(collection_id, ocid, package_type)[source]#
Selects all releases with the given ocid from the given collection, and invokes a browser download of the packaged releases to your local computer.
- Parameters:
- Raises:
UnknownPackageTypeError – when the provided package type is unknown
- ocdskingfishercolab.write_data_as_json(data, filename)[source]#
Dumps the data to a JSON file.
- Parameters:
data – JSON-serializable data
filename (str) – a file name
- ocdskingfishercolab.render_json(json_string)[source]#
Renders JSON into collapsible HTML.
- Parameters:
json_string – JSON-deserializable string
- ocdskingfishercolab.calculate_coverage(fields, scope=None, print_sql=True, return_sql=False)[source]#
Calculates the coverage of one or more fields using the summary tables produced by Kingfisher Summarize’s
--field-lists
option. Returns the coverage of each field and the co-occurrence coverage of all fields.scope
is the Kingfisher Summarize table to measure coverage against, e.g."awards_summary"
. Coverage is calculated using the number of rows in this table as the denominator.If
scope
is not set, it defaults to the parent table of the first field.fields
is a list of fields to measure the coverage of, specified using JSON Pointer.If a field isn’t a child of the
scope
table, use an absolute pointer:calculate_coverage(["tender/procurementMethod"], "awards_summary")
If a field is a child of the
scope
table, use either an absolute pointer:calculate_coverage(["awards/value/amount"], "awards_summary")
Or a relative pointer (prepend with
":"
):calculate_coverage([":value/amount"], "awards_summary")
If a field is within an array, it counts if it appears in any object in the array.
calculate_coverage([":items/description"], "awards_summary")
To require a field to appear in all objects in the array, prepend with
"ALL "
:calculate_coverage(["ALL :items/description"], "awards_summary")
Note
Nested arrays, like the
"awards/items/description"
field with a"release_summary"
scope, will yield inaccurate results, unless the initial arrays are present and one-to-one with the scope table (i.e. there is always exactly one award for each release).If
scope
is"awards_summary"
, you can specify fields on related contracts by prepending":contracts/"
:calculate_coverage([":value/amount", ":contracts/period"], "awards_summary")
If
scope
is"contracts_summary"
, you can specify fields on related awards by prepending":awards/"
:calculate_coverage([":value/amount", ":awards/date"], "contracts_summary")
- Parameters:
- Returns:
the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether
%config SqlMagic.autopandas
isTrue
orFalse
respectively. This is the same behaviour as ipython-sql’s%sql
magic.- Return type:
pandas.DataFrame or sql.run.ResultSet
- ocdskingfishercolab.set_dark_mode()[source]#
Set the Seaborn theme to match Google Colaboratory’s dark mode.
- ocdskingfishercolab.set_light_mode()[source]#
Set the Seaborn theme to light mode, for exporting plots.
- ocdskingfishercolab.format_thousands(axis, locale='en_US')[source]#
Set the thousands separator on the given axis for the given locale, e.g.
en_US
.
- exception ocdskingfishercolab.OCDSKingfisherColabError[source]#
Base class for exceptions from within this package