OCDS Kingfisher Colab 0.3.14¶
A set of utility functions for Google Colaboratory notebooks using OCDS data.
If you are viewing this on GitHub, open the full documentation for additional details.
Troubleshooting¶
Using Jupyter Notebook¶
If you are using Kingfisher Colab in a Jupyter Notebook (not on Google Colaboratory), you need to:
Install the
google-colab
package:pip install google-colab
Upgrade the
ipykernel
package:pip install --upgrade ipykernel
Using JSON operators with the %sql magic¶
When using the ipython-sql %sql
line magic, you must avoid spaces around JSON operators.
E.g. data->'ocid'
not data -> 'ocid'
API¶
- ocdskingfishercolab.authenticate_gspread()[source]¶
Authenticate the current user and give the notebook permission to connect to Google Spreadsheets.
- Returns:
a Google Sheets Client instance
- Return type:
gspread.Client
- ocdskingfishercolab.authenticate_pydrive()[source]¶
Authenticate the current user and give the notebook permission to connect to Google Drive.
- Returns:
a GoogleDrive instance
- Return type:
pydrive.drive.GoogleDrive
- ocdskingfishercolab.set_spreadsheet_name(name)[source]¶
Set the name of the spreadsheet to which to save.
Used by
ocdskingfishercolab.save_dataframe_to_sheet()
.- Parameters:
name (str) – a spreadsheet name
- ocdskingfishercolab.list_source_ids(pattern='')[source]¶
Return, as a ResultSet or DataFrame, a list of source IDs matching the given pattern.
- Parameters:
pattern (str) – a substring, like “paraguay”
- Returns:
the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether
%config SqlMagic.autopandas
isTrue
orFalse
respectively. This is the same behaviour as ipython-sql’s%sql
magic.- Return type:
pandas.DataFrame or sql.run.ResultSet
- ocdskingfishercolab.list_collections(source_id=None)[source]¶
Return, as a ResultSet or DataFrame, a list of collections with the given source ID.
- Parameters:
source_id (str) – a source ID
- Returns:
the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether
%config SqlMagic.autopandas
isTrue
orFalse
respectively. This is the same behaviour as ipython-sql’s%sql
magic.- Return type:
pandas.DataFrame or sql.run.ResultSet
- ocdskingfishercolab.set_search_path(schema_name)[source]¶
Set the search_path to the given schema, followed by the
public
schema.- Parameters:
schema_name (str) – a schema name
- ocdskingfishercolab.save_dataframe_to_sheet(dataframe, sheetname, *, prompt=True)[source]¶
Save a data frame to a worksheet in Google Sheets, after asking the user for confirmation.
Use
ocdskingfishercolab.set_spreadsheet_name()
to set the spreadsheet name.
- ocdskingfishercolab.save_dataframe_to_spreadsheet(dataframe, name)[source]¶
Dump the
release_package
column of a data frame to a JSON file, convert the JSON file to an Excel file, and upload the Excel file to Google Drive.- Parameters:
dataframe (pandas.DataFrame) – a data frame
name (str) – the basename of the Excel file to write
- ocdskingfishercolab.download_dataframe_as_csv(dataframe, filename)[source]¶
Convert the data frame to a CSV file, and invoke a browser download of the CSV file to your local computer.
- Parameters:
dataframe (pandas.DataFrame) – a data frame
filename (str) – a file name
- ocdskingfishercolab.download_data_as_json(data, filename)[source]¶
Dump the data to a JSON file, and invoke a browser download of the CSV file to your local computer.
- Parameters:
data – JSON-serializable data
filename (str) – a file name
- ocdskingfishercolab.get_ipython_sql_resultset_from_query(sql, _collection_id=None, _ocid=None)[source]¶
Execute a SQL statement and return a ResultSet.
Parameters are taken from the scope this function is called from (same behaviour as ipython-sql’s
%sql
magic).
- ocdskingfishercolab.download_package_from_query(sql, package_type=None)[source]¶
Execute a SQL statement that SELECTs only the
data
column of thedata
table, and invoke a browser download of the packaged data to your local computer.- Parameters:
- Raises:
UnknownPackageTypeError – when the provided package type is unknown
- ocdskingfishercolab.download_package_from_ocid(collection_id, ocid, package_type)[source]¶
Select all releases with the given ocid from the given collection, and invoke a browser download of the packaged releases to your local computer.
- Parameters:
- Raises:
UnknownPackageTypeError – when the provided package type is unknown
- ocdskingfishercolab.write_data_as_json(data, filename)[source]¶
Dump the data to a JSON file.
- Parameters:
data – JSON-serializable data
filename (str) – a file name
- ocdskingfishercolab.render_json(json_string)[source]¶
Render JSON into collapsible HTML.
- Parameters:
json_string – JSON-deserializable string
- ocdskingfishercolab.calculate_coverage(fields, scope=None, *, print_sql=True, return_sql=False)[source]¶
Calculate the coverage of one or more fields using the summary tables produced by Kingfisher Summarize’s
--field-lists
option. Return the coverage of each field and the co-occurrence coverage of all fields.scope
is the Kingfisher Summarize table to measure coverage against, e.g."awards_summary"
. Coverage is calculated using the number of rows in this table as the denominator.If
scope
is not set, it defaults to the parent table of the first field.fields
is a list of fields to measure the coverage of, specified using JSON Pointer.If a field isn’t a child of the
scope
table, use an absolute pointer:calculate_coverage(["tender/procurementMethod"], "awards_summary")
If a field is a child of the
scope
table, use either an absolute pointer:calculate_coverage(["awards/value/amount"], "awards_summary")
Or a relative pointer (prepend with
":"
):calculate_coverage([":value/amount"], "awards_summary")
If a field is within an array, it counts if it appears in any object in the array.
calculate_coverage([":items/description"], "awards_summary")
To require a field to appear in all objects in the array, prepend with
"ALL "
:calculate_coverage(["ALL :items/description"], "awards_summary")
Note
Nested arrays, like the
"awards/items/description"
field with a"release_summary"
scope, will yield inaccurate results, unless the initial arrays are present and one-to-one with the scope table (i.e. there is always exactly one award for each release).If
scope
is"awards_summary"
, you can specify fields on related contracts by prepending":contracts/"
:calculate_coverage([":value/amount", ":contracts/period"], "awards_summary")
If
scope
is"contracts_summary"
, you can specify fields on related awards by prepending":awards/"
:calculate_coverage([":value/amount", ":awards/date"], "contracts_summary")
- Parameters:
- Returns:
the results as a pandas DataFrame or an ipython-sql ResultSet, depending on whether
%config SqlMagic.autopandas
isTrue
orFalse
respectively. This is the same behaviour as ipython-sql’s%sql
magic.- Return type:
pandas.DataFrame or sql.run.ResultSet
- ocdskingfishercolab.set_dark_mode()[source]¶
Set the Seaborn theme to match Google Colaboratory’s dark mode.
- ocdskingfishercolab.set_light_mode()[source]¶
Set the Seaborn theme to light mode, for exporting plots.
- ocdskingfishercolab.format_thousands(axis, locale='en_US')[source]¶
Set the thousands separator on the given axis for the given locale, e.g.
en_US
.
- exception ocdskingfishercolab.OCDSKingfisherColabError[source]¶
Base class for exceptions from within this package.