OCDS Kingfisher Colab 0.2.2


A set of utility functions for Google Colaboratory notebooks using OCDS data.

If you are viewing this on GitHub, open the full documentation for additional details.

ocdskingfishercolab.create_connection(database, user, password='', host='localhost', port='5432', sslmode=None)

Creates a connection to the database.

Returns: a database connection
Return type: psycopg2.extensions.connection

ocdskingfishercolab.reset_connection()

Closes and resets the connection to the database.

This does not reopen the connection.

ocdskingfishercolab.authenticate_gspread()

Authenticates the current user and gives the notebook permission to connect to Google Spreadsheets.

Returns: a Google Sheets Client instance
Return type: gspread.Client

ocdskingfishercolab.authenticate_pydrive()

Authenticates the current user and gives the notebook permission to connect to Google Drive.

Returns: a GoogleDrive instance
Return type: pydrive.drive.GoogleDrive

ocdskingfishercolab.set_spreadsheet_name(name)

Sets the name of the spreadsheet in which to save data.

Used by ocdskingfishercolab.save_dataframe_to_sheet().

Parameters: name (str) – a spreadsheet name

ocdskingfishercolab.list_source_ids(pattern='')

Returns, as a data frame, a list of source IDs matching the given pattern.

Parameters: pattern (str) – a substring, like “paraguay”
Returns: the results as a data frame
Return type: pandas.DataFrame
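The substring matching can be pictured with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the Kingfisher PostgreSQL database. The `collection` table, its `source_id` column, the sample rows, and the `find_source_ids` helper are assumptions for illustration, not the package's implementation.

```python
import sqlite3


def find_source_ids(conn, pattern=""):
    """Return the distinct source IDs that contain `pattern` (hypothetical helper)."""
    cur = conn.cursor()
    # Bind the pattern as a parameter instead of formatting it into the SQL string.
    cur.execute(
        "SELECT DISTINCT source_id FROM collection "
        "WHERE source_id LIKE ? ORDER BY source_id",
        ("%" + pattern + "%",),
    )
    return [row[0] for row in cur.fetchall()]


# Assumed sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection (id INTEGER PRIMARY KEY, source_id TEXT)")
conn.executemany(
    "INSERT INTO collection (source_id) VALUES (?)",
    [("paraguay_dncp_records",), ("paraguay_dncp_releases",), ("scotland",)],
)

paraguay_ids = find_source_ids(conn, "paraguay")
all_ids = find_source_ids(conn)
```

With the default empty pattern, the wildcard matches every source ID, which mirrors the `pattern=''` default in the signature above.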

ocdskingfishercolab.list_collections(source_id)

Returns, as a data frame, a list of collections with the given source ID.

Parameters: source_id (str) – a source ID
Returns: the results as a data frame
Return type: pandas.DataFrame

ocdskingfishercolab.set_search_path(schema_name)

Sets the search_path to the given schema, followed by the public schema.

Parameters: schema_name (str) – a schema name
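As a rough illustration of the resulting statement, the sketch below builds a `SET search_path` statement that lists the given schema first and the public schema second. The `build_search_path_statement` helper and the schema name are hypothetical; the package may construct the statement differently.

```python
def build_search_path_statement(schema_name):
    """Build a SET search_path statement (hypothetical helper, not the package's code)."""
    # Quote the identifier by doubling embedded double quotes, so that
    # unusual schema names cannot break out of the identifier.
    quoted = '"' + schema_name.replace('"', '""') + '"'
    # The given schema comes first, then the public schema as a fallback.
    return "SET search_path = {}, public".format(quoted)


# Made-up schema name, for illustration only.
statement = build_search_path_statement("view_data_collection_1")
odd_statement = build_search_path_statement('a"b')
```

When working with psycopg2 directly, `psycopg2.sql.Identifier` is the usual way to quote an identifier; the manual quoting here only keeps the sketch free of dependencies.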

ocdskingfishercolab.execute_statement(cur, sql, params=None)

Executes a SQL statement, adding a comment with a link to the notebook for database administrators.

Parameters:
  • cur (psycopg2.extensions.cursor) – a database cursor
  • sql (str) – a SQL statement
  • params – the parameters to pass to the SQL statement
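One way to picture the added comment is a block comment placed in front of the statement, so that a database administrator who sees the query can trace it back to a notebook. The `add_notebook_comment` helper and the placeholder URL below are assumptions for illustration; the exact comment format used by the package is not documented here.

```python
def add_notebook_comment(sql, notebook_url):
    """Prefix a SQL statement with a comment naming the notebook (hypothetical helper)."""
    # Remove any comment terminator from the URL so it cannot end the comment early.
    safe_url = notebook_url.replace("*/", "")
    return "/* {} */\n{}".format(safe_url, sql)


# Placeholder URL, for illustration only.
commented = add_notebook_comment(
    "SELECT id FROM collection WHERE source_id = %(source_id)s",
    "https://colab.research.google.com/drive/EXAMPLE",
)
```

The statement itself is unchanged, so any parameters are still passed separately to the cursor.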

ocdskingfishercolab.get_list_from_query(sql, params=None)

Executes a SQL statement and returns the results as a list of tuples.

Parameters:
  • sql (str) – a SQL statement
  • params – the parameters to pass to the SQL statement
Returns: the results as a list of tuples
Return type: list
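The role of `params` can be sketched with any DB-API 2.0 driver. The example below uses the standard library's `sqlite3` as a stand-in for psycopg2, so its placeholders are `?` and `:name` rather than psycopg2's `%s` and `%(name)s`. The `list_from_query` helper, the table, and the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection (id INTEGER PRIMARY KEY, source_id TEXT)")
conn.executemany(
    "INSERT INTO collection (id, source_id) VALUES (?, ?)",
    [(1, "paraguay_dncp_records"), (2, "scotland")],
)


def list_from_query(conn, sql, params=None):
    """Execute a statement and return all rows as a list of tuples (hypothetical helper)."""
    cur = conn.cursor()
    # The driver binds the parameters; values are never formatted into the SQL string.
    cur.execute(sql, params or ())
    return cur.fetchall()


rows = list_from_query(
    conn,
    "SELECT id, source_id FROM collection WHERE source_id = :source_id",
    {"source_id": "scotland"},
)
```

Passing values through `params` rather than building the SQL string by hand avoids quoting mistakes and SQL injection.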

ocdskingfishercolab.get_dataframe_from_query(sql, params=None)

Executes a SQL statement and returns the results as a data frame.

Parameters:
  • sql (str) – a SQL statement
  • params – the parameters to pass to the SQL statement
Returns: the results as a data frame
Return type: pandas.DataFrame

ocdskingfishercolab.get_dataframe_from_cursor(cur)

Accepts a database cursor after a SQL statement has been executed and returns the results as a data frame.

Parameters: cur (psycopg2.extensions.cursor) – a database cursor
Returns: the results as a data frame
Return type: pandas.DataFrame
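A plausible way to turn an executed cursor into tabular data is to read the column names from the cursor's `description` and pair them with the fetched rows. The sketch below shows that step with `sqlite3` as a stand-in for psycopg2, since both follow DB-API 2.0. The `table_from_cursor` helper and the sample table are hypothetical, and pandas is left out so that the sketch needs only the standard library.

```python
import sqlite3


def table_from_cursor(cur):
    """Return (column names, rows) for an executed cursor (hypothetical helper)."""
    # DB-API 2.0 cursors describe each result column; the name is the first item.
    columns = [column[0] for column in cur.description]
    rows = cur.fetchall()
    return columns, rows


# In-memory SQLite stands in for PostgreSQL; the table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection (id INTEGER PRIMARY KEY, source_id TEXT)")
conn.execute("INSERT INTO collection (id, source_id) VALUES (1, 'scotland')")

cur = conn.cursor()
cur.execute("SELECT id, source_id FROM collection")
columns, rows = table_from_cursor(cur)
```

The two values are in the shape that `pandas.DataFrame(rows, columns=columns)` accepts, which is one way to obtain a data frame from them.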

ocdskingfishercolab.save_dataframe_to_sheet(dataframe, sheetname, prompt=True)

Saves a data frame to a worksheet in Google Sheets, after asking the user for confirmation if prompt is true.

Use ocdskingfishercolab.set_spreadsheet_name() to set the spreadsheet name.

Parameters:
  • dataframe (pandas.DataFrame) – a data frame
  • sheetname (str) – a sheet name
  • prompt (bool) – whether to prompt the user
ocdskingfishercolab.save_dataframe_to_spreadsheet(dataframe, name)

Dumps the release_package column of a data frame to a JSON file, converts the JSON file to an Excel file, and uploads the Excel file to Google Drive.

Parameters:
  • dataframe (pandas.DataFrame) – a data frame
  • name (str) – the basename of the Excel file to write
ocdskingfishercolab.download_dataframe_as_csv(dataframe, filename)

Converts the data frame to a CSV file, and invokes a browser download of the CSV file to your local computer.

Parameters:
  • dataframe (pandas.DataFrame) – a data frame
  • filename (str) – a file name
ocdskingfishercolab.download_data_as_json(data, filename)

Dumps the data to a JSON file, and invokes a browser download of the JSON file to your local computer.

Parameters:
  • data – JSON-serializable data
  • filename (str) – a file name
ocdskingfishercolab.download_package_from_query(sql, params=None, package_type=None)

Executes a SQL statement that SELECTs only the data column of the data table, and invokes a browser download of the packaged data to your local computer.

Parameters:
  • sql (str) – a SQL statement
  • params – the parameters to pass to the SQL statement
  • package_type (str) – “record” or “release”
Raises: UnknownPackageTypeError – when the provided package type is unknown
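The package type check can be sketched as follows. The two exception classes are local stand-ins that mirror the names documented on this page, and `wrap_in_package` is a hypothetical helper that builds only the bare `releases` or `records` array; a real OCDS package also carries metadata that the sketch omits. The OCID is made up.

```python
class OCDSKingfisherColabError(Exception):
    """Local stand-in for the package's base exception."""


class UnknownPackageTypeError(OCDSKingfisherColabError):
    """Local stand-in, raised when the package type is unknown."""


def wrap_in_package(items, package_type):
    """Wrap releases or records in a bare-bones package (hypothetical helper)."""
    if package_type == "release":
        return {"releases": list(items)}
    if package_type == "record":
        return {"records": list(items)}
    raise UnknownPackageTypeError(
        "package_type must be 'record' or 'release', not {!r}".format(package_type)
    )


# Made-up OCID, for illustration only.
package = wrap_in_package([{"ocid": "ocds-213czf-1"}], "release")

try:
    wrap_in_package([], "tender")
except UnknownPackageTypeError as error:
    message = str(error)
```

Because the specific error derives from the base class, a notebook can catch either the specific exception or every exception raised by the package.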

ocdskingfishercolab.download_package_from_ocid(collection_id, ocid, package_type)

Selects all releases with the given ocid from the given collection, and invokes a browser download of the packaged releases to your local computer.

Parameters:
  • collection_id (int) – a collection’s ID
  • ocid (str) – an OCID
  • package_type (str) – “record” or “release”
Raises: UnknownPackageTypeError – when the provided package type is unknown

ocdskingfishercolab.write_data_as_json(data, filename)

Dumps the data to a JSON file.

Parameters:
  • data – JSON-serializable data
  • filename (str) – a file name
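Dumping data to a JSON file can be sketched with the standard library alone. The `write_json` helper, its formatting options, and the sample data are assumptions for illustration and may differ from what the package does.

```python
import json
import os
import tempfile


def write_json(data, filename):
    """Dump JSON-serializable data to a file (hypothetical helper)."""
    with open(filename, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text readable; indent aids inspection.
        json.dump(data, f, ensure_ascii=False, indent=2)


# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "release.json")
    write_json({"ocid": "ocds-213czf-1", "title": "Adquisición"}, path)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
```

`json.dump` raises a `TypeError` for objects it cannot serialize, which is why the data parameter is described as JSON-serializable.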

exception ocdskingfishercolab.OCDSKingfisherColabError

Base class for exceptions from within this package.

exception ocdskingfishercolab.UnknownPackageTypeError

Raised when the provided package type is unknown.