streams.py:STREAMS
is an OrderedDict
. Only because we want to loop over it in the same order
every time.
It’s still the same global variable found in taps of this style. It maps stream names to a dictionary describing the stream.
Some notable things we learn in this file:
api
is either "files"
or "sheets"
We saw this used in client.py:GoogleClient.request()
to switch the base url of the request
"file_metadata"
is the only incremental stream
Full table streams include:
"spreadsheet_metadata"
"sheet_metadata"
"sheets_loaded"
"sheets_loaded"
is the only stream with a "data_key"
data_key
be the name of the key to get data out of “envelope” responsesfrom collections import OrderedDict
streams: API URL endpoints to be called
properties:
file_metadata: Queries Google Drive API to get file information and see if file has been modified Provides audit info about who and when last changed the file.
FILE_METADATA = {
"api": "files",
"path": "files/{spreadsheet_id}",
"key_properties": ["id"],
"replication_method": "INCREMENTAL",
"replication_keys": ["modifiedTime"],
"params": {
"fields": "id,name,createdTime,modifiedTime,version,teamDriveId,driveId,lastModifyingUser"
}
}
spreadsheet_metadata: Queries spreadsheet to get basic information on spreadhsheet and sheets
SPREADSHEET_METADATA = {
"api": "sheets",
"path": "spreadsheets/{spreadsheet_id}",
"key_properties": ["spreadsheetId"],
"replication_method": "FULL_TABLE",
"params": {
"includeGridData": "false"
}
}
sheet_metadata: Get Header Row and 1st data row (Rows 1 & 2) from a Sheet on Spreadsheet. This endpoint includes detailed metadata about each cell in the header and first data row incl. data type, formatting, etc.
SHEET_METADATA = {
"api": "sheets",
"path": "spreadsheets/{spreadsheet_id}",
"key_properties": ["sheetId"],
"replication_method": "FULL_TABLE",
"params": {
"includeGridData": "true",
"ranges": "'{sheet_title}'!1:2"
}
}
sheets_loaded: Queries a batch of Rows for each Sheet in the Spreadsheet.
Each query uses the values
endpoint, to get data-only, w/out the formatting/type metadata.
SHEETS_LOADED = {
"api": "sheets",
"path": "spreadsheets/{spreadsheet_id}/values/'{sheet_title}'!{range_rows}",
"data_key": "values",
"key_properties": ["spreadsheetId", "sheetId", "loadDate"],
"replication_method": "FULL_TABLE",
"params": {
"dateTimeRenderOption": "SERIAL_NUMBER",
"valueRenderOption": "UNFORMATTED_VALUE",
"majorDimension": "ROWS"
}
}
Ensure streams are ordered sequentially, logically.
STREAMS = OrderedDict()
STREAMS['file_metadata'] = FILE_METADATA
STREAMS['spreadsheet_metadata'] = SPREADSHEET_METADATA
STREAMS['sheet_metadata'] = SHEET_METADATA
STREAMS['sheets_loaded'] = SHEETS_LOADED