from singer.catalog import Catalog, CatalogEntry, Schema
from tap_google_sheets.schema import get_schemas, STREAMS
Construct a Catalog Entry for each stream.
Inputs: a GoogleClient object.
Returns: the populated singer Catalog.
def discover(client, spreadsheet_id):
It’s typical for taps in this style to call schema.py:get_schemas() to get schemas and field_metadata. Here schemas is a dictionary of stream name to JSON schema, and field_metadata is a dictionary of stream name to that stream's metadata entries (judging by the loop below, an iterable of dictionaries that each may carry a 'metadata' key). In this tap, it seems that the only thing discover.py:discover() itself reads from field_metadata is table-key-properties, which may or may not be present. table-key-properties is stream-level (table-level) metadata, so you may not expect to find it returned and stored in something named field_metadata.

    schemas, field_metadata = get_schemas(client, spreadsheet_id)
    catalog = Catalog([])

    for stream_name, schema_dict in schemas.items():
        schema = Schema.from_dict(schema_dict)
        mdata = field_metadata[stream_name]
        key_properties = None
        for mdt in mdata:
            table_key_properties = mdt.get('metadata', {}).get('table-key-properties')
            if table_key_properties:
                key_properties = table_key_properties
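
To make the lookup above concrete, here is a minimal standalone sketch of the same scan over one stream's metadata entries. The sample list, its field names other than 'metadata' and 'table-key-properties', and the helper name find_table_key_properties are all hypothetical; the real entries come from get_schemas() and may look different.

```python
# Hypothetical metadata entries for a single stream. Only the 'metadata' and
# 'table-key-properties' keys are taken from the tap's code; the rest is
# illustrative sample data.
sample_mdata = [
    {"breadcrumb": [], "metadata": {"table-key-properties": ["id"]}},
    {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
    {"breadcrumb": ["properties", "name"], "metadata": {"inclusion": "available"}},
]


def find_table_key_properties(mdata):
    """Return the last non-empty table-key-properties value, or None."""
    key_properties = None
    for entry in mdata:
        # Entries without a 'metadata' key fall back to an empty dict.
        table_key_properties = entry.get("metadata", {}).get("table-key-properties")
        if table_key_properties:
            key_properties = table_key_properties
    return key_properties


print(find_table_key_properties(sample_mdata))  # ['id']
print(find_table_key_properties([]))            # None
```

Because the loop does not break on the first match, the last entry that carries a non-empty table-key-properties value is the one that is kept.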
Once you have the stream_name, the value of table-key-properties, the schema, and the metadata for a stream, all of that is passed to the singer.CatalogEntry constructor, and the resulting entry is appended to the singer.Catalog object initialized at the start of discover.py:discover().
        catalog.streams.append(CatalogEntry(
            stream=stream_name,
            tap_stream_id=stream_name,
            key_properties=STREAMS.get(stream_name, {}).get('key_properties', key_properties),
            schema=schema,
            metadata=mdata
        ))

    return catalog
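
The key_properties argument passed to CatalogEntry combines two sources: a static entry in STREAMS, if the stream has one, and otherwise the value found in the metadata. The sketch below isolates that fallback using plain dictionaries. The STREAMS contents, the stream names, and the helper name resolve_key_properties are hypothetical stand-ins, not the tap's actual configuration.

```python
# Hypothetical stand-in for the STREAMS dict imported from schema.py.
STREAMS = {
    "configured_stream": {"key_properties": ["id"]},
    "configured_without_keys": {},  # listed, but defines no key_properties
}


def resolve_key_properties(stream_name, discovered):
    """Prefer a statically configured value, else fall back to the discovered one."""
    return STREAMS.get(stream_name, {}).get("key_properties", discovered)


# A static definition wins over whatever was found in the metadata.
print(resolve_key_properties("configured_stream", ["other"]))      # ['id']
# Streams with no static key_properties use the discovered value.
print(resolve_key_properties("configured_without_keys", ["row"]))  # ['row']
print(resolve_key_properties("unlisted_stream", ["row"]))          # ['row']
# With neither source available, the result is None.
print(resolve_key_properties("unlisted_stream", None))             # None
```

One consequence of using dict.get with a default is that a statically configured empty list would still take precedence over a discovered value, since the default only applies when the 'key_properties' key is absent.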