discover.py

from singer.catalog import Catalog, CatalogEntry, Schema
from tap_google_sheets.schema import get_schemas, STREAMS

Construct a Catalog Entry for each stream

Inputs:

  • client: A GoogleClient object
  • spreadsheet_id: the ID of a Google Sheet Doc

Returns:

  • A singer.Catalog object
def discover(client, spreadsheet_id):

It’s typical for taps in this style to call schema.py:get_schemas() to get schemas and field_metadata.

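Here schemas is a dictionary of stream name to JSON schema, and field_metadata is a dictionary of stream name to a list of metadata entries, each of which is a dictionary with a metadata key. In this tap, it seems that the only value discover.py:discover() reads out of field_metadata is table-key-properties, and only when an entry provides it; the entries themselves are passed through to the CatalogEntry unchanged.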

  • This could be a point of confusion: table-key-properties is stream-level (table-level) metadata, so you might not expect to find it in a structure named field_metadata.
    schemas, field_metadata = get_schemas(client, spreadsheet_id)
    catalog = Catalog([])

    for stream_name, schema_dict in schemas.items():
        schema = Schema.from_dict(schema_dict)
        mdata = field_metadata[stream_name]
        key_properties = None
        for mdt in mdata:
            table_key_properties = mdt.get('metadata', {}).get('table-key-properties')
            if table_key_properties:
                key_properties = table_key_properties
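
The loop above can be tried in isolation. This is a minimal sketch, not the tap's code: the stream names, the "__sdc_row" key, and the helper name find_table_key_properties are hypothetical, and the entry shape (a breadcrumb plus a metadata dictionary) is assumed from the usual Singer metadata layout rather than taken from schema.py.

```python
# Hypothetical stand-in for the field_metadata returned by get_schemas().
# Assumption: each stream maps to a list of entries shaped like Singer
# metadata, i.e. {"breadcrumb": [...], "metadata": {...}}.
field_metadata = {
    "Sheet1": [
        # Stream-level entry (empty breadcrumb) carrying table-key-properties.
        {"breadcrumb": [], "metadata": {"table-key-properties": ["__sdc_row"]}},
        # Field-level entry with no table-key-properties.
        {"breadcrumb": ["properties", "name"], "metadata": {"inclusion": "available"}},
    ],
    "file_metadata": [
        {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
    ],
}


def find_table_key_properties(mdata):
    """Return the last non-empty table-key-properties value in mdata, or None."""
    key_properties = None
    for entry in mdata:
        value = entry.get("metadata", {}).get("table-key-properties")
        if value:
            key_properties = value
    return key_properties


sheet_keys = find_table_key_properties(field_metadata["Sheet1"])
file_keys = find_table_key_properties(field_metadata["file_metadata"])
```

As in discover(), a stream whose entries never mention table-key-properties ends up with None, and if several entries carry the key, the last one wins.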

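Once we have the stream_name, the value of table-key-properties, the schema, and the metadata for a stream, we pass all of that to the singer.CatalogEntry constructor and append the result to the singer.Catalog object initialized at the start of discover.py:discover(). Note that key_properties is taken from STREAMS when the stream has a key_properties entry there; the table-key-properties value found above is only the fallback.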

        catalog.streams.append(CatalogEntry(
            stream=stream_name,
            tap_stream_id=stream_name,
            key_properties=STREAMS.get(stream_name, {}).get('key_properties', key_properties),
            schema=schema,
            metadata=mdata
        ))

    return catalog
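
The key_properties argument in the CatalogEntry call is easy to misread, so here is a small sketch of its precedence using plain dictionaries. The STREAMS contents and the helper name resolve_key_properties are hypothetical; only the nested .get() expression mirrors the code above.

```python
# Hypothetical stand-in for tap_google_sheets.schema.STREAMS.
STREAMS = {
    "file_metadata": {"key_properties": ["id"]},
}


def resolve_key_properties(stream_name, discovered, streams=STREAMS):
    """Prefer the static key_properties in streams; fall back to discovered."""
    return streams.get(stream_name, {}).get("key_properties", discovered)


# A stream defined in STREAMS keeps its static keys.
static_keys = resolve_key_properties("file_metadata", ["__sdc_row"])
# A stream missing from STREAMS uses the value found in its metadata.
fallback_keys = resolve_key_properties("Sheet1", ["__sdc_row"])
# With neither source available, the result is None.
no_keys = resolve_key_properties("Sheet1", None)
```

The fallback applies only when the stream is missing from STREAMS or has no key_properties entry there; a stream that does define one never uses the metadata value.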