diff options
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 37 |
1 files changed, 23 insertions, 14 deletions
@@ -11,30 +11,37 @@ This tap: | |||
11 | - [File Metadata](https://developers.google.com/drive/api/v3/reference/files/get) | 11 | - [File Metadata](https://developers.google.com/drive/api/v3/reference/files/get) |
12 | - [Spreadsheet Metadata](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get) | 12 | - [Spreadsheet Metadata](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get) |
13 | - [Spreadsheet Values](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get) | 13 | - [Spreadsheet Values](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get) |
14 | - Outputs the following metadata streams: | ||
15 | - File Metadata: Name, audit/change info from Google Drive | ||
16 | - Spreadsheet Metadata: Basic metadata about the Spreadsheet: Title, Locale, URL, etc. | ||
17 | - Sheet Metadata: Title, URL, Area (max column and row), and Column Metadata | ||
18 | - Column Metadata: Column Header Name, Data type, Format | ||
19 | - Sheets Loaded: Sheet title, load date, number of rows | ||
14 | - For each Sheet: | 20 | - For each Sheet: |
15 | - Outputs the schema for each resource (based on the column header and datatypes of first row of data) | 21 | - Outputs the schema for each resource (based on the column header and datatypes of row 2, the first row of data) |
16 | - Outputs a record for all columns with column headers, and for each row of data until it reaches an empty row | 22 | - Outputs a record for all columns that have column headers, and for each row of data |
23 | - Emits a Singer ACTIVATE_VERSION message after each sheet is complete. This forces hard deletes on the data downstream if fewer records are sent. | ||
24 | - Primary Key for each row in a Sheet is the Row Number: `__sdc_row` | ||
25 | - Each Row in a Sheet also includes Foreign Keys to the Spreadsheet Metadata, `__sdc_spreadsheet_id`, and Sheet Metadata, `__sdc_sheet_id`. | ||
17 | 26 | ||
18 | ## API Endpoints | 27 | ## API Endpoints |
19 | [**file (GET)**](https://developers.google.com/drive/api/v3/reference/files/get) | 28 | [**file (GET)**](https://developers.google.com/drive/api/v3/reference/files/get) |
20 | - Endpoint: https://www.googleapis.com/drive/v3/files/${spreadsheet_id}?fields=id,name,createdTime,modifiedTime,version | 29 | - Endpoint: https://www.googleapis.com/drive/v3/files/${spreadsheet_id}?fields=id,name,createdTime,modifiedTime,version |
21 | - Primary keys: id | 30 | - Primary keys: id |
22 | - Replication strategy: Full (GET file audit data for spreadsheet_id in config) | 31 | - Replication strategy: Incremental (GET file audit data for spreadsheet_id in config) |
23 | - Process/Transformations: Replicate Data if Modified | 32 | - Process/Transformations: Replicate Data if Modified |
24 | 33 | ||
25 | [**metadata (GET)**](https://developers.google.com/drive/api/v3/reference/files/get) | 34 | [**metadata (GET)**](https://developers.google.com/drive/api/v3/reference/files/get) |
26 | - Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}?includeGridData=true&ranges=1:2 | 35 | - Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}?includeGridData=true&ranges=1:2 |
27 | - This endpoint eturns spreadsheet metadata, sheet metadata, and value metadata (data type information) | 36 | - This endpoint eturns spreadsheet metadata, sheet metadata, and value metadata (data type information) |
28 | - Primary keys: spreadsheetId, title, field_name | 37 | - Primary keys: Spreadsheet Id, Sheet Id, Column Index |
29 | - Foreign keys: None | 38 | - Foreign keys: None |
30 | - Replication strategy: Full (get and replace file metadata for spreadshee_id in config) | 39 | - Replication strategy: Full (get and replace file metadata for spreadshee_id in config) |
31 | - Process/Transformations: | 40 | - Process/Transformations: |
32 | - Verify Sheets: Check sheets exist (compared to catalog) and check gridProperties (available area) | 41 | - Verify Sheets: Check sheets exist (compared to catalog) and check gridProperties (available area) |
33 | - sheetId, title, index, gridProperties (rowCount, columnCount) | 42 | - sheetId, title, index, gridProperties (rowCount, columnCount) |
34 | - Verify Field Headers (1st row): Check field headers exist (compared to catalog), missing headers (columns to skip), column order/position, and column uniqueness | 43 | - Verify Field Headers (1st row): Check field headers exist (compared to catalog), missing headers (columns to skip), column order/position, and column name uniqueness |
35 | - Header's field_name, position: data.rowData[0].values[i].formattedValue | 44 | - Create/Verify Datatypes based on 2nd row value and cell metadata |
36 | - Create/Verify Datatypes (2nd row): | ||
37 | - Row 2's datatype, format: data.rowData[1].values[i] | ||
38 | - First check: | 45 | - First check: |
39 | - [effectiveValue: key](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#ExtendedValue) | 46 | - [effectiveValue: key](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#ExtendedValue) |
40 | - Valid types: numberValue, stringValue, boolValue | 47 | - Valid types: numberValue, stringValue, boolValue |
@@ -42,20 +49,22 @@ This tap: | |||
42 | - Then check: | 49 | - Then check: |
43 | - [effectiveFormat.numberFormat.type](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType) | 50 | - [effectiveFormat.numberFormat.type](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType) |
44 | - Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE, TIME, DATE_TIME, SCIENTIFIC | 51 | - Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE, TIME, DATE_TIME, SCIENTIFIC |
45 | - If DATE or DATE_TIME, set JSON schema datatype = string and format = date-time | 52 | - Determine JSON schema column data type based on the value and the above cell metadata settings. |
46 | - [effectiveFormat.numberFormat.pattern](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormat) | 53 | - If DATE, DATE_TIME, or TIME, set JSON schema format accordingly |
47 | 54 | ||
48 | [**values (GET)**](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get) | 55 | [**values (GET)**](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get) |
49 | - Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}/values/'${sheet_name}'!${row_range}?dateTimeRenderOption=SERIAL_NUMBER&valueRenderOption=UNFORMATTED_VALUE&majorDimension=ROWS | 56 | - Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}/values/'${sheet_name}'!${row_range}?dateTimeRenderOption=SERIAL_NUMBER&valueRenderOption=UNFORMATTED_VALUE&majorDimension=ROWS |
50 | - This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), dates and datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption) | 57 | - This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), dates and datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption) |
51 | - Primary keys: row | 58 | - Primary keys: _sdc_row |
52 | - Replication strategy: Full (GET file audit data for spreadsheet_id in config) | 59 | - Replication strategy: Full (GET file audit data for spreadsheet_id in config) |
53 | - Process/Transformations: | 60 | - Process/Transformations: |
54 | - Loop through sheets (compared to catalog selection) | 61 | - Loop through sheets (compared to catalog selection) |
55 | - Send metadata for sheet | 62 | - Send metadata for sheet |
56 | - Loop through ranges of rows until reaching empty row or area max row (from sheet metadata) | 63 | - Loop through ALL columns for columns having a column header |
57 | - Transform values, if necessary (dates, date-times, boolean, integer, numers) | 64 | - Loop through ranges of rows for ALL rows in sheet available area max row (from sheet metadata) |
58 | - Process/send records | 65 | - Transform values, if necessary (dates, date-times, times, boolean). |
66 | - Date/time serial numbers converted to date, date-time, and time strings. Google Sheets uses Lotus 1-2-3 [Serial Number](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption) format for date/times. These are converted to normal UTC date-time strings. | ||
67 | - Process/send records to target | ||
59 | 68 | ||
60 | ## Authentication | 69 | ## Authentication |
61 | The [**Google Sheets Setup & Authentication**](https://drive.google.com/open?id=1FojlvtLwS0-BzGS37R0jEXtwSHqSiO1Uw-7RKQQO-C4) Google Doc provides instructions show how to configure the Google Cloud API credentials to enable Google Drive and Google Sheets APIs, configure Google Cloud to authorize/verify your domain ownership, generate an API key (client_id, client_secret), authenticate and generate a refresh_token, and prepare your tap config.json with the necessary parameters. | 70 | The [**Google Sheets Setup & Authentication**](https://drive.google.com/open?id=1FojlvtLwS0-BzGS37R0jEXtwSHqSiO1Uw-7RKQQO-C4) Google Doc provides instructions show how to configure the Google Cloud API credentials to enable Google Drive and Google Sheets APIs, configure Google Cloud to authorize/verify your domain ownership, generate an API key (client_id, client_secret), authenticate and generate a refresh_token, and prepare your tap config.json with the necessary parameters. |