aboutsummaryrefslogtreecommitdiffhomepage
path: root/README.md
diff options
context:
space:
mode:
authorJeff Huth <39202799+jeffhuth-bytecode@users.noreply.github.com>2020-01-09 07:30:53 -0800
committerKyle Allan <KAllan357@gmail.com>2020-01-09 10:30:53 -0500
commit43a24cbab1dbc35b893c35b86e34adc0f2fb84e7 (patch)
treebcbaae860aad0a94bcc4d27f4804504691401438 /README.md
parent5890b89c1aa7c554235b3cef156b5a5a2c594bec (diff)
downloadtap-google-sheets-43a24cbab1dbc35b893c35b86e34adc0f2fb84e7.tar.gz
tap-google-sheets-43a24cbab1dbc35b893c35b86e34adc0f2fb84e7.tar.zst
tap-google-sheets-43a24cbab1dbc35b893c35b86e34adc0f2fb84e7.zip
v.0.0.3 Sync error handling, activate version, documentation (#2)v0.0.3
* v.0.0.2 schema and sync changes Change number json schema to anyOf with multipleOf; skip empty rows; move write_bookmark to end of sync.py * v.0.0.3 Sync activate version and error handling Update README.md documentation. Improved logging and handling of errors and warnings. Better null handling in Discovery and Sync. Fix issues with activate version messages.
Diffstat (limited to 'README.md')
-rw-r--r--README.md37
1 files changed, 23 insertions, 14 deletions
diff --git a/README.md b/README.md
index 8c9cc9d..9470411 100644
--- a/README.md
+++ b/README.md
@@ -11,30 +11,37 @@ This tap:
11 - [File Metadata](https://developers.google.com/drive/api/v3/reference/files/get) 11 - [File Metadata](https://developers.google.com/drive/api/v3/reference/files/get)
12 - [Spreadsheet Metadata](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get) 12 - [Spreadsheet Metadata](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get)
13 - [Spreadsheet Values](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get) 13 - [Spreadsheet Values](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get)
14- Outputs the following metadata streams:
15 - File Metadata: Name, audit/change info from Google Drive
16 - Spreadsheet Metadata: Basic metadata about the Spreadsheet: Title, Locale, URL, etc.
17 - Sheet Metadata: Title, URL, Area (max column and row), and Column Metadata
18 - Column Metadata: Column Header Name, Data type, Format
19 - Sheets Loaded: Sheet title, load date, number of rows
14- For each Sheet: 20- For each Sheet:
15 - Outputs the schema for each resource (based on the column header and datatypes of first row of data) 21 - Outputs the schema for each resource (based on the column header and datatypes of row 2, the first row of data)
16 - Outputs a record for all columns with column headers, and for each row of data until it reaches an empty row 22 - Outputs a record for all columns that have column headers, and for each row of data
23 - Emits a Singer ACTIVATE_VERSION message after each sheet is complete. This forces hard deletes on the data downstream if fewer records are sent.
24 - Primary Key for each row in a Sheet is the Row Number: `__sdc_row`
25 - Each Row in a Sheet also includes Foreign Keys to the Spreadsheet Metadata, `__sdc_spreadsheet_id`, and Sheet Metadata, `__sdc_sheet_id`.
17 26
18## API Endpoints 27## API Endpoints
19[**file (GET)**](https://developers.google.com/drive/api/v3/reference/files/get) 28[**file (GET)**](https://developers.google.com/drive/api/v3/reference/files/get)
20- Endpoint: https://www.googleapis.com/drive/v3/files/${spreadsheet_id}?fields=id,name,createdTime,modifiedTime,version 29- Endpoint: https://www.googleapis.com/drive/v3/files/${spreadsheet_id}?fields=id,name,createdTime,modifiedTime,version
21- Primary keys: id 30- Primary keys: id
22- Replication strategy: Full (GET file audit data for spreadsheet_id in config) 31- Replication strategy: Incremental (GET file audit data for spreadsheet_id in config)
23- Process/Transformations: Replicate Data if Modified 32- Process/Transformations: Replicate Data if Modified
24 33
25[**metadata (GET)**](https://developers.google.com/drive/api/v3/reference/files/get) 34[**metadata (GET)**](https://developers.google.com/drive/api/v3/reference/files/get)
26- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}?includeGridData=true&ranges=1:2 35- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}?includeGridData=true&ranges=1:2
27- This endpoint eturns spreadsheet metadata, sheet metadata, and value metadata (data type information) 36- This endpoint eturns spreadsheet metadata, sheet metadata, and value metadata (data type information)
28- Primary keys: spreadsheetId, title, field_name 37- Primary keys: Spreadsheet Id, Sheet Id, Column Index
29- Foreign keys: None 38- Foreign keys: None
30- Replication strategy: Full (get and replace file metadata for spreadshee_id in config) 39- Replication strategy: Full (get and replace file metadata for spreadshee_id in config)
31- Process/Transformations: 40- Process/Transformations:
32 - Verify Sheets: Check sheets exist (compared to catalog) and check gridProperties (available area) 41 - Verify Sheets: Check sheets exist (compared to catalog) and check gridProperties (available area)
33 - sheetId, title, index, gridProperties (rowCount, columnCount) 42 - sheetId, title, index, gridProperties (rowCount, columnCount)
34 - Verify Field Headers (1st row): Check field headers exist (compared to catalog), missing headers (columns to skip), column order/position, and column uniqueness 43 - Verify Field Headers (1st row): Check field headers exist (compared to catalog), missing headers (columns to skip), column order/position, and column name uniqueness
35 - Header's field_name, position: data.rowData[0].values[i].formattedValue 44 - Create/Verify Datatypes based on 2nd row value and cell metadata
36 - Create/Verify Datatypes (2nd row):
37 - Row 2's datatype, format: data.rowData[1].values[i]
38 - First check: 45 - First check:
39 - [effectiveValue: key](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#ExtendedValue) 46 - [effectiveValue: key](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#ExtendedValue)
40 - Valid types: numberValue, stringValue, boolValue 47 - Valid types: numberValue, stringValue, boolValue
@@ -42,20 +49,22 @@ This tap:
42 - Then check: 49 - Then check:
43 - [effectiveFormat.numberFormat.type](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType) 50 - [effectiveFormat.numberFormat.type](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType)
44 - Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE, TIME, DATE_TIME, SCIENTIFIC 51 - Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE, TIME, DATE_TIME, SCIENTIFIC
45 - If DATE or DATE_TIME, set JSON schema datatype = string and format = date-time 52 - Determine JSON schema column data type based on the value and the above cell metadata settings.
46 - [effectiveFormat.numberFormat.pattern](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormat) 53 - If DATE, DATE_TIME, or TIME, set JSON schema format accordingly
47 54
48[**values (GET)**](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get) 55[**values (GET)**](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get)
49- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}/values/'${sheet_name}'!${row_range}?dateTimeRenderOption=SERIAL_NUMBER&valueRenderOption=UNFORMATTED_VALUE&majorDimension=ROWS 56- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}/values/'${sheet_name}'!${row_range}?dateTimeRenderOption=SERIAL_NUMBER&valueRenderOption=UNFORMATTED_VALUE&majorDimension=ROWS
50- This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), dates and datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption) 57- This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), dates and datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption)
51- Primary keys: row 58- Primary keys: _sdc_row
52- Replication strategy: Full (GET file audit data for spreadsheet_id in config) 59- Replication strategy: Full (GET file audit data for spreadsheet_id in config)
53- Process/Transformations: 60- Process/Transformations:
54 - Loop through sheets (compared to catalog selection) 61 - Loop through sheets (compared to catalog selection)
55 - Send metadata for sheet 62 - Send metadata for sheet
56 - Loop through ranges of rows until reaching empty row or area max row (from sheet metadata) 63 - Loop through ALL columns for columns having a column header
57 - Transform values, if necessary (dates, date-times, boolean, integer, numers) 64 - Loop through ranges of rows for ALL rows in sheet available area max row (from sheet metadata)
58 - Process/send records 65 - Transform values, if necessary (dates, date-times, times, boolean).
66 - Date/time serial numbers converted to date, date-time, and time strings. Google Sheets uses Lotus 1-2-3 [Serial Number](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption) format for date/times. These are converted to normal UTC date-time strings.
67 - Process/send records to target
59 68
60## Authentication 69## Authentication
61The [**Google Sheets Setup & Authentication**](https://drive.google.com/open?id=1FojlvtLwS0-BzGS37R0jEXtwSHqSiO1Uw-7RKQQO-C4) Google Doc provides instructions show how to configure the Google Cloud API credentials to enable Google Drive and Google Sheets APIs, configure Google Cloud to authorize/verify your domain ownership, generate an API key (client_id, client_secret), authenticate and generate a refresh_token, and prepare your tap config.json with the necessary parameters. 70The [**Google Sheets Setup & Authentication**](https://drive.google.com/open?id=1FojlvtLwS0-BzGS37R0jEXtwSHqSiO1Uw-7RKQQO-C4) Google Doc provides instructions show how to configure the Google Cloud API credentials to enable Google Drive and Google Sheets APIs, configure Google Cloud to authorize/verify your domain ownership, generate an API key (client_id, client_secret), authenticate and generate a refresh_token, and prepare your tap config.json with the necessary parameters.