aboutsummaryrefslogtreecommitdiffhomepage
path: root/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'README.md')
-rw-r--r--README.md37
1 files changed, 23 insertions, 14 deletions
diff --git a/README.md b/README.md
index 8c9cc9d..9470411 100644
--- a/README.md
+++ b/README.md
@@ -11,30 +11,37 @@ This tap:
11 - [File Metadata](https://developers.google.com/drive/api/v3/reference/files/get) 11 - [File Metadata](https://developers.google.com/drive/api/v3/reference/files/get)
12 - [Spreadsheet Metadata](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get) 12 - [Spreadsheet Metadata](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get)
13 - [Spreadsheet Values](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get) 13 - [Spreadsheet Values](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get)
14- Outputs the following metadata streams:
15 - File Metadata: Name, audit/change info from Google Drive
16 - Spreadsheet Metadata: Basic metadata about the Spreadsheet: Title, Locale, URL, etc.
17 - Sheet Metadata: Title, URL, Area (max column and row), and Column Metadata
18 - Column Metadata: Column Header Name, Data type, Format
19 - Sheets Loaded: Sheet title, load date, number of rows
14- For each Sheet: 20- For each Sheet:
15 - Outputs the schema for each resource (based on the column header and datatypes of first row of data) 21 - Outputs the schema for each resource (based on the column header and datatypes of row 2, the first row of data)
16 - Outputs a record for all columns with column headers, and for each row of data until it reaches an empty row 22 - Outputs a record for all columns that have column headers, and for each row of data
23 - Emits a Singer ACTIVATE_VERSION message after each sheet is complete. This forces hard deletes on the data downstream if fewer records are sent.
24 - Primary Key for each row in a Sheet is the Row Number: `__sdc_row`
25 - Each Row in a Sheet also includes Foreign Keys to the Spreadsheet Metadata, `__sdc_spreadsheet_id`, and Sheet Metadata, `__sdc_sheet_id`.
17 26
18## API Endpoints 27## API Endpoints
19[**file (GET)**](https://developers.google.com/drive/api/v3/reference/files/get) 28[**file (GET)**](https://developers.google.com/drive/api/v3/reference/files/get)
20- Endpoint: https://www.googleapis.com/drive/v3/files/${spreadsheet_id}?fields=id,name,createdTime,modifiedTime,version 29- Endpoint: https://www.googleapis.com/drive/v3/files/${spreadsheet_id}?fields=id,name,createdTime,modifiedTime,version
21- Primary keys: id 30- Primary keys: id
22- Replication strategy: Full (GET file audit data for spreadsheet_id in config) 31- Replication strategy: Incremental (GET file audit data for spreadsheet_id in config)
23- Process/Transformations: Replicate Data if Modified 32- Process/Transformations: Replicate Data if Modified
24 33
25[**metadata (GET)**](https://developers.google.com/drive/api/v3/reference/files/get) 34[**metadata (GET)**](https://developers.google.com/drive/api/v3/reference/files/get)
26- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}?includeGridData=true&ranges=1:2 35- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}?includeGridData=true&ranges=1:2
27- This endpoint eturns spreadsheet metadata, sheet metadata, and value metadata (data type information) 36- This endpoint eturns spreadsheet metadata, sheet metadata, and value metadata (data type information)
28- Primary keys: spreadsheetId, title, field_name 37- Primary keys: Spreadsheet Id, Sheet Id, Column Index
29- Foreign keys: None 38- Foreign keys: None
30- Replication strategy: Full (get and replace file metadata for spreadshee_id in config) 39- Replication strategy: Full (get and replace file metadata for spreadshee_id in config)
31- Process/Transformations: 40- Process/Transformations:
32 - Verify Sheets: Check sheets exist (compared to catalog) and check gridProperties (available area) 41 - Verify Sheets: Check sheets exist (compared to catalog) and check gridProperties (available area)
33 - sheetId, title, index, gridProperties (rowCount, columnCount) 42 - sheetId, title, index, gridProperties (rowCount, columnCount)
34 - Verify Field Headers (1st row): Check field headers exist (compared to catalog), missing headers (columns to skip), column order/position, and column uniqueness 43 - Verify Field Headers (1st row): Check field headers exist (compared to catalog), missing headers (columns to skip), column order/position, and column name uniqueness
35 - Header's field_name, position: data.rowData[0].values[i].formattedValue 44 - Create/Verify Datatypes based on 2nd row value and cell metadata
36 - Create/Verify Datatypes (2nd row):
37 - Row 2's datatype, format: data.rowData[1].values[i]
38 - First check: 45 - First check:
39 - [effectiveValue: key](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#ExtendedValue) 46 - [effectiveValue: key](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#ExtendedValue)
40 - Valid types: numberValue, stringValue, boolValue 47 - Valid types: numberValue, stringValue, boolValue
@@ -42,20 +49,22 @@ This tap:
42 - Then check: 49 - Then check:
43 - [effectiveFormat.numberFormat.type](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType) 50 - [effectiveFormat.numberFormat.type](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType)
44 - Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE, TIME, DATE_TIME, SCIENTIFIC 51 - Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE, TIME, DATE_TIME, SCIENTIFIC
45 - If DATE or DATE_TIME, set JSON schema datatype = string and format = date-time 52 - Determine JSON schema column data type based on the value and the above cell metadata settings.
46 - [effectiveFormat.numberFormat.pattern](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormat) 53 - If DATE, DATE_TIME, or TIME, set JSON schema format accordingly
47 54
48[**values (GET)**](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get) 55[**values (GET)**](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get)
49- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}/values/'${sheet_name}'!${row_range}?dateTimeRenderOption=SERIAL_NUMBER&valueRenderOption=UNFORMATTED_VALUE&majorDimension=ROWS 56- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}/values/'${sheet_name}'!${row_range}?dateTimeRenderOption=SERIAL_NUMBER&valueRenderOption=UNFORMATTED_VALUE&majorDimension=ROWS
50- This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), dates and datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption) 57- This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), dates and datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption)
51- Primary keys: row 58- Primary keys: _sdc_row
52- Replication strategy: Full (GET file audit data for spreadsheet_id in config) 59- Replication strategy: Full (GET file audit data for spreadsheet_id in config)
53- Process/Transformations: 60- Process/Transformations:
54 - Loop through sheets (compared to catalog selection) 61 - Loop through sheets (compared to catalog selection)
55 - Send metadata for sheet 62 - Send metadata for sheet
56 - Loop through ranges of rows until reaching empty row or area max row (from sheet metadata) 63 - Loop through ALL columns for columns having a column header
57 - Transform values, if necessary (dates, date-times, boolean, integer, numers) 64 - Loop through ranges of rows for ALL rows in sheet available area max row (from sheet metadata)
58 - Process/send records 65 - Transform values, if necessary (dates, date-times, times, boolean).
66 - Date/time serial numbers converted to date, date-time, and time strings. Google Sheets uses Lotus 1-2-3 [Serial Number](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption) format for date/times. These are converted to normal UTC date-time strings.
67 - Process/send records to target
59 68
60## Authentication 69## Authentication
61The [**Google Sheets Setup & Authentication**](https://drive.google.com/open?id=1FojlvtLwS0-BzGS37R0jEXtwSHqSiO1Uw-7RKQQO-C4) Google Doc provides instructions show how to configure the Google Cloud API credentials to enable Google Drive and Google Sheets APIs, configure Google Cloud to authorize/verify your domain ownership, generate an API key (client_id, client_secret), authenticate and generate a refresh_token, and prepare your tap config.json with the necessary parameters. 70The [**Google Sheets Setup & Authentication**](https://drive.google.com/open?id=1FojlvtLwS0-BzGS37R0jEXtwSHqSiO1Uw-7RKQQO-C4) Google Doc provides instructions show how to configure the Google Cloud API credentials to enable Google Drive and Google Sheets APIs, configure Google Cloud to authorize/verify your domain ownership, generate an API key (client_id, client_secret), authenticate and generate a refresh_token, and prepare your tap config.json with the necessary parameters.